XML is a text-centric markup language derived from Standard Generalized Markup Language (SGML).
Moving to structured authoring with XML offers many benefits for documentation departments. However, moving from an unstructured authoring environment to a structured authoring environment requires effort and resources, is time-consuming, and can be expensive.
DITA is an XML-based open-standard architecture for document representation. DITA can ease the move to structured authoring.
What is XML?
XML is the acronym for Extensible Markup Language. It is a text-centric markup language derived from Standard Generalized Markup Language (SGML).
XML is used to store structured data, rather than to format or display information on a page. You can use XML to represent structured information for documents, books, data, manuscripts, and more.
What is a Markup Language?
A markup language uses tags to define elements within a document.
Markup languages are human-readable because their tags use ordinary words rather than programming syntax. XML and HyperText Markup Language (HTML) are the two most popular markup languages.
What are Tags?
Tags are markup instructions enclosed in angle brackets, for example, <roots> and <note>.
Tags are examples of semantic markup: they describe the intended purpose or the meaning of the text they enclose.
The text between these instructions is the actual text of the document.
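To make the relationship between tags and text concrete, here is a minimal sketch that parses a small document with Python's standard `xml.etree.ElementTree` module. The `<note>` element and its children are hypothetical names chosen for illustration; they do not belong to any standard vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: the tag names are invented for illustration.
SAMPLE = """<note>
  <to>Documentation team</to>
  <body>Review the draft by Friday.</body>
</note>"""

root = ET.fromstring(SAMPLE)

# Each child element pairs a tag (the markup) with its text (the content).
pairs = [(child.tag, child.text) for child in root]

for tag, text in pairs:
    print(f"<{tag}> encloses: {text}")
```

The tag names tell a reader (or a program) what each piece of text is for, while the text itself stays untouched.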
What is an XML file?
An XML file is a plain text file with the “.xml” extension.
You can incorporate different types of content into an XML file. For example, you can incorporate rich media content into XML through tags that identify the files in which the rich media content resides.
How Can You Open and Read XML Files?
XML files are saved in a plain text format. You can use a standard text editor to view XML files.
How Can You Edit XML Files?
You can edit XML files with either a simple text editor or a specialized XML editor. An XML editor can include tools for validating XML code, including:
- Parsing XML code and displaying XML
- Flagging text not enclosed within a tag, known as orphaned text
- Identifying improper tags
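The kinds of checks listed above can be approximated in a few lines of Python. This is only a sketch using the standard `xml.etree.ElementTree` module: it tests well-formedness (not validity against a DTD or schema), and it assumes that "orphaned text" means text sitting directly inside a container element rather than inside one of its child elements. The sample documents and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None

def find_orphaned_text(root):
    """Collect text placed directly inside root instead of inside a child element."""
    orphans = []
    if root.text and root.text.strip():
        orphans.append(root.text.strip())
    for child in root:
        # The "tail" is the text that follows a child's end tag.
        if child.tail and child.tail.strip():
            orphans.append(child.tail.strip())
    return orphans

# Hypothetical samples for illustration.
GOOD = "<note><to>Team</to><body>Hello</body></note>"
BAD_NESTING = "<note><to>Team</body></to></note>"
ORPHAN = "<note><to>Team</to>stray words<body>Hello</body></note>"

print(check_well_formed(GOOD))
print(check_well_formed(BAD_NESTING))
print(find_orphaned_text(ET.fromstring(ORPHAN)))
```

A dedicated XML editor goes further than this, typically validating the document against a schema as you type.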
Why is XML an Important Development?
XML is an important development for the following two reasons:
- XML overcomes the inflexibility of and dependence on HTML, a single document type that was being used for tasks it was not designed for.
- XML overcomes the complexity of SGML, whose syntax allows several powerful but hard-to-program options.
XML allows the flexible development of user-defined document types. It provides a robust, non-proprietary, persistent, and verifiable file format for the storage and transmission of text and data both on and off the Web; and it removes the more complex options of SGML, making it easier to program.
XML Benefits
XML offers the following benefits for technical communicators:
- XML does not participate in displaying data; it only carries the data. In essence, it allows users to store data independently of how it will be presented.
- XML is an extensible markup language: programmers can create custom tags for their applications and describe the tags and their permitted uses.
- XML is an open standard.
- XML markup is verbose. For example, every end tag must be supplied, which enables computer programs to catch common errors such as incorrect nesting.
- XML is readable, and the presence of element and attribute names means that writers looking at an XML document often find the format easy to understand. This also makes it easy to find mistakes.
- XML has a robust ecosystem of tools and resources that support its use, including editors, parsers, validators, and libraries. This makes it easier for authors and developers to work with XML and integrate it into their applications and systems.
- XML also has a large and active community of users and developers who share best practices, resources, and support for XML authoring and development.
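The first benefit in the list, storing data independently of how it is presented, can be illustrated with a short sketch. One XML source is rendered in two ways without being modified. The `<product>` vocabulary and both rendering functions are hypothetical and exist only for this example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source content; it carries no formatting information.
SOURCE = """<product>
  <name>Widget</name>
  <summary>Fits pipes under 2 cm &amp; seals them.</summary>
</product>"""

def to_plain_text(root):
    """Render the content as a single line of plain text."""
    return f"{root.findtext('name')}: {root.findtext('summary')}"

def to_html(root):
    """Render the same content as an HTML fragment."""
    name = escape(root.findtext("name"))
    summary = escape(root.findtext("summary"))
    return f"<h1>{name}</h1><p>{summary}</p>"

root = ET.fromstring(SOURCE)
print(to_plain_text(root))
print(to_html(root))
```

Because presentation lives in the rendering functions, a new output format means adding a function, not editing the content.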
What is DITA?
DITA is the acronym for Darwin Information Typing Architecture. It is:
- An XML standard
- An architectural approach, and
- A writing methodology
DITA History
The technical publications department at IBM developed the DITA specification. In 2004, IBM donated DITA to the OASIS standards organization. The OASIS DITA Technical Committee now manages DITA.
DITA Features
DITA’s features include:
- Modular document development: DITA topics can be reused, and any group of topics or elements can be treated as a modular document component. Instead of being created as one document, a large manual can be designed as a collection of modules, and those modules can be arranged into different configurations to create different manuals. Modular manuals are easier to maintain and more efficient to produce.
- Structured authoring: DITA defines and enforces consistent organization of information, which reduces authoring time but increases the time spent on up-front analysis.
- Information typing: Information is structured by topics with content models appropriate to the nature of the content. The three basic DITA information types are concept, task, and reference.
- Minimalism: DITA favors an approach that presents the reader with the smallest amount of information necessary to achieve the reader’s goals. The needs of the reader (or the learner), and not the system being documented, guide the information architecture and the writing style.
- Specialization: DITA enables specialization of information types. The three base information types (concept, task, and reference) derive from the generic topic type and inherit the characteristics of a shared base structure.
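As an illustration of an information type with its own content model, here is a minimal sketch of a DITA task topic, parsed with Python's standard library. The element names (`task`, `title`, `shortdesc`, `taskbody`, `prereq`, `steps`, `step`, `cmd`, `result`) come from the DITA task vocabulary; the topic content and the `id` value are invented, and the DOCTYPE declaration that a real topic file would normally carry is omitted to keep the sample short.

```python
import xml.etree.ElementTree as ET

# Hypothetical task topic; a real file would usually start with a DITA DOCTYPE.
TASK_TOPIC = """<task id="replace-filter">
  <title>Replace the filter</title>
  <shortdesc>Swap the filter every six months.</shortdesc>
  <taskbody>
    <prereq>Have a new filter at hand.</prereq>
    <steps>
      <step><cmd>Switch off the unit.</cmd></step>
      <step><cmd>Remove the old filter.</cmd></step>
      <step><cmd>Insert the new filter.</cmd></step>
    </steps>
    <result>The unit is ready for use.</result>
  </taskbody>
</task>"""

root = ET.fromstring(TASK_TOPIC)
title = root.findtext("title")

# The task content model keeps each instruction in a <cmd> inside a <step>.
steps = [cmd.text for cmd in root.iterfind("taskbody/steps/step/cmd")]

print(title)
for number, step in enumerate(steps, start=1):
    print(f"{number}. {step}")
```

Concept and reference topics follow the same pattern with different body elements suited to their content.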
A topic-based DITA architecture allows information reuse and makes translation and localization more efficient. DITA defines four types of topics:
- Topic: Provides a generic structure for information
- Concept: Contains background information and examples
- Task: Includes procedures
- Reference: Describes commands, parameters, and other features
A special DITA file called a map, or DITA map, specifies the topics included in a deliverable document. The DITA map does not store content; it contains pointers to the topics that contain DITA content.
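The following sketch shows how a map points to topics and how two maps can reuse the same topic. The `map` and `topicref` elements and the `href` and `type` attributes are part of DITA; the file names, the map titles, and the `topic_hrefs` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two hypothetical deliverables built from one pool of topic files.
USER_GUIDE_MAP = """<map>
  <title>User Guide</title>
  <topicref href="overview.dita" type="concept">
    <topicref href="install.dita" type="task"/>
    <topicref href="settings.dita" type="reference"/>
  </topicref>
</map>"""

QUICK_START_MAP = """<map>
  <title>Quick Start</title>
  <topicref href="install.dita" type="task"/>
</map>"""

def topic_hrefs(map_xml):
    """Return the topic files a map points to, in document order."""
    root = ET.fromstring(map_xml)
    return [ref.get("href") for ref in root.iter("topicref") if ref.get("href")]

guide_topics = topic_hrefs(USER_GUIDE_MAP)
quick_topics = topic_hrefs(QUICK_START_MAP)

# Topics that appear in both deliverables are written and maintained once.
shared = set(guide_topics) & set(quick_topics)

print(guide_topics)
print(quick_topics)
print(shared)
```

Neither map contains any prose: changing `install.dita` updates both deliverables the next time they are published.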
DITA Implementations
Here are some of the most popular DITA XML implementations.
DITA Open Toolkit
DITA Open Toolkit (DITA-OT) is an open-source publishing engine for content authored in DITA. The toolkit’s extensible plug-in mechanism allows users to add their own transformations and customize the default output, which includes Eclipse Help, HTML5, Microsoft Compiled HTML Help, Markdown, PDF (through XSL-FO), troff, XHTML, and XHTML with JavaScript frameset.
DITA-OT was originally developed by IBM. Its distribution packages contain Ant, Apache FOP, Java, Saxon, and Xerces.
Several DITA authoring tools and DITA CMSs integrate the DITA Open Toolkit, or parts of it, into their publishing workflows.
Standalone tools have also been developed to run the DITA-OT via a graphical user interface instead of the command line.
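For reference, a command-line run can be scripted. The sketch below assumes a DITA-OT release that provides the `dita` command with the `--input`, `--format`, and `--output` options, and that the command is on the system path. The map file name, output folder, and helper functions are hypothetical. Nothing is executed unless `publish` is called and `dita` is found.

```python
import shutil
import subprocess

def build_dita_command(map_path, output_format, output_dir):
    """Assemble the argument list for a DITA-OT build."""
    return [
        "dita",
        f"--input={map_path}",
        f"--format={output_format}",
        f"--output={output_dir}",
    ]

def publish(map_path, output_format, output_dir):
    """Run the build if the dita command is installed; return its exit code or None."""
    if shutil.which("dita") is None:
        return None  # DITA-OT is not installed or not on the PATH.
    command = build_dita_command(map_path, output_format, output_dir)
    return subprocess.run(command, check=False).returncode

# Hypothetical file names for illustration.
print(" ".join(build_dita_command("guide.ditamap", "html5", "out")))
```

Passing `pdf` instead of `html5` as the format would request PDF output from the same map.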
Oxygen XML Editor
The Oxygen XML Editor is a multi-platform XML editor, XSLT/XQuery debugger, and profiler with Unicode support. It is a Java application and can run on Windows, macOS, and Linux. It also has a version that can run as an Eclipse plugin.
Oxygen XML includes schemas and DTDs for popular or major XML and Extensible Stylesheet Language (XSL) formats including DocBook (versions 4.0 and 5.0), TEI format, XSL Transformations (versions 1.0, 2.0, and 3.0), DITA, XHTML and HTML 5. The editor supports multiple output formats like HTML, PDF, EPUB, or DITA.
To learn about Oxygen XML Editor pricing, please visit the website.
DITA Benefits
- DITA reduces content duplication and increases the reuse of information through modular writing.
- DITA relies on open standards; free options like the DITA Open Toolkit eliminate licensing costs for proprietary tools.
- The Open Toolkit provides paths to multiple output types by default: PDF/print, HTML, HTML Help, JavaHelp, and Eclipse Help.
- The DITA structure has elements with similar names to corresponding HTML tags, which can reduce the learning curve for those familiar with HTML.
- The Open Toolkit provides attribute-based conditional processing support: you can include or exclude content in output based on the values of attributes. For example, DITA provides a platform attribute. You can set that attribute to “win” for topics about the Windows version of a product; for Macintosh, you can use “mac” as the value. When you create a deliverable for the Windows version, you include all the “win” topics and exclude the “mac” ones (and vice versa for the Macintosh deliverable).
- The growing adoption of the DITA standard means more writers are familiar with it.
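The attribute-based conditional processing described in the list above can be sketched in plain Python. This is a simplified illustration of the idea, not how the Open Toolkit implements filtering: it assumes that an element is dropped when every value in its `platform` attribute is excluded and that elements without the attribute are always kept. The sample topic, which sets the attribute on individual steps rather than on whole topics, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical task with platform-specific and shared steps.
TOPIC = """<task id="install">
  <title>Install the product</title>
  <taskbody>
    <steps>
      <step platform="win"><cmd>Run setup.exe.</cmd></step>
      <step platform="mac"><cmd>Open the disk image.</cmd></step>
      <step><cmd>Accept the license agreement.</cmd></step>
    </steps>
  </taskbody>
</task>"""

def filter_platform(element, excluded):
    """Remove, in place, descendants whose platform values are all excluded."""
    for child in list(element):
        values = set(child.get("platform", "").split())
        if values and values <= excluded:
            element.remove(child)
        else:
            filter_platform(child, excluded)

def build_deliverable(topic_xml, excluded):
    """Return the step commands that remain after filtering."""
    root = ET.fromstring(topic_xml)
    filter_platform(root, set(excluded))
    return [cmd.text for cmd in root.iter("cmd")]

windows_steps = build_deliverable(TOPIC, {"mac"})
mac_steps = build_deliverable(TOPIC, {"win"})

print(windows_steps)
print(mac_steps)
```

Both deliverables come from one source file, so the shared step is written only once.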
DITA Drawbacks
- The Open Toolkit is far from perfect: you should expect to hit a few snags even in implementations with few changes to the default environment.
- Documentation for using DITA and the Open Toolkit is not complete or reliable.
- Modifications you make to the Open Toolkit might not mesh well with later releases of the toolkit; this can be particularly problematic when a new release fixes a problem or adds a new feature you want to use.
- The Open Toolkit provides no support for context-sensitive help by default.
- The default output for HTML and online help is not attractive, and modifying the stylesheets in the Open Toolkit to improve the formatting can be difficult.
- Default PDF output is rudimentary and not entirely reliable. Modifying the stylesheets in the Open Toolkit that control PDF processing is difficult.
Authoring Paradigms
An authoring paradigm presents technical writers with a particular view of the document model.
Unstructured Authoring
Writers use unstructured authoring to create content according to rules and approved styles described in style guides.
A style guide contains a documented approach to how the writing team is supposed to author content, including:
- Voice and tone
- Preferred writing style
- The number of heading levels
- Handling of images in documents
- Punctuation, grammar, and spelling requirements
Editors double-check adherence to the approved style guide. This manual process of ensuring style guide adherence is time-consuming.
Writers use desktop publishing tools for the unstructured authoring of documents. These tools, such as Microsoft Word, allow authoring and publishing from the same system. The tools integrate content and format, and the graphical user interface (GUI) of the tool is almost always What You See is What You Get (WYSIWYG). Desktop publishing tools give writers control over content presentation and delivery.
Structured Authoring
Structured authoring is a publishing paradigm that defines and enforces consistent organization of information.
Structured authoring incorporates:
- Semantic markup: Systematic and consistent labeling allows the reader to identify semantic elements. A semantic element is an element whose name describes, in language that is easy for humans to understand, what the element contains.
- Topic-based authoring: Content is authored in topics, and each topic must make sense in its own right. A topic is authored as a unit, not as part of a larger document. Topic-based authoring enables large-scale content reuse. Writers can assemble topics from a single pool or repository into different deliverable documents. Topics can be used in different documents as long as they make sense when read in different contexts. Topic-based writing also makes DITA projects easier to translate and localize because a topic needs translation only once.
- Constraints: Constraints aim to simplify the authoring process by reducing the complexity of an information type.
- Separation of content and format: Content is kept separate from presentation and delivery. Formatting and presentation are post-authoring considerations.
Even though the desktop publishing paradigm based on unstructured authoring is popular, it has many disadvantages.
For example, when employees are asked to create materials for a single presentation, each piece of the content created originates and resides in different places throughout the organization. Over time, a lot of duplicate content is created and a lot of content becomes obsolete. Content that is scattered throughout the organization is difficult to find and difficult to maintain. Moreover, HR departments must create and maintain training material for the different desktop publishing platforms employees prefer.
This time-consuming, inefficient, and error-prone approach to content management is frustrating for individual employees and costly for organizations.
For these reasons, organizations are finding that structured content is a much more efficient and reliable way to generate, maintain, and publish content.
How DITA Helps with Making the Move to Structured Authoring
Creating a structured authoring environment involves a lot of work. Before a documentation group can begin the implementation, it must analyze content to understand the required structure. This analysis effort is significant, and it comes on top of the time required to implement the structured workflow.
This is where DITA can help with the move to structured authoring. Some documentation groups will find that the DITA structure closely matches their requirements, and they can bypass most of the content modeling effort by adopting the standard.
The free and popular DITA Open Toolkit contains the files you need to implement a structured authoring environment based on the DITA standard. The toolkit includes the files that define the structure and the XSL templates that transform the DITA XML content into output, including HTML, PDF, and three types of online help.
Final Remarks
XML is a markup language that you can use to implement structured authoring.
Especially for large documentation projects, structured authoring with XML offers numerous benefits for documentation departments, such as increased productivity and significant cost savings.
Despite the benefits, moving from an unstructured to a structured authoring environment involves significant investments such as an XML editor, training, expertise (in-house/outsourced), and process implementation. This can become a time-consuming and expensive process.
DITA is an open standard that you can use to implement structured authoring in your organization. The DITA Open Toolkit (DITA-OT) is a vendor-independent, open-source implementation of the DITA standard. Many of the best-known XML editors and enterprise authoring solutions, such as Adobe FrameMaker and Oxygen XML Editor, rely on DITA-OT to publish XML content.
You can use the free DITA Open Toolkit to publish XML content. If required, you can also use commercial DITA implementations for your documentation department.
FAQs
Here, you will find frequently asked questions about XML and DITA.
When is DITA preferred over general-purpose XML?
DITA is specifically designed for technical documentation with a high need for content modularity, reuse, and multichannel publishing. It is preferable in situations requiring a standardized approach to managing complex documentation.
What are the key distinctions between DITA and XML document structures?
While XML is a flexible markup language that allows you to define custom tags, DITA provides a set of predefined tags for creating structured documentation. DITA’s structure is optimized for topic-based authoring, which is absent in general-purpose XML schemas.
What benefits does DITA offer for content reuse compared to XML?
DITA’s topic-based architecture and predefined structures promote content reuse, allowing you to maintain consistency and reduce duplication.