XML vs DITA: What’s the Difference?

XML is a text-centric markup language derived from Standard Generalized Markup Language (SGML).

Making the move to structured authoring with XML offers many benefits for documentation departments. However, moving from an unstructured authoring environment to a structured authoring environment requires effort and resources, is time-consuming, and can be expensive.

DITA is an XML-based open-standard architecture for document representation. DITA can help with making the move to structured authoring.

What is XML?

XML is the acronym for Extensible Markup Language. It is a text-centric markup language derived from Standard Generalized Markup Language (SGML).

XML is used to store structured data, rather than to format or display information on a page. You can use XML to represent structured information for documents, books, data, manuscripts, and more. 

What is a Markup Language?

A markup language uses tags to define elements within a document.

Markup languages are human-readable because their tags use ordinary words rather than programming syntax. Two of the most widely used markup languages are XML and HyperText Markup Language (HTML).

What are Tags?

Tags are markup instructions enclosed in angle brackets, for example <roots> and <note>.

Tags are examples of semantic markup: they describe the intended purpose or the meaning of the text they enclose.

The text between these instructions is the actual text of the document.
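To make the idea of tags concrete, here is a minimal sketch that parses a small XML document with Python's standard library and lists each tag alongside the text it encloses. The sample document and its tag names are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the tag names are illustrative only.
SAMPLE = """<note>
  <to>Reader</to>
  <heading>Reminder</heading>
  <body>Review the draft by Friday.</body>
</note>"""

def describe(xml_text):
    """Return (tag, text) pairs for each child of the root element."""
    root = ET.fromstring(xml_text)
    return [(child.tag, child.text) for child in root]

for tag, text in describe(SAMPLE):
    print(f"<{tag}> encloses: {text}")
```

Each tag name says what the enclosed text is (a recipient, a heading, a body), not how it should look on the page.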

What is an XML file?

An XML file is a plain text file with the “.xml” extension.

You can incorporate different types of content into an XML file. For example, you can incorporate rich media content into XML through tags that identify the files in which the rich media content resides.

How Can You Open and Read XML files?

XML files are saved in a plain text format. You can use a standard text editor to view XML files.

How Can You Edit XML Files?

You can edit XML files with either a simple text editor or specialized XML editors. An XML editor can include tools for validating XML code, including:

  • Parsing XML code and displaying XML
  • Flagging text not enclosed within a tag, known as orphaned text
  • Identifying improper tags
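The checks listed above can be approximated in a few lines. The following is a rough sketch, not how any particular XML editor works. It assumes that "orphaned text" means text sitting directly inside a container element next to its child elements, and the sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET

def check_well_formed(xml_text):
    """Return None if the text parses, otherwise the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None

def find_orphaned_text(xml_text):
    """List tags of elements that hold both child elements and loose text.

    Assumption: 'orphaned text' is text placed directly inside a container
    element, beside its children, instead of inside its own tag.
    """
    root = ET.fromstring(xml_text)
    flagged = []
    for elem in root.iter():
        if len(elem) == 0:
            continue  # leaf elements are expected to hold text
        loose = [elem.text] + [child.tail for child in elem]
        if any(piece and piece.strip() for piece in loose):
            flagged.append(elem.tag)
    return flagged

# Hypothetical inputs for illustration.
GOOD = "<steps><step>Open the file.</step></steps>"
BAD_NESTING = "<steps><step>Open the file.</steps></step>"
ORPHANED = "<steps>Stray words<step>Open the file.</step></steps>"

print(check_well_formed(GOOD))         # no error message
print(check_well_formed(BAD_NESTING))  # parser error for the improper tags
print(find_orphaned_text(ORPHANED))    # the container holding loose text
```

A real editor would consult a DTD or schema to know which elements are allowed to mix text and child elements; this sketch flags all of them.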

Why is XML an Important Development?

XML is an important development for two reasons:

  1. XML overcomes the inflexibility of and dependence on HTML, a single document type that was being used for tasks it was not designed for.
  2. XML overcomes the complexity of SGML, whose syntax allows several powerful but hard-to-program options.

XML allows the flexible development of user-defined document types. It provides a robust, non-proprietary, persistent, and verifiable file format for the storage and transmission of text and data both on and off the Web; and it removes the more complex options of SGML, making it easier to program.

XML Benefits

XML offers the following benefits for technical communicators:

  • Separation of data and presentation: XML does not display data; it only carries it. This allows users to store data independently of how it will be presented.
  • Extensible: XML is an extensible markup language that enables programmers to create custom tags for their applications and to describe those tags and their permitted uses.
  • Open standard: XML is available as an open standard.
  • Redundancy: XML markup is verbose. For example, every end tag must be supplied, which enables computer programs to catch common errors such as incorrect nesting.
  • Self-describing: The readability of XML and the presence of element and attribute names in XML means that writers looking at an XML document often find it easy to understand the format. This also makes it easy to find mistakes.
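As a small illustration of storing data independently of its presentation, the sketch below renders one XML record in two different ways. The element names and both output formats are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical record; the XML says what the data is, not how it looks.
DATA = """<product>
  <name>Widget</name>
  <price currency="USD">9.99</price>
</product>"""

def to_plain_text(root):
    """Render the record as a single line of plain text."""
    price = root.find("price")
    return f"{root.findtext('name')}: {price.text} {price.get('currency')}"

def to_html(root):
    """Render the same record as an HTML paragraph."""
    price = root.find("price")
    return (f"<p><strong>{escape(root.findtext('name'))}</strong> "
            f"costs {escape(price.text)} {escape(price.get('currency'))}</p>")

root = ET.fromstring(DATA)
print(to_plain_text(root))
print(to_html(root))
```

The source record never changes; only the rendering function does.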

What is DITA?

DITA is the acronym for Darwin Information Typing Architecture. It is:

  • An XML standard,
  • An architectural approach, and
  • A writing methodology

DITA History

The technical publications department at IBM developed DITA. In 2004, IBM donated DITA to the OASIS standards organization. The OASIS DITA Technical Committee now manages DITA.

DITA Features

DITA’s features include:

  • Modularity: With modular document development, topics can be re-used and any group of topics or elements can be treated as a modular document component. Instead of being created as one document, a large manual can be designed as a collection of different modules, and those modules can be arranged into different configurations to create different manuals. Modular manuals are easier to maintain and can be produced with efficiency.
  • Structured authoring: Defines and enforces consistent organization of information, which reduces authoring time, although it requires more up-front content analysis.
  • Information typing: Information is structured by topics with content models appropriate to the nature of the content. The three basic DITA information types are concept, task, and reference.
  • Minimalism: an approach that presents the reader with the smallest amount of information necessary to achieve the reader’s goals. The needs of the reader (or the learner), and not the system being documented, guide the information architecture and the writing style.
  • Inheritance: Enables specialization of information types. The three base information types (concept, task, and reference) derive from the generic topic type and inherit the characteristics of its shared base structure.
  • Topic-based: A topic-based architecture allows information reuse, and also makes translation and localization more efficient. DITA defines four types of topics:
    • Topic: Provides a generic structure for information
    • Concept: Contains background information and examples
    • Task: Includes procedures
    • Reference: Describes commands, parameters, and other features
  • Maps: A special DITA file called a map, or ditamap, specifies the topics included in a deliverable document. The ditamap does not store content; it contains pointers to the topics that contain content.
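The relationship between a map and its topics can be sketched as follows. The file names and topic text are hypothetical, the DOCTYPE declarations that real DITA files carry are omitted for brevity, and an in-memory dictionary stands in for files on disk.

```python
import xml.etree.ElementTree as ET

# Hypothetical ditamap: it holds pointers (href) to topics, not content.
DITAMAP = """<map>
  <title>User Guide</title>
  <topicref href="overview.dita"/>
  <topicref href="install.dita"/>
  <topicref href="commands.dita"/>
</map>"""

# Hypothetical topics, one per information type; a dict stands in for files.
TOPICS = {
    "overview.dita": "<concept id='overview'><title>Overview</title>"
                     "<conbody><p>What the product does.</p></conbody></concept>",
    "install.dita": "<task id='install'><title>Install</title><taskbody>"
                    "<steps><step><cmd>Run the installer.</cmd></step></steps>"
                    "</taskbody></task>",
    "commands.dita": "<reference id='commands'><title>Commands</title>"
                     "<refbody><section><p>start, stop</p></section></refbody>"
                     "</reference>",
}

def resolve_map(map_text, topics):
    """Follow each topicref and return (href, information type, title)."""
    resolved = []
    for ref in ET.fromstring(map_text).iter("topicref"):
        href = ref.get("href")
        topic = ET.fromstring(topics[href])
        resolved.append((href, topic.tag, topic.findtext("title")))
    return resolved

for href, info_type, title in resolve_map(DITAMAP, TOPICS):
    print(f"{href}: {info_type} topic titled '{title}'")
```

The root element of each topic names its information type, while the map only decides which topics appear and in what order.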

DITA Implementations

DITA Open Toolkit

DITA Open Toolkit (DITA-OT) is an open-source publishing engine for content authored in DITA. The toolkit’s extensible plug-in mechanism allows users to add their own transformations and customize the default output formats, which include Eclipse Help, HTML5, Microsoft Compiled HTML Help, Markdown, PDF (through XSL-FO), troff, XHTML, and XHTML with a JavaScript frameset.

DITA-OT was originally developed by IBM. Its distribution packages contain Ant, Apache FOP, Java, Saxon, and Xerces.

Several DITA authoring tools and DITA CMSs integrate the DITA Open Toolkit, or parts of it, into their publishing workflows.

Standalone tools have also been developed to run the DITA-OT via a graphical user interface instead of the command line.
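For readers curious about the command line mentioned above, here is a minimal sketch that assembles a DITA-OT invocation. It assumes the toolkit's `dita` command with its `-i` (input), `-f` (format), and `-o` (output directory) options; the map name and output path are hypothetical, and option names are worth checking against the documentation for your installed release.

```python
def build_dita_command(map_path, output_format, output_dir):
    """Assemble the argument list for a DITA-OT build (assumed option names)."""
    return ["dita", "-i", map_path, "-f", output_format, "-o", output_dir]

# Hypothetical map file and output folder.
cmd = build_dita_command("guide.ditamap", "html5", "out/html5")
print(" ".join(cmd))

# To run the build for real (requires DITA-OT installed and on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

The example only prints the command, so it is safe to try without the toolkit installed.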

Oxygen XML Editor

The Oxygen XML Editor is a multi-platform XML editor, XSLT/XQuery debugger, and profiler with Unicode support. It is a Java application that runs on Windows, macOS, and Linux. A version is also available as an Eclipse plugin.

Oxygen XML includes schemas and DTDs for popular or major XML and Extensible Stylesheet Language (XSL) formats including DocBook (versions 4.0 and 5.0), TEI format, XSL Transformations (versions 1.0, 2.0 and 3.0), DITA, XHTML and HTML 5.

To learn about Oxygen XML Editor pricing, please visit the website.

DITA Benefits

  • DITA reduces content duplication and increases the reuse of information through modular writing.
  • DITA relies on open standards; using free options like the DITA Open Toolkit eliminates licensing costs for proprietary tools.
  • The Open Toolkit provides paths to multiple output types by default: PDF/print, HTML, HTML Help, JavaHelp, and Eclipse Help.
  • The DITA structure has elements with similar names to corresponding HTML tags, which can reduce the learning curve for those familiar with HTML.
  • The Open Toolkit supports attribute-based conditional processing: you can include or exclude content in output based on the values of attributes. For example, DITA provides a platform attribute. You can set that attribute to “win” for topics about the Windows version of a product; for Macintosh, you can use “mac” as the value. When you create a deliverable for the Windows version, you include all the “win” topics and exclude the “mac” ones (and vice versa for the Macintosh deliverable).
  • Growing adoption of the DITA standard means more writers are familiar with it.
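The conditional processing described above can be illustrated with a short sketch. This is not the Open Toolkit's own code (the toolkit drives filtering from a DITAVAL file); it only shows the idea of dropping elements whose platform attribute does not match the deliverable. The sample steps are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical content: two platform-specific steps and one shared step.
SOURCE = """<steps>
  <step platform="win"><cmd>Open the Start menu.</cmd></step>
  <step platform="mac"><cmd>Open Launchpad.</cmd></step>
  <step><cmd>Start the application.</cmd></step>
</steps>"""

def filter_platform(xml_text, keep):
    """Remove elements whose platform attribute does not include `keep`."""
    root = ET.fromstring(xml_text)
    for parent in list(root.iter()):
        for child in list(parent):
            platform = child.get("platform")
            # Elements without the attribute apply to every platform.
            if platform is not None and keep not in platform.split():
                parent.remove(child)
    return root

def commands(root):
    """Collect the text of every remaining <cmd> element."""
    return [cmd.text for cmd in root.iter("cmd")]

print(commands(filter_platform(SOURCE, "win")))
print(commands(filter_platform(SOURCE, "mac")))
```

Content without a platform value stays in every deliverable, which is why the shared step appears in both outputs.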

DITA Drawbacks

  • The Open Toolkit is far from perfect: you should expect to hit a few snags even in implementations with few changes to the default environment.
  • Documentation for using DITA and the Open Toolkit is not complete or reliable.
  • Modifications you make to the Open Toolkit might not mesh well with later releases of the toolkit; this can be particularly problematic when a new release fixes a problem or adds a new feature you want to use.
  • The Open Toolkit provides no support for context-sensitive help by default.
  • The default output for HTML and online help is not attractive, and modifying the stylesheets in the Open Toolkit to improve the formatting can be difficult.
  • Default PDF output is rudimentary and not entirely reliable. Modifying the stylesheets in the Open Toolkit that control PDF processing is difficult.

Authoring Paradigms

An authoring paradigm presents technical writers with a particular view of the document model.

Unstructured Authoring

Writers use unstructured authoring to create content according to rules and approved styles described in style guides.

A style guide contains a documented approach to how the writing team is supposed to author content, including:

  • Voice and tone
  • Preferred writing style
  • The number of heading levels
  • Handling of images in documents
  • Punctuation, grammar, and spelling requirements

Adherence to the approved style guide is double-checked by editors. This manual process of ensuring style guide adherence is time-consuming.

Writers use desktop publishing tools for the unstructured authoring of documents. These tools, such as Microsoft Word, allow authoring and publishing from the same system. The tools integrate content and format, and the graphical user interface (GUI) of the tool is almost always What You See is What You Get (WYSIWYG). Desktop publishing tools give writers control over content presentation and delivery.

Structured Authoring

Structured authoring is a publishing paradigm that defines and enforces consistent organization of information.

Structured authoring incorporates:

  • Systematic labeling: Systematic and consistent labeling allows the identification of semantic elements by the reader. A semantic element is an element of code that uses words to represent what that element contains, in language that is easy for humans to understand.
  • Modular, topic-based architecture: Content is authored in topics, and each topic must make sense in its own right. A topic is authored as a unit, not as part of a larger document. Topic-based authoring enables large-scale content reuse. Writers can assemble topics from a single pool or repository into different deliverable documents. Topics can be used in different documents as long as they make sense when read in different contexts. Topic-based writing makes translation and localization more efficient because a topic needs translation only once.
  • Constrained writing environments: Constraints aim to simplify the authoring process by reducing the complexity of an information type.
  • Separation of content and form: Refers to the separation of content from presentation and delivery. Formatting and presentation are post-authoring considerations.
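To show how topic-based reuse works in practice, the sketch below assembles two deliverables from a single pool of topics and counts how often each topic is reused. The topic identifiers, topic text, and deliverable names are all hypothetical.

```python
# Hypothetical pool of standalone topics, keyed by identifier.
TOPIC_POOL = {
    "overview": "What the product does.",
    "install": "How to install the product.",
    "configure": "How to configure the product.",
    "troubleshoot": "How to fix common problems.",
}

# Each deliverable is just an ordered list of topic identifiers.
DELIVERABLES = {
    "Quick Start": ["overview", "install"],
    "Admin Guide": ["overview", "install", "configure", "troubleshoot"],
}

def assemble(title, topic_ids, pool):
    """Build a deliverable by pulling topics from the shared pool."""
    lines = [title]
    lines += [f"- {tid}: {pool[tid]}" for tid in topic_ids]
    return "\n".join(lines)

def reuse_counts(deliverables):
    """Count how many deliverables include each topic."""
    counts = {}
    for topic_ids in deliverables.values():
        for tid in topic_ids:
            counts[tid] = counts.get(tid, 0) + 1
    return counts

for title, topic_ids in DELIVERABLES.items():
    print(assemble(title, topic_ids, TOPIC_POOL))
print(reuse_counts(DELIVERABLES))
```

A topic that appears in both deliverables is written, reviewed, and translated once, which is where the efficiency gain comes from.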

Even though the desktop publishing paradigm based on unstructured authoring is popular, it has many disadvantages.

For example, when employees are asked to create materials for a single presentation, each piece of the created content originates and resides in a different place in the organization. Over time, a lot of duplicate content is created and a lot of content becomes obsolete. Content that is scattered throughout the organization is difficult to find and difficult to maintain. Moreover, HR departments have to create and maintain training material for the different desktop publishing platforms preferred by employees.

This time-consuming, inefficient, and error-prone approach to content management is frustrating for individual employees and costly for organizations.

For these reasons, organizations are finding that structured content is a much more efficient and reliable way to generate, maintain and publish content.

How DITA Helps With Making the Move to Structured Authoring

Creating a structured authoring environment involves a lot of work. Before a documentation group can begin the implementation, it must analyze its content to understand the required structure. This analysis effort is significant, and it comes on top of the time required to implement the structured workflow.

This is where the use of DITA can help with making the move to structured authoring. Some documentation groups will find that the DITA structure is a close match for their requirements, and they can bypass most of the content modeling effort by adopting the standard.

The free and popular DITA Open Toolkit contains the files you need to implement a structured authoring environment based on the DITA standard. The toolkit includes the files that define the structure and the XSL templates that transform the XML content into output including HTML, PDF, and three types of online help.

Conclusion

XML is a markup language that you can use to implement structured authoring.

Especially for large documentation projects, structured authoring with XML offers numerous benefits for documentation departments such as increased productivity and significant cost savings.

Despite the benefits, making the move from an unstructured authoring environment to a structured authoring environment requires significant investment in an XML editor, training, expertise (in-house or outsourced), and process implementation. This can become a time-consuming and expensive process.

DITA is an open standard that you can use to implement structured authoring in your organization. The DITA Open Toolkit (DITA-OT) is a vendor-independent, open-source implementation of the DITA standard. Many of the best-known XML editors and enterprise authoring solutions, such as Adobe FrameMaker and Oxygen XML Editor, rely on DITA-OT to publish XML content.

You can use the free DITA Open Toolkit to publish XML content. If required, you can also use commercial DITA implementations for your documentation department.

 

Josh Fechter
Josh is the founder of Technical Writer HQ and Squibler, a writing software. He had his first job in technical writing for a video editing software company in 2014. Since then, he has written several books on software documentation, personal branding, and computer hacking.