Moving to structured authoring with XML offers many benefits for documentation departments. However, moving from an unstructured authoring environment to a structured authoring environment requires effort and resources, is time-consuming, and can be expensive.
DITA is an XML-based, open-standard architecture for document representation. DITA can help you move to structured authoring.
XML is the acronym for Extensible Markup Language. It is a text-centric markup language derived from Standard Generalized Markup Language (SGML).
XML is used to store structured data, rather than to format or display information on a page. You can use XML to represent structured information for documents, books, data, manuscripts, and more.
A markup language uses tags to define elements within a document.
Markup languages are human-readable because they use ordinary words rather than programming code or syntax. XML and HyperText Markup Language (HTML) are two of the most widely used markup languages.
Tags are markup instructions enclosed in angle brackets, for example, <roots> and <note>.
Tags are examples of semantic markup: they describe the intended purpose or the meaning of the text they enclose.
The text between a start tag and its matching end tag is the actual text of the document.
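Here is a minimal sketch of this distinction using Python's standard `xml.etree.ElementTree` module. The parser reports each tag as an element name and reports the enclosed words as that element's text. The `<root>` and `<note>` elements and the sample sentence are illustrative assumptions, not part of any standard vocabulary.

```python
import xml.etree.ElementTree as ET

# Illustrative document: <root> and <note> are tags; the sentence between
# <note> and </note> is the text content of the note element.
SOURCE = "<root><note>Back up your files before upgrading.</note></root>"

root = ET.fromstring(SOURCE)
note = root.find("note")

print(root.tag)   # element name, without the angle brackets: root
print(note.tag)   # note
print(note.text)  # Back up your files before upgrading.
```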
An XML file is a plain text file with the “.xml” extension.
You can incorporate different types of content into an XML file. For example, you can incorporate rich media content into XML through tags that identify the files in which the rich media content resides.
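As a rough illustration, the DITA-style topic below refers to image files through `<image href="...">` elements instead of embedding the images. A few lines of Python then list the referenced files. The topic content, the file names, and the `image_references` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical DITA-style topic: the images stay in separate files,
# and the XML only records where they are.
TOPIC = """\
<topic id="install">
  <title>Installing the app</title>
  <body>
    <p>The installer opens this screen:</p>
    <image href="images/installer.png"/>
    <p>After installation, the dashboard appears:</p>
    <image href="images/dashboard.png"/>
  </body>
</topic>"""

def image_references(xml_text: str) -> list[str]:
    """Return the href of every <image> element, in document order."""
    root = ET.fromstring(xml_text)
    return [img.get("href") for img in root.iter("image")]

print(image_references(TOPIC))  # ['images/installer.png', 'images/dashboard.png']
```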
XML files are saved in a plain text format. You can use a standard text editor to view XML files.
You can edit XML files with either a simple text editor or specialized XML editors. An XML editor can include tools for validating XML code, including:
XML is an important development for two reasons. First, it allows the flexible development of user-defined document types and provides a robust, non-proprietary, persistent, and verifiable file format for storing and transmitting text and data, both on and off the Web. Second, it removes the more complex options of SGML, which makes it easier to program.
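The "verifiable" part is easy to see in practice: because XML syntax is strict, software can reject a malformed file automatically. The sketch below uses Python's `xml.etree.ElementTree` and checks only well-formedness (properly nested tags, a single root element, and so on). Validating against a DTD or schema requires a validating parser or an XML editor. The `is_well_formed` helper is a hypothetical name.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as well-formed XML, False otherwise.

    This checks well-formedness only; it does not validate the document
    against a DTD or schema.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<note><p>Save your work.</p></note>"))  # True
print(is_well_formed("<note><p>Save your work.</note></p>"))  # False: tags overlap
```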
XML offers the following benefits for technical communicators:
DITA is the acronym for Darwin Information Typing Architecture. It is an XML-based, open-standard architecture for document representation.
The technical publications department at IBM developed the DITA specification. In 2004, IBM donated DITA to the OASIS standards organization. The OASIS DITA Technical Committee now manages DITA.
DITA’s features include:
DITA’s topic-based architecture enables information reuse and makes translation and localization more efficient. DITA defines four types of topics:
Here are some of the most popular DITA XML implementations.
DITA Open Toolkit (DITA-OT) is an open-source publishing engine for content authored in DITA. The toolkit’s extensible plug-in mechanism allows users to add their own transformations and customize the default output, which includes Eclipse Help, HTML5, Microsoft Compiled HTML Help, Markdown, PDF (through XSL-FO), troff, XHTML, and XHTML with a JavaScript frameset.
IBM originally developed DITA-OT. Its distribution packages contain Ant, Apache FOP, Java, Saxon, and Xerces.
Several DITA authoring tools and DITA content management systems (CMSs) integrate the DITA Open Toolkit, or parts of it, into their publishing workflows.
Standalone tools have also been developed to run the DITA-OT via a graphical user interface instead of the command line.
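If you work from the command line instead, DITA-OT 3.x and later is typically run through its `dita` launcher, for example `dita --input=userguide.ditamap --format=html5 --output=out/html5`. The Python sketch below only assembles that command so that the option names are easy to see. The map file name, the output folder, and the `build_dita_command` helper are assumptions. Actually publishing requires DITA-OT to be installed with `dita` on the PATH, and you should check the DITA-OT documentation for the options your version supports.

```python
import shlex

def build_dita_command(input_map: str, fmt: str, output_dir: str) -> list[str]:
    """Assemble (but do not run) a DITA-OT command line.

    `input_map`, `fmt`, and `output_dir` are placeholders; `fmt` is a
    transformation name such as "html5" or "pdf".
    """
    return [
        "dita",
        f"--input={input_map}",
        f"--format={fmt}",
        f"--output={output_dir}",
    ]

cmd = build_dita_command("userguide.ditamap", "html5", "out/html5")
print(shlex.join(cmd))  # dita --input=userguide.ditamap --format=html5 --output=out/html5
# To publish, pass `cmd` to subprocess.run(cmd, check=True) on a machine
# where DITA-OT is installed.
```

Swapping `"html5"` for `"pdf"` would request PDF output from the same source map.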
The Oxygen XML Editor is a multi-platform XML editor, XSLT/XQuery debugger, and profiler with Unicode support. It is a Java application that runs on Windows, Mac OS X, and Linux, and it is also available as an Eclipse plugin.
Oxygen XML includes schemas and DTDs for major XML and Extensible Stylesheet Language (XSL) formats, including DocBook (versions 4.0 and 5.0), TEI, XSL Transformations (versions 1.0, 2.0, and 3.0), DITA, XHTML, and HTML5. The editor supports multiple output formats, such as HTML, PDF, EPUB, and DITA.
To learn about Oxygen XML Editor pricing, please visit the website.
An authoring paradigm presents technical writers with a particular view of the document model.
In unstructured authoring, writers create content by following the rules and approved styles described in a style guide.
A style guide contains a documented approach to how the writing team is supposed to author content, including:
Editors then check that the content adheres to the approved style guide. This manual review is time-consuming.
Writers use desktop publishing tools, such as Microsoft Word, for unstructured authoring. These tools allow authoring and publishing from the same system. They combine content and format, and their graphical user interface (GUI) is almost always What You See Is What You Get (WYSIWYG). Desktop publishing tools give writers control over content presentation and delivery.
Structured authoring is a publishing paradigm that defines and enforces consistent organization of information.
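To make "enforces" concrete, here is a toy structure check in Python. The rule table `REQUIRED_CHILDREN` and the `structure_errors` function are hypothetical. In a real structured authoring environment, these rules live in a DTD or schema, and an XML editor or validating parser enforces them. The rule is deliberately simplified: a real DITA task may also contain optional elements such as `<shortdesc>`, which this check would reject.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified content rule: a <task> must contain exactly a
# <title> followed by a <taskbody>, in that order. Real DITA allows more.
REQUIRED_CHILDREN = {"task": ["title", "taskbody"]}

def structure_errors(xml_text: str) -> list[str]:
    """Return one message for every element whose children break the rule."""
    root = ET.fromstring(xml_text)
    errors = []
    for elem in root.iter():
        expected = REQUIRED_CHILDREN.get(elem.tag)
        if expected is None:
            continue  # no rule defined for this element
        actual = [child.tag for child in elem]
        if actual != expected:
            errors.append(f"<{elem.tag}> has {actual}, expected {expected}")
    return errors

print(structure_errors("<task id='t1'><title>Reset</title><taskbody/></task>"))  # []
print(structure_errors("<task id='t2'><taskbody/></task>"))  # reports the missing <title>
```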
Structured authoring incorporates:
Even though the desktop publishing paradigm based on unstructured authoring is popular, it has many disadvantages.
For example, when several employees create materials for a single presentation, each piece of content originates and resides in a different place in the organization. Over time, much of this content is duplicated, and much of it becomes obsolete. Content scattered throughout the organization is difficult to find and difficult to maintain. Moreover, HR departments must create and maintain training material for the different desktop publishing platforms that employees prefer.
This time-consuming, inefficient, and error-prone approach to content management is frustrating for individual employees and costly for organizations.
For these reasons, organizations are finding that structured content is a much more efficient and reliable way to generate, maintain, and publish content.
Creating a structured authoring environment involves a lot of work. Before a documentation group can begin the implementation, it must analyze its content to understand the required structure. This analysis is a significant effort on its own, even before counting the time required to implement the structured workflow.
This is where DITA can help with the move to structured authoring. Some documentation groups will find that the DITA structure closely matches their requirements and can bypass most of the content modeling effort by adopting the standard.
The free and popular DITA Open Toolkit contains the files you need to implement a structured authoring environment based on the DITA standard. The toolkit includes the files that define the structure and the XSL templates that transform the DITA XML content into output, including HTML, PDF, and three types of online help.
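DITA-OT performs these transformations with XSL templates. As a much smaller stand-in (plain Python with `xml.etree.ElementTree`, not XSLT and not DITA-OT's actual templates), the sketch below maps a DITA concept topic's `<title>` and `<p>` elements to HTML. It shows the basic idea: a template decides how the same structured source becomes a given output format. The sample topic and the `concept_to_html` helper are hypothetical, and the DOCTYPE declaration that real DITA files carry is omitted for brevity.

```python
import xml.etree.ElementTree as ET

DITA_CONCEPT = """\
<concept id="what-is-dita">
  <title>What is DITA?</title>
  <conbody>
    <p>DITA is an XML-based architecture for topic-based authoring.</p>
    <p>Content is written once and published to many formats.</p>
  </conbody>
</concept>"""

def concept_to_html(dita_xml: str) -> str:
    """Toy transform: map a DITA concept's <title> and <p> elements to HTML.

    Real DITA-OT output is produced by XSLT and handles far more elements.
    """
    concept = ET.fromstring(dita_xml)
    html = ET.Element("html")
    body = ET.SubElement(html, "body")
    heading = ET.SubElement(body, "h1")
    heading.text = concept.findtext("title")
    for para in concept.iterfind("conbody/p"):
        ET.SubElement(body, "p").text = para.text
    return ET.tostring(html, encoding="unicode")

print(concept_to_html(DITA_CONCEPT))
```

Because the source carries no formatting, a second function could render the same topic as, say, Markdown without touching the XML.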
XML is a markup language that you can use to implement structured authoring.
Especially for large documentation projects, structured authoring with XML offers numerous benefits for documentation departments, such as increased productivity and significant cost savings.
Despite these benefits, moving from an unstructured to a structured authoring environment requires significant investment in an XML editor, training, expertise (in-house or outsourced), and process implementation. The transition can be time-consuming and expensive.
DITA is an open standard that you can use to implement structured authoring in your organization. The DITA Open Toolkit (DITA-OT) is a vendor-independent, open-source implementation of the DITA standard. Many of the best-known XML editors and enterprise authoring solutions, such as Adobe FrameMaker and Oxygen XML Editor, rely on DITA-OT to publish XML content.
You can use the free DITA Open Toolkit to publish XML content. If required, you can also use other commercial DITA implementations for your documentation department.
Here, you will find frequently asked questions about XML and DITA.
DITA is specifically designed for technical documentation with a high need for content modularity, reuse, and multichannel publishing. It is preferable in situations requiring a standardized approach to managing complex documentation.
While XML is a flexible markup language that lets you define custom tags, DITA provides a set of predefined tags for creating structured documentation. DITA’s structure is optimized for topic-based authoring, which general-purpose XML schemas do not provide.
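The difference is easiest to see side by side. In the sketch below, `<procedure>` and `<action>` are made-up custom tags, while `<task>`, `<title>`, `<taskbody>`, `<steps>`, `<step>`, and `<cmd>` are element names defined by the DITA task topic type. A tool written against the DITA vocabulary (here, the hypothetical `task_steps` helper) can extract the steps from any DITA task, but it finds nothing in the custom format. The sample content is illustrative, and the DOCTYPE declaration is omitted.

```python
import xml.etree.ElementTree as ET

# Custom XML: any tag names work, but only your own tools know their meaning.
CUSTOM_XML = """\
<procedure name="reset-password">
  <action>Open the Settings page.</action>
  <action>Select Reset password.</action>
</procedure>"""

# DITA task topic: element names are predefined by the DITA standard.
DITA_TASK = """\
<task id="reset-password">
  <title>Resetting your password</title>
  <taskbody>
    <steps>
      <step><cmd>Open the Settings page.</cmd></step>
      <step><cmd>Select Reset password.</cmd></step>
    </steps>
  </taskbody>
</task>"""

def task_steps(dita_xml: str) -> list[str]:
    """Return the command text of each step in a DITA task topic."""
    task = ET.fromstring(dita_xml)
    return [cmd.text for cmd in task.iterfind("taskbody/steps/step/cmd")]

print(task_steps(DITA_TASK))   # ['Open the Settings page.', 'Select Reset password.']
print(task_steps(CUSTOM_XML))  # [] because the custom tags mean nothing to this tool
```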
DITA’s topic-based architecture and predefined structures promote content reuse, allowing you to maintain consistency and reduce duplication.
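One DITA reuse mechanism is the content reference (conref): an element such as `<note conref="#shared/backup-warning"/>` is replaced at publish time by the element it points to, so the warning text is written once. The Python sketch below resolves only same-document references of the form `#topicid/elementid`. It is a toy, not how DITA-OT implements conref, and it ignores cross-file references, keys, and attribute-merging rules. The topic ids, element ids, and the `resolve_conrefs` helper are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

# A DITA-style document (root <dita>) with one topic that holds a reusable
# warning and another topic that pulls it in by reference.
SOURCE = """\
<dita>
  <topic id="shared">
    <title>Shared content</title>
    <body>
      <note id="backup-warning" type="warning">Back up your data first.</note>
    </body>
  </topic>
  <topic id="upgrade">
    <title>Upgrading</title>
    <body>
      <note conref="#shared/backup-warning"/>
      <p>Run the installer.</p>
    </body>
  </topic>
</dita>"""

def resolve_conrefs(root: ET.Element) -> None:
    """Replace same-document conrefs ("#topicid/elementid") with copies of their targets."""
    # Index reusable elements by (topic id, element id).
    targets = {}
    for topic in root.iter("topic"):
        for elem in topic.iter():
            if elem is not topic and "id" in elem.attrib:
                targets[(topic.get("id"), elem.get("id"))] = elem

    # Collect references first so the tree is not modified while iterating it.
    pending = [
        (parent, index, child)
        for parent in root.iter()
        for index, child in enumerate(parent)
        if child.get("conref", "").startswith("#")
    ]
    for parent, index, child in pending:
        topic_id, elem_id = child.get("conref")[1:].split("/", 1)
        replacement = copy.deepcopy(targets[(topic_id, elem_id)])
        replacement.attrib.pop("id", None)  # keep ids unique in the output
        replacement.tail = child.tail       # preserve surrounding whitespace
        parent[index] = replacement

root = ET.fromstring(SOURCE)
resolve_conrefs(root)
note = root.find("topic[@id='upgrade']/body/note")
print(note.text)  # Back up your data first.
```

Editing the shared note once would update every topic that references it on the next publish.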
If you are new to technical writing and are looking to break into the industry, we recommend taking our XML Writing Certification Course, where you will learn the fundamentals of XML writing and managing documentation.