XML authoring refers to the use of XML for authoring technical documentation, books, and other types of technical publications and documents.
XML authoring differs from today's dominant authoring paradigm, which is desktop publishing. Documentation departments can increase productivity and realize significant cost savings by switching to structured authoring with XML.
What is XML?
XML is the acronym for Extensible Markup Language. It is a text-centric markup language derived from Standard Generalized Markup Language (SGML).
XML is used to store structured data, rather than to format or display information on a page. You can use XML to represent structured information for documents, books, data, manuscripts, and more.
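As a minimal sketch of what "storing structured data" looks like in practice, the following Python snippet parses a small XML document with the standard library's `xml.etree.ElementTree` module and reads values back by structure. The element and attribute names (`manual`, `chapter`, `para`, `product`) are invented for illustration and do not come from any particular standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the tags describe what each piece of content is,
# not how it should look on a page.
SOURCE = """
<manual product="Widget 2000">
  <title>Installation Guide</title>
  <chapter id="unpack">
    <title>Unpacking</title>
    <para>Remove the unit from the box.</para>
  </chapter>
  <chapter id="power">
    <title>Connecting Power</title>
    <para>Plug the cable into the rear socket.</para>
  </chapter>
</manual>
"""

root = ET.fromstring(SOURCE)

# Read the data back by structure: an attribute, a child element,
# and the titles of all chapter elements.
product = root.get("product")
manual_title = root.findtext("title")
chapter_titles = [chapter.findtext("title") for chapter in root.findall("chapter")]

print(product)
print(manual_title)
print(chapter_titles)
```

Nothing in the source says how the titles or paragraphs should be displayed; that decision is left to whatever program consumes the data.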
XML is an important development for the following two reasons:
- XML overcomes the inflexibility of and dependence on HTML, a single document type that was being used for tasks it was not designed for.
- XML overcomes the complexity of SGML, whose syntax allows several powerful but hard-to-program options.
XML allows the flexible development of user-defined document types. It provides a robust, non-proprietary, persistent, and verifiable file format for the storage and transmission of text and data both on and off the Web; and it removes the more complex options of SGML, making it easier to program.
XML offers the following benefits for technical communicators:
- XML does not participate in displaying data; it only carries the data. In essence, it allows users to store data independently of how it will be presented.
- XML is an extensible markup language: it enables authors and programmers to create custom tags for their applications and to describe the tags and their permitted uses.
- XML is available as an open standard.
- XML markup is verbose. For example, every end tag must be supplied, which enables computer programs to catch common errors such as incorrect nesting.
- Because XML is readable and element and attribute names appear in the markup itself, writers looking at an XML document often find the format easy to understand. This also makes it easy to find mistakes.
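The point about end tags and nesting can be illustrated with a short sketch. It assumes Python's standard `xml.etree.ElementTree` parser, which raises `ParseError` when a document is not well-formed. The sample markup and the helper name `check_well_formed` are hypothetical.

```python
import xml.etree.ElementTree as ET

def check_well_formed(text):
    """Return None if the text parses as XML, otherwise the parser's message."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as err:
        return str(err)

# Correctly nested markup.
good = "<step><cmd>Press <b>Start</b>.</cmd></step>"
# The <b> element is closed after its parent <cmd>: incorrect nesting.
bad_nesting = "<step><cmd>Press <b>Start</cmd></b>.</step>"
# The <cmd> element is never closed.
missing_end = "<step><cmd>Press Start.</step>"

for sample in (good, bad_nesting, missing_end):
    problem = check_well_formed(sample)
    print("OK" if problem is None else f"Error: {problem}")
```

The parser rejects both faulty samples before any downstream tool sees them, which is the kind of early error detection that the verbose syntax makes possible.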
An authoring paradigm presents technical writers with a particular view of the document model.
Writers use unstructured authoring to create content according to rules and approved styles described in style guides.
A style guide contains a documented approach on how the writing team is supposed to author content, including:
- Voice and tone
- Preferred writing style
- Number of heading levels
- Handling of images in documents
- Punctuation, grammar, and spelling requirements
Adherence to the approved style guide is double-checked by editors. This manual process of ensuring style guide adherence is time-consuming.
Writers use desktop publishing tools for the unstructured authoring of documents. These tools, such as Microsoft Word, allow authoring and publishing from the same system. The tools integrate content and format, and the graphical user interface (GUI) of the tool is almost always What You See Is What You Get (WYSIWYG). Desktop publishing tools give writers control over content presentation and delivery.
Structured authoring is a publishing paradigm that defines and enforces consistent organization of information.
Structured authoring incorporates the following:
- Systematic labeling: Systematic and consistent labeling allows easy identification of semantic elements by the readers. A semantic element is an element of code that uses words to represent what that element contains, in language that is easy for humans to understand.
- Modular, topic-based architecture: Content is authored in topics, and each topic must make sense in its own right. A topic is authored as a unit, not as part of a larger document. Topic-based authoring enables large-scale content re-use. Writers can assemble topics from a single pool or repository into different deliverable documents. Topics can be used in different documents as long as they make sense when read in different contexts. Topic-based writing makes translation and localization more efficient because a topic needs translation only once.
- Constrained writing environments: Constraints aim to simplify the authoring process by reducing the complexity of an information type.
- Separation of content and form: Structured authoring separates content from presentation and delivery. Formatting and presentation are post-authoring considerations.
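The separation of content and form described above can be sketched as follows: one topic is authored once, and two independent functions decide how it is presented. This is an illustrative sketch only. The `topic`, `title`, and `step` element names and both rendering functions are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical topic: content only, no formatting instructions.
TOPIC = """
<topic id="reset">
  <title>Resetting the device</title>
  <step>Hold the power button for 10 seconds.</step>
  <step>Release the button when the light blinks.</step>
</topic>
"""

def to_plain_text(topic):
    """Render the topic as plain text with an upper-case title and numbered steps."""
    lines = [topic.findtext("title").upper()]
    for number, step in enumerate(topic.findall("step"), start=1):
        lines.append(f"{number}. {step.text}")
    return "\n".join(lines)

def to_html(topic):
    """Render the same topic as an HTML heading followed by an ordered list."""
    items = "".join(f"<li>{escape(step.text)}</li>" for step in topic.findall("step"))
    return f"<h2>{escape(topic.findtext('title'))}</h2><ol>{items}</ol>"

topic = ET.fromstring(TOPIC)
print(to_plain_text(topic))
print(to_html(topic))
```

Changing the look of either output means editing a rendering function, not the topic, so formatting stays a post-authoring concern.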
XML and Structured Authoring
Structured authoring is a concept or paradigm.
XML is a technology that you can use to implement structured authoring. For XML, the structure and the legal elements and attributes of a document are defined in a Document Type Definition (DTD) or a comparable schema.
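To make the idea of "legal elements and attributes" concrete, here is a simplified sketch. Python's standard library parser does not validate documents against a DTD, so this example substitutes a hand-written rule table for a real DTD. The element names and the `find_violations` helper are hypothetical, and the check is deliberately looser than true DTD validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules, loosely mirroring what a DTD might declare:
#   <!ELEMENT procedure (title, step+)>
#   <!ATTLIST procedure id ID #REQUIRED>
# This table only records permitted children and required attributes;
# a real DTD validator would also enforce order and cardinality.
ALLOWED_CHILDREN = {
    "procedure": {"title", "step"},
    "title": set(),
    "step": set(),
}
REQUIRED_ATTRIBUTES = {"procedure": {"id"}}

def find_violations(element):
    """Walk the tree and collect rule violations as readable strings."""
    violations = []
    if element.tag not in ALLOWED_CHILDREN:
        violations.append(f"unknown element <{element.tag}>")
        return violations
    for name in sorted(REQUIRED_ATTRIBUTES.get(element.tag, set())):
        if name not in element.attrib:
            violations.append(f"<{element.tag}> is missing attribute '{name}'")
    for child in element:
        if child.tag not in ALLOWED_CHILDREN[element.tag]:
            violations.append(f"<{child.tag}> is not allowed inside <{element.tag}>")
        else:
            violations.extend(find_violations(child))
    return violations

valid = ET.fromstring(
    '<procedure id="p1"><title>Reset</title><step>Hold power.</step></procedure>'
)
invalid = ET.fromstring(
    '<procedure><title>Reset</title><note>Be careful.</note></procedure>'
)

print(find_violations(valid))
print(find_violations(invalid))
```

In a production workflow, a validating parser or the XML editor itself would perform this check against the actual DTD, so writers would not maintain a table like this by hand.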
Today, the terms "structured authoring" and "XML" are often used interchangeably.
XML Authoring Benefits
XML authoring offers multiple benefits for technical writers and communicators:
- Content reuse: The main benefit of using XML is content reuse. Reducing information handling saves time and effort. Documentation departments can realize time and cost savings if authoring, editing, and updating are done once for many documents. Every time you do not have to touch a document to update it is time saved, which converts to savings for the company. During the documentation design phase, documentation teams can determine the content that can be reused. For example, they can determine whether the same product description can be used for marketing brochures and engineering manuals, or whether a certain installation procedure can be used for both the production and service manuals. Similar content is often rewritten for different departments within the same company. An additional benefit of content reuse is a consistent company message. Especially in regulated industries, this consistency makes the audit trail smoother.
- Quick Updates: Every time technical writers need to update content, they only have to update the relevant content, and all documents that use the content will get updated. This is far more efficient compared to unstructured authoring where each individual document requires separate updating.
- Quick Edits: Similar to updating, editing for documents created with structured editing technologies, such as XML, is much more efficient compared to documents created with unstructured authoring tools.
- Translation: Structured authoring with XML provides significant cost savings for translation and localization services. This is because all content that is reused for multiple manuals and documents is only translated once. Content is not only common to documentation for a single product but is often common to documentation for similar products and for different product models from the same product line. The translation savings are in direct proportion to the amount of content that is common among different documents. The translation savings are also in direct proportion to the number of languages documentation is translated into.
- Easier Search: In an unstructured document, a specific piece of content, such as a passage of text, a table, or a graphic, is hard to find. Tagging in XML gives context to content and makes it much easier for users to find.
- Universal format: If you acquire another company, as long as their content (and your content) both use an XML format, you can integrate their documentation into your own, with your own branding and formatting. This is an example of how XML allows easier exchange and reuse of content.
- Predictability: Structured content such as XML follows a predictable format that enables easier authoring and reduces the number of decisions technical writers must make as they author content. When readers can predict the format, it also aids comprehension.
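The reuse and quick-update benefits listed above can be sketched with a shared topic pool. The following example is illustrative only. The pool, the deliverable names, and the helper functions are assumptions, and a real system would keep topics in files or a content management system rather than in a Python dictionary.

```python
import xml.etree.ElementTree as ET

# Hypothetical topic pool: each topic is stored once, keyed by id.
TOPIC_POOL = {
    "overview": "<topic id='overview'><title>Product overview</title>"
                "<p>The Widget 2000 filters water.</p></topic>",
    "install": "<topic id='install'><title>Installation</title>"
               "<p>Attach the hose.</p></topic>",
    "service": "<topic id='service'><title>Servicing</title>"
               "<p>Replace the filter yearly.</p></topic>",
}

# Each deliverable is an ordered list of topic ids drawn from the pool.
DELIVERABLES = {
    "user-manual": ["overview", "install"],
    "service-manual": ["overview", "install", "service"],
}

def build(deliverable_name):
    """Assemble one deliverable as an XML tree from the shared pool."""
    book = ET.Element("book", {"name": deliverable_name})
    for topic_id in DELIVERABLES[deliverable_name]:
        book.append(ET.fromstring(TOPIC_POOL[topic_id]))
    return book

def count_reused_topics():
    """Count topics that appear in more than one deliverable."""
    usage = {}
    for topic_ids in DELIVERABLES.values():
        for topic_id in topic_ids:
            usage[topic_id] = usage.get(topic_id, 0) + 1
    return sum(1 for uses in usage.values() if uses > 1)

# Update the installation topic once; every deliverable built afterwards picks it up.
TOPIC_POOL["install"] = (
    "<topic id='install'><title>Installation</title>"
    "<p>Attach the hose and tighten the clamp.</p></topic>"
)

for name in DELIVERABLES:
    book = build(name)
    # Tags and attributes give the search context: find the install topic's paragraph.
    install_text = book.find("topic[@id='install']/p").text
    print(name, "->", install_text)
print("topics reused:", count_reused_topics())
```

Each reused topic is written, edited, and translated once, which is where the savings described in the list come from.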
Best XML Authoring Tools
An XML authoring tool or structured authoring tool is a text editor that you can use with a markup language such as XML to “tag” content based on a predefined structure or set of rules laid down in a DTD.
Oxygen XML Editor
Oxygen XML Editor is an XML authoring tool that you can use to create and edit XML files and documents. It is available on multiple platforms and is well regarded among developers as an advanced solution for technical authoring and development. In addition to an advanced set of editing tools, it bundles numerous other helpful utilities.
The Oxygen product family includes WebHelp, XML Author, XML Editor, XML Web Author, and XML Developer. It covers work ranging from simple authoring to development and editing, which makes it suitable for all types of technical communication projects.
Notepad++
Notepad++ is a free text editor with a ready plugin for editing XML files. It helps users copy, paste, and highlight text in XML files. The tool also enables users to work on multiple files at the same time. Notepad++ is written in C++, is built on the Scintilla editing component, and is distributed under the GPL license. It supports code formatting, code folding, syntax highlighting, and auto-completion for scripting, programming, and markup languages. In addition, Notepad++ uses color coding to differentiate content from code in an XML file.
One of the drawbacks of Notepad++ is that, on its own, it does not support XML syntax checking or XML-aware code completion. For XML-specific features, you can add the XML Tools plugin to Notepad++.
Notepad++ helps users define macros for applying bulk actions to multiple XML files. The tool also supports a ‘Pretty Print’ layout for structuring and organizing XML files.
XML Notepad
XML Notepad is an open-source editor for XML. It has a user-friendly interface for browsing and editing XML documents.
Some of the features supported by XML Notepad are:
- Unlimited cut/copy/paste
- Incremental search system in text and tree views
- Tree and Node Text views, synchronized
- Quick editing
- Drag and drop
In addition, the tool supports configurable fonts and colors, an integrated XML diff tool, and custom editors for values such as dates and times. It is one of the best tools for large XML documents. XML Notepad also provides users with XSD schema information and support for XInclude.
XML Notepad’s toolbar buttons make it convenient to move nodes around the tree. It suits developers and technical writers alike because it offers IntelliSense-style suggestions for expected elements and values.
Even though the desktop publishing paradigm based on unstructured authoring is popular, it has many disadvantages.
For example, when employees are asked to create materials for a single presentation, each piece of the created content originates and resides in a different place in the organization. Over time, a lot of duplicate content is created and a lot of content becomes obsolete. Content that is scattered throughout the organization is difficult to find and difficult to maintain. Moreover, HR departments have to create and maintain training material for the different desktop publishing platforms preferred by employees.
This time-consuming, inefficient, and error-prone approach to content management is frustrating for individual employees and costly for organizations.
For these reasons, organizations are finding that structured content is a much more efficient and reliable way to generate, maintain and publish content.
According to Scott Abel's benchmarking survey published in 2012, 44% of companies were using structured XML content, and 81% of those companies were using DITA. According to DITAWriter, more than 770 companies were using DITA as of 2022. DITA is a popular XML-based authoring model for creating and publishing content.
Making the shift to structured authoring with XML does involve costs, such as XML editor software, training, expertise (in-house or outsourced), and process implementation. In the long term, however, the shift to XML provides multiple benefits, such as easier and cheaper document maintenance and new document creation. The benefits increase further when translation is factored into the equation.
What is the difference between SGML, XML, and HTML?
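SGML and XML are metalanguages, that is, languages for defining markup languages. HTML (through version 4.01) is an application of SGML, XHTML is an application of XML, and HTML5 defines its own syntax alongside an optional XML serialization.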
SGML is the "mother tongue", and has been used for describing different document types from transcripts of ancient manuscripts to technical documentation, patients’ medical records, and even musical notation. SGML is large and complex, and overkill for most common office desktop applications.
XML is a simplified subset of SGML, designed to be easier to use over the Web, easier for you to define your own document types with, and easier for programmers to write programs for.
HTML, XHTML, and HTML5 are the markup languages most frequently used on the Web; of these, only XHTML is strictly an XML application.
What is DITA?
The OASIS Open Darwin Information Typing Architecture (DITA) is a standard XML-based architecture for representing documents. DITA provides architectural features for content modularity, content reuse, and controlled extension of document vocabularies in a way that ensures interoperability of DITA documents.
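One of DITA's reuse mechanisms is the `conref` attribute, which pulls the content of an element from another topic. The sketch below shows the idea in a heavily simplified form. The file names, ids, and the `resolve_conrefs` helper are hypothetical, the "files" are in-memory strings, and only an element's text is copied. A real DITA processor handles far more cases.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory "files"; names and ids are invented for illustration.
FILES = {
    "shared.dita": (
        '<topic id="shared"><title>Shared content</title><body>'
        '<p id="warning">Disconnect power before servicing.</p>'
        '</body></topic>'
    ),
    "service.dita": (
        '<topic id="service"><title>Servicing</title><body>'
        '<p conref="shared.dita#shared/warning"/>'
        '<p>Open the rear panel.</p>'
        '</body></topic>'
    ),
}

def resolve_conrefs(file_name):
    """Return the parsed topic with the text of each conref target pulled in."""
    topic = ET.fromstring(FILES[file_name])
    for element in topic.iter():
        target = element.get("conref")
        if target is None:
            continue
        # Simplified address form: file.dita#topicid/elementid
        target_file, _, fragment = target.partition("#")
        topic_id, _, element_id = fragment.partition("/")
        source_topic = ET.fromstring(FILES[target_file])
        if source_topic.get("id") != topic_id:
            raise ValueError(f"topic {topic_id!r} not found in {target_file}")
        source = source_topic.find(f".//*[@id='{element_id}']")
        if source is None:
            raise ValueError(f"element {element_id!r} not found in {target_file}")
        # Simplification: copy only the text, not nested markup.
        element.text = source.text
        del element.attrib["conref"]
    return topic

resolved = resolve_conrefs("service.dita")
paragraphs = [p.text for p in resolved.iter("p")]
print(paragraphs)
```

The warning is written once in the shared topic, and any topic that references it picks up later changes automatically.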
The DITA architecture was developed inside IBM for IBM technical publications and was donated to OASIS Open in 2004. DITA is an OASIS Open standard first published in 2005 and last updated in 2015 with the publication of version 1.3.
Josh is the founder of Technical Writer HQ and Squibler, a writing software. He had his first job in technical writing for a video editing software company in 2014. Since then, he has written several books on software documentation, personal branding, and computer hacking.