Thursday, April 15, 2010

Discovering XML for editing

There seems to be a stirring about the ‘next big thing’ for publishing. XML – extensible markup language – is starting to creep into editors’ vocabulary. Editors may have come across XML through Word 2007, which can save word processing documents in Microsoft’s version of this language. Perhaps they have used native XML in a publishing organisation or had exposure to it with desktop publishing software. XML is being touted by various players in publishing as the next format in which to produce publications. ‘Single-source publishing’ is the term for a workflow based on XML technologies, where content stored in one source is used to produce publications in several formats and for various media – print, web, PDAs and so on.
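
As a rough illustration of the single-source idea (a sketch only, not a description of any particular publishing system), the Python below holds one small XML article and renders it twice: once as HTML for the web and once as plain text. The element names (`article`, `title`, `para`) and the function names are invented for this example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical single source: the element names are made up for illustration.
SOURCE = """<article>
  <title>Discovering XML for editing</title>
  <para>XML is creeping into the vocabulary of editors.</para>
  <para>One source can feed several outputs.</para>
</article>"""


def to_html(root):
    """Render the article as a minimal HTML fragment for the web."""
    lines = ["<h1>" + escape(root.findtext("title", default="")) + "</h1>"]
    for para in root.findall("para"):
        lines.append("<p>" + escape(para.text or "") + "</p>")
    return "\n".join(lines)


def to_text(root):
    """Render the same article as plain text, e.g. for a small-screen device."""
    title = root.findtext("title", default="")
    lines = [title, "=" * len(title)]
    for para in root.findall("para"):
        lines.append(para.text or "")
    return "\n".join(lines)


root = ET.fromstring(SOURCE)
html_output = to_html(root)   # same content, web presentation
text_output = to_text(root)   # same content, plain-text presentation
print(html_output)
print(text_output)
```

The content is written once; only the rendering functions differ. In a production workflow the rendering would more likely be done with stylesheets than with hand-written functions like these.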

Despite years of marketing spin that has promised XML will result in greater efficiency and reduced costs for producing publications by ‘reusing content’, implementation of such a single-source workflow has been limited to large publishers – and even then it is used primarily for typesetting, practically at the end of the workflow. But is XML useful across the whole workflow? I am setting out to see how the format can be used more effectively, by implementing it right at the start – with editing.

My first taste of XML came two years ago, when I produced RSS news feeds for a web portal. It didn’t make any sense to me – I just downloaded another news feed, worked out where to put the text and ‘tags’ that defined the title, date and news, and uploaded the XML file to ‘go live’. I had some HTML experience, so the structure of XML code wasn’t totally foreign. But then I landed a job as an editor with a publishing house, and I had to learn XML to edit publications. With specific in-house training, I took to it like the proverbial duck. I learned a new language and new terms – chunks, elements, attributes, nesting, validation. Although I used XML every day, it took me six months to become comfortable with it – and there were still many aspects of markup I needed to discover.

Now 18 months down the track, I have an appreciation of how XML can be used to its potential – and on the other hand, where it is just plain awkward. Editing with XML gives you wider exposure to the publishing workflow because you need to consider the structure of documents and, at a basic level, you become involved in formatting text for output – traditionally the domain of the desktop publisher. Yet, from my experience of an enterprise-wide XML publishing system that aims to reduce the turnaround time for producing documents, it is obvious that the workflow continues to parallel the conventional stages of production. It is still necessary to work with Word documents in the early stages of editing, as authors may need to review edits in ‘track changes’. As is typical of editing, a Word document may go back and forth between author and editor until a final draft is reached. Once past this stage, editors mark up the text in XML ‘chunks’ (files). This is effectively typesetting the text – the editor takes on the role of desktop publisher by copying and pasting from Word documents, and also specifying appropriate elements and attributes that help define the final appearance of content. A draft PDF document is then generated, printed and proofread; further corrections are made to the XML to fix incomplete markup; and a press-ready PDF is the end result. It is a laborious process, and one that is likely to be similar amongst large publishers.

There are inefficiencies in such a workflow right from the start. Granted, an XML workflow is only as good as the software implemented to manage the content. But the need to edit initially in Word, because authors provide material and need to review edited material in this format, merely transplants conventional editing practice into a single-source workflow – there is no time saving at this stage. Subsequent typesetting of content by the editor, and generating PDF documents that are sent to authors for review, bring about further delays. Although the XML files may contain all the text and graphics, a draft PDF or RTF still has to be produced for authors to review. This is ‘roundtripping’ – moving from one format to another then back – just to take in edits.

While such a workflow might be sustainable on a large scale, as for publishing houses, it is not likely to translate well when scaled down to single users or very small teams. The current state of XML software makes it almost feasible for editors to start moving into single-source publishing to supplement word processing-based editing. Almost, because although there is a considerable offering of well-developed and affordable editing and authoring packages that approach the basic functionality of word processors, there is no effective workflow that would reduce the time to produce documents.

If editors are to implement XML, they would need to include typesetting in their services – not only because XML standards and supporting publishing businesses are yet to be developed, but also because production time is most likely to be reduced when editors control the whole production process. The process of structured editing involves marking up content, which encroaches into the typesetting stage. There is a much larger aspect of formatting XML content for presentation – that of developing stylesheets – but for this discussion I will focus on a broad framework for editing workflow.

The figure shows a familiar workflow that commences with the author sending a document to the editor. The departure from conventional editing is that the document is then marked up as XML. The document must remain as XML to avoid roundtripping, which reduces the time to take in edits. This, of course, assumes the author has access to XML authoring software for reviewing the content. The editor would control the whole production process for expediency – because in a conventional workflow, a desktop publisher would typeset the document and set up styles with advice from the author, and this requires accurate scheduling to meet the press deadline. There are inevitable delays as the desktop publisher manages several publications and liaises with the editor.



The rationale for devolving the typesetting stage to the editor is that, during markup at the editing stage, XML documents are partly formatted by selecting appropriate elements and attributes – ensuring valid document structure, for example, is one aspect of formatting for presentation. Because the editor’s markup partially achieves formatting, XML encourages the editor to complete the formatting by applying stylesheets to generate press-ready PDF documents. An additional consideration is the need for expertise in document design, preparation of graphics and development of stylesheets, which means collaboration with a desktop publisher would be required. Another imperative for the editor to assume greater control over production is that the infrastructure and expertise within the publishing industry are not yet developed enough to support XML production – so with very few XML publishers and virtually no XML designers, editors (who are increasingly diversifying into desktop publishing and graphic design) are probably best placed to start building the framework for XML publishing.
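
To give a feel for what ‘valid document structure’ means in practice, here is a minimal Python sketch that checks an XML chunk against a few nesting rules. It is a hand-rolled stand-in, not DTD or schema validation, which a real workflow would rely on. The element names and the rules in `ALLOWED_CHILDREN` are hypothetical, as is the function `structure_errors`.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: which child elements each element may contain.
# A real workflow would use a DTD or schema; this dictionary only stands in for one.
ALLOWED_CHILDREN = {
    "chapter": {"title", "section"},
    "section": {"title", "para"},
    "title": set(),
    "para": {"emphasis"},
    "emphasis": set(),
}


def structure_errors(xml_text):
    """Return a list of human-readable problems found in an XML chunk."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # The chunk is not even well-formed XML, so nesting cannot be checked.
        return ["not well-formed: " + str(exc)]

    errors = []
    for parent in root.iter():
        if parent.tag not in ALLOWED_CHILDREN:
            errors.append("unknown element <" + parent.tag + ">")
            continue
        for child in parent:
            if child.tag not in ALLOWED_CHILDREN[parent.tag]:
                errors.append(
                    "<" + child.tag + "> is not allowed inside <" + parent.tag + ">"
                )
    return errors


good = (
    "<chapter><title>Workflow</title><section><title>Editing</title>"
    "<para>Text with <emphasis>markup</emphasis>.</para></section></chapter>"
)
bad = "<chapter><para>Stray paragraph</para></chapter>"

print(structure_errors(good))
print(structure_errors(bad))
```

The first chunk passes because every element sits where the rules allow; the second is flagged because a paragraph appears directly inside a chapter rather than inside a section. This is the kind of decision an editor makes, usually with the authoring software's help, each time an element is chosen during markup.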

This broad framework is but a starting point to test and accommodate the totality of XML publishing – developing effective and efficient markup, stylesheets for various media, and typesetting options; trialling software for authoring; time trials to compare against conventional production workflow; and identifying specialists for aspects of editing, design and usability testing.

1 comment:

  1. Hi Dave,

    I'm hearing more and more lately about the copyeditor being the one to put manuscripts into XML format (i.e., tagging according to some chosen DTD/schema). I wonder if typesetters and desktop publishers will be doing that too. For example, “typesetters” in an XML publishing shop might convert MS Word manuscripts into XML (DocBook, DITA, whatever) and add whatever “design” markup is needed. The typesetter would then write or customize XSL transformations and FO code to produce output, whether it's for print, HTML or ebook. In short, typesetters would be experts in DTDs, schemas, XSLT, and FO rather than in InDesign or QuarkXPress.

    I found a site describing a university press's XML workflow at https://authornet.cambridge.org/information/productionguide/stm/XML_workflow.asp. The “XML capture” is done by typesetters, and it looks as if copyediting is done on paper (typescripts). Is that how you would interpret their terminology?

    Another site at http://www.reallysi.com/newsletter14_1.htm has a few success stories about companies implementing XML workflows.

    I'm continuing to google XML publishing workflows, so I'll let you know if I find anything interesting.

    Cheers
    John
