Sunday, April 25, 2010

What is a web-based document?

Apple’s iPad has certainly stirred up discussions about ebooks again. It seems every time a new ebook reader is introduced (like the Kindle), the talk about ebooks hots up – either ‘the book is dead’ or ‘you can’t curl up in bed with a laptop’ depending on which side of the digital divide you’re arguing for. And with that talk people question the format of ebooks. The latest is that epub will be ‘the’ format for ebooks. Have a look at http://www.abc.net.au/rn/breakfast/stories/2010/2815124.htm for instance.

Being a novice to the whole ebooks idea – I don’t have a reader or even an iPhone – I wonder where the epub format will leave PDF. I think PDF is great. I have downloaded a few PDF ebooks because I read them on my laptop at home (where I have the time to read) – the wide colour screen is convenient for the page size and readability. Others think PDF is great. I have prepared a few reports as PDF for clients, and they then put those publications on their website as downloadable documents. PDF is one of the main formats of what a web document is right now. Readers can download it and read it onscreen or print out relevant pages – it’s such a convenient format. I haven’t come across anyone who has asked for an HTML version of their document. Probably because it means creating an entirely new document and generating graphics, text and stylesheets to display the same content as a PDF (and who wants to pay twice for two different formats of the same document?).

On to the other format – I have tried to read HTML documents and given up in frustration. Well, okay, I once took a graduate course where all the course notes were provided as hyperlinked pages, and I ended up copying and pasting each page into Word. Not my only experience at trying to download web pages, mind you. HTML documents are designed to be read while you are online, and you might not be able to easily save an entire publication unless the creator has been kind enough to provide a downloadable zip file of all pages. If this is the best that HTML has to offer in the world of reading – despite the attractiveness of being able to handle rich media content – then I can see it dying a natural death as an ebook format. The very structure of HTML pages is a turn-off because I can’t download an HTML document and read it at my leisure. Offline. When I’m not paying big dollars for my broadband access.

I really look forward to seeing epub in action, especially if I can download content then view it offline. I may even think about getting an iPhone…

Friday, April 16, 2010

AODC Conference

I’ll be heading to the Australasian Online Documentation and Content Conference in Darwin, 12-14 May. This looks like it’ll be a great opportunity to catch up on the latest in online content development (and soak up the tropical clime). I am looking forward in particular to the sessions on DITA schema – this is constantly developing as an XML schema and is looking like a strong contender in XML authoring, to sit alongside the DocBook schema. I also want to find out about scalable vector graphics (SVG) – I’m hoping this will replace the many image formats we currently have to contend with in XML (e.g. JPEG, EPS, TIFF). So keep an eye on my blog – I’ll post a conference report soon.

Thursday, April 15, 2010

Discovering XML for editing

There seems to be a stirring about the ‘next big thing’ for publishing. XML – extensible markup language – is starting to creep into editors’ vocabulary. Editors may have come across XML through Word 2007, which can save word processing documents in Microsoft’s version of this language. Perhaps they may have used native XML in a publishing organisation or had exposure to it with desktop publishing software. XML is being touted by various players in publishing as the next format in which to produce publications. ‘Single-source publishing’ is a term to describe the workflow that is based on XML technologies, where content stored in one source is used to produce publications in several formats and for various media – print, web, PDAs and so on.
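To make the single-source idea concrete, here is a minimal Python sketch using only the standard library. The element names (`article`, `title`, `para`) and the two output functions are invented for illustration and do not come from any particular schema or publishing system. It shows one XML source being turned into two outputs, an HTML fragment and a plain-text version.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source document; the element names are illustrative only.
SOURCE = """<article>
  <title>Discovering XML</title>
  <para>Content is stored once.</para>
  <para>It is rendered many times.</para>
</article>"""


def to_html(root):
    """Render the source as a simple HTML fragment."""
    parts = ["<h1>" + escape(root.findtext("title")) + "</h1>"]
    for para in root.findall("para"):
        parts.append("<p>" + escape(para.text) + "</p>")
    return "\n".join(parts)


def to_text(root):
    """Render the same source as plain text with an underlined title."""
    title = root.findtext("title")
    lines = [title, "=" * len(title), ""]
    lines += [para.text for para in root.findall("para")]
    return "\n".join(lines)


root = ET.fromstring(SOURCE)
html_output = to_html(root)
text_output = to_text(root)
print(html_output)
print()
print(text_output)
```

In a real workflow the rendering would more likely be done with stylesheets than with hand-written functions, but the principle is the same: the content is edited in one place and each output format is generated from it.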

Despite years of marketing spin that has promised XML will result in greater efficiency and reduced costs for producing publications by ‘reusing content’, implementation of such single-source workflow has been limited to large publishers – and even then it is used primarily for typesetting, practically at the end of the workflow. But is XML useful across the whole workflow? I am setting out to see how the format can be used more effectively, by implementing it right at the start – with editing.

My first taste of XML came two years ago, when I produced RSS news feeds for a web portal. It didn’t make any sense to me – I just downloaded another news feed, worked out where to put the text and ‘tags’ that defined the title, date and news, and uploaded the XML file to ‘go live’. I had some HTML experience, so understanding the structure of XML code wasn’t totally foreign. But then I landed a job as an editor with a publishing house, and I had to learn XML to edit publications. With specific in-house training, I took to it like the proverbial duck. I learned a new language and new terms – chunks, elements, attributes, nesting, validation. Despite using XML every day, it took six months to become comfortable with it – and there were still many aspects of markup I needed to discover.
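For readers who have not seen the markup, the sketch below shows roughly what such a news feed item looks like and how the terms above map onto it. The feed content is made up for illustration, and the helper `is_well_formed` is my own name, not part of any library. Note that Python's standard-library parser only checks that the markup is well formed (tags properly nested and closed); validating against a schema or DTD needs additional tools.

```python
import xml.etree.ElementTree as ET

# An invented RSS 2.0 feed with a single news item.
FEED = """<rss version="2.0">
  <channel>
    <title>Portal news</title>
    <item>
      <title>New report released</title>
      <pubDate>Thu, 15 Apr 2010 09:00:00 +1000</pubDate>
      <description>The annual report is now available.</description>
    </item>
  </channel>
</rss>"""

root = ET.fromstring(FEED)

# 'version' is an attribute of the <rss> element.
version = root.get("version")

# <item> is nested inside <channel>, which is nested inside <rss>.
item = root.find("channel/item")
headline = item.findtext("title")
published = item.findtext("pubDate")
summary = item.findtext("description")


def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(version, headline, published, summary, sep="\n")
print(is_well_formed(FEED))
# Tags closed in the wrong order break the nesting rules.
print(is_well_formed("<item><title>Oops</item></title>"))
```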

Now 18 months down the track, I have an appreciation of how XML can be used to its potential – and on the other hand, where it is just plain awkward. Editing with XML gives you wider exposure to the publishing workflow because you need to consider the structure of documents and, at a basic level, you become involved in formatting text for output – traditionally the domain of the desktop publisher. Yet, with the experience of an enterprise-wide XML publishing system that aims to reduce the turnaround time for producing documents, it is obvious the workflow continues to parallel the conventional stages of production. There is still a necessity to work with Word documents in the early stages of editing, as authors may need to review edits in ‘track changes’. Typical of editing, a Word document may go back and forth between author and editor until a final draft. Once past this stage, editors then mark up the text in XML ‘chunks’ (files). This is effectively typesetting the text – the editor takes on the role of desktop publisher by copying and pasting from Word documents, and also specifying appropriate elements and attributes that aid in defining the final appearance of content. A draft PDF document is then generated, printed and proofread; further corrections are made to the XML to fix incomplete markup; and a press-ready PDF is the end result. It is a laborious process, one that is likely similar amongst large publishers.

There are inefficiencies in such a workflow right from the start. Granted, an XML workflow is only as good as the software implemented to manage the content. But having to edit initially in Word, because authors supply their material and need to review edits in this format, merely transplants conventional editing practice into a single-source workflow – there is no time saving at this stage. Subsequent typesetting of content by the editor, and generating PDF documents that are sent to authors for review, bring about further delays. Although the XML files may contain all the text and graphics, a draft PDF or RTF must still be produced for authors to review. This is ‘roundtripping’ – moving from one format to another then back – just to take in edits.

While such a workflow might be sustainable on a large scale, as for publishing houses, it is not likely to translate well when scaled down to single users or very small teams. The current state of XML software makes it almost feasible for editors to start moving into single-source publishing to supplement word processing-based editing. Almost, because although there is a considerable offering of well-developed and affordable editing and authoring packages that approach the basic functionality of word processors, there is no effective workflow that would reduce the time to produce documents.

If editors are to implement XML, they would need to include typesetting in their services – not only because XML standards and supporting publishing businesses are yet to be developed, but also because production time is most likely to be reduced when editors control the whole production process. The process of structured editing involves marking up content, which encroaches into the typesetting stage. There is a much larger aspect of formatting XML content for presentation – that of developing stylesheets – but for this discussion I will focus on a broad framework for editing workflow.

The figure shows a typically familiar workflow that commences with the author sending a document to the editor. The departure then from conventional editing is that the document is marked up as XML. The document must remain as XML to avoid roundtripping, so reducing the time to take in edits. This, of course, assumes the author has access to XML authoring software for reviewing the content. The editor would control the whole production process for expediency – because in a conventional workflow, a desktop publisher would typeset the document and set up styles with advice from the author, and this requires accurate scheduling to meet the press deadline. There are inevitable delays as the desktop publisher manages several publications and liaises with the editor.



The rationale for devolving the typesetting stage to the editor is that during markup, XML documents are partly formatted by selecting appropriate elements and attributes – ensuring valid document structure, for example, is one aspect of formatting for presentation. Because the editor’s markup partially achieves formatting, XML can encourage the editor to complete the formatting, by applying stylesheets to generate press-ready PDF documents. An additional consideration is the need for expertise in document design, preparation of graphics and development of stylesheets, which means collaboration with a desktop publisher would be required. Another imperative for the editor to assume greater control over production is that the infrastructure and expertise within the publishing industry are not yet developed enough to support XML production – so with very few XML publishers and virtually no XML designers, editors (who are increasingly diversifying into desktop publishing and graphic design) are probably best placed to start building the framework for XML publishing.

This broad framework is but a starting point to test and accommodate the totality of XML publishing – developing effective and efficient markup, stylesheets for various media, and typesetting options; trialling software for authoring; time trials to compare against conventional production workflow; and identifying specialists for aspects of editing, design and usability testing.

Welcome

G’day and welcome to my blog about XML editing. I’m a freelance editor who’s keen to see XML editing adopted by freelancers. I’m undertaking research and development to learn more about the publishing technology, and am aiming to make it a realistic proposition for editors and authors at the ‘grass roots’.

I’ll share my experiences here, and trust that you’ll find it useful to swap your knowledge or ask questions – we’ll learn together. And I’m looking for XML editors who’d like to collaborate in this work – so feel free to post comments.