LE BAIN MARIE: XML schema design for document authoring

By: Marie Bilde Rasmussen, 2006

This post is a copy of my contribution to the poster session at the ExtremeMarkup conference 2006 in Montreal.

David Birnbaum (University of Pittsburgh) wrote on his conference blog: You want your students and clients to read the intelligent and lucid two pages of Marie Bilde Rasmussen’s “XML Schema Design for Document Authoring,” which provide clear, concise, and practical guidelines to useful, learnable, teachable, and maintainable design. If there were an award for the best overall poster of the conference, this would be the winner.

C. Michael Sperberg-McQueen (W3C) said in his closing remarks at the conference: I was very struck with something that Maria Bilde Rasmussen said in connection with her poster, about author-centered XML. The goal for an editing system for lexicographers is that when a lexicographer looks at the screen he should not say, “Oh, okay: this is an XML document that represents a lexicographic entry.” You want the lexicographer to look at the screen and say: “This is a lexicographic entry.” Period. If absolutely necessary it may be ok if they say, “This a lexicographic entry represented in XML” - but not “This is primarily an XML document.” You want them to see through the XML, to the information.

XML schema design for document authoring
using XML to support the process

Document authoring is sometimes the most time-consuming factor in the production cycle. XML schemas tend to be designed only to describe the final state of the data: they represent a structure that allows later data processing, but they are not further customized to support the authoring process.

The schema should inform the author about exactly which elements (etc.) are relevant to insert or use in a given structural context. The grammar will then appear much simpler from a local point of view, and the author will be able to recognize the information types of the current text type. However, we cannot assume that he is an expert on XML or markup in general.

Using the schema this way will probably require a more complex set of rules. But schema complexity, and the resulting extra resources spent on schema design, are a good investment if the time spent on authoring is reduced and the data quality is increased. So, instead of only classifying XML vocabularies as either document-oriented or data-oriented, it might be a good idea to focus on the authoring process when designing the XML environment for (the authoring of) a given text type.

Taking the authoring process into account affects:

schema design
selection of relevant schema language(s)
selection of an appropriate application

Assuming that the author uses an application that is schema-aware (e.g. by making intensive use of the PSVI, the post-schema-validation infoset, produced by W3C XML Schema validation), the schema can be designed to facilitate the authoring task. If the application furthermore provides means of data presentation in the editing view, we can actually support and help authors to concentrate on their primary task: text production.

definitions

document authoring is simultaneous content production, editing and markup, performed by an author
an author is a person producing a piece of text
an application is a piece of software used for authoring

schema design goals

the vocabulary and the relations defined by the schema must be recognized by the author as a meaningful representation of the text type and the chosen analysis model
in any given structural context, the schema must allow the insertion/deletion/alteration of exactly the relevant elements/tree fragments as siblings or children
the author should recognize working with the XML environment as more beneficial than working without it. He should not feel reduced to some sort of technical encoder
the schema is the author's working tool, and the perfect tool excels by being inconspicuous (the schema language is the designer's tool). This means that if the schema is well designed, the author pays no attention to, and is not distracted by, the XML-ness: he can focus on his text
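
The second goal above (offering exactly the relevant elements in a given structural context) can be illustrated with a small lookup keyed by the path from the root. This is only a minimal Python sketch: the dictionary-entry vocabulary (entry, headword, sense, definition, example) is invented for illustration and is not taken from the poster, and the hand-written table stands in for what a schema-aware application would derive from the schema itself.

```python
# Hypothetical content model for a dictionary entry; the element names are
# illustrative only. Keys are paths from the root element; values are the
# child elements an editor would offer at that point.
ALLOWED_CHILDREN = {
    ("entry",): ["headword", "sense"],
    ("entry", "sense"): ["definition", "example", "sense"],
    # A nested sense may not nest any further: same name, different context.
    ("entry", "sense", "sense"): ["definition", "example"],
}


def allowed_children(path):
    """Return the elements the author may insert at the given path."""
    return ALLOWED_CHILDREN.get(tuple(path), [])


print(allowed_children(["entry", "sense"]))
print(allowed_children(["entry", "sense", "sense"]))
```

The point of the sketch is that the same element name (sense) gets a different, shorter menu depending on where the author is, so the grammar looks simple locally even if the full schema is complex.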

schema design strategies

structures should be shallow in order to keep as much of the text as possible on the screen at the same time and to prevent the text from being abruptly fragmented
depth should be dynamic and only be used when necessary (structures should not always be as deep as in the worst case)
bottom-up markup must be possible, i.e. coherent pieces of text can be written first and marked up afterwards
there should be a high degree of context sensitivity in the sense that only the relevant substructures, and all of them, are valid in a given context (in a W3C XSD this may result in a very large variety of global types in a Venetian Blind or Garden of Eden approach)
element and attribute naming should be meaningful and take the distinction between content, form and function into account
mixed content should only be used in coherent "textual" contexts
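
Bottom-up markup, as described in the list above, means that the text exists first and the element is wrapped around it afterwards. The following is a minimal Python sketch of that editing operation using the standard library's xml.etree.ElementTree. The helper name wrap_text and the definition and xref elements are assumptions made for illustration; a real authoring application would also check the result against the schema.

```python
import xml.etree.ElementTree as ET


def wrap_text(parent, phrase, tag):
    """Wrap the first occurrence of `phrase` in parent's leading text in a new <tag> child.

    Only the text before the parent's first child element is searched.
    """
    text = parent.text or ""
    start = text.find(phrase)
    if start == -1:
        return None  # phrase not found; leave the element untouched
    child = ET.Element(tag)
    child.text = phrase
    # Whatever followed the phrase becomes the new element's tail.
    child.tail = text[start + len(phrase):]
    parent.text = text[:start]
    parent.insert(0, child)
    return child


# The author has already written the sentence; the markup is added afterwards.
para = ET.fromstring("<definition>a small domesticated feline, see also kitten</definition>")
wrap_text(para, "kitten", "xref")
print(ET.tostring(para, encoding="unicode"))
# prints: <definition>a small domesticated feline, see also <xref>kitten</xref></definition>
```

Note that the result is mixed content, which fits the last strategy: the wrapped phrase sits inside a coherent piece of running text.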

Tuesday, January 16, 2007