By: Marie Bilde Rasmussen, 2006
This posting is a copy of my contribution to the poster session at the ExtremeMarkup conference 2006 in Montreal.
David Birnbaum (University of Pittsburgh) wrote on his conference blog: You want your students and clients to read the intelligent and lucid two pages of Marie Bilde Rasmussen’s “XML Schema Design for Document Authoring,” which provide clear, concise, and practical guidelines to useful, learnable, teachable, and maintainable design. If there were an award for the best overall poster of the conference, this would be the winner.
C. Michael Sperberg-McQueen (W3C) said in his closing remarks at the conference: I was very struck with something that Maria Bilde Rasmussen said in connection with her poster, about author-centered XML. The goal for an editing system for lexicographers is that when a lexicographer looks at the screen he should not say, “Oh, okay: this is an XML document that represents a lexicographic entry.” You want the lexicographer to look at the screen and say: “This is a lexicographic entry.” Period. If absolutely necessary it may be ok if they say, “This a lexicographic entry represented in XML” - but not “This is primarily an XML document.” You want them to see through the XML, to the information.
XML schema design for document authoring
using XML to support the process
Document authoring is sometimes the most time-consuming factor in the production cycle. Yet XML schemas tend to be designed only to describe the final state of the data: they represent a structure that allows later data processing, but they are not further customized to support the authoring process.
The schema should inform the author about exactly which elements (etc.) are relevant to insert or use in a given structural context. The grammar will then appear much simpler from a local point of view, and the author will be able to recognize the information types of the current text type. However, we cannot assume that he is an expert on XML or markup in general.
This use of the schema will probably require a more complex set of rules. But schema complexity, and the resulting extra resources spent on schema design, are a good investment if the time spent on authoring is reduced and the data quality is increased. So, instead of only classifying XML vocabularies as either document-oriented or data-oriented, it might be a good idea to focus on the authoring process when designing the XML environment for (the authoring of) a given text type.
Taking the authoring process into account affects:
- schema design
- selection of relevant schema language(s)
- selection of an appropriate application
definitions
- document authoring is simultaneous content production, editing and markup, performed by an author
- an author is a person producing a piece of text
- an application is a piece of software used for authoring
schema design goals
- the vocabulary and the relations defined by the schema must be recognized by the author as a meaningful representation of the text type and the chosen analysis model
- in any given structural context, the schema must allow the insertion/deletion/alteration of exactly the relevant elements/tree fragments as siblings or children
- the author should recognize working with the XML environment as more beneficial than working without it. He should not feel reduced to some sort of technical encoder
- the schema is the author's working tool, and the perfect tool excels by being inconspicuous (the schema language is the designer's tool). This means that if the schema is well-designed, the author does not pay any attention to -- and is not distracted by -- the XML-ness: he can focus on his text
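The second goal above (exactly the relevant elements in any given structural context) can be illustrated with a minimal Python sketch. It is not taken from the poster: the dictionary-entry vocabulary (`entry`, `sense`, `example`, and so on) and the `insertion_menu` helper are hypothetical, and a real editor would derive this information from the schema rather than from a hand-written table.

```python
# Hypothetical content model for a dictionary entry. All element names are
# illustrative assumptions. Keys are (parent, element) pairs so that the same
# element name can offer different children in different contexts.
ALLOWED_CHILDREN = {
    (None, "entry"): ["headword", "sense", "etymology"],
    ("entry", "sense"): ["definition", "example"],
    ("sense", "example"): ["quote", "translation"],
    ("entry", "etymology"): ["example"],
    ("etymology", "example"): ["quote", "source"],
}


def insertion_menu(parent, element):
    """Return the child elements an editor should offer inside `element`.

    Contexts that are not listed get an empty menu: the author can only
    type text there.
    """
    return ALLOWED_CHILDREN.get((parent, element), [])


print(insertion_menu("sense", "example"))      # ['quote', 'translation']
print(insertion_menu("etymology", "example"))  # ['quote', 'source']
```

The point of keying on the parent as well as the element is that `example` means something different under a sense than under an etymology, so the author sees a short, local menu in each place instead of the union of everything the vocabulary contains.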
schema design strategies
- structures should be shallow in order to keep as much of the text as possible on the screen at the same time and to prevent the text from being abruptly fragmented
- depth should be dynamic and only be used when necessary (structures should not always be as deep as in the worst case)
- bottom-up markup must be possible, i.e. coherent pieces of text can be written first and marked up afterwards
- there should be a high degree of context sensitivity in the sense that all and only the relevant substructures are valid in a given context (in a W3C XSD this may result in a very large number of global types in a Venetian blind or Garden of Eden approach)
- element and attribute naming should be meaningful and take the distinction between content, form and function into account
- mixed content should only be used in coherent "textual" contexts
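Bottom-up markup, as described in the list above, can be sketched in a few lines of Python with the standard library's `xml.etree.ElementTree`. This is only an illustration under stated assumptions: the `wrap_text` helper and the element names `definition` and `taxon` are hypothetical, and the helper only looks at the text that precedes the first child element.

```python
import xml.etree.ElementTree as ET


def wrap_text(parent, phrase, tag):
    """Wrap the first occurrence of `phrase` in parent.text in a new <tag>.

    Only the text before the first child element is searched. Returns the
    new element, or None if the phrase is not found.
    """
    text = parent.text or ""
    start = text.find(phrase)
    if start == -1:
        return None
    child = ET.Element(tag)
    child.text = phrase
    # Text after the phrase becomes the tail of the new element.
    child.tail = text[start + len(phrase):]
    parent.text = text[:start]
    parent.insert(0, child)
    return child


# The author first writes a coherent piece of text ...
definition = ET.fromstring(
    "<definition>a small domesticated feline, Felis catus, kept as a pet</definition>"
)
# ... and marks up part of it afterwards.
wrap_text(definition, "Felis catus", "taxon")
print(ET.tostring(definition, encoding="unicode"))
# <definition>a small domesticated feline, <taxon>Felis catus</taxon>, kept as a pet</definition>
```

For this to work in an authoring environment, the schema has to permit mixed content in the element being written, which is why the last strategy restricts mixed content to coherent "textual" contexts rather than forbidding it.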