LE BAIN MARIE: document-oriented XML vs. data-oriented XML: an awkward distinction

Whether you use the term document-oriented XML or narrative XML documents, and define this phenomenon as the opposite of something called data-oriented XML or record-oriented XML, it is assumed that the structure of the markup (or at least the very nature of the data structure) emerges from the text type in its own right, i.e. regardless of its interaction with the surrounding world.

It is also assumed that very complex and less ordered data are more likely to have been produced manually, whereas very simple, repetitive and perhaps quite linear data are considered the typical expression of computer-produced XML. Rigidly maintaining this distinction and these assumptions makes life difficult for some XML content architects.

The variety of text types is far more complicated than this. Lexicographers author texts with VERY complex, yet very well-ordered and carefully restricted structures. Other texts may have a very simple and repetitive structure, yet their content is so difficult to analyse correctly that the markup has to be done by human authors. Still other texts may be quite poorly structured, e.g. with plenty of mixed content, and nevertheless be marked up by computers.
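To make the last case concrete, here is a minimal Python sketch of a computer process that produces mixed content. It assumes a purely hypothetical markup rule (tag every four-digit year in running text) and invented element names `p` and `year`; it is an illustration, not a recommended tagging scheme. The second function is an equally hypothetical check for mixed content, i.e. non-whitespace text interleaved with child elements.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rule for illustration: years from 1000 to 2099 get a <year> tag.
YEAR = re.compile(r"\b(1\d{3}|20\d{2})\b")


def tag_years(text):
    """Wrap each matched year in a <year> element inside a <p> element.

    The text before the first match becomes p.text; the text after each
    <year> element becomes that element's tail, which is how ElementTree
    represents mixed content.
    """
    p = ET.Element("p")
    p.text = ""
    last = None  # most recently created <year> element
    pos = 0
    for m in YEAR.finditer(text):
        before = text[pos:m.start()]
        if last is None:
            p.text += before
        else:
            last.tail = before
        last = ET.SubElement(p, "year")
        last.text = m.group(0)
        pos = m.end()
    # Attach whatever follows the last match (or the whole text if none).
    if last is None:
        p.text += text[pos:]
    else:
        last.tail = text[pos:]
    return p


def has_mixed_content(elem):
    """Return True if elem interleaves non-whitespace text with child elements."""
    if len(elem) == 0:
        return False
    texts = [elem.text] + [child.tail for child in elem]
    return any(t and t.strip() for t in texts)
```

For example, `ET.tostring(tag_years("The poem was copied in 1420 and printed in 1501."), encoding="unicode")` yields `<p>The poem was copied in <year>1420</year> and printed in <year>1501</year>.</p>`, which is "document-like" mixed content, even though no human touched the markup.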

In my view, there are a number of parameters that must be taken into account if one wishes to arrive at the most adequate design of a data structure for some 'text':

The text type itself. Are we dealing with
- a medieval poem
- a result of a database query
- a scientific report
- an entry in a phonebook/a dictionary/an encyclopedia
- technical documentation
- ...

The content production process. Is the content
- output from a computer process?
- manually keyed in?
- already there?
- added here and there to an existing text, e.g. as part of a revision process?

The markup process. Is the markup process
- carried out manually?
- handled by a computer process?
- simultaneous with the writing of the text - perhaps even integrated with it?
- applied to a preexisting text?
- a matter of rearranging/transforming/extending an already existing structure?

The intended use of the marked-up text. Shall the text
- only be published? (in print or electronically)
- be read in a structured format?
- be searchable by categories?
- be subject to research and comparison with other similar texts?
- be exchanged and read by automated processes?

Possibly we will add to this list of perspectives in order to decide which design strategy to follow in a given project involving XML authoring. But already at this point it should be clear that the nature of the data is not the only thing to be considered during schema design. The authoring process, the text type and the very purpose of marking up the text must not be ignored. In this light, the distinction between document-oriented and data-oriented XML appears at best to be insufficient.

Saturday, January 6, 2007