[Xml-bin] Possible structures

Al B. Snell alaric@alaric-snell.com
Thu, 12 Apr 2001 00:04:15 +0100 (BST)


There are a few different ways of binarizing XML.

1) SAX event stream. A series of self-contained tokens saying things like
"Start element", "CDATA", "End element".

2) Recursively defined with lengths. This generally involves encoding each
element in its entirety, then prepending that encoding with a header
containing its length. This is not streamable (most of the document must be
processed before the first byte is output), but it does allow fast
traversal of the document when reading: you can skip entire unwanted
elements. See http://RFC.net/rfc3072.html

3) Indexed Sequential, our old friend from the database world. The data is
serialised as a list of event tokens as in 1), while the encoder remembers
the byte offsets of all start element tokens (or more detail, perhaps;
that's a tradeoff) and then tacks an index onto the end. This increases
resource usage in the encoder, but it means the data is still streamable
(it is possible to process the document without holding the whole thing in
memory), and random access systems can still seek to the end, read the
index, then seek directly to the data. This may be what SDW's project
does.

4) Compacted, external schema-reliant. This appears to be what the project
at http://asn1.elibel.tm.fr/xml/ is doing; encoders and decoders work
with reference to a schema to create a dense format with no metadata in
it. Decoding without the schema is impossible. This is very compact, but I
don't think the savings are worth the terrible cost of it not being in the
spirit of XML; I suspect this is what people who fear binary-XML think I'm
suggesting...
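The skipping in structure 2) is easy to see in a small Python sketch. The
layout here is hypothetical and chosen only for illustration (a one-byte
kind, a 4-byte length, then the body; element bodies start with a 2-byte
name length and the name). It is *not* the RFC 3072 wire format, and
`encode`, `child_offsets`, `find_child` and `text_content` are made-up names.

```python
import struct
from typing import List, Tuple, Union

# Hypothetical in-memory tree: an element is (name, children); each child
# is either a nested element or a text string.
Node = Union[str, Tuple[str, List["Node"]]]
ELEM, TEXT = b"E", b"T"  # hypothetical one-byte node kinds


def encode(node: Node) -> bytes:
    """Emit kind byte + 4-byte big-endian body length + body."""
    if isinstance(node, str):
        kind, body = TEXT, node.encode("utf-8")
    else:
        name, children = node
        raw = name.encode("utf-8")
        # The whole subtree has to be encoded before its length is known,
        # which is why this layout cannot be written out as a stream.
        body = struct.pack(">H", len(raw)) + raw + b"".join(encode(c) for c in children)
        kind = ELEM
    return kind + struct.pack(">I", len(body)) + body


def child_offsets(buf: bytes, start: int = 0):
    """Yield (kind, name, offset) for each direct child of the element at `start`."""
    if buf[start:start + 1] != ELEM:
        raise ValueError(f"no element at offset {start}")
    (body_len,) = struct.unpack_from(">I", buf, start + 1)
    (name_len,) = struct.unpack_from(">H", buf, start + 5)
    pos, end = start + 7 + name_len, start + 5 + body_len
    while pos < end:
        kind = buf[pos:pos + 1]
        (length,) = struct.unpack_from(">I", buf, pos + 1)
        name = None
        if kind == ELEM:
            (n,) = struct.unpack_from(">H", buf, pos + 5)
            name = buf[pos + 7:pos + 7 + n].decode("utf-8")
        yield kind, name, pos
        pos += 5 + length  # hop over the whole child without parsing inside it


def find_child(buf: bytes, name: str, start: int = 0):
    """Offset of the first direct child element called `name`, or None."""
    for kind, child_name, pos in child_offsets(buf, start):
        if kind == ELEM and child_name == name:
            return pos
    return None


def text_content(buf: bytes, start: int) -> str:
    """Concatenate the direct text children of the element at `start`."""
    parts = []
    for kind, _, pos in child_offsets(buf, start):
        if kind == TEXT:
            (n,) = struct.unpack_from(">I", buf, pos + 1)
            parts.append(buf[pos + 5:pos + 5 + n].decode("utf-8"))
    return "".join(parts)


doc = ("order", [("junk", ["x" * 1000]), ("id", ["42"])])
buf = encode(doc)
print(text_content(buf, find_child(buf, "id")))  # prints 42
```

`find_child` reaches `id` by reading the length header of `junk` and
jumping straight over its 1000-byte text, but `encode` cannot emit the
root element's header until the entire tree has been encoded.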

I like solution 3. Suggesting a standard method of encoding solution 3 to
be part of the ITU stuff Olivier is working on might be a good idea?

Especially if the index was *optional* and could be left off for streaming
and reassembled at the end.
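A rough Python sketch of solution 3 with the index made optional might look
like this. Everything in it is an assumption for illustration, not a
proposed standard: the token codes, the `IDX1` marker, and the index layout
(each start tag's name and 8-byte offset, followed by an 8-byte pointer back
to the start of the index). The encoder writes tokens straight through and
keeps only the index in memory; `attach_index` rebuilds the index for a
stream that was sent without one, and `load_index` falls back to a
sequential scan when no index is present.

```python
import io
import struct
from typing import List, Tuple

# Hypothetical token codes for the SAX-style event stream of structure 1).
START, TEXT, END = 0x01, 0x02, 0x03
# Hypothetical trailer marker; a bare stream ends with END (0x03), never with it.
MAGIC = b"IDX1"
Index = List[Tuple[str, int]]  # (element name, byte offset of its START token)


def index_trailer(entries: Index, index_at: int) -> bytes:
    """Serialise the index, then an 8-byte pointer back to it and the marker."""
    out = b""
    for name, offset in entries:
        raw = name.encode("utf-8")
        out += struct.pack(">H", len(raw)) + raw + struct.pack(">Q", offset)
    return out + struct.pack(">Q", index_at) + MAGIC


class Encoder:
    """Writes tokens straight through to `out`; only the index stays in memory."""

    def __init__(self, out) -> None:
        self.out, self.pos, self.index = out, 0, []  # own byte count: `out` need not seek

    def _write(self, data: bytes) -> None:
        self.out.write(data)
        self.pos += len(data)

    def start(self, name: str) -> None:
        raw = name.encode("utf-8")
        self.index.append((name, self.pos))  # remember where this element begins
        self._write(struct.pack(">BH", START, len(raw)) + raw)

    def text(self, s: str) -> None:
        raw = s.encode("utf-8")
        self._write(struct.pack(">BI", TEXT, len(raw)) + raw)

    def end(self) -> None:
        self._write(bytes([END]))

    def finish(self, with_index: bool = True) -> None:
        if with_index:  # pass False to leave a pure stream with no index
            self._write(index_trailer(self.index, self.pos))


def scan_index(buf: bytes) -> Index:
    """Rebuild the index by walking a bare token stream from the start."""
    entries, pos = [], 0
    while pos < len(buf):
        tok = buf[pos]
        if tok == START:
            (n,) = struct.unpack_from(">H", buf, pos + 1)
            entries.append((buf[pos + 3:pos + 3 + n].decode("utf-8"), pos))
            pos += 3 + n
        elif tok == TEXT:
            (n,) = struct.unpack_from(">I", buf, pos + 1)
            pos += 5 + n
        elif tok == END:
            pos += 1
        else:
            raise ValueError(f"unknown token {tok:#x} at offset {pos}")
    return entries


def attach_index(bare: bytes) -> bytes:
    """Reassemble the index after the fact and tack it onto a bare stream."""
    return bare + index_trailer(scan_index(bare), len(bare))


def load_index(buf: bytes) -> Index:
    """Seek to the end and read the index; scan sequentially if it is absent."""
    if not buf.endswith(MAGIC):
        return scan_index(buf)
    trailer = len(buf) - 12  # 8-byte pointer + 4-byte marker
    (pos,) = struct.unpack_from(">Q", buf, trailer)
    entries = []
    while pos < trailer:
        (n,) = struct.unpack_from(">H", buf, pos)
        (offset,) = struct.unpack_from(">Q", buf, pos + 2 + n)
        entries.append((buf[pos + 2:pos + 2 + n].decode("utf-8"), offset))
        pos += 2 + n + 8
    return entries


def first_text(buf: bytes, offset: int) -> str:
    """Jump straight to a START token and return the text token that follows it."""
    (n,) = struct.unpack_from(">H", buf, offset + 1)
    pos = offset + 3 + n
    if buf[pos] != TEXT:
        return ""
    (m,) = struct.unpack_from(">I", buf, pos + 1)
    return buf[pos + 5:pos + 5 + m].decode("utf-8")


out = io.BytesIO()
enc = Encoder(out)
enc.start("order")
enc.start("junk"); enc.text("x" * 1000); enc.end()
enc.start("id"); enc.text("42"); enc.end()
enc.end()
bare = out.getvalue()  # finish() not called yet: a pure token stream
enc.finish()           # now append the optional index
indexed = out.getvalue()
print(first_text(indexed, dict(load_index(indexed))["id"]))  # prints 42
```

The trailer is the design choice worth noting: because the pointer to the
index sits a fixed distance from the end, a random-access reader needs one
seek to find it, while a streaming reader can ignore everything after the
root element's END token.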

ABS

-- 
                               Alaric B. Snell
 http://www.alaric-snell.com/  http://RFC.net/  http://www.warhead.org.uk/
   Any sufficiently advanced technology can be emulated in software