[Xml-bin] An attempt to sort things out [long]

Al Snell alaric@alaric-snell.com
Wed, 18 Apr 2001 15:29:43 +0100 (BST)


On Wed, 18 Apr 2001, Stefan Zier wrote:

> > <?xml version='1.0' format='SAXbin' format-version='0.5b' ?>
> 
> I couldn't find format and format-version in the XML Recommendation, are
> those defined somewhere?

Nope, unilaterally suggested by yours truly. The crux of my point was to
try to reuse the <?xml as a header, y'see.
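
A minimal sketch of what reading such a header might look like, assuming the
file starts with an ASCII <?xml ... ?> declaration and the binary payload
follows straight after the closing "?>". The function name read_header and
the payload bytes are hypothetical; format and format-version are only the
pseudo-attributes suggested above, not part of the XML Recommendation, so a
conforming XML 1.0 parser would most likely reject them.

```python
import re

# Matches name='value' or name="value" pairs inside the declaration.
PSEUDO_ATTR = re.compile(r"""([\w.-]+)\s*=\s*(?:'([^']*)'|"([^"]*)")""")


def read_header(data: bytes):
    """Return (pseudo_attributes, payload_offset), or None if no header."""
    if not data.startswith(b"<?xml"):
        return None
    end = data.find(b"?>")
    if end == -1:
        return None
    # Assumption: the declaration itself is plain ASCII.
    decl = data[5:end].decode("ascii")
    attrs = {}
    for match in PSEUDO_ATTR.finditer(decl):
        name, single, double = match.groups()
        attrs[name] = single if single is not None else double
    return attrs, end + 2


# Hypothetical file: the header from this thread plus made-up payload bytes.
data = b"<?xml version='1.0' format='SAXbin' format-version='0.5b' ?>\x01\x00\x01a"
attrs, offset = read_header(data)
payload = data[offset:]
```

A reader could dispatch on attrs["format"] to pick a decoder, and fall back
to an ordinary text parser when the pseudo-attribute is absent.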

> Otherwise, why not go ahead and use the encoding
> attribute, I think this could be a meaningful application of it.

I dunno, wasn't that meant to refer to *character* encoding?

> Again, to recap: We are currently talking about two different things. A
> generic binary representation of XML documents as well as binary
> representation of data that is output by XML parsers (such as a DOM tree or
> a SAX event stream).

The two are semantically identical, though - whether to make the file
SAX-like or DOM-like is just a question of how to represent an XML document
in a binary format, as I see it (kick me if I'm being thick, it happens).

A SAX-like representation would be structured identically to the text XML
(since SAX does stuff in document order, doesn't it?). A DOM-like
representation would probably have a small pointer tree at the start and a
lake of strings after it, or some variation on that theme.
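
To make the SAX-like case concrete, here is a rough sketch of an event
stream written out in document order. The opcodes, the two-byte length
prefix and the function names are all assumptions made up for illustration;
they are not taken from any SAXbin draft, and attributes are left out.

```python
import struct

# Hypothetical one-byte opcodes, invented for this sketch.
START, END, TEXT = 1, 2, 3


def encode_events(events):
    """Serialise (kind, value) events in the order they occur."""
    out = bytearray()
    for kind, value in events:
        out.append(kind)
        if kind != END:
            data = value.encode("utf-8")
            # Length-prefixed string: big-endian unsigned 16-bit length.
            out += struct.pack(">H", len(data)) + data
    return bytes(out)


def decode_events(blob):
    """Replay the events; the stream must be read front to back."""
    pos = 0
    while pos < len(blob):
        kind = blob[pos]
        pos += 1
        if kind == END:
            yield (END, None)
        else:
            (length,) = struct.unpack_from(">H", blob, pos)
            pos += 2
            yield (kind, blob[pos:pos + length].decode("utf-8"))
            pos += length


# Events for <a><b>hi</b></a>
events = [(START, "a"), (START, "b"), (TEXT, "hi"), (END, None), (END, None)]
blob = encode_events(events)
```

Note that finding the second child of an element in this layout means
decoding everything before it, which is the cost a DOM-like layout with a
pointer tree would try to avoid.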

> Does it make sense to design all of them under the same hood or should we
> separate them in two different efforts? Or should we decide on one of them
> and focus? The requirements are significantly different.

Definitely. The SAX-like representation is easy, but has fewer benefits
than the structured approach.

The ASN.1 <-> XML effort currently seems to be covering the SAX-like
approach of just representing the document structure in a binary manner,
but doesn't look likely to be optimised for extracting snippets in an
XPath-like manner, which the DOM-like one would be.

I'm proud to say that Olivier's invited me to join the ASN.1 <-> XML
working group :-)

Therefore, I'd be inclined to say that we ought to be focussing on a
"DOM-like" representation of an XML document in a binary file, meaning
that DOM tree operations should be implementable in constant time (eg, the
element gives pointers to its children that can be followed directly
rather than requiring parsing of all the children to find the desired
one).
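
A minimal sketch of one possible DOM-like layout, assuming fixed-size node
records followed by a string pool. Everything here (the record fields, the
breadth-first numbering, the names pack_tree, read_node and child) is a
hypothetical illustration rather than a proposed format, and it covers
element names only, not attributes or text.

```python
import struct

HEADER = struct.Struct(">I")     # number of node records
RECORD = struct.Struct(">IIII")  # name_off, name_len, first_child, child_count


def pack_tree(root):
    """Pack a (name, [children]) tree: header, node records, string pool."""
    nodes = [root]
    records = []
    pool = bytearray()
    i = 0
    # Breadth-first numbering keeps each node's children contiguous,
    # so a record only needs the first child's index and a count.
    while i < len(nodes):
        name, children = nodes[i]
        data = name.encode("utf-8")
        records.append((len(pool), len(data), len(nodes), len(children)))
        pool += data
        nodes.extend(children)
        i += 1
    body = b"".join(RECORD.pack(*record) for record in records)
    return HEADER.pack(len(records)) + body + bytes(pool)


def read_node(blob, index):
    """Return (name, first_child, child_count) for one node record."""
    (count,) = HEADER.unpack_from(blob, 0)
    name_off, name_len, first, n = RECORD.unpack_from(
        blob, HEADER.size + index * RECORD.size)
    start = HEADER.size + count * RECORD.size + name_off
    return blob[start:start + name_len].decode("utf-8"), first, n


def child(blob, index, k):
    """Index of the k-th child, found without touching any sibling."""
    _, first, n = read_node(blob, index)
    if not 0 <= k < n:
        raise IndexError(k)
    return first + k


tree = ("doc", [("head", []), ("body", [("p", []), ("p2", [])])])
blob = pack_tree(tree)
body_index = child(blob, 0, 1)
```

Each lookup is one fixed-offset read plus an addition, so reaching the k-th
child does not depend on how many siblings precede it or how large they are.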

> This is all getting fairly complex, IMHO it would make sense to focus on a
> single thing for now and get it done to see if it's worth the effort at all.

Yeah!

ABS

-- 
                               Alaric B. Snell
 http://www.alaric-snell.com/  http://RFC.net/  http://www.warhead.org.uk/
   Any sufficiently advanced technology can be emulated in software