[Xml-bin] Some central design issues

Al Snell alaric@alaric-snell.com
Thu, 12 Apr 2001 10:24:24 +0100 (BST)


On Thu, 12 Apr 2001, Peter Jacobi wrote:

> For example binary content can be seen as an alternate external 
> representation of base64binary. When accessing with standard DOM 
> functions, it will be seen base64 encoded.

Yes, I was planning on making all the extensions low-profile, like
that.  Anything that can't be easily done like that can be mapped to
something like:

<xmlbin:binary>...base64...</xmlbin:binary>

This is needed to round-trip binXML through textXML... otherwise the
binary bits would come out as plain base64 strings after going through
XML :-(
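
A rough sketch of that wrapper round trip in Python, just to make it
concrete. The namespace URI and the helper names wrap_binary and
unwrap_binary are made up for illustration; nothing here is a settled
part of the format.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the real one has not been decided.
XMLBIN_NS = "urn:example:xmlbin"
ET.register_namespace("xmlbin", XMLBIN_NS)
BINARY_TAG = f"{{{XMLBIN_NS}}}binary"


def wrap_binary(data: bytes) -> ET.Element:
    """Build an <xmlbin:binary> element holding `data` as base64 text."""
    elem = ET.Element(BINARY_TAG)
    elem.text = base64.b64encode(data).decode("ascii")
    return elem


def unwrap_binary(elem: ET.Element) -> bytes:
    """Recover the raw bytes from an <xmlbin:binary> element."""
    if elem.tag != BINARY_TAG:
        raise ValueError("not an xmlbin:binary element")
    return base64.b64decode(elem.text or "")


# Binary value -> text XML -> binary value again.
text_xml = ET.tostring(wrap_binary(b"\x00\x01\xfe"), encoding="unicode")
restored = unwrap_binary(ET.fromstring(text_xml))
```

Because the wrapper element survives the trip through text, the reader
on the other side can tell the payload was binary and not just a string
that happens to look like base64.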

It depends on whether we have access to a schema. If there is a schema,
then that's the way to specify that an element is base64 binary or a 
float rather than a string.

If there is not a schema, we need to embed that information.

There are maybe several possible translations:

1) Unschemaed tXML into bXML - all CDATA stored as strings unless they
have binxml:type attributes specifying that they are to be read as floats
or binary in base64, etc.

2) bXML into tXML, sans schema - all doubles, binaries, etc. stored as
CDATAs with appropriate textual representations; binxml:type attributes
added to all elements with CDATA, specifying its type. If there are
multiple values in an element we give it binxml:type "compound" and then
list <binxml:value type="..."> elements inside it.

3) Schema'd tXML into bXML and vice versa - CDATA handled with knowledge
of its intended type from the schema.
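
As a sketch of cases 1 and 2 (no schema), something like the following
Python could map between typed values and annotated text elements. The
namespace URI, the type names "float" and "base64", and the helper
names are assumptions for illustration only; the "compound" case is
left out.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and attribute name for binxml:type.
BINXML_NS = "urn:example:binxml"
TYPE_ATTR = f"{{{BINXML_NS}}}type"


def typed_value(elem):
    """Case 1: read CDATA as a string unless binxml:type says otherwise."""
    kind = elem.get(TYPE_ATTR, "string")
    text = elem.text or ""
    if kind == "float":
        return float(text)
    if kind == "base64":
        return base64.b64decode(text)
    return text


def annotate(tag, value):
    """Case 2: write a value as text and record its type in binxml:type."""
    elem = ET.Element(tag)
    if isinstance(value, bytes):
        elem.set(TYPE_ATTR, "base64")
        elem.text = base64.b64encode(value).decode("ascii")
    elif isinstance(value, float):
        elem.set(TYPE_ATTR, "float")
        elem.text = repr(value)
    else:
        elem.text = str(value)
    return elem


doc = ET.fromstring(
    '<r xmlns:binxml="urn:example:binxml">'
    '<a binxml:type="float">1.5</a>'
    '<b binxml:type="base64">aGVsbG8=</b>'
    '<c>plain</c>'
    '</r>'
)
values = [typed_value(child) for child in doc]
```

With a schema (case 3) the type lookup would come from the schema
instead of the attribute, and the attribute could be dropped entirely.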

> 3. Seekability
> 
> This would be a key benefit of a binary external representation. We should 
> look at how it's done in MPEG7.

Mmmm. Seekability, if implemented naively, can hamper streaming - what did
you think of my proposal for combining the two by having a streamed
SAX-event representation followed by an optional index?

Suggested index format: For each element, we store a pointer to its byte
offset in the SAX-stream, and a list of its child elements, sorted. The
sorted list enables a binary search when scanning the index. Each child
entry holds the file offset of the index entry for that child element.
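
An in-memory sketch of that lookup in Python, under some assumptions of
mine: the child list is sorted by element name, index entries are keyed
by their file offset in a dict, and the names IndexEntry, find_child
and stream_offset_for are invented for the example. The on-disk
encoding is not addressed at all.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    stream_offset: int  # byte offset of the element in the SAX stream
    # Sorted (child name, file offset of the child's index entry) pairs.
    children: list = field(default_factory=list)


def find_child(index, entry_offset, name):
    """Binary-search one entry's child list; return the child's index
    entry offset, or None if there is no child with that name."""
    children = index[entry_offset].children
    i = bisect_left(children, (name,))
    if i < len(children) and children[i][0] == name:
        return children[i][1]
    return None


def stream_offset_for(index, root_offset, path):
    """Follow a list of child names down from the root entry and return
    the SAX-stream offset of the element reached, or None."""
    entry = root_offset
    for name in path:
        entry = find_child(index, entry, name)
        if entry is None:
            return None
    return index[entry].stream_offset


# Toy index: keys stand in for file offsets of index entries.
index = {
    0: IndexEntry(0, [("body", 40), ("head", 20)]),
    20: IndexEntry(12),
    40: IndexEntry(96, [("p", 60)]),
    60: IndexEntry(130),
}
offset = stream_offset_for(index, 0, ["body", "p"])
```

A streaming reader never needs any of this and can ignore the index;
a seeking reader does one binary search per path step and then jumps
straight to the returned offset in the SAX stream.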

> Regards,
> Peter Jacobi

ABS

-- 
                               Alaric B. Snell
 http://www.alaric-snell.com/  http://RFC.net/  http://www.warhead.org.uk/
   Any sufficiently advanced technology can be emulated in software