[Xml-bin] Some central design issues
Al Snell
alaric@alaric-snell.com
Thu, 12 Apr 2001 10:24:24 +0100 (BST)
On Thu, 12 Apr 2001, Peter Jacobi wrote:
> For example binary content can be seen as an alternate external
> representation of base64binary. When accessing with standard DOM
> functions, it will be seen base64 encoded.
Yes, I was planning on making all the extensions low-profile, like
that. Anything that can't be easily done like that can be mapped to
something like:
<xmlbin:binary>...base64...</xmlbin:binary>
This is needed to round-trip binXML through textXML; otherwise the
binary bits would come out as plain base64 strings after going
through XML :-(
It depends on whether we have access to a schema. If there is a schema,
then that's the way to specify that an element is base64 binary or a
float rather than a string.
If there is not a schema, we need to embed that information.
There are maybe several possible translations:
1) Unschema'd tXML into bXML - all CDATA stored as strings unless they
have binxml:type attributes specifying that they are to be read as floats
or binary in base64, etc.
2) bXML into XML, sans schema - all doubles, binaries, etc. stored as
CDATAs with appropriate textual representations; binxml:type attributes
added to all elements with CDATA, specifying its type. If there are
multiple values in an element we give it binxml:type "compound" and then
list <binxml:value type="..."> elements inside it.
3) Schema'd tXML into bXML and vice versa - CDATA handled with knowledge of
its intended type from the schema
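
As a rough illustration of translations 1 and 2, here is a minimal Python sketch of how a type attribute might drive the conversion in both directions. The namespace URI, the type names "float" and "base64", and the helper names decode_value and encode_value are all assumptions made for the example; nothing here is a settled part of the proposal.

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the real one has not been decided.
BINXML_NS = "urn:example:binxml"
TYPE_ATTR = "{%s}type" % BINXML_NS


def decode_value(elem):
    """tXML -> typed value: read the element text according to binxml:type.

    Elements without the attribute are treated as plain strings.
    """
    text = elem.text or ""
    kind = elem.get(TYPE_ATTR, "string")
    if kind == "float":
        return float(text)
    if kind == "base64":
        return base64.b64decode(text)
    return text


def encode_value(tag, value):
    """Typed value -> tXML: write a textual form plus a binxml:type attribute."""
    elem = ET.Element(tag)
    if isinstance(value, float):
        elem.set(TYPE_ATTR, "float")
        elem.text = repr(value)
    elif isinstance(value, bytes):
        elem.set(TYPE_ATTR, "base64")
        elem.text = base64.b64encode(value).decode("ascii")
    else:
        elem.text = str(value)
    return elem
```

Serialising the result of encode_value and parsing it back with decode_value should return the original float or byte string rather than a bare string, which is the round-trip property discussed above.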
> 3. Seekability
>
> This would be key benefit of a binary external representation. We should
> look at how it's done in MPEG7.
Mmmm. Seekability, if implemented naively, can hamper streaming - what did
you think of my proposal for combining the two by having a streamed
SAX-event representation followed by an optional index?
Suggested index format: For each element, we store a pointer to its byte
offset in the SAX-stream, and a list of its child elements, sorted. The
sorted list enables a binary search when scanning the index. Each child
element has the file offset of the index entry for that element.
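
To make the lookup concrete, here is a small in-memory Python sketch of that index. It assumes the child list is sorted by element name and uses dictionary keys to stand in for file offsets of index entries; IndexEntry, find_child and resolve are hypothetical names, and the on-disk layout is left open.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    # Byte offset of the element's start event in the SAX stream.
    stream_offset: int
    # (child name, offset of the child's index entry), sorted by name.
    children: list = field(default_factory=list)


def find_child(index, entry_offset, name):
    """Binary-search one entry's sorted child list.

    Returns the offset of the child's index entry, or None if absent.
    With duplicate names, the first match is returned.
    """
    children = index[entry_offset].children
    i = bisect.bisect_left(children, (name,))
    if i < len(children) and children[i][0] == name:
        return children[i][1]
    return None


def resolve(index, root_offset, path):
    """Follow a list of child names from the root; return the stream offset."""
    offset = root_offset
    for name in path:
        offset = find_child(index, offset, name)
        if offset is None:
            return None
    return index[offset].stream_offset
```

A reader that only wants to stream can ignore the index entirely; one that wants to seek calls something like resolve(index, 0, ["body", "p"]) and then jumps to the returned offset in the SAX stream.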
> Regards,
> Peter Jacobi
ABS
--
Alaric B. Snell
http://www.alaric-snell.com/ http://RFC.net/ http://www.warhead.org.uk/
Any sufficiently advanced technology can be emulated in software