[Xml-bin] Some central design issues

Stephen D. Williams sdw@lig.net
Fri, 13 Apr 2001 09:14:38 -0400


Al Snell wrote:
> 
> On Fri, 13 Apr 2001, Stephen D. Williams wrote:
> 
> > > I can make a format that's great for streaming and random access reads,
> > > but in-place modification is an arse :-(
> >
> > Yes, it is!  Solving it in a reasonably optimal way is highly beneficial
> > everywhere else.
> 
> Is it worth having separate streaming and random-access formats?
> 
> We could start with a streaming one (that works like current XML as far as
> the programmer needs to look) then work on a random-access one (that
> efficiently implements DOM, but requires a DOM traversal to generate SAX)?

We can also start with a random-access format and work on streaming it
later...  That's currently my choice, since I'm looking for a big win in
random access.  Streaming is comparatively easy, and there is much less
to be gained from it than from modifiable random access.  Most
applications effectively use DOM semantics, even many of those that use
SAX.
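
To illustrate what "DOM semantics on top of SAX" tends to look like, here
is a minimal Python sketch using the standard xml.sax module.  The names
TreeBuilder and parse_to_tree and the dict-based node layout are made up
for this example; the point is only that a typical SAX consumer ends up
rebuilding a tree in memory before doing any real work.

```python
import xml.sax


class TreeBuilder(xml.sax.ContentHandler):
    """Hypothetical SAX handler that accumulates a DOM-like tree of dicts."""

    def __init__(self):
        super().__init__()
        self.root = None
        self.stack = []  # currently open elements, innermost last

    def startElement(self, name, attrs):
        node = {"name": name, "attrs": dict(attrs.items()),
                "children": [], "text": ""}
        if self.stack:
            self.stack[-1]["children"].append(node)
        else:
            self.root = node
        self.stack.append(node)

    def characters(self, content):
        # Text may arrive in several chunks, so append rather than assign.
        if self.stack:
            self.stack[-1]["text"] += content

    def endElement(self, name):
        self.stack.pop()


def parse_to_tree(xml_text):
    """Parse a string with SAX events and return the tree that was built."""
    handler = TreeBuilder()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.root
```

Once the tree exists, the application navigates it at random, which is
the access pattern a random-access format would serve directly.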

> 
> The streaming one is easier to design, and is useful for many applications
> (such as SOAP).
> 
> Note that many contemporary binary formats do not support in-place
> editing; the common access pattern for data is generation (as a linear
> output stream) then reading (either the whole thing as an input stream or
> random access reading making use of some form of indexing information).

Exactly what I'm trying to address.  It's hard to solve this problem, and
I have had a hard time finding a project that has tried.  I know enough
about the solution to know that it will work, although exact performance
is still just an estimate.

> In place update does involve a big jump in complexity that's not necessary
> for all applications (although it's crucial for some). Do you think this
> justifies making it a separate format (although there is much scope for
> sharing between them in many ways)?

I believe that the complexity is only present in the code that tries to
optimize in-place modification.  The same format can be 'read' with
fairly low complexity.  This is similar to the difference between
traversing a B-Tree vs. trying to keep it balanced in the face of random
activity.
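
As a rough illustration of that asymmetry (this is a generic textbook
B-Tree, not the proposed binary XML format), here is a minimal Python
sketch.  The names Node, search, insert and the minimum degree T = 2 are
assumptions chosen for the example, and keys are assumed to be distinct.
The read path is a plain descent; all of the restructuring logic lives in
the write path.

```python
from bisect import bisect_left

T = 2  # assumed minimum degree: a node holds at most 2*T - 1 keys


class Node:
    def __init__(self, leaf=True):
        self.keys = []
        self.children = []
        self.leaf = leaf


def search(node, key):
    """Read path: descend from the root; nothing is ever restructured."""
    while True:
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if node.leaf:
            return False
        node = node.children[i]


def _split_child(parent, i):
    """Split the full child at index i, moving its median key into parent."""
    full = parent.children[i]
    right = Node(leaf=full.leaf)
    median = full.keys[T - 1]
    right.keys = full.keys[T:]
    full.keys = full.keys[:T - 1]
    if not full.leaf:
        right.children = full.children[T:]
        full.children = full.children[:T]
    parent.keys.insert(i, median)
    parent.children.insert(i + 1, right)


def insert(root, key):
    """Write path: split full nodes on the way down; returns the new root."""
    if len(root.keys) == 2 * T - 1:
        new_root = Node(leaf=False)
        new_root.children.append(root)
        _split_child(new_root, 0)
        root = new_root
    node = root
    while not node.leaf:
        i = bisect_left(node.keys, key)
        if len(node.children[i].keys) == 2 * T - 1:
            _split_child(node, i)
            if key > node.keys[i]:
                i += 1
        node = node.children[i]
    node.keys.insert(bisect_left(node.keys, key), key)
    return root
```

A reader only needs search(); a writer needs everything else.  Deletion,
which is omitted here, would add still more code on the write side only.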

> ABS
> 
> --
>                                Alaric B. Snell
>  http://www.alaric-snell.com/  http://RFC.net/  http://www.warhead.org.uk/
>    Any sufficiently advanced technology can be emulated in software

sdw
-- 
sdw@lig.net  http://sdw.st
Stephen D. Williams
43392 Wayside Cir,Ashburn,VA 20147-4622 703-724-0118W 703-995-0407Fax 
Dec2000