[Xml-bin] Some central design issues

Al Snell alaric@alaric-snell.com
Fri, 13 Apr 2001 18:56:12 +0100 (BST)


On Fri, 13 Apr 2001, Stephen D. Williams wrote:

> Looks like some very good ideas.  I'll digest it more later.

It's been a while since I last designed a filesystem, so there may be a
few gotchas I've forgotten.

> One important choice is whether the file uses single (UTF-8) or
> double-byte Unicode for actual string data.  I always lean toward UTF-8
> but it does increase complexity when you actually have double-byte
> values.  Java's pervasive use of double-byte makes this a more evenly
> balanced decision for me.

Yeah :-(

My ideas about how Unicode should have been done are a different story :-)

> Another goal is to make searching ultra-fast.

Is regenerating a tree index an acceptable approach?

> Modification has to be fast for 'appending'.  For random changes, some
> 'startup' penalty is assumed that should be amortized if many changes
> are made.  The latter point allows element values to be 'variables' used
> in normal processing.  It's valid to incur some 'warm-up' as long as
> subsequent changes are usually cheap.

Caching. Especially write caching.
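
A minimal sketch of what write caching buys here, assuming a dict stands in for the slow on-disk structure. The class name `WriteBackCache` and its methods are hypothetical, not part of any proposed format: repeated changes to the same element only touch memory, and the expensive update happens once per flush.

```python
class WriteBackCache:
    """Hypothetical write-back cache in front of a slow key/value store."""

    def __init__(self, backing):
        self.backing = backing   # dict-like store, standing in for the file
        self.dirty = {}          # pending writes, held in memory only
        self.flushes = 0         # how many times the slow store was touched

    def write(self, key, value):
        # Cheap: later writes to the same key just replace the pending one.
        self.dirty[key] = value

    def read(self, key):
        # Pending writes win over whatever is in the backing store.
        if key in self.dirty:
            return self.dirty[key]
        return self.backing[key]

    def flush(self):
        # One batched update pays the cost for all the changes made so far.
        if self.dirty:
            self.backing.update(self.dirty)
            self.dirty.clear()
            self.flushes += 1
```

An element used as a 'variable' can then be rewritten many times between flushes at memory speed.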

> Support for repeated tags is required; ordering is assumed to be
> important, and 'data' is treated as pseudo-tagged to allow intermingling
> of data and tags within an XML 'object'.  It seems that a good default
> rule is that presence of non-space data automatically triggers verbatim
> mode for a tag data space.  Probably all space should be, by default,
> treated as significant to allow totally verbatim transformations.

That seems to be the way people are going.

> Your thinking of grouping like fixed sized data items together is good
> and will improve efficiency, but that's really a 'compression' level
> problem. 

It also simplifies free-space management, which is trivial for fixed-size
lumps.
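
To illustrate why equal-sized items make free-space management trivial, here is a rough sketch assuming an in-memory `bytearray` as the block. `FixedSlotPool` and its method names are made up for the example. Because any free slot fits any item, the whole free-space map is a stack of slot numbers, with no fragmentation and no best-fit search.

```python
class FixedSlotPool:
    """Hypothetical allocator for equal-sized slots inside one block."""

    def __init__(self, slot_size, slot_count):
        self.slot_size = slot_size
        self.data = bytearray(slot_size * slot_count)
        # Every slot starts free; lowest-numbered slot is handed out first.
        self.free = list(range(slot_count - 1, -1, -1))

    def alloc(self, payload):
        if len(payload) != self.slot_size:
            raise ValueError("payload must be exactly one slot long")
        if not self.free:
            raise MemoryError("pool is full")
        slot = self.free.pop()
        start = slot * self.slot_size
        self.data[start:start + self.slot_size] = payload
        return slot

    def release(self, slot):
        # Freeing is just pushing the slot number back on the stack.
        self.free.append(slot)

    def read(self, slot):
        start = slot * self.slot_size
        return bytes(self.data[start:start + self.slot_size])
```

Variable-sized items would need a size-aware free list and some way to coalesce neighbours, which is where the complexity comes from.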

> 'Elastic Memory' is simply the idea that within a block, you can have
> holes that are hidden from higher layers but are used, and created, by
> modification to the virtually flat and contiguous memory space. 

I once designed a filesystem that supported this directly, since so many
files need it. When you do a write, you say whether the byte sequence
should overwrite or be inserted, and you can delete ranges of bytes too.
Moving bytes within the file was an intrinsic operation rather than a read
followed by a write, so that the controllers could optimise disk->disk
transfers.
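
A minimal sketch of those write semantics, assuming a plain in-memory `bytearray` instead of disk blocks. The class `ElasticFile` and its method names are hypothetical and say nothing about how the original filesystem was implemented; the point is only the shape of the interface.

```python
class ElasticFile:
    """Hypothetical byte string with overwrite, insert, delete and move."""

    def __init__(self, data=b""):
        self.buf = bytearray(data)

    def overwrite(self, pos, data):
        # Replace len(data) bytes in place; the file length is unchanged
        # unless the write runs past the current end.
        self.buf[pos:pos + len(data)] = data

    def insert(self, pos, data):
        # Open a gap at pos; everything after it shifts up.
        self.buf[pos:pos] = data

    def delete(self, pos, length):
        # Close a gap; everything after it shifts down.
        del self.buf[pos:pos + length]

    def move(self, src, length, dst):
        # One intrinsic operation instead of read-then-write. dst is an
        # offset in the file as it looks after the bytes have been removed.
        chunk = self.buf[src:src + length]
        del self.buf[src:src + length]
        self.buf[dst:dst] = chunk
```

A real implementation would avoid physically shifting the tail on every insert, for example by leaving holes inside blocks as described in the quoted 'Elastic Memory' paragraph.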

It also supported bookmarks - you drop a bookmark at a file position, and
get a handle returned. A position in the file is specified as a bookmark
handle plus or minus a number of bytes. Bookmarks move around when bytes
are inserted or deleted.
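
The bookmark behaviour can be sketched roughly as follows, again over an in-memory `bytearray`. `BookmarkedBytes`, `drop_bookmark` and `resolve` are hypothetical names, and two details are assumptions of this sketch rather than anything stated above: a bookmark sitting exactly at an insertion point moves with the bytes after it, and a bookmark inside a deleted range collapses to the start of that range.

```python
import itertools


class BookmarkedBytes:
    """Hypothetical byte string whose bookmarks track inserts and deletes."""

    def __init__(self, data=b""):
        self.buf = bytearray(data)
        self.marks = {}                    # handle -> current offset
        self._handles = itertools.count(1)

    def drop_bookmark(self, pos):
        handle = next(self._handles)
        self.marks[handle] = pos
        return handle

    def resolve(self, handle, delta=0):
        # A position is a bookmark handle plus or minus a number of bytes.
        return self.marks[handle] + delta

    def insert(self, pos, data):
        self.buf[pos:pos] = data
        for handle, off in self.marks.items():
            if off >= pos:                 # assumption: ties move forward
                self.marks[handle] = off + len(data)

    def delete(self, pos, length):
        del self.buf[pos:pos + length]
        for handle, off in self.marks.items():
            if off >= pos + length:
                self.marks[handle] = off - length
            elif off > pos:                # assumption: collapse to range start
                self.marks[handle] = pos
```

Scanning every bookmark on each edit is linear in the number of bookmarks; an index keyed by offset would be the obvious refinement.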

The disk driver presented the whole disk as a single file with the
semantics described above.

The filesystem then basically mapped a lot of files down into that one
file, and could be used recursively if you so wished.

If you have efficient low-level operations to model a byte string that can
be chopped about, the rest gets a lot easier :-)

> sdw

ABS

-- 
                               Alaric B. Snell
 http://www.alaric-snell.com/  http://RFC.net/  http://www.warhead.org.uk/
   Any sufficiently advanced technology can be emulated in software