[Box Backup-dev] Box Backup 0.20 redesign
Ben Summers
boxbackup-dev@fluffy.co.uk
Sat, 13 Jan 2007 13:31:31 +0000
I know I promised a few of you that I'd put up notes on a redesign.
But I haven't. I've been trying to put together something which
* Signs directories, and handles them efficiently on the client and
server
* Obscures the length of files
* Stops an attacker determining the directory structure
* Allows random access
* Operates efficiently on the server for reads, updates and
housekeeping
* Is network efficient
Now the difficulty I had was that you needed to be able to delete any
arbitrary file from the server, which does require leaking
information about files and directories to the server. So... what if
you were to remove this requirement? Instead, bbackupd always writes
a "snapshot", which is a consistent image of the fs at the point in
time of the backup. You can only delete entire snapshots.
So, with this in mind, the server has a reduced set of commands:
* List snapshots
* Read block ID x from snapshot y
* Start new snapshot T
* Write block ID x to snapshot T
* Update block ID x in snapshot T
* Delete block ID x from snapshot T
* Commit snapshot T
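As a rough sketch of that command set (Python, purely illustrative): the class and method names below are made up for this note and are not part of Box Backup. It assumes a new snapshot starts from the contents of the latest committed one, and it keeps every snapshot as a full block map for simplicity rather than as diffs.

```python
class SnapshotServer:
    """Toy in-memory model of the reduced command set (hypothetical API)."""

    def __init__(self):
        self._snapshots = {}   # snapshot id -> {block id: opaque bytes}
        self._order = []       # committed snapshot ids, oldest first
        self._pending = None   # (snapshot id, working block map) or None
        self._next_id = 1

    def list_snapshots(self):
        return list(self._order)

    def read_block(self, block_id, snapshot_id):
        return self._snapshots[snapshot_id][block_id]

    def start_snapshot(self):
        if self._pending is not None:
            raise RuntimeError("a snapshot is already in progress")
        # Assumption: the new snapshot begins as a copy of the latest one.
        base = self._snapshots[self._order[-1]] if self._order else {}
        snapshot_id = self._next_id
        self._next_id += 1
        self._pending = (snapshot_id, dict(base))
        return snapshot_id

    def _working(self, snapshot_id):
        if self._pending is None or self._pending[0] != snapshot_id:
            raise KeyError("snapshot %r is not open for writing" % snapshot_id)
        return self._pending[1]

    def write_block(self, block_id, snapshot_id, data):
        blocks = self._working(snapshot_id)
        if block_id in blocks:
            raise KeyError("block %r already exists" % block_id)
        blocks[block_id] = data

    def update_block(self, block_id, snapshot_id, data):
        blocks = self._working(snapshot_id)
        if block_id not in blocks:
            raise KeyError("block %r does not exist" % block_id)
        blocks[block_id] = data

    def delete_block(self, block_id, snapshot_id):
        del self._working(snapshot_id)[block_id]

    def commit_snapshot(self, snapshot_id):
        blocks = self._working(snapshot_id)
        self._snapshots[snapshot_id] = blocks
        self._order.append(snapshot_id)
        self._pending = None
```

The server only ever sees block IDs and opaque bytes, so nothing in this interface needs to know what a file or directory is.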
A block is a chunk of data, compressed and encrypted. We won't assume
they're constant size.
A snapshot is stored as a list of new, updated and deleted blocks.
Just as with the existing file diffing, we do that in reverse order,
so the latest version is complete, and the previous snapshots are
diffs.
We can delete any snapshot in the sequence, with a bit of care.
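To make the "bit of care" concrete, here is one possible way to model it (a Python sketch under my own assumptions, not a worked-out design). The latest snapshot is a full block map, and each older snapshot is a reverse diff from its newer neighbour. `ReverseDiffStore`, `ABSENT` and the method names are hypothetical, and `commit` takes the full new state just to keep the example short.

```python
ABSENT = object()  # marks "this block did not exist in the older snapshot"


class ReverseDiffStore:
    def __init__(self):
        self.latest = {}  # full block map of the newest snapshot
        self.diffs = []   # diffs[i] turns snapshot i+1 back into snapshot i
        self.count = 0    # number of snapshots held

    def commit(self, new_blocks):
        """Make new_blocks the latest snapshot, keeping a reverse diff."""
        if self.count > 0:
            old = self.latest
            diff = {}
            for key in set(old) | set(new_blocks):
                if old.get(key, ABSENT) != new_blocks.get(key, ABSENT):
                    diff[key] = old.get(key, ABSENT)
            self.diffs.append(diff)
        self.latest = dict(new_blocks)
        self.count += 1

    @staticmethod
    def _apply(state, diff):
        for key, old_value in diff.items():
            if old_value is ABSENT:
                state.pop(key, None)
            else:
                state[key] = old_value

    def read(self, index):
        """Rebuild snapshot `index` (0 is the oldest) from the latest one."""
        if not 0 <= index < self.count:
            raise IndexError(index)
        state = dict(self.latest)
        for diff in reversed(self.diffs[index:]):
            self._apply(state, diff)
        return state

    def delete(self, index):
        if not 0 <= index < self.count:
            raise IndexError(index)
        if index == self.count - 1:
            # Deleting the latest: materialise its predecessor, if any.
            if self.diffs:
                self._apply(self.latest, self.diffs.pop())
            else:
                self.latest = {}
        elif index == 0:
            # Deleting the oldest: nothing depends on its diff.
            self.diffs.pop(0)
        else:
            # Deleting from the middle: fold this snapshot's diff into the
            # next older one. The older diff is applied last, so it wins.
            merged = dict(self.diffs[index])
            merged.update(self.diffs[index - 1])
            self.diffs[index - 1:index + 1] = [merged]
        self.count -= 1
```

The middle case is the only one needing care: the older neighbour's diff has to be rewritten so it applies against the newer neighbour instead of the deleted snapshot.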
Over the top of this, we implement a file-system-like thing entirely
in the client. A file is a list of blocks, as now, but the blocks are
not stored as one contiguous object. Each block of the file is a
separate block in the snapshot. This allows us to do the diffing, just
as we do now, only slightly differently.
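A minimal client-side sketch of "a file is a list of blocks" (Python, illustrative only). The names `ClientFileStore`, `store_file`, `read_file` and `CHUNK_SIZE` are invented here. It cuts files at fixed offsets and matches unchanged chunks by SHA-256, which is a simplification and not the diffing bbackupd does today, and it leaves out encryption entirely, using only zlib compression as a stand-in for the real block encoding.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # hypothetical; real blocks need not be constant size


def split_chunks(data, size=CHUNK_SIZE):
    return [data[i:i + size] for i in range(0, len(data), size)]


class ClientFileStore:
    def __init__(self):
        self.blocks = {}       # block id -> compressed bytes (stands in for the snapshot)
        self._by_digest = {}   # chunk digest -> block id, kept on the client only
        self._next_id = 1

    def store_file(self, data):
        """Return (list of block ids for the file, number of new blocks written)."""
        ids = []
        written = 0
        for chunk in split_chunks(data):
            digest = hashlib.sha256(chunk).digest()
            block_id = self._by_digest.get(digest)
            if block_id is None:
                # Unseen content: write it as a new, separate block.
                block_id = self._next_id
                self._next_id += 1
                self.blocks[block_id] = zlib.compress(chunk)
                self._by_digest[digest] = block_id
                written += 1
            ids.append(block_id)
        return ids, written

    def read_file(self, ids):
        return b"".join(zlib.decompress(self.blocks[i]) for i in ids)
```

Storing a second version of a file where only one chunk changed writes one new block and reuses the IDs of the others, and any single block can be fetched on its own for random access.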
I wonder if we can use existing FS code from anywhere?
So what can an attacker find out? They could probably identify which
blocks are directory blocks, because of access patterns from the
client, but if we make dirs have potentially many blocks, or even
stuff them with dummy data, we can reduce the amount of information
an attacker has. And to be extra paranoid, we could interleave access
to throw them off the scent.
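One simple way the dummy-data idea could look (a Python sketch, my assumption rather than a settled design): pad each payload up to the next multiple of a bucket size before it is compressed and encrypted, with a length prefix so the client can strip the padding again. `BUCKET`, `pad` and `unpad` are hypothetical names.

```python
import os
import struct

BUCKET = 1024  # hypothetical granularity; larger buckets hide more, cost more


def pad(payload, bucket=BUCKET):
    """Prefix the true length, then fill with random bytes to a bucket boundary."""
    header = struct.pack(">I", len(payload))
    total = len(header) + len(payload)
    padded_len = -(-total // bucket) * bucket  # round up to a multiple of bucket
    return header + payload + os.urandom(padded_len - total)


def unpad(padded):
    (length,) = struct.unpack(">I", padded[:4])
    return padded[4:4 + length]
```

This only hides anything if it happens on the client before encryption, so the server sees blocks whose sizes fall into a small number of buckets.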
This also simplifies the code and makes the crypto easier to audit,
as there is one central place where a block is encrypted and 'signed'.
So we get much of the strong crypto I want for free.
I wonder if we could make a server which never needs to be upgraded?
That would solve the upgrade cycle hassles.
Ben