[Box Backup-dev] Box Backup 0.20 redesign

Sat, 13 Jan 2007 13:31:31 +0000

I know I promised a few of you that I'd put up notes on a redesign.
But I haven't. I've been trying to put together a design which:

* Signs directories, and handles them efficiently on the client and server

* Obscures the length of files

* Stops an attacker determining the directory structure

* Allows random access

* Operates efficiently on the server, for reads, updates and housekeeping

* Uses the network efficiently

The difficulty I had was that you need to be able to delete any
arbitrary file from the server, and that requires leaking
information about files and directories to the server. So... what if
you were to remove this requirement? Instead, bbackupd always writes
a "snapshot", which is a consistent image of the filesystem at the
time of the backup. You can only delete entire snapshots.

So, with this in mind, the server has a reduced set of commands:

* List snapshots

* Read block ID x from snapshot y

* Start new snapshot T

* Write block ID x to snapshot T

* Update block ID x in snapshot T

* Delete block ID x from snapshot T

* Commit snapshot T
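To make the reduced command set concrete, here is a minimal Python sketch of it as an in-memory server. The class and method names (`SnapshotServer`, `write_block` and so on) are hypothetical, not an existing Box Backup API, and the sketch assumes snapshots are numbered sequentially and only one snapshot can be open for writing at a time. Storage is a naive full copy per snapshot, just to show the interface.

```python
from typing import Dict, List, Optional


class SnapshotServer:
    """Hypothetical in-memory model of the reduced server command set.

    Blocks are opaque bytes that the client has already compressed and
    encrypted. Each committed snapshot is kept as a naive full copy of its
    block map; reverse-diff storage is not modelled here.
    """

    def __init__(self) -> None:
        self._committed: Dict[int, Dict[int, bytes]] = {}
        self._pending: Optional[Dict[int, bytes]] = None
        self._pending_id: Optional[int] = None

    def list_snapshots(self) -> List[int]:
        return sorted(self._committed)

    def read_block(self, snapshot: int, block_id: int) -> bytes:
        return self._committed[snapshot][block_id]

    def start_snapshot(self) -> int:
        if self._pending is not None:
            raise RuntimeError("a snapshot is already open")
        # The new snapshot starts as a copy of the latest committed one,
        # so the client only sends the blocks that changed.
        latest = max(self._committed, default=0)
        self._pending = dict(self._committed.get(latest, {}))
        self._pending_id = latest + 1
        return self._pending_id

    def _open(self, snapshot: int) -> Dict[int, bytes]:
        if self._pending is None or snapshot != self._pending_id:
            raise RuntimeError("snapshot %d is not open for writing" % snapshot)
        return self._pending

    def write_block(self, snapshot: int, block_id: int, data: bytes) -> None:
        blocks = self._open(snapshot)
        if block_id in blocks:
            raise KeyError("block %d already exists" % block_id)
        blocks[block_id] = data

    def update_block(self, snapshot: int, block_id: int, data: bytes) -> None:
        blocks = self._open(snapshot)
        if block_id not in blocks:
            raise KeyError("block %d does not exist" % block_id)
        blocks[block_id] = data

    def delete_block(self, snapshot: int, block_id: int) -> None:
        del self._open(snapshot)[block_id]

    def commit_snapshot(self, snapshot: int) -> None:
        self._committed[snapshot] = self._open(snapshot)
        self._pending = None
        self._pending_id = None


server = SnapshotServer()
t = server.start_snapshot()
server.write_block(t, 1, b"root directory")
server.write_block(t, 2, b"file data")
server.commit_snapshot(t)

t = server.start_snapshot()
server.update_block(t, 2, b"new file data")
server.commit_snapshot(t)

print(server.list_snapshots())  # [1, 2]
print(server.read_block(1, 2))  # b'file data'
```

Note that nothing here tells the server what a block means; it only ever sees block IDs and opaque bytes.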

A block is a chunk of data, compressed and encrypted. We won't assume
blocks are of constant size.

A snapshot is stored as a list of new, updated and deleted blocks.
Just as with the file diffing, the diffs run in reverse order: the
latest snapshot is stored complete, and each previous snapshot is a
diff against the one after it.
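A Python sketch of this reverse-diff storage, assuming a snapshot can be modelled as a map from block ID to block contents, and a diff as a map from block ID to the older contents (with `None` meaning the block did not exist yet). `SnapshotStore`, `reverse_diff` and `apply_diff` are illustrative names only. Committing keeps the newest image complete and records how to step back to the previous one; reading an old snapshot walks the diffs backwards from the latest.

```python
from typing import Dict, List, Optional

Blocks = Dict[int, bytes]                  # block ID -> block contents
ReverseDiff = Dict[int, Optional[bytes]]   # None = block absent in the older snapshot


def reverse_diff(newer: Blocks, older: Blocks) -> ReverseDiff:
    """Changes that turn the `newer` block map back into `older`."""
    diff: ReverseDiff = {}
    for block_id, data in older.items():
        if newer.get(block_id) != data:
            diff[block_id] = data  # updated or deleted since `older`
    for block_id in newer:
        if block_id not in older:
            diff[block_id] = None  # new in `newer`
    return diff


def apply_diff(blocks: Blocks, diff: ReverseDiff) -> Blocks:
    result = dict(blocks)
    for block_id, data in diff.items():
        if data is None:
            result.pop(block_id, None)
        else:
            result[block_id] = data
    return result


class SnapshotStore:
    def __init__(self) -> None:
        self.latest: Blocks = {}            # newest snapshot, stored complete
        self.diffs: List[ReverseDiff] = []  # diffs[i] turns snapshot i+1 into i
        self.count = 0

    def commit(self, blocks: Blocks) -> None:
        if self.count > 0:
            self.diffs.append(reverse_diff(blocks, self.latest))
        self.latest = dict(blocks)
        self.count += 1

    def reconstruct(self, index: int) -> Blocks:
        """Rebuild snapshot `index` (0 = oldest) from the complete latest image."""
        blocks = dict(self.latest)
        for diff in reversed(self.diffs[index:]):  # newest diff first
            blocks = apply_diff(blocks, diff)
        return blocks


store = SnapshotStore()
store.commit({1: b"a", 2: b"b"})
store.commit({1: b"a", 2: b"B", 3: b"c"})
store.commit({1: b"a", 3: b"c"})
print(store.reconstruct(0))  # {1: b'a', 2: b'b'}
```

The trade-off is the usual one for reverse diffs: restoring the latest snapshot is cheap, and older snapshots cost more the further back they are.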

We can delete any snapshot in the sequence, with a bit of care.
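The "bit of care" can be sketched in Python, assuming a model with a complete image of the newest snapshot plus a list of reverse diffs, each mapping block ID to the older contents (`None` for "did not exist"). Deleting the newest or the oldest snapshot just drops one diff; deleting one in the middle merges its two neighbouring diffs into a single diff that jumps straight over it. `delete_snapshot` and `compose` are hypothetical names.

```python
from typing import Dict, List, Optional

Blocks = Dict[int, bytes]                  # block ID -> block contents
ReverseDiff = Dict[int, Optional[bytes]]   # None = block absent in the older snapshot


def apply_diff(blocks: Blocks, diff: ReverseDiff) -> Blocks:
    result = dict(blocks)
    for block_id, data in diff.items():
        if data is None:
            result.pop(block_id, None)
        else:
            result[block_id] = data
    return result


def compose(first: ReverseDiff, then: ReverseDiff) -> ReverseDiff:
    """A single diff equivalent to applying `first` and then `then`."""
    merged = dict(first)
    merged.update(then)  # for each block, the later operation wins
    return merged


def delete_snapshot(latest: Blocks, diffs: List[ReverseDiff], k: int) -> Blocks:
    """Delete snapshot k (0 = oldest), editing `diffs` in place.

    diffs[i] turns snapshot i+1 back into snapshot i. Returns the complete
    image of the newest remaining snapshot.
    """
    n = len(diffs) + 1
    if n == 1:
        raise ValueError("refusing to delete the only snapshot")
    if not 0 <= k < n:
        raise IndexError(k)
    if k == n - 1:
        # Newest: rebuild the next-newest image and drop the diff that led to it.
        return apply_diff(latest, diffs.pop())
    if k == 0:
        # Oldest: no other snapshot depends on its diff.
        del diffs[0]
        return latest
    # Middle: bridge k+1 -> k-1 by applying diffs[k] and then diffs[k-1].
    diffs[k - 1] = compose(diffs[k], diffs[k - 1])
    del diffs[k]
    return latest


latest = {1: b"a", 3: b"c"}           # snapshot 2, stored complete
diffs = [{2: b"b", 3: None},          # snapshot 1 -> snapshot 0
         {2: b"B"}]                   # snapshot 2 -> snapshot 1
latest = delete_snapshot(latest, diffs, 1)
print(diffs)                          # [{2: b'b', 3: None}]
print(apply_diff(latest, diffs[0]))   # {1: b'a', 2: b'b'}
```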

On top of this, we implement a filesystem-like layer entirely in
the client. A file is a list of blocks, stored much as now, except
that each block of the file is a separate block in the snapshot
rather than part of one contiguous object. This lets us do the
diffing much as we do now, only slightly differently.
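A minimal Python sketch of how the client-side layer might decide which block commands to send when a file changes. To keep it short, it assumes fixed-size chunks compared position by position against hashes kept in the file's record. That is a simplification, not the existing Box Backup diff algorithm: an insertion near the start of a file would make every later chunk look changed. `diff_file` and the record layout are hypothetical.

```python
import hashlib
from typing import List, Tuple

FileRecord = List[Tuple[int, str]]  # (block ID, SHA-256 of the plaintext chunk)


def split_blocks(data: bytes, size: int) -> List[bytes]:
    return [data[i:i + size] for i in range(0, len(data), size)]


def diff_file(record: FileRecord, new_data: bytes, next_id: int, size: int = 4096):
    """Work out which block commands a new version of a file needs.

    Returns (new_record, ops, next_id), where ops is a list of
    (command, block ID, data) for the server's write/update/delete commands.
    """
    chunks = split_blocks(new_data, size)
    ops: List[Tuple[str, int, bytes]] = []
    new_record: FileRecord = []
    for pos, chunk in enumerate(chunks):
        digest = hashlib.sha256(chunk).hexdigest()
        if pos < len(record):
            block_id, old_digest = record[pos]
            if old_digest != digest:
                ops.append(("update", block_id, chunk))
        else:
            block_id = next_id  # file grew: allocate a fresh block ID
            next_id += 1
            ops.append(("write", block_id, chunk))
        new_record.append((block_id, digest))
    for block_id, _ in record[len(chunks):]:
        ops.append(("delete", block_id, b""))  # file shrank
    return new_record, ops, next_id


record, ops, next_id = diff_file([], b"A" * 10, next_id=100, size=4)
print([(op, bid) for op, bid, _ in ops])  # [('write', 100), ('write', 101), ('write', 102)]
record, ops, next_id = diff_file(record, b"A" * 4 + b"B" * 4, next_id, size=4)
print([(op, bid) for op, bid, _ in ops])  # [('update', 101), ('delete', 102)]
```

The file record itself would also live in blocks on the server, so only the client can map block IDs back to files.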

I wonder if we can use existing FS code from anywhere?

So what can an attacker find out? They could probably identify which
blocks are directory blocks from the client's access patterns, but if
we let directories span potentially many blocks, or even pad them with
dummy data, we reduce the information an attacker has. And to be extra
paranoid, we could interleave accesses to throw them off the scent.
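One way to "stuff them with dummy data", sketched in Python: pad each block's payload up to a size bucket, so that blocks of many different real sizes fall into a few sizes. The power-of-two buckets, the 1024-byte minimum and the 4-byte length prefix are all assumptions. The padding would have to be added after compression and before encryption; otherwise it would simply be compressed away, or be recognisable to the server.

```python
import struct

MIN_BUCKET = 1024  # assumption: smallest padded block size in bytes


def pad_to_bucket(payload: bytes, min_bucket: int = MIN_BUCKET) -> bytes:
    """Prefix the true length, then zero-pad up to the next power-of-two bucket."""
    framed = struct.pack(">I", len(payload)) + payload
    bucket = min_bucket
    while bucket < len(framed):
        bucket *= 2
    return framed + b"\x00" * (bucket - len(framed))


def unpad(padded: bytes) -> bytes:
    """Recover the original payload using the 4-byte length prefix."""
    (length,) = struct.unpack(">I", padded[:4])
    return padded[4:4 + length]


padded = pad_to_bucket(b"small directory listing")
print(len(padded))    # 1024
print(unpad(padded))  # b'small directory listing'
```

Power-of-two buckets waste at most half of each block, but they only hide sizes within a bucket; the same idea would also help with obscuring the length of files.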

This also simplifies the code and makes the crypto easier to audit,
as there is one central place where a block is encrypted and 'signed'.
So we get much of the strong crypto I want for free.
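A sketch of what that one central place could look like, in Python. It compresses, "encrypts" and then signs each block with HMAC-SHA256, binding the block ID into the signature so the server cannot present a valid block under a different ID. The function names and key handling are hypothetical, and the encryption step is deliberately a do-nothing placeholder, because Python's standard library has no block cipher; a real version would use a proper cipher. This sketch also does not stop the server from replaying an older version of the same block ID.

```python
import hashlib
import hmac
import struct
import zlib

TAG_LEN = 32  # HMAC-SHA256 output size in bytes


def _cipher_placeholder(key: bytes, data: bytes) -> bytes:
    # NOT encryption: a stand-in for a real cipher such as AES. Python's
    # standard library has no block cipher, so this returns data unchanged.
    return data


def _tag(mac_key: bytes, block_id: int, body: bytes) -> bytes:
    # The block ID is covered by the MAC, so a valid block cannot be
    # presented by the server under a different ID.
    return hmac.new(mac_key, struct.pack(">Q", block_id) + body, hashlib.sha256).digest()


def seal_block(enc_key: bytes, mac_key: bytes, block_id: int, plaintext: bytes) -> bytes:
    """Compress, 'encrypt', then sign: the single place every block is sealed."""
    body = _cipher_placeholder(enc_key, zlib.compress(plaintext))
    return body + _tag(mac_key, block_id, body)


def open_block(enc_key: bytes, mac_key: bytes, block_id: int, sealed: bytes) -> bytes:
    """Verify the signature first, then 'decrypt' and decompress."""
    body, tag = sealed[:-TAG_LEN], sealed[-TAG_LEN:]
    if not hmac.compare_digest(tag, _tag(mac_key, block_id, body)):
        raise ValueError("block failed verification")
    return zlib.decompress(_cipher_placeholder(enc_key, body))


sealed = seal_block(b"enc-key", b"mac-key", 7, b"directory listing")
print(open_block(b"enc-key", b"mac-key", 7, sealed))  # b'directory listing'
try:
    open_block(b"enc-key", b"mac-key", 8, sealed)  # same bytes, wrong block ID
except ValueError as err:
    print(err)  # block failed verification
```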

I wonder if we could make a server which never needs to be upgraded?
That would end the hassle of upgrade cycles.

Ben