[Box Backup] Common file recognition (was: Win32 port)

richard_eigenmann boxbackup@fluffy.co.uk
Fri, 24 Sep 2004 15:01:52 +0200


> I expect this would be done in a slightly different way to the usual 
> backups. The client would send a list of candidate "common files", each 
> identified by filename, hash and length, and the server would tell it 
> which ones it had.
> 
> This would mean
> 
> * You would have to upload all the common files to the server, 
> unencrypted
> 
> * The client would lose out on some confidentiality of data -- the 
> server would know the client used a common file, and would know lots of 
> filenames of files you were backing up.
> 
> The latter issue could be minimized by either using hashes of 
> filenames, or restricting the search for common files to specified 
> directories (like C:\Windows or /usr).
> 
> So it's possible to do, but it is a different mechanism. I don't think 
> the commercial companies consider encryption as important as I do.
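The query described in the quote could work roughly as follows. This is a minimal Python sketch, not Box Backup code. The names `CommonFileQuery`, `CommonFilePool`, `make_query` and `files_to_upload` are hypothetical, and so are the choice of SHA-256, hashing only the base filename, and the `COMMON_DIRS` list. It applies both mitigations from the quote: the server only sees hashed filenames, and it only hears about files under the specified directories.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass

# Assumption: directories the client is willing to search for common files.
COMMON_DIRS = ("/usr", "C:\\Windows")


@dataclass(frozen=True)
class CommonFileQuery:
    """What the client reveals about one file (hypothetical wire format)."""
    name_hash: str     # SHA-256 of the base filename, not the name itself
    content_hash: str  # SHA-256 of the file contents
    length: int        # file size in bytes


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def normalise(path: str) -> str:
    # Treat Windows and Unix separators alike for the prefix check.
    return path.replace("\\", "/")


def in_common_dir(path: str) -> bool:
    p = normalise(path)
    for d in COMMON_DIRS:
        d = normalise(d).rstrip("/")
        if p == d or p.startswith(d + "/"):
            return True
    return False


def make_query(path: str, data: bytes) -> CommonFileQuery:
    name = normalise(path).rsplit("/", 1)[-1]
    return CommonFileQuery(sha256_hex(name.encode("utf-8")), sha256_hex(data), len(data))


class CommonFilePool:
    """Server side: the unencrypted files uploaded to the common pool."""

    def __init__(self) -> None:
        self._known: set[CommonFileQuery] = set()

    def add(self, path: str, data: bytes) -> None:
        self._known.add(make_query(path, data))

    def which_known(self, queries: list[CommonFileQuery]) -> list[bool]:
        # The server answers only "have it" / "don't have it" per query.
        return [q in self._known for q in queries]


def files_to_upload(local: dict[str, bytes], pool: CommonFilePool) -> list[str]:
    """Return the paths that still need an ordinary, encrypted backup."""
    candidates = [p for p in local if in_common_dir(p)]
    queries = [make_query(p, local[p]) for p in candidates]
    known = {p for p, hit in zip(candidates, pool.which_known(queries)) if hit}
    # Files outside COMMON_DIRS are never mentioned to the server at all.
    return [p for p in local if p not in known]
```

Hashing the names reduces what the server learns but does not remove it: an unsalted hash of a well-known name such as kernel32.dll can be recovered by hashing candidate names, which fits the quote's wording of "minimized" rather than "eliminated".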


We could have a pool of "common files" that is set up as a baseline
(perhaps by backing up a freshly installed Windows box). Box Backup could
perhaps even be configured with some sort of pattern filtering (regexps or
wildcards such as *.exe, *.dll, *.cab) to identify files to store in the
common pool, so that the pool doesn't get stale.
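The filtering idea could look something like the minimal Python sketch below. The example patterns are shell-style wildcards rather than regular expressions, so it uses the standard `fnmatch` module; a regexp-based filter would work the same way with `re`. The names `POOL_PATTERNS`, `is_pool_candidate` and `refresh_pool`, and keying the pool by content hash and length, are hypothetical choices for illustration, not part of Box Backup.

```python
import fnmatch
import hashlib

# Assumption: wildcard patterns for files worth sharing, from the examples above.
POOL_PATTERNS = ("*.exe", "*.dll", "*.cab")


def is_pool_candidate(path: str, patterns=POOL_PATTERNS) -> bool:
    # Match on the base name only, ignoring case (Windows names are
    # case-insensitive).
    name = path.replace("\\", "/").rsplit("/", 1)[-1].lower()
    return any(fnmatch.fnmatchcase(name, pat.lower()) for pat in patterns)


def refresh_pool(pool: set, files: dict, patterns=POOL_PATTERNS) -> int:
    """Add matching files seen during a normal backup to the common pool.

    The pool is keyed by (SHA-256 of contents, length). Returns how many
    new entries were added.
    """
    added = 0
    for path, data in files.items():
        if not is_pool_candidate(path, patterns):
            continue
        key = (hashlib.sha256(data).hexdigest(), len(data))
        if key not in pool:
            pool.add(key)
            added += 1
    return added
```

Calling `refresh_pool` on every ordinary backup is one way to keep the pool from going stale: newly seen .exe, .dll and .cab files join it automatically, while anything that does not match the patterns (such as a user's documents) never enters it.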

Strong encryption is great and I think Ben has done a fantastic job. People's
requirements vary, of course. Encrypting the traffic is certainly best
practice, but there are situations with already secure network connections
(e.g. a VPN) where a second level of encryption is simply wasted effort.
Likewise, having encrypted stores on the storage vault is a great feature: it
allows you to put your backups on machines that you would not normally let
near your data. But there will be situations where the backup server is
administered by people who know what they are doing and where encryption
stands in the way of storage efficiency. There may even be legal issues that
prevent the use of encryption. And key escrow management would not be a
concern at all if the backup were unencrypted.

I think we have more valuable enhancements on the to-do list. But if we keep
the idea of "shared backup files" in the back of our (Ben's) mind,
development is less likely to go down a road that makes this impossible to
accomplish at a later date.

Regards,
Richard