[Box Backup] BoxBackup Server Side Management Specs (Draft0.01)

Garry Glendown boxbackup@fluffy.co.uk
Thu, 23 Sep 2004 07:51:54 +0200


richard_eigenmann wrote:
> If we implemented such a kind of "backup recognition" algorithm this could
> speed up backups of remote laptops as perhaps the documents the laptop user
> has been working on have already been backed up from users back at the base.
> 
> I imagine this sort of feature could save somewhere between 0.5 and 5 GB per
> workstation that is doing a full backup. This could be significant to the
> scalability of the boxbackup.
> 
> Of course this sort of thing probably would lead to massive redevelopment of
> code and should only be undertaken if there is a very strong demand. I for
> one don't need it.

Your suggestion might have been derived from what some companies already 
sell as backup solutions ... e.g., InterXion, a large hoster, is selling 
exactly this feature set ... they say they have all major Windows 
versions and many M$ apps "on file" and if the version found on the 
client machine matches the one on file, it is only stored as reference 
... they even do this for other files, dynamically extending this to 
everything stored on their server ... IIRC, they still store 3 copies of 
each file even with matches ...

Setting up such a feature will probably require a database of MD5 (or 
similar) checksums and file sizes, plus possibly file names (to reduce 
search time and the theoretical risk of MD5 collisions, at the cost of 
maybe not recognizing a match, e.g. when a file has been renamed) ... 
you'd probably want to define a minimum file size below which you 
wouldn't do all the searching/md5'ing, since matches there gain little 
... plus if a file is used as a reference, the server must maintain it 
as long as there is at least one reference to it ... if it is changed or 
deleted on the server it originated from, the old version must be kept 
intact for the other backups ... some work, but doable ...
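
To make the idea a bit more concrete, here is a rough Python sketch of 
such an index: files are keyed by (MD5, size), files below a minimum 
size skip the lookup entirely, and a stored copy is only dropped once 
nothing references it any more. This is purely illustrative and not 
Box Backup code; the class name, method names and the 64 KB threshold 
are all made up for the example, and everything is kept in memory.

```python
import hashlib
import itertools


class ReferenceStore:
    """Hypothetical in-memory sketch of a checksum-indexed file store."""

    def __init__(self, min_size=64 * 1024):
        # Assumed threshold: files smaller than this are never shared.
        self.min_size = min_size
        self.blobs = {}      # key -> stored file contents
        self.refcount = {}   # key -> number of backups referencing it
        self._private_ids = itertools.count()

    def store(self, data):
        """Store one file; return (key, True if an existing copy was reused)."""
        if len(data) < self.min_size:
            # Too small to be worth searching: always keep a private copy.
            key = ("private", next(self._private_ids))
        else:
            # MD5 plus file size as the lookup key. A real server would
            # probably also compare contents to rule out a collision.
            key = (hashlib.md5(data).hexdigest(), len(data))

        if key in self.blobs:
            self.refcount[key] += 1
            return key, True

        self.blobs[key] = data
        self.refcount[key] = 1
        return key, False

    def release(self, key):
        """Drop one reference; delete the copy only when none remain."""
        self.refcount[key] -= 1
        if self.refcount[key] == 0:
            del self.refcount[key]
            del self.blobs[key]
```

A client changing or deleting its file would then just call release() 
on the old key (and store() for the new contents), so the old version 
stays intact for every other backup that still points at it.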

But then, at the moment a nice frontend for M$-users might be of a 
slightly higher importance ...

-gg