[Box Backup] Common file recognition (was: Win32 port)

Ben Summers boxbackup@fluffy.co.uk
Fri, 24 Sep 2004 12:34:58 +0100


On 24 Sep 2004, at 12:21, Garry Glendown wrote:

> Ben Summers wrote:
>> On 24 Sep 2004, at 00:33, Chris Wilson wrote:
>>> Hi all,
>>>
>>> I second the call for duplicate file detection :-)
>> Unfortunately there's only one of me.
>
> Also, there is one basic problem that just popped up in my mind ... BB 
> uses encryption for storing the files - encryption done by the client 
> IIRC ... therefore, the server has no way of telling which files are 
> the same, and even if, a second client couldn't restore its file from 
> the first client's storage, as he doesn't have the private key to 
> extract the data ... so strike that feature off the list ... (dunno 
> how the other companies handle it, but it would seem to me they do not 
> handle security the same way BB does ...)
>
I expect this would be done in a slightly different way to the usual 
backups. The client would send the server a list of candidate "common 
files", each identified by filename, hash and length, and the server 
would tell it which of them it already had.
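
A minimal sketch of that query, purely for illustration. The names 
(CommonFileKey, make_key, server_lookup), the choice of SHA-256 and the 
in-memory set are all assumptions, not anything Box Backup implements:

```python
import hashlib
from typing import List, NamedTuple, Set


class CommonFileKey(NamedTuple):
    """Hypothetical identifier for a candidate common file."""
    filename: str
    sha256_hex: str  # hash algorithm is an assumption; the post only says "hash"
    length: int


def make_key(filename: str, data: bytes) -> CommonFileKey:
    """Client side: describe a file by filename, content hash and length."""
    return CommonFileKey(filename, hashlib.sha256(data).hexdigest(), len(data))


def server_lookup(store: Set[CommonFileKey],
                  query: List[CommonFileKey]) -> List[bool]:
    """Server side: for each queried key, report whether it is already held."""
    return [key in store for key in query]
```

The client would then skip uploading any file whose entry comes back 
True, and back up the rest through the usual encrypted path.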

This would mean

* You would have to upload all the common files to the server, 
unencrypted

* The client would lose some confidentiality of data -- the server 
would know the client used a common file, and would know lots of the 
filenames of the files it was backing up.

The latter issue could be minimized by either using hashes of 
filenames, or restricting the search for common files to specified 
directories (like C:\Windows or /usr).
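
Both mitigations can be sketched in a few lines. This is only an 
illustration: COMMON_ROOTS, is_candidate and filename_token are made-up 
names, only POSIX-style paths are handled, and SHA-256 is an assumed 
choice of hash:

```python
import hashlib
from pathlib import PurePosixPath
from typing import Sequence

# Hypothetical whitelist: only files under these directories are ever
# offered to the server as possible common files.
COMMON_ROOTS = ("/usr",)


def is_candidate(path: str, roots: Sequence[str] = COMMON_ROOTS) -> bool:
    """True if path is one of the roots or lies somewhere beneath one."""
    p = PurePosixPath(path)
    for root in roots:
        r = PurePosixPath(root)
        if p == r or r in p.parents:
            return True
    return False


def filename_token(path: str) -> str:
    """Send a hash of the filename in place of the filename itself."""
    return hashlib.sha256(path.encode("utf-8")).hexdigest()
```

Hashing would mainly hide the names of files that turn out not to be 
common; for a file that matches, the server still learns which one it 
was.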

So it's possible to do, but it is a different mechanism. I don't think 
the commercial companies see encryption as being as important as I 
do.

Ben