[Box Backup-dev] Re: Learning from ZFS (fwd)
Martin Ebourne
boxbackup-dev@fluffy.co.uk
Thu, 10 May 2007 13:56:46 +0100
Ben Summers <ben@fluffy.co.uk> wrote:
> Hmmm. Yes and no.
>
> Yes, you want your original data back. 100% guaranteed.
>
> No, in that if you stick to this rule absolutely you can't use rsync
> or Box Backup's rsync-like algorithm.
>
> Maybe, in that there's a lower chance of it being a problem in the
> rsync case.
There's a difference in use here between the original suggestion,
which was to use checksums as a compression system (which doesn't
work), and the way rsync etc. use checksums to detect changes (checksums
were originally designed to detect errors, of course, and this works
well).
Given two random blocks that have the same checksum, it is very
unlikely that they contain the same data, hence no good for compression.
On the other hand, if you have the checksum for a block and the data in
the block is subsequently changed, it is very unlikely that the new data
will have the same checksum, which is what makes it work for the rsync
etc. case.
Put simply, checksums are very good at detecting differences, but very
bad at proving similarity. (And it is counter-intuitive that these are
not reciprocal.)
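The asymmetry can be tried out with a rough Python sketch. Everything in it is an assumption made for illustration: checksum16 is a hypothetical toy checksum (SHA-256 truncated to 16 bits so that collisions turn up quickly), and the function names and block sizes are made up; none of this is what rsync or Box Backup actually does.

```python
import hashlib
import random


def checksum16(block: bytes) -> int:
    """Toy 16-bit checksum (assumption: SHA-256 truncated to 2 bytes)."""
    return int.from_bytes(hashlib.sha256(block).digest()[:2], "big")


def random_block(rng: random.Random, size: int) -> bytes:
    """Return `size` pseudo-random bytes drawn from rng."""
    return bytes(rng.getrandbits(8) for _ in range(size))


def find_collision(rng: random.Random, block_size: int = 64):
    """Draw random blocks until two different ones share a checksum.

    Returns (first_block, second_block, number_of_blocks_drawn).
    """
    seen = {}  # checksum -> first block seen with that checksum
    attempts = 0
    while True:
        attempts += 1
        block = random_block(rng, block_size)
        c = checksum16(block)
        if c in seen and seen[c] != block:
            return seen[c], block, attempts
        seen[c] = block


def undetected_changes(rng: random.Random, trials: int = 2000,
                       block_size: int = 64) -> int:
    """Count single-byte edits that leave the checksum unchanged."""
    missed = 0
    for _ in range(trials):
        block = bytearray(random_block(rng, block_size))
        before = checksum16(bytes(block))
        i = rng.randrange(block_size)
        block[i] ^= rng.randrange(1, 256)  # XOR with non-zero: byte always changes
        if checksum16(bytes(block)) == before:
            missed += 1
    return missed


if __name__ == "__main__":
    a, b, attempts = find_collision(random.Random(1))
    print("same checksum:", checksum16(a) == checksum16(b))
    print("same data:", a == b)
    print("blocks drawn before a collision:", attempts)
    print("edits missed out of 2000:", undetected_changes(random.Random(2)))
```

With only 65536 possible checksum values, two different blocks with matching checksums should turn up after a few hundred random draws, so a match proves nothing about the data. A given edit, on the other hand, slips through with probability of only about 1 in 65536. A wider checksum makes both events rarer, but as long as blocks are longer than the checksum there are still more possible blocks than checksum values, so the asymmetry remains.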
Cheers,
Martin.