[Box Backup-dev] Re: Learning from ZFS (fwd)
Wout Mertens
boxbackup-dev@fluffy.co.uk
Tue, 22 May 2007 17:23:48 +0200
On 10 May 2007, at 14:56, Martin Ebourne wrote:
> Put simply, checksums are very good at detecting differences, but
> very bad at proving similarity. (And it is counter-intuitive that
> these are not reciprocal.)
Well put, and I completely agree. I also never said that when two
checksums match, zfs should throw away a block without comparing the
contents ;-)
That said, I have it on good authority that a large company that
sells Content-Addressable Storage does not do the compare phase,
under the assumption that two random data blocks that have the same
checksum _and also make sense_ are way too rare to support taking
that performance hit.
The odds of two data blocks of backed-up data (i.e. non-random) having
the same checksum are pretty low.
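To put a rough number on "pretty low", here is a back-of-the-envelope sketch using the standard birthday-bound approximation. It assumes the checksum behaves like a uniformly random function of the block, which is an idealisation. The function name and the block counts and checksum widths in the usage note are illustrative choices, not figures from ZFS or any CAS vendor.

```python
import math


def collision_probability(num_blocks: int, hash_bits: int) -> float:
    """Approximate chance that at least two of num_blocks distinct blocks
    share a checksum, assuming an ideal hash_bits-wide checksum.

    Birthday-bound approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1)).
    """
    exponent = -num_blocks * (num_blocks - 1) / 2.0 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(exponent)


if __name__ == "__main__":
    # Hypothetical pool sizes and checksum widths, for illustration only.
    for blocks, bits in [(2 ** 40, 256), (2 ** 40, 64), (100_000, 32)]:
        p = collision_probability(blocks, bits)
        print(f"{blocks} blocks, {bits}-bit checksum: p ~= {p:.3g}")
```

Under these assumptions the result depends almost entirely on the checksum width: a wide cryptographic hash makes an accidental collision negligible even for very large pools, while a short checksum does not.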
That said, I'd prefer it if the hypothetical zfs-block-deduper would
check before coalescing blocks. ;-)
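A minimal sketch of what "check before coalescing" could look like, next to the skip-the-compare variant. This is not how ZFS or any real CAS product is implemented; DedupStore, its verify flag and the pluggable hash_fn are hypothetical names made up for illustration, and SHA-256 is just an assumed default checksum.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

BlockRef = Tuple[bytes, int]  # (checksum, index within that checksum's bucket)


def sha256_digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class DedupStore:
    """Toy in-memory block store that coalesces identical blocks."""

    def __init__(self, verify: bool = True,
                 hash_fn: Callable[[bytes], bytes] = sha256_digest) -> None:
        self.verify = verify
        self.hash_fn = hash_fn
        self._buckets: Dict[bytes, List[bytes]] = {}

    def put(self, block: bytes) -> BlockRef:
        digest = self.hash_fn(block)
        bucket = self._buckets.setdefault(digest, [])
        if bucket and not self.verify:
            # Trust the checksum: any block with this digest is assumed
            # to be identical, so the new data is never stored.
            return (digest, 0)
        for index, existing in enumerate(bucket):
            if existing == block:  # the compare phase
                return (digest, index)
        # New content, or a genuine collision: keep it as a separate block.
        bucket.append(block)
        return (digest, len(bucket) - 1)

    def get(self, ref: BlockRef) -> bytes:
        digest, index = ref
        return self._buckets[digest][index]

    def stored_blocks(self) -> int:
        return sum(len(bucket) for bucket in self._buckets.values())
```

With verify=True a colliding block costs one extra byte comparison per candidate but is kept intact; with verify=False the second block is silently replaced by the first, which is exactly the trade-off described above. Passing a deliberately weak hash_fn (for example one that returns only the first byte) makes the difference easy to see.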
Wout.