[Box Backup-dev] Re: Learning from ZFS (fwd)

Wout Mertens boxbackup-dev@fluffy.co.uk
Tue, 22 May 2007 17:23:48 +0200


On 10 May 2007, at 14:56, Martin Ebourne wrote:

> Put simply, checksums are very good at detecting differences, but  
> very bad at proving similarity. (And it is counter-intuitive that  
> these are not reciprocal.)

Well put, and I completely agree. I also never said that when two
checksums match, ZFS should throw away a block without comparing the
contents ;-)

That said, I have it on good authority that a large company that  
sells Content-Addressable Storage does not do the compare phase,  
under the assumption that two random data blocks that have the same  
checksum _and also make sense_ are way too rare to support taking  
that performance hit.

The odds of two blocks of backed-up (i.e. non-random) data having
the same checksum are pretty low.
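
To put a rough number on "pretty low", here is a small sketch of the
standard birthday-bound estimate. It assumes an idealised checksum whose
outputs are uniformly distributed, which is an assumption of the sketch
and says nothing about any particular product; the function name and
parameters are made up for illustration.

```python
import math


def collision_probability(n_blocks: int, checksum_bits: int) -> float:
    """Approximate chance that at least two of n_blocks distinct blocks
    share a checksum, assuming an ideal uniformly distributed checksum.

    Uses the usual birthday approximation p ~= 1 - exp(-pairs / 2**bits),
    where pairs = n * (n - 1) / 2.
    """
    pairs = n_blocks * (n_blocks - 1) // 2
    x = pairs / 2 ** checksum_bits
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-x)


if __name__ == "__main__":
    # Hypothetical store of 2**32 blocks, with two checksum widths.
    for bits in (64, 256):
        print(bits, collision_probability(2 ** 32, bits))
```

With a 64-bit checksum the estimate is far from negligible at that block
count, while with a 256-bit checksum it is vanishingly small, so how low
"pretty low" is depends heavily on the checksum width.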

That said, I'd prefer it if the hypothetical ZFS block deduper would
check before coalescing blocks. ;-)
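
A minimal sketch of what "check before coalescing" could look like, as a
toy in-memory store. This is not how ZFS or any CAS product works; the
class, its methods, and the deliberately weak checksum are all
hypothetical, chosen only so that a collision is easy to trigger.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Default checksum: SHA-256 hex digest."""
    return hashlib.sha256(data).hexdigest()


def weak_checksum(data: bytes) -> int:
    """Deliberately poor checksum (byte sum mod 256) to force collisions."""
    return sum(data) % 256


class BlockStore:
    """Toy deduplicating block store (illustrative only)."""

    def __init__(self, checksum=sha256_hex, verify=True):
        self.checksum = checksum
        self.verify = verify
        # checksum -> list of distinct blocks that share that checksum
        self.buckets = {}

    def put(self, data: bytes):
        """Store a block and return a (checksum, index) reference."""
        key = self.checksum(data)
        bucket = self.buckets.setdefault(key, [])
        for index, existing in enumerate(bucket):
            # With verify=True, coalesce only on a byte-for-byte match.
            # With verify=False, trust the checksum alone.
            if not self.verify or existing == data:
                return (key, index)
        bucket.append(data)
        return (key, len(bucket) - 1)

    def get(self, ref) -> bytes:
        """Return the block that a reference from put() points to."""
        key, index = ref
        return self.buckets[key][index]


if __name__ == "__main__":
    # b"ab" and b"ba" have the same byte sum, so they collide.
    for verify in (True, False):
        store = BlockStore(checksum=weak_checksum, verify=verify)
        first = store.put(b"ab")
        second = store.put(b"ba")
        print(verify, store.get(first), store.get(second))
```

Without the compare, the second block is silently replaced by the first
whenever their checksums collide; with it, a collision only costs an
extra comparison and a second slot in the bucket.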

Wout.