[Box Backup-dev] Re: [Box Backup] bbackupd - read errors on database files

G. boxbackup-dev@fluffy.co.uk
Fri, 13 Apr 2007 05:47:03 -0700 (PDT)


Chris,

> I'd say it uses less Internet bandwidth than compare -a, but not less 
> CPU or disk activity.

Eliminating the block-level checksum download that "compare -aq" requires speeds up the entire verification process by an order of magnitude. If I recall correctly, calculating a single MD5 hash over an entire file and comparing that one value is also significantly faster than calculating a hash for every block and comparing them block by block.
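To make the difference concrete, here is a rough sketch in Python of the two comparison styles. It is not Box Backup's code: the fixed 4096-byte block size and all function names are hypothetical, and Box Backup's real block format differs. The point is only that the whole-file approach produces one value to fetch and compare, while the block approach produces one value per block.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size, for illustration only


def whole_file_md5(data: bytes) -> str:
    """One MD5 over the entire content: a single value to compare."""
    return hashlib.md5(data).hexdigest()


def block_md5s(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """One MD5 per block: many values to download and compare."""
    return [
        hashlib.md5(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def matches_whole(local: bytes, remote_md5: str) -> bool:
    # A single string comparison decides the result.
    return whole_file_md5(local) == remote_md5


def matches_blocks(local: bytes, remote_md5s: list[str]) -> bool:
    # Needs the full list of remote block checksums first.
    return block_md5s(local) == remote_md5s


if __name__ == "__main__":
    content = b"x" * 10_000
    print(len(block_md5s(content)))  # 3 blocks for 10,000 bytes at 4096 each
    print(matches_whole(content, whole_file_md5(content)))  # True
```

Both still read every byte locally, so the saving is mainly in what has to be downloaded and compared, not in local disk I/O.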

There is also the half-way option of pre-downloading, caching, and persisting (StoreObjectInfoFile) all remote block-level checksum information, instead of generating somewhat redundant MD5s. Not too elegant, though.

> We can't cache the checksums of local files on disk, otherwise we'd have 
> the same problem that we do now :-(

I didn't catch that one... We already cache file attribute information locally (in memory, persisted by StoreObjectInfoFile) and use it for change detection as well (the folder-level checksum algorithm takes it into account).
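A minimal sketch of attribute-based change detection with a cache carried between cycles, assuming size, mtime and inode are the attributes that matter. The `CachedAttrs` record and `changed_since_last_cycle` helper are hypothetical and do not reflect the StoreObjectInfoFile format.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedAttrs:
    """Hypothetical snapshot of the attributes used for change detection."""
    size: int
    mtime_ns: int
    inode: int


def snapshot(path: str) -> CachedAttrs:
    st = os.stat(path)
    return CachedAttrs(st.st_size, st.st_mtime_ns, st.st_ino)


def changed_since_last_cycle(path: str, cache: dict[str, CachedAttrs]) -> bool:
    """Report a change if the attributes differ from the cached copy
    (or the file has not been seen before), then update the cache."""
    current = snapshot(path)
    previous = cache.get(path)
    cache[path] = current
    return previous != current


if __name__ == "__main__":
    cache: dict[str, CachedAttrs] = {}
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"initial")
        path = f.name
    print(changed_since_last_cycle(path, cache))  # True: not seen before
    print(changed_since_last_cycle(path, cache))  # False: attributes unchanged
    with open(path, "ab") as f:
        f.write(b" more")
    print(changed_since_last_cycle(path, cache))  # True: size changed
    os.remove(path)
```

A write that leaves size and mtime unchanged slips through a check like this, which is why content hashes keep coming up in this thread.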

> compare -aq does not compare checksums of anything, as far as I know, it 

Beg to differ here, Chris...

> I think that the mode you describe, "remote checksum to remote disk 
> content" would be better achieved by the client uploading the unencrypted 
> checksum of the encrypted data, which is saved by the server as an 
> unencrypted attribute, which bbstoreaccounts check can reverify at any 
> time.

Ok, let's forget my remote content verification idea for the moment (since I'm getting confused here ;)).

---

So, it's the plaintext MD5 as part of a file's attribute stream vs. pre-caching block-level checksum information vs. inode notification. However, I think we do need an option that not only guarantees change detection 100%, but also verifies remote content during each backup cycle. I would personally accept even a 50% performance penalty (who cares, the thing runs in the wee hours of the morning anyway and already takes hours) to be absolutely sure that once a backup cycle completes successfully, remote content matches local content "beyond reasonable doubt" :).
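A rough sketch of the first option: the client records the plaintext MD5 in the attribute stream it uploads with the file, and a later cycle recomputes the local MD5 and compares it with the recorded value. Everything here is hypothetical: JSON stands in for Box Backup's real (encrypted) attribute format, and the `plaintext_md5` key and helper names are made up for illustration. Note that this checks local content against the digest recorded at upload time; on its own it does not prove the server's stored blocks are intact.

```python
import hashlib
import json
import os
import tempfile

CHUNK = 64 * 1024  # read size for streaming the file


def plaintext_md5(path: str) -> str:
    """MD5 of the unencrypted file content, streamed in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            h.update(chunk)
    return h.hexdigest()


def build_attribute_stream(path: str, attrs: dict) -> bytes:
    """Hypothetical: attach the plaintext MD5 to the uploaded attributes."""
    record = dict(attrs, plaintext_md5=plaintext_md5(path))
    return json.dumps(record, sort_keys=True).encode()


def verify_against_remote(path: str, remote_attribute_stream: bytes) -> bool:
    """Compare the local file's MD5 with the one recorded remotely."""
    record = json.loads(remote_attribute_stream)
    return record.get("plaintext_md5") == plaintext_md5(path)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"database page")
        path = f.name
    stream = build_attribute_stream(path, {"mode": 0o644})
    print(verify_against_remote(path, stream))  # True: content unchanged
    with open(path, "ab") as f:
        f.write(b" modified")
    print(verify_against_remote(path, stream))  # False: content differs
    os.remove(path)
```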

Gary




