[Box Backup-dev] Re: [Box Backup] bbackupd - read errors on database files
G.
boxbackup-dev@fluffy.co.uk
Fri, 13 Apr 2007 05:47:03 -0700 (PDT)
Chris,
> I'd say it uses less Internet bandwidth use than compare -a, but not less
> CPU or disk activity.
Eliminating the block-level checksum download that "compare -aq" requires speeds up the entire verification process by an order of magnitude. If I recall correctly, calculating one MD5 hash value for an entire file and using it for a single comparison is also significantly faster than calculating multiple block-by-block hash values and making multiple block-by-block comparisons.
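To make the two approaches concrete, here is a rough sketch in Python (not Box Backup code). The function names and the fixed 4096-byte block size are assumptions for illustration only; the real store format chooses its own block sizes and checksums.

```python
import hashlib


def whole_file_md5(path, chunk_size=1 << 20):
    """Return one MD5 hex digest covering the entire file."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large files are never held in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def per_block_md5(path, block_size=4096):
    """Return one MD5 hex digest per fixed-size block (hypothetical block size)."""
    digests = []
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digests.append(hashlib.md5(block).hexdigest())
    return digests


def matches_stored_digest(path, stored_hex):
    """Single comparison against a digest stored alongside the file attributes."""
    return whole_file_md5(path) == stored_hex
```

With the whole-file variant, only one short digest per file has to be fetched and compared, whereas the per-block variant needs the full list of block digests from the server before any comparison can start.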
There is also the half-way option of pre-downloading, caching, and persisting (StoreObjectInfoFile) all remote block-level checksum information, instead of generating somewhat redundant MD5s. Not too elegant, though.
> We can't cache the checksums of local files on disk, otherwise we'd have
> the same problem that we do now :-(
I didn't catch that one... We already cache file attribute information locally (in memory, and preserved by StoreObjectInfoFile) so that it can be used for change detection as well (the folder-level checksum algorithm takes it into consideration).
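As a generic illustration of attribute-based change detection with a persisted cache, a minimal sketch in Python follows. It is not how bbackupd or StoreObjectInfoFile actually work; the JSON cache file, the function names, and the choice of size, mtime and mode as the fingerprint are all assumptions.

```python
import json
import os


def attribute_fingerprint(path):
    """Summarise the attributes used for change detection (assumed set)."""
    st = os.stat(path)
    return [st.st_size, st.st_mtime_ns, st.st_mode]


def load_cache(cache_path):
    """Load the persisted fingerprints, or start empty on the first run."""
    try:
        with open(cache_path, "r", encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def save_cache(cache, cache_path):
    """Persist the fingerprints so they survive a daemon restart."""
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(cache, f)


def changed_files(paths, cache):
    """Return paths whose attributes differ from the cache, updating the cache."""
    changed = []
    for path in paths:
        fingerprint = attribute_fingerprint(path)
        if cache.get(path) != fingerprint:
            changed.append(path)
            cache[path] = fingerprint
    return changed
```

Note that a scheme like this only sees attribute changes: a file rewritten in place with the same size and modification time would not be reported, which is why a content checksum is being discussed at all.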
> compare -aq does not compare checksums of anything, as far as I know, it
Beg to differ here, Chris...
> I think that the mode you describe, "remote checksum to remote disk
> content" would be better achieved by the client uploading the unencrypted
> checksum of the encrypted data, which is saved by the server as an
> unencrypted attribute, which bbstoreaccounts check can reverify at any
> time.
Ok, let's forget my remote content verification idea for the moment (since I'm getting confused here ;)).
---
So, it's the plaintext MD5 as part of a file attribute stream vs. pre-caching block-level checksum information vs. inode notification. However, I think we need an option that not only guarantees change detection 100%, but also verifies remote content during each backup cycle. I would personally accept sacrificing even 50% of performance (who cares, the thing runs in the wee hours of the morning anyway and takes hours already) to be absolutely sure that once a backup cycle completes successfully, remote content matches local content, "beyond reasonable doubt" :).
Gary