[Box Backup-dev] bbackupd: enhanced file change detection
G.
boxbackup-dev@fluffy.co.uk
Sun, 14 Jan 2007 12:19:39 -0800 (PST)
Hi,

A problem: with certain backup content, the files to be backed up
change neither their timestamps nor their sizes (examples: Subversion
__db.* repository objects, TrueCrypt fixed-size encrypted volumes) and
so never get backed up. Software packages such as SyncBackSE usually
include an option to use a file hash to ensure file change detection.

I have been going through the bbackupd code and so far I have come up
with two possible solutions to this problem:

1.)

(a.) Add a CRC-32 checksum or an MD5 checksum to each file's
checksum_info in a directory to be examined, to
BackupClientDirectoryRecord::SyncDirectory(), currentStateChecksum.
(b.) Add QueryGetBlockIndexByID(/* latestObjectID */) to
BackupClientDirectoryRecord::UpdateItems() and run
BackupStoreFile::CompareFileContentsAgainstBlockIndex() to determine
doUpload.

Effectively, this would be a rough equivalent to "compare -q" for
directories that have already been classified as changed by a
directory checksum. It has no impact on any other parts of the code;
however, it would generate additional network traffic and would be
somewhat slow to execute.
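
To make the idea behind step (a.) concrete, here is a minimal standalone
sketch of a content checksum used as a fallback when timestamp and size
look unchanged. It is not bbackupd code: UpdateCrc32, Crc32OfStream,
LastKnownState and NeedsUpload are hypothetical names invented for
illustration, and it does not touch the server-side block index used in
step (b.). The CRC-32 variant is assumed to be the common reflected one
(polynomial 0xEDB88320, as in zlib).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>

// Bitwise CRC-32 (reflected polynomial 0xEDB88320, the zlib/PNG variant).
// Pass the previous result as `crc` to continue over a further chunk;
// start with 0 for the first chunk.
inline std::uint32_t UpdateCrc32(std::uint32_t crc, const char* data, std::size_t len)
{
    crc = ~crc;
    for (std::size_t i = 0; i < len; ++i)
    {
        crc ^= static_cast<unsigned char>(data[i]);
        for (int bit = 0; bit < 8; ++bit)
        {
            // XOR in the polynomial only when the low bit is set.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

// Checksum everything that can be read from `in`, 4 KiB at a time.
inline std::uint32_t Crc32OfStream(std::istream& in)
{
    char buffer[4096];
    std::uint32_t crc = 0;
    while (in)
    {
        in.read(buffer, sizeof(buffer));
        crc = UpdateCrc32(crc, buffer, static_cast<std::size_t>(in.gcount()));
    }
    return crc;
}

// Hypothetical record of what was known about a file at its last upload.
struct LastKnownState
{
    std::int64_t modTime;
    std::int64_t size;
    std::uint32_t crc32;
};

// Returns true if the file should be uploaded. The contents are only read
// when timestamp and size both look unchanged, so ordinary changes stay cheap.
inline bool NeedsUpload(const LastKnownState& last, std::int64_t modTime,
                        std::int64_t size, std::istream& contents)
{
    if (modTime != last.modTime || size != last.size)
    {
        return true;
    }
    return Crc32OfStream(contents) != last.crc32;
}
```

The cost is one full read of every file whose timestamp and size are
unchanged, which is the computational expense weighed up in this post.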

2.)

(a.) Add a CRC-32 checksum or an MD5 checksum to each file's
checksum_info in a directory to be examined, to
BackupClientDirectoryRecord::SyncDirectory(), currentStateChecksum.

(b.) Build a dedicated, local BDB, SQLite, or plain, lightweight
binary file lookup repository for bbackupd, storing last known file
CRC32s or MD5s, by latestObjectID.
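
As a rough illustration of the "plain, lightweight binary file" variant
of step (b.), the sketch below keeps a table from object ID to last
known CRC-32 and serialises it as fixed-size records. Everything here
is an assumption made for the example: the ChecksumStore class, its
method names, and the 12-byte big-endian record layout are hypothetical
and do not exist in bbackupd.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <unordered_map>

// Hypothetical lookup table: object ID -> last known CRC-32 of the file.
class ChecksumStore
{
public:
    void Set(std::int64_t objectID, std::uint32_t crc) { mEntries[objectID] = crc; }

    // Returns true and fills crcOut if a checksum is recorded for objectID.
    bool Get(std::int64_t objectID, std::uint32_t& crcOut) const
    {
        auto it = mEntries.find(objectID);
        if (it == mEntries.end()) return false;
        crcOut = it->second;
        return true;
    }

    std::size_t Size() const { return mEntries.size(); }

    // Write fixed-size records: 8-byte ID, then 4-byte CRC, both big-endian.
    void Save(std::ostream& out) const
    {
        for (const auto& entry : mEntries)
        {
            WriteBE(out, static_cast<std::uint64_t>(entry.first), 8);
            WriteBE(out, entry.second, 4);
        }
    }

    // Replace the table with the records read from `in`.
    // A truncated trailing record is ignored.
    void Load(std::istream& in)
    {
        mEntries.clear();
        std::uint64_t id = 0;
        std::uint64_t crc = 0;
        while (ReadBE(in, id, 8) && ReadBE(in, crc, 4))
        {
            mEntries[static_cast<std::int64_t>(id)] = static_cast<std::uint32_t>(crc);
        }
    }

private:
    static void WriteBE(std::ostream& out, std::uint64_t value, int bytes)
    {
        for (int shift = (bytes - 1) * 8; shift >= 0; shift -= 8)
        {
            out.put(static_cast<char>((value >> shift) & 0xFF));
        }
    }

    static bool ReadBE(std::istream& in, std::uint64_t& value, int bytes)
    {
        value = 0;
        for (int i = 0; i < bytes; ++i)
        {
            const int c = in.get();
            if (c == std::istream::traits_type::eof()) return false;
            value = (value << 8) | static_cast<std::uint64_t>(c);
        }
        return true;
    }

    std::unordered_map<std::int64_t, std::uint32_t> mEntries;
};
```

In use, Load() would be called on a std::ifstream opened in binary mode
at startup and Save() on a std::ofstream after each backup cycle. An
MD5 variant would only need a wider value field, and BDB or SQLite
would replace the two serialisation methods.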

---

Question A.): would it be possible to add a CRC-32 checksum or an MD5
checksum as an additional StreamableMemBlock
BackupStoreDirectory::Entry::mAttributes member, or something of that
sort, in such a manner as not to require a server-side repository
upgrade, nor a change to the current client/server command set? If so,
it could be used by BackupClientDirectoryRecord::UpdateItems() to
perform a direct file checksum comparison for directories that have
already been classified as changed by a directory checksum, without
generating additional QueryGetBlockIndexByID() network traffic, nor
running CompareFileContentsAgainstBlockIndex().

Question B.): would you prefer the MD5 or the CRC-32 hash algorithm
under those circumstances? MD5 is much more reliable, but CRC-32 is
approximately 30% - 50% less expensive computationally: not a
negligible difference, considering that a backup cycle could include
thousands of files to be examined.

Sorry if I missed something in my first pass through the code.
Thoughts?

Gary