RFC: end-to-end compare -aq (Was: Re: [Box Backup] Win32 native client service bbackupd.conf)

Gary boxbackup@fluffy.co.uk
Fri, 7 Jul 2006 13:15:28 -0700 (PDT)


Ben, Chris,

>> The alternative of the client sending up to the server the ciphertext 
>> along with strong checksums for the ciphertext, to be stored and 
>> compared by the server later on, would also allow for an end-to-end 
>> bbstoredcheckaccount (without requiring client cooperation).

> I'm not sure I quite follow what you're suggesting.

The way I see it, there are two ways to guarantee end-to-end data integrity to a very high
statistical probability:

A.) IV + re-encryption:
	- (compare) client downloads ciphertext IVs and ciphertext checksums from the server
	- (compare) client compresses the plaintext, encrypts it with the downloaded IV, and compares
the checksum of the resulting ciphertext against the server's ciphertext checksum
	- (bbstoredcheckaccount) <<not doable>>

B.) ciphertext-level checksum
	- (backup) client uploads ciphertext checksum to the server (along with ciphertext and plaintext
checksum)
	- (compare) client issues a command to the server to compare ciphertext checksum to what's on the
media
	
Since the plaintext checksum matches the local client content (-aq) and the ciphertext checksum
matches the server content, we have end-to-end data integrity, unless both the server media content
and the ciphertext checksum get corrupted in such a way that they still match each other (I wonder
how likely that is...).
	
	- (bbstoredcheckaccount) server compares all ciphertext checksums to what's on the media
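
To make option (B.) a bit more concrete, here is a minimal sketch in Python. It is not Box Backup
code: the Store class, its method names, the block ids and the choice of SHA-256 are all
assumptions made for illustration, and the ciphertext is treated as opaque bytes.

```python
import hashlib


def checksum(data: bytes) -> str:
    # SHA-256 is an assumption; any strong checksum would do.
    return hashlib.sha256(data).hexdigest()


class Store:
    """Hypothetical server-side store: ciphertext plus the checksum the client sent."""

    def __init__(self):
        self.blocks = {}  # block_id -> (ciphertext, ciphertext_checksum)

    def upload(self, block_id, ciphertext, ciphertext_checksum):
        # (backup) the client supplies the checksum; the server only stores it.
        self.blocks[block_id] = (ciphertext, ciphertext_checksum)

    def compare(self, block_id) -> bool:
        # (compare) re-hash what is on the media and compare with the stored checksum.
        ciphertext, expected = self.blocks[block_id]
        return checksum(ciphertext) == expected

    def check_account(self) -> list:
        # (bbstoredcheckaccount) return the ids of all blocks that no longer match.
        return [bid for bid in sorted(self.blocks) if not self.compare(bid)]
```

Note that the server never needs keys or plaintext for either check, which is why the
account-wide check works here and not in option (A.).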

> Now there is a slight potential problem with this. Most blocks will  
> be compressed, so you are relying on zlib emitting the same output  
> every time it compresses the same data. Now this should work, but a  
> change in zlib could give different output and screw up this  
> comparison process.

Yes, that applies to option (A.), since the client has to re-compress and re-encrypt before it can
compare. Option (B.) does not have this problem, since it works with the ciphertext alone.
However, option (B.) would require changing the internal repository layout (to accommodate the
storage of ciphertext checksums). Even if we make it backward-compatible, it is useless until a
client backs up everything again to submit the ciphertext checksums.
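
The zlib concern for option (A.) is easy to illustrate with Python's zlib module. This is only a
sketch: the sample data and the compression levels are arbitrary, and it varies the compression
level rather than the library version, but the effect on a byte-for-byte comparison is the same.

```python
import zlib

data = b"the same plaintext block " * 64

# Same input, same settings, same library build: identical output.
same_settings = zlib.compress(data, 6) == zlib.compress(data, 6)

# Same input, different compression level: the streams differ
# (the level is even reflected in the zlib header), yet both
# decompress to the original data.
fast = zlib.compress(data, 1)
best = zlib.compress(data, 9)
streams_differ = fast != best
round_trips = zlib.decompress(fast) == data and zlib.decompress(best) == data
```

So two compressed streams can both be perfectly valid for the same plaintext and still fail a
ciphertext-level compare, which is exactly what option (A.) would trip over.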

> Why even do that? The server can compute the checksum of the ciphertext by 
> itself. It could write the checksum to disk alongside the encrypted data, 
> and "bbstoreaccounts check" could verify that the encrypted block still 
> hashes to the same checksum, and thus the encrypted data was not damaged 
> on the server (however unlikely that might be).

Yes, only I augmented the idea by having the client generate the ciphertext checksums and send them
along with the ciphertext in the first place (as opposed to the server generating them once it
receives the ciphertext from the client), since that method would also catch any kind of error in
transmission, server memory, server disks, server logic, the works.
Effectively, a client says: "the plaintext checksum is what this ciphertext MEANS to me, the
ciphertext checksum is the CONTENT I expect back from you in the future".
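
A small sketch of why it matters who generates the ciphertext checksum. Everything here is
hypothetical (the helper names, SHA-256, the simulated one-bit error); it only shows that a
checksum computed by the server after a bad transfer happily matches the damaged data.

```python
import hashlib


def checksum(data: bytes) -> str:
    # SHA-256 is an assumption; any strong checksum would do.
    return hashlib.sha256(data).hexdigest()


def flip_first_bit(data: bytes) -> bytes:
    # Simulate a single-bit error somewhere between client and server media.
    return bytes([data[0] ^ 1]) + data[1:]


sent = b"opaque ciphertext bytes"
client_checksum = checksum(sent)          # computed before the data leaves the client

received = flip_first_bit(sent)           # damaged in transit or in server memory
server_checksum = checksum(received)      # computed by the server on what it received

# Server-generated checksum: matches the damaged data, so the error goes unnoticed.
server_side_ok = checksum(received) == server_checksum

# Client-generated checksum: does not match, so the error is detected.
client_side_ok = checksum(received) == client_checksum
```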

I might be over-doing this, though.

Gary

