[Box Backup] Thoughts on reliability (Was: "Box Backup - corrupted store")

Ben Summers boxbackup@fluffy.co.uk
Fri, 10 Jun 2005 11:17:22 +0100


On 9 Jun 2005, at 22:26, Gary wrote:

> Hi,
>
> A couple of thoughts on BoxBackup reliability, after reading the
> "corrupted store" thread.
>
>
>> Server is run with userland RAID disabled (...)
>> Compress TransformFailed
>>
>
> Do the "atomic" file commits used by BoxBackup apply when RAID is
> disabled?

It is still "atomic" without the RAID stuff being used.
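
For anyone unfamiliar with the idea, here is a minimal sketch of the usual way to get an "atomic" commit on a POSIX filesystem: write to a temporary file, flush it to disk, then rename it over the target. This is a general illustration in Python, not Box Backup's actual code, and the name atomic_write is hypothetical.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the file at `path` with `data` so that a reader sees either
    the complete old contents or the complete new contents."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same filesystem as the target,
    # otherwise the final rename is not atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the file data to the disk
        os.replace(tmp_path, path)  # atomic rename over the old file
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Note that this only protects against a half-written file. It does nothing for a block that goes bad on disk after the rename has completed.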

> I am running a similar setup. I am wondering how a
> server-side corruption could have happened, even with no-UPS complete
> server power loss.

It's still possible to lose file data, even with a journalled FS.  
In most designs, as I understand it, only the metadata is guaranteed  
to be consistent.

I suspect that in this case of corruption, a block got corrupted  
somehow after it was committed.

> My understanding is that Berkeley-Db should have
> rolled back automatically to the last known correct version of each
> file (along with client checksum blocks)?

Berkeley DB is only used on the client to track non-essential data.

>
> Do such circumstances imply that only the latest version is impossible
> to restore (lost), or ALL previous versions of a corrupted file?

It depends on whether the previous versions require blocks from the  
current version. So potentially all previous versions.
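
To make that concrete, here is a toy model in Python. It assumes each stored version is just a list of block references, where a reference names the version that physically holds the block. The layout, the version names and the function affected_versions are all hypothetical and much simpler than the real store format.

```python
def affected_versions(versions, bad_block):
    """Return the names of all versions that reference the damaged block."""
    return sorted(name for name, refs in versions.items() if bad_block in refs)


# Each reference is (version that holds the block, block number).
# v3 is the current version; v2 is stored as a diff that reuses two of
# v3's blocks; v1 happens to be fully self-contained.
versions = {
    "v3": [("v3", 0), ("v3", 1), ("v3", 2)],
    "v2": [("v3", 0), ("v2", 0), ("v3", 2)],
    "v1": [("v1", 0), ("v1", 1)],
}

# Damage to a shared block takes out the older version as well.
print(affected_versions(versions, ("v3", 0)))
# Damage to a block only the current version uses affects just that version.
print(affected_versions(versions, ("v3", 1)))
```

In this model v1 survives either failure only because it shares nothing with the current version.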

>
>
>> usr/local/bin/bbackupquery "compare -aq" (...)
>> doesn't write any errors
>>
>
> I think we ran into this before, in a theoretical discussion. It would
> imply that server/client block list checking does not actually
> cross-check actual data on server hard drive.

It compares MD5 checksums of each block with the index. Using a hash  
is good enough for digital signatures, so I hope it's good enough for  
this.
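
Roughly, the quick check amounts to something like the following Python sketch. It assumes fixed-size blocks and an index that is a plain list of MD5 hex digests, and it assumes the hashes are computed over the local file. The real index format differs, and the names block_digests and mismatched_blocks are made up for illustration.

```python
import hashlib
from itertools import zip_longest


def block_digests(data, block_size):
    """MD5 hex digest of each fixed-size block of `data`."""
    return [
        hashlib.md5(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def mismatched_blocks(data, index, block_size):
    """Block numbers where the data disagrees with the stored index.

    A block present on only one side also counts as a mismatch.
    """
    local = block_digests(data, block_size)
    return [
        i for i, (mine, stored) in enumerate(zip_longest(local, index))
        if mine != stored
    ]
```

Under those assumptions the check only shows that the local file matches the index. It never reads the stored block data itself, so a block that went bad on the server after the index was written would not be noticed.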

> Given "complete" compares
> are overwhelmingly time-consuming (100% download of each file), is
> there any way that we could work out to strengthen the quick check?

I suppose an MD5 of the encrypted data could be kept as well, and the  
server could verify that the file is still OK?
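
A minimal sketch of that idea in Python, assuming the checksum is kept in a sidecar file next to each stored object. The ".md5" suffix and the names store_object and verify_object are hypothetical; nothing like this exists in the server at the moment.

```python
import hashlib


def _md5_of_file(path, chunk_size=65536):
    """MD5 hex digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_object(path, encrypted):
    """Write the encrypted data and record its MD5 alongside it."""
    with open(path, "wb") as f:
        f.write(encrypted)
    with open(path + ".md5", "w") as f:
        f.write(hashlib.md5(encrypted).hexdigest())


def verify_object(path):
    """Return True if the stored data still matches its recorded MD5."""
    with open(path + ".md5") as f:
        expected = f.read().strip()
    return _md5_of_file(path) == expected
```

Because the hash is over the encrypted data, the server can run this check without any keys, and the client does not have to download anything.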

>
> Just wondering if, and what, I could get my coding into, to make the
> system more reliable and sleep better.

I must finish off my design notes for the rearrangement of the server  
store and post them here. This should provide better reliability as  
well as more features, but I would of course welcome comments from  
others!

Ben