[Box Backup] Thoughts on reliability (Was: "Box Backup - corrupted store")
Ben Summers
boxbackup@fluffy.co.uk
Fri, 10 Jun 2005 11:17:22 +0100
On 9 Jun 2005, at 22:26, Gary wrote:
> Hi,
>
> A couple of thoughts on BoxBackup reliability, after reading the
> "corrupted store" thread.
>
>
>> Server is run with userland RAID disabled (...)
>> Compress TransformFailed
>>
>
> Do the "atomic" file commits used by BoxBackup still apply when RAID is
> disabled?
It is still "atomic" without the RAID stuff being used.
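For illustration, a minimal sketch of the usual way a file commit is made
"atomic" on a POSIX filesystem: write to a temporary file, sync it, then
rename it over the target. This is an assumption about the general
technique, not the actual store code, and the name atomic_write is
hypothetical.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so readers see the old or the new file, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same filesystem as the target,
    # otherwise the final rename is not atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before the rename
        os.replace(tmp_path, path)  # atomic rename over the old file
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Note that this only protects the commit itself. It does nothing for a
block that goes bad on disk after the rename has completed.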
> I am running a similar setup. I am wondering how a
> server-side corruption could have happened, even with no-UPS complete
> server power loss.
It's still possible to lose file data, even with a journalled FS. As I
understand it, most designs only guarantee that the metadata is
consistent.
I suspect that in this case of corruption, a block got corrupted
somehow after it was committed.
> My understanding is that Berkeley-Db should have
> rolled back automatically to the last known correct version of each
> file (along with client checksum blocks)?
Berkeley DB is only used on the client to track non-essential data.
>
> Do such circumstances imply that only the latest version is impossible
> to restore (lost), or ALL previous versions of a corrupted file?
It depends on whether the previous versions require blocks from the
current version. So potentially all previous versions.
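To make the dependency concrete, here is a rough sketch under an assumed,
simplified layout: each version is a list of block references, and a
reference names the stored file that physically holds the block. The
layout and the names BlockRef and unrestorable_versions are hypothetical,
not the real store format.

```python
from typing import Dict, List, Set, Tuple

# (version_id, block_index): which stored file holds the block's data.
BlockRef = Tuple[str, int]


def unrestorable_versions(
    versions: Dict[str, List[BlockRef]], corrupt: Set[BlockRef]
) -> List[str]:
    """Return the versions that reference at least one corrupt block."""
    return sorted(
        version
        for version, refs in versions.items()
        if any(ref in corrupt for ref in refs)
    )


# Hypothetical store: v1 and v2 reuse blocks held by the current version v3.
example_versions = {
    "v1": [("v3", 0), ("v1", 1)],
    "v2": [("v3", 0), ("v3", 1)],
    "v3": [("v3", 0), ("v3", 1)],
}
```

In this example a corrupt block 0 in v3 takes out all three versions,
while a corrupt block 1 held by v1 only affects v1.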
>
>
>> usr/local/bin/bbackupquery "compare -aq" (...)
>> doesn't write any errors
>>
>
> I think we ran into this before, in a theoretical discussion. It would
> imply that server/client block list checking does not actually
> cross-check actual data on server hard drive.
It compares MD5 checksums of each block with the index. Using a hash
is good enough for digital signatures, so I hope it's good enough for
this.
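A minimal sketch of that kind of quick compare, assuming fixed-size blocks
and an index that is simply a list of MD5 hex digests. Both are
simplifications, and the function names are hypothetical.

```python
import hashlib
from typing import List


def block_digests(data: bytes, block_size: int) -> List[str]:
    """MD5 hex digest of each consecutive block of `data`."""
    return [
        hashlib.md5(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def quick_compare(local_data: bytes, index: List[str], block_size: int) -> List[int]:
    """Return the positions of blocks whose digest differs from the index."""
    local = block_digests(local_data, block_size)
    mismatched = [i for i, (a, b) in enumerate(zip(local, index)) if a != b]
    # Blocks present on only one side also count as mismatches.
    shorter, longer = sorted((len(local), len(index)))
    mismatched.extend(range(shorter, longer))
    return mismatched
```

Nothing in this sketch reads the stored block data itself. It only
compares local data against the index, which is the gap Gary describes.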
> Given "complete" compares
> are overwhelmingly time-consuming (100% download of each file), is
> there any way that we could work out to strengthen the quick check?
I suppose an MD5 of the encrypted data could be kept as well, and the
server could verify that the file is still OK?
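Roughly, the server-side check could look like the sketch below. It
assumes the digest is recorded when the file is stored and kept somewhere
alongside it. The function names are hypothetical and none of this exists
in the current server.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 hex digest of a file, read in chunks to keep memory use flat."""
    digest = hashlib.md5()
    with open(path, "rb") as stored:
        for chunk in iter(lambda: stored.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_file_is_intact(path: str, recorded_digest: str) -> bool:
    """True if the encrypted file on disk still matches the recorded digest."""
    return file_md5(path) == recorded_digest
```

Because the digest covers the encrypted bytes, the server can run this
without the client's keys and without any download.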
>
> Just wondering if, and what, I could get my coding into, to make the
> system more reliable and sleep better.
I must finish off my design notes for the rearrangement of the server
store and post them here. This should provide better reliability as
well as more features, but I would of course welcome comments from
others!
Ben