[Box Backup] Store corruption not detected or fixed by bbstoreaccounts

Chris Wilson boxbackup@boxbackup.org
Sat, 7 Feb 2009 23:45:59 +0000 (GMT)


Hi Alex,

Sorry for the delay in replying to your email.

On Sat, 24 Jan 2009, Alex Harper wrote:

>> Is it deep in directories? Could you copy it and its parent directories to
>> a new account? Otherwise can you leave it in the account that it's in now
>> for a few days?
>
> New discoveries (see below) make me believe I should keep the whole thing as
> an exemplar. I've solved the problem by finding enough space to replicate
> the account.

Excellent.

> On the original account with the original crypto error I reported, another
> error has cropped up on a different file:
>
> WARNING: zlib error code = -3
> WARNING: Exception thrown: CompressException(TransformFailed) at
> ../../lib/compress/Compress.h(154)
>
> Looking back at captured output this error was reported in the same compare
> -a output as the original crypto error, but because it wasn't a fatal error
> I overlooked it. However, I actually needed to restore the file yesterday
> and the restore failed.

Sorry to hear that. Do you run compare -a regularly?

> I then ran compare -a on another user's account and discovered the exact
> same error on a similar data file. In the case of the second account the
> compression exception was fatal. I'm still not sure why it wasn't fatal on
> my account.

It might depend on the code path, as the exception might be caught on some 
paths and not others. If it's not caught then it will be fatal.
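
To illustrate the idea, here is a minimal sketch in Python rather than Box 
Backup's own C++. The function names compare_path and restore_path are 
made up for the example and do not correspond to anything in the Box 
Backup source. It relies on zlib's error code -3 being Z_DATA_ERROR, i.e. 
the compressed stream itself is damaged. One path catches the error and 
downgrades it to a warning, the other has no handler:

```python
import zlib


def corrupt(blob: bytes) -> bytes:
    """Invert the first byte so the zlib stream header no longer validates."""
    return bytes([blob[0] ^ 0xFF]) + blob[1:]


def compare_path(blob: bytes) -> str:
    """Hypothetical 'compare' path: the error is caught and reported."""
    try:
        zlib.decompress(blob)
        return "ok"
    except zlib.error as exc:
        # Caught here, so the caller only sees a warning string.
        return f"WARNING: {exc}"


def restore_path(blob: bytes) -> bytes:
    """Hypothetical 'restore' path: no handler, so the error propagates."""
    return zlib.decompress(blob)


good = zlib.compress(b"mail database block " * 100)
bad = corrupt(good)

print(compare_path(bad))      # reported, but execution continues
try:
    restore_path(bad)
except zlib.error as exc:     # only the outermost caller sees it
    print("fatal:", exc)
```

The same damaged data gives a warning on one path and a failure on the 
other, which would be consistent with what you saw.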

> In both my account and the other user account the newly discovered 
> corrupt file is a large (1.5GB or more) mail database that Box is 
> backing up while open but largely quiescent. The files are from the same 
> program (Entourage 2008) on two different laptops of two different 
> architectures (PPC vs x86). In both cases this is a hot file with a 
> large number of revisions.
>
> Since the files are so structurally similar and updated similarly I'm 
> willing to believe that this is a problem with the diff patch. Or maybe 
> a problem with recipe generation in general.

Is it possible for you to reproduce this problem, e.g. by setting up a 
dummy Entourage database and sending dummy emails to it while backing it 
up?
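
If setting up Entourage itself is awkward, something like the following 
might be a rough stand-in. It is only a sketch under the assumption that 
what matters is a large file being rewritten in place many times. It does 
not imitate the Entourage database format, and the names (make_dummy_db, 
mutate, sha256_of, BLOCK) are invented for the example:

```python
import hashlib
import os
import random

BLOCK = 4096  # assumed block size for in-place updates; purely illustrative


def make_dummy_db(path: str, blocks: int) -> None:
    """Create a file of `blocks` random blocks to stand in for the database."""
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(os.urandom(BLOCK))


def mutate(path: str, rng: random.Random, changes: int = 8) -> None:
    """Overwrite a few randomly chosen blocks in place, keeping the size fixed."""
    blocks = os.path.getsize(path) // BLOCK
    with open(path, "r+b") as f:
        for _ in range(changes):
            f.seek(rng.randrange(blocks) * BLOCK)
            f.write(os.urandom(BLOCK))


def sha256_of(path: str) -> str:
    """Checksum of the whole file, to compare against a later restore."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()
```

The idea would be to call mutate() between backup runs, note the 
sha256_of() value for each revision, and after a large number of 
revisions run compare -a and restore the file to see whether the checksum 
still matches.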

> It's also possible that the problem is with the filesystem, as a part of 
> diagnosing this I discovered a frayed cable to the drive. That said, 
> fsck is fine, other non-bbstored data on the drive checksums correctly, 
> and it seems suspicious that the corruption is so similar.

If this were the case then I'd expect you to be seeing corrupted files and 
directories, filesystem errors, etc. So it wouldn't be the first avenue 
that I'd investigate.

Do you back up other Entourage databases onto a different system? Do you 
also see problems there?

> The argument against filesystem issues or cosmic rays would be that 
> since these are not the only hot files, why are they having such similar 
> errors?

Agreed. Is it only Entourage databases that are suffering this problem?

> If you have any diagnostics you want me to run I'm willing. Because I'm 
> cutting over to my secondary backup as the primary now and shutting down 
> bbstored, there's no longer much time pressure, so I can keep the account 
> around for a while if it helps.

Please can you try to reproduce the problem in some way?

Cheers, Chris.
-- 
_____ __     _
\  __/ / ,__(_)_  | Chris Wilson <0000 at qwirx.com> - Cambs UK |
/ (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Ruby/Perl/SQL Developer |
\__/_/_/_//_/___/ | We are GNU : free your mind & your software |