[Box Backup] BadBackupStoreFile

Johann Glaser boxbackup@fluffy.co.uk
Wed, 29 Aug 2007 09:45:49 +0200


Hi Chris!

> > Yes, the original file is >2GB, but we have some more, even larger
> > files, which don't show this problem.
> 
> It has to be over 2GB _compressed_ to trigger the bug. Perhaps the other 
> files compress down to under 2GB?

We have two >2GB files in the backup store. See the bottom of a sorted
listing:
[...]
748651521 ./cc/02/o1f.rfw
748679825 ./72/03/o8e.rfw
748705025 ./3c/03/oa7.rfw
759350104 ./20/04/o10.rfw
759356296 ./23/06/o12.rfw
759552073 ./9e/06/o8f.rfw
759633897 ./1a/07/obc.rfw
1529736909 ./ba/o55.rfw
1539937214 ./69/01/o05.rfw
2744679666 ./16/01/o5f.rfw
5133609317 ./f2/o44.rfw
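
For reference, a minimal Python sketch of how such a listing can be
checked against the limit. It assumes that "2GB" in this thread means
the signed 32-bit limit of 2^31 - 1 bytes (the thread does not say so
exactly) and that store files end in ".rfw" as in the listing. The
names scan_store and over_limit are my own, not part of Box Backup.

```python
import os

# Assumption: "2GB" means the largest signed 32-bit value.
LIMIT = 2**31 - 1


def scan_store(root, suffix=".rfw"):
    """Return (size_in_bytes, path) for every store file below root."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                path = os.path.join(dirpath, name)
                entries.append((os.path.getsize(path), path))
    return entries


def over_limit(entries, limit=LIMIT):
    """Keep only the entries larger than limit, sorted by size."""
    return sorted(e for e in entries if e[0] > limit)
```

Applied to the last three lines of the listing above, over_limit keeps
only ./16/01/o5f.rfw and ./f2/o44.rfw.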

> > Is it sure that the backup continues after the problem with all other 
> > files? Or does the backup stop after the first error?
> 
> It should continue, but if an exception is being thrown then it might well 
> not do so as that should be an extreme case. I haven't verified the code 
> path in this case. Does it matter to you, i.e. would you be happy to 
> continue to run this version knowing that a few large files are not being 
> backed up?
> 
> You can tell pretty easily from the backup logs. If it says "Caught 
> exception - reset state and waiting to retry" then it did not complete the 
> backup run, but aborted because of the exception.

I see. This error message is not in our logs.

I think it is vital for all users of a backup tool that the backup
runs as reliably and completely as possible. Therefore it should
continue even if some "small" errors occur, and only notify the user of
them. It should also be stated clearly (either in the documentation or,
better, in syslog) whether an error is "small" enough for the backup to
continue or whether the backup run was stopped. By "clearly" I mean
that the log should explicitly say "Backup continues" or "Backup
stopped", instead of the above exception message, where one needs to
know its meaning and implications.
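
To illustrate what I mean, here is a small Python sketch that turns
the exception message quoted above into a plain verdict. It only knows
about that one message, so the absence of the message is no proof that
a run really completed. The function name and the two verdict strings
are hypothetical; they are my suggestion, not existing bbackupd output.

```python
# The message quoted earlier in this mail, joined onto one line.
ABORT_MARKER = "Caught exception - reset state and waiting to retry"


def backup_verdict(log_lines):
    """Return an explicit status for one backup run's log lines."""
    for line in log_lines:
        if ABORT_MARKER in line:
            # The run was aborted because of the exception.
            return "Backup stopped"
    # Marker not found: as far as this check can tell, the run went on.
    return "Backup continues"
```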

> > How can I find out which files in the backup store belong to which 
> > original files?
> 
> Sorry, I don't know exactly what you mean here, does bbackupquery's list 
> command tell you what you want?

In our backup store we have files with hex numbers in their names, e.g.
"./3c/05/o95.rfw". How can I find out which file on the backup client
such a store file belongs to? Or, the other way round: how can I find
out which backup store files belong to a particular file on the client?

> > Your second paragraph suggests that excluding files from the backup by 
> > adding an exclusion-statement in bbackupd.conf will remove all backups 
> > of this file. Is this true?
> 
> No, it's only that when bbackupd uploads a new version, it does not send a 
> patch against deleted versions, so it will upload a fresh copy and avoid 
> triggering the bug.

Ah, I see.

Another question: In our backup store we have lots of files with
identical size but different md5sums, e.g.
532869699 ./3d/05/o6a.rfw
532869699 ./42/04/of3.rfw
532869699 ./46/06/o24.rfw
532869699 ./8d/04/o89.rfw
532869699 ./94/03/o45.rfw
532869699 ./96/o81.rfw
532869699 ./a6/07/o00.rfw
532869699 ./c2/06/o38.rfw
532869699 ./ca/05/oa9.rfw
532869699 ./d8/04/o19.rfw
There are several such runs. I assume that the files in one run all
belong to the same original file on the client. So my question is: if a
large file changes slightly (e.g. gets a bit longer, or a few
(kilo)bytes are modified), will the backup store then hold only the
difference (like SVN), or complete copies of both the old and the new
version of the file?
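
In case it helps to reproduce this, the runs of identical sizes can be
found with a few lines of Python. This is only a sketch that assumes
the "size path" format of the listings in this mail; the function
names are my own. Equal size alone does not prove that two store files
come from the same original file.

```python
from collections import defaultdict


def parse_listing(text):
    """Parse lines of the form '<size> <path>' into (size, path) pairs."""
    entries = []
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].isdigit():
            entries.append((int(parts[0]), parts[1].strip()))
    return entries


def runs_of_equal_size(entries, min_count=2):
    """Map each size seen at least min_count times to its sorted paths."""
    by_size = defaultdict(list)
    for size, path in entries:
        by_size[size].append(path)
    return {
        size: sorted(paths)
        for size, paths in by_size.items()
        if len(paths) >= min_count
    }
```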

This is important for us to know, because currently we intentionally
do not compress backups of database dumps (MySQL, SVN), so that the
diff algorithm can find the smallest possible differences.

Thanks
  Hansi

-- 
Johann Glaser                          <glaser@ict.tuwien.ac.at>
             Institute of Computer Technology, E384
Vienna University of Technology, Gusshausstr. 27-29, A-1040 Wien
Phone: ++43/1/58801-38444                Fax: ++43/1/58801-38499