[Box Backup] BadBackupStoreFile
Chris Wilson
boxbackup@fluffy.co.uk
Thu, 30 Aug 2007 20:34:54 +0100 (BST)
Hi Johann,
On Wed, 29 Aug 2007, Johann Glaser wrote:
>>> We have two >2GB files in the backup store. See the bottom of a sorted
>>> listing:
>>> [...]
>>> 748651521 ./cc/02/o1f.rfw
>>> 748679825 ./72/03/o8e.rfw
>>> 748705025 ./3c/03/oa7.rfw
>>> 759350104 ./20/04/o10.rfw
>>> 759356296 ./23/06/o12.rfw
>>> 759552073 ./9e/06/o8f.rfw
>>> 759633897 ./1a/07/obc.rfw
>>> 1529736909 ./ba/o55.rfw
>>> 1539937214 ./69/01/o05.rfw
>>> 2744679666 ./16/01/o5f.rfw
>>> 5133609317 ./f2/o44.rfw
>>
>> Do you have any errors restoring or comparing the other large file?
>
> Yes, there are errors:
> query > compare -E . .
> Local file './__db.002/__db.002' has different contents to store file './__db.002'.
> Local file './__db.003/__db.003' has different contents to store file './__db.003'.
> Local file './__db.004/__db.004' has different contents to store file './__db.004'.
> Local file './__db.005/__db.005' has different contents to store file './__db.005'.
> Local file './__db.006/__db.006' has different contents to store file './__db.006'.
...
> [ 0 (of 5) differences probably due to file modifications after the last upload ]
> Differences: 5 (0 dirs excluded, 0 files excluded)
Are these errors expected, i.e. did those files change since the last
backup? The message seems to indicate that they did not, which would point
to another possible bug in 0.10. But it's also possible that Subversion or
BDB explicitly sets the timestamps on these files, which would render Box
Backup's timestamp comparison useless.
> ERROR: (4/48) during file fetch and comparsion for './strings'
> ERROR: (7/41) during file fetch and comparsion for './transactions'
> ERROR: (7/41) during file fetch and comparsion for './uuids'
The 7/41 errors are a symptom of a broken connection (loss of
synchronisation) after the comparison for ./strings failed, which is
expected (unfortunately). Could you please try to identify the other large
file and compare it separately, to see whether you get a 4/48 error? (I'd
expect so.)
> In the directory there are some more files which haven't been mentioned
> in the output above. "strings" is the only large file (7.4GB). All other
> files in this directory are <55MB.
Any idea, then, what the other file over 2GB is? (./f2/o44.rfw)
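If it helps, a listing like the one above can be regenerated with a small
script that walks the store directory on the server and reports only the
.rfw files of at least 2 GiB (2^31 bytes). The threshold is taken from the
">2GB" in your listing; it is not established here that this size is what
triggers the 4/48 error. The function name and command-line usage are
hypothetical, not part of Box Backup:

```python
import os
import sys

# Threshold taken from the ">2GB" in the listing (2 GiB); an assumption,
# not a documented Box Backup limit.
LIMIT = 2**31


def large_store_files(store_root, limit=LIMIT):
    """Return (size, "./relative/path") for each .rfw file of at least
    `limit` bytes under store_root, sorted by size (largest last)."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(store_root):
        for name in filenames:
            if not name.endswith(".rfw"):
                continue  # only store object files are of interest
            full = os.path.join(dirpath, name)
            size = os.path.getsize(full)
            if size >= limit:
                rel = os.path.relpath(full, store_root).replace(os.sep, "/")
                found.append((size, "./" + rel))
    found.sort()
    return found


# Hypothetical usage: python3 find_large.py /path/to/account/store
if __name__ == "__main__" and len(sys.argv) > 1:
    for size, rel in large_store_files(sys.argv[1]):
        print(size, rel)
```

The paths it prints are in the same form as in your listing, so they can be
fed straight into the path-to-ID conversion discussed further down.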
> PID 28114 was still running at nearly 100% CPU even after I was back at
> the bbackupquery prompt. Typing "ls" just hung. I had to kill it; just
> restarting the boxbackup-server didn't stop this task.
That's really bad, sorry. Can you reproduce this?
> Yes, indeed. But I want to state that there are cases where an exception
> is not an internal bug but another problem, e.g. that a (single) backup
> store file was deleted or its permissions changed by somebody playing
> around. Therefore there should be some fault tolerance or graceful error
> recovery, so as not to endanger the rest of the backup.
The cases that you mention should not cause an exception to be thrown, but
rather a recoverable error condition. If you think that they are aborting
the backup, then I'd really appreciate your help to find out why.
>> I agree partly, but I think that we shouldn't have to write "backup
>> continues" after every error message. It should be safe to say that if
>> you see a message saying that it stopped because of an exception, then
>> it did, otherwise it didn't. Perhaps we should document that better.
>
> Good idea.
What Box Backup documentation have you read so far? Do you have an idea
where the best place to document this would be, so that you would have
found it if it existed?
>> The path name is converted to ID by taking the two hex digits from each
>> component, reversing the order (most significant byte is the last one,
>> before ".rfw") and padding with zeroes on the left. So, for example,
>> ./cc/02/o1f.rfw is 001f02cc. (I think that's right anyway).
>
> I found it's a bit more complicated. For two levels, the above-mentioned
> file ./f2/o44.rfw belongs to ID 0000f244. For three levels the
> translation is ./xx/yy/ozz.rfw -> ID=00yyxxzz.
OK, sorry, you learn something every day :-)
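For anyone else hunting through the store, the corrected rule can be
sketched as a pair of conversion helpers. This assumes one byte (two hex
digits) per path component, with the least significant byte in the leaf
name "oZZ.rfw" and each directory level, from the top down, holding the
next more significant byte. It only reproduces the examples given here;
the function names are made up, and the case of an ID below 0x100 (no
directory levels) is an extrapolation:

```python
def object_id_to_path(object_id):
    """Store-relative path for an object ID, e.g. 0x0002cc1f -> ./cc/02/o1f.rfw."""
    leaf = object_id & 0xFF   # lowest byte -> leaf name "oXX.rfw"
    rest = object_id >> 8
    dirs = []
    while rest:
        dirs.append("%02x" % (rest & 0xFF))  # next byte -> next directory level
        rest >>= 8
    return "./" + "".join(d + "/" for d in dirs) + "o%02x.rfw" % leaf


def path_to_object_id(path):
    """Inverse of object_id_to_path, e.g. ./f2/o44.rfw -> 0xf244."""
    if path.startswith("./"):
        path = path[2:]
    parts = path.split("/")
    leaf = parts[-1]
    if not (leaf.startswith("o") and leaf.endswith(".rfw")):
        raise ValueError("not a store object file: %r" % leaf)
    object_id = int(leaf[1:-4], 16)  # leaf holds the lowest byte
    # The first directory is byte 1, the second is byte 2, and so on.
    for level, d in enumerate(parts[:-1], start=1):
        object_id |= int(d, 16) << (8 * level)
    return object_id
```

For example, `"%08x" % path_to_object_id("./f2/o44.rfw")` gives `0000f244`,
and `object_id_to_path(0x0002cc1f)` gives `./cc/02/o1f.rfw`, matching the
00yyxxzz pattern above.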
>> You can compare those IDs to the ones given in the remote directory
>> listings in bbackupquery, but unfortunately there isn't a global
>> reverse mapping so you need to manually hunt through directories to
>> find them, sorry.
>
> Hehe, that's a good point to mention a feature request. :-)
OK, added to http://bbdev.fluffy.co.uk/trac/wiki/FeatureRequests. Feel
free to add your own feature requests there too.
> For some nearly-equally-sized files in the backup store I found that
> they belong to the very same file on the client, so they represent
> different versions. When I look at the old versions with bbackupquery,
> all of them have a similarly large size.
...
> Unfortunately we back up to external storage (Iomega StorCenter 150D)
> connected via NFS over 100 Mbit/s, which is _extremely_ slow, especially
> for directory listings. So such timeouts might well be the problem.
>
> I found this option in bbackupd.conf. Which unit is used for the time?
> Seconds? Milliseconds?
Units are seconds. Where did you look for this information, i.e. where
should we improve the documentation?
> Another feature request: The backup server should (additionally) store
> checksums across large blocks of files, e.g. 1MB blocks. Then only these
> checksums need to be read from disk instead of the whole backup file.
I have a feeling that we already do, but I'm not 100% sure. Ben, do you
know?
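Whether the store format already keeps something like this is exactly the
open question, so the following is only a hypothetical illustration of the
request, not Box Backup's file format: one digest per fixed 1 MiB block, so
a comparison can read the short list of digests and pinpoint which blocks
differ instead of reading the whole file. SHA-256 is an arbitrary choice
here; any strong hash would do, and the names are made up:

```python
import hashlib
from typing import BinaryIO, List

BLOCK_SIZE = 1024 * 1024  # 1 MiB, the block size suggested in the request


def block_checksums(stream: BinaryIO, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """One SHA-256 digest per block_size chunk; the last chunk may be short."""
    digests = []
    while True:
        chunk = stream.read(block_size)
        if not chunk:
            break  # end of stream
        digests.append(hashlib.sha256(chunk).digest())
    return digests


def differing_blocks(local: List[bytes], stored: List[bytes]) -> List[int]:
    """Indices of blocks whose digests differ, including blocks that
    exist in only one of the two lists (file grew or shrank)."""
    n = max(len(local), len(stored))
    return [i for i in range(n)
            if i >= len(local) or i >= len(stored) or local[i] != stored[i]]
```

With a 1 MiB block size, the digests for a 7.4 GB file come to roughly
7,000 x 32 bytes, i.e. around 230 KB to read instead of the whole file.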
Cheers, Chris.
--
_____ __ _
\ __/ / ,__(_)_ | Chris Wilson <0000 at qwirx.com> - Cambs UK |
/ (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Perl/SQL/HTML Developer |
\ _/_/_/_//_/___/ | We are GNU-free your mind-and your software |