[Box Backup] Restore fails on symbolic link to itself

David H Kaufman boxbackup@boxbackup.org
Fri, 12 Sep 2008 16:08:52 -0400

I am running boxbackup 0.10 on Gentoo with ext3 filesystems. I did a test
restore of a backup, which restored about 17G of data (out of 42G) and then
failed with "Exception: Common OSFileError (Error accessing a file. Check
permissions.) (1/9)". This was very mysterious: how could a restore hit a
permissions error after so much data had already been restored?

strace (eventually) showed the problem:

stat("/mnt/newdisk/kaufman/Maildir/.Trash/Trash/trash", 0x7fff980fc870) = -1 ENOENT (No such file or directory)
read(3, "\27\3\1\0 ", 5) = 5
32) = 32
read(3, "\27\3\1\0p", 5) = 5
112) = 112
unlink("/mnt/newdisk/kaufman/Maildir/.Trash/Trash/trash") = 0
symlink("trash", "/mnt/newdisk/kaufman/Maildir/.Trash/Trash/trash") = 0
geteuid() = 0
lchown("/mnt/newdisk/kaufman/Maildir/.Trash/Trash/trash", 500, 500) = 0
close(4) = 0
write(1, ".", 1.) = 1
stat("/mnt/newdisk/kaufman/Maildir/.Trash/Trash/trash", 0x7fff980fc8a0) = -1 ELOOP (Too many levels of symbolic links)
close(3) = 0
brk(0x59a000) = 0x59a000
brk(0x599000) = 0x599000
write(1, "Exception: Common OSFileError (E"..., 81Exception: Common
OSFileError (Error accessing a file. Check permissions.) (1/9) ) = 81
exit_group(1) = ?

Indeed, that file is a link to itself in the source filesystem:

ls -l /home/kaufman/Maildir/.Trash/Trash/
total 0
lrwxrwxrwx 1 kaufman 500 5 2003-11-06 10:11 trash -> trash

And it had been restored as such by boxbackup:

ls -l /mnt/newdisk/kaufman/Maildir/.Trash/Trash/
total 0
lrwxrwxrwx 1 kaufman 500 5 2008-09-10 14:44 trash -> trash
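
The failing call in the trace is the stat() issued right after the symlink is created: stat() follows the link, and a link that points to itself never resolves. The following is a minimal sketch in Python (not Box Backup code) that reproduces this in a temporary directory; the function name probe_self_link is made up for illustration.

```python
import errno
import os
import tempfile

def probe_self_link():
    """Create a symlink named 'trash' that points to itself and inspect it."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "trash")
        os.symlink("trash", path)        # same shape as symlink("trash", ".../trash")
        try:
            os.stat(path)                # follows the link, which never resolves
            followed = "ok"
        except OSError as e:
            followed = errno.errorcode[e.errno]
        is_link = os.path.islink(path)   # based on lstat(), does not follow the link
        target = os.readlink(path)       # reads the link text without resolving it
        return followed, is_link, target

if __name__ == "__main__":
    print(probe_self_link())
```

On Linux this should report ELOOP for the stat() call, while the lstat()-based check and readlink() still succeed, which suggests that a check that does not follow the link would not trip over such a file.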

If I removed the restored symlink and resumed my restore, I got the same
error. If I deleted the symlink and touched a regular file in its place, I
got "Exception: BackupStore OutputFileAlreadyExists (4/8)". I didn't really
expect that to work, but I was trying to find a workaround so I could
restore the rest of my data!

Eventually, I deleted the link-loop in the source filesystem, and forced
boxbackup to sync. Then I could resume my restore, which ran to completion
and passed my other tests.
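
Finding such link loops in the source filesystem ahead of time can be scripted. This is a rough sketch, assuming that any symlink whose stat() fails with ELOOP counts as a loop; find_symlink_loops is a hypothetical helper, not part of Box Backup.

```python
import errno
import os

def find_symlink_loops(root):
    """Return sorted paths under root that are symlinks failing to resolve with ELOOP."""
    loops = []
    # os.walk does not descend into symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            try:
                os.stat(path)            # follows the link
            except OSError as e:
                if e.errno == errno.ELOOP:
                    loops.append(path)
    return sorted(loops)

if __name__ == "__main__":
    import sys
    for p in find_symlink_loops(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(p)
```

Dangling links (ENOENT) are deliberately ignored here, since the trace shows only ELOOP aborting the restore.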

Some notes, besides the actual symbolic-link-loop problem itself:
1. Printing the actual Unix error message would have shortened the
debugging cycle.
2. If I hadn't had the source filesystem handy, I don't know how I would
have fixed the problem.
3. A restore "skip-the-first-file" switch would have helped. Alternatively,
a restore "don't abort on errors" option. Or, if I could know which file
was being restored, some ability to remove it from the backup (this sounds
the most error-prone and may be a terrible idea).
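
To illustrate notes 1 and 3, here is a small sketch of a restore-like loop that reports the path and the underlying OS error text, and optionally keeps going instead of aborting. It only recreates symlinks and is purely hypothetical: restore_links and keep_going are invented names and do not correspond to any existing Box Backup function or option.

```python
import os

def restore_links(entries, keep_going=True):
    """Recreate (path, target) symlinks; return a list of (path, OS error text)."""
    failures = []
    for path, target in entries:
        try:
            if os.path.lexists(path):    # lexists() does not follow the link
                os.unlink(path)
            os.symlink(target, path)
            os.stat(path)                # follow-the-link check, as in the trace
        except OSError as e:
            # Note 1: keep the real errno text and the file name.
            failures.append((path, os.strerror(e.errno)))
            if not keep_going:           # Note 3: aborting becomes a choice
                raise
    return failures
```

With keep_going=True, a self-referential link ends up in the returned list with the system's ELOOP message and the remaining entries are still processed.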

Thanks very much,