[Box Backup] Hard Links and rdiff

Chris Wilson boxbackup@fluffy.co.uk
Sun, 13 Apr 2008 03:44:29 +0100 (BST)


Hi John,

On Sat, 12 Apr 2008, John Goerzen wrote:

> Until I started doing some tests.  I noticed that Box Backup has
> completely mangled all hard links.  I can do a backup, and when I do a
> restore, it unpacks a separate copy of each file that was hard linked
> together. 

Umm yeah, I can imagine why it would do that :-) Sorry.

> This is a showstopper for me.  In addition to making Box Backup 
> unsuitable for backing up the entire system (due to hardlinks in /bin, 
> /sbin, /usr/bin, etc.)

It is definitely not designed for backing up entire systems, only user 
data that is not hard-linked.
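
If you want to see how much of a tree is affected, a rough Python sketch 
like the one below would list the hard-linked files in it. This is not 
part of Box Backup; find_hard_link_groups is a name made up for this 
example, and it assumes a POSIX-style filesystem where os.lstat reports 
st_nlink, st_dev and st_ino.

```python
import os
import stat
from collections import defaultdict


def find_hard_link_groups(root):
    """Return {(device, inode): [paths]} for regular files under root
    that share an inode with at least one other file under root."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symlinks
            # Only regular files whose link count says another name exists.
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                groups[(st.st_dev, st.st_ino)].append(path)
    # A file may have its other links outside root; keep only the groups
    # where we actually saw more than one name.
    return {key: sorted(paths) for key, paths in groups.items()
            if len(paths) > 1}


if __name__ == "__main__":
    import sys
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for paths in find_hard_link_groups(root).values():
        print(" == ".join(paths))
```

Each group it prints would come back from a restore as separate, 
independent copies.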

> it also steps on the toes of people that use distributed version control 
> systems like Git, Mercurial, or Darcs.

Does Mercurial really use hard links? That would be a bummer; it would 
never work on Windows or on filesystems that don't support hard links.

I can believe that Git would do that because, well, because of the name... 
:-) No idea about Darcs, I'll take your word for it.

> Is there a configuration option somewhere to preserve hard links?

Unfortunately not (yet).

> Are there any other POSIX attributes that Box Backup may unexpectedly 
> not restore?  Does it preserve mtime, atime

It does preserve mtime and atime (although the value of preserving atime 
is approximately zero in my book, because you only have to sneeze to reset 
it).

> ctime

Preserving ctime is actually impossible without hacking the filesystem. 
Any change to the inode, including an attempt to change its ctime, would 
change its ctime. That's POSIX.
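
You can see this for yourself with a few lines of Python. The sketch 
below is only an illustration, not anything from Box Backup: it puts a 
file's atime and mtime back to their old values, the way a restore would, 
and looks at what happens to ctime. The function name is invented, and it 
assumes a POSIX system (on Windows st_ctime means something else).

```python
import os
import tempfile
import time


def stat_before_and_after_utime(path):
    """Reset atime/mtime to their current values and return the stat
    results from before and after the call."""
    before = os.stat(path)
    time.sleep(0.1)  # make sure the clock has visibly moved on
    # Writing the timestamps is itself a change to the inode.
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))
    after = os.stat(path)
    return before, after


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as tmp:
        before, after = stat_before_and_after_utime(tmp.name)
        print("mtime unchanged:", before.st_mtime_ns == after.st_mtime_ns)
        print("ctime moved on: ", after.st_ctime_ns > before.st_ctime_ns)
```

The mtime comes back exactly as requested, but the ctime is whatever the 
kernel decided it was at the moment of the utime call.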

> symlinks

Are preserved.

> block and character devices, FIFO locations

Are not backed up at all.

> EAs, and ACLs?

POSIX EAs and ACLs are supported if and only if your Box Backup client 
(bbackupd) was compiled with support for them. Unfortunately there's no 
way to tell if that's the case with a binary package right now, although I 
can see that it would be a good idea to add one.

> Also, I tried in vain to find some details on its rdiff algorithm.

See docs/backup/encrypt_rsync.txt.

> Does it play rdiffs "backwards" like rdiff-backup, or forwards like
> duplicity?  In the "backwards" scheme, the "full" data is always the
> most current, and you follow a chain of deltas backwards to older
> versions.  This makes restores of recent versions easy, and also
> allows removal of the oldest data without breaking chains.

It is backwards, one of the biggest advantages over duplicity in my view 
(otherwise I'd quite possibly be using and working on duplicity instead).

> duplicity goes the other way -- taking full backups, and storing binary 
> deltas going forward.  This removes the need to keep an accessible copy 
> of the full recent data, at the expense of making restores of recent 
> data slower and imposing some limits on the removal of older data.

Restore time of much-changed files tends to infinity. Not good. And how 
exactly do you remove old versions? Don't you have to rewrite huge archive 
files? Does that work well over the Internet? (I doubt it).
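
To make the difference concrete, here is a toy sketch in Python of the 
"backwards" scheme. It is emphatically not Box Backup's on-store format 
or its real diff algorithm (see the document mentioned above for that); 
it uses difflib from the standard library as a stand-in, and the class 
and function names are invented. The latest version is kept whole, and 
each older version is stored as a delta from the version after it.

```python
from difflib import SequenceMatcher


def make_delta(src, dst):
    """Describe how to build dst out of src: copy ranges plus new data."""
    ops = []
    matcher = SequenceMatcher(None, src, dst, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))      # reuse src[i1:i2]
        else:
            ops.append(("data", dst[j1:j2]))  # literal data from dst
    return ops


def apply_delta(src, delta):
    """Rebuild the target string from src and a delta from make_delta."""
    parts = []
    for op in delta:
        if op[0] == "copy":
            parts.append(src[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)


class ReverseDeltaStore:
    """Latest version stored in full; older ones as backward deltas."""

    def __init__(self):
        self.latest = None
        self.back_deltas = []  # oldest first; last one is latest -> previous

    def add(self, new):
        if self.latest is not None:
            # Keep only what is needed to get from the new version
            # back to the one it replaces.
            self.back_deltas.append(make_delta(new, self.latest))
        self.latest = new

    def restore(self, age=0):
        """age=0 is the latest version, age=1 the one before, and so on."""
        data = self.latest
        for delta in list(reversed(self.back_deltas))[:age]:
            data = apply_delta(data, delta)
        return data

    def prune_oldest(self):
        """Drop the oldest version; no other version depends on it."""
        self.back_deltas.pop(0)


store = ReverseDeltaStore()
for text in ("a\nb\n", "a\nb\nc\n", "a\nx\nc\n"):
    store.add(text)
```

Restoring the latest version needs no patching at all, and removing the 
oldest version is just deleting one delta. In a forward chain the oldest 
version is the base that everything else is built on, which is why you 
cannot simply drop it.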

> Is the rdiff algorithm applied on a per-file basis or on the entire 
> backup file as a whole?

Per file.

> If it's on a per-file basis, does it identify a file by name or by 
> device/inode number?

It depends. It normally matches by name, but it also tracks the inode 
number, for use when nothing on the store matches by name. That helps 
reduce backup bandwidth usage when files are renamed, but doesn't help if 
they are copied.
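
The lookup order amounts to something like the following Python sketch. 
The function, the two dictionaries and the object IDs are all 
hypothetical, invented to illustrate the idea; the real client code is 
structured differently.

```python
def choose_diff_base(name, inode, store_by_name, store_by_inode):
    """Pick the stored object to diff a local file against, or None.

    store_by_name maps file name -> stored object ID, and
    store_by_inode maps the inode number recorded at the last backup
    -> stored object ID.
    """
    # Normal case: something with the same name is already on the store.
    if name in store_by_name:
        return store_by_name[name]
    # Fallback: no match by name, but this inode was seen before,
    # so the file has probably been renamed.
    return store_by_inode.get(inode)


# One file backed up earlier as report.txt, inode 5551, stored as object 101.
by_name = {"report.txt": 101}
by_inode = {5551: 101}

renamed = choose_diff_base("report-final.txt", 5551, by_name, by_inode)
copied = choose_diff_base("report-copy.txt", 7777, by_name, by_inode)
```

A renamed file keeps its inode, so it still finds its old object and only 
a diff needs to be uploaded. A copy gets a new inode and a new name, so 
neither lookup matches and it is uploaded in full.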

Cheers, Chris.
-- 
_____ __     _
\  __/ / ,__(_)_  | Chris Wilson <0000 at qwirx.com> - Cambs UK |
/ (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Ruby/Perl/SQL Developer |
\ _/_/_/_//_/___/ | We are GNU : free your mind & your software |