[Box Backup] Issue with Incremental backup of mail folder

Chris Wilson boxbackup@fluffy.co.uk
Sun, 29 Apr 2007 22:48:26 +0100 (BST)


Hi Imran,

On Sun, 29 Apr 2007, Imran Niazi wrote:

> Anyway, all the emails of a particular user were deleted, approximately 
> 8 days ago.  I thought I could get to at least one or two old backups 
> of that folder, but the file only shows once, and it's about 24 hours old.

Did you try listing with the -o option (which shows old versions of files)?

> (I assume it's still under MaxUploadWait.)  I assumed that we would be 
> able to get to an older version of a file, but it would also mean there 
> would be as many versions of that file as there would be days.  But it 
> also means that if a file is truncated, we have no backups of the file.

Truncation or replacement of the file should not have any effect on the 
way that backups work, i.e. the number of old versions which are kept. The 
only things that will affect that are the size of the increments (daily 
changes) and the available space on the store for this account (versus the 
amount of data in current files, i.e. how much space is available for 
storing old versions of files).

> Or I'm hoping there is a low level way of finding out previous versions 
> of the file and possibly un-diffing the file?

If the old version is present on the store (shown with ls -o) then you can 
just restore it with "get -i" and its object ID. If it's not shown, then 
I'm afraid the case is pretty hopeless (unless you have a backup of the 
store itself at an earlier date).

> I guess the way it works is, that since the inode didn't change, the 
> backup thinks that all modifications are the same file and the original 
> can be discarded (or maybe the client behaves like that?)

No, we only check the inode number to determine if files have been 
renamed. Box Backup treats in-place modification of a file, and deletion 
followed by rewriting, in exactly the same way.

> I guess if it's not possible to get the old versions, then it would be 
> cool to have it such that a user can specify a certain date & time, and 
> the backup would give you the state of a file at that time.

Yes, that would be a cool feature. Boxi has something like that, where you 
can restore a file "as of" a specific time, and you will get the last 
version backed up before that time.
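
As a rough illustration of that kind of "as of" lookup (a sketch only, not 
Boxi's actual code; the Version record, its fields and version_as_of are 
names made up for this example, and it assumes each stored version carries 
a timestamp that can be compared with the requested time):

```python
from datetime import datetime
from typing import NamedTuple, Optional, Sequence


class Version(NamedTuple):
    """Hypothetical record for one stored version of a file."""
    object_id: int       # illustrative stand-in for a store object ID
    modified: datetime   # timestamp recorded for this version


def version_as_of(versions: Sequence[Version],
                  when: datetime) -> Optional[Version]:
    """Return the newest version dated at or before `when`, or None."""
    candidates = [v for v in versions if v.modified <= when]
    # max() with default=None copes with the "nothing old enough" case.
    return max(candidates, key=lambda v: v.modified, default=None)
```

If no version is old enough, the function returns None, which corresponds 
to the case where nothing restorable remains for that date.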

> Also there would be an option to give a time range, where if a file 
> changed after the time specified, but still in the time/date range.  To 
> do this functionality, however, you'd have to track a timeline plus 
> diffs for each file.

I think that we do this already, i.e. we know the last modification date 
(not the upload date) of each "old version" still on the store.
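
To make the time-range idea concrete, here is a minimal sketch that picks 
out the old versions whose last modification date falls inside a given 
range. The Version record and versions_in_range are hypothetical names 
for illustration, not part of Box Backup:

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple


class Version(NamedTuple):
    """Hypothetical record for one old version still on the store."""
    object_id: int       # illustrative stand-in for a store object ID
    modified: datetime   # last modification date (not the upload date)


def versions_in_range(versions: Iterable[Version],
                      start: datetime,
                      end: datetime) -> List[Version]:
    """Return versions modified within [start, end], oldest first."""
    matching = [v for v in versions if start <= v.modified <= end]
    return sorted(matching, key=lambda v: v.modified)
```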

> There would be a resource/info file for each file tracked/backed up, that 
> would list the initial date of backup, the date the next backup ran, and 
> the diff of it.

We don't quite do that at present, since we don't track the actual times of 
backup runs, but what I described above should hopefully be enough for 
what you need.

> If the space requirements are going to be very large for this feature 
> then there should also be a configurable number of days/months that it 
> keeps this state information.

At the moment, we keep it as long as we keep the reverse diff (i.e. the 
old version of the file) itself. I can't see a reason for keeping it for 
more or less time than that.

> In its housekeeping, it will remove the deltas from earlier than those 
> days, and update the initial state to be the state on that date, as well 
> as change the 'initial date' in the resource/info file.

The algorithm to decide what is removed from the store is unfortunately 
quite complicated, and I don't understand it well. There have been 
proposals to store the "entire state" of the remote system at a particular 
time (i.e. snapshot time) as a single entity that would be deleted in its 
entirety or not at all. That would allow one to implement a policy like 
"keep the last 14 days' worth of backups". Unfortunately, nobody has had 
time to implement that in Box Backup yet.
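
Purely as a sketch of what such a policy might look like if snapshots 
existed as whole units (they do not in Box Backup today, so 
expired_snapshots and its parameters are entirely hypothetical):

```python
from datetime import datetime, timedelta
from typing import Iterable, List


def expired_snapshots(snapshot_times: Iterable[datetime],
                      now: datetime,
                      keep_days: int = 14) -> List[datetime]:
    """Return the snapshot times older than `keep_days`, oldest first.

    Each snapshot is treated as a single entity: it is either listed
    for deletion in its entirety or kept untouched.
    """
    cutoff = now - timedelta(days=keep_days)
    return sorted(t for t in snapshot_times if t < cutoff)
```

Anything returned would be deleted whole; everything else would be kept, 
which is what makes the retention period easy to reason about.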

rdiff-backup does something like that already, but it's not encrypted. For 
what it's worth, I currently use rdiff-backup for all of my backups (but 
never to untrusted machines).

I hope this helps,

Cheers, Chris.
-- 
_____ __     _
\  __/ / ,__(_)_  | Chris Wilson <0000 at qwirx.com> - Cambs UK |
/ (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Perl/SQL/HTML Developer |
\ _/_/_/_//_/___/ | We are GNU-free your mind-and your software |