[Box Backup] Issue with Incremental backup of mail folder
Chris Wilson
boxbackup@fluffy.co.uk
Sun, 29 Apr 2007 22:48:26 +0100 (BST)
Hi Imran,
On Sun, 29 Apr 2007, Imran Niazi wrote:
> Anyway, all the emails of a particular user were deleted, approximately
> 8 days ago. I thought I could get at least one or two old backups
> of that folder, but the file only shows once, and it's about 24 hours old.
Did you try listing with the -o option (which shows old versions of files)?
> (I assume it's still under MaxUploadWait.) I assumed that we would be
> able to get to an older version of a file, but it would also mean there
> would be as many versions of that file as there would be days. But it
> also means that if a file is truncated, we have no backups of the file.
Truncation or replacement of the file should not have any effect on the
way that backups work, i.e. the number of old versions which are kept. The
only things that will affect that are the size of the increments (daily
changes) and the available space on the store for this account (versus the
amount of data in current files, i.e. how much space is available for
storing old versions of files).
> Or I'm hoping there is a low level way of finding out previous versions
> of the file and possibly un-diffing the file?
If the old version is present on the store (shown with ls -o) then you can
just restore it with "get -i" and its object ID. If it's not shown, then
I'm afraid the case is pretty hopeless (unless you have a backup of the
store itself at an earlier date).
> I guess the way it works is, that since the inode didn't change, the
> backup thinks that all modifications are the same file and the original
> can be discarded (or maybe the client behaves like that?)
No, we only check the inode number to determine if files have been
renamed. Box Backup treats in-place modification of a file exactly the
same as deleting and rewriting it.
> I guess if it's not possible to get the old versions, then it would be
> cool to have it such that a user can specify a certain date & time, and
> the backup would give you the state of a file at that time.
Yes, that would be a cool feature. Boxi has something like that, where you
can restore a file "as of" a specific time, and you will get the last
version backed up before that time.
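
As a rough illustration of the "as of" selection described above (a sketch
only, not Boxi's actual code): given the old versions still on the store,
pick the newest one whose timestamp is at or before the requested time. The
Version type, its fields, and the function name are hypothetical, and the
sketch assumes the timestamp compared is each version's last modification
date.

```python
from datetime import datetime
from typing import NamedTuple, Optional, Sequence


class Version(NamedTuple):
    """Hypothetical record for one stored version of a file."""
    object_id: int       # assumed: the store's object ID for this version
    modified: datetime   # assumed: last modification time of this version


def version_as_of(versions: Sequence[Version], when: datetime) -> Optional[Version]:
    """Return the newest version modified at or before `when`, or None."""
    candidates = [v for v in versions if v.modified <= when]
    # max() with default=None handles the case where nothing is old enough.
    return max(candidates, key=lambda v: v.modified, default=None)
```

If no version is old enough, the function returns None, which corresponds
to the case where the wanted version is no longer on the store.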
> Also there would be an option to give a time range, where if a file
> changed after the time specified, but still in the time/date range. To
> do this functionality, however, you'd have to track a timeline plus
> diffs for each file.
I think that we do this already, i.e. we know the last modification date
(not the upload date) of each "old version" still on the store.
> There would be a resource/info file for each file tracked/backed up, that
> would list the initial date of backup, the date the next backup ran, and
> the diff of it.
We don't quite do that already, since we don't track the actual times of
backup runs, but what I described above should hopefully be enough for
what you need.
> If the space requirements are going to be very large for this feature
> then there should also be a configurable number of days/months that it
> keeps this state information.
At the moment, we keep it as long as we keep the reverse diff (i.e. the
old version of the file) itself. I can't see a reason for keeping it for
more or less time than that.
> In its housekeeping, it will remove the deltas from earlier than those
> days, and update the initial state to be the state on that date, as well
> as change the 'initial date' in the resource/info file.
The algorithm to decide what is removed from the store is unfortunately
quite complicated, and I don't understand it well. There have been
proposals to store the "entire state" of the remote system at a particular
time (i.e. snapshot time) as a single entity that would be deleted in its
entirety or not at all. That would allow one to implement a policy like
"keep the last 14 days' worth of backups". Unfortunately, nobody has had
time to implement that in Box Backup yet.
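
To make the proposed policy concrete, here is a minimal sketch of how whole
snapshots might be split into kept and expired sets. Box Backup does not
implement this; the function name, the representation of snapshots as a
list of timestamps, and the keep_days parameter are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple


def split_snapshots(
    snapshot_times: Sequence[datetime],
    now: datetime,
    keep_days: int = 14,
) -> Tuple[List[datetime], List[datetime]]:
    """Split snapshots into (keep, expire) by age.

    Each snapshot is treated as a single entity: it is either kept in its
    entirety or expired in its entirety, never partially removed.
    """
    cutoff = now - timedelta(days=keep_days)
    keep = [t for t in snapshot_times if t >= cutoff]
    expire = [t for t in snapshot_times if t < cutoff]
    return keep, expire
```

The point of the sketch is only that, once snapshots are atomic, the
retention decision reduces to a single date comparison per snapshot.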
rdiff-backup does something like that already, but it's not encrypted. For
what it's worth, I currently use rdiff-backup for all of my backups (but
never to untrusted machines).
I hope this helps,
Cheers, Chris.
--
_____ __ _
\ __/ / ,__(_)_ | Chris Wilson <0000 at qwirx.com> - Cambs UK |
/ (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Perl/SQL/HTML Developer |
\ _/_/_/_//_/___/ | We are GNU-free your mind-and your software |