[Box Backup] Mailbox backup is dangerous.

Ben Summers ben at fluffy.co.uk
Sat Sep 12 13:30:54 BST 2009


Achim wrote:

>
> On Sun, 30 Aug 2009 12:29:25 +0100, Ben Summers <ben at fluffy.co.uk> wrote:
>> Housekeeping chooses files to delete by going through each version of
>> each file in the store, and for each version:
>>
>>   * Adding that version to a sorted list.
>>
>>   * Removing entries from the tail of that sorted list until the
>> space recovered by deleting files in the list would be just over the
>> amount of space housekeeping wants to recover.
>>
>> When the scan is complete, housekeeping deletes versions, starting at
>> the head of the list, until it's recovered enough space. This is a
>> relatively memory efficient way of choosing the files to delete.
>>
>> So, the key to understanding which files are deleted is how this list
>> is ordered. Ordering is defined by a standard STL comparison function,
>> and it's found here:
>>
>>   http://www.boxbackup.org/trac/browser/box/trunk/bin/bbstored/HousekeepStoreAccount.cpp
>>   on line 612.
>>
>> It would be quite easy to adjust this comparison function to prefer
>> not to remove the last version marked as "deleted", simply by ranking
>> them at the bottom of this list. The actual implementation is left as
>> an exercise for the reader, in the tradition of all the best textbooks.
>
> OK, I understand the concept, although my limited C++ skills would make it
> difficult for me to write the actual implementation.
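Ben's description of the selection pass can be sketched in standalone C++ roughly as follows. This is an illustration under assumptions, not the code in HousekeepStoreAccount.cpp: the `Version` struct, its fields, `DeletePreference` and `ChooseVersionsToDelete` are hypothetical names, and ordering by modification time is only a placeholder for the real comparison function.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <set>
#include <vector>

// Hypothetical description of one stored version of a file.
struct Version {
    int64_t objectId;    // hypothetical unique ID
    int64_t modTime;     // placeholder sort key
    int64_t sizeBlocks;  // space recovered by deleting this version
};

// Placeholder ordering: true if we would rather delete a than b.
// Older versions are preferred; the ID breaks ties.
struct DeletePreference {
    bool operator()(const Version& a, const Version& b) const {
        if (a.modTime != b.modTime) return a.modTime < b.modTime;
        return a.objectId < b.objectId;
    }
};

// Scan every version, keeping a sorted candidate list that holds only
// enough versions to recover just over `target` blocks.
std::vector<Version> ChooseVersionsToDelete(const std::vector<Version>& all,
                                            int64_t target) {
    assert(target >= 0);
    std::multiset<Version, DeletePreference> candidates;
    int64_t candidateBlocks = 0;
    for (const Version& v : all) {
        candidates.insert(v);
        candidateBlocks += v.sizeBlocks;
        // Drop entries from the tail while the rest would still recover the target.
        while (!candidates.empty()) {
            auto last = std::prev(candidates.end());
            if (candidateBlocks - last->sizeBlocks < target) break;
            candidateBlocks -= last->sizeBlocks;
            candidates.erase(last);
        }
    }
    // Delete from the head of the list until enough space is recovered.
    std::vector<Version> toDelete;
    int64_t recovered = 0;
    for (const Version& v : candidates) {
        if (recovered >= target) break;
        toDelete.push_back(v);
        recovered += v.sizeBlocks;
    }
    return toDelete;
}
```

Because the list is trimmed after every insertion, its total size stays below the target plus the size of a single version, which is where the memory efficiency comes from.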

I'll check any code, and of course the results will be checked by your
accompanying test.

There's a comment in the code about what the function must return.
Basically, return true if you would rather delete the x entry than the
y entry.
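A comparison function following that contract, which ranks the last remaining version of a deleted file at the bottom of the list, might look like this. The `VersionEntry` type and its fields (in particular `isLastVersionOfDeletedFile`) are hypothetical; the real entry type and comparator in HousekeepStoreAccount.cpp differ, and a real patch would need to compute that flag during the scan.

```cpp
#include <cstdint>

// Hypothetical entry; not the real type used in HousekeepStoreAccount.cpp.
struct VersionEntry {
    int64_t objectId;
    int64_t modTime;
    bool isLastVersionOfDeletedFile;  // hypothetical flag computed during the scan
};

// Return true if we would rather delete x than y.
// Ordering: unprotected entries first, then older modification time, then ID.
constexpr bool PreferToDelete(const VersionEntry& x, const VersionEntry& y) {
    if (x.isLastVersionOfDeletedFile != y.isLastVersionOfDeletedFile) {
        // The last copy of a deleted file sorts after everything else.
        return !x.isLastVersionOfDeletedFile;
    }
    if (x.modTime != y.modTime) {
        return x.modTime < y.modTime;
    }
    return x.objectId < y.objectId;  // tie-break keeps the ordering strict
}
```

Since this is a lexicographic comparison, it is a strict weak ordering and can be passed to `std::sort`, or wrapped in a function object for use as a `std::set` ordering.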

>
> My question is: does this really address the underlying issue for a
> maildir that Tom described:
>
> --- snip ---
> Housekeeping removes files from the store based on the actual file date
> and not on the deletion time.
>
> For some reason, my mail archive was deleted on my server. Each mail
> in the maildir has the date it arrived on the Cyrus mail server.
>
> I then looked in the backup and restored the archive. But it seems
> housekeeping had cleaned up and permanently deleted everything up to
> September 2009. So I lost my archive from 1998 to September 2009.
> --- snap ---

No, it does not address the issue that no deletion time is available.
However, it goes a long way towards mitigating it. The on-disc formats
should be extensible enough to add a deletion time without losing
backwards compatibility, though.
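To illustrate what "extensible enough" could mean, here is a sketch of a versioned record in which newer writers append a deletion time and readers treat it as optional, so records written in the old layout still decode. The layout, `EntryRecord`, `Encode`, `Decode` and `AgeKey` are all hypothetical and are not Box Backup's actual on-disc format. It assumes C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical record layout, NOT Box Backup's real on-disc format:
//   byte 0     : layout version (1 or 2)
//   bytes 1-8  : modification time, little-endian
//   bytes 9-16 : deletion time, little-endian (version 2 only)
struct EntryRecord {
    int64_t modTime = 0;
    std::optional<int64_t> deleteTime;  // absent in version 1 records
};

void PutInt64(std::vector<uint8_t>& out, int64_t value) {
    const uint64_t u = static_cast<uint64_t>(value);
    for (int i = 0; i < 8; ++i) out.push_back(static_cast<uint8_t>(u >> (8 * i)));
}

int64_t GetInt64(const std::vector<uint8_t>& in, std::size_t& pos) {
    if (in.size() < pos + 8) throw std::runtime_error("truncated record");
    uint64_t u = 0;
    for (int i = 0; i < 8; ++i) u |= static_cast<uint64_t>(in[pos + i]) << (8 * i);
    pos += 8;
    return static_cast<int64_t>(u);
}

// Writers only use the longer layout when there is a deletion time to store.
std::vector<uint8_t> Encode(const EntryRecord& r) {
    std::vector<uint8_t> out;
    out.push_back(static_cast<uint8_t>(r.deleteTime ? 2 : 1));
    PutInt64(out, r.modTime);
    if (r.deleteTime) PutInt64(out, *r.deleteTime);
    return out;
}

// New readers accept both layouts; version 1 records simply have no deletion time.
EntryRecord Decode(const std::vector<uint8_t>& in) {
    if (in.empty()) throw std::runtime_error("empty record");
    const uint8_t version = in[0];
    if (version != 1 && version != 2) throw std::runtime_error("unknown layout");
    std::size_t pos = 1;
    EntryRecord r;
    r.modTime = GetInt64(in, pos);
    if (version == 2) r.deleteTime = GetInt64(in, pos);
    return r;
}

// Age key for housekeeping: deletion time when known, else modification time.
int64_t AgeKey(const EntryRecord& r) {
    return r.deleteTime.value_or(r.modTime);
}
```

With a key like `AgeKey`, a file deleted last week would age from last week rather than from its original arrival date, which is the behaviour Tom expected. Only new readers are shown here; letting old readers skip the extra field as well would require them to ignore unknown trailing data.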

>
>> As regards snapshots, the "marks" referenced in the function were
>> intended to be that implementation. When you wanted to take a snapshot,
>> you would simply increment the mark number for FUTURE files.
>> Housekeeping would then prefer not to delete the last file in the
>> snapshot. To restore a snapshot, just use the latest files in that
>> mark. Obviously snapshots could be implemented better, but that was
>> the plan.
>
> Sounds like a "good enough" solution to me: that way, you could restore to a
> consistent snapshot of 23 March 2008, or to the "latest complete snapshot",
> correct? Chris, could this be an interim step towards your vision of
> snapshots?
>
>> Regarding the subject of the thread, it could be written more
>> generally as "running software you don't understand is dangerous",
>> and we should fix that lack of understanding in this particular case
>> through better documentation.
>
> I think that housekeeping should probably be off by default (i.e. soft
> limit == hard limit), since we cannot know whether users will delete
> important files sooner or later than unimportant ones, so the order in which
> "deleted" files are cleaned up by housekeeping is arbitrary anyway.
>
> Snapshots or marks sound like a very interesting idea!
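Achim's soft limit == hard limit suggestion can be made concrete with a small model. It assumes, without checking against the bbstored source, that the store refuses uploads which would take an account over its hard limit and that housekeeping only tries to recover the space used above the soft limit; `AccountLimits`, `CanStore` and `BlocksToRecover` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical model of an account's limits, in blocks.
struct AccountLimits {
    int64_t softLimit;
    int64_t hardLimit;
};

// Assumption: an upload is refused if it would exceed the hard limit.
bool CanStore(const AccountLimits& limits, int64_t usedBlocks, int64_t newBlocks) {
    return usedBlocks + newBlocks <= limits.hardLimit;
}

// Assumption: housekeeping only tries to recover space above the soft limit.
int64_t BlocksToRecover(const AccountLimits& limits, int64_t usedBlocks) {
    assert(limits.softLimit <= limits.hardLimit);
    return std::max<int64_t>(0, usedBlocks - limits.softLimit);
}
```

Under those assumptions, with equal limits uploads can never push usage above the soft limit, so `BlocksToRecover` stays at zero and a full store rejects new data instead of discarding old versions.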

The marks aren't a complete solution, as housekeeping can still remove
files from a mark. So you might get a slightly later version of a file
when you restore.
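The mark scheme can be sketched per file as below. `MarkedVersion`, `VersionForSnapshot` and `LastInMark` are hypothetical; the sketch assumes each file's surviving versions are listed oldest first, with marks that never decrease. The fallback to a later version is one reading of the caveat above, for the case where housekeeping has already removed every version up to the requested mark.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical version record: `mark` is the mark current at upload time.
struct MarkedVersion {
    int64_t versionId;  // increases with upload order
    int64_t mark;
};

// Choose which surviving version of one file to restore for `snapshotMark`.
std::optional<MarkedVersion> VersionForSnapshot(const std::vector<MarkedVersion>& versions,
                                                int64_t snapshotMark) {
    // Precondition: versions are oldest first, so marks never decrease.
    assert(std::is_sorted(versions.begin(), versions.end(),
                          [](const MarkedVersion& a, const MarkedVersion& b) {
                              return a.mark < b.mark;
                          }));
    std::optional<MarkedVersion> best;
    for (const MarkedVersion& v : versions) {
        if (v.mark <= snapshotMark) best = v;  // latest version at or before the mark
    }
    if (best) return best;
    // Assumption: if housekeeping removed everything up to the mark,
    // fall back to the earliest surviving (slightly later) version.
    if (!versions.empty()) return versions.front();
    return std::nullopt;
}

// Flags the last version uploaded under each mark: the ones housekeeping
// would prefer to keep (but, as noted above, may still delete).
std::vector<bool> LastInMark(const std::vector<MarkedVersion>& versions) {
    std::vector<bool> keep(versions.size(), false);
    for (std::size_t i = 0; i < versions.size(); ++i) {
        keep[i] = (i + 1 == versions.size()) || versions[i + 1].mark != versions[i].mark;
    }
    return keep;
}
```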

Ben