[Box Backup] Mailbox backup is dangerous.

Achim achim+box at qustodium.net
Wed Sep 9 14:08:30 BST 2009


Hello Ben:

On Sun, 30 Aug 2009 12:29:25 +0100, Ben Summers <ben at fluffy.co.uk> wrote:
> Housekeeping chooses files to delete by going through each version of  
> each file in the store, and for each version:
> 
>    * Adding that version to a sorted list.
> 
>    * Removing entries from the tail of that sorted list until the  
> space recovered by deleting files in the list would be just over the  
> amount of space housekeeping wants to recover.
> 
> When the scan is complete, housekeeping deletes versions, starting at  
> the head of the list, until it's recovered enough space. This is a  
> relatively memory efficient way of choosing the files to delete.
> 
> So, the key to understanding which files are deleted is how this list  
> is ordered. Ordering is defined by a standard STL comparison function,  
> and it's found here:
> 
> http://www.boxbackup.org/trac/browser/box/trunk/bin/bbstored/HousekeepStoreAccount.cpp
>    on line 612.
> 
> It would be quite easy to adjust this comparison function to prefer  
> not to remove the last version marked as "deleted", simply by ranking  
> them at the bottom of this list. The actual implementation is left as  
> an exercise for the reader, in the tradition of all the best textbooks.

OK, I understand the concept, although my limited C++ skills would make it
difficult for me to write the actual implementation.
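
For anyone following along, here is a rough sketch of the selection scheme
as I read it from the description above. It is not taken from
HousekeepStoreAccount.cpp: the Version struct, its fields, the RemoveFirst
comparator and chooseForDeletion() are all names made up for illustration,
and the ordering rule (last "deleted" version ranked at the tail, otherwise
older versions first) is only my assumption of what such a change might
look like.

```cpp
#include <cstdint>
#include <iterator>
#include <set>
#include <vector>

// Hypothetical record for one removable version in the store.
// Field names are illustrative, not the real Box Backup ones.
struct Version {
    std::int64_t objectId;
    std::int64_t sizeInBlocks;
    std::int64_t versionAge;      // 0 = newest version of a file, larger = older
    bool isLastDeletedVersion;    // last remaining copy of a file marked "deleted"
};

// Entries that compare "less" sit at the head of the list and go first.
struct RemoveFirst {
    bool operator()(const Version& a, const Version& b) const {
        // Assumed tweak: rank the last "deleted" version at the tail.
        if (a.isLastDeletedVersion != b.isLastDeletedVersion)
            return !a.isLastDeletedVersion;
        if (a.versionAge != b.versionAge)
            return a.versionAge > b.versionAge;   // older versions first
        return a.objectId < b.objectId;           // tie-break keeps the ordering strict
    }
};

// eligible: the versions housekeeping is allowed to remove.
// Returns the object IDs chosen for deletion, in deletion order.
std::vector<std::int64_t> chooseForDeletion(const std::vector<Version>& eligible,
                                            std::int64_t blocksToRecover) {
    std::set<Version, RemoveFirst> candidates;
    std::int64_t blocksInList = 0;

    for (const Version& v : eligible) {
        if (!candidates.insert(v).second)
            continue;                             // already in the list
        blocksInList += v.sizeInBlocks;

        // Trim the tail while the rest of the list still covers the target,
        // so the list never grows much beyond what is needed.
        while (!candidates.empty()) {
            auto last = std::prev(candidates.end());
            if (blocksInList - last->sizeInBlocks < blocksToRecover)
                break;
            blocksInList -= last->sizeInBlocks;
            candidates.erase(last);
        }
    }

    // Delete from the head until enough space has been recovered.
    std::vector<std::int64_t> toDelete;
    std::int64_t recovered = 0;
    for (const Version& v : candidates) {
        if (recovered >= blocksToRecover)
            break;
        toDelete.push_back(v.objectId);
        recovered += v.sizeInBlocks;
    }
    return toDelete;
}
```

With this ordering, the last copy of a deleted file is only chosen once
every other candidate has been used up and the target still has not been
met. It does not stop that copy from being removed eventually.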

My question is: does this really address the underlying issue for a
maildir that Tom described:

--- snip ---
Housekeeping removes files from the store based on the actual file date
and not on the deletion time.

So, for some reason my mail archive was deleted on my server. Each mail
in the maildir has the date it arrived on the Cyrus mail server.

I then looked in the backup and restored the archive.  But it seems
Housekeeping has cleaned up and deleted everything permanently up to
September 2009. So I lost my archive from 1998 - September 2009.
--- snap ---

> As regarding snapshots, the idea of the "marks" referenced in the  
> function would have been that implementation. When you wanted to take  
> a snapshot, simply increment the mark number for FUTURE files.  
> Housekeeping will then prefer not to delete the last file in the  
> snapshot. To restore a snapshot, just use the latest files in that  
> mark. Obviously snapshots could be implemented better, but that was  
> the plan.
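
To check my understanding of the "marks" idea, here is a small sketch of
how restoring a snapshot could work. Everything in it is hypothetical: the
StoredVersion struct, its fields and restoreSnapshot() are invented for
illustration. I have also assumed that a file unchanged since an earlier
mark should still be part of the snapshot, so the sketch picks the latest
version with a mark number less than or equal to the requested one. The
description above could also be read as "exactly that mark".

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical record for one version of a file in the store.
struct StoredVersion {
    std::string filename;
    std::int64_t objectId;
    std::int32_t markNumber;    // mark that was current when the version was uploaded
    std::int64_t uploadTime;    // larger = uploaded later
};

// For each file, pick the most recently uploaded version whose mark number
// does not exceed snapshotMark (assumption, see the text above).
// Returns a map from filename to the object ID to restore.
std::map<std::string, std::int64_t> restoreSnapshot(
        const std::vector<StoredVersion>& versions,
        std::int32_t snapshotMark) {
    std::map<std::string, StoredVersion> latest;
    for (const StoredVersion& v : versions) {
        if (v.markNumber > snapshotMark)
            continue;                       // uploaded after the snapshot was taken
        auto it = latest.find(v.filename);
        if (it == latest.end() || v.uploadTime > it->second.uploadTime)
            latest[v.filename] = v;
    }

    std::map<std::string, std::int64_t> result;
    for (const auto& entry : latest)
        result[entry.first] = entry.second.objectId;
    return result;
}
```

Taking a snapshot would then just mean incrementing the current mark
number, so that files uploaded afterwards carry the new mark.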

Sounds like a "good enough" solution to me: that way, you can restore to a
consistent snapshot of 23 March 2008 or "Latest complete Snapshot",
correct? Chris, could this serve as an interim step towards your vision
of snapshots?

> Regarding the subject of the thread, it could be written more  
> generally as "running software you don't understand is dangerous",
> and we should fix that lack of understanding in this particular case  
> through better documentation.

I think that housekeeping should probably be off by default (i.e. soft
limit == hard limit), since we cannot know whether users will delete
important files sooner or later than unimportant ones, so the order in
which "deleted" files are cleaned up by housekeeping is arbitrary anyway.

Snapshots or marks sound like a very interesting idea!

Best regards, Achim


