[Box Backup] Mailbox backup is dangerous.

Achim boxbackup@boxbackup.org
Sun, 30 Aug 2009 00:22:28 +0200


Dear list:

(Apologies in advance for a lengthy post, but this is an important issue 
to me.)

I agree that losing your old data is terrible, especially since the
related "functionality" is apparently not very clear in the
documentation. You might get bitten by this in the way Tom did.

At the same time, I believe that this case touches a fundamental issue 
where we need to distinguish between Backup/Disaster Recovery (DR) and 
Archiving.

We usually advise our clients to use different strategies for the two
use cases, since the requirements are very different. For us, the
distinction is usually between live projects with ongoing work, which
tend to fall into the backup scenario, and older projects that have been
finalised and handed over, which fall into the archival scenario.

Backup/DR requires high storage capacity (many versions of files), good
network performance (in case we really need to restore), and low storage
cost per GB (since we will need a lot of space due to point 1).

For archiving needs, the focus is more on data authenticity, media
longevity, and reasonable cost of ownership.

Without trying to sound patronising, in our setup the mails that Tom
lost (1998 - September 2009) would have at least partially (probably
until first half 2008) migrated into the archive, rather than staying
inside the backup schedule.

At the same time, I still agree that deleting the single (MBOX?) file
one day and not being able to recover it from a *backup* solution the
next day certainly sounds like a bug: perhaps not in the code, but
definitely in terms of meeting user expectations.

I cannot imagine that such a scenario is "what the software is
supposed to do" if we are talking about a backup solution.

As for Edo's idea:

On 29/08/2009 10:00, scartomail wrote:
> Ok, this is me thinking out loud with no real C++ programming skills.
> BB does 3 things:
> - backup: this seems to work
> - housekeeping: we don't really know what it does and it sometimes
>   deletes files
> - restore: this seems to work
>
> Let's say housekeeping is currently doing 10 things. Why don't we
> throw away the housekeeping part and replace it with a new
> housekeeping that just does 1 or 2 things that we know it does well
> and that are essential for BB to function. We create that by copying
> and pasting the housekeeping stuff we know and understand.
>
> This way we have something we understand, that is stable, and we can
> always add functionality to it as we go along.

I would agree that turning off housekeeping is a first step. An
automated way to determine what data the user does or does not want is
probably not very effective, unless the BB algorithm has somehow managed
to evolve into a being of higher consciousness. Our users usually don't
know themselves what they want, so outsourcing that decision to an
algorithm does not sound very sensible.

Without housekeeping, the store will of course eventually fill up and
backups will stop, but we could probably issue a warning when the store
is, e.g., 80% full (like an e-mail quota), so that the user and the admin
can do something about the situation (delete stuff they really don't
want, add storage space, migrate to archive).

At 100% storage full, I believe the pretty powerful message of "Data is 
not being backed up" will make users look for solutions pretty soon, and 
otherwise it is their problem, not the administrator's.
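
To make the idea concrete, here is a rough sketch of the check I have in
mind. It is written in Python rather than BB's C++, and every name in it
(StoreStatus, check_store_usage, status_message, the 80% default) is made
up for illustration; none of it is taken from the Box Backup code base.

```python
from enum import Enum


class StoreStatus(Enum):
    """Hypothetical states for a store without automatic housekeeping."""
    OK = "ok"
    WARNING = "warning"
    FULL = "full"


def check_store_usage(used_bytes: int, limit_bytes: int,
                      warn_percent: int = 80) -> StoreStatus:
    """Classify store usage against a warning threshold and a hard limit.

    Integer arithmetic is used so that the threshold comparison is exact.
    """
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    if used_bytes >= limit_bytes:
        return StoreStatus.FULL
    if used_bytes * 100 >= warn_percent * limit_bytes:
        return StoreStatus.WARNING
    return StoreStatus.OK


def status_message(status: StoreStatus) -> str:
    """Return the message shown to the user and the admin (assumed wording)."""
    if status is StoreStatus.FULL:
        return "Data is not being backed up: the store is full."
    if status is StoreStatus.WARNING:
        return ("Store is nearly full: delete unwanted data, add storage "
                "space, or migrate old data to the archive.")
    return "Store usage is within limits."
```

The point of the design is that nothing is ever deleted automatically:
the software only reports the state, and the decision about what to
remove stays with the user and the admin.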

In the current situation, an admin might do everything correctly (like in
Tom's case), the user might do everything correctly and even delete the
occasional file (that's why we have backup, right?), and still important
data gets deleted (like in Tom's case) by an automated and apparently
not very well understood function.

Again, this last scenario certainly worries me.