[Box Backup] Thoughts on reliability (Was: "Box Backup - corrupted store")
Gary
boxbackup@fluffy.co.uk
Tue, 21 Jun 2005 02:54:50 -0700 (PDT)
Ben,
> 1) Write to temp files, then move into place to commit. Everything
> 2) Careful ordering of operations, exception handlers and destructors
> 3) Fault tolerance -- expect things to go wrong. If something could
Sounds like a good plan, but considering the fact that a backup of
electronic property is often one of the most valuable company assets, I
think employing a transactional file system (through an API) would be
the best way to go. I would be happy to look into this, but are you
planning to change the backend code substantially in the next version?
Would that work be, effectively, "lost"?
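
For reference, the "write to temp files, then move into place" commit
from point 1 can be sketched in a few lines of Python. This is only an
illustration of the general technique, not Box Backup's actual code;
the function name atomic_write is made up for the example.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` so readers see either the old or the new content."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory so the final rename
    # never crosses a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before committing
        os.replace(tmp_path, path)  # the commit point
    except BaseException:
        # On any failure the original file is untouched; drop the temp file.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A transactional file system API would extend the same guarantee to
groups of files, which a single rename cannot provide.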
> every few requests. If the server child process terminates
> unexpectedly, it will be out of date. So the code which allocates a
> new ID assumes that it may be out of date, and the housekeeping
> routine will correct out of date space information.
Actually, that one gave me quite a run-around a few times, until I
proved to myself that it was harmless. Oh, well, I guess I am the
extremely paranoid type :).
> You would have to trust the server to report correct results. Either
> a checksum of the encrypted data could be added, but perhaps more
> space efficiently a single checksum for the entire store file could
> be added. There may be issues with calculating this when patches are
Exactly. If a server-content checksum is block-based and does not
"care" about the content of a file, then it should work fine - all we
need to verify is that what's stored on the server hard drive is
exactly what has been uploaded and corresponds to a set of client
blocks.
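
A rough sketch of such a content-agnostic, block-based check, in
Python. The 4096-byte block size, the use of SHA-256 and the function
names are all assumptions for illustration; nothing here reflects the
real store file format.

```python
import hashlib
from typing import List

BLOCK_SIZE = 4096  # assumed block size, chosen only for the example


def block_digests(stored: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Digest each fixed-size block of the (encrypted) stored bytes."""
    return [
        hashlib.sha256(stored[i:i + block_size]).hexdigest()
        for i in range(0, len(stored), block_size)
    ]


def corrupted_blocks(stored: bytes, recorded: List[str],
                     block_size: int = BLOCK_SIZE) -> List[int]:
    """Return indices of blocks that differ from the recorded digests."""
    current = block_digests(stored, block_size)
    bad = [i for i, (now, then) in enumerate(zip(current, recorded))
           if now != then]
    # Blocks that are missing or unexpected also count as corrupt.
    shorter, longer = sorted((len(current), len(recorded)))
    bad.extend(range(shorter, longer))
    return bad
```

The digests would be recorded when the data is uploaded. Since they
cover only the encrypted bytes, the check needs no client keys.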
> I would prefer a challenge-response system, where the client
> challenges a server with some data, and it replies with some more
> which could only be calculated if the server held a proper copy of
> the file.
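
For concreteness, the challenge-response idea quoted above could look
roughly like the Python sketch below. It assumes the client can still
produce the same encrypted bytes (or has precomputed some
challenge/response pairs) to check the answer against; the function
names are invented for the example.

```python
import hashlib
import hmac
import os


def make_challenge() -> bytes:
    """Client side: a fresh random challenge for every check."""
    return os.urandom(32)


def respond(challenge: bytes, stored: bytes) -> bytes:
    """Server side: HMAC over the stored bytes, keyed by the challenge.

    The answer cannot be precomputed or replayed, because it depends on
    both the challenge and every byte of the stored data.
    """
    return hmac.new(challenge, stored, hashlib.sha256).digest()


def verify(challenge: bytes, response: bytes, reference: bytes) -> bool:
    """Client side: recompute the answer from a trusted reference copy."""
    expected = hmac.new(challenge, reference, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)
```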
I beg to differ here: I think it would be much more valuable to confirm
server content integrity without any access to client data. A backup
server admin should not have to rely on his or her users reporting
server corruption, but should be able to cron a verification process
every, say, 12 hours, and get notified immediately over
e-mail/beeper/SMS if something has gone wrong with the store (which
might require another client backup ASAP).
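
A minimal sketch of such a server-only verification pass, in Python.
It assumes a hypothetical JSON manifest mapping each store file's
relative path to a SHA-256 digest recorded at upload time; Box Backup
has no such manifest that I know of, so treat the format and the names
as placeholders.

```python
import hashlib
import json
import os
from typing import List


def file_digest(path: str) -> str:
    """SHA-256 of a file, read in chunks to keep memory use flat."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def scan_store(store_dir: str, manifest_path: str) -> List[str]:
    """Return manifest entries that are missing or no longer match."""
    with open(manifest_path) as f:
        manifest = json.load(f)  # {relative path: hex digest}, assumed
    problems = []
    for rel, recorded in sorted(manifest.items()):
        full = os.path.join(store_dir, rel)
        if not os.path.isfile(full) or file_digest(full) != recorded:
            problems.append(rel)
    return problems


def main(argv: List[str]) -> int:
    """Print each problem and return a non-zero status if any exist."""
    problems = scan_store(argv[1], argv[2])
    for rel in problems:
        print("CORRUPT OR MISSING:", rel)
    return 1 if problems else 0
```

Run from cron, the printed lines and the exit status are enough to
drive a notification, since cron typically mails any job output to the
crontab owner.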
Gary