[Box Backup] Thoughts on reliability (Was: "Box Backup - corrupted store")

Gary boxbackup@fluffy.co.uk
Tue, 21 Jun 2005 02:54:50 -0700 (PDT)


Ben,

> 1) Write to temp files, then move into place to commit. Everything  
> 2) Careful ordering of operations, exception handlers and destructors
> 3) Fault tolerance -- expect things to go wrong. If something could  

Sounds like a good plan, but since a backup of electronic property is
often one of a company's most valuable assets, I think employing a
transactional file system (through an API) would be the best way to
go. I would be happy to look into this, but are you planning to change
the backend code substantially in the next version? Would that work
be, effectively, "lost"?
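
For reference, point 1 above (write to a temp file, then move it into
place to commit) can be sketched roughly as follows. This is only an
illustration in Python, not Box Backup's actual code; the function name
atomic_write and the ".tmp-" prefix are made up for the example, and it
assumes a POSIX file system where a rename within one directory is
atomic.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so readers see the old or the new file, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory so the rename never crosses file systems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # force the data to disk before committing
        os.replace(tmp_path, path)  # the commit point: one atomic rename
    except BaseException:
        # Roll back: the target is untouched, so only the temp file needs removing.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

A crash before the rename leaves at worst a stray temp file; a
transactional file system would extend the same guarantee to updates
that span several files.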

> every few requests. If the server child process terminates  
> unexpectedly, it will be out of date. So the code which allocates a  
> new ID assumes that it may be out of date, and the housekeeping  
> routine will correct out of date space information.

Actually, that one gave me quite a run-around a few times, until I
proved to myself that it was harmless. Oh well, I guess I am the
extremely paranoid type :).

> You would have to trust the server to report correct results. Either 
> a checksum of the encrypted data could be added, but perhaps more
> space efficiently a single checksum for the entire store file could  
> be added. There may be issues with calculating this when patches are 

Exactly. If a server-content checksum is block-based and does not
"care" about the content of a file, then it should work fine - all we
need to verify is that what's stored on the server hard drive is
exactly what has been uploaded and corresponds to a set of client
blocks.
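
A minimal sketch of what I mean by a content-agnostic, block-based
check, in Python. Everything here is an assumption for illustration:
the 4096-byte block size, the choice of SHA-256, and the function names
are not taken from the Box Backup store format. The digests are
computed over the stored (encrypted) bytes only, so no client key is
involved.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, not Box Backup's


def block_digests(path, block_size=BLOCK_SIZE):
    """Return one SHA-256 hex digest per fixed-size block of the stored file."""
    digests = []
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            digests.append(hashlib.sha256(block).hexdigest())
    return digests


def corrupted_blocks(path, expected, block_size=BLOCK_SIZE):
    """Return the indices of blocks whose digest differs from the recorded list."""
    actual = block_digests(path, block_size)
    bad = [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]
    # A length mismatch means blocks were lost or appended; report those too.
    shorter, longer = sorted((len(actual), len(expected)))
    bad.extend(range(shorter, longer))
    return bad
```

The expected list would be recorded when the upload completes, so a
later mismatch points at the exact blocks that changed on the server
disk.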
  
> I would prefer a challenge-response system, where the client 
> challenges a server with some data, and it replies with some more  
> which could only be calculated if the server held a proper copy of  
> the file.

I beg to differ here. I think it would be much more valuable to confirm
server content integrity without any access to client data. A backup
server admin should not have to rely on his or her users reporting
server corruption, but should be able to cron a verification process
every, say, 12 hours, and get notified immediately by
e-mail/beeper/SMS if something has gone wrong with the store (which
might require another client backup ASAP).
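
The kind of server-side job I have in mind could look something like
the Python sketch below. It is purely hypothetical: the JSON manifest
(relative path mapped to a SHA-256 hex digest, written at upload time),
the function names, and the command-line layout are all assumptions,
not anything bbstored provides today.

```python
import hashlib
import json
import os


def file_digest(path):
    """SHA-256 of a stored file, read in chunks so large files stay cheap."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_store(store_dir, manifest_path):
    """Return the sorted relative paths that are missing or no longer match."""
    with open(manifest_path) as f:
        manifest = json.load(f)  # assumed layout: {relative path: hex digest}
    failures = []
    for rel, expected in manifest.items():
        full = os.path.join(store_dir, rel)
        if not os.path.isfile(full) or file_digest(full) != expected:
            failures.append(rel)
    return sorted(failures)


def main(argv):
    """argv = [program, store_dir, manifest]; returns 1 if anything is corrupt."""
    bad = verify_store(argv[1], argv[2])
    for rel in bad:
        print("CORRUPT", rel)
    return 1 if bad else 0
```

Run from cron via sys.exit(main(sys.argv)), the job prints nothing when
the store is intact, so with MAILTO set the admin only gets mail when a
file fails the check.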

Gary



		