[Box Backup] Housekeeping not catching up...

Ben Summers boxbackup@fluffy.co.uk
Tue, 12 Apr 2005 09:47:07 +0100


On 12 Apr 2005, at 02:16, Imran wrote:

> Hey Ben!
>
> I had a problem in January where housekeeping wasn't catching up. I
> think the client kept connecting and interrupting housekeeping. The
> same thing happened again.
>
> I turned off bbackupd for the account in question (I think that's what
> I had done before), but I still had three other clients running, which
> hasn't helped. After two days the server still hasn't cleaned up the
> accounts. I also ran check+fix on it just in case.
>
> Basically, the client in question has a lot of updates and new files,
> so it fills up and frequently needs cleaning. The other clients have a
> lot of files too, but their updates are less frequent.

The current implementation of housekeeping does not work well for all
usage patterns, and this is one of the patterns where it is sub-optimal.

The next version will have a rewritten housekeeper which doesn't need
to scan everything before deleting. The move to reference-counted
objects will make a big difference.
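As an illustration of why reference counting helps, here is a minimal C++ sketch, assuming a simplified store in which every object records how many directory entries refer to it. The class and method names (RefCountedStore, Add, AddRef, Release) are hypothetical and are not Box Backup's actual code or on-disk format. The point is only that an object can be freed the moment its last reference goes away, rather than after a scan of the whole account.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <unordered_map>

// Hypothetical illustration only: not Box Backup's store format or API.
// Each stored object (e.g. a file version) carries a count of the
// directory entries that refer to it, plus its size in blocks.
class RefCountedStore {
public:
    // Record a new object with a single reference.
    void Add(std::int64_t objectId, std::int64_t sizeInBlocks) {
        assert(sizeInBlocks >= 0);
        const bool inserted =
            objects_.emplace(objectId, Entry{1, sizeInBlocks}).second;
        if (!inserted) {
            throw std::logic_error("object already exists");
        }
        usedBlocks_ += sizeInBlocks;
    }

    // Another directory entry now refers to an existing object.
    // Throws std::out_of_range if the object is unknown.
    void AddRef(std::int64_t objectId) { ++objects_.at(objectId).refCount; }

    // A directory entry stops referring to the object. When the count
    // reaches zero the object is freed immediately, so no scan of the
    // whole account is needed to discover that it is unreferenced.
    // Returns the number of blocks freed (0 if still referenced).
    std::int64_t Release(std::int64_t objectId) {
        auto it = objects_.find(objectId);
        if (it == objects_.end()) {
            throw std::logic_error("unknown object");
        }
        assert(it->second.refCount > 0);
        if (--it->second.refCount > 0) {
            return 0;  // still referenced elsewhere, nothing to free yet
        }
        const std::int64_t freed = it->second.sizeInBlocks;
        usedBlocks_ -= freed;
        objects_.erase(it);
        return freed;
    }

    bool Contains(std::int64_t objectId) const {
        return objects_.count(objectId) != 0;
    }
    std::int64_t UsedBlocks() const { return usedBlocks_; }

private:
    struct Entry {
        std::int64_t refCount;
        std::int64_t sizeInBlocks;
    };
    std::unordered_map<std::int64_t, Entry> objects_;
    std::int64_t usedBlocks_ = 0;
};
```

In this sketch the work done when something is deleted is proportional to the references released, not to the size of the account.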

>
> Will one client's connection interrupt housekeeping for all clients, or
> will it only interfere with its own housekeeping?

A connection will only interrupt the housekeeping for its own account.
All others will be unaffected. (You can see this in the logs.)
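The per-account behaviour described above can be sketched as a loop over accounts that checks, before each unit of work, whether a client for that particular account is waiting to connect. This is a hypothetical C++ illustration, not the bbstored implementation: RunHousekeeping, pendingWork and connectionWaiting are invented names, and one "unit" stands in for deleting one block.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical sketch: not bbstored's actual housekeeping code.
struct AccountResult {
    int accountId;
    int blocksRemoved;
    bool interrupted;
};

// pendingWork(id): units of deletion work outstanding for account `id`.
// connectionWaiting(id): true if a client for account `id` wants to connect.
std::vector<AccountResult> RunHousekeeping(
        const std::vector<int>& accountIds,
        const std::function<int(int)>& pendingWork,
        const std::function<bool(int)>& connectionWaiting) {
    std::vector<AccountResult> results;
    for (int id : accountIds) {
        AccountResult result{id, 0, false};
        const int work = pendingWork(id);
        assert(work >= 0);
        for (int unit = 0; unit < work; ++unit) {
            // Only a connection for this account interrupts this account's pass.
            if (connectionWaiting(id)) {
                result.interrupted = true;
                break;  // give way; remaining work is left for a later pass
            }
            ++result.blocksRemoved;  // stand-in for deleting one block
        }
        results.push_back(result);
    }
    return results;
}
```

In the sketch, an interrupted account simply stops early while the remaining accounts are processed as normal.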

> Anyway, when it's over the limit, the server shouldn't allow that
> specific client to interrupt its own housekeeping. I also started
> getting a write lock error:
>
> Apr  9 14:53:19 backup bbstored[9274]: Certificate CN: BACKUP-030303
> Apr  9 14:53:23 backup bbstored[9274]: Failed to get write lock (for Client ID 00030303)
> Apr  9 14:53:23 backup bbstored[9274]: in server child, exception Connection TLSReadFailed (Probably a network issue between client and server.) (7/34) -- terminating child

You shouldn't see that.

>
> After this, the client tried again a minute later:
>
> Apr  9 14:54:28 backup bbstored/hk[20596]: Housekeeping giving way to connection for account 0x00030303
> Apr  9 14:54:28 backup bbstored/hk[20596]: Account 0x00030303, removed 3405 blocks (1685 files, 0 dirs) was interrupted
> Apr  9 14:55:03 backup bbstored[20595]: Incoming connection from 209.126.207.39 port 58171 (handling in child 9278)
> Apr  9 14:55:03 backup bbstored[9278]: Certificate CN: BACKUP-030303
> Apr  9 14:55:03 backup bbstored[9278]: Login: Client ID 00030303, Read/Write
> Apr  9 14:55:04 backup bbstored[9278]: Session finished
>
>
> And the server's housekeeping got interrupted, and the server told the
> client that it doesn't have enough space, so nothing was transferred.
> But the server still has to restart housekeeping.
>
> You had mentioned storing additional metadata on each file or
> directory, which may help to speed up housekeeping on large accounts.
> If that is not hard to implement, that would be cool.

No, it's not terribly hard. To get all the features that everyone 
wants, I'm going to have to rewrite a small portion of the backup 
engine. It'll fix all the complaints, and get things working for more 
usage patterns than I originally designed the thing for.

>
> Also, I think what triggered this specific backlog is that I resized
> several sets of images for a client. About 6 GB of images were shrunk
> to 2 or 3 GB. I guess the server had to mark about 6 GB of images
> (which were anywhere from 300 KB to 3 MB in size) as deleted, and then
> another two or three GB of images (now about 70 KB to 400 KB each) had
> to be uploaded. While this happened, another customer also uploaded
> about one or two GB of files to the server.

That is probably going to take a while to clean up.

>
> All this happened about a week ago. The server couldn't catch up on its
> own for two days. I've just stopped all the other clients, and I'll just
> wait a day or so for the housekeeping to catch up.

The only way at the moment, I'm afraid.

Ben