[Box Backup] Housekeeping not catching up...

Imran boxbackup@fluffy.co.uk
Mon, 11 Apr 2005 20:16:25 -0500


Hey Ben!

I had a problem in January where housekeeping wasn't catching up.  I
think the client kept connecting and interrupting housekeeping.  The
same thing has happened again.

I turned off bbackupd for the account in question (I think that's what I
did last time), but I still had three other clients running, and that
hasn't helped.  After two days the server still hasn't cleaned up the
accounts.  I also ran check+fix on it just in case.

Basically, the client in question has a lot of updates and new files, so it
fills up and frequently needs cleaning.  The other clients have a lot of files
too, but their updates are less frequent.

Will one client's connection interrupt housekeeping for all clients, or will
it only interfere with its own housekeeping?  Either way, when an account is
over its limit, the server shouldn't allow that specific client to interrupt
its own housekeeping.  I also started getting a write lock error:

Apr  9 14:53:19 backup bbstored[9274]: Certificate CN: BACKUP-030303
Apr  9 14:53:23 backup bbstored[9274]: Failed to get write lock (for Client ID
00030303)
Apr  9 14:53:23 backup bbstored[9274]: in server child, exception Connection
TLSReadFailed (Probably a network issue between client and server.) (7/34) --
terminating child

After this, the client tried again a minute later:

Apr  9 14:54:28 backup bbstored/hk[20596]: Housekeeping giving way to
connection for account 0x00030303
Apr  9 14:54:28 backup bbstored/hk[20596]: Account 0x00030303, removed 3405
blocks (1685 files, 0 dirs) was interrupted
Apr  9 14:55:03 backup bbstored[20595]: Incoming connection from
209.126.207.39 port 58171 (handling in child 9278)
Apr  9 14:55:03 backup bbstored[9278]: Certificate CN: BACKUP-030303
Apr  9 14:55:03 backup bbstored[9278]: Login: Client ID 00030303, Read/Write
Apr  9 14:55:04 backup bbstored[9278]: Session finished


So the server's housekeeping got interrupted, and the server told the client
that it doesn't have enough space, so nothing was transferred.  But the server
then has to restart housekeeping.
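
To see how often this happens, here is a rough sketch that counts the
"giving way" and write lock messages per account.  The two patterns are
copied from the log lines quoted above; the wording may differ in other
Box Backup versions, and the function name and sample list are only there for
illustration.  It expects each syslog entry on a single line (not wrapped as
in this mail).

```python
import re
from collections import Counter

# Patterns taken from the bbstored messages quoted in this mail.
# Assumption: other Box Backup versions may word these differently.
GIVING_WAY = re.compile(
    r"Housekeeping giving way to connection for account 0x([0-9a-fA-F]{8})"
)
LOCK_FAILED = re.compile(
    r"Failed to get write lock \(for Client ID ([0-9a-fA-F]{8})\)"
)


def count_events(lines):
    """Return (interruptions, lock_failures) as Counters keyed by account ID."""
    interruptions = Counter()
    lock_failures = Counter()
    for line in lines:
        match = GIVING_WAY.search(line)
        if match:
            interruptions[match.group(1).lower()] += 1
            continue
        match = LOCK_FAILED.search(line)
        if match:
            lock_failures[match.group(1).lower()] += 1
    return interruptions, lock_failures


# Three of the log entries from above, joined back onto single lines.
SAMPLE = [
    "Apr  9 14:53:23 backup bbstored[9274]: Failed to get write lock "
    "(for Client ID 00030303)",
    "Apr  9 14:54:28 backup bbstored/hk[20596]: Housekeeping giving way to "
    "connection for account 0x00030303",
    "Apr  9 14:55:03 backup bbstored[9278]: Login: Client ID 00030303, "
    "Read/Write",
]

if __name__ == "__main__":
    interruptions, lock_failures = count_events(SAMPLE)
    for account, count in interruptions.items():
        print(f"account {account}: housekeeping interrupted {count} time(s)")
    for account, count in lock_failures.items():
        print(f"account {account}: write lock failed {count} time(s)")
```

To run it on real logs, pass an open file object for wherever syslog writes
the bbstored messages in place of SAMPLE.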

You had mentioned storing additional meta-data on each file or directory,
which may help to speed up housekeeping on large accounts.  If that is not
hard to implement, that would be cool.  

Also, I think what triggered this specific backlog is that I resized several
sets of images for a client.  About 6 gigs of images were shrunk to 2 or 3
gigs.  I guess the server had to mark about 6 gigs of images (which were
anywhere from 300k to 3 megs in size) as deleted, and then receive another
two or three gigs of images (which were now about 70k to 400k in size).
While this was happening, another customer also uploaded about one or two
gigs of files to the server.

All this happened about a week ago.  The server couldn't catch up on its own
for two days.  I've just stopped all the other clients, and I'll just wait a
day or so for the housekeeping to catch up.

Oh, my platform is Fedora Core 1 for the server, P4 2.8GHz/HT, 1 gig RAM.
Three 250 gig drives in a RAID group (bbstored provides the RAID).
Currently each drive is at 62% of capacity (146 gig of 240 gig used).

And sar shows 15% user, 84% system and 0% idle cpu for the past two days... if
that matters.

:)

Imran