[Box Backup] possible bug

Alex Howansky boxbackup@fluffy.co.uk
Tue, 15 Feb 2005 15:30:39 -0600 (CST)


I'm having problems backing up a directory that has a number of large (500+
meg) files in it. I have a number of other directories also marked for backup,
and they are being backed up just fine -- at least up until the point where the
scan gets to this particular directory, then it bombs. If I use ExcludeDir in
the config file to avoid the directory where these large files are, the backups
work just fine.

When it bombs, it fails with exception 7/47 and then continually retries. So,
I'm getting a constant barrage of huge transfers with nothing to show for it.

The config file has all the default interval values.

Here is a portion of the log on the server side when the failure happens:

...
Feb 15 13:53:00 scn05 bbstored[11866]: Sending stream, size 306
Feb 15 13:53:00 scn05 bbstored[11866]: Receive GetBlockIndexByName(0x6d6,OPAQUE)
Feb 15 13:53:00 scn05 bbstored[11866]: Receive GetBlockIndexByName(0x6d6,OPAQUE)
Feb 15 13:53:00 scn05 bbstored[11866]: Send Success(0x885)
Feb 15 13:53:00 scn05 bbstored[11866]: Send Success(0x885)
Feb 15 13:53:00 scn05 bbstored[11866]: Sending stream, size 121596
Feb 15 13:53:21 scn05 bbstored[11866]: Receive StoreFile(0x6d6,0x3f01fd3ab4000,0x4fdb654e3825653,0x0,OPAQUE)
Feb 15 13:53:21 scn05 bbstored[11866]: Receive StoreFile(0x6d6,0x3f01fd3ab4000,0x4fdb654e3825653,0x0,OPAQUE)
Feb 15 13:53:21 scn05 bbstored[11866]: Receiving stream, size uncertain
Feb 15 13:54:58 scn05 bbstored[11866]: Send Error(0x3e8,0xb)
Feb 15 13:54:58 scn05 bbstored[11866]: Send Error(0x3e8,0xb)
Feb 15 13:54:58 scn05 bbstored[11866]: Receive SetClientStoreMarker(0x3f02c23ae8880)
Feb 15 13:54:58 scn05 bbstored[11866]: Receive SetClientStoreMarker(0x3f02c23ae8880)
Feb 15 13:54:58 scn05 bbstored[11866]: Send Success(0x3f02c23ae8880)
Feb 15 13:54:58 scn05 bbstored[11866]: Send Success(0x3f02c23ae8880)
Feb 15 13:54:58 scn05 bbstored[11866]: Receive Finished()
Feb 15 13:54:58 scn05 bbstored[11866]: Receive Finished()
Feb 15 13:54:58 scn05 bbstored[11866]: Session finished
Feb 15 13:54:58 scn05 bbstored[11866]: Send Finished()
Feb 15 13:54:58 scn05 bbstored[11866]: Send Finished()

And on the client side:

...
Feb 15 13:53:00 scn08 bbackupd[22194]: Receiving stream, size 306
Feb 15 13:53:00 scn08 bbackupd[22194]: Send GetBlockIndexByName(0x6d6,"20050214")
Feb 15 13:53:00 scn08 bbackupd[22194]: Receive Success(0x885)
Feb 15 13:53:00 scn08 bbackupd[22194]: Receiving stream, size 121596
Feb 15 13:53:21 scn08 bbackupd[22194]: Send StoreFile(0x6d6,0x3f01fd3ab4000,0x4fdb654e3825653,0x0,"20050214")
Feb 15 13:53:21 scn08 bbackupd[22194]: Sending stream, size uncertain
Feb 15 13:54:58 scn08 bbackupd[22194]: Receive Error(0x3e8,0xb)
Feb 15 13:54:58 scn08 bbackupd[22194]: Send SetClientStoreMarker(0x3f02c23ae8880)
Feb 15 13:54:58 scn08 bbackupd[22194]: Receive Success(0x3f02c23ae8880)
Feb 15 13:54:58 scn08 bbackupd[22194]: Send Finished()
Feb 15 13:54:58 scn08 bbackupd[22194]: Receive Finished()
Feb 15 13:54:58 scn08 bbackupd[22194]: Exception caught (7/47), reset state and waiting to retry...

After adding that ExcludeDir and resyncing to catch up on the past week of
missed files, I removed the ExcludeDir and tried again, knowing that it would
only try the large files since everything else was now up to date. This time, I
got:

Feb 15 13:20:18 scn08 bbackupd[22194]: Send GetBlockIndexByName(0x6d6,"20050214")
Feb 15 13:20:18 scn08 bbackupd[22194]: Receive Success(0x885)
Feb 15 13:20:18 scn08 bbackupd[22194]: Receiving stream, size 121596
Feb 15 13:20:39 scn08 bbackupd[22194]: Send StoreFile(0x6d6,0x3f01fd3ab4000,0x4fdb654e3825653,0x0,"20050214")
Feb 15 13:20:39 scn08 bbackupd[22194]: Sending stream, size uncertain
Feb 15 13:22:19 scn08 bbackupd[22194]: Receive Error(0x3e8,0xb)
Feb 15 13:22:19 scn08 bbackupd[22194]: About to notify administrator about event store-full, running script '/usr/local/etc/box/bbackupd/NotifySysadmin.sh store-full'
Feb 15 13:22:19 scn08 bbackupd[22194]: Send SetClientStoreMarker(0x3f02baeea90c0)
Feb 15 13:22:19 scn08 bbackupd[22194]: Receive Success(0x3f02baeea90c0)
Feb 15 13:22:19 scn08 bbackupd[22194]: Send Finished()
Feb 15 13:22:19 scn08 bbackupd[22194]: Receive Finished()
Feb 15 13:22:19 scn08 bbackupd[22194]: Exception caught (7/47), reset state and waiting to retry...
Feb 15 13:23:59 scn08 bbackupd[22194]: File statistics: total file size uploaded 510688603, bytes already on server 0, encoded size 510537568

Unfortunately, I lost the cut-and-paste of the account info output at this
point, but it showed (give or take) about 600 meg usage, with 0 old and over 3
gig in deleted files. The quota is 4096M/4505M. So, I reset the quota to 1 gig,
sent HUP to bbstored to force housekeeping, then set the quota back to 4 gig.
That gave me this:

                  Account ID: 00000008
              Last object ID: 5529
                 Blocks used: 308512 (602.56Mb)
    Blocks used by old files: 0 (0.00Mb)
Blocks used by deleted files: 249945 (488.17Mb)
  Blocks used by directories: 262 (0.51Mb)
            Block soft limit: 2097152 (4096.00Mb)
            Block hard limit: 2306560 (4505.00Mb)
         Client store marker: 1108500766000000

Once again, I uncommented the ExcludeDir on the client side, and restarted the
process. Everything worked fine this time. The large files got backed up with
no errors.

So, if we say that:

X = size of a totally new file to backup
Y = hard limit - (usage + deleted + old)

Then it seems that bbstored fails when X > Y, even if X < deleted + old.

I.e., there was certainly enough room to backup the new files if the deleted
files had been purged at the time the new files were uploaded, but that didn't
happen. I'm guessing that since the log file says "size uncertain" on the
transfer, it can't figure out that it needs to purge deleted files to make room
for the new one -- so it doesn't try, and the file ends up being so huge that
it hits the quota and fails.

-- 
Alex Howansky
Wankwood Associates
http://wankwood.com/