[Box Backup] Re: [Box Backup] error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry

Chris Wilson boxbackup@fluffy.co.uk
Sun, 9 Sep 2007 20:59:04 +0100 (BST)


Hi Johann,

> I've investigated this case more closely. The problem occurs every time
> with the very same file with a size of (only) 360.724.480 bytes (we have
> much larger files). I attached an strace to the bbackupd process and
> found out that all files before this one are only checked using
> lstat64() and llistxattr() but are never open()ed or ever read(). I
> think this is because they have already been backed up during the night
> when the backup ran the first time (approx. 3:50).
>
> The particular file which triggers the problem on the other hand is read
> quite often. It starts with the following:
>
> ...
> 23240 open("/backup/daily/mysql/mysql-InnoDB.tar", O_RDONLY|O_LARGEFILE) = 6
> 23240 fstat64(6, {st_mode=S_IFREG|0600, st_size=360724480, ...}) = 0
> 23240 _llseek(6, 0, [0], SEEK_CUR)      = 0
> 23240 mmap2(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb79e5000
> 23240 mmap2(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb79c4000
> 23240 mmap2(NULL, 135168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb79a3000
> 23240 _llseek(6, 0, [0], SEEK_SET)      = 0
> 23240 read(6, "data/mysql/ib_arch_log_000000000"..., 1430) = 1430
> ... many read(6, ..., 1430) = 1430
> 23240 read(6, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1430) = 1260
>
> Ok, the file was now completely read. There is nothing in between, only
> lots of read()s. Then it starts over again with
>
> 23240 _llseek(6, 0, [0], SEEK_SET)      = 0
> 23240 read(6, "data/mysql/ib_arch_log_000000000"..., 1511) = 1511
> 23240 read(6, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1511) = 1511
> ... many read(6, ..., 1511) = 1511
> 23240 read(6, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1511) = 428
>
> So it was read a second time with an 81-byte larger read buffer. Again
> it reads the very same file with an increased read buffer
>
> 23240 _llseek(6, 0, [0], SEEK_SET)      = 0
> 23240 read(6, "data/mysql/ib_arch_log_000000000"..., 1525) = 1525
> 23240 read(6, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\302\0\0\0\0\0\n-/"..., 1525) = 1525
> ...
>
> And this continues (until I stopped strace). The read buffer is always
> increased irregularly by 3-152 bytes, except once where it went from 2394
> down to 1199 and then up to 2427. In the meantime bbstored had already
> complained about the Protocol_Timeout.

By the way, did adding the KeepAliveTime option fix this timeout for you?

> Why is bbackupd reading this file so often?

I have a theory that I'd appreciate your help in testing out. It seems 
that Box Backup doesn't use a single block size for the entire file. 
Rather, if it detects a chunk that's been changed or added, it can use a 
different block size specifically for that chunk. If this happens a lot, 
then the file can end up with a wide mixture of block sizes.

For every block size that we want to use in diffing, we have to calculate 
the checksums of every block with that size across the entire file. Since 
we don't want to hold the entire file in memory, we read it repeatedly 
(there is probably a more efficient way to do this by calculating all the 
checksums while reading the file just once).
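If my theory is right, the access pattern you traced corresponds to something like the following toy illustration (this is not Box Backup's actual code; ToyChecksum stands in for the real block checksums, and the function names are mine):

```cpp
// One full pass over the file per candidate block size: I/O cost is
// (file size) x (number of sizes), which matches the repeated
// _llseek(fd, 0, SEEK_SET) / read() pattern in the strace output.
#include <cstdint>
#include <cstdio>
#include <vector>

// Toy stand-in for the real rolling/strong checksums.
static uint32_t ToyChecksum(const char *p, size_t len)
{
	uint32_t sum = 0;
	for (size_t i = 0; i < len; ++i)
	{
		sum = sum * 31 + (unsigned char)p[i];
	}
	return sum;
}

void ChecksumAllSizes(FILE *f, const std::vector<size_t> &blockSizes)
{
	for (size_t bs : blockSizes)       // one complete pass per size...
	{
		fseek(f, 0, SEEK_SET);     // ...hence the seek back to zero
		std::vector<char> buf(bs);
		size_t n;
		while ((n = fread(buf.data(), 1, bs, f)) > 0)
		{
			uint32_t sum = ToyChecksum(buf.data(), n);
			(void)sum;         // would be stored for block matching
		}
	}
}
```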

Box Backup will pick the most frequently used block sizes in the file, up 
to 64 (!) of them, when deciding what sizes to calculate the checksums 
for. That means it may read the entire file up to 64 times; for your 
360 MB file, that is potentially over 21 GB of local I/O.

There is clearly a tradeoff between bandwidth efficiency and local I/O 
efficiency in this design, and the scale factor of this algorithm tips it 
particularly far towards saving bandwidth.

It should be slightly better in recent development versions, since I added 
a buffering stream wrapper that always reads 4k chunks to reduce the 
number of system calls, but it could definitely be better.
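The idea of that wrapper is roughly the following (a minimal sketch with made-up names, not the actual class in the tree): fill a 4 KB buffer with one read() and serve the smaller block-sized requests from it, so a 1430-byte read costs a system call only every third block or so.

```cpp
// Sketch of a buffering reader: one 4 KB fill per ~3 small reads,
// instead of one system call per read.
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <cstring>

class BufferedReader
{
public:
	explicit BufferedReader(FILE *f) : mFile(f), mPos(0), mAvail(0) { }

	// Copies up to len bytes into out, refilling the 4 KB buffer
	// from the underlying file only when it runs dry.
	size_t Read(char *out, size_t len)
	{
		size_t copied = 0;
		while (copied < len)
		{
			if (mPos == mAvail)  // buffer empty: one big read
			{
				mAvail = fread(mBuffer, 1, sizeof(mBuffer),
					mFile);
				mPos = 0;
				if (mAvail == 0) break;  // EOF
			}
			size_t n = std::min(len - copied, mAvail - mPos);
			memcpy(out + copied, mBuffer + mPos, n);
			mPos += n;
			copied += n;
		}
		return copied;
	}

private:
	FILE *mFile;
	char mBuffer[4096];
	size_t mPos, mAvail;
};
```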

Anyway, you can check at least part of the theory as follows: download the 
latest version of that file as an encrypted object from the store by using 
bbackupquery's "getobject" command; run "bbackupobjdump" on it like this:

   bbackupobjdump bigfile.2.obj | sort -k4 | uniq -f3 -c | grep s= \
   | sort -r -g

and the output will look like this:

    1280        0 this  s=    4129
       1     1280 this  s=     545

The last number of each line is the block size, and the first is the 
number of times that it occurs. They will be sorted in descending order. I 
expect that you will get a lot of lines. They won't correspond exactly to 
the read sizes you're seeing, because the dump shows the compressed, 
encrypted blocks as stored on the server, while the reads are of the 
uncompressed, unencrypted local file, but the number of different sizes 
that you see should correspond (roughly) to the number of times the file 
is read (up to 64).

I can see four options, in order of increasing work and effectiveness:

* accept that Box Backup is designed for bandwidth efficiency, not local
   file efficiency, and don't change anything;

* make the number of block sizes checked a tunable parameter, possibly
   changing the default from 64 to something else;

* change the patching algorithm to try to use an existing block size from
   the file if it finds a close match to the "optimal" one;

* change the checksum algorithm to calculate all block size checksums in a
   single pass.
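To make the last option concrete, here is one rough way a single pass could work (entirely my own sketch, with a toy checksum; a real patch would use the proper rolling checksum and also flush the trailing partial block for each size):

```cpp
// Single-pass sketch: keep one accumulator per candidate block size
// and feed every byte to all of them as the file streams past once,
// emitting a block checksum each time a size's block fills up.
#include <cstdint>
#include <cstdio>
#include <map>
#include <vector>

struct SizeState
{
	uint64_t sum = 0;     // toy checksum for the block in progress
	size_t filled = 0;    // bytes accumulated in the current block
};

// Returns, for each block size, the per-block checksums, computed in
// a single pass over the stream instead of one pass per size.
std::map<size_t, std::vector<uint64_t>>
SinglePassChecksums(FILE *f, const std::vector<size_t> &blockSizes)
{
	std::map<size_t, SizeState> state;
	std::map<size_t, std::vector<uint64_t>> out;

	int c;
	while ((c = fgetc(f)) != EOF)
	{
		for (size_t bs : blockSizes)
		{
			SizeState &s = state[bs];
			s.sum = s.sum * 31 + (unsigned char)c;
			if (++s.filled == bs)  // block boundary for this size
			{
				out[bs].push_back(s.sum);
				s = SizeState();
			}
		}
	}
	// (A real patch would also emit the final partial blocks here.)
	return out;
}
```

This trades the repeated reads for more CPU work per byte, which may or may not be a win in practice; that is exactly the kind of thing I'd want sanity-checked.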

The last two options are the ones that I feel least confident about, and 
I'd appreciate some help or at least sanity checking of a patch from Ben, 
Martin or Chromi (who wrote the current diffing algorithm if I understand 
correctly).

On the other hand, you might find that the file doesn't have a range of 
different block sizes, and so I need to go back to the drawing board for 
another theory.

Thanks for pointing out this problem, and in advance for your help in 
testing the theory and possible solutions (if you're willing to do so).

Cheers, Chris.
-- 
_____ __     _
\  __/ / ,__(_)_  | Chris Wilson <0000 at qwirx.com> - Cambs UK |
/ (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Perl/SQL/HTML Developer |
\ _/_/_/_//_/___/ | We are GNU-free your mind-and your software |