[Box Backup] Re: Block Sizes and Diffing (was: Re: [Box Backup]
error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry)
Johann Glaser
boxbackup@fluffy.co.uk
Mon, 10 Sep 2007 16:09:25 +0200
Hi Chris!
Am Freitag, den 07.09.2007, 21:09 +0100 schrieb Chris Wilson:
> Hi Johann,
> [...]
>
> > Why is bbackupd reading this file so often?
>
> That is a very good question. If the reads were powers of two, then I'd
> suggest it was due to diffing at different block sizes, but that doesn't
> seem to be the case here. Perhaps this is why your backups are taking
> so long? I'll investigate.
>
> I assume that you're running 0.10 on the client?
Yes, I do. The client and the server are the same machine and they use
the inofficial Debian packages from http://debian.myreseau.org/.
Am Sonntag, den 09.09.2007, 20:59 +0100 schrieb Chris Wilson:
> [...]
> By the way, did adding the KeepAliveTime option fix this timeout for you?
Yes it works, we have no timeouts any more. Thanks for the hint.
> > Why is bbackupd reading this file so often?
>
> I have a theory that I'd appreciate your help in testing out. It seems
> that Box Backup doesn't use a single block size for the entire file.
> Rather, if it detects a chunk that's been changed or added, it can use a
> different block size specifically for that chunk. If this happens a lot,
> then the file can end up with a wide mixture of block sizes.
>
> For every block size that we want to use in diffing, we have to calculate
> the checksums of every block with that size across the entire file. Since
> we don't want to hold the entire file in memory, we read it repeatedly
> (there is probably a more efficient way to do this by calculating all the
> checksums while reading the file just once).
>
> Box Backup will pick the most frequently used block sizes in the file, up
> to 64 (!) of them, when deciding what sizes to calculate the checksums
> for. That means it will read the entire file up to 64 times.
>
> There is clearly a tradeoff between bandwidth efficiency and local system
> efficiency with this principle, where the scale factor is particularly
> high with this algorithm.
>
> It should be slightly better in recent development versions, since I added
> a buffering stream wrapper that always reads 4k chunks to reduce the
> number of system calls, but it could definitely be better.
>
> Anyway, you can check at least part of the theory as follows: download the
> latest version of that file as an encrypted object from the store by using
> bbackupquery's "getobject" command; run "bbackupobjdump" on it like this:
>
> bbackupobjdump bigfile.2.obj | sort -k4 | uniq -f3 -c | grep s= \
> | sort -r -g
>
> and the output will look like this:
>
> 1280 0 this s= 4129
> 1 1280 this s= 545
>
> The last number of each line is the block size, and the first is the
> number of times that it occurs. They will be sorted in descending order. I
> expect that you will get a lot of lines. They won't correspond exactly to
> the read sizes you're seeing, because these are compressend, encrypted
> blocks and the reads are for uncompressed, unencrypted blocks on the
> server, but the number of different sizes that you see should correspond
> (roughly) to the number of times the file is read (up to 64).
The output consists of 1216 lines and starts with:
642 1069 this s= 49
407 107 this s= 177
336 1065 this s= 1633
296 1363 this s= 1649
277 1159 this s= 1617
265 1424 this s= 1681
254 2706 this s= 1601
248 1011 this s= 1665
248 1005 this s= 33
246 1015 this s= 1745
219 1027 this s= 1729
205 1226 this s= 1697
200 1006 this s= 1713
194 1012 this s= 1585
156 1303 this s= 1569
155 1014 this s= 1761
141 1016 this s= 2017
114 1020 this s= 2033
111 1246 this s= 1553
103 1025 this s= 1777
(and all following are <100 for the first column)
> I can see three options, in order of increasing work and effectiveness:
>
> * accept that Box Backup is designed for bandwidth efficiency, not local
> file efficiency, and don't change anything;
>
> * make the number of block sizes checked a tunable parameter, possibly
> changing the default from 64 to something else;
>
> * change the patching algorithm to try to use an existing block size from
> the file if it find a close match to the "optimal" one;
>
> * change the checksum algorithm to calculate all block size checksums in a
> single pass.
>
> The last two options are the ones that I feel least confident about, and
> I'd appreciate some help or at least sanity checking of a patch from Ben,
> Martin or Chromi (who wrote the current diffing algorithm if I understand
> correctly).
>
> On the other hand, you might find that the file doesn't have a range of
> different block sizes, and so I need to go back to the drawing board for
> another theory.
Unfortunately I don't understand BoxBackup's diffing and block-size
algorithms, so I don't know what to conclude from my above listing. :-)
Do I understand correctly, that BoxBackup tries to find the smallest
possible block size to transmit (and store) changes?
Again, regarding block sizes: From the strace I've sent last friday, we
can see that it always starts at position 0 and starts with reading
blocks of the very same size. So, the possible blocks always have to be
aligned to a position in the file which is a multiple of their block
size.
OTOH I have a suggestion for the block algorithm. The block size can be
defined to a fixed size, e.g. 4k and you can accept that up to 4095
bytes might be duplicate. In my opinion wasting a few kB is tolerable.
There is still a problem: insertions or deletions in the file can't be
identified this way. Imagine a single byte insertion at the very
beginning of the file. Then every 4k-aligned block will have changed and
the whole file needs to be updated. This problem has already been
addressed by rsync and is described at
http://rsync.samba.org/tech_report/ ("The rsync algorithm" and "Rolling
Checksum").
BTW: The (inofficial) BoxBackup Debian package from
http://debian.myreseau.org/ doesn't contain the bbackupobjdump tool, so
I checked out the trunk directory from the SVN. It doesn't build
bbackupobjdump automatically, so I had to trick a little bit. How can I
do this "beautifully"?
Bye
Hansi
--
Johann Glaser <glaser@ict.tuwien.ac.at>
Institute of Computer Technology, E384
Vienna University of Technology, Gusshausstr. 27-29, A-1040 Wien
Phone: ++43/1/58801-38444 Fax: ++43/1/58801-38499