[Box Backup] Re: Block Sizes and Diffing (was: Re: [Box Backup]
error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry)
Chris Wilson
boxbackup@fluffy.co.uk
Mon, 10 Sep 2007 21:50:03 +0100 (BST)
Hi Johann,
On Mon, 10 Sep 2007, Johann Glaser wrote:
> The output consists of 1216 lines and starts with:
> 642 1069 this s= 49
> 407 107 this s= 177
> 336 1065 this s= 1633
> 296 1363 this s= 1649
> 277 1159 this s= 1617
> 265 1424 this s= 1681
> 254 2706 this s= 1601
> 248 1011 this s= 1665
> 248 1005 this s= 33
> 246 1015 this s= 1745
> 219 1027 this s= 1729
> 205 1226 this s= 1697
> 200 1006 this s= 1713
> 194 1012 this s= 1585
> 156 1303 this s= 1569
> 155 1014 this s= 1761
> 141 1016 this s= 2017
> 114 1020 this s= 2033
> 111 1246 this s= 1553
> 103 1025 this s= 1777
> (and all following are <100 for the first column)
Ouch! 1216 different block sizes in the same file!
Ben, I think we need to fix the diffing algorithm. This doesn't seem
reasonable to me.
> Unfortunately I don't understand BoxBackup's diffing and block-size
> algorithms, so I don't know what to conclude from my above listing. :-)
>
> Do I understand correctly, that BoxBackup tries to find the smallest
> possible block size to transmit (and store) changes?
No, it picks an "appropriate" block size for each chunk that it detects
has changed. Personally I don't think this is particularly smart; I think
we should keep the same block size for the whole file.
> OTOH I have a suggestion for the block algorithm. The block size can be
> defined to a fixed size, e.g. 4k and you can accept that up to 4095
> bytes might be duplicate. In my opinion wasting a few kB is tolerable.
>
> There is still a problem: insertions or deletions in the file can't be
> identified this way. Imagine a single byte insertion at the very
> beginning of the file. Then every 4k-aligned block will have changed and
> the whole file needs to be updated. This problem has already been
> addressed by rsync and is described at
> http://rsync.samba.org/tech_report/ ("The rsync algorithm" and "Rolling
> Checksum").
I believe that we already implement this, albeit modified to work with
encrypted data. See
http://bbdev.fluffy.co.uk/svn/box/trunk/docs/backup/encrypt_rsync.txt
for details.
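To illustrate the general idea, here is a rough Python sketch of the weak
rolling checksum from the rsync tech report. It is not Box Backup's actual
code, which also has to cope with encrypted blocks. The names BLOCK,
weak_checksum, roll, aligned_matches and rolling_matches are made up for
this example, and it confirms candidate matches by comparing bytes directly,
where rsync would use a strong hash:

```python
import random

BLOCK = 4096      # fixed block size, as in the suggestion quoted above
MOD = 1 << 16     # modulus of the rsync weak checksum


def weak_checksum(data):
    """Return the two halves (a, b) of the rsync-style weak checksum."""
    a = sum(data) % MOD
    b = sum((len(data) - i) * byte for i, byte in enumerate(data)) % MOD
    return a, b


def roll(a, b, out_byte, in_byte, size):
    """Slide the window one byte: drop out_byte at the front, add in_byte."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - size * out_byte + a) % MOD
    return a, b


def aligned_matches(old, new, size=BLOCK):
    """Count blocks of new that equal the old block at the same offset."""
    limit = min(len(old), len(new)) - size + 1
    return sum(
        old[off:off + size] == new[off:off + size]
        for off in range(0, limit, size)
    )


def rolling_matches(old, new, size=BLOCK):
    """Search new at every byte offset for blocks of old.

    Returns {old block index: first offset in new where it was found}.
    """
    table = {}
    for idx in range(len(old) // size):
        block = old[idx * size:(idx + 1) * size]
        table.setdefault(weak_checksum(block), []).append((idx, block))

    found = {}
    if len(new) < size:
        return found
    a, b = weak_checksum(new[:size])
    pos = 0
    while True:
        for idx, block in table.get((a, b), ()):
            # The weak checksum can collide, so confirm with a full compare.
            if idx not in found and new[pos:pos + size] == block:
                found[idx] = pos
        if pos + size >= len(new):
            break
        a, b = roll(a, b, new[pos], new[pos + size], size)
        pos += 1
    return found


# Demo: eight blocks of pseudo-random data, then one byte inserted in front.
rng = random.Random(0)
old = bytes(rng.getrandbits(8) for _ in range(8 * BLOCK))
new = b"X" + old

print("aligned matches:", aligned_matches(old, new))
print("rolling matches:", len(rolling_matches(old, new)))
```

With the single byte inserted at the start, the aligned comparison should
find no unchanged blocks at all, while the rolling search should find every
old block again, one byte further along.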
> BTW: The (unofficial) BoxBackup Debian package from
> http://debian.myreseau.org/ doesn't contain the bbackupobjdump tool, so
> I checked out the trunk directory from the SVN. It doesn't build
> bbackupobjdump automatically, so I had to trick a little bit. How can I
> do this "beautifully"?
Sorry, the easiest way is to run configure, and then:

  cd bin/bbackupobjdump; make; cd ../..
  debug/bin/bbackupobjdump ...
Cheers, Chris.
--
_____ __ _
\ __/ / ,__(_)_ | Chris Wilson <0000 at qwirx.com> - Cambs UK |
/ (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Perl/SQL/HTML Developer |
\ _/_/_/_//_/___/ | We are GNU-free your mind-and your software |