[Box Backup] Experimental questions on boxbackup

Ben Summers boxbackup@fluffy.co.uk
Mon, 4 Jul 2005 14:44:33 +0100


On 4 Jul 2005, at 06:24, Noah yan wrote:

> Thanks, Ben, for answering the questions. The sources do have a lot of
> documentation and are very clean. I have some follow-up questions.
>
> In backupclient/BackupStoreFileEncodeStream.h and the .cpp file, what is
> the concept of a recipe?

It's a list of where the blocks making a file come from, referencing  
data in the old version and the new version. The actual stream is  
built from this recipe.
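
A minimal sketch of the recipe idea, for illustration only. The names here (RecipeInstruction, BuildFromRecipe) are hypothetical and are not taken from the Box Backup sources; the real recipe class and its fields may look quite different. It assumes each entry says "copy this many bytes from the old version" or "copy this many bytes of new data", and that the output is built by following the entries in order.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical recipe entry: where one run of the output's bytes comes from.
struct RecipeInstruction {
    enum class Source { OldVersion, NewData };
    Source source;       // which input the bytes are copied from
    std::size_t offset;  // byte offset into that input
    std::size_t length;  // number of bytes to copy
};

// Build the output by following the recipe from first entry to last.
// Throws std::out_of_range if an offset lies beyond the end of its input.
std::string BuildFromRecipe(const std::vector<RecipeInstruction>& recipe,
                            const std::string& oldVersion,
                            const std::string& newData)
{
    std::string out;
    for (const RecipeInstruction& step : recipe) {
        const std::string& src =
            (step.source == RecipeInstruction::Source::OldVersion) ? oldVersion
                                                                   : newData;
        out.append(src, step.offset, step.length);
    }
    return out;
}
```

With an old version "AAAABBBBCCCC", new data "xx", and a recipe of (old, 0, 4), (new, 0, 2), (old, 8, 4), the sketch reuses two unchanged runs from the old version and only the two new bytes have to be supplied.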

>
> My understanding is that most of the performance cost in the backup
> process is in the rsync checksum calculation, rolling, etc., and the
> encryption in BackupStoreFileEncodeStream. Am I right?

Possibly. You can never really tell these things until you run a  
profiler against it. But I suspect that you'll find most of the time  
is spent in the SearchForMatchingBlocks() function.

However, to find out where the time is actually being spent, run a
profiler against test/backupdiff and see where it spends its time.
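
For a feel of why a block search can dominate, here is a sketch of a generic rsync-style weak checksum being rolled across a buffer. It is an assumption-laden illustration, not the Box Backup code: Box Backup's algorithm is not the same as rsync's, and the real behaviour is in lib/crypto/RollingChecksum.h. The names WeakSum, Compute, Roll and FindFirstWeakMatch are hypothetical. The sketch assumes two 16-bit sums, a and b, where overflow is not avoided at all: unsigned arithmetic simply wraps modulo 2^16, and the rolling update stays consistent under that wrap.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical rsync-style weak checksum: two 16-bit sums that wrap on overflow.
struct WeakSum {
    std::uint16_t a = 0;  // sum of the bytes in the window, mod 2^16
    std::uint16_t b = 0;  // position-weighted sum of the bytes, mod 2^16
    std::uint32_t value() const { return (std::uint32_t(b) << 16) | a; }
};

// Compute the checksum of data[start, start + len) from scratch.
WeakSum Compute(const std::string& data, std::size_t start, std::size_t len)
{
    WeakSum s;
    for (std::size_t i = 0; i < len; ++i) {
        const std::uint8_t byte = static_cast<std::uint8_t>(data[start + i]);
        s.a = static_cast<std::uint16_t>(s.a + byte);
        // First byte has weight len, last byte has weight 1.
        s.b = static_cast<std::uint16_t>(s.b + (len - i) * byte);
    }
    return s;
}

// Slide the window one byte: drop `out` at the front, add `in` at the back.
void Roll(WeakSum& s, std::uint8_t out, std::uint8_t in, std::size_t len)
{
    s.a = static_cast<std::uint16_t>(s.a - out + in);
    s.b = static_cast<std::uint16_t>(s.b - len * out + s.a);
}

// Return the first offset whose window has the target weak checksum,
// or std::string::npos if there is none. A weak match is only a candidate;
// a real implementation would confirm it with a stronger checksum.
std::size_t FindFirstWeakMatch(const std::string& data, std::size_t len,
                               std::uint32_t target)
{
    if (len == 0 || data.size() < len) return std::string::npos;
    WeakSum s = Compute(data, 0, len);
    for (std::size_t pos = 0;; ++pos) {
        if (s.value() == target) return pos;
        if (pos + len >= data.size()) return std::string::npos;
        Roll(s, static_cast<std::uint8_t>(data[pos]),
             static_cast<std::uint8_t>(data[pos + len]), len);
    }
}
```

Each Roll is cheap, but it runs once per byte of the file, and every candidate match costs extra work to confirm, which is the kind of loop a profiler tends to point at.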

Ben



> On 7/1/05, Ben Summers <ben@fluffy.co.uk> wrote:
>
>>
>> On 1 Jul 2005, at 16:32, Noah yan wrote:
>>
>>
>>> Dear All,
>>>
>>> I am running some experiments on boxbackup to measure its performance.
>>> I have several questions about the rsync rolling checksum. Thanks in
>>> advance for answering.
>>>
>>
>> See notes/encrypt_rsync.txt, which explains what's going on in the
>> algorithm. It's not the same as rsync. The concept is almost the
>> same, but the actual algorithms are quite different. Encryption makes
>> things difficult.
>>
>>
>>> 1. What block size is each file split into to calculate its
>>> checksums? Or what is the range, if it is not fixed?
>>>
>>
>> See BackupStoreFileEncodeStream::CalculateBlockSizes() in
>> lib/backupclient/BackupStoreFileEncodeStream.cpp
>>
>>
>>> 2. What if there is an overflow when calculating the sums, such as a
>>> and b in the rolling checksum algorithm? I noticed that a is a 16-bit
>>> sum of 8-bit data. How is overflow handled in the algorithm, or is
>>> there some other way to avoid it?
>>>
>>
>> see lib/crypto/RollingChecksum.h / .cpp
>>
>> That's the lovely thing about open source. You have the source! And
>> in this case, documentation too.
>>
>> Ben
>>
>>
>>
>> _______________________________________________
>> boxbackup mailing list
>> boxbackup@fluffy.co.uk
>> http://lists.warhead.org.uk/mailman/listinfo/boxbackup
>>
>>