[Box Backup-commit] #45: File diff performance patch (reduced disk IO and wall time)

Box Backup boxbackup-dev@fluffy.co.uk
Fri, 21 Mar 2008 03:32:39 -0000


#45: File diff performance patch (reduced disk IO and wall time)
-------------------------+--------------------------------------------------
 Reporter:  aharper      |       Owner:  ben  
     Type:  enhancement  |      Status:  new  
 Priority:  normal       |   Milestone:  0.12 
Component:  bbackupd     |     Version:  trunk
 Keywords:               |  
-------------------------+--------------------------------------------------
 The enclosed patch (tested against SVN revision 2104) changes the file
 diff logic with the following enhancements:

 - Each file is read at most twice, rather than being re-read once for every
 block size.

 - Before performing a rolling checksum search, each server-side block is
 first checked (by MD5) at its previous location in the file. If the block
 has neither changed nor moved, the rolling checksum is skipped.

 - Rolling checksums are searched in total-file-coverage order (block size
 times number of blocks), favoring larger blocks in the final recipe.
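
 The second and third points above can be illustrated with a minimal Python
 sketch. This is not the patch itself (Box Backup is written in C++), and
 the names build_index, unchanged_blocks and sizes_by_coverage, as well as
 the (offset, size, md5) index layout, are hypothetical and chosen only for
 illustration. It assumes the server-side block index is available as a
 simple list and that the new file fits in memory.

```python
import hashlib
from collections import Counter


def build_index(data, sizes):
    """Hypothetical helper: split data into consecutive blocks of the given
    sizes and record (offset, size, md5_hex) for each block."""
    index, offset = [], 0
    for size in sizes:
        chunk = data[offset:offset + size]
        index.append((offset, size, hashlib.md5(chunk).hexdigest()))
        offset += size
    return index


def unchanged_blocks(old_index, new_data):
    """Return the indices of blocks whose MD5 still matches at the same
    offset in new_data. Such blocks need no rolling checksum search."""
    same = []
    for i, (offset, size, digest) in enumerate(old_index):
        chunk = new_data[offset:offset + size]
        if len(chunk) == size and hashlib.md5(chunk).hexdigest() == digest:
            same.append(i)
    return same


def sizes_by_coverage(old_index, skip=()):
    """Order the remaining block sizes by total coverage (size * count),
    largest first. Ties go to the larger block size (an assumption made
    here; the ticket does not say how ties are broken)."""
    skip = set(skip)
    counts = Counter(
        size
        for i, (_offset, size, _digest) in enumerate(old_index)
        if i not in skip
    )
    return sorted(counts, key=lambda s: (s * counts[s], s), reverse=True)
```

 A caller would first run unchanged_blocks, then pass its result as skip to
 sizes_by_coverage, and search with a rolling checksum only for the block
 sizes that remain, in the returned order.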

 In my testing, these changes reduce file diff wall time by a factor of 2-10
 and make the diff process CPU-bound (instead of I/O-bound).

 The patch creates no new dependencies; it is purely an algorithmic change.
 The code passes the existing unit tests. Additionally, I have tested it
 with my personal data for three weeks on OS X (i386) without incident.

-- 
Ticket URL: <https://www.boxbackup.org/trac/ticket/45>
Box Backup <http://www.boxbackup.org/>
An open source, completely automatic on-line backup system for UNIX.