[Box Backup] exception BackupStore InvalidBackupStoreFilename (4/3)

Chris Wilson boxbackup@fluffy.co.uk
Tue, 6 May 2008 02:33:11 +0100 (BST)


Hi Jason,

OK, I found the problem. BackupStoreDirectory::CheckAndFix has an 
algorithm that's potentially exponential, and I can see it getting slower 
and slower as the bbstoreaccounts check fix process runs.

The good news is that I understand the problem, the bad news is that it's 
not easy to fix, and your check fix will probably take more than 30 days 
to complete :-(

Also, I'm going to be out of country for the next 3 weeks from Wednesday, 
but I'll have some long journeys to think about the problem and figure out 
how to fix it.

Ben, if you have time to investigate BackupStoreDirectory::CheckAndFix it 
would be greatly appreciated, as you know how it's supposed to work and I 
don't (yet) :-)

My gprof call graph for bbstoreaccounts check fix, when I manually removed 
a directory from the store that contained about 1000 files, looks like 
this:

-----------------------------------------------
                0.05    1.06    1072/3679        
BackupStoreCheck::InsertObjectIntoDirectory(long long, long long, bool) [7]
                0.13    2.59    2607/3679        
BackupStoreCheck::CheckDirectories() [5]
[4]     49.7    0.18    3.65    3679         
BackupStoreDirectory::CheckAndFix() [4]
                0.04    0.94  608393/608394
std::set<long long, std::less<long long>,
std::allocator<long long> >::insert(long long const&) [13]
                0.04    0.81  608393/608393      
std::set<BackupStoreFilename, std::less<BackupStoreFilename>,
std::allocator<BackupStoreFilename> >::insert(BackupStoreFilename
const&) [16]
                0.00    0.63  608393/609465
std::set<long long, std::less<long long>,
std::allocator<long long> >::find(long long const&) [19]
                0.00    0.49  608393/608393      
std::set<BackupStoreFilename, std::less<BackupStoreFilename>,
std::allocator<BackupStoreFilename> >::find(BackupStoreFilename 
const&) [23]
                0.00    0.24    3220/3220        
std::set<BackupStoreFilename, std::less<BackupStoreFilename>,
std::allocator<BackupStoreFilename> >::~set() [32]
                0.00    0.13    3220/3221        
std::set<long long, std::less<long long>,
std::allocator<long long> >::~set() [49]
                0.06    0.00 1219393/1319188     
BackupStoreDirectory::Entry::GetFlags() const [86]
                0.04    0.00  608393/608393      
BackupStoreDirectory::Entry::GetDependsNewer() const [107]
                0.04    0.00 1216786/2461259     
__gnu_cxx::__normal_iterator<BackupStoreDirectory::Entry**,
std::vector<BackupStoreDirectory::Entry*,
std::allocator<BackupStoreDirectory::Entry*> > >::operator++() [72]
                0.04    0.00 1216786/1824110     
BackupStoreDirectory::Entry::GetObjectID() const [91]
                0.04    0.00  608393/608393      
std::set<BackupStoreFilename, std::less<BackupStoreFilename>,
std::allocator<BackupStoreFilename> >::end() const [112]
                0.01    0.02  608393/609465      
std::set<long long, std::less<long long>,
std::allocator<long long> >::end() const [122]
                0.02    0.00 1227823/2515664     
std::vector<BackupStoreDirectory::Entry*,
std::allocator<BackupStoreDirectory::Entry*> >::end() [90]
                0.02    0.00  608393/608393      
BackupStoreDirectory::Entry::GetDependsOlder() const [145]
                0.02    0.00 1832537/3062008     
bool __gnu_cxx::operator!=<BackupStoreDirectory::Entry**,
std::vector<BackupStoreDirectory::Entry*,
std::allocator<BackupStoreDirectory::Entry*> > >
(__gnu_cxx::__normal_iterator<BackupStoreDirectory::Entry**, 
std::vector<BackupStoreDirectory::Entry*,
std::allocator<BackupStoreDirectory::Entry*> > > const&, 
__gnu_cxx::__normal_iterator<BackupStoreDirectory::Entry**,
std::vector<BackupStoreDirectory::Entry*,
std::allocator<BackupStoreDirectory::Entry*> > > const&) [119]
                0.00    0.01    3220/3221        
std::set<long long, std::less<long long>,
std::allocator<long long> >::set() [164]
                0.00    0.01    3220/3220        
std::set<BackupStoreFilename, std::less<BackupStoreFilename>,
std::allocator<BackupStoreFilename> >::set() [176]
                0.01    0.00 1216786/1216786     
BackupStoreDirectory::Entry::GetName() const [189]
                0.01    0.00 5478144/6693968     
__gnu_cxx::__normal_iterator<BackupStoreDirectory::Entry**,
std::vector<BackupStoreDirectory::Entry*,
std::allocator<BackupStoreDirectory::Entry*> > >::operator*()
const [177]
                0.00    0.00  619430/661726      
std::vector<BackupStoreDirectory::Entry*,
std::allocator<BackupStoreDirectory::Entry*> >::begin() [256]
                0.00    0.00  608393/608393      
__gnu_cxx::__normal_iterator<BackupStoreDirectory::Entry**,
std::vector<BackupStoreDirectory::Entry*,
std::allocator<BackupStoreDirectory::Entry*> > >
::operator--() [408]
                0.00    0.00  608393/609465      
std::_Rb_tree_const_iterator<long long>::operator!=
(std::_Rb_tree_const_iterator<long long> const&) const [403]
                0.00    0.00  608393/608393      
std::_Rb_tree_const_iterator<BackupStoreFilename>::
operator!=(std::_Rb_tree_const_iterator<BackupStoreFilename>
const&) const [410]
                0.00    0.00    3679/4751        
bool __gnu_cxx::operator==<BackupStoreDirectory::Entry**,
std::vector<BackupStoreDirectory::Entry*,
std::allocator<BackupStoreDirectory::Entry*> > >
(__gnu_cxx::__normal_iterator<BackupStoreDirectory::Entry**, 
std::vector<BackupStoreDirectory::Entry*,
std::allocator<BackupStoreDirectory::Entry*> > > const&, 
__gnu_cxx::__normal_iterator<BackupStoreDirectory::Entry**,
std::vector<BackupStoreDirectory::Entry*,
std::allocator<BackupStoreDirectory::Entry*> > > const&) [535]
-----------------------------------------------

If you're not used to reading gprof output then this link may help:

http://sourceware.org/binutils/docs-2.18/gprof/Call-Graph.html#Call-Graph

What this appears to mean to me is that, in 3000 calls to CheckAndFix, we 
did 600,000 insert() and find() operations on the two sets, idsEncountered 
and filenamesEncountered, and that took nearly 3 seconds out of the total 
3.65 seconds used by CheckAndFix and its callees.

I'm not yet sure whether most of those inserts happened on just one run of 
CheckAndFix, when a large number of objects had to be removed and 
therefore the loop had to be restarted multiple times; or if it's just 
that CheckAndFix does a lot of work and is called many times.

One thought that occurred to me was that in either case, we could reduce 
the constant cost of find() and insert() by using TR1 unordered_set 
instead of std::set on platforms that support it. 

Another is that the IDs are usually encountered in reverse order, so we 
could give begin() as a hint to the insert() operator because the new 
element will often belong at the beginning of the list for the ID list 
(but not the filename list). However that only saves about 1/3 of the 
overhead at best.

The third option would be to determine how many times we restart the 
search in these cases, and if it's significant, to rewrite the algorithm 
so that the search doesn't have to be restarted, for example by queueing 
all deletions until the end.

I could also be entirely wrong and CheckAndFix might not really be the 
slow path in this case. I can give you complete gprof output if that 
helps, or the built binary and gmon.out file so you can run your own 
analysis.

Cheers, Chris.
-- 
_____ __     _
\  __/ / ,__(_)_  | Chris Wilson <0000 at qwirx.com> - Cambs UK |
/ (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Ruby/Perl/SQL Developer |
\ _/_/_/_//_/___/ | We are GNU : free your mind & your software |