[Box Backup] exception BackupStore InvalidBackupStoreFilename
(4/3)
Chris Wilson
boxbackup@fluffy.co.uk
Tue, 6 May 2008 02:33:11 +0100 (BST)
Hi Jason,
OK, I found the problem. BackupStoreDirectory::CheckAndFix has an
algorithm that's potentially exponential, and I can see it getting slower
and slower as the bbstoreaccounts check fix process runs.
The good news is that I understand the problem, the bad news is that it's
not easy to fix, and your check fix will probably take more than 30 days
to complete :-(
Also, I'm going to be out of country for the next 3 weeks from Wednesday,
but I'll have some long journeys to think about the problem and figure out
how to fix it.
Ben, if you have time to investigate BackupStoreDirectory::CheckAndFix it
would be greatly appreciated, as you know how it's supposed to work and I
don't (yet) :-)
My gprof call graph for bbstoreaccounts check fix, when I manually removed
a directory from the store that contained about 1000 files, looks like
this:
-----------------------------------------------
0.05 1.06 1072/3679
BackupStoreCheck::InsertObjectIntoDirectory(long long, long long, bool) [7]
0.13 2.59 2607/3679
BackupStoreCheck::CheckDirectories() [5]
[4] 49.7 0.18 3.65 3679
BackupStoreDirectory::CheckAndFix() [4]
0.04 0.94 608393/608394
std::set<long long, std::less<long long>,
std::allocator<long long> >::insert(long long const&) [13]
0.04 0.81 608393/608393
std::set<BackupStoreFilename, std::less<BackupStoreFilename>,
std::allocator<BackupStoreFilename> >::insert(BackupStoreFilename
const&) [16]
0.00 0.63 608393/609465
std::set<long long, std::less<long long>,
std::allocator<long long> >::find(long long const&) [19]
0.00 0.49 608393/608393
std::set<BackupStoreFilename, std::less<BackupStoreFilename>,
std::allocator<BackupStoreFilename> >::find(BackupStoreFilename
const&) [23]
0.00 0.24 3220/3220
std::set<BackupStoreFilename, std::less<BackupStoreFilename>,
std::allocator<BackupStoreFilename> >::~set() [32]
0.00 0.13 3220/3221
std::set<long long, std::less<long long>,
std::allocator<long long> >::~set() [49]
0.06 0.00 1219393/1319188
BackupStoreDirectory::Entry::GetFlags() const [86]
0.04 0.00 608393/608393
BackupStoreDirectory::Entry::GetDependsNewer() const [107]
0.04 0.00 1216786/2461259
__gnu_cxx::__normal_iterator<BackupStoreDirectory::Entry**,
std::vector<BackupStoreDirectory::Entry*,
std::allocator<BackupStoreDirectory::Entry*> > >::operator++() [72]
0.04 0.00 1216786/1824110
BackupStoreDirectory::Entry::GetObjectID() const [91]
0.04 0.00 608393/608393
std::set<BackupStoreFilename, std::less<BackupStoreFilename>,
std::allocator<BackupStoreFilename> >::end() const [112]
0.01 0.02 608393/609465
std::set<long long, std::less<long long>,
std::allocator<long long> >::end() const [122]
0.02 0.00 1227823/2515664
std::vector<BackupStoreDirectory::Entry*,
std::allocator<BackupStoreDirectory::Entry*> >::end() [90]
0.02 0.00 608393/608393
BackupStoreDirectory::Entry::GetDependsOlder() const [145]
0.02 0.00 1832537/3062008
bool __gnu_cxx::operator!=<BackupStoreDirectory::Entry**,
std::vector<BackupStoreDirectory::Entry*,
std::allocator<BackupStoreDirectory::Entry*> > >
(__gnu_cxx::__normal_iterator<BackupStoreDirectory::Entry**,
std::vector<BackupStoreDirectory::Entry*,
std::allocator<BackupStoreDirectory::Entry*> > > const&,
__gnu_cxx::__normal_iterator<BackupStoreDirectory::Entry**,
std::vector<BackupStoreDirectory::Entry*,
std::allocator<BackupStoreDirectory::Entry*> > > const&) [119]
0.00 0.01 3220/3221
std::set<long long, std::less<long long>,
std::allocator<long long> >::set() [164]
0.00 0.01 3220/3220
std::set<BackupStoreFilename, std::less<BackupStoreFilename>,
std::allocator<BackupStoreFilename> >::set() [176]
0.01 0.00 1216786/1216786
BackupStoreDirectory::Entry::GetName() const [189]
0.01 0.00 5478144/6693968
__gnu_cxx::__normal_iterator<BackupStoreDirectory::Entry**,
std::vector<BackupStoreDirectory::Entry*,
std::allocator<BackupStoreDirectory::Entry*> > >::operator*()
const [177]
0.00 0.00 619430/661726
std::vector<BackupStoreDirectory::Entry*,
std::allocator<BackupStoreDirectory::Entry*> >::begin() [256]
0.00 0.00 608393/608393
__gnu_cxx::__normal_iterator<BackupStoreDirectory::Entry**,
std::vector<BackupStoreDirectory::Entry*,
std::allocator<BackupStoreDirectory::Entry*> > >
::operator--() [408]
0.00 0.00 608393/609465
std::_Rb_tree_const_iterator<long long>::operator!=
(std::_Rb_tree_const_iterator<long long> const&) const [403]
0.00 0.00 608393/608393
std::_Rb_tree_const_iterator<BackupStoreFilename>::
operator!=(std::_Rb_tree_const_iterator<BackupStoreFilename>
const&) const [410]
0.00 0.00 3679/4751
bool __gnu_cxx::operator==<BackupStoreDirectory::Entry**,
std::vector<BackupStoreDirectory::Entry*,
std::allocator<BackupStoreDirectory::Entry*> > >
(__gnu_cxx::__normal_iterator<BackupStoreDirectory::Entry**,
std::vector<BackupStoreDirectory::Entry*,
std::allocator<BackupStoreDirectory::Entry*> > > const&,
__gnu_cxx::__normal_iterator<BackupStoreDirectory::Entry**,
std::vector<BackupStoreDirectory::Entry*,
std::allocator<BackupStoreDirectory::Entry*> > > const&) [535]
-----------------------------------------------
If you're not used to reading gprof output then this link may help:
http://sourceware.org/binutils/docs-2.18/gprof/Call-Graph.html#Call-Graph
What this appears to mean to me is that, in 3000 calls to CheckAndFix, we
did 600,000 insert() and find() operations on the two sets, idsEncountered
and filenamesEncountered, and that took nearly 3 seconds out of the total
3.65 seconds used by CheckAndFix and its callees.
I'm not yet sure whether most of those inserts happened on just one run of
CheckAndFix, when a large number of objects had to be removed and
therefore the loop had to be restarted multiple times; or if it's just
that CheckAndFix does a lot of work and is called many times.
One thought that occurred to me was that in either case, we could reduce
the constant cost of find() and insert() by using TR1 unordered_set
instead of std::set on platforms that support it.
Another is that the IDs are usually encountered in reverse order, so we
could give begin() as a hint to the insert() operator because the new
element will often belong at the beginning of the list for the ID list
(but not the filename list). However that only saves about 1/3 of the
overhead at best.
The third option would be to determine how many times we restart the
search in these cases, and if it's significant, to rewrite the algorithm
so that the search doesn't have to be restarted, for example by queueing
all deletions until the end.
I could also be entirely wrong and CheckAndFix might not really be the
slow path in this case. I can give you complete gprof output if that
helps, or the built binary and gmon.out file so you can run your own
analysis.
Cheers, Chris.
--
_____ __ _
\ __/ / ,__(_)_ | Chris Wilson <0000 at qwirx.com> - Cambs UK |
/ (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Ruby/Perl/SQL Developer |
\ _/_/_/_//_/___/ | We are GNU : free your mind & your software |