[Box Backup] Old vs Deleted files removal
Peter Jalajas, GigaLock Backup Services
boxbackup@boxbackup.org
Thu, 19 Feb 2009 01:15:28 -0500
Hi Chris,
Thanks for the lead. I don't know if any of this helps, but I'm trying...
Another angle now: deleting the last copy of a Deleted file
removes the user's last chance to restore his lost file. We
should keep at least the very latest Old or Deleted copy of every file
for as long as possible. Then remove the last Old copy of one file
(because it still has a Current version backed up) before removing the
last Deleted copy of another file (because it has no Current version
backed up). We can't presume that the user deleted his file
intentionally. To rephrase, in order of decreasing importance (remove
from the bottom up):
1. Current version of file.
2. Latest Deleted version of file. (Presumes no Current version.
Presumes it is newer than any Old versions?)
3. Latest Old version of file. (Presumes Current version present.)
4. Older Old versions of file, including any old Deleted versions of
file that may be hanging around.
(Discussion for another day: should housekeeping delete long-ago
backed up Current files if needed to make room for new files trying to
be backed up?)
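A minimal sketch of the ordering proposed above, written as a ranking function plus a comparator. This is only an illustration of the proposal, not what HousekeepStoreAccount.cpp does today. The `Version` struct, its fields, `importanceRank`, `removeBefore` and `removalOrder` are all hypothetical names invented for this example, and the timestamp tie-breaker is an assumption.

```cpp
#include <algorithm>
#include <cassert>   // used by the checks in the usage note
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of one stored version of a file.
struct Version
{
    bool isOld;        // superseded by a newer upload (cf. Flags_OldVersion)
    bool isDeleted;    // file is gone from the client (cf. Flags_Deleted)
    bool isLatestCopy; // newest Old or Deleted copy of this file name
    int64_t modTime;   // assumed timestamp, used only as a tie-breaker
};

// Rank from the list above: 1 = keep longest, 4 = remove first.
int importanceRank(const Version &v)
{
    if(!v.isOld && !v.isDeleted) return 1; // Current version
    if(!v.isLatestCopy) return 4;          // older Old/Deleted copies
    if(v.isDeleted) return 2;              // latest Deleted copy
    return 3;                              // latest Old copy
}

// True if a should be removed before b.
bool removeBefore(const Version &a, const Version &b)
{
    int ra = importanceRank(a);
    int rb = importanceRank(b);
    if(ra != rb) return ra > rb;  // least important first
    return a.modTime < b.modTime; // then oldest first
}

// Returns the removal candidates, first element removed first.
// Current versions are never candidates in this sketch.
std::vector<Version> removalOrder(std::vector<Version> versions)
{
    versions.erase(std::remove_if(versions.begin(), versions.end(),
        [](const Version &v) { return importanceRank(v) == 1; }),
        versions.end());
    std::sort(versions.begin(), versions.end(), removeBefore);
    return versions;
}
```

Given one Current version, one latest Deleted copy, one latest Old copy and one older Old copy, `removalOrder` returns the older Old copy first, then the latest Old copy, then the latest Deleted copy, and never lists the Current version.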
See below for highly redacted code.
See "Marks" and Flags section at bottom.
Sorry if this is more distracting than helpful.
Thanks,
Pete
https://www.boxbackup.org/svn/box/snapshots/0.11_trunk_2368/bin/bbstored/HousekeepStoreAccount.cpp
Not sure if this helps, but I read it as:
// Name: HousekeepStoreAccount::DoHousekeeping()
// Calculate how much should be deleted
// Scan the directory for potential things to delete
// This will also remove eligible items marked with RemoveASAP
// If scan directory stopped for some reason, probably parent
// instructed to terminate, stop now.
// If any files were marked "delete now", then update
// the size of the store
// Reset the delta counts for files, as they will include
// RemoveASAP flagged files deleted during the initial scan.
// Go and delete items from the accounts
------
// Name: HousekeepStoreAccount::ScanDirectory(int64_t)
// Get the filename
// Open it.
// Add the size of the directory on disc to the size being calculated
// Read the directory in
// Remove any files which are marked for removal as soon
// as they become old or deleted.
// Iterate through the directory
deletedSomething = false;
BackupStoreDirectory::Iterator i(dir);
BackupStoreDirectory::Entry *en = 0;
while((en = i.Next(BackupStoreDirectory::Entry::Flags_File)) != 0)
{
    int16_t enFlags = en->GetFlags();
    if((enFlags & BackupStoreDirectory::Entry::Flags_RemoveASAP) != 0
        && (enFlags & (BackupStoreDirectory::Entry::Flags_Deleted |
            BackupStoreDirectory::Entry::Flags_OldVersion)) != 0)
    {
        // Delete this immediately.
        DeleteFile(ObjectID, en->GetObjectID(), dir, objectFilename,
            originalDirSizeInBlocks);
        // flag as having done something
        deletedSomething = true;
        // Must start the loop from the beginning again, as iterator is now
        // probably invalid.
        break;
    }
}
// Add files to the list of potential deletions
//PJ: see "Marks" below. Are we just trying to find the oldest version of a file?
// map to count the distance from the mark
std::map<std::pair<BackupStoreFilename, int32_t>, int32_t> markVersionAges;
// map of pair (filename, mark number) -> version age
// NOTE: use a reverse iterator to allow the distance from mark stuff to work
BackupStoreDirectory::ReverseIterator i(dir);
BackupStoreDirectory::Entry *en = 0;
while((en = i.Next(BackupStoreDirectory::Entry::Flags_File)) != 0)
{
    // Update recalculated usage sizes
    int16_t enFlags = en->GetFlags();
    int64_t enSizeInBlocks = en->GetSizeInBlocks();
    mBlocksUsed += enSizeInBlocks;
    if(enFlags & BackupStoreDirectory::Entry::Flags_OldVersion)
        mBlocksInOldFiles += enSizeInBlocks;
    if(enFlags & BackupStoreDirectory::Entry::Flags_Deleted)
        mBlocksInDeletedFiles += enSizeInBlocks;
    // Work out ages of this version from the last mark
    int32_t enVersionAge = 0;
    std::map<std::pair<BackupStoreFilename, int32_t>, int32_t>::iterator
        enVersionAgeI(markVersionAges.find(
            std::pair<BackupStoreFilename, int32_t>(
                en->GetName(), en->GetMarkNumber())));
    if(enVersionAgeI != markVersionAges.end())
    {
        enVersionAge = enVersionAgeI->second + 1;
        enVersionAgeI->second = enVersionAge;
    }
    else
    {
        markVersionAges[std::pair<BackupStoreFilename, int32_t>(
            en->GetName(), en->GetMarkNumber())] = enVersionAge;
    }
    // enVersionAge is now the age of this version.
    // Potentially add it to the list if it's deleted, if it's an old
    // version or deleted
    if((enFlags & (BackupStoreDirectory::Entry::Flags_Deleted |
        BackupStoreDirectory::Entry::Flags_OldVersion)) != 0)
    {
        // Is deleted / old version.
        DelEn d;
        d.mObjectID = en->GetObjectID();
        d.mInDirectory = ObjectID;
        d.mSizeInBlocks = en->GetSizeInBlocks();
        d.mMarkNumber = en->GetMarkNumber();
        d.mVersionAgeWithinMark = enVersionAge;
        d.mIsFlagDeleted = (enFlags &
            BackupStoreDirectory::Entry::Flags_Deleted) ? true : false;
        // Add it to the list
        mPotentialDeletions.insert(d);
// Update various counts
// Too much in the list of potential deletions?
// (check against the deletion target + the max size in deletions,
// so that we never delete things
// and take the total size below the deletion size target)
// Make iterator for the last element, while checking that
// there's something there in the first place.
// Nothing left in set
// Make this into an iterator pointing to the last element in the set
// Delete this one?
// Will need to recalculate the maximum size now, because
// we've just deleted that element
// Over the size to remove, so stop now
// Because an object which was the maximum size recorded was
// deleted from the set
// it's necessary to recalculate this maximum.
// Recurse into subdirectories
// Next level
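To get a feel for what the markVersionAges map computes, here is a stand-alone sketch of the same counting logic. It assumes (my reading of the "reverse iterator" note, not confirmed) that entries for a name are stored oldest first, so the reverse walk meets the newest version first and gives it age 0. `Entry`, `versionAges` and the plain `std::string` file name are hypothetical stand-ins for BackupStoreDirectory::Entry and BackupStoreFilename.

```cpp
#include <cassert>   // used by the checks in the usage note
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a directory entry: just a name and a mark number.
struct Entry
{
    std::string name;
    int32_t markNumber;
};

// Returns one "version age within mark" per entry, in the same order as
// the input. Entries are walked back to front, as the excerpt does with
// its ReverseIterator.
std::vector<int32_t> versionAges(const std::vector<Entry> &entries)
{
    // map of pair (filename, mark number) -> age of the last version seen
    std::map<std::pair<std::string, int32_t>, int32_t> markVersionAges;
    std::vector<int32_t> ages(entries.size(), 0);

    for(std::size_t n = entries.size(); n-- > 0; )
    {
        const Entry &en = entries[n];
        std::pair<std::string, int32_t> key(en.name, en.markNumber);
        int32_t age = 0;
        auto it = markVersionAges.find(key);
        if(it != markVersionAges.end())
        {
            // Seen this (name, mark) before: one step further from the newest
            age = it->second + 1;
            it->second = age;
        }
        else
        {
            // First time this (name, mark) is seen in the reverse walk
            markVersionAges[key] = age;
        }
        ages[n] = age;
    }
    return ages;
}
```

For the entries (a,0), (b,0), (a,0), (a,1), (a,0) this yields ages 2, 0, 1, 0, 0: the three versions of "a" under mark 0 are numbered from the back, while "b" and the single version of "a" under mark 1 each start their own count at 0. So the age looks like a per-(name, mark) position counted from the end of the directory, not a timestamp.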
"Marks": So, what are these? From:
// File
// Name: BackupStoreDirectory.h
// Purpose: Representation of a backup directory
// Class
// Name: BackupStoreDirectory
// Purpose: In memory representation of a directory
// Marks
// The lowest mark number a version of a file of this name has ever had
uint32_t GetMinMarkNumber() const {return mMinMarkNumber;}
// The mark number on this file
uint32_t GetMarkNumber() const {return mMarkNumber;}
//PJ: Here are the Flags (also in BackupStoreDirectory.h). Is this
why OldVersions are removed before Deleted versions?
// Make sure these flags are synced with those in backupprotocol.txt
// ListDirectory command
enum
{
    Flags_INCLUDE_EVERYTHING = -1,
    Flags_EXCLUDE_NOTHING = 0,
    Flags_EXCLUDE_EVERYTHING = 31, // make sure this is kept as sum of ones below!
    Flags_File = 1,
    Flags_Dir = 2,
    Flags_Deleted = 4,
    Flags_OldVersion = 8,
    Flags_RemoveASAP = 16
    // if this flag is set, housekeeping will remove it as it is marked
    // Deleted or OldVersion
};
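The flag values above are single bits, so an entry can carry several at once and each one is tested with a mask. A small sketch of the two tests that appear in the ScanDirectory() excerpt, using the values quoted above; the helper names `shouldRemoveImmediately` and `isDeletionCandidate` are hypothetical and do not exist in Box Backup.

```cpp
#include <cassert>   // used by the checks in the usage note
#include <cstdint>

// Values copied from the enum quoted above.
enum
{
    Flags_File = 1,
    Flags_Dir = 2,
    Flags_Deleted = 4,
    Flags_OldVersion = 8,
    Flags_RemoveASAP = 16,
    Flags_EXCLUDE_EVERYTHING = 31
};

// The "sum of ones below" comment, checked at compile time.
static_assert(Flags_EXCLUDE_EVERYTHING ==
    (Flags_File | Flags_Dir | Flags_Deleted | Flags_OldVersion | Flags_RemoveASAP),
    "Flags_EXCLUDE_EVERYTHING must cover every individual flag");

// Same condition as the first loop in the excerpt: RemoveASAP is set
// and the entry is already Deleted or an OldVersion.
bool shouldRemoveImmediately(int16_t flags)
{
    return (flags & Flags_RemoveASAP) != 0
        && (flags & (Flags_Deleted | Flags_OldVersion)) != 0;
}

// Same condition as the second loop: Deleted or OldVersion makes the
// entry a potential deletion.
bool isDeletionCandidate(int16_t flags)
{
    return (flags & (Flags_Deleted | Flags_OldVersion)) != 0;
}
```

For example, `shouldRemoveImmediately(Flags_File | Flags_RemoveASAP)` is false, because a Current file with RemoveASAP set is left alone until it becomes Old or Deleted. Both tests treat Flags_Deleted and Flags_OldVersion identically, so the bit values by themselves do not seem to put Old versions ahead of Deleted ones. Any such ordering would presumably come from how mPotentialDeletions sorts its DelEn entries, which is not part of the excerpt.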
On Wed, Feb 18, 2009 at 6:32 PM, Chris Wilson <chris@qwirx.com> wrote:
> Hi Pete,
>
> On Wed, 18 Feb 2009, Peter Jalajas, GigaLock Backup Services wrote:
...
>
> I think all the action happens in bin/bbstored/HousekeepStoreAccount.cpp. It
> starts in HousekeepStoreAccount::DoHousekeeping(), but that doesn't do much.
> The real action I suspect is in HousekeepStoreAccount::ScanDirectory() which
> is called on the root directory and calls itself recursively, building up a
> "map to count the distance from the mark" as it goes. That's where I get
> lost, I don't know what the "mark" is or what's done with this list yet.
>
> Cheers, Chris.