[Box Backup] Backup File Order

Johann Glaser boxbackup@fluffy.co.uk
Fri, 14 Sep 2007 11:10:22 +0200


Hi!

> Both proposals sound interesting, but I can't help feeling that this is 
> perhaps a little over-complicated. Do you have a use case that actually 
> requires all this?

You are right, my proposal got a bit over-featured. I personally don't
have a use case that needs all the features at once. The proposal is a
deliberately generic list of features, meant to cover nearly all needs.
Nobody will use all of them at once for a single directory, but I'm sure
all (or at least most) of them will be useful in one combination or
another.

> For example, I think you said that provided we back up the files in the 
> correct order, we can back up a live SVN filesystem and we don't need to 
> make a hot copy? If so, then why the need for the pre and post scripts? 
> What would they do?

As already said, for SVN FSFS repositories no pre and post scripts are
needed. But for MySQL, for example, they are. The database directory
(e.g. /var/lib/mysql/) must be excluded and another directory which
holds the database dump must be included. The pre-script must run
before the dump directory is backed up.

Alternatively (for low-traffic sites) a pre-script could stop mysqld and
a post-script could start it again. In that case no dump is necessary.
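
The stop/start variant only works if the post-script runs even when the
backup step fails, otherwise mysqld stays down. A minimal Python sketch
of that hook ordering, assuming the pre and post commands and the backup
step are supplied by the caller (backup_with_hooks and run_checked are
hypothetical names, none of this is existing bbackupd behaviour):

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def run_checked(cmd: Sequence[str]) -> None:
    """Run a command and raise CalledProcessError on a non-zero exit."""
    subprocess.run(list(cmd), check=True)


def backup_with_hooks(pre: Sequence[str], backup: Callable[[], None],
                      post: Sequence[str], run: Runner = run_checked) -> None:
    """Run the pre-script, then the backup step, then always the post-script."""
    run(pre)        # e.g. stop mysqld or write a dump; a failure aborts here
    try:
        backup()    # only reached after a successful pre-script
    finally:
        run(post)   # e.g. start mysqld again, even if the backup raised
```

If the pre-script itself fails, neither the backup nor the post-script
runs in this sketch. Whether that is the right choice depends on what
the pre-script does.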

> What does the Priority directive do? Override the default sort order? 
> Should there be a sort-by-priority BackupFilesOrder?

Yes, it's just to give an order to the directory blocks. Previously I
thought of using an array of regexps, e.g.
  BackupFilesOrder = "/db/current","db/.*","db/lastfile"
but then it would be too complicated to assign other specific properties
to these array items, like BackupSubdirs, ...

> No, we split the list into files and directories and do deep recursion 
> first (I think).

Aha, ok, that means it's not really good for SVN, but as Martin has
written, it is not a real problem:

On Thursday, 13.09.2007, at 23:03 +0100, Martin Ebourne wrote:
> [...]
> And it's not even manual any more, svnadmin recover will now fix this
> completely automatically anyway:
> 
> http://svn.haxx.se/dev/archive-2007-03/0061.shtml

Good news. Thanks for that info.

On Friday, 14.09.2007, at 08:05 +0100, Martin Ebourne wrote:
> On Thu, 2007-09-13 at 19:33 -0400, Ben Bennett wrote:
> > Would it perhaps make sense to have a per-directory argument to say
> > "run the candidate files through this program and have it return a
> > sorted list"? Then the complexity can be buried...
> 
> My preference would be to make sure files are backed up in the order
> they were originally written. This would then 'just work' for any such
> scenarios, and with no admin intervention at all.
> 
> This is a little hard to do by filesystem scanning, but would be easy
> with inotify support and similar.

A file system is not guaranteed to preserve the original order, and
neither is a copy of the files. Regardless of whether you back up SVN or
any other kind of special database, I advise against relying on
filesystem order.

I like Ben's idea of delegating the ordering complexity to an external
program/script. One possibility is that the script gets a list of all
files/dirs and sorts it according to its needs. This requires that the
script can distinguish between files and directories in the input
stream provided by bbackupd.

Another possibility is to give the script the base directory and only
capture its stdout for a list of files and dirs. This approach would
also allow the script to act as a filter, which can implement more
complex rules than are currently possible with the ExcludeXxx config
directives.
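
To make the first variant concrete, here is a rough Python sketch of
such an ordering script. It assumes a purely hypothetical interface
(bbackupd offers nothing like this today): one path per line on stdin,
the same paths in backup order on stdout. The patterns are only
placeholders, not a recommended order for SVN.

```python
import re
from typing import Iterable, List, Sequence, TextIO

# Hypothetical priority patterns; the first matching pattern decides the rank.
PATTERNS = [r"/db/current$", r"/db/"]


def order_paths(paths: Iterable[str],
                patterns: Sequence[str] = PATTERNS) -> List[str]:
    """Order paths by the index of the first pattern they match."""
    compiled = [re.compile(p) for p in patterns]

    def rank(path: str) -> int:
        for i, rx in enumerate(compiled):
            if rx.search(path):
                return i
        return len(compiled)  # unmatched paths go last

    # sorted() is stable, so paths of equal rank keep their input order.
    return sorted(paths, key=rank)


def main(inp: TextIO, out: TextIO) -> None:
    """Filter entry point; as a script: main(sys.stdin, sys.stdout)."""
    paths = [line.rstrip("\n") for line in inp if line.strip()]
    for path in order_paths(paths):
        out.write(path + "\n")
```

The same script could drop lines instead of only reordering them, which
is the filter use mentioned above.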

Despite that, I still propose implementing the rather simple features
  BackupFilesOrder = (reverse) (any|sorted|alphabetical|numeric|timestamp|size)
  BackupSubdirs = (mixed|first|last)
  PreScript = /backup/scripts/bb-mysql-full-pre
  PostScript = /backup/scripts/bb-mysql-full-post
applied to full backup locations (instead of subdirectories as proposed
yesterday).
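
How these two directives might combine, as a small Python sketch. This
is only my reading of the proposal, not an implementation: Entry and
order_entries are made-up names, "any" is taken to mean "leave the scan
order alone", and "sorted" is left out because its meaning is not pinned
down above.

```python
import re
from typing import Callable, Dict, Iterable, List, NamedTuple


class Entry(NamedTuple):
    name: str
    is_dir: bool
    size: int = 0
    mtime: float = 0.0


def _numeric_key(e: Entry) -> list:
    # Split into text and digit runs so that "rev10" sorts after "rev9".
    # With a capturing group, odd indices are always the digit runs.
    parts = re.split(r"([0-9]+)", e.name)
    return [int(t) if i % 2 else t for i, t in enumerate(parts)]


KEYS: Dict[str, Callable[[Entry], object]] = {
    "alphabetical": lambda e: e.name,
    "numeric": _numeric_key,
    "timestamp": lambda e: e.mtime,
    "size": lambda e: e.size,
}


def order_entries(entries: Iterable[Entry], order: str = "any",
                  reverse: bool = False, subdirs: str = "mixed") -> List[Entry]:
    """Apply an assumed BackupFilesOrder, then an assumed BackupSubdirs."""
    result = list(entries)
    if order != "any":
        result.sort(key=KEYS[order], reverse=reverse)
    elif reverse:
        result.reverse()
    # list.sort() is stable, so the order chosen above survives the grouping.
    if subdirs == "first":
        result.sort(key=lambda e: not e.is_dir)
    elif subdirs == "last":
        result.sort(key=lambda e: e.is_dir)
    return result
```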

On Thursday, 13.09.2007, at 19:33 -0400, Ben Bennett wrote:
> [...]
> The functionality I would like is to have a way to back up a DB
> well... I would need some hook to say "Does this need to be backed up"
> and then another call to actually extract the data to a dump file for
> backup. Though I suspect a clever cron job could do that for me...

Currently we do the following: A cron job (which replaces the default
"bbackupctl -q sync" coming from the unofficial Debian packages) first
dumps our databases and then calls "bbackupctl -q sync" itself.
bbackupd.conf excludes all database directories but includes a
database-dump directory.
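
Roughly, the cron job does something like the following Python sketch.
It is not our actual script: the dump path is a placeholder, and it
assumes mysqldump finds its credentials in an option file.

```python
import subprocess
from typing import Callable, List, Sequence

DUMP_FILE = "/var/backups/db-dump/all.sql"  # placeholder path, not our real one


def cron_commands(dump_file: str = DUMP_FILE) -> List[List[str]]:
    """Commands in the order the cron job runs them."""
    return [
        # Dump all databases into the directory that bbackupd.conf includes.
        ["mysqldump", "--all-databases", f"--result-file={dump_file}"],
        # Then trigger the sync that the packaged cron job would have run.
        ["bbackupctl", "-q", "sync"],
    ]


def run_checked(cmd: Sequence[str]) -> None:
    """Run a command and raise CalledProcessError on a non-zero exit."""
    subprocess.run(list(cmd), check=True)


def run_all(commands: Sequence[Sequence[str]],
            run: Callable[[Sequence[str]], None] = run_checked) -> None:
    """Run the commands in order; the first failure stops the sequence."""
    for cmd in commands:
        run(cmd)
```

Because the first failure stops the sequence, a failed dump means no
sync is triggered, so a broken or half-written dump is not picked up by
that run.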

Bye
  Hansi