[Box Backup-dev] Backup of PostgreSQL database

Chris Wilson boxbackup-dev@boxbackup.org
Tue, 2 Jun 2009 21:46:15 +0300 (EAT)


Hi David,

On Mon, 1 Jun 2009, David Sommerseth wrote:

> I've been using boxbackup-0.10 for over a year in a production 
> environment. And it really makes me sleep quite well at night.  But 
> there is one thing which can cause some unclear nightmares.  Proper 
> backup of PostgreSQL.
>
> Right now, I'm running the worst scenario of DB backup - backing up the 
> raw DB files without shutting down the database server.  But I have done 
> restores from it this way, but it is far from ideal as the WAL might not 
> be consistent.  This can cause troubles if WAL data is not written to 
> the table files yet.
>
> PostgreSQL do support some kind of Point-In-Time-Recovery, with special 
> commands for preparing files for backup.
>
> http://www.postgresql.org/docs/8.3/static/continuous-archiving.html

With MySQL I think people tend to do a database dump and back that up. 
It's guaranteed consistent for innodb databases. But if you have a huge 
database, I guess it's inefficient.

With Postgres, it looks like you could use the NotifySysadmin script to 
issue the CHECKPOINT and SELECT pg_start_backup() commands before the 
backup starts, and SELECT pg_stop_backup() when it finishes. Does that 
meet your requirements? Can you easily back up the entire postgres data 
directory?

> Another aspect which I know some financial industries are interested in 
> as well is some kind of stream-backup of all SQL queries which modifies 
> the database in which ever way (DELETE/UPDATE/INSERT/GRANT/CREATE/DROP). 
> So each query is then logged and backed up immediately.  There are some 
> commercial solutions already for Oracle, where this is done to a tape 
> streamer.  Not sure if BoxBackup would be suitable for such an approach.

The most obvious implementation would result in the creation of millions 
of small "files" (one per query) which Box Backup is not particularly good 
at, since it stores each one as a separate file on the server. So the 
restore would be slow and the store would use a lot of disk space.

However, you can stream an object to the server with on-the-fly 
encryption, so one could write code to keep a connection open and encrypt 
each command and write it to the server as it happens. I'd be a little 
concerned about what would happen if the connection was interrupted for 
any reason. I think the server would discard the whole file (query log). 
Perhaps it could be made not to do that with some flag to the StoreFile 
command.

So Box Backup could probably be made to do something like what you want, 
but I think it would require a signficant programming effort. Perhaps the 
"financial industries" would be interested in paying for it?

Cheers, Chris.
-- 
_ ___ __     _
  / __/ / ,__(_)_  | Chris Wilson <0000 at qwirx.com> - Cambs UK |
/ (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Perl/SQL/HTML Developer |
\ _/_/_/_//_/___/ | We are GNU-free your mind-and your software |