[Box Backup] BoxBackup Server Side Management Specs (Draft 0.01)

Ben Summers boxbackup@fluffy.co.uk
Wed, 22 Sep 2004 10:07:24 +0100


On 22 Sep 2004, at 06:27, Per Thomsen wrote:

> All,
> Here are the things I have come up with from a user's perspective as
> requirements/specifications for server-side management.
>
> This is very rough and not necessarily thought through at all levels,
> but I thought I'd get it out there, so everyone could get a chance to
> see it, and shoot down / change / add to what's in here.

It all looks sensible. Some comments below, to start off the discussion.

>
> 1. Symbolic Names for hosts.
> In addition to using a hex number to identify backup clients, a mapping
> must be put in place between the account number, and a name the user
> (backup admin) determines. This name can be up to 255 characters long
> (RFC 1034).
>
> For backwards compatibility, account numbers must still be accessible,
> and usable just as before. If needed command line switches can be
> applied to denote one or the other. Additionally, for management
> purposes it should be possible to explicitly set the account number
> during the creation of a new client (as well as the symbolic name).
>
> While the name can be arbitrary, the installation process should attempt
> to determine the full domain name of the host, and use that value as the
> default.

Will these names replace the account numbers on the client side too? So 
instead of getting an account number from your administrator before 
configuring and generating the certificate, you would just use the 
symbolic name of your host?
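
Whichever way that goes, the server would need a two-way lookup between
account numbers and names. A minimal sketch in Python, only to make the
idea concrete: the class and method names are made up for illustration
and are not part of Box Backup, and it assumes account numbers are
written in hex on the command line.

```python
import socket

MAX_NAME_LENGTH = 255  # limit quoted in the proposal (RFC 1034)


class AccountNames:
    """Hypothetical two-way map between account numbers and symbolic names."""

    def __init__(self):
        self._name_by_account = {}
        self._account_by_name = {}

    def register(self, account, name=None):
        # Default to the host's fully qualified domain name, as proposed.
        if name is None:
            name = socket.getfqdn()
        if not name or len(name) > MAX_NAME_LENGTH:
            raise ValueError("name must be 1 to 255 characters long")
        if account in self._name_by_account or name in self._account_by_name:
            raise ValueError("account number or name already registered")
        self._name_by_account[account] = name
        self._account_by_name[name] = account

    def resolve(self, identifier):
        """Accept either a symbolic name or a hex account number."""
        if identifier in self._account_by_name:
            return self._account_by_name[identifier]
        account = int(identifier, 16)  # raises ValueError for unknown names
        if account not in self._name_by_account:
            raise KeyError(identifier)
        return account

    def name_of(self, account):
        return self._name_by_account[account]
```

Names are tried before numbers here, so a name that happens to look like
a hex number would shadow that account number. That ambiguity is one
reason the command line switches mentioned above might be needed.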

>
> 2. Client groups.
> To support the distinction of groups of boxes being backed up, the
> concept of groups of users should be implemented. Groups are collections
> of clients or other groups (with appropriate circular-dependency checking).
>
> The 'bbstoreaccounts' executable will add functionality to manage
> groups, including getting statistics from a group. The statistics will
> be the cumulative values of the same data as for a single client, with
> the addition of the following:
>
> - List of group members
> - ?

Is the requirement for a group to contain another group really 
essential? It complicates things somewhat. It would be much simpler if 
a client could be contained in one group, and one group only.
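
To show what nested groups would involve, here is a rough sketch in
Python of the circular-dependency check. All names are hypothetical,
and it assumes that any member which is not itself a known group is a
client.

```python
class GroupGraph:
    """Hypothetical nested groups with a circular-dependency check."""

    def __init__(self):
        self._members = {}  # group name -> set of member names

    def add_group(self, group):
        self._members.setdefault(group, set())

    def _reaches(self, start, target):
        """Return True if target is reachable from start via memberships."""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self._members.get(node, ()))
        return False

    def add_member(self, group, member):
        self.add_group(group)
        # A cycle would appear if the group is already reachable from the
        # new member (this also rejects adding a group to itself).
        if self._reaches(member, group):
            raise ValueError(f"adding {member} to {group} creates a cycle")
        self._members[group].add(member)

    def clients(self, group):
        """All clients contained in the group, directly or through nesting."""
        result, stack, seen = set(), [group], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node in self._members:
                stack.extend(self._members[node])
            else:
                result.add(node)
        return result
```

With one group per client, both the check and the traversal disappear:
a single group field on each account would be enough.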

>
> 3. Client Monitoring.
> The client should be able to send 'heart beat' messages to the bbstored
> server.
>
> Configuration information for heart beat is kept on the bbstored server.
> It includes:
>
> For each client:
> - on/off switch. Clients can be monitored, but are not required to be.
> This is especially useful for mobile users, who are not connected to the
> internet all the time.
> - Heart beat interval. How often the client sends heart beat
> information. Given in seconds. Default is 900.
>
> Heart beat messages will not interfere with long-running syncs or
> restores (large files), but will be inserted as close to the interval
> as possible, to ensure that as few false error alerts as possible are
> sent to administrators.
>
> When bbackupd starts, it will register with the bbstored server, and
> request its configuration information. It will use this to send the
> messages at the appointed times.
>
> Also, bbstored will create a record of the now running bbackupd (in
> memory, mmap, or whatever works best), to hold the data for the
> statistics, as well as to ensure that only clients that have registered
> are being monitored.
>
> When a bbackupd daemon completes an orderly shutdown, it will
> 'de-register' itself from the bbstored service, to ensure that no false
> 'down alerts' go out. However, if bbackupd dies as a result of some
> failure, the record on the bbstored server will remain, and eventually
> cause alerts to go out to the backup administrator.
>
> The heart beat packet contains the following information:
>
> - Host identifier (name and/or account number)
> - uptime (ie. how long has bbackupd been running on this host)
> - timestamp of last sync (when was the last file uploaded)
> - Number of bytes synced since last heart beat message
> - Number of bytes restored since last heart beat message
> - any significant errors that have occurred since last heart beat.
> - ?
>
> On the server side, a daemon (most likely bbstored) receives these heart
> beat messages, and keeps track of the status of all clients. It will
> keep a running counter of the byte-count statistics for the client, as
> well as a log of the significant errors. bbstore
>
> When a bbackup client daemon dies unexpectedly, the bbstored server will
> notice that there is no heart beat message from the client after
> approximately 2 x the heart beat interval. It will then notify backup
> administrators using the NotifySysAdmin.sh mechanism, or one very much
> like it.
>
> When a significant error occurs, and is logged with the server, a
> similar notification mechanism will be used to notify the backup
> administrator.

The current implementation will connect to the server every so many 
seconds (the UpdateStoreInterval) in lazy mode, or whenever the sync is 
started in snapshot mode. If there were an option to always connect to 
the server, instead of only when there is data to update, and to send 
the additional information above, would this be a suitable mechanism?

It seems slightly unnecessary to have a completely separate method for 
transferring this information.
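
Either way, the bookkeeping on the server is much the same. A minimal
sketch in Python of the register / de-register / "2 x interval" logic
described above. The class and method names are hypothetical, and time
is passed in as plain seconds so the logic does not depend on how the
messages arrive.

```python
DEFAULT_INTERVAL = 900  # seconds, the default given in the proposal


class HeartbeatMonitor:
    """Hypothetical server-side record of registered clients."""

    def __init__(self):
        self._clients = {}  # account -> [interval, time last seen]

    def register(self, account, now, interval=DEFAULT_INTERVAL):
        # Called when bbackupd starts and asks for its configuration.
        self._clients[account] = [interval, now]

    def deregister(self, account):
        # Called on an orderly shutdown, so no false alert is raised.
        self._clients.pop(account, None)

    def heartbeat(self, account, now):
        # Only clients that have registered are being monitored.
        if account in self._clients:
            self._clients[account][1] = now

    def overdue(self, now):
        """Accounts silent for more than twice their heart beat interval."""
        return sorted(
            account
            for account, (interval, last_seen) in self._clients.items()
            if now - last_seen > 2 * interval
        )
```

If the existing connection were reused, heartbeat() would simply be
called on every connection, and the interval would be the
UpdateStoreInterval.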

>
> 4. Space use reporting.
> Reporting of space consumption is needed at several levels:
>
> - The entire bbstored server (all RAID volumes being used for backups).
> - Each Volume. Ensuring that one single volume isn't bearing the brunt
> of the load, as well as for planning purposes.
> - By Group. This relates to item #2 in my list. It has very similar
> reporting requirements to the individual client, with the same additions
> as described in the Group section.
> - By Individual. This is already available in the current version.
>
> 5. Account Database.
> The ability to store the client account information in a database is
> crucial to the stability and scalability of the system. Change the use
> of text files to using a database.

And of course, design the database so that it can store information for 
many bbstored daemons.
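
As a rough illustration of what that means for the schema, here is a
sketch using SQLite from Python. The table layout, column names and
function names are assumptions made up for this example; the point is
only that the server is part of the key.

```python
import sqlite3

# Hypothetical schema: the (server, account) pair identifies an account,
# so the same account number can exist on different bbstored servers.
SCHEMA = """
CREATE TABLE IF NOT EXISTS accounts (
    server  TEXT    NOT NULL,
    account INTEGER NOT NULL,
    name    TEXT    NOT NULL,
    PRIMARY KEY (server, account),
    UNIQUE (server, name)
);
"""


def open_account_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db


def add_account(db, server, account, name):
    # The connection as a context manager commits on success and rolls
    # back if the insert violates one of the constraints.
    with db:
        db.execute(
            "INSERT INTO accounts (server, account, name) VALUES (?, ?, ?)",
            (server, account, name),
        )


def accounts_on(db, server):
    rows = db.execute(
        "SELECT account, name FROM accounts WHERE server = ? ORDER BY account",
        (server,),
    )
    return rows.fetchall()
```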

>
> 6. Interaction with the rest of the world.
> Interfacing in an easy way to other systems for Monitoring and reporting
> purposes. Rather than nicely formatted output there should be an option
> in all commands to format the output for human and for script
> consumption. This data could then be used by products like Nagios
> (www.nagios.org)

Is there a standard format which could be used?
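
For Nagios at least, the plugin convention is a single line of status
text, optionally followed by "|" and performance data, plus an exit
code of 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). A sketch in
Python of what a space check could emit. The function name, the
"BOXBACKUP" label and the choice of the soft limit as the warning
threshold and the hard limit as the critical one are assumptions for
illustration only.

```python
def nagios_space_check(account, used, soft_limit, hard_limit):
    """Return (exit_code, output_line) in Nagios plugin style."""
    if used >= hard_limit:
        code, status = 2, "CRITICAL"
    elif used >= soft_limit:
        code, status = 1, "WARNING"
    else:
        code, status = 0, "OK"
    text = (
        f"BOXBACKUP {status} - account {account} "
        f"uses {used} of {hard_limit} bytes"
    )
    # Performance data: label=value[unit];warn;crit;min;max
    perfdata = f"used={used}B;{soft_limit};{hard_limit};0;{hard_limit}"
    return code, f"{text} | {perfdata}"
```

A wrapper script would print the line and exit with the code, which is
all Nagios needs in order to raise its own alerts.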

>
> As I said this is a very rough draft. So, go crazy: edit, change, add,
> whatever.

I look forward to other comments too.

Per -- thanks for your work on this.

Ben