[Box Backup] Server side requirements (plus others)

Mon, 14 Mar 2005 00:56:01 -0800

All,
A while back, I attempted to get a list of requirements together for the 
server side (bbstored). Since then, some changes have been made to this, 
plus some more good ideas have surfaced.

This is an attempt to regurgitate these requirements into one 
document/message.

As before, don't hesitate to comment on this. I'm trying to get a list 
of features together, that can be used both for planning and technical 
discussion.

This is mostly cut-pastes from the last 7 months of messages on the 
list. If I missed your gem of an idea, let me know, and I'll repost the 
message.

Thanks,
Per

The list:

1. Symbolic Names for hosts.
In addition to using a hex number to identify backup clients, a mapping
must be put in place between the account number, and a name the user
(backup admin) determines. This name can be up to 255 characters long
(RFC 1034).

For backwards compatibility, account numbers must still be accessible,
and usable just as before. If needed, command line switches can be
applied to denote one or the other. Additionally, for management
purposes it should be possible to explicitly set the account number
during the creation of a new client (as well as the symbolic name).

While the name can be arbitrary, the installation process should attempt
to determine the full domain name of the host, and use that value as the
default.

The bbackupd installation process should be able to use both the symbolic
name and the account ID. Certificate files, etc. will be generated using
the symbolic name, if available. Otherwise it will fall back to the
account ID.
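
A minimal sketch of the name/number mapping described above, in Python
for illustration only. The class AccountRegistry and its methods are
hypothetical and not part of Box Backup. It assumes that a symbolic name
takes precedence when a string could be read either as a name or as a
hex account number.

```python
import socket

MAX_NAME_LENGTH = 255  # limit quoted in the text (RFC 1034)


class AccountRegistry:
    """Hypothetical two-way mapping between account numbers and names."""

    def __init__(self):
        self._name_by_id = {}
        self._id_by_name = {}

    def add(self, account_id, name=None):
        # Default to the host's fully qualified domain name.
        if name is None:
            name = socket.getfqdn()
        if not name or len(name) > MAX_NAME_LENGTH:
            raise ValueError("symbolic name must be 1 to 255 characters")
        if account_id in self._name_by_id or name in self._id_by_name:
            raise ValueError("account number or name already in use")
        self._name_by_id[account_id] = name
        self._id_by_name[name] = account_id

    def resolve(self, key):
        """Accept either a symbolic name or a hex account number string."""
        if key in self._id_by_name:
            return self._id_by_name[key]
        account_id = int(key, 16)  # raises ValueError if not hex
        if account_id not in self._name_by_id:
            raise KeyError(key)
        return account_id

    def label(self, account_id):
        """Name for certificate files: symbolic name, else the hex ID."""
        return self._name_by_id.get(account_id, format(account_id, "08x"))
```

A command line switch, as suggested above, would remove the ambiguity
that resolve() settles by trying the name first.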

2. Client groups.
To support the distinction of groups of boxes being backed up, the
concept of groups of users should be implemented. Groups are a collection
of clients, and no group can be a member of other groups.

A group has a 'group administrator' associated with itself. Messages that
would go to the Backup administrator if

The 'bbstoreaccounts' executable will add functionality to manage
groups, including getting statistics from a group. The statistics will
be the cumulative values of the same data as for a single client, with
the addition of the following:

- List of group members
- ?

It is given that if a client is a member of multiple groups, the
statistical data will count in each group.
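
As a rough illustration of the cumulative group statistics, the sketch
below sums one per-client value (bytes used) into per-group totals. The
function name and the dictionary layout are assumptions made for this
example only.

```python
from collections import defaultdict


def group_totals(client_usage, memberships):
    """Sum per-client byte counts into per-group totals.

    client_usage: {client name: bytes used}
    memberships:  {client name: iterable of group names}
    """
    totals = defaultdict(int)
    members = defaultdict(list)
    for client, used in client_usage.items():
        for group in memberships.get(client, ()):
            totals[group] += used  # counted once in every group it is in
            members[group].append(client)
    return {
        group: {"bytes": totals[group], "members": sorted(members[group])}
        for group in totals
    }
```

A client that belongs to two groups contributes its full usage to both,
so the group totals can add up to more than the server total.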

Code and configuration should be implemented to support optional quotas
on groups. Each group will have hard and soft limits, much like client
accounts.

I'm not sure what the consequences of exceeding a group quota should be.
Aggressive housekeeping on all members, when a client hits the group
ceiling? Something else?

3. Client Monitoring.
The client should be able to send 'heart beat' messages to the bbstored
server.

Configuration information for heart beat is kept on the bbstored server.
It includes:

For each client:
- on/off switch. Clients can be monitored, but are not required to be.
This is especially useful for mobile users, who are not connected to the
internet all the time.
- Heart beat interval. How often the client sends heart beat
information. Given in seconds. Default is 900.

Heart beat messages could be transmitted whenever a client connects to
the server for backing up, rather than on a separate time schedule.
Snapshot backups should transmit the data as well, to be able to
track when the last backup was made. It would be preferable, though, if
the interval were a separate number as described in the previous
paragraph. This would give more consistent data about clients that
use snapshot backups or have long backup intervals. Long intervals are
often used (at least by me) to avoid the sluggish client machine
performance that occurs every hour, when the disk is scanned for
eligible files.

Heart beat messages will not interfere with long-running syncs or
restores (large files), but will be inserted as close to the interval
as possible, to ensure that as few false error alerts as possible are
sent to administrators.

When bbackupd starts, it will register with the bbstored server, and
request its configuration information. It will use this to send the
messages at the appointed times.

Also, bbstored will create a record of the now running bbackupd (in
memory, mmap, or whatever works best), to hold the data for the
statistics, as well as to ensure that only clients that have registered
are being monitored. Snapshot backups will not register, but rather
data will be kept about the timestamps, etc. of the last backup.

When a bbackupd daemon completes an orderly shutdown, it will
'de-register' itself from the bbstored service, to ensure that no false
'down alerts' go out. However, if bbackupd dies as a result of some
failure, the record on the bbstored server will remain, and eventually
cause alerts to go out to the backup administrator, and the group
owner for a given group.

The heart beat packet contains the following information:

- Host identifier (name and/or account number)
- bbackupd version number
- backup type for last backup performed (lazy/snapshot)
- uptime (i.e. how long has bbackupd been running on this host)
- time stamp of last connection (not necessarily any files uploaded)
- time stamp of last sync (when was the last file uploaded)
- Number of bytes synced since last heart beat message
- Number of bytes restored since last heart beat message
- any significant errors that have occurred since last heart beat.
- ?
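
One possible shape for such a packet, sketched in Python. The field
names, the use of Unix timestamps, and the JSON encoding are all
assumptions for illustration. Nothing here is an existing Box Backup
wire format.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class HeartBeat:
    host: str              # symbolic name and/or account number
    version: str           # bbackupd version number
    last_backup_type: str  # "lazy" or "snapshot"
    uptime_seconds: int    # how long bbackupd has been running
    last_connection: int   # Unix time stamp (an assumption)
    last_sync: int         # Unix time stamp of the last file upload
    bytes_synced: int      # since the previous heart beat
    bytes_restored: int    # since the previous heart beat
    errors: List[str] = field(default_factory=list)

    def encode(self) -> bytes:
        """Serialise to JSON bytes (a stand-in for the real protocol)."""
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @classmethod
    def decode(cls, raw: bytes) -> "HeartBeat":
        return cls(**json.loads(raw.decode("utf-8")))
```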

On the server side, a daemon (most likely bbstored) receives these heart
beat messages, and keeps track of the status of all clients. It will
keep a running counter of the byte-count statistics for the client, as
well as a log of the significant errors.

When a bbackupd client daemon dies unexpectedly, the bbstored server will
notice that there is no heart beat message from the client after
approximately 2 x the heart beat interval. It will then notify backup
administrators using the NotifySysAdmin.sh mechanism, or one very much
like it. This mechanism should support notification to a 'group owner', for
clients that are in a group.
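
The register / de-register / timeout behaviour could be tracked along
these lines. This is only a sketch: HeartBeatMonitor is a hypothetical
name, time is passed in explicitly to keep the example testable, and the
actual notification (NotifySysAdmin.sh or similar) is left out.

```python
class HeartBeatMonitor:
    """Tracks registered clients and reports those that have gone quiet."""

    def __init__(self, grace_factor=2):
        self.grace_factor = grace_factor  # "approximately 2 x the interval"
        self._clients = {}                # host -> (interval, last seen)

    def register(self, host, interval, now):
        self._clients[host] = (interval, now)

    def deregister(self, host):
        # Orderly shutdown: forget the client so no down alert is raised.
        self._clients.pop(host, None)

    def beat(self, host, now):
        # Only clients that have registered are monitored.
        if host in self._clients:
            interval, _ = self._clients[host]
            self._clients[host] = (interval, now)

    def overdue(self, now):
        """Hosts whose last heart beat is older than the grace period."""
        return sorted(
            host
            for host, (interval, seen) in self._clients.items()
            if now - seen > self.grace_factor * interval
        )
```

With the default interval of 900 seconds, a client that registered at
time 0 and then went silent would first show up in overdue() just after
1800 seconds.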

When a significant error occurs, and is logged with the server, a
similar notification mechanism will be used to notify the backup
administrator.

Optionally, the statistics information can be stored in a database for
billing/auditing purposes.

A utility (possibly an updated bbstoreaccounts) will be needed to
display this information in ways that will be useful to administrators.
For individual accounts this information could include:

- time/date of last successful sync
- duration of last successful sync
- ???

4. Space use reporting.
Reporting of space consumption is needed at several levels:

- The entire bbstored server (all RAID volumes being used for backups).
- Each Volume. Ensuring that one single volume isn't bearing the brunt
of the load, as well as for planning purposes.
- By Group. This relates to item #2 in my list. It has very similar
reporting requirements to the individual client, with the same additions
as described in the Group section.
- By Individual. This is already available in the current version.
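
For the server and volume levels, something as simple as the following
might be enough to start with. It is a sketch using Python's standard
shutil.disk_usage() call. The function name and the report layout are
assumptions, and it reports file system usage rather than anything
bbstored itself tracks.

```python
import shutil


def volume_report(paths):
    """Return (server totals, per-volume rows) for the given mount points.

    Each path is assumed to sit on a different file system. Listing the
    same file system twice would count it twice in the server totals.
    """
    rows = []
    for path in paths:
        usage = shutil.disk_usage(path)
        percent = 100.0 * usage.used / usage.total if usage.total else 0.0
        rows.append({
            "path": path,
            "total": usage.total,
            "used": usage.used,
            "percent_used": round(percent, 1),
        })
    server = {
        "total": sum(row["total"] for row in rows),
        "used": sum(row["used"] for row in rows),
    }
    return server, rows
```

Comparing percent_used across the rows gives a quick view of whether one
volume is bearing the brunt of the load.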

5. Account Database.
The ability to store the client account information in a database is
crucial to the stability and scalability of the system. The text files
used today should be replaced with a database.

Implement support for storing the client account information for multiple
Box servers in one database.

6. Interaction with the rest of the world.
Other systems should be able to interface easily with Box Backup for
monitoring and reporting purposes. In addition to nicely formatted
output for humans, all commands should have an option to format the
output for script consumption. This data could then be used by products
like Nagios (www.nagios.org).
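
To illustrate the idea of one command with two output styles, here is a
small Python sketch. The function name, the "machine" switch, and the
choice of JSON are assumptions. No existing Box Backup command has this
option.

```python
import json


def format_usage(stats, machine=False):
    """Render the same statistics for a script or for a person."""
    if machine:
        # One JSON object per call: easy for scripts to parse.
        return json.dumps(stats, sort_keys=True)
    # Aligned "key : value" lines for people.
    width = max(len(key) for key in stats)
    return "\n".join(
        f"{key.ljust(width)} : {value}"
        for key, value in sorted(stats.items())
    )
```

A monitoring plugin would call the command with the machine switch and
compare the values against its own thresholds.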

7. Account Migration tools
It should be possible to move a client account from one Box server to
another.

When the move is complete (not before), the old bbstored server should
either redirect (preferred) or proxy the requests to the 'new' server,
so the client can continue operations unaffected by the change.

8. Server Redundancy (grabbed from message by Ben on 9/24/04)

Design objectives
Failure means the server cannot be contacted by the client. If a server can
be contacted by another server but not the client, then that server must
still be considered down.

No central server. The objective above means server choice must be
made by the client.

A misbehaving client should not cause the stores to lose synchronisation.

Assume that all servers have the same amount of disc space, and identical
disc configuration.

Allow choice of primary and secondary on a per account basis.

Any connection can be dropped at any time, and the stores should be in
a workable, if non-optimal, state.

As simple as possible. Avoid keeping large amounts of state about the
accounts on another server.

8.1 Server groups.

The client store marker is defined to change at the end of every sync
from the client (if and only if data changed). The client store marker
should increase each time the store is updated. This allows the server
groups to determine easily if they are in sync, and which is the latest
version.

Stores are grouped. Each server is a peer within the group.

On login, the server returns a list of all other servers in the group.
The client records this list on disc.

When the client needs to obtain a connection to a store, it uses the
following algorithm:

Let S = last server successfully connected
Let P = primary server
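Do
{
    Attempt to connect to S
    If(S == P and S is not connected)
    {
        Pause;
        Try connecting to P again.
    }
    If(S is not connected and not all servers have been tried)
    {
        S = next untried server in the group
    }
} While(S is not connected and not all servers have been tried)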

If(S is not connected)
{
    Pause
    Start process again
}

Let CSM_S = client store marker from S

If(S != P)
{
    Attempt to connect to P again, but with a short timeout this time
    If(P is connected)
    {
        Let CSM_P = client store marker from P
        If(CSM_P == expected client store marker)
        {
            Disconnect S
            S = P
        }
        else
        {
            Disconnect P
        }
    }
}

This algorithm ensures that the client prefers to connect to the primary
server, but will keep talking to the secondary server for as long as it's
available and is at a later state than the primary store. (This gives time
for the data to be transferred from the secondary to the primary and
avoid repeat uploads of data.)
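
The same selection logic, sketched in Python for anyone who prefers
something runnable. The connect() callable, its short_timeout argument,
and the connection object's client_store_marker attribute and close()
method are hypothetical stand-ins. The pauses are left to the caller.

```python
def choose_server(servers, primary, last_good, expected_marker, connect):
    """Return (server, connection), or None if no server is reachable.

    connect(server, short_timeout) is a hypothetical callable returning
    a connection object, or None when the connection fails.
    """
    # Try the last good server first, then every other server once.
    order = []
    for server in [last_good] + list(servers):
        if server is not None and server not in order:
            order.append(server)

    chosen, conn = None, None
    for server in order:
        conn = connect(server, False)
        if conn is None and server == primary:
            conn = connect(server, False)  # one retry for the primary
        if conn is not None:
            chosen = server
            break
    if conn is None:
        return None  # caller pauses and starts the process again

    if chosen != primary:
        # Prefer the primary, but only once it has caught up.
        p_conn = connect(primary, True)
        if p_conn is not None:
            if p_conn.client_store_marker == expected_marker:
                conn.close()
                chosen, conn = primary, p_conn
            else:
                p_conn.close()
    return chosen, conn
```

The caller would remember the returned server as the new "last server
successfully connected" for the next run.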

Servers within a group use the fast sync protocol to update accounts
on a regular basis.

8.2 Observations
The servers are simply peers. The primary server for an account is
chosen merely by configuring the client.

If the servers simply use best efforts to keep each other up to date,
the client will automatically choose the best server to contact.

Using the existing methods of handling unexpected changes to the
client store marker, it doesn't matter whether a server is out of
date or not. The existing code handles this occurrence perfectly.

The servers do not need to check whether other servers are down.
This fact is actually irrelevant, because it's the client's view of
upness which is important.

8.3 Accounts
The accounts database must be identical on each machine.
bbstoreaccounts will need to push changes to all servers. It will
probably be necessary to change the account database, and
store the limits within the database rather than in the stores of
data. This is desirable anyway.

Note: If another server is down, it won't be possible to update
the account database.

Alternatively, servers could update each other with changes to
the accounts database on a lazy basis. This might cause issues
with housekeeping unnecessarily deleting files which have to be
replaced.

8.4 Fast sync protocol
Compare client store markers. End if they are the same. Otherwise,
the server with the greater number becomes the source, and the
lesser the target.

Zero client store marker on target

Send stream of deleted (by housekeeping) object IDs from source to
target. Target deletes the objects immediately.

Send stream of object ID + hash of directories on source server to the
target.

For each directory on the target server which doesn't exist, or doesn't
have the right hash...
    - check objects exist, and transfer them
    - write directory, only if all the objects are correct
    - check for patches. Attempt to transfer by patch if new version exists
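
Two of the steps above, picking the source and target from the client
store markers and finding the directories that need attention, can be
sketched as follows. The function names and the dictionary of
object ID to hash are assumptions for illustration only.

```python
def plan_fast_sync(marker_a, marker_b):
    """Return (source, target) as "a"/"b", or None when the stores agree."""
    if marker_a == marker_b:
        return None
    # The store with the greater client store marker is the source.
    return ("a", "b") if marker_a > marker_b else ("b", "a")


def directories_to_refresh(source_hashes, target_hashes):
    """Directory object IDs the target lacks or holds with another hash.

    Both arguments map a directory object ID to its hash.
    """
    return sorted(
        object_id
        for object_id, digest in source_hashes.items()
        if target_hashes.get(object_id) != digest
    )
```

Only the directories returned by directories_to_refresh() would then go
through the object check, transfer and write steps.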

Each server records the client store marker it expects on the remote
server. If that marker is not as expected, then the contents of the
directories are checked as well, sending MD5 hashes across. This allows
recovery from partial syncs. [This should probably be optimised for when
there's an empty store at one end.]

When an object is uploaded, the "last object ID used" value for that
account should be kept within the acceptable range to allow recovery
when syncing with the client.

Write new client store marker on target

If a client connects during a fast sync, then that fast sync will be 
aborted to give the client the lock on the account.

8.5 Optimised fast sync.

It's undesirable for the fast sync to check every directory when it
doesn't have to. During sync with a client, a store:

- Keeps a list of changed directories by writing to disc (and flushing)
every time a directory is saved back to disc.
- Keeps patches from previous versions to send to the remote store.
- Connects to remote stores after the backup, and uses fast sync to send
changes over.

This will allow short-cuts to be taken when syncing, and changes sent by 
patch.

The cache of patches will need to be managed, deleting them when they 
are transferred to a peer or are too old.
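
The list of changed directories could be as simple as an append-only
file that is flushed on every write. The sketch below is one way to do
that in Python. The class name, the file format (one hex object ID per
line), and the use of fsync are assumptions, not a description of
existing code.

```python
import os


class ChangedDirectoryLog:
    """Append-only record of directory object IDs changed since last sync."""

    def __init__(self, path):
        self.path = path

    def record(self, object_id):
        # Write and flush immediately so the entry survives a crash.
        with open(self.path, "a", encoding="ascii") as log:
            log.write(f"{object_id:x}\n")
            log.flush()
            os.fsync(log.fileno())

    def pending(self):
        """Distinct object IDs recorded so far, in ascending order."""
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="ascii") as log:
            return sorted({int(line, 16) for line in log if line.strip()})

    def clear(self):
        # Called once the changes have been sent to the peer.
        if os.path.exists(self.path):
            os.remove(self.path)
```

With more than one peer in the group, a separate log per peer would be
needed so that clearing one does not lose changes owed to another.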

8.6 Housekeeping
Deleted objects need to be kept in sync too. Housekeeping takes place 
independently on each server. Since housekeeping is a deterministic 
process, this should not delete different files on different servers.

A list of deleted objects is kept on each server during the housekeeping 
process.

In the unlikely event that a server deletes an object that the source 
server doesn't, this object will be retrieved in the next fast sync. 
This is unlikely to happen because clients only add data.

Typically, housekeeping on non-primary servers will never delete an 
object in that account.

9. Pseudo-clustering of servers (from Ben on 9/27/04)
It has just occurred to me that using the built in software RAID, a 
limited form of redundant servers could be created. Someone suggested 
this on the list a while back, and I've only just realised the 
implications.

All you need are three identical servers. On each server, compose the 
RAID file sets from the local hard drives and the two hard drives from 
the other servers (mount the discs using NFS or something.)

Run the bbstored daemon on each, and use round-robin DNS with a low TTL 
to send clients to different machines.

It should then "just work". If any machine goes down, then the software 
RAID will kick in and no-one will notice, apart from the administrator 
who will notice the log messages.

The changes required are:

* Add communications between bbstored servers so that a client can log 
in even if another server is housekeeping that account.

* Account database syncing between servers.

* Raid file disc set restoration tools need to be written (these are
still lacking -- right now you have to move the existing files
away in case they're needed, then blank every account and wait until the
clients have uploaded everything again.)

* Efficiency: write the raidfile daemon to offload RAID work, and write 
the temporary files to the local filesystem only.

The advantage over the previous plan is that most of the work is already 
done -- none of the above is a particularly significant amount of 
effort. The disadvantage is that it limits clusters to three machines 
which are connected to each other with fast network connections. 
However, it is a rather neat and simple solution.

10. No SSL/TLS on the wire.
It should be an option to turn off SSL/TLS after the initial handshake, 
to lower the overhead of the protocol.

-- 
Per Reedtz Thomsen | Reedtz Consulting, LLC | F: 209 883 4119
V: 209 883 4102    |   pthomsen@reedtz.com  | C: 209 996 9561
GPG ID: 1209784F   |  Yahoo! Chat: pthomsen | AIM: pthomsen