[Box Backup] Server redundancy and backup servers
Ben Summers
boxbackup@fluffy.co.uk
Fri, 24 Sep 2004 08:10:56 +0100
I have been discussing redundancy for servers off-list, and have come
up with some plans and preliminary design notes. A copy is below for
your comments.
Ben
Design objectives
* Failure means the server cannot be contacted by the client. If a
server can be contacted by another server but not by the client, then
that server must still be considered down.
* No central server. The objective above means server choice must be
made by the client.
* A misbehaving client should not cause the stores to lose
synchronisation.
* Assume that all servers have the same amount of disc space, and
identical disc configuration.
* Allow choice of primary and secondary on a per account basis.
* Any connection can be dropped at any time, and the stores should be
in a workable, if non-optimal, state.
* As simple as possible. Avoid keeping large amounts of state about the
accounts on another server.
Server groups.
* The client store marker is defined to change at the end of every sync
from the client (if and only if data changed). The client store marker
should increase each time the store is updated. This allows the servers
in a group to determine easily whether they are in sync, and which
holds the latest version.
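As a sketch of how these marker comparisons would work (illustrative Python, not Box Backup code; the function names are invented):

```python
def in_sync(markers):
    """A server group is in sync iff every server reports the
    same client store marker for the account."""
    return len(set(markers)) == 1

def latest(markers):
    """Markers increase on every sync that changes data, so the
    greatest marker identifies the most recent version."""
    return max(markers)
```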
* Stores are grouped. Each server is a peer within the group.
* On login, the server returns a list of all other servers in the
group. The client records this list on disc.
* When the client needs to obtain a connection to a store, it uses the
following algorithm:
Let S = last server successfully connected
Let P = primary server
Do
{
    Attempt to connect to S
    If(S == P and S is not connected)
    {
        Pause
        Try connecting to P again
    }
    If(S is not connected)
    {
        Let S = next untried server in the group list
    }
} While(S is not connected and not all servers have been tried)
If(S is not connected)
{
    Pause
    Start process again
}
Let CSM_S = client store marker from S
If(S != P)
{
    Attempt to connect to P again, but with a short timeout this time
    If(P is connected)
    {
        Let CSM_P = client store marker from P
        If(CSM_P == expected client store marker)
        {
            Disconnect S
            S = P
        }
        else
        {
            Disconnect P
        }
    }
}
This algorithm ensures that the client prefers to connect to the
primary server, but will keep talking to the secondary server for as
long as it's available and is at a later state than the primary store.
(This gives time for the data to be transferred from the secondary to
the primary, and avoids repeat uploads of data.)
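A compact sketch of the same selection loop in Python (all names are illustrative, not Box Backup code; `try_connect` stands in for the real connection attempt and returns the client store marker, or None on failure):

```python
import time

def select_store(servers, primary, last_success, try_connect,
                 expected_marker, pause=0.0):
    # Try the last successful server first, then the rest of the group.
    order = [last_success] + [s for s in servers if s != last_success]
    for s in order:
        marker = try_connect(s)
        if marker is None and s == primary:
            time.sleep(pause)          # pause, then retry the primary once
            marker = try_connect(s)
        if marker is None:
            continue                   # server is down; try the next one
        if s != primary:
            # Connected to a secondary: check (with a short timeout in
            # practice) whether the primary has caught up.
            if try_connect(primary) == expected_marker:
                return primary         # primary is up to date; prefer it
        return s                       # otherwise stay with the secondary
    return None  # all servers down; caller pauses and starts again
```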
* Servers within a group use the fast sync protocol to update accounts
on a regular basis.
Observations
* The servers are simply peers. The primary server for an account is
chosen merely by configuring the client.
* If the servers simply use best efforts to keep each other up to date,
the client will automatically choose the best server to contact.
* Using the existing methods of handling unexpected changes to the
client store marker, it doesn't matter whether a server is out of date
or not. The existing code handles this occurrence perfectly.
* The servers do not need to check whether other servers are down. This
fact is actually irrelevant, because it's the client's view of
availability which is important.
Accounts
The accounts database must be identical on each machine.
bbstoreaccounts will need to push changes to all servers. It will
probably be necessary to change the account database, and store the
limits within the database rather than in the stores of data. This is
desirable anyway.
Note: If another server is down, it won't be possible to update the
account database.
Alternatively, servers could update each other with changes to the
accounts database on a lazy basis. This might cause issues with
housekeeping unnecessarily deleting files which have to be replaced.
Fast sync protocol.
* Compare client store markers. End if they are the same. Otherwise,
the server with the greater number becomes the source, and the lesser
the target.
* Zero client store marker on target
* Send stream of deleted (by housekeeping) object IDs from source to
target. Target deletes the objects immediately.
* Send stream of object ID + hash of directories on source server to
the target.
* For each directory on the target server which doesn't exist, or
doesn't have the right hash...
- check objects exist, and transfer them
- write directory, only if all the objects are correct
- check for patches. Attempt to transfer by patch if new version exists
* Each server records the client store marker it expects on the remote
server. If that marker is not as expected, then the contents of the
directories are checked as well, sending MD5 hashes across. This allows
recovery from partial syncs. [This should probably be optimised for
the case where there's an empty store at one end.]
* When an object is uploaded, the "last object ID used" value for that
account should be kept within the acceptable range to allow recovery
when syncing with the client.
* Write new client store marker on target
If a client connects during a fast sync, then that fast sync will be
aborted to give the client the lock on the account.
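For a pair of servers, the direction rule and the directory-hash comparison could be sketched as follows (illustrative Python and data shapes, not the planned implementation; `markers` maps server name to client store marker, `dirs` maps server name to {directory ID: hash}):

```python
def plan_fast_sync(markers, dirs):
    (a, ma), (b, mb) = markers.items()
    if ma == mb:
        return None                      # already in sync, nothing to do
    # Greater marker becomes the source, lesser the target.
    source, target = (a, b) if ma > mb else (b, a)
    # Directories missing on the target, or present with a differing
    # hash, must have their objects checked and transferred.
    to_transfer = sorted(oid for oid, h in dirs[source].items()
                         if dirs[target].get(oid) != h)
    return source, target, to_transfer
```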
Optimised fast sync.
It's undesirable for the fast sync to check every directory when it
doesn't have to. During a sync with a client, a store:
* Keeps a list of changed directories by writing to disc (and flushing)
every time a directory is saved back to disc.
* Keep patches from previous versions to send to remote store
* Connect after backup to remote stores, use fast sync to send changes
over.
This will allow short-cuts to be taken when syncing, and changes sent
by patch.
The cache of patches will need to be managed, deleting them when they
are transferred to a peer or are too old.
Housekeeping
Deleted objects need to be kept in sync too. Housekeeping takes place
independently on each server. Since housekeeping is a deterministic
process, it should not delete different files on different servers.
A list of deleted objects is kept on each server during the
housekeeping process.
In the unlikely event that a server deletes an object that the source
server doesn't, this object will be retrieved in the next fast sync.
This is unlikely to happen because clients only add data.
Typically, housekeeping on non-primary servers will never delete an
object in that account.
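To illustrate why determinism keeps the servers aligned: if every server applies the same rule to the same data, the deletion lists match without any coordination. The rule and field names below are invented for the sketch; the real housekeeping policy is more involved:

```python
def housekeeping_victims(objects, limit):
    """Delete removable (old/deleted) versions, oldest first, until the
    account fits within its limit. Given identical inputs, every server
    computes the identical victim list."""
    size = sum(o["size"] for o in objects)
    victims = []
    # Sort oldest-first, with object ID as a deterministic tie-break.
    for o in sorted(objects, key=lambda o: (-o["age"], o["id"])):
        if size <= limit:
            break
        if o["removable"]:
            victims.append(o["id"])
            size -= o["size"]
    return victims
```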