[Box Backup] Server redundancy and backup servers
Ben Summers
boxbackup@fluffy.co.uk
Fri, 24 Sep 2004 08:10:56 +0100
I have been discussing redundancy for servers off-list, and have come
up with some plans and preliminary design notes. A copy is below for
your comments.
Ben
Design objectives
* Failure means the server cannot be contacted by the client. If a
server can be contacted by another server but not by the client, then
that server must still be considered down.
* No central server. The objective above means server choice must be
made by the client.
* A misbehaving client should not cause the stores to lose
synchronisation.
* Assume that all servers have the same amount of disc space, and
identical disc configuration.
* Allow choice of primary and secondary on a per account basis.
* Any connection can be dropped at any time, and the stores should be
in a workable, if non-optimal, state.
* As simple as possible. Avoid keeping large amounts of state about the
accounts on another server.
Server groups.
* The client store marker is defined to change at the end of every sync
from the client (if and only if data changed). The client store marker
should increase each time the store is updated. This allows the servers
in a group to determine easily whether they are in sync, and which
holds the latest version.
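As a sketch of how these marker comparisons would work (illustrative Python, not Box Backup code; the function names are invented):

```python
def in_sync(markers):
    """A server group is in sync iff every server reports the
    same client store marker for the account."""
    return len(set(markers)) == 1

def latest(markers):
    """Markers increase on every sync that changes data, so the
    greatest marker identifies the most recent version."""
    return max(markers)
```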
* Stores are grouped. Each server is a peer within the group.
* On login, the server returns a list of all other servers in the
group. The client records this list on disc.
* When the client needs to obtain a connection to a store, it uses the
following algorithm:
Let S = last server successfully connected
Let P = primary server
Do
{
    Attempt to connect to S
    If(S == P and S is not connected)
    {
        Pause
        Try connecting to P again
    }
    If(S is not connected)
    {
        Let S = next untried server in the group list
    }
} While(S is not connected and not all servers have been tried)
If(S is not connected)
{
    Pause
    Start process again
}
Let CSM_S = client store marker from S
If(S != P)
{
    Attempt to connect to P again, but with a short timeout this time
    If(P is connected)
    {
        Let CSM_P = client store marker from P
        If(CSM_P == expected client store marker)
        {
            Disconnect S
            S = P
        }
        else
        {
            Disconnect P
        }
    }
}
This algorithm ensures that the client prefers to connect to the
primary server, but will keep talking to the secondary server for as
long as it's available and is at a later state than the primary store.
(This gives time for the data to be transferred from the secondary to
the primary, and avoids repeat uploads of data.)
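A compact sketch of the same selection loop in Python (all names are illustrative, not Box Backup code; `try_connect` stands in for the real connection attempt and returns the client store marker, or None on failure):

```python
import time

def select_store(servers, primary, last_success, try_connect,
                 expected_marker, pause=0.0):
    # Try the last successful server first, then the rest of the group.
    order = [last_success] + [s for s in servers if s != last_success]
    for s in order:
        marker = try_connect(s)
        if marker is None and s == primary:
            time.sleep(pause)          # pause, then retry the primary once
            marker = try_connect(s)
        if marker is None:
            continue                   # server is down; try the next one
        if s != primary:
            # Connected to a secondary: check (with a short timeout in
            # practice) whether the primary has caught up.
            if try_connect(primary) == expected_marker:
                return primary         # primary is up to date; prefer it
        return s                       # otherwise stay with the secondary
    return None  # all servers down; caller pauses and starts again
```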
* Servers within a group use the fast sync protocol to update accounts
on a regular basis.
Observations
* The servers are simply peers. The primary server for an account is
chosen merely by configuring the client.
* If the servers simply use best efforts to keep each other up to date,
the client will automatically choose the best server to contact.
* Using the existing methods of handling unexpected changes to the
client store marker, it doesn't matter whether a server is out of date
or not. The existing code handles this occurrence perfectly.
* The servers do not need to check whether other servers are down. This
fact is actually irrelevant, because it's the client's view of
availability which is important.
Accounts
The accounts database must be identical on each machine.
bbstoreaccounts will need to push changes to all servers. It will
probably be necessary to change the account database, and store the
limits within the database rather than in the stores of data. This is
desirable anyway.
Note: If another server is down, it won't be possible to update the
account database.
Alternatively, servers could update each other with changes to the
accounts database on a lazy basis. This might cause issues with
housekeeping unnecessarily deleting files which have to be replaced.
Fast sync protocol.
* Compare client store markers. End if they are the same. Otherwise,
the server with the greater number becomes the source, and the lesser
the target.
* Zero client store marker on target
* Send stream of deleted (by housekeeping) object IDs from source to
target. Target deletes the objects immediately.
* Send stream of object ID + hash of directories on source server to
the target.
* For each directory on the target server which doesn't exist, or
doesn't have the right hash...
- check objects exist, and transfer them
- write directory, only if all the objects are correct
- check for patches. Attempt to transfer by patch if new version exists
* Each server records the client store marker it expects on the remote
server. If that marker is not as expected, then the contents of the
directories are checked as well, sending MD5 hashes across. This allows
recovery from partial syncs. [This should probably be optimised for
the case where there's an empty store at one end.]
* When an object is uploaded, the "last object ID used" value for that
account should be kept within the acceptable range to allow recovery
when syncing with the client.
* Write new client store marker on target
If a client connects during a fast sync, then that fast sync will be
aborted to give the client the lock on the account.
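For a pair of servers, the direction rule and the directory-hash comparison could be sketched as follows (illustrative Python and data shapes, not the planned implementation; `markers` maps server name to client store marker, `dirs` maps server name to {directory ID: hash}):

```python
def plan_fast_sync(markers, dirs):
    (a, ma), (b, mb) = markers.items()
    if ma == mb:
        return None                      # already in sync, nothing to do
    # Greater marker becomes the source, lesser the target.
    source, target = (a, b) if ma > mb else (b, a)
    # Directories missing on the target, or present with a differing
    # hash, must have their objects checked and transferred.
    to_transfer = sorted(oid for oid, h in dirs[source].items()
                         if dirs[target].get(oid) != h)
    return source, target, to_transfer
```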
Optimised fast sync.
It's undesirable for the fast sync to check every directory when it
doesn't have to. During a sync with a client, a store:
* Keeps a list of changed directories by writing to disc (and flushing)
every time a directory is saved back to disc.
* Keep patches from previous versions to send to remote store
* Connect after backup to remote stores, use fast sync to send changes
over.
This will allow short-cuts to be taken when syncing, and changes sent
by patch.
The cache of patches will need to be managed, deleting them when they
are transferred to a peer or are too old.
Housekeeping
Deleted objects need to be kept in sync too. Housekeeping takes place
independently on each server. Since housekeeping is a deterministic
process, it should not delete different files on different servers.
A list of deleted objects is kept on each server during the
housekeeping process.
In the unlikely event that a server deletes an object that the source
server doesn't, this object will be retrieved in the next fast sync.
This is unlikely to happen because clients only add data.
Typically, housekeeping on non-primary servers will never delete an
object in that account.
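To illustrate why determinism keeps the servers aligned: if every server applies the same rule to the same data, the deletion lists match without any coordination. The rule and field names below are invented for the sketch; the real housekeeping policy is more involved:

```python
def housekeeping_victims(objects, limit):
    """Delete removable (old/deleted) versions, oldest first, until the
    account fits within its limit. Given identical inputs, every server
    computes the identical victim list."""
    size = sum(o["size"] for o in objects)
    victims = []
    # Sort oldest-first, with object ID as a deterministic tie-break.
    for o in sorted(objects, key=lambda o: (-o["age"], o["id"])):
        if size <= limit:
            break
        if o["removable"]:
            victims.append(o["id"])
            size -= o["size"]
    return victims
```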