[Box Backup-dev] Soft-RAID support

Chris Wilson boxbackup-dev@boxbackup.org
Fri, 24 Jul 2009 23:29:26 +0100 (BST)


Hi David,

On Fri, 24 Jul 2009, David Sommerseth wrote:

>>>>  Support for [software RAID] was never finished (no recovery
>>>>  procedure), it is pretty limited (only supports RAID 5 and three
>>>>  devices) and it was written at a time when OS/software and hardware
>>>>  RAID were not as ubiquitous or well supported as they are now.
>>>
>>>  I would be willing, with some guidance, to look into such a tool, if
>>>  that is the main criterion for dropping this support.
>>
>>  That would definitely be very helpful, thanks in advance. You can read
>>  the encrypted objects (which are reconstructed successfully) and then
>>  rewrite them, which will reestablish the redundant copies.
>
> I'll grab the code soon after the holiday season is over, and poke into
> this. I'd see it as a standalone program which does the recreation in
> the way you suggest.  As a quick idea of how I imagine it:
>
>     root@host # bbackrecover --source-dir1 /path/to/origdata_part1 \
>                              --source-dir2 /path/to/origdata_part2 \
>                              --recover-dir /path/to/recovered_part3
>
> Only the missing part would then be recovered to the directory given in
> --recover-dir.  I'm not sure, though, whether data would also need to be
> written to the part1 and part2 directories in addition to part3.
> Does this approach seem sensible?

I'd slightly prefer it if this was integrated with the main 
bbstoreaccounts utility, perhaps with the existing "check" command. I 
don't have a very strong objection to creating a new utility, but it seems 
to naturally belong there.

The most obvious implementation would be to completely rewrite each 
object, which would require touching all three files, even though only the 
missing one strictly needs to be written. The alternative would be to write 
a utility which requires a deeper understanding of the RAID file format. You 
might like to consider that as an optimisation for later work, once you have 
the basic RAID recovery working.
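
To make the recovery idea concrete, here is a minimal sketch of generic
three-device RAID 5 reconstruction, where the third piece is the XOR parity
of the other two. It does not reflect the actual on-disk RAID file format
used by Box Backup, which I have not reproduced here; the function names
(xor_blocks, make_parity, recover_missing) and the sample data are purely
hypothetical and for illustration only.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings together."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(stripe1: bytes, stripe2: bytes) -> bytes:
    """Compute the parity block for two data stripes."""
    return xor_blocks(stripe1, stripe2)


def recover_missing(survivor_a: bytes, survivor_b: bytes) -> bytes:
    """Rebuild the one missing block from the two surviving blocks.

    Because parity = stripe1 XOR stripe2, any one of the three blocks
    is the XOR of the other two.
    """
    return xor_blocks(survivor_a, survivor_b)


# Hypothetical sample data standing in for two halves of a stored object.
stripe1 = bytes(range(16))
stripe2 = bytes(range(16, 32))
parity = make_parity(stripe1, stripe2)

# Losing any single block still allows it to be rebuilt.
rebuilt_stripe2 = recover_missing(stripe1, parity)
rebuilt_stripe1 = recover_missing(stripe2, parity)
rebuilt_parity = recover_missing(stripe1, stripe2)
```

A tool that only writes the rebuilt block corresponds to the optimised
approach; the simpler approach reads the object back through the normal
code path and rewrites all three pieces.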

> Any special parts in the code you'd recommend me to dig into before I 
> begin to ask more questions?  And any of the Box Backup developers 
> available on IRC channels?

Sorry, I don't do IRC in general; I simply don't have time for it. However, 
if you wanted a focused introduction to the code, or a Q&A session, at 
a specific time and place, I think I could do that for you.

> Is the source code available via a public SCM URL?  (git, svn, cvs)

It's all in Subversion at https://www.boxbackup.org/svn/box/trunk/.

>>>  The soft-raid solution itself seems to work flawlessly and seems to only
>>>  need this recovery tool.  Or are there any other issues which are not too
>>>  well known with the soft-raid which should make me worried?  Are there
>>>  any critical bugs related to the current implementation?
>>
>>  No, I don't think so. All of our tests actually run in RAID mode, hence
>>  the "more tested" aspect. However it does impose significant performance
>>  limitations which may prevent me from making some optimisations to reduce
>>  disk I/O in future, and the new refcount database will not be mirrored,
>>  but it can be reconstructed by housekeeping in any case, so it's more of a
>>  cache than a database.
>
> That sounds good.  Of course I/O requests and performance are more complex
> when needing to keep control over three streams vs just one.  But this
> optimisation also depends on how cleverly the OS is able to spread the
> tasks.  Of course, I do recognise that if all data is on the same device, an
> OS optimisation should probably be ignored.  It could also be that with some
> syscalls, it's possible to do at least some of this optimisation inside
> Box Backup (usually done by sorting by the inodes of the files being read
> or written, afaik, considering the inodes for all three streams).  But
> in the case of using 3 different devices, the OS is the one which should do
> the optimisation.

The largest problem that I'm aware of is that a RAID file can't be 
modified in place; it has to be completely rewritten. This is needlessly 
intensive on time, disk space and I/O operations, and a good reason to 
consider soft RAID as a candidate for being replaced by a faster 
filesystem in some cases. However, I don't know whether or when I will 
actually do so. Amazon S3 suffers from the same problem (in-place partial 
updates are not possible).

>>  As you have a good use case for it, I am not planning to remove it in
>>  the near future. However I would be interested in thinking about
>>  better ways to implement this, such as at the OS level. I do think it
>>  would be more efficient, not less, to implement this at the block
>>  level in the OS rather than in Box.
>
> Thanks!  This sounds good.  Yeah, it would be possible to move it to
> kernel-space.  But I'm not sure this would gain too much interest, as
> you have dmraid and mdraid in the kernel already (thinking Linux primarily).
> To have a file-based soft-raid in addition might be considered a waste of
> time, and a more difficult case to optimise.  Anyway, I'll try to
> mention it to some kernel fs developers at work.

I wasn't actually proposing adding a file-based RAID system to the kernel, 
although I have considered it in the past as it has many potential 
advantages. But I don't see why it shouldn't be possible to use 
block-level kernel RAID to implement what you are intending to do, without 
requiring any code in Box Backup to support it.

> Another reason not to depend on the OS here is that such a feature might
> be impossible or very difficult to implement in every OS supported by
> Box Backup.

I think we can, and possibly should, delegate the RAID support to the OS, 
where the code is heavily tested, used by many other applications, and 
supports more interesting combinations such as generic block devices (e.g. 
iSCSI and ATAoE) as backends.

>>  I'm also planning to implement S3 client support in Box Backup fairly
>>  soon, and I expect that most users will move to that as it frees them
>>  from the need to ever buy more disks or take their systems offline for
>>  disk upgrades. Unless we can find a good way to support userland RAID
>>  on top of S3, I expect that these code paths will diverge
>>  significantly and you may find that fewer users use libraidfile at
>>  all.
>
> I haven't studied the S3 client in general much, and thus I have no idea
> how the Box Backup implementation for S3 will be.  But if the soft-raid
> code is kept inside Box Backup, it might be easier to set up three remote
> destinations as well, or one local and two remote.
>
> Have you thought about supporting other remote protocols in addition,
> like ssh or webdav?  To me it sounds like you might plan to locate such a
> remote layer in bbstored, so that instead of assigning a local directory
> you assign a remote one - or am I completely wrong?

You could already use ssh/webdav as a backend, probably without problems, 
although it may violate POSIX guarantees and thereby lose some of the 
safety that Box Backup supposedly guarantees by relying on them (e.g. 
atomic rename over existing files to replace them). When I consider 
alternative backends, I'm more thinking about other remote filesystem 
protocols which don't map well onto POSIX semantics. S3 positively sucks 
in that regard, which makes the implementation very challenging :)
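
As a rough illustration of the "atomic rename over existing files" guarantee
mentioned above, the following sketch writes a complete new copy of a file
and then renames it over the old one. It is not taken from the Box Backup
source; atomic_rewrite is a hypothetical helper name, and the sketch assumes
a local POSIX filesystem where rename within one directory is atomic. A
backend that cannot offer this, such as one layered over a remote protocol,
would lose exactly this property.

```python
import os
import tempfile


def atomic_rewrite(path: str, data: bytes) -> None:
    """Replace the file at `path` with `data` without exposing a partial file.

    The new contents are written to a temporary file in the same
    directory (so the rename stays on one filesystem), flushed to disk,
    and then renamed over the original in a single step.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Readers see either the old file or the new one, never a mixture.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Note that this is also why a file that cannot be modified in place costs a
full rewrite for every change: the whole object is written out again before
the rename.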

Cheers, Chris.
-- 
_____ __     _
\  __/ / ,__(_)_  | Chris Wilson <0000 at qwirx.com> - Cambs UK |
/ (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Ruby/Perl/SQL Developer |
\__/_/_/_//_/___/ | We are GNU : free your mind & your software |