[Box Backup] Restoring from a hardware failure

Per Thomsen boxbackup@fluffy.co.uk
Mon, 25 Jul 2005 22:52:45 -0700


On 7/22/05 2:01 AM, Dennis Speekenbrink wrote:

> Per Thomsen wrote:
>
>> All,
>> I'm using a hardware raid card on my bbstored server, and I had a 
>> drive failure the other day. Switched out the drive, and everything 
>> seemed to work fine.
>>
>> However, one of the clients was unable to back up. I found the 
>> following in one of the backupdirs under that account (output of 'ls 
>> -li'):
>>
> Going for a hardware RAID setup myself for my new server, this worries 
> me.
> Could you please offer a few more details on the original setup and 
> what seemed to cause the crash.

Here's my configuration:

Dell GX400 with 1 GB RAM (1.4 GHz P4) and a 3ware 8506-4LP card. The 4 
backup drives are configured as two 200 GB RAID 1 arrays. The individual 
drives are 200 GB SATA Western Digital drives.

Here's my conjecture on what happened:
A drive lost its connection (the SATA cable had somehow worked loose, and 
eventually fell out). I found the problem, rebooted, and started a rebuild 
of the drive. There were lots of problems accessing the drive during the 
boot process, so I called 3ware. They suggested removing the 'faulty' 
drive and replacing it with a fresh one. I did that, and things seemed 
hunky-dory, only to find that this one user was having the problem with 
these files on the rebuilt drive. Something is wrong with the superblock 
on this drive now. Since the file system can't even see that there's a 
problem, it seems more like a filesystem problem to me than necessarily 
a hardware RAID problem, but I am by no means a filesystem guru...

>
> It would seem to me that any hardware RAID setup (except 0:striping) 
> is _invented_ to prevent these issues from arising, no?

Yup. I'm a bit frustrated too.

Just another observation: I don't think that my next Box Backup server 
will use hardware RAID. At least in my setup (remote backups over a T1 
line from DSL or wireless connections), the network bandwidth will be the 
slowest link, so disk I/O shouldn't become the bottleneck with a software 
RAID solution.
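
To put rough numbers on that, here is a back-of-the-envelope sketch in 
Python. The T1 rate is the nominal 1.544 Mbit/s; the disk throughput is 
purely an assumed figure for illustration, not something measured on this 
server.

```python
# Nominal T1 line rate in bits per second.
T1_BITS_PER_SECOND = 1_544_000

# ASSUMPTION: illustrative sustained disk throughput (20 MB/s).
# This was not measured on the server described in this message.
ASSUMED_DISK_BYTES_PER_SECOND = 20 * 1_000_000


def link_bytes_per_second(bits_per_second):
    """Convert a line rate in bits/s to bytes/s, ignoring protocol overhead."""
    return bits_per_second / 8


t1 = link_bytes_per_second(T1_BITS_PER_SECOND)
ratio = ASSUMED_DISK_BYTES_PER_SECOND / t1
print(f"T1: {t1 / 1000:.0f} kB/s, assumed disk is about {ratio:.0f}x faster")
```

Even if the assumed disk figure is off by a wide margin, the link stays 
far slower than the disks.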

>
> Thanks for any insights.
>
> For your problem, it would seem to me that removing (or moving; try 
> never to destroy data unless you're sure you won't need it anymore) 
> the directory would cause the client to re-send all original data to 
> the server. But like Ben posted, let bbstoreaccounts do its thing 
> before re-allowing the client's connection.

This worked fine. There were no problems after I moved the directory, 
and let bbstoreaccounts fix the inconsistencies.
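
For anyone wanting to script the "move, don't delete" step, here is a 
minimal sketch in Python. The function name move_aside and both paths are 
hypothetical; nothing here is part of Box Backup, and it only covers 
setting the directory aside, not the bbstoreaccounts repair that follows.

```python
import shutil
import time
from pathlib import Path


def move_aside(suspect_dir, quarantine_root):
    """Move suspect_dir under quarantine_root instead of deleting it.

    Returns the new location. Both arguments are hypothetical example
    paths chosen by the caller, not paths that Box Backup defines.
    """
    suspect = Path(suspect_dir)
    if not suspect.is_dir():
        raise FileNotFoundError(f"not a directory: {suspect}")

    quarantine = Path(quarantine_root)
    quarantine.mkdir(parents=True, exist_ok=True)

    # A timestamp suffix keeps repeated moves from colliding.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = quarantine / f"{suspect.name}.{stamp}"

    # Refuse to overwrite an earlier quarantined copy.
    if target.exists():
        raise FileExistsError(f"already exists: {target}")

    shutil.move(str(suspect), str(target))
    return target
```

The quarantined copy can be deleted once the client has re-sent its data 
and the backup is known to be good.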

> The only way I see this going wrong is if the client loses data now, 
> before it has been re-sent to the server.

That's right, but the backup is complete and up-to-date now.

Thanks,
Per


-- 
Per Reedtz Thomsen | Reedtz Consulting, LLC | F: 209 883 4119
V: 209 883 4102    |   pthomsen@reedtz.com  | C: 209 996 9561
GPG ID: 1209784F   |  Yahoo! Chat: pthomsen | AIM: pthomsen