[Box Backup] More serious Win32 0.9f/RedHat 0.9 trouble...

Nick Knight boxbackup@fluffy.co.uk
Mon, 11 Apr 2005 12:11:09 +0100


Is this repeatable, i.e. can you set up a store using the same
directory structure and see it happen again?

It looks as though I will need help in debugging it, i.e. a machine I
can run a debugger on. Better still, could someone send me some
reproducible data to test with?
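
One way to produce shareable, reproducible test data is to generate it from
a fixed seed, so that both sides can recreate the same file and compare
checksums instead of transferring it. The sketch below is only an
illustration: the function name, the seed, and the 8 MiB size are
hypothetical, and it assumes (going by the large-file report quoted below)
that file size is what triggers the failure.

```python
import hashlib
import os
import random
import tempfile


def write_test_file(path, size_bytes, seed=0, chunk=1 << 20):
    """Write size_bytes of seeded pseudo-random data; return its SHA-256."""
    rng = random.Random(seed)          # same seed -> same byte sequence
    digest = hashlib.sha256()
    remaining = size_bytes
    with open(path, "wb") as fh:
        while remaining > 0:
            n = min(chunk, remaining)
            # Draw n bytes' worth of random bits and serialise them.
            block = rng.getrandbits(8 * n).to_bytes(n, "little")
            fh.write(block)
            digest.update(block)
            remaining -= n
    return digest.hexdigest()


if __name__ == "__main__":
    # Hypothetical size and location, chosen only for the example.
    target = os.path.join(tempfile.mkdtemp(), "bigfile.bin")
    checksum = write_test_file(target, 8 * 1024 * 1024, seed=1)
    print(target, os.path.getsize(target), checksum)
```

Whoever receives the seed and size can rerun the script and compare the
printed checksum before pointing bbackupd at the directory.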

-----Original Message-----
From: boxbackup-admin@fluffy.co.uk [mailto:boxbackup-admin@fluffy.co.uk]
On Behalf Of Per Thomsen
Sent: 11 April 2005 09:59
To: boxbackup@fluffy.co.uk
Subject: Re: [Box Backup] More serious Win32 0.9f/RedHat 0.9 trouble...

On 4/7/05 2:32 AM, Ben Summers wrote:

>
> On 7 Apr 2005, at 10:18, Gary wrote:
>
>> Hi everyone,
>>
>> Unfortunately, the problem of backing up large files from Win32 (WinXP
>> SP2) bbackupd to RedHat 9.0 bbstored is back again.
>>
>> Client side log:
>>
> [ snip ]
>
> I am suspicious of the Win32 port, to be honest. The store checking is
> quite comprehensive, and will definitely get it into a state which is
> working. Maybe it's not coping well with something. It would probably
> help Nick if you could create a minimal environment where it goes
> wrong.
>
> I'm afraid that at the moment, I can only really help if it's run on
> UNIX.

A couple of comments about the Win32 client. I'm definitely suspicious
of it as well. I'm running several Unix clients (Linux FC2), and I have
no problems with them.

I'm having the exact same problem described above with the 0.9f Win32
client. It seems to have to do with the size of the stream that's sent
down to the client from the bbstored server. The server is sending
directory information to the client, and every time this directory is
sent, it fails on this client. The size of the stream sent from the
server to the client before the client disconnects (7/34 on the server)
is 3744380, which is several orders of magnitude larger than any other
stream sent by the server.
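
To put the size gap in numbers, here is a small calculation using only the
stream sizes that appear in the server log later in this message. Treating
the sizes as byte counts is an assumption; the log does not state units.

```python
import math

# Sizes copied from the "Sending stream, size N" lines in the server log.
failing = 3744380
others = [665, 3481, 769]

largest_other = max(others)
ratio = failing / largest_other

print(f"ratio to next largest stream: {ratio:.0f}x")
print(f"orders of magnitude: {math.log10(ratio):.1f}")
```

Against the streams shown in the snippet, the failing one is roughly a
thousand times larger, i.e. about three orders of magnitude.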

I run 5 Windows clients, and just one of them is displaying this
behavior a lot of the time. For 6-8 hour stretches, the client will
log in over and over again (because of the failure, 7/34 on the
server) and scan the local disk. Then, suddenly, it will work for a
while (6-72 hours). Note that only one of the 5 clients has this
problem.

When the connection works, the object that appears to be the problem is
not sent from the server, and so things work.

Here is a log snippet (from the server) to illustrate:

A bad run:
Apr 10 04:27:45 planck bbstored[8863]: Receive ListDirectory(0x72a2,0xffffffff,0xc,true)
Apr 10 04:27:45 planck bbstored[8863]: Receive ListDirectory(0x72a2,0xffffffff,0xc,true)
Apr 10 04:27:45 planck bbstored[8863]: Send Success(0x72a2)
Apr 10 04:27:45 planck bbstored[8863]: Send Success(0x72a2)
Apr 10 04:27:45 planck bbstored[8863]: Sending stream, size 665
Apr 10 04:27:45 planck bbstored[8863]: Receive ListDirectory(0x72ab,0xffffffff,0xc,true)
Apr 10 04:27:45 planck bbstored[8863]: Receive ListDirectory(0x72ab,0xffffffff,0xc,true)
Apr 10 04:27:45 planck bbstored[8863]: Send Success(0x72ab)
Apr 10 04:27:45 planck bbstored[8863]: Send Success(0x72ab)
Apr 10 04:27:45 planck bbstored[8863]: Sending stream, size 3481
Apr 10 04:27:45 planck bbstored[8863]: Receive GetBlockIndexByName(0x72ab,OPAQUE)
Apr 10 04:27:45 planck bbstored[8863]: Receive GetBlockIndexByName(0x72ab,OPAQUE)
Apr 10 04:27:45 planck bbstored[8863]: Send Success(0xe442)
Apr 10 04:27:45 planck bbstored[8863]: Send Success(0xe442)
Apr 10 04:27:45 planck bbstored[8863]: Sending stream, size 3744380
Apr 10 04:39:51 planck bbstored[8863]: in server child, exception Connection TLSReadFailed (Probably a network issue between client and server.) (7/34) -- terminating child


A successful run:
Apr 10 06:51:02 planck bbstored[17033]: Receive ListDirectory(0x72a2,0xffffffff,0xc,true)
Apr 10 06:51:02 planck bbstored[17033]: Receive ListDirectory(0x72a2,0xffffffff,0xc,true)
Apr 10 06:51:02 planck bbstored[17033]: Send Success(0x72a2)
Apr 10 06:51:02 planck bbstored[17033]: Send Success(0x72a2)
Apr 10 06:51:02 planck bbstored[17033]: Sending stream, size 665
Apr 10 06:51:02 planck bbstored[17033]: Receive ListDirectory(0x72ab,0xffffffff,0xc,true)
Apr 10 06:51:02 planck bbstored[17033]: Receive ListDirectory(0x72ab,0xffffffff,0xc,true)
Apr 10 06:51:02 planck bbstored[17033]: Send Success(0x72ab)
Apr 10 06:51:02 planck bbstored[17033]: Send Success(0x72ab)
Apr 10 06:51:02 planck bbstored[17033]: Sending stream, size 3481
Apr 10 06:51:02 planck bbstored[17033]: Receive ListDirectory(0x7316,0xffffffff,0xc,true)
Apr 10 06:51:02 planck bbstored[17033]: Receive ListDirectory(0x7316,0xffffffff,0xc,true)
Apr 10 06:51:02 planck bbstored[17033]: Send Success(0x7316)
Apr 10 06:51:02 planck bbstored[17033]: Send Success(0x7316)
Apr 10 06:51:02 planck bbstored[17033]: Sending stream, size 769
Apr 10 06:51:02 planck bbstored[17033]: Receive ListDirectory(0x731c,0xffffffff,0xc,true)
Apr 10 06:51:02 planck bbstored[17033]: Receive ListDirectory(0x731c,0xffffffff,0xc,true)
Apr 10 06:51:02 planck bbstored[17033]: Send Success(0x731c)
Apr 10 06:51:02 planck bbstored[17033]: Send Success(0x731c)
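
For anyone wanting to check their own server logs for the same pattern, the
following is a minimal sketch that reports, for each bbstored child that
terminated with an exception, the size of the last stream it sent. The
regular expressions are assumptions derived only from the log lines quoted
above, the function name is hypothetical, and each log entry is expected on
a single, unwrapped line.

```python
import re

# Patterns inferred from the quoted log lines; not taken from Box Backup itself.
STREAM_RE = re.compile(r"bbstored\[(\d+)\]: Sending stream, size (\d+)")
FAIL_RE = re.compile(r"bbstored\[(\d+)\]: in server child, exception (.*)")


def last_stream_before_failure(lines):
    """Return (pid, last_stream_size, exception_text) per failed child."""
    last_size = {}   # pid -> size of the most recent stream sent by that child
    failures = []
    for line in lines:
        m = STREAM_RE.search(line)
        if m:
            last_size[m.group(1)] = int(m.group(2))
            continue
        m = FAIL_RE.search(line)
        if m:
            pid = m.group(1)
            failures.append((pid, last_size.get(pid), m.group(2).strip()))
    return failures


sample = [
    "Apr 10 04:27:45 planck bbstored[8863]: Sending stream, size 665",
    "Apr 10 04:27:45 planck bbstored[8863]: Sending stream, size 3481",
    "Apr 10 04:27:45 planck bbstored[8863]: Sending stream, size 3744380",
    "Apr 10 04:39:51 planck bbstored[8863]: in server child, exception "
    "Connection TLSReadFailed (Probably a network issue between client and "
    "server.) (7/34) -- terminating child",
    "Apr 10 06:51:02 planck bbstored[17033]: Sending stream, size 769",
]

for pid, size, exc in last_stream_before_failure(sample):
    print(pid, size, exc)
```

Run over a full log, this would show whether every failure follows an
unusually large stream or whether small streams fail as well.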



Any thoughts would be very welcome.

Thanks,
Per

-- 
Per Reedtz Thomsen | Reedtz Consulting, LLC | F: 209 883 4119
V: 209 883 4102    |   pthomsen@reedtz.com  | C: 209 996 9561
GPG ID: 1209784F   |  Yahoo! Chat: pthomsen | AIM: pthomsen

_______________________________________________
boxbackup mailing list
boxbackup@fluffy.co.uk
http://lists.warhead.org.uk/mailman/listinfo/boxbackup