[Box Backup] Problems with cygwin (was: Change in network usage characteristics in 0.08?)

Ben Summers boxbackup@fluffy.co.uk
Mon, 29 Nov 2004 10:33:35 +0000


On 28 Nov 2004, at 23:40, Per Thomsen wrote:

> On 11/27/04 9:10 AM, Ben Summers wrote:
>
>>
>> There were no changes in protocol behaviour in 0.08.
>>
>> If you use lazy mode, the daemon is designed to send a trickle of
>> backups throughout the day. So I would expect a connection to the
>> server for quite a lot of the day if the clients have extensive data
>> modifications to upload.
>
> Agreed. However, I'm seeing this trickle of data 24x7. No let-up. I 
> know that the folks that are using the machine aren't changing files 
> at 2 in the morning. They go home at 5pm, so the latest I should see 
> is about 11pm or midnight for any substantial data transfer.

I agree that something is not quite right. I'm suspicious of the cygwin 
client.

>
>>
>> Regarding the asymmetry over the data transfer, it does seem a little
>> odd. Obviously you're confident that the measuring system is correct,
>> so turn on ExtendedLogging and take a look at the logs and see exactly
>> what is being sent.
>
> I'm using MRTG, which is a robust bandwidth monitoring tool.

Yes, that's probably reliable.

>
>>
>> If a directory is modified on the client, it will request a list of
>> that directory from the server until it has uploaded the files. This
>> might mean it's requested once an hour with the default settings, for
>> up to six hours. Might this be it?
>>
>> Or if the client is a fileserver, and one of its clients has a clock
>> which is wildly out of sync, it might be downloading the directories
>> all the time. Check the client logs for warnings about huge offsets.
>> Note that, say, a 12 hour offset won't provoke the warning, but
>> continual filesystem write activity by that fileserver client will
>> still result in more queries to the server.
>>
>> But as usual, a look at the logs will give more clarity over what's  
>> happening.
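
One rough way to look for the out-of-sync clock case from the fileserver side is to scan the backed-up tree for files whose modification times lie in the future. This is only a sketch and is not how bbackupd itself detects offsets; the function name, the one hour threshold and the optional `now` parameter are all assumptions made for illustration.

```python
import os
import time

def files_with_skewed_mtime(root, max_offset_seconds=3600, now=None):
    """List (path, offset) for files dated more than max_offset_seconds ahead of now.

    Hypothetical helper: the threshold is arbitrary and unrelated to
    whatever limit the client uses for its own offset warning.
    """
    now = time.time() if now is None else now
    skewed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            offset = mtime - now
            if offset > max_offset_seconds:
                skewed.append((path, offset))
    return skewed
```

Files that show up here repeatedly would point at a machine writing to the fileserver with its clock set ahead. A clock that is set behind will not show up this way, since old modification times are normal.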
>
> I looked at the logs, and here's what I've found: The two wayward 
> clients are connecting every 5 minutes and 10 seconds (+/- 3 seconds), 
> rather than every hour as their config files dictate.

I can't think of any reason for that behaviour.
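
To confirm the interval from the logs rather than by eye, something like the following could be used. It is a minimal sketch: the timestamp format and the sample times are made up for illustration, and it assumes the connection start times have already been pulled out of the server log by some other means.

```python
from datetime import datetime
from statistics import mean

def connection_intervals(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Return the gaps in seconds between consecutive connection times."""
    times = sorted(datetime.strptime(t, fmt) for t in timestamps)
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]

# Hypothetical connection start times for one client (not real log data).
sample = [
    "2004-11-28 02:00:00",
    "2004-11-28 02:05:10",
    "2004-11-28 02:10:22",
    "2004-11-28 02:15:29",
]

gaps = connection_intervals(sample)
print("gaps (s):", gaps)
print("mean gap (s):", round(mean(gaps), 1))
```

Gaps clustered around 310 seconds rather than 3600 would match what you describe, and comparing the two clients against a well-behaved one should show whether it is specific to cygwin.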

>
> Your explanation about the higher transmission rates makes sense, if 
> the clients are constantly requesting the directory information from 
> the server. The connection takes between 2 and 4 minutes, so there is 
> always at least one client connected to the server. Hence the constant 
> bandwidth drain.
>
> I won't have access to the client machines until tomorrow (they are at 
> 2 different companies that won't be open until Monday), but I will 
> look at the Windows event logs for them to see if I can glean any more 
> information from them.
>
> My suspicion is that one of two things is happening:
> 1. There is a bug in the 0.08 cygwin client, which causes this 
> behavior. My bandwidth use changed after I installed the 0.08 cygwin 
> client. I installed the 0.08h win32 client on a Windows machine here, 
> and observed no problems like what I'm seeing with the cygwin client. 
> The cygwin client on this machine had done the 'every-5-minute' thing 
> before that as well.

I have never run the cygwin client. I'm afraid I can't really help on 
this. Looking at the logs may show something interesting, and perhaps 
someone else who does run it can help?


>
> 2. The new version of the client somehow has permission problems with
> the /var/bbackupd directory. I will be checking into that tomorrow. If 
> bbackupd is unable to touch the 'last_sync_start' file, I can see how 
> this can happen. last_sync_start never gets a more recent timestamp, 
> and bbackupd will continually try to touch the file, and re-connect.

These timestamps are for information only. The client does not use them 
for timing information. This won't be the problem.

>
> I'm going to be switching to the Win32 clients anyway, so this may be 
> a moot point for me (if it is cygwin-related), but I wanted to make 
> sure that something wasn't being missed.

I would like to know why it's doing this, in case it's a problem with 
some of the non-cygwin code.

Ben