[Box Backup] Boxbackup for Win32 and accentuated characters
Nick Knight
boxbackup@fluffy.co.uk
Thu, 3 Feb 2005 09:28:49 -0000
Hello Pascal,
The win32 client should be able to cope with Unicode filenames. I
suspect there may be issues with one of my functions: I only ever tested
it with file names. Directory names didn't cross my mind, so I never
tested them, and they are handled by different functions. I will
look into it...
Nick
-----Original Message-----
From: boxbackup-admin@fluffy.co.uk [mailto:boxbackup-admin@fluffy.co.uk]
On Behalf Of Pascal Lalonde
Sent: 01 February 2005 19:07
To: boxbackup@fluffy.co.uk
Subject: [Box Backup] Boxbackup for Win32 and accentuated characters
Hi,
I've been having some problems with file/directory names containing
accented characters (which are quite common in the French version of
Windows XP).
Here is what happens:
First of all I must mention that BoxBackup does not descend into
directories whose names contain accented characters. Only the directory
itself is backed up, and anything under it is ignored. I get the following
message in the event viewer for each such directory:
Backup object failed, error when reading L:\\profiles\pascal\Menu
D??marrer
(Substitute the two ?? with the ISO8859-1 characters C3 and A9 respectively:
capital A with a tilde above it and the copyright symbol)
It should really be "Menu Démarrer".
Windows XP seems to store file/directory names in UTF-8. Thus,
accented characters are encoded on 2 bytes. For example, the letter é
(e with acute) in UTF-8 is encoded as "C3 A9" in hex. When browsing
files in boxquery, such a name shows up with different characters, as
the bytes are interpreted using CP850 or something like that (the default
codepage of Windows cmd.exe, it seems). Instead of an e acute, you get
two symbols: the first is one of those border symbols used in old DOS
dialog-based apps, and the second is the "Registered" symbol (C3 and A9
in CP850 respectively). It is still possible, though, to restore such
files by restoring the parent directory (which is always the case, since
otherwise boxbackup would not descend into it). Now here's the strangest
part. Upon restore, here is what happens:
C3 becomes a capital A with the tilde above it (ISO8859-1's C3
character)
A9 becomes the Copyright symbol.
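
The byte-level effects described in this message can be reproduced outside
Box Backup with a few lines of Python. This is only a sketch under the
assumption made here: that the name is held as UTF-8 bytes and that the
restore path misreads each byte as one ISO8859-1 character. The helper names
misread_as_latin1 and code_points are made up for illustration and are not
part of Box Backup.

```python
def misread_as_latin1(name: str) -> str:
    """Encode a name as UTF-8, then misread each byte as one ISO8859-1 character."""
    return name.encode("utf-8").decode("latin-1")


def code_points(text: str) -> list:
    """Return the characters of text as U+XXXX strings (safe to print anywhere)."""
    return [f"U+{ord(ch):04X}" for ch in text]


original = "\u00e9"                        # e with acute
utf8_bytes = original.encode("utf-8")      # the two bytes C3 A9

# What a CP850 console shows for those two bytes:
# a box-drawing character followed by the registered sign.
console_view = utf8_bytes.decode("cp850")

# What the restored name contains under the assumption above:
# A with tilde followed by the copyright sign.
restored_once = misread_as_latin1(original)

# Backing up and restoring the damaged name once more (same assumption):
# each of the two characters now takes two bytes, giving four characters.
restored_twice = misread_as_latin1(restored_once)

print(" ".join(f"{b:02X}" for b in utf8_bytes))   # C3 A9
print(code_points(console_view))                  # ['U+251C', 'U+00AE']
print(code_points(restored_once))                 # ['U+00C3', 'U+00A9']
print(len(restored_twice))                        # 4
```

If the assumption holds, every further backup/restore cycle doubles the
number of characters standing in for the original accented letter.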
But if Windows stores filenames in UTF-8, this means that the individual
bytes were first interpreted as ISO8859-1 characters, then translated to
their UTF-8 equivalent. In fact, if you let Boxbackup take a backup of
your restored folder (now containing A-tilde and Copyright), these two
characters take 2 bytes each in the UTF-8 encoding. Upon restoring one
more time, the new folder now has 4 special characters instead of the
"é" in the first version of the folder.
Rereading this e-mail, I find it a little confusing. I think
the best way would be to try it yourself. Just create a folder with an
accented character in its name (é for example), and let boxbackup back it
up. Then restore it. The results should be:
1) Files with accented characters in their names are OK
2) Directories are restored with special characters instead of the
original accented character
3) Nothing below the accented directory is backed up
Could anyone confirm this behavior?
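
For anyone who wants to try, a small Python sketch like the following could
create a suitable test tree. The function name make_test_tree and the file
names are hypothetical; the resulting directory still has to be added to the
backup locations by hand.

```python
import tempfile
from pathlib import Path


def make_test_tree(root: Path) -> Path:
    """Create an accented directory with a file below it, plus an accented file."""
    accented_dir = root / "Menu D\u00e9marrer"
    accented_dir.mkdir(parents=True, exist_ok=True)
    # A file below the accented directory: the case reported as not backed up.
    (accented_dir / "below.txt").write_text(
        "file below the accented directory\n", encoding="utf-8"
    )
    # A file whose own name is accented: the case reported as working.
    (root / "accentu\u00e9.txt").write_text(
        "file with an accented name\n", encoding="utf-8"
    )
    return accented_dir


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp(prefix="boxbackup-accent-test-"))
    make_test_tree(root)
    print("Test tree created under", root)
```

After a backup and a restore, comparing the restored tree with this one
should show whether the three results above are reproduced.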
I'm using boxwin0.09b on XP Pro (French) with a 0.09 server on OpenBSD
3.4.
Thanks,
Pascal