[Box Backup] Boxbackup for Win32 and accentuated characters

Pascal Lalonde boxbackup@fluffy.co.uk
Tue, 01 Feb 2005 14:06:38 -0500


Hi,

I've been having some problems with file/directory names containing
accented characters (which are quite common in the French version of
Windows XP).

Here is what happens:
First of all, I must mention that BoxBackup does not descend into
directories whose names contain accented characters. Only the directory
itself is backed up, and anything under it is ignored. I get the
following message in the event viewer for each such directory:
Backup object failed, error when reading L:\\profiles\pascal\Menu
D??marrer
(Replace the two ?? with the ISO8859-1 characters C3 and A9
respectively: capital A with a tilde above it and the copyright symbol.)
It should really be "Menu Démarrer".

Windows XP seems to store file/directory names in UTF-8, so accented
characters are encoded on 2 bytes. For example, the letter é (e with
acute) is encoded in UTF-8 as "C3 A9" in hex. When browsing files in
boxquery, such names show up with different characters, because the
bytes are interpreted using CP850 or something like it (apparently the
default codepage of Windows cmd.exe). Instead of an e acute, you get
two symbols: the first is one of those border symbols used in old DOS
dialog-based apps, and the second is the "Registered" symbol (C3 and A9
in CP850 respectively). It is still possible to restore such files by
restoring the parent directory (which is always backed up, since
otherwise boxbackup would not have descended into it). Now here's the
strangest part. Upon restore, here is what happens:

C3 becomes a capital A with the tilde above it (ISO8859-1's C3
character)
A9 becomes the Copyright symbol.
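
As a sanity check on the byte values above, here is a minimal Python
sketch (an illustration only, not Box Backup code). It decodes the same
two bytes under the three encodings mentioned so far. The helper name
readings() is made up for this example, and CP850 is only my guess at
the codepage cmd.exe uses.

```python
import unicodedata

# The two bytes that UTF-8 uses for "e with acute" (U+00E9): C3 A9.
RAW = "\u00e9".encode("utf-8")


def readings(raw: bytes) -> dict:
    """Decode the same bytes under each encoding and name the characters."""
    result = {}
    for codec in ("utf-8", "cp850", "iso8859-1"):
        text = raw.decode(codec)
        result[codec] = [unicodedata.name(ch) for ch in text]
    return result


if __name__ == "__main__":
    print(RAW.hex())
    for codec, names in readings(RAW).items():
        print(codec, names)
```

The utf-8 reading gives a single character (the intended e acute),
while the cp850 and iso8859-1 readings each give two, matching what
boxquery displays and what the restored directory name contains.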

But if Windows stores filenames in UTF-8, this means that the individual
bytes were first interpreted as ISO8859-1 characters, then translated to
their UTF-8 equivalent. In fact, if you let Boxbackup take a backup of
your restored folder (now containing A-tilde and Copyright), these two
characters take 2 bytes each in the UTF-8 encoding. Upon restoring one
more time, the new folder now has 4 special characters instead of the
"é" in the first version of the folder.
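
The growth described above can be modelled with a small Python sketch.
This is only my assumed model of the bug (name stored as UTF-8, bytes
read back as strict ISO8859-1), not what the Box Backup code actually
does, and the function name is hypothetical. With a Windows codepage
such as CP1252 the exact characters could differ, but the count would
be the same.

```python
def backup_restore_round_trip(name: str) -> str:
    """Assumed model: encode the name as UTF-8, then misread each byte
    as one ISO8859-1 character."""
    return name.encode("utf-8").decode("iso8859-1")


original = "\u00e9"                         # e with acute, 1 character
once = backup_restore_round_trip(original)  # A-tilde + copyright sign
twice = backup_restore_round_trip(once)     # 4 characters

if __name__ == "__main__":
    for label, value in (("original", original), ("once", once), ("twice", twice)):
        # ascii() keeps the output readable on any console codepage.
        print(label, len(value), ascii(value))
```

Each round trip turns every non-ASCII character into two, so the name
keeps growing for as long as the restored folder is backed up and
restored again.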

Rereading this e-mail, I find it a little confusing. I think the best
way to understand it is to try it yourself. Just create a folder with an
accented character in its name (é for example), and let boxbackup back
it up. Then restore it. The results should be:

1) Files with accented characters are OK
2) Directories are restored with special characters instead of the
original accented character
3) Nothing below the accented directory is backed up
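
To check a restored tree for result 2, something like the following
Python sketch could flag suspicious names. It is a heuristic under the
same assumption as before (UTF-8 bytes misread as ISO8859-1), and both
function names are made up for this example. A name that legitimately
contains A-tilde followed by the copyright sign would be a false
positive.

```python
import os


def repair_once(name: str) -> str:
    """Undo one round of 'UTF-8 read as ISO8859-1' if the name fits
    that pattern; otherwise return the name unchanged."""
    try:
        return name.encode("iso8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return name


def find_suspect_names(root: str):
    """Yield (path, suggested_name) for entries that look mis-decoded."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            fixed = repair_once(name)
            if fixed != name:
                yield os.path.join(dirpath, name), fixed


if __name__ == "__main__":
    for path, suggestion in find_suspect_names("."):
        print(ascii(path), "->", ascii(suggestion))
```

A correctly restored "Menu Démarrer" passes through unchanged, because
a lone E9 byte is not valid UTF-8, while the mangled form with A-tilde
and the copyright sign is mapped back to the original name.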

Could anyone confirm this behavior?

I'm using boxwin0.09b on XP Pro (French) with a 0.09 server on OpenBSD
3.4.

Thanks,
Pascal