[Box Backup-dev] Re: [Box Backup] Danish and other special chars

Chris Wilson boxbackup-dev@fluffy.co.uk
Sat, 4 Feb 2006 18:14:46 +0000 (GMT)


Hi Gary,

On Sat, 4 Feb 2006, Gary wrote:

> Hmmmm, I thought UTF-8 was the "standard" encoding used on the server 
> side. If not, where does the UTF-8 come from in the first place?

It comes from the win32 client, which encodes the filename as UTF-8 before 
sending the file to the server.

The other clients do things rather differently, but I believe, and please 
correct me if I'm wrong, that readdir() and friends return the filename as 
an array of bytes, with no encoding information attached. The encoding 
used to store international characters in these byte arrays is probably 
system-dependent. I believe that older Linux systems would use the 
encoding of the configured locale, while newer ones (e.g. Red Hat releases 
after 9.0) use UTF-8. I have no idea about other Unix systems.
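
Since the name arrives as raw bytes, about the best a client can do 
without extra information is check whether those bytes happen to be 
well-formed UTF-8. Here is a minimal sketch of such a check. The function 
name is_valid_utf8 is hypothetical, not something in Box Backup, and a 
name that passes could still have been written in some other encoding:

```c
#include <stddef.h>

/* Hypothetical helper (not part of Box Backup): returns 1 if the len
 * bytes at s are well-formed UTF-8, 0 otherwise. */
int is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t extra;        /* number of continuation bytes expected */
        unsigned long cp;    /* code point being decoded */
        unsigned long min;   /* smallest code point allowed at this length */
        size_t k;

        if (c < 0x80) {      /* plain ASCII byte */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            extra = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;        /* stray continuation byte or invalid lead */
        }

        if (len - i - 1 < extra)
            return 0;        /* sequence is cut off at the end */

        for (k = 1; k <= extra; k++) {
            unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80)
                return 0;    /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min)
            return 0;        /* overlong encoding */
        if (cp > 0x10FFFF)
            return 0;        /* beyond the Unicode range */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;        /* UTF-16 surrogate, not allowed in UTF-8 */

        i += extra + 1;
    }
    return 1;
}
```

A client could call this on the d_name field from readdir() and treat a 
failure as a hint that the name is in a legacy locale encoding.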

> I think the only way to solve this once and for all would be to pick a 
> standard (say, UTF-8), and have each client convert filenames to/from 
> this standard before transmitting up/down to/from the server.

That certainly seems like the way to go to me, but without a standard 
Unicode conversion library that is available on all Unix platforms, I 
think it would be very difficult to achieve there. Nevertheless, we could 
just assume that each Unix must implement full UTF-8 support in filenames 
and consoles, or suffer the consequences :-)
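
To give an idea of what the client-side conversion involves in the 
simplest case, here is a sketch that converts a filename from ISO-8859-1 
to UTF-8 by hand. It assumes the local encoding is known to be 
ISO-8859-1, which is exactly the piece of information we usually do not 
have, and the function name latin1_to_utf8 is made up for illustration. 
Any other local encoding would need a real conversion library (iconv(), 
where the platform provides it):

```c
#include <stddef.h>

/* Hypothetical helper (not part of Box Backup): converts len bytes of
 * ISO-8859-1 text at in to NUL-terminated UTF-8 in out.
 * Returns the number of bytes written, not counting the NUL, or
 * (size_t)-1 if out_size is too small. */
size_t latin1_to_utf8(const unsigned char *in, size_t len,
                      unsigned char *out, size_t out_size)
{
    size_t o = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        unsigned char c = in[i];

        if (c < 0x80) {
            /* ASCII maps to itself; keep one byte free for the NUL */
            if (out_size - o < 2)
                return (size_t)-1;
            out[o++] = c;
        } else {
            /* 0x80..0xFF become a two-byte UTF-8 sequence */
            if (out_size - o < 3)
                return (size_t)-1;
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }

    if (o >= out_size)
        return (size_t)-1;   /* no room for the terminator */
    out[o] = '\0';
    return o;
}
```

The output buffer never needs more than 2 * len + 1 bytes, because each 
ISO-8859-1 byte expands to at most two UTF-8 bytes.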

> If I understand this function correctly, the call will not only 
> translate from UTF-8 to a local encoding, but also translate from 
> wide-char to multi-byte (MBCS), ready to go for printf().

Since when is MBCS ready for printf()? I thought printf() could only take 
an ASCII byte array. Perhaps there is a Unicode equivalent, like 
wprintf(), that we can use on Windows?
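
For what it's worth, as far as I know the printf() family copies the 
bytes of a %s argument to the output unchanged, so the open question is 
what the console makes of them rather than what printf() accepts. A 
small sketch to illustrate, using snprintf() so the result can be 
inspected (the helper name format_name is hypothetical):

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: formats a filename, given as a NUL-terminated
 * byte string in any encoding, into buf. Returns what snprintf()
 * returns: the number of bytes the full output needs, not counting
 * the terminating NUL. */
int format_name(char *buf, size_t size, const char *name)
{
    /* %s copies bytes up to the NUL and does not interpret them */
    return snprintf(buf, size, "file: %s", name);
}
```

Passing a UTF-8 name gives back the same UTF-8 bytes after the "file: " 
prefix. Whether they show up as the right characters then depends on the 
encoding the terminal or console expects.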

Cheers, Chris.
-- 
_ ___ __     _
  / __/ / ,__(_)_  | Chris Wilson <0000 at qwirx.com> - Cambs UK |
/ (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Perl/SQL/HTML Developer |
\ _/_/_/_//_/___/ | We are GNU-free your mind-and your software |