Originally Posted By: tacit
On a vaguely tangentially related note, speaking of languages, it appears there's some funkiness in the way the Finder reports names with accented characters in them. If you use the version of PHP that ships with OS X and you write code that accesses files in the local filesystem, PHP appears not to be able to read files whose names contain accented characters, no matter what character encoding you use, even if the file paths and names appear identical in the Finder and PHP. I've never been able to figure out what gives with that.

This may have to do with the fact that HFS+ uses "fully decomposed canonical" characters, stored on disk using UTF-16. PHP is probably using UTF-8, but that's no problem. The conversion between UTF-16 and UTF-8 is mechanical and trivial. The "fully decomposed" part is more interesting.

For example, the letter "á" (an "a" with an acute accent) can be written in Unicode as either the single codepoint 00E1 (LATIN SMALL LETTER A WITH ACUTE), or as the pair of codepoints 0061 0301 (LATIN SMALL LETTER A followed by COMBINING ACUTE ACCENT). HFS+ insists on the latter, even though the printed glyph is exactly the same.
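To make the difference concrete, here is a rough sketch in Python rather than PHP, simply because its standard library has the normalization call built in. The helper name same_name is my own invention, and standard NFD is only an approximation of what HFS+ does (Apple's decomposition rules differ slightly from the Unicode ones), so treat this as an illustration and not a description of the filesystem's exact behavior.

```python
import unicodedata

# Two spellings of the same visible letter.
composed = "\u00e1"      # LATIN SMALL LETTER A WITH ACUTE (one codepoint)
decomposed = "a\u0301"   # LATIN SMALL LETTER A + COMBINING ACUTE ACCENT


def same_name(a, b):
    """Hypothetical helper: compare two names after normalizing both to NFD."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# A plain comparison sees two different strings.
print(composed == decomposed)            # False

# The UTF-8 bytes differ too, so a byte-wise filename match fails.
print(composed.encode("utf-8"))          # b'\xc3\xa1'
print(decomposed.encode("utf-8"))        # b'a\xcc\x81'

# After normalizing both sides to the same form, they match.
print(same_name(composed, decomposed))   # True
```

If a program types or stores the name in the composed form and then compares it byte for byte against what the filesystem hands back, the match fails even though both look identical on screen. Normalizing both sides to one form before comparing avoids that.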

Even this conversion should happen automatically. Perhaps PHP is doing something to prevent the conversion.