Slashdot Mirror


Changing Your Filesystem's Locale?

dybdahl asks: "Now that Red Hat has changed the default character set to UTF-8, none of the existing filenames that include local characters like æ, ø, and å (Denmark) are handled correctly by Konqueror or displayed correctly by "ls" in a shell. Is there a tool out there that can convert an ISO-8859-1 ext3 filesystem to UTF-8?"

1 of 15 comments

  1. What they should do by spitzak · Score: 2, Insightful
    Forget all this nonsense about "locales". It is obvious there are exactly 2 "locales" of interest, UTF-8 and ISO-8859-1. Surprisingly enough, these can coexist almost perfectly, so there can be *one* "locale" and we can be rid of all these brain-dead attempts at i18n.

    What systems should do is treat all streams of bytes as UTF-8, with the additional rule that any sequence of bytes that is not legal UTF-8 (including a Unicode value encoded with more bytes than necessary) should be treated as individual bytes in ISO-8859-1. It turns out that you need three accented characters in a row, or a capitalized accented character followed by a foreign punctuation mark, for ISO-8859-1 text to be confused with UTF-8.
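
    The rule above can be sketched in Python as a custom decode error handler. This is only an illustration under stated assumptions: the names `latin1_fallback` and `decode_lenient` are made up for this example, and it relies on Python's strict UTF-8 decoder already rejecting overlong encodings.

```python
import codecs


def _latin1_fallback(err):
    # Called by the decoder for each run of bytes that is not legal UTF-8.
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    # Reinterpret the offending bytes one-for-one as ISO-8859-1,
    # then resume UTF-8 decoding right after them.
    return bad.decode("iso-8859-1"), err.end


# "latin1_fallback" is a hypothetical handler name chosen for this sketch.
codecs.register_error("latin1_fallback", _latin1_fallback)


def decode_lenient(raw: bytes) -> str:
    """Decode as UTF-8, falling back to ISO-8859-1 for illegal bytes."""
    return raw.decode("utf-8", errors="latin1_fallback")


if __name__ == "__main__":
    # An old ISO-8859-1 filename and a new UTF-8 one read the same.
    print(decode_lenient(b"bl\xe5b\xe6r.txt"))
    print(decode_lenient("bl\u00e5b\u00e6r.txt".encode("utf-8")))
```

    With this handler no input ever raises a decoding error, which is the interface simplification being argued for. The price is that the mapping is not reversible: the two-byte ISO-8859-1 string "Ã¦" and the UTF-8 encoding of "æ" are the same bytes.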

    I very much believe this works, although I think a search should be done through lots of ISO-8859-1 text to find out if there are any common sequences that are confused with UTF-8.
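
    One way such a search could be run is sketched below, assuming the corpus is available as raw ISO-8859-1 bytes. The function name `find_ambiguous` is invented for this example. The pattern follows the well-formed UTF-8 byte ranges, so overlong forms and surrogates are not counted as collisions.

```python
import re

# Well-formed multi-byte UTF-8 sequences only
# (no overlong forms, no surrogates, nothing above U+10FFFF).
_MULTIBYTE_UTF8 = re.compile(
    rb"[\xC2-\xDF][\x80-\xBF]"
    rb"|\xE0[\xA0-\xBF][\x80-\xBF]"
    rb"|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}"
    rb"|\xED[\x80-\x9F][\x80-\xBF]"
    rb"|\xF0[\x90-\xBF][\x80-\xBF]{2}"
    rb"|[\xF1-\xF3][\x80-\xBF]{3}"
    rb"|\xF4[\x80-\x8F][\x80-\xBF]{2}"
)


def find_ambiguous(latin1_bytes: bytes):
    """Yield (offset, iso_8859_1_reading, utf8_reading) for each byte run
    in ISO-8859-1 text that would also decode as legal UTF-8."""
    for m in _MULTIBYTE_UTF8.finditer(latin1_bytes):
        seq = m.group()
        yield m.start(), seq.decode("iso-8859-1"), seq.decode("utf-8")


if __name__ == "__main__":
    for sample in (b"caf\xe9 na\xefve", b"\xc3\xa6"):
        print(sample, list(find_ambiguous(sample)))
```

    Counting the hits over a large body of real filenames or text would show how often the collision happens in practice. A two-byte hit needs a byte in 0xC2-0xDF (mostly capital accented letters in ISO-8859-1) followed by a byte in 0x80-0xBF (control codes and symbols), which ordinary words rarely contain.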

    Even if this is not a perfect solution, it certainly is better than the current scheme. Most filenames will be readable. More importantly, it gets rid of the idea of an "error" in a character string, significantly simplifying the interfaces.