Multi-Lingual Web Sites and Character Encoding?
languageLost asks: "I'm working on developing a multi-lingual web site, and I've come across a major problem. It looks like when web browsers submit a form, they don't include the character encoding they used in the headers anywhere. This means I have no way to distinguish between ISO-Latin-1, Shift-JIS, or GBK, for example. Netscape Navigator, Internet Explorer and Mozilla all have this problem. The browsers do send a header called "Accept-charset", but that's not what I'm looking for (and this header typically lies, in any case). I need to know what encoding was used for the text in the form fields. Does anyone know how to do this without using "detection" heuristics? Why don't the browsers just say what encoding they're using?"
How you actually get the character encoding into this new variable is the proverbial exercise left to the reader, but I'm pretty confident that it could be done. In the worst-case scenario, you'd have to write a new module for Apache, but it's possible that something like this already exists. Surely this isn't a rare problem when getting into I18N issues...
If your script is the one capturing the form data, then it is also usually the one which generated the page with the form on it, so you can tell the browser to switch into whatever encoding you want (using the charset option on the Content-type HTTP header or placing it in an HTML META tag).
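A rough sketch of that idea in Python, not a definitive implementation: the script picks one encoding, declares it both in the Content-Type header and in a META tag, and later decodes the submitted fields with the same encoding. The names PAGE_CHARSET, render_form_page, decode_form_body and simulate_browser_submit are made up for illustration, as are the /submit action and the "comment" field. It also assumes the browser submits the form in the page's encoding, which is the usual behaviour but is not guaranteed.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical choice: the one encoding this script serves and expects back.
PAGE_CHARSET = "Shift_JIS"


def render_form_page(charset=PAGE_CHARSET):
    """Return (headers, body_bytes) for a page that declares its charset twice:
    once in the HTTP Content-Type header and once in an HTML META tag."""
    html = (
        "<html><head>"
        f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">'
        "</head><body>"
        '<form method="post" action="/submit">'
        '<input type="text" name="comment">'
        '<input type="submit">'
        "</form></body></html>"
    )
    headers = [("Content-Type", f"text/html; charset={charset}")]
    return headers, html.encode(charset)


def decode_form_body(raw_body, charset=PAGE_CHARSET):
    """Decode an application/x-www-form-urlencoded body, assuming the browser
    used the same charset the page was served in."""
    # The percent-encoded body itself is plain ASCII; the escaped bytes are
    # interpreted in `charset`. errors="strict" raises on mismatched input
    # instead of silently substituting replacement characters.
    return parse_qs(raw_body.decode("ascii"), encoding=charset, errors="strict")


def simulate_browser_submit(fields, charset=PAGE_CHARSET):
    """Stand-in for a browser: encode the fields the way a form post would."""
    return urlencode(fields, encoding=charset).encode("ascii")
```

With these helpers, decode_form_body(simulate_browser_submit({"comment": "some text"})) round-trips the field value. A browser that ignores the declared charset would still break this, which is why strict decoding is used here so that a mismatch fails loudly.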
But my grandest creation, as history will tell,
Was Firefrorefiddle, the Fiend of the Fell.