Multi-Lingual Web Sites and Character Encoding?

Posted by Cliff on Thursday June 7, 2001 @01:48AM from the what-language-does-the-user-speak-in dept.

languageLost asks: "I'm working on developing a multi-lingual web site, and I've come across a major problem. It looks like when web browsers submit a form, they don't include the character encoding they used in the headers anywhere. This means I have no way to distinguish between ISO-Latin-1, Shift-JIS, or GBK, for example. Netscape Navigator, Internet Explorer and Mozilla all have this problem. The browsers do send a header called "Accept-charset", but that's not what I'm looking for (and this header typically lies, in any case). I need to know what encoding was used for the text in the form fields. Does anyone know how to do this without using "detection" heuristics? Why don't the browsers just say what encoding they're using?"

Apache modules by babbage · 2001-06-06 22:07 · Score: 3

I've never actually tried to use the facility, but Apache allows you to set environment variables more or less on the fly. Assuming that you're running Apache, look up the documentation on SetEnv. If you've got a copy of O'Reilly's Apache guide, the reference material starts on page 90. The syntax is one of:
SetEnv variable value
SetEnvIf attribute regex envar[=value] [..]

How you actually get the character encoding into this new variable is the proverbial exercise left to the reader, but I'm pretty confident that it could be done. In the worst case scenario, you'd have to write a new module for Apache, but it's possible that something like this already exists. Surely this isn't a rare problem when getting into I18N issues....

--
DO NOT LEAVE IT IS NOT REAL
I've done this before by divbyzero · 2001-06-07 05:56 · Score: 3

Speaking from experience, I can say that posting HTML form data (using either GET or POST) works just fine in arbitrary encodings. The encoding will always be that of the page containing the form.
If your script is the one capturing the form data, then it is also usually the one which generated the page with the form on it, so you can tell the browser to switch into whatever encoding you want (using the charset parameter on the Content-Type HTTP header or placing it in an HTML META tag). Since you picked the encoding for the page, you already know which encoding to use when decoding the submitted fields.
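To make that concrete, here is a minimal sketch in Python, not taken from any real site. It assumes the script serves the form page in one known charset and that the browser submits the form in that same charset, as described above. The names PAGE_CHARSET, render_form_page, decode_form_body, the field name "comment" and the "/submit" action are all hypothetical.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical: the single charset this script uses for the form page,
# and therefore the charset it assumes for the submitted fields.
PAGE_CHARSET = "shift_jis"


def render_form_page(charset=PAGE_CHARSET):
    """Return (headers, body_bytes) for a form page that declares its charset
    both in the Content-Type header and in a META tag."""
    html = (
        "<html><head>"
        f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">'
        "</head><body>"
        '<form method="post" action="/submit">'
        '<input type="text" name="comment">'
        '<input type="submit">'
        "</form></body></html>"
    )
    headers = {"Content-Type": f"text/html; charset={charset}"}
    return headers, html.encode(charset)


def decode_form_body(raw_body, charset=PAGE_CHARSET):
    """Decode a URL-encoded form body, assuming it was submitted in the
    charset of the page that contained the form."""
    fields = parse_qs(
        raw_body.decode("ascii"),  # percent-encoded bodies are plain ASCII
        keep_blank_values=True,
        encoding=charset,
        errors="strict",  # raise instead of silently guessing on bad bytes
    )
    # Keep only the first value of each field for simplicity.
    return {name: values[0] for name, values in fields.items()}


if __name__ == "__main__":
    # Simulate what a browser would send for a Shift-JIS page.
    body = urlencode({"comment": "日本語"}, encoding=PAGE_CHARSET).encode("ascii")
    print(decode_form_body(body))
```

Using errors="strict" is a deliberate choice here: if a browser ever submits in some other encoding, the decode fails loudly rather than producing garbled text.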

--
But my grandest creation, as history will tell,
Was Firefrorefiddle, the Fiend of the Fell.