Slashdot Mirror


How Are You Accomplishing Your i18n?

cobrabyte asks: "My team has recently been given the task of implementing internationalization (i18n) in our MySQL databases (PHP-interfaced). Essentially, for every article X, we need it presented in any number of languages (once translated). As we were working on gathering the necessary procedures, we were very surprised to find that there's not much organized information regarding i18n using MySQL and PHP. Is the topic of i18n too new to garner any usable info?"

20 of 117 comments

  1. It is just me? by stinerman · · Score: 2, Insightful

    I can't stand that abbreviation, i18n. I mean, who thought that would be a good abbreviation? It bears no resemblance to the original word. I think we can do better.

  2. How Are You Accomplishing Your i18n? by pyrrhonist · · Score: 4, Funny
    How Are You Accomplishing Your i18n?

    By p09g.

    --
    Show me on the doll where his noodly appendage touched you.
  3. Have you looked at PEAR? by 33degrees · · Score: 4, Informative

    I haven't tried any of them, but PEAR has a number of packages for dealing with internationalization. You might want to try looking there for insight.

  4. Easy way, using SQL by jd · · Score: 2, Informative
    Simply define a strings table with two key fields. The first key defines the string ID, the second defines the language ID. The sole attribute would then be the string itself.


    Adding a new language then just becomes a case of adding a new language ID to the system, and adding a new string becomes adding a string ID.


    Any place that you want to generate an output string, simply insert a token which represents the string ID. Your translation code scans for the tokens, gets the current language from the environment, and then searches your strings table for the substitution string.


    (For those who remember the Commodore PET computer, this is very similar to how it worked. The Print command, for example, was stored internally as a "?" token. It substituted when displaying.)


    You do not need a table for the string IDs; an enumerated type would be sufficient to track which IDs are in use and what for. You WOULD want a table for the languages, with the language ID as the key field (preferably as an enumerated type) and the font ID as the attribute. If you are not using fonts (e.g., plain-text output), then again you can just use the enumerated type.


    Because you would NOT be encoding font data into the string (NEVER, EVER do that, by the way, as you're just padding the data with redundant information, and introducing extra complexity), you can replace the font at will, provided it conforms to the mapping standards for international character sets.
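
    A minimal sketch of the scheme described above, in Python with SQLite purely for illustration (the same table shape should carry over to MySQL). The table name, column names, and the `{{STRING_ID}}` token syntax are assumptions made for this example, not part of the original suggestion.

```python
import re
import sqlite3

# Hypothetical schema: one row per (string ID, language ID) pair,
# with the string itself as the only non-key attribute.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE strings (
        string_id TEXT NOT NULL,
        lang_id   TEXT NOT NULL,
        content   TEXT NOT NULL,
        PRIMARY KEY (string_id, lang_id)
    )
    """
)
conn.executemany(
    "INSERT INTO strings VALUES (?, ?, ?)",
    [
        ("GREETING", "en", "Hello"),
        ("GREETING", "fr", "Bonjour"),
        ("FAREWELL", "en", "Goodbye"),
        ("FAREWELL", "fr", "Au revoir"),
    ],
)

# Assumed token syntax: {{STRING_ID}}
TOKEN = re.compile(r"\{\{(\w+)\}\}")


def translate(template, lang_id):
    """Replace each {{STRING_ID}} token with the string stored for lang_id."""

    def lookup(match):
        row = conn.execute(
            "SELECT content FROM strings WHERE string_id = ? AND lang_id = ?",
            (match.group(1), lang_id),
        ).fetchone()
        # Leave unknown tokens in place so missing translations stay visible.
        return row[0] if row else match.group(0)

    return TOKEN.sub(lookup, template)
```

    Adding a language here means inserting rows with a new `lang_id`; the templates themselves do not change.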

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  5. i18nHTML by Mind+Booster+Noori · · Score: 3, Informative

    Well, I think you're looking for this.

  6. That's not just i18n. by truedfx · · Score: 3, Insightful
    My team has recently been given the task of implementing internationalization (i18n) in our MySQL databases (PHP-interfaced). Essentially, for every article X, we need it presented in any number of languages (once translated).
    Let's check that link, shall we?
    The distinction between internationalization and localization is subtle but important. Internationalization is the adaptation of products for potential use virtually everywhere, while localization is the addition of special features for use in a specific locale. Subjects unique to localization include:

    * Language translation,
  7. Simple by metamatic · · Score: 2, Funny

    I shout loudly and tell the users to learn English.

    (I keed, I keeed...)

    --
    GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak
  8. Surprise! by fm6 · · Score: 2, Interesting
    ...we were very surprised to find that there's not much organized information...
    You shouldn't be. Internationalization (I'm a good typist, so I can dispense with that mysterious "18") only applies to human-readable stuff. In other words, documentation. (Yes, captions on forms are documentation too!) Is there anything software people are less motivated to deal with than documentation?
  9. UTF! by EvilIdler · · Score: 2, Insightful

    Definitely use UTF-8 for all your strings and XHTML documents.
    Make sure your preferred editors really are saving UTF-8.
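
    One rough way to check what an editor actually saved is to try decoding the raw bytes as UTF-8. This is only a sketch in Python; the function name is made up for the example. Note the limits: pure ASCII always passes, and some non-UTF-8 byte sequences happen to decode cleanly, so a pass is a hint rather than proof.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The same text saved two ways: the accented letter is two bytes in
# UTF-8 but a single byte in Latin-1 (ISO 8859-1).
utf8_bytes = "café".encode("utf-8")
latin1_bytes = "café".encode("latin-1")
```

    For a real file, the bytes would come from something like `open(path, "rb").read()`.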

  10. Re:What's the question? by plcurechax · · Score: 5, Informative

    -without using some crappy 'BabelFish' layer

    Ask any government that supports multiple official languages (Canada, Switzerland, ...). You translate into the other language(s) using professional translators. Period. You can give them the most powerful automatic translation tools available, and multiple language dictionaries (e.g. English-French), but in the end you need a human professional translator to make translations worth reading.

    -without having to write a complete localized version for each language.

    You need to make the content management system (CMS) language aware, and you need to localize all your templates. Then you need to add a key to your article database for language, so the user can retrieve article 101 in either English or French (think along the lines of http://localhost/cms/display.php?article=101&lang=en ).
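
    As a rough illustration of the language key, here is a sketch in Python with SQLite (not PHP/MySQL). The `articles` table, its columns, and the choice of English as the default when `lang` is missing are all assumptions for the example.

```python
import sqlite3
from urllib.parse import parse_qs, urlparse

conn = sqlite3.connect(":memory:")
# Hypothetical table: the language is part of the key, so article 101
# can exist once per language.
conn.execute(
    """
    CREATE TABLE articles (
        article_id INTEGER NOT NULL,
        lang       TEXT NOT NULL,
        title      TEXT NOT NULL,
        body       TEXT NOT NULL,
        PRIMARY KEY (article_id, lang)
    )
    """
)
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?, ?)",
    [
        (101, "en", "Welcome", "Hello, reader."),
        (101, "fr", "Bienvenue", "Bonjour, lecteur."),
    ],
)


def fetch_article(url):
    """Return (title, body) for the article and language in the URL, or None."""
    query = parse_qs(urlparse(url).query)
    article_id = int(query["article"][0])
    lang = query.get("lang", ["en"])[0]  # assumed default language
    return conn.execute(
        "SELECT title, body FROM articles WHERE article_id = ? AND lang = ?",
        (article_id, lang),
    ).fetchone()
```

    A `None` result means no translation exists yet for that language; what to show in that case is a design decision left open here.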

    I know nothing about PHP programming, so I cannot comment on that, or MySQL (main gotcha I expect is datatype, UTF-8, iso8859-1, vs. windowspage1574). Two articles I found useful in general about internationalization are

    UTF-8 and Unicode FAQ for Unix/Linux by Markus Kuhn
    How do I have to modify my software?
    http://www.cl.cam.ac.uk/~mgk25/unicode.html#mod

    The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
    http://www.joelonsoftware.com/articles/Unicode.html

  11. Smarty + preparse plugin by DamienMcKenna · · Score: 3, Informative

    I did this in 2003 for a CMS+ecommerce system I built for a company. You had Smarty templates which had things like {productstr1} in them. The text strings were referenced by language and string ID, and if the string didn't have a specific version for your language it defaulted to English. This string was loaded from the database in a preparse plugin and was cached in a per-language directory. It worked OK, a bit kludgy but sufficient to get the job done.
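
    The fallback-to-English lookup can be sketched roughly as follows. This is Python, not the Smarty plugin itself: the dictionary stands in for the database, `lru_cache` stands in for the per-language cache directory, and the sample strings are invented.

```python
from functools import lru_cache

# Hypothetical string store keyed by (language, string ID).
STRINGS = {
    ("en", "productstr1"): "Add to cart",
    ("fr", "productstr1"): "Ajouter au panier",
    ("en", "productstr2"): "Checkout",  # no French version yet
}
DEFAULT_LANG = "en"


@lru_cache(maxsize=None)
def get_string(lang, string_id):
    """Return the string for lang, falling back to English if it is missing."""
    if (lang, string_id) in STRINGS:
        return STRINGS[(lang, string_id)]
    # Raises KeyError if the string is missing in English as well.
    return STRINGS[(DEFAULT_LANG, string_id)]
```

    Because the cache key includes the language, each language effectively gets its own cache, which loosely mirrors the per-language directory.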

    Damien

  12. It is just you by bluGill · · Score: 4, Insightful

    The problem is you speak English. There is a good chance that you speak no other language. Since nearly everything is written in English first these days, you don't care about these issues.

    Many of those who care about i18n do not speak English at all! To these people even spelling the word out gives no help. In fact it is less helpful because they have to learn this large symbol. (There is no reason to assume they even know the Latin alphabet, so they will not think to learn each letter separately.)

    Of those who speak English, many do not speak it fluently. Often they speak English as a first year student ("hello, my name is"), and they know how to look words up in their English-whatever dictionary.

    Of course English is the dominant second language in the world. There are plenty of people who speak English fluently as a second language. They often have trouble with the creative spelling English came up with. Words with 20 letters are hard for anyone to spell, so it would be no surprise if they have trouble spelling it.

    The goal is one symbol that is easy for everyone to recognize. No matter what language the page is written in, if you see "i18n", you know you are in a location where people are interested in translation. This is often enough for some educated clicking to find the same information in your language.

    i18n may not be a good abbreviation. However, can you come up with a way to represent the concept to all 6+ billion people on earth?

    1. Re:It is just you by Pseudonym · · Score: 2, Funny
      Write it in Esperanto!

      Unfortunately, "internaciigo" isn't necessarily an improvement.

      Admittedly, it's shorter. However, the word is pronounced something like in-tehr-na-tsee-EE-go. Many people find the "ts" followed by two separately pronounced "i"'s, with overall word emphasis on the second, a bit hard to pronounce. And it sounds a bit like there's the word "nazi" in the middle, which means the thread is over.

      --
      sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
    2. Re:It is just you by Pseudonym · · Score: 2, Funny

      I propose we use the locale-neutral word "internationali1ation".

      --
      sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
    3. Re:It is just you by pla · · Score: 3, Interesting

      Of course English is the dominant second language in the world.

      In IT, English holds the majority by far. And Spanish doesn't even come in second - You have Japanese and German as distant seconds, with Hebrew and French as dark-horse thirds.

      Attempts at internationalization simply hinder the adoption of English as the next ubiquitous academic language. Much like Greek and Latin during the Roman empire - The rabble may all speak Spanish, but those who want to appear educated speak English. Of course, Latin later went on to hold the same place, so perhaps some day Spanish will function as the language of the academic elite.

      Personally, I don't have great hope of us not blowing up the planet before then. So I code with English as my target language. Speak it, or don't use my programs; it doesn't much matter to me.


      Many of those who care about i18n do not speak English at all!

      I don't think that needs an exclamation mark - It doesn't come as a particular surprise to anyone. If you speak English, you don't have the least interest in "internationalization", which basically means "Make it accessible to people who don't speak English".


      And I don't write this as a xenophobic rant... I regularly use programs written by Japanese coders, and a few in German. And do I sit around complaining about how those coders, who already have given me something I find useful, should do extra work unrelated to the purpose of the program to make those programs more friendly to me? No. I recognized my inability to read the menus and such as a shortcoming in myself, and made the effort to learn enough Japanese and German (albeit very little) to navigate those programs.

      Or to put that another way - If Bill Gates only spoke Italian, a LOT more people would have learned at least a basic proficiency in it by now.

  13. Re:profanity, morality? by bluGill · · Score: 2, Insightful

    Because these issues will trip you up.

    Particularly when using automatic translation (which is a bad idea anyway), something that is acceptable in your language may come out as something unacceptable in a different one. No matter how cheaply you are trying to get by, you still need someone to check for profanity in your output. This is less of a problem with human translators, who will avoid the issue, but you should still check, because some translators will slip profanity in, thinking you won't know.

    Morality is important because you don't think of the issue. Muslim societies have restrictions on what females can wear. Show a girl in a swimsuit (even a one-piece) in the context of diving, and you have offended your Muslim audience. Christians have similar taboos, but will generally not be offended by that same picture. Hindus consider cows sacred, and your promotion of a pound of beef with any order will offend them.

    You might not consider them, but you should. These two issues cover all the subtle things that you won't think about unless you make a special effort.

  14. Re:Multilingual user interface by Intron · · Score: 2, Insightful

    I found that translating some concepts gave strings of very different lengths. For example, some technical stuff became much longer strings in Spanish (maybe it was my translators). What do you do about the problem of the web forms getting messed up in different languages? My site is small enough to just test and adjust where necessary, but for a bigger site, this could be a problem.

    --
    Intron: the portion of DNA which expresses nothing useful.
  15. Some answers by JavaRob · · Score: 5, Informative

    I18n/localization is one of those tasks that has *lots* of questions that will need to be resolved... often you won't even know about all of the issues to resolve until you start digging into it.

    I sorted out the i18n design for a project recently, so I can share some insights on the process. My project used Java/JSP, but the problems are mostly the same. One of the most important points to be made is that you *need* to sit down and design it all the way through -- this is not a "feature" that can be easily added in when you need it later (and extreme programming teams can get hosed on this one pretty easily).

    Things to consider (in the sequence of a request for simplicity's sake):
    1) How will you know what language a user wants (first time, and on subsequent pages)? The user should be able to select/change their preference (though you could use their browser-reported locale as a guess), and they should be able to *bookmark* the homepage in their language. You could use a cookie, and redirect from the basic homepage based on that. Personally I avoid depending on cookies where possible, I didn't want to have duplicated directory structures, and I didn't want an added param on every request, so I used multiple *subdomains*, one per supported locale. They all mapped to the same IP, same application -- but in the web application I could check the requested URL and set the locale (and build the page) correctly using that. There were links on the top of the homepage to switch languages -- which would just flip to the proper subdomain. (Important note -- this complicates getting a cert for SSL, since that's tied to the domain... keep that in mind).
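
    The subdomain approach can be sketched like this, in Python rather than the Java/JSP I actually used. The subdomain-to-locale table, the `example.com` base domain, and the English default are placeholders for illustration.

```python
# Hypothetical mapping from subdomain to locale.
SUPPORTED = {"en": "en_US", "fr": "fr_CA", "de": "de_DE"}
DEFAULT_LOCALE = "en_US"
BASE_DOMAIN = "example.com"  # placeholder domain


def locale_from_host(host):
    """Pick a locale from the leftmost label of the Host header."""
    hostname = host.split(":", 1)[0].lower()  # drop any port
    subdomain = hostname.split(".", 1)[0]
    return SUPPORTED.get(subdomain, DEFAULT_LOCALE)


def switch_language_url(subdomain, path):
    """Build the 'switch language' link: same path, different subdomain."""
    return f"http://{subdomain}.{BASE_DOMAIN}{path}"
```

    Since every subdomain maps to the same application, this check can run once per request before any page is built, and a bookmarked URL carries the language with it.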

    Once you know what language you're using, build the page... this will probably involve getting data out of the database and displaying some of it.

    First, make sure your tables support whatever character set the languages will need. Then do your data design carefully. You need to make sure that any data in the database that will show up onscreen can be stored per locale: product descriptions, category names, and ALSO prices (you probably have to give prices in various currencies, right?).

    Building the page -- you'll need more PHP-specific advice here, but the idea is that you need to get text and possibly images that are language-specific for each page. The general choices are:
    * Use a single PHP file for the content (e.g., a form for registration info), and get the text displayed from locale-specific files (so for the "name" label over that field, you'd grab the proper translation).
    * Maintain a separate PHP file for the content in each language, plug the proper one into the template.

    The first option is better if your content is mostly short bits of text -- but if there are larger chunks of text it gets hard to read (and if the whole page is text -- like a privacy policy page, etc. -- the second option may make more sense). Personally, I supported both options.

    What else? Don't forget that formatting of currency, numbers, dates, and times will vary by locale. Don't forget to review any Flash animations, dropdown menus, popup calendars, etc.; these will need to support changes based on locale. Organize your resources carefully, so that a simple substitution in the path will get you the right image, content file, etc. (e.g., images/fr_CA/whatever.gif).
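
    A small Python sketch of the path substitution idea. The fallback order (full locale, then bare language, then a `default` directory) is an extra assumption added for the example, not something the layout above requires, and the file names are invented.

```python
def localized_resource(kind, locale, name, available):
    """Return the most specific existing path for a resource, or None.

    Tries the full locale (fr_CA), then the bare language (fr),
    then a shared 'default' directory.
    """
    candidates = [locale]
    if "_" in locale:
        candidates.append(locale.split("_", 1)[0])
    candidates.append("default")
    for candidate in candidates:
        path = f"{kind}/{candidate}/{name}"
        if path in available:
            return path
    return None


# Hypothetical set of files that exist on disk.
available = {
    "images/fr_CA/whatever.gif",
    "images/fr/logo.gif",
    "images/default/banner.gif",
}
```

    In a real application the `available` set would be replaced by a filesystem check; it is passed in here only to keep the sketch self-contained.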

    HTH.

  16. One more thing by JavaRob · · Score: 2, Interesting

    Forgot to mention... remember you are always balancing ease of development and ease of maintenance.

    Something that helps one does NOT always help the other -- for example, building the site in English, then making complete copies and translating all text into other languages is easy to develop, but quickly becomes a nightmare in maintenance... the customer wants a minor change and you have to update 10 files.

    Just walk through quick scenarios for each option: I would do X to create and integrate this page, and I would have to do Y to update its layout or text later. If they add in a few new fields, I'd do Z... you get the idea. If there are dozens of steps, and you'll be laughing cynically at the suggestion of bringing a new developer onto the team... you're probably doing something wrong.

  17. Gettext and separate version. by Bitsy+Boffin · · Score: 2, Insightful

    Use gettext for general string i18n & l10n. Gettext is the de facto standard, it works, it's reasonably efficient, and there are many tools to support "unskilled" localisers to do the translating for you.
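
    For a feel of the gettext workflow, here is a minimal sketch using the `gettext` module from Python's standard library (shown in Python for illustration; it is not the PHP extension). The `messages` domain and `locale` directory are hypothetical names, and no compiled catalog is assumed to exist.

```python
import gettext

# "messages" and "locale" are hypothetical names. With fallback=True, a
# missing catalog yields NullTranslations, which returns the original
# string unchanged instead of raising an error.
translator = gettext.translation(
    "messages", localedir="locale", languages=["fr"], fallback=True
)
_ = translator.gettext

# Source strings are wrapped in _() so extraction tools can find them.
label = _("Your name")
```

    Once a compiled catalog exists at `locale/fr/LC_MESSAGES/messages.mo`, the same call would return the French text without any change to the code.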

    For large or potentially dynamic text l10n (e.g., entire content of pages, descriptions of products in a database, etc.), you need to have one version for each language you are supporting (you COULD do it through gettext, but it would be rather tedious). How you do that is of course 100% dependent on your application.

    --
    NZ Electronics Enthusiasts: Check out my Trade Me Listings