Slashdot Mirror


How Are You Accomplishing Your i18n?

cobrabyte asks: "My team has recently been given the task of implementing internationalization (i18n) in our MySQL databases (PHP-interfaced). Essentially, for every article X, we need it presented in any number of languages (once translated). As we were working on gathering the necessary procedures, we were very surprised to find that there's not much organized information regarding i18n using MySQL and PHP. Is the topic of i18n too new to garner any usable info?"

117 comments

  1. The same way you do everything else. by Suppafly · · Score: 0, Offtopic

    Hire some Indian outsourcing company to do it for you.

  2. Inside info - but ... php5 is going to solve it. by Anonymous Coward · · Score: 0

    I think, Yahoo! is working on multi-byte native support for PHP5 .

    As a major user and employer of Rasmus ... that shouldn't be a big surprise ..

    Until then stick to mbstring stuff :)

  3. What's the question? by plcurechax · · Score: 1

    I'm not sure what the question is. Is it, how do we allow users to select a language? Is it, how to implement i18n in PHP based code? Is it, how to manage multilingual databases?

    I'm not sure what the question(s) is.

    1. Re:What's the question? by cobrabyte · · Score: 1

      I thought my question was clear ... but let me elaborate...

      We are familiar with UTF and have done extensive research on the subject. However, outside the realm of standards, there is not a clear path for bringing all of the various pieces (MySQL, PHP, Apache, etc.) together to form a cohesive, multi-language-compatible unit.

      There are articles here and there about various aspects of internationalization. However, I get a sense, after reading these articles, that the authors are just experimenting. I don't want to say that they are implementing hacks ... they're not.

      I am wondering if the subject matter is too new to have definitive manuals/books/etc.

      -c

    2. Re:What's the question? by Otter · · Score: 1

      Sorry, I still don't understand -- could you explain in more detail (to the degree that you can) what you're making and which parts need to be internationalized? And whether you're supporting two languages or twenty?

    3. Re:What's the question? by Ucklak · · Score: 1

      You create a website or web application.
      How do you translate it into other languages?
      -without using some crappy 'BabelFish' layer
      -without having to write a complete localized version for each language.

      --
      if you steal from one source, that is plagiarism, if you steal from many, well, that's just research.
    4. Re:What's the question? by rylin · · Score: 1

      You ask $DEITY for a miracle?

    5. Re:What's the question? by plcurechax · · Score: 5, Informative

      -without using some crappy 'BabelFish' layer

      Ask any government that supports multiple official languages (Canada, Switzerland, ...). You translate into the other language(s) using professional translators. Period. You can give them the most powerful automatic translation tools available, and multiple language dictionaries (e.g. English-French), but in the end you need a human professional translator to make translations worth reading.

      -without having to write a complete localized version for each language.

      You need to make the content management system (CMS) language aware, and you need to localize all your templates. Then you need to add a key to your article database for language, so the user can retrieve article 101 in either English or French (think along the lines of http://localhost/cms/display.php?article=101&lang=en).
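
      A minimal sketch of the "language key" idea, in Python with an in-memory SQLite database rather than the PHP/MySQL stack under discussion. The table name, column names and sample rows are all hypothetical; the point is only that the primary key covers both the article number and the language.

```python
import sqlite3

# Hypothetical schema: one row per (article, language) pair.
article_db = sqlite3.connect(":memory:")
article_db.execute(
    "CREATE TABLE articles ("
    " article_id INTEGER NOT NULL,"
    " lang TEXT NOT NULL,"
    " title TEXT NOT NULL,"
    " body TEXT NOT NULL,"
    " PRIMARY KEY (article_id, lang))"
)
article_db.executemany(
    "INSERT INTO articles VALUES (?, ?, ?, ?)",
    [
        (101, "en", "Hello", "Welcome to the site."),
        (101, "fr", "Bonjour", "Bienvenue sur le site."),
    ],
)


def get_article(db, article_id, lang):
    """Return (title, body) for one article in one language, or None."""
    return db.execute(
        "SELECT title, body FROM articles WHERE article_id = ? AND lang = ?",
        (article_id, lang),
    ).fetchone()
```

      A request such as display.php?article=101&lang=fr would then map to get_article(article_db, 101, "fr"). A missing translation comes back as None, so the caller has to decide what to show instead.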

      I know nothing about PHP programming, so I cannot comment on that, or MySQL (main gotcha I expect is datatype, UTF-8, iso8859-1, vs. windowspage1574). Two articles I found useful in general about internationalization are

      UTF-8 and Unicode FAQ for Unix/Linux by Markus Kuhn
      How do I have to modify my software?
      http://www.cl.cam.ac.uk/~mgk25/unicode.html#mod

      The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
      http://www.joelonsoftware.com/articles/Unicode.html

    6. Re:What's the question? by Anonymous Coward · · Score: 0

      Ask any government that supports multiple official languages (Canada, Switzerland, ...). You translate into the other language(s) using professional translators. Period.

      Apparently you've never been to Mexico, where the customs and immigration forms ask you "sing" your name (among a variety of other grammatically incorrect phrases).

    7. Re:What's the question? by ShieldW0lf · · Score: 1

      Create table Locales (PK LocaleID int, LocaleName nvarchar)
      Create table LocalizableFields (PK FieldID int)
      Create table LocalizedValues (PK FieldID int, PK LocaleID int, LocalizedValue nvarchar)

      Then, instead of creating table products with:
      ProductID int, ProductName nvarchar, ProductDescription nvarchar

      You use:
      ProductID int, ProductName int, ProductDescription int

      When you want to fetch stuff back out, you do a join like this:

      SELECT lpn.LocalizedValue AS ProductName,
      lpd.LocalizedValue AS ProductDescription
      FROM Products p
      INNER JOIN LocalizedValues lpn
      ON p.ProductName = lpn.FieldID
      AND lpn.LocaleID = [Users Locale ID]
      INNER JOIN LocalizedValues lpd
      ON p.ProductDescription = lpd.FieldID
      AND lpd.LocaleID = [Users Locale ID]
      WHERE p.ProductID = [ProductID you want]

      Using this technique, you can multilingualize everything in an open-ended fashion. You can also use snippets for your user interface.

      Create table Snippit (PK SnippitID int, Snippit int)

      Then make a function that takes a SnippitID and a LocaleID and returns a string, and replace every single piece of localizable text in your static pages with snippets.

      Once you've set all this up, you want to have a default locale so that if there isn't a translated version available for the user's locale, it will look for a value in the default locale and show that instead, and if that isn't available (i.e. data entered in French, default locale English, user locale German) then it shows whatever it has (French).
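
      The fallback order described above could be sketched like this in Python. The function name and the dictionary standing in for the LocalizedValues table are assumptions for illustration, not part of the setup described in this post.

```python
def localized_value(values, field_id, user_locale, default_locale):
    """Pick the best available translation for one field.

    values maps (field_id, locale_id) -> text, like the LocalizedValues table.
    """
    # First choice: the user's locale. Second choice: the default locale.
    for locale in (user_locale, default_locale):
        text = values.get((field_id, locale))
        if text is not None:
            return text
    # Neither preferred locale has a value: show whatever exists.
    for (fid, _locale), text in sorted(values.items()):
        if fid == field_id:
            return text
    return None
```

      With data entered only in French, a default locale of English and a German user, localized_value(values, field_id, "de", "en") falls through both preferred locales and returns the French text.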

      Finally, overload your snippet function so that a translator can enter "translation mode" and, while in translation mode, instead of just showing the text it wraps it in an anchor tag that, when clicked, fires a popup in which the translator can enter the correct value in the locale they are logged in as and have it update the database and reload the page being edited. You could also use AJAX if you don't like popups.

      End result: a highly flexible and extensible localization engine for a web app. I've used this same setup for several of my clients in the past with great success. Enjoy.

      --
      -1 Uncomfortable Truth
    8. Re:What's the question? by jesup · · Score: 1

      Not only do you need to translate strings (and as you say, use real live translators to do it, at least for anything you care about), but you also need to deal with formatting order (especially in printf-like statements).

      In many languages, you have strings like this:
      "You, %s, owe us %s dollars."
      that are used with formatted print statements such as printf() that make assumptions based on ordering: printf(get_locale_string(you_owe_us_string), name, local_money_string(amount)).

      In another language, the ordering of name and amount may NOT be the same. You really need to use an indexed string inserter like this:
      "You, %1, owe us %2 dollars." This way a localized string could have %2 first, then %1.
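
      A small Python sketch of indexed placeholders, using str.format with {0} and {1} in place of the %1 and %2 syntax above. The catalog, the function name and the German wording are illustrative assumptions only.

```python
# Hypothetical message catalog; the German wording is illustrative only.
MESSAGES = {
    "en": "You, {0}, owe us {1} dollars.",
    "de": "{1} Dollar schulden Sie uns, {0}.",
}


def you_owe_us(locale, name, amount):
    """Fill the localized template; arguments are always passed in the same order."""
    return MESSAGES[locale].format(name, amount)
```

      The calling code never changes its argument order; only the translated template decides where each value lands. PHP's sprintf() offers the same thing through position specifiers such as %1$s and %2$s.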

      In Scala (http://www.scala.com/) we handled it this way. Very effective (so far as it goes).

      Another important thing: French is not just French, English is not just English. There's US English and British English; French-Canadian and French, etc. And that's ignoring things like date ordering, monetary issues, etc.

    9. Re:What's the question? by Anonymous Coward · · Score: 0

      Just been through some of this with a site that needed to support the two simplified Chinese encodings UTF and GB2312.

      PHP had issues with encoding and decoding across the character sets. If you need to do character conversion you can probably save yourself a bit of debugging time by processing it as a system call to iconv (libiconv).
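
      For comparison, the same conversion sketched in Python, which uses its built-in codecs rather than a system call to iconv. The function name is made up for the example, and this says nothing about how PHP behaves.

```python
def gb2312_to_utf8(raw: bytes) -> bytes:
    """Re-encode GB2312 bytes as UTF-8, failing loudly on invalid input."""
    # decode() raises UnicodeDecodeError if the input is not valid GB2312.
    return raw.decode("gb2312").encode("utf-8")
```

      Letting the conversion raise on bad input, instead of silently dropping or replacing bytes, tends to surface mislabelled data early.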

    10. Re:What's the question? by cerberusss · · Score: 1

      I found Joel's "Absolute Minimum" really, really minimal and did a presentation for the rest of the developers at the workplace: Zipped Powerpoint. Be sure to check the notes below each sheet.

      --
      8 of 13 people found this answer helpful. Did you?
  4. It is just me? by stinerman · · Score: 2, Insightful

    I can't stand that abbreviation, i18n. I mean who thought that would be a good abbreviation? It bears no resemblance to the original word. I think we can do better.

    1. Re:It is just me? by Anonymous Coward · · Score: 1

      Perhaps you meant...

      "I can't stand that a10n, i18n. I mean who thought that would be a good a10n? It bears no r9e to the o6l word. I think we can do b4r."

    2. Re:It is just me? by reidbold · · Score: 0, Offtopic

      i18n is an abbreviation? They really should have chosen something more explicit..

      --
      -Reid
    3. Re:It is just me? by gstoddart · · Score: 1
      I can't stand that abbreviation, i18n. I mean who thought that would be a good abbreviation? It bears no resemblance to the original word. I think we can do better.

      Not just you. I actually followed the link to wiki to figure out where that damned thing started.

      I've always assumed it was somehow l337 and supposed to match phonetically --- the fact that 18 is the number of omitted letters (according to wiki) makes me hate it as an abbreviation even more.
      --
      Lost at C:>. Found at C.
    4. Re:It is just me? by Anonymous Coward · · Score: 0

      in british/world-english: internationalisation
      in american-english: internationalization

      nation-agnostic term: i18n

  5. How Are You Accomplishing Your i18n? by pyrrhonist · · Score: 4, Funny
    How Are You Accomplishing Your i18n?

    By p09g.

    --
    Show me on the doll where his noodly appendage touched you.
  6. Have you looked at PEAR? by 33degrees · · Score: 4, Informative

    I haven't tried any of them, but PEAR has a number of packages for dealing with internationalization. You might want to try looking there for insight.

  7. Easy way, using SQL by jd · · Score: 2, Informative
    Simply define a strings table with two key fields. The first key defines the string ID, the second defines the language ID. The sole attribute would then be the string itself.


    Adding a new language then just becomes a case of adding a new language ID to the system, and adding a new string becomes adding a string ID.


    Any place that you want to generate an output string, simply insert a token which represents the string ID. Your translation code scans for the tokens, gets the current language from the environment, and then searches your strings table for the substitution string.
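
    A rough Python sketch of the token scheme, using an in-memory SQLite table keyed on string ID and language ID. The {{STRING_ID}} token syntax, the table layout and the sample strings are all assumptions made for the example.

```python
import re
import sqlite3

# Hypothetical strings table with a two-part key, as described above.
strings_db = sqlite3.connect(":memory:")
strings_db.execute(
    "CREATE TABLE strings ("
    " string_id TEXT NOT NULL,"
    " lang_id TEXT NOT NULL,"
    " string_text TEXT NOT NULL,"
    " PRIMARY KEY (string_id, lang_id))"
)
strings_db.executemany(
    "INSERT INTO strings VALUES (?, ?, ?)",
    [
        ("GREETING", "en", "Welcome"),
        ("GREETING", "nl", "Welkom"),
        ("LOGOUT", "en", "Log out"),
    ],
)

TOKEN = re.compile(r"\{\{([A-Z_]+)\}\}")  # hypothetical token syntax: {{STRING_ID}}


def render(template, lang_id):
    """Replace each {{STRING_ID}} token with the string for lang_id."""
    def lookup(match):
        row = strings_db.execute(
            "SELECT string_text FROM strings WHERE string_id = ? AND lang_id = ?",
            (match.group(1), lang_id),
        ).fetchone()
        # Leave the token visible when no translation exists, so gaps are easy to spot.
        return row[0] if row else match.group(0)
    return TOKEN.sub(lookup, template)
```

    Here render("<h1>{{GREETING}}</h1>", "nl") produces the Dutch heading, while an untranslated token such as {{LOGOUT}} is left as-is. A real system would probably cache lookups instead of querying once per token.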


    (For those who remember the Commodore PET computer, this is very similar to how it worked. The Print command, for example, was stored internally as a "?" token. It substituted when displaying.)


    You do not need a table for the string IDs, an enumerated type would be sufficient to track what IDs are in use and what for. You WOULD want a table for the language, with the language ID as the key field (preferably as an enumerated type) and the font ID as the attribute. If you are not using fonts (eg: plain-text output) then again you can just use the enumerated type.


    Because you would NOT be encoding font data into the string (NEVER, EVER do that, by the way, as you're just padding the data with redundant information, and introducing extra complexity), you can replace the font at will, provided it conforms to the mapping standards for international character sets.

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  8. Too new? by hexghost · · Score: 1

    Is the topic of i18n too new to garner any usable info?

    Uh, it's been around for a decade at least. Maybe a Google search would help you.

  9. Indian companies are very qualified for this stuff by CyricZ · · Score: 1, Offtopic

    Indian companies are often very qualified for doing this sort of work. Considering the pervasiveness of non-English and English in India, they have become experts at including support for numerous languages simultaneously, even those written in very different scripts.

    --
    Cyric Zndovzny at your service.
  10. i18nHTML by Mind+Booster+Noori · · Score: 3, Informative

    Well, I think you're looking for this.

  11. That's not just i18n. by truedfx · · Score: 3, Insightful
    My team has recently been given the task of implementing internationalization (i18n) in our MySQL databases (PHP-interfaced). Essentially, for every article X, we need it presented in any number of languages (once translated).
    Let's check that link, shall we?
    The distinction between internationalization and localization is subtle but important. Internationalization is the adaptation of products for potential use virtually everywhere, while localization is the addition of special features for use in a specific locale. Subjects unique to localization include:

    * Language translation,
  12. Simple by metamatic · · Score: 2, Funny

    I shout loudly and tell the users to learn English.

    (I keed, I keeed...)

    --
    GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak
  13. Re:Indian companies are very qualified for this st by EnronHaliburton2004 · · Score: 1, Offtopic

    How many of those projects have been done correctly?

    I know a number of projects which have been outsourced to India, and they have all been done wrong, and quite a few of them ended in disaster. I don't know of a single outsourcing project that has been finished correctly.

  14. Surprise! by fm6 · · Score: 2, Interesting
    ...we were very surprised to find that there's not much organized information...
    You shouldn't be. Internationalization (I'm a good typist, so I can dispense with that mysterious "18") only applies to human-readable stuff. In other words, documentation. (Yes, captions on forms are documentation too!) Is there anything software people are less motivated to deal with than documentation?
    1. Re:Surprise! by morton2002 · · Score: 1

      18 characters are removed from Internationalization to make I18n. 10 characters are removed from Localization to get L10n.
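
      The counting rule is easy to check with a few lines of Python. The function name is made up for the example; leaving very short words unchanged is an assumption, not part of any standard.

```python
def numeronym(word):
    """Keep the first and last letters and replace the rest with their count."""
    if len(word) <= 3:
        # Nothing worth abbreviating; return short words unchanged.
        return word
    return f"{word[0]}{len(word) - 2}{word[-1]}"
```

      "Internationalization" has 20 letters, so dropping the 18 in the middle gives I18n, and the 12 letters of "Localization" give L10n.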

      -Robert

  15. UTF! by EvilIdler · · Score: 2, Insightful

    Definitely use UTF-8 for all your strings and XHTML documents.
    Make sure your preferred editors really are saving UTF-8.
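
    One way to check what an editor actually saved is to try decoding the bytes strictly as UTF-8. A minimal Python sketch, with a hypothetical function name:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

    This only catches files that are definitely not UTF-8. Pure ASCII passes trivially, and some byte sequences from other encodings happen to be valid UTF-8 as well, so treat a True result as "plausibly UTF-8" rather than proof.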

  16. profanity, morality? by spoonyfork · · Score: 1

    The i18n wiki scope section calls out profanity and morality. This really caught my attention. There is no explanation of their inclusion in the wiki. Why are they listed separately from language translation? Could anyone explain if/how/why they are incorporating profanity and morality into their i18n plans?

    --
    Speak truth to power.
    1. Re:profanity, morality? by bluGill · · Score: 2, Insightful

      Because these issues will trip you up.

      Particularly when using automatic translation (which is a bad idea anyway), something that is acceptable in your language may come out as something unacceptable in a different one. No matter how cheaply you are trying to get by, you still need someone to check for profanity in your output. This is less of a problem with human translators, who will avoid the issue, but even so you should check, because some translators will slip profanity in thinking you won't know.

      Morality is important because you don't think of the issue. Muslim societies have restrictions on what females can wear. Show a girl in a swimsuit (even a one-piece) in the context of diving, and you have offended your Muslim audience. Christians have similar taboos, but will generally not be offended by that same picture. Hindus consider cows sacred, and your promotion of a pound of beef with any order will offend them.

      You might not consider them, but you should. These two issues cover all the subtle things that you won't think about unless you make a special effort.

  17. Here's a link for Rails by Anonymous Coward · · Score: 1, Informative

    I found the following Rails article quite helpful:

    http://manuals.rubyonrails.com/read/chapter/82

    In particular it links to the following:

    http://www.quepublishing.com/articles/printerfriendly.asp?p=328641&rl=1

    Which is a very good discussion of character sets in MySQL. I didn't realize it was so thorough. For instance you can have different character sets on tables, connections, and the server itself. Finally, it seems MySQL got something right. :-)

  18. Re:Indian companies are very qualified for this st by spoonyfork · · Score: 1, Offtopic

    I don't know of a single outsourcing project that has been finished correctly.

    Not all projects are intended to be, as you say, "finished correctly".

    --
    Speak truth to power.
  19. Multilingual user interface by jpkunst · · Score: 1

    I created a multilingual user interface for a moderately complicated web application with a small number of users like this:

    Create an include directory 'lang' with a language file for every language needed. In my case, two: 'en.inc.php' for English and 'nl.inc.php' for Dutch. These files contain the strings for the interface in an associative array. Example:

    'nl.inc.php' contains:
    $l10n['ja'] = 'ja';
    'en.inc.php' contains:
    $l10n['ja'] = 'yes';
    I use a session to store the desired language:
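    session_start();
    $available_languages = array('nl', 'en');
    // Switch language only when the request names a supported one.
    if (isset($_REQUEST['lang']) && in_array($_REQUEST['lang'], $available_languages, true)) {
        $_SESSION['lang'] = $_REQUEST['lang'];
    } elseif (!isset($_SESSION['lang'])) {
        // Nothing chosen yet: fall back to Dutch.
        $_SESSION['lang'] = 'nl';
    }
    // Load the strings on every request, not only when 'lang' is passed.
    require('lang/' . $_SESSION['lang'] . '.inc.php');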

    And then I just use the $l10n array for the strings in the user interface instead of hardcoded strings.

    echo $l10n['ja'];

    Which gives 'yes' if the session language is English and 'ja' if the session language is Dutch.
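
    The same idea in a few lines of Python, for readers not on PHP. Plain dictionaries stand in for the language files and for the session, and every name here is hypothetical.

```python
# Stand-ins for nl.inc.php and en.inc.php.
CATALOGS = {
    "nl": {"ja": "ja"},
    "en": {"ja": "yes"},
}
DEFAULT_LANG = "nl"


def choose_language(session, requested=None):
    """Return the string table for this session, switching language if asked.

    Only languages present in CATALOGS are accepted; anything else leaves
    the current session language (or the default) in place.
    """
    if requested in CATALOGS:
        session["lang"] = requested
    session.setdefault("lang", DEFAULT_LANG)
    return CATALOGS[session["lang"]]
```

    Checking the requested value against the known catalogs matters for more than tidiness: in the file-based version the language code ends up in an include path, so it should never be taken from the request unchecked.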

    A simple technique but it seems to work well enough.

    JP

    1. Re:Multilingual user interface by Intron · · Score: 2, Insightful

      I found that translating some concepts gave strings of very different lengths. For example, some technical stuff became much longer strings in Spanish (maybe it was my translators). What do you do about the problem of the web forms getting messed up in different languages? My site is small enough to just test and adjust where necessary, but for a bigger site, this could be a problem.

      --
      Intron: the portion of DNA which expresses nothing useful.
    2. Re:Multilingual user interface by Metaphorically · · Score: 1

      There's not a perfect rule, but the oft-repeated rule of thumb is that you should leave 35% spare 'space' if you write in English. The way I understand the rule is that if your text in English takes up 65% of the space that you could use for text then you probably have enough room to place the translated text in that location for any other language.
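
      The arithmetic behind the rule, as a Python sketch. Counting characters is only a crude stand-in for rendered width, and the function name and default are assumptions for the example.

```python
def space_budget(english_chars, spare_percent=35):
    """Total character budget so English fills (100 - spare_percent)% of it."""
    used_percent = 100 - spare_percent
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-english_chars * 100 // used_percent)
```

      A 65-character English label would get a 100-character budget, and a 13-character one would get 20.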

      Of course this only makes sense for horizontal text and maybe even only left-to-right at that. It's also bound to be wildly off in some cases. The reasoning I heard for it was that text tends to get longer when you translate it from English.

      Don't ask me what to do if English isn't the language you authour your text in.

      --
      more of the same on Twitter.
    3. Re:Multilingual user interface by jpkunst · · Score: 1

      I found that translating some concepts gave strings of very different lengths. For example, some technical stuff became much longer strings in Spanish (maybe it was my translators). What do you do about the problem of the web forms getting messed up in different languages? My site is small enough to just test and adjust where necessary, but for a bigger site, this could be a problem.

      In my case the number of different page templates (about 50) and languages (two, English and Dutch) was also small enough to adjust things by hand if needed.

      PHP Developer Derick Rethans has given talks about multilingual development, there might be stuff of interest there.

      JP

    4. Re:Multilingual user interface by raju1kabir · · Score: 1
      There's not a perfect rule, but the oft-repeated rule of thumb is that you should leave 35% spare 'space' if you write in English. The way I understand the rule is that if your text in English takes up 65% of the space that you could use for text then you probably have enough room to place the translated text in that location for any other language.

      Some southeast Asian languages (e.g., Burmese and Khmer) stack letters vertically in certain cases, so they end up taking a lot more space. What I mean is, these languages run horizontally like English, but certain combinations require some letters to be placed above or below others instead of next to them. They take stylized forms that don't require a full extra line-height but there's still a considerable amount of extra inter-line space required. It's sort of like mandatory double-spacing.

      Don't ask me what to do if English isn't the language you authour your text in.

      Why, get with the program and start writing in English, of course.

      authour

      Wow, you Canadians are even more British than the British!

      --
      "Patriotism is your conviction that this country is superior to all other countries because you were born in it." -- GBS
  20. application-level by maraist · · Score: 1

    It's a proportionality thing.. Often, the difficulty of programming i18n is less dramatic than the problem of actually generating all the different language sets.. Often such management means an advanced GUI (maybe not advanced, but certainly more than just raw field entries for a DB-backed widget).

    i18n is generally a mapping from token to language set in a 1:n relationship, which maps nicely to table layouts, so I don't see any need to create i18n support in the DB itself.

    If you want some degree of abstraction, java provides the ResourceBundle for which you can easily write your own DB-backed loaders. You instantiate your bundle, then pass it around to whatever device needs to render the actual text.

    There is lots of i18n support in the ASP/JSP environment (I assume in PHP as well). jstl, struts, webwork have nice end-to-end support for i18n (error messages accepting tokens instead of raw text, etc).

    For manipulation of static sets of text, there are generally plugins for your editors which allow you to manage a suite of bundleName_locale.properties with color highlighting, missing-term warnings, etc. I personally use the properties plugin for idea in the java environment.

    These days, you should consciously be evaluating whether anything is ever being displayed to the end-user, and organizing that material into i18n bundles of some sort. Standardizing on whatever your platform natively supports is critical because you can leverage the tons of tools that are here and are bound to come.

    --
    -Michael
  21. Smarty + preparse plugin by DamienMcKenna · · Score: 3, Informative

    I did this in 2003 for a CMS + e-commerce system I built for a company. You had Smarty templates which had things like {productstr1} in them. The text strings were referenced by language and string ID, and if the string didn't have a specific version for your language it defaulted to English. This string was loaded from the database in a preparse plugin and was cached in a per-language directory. It worked OK, a bit kludgy but sufficient to get the job done.

    Damien

  22. It is just you by bluGill · · Score: 4, Insightful

    The problem is you speak English. There is a good chance that you speak no other language. Since nearly everything is written in English first these days, you don't care about these issues.

    Many of those who care about i18n do not speak English at all! To these people even spelling the word out gives no help. In fact it is less helpful because they have to learn this large symbol. (There is no reason to assume they even know the Latin alphabet, so they will not think to learn each letter separately.)

    Of those who speak English, many do not speak it fluently. Often they speak English as a first year student ("hello, my name is"), and they know how to look words up in their English-whatever dictionary.

    Of course English is the dominant second language in the world. There are plenty of people who speak English fluently as a second language. They often have trouble with the creative spelling English came up with. Words with 20 letters are hard for anyone to spell, so it would be no surprise if they have trouble spelling it.

    The goal is one symbol that is easy for everyone to recognize. No matter what language the page is written in, if you see "i18n", you know you are in a location where people are interested in translation. This is often enough for some educated clicking to find the same information in your language.

    i18n may not be a good abbreviation. However can you come up with a way to represent the concept to all 6+billion people on earth?

    1. Re:It is just you by Anonymous Coward · · Score: 0

      However can you come up with a way to represent the concept to all 6+billion people on earth?

      Can we start by not having numbers in the middle of the damn word? What languages other than l33t stick numbers right in the goddamn word?!

    2. Re:It is just you by stinerman · · Score: 1

      Write it in esperanto!

      I do speak a bit of French (not fluent by any standards). But you are right, I had never thought of the fact that others simply didn't understand what "internationalization" is. Seems I've been pretty humbled.

    3. Re:It is just you by jrumney · · Score: 1
      I had never thought of the fact that others simply didn't understand what "internationalization" is.

      In English, that would be internationalisation.

    4. Re:It is just you by Pseudonym · · Score: 2, Funny
      Write it in esperanto!

      Unfortunately, "internaciigo" isn't necessarily an improvement.

      Admittedly, it's shorter. However, the word is pronounced something like in-tehr-na-tsee-EE-go. Many people find the "ts" followed by two separately pronounced "i"'s, with overall word emphasis on the second, a bit hard to pronounce. And it sounds a bit like there's the word "nazi" in the middle, which means the thread is over.

      --
      sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
    5. Re:It is just you by Pseudonym · · Score: 2, Funny

      I propose we use the locale-neutral word "internationali1ation".

      --
      sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
    6. Re:It is just you by Captain+Nitpick · · Score: 1, Insightful
      i18n may not be a good abbreviation. However can you come up with a way to represent the concept to all 6+billion people on earth?

      The grandparent poster complains about 'i18n' being a lousy abbreviation, and you give the world a six paragraph rant about cultural imperialism. This is rather like going off on communism because someone commented on the color of an object.

      Seriously, it has numbers in it. Numbers!

      (at this point, I start wanting to scream 'Those aren't even WORDS!!!! ED! ED! ED IS THE STANDARD!!!')

      --
      But then again, I could be wrong.
    7. Re:It is just you by fcgreg · · Score: 1

      Thank you. This needed to be said, and I was about to do it.

      Please mod this parent up to at least the level of the rant to which it applies.

      --
      Greg T.
    8. Re:It is just you by gabebear · · Score: 1

      I believe you missed the point a bit. More people in the world recognize Arabic numerals than Latin characters. i18n is just supposed to be a symbol that anyone in the world could recognize easily.

      The L33t spelling is just a bonus.....

    9. Re:It is just you by pla · · Score: 3, Interesting

      Of course English is the dominant second language in the world.

      In IT, English holds the majority by far. And Spanish doesn't even come in second - You have Japanese and German as distant seconds, with Hebrew and French as dark-horse thirds.

      Attempts at internationalization simply hinder the adoption of English as the next ubiquitous academic language. Much like Greek and Latin during the Roman empire - The rabble may all speak Spanish, but those who want to appear educated speak English. Of course, Latin later went on to hold the same place, so perhaps some day Spanish will function as the language of the academic elite.

      Personally, I don't have great hope of us not blowing up the planet before then. So I code with English as my target language. Speak it, or don't use my programs, doesn't much matter to me


      Many of those who care about i18n do not speak English at all!

      I don't think that needs an exclamation mark - It doesn't come as a particular surprise to anyone. If you speak English, you don't have the least interest in "internationalization", which basically means "Make it accessible to people who don't speak English".


      And I don't write this as a xenophobic rant... I regularly use programs written by Japanese coders, and a few in German. And do I sit around complaining about how those coders, who already have given me something I find useful, should do extra work unrelated to the purpose of the program to make those programs more friendly to me? No. I recognized my inability to read the menus and such as a shortcoming in myself, and made the effort to learn enough Japanese and German (albeit very little) to navigate those programs.

      Or to put that another way - If Bill Gates only spoke Italian, a LOT more people would have learned at least a basic proficiency in it by now.

    10. Re:It is just you by thesixthreplicant · · Score: 1, Interesting
      live in belgium and you'll know how to make multi-language sites. jesus, the number of times we have demos of CMS systems and the first question i ask is 'how do you implement a multilanguage site?' and they always say 'it'll be in the next edition'. funnily enough they're quite happy to have the admin module in multiple languages but actually having the *site* use multiple languages is always a big no-no.

      ciao

    11. Re:It is just you by Captain+Nitpick · · Score: 1
      i18n is just supposed to be a symbol that anyone in the world could recognize easily.

      That unpronounceable symbol Prince changed his name to was easily recognizable too, but that doesn't mean it served well as a name.

      --
      But then again, I could be wrong.
    12. Re:It is just you by gabebear · · Score: 1

      Actually if you wanted to write internationalization in a short easy to pronounce way you would write \u56fd\u969b\u5316\u3011 . Since that is the kanji for internationalism in both Chinese and Japanese. I think that makes it much easier for the majority of the population to pronounce*.

      There is a very sound reason to put Arabic numerals in the word, it's easy to pick out no matter what language(s) you read. This isn't anything like Prince's name where he just made up a totally new symbol to get out of contract obligations.

      * I don't know literacy percentages, nor do I know any Japanese or Chinese.

    13. Re:It is just you by gabebear · · Score: 1

      Ooops, that made a mess of the unicode. But I guess Slashdot posters/readers can just use U+22269 U+38469 U+21270.

    14. Re:It is just you by Anonymous Coward · · Score: 0

      INTER~15 takes less space, both in print and FAT32.

    15. Re:It is just you by Anonymous Coward · · Score: 0
      The populations of China and Japan add up to under 1.5 billion people. Even if you ignore illiteracy and add in North and South Korea (I don't know whether they use the same glyphs as hanja for that word) and speakers in other countries, you aren't going to get anywhere near a majority of the world's 6.4 billion.

      If you look at literate computer users worldwide, a plurality (maybe not a majority) almost certainly read English.

    16. Re:It is just you by Anonymous Coward · · Score: 0

      Our civilization relies on mostly verbal (not visual) communication. Prince's new symbol wasn't adapted for our media at all, especially since he didn't even bother to give it a name or pronunciation. None of this is true for "i18n".

    17. Re:It is just you by Captain+Nitpick · · Score: 1
      Our civilization relies on mostly verbal (not visual) communication. Prince's new symbol wasn't adapted for our media at all, especially since he didn't even bother to give it a name or pronunciation. None of this is true for "i18n".

      I don't consider "I-eighteen-n" to be much more pronounceable than "The artist formerly known as Prince". Hell, "internationalization" isn't very pronounceable either. Eight syllables in one word is too many.

      --
      But then again, I could be wrong.
    18. Re:It is just you by Captain+Nitpick · · Score: 1
      There is a very sound reason to put Arabic numerals in the word, it's easy to pick out no matter what language(s) you read. This isn't anything like Prince's name where he just made up a totally new symbol to get out of contract obligations.

      Except for the whole unpronounceable symbol thing.

      --
      But then again, I could be wrong.
    19. Re:It is just you by gabebear · · Score: 1

      it's perfectly pronounceable; it's pronounced inter-nation-al-i-zation.

    20. Re:It is just you by Anonymous Coward · · Score: 0

      If you speak English, you don't have the least interest in "internationalization", which basically means "Make it accessible to people who don't speak English".

      Fascinating. All these years I thought I cared about i18n, and now suddenly you come along and tell me I have no interest in it at all! Thank you, pla, for opening my eyes!

      And I don't write this as a xenophobic rant... I regularly use programs written by Japanese coders, and a few in German.

      And some of your best friends are niggers, right?

      Or to put that another way - If Bill Gates only spoke Italian, a LOT more people would have learned at least a basic proficiency in it by now.

      Really? I wonder why Microsoft have bothered to waste all that money localising Windows into every major first-world language, when according to a random troll on Slashdot, everyone would have learned English if they hadn't. What idiots Microsoft must be, caring about speakers of other languages!

    21. Re:It is just you by wideBlueSkies · · Score: 1

      >>So I code with English as my target language. Speak it, or don't use my programs, doesn't much matter to me

      I agree that a system needs to have a 'base' language. The business case, requirements, design docs, and code/comments are usually better off being written in one language.

      However, if you are dealing with clients in multiple regions footing the bill for your project, it's also a good idea to think in terms of having the application support multiple languages in some type of modular way, so that they can be added (or removed) as needed, as the business changes.

      I'm working on such a project right now, and if the requirements team had told our Japanese clients that the UI would be 'English only', the project would have ended right there, 2 years ago.

      Without getting into the requirements and specifics of the implementation: English is the base language of the application, but it is trivial (from the client perspective) for the UI language to be changed. So all menus and screen labels/widgets can appear in the language of choice. We also allow multilingual input where it is required (text only), and store it as Unicode. Fortunately stuff like account numbers, and business entities can be assigned numbers (which are universal) on lookup tables.

      Though the most important data, the money, is represented by numbers, which everyone understands. No translation needed there. :)

      wbs.

      --
      Huh?
    22. Re:It is just you by gomiam · · Score: 1

      You may want to check your dictionary. Webster's Seventh New Collegiate Dictionary (American edition, which might be more prone to using -isation) only knows about "internationalization". Yes, I know it's old. Oh, and a Google fight says internationalization sweeps the floor with internationalisation.

    23. Re:It is just you by jrumney · · Score: 1
      Webster's Seventh New Collegiate Dictionary (American edition

      Hardly an authority on the native language of England then, is it?

    24. Re:It is just you by gomiam · · Score: 1
      Preface: Oh, I guess American English is not "English enough". Silly me. And I would love to see the face of any English philology scholar at reading that Webster somehow is "not an authority" on English. Well, let's get to the answer.

      I thought "internationalisation" was a term more commonly used in the USA than in the UK. But The Cambridge Dictionary states the term is "internationalization", with "internationalisation" being a UK localism. The Oxford English Dictionary (user login required, BugMeNot is your friend) knows of no "internationalisation", though its references on "internationalization" cite sources both with -z- and -s-. I hope you will agree these two dictionaries' creators know what's spoken in England. Otherwise, please provide your authoritative references.

      I think this proves both forms are currently accepted, so we probably should call it a draw. As a native Spanish speaker, I prefer the -z- form, as it looks more like the Spanish internacionalización, but that's my preference.

  23. Re:Too new? by Anonymous Coward · · Score: 0

    From the PHP manual

    "In PHP, a character is the same as a byte, that is, there are exactly 256 different characters possible. This also implies that PHP has no native support of Unicode."

    If you know absolutely nothing about a topic, please don't post.

  24. Re:Indian companies are very qualified for this st by EnronHaliburton2004 · · Score: 0, Offtopic

    I agree. A lot of projects that I've come across lately aren't really taken seriously. It's busywork, or a project that was started and then dropped mid-way through or something strange...

  25. The database tier by chiph · · Score: 1

    From a database perspective, there are two basic ways to do this. Assuming you need to present an I18N version of a Widget table, you can:

    1. Define Widget and WidgetText, with all the I18N material moved to WidgetText. WidgetText is keyed on the Id from Widget and a Culture identifier. Every time you need a Widget, you JOIN to WidgetText based on the Id from Widget and the Culture identifier of the requesting user.

    2. Add a Culture identifier column to your Widget table, and use that in your WHERE clause. Leaving it off means you'll get multiple rows back for a request to fetch a Widget.

    At my last job we took approach #1, and it worked well enough. Its main advantage (for us) was it made reporting easier (yeah, it seems like it wouldn't, but it did for various arcane reasons).
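
    A minimal sketch of approach #1 might look like the following. It assumes an in-memory SQLite database through Python's sqlite3 module rather than any of the servers discussed here, and the column names, culture codes, and sample rows are all made up for illustration.

```python
import sqlite3

# Hypothetical schema: Widget holds language-neutral data,
# WidgetText holds one row of translatable text per (widget, culture).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Widget (Id INTEGER PRIMARY KEY, Price REAL NOT NULL);
    CREATE TABLE WidgetText (
        WidgetId INTEGER NOT NULL REFERENCES Widget(Id),
        Culture  TEXT NOT NULL,
        Name     TEXT NOT NULL,
        PRIMARY KEY (WidgetId, Culture)
    );
""")
conn.execute("INSERT INTO Widget VALUES (1, 9.99)")
conn.executemany(
    "INSERT INTO WidgetText VALUES (?, ?, ?)",
    [(1, "en-US", "Sprocket"), (1, "de-DE", "Zahnrad")],
)


def fetch_widget(widget_id, culture):
    """Return (id, price, name) for one widget in the requested culture, or None."""
    return conn.execute(
        """SELECT w.Id, w.Price, t.Name
           FROM Widget AS w
           JOIN WidgetText AS t
             ON t.WidgetId = w.Id AND t.Culture = ?
           WHERE w.Id = ?""",
        (culture, widget_id),
    ).fetchone()


print(fetch_widget(1, "de-DE"))
```

    A culture with no WidgetText row returns nothing at all here, so a real application would probably want a fallback culture as well.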

    In both cases you need to make sure you're using the NCHAR, NVARCHAR, and NTEXT Unicode variations of string column datatypes. All literals used in your SQL must have the "N" prefix to indicate Unicode data. You need to also watch out for the collation sequence defined in your database if the order of rows returned is important (and it usually is!). And make sure that you have the concept down of how storing DateTime values is entirely different from the display of DateTimes.

    Last time I looked at it, MySQL was sort of weak on a number of these points. Sybase SQL Anywhere (or whatever it's called this week) was pretty good, as well as the usual suspects: SQLServer, DB2, and Oracle. I don't know about PostgreSQL - haven't used it.

    You'll want to code up a vertical slice of your application to make sure all your chosen tools & components can handle I18N.

    Chip H.

  26. Re:Too new? by cicadia · · Score: 1
    This also implies that PHP has no native support of Unicode

    And, for that matter, neither does C, or C++, or assembler. We can conclude from this that Unicode support is not possible, except perhaps in Java or Python.

    The grandparent post was entirely correct to point out that this is not a new problem. People have been doing multibyte characters in all sorts of languages for a long time. I was even doing i18n in PHP in 1999.

    Not having 'native support' for Unicode doesn't mean that you can't use Unicode strings. (They're composed of bytes, you know). At most, it means you can't get useful data from the length() function, and things like toupper() and tolower() may not do what you expect. You can still store them and retrieve them, display them to the user. Programmers have been doing this sort of thing for a long time, without 'native' support from their language.
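
    The point about length() and toupper() can be illustrated with a short sketch. It is written in Python rather than PHP, purely as an assumption for the sake of a runnable example; the bytes object stands in for a byte-oriented string type.

```python
text = "na\u00efve caf\u00e9"      # "naïve café": 10 characters
raw = text.encode("utf-8")          # the same string as a plain sequence of bytes

# A byte-oriented length counts bytes, not characters.
char_count = len(text)
byte_count = len(raw)

# A byte-oriented upper-casing only touches the ASCII letters.
bytewise_upper = raw.upper().decode("utf-8")
unicode_upper = text.upper()

# Storing and retrieving the bytes still round-trips without loss.
round_trip = raw.decode("utf-8")

print(char_count, byte_count)
print(bytewise_upper, unicode_upper)
```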

    If you know absolutely nothing about a topic, please don't post

    Good advice.

    --
    Living better through chemicals
  27. Easy by floop · · Score: 1

    ./configure --without-nls

  28. PHP? MySQL?? by deepestblue · · Score: 1

    Regarding PHP, http://www.joelonsoftware.com/items/2003/10/10.html is instructive. Yes, I did confirm from the PHP website that things aren't too different now.

    MySQL? The less said, the better.

  29. Not just you - but mostly by Roadkills-R-Us · · Score: 1

    It's actually quite clever. Feel free to come up with something even remotely that short that conveys it any better. The abbreviation is meaningless outside the group of people who care about it, but so is most geek speak and a great deal of the body of scientific language-- regardless of the language of origin.

  30. Re:Indian companies are very qualified for this st by jrumney · · Score: 1
    Considering the pervasiveness of non-English and English in India, they have become experts at including support for numerous languages simultaneously, even those written in very different scripts.

    Not really. The Indian scripts are very poorly supported by most operating systems and software. It is only recently that Indian programmers have started to work on this and improve the software situation for their own domestic market. Most Indian programmers have barely more awareness of internationalisation issues than Americans in my experience.

  31. Re:Indian companies are very qualified for this st by Suppafly · · Score: 0, Offtopic

    I'm not sure why I was modded as flamebait, I didn't say anything negative about having Indian firms do the work, I just made an observation that using Indian firms is the normal way of getting this kind of work done. I'm sure the metamods will vindicate me.

  32. gettext by Ruis · · Score: 1

    If you want to use gettext with php, I wrote a very simple howto on the subject. http://ruistech.com/gettext/

    1. Re:gettext by khanyisa · · Score: 1

      Or you can use http://pear.php.net/package/Translation2 which also handles gettext. Note that using gettext gives you huge advantages as you can then use standard tools to manipulate the translations...

  33. One of the many things you may need to do is... by slopedome · · Score: 1

    You may need to start by converting your iso-8859-1 or other European 8-bit data to UTF-8 or another sensible Unicode charset. Some of our MySQL data was in the dreaded windows-1252 encoding, and I had to convert it to UTF-8. I downloaded the Convert Charset class (found via http://phpclasses.org/) from Mikołaj Jędrzejak, and with that I discovered I could basically convert anything I wanted from whatever charset to whichever charset I like. Wrote a couple scripts, and that was that.
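
    As a rough illustration of that kind of conversion (not the Convert Charset class itself, and in Python rather than PHP as an assumption for the example), windows-1252 bytes can be decoded and re-encoded as UTF-8 like this. The sample bytes are made up.

```python
def cp1252_to_utf8(raw: bytes) -> bytes:
    """Re-encode windows-1252 bytes as UTF-8.

    Raises UnicodeDecodeError on the few byte values that windows-1252
    leaves undefined, which is a useful signal that the data is not
    really in that encoding.
    """
    return raw.decode("cp1252").encode("utf-8")


# Curly quotes around "café", as a windows-1252 client would have stored them.
legacy = b"\x93caf\xe9\x94"
converted = cp1252_to_utf8(legacy)

# Treating the same bytes as iso-8859-1 silently yields control characters
# instead of quotes, which is why the two encodings must not be confused.
misread = legacy.decode("latin-1")

print(converted.decode("utf-8"))
```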

  34. You seem confused... by Senjutsu · · Score: 1

    Localization applies to documentation and other human readable stuff, because it involves adapting the program and its documentation for a particular locale.

    Internationalization is the process of adapting your program so that it can easily be made to work in any locale. Not hardcoding strings in English, not assuming 1 byte == 1 char, that kind of thing. A good i18n architecture makes localization much easier.

    1. Re:You seem confused... by sahala · · Score: 1

      Mod parent up. He has a clue.

    2. Re:You seem confused... by fm6 · · Score: 1

      Please, I know what the terms mean -- I've actually done these things. But Localization and Internationalization are two parts of the same process. Especially in PHP-based applications, which are basically script-driven web pages.

    3. Re:You seem confused... by Asmodai · · Score: 1

      Erhm,

      you are wrong on the account of what they mean, Senjutsu is actually accurate there.

      --
      Jeroen Ruigrok/Asmodai
  35. Some answers by JavaRob · · Score: 5, Informative

    I18n/localization is one of those tasks that has *lots* of questions that will need to be resolved... often you won't even know about all of the issues to resolve until you start digging into it.

    I sorted out the i18n design for a project recently, so I can share some insights on the process. My project used Java/JSP, but the problems are mostly the same. One of the most important points to be made is that you *need* to sit down and design it all the way through -- this is not a "feature" that can be easily added in when you need it later (and extreme programming teams can get hosed on this one pretty easily).

    Things to consider (in the sequence of a request for simplicity's sake):
    1) How will you know what language a user wants (first time, and on subsequent pages)? The user should be able to select/change their preference (though you could use their browser-reported locale as a guess), and they should be able to *bookmark* the homepage in their language. You could use a cookie, and redirect from the basic homepage based on that. Personally I avoid depending on cookies where possible, I didn't want to have duplicated directory structures, and I didn't want an added param on every request, so I used multiple *subdomains*, one per supported locale. They all mapped to the same IP, same application -- but in the web application I could check the requested URL and set the locale (and build the page) correctly using that. There were links on the top of the homepage to switch languages -- which would just flip to the proper subdomain. (Important note -- this complicates getting a cert for SSL, since that's tied to the domain... keep that in mind).
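
    A minimal sketch of the subdomain idea, in Python rather than Java or PHP. The mapping, the default locale, and the host names are hypothetical; a real application would read the Host header from its web framework.

```python
# Hypothetical mapping from subdomain to supported locale.
SUPPORTED = {"en": "en_US", "fr": "fr_CA", "de": "de_DE"}
DEFAULT_LOCALE = "en_US"


def locale_from_host(host: str) -> str:
    """Pick the locale from the first label of the requested host name."""
    hostname = host.split(":", 1)[0].lower()   # drop any port, ignore case
    subdomain = hostname.split(".", 1)[0]
    return SUPPORTED.get(subdomain, DEFAULT_LOCALE)


print(locale_from_host("fr.example.com"))
print(locale_from_host("www.example.com:8080"))
```

    Unknown subdomains fall back to the default, so the plain homepage keeps working.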

    Once you know what language you're using, build the page... this will probably involve getting data out of the database and displaying some of it.

    First, make sure your tables support whatever character set the languages will need. Then make your data design carefully. You need to make sure that any data in the database that will show up onscreen can be localized: product descriptions, category names, and ALSO prices (you probably have to give prices in various currencies, right?).

    Building the page -- you'll need more PHP-specific advice here, but the idea is that you need to get text and possibly images that are language-specific for each page. The general choices are:
    * Use a single PHP file for the content (e.g., a form for registration info), and get the text displayed from locale-specific files (so for the "name" label over that field, you'd grab the proper translation).
    * Maintain a separate PHP file for the content in each language, plug the proper one into the template.

    The first option is better if your content is mostly short bits of text -- but if there are larger chunks of text it gets hard to read (and if the whole page is text -- like a privacy policy page, etc. -- the second option may make more sense). Personally, I supported both options.

    What else? Don't forget that formatting of currency, numbers, dates, and times will vary by locale. Don't forget to review any Flash animations, dropdown menus, popup calendars, etc.; these will need to support changes based on locale. Organize your resources carefully, so that a simple substitution in the path will get you the right image, content file, etc. (e.g., images/fr_CA/whatever.gif).

    HTH.

    1. Re:Some answers by Anonymous Coward · · Score: 0

      "Don't forget that formatting of currency, numbers, dates, and times will vary by locale."

      For heaven's sake, don't forget that the formatting of currency data completely changes its meaning! $199.00 is completely different from £199.00.

    2. Re:Some answers by JavaRob · · Score: 1

      That's not formatting; "formatting" means changing the display to convey the *same* info to people who use different standards (I think I mentioned currency conversions in the data section).

      For example,
      $1,999.00
      might be formatted as
      USD 1.999,00 in a different locale.
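
      The same idea as a small Python sketch. The two locale entries are hand-written assumptions for this example; a real application would take separators and patterns from proper locale data instead.

```python
from decimal import Decimal

# Hypothetical per-locale display conventions for a US dollar amount.
FORMATS = {
    "en_US": {"group": ",", "decimal": ".", "pattern": "${amount}"},
    "de_DE": {"group": ".", "decimal": ",", "pattern": "USD {amount}"},
}


def format_usd(value: Decimal, locale: str) -> str:
    """Display the same dollar amount using the separators of the given locale."""
    conv = FORMATS[locale]
    plain = f"{value:,.2f}"                       # e.g. "1,999.00"
    swap = str.maketrans({",": conv["group"], ".": conv["decimal"]})
    return conv["pattern"].format(amount=plain.translate(swap))


print(format_usd(Decimal("1999"), "en_US"))
print(format_usd(Decimal("1999"), "de_DE"))
```

      The value and the currency stay the same in both cases; only the presentation changes.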

    3. Re:Some answers by jgrahn · · Score: 1
      How will you know what language a user wants (first time, and on subsequent pages)? The user should be able to select/change their preference (though you could use their browser-reported locale as a guess) [...]

      Pet peeve: the locales reported by a browser aren't just a guess; they're the standard way for a user to tell what languages she prefers to read. I realize that web developers mostly ignore this, but IMHO not using it (in the absence of other information, of course) is a bug.
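
      For reference, the browser sends these preferences in the HTTP Accept-Language header. A simplified Python sketch of reading it follows; the function names and the supported-language set are hypothetical, and it ignores some corners of the header grammar.

```python
def parse_accept_language(header):
    """Return the language tags from an Accept-Language header, most preferred first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        tag, _, params = item.strip().partition(";")
        tag = tag.strip()
        if not tag:
            continue
        quality = 1.0                      # no q-value means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0              # treat a malformed weight as "not wanted"
        if quality > 0:
            weighted.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(weighted)]


def pick_language(header, supported, default):
    """Choose the first preferred language the site supports, else the default."""
    for tag in parse_accept_language(header):
        lowered = tag.lower()
        if lowered in supported:
            return lowered
        base = lowered.split("-", 1)[0]    # "fr-CA" falls back to "fr"
        if base in supported:
            return base
    return default


print(pick_language("fr-CA,fr;q=0.9,en;q=0.5", {"en", "fr"}, "en"))
```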

    4. Re:Some answers by JavaRob · · Score: 1

      I did mention it... but as a developer, I have to still treat it as a "best guess". It's certainly not a guarantee of the preferred language. I've done a decent amount of traveling, and when I'm in an internet cafe and websites show me content in the local language (without the option to change!), I'm screwed if it's a language I don't read. That's a more serious bug than showing a non-preferred language (but allowing them to click a link for their preferred lang).

      Of course, travellers are not the majority of users... but still, there are plenty of countries where multiple languages are spoken, and any user going into an internet cafe (still how most people access the internet in many places...) will likely not be able to change the language setting on the browser.

      So.. it can be helpful, but it can't be the center of your localization strategy.

  36. One more thing by JavaRob · · Score: 2, Interesting

    Forgot to mention... remember you are always balancing ease of development and ease of maintenance.

    Something that helps one does NOT always help the other -- for example, building the site in English, then making complete copies and translating all text into other languages is easy to develop, but quickly becomes a nightmare in maintenance... the customer wants a minor change and you have to update 10 files.

    Just walk through quick scenarios for each option: I would do X to create and integrate this page, and I would have to do Y to update its layout or text later. If they add in a few new fields, I'd do Z... you get the idea. If there are dozens of steps, and you'll be laughing cynically at the suggestion of bringing a new developer onto the team... you're probably doing something wrong.

    1. Re:One more thing by ibennetch · · Score: 1
      remember you are always balancing ease of development and ease of maintenance.
      and
      quickly becomes a nightmare in maintenance...
      These are very good points -- When you have a minor change in text, you've got to worry about getting it translated into every language you support. Who's going to do the translations? Maybe I'm missing something obvious by not working at a huge corporation (like "we just send it to our Beijing and Madrid offices and they do the translations into their local languages"), but it seems to me that this is going to be a ton of work on top of the initial redesign. Of course, you probably already realize that.

      In an effort to stay on-topic, how I would implement this depends on exactly what your company does and how extensive your web site is. It could be as simple as putting a bunch of languages in the database, keeping track of the user's choice based on a session variable (and maybe even guessing the initial choice based on their IP...just a thought). Then pulling the correct information out of the database should be rather easy -- instead of finding, say, article 12 or with title 'foo' (depending on how your site's written), just pull id=2&lang=en or foo.py&lang=de and let the database handle the rest.
  37. Re:Indian companies are very qualified for this st by cakoose · · Score: 1

    Agree. Most colleges in India use English as the medium of instruction. Until recently, anyone who has had to deal with a computer was probably relatively fluent in English so there was never an urgent need to deal with internationalization.

  38. Re:Indian companies are very qualified for this st by pete6677 · · Score: 1

    The moderator disagreed with your opinion, which pretty much defines "flamebait", at least according to most Slashdot moderators. This is why I meta-mod just about all flamebaits as unfair.

  39. Why? by nurb432 · · Score: 0, Troll

    Those stupid langauge dont count, so why support them?

    If you cant speak/read English, then screw you.

    Hell, if you arent an American, screw you. Even better.

    Ya, mod me down. I dont care. Ill be the one laughing when your job is outsourced. You people cant hide from the truth forever.

    --
    ---- Booth was a patriot ----
  40. Maybe because by Anonymous Coward · · Score: 0

    MySQL and PHP are not scalable solutions? Ever thought of that?

  41. phpBB by xluap · · Score: 1

    The open source forum phpBB uses I18n. You can study its source as an example of how this can be done in PHP.

    www.phpbb.com

  42. Gettext and separate version. by Bitsy+Boffin · · Score: 2, Insightful

    Use gettext for general string i18n & l10n. Gettext is the de facto standard, it works, it's reasonably efficient, and there are many tools to support "unskilled" localisers to do the translating for you.

    For large or potentially dynamic text l10n (eg entire content of pages, descriptions of products in a database, etc etc..) then you need to have 1 version for each language you are supporting (you COULD do it through gettext but it would be rather tedious). How you do that is of course 100% dependent on your application.

    --
    NZ Electronics Enthusiasts: Check out my Trade Me Listings
    1. Re:Gettext and separate version. by Anonymous Coward · · Score: 0

      Is "gettext" so incredibly superior that we should disregard a real standard like "catgets" from X/Open?

  43. This method fails for many things by Dancin_Santa · · Score: 1

    The way you've mentioned works fine if you are only going to display static strings, but if you wish to display dynamic strings you will need a different approach.

    Languages like English are SVO while other languages are SOV. Throw in a few extra grammar rules and a simple string substitution scheme becomes impossible because printf("%s %s %s", S, V, O); will simply not create correct strings for any language that uses a different ordering.

    1. Re:This method fails for many things by jd · · Score: 1
      That's true, I hadn't considered that aspect. What you would need to do, in that case, is to store the ordering in the language table, then use that ordering to generate the SQL needed to pull the strings. Then you would be able to handle flexible ordering.


      Character direction is another problem, if you're going truly international, as some languages alternate between left-to-right and right-to-left on different lines. This is a problem, because you can't now just store a direction somewhere and use that to increment/decrement the cursor position. This one, I don't know how to solve at the application level. It would have to be trapped in the windowing code, if it is to work correctly.


      (Actually, it gets worse - the language used on Easter Island was not only written in alternate directions, alternate lines were written on alternate sides. The Phaistos Disc has text written spirally. There are probably many other "special cases" out there that I don't know about. It would be unimaginably bloaty to modify X's text-handler to support every writing practice ever used, and if you want this to be cross-platform, forget it, as you can't add such capabilities to Windows and even adding them in a portable way to Java would be horribly complicated.)

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    2. Re:This method fails for many things by Anonymous Coward · · Score: 0

      Have a look at the Qt 4 implementation of it.

      I think it's called Uniscribe or something... Truly wonderful way of handling virtually any kind of language and rendering it in a variety of ways.

      Trolltech have solved this stuff already, so KDE apps should have no problem supporting esoteric languages.

    3. Re:This method fails for many things by CableModemSniper · · Score: 1

      Forget that. Just use a substitution with labels instead of sprintf-ing everything. Then you can have "Hello my name is $name." and "Guten Tag, Ich heisse $name." (Ok, ok in this instance it's the same order, but I'm having trouble thinking of better examples and I am lazy and you get the idea and it makes way more sense than %s.) Just name the substitution arguments.
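
      A small sketch of named substitution where the word order really does differ between languages. It uses Python's string.Template, which happens to share the $name syntax; the message table and the render helper are hypothetical.

```python
from string import Template

# Each translation places the named arguments wherever its grammar needs them.
MESSAGES = {
    "en": Template("$file was not found in $folder."),
    "de": Template("Im Ordner $folder wurde $file nicht gefunden."),
}


def render(lang, **values):
    """Fill the message for the given language with named values."""
    return MESSAGES[lang].substitute(**values)


print(render("en", file="a.txt", folder="docs"))
print(render("de", file="a.txt", folder="docs"))
```

      The calling code passes the same arguments either way; only the translator's template decides the order.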

      --
      Why not fork?
  44. Just curious by Anonymous Coward · · Score: 0

    Who or what is (a) "Booth"?

    1. Re:Just curious by Anonymous Coward · · Score: 0

      I always assumed he was referring to John Wilkes Booth, the many who assassinated Abraham Lincoln.

    2. Re:Just curious by Anonymous Coward · · Score: 0

      d'oh, "man," not "many."

      Damn /. and its lack of an "edit" feature.

    3. Re:Just curious by nurb432 · · Score: 1

      Always? But I just changed to that sig yesterday... To make people think...

      --
      ---- Booth was a patriot ----
    4. Re:Just curious by Anonymous Coward · · Score: 0


      And you have succeeded! You have made me think you're an idiot.

  45. Re:Indian companies are very qualified for this st by Anonymous Coward · · Score: 0

    Same, one man's flamebait is really just another man's opposing opinion, most of the time. I also don't understand why parts of this thread have been modded offtopic, when they talk about internationalization and are therefore on topic.

  46. This is a pretty common task for OSS projects... by WoTG · · Score: 1

    might I suggest you browse various PHP OSS projects. Most of the biggest, most popular packages have language selections. You're bound to find some good examples of how to handle i18n.

  47. Re:Indian companies are very qualified for this st by The+Cydonian · · Score: 1
    Oh the situation has become muuuch better over the last two years; MS is big time into Indic computing, and there's been a fair bit of work done on the OSS front as well.

    But otherwise, the broader point is well-taken; despite India's obvious linguistic diversity, Indian programmers don't necessarily have an advantage over other nationalities in i18n efforts.

  48. Re:Too new? by Anonymous Coward · · Score: 0

    Unicode isn't bytes, it's a standard for referencing characters. Perhaps you were thinking of UTF-8 or UCS-2.

    I guess you were happy with FORTRAN character strings, too? They're just bytes after all. Why create a new type for text data when you can just pack them in integers?

    If you know absolutely nothing about a topic, please don't post.

  49. i18nHTML by Anonymous Coward · · Score: 0

    Look at i18nHTML

  50. It's a pun! by Anonymous+Brave+Guy · · Score: 1
    I can't stand that abbreviation, i18n. I mean who thought that would be a good abbreviation?

    I thought this was common knowledge, but no-one seems to have posted it yet while many people seem to be asking, so: it's a pun.

    The word is written either "internationalisation" or "internationalization", depending on which English-speaking country you're in at the time, but both versions have 18 letters between the 'i' and the 'n'. As well as being shorter, "i18n" therefore works without adjustment in all English-speaking locales.
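
    The counting is easy to check with a few lines of Python. The numeronym helper is just an illustration of the rule described above (first letter, number of letters in between, last letter).

```python
def numeronym(word):
    """Abbreviate a word as first letter + count of inner letters + last letter."""
    if len(word) < 4:
        return word                  # nothing worth abbreviating
    return f"{word[0]}{len(word) - 2}{word[-1]}"


for spelling in ("internationalization", "internationalisation", "localization"):
    print(spelling, "->", numeronym(spelling))
```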

    --
    If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
  51. Re:Too new? by Anonymous Coward · · Score: 0

    Are there any live platforms that use Unicode at all but whose C++ runtime doesn't support UCS-4 (or UTF-16), UTF-8, and local encodings in wcstombs and mbstowcs?

  52. PHP + MySQL for I18N by agir · · Score: 1

    As a number of people have mentioned, Internationalization and localization can be an incredibly complex process.

    Since you are working with an existing system, you don't have the option of designing in I18N support from the very beginning.

    Get a good book.

    I recommend "XML Internationalization and Localization" by Yves Savourel, and "Beyond Borders: Web Globalization Strategies" by John Yunker. Both the authors have been in the I18N business a long time. They know what they are talking about.

    Choose your tools wisely.

    Use MySQL 4.1 (or newer) --

    Since MySQL 4.1, you have the option of choosing which character set to use on a per DB, per table, or per field basis. The simplest solution is to just make the entire DB use the UTF-8 character set (This may not be appropriate for reasons of optimization or other reasons).

    Learn about Unicode/UTF-8. (Others have provided links)

    Store your localized data in UTF-8. Using a single character set makes life much easier.

    Use a fairly recent version of PHP --

    PHP 4.1.1 (or newer) comes bundled with GNU Gettext.

    GNU gettext

    http://www.gnu.org/software/gettext/ You probably don't need to download it, since it should be included with your version of PHP. Just enable it in the php.ini, or compile it in from source.

    GNU Gettext has been around for a number of years. It's fairly efficient, well maintained and has a large user base. It basically makes use of mapping a reference ID and a language-locale to a string of text. It replaces the ID with the appropriate text in your template to create a finished document. Text for different language-locales is stored in separate files called PO files.
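
    The mapping idea can be sketched in a few lines of Python. This is not the gettext API: the in-memory dictionaries stand in for compiled PO files, and the catalog contents and the translate function are made up for illustration.

```python
# Hypothetical stand-in for per-locale PO catalogs: message ID -> translated text.
CATALOGS = {
    "fr_FR": {"Name": "Nom", "Save": "Enregistrer"},
    "de_DE": {"Name": "Name", "Save": "Speichern"},
}


def translate(msgid, locale):
    """Look up msgid for the locale, falling back to the ID itself.

    Falling back to the untranslated ID mirrors how gettext behaves
    when a catalog or an individual entry is missing.
    """
    return CATALOGS.get(locale, {}).get(msgid, msgid)


print(translate("Save", "fr_FR"))
print(translate("Cancel", "fr_FR"))
```

    Because the untranslated ID is itself readable text, a page with a partly translated catalog still renders.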

    You will also want a PO file editor.

    Here are a couple of articles on GNU Gettext

    http://www.phpdig.net/ref/rn26.html
    http://www.onlamp.com/pub/a/php/2002/06/13/php.html
    http://www.uberdose.com/php/php-and-gettext-for-i18n/

    If you are going to be using professional translators, you may want to consider XLIFF as a document exchange format. There are XLIFF to PO converters available.

    You may be considering XML (XHTML, XSLT and XLIFF) for Internationalization. The PHP solution, using Sablotron, is not yet fully-baked. I would avoid it for a production system. It shows promise for the future. Plus, XLIFF is not recommended as a storage format. You'll probably find some performance issues if you try to use it as a direct data store.

    Use templates, if at all possible.

    You may not be able to use the same template for all language-locales, but one template should work for most cases. A bidirectional (bidi) language, for example Arabic or Hebrew, would likely need a separate template.

    Localize your CSS stylesheets.

    You may have locale specific layout and formatting information in your stylesheets.

    From a design point of view, consider using a combination of a Front Controller pattern to switch languages and a Page Controller pattern to apply the templates.
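
    A rough sketch of that split follows, in Python only to keep it short. The TEMPLATES table, the SUPPORTED list, the "lang" request parameter, and all function names are assumptions for illustration, not part of any framework. The front controller step decides the language once per request; the page controller step only applies the template for that language.

```python
# Hypothetical templates, keyed by (page, locale); "ar" gets its own
# right-to-left layout, other locales share the default one.
TEMPLATES = {
    ("article", "default"): "<div dir='ltr'>{body}</div>",
    ("article", "ar"): "<div dir='rtl'>{body}</div>",
}
SUPPORTED = ("en", "fr", "ar")


def pick_locale(params):
    """Front controller step: choose the locale for this request."""
    lang = params.get("lang", "en")
    return lang if lang in SUPPORTED else "en"


def article_page(locale, body):
    """Page controller step: apply the template for the chosen locale."""
    default = TEMPLATES[("article", "default")]
    template = TEMPLATES.get(("article", locale), default)
    return template.format(body=body)


def handle(params, body):
    """Single entry point: resolve the language, then dispatch to the page."""
    return article_page(pick_locale(params), body)


print(handle({"lang": "ar"}, "..."))
```

    Keeping the language decision in one place means an unsupported or missing "lang" value falls back to a default without every page having to check it.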

    Where are you storing the article data? Is it in the MySQL DB, or is it in static files that are referenced by the DB? Focus most of your efforts on the part that is most critical, MySQL if most of the data is in the DB, or PHP if most of the data is static. But remember, you are going to have to internationalize both parts of your system.

    Don't forget, text in many other languages takes up more space than English to say the same thing, sometimes 30-50% more. This can significantly impact layout in heading sections, column widths, a

  53. Best article on localization by greggman · · Score: 1

    First of all, read this article. Yes, it's about Perl, not PHP, but it's an awesome introduction to some of the problems you might have.

    http://perl.active-venture.com/lib/Locale/Maketext/TPJ13.html

  54. Re:Too new? by cicadia · · Score: 1
    Unicode isn't bytes, it's a standard for referencing characters. Perhaps you were thinking of UTF-8 or UCS-2.

    Umm, I never said "Unicode is bytes". Unicode is a standard, Unicode is a consortium, Unicode is a registered trademark of Unicode Inc. Unicode is not bytes.

    What I said was that Unicode strings are composed of bytes. A sequence of Unicode characters, under a particular encoding, is generally representable as a sequence of bytes.
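
    A quick illustration of that point, sketched in Python purely because it is short (the sample string is made up): the same ten characters come out as a different number of bytes depending on the encoding, but either way they are just bytes that can be stored and read back.

```python
# "naive cafe" with two accented letters, written with escapes so the
# exact code points are unambiguous: 10 characters, 2 of them non-ASCII.
text = "na\u00efve caf\u00e9"

utf8 = text.encode("utf-8")       # accented letters take 2 bytes each
utf16 = text.encode("utf-16-le")  # every character here takes 2 bytes

print(len(text), len(utf8), len(utf16))

# Decoding with the same encoding recovers the original characters.
assert utf8.decode("utf-8") == text
assert utf16.decode("utf-16-le") == text
```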

    I guess you were happy with FORTRAN character strings, too? They're just bytes after all.

    Happy? I suppose they're not necessarily the most efficient things in the world (I don't know for sure; I've never really used FORTRAN), but as a sequence of bytes, I'm sure they're perfectly suitable for storage and retrieval, so I guess I can be happy with them.

    Being fixed length might have some performance advantages in certain applications, too, though at the expense of some storage efficiency.

    But then, I don't really know much at all about FORTRAN, so I'm going to stop posting about it.

    --
    Living better through chemicals
  55. Re:Too new? by Anonymous Coward · · Score: 0
    good advice, jizzmop. Too bad you don't take it.

    Pull your thumb out of your ass and check out UTF 8 or html entities, you fart knocker.