Slashdot Mirror


On Creating Multilingual Web Sites?

Jens asks: "I am designing an Intranet web application that needs to come out in multiple languages. I am using PHP to include common elements in include files, which makes things a lot easier. I want to avoid making each change three times (I have someone doing the translations, however). The question is: How do I tackle the multiple languages? Do I separate design from content, or content from design? Do I write "<table><tr><td>$text[$lang]</td></tr></table>", keep the international text in include files, and then call the pages with appropriate parameters; or should I write "<?php nice_table("Dies ist der deutsche Text"); ?>" and keep three different files, but one include file with all the design elements? How do I handle buttons (i.e. graphics) with text on them?"

Multi-language sites are tricky, but as long as there is some separation of page design and language elements, it shouldn't be too hard for the rest to fall into place. Whether you separate design from content or content from design depends on the plans for your implementation. What schemes work for those webmasters out there with already established multilingual sites?

16 of 189 comments

  1. Babelfish is your friend... by Christopher+B.+Brown · · Score: 3

    Why not just write in English, and have a CGI that uses Babelfish to translate when a user logs in indicating some other language? :-)

    --
    If you're not part of the solution, you're part of the precipitate.
  2. Re:xml style approach? by Matts · · Score: 3

    Don't forget: XML was designed from the ground up for multilingual support, via the xml:lang attribute. Use that, not some invented tag!!!
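
    To make the xml:lang suggestion concrete, here is a minimal sketch in Python. The document layout (page, message and text elements) and the pick_text helper are made up for illustration; only the xml:lang attribute itself comes from the XML specification.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical document: one <text> element per language for each message.
DOCUMENT = """
<page>
  <message id="greeting">
    <text xml:lang="en">Welcome</text>
    <text xml:lang="de">Willkommen</text>
  </message>
</page>
"""


def pick_text(xml_source, message_id, lang, default_lang="en"):
    """Return the text for message_id in lang, else in default_lang, else None."""
    root = ET.fromstring(xml_source)
    for message in root.iter("message"):
        if message.get("id") != message_id:
            continue
        by_lang = {t.get(XML_LANG): t.text for t in message.findall("text")}
        return by_lang.get(lang, by_lang.get(default_lang))
    return None
```

    Falling back to a default language when a translation is missing is an assumption of this sketch, not something xml:lang does for you.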

    --

    Matt. Want XML + Apache + Stylesheets? Get AxKit.
  3. Separation of both form AND content by jd · · Score: 5
    Actually, I'd remove -BOTH- the form AND the content from the page. Fetch the outer form from one database (that saves you having multiple copies of templates, for each page), have content-specific HTML in a second database and fetch the content from a third.

    My thoughts would be to have anything not explicitly related to the actual content stored separately from content-arranging tags (such as tables, paragraphs, etc.). This lets you maximise reusability, and minimise effort.

    ie:

    Database 1 -> Outer Shell Template
    Database 2 -> Content-Specific Formatting Template
    Database 3 -> Actual Text, in 1+ records. This should contain NO tags, whatsoever. That's all done in #2.
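
    A rough sketch of the three layers in Python, with three in-memory dicts standing in for the three databases. All names and sample data here are hypothetical; the point is only that the text records carry no tags and that the two templates are fetched independently.

```python
from string import Template

# Stand-in for database 1: outer shell templates.
OUTER_SHELLS = {"default": Template("<html><body>$body</body></html>")}

# Stand-in for database 2: content-specific formatting templates.
CONTENT_FORMATS = {"news": Template("<h1>$title</h1><p>$text</p>")}

# Stand-in for database 3: actual text, no tags, keyed by page and language.
TEXTS = {
    ("news-1", "en"): {"title": "Hello", "text": "Plain text, no tags."},
    ("news-1", "de"): {"title": "Hallo", "text": "Reiner Text, keine Tags."},
}


def render(page_id, lang, fmt="news", shell="default"):
    """Fill the content template with text, then wrap it in the outer shell."""
    fields = TEXTS[(page_id, lang)]
    body = CONTENT_FORMATS[fmt].substitute(fields)
    return OUTER_SHELLS[shell].substitute(body=body)
```

    Adding a language then means adding records to the third store only; neither template changes.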

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  4. Apache and Content Negotiation by jeffg · · Score: 3

    Those looking for multilingual solutions for sites might want to look into making some use of Apache's content negotiation. See http://www.apache.org/docs/content-negotiation.html for more information.
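
    Language negotiation is driven by the browser's Accept-Language header. The Python sketch below shows the general idea of ranking the requested languages by their q-values and picking the first one the site offers. It is a simplified illustration, not Apache's actual algorithm, and the function names are invented here.

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language header, best first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # a missing q parameter means full preference
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            # Sort by descending quality, then by original position.
            weighted.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(weighted)]


def negotiate(header, available, default="en"):
    """Pick the first requested language the site offers, else the default."""
    for tag in parse_accept_language(header):
        if tag in available:
            return tag
        primary = tag.split("-")[0]  # "de-AT" may be served as "de"
        if primary in available:
            return primary
    return default
```

    Wildcards ("*") are not handled in this sketch; a real server treats them specially.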

    1. Re:Apache and Content Negotiation by dsplat · · Score: 4
      --
      The net will not be what we demand, but what we make it. Build it well.
  5. Hints for globalizing web pages by Kismet · · Score: 3

    I am currently responsible for globalizing a web project at my company. This is the first time I have had to deal with this sort of thing, and I have learned much. Here are some tips:

    1) Whether you decide to dynamically fetch strings when the page is processed, or have multiple versions of the HTML isn't as important as deciding your strategy in the first place. Make the decision and stick with it long before you start work on the actual project. Having to implement a globalization strategy for a site that has already been programmed can be difficult if not impossible. Heed the warning about separating content from code, but be sure you know how that is going to play into your strategy.

    2) Language is only part of the problem. You need to consider sort order, for example, if you are presenting sorted lists. You need to consider date and time formats and also number formats. Some countries swap the comma and the decimal point, for example. If you are planning on selling something, then multiple currency support would be useful.

    3) You need to support multiple code pages. It would be neat if you could just use UTF-8, except there is no widely available Unicode font that contains all the glyphs needed for some languages. It is poor globalization design to support only the Latin code page on the assumption that you'll never need Korean, for example.

    4) Make sure you avoid colloquialisms and other culture-centric ideas on your page. Keep it simple and as icon-free as possible. Where you have icons that contain text, keep a copy with the text layer separate from any background elements. Gimp has some features that help when localizing these bitmaps. But it's best to just avoid them.
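
    As a small illustration of point 2, here is a Python sketch that formats the same number and date under two conventions. The CONVENTIONS table is a hypothetical, hand-written stand-in; a real project would take this data from a locale database rather than hard-coding it.

```python
import datetime

# Hypothetical per-language conventions, hand-written for this example.
CONVENTIONS = {
    "en": {"decimal_sep": ".", "group_sep": ",", "date": "%m/%d/%Y"},
    "de": {"decimal_sep": ",", "group_sep": ".", "date": "%d.%m.%Y"},
}


def format_number(value, lang):
    """Format value with two decimals using the separators for lang."""
    rules = CONVENTIONS[lang]
    text = f"{value:,.2f}"  # always "," for groups and "." for decimals
    # Swap through a placeholder so the two separators do not collide.
    return (text.replace(",", "\0")
                .replace(".", rules["decimal_sep"])
                .replace("\0", rules["group_sep"]))


def format_date(day, lang):
    """Format a datetime.date using the pattern for lang."""
    return day.strftime(CONVENTIONS[lang]["date"])
```

    Sort order and currency are not covered here; they need proper collation and currency data and are much harder to fake with a lookup table.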

    The project I am heading up contains several hundred .asp files. Rather than translating every one of these files into who knows how many languages, we are creating a string resource that can be queried by a server object. Someone recommended that you look at the GNU gettext, which I second. If you can find standards that already exist, I recommend you use them.

    Someone else recommended an XML approach. Again, this is a good idea to consider.

    Don't try to re-engineer some existing code to make it global. I can't emphasize that enough. Start global from the ground up. Try to find the most intuitive means of doing so.

  6. Graphics with text. by Abigail-II · · Score: 3
    How do I handle buttons (i.e. graphics) with text on them?

    You don't. Try to imagine you are blind and need a speech interface, or that you have bad eyesight and need 48pt fonts to read something, and are then faced with a site that uses needless graphics for navigation, when written words would have done as well, if not better.

    -- Abigail

  7. I disagree... by rm+-rf+/etc/* · · Score: 4


    The thing to remember here is that PHP vs Zope is not a decision you will ever have to make. Zope is not a language, it's an application server. Python vs PHP is a comparison. If you want to talk Zope, you have to look at the PHP equivalent, Midgard (http://www.midgard-project.org/). Not that I have anything against Zope or Python, they are great tools; I just think for the task here PHP/Midgard are much better suited. Part of this is because I think PHP has a quicker learning curve and, let's face it, there's no sense in mastering a language just to create a multilingual site... Second, Midgard is much more suited to this type of thing. I suggest Zope to people who want to program a website, and Midgard to people who want to manage website content. Midgard is much more focused on content and using inherited styles and layouts, plus giving multi-user web-based access to manage content and layout. For something like this where you have the same layout and style just with different content, I think Midgard will really do this with less hassle and effort.

  8. gettext by rm+-rf+/etc/* · · Score: 5


    PHP4 can be built with gettext support. gettext is a GNU library for internationalizing programs. PHP's support is undocumented currently, so you'd have to check out the code to see what it does (in ext/gettext), but it might be worth looking into.

    Gettext info and manuals can be found at http://www.gnu.org/software/gettext/
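
    For readers who have not seen the gettext style of lookup, here is a minimal sketch using Python's standard gettext module rather than the PHP extension, whose interface is not described in this post. The "messages" domain and "locale" directory names are assumptions; catalogs are expected at locale/<lang>/LC_MESSAGES/messages.mo.

```python
import gettext


def load_translator(lang, domain="messages", localedir="locale"):
    """Return a gettext lookup function for lang.

    With fallback=True, a missing catalog yields NullTranslations,
    whose gettext() returns the message ID unchanged.
    """
    translation = gettext.translation(
        domain, localedir=localedir, languages=[lang], fallback=True
    )
    return translation.gettext


# The original string doubles as the lookup key, so pages stay readable.
_ = load_translator("de")
print(_("Welcome to my multilingual website!"))
```

    If no German catalog is installed, the call simply prints the original string, which makes it safe to ship pages before the translations arrive.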

  9. Re:some experience by jilles · · Score: 3

    "First off you've got to determine what language the end user wants to view the site with. This can be done multiple ways: client hostname, browser version (language), server hostname (ie. japan.bigcorp.com), or by hitting a button or link on the page."

    Client/server hostname is a bad idea. I'm Dutch but I live in Sweden, and when I visit www.lycos.com, I'm presented with the Swedish version of the site. My browser settings (which favor Dutch above English and English above Swedish) are ignored. As a consequence I don't use Lycos anymore. There is of course the alternative link which I can click to get the Dutch or English version, but that's too much trouble for me.

    The best way, I think, is to just ask the user what language he/she prefers. Even if English is not your native language, it is sometimes the best option, since the English version of a site is updated more often.
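
    The order of precedence argued for here can be written down in a few lines of Python. The choose_language function is hypothetical: an explicit choice by the user wins, browser preferences come second, and hostname or geography is deliberately never consulted.

```python
def choose_language(explicit_choice, browser_preferences, available, default="en"):
    """Pick a language: explicit user choice first, browser settings second."""
    if explicit_choice in available:
        return explicit_choice
    for lang in browser_preferences:
        if lang in available:
            return lang
    return default
```

    With browser preferences of Dutch, English, Swedish and a site offering only English and Swedish, this returns English rather than guessing Swedish from the visitor's location.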

    --

    Jilles
  10. Re:Well first of all.... by angelo · · Score: 3

    Damn straight! That may very well be out of my personal design theory! Tables are logical elements for organising data, not a tool to lay out your webpage! I couldn't put this better myself! Maybe if people paid attention to content, the internet would be a far better place. I suppose I'll go back to dreaming.

    PS: I tried to come up with an alternate layout for /., and made it in CSS. Not only did it look sweet, but it was changeable from a stylesheet! whoot!

  11. PHP wouldn't be my first choice by hey! · · Score: 4

    Don't get me wrong, I like PHP a lot, but its greatest strength is for people who like to work in HTML but make it smart. You can separate content and presentation in PHP, because it is a reasonably powerful language, but doing so throughout your site means turning every page into a program that emits the content in the desired language. Once you go there, you may as well consider other options for transforming content.

    There are lots of ways people have come up with for doing this kind of thing. You could do each translation as an XML document and use XSLT to convert it into a fully decorated HTML document. You could use Java servlet chains to take a simple HTML translation and add the usual banners, links and formatting. This would be nicer than transparently dynamic content because it could be cached by the user and downstream proxies.

    My current favorite method is Zope. In Zope, nested folders inherit all the characteristics of their parent folder, but allow you to override them. I use this to enforce stylistic uniformity between people maintaining content on my site, which is just a generalization of your problem.

    With Zope, you could do your original site in German, and put it in a folder called "/foo"; then create an empty folder called "/foo/EN" which by "acquisition" starts by looking identical to "/foo" even though it's blank. Then you start overriding the various text bits into English, and gradually, an English translation emerges. You could even write a little method to iterate over a bunch of links and append "/EN" to them.
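
    The acquisition idea can be imitated in plain Python to show why the empty folder starts out looking like its parent. This is a toy model with an invented Folder class, not Zope's actual API.

```python
class Folder:
    """A toy folder whose missing items are looked up in its parent."""

    def __init__(self, parent=None, **items):
        self.parent = parent
        self.items = dict(items)

    def lookup(self, name):
        """Return the nearest definition of name, walking up the parents."""
        folder = self
        while folder is not None:
            if name in folder.items:
                return folder.items[name]
            folder = folder.parent
        raise KeyError(name)


# "/foo" holds the German original; "/foo/EN" starts empty.
foo = Folder(title="Willkommen", footer="Impressum")
foo_en = Folder(parent=foo)

# Override one text bit; everything else is still acquired from "/foo".
foo_en.items["title"] = "Welcome"
```

    Until footer is overridden too, foo_en.lookup("footer") keeps returning the German text, which is the "gradually emerging translation" described above.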

    The main issue with Zope is scalability, since everything is dynamically generated. I use Squid as a reverse proxy to cut down on dynamic page generation overhead. This also turns out to be an easy method for multihoming Zope, which is a requirement that I have.

    --
    Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
  12. Re:why tear them apart at all? by thogard · · Score: 3

    I run one site that is accessed from all over the world and I can tell you that 90% aren't using a 4.0 browser (it's only about 1/3, and only those from well-off countries), and looking at how long it takes to transfer the pages, the connection between here and there is often way slower than 33k.

  13. xml style approach? by zzzeek · · Score: 5

    I haven't used PHP before, but here's the basic idea, not specific to any programming language.

    Build your site in English (or your default language), assuming we are talking about regular static HTML files or some kind of .shtml perhaps. Write a filter for your webserver (I use Java servlets mapped to *.html myself) that parses the path info of the request for something like "/german/foo/bar.html", i.e. the language identified in the beginning of the URI. Then within your HTML files, anywhere you want translated text, do it like this:

    <translation default="english" modulename="/foo/bar/mytext.html">
    This is my english text.
    </translation>

    Then within your filter servlet or Apache module or whatever, parse the file for these tags (caching schemes can be utilized for speed), and then based on the language encoded in the URL, dynamically replace the body text (if the URI-specified language is not the default) with the contents of a file <languagebase>/foo/bar/mytext.html. So you could have something like /web/translations/german/foo/bar/mytext.html, /web/translations/spanish/foo/bar/mytext.html, etc. If a translation file is not present then you just use the default text already in the document, so you can still launch new HTML pages even if a translation is not available yet.
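
    Here is a rough Python sketch of such a filter, leaving out the web server plumbing. A dict keyed by (language, modulename) stands in for the translation files on disk, and the function names are invented for this example.

```python
import re

# Matches the <translation> tag shown above and captures its parts.
TRANSLATION_TAG = re.compile(
    r'<translation\s+default="(?P<default>[^"]*)"\s+'
    r'modulename="(?P<module>[^"]*)">(?P<body>.*?)</translation>',
    re.DOTALL,
)


def split_language(path, languages, default="english"):
    """Split "/german/foo/bar.html" into ("german", "/foo/bar.html")."""
    parts = path.lstrip("/").split("/", 1)
    if parts[0] in languages:
        rest = "/" + parts[1] if len(parts) > 1 else "/"
        return parts[0], rest
    return default, path


def translate(html, lang, translations):
    """Replace each <translation> body with the text stored for lang.

    translations maps (lang, modulename) to text; it stands in for files
    such as /web/translations/german/foo/bar/mytext.html.
    """
    def replace(match):
        default_text = match.group("body").strip()
        if lang == match.group("default"):
            return default_text
        stored = translations.get((lang, match.group("module")))
        # No translation yet: keep the default text already in the page.
        return stored if stored is not None else default_text

    return TRANSLATION_TAG.sub(replace, html)
```

    The same translate function could be called from a batch script to pregenerate the static per-language trees instead of filtering on every request.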

    If you want to raise the bar on speed, don't use a dynamic filter; just write a Perl script to regenerate an entire static site underneath "/german", "/spanish", etc., using the same scheme. Or you could even mix up the two approaches.

    If your site is not static HTML but some kind of database driven thing, you can still use a similar approach, it just means the filtering program has to be molded to fit your content-delivery environment.

  14. Well first of all.... by elegant7x · · Score: 3

    Don't use tables at all; use CSS layout. You'll find it makes it a lot easier to separate content: people with 4.0+ browsers will see the cool stuff, and people with older browsers will see some old-school, highly readable HTML. Table layout is dead, long live CSS...

    Most HTML+CSS pages are readable right from the source, so it would be easy to translate each file as a whole.

    If I were in your place, I'd put every paragraph in a database table, with each row having the text in each language. That way, the translators could work on one paragraph at a time, and it would still be easy to update.
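
    A minimal sketch of that table in Python, using an in-memory SQLite database. The table name, column names and the fallback to English for untranslated paragraphs are all assumptions made for this example.

```python
import sqlite3

# Allow-list of language columns; also guards the column name in the query.
LANGUAGE_COLUMNS = {"en", "de", "fr"}


def build_paragraph_table(connection):
    """Create one row per paragraph with one column per language."""
    connection.execute(
        "CREATE TABLE paragraphs ("
        " id INTEGER PRIMARY KEY,"
        " en TEXT NOT NULL,"
        " de TEXT,"
        " fr TEXT)"
    )
    connection.executemany(
        "INSERT INTO paragraphs (id, en, de, fr) VALUES (?, ?, ?, ?)",
        [
            (1, "Welcome.", "Willkommen.", "Bienvenue."),
            (2, "Not translated yet.", None, None),
        ],
    )


def paragraph(connection, paragraph_id, lang):
    """Return the paragraph in lang, falling back to English if missing."""
    if lang not in LANGUAGE_COLUMNS:
        raise ValueError(f"unsupported language: {lang}")
    # The column name was checked against the allow-list above.
    row = connection.execute(
        f"SELECT COALESCE({lang}, en) FROM paragraphs WHERE id = ?",
        (paragraph_id,),
    ).fetchone()
    return row[0] if row else None
```

    One column per language keeps each paragraph's translations side by side for the translators, at the cost of a schema change whenever a language is added.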

    Amber Yuan 2k A.D

    --

    "and dear god does this website suck now." -- CmdrTaco
  15. I've done this with MySQL and a perl script. by broken77 · · Score: 3

    I've done this on a recent project by storing all text in a MySQL database and writing a simple Perl script to merge the text from the DB into the HTML files.

    It goes like this:

    1. Make a language resource table. Call it "RESOURCES". The columns are TEXT_ID, LANGUAGE, and TEXT_DATA.

    2. Make an HTML template directory. You will store all "raw" HTML files here. Beneath this directory, make subdirectories for each language (eng, frc, jpn, etc.)

    3. In the HTML, make references to the database values by some easily identifiable token string, and wrap this token string around the TEXT_ID value from the database for this text resource. If you want, you can put the English equivalent inside the token string, so you can read the templates in their raw form. E.g.:

      <p>##38471::Welcome to my multilingual website!##</p>

    4. Write a script (I chose perl) to:

      - Read the templates
      - For each language to translate:
        - Look for the existence of the token string (## in this case)
        - Take the resource ID, do a database lookup based on the language
        - Substitute the resource text for ##ID::string##
        - Save the modified html to the language subdirectory for this language.
      - End

    That should be it. Now, when an English-speaking person comes to your site (you'll have to ask them somehow of course), you can just redirect them to /path/templates/eng/file.html, and everything will work.
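
    The steps above can be sketched in Python instead of Perl. A dict keyed by (TEXT_ID, LANGUAGE) stands in for the RESOURCES table, and keeping the inline English text when a resource is missing is an assumption of this sketch, not something the original script is known to do.

```python
import re
from pathlib import Path

# Matches ##38471::Welcome...## and captures the ID and the inline text.
TOKEN = re.compile(r"##(?P<text_id>\d+)::(?P<fallback>.*?)##")


def merge(template, lang, resources):
    """Replace each ##ID::text## token with resources[(ID, lang)].

    resources is a dict standing in for the RESOURCES table; a missing
    entry keeps the inline text from the template.
    """
    def replace(match):
        key = (int(match.group("text_id")), lang)
        return resources.get(key, match.group("fallback"))

    return TOKEN.sub(replace, template)


def generate(template_dir, languages, resources):
    """Write a merged copy of every *.html template into <template_dir>/<lang>/."""
    template_dir = Path(template_dir)
    # glob("*.html") is not recursive, so generated files are never re-read.
    for source in sorted(template_dir.glob("*.html")):
        raw = source.read_text(encoding="utf-8")
        for lang in languages:
            target = template_dir / lang / source.name
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(merge(raw, lang, resources), encoding="utf-8")
```

    Running generate again after each template or resource change regenerates the static per-language files.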

    This doesn't address the images, however. If you're using languages that use the Western European character set (French, Spanish, English, Portuguese, German, Italian, etc.), it will be easy. You'll be able to type your text directly into Photoshop or the GIMP or whatever and make your graphics. The next thing is to put a language token in these HTML templates that you've made for all images. Something like this:

    <img src="/images/##LANGUAGE##/button.gif">

    And in your language parser, write a one-line substitution that will substitute all instances of ##LANGUAGE## to the current language you're iterating through.
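
    In Python, that one-line substitution might look like the following; set_language is a made-up name, and the directory codes (eng, frc, jpn) are the ones used earlier in this post.

```python
def set_language(template, lang):
    """Replace every ##LANGUAGE## marker with the language directory name."""
    return template.replace("##LANGUAGE##", lang)
```

    Since the marker contains no ID or "::" separator, it can be told apart from the text resource tokens.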

    If you're translating to languages with different character sets (double-byte languages such as Chinese, Korean, etc.), you'll need to create your graphics differently, but once they're created, the storage of them is the same. One way to create them is to write a CGI that will run through the DB and print, in HTML, all the text resources of a given language. If you have your browser set to the correct character set, you will see the foreign language characters correctly. You can then do a screenshot, and paste the screenshot into your graphic editor to make buttons or whatever out of.

    This approach has worked really well for us on two projects so far, and it looks like there will be more projects soon. The advantage of making these HTML templates is that it greatly reduces the load and time it would take to build the pages if they were dynamically created from database lookups upon request. You just run the template generating script every time a change is made to the template, and voila.

    --

    I modded the Troll Investigation and I got