Slashdot Mirror


On Creating Multilingual Web Sites?

Jens asks: "I am designing an Intranet web application that needs to come out in multiple languages. I am using PHP to include common elements in include files, which makes things a lot easier. I want to avoid making each change three times (I have someone doing the translations, however). The question is: How do I tackle the multiple languages? Do I separate design from content, or content from design? Do I write "<table><tr><td>$text[$lang]</td></tr></table>", keep the international text in include files, and then call the pages with appropriate parameters; or should I write "<?php nice_table("Dies ist der deutsche Text"); ?>" and keep three different files, but one include file with all the design elements? How do I handle buttons (i.e. graphics) with text on them?"

Multi-language sites are tricky, but as long as there is some separation of page design and language elements, it shouldn't be too hard for the rest to fall into place. What determines whether you separate design-from-content or content-from-design depends on the plans for your implementation. What schemes work for those webmasters out there with already established multi-lingual sites?

64 of 189 comments

  1. Re:gettext (gettext has problems) by Anonymous Coward · · Score: 2

    Many languages have richer grammar than English, and since the phrase/word is used as an index to the translation you often get into situations where it is impossible to do a correct translation. For example, the word 'new' in English is used for both singular and plural; when gettext generates the .po file it references every place where the word 'new' is used, and it doesn't allow you to split up the entry manually to provide different translations for different contexts.

    X/Open catgets and the Java Locale system do not have this weakness, but require more maintenance.
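
    A minimal sketch of the difference, in Python rather than PHP. The dictionaries, the context names and the French strings are illustrative assumptions, not a real gettext or catgets format. Keying on the English word alone leaves room for only one translation; keying on a context plus the word lets the singular and plural uses diverge:

```python
# Hypothetical in-memory catalogs for illustration; not a real .po/.mo format.
# Keyed by msgid only: "new" can map to just one French form.
BY_MSGID = {"new": "nouveau"}

# Keyed by (context, msgid): each use of "new" gets its own entry.
BY_CONTEXT = {
    ("one message", "new"): "nouveau",
    ("many messages", "new"): "nouveaux",
}


def translate(msgid):
    """Look up a string by its English text alone."""
    return BY_MSGID.get(msgid, msgid)


def translate_in_context(context, msgid):
    """Look up a string by context and English text, falling back to the msgid."""
    return BY_CONTEXT.get((context, msgid), msgid)


singular = translate_in_context("one message", "new") + " message"
plural = translate_in_context("many messages", "new") + " messages"
```

    Python's own gettext module offers the same idea as pgettext(context, message) in version 3.8 and later.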

  2. yes... by mosch · · Score: 2

    because perl gives you more than one way to do it. and they're both wrong. (hey, you flamed first)

    PHP isn't the root of all evil, and I've actually seen quite a bit of really elegantly written large php sites (500K+ codebases) that were maintainable and easily understandable.
    ----------------------------

  3. how to internationalize in php3 by mosch · · Score: 2

    At the top of the page call the secret function:

    init_i18n();

    and then use the super-secret _( macro. How it works is this. Let's say you have a function foo that's defined to be called foo (string baz, int bar). Instead of calling foo('title', 3) you'd call foo(_('title'), 3), and echo "foo"; changes accordingly to echo _("foo");

    Now you just need to follow the instructions to set up your strings files, but you now know how to do the php specific gettext wrapping.

    You'll have to have the standard files containing language specific msgid->msgstr mappings, and it's helpful if you cook up a little script to grab every string and create a base msgid file if you're managing a large source tree. It's not the most intuitive or well documented procedure, unfortunately. At some point I'll have to write good documentation on how it all works, but hopefully this is at least somewhat useful.
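
    The "little script to grab every string" could look roughly like the following Python sketch. The regular expression, the function names and the sample PHP line are assumptions made for illustration; it ignores escaped quotes and concatenated strings, and GNU gettext's own xgettext tool is the more robust choice where it understands your source language.

```python
import re

# Matches _('...') or _("...") calls. A simplification: escaped quotes and
# string concatenation inside the call are not handled.
CALL_RE = re.compile(r"""_\(\s*(['"])(.*?)\1\s*\)""")


def extract_msgids(source):
    """Return the unique strings wrapped in _(), in order of first appearance."""
    seen = []
    for match in CALL_RE.finditer(source):
        msgid = match.group(2)
        if msgid not in seen:
            seen.append(msgid)
    return seen


def to_po_template(msgids):
    """Render msgids as empty msgid/msgstr pairs for a translator to fill in."""
    entries = []
    for msgid in msgids:
        escaped = msgid.replace("\\", "\\\\").replace('"', '\\"')
        entries.append('msgid "%s"\nmsgstr ""\n' % escaped)
    return "\n".join(entries)


# Hypothetical PHP source line using the wrapping described above.
php_source = """<?php foo(_('title'), 3); echo _("foo"); echo _('title'); ?>"""

msgids = extract_msgids(php_source)
template = to_po_template(msgids)
```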


    ----------------------------

  4. Babelfish is your friend... by Christopher+B.+Brown · · Score: 3

    Why not just write in English, and have a CGI that uses Babelfish to translate when a user logs in indicating some other language? :-)

    --
    If you're not part of the solution, you're part of the precipitate.
  5. Re:xml style approach? by Matts · · Score: 3

    Don't forget: XML was designed from the ground up for multilingual support, via the xml:lang attribute. Use that, not some invented tag!!!
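
    As a rough illustration, here is a Python sketch that picks an element by its xml:lang attribute. The page/title element names and the pick() helper are made up for the example; only the xml:lang attribute itself comes from the XML specification.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical document: the same element repeated once per language.
doc = ET.fromstring(
    "<page>"
    '<title xml:lang="en">Welcome</title>'
    '<title xml:lang="de">Willkommen</title>'
    "</page>"
)


def pick(parent, tag, lang, default="en"):
    """Return the text of the child element in the requested language."""
    by_lang = {el.get(XML_LANG): el.text for el in parent.findall(tag)}
    return by_lang.get(lang, by_lang.get(default))


german_title = pick(doc, "title", "de")
fallback_title = pick(doc, "title", "fr")  # no French entry, falls back to English
```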

    --

    Matt. Want XML + Apache + Stylesheets? Get AxKit.
  6. Re:Separation of both form AND content by jd · · Score: 2

    Actually, you can cut out steps 2 and 3 in that. The browser passes the language setting to the server in the Accept-Language header, which the server can handle using MultiViews. That way, you can get the browser and server to take that part of the workload off your back.
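
    If you ever need to do the same matching in your own code, a minimal Python sketch of it follows. The function names and the sample header are assumptions for illustration, and a real server's negotiation handles more cases than this.

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language header, most preferred first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0  # a tag without a q value counts as fully preferred
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality > 0:  # q=0 means "not acceptable"
            weighted.append((-quality, position, tag.strip().lower()))
    return [tag for _, _, tag in sorted(weighted)]


def choose_language(header, available, default="en"):
    """Pick the first preferred language the site actually offers."""
    for tag in parse_accept_language(header):
        if tag in available:
            return tag
        primary = tag.split("-")[0]  # e.g. "de-at" falls back to "de"
        if primary in available:
            return primary
    return default


# Hypothetical visitor: prefers Dutch, then English, then Swedish.
preferences = parse_accept_language("nl, en;q=0.8, sv;q=0.5")
chosen = choose_language("nl, en;q=0.8, sv;q=0.5", {"en", "sv", "de"})
```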

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  7. Separation of both form AND content by jd · · Score: 5
    Actually, I'd remove -BOTH- the form AND the content from the page. Fetch the outer form from one database (that saves you having multiple copies of templates, for each page), have content-specific HTML in a second database and fetch the content from a third.

    My thoughts would be to have anything not explicitly related to the actual content stored separately from content-arranging tags (such as tables, paragraphs, etc.). This lets you maximise reusability, and minimise effort.

    ie:

    Database 1 -> Outer Shell Template
    Database 2 -> Content-Specific Formatting Template
    Database 3 -> Actual Text, in 1+ records. This should contain NO tags, whatsoever. That's all done in #2.
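
    A small Python sketch of the three layers, with plain dictionaries standing in for the three databases. The template strings, record ids and the render() helper are all invented for the example.

```python
from html import escape
from string import Template

# "Database 1": one outer shell shared by every page.
SHELL = Template("<html><head><title>$title</title></head><body>$body</body></html>")

# "Database 2": content-specific formatting, tags only.
FORMATS = {"article": Template("<h1>$title</h1><p>$text</p>")}

# "Database 3": tag-free text, keyed by (record id, language).
TEXTS = {
    ("welcome", "en"): {"title": "Welcome", "text": "This is the English text."},
    ("welcome", "de"): {"title": "Willkommen", "text": "Dies ist der deutsche Text."},
}


def render(record_id, kind, lang):
    """Combine shell, formatting template and text into one page."""
    # The text layer holds no markup, so escape it before inserting it into HTML.
    fields = {name: escape(value) for name, value in TEXTS[(record_id, lang)].items()}
    body = FORMATS[kind].substitute(fields)
    return SHELL.substitute(title=fields["title"], body=body)


german_page = render("welcome", "article", "de")
```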

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    1. Re:Separation of both form AND content by penguinicide · · Score: 2
      You're starting to get into web app server areas. Vignette and Zope support things of this nature. (Although Zope has a context sensitive subclassing feature I like a lot (Vignette doesn't))

      Zope Vignette

      There are others(possibly asp, coldfusion, jsp, etc...), but these are the ones I know a bit about.

      --


      penguinicide... when jumping out a window just won't do.
  8. Use Language Metatag; Don't Specify Font Face by MoNickels · · Score: 2

    Two of the simplest but most widespread problems with viewing foreign language web sites, particularly in non-Roman character sets, are when the language is not specified in a metatag or the font face *is* specified in HTML.

    The first problem has been described elsewhere on this page, but deserves reiteration: specifying the encoding language in HTML can allow the correct font, language script and character set to be used automatically by the browser without scripting on the server end. On my Mac, for example, this would allow the Haaretz newspaper site to come up automatically in Hebrew (having chosen the correct language script and font for me). I would not have to manually choose the settings. Good examples of where this is not done but should be are the Yahoo! Asian sites. The only way my browser knows I'm looking at Korean text on Yahoo! Korea is because I choose it; the HTML pages are not encoded to tell my browser this.
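
    For illustration, a Python sketch that emits a page declaring both its language and its character encoding. The render_page() helper and the sample Hebrew strings are assumptions; the point is only that the lang attribute and the charset declaration travel with the page.

```python
from html import escape


def render_page(lang, charset, title, body_text):
    """Build a page that states its language and character encoding up front."""
    return (
        '<html lang="%s">\n'
        "<head>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=%s">\n'
        "<title>%s</title>\n"
        "</head>\n"
        "<body><p>%s</p></body>\n"
        "</html>\n"
    ) % (
        escape(lang, quote=True),
        escape(charset, quote=True),
        escape(title),
        escape(body_text),
    )


page = render_page("he", "utf-8", "שלום", "שלום עולם")
# The bytes sent to the browser must really use the declared encoding.
encoded = page.encode("utf-8")
```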

    Scripting to determine what preferred language is chosen in the browser is, in my opinion, the hallmark of a great multi-language site. The second hallmark is a link on every page to switch to another language at will.

    A similar item happens in bad email programs: they do not specify the character encoding in the header. One of the nicest things about a great email program is that when I get, say, a Japanese-encoded email, even if the words themselves are English (by using the Roman characters built into the Japanese encoding group), it kicks in the Japanese character entry and editing system automatically. It recognizes it, as it should.

    Intentionally *not* specifying font face is equally important. If I want to view a web site in Hebrew or Arabic, a ridiculous number of the sites require that I download the particular font they have specified in their HTML. This is preposterous. Language encoding, used properly, might mean never having to download another font again.

    As for graphics in lieu of foreign fonts: avoid it any way you can. It makes copying and pasting near-impossible. (I once had to do it this way: I saved the GIF image to my hard drive, converted it to TIFF and ran it through an optical character recognition program, then cleaned it up manually. A huge waste of my time).

    --

    Wordnik, a dictionary project which aims to collect

  9. Apache and Content Negotiation by jeffg · · Score: 3

    Those looking for multilingual solutions for sites might want to look into making some use of Apache's content negotiation. See http://www.apache.org/docs/content-negotiation.html for more information.

    1. Re:Apache and Content Negotiation by dsplat · · Score: 4
      --
      The net will not be what we demand, but what we make it. Build it well.
  10. Advanced approach by jole · · Score: 2

    The approach we are using is to separate both language and style from the content by putting all the text into a database and building an object-oriented set of libraries which automatically select UI style and language using sessions.

    example:
    $page = new Page(); # New page object is initialized; user, language and style are identified
    $page->title('7652'); # Title phrase number 7652 is added to the page object in the user's language and style
    $page->paragraph('7898'); # Chunk of text is added
    $page->showpage(); # Page is shown to user in selected style

    The phrase-table is then indexed by phrase-numbers and languages and modification dates are kept up-to-date for all the phrases. All the content on the pages can be modified on the fly by using online-editors, which gives translators and content creators access to phrase-table. Editors also make the phrase numbering totally transparent. The phrase table is distributed to multiple development machines and synchronized over CVS using special synchronization tools. The approach is very fast on Apache+DB2+Linux+embPerl platform, but caching into static perl-hashes in Apache-registry is also possible to make it even faster.
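
    A rough Python sketch of such a phrase table, with an in-memory dictionary in place of the database. The class names, the fallback rule and the sample phrases are assumptions; only the method names title, paragraph and showpage and the phrase numbers are borrowed from the example above.

```python
from datetime import datetime, timezone


class PhraseTable:
    """Phrases indexed by (phrase number, language), with modification times."""

    def __init__(self, fallback_lang="en"):
        self.fallback_lang = fallback_lang
        self._rows = {}  # (phrase_id, lang) -> (text, modified)

    def set(self, phrase_id, lang, text):
        self._rows[(phrase_id, lang)] = (text, datetime.now(timezone.utc))

    def get(self, phrase_id, lang):
        # Assumed policy: fall back to one default language if a translation is missing.
        row = self._rows.get((phrase_id, lang))
        if row is None:
            row = self._rows.get((phrase_id, self.fallback_lang))
        if row is None:
            raise KeyError(phrase_id)
        return row[0]


class Page:
    """Collects phrases by number and renders them in one language."""

    def __init__(self, phrases, lang):
        self.phrases = phrases
        self.lang = lang
        self.parts = []

    def title(self, phrase_id):
        self.parts.append("<h1>%s</h1>" % self.phrases.get(phrase_id, self.lang))

    def paragraph(self, phrase_id):
        self.parts.append("<p>%s</p>" % self.phrases.get(phrase_id, self.lang))

    def showpage(self):
        return "\n".join(self.parts)


phrases = PhraseTable()
phrases.set("7652", "en", "Welcome")
phrases.set("7652", "de", "Willkommen")
phrases.set("7898", "en", "Only the English text exists so far.")

page = Page(phrases, "de")
page.title("7652")
page.paragraph("7898")
output = page.showpage()
```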

    You can go to our site (Europe's first and only virtual hospital providing healthcare on the net) Atuline.COM to try it out. Go to demo and inside the virtual hospital try to change language and user interface look'n'feel from settings/interface. Currently only 5 languages / 3 looks are supported but there are more to come.

    -- Joonas

    --
    Vaadin - the best open source framework for building web applications in Java - no plug
  11. Re:The java way by Nicolas+MONNET · · Score: 2

    Microsoft kind of had a monopoly as of late WRT performance-wise USABLE XSLT parsers. Now this is about to change with Apache's Xalan-C. It should be quite fast. See Apache-XML's site.

  12. Re:The java way by Nicolas+MONNET · · Score: 2

    People are the problem, and that's precisely my point. I don't want to trust any dumb luser around. I'm just not suicidal. The same way that my servers hopefully don't go down as often as stupid lusers remove important files "by mistake". Thank you for your attention.

  13. Text Base by howardjp · · Score: 2

    I have recently run across something similar. I want to convert a telnet-based bbs (called M-Net) to a multilingual site. The problem there is that we are limited to merely what VT100 will let us output. In the end, we decided to translate the documents and use basic locale and i18n features already in the OS to help. It is not quite the same but I hope it helps.

  14. No Multilingual Site Is Complete Without... by unitron · · Score: 2
    No Multilingual Site Is Complete Without...

    The Irish-language curse engine (An tInneal Mallachtaí)!

    Jefferson City, Missouri's Lincoln University offers this amusing little interactive time sponge here and The Register explains it here.

    --

    I see even classic Slashdot is now pretty much unusable on dial up anymore.

  15. Re:European posters [OT] by QZS4 · · Score: 2

    Europeans get moderation points too. I usually spend them during Swedish daytime (yes, I should be working...).

  16. I'm working on something like this right now! by orabidoo · · Score: 2
    I'm working on a multi-lingual-capable system these days; it puts together static text from various templates (in various files) with script-generated sections, and has meta-commands for "this bit in french" "this bit in english" for cases where you want to put everything into a single file (otherwise you can just have one file for each language version). the driving ideas are:
    1. total separation of content and code (to the extent that the content files don't even have conditionals, all they have is a way to give names to different bits and specify templating relationships)
    2. dynamic "compilation" of pages into data structures, with caching of this intermediate form, and
    3. the fundamental unit is a content page that calls code, not a code page that outputs content.
    the whole thing is perl-based, on top of mod_perl and speaking directly to apache's perl API (i.e. not using CGI.pm or Apache::Registry).

    send me an e-mail if you're interested; it's in a very raw format now and not near release, but i'll probably be able to release it as open source. if not, well, at least we can talk about it :)

  17. YES! Re:gettext by Brent+Nordquist · · Score: 2
    PHP4 gettext is the way to go. IMP (a GPL'ed web-based mail reader) is fully internationalized and this is the direction they're going.

    The PHP4 function _(x) is a synonym for gettext(x), so the code ends up being very readable for the maintainers: _('Permission denied.')
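
    The same alias convention exists outside PHP. Below is a minimal Python sketch using the standard gettext module; the DictTranslations class and the German string are assumptions standing in for a compiled .mo catalog, which is what you would normally load.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Stand-in for a compiled catalog: translations come from a plain dict."""

    def __init__(self, catalog):
        super().__init__()
        self._dict = catalog

    def gettext(self, message):
        # Untranslated strings fall through unchanged, as gettext does.
        return self._dict.get(message, message)


translations = DictTranslations({"Permission denied.": "Zugriff verweigert."})

# Bind the conventional short alias to the lookup function.
_ = translations.gettext

translated = _("Permission denied.")
untranslated = _("File not found.")
```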

    --
    Brent J. Nordquist N0BJN
  18. Hints for globalizing web pages by Kismet · · Score: 3

    I am currently responsible for globalizing a web project at my company. This is the first time I have had to deal with this sort of thing, and I have learned much. Here are some tips:

    1) Whether you decide to dynamically fetch strings when the page is processed, or have multiple versions of the HTML isn't as important as deciding your strategy in the first place. Make the decision and stick with it long before you start work on the actual project. Having to implement a globalization strategy for a site that has already been programmed can be difficult if not impossible. Heed the warning about separating content from code, but be sure you know how that is going to play into your strategy.

    2) Language is only part of the problem. You need to consider sort order, for example, if you are presenting sorted lists. You need to consider date and time formats and also number formats. Some countries swap the comma and the decimal point, for example. If you are planning on selling something, then multiple currency support would be useful.

    3) You need to support multiple code pages. It would be neat if you could just use UTF-8, except there is no widely available Unicode font that contains all the glyphs needed for some languages. It is poor globalization design to only support the Latin code page on the assumption that you'll never need Korean, for example.

    4) Make sure you avoid colloquialisms and other culture-centric ideas on your page. Keep it simple and as icon-free as possible. Where you have icons that contain text, keep a copy with the text layer separate from any background elements. Gimp has some features that help when localizing these bitmaps. But it's best to just avoid them.
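
    To make point 2 concrete, here is a small Python sketch of per-locale number and date formatting. The FORMATS table is hand-written for the example and covers only two locales; a real project would more likely rely on the platform's locale data.

```python
from datetime import date

# Hand-written rules for illustration only.
FORMATS = {
    "en_US": {"thousands": ",", "decimal": ".", "date": "%m/%d/%Y"},
    "de_DE": {"thousands": ".", "decimal": ",", "date": "%d.%m.%Y"},
}


def format_number(value, locale_name):
    """Format a number with two decimals using the locale's separators."""
    rules = FORMATS[locale_name]
    whole, fraction = "{:,.2f}".format(value).split(".")
    return whole.replace(",", rules["thousands"]) + rules["decimal"] + fraction


def format_date(day, locale_name):
    """Format a date in the locale's usual numeric order."""
    return day.strftime(FORMATS[locale_name]["date"])


us_number = format_number(1234567.891, "en_US")
de_number = format_number(1234567.891, "de_DE")
us_date = format_date(date(2000, 3, 31), "en_US")
de_date = format_date(date(2000, 3, 31), "de_DE")
```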

    The project I am heading up contains several hundred .asp files. Rather than translating every one of these files into who knows how many languages, we are creating a string resource that can be queried by a server object. Someone recommended that you look at the GNU gettext, which I second. If you can find standards that already exist, I recommend you use them.

    Someone else recommended an XML approach. Again, this is a good idea to consider.

    Don't try and re-engineer some existing code to make it global. I can't emphasize that enough. Start global from the ground up. Try to find the most intuitive means of doing so.

  19. Structure by turg · · Score: 2
    Part of the answer depends on what you want the site to look like for the user. Most Canadian web sites start off with a splash/welcome page where the user chooses English or French (e.g. Canada Post). Personally, I would want a bit more functionality on the first page.

    I think that, in any case, you need to make sure that any repeated design element only occurs once so you never need to make the same change in multiple places to redesign the site. This seems sort of obvious, but it's hard to be more specific without more details about your project. I have always tended (when using SSI) to have the "actual" html file contain the content and call the design up from a single template -- and have been very happy with this approach. You need to decide whether you are adding another decision layer to this (e.g. from content+design to content+design+language -- or just stay with content+design and make separate pages for separate languages).

    ======
    Webmasters: get a Free Palm Pilot for referring 25 signups (Web-based games).

    ========

    --
    <sig>Guvf vf abg n frperg zrffntr
  20. Buttons by turg · · Score: 2

    For the question about buttons, your life would probably be easier if you used wordless icons accompanied by text -- I personally don't think it makes much sense to use GIF/JPEG/PNG to represent text in any case. But if you really need to use graphics to represent text, make it part of the variable substitution -- img src="$language$button_name.gif"

    ========

    --
    <sig>Guvf vf abg n frperg zrffntr
    1. Re:Buttons by Ralph+Wiggam · · Score: 2

      Just being picky: I would recommend image file names as button_name-language.gif. This would put NextPage-Eng.gif and NextPage-Ger.gif next to each other alphabetically.
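
      A quick Python sketch of that naming scheme; the button_tag() helper, the file names and the alt text are made up for illustration.

```python
from html import escape


def button_tag(button_name, lang, alt_text):
    """Build an img tag using the button_name-language.gif convention."""
    src = "%s-%s.gif" % (button_name, lang)
    return '<img src="%s" alt="%s">' % (escape(src, quote=True), escape(alt_text, quote=True))


tag = button_tag("NextPage", "Ger", "Nächste Seite")

# With the language as a suffix, variants of one button sort together.
listing = sorted(["PrevPage-Ger.gif", "NextPage-Eng.gif", "PrevPage-Eng.gif", "NextPage-Ger.gif"])
```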

      -B

  21. Re:Graphics with text. by Abigail-II · · Score: 2
    For a site where accessibility is a prime concern (a site on blindness for example)

    Accessibility should be a prime concern for every site. What on earth makes you think blind people have only a very limited range of interests? Do you think it's fine web sites use plugins that are only available for Windows users, and only a site like Slashdot should concern itself with plugins for Linux? Or would you agree people using Linux have more interests than "news for nerds"?

    -- Abigail

  22. Graphics with text. by Abigail-II · · Score: 3
    How do I handle buttons (i.e. graphics) with text on them?

    You don't. Try to imagine you are blind and need a speech interface, or that you have bad eyesight and need 48pt fonts to read something, and are then faced with a site that uses needless graphics for navigation, when written words would have done as well, if not better.

    -- Abigail

    1. Re:Graphics with text. by FreezerJam · · Score: 2

      How do I handle buttons (i.e. graphics) with text on them?

      You can use the background= feature of cells to place a background 'button' graphic in each table cell. As long as the cell is bigger than the graphic, the 'button' will float to the top left corner. Then use align (likely top and left) to get the text in the right spot. Be sure to use contrasting colors properly. Make the text part of a link, and away you go.

      Fit and finish -- use CSS to suppress the decorations on the text, since it already looks like a button. Don't forget the ALT tags! Conveniently, you will also be text-enabling your pages.

      All this applies whether you use a EN#FR#DE style inline in the page, or an external approach. The external approach works better if the text is unlikely to change often - things like a database access site, for example.

  23. Re:I have the same question/problem. by rm+-rf+/etc/* · · Score: 2
    Does PHP allow you to create objects like Java does with JavaBeans?

    Yes, php has a basic concept of objects, not incredibly powerful, but good for grouping together data and operations. Better yet, php can actually create and work with Java objects. Or better still, php can be embedded in a java servlet engine so that you can use php as a replacement for JSP in your servlets :)

  24. Re:IIS, NT and Unicode by rm+-rf+/etc/* · · Score: 2


    Keep in mind, there are a lot of reasons to use one thing or another. If people don't want to use ASP because it's an MS product, who cares? The most vocal group are always the idiots. While it's nice to do this so easily, you need to look at other things here. For example, what would it cost for this person to switch to an NT/ASP solution? Consider a hardware upgrade, software licensing, and most importantly, the time required to learn a new platform *and* a new language. This is not an economical choice and may not be worth it just to solve the problem in two lines of code rather than 50.

    Also note that with php, you can do this with equally few lines of code. Using gettext support, you simply create your translations and store them elsewhere, then by wrapping your output strings in _() they will be translated.

    There are always multiple ways to solve problems, and each one has advantages and disadvantages. Don't think that people avoid IIS/ASP just because they hate MS. Those who do don't really matter in the grand scheme of things.

  25. I disagree... by rm+-rf+/etc/* · · Score: 4


    The thing to remember here is that PHP vs Zope is not a decision you will ever have to make. Zope is not a language, it's an application server. Python vs PHP is a comparison. If you want to talk zope, you have to look at the php equivalent, midgard (http://www.midgard-project.org/). Not that I have anything against zope or python, they are great tools, I just think for the task here php/midgard are much better suited. Part of this is because I think PHP has a quicker learning curve and let's face it, there's no sense in mastering a language just to create a multilingual site... Second, Midgard is much more suited to this type of thing. I suggest Zope to people who want to program a website, and Midgard to people who want to manage website content. Midgard is much more focused on content and using inherited styles and layouts, plus giving multi user web based access to manage content and layout. For something like this where you have the same layout and style just with different content, I think Midgard will really do this with less hassle and effort.

  26. gettext by rm+-rf+/etc/* · · Score: 5


    PHP4 can be built with gettext support. gettext is a GNU library for internationalizing programs. PHP's support is undocumented currently, so you'd have to check out the code to see what it does (in ext/gettext), but it might be worth looking into.

    Gettext info and manuals can be found at http://www.gnu.org/software/gettext/

  27. I use perl but... by toofast · · Score: 2

    I have common files for page headers and footers, and separate language files for the body. The body files contain html for language-specific graphics (that contain text).

    I then use a perl script to combine the header + body + footer together.

    This is rather painful tho, as there is a lot of redundancy, and page changes are quite painful.

  28. Re:some experience by jilles · · Score: 3

    "First off you've got to determine what language the end user wants to view the site with. This can be done multiple ways: client hostname, browser version (language), server hostname (ie. japan.bigcorp.com), or by hitting a button or link on the page."

    client/server hostname is a bad idea. I'm dutch but I live in sweden and when I visit www.lycos.com, I'm presented with the swedish version of this site. My browser settings are ignored (favors dutch above english and english above swedish). As a consequence I don't use lycos anymore. There is of course the alternative link which I can click to get the dutch or english version but that's too much trouble for me.

    The best way I think is to just ask the user what language he/she prefers. Even if english is not your native language, it sometimes is the best option since the english version of a site is updated more often.

    --

    Jilles
  29. Re:Well first of all.... by angelo · · Score: 3

    Damn straight! that may very well be out of my personal design theory! Tables are logical elements for organising data, not a tool to lay out your webpage! I couldn't put this better myself! Maybe if people paid attention to content, the internet would be a far better place. I suppose I'll go back to dreaming.

    PS: I tried to come up with an alternate layout for /., and made it in css. not only did it look sweet, but it was changeable from a stylesheet! whoot!

  30. roxen makes it easier by eMBee · · Score: 2
    a lot of the problems mentioned are easily solved with the Roxen WebServer.
    Roxen has an extensive SSI language (RXML) that allows you to build powerful sites.
    eg. buttons can be created by the server, with the text you specify (like <gtext>click here</gtext>), you can also use a background image and thus put any kind of text on top of any image.

    of course separation of content and design is key.
    in order to achieve 100% separation i wrote an xml template module for roxen, that allows me to specify the content in XML, and then apply a template to show it.
    roxen also makes it easy to check for the client's default language, so you could easily select the language based on that.
    the most important thing is, it all looks like html to the webauthor. there is no programming involved.

    sure, php is nice, but if you don't know about programming languages, then it will look very confusing, also it's very easy to screw up, because the non-programmer doesn't see that there is a ; or , missing, that creates a syntax-error in the php code.

    greetings, eMBee.
    --
    Gnu is Not Unix / Linux Is Not UniX
  31. Re:I have the same question/problem. by bergie · · Score: 2

    Not only when it comes to translations, but content-layout in general. PHP is nice and dandy, but somehow it is really hard to separate content and layout, or content and language support/translations for that matter.

    You might want to check out Midgard then. It is a Web application server that uses PHP as its scripting language. While it isn't that much better in internationalization, it at least has good support for separating layout, structure and content into different components.

    /Bergie

    --
    Midgard Project - Open Source CMS
  32. Midgard? And Zope/Python misconceptions. by hey! · · Score: 2

    I suggest Zope to people who want to program a website, and Midgard to people who want to manage website content.

    It sounds like you think Midgard is better ;-)

    Well, the whole point of all these systems is to extend HTML with some kind of dynamic behavior; if you define this as "programming", then they all require some kind of programming. The basic scripting language of Zope is DTML; it works like practically every other system like this -- you write HTML and decorate it with script directives -- just like PHP. If anything, DTML is simpler than PHP, albeit less powerful. You do have the power of Python behind the system if you need it, but many if not most people will use Zope's capabilities blissfully unaware that Python even exists.

    This midgard project looks interesting, but the screenshots were not very informative (except to say the developers perhaps have a little more aesthetic sense than most). I'd be interested in your unvarnished opinions on midgard, and I'll give you my own unvarnished opinion of Zope.

    First the good stuff.

    Zope's first great strength is that it is an object publishing system. It allows you to reuse, not only display templates, but database queries, and useful bits of logic (e.g. converting lists of URLs into horizontal or vertical link menus). These things have clear usefulness for database oriented applications, but they also have a great deal of usefulness for things which are normally done through static inclusion. For example, I can define a table of links, and then define logic to transform that table into horizontal or vertical menus. Of course this is easy to do in PHP, but the neat thing about Zope is that it is simple and natural for a document to inherit the list of links from a higher folder; to override the list of links but use them in the same way as the document template; to reuse the same transformation logic for a different purpose. Of course, you can do all this stuff using low level scripting systems, but it's a lot of work to make it happen, whereas in Zope it is simple, natural and automatic.

    For database work, it manages persistent connections very nicely, and by providing objects for database connections, SQL statements, and transformation into various presentations, it allows you to plug all these things together tinker-toy fashion.

    Because of this reusability, there's a lot of terrific stuff that's already been done that you can simply grab and plug into your website, such as a Slashdot-like discussion forum (minus moderation, alas).

    Now the bad stuff.

    The documentation is pathetic. It is obviously written by people who know the internals of the system and have since the beginning of time -- they don't really remember what it is like not to know. In fact, the user guides sometimes make use of undocumented internals. Of course you have the source code, but that's really a last resort.

    I often say there are two kinds of people who read documentation. One are hands-on people who like to see examples and generalize from them, and others are abstract folks who like to see principles and specialize from them. The documentation doesn't really serve either. The step by step examples are somewhat obsolete and if you follow them they don't always work with the latest stuff. Sometimes this is because the documented method has been superseded by newer methods, which are explained in a vague, handwaving manner that will infuriate abstract thinkers and do nothing for hands-on people.

    This truly awful documentation will be a show-stopper for many people. You have to beat your head against the system for a while

    Speed is not exactly stellar. Python has time and time again shown its ability to handle incredibly complex logic; but it is SLOOOW. Zope is surprisingly fast considering that it is written in Python and practically every page is parsed with some dynamic behavior. There are some moderate volume techie sites like Bruce Perens' Technocrat site running Zope, but you aren't going to scale to Slashdot or Amazon's scale.

    That said, I chose Zope because it works very well, once you learn your way around it. I don't know half of it, and there's no way of even learning half of what it can do without going to the source code, unfortunately, but what is documented is great.

    --
    Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
  33. Re:how does squid improve speed? by hey! · · Score: 2

    Apache can be used in reverse proxy mode with Zope, too!

    Any pointers to docs on this?

    --
    Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
  34. Re:moderaters please read by hey! · · Score: 2

    Whoever moderated this up should know that zope is a python application and php3 is a language just like python. There are similar applications to zope that use php3. But comparing an application to a language is just not done.

    Well, OK, what if I were to compare Zope Document Template Markup Language (DTML) to PHP3; would that take the knot out of your knickers?

    Seriously, if you'd like to share your knowledge of PHP based application servers, I'm all ears. There's something new every day, I'm just sharing what I learned last year when a lot of alternatives weren't there yet.

    --
    Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
  35. PHP wouldn't be my first choice by hey! · · Score: 4

    Don't get me wrong, I like PHP a lot, but it's greatest strength is for people who like to work in HTML and but make it smart. You can separate content and presentation in PHP, because it is a reasonably powerful language, but doing throughout your site means turning every page in your site into a program to emit the content in the desired language. Once you go there, you may as well consider other options for transforming content.

    There are lots of ways people have come up with for doing this kind of thing. You could do each translation as an XML document and use XSLT to convert it into a fully decorated HTML document. You could use Java servlet chains to take a simple HTML translation and add the usual banners, links and formatting. This would be nicer than transparently dynamic content because it could be cached by the user and downstream proxies.

    My current favorite method is Zope. In Zope, nested folders inherit all the characteristics of their parent folder, but allow you to override them. I use this to enforce stylistic uniformity between people maintaining content on my site, which is just a generalization of your problem.

    With Zope, you could do your original site in German, and put it in a folder called "/foo"; then create an empty folder called "/foo/EN" which by "acquisition" starts by looking identical to "/foo" even though it's blank. Then you start overriding the various text bits into English, and gradually, an English translation emerges. You could even write a little method to iterate over a bunch of links and append "/EN" to them.
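
    To make the acquisition idea concrete, here is a minimal Python sketch. It is not Zope's API; ChainMap from the standard library just stands in for "look in /foo/EN first, then fall back to /foo", and the fragment names and texts are made up for illustration.

```python
from collections import ChainMap

# Hypothetical page fragments for the original German folder "/foo".
foo_de = {
    "title": "Willkommen",
    "body": "Dies ist der deutsche Text",
    "footer": "Kontakt",
}

# "/foo/EN" starts empty and only stores the pieces translated so far.
foo_en_overrides = {}

# Lookups try the English overrides first, then fall back to "/foo".
foo_en = ChainMap(foo_en_overrides, foo_de)


def render(folder):
    """Join the three fragments into one page string."""
    return " | ".join(folder[key] for key in ("title", "body", "footer"))


before = render(foo_en)                 # still entirely German
foo_en_overrides["title"] = "Welcome"   # translate one fragment
after = render(foo_en)                  # English title, the rest inherited
```

    The point is that the English folder never has to be complete: whatever has not been overridden yet still shows up in German.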

    The main issue with Zope is scalability, since everything is dynamically generated. I use squid as a reverse proxy to cut down on dynamic page generation overhead. This also turns out to be an easy method for multihoming Zope, which is a requirement that I have.

    --
    Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
  36. perl journal #13 (spring 1999) p. 40 by bracher · · Score: 2

    you probably want to read this article on the real complexity involved in this sort of localization, and the pitfalls inherent in the sort of substitution you suggest.

    suffice it to say, it's a nightmare for anything beyond nice and friendly iso8859-1. the author uses an amusing anecdote about a simple localization of a simple feedback message into Chinese, Arabic, Russian and Italian...

    - mark

  37. Not a simple answer. by dlc · · Score: 2

    How you do it is going to depend a great deal on how you get your content.

    For example, is your content mainly user-contributed, or does it come from professional writers? Where the stuff comes from is going to make a big difference as to how it gets translated. User submitted stuff could probably be run through the babelfish in pseudo-real time (a daemon which queues requests, uses wget or lwp-request to have them translated, and then sticks them into a database). On the other hand, professionally submitted content should probably be treated a little better than simply babelfish; hire some native speakers as translators.

    The way I would do it is something like this (keep in mind that I am a Perl/C programmer specializing in Apache/mod_perl; some of the things I mention are inaccessible to PHP, like translation handlers):

    • Back end daemon that takes queued requests and passes them to babelfish to be translated, then they get stuck into the database (separate thread, running as a cron or daemon). This daemon could be fed through a web-based form, for example, for a site with user-contributed content, and the data would be put into a staging area, where the daemon would read it, translate it, and then enter it into the live database.
    • Web pages would specify an initial two character language code (e.g., /en/foo/bar.html). When a page gets requested, a custom URI translation handler would strip out the initial two character name and keep that around (via r->notes) for future use. This could be done via a Perl module as a PerlTransHandler or a custom C translation handler, or, if you are not or cannot use Perl or C, you could use mod_rewrite to splice off the initial 3 characters, strip off the leading '/', and stick the last 2 into the environment as, e.g., LANGUAGE (fetch it like you would an environment variable).
    • Alternatively, if you are using user authentication, you could fetch the preferred language from the user's profile after the Authentication stage.
    • When the time comes to actually produce content, I would retrieve the language code from the notes table/environment and use it as part of a custom SQL statement. The database would be set up in such a way that content is broken up into as many different tables as possible, so that as much common (i.e., non language-specific) content could be used as possible. Using some sort of a multi-table join, I would bring the content all together before the templates are actually filled in, so that when the time comes to fill them in, you don't need to switch on the language; it's all already taken care of by the database.
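
    The URI-splitting step can be sketched in a few lines of Python. This is only an illustration of the logic, not a PerlTransHandler or a mod_rewrite rule; the function name, the supported language set and the default are assumptions.

```python
import re

# Hypothetical set of languages the site offers, and the fallback.
SUPPORTED = {"en", "de", "fr"}
DEFAULT = "en"

# A leading two-letter segment, optionally followed by the rest of the path.
_PREFIX = re.compile(r"^/([a-z]{2})(/.*)?$")


def split_language(path):
    """Return (language, remaining_path) for a request path.

    "/de/foo/bar.html" gives ("de", "/foo/bar.html"). Paths without a known
    two-letter prefix keep the default language and are left untouched.
    """
    match = _PREFIX.match(path)
    if match and match.group(1) in SUPPORTED:
        return match.group(1), match.group(2) or "/"
    return DEFAULT, path
```

    Checking the prefix against a known list matters, because otherwise any two-letter directory (say /cv/) would be mistaken for a language code.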

    Retrieving the data and putting it onto the page would be the easy part... once you have the information stored in your database in the appropriate languages, that is. Designing the database so that you have as little unnecessary redundancy as you can while still ensuring that all of your content is available in all the required languages will definitely be a challenge, but it's an architecture problem, not a programming problem.

    Good luck. I, for one, would be interested in hearing how you make out and what track you decide to take.

    darren


    Cthulhu for President!
    --
    (darren)
  38. Re:why tear them apart at all? by thogard · · Score: 3

    I run one site that is accessed from all over the world and I can tell you that 90% aren't using a 4.0 browser (it's only about 1/3, and only those from well-off countries), and looking at how long it takes to transfer the pages, the connection between here and there is often way slower than 33k.

  39. How does Joe HTML-coder work with this? by zzzeek · · Score: 2

    If each page is a collection of bits and tags from three separate databases you'd have to write tons of fancy administration code, and it sort of limits the ability to use other web technologies in tandem with your site, as well as to have anyone who isn't an experienced programmer modify the content layout, since everything has to fit into a very restrictive and complicated architecture.

  40. xml style approach? by zzzeek · · Score: 5

    I haven't used PHP before, but here's the basic idea, not specific to any programming language...

    build your site in English (or your default language), assuming we are talking about regular static HTML files or some kind of .shtml perhaps. Write a filter for your webserver (I use java servlets mapped to *.html myself) that parses the path info of the request for something like "/german/foo/bar.html", i.e. the language identified in the beginning of the URI. Then within your HTML files, anywhere you want translated text, do it like this:

    <translation default="english" modulename="/foo/bar/mytext.html">
    This is my english text.
    </translation>

    Then within your filter servlet or apachemod or whatever, parse the file for these tags (caching schemes can be utilized for speed), and then based on the language encoded in the URL, dynamically replace the body text (if the URI-specified language is not the default) with the contents of a file <languagebase>/foo/bar/mytext.html. So you could have something like /web/translations/german/foo/bar/mytext.html, /web/translations/spanish/foo/bar/mytext.html, etc. If a translation file is not present then you just use the default text already in the document, so you can still launch new HTML pages even if a translation is not available yet.
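
    A rough Python sketch of that filter step, for illustration only. It is neither a servlet nor an Apache module, it does no caching, and the function name and the regular expression are assumptions; only the tag layout and the directory scheme come from the description above.

```python
import re
from pathlib import Path

# Matches <translation default="..." modulename="...">body</translation>.
TAG = re.compile(
    r'<translation\s+default="(?P<default>[^"]+)"\s+'
    r'modulename="(?P<module>[^"]+)">(?P<body>.*?)</translation>',
    re.DOTALL,
)


def translate_page(html, language, base_dir):
    """Replace each <translation> block with the text for `language`.

    Falls back to the default text embedded in the page when the language
    is the default one or no translation file exists yet.
    """
    def replace(match):
        default_text = match.group("body").strip()
        if language == match.group("default"):
            return default_text
        candidate = Path(base_dir) / language / match.group("module").lstrip("/")
        if candidate.is_file():
            return candidate.read_text(encoding="utf-8").strip()
        return default_text

    return TAG.sub(replace, html)
```

    Called as translate_page(page, "german", "/web/translations"), it would look for /web/translations/german/foo/bar/mytext.html and quietly keep the English text if that file is missing.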

    If you want to raise the bar of speed, don't use a dynamic filter, just write a perl script to regenerate an entire static site underneath "/german" "/spanish", etc. using the same scheme. Or you could even mix up the two approaches.

    If your site is not static HTML but some kind of database driven thing, you can still use a similar approach, it just means the filtering program has to be molded to fit your content-delivery environment.

  41. IIS, NT and Unicode by CoolAss · · Score: 2

    Thanks to built in support in NT for Unicode, and heavy support in IIS/ASP for multi-language applications, it's cake.

    Basically, you set a "region code" at the beginning of each page in ASP, and then simply supply your multi language content from a separate location. Each page's content is determined by this region code.

    Wow... that takes about 2 lines of code, and works around the world perfectly. I bet I will get nothing but flame for this post because it uses an MS technology. Funny... I remember the days where people used things not because of who made it, but because of how well it did its job.

    Memories...

  42. See Multilingual-capable Software Running! by pbryant · · Score: 2

    I've developed software that is to be used with multiple languages.

    You can see the software running at http://beta.infopop.net.

    It uses XSL and XML to render each page. Users are able to upload their own XSL with different languages if they wish. This is the 'template' approach mentioned by many other posters. Contrary to what has been said about XSL performance however, I've found it works well.

    Also, we use a database to store keyed messages in different languages. Each message is requested by key and looked up by the language of the site it's being used on.

    The only problem? Getting Oracle to swallow UTF-8 characters. We're having a daemon of a time. If anyone has worked with Oracle, Java and Unicode I'd love to hear from you! peter @ infopop . com

  43. Problem solved by Hynman · · Score: 2

    Since I'm in my local university's Comp Sci Databases class, and have just tackled this problem in the workplace, I can tell you the most poetic way is to use a database.
    You will need 3 simple tables:
    Languages: A one-field table of languages "English", "Deutsch", etc...

    Strings: IdNum (as text, like a #define constant), and a field for your
    native language i.e. "Click Here"

    Finally a 3rd table, Translations, containing the IdNum (references Strings.IdNum), the
    language (references Languages.language) and finally the translated
    text.

    Of course, you may want to use integers instead of strings but I like them this way. They are easier to read, etc. Your image problem is the same as the last table, but you store blobs instead of strings.

    And one function translate($id, $language)
    where a typical call would look like:
    translate ("Click Here", "English")
    would execute:
    select phrase from translations where language='$language' and
    idnum='$id';
    and return that.

    then you can make cool online forms for the translator to use =) Please let me know if you use this. (Just curious)(Comments also welcome)

    There are reasons for 3 tables, but I will not go into my exact mental model or DB theory here.
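
    For anyone who wants to play with the three-table layout, here is a small self-contained sketch using Python's sqlite3 module with an in-memory database. The column names, the CLICK_HERE key and the German phrase are assumptions for illustration, and falling back to the native-language text when no translation exists is an addition that the description above does not mention.

```python
import sqlite3


def build_db():
    """Create the three tables in an in-memory SQLite database."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE languages (language TEXT PRIMARY KEY);
        CREATE TABLE strings (idnum TEXT PRIMARY KEY, native TEXT NOT NULL);
        CREATE TABLE translations (
            idnum    TEXT NOT NULL REFERENCES strings(idnum),
            language TEXT NOT NULL REFERENCES languages(language),
            phrase   TEXT NOT NULL,
            PRIMARY KEY (idnum, language)
        );
        INSERT INTO languages VALUES ('English'), ('Deutsch');
        INSERT INTO strings VALUES ('CLICK_HERE', 'Click Here');
        INSERT INTO translations VALUES ('CLICK_HERE', 'Deutsch', 'Hier klicken');
    """)
    return db


def translate(db, idnum, language):
    """Return the phrase for `idnum` in `language`, else the native text."""
    row = db.execute(
        "SELECT phrase FROM translations WHERE idnum = ? AND language = ?",
        (idnum, language),
    ).fetchone()
    if row:
        return row[0]
    # Assumed fallback: show the native-language string rather than nothing.
    row = db.execute(
        "SELECT native FROM strings WHERE idnum = ?", (idnum,)
    ).fetchone()
    return row[0] if row else idnum
```

    Note the ? placeholders: passing the language and id as bound parameters, rather than pasting them into the SQL string, keeps a hostile language parameter from rewriting the query.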

  44. we do something similar by eries · · Score: 2

    For my site, we use content "templates" in PHP. Ours is a bit more complicated because we are database-backed, but the concept is the same. I just teach our designers to use a syntax like this:

    <? $this->show( "foo" ) ?>

    which gets include()ed in the context of an object that holds data associated with "foo."

    For your application, if you don't want to go with an OO design (which you should, IMHO), you could just do this. Define a variable like $lang which can be one of "en", "de", "fr" and then every time you have language-dependent content, just do:

    <? include( "foo-$lang.ihtml" ) ?>

    Just make sure every PHP file shares a common header that sets $lang appropriately. To do this the even easier way, just make that part of your auto-prepend setting in php3.ini.

    (If anyone is interested, we're thinking of open-sourcing the code for our site, which would make this OOD template system available for all db-backed sites. Let me know if this is something that there is an actual need for in the PHP world.)

    1. Re:we do something similar by eries · · Score: 2

      Better yet, have it be something like:

      <? // do nothing ?>
      welcome to my site
      <? // still do nothing ?>

      since PHP does not require the <? tags, and include()ed files get printed out by default (one of PHP's strongest features, imho)

  45. why tear them apart at all? by kootch · · Score: 2

    Let's go under a few assumptions:
    1. you don't want to skimp on the site design/graphic look
    2. 90% of the web is browsing on a 4.0 browser that supports CSS
    3. 90% is using a minimum of a 33.6 dialup.

    under these assumptions, I would like to say that you wouldn't have to skimp on anything to create a site that caters to a variety of languages with only 1 version of the code.

    CSS tied into something like PHP would be your answer plain and simple.

    Text could be dynamically swapped in depending on the selected language, keep your image buttons plain and layer the text above the image, etc and you're all good.

    Make a site in which you could easily substitute different "palettes" into the design... not designed differently, just coded differently. It's not as difficult as many people make it out to be. CSS is a much more powerful tool in web design than many people give it credit for (most use it just for text decoration). I suggest picking up a book on it.

  46. Re:European posters [OT] by WhyteRabbyt · · Score: 2

    Post Posting: Has any european poster any chance to get a rating higher than 1

    Yes. Repeatedly

    --
    free experimental electronic music netlabel at www.viablehybrid.com
  47. Re: On Creating Multilingual Web Sites? by skidt+og+kanel · · Score: 2

    I have experience with two different approaches for creating and maintaining multilingual web sites:

    • All-in-one:

      We use this approach for the SSLUG web site, and except for the lack of time, I see it as a success.

      The language choice is based on (in order of priority): Direct choice, Accept-Language, and client domain.

      The actual implementation is done with SSI and plenty of conditional blocks. This makes the raw files look rather messy, but it is definitely beneficial to have the different language versions just next to each other.

    • One file per language:

      I have used this approach for my own web site, but I wouldn't call it a great success. It is difficult to keep track of translations when they are stored in different files and the result will typically be pages that are severely out of sync. The main benefit of using one file per language is that you can leave the language choice to the Content Negotiation/MultiViews feature in Apache.

    My advice for people starting on a multilingual web site is:

    • Edit the pages as multilingual files.
    • Publish the pages with one language per file.

    The multilingual files should be as close to plain HTML as possible, so something like

    ...
    <p>
    <en>Hello world
    <fr>Salut le monde
    <da>Hej verden
    </p>

    where you simply allow a sequence of language coded tags followed by a text in that language would probably be a good solution.
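
    A minimal Python sketch of the "edit multilingual, publish one file per language" step, under the assumption that each language-tagged text sits on its own line as in the example above. The function name is made up, and the language list is passed in explicitly because two-letter tags such as <tr> or <li> are ordinary HTML and must not be mistaken for language codes.

```python
import re

# A line that starts with a two-letter tag, e.g. "<en>Hello world".
LANG_LINE = re.compile(r"^\s*<([a-z]{2})>(.*)$")


def split_languages(source, languages):
    """Return {language: page_text} from one multilingual source text.

    Lines tagged with one of `languages` are kept only in that language's
    output; every other line is copied to all languages unchanged.
    """
    pages = {lang: [] for lang in languages}
    for line in source.splitlines():
        match = LANG_LINE.match(line)
        tagged = match.group(1) if match and match.group(1) in pages else None
        for lang in languages:
            if tagged is None:
                pages[lang].append(line)
            elif tagged == lang:
                pages[lang].append(match.group(2))
    return {lang: "\n".join(text) for lang, text in pages.items()}
```

    Each resulting page could then be written out as foo.html.en, foo.html.fr and so on, which is the naming that Apache's MultiViews expects.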

    /Jacob (who is going to change his own site soon)

    --
    Atheism is a non-prophet organisation.
  48. Heh. by dragonfly_blue · · Score: 2
    When I read the story, I saw the "$LANG" tag as a "SLANG" tag.... which actually might not be altogether useless in a multi-lingual site, if you use a Babel-Fish-esque algorithm to translate parts of it.

    --
    Free music from Jack Merlot.
  49. one directory per language by osguzzler · · Score: 2

    I've got a bilingual site, 100% PHP; admittedly, what I'm going to say applies only to languages using the Latin alphabet, but the simple fact of having one directory per language makes maintaining the site in both languages as easy as falling off a log, and nothing is duplicated. And I didn't even consider whether this was design or content. Buttons and everything are completely bilingual except where I chose deliberately not to make them so. The site's called www.mrquiz.org. If you take a look and you're still interested, I'll gladly send you the source. My mail address is on the site.

    --

    Adam:What kept you?
    God:Rome wasn't built in a day
  50. Your Answer - TEMPLATES by orrd · · Score: 2
    Basically what you're asking for is templates. PHP is designed to be an HTML-embedded language, but for people who want to separate the PHP from the HTML and text, you can use templates. Once you're using templates, it's easy to create scripts that choose a template for whichever language is currently selected.

    There are two routes you can go for using templates with PHP, FastTemplates and the PHP Base Library's ("PHPLIB") Template.

    So how are they different? FastTemplates was originally a Perl library that was ported to PHP. FastTemplates works well for Perl programs, but it's not ideal for PHP. Kristian Koehntopp wrote PHPLIB Template from the ground up as a pure PHP library to better take advantage of the capabilities of PHP. One advantage to Kristian's design is that it parses templates with preg_replace(), which is said to be faster than FastTemplates' reliance on ereg_replace(). Another advantage of PHPLIB Template is it allows dynamic blocks to be nested, unlike FastTemplates.

    For those reasons I prefer to use PHPLIB Template, but you do have a choice of the two libraries.

    It may be worth also mentioning the XML approach. XSLT is an XML based format for templates, so you might want to look into that. PHP4 can parse XML, but there isn't code to specifically parse XSLT as far as I know. XML or XSLT are options if you need them, but they're probably more involved than you would need for most PHP projects that really just need templates.

    And for a nice tutorial on PHPLIB Template, look for my article on phpbuilder.net sometime soon (assuming the editor over there decides he wants to publish it). But even if my article doesn't get put online there, it is a very nice site for PHP info.

  51. Study the HTTP protocol! by paranoidfish · · Score: 2

    Everyone is looking at this as some kind of database or server side scripting problem, which is IMHO overdoing it a bit, and missing the point.

    Firstly, every webserver and browser worth using already handles default language handling. It's an integral part of HTTP. A French user can be given the French version of the page transparently by the server (as their browser already knows what languages they want). This part of the specifications is there for a reason, use it! People don't want cookies, logins and unnecessary choices.

    If you use Apache (the only server I have more than half a day's experience with) search for content negotiation in the docs. It's actually set up by default (download the tar ball and look at the "It worked" page to see what I mean). In a nutshell, instead of a file foo.html, you have several files for each language, foo.html.en, foo.html.fr etc, and the server works out which one the user wants.

    From what I understand, IIS also handles this kind of thing well, if you want to use it.

    Getting the webserver to do this for you will almost certainly be quicker than anything you can write, as it is better integrated into the server. By serving static pages things get even quicker still. If you need dynamic content, PHP, Perl, ASP or CGI scripts can all be programmed to use the default-language headers. If you want to generate dynamic content, choose the language this way rather than trying to create your own system (unless you already have a login system ala slash, in which case it wouldn't hurt to add it as an option).
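
    For the dynamic-content case, this is roughly what "use the default-language headers" means in code. A minimal Python sketch that reads an Accept-Language value and picks the best language the site offers; the function names and the default are assumptions, and a real server's negotiation handles more corner cases (wildcards, for one) than this does.

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language header, best first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        tag, _, params = item.strip().partition(";")
        tag = tag.strip().lower()
        if not tag:
            continue
        quality = 1.0                      # no q-value means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0              # unreadable q-value: ignore the tag
        if quality > 0:
            weighted.append((-quality, position, tag))
    # Highest quality first; ties keep the order the browser sent.
    return [tag for _, _, tag in sorted(weighted)]


def choose_language(header, available, default="en"):
    """Pick the first acceptable language that the site actually offers."""
    for tag in parse_accept_language(header):
        primary = tag.split("-")[0]        # "fr-ca" also matches "fr"
        for candidate in (tag, primary):
            if candidate in available:
                return candidate
    return default
```

    So a browser sending "fr-CA,fr;q=0.8,en;q=0.5" to a site that has English, French and German pages gets French without anyone clicking a flag.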

    None of this information is hard to find. In fact, it's pretty hard to avoid it when looking at the Apache config files, so I don't know why everyone else has missed it.

    hth

  52. some experience by rotten_ · · Score: 2

    I've evaluated methods for implementing multi-lingual sites.

    First off you've got to determine what language the end user wants to view the site with. This can be done multiple ways: client hostname, browser version (language), server hostname (i.e. japan.bigcorp.com), or by hitting a button or link on the page. Typically, I lean towards using the server hostname (if ($HTTP_HOST == "deutsch.bigcorp.com") { $lang = "de"; }) in combination with a link that sets a cookie for the language choice. The 'guessing' of the desired language based on the browser or the client hostname doesn't work all that well because there are a lot of foreign nationals in the States who may want to view in their native language, and vice versa for overseas.
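
    As a language-neutral illustration of the hostname-plus-cookie approach, here is a short Python sketch. The cookie name "lang", the host table and the rule that an explicit cookie choice beats the hostname are all assumptions made for the example.

```python
# Hypothetical mapping from server hostname to site language.
HOST_LANGUAGES = {
    "deutsch.bigcorp.com": "de",
    "japan.bigcorp.com": "ja",
}


def pick_language(host, cookies, default="en"):
    """Return the language to serve for this request.

    An explicit choice stored in the (assumed) "lang" cookie wins over the
    hostname, and unknown cookie values are ignored.
    """
    supported = set(HOST_LANGUAGES.values()) | {default}
    chosen = cookies.get("lang")
    if chosen in supported:
        return chosen
    return HOST_LANGUAGES.get(host.lower(), default)
```

    That way someone who lands on deutsch.bigcorp.com but clicks the English link keeps getting English on later visits.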

    Once we've determined the language choice, typically I have multiple tables in the database for the language options. I.e. headlines.en or headlines.de and you just append the language choice:
    <?
    // Default to English when no language has been chosen.
    if (!isset($lang)) {
        $lang = "en";
    }
    $query = "SELECT headline, link FROM headlines.$lang";
    ?>

    So for pulling from a db that's pretty easy. When we want to present separate pages or page layouts for the different languages (i.e. localized data, or product offerings, etc.), it's not too hard to do either. You can do it with HTTP header redirects:
    <?
    // Send the browser to the page for the chosen language.
    if ($lang == "de") {
        $location = "page.de.php3";
    } else {
        $location = "page.en.php3";
    }
    header("Location: $location");
    exit;
    ?>

    Or (this is what we usually do), when building a frameset, point to the proper language page:

    <?
    if ($lang == "de") {
        $location = "page.de.php3";
    } else {
        $location = "page.en.php3";
    }
    ?>
    <frame src="<? echo $location; ?>">

    Images are the same sort of thing. We just would append the language code to the image (or easier, is to set the $image_path to like "/images.de", etc.) Then you just pull from the selected images path.

    If you are building images dynamically, and they have text embedded into them, you're going to have to hax0r it to output the language of your choice. However, I think that many of the Internet users from abroad understand that it's really an English/American dominated network, and if everything isn't offered in their native language, they aren't going to get super pissed off. If you make an effort to get the key content in as many languages as you can, you'll be in good (better) shape.

    What becomes tricky (and what I don't have much experience with) is the non-roman based languages (i.e. Asian languages). We typically have to outsource this work to a translation company, and they tend to provide us with rasterized and vector-based files that we can then embed into our site. If you find a good translation company, they should have experience doing this sort of thing and probably can help you figure out the best methods to employ. They do this stuff for a living, and many of them are top notch.

    -k

  53. Have you looked into WML? by Habanero · · Score: 2

    It has structure in place for producing web documents in several languages. It's slow, but if you don't have much dynamic content, or you just want to know how it's implemented and maybe adapt it for your PHP needs...

    http://www.engelschall.com/sw/wml/

  54. Well first of all.... by elegant7x · · Score: 3

    Don't use tables at all, use CSS layout, you'll find it makes it a lot easier to separate content, people with 4.0+ browsers will see the cool stuff, and people with older browsers will see some old-school, highly readable HTML. Table layout is dead, long live CSS...

    Most HTML+CSS pages are readable right from the source, and would be easy to translate the file in whole.

    If I were in your place, I'd put every paragraph in a database table, with each row having the text in each language. That way, the translators could work on one paragraph at a time, and it would still be easy to update.

    Amber Yuan 2k A.D

    --

    "and dear god does this website suck now." -- CmdrTaco
  55. I've done this with MySQL and a perl script. by broken77 · · Score: 3

    I've done this on a recent project by storing all text into a MySQL database and writing a simple perl script to merge the text from the DB into the HTML files.

    It goes like this:

    1. Make a language resource table. Call it "RESOURCES". The columns are TEXT_ID, LANGUAGE, and TEXT_DATA.

    2. Make an html template directory. You will store all "raw" html files here. Beneath this directory, make subdirectories for each language (eng, frc, jpn, etc.)

    3. In the HTML, make references to the database values by some easily identifiable token string, and wrap this token string around the TEXT_ID value from the database for this text resource. If you want, you can put the English equivalent inside the token string, so you can read the templates in their raw form. E.g.:

      <p>##38471::Welcome to my multilingual website!##</p>
    4. Write a script (I chose perl) to:

      - Read the templates
      - For each language to translate:
        - Look for the existence of the token string (## in this case)
        - Take the resource ID, do a database lookup based on the language
        - Substitute the resource text for ##ID::string##
        - Save the modified html to the language subdirectory for this language.
      - End

    That should be it. Now, when an English-speaking person comes to your site (you'll have to ask them somehow of course), you can just redirect them to /path/templates/eng/file.html, and everything will work.

    This doesn't address the images, however. If you're using languages that use the Western European character set (French, Spanish, English, Portuguese, German, Italian, etc.), it will be easy. You'll be able to type your text directly into Photoshop or the GIMP or whatever and make your graphics. The next thing is to put a language token in these HTML templates that you've made for all images. Something like this:

    <img src="/images/##LANGUAGE##/button.gif">

    And in your language parser, write a one-line substitution that will substitute all instances of ##LANGUAGE## to the current language you're iterating through.
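
    The merge step is short enough to sketch. Mine was a perl script talking to MySQL; the Python below only illustrates the substitution logic, with a plain dictionary standing in for the RESOURCES table. The function name and the French sample text are made up, and keeping the English placeholder when a row is missing is an assumption.

```python
import re

# Matches ##38471::English placeholder text## and captures both parts.
TOKEN = re.compile(r"##(\d+)::(.*?)##")


def render_template(template, language, resources):
    """Fill one HTML template for one language.

    `resources` stands in for the RESOURCES table: it maps
    (TEXT_ID, LANGUAGE) to TEXT_DATA. Missing entries keep the English
    placeholder text stored inside the token.
    """
    def lookup(match):
        key = (int(match.group(1)), language)
        return resources.get(key, match.group(2))

    # Language token first (image paths), then the text resources.
    html = template.replace("##LANGUAGE##", language)
    return TOKEN.sub(lookup, html)
```

    Run it once per language over every template and write the result into that language's subdirectory, and you have the static site described above.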

    If you're translating to languages with different character sets (double-byte languages such as Chinese, Korean, etc.), you'll need to create your graphics differently, but once they're created, the storage of them is the same. One way to create them is to write a CGI that will run through the DB and print, in HTML, all the text resources of a given language. If you have your browser set to the correct character set, you will see the foreign language characters correctly. You can then do a screenshot, and paste the screenshot into your graphic editor to make buttons or whatever out of.

    This approach has worked really well for us on two projects so far, with more projects likely soon. The advantage of making these HTML templates is that it greatly reduces the load and time it would take to build the pages if they were dynamically created from database lookups upon request. You just run the template generating script every time a change is made to the template, and voila.

    --

    I modded the Troll Investigation and I got

  56. Re:Appservers for translation by crazyj · · Score: 2
    I am told that Oracle's WebDB translates on the fly to something like 11 languages.

    _________________________________________

  57. Content + Design == PHP by sudoer · · Score: 2
    What the hell do you expect?
    Do I separate design from content, or content from design?

    YOU ARE USING PHP!!!

    The reason that many of us would never use a
    language such as PHP is that PHP is based on
    being embedded in a .HTML file. ASP is the
    same design, same problem.

    If you are one dude, maintaining a few pages
    of one site, and you don't have any knowledge
    of any other tools, then PHP might be the only
    tool you use for any job. If, on the other
    hand, you are maintaining many pages, or many
    sites, or have knowledge of other tools, this
    is a good time to start using those other
    tools.
    Perl with XML::XSL would be a good tool for a
    job like this, but don't expect to be
    separating content from design in a language
    designed to be embedded into the content!

    <CHANT>
    The right tool for the Right Job
    The right tool for the Right Job
    The right tool for the Right Job
    The right tool for the Right Job
    </CHANT>

    When in doubt, use perl 8-}


    ---
    This is your life, good to the last drop.
    Doesn't get any better than this.
  58. Don't forget mod_perl and CPAN by gozerhbe · · Score: 2

    If you are using mod_perl, you might realize that the problem was already addressed and solved (well, partly) by Apache::Language, available on a CPAN mirror near you. It is HTTP/1.1 compliant, works around buggy browsers and tries very hard to do all that transparently. Language storage can be in flat files, SQL databases and any other storage method you can access thru perl. Just my 0.02$ advice

  59. Usability has to come first by iljitsch · · Score: 2

    I see a lot of people go off into the deep end with all kinds of complicated databases and transformation tools. Maybe this works for very large projects with lots of people working on them.

    My experience is:

    If you want to be able to translate texts in a reasonably efficient manner, you should keep small texts for multiple languages together and separate them for large texts. For instance, I use a lot of scripts that generate forms. So I start every page with an array that contains words and phrases:

    if ($lang == "nl")
    $texts = array("name"=>"Naam", "age"=>"Leeftijd");
    else
    $texts = array("name"=>"Name", "age"=>"Age");

    (I include a header that figures out the user's language by the http accept language, user and site domains (none of which are foolproof) or authentication/cookie data for registered users.)

    Translating this is very simple: copy the array definition and change the phrases. You don't want to use a database for this, because you need to be able to look at the from and to languages at the same time. For large texts I include html files. Translating them isn't much of a problem, keeping several versions up to date is harder.
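
    The same side-by-side phrase table, sketched in Python for illustration. The helper names are made up, and the English fallback plus the missing-phrase check are additions to show how the "keeping versions up to date" problem could be spotted for the small texts at least.

```python
# Phrases for every language sit side by side, as in the array above.
TEXTS = {
    "nl": {"name": "Naam", "age": "Leeftijd"},
    "en": {"name": "Name", "age": "Age"},
}


def phrase(key, lang, fallback="en"):
    """Look up a phrase, falling back to English for gaps."""
    table = TEXTS.get(lang, TEXTS[fallback])
    return table.get(key, TEXTS[fallback].get(key, key))


def missing_keys(lang, reference="en"):
    """List phrases the translator still has to supply for `lang`."""
    return sorted(set(TEXTS[reference]) - set(TEXTS.get(lang, {})))
```

    Because both languages live in one structure, a translator sees the source and target next to each other, which is exactly why I would not push these short phrases into a database.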

    Don't forget that many users speak more than one language. For instance, many users I talk to in Dutch on my site want to see links to content in both Dutch and English, so when they sign up they can choose between Dutch, English, Dutch + English and English + Dutch.