On Creating Multilingual Web Sites?
Jens asks: "I am designing an intranet web application that needs to come out in multiple languages. I am using PHP to put common elements in include files, which makes things a lot easier. I want to avoid making each change three times (I do have someone doing the translations, however). The question is: how do I tackle the multiple languages? Do I separate design from content, or content from design? Do I write
"<table><tr><td>$text[$lang]</td></tr></table>", keep the international text in include files, and then call the pages with the appropriate parameters; or should I write "<?php nice_table("Dies ist der deutsche Text"); ?>" (German for "This is the German text") and keep three different files, but one include file with all the design elements? How do I handle buttons (i.e. graphics) with text on them?"
Multilingual sites are tricky, but as long as there is some separation of page design and language elements, the rest shouldn't be too hard to put in place. Whether you separate design from content or content from design depends on how you plan to implement the site. What schemes work for the webmasters out there who already run established multilingual sites?
I thought that newer versions of Apache could serve pages based on file extension. You could have an index.de, an index.en and an index.fi, and Apache would do the right thing.
Unless you have a particular reason not to use a database, use that to store your localised strings. Then, separate as much of the look-and-feel stuff as you can and put it into separate includes. This makes it much easier to maintain the site, because:
Not any more so than any other scripting language...
There is http://xml.apache.org/, but I haven't really checked it out, as I didn't find any literature on using xml.apache together with PHP.
PHP has XML parsing support using the expat library that comes with Apache. Docs and examples can be found at http://www.php.net/manual/ref.xml.php
- blackbird.
I created a web page describing the system, and some of the problems to be avoided: http://files.moo.ca/multilingual
You may want to check out Apache's MultiViews option. It's a very handy, little-used feature that lets your users choose their preferred language(s) once for all websites, right in their browser. With this option enabled, you give your documents filenames with the ISO language code appended, like index.html.en and index.html.it or banner.gif.en and banner.gif.it (for English and Italian, respectively).
The links in the web pages are then without the trailing ISO code:
<IMG SRC=banner.gif>
Very handy indeed. Makes it extra easy to add support for more languages, even after project launch.
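The negotiation Apache performs under MultiViews can be approximated in a few lines. Below is a minimal sketch in Python (used here only for illustration; the logic carries over to PHP or Perl), assuming the variants on disk follow the index.html.en / index.html.it naming above. It is a simplification, not Apache's actual algorithm: it ignores the "*" wildcard and Apache's other negotiation dimensions such as content type and charset.

```python
def parse_accept_language(header):
    """Return the language tags in an Accept-Language header, best first."""
    prefs = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag or tag == "*":
            continue  # this sketch ignores the wildcard
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:
            # Sort by quality (highest first), then by order in the header.
            prefs.append((-q, position, tag))
    return [tag for _neg_q, _pos, tag in sorted(prefs)]


def negotiate(base, available, header, default="en"):
    """Pick a variant such as 'index.html.it' from the languages available."""
    for tag in parse_accept_language(header):
        primary = tag.split("-")[0]  # 'it-ch' may be served by 'it'
        for candidate in (tag, primary):
            if candidate in available:
                return f"{base}.{candidate}"
    return f"{base}.{default}"
```

For example, a browser sending "it-CH, it;q=0.9, en;q=0.5" gets index.html.it, while one that asks only for French or German falls back to the English page.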
Not to nitpick, but why would you do it this way, when Apache (and HTTP) already has a working, transparent way to do what you're doing, without reinventing the wheel?
Instead of making your templates, you just do this:
<A href=http://slashdot.org/><IMG src="http://images.slashdot.org/title.gif" width=275 height=72
border=0 alt="##Language Title##"></A>
(where ##Language Title## is filled in according to the browser's preferred-language header; this is the only part that Apache won't handle by itself.)
You just create "title.gif.de", "title.gif.fr", etc., and let Apache handle the rest.
You'd save yourself a whole lot of trouble setting up multiple DNS servers, and your templates would be a lot smaller.
Look at http://www.linux.be: this site is multilingual and multi-layout, built with PHP4.02b and MySQL (click on "star-trek" and "under sea" in the right-hand menu).
Moderate this guy up ...
...
.php or .asp or .shtml: just build the .php page and cache it as an .html page. You can write a function called "update" that regenerates all of your static pages when changes are made (use lynx).
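A minimal sketch of the "update" idea, in Python for illustration. Instead of fetching each .php page through lynx as suggested above, it renders the pages in-process; STRINGS, PAGE and the MultiViews-style output names are hypothetical stand-ins for your own string files and templates.

```python
from pathlib import Path
from string import Template

# Hypothetical per-language strings; in practice they would come from your
# string files or database.
STRINGS = {
    "en": {"title": "Welcome", "body": "Hello, world."},
    "de": {"title": "Willkommen", "body": "Hallo, Welt."},
}

PAGE = Template("<html><head><title>$title</title></head>"
                "<body><p>$body</p></body></html>")


def render_all():
    """Render one static page per language, keyed by output filename."""
    return {f"index.html.{lang}": PAGE.substitute(strings)
            for lang, strings in STRINGS.items()}


def update(out_dir):
    """Write every rendered page to out_dir; call this whenever content changes."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    pages = render_all()
    for name, page in pages.items():
        (out / name).write_text(page, encoding="utf-8")
    return sorted(pages)
```

The web server then only ever serves plain .html files, and the cost of generating them is paid once per content change rather than once per request.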
I run a few bilingual sites and know that this stuff has to be built into your software. From my experience, I suggest the following:
- If the page is static or rarely changes (like a TOS or privacy statement) use duplicate HTML pages. It's impossible to predict how these will change, and each will have to be retranslated if they do change.
- If the page has to be updated on a regular basis, use your favorite scripting language to insert the language accordingly. This does not mean you'll need every page on your site to be
- For dynamic pages/applications, it is best to create a string library/object that stores the language-specific strings. Then have your script check the language parameter and generate the page accordingly. The language parameter could be stored in the user database, but I would recommend just using a cookie. The option to change languages is simply displayed somewhere on the generated pages.
- Try to avoid GIFs. Building them on the fly is a waste of CPU, and maintaining a GIF library is tiresome.
- For sites that are strictly bilingual (i.e. English and one other language), it is sometimes preferable to include both languages on the same page. An example would be a registration form: you don't want the user to have to hunt for the language they need.
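The string-library-plus-cookie approach from the list above can be sketched in a few lines of Python (for illustration only). StringLibrary, the "lang" cookie name and the sample messages are hypothetical; unknown cookie values and missing translations both fall back to English.

```python
from http.cookies import SimpleCookie


class StringLibrary:
    """Hypothetical string library: one dictionary of messages per language."""

    def __init__(self, messages, default="en"):
        self.messages = messages
        self.default = default

    def get(self, key, lang):
        # Fall back to the default language when a translation is missing,
        # so an untranslated key never leaves a hole in the page.
        table = self.messages.get(lang, {})
        if key in table:
            return table[key]
        return self.messages[self.default][key]


def language_from_cookie(cookie_header, supported, default="en"):
    """Read a hypothetical 'lang' cookie, ignoring unsupported values."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    morsel = cookie.get("lang")
    if morsel is not None and morsel.value in supported:
        return morsel.value
    return default


LIB = StringLibrary({
    "en": {"register": "Register", "name": "Name"},
    "fr": {"register": "S'inscrire"},
})
```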
/*
Regarding PHP vs. whatever: take a close look at PHP. You'll see it makes things _really_ convenient, and not only for new programmers. I know many Perl diehards who like to do things the hard way, and the wide availability of Perl modules and scripts does give it an advantage, but PHP was built with web programming in mind, and it shows.
*/
Many languages have richer grammar than English, and since the phrase/word is used as the index to the translation, you often get into situations where a correct translation is impossible. For example, the word 'new' in English is used for both singular and plural; when gettext generates the .po file, it references every place the word 'new' is used, and it doesn't let you split the entry manually to provide different translations for different contexts.
X/Open catgets and the Java Locale system do not have this weakness, but they require more maintenance.
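One common workaround for the 'new' problem is to key translations on a context as well as the English text, which is roughly what key-based catalogs give you. A minimal Python sketch; the contexts and the German entries are illustrative. (Later versions of GNU gettext added message contexts, msgctxt, for this case, and Python's gettext exposes them as pgettext from version 3.8.)

```python
# Hypothetical German catalog keyed by (context, English text), so one English
# word can carry different translations in different places.
CATALOG_DE = {
    ("menu item", "New"): "Neu",
    ("plural adjective, e.g. 'New messages'", "New"): "Neue",
}


def translate(catalog, context, text):
    """Look up text in its context; fall back to the English source text."""
    return catalog.get((context, text), text)
```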
Because Perl gives you more than one way to do it, and they're both wrong. (Hey, you flamed first.)
PHP isn't the root of all evil, and I've actually seen quite a bit of really elegantly written large php sites (500K+ codebases) that were maintainable and easily understandable.
----------------------------
At the top of the page call the secret function:
init_i18n();
and then use the super-secret _() function. Here is how it works. Let's say you have a function foo defined as foo(string baz, int bar). Instead of calling foo('title', 3), you'd call foo(_('title'), 3), and echo "foo"; changes accordingly to echo _("foo");
Now you just need to follow the instructions to set up your string files; you now know how to do the PHP-specific gettext wrapping.
You'll need the standard files containing the language-specific msgid->msgstr mappings, and if you're managing a large source tree it's helpful to cook up a little script that grabs every string and creates a base msgid file. Unfortunately, it's not the most intuitive or well-documented procedure. At some point I'll have to write good documentation on how it all works, but hopefully this is at least somewhat useful.
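For readers outside PHP, the same wrapping looks like this with Python's standard gettext module (a sketch, not the poster's code). init_i18n() below is a hypothetical stand-in for the poster's function of the same name, and DictTranslations is an in-memory catalog used only so the example runs without compiled .mo files; in real use, gettext.translation() loads locale/<lang>/LC_MESSAGES/<domain>.mo.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """In-memory catalog standing in for a compiled .mo file (illustration only)."""

    def __init__(self, catalog):
        super().__init__()
        self.catalog = catalog

    def gettext(self, message):
        return self.catalog.get(message, message)


def init_i18n(lang, catalog=None, localedir="locale", domain="messages"):
    """Hypothetical init_i18n(): return a _() function for the given language."""
    if catalog is not None:
        return DictTranslations(catalog).gettext
    # fallback=True returns a pass-through NullTranslations when no .mo exists.
    translations = gettext.translation(domain, localedir=localedir,
                                       languages=[lang], fallback=True)
    return translations.gettext


def foo(baz, bar):
    """Hypothetical function from the example: foo(string baz, int bar)."""
    return f"{baz} ({bar})"
```

With _ = init_i18n("de", {"title": "Titel"}), the call foo(_("title"), 3) receives the German string; without any catalog, strings pass through unchanged.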
----------------------------
I've been working on redoing my home page to use themes. Each section of text appears in a table, decorated with graphics to look like a window in a variety of OSes (Windows, Mac OS, Mac OS X, UNIX, and I'm working on more). On the right is a navigation bar, also designed to look like a window. The various widgets (close boxes, minimize/maximize buttons, etc.) are completely non-functional.
For any page on the site, you can click a link and change which theme you're viewing the page in. My focus is on keeping the content the same and changing the surrounding aesthetics, while your focus is on keeping the surrounding aesthetics and changing the content, but the same concepts still apply.
My site is based on a CGI script written in Perl called page.pl, which takes two arguments (passed in the QUERY_STRING): the name of the page to be viewed, and the theme to view it in. Eventually the theme selection will be moved into a cookie; you might similarly want to move language selection into a cookie.
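A sketch of that page.pl-style dispatch in Python, assuming a hypothetical "page=...&theme=..." query string (the post doesn't give the exact format). It validates the theme and emits a Set-Cookie header so the choice can later come from a cookie; a language parameter would be handled the same way.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qs


def handle_request(query_string, themes, default_theme="plain"):
    """Return (page, theme, Set-Cookie header) for a page.pl-style request."""
    params = parse_qs(query_string)
    page = params.get("page", ["index"])[0]  # validate before using as a filename
    theme = params.get("theme", [default_theme])[0]
    if theme not in themes:
        theme = default_theme
    # Remember the choice so later requests can read it from the cookie.
    cookie = SimpleCookie()
    cookie["theme"] = theme
    cookie["theme"]["path"] = "/"
    return page, theme, cookie.output()
```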
Unfortunately, my site isn't ready for prime time yet, and is currently hosted only on my 56k dialup connection (definitely unable to stand up to the Slashdot Effect). However, the previous version of my site works in a similar fashion, offering different layouts depending on what browser you're using. Much simpler, but it should give you some idea as to what I'm talking about. The main home page is done with shtml, rather than CGI, and lets you use a QUERY_STRING to override the automatic browser detection. Compare:
http://www.inficad.com/~phroggy/?4
http://www.inficad.com/~phroggy/?3
http://www.inficad.com/~phroggy/?2
The three versions are designed for Netscape 4.x, Netscape 3.x and all other browsers, respectively (MSIE pretends to be Netscape). The difference between versions 3 and 4 is subtle; look for the drop-shadow - Netscape 3 doesn't support background table graphics, so I designed version 3 not to use them.
If you're seriously interested in seeing my new site with the themes feel free to e-mail me according to the instructions in my sig, and I'll give you the URL.
$x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
$x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
Why not just write in English, and have a CGI that uses Babelfish to translate when a user logs in indicating some other language? :-)
If you're not part of the solution, you're part of the precipitate.
Don't forget: XML was designed from the ground up for multilingual support, via the xml:lang attribute. Use that, not some invented tag!!!
Matt. Want XML + Apache + Stylesheets? Get AxKit.
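A minimal sketch of reading strings tagged with xml:lang, using Python's standard ElementTree (ElementTree reports the attribute under the XML namespace URI, as the XML_LANG key below shows). The <strings>/<string id=...> vocabulary is hypothetical, and the sketch only looks at xml:lang on each element itself, whereas in XML the attribute is also inherited by descendants.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

DOC = """<strings>
  <string id="welcome" xml:lang="en">Welcome</string>
  <string id="welcome" xml:lang="de">Willkommen</string>
  <string id="welcome" xml:lang="fi">Tervetuloa</string>
</strings>"""


def load_strings(xml_text, lang):
    """Map string ids to their text in the requested language."""
    root = ET.fromstring(xml_text)
    return {el.get("id"): el.text
            for el in root.iter("string")
            if el.get(XML_LANG) == lang}
```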
Actually, you can cut out steps 2 and 3 in that. The browser passes the language setting to the server, which the server can handle using MultiViews. That way, the browser and server take that part of the workload off your back.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
My thoughts would be to have anything not explicitly related to the actual content stored separately from content-arranging tags (such as tables, paragraphs, etc.). This lets you maximise reusability and minimise effort.
i.e.:
Database 1 -> Outer Shell Template
Database 2 -> Content-Specific Formatting Template
Database 3 -> Actual Text, in 1+ records. This should contain NO tags, whatsoever. That's all done in #2.
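The three layers above might look like this as a Python sketch, with plain dictionaries standing in for the three databases (the names SHELL, NEWS_ITEM and TEXT are hypothetical). Only layer 2 contains tags; the text records are HTML-escaped as they are substituted, so translators never touch markup.

```python
import html
from string import Template

# Layer 1: outer shell template (site-wide look and feel).
SHELL = Template('<html><body><div class="nav">$nav</div>$content</body></html>')

# Layer 2: content-specific formatting template; the only place tags appear.
NEWS_ITEM = Template("<h2>$headline</h2><p>$body</p>")

# Layer 3: actual text, one record per (item, language), with no tags at all.
TEXT = {
    ("news1", "en"): {"headline": "Launch", "body": "We are live."},
    ("news1", "it"): {"headline": "Lancio", "body": "Siamo online."},
}


def render(item_id, lang, nav=""):
    record = {k: html.escape(v) for k, v in TEXT[(item_id, lang)].items()}
    return SHELL.substitute(nav=nav, content=NEWS_ITEM.substitute(record))
```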
Two of the simplest but most widespread problems with viewing foreign language web sites, particularly in non-Roman character sets, are when the language is not specified in a metatag or the font face *is* specified in HTML.
The first problem has been described elsewhere on this page, but deserves reiteration: specifying the encoding language in HTML can allow the correct font, language script and character set to be used automatically by the browser without scripting on the server end. On my Mac, for example, this would allow the Haaretz newspaper site to come up automatically in Hebrew (having chosen the correct language script and font for me). I would not have to manually choose the settings. Good examples of where this is not done but should be are the Yahoo! Asian sites. The only way my browser knows I'm looking at Korean text on Yahoo! Korea is because I choose it; the HTML pages are not encoded to tell my browser this.
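Declaring the encoding and language costs only a couple of lines on the server side. A minimal Python sketch (the function name page_head is hypothetical) that produces both the HTTP Content-Type header with a charset and the matching <meta> tag and lang attribute for the page head; note that it deliberately sets no font face.

```python
import html


def page_head(lang, charset, title):
    """Return (HTTP headers, opening HTML) declaring language and encoding."""
    headers = [
        ("Content-Type", f"text/html; charset={charset}"),
        ("Content-Language", lang),
    ]
    head = (f'<html lang="{lang}"><head>'
            f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">'
            f"<title>{html.escape(title)}</title></head>")
    return headers, head
```

The body must then actually be sent in the declared encoding, for example by encoding the finished page with "euc-kr" for a Korean page declared as EUC-KR.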
Scripting to determine what preferred language is chosen in the browser is, in my opinion, the hallmark of a great multi-language site. The second hallmark is a link on every page to switch to another language at will.
A similar item happens in bad email programs: they do not specify the character encoding in the header. One of the nicest things about a great email program is that when I get, say, a Japanese-encoded email, even if the words themselves are English (by using the Roman characters built into the Japanese encoding group), it kicks in the Japanese character entry and editing system automatically. It recognizes it, as it should.
Intentionally *not* specifying the font face is equally important. If I want to view a web site in Hebrew or Arabic, a ridiculous number of sites require that I download the particular font specified in their HTML. This is preposterous. Language encoding, used properly, might mean never having to download another font again.
As for graphics in lieu of foreign fonts: avoid them any way you can. They make copying and pasting near-impossible. (I once had to do it this way: I saved the GIF image to my hard drive, converted it to TIFF, ran it through an optical character recognition program, then cleaned it up manually. A huge waste of my time.)
Having worked with multilingual Web sites, I must clarify a few points.
First, remember that a Web site must be designed to work easily with the existing tools you use to develop and maintain it. Although JavaScript seems nice, maintaining it seems like a nightmare to me.
Having one directory per language for static pages seems much easier to maintain when content changes. For design changes it's more of a pain.
Ask yourself: is your organisation more likely to change content or design? In most organisations, I would expect content to change more often than design. Plan accordingly.
PS: For Canadian websites, the Canadian federal government has a directive that a website MUST be bilingual (English/French) or not exist at all.
Please ...
:)
PHP isn't limited to embedding in HTML. Sure, it works great WITH html, but isn't limited to it. I've written whole applications that deal with nothing but text and interface with a template system (also PHP) to insert the content into a template.
One extremely nice feature of PHP that a lot of Perl users overlook is mod_php. Running as an Apache module, PHP scripts have a smaller memory footprint, faster startup and parsing, and lower server load than Perl run as CGI. One could use mod_perl, but servers offering mod_php far outnumber those offering mod_perl, due to various issues with unscoped variables.
OTOH, Perl's text capabilities are legendary.
Also, someone noted that you had to put PHP in every page in the site to accomplish what this guy wants. Not true. I have coded sites with only one real page and many 'virtual' pages. I use a combination of ForceType and globbing PATH_INFO to make it look as if you're navigating a bunch of different real pages, but you're not.
The PHP does all the work of talking to the db and seeing what pages should show up where and getting the correct content.
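The routing half of that setup might look like this, sketched in Python: one real script receives everything after its own name in PATH_INFO and maps it to a language and a virtual page. The URL layout (/de/about) and the SUPPORTED and PAGES sets are assumptions; the Apache side (ForceType) stays as described above.

```python
SUPPORTED = {"en", "de", "fi"}             # hypothetical supported languages
PAGES = {"index", "about", "products"}     # hypothetical virtual pages


def route(path_info, default_lang="en"):
    """Map a PATH_INFO such as '/de/about' onto (lang, page)."""
    parts = [p for p in path_info.split("/") if p]
    lang = default_lang
    if parts and parts[0] in SUPPORTED:
        lang = parts.pop(0)
    page = parts[0] if parts else "index"
    if page not in PAGES:
        page = "not_found"  # never trust PATH_INFO as a filename
    return lang, page
```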
Now, no more FUD!
Those looking for multilingual solutions for sites might want to look into making some use of Apache's content negotiation. See http://www.apache.org/docs/content-negotiation.html for more information.
The approach we are using is to separate both language and style from the content by putting all the text into a database and building an object-oriented set of libraries that automatically select UI style and language using sessions.
example:
$page = new Page(); # New page object is initialized; user, language and style are identified
$page->title('7652'); # Title phrase number 7652 is printed to the page object in the user's language and style
$page->paragraph('7898'); # Chunk of text is printed
$page->showpage(); # Page is shown to the user in the selected style
The phrase-table is then indexed by phrase-numbers and languages and modification dates are kept up-to-date for all the phrases. All the content on the pages can be modified on the fly by using online-editors, which gives translators and content creators access to phrase-table. Editors also make the phrase numbering totally transparent. The phrase table is distributed to multiple development machines and synchronized over CVS using special synchronization tools. The approach is very fast on Apache+DB2+Linux+embPerl platform, but caching into static perl-hashes in Apache-registry is also possible to make it even faster.
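A rough Python analogue of that phrase-table design, using the standard sqlite3 module in place of DB2 and embPerl (this is not the poster's code). Phrases are keyed by (phrase number, language), each row carries a modification date, and a missing translation falls back to English. The Page class mirrors the method names in the example above but leaves styles out.

```python
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE phrases (
                      phrase_id INTEGER, lang TEXT, text TEXT, modified TEXT,
                      PRIMARY KEY (phrase_id, lang))""")
    return db


def set_phrase(db, phrase_id, lang, text):
    # INSERT OR REPLACE keeps one row per (phrase, language) and refreshes the date.
    db.execute("INSERT OR REPLACE INTO phrases VALUES (?, ?, ?, datetime('now'))",
               (phrase_id, lang, text))


def phrase(db, phrase_id, lang, fallback="en"):
    for candidate in (lang, fallback):
        row = db.execute("SELECT text FROM phrases WHERE phrase_id = ? AND lang = ?",
                         (phrase_id, candidate)).fetchone()
        if row:
            return row[0]
    return f"[missing phrase {phrase_id}]"


class Page:
    """Collects page parts in the user's language (styles left out of this sketch)."""

    def __init__(self, db, lang):
        self.db, self.lang, self.parts = db, lang, []

    def title(self, phrase_id):
        self.parts.append(f"<h1>{phrase(self.db, phrase_id, self.lang)}</h1>")

    def paragraph(self, phrase_id):
        self.parts.append(f"<p>{phrase(self.db, phrase_id, self.lang)}</p>")

    def showpage(self):
        return "\n".join(self.parts)
```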
You can go to our site, Atuline.COM (Europe's first and only virtual hospital providing healthcare on the net), to try it out. Go to the demo and, inside the virtual hospital, try changing the language and the user interface look'n'feel from settings/interface. Currently only 5 languages and 3 looks are supported, but there is more to come.
-- Joonas
I read the documents on the squid homepage and it says that it caches the objects as they pass through it, which speeds up static content by offloading the web server. However, squid cannot cache dynamic content (obviously -- that's why it's called *dynamic*). When a request to a cgi script is made, squid passes it on to the web server. So, I don't see how it can possibly speed up the dynamic content generation.
Also, why would you want to run squid at all, even for static content? Wouldn't it make more sense to just use a bigger box for Apache? Or run 2 clustered Apache boxes instead of Apache + squid?
___
___
If you think big enough, you'll never have to do it.
Until recently, Microsoft more or less had a monopoly on XSLT parsers that were USABLE performance-wise. This is about to change with Apache's Xalan-C, which should be quite fast. See the Apache XML site.
People are the problem, and that's precisely my point. I don't want to trust any dumb luser around. I'm just not suicidal. The same way that my servers hopefully don't go down as often as stupid lusers remove important files "by mistake". Thank you for your attention.
I basically set a cookie containing a variable that indicates the desired language (the default is EN if there is no cookie). My PHP3 scripts look for files like index_$lang.php3, where $lang is a two-letter language code. If the page for the desired language is found, it is displayed. If not, the English version is shown.
For menus, I abstracted the localized portion and substituted variables for all of the strings. The PHP scripts then look for a menu_$lang file to include.
Also, I use the ISO entity descriptions for all of the accented characters so I don't have to worry (as much) about fonts on the client.
* As is generally the case, my opinions do not reflect those of my employer.
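A Python sketch of the same lookup, plus the entity trick. It assumes the index_<lang>.php3 naming above; the two-letter check is an addition worth making in any language, since the language code comes from a cookie and would otherwise end up in a file path. Non-ASCII characters are replaced by named HTML entities from the standard html.entities table, falling back to numeric references.

```python
import re
from html.entities import codepoint2name
from pathlib import Path

LANG_CODE = re.compile(r"[a-z]{2}")


def localized_file(directory, base, lang, ext="php3", default="en"):
    """Return directory/base_<lang>.<ext>, falling back to the English file."""
    if not LANG_CODE.fullmatch(lang or ""):
        lang = default  # reject cookie values such as '../../etc'
    candidate = Path(directory) / f"{base}_{lang}.{ext}"
    if candidate.exists():
        return candidate
    return Path(directory) / f"{base}_{default}.{ext}"


def to_entities(text):
    """Replace non-ASCII characters with named HTML entities where one exists."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)
        elif code in codepoint2name:
            out.append(f"&{codepoint2name[code]};")
        else:
            out.append(f"&#{code};")  # numeric reference as a last resort
    return "".join(out)
```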
Translating from English to German, Italian or French is not a trivial task, but it is much easier than automating and running websites that have to carry English (or another romanized language) PLUS non-Latin languages such as Arabic, and double-byte encoded languages such as Chinese and Korean (using Unicode or other encoding methods)...
So... if anyone has done what I've described above, I'd appreciate it if you could share a clue or two.
Thanks in advance.
I have recently run across something similar. I want to convert a telnet-based BBS (called M-Net) into a multilingual site. The problem there is that we are limited to what VT100 will let us output. In the end, we decided to translate the documents and use the basic locale and i18n features already in the OS. It is not quite the same, but I hope it helps.
<?php
// $language is assumed to be set earlier, e.g. from a cookie.
if ($language == "spanish") {
    include 'spanish_text.php3';
}
elseif ($language == "german") {
    include 'german_text.php3';
}
else {
    include 'english_text.php3';
}
?>
(...HTML setting up your page and whatnot...)
<TITLE><?php echo $page_title; ?></TITLE>
<?php echo $welcome_text; ?>
The file english_text.php3 would contain stuff like:
<?php
$page_title = "Superduper Website!";
$welcome_text = "Welcome to Superduper Website";
?>
german_text.php3 would contain exactly the same variable declarations, only with the values in German. This technique is messy but flexible. You could also apply it to images (e.g. <IMG SRC="<?php echo $submit_button_image ?>">).
ToiletDuk (58% Slashdot Pure)
Supposedly NS6 has "built-in" translation. If a page is in German and you are set for English, Netscape will run the web page through a translator automagically and then spit it back to you in your native tongue. At least that's what I heard.
Europeans get moderation points too. I usually spend them during Swedish daytime (yes, I should be working...).
Languages like Perl have enough facilities to generate the graphics from within the language. PHP can do that too (but why go dynamic and tax your little server?), with very little difficulty.
Actually, programming your graphics gives you great flexibility, methinks.
Good luck,
Jeroen
Postscript: does any European poster stand a chance of getting a rating higher than 1 (unless posting in the middle of the night)? This posting will probably prove, once again, that the answer is no.
Writing about music is like dancing about words - FZ
Well, that is nice to know.
Some time ago I posted an elaborate piece about a neuroscience-related topic, AFAIR the bandwidth of a neuron. I got one enthusiastic reaction in my mailbox, but no moderation at all. Makes one wonder if there is any use in posting at all.
However, European stuff is read too, apparently.
Thanx,
Jeroen
You might want to check out an effort to create a template engine for PHP, written as a PHP extension. See http://va.php.net/~andrei.
-Andrei
- total separation of content and code (to the extent that the content files don't even have conditionals, all they have is a way to give names to different bits and specify templating relationships)
- dynamic "compilation" of pages into data structures, with cacheing of this intermediate form, and
- the fundamental unit is a content page that calls code, not a code page that outputs content.
The whole thing is Perl-based, running on top of mod_perl and speaking directly to Apache's Perl API (i.e. not using CGI.pm or Apache::Registry). Send me an e-mail if you're interested; it's in a very raw state now and not near release, but I'll probably be able to release it as open source. If not, well, at least we can talk about it :)
PHP supports the GD library, which can happily superimpose text onto a PNG (or GIF depending on the version of libgd PHP is linked with) graphic. So you can just make a few button images (selected, deselected, active) and use PHP+GD to drop the correct text for whatever language you are using on top.
See this link for a good example to build on.
--
Separate layout from content, and lay out your pages using fixed symbolic references.
Internationalization then becomes a two-phase process. You can have your visitors select the language they want to see from a drop-down list of the languages you support.
Then use JSP, ASP (boo-hiss) or mod_perl to look up each symbolic reference in your layout in a dictionary containing the language-specific content.
The script can then send out the page with its final content. (You'll find style sheets extremely useful for controlling positioning and typefaces.)
Charles-A.
MSBPodcast.com The opinions expressed here are my own. If you don't like 'em... Think up your own stuff.
The PHP4 function _(x) is a synonym for gettext(x), so the code ends up being very readable for the maintainers: _('Permission denied.')
--
Brent J. Nordquist N0BJN
I had a similar problem recently, designing a press-release site (http://press.ducati.com) in Italian and English.
What I did was use templates, and store the language preference in a session variable (I have only registered users, so the lang pref is set according to the user preferences, and can be changed with a link on each page, overriding the session variable).
Once you do that, you just have to check the variable when declaring templates. Using the template class from phplib (http://phplib.netuse.de) you would do:
$tpl->set_file(array(
"mypage" => $lang . "pagetemplate.html",
"myblock" => $lang . "pagetemplate.html"
));
That's it! Easy to do, easy to maintain.
If you're using PHP, the template class above is a great one to use while waiting for the release of the PHP templating module, which is in the works.
If you're using (ehm) ASP, you can use my JScript port of the FastTemplate class (http://www.sumatrasolutions.com/asptemplate).
I am currently responsible for globalizing a web project at my company. This is the first time I have had to deal with this sort of thing, and I have learned much. Here are some tips:
1) Whether you decide to fetch strings dynamically when the page is processed, or to have multiple versions of the HTML, isn't as important as deciding your strategy in the first place. Make the decision and stick with it long before you start work on the actual project. Implementing a globalization strategy for a site that has already been programmed can be difficult if not impossible. Heed the warning about separating content from code, but be sure you know how that is going to play into your strategy.
2) Language is only part of the problem. You need to consider sort order, for example, if you are presenting sorted lists. You also need to consider date, time and number formats; some countries swap the comma and the decimal point, for example. If you are planning on selling something, multiple currency support would be useful.
3) You need to support multiple code pages. It would be neat if you could just use UTF-8, except there is no widely available Unicode font that contains all the glyphs needed for some languages. It is poor globalization design to support only the Latin code page on the assumption that you'll never need Korean, for example.
4) Make sure you avoid colloquialisms and other culture-centric ideas on your pages. Keep it simple and as icon-free as possible. Where you have icons that contain text, keep a copy with the text layer separate from any background elements. GIMP has some features that help when localizing these bitmaps. But it's best to just avoid them.
The project I am heading up contains several hundred .asp files. Rather than translating every one of these files into who knows how many languages, we are creating a string resource that can be queried by a server object. Someone recommended that you look at GNU gettext, which I second. If you can find standards that already exist, I recommend you use them.
Someone else recommended an XML approach. Again, this is a good idea to consider.
Don't try and re-engineer some existing code to make it global. I can't emphasize that enough. Start global from the ground up. Try to find the most intuitive means of doing so.
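The point about date and number formats, in miniature: the same number and date rendered for two locales. The CONVENTIONS table is a hypothetical, hand-written stand-in for a proper locale database; it covers only US English and German, but it shows the swapped comma and decimal point.

```python
from datetime import date

# Hypothetical per-locale conventions; a real site would take these from a
# locale database rather than hard-coding them.
CONVENTIONS = {
    "en_US": {"thousands": ",", "decimal": ".", "date": "{m:02d}/{d:02d}/{y}"},
    "de_DE": {"thousands": ".", "decimal": ",", "date": "{d:02d}.{m:02d}.{y}"},
}


def format_number(value, loc, places=2):
    conv = CONVENTIONS[loc]
    text = f"{value:,.{places}f}"  # Python's own notation, e.g. '1,234.50'
    # Swap the separators through a placeholder so they don't overwrite each other.
    return (text.replace(",", "\0")
                .replace(".", conv["decimal"])
                .replace("\0", conv["thousands"]))


def format_date(d: date, loc):
    return CONVENTIONS[loc]["date"].format(d=d.day, m=d.month, y=d.year)
```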
That way it would take the work of translating and put it on the client end. The software could try to deduce the language from the URL and for images it could use the ALT tag to come up with a translation for the image or it might be possible to use a combination image processing/ocr algorithm to try to distill the text from the graphic on the button.
It would be great if the software could, like I said, guess the most likely language, and let the user turn the translation on or off and choose the target language.
I mean, the technology is there, as evidenced by Babelfish; what would it take to turn it into a client-side program that transparently translates web pages into the viewer's native tongue?
This way, not only would pages that have the budget for translation be available to everyone, but every page would be, without the extra step of going to Babelfish (I realize that's not much extra work, but hey, people are lazy...).
- Start with the current /. codebase as a learning resource. I recommend this even though the Slash code is in Perl and somewhat complex, because /. relies so extensively on the MySQL database and user preferences to generate the page. Once you understand the concepts, you can implement your own user preferences related to language.
- Use templates stored in the database to build the page from a common code engine. For example, if my preference is "language=English" and another user's preference is "language=Espanol", then I can build a MySQL select statement that looks something like this: where the items in brackets {} are replaced by the web server code.
- Design templates to be multilingual. For example, the logo for /. is a graphic containing the English words "Slashdot News for Nerds. Stuff that Matters", and the image is part of a hypertext link. So the HTML for the image is like this: So what would a multilingual version look like? Well, for example, assume that the Spanish (Espanol) version should go to "http://www.Spanish.Slashdot.Org". Here's the same HTML except using token strings for the language-specific items: Then you write the PHP code to replace the tokens (the part between the number-sign pairs, ##_____##) with the user's preferences (which were probably stored in a client-side cookie). [Note: this example implies that you have images for the different languages]
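The token-replacement step might look like this in Python (the post describes it in PHP). The template, token names, image filenames and per-language values are hypothetical; the Spanish URL is the one from the example. Unknown tokens are left in place so missing translations stay visible.

```python
import re

TOKEN = re.compile(r"##([A-Za-z_ ]+)##")

# Hypothetical per-language values, e.g. chosen from a cookie.
TOKENS = {
    "en": {"HOME_URL": "http://slashdot.org/", "LOGO": "title.gif", "ALT": "Slashdot"},
    "es": {"HOME_URL": "http://www.Spanish.Slashdot.Org/", "LOGO": "title_es.gif",
           "ALT": "Slashdot"},
}

TEMPLATE = '<a href="##HOME_URL##"><img src="##LOGO##" alt="##ALT##"></a>'


def fill(template, lang, default="en"):
    values = TOKENS.get(lang, TOKENS[default])
    # Leave unknown tokens untouched so missing translations are easy to spot.
    return TOKEN.sub(lambda m: values.get(m.group(1), m.group(0)), template)
```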
If you understand these techniques, you are well on your way to creating a multilingual site, in my book.
Open Source isn't the only answer -- but it's almost always a better value than the alternatives...
They're all gonna learn English. Oh yes, learn English they must. Cause nearly all existing code and software is in...(drumroll)...English! And there's nothing the computer industry/world likes better than backwards compatibility and tradition.
So......
I believe the link was http://www.learnenglishfast.com/
Blar.
I think that, in any case, you need to make sure that any repeated design element occurs only once, so you never need to make the same change in multiple places to redesign the site. This seems sort of obvious, but it's hard to be more specific without more details about your project. I have always tended (when using SSI) to have the "actual" HTML file contain the content and pull in the design from a single template, and I have been very happy with this approach. You need to decide whether you are adding another decision layer to this (e.g. going from content+design to content+design+language) or just staying with content+design and making separate pages for separate languages.
========
<sig>Guvf vf abg n frperg zrffntr
For the question about buttons, your life would probably be easier if you used wordless icons accompanied by text; I personally don't think it makes much sense to use GIF/JPEG/PNG to represent text in any case. But if you really need to use graphics to represent text, make it part of the variable substitution: img src="$language$button_name.gif"
Accessibility should be a prime concern for every site. What on earth makes you think blind people have only a very limited range of interests? Do you think it's fine that web sites use plugins that are only available for Windows users, and that only a site like Slashdot should concern itself with plugins for Linux? Or would you agree that people using Linux have more interests than "news for nerds"?
-- Abigail
You don't. Try to imagine you are blind and need a speech interface, or that you have bad eye sight and need 48pt fonts to read something, and then be faced with a site that uses needless graphics for navigation, when written words would have done as well, if not better.
-- Abigail
Yes, php has a basic concept of objects, not incredibly powerful, but good for grouping together data and operations. Better yet, php can actually create and work with Java objects. Or better still, php can be embedded in a java servlet engine so that you can use php as a replacement for JSP in your servlets :)
Keep in mind, there are a lot of reasons to use one thing or another. If people don't want to use ASP because it's an MS product, who cares? The most vocal group are always the idiots. While it's nice to do this so easily, you need to look at other things here. For example, what would it cost for this person to switch to an NT/ASP solution? Consider a hardware upgrade, software licensing, and most importantly, the time required to learn a new platform *and* a new language. This is not an economical choice, and it may not be worth it just to solve the problem in two lines of code rather than 50.
Also note that with PHP, you can do this in just as few lines of code. Using gettext support, you simply create your translations and store them elsewhere; then, by wrapping your output strings in _(), they will be translated.
There are always multiple ways to solve problems, and each one has advantages and disadvantages. Don't think that people avoid IIS/ASP just because they hate MS. Those who do don't really matter in the grand scheme of things.
The thing to remember here is that PHP vs Zope is not a decision you will ever have to make. Zope is not a language; it's an application server. Python vs PHP is a comparison. If you want to talk Zope, you have to look at the PHP equivalent, Midgard (http://www.midgard-project.org/). Not that I have anything against Zope or Python; they are great tools. I just think that for the task here, PHP/Midgard are much better suited. Part of this is because I think PHP has a quicker learning curve, and let's face it, there's no sense in mastering a language just to create a multilingual site... Second, Midgard is much more suited to this type of thing. I suggest Zope to people who want to program a website, and Midgard to people who want to manage website content. Midgard is much more focused on content and using inherited styles and layouts, plus giving multi-user, web-based access to manage content and layout. For something like this, where you have the same layout and style just with different content, I think Midgard will really do this with less hassle and effort.
PHP4 can be built with gettext support. gettext is a GNU library for internationalizing programs. PHP's support is undocumented currently, so you'd have to check out the code to see what it does (in ext/gettext), but it might be worth looking into.
Gettext info and manuals can be found at http://www.gnu.org/software/gettext/
I have common files for page headers and footers, and separate language files for the body. The body files contain the HTML for language-specific graphics (the ones that contain text).
I then use a Perl script to combine the header + body + footer.
This is rather painful, though: there is a lot of redundancy, and page changes are tedious.
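A minimal Python version of that build step, assuming a hypothetical layout of src/header.html, src/footer.html and one directory of body files per language. The original uses a Perl script; the paths and function names here are illustrations.

```python
from pathlib import Path

def assemble(header, footer, bodies):
    """bodies maps (lang, page_name) -> body HTML; returns the full pages."""
    return {key: header + body + footer for key, body in bodies.items()}

def build(src, out, langs):
    """Read the shared pieces and per-language bodies; write out/<lang>/<page>."""
    src, out = Path(src), Path(out)
    header = (src / "header.html").read_text(encoding="utf-8")
    footer = (src / "footer.html").read_text(encoding="utf-8")
    bodies = {}
    for lang in langs:
        for body_file in sorted((src / lang).glob("*.html")):
            bodies[(lang, body_file.name)] = body_file.read_text(encoding="utf-8")
    for (lang, name), page in assemble(header, footer, bodies).items():
        target = out / lang / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(page, encoding="utf-8")
```

A call such as build("src", "site", ["en", "de", "fi"]) regenerates every language at once. Note that the redundancy complained about above lives in the body files themselves, which this step does not remove.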
"First off you've got to determine what language the end user wants to view the site with. This can be done multiple ways: client hostname, browser version (language), server hostname (ie. japan.bigcorp.com), or by hitting a button or link on the page."
Client/server hostname is a bad idea. I'm Dutch but I live in Sweden, and when I visit www.lycos.com I'm presented with the Swedish version of the site. My browser settings (which favor Dutch over English and English over Swedish) are ignored. As a consequence, I don't use Lycos anymore. There is of course an alternative link I can click to get the Dutch or English version, but that's too much trouble for me.
The best way, I think, is to just ask the user what language he or she prefers. Even if English is not your native language, it is sometimes the best option, since the English version of a site is updated more often.
Jilles
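Following that advice, a sketch that honors an explicit user choice first, then the browser's Accept-Language header, and never the hostname. Python is used instead of PHP; where the explicit choice comes from (a cookie, a link, a query parameter) is left open, and the function names are hypothetical.

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language header, best first."""
    prefs = []
    for position, part in enumerate(header.split(",")):
        pieces = part.strip().split(";")
        tag = pieces[0].strip().lower()
        if not tag:
            continue
        q = 1.0
        for param in pieces[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:  # q=0 means "not acceptable"
            prefs.append((-q, position, tag))
    # Highest q first; ties keep the order the browser sent them in.
    return [tag for _, _, tag in sorted(prefs)]

def choose_language(supported, explicit=None, accept_language="", default="en"):
    """Explicit user choice wins, then browser preferences, then a default."""
    if explicit in supported:
        return explicit
    for tag in parse_accept_language(accept_language):
        if tag in supported:
            return tag
        primary = tag.split("-")[0]  # "nl-be" -> "nl"
        if primary in supported:
            return primary
    return default

print(choose_language({"en", "nl", "sv"}, accept_language="nl,en;q=0.8,sv;q=0.5"))  # nl
```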
Perhaps the icons are logical, but when given mystery meat navigation, she is most likely confused. That's the thing with icons, they have to be targeted, and they have to mean the same thing to all people. This fellow is talking about an international effort, and therefore must not only translate, but localise his content.
Localisation involves matching culture to the point that an ad doesn't give the wrong impression.
Lowmag.net
It is unfortunate that they picked this name. The name is currently used by the WAP Forum for their Wireless Markup Language.
I figure they are aware of this.
Lowmag.net
It would make a lighter light mode... I think I'll get the slash code and make up some patches. (dusts off perl books) Or maybe I'll try out the code to some of the php-based systems. I'd actually like to see this all run from XML+XSLT on the backend to generate a better experience based on client. This would rock.
Lowmag.net
That's CSS2, and it isn't really necessary. I rarely find a use for tables except for placing a table of data on a page. For layouts, I set borders in CSS and keep it simple. You could, however, use a span or division to achieve a modest sidebar. The tr/td combinations in Slashdot's layout are superfluous if you use nested div tags. You define two CSS classes:
div.top { margin-left: ...; }
div.sub { margin-left: 1em; } /* for every nested thread */
The result is a set of nested articles, much like the slash code.
<div class="top">top thread text
<div class="sub">
second thread text
</div>
<div class="sub">
third, parallel thread text
</div>
</div>
No table/tr/td combinations, just divisions. The Slashdot front page would be defined as four divs: one at the top for the ads and logo, one for the left column, one for stories, and one for the right column. Without CSS, everything would fall straight down the page, and as such be available on next-generation WAP devices.
Lowmag.net
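If the comments live in a tree structure, the nested-div markup shown above falls out of a short recursive function. A sketch in Python; the dict shape {"text": ..., "replies": [...]} and the render_thread name are assumptions.

```python
import html

def render_thread(comment, css_class="top"):
    """Render a comment and its replies as nested <div>s (no tables).

    comment is a dict {"text": str, "replies": [comment, ...]}; replies
    are optional. Every reply gets the indented "sub" class.
    """
    inner = "".join(render_thread(reply, "sub") for reply in comment.get("replies", []))
    return f'<div class="{css_class}">{html.escape(comment["text"])}{inner}</div>'

thread = {
    "text": "top thread text",
    "replies": [{"text": "second thread text"}, {"text": "third, parallel thread text"}],
}
print(render_thread(thread))
```

Because a deeper reply sits inside its parent's div, the 1em margin accumulates at each level, which is what produces the indentation.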
Damn straight! That could have come straight out of my personal design theory! Tables are logical elements for organising data, not a tool to lay out your web page! I couldn't have put this better myself! Maybe if people paid attention to content, the Internet would be a far better place. I suppose I'll go back to dreaming.
PS: I tried to come up with an alternate layout for /., and made it in CSS. Not only did it look sweet, but it was changeable from a stylesheet! Whoot!
Lowmag.net
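Wo xue ci zhong wen duo tien. wo zhen xi huan. Ni bu xi huan zong wen? ching si, geng men!! (Romanized Chinese, roughly: "I've been studying Chinese for many days. I really like it. Don't you like Chinese?")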
Read The ReflectionEngine, a cyberpunk style n
RXML, used by the Roxen WebServer, has all the features of WML, but Roxen translates it almost in real time (and, unlike WML, RXML is now fully XML-compliant).
Roxen also can create buttons on the fly, and contrary to what you say about gfont, it has no problem with different button sizes.
greetings, eMBee.
--
Gnu is Not Unix / Linux Is Not UniX
Roxen has an extensive SSI language (RXML) that allows you to build powerful sites.
E.g. buttons can be created by the server with the text you specify (like <gtext>click here</gtext>); you can also use a background image and thus put any kind of text on top of any image.
Of course, separation of content and design is key.
In order to achieve 100% separation, I wrote an XML template module for Roxen that allows me to specify the content in XML and then apply a template to show it.
Roxen also makes it easy to check for the client's default language, so you could easily select the language based on that.
The most important thing is that it all looks like HTML to the web author. There is no programming involved.
Sure, PHP is nice, but if you don't know about programming languages, it will look very confusing. It's also very easy to screw up, because a non-programmer doesn't notice that a ; or , is missing, which creates a syntax error in the PHP code.
greetings, eMBee.
Think of an XML item as a collection of labels and values for each label. If what you have is a HOME_PAGE item, it can contain values for WELCOME_ENGLISH, WELCOME_FRENCH, and WELCOME_AMERICAN. The server knows which language it should use, and you have your XML-based server program select the appropriate text for that language.
Yes, XML entries can contain sub-categories. So that might actually be implemented with the actual texts buried within several levels of labeled categories. (I forget the actual XML terminology, but basically everything is labeled, and new data structures can be defined which contain other data structures)
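A sketch of that idea with Python's xml.etree.ElementTree. Rather than separate WELCOME_ENGLISH/WELCOME_FRENCH labels, this variant marks each text with a lang attribute. The element names, the attribute name, the lookup helper, and the English fallback are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical content file: one item, one text element per language.
CONTENT = """<site>
  <item name="HOME_PAGE">
    <welcome lang="en">Welcome!</welcome>
    <welcome lang="fr">Bienvenue !</welcome>
    <welcome lang="de">Willkommen!</welcome>
  </item>
</site>"""

def lookup(root, item_name, label, lang, default_lang="en"):
    """Return the text of <label lang=...> inside <item name=...>."""
    item = next((el for el in root.iter("item") if el.get("name") == item_name), None)
    if item is None:
        raise KeyError(item_name)
    texts = {el.get("lang"): el.text for el in item.findall(label)}
    if lang in texts:
        return texts[lang]
    return texts[default_lang]  # no text in this language yet

root = ET.fromstring(CONTENT)
print(lookup(root, "HOME_PAGE", "welcome", "fr"))  # Bienvenue !
print(lookup(root, "HOME_PAGE", "welcome", "fi"))  # Welcome! (no Finnish text, falls back)
```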
We've been successfully using the method of storing strings that need to be translated in arrays on some sites. An example here would be StoneJobs.com.
The strings used in either the layout or applications are handled in the following way (index 0 is Finnish, 1 is English):
$welcome[0] = "Tervetuloa!";
$welcome[1] = "Welcome!";
echo $welcome[$lang];
We're using Midgard (the PHP-based application server) there, so content is easily separated into languages by using an extra database field.
There have been plans for creating a better system for handling localization within Midgard, but we're still waiting for ideas on that.
--
Midgard Project - Open Source CMS
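The same array lookup can be keyed by language code instead of a numeric index, which makes a missing translation easy to spot and to fall back from. A Python sketch; the MESSAGES table and the msg helper are hypothetical.

```python
# Keyed by language code rather than by position ($welcome[0], $welcome[1]).
MESSAGES = {
    "welcome": {"fi": "Tervetuloa!", "en": "Welcome!"},
}

def msg(key, lang, default_lang="en"):
    """Return the UI string for lang, falling back to the default language."""
    translations = MESSAGES[key]
    if lang in translations:
        return translations[lang]
    return translations[default_lang]

print(msg("welcome", "fi"))  # Tervetuloa!
print(msg("welcome", "sv"))  # Welcome! (no Swedish entry yet)
```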
Not only when it comes to translations, but content-layout in general. PHP is nice and dandy, but somehow it is really hard to separate content and layout, or content and language support/translations for that matter.
You might want to check out Midgard then. It is a Web application server that uses PHP as its scripting language. While it isn't that much better in internationalization, it at least has good support for separating layout, structure and content into different components.
You're a genius. I say so because we're doing the same thing :-).
I am consulting at a "Top 50 Worldwide" website. We currently have seven international versions of the site, and almost three hundred "private label" versions. By using the tagged document/swap in messages from database approach, we gain a lot of flexibility in what content shows up on a page. Also, we can do this at the time we build out the site, not when the pages are viewed.
We have the notion of a channel that has a language and country associated with it. Every bit of text on a page is replaced in the ASP files with a tag delimiter. Each of these "messages" is substituted when the page is built out by a Perl script. You can refer to a "message" by a number or by a human-readable text alias. We also include functions from a library in each page as needed. Using this message approach, we can do translated text, or conditional logic per channel inside ASP. We built an HTML GUI to allow people from around the world to log in and translate messages. After translation, each site is built out and the correct messages appear when viewed by users.
If we tried to swap in all this localized text on the fly, we'd be dead. We do too much volume. Make your pages flat html that gets built out every few hours whenever possible. If you absolutely need dynamic things on a page, minimize them and make sure that your SQL queries rip. Cache lookups in server level variables (Application("whatever") in ASP) and build a mechanism to flush the cache when you want changes to take effect. Make as many things static as you can, and resolve them at build time as opposed to run time. Make flat HTML when you can, and only make dynamic pages when you need to.
Though in our case we use ASP and a Perl script to swap in the messages, this technique could be applied to any dynamic page server and scripting language, as long as its source code is in text. You could even do it with JSPs.
Just a validation of your post. It is a good system, it's extensible, and it helps performance.
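The build-time message substitution described above might look roughly like this in Python. The {{msg:alias}} delimiter, the build_page helper, and the fallback table are assumptions, since the post doesn't give the real tag format; numeric message IDs match the same pattern as aliases.

```python
import re

# Hypothetical delimiter syntax: {{msg:welcome}} or {{msg:1042}}.
MSG_TAG = re.compile(r"\{\{msg:([\w.-]+)\}\}")

def build_page(template, messages, fallback):
    """Replace every message tag with the channel's text at build time.

    messages: alias -> text for one channel (language + country).
    fallback: alias -> text used when the channel has no translation yet.
    """
    def substitute(match):
        alias = match.group(1)
        if alias in messages:
            return messages[alias]
        return fallback[alias]  # a KeyError here flags a tag with no text at all
    return MSG_TAG.sub(substitute, template)

print(build_page("<h1>{{msg:welcome}}</h1>", {"welcome": "Willkommen"}, {"welcome": "Welcome"}))
# <h1>Willkommen</h1>
```

Running this once per channel and writing the results to disk gives the flat HTML the post recommends; nothing is looked up when a user requests the page.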
-A
"I suggest Zope to people who want to program a website, and Midgard to people who want to manage website content."
;-)
It sounds like you think Midgard is better
Well, the whole point of all these systems is to extend HTML with some kind of dynamic behavior; if you define this as "programming", then they all require some kind of programming. The basic scripting language of Zope is DTML; it works like practically every other system like this: you write HTML and decorate it with script directives, just like PHP. If anything, DTML is simpler than PHP, albeit less powerful. You do have the power of Python behind the system if you need it, but many if not most people will use Zope's capabilities blissfully unaware that Python even exists.
This midgard project looks interesting, but the screenshots were not very informative (except to say the developers perhaps have a little more aesthetic sense than most). I'd be interested in your unvarnished opinions on midgard, and I'll give you my own unvarnished opinion of Zope.
First the good stuff.
Zope's first great strength is that it is an object publishing system. It allows you to reuse not only display templates, but also database queries and useful bits of logic (e.g. converting lists of URLs into horizontal or vertical link menus). These things have clear usefulness for database-oriented applications, but they also have a great deal of usefulness for things which are normally done through static inclusion. For example, I can define a table of links, and then define logic to transform that table into horizontal or vertical menus. Of course this is easy to do in PHP, but the neat thing about Zope is that it is simple and natural for a document to inherit the list of links from a higher folder; to override the list of links but use them in the same way as the document template; to reuse the same transformation logic for different purposes. Of course, you can do all this stuff using low-level scripting systems, but it's a lot of work to make it happen, whereas in Zope it is simple, natural and automatic.
For database work, it manages persistent connections very nicely, and by providing objects for database connections, SQL statements, and transformation into various presentations, it allows you to plug all these things together tinker-toy fashion.
Because of this reusability, there's a lot of terrific stuff that's already been done that you can simply grab and plug into your website, such as a Slashdot-like discussion forum (minus moderation, alas).
Now the bad stuff.
The documentation is pathetic. It is obviously written by people who know the internals of the system and have since the beginning of time; they don't really remember what it is like not to know. In fact, the user guides sometimes make use of undocumented internals. Of course you have the source code, but that's really a last resort.
I often say there are two kinds of people who read documentation. One are hands-on people who like to see examples and generalize from them, and the others are abstract folks who like to see principles and specialize from them. The documentation doesn't really serve either. The step-by-step examples are somewhat obsolete, and if you follow them they don't always work with the latest stuff. Sometimes this is because the documented method has been superseded by newer methods, which are explained in a vague, hand-waving manner that will infuriate abstract thinkers and do nothing for hands-on people.
This truly awful documentation will be a show-stopper for many people. You have to beat your head against the system for a while.
Speed is not exactly stellar. Python has time and time again shown its ability to handle incredibly complex logic, but it is SLOOOW. Zope is surprisingly fast considering that it is written in Python and practically every page is parsed with some dynamic behavior. There are some moderate-volume techie sites, like Bruce Perens's Technocrat site, running Zope, but you aren't going to scale to Slashdot's or Amazon's scale.
That said, I chose Zope because it works very well, once you learn your way around it. I don't know half of it, and there's no way of even learning half of what it can do without going to the source code, unfortunately, but what is documented is great.
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
Whoever moderated this up should know that Zope is a Python application and PHP3 is a language, just like Python. There are applications similar to Zope that use PHP3. But comparing an application to a language is just not done.
Well, OK, what if I were to compare Zope Document Template Markup Language (DTML) to PHP3; would that take the knot out of your knickers?
Seriously, if you'd like to share your knowledge of PHP based application servers, I'm all ears. There's something new every day, I'm just sharing what I learned last year when a lot of alternatives weren't there yet.
Don't get me wrong, I like PHP a lot, but its greatest strength is for people who like to work in HTML but make it smart. You can separate content and presentation in PHP, because it is a reasonably powerful language, but doing so throughout your site means turning every page in your site into a program that emits the content in the desired language. Once you go there, you may as well consider other options for transforming content.
There are lots of ways people have come up with for doing this kind of thing. You could do each translation as an XML document and use XSLT to convert it into a fully decorated HTML document. You could use Java servlet chains to take a simple HTML translation and add the usual banners, links and formatting. This would be nicer than transparently dynamic content, because it could be cached by the user and downstream proxies.
My current favorite method is Zope. In Zope, nested folders inherit all the characteristics of their parent folder, but allow you to override them. I use this to enforce stylistic uniformity between people maintaining content on my site, which is just a generalization of your problem.
With Zope, you could do your original site in German and put it in a folder called "/foo"; then create an empty folder called "/foo/EN", which by "acquisition" starts out looking identical to "/foo" even though it's blank. Then you start overriding the various text bits with English, and gradually, an English translation emerges. You could even write a little method to iterate over a bunch of links and append "/EN" to them.
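To make the acquisition idea concrete, a toy Python model: a folder answers a lookup from its own attributes if it can, otherwise it asks its parent. This is not Zope's API; the Folder class and attribute names are hypothetical and only mirror the lookup rule described above.

```python
class Folder:
    """Toy model of acquisition-style lookup (not Zope's actual API)."""

    def __init__(self, parent=None, **attrs):
        self.parent = parent
        self.attrs = dict(attrs)

    def get(self, name):
        # Walk up the folder chain until some folder defines the name.
        folder = self
        while folder is not None:
            if name in folder.attrs:
                return folder.attrs[name]
            folder = folder.parent
        raise KeyError(name)

foo = Folder(title="Willkommen", footer="Impressum", logo="logo.gif")  # "/foo", German
foo_en = Folder(parent=foo)          # "/foo/EN" starts out identical to "/foo"
foo_en.attrs["title"] = "Welcome"    # override the text bits one at a time

print(foo_en.get("title"))  # Welcome
print(foo_en.get("logo"))   # logo.gif (acquired from /foo)
```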
The main issue with zope is scalability, since everything is dynamically generated. I use squid as a reverse proxy to cut down on dynamic page generation overhead. This also turns out to be an easy method for multihoming zope, which is a requirement that I have.
How about buying/licensing/downloading/getting the Babelfish technology or some other "on-the-fly-translation" software and stick it up as your front door? Then you just write in whatever language you like and let the translation software do the rest...
LOAD "SIG",8,1
LOADING...
READY.
RUN
you probably want to read this article on the real complexity involved in this sort of localization, and the pitfalls inherent in the sort of substitution you suggest.
suffice it to say, it's a nightmare for anything beyond nice and friendly iso8859-1. the author uses an amusing anecdote about a simple localization of a simple feedback message into Chinese, Arabic, Russian and Italian...
- mark
For those of us living outside the US, this is not a new problem. When it comes down to it, the easiest approach is probably to use the first option: real HTML, with the multilingual text pulled in as includes. This way, when you are trying to fix a page a few months from now, anyone who understands HTML will be able to do it, and you will even be able to use Dreamweaver and the like. The other option leads to only the original developers being able to really understand what pieces are in which files and where to change the HTML to get that dang table to align correctly.
How you do it is going to depend a great deal on how you get your content.
For example, is your content mainly user-contributed, or does it come from professional writers? Where the stuff comes from is going to make a big difference as to how it gets translated. User-submitted stuff could probably be run through Babelfish in pseudo-real time (a daemon which queues requests, uses wget or lwp-request to have them translated, and then sticks them into a database). On the other hand, professionally submitted content should probably be treated a little better than simply running it through Babelfish; hire some native speakers as translators.
The way I would do it is something like this (keep in mind that I am a Perl/C programmer specializing in Apache/mod_perl; some of the things I mention are inaccessible to PHP, like translation handlers):
Retrieving the data and putting it onto the page would be the easy part... once you have the information stored in your database in the appropriate languages, that is. Designing the database so that you have as little unnecessary redundancy as you can while still ensuring that all of your content is available in all the required languages will definitely be a challenge, but it's an architecture problem, not a programming problem.
Good luck. I, for one, would be interested in hearing how you make out and what track you decide to take.
darren
Cthulhu for President!
(darren)
I run one site that is accessed from all over the world, and I can tell you that 90% aren't using a 4.0 browser (it's only about 1/3, and only those from well-off countries), and judging by how long it takes to transfer the pages, the connection between here and there is often way slower than 33k.
If each page is a collection of bits and tags from three separate databases, you'd have to write tons of fancy administration code. It also sort of limits the ability to use other web technologies in tandem with your site, as well as the ability of anyone who isn't an experienced programmer to modify the content layout, since everything has to fit into a very restrictive and complicated architecture.
I haven't used PHP before, but here's the basic idea, not specific to any programming language:
build your site in English (or your default language), assuming we are talking about regular static HTML files or some kind of .shtml perhaps. Write a filter for your webserver (I use java servlets mapped to *.html myself) that parses the path info of the request for something like "/german/foo/bar.html", i.e. the language identified in the beginning of the URI. Then within your HTML files, anywhere you want translated text, do it like this:
Then within your filter servlet or Apache module or whatever, parse the file for these tags (caching schemes can be utilized for speed), and then, based on the language encoded in the URL, dynamically replace the body text (if the URI-specified language is not the default) with the contents of a file <languagebase>/foo/bar/mytext.html. So you could have something like /web/translations/german/foo/bar/mytext.html, /web/translations/spanish/foo/bar/mytext.html, etc. If a translation file is not present, then you just use the default text already in the document, so you can still launch new HTML pages even if a translation is not available yet.
If you want to raise the bar on speed, don't use a dynamic filter; just write a Perl script to regenerate an entire static site underneath "/german", "/spanish", etc. using the same scheme. Or you could even mix the two approaches.
If your site is not static HTML but some kind of database driven thing, you can still use a similar approach, it just means the filtering program has to be molded to fit your content-delivery environment.
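A rough Python sketch of the filter's core logic. The example tag from the post was lost, so the <translate id="..."> syntax, the LANGS set, and the function names are assumptions; caching and hooking the code into a servlet or Apache module are left out.

```python
import re
from pathlib import Path

# Hypothetical tag syntax: <translate id="mytext">default text</translate>
TRANSLATE_TAG = re.compile(r'<translate id="([\w-]+)">(.*?)</translate>', re.S)
LANGS = {"german", "spanish"}  # language prefixes; the default language has none

def split_language(uri):
    """'/german/foo/bar.html' -> ('german', '/foo/bar.html'); no prefix -> (None, uri)."""
    parts = uri.lstrip("/").split("/", 1)
    if len(parts) == 2 and parts[0] in LANGS:
        return parts[0], "/" + parts[1]
    return None, uri

def translate_page(uri, page_html, base):
    """Swap tagged default text for <base>/<lang>/<page path>/<id>.html when it exists."""
    lang, path = split_language(uri)
    if lang is None:
        # Default language: strip the tags and keep the text already in the page.
        return TRANSLATE_TAG.sub(lambda m: m.group(2), page_html)
    page_dir = Path(base, lang, path.lstrip("/")).with_suffix("")  # .../german/foo/bar

    def replace(match):
        candidate = page_dir / f"{match.group(1)}.html"
        if candidate.is_file():
            return candidate.read_text(encoding="utf-8")
        return match.group(2)  # no translation yet: fall back to the default text
    return TRANSLATE_TAG.sub(replace, page_html)
```

The same function could drive the static-regeneration variant: call it once per language for every page and write the results under "/german", "/spanish", and so on.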
For some time now, I have been using webmacro as my templating engine and have found it to be extremely powerful, especially in situations such as these. Webmacro's main design approach is based around the "Model/View/Controller" paradigm where the back-end data set is your model, the templates are your view and the webmacro based servlet is the controller that ties everything together. The advantage of a system such as this is that you never need to include any language awareness into the template at all, the only thing that would matter would be the abstract structure of the document in question (which can be broken down into whatever granularity you would prefer to use - pages/sections/paragraphs etc etc).
The controller would then be responsible for deciding which page and which language version was required, either by virtual directories (/english/category/doc.html) or by file extensions (/english/doc.en.html). The category would define the choice of template, and the language would modify the query to a relational DB. However, the information passed to the template is packaged by structural role rather than by language, thereby enabling the template to render the page completely independently of language.
In order to deal with the generation of text buttons (if needed), I would use Java to build a Scheme module that is then run through GIMP. I've had a go at various other batch-based manipulation packages, but haven't yet found anything that gives you the range of freedom and quality that GIMP does.
If the data set that is being used to generate this site is static then there is absolutely no reason to generate the site dynamically on the fly and it is better to cache the whole site onto a static file system (unless it is prohibitively large) and then just rebuild it whenever the data set is updated.
The other final advantage that using a system such as webmacro gives you is that it is possible to very easily test out your navigational systems without having a large data set of information already stored in the db. If you replace the model with a virtual model that supplies random text within the contexts requested then it is very easy to test out situations like "Does this tree navigation work as well with deeper trees and more than 5 languages?" without changing either your servlet or your template design.
The only Good System is a Sound System
This is exactly what has been puzzling me for a while. Not only when it comes to translations, but content-layout in general. PHP is nice and dandy, but somehow it is really hard to separate content and layout, or content and language support/translations for that matter.
I messed around with things and tried a few different approaches though none of them really satisfied me.
PHP has some mechanisms available, such as FastTemplate.php3, though an XML-based approach (with separate stylesheets) would probably be the ultimate way to separate content from layout. Too bad XML isn't fully supported yet. There is http://xml.apache.org/, but I haven't really checked that out, as I didn't find any literature about combining xml.apache and XML on the PHP scene.
I'm not sure what the best approach would be.
-- Mentor
haha haha haha
...Or I'll beat you. No kidding.
Seriously, check out Mozilla's approach with XUL and I18N. They separate out the text using entity substitution, and the rest using CSS. And that's a very basic, moderately unsafe way to go about things. A more intelligent way would be to have one XML doctype for your basic document, an XML doctype for your content (of which you would have n instances, one for each of your n languages), and XHTML for your formatted result. You would also have two XSLT transformations: one to merge in the I18N, and another to merge in the HTML design.
The next release of Apache Cocoon is expected to be very efficient in terms of XML/XSLT processing, but I don't know how it stacks up in comparison to hardcoded PHP.
Just keep it easy. I'm sure the person wouldn't mind making one extra mouse click to view the page in their native language. Just allow them to choose a language and have different pages for each one. That way you just have a lot of copying and translating to do instead of coding.
"When will this FP stuff stop?" "After the great growing..." "The great growing?" "Yea, when people grow up."
Thanks to built-in support in NT for Unicode, and heavy support in IIS/ASP for multi-language applications, it's cake.
Basically, you set a "region code" at the beginning of each page in ASP, and then simply supply your multi-language content from a separate location. Each page's content is determined by this region code.
Wow... that takes about 2 lines of code, and works around the world perfectly. I bet I will get nothing but flames for this post because it uses an MS technology. Funny... I remember the days when people used things not because of who made them, but because of how well they did their job.
Memories...
Check it out.
--w
E V E R Y T H I N G I W R I T E I S F A L S E
I've developed software that is to be used with multiple languages.
You can see the software running at http://beta.infopop.net.
It uses XSL and XML to render each page. Users are able to upload their own XSL with different languages if they wish. This is the 'template' approach mentioned by many other posters. Contrary to what has been said about XSL performance however, I've found it works well.
Also, we use a database to store keyed messages in different languages. Each message is requested by key and looked up by the language of the site it's being used on.
The only problem? Getting Oracle to swallow UTF-8 characters. We're having a demon of a time. If anyone has worked with Oracle, Java and Unicode, I'd love to hear from you! peter @ infopop . com
three words: don't use babelfish.
Since I'm in my local university's Comp Sci databases class, and have just tackled this problem in the workplace, I can tell you the most poetic way is to use a database.
You will need 4 simple tables:
Languages: a one-field table of languages ("English", "Deutsch", etc.)
Strings: IdNum (as text, like a #define constant), and a field for your native language, i.e. "Click Here"
Finally, a third table, Translations, containing the IdNum (references Strings.IdNum), the language (references Languages.language), and finally the translated text.
Of course, you may want to use integers instead of strings, but I like them this way. They are easier to read, etc. Your image problem is the same as the last table, but you store blobs instead of strings.
And one function, translate($id, $language),
where a typical call would look like:
translate("Click Here", "English")
would execute:
select phrase from translations where language='$language' and idnum='$id';
and return that.
then you can make cool online forms for the translator to use =) Please let me know if you use this. (Just curious)(Comments also welcome)
There are reasons for 3 tables, but I will not go into my exact mental model or DB theory here.
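A self-contained sketch of the three text tables and the translate() lookup, using Python's sqlite3 module. Table and column names follow the description where possible, and the CLICK_HERE id is made up for the example. The fallback to the native string and the use of bound parameters (instead of pasting $language and $id into the SQL) are additions, not part of the original design.

```python
import sqlite3

SCHEMA = """
CREATE TABLE languages (language TEXT PRIMARY KEY);
CREATE TABLE strings (idnum TEXT PRIMARY KEY, native TEXT NOT NULL);
CREATE TABLE translations (
    idnum TEXT REFERENCES strings(idnum),
    language TEXT REFERENCES languages(language),
    phrase TEXT NOT NULL,
    PRIMARY KEY (idnum, language)
);
"""

def translate(db, id_, language):
    """Return the translated phrase, or the native string if none exists yet."""
    row = db.execute(
        "SELECT phrase FROM translations WHERE language = ? AND idnum = ?",
        (language, id_),  # bound parameters, so quotes in input can't break the SQL
    ).fetchone()
    if row is not None:
        return row[0]
    row = db.execute("SELECT native FROM strings WHERE idnum = ?", (id_,)).fetchone()
    return row[0] if row else id_

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.executemany("INSERT INTO languages VALUES (?)", [("English",), ("Deutsch",)])
db.execute("INSERT INTO strings VALUES ('CLICK_HERE', 'Click Here')")
db.execute("INSERT INTO translations VALUES ('CLICK_HERE', 'Deutsch', 'Hier klicken')")

print(translate(db, "CLICK_HERE", "Deutsch"))  # Hier klicken
print(translate(db, "CLICK_HERE", "English"))  # Click Here (falls back to the native string)
```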
My company does ASP business, so we wanted to be able to support both multiple languages and multiple brands (skins, essentially) on top of the same site code. Our solution is JSPs with method calls conditionally including all brand- and language-specific content.
This makes the site wonderfully customizable, but the result is lots of little HTML fragments in lots of scattered subdirectories. I.e., it is a nightmare for the non-technical who generate content. And maintaining it as the site changes is nigh impossible.
XML-based solutions (like Cocoon) are elegant, but they require skills that nobody has yet (XSL, etc.). Does anybody have a solution that gracefully degrades for non-technical minds?
For my site, we use content "templates" in PHP. Ours is a bit more complicated because we are database-backed, but the concept is the same. I just teach our designers to use a syntax like this:
<? $this->show( "foo" ) ?>
which gets include()ed in the context of an object that holds data associated with "foo."
For your application, if you don't want to go with an OO design (which you should, IMHO), you could just do this. Define a variable like $lang which can be one of "en", "de", "fr", and then every time you have language-dependent content, just do:
<? include( "foo-$lang.ihtml" ) ?>
Just make sure every PHP file shares a common header that sets $lang appropriately. To do this the even easier way, just make that part of your auto-prepend setting in php3.ini.
(If anyone is interested, we're thinking of open-sourcing the code for our site, which would make this OOD template system available for all db-backed sites. Let me know if this is something that there is an actual need for in the PHP world.)
There's an article on JavaWorld about how to do this using JSP. I have no idea how PHP works, but maybe the same ideas apply.
As a fellow web application developer, here's a simple approach that I think would be a pretty good solution:
Have a per-user configuration file with a selected language as one of the preferences.
Then, load a text (or library) file that has all the corresponding 'text' information for the application according to the users preferences.
This way your code base stays small, and you can debug config files a little easier.
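A minimal sketch of that idea in Python, assuming the per-user preferences live in an INI-style file with a [preferences] section and the text file uses simple "key = text" lines; both formats are hypothetical choices, not something the poster specified. The functions take file contents as strings so they can be tried without touching disk.

```python
import configparser


def user_language(config_text, default="en"):
    """Read the 'language' preference from a per-user config file's contents."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # fallback also covers a missing [preferences] section
    return parser.get("preferences", "language", fallback=default)


def load_texts(strings_text):
    """Parse a 'key = text' strings file into a dict, skipping blanks and comments."""
    texts = {}
    for line in strings_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        texts[key.strip()] = value.strip()
    return texts
```

The application code then only ever refers to keys such as texts["greeting"]; adding a language means adding one strings file, not touching the code.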
~Marshall
Plus you get the benefit of not having to remember to recreate the set of images each time. Just update the text somewhere on the site (config file or otherwise).
I've been doing some work for the Gov't of Canada and they need bilingual stuff so this is right up my alley. My current design works on "keywords" for "phrases" used in the page. A "keyword" is a design reference used by the programmer to identify phrases, which are one or more words.
Keywords are put into a table with a unique id, a keyword name, a group id (which references a group name, i.e. a page, in another table), and any number of translation columns, each named after its language.
Then if I wanted the word "apple" on a page, I just do "SELECT * FROM WHERE keyword='kApple'" and then use the "english" column if I'm on an English page, etc.
If you want to retrieve more than one keyword translation at once, you can use an ordering system for each group so that you can just .moveNext (or equivalent) through them. I use this mostly on forms, so when I join the translation table to the data table, the ordering is done according to the data table and I don't have to worry about that.
I also made my own admin tools for the languages so that I could look at a group of translations (for one page, let's say) and make sure they are done correctly, that they are all there, and that they are in the right order for each language.
If you start to get fancy, you can do all of your translation with one query that returns many phrases, though most of my pages have 2 or 3. Also, if you limit the amount of dynamic data you have in your translated text, you can save a lot of time.
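A minimal sketch of the keyword table and a one-query-per-group lookup, using Python's sqlite3 module so it runs anywhere. The table and column names (keywords, group_id, sort_order) and the sample rows are illustrative assumptions; the post only describes the fields in prose. Because SQL cannot bind a column name as a parameter, the language column is checked against a whitelist before it goes into the query.

```python
import sqlite3

LANGUAGE_COLUMNS = ("english", "french")  # one column per language, as in the post


def create_keyword_table(conn):
    """Keyword table plus a few sample rows (hypothetical data)."""
    conn.execute(
        "CREATE TABLE keywords (id INTEGER PRIMARY KEY, keyword TEXT UNIQUE, "
        "group_id INTEGER, sort_order INTEGER, english TEXT, french TEXT)"
    )
    conn.executemany(
        "INSERT INTO keywords (keyword, group_id, sort_order, english, french) "
        "VALUES (?, ?, ?, ?, ?)",
        [("kApple", 1, 2, "apple", "pomme"),
         ("kPear", 1, 1, "pear", "poire"),
         ("kSubmit", 2, 1, "Submit", "Envoyer")],
    )


def phrases_for_group(conn, group_id, language):
    """Fetch every phrase of one group (e.g. one page) in a single ordered query."""
    if language not in LANGUAGE_COLUMNS:
        raise ValueError(f"unsupported language: {language}")
    # The column name comes from the whitelist; the group id is a bound parameter.
    sql = (f"SELECT keyword, {language} FROM keywords "
           "WHERE group_id = ? ORDER BY sort_order")
    return conn.execute(sql, (group_id,)).fetchall()
```

With the sample rows, phrases_for_group(conn, 1, "french") returns the pear entry before the apple entry because of sort_order, which is what lets a page step through the results in a fixed order.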
I would recommend against using a syntactical system with a parser only because this can be fairly computationally intensive on a web server.
----- rL
True, this is the "easy" way - but it's a nightmare to maintain.
If you support n languages, your maintenance time is multiplied by a factor of n. Obviously, you want to use the same logic and formatting tags but with different words and phrases. This may be harder on your server, but it'll be easier to maintain, and you'll have fewer bugs and inconsistencies between languages.
This is why people are talking about querying the info from databases and using include files. Even include files are not a good way to go for everyone: an ASP page, which doesn't support dynamic filenames in includes, will get bogged down if you give it a page with 5 languages translated on it. PHP doesn't have this problem as far as I know.
Cheers.
----- rL
I've never really worked on a large project, but I've done some multiple-format designs, and essentially I like to keep content separate from design. Try to keep graphics-based text to a minimum, and if you have to use it, use a small GIF. What I did was keep the multiple images in subdirectories of /images (where the main text-free graphics are stored), like /images/en, /de, /fr or something like that, and when you refer to the images, use a variable to fill in the blank: /images/*variable*/filename.ext. But make sure everything works, heh; nothing pisses me off more than doing a search and seeing a damned cfm/asp/php/cgi error. It's always easier (and faster) to import text into a file than graphics.
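A small Python sketch of the /images/<language>/ scheme described above, assuming English as the fallback when the language variable holds something unexpected; the file names in the example are made up.

```python
SUPPORTED = ("en", "de", "fr")
DEFAULT = "en"  # assumed fallback language


def image_url(filename, lang=None, base="/images"):
    """Text-free graphics live in /images; graphics with text in /images/<lang>."""
    if lang is None:
        return f"{base}/{filename}"
    if lang not in SUPPORTED:
        lang = DEFAULT
    return f"{base}/{lang}/{filename}"
```

So image_url("button_search.gif", "de") gives /images/de/button_search.gif, while a shared graphic such as image_url("logo.gif") stays at /images/logo.gif.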
I don't know PHP, but here's a simple, language-independent solution to your problem:
Create an include file that defines an array (someArrayOfText) for each piece of multilingual text. Make element 0 the english text, 1 german, 2 spanish, etc. Use a cookie to store the user's language preference (intLanguage = someInteger).
Finally, in the code, display someArrayOfText[intLanguageFromCookie] instead of flat text.
This is not a great technique for very big sites, but for an intranet or small public web site, it involves the least overhead. It also allows all of your translation to be done in one place.
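The same array-plus-cookie technique, sketched in Python for illustration. The cookie name intLanguage and the 0/1/2 index convention come from the post; the example strings and the fall-back-to-English rule are assumptions.

```python
from http.cookies import SimpleCookie

ENGLISH, GERMAN, SPANISH = 0, 1, 2  # index convention from the post

# One list per piece of multilingual text (hypothetical example strings).
WELCOME = ["Welcome", "Willkommen", "Bienvenido"]


def language_index(cookie_header, default=ENGLISH):
    """Read intLanguage from the Cookie header; fall back to English."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    morsel = cookie.get("intLanguage")
    if morsel is None or not morsel.value.isdigit():
        return default
    index = int(morsel.value)
    # Reject indexes with no translation instead of raising IndexError later.
    return index if index in (ENGLISH, GERMAN, SPANISH) else default
```

A page then prints WELCOME[language_index(cookie_header)] wherever it would otherwise have printed the flat English text.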
I developed a Perl-based system a few years ago called the Translator Engine to solve the problem of translating large websites. When you get into the thousands of pages, you end up with lots of different teams working on the site, and it becomes very difficult to manage a single version, let alone multiple language versions.
Since there was a previously existing site, there was no option to build templates or install a database, so the idea was to make a tool for building a static translated site. The system I built descends into a folder containing an entire site and rips all content and tags into separate files, which could ultimately (not implemented, since rates changed) be emailed to translators living in areas where translation was cheaper. Otherwise you need HTML-savvy translators, whereas you really just want translators to see plain text.
At any time you could rebuild the site automatically and see how far you had gotten. Initially MacPerl was used, but because of the huge filesystem overhead it was moved to a Sun. It was done before many of the current CGI libraries were debugged, so if I were going to use it now I would probably want to rewrite it to use them with MySQL.
No matter how you implement translation remember that when an update or addition is made you need to have all other languages updated as well. If you can use a database which separates plain text and provides an index of tracts which are pending translation this would be very useful.
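One way to keep that "pending translation" index, sketched in Python under the assumption that each source phrase carries a revision number and each translation records which revision it was made from: anything missing, or made from an older revision, is due for (re)translation. The data layout is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Phrase:
    key: str
    source_text: str
    source_rev: int = 1
    # language code -> (translated text, source revision it was translated from)
    translations: dict = field(default_factory=dict)


def pending_translations(phrases, languages):
    """List (key, language) pairs that are missing or translated from an old revision."""
    pending = []
    for phrase in phrases:
        for lang in languages:
            entry = phrase.translations.get(lang)
            if entry is None or entry[1] < phrase.source_rev:
                pending.append((phrase.key, lang))
    return pending
```

Bumping source_rev whenever the English text changes is enough to put every existing translation of that phrase back on the translators' list.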
Another system I have (which is currently in use) is a multilingual search engine suite called EyeLatitude. It currently is being used in Japanese and English and uses separate resource files, with a web based administration interface generated separately before installation. It gets dicey trying to type multibyte strings into Perl and controlling the encoding type on output so it is easier to use the correct encoding once in the resource file. Separate files are also used for different functional modules, for example a timestamp routine includes a separate file with a micro-dictionary which lets the module "talk" in plain Japanese. Running that module with a different argument ("en" instead of "jp", gotten from embedded comments or hidden form input fields in the html) makes the module spit back a differently formatted string. A simple version of the engine uses a cache (with HTML tags ripped out) for small sites, since word stemming trees don't always work for these languages.
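To illustrate the micro-dictionary idea, here is a tiny Python stand-in for such a timestamp module: the same call returns a differently formatted string depending on the language argument. It uses the "en"/"jp" codes from the post (the ISO 639 code for Japanese is "ja"); the function and table names are hypothetical.

```python
from datetime import date

# Micro-dictionary for the English variant; Japanese needs only unit characters.
MONTHS_EN = ["January", "February", "March", "April", "May", "June", "July",
             "August", "September", "October", "November", "December"]


def format_date(d, lang):
    """Format a date in the style of the requested language ("en" or "jp")."""
    if lang == "jp":
        # Japanese order: year, month, day, each followed by its unit character.
        return f"{d.year}年{d.month}月{d.day}日"
    return f"{MONTHS_EN[d.month - 1]} {d.day}, {d.year}"
```

Here format_date(date(2001, 3, 5), "en") gives "March 5, 2001" and the "jp" call gives "2001年3月5日"; keeping the multibyte strings in one place like this is what makes controlling their encoding manageable.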
Perl XML might be a very good option for you now, but in the end your bottleneck is not in your ripping system but in how fast translators' work can be accomplished and incorporated into the site. PHP code mixed in will confuse them too. So you should have a database involved, but it is a better idea to do it in Perl and have the option to provide the client either a database-driven site or, if they don't want a database, a static site built from it on your side before uploading to the server. And remember, be sure to use binary fields and quoting if you do database programming! Otherwise your first double-byte translation could crash mysteriously.
Check out the available Perl libraries for ripping HTML and test how they work with your pages, particularly do they preserve all the accents or multibyte characters in your text, and do they preserve Javascript or PHP correctly. They should be up to the job. If you do have a large corporate site for which you want to manage translation and updating over the long term, I am willing to provide such a service. I don't know of any other such service around.
The Debian web site speaks about 20 different languages and is huge (about 2 GB in size). What we use is lots of Website Meta Language and its slices, plus Apache's content negotiation. We do have some problems with the negotiation, but they are usually due to browser bugs. Lynx is definitely the most compliant of the browsers.
Let's make a few assumptions:
1. you don't want to skimp on the site design/graphic look
2. 90% of the web is browsing on a 4.0 browser that supports CSS
3. 90% is using a minimum of a 33.6 dialup.
Under these assumptions, you wouldn't have to skimp on anything to create a site that caters to a variety of languages with only one version of the code.
CSS tied into something like PHP would be your answer plain and simple.
Text could be dynamically swapped in depending on the selected language, keep your image buttons plain and layer the text above the image, etc and you're all good.
Make a site in which you could easily substitute different "palettes" into the design... not designed differently, just coded differently. It's not as difficult as many people make it out to be. CSS is a much more powerful tool in web design than many people give it credit for (most use it just for text decoration). I suggest picking up a book on it.
Our site is multilingual and we use JSP, but it's pretty much the same concept.
The simplest way is to have template files (for example, one file for the front page and one for the sections) that include different texts depending on the section (to know the current section, you can use an HTTP request, e.g. http://www.example.com/main.jsp?section=3&language=en).
For the images, the best way is to go with names like "imagename_en.gif", "imagename_fr.gif" and so on. In your code, you can just put something like
.gif">
Hope this helps.
http://www.logient.com
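A Python sketch of the request handling described in that post, assuming the section and language arrive as query parameters exactly as in the example URL; the defaults (section "1", English) and the supported-language list are assumptions.

```python
from urllib.parse import parse_qs, urlparse

SUPPORTED = ("en", "fr")


def page_settings(url):
    """Pull 'section' and 'language' out of a URL like main.jsp?section=3&language=en."""
    query = parse_qs(urlparse(url).query)
    section = query.get("section", ["1"])[0]
    language = query.get("language", ["en"])[0]
    if language not in SUPPORTED:
        language = "en"
    return section, language


def image_name(base, language):
    """Build imagename_en.gif, imagename_fr.gif, ... for the current language."""
    return f"{base}_{language}.gif"
```

The template for the section then uses image_name("imagename", language) wherever a text-bearing image appears.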
Hi,
I had a similar dilemma. The site of my own
company had to be in English because it
targets an international audience (contracting),
but also be able to support _at least_
Spanish because some of my target audience
is specifically Spanish speaking.
So, the result is that several parts are
multilingual (the web applications). I keep the
language message catalog in a separate file,
and then load it on demand depending on the
language. These constants are used instead of
the actual message.
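A minimal Python sketch of on-demand message catalogs. In a real site each catalog would sit in its own file (one per language); here a dict stands in for those files, and each catalog is loaded once and then cached. Falling back to the English text for keys that have not been translated yet is an assumption, not something the poster describes.

```python
import functools

# Stand-in for one catalog file per language (hypothetical contents).
CATALOG_SOURCES = {
    "en": {"MSG_WELCOME": "Welcome", "MSG_CONTACT": "Contact us"},
    "es": {"MSG_WELCOME": "Bienvenido"},
}


@functools.lru_cache(maxsize=None)
def load_catalog(lang):
    """Load a language's catalog the first time it is needed, then reuse it."""
    return dict(CATALOG_SOURCES.get(lang, {}))


def message(key, lang):
    """Look up a message constant, falling back to English if it is untranslated."""
    return load_catalog(lang).get(key) or load_catalog("en")[key]
```

Pages only ever mention constants such as MSG_WELCOME, so the layout is written once and the Spanish catalog can lag behind the English one without breaking anything.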
There is also a way for Apache to deliver pages
in several languages but I didn't get it to
work and it requires you to keep both language
and layout in multiple files so it becomes
a nightmare to maintain. So I prefer the
approach I follow better than the Apache
content negotiation stuff.
WML's pretty slick - it's a preprocessor basically with 9 passes and there's nothing keeping you from embedding PHP in it. In fact with the right amount of handwaving you can even have it generate code -inside- the PHP for multi-language sites. I use it for a bunch of sites - it's well worth looking in to.
>On a related note I have been wondering about
>using XML for more things. Theoretically you can
>convert almost all of your data into any
>language format that you want. However it seems
>like the stuff out there about xml is rather
>cryptic. In fact I still can't get a xml
>document converted into just plain text (ie the
>freenet documentation that comes with the
>server).
You can use XSL transformations to convert XML to either a different XML, HTML or Text, but the source must be XML.
To create text, use the text output method: <xsl:output method="text"/>
Surround your text with <xsl:text> tags.
I had to deal with this problem a while ago and here's what I came up with.
I created a spreadsheet containing the following columns:
- Phrase group ID name
- Phrase ID name
- The actual phrase to be translated (in English)
- Description of phrase explaining context (in English - this helps translators for sentence fragments that have multiple potential contexts in other languages)
- One column for each language that will contain the translated phrases.
The spreadsheet is e-mailed around to translators, who fill out the appropriate column, and new columns can be added whenever another language needs to be supported. You can even make up your own "Dilbert-speak" or other fictional language :).
The spreadsheet, aside from containing simple phrases and sentences, also contains other language-specific elements, such as currency punctuation and other appropriate marks (like double quotes, for example).
I then wrote a Perl script that parsed the spreadsheet data, creating a large nested hash table. It was grouped first by language, then by group ID, then finally by entry ID. This script then wrote out conversion tables consisting of entry ID/translated phrase pairs. It used the heading of each language column in the spreadsheet as the name of that language's directory. What was written was something like this:
ProjectDir -> English -> account.hash
                         general.hash
                         signin.hash
           -> Spanish -> account.hash
                         general.hash
                         signin.hash
           -> German  -> account.hash
                         general.hash
                         signin.hash
...etc. Each language directory contains a hash table for each phrase group (I break the phrases into groups to avoid having to load all of the phrases all of the time). I also used these language directories to store language-specific images.
Now I create the HTML pages (I use a simple text editor). Whenever I specify a text phrase I say something like this:
...INPUT TYPE=RESET VALUE="{SIGN_IN_CLEAR}"...
...INPUT TYPE=SUBMIT NAME="ACTION" VALUE="{SIGN_IN_CHG_PW}"...
...INPUT TYPE=SUBMIT NAME="ACTION" VALUE="{SIGN_IN}"...
My cgi scripts, written in perl, first initialise the localization code by loading the appropriate hash tables (the correct pathname is determined through another hash table using the customer's selected language of preference). Then, whenever a page is to be output, it is read into a variable which is passed to a function like this, which substitutes each entry ID in curly braces with its translated phrase (my regex "skillz" are not terribly "l33t", sorry):
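sub sendHTML
{
my $content = shift;
# Replace each {ENTRY_ID} with its translated phrase.
# (The old class [A-Z|0-9|_] also matched a literal "|".)
$content =~ s/\{([A-Z0-9_]+)\}/translate($1)/ge;
print $content;
}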
The translate() function is equally simple. It uses the ID to look up the phrase from the locale hash table:
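sub translate
{
my $key = shift;
# Look up the phrase for this entry ID in the loaded locale hash table.
my $s = $GV{textStrings}{$key};
# Fill any {FIELD} placeholders in the phrase from the user-data hash table.
$s =~ s/\{([A-Z0-9_]+)\}/$GV{userData}{$1}/ge;
return $s;
}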
Note that any text in curly braces found in the translated phrase is interpreted as user data and filled in the same fashion (but from a different hash table). This accommodates changes in word order from language to language that cause parameters in translated phrases to change order.
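For readers not fluent in Perl, here is the same two-stage substitution in Python, showing why the second pass matters: the English and German phrases use their parameters in a different order, yet the page template stays identical. The phrase tables and entry IDs are made-up examples, not the poster's data.

```python
import re

PLACEHOLDER = re.compile(r"\{([A-Z0-9_]+)\}")

# Hypothetical phrase tables: same entry ID, parameters in a different order.
PHRASES = {
    "English": {"INBOX_STATUS": "You have {COUNT} new messages in {FOLDER}."},
    "German": {"INBOX_STATUS": "Im Ordner {FOLDER} sind {COUNT} neue Nachrichten."},
}


def translate(key, language, user_data):
    """Look up a phrase, then fill its {FIELD} placeholders from the user's data."""
    phrase = PHRASES[language][key]
    return PLACEHOLDER.sub(lambda m: user_data[m.group(1)], phrase)


def send_html(template, language, user_data):
    """Replace every {ENTRY_ID} in a page with its translated phrase."""
    return PLACEHOLDER.sub(lambda m: translate(m.group(1), language, user_data), template)
```

re.sub does not rescan the text it inserts, so a value inside user_data that happens to contain braces is left alone.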
There are still problems with this method, as it doesn't address problems like text printed right to left, or up to down (or both). But it was a suitable solution for my needs. Hope this helps.
It makes a lot of sense to design the site first, then add the text. This way you can change the look on the fly without having to touch the pages in each language. As for a splash screen for people in different national TLDs, it might be nice to pre-design a few for common countries, welcoming them and asking them which language they would like to proceed in. That would not be a problem for updating, but imagine having to touch each one to add a nav bar or something? Ack! If international components are needed, they can be added with an international variable, like a table with region-specific symbols. If there is no need for that component, then nothing gets inserted and it just won't appear.
I don't use it for multilingual sites, but I would think it'd be pretty simple to set up. As for the graphics, WML can help out there too, through its support of the gfont language, which can create GIF images on the fly using TeX fonts. With graphics for multiple languages, though, it may be tough creating buttons and the like, because the length of the text can change so dramatically. I've not played with gfont either.
WML (with some tutorial pages, one describing multilingual sites) is at http://www.engelschall.com/sw/wml/. Give it a try; it has made my web-development life much richer.
I had a similar problem a few months ago while designing a servlet-based website. The solution I found was to use a prefix-based language lookup.
For example, say you connect as a French user. One of the parameters sent to the client is 'language=fr'. Server side, any language-dependent picture is prefixed with 'fr_', and any text is looked up in the 'FR_TEXT' database. Of course, you'll probably need to fine-tune it if you don't want redundant pictures (my understanding is that an arrow looks exactly the same whatever the language...).
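The prefix scheme in a few lines of Python, for illustration. Which images count as language-specific is a hypothetical set; everything else (arrows and the like) is served unprefixed, which addresses the redundancy mentioned above.

```python
# Hypothetical set of images that actually differ by language.
LOCALIZED_IMAGES = {"button_buy.gif", "title.gif"}


def image_for(name, language):
    """Return 'fr_button_buy.gif' for language-specific images, the plain name otherwise."""
    if name in LOCALIZED_IMAGES:
        return f"{language}_{name}"
    return name


def text_table_for(language):
    """Name of the per-language text table, e.g. 'FR_TEXT'."""
    return f"{language.upper()}_TEXT"
```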
Nicolas Rinaudo
I've been working on this for a little while too. The main difference is that all my pages are database-driven, from the content to the menu items and the images. My solution (yet to actually be implemented) is to add another field to the tables that will denote the language, and then use a view based on the language of the browser to figure out which records get displayed. Someone will have to translate the text, but that is the only sticking point. Everything else (site design, application programming, page logic) stays the same for each language. You won't need to maintain multiple language pages or static include files. It is all done through the same web-based entry form.
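A sketch of the language-field-plus-view design using Python's sqlite3 module. The table name, view names and rows are hypothetical, and falling back to English for unsupported browser languages is an assumption.

```python
import sqlite3

LANGUAGES = ("en", "de")


def create_menu_table(conn):
    # The extra 'lang' field marks which language each row belongs to.
    conn.execute("CREATE TABLE menu_items (id INTEGER, lang TEXT, label TEXT, url TEXT)")
    conn.executemany("INSERT INTO menu_items VALUES (?, ?, ?, ?)", [
        (1, "en", "Home", "/"), (1, "de", "Startseite", "/"),
        (2, "en", "Products", "/products"), (2, "de", "Produkte", "/products"),
    ])
    # One view per language, so page code only has to pick the view name.
    for lang in LANGUAGES:
        conn.execute(f"CREATE VIEW menu_items_{lang} AS "
                     f"SELECT id, label, url FROM menu_items WHERE lang = '{lang}'")


def menu_labels(conn, browser_lang):
    """Return the menu labels for the browser's language (English if unsupported)."""
    lang = browser_lang if browser_lang in LANGUAGES else "en"
    rows = conn.execute(f"SELECT label FROM menu_items_{lang} ORDER BY id")
    return [label for (label,) in rows]
```

The view name is only ever built from the fixed LANGUAGES tuple, never directly from the browser's header.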
Not sure if this would help, but if you're looking at creating dynamic international support for buttons, you could just use PHP's GD library implementation to create buttons on the fly using existing images as templates. Perhaps even create them once and reference them statically based on the language mode you're in; that would definitely cut down on server load.
Maintaining multiple sets of code really amplifies the chance of error when making application changes. For small blocks of text I use if-then or case statements. For larger blocks I make a phrase database and just query out the language I'm using.
The first, and probably most important step would be to hire someone, such as myself, who is thoroughly versed in this topic. Someone, such as I, who has done this everyday for about 4 years at companies like Intel, Novell and Microsoft (leaving the latter due to the inferior IIS and all things pertaining to Basic):)
I have experience with two different approaches for creating and maintaining multilingual web sites:
We use this approach for the SSLUG web site, and except for the lack of time, I see it as a success.
The language choice is based on (in order of priority): direct choice, the Accept-Language header, and the client's domain.
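That priority order, sketched in Python. The supported languages, the domain-to-language table and the English default are assumptions; for brevity this version takes Accept-Language entries in header order and ignores q-values, which a full implementation should honour.

```python
SUPPORTED = ("da", "sv", "en")
DEFAULT = "en"  # assumed default
DOMAIN_LANGUAGES = {"dk": "da", "se": "sv"}  # hypothetical TLD mapping


def choose_language(direct_choice, accept_language, client_host):
    """Direct choice first, then Accept-Language, then the client's domain."""
    # 1. An explicit choice (e.g. from a link or form field) always wins.
    if direct_choice in SUPPORTED:
        return direct_choice
    # 2. Browser preferences; simplified to header order, ignoring q-values.
    for entry in (accept_language or "").split(","):
        tag = entry.split(";")[0].strip().lower().split("-")[0]
        if tag in SUPPORTED:
            return tag
    # 3. Top-level domain of the client's hostname, e.g. host.example.dk -> dk.
    tld = (client_host or "").rsplit(".", 1)[-1].lower()
    return DOMAIN_LANGUAGES.get(tld, DEFAULT)
```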
The actual implementation is done with SSI and plenty of conditional blocks. This makes the raw files look rather messy, but it is definitely beneficial to have the different language versions just next to each other.
I have used this approach for my own web site, but I wouldn't call it a great success. It is difficult to keep track of translations when they are stored in different files and the result will typically be pages that are severely out of sync. The main benefit of using one file per language is that you can leave the language choice to the Content Negotiation/MultiViews feature in Apache.
My advice for people starting on a multilingual web site is:
The multilingual files should be as close to plain HTML as possible, so something like
<p>
<en>Hello world
<fr>Salut le monde
<da>Hej verden
</p>
where you simply allow a sequence of language coded tags followed by a text in that language would probably be a good solution.
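A minimal Python filter for that markup, assuming each language's text is plain text (no inline HTML tags) running up to the next tag. The tag set is limited to the site's own language codes so that ordinary two-letter HTML tags such as <em> are left alone; the codes listed are the ones from the example.

```python
import re

LANGUAGES = ("en", "fr", "da")
# A segment is a language tag followed by plain text up to the next tag.
SEGMENT = re.compile(r"<(" + "|".join(LANGUAGES) + r")>([^<]*)")


def select_language(markup, lang):
    """Keep the segments written in `lang` and drop all the others."""
    return SEGMENT.sub(lambda m: m.group(2) if m.group(1) == lang else "", markup)
```

Running it over the paragraph above with "fr" leaves only the <p> element containing "Salut le monde".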
Lord Almighty! I know some Perl fanatics like to rip on PHP. Your point is (un)fortunately moot. PHP offers the luxury of choosing whether you want to embed it or use it Perl-style (with echo and all). Working in PHP, I've realized the following:
- within functions, it looks cleaner when you echo your html
- outside of functions, it looks cleaner when you embed your html
According to some dreamers, including myself, clean-looking code is self-documenting. Perl-style regexps may be powerful, but they are really messy to look at. PHP database integration is usually a lot more readable, and more forgiving of errors, than Perl-style line-by-line file processing. For the original problem, I'd suggest a database for pages with the following structure (simplified definition) :)
table content
    location  varchar(50) unique
    english   text
    german    text
    french    text
might be a start. It depends on how advanced you wanna go. With a neat administrative tool, you could fix up the PHP/HTML from within a textarea in a form, and thus edit your site from within a browser. Do a "SELECT $lang FROM content WHERE location='$PHP_SELF'" or some such thing...
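A runnable version of that lookup using Python's sqlite3 module. The column names follow the table sketch above; the whitelist is an addition, since a column name cannot be passed as a bound parameter and $lang would otherwise go straight into the SQL. The location is bound as a parameter instead of being pasted into the query.

```python
import sqlite3

LANGUAGE_COLUMNS = ("english", "german", "french")


def create_content_table(conn):
    conn.execute("CREATE TABLE content (location VARCHAR(50) UNIQUE, "
                 "english TEXT, german TEXT, french TEXT)")


def page_text(conn, lang, location):
    """Roughly: SELECT $lang FROM content WHERE location='$PHP_SELF'."""
    if lang not in LANGUAGE_COLUMNS:
        lang = "english"  # assumed fallback
    row = conn.execute(f"SELECT {lang} FROM content WHERE location = ?",
                       (location,)).fetchone()
    return row[0] if row else None
```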
Again - it all depends on deadline, future use and number of pages..
-Jeppe
I've got a bilingual site, 100% PHP; admittedly, what I'm going to say applies only to languages using the Latin alphabet, but the simple fact of having one directory per language makes maintaining the site in both languages as easy as falling off a log, and nothing is duplicated. And I didn't even consider whether this was design or content. Buttons and everything are completely bilingual except where I deliberately chose not to make them so. The site's called www.mrquiz.org. If you take a look and you're still interested, I'll gladly send you the source. My mail address is on the site.
This would be equivalent to putting all the code in PHP include files and leaving as little code inside the HTML as possible.
You are probably in the same situation as me: you are doing the business logic, and someone else the HTML. I have people doing HTML who do now know what the Supposedly there is some solution with servlets/JSP that I saw mentioned this weekend when looking at technologies for our next product, which preferably should be easily localizable. I did not find much info, though, and will probably stick with PHP4.
The right solution would be XML, but we need good XML tools. A folding XML editor where you can select which languages to show and enable/disable the code in between would be good. But right now, I think you are stuck with PHP include files and the necessary embedded PHP :(
I would also appreciate better solutions.
I have done this in a few projects, and it is not perfect, but quite close.
But moving the business logic of the code into include files in PHP is not much different from Perl files reading and parsing HTML or XML templates.
It is all a question of forcing the PHP programmer to use only loop and print on the page itself. Save all database queries in an array/list whatever in the include file. What is really the difference between doing
The problem is, that HTML was not designed from the ground up to print tables based on vectors/matrices.
I don't have any useful advice for your webpage structure, but there is some hope for images.
PHP has an image processing library that lets you do composite image stuff on the fly, including adding text to images. I've never used it, but it looks fairly powerful.
The big drawback is that they originally implemented the system to generate GIFs, then the GIF trouble happened, and they seem to have withdrawn that image generation feature. From what I've read, it looks like all it takes is a couple of pico changes and a recompile before you're up and generating PNGs.
...but don't quote me on that; I gave up trying to use it when I realized I'd have to put effort into it.
Pastry
Surely, in whatever language you are using, there are case statements. :-)
You just need to be careful how you place them.
Be thankful you are not my student. You would not get a high grade for such a design.
Oops. Make that "XSLT", not "XLT" in the last couple paragraphs.
There are two routes you can go for using templates with PHP, FastTemplates and the PHP Base Library's ("PHPLIB") Template.
So how are they different? FastTemplates was originally a Perl library that was ported to PHP. FastTemplates works well for Perl programs, but it's not ideal for PHP. Kristian Koehntopp wrote PHPLIB Template from the ground up as a pure PHP library to better take advantage of the capabilities of PHP. One advantage to Kristian's design is that it parses templates with preg_replace(), which is said to be faster than FastTemplate's reliance on ereg_replace(). Another advantage of PHPLIB Template is it allows dynamic blocks to be nested, unlike FastTemplates.
For those reasons I prefer to use PHPLIB Template, but you do have a choice of the two libraries.
It may also be worth mentioning the XML approach. XLT is an XML-based format for templates, so you might want to look into that. PHP4 can parse XML, but there isn't code to specifically parse XLT as far as I know. XML or XLT are options if you need them, but they're probably more involved than you would need for most PHP projects that really just need templates.
And for a nice tutorial on PHPLIB Template, look for my article on phpbuilder.net sometime soon (assuming the editor over there decides he wants to publish it). But even if my article doesn't get put online there, it is a very nice site for PHP info.
For example, my 6-year old gets around a lot better with icons than words.
The same would be true of anyone who doesn't have a great grasp of the language used (and exactly how many web pages are in every possible language?).
When you add a new language, translate the file of content, run the script, and stick the new pages in a new subdirectory. Tweak your splash page to point to the new language.
P.S. Someday, you will do this with XML/XSL.
Apache JServ allows for servlet zone creation, which provides an excellent environment for generic code base usage with localized HTML content generation. You can create zones for each supported locale and then assign a common repository for each zone. This allows you to write a single application using localized UIs. To bring it all together, you create an application framework that provides resource management on a zone-by-zone basis. Examples of localizable resources are: database connection pooling, HTML template management and error messaging via resource bundles.
Everyone is looking at this as some kind of database or server-side scripting problem, which is IMHO overdoing it a bit and missing the point.
Firstly, every web server and browser worth using already handles language preferences. It's an integral part of HTTP. A French user can be given the French version of the page transparently by the server (as their browser already knows what languages they want). This part of the specification is there for a reason, so use it! People don't want cookies, logins and unnecessary choices.
If you use Apache (the only server I have more than half a day's experience with), search for content negotiation in the docs. It's actually set up by default (download the tarball and look at the "It worked" page to see what I mean). In a nutshell, instead of a single file foo.html, you have one file per language, foo.html.en, foo.html.fr etc., and the server works out which one the user wants.
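To show what the server is doing for you, here is a simplified Python sketch of choosing among foo.html.en, foo.html.fr, etc. from the Accept-Language header. It only honours q-values and primary language tags; Apache's real negotiation algorithm considers more than this, so treat it as an illustration rather than a description of Apache.

```python
def parse_accept_language(header):
    """Return [(language_range, q), ...] sorted by q-value, highest first."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue  # skip empty entries such as a trailing comma
        q = 1.0  # q defaults to 1 when it is not given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        ranges.append((fields[0].lower(), q))
    # sorted() is stable, so entries with equal q keep their header order.
    return sorted(ranges, key=lambda item: item[1], reverse=True)


def negotiate(base, available, header, default="en"):
    """Pick base.<lang> (e.g. foo.html.fr) from the language variants on disk."""
    for lang_range, q in parse_accept_language(header or ""):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        primary = lang_range.split("-")[0]  # en-gb -> en
        if primary in available:
            return f"{base}.{primary}"
    return f"{base}.{default}"
```

For a browser sending "da, fr;q=0.8, en;q=0.5" with only English, French and German variants available, negotiate picks foo.html.fr.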
From what I understand, IIS also handles this kind of thing well, if you want to use it.
Getting the web server to do this for you will almost certainly be quicker than anything you can write, as it is better integrated into the server. By serving static pages, things get quicker still. If you need dynamic content, PHP, Perl, ASP or CGI scripts can all be programmed to use the language-preference headers. If you want to generate dynamic content, choose the language this way rather than trying to create your own system (unless you already have a login system à la Slash, in which case it wouldn't hurt to add it as an option).
None of this information is hard to find. In fact, it's pretty hard to avoid it when looking at the Apache config files, so I don't know why everyone else has missed it.
hth
Maybe you could convince one of those responsible for that page to help ?
As for separating design/content, I'd use CSS. It downloads quickly, doesn't require many server resources, and in my experience it renders much faster than table-based formatting (./ is hopeless here...). It also degrades gracefully.
Now with Mozilla/Netscape 6 soon to be released, CSS will be pretty much uniformly supported, including layers++. It works with Lynx (degrades gracefully), Emacs, Internet Explorer and soon Netscape (it already works for simple page formatting, but the fancy stuff is a nightmare in the current Netscape... and you thought Microsoft didn't adhere to standards...)
I'd write a "try-out" page, for making the design, looking something like:
include your style sheet(s) in the header
<h1 class="banner">Welcome to bar-page</h1>
<p class="normal">
One little paragraph right here...
</p>
(...)
And then use a WYSIWYG stylesheet editor for formatting it, and then go back and replace the content through PHP or something similar (I guess the choice of a true database vs. plain text files depends on the size of the web site, and on who will update it and how). I have used Dreamweaver (from Macromedia, Windows/Mac only) a bit. It is very good, and at least the previous version pretty much preserved whatever indenting/formatting you used in your stylesheets/HTML source, which is something few WYSIWYG tools do, in my not-so-far-reaching experience.
And just to repeat the important stuff: CSS rul3z! :). And it's cutting-edge too! AND your pages will be much easier to generate/manipulate with scripts/PHP/whatever once you do that. Better searching/easier-to-implement searching, easier to maintain, cooler, faster, improves your sex-life, saves disk-space, ... oh I'm rambling, sorry...
You really have to try the speed thing to see it for yourself: make one page with table formatting and a ton of font tags, and one that just embeds a style sheet. Download size drops, and rendering goes relativistic (as in approaching the speed of light).
With appropriately written templates, it can also automatically use the right images for each page. Read the list archives for more info on what has already been done.
Basically, try to avoid them. Not only do they make your site difficult to maintain due to language, they're just an overall pain in the butt. You can do a lot (design-wise) with plain text and CSS, or if you absolutely must have a graphic with text in it, try an alternate trick (such as using the graphic as a table background and then laying text over top of it).
I don't like it when I ask a question in the context of a certain programming language and get an answer about how I should use a different language, so please understand I am not advocating one over the other.
I believe it would be a clean approach to use a Java application server such as JServ or WebSphere and take advantage of the multilingual capabilities of Java on the server side.
I've evaluated methods for implementing multi-lingual sites.
First off, you've got to determine which language the end user wants to view the site in. This can be done multiple ways: client hostname, browser version (language), server hostname (i.e. japan.bigcorp.com), or by clicking a button or link on the page. Typically, I lean towards using the hostname (if ($HTTP_HOST == "deutsch.bigcorp.com") { $lang = "de"; }) in combination with a link that sets a cookie for the language choice. 'Guessing' the desired language based on the browser or the client hostname doesn't work all that well, because there are a lot of foreign nationals in the States who may want to view the site in their native language, and vice versa overseas.
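A Python sketch of the hostname-plus-cookie approach. The post does not say which wins when both are present; this version assumes a cookie set by the language link overrides the hostname. The hostname table and the cookie name "lang" are hypothetical.

```python
from http.cookies import SimpleCookie

SUPPORTED = {"en", "de", "ja"}
# Hypothetical mapping of localized server hostnames to languages.
HOST_LANGUAGES = {"deutsch.bigcorp.com": "de", "japan.bigcorp.com": "ja"}


def site_language(host, cookie_header, default="en"):
    """A cookie set by the language link wins; otherwise map the server hostname."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")
    if "lang" in cookie and cookie["lang"].value in SUPPORTED:
        return cookie["lang"].value
    return HOST_LANGUAGES.get((host or "").lower(), default)
```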
Once we've determined the language choice, typically I have multiple tables in the database for the language options, e.g. headlines_en or headlines_de, and you just append the language choice:
<?php
// Only accept languages we actually have tables for: $lang ends up in the SQL text.
if (!isset($lang) || !in_array($lang, array("en", "de"))) {
    $lang = "en";
}
$query = "SELECT headline, link FROM headlines_$lang";
?>
So for pulling from a DB that's pretty easy. When we want to present separate pages or page layouts for the different languages (i.e. localized data, or product offerings, etc.), it's not too hard to do either. You can do it with HTTP header redirects:
<?php
if ($lang == "de") {
    $location = "page.de.php3";
} else {
    $location = "page.en.php3";
}
header("Location: $location");
exit; // nothing else should be sent after the redirect header
?>
Or (this is what we usually do), when building a frameset, point the frame at the proper language page:
<?php
if ($lang == "de") {
    $location = "page.de.php3";
} else {
    $location = "page.en.php3";
}
?>
<frame src="<?php echo $location; ?>">
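With more than two languages the if/else chain grows with every new language. A small sketch of the alternative, in Java, keeps a set of languages that actually have a translated page and builds the file name from it; the class name and the language set are hypothetical.

```java
import java.util.Set;

// Hypothetical helper: build the per-language page name for the redirect
// or <frame src="..."> above without one if/else branch per language.
final class PageRouter {
    // Languages that actually have a translated page (an assumption).
    private static final Set<String> HAS_PAGE = Set.of("en", "de");
    private static final String DEFAULT_LANG = "en";

    static String pageFor(String base, String lang) {
        String chosen = (lang != null && HAS_PAGE.contains(lang)) ? lang : DEFAULT_LANG;
        return base + "." + chosen + ".php3";
    }
}
```

Adding a language then means adding one entry to HAS_PAGE and one file, not another branch.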
Images are the same sort of thing. We just append the language code to the image name (or, easier, set $image_path to something like "/images.de", etc.). Then you just pull from the selected images path.
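One refinement worth considering: fall back to the default language's image when a translated one doesn't exist yet, so a half-translated site never shows broken images. A minimal Java sketch, where the "/images.<lang>/" layout follows the post and the class and method names are hypothetical:

```java
import java.util.function.Predicate;

// Hypothetical helper for the per-language image directory idea:
// use /images.<lang>/<file> when that image exists, otherwise fall back
// to the default language's copy.
final class ImagePaths {
    // `exists` stands in for a real check such as Files.exists(...);
    // passing it in keeps the logic independent of the file system.
    static String resolve(String file, String lang, String defaultLang,
                          Predicate<String> exists) {
        String localized = "/images." + lang + "/" + file;
        if (exists.test(localized)) {
            return localized;
        }
        return "/images." + defaultLang + "/" + file;
    }
}
```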
If you are building images dynamically and they have text embedded in them, you're going to have to hax0r it to output the language of your choice. However, I think that many Internet users abroad understand that it's really an English/American-dominated network, and if everything isn't offered in their native language, they aren't going to get super pissed off. If you make an effort to get the key content into as many languages as you can, you'll be in good (better) shape.
What becomes tricky (and what I don't have much experience with) is languages that don't use the Roman alphabet (i.e. Asian languages). We typically have to outsource this work to a translation company, and they tend to provide us with rasterized and vector-based files that we can then embed into our site. If you find a good translation company, they should have experience doing this sort of thing and can probably help you figure out the best methods to employ. They do this stuff for a living, and many of them are top-notch.
-k
Try using UNL -- it's a United Nations meta-language that lets you write a page once but allows everyone to read it in their own native language. Basically it works like this: you produce a page in English and translate it into UNL using their free software. Then you translate your UNL file back into English. If the meaning hasn't changed, then it will produce as good a translation for everyone else in their native language. Otherwise you can modify your file a bit and try again until it works. Unlike translators like Babelfish, this translation engine works every time. Why? Because it comes with an interactive editor that asks you what you really mean.
It has structure in place for producing web documents in several languages. It's slow, but worth a look if you don't have much dynamic content, or if you just want to know how it's implemented and maybe adapt it for your PHP needs...
http://www.engelschall.com/sw/wml/
Don't use tables at all; use CSS layout. You'll find it makes it a lot easier to separate content: people with 4.0+ browsers will see the cool stuff, and people with older browsers will see some old-school, highly readable HTML. Table layout is dead, long live CSS...
Most HTML+CSS pages are readable right from the source, and would be easy to translate the file in whole.
If I were in your place, I'd put every paragraph in a database table, with each row having the text in each language. That way, the translators could work on one paragraph at a time, and it would still be easy to update.
Amber Yuan 2k A.D
"and dear god does this website suck now." -- CmdrTaco
I've done this on a recent project by storing all text into a MySQL database and writing a simple perl script to merge the text from the DB into the HTML files.
It goes like this:
Make a language resource table. Call it "RESOURCES". The columns are TEXT_ID, LANGUAGE, and TEXT_DATA.
Make an html template directory. You will store all "raw" html files here. Beneath this directory, make subdirectories for each language (eng, frc, jpn, etc.)
In the HTML, make references to the database values by some easily identifiable token string, and wrap this token string around the TEXT_ID value from the database for this text resource. If you want, you can put the English equivalent inside the token string, so you can read the templates in their raw form. E.g.:
Write a script (I chose perl) to:
That should be it. Now, when an English-speaking person comes to your site (you'll have to ask them somehow, of course), you can just redirect them to /path/templates/eng/file.html, and everything will work.
This doesn't address the images, however. If you're using languages that use the Western European character set (French, Spanish, English, Portuguese, German, Italian, etc.), it will be easy. You'll be able to type your text directly into Photoshop or the GIMP or whatever and make your graphics. The next thing is to put a language token in the HTML templates you've made, for all images. Something like this:
And in your language parser, write a one-line substitution that will substitute all instances of ##LANGUAGE## to the current language you're iterating through.
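The merge step can be sketched in Java roughly as follows. The post doesn't show its token format, so ##TEXT:<TEXT_ID>## is a hypothetical stand-in; ##LANGUAGE## is the image token from the post, and the map plays the role of the RESOURCES rows for one LANGUAGE.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical token format: ##TEXT:<TEXT_ID>## for a text resource and
// ##LANGUAGE## for the language code used in image paths.
final class TemplateMerger {
    private static final Pattern TEXT_TOKEN = Pattern.compile("##TEXT:([A-Za-z0-9_]+)##");

    // `resources` holds TEXT_ID -> TEXT_DATA for one language, e.g. the
    // RESOURCES rows whose LANGUAGE column equals `lang`.
    static String merge(String template, String lang, Map<String, String> resources) {
        Matcher m = TEXT_TOKEN.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String text = resources.get(m.group(1));
            // Leave unknown IDs visible so missing translations are easy to spot.
            String replacement = (text != null) ? text : m.group(0);
            // quoteReplacement keeps '$' and '\' in translated text literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        // The one-line ##LANGUAGE## substitution for image paths.
        return out.toString().replace("##LANGUAGE##", lang);
    }
}
```

Running merge() once per language directory and writing the results to disk gives the static, pre-generated pages the post describes.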
If you're translating to languages with different character sets (double-byte languages such as Chinese, Korean, etc.), you'll need to create your graphics differently, but once they're created, the storage of them is the same. One way to create them is to write a CGI that runs through the DB and prints, in HTML, all the text resources of a given language. If you have your browser set to the correct character set, you will see the foreign-language characters correctly. You can then take a screenshot and paste it into your graphics editor to make buttons or whatever out of.
This approach has worked really well for us on two projects so far, with more projects likely soon. The advantage of making these HTML templates is that it greatly reduces the load and the time it would take to build the pages if they were dynamically created from database lookups on each request. You just run the template-generating script every time a change is made to a template, and voila.
For the tenants in my building (students mostly) I wrote a database driven website. The content of the pages, including caption and index name are entered into a database. At the moment this is done in Dutch and English. HTML tags are included in this content. The nice thing is you have the two language version side by side on your screen while you enter / change the content so it's easy to keep them synchronised.
Every night the actual web pages are generated from the database using templates for a frames and a non-frames version. The images for the index are also automatically generated.
The downside is you still have to duplicate the complex layout stuff for both languages. But since our layout isn't complicated it's no problem (besides, simple is beautiful :)
The actual site and on my own site you can find two screenshots of the editor.
So people on the internet speak languages other than english? The net doesn't belong to the United States? yikes! stop the encryption, terrorists could be using it to plot against our government!
-- Just the FAQs Ma'am.
Lotus Domino Global Workbench
Lotus Domino has the excellent Domino Global Workbench; it allows you to manage multilingual web sites and workflow applications using browsers or Notes clients. It can be coupled with a machine translation engine to do the actual translation, or you can farm out the translation strings to an agency, or to different agencies for each language.
So it is not strictly open source, in that it is not under an open licence; however, most of the source is visible, and you can change it for your own requirements.
With the recent spate of high-level Lotus execs resigning, Lotus is likely to fall more under IBM's wing, which means a strengthened commitment to Linux and open source in future.
Might as well go ahead and use JSPs, because they are more flexible than servlets when it comes to separating content and logic (unless you need to transfer binary data...). I would only recommend XML/XSL if the amount of data displayed is somewhat small. You can see your server grind to a halt when the DOM is built for big XML files to do the XSL transformation ;)
www.freetranslation.com
Personally, I'd make a basic design for the page layout and then slap all the text into an Access database. Then, depending on the language they selected, pull the info. As far as the buttons and such go, you could do something like create your own object models and name them in such a way that you could easily call the object model for whatever language you want. But that is a LOT of work. Of course, if you wanted to take the lazy way out, there's always Babel Fish :)
Java internationalization is a really cool solution. It takes a day or two to master the fine points. But it'll pay off in the long run.
Hope this helps!
--
Socrates was asked where he was from. He replied not "Athens," but "The world."
The best way I found for creating multilingual web sites is with apache + mod_perl + HTML::Mason
- First, this module helps you a lot in separating content from formatting.
- Next, have people write the content, with one file per language: index_es.html, index_uk.html
- Then, when index.html is requested, the default handler reads the browser's language preferences and loads the most appropriate file.
Very easy.
I do PHP a lot. I would use the file() function (each element of the array corresponds to a line in the file, with the newline still attached). You could easily add support for any number of languages this way.
For example, the first line (foo[0]) of each file could be whatever the page's <title> tag would read in THAT specific language. Then you'd only have to make sure your code picks which file to grab, the one with the specific language in it. The second line (foo[1]) could be something else, and the third (foo[2]), fourth (foo[3]), and so on could each be used for some specific part.
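The same "line N is slot N" idea, sketched in Java for comparison. The file layout (one plain-text file per language, such as lang/de.txt) and all names are hypothetical, and unlike PHP's file() these helpers drop the newline.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical layout: one plain-text file per language, where each
// line number is a fixed slot on the page.
final class LineStrings {
    static final int TITLE = 0;    // foo[0] in the post
    static final int HEADING = 1;  // foo[1]

    // Splits file content into lines, without the line terminators.
    static List<String> parse(String content) {
        return content.lines().collect(Collectors.toList());
    }

    static List<String> load(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    // Returns the slot's text, or the fallback if the file is too short.
    static String slot(List<String> lines, int index, String fallback) {
        return index < lines.size() ? lines.get(index) : fallback;
    }
}
```

Named constants like TITLE help, because with bare line numbers a translator who inserts or drops a line silently shifts every string after it.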
Or you could just simply use MySQL or some other database, if you have access to it.
If you choose to go with some sort of PHP design/content separation, you should know that PHP has dynamic image generation capabilities. The PHP documentation covers it, and Webmonkey has a guide.
On the other hand, where you have pages where you are changing the content often, it is better to start with your layout first, then call the contents into the page. Many dynamically generated sites, such as Slashdot, probably use this method.
Finally, Slashdot and other sites usually have an archiving method, which converts the older dynamically generated pages into static "archive" pages with a fixed url. Thus, it is likely that one might employ both methods when managing sites with both static and changing content.
When I read the original question my first thought was XML.
For info on XML check out http://www.xml.com http://www.xml.org and/or http://www.apache.org
I'm just starting in XML myself, mainly as a way of transferring data from Oracle8i databases to a presentation layer (browser, WAP, WebTV, etc.) and updates back.
Stephen
"Don't write down to your readers, the only people less intelligent than you can't read" - Sign on Newspaper Office Wall
I still prefer the old style: no fancy markup and XML. All you get is bloated source, and stuff that's hard to maintain. Take a look at my VCFe site at http://www.homecomputer.de/VCFe/ or http://www.vintage.org/VCFe/.
A separate (sub)directory for every language, and a selection script at the root that picks the desired language according to the HTTP request. If the user comes in via the default URL, he will be switched to his language; if he enters through one of the (sub)dirs, he may switch them at will. And if you can't install the switcher script, a small Perl utility will move the appropriate index.html to the site root.
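A selection script like this one (or the Mason default handler mentioned earlier) mostly comes down to matching the browser's Accept-Language header against the languages you have. A minimal Java sketch using the JDK's RFC 4647 lookup; the index_<lang>.html naming, the default, and the class name are assumptions.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical selection script: pick index_<lang>.html from the browser's
// Accept-Language header using the JDK's RFC 4647 "lookup" matching.
final class NegotiatedIndex {
    // `available` should hold lowercase tags such as "en", "de", "es".
    static String indexFor(String acceptLanguage, List<String> available, String defaultLang) {
        String tag = null;
        if (acceptLanguage != null && !acceptLanguage.trim().isEmpty()) {
            try {
                // Parses e.g. "de-CH,de;q=0.9,en;q=0.8" into ranges sorted by weight.
                List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(acceptLanguage);
                // Tries each range, truncating "de-ch" to "de"; null if none match.
                tag = Locale.lookupTag(ranges, available);
            } catch (IllegalArgumentException malformedHeader) {
                tag = null;  // ignore a broken header and use the default
            }
        }
        return "index_" + (tag != null ? tag : defaultLang) + ".html";
    }
}
```

Keeping the result in a cookie (or using the per-language subdirectory URL) lets the user override the guess, which matches the "switch at will" behaviour described above.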
All pages are static on the server, so access time is fast (unless your information is dynamic), and you can install your site on any existing server. Remember, your best design may be void if your provider changes the server software.
For design, the pages are separated into a common framework for all pages (once per language) and separate content files for each page; a set of Perl scripts concatenates them into the actual pages when the site is generated.
I have used the basic utils on several sites without change. Of course, the more complex the framework gets, the more specific the scripts have to be to incorporate features. But as we all know, only a simple design may reach the maximum audience. If you want to increase your audience by offering alternative languages, you shouldn't repel your visitors by using too 'modern' designs.
One of the main benefits is that the content files carry only minimal (HTML) overhead, and even inexperienced people can translate them without getting confused by, or interfering with, the frame design. This saves a lot of debug time.
Well, it's my way - YMMV
There are more elegant ways of doing it than using include() but it's functional and does what you need to get a site up and working relatively easily.
YOU ARE USING PHP!!!
The reason that many of us would never use a language such as PHP is that PHP is based on being embedded in a
same design, same problem.
If you are one dude, maintaining a few pages of one site, and you don't have any knowledge of any other tools, then PHP might be the only tool you use for any job. If, on the other hand, you are maintaining many pages, or many sites, or have knowledge of other tools, this is a good time to start using those other tools.
Perl with XML::XSL would be a good tool for a job like this, but don't expect to be separating content from design in a language designed to be embedded into the content!
<CHANT>
The right tool for the Right Job
The right tool for the Right Job
The right tool for the Right Job
The right tool for the Right Job
</CHANT>
When in doubt, use perl 8-}
---
This is your life, good to the last drop.
Doesn't get any better than this.
I have had some success using PHP and included language files. I added Portuguese translations to a few pages on my site http://www.artwells.com/oracula/ by creating separate language files for all content on these pages.
Of course, this makes it necessary to pass a language variable from page to page. It's also necessary to put PHP commands in the pages to print content from variables set in the language files. The payoff, however, is in the ease of translation. I can distribute a language file for each page to a translator, who can then easily translate the file without too many technical barriers.
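Passing the language variable from page to page usually means appending it to every internal link. A minimal Java sketch; the parameter name "lang" and the class name are assumptions, and URLs that carry a #fragment are not handled.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: append lang=<code> to an internal link so the
// language choice survives from page to page. Does not handle #fragments.
final class LangLinks {
    static String withLang(String url, String lang) {
        // Use '&' if the URL already has a query string, '?' otherwise.
        String separator = url.contains("?") ? "&" : "?";
        return url + separator + "lang=" + URLEncoder.encode(lang, StandardCharsets.UTF_8);
    }
}
```

A cookie avoids touching every link, but the URL parameter has the advantage that bookmarks and search engines see the language explicitly.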
If you are using mod_perl, you might realize that the problem was already addressed and solved (well, partly) by Apache::Language, available on a CPAN mirror near you. It is HTTP/1.1 compliant, works around buggy browsers and tries very hard to do all that transparently. Language storage can be in flat files, SQL databases and any other storage method you can access through Perl. Just my $0.02.
I see a lot of people go off into the deep end with all kinds of complicated databases and transformation tools. Maybe this works for very large projects with lots of people working on them.
My experience is:
If you want to be able to translate texts in a reasonably efficient manner, you should keep small texts for multiple languages together and separate them for large texts. For instance, I use a lot of scripts that generate forms. So I start every page with an array that contains words and phrases:
if ($lang == "nl")
$texts = array("name"=>"Naam", "age"=>"Leeftijd");
else
$texts = array("name"=>"Name", "age"=>"Age");
(I include a header that figures out the user's language from the HTTP Accept-Language header, the user's and site's domains (none of which are foolproof), or authentication/cookie data for registered users.)
Translating this is very simple: copy the array definition and change the phrases. You don't want to use a database for this, because you need to be able to look at the from and to languages at the same time. For large texts I include html files. Translating them isn't much of a problem, keeping several versions up to date is harder.
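The same per-page phrase tables in Java might look like the sketch below, with English as the fallback for phrases that haven't been translated yet. The "email" entry and all names are hypothetical, added only to show the fallback.

```java
import java.util.Map;

// Hypothetical Java version of the per-page $texts arrays: one small map
// per language, falling back to English for anything not translated yet.
final class Phrases {
    private static final Map<String, Map<String, String>> TEXTS = Map.of(
            "en", Map.of("name", "Name", "age", "Age", "email", "E-mail"),
            "nl", Map.of("name", "Naam", "age", "Leeftijd"));  // "email" not yet translated

    static String t(String lang, String key) {
        Map<String, String> english = TEXTS.get("en");
        // Map.of rejects null lookups, so handle a missing language first.
        Map<String, String> table = (lang == null) ? english : TEXTS.getOrDefault(lang, english);
        String text = table.get(key);
        if (text == null) {
            text = english.get(key);          // fall back to English
        }
        return (text != null) ? text : key;   // last resort: show the key itself
    }
}
```

Keeping the language maps side by side in one file preserves the property the post cares about: the translator sees the source and target phrases together.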
Don't forget that many users speak more than one language. For instance, many users I talk to in Dutch on my site want to see links to content in both Dutch and English, so when they sign up they can choose between Dutch, English, Dutch + English and English + Dutch.
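Supporting choices like "Dutch + English" amounts to filtering content by an ordered preference list and sorting by position in that list. A minimal Java sketch; the Link type and the method name are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical filter for users who read several languages.
final class LinkFilter {
    static final class Link {
        final String title;
        final String lang;
        Link(String title, String lang) { this.title = title; this.lang = lang; }
    }

    // prefs is the user's ordered choice, e.g. ["nl", "en"] for "Dutch + English".
    static List<Link> forUser(List<Link> links, List<String> prefs) {
        List<Link> out = new ArrayList<>();
        for (Link l : links) {
            if (prefs.contains(l.lang)) {
                out.add(l);  // drop languages the user doesn't read
            }
        }
        // List.sort is stable: preferred language first, original order kept within a language.
        out.sort(Comparator.comparingInt(l -> prefs.indexOf(l.lang)));
        return out;
    }
}
```

"English + Dutch" is then just the same call with the list reversed.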
Why don't you use Unicode? Set the content type's charset to UTF-8. There is a Unicode editor at http://www.yudit.org/ . There are some sample pages on the same site (Resources: Hungarian Grammar in Japanese), created quite a while back, but they still look OK.
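The point of UTF-8 is that one encoding covers every script on the site, as long as the header announces it. A tiny Java sketch of the two halves; the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: one encoding (UTF-8) for every language, announced
// in the Content-Type header so the browser decodes the bytes correctly.
final class Utf8Page {
    static final String CONTENT_TYPE = "text/html; charset=utf-8";

    static byte[] encodeBody(String html) {
        return html.getBytes(StandardCharsets.UTF_8);
    }
}
```

German umlauts take two bytes each and most CJK characters three, so a mixed-language page needs no per-language charset switching.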
Can you give me an example how to use CSS to replace a table-tr-td structure? Thanks.
xuanb
I solved this for my purposes with a very small PHP3 function (10 lines, only one switch() call). Now I have only _one_ source file for each multilingual page (no database needed).
Examples available.
Languages are selected via ?language=xx parameters appended to the URL.
Of course it's not a solution for a big site and by far not perfect.
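The poster's function isn't shown, so this is only a guess at its shape, sketched in Java: every call site carries all of its translations inline, and the ?language=xx value picks one. The name tr and the de/nl language set are hypothetical.

```java
// Hypothetical one-switch translation helper: each page stays a single
// source file because every call carries all of its translations.
final class Inline {
    static String tr(String language, String en, String de, String nl) {
        switch (language == null ? "" : language) {
            case "de":
                return de;
            case "nl":
                return nl;
            default:
                return en;   // unknown or missing ?language=xx
        }
    }
}
```

That is fine for a handful of short strings, but adding a language means changing every call, which is why it doesn't scale to a big site.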
I personally never got the hang of it, but the version we used was still pretty beta and we only spent a few weeks with it. Still, it is designed for multiversioned sites.
..p
Yeah, I agree. JSPs with java.util.ResourceBundle are a good combination. Here is a very basic framework (may be slightly wrong; written from memory, not taken from working code).
--- I18N_Strings_en_US.properties (English strings)
HomePageTitle=User Home Page
Link_UserPreferences=User Preferences
...
--- I18N_Strings_es_MX.properties (Spanish/Mexico)
# Really bogus Spanish here, I apologize
HomePageTitle=Pagina de Casa
Link_UserPreferences=Profilo de Usuario
...
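--- UserLogin.java -- a Java object that represents current login
public class UserLogin
{
    // Hypothetical defaults; in practice these come from the user's profile.
    String m_country = "US";
    String m_language = "en";
    java.util.ResourceBundle m_i18nStrings;

    protected void loadStuffAtInit()
    {
        // Loads I18N_Strings_<language>_<COUNTRY>.properties from the
        // classpath, falling back to less specific bundles if it is missing.
        m_i18nStrings = java.util.ResourceBundle.getBundle("I18N_Strings",
            new java.util.Locale(m_language, m_country));
    }

    public java.util.ResourceBundle getI18nStrings()
    {
        if (m_i18nStrings == null)
            loadStuffAtInit();   // load lazily on first use
        return m_i18nStrings;
    }
}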
--- HomePage.jsp
<%@ page import="java.util.ResourceBundle" %>
<jsp:useBean id="curLogin" class="...UserLogin" scope="session" />
<%
ResourceBundle i18n = curLogin.getI18nStrings();
%>
<HTML><BODY>
<H1><%=i18n.getString("HomePageTitle")%></H1>
<UL>
<LI><A HREF="UserPrefs.jsp"><%=i18n.getString("Link_UserPreferences")%></A></LI>
...
</UL>
</BODY></HTML>
----
The xxx.properties files contain I18N'ed (internationalized) strings for all possible strings in your JSP.
If you're worried about the performance hit of string lookups, then you can preprocess all JSPs for each language/country, finding all instances of i18n.getString("XXX") and replacing them with the literal value from the language file. You would then generate HomePage_en_us.jsp, HomePage_sp_mx.jsp, etc. Your links would also need to change from <A HREF="UserPrefs.jsp"> to things like <A HREF="UserPrefs_en_us.jsp"> -- both tasks are easy to do with Java (or Perl).
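The first of those preprocessing tasks can be sketched in Java with a regex over the JSP source. This assumes the calls are written exactly as <%=i18n.getString("Key")%> (optional spaces allowed); the class name is hypothetical, and rewriting the links is left out.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical preprocessor: inline <%=i18n.getString("Key")%> calls
// with literal values from one language's .properties data.
final class JspPreprocessor {
    private static final Pattern CALL =
            Pattern.compile("<%=\\s*i18n\\.getString\\(\"([^\"]+)\"\\)\\s*%>");

    static String inline(String jsp, Properties strings) {
        Matcher m = CALL.matcher(jsp);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = strings.getProperty(m.group(1));
            if (value == null) {
                // Fail the build rather than ship a page with a missing string.
                throw new IllegalArgumentException("missing key: " + m.group(1));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Loading each I18N_Strings_*.properties file with Properties.load and running this over every JSP produces the per-language copies described above.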