ICANN Mulling Multilingual URLs

Posted by ryuzaki0 on Thursday October 11, 2007 @07:12AM from the so-many-ways-to-say-google dept.

griffjon writes "The Washington Post is reporting that ICANN is testing out fully multilingual domain names. These won't just be [non-western-language].com, but would have TLDs translated into other scripts, fixing annoyances for non-English speaking audiences. An example: 'Speakers of Hebrew, Arabic and any other language written from right to left must type half of the URL in one direction and the other half — the .com, .net or .org postscript — the opposite way.' Let's hope it goes better this time around: 'Next week's experiments use the domain name "example.test" translated into 11 languages. A previous model, however, used "hippopotamus" instead of "test." These plans went awry when an Israeli registrar realized the Hebrew word ICANN thought meant "hippopotamus" was an expletive and threatened to involve the Israeli government.'"

Multilingual URLs... by It doesn't come easy · 2007-10-11 07:13 · Score: 4, Funny

Well hippopotamus me, what will they think of next?

--
The NSA: The only part of the US government that actually listens.
1. Re:Multilingual URLs... by Rob T Firefly · 2007-10-11 07:17 · Score: 4, Funny
  
  Meh, they can all go example themselves.
  
  --
  Slashdot Burying Stories About Slashdot Media Owned
2. Re:Multilingual URLs... by lgw · 2007-10-11 08:12 · Score: 4, Informative
  
  The .test domain and the example.com address are specifically reserved for testing (and documentation example) purposes. There's an RFC somewhere. How silly to use something else!
  
  --
  Socialism: a lie told by totalitarians and believed by fools.
3. Re:Multilingual URLs... by SL Baur · 2007-10-11 09:16 · Score: 3, Informative
  
  That would be Chinese and Japanese - top to bottom, right to left.
  
  Japanese writing has pretty much been converted to the western left to right style. Formal government documents and newspapers are written that way and in day-to-day life in Japan one will rarely encounter top to bottom writing except in traditional restaurants, certain stylized ads and museums. You actually encounter it less than outright English (English is very popular in ads see http://www.engrish.com/ ), which few people read.
  
  My brief trip to China seemed to indicate that they've done the same thing there.
  
  It's not an issue.
4. Re:Multilingual URLs... by amRadioHed · 2007-10-11 12:40 · Score: 2, Informative
  
  You're right, Chinese text is frequently printed horizontally left to right. Most frequently from my experience. None of the Chinese language newspapers I've seen here in Southern California use vertical text. The only time I've seen vertical text was in formal situations.
  
  Chinese text can also be seen written horizontally from right to left on some old signs and buildings. This comes from before horizontal writing was common and is actually a special case of vertical printing where there is room for only one row of characters.
  
  --
  We hope your rules and wisdom choke you / Now we are one in everlasting peace
5. Re:Multilingual URLs... by mattmatt · 2007-10-11 14:34 · Score: 2, Informative
  
  There's an RFC somewhere.
  
  RFC 2606, Section 3. It's referenced at (where else) example.com.
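For illustration, the names RFC 2606 sets aside can be checked with a few lines of Python. This is a minimal sketch: the `is_reserved` helper is hypothetical, and the two sets list only the top-level and second-level names that RFC 2606 reserves.

```python
# Top-level names reserved by RFC 2606, Section 2.
RESERVED_TLDS = {"test", "example", "invalid", "localhost"}
# Second-level names reserved by RFC 2606, Section 3.
RESERVED_SLDS = {"example.com", "example.net", "example.org"}


def is_reserved(hostname: str) -> bool:
    """Return True if hostname falls under a name reserved by RFC 2606."""
    labels = hostname.lower().rstrip(".").split(".")
    if labels[-1] in RESERVED_TLDS:
        return True
    # Compare the last two labels against the reserved second-level names.
    return ".".join(labels[-2:]) in RESERVED_SLDS


if __name__ == "__main__":
    for name in ("example.test", "www.example.com", "slashdot.org"):
        print(name, is_reserved(name))
```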
Domain name != URL by Anonymous Coward · 2007-10-11 07:19 · Score: 5, Informative

A URL is an entire address, including the protocol, local path and fragment identifier. This is a URL:

http://slashdot.org/foo?bar=baz#qux

A domain name does not include the protocol, the local path or the fragment identifier. This is a domain name:

slashdot.org

This is talking about domain names, not URLs. If anybody would talk about multilingual URLs, it would be the IETF, not ICANN, and they already have, they are called IRIs.
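The distinction can be seen by taking the example URL apart. A minimal sketch using Python's standard `urllib.parse`; only the URL itself comes from the comment above.

```python
from urllib.parse import urlsplit

url = "http://slashdot.org/foo?bar=baz#qux"
parts = urlsplit(url)

# The domain name is only one component of the URL.
print("protocol:   ", parts.scheme)
print("domain name:", parts.hostname)
print("local path: ", parts.path)
print("query:      ", parts.query)
print("fragment:   ", parts.fragment)
```

Only the `hostname` component is what ICANN's test is about; the rest of the URL is outside the DNS.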
1. Re:Domain name != URL by CastrTroy · 2007-10-11 08:28 · Score: 2, Interesting
  
  Sounds like something that the Canadian government would embrace. There's rules for government websites that the url must be bilingual, so the directory path and file names must be mirrored to create the same structure in both French and English. The loophole in the rules is that you don't have to provide multiple directories and folders where the name isn't linguistic, such as calling your file 1243.html, or ESADOFE.html. So you can either mirror your directory structure in French and English, or have a completely incomprehensible gibberish based directory structure.
  
  --
  
  Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
2. Re:Domain name != URL by Phisbut · 2007-10-12 01:06 · Score: 3, Funny
  
  Sounds like something that the Canadian government would embrace. There's rules for government websites that the url must be bilingual, so the directory path and file names must be mirrored to create the same structure in both French and English. The loophole in the rules is that you don't have to provide multiple directories and folders where the name isn't linguistic, such as calling your file 1243.html, or ESADOFE.html.
  
  Ah, but that's where you're wrong my friend. Like it or not, "1234.html" can be expanded to "1 2 3 4 . HyperText Markup Language", which can then be translated to "1 2 3 4 . Langage Balisé HyperTexte" and then back to "1234.lbht", so you can't even escape the bilingual requirements with non-words html files.
  Unless of course you make your website using only PHP scripts, which is lucky because it's a palindrome (and a recursive one like we geeks all like our acronyms), and the "PHP Hypertext Preprocessor" translates to "Préprocesseur Hypertexte PHP" and then back to PHP, so 1234.php would be ok. PHP is a bilingual recursive acronym, making it Canada-proof.
  Don't get me started on .cgi, .asx and .pl, cause things could get ugly.
  
  --
  After 3 days without programming, life becomes meaningless
  - The Tao of Programming
Seriously by El Lobo · 2007-10-11 07:22 · Score: 3, Interesting

Seriously, multilingual domain names are a pain (for the whole of humanity). Visiting Japan last year, I saw a lot of servers with names in simplified Japanese. As a foreigner, I hadn't the slightest idea what the site was about (without clicking on it). Clicking on it didn't help either. Yes, a lot of Japanese people have the same problem with English domain names, but adding multilingual names adds more complexity to the whole thing. I would like to see the face of a Chinese guy trying to decipher some URL using Ukrainian characters... or... trying to type it on his Japanese keyboard...

--
It's time to realise that Abble's products are the biggest abomination these days. Just say NO to the dumb iAbble way!!
1. Re:Seriously by veganboyjosh · 2007-10-11 07:25 · Score: 2, Interesting
  
  Speaking of Asian (written) languages, don't a lot of them read top to bottom?
  
  How to accommodate those?
2. Re:Seriously by gregoryb · 2007-10-11 07:27 · Score: 2, Funny
  
  Speaking of Asian (written) languages, don't a lot of them read top to bottom?
  
  How to accommodate those?
  
  Rotate your screen 90 degrees...
What word? by dotancohen · 2007-10-11 07:28 · Score: 2, Funny

I'd love to know what Hebrew word for hippo is an expletive. All my life I've only ever heard "hipopotam" in Hebrew for hippo, which is not a very dirty word. In any case, Hebrew URLs have been the norm at the Hebrew Wikipedia for as long as I've been using it. Hebrew domain names, on the other hand, would be interesting (even though I'm sure this is what the poster meant).

--
It is dangerous to be right when the government is wrong.
1. Re:What word? by Red Flayer · 2007-10-11 07:55 · Score: 2, Funny
  
  As we all know, hippopotamos means river whores.
  
  Or at least, that's what I recall from 4th grade biology class.
  
  --
  "Trolls they were, but filled with the evil will of their master: a fell race..." -- J.R.R. Tolkien on Olog-hai
2. Re:What word? by blinx_ · 2007-10-11 08:19 · Score: 2, Funny
  
  It was a 750cc water cooled 2 stroke triple, that is sweet vintage sex on wheels, not horrible :)
  
  --
  Resistance is not futile - www.gnu.org
3. Re:What word? by zunger · 2007-10-11 14:00 · Score: 2, Interesting
  
  Behemot is the plural of behema; the word literally means (roughly) "large, mindless quadruped." In the plural it's often used as an equivalent to "livestock," and in Biblical Hebrew it was used as the (only) word for hippopotamus. In more modern Hebrew, the borrowed word "hipopotam" is used for hippo, and "behema" has a slightly more literary feel to it -- except when it's used to refer to a person, which is probably its most common use today. And not polite. :)
More info here by Anonymous Coward · 2007-10-11 07:29 · Score: 5, Funny

xc.estaog//:ptth
I am registering by dotpavan · 2007-10-11 07:40 · Score: 4, Funny

http://org.slashdot/ or is it org.dotslash://http or org.dotslashcolon://http or.... ah, hippo it!
1. Re:I am registering by tighr · 2007-10-11 08:24 · Score: 2, Funny
  
  Actually, I read some article about it. The creators of the 'Net made a mistake with domain names, but by the time they realised it, it was too late. Logically, the parts should have been ordered from top to bottom: Protocol://TLD.domain name/rest of URI.
  I cannot find where I've seen it... So does that mean that in a few years after this change, we'll have the com-dot boom? Will we be living in the age of com-dot? That doesn't even roll off the tongue...
Re:This negates the entire purpose of DNS by griffjon · 2007-10-11 08:04 · Score: 3, Insightful

Actually, if you RTFA, ICANN's failure to do this so far has caused increased fragmentation, as countries have implemented their own, only-works-here solutions:
At least a dozen countries, including China and Saudi Arabia, have created their own domains in different alphabets and their own Internets to support these domains. A Russian newspaper article last July reported that President Vladimir Putin was commissioning the creation of a Cyrillic Internet. Users of Russia's Internet, like current users of China's and Saudi Arabia's, could surf the Web without going through U.S.-controlled ICANN servers.

"We have been told so many times it will be next year and next year and next year that ICANN will make" multilingual domains work, said Alexei Sozonov, chief executive of Regtime, a Russian domain registrar. "So countries now have their own deployments."

Now, of course, most of these countries have their own issues about Internet connectivity and interoperability, but this at least is one less acceptable reason they behave that way.

--
Returned Peace Corps IT Volunteer
!knil taht kcilc t'noD !GMO by Anonymous Coward · 2007-10-11 08:18 · Score: 2, Funny

ssa lamron a tub htuom eguh a htiw yug a fo etis kcohs a si tI
Well, uh, we could click by Anonymous Coward · 2007-10-11 08:40 · Score: 5, Funny

This is a fair comment - how do we deal with languages we don't know and can't even type? Let's see, I'd say somebody should really come up with a way to get to a site on the Internet without having to type some moon language that has letters that aren't even on my keyboard. Maybe there could be some other input device we could use, maybe this little hand-held rodent-looking device just to the side of my keyboard. I've always wanted a use for it.

Maybe if I did a search for something, and the answer is in one of those "other" languages written by those "other" people, maybe I could somehow click some kind of--I don't know--maybe a representation of that site, using my rat or squirrel or whatever these new-fangled devices are called. Then of course I'd like to be able to save this transportation capability for future use; if only there were a way to save some kind of cyber-bookmark in my browser, to keep my place without having to type in all those funny characters ever again. I think I have some ideas, but I need to contact my patent attorney first.

Oh, no. Wait. I just thought of something bad. You know, when I actually get to this site, it's probably going to be really hard to understand what's written on the page. Funny squiggles and such. I suppose there's really just no reason for me to go to such a page, if I can't read it anyway, so why even bother? Plus "they" probably don't know anything good anyway, but there's always a chance that "they" might be more intelligent than we thought. If only there were some site that provided a service that could help me translate this page, then maybe, just maybe, I'd be Ok with allowing these foreign-speaking visitors to spread their native language like some kind of disease all over "my" Internet. If only...
Analogy with the official language of Aviation by presidenteloco · 2007-10-11 09:41 · Score: 2, Insightful

I don't think the computing world is ready for this yet, and it may never be a good idea.

Internationalization in software and operating systems is in a horrible state of excess
complexity right now. When everything top to bottom runs unicode UTF8 as its default
mode, then MAYBE.

But even then, there is a single language for Aviation communications (happens
to be English) but that is done so that there is some hope that everyone will know what
everyone is talking about, because everyone can learn the aviation subset of a single
natural language.

Also, most programming languages retain a small set of keywords in a single natural
language, so that most people will have a chance of learning that small set.

Simplicity-and-universality-first arguments maybe should win the day
for domain names too.

"Nationalized" domain names are one more step in the very unfortunate
trend toward balkanization of the Internet. The Internet is to some extent and
should continue to be one place where all people around the world start working
and communicating and trading and problem solving together. A Lingua Franca
is clearly needed if this is to remain true.

--

Where are we going and why are we in a handbasket?
Re:The "Balkanisation" of the Internet by Pfhorrest · 2007-10-11 10:49 · Score: 3, Insightful

As for accusations of "cultural imperialism" - can I just point out that English speaking people developed the Internet at their own time and expense (and a lot of taxpayers' money) - so they are entitled to have it in English if they want. And other countries are free to develop their own networks in their own languages and scripts if they want.

I agree that segregating the Internet into separate "internets" for particular countries is a bad idea; however, if other people want to have networks that operate in their native languages, who are we to tell them that they should stop that and be forced to use English instead? Wouldn't it be better to just make the Internet (the one that we have now, predominantly English) capable of supporting multiple languages, so that if and when people want to build networks in other languages, they're at least connectable to our internet, even if we can't type the domain names directly from our English keyboards? The alternatives are either making everyone build their networks in English, which WOULD be cultural imperialism, or ignoring the pressure for multilingual networks to the point that completely incompatible non-English alternatives spring up.

The world is already largely divided up by language. I doubt you (presumably a native English speaker in a predominantly English-speaking country) visit many Chinese websites written entirely in Chinese languages for Chinese speakers in China right now, even though their domains are written in 7-bit ASCII script like every other site on the Internet. This proposition won't make that any better, but it won't make it any worse either; and it holds the possibility of staving off the even worse alternative of completely separate, incompatible, non-ASCII "internets" springing up to meet the demands of these other peoples. At least with this multilingual system, an English site (with an ASCII domain) can link to a Chinese site (with a Hanzi domain). If China were to invent their own Hanzi-based DNS protocol, separate from our existing DNS protocol, not even that would be possible. Making our network multilingual actually prevents Balkanization more than it induces it.

--
-Forrest Cameranesi, Geek of all Trades
"I am Sam. Sam I am. I do not like trolls, flames, or spam."
Re:Some actual facts by jc42 · 2007-10-12 06:35 · Score: 3, Interesting

Before they rush on with alphabets that read right to left and use alternative character sets they really should try English words with greater than 8 bit characters. Are they gonna actually work?

Well, lately I've been testing a lot of my old code in various UTF-8 environments, and I've been duly impressed by the fact that, as Ken intended, almost all the code "just works" with Arabic, Chinese, Japanese, etc.

It turns out that there's a simple explanation. If the code doesn't examine chars with the high bit turned on, but just treats them as unexamined "data" (or letters if the code is trying to distinguish that way), then everything works right. The only time the code needs to actually look at non-ASCII characters' values is when the text is being rendered in physical form. And hardly any code ever actually does that. Almost all my code reads data from files and writes data to other files, but never does anything with the physical representation of the data. It passes the data to other programs for that.
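A rough Python sketch of that idea, for illustration only: `swap_fields` and the sample record are hypothetical, not taken from the code being described. It relies on a property of UTF-8: every byte of a multi-byte sequence has its high bit set, so an ASCII delimiter such as ':' can never turn up inside a non-ASCII character.

```python
def swap_fields(line: bytes) -> bytes:
    """Swap the two fields of a 'key:value' record without ever decoding it."""
    key, sep, value = line.partition(b":")
    return value + sep + key


# A record whose fields are Japanese text, encoded as UTF-8.
record = "名前:値".encode("utf-8")
swapped = swap_fields(record)

# Every byte that belongs to a non-ASCII character is 0x80 or above,
# so the only byte equal to ord(":") is the real delimiter.
high_bytes_only = all(b >= 0x80 for b in "名前値".encode("utf-8"))

if __name__ == "__main__":
    print(swapped.decode("utf-8"))
    print(high_bytes_only)
```

The function never looks at the non-ASCII bytes; it only searches for one ASCII byte and copies everything else through unchanged.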

A case in point: I was recently working on some multi-language HTML files, and I decided to try a fun test with CSS: I defined a whole lot of classes whose names were in Chinese. This made sense, since these classes were being used for pieces of the text that contained mostly Chinese characters, not counting things like spaces and punctuation. I tested the CSS using more than a dozen browsers that I have installed on my Linux and OSX test machines. I was unable to find a single case where it didn't work. I even hunted down some Windows boxes and tested the files on IE6 and IE7; they worked fine (despite the well-known CSS incompatibilities in IE ;-). I also tried a few CSS class names with Arabic and Hebrew names, and they worked fine, too.

Now, I don't think for a second that the writers of all those browsers spent time making sure that their code could handle UTF-8-encoded Chinese identifiers in CSS. I suspect that most of them never even considered the possibility. I'd bet that the code just takes anything that's not a significant character in CSS syntax, and tacitly treats it as a "letter". This is all it takes to make UTF-8 work correctly in this case.

I did mention this in a couple of browsers' newsgroups. The responses were basically of the form "Well, of course it works. Why wouldn't it? You don't need special code to handle charset=UTF-8, except for the rendering. You'd have to be a fairly incompetent programmer to write code that doesn't work correctly with UTF-8. Except for rendering."

I can hear people saying "but those browsers all need to render the text." Yeah, but the CSS routines don't render text. They parse the CSS input, and fill in fields in data structures that tell the rendering code how to position and color the text. But the charset-handling code is probably not called anywhere in the CSS modules; it's only called in the few places that actually need to color pixels on the screen.

Lots of people have suggested declaring UTF-8 to be the only encoding for URLs. If this is done, there's probably very little URL-handling code anywhere that needs to be changed; it'll mostly "just work", because char codes 0x80 to 0xFF are treated as "letters". The only question is whether the final step of rendering the text's pixels will produce the right glyph, and the URL-handling code doesn't care about that.
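One way UTF-8 can travel through existing URL-handling code is percent-encoding, where each non-ASCII byte is written as %XX. A minimal sketch with Python's standard library; the path is a made-up example, not one from the tests described here.

```python
from urllib.parse import quote, unquote

path = "/wiki/日本語"

# quote() encodes the text as UTF-8 and escapes every non-ASCII byte,
# leaving "/" alone by default.
encoded = quote(path)
# unquote() reverses the escaping and decodes the bytes as UTF-8.
decoded = unquote(encoded)

if __name__ == "__main__":
    print(encoded)
    print(decoded)
```

The encoded form is pure ASCII, so code that only shuffles URL strings around never has to know which script the original was written in.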

I happen to have a DNS server handy. Maybe I'll try a little test: In one of the domains, I'll add hostnames in Russian, Chinese, Arabic, and maybe a few other non-Roman alphabets. I'll wait a while, and see if I can access the machines via those names from a few other machines. I'll predict that it'll also "just work".
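For comparison, the scheme standardized for internationalized domain names (IDNA, RFC 3490) does not put raw UTF-8 into the DNS. Each non-ASCII label is converted to an ASCII form prefixed with "xn--" (Punycode), so resolvers only ever see ASCII. A small sketch using Python's built-in `idna` codec, which follows the older IDNA 2003 rules; the helper names and the hostname are hypothetical examples.

```python
def to_ascii(hostname: str) -> str:
    """Convert a Unicode hostname to its ASCII-compatible (xn--) form."""
    return hostname.encode("idna").decode("ascii")


def to_unicode(hostname: str) -> str:
    """Convert an ASCII-compatible hostname back to Unicode."""
    return hostname.encode("ascii").decode("idna")


if __name__ == "__main__":
    ascii_name = to_ascii("bücher.example")
    print(ascii_name)
    print(to_unicode(ascii_name))
```

Labels that are already ASCII, such as "example", pass through unchanged; only the non-ASCII label is rewritten.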

--
Those who do study history are doomed to stand helplessly by while everyone else repeats it.