ICANN Under Pressure Over Non-Latin Characters

Changing a system by Kamineko · 2006-11-21 04:06 · Score: 5, Insightful

Changing a system which works is a very, very bad idea.

Won't this open up the system to many more phishing attacks involving addresses which include non-Latin characters which look similar to Latin ones?

Re:Changing a system by Daniel_Staal · 2006-11-21 04:14 · Score: 4, Insightful

That's one possible problem. Then there are characters that are technically equivalent but have different representations. (Accented vowels for instance: you can code them directly, or you can code the accent and the vowel separately.) You need some way to make sure they both go to the same place, no matter UTF-8, -16, -32 or whatever else people throw at it.
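To make the "coded directly vs. accent plus vowel" point concrete, here is a minimal Python sketch. It is only an illustration, not how any DNS server or registry actually handles it; the `canonical` helper is a made-up name, and it assumes NFC normalization is the form you settle on.

```python
import unicodedata

precomposed = "\u00e9"   # e-acute as a single code point
decomposed = "e\u0301"   # plain e followed by a combining acute accent

# Both render the same, but the raw code point sequences differ.
print(precomposed == decomposed)

def canonical(label: str) -> str:
    """Hypothetical helper: collapse equivalent spellings into one form (NFC)."""
    return unicodedata.normalize("NFC", label)

# After normalization the two spellings compare equal.
print(canonical(precomposed) == canonical(decomposed))

# The byte encoding is a separate layer: same code point, different bytes.
print(precomposed.encode("utf-8"), precomposed.encode("utf-16-le"))
```

The takeaway is that comparison has to happen on normalized code points, after decoding, rather than on whatever bytes arrived.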

And, of course, you need to make sure when someone types this into a browser some major DNS server someplace won't crash.

I'm all for adding non-latin characters. But I do recognize that it should be a slow process.

--
'Sensible' is a curse word.
Re:Changing a system by KingJoshi · 2006-11-21 04:25 · Score: 5, Insightful

But it's not working. Mainly for all those people that want non-latin characters. It's been broken from the beginning. Sure, there are historical reasons why we have the system we do, but change is definitely needed. Twomey is right that a change can't be rushed and it needs to be done right (for reasons of security, compatibility, stability, etc). However, the change does need to occur and there needs to be some level of pressure to ensure that it happens.

--
In times like these, it is helpful to remember that there have always been times like these. - Paul Harvey
Re:Changing a system by jmorris42 · 2006-11-21 04:28 · Score: 4, Insightful

> Won't this open up the system to many more phishing attacks involving addresses which include non-Latin characters which look similar to Latin ones?

Even worse, although your problem is reason enough to postpone doing this change. It will break the very idea of the Internet as a commons when URLs can't even be typed in on all keyboards. There are good reasons why DNS didn't even include the whole ASCII set. Least common denominator is a good design decision. Every character currently allowed is easy to generate on ALL keyboards, can be printed in an unambiguous way by EVERY printing system, etc. Remember that a lot of wire services aren't even 7-bit ASCII clean; email addresses on a lot of news wires have to use (at) instead of @.

More bluntly, of what use are the parts of the Internet I can't even type the domain name for? As things now stand I CAN, and have, snarfed firmware directly from .com.tw sites where I couldn't read any of the text. Learned things from sites where I couldn't read anything but the code text and command lines. Seen images and understood even when the captions were meaningless to me. I'm sure the reverse is equally true, that those who do not speak English still benefit from the English majority of the Internet the same way. All this because DNS is currently universal. Break that universal access feature and, frankly, they can just as easily ignore ICANN and just get the hell off the Internet and make their own walled garden network based on IPv6 technology.

At a minimum, unicode DNS should be restricted to IPv6 ONLY. No sense wasting scarce IPv4 resources on supporting walled off ghettos.

--
Democrat delenda est
Re:Changing a system by imbaczek · 2006-11-21 04:31 · Score: 2, Insightful

Except that it doesn't. Being allowed to use 37 characters as a domain name is not what many people consider "working".
Re:Changing a system by ericlondaits · 2006-11-21 04:33 · Score: 4, Insightful

Accented vowels would be a problem, at least in Spanish. Though their use is "mandatory", people with mediocre spelling don't use them on the internet. Even people who use them don't always do it: even though the use of accents is mostly regular, there are many (and very common) irregular placements.

Let's say for instance we have an online shop for tea called "Sólo Té" (Tea Only). Both accents are due to irregular rules ("Sólo" = "Only" and "Solo" = "Alone", "Te" is a personal pronoun and "Té" = Tea). Some people would try the current www.solote.com, others would try the correct www.sóloté.com, some would try www.sólote.com and yet others www.soloté.com depending on their spelling capabilities.

What this basically means is that in order to make sure everybody finds your domain and to avoid phishing you have to register four different domains.

A solution to this problem could be what Google does right now with accents: map them to the unaccented vowel. Thus "Solo Te" and "Sólo Té" would both find the "Sólo Té" store.
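A rough Python sketch of that kind of accent folding, for illustration only: I have no idea how Google actually implements it, and `fold_accents` is a hypothetical name. It decomposes each character and drops the combining marks.

```python
import unicodedata

def fold_accents(text: str) -> str:
    """Hypothetical helper: strip combining marks so accented and
    unaccented spellings compare equal, then lowercase."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()

# The four spellings people might try for the tea shop.
variants = ["solote", "s\u00f3lot\u00e9", "s\u00f3lote", "solot\u00e9"]
print({fold_accents(v) for v in variants})
```

One caveat: this also folds ñ into n, which is exactly the case where Spanish speakers would not want the two treated as the same letter.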

--
As a Slashdot discussion grows longer, the probability of an analogy involving cars approaches one.
Re:Changing a system by Tet · 2006-11-21 04:36 · Score: 2, Insightful

> Won't this open up the system to many more phishing attacks involving addresses which include non-Latin characters which look similar to Latin ones?

Potentially, yes. But I'm not too bothered about that. Protecting people from their own stupidity is rarely a good long term strategy. However, i18n for DNS is a particularly bad idea for purely pragmatic reasons. Currently, anyone anywhere in the world can go to any URL in the world in their web browser. If we allow the full range of unicode characters, that simply ceases to be true. When URLs start containing unicode characters, many people are simply not going to be able to enter them into their computer (with current input methods, anyway). True, many of those sites will not be of interest to the average person that doesn't have a convenient way to enter the URL anyway. But there will always be those that need to grab a data sheet from a Taiwanese electronics manufacturer, or look at live results from a sporting event in the middle east. That will cease to be possible with i18n. As you say, the system currently works. Changing it for political reasons is just stupid.

--
"The invisible and the non-existent look very much alike." -- Delos B. McKown
Re:Changing a system by GeorgeS069 · 2006-11-21 04:53 · Score: 2, Funny

"37?!!"
"...in a row??"

--
I'd rather have a bottle in front of me than a frontal lobotomy
Re:Changing a system by Sin+Nombre · 2006-11-21 04:53 · Score: 5, Insightful

'when URLs can't even be typed in on all keyboards'
As far as Japanese goes, there are very usable technologies that allow you to type in kanji using a standard Latin keyboard. It works pretty well, and I'm not sure what other languages have such options available, but since most of Asia uses the same kanji system I'm pretty sure that at least Asia has viable typing options.
'of what use are the parts of the Internet I can't even type the domain name for?'
It's of no use... to you. But then again, can you read Japanese, Korean, Arabic, Sanskrit or any other non-Latin language? No? Then your usability isn't in question here.

--
"Im such a nonconformist I'm going to not conform to the rest of you!"
"Dude I think we just got goth-served"
Re:Changing a system by teh+kurisu · 2006-11-21 04:57 · Score: 4, Insightful

Just because the letters aren't printed on your keyboard doesn't mean it won't type them. Have a look at the list of keyboard layouts in your OS. Sure, it's an inconvenience for you, but less of an inconvenience than it is to the people for whom it is a barrier to entry. Or you could use Google - a lot of people don't even bother typing in domain names any more, they just search.

The whole point about this is that it avoids walled gardens, because the DNS records are still held by ICANN. The alternative is that China decides it's had enough, and creates its own root servers, causing a very real split.
Re:Changing a system by Hamled · 2006-11-21 05:20 · Score: 2, Insightful

This could certainly be a big win for Google.

I'm all for people using and having websites and domains in their native language and alphabet, however it would be very difficult for me to find traditional Persian music (which I happen to be fond of) if the domain were .sa (if that doesn't show up, it was a simple translation of persianmusic.sa into Arabic). On the other hand, I could probably find that site through Google, and largely would have to go through Google or some other search engine, to find and visit websites and domains in another alphabet.

On the other hand, I suppose that's how I do it now.
Re:Changing a system by Zaatxe · 2006-11-21 05:37 · Score: 3, Insightful

> As far as Japanese goes, there are very usable technologies that allow you to type in kanji using a standard Latin keyboard. It works pretty well, and I'm not sure what other languages have such options available, but since most of Asia uses the same kanji system I'm pretty sure that at least Asia has viable typing options.

I wonder how you got +4 mod points... this makes no sense at all!!

Let's suppose you are a Japanese person and you travel to Brazil. Never mind whether you can speak Portuguese or not, but then you need to send an e-mail using your company's webmail server from a computer at the hotel. And suppose this webmail server has kanji characters in its URL. How are you going to type them? Believe me, Brazilian Portuguese Windows has no support for Asian languages (at least not by default, and actually I don't know if it's even possible with a regular Brazilian Windows XP). What now?

--
So say we all
Re:Changing a system by Anonymous Coward · 2006-11-21 05:39 · Score: 5, Informative

What's this? I've been able to use the Norwegian characters in domain names for a long time. There are screenshots over at http://en.wikipedia.org/wiki/Internationalized_domain_name
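For anyone wondering how that works without touching the DNS servers: internationalized names are converted to an ASCII-only form (labels starting with "xn--") before lookup. A minimal sketch using Python's built-in "idna" codec, which follows the older IDNA 2003 rules; the domain below is a made-up example, not a real site.

```python
# Hypothetical name with Norwegian letters (a-ring and ae ligature).
name = "bl\u00e5b\u00e6r.example"

# Convert to the ASCII-compatible form that actually goes on the wire.
ascii_form = name.encode("idna")
print(ascii_form)

# And back again for display in the address bar.
print(ascii_form.decode("idna"))
```

The servers only ever see the ASCII form, so the conversion burden sits with the browser or resolver library.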
Re:Changing a system by MrNougat · 2006-11-21 05:56 · Score: 3, Interesting

> Though their use is "mandatory", people with mediocre spelling don't use them on the internet.

I don't have mediocre English spelling, and I would use the correct accented characters in English words like "naive" - except I don't know how to type those characters. Like many people, I know how to type the characters that are on the keyboard. Additionally, because there's no need for me to type characters outside the ones printed on the keys on my keyboard to make the internets come down my tubes, I have no incentive to learn how to type any differently than I already do.

It's not necessarily a matter of spelling ability.

--
Web 2.0 == Giant Blogspam Circle Jerk
Re:Changing a system by dasunt · 2006-11-21 06:01 · Score: 4, Insightful

> As far as Japanese goes, there are very usable technologies that allow you to type in kanji using a standard Latin keyboard. It works pretty well, and I'm not sure what other languages have such options available, but since most of Asia uses the same kanji system I'm pretty sure that at least Asia has viable typing options.

I must have missed where Japan conquered 51%+ of the area east of the Ural mountains.

AFAIK (and I'm not an expert), China, Japan, Korea and Vietnam used very similar writing systems descended from Chinese Hanzi characters. Vietnam and Korea (South Korea at least) later adopted other alphabets. So really, only China and Japan commonly use Hanzi/Kanji, and even then, the CJK unification of hanzi/hanja/kanji characters really annoyed a few purists when similar hanzi/hanja/kanji were merged in Unicode.

So, other than hanzi/kanji, there is hangul (S. Korea), hiragana/katakana (Japan -- yes, they have more than one writing system!), the Thai alphabet, the Cyrillic alphabet (former USSR), the Arabic alphabet (Middle East), Hebrew (Israel), the Brahmic scripts (India) and the Georgian alphabet. (And this is just off the top of my head, I wouldn't be surprised if there were a few more writing systems in use in Asia!).

And then, just to confuse the problem, there are the various forms of encoding. Admittedly, unicode would probably be one of the better methods, but there are a lot of pre-unicode encodings in common use.

When you expand the problem to be worldwide, there's also the Ethiopian and Greek alphabets that are used in their respective regions. There's also a ton of latin-based alphabets, which introduces many more characters than are currently used in the DNS system. (Including characters that look a lot like existing characters!)

And then you have the problem of alphabets used only by very small groups, such as Cherokee (Oh, I'm going to get flamed!). There are very few people who can write in Cherokee, but does that mean that the Cherokee language shouldn't be part of the DNS system?

Now, can you see why this is a mess?
Re:Changing a system by Znork · 2006-11-21 06:13 · Score: 3, Insightful

"It will break the very idea of the Internet as a common when URLs can't even be typed in on all keyboards"

You know, when one sees comments like that, it's not strange that non-7bit ascii countries find themselves rather exasperated with the rate of progress. If you take a few seconds to actually research the issue you'll find both a suggestive lack of multi-thousand key keyboards, as well as a whole host of solutions to that problem.

I mean, I can cut'n'paste chinese and japanese into vi, save the file with a unicode filename, and it'll just work. Earlier valid technical reasons are gone, everyone else has solved this; now the excuses start sounding really hollow.

It's time to drag DNS kicking and screaming out of the dark ages.
Re:Changing a system by ericlondaits · 2006-11-21 06:25 · Score: 2, Informative

If you are spanish-speaking (which was my example) not knowing how to place accents is not an excuse. They're a fundamental part of the language, unlike in english where they're only required for foreign words written in their original form.

In Argentina some people have keyboards with the Spanish language layout (that is, with extra letters) and some learn the ASCII codes and use the ALT key (along with the code typed in the Numpad) to place accents and the letters Ñ and ñ (which are mandatory as well and can't be replaced by N or n... especially when Año means "year" and Ano means "Anus").

I know of many people that know how to place accents and are just lazy... but I consider that a sign of poor spelling as well, since the best spellers I know use all accents and get a bit of pain every time they find an omission (which normally changes the meaning of the word, makes fluent reading a bit more difficult, and it's just ugly).

--
As a Slashdot discussion grows longer, the probability of an analogy involving cars approaches one.
Re:Changing a system by jacksonj04 · 2006-11-21 07:23 · Score: 2, Insightful

So when foreign stores open their UK or US branches, complete with accented characters, how in God's name are UK and US citizens going to suddenly learn how to type accented vowels, umlauts etc?

--
How many people can read hex if only you and dead people can read hex?
Re:Changing a system by dcam · 2006-11-21 10:44 · Score: 3, Insightful

You mean: It can only be better for me, in the long run, if we all end up using my alphabet.

--
meh
Re:Changing a system by cortana · 2006-11-21 11:15 · Score: 3, Informative

It depends on your operating system. The "standard" way is to hold Ctrl+Shift and then type the hexadecimal representation of the unicode code point that you want, but that conflicts with a lot of keyboard shortcuts that people use and so implementors often alter it a bit (for example, with GTK+ you press Ctrl+Shift+U and then type the code point).

If your keyboard has a compose key then you can often compose a glyph from two similar looking glyphs. For example, for an o with an umlaut, " o -> ö (though I expect Slashdot will filter that character out).

Macintosh users have an Option key that they can use to make weird glyphs (option-5 for the infinity symbol, option-g for the copyright symbol, etc). On most operating systems, various other combinations of the Ctrl/Shift/Meta/Alt/AltGr modifier keys and regular keys will allow you to type more glyphs. Most desktop environments also have on-screen keyboard type programs that ease experimentation in this area.

Users of complex (e.g, Asian) scripts have a host of input methods to choose from and configure.

Finally, if all else fails, create a text file full of your favourite non-ASCII characters and resort to the tried and tested method of copying and pasting! :)
Re:Changing a system by pablo.cl · 2006-11-21 13:14 · Score: 2, Informative

http://hualañé.cl
Re:Changing a system by joto · 2006-11-22 02:32 · Score: 2, Insightful
> How do I type in a character for a domain name that isn't supported on my keyboard?

Why do you need to type in a character for a domain name written in a language you don't even understand the alphabet of, and certainly can't read or write?

> No matter what you do, I'm still limited to the keys on my keyboard. I think that's ~104 by last count. But I certainly don't use that many characters.

And what do you think a Chinese keyboard looks like? Do you think they have hundreds of thousands of keys? There are three reasons why you can't enter Chinese characters with your keyboard, and none of them has to do with hardware:
1. You don't know Chinese
2. Your computer software may lack an input method for Chinese text
3. Even if you knew Chinese, and your computer had an input method for Chinese text, you would still need to learn to type with it

> I admit that there are some people who are going to bitch about the internet being english.

Yes, you are one of them. The people who want non-Latin characters do not want them because they want to communicate with other English-speaking people on the Internet. They want them because they want to communicate among themselves, in their own native language. Imagine that you only had Hebrew letters available for domain names in the US. The Hebrew alphabet is relatively easy to learn, and most English words can be written in it. But it's cumbersome for English-speaking people to communicate with the Hebrew alphabet. And that's why people speaking languages other than English want to be able to write their domain names in alphabets other than the English one.

> But does that give me a right to bitch about classical music being defined in French and Italian terms like a fugue, sonata, adagio, allegro... I think not. In the past 400 years we've all managed very nicely to adapt to these terms in order to converse with each other on a common basis.
> Perhaps there are some terms that these anglicans can adopt from the middle east besides Jihad?

Sorry, you are not making sense.

What? by Aladrin · 2006-11-21 04:06 · Score: 5, Funny

Wait, so it's not tubes... It's a 15 story building?

Anyone else getting more lost every day?

--
"If you make people think they're thinking, they'll love you; But if you really make them think, they'll hate you." - DM

Re:What? by rubycodez · 2006-11-21 04:17 · Score: 2, Funny

those that live in 15 story buildings made of glass tubes should not throw brick laptop power supplies.
Re:What? by jmyers · 2006-11-21 04:26 · Score: 2, Funny

No, there are only 15 stories about the internet that are just retold with slight modifications. One is about tubes, one about bricks, etc, etc, etc...
Re:What? by morgan_greywolf · 2006-11-21 04:30 · Score: 2, Funny

It's a 15-story building made of tubes and supported by a brick basement, on a flatbed truck headed down the information superhighway.

--
My blog

not the whole internet! by syrinx · 2006-11-21 04:07 · Score: 5, Funny

It won't break the whole Internet! Just DNS. DNS is overrated anyway. Now if you'll excuse me, I need to finish reading all the new posts on 66.35.250.150.

--
Quidquid latine dictum sit, altum sonatur.

Yes and No by Aadain2001 · 2006-11-21 04:08 · Score: 4, Insightful

Yes, countries that use non-English characters should be able to interact with the rest of the world using their natural language. No, they shouldn't rush the change and risk a possible crash of a large portion of the Internet. Be patient, young padawans: soon you will be able to have DNS names with any character you can think of, but it will be reliable and actually work.

--
Space for rent, inquire within

Break the whole Internet? by GBWisc · 2006-11-21 04:09 · Score: 2, Funny

Plans to fast-track the introduction of non-English characters in website domain names could 'break the whole internet', warns ICANN chief executive Paul Twomey

Luckily for us, GWB knows that we have some redundancy with the Internets, so if one breaks we can just use another.

That would be a good reason to get the UN in by aadvancedGIR · 2006-11-21 04:11 · Score: 2, Funny

The ICANN tries to give a technical reason for a political problem; although this reason may be valid, it is not a very good idea. With the UN, it will be handled by international committees and we will all be long dead before they finally agree on which country will be on that committee.

Late in coming? by grasshoppa · 2006-11-21 04:11 · Score: 2, Insightful

Perhaps, but I can't fault ICANN for this one, as much as I might like to. Like it or not, most internet technologies have their roots in latin speaking countries, which means systems developed there may not be tweaked to work with outside language schemes.

If the fault lies with anyone, it's with the individual contributors of the tech. Or better, with the non-latin countries' apparent lack of interest in some of the core projects needed to push this through ICANN (specifically DNS, httpd).

--
Mod me down with all of your hatred and your journey towards the dark side will be complete!

When you've built on a foundation of straw- by Bonker · 2006-11-21 04:13 · Score: 4, Insightful

- Don't be too surprised when people around you start building their own houses rather than choosing to pay rent.

DNS upheaval has been a long time coming, and the current anti-American sentiment worldwide isn't exactly helping to stabilize it. We're already seeing all sorts of ad hoc routing setups that deal with shortcomings of an ameri-centric DNS. My guess is that within the next few years, ICANN's 'control' of the internet will be in name only as everyone else in the world will have moved on to alternative routing and domain systems.

--
The next Slashdot story will be ready soon, but subscribers can beat the rush and slashdot the links early!

Re:When you've built on a foundation of straw- by t0tAl_mElTd0wN · 2006-11-21 04:27 · Score: 3, Insightful

I think that might be jumping the gun. American or not, the internet plays a huge role in the functionality of the modern world. Just imagine the chaos if international office networks went from "I can't open this word document you sent me because it's in a different format" to "I can't get email from you because you're on a different internet". American DNS control or not, decentralizing the internet like you suggest might happen could be one of the worst things that could happen for global communications.

--
Not So Random
Re:When you've built on a foundation of straw- by benoitg · 2006-11-21 05:08 · Score: 3, Insightful

Please, there have been complaints about DNS not supporting most languages' (even latin) character sets since the birth of the web, so it's completely untrue that we waited till everything was built. After well over a decade of patient waiting, it seems that actual pressure was required to get this change through.

Stupid question by VENONA · 2006-11-21 04:13 · Score: 3, Insightful

"Given that some societies have used non-Latin characters for thousands of years, is this a bit late in coming?"

No.

Zonk either knows zero about the histories of the Internet or DNS, or is so enamored of finishing stories with questions that he'll tack on the truly ridiculous.

--
What you do with a computer does not constitute the whole of computing.

Watch out for attacks by Agelmar · 2006-11-21 04:16 · Score: 5, Insightful

For all you people saying "There's no problem, just do it" - I say watch out... there will be a rush of attacks and spoofs as soon as this is opened up. The letter "a" appears in the Unicode character set multiple times, and some of the variants are almost indistinguishable. I'm not just talking about someone registering släshdot.org, I'm talking about someone registering slashdot.org (the a is FF41 instead of the normal a). Good luck telling the attacks apart from the real sites.
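A quick Python sketch of how a program could see what the eye cannot, shown as an illustration rather than a real defense. It assumes two spoofs: one using the FF41 fullwidth a, and one using the Cyrillic a (U+0430), which is my own added example.

```python
import unicodedata

real = "slashdot.org"
fullwidth = "sl\uff41shdot.org"  # FULLWIDTH LATIN SMALL LETTER A
cyrillic = "sl\u0430shdot.org"   # CYRILLIC SMALL LETTER A (assumed second spoof)

for spoof in (fullwidth, cyrillic):
    # List every non-ASCII character together with its official Unicode name.
    suspects = [(hex(ord(ch)), unicodedata.name(ch)) for ch in spoof if ord(ch) > 127]
    print(spoof == real, suspects)

# Compatibility normalization (NFKC) folds the fullwidth form back to ASCII,
# but it does nothing for the Cyrillic letter, which is a distinct character.
print(unicodedata.normalize("NFKC", fullwidth) == real)
print(unicodedata.normalize("NFKC", cyrillic) == real)
```

So normalization alone catches some lookalikes but not the cross-script ones.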

Re:Watch out for attacks by gsasha · 2006-11-21 04:24 · Score: 5, Informative

It's called a "Homograph Attack". See http://en.wikipedia.org/wiki/IDN_homograph_attack
Re:Watch out for attacks by tonigonenstein · 2006-11-21 04:46 · Score: 2, Insightful

As a human you might be fooled, but a well designed browser could tell the difference and alert you. So this shouldn't be a problem.

--
The sooner you fall behind, the more time you have to catch up.

Sure, go 'head by kahei · 2006-11-21 04:16 · Score: 4, Insightful

I'd be in favor of the change just because anything that undermines the Unix Tower of Babel -- the dependency on ASCII which complicates text handling sooooo much even when Windows solved the problem soooo long ago -- is good. Even Java gets it. Even Apple (finally) get it. Unix Is Teh Problem.

And the ASCII problem isn't just bad because it forces people to use inefficient encodings like UTF-8 (THREE bytes per character?) It's bad because it allows people to write code like:

if(string[index] == '.' || string[index] == '?' || string[index] == '!') sentenceEnd = true;

(a line repeated, with subtle variations, several hundred times in the code of a certain ubiquitous editor).

And, lo and behold, the above does not work, but once it appears in a few thousand places it's impossible to fix, and a vast towering structure of fixes made by people who don't really understand why it's an issue is built.
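To show why that line breaks, here is a small Python sketch of the same idea (Python rather than the editor's own language, and every name in it is made up for illustration). The byte-wise check looks only at the last byte; the text-wise check works on decoded characters and, as an assumption, treats the CJK full stop, question mark and exclamation mark as sentence enders too.

```python
ASCII_ENDERS = b".?!"

def ends_sentence_bytes(data: bytes) -> bool:
    """Byte-array thinking: inspect the final byte only."""
    return bool(data) and data[-1] in ASCII_ENDERS

# ASCII enders plus ideographic full stop and fullwidth ? and ! (assumed set).
TEXT_ENDERS = {".", "?", "!", "\u3002", "\uff1f", "\uff01"}

def ends_sentence_text(text: str) -> bool:
    """Text thinking: inspect the final character after decoding."""
    return bool(text) and text[-1] in TEXT_ENDERS

english = "It works."
japanese = "\u52d5\u304d\u307e\u3059\u3002"  # Japanese text ending in U+3002

print(ends_sentence_bytes(english.encode("utf-8")), ends_sentence_text(english))
print(ends_sentence_bytes(japanese.encode("utf-8")), ends_sentence_text(japanese))

# The ideographic full stop takes three bytes in UTF-8, none of them '.'.
print(len("\u3002".encode("utf-8")))
```

Even the text version is only as good as its list of enders, which is rather the point about how hard this is to patch after the fact.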

So, even though the proposed change would be hugely inconvenient for a huge number of people, I'm in favor, because I want the world to grow the fork up and understand that text != byte array some time while I'm still alive.

--
Whence? Hence. Whither? Thither.

Can't trust your browser's address bar anymore. by Anonymous Coward · 2006-11-21 04:17 · Score: 2, Interesting

Unicode has many characters that look almost exactly like characters in Latin-1.

For example, if "www.microsoft.com" is shown in your browser's address bar, how would you know for sure that the "c" is not from the Cyrillic alphabet, or the "o" is not from the Greek alphabet?

You simply won't be able to trust your browser's address bar anymore. The possibilities for phishing attacks are endless.

Re:Can't trust your browser's address bar anymore. by NoMoreNicksLeft · 2006-11-21 04:27 · Score: 4, Insightful

Why not have the browser fail to render them outside of the user's preferred alphabet?

Cyrillic users would see www.**c******.com, latin users would see www.mi*rosoft.com?

Or better yet, put up a big warning that it's using mixed alphabets?
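A minimal sketch of such a mixed-alphabet check in Python. This is a crude heuristic of my own, not what any browser does: it guesses the script from the first word of each character's Unicode name, and `scripts_in` and `is_mixed` are hypothetical names.

```python
import unicodedata

def scripts_in(label: str) -> set:
    """Crude script guess: first word of each letter's Unicode name."""
    found = set()
    for ch in label:
        if ch.isalpha():
            found.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return found

def is_mixed(hostname: str) -> bool:
    """True if any dot-separated label mixes letters from several scripts."""
    return any(len(scripts_in(label)) > 1 for label in hostname.split("."))

honest = "www.microsoft.com"
spoofed = "www.mi\u0441rosoft.com"  # Cyrillic ES standing in for Latin c

print(is_mixed(honest), is_mixed(spoofed))
print(scripts_in("mi\u0441rosoft"))
```

A real check would need proper script data, since legitimate Japanese names mix kanji and kana and would trip this heuristic, and an all-Cyrillic lookalike would pass it.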
Re:Can't trust your browser's address bar anymore. by reed · 2006-11-21 04:42 · Score: 2, Insightful

> Or better yet, put up a big warning that it's using mixed alphabets?

In general, browsers ought to make users more aware of the parts of their current URL, and maybe also of link destinations (also mail client).

For example, separate the URL into its parts (scheme, host, path). Display some of the WHOIS info below the hostname, and some info from the SSL certificate if it has one.
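The splitting part is easy enough; a sketch with Python's standard library, using a made-up URL. The WHOIS and certificate lookups are left out, and `describe` is a hypothetical helper name.

```python
from urllib.parse import urlsplit

def describe(url: str) -> dict:
    """Hypothetical helper: break a URL into the parts a browser could
    display separately, so the real host stands out."""
    parts = urlsplit(url)
    return {"scheme": parts.scheme, "host": parts.hostname, "path": parts.path}

# Made-up phishing-style URL: the familiar name is only a prefix of the host.
info = describe("https://www.example.com.evil.example/login")
for key, value in info.items():
    print(f"{key:>6}: {value}")
```

Shown this way, the user sees the whole hostname on its own line instead of having to pick it out of the address.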

This would help people spot phishing scams or other suspicious activity.

Reed
Re:Can't trust your browser's address bar anymore. by Srin+Tuar · 2006-11-21 05:14 · Score: 2, Insightful

That's a good start.

Registrars shouldn't accept such names in the first place though: Is there a valid reason to ever have a domain name with stray characters mixed in from different languages?

If a standard were to specify that a domain name must use a subset of Unicode that is self-consistent, and that browsers should turn the address bar red as a warning any time a domain uses characters not in the user's selected languages' subsets, that would go a long way towards minimizing the phishing problem.

There would still be issues between users of the same orthography, but in general there is no way to prevent phishing style attacks completely, since they fundamentally rely upon people being careless. Even the current DNS system is vulnerable:
spoofing "cnn.com" with "cnn-news.com" or "cnn.newsnetwork.com" doesn't need i18n support to work at all.
Re:Can't trust your browser's address bar anymore. by Bogtha · 2006-11-21 09:48 · Score: 2, Informative

> Is there a valid reason to ever have a domain name with stray characters mixed in from different languages?

You're assuming that characters belong exclusively to one language. Try telling a French guy that he can't register café.com because 'c' 'a' and 'f' are English, not French.

--
Bogtha Bogtha Bogtha

URL goldmine. by emmagsachs · 2006-11-21 04:19 · Score: 4, Insightful

Imagine the land rush that'll ensue if DNS will allow non-Latin characters. Trademark transliteration ? A heaven for domainsquatters and an upcoming surge of legal fees for trademark lawyers, if you ask me.

Nice for localising, sure, but how usable will Japanese, Indian, or Arabic script URLs -- for example -- be for those who do not have access to the respective sets or keyboard layouts?

compounding one mistake with another by tverbeek · 2006-11-21 04:19 · Score: 2, Insightful

Of course it's late in coming.

But that doesn't mean it should be done hastily and badly.

--
http://alternatives.rzero.com/

English, not latin languages by pubjames · 2006-11-21 04:23 · Score: 2, Insightful

Given that some societies have used non-Latin characters for thousands of years, is this a bit late in coming?

Let's be clear. The domain name system only uses English characters. There are lots of languages in Europe (Italian, Spanish, French...) which are closer to latin than English (which isn't really a latin language at all) which are not currently represented, because you can't use accents in domain names, or other letters such as the spanish Enye (n with a squiggle, actually a distinct letter). English speakers often think accents aren't important but they can completely change a word's meaning.

Re:English, not latin languages by brusk · 2006-11-21 04:36 · Score: 3, Insightful

True, but the English subset of the alphabet has another feature that matters in this regard: it's a lowest common denominator that all computers on the planet are capable of producing. I can type any letter easily on a computer in China, Israel, Jordan, Russia, Spain, India, etc. I can't necessarily input a given Chinese character, Arabic letter, or Cyrillic letter.

Why does this matter? Well, one argument is that it doesn't, much: if I want to view a Chinese website I'm probably in China and can input Chinese characters on my computer. But what about a Chinese person visiting an English-speaking country and surfing at a public computer (e.g. in a web cafe)? If the computer isn't set up for input of Chinese, he/she won't be able to view certain sites if they can only be accessed by inputting a non-latin URI. Thus to serve all possible customers, the computer would need dozens of input systems installed. That simply isn't going to happen. The alternative of just inputting Unicode codes is unworkable.

Hence it makes more sense to have a requirement that any non-Latin DNS registration ALSO be accompanied by a pure ASCII one, so that any computer will be able to access it. This also helps people who don't know a given language very well: if you don't know Chinese well, and are just learning it, you may find it hard to type in a web address with unfamiliar characters, even if your computer has Chinese input enabled. That shouldn't keep you from visiting a site.

In fact, there are some Chinese systems that do this, by creating a registry of Chinese names for websites. But they involve kludgy workarounds like browser bars that are not universal and are otherwise evil.

--
.sig withheld by request

Not a trivial job by turnipsatemybaby · 2006-11-21 04:24 · Score: 4, Insightful

The internet was originally conceived, designed, and implemented in the USA at a time when hardware was at a premium, and corners were cut to conserve that limited resource. DNS was just one of the results of that era. However, it is the most visible because it is the front-end means for people to find each other. That means there is now a very well established standard, used by people across the entire globe, that is very difficult to change.

Changing all the DNS servers in the world to switch from ASCII to Unicode is NOT trivial. The fact that some societies have used non-latin characters for thousands of years is completely and utterly irrelevant. THEY didn't make the internet. They simply bolted themselves on to an existing infrastructure.

I agree that progress needs to be made to accommodate non-Latin characters, but to have people whining about "how they want it, and want it now"... That's just ridiculous. It's like waltzing into a house that was built 40 years ago and having a tantrum because the stairs are too steep and the house is too squished. Major structural renovations take time, effort, and careful planning. And there is nothing you can do to avoid that, short of implementing cheap stop-gap measures that are virtually guaranteed to cause even bigger unintended headaches later on.

thousands of years? by sexyrexy · 2006-11-21 04:27 · Score: 3, Insightful

> Given that some societies have used non-Latin characters for thousands of years, is this a bit late in coming?

Those societies did not build an entire economic and social infrastructure using all 50,000 of those characters in a few decades, though.

--

Rex is 09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0

Better idea! by EvilRyry · 2006-11-21 04:29 · Score: 2, Funny

How 'bout we all just speak English and forget about all those weird letters.

(It was a joke... well sort of)

Huh? by writermike · 2006-11-21 04:29 · Score: 4, Funny

ICANN Under Pressure Over Non-Latin Characters

You mean white people?

--
If Nalgene water bottles are outlawed, only outlaws will have Nalgene water bottles.

Base-Ten BIGOTRY, I say!!! by mosel-saar-ruwer · 2006-11-21 04:30 · Score: 3, Funny

Now if you'll excuse me, I need to finish reading all the new posts on 66.35.250.150.

Base-Ten CHAUVINIST!!!

What about societies that use Base 2 [binary], or Base 8 [octal], or Base 16 [hexadecimal]?

Or entire societies, like the British empire, which use no base at all?

12 inches in a foot. 3 feet in a yard. 1760 yards in a mile...

60 seconds in a minute. 60 minutes in an hour. 24 hours in a day. 7 days in a week. 52 weeks in a year [give or take]...

Or how about base 12?

12 keys in a chromatic scale: A 440, then, logarithmically [give or take a little well-tempering]: A#, B, B# == C [kinda sorta], C#, D, D#, E, E# == F [kinda sorta], F#, G, G#, and finally A 880.

Except that on the continent, things are often just a little sharper - say A 443/444/445 & A 886/888/890...

And let's not even get into water freezing & boiling at 32 & 212 versus 0 & 100...

Use a simple eight dot three kludge by tempest69 · 2006-11-21 04:34 · Score: 2, Interesting

Set up a private Latin name prefix for the non-Latin names, i.e. NONLATINPREFIX, followed by a UUEncode of the non-Latin name. E.g. (Arabic word for horse in Arabic script) = AER5ER8EDG, so you would have NONLATINPREFIX-AER5ER8EDG.com as a domain name, which would resolve correctly if someone typed in (Arabic word for horse in Arabic script).

1. This allows for a simple web extension to serve non-Latin countries.

2. Doesn't require any change to the DNS system (other than some name policy changes).

3. Allows links to be embedded in normal web pages so that they can be cut and pasted by anyone with Latin functionality. So a Japanese person could cut and paste the link to some Arabic site that they don't have the font for.

4. While this is a kludge it has some major advantages over rebuilding the DNS system.
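
A rough sketch of how such a prefix-plus-encoding scheme could look, in Python. The prefix string and both function names are made up for illustration, and Base32 stands in for UUEncode here because uuencoded output contains punctuation that is not legal in hostnames. This is only meant to show the shape of the idea, not any deployed scheme.

```python
import base64

PREFIX = "nonlatinprefix-"  # hypothetical marker, borrowed from the comment above


def to_ascii_label(name: str) -> str:
    """Encode a Unicode label as an ASCII-only DNS label (illustrative only)."""
    raw = name.encode("utf-8")
    # Base32 uses only A-Z and 2-7, all legal in hostnames; drop the '=' padding.
    body = base64.b32encode(raw).decode("ascii").rstrip("=").lower()
    return PREFIX + body


def from_ascii_label(label: str) -> str:
    """Reverse to_ascii_label; plain ASCII names pass through unchanged."""
    if not label.startswith(PREFIX):
        return label
    body = label[len(PREFIX):].upper()
    body += "=" * (-len(body) % 8)  # restore the padding Base32 expects
    return base64.b32decode(body).decode("utf-8")


horse = "\u062d\u0635\u0627\u0646"  # Arabic for "horse"
label = to_ascii_label(horse)
print(label + ".com")
print(from_ascii_label(label) == horse)
```

The encoded label grows quickly (8 output characters per 5 input bytes), and DNS labels are capped at 63 bytes, so a real scheme would want a denser encoding.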

Storm

DNS won't break by zdzichu · 2006-11-21 04:34 · Score: 2, Informative

DNS won't break. In fact, it already works! The thing is called IDN and is supported by all modern web browsers (including IE). Try for yourself - http://www.kozowski.pl (I hope Slashcode won't cannibalize letter "").

So DNS and Web is OK. Any breakage I can think of may appear in email systems or other domain-based forms of communication.

--
:wq

.cn by hey · 2006-11-21 04:45 · Score: 2, Interesting

Does ICANN control .cn (China)? Or other national TLDs? Why don't they just start registering domains in their local language? Leave .com, .org, .mil (i.e. the USA TLDs) English.

Pléåsé ñø by bugnuts · 2006-11-21 04:49 · Score: 2, Funny

Tht ìs thê £äst thïñg wë ñèêd

Dibs on ©óm

What's this going to do for security .. by rs232 · 2006-11-21 04:49 · Score: 3, Interesting

What's this going to do for security? Didn't we have phishing attacks recently that consisted of Unicode characters being inserted into e+bay.com, for instance, that didn't get displayed? The domain e+bay.com is different from ebay.com.

"A domain name is a unique address that allows people to access a website, for example, smh.com.au"

No, a domain name is a sequence of characters mapped to an IP address. It was designed so that you wouldn't have to remember 66.35.250.150 instead of slashdot.org. This wasn't a problem while the original Internet consisted of just four computers. DNS was never designed to provide identity. There was also the case of a stock trader hacking a DNS server and redirecting traffic from a legitimate financial site to his own, where he had duplicated the real site only with bogus information.

"He said that this could create problems where, for example, a character in Urdu looks identical to one in Arabic"

It sure could. How about totally replacing DNS with a system of online identities?

--
davecb5620@gmail.com

Um... why? by Colin+Smith · 2006-11-21 04:50 · Score: 2, Informative

"Yes, countries that use non-English characters should be able to interact with the rest of the world using their natural language."

Why... No really. You speak as if this is a good thing. Why should they be able to use their natural language rather than English? Why shouldn't they be restricted to a limited area of local language speaking people?

The reason the Internet is useful is because everyone speaks TCP/IP. Incompatible protocols are to be actively discouraged because they balkanise the network. Language is exactly the same. The reason the Internet is useful is because everyone speaks English, the more divided it becomes the less useful it becomes.

Languages are anachronisms, the only reason we have more than one is the physical distance between locations and difficulty travelling allowed them to evolve independently. Well that isn't the world we live in any more and the different languages actually make communication far more difficult now. They're no longer beneficial. So get rid of them, insist on a common language. The most popular happens to be English at the moment. I could live with Spanish, but for those of you about to suggest Chinese, read this before deciding: http://www.pinyin.info/readings/texts/moser.html

We should be using this opportunity to actively get rid of languages.

--
Deleted

Re:Um... why? by CRCulver · 2006-11-21 05:08 · Score: 4, Interesting

> Languages are anachronisms, the only reason we have more than one is the physical distance between locations and difficulty travelling allowed them to evolve independently.

So why does every language have strata of slang and jargon that may well be incomprehensible to outsiders? In south-east England, a fairly small area, one has a wide range of speech depending on economic status and social circle. If one has a few people speaking a common language, it won't stay uniform for long, even if everyone's still in the same place.

> So get rid of them, insist on a common language.

Sure, and why don't we just all wear the same clothes, just because different styles or colours can be taken too seriously (on gang turf, for example)? And let's all eat the same food, no need for various cuisines when flavourless mush can keep us alive.

Languages make the world more interesting. I enjoy very much traveling about and seeing how the locals communicate, the phonological inventory and morphological quirks they employ, the different judgements on eloquent speech they hold. If all this disappeared, it would be very dull.

And your claim that languages are "too difficult" is a peculiar opinion of some in first world nations. The vast majority of human beings are multilingual, see e.g. Edwards, John. Multilingualism (London: Penguin, 1994). It should only take a person a couple of weeks to achieve a basic conversational level in a foreign language, which can easily be done before each time you set off on vacation. I've never had a problem learning enough of the language to talk with the locals about their culture and mine, and I think my language skills are actually fairly humdrum in comparison to a lot of people I've met.

And if all national tongues disappear in favour of some world language imposed by fiat, what would happen to all the literature written in them? Poetry translates infamously poorly. People have spent millennia composing art in words, one of the skills that makes us the unique species we are. Are we to throw all of those great monuments away?

Re:Um... why? by CRCulver · 2006-11-21 05:48 · Score: 3, Interesting

> Strawman, neither of the examples are communication protocols which benefit from the network effect. Language is.

Language may be employed in various ways. Not only to communicate, but also to obfuscate (as some Roma do with their use of Romani) or to explore new possibilities of form (conlangers, bits of Sandor Weores and James Joyce).

> People make the world more interesting. It's nice to be able to talk to them.

People aren't solitary individuals, they belong to larger societies that shape them. Understanding his language is part of understanding a person.

> Nope. Spanish, Italian, German or other romanic or germanic language I could probably pick up as required. Chinese is apparently particularly difficult.

Chinese's difficulty is mainly at the level of official orthography. I studied Chinese at Defense Language Institute while in the Navy, where we concentrated only on the spoken language and learnt but a few characters, and after the first two months I no longer felt any barriers. Granted, I occasionally had to ask a person to explain what they meant, but still in Chinese of course, and I employed many circumlocutions, but it's not hard at all to learn enough Chinese to talk to Chinese people about themselves and their culture.

> It would be consigned to academia, where all dead languages go.

The Finno-Ugrian minorities of Russia, which are my chief object of study now, do not want their languages and literature "consigned to academia". They want their works preserved, they desperately seek more funding of publication (and an end to local government censorship), and they experience great pain over the monolingual policies of the Russian state--most of the Mari men of letters, for example, were murdered under Stalin. Are you to tell those suffering peoples to "just get over it"? One finds in Russia that the locals who did "get over" the loss of their language also have higher rates of suicide, alcoholism, and existential crisis, while those who are fighting to preserve their language and feel a connection to the past have a much more positive outlook.

Horrible indeed by unity100 · 2006-11-21 04:55 · Score: 2, Interesting

I'm in a country that sits between Europe and the Middle East. We have a few non-Latin characters in our alphabet, and they still create problems when conferring domain names.

No wonder the Middle Eastern (Arabic) countries especially want this: the majority of inexperienced internet users there will be more likely to use these domain names, which gives those governments a greater incentive to control what users see, because these domains will be under their national control.

Not only this, but we as IT people will be very unwilling to change all our software to adapt to the new situation, because of the horrible development/testing/implementation involved, and hence won't accept these domains as valid in our network traffic. That will create a second internet which is, as described above, less free.

This should not be allowed.

--
Read radical news here

Internet Layers by 0xABADC0DA · 2006-11-21 04:55 · Score: 2, Funny

1. Physical
2. DataLink
...
6. Presentation
7. Application
8. Tubes
9. Bricks
10. Porn
11. Google
12. YouTube
13. ??
...
16. Profit

It was hard enough remembering them all back when there were only 7.

you couldnt be more wrong by Srin+Tuar · 2006-11-21 05:02 · Score: 5, Informative

> much even when Windows solved the problem soooo long ago

i18n on Windows is far from "solved".
I do admit that MS had a huge benefit when they started pushing Unicode.
(It takes a company with Microsoft's level of clout to push around national governments.)

> And the ASCII problem isn't just bad because it forces people to use inefficient encodings like UTF-8 (THREE bytes per character?)

Perhaps you don't realize that UTF-8 is moving on to become the most dominant character encoding,
and the legacy cruft such as UTF-16 (designed to deal with design flaws in Windows) is being phased out.

Even languages that would end up as mostly 3 byte characters tend to benefit from the savings on single byte
characters for control and formatting markup.

I'm not going to harp on about it, but a few basic web searches could enlighten you here.

if(string[index] == '.' || string[index] == '?' || string[index] == '!') sentenceEnd = true;

Code like that *works* in UTF-8, which is one of the things that makes it beautiful (among many others).

It allows you to deal with world characters sets when it matters, and allows you to ignore them when it does not.
(for example, a lexical analyzer that specifies its tokens does not want to support punctuation from every language ever conceived)
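
To make the point concrete, here is a minimal Python sketch of the same byte-level check run over UTF-8 data. The function name and the sample string are invented for illustration. It works because every byte of a multi-byte UTF-8 sequence has its high bit set, so a byte equal to '.', '?' or '!' can only ever be that ASCII character.

```python
SENTENCE_END = b".?!"


def sentence_end_offsets(data: bytes) -> list:
    """Return byte offsets of '.', '?' or '!' in UTF-8 encoded text."""
    # Bytes inside multi-byte UTF-8 sequences are all >= 0x80, so comparing
    # raw bytes against ASCII punctuation never gives a false match.
    return [i for i, byte in enumerate(data) if byte in SENTENCE_END]


sample = "Привет. 你好!".encode("utf-8")
print(sentence_end_offsets(sample))  # [12, 20]
```

The scan never decodes a single character, which is why older byte-oriented code tends to keep working on UTF-8 input.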

And if you think code like that doesn't exist in the Windows world, you are sadly quite naive.
In my experience internationalizing applications, it's typically far easier to update unix applications, which
on occasion need nearly no changes at all, compared to the laborious grind and near total re-write often needed
for ms-windows applications.

Bad for phishing by AaronW · 2006-11-21 05:14 · Score: 2, Interesting

Adding Unicode to DNS names would make phishing much more difficult to detect unless all the browsers, email clients and other tools are modified to indicate that a URL may not be what the user thinks it is. It is bad enough as it is, and remember, most Internet users are not as savvy as those of us on Slashdot. I foresee a lot of security implications in adding this.
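
One kind of indicator a client could show is a warning when a single label mixes scripts. The Python below is only a crude sketch of that idea: both function names are made up, and guessing the script from the first word of each character's Unicode name is an assumption that works for simple cases but is not how any particular browser does it.

```python
import unicodedata


def scripts_in(label: str) -> set:
    """Crude script guess: first word of each letter's Unicode character name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }


def looks_suspicious(label: str) -> bool:
    """Flag labels whose letters come from more than one script."""
    return len(scripts_in(label)) > 1


print(looks_suspicious("paypal"))        # all Latin letters
print(looks_suspicious("p\u0430ypal"))   # second letter is Cyrillic U+0430
```

A rule this blunt would also flag legitimate names, for example Japanese labels that mix hiragana, katakana and kanji, so a real check would need per-language exceptions.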

--
This post is encrypted twice with ROT-13. Documenting or attempting to crack this encryption is illegal.

FUD ... just implement IDN everywhere by jhermans · 2006-11-21 05:42 · Score: 2, Informative

see http://en.wikipedia.org/wiki/Internationalized_domain_name

IDN is backwards compatible with existing DNS-servers, and has been in use for several years. Mozilla, Firefox, Safari and Opera support it. So does Internet Exploder 7.

The GNS System? by Kadin2048 · 2006-11-21 06:04 · Score: 5, Interesting

Kind of an interesting point. Maybe we should just let Google run the DNS system, and just replace it with a giant search engine. If we make actually typing in a web address hard enough, then that's what we're effectively doing anyway: people will just start typing everything (including the domain name of sites they want to go to) into the Google Search box at the top of their browser window, instead of the actual address bar.

Actually, DNS arguably is a giant search engine, which simply works on a 1:1 relationship and uses a distributed database (you input one piece of information, and it gives you some corresponding piece of information back). Replacing it with a 'fuzzier' search engine that would give you back a number of results, ranked by relevance, isn't that huge a leap.
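
The difference between the two kinds of lookup can be sketched in a few lines of Python. The table below is a toy stand-in for DNS: the slashdot.org address is the one quoted elsewhere in this discussion, the other entries are hypothetical, and using `difflib` for the "fuzzy" side is just a convenient assumption, not a claim about how a search engine ranks anything.

```python
import difflib

ZONE = {
    "slashdot.org": "66.35.250.150",  # address quoted elsewhere in this thread
    "example.com": "192.0.2.10",      # hypothetical entry
    "example.net": "192.0.2.20",      # hypothetical entry
}


def exact_lookup(name: str):
    """DNS-style lookup: one key in, one value (or nothing) out."""
    return ZONE.get(name.lower())


def fuzzy_lookup(name: str, limit: int = 3) -> list:
    """Search-style lookup: a ranked list of near matches."""
    close = difflib.get_close_matches(name.lower(), list(ZONE), n=limit, cutoff=0.6)
    return [(candidate, ZONE[candidate]) for candidate in close]


print(exact_lookup("slashdt.org"))   # a typo finds nothing
print(fuzzy_lookup("slashdt.org"))   # a typo still finds candidates
```

The fuzzy version is friendlier to typos, but it is also exactly the behaviour a phisher would like to exploit, which is part of why the exact 1:1 mapping matters.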

--
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."

Re:Why not? by Agelmar · 2006-11-21 08:07 · Score: 2, Informative

What do you mean by "if the unicode of the URL does not match the default unicode of the browser"? The point of unicode is that it is uniform - there's only one. It is broken up into sections, and perhaps that's what you meant to say, but even that won't work.

Let's take Japanese as an example, and I will give you two reasons why it won't work.

Perhaps if you assume I am Japanese, you will assume that my "default unicode section" is the section containing the Japanese characters. So this works fine if I go to URLs that use hiragana / katakana / kanji, but what if I go to www.google.com? Or www.washingtonpost.com? Or www.citibank.com? (Yes, there are Citi offices in Japan). Are you going to throw up a phishing warning simply because I'm browsing an international site? Because if you do that, you're going to make people so used to seeing those warnings that they will just ignore them and/or turn them off.

Even if your method did work, however, this would still be easy to get around. The original 256 characters are repeated many times, and it just so happens that in the full-width forms (in the CJK sections) they are repeated again. I.e. I can use the letters a-z while still staying within the Japanese section of Unicode, and although these letters are the same visually, they are a different character in the Unicode charset, so you could easily have www.google.com and www.google.com registered entirely in the first 256 characters of Unicode or entirely in the full-width form section of Unicode, and there would be no discrepancy whatsoever.
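
The full-width lookalikes are easy to reproduce. The Python sketch below builds them by hand (the helper name is invented; the full-width forms of printable ASCII sit at a fixed offset, U+FF01 to U+FF5E) and then shows one caveat worth knowing: Unicode compatibility normalization (NFKC) folds full-width letters back to plain ASCII, but it does nothing for lookalikes borrowed from another alphabet such as Cyrillic.

```python
import unicodedata


def to_fullwidth(text: str) -> str:
    """Map printable ASCII to its full-width twin (offset 0xFEE0)."""
    return "".join(chr(ord(c) + 0xFEE0) if "!" <= c <= "~" else c for c in text)


plain = "www.google.com"
wide = to_fullwidth(plain)
cyrillic = "www.g\u043e\u043egle.com"  # both o's replaced by Cyrillic U+043E

print(wide == plain)                                    # different code points
print(unicodedata.normalize("NFKC", wide) == plain)     # folded back to ASCII
print(unicodedata.normalize("NFKC", cyrillic) == plain) # not folded
```

So a registry or client that normalizes names before comparing them can neutralize the full-width trick, while cross-script homographs need a separate defence.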

The problem is a lot more complicated than you make it out to be.

I think the whole idea is a mistake by msobkow · 2006-11-21 09:17 · Score: 4, Insightful

This means changing the fundamental DNS, which is a programmer's and administrator's tool, not an advertising medium. It is founded, like programming languages, on a fundamental 7-bit ASCII character set, and is not intended to be used for NLS text.

A far better solution is some form of VDNS that translates NLS text names into the proper domain name at the system level. That also allows the same domain to have multiple language translations to reflect localized product and service names.

We seriously need to kick the general political community in the arse. They keep trying to impose technical decisions, and it fails as miserably as any corporate PHB's uninformed decisions. ASK the techies to propose solutions instead of shoving ill-conceived ideas down our throats.

For example -- once you mandate multibyte domains, you implicitly mandate multibyte URL components. Goodbye direct mapping of names to the directories, file systems, and servers.
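
For reference, this is how non-ASCII path components are usually carried in a URL today: the text is encoded as UTF-8 and each byte is percent-escaped, so what goes over the wire stays ASCII. A small Python illustration using the standard library (the path itself is a made-up example):

```python
from urllib.parse import quote, unquote

path = "/wiki/Bücher"          # hypothetical path with one non-ASCII letter
encoded = quote(path)          # UTF-8 bytes, percent-escaped; '/' is left alone

print(encoded)                 # /wiki/B%C3%BCcher
print(unquote(encoded) == path)
```

Whether the server then maps those bytes onto a directory or file name is still up to the server and its file system, which is the part the comment above is worried about.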

Bad idea. Very bad idea.

--
I do not fail; I succeed at finding out what does not work.

Internet != Web, and other IDN technical issues by billstewart · 2006-11-21 14:11 · Score: 2, Interesting

The Internet is not just the web - you might remember that there are other applications such as email, ftp, ssh, telnet, ping, traceroute, and some people use programs other than browsers to access these things.

The reason ICANN wants to do lots of testing (after having dragged their feet for years before getting started) is that IDNs fundamentally change how DNS works, and it's really important not to break too much when you do that (not that ICANN traditionally worried about that.) It's *not* simple, and you don't want to get it wrong.

DNS translates a set of strings of nominally-ascii characters into numbers, or translates numbers into a set of strings of characters, or translates some sets of strings into other sets of strings, depending on which query you run, and uses specific data formats to represent those strings and numbers. There are restrictions on what characters can be in the strings, some for reasons that we could easily declare to be obsolete (7-bit, uppercase-to-lowercase translation), some for reasons that are harder to change (printable characters only, please), and some which are really hard (dots are used as delimiters, and nulls terminate character strings in some popular computer languages). So you can't just plug in arbitrary Unicode two-byte characters instead of pairs of ASCII bytes and skip the case-munging, because some of the bytes will have values that can't be handled, though most of the 8-bit-character alphabets can be used transparently if you don't mind people using incorrect character sets on occasion. 8-bit character sets simply aren't enough - you can handle most Western languages in ISO-8859-1, and UTF-8 is closer but apparently not quite a cigar (too bad - it would have been my preference.)
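
The "bytes that can't be handled" problem is easy to see by looking at the raw encodings. This Python sketch (function name and sample text are invented) reports whether the encoded form of a string contains a NUL byte or a byte equal to the ASCII dot, the two values singled out above.

```python
HAZARDS = {0x00: "NUL", 0x2E: "dot"}


def hazard_bytes(text: str, encoding: str) -> list:
    """List which hazardous byte values appear in the encoded form of text."""
    data = text.encode(encoding)
    return sorted(HAZARDS[b] for b in set(data) if b in HAZARDS)


sample = "\u012ea"  # U+012E (I with ogonek) followed by a plain 'a'

print(hazard_bytes(sample, "utf-16-le"))  # two-byte units: 2E 01 61 00
print(hazard_bytes(sample, "utf-8"))      # C4 AE 61, no stray bytes
```

With two-byte characters the plain letter 'a' drags in a NUL and U+012E drags in a byte that looks like a dot, even though the text contains neither. UTF-8 avoids both, since all bytes of its multi-byte sequences are 0x80 or above.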

The main IDN strategies replace this by adding one more translation layer - character-string-set IDN names are translated into ugly-but-recognizable Punycode strings, which get used with standard DNS character-string-set to number translations in the forward direction, and in the reverse direction, anything that arrived as a Punycode xn--uglystuff string usually gets fed to a Punycode-to-Unicode translator by a user interface.

Some things can be fixed by recompiling (or relinking, or re-DLLing) all of your programs with a DNS resolver library that guesses whether to convert strings or not - forward DNS knows to punycode non-ASCII characters and not to re-punycode xn--uglystuff, though reverse DNS doesn't necessarily know whether to convert it to UTF-16 or UTF-8 or just pass it on directly, and if you've typed in a domain name using something other than 7-bit lowercase+digits ASCII, it knows to punycode it, and obviously any domain registry supporting punycode ought to allow anybody who registers a name that doesn't need punycode to have both the straight and punycode names. But it's still ugly.
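
Python's standard library happens to ship an `idna` codec (it implements the original 2003 IDNA rules), which makes the extra translation layer easy to poke at. The two wrapper names and the `.example` domain below are made up for illustration; the sketch shows the forward conversion, the fact that an already-converted name is left alone, and the reverse conversion a user interface would do.

```python
def to_wire(name: str) -> str:
    """Unicode domain name -> ASCII form that ordinary DNS can carry."""
    return name.encode("idna").decode("ascii")


def to_display(name: str) -> str:
    """ASCII (possibly xn--) form -> Unicode form for showing to a user."""
    return name.encode("ascii").decode("idna")


print(to_wire("bücher.example"))          # xn--bcher-kva.example
print(to_wire("xn--bcher-kva.example"))   # unchanged, not punycoded twice
print(to_wire("slashdot.org"))            # plain ASCII passes straight through
print(to_display("xn--bcher-kva.example"))
```

Only the label that needs it gets the `xn--` treatment, so existing DNS servers see nothing but ordinary ASCII names.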

--

Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks

DNS is *precisely* for NLS text by billstewart · 2006-11-21 14:37 · Score: 2, Insightful

The problem is that it was designed for natural language text in the US back when some computers could deal with the new fancy feature of lower-case letters and others couldn't, and when humans tended to get confused about that sort of thing even though they all spoke English, and some computers could deal with 8-bit bytes and punctuation while others were very limited. I don't know if the IBM 48-character character sets were still around, but 64-character was still widespread, and EBCDIC was certainly still plaguing many of us in the early 1980s. It's a tool for users, not the programmers and admins who support them - but it's a tool for users of _computers_, so it still has technical constraints.

It's been obvious since the Europeans got DNS for their ftp and email that there was a problem, even before they invented the web, and even aside from myopic silliness like having .GOV be a US TLD and fortunate accidental decisions about having .COM be viewed as global instead of US-only. Techies have been working on the internationalized-character-sets-for-computers problem for a while. ICANN's finally starting to pay some attention to the IDN issues, but they're not fundamentally a technology organization, they're a trademark protection organization and their approach to non-US domain names was an attempt at World Domination designed to get the CCTLDs to follow their trademark-protection rules, not to worry about fundamental technologies like making DNS work outside the US.

DNS has a couple of restrictions that may have made sense in 1985, long before Unicode was invented. Some of them are easy to fix, especially since most DNS servers in the world use versions of one of three or four server programs, but there's a lot more resolver software out there that deliberately casefolds (though you could fix most of that in two or three generations of Microsoft releases, if you knew what you wanted it to do), and you can fix some of it administratively, by having the people who register UPPERCASE-EXAMPLE.COM also register uppercase-example.com and maybe Uppercase-example.com and do a few similar things for munged Unicode.

--

Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks

Re:Anything's possible. by Evilest+Doer · 2006-11-23 16:32 · Score: 2, Insightful

It's amazing, krell. How do you type so much with Rush Limbaugh's cock rammed down your throat?

--
I feel like death on a soda cracker.
