ICANN Under Pressure Over Non-Latin Characters
RidcullyTheBrown writes "A story from the Sydney Morning Herald is reporting that ICANN is under pressure to introduce non-Latin characters into DNS names sooner rather than later. The effort is being spearheaded by nations in the Middle East and Asia. Currently there are only 37 characters usable in DNS entries, out of an estimated 50,000 that would be usable if ICANN changed naming restrictions. Given that some bind implementations still barf on an underscore, is this really premature?" From the article: "Plans to fast-track the introduction of non-English characters in website domain names could 'break the whole internet', warns ICANN chief executive Paul Twomey ... Twomey refuses to rush the process, and is currently conducting 'laboratory testing' to ensure that nothing can go wrong. 'The internet is like a fifteen story building, and with international domain names what we're trying to do is change the bricks in the basement,' he said. 'If we change the bricks there's all these layers of code above the DNS ... we have to make sure that if we change the system, the rest is all going to work.'" Given that some societies have used non-Latin characters for thousands of years, is this a bit late in coming?
Changing a system which works is a very, very bad idea.
Won't this open up the system to many more phishing attacks involving addresses which include non-Latin characters that look similar to Latin ones?
Wait, so it's not tubes... It's a 15 story building?
Anyone else getting more lost every day?
"If you make people think they're thinking, they'll love you; But if you really make them think, they'll hate you." - DM
It won't break the whole Internet! Just DNS. DNS is overrated anyway. Now if you'll excuse me, I need to finish reading all the new posts on 66.35.250.150.
Quidquid latine dictum sit, altum sonatur.
Yes, countries that use non-English characters should be able to interact with the rest of the world using their natural language. No, they shouldn't rush the change and risk a possible crash of a large portion of the Internet. Be patient, young padawans: soon you will be able to have DNS names with any character you can think of, but it will be reliable and actually work.
Space for rent, inquire within
Plans to fast-track the introduction of non-English characters in website domain names could 'break the whole internet', warns ICANN chief executive Paul Twomey
Luckily for us, GWB knows that we have some redundancy with the Internets, so if one breaks we can just use another. ICANN is trying to give a technical answer to a political problem; although the reason may be valid, this is not a very good idea. With the UN, it will be handled by international committees and we will all be long dead before they finally agree on which country will be on that committee.
Perhaps, but I can't fault ICANN for this one, as much as I might like to. Like it or not, most internet technologies have their roots in countries that use the Latin alphabet, which means systems developed there may not be tweaked to work with outside language schemes.
If the fault lies with anyone, it's with the individual contributors of the tech. Or better, with the non-Latin countries' apparent lack of interest in some of the core projects needed to push this through ICANN (specifically DNS, httpd).
Mod me down with all of your hatred and your journey towards the dark side will be complete!
- Don't be too surprised when people around you start building their own houses rather than choosing to pay rent.
DNS upheaval has been a long time coming, and the current anti-American sentiment worldwide isn't exactly helping to stabilize it. We're already seeing all sorts of ad hoc routing setups that deal with shortcomings of an America-centric DNS. My guess is that within the next few years, ICANN's 'control' of the internet will be in name only as everyone else in the world will have moved on to alternative routing and domain systems.
"Given that some societies have used non-Latin characters for thousands of years, is this a bit late in coming?"
No.
Zonk either knows zero about the histories of the Internet or DNS, or is so enamored of finishing stories with questions that he'll tack on the truly ridiculous.
What you do with a computer does not constitute the whole of computing.
For all you people saying "There's no problem, just do it" - I say watch out... there will be a rush of attacks and spoofs as soon as this is opened up. The letter "a" appears in the Unicode character set multiple times, and some of the variants are almost indistinguishable. I'm not just talking about someone registering släshdot.org, I'm talking about someone registering slashdot.org (the a is U+FF41 instead of the normal a). Good luck telling the attacks apart from the real sites.
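A minimal sketch of the fullwidth-letter spoof described above, in Python. The helper name `fold_compatibility` is made up for illustration; the only library call is the standard `unicodedata.normalize`. It assumes that folding names with NFKC normalization is an acceptable policy, which is an assumption of this sketch and not something a registry is known to do.

```python
import unicodedata

def fold_compatibility(name: str) -> str:
    """Fold compatibility variants (such as fullwidth letters) with NFKC, then lowercase."""
    return unicodedata.normalize("NFKC", name).lower()

real = "slashdot.org"
# Same name, but the "a" is U+FF41 FULLWIDTH LATIN SMALL LETTER A.
fullwidth_spoof = "sl\uff41shdot.org"
# Same name, but the "a" is U+0430 CYRILLIC SMALL LETTER A.
cyrillic_spoof = "sl\u0430shdot.org"

print(fullwidth_spoof == real)                      # the raw strings differ
print(fold_compatibility(fullwidth_spoof) == real)  # NFKC folds U+FF41 to "a"
print(fold_compatibility(cyrillic_spoof) == real)   # NFKC does not touch Cyrillic
```

Normalization catches the fullwidth case only. The Cyrillic letter is a distinct character with no compatibility mapping, so it survives the fold and needs a different check.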
I'd be in favor of the change just because anything that undermines the Unix Tower of Babel -- the dependency on ASCII which complicates text handling sooooo much even when Windows solved the problem soooo long ago -- is good. Even Java gets it. Even Apple (finally) get it. Unix Is Teh Problem.
And the ASCII problem isn't just bad because it forces people to use inefficient encodings like UTF-8 (THREE bytes per character?). It's bad because it allows people to write code like:
if(string[index] == '.' || string[index] == '?' || string[index] == '!') sentenceEnd = true;
(a line repeated, with subtle variations, several hundred times in the code of a certain ubiquitous editor).
And, lo and behold, the above does not work, but once it appears in a few thousand places it's impossible to fix, and a vast towering structure of fixes made by people who don't really understand why it's an issue is built.
So, even though the proposed change would be hugely inconvenient for a huge number of people, I'm in favor, because I want the world to grow the fork up and understand that text != byte array some time while I'm still alive.
Whence? Hence. Whither? Thither.
Unicode has many characters that look almost exactly like characters in Latin-1.
For example, if "www.microsoft.com" is shown in your browser's address bar, how would you know for sure that the "c" is not from the Cyrillic alphabet, or the "o" is not from the Greek alphabet?
You simply won't be able to trust your browser's address bar anymore. The possibilities for phishing attacks are endless.
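One rough way a browser could flag the mixed-alphabet case above, sketched in Python. The function names are hypothetical, and taking the first word of the Unicode character name as "the script" is a shortcut assumed here for brevity; it is not the official Unicode Script property.

```python
import unicodedata

def scripts_in(label: str) -> set:
    """Collect a rough script tag for every letter in the label.

    Assumption: the first word of the Unicode character name
    ("LATIN", "CYRILLIC", "GREEK", ...) stands in for the script.
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def looks_mixed(label: str) -> bool:
    """True when the letters of the label come from more than one script."""
    return len(scripts_in(label)) > 1

print(scripts_in("microsoft"))         # all Latin letters
print(scripts_in("mi\u0441rosoft"))    # the "c" is U+0441 CYRILLIC SMALL LETTER ES
print(looks_mixed("micr\u03bfsoft"))   # the "o" is U+03BF GREEK SMALL LETTER OMICRON
```

This only catches labels that mix scripts. A name spelled entirely in look-alike letters from a single other script would pass, so it is a partial defence at best.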
Nice for localising, sure, but how usable will Japanese, Indian, or Arabic script URLs -- for example -- be for those who do not have access to the respective sets or keyboard layouts?
Of course it's late in coming.
But that doesn't mean it should be done hastily and badly.
http://alternatives.rzero.com/
Given that some societies have used non-Latin characters for thousands of years, is this a bit late in coming?
Let's be clear. The domain name system only uses English characters. There are lots of languages in Europe (Italian, Spanish, French...) which are closer to Latin than English (which isn't really a Latin language at all) which are not currently represented, because you can't use accents in domain names, or other letters such as the Spanish eñe (n with a squiggle, actually a distinct letter). English speakers often think accents aren't important but they can completely change a word's meaning.
The internet was originally conceived, designed, and implemented in the USA at a time where hardware was at a premium, and corners were cut to conserve that limited resource. DNS was just one of the results of that era. However, it is the most visible because it is the front end means for people to find each other. That means there is now a very well established standard, used by people across the entire globe, that is very difficult to change.
Changing all the DNS servers in the world to switch from ASCII to Unicode is NOT trivial. The fact that some societies have used non-Latin characters for thousands of years is completely and utterly irrelevant. THEY didn't make the internet. They simply bolted themselves on to an existing infrastructure.
I agree that progress needs to be made to accommodate non-Latin characters, but to have people whining about "how they want it, and want it now"... That's just ridiculous. It's like waltzing into a house that was built 40 years ago and having a tantrum because the stairs are too steep and the house is too squished. Major structural renovations take time, effort, and careful planning. And there is nothing you can do to avoid that, short of implementing cheap stop-gap measures that are virtually guaranteed to cause even bigger unintended headaches later on.
Given that some societies have used non-Latin characters for thousands of years, is this a bit late in coming?
Those societies did not build an entire economic and social infrastructure using all 50,000 of those characters in a few decades, though.
Rex is 09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
How 'bout we all just speak English and forget about all those weird letters.
(It was a joke... well sort of)
ICANN Under Pressure Over Non-Latin Characters
You mean white people?
If Nalgene water bottles are outlawed, only outlaws will have Nalgene water bottles.
Now if you'll excuse me, I need to finish reading all the new posts on 66.35.250.150.
Base-Ten CHAUVINIST!!!
What about societies that use Base 2 [binary], or Base 8 [octal], or Base 16 [hexadecimal]?
Or entire societies, like the British empire, which use no base at all?
12 inches in a foot. 3 feet in a yard. 1760 yards in a mile...
60 seconds in a minute. 60 minutes in an hour. 24 hours in a day. 7 days in a week. 52 weeks in a year [give or take]...
Or how about base 12?
12 keys in a chromatic scale: A 440, then, logarithmically [give or take a little well-tempering]: A#, B, B# == C [kinda sorta], C#, D, D#, E, E# == F [kinda sorta], F#, G, G#, and finally A 880.
Except that on the continent, things are often just a little sharper - say A 443/444/445 & A 886/888/890...
And let's not even get into water freezing & boiling at 32 & 212 versus 0 & 100...
2. Doesn't require any change to the DNS system (other than some name policy changes).
3. Allows links to be embedded in normal web pages so that they can be cut and pasted by anyone with Latin functionality. So a Japanese person could cut and paste the link to some Arabic site that they don't have the font for.
4. While this is a kludge, it has some major advantages over rebuilding the DNS system.
Storm
DNS won't break. In fact, it already works! The thing is called IDN and is supported by all modern web browsers (including IE). Try for yourself - http://www.kozowski.pl (I hope Slashcode won't cannibalize letter "").
So DNS and the Web are OK. Any breakage I can think of may appear in email systems or other domain-based forms of communication.
:wq
Does ICANN control .cn (China)? Or other national TLDs? Why don't they just start registering domains in their local language? Leave .com, .org, .mil (i.e. the USA TLDs) English.
Tht ìs thê £äst thïñg wë ñèêd
Dibs on ©óm
What's this going to do for security? Didn't we have phishing attacks recently that consisted of Unicode characters being inserted into e+bay.com, for instance, that didn't get displayed? The domain e+bay.com is different from ebay.com.
"A domain name is a unique address that allows people to access a website, for example, smh.com.au"
No, a domain name is a sequence of characters mapped to an IP address. It was designed so that you wouldn't have to remember 66.35.250.150 instead of slashdot.org. This wasn't a problem while the original Internet consisted of just four computers. DNS was never designed to provide identity. There was also the case of a stock trader hacking a DNS server and redirecting traffic from a legitimate financial site to his own, where he had duplicated the real site only with bogus information.
"He said that this could create problems where, for example, a character in Urdu looks identical to one in Arabic"
It sure could. How about totally replacing DNS with a system of online identities.
davecb5620@gmail.com
"Yes, countries that use non-English characters should be able to interact with the rest of the world using their natural language."
Why... No really. You speak as if this is a good thing. Why should they be able to use their natural language rather than English? Why shouldn't they be restricted to a limited area of local language speaking people?
The reason the Internet is useful is because everyone speaks TCP/IP. Incompatible protocols are to be actively discouraged because they balkanise the network. Language is exactly the same. The reason the Internet is useful is because everyone speaks English, the more divided it becomes the less useful it becomes.
Languages are anachronisms, the only reason we have more than one is the physical distance between locations and difficulty travelling allowed them to evolve independently. Well that isn't the world we live in any more and the different languages actually make communication far more difficult now. They're no longer beneficial. So get rid of them, insist on a common language. The most popular happens to be English at the moment. I could live with Spanish, but for those of you about to suggest Chinese, read this before deciding: http://www.pinyin.info/readings/texts/moser.html
We should be using this opportunity to actively get rid of languages.
Deleted
I'm in a country that lies between Europe and the Middle East. We have a few non-Latin characters in our alphabet, and even that creates problems when assigning domain names.
No wonder the Middle Eastern (Arabic) countries especially want this: the majority of inexperienced internet users there will find these domain names easier to use, which gives governments a greater incentive to control what those users see, because these domains will be under national control.
Not only that, but we as IT people will be very unwilling to change all our software to adapt to the new situation because of the horrible development, testing, and implementation involved. We won't accept these domains as valid in our network traffic, which will create a second internet that is, as described above, less free.
This should not be allowed.
Read radical news here
1. Physical
...
...
2. DataLink
6. Presentation
7. Application
8. Tubes
9. Bricks
10. Porn
11. Google
12. YouTube
13. ??
16. Profit
It was hard enough remembering them all back when there were only 7.
much even when Windows solved the problem soooo long ago
i18n on Windows is far from "solved".
I do admit that MS had a huge benefit when they started pushing Unicode.
(It takes a company with Microsoft's level of clout to push around national governments.)
And the ASCII problem isn't just bad because it forces people to use inefficient encodings like UTF-8 (THREE bytes per character?)
Perhaps you don't realize that UTF-8 is moving on to become the most dominant character encoding,
and the legacy cruft such as UTF-16 (designed to deal with design flaws in windows) is being phased out.
Even languages that would end up as mostly 3 byte characters tend to benefit from the savings on single byte
characters for control and formatting markup.
I'm not going to harp on about it, but a few basic web searches could enlighten you here.
if(string[index] == '.' || string[index] == '?' || string[index] == '!') sentenceEnd = true;
Code like that *works* in UTF-8, which is one of the things that makes it beautiful (among many others).
It allows you to deal with world character sets when it matters, and allows you to ignore them when it does not.
(For example, a lexical analyzer that specifies its tokens does not want to support punctuation from every language ever conceived.)
And if you think code like that doesn't exist in the Windows world, you are sadly quite naive.
In my experience internationalizing applications, it's typically far easier to update Unix applications, which
on occasion need nearly no changes at all, compared to the laborious grind and near-total rewrite often needed
for MS Windows applications.
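The claim above that byte-wise punctuation checks keep working on UTF-8 can be illustrated with a small Python sketch. The function name and the sample text are invented for this example. It relies on one property of UTF-8: every byte of a multi-byte sequence is 0x80 or higher, so an ASCII byte value can only ever be the ASCII character itself.

```python
def sentence_end_offsets(data: bytes) -> list:
    """Byte offsets of '.', '?' or '!' found by scanning raw UTF-8 bytes."""
    return [i for i, b in enumerate(data) if b in b".?!"]

text = "Na\u00efve caf\u00e9. \u65e5\u672c\u8a9e? Ok!"
data = text.encode("utf-8")

offsets = sentence_end_offsets(data)
# Every hit is a genuine one-byte ASCII punctuation mark,
# never a fragment of a multi-byte character.
print([chr(data[i]) for i in offsets])
```

What the byte scan will not do is recognise non-ASCII sentence enders such as the ideographic full stop (U+3002). That is the case where the character set "matters" and has to be handled explicitly.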
Adding Unicode to DNS names would make phishing much more difficult to detect unless all the browsers, email clients and other tools are modified to indicate that a URL may not be what the user thinks it is. It is bad enough as it is, and remember, most Internet users are not as savvy as those of us on Slashdot. I foresee a lot of security implications from adding this.
This post is encrypted twice with ROT-13. Documenting or attempting to crack this encryption is illegal.
see http://en.wikipedia.org/wiki/Internationalized_domain_name
IDN is backwards compatible with existing DNS-servers, and has been in use for several years. Mozilla, Firefox, Safari and Opera support it. So does Internet Exploder 7.
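To see why existing DNS servers need no changes, here is a short Python sketch using the standard library's built-in "idna" codec (which implements the older IDNA 2003 rules). The name bücher.example is just a sample, not a site mentioned in this thread.

```python
# The client converts the Unicode name to an ASCII-only form before any DNS query is sent.
name = "b\u00fccher.example"
ascii_form = name.encode("idna")
print(ascii_form)  # an ASCII-only name whose first label starts with "xn--"

# The user interface can convert it back for display.
round_trip = ascii_form.decode("idna")
print(round_trip == name)

# A plain ASCII name passes through untouched.
print("slashdot.org".encode("idna"))
```

The servers only ever see letters, digits and hyphens, which is what makes the scheme backwards compatible.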
Kind of an interesting point. Maybe we should just let Google run the DNS system, and just replace it with a giant search engine. If we make actually typing in a web address hard enough, then that's what we're effectively doing anyway: people will just start typing everything (including the domain name of sites they want to go to) into the Google Search box at the top of their browser window, instead of the actual address bar.
Actually, DNS arguably is a giant search engine, which simply works on a 1:1 relationship and uses a distributed database (you input one piece of information, and it gives you some corresponding piece of information back). Replacing it with a 'fuzzier' search engine that would give you back a number of results, ranked by relevance, isn't that huge a leap.
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
What do you mean by "if the unicode of the URL does not match the default unicode of the browser"? The point of unicode is that it is uniform - there's only one. It is broken up into sections, and perhaps that's what you meant to say, but even that won't work.
Let's take Japanese as an example, and I will give you two reasons why it won't work.
Perhaps if you assume I am Japanese, you will assume that my "default unicode section" is the section containing the Japanese characters. So this works fine if I go to URLs that use hiragana / katakana / kanji, but what if I go to www.google.com? Or www.washingtonpost.com? Or www.citibank.com? (Yes, there are Citi offices in Japan). Are you going to throw up a phishing warning simply because I'm browsing an international site? Because if you do that, you're going to make people so used to seeing those warnings that they will just ignore them and/or turn them off.
Even if your method did work, however, this would still be easy to get around. The original 256 characters are repeated many times, and it just so happens that in the full-width forms (in the CJK sections) they are repeated again. I.e. I can use the letters a-z while still staying within the Japanese section of Unicode, and although these letters are the same visually, they are a different character in the Unicode charset, so you could easily have www.google.com and www.google.com registered entirely in the first 256 characters of Unicode or entirely in the full-width form section of Unicode, and there would be no discrepancy whatsoever.
The problem is a lot more complicated than you make it out to be.
The fundamental DNS is a programmer's and administrator's tool, not an advertising medium, and it should not be changed. It is founded, like programming languages, on a fundamental 7-bit ASCII character set, and is not intended to be used for NLS text.
A far better solution is some form of VDNS that translates NLS text names into the proper domain name at the system level. That also allows the same domain to have multiple language translations to reflect localized product and service names.
We seriously need to kick the general political community in the arse. They keep trying to impose technical decisions, and it fails as miserably as any corporate PHB's uninformed decisions. ASK the techies to propose solutions instead of shoving ill-conceived ideas down our throats.
For example -- once you mandate multibyte domains, you implicitly mandate multibyte URL components. Goodbye direct mapping of names to the directories, file systems, and servers.
Bad idea. Very bad idea.
I do not fail; I succeed at finding out what does not work.
The reason ICANN wants to do lots of testing (after having dragged their feet for years before getting started) is that IDNs fundamentally change how DNS works, and it's really important not to break too much when you do that (not that ICANN traditionally worried about that.) It's *not* simple, and you don't want to get it wrong.
DNS translates a set of strings of nominally-ascii characters into numbers, or translates numbers into a set of strings of characters, or translates some sets of strings into other sets of strings, depending on which query you run, and uses specific data formats to represent those strings and numbers. There are restrictions on what characters can be in the strings, some for reasons that we could easily declare to be obsolete (7-bit, uppercase-to-lowercase translation), some for reasons that are harder to change (printable characters only, please), and some which are really hard (dots are used as delimiters, and nulls terminate character strings in some popular computer languages). So you can't just plug in arbitrary Unicode two-byte characters instead of pairs of ASCII bytes and skip the case-munging, because some of the bytes will have values that can't be handled, though most of the 8-bit-character alphabets can be used transparently if you don't mind people using incorrect character sets on occasion. 8-bit character sets simply aren't enough - you can handle most Western languages in ISO-8859-1, and UTF-8 is closer but apparently not quite a cigar (too bad - it would have been my preference.)
The main IDN strategies replace this by adding one more translation layer - character-string-set IDN names are translated into ugly-but-recognizable Punycode strings, which get used with standard DNS character-string-set to number translations in the forward direction, and in the reverse direction, anything that arrived as a Punycode xn--uglystuff string usually gets fed to a Punycode-to-Unicode translator by a user interface.
Some things can be fixed by recompiling (or relinking, or re-DLLing) all of your programs with a DNS resolver library that guesses whether to convert strings or not - forward DNS knows to punycode non-ascii characters and not to re-punycode xn--uglystuff, though reverse DNS doesn't necessarily know whether to convert it to UTF-16 or UTF-8 or just pass it on directly, and if you've typed in a domain name using something other than 7-bit lowercase+digits ASCII, it knows to punycode it, and obviously any domain registry supporting punycode ought to allow anybody who registers a name that doesn't need punycode to have both the straight and punycode names. But it's still ugly.
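The forward-direction guess described above could look roughly like this in Python. The names `to_dns_label` and `to_dns_name` are hypothetical, and the sketch deliberately skips the Nameprep step that real IDNA performs, using a plain lowercase as a stand-in. Only the standard "punycode" codec is used.

```python
def to_dns_label(label: str) -> str:
    """Guess the wire form of a single label.

    Simplification (an assumption of this sketch): lowercase instead of
    full Nameprep, and no length or character validity checks.
    """
    label = label.lower()
    if label.isascii():
        # Plain ASCII, including labels that are already xn--...: pass through,
        # so an encoded name is never punycoded a second time.
        return label
    return "xn--" + label.encode("punycode").decode("ascii")

def to_dns_name(name: str) -> str:
    """Convert each dot-separated label independently."""
    return ".".join(to_dns_label(part) for part in name.split("."))

print(to_dns_name("B\u00fccher.Example"))
print(to_dns_name("xn--bcher-kva.example"))
```

Both calls give the same ASCII name, which is the behaviour described above: non-ASCII input is punycoded, and already-encoded input is left alone.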
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
It's been obvious since the Europeans got DNS for their ftp and email that there was a problem, even before they invented the web, and even aside from myopic silliness like having
DNS has a couple of restrictions that may have made sense in 1985, long before Unicode was invented. Some of them are easy to fix, especially since most DNS servers in the world use versions of one of three or four server programs, but there's a lot more resolver software out there that deliberately casefolds (though you could fix most of that in two or three generations of Microsoft releases, if you knew what you wanted it to do), and you can fix some of it administratively, by having the people who register UPPERCASE-EXAMPLE.COM also register uppercase-example.com and maybe Uppercase-example.com and do a few similar things for munged Unicode.
Bill Stewart
It's amazing, krell. How do you type so much with Rush Limbaugh's cock rammed down your throat?
I feel like death on a soda cracker.