ICANN Under Pressure Over Non-Latin Characters
RidcullyTheBrown writes "A story from the Sydney Morning Herald is reporting that ICANN is under pressure to introduce non-Latin characters into DNS names sooner rather than later. The effort is being spearheaded by nations in the Middle East and Asia. Currently there are only 37 characters usable in DNS entries, out of an estimated 50,000 that would be usable if ICANN changed naming restrictions. Given that some bind implementations still barf on an underscore, is this really premature?" From the article: "Plans to fast-track the introduction of non-English characters in website domain names could 'break the whole internet', warns ICANN chief executive Paul Twomey ... Twomey refuses to rush the process, and is currently conducting 'laboratory testing' to ensure that nothing can go wrong. 'The internet is like a fifteen story building, and with international domain names what we're trying to do is change the bricks in the basement,' he said. 'If we change the bricks there's all these layers of code above the DNS ... we have to make sure that if we change the system, the rest is all going to work.'" Given that some societies have used non-Latin characters for thousands of years, is this a bit late in coming?
It's called a "Homograph Attack". See http://en.wikipedia.org/wiki/IDN_homograph_attack
much even when Windows solved the problem soooo long ago
i18n on windows is far from "solved".
I do admit that MS had a huge benefit when they started pushing unicode.
(It takes a company with microsoft's level of clout to push around national governments )
And the ASCII problem isn't just bad because it forces people to use inefficient encodings like UTF-8 (THREE bytes per character?)
Perhaps you don't realize that UTF-8 is moving on to become the most dominant character encoding,
and the legacy cruft such as UTF-16 (designed to deal with design flaws in windows) is being phased out.
Even languages that would end up as mostly 3 byte characters tend to benefit from the savings on single byte
characters for control and formatting markup.
I'm not going to harp on about it, but a few basic web searches could enlighten you here.
if(string[index] == '.' || string[index] == '?' || string[index] == '!') sentenceEnd = true;
Code like that *works* in UTF-8, which is one of the things that makes it beatiful. (among many others)
It allows you to deal with world characters sets when it matters, and allows you to ignore them when it does not.
(for example, a lexical analyzer that specifies its tokens does not want to support punctuation from every language ever conceived)
And if you think code like that doesnt exist in the windows world, you are sadly quite naive.
In my experience internationalizing applications, its typically far easier to upate unix applications, which
on occaision need nearly no changes at all, compared to the laborious grind and near total re-write often needed
for ms-windows applications.
What's this? I've been able to use the Norwegian characters in domain names for a long time. There are screetshots over at http://en.wikipedia.org/wiki/Internationalized_dom ain_name