Internationalized Domain Names Coming Soon
rduke15 writes "You think you know how to parse a domain name for validity? Well, in case you haven't noticed, things are getting tougher as registrars keep adopting IDN (Internationalized Domain Names), which uses a weird encoding named Punycode to enable accented characters in domain names. The Register reports about Switzerland, Germany and Austria's joint move to enable IDN. See the overview in English from Switch. But I guess it would be difficult to talk about this on /., since it does not even support basic Latin-1 ... :-)"
More ways for trolls to disguise goatse.cx links...
Sounds like a job for Unicode.
Unicode.org
I sure hope this harebrained idea doesn't take off.
After all, now they need not only worry about registering say...
c rosoft.tv
Microsoft.com
Microsoft.net
Microsoft.org
Mi
etc..
But also
Microsoft.com
Microsoft.com
Well, you get the picture.
--Won't that be grand? Computers and the programs will start thinking and the people will stop. - Dr. Walter Gibbs
If you don't know how to type the characters necessary to access the web site, chances are you won't be able to read the content anyway. So I think it's a moot point.
Ita erat quando hic adveni.
Since this solution doesn't break any old implementation, only the countries that need it will have to modify their software, and they won't have to wait for the slow and expensive process of changing all of DNS, which a large part of the 'net isn't motivated to pay for.
This means that it can't possibly include ALL of the unicode spectrum, as Unicode supports far more than just 92 extra characters.
Also, the way the coding is going to work, you still can't register a name with B.
I am unamerican, and proud of it!
Unicode-reinterpreted-as-a-string-of-ASCII-bytes (taken literally) can only mean UTF-7, which never really got much traction, but had no NULs or control characters in it - all pure, readable ASCII. Its problem in DNS would be that it treats upper and lower case as distinct, which is not true for current DNS queries. If you meant "UTF-8" when you said "unicode-reinterpreted-as-a-string-of-ASCII-bytes", that also has no NUL or control codes in it, and unlike UTF-7 it lets you treat upper/lower case any way you want. Its drawback is that it will insert bytes in the 128..255 (i.e., non-ASCII) range into the data stream, which will probably cause trouble for current DNS servers.
So, to sum it up, you are right that current Unicode encodings will not meet current DNS RFCs, but the reason you gave wasn't quite right. Punycode does solve the problem, but ugh, punycode is an awful hack of a character encoding system. I'd hate to see it live on forever, but it might be useful getting us started on i18n-ified DNS.
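To make the comparison above concrete, here is a minimal Python sketch that runs one label through the three encodings being discussed. The label "bücher" is a made-up example, not something from the thread, and the "idna" codec used here is Python's built-in IDNA 2003 implementation, so treat this as an illustration rather than what any registrar actually runs.

```python
# Sketch: how one non-ASCII label looks under UTF-7, UTF-8 and Punycode.
# "bücher" is a hypothetical example label chosen for illustration.
label = "b\u00fccher"  # "bücher"

utf7 = label.encode("utf-7")      # ASCII only, but case-sensitive base64 inside
utf8 = label.encode("utf-8")      # contains bytes in the 128..255 range
puny = label.encode("punycode")   # raw Punycode, ASCII only
ace = label.encode("idna")        # Punycode plus the "xn--" prefix used in DNS


def is_pure_ascii(data: bytes) -> bool:
    """True if every byte is below 128, i.e. safe for 7-bit DNS software."""
    return all(b < 128 for b in data)


for name, data in [("UTF-7", utf7), ("UTF-8", utf8), ("Punycode", puny), ("IDNA", ace)]:
    # A case-insensitive DNS server may fold letters; show whether that
    # would alter the encoded bytes.
    print(f"{name:9} {data!r:24} ascii={is_pure_ascii(data)} "
          f"changed_by_lowercasing={data.lower() != data}")
```

The UTF-7 form is pure ASCII but contains mixed-case base64, so case folding alters it; the UTF-8 form survives folding but is not 7-bit clean; the Punycode forms are both ASCII and all lowercase for this label.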
Yes, it is. Because it's not just a few "umlauts". When you're talking about Asian or other non-Romanized languages then the Romanization may be totally incomprehensible to even some speakers of that language. It's one thing to lose a few accent marks and such but it's quite another to translate your language into a totally incomprehensible and unrelated format. In fact in kanji based languages at the very least Romanization actually LOSES information. It's not just a matter of transcribing the sounds into another format because the kanji carry additional meaning not present in just the phonetic language. If you've ever seen two native Chinese or Japanese speakers talk to each other they frequently will "write" kanji in the air or on the palm of the other person's hand with their fingers because their spoken language is imprecise. These changes are very necessary for the Internet to become a truly international phenomenon.
"djbdns doesn't support unicode either, although it doesn't rely on standard c-libraries, so unicode support might only take a few weeks to add."
djbdns is 8-bit clean. Use UTF-8 all you want right now.
It's not as simple as you may think. I am all for Unicode, but to use it for domain names can lead to unwanted consequences.
There already exist some internationalized domain names in Chinese, so instead of having chopsticks.com we can have [insert chinese characters for chopsticks here].com.
The problem comes from the fact that there are tens of thousands of different Chinese characters, each of those having a different Unicode code point but many of those being only slight variations of each other, or even so similar that a regular Chinese-reading user wouldn't notice the difference at first glance. Thus you could have two very different websites having seemingly exactly the same name in Chinese but being different nonetheless because the Unicode for their names is different.
With only 26 letters and a few more characters, there have been many abuses of domain names, like www.microsoft-.com instead of www.microsoft.com (or some similar abuse), but the possibilities for abuse in Chinese are almost infinite. The same would be true in many European languages: many will not pay enough attention to the differences between an acute accent and a grave accent in French and might be misled to a different site than the one they were looking for. Imagine the credit card payment page of the bank Societe Generale: with the accents written backwards, a lookalike site can be created under a domain name that looks the same. The same in Polish, with the cedillas under many of the letters or the L, with or without the bar across it.
Technically, Unicode may be feasible, but human beings cannot distinguish between the hundreds of thousands (and more!) different letters, characters and other signs that it offers...
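The lookalike worry above can be sketched in a few lines of Python. The two spellings below (acute versus grave accents on "societe") are hypothetical names made up for illustration, and the accent-stripping helper is just one simple way a registry or browser could flag near-duplicates, not something any registry is known to use.

```python
import unicodedata

# Two hypothetical labels that differ only in accent direction.
real = "soci\u00e9t\u00e9"   # e with acute accent
fake = "soci\u00e8t\u00e8"   # e with grave accent


def non_ascii_names(label: str) -> list:
    """Unicode names of the non-ASCII characters, to make the difference visible."""
    return [unicodedata.name(ch) for ch in label if ord(ch) > 127]


def strip_accents(label: str) -> str:
    """Decompose to NFD and drop combining marks, leaving the base letters."""
    decomposed = unicodedata.normalize("NFD", label)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(real == fake)                                # False: different code points
print(non_ascii_names(real))
print(non_ascii_names(fake))
print(real.encode("idna"), fake.encode("idna"))    # two distinct ASCII names
print(strip_accents(real) == strip_accents(fake))  # True: same accent-free form
```

Comparing the accent-free forms only helps within one script. It does nothing for cross-script lookalikes such as Latin "a" versus Cyrillic "а", or for near-identical Chinese characters.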
http://www.masquilier.org/republic/election/ Condorcet, Plurality voting and alternative voting enabled bulletin board.
You know, this arrogant, self-centric view does not help the discussion.
Anyway, the current infrastructure DOES NOT have to be updated and this change is NOT intended to be "some jagoff's playground", but rather for the non-English speaking people - there are quite a few of them.
Real life is overrated.
Never fear, oh monolingual one, I found this very handy site that will help solve this pesky problem for you. Try it some time and let us know what you think!
If Paul Vixie did say that, it would kinda argue for choosing that route rather than trying to get the IETF to agree to anything. So far it has been over five years since the start of this effort, and counting.
The real problem is not fixing BIND; that is easy. Deploying BIND updates and deploying compatible client updates is the real problem. It just isn't feasible.
Looking for an Information Security student project suggestion?
Try http://dotcrimeManifesto.com/
Why not extend dns to support unicode?
DNS should never get Unicode support, or any form of "internationalization" for that matter!
DNS is supposed to be a way for humans to communicate with computers about internet hosts. The intent is not for some human to be able to read it, but for all humans. This has worked until now because hostnames were limited to only ~37 characters. Regardless of native language, any computer operator can quickly learn to handle the [a-z][0-9] glyphs. Basically anyone literate in one language can copy ASCII characters from a signpost onto a notepad, and then punch those into a keyboard. Even if her culture doesn't use the ASCII set in normal daily activities (which about everyone in America, Europe, and Japan does), then the shapes are at least simple enough to copy geometrically.
But if 16-bit charsets are allowed in DNS, we could get hostnames composed of 3 Chinese characters and two Arabic ones, and which a Russian or Briton will be incapable of processing without tremendous pain.
DNS is something that should be left in a "lowest common denominator" form, so that it's accessible to all of humanity (if they meet the low hurdle of operating a normal PC)
Internationalized host identifiers in URLs will be important, of course. But they should be a separate layer implemented on top of DNS. DNS is a standard that already exists. Rather than changing the standard and breaking every single internet-using computer (the "flag day" scenario), a new system should be rolled out for people who want host identifiers in funny-looking squiggles.
I know there are times when different accents sometimes indicate different words -- but I'm under the impression that it is unlikely that more than one of them would be a "good" domain name. (Am I wrong about that?)
This won't work for non-Latin characters, obviously. But UTF-8 seems like a better solution to that. (I understand that most Chinese words are 2-3 characters of 2-3 bytes (unified is U-430 to U-9fa and up to U-7ff is 2 characters) for 4-9 bytes -- clearly less than 63 bytes.) The obvious downside is that it means that all DNS servers and resolvers must (at least!) be 8-bit clean.
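The byte-count argument above is easy to check. The following Python sketch measures a short Chinese word against the 63-byte DNS label limit; the word 中文 ("Chinese language") is an arbitrary example chosen here, and the comparison with the "idna" codec (Python's built-in IDNA 2003 implementation) is added only for contrast.

```python
MAX_LABEL_BYTES = 63  # maximum length of a single DNS label


def utf8_label_length(label: str) -> int:
    """Number of bytes the label would occupy if sent as raw UTF-8."""
    return len(label.encode("utf-8"))


def fits_in_label(label: str) -> bool:
    """True if the raw UTF-8 form stays within the DNS label limit."""
    return utf8_label_length(label) <= MAX_LABEL_BYTES


word = "\u4e2d\u6587"  # 中文, an arbitrary two-character example

raw = word.encode("utf-8")
ace = word.encode("idna")  # ASCII-compatible form for contrast

print(utf8_label_length(word), fits_in_label(word))
print(all(b >= 128 for b in raw))   # every UTF-8 byte here is non-ASCII
print(ace, len(ace))
```

Each of these characters takes 3 bytes in UTF-8, so the length is well under the limit, but every one of those bytes is above 127, which is exactly why servers and resolvers would have to be 8-bit clean.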