Slashdot Mirror


Internationalized Domain Names Coming Soon

rduke15 writes "You think you know how to parse a domain name for validity? Well, in case you haven't noticed, things are getting tougher as registrars keep adopting IDN (Internationalized Domain Names), which uses a weird encoding named Punycode to enable accented characters in domain names. The Register reports about Switzerland, Germany and Austria's joint move to enable IDN. See the overview in English from Switch. But I guess it would be difficult to talk about this on /., since it does not even support basic Latin-1 ... :-)"

41 of 526 comments

  1. Ah great... by Worminater · · Score: 5, Insightful

    More ways for trolls to disguise goatse.cx links...

    1. Re:Ah great... by MikeXpop · · Score: 4, Insightful

      Heh. Worse than that. Imagine http://www.paypal.com/enteryourcreditcardnumberhere.php! How many people do you think that would fool? I'd guess a lot more than current sites do.

      --
      Etiquette is etiquette. He kills his mother but he can't wear grey trousers.
    2. Re:Ah great... by Trejkaz · · Score: 3, Insightful

      Okay, you got me. That domain name is 100% pixel identical to www.paypal.com ... which letter is changed?

      --
      Karma: It's all a bunch of tree-huggin' hippy crap!
  2. sounds like by Anonymous Coward · · Score: 3, Insightful

    Sounds like a job for Unicode.
    Unicode.org

  3. Isn't there a better way? by CTalkobt · · Score: 4, Interesting

    It looks to me like the problem is that the DNS servers don't support unicode so they're using a bad implementation of it.

    Why not extend dns to support unicode? That way there'd be no translation or other crap to go through.

    Granted, software would need changing, but that would also be the case with the mangled crap that's mentioned in the article.

    What am I not understanding here? Or is this just an implementation dreamed up to make life complicated?

    --
    There's a gorilla from Manilla whose a fella that stinks of vanilla and has salmonella.
    1. Re:Isn't there a better way? by wmshub · · Score: 4, Insightful

      Unicode-reinterpreted-as-a-string-of-ASCII-bytes (taken literally) can only mean UTF-7, which never really got much traction, but had no NULs or control characters in it - all pure, readable ASCII. Its problem in DNS would be that it treats upper and lower case as distinct, which is not true for current DNS queries. If you meant "UTF-8" when you said "unicode-reinterpreted-as-a-string-of-ASCII-bytes", that also has no NUL or control codes in it, and unlike UTF-7 it lets you treat upper/lower case any way you want. Its drawback is that it will insert bytes in the 128..255 (i.e., non-ASCII) range into the data stream, which will probably cause trouble for current DNS servers.

      So, to sum it up, you are right that current Unicode encodings will not meet current DNS RFCs, but the reason you gave wasn't quite right. Punycode does solve the problem, but ugh, punycode is an awful hack of a character encoding system. I'd hate to see it live on forever, but it might be useful getting us started on i18n-ified DNS.
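The byte-range point above can be sketched in a few lines of Python. This is only an illustration: the sample label is an assumption, and the `punycode` codec used here is the one bundled with Python's standard library, not anything a DNS server runs.

```python
# Compare how one non-ASCII label looks under two encodings.
label = "z\u00fcrich"  # "zürich"; sample label chosen only for illustration

utf8_bytes = label.encode("utf-8")
puny_bytes = label.encode("punycode")

# UTF-8 represents the umlaut with bytes in the 128..255 range.
high_bytes = [b for b in utf8_bytes if b >= 128]

# Punycode output stays inside 7-bit ASCII.
puny_is_ascii = all(b < 128 for b in puny_bytes)

print(utf8_bytes, high_bytes)
print(puny_bytes, puny_is_ascii)
```

The two high bytes are what an ASCII-only resolver or validator would trip over; the Punycode form contains none.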

    2. Re:Isn't there a better way? by Rob+Riggs · · Score: 4, Informative
      wouldn't UTF-8 have worked just as well?

      No. The problem that punycode solves is that the encoded DNS names are themselves valid RFC1034 DNS names. That is, even when encoded, standard DNS validity checkers will accept the name.

      UTF-8 does not have this property

      --
      the growth in cynicism and rebellion has not been without cause
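As a rough sketch of what "standard DNS validity checkers will accept the name" means, here is a letter-digit-hyphen check in Python. The function name `is_ldh_hostname` is hypothetical, and the rule coded here is a common relaxed reading of the hostname syntax, not a full RFC 1034 parser.

```python
import re

# One DNS label under the traditional letter-digit-hyphen rule:
# 1 to 63 characters, no leading or trailing hyphen.
LDH_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")

def is_ldh_hostname(name: str) -> bool:
    """Return True if every dot-separated label passes the LDH check."""
    if not name or len(name) > 253:
        return False
    return all(LDH_LABEL.fullmatch(label) for label in name.split("."))

print(is_ldh_hostname("xn--bcher-kva.ch"))  # Punycode form passes
print(is_ldh_hostname("b\u00fccher.ch"))    # raw non-ASCII form fails
```

A checker like this needs no changes for IDN, because it only ever sees the encoded form.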
    3. Re:Isn't there a better way? by pawal · · Score: 4, Interesting

      There are _so_ many applications using the domain name system that feeding UTF-8 through it will break most of them. Except for perhaps Internet Explorer.

      The registries using UTF-8 (most notably .NU) are running IDN in parallel with UTF-8 now.

      The Swedish registry is only using IDN. The reason for that is that UTF-8 in DNS is not an internet supported standard at all.

      http://www.xn--rksmrgs-5wao1o.se/ will work if you are using a recent Mozilla. (Slashdot should upgrade to at least ISO-8859-1 or UTF-8... I couldn't write raksmorgas.se correctly.)

      Microsoft are extremely slow in supporting IDN, and will probably not launch it until the next OS release, which is in 2006... There are plugins from Verisign.

      Do a good thing, release an open source plugin for MSIE.

    4. Re:Isn't there a better way? by Anguo · · Score: 3, Insightful
      It looks to me like the problem is that the DNS servers don't support unicode so they're using a bad implementation of it. Why not extend dns to support unicode? That way there'd be no translation or other crap to go through.

      It's not as simple as you may think. I am all for Unicode, but to use it for domain names can lead to unwanted consequences.

      There already exist some internationalized domain names in Chinese, so instead of having chopsticks.com we can have [insert chinese characters for chopsticks here].com.

      The problem comes from the fact that there are tens of thousands of different Chinese characters, each of those having a different unicode code but many of those being only slight variations of each other, or even so similar that a regular Chinese reading user wouldn't notice the difference at first glance. Thus you could have two very different websites having seemingly exactly the same name in Chinese but being different nonetheless because the unicode for their names is different.

      With only 26 letters and a few more characters, there have been many abuses of domain names, like www.microsoft-.com instead of www.microsoft.com (or some similar abuse), but the possibilities for abuse in Chinese are almost infinite. The same would be true in many European languages: many will not pay enough attention to the differences between an acute accent and a grave accent in French and might be misled to a different site than the one they were looking for. Imagine the credit card payment page of the bank Societe Generale: with the accents written backwards, a lookalike site can be created under a domain name that looks the same. The same in Polish, with the cedillas under many of the letters or the L, with or without the bar across it.

      Technically, unicode may be feasible, but human beings cannot distinguish between the hundreds of thousands (and more!) different letters, characters and other signs that it offers...

      --
      http://www.masquilier.org/republic/election/ Condorcet, Plurality voting and alternative voting enabled bulletin board.
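The lookalike problem described above can be made concrete with a small Python sketch. The spoofed name is an assumption built for illustration: it swaps one Latin letter for a Cyrillic one that most fonts render identically.

```python
import unicodedata

real = "paypal.com"
# Same visible shape, but the second letter is U+0430 rather than Latin "a".
spoof = "p\u0430ypal.com"

print(real == spoof)               # the strings are not equal
print(unicodedata.name(spoof[1]))  # names the substituted character

# The ASCII forms sent to DNS differ too, so the two names
# can be registered and resolved independently.
real_ascii = real.encode("idna")
spoof_ascii = spoof.encode("idna")
print(real_ascii, spoof_ascii)
```

The encoded form gives the spoof away, which is one argument for showing users the `xn--` name when scripts are mixed.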
    5. Re:Isn't there a better way? by Zeinfeld · · Score: 3, Insightful
      Paul Vickie (of BIND fame) has stated that supporting unicode in bind would probably require at least a year to implement, and could introduce new buffer overflow exploits.

      If Paul Vixie did say that, it would kinda argue for choosing that route rather than trying to get the IETF to agree to anything. So far it has been over five years since the start of this effort, and counting.

      The real problem is not fixing Bind, that is easy. Deploying bind updates and deploying compatible client updates is the real problem. It just isn't feasible.

      --
      Looking for an Information Security student project suggestion?
      Try http://dotcrimeManifesto.com/
    6. Re:Isn't there a better way? by Minna+Kirai · · Score: 4, Insightful

      Why not extend dns to support unicode?

      DNS should never get Unicode support, or any form of "internationalization" for that matter!

      DNS is supposed to be a way for humans to communicate with computers about internet hosts. The intent is not for some human to be able to read it, but for all humans. This has worked until now because hostnames were limited to only ~37 characters. Regardless of native language, any computer operator can quickly learn to handle the [a-z][0-9] glyphs. Basically anyone literate in one language can copy ASCII characters from a signpost onto a notepad, and then punch those into a keyboard. Even if her culture doesn't use the ASCII set in normal daily activities (which about everyone in America, Europe, and Japan does), then the shapes are at least simple enough to copy geometrically.

      But if 16-bit charsets are allowed in DNS, we could get hostnames composed of 3 Chinese characters and two Arabic ones, and which a Russian or Briton will be incapable of processing without tremendous pain.

      DNS is something that should be left in a "lowest common denominator" form, so that it's accessible to all of humanity (if they meet the low hurdle of operating a normal PC)

      Internationalized host identifiers in URLs will be important, of course. But they should be a separate layer implemented on top of DNS. DNS is a standard that already exists. Rather than changing the standard and breaking every single internet-using computer (the "flag day" scenario), a new system should be rolled out for people who want host identifiers in funny-looking squiggles.

  4. really dumb sounding by happyfrogcow · · Score: 4, Interesting

    I'm sorry, is it just me or do they seem to be taking a bad shortcut to get to a good end? It doesn't seem like they are doing this correctly. Why not plan to migrate to unicode? Their choice seems shortsighted and flawed. I hope they at least considered unicode and came up with real reasons why not to use it.

  5. Why not by Pingular · · Score: 4, Funny

    But I guess it would be difficult to talk about this on /., since it does not even support basic Latin-1
    Just say the ascii number?

    --

    When anger rises, think of the consequences.
    Confucius (551 BC - 479 BC)
  6. Useful? Naw. by grub · · Score: 4, Interesting


    I'm not sure what all the accents are on the alphabet, will I have to know to type them to access a simple website? Sorry, this doesn't make using the net easier.

    --
    Trolling is a art,
    1. Re:Useful? Naw. by tuffy · · Score: 4, Insightful

      If you don't know how to type the characters necessary to access the web site, chances are you won't be able to read the content anyway. So I think it's a moot point.

      --

      Ita erat quando hic adveni.

    2. Re:Useful? Naw. by McDutchie · · Score: 3, Insightful
      I'm not sure what all the accents are on the alphabet, will I have to know to type them to access a simple website?

      Never fear, oh monolingual one, I found this very handy site that will help solve this pesky problem for you. Try it some time and let us know what you think!

  7. Taco, why did you remove the accents from slashdot by Anonymous Coward · · Score: 5, Funny

    ,
    Taco est un mechant garcon.
    '

  8. Maybe not as useful as one might believe by Ryu2 · · Score: 4, Interesting

    While it's logical for, say, Chinese companies to have a Chinese domain name and Chinese e-mail addresses, it may not be the best choice if the company wishes to expand overseas.

    Unfortunate but true, if a company has a Chinese domain name, it would probably be only used within China, Taiwan, Hong Kong, Singapore, Japan (since it's unicode), and maybe South Korea. The company would be pretty much limited to the East Asia market.

    However, I suppose the company could get both a Chinese domain and an English, or rather Pinyin, domain so they could make their Chinese, or maybe other Asian clients feel "closer" while also being able to reach clients outside of East Asia.

    I also think that it'd be great to give people the option of having a native-language email address. It's not too hard to set up a romanized email alias for it. An SMTP "X-Roman-Address" header could even be added to outgoing messages in case a recipient can't read the default "From" line.

    --
    There's 10 types of people in this world, those who understand binary and those who don't.
  9. URLs that you cannot type by HermanZA · · Score: 3, Insightful
    That is sure to improve your hit rate no end...

    I sure hope this harebrained idea doesn't take off.

    1. Re:URLs that you cannot type by Scrameustache · · Score: 3, Insightful

      That is sure to improve your hit rate no end...

      URLs that you cannot type. But why would they want your hits if you can't even type their domain name? It's not like you'll be able to read the content if you get there, or understand their ads.

      --

      You can't take the sky from me...

  10. Companies will shell out more to registrars now by Arcturax · · Score: 4, Insightful

    After all, now they need not only worry about registering say...

    Microsoft.com
    Microsoft.net
    Microsoft.org
    Microsoft.tv
    etc..

    But also
    Microsoft.com
    Microsoft.com

    Well, you get the picture.

    --

    --Won't that be grand? Computers and the programs will start thinking and the people will stop. - Dr. Walter Gibbs
  11. IDN? Mozilla supports it by ospirata · · Score: 3, Informative

    I'm delighted to say that Mozilla is one step ahead again, and has supported IDN since version 0.9.5: http://www.mozilla.org/projects/intl/idn_mozilla.html

  12. Mixed feelings by f97tosc · · Score: 5, Informative

    I have mixed feelings about this. I am from Sweden, and it always looks kind of ugly when names lose their dots and circles in the domain name.

    On the other hand, this is also quite convenient. I live in the US now, and I travel around quite a bit. I often surf on Swedish Internet sites, typically without access to a Swedish keyboard. It would not be very convenient if the domain names used non-English symbols.

    Sometimes I go to Japanese sites also, and I am really glad that I don't have to install a Japanese word processor to do this...

    Tor

  13. Super Monkeys! by Speare · · Score: 5, Funny

    Any Internet RFC which includes the phrase, -with-SUPER-MONKEYS, has GOT to be good. (And in case you think I'm trolling, check the link.)

    --
    [ .sig file not found ]
  14. Re:FINALLY! by arcanumas · · Score: 5, Funny
    I'm glad to see that people other than Americans are being recognized on the internet. Which originally started as an American military project...

    I am glad to see others than the Mesopotamians using the wheel which was originally invented for use in Mesopotamia.

    --
    Slashdot Sig. version 0.1alpha. Use at your own risk.
  15. Punycode *is* a Unicode encoding. by Speare · · Score: 4, Informative

    Punycode *is* a Unicode encoding.

    Unicode has many encodings; UTF-8 is one encoding and Punycode is another. UTF-8 aims for efficiency when the majority of the text is ASCII, and Punycode aims for completeness when you must fit in a 63-character DNS label and use only the ASCII characters to do it.

    --
    [ .sig file not found ]
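To illustrate that Punycode is a lossless encoding of Unicode text, here is a short Python sketch using the standard library's `punycode` codec. It starts from the ASCII label in the .se URL quoted earlier in the thread; the variable names are just for illustration.

```python
# The ASCII label from the .se URL in the thread, minus the "xn--" prefix.
ascii_label = b"rksmrgs-5wao1o"

# Decoding recovers the Unicode label, accents included.
text = ascii_label.decode("punycode")
print(text)

# Encoding the Unicode label again gives back the same ASCII bytes.
round_trip = text.encode("punycode")
print(round_trip == ascii_label)

# With the 4-byte "xn--" prefix added, the label still fits in 63 bytes.
ace_label = b"xn--" + round_trip
print(len(ace_label) <= 63)
```

The round trip is what separates an encoding from a transliteration such as dropping the accents.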
  16. Subject to Approval by The_Systech · · Score: 3, Funny

    Yeah, but did anybody get Al Gore's approval to make these changes?

    --
    To err is human, but to really foul things up requires a computer
  17. it works fine on /. by GillBates0 · · Score: 4, Funny

    - - - - ..
    I, for one, welcome our new European overlords.

    --
    An Indian-American Hindu committed to non-violent thought/speech/action alarmed by the global explosion of radical Islam
  18. No change needed... by JohnGrahamCumming · · Score: 5, Informative

    > You think you know how to parse a domain name for validity?

    Yes, I do, and if you _read_ the RFC you'll see that nothing changes, these domain names are encoded into the same character set as the current DNS system. And hence if you give me a URL I can validate it with existing scripts. There's an example which shows that Bucher.ch (with an umlaut on the u) would be translated to: xn--bcher-kva.ch which looks totally parseable to me.

    John.
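The translation mentioned above can be reproduced with Python's built-in `idna` codec, shown here as a sketch rather than as what any registrar actually runs. That codec implements the original 2003 version of the IDNA rules.

```python
domain = "b\u00fccher.ch"  # "bücher.ch", the example from the comment

# Each non-ASCII label is converted and given the "xn--" prefix.
ascii_domain = domain.encode("idna")
print(ascii_domain)  # b'xn--bcher-kva.ch'

# The result is plain ASCII, so existing validation scripts can take it as-is.
print(ascii_domain.isascii())

# Decoding restores the name for display.
print(ascii_domain.decode("idna"))
```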

  19. I can't wait by nizo · · Score: 5, Funny

    Personally I can't wait to see funky chinese character domain names in my web logs (mostly from infected windows machines trying to attack my apache server).

  20. Reason by ajnlth · · Score: 3, Insightful
    I would guess that the reason for this rather than redesigning DNS to use Unicode is because of the still rather dominant presence of the USA on the internet.

    Since this solution doesn't break any old implementation, just the countries that need it will have to modify their software, and not wait for the slow and expensive process of changing all of DNS, which a large part of the 'net isn't motivated to pay for.

  21. Re:FINALLY! by cynicalmoose · · Score: 4, Funny

    The internet was built as a highly decentralised, noncontrolled network, so that, in the event of a nuclear war, military leaders would have unrivalled access to pornography. (3DTIAB)

    --
    Exercise your right not to vote. thinkoutside.org
  22. Well... it's still not perfect by Krach42 · · Score: 3, Interesting

    Ok, so you're mostly guaranteed a domain name if you own the trademark on the name. (To prevent cybersquatters, right?)

    Well, what about the .jp domain? How can they possibly handle this, since in Japan you cannot copyright latin characters. (Or at least as far as I've heard)

    This is the reasoning I've heard, as to why IBM is ai-bi-emu in Japan. And maikurosofuto, souni, etc. (roomaji transliteration there, sorry if you don't get why ai=I)

    So what do you do in this case? Unless they can enter Shift-JIS or Unicode URLs, then you're stuck having people enter roomaji versions of your name, which remember, aren't technically trademarkable.

    I'd love to hear I'm wrong on some point here, could anyone with more info clue me in?

    --

    I am unamerican, and proud of it!
  23. You RTFA by Krach42 · · Score: 4, Insightful
    The introduction of the new IDN (Internationalised Domain Name) standard does much more than permit umlauts. A total of 92 additional characters, from the French e to the Danish o, will adorn domains.


    This means that it can't possibly include ALL of the unicode spectrum, as Unicode supports far more than just 92 extra characters.

    Also, the way the coding is going to work, you still can't register a name with B.

    According to international rules, this is equivalent to its transcription as ss. It would simply not be possible to distinguish between the domains straBe.de and strasse.de.
    --

    I am unamerican, and proud of it!
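The equivalence quoted above can be observed with Python's built-in `idna` codec. This is a sketch under one assumption: that the 2003 rules the codec implements match what the registries in the article apply. Later revisions of the standard handle this character differently.

```python
with_eszett = "stra\u00dfe.de"  # spelled with the German sharp s
with_ss = "strasse.de"

# Name preparation folds the sharp s to "ss" before encoding,
# so both spellings end up as the same ASCII name.
a = with_eszett.encode("idna")
b = with_ss.encode("idna")
print(a, b, a == b)
```

Because the mapping happens before Punycode is applied, no `xn--` label is produced for this name at all.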
  24. Babylon 5 by uberdave · · Score: 3, Funny

    microsoft and microsoft for instance are two completely different words.

    Reminds me of that Babylon 5 episode when they find a person named Zathras down on this planet. Ivanova thought she had been talking to Zathras:

    "No, that was not Zathras, that was Zathras. There are 10 of us, all of family Zathras, each one named Zathras. Slight differences in how you pronounce. Zathras, Zathras, Zathras.. You are seeing now?" - Zathras, Babylon 5: Conflicts of Interest

  25. Re:Not to be Overly American... by Mnemia · · Score: 4, Insightful

    Yes, it is. Because it's not just a few "umlauts". When you're talking about Asian or other non-Romanized languages then the Romanization may be totally incomprehensible to even some speakers of that language. It's one thing to lose a few accent marks and such but it's quite another to translate your language into a totally incomprehensible and unrelated format. In fact in kanji based languages at the very least Romanization actually LOSES information. It's not just a matter of transcribing the sounds into another format because the kanji carry additional meaning not present in just the phonetic language. If you've ever seen two native Chinese or Japanese speakers talk to each other they frequently will "write" kanji in the air or on the palm of the other person's hand with their fingers because their spoken language is imprecise. These changes are very necessary for the Internet to become a truly international phenomenon.

  26. Backwards compatibility by Stephen+Samuel · · Score: 3, Informative
    Why not extend dns to support unicode? That way there'd be no translation or other crap to go through.

    Sounds like a great idea.... If you're willing to re-implement the DNS code in my Win-95 box.... or on my Amiga-4000. How about my 10 year old Apollo workstation or the SUN-3 that's still working just fine, thank you. etc. etc.

    A lot of old DNS implementations would choke (and properly so) on UTF-8 encoded DNS names. We probably could have seeded the needs of the future by saying that IP-6 DNS servers should support unicode, but I think that even that boat has been missed. (or is quickly leaving dock).

    In the meantime the old DNS and its anglo-centric presumptions and restrictions are with us for the next few years (or decades, as the case may be). Clearly some people feel the need to live within those restrictions.

    --
    Free Software: Like love, it grows best when given away.
  27. It's not for trolls. by dmelomed · · Score: 3, Insightful

    "djbdns doesn't support unicode either, although it doesn't rely on standard c-libraries, so unicode support might only take a few weeks to add."

    djbdns is 8-bit clean. Use UTF-8 all you want right now.

  28. This is important.. by k98sven · · Score: 4, Interesting

    Just to diverge, I'd like to represent the non-english speaker view here.

    In most of the languages with 'funny accents' like umlauts, these characters often have a completely different pronunciation, and are often considered to be a completely different letter than without the 'accent'.

    Simply 'brushing off the dirt' and removing the 'accent' thus changes the word. Sometimes with weird results.
    Just ask someone from the town of Moensteraas, Sweden.
    Their website contains mostly municipal information intended for swedes, but due to the restrictions of DNS, the name is instead spelt 'monsteras', which means 'monster-carcass' in Swedish.

    Obviously, these people would be happier spelling it with umlauts on the o, and a ring over the a.

  29. Re:Sorry, but this is really stupid... by dabadab · · Score: 4, Insightful

    You know, this arrogant, self-centric view does not help the discussion.
    Anyway, the current infrastructure DOES NOT have to be updated and this change is NOT intended to be "some jagoff's playground", but rather for the non-English speaking people - there are quite a few of them.

    --
    Real life is overrated.
  30. Accents like case? by davburns · · Score: 3, Insightful
    Perhaps I'm showing grave naivete, but it seems like it would be better to treat accents (dots, slashes and stuff) like case. DNS names are case insensitive, but case preserving. So, you can type all your fancy European characters if you want, but you don't have to mess with them if you're on a keyboard where that's difficult, and there's no additional opportunity for squatting or visual name hijacking. Naturally, you would want the accents to appear on reverse lookups (just like mixed case domain names work.)

    I know there are times when different accents sometimes indicate different words -- but I'm under the impression that it is unlikely that more than one of them would be a "good" domain name. (Am I wrong about that?)

    This won't work for non-latin characters, obviously. But UTF-8 seems like a better solution to that. (I understand that most Chinese words are 2-3 characters of 2-3 bytes (unified is U-430 to U-9fa and up to U-7ff is 2 characters) for 4-9 bytes -- clearly less than 63 bytes) The obvious downside is that it means that all DNS servers and resolvers must (at least!) be 8-bit clean.
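The "treat accents like case" idea could be sketched as follows in Python. The helper `fold_accents` is hypothetical, and the approach (Unicode decomposition, then dropping combining marks) is only one possible reading of the proposal, not part of any DNS standard.

```python
import unicodedata

def fold_accents(label: str) -> str:
    """Lowercase a label and drop combining marks (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", label.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Accented and unaccented spellings compare equal after folding.
print(fold_accents("M\u00f6nster\u00e5s"))
print(fold_accents("Monsteras"))

# Folding does nothing for non-Latin scripts; there, UTF-8 size is the issue.
word = "\u4e2d\u6587"  # two CJK characters
print(fold_accents(word) == word)
print(len(word.encode("utf-8")))  # bytes used out of the 63-byte label limit
```

Letters with no decomposition, such as the Polish barred L, pass through unchanged, so this kind of folding would need extra rules to cover every Latin-script language.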