Slashdot Mirror


Internationalized Domain Names Coming Soon

rduke15 writes "You think you know how to parse a domain name for validity? Well, in case you haven't noticed, things are getting tougher as registrars keep adopting IDN (Internationalized Domain Names), which uses a weird encoding named Punycode to enable accented characters in domain names. The Register reports about Switzerland, Germany and Austria's joint move to enable IDN. See the overview in English from Switch. But I guess it would be difficult to talk about this on /., since it does not even support basic Latin-1 ... :-)"
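For readers who have not met Punycode before, here is a minimal sketch using the `punycode` and `idna` codecs that ship with Python's standard library (they follow the original IDNA 2003 rules). The label `bücher` is only an illustrative choice and does not come from the article.

```python
# Encode one non-ASCII label the way IDN does.
label = "bücher"  # illustrative label, not taken from the article

raw_punycode = label.encode("punycode")  # bare Punycode, no prefix
ace_label = label.encode("idna")         # same thing plus the "xn--" prefix

print(raw_punycode)               # b'bcher-kva'
print(ace_label)                  # b'xn--bcher-kva'
print(ace_label.decode("idna"))   # bücher
```

The ASCII form is what actually travels through DNS; the accented form is only what the user sees.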

33 of 526 comments

  1. Ah great... by Worminater · · Score: 5, Insightful

    More ways for trolls to disguise goatse.cx links...

    1. Re:Ah great... by Hanzie · · Score: 2, Insightful

      The parent post is absolutely not flamebait. It actually brings up an extremely good point. There will unquestionably be domain squatting and misdirection with use of accented characters.

      --
      ********* sig: If you don't like the law, get filthy stinking rich, and buy a better one.
    2. Re:Ah great... by MikeXpop · · Score: 4, Insightful

      Heh. Worse than that. Imagine http://www.paypal.com/enteryourcreditcardnumberhere.php! How many people do you think that would fool? I'd guess a lot more than such sites fool now.

      --
      Etiquette is etiquette. He kills his mother but he can't wear grey trousers.
    3. Re:Ah great... by Trejkaz · · Score: 3, Insightful

      Okay, you got me. That domain name is 100% pixel identical to www.paypal.com ... which letter is changed?

      --
      Karma: It's all a bunch of tree-huggin' hippy crap!
  2. sounds like by Anonymous Coward · · Score: 3, Insightful

    Sounds like a job for Unicode.
    Unicode.org

  3. URLs that you cannot type by HermanZA · · Score: 3, Insightful
    That is sure to improve your hit rate no end...

    I sure hope this harebrained idea doesn't take off.

    1. Re:URLs that you cannot type by Scrameustache · · Score: 3, Insightful

      That is sure to improve your hit rate no end...

      URLs that you cannot type. But why would they want your hits if you can't even type their domain name? It's not like you'll be able to read the content if you get there, or understand their ads.

      --

      You can't take the sky from me...

  4. Companies will shell out more to registrars now by Arcturax · · Score: 4, Insightful

    After all, now they need not only worry about registering say...

    Microsoft.com
    Microsoft.net
    Microsoft.org
    Microsoft.tv
    etc..

    But also
    Microsoft.com
    Microsoft.com

    Well, you get the picture.

    --

    --Won't that be grand? Computers and the programs will start thinking and the people will stop. - Dr. Walter Gibbs
  5. Re:Useful? Naw. by tuffy · · Score: 4, Insightful

    If you don't know how to type the characters necessary to access the web site, chances are you won't be able to read the content anyway. So I think it's a moot point.

    --

    Ita erat quando hic adveni.

  6. Re:Mixed feelings by HerbieStone · · Score: 2, Insightful
    That's why website owners will register their sites under two domains: the current one for English-keyboard users, and the (original) foreign-named domain.

    And that's also why registrars love it.

  7. Reason by ajnlth · · Score: 3, Insightful
    I would guess that the reason for this, rather than redesigning DNS to use Unicode, is the still rather dominant presence of the USA on the internet.

    Since this solution doesn't break any old implementation, just the countries that need it will have to modify their software, and they won't have to wait for the slow and expensive process of changing all of DNS, which a large part of the 'net isn't motivated to pay for.

  8. Just use Google by bstadil · · Score: 2, Insightful
    The whole issue of convenient domain names is a bit passé.

    Often-used URLs I have as bookmarks, and when I need some other site, it is much easier to make a guess via Google. What I am looking for is almost always on page one of Google's choices.

    Surely Google could find a way to handle the special characters and make an intelligent suggestion, if nothing else based on the IP address of the request. If it is from Burundi, the chances of needing a German umlaut are slim.

    --
    Help fight continental drift.
  9. Re:Useful? Naw. by ShecoDu · · Score: 2, Insightful

    As others have pointed out, if you don't use the accents, why would you want to visit a foreign-language page? If you happen to like the language, you can find a way to type the characters. Besides, there is always a way to use Google to locate the page and click on the link or something like that. You don't have to be so closed-minded: just because you don't find it useful doesn't mean everybody will see it the same way you do.

    Just as the moderator guideline says "focus on promoting instead of modding down", the same applies here: focus on the things you like and ignore those that don't mean a thing to you.

    If I happened to see a Slashdot post about war and I couldn't give a f*ck less, I would certainly just avoid looking inside. It's obvious I'd get irritated with the comments inside, but that doesn't mean they are wrong...

    By the way, I'm not trolling or being aggressive... I just express my point of view. =)

  10. Wrong way on a one-way track... by mishehu · · Score: 2, Insightful

    Let's assume (and I might not be correct in this assertion) that every computer in every country can at least type & see the 26 letters used in the English language plus digits 0-9 and the dash & period signs. However, I have no idea how to type anything coherent in Chinese Simplified or Traditional (hell, it's all Chinese to me...)...

    In the interest of fostering the best method to communicate your ideas, products, services, etc., would you not want to use the characters that most everybody can type?

    Oh, and this begs the next question - what about languages that go right-to-left instead of left-to-right? How about Thai, Arabic, and Hebrew? Personally, I don't want to see any domain names outside of the 26 chars used in English, 0-9, and the period & dash signs.

  11. Sorry, but this is really stupid... by coene · · Score: 2, Insightful

    "Yeah, let's make sure that every normal english domain name can easily be spoofed with accented characters, not to mention having everyone open up and hunt around charmap to get to these new domains"...

    This isn't going to be abused, AT ALL. Worst idea ever.

    The Internet (domain names, top-tier nameservers, nameserver software, web and e-mail server software, all markup documents) runs on English; there's no way to i18n it without opening up a world of hurt. Sorry, but I don't want to have to upgrade BIND to a whole new series of bugs and exploits just so that some jagoff can open up his own go~o`le'.com.

    1. Re:Sorry, but this is really stupid... by dabadab · · Score: 4, Insightful

      You know, this arrogant, self-centric view does not help the discussion.
      Anyway, the current infrastructure DOES NOT have to be updated, and this change is NOT intended to be "some jagoff's playground", but rather is for the non-English-speaking people - there are quite a few of them.

      --
      Real life is overrated.
  12. Re:Bad idea but bound to happen with todays thinki by isaac338 · · Score: 2, Insightful

    The funny part is you'd probably be the first to complain had the Internet been designed by some foreign country and you couldn't register a plain English URL. Learning a whole new language isn't a "little learning curve", it's actually pretty hard.

    if you can't handle a little learning curve to access the info, IMO you aren't capable mentally of doing anything with the info once you access it.

    Next time you go to a country the native language of which you can't understand, try planning your whole trip without once reading an English translation of any map or sign. Then you possibly might see how ignorant that statement sounds.

    The Internet is a world-wide resource, and like it or not, people who speak other languages have a say in how it works too.

    isaac

  13. Well, it had to happen sometime...I guess by The+Spanish+Ninja · · Score: 2, Insightful

    It looks to me like this isn't really going to be such a big deal. Their domain names are going to be converted for DNS anyway, so it's not like we would have to type in a complicated string of characters that aren't on our keyboards. So we can't remember what to type so easily, so what? That's why we have bookmarks. Besides, this isn't really for us anyway. Its purpose seems to be to allow the people in other countries to use their own native languages for their own domain names. Easier for them, right? And if we want to access their domains, we just have to remember a few extra letters and dashes. No big deal. They get to do stuff in their language, we translate to ours, the whole world speaks, and maybe something gets done.

    --
    "I like you, but I wouldn't want to see you working with subatomic particles."
  14. You RTFA by Krach42 · · Score: 4, Insightful
    The introduction of the new IDN (Internationalised Domain Name) standard does much more than permit umlauts. A total of 92 additional characters, from the French é to the Danish ø, will adorn domains.


    This means that it can't possibly include ALL of the Unicode spectrum, as Unicode supports far more than just 92 extra characters.

    Also, the way the coding is going to work, you still can't register a name with ß.

    According to international rules, this is equivalent to its transcription as ss. It would simply not be possible to distinguish between the domains straße.de and strasse.de.
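
    The sharp-s equivalence quoted above can be checked with a small sketch. It assumes Python's built-in `idna` codec, which follows the original IDNA 2003 rules; later revisions of the standard treat this character differently, so take this as an illustration of the behaviour described in the article rather than of every resolver.

```python
# Under IDNA 2003, the Nameprep step maps the sharp s to "ss"
# before any Punycode encoding happens.
with_sharp_s = "straße.de".encode("idna")
with_double_s = "strasse.de".encode("idna")

print(with_sharp_s)                   # b'strasse.de'
print(with_double_s)                  # b'strasse.de'
print(with_sharp_s == with_double_s)  # True
```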
    --

    I am unamerican, and proud of it!
  15. Re:Isn't there a better way? by wmshub · · Score: 4, Insightful

    Unicode-reinterpreted-as-a-string-of-ASCII-bytes (taken literally) can only mean UTF-7, which never really got much traction, but had no NULs or control characters in it - all pure, readable ASCII. Its problem in DNS would be that it treats upper and lower case as distinct, which is not true for current DNS queries. If you meant "UTF-8" when you said "unicode-reinterpreted-as-a-string-of-ASCII-bytes", that also has no NUL or control codes in it, and unlike UTF-7 it lets you treat upper/lower case any way you want. Its drawback is that it will insert bytes in the 128..255 (i.e., non-ASCII) range into the data stream, which will probably cause trouble for current DNS servers.

    So, to sum it up, you are right that current Unicode encodings will not meet current DNS RFCs, but the reason you gave wasn't quite right. Punycode does solve the problem, but ugh, Punycode is an awful hack of a character encoding system. I'd hate to see it live on forever, but it might be useful for getting us started on i18n-ified DNS.
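
    A rough way to see the difference between the three encodings discussed above, using only codecs from Python's standard library. The label `zürich` and the helper `is_ascii` are illustrative assumptions, not anything from the RFCs.

```python
name = "zürich"  # illustrative label

as_utf7 = name.encode("utf-7")  # pure ASCII, but its base64 runs are case-sensitive
as_utf8 = name.encode("utf-8")  # contains bytes in the 128..255 range
as_ace = name.encode("idna")    # Punycode with the "xn--" prefix

def is_ascii(data: bytes) -> bool:
    """True if every byte has the eighth bit clear."""
    return all(byte < 0x80 for byte in data)

for label, data in [("UTF-7", as_utf7), ("UTF-8", as_utf8), ("Punycode", as_ace)]:
    print(f"{label:9} {data!r:22} ascii={is_ascii(data)}")
```

    Only the UTF-8 form needs an 8-bit clean path; the other two stay inside what existing DNS software already accepts.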

  16. Re:Not to be Overly American... by Mnemia · · Score: 4, Insightful

    Yes, it is. Because it's not just a few "umlauts". When you're talking about Asian or other non-Romanized languages then the Romanization may be totally incomprehensible to even some speakers of that language. It's one thing to lose a few accent marks and such but it's quite another to translate your language into a totally incomprehensible and unrelated format. In fact in kanji-based languages at the very least Romanization actually LOSES information. It's not just a matter of transcribing the sounds into another format because the kanji carry additional meaning not present in just the phonetic language. If you've ever seen two native Chinese or Japanese speakers talk to each other they frequently will "write" kanji in the air or on the palm of the other person's hand with their fingers because their spoken language is imprecise. These changes are very necessary for the Internet to become a truly international phenomenon.

  17. Who types URLs? by Royster · · Score: 2, Insightful

    Geeks do, but your average surfer does not. They go clicky clicky on the results returned by the search engine or clicky clicky on the link someone emailed them or clicky clicky on the link from some other website.

    Most users don't even *know* that you can type stuff in the Address field.

    --
    I have discovered a truly marvelous sig, unfortunately the sig limit is too small to contain i
  18. Re:FINALLY! by McDutchie · · Score: 2, Insightful
    I'm glad to see that people other than Americans are being recognized on the internet. Which originally started as an American military project...

    I'm glad to see that people other than the Swiss are being recognized on the web. Which originally started as a Swiss scientific project...

    Without the rest of the world, the Internet would have been obsolete and irrelevant by now. Deal.

  19. It's not for trolls. by dmelomed · · Score: 3, Insightful

    "djbdns doesn't support unicode either, although it doesn't rely on standard c-libraries, so unicode support might only take a few weeks to add."

    djbdns is 8-bit clean. Use UTF-8 all you want right now.

    1. Re:It's not for trolls. by divec · · Score: 2, Insightful
      Are you claiming that if I use UTF-8 to encode a string, I will never get a bytestring that contains a 46 (that is, a dot ".")?

      That's correct - no unicode codepoint apart from [FULL STOP] will cause a \x2E to appear in a UTF-8 stream. UTF-8 encodes the first 128 code points of Unicode using the identical ASCII values (which all have the eighth bit set to 0), and then only using combinations of the other 128 byte values (which all have the eighth bit set to 1) to encode every other character. It's very cool - that's why existing software doesn't usually need much modification to support UTF-8.
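
      The claim above can be checked by brute force. This is only a sketch: it walks every Unicode code point (skipping surrogates, which cannot be encoded in UTF-8) and looks for a non-ASCII character whose encoding contains any byte below 0x80, which would include the dot, 0x2E.

```python
offenders = []
for cp in range(0x110000):
    if 0xD800 <= cp <= 0xDFFF:
        continue  # surrogates are not encodable in UTF-8
    encoded = chr(cp).encode("utf-8")
    # A non-ASCII character must never produce an ASCII-range byte.
    if cp >= 0x80 and any(byte < 0x80 for byte in encoded):
        offenders.append(cp)

print(len(offenders))                 # 0
print(".".encode("utf-8") == b"\x2e")  # True: only FULL STOP yields 0x2E
```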
      --

      perl -e 'fork||print for split//,"hahahaha"'

    2. Re:It's not for trolls. by Carewolf · · Score: 2, Insightful

      Yes, he is claiming that. Not only that, he is right, and you should look up your facts.

      UTF-8 only uses non-ASCII byte values to produce non-ASCII characters. That's one of the things that make it really neat, and easy to convert to. It also means that you can jump into a UTF-8 stream at any point without getting out of sync and receiving trash. This makes it more powerful than UTF-16.
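
      The "jump in at any point" property comes from the fact that continuation bytes always look like 10xxxxxx. A minimal sketch follows; the function name `resync` and the sample string are hypothetical, chosen only for illustration.

```python
def resync(buf: bytes, pos: int) -> int:
    """Return the index of the first character boundary at or after pos."""
    # Continuation bytes have the bit pattern 10xxxxxx; skip over them.
    while pos < len(buf) and (buf[pos] & 0xC0) == 0x80:
        pos += 1
    return pos

data = "zürich.ch".encode("utf-8")   # b'z\xc3\xbcrich.ch'
start = resync(data, 2)              # index 2 is the second byte of "ü"
print(start)                         # 3
print(data[start:].decode("utf-8"))  # rich.ch
```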

  20. Re:Isn't there a better way? by Anguo · · Score: 3, Insightful
    It looks to me like the problem is that the DNS servers don't support unicode so they're using a bad implementation of it. Why not extend dns to support unicode? That way there'd be no translation or other crap to go through.

    It's not as simple as you may think. I am all for Unicode, but to use it for domain names can lead to unwanted consequences.

    There already exist some internationalized domain names in Chinese, so instead of having chopsticks.com we can have [insert chinese characters for chopsticks here].com.

    The problem comes from the fact that there are tens of thousands of different Chinese characters, each of those having a different Unicode code point, but many of those being only slight variations of each other, or even so similar that a regular Chinese-reading user wouldn't notice the difference at first glance. Thus you could have two very different websites having seemingly exactly the same name in Chinese but being different nonetheless because the Unicode for their names is different.

    With only 26 letters and a few more characters, there have been many abuses of domain names, like www.microsoft-.com instead of www.microsoft.com (or some similar abuse), but the possibilities for abuse in Chinese are almost infinite. The same would be true in many European languages: many will not pay enough attention to the difference between an acute accent and a grave accent in French and might be misled to a different site than the one they were looking for. Imagine the credit card payment page of the bank Société Générale: with the accents written backwards, a lookalike site can be created under a domain name that looks the same. The same goes for Polish, with the cedillas under many of the letters or the L, with or without the bar across it.

    Technically, Unicode may be feasible, but human beings cannot distinguish between the hundreds of thousands (and more!) of different letters, characters and other signs that it offers...
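
    To make the lookalike problem concrete, here is a small sketch. It uses a Latin/Cyrillic pair instead of the Chinese or French cases described above, purely because that pair is easy to show; the spoofed name is a made-up illustration.

```python
import unicodedata

real = "paypal.com"
fake = "p\u0430yp\u0430l.com"  # U+0430 is CYRILLIC SMALL LETTER A

print(real == fake)              # False, even though they render alike
print(unicodedata.name(real[1]))  # LATIN SMALL LETTER A
print(unicodedata.name(fake[1]))  # CYRILLIC SMALL LETTER A

# On the wire the two names are clearly different ASCII strings.
print(real.encode("idna"))
print(fake.encode("idna"))
```

    The ASCII form gives software something to compare, but it does nothing for a human looking at the rendered name.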

    --
    http://www.masquilier.org/republic/election/ Condorcet, Plurality voting and alternative voting enabled bulletin board.
  21. Re:Useful? Naw. by Just+Some+Guy · · Score: 2, Insightful
    Bzzzt - wrong. You may not've travelled to countries with different "standard" keyboard layouts, but that's not going to help a Japanese businessman on a trip to Los Angeles figure out how to type the name of his company's website on a PC-104 setup. Put him on a Kanji keyboard and he'll be there in seconds. Give him a nice en.US layout and see what happens.

    What was your point again?

    --
    Dewey, what part of this looks like authorities should be involved?
  22. Re:Useful? Naw. by McDutchie · · Score: 3, Insightful
    I'm not sure what all the accents are on the alphabet, will I have to know to type them to access a simple website?

    Never fear, oh monolingual one, I found this very handy site that will help solve this pesky problem for you. Try it some time and let us know what you think!

  23. Re:Isn't there a better way? by Zeinfeld · · Score: 3, Insightful
    Paul Vickie (of BIND fame) has stated that supporting unicode in bind would probably require at least a year to implement, and could introduce new buffer overflow exploits.

    If Paul Vixie did say that, it would kinda argue for choosing that route rather than trying to get the IETF to agree to anything. So far it has been over five years since the start of this effort, and counting.

    The real problem is not fixing BIND, that is easy. Deploying BIND updates and deploying compatible client updates is the real problem. It just isn't feasible.

    --
    Looking for an Information Security student project suggestion?
    Try http://dotcrimeManifesto.com/
  24. Re:Isn't there a better way? by Minna+Kirai · · Score: 4, Insightful

    Why not extend dns to support unicode?

    DNS should never get Unicode support, or any form of "internationalization" for that matter!

    DNS is supposed to be a way for humans to communicate with computers about internet hosts. The intent is not for some human to be able to read it, but for all humans. This has worked until now because hostnames were limited to only ~37 characters. Regardless of native language, any computer operator can quickly learn to handle the [a-z][0-9] glyphs. Basically anyone literate in one language can copy ASCII characters from a signpost onto a notepad, and then punch those into a keyboard. Even if her culture doesn't use the ASCII set in normal daily activities (which about everyone in America, Europe, and Japan does), then the shapes are at least simple enough to copy geometrically.
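
    The ~37-character alphabet described above (letters, digits and the hyphen) is easy to validate, which is part of its appeal. A minimal sketch, assuming the usual limits of 63 bytes per label and 253 for the whole name; `is_ldh_hostname` is a hypothetical helper name.

```python
import re

# One label: letters, digits, hyphen; no hyphen at either end; 1 to 63 long.
LDH_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")

def is_ldh_hostname(name: str) -> bool:
    """Check a hostname against the classic letter-digit-hyphen rule."""
    if name.endswith("."):
        name = name[:-1]  # allow one trailing root dot
    if not name or len(name) > 253:
        return False
    return all(LDH_LABEL.fullmatch(label) for label in name.split("."))

print(is_ldh_hostname("www.microsoft.com"))  # True
print(is_ldh_hostname("zürich.ch"))          # False
print(is_ldh_hostname("xn--zrich-kva.ch"))   # True
```

    Note that a Punycode name passes this check unchanged, which is why IDN can ride on top of DNS without a flag day.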

    But if 16-bit charsets are allowed in DNS, we could get hostnames composed of 3 Chinese characters and two Arabic ones, and which a Russian or Briton will be incapable of processing without tremendous pain.

    DNS is something that should be left in a "lowest common denominator" form, so that it's accessible to all of humanity (if they meet the low hurdle of operating a normal PC)

    Internationalized host identifiers in URLs will be important, of course. But they should be a separate layer implemented on top of DNS. DNS is a standard that already exists. Rather than changing the standard and breaking every single internet-using computer (the "flag day" scenario), a new system should be rolled out for people who want host identifiers in funny-looking squiggles.

  25. Accents like case? by davburns · · Score: 3, Insightful
    Perhaps I'm showing grave naivete, but it seems like it would be better to treat accents (dots, slashes and stuff) like case. DNS names are case insensitive, but case preserving. So, you can type all your fancy European characters if you want, but you don't have to mess with them if you're on a keyboard where that's difficult, and there's no additional opportunity for squatting or visual name hijacking. Naturally, you would want the accents to appear on reverse lookups (just like mixed case domain names work.)
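
    The idea above amounts to comparing names after folding away both case and accents. A rough sketch using Unicode decomposition from Python's standard library; this is not how IDN actually works, and `fold_accents` is a hypothetical helper.

```python
import unicodedata

def fold_accents(label: str) -> str:
    """Lowercase a label and drop combining marks (accents, umlauts, ...)."""
    decomposed = unicodedata.normalize("NFD", label)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()

print(fold_accents("Société"))                          # societe
print(fold_accents("ZÜRICH") == fold_accents("zurich"))  # True
print(fold_accents("ø"))                                # ø (no decomposition)
```

    Letters such as ø or ß have no decomposition, so a real scheme would still need an explicit table for them.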

    I know there are times when different accents sometimes indicate different words -- but I'm under the impression that it is unlikely that more than one of them would be a "good" domain name. (Am I wrong about that?)

    This won't work for non-Latin characters, obviously. But UTF-8 seems like a better solution to that. (I understand that most Chinese words are 2-3 characters of 2-3 bytes (unified is U-430 to U-9fa and up to U-7ff is 2 characters) for 4-9 bytes -- clearly less than 63 bytes.) The obvious downside is that it means that all DNS servers and resolvers must (at least!) be 8-bit clean.
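
    The byte arithmetic is easy to check. In UTF-8, code points up to U+07FF take at most 2 bytes and those from U+0800 to U+FFFF take 3, so a typical Chinese character costs 3 bytes. A small sketch with an illustrative two-character word:

```python
MAX_LABEL_BYTES = 63  # DNS limit on a single label

def utf8_width(ch: str) -> int:
    """Number of bytes one character occupies in UTF-8."""
    return len(ch.encode("utf-8"))

word = "筷子"  # "chopsticks", chosen only as an example
widths = [utf8_width(ch) for ch in word]
total = sum(widths)

print(widths, total, total <= MAX_LABEL_BYTES)        # [3, 3] 6 True
print(utf8_width("\u07ff"), utf8_width("\u0800"))      # 2 3
```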

  26. Re:Isn't there a better way? by Anonymous Coward · · Score: 2, Insightful

    While I agree with the spirit of your post -- mainly that there should be internationalized domain names -- I do find fault with your argument.

    What the grandparent post was saying was that what makes the current DNS scheme universally accessible is its small codespace, not just that Latin letters are used. While he did take a very anglocentric tone in his post -- which believe me, I have some issue with -- you failed to address the main issue here, which is a 16-bit codespace and its relative inaccessibility.

    I live in China and let me tell you, it took me several years to become appreciably literate in Chinese. 37 glyphs is not a lot to learn, and if DNS were based on 37 Chinese characters or whatever that would be fine, we could all learn that. But 37,000?

    Also, while I dislike the notion of non-globalized DNS, consider the facts: every keyboard on every computer in every nation can type those 37 characters. Moreover, the dominance of the western world today has ensured that there almost always exists some romanization system which the locals are vaguely familiar with. These systems may discard possibly vital information (for example, tones in Pinyin or umlauts on the Swedish town of Hörby as mentioned by another poster), but they remain a) universally accessible and b) basically fairly easy to remember for all people.

    Let's be honest, computers and the computer world are (at least currently) very anglocentric. This is not right. In the future, hopefully, it will not be this way. And there are some places where using ASCII is a pain in the butt for the locals.

    The argument that "if you can't type the domain name, you can't read the content" isn't a bad one, although it does require multi-lingual sites to register redundant domain names.

    I think it would be simpler, at least for those countries which use some superset of ASCII as their local writing system, to have DNS simply map intelligently to ASCII.

    For example, in Germany, an umlauted letter could be transparently mapped to the same letter, sans umlaut, followed by an e, and the sharp s could be mapped to two ss.

    In French, where no established system for representing accents exists, letters with accents could be simply mapped to their respective non-accented counterparts.
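
    The German half of this proposal could look roughly like the sketch below. The mapping table and the name `german_to_ascii` are illustrative assumptions; this is the poster's suggestion made concrete, not what IDN does.

```python
# Conventional German transcriptions for the umlauts and the sharp s.
GERMAN_MAP = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})

def german_to_ascii(label: str) -> str:
    """Map a German label to its plain-ASCII transcription."""
    return label.lower().translate(GERMAN_MAP)

print(german_to_ascii("MÜNCHEN"))  # muenchen
print(german_to_ascii("Straße"))   # strasse
print(german_to_ascii("müller") == german_to_ascii("mueller"))  # True
```

    The last line shows the catch: the mapping is not reversible, so "müller" and "mueller" can no longer be distinct registrations.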

    Because sometimes (and this often happens to me) you're on a computer that can display the characters but cannot type them; I'm French and when I'm in the States and I write an e-mail home I just don't use the accents. It's annoying, yes. But it's comprehensible.

    But what if I needed to type an accent just to get to a news site, or something? I would need to either figure out how to type the character -- not always easy -- or I would have to find the character and copy/paste it. Annoying, to say the least.

    Much better would be the option of typing without the accent, and the option of typing with. Both would map "internally" to the same domain name. And we Europeans could get our accent fix.

    But for non-Latin character sets (which IDN doesn't aim to support anyway; is eurocentrism really better than anglocentrism?) this system would of course not work, and so the locals would need to rely on officialized transliteration systems. But actually, you could certainly use a localized DNS system that did automatic transliteration (Chinese characters to Pinyin, for example). That way the locals could use their character set, but internally, it would still just be ASCII, ensuring that typing the URL for both the locals when abroad and for others (clients) would be possible without registering multiple domain names.

    What do you guys think?