Slashdot Mirror


Internationalized Domain Names Coming Soon

rduke15 writes "You think you know how to parse a domain name for validity? Well, in case you haven't noticed, things are getting tougher as registrars keep adopting IDN (Internationalized Domain Names), which uses a weird encoding named Punycode to enable accented characters in domain names. The Register reports about Switzerland, Germany and Austria's joint move to enable IDN. See the overview in English from Switch. But I guess it would be difficult to talk about this on /., since it does not even support basic Latin-1 ... :-)"

88 of 526 comments

  1. Ah great... by Worminater · · Score: 5, Insightful

    More ways for trolls to disguise goatse.cx links...

    1. Re:Ah great... by Hanzie · · Score: 2, Insightful

      The parent post is absolutely not flamebait. It actually brings up an extremely good point. There will unquestionably be domain squatting and misdirection with use of accented characters.

      --
      ********* sig: If you don't like the law, get filthy stinking rich, and buy a better one.
    2. Re:Ah great... by MikeXpop · · Score: 4, Insightful

      Heh. Worse than that. Imagine http://www.paypal.com/enteryourcreditcardnumberhere.php! How many people do you think that would fool? I'd guess a lot more than current sites do.

      --
      Etiquette is etiquette. He kills his mother but he can't wear grey trousers.
    3. Re:Ah great... by ceejayoz · · Score: 2, Funny

      I have modpoints right now, but I can't find the "-1, Dumbass" one... hmm...

    4. Re:Ah great... by Trejkaz · · Score: 3, Insightful

      Okay, you got me. That domain name is 100% pixel identical to www.paypal.com ... which letter is changed?

      --
      Karma: It's all a bunch of tree-huggin' hippy crap!
    5. Re:Ah great... by lisany · · Score: 2, Funny

      Maybe he works for Verisign and is planning to hijack the domain "For the good of the Internet."

  2. sounds like by Anonymous Coward · · Score: 3, Insightful

    Sounds like a job for Unicode.
    Unicode.org

  3. Isn't there a better way? by CTalkobt · · Score: 4, Interesting

    It looks to me like the problem is that the DNS servers don't support unicode, so they're using a bad implementation of it.

    Why not extend dns to support unicode? That way there'd be no translation or other crap to go through.

    Granted, software would need changing, but that would also be the case with the mangled crap that's mentioned in the article.

    What am I not understanding here? Or is this just an implementation dreamed up to make life complicated?

    --
    There's a gorilla from Manilla whose a fella that stinks of vanilla and has salmonella.
    1. Re:Isn't there a better way? by Anonymous Coward · · Score: 2, Funny
      Better still, why not just suck it up and get by with ascii? It only requires 1 byte per character, and is easy to memorize.

      Accented characters are so Old World and passe, anyway.

    2. Re:Isn't there a better way? by Horny+Smurf · · Score: 2, Interesting
      Paul Vixie (of BIND fame) has stated that supporting unicode in BIND would probably require at least a year to implement, and could introduce new buffer overflow exploits.

      djbdns doesn't support unicode either, although it doesn't rely on the standard C library, so unicode support might only take a few weeks to add.

      Unicode would be better than punycode, but punycode works with existing DNS client and server software.

    3. Re:Isn't there a better way? by wmshub · · Score: 4, Insightful

      Unicode-reinterpreted-as-a-string-of-ASCII-bytes (taken literally) can only mean UTF-7, which never really got much traction, but had no NULs or control characters in it: all pure, readable ASCII. Its problem in DNS would be that it treats upper and lower case as distinct, which is not true for current DNS queries. If you meant "UTF-8" when you said "unicode-reinterpreted-as-a-string-of-ASCII-bytes", that also has no NUL or control codes in it, and unlike UTF-7 it lets you treat upper/lower case any way you want. Its drawback is that it will insert bytes in the 128..255 (i.e., non-ASCII) range into the data stream, which will probably cause trouble for current DNS servers.

      So, to sum it up, you are right that current Unicode encodings will not meet current DNS RFCs, but the reason you gave wasn't quite right. Punycode does solve the problem, but ugh, punycode is an awful hack of a character encoding system. I'd hate to see it live on forever, but it might be useful getting us started on i18n-ified DNS.

    4. Re:Isn't there a better way? by Rob+Riggs · · Score: 4, Informative
      wouldn't UTF-8 have worked just as well?

      No. The problem that punycode solves is that the encoded DNS names are themselves valid RFC1034 DNS names. That is, even when encoded, standard DNS validity checkers will accept the name.

      UTF-8 does not have this property.
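
      A minimal Python sketch of that difference, under stated assumptions: the regular expression is a simplified letter-digit-hyphen check in the spirit of RFC 1034 (not a full validator), and the names LDH_LABEL and is_ldh_label are made up for illustration. The encoding itself uses Python's built-in punycode codec.

```python
import re

# Simplified letter-digit-hyphen rule for a single DNS label.
# Assumption: this only approximates the RFC 1034 preferred name syntax
# (relaxed to allow a leading digit); it is not a complete validator.
LDH_LABEL = re.compile(rb"^[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?$")

def is_ldh_label(label: bytes) -> bool:
    """Return True if the label uses only ASCII letters, digits and hyphens."""
    return LDH_LABEL.match(label) is not None

label = "b\u00fccher"                          # "bücher"
utf8_form = label.encode("utf-8")              # contains bytes >= 0x80
ace_form = b"xn--" + label.encode("punycode")  # ASCII-compatible form

print(is_ldh_label(utf8_form))  # False
print(is_ldh_label(ace_form))   # True
```

      The same old-style checker rejects the raw UTF-8 bytes and accepts the xn-- form, which is the whole trick.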

      --
      the growth in cynicism and rebellion has not been without cause
    5. Re:Isn't there a better way? by pawal · · Score: 4, Interesting

      There are _so_ many applications using the domain name system that feeding UTF-8 through it will break most of them. Except for perhaps Internet Explorer.

      The registries using UTF-8 (most notably .NU) are running IDN in parallel with UTF-8 now.

      The Swedish registry is only using IDN. The reason for that is that UTF-8 in DNS is not an internet supported standard at all.

      http://www.xn--rksmrgs-5wao1o.se/ will work if you are using a recent Mozilla. (Slashdot should upgrade to at least ISO-8859-1 or UTF-8... I couldn't write raksmorgas.se correctly.)

      Microsoft are extremely slow in supporting IDN, and will probably not launch it until the next OS release, which is in 2006... There are plugins from Verisign.

      Do a good thing, release an open source plugin for MSIE.

    6. Re:Isn't there a better way? by defMan · · Score: 2, Funny

      i18n-ified

      internationalization-ified? Wow.

    7. Re:Isn't there a better way? by Anguo · · Score: 3, Insightful
      It looks to me like the problem is that the DNS servers don't support unicode, so they're using a bad implementation of it. Why not extend dns to support unicode? That way there'd be no translation or other crap to go through.

      It's not as simple as you may think. I am all for Unicode, but to use it for domain names can lead to unwanted consequences.

      There already exist some internationalized domain names in Chinese, so instead of having chopsticks.com we can have [insert chinese characters for chopsticks here].com.

      The problem comes from the fact that there are tens of thousands of different Chinese characters, each of those having a different unicode code point, but many of those being only slight variations of each other, or even so similar that a regular Chinese-reading user wouldn't notice the difference at first glance. Thus you could have two very different websites having seemingly exactly the same name in Chinese but being different nonetheless, because the unicode for their names is different.

      With only 26 letters and a few more characters, there have been many abuses of domain names, like www.microsoft-.com instead of www.microsoft.com (or some similar abuse), but the possibilities for abuse in Chinese are almost infinite. The same would be true in many European languages: many will not pay enough attention to the difference between an acute accent and a grave accent in French and might be misled to a different site than the one they were looking for. Imagine the credit card payment page of the bank Societe Generale: with the accents written backwards, a lookalike site can be created under a domain name that looks the same. The same goes for Polish, with the cedillas under many of the letters, or the L, with or without the bar across it.

      Technically, unicode may be feasible, but human beings cannot distinguish between the hundreds of thousands (and more!) of different letters, characters and other signs that it offers...
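
      To make the accent example concrete, here is a small Python sketch. The two labels are invented for illustration (they are not real registrations); the code only shows that strings which look nearly alike are different code points, and so become different names once converted with Python's built-in idna codec.

```python
import unicodedata

# Two hypothetical labels that differ only in the direction of the accents.
acute = "g\u00e9n\u00e9rale"   # e with acute accent
grave = "g\u00e8n\u00e8rale"   # e with grave accent

print(acute == grave)                # False
print(unicodedata.name(acute[1]))    # LATIN SMALL LETTER E WITH ACUTE
print(unicodedata.name(grave[1]))    # LATIN SMALL LETTER E WITH GRAVE

# The ASCII-compatible forms that would actually be looked up in DNS.
ace_acute = acute.encode("idna")
ace_grave = grave.encode("idna")
print(ace_acute, ace_grave)
print(ace_acute != ace_grave)        # True
```

      A registry or a browser can tell the two apart trivially; the person squinting at the address bar is the weak link.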

      --
      http://www.masquilier.org/republic/election/ Condorcet, Plurality voting and alternative voting enabled bulletin board.
    8. Re:Isn't there a better way? by caluml · · Score: 2

      So if he'd started working on it over a year ago, it would be ready by now.

    9. Re:Isn't there a better way? by Zeinfeld · · Score: 3, Insightful
      Paul Vixie (of BIND fame) has stated that supporting unicode in BIND would probably require at least a year to implement, and could introduce new buffer overflow exploits.

      If Paul Vixie did say that, it would kinda argue for choosing that route rather than trying to get the IETF to agree to anything. So far it has been over five years since the start of this effort, and counting.

      The real problem is not fixing BIND; that is easy. Deploying BIND updates and deploying compatible client updates is the real problem. It just isn't feasible.

      --
      Looking for an Information Security student project suggestion?
      Try http://dotcrimeManifesto.com/
    10. Re:Isn't there a better way? by Minna+Kirai · · Score: 4, Insightful

      Why not extend dns to support unicode?

      DNS should never get Unicode support, or any form of "internationalization" for that matter!

      DNS is supposed to be a way for humans to communicate with computers about internet hosts. The intent is not for some human to be able to read it, but for all humans. This has worked until now because hostnames were limited to only ~37 characters. Regardless of native language, any computer operator can quickly learn to handle the [a-z][0-9] glyphs. Basically anyone literate in one language can copy ASCII characters from a signpost onto a notepad, and then punch those into a keyboard. Even if her culture doesn't use the ASCII set in normal daily activities (which about everyone in America, Europe, and Japan does), then the shapes are at least simple enough to copy geometrically.

      But if 16-bit charsets are allowed in DNS, we could get hostnames composed of 3 Chinese characters and two Arabic ones, and which a Russian or Briton will be incapable of processing without tremendous pain.

      DNS is something that should be left in a "lowest common denominator" form, so that it's accessible to all of humanity (if they meet the low hurdle of operating a normal PC)

      Internationalized host identifiers in URLs will be important, of course. But they should be a separate layer implemented on top of DNS. DNS is a standard that already exists. Rather than changing the standard and breaking every single internet-using computer (the "flag day" scenario), a new system should be rolled out for people who want host identifiers in funny-looking squiggles.

    11. Re:Isn't there a better way? by Anonymous Coward · · Score: 2, Insightful

      While I agree with the spirit of your post -- mainly that there should be internationalized domain names -- I do find fault with your argument.

      What the grandparent post was saying was that what makes the current DNS scheme universally accessible is its small codespace, not just that latin letters are used. While he did take a very anglocentric tone in his post -- which believe me, I have some issue with -- you failed to address the main issue here, which is a 16-bit codespace and its relative inaccessibility.

      I live in China and let me tell you, it took me several years to become appreciably literate in Chinese. 37 glyphs is not a lot to learn, and if DNS were based on 37 Chinese characters or whatever that would be fine, we could all learn that. But 37,000?

      Also, while I dislike the notion of non-globalized DNS, consider the facts: every keyboard on every computer in every nation can type those 37 characters. Moreover, the dominance of the western world today has ensured that there almost always exists some romanization system which the locals are vaguely familiar with. These systems may discard possibly vital information (for example, tones in Pinyin or umlauts on the Swedish town of Horby as mentioned by another poster), but they remain a) universally accessible and b) basically fairly easy to remember for all people.

      Let's be honest, computers and the computer world are (at least currently) very anglocentric. This is not right. In the future, hopefully, it will not be this way. And there are some places where using ascii is a pain in the butt for the locals.

      The argument that "if you can't type the domain name, you can't read the content" isn't a bad one, although it does require multi-lingual sites to register redundant domain names.

      I think it would be simpler, at least for those countries which use some superset of ascii as their local writing system, to have DNS simply map intelligently to ascii.

      For example, in Germany, an umlauted letter could be transparently mapped to the same letter, sans umlaut, followed by an e, and the sharp s could be mapped to two ss.

      In French, where no established system for representing accents exists, letters with accents could be simply mapped to their respective non-accented counterparts.
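
      A rough Python sketch of the mapping I have in mind. Everything here is hypothetical: the table GERMAN_MAP and both function names are made up for illustration, and a real scheme would need rules agreed on per language.

```python
import unicodedata

# Hypothetical table for the German convention described above.
GERMAN_MAP = {
    "\u00e4": "ae", "\u00f6": "oe", "\u00fc": "ue",   # a, o, u with umlaut
    "\u00c4": "Ae", "\u00d6": "Oe", "\u00dc": "Ue",   # capital forms
    "\u00df": "ss",                                   # sharp s
}

def german_to_ascii(name: str) -> str:
    """Replace each umlaut with vowel + 'e' and sharp s with 'ss'."""
    return "".join(GERMAN_MAP.get(ch, ch) for ch in name)

def strip_accents(name: str) -> str:
    """French-style mapping: decompose, then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(german_to_ascii("b\u00fccher"))                          # buecher
print(strip_accents("soci\u00e9t\u00e9 g\u00e9n\u00e9rale"))   # societe generale
```

      Note that the two rules disagree about the same letter (u with umlaut becomes "ue" in one and "u" in the other), so the resolver would have to know which language a name is in.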

      Because sometimes (and this often happens to me) you're on a computer that can display the characters but cannot type them; I'm French, and when I'm in the States and I write an e-mail home I just don't use the accents. It's annoying, yes. But it's comprehensible.

      But what if I needed to type an accent just to get to a news site, or something? I would need to either figure out how to type the character -- not always easy -- or I would have to find the character and copy/paste it. Annoying, to say the least.

      Much better would be the option of typing without the accent, and the option of typing with. Both would map "internally" to the same domain name. And we Europeans could get our accent fix.

      But for non-latin character sets (which IDN doesn't aim to support anyway, is eurocentrism really better than anglocentrism?) this system would of course not work, and so the locals would need to rely on officialized transliteration systems. But actually, you could certainly use a localized DNS system that did automatic transliteration (Chinese characters to pinyin, for example). That way the locals could use their character set, but internally, it would still just be ascii, ensuring that typing the URL for both the locals when abroad and for others (clients) would be possible without registering multiple domain names.

      What do you guys think?

  4. really dumb sounding by happyfrogcow · · Score: 4, Interesting

    I'm sorry, is it just me or do they seem to be taking a bad shortcut to get to a good end? It doesn't seem like they are doing this correctly. Why not plan to migrate to unicode? Their choice seems shortsighted and flawed. I hope they at least considered unicode and came up with real reasons why not to use it.

    1. Re:really dumb sounding by x+mani+x · · Score: 2, Informative

      They did obviously consider unicode, perhaps you did not RTFA. However their solution uses unicode at a different layer.

      I think the *real* solution here is to reimplement ALL top level DNS servers to support unicode. But the overhead in doing this, when you really think about it, seems difficult (ICANN approval, unicode related bugs, getting everyone to use a new DNS server, etc). At least, since the ASCII text supported by DNS is exactly the same in Unicode, backwards compatibility should not be a problem.

      This solution is a workaround that uses unicode at the client level, encodes it to "punycode" (which only contains characters supported by DNS, unlike, say, BASE-64 or Quoted-Printable), and sends the request to the DNS server. It is a quick and easy solution to a messy problem. But its hacky-ness makes me doubt it will be supported by whatever governing body influences this stuff (IETF, ICANN, etc).

      -Mani

  5. Why not by Pingular · · Score: 4, Funny

    But I guess it would be difficult to talk about this on /., since it does not even support basic Latin-1
    Just say the ascii number?

    --

    When anger rises, think of the consequences.
    Confucius (551 BC - 479 BC)
    1. Re:Why not by Golias · · Score: 2, Funny
      Euro: 0x450x750x24

      Accented a: 0x61

      "gze" sound: 0x670x7a0x65

      That was easy!

      --

      Information wants to be anthropomorphized.

    2. Re:Why not by gnu-generation-one · · Score: 2, Funny

      "Right... What's the ASCII code for the Euro sign? Or even accented "a"? How about the russian Gze?"

      Simple, just cut-and-paste them from Word, like those fantastically useful ?intelligent quotes? you keep seeing on people?s websites

  6. Useful? Naw. by grub · · Score: 4, Interesting


    I'm not sure what all the accents are on the alphabet, will I have to know to type them to access a simple website? Sorry, this doesn't make using the net easier.

    --
    Trolling is a art,
    1. Re:Useful? Naw. by tuffy · · Score: 4, Insightful

      If you don't know how to type the characters necessary to access the web site, chances are you won't be able to read the content anyway. So I think it's a moot point.

      --

      Ita erat quando hic adveni.

    2. Re:Useful? Naw. by ShecoDu · · Score: 2, Insightful

      As others have pointed out, if you don't use the accents, why would you want to visit a foreign-language page? If you happen to like the language, you can find a way to type the characters... Besides, there is always a way to use Google to locate the page and click on the link or something like that. You don't have to be so closed-minded: just because you don't find it useful doesn't mean everybody sees it the same way you do...

      Just as the moderator guideline says "focus on promoting instead of modding down", the same applies here: focus on the things you like and ignore those that don't mean a thing to you.

      If I happen to see a Slashdot post about war and I couldn't give a f*ck less, I would certainly just avoid looking inside. It's obvious I'm going to get irritated with the comments inside, but that doesn't mean they are wrong...

      By the way, I'm not trolling or being aggressive... I just express my point of view. =)

    3. Re:Useful? Naw. by Just+Some+Guy · · Score: 2, Insightful
      Bzzzt - wrong. You may not've travelled to countries with different "standard" keyboard layouts, but that's not going to help a Japanese businessman on a trip to Los Angeles figure out how to type the name of his company's website on a PC-104 setup. Put him on a Kanji keyboard and he'll be there in seconds. Give him a nice en.US layout and see what happens.

      What was your point again?

      --
      Dewey, what part of this looks like authorities should be involved?
    4. Re:Useful? Naw. by McDutchie · · Score: 3, Insightful
      I'm not sure what all the accents are on the alphabet, will I have to know to type them to access a simple website?

      Never fear, oh monolingual one, I found this very handy site that will help solve this pesky problem for you. Try it some time and let us know what you think!

    5. Re:Useful? Naw. by mijok · · Score: 2, Interesting

      Well, for non-English speakers it will make quite a big difference. Let me give you two funny and/or embarrassing examples, both municipality names in Sweden: Mönsterås and Hörby. As you (hopefully) can see, the first one has two dots over the "o" (called an "umlaut" in German, i.e. a form of the letter "o"; in Swedish it is considered a different letter of the alphabet) and a ring above the "a", and the latter name has two dots over the "o". Well, these municipalities have websites, and since they can't get the dots and the rings, the names are as follows: www.monsteras.se and www.horby.se. Now comes the funny and embarrassing part: since the names have become words which mean something, here are the translations: www.monstercarcass.se and www.hookervillage.se. Now, try to tell the not-so-internet-literate people what to type in their web browser and get some reactions :)

      --
      Karma. Moderation. Is my .sig good now?
  7. Oh great... by JoeLinux · · Score: 2, Funny

    Now the European Union will want everyone to click on the left side of the mouse, left-handers be damned.

    The French will demand that "bandwidth exceeded" errors be renamed to "(web page) surrenders"

    The Germans will try to take over the internet.

    In a sneak attack, the Iraqis will launch a massive DDOS attack, but accidentally hard-code localhost in the trojan. The Iraqi information guru will deny everything.

    1. Re:Oh great... by beebware · · Score: 2, Funny

      Whilst us poor Brits will just do everything President Bush's lapdog (aka Tony Blair) tells us to do.

  8. Taco, why did you remove the accents from slashdot by Anonymous Coward · · Score: 5, Funny

    ,
    Taco est un mechant garcon.
    '

  9. Maybe not as useful as one might believe by Ryu2 · · Score: 4, Interesting

    While it's logical for, say, Chinese companies to have a Chinese domain name and Chinese e-mail addresses, it may not be the best choice if the company wishes to expand overseas.

    Unfortunate but true, if a company has a Chinese domain name, it would probably be only used within China, Taiwan, Hong Kong, Singapore, Japan (since it's unicode), and maybe South Korea. The company would be pretty much limited to the East Asia market.

    However, I suppose the company could get both a Chinese domain and an English, or rather Pinyin, domain so they could make their Chinese, or maybe other Asian clients feel "closer" while also being able to reach clients outside of East Asia.

    I also think that it'd be great to give people the option of having a native-language email address. It's not too hard to set up a romanized email alias for it. An SMTP "X-Roman-Address" header could even be added to outgoing messages in case a recipient can't read the default "From" line.

    --
    There's 10 types of people in this world, those who understand binary and those who don't.
    1. Re:Maybe not as useful as one might believe by Scrameustache · · Score: 2, Interesting

      Unfortunate but true, if a company has a Chinese domain name, it would probably be only used within China, Taiwan, Hong Kong, Singapore, Japan (since it's unicode), and maybe South Korea. The company would be pretty much limited to the East Asia market.

      Yeah, they would "limit" themselves to the fastest growing economy in the world and a market of about 2 billion people...who'd want that?

      P.S. Why can't that company have a Chinese domain name and a roman-character domain name? Is there a law I don't know about?

      --

      You can't take the sky from me...

  10. URLs that you cannot type by HermanZA · · Score: 3, Insightful
    That is sure to improve your hit rate no end...

    I sure hope this harebrained idea doesn't take off.

    1. Re:URLs that you cannot type by Scrameustache · · Score: 3, Insightful

      That is sure to improve your hit rate no end...

      URLs that you cannot type. But why would they want your hits if you can't even type their domain name? It's not like you'll be able to read the content if you get there, or understand their ads.

      --

      You can't take the sky from me...

    2. Re:URLs that you cannot type by WegianWarrior · · Score: 2, Interesting

      Or how about URLs you have to spell differently than you spell the name of the company in question? That's a pretty harebrained idea, but one that very many* people online today have to live with. Take for instance Norwegians (as I happen to be one myself). The Norwegian alphabet consists of 29 letters: the old 26 from Latin (a-z) as well as three I can't show you here on /., since the site for some bizarre reason doesn't support them**. Therefore we're forced to use 'ae', 'oe' and 'aa'*** instead, opening the door to plenty more misunderstandings for _Norwegian_ websites catering to the _Norwegian_ public. And since I have yet to discover any online translator that can translate Norwegian into English, I dare say that the chance of any non-Norwegian needing to type the URL is slim at best.

      So frankly, you can have a big serving of STFU. If you don't see the point of this, you prolly will never use it anyway, or even notice. For those of us who actually care, this is pretty good news.

      __*) I would - without seeing any proof - guess that the majority of people online today do not speak English as their native tongue.
      _**) Other US sites do...
      ***) For those interested, the ascii-codes are 230, 248 and 229 for small letters, and 198, 216 and 197 for capitals.

      --
      Everything in the world is controlled by a small, evil group to which, unfortunately, no one you know belongs.
  11. Companies will shell out more to registrars now by Arcturax · · Score: 4, Insightful

    After all, now they need not only worry about registering say...

    Microsoft.com
    Microsoft.net
    Microsoft.org
    Microsoft.tv
    etc..

    But also
    Microsoft.com
    Microsoft.com

    Well, you get the picture.

    --

    --Won't that be grand? Computers and the programs will start thinking and the people will stop. - Dr. Walter Gibbs
  12. IDN? Mozilla supports it by ospirata · · Score: 3, Informative

    I'm delighted to report that Mozilla is one step ahead again, and has supported IDN since version 0.9.5: http://www.mozilla.org/projects/intl/idn_mozilla.html

  13. Mixed feelings by f97tosc · · Score: 5, Informative

    I have mixed feelings about this. I am from Sweden, and it always looks kind of ugly when names lose their dots and circles in the domain name.

    On the other hand, this is also quite convenient. I live in the US now, and I travel around quite a bit. I often surf on Swedish Internet sites, typically without access to a Swedish keyboard. It would not be very convenient if the domain names used non-English symbols.

    Sometimes I go to Japanese sites also, and I am really glad that I don't have to install a Japanese word processor to do this...

    Tor

    1. Re:Mixed feelings by HerbieStone · · Score: 2, Insightful
      That's why website owners will register their sites under two domains: the current one for English-keyboard users, and the (original) foreign-named domain.

      And that's also why registrars love it.

  14. Super Monkeys! by Speare · · Score: 5, Funny

    Any Internet RFC which includes the phrase, -with-SUPER-MONKEYS, has GOT to be good. (And in case you think I'm trolling, check the link.)

    --
    [ .sig file not found ]
  15. Re:FINALLY! by arcanumas · · Score: 5, Funny
    I'm glad to see that people other than Americans are being recognized on the internet. Which originally started as an American military project...

    I am glad to see people other than the Mesopotamians using the wheel, which was originally invented for use in Mesopotamia.

    --
    Slashdot Sig. version 0.1alpha. Use at your own risk.
  16. USA! by ekephart · · Score: 2, Funny

    U.S.A.!!! U.S.A.!!! U.S.A.!!!

    If it wasn't for us we'd all be speaking German. Wait.

    [ducks]

    --
    sig
  17. Punycode *is* a Unicode encoding. by Speare · · Score: 4, Informative

    Punycode *is* a Unicode encoding.

    Unicode has many encodings; UTF-8 is one encoding and Punycode is another. UTF-8 aims for efficiency when the majority of the text is ASCII, and Punycode aims for completeness when you must fit in 63 characters per label and use only the ASCII characters to do it.
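
    A short Python illustration of that claim, using only the codecs that ship with the language. The sample word is my own choice; nothing here comes from the RFC text itself.

```python
text = "b\u00fccher"  # "bücher"

# The same sequence of code points under three different encodings.
encoded = {
    "utf-8": text.encode("utf-8"),
    "utf-16-le": text.encode("utf-16-le"),
    "punycode": text.encode("punycode"),
}

for codec, raw in encoded.items():
    # Every encoding decodes back to the identical Unicode string.
    print(codec, raw, raw.decode(codec) == text)

# Only the Punycode form stays within 7-bit ASCII.
print(all(b < 0x80 for b in encoded["punycode"]))  # True
print(all(b < 0x80 for b in encoded["utf-8"]))     # False
```

    All three round-trip to the same string; they just make different trade-offs about which bytes end up on the wire.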

    --
    [ .sig file not found ]
  18. Subject to Approval by The_Systech · · Score: 3, Funny

    Yeah, but did anybody get Al Gore's approval to make these changes?

    --
    To err is human, but to really foul things up requires a computer
  19. Taking 1337-speek to a new level by RobertB-DC · · Score: 2, Informative

    Now I won't have to be limited to using a hyphen! I can register d[i-circ]xiechicks.com, or dixi[e-grave]chicks.com, or maybe dixie[c-cedil]hicks.com!

    That last one would be doubly good, because if I understand the Punycode spec correctly, it'll get translated to ASCII as dixiehicks-XXXX.com. Not my opinion of the group, but maybe it would attract hits from the Toby Keith crowd.
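
    For anyone who wants to check my reading of the spec, a quick Python sketch with the built-in punycode codec. The label is just my joke domain, and the variable names are arbitrary. As far as I can tell the real name would also carry the xn-- prefix in front.

```python
label = "dixie\u00e7hicks"  # "dixie" + c with cedilla + "hicks"

# Punycode copies the plain ASCII characters first, then a hyphen, then a
# suffix that encodes which non-ASCII characters go where.
encoded = label.encode("punycode").decode("ascii")
basic, _, suffix = encoded.rpartition("-")

print(basic)                                        # dixiehicks
print("xn--" + basic + "-" + suffix + ".com")       # full ASCII form
print(encoded.encode("ascii").decode("punycode") == label)  # True
```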

    --
    Stressed? Me? Of course not. Stress is what a rubber band feels before it breaks, silly.
  20. it works fine on /. by GillBates0 · · Score: 4, Funny

    - - - - ..
    I, for one, welcome our new European overlords.

    --
    An Indian-American Hindu committed to non-violent thought/speech/action alarmed by the global explosion of radical Islam
  21. Because by SweetAndSourJesus · · Score: 2, Funny

    It's ve
    r&#1 21; diff
    ic&#117 ;lt t
    o ty& #112;e 
    like &#1 16;h
    is

    --

    --
    the strongest word is still the word "free"
    1. Re:Because by Tackhead · · Score: 2, Funny
      > It's v&#101 ;
      r&#1 21; diff
      ic&#117 ;lt t
      o ty&; #112;e 
      like & #1 16;h
      is

      An opportunity to quote one of my favorite bits of .sigfodder of all time:

      Now, I knew this was coming, but that still didn't prepare me to actually see it. I'm looking at this thinking "You know, that couldn't be ANY MORE WRONG if it was in HTML with a .GIF of a psychotic nun in a bondage outfit clubbing a baby seal to death with an Al Gore doll." I mean, _ew_. Is that supposed to mean anything to ANYBODY? Can I put that address on an envelope and have it get delivered somewhere other than "Ampersand Incorporated"? WHAT IDIOT THINKS THAT THIS IS A GOOD IDEA?

      Huey, on news.admin.net-abuse.email, commenting on the same issue, over two and a half years ago.

  22. No change needed... by JohnGrahamCumming · · Score: 5, Informative

    > You think you know how to parse a domain name for validity?

    Yes, I do, and if you _read_ the RFC you'll see that nothing changes, these domain names are encoded into the same character set as the current DNS system. And hence if you give me a URL I can validate it with existing scripts. There's an example which shows that Bucher.ch (with an umlaut on the u) would be translated to: xn--bcher-kva.ch which looks totally parseable to me.
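
    A minimal Python sketch of the same translation, assuming the built-in "idna" codec (which implements the 2003 IDNA rules) is an acceptable stand-in for whatever your registrar or browser does:

```python
domain = "b\u00fccher.ch"  # "bücher.ch"

# The codec converts each dot-separated label on its own.
ace = domain.encode("idna")
print(ace)                  # b'xn--bcher-kva.ch'

# The result is plain ASCII, so an existing validator sees nothing new.
print(ace.isascii())        # True

# And the conversion is reversible for display purposes.
print(ace.decode("idna"))   # bücher.ch
```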

    John.

  23. I can't wait by nizo · · Score: 5, Funny

    Personally I can't wait to see funky chinese character domain names in my web logs (mostly from infected windows machines trying to attack my apache server).

  24. Re:Bad idea but bound to happen with todays thinki by TomV · · Score: 2, Funny

    thier
    ludacris
    femail
    curce
    mentaly


    ...after all, some people find just 26 letters and 0-9 hard enough already ;)

  25. Reason by ajnlth · · Score: 3, Insightful
    I would guess that the reason for this rather than redesigning DNS to use Unicode is because of the still rather dominant presence of the USA on the internet.

    Since this solution doesn't break any old implementation, just the countries that need it will have to modify their software, and not wait for the slow and expensive process of changing all of DNS, which a large part of the 'net isn't motivated to pay for.

  26. Just use Google by bstadil · · Score: 2, Insightful
    The whole issue of convenient domain names is a bit passé.

    Often-used URLs I have as bookmarks, and when I need some other site, it is much easier to make a guess via Google. What I am looking for is almost always on page one of Google's results.

    Surely Google could find a way to handle the special characters and make an intelligent suggestion, if nothing else based on the IP address of the request. If it is from Burundi, chances of needing a German umlaut are slim.

    --
    Help fight continental drift.
  27. Wrong way on a one-way track... by mishehu · · Score: 2, Insightful

    Let's assume (and I might not be correct in this assertion) that every computer in every country can at least type & see the 26 letters used in the English language plus digits 0-9 and the dash & period signs. However, I have no idea how to type anything coherent in Chinese Simplified or Traditional (hell, it's all Chinese to me...)...

    In the interest of fostering the best method to communicate your ideas, products, services, etc., would you not want to use the characters that most everybody can type?

    Oh, and this begs the next question - what about languages that go right-to-left instead of left-to-right? How about Thai, Arabic, and Hebrew? Personally, I don't want to see any domain names outside of the 26 chars used in English, 0-9, and the period & dash signs.

  28. Sorry, but this is really stupid... by coene · · Score: 2, Insightful

    "Yeah, let's make sure that every normal english domain name can easily be spoofed with accented characters, not to mention having everyone open up and hunt around charmap to get to these new domains"...

    This isn't going to be abused, AT ALL. Worst idea ever.

    The Internet (domain names, top-tier nameservers, nameserver software, web and e-mail server software, all markup documents) runs on English, there's no way to i18n it without opening up a world of hurt. Sorry, but I don't want to have to upgrade BIND to a whole new series of bugs and exploits just so that some jagoff can open up his own go~o`le'.com.

    1. Re:Sorry, but this is really stupid... by pawal · · Score: 2, Informative

      Nothing in the DNS infrastructure needs to be upgraded. There is only us-ascii in the zones. BUT, you have to upgrade your applications in order to read the names the way they are supposed to be read, otherwise you will end up with www.xn--rksmrgs-5wao1o.se instead of "www.raksmorgas.se".
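
      The application-side step can be sketched in Python, assuming the standard library's "idna" codec; to_display is a hypothetical helper name. The result is printed with ascii() so the escapes survive on pages and terminals that cannot show accents.

```python
def to_display(hostname: str) -> str:
    """Turn an ASCII (xn--) hostname back into its Unicode spelling."""
    return hostname.encode("ascii").decode("idna")


name = to_display("www.xn--rksmrgs-5wao1o.se")
print(ascii(name))

# Going the other way gives back exactly what is stored in the zone.
print(name.encode("idna"))
```

      An application that skips this step still resolves the name; it just shows the xn-- form to the user.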

    2. Re:Sorry, but this is really stupid... by dabadab · · Score: 4, Insightful

      You know, this arrogant, self-centric view does not help the discussion.
      Anyway, the current infrastructure DOES NOT have to be updated and this change is NOT intended to be "some jagoff's playground", but rather for the non-English speaking people - there are quite a few of them.

      --
      Real life is overrated.
  29. Re:Bad idea but bound to happen with todays thinki by isaac338 · · Score: 2, Insightful

    The funny part is you'd probably be the first to complain had the Internet been designed by some foreign country and you couldn't register a plain English URL. Learning a whole new language isn't a "little learning curve", it's actually pretty hard.

    if you can't handle a little learning curce to access the info, IMO you aren't capable mentaly of doing anything with the info once you access it.

    Next time you go to a country the native language of which you can't understand, try planning your whole trip without once reading an English translation of any map or sign. Then you possibly might see how ignorant that statement sounds.

    The Internet is a world-wide resource, and like it or not, people who speak other languages have a say in how it works too.

    isaac

  30. Re:FINALLY! by cynicalmoose · · Score: 4, Funny

    The internet was built as a highly decentralised, noncontrolled network, so that, in the event of a nuclear war, military leaders would have unrivalled access to pornography. (3DTIAB)

    --
    Exercise your right not to vote. thinkoutside.org
  31. Well... it's still not perfect by Krach42 · · Score: 3, Interesting

    Ok, so you're mostly guaranteed a domain name if you own the trademark on the name. (To prevent cybersquatters, right?)

    Well, what about the .jp domain? How can they possibly handle this, since in Japan you cannot trademark Latin characters. (Or at least as far as I've heard)

    This is the reasoning I've heard, as to why IBM is ai-bi-emu in Japan. And maikurosofuto, souni, etc. (roomaji transliteration there, sorry if you don't get why ai=I)

    So what do you do in this case? Unless they can enter Shift-JIS or Unicode URLs, then you're stuck having people enter roomaji versions of your name, which remember, aren't technically trademarkable.

    I'd love to hear I'm wrong on some point here, could anyone with more info clue me in?

    --

    I am unamerican, and proud of it!
  32. Well, it had to happen sometime...I guess by The+Spanish+Ninja · · Score: 2, Insightful

    It looks to me like this isn't really going to be such a big deal. Their domain names are going to be converted for DNS anyway, so it's not like we would have to type in a complicated string of characters that aren't on our keyboards. So we can't remember what to type so easily, so what? That's why we have bookmarks. Besides, this isn't really for us anyway. Its purpose seems to be to allow the people in other countries to use their own native languages for their own domain names. Easier for them, right? And if we want to access their domains, we just have to remember a few extra letters and dashes. No big deal. They get to do stuff in their language, we translate to ours, the whole world speaks, and maybe something gets done.

    --
    "I like you, but I wouldn't want to see you working with subatomic particles."
  33. Re:Bad idea but bound to happen with todays thinki by mkiesila · · Score: 2, Interesting

    Good day to answer to a troll, here goes...

    26 letters and 0-9 are not the best way to communicate with a computer if your native language has more than 26 letters in its alphabet. It's not about being insulted or offended, it's about being understood. The computer speaks all natural languages equally badly, after all.

    Let's think for a minute about an average Nordic webshop owner who sells beds online, operating for example in Finland or Sweden. He wants to sell stuff to the native dwellers and hence needs a domain name that has an "a" with two dots on top of it so that the domain name for bed is spelled correctly in Swedish or Finnish. It might surprise some people, but there are quite a lot of people who don't speak a single word of English. So the people who he wishes to sell beds to A) know how to spell "bed" in their native language and B) have a key like that on their keyboards, and, *gasp* prefer to use correct spelling when referring to things!

    So you don't have an "a" with two dots on your keyboard? That's just too bad, but then again you probably don't speak Finnish too well either. Why would you want to visit that e-bedshop then?

  34. Use utf-8 instead of 'punycode'. by blitz487 · · Score: 2

    That's what utf-8 is for. Why on earth invent yet another encoding?

  35. You RTFA by Krach42 · · Score: 4, Insightful
    The introduction of the new IDN (Internationalised Domain Name) standard does much more than permit umlauts. A total of 92 additional characters, from the French e to the Danish o, will adorn domains.


    This means that it can't possibly include ALL of the unicode spectrum, as Unicode supports far more than just 92 extra characters.

    Also, the way the coding is going to work, you still can't register a name with B.

    According to international rules, this is equivalent to its transcription as ss. It would simply not be possible to distinguish between the domains straBe.de and strasse.de.
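
    The equivalence described in the quote can be checked with a short Python sketch, assuming the standard library's "idna" codec, which applies the IDNA 2003 preparation step. The sharp s is written as the escape \u00df because this page strips such characters, and to_ascii is a hypothetical helper name.

```python
def to_ascii(hostname: str) -> str:
    """Encode a hostname with Python's built-in IDNA 2003 codec."""
    return hostname.encode("idna").decode("ascii")


with_sharp_s = to_ascii("stra\u00dfe.de")
with_double_s = to_ascii("strasse.de")

print(with_sharp_s)                   # strasse.de
print(with_sharp_s == with_double_s)  # True
```

    Under these rules both spellings collapse to the same ASCII name before the DNS ever sees them, so only one registration can exist.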
    --

    I am unamerican, and proud of it!
    1. Re:You RTFA by Krach42 · · Score: 2, Informative

      Actually, I'm aware of that, but Slashdot seems to have stripped out the accents from my stuff...

      I am aware that the German scharf s is not a capital B. I had it correctly in my submission, but someone who was working on the slashcode thought it would be a good idea to eliminate accents, rather than to possibly HTMLize them.

      Try it yourself, put in an scharf s into a Slashdot comment, and see what happens.

      I notice that you DIDN'T complain about the missing accent on the French e, or the missing slash through the Swedish o.

      Now, as a speaker of German for 10 years, I'm going to leave it at that.

      --

      I am unamerican, and proud of it!
  36. Babylon 5 by uberdave · · Score: 3, Funny

    microsoft and microsoft for instance are two completly diffrent words.

    Reminds me of that Babylon 5 episode when they find a person named Zathras down on this planet. Ivanova thought she had been talking to Zathras:

    "No, that was not Zathras, that was Zathras. There are 10 of us, all of family Zathras, each one named Zathras. Slight differences in how you pronounce. Zathras, Zathras, Zathras.. You are seeing now?" - Zathras, Babylon 5: Conflicts of Interest

  37. Re:FINALLY! by jea6 · · Score: 2, Interesting

    The last time I checked, binary had zero, so an off-hand uninformed (slightly prejudiced) comment as yours is even dumber when you actually think about it.

    For the Maya, zero was not just a placeholder. It signified the concept of an absence of value, a.k.a. an empty set.

    http://en.wikipedia.org/wiki/Zero

    History
    The numeral or digit zero is used in numeral systems, where the position of a digit signifies its value, with successive positions having higher values, and the digit zero is used to skip a position. By about 300 BCE the Babylonians used two slanted wedges to mark an empty place in a given sequence of positional digits. It did not function in the true sense of a number. The use of zero as a number unto itself was introduced into mathematics relatively late by Indian mathematicians. An early study of the zero by Brahmagupta dates to 628.

    Zero was also used as a numeral in Pre-Columbian Mesoamerica. It was used by the Olmec and subsequent civilizations; see also: Maya numerals.

    The ancient Maya civilization used a vigesimal (base-20) numeral system.

    A vigesimal numeral system has a base of twenty.

    --

    sarchasm: The gulf between the author of sarcastic wit and the person who doesn't get it.
  38. Re:Not to be Overly American... by Mnemia · · Score: 4, Insightful

    Yes, it is. Because it's not just a few "umlauts". When you're talking about Asian or other non-Romanized languages then the Romanization may be totally incomprehensible to even some speakers of that language. It's one thing to lose a few accent marks and such but it's quite another to translate your language into a totally incomprehensible and unrelated format. In fact in kanji based languages at the very least Romanization actually LOSES information. It's not just a matter of transcribing the sounds into another format because the kanji carry additional meaning not present in just the phonetic language. If you've ever seen two native Chinese or Japanese speakers talk to each other they frequently will "write" kanji in the air or on the palm of the other person's hand with their fingers because their spoken language is imprecise. These changes are very necessary for the Internet to become a truly international phenomenon.

  39. Backwards compatability by Stephen+Samuel · · Score: 3, Informative
    Why not extend dns to support unicode? That way they'd be no translation or other crap to go through.

    Sounds like a great idea.... If you're willing to re-implement the DNS code in my Win-95 box.... or on my Amiga-4000. How about my 10 year old Apollo workstation or the SUN-3 that's still working just fine, thank you. etc. etc.

    A lot of old DNS implementations would choke (and properly so) on UTF-8 encoded DNS names. We probably could have seeded the needs of the future by saying that IP-6 DNS servers should support unicode, but I think that even that boat has been missed. (or is quickly leaving dock).

    In the meantime the old DNS and its anglo-centric presumptions and restrictions are with us for the next few years (or decades, as the case may be). Clearly some people feel the need to live within those restrictions.

    --
    Free Software: Like love, it grows best when given away.
  40. Who types URLs? by Royster · · Score: 2, Insightful

    Geeks do, but your average surfer does not. They go clicky clicky on the results returned by the search engine or clicky clicky on the link someone emailed them or clicky clicky on the link from some other website.

    Most users don't even *know* that you can type stuff in the Address field.

    --
    I have discovered a truly marvelous sig, unfortunately the sig limit is too small to contain i
  41. Re:FINALLY! by McDutchie · · Score: 2, Insightful
    I'm glad to see that people other than Americans are being recognized on the internet. Which originally started as an American military project...

    I'm glad to see that people other than the Swiss are being recognized on the web. Which originally started as a Swiss scientific project...

    Without the rest of the world, the Internet would have been obsolete and irrelevant by now. Deal.

  42. It's not for trolls. by dmelomed · · Score: 3, Insightful

    "djbdns doesn't support unicode either, although it doesn't rely on standard c-libraries, so unicode support might only take a few weeks to add."

    djbdns is 8-bit clean. Use UTF-8 all you want right now.

    1. Re:It's not for trolls. by divec · · Score: 2, Insightful
      Are you claiming that if I use UTF-8 to encode a string, I will never get a bytestring that contains a 46 (that is, a dot ".")?

      That's correct - no unicode codepoint apart from [FULL STOP] will cause a \x2E to appear in a UTF-8 stream. UTF-8 encodes the first 128 code points of Unicode using the identical ASCII values (which all have the eighth bit set to 0), and then only using combinations of the other 128 byte values (which all have the eighth bit set to 1) to encode every other character. It's very cool - that's why existing software doesn't usually need much modification to support UTF-8.
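
      For anyone who would rather check than take it on faith, here is a brute-force Python sketch; the function name is made up. It encodes every code point above the ASCII range (skipping surrogates, which cannot be encoded on their own) and confirms that none of the resulting bytes falls below 0x80, so a stray 0x2E can never appear.

```python
def utf8_never_fakes_ascii() -> bool:
    """Return True if no non-ASCII code point yields an ASCII byte in UTF-8."""
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not valid standalone characters
        if any(byte < 0x80 for byte in chr(cp).encode("utf-8")):
            return False
    return True


print(utf8_never_fakes_ascii())  # True

# The dot itself still encodes to the single byte 0x2E.
print(".".encode("utf-8") == b"\x2e")  # True
```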
      --

      perl -e 'fork||print for split//,"hahahaha"'

    2. Re:It's not for trolls. by Carewolf · · Score: 2, Insightful

      Yes he is not only claiming that. He is right and you should look up your facts.

      UTF-8 only uses non-ASCII values to produce non-ASCII characters. That's one of the things that make it really neat, and easy to convert to. It also means that you can jump into a UTF-8 stream at any point without getting out of sync and receiving trash. This makes it more powerful than UTF-16.
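
      The resynchronisation property can be sketched in a few lines of Python. Continuation bytes in UTF-8 always have the bit pattern 10xxxxxx, so a reader dropped at an arbitrary offset only needs to skip those until it reaches the start of the next character. The function name resync is an illustrative choice.

```python
def resync(data: bytes, offset: int) -> int:
    """Advance offset to the next byte that starts a UTF-8 character."""
    while offset < len(data) and (data[offset] & 0xC0) == 0x80:
        offset += 1
    return offset


# "a", e-acute, euro sign, "b": 1 + 2 + 3 + 1 bytes.
data = "a\u00e9\u20acb".encode("utf-8")

start = resync(data, 4)        # offset 4 lands inside the euro sign
print(start)                   # 6
print(data[start:].decode("utf-8"))  # b
```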

  43. Compatibility question by Psychic+Burrito · · Score: 2, Interesting
    Does anybody know if this will just work "out of the box" with every computer that can produce umlauts?

    I'm asking because today, I've tried out the Netsol way of doing umlauts and they don't work at all with my Mac OS X and Safari: None of the listed domains work. The page lists a "plugin" that every web user is supposed to install, but it's Win only (of course...) and it's quite silly to have a domain with umlauts if you have to tell all your customers "before visiting me, please install this plugin"...

    Any idea if this new way works in all circumstances where the user has an international keyboard? Thanks!

  44. I for one... by corebreech · · Score: 2, Redundant

    ...most certainly do not welcome our new Unicode-munging overlords.

    I don't care what the issues are. I have had it up to HERE with charset issues! ENOUGH ALREADY!

    If you can't do it using UTF-8, don't do it at all!

    Dammit.

  45. Re:A Step In The Right Direction by geoffspear · · Score: 2, Funny

    Bah. The ancient Greeks didn't need any accents, why should we?

    --
    Don't blame me; I'm never given mod points.
  46. Spoof with accented characters by Cardbox · · Score: 2, Informative

    There's no need to put accents on things, you can spoof just as well without. For example: the Greek omicron, Russian lowercase o, and Latin lowercase o all look identical... but they are all different Unicode characters!
    Unless the registries all implement some sort of canonicalization, owners of domain names containing the letter "o" are going to have a combinatorial explosion!
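
    A rough Python sketch of the problem, using only the three look-alikes named above. The LOOKALIKES table and spoof_variants are hypothetical names, and a real confusables list would be far longer.

```python
import itertools
import unicodedata

# Latin o, Greek omicron, Cyrillic o: same glyph, three code points.
LOOKALIKES = {"o": ("o", "\u03bf", "\u043e")}


def spoof_variants(name: str) -> list[str]:
    """List every spelling obtained by swapping in look-alike letters."""
    choices = [LOOKALIKES.get(ch, (ch,)) for ch in name]
    return ["".join(combo) for combo in itertools.product(*choices)]


for ch in LOOKALIKES["o"]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

print(len(spoof_variants("google")))  # 9
```

    With three choices per "o", a name containing n of them has 3**n visually identical spellings, which is the combinatorial explosion in question.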

  47. This is important.. by k98sven · · Score: 4, Interesting

    Just to diverge, I'd like to represent the non-English speaker view here.

    In most of the languages with 'funny accents' like umlauts, these characters often have a completely different pronunciation, and are often considered to be a completely different letter than without the 'accent'.

    Simply 'brushing off the dirt' and removing the 'accent' thus changes the word. Sometimes with weird results.
    Just ask someone from the town of Moensteraas, Sweden.
    Their website contains mostly municipal information intended for Swedes, but due to the restrictions of DNS, the name is instead spelt 'monsteras', which means 'monster-carcass' in Swedish.

    Obviously, these people would be happier spelling it with umlauts on the o, and a ring over the a.

  48. Multiple Non-Technical Problems by angedinoir · · Score: 2, Interesting

    First of all, this opens a huge hole for url hijacking and obfuscation.

    Say for instance, you get a spam that has a url to http://www.microsoft.com/freeoffers

    You too were tricked, but you'll notice that instead of a normal i, it is instead replaced with an accented i or an i with a grave (slashdot strips these btw). Anyone that doesn't use accents (English, Japanese, Chinese, etc) probably won't catch the minor detail and will probably think that it's really pointing to www.microsoft.com.

    This is very similar to, but less obvious than using:

    http://www.microsoft.com@via.gra.biz/offers

    Most non-tech internet users will also believe this to be Microsoft's web-site. Spammers will have a field day with all of the new opportunities.
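
    How the "@" trick reads to software can be shown with Python's standard urllib.parse; real_host is a made-up helper name. Everything before the "@" is treated as login information, not as the server.

```python
from urllib.parse import urlsplit


def real_host(url: str) -> str:
    """Return the host a client would actually connect to."""
    return urlsplit(url).hostname


url = "http://www.microsoft.com@via.gra.biz/offers"
print(urlsplit(url).username)  # www.microsoft.com
print(real_host(url))          # via.gra.biz
```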

    The second non-technical problem is that, say, I want to go to a Japanese web-site that doesn't have an English URL. If I don't know kana/kanji (as most people outside Japan don't), then I don't know what letters to type in to get the correct Japanese. I would have to get a dictionary and look up each character to figure out what to type.

    I agree that it's lame to only have it in English, but at this point, any country that uses the internet already has the ability to type English, but now they will need to be able to type in Japanese, Chinese, Russian, Greek, etc, etc, etc....

  49. Re:Can we have punctuation while we are at it? by Tazzy531 · · Score: 2, Informative
    There are technical reasons for disallowing certain characters. They are "reserved characters" in URLs.
    • The ? signifies the end of the path and the beginning of the parameters.
    • The & delimits the parameters.
    • The % is used for escapes [i.e. %20 is a space in URL parameters].
    • The = is the assignment operator in URL parameters.
    • The # introduces link anchors.


    There are a couple others, but I don't remember them offhand... So in other words, these characters are unusable for a reason.
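
    The roles listed above can be seen with Python's standard urllib.parse. This is only a sketch: example.com, the "q" and "lang" parameter names and build_search_url are placeholders, not anything from a real site.

```python
from urllib.parse import parse_qs, quote, urlsplit


def build_search_url(term: str) -> str:
    """Build a URL whose query value is safely percent-escaped."""
    # safe="" forces every reserved character in the value to be escaped.
    return "http://example.com/search?q=" + quote(term, safe="") + "&lang=en#results"


url = build_search_url("a&b=c #1?")
parts = urlsplit(url)

print(url)                    # reserved characters in the value become %XX
print(parts.path)             # /search
print(parse_qs(parts.query))  # {'q': ['a&b=c #1?'], 'lang': ['en']}
print(parts.fragment)         # results
```

    Left unescaped, the "&", "=", "#" and "?" inside the search term would be read as structure rather than data, which is why they cannot be used freely.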
    --


    _______________________________
    "I'm not Conceited...I'm just a realist..."
  50. Accecents like case? by davburns · · Score: 3, Insightful
    Perhaps I'm showing grave naivete, but it seems like it would be better to treat accents (dots, slashes and stuff) like case. DNS names are case insensitive, but case preserving. So, you can type all your fancy European characters if you want, but you don't have to mess with them if you're on a keyboard where that's difficult, and there's no additional opportunity for squatting or visual name hijacking. Naturally, you would want the accents to appear on reverse lookups (just like mixed case domain names work.)

    I know there are times when different accents indicate different words -- but I'm under the impression that it is unlikely that more than one of them would be a "good" domain name. (Am I wrong about that?)
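
    A sketch of what such accent-insensitive matching might look like in Python, using the standard unicodedata module; fold_accents is a hypothetical name and this is not how DNS or IDN actually compares names. It decomposes each letter, drops the combining marks, and then folds case.

```python
import unicodedata


def fold_accents(label: str) -> str:
    """Strip combining accents and fold case for comparison purposes."""
    decomposed = unicodedata.normalize("NFD", label)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


print(fold_accents("M\u00dcNCHEN"))               # munchen
print(fold_accents("Z\u00fcrich") == "zurich")    # True

# The o-with-slash has no decomposition, so it passes through untouched.
print(fold_accents("\u00f8") == "\u00f8")         # True
```

    The last line shows one limit of the idea: dots and rings fold away, but slashed letters would need an explicit mapping table.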

    This won't work for non-Latin characters, obviously. But UTF-8 seems like a better solution to that. (I understand that most Chinese words are 2-3 characters of 3 bytes each (the unified ideographs run from U+4E00 to U+9FA5, and only code points up to U+07FF fit in 2 bytes) for 6-9 bytes -- clearly less than 63 bytes) The obvious downside is that it means that all DNS servers and resolvers must (at least!) be 8-bit clean.
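
    The byte counts are easy to check in Python. In this sketch utf8_len and fits_in_label are made-up names, 63 is the DNS limit on a single label, and the two ideographs are just a sample two-character word written as escapes.

```python
MAX_LABEL_BYTES = 63  # DNS limit for one dot-separated label


def utf8_len(label: str) -> int:
    """Number of bytes the label occupies when encoded as UTF-8."""
    return len(label.encode("utf-8"))


def fits_in_label(label: str) -> bool:
    return utf8_len(label) <= MAX_LABEL_BYTES


for sample in ("\u00e9", "\u07ff", "\u0800", "\u4e2d", "\u4e2d\u6587"):
    print(ascii(sample), utf8_len(sample))

print(fits_in_label("\u4e2d\u6587"))  # True
```

    Code points up to U+07FF take two bytes and everything from U+0800 to U+FFFF takes three, so a 63-byte label holds at most 21 ideographs from that range.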

  51. Arrrgggh by nnnneedles · · Score: 2, Funny
    The intent is not for some human to be able to read it, but for all humans.

    Ohhh the arrogance of Americans.

    Here's an example of why this is good. In Sweden there is a town called Horby. That's 'o' with two dots over it. Their site has to be named 'horby' as it is now (without the dots). Horby means 'the village of whores' in Swedish.

    Do you think that billions of people who use other alphabets than the American one, are going to agree with anything you said in your post?

    This change IS a big deal, not only for small towns, but for loads of big companies, government websites and all kinds of sites you can think of.

    --
    Will code a sig generator for food
  52. Translitteration by k98sven · · Score: 2, Informative

    Why monsteras instead of moensteraas?

    Good question. Basically, people don't think, or are too lazy to transliterate the letters properly.

    Some places have the forethought to register both:
    Munich in Germany has registered both "munchen.de" and "muenchen.de".
    (But it's really a u with an umlaut)
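
    The difference between transliterating and merely dropping the accents can be sketched in Python. The TRANSLIT table below is an assumption covering only the conventional ASCII spellings used in this thread (ae, oe, ue, aa); it is not an official standard, and both function names are made up.

```python
import unicodedata

# Assumed mapping, limited to the letters discussed in this thread.
TRANSLIT = str.maketrans({
    "\u00e4": "ae",  # a with umlaut
    "\u00f6": "oe",  # o with umlaut
    "\u00fc": "ue",  # u with umlaut
    "\u00e5": "aa",  # a with ring
})


def transliterate(name: str) -> str:
    """Replace accented letters with their two-letter ASCII spellings."""
    return name.lower().translate(TRANSLIT)


def strip_accents(name: str) -> str:
    """The lazy version: just drop the accents."""
    decomposed = unicodedata.normalize("NFD", name.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


town = "M\u00f6nster\u00e5s"
print(transliterate(town))   # moensteraas
print(strip_accents(town))   # monsteras
print(transliterate("M\u00fcnchen"), strip_accents("M\u00fcnchen"))  # muenchen munchen
```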

  53. Re:Example by rduke15 · · Score: 2, Informative

    http://www.xn--rksmrgs-5wao1o.se/ will work if you are using a recent Mozilla

    Thanks for the example. Let's do a few quick tests.

    The encoded version always works, and leads to a page where you have an unencoded link (normal spelling with the accents).

    Copied the unencoded version, and tried:

    On WinXP:

    - Mozilla 1.4 : OK
    - MSIE 6, Opera 6.2 : NO

    On Linux - Red Hat 6.2 (of course, that's a pretty old system):

    - lynx, ping, host, dig, ... : NO
    (cannot test Mozilla, since this server has no GUI.)

    Well, I guess we'll have to live with that horrible Punycode.