Slashdot Mirror


ICANN Approves Non-Latin ccTLDs

Several readers including alphadogg tipped the news that ICANN has approved non-Latin ccTLDs at its meeting in Seoul. "Starting in mid-November, countries and territories will be able to apply to show domain names in their native language, a major technical tweak to the Internet designed to increase language accessibility. On Friday, the Internet's addressing authority approved a Fast-Track Process for applying for an IDN (Internationalized Domain Name) and will begin accepting applications on Nov. 16. The move comes after years of technical testing and policy development... Currently, domain names can only be displayed using the Latin alphabet letters A-Z, the digits 0-9 and the hyphen, but in future countries will be able to display country-code Top Level Domains (cc TLDs) in their native language. ... 'The usability of IDNs may be limited, as not all application software is capable of working with IDNs,' ICANN said in a 59-page proposal (PDF) dated Sept. 30 that describes the [application] process." Reader dhermann adds, "Great, now even less chance I can identify NSFW links before they are blocked by my work's big brother app and my boss is notified... again."

45 of 284 comments

  1. terrorist level domain by czarangelus · · Score: 4, Funny

    Arabic TLDs are a threat to national security

    --
    When a true genius appears, you can know him by this sign: that all the dunces are in a confederacy against him.
    1. Re:terrorist level domain by Anonymous Coward · · Score: 3, Informative

      That has been possible for years.
      This is about registering bankofamerica.cõm or lloydstsb.cø.ûk

      The part AFTER the dot.

  2. I took Latin in high school by Anonymous Coward · · Score: 2, Funny

    I'm glad we're going with Non-Latin TLDs now, I never understood going to the website "e.pluribus.unm"

  3. Perdire by SEWilco · · Score: 2, Funny

    There go my plans for world domination through venividivici.vvv

  4. first urls, then slashdot by azior · · Score: 5, Funny

    ï höpé thãt slâshðõt wìll dö thís töø wìth ÜRLs!

    www.íçáñn.örg

    ìt wörkéð!

    1. Re:first urls, then slashdot by Anonymous Coward · · Score: 5, Funny

      Here's a demonstration of how non-Latin characters show on /., starting with Arabic:

      Hindi:
      Russian:
      Japanese:
      Korean:
      Chinese:

    2. Re:first urls, then slashdot by rxmd · · Score: 2, Insightful

      Just because the characters don't show up in the edited text doesn't mean that they won't be handled in anchor tags or Slashdot's URL tag.

      Well, Slashdot mangles them anyway. The URL should end in .com.

      Slashdot's web interface is quite embarrassing in this respect. Having a non-Unicode-capable page in 2009 is like having one that is optimized for Netscape 0.9, no matter what amount of JavaScript and Web 2.0 bling they put in there.

      If international URLs will finally force Slashdot to implement a triviality such as string parsing, so much the better.

      --
      As a state gets corrupt, its laws multiply; the most corrupt states have the most numerous laws. (Tacitus, Annales 3:27)
  5. ICANN has lost it! by RiotingPacifist · · Score: 4, Insightful

    Far too much software makes the assumption that TLDs only contain [a-z0-9-], so if you want to go changing that there needs to be a damn good reason, and there is not. There are ~1369 2-letter TLDs to be shared between ~200 sovereign states and 49284 3-letter generic ones to be split between uses (.xxx .nws .org .edu, etc.); there doesn't seem to be any good reason to expand that and make lots of software more complex.

    --
    IranAir Flight 655 never forget!
    1. Re:ICANN has lost it! by imagoon · · Score: 2, Informative

      If everyone in the world liked those latin characters, then sure. But maybe someone else in the world prefers yahoo.(nihon*)? Wanted to write it in kanji but /. doesn't seem to take unicode.

    2. Re:ICANN has lost it! by Anonymous Coward · · Score: 2, Insightful

      This is about letting people use characters from their frickin' own language instead of just english.

      Just like so many other things in programming.. if the software doesn't do international, it doesn't do international.

      This has nothing to do with making more TLDs.

    3. Re:ICANN has lost it! by Jorgensen · · Score: 3, Insightful

      Yeah right. Because everybody in the whole world only uses ASCII right?

      Sorry for sounding flippant, but such US-myopia is far too prevalent for my liking.... Come on guys: Wake up and smell the coffee! There is more to the world than the US! There is no reason to make most of South East Asia and China 2nd-rate citizens on the internet.

      I agree that there is a lot of software that needs changing as a result though. But that just means more work, right? You could probably sell this as an anti-recession measure too.

    4. Re:ICANN has lost it! by Tanktalus · · Score: 2, Funny

      And now, with today's progress, that'd be CØBÖL.

    5. Re:ICANN has lost it! by jayme0227 · · Score: 2, Insightful

      You know, except for ease of use for those who don't use Latin characters in their daily lives. But who cares about them? They should just go back to their own country and create their own internet.

      --
      But then I realized the cable was blue, so I only gave it one star. I hate blue.
    6. Re:ICANN has lost it! by bill_mcgonigle · · Score: 2, Informative

      if you want to go changing that there needs to be a damn good reason

      I don't have any first-hand experience, but according to the BBC story when one enters a native-script domain name into one's browser, the domain name is entered normally (for the locale) and then to enter, e.g., ".in", one needs to press a key combination to shift the keyboard into latin-mode, then, enter the two letters, then shift the keyboard back into native mode.

      It's a usability problem. I sure would be annoyed if .com had to be rendered in Kanji on my system.

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
  6. And the answer to that... by Looce · · Score: 4, Interesting

    ... of course, is Punycode.

    A comment before yours has www.íçáñn.örg, which, when entered into Firefox, turns into

    www.xn--n-tfarxw.xn--rg-eka

    . Looks like the software will still live :)
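    The conversion described above can be tried outside a browser. This is a minimal sketch using Python's built-in "idna" codec, which follows the older IDNA 2003 rules and may not match every browser exactly; the host name "anysite" is a made-up placeholder, and only the ".örg" label comes from the example above.

```python
# Sketch: convert an internationalized host name to its ASCII (Punycode)
# form and back, using Python's built-in "idna" codec (IDNA 2003 rules).

def to_ascii(hostname: str) -> str:
    """Encode each dot-separated label; non-ASCII labels gain an 'xn--' prefix."""
    return hostname.encode("idna").decode("ascii")


def to_unicode(hostname: str) -> str:
    """Decode any 'xn--' labels back to their Unicode form."""
    return hostname.encode("ascii").decode("idna")


# "anysite" is a hypothetical name; ".örg" is the TLD from the comment above.
ascii_form = to_ascii("www.anysite.örg")
print(ascii_form)              # www.anysite.xn--rg-eka
print(to_unicode(ascii_form))  # www.anysite.örg
```

    The ASCII form is what actually travels through DNS; the Unicode form is only a display convenience layered on top.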

    1. Re:And the answer to that... by Looce · · Score: 4, Informative

      You don't understand. Punycode is how second-level domains are already implemented, even on top of relatively old browsers. This is an extension of Punycode to be usable in the TLD as well.

      In other words, your current version of Firefox will be able to visit pages in IDN TLDs when they're implemented, and so if someone does create a .örg TLD today, you can go to www.anysite.örg to your heart's content already.

      Note that this doesn't mean you can go to www.anysite.örg in NCSA Mosaic or anything, because these old browsers were around when Punycode wasn't even a standard. You can go to www.anysite.xn--rg-eka and NCSA Mosaic will recognise that, though. The seamless IDN TLD usage is just going to be present in the more modern browsers. I expect that Opera 8+, IE 6+, Firefox 2+ and recent Safari/Konqueror/Epiphany are going to be able to visit www.anysite.örg and 'hide' the xn--etc- access details from you, the user.

      Happy surfing!

  7. Phishing aid by querist · · Score: 5, Insightful

    This will only make phishing attacks easier unless there are SERIOUS checks on domain name registrations. There are letters in the Cyrillic alphabet that have different character codes than their look-alike letters in the Latin alphabet. I'm sure there are other collisions as well. I'm sure they accounted for this in the proposal, but the problem always lies in the implementation. From a security standpoint, this is a VERY bad idea without proper regulation of domain name registrations, and so far it has been demonstrated that we cannot manage them properly even with only the Latin alphabet.

    From a cultural and usability standpoint, this is a good thing. It will be easier for someone whose native language uses a non-Latin alphabet to recognize the supposed purpose of a web site by its domain name if some of those domain names can be in their native language. A hypothetical native Tamil speaker who speaks no English will be able to recognize the purpose of a site with an appropriate domain name in Tamil, for example.

    1. Re:Phishing aid by nsayer · · Score: 3, Informative

      I think the limitation that nationalized character sets will be restricted to the country TLDs where that language is native is a good first step. Additionally, I believe you're not allowed to use the latin alternative form characters from unicode (like 0xFF20-0xFF5F).

      If you're really paranoid, you could just be extra suspicious of domains that end in two letters (and yes, I am including .us), particularly when the 2nd level name is something you recognize, like paypal, ebay, etc. If you're in China, there may indeed be a legitimate paypal.cn, but I suspect it would set off my spidey sense to see a URL like that show up in my e-mail.

    2. Re:Phishing aid by dkf · · Score: 2, Insightful

      If you're really paranoid, you could just be extra suspicious of domains that end in two letters (and yes, I am including .us), particularly when the 2nd level name is something you recognize, like paypal, ebay, etc. If you're in China, there may indeed be a legitimate paypal.cn, but I suspect it would set off my spidey sense to see a URL like that show up in my e-mail.

      That won't work. There really are a lot of big companies that have country-specific sites that use the two-letter global domains. For example, if you're after books in German then you might be very interested in visiting amazon.de, which is totally legit.

      --
      "Little does he know, but there is no 'I' in 'Idiot'!"
    3. Re:Phishing aid by pablo.cl · · Score: 2, Informative

      There are letters in the Cyrillic alphabet that have different character codes than their look-alike letters in the Latin alphabet.

      Remember we are talking about ccTLDs. There are no more than 200 countries that would like to use non ASCII ccTLD, and they can be inspected manually. Russia wasn't awarded Cyrillic .ru because it looks like Latin .py (Paraguay). They will get .fr (Russian Federation) that looks like 0p (0 with vertical bar).

    4. Re:Phishing aid by nsayer · · Score: 2, Insightful

      Yeah, but if you know that you want that, then you'll be expecting it. We're talking about being on the lookout for 2 letter TLDs in places you don't expect them.

    5. Re:Phishing aid by Mathieu Lu · · Score: 4, Insightful

      This risk can be greatly reduced if they limit domain names to only one alphabet, i.e. Russian domain with Cyrillic ccTLD should have only Cyrillic letters in it.

      In many of these countries, they often have two domain names for a website: one that is easy to remember by foreigners, one that is easy to remember by locals (i.e. cyrillic name transliterated to Latin alphabet). The transliterated domain name is usually horrible, sounds weird, and often people transliterate stuff in different ways, so it's often not easy to remember anyway.

      I think non-latin ccTLDs is a good thing.

      matt
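    The one-alphabet rule suggested above could be checked roughly like this. It is only a sketch under a loud assumption: the first word of a character's Unicode name ("LATIN", "CYRILLIC", ...) is used as a stand-in for its script, which is a heuristic and not how any registry is known to do it. The function names are invented for illustration.

```python
import unicodedata


def scripts_in_label(label: str) -> set:
    """Return rough script names for the letters in one domain label.

    Heuristic (an assumption, not an official algorithm): the first word
    of the Unicode character name approximates the script.
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():  # skip digits and hyphens
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_single_script(label: str) -> bool:
    """True when all letters in the label appear to share one script."""
    return len(scripts_in_label(label)) <= 1


genuine = "paypal"
spoofed = "p\u0430yp\u0430l"  # U+0430 CYRILLIC SMALL LETTER A replaces Latin 'a'

print(genuine == spoofed)          # False, although they look alike on screen
print(is_single_script(genuine))   # True
print(is_single_script(spoofed))   # False: mixes LATIN and CYRILLIC
```

    A check like this catches mixed-script labels only; a label written entirely in look-alike Cyrillic letters would still pass, which is why manual review of the TLDs themselves was mentioned elsewhere in the thread.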

  8. Re:Speeding the path to IPv6? by Anonymous Coward · · Score: 2, Insightful

    I wonder what impact this will have on the ever decreasing amount of IPv4 addresses available.

    This will have absolutely no effect on IPv4/IPv6. This is a DNS change to allow additional characters in domain names.

    The domain names get translated to ip addresses by DNS servers.

    I doubt that individuals & companies said, "No! We refuse to go on the internet until we can have TLDs with non-Latin characters."

  9. Re:Encoding? by Psx29 · · Score: 3, Informative

    "in order to maintain compatibility with the existing infrastructure." Tons (dare I say, a majority) of software would break if they used UTF8

  10. Re:Encoding? by DamonHD · · Score: 5, Informative

    To avoid breaking all the DNS-related code out there that assumes (ie correctly, based on the current spec) only alphanumerics and '-' in each component.

    If you wish to rewrite every single bit of DNS-dependent code, in every laptop, server, embedded network device, etc, etc, ... well assume that it can't be done, and with this mechanism it doesn't need to be. Though I bet a few bits of code will barf at the '--' anyhow...

    Rgds

    Damon
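    To illustrate the point about existing code: below is a sketch of the kind of letters-digits-hyphen check that DNS-related software commonly performs. The regex and function name are assumptions for illustration, not taken from any particular resolver. A Punycode label passes unchanged, while the raw Unicode form does not.

```python
import re

# Letters, digits and hyphens only, 1 to 63 characters,
# and no hyphen at the start or end of a label.
LDH_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def is_ldh_hostname(hostname: str) -> bool:
    """True when every dot-separated label is plain letters/digits/hyphens."""
    return all(LDH_LABEL.match(label) is not None for label in hostname.split("."))


print(is_ldh_hostname("www.anysite.xn--rg-eka"))  # True: the '--' is still legal
print(is_ldh_hostname("www.anysite.örg"))         # False: 'ö' is outside A-Z, 0-9, '-'
```

    Code with a stricter rule than this one, for example one that rejects consecutive hyphens, is the sort that might "barf at the '--'".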

    --
    http://m.earth.org.uk/
  11. Re:Encoding? by tokul · · Score: 2, Informative

    Any DNS gurus care to explain why they wouldn't simply use UTF8?

    I am not DNS guru, but guessing. RFC882 - November 1983. RFC2044 - October 1996.

  12. don't forget who we're talking about here... by damn_registrars · · Score: 5, Insightful

    There are letters in the Cyrillic alphabet that have different character codes than their look-alike letters in the Latin alphabet. I'm sure there are other collisions as well. I'm sure they accounted for this in the proposal, but the problem always lies in the implementation

    This is a decision made by ICANN. We've known for some time that they will willingly approve really tremendously bad ideas, if enough money is presented to them. They recently moved on a motion to start selling gTLDs, after all.

    From a security standpoint, this is a VERY bad idea without proper regulation of domain name registrations, and so far it has been demonstrated that we cannot manage them properly even with only the Latin alphabet

    Security is not of any concern for ICANN. Never has been, never will be. As long as they keep making money they're happy; security, spam, phishing, etc, be damned.

    --
    Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.
  13. Excellent idea by ugen · · Score: 3, Insightful

    Now those countries, organizations and businesses that wish to become inaccessible to most of the world (except the native speakers of their own language) can finally do so as easily as possible. Create their own little Internet reservations and stay there :)

    As long as my software (such as Firefox) obligingly converts these IDN urls into the dash-hex notation making them obviously unreadable, I am ok with that.

    Disclaimer: I am a native of a non-English speaking country. I am sure a few of my countrymen will use this feature based on misplaced patriotism. I am also sure that the vast majority will ignore it just like they ignore the potential to use non-Latin domain names that exists right now.

  14. Latin =/= Support for English only. by mano.m · · Score: 3, Insightful

    A lot of the debate here seems to be about English-speaking countries vs. the rest of the world, but English isn't the only language that uses the Latin alphabet. Also, the unavailability of non-Latin scripts hasn't hampered the flourishing of home-grown websites in India and China named in their many local languages - what makes the ICANN think this is even necessary?

    --
    Karma fed to this user will be promptly burnt. Be warned; be wary.
    1. Re:Latin =/= Support for English only. by pablo.cl · · Score: 3, Insightful

      Actually we are talking about the English alphabet, with j, u and w, which Latin didn't have.

    2. Re:Latin =/= Support for English only. by Estanislao Martínez · · Score: 2, Insightful

      Also, the unavailability of non-Latin scripts hasn't hampered the flourishing of home-grown websites in India and China named in their many local languages - what makes the ICANN think this is even necessary?

      And how exactly do you claim to know this? It certainly makes it difficult to market the website among the potential user base who have only a shaky command of the Latin alphabet.

  15. Re:Encoding? by Creepy · · Score: 2, Insightful

    Actually, UTF-8 can and is being used in DNS - as long as you stick to basic Latin characters, that is. Also it is Unicode - as I posted earlier, Unicode is a blanket for UTF-8, UTF-16 and UTF-32 which makes it ambiguous.

    UTF-8 bits 0-7 is ASCII as long as bit 8 isn't set, so to fully support it you'd need to still exclude bits below 7 that are not valid html characters and include support for multiple bytes and bit 8. The reason existing DNS servers won't work with it is because bit 8 indicates multibyte and the second byte may carry an invalid character from the 0-7 bits and the first byte may have a language encoding for the second byte (indicated by bit 8). For instance character 43 is + and that is invalid in a URL. If character 1 had bit 8 set and indicated the language as French in the language encoding (which I believe is done in the first 7 bits and can in some cases be extended to the second or even third byte, but it's been a while since I read the spec - I do know there is an encoding that does this and I'm pretty sure it is UTF-8), the second byte 43 would (probably - I'm not going to look it up) mean something entirely different and be perfectly valid.
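    For reference, the byte layout is easy to inspect. The sketch below (helper names are invented for illustration) prints the bits Python's standard UTF-8 encoder produces. In standard UTF-8 there is no language tag: lead bytes of a multibyte sequence start with 11, continuation bytes start with 10, so every byte of a multibyte character has its high bit set and a byte with value 43 only ever stands for '+' itself.

```python
def utf8_bits(text: str) -> list:
    """Return each UTF-8 byte of the text as an 8-character bit string."""
    return [format(b, "08b") for b in text.encode("utf-8")]


def has_ascii_byte_inside_multibyte(text: str) -> bool:
    """True if any multibyte character contains a byte below 0x80."""
    for ch in text:
        encoded = ch.encode("utf-8")
        if len(encoded) > 1 and any(b < 0x80 for b in encoded):
            return True
    return False


print(utf8_bits("+"))   # ['00101011']  (a single ASCII byte, value 43)
print(utf8_bits("é"))   # ['11000011', '10101001']  (lead byte, continuation byte)
print(has_ascii_byte_inside_multibyte("é ö ñ 中 я"))  # False
```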

  16. Re:TLDs only? by petermgreen · · Score: 2, Informative

    It's already been in use for the rest of the domain name under certain TLDs for some time.

    --
    note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
  17. it just got easier to phish by Nadaka · · Score: 3, Informative

    Yay. Now you can register yourbankname.com with some funky characters that render in exactly the same way as the letter you are used to.

  18. Re:FORSTUS POSTUS by twistedcubic · · Score: 2, Funny

    That's PRIMUS POSTUS.

  19. Re:Its to do with people with the wrong keyboard . by jason.sweet · · Score: 2, Insightful

    There are a lot of websites where the words don't matter.

  20. Re:Its to do with people with the wrong keyboard . by mea37 · · Score: 3, Insightful

    Uh, yeah, because the keyboard you're using is a clear indicator of which language(s) you understand.

  21. The Internet has to evolve by PinkyDead · · Score: 4, Funny

    ....although obviously not ... in Kansas.

    --
    Genesis 1:32 And God typed :wq!
  22. Re:Its to do with people with the wrong keyboard . by shutdown -p now · · Score: 3, Informative

    How exactly do you think you'll be able to type in a URL in mandarin or russian on west european keyboard?

    You enable Chinese keyboard layout (dunno what's it called), and type it. The letters printed on the keys of your keyboard aren't some sort of magic that lets your computer input languages written in them, you know.

    I don't have any keyboards with Russian characters on them, but I happily type in Russian regardless (in fact, I only first realized that I do actually truly touch type when I first ran into this problem, which turned out to not be a problem in the end).

  23. Re:Hmm... by xaxa · · Score: 2, Funny

    micrösöft.cöm?

    That's Microsoft with the volume turned up to 11?

  24. Re:Um, can they be more specific than "Unicode"? by spitzak · · Score: 2, Informative

    Several mistakes there.

    First of all any domain name is going to have to be encoded as a stream of bytes somehow because far too much stuff is already implemented to handle the string that way. As others pointed out punycode is used.

    Second, UTF-8 is smaller than UTF-16 for all languages, even Chinese. This is because all the ASCII 0x00-0x7F characters are smaller, and therefore the encoding will be smaller if there are more of these than there are Unicode 0x800-0xFFFF characters. This seems incorrect for Chinese but you have to realize that ASCII includes spaces, newlines, numbers, and all XML and HTML markup and therefore any reasonable sized Chinese document will be smaller in UTF-8.

    Translating encodings to "wide characters" is a mistake, as you have noticed. You should write your software to deal with it in its original encoding because that is the only way to intelligently deal with errors in the string. The fact that Windows uses UTF-16 for an encoding a lot seems to be confusing people no end, but please check exactly what they do when that UTF-16 has surrogate pairs, or even "invalid" surrogate halves. They are handling the original encoding, they are not "translating it to Unicode".
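    The size trade-off described above is easy to measure. This is a small sketch with made-up sample strings: Chinese characters in this range take 3 bytes in UTF-8 and 2 in UTF-16, while ASCII markup takes 1 byte in UTF-8 and 2 in UTF-16, so the result depends on the mix.

```python
def encoded_sizes(text: str) -> dict:
    """Byte counts for the same text in UTF-8 and UTF-16 (no byte-order mark)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
    }


bare = "中文"              # two Chinese characters, no markup
marked_up = "<p>中文</p>"  # the same text wrapped in minimal HTML

print(encoded_sizes(bare))       # {'utf-8': 6, 'utf-16': 4}
print(encoded_sizes(marked_up))  # {'utf-8': 13, 'utf-16': 18}
```

    The bare string is smaller in UTF-16, and the marked-up one is smaller in UTF-8, which is the point about markup, spaces and digits tipping the balance.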

  25. Re:Its to do with people with the wrong keyboard . by shutdown -p now · · Score: 2, Interesting

    I'm happy you'll do this. I won't, and the majority of the internet users won't either. It'll just further separate nations, because I won't go through the hassle of typing in a foreign character domain name - it'll just be a site I won't visit.

    Presumably, if a site is designed to be visited by someone who only understands English, it will use an English TLD. If it uses TLD with national characters, then most likely the content is in the language other than English as well, and you'd need to have means to input that language to fully interact with the site anyway.

  26. Re:Its to do with people with the wrong keyboard . by Runaway1956 · · Score: 2, Insightful

    "the majority of the internet users won't either."

    Sorry, but that sounds like typical American ethnocentricity. The MAJORITY of internet users actually are people who don't natively speak English. Chinese speakers, Russian speakers, European people, many of whom use Cyrillic alphabets, Arabs, South Americans, Indians, and others that I'm surely missing.

    How can you possibly speak for "the majority of internet users", when people who speak English as their native language constitute a pretty small percentage of the world's people? I could google, but I'm almost willing to bet that more people on this earth grow up speaking Chinese, than people who grow up speaking English as their first language.

    If a guy is more comfortable using his own language, I'm all for him doing so.

    --
    "Windows is like the faint smell of piss in a subway: it's there, and there's nothing you can do about it." - Charlie Br
  27. Re:NSFW by Zontar The Mindless · · Score: 2, Insightful

    I don't normally browse websites written in a language I can't understand.

    1. The link text in the example I provided was in English.

    2. I am not aware of any requirement that only one language may be used on a given website. If there is such a requirement, please inform my contacts on Facebook of this, because they post messages there in about 15 different languages using at least 4 different writing systems. (And I've posted there myself in 4 languages, including English.)

    I still see an ignorant american that thinks the whole world should read and write english for people like dhermann.

    1. See above.

    2. So you are saying that you can read my mind? Perhaps this ability of yours needs some fine-tuning, since I never made any such assertion.

    3. It's true that I still carry a US passport, but I've not lived there in many years.

    NSFW has nothing to do with supporting more internationalization and it's all a cop out.

    Nobody is "copping out", and if you seriously think I am opposed to internationalisation, you're barking up the wrong tree.

    Nevertheless, dhermann is voicing what I believe is a legitimate concern, even more so for less sophisticated and experienced Internet users.

    The answer to such concerns is, of course, education. Many people are not even aware that services like Google Translate are available.

    In the meantime, I suggest you remove the chip from your shoulder. Not all Americans are alike, you know.

    --
    Il n'y a pas de Planet B.
  28. Re:Here comes the Phishers! by mjwx · · Score: 2, Informative

    You do know that this is for the TLD part of the URL only. The first part of a domain can already be written in non-Latin scripts, Korean for example, but the TLD must be Latin; this decision just enables the .com.kr to be turned into Hangul.

    If ICANN did not standardise this then nations will just implement their own systems which will be different and incompatible with each other, much like China and Thailand have already done.

    --
    Calling someone a "hater" only means you can not rationally rebut their argument.