Slashdot Mirror


International URLs Pass First Test

Off the Rails writes "The BBC reports on the results of a successful test of non-ASCII domain names on Internet-equivalent hardware (pdf) carried out last October. The next stage is to plug the system into the net, and if it still works, it could go live sometime next year. 'Early work on the technical feasibility of using non-English character sets suggested that the address system would cope with the introduction of international characters but tests were called for to ensure this was the case ... Also needed are policy decisions by Icann on how the internationalised domain names fit in and work with the existing rules governing the running of the address books. Icann is under pressure to get the international domain names working because some nations, in particular China, are working on their own technology to support their own character sets.'"

32 of 159 comments

  1. Great by otacon · · Score: 5, Funny

    now I have to learn second languages to look at asian porn.

    --
    In a world of acronyms, the words are the real victims.
    1. Re:Great by vivaoporto · · Score: 4, Funny

      I watch porn to learn foreign languages, you insensitive clod!

    2. Re:Great by Lenneth-chan · · Score: 3, Funny

      I'm pretty sure most porn sounds are the same in any language.

    3. Re:Great by mosburn · · Score: 2, Insightful

      That's what the cheesy lines at the beginning are for, typical pizza boy/plumber/etc. to get you going as your intro to a new language.

    4. Re:Great by nacturation · · Score: 2, Funny

      Vanilla people? Vanilla typically refers to the deep brown extract from the vanilla bean, which is itself nearly black once fully dried and cured. So the original poster is clearly referring to African tribesmen.
      --
      Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.
    5. Re:Great by dunkelfalke · · Score: 2, Funny
      --
      Conservatism: The fear that somewhere, somehow, someone you think is your inferior is being treated as your equal.
  2. Phishing just got a lot more interesting by L.+VeGas · · Score: 4, Funny

    Imagine all the new ways to spell bank0famerlca.com.

    1. Re:Phishing just got a lot more interesting by slart42 · · Score: 4, Informative

      >Imagine all the new ways to spell bank0famerlca.com.

      This is already happening. A common example is the Cyrillic lower case "?", which looks almost exactly like the Latin "a" in most fonts.

      See http://en.wikipedia.org/wiki/IDN_homograph_attack for more information.
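To make the homograph point concrete, here is a minimal Python sketch. The spoofed name is an invented example, not a real incident. It shows that the Latin and Cyrillic letters are distinct code points even though most fonts draw them alike:

```python
import unicodedata

latin_a = "a"          # U+0061
cyrillic_a = "\u0430"  # U+0430, drawn like a Latin "a" in most fonts

for ch in (latin_a, cyrillic_a):
    # Prints the code point and its official Unicode name
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

real = "bank.com"
spoof = "b\u0430nk.com"  # hypothetical spoof: second letter is Cyrillic

# The two strings look the same on screen but compare as different
print(real == spoof)  # False
```

A resolver compares code points, not glyphs, so the two names can be registered separately.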

    2. Re:Phishing just got a lot more interesting by colfer · · Score: 3, Informative

      Preventing that has been part of Mozilla's IDN implementation, and I assume other browsers have addressed (ha) it as well. If a TLD, like .ie, Ireland, has a policy against phishing, and a table of lookalike letters, then Firefox will present the IDN address in the address bar in its own, non-English, language. Otherwise, Firefox displays the address in its IDN-encoded form, which is all ASCII. AFAIK, from reading bug reports on Mozilla, this is already in force.

    3. Re:Phishing just got a lot more interesting by colfer · · Score: 2, Informative

      Here are the references on IDN puny-code spoofing prevention settings in Mozilla. http://kb.mozillazine.org/Network.IDN.blacklist_chars http://kb.mozillazine.org/Network.IDN.whitelist.* http://kb.mozillazine.org/Network.enableIDN http://kb.mozillazine.org/Network.IDN_show_punycode For example, .jp Japan is whitelisted but .ie Ireland is not. There was a debate between people that wanted to disable or hobble IDN/puny-code, for security, and people who wanted to internationalize Mozilla completely. The resulting blacklist/whitelist and configurability was a compromise.
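The whitelist behaviour described above can be sketched roughly as follows. This is not Mozilla's code: the `WHITELISTED_TLDS` set and the `display_form` function are hypothetical names for illustration, and the sketch relies only on Python's built-in `idna` codec.

```python
# Hypothetical whitelist for illustration; Firefox keeps its own list in
# the network.IDN.whitelist.* preferences.
WHITELISTED_TLDS = {"jp"}

def display_form(hostname: str) -> str:
    """Return the form an address bar might show under a TLD whitelist policy."""
    # The ASCII ("xn--") form is what is actually sent to DNS
    ascii_form = hostname.encode("idna").decode("ascii")
    tld = ascii_form.rsplit(".", 1)[-1].lower()
    if tld in WHITELISTED_TLDS:
        # Trusted registry policy: show the native-script form
        return ascii_form.encode("ascii").decode("idna")
    # Otherwise fall back to the all-ASCII encoded form
    return ascii_form

print(display_form("caf\u00e9.jp"))  # café.jp
print(display_form("caf\u00e9.ie"))  # xn--caf-dma.ie
```

The point of the compromise is visible here: the same label is shown natively under one TLD and in its encoded form under another.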

    4. Re:Phishing just got a lot more interesting by drsquare · · Score: 4, Funny

      Rubbish. Since when does a question mark look like an 'a'?

    5. Re:Phishing just got a lot more interesting by chihowa · · Score: 4, Funny

      Rubbish. Since when does a question mark look like an 'a'?

      Didn't you even read the post? When it's lowercase. Duh.

      --
      If you want a vision of the future, imagine a youtube comments section scrolling - forever.
  3. Dibs! by truthsearch · · Score: 3, Funny

    I got dibs on sêx.com!

    1. Re:Dibs! by kimba · · Score: 2, Informative

      I got dibs on sêx.com!


      Umm, you do realise this was registered in 2005? Such domains already exist and can be registered today.

      The technical test is about having Internationalised Domain Names at the top-level, or root, of the DNS. So then you can have .sêx rather than .sex.
    2. Re:Dibs! by VWJedi · · Score: 5, Funny

      The technical test is about having Internationalised Domain Names at the top-level, or root, of the DNS. So then you can have .sêx rather than .sex.

      So we could theoretically have sex at any level... but this is slashdot, so it's not likely to happen for anyone around here.

  4. Re:Of little use by Anonymous Coward · · Score: 2, Interesting

    I would bet the average German Internet user knows how to do that. It's pretty easy when the key is on your keyboard: http://carbon.cudenver.edu/~tphillip/GermanKeyboardLayout.html

  5. Re:Maybe not.. by LighterShadeOfBlack · · Score: 3, Insightful

    While browsers can't even properly show non-English alphabets, this doesn't seem to be a good idea. My native language contains many special characters and I usually end up deciphering the emails sent by mom to me, because along the way, servers replace these characters with funny things.

    Well, is it the browsers or the servers that are the issue? AFAIK any modern browser fully supports Unicode and any other encodings so there shouldn't be an issue there. If the servers are the problem then either it's the protocol that needs updating/replacing (I don't know nearly enough about SMTP, IMAP4, or POP3 protocols to comment) or the servers themselves are non-compliant. If there's a problem it should definitely be fixed, but you really need to know what the problem is first.
    --
    Spelling mistakes, grammatical errors, and stupid comments are intentional.
  6. Re:In practice it means "national" URLs. by leuk_he · · Score: 2, Informative

    Umlauts are hardly a problem if you set the keyboard to üs-ïnternätional. But Asian/Hebrew/Arabic characters are much more difficult to enter... in my experience.

    But you will still be able to click them. IDN support is available in most popular browsers (although disabled because of security issues).

  7. They could split unicode into sections by Colin+Smith · · Score: 2, Insightful

    Call them, say, "character sets".

    Then only allow names and queries all from the same character set.

    --
    Deleted
  8. English "X" vs. Cyrillic "khah" by J.R.+Random · · Score: 3, Insightful

    This is just common sense -- there's no reason why Chinese, Greeks, and Russians should have to use a character set meant for the English language. But any given URL should have a language associated with it and any character in that URL not associated with its language should be color coded. So English language URLs would get "omicron" flagged while Greek URLs would get "O" flagged. The "default" language could be English so that existing URLs are unchanged; for other languages their ISO code could precede the URL. Now this particular scheme might have some fatal flaw but something similar ought to be workable.
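A rough sketch of the flagging idea, in Python. Everything here is an assumption for illustration: `script_of` guesses a character's script from the first word of its Unicode name (a heuristic, not a real script lookup), and `flag_foreign` uses square brackets as a stand-in for color coding.

```python
import unicodedata

def script_of(ch: str) -> str:
    """Heuristic: take the first word of the Unicode name as the script."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def flag_foreign(label: str, expected: str = "LATIN") -> str:
    """Bracket every letter whose script differs from the expected one."""
    out = []
    for ch in label:
        if ch.isalpha() and script_of(ch) != expected:
            out.append(f"[{ch}]")  # stand-in for color coding
        else:
            out.append(ch)         # digits, dots, hyphens pass through
    return "".join(out)

# A Greek omicron hidden in an otherwise Latin name gets flagged
print(flag_foreign("g\u03bfogle.com"))                  # g[ο]ogle.com
# A Latin "o" in a Greek name gets flagged when Greek is expected
print(flag_foreign("\u03b1\u03b2o", expected="GREEK"))  # αβ[o]
```

Defaulting `expected` to Latin mirrors the suggestion that existing URLs stay unchanged.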

    1. Re:English "X" vs. Cyrillic "khah" by pavon · · Score: 2, Insightful

      Agreed, although I think a dialog box should also be shown as an annoyance / deterrent. Otherwise just imagine what the Web 2.0 folks will do when they realize they can redirect their site to one with cool multi-colored URLs, thus conditioning people to ignore the colored warning. And you thought del.icio.us was overly cute :)

  9. Re:What about security issues? by JanneM · · Score: 2, Insightful

    Like you already have with "l", "I" and "1"; or "O" and "0"; or "V" and "U", depending on the particular font you happen to use?

    Phishing attacks mostly work not because people can't see a minute difference between two lookalike letters; they work because as long as nothing is obviously, grossly out of order people just assume they're in the right place. You can have domain names that aren't even close to the real one, and websites with only superficial similarities to the original, and a lot of people will still be duped.

    --
    Trust the Computer. The Computer is your friend.
  10. Security minded questions by merc · · Score: 2, Interesting

    Will having non-ASCII data in FQDNs open us up to buffer-overflow attacks in various network-aware services?

    --
    It's true no man is an island, but if you take a bunch of dead guys and tie 'em together, they make a good raft.
  11. Re:Phishing by evought · · Score: 2, Informative

    This has actually been discussed to some extent for years. One method is to only allow domains to be registered or displayed in a single language character set, such that a domain name can use Latin characters or Greek characters, but not both. This can be enforced at registration or when displayed in the browser (the browser can highlight improper URLs). This does not prevent attacks where the entire spelling of the domain is available in an alternate character set. One solution is for the browser to somehow tell the user what language a URL is written in.



    Here is a detailed description of how IE handles this, and also a W3C page discussing general techniques and different browsers. An interesting note is the possible use of the fraction slash to add fake URLs to a domain name. Of course, at the end of the day, standard phishing protection applies to domains which slip through the net.
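The single-character-set rule, and the gap it leaves, can be sketched in Python. This is an assumption-laden illustration rather than any registry's or browser's actual check: `scripts_in` and `is_single_script` are hypothetical helpers, and the script is guessed from the first word of each character's Unicode name.

```python
import unicodedata

def scripts_in(label: str) -> set:
    """Collect the (heuristically guessed) scripts of all letters in a label."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }

def is_single_script(label: str) -> bool:
    """True if every letter in the label comes from one script."""
    return len(scripts_in(label)) <= 1

print(is_single_script("paypal"))       # True: all Latin
print(is_single_script("p\u0430ypal"))  # False: Latin mixed with a Cyrillic "а"

# Limitation noted above: a label spelled entirely in lookalike Cyrillic
# letters still passes, because it really is a single script.
print(is_single_script("\u0441\u043e\u0441\u043e"))  # True
```

The last case is why the rule needs a companion measure, such as telling the user which language the name is written in.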

  12. Some Unanswered Questions About IDNs ... by Ron+Bennett · · Score: 2, Interesting

    Below is a quick copy and paste from one of my posts on DNForum regarding IDNs ... I own some IDNs and believe they have much potential, but there are still many unanswered questions...

    Excerpt from a post of mine on DNForum regarding IDNs:
    http://www.dnforum.com/showthread.php?p=732080

    I'm running into a lot of issues that many IDN folks aren't discussing - probably because they've not considered them ...

    Various issues / threats / questions:

    - The existence of numerous diverse dialects, even totally different languages, etc in the same country ... it's among the reasons that English dominates in some areas; some natives, even if they can understand a particular dialect, will sometimes speak a totally non-native language, such as English, instead to avoid risk of offending the other party. One can't assume one language dominates an entire region - languages can also overlap many areas ... it's one of the reasons some are pushing for language / culture based TLDs, such as .CAT (among the dumbest ideas ever, but that's another discussion for the .CAT thread running here on DNF).

    - An IDN that contains western european characters that very closely matches a non-IDN ... e.g. cafe.com versus café.com ... what happens? Will the IDN be highlighted / blocked by default? ... likely an easy UDRP target? ... introduction of a new IDN specific dispute procedure? -perhaps there already is one?

    - Trademark issues ... e.g. an IDN that is similar / exact to a trademark in another country ... less obvious, what about an IDN that translates to that of a trademarked word / phrase? -I believe there's a thread discussing such an issue now on one of the other boards here.

    - Language variants (more applicable to asian languages, etc) related issues ... how good / stable are the various language variant tables?

    - What happens when a language variant table changes? -how are conflicts handled?

    - What happens if a character variant (an IDN [IDL package] technically can comprise multiple character variants [code points]) is released? ... does the current registrant get first dibs? ... even if yes, it may not be quite that simple if a character variant occurs in numerous permutations.

    - What happens if a reserved character variant is changed to a preferred character variant? - while such a change would have little to no effect on affected IDNs (IDL packages), it could result in the appearance of some IDNs changing ... probably not a biggie compared to some other issues, but one to be aware of.

    - How reliable, especially for those in languages with numerous character variants, will IDN domain resolution be? ... IDN resolution depends heavily on client-side APIs.

    - How well will IDN resolution APIs be regulated ... I can easily envision scenarios in which a web browser and/or other applications (email, IM, etc) implement resolution differently ... e.g. adding and/or ignoring one or more valid language associations for a particular IDN / converting similar-looking western european characters to standard A-Z characters, etc. A related concern is language table management - I'm a little hazy on if the tables will be internally stored by each app or remotely loaded for each session, etc.

    Rambling on, but there are a lot of things that one needs to be aware of with IDNs.

  13. Balkanising the internet? by hcdejong · · Score: 3, Interesting

    Would this lead to segregation of the internet into zones defined by the language used for the domain name? At the moment, I can access e.g. Japanese websites easily, even if the content of that site is in a language I don't understand [1].
    If non-Roman domain names become popular, will I still be able to access them, or will they disappear behind untypeable URLs? A search engine may be able to mitigate this problem somewhat, but ATM I sometimes get search results for Japanese-language pages only because my search term is present in the URL.

    1: yes, a site can still be useful in this case and no, despite the stereotype it's not just for porn.

    1. Re:Balkanising the internet? by kimba · · Score: 2, Informative

      My concern would be for all the internet filtering and firewalling software which explicitly only allows ASCII in HTTP headers.


      IDN encoding is pure ASCII, in a similar way that MIME email attachments are. The protocol layer never sees anything other than letters, numbers and hyphens. Every IDN-encoded label is prefixed with "xn--" so that end-user interfaces can tell them apart and convert them back and forth.
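A short Python illustration of that round trip, using the standard library's built-in `idna` codec (which implements the older IDNA 2003 rules). The domain `café.com` is just a sample name:

```python
unicode_name = "caf\u00e9.com"  # "café.com" as a user would type it

# What goes over the wire: plain ASCII with the "xn--" prefix
ascii_name = unicode_name.encode("idna")
print(ascii_name)  # b'xn--caf-dma.com'

# What an end-user interface can convert it back to for display
print(ascii_name.decode("idna"))  # café.com

# Only the label containing non-ASCII characters is rewritten
print(ascii_name.split(b"."))  # [b'xn--caf-dma', b'com']
```

Filters and firewalls that only accept ASCII hostnames therefore see nothing unusual.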
    2. Re:Balkanising the internet? by badasscat · · Score: 2, Interesting

      Imagine all the Japanese who don't know English, but have to learn/type English domain names. Very unintuitive for them.

      Bad example.

      The Japanese are probably the *least* likely of any non-English speaking country to use non-roman URLs. The fact is the standard Japanese keyboard is the same exact QWERTY keyboard we use. They can type Japanese through software, which is how they normally work when writing to each other, but there's nothing "non-intuitive" in using an English keyboard in the way that it was intended. In fact, most of them write Japanese using romanizations, then select the correct kanji through a list. So they're universally familiar with romanized URLs, and like any habit, it's not going to change just because an alternative became available. Typing kanji is harder on a Japanese computer than typing a romanization.

      Now, the Chinese, Russians, etc. I don't know about, so there could be better examples out there of people who would take advantage of this.

  14. Romanization as DNS lingua franca by StreetStealth · · Score: 2, Interesting

    Couldn't these linguistically heterogeneous domain spaces still be universally linked through romanization? I see one possible solution: An intermediary DNS conversion server; i.e. type "[those were supposed to be Japanese kanji].co.jp" and your DNS request is treated the same as "rakuten.co.jp". Beyond the inability to rake in tons of money for new registrations, what might be the disadvantages of such a system?

    --
    Your mind is clear / The things that you fear / Will fade with how much you / Believe what you hear
    1. Re:Romanization as DNS lingua franca by Nimey · · Score: 2, Interesting

      For some languages, like Arabic, there is no one standard for romanization. A trivial example is Qu'ran/Koran.

      --
      Hail Eris, full of mischief...

      E pluribus sanguinem
  15. Already done by kahei · · Score: 2, Interesting


    Once again, committees lag behind actual problems and actual solutions.

    Now if you'll excuse me I'll go back to browsing .jp.

    (I seem to recall that /. has issues of its own, so the ascii encoding of that would be http://xn--cckev5k8eta5k.jp/. Anyway, the point is that characters beyond ASCII have been used for ages. Mostly by people who don't mind it when users from other countries can't access their site.)

    --
    Whence? Hence. Whither? Thither.
  16. Re:Maybe not.. by Petrushka · · Score: 2, Informative

    Just about any e-mail service should enable the use of non-ASCII characters. Any halfway decent e-mail client will; if you're using Thunderbird or Mail or Pegasus, just set the character set to UTF-8; I believe Pine allows UTF-8 too. (Personally I can't imagine any reason for not using UTF-8 as default; I use it all the time, even though almost all of my e-mails are in English.) Most web interfaces allow it as well: Gmail certainly does, for example; I'm pretty sure Yahoo does.
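As a rough illustration of why the declared character set matters, the Python sketch below first reproduces the "funny things" effect by decoding UTF-8 bytes with the wrong charset, then builds a message that declares UTF-8 explicitly. The sample text is an assumption, and this says nothing about what any particular mail server actually does.

```python
from email.message import EmailMessage

text = "Gr\u00fc\u00dfe"  # "Grüße", sample text with non-ASCII letters

# Wrong charset on the receiving side: each UTF-8 byte pair is shown
# as two unrelated Latin-1 characters
raw = text.encode("utf-8")
garbled = raw.decode("latin-1")
print(garbled == text)  # False

# Declaring the charset lets the receiver decode the body correctly
msg = EmailMessage()
msg.set_content(text, charset="utf-8")
print(msg.get_content_charset())  # utf-8
print(msg.get_content().strip())  # Grüße
```

The garbling is reversible as long as no byte was dropped, which is why the underlying problem is usually a labeling issue rather than data loss.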