International URLs Pass First Test
Off the Rails writes "The BBC reports on the results of a successful test of non-ASCII domain names on Internet-equivalent hardware (pdf) carried out last October. The next stage is to plug the system into the net, and if it still works, it could go live sometime next year. 'Early work on the technical feasibility of using non-English character sets suggested that the address system would cope with the introduction of international characters; tests were called for to ensure this was the case ... Also needed are policy decisions by Icann on how the internationalised domain names fit in and work with the existing rules governing the running of the address books. Icann is under pressure to get the international domain names working because some nations, in particular China, are working on their own technology to support their own character sets.'"
Now I have to learn second languages to look at Asian porn.
In a world of acronyms, the words are the real victims.
Imagine all the new ways to spell bank0famerlca.com.
Best Windows Freeware
I got dibs on sêx.com!
Developers: We can use your help.
I would bet the average German Internet user knows how to do that. It's pretty easy when the key is on your keyboard: http://carbon.cudenver.edu/~tphillip/GermanKeyboardLayout.html
Spelling mistakes, grammatical errors, and stupid comments are intentional.
An umlaut is hardly a problem if you set your keyboard to üs-ïnternätional. But Asian/Hebrew/Arabic characters are much more difficult to enter... in my experience.
But you will still be able to click them. IDN support is available in most popular browsers (although disabled because of security issues).
Call them, say, "character sets."
Then only allow names and queries all from the same character set.
This is just common sense -- there's no reason why Chinese, Greeks, and Russians should have to use a character set meant for the English language. But any given URL should have a language associated with it, and any character in that URL not associated with its language should be color coded. So English language URLs would get "omicron" flagged while Greek URLs would get "O" flagged. The "default" language could be English so that existing URLs are unchanged; for other languages, their ISO code could precede the URL. Now this particular scheme might have some fatal flaw, but something similar ought to be workable.
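A rough sketch of that per-character flagging, for illustration only. The function names `script_of` and `flag_foreign` are made up for this example, and guessing a character's script from the first word of its Unicode name is an assumption that stands in for a real script database; no browser is claimed to work this way.

```python
import unicodedata

def script_of(ch):
    """Rough script guess: the first word of the Unicode character name.
    This is an assumption for the sketch, not a real script lookup."""
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"

def flag_foreign(label, expected_script):
    """Return (index, character) pairs in one domain label whose script
    differs from the script declared for the URL. Digits and hyphens
    are ignored because they are not letters."""
    return [(i, ch) for i, ch in enumerate(label)
            if ch.isalpha() and script_of(ch) != expected_script]

# Latin label with a Greek omicron (U+03BF) in place of the first "o".
print(flag_foreign("g\u03bfogle", "LATIN"))

# Greek label with a Latin capital "O" slipped in.
print(flag_foreign("\u03bbO\u03b3\u03bf\u03c2", "GREEK"))
```

A browser could then color the flagged positions instead of printing them. The sketch works on a single label, so a Latin TLD after a Greek label would need its own declared script.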
Like you already have with "l", "I" and "1"; or "O" and "0"; or "V" and "U", depending on the particular font you happen to use?
Phishing attacks mostly work not because people can't see a minute difference between two lookalike letters; they work because as long as nothing is utterly, obviously, grossly out of order, people just assume they're in the right place. You can have domain names that aren't even close to the real one, and websites with only superficial similarities to the original, and a lot of people will still be duped.
Trust the Computer. The Computer is your friend.
Will having non-ASCII data in FQDNs open us up to buffer-overflow attacks in various network-aware services?
It's true no man is an island, but if you take a bunch of dead guys and tie 'em together, they make a good raft.
This has actually been discussed to some extent for years. One method is to only allow domains to be registered or displayed in a single language character set, such that a domain name can use Latin characters or Greek characters, but not both. This can be enforced at registration or when displayed in the browser (the browser can highlight improper URLs). This does not prevent attacks where the entire spelling of the domain is available in an alternate character set. One solution is for the browser to somehow tell the user what language a URL is written in.
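A minimal sketch of the single-character-set rule, under stated assumptions: `scripts_in_label` and `is_single_script` are hypothetical helpers, and the script is again guessed from the first word of each character's Unicode name. A real registry or browser would use proper script data, and would need exceptions for languages such as Japanese that legitimately mix scripts.

```python
import unicodedata

def scripts_in_label(label):
    """Set of (roughly guessed) scripts used by the letters of one label."""
    scripts = set()
    for ch in label:
        if not ch.isalpha():
            continue  # digits and hyphens belong to no particular script
        name = unicodedata.name(ch, "")
        scripts.add(name.split(" ")[0] if name else "UNKNOWN")
    return scripts

def is_single_script(domain):
    """True if every dot-separated label uses at most one script."""
    return all(len(scripts_in_label(label)) <= 1
               for label in domain.split("."))

# Mixed Latin and Cyrillic in one label: rejected.
print(is_single_script("p\u0430ypal.com"))

# An all-Cyrillic label that resembles the Latin word "expo": accepted.
print(is_single_script("\u0435\u0445\u0440\u043e.com"))
```

The second call shows the gap described above: a name spelled entirely in another script's lookalike letters still passes the check.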
Here is a detailed description of how IE handles this, and also a W3C page discussing general techniques and different browsers. An interesting note is the possible use of the fraction slash to add fake URLs to a domain name. Of course, at the end of the day, standard phishing protection applies to domains which slip through the net.
Below is a quick copy and paste from one of my posts on DNForum regarding IDNs ... I own some IDNs and believe they have much potential, but there are still many unanswered questions...
...
... it's among the reasons that English dominates in some areas; some natives, even if they can understand a particular dialect, will sometimes speak a totally non-native language, such as English, instead to avoid risk of offending the other party. One can't assume one language dominates an entire region - languages can also overlap many areas ... it's one of the reasons some are pushing for language / culture based TLDs, such as .CAT (among the dumbest ideas ever, but that's another discussion for the .CAT thread running here on DNF).
... i.e. cafe.com versus café.com ... what happens? Will the IDN be highlighted / blocked by default? ... likely an easy UDRP target? ... introduction of a new IDN-specific dispute procedure? -perhaps there already is one?
... i.e. an IDN that is similar / identical to a trademark in another country ... less obvious, what about an IDN that translates to a trademarked word / phrase? -I believe there's a thread discussing such an issue now on one of the other boards here.
... how good / stable are the various language variant tables?
... does the current registrant get first dibs? ... even if yes, it may not be quite that simple if a character variant occurs in numerous permutations.
... probably not a biggie compared to some other issues, but one to be aware of.
... IDN resolution depends heavily on client-side APIs.
... I can easily envision scenarios in which a web browser and/or other applications (email, IM, etc) implement resolution differently ... i.e. adding and/or ignoring one or more valid language associations for a particular IDN / converting similar-looking western European characters to standard A-Z characters, etc. A related concern is language table management - I'm a little hazy on whether the tables will be internally stored by each app or remotely loaded for each session, etc.
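To make the cafe.com versus café.com and client-side conversion concerns concrete, here is a small sketch using Python's built-in `idna` codec (which implements the 2003 IDNA rules). `to_wire` and `fold_accents` are names invented for this example; `fold_accents` models a hypothetical client that strips accents before lookup, not any particular application.

```python
import unicodedata

def to_wire(domain):
    """ASCII form that would go into a DNS query, via the built-in codec."""
    return domain.encode("idna").decode("ascii")

def fold_accents(domain):
    """Hypothetical client behaviour: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", domain)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

plain = "cafe.com"
accented = "caf\u00e9.com"  # café.com

print(to_wire(plain))                   # cafe.com
print(to_wire(accented))                # xn--caf-dma.com
print(to_wire(fold_accents(accented)))  # cafe.com
```

The two spellings are distinct names on the wire, so they can have different owners. A client that folds accents would send the user to cafe.com when café.com was typed, which is the kind of inconsistency between applications described above.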
Excerpt from a post of mine on DNForum regarding IDNs:
http://www.dnforum.com/showthread.php?p=732080
I'm running into a lot of issues that many IDN folks aren't discussing - probably because they haven't considered them.
Various issues / threats / questions:
- The existence of numerous diverse dialects, even totally different languages, etc. in the same country
- An IDN that contains western European characters and very closely matches a non-IDN
- Trademark issues
- Language variant (more applicable to Asian languages, etc.) related issues
- What happens when a language variant table changes? -how are conflicts handled?
- What happens if a character variant (an IDN [IDL package] technically can comprise multiple character variants [code points]) is released?
- What happens if a reserved character variant is changed to a preferred character variant? - while such a change would have little to no effect on affected IDNs (IDL packages), it could result in the appearance of some IDNs changing
- How reliable, especially for those in languages with numerous character variants, will IDN domain resolution be?
- How well will IDN resolution APIs be regulated?
Rambling on, but there are a lot of things that one needs to be aware of with IDNs.
Would this lead to segregation of the internet into zones defined by the language used for the domain name? At the moment, I can access e.g. Japanese websites easily, even if the content of that site is in a language I don't understand [1].
If non-Roman domain names become popular, will I still be able to access them, or will they disappear behind untypeable URLs? A search engine may be able to mitigate this problem somewhat, but ATM I sometimes get search results for Japanese-language pages only because my search term is present in the URL.
1: yes, a site can still be useful in this case and no, despite the stereotype it's not just for porn.
Couldn't these linguistically heterogeneous domain spaces still be universally linked through romanization? I see one possible solution: an intermediary DNS conversion server; i.e. type "[those were supposed to be Japanese kanji].co.jp" and your DNS request is treated the same as "rakuten.co.jp". Beyond the inability to rake in tons of money for new registrations, what might be the disadvantages of such a system?
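One way to picture such a conversion server is as a plain alias table. Everything in this sketch is hypothetical: the table contents, the domain names, and the helpers `resolve_alias` and `collisions` are invented for illustration and do not describe any real registry or resolver.

```python
# Hypothetical alias table: native-script name -> romanized name.
ALIASES = {
    "橋.jp": "hashi.jp",   # "bridge"
    "箸.jp": "hashi.jp",   # "chopsticks", same romanization
    "桜.jp": "sakura.jp",  # "cherry blossom"
}

def resolve_alias(name):
    """Map a native-script name to its romanized form; pass others through."""
    return ALIASES.get(name, name)

def collisions(aliases):
    """Romanized names claimed by more than one native-script name."""
    by_target = {}
    for native, roman in aliases.items():
        by_target.setdefault(roman, []).append(native)
    return {roman: natives for roman, natives in by_target.items()
            if len(natives) > 1}

print(resolve_alias("桜.jp"))
print(collisions(ALIASES))
```

The collision check hints at one possible disadvantage: romanization is many-to-one, so distinct native-script names can end up competing for the same romanized name.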
Your mind is clear / The things that you fear / Will fade with how much you / Believe what you hear
Once again, committees lag behind actual problems and actual solutions.
Now if you'll excuse me I'll go back to browsing
(I seem to recall that
Whence? Hence. Whither? Thither.
Just about any e-mail service should enable the use of non-ASCII characters. Any halfway decent e-mail client will; if you're using Thunderbird or Mail or Pegasus, just set the character set to UTF-8; I believe Pine allows UTF-8 too. (Personally I can't imagine any reason for not using UTF-8 as default; I use it all the time, even though almost all of my e-mails are in English.) Most web interfaces allow it as well: Gmail certainly does, for example; I'm pretty sure Yahoo does.
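For anyone building a message in code rather than in a mail client, the same setting looks roughly like this with Python's standard `email` package. The addresses and text are placeholders made up for this example.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@example.com"   # placeholder address
msg["To"] = "bob@example.com"       # placeholder address
msg["Subject"] = "Grüße aus Köln"

# Declare the body as UTF-8 so mixed scripts survive in one message.
msg.set_content("Umlauts (ü), kana (かな) and Cyrillic (да) in one body.",
                charset="utf-8")

print(msg["Content-Type"])
print(msg.get_content_charset())
```

The library handles the header and body encoding itself; the only choice made here is the charset.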