International URLs Pass First Test
Off the Rails writes "The BBC reports on the results of a successful test of non-ASCII domain names on Internet-equivalent hardware (pdf) carried out last October. The next stage is to plug the system into the net, and if it still works, it could go live sometime next year. 'Early work on the technical feasibility of using non-English character sets suggested that the address system would cope with the introduction of international characters, but tests were called for to ensure this was the case ... Also needed are policy decisions by Icann on how the internationalised domain names fit in and work with the existing rules governing the running of the address books. Icann is under pressure to get the international domain names working because some nations, in particular China, are working on their own technology to support their own character sets.'"
Now I have to learn second languages to look at Asian porn.
In a world of acronyms, the words are the real victims.
Imagine all the new ways to spell bank0famerlca.com.
Non-ASCII? This is awesome! I can't wait for the ANSI addresses to start showing up.
I got dibs on sêx.com!
In my skim through the various links, I didn't see what they are proposing to do about practical real-world problems such as phishing. What are they going to do to ensure that a phisher doesn't register a domain with characters that look almost indistinguishable from different characters in a different language, so as to trick users into visiting the phisher's site instead of the legitimate version of the site?
While browsers can't even properly display non-English alphabets, this doesn't seem like a good idea. My native language contains many special characters, and I usually end up deciphering the emails my mom sends me, because along the way servers replace these characters with funny things.
I would bet the average German Internet user knows how to do that. It's pretty easy when the key is on your keyboard: http://carbon.cudenver.edu/~tphillip/GermanKeyboardLayout.html
That should be "mࡺcrosoft.com". Slashdot will probably need to be upgraded to support IDNs, it seems. :)
If your company/organisation/you have any international contacts, then you will NOT be using these international URLs. So you still need the old-style URLs, or you'll need to explain how to type those umlauts etc. into the URL on their national keyboard... not yours, which has them. And if you've done any support, you know how hard it is even to get someone to READ what's already on the screen...
And this is mostly for countries that don't use the same characters as English (Latin alphabet?), like Japan and China.
It can be cool (?).
-- Rastignac was here.
They are internationalized urls. If they were international urls I would be able to enter them in my browser without doing funky stuff.
Pardon my ignorance, but couldn't they have just thought of an encoding scheme? Similar to how certain characters are encoded in the path of a URL ("&"-style or "%20"-style). Possibly a more complicated scheme would have been necessary, but surely it would have been possible without changing the ASCII nature of domain names.
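The encoding scheme asked about here is close to what IDNA actually does: each non-ASCII label is rewritten into an ASCII "xn--" label using Punycode, so the DNS itself stays ASCII. A minimal sketch using Python's built-in `idna` codec, which follows the older IDNA 2003 rules (newer IDNA 2008 libraries differ in some details); the helper names are illustrative only.

```python
def to_ascii_hostname(host: str) -> str:
    """Convert a Unicode hostname to its ASCII-compatible ("xn--") form."""
    # The stdlib "idna" codec applies nameprep and Punycode label by label.
    return host.encode("idna").decode("ascii")


def to_unicode_hostname(ascii_host: str) -> str:
    """Convert an ASCII ("xn--") hostname back to Unicode for display."""
    return ascii_host.encode("ascii").decode("idna")


if __name__ == "__main__":
    wire = to_ascii_hostname("bücher.example")
    print(wire)                       # xn--bcher-kva.example
    print(to_unicode_hostname(wire))  # bücher.example
```

Only the browser or resolver library needs to know about the conversion; DNS servers see nothing but the ASCII form.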
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
Call them, say, "character sets".
Then only allow names and queries all from the same character set.
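A rough sketch of the "one character set per name" rule, assuming that the first word of each letter's Unicode character name (LATIN, CYRILLIC, GREEK, ...) is a good enough stand-in for its script. Python's standard library has no Unicode Script property, so this is a heuristic, not an actual registry policy.

```python
import unicodedata


def label_scripts(label):
    """Approximate the set of scripts used by the letters of a label.

    Heuristic: the first word of the Unicode character name stands in
    for the real Script property. Digits and hyphens are ignored.
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_single_script(label):
    """True if all letters in the label appear to come from one script."""
    return len(label_scripts(label)) <= 1


if __name__ == "__main__":
    print(is_single_script("paypal"))       # True
    print(is_single_script("p\u0430ypal"))  # False: U+0430 is Cyrillic
```

A real policy would need exceptions: Japanese names legitimately mix kanji (named "CJK ..."), hiragana and katakana, so some script combinations have to count as one "character set".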
This is just common sense -- there's no reason why Chinese, Greeks, and Russians should have to use a character set meant for the English language. But any given URL should have a language associated with it and any character in that URL not associated with its language should be color coded. So English language URLs would get "omicron" flagged while Greek URLs would get "O" flagged. The "default" language could be English so that existing URLs are unchanged, for other languages their ISO code could precede the URL. Now this particular scheme might have some fatal flaw but something similar ought to be workable.
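A sketch of the color-coding idea: given a label and the script its registrant declared (passed in here as a plain string, a hypothetical input since no such registration field is described), find the letters that don't belong. The first word of each Unicode character name is again assumed to approximate the script, and square brackets stand in for the red highlighting a browser might use.

```python
import unicodedata


def flag_foreign_letters(label, expected_script):
    """Return (index, char) for each letter not in the declared script.

    expected_script is the first word of a Unicode character name,
    e.g. "LATIN" or "GREEK" (a heuristic, not the Script property).
    """
    flagged = []
    for i, ch in enumerate(label):
        if ch.isalpha():
            script = unicodedata.name(ch, "UNKNOWN").split()[0]
            if script != expected_script:
                flagged.append((i, ch))
    return flagged


def render_marked(label, expected_script):
    """Wrap flagged letters in [brackets] as a plain-text stand-in for color."""
    positions = {i for i, _ in flag_foreign_letters(label, expected_script)}
    return "".join(f"[{ch}]" if i in positions else ch
                   for i, ch in enumerate(label))


if __name__ == "__main__":
    # Latin name with a Cyrillic "o" (U+043E) in position 4
    print(render_marked("micr\u043esoft", "LATIN"))  # micr[о]soft
```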
Also needed is automatic translation by, say, a Firefox extension, from the domain name's registered home language (if any) into the user's default language. How do you say "goatse" in Urdu?
A good complement to the new system to preempt the huge coming problem of "glyph masquerade" would be registrations including a list of the domain name translated into different languages. Or at least a declaration of the home language. Without enforcement (ICANN doesn't even enforce name/address veracity) it won't be proof of anything, but it would be a start. And 3rd party databases could include in trust ratings the completeness of the name entry, as well as cross-checks.
I'd like my GUI to at least indicate when a domain name is rendered in foreign glyphs, so I can try to tell whether it's really just foreign glyphs that look like a familiar English word, fooling me into clicking on something totally unrelated.
Opening the system to foreign scripts and languages will get even more worthwhile people and orgs onto the Net, so it's well worth the risks of misidentification. But the risks are real, and largely predictable. We should roll out the new, inclusive system with risk mitigations to welcome those new people with greater security.
--
make install -not war
Like you already have with "l", "I" and "1"; or "O" and "0"; or "V" and "U", depending on the particular font you happen to use?
Phishing attacks mostly work not because people can't see a minute difference between two lookalike letters; they work because as long as nothing is utterly, obviously, grossly out of order, people just assume they're in the right place. You can have domain names that aren't even close to the real one, and websites with only superficial similarities to the original, and a lot of people will still be duped.
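Whether or not lookalikes are the main attack, a registry or browser could at least catch the crude ones by comparing "skeletons": map every character to a representative lookalike and see whether two different names collapse to the same string. A minimal sketch with a tiny, hypothetical hand-picked table; a real system would use the full Unicode confusables data (UTS #39) instead.

```python
# Hypothetical lookalike table: each entry maps a character to a
# representative glyph. Real tables are far larger.
LOOKALIKES = {
    "1": "l", "i": "l", "|": "l",        # 1, i, l, | collapse to l
    "0": "o", "\u043e": "o", "\u03bf": "o",  # digit 0, Cyrillic o, Greek omicron
    "\u0430": "a",                       # Cyrillic a
    "\u0435": "e",                       # Cyrillic ie
}


def skeleton(name):
    """Lowercase a name, then replace each character by its representative."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in name.lower())


def looks_like(candidate, protected_names):
    """Protected names that differ from candidate but share its skeleton."""
    return [p for p in protected_names
            if candidate.lower() != p.lower()
            and skeleton(candidate) == skeleton(p)]


if __name__ == "__main__":
    print(looks_like("paypa1.com", ["paypal.com", "ebay.com"]))  # ['paypal.com']
    print(looks_like("bank0famerlca.com", ["bankofamerica.com"]))
```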
Trust the Computer. The Computer is your friend.
This really seems like a pretty minor issue to me. Browsers would just need to adopt a policy of flagging URIs with mixed language character sets, highlighting that character in red or something. More dangerous is the new domain land grab as companies grab legitimate domains in other languages that natives feel the real company simply must own, but which the parent company probably does not. This can be addressed by a certificate scheme that ties identity verification to the site, however, and such a scheme really needs to be implemented on a wide scale to deal with current security problems anyway.
Will having non-ASCII data in FQDNs open us up to buffer-overflow attacks in various network-aware services?
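Whatever a given service does internally, the names that reach the DNS are ASCII and bounded: RFC 1035 limits a label to 63 octets and a name to 255 octets on the wire (253 characters in the usual dotted text form). A sketch of checking a Unicode hostname's ASCII form against those limits with Python's `idna` codec; it says nothing about parsing bugs in any particular program that handles the Unicode form before conversion.

```python
def fits_dns_limits(host):
    """Check the ASCII (xn--) form of a hostname against RFC 1035 size limits."""
    try:
        ascii_host = host.encode("idna")
    except UnicodeError:
        # The stdlib codec itself rejects empty or over-long labels.
        return False
    name = ascii_host.rstrip(b".")  # ignore a trailing root dot
    labels = name.split(b".")
    return len(name) <= 253 and all(1 <= len(label) <= 63 for label in labels)


if __name__ == "__main__":
    print(fits_dns_limits("bücher.example"))        # True
    print(fits_dns_limits("a" * 64 + ".example"))   # False
```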
It's true no man is an island, but if you take a bunch of dead guys and tie 'em together, they make a good raft.
Below is a quick copy and paste from one of my posts on DNForum regarding IDNs ... I own some IDNs and believe they have much potential, but there are still many unanswered questions...
...
... it's among the reasons that English dominates in some areas; some natives, even if they can understand a particular dialect, will sometimes speak a totally non-native language, such as English, instead to avoid risk of offending the other party. One can't assume one language dominates an entire region - languages can also overlap many areas ... it's one of the reasons some are pushing for language / culture based TLDs, such as .CAT (among the dumbest ideas ever, but that's another discussion for the .CAT thread running here on DNF).
... i.e. cafe.com versus café.com ... what happens? Will the IDN be highlighted / blocked by default? ... likely an easy UDRP target? ... introduction of a new IDN-specific dispute procedure? -perhaps there already is one?
... i.e. an IDN that is similar / identical to a trademark in another country ... less obvious, what about an IDN that translates to a trademarked word / phrase? -I believe there's a thread discussing such an issue now on one of the other boards here.
... how good / stable are the various language variant tables?
... does the current registrant get first dibs? ... even if yes, it may not be quite that simple if a character variant occurs in numerous permutations.
... probably not a biggie compared to some other issues, but one to be aware of.
... IDN resolution depends heavily on client-side APIs.
... I can easily envision scenarios in which a web browser and/or other applications (email, IM, etc.) implement resolution differently, i.e. adding and/or ignoring one or more valid language associations for a particular IDN, converting similar-looking Western European characters to standard A-Z characters, etc. A related concern is language table management - I'm a little hazy on whether the tables will be stored internally by each app or loaded remotely for each session, etc.
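The cafe.com versus café.com case, and applications "converting similar-looking Western European characters to standard A-Z characters", can both be illustrated by accent folding: decompose each character and drop the combining marks. A minimal sketch; this folding is an illustrative heuristic, not part of any IDN standard, and letters with no decomposition (such as "ø" or "ß") pass through unchanged.

```python
import unicodedata


def fold_to_ascii(label):
    """Strip combining marks after NFKD decomposition: 'café' -> 'cafe'."""
    decomposed = unicodedata.normalize("NFKD", label)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def ascii_collisions(idn, existing_names):
    """Existing names that an IDN collapses to once accents are dropped."""
    folded = fold_to_ascii(idn)
    return [name for name in existing_names if name != idn and name == folded]


if __name__ == "__main__":
    print(fold_to_ascii("café.com"))                             # cafe.com
    print(ascii_collisions("café.com", ["cafe.com", "tea.com"]))  # ['cafe.com']
```

Two applications that disagree about whether to apply such folding would resolve the same typed name differently, which is the inconsistency worried about above.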
Excerpt from a post of mine on DNForum regarding IDNs:
http://www.dnforum.com/showthread.php?p=732080
I'm running into a lot of issues that many IDN folks aren't discussing - probably because they've not considered them
Various issues / threats / questions:
?? The existence of numerous diverse dialects, even totally different languages, etc. in the same country
?? An IDN that contains Western European characters that very closely matches a non-IDN
?? Trademark issues
?? language variants (more applicable to asian languages, etc) related issues
?? what happens when a language variant table changes? -how are conflicts handled?
?? what happens if a character variant (an IDN [IDL package] technically can comprise multiple character variants [code points]) is released?
?? What happens if a reserved character variant is changed to a preferred character variant? - while such a change would have little to no effect on affected IDNs (IDL packages), it could result in the appearance of some IDNs changing
?? How reliable, especially for those in languages with numerous character variants, will IDN domain resolution be?
?? How well will IDN resolution APIs be regulated?
Rambling on, but there are a lot of things that one needs to be aware of with IDNs.
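The variant-table questions above come down to combinatorics: if several characters in a label each have registered variants, the set of equivalent spellings is the Cartesian product of the per-character options. A minimal sketch with a hypothetical one-entry table (the simplified/traditional pair 国/國); real language variant tables are much larger and also mark which variants are preferred or reserved.

```python
from itertools import product

# Hypothetical variant table: character -> all interchangeable forms.
VARIANTS = {"国": ["国", "國"]}


def expand_variants(label, table):
    """All spellings of a label produced by swapping in variant characters."""
    options = [table.get(ch, [ch]) for ch in label]
    return ["".join(combo) for combo in product(*options)]


if __name__ == "__main__":
    print(expand_variants("中国", VARIANTS))        # ['中国', '中國']
    print(len(expand_variants("国国国", VARIANTS)))  # 8 = 2 ** 3
```

Every change to such a table changes the expansion for every affected registered name, which is why table updates raise the conflict questions listed above.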
http://www.145/|-|D07.org
Imagine it with different ANSI colors for each char.
Would this lead to segregation of the internet into zones defined by the language used for the domain name? At the moment, I can access e.g. Japanese websites easily, even if the content of that site is in a language I don't understand [1].
If non-Roman domain names become popular, will I still be able to access them, or will they disappear behind untypeable URLs? A search engine may be able to mitigate this problem somewhat, but ATM I sometimes get search results for Japanese-language pages only because my search term is present in the URL.
1: yes, a site can still be useful in this case and no, despite the stereotype it's not just for porn.
As far as I know, Japanese URLs have been working and in use for quite some time. I've visited several myself. Mind you, I'm surprised anyone in the anglophone sphere takes notice.
He who lights his taper at mine, receives light without darkening me.
Couldn't these linguistically heterogeneous domain spaces still be universally linked through romanization? I see one possible solution: an intermediary DNS conversion server; i.e. type "[those were supposed to be Japanese kanji].co.jp" and your DNS request is treated the same as "rakuten.co.jp". Beyond the inability to rake in tons of money for new registrations, what might be the disadvantages of such a system?
Your mind is clear / The things that you fear / Will fade with how much you / Believe what you hear
Your average Mac user does too, since it's just option-u, followed by the letter. It was similarly easy on the Psion Series 3, but it seems harder on some other operating systems.
I am TheRaven on Soylent News
And there are quite a few solutions to it. One of them (I think this is the one we're talking about) is converting the characters to ASCII and serializing them. Quite simple: let the browser do it.
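The "convert to ASCII and serialize" step is the Punycode algorithm of RFC 3492: copy the ASCII characters, then encode each insertion of a non-ASCII code point as a variable-length base-36 number. A from-scratch sketch of the encoder for a single label, for illustration only; it omits the nameprep step and the "xn--" prefix, and real code should use a tested library such as Python's built-in `punycode` codec.

```python
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128
DIGITS = "abcdefghijklmnopqrstuvwxyz0123456789"


def _adapt(delta, numpoints, first_time):
    """Bias adaptation function from RFC 3492, section 6.1."""
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)


def punycode_encode(text):
    """Encode one label per RFC 3492 (no nameprep, no 'xn--' prefix)."""
    codepoints = [ord(ch) for ch in text]
    output = [ch for ch in text if ord(ch) < 0x80]  # basic code points as-is
    handled = basic = len(output)
    if basic:
        output.append("-")
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    while handled < len(codepoints):
        m = min(cp for cp in codepoints if cp >= n)  # next code point to insert
        delta += (m - n) * (handled + 1)
        n = m
        for cp in codepoints:
            if cp < n:
                delta += 1
            elif cp == n:
                q, k = delta, BASE
                while True:  # emit delta as a generalized variable-length integer
                    t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
                    if q < t:
                        break
                    output.append(DIGITS[t + (q - t) % (BASE - t)])
                    q = (q - t) // (BASE - t)
                    k += BASE
                output.append(DIGITS[q])
                bias = _adapt(delta, handled + 1, handled == basic)
                delta = 0
                handled += 1
        delta += 1
        n += 1
    return "".join(output)


if __name__ == "__main__":
    print(punycode_encode("bücher"))  # bcher-kva
```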
Custom electronics and digital signage for your business: www.evcircuits.com
http://pi.cr.yp.to/
As a side note, it's interesting that Slashdot says this link is at cr.yp.to.
so how am i, on my gb keyboard, supposed to conveniently type in all sorts of foreign characters?
if there is going to be some traditional ASCII alternative url.. then just what are we doing?
i am all for versatility, but there is always talk about unification, this would just segregate the web into 'things i can type' and 'things i can't'
and considering that html is in american, and that most people take into account that english is a very common language when designing a page, are we not just creating some novelty, which after a while will annoy all but a few?
of course, dns is only a convenience anyway, we could solve all this and all start memorising ip addresses, especially when IPv6 should soon be in play. XD
May the Maths Be with you!
Prov 9:8 Do not rebuke mockers or they will hate you; rebuke the wise and they will love you.
It's worse than that, actually. Many codepages include double-byte versions of the ASCII characters that, for all intents and purposes, look IDENTICAL to your standard ASCII letters.
An example of this is Japanese's curious, and deprecated, half-width and full-width alphanumeric characters. Both of these replicate ASCII letters using different code values. So within Japanese alone, there are three distinct but identical-looking ways to display the letter "a" within a domain name. And other language codepages have this "feature" as well...
Short of being able to decipher raw bytes against a given encoding, you won't be able to tell where that link that says "ebay.com" will take you.
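For the full-width case specifically, Unicode compatibility normalization (NFKC) folds full-width letters back to plain ASCII, and the IDNA 2003 nameprep step includes NFKC, so Python's `idna` codec turns a full-width "ｅｂａｙ.com" into the ordinary "ebay.com". A sketch showing this; note that NFKC does nothing for cross-script lookalikes such as Cyrillic "а", so the broader concern stands.

```python
import unicodedata

# "ｅｂａｙ" written with full-width forms U+FF45, U+FF42, U+FF41, U+FF59
FULLWIDTH = "\uff45\uff42\uff41\uff59"


def compatibility_fold(label):
    """NFKC normalization: folds full-width letters to their ASCII forms."""
    return unicodedata.normalize("NFKC", label)


if __name__ == "__main__":
    print(FULLWIDTH == "ebay")                 # False: different code points
    print(compatibility_fold(FULLWIDTH))       # ebay
    print((FULLWIDTH + ".com").encode("idna")) # b'ebay.com'
    print(compatibility_fold("\u0430") == "a") # False: Cyrillic a is untouched
```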
Once again, committees lag behind actual problems and actual solutions.
Now if you'll excuse me I'll go back to browsing
(I seem to recall that
Whence? Hence. Whither? Thither.
If ASCII was good enough for the Apostles Peter and Paul then it ought to be good enough for everyone.
You are in a maze of twisty little passages, all alike.
Okay, I'll bite. I have what I think amounts to a fairly good, if basic, understanding of how internationalized character sets and encodings work, but I don't understand how you'd encode multiple character sets into one URL.
I mean, first of all, in order to use non-Latin characters at all, you have to have some way of transmitting which character set / codepage you want to use. I can't find any place in TFA where they actually describe how this is going to work (although I didn't read the PDF, so perhaps it's in there), but my assumption was that it would be transmitted outside the actual stream of bytes that represent the URL.
So, a "URL block" might consist of some metadata about the URL that's going to be transmitted -- e.g., what character set it's written with, etc. -- and then the stream of bytes that actually represent the address. Doing it that way would by definition only allow one character set per URL, because there's no way of changing it mid-stream.
Allowing people to change character sets in the middle of the address, so as to have an address where one part was written in ASCII or Latin-1, then another byte or two in UTF-8, and then the remainder in Latin again, would hugely complicate the standard from both an implementation and a use perspective.
As long as all the alternative (that is, alternative to ASCII) encodings include within them a minimalist Latin charset, enough so that you can type the ".com" and other TLDs, then there doesn't seem to be any reason to allow mixed-charset URLs.
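Under IDNA there is no per-URL character set to announce or switch: a label is simply a sequence of Unicode code points, all drawn from one code space, and the ASCII form carries no charset tag. That is why mixed-script labels are possible at all. A sketch using Python's `idna` codec; the `describe` helper is illustrative only.

```python
import unicodedata


def describe(label):
    """List each character with its code point and Unicode name."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in label]


mixed = "p\u0430ypal"        # Latin p, Cyrillic a, then Latin "ypal"
wire = mixed.encode("idna")  # one ASCII "xn--" label, no charset metadata

if __name__ == "__main__":
    for entry in describe(mixed):
        print(entry)
    print(wire)
    print(wire.decode("idna") == mixed)  # True: the code points round-trip
```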
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
I don't see this as being very popular. Does the average Internet user know how to get an umlaut to display?
Yeah. All those people of the world who speak languages that use those characters have no clue how to actually type them in. Are you freaking stupid?
Workaround
English is not my native language (as if you didn't notice), but I still think this is not exactly the best idea ever; I actually think it is pretty bad... Phishing has been mentioned, but it also seems like a huge overcomplication. Most sites (e.g. YouTube) already manage to survive with totally cryptic URLs, so I don't really think this is a problem at all.
Copyright infringement is "piracy" in the same way DRM is "consumer rape"
But at least I will be able to register my last name. It is nice to see the World Wide Web becoming more... world-centric.
Actually, as the abstract of the paper correctly states, it's about non-ASCII characters in TLDs. International characters already exist in domain names, as some posters have pointed out.
In this article they applied the same encoding used for domain names to TLDs, and they noticed it works fine. So to summarize, it's not about miçrósoft.com, it's about microsoft.çóm . That's much more fun!
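Mechanically, the ASCII-compatible encoding does not care which position a label occupies, so the top-level label is converted exactly like the second-level one. A sketch with Python's `idna` codec using the hypothetical "microsoft.çóm" from the comment; such a TLD would not actually resolve unless it were delegated in the root zone.

```python
def encode_labels(host):
    """ASCII (xn--) form of each label of a hostname, TLD included."""
    return [label.encode("idna").decode("ascii") for label in host.split(".")]


if __name__ == "__main__":
    # Hypothetical non-ASCII TLD from the comment above
    print(encode_labels("microsoft.çóm"))  # ['microsoft', 'xn--...']
```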
Like you already have with "l", "I" and "1"; or "O" and "0"; or "V" and "U", depending on the particular font you happen to use?
Indeed. This makes an existing problem much much bigger.
Phishing attacks mostly work not because people can't see a minute difference between two lookalike letters; they work because as long as nothing is utterly, obviously, grossly out of order, people just assume they're in the right place.
And what people see as "obviously out of order" changes as people learn about phishing. It's like counterfeiting: when the appearance of money changes, you get a period where lower-quality notes can be successfully "passed", and the Treasury makes an effort to get the word out ahead of time to make sure businesses at least are familiar with the new notes.
Similarly, people can learn not to be phished. That's why you have phishers hiding the address bar, emulating the address bar, creating addresses that try and push the "root" of the name off the address bar, creating addresses like "http://microsoft.com@192.168.1.1/security", and so on.
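The "http://microsoft.com@192.168.1.1/security" trick works because everything before the "@" in the authority part is userinfo, not the host. A sketch with Python's `urllib.parse.urlsplit` showing which part is actually the host to be contacted; the `real_host` helper name is illustrative.

```python
from urllib.parse import urlsplit


def real_host(url):
    """Return the hostname of a URL, ignoring any user:password@ prefix."""
    return urlsplit(url).hostname


if __name__ == "__main__":
    url = "http://microsoft.com@192.168.1.1/security"
    print(real_host(url))          # 192.168.1.1
    print(urlsplit(url).username)  # microsoft.com (just a "user name")
```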
Being able to have addresses that are visually identical but encoded differently is a real problem, and one that needs to be solved before IDNs are rolled out.
This really seems like a pretty minor issue to me. Browsers would just need to adopt a policy of flagging URIs with mixed language character sets, highlighting that character in red or something.
It's not a minor issue, and it's not an insoluble issue, but it's one that needs to be positively and aggressively addressed.
And it's not just browsers: you need to flag these characters in any application that renders internationalized text with or without HTML being an intermediary. Alternatively, registered domains can be restricted to distinct subranges of Unicode, so that you couldn't (for example) register a second level domain containing glyphs outside a single national character set.
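The registry-side alternative, restricting each second-level domain to one national subrange of Unicode, amounts to a per-table whitelist of code point ranges. A minimal sketch with two hypothetical, deliberately simplified tables (unaccented lowercase Greek and Latin letters plus digits and hyphen); real registry tables are larger and language-specific.

```python
# Hypothetical per-registry tables of allowed code point ranges (inclusive).
ALLOWED_RANGES = {
    "greek": [(0x03B1, 0x03C9), (0x0030, 0x0039), (0x002D, 0x002D)],  # α-ω, 0-9, -
    "latin": [(0x0061, 0x007A), (0x0030, 0x0039), (0x002D, 0x002D)],  # a-z, 0-9, -
}


def registrable(label, table_name):
    """True if every character of the label falls inside the named table."""
    ranges = ALLOWED_RANGES[table_name]
    return bool(label) and all(
        any(lo <= ord(ch) <= hi for lo, hi in ranges) for ch in label
    )


if __name__ == "__main__":
    print(registrable("\u03b5\u03bb\u03bb\u03b1\u03c2", "greek"))  # True
    print(registrable("\u0435bay", "latin"))  # False: Cyrillic ie (U+0435)
```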
The point is, this test is just verifying that an issue that nobody really thought was going to turn out to be a problem is, in fact, not a problem. It doesn't mean that widespread use of IDNs should be considered imminent.
The day that goes on-line I'll be able to filter scads of spam simply by refusing to resolve international domain names. Woot!
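The proposed filter is easy to express, since every IDN label travels in DNS as an ASCII label beginning with "xn--". A sketch of a hypothetical local resolver policy that refuses such names unless explicitly allowed; whether this is a good anti-spam rule is the commenter's bet, not something the code establishes.

```python
def is_idn(host):
    """True if a hostname is internationalized, in Unicode or "xn--" form."""
    if not host.isascii():
        return True
    return any(label.lower().startswith("xn--")
               for label in host.rstrip(".").split("."))


def should_resolve(host, allow_idn=False):
    """Hypothetical policy: refuse to resolve IDNs unless allowed."""
    return allow_idn or not is_idn(host)


if __name__ == "__main__":
    print(should_resolve("example.com"))  # True
    print(should_resolve("xn--gba.com"))  # False
```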
I felt a great disturbance in the Net, as if millions of DNS servers suddenly cried out in terror.
Intron: the portion of DNA which expresses nothing useful.
Given the number of passwords that the average person who does a lot of stuff online needs to remember, unless they're doing something hideously insecure already (like using the same password everywhere), they can probably only sign on from a single computer anyway, because that's where their passwords are stored or written down.
The problem of certificate management is, IMO, actually more tractable than the problem of password management. There are lots of ways that you could allow people to move certificates around, if you really wanted to; you could issue USB sticks or smartcards that they could jack in to public machines (although preferably you'd create some method that never actually let the insecure machine 'see' the certificate itself; you'd just do some sort of challenge/response with the USB key or smartcard).
Passwords really aren't all that convenient; if you're using passwords properly (not reusing the same ones in multiple places), and you're not using a crutch like iterative generation, or just writing the things down (which basically makes it a very insecure "analog certificate"), you're probably way out on the tail-end of the bell curve of what a normal person can remember. Passwords are only "user friendly" because the way that most people use them is hideously insecure.
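The challenge/response idea, where the public machine never sees the secret, can be sketched with a keyed MAC: the server sends a fresh random challenge, the token answers with a MAC over it, and only the answer crosses the untrusted machine. For brevity this sketch uses a shared-secret HMAC; real smartcards typically sign the challenge with a private key instead, and the `HypotheticalToken` class is purely illustrative.

```python
import hashlib
import hmac
import secrets


class HypotheticalToken:
    """Stands in for a smartcard/USB key: holds a secret, only returns MACs."""

    def __init__(self, secret):
        self._secret = secret

    def respond(self, challenge):
        return hmac.new(self._secret, challenge, hashlib.sha256).digest()


def verify(secret, challenge, response):
    """Server side: recompute the MAC and compare in constant time."""
    expected = hmac.new(secret, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


shared = secrets.token_bytes(32)
token = HypotheticalToken(shared)

if __name__ == "__main__":
    challenge = secrets.token_bytes(16)  # fresh nonce for each login
    print(verify(shared, challenge, token.respond(challenge)))  # True
```

Because each challenge is fresh, a response captured on a compromised public machine is useless for the next login.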
It is the key of the vowel you are most likely to use the umlaut with. Typing ü involves holding option, tapping u, releasing option, and tapping u again. Option-e does the same thing for an accent like this é.
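However an input method produces "ü", it may arrive either as one precomposed code point (U+00FC) or as "u" followed by a combining diaeresis (U+0308). Unicode normalization makes the two identical, and the IDNA 2003 nameprep step normalizes, so both spellings yield the same domain name. A sketch with Python's `unicodedata` and `idna` codec.

```python
import unicodedata

precomposed = "b\u00fccher"  # ü as a single code point (U+00FC)
decomposed = "bu\u0308cher"  # u followed by COMBINING DIAERESIS (U+0308)

if __name__ == "__main__":
    print(precomposed == decomposed)                                # False
    print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
    print(decomposed.encode("idna"))                                # b'xn--bcher-kva'
```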
True confidence comes not from realising you are as good as your peers, but that your peers are as bad as you are.
Firefox was ever so nice as to convert that into Punycode for me: xn--gba.com
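The conversion Firefox performed can be reproduced with Python's `idna` codec, in both directions; a client that only understands ASCII names can therefore still reach such a site by typing the "xn--" form, as the IE experience above suggests.

```python
def to_ascii(host):
    """Unicode hostname -> ASCII (xn--) form."""
    return host.encode("idna")


def to_unicode(ascii_host):
    """ASCII (xn--) hostname -> Unicode form for display."""
    return ascii_host.decode("idna")


if __name__ == "__main__":
    print(to_ascii("©.com"))          # b'xn--gba.com'
    print(to_unicode(b"xn--gba.com")) # ©.com
```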
Strangely, accessing ©.com in IE directed me to an advertisement for VeriSign's IDN client software. xn--gba.com works just fine in IE though.
This could be automated so there's not really any problem with the phishing stuff, IMHO.
Not if it's actually implemented.
But given some of the ratbags running domain registrars, you think they'll bother?
the Japanese set contains the full alphanumeric alphabet
There are always a few special cases. You just deal with them... for example, deny names using just those characters.
Well, the people who speak that language can type in the url, but probably nobody else in the world can. (It would probably lead to all foreign sites having to set up one server with an English domain name and one with a local domain name.) Heck, I'm not even positive I can type in an "accent grave" in this Slashdot message and get it to actually work. As for things like Khmer Unicode, which is poorly supported even inside Cambodia, lotsa luck.
Also, a lot of people don't know how to set up a computer for their own language, so they're stuck if they need to use a computer in a foreign country. (Even if the computer doesn't need admin rights to change the setup.)
How will they translate "www."?
How then are English visitors supposed to visit one of these sites? Purely by links?
:), and not one of these poncy half wars, where we stop fighting before we've won and 'peacekeep' for the next century. At least then there would only be the conquering language to speak.
Although I can kinda see the point, I can't see how this will work...all I can see is the internet fragmenting, which seems to be against the whole spirit of things!
For those who don't see why someone who can't read the language would want to visit the site... the reason is simple: pictures tell a thousand words. Secondly, technology and science are often language-independent, so the specs on a Korean phone site are useful.
What we really need is a massive all-out war
----- I refuse to have an argument with an unarmed person