Spoofing URLs With Unicode
Embedded Geek writes: "Scientific American has an interesting article about how a pair of students at the Technion-Israel Institute of Technology registered "microsoft.com" with Verisign, using the Russian Cyrillic letters "c" and "o". Even though it is a completely different domain, the two display identically (the article uses the term "homograph"). The work was done for a paper in the Communications of the ACM (the paper itself is not online). The article characterizes attacks using this spoof as "scary, if not entirely probable," assuming that a hacker would have to first take over a page at another site. I disagree: sending out a mail message with the URL waiting to be clicked ("Bill Gates will send you ten dollars!") is just one alternate technique. While security problems with Unicode have been noted here before, this might be a new twist."
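For anyone who wants to see the homograph effect for themselves, here is a minimal sketch. It assumes the spoof swaps the Latin "c" and "o" in the "microsoft" label for Cyrillic "с" (U+0441) and "о" (U+043E); the article does not spell out exactly which positions the students changed, so the `spoof` string below is only an illustration.

```python
import unicodedata

latin = "microsoft.com"

# Assumed spoof: replace Latin c/o in the second-level label with
# Cyrillic U+0441 and U+043E. The TLD is left untouched.
label, tld = latin.split(".")
spoof = label.replace("c", "\u0441").replace("o", "\u043e") + "." + tld

def non_ascii_names(name):
    """Return the Unicode names of every non-ASCII character in name."""
    return [unicodedata.name(ch) for ch in name if ord(ch) > 127]

print(latin == spoof)          # the two strings are different domains
print(non_ascii_names(spoof))  # shows which characters are not Latin
```

The two strings render the same in most fonts, but they compare unequal and would resolve as different domains.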
It is widely used on Russian-language IRC
networks like RusNet. http://www.irc.net.ru/
Anyone else remember using alt+255 and other special characters to make hard-to-open directories (idiot-proof, anyway) on shared command-line systems?
You were eaten by a grue.
At the moment these Unicode domain names will not be displayed correctly by web browsers; instead you will see a bunch of confusing control codes, so this threat isn't really a problem yet.
Of course, the underlying problem is that DNS is an ugly kludge which has long outgrown itself. The administrative cost of constructing a massive global namespace is vast, and we can all see the opportunities for cyber-squatting it creates, to the detriment of the public interest.
These days I am more likely to go to Google and type in a few words than to try to guess the URL. The task of finding the website you are interested in should be left to the specialists (like Google and other search engines); we shouldn't try to maintain an ugly, broken, monopolistic, and expensive "first come, first served" architecture like DNS.
There is no good reason why a web user should ever need to see a URL (except perhaps momentum), any more than they need to see the HTML which makes up a document.
I believe it would be something along the lines of .
One way to control this would be to restrict the valid characters based on the TLD.
...
.com/.org/.net as ASCII; although they are meant to be global, they are based on the Latin character set.
So for example '.uk'/'.au'/'.us' etc. can ONLY have ASCII 2nd level domains. '.de' can only have German characters, '.fr' only French, and so on.
Then for completely different character sets, you have new Unicode TLDs (Arabic, Greek, Chinese), which can only have their relevant characters.
I guess you leave
Of course, this adds complexity - but you can do all the testing for validity when the domain is registered (i.e. a web client can request any URL, but dodgy mixed character set domain names cannot be registered).
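A registration-time check for "dodgy mixed character set" names could look roughly like the sketch below. Python's standard library has no script property, so this guesses the script from the first word of each letter's Unicode name; that heuristic, and the function names `scripts_in` and `registrable`, are my own assumptions and not anything a registrar actually runs.

```python
import unicodedata

def scripts_in(label):
    """Rough script guess: the first word of each letter's Unicode name
    (e.g. LATIN, CYRILLIC, GREEK). Digits and hyphens are ignored."""
    found = set()
    for ch in label:
        if ch.isalpha():
            found.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return found

def registrable(label):
    """Accept a label only if all of its letters come from one script."""
    return len(scripts_in(label)) <= 1

print(registrable("microsoft"))                  # pure Latin
print(registrable("mi\u0441r\u043es\u043eft"))   # Latin mixed with Cyrillic
```

Note that this only catches mixed labels. A name written entirely in Cyrillic look-alikes would still pass, which is where the per-TLD character restriction would have to do the work.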
... so it seems safe to say that trust is the foundation of their business. Essentially, we trust Verisign to ensure that we're communicating with whom we think we're communicating, and to protect us from various forms of spoofing. They should therefore, IMHO, actively avoid even the appearance of impropriety.
However, we all remember the Microsoft certificates they mistakenly gave out to a third party.
Now we've got them registering another domain to someone that looks just like "microsoft.com." While it's tempting to absolve Verisign of guilt in this, I think they were asking for it. After all, even I thought of this possibility when I first heard about Unicode domain names, and I'm not the sharpest knife in the drawer. You've got to think someone at Verisign raised the possibility, but they chose not to deal with it.
Again, one might be tempted to say that this isn't their problem, if not for the fact that they are in the trust business. As the article says, "Certification agencies (which include VeriSign) ensure that encoded names are not misleading and that the registration corresponds with the correct real-world entity." It should not be technically difficult, for instance, to build a set of lists of visually similar Unicode characters and to refuse to register domains visually identical to existing ones. Maybe they should decide to forgo a relatively small amount of revenue and to refuse to sully their reputation with such inevitably deceptive domain registrations, especially considering that they interfere with Verisign's core business.
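The "lists of visually similar characters" idea can be sketched in a few lines. The table below is a tiny hand-made sample that I put together for illustration (five Cyrillic letters that resemble Latin ones); a real registrar would need a far larger, vetted table, and the names `CONFUSABLES`, `skeleton` and `may_register` are hypothetical.

```python
# Hand-made sample table (an assumption, nowhere near complete):
# Cyrillic look-alikes mapped to the Latin letters they resemble.
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
}

def skeleton(name):
    """Fold each character to its look-alike so that visually
    identical names end up as the same string."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in name.lower())

def may_register(candidate, existing):
    """Refuse a candidate whose skeleton matches an existing domain."""
    taken = {skeleton(name) for name in existing}
    return skeleton(candidate) not in taken

registered = ["microsoft.com"]
print(may_register("mi\u0441r\u043es\u043eft.com", registered))
print(may_register("example.com", registered))
```

Comparing skeletons instead of raw strings means the check does not care which of the look-alike characters was used, only what the name looks like.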
Of course, none of this compares to the letters they sent out trying to fool people into switching their domains over to Verisign. The other two were negligence and foolishness, but that was an active attempt to deceive from a company that's selling trust.
It all leaves me in a bit of shock. It's not that I'm shocked to see a company doing stupid and deceitful things; it's that trust is Verisign's primary asset. Hearing about these (colossally, in my mind) stupid decisions is like hearing that GM decided to torch all its manufacturing plants and assasinate all its employees. It leaves me with two questions: "what they hell are they thinking?" and "why does anyone continue to do business with Verisign?"
Domain spoofing is one are. But what if you see an email address on a business card, say @mirsft.com? How do you know what encodings are those 'c', 'a' and 'o' are in (for those with UNICODE brain-damaged browsers the address above should look like ca@microsoft.com)? Same goes for URLs, etc. Another option -- say a Swedish company registers an URL that perfectly represent the name of the comapny in Swedish. With all those umlauts and whatever-they-are-called-those-circles-over-A. And you are sitting there with a US_en keyboard -- how are you expected to type that URL into a location field in your browser?
For the use-cases like this I think that multilingual URLs are a Bad Idea (TM).
--AP
but how can you (or an average user) send an email to, say, müller@müller.de using an English keyboard?
I think we should stick to ASCII.
In Windows (the EU edition) - anyone. Just add the language. Your only problem is that the idiots in Redmond have yet to add a keyboard editor (something that has been present in all third-party internationalisation packages since Windows 3.10). As a result you will be stuck with some extremely obscene keymap inherited from a Cyrillic typewriter. Alternatively you can pick up DLLs from third-party cyrillisation packages made for older Windows versions and violate the sanctity of the MSFT certificate by slapping them on top of the current ones. It usually works. And you get a proper keymap.
Under Unix it is usually a bit more of a p*** in the a*** because most internationalisations rely on Xmodmap and it no longer works nowadays. Once again, by default you will get stuck with something you cannot use unless you have a keyboard that is engraved with the alternate characters. Once again you will need to spend half an hour with vi, swearing at whoever made Xmodmap stop working, in order to get a less obscene keymap.
Baker's Law: Misery no longer loves company. Nowadays it insists on it
http://www.sigsegv.cx/
This was only true in Western Christendom and then only true to a limited extent. For example, in the west, the first Christian missionaries to the British Isles translated the service books of the early Church to Gaelic and other Celtic languages. In the east, the generally accepted practice was to use the vernacular. This is why some of the oldest extant copies of the Bible are in one of the Ethiopic languages, Coptic, Syriac, etc.
The Roman canon that the liturgy could only be practiced in one of the tongues spoken by the apostles was of relatively late invention and only applied to congregations under the sole apostolic see of the west, Rome. Congregations under the apostolic sees of the east always used the vernacular.
Hence it is somewhat ironic that many eastern Churches refuse to update the liturgy from being in liturgical Greek or old Slavonic into their modern equivalents.
Regards,
-l
For many language encodings the conversion to Unicode is a one-way ticket: there is no round trip possible, so you sometimes lose critical information about the characters.
It's also disappointing that the Unicode forum dropped their official JISUTF tables. There is no longer any official translation table from Japanese encodings to Unicode. It's the wild west for Asian languages in Unicode (ever wonder why no Asian data systems use Unicode?)