Spoofing URLs With Unicode
Embedded Geek writes: "Scientific American has an interesting article about how a pair of students at the Technion-Israel Institute of Technology registered "microsoft.com" with Verisign, using the Russian Cyrillic letters "c" and "o". Even though it is a completely different domain, the two display identically (the article uses the term "homograph"). The work was done for a paper in the Communications of the ACM (the paper itself is not online). The article characterizes attacks using this spoof as "scary, if not entirely probable," assuming that a hacker would have to first take over a page at another site. I disagree: sending out a mail message with the URL waiting to be clicked ("Bill Gates will send you ten dollars!") is just one alternate technique. While security problems with Unicode have been noted here before, this might be a new twist."
What are InterNIC and the like doing in the meantime to help prevent spoofs such as this? The legal ramifications of this are interesting. One could also post stories with false links that most people would never even realize weren't true.
There is a key failure. If someone tried to copy and paste the text into the URL and they weren't using the trick language, it wouldn't work. However if there is a link that says "microsoft.com" then that could send you to a different page. And as everyone knows, people are much more likely to click a link than to copy & paste it in the address bar.
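For anyone who wants to see the mismatch rather than take it on faith, here is a minimal Python sketch. It assumes the spoof swaps every Latin "c" and "o" for Cyrillic U+0441 and U+043E (the article only says those two letters were used). The names LATIN, SPOOF and describe_non_ascii are made up for illustration.

```python
import unicodedata

LATIN = "microsoft.com"
# Assumption: every Latin "c" and "o" is replaced by its Cyrillic look-alike.
SPOOF = LATIN.replace("c", "\u0441").replace("o", "\u043e")


def describe_non_ascii(text):
    """Return (character, Unicode name) pairs for every non-ASCII character."""
    return [(ch, unicodedata.name(ch)) for ch in text if ord(ch) > 127]


# The two strings render alike in many fonts but are different code points.
print(LATIN == SPOOF)  # False
for ch, name in describe_non_ascii(SPOOF):
    print(repr(ch), name)
```

This is also why typing the name by hand on a Latin keyboard lands on the real site, while a prepared link does not.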
Only dead fish swim with the stream...
I develop applications for a DSP company, and we've recently switched to using Unicode in our products. Unicode certainly has its quirks, and this is one of the more obvious ones. I fail to see why it has been implemented so widely, without very, very rigorous testing.
Actions like the one described in this article could bring down a company, if a person tried hard enough. Of course, Microsoft could just call Verisign and ask them to remove the Cyrillic domain, with no problems. But, for a small company, it could be hell. An entire user group using the same character set to access a certain website would be sent to a different site. In a worst case scenario, anti-company propaganda might be posted on the spoofing site, and it would deter people from visiting the "real" site in the future.
The only solution I can imagine is to simply prevent the translation of characters among character sets, especially in this sort of environment.
A Russian site, such as The Moscow Times, could have its site spoofed in exactly the same manner, and everyone using the Cyrillic character set (obviously, widely used in Russia, for example) would be sent to some other site, possibly indefinitely, knowing how registrars have been acting lately. This would create havoc for the newspaper and significantly hurt revenue.
OS/2 - because choice is a terrible thing to waste.
That is false. Russian people had an alphabet long before Cyrillic. Incidentally, that should really be proto-Russian, or Eastern Slavic, since the people diverged into Russian, Ukrainian, and Belorussian much later.
So it could be said that "Russian Cyrillic" is redundant.
It is not. There are several "dialects" of the Cyrillic alphabet. They are mostly the same but a few letters are different. I already mentioned three of them above. There's also Bulgarian, Serbian, and I'm not sure what else.
I seriously doubt the "c" and "o" characters mentioned in the article are unique to the K018R charset.
The charset is called KOI8-R. Or are you using the l33t sp3lling?
___
If you think big enough, you'll never have to do it.
From the article:
...
But are international domain names even necessary? Kuhn, who is German, doesn't think so: "Familiarity with the ASCII repertoire and basic proficiency in entering these ASCII characters on any keyboard are the very first steps in computer literacy worldwide."
That's like saying basic numeracy is the first step for computer literacy worldwide, so we should go back to using IP addresses!
Currently, email addresses and URLs are the only reason a native Chinese speaker needs to use ASCII. For someone from Germany, ASCII is pretty easy to handle, but for a lot of languages, Unicode URLs and email addresses are very necessary.
you'd need an accent mark over the 'o' to keep it from sounding like an 'a', fyi. other than that, you're absolutely correct.
If there is a demand for a service which locates the authoritative websites of corporations, then capitalism will provide. This is a lame argument specific to the way Google happens to work.
... It can't be done.
If there is a demand for something that we already have at this time, for free and with no effort? In other words, you would like it if I paid for something I already get now for free... well, if you can't find a good business model, why not create an artificial one?
What about the cyber-squatting, cost, and creation of private monopolies? DNS is an ugly, ugly solution to the problem of finding IP addresses.
Cyber-squatting is simple. Outlaw domain parking, domain transfers, false advertising (which is what registering www.books.com and pointing it at a porn site is), and enforce trademarks. If you want a domain, then use it. Use it for something other than pointing yet another name at your lame web site. Only allow registrations and de-registrations... if someone wants to try and sell the domain and someone else wants to pay money for it fine. But they don't get it, it just goes back into the unregistered pool. And if someone has a valid trademark (microsoft is valid, computers.com isn't) by all means give it back to the trademark holder. Duh. DNS is pretty handy for finding IP's, actually. It just isn't as good at making websurfing as effortless as you'd prefer. Or for keeping people from being assholes and polluting the namespace, I should add.
Market forces will create a demand for comprehensive search-engines which aren't biased, in fact, they already have.
Dumbass. On a fresh install of the browser of your choice (or lack thereof), you can't get everywhere you want to go only by clicking links. If the URL field is hidden or disabled, which you advocate, you'll be reduced to clicking a toolbar button or a pre-loaded bookmark. I'm sure one such will be a search engine... but with M$ can you count on its integrity?
What the hell are you ranting about? This has nothing to do with whether your ISP supports CGI.
So sorry, I thought you might have the ability to understand non-monosyllabic words. Let me try again...
I-S-P bad. No like us have nice web names. Must use bad homepage **DAMN*
I'm tired, so I'll try to make this clearer. If users are only ever allowed to use crappy homepage webspace, of course half the URLs on the net will be long and ugly. I also failed to mention that many commercial sites have bad web design... this accounts for the other half being ugly.
And if I got off on a rant, so what? I see someone like you talk out of your ass, I become a little bit upset. Well, guess what? If you want to add another protocol, pick a port number and get to work. I won't stop you. But stop ranting yourself about how the current ones are ugly, when you have no clue why they are even like they are.
DNS isn't broken, and it isn't ugly. As a protocol, it is highly distributable, robust, and solves the IP-to-human-readable-name problem as well as anything that has ever been published. It is the foundation of many protocols and services available on the internet, only one of which is the web. We don't need a separate, incompatible system for the web, and you've offered nothing that would suffice for anything but that, and even then only poorly.
Whew, good thing you caught it in time! Don't worry, the credit card companies can take care of it, no worries, just enter your name,credit card number, social security number, and mother's maiden name at each of the following URLs:
(Those all use "ell" instead of "eye" when possible; they look exactly the same with my fonts. Since there are already "homographs" in plain ASCII, since JavaScript mouseovers can be used to change the browser status area, and since many people don't even fully understand the difference between "microsoft.com" and "microsoft.evil.com", this Unicode trick is nothing to worry (more) about!)
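The plain-ASCII version of the trick can be caught with a rough "fold the look-alikes and compare" check. This is only a sketch: the FOLD table and the skeleton and looks_like helpers are hypothetical names, and the table covers just the handful of pairs that come up in this thread (capital "eye" and the digit one versus "ell", zero versus "oh").

```python
# Hypothetical folding table: glyphs that many fonts draw alike map to one
# representative character. A real table would be far larger.
FOLD = str.maketrans({"I": "l", "1": "l", "0": "o", "O": "o"})


def skeleton(hostname):
    """Fold look-alike ASCII characters, then lowercase the result."""
    return hostname.translate(FOLD).lower()


def looks_like(candidate, trusted):
    """True when candidate is a different string that is drawn like trusted."""
    return candidate != trusted and skeleton(candidate) == skeleton(trusted)


print(looks_like("paypaI.com", "paypal.com"))        # True
print(looks_like("micros0ft.com", "microsoft.com"))  # True
print(looks_like("paypal.com", "paypal.com"))        # False
```

It does nothing about "microsoft.evil.com", which is a different problem: the name is spelled honestly and the user simply reads it wrong.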
Solution: Make browsers default to displaying links to sites with non-ASCII addresses differently from regular links
Also, since link display may be overridden by style sheets, make the browser override style sheets for these links.
Display a warning when the user follows one of these links
If this warning is displayed as a popup and the user checks "never show this warning again", display text that explains why this is a bad idea
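The detection half of that proposal is small. A minimal sketch in Python, assuming the browser has the link target as a string: needs_warning is a hypothetical name, and treating "xn--" labels (the ASCII-compatible encoding of internationalized names) as suspicious is my own addition, not part of the proposal above.

```python
from urllib.parse import urlsplit


def needs_warning(url):
    """Return True when the URL's hostname is not plain ASCII.

    Assumption: a label starting with "xn--" is also flagged, since it is
    the ASCII-compatible form of an internationalized label.
    """
    host = urlsplit(url).hostname or ""
    if any(ord(ch) > 127 for ch in host):
        return True
    return any(label.startswith("xn--") for label in host.split("."))


# The second "o" below is Cyrillic U+043E, written as an escape for clarity.
print(needs_warning("http://www.micr\u043esoft.com/"))  # True
print(needs_warning("http://www.microsoft.com/"))       # False
```

How the link is then styled, and how loudly the popup nags, is left to the browser.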
The only true way to security is to annoy your users into submission
- We are the slashdot. Resistance is futile. Prepare to be moderated -
Isn't the point of the article that now you can go to a Verisign-approved website for (unicode of some big company) and have it check out properly because there is a Verisign cert for the site (unicode of some big company)?
:)
People now seem to be good at knowing that if you get funny pop ups about self signed certs or certificates not matching the url that they don't put in their credit card number... now suddenly that doesn't apply, because you won't get that, and the differences aren't as obvious as those for something like paypaI.com or micros0ft.com
I'm trying not to sound like a linguistic elitist by any means, but can anyone really say that we shouldn't standardize on English/ASCII?
The 5 billion people in the world who don't have English as their native language might. Some would argue that language is a cornerstone of culture, and that when a society loses its language, it loses a significant part of its culture. I've read parts of Shakespeare in German, and was very unhappy about the destruction of the writing. I know several poets of my native tongue (Poe, in particular) would be lost completely in translation. I have no interest in condemning other people to reading the great literature of their cultures in translation.
In any case, ASCII isn't good enough for English writing. French accents are used in English writing, as well as the ae and oe ligatures. Even in modern writing, proper quotes and apostrophes are needed, and footnote daggers often show up in English writing. For specialized work, mathematics, linguistics (even of English), historical English writing and APL all have their own body of characters outside ASCII that need to be supported.
So... you can't respect other people's personal decisions on spirituality? Granted, the 900-numbers are gimmicky. But why should Astrology books be discredited as non-sense? Most mature people respect other's religious beliefs.
Although Astrology isn't a religion, it is faith-based, as religion is. Is Astrology scientific? No. Neither is the Bible (etc.). You might as well have worded that sentence to say "hell, astrology, christianity, and paganism still sell books...".
All I ask is that you respect other people's personal spiritual beliefs, whether that involves Astrology, Judaism, Wicca, or what have you. An exception is when you're discussing/debating spirituality or religion, but this isn't the case.
I don't believe in Christianity, but I don't attack a Christian's personal beliefs because I don't agree with them. I expect others to respect my personal beliefs the same way.