Spoofing URLs With Unicode
Embedded Geek writes: "Scientific American has an interesting article about how a pair of students at the Technion-Israel Institute of Technology registered "microsoft.com" with Verisign, using the Russian Cyrillic letters "c" and "o". Even though it is a completely different domain, the two display identically (the article uses the term "homograph"). The work was done for a paper in the Communications of the ACM (the paper itself is not online). The article characterizes attacks using this spoof as "scary, if not entirely probable," assuming that a hacker would have to first take over a page at another site. I disagree: sending out a mail message with the URL waiting to be clicked ("Bill Gates will send you ten dollars!") is just one alternate technique. While security problems with Unicode have been noted here before, this might be a new twist."
So, what would be the Cyrillic for Slashdot.org?
It is widely used on Russian-language IRC networks like RusNet. http://www.irc.net.ru/
Anyone else remember using Alt+255 and other special characters to make hard-to-open directories (idiot-proof, anyway) on shared command-line systems?
You were eaten by a grue.
It's important because at least with yahii.com you know that there's something wrong with the address, i.e., the typo.
But with this Unicode spoof, you could go to a site you think is legitimate, and you'd have no way of knowing it's not.
"Derp de derp."
Should I be concerned?
What are InterNIC and the like doing in the meantime to help prevent spoofs such as this? The legal ramifications of this are interesting. One could also post stories with false links that most people would never even realize were fake.
When you pay money, say with paypal.com, you always want to check the URL. Of course someone could have a fake link like "click here to pay with paypal" and then redirect you to their bogus site with the intention of stealing your passwords. But it would be fairly obvious from the location bar in the browser that the URL was not paypal.com. If Unicode can be used to spoof the location bar, though, it will rope in even cautious users.
I recently received an email from a confused user who had received an email that appeared to be from Apple, and was selling Apple products using Apple logos, Apple website concepts and images, etc., but was not from Apple. He didn't sign up for the list, and though it appeared to be a legitimate Apple affiliate as far as I could tell (though perhaps one that used somewhat shaky methods to reach customers), he was confused why Apple was sending him email that he didn't ask for. It was his belief that the mail had actually come from Apple, because it looked like it was from Apple.
Non-nerds have proven to be extremely difficult to educate on the concept that "what email claims to be is not always what email is, and where it claims to come from is not always where it really came from". During the recent Klez outbreak, I even received a message from a nerd-friend saying that he thought my machine might be infected, because he received an infected message from "me". Of course it was spoofed, because I happen to be in a lot of people's address books, but since I haven't used Windows on the desktop in over three years, it clearly didn't actually originate with my box.
Folks are just kinda thick about questioning the veracity of claims (hell, astrology still sells books and 900-number phone calls). And this could definitely be used for nasty purposes...and certainly will be. Spammers will have a field day with this: they can't help but seem 'fly by night', because they cannot establish a real brand name due to the disgusting nature of their business. If they stand still, they'll get lynched. But if they can, even for a short time, hijack a real name that people trust, and offer up a too-good-to-be-true scam under that trusted name...well, you see where I'm going with this.
Of course, everyone here knows that unsolicited "business offers" by email are always scams run by filthy people...but my grandmother doesn't know it, nor do my parents or many of my non-nerd friends for that matter.
Just a thought. We'll see how it plays out, I reckon...
There is a key failure. If someone tried to copy and paste the text into the URL and they weren't using the trick language, it wouldn't work. However if there is a link that says "microsoft.com" then that could send you to a different page. And as everyone knows, people are much more likely to click a link than to copy & paste it in the address bar.
Only dead fish swim with the stream...
I develop applications for a DSP company, and we've recently switched to using Unicode in our products. Unicode certainly has its quirks, and this is one of the more obvious ones. I fail to see why it has been implemented so widely, without very, very rigorous testing.
Actions like the one described in this article could bring down a company, if a person tried hard enough. Of course, Microsoft could just call Verisign and ask them to remove the Cyrillic domain, with no problems. But, for a small company, it could be hell. An entire user group using the same character set to access a certain website would be sent to a different site. In a worst case scenario, anti-company propaganda might be posted on the spoofing site, and it would deter people from visiting the "real" site in the future.
The only solution I can imagine is to simply prevent the translation of characters among character sets, especially in this sort of environment.
A Russian site, such as The Moscow Times, could have its site spoofed in exactly the same manner, and everyone using the Cyrillic character set (obviously widely used in Russia, for example) would be sent to some other site, possibly indefinitely, knowing how registrars have been acting lately. This would create havoc for the newspaper and significantly hurt revenue.
OS/2 - because choice is a terrible thing to waste.
At the moment these Unicode domain names will not be displayed correctly by web browsers; rather, you will see a bunch of confusing control codes, so this threat isn't really a problem yet.
Of course, the underlying problem is that DNS is an ugly kludge which has long-outgrown itself. The administrative cost of constructing a massive global namespace is vast, and we can all see the opportunities for cyber-squatting it creates, to the detriment of the public interest.
These days I am more likely to go to Google and type in a few words, rather than try to guess the URL. The task of finding the website you are interested in should be left to the specialists (like Google and other search engines), we shouldn't try to maintain an ugly, broken, monopolistic, and expensive "first come first serve" architecture like DNS.
There is no good reason why a web user should ever need to see a URL (except perhaps momentum), any more than they need to see the HTML which makes up a document.
If you're serious about typing in Russian, you don't type the control-meta-alt-whacky sequences.
You spend $15 and buy a plastic keyboard overlay, one of those little flexible jobs with the alternate characters printed on them. Change your keymapping (they make keymap files to match the popular overlays' plastic sheets, I'm told) and you're done.
You cannot apply a technological solution to a sociological problem. (Edwards' Law)
That is false. Russian people had an alphabet long before Cyrillic. Incidentally, that should really be proto-Russian, or Eastern Slavic, since the people diverged into Russian, Ukrainian, and Belorussian much later.
So it could be said that "Russian Cyrillic" is redundant.
It is not. There are several "dialects" of the Cyrillic alphabet. They are mostly the same but a few letters are different. I already mentioned three of them above. There's also Bulgarian, Serbian, and I'm not sure what else.
I seriously doubt the "c" and "o" characters mentioned in the article are unique to the K018R charset.
The charset is called KOI8-R. Or are you using the l33t sp3lling?
___
If you think big enough, you'll never have to do it.
Lousy cybersquatters...
I believe it would be something along the lines of .
Yep, you're right. Let's make all the grandmothers stay in their rocking chairs where they belong. The internet is for young, savvy nerds. Knitting is for old people.
Seriously, I understand your perspective, and it isn't as though I'm suggesting legislation or something stupid like that (I'm anti-government on all issues)...I'm just saying I think people will get scammed using this method. And I think it may be damaging to legitimate companies as well. This is unfortunate on two counts...it is bad for my grandmother, and yours, and it is bad for honest businesses who would never use spam marketing or pull some kind of bait-and-switch, or just plain ol' scam.
That's all...I don't have solutions. I'm just griping about the problem. Isn't that what slashdot is for, hand-wringing and griping?
That is false. Russian people had an alphabet long before Cyrillic. Incidentally, that should really be proto-Russian, or Eastern Slavic, since the people diverged into Russian, Ukrainian, and Belorussian much later.
Fair 'nough. The good bishop simply wanted a written language that he understood, so that he could teach his religion. So the creation of the Cyrillic alphabet is a matter of convenience for the religious powers-that-be of the time. Not a new story, unfortunately. And your point about proto-Russian is well-taken.
It is not. There are several "dialects" of the Cyrillic alphabet. They are mostly the same but a few letters are different. I already mentioned three of them above. There's also Bulgarian, Serbian, and I'm not sure what else.
While in the broadest sense you are right (I have a great story, outside the context of this article, about a miscommunication on my part with a Ukrainian individual who I mistakenly thought was speaking Russian), in the context of my point about those two specific characters, I disagree. Again, a Unicode geek could prove me wrong.
The charset is called KOI8-R. Or are you using the l33t sp3lling?
Lol, heh. You are right on there. I was just dashing off a reply to the article, and wasn't paying enough attention to the niceties. l33t sp311ing was farthest from my mind, b3 a55ur3d.
political_news.c: warning: comparison is always true due to limited range of data type
Most people just blindly click OK, because it is usually OK.
A lot of small e-business sites want to use their hosting provider's cert, but don't want the user's browser to display the hosting company's domain rather than their own. (Yes, I know it's stupid; people are picky as fuck when you are making web pages.)
Anyway, that causes the browser to warn that the cert is not valid for the domain it is being used in.
It's kinda possible to get around this using frames, but then the browser might say something about mixed secure and unsecure items on a page. The only real way to do it right is to just let the users see the hosting provider's address, as far as I know, or have the site buy their own cert.
I've had enough abrasive sigs. Kittens are cute and fuzzy.
St. Cyrill developed the Glagolitic alphabet, based on the Slavic dialects spoken on the Balkan peninsula, and used it in translating the Christian holy scriptures for the Slavic tribes in Moravia (today's Hungary/Slovakia). His student, St. Clement, developed the improved Cyrillic alphabet and spread its use in Bulgaria, from where it was adopted by Russia, Serbia, and others...
Today there are several variants of Cyrillic (Bulgarian, Serbian, Macedonian, Russian, Ukrainian), and it was even used in some of the former Soviet republics and Mongolia, whose languages are very far from Slavic.
Also, KOI8 is not considered the Cyrillic codeset by other Cyrillic-using nations; it is rather considered the Russian Cyrillic codeset. Other codesets are Windows-1251 and ISO-8859-5. The latter would arguably be the standard Cyrillic codeset.
From the article:
...
But are international domain names even necessary? Kuhn, who is German, doesn't think so: "Familiarity with the ASCII repertoire and basic proficiency in entering these ASCII characters on any keyboard are the very first steps in computer literacy worldwide."
That's like saying basic numeracy is the first step for computer literacy worldwide, so we should go back to using IP addresses!
Currently email addresses and URLs are the only reason a native Chinese speaker needs to use ASCII. For someone from Germany, ASCII is pretty easy to handle, but for a lot of languages, Unicode URLs and email addresses are very necessary.
Dan Bernstein has a proposal for internationalized domain names which solves this problem and many other problems. It's called IDNC3. IDN stands for "internationalized domain name." C3 stands for "clean, careful, conservative."
Don't piss off The Angry Economist
"...this "superior" Lunix operating system's complete lack of Unicode support..."
Try Linux. It's had Unicode for years.
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
1) Some people are not good at spelling, and wouldn't know microsoft.com from microssoft.com, especially if it's just seen in a few quick glances.
2) There are more TLDs out now, and the same name at a .biz or .info TLD does not mean it is the same company... but no doubt a lot of people think that's true.
3) There's always the old trick of the numeral "1" swapped for the lowercase "L" or the uppercase "I", among other similar things that never involved Unicode, but rather human vision and high resolutions.
4) The "@" symbol in the URL trick, like http:\\microsoft.com\moneyfrombil@haxor.com?action=allyourmoneyarebelongtous
So if you haven't figured out my point yet, a good percentage of people that use the internet are going to be fooled by far simpler feats of social engineering. Who needs Unicode to do it?
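The "@" trick in item 4 relies on URL syntax: in the authority part of a URL, everything before the last "@" is user information, and only what follows it is the host. Below is a minimal Python sketch using the standard library's `urllib.parse.urlsplit`. It assumes the usual forward-slash form of the trick (`http://microsoft.com@haxor.com/...`), since how a given browser treats the backslashes in item 4 is browser-specific; `haxor.com` is the made-up host from the comment and `real_host` is a hypothetical helper name.

```python
from typing import Optional
from urllib.parse import urlsplit


def real_host(url: str) -> Optional[str]:
    """Return the host a client would actually contact for this URL.

    In "scheme://userinfo@host/path", the part before the last '@' is
    userinfo (a user name and optional password), not the server.
    """
    return urlsplit(url).hostname


url = "http://microsoft.com@haxor.com/?action=allyourmoneyarebelongtous"
parts = urlsplit(url)
print(parts.username)  # microsoft.com  (merely a "user name")
print(real_host(url))  # haxor.com      (the server actually contacted)
```

A link checker that compares `real_host(url)` with the visible link text would catch this particular trick, though not the look-alike spellings in items 1 and 3.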
Er, no. Cyril developed the "Cyrillic" alphabet, although your statement of his intentions is correct. (I don't believe he was much of a saint, btw) :)
I do thank you for the correction on the charsets. I kind of knew that would happen.
Actually, no. Glagolitic was indeed invented by Cyrill and Methodius, in the 9th century. I don't know where the previous poster got the St. Clement reference. See here for the character set and a bit of history.
These two also invented Cyrillic. The difference is that Glagolitic didn't survive very long, while Cyrillic is still in use today. The last country to use Glagolitic in any quantity was Croatia, up to the end of the 19th century.
Tsunami -- You can't bring a good wave down!
If you buy something online without using a credit card, you deserve to get scammed.
If you buy something with a credit card, not only will you get your money back (actually never lose it in the first place), but the scammers will likely go to jail.
Besides, why are you clicking on links in your spam anyway?
Even better... I seem to recall a scam that did just that with paypal. They sent out bulk mail about updating your account or something but the link was not paypa(lower case 'L').com but paypa(Capital 'I').com and had made a carbon-copy of paypal's website, hoping you would log in. The address in the location bar looks identical for both. This sounds like the same kind of thing but using Unicode to make the spoof.
My friend told me that a few years ago he was looking for a domain name to register. After some poking around he discovered that microsoft.net was up for grabs. He then proceeded to ask his dad for the $10-$15 (I don't remember the exact amount) he needed to register the domain; needless to say, his dad refused!!
I stole this Sig
Ok, first take microsoft.com (alternate spelling), name your mail gateways identically to Microsoft's, and then send out emails (as balmer@microsoft.com?) to a lot of MS employees, telling them to remove IE from XP..
;-)
From there on, it only gets better and better. Think of the countries you would be able to influence, the technology development you could steer, and the leaked memos you could fabricate..
Damn, I wish I had thought of it.
Er, no. Cyril developed the "Cyrillic" alphabet, although your statement of his intentions is correct. (I don't believe he was much of a saint, btw)
A c/cyrillic_alphabet
I am sorry, but you are wrong - please see my other post for some links. Here is another: http://education.yahoo.com/search/be?lb=t&p=url%3
IMO, the major contribution of St. Cyrill and Methodius is not the creation of an alphabet, but their disputes with the Western church and the Pope regarding the right of different peoples to learn and practice Christianity in their own language. Up to that point only Latin, Greek and Hebrew were used in church services...
Not necessarily.
Unicode defines character code points, but doesn't specify their appearance.
There's nothing preventing an application from using lame fonts for glyphs, and in fact many do.
On average, Unicode implementations vary from bad to utterly horrible.
One way to control this would be to restrict the valid characters based on the TLD.
...
So for example, '.uk'/'.au'/'.us' etc. can ONLY have ASCII second-level domains. '.de' can only have German characters, '.fr' only French, and so on.
Then for completely different character sets, you have new Unicode TLDs (Arabic, Greek, Chinese), which can only have their relevant characters.
I guess you leave .com/.org/.net as ASCII; although they are meant to be global, they are based on the Latin character set.
Of course, this adds complexity - but you can do all the testing for validity when the domain is registered (i.e. a web client can request any URL, but dodgy mixed character set domain names cannot be registered).
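The registration-time check proposed above can be sketched in Python. This is a rough illustration under stated assumptions: Python's standard library does not expose the Unicode Script property, so the sketch approximates a character's script by the first word of its Unicode name (for example "LATIN" from "LATIN SMALL LETTER A"). `ASCII_ONLY_TLDS`, `scripts_in_label`, `is_mixed_script` and `may_register` are hypothetical names, and the TLD table simply follows the comment's examples.

```python
import unicodedata

# Hypothetical policy table following the comment: these TLDs accept ASCII only.
ASCII_ONLY_TLDS = {"com", "org", "net", "uk", "au", "us"}


def scripts_in_label(label: str) -> set[str]:
    """Approximate the set of scripts used by the letters of one DNS label."""
    scripts = set()
    for ch in label:
        if ch.isascii() and not ch.isalpha():
            continue  # digits and '-' may appear alongside any script
        # e.g. 'CYRILLIC SMALL LETTER ES' -> 'CYRILLIC'
        scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(domain: str) -> bool:
    """True if any single label mixes letters from more than one script."""
    return any(len(scripts_in_label(label)) > 1 for label in domain.split("."))


def may_register(domain: str) -> bool:
    """Refuse non-ASCII names under ASCII-only TLDs and mixed-script labels."""
    domain = domain.lower()
    tld = domain.rsplit(".", 1)[-1]
    if tld in ASCII_ONLY_TLDS:
        return domain.isascii()
    return not is_mixed_script(domain)


spoof = "mi\u0441r\u043esoft.com"  # Cyrillic 'с' and 'о' inside a Latin word
print(is_mixed_script(spoof))         # True
print(may_register(spoof))            # False
print(may_register("microsoft.com"))  # True
```

Note the limit of this design: a label written entirely in Cyrillic letters that merely look Latin is not mixed-script, so outside the ASCII-only TLDs it passes this check.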
It's impossible to prove that someone hasn't inserted themselves in between you and the server, giving you a bogus cert, and pretending to be you to the server.
This is the reason for trusted signatures on certs.
Hit google for "man in the middle attack" if you want to know more.
DNA just wants to be free...
Ah, but then you couldn't get the pictures of the cousin's sister's kids emailed every time they get an award at school. Or the forward of the forward of the quoted forward of the latest monster joke to wander the 'net.
The same risks exist today with ASCII domain names: look-alike characters such as "1lI" and "O0", and tricks with "@" that fool most user agents.
You simply must not take for granted anything you see or read on the web.
Yes, but you're forgetting, "Bq--at77w373jih7xepx7om7p6zx7oq" cannot be trademarked, because it is a common word, like "door" and "window."
The speed of time is one second per second.
If a domain needed to be hijacked, that's it.
-
The only way to get rid of a temptation is to yield to it. Resist it, and your soul grows sick with longing for the things it has forbidden to itself. - Oscar Wilde (1854 - 1900)
Well, there goes all the security of being able to find your gay porn when you want it.
How can we continue to believe in a just universe and freedom to eat crackers if we have no ale?
Domain names starting with bq-- are Unicode domain names mangled back into ASCII, so you were probably in the right place.
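The "bq--" prefix comes from RACE, one of the ASCII-compatible encodings tried during the early internationalized-domain testbeds; the IDNA standard (RFC 3490) later settled on Punycode (RFC 3492) with the prefix "xn--". Python has no RACE codec, so this sketch shows the standardized form instead, using Python's built-in `idna` codec (an IDNA 2003 implementation):

```python
# Unicode domain name -> ASCII-compatible form that DNS actually stores.
print("m\u00fcnchen.de".encode("idna"))      # b'xn--mnchen-3ya.de'
print(b"xn--mnchen-3ya.de".decode("idna"))   # münchen.de

# The look-alike from the article: Cyrillic 'с' and 'о' in "microsoft".
spoof = "mi\u0441r\u043esoft.com"
ascii_form = spoof.encode("idna")
print(ascii_form.startswith(b"xn--"))        # True
print(ascii_form == b"microsoft.com")        # False: a different DNS name
```

Showing this ASCII form instead of, or next to, the Unicode rendering is one way a client could expose the difference to the user.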
Win dain a lotica, en vai tu ri silota
Another English letter to fade is yogh (it looks like a 3): this is the z in Menzies = Men3ies, "Menges".
Also of note is digamma. In the Greek number system, it stands for 6, that is, it was the 6th letter of the alphabet. As a letter, it appears between epsilon and zeta. Since our alphabet is derived from the Greek, one notes that the letter F not only looks like digamma, but preserves much of the original sound. Phi was an aspirated p.
Cyrillic bears a much closer resemblance to the classical Greek letters, and the theta indeed represents an f here.
Unicode reflects current realities. There is more than one Cyrillic Alphabet, just as there is more than one Latin alphabet.
Yeah, that's why a couple of Israeli college students were unable to register miсrоsoft.com (spelled "miсrоsoft")...oh wait a minute, what were they saying again?
20 January 2017: the End of an Error.
... so it seems safe to say that trust is the foundation of their business. Essentially, we trust Verisign to ensure that we're communicating with whom we think we're communicating, and to protect us from various forms of spoofing. They should therefore, IMHO, actively avoid even the appearance of impropriety.
However, we all remember the Microsoft certificates they mistakenly gave out to a third party.
Now we've got them registering another domain to someone that looks just like "microsoft.com." While it's tempting to absolve Verisign of guilt in this, I think they were asking for it. After all, even I thought of this possibility when I first heard about Unicode domain names, and I'm not the sharpest knife in the drawer. You've got to think someone at Verisign raised the possibility, but they chose not to deal with it.
Again, one might be tempted to say that this isn't their problem, if not for the fact that they are in the trust business. As the article says, "Certification agencies (which include VeriSign) ensure that encoded names are not misleading and that the registration corresponds with the correct real-world entity." It should not be technically difficult, for instance, to build a set of lists of visually similar Unicode characters and to refuse to register domains visually identical to existing ones. Maybe they should decide to forgo a relatively small amount of revenue and to refuse to sully their reputation with such inevitably deceptive domain registrations, especially considering that they interfere with Verisign's core business.
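The "lists of visually similar Unicode characters" idea can be sketched as a skeleton comparison: map every character to a representative look-alike and refuse a new name whose skeleton matches an existing one. The table below is a tiny hand-picked sample, an assumption for illustration only; a real registry would need a complete table such as the confusables data published with Unicode Technical Standard #39. `Registry` is a hypothetical class, not any registrar's actual system.

```python
# Tiny, hand-picked sample of look-alikes (hypothetical, far from complete).
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u0443": "y",  # CYRILLIC SMALL LETTER U
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
    "0": "o",       # digit zero
    "1": "l",       # digit one
    "I": "l",       # capital I, drawn like a lowercase L in many fonts
}


def skeleton(name: str) -> str:
    """Replace each look-alike with its representative, then lowercase."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in name).lower()


class Registry:
    """Hypothetical registry that refuses names visually identical to existing ones."""

    def __init__(self) -> None:
        self._by_skeleton: dict[str, str] = {}

    def register(self, name: str) -> bool:
        key = skeleton(name)
        if key in self._by_skeleton:
            return False  # looks identical to an existing registration
        self._by_skeleton[key] = name
        return True


reg = Registry()
print(reg.register("microsoft.com"))            # True
print(reg.register("mi\u0441r\u043esoft.com"))  # False: Cyrillic с and о
print(reg.register("micros0ft.com"))            # False: digit zero
```

Because the mapping runs before lowercasing, the uppercase "I" in a name like paypaI.com is caught as well. A skeleton match only flags names that look alike; deciding which registration is legitimate still needs a human.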
Of course, none of this compares to the letters they sent out trying to fool people into switching their domains over to Verisign. The other two were negligence and foolishness, but that was an active attempt to deceive from a company that's selling trust.
It all leaves me in a bit of shock. It's not that I'm shocked to see a company doing stupid and deceitful things; it's that trust is Verisign's primary asset. Hearing about these (colossally, in my mind) stupid decisions is like hearing that GM decided to torch all its manufacturing plants and assassinate all its employees. It leaves me with two questions: "what the hell are they thinking?" and "why does anyone continue to do business with Verisign?"
Domain spoofing is one area. But what if you see an email address on a business card, say са@miсrоsоft.com? How do you know what encodings those 'c', 'a' and 'o' are in (for those with UNICODE brain-damaged browsers, the address above should look like ca@microsoft.com)? Same goes for URLs, etc. Another option: say a Swedish company registers a URL that perfectly represents the name of the company in Swedish, with all those umlauts and whatever-they-are-called-those-circles-over-A. And you are sitting there with a US_en keyboard. How are you expected to type that URL into the location field of your browser?
For the use-cases like this I think that multilingual URLs are a Bad Idea (TM).
--AP
Unfortunately, it doesn't protect against 'cekc' (I can't be bothered to type this in Cyrillic here).
This issue was also discussed in my book Secure Programming for Linux and Unix HOWTO. Look at the section on semantic attacks.
- David A. Wheeler (see my Secure Programming HOWTO)
Can you perhaps explain why KOI8 characters are out of order? This is so stupid and I'm amazed KOI8 is still in use. How do you sort stuff alphabetically if you can't just do an integer comparison? Would be really slow to use some funky custom sorting routine.
But what if you see an email address on a business card, say са@miсrоsоft.com? How do you know what encodings are those 'c', 'a' and 'o' are in[...]?
Since the surrounding characters are Latin, I think it safe to assume they are 'c', 'a' and 'o'. (BTW: encodings are things like ISO-8859-*, KOI8-R, and so on, whereas IDN will only use Unicode. The question should be what script they are in.)
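The distinction between scripts and encodings can be seen directly in Python: a Cyrillic "с" is one Unicode character, but it becomes different bytes in Windows-1251, KOI8-R and UTF-8. An encoding mismatch also produces garbage such as "ñà@miñrîsîft.com", which is exactly what the Cyrillic-spelled business-card address becomes when its Windows-1251 bytes are read as Latin-1. The sketch uses only Python's built-in codecs.

```python
es = "\u0441"  # CYRILLIC SMALL LETTER ES, looks like Latin "c"

print(es.encode("cp1251"))  # b'\xf1'
print(es.encode("koi8_r"))  # b'\xd3'
print(es.encode("utf-8"))   # b'\xd1\x81'

# Cyrillic с, а and о mixed into an otherwise Latin address.
spoof = "\u0441\u0430@mi\u0441r\u043es\u043eft.com"
print(spoof == "ca@microsoft.com")               # False
print(spoof.encode("cp1251").decode("latin-1"))  # ñà@miñrîsîft.com
```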
Same goes for URLs, etc.
You've never been prohibited from using non-ASCII stuff in URLs.
Another option -- say a Swedish company registers an URL that perfectly represent the name of the company in Swedish. With all those umlauts and whatever-they-are-called-those-circles-over-A. And you are sitting there with a US_en keyboard -- how are you expected to type that URL into a location field in your browser?
Depending on your system, you can use ALT- or SHIFT-CTRL- combinations and the character numbers. Character Map or the equivalent will also let you enter the characters.
OTOH, why is this a problem? If they have a large non-Swedish audience, they ought to register an all-ASCII name. If they chose not to, then that's their problem. Odds are any such site will be in Swedish for Swedes.
Just because it's a technical no-brainer doesn't mean it's legal, and the laws it breaks needn't have anything to do with the internet.
If you pretended to be someone else, or if someone registered a lookalike domain for microsoft.com and used it in any way whatsoever to benefit from the fact, they'd be in deep sheep.
That is, if you are interested in the dry, technical details... ;-)
Verisign's activities as a domain registrar are NOT the same thing as their CA business.
They are not required to, nor do they claim to, verify domain registrants UNLESS those registrants apply for digital certificates.
Yes, Verisign are scum... but you are barking up the wrong tree here. They are not at all required or expected to verify domain registrants.
Hey. I wish they were. Imagine how many domains would have to be revoked? Literally millions.
How do you sort stuff alphabetically if you can't just do an integer comparison?
Unicode Collation Algorithm.
Would be really slow to use some funky custom sorting routine.
What are you running? There are massive databases that use binary compare, and bitty boxes that use binary compare, but even my 386 should be able to do decent sorting in a negligible amount of time.
I don't know of many character sets that put the characters in sort order. ASCII doesn't work for English, because capital letters and lower case letters don't sort together. Latin-1 puts all its characters after ASCII, when some of them should sort with the ASCII characters.
As for why, the fact is it's not an option in a multilingual environment. Lithuanian sorts y after j; Swedish, German and Danish use some of the same accented characters, but sort them differently. The whole concept of binary sorting fails for some languages; Maltese and traditional Spanish both sort two letters ("ch" and "ll" for Spanish) as if they were one, and German sorts one letter ("ß") as if it were two ("ss").
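A small Python illustration of why binary comparison falls short: plain code-point order puts every capital before every lowercase letter, and puts "ß" after all ASCII letters. Sorting with `str.casefold` as the key happens to fix both of these particular cases (casefolding turns "ß" into "ss"), but it is only a rough stand-in, not a collation algorithm: it cannot express the Lithuanian, Spanish or Maltese rules above, which need locale-aware collation such as the Unicode Collation Algorithm.

```python
words = ["apple", "Zebra", "Strasz", "Stra\u00dfe"]  # "Straße"

# Binary (code point) order.
print(sorted(words))
# ['Strasz', 'Straße', 'Zebra', 'apple']

# Case-insensitive order, with ß compared as "ss".
print(sorted(words, key=str.casefold))
# ['apple', 'Straße', 'Strasz', 'Zebra']
```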
Solution: Make browsers default to displaying links to sites with non-ASCII addresses differently from regular links.
Also, since link display may be overridden by style sheets, make the browser override style sheets for these links.
Display a warning when the user follows one of these links.
If this warning is displayed as a popup and the user checks "never show this warning again", display text that explains why this is a bad idea.
The only true way to security is to annoy your users into submission
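The first suggestion, treating links with non-ASCII hosts differently, needs a test a browser could run on each link. Here is a minimal Python sketch, assuming the host is extracted with `urllib.parse.urlsplit`; `needs_idn_warning` and `display_host` are hypothetical helper names. Hosts already written in their ASCII "xn--" form are flagged too, since they decode to Unicode.

```python
from urllib.parse import urlsplit


def needs_idn_warning(url: str) -> bool:
    """True if the link's host has non-ASCII characters or an xn-- label."""
    host = urlsplit(url).hostname or ""
    return not host.isascii() or any(
        label.startswith("xn--") for label in host.split(".")
    )


def display_host(url: str) -> str:
    """Return the ASCII (Punycode) form of the host, which exposes look-alikes."""
    host = urlsplit(url).hostname or ""
    return host.encode("idna").decode("ascii")


print(needs_idn_warning("http://microsoft.com/"))            # False
print(needs_idn_warning("http://mi\u0441r\u043esoft.com/"))  # True
print(display_host("http://mi\u0441r\u043esoft.com/").startswith("xn--"))  # True
```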
- We are the slashdot. Resistance is futile. Prepare to be moderated -
Can you perhaps explain why KOI8 characters are out of order?
Because they were ordered as a transliteration for the Latin alphabet (sorry, can't put it in Cyrillic): ABCDEF instead of ABVGDE.
My guess is that this was done to easily transform Russian text written using the Latin alphabet into Cyrillic by simply flipping a bit.....
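The bit-flipping guess matches how KOI8-R is laid out: most Cyrillic letters sit 128 positions above a Latin letter with a similar sound (with the case swapped), so clearing the high bit leaves a readable transliteration. The same layout is why sorting raw KOI8-R bytes does not give alphabetical order. A short illustration using Python's standard `koi8_r` codec:

```python
# "привет" ("hello"), written with escapes to avoid look-alike confusion.
word = "\u043f\u0440\u0438\u0432\u0435\u0442"
koi8 = word.encode("koi8_r")
print(koi8.hex())  # d0d2c9d7c5d4

# Clearing the high bit yields a Latin transliteration (lowercase -> uppercase).
print(bytes(b & 0x7F for b in koi8).decode("ascii"))  # PRIWET

# а, б, в, г in alphabetical order.
letters = ["\u0430", "\u0431", "\u0432", "\u0433"]
by_koi8 = sorted(letters, key=lambda s: s.encode("koi8_r"))
print(by_koi8 == ["\u0430", "\u0431", "\u0433", "\u0432"])  # True: в sorts after г
print(sorted(letters) == letters)  # True: Unicode code points keep this order
```

Unicode code points are not a perfect answer either: ё (U+0451) comes after я (U+044F) in code-point order, although it belongs next to е in the alphabet.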
Actually, as it stands right now you can not have åöä in your address, a big complaint of people in Sweden. å is not just an a with a ring over it; å is actually another letter entirely, something lots of English speakers can't get the grasp of.
Isn't the point of the article that now you can go to a Verisign approved website for (unicode of some big company) and have it check out properly because there is a verisign cert for the site (unicode of some big company)?
:)
People now seem to be good at knowing that if you get funny pop ups about self signed certs or certificates not matching the url that they don't put in their credit card number... now suddenly that doesn't apply, because you won't get that, and the differences aren't as obvious as those for something like paypaI.com or micros0ft.com
actually as it stands right now you can not have åöä in your address
You can't have it in the domain name. You can have it in the part of the URL following the domain name.
å is actually another letter entirely, something lots of english speakers can't get the grasp of.
I would be surprised to find many English speakers who couldn't learn that. I would expect that most of them just don't know that fact right now, and that many of them really don't care.
Unless they run their own servers, hosting is going to be a little hard to get. I run a web hosting company. When a user signs up for hosting, they are immediately ushered to the credit card processor; after that, the site asks them what password they wish to use on the system. Then the domain name, password, and other details are stuck into a database, and an email is fired off to me to let me know someone signed up, containing the URL of the page that will give me the details. Anyway, I open an SSH session to the server and start setting it up. When I enter the domain name into httpd.conf, I am not typing in Cyrillic: I simply fire up vi and type the domain name using regular Latin characters. Same when I set up the DNS zone files, email, and other such stuff. Sure, they can get the domain name registered, but actually getting the page to show up is another matter altogether. I believe even Russian ISPs would assume the letters were Latin characters and not their Cyrillic counterparts if they are used to spell English words (as in known company names to be used in some sort of scam).
Actually, you can not have it anywhere in the URL. I tried it. First, using CuteFTP, I tried to create a directory on my web site called "pål". CuteFTP wouldn't let me, so I had to SSH in and do it. I created the directory and put a file in there, but when I go to http://www.mydomain.com/pål I get a 404. So you can't have those symbols in URLs at all.
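One plausible explanation for that 404, offered as an assumption rather than a diagnosis: browsers percent-encode non-ASCII path characters before sending them (typically as UTF-8), so the server receives bytes, and they only match the directory if its name is stored on disk in the same encoding. Python's `urllib.parse.quote` shows how the same path looks under UTF-8 and under Latin-1:

```python
from urllib.parse import quote, unquote

path = "/p\u00e5l"  # "/pål"

utf8_path = quote(path)                        # UTF-8 is the default
latin1_path = quote(path, encoding="latin-1")  # how a Latin-1 file name would look
print(utf8_path)                   # /p%C3%A5l
print(latin1_path)                 # /p%E5l
print(unquote(utf8_path) == path)  # True
```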
In Windows (the EU edition), any of them: just add the language. Your only problem is that the idiots in Redmond have yet to add a keyboard editor (something that has been present in all third-party internationalisation packages since Windows 3.10). As a result, you will be stuck with some extremely obscene keymap inherited from a Cyrillic typewriter. Alternatively, you can pick up DLLs from third-party Cyrillisation packages made for older Windows versions and violate the sanctity of the MSFT certificate by slapping them on top of the current ones. It usually works, and you get a proper keymap.
Under Unix it is usually a bit more of a p*** in the a***, because most internationalisations rely on Xmodmap and it no longer works nowadays. Once again, by default you will get stuck with something you cannot use unless you have a keyboard engraved with the alternate characters. Once again, you will need to spend half an hour with vi, swearing at whoever made Xmodmap stop working, in order to get a less obscene keymap.
Baker's Law: Misery no longer loves company. Nowadays it insists on it
http://www.sigsegv.cx/
The Alt-keypad trick only works for 8-bit characters, AFAIK. You can copy characters out of Character Map (in Win2K/XP, not Win9x; Win9x's Character Map doesn't grok Unicode) and paste them into whatever you're typing, though: да, нет, etc. (I think the first is "da" and the second is "nyet"; I saw something that looked like that in a banner ad on a Russian website recently.)
If all else fails and you're editing HTML, you can escape the characters as numeric character references, so that (for instance) да gets entered as &#1076;&#1072;.
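If you have a lot of text to escape, the conversion is easy to automate. A minimal sketch in Python (the `to_char_refs` helper name is just for illustration), relying on the standard "xmlcharrefreplace" error handler, which replaces every character the target encoding can't represent with a decimal `&#NNNN;` reference:

```python
def to_char_refs(text: str) -> str:
    # Encoding to ASCII with "xmlcharrefreplace" turns each non-ASCII
    # character into &#<decimal code point>; ASCII passes through unchanged.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(to_char_refs("\u0434\u0430"))        # да  -> &#1076;&#1072;
print(to_char_refs("\u043d\u0435\u0442"))  # нет -> &#1085;&#1077;&#1090;
```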
20 January 2017: the End of an Error.
I'm not trying to sound like a linguistic elitist, but can anyone really say that we shouldn't standardize on English/ASCII? In just about every country where English is not the native language, English is taught to schoolchildren from an early age.
The internet has lowered the barrier to exchanging information, which has made diverse languages an even more significant barrier. If we use Unicode and simply accept that everyone wants to use their own language, the internet will end up as a group of national islands of information. Each group will surf its own set of native-language web sites. When you search the web, the information on that Nokia phone might not be readable by you (Babel Fish isn't a solution).
Language has always been a barrier, and I hope the internet will be the tool by which that barrier is torn down; not the tool which escalates the problem.
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
I think your knowledge of the subject is a bit off...
It was not developed for Russian use at all. It was developed in Moravia, which spanned most of the current Czech Republic and bits of Slovakia; in other words, it was developed for what has become Czech nowadays. The people who developed it were fairly high in the hierarchy of the Moravian church but got nailed for heresy by their superiors in Rome.
After that they fled to Bulgaria, and from there the alphabet spread to Russia. Considering that at the time the Bulgarian Empire spanned most of the Balkan peninsula, and both Bulgarians and Serbians claim it in their ancestry, I will skip the question of where the Serbians got the alphabet to avoid a Balkan flame war. Let's just say that once upon a time it was one country.
After that, the alphabet went through at least several simplifications and changes in its written form: one around the 9th-10th century, one during the church reform in Russia in the Middle Ages, and one more in most Slavic countries just before the First World War.
In any case:
W3C has a page on the subject. I don't know why it doesn't work in your case; I suspect you're dealing with character set issues, but you didn't mention which web browser you were using or anything, so it's hard to tell what went wrong.
Many ISPs do the whole sign-up process automatically.
Maybe you would like to save some time as well: check out www.rodopi.com.
Basically, the consensus in the end was that it is impossible to avoid this sort of problem as long as you have a standard that encodes characters instead of glyphs (that means that Latin "o" and Cyrillic "o" are different characters, even though they look the same).
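To make the character-versus-glyph distinction concrete: the two o's below usually render identically, but they are separate code points with separate names, and even compatibility normalization (NFKC) does not fold one into the other. A short Python sketch using the standard `unicodedata` module:

```python
import unicodedata

latin_o = "o"          # U+006F
cyrillic_o = "\u043e"  # U+043E, renders like "o" in most fonts

for ch in (latin_o, cyrillic_o):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
# U+006F LATIN SMALL LETTER O
# U+043E CYRILLIC SMALL LETTER O

print(latin_o == cyrillic_o)  # False: same glyph, different characters
# Compatibility normalization does not unify them either.
print(unicodedata.normalize("NFKC", cyrillic_o) == latin_o)  # False
```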
A character set that encoded glyphs instead of characters could avoid this. However, such charsets are extremely tedious to implement. It has been tried with the Adobe glyph registry and has been found insufficient.
In practice, glyph-based character sets are unusable. The reason is that they cannot be made fully round-trip compatible with existing character sets, such as ISO 8859 or the Windows codepages, because these legacy character sets encode characters instead of glyphs. If URLs were encoded in such a glyph-based character set, it would be impossible to embed URLs in any document in a legacy character set. No URLs in e-mails.
As a result, the only solution is to have application and operating system vendors implement checks for such situations, and to have URL registries reject such obvious spoofing attempts (e.g., no mixed-alphabet URLs). Since the problem is not fundamentally different from registering slashdot.org, it is not a problem we weren't already aware of.
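A rough sketch of the "no mixed-alphabet URLs" check. Python's standard `unicodedata` module has no Script property, so this version guesses each letter's script from the first word of its Unicode name ("LATIN ...", "CYRILLIC ..."). That heuristic and the function names are assumptions for illustration, not any registry's or browser's actual rule:

```python
import unicodedata

def scripts_in(label: str) -> set[str]:
    """Guess the scripts used by the letters of one domain label.

    Heuristic: takes the first word of each letter's Unicode name,
    e.g. "LATIN SMALL LETTER O" -> "LATIN".
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def is_mixed_script(domain: str) -> bool:
    """True if any dot-separated label mixes letters from several scripts."""
    return any(len(scripts_in(label)) > 1 for label in domain.split("."))

print(is_mixed_script("microsoft.com"))            # False
print(is_mixed_script("micr\u043es\u043eft.com"))  # True: Latin + Cyrillic
print(is_mixed_script("\u043f\u0440\u0438\u043c\u0435\u0440.com"))  # False: "пример" is all Cyrillic
```

Note that a per-label check like this lets through a name spelled entirely in Cyrillic lookalike letters, so whole-script homographs would still need other defenses.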
There is absolutely no reason to panic.
Yeah, I really did not think of that. I prefer to do it manually instead of having something do it automatically, because: A) I don't want to pay for a tool that does it automatically; B) I am too lazy to write one myself; C) I use Plesk Server Administrator on some of the servers, and I don't think I want to pay Plesk to develop something for me, since all their PHP source code is encrypted.
The Homograph Attack
This is slightly tangential, but this seems a good place to ask: does anyone know how to get Microsoft IME under Windows XP to use a Dvorak layout for romaji input when typing Japanese?
For English I just use the US Dvorak input method, but when the language is set to Japanese there seems to be no way to use Dvorak other than tediously modifying the romaji-to-kana input table, which is clearly the wrong way to go about things.
graspee
oh geez, i can see the creative Goatse links now.
THERE IS NO DATA. THERE IS O
The average literate Chinese person has to know upwards of 3,000 unique characters. Picking up the ~30 ASCII glyphs needed to use the current internet is trifling in comparison.
Knowing a sufficient number of English words is much more difficult, but completely unnecessary for using email or DNS.
Also, I imagine that if the "internet started in China," they would have included the measly 26 uppercase Latin letters, as they are kanji too. Most of the sites you'd be interested in as an English speaker would stick to those anyway...
I have had numerous discussions (or better: fights) with people about this. Usually they feel the security problems can be solved without real effort (by somebody else of course), but feel what I really wanted is to discriminate against them.
It never ceases to amaze me that some people would rather risk a fully working system, like the DNS, than accept that technology cannot accommodate their personal needs that fast, that some of their personal needs may be very difficult to fulfill, and that this is not the fault of the engineers but a consequence of the fact that the technology they now want adapted to their needs was invented by people from another culture! If the WWW were a Russian invention, of course everybody participating in it would have to learn Russian first! Maybe even still some decades later. As it is, it was mostly American, so it is ASCII and English. Those who cannot adapt to that should wait until their needs can be safely and cost-effectively accommodated, or build the needed extensions from their own resources!
But obviously many people just "want" without any willingness to contribute, invent, or implement anything themselves. I foresee interesting times for anybody using text-based identities, like names.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted and ignored otherwise.
How the fuck do I SSH in and create a directory with fucking IIS? I tried Mozilla and Netscape on Linux, and Mozilla, Opera, and IE on Win32. None worked. Oh yeah, I used Apache.
Taken to its logical conclusion, if you can't handle life (all of it?), then you shouldn't be alive.
The thing is that people do have to cope with things that they do not understand. Societal norms should be such that minimal damage is inflicted due to lack of understanding of consequences. This applies to adults as well as children and infants.
As an aside, to me the various flavours of Cyrillic look like the character set was ultimately derived mostly from Greek, which seems reasonable if it was developed in the Balkan region. Does anyone know any further-back history on it?
~REZ~ #43301. Who'd fake being me anyway?
I would think a better term to coin would be "homoglyph", because that is what it is. Two different characters with the same glyph. Plus this has the advantage of not being a word already in use (to my knowledge).
This was only true in Western Christendom, and then only to a limited extent. For example, in the west, the first Christian missionaries to the British Isles translated the service books of the early Church into Gaelic and other Celtic languages. In the east, the generally accepted practice was to use the vernacular. This is why some of the oldest extant copies of the Bible are in one of the Ethiopic languages, Coptic, Syriac, etc.
The Roman canon that the liturgy could only be practiced in one of the tongues spoken by the apostles was a relatively late invention and only applied to congregations under the sole apostolic see of the west, Rome. Congregations under the apostolic sees of the east always used the vernacular.
Hence it is somewhat ironic that many eastern Churches refuse to update the liturgy from being in liturgical Greek or old Slavonic into their modern equivalents.
Regards,
-l
The first time I got a Klez message, I sent a reply saying that I thought their machine was infected. I only discovered the forgery problem when I started reading up on it. That's probably what happened to your friend.
If you aren't really bothered by viruses (i.e., you keep your system reasonably secure and don't use MS), then their new tricks can sneak up on you.
I think we've pushed this "anyone can grow up to be president" thing too far.
Nobody who understands text data would use anything other than Unicode except for legacy handling. Using different encodings for different languages is as ridiculous today as using different encodings for English on different platforms used to be before everyone agreed to exchange data in ASCII.
"Those who have never entered upon scientific pursuits know not a tithe of the poetry by which they are surrounded."
MrHat, I've missed you. I had started to think I was unworthy of your limericks.
Tell me the truth though, is it, or is it not incredibly sad, that nearly every topic/conversation on this site can be reduced to a 5 line poem? It tells the lie of just how shallow most of this is...
...but it will have to be part of the solution.
The problem is the diversity of characters used by people around the world, regardless of how they are encoded. Encoding them in anything other than Unicode would make the problem dramatically worse because no group will sit back for long and allow their language to be excluded from global naming protocols on this shared "worldwide" platform.
Having everyone share an ASCII-only system is no longer a viable option, so either everyone shares a single system that covers all languages (Unicode is the only viable option), or the system breaks up into a composite of conflicting encodings. (.com could be registered as half a dozen different byte sequences by different registrars.)
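To see why a patchwork of encodings breaks naming, here is the same short label (a hypothetical non-ASCII name, "pål") turned into bytes under a few common encodings. Each produces a different byte sequence, so without agreement on one encoding, registries could not even tell that two registrations spell the same name. A small Python sketch:

```python
name = "p\u00e5l"  # "pål", a hypothetical non-ASCII label

encodings = ("utf-8", "latin-1", "utf-16-be", "cp437")
encoded = {enc: name.encode(enc) for enc in encodings}

for enc, raw in encoded.items():
    print(enc, raw.hex(" "))
# utf-8 70 c3 a5 6c
# latin-1 70 e5 6c
# utf-16-be 00 70 00 e5 00 6c
# cp437 70 86 6c
```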
The Unicode solution is the only one that makes sense; then you have to look at rules for the use of characters. You would have to look at such rules even without Unicode. It's just that Unicode makes this so much simpler than the composite alternative that a solution is probably possible.
This IDNC3 proposal is a good start, but there are even more issues. People who wave their arms about the "problems of Unicode" aren't helping, though. Almost all of them are really just advocating "let's keep it simple by limiting it to the characters I need and disallowing yours", and that won't fly any longer.
Yes, and it's a lot harder for you to write the characters needed for programming in C++ or Perl. I'd rather have my English keyboard.
HOWEVER, what I'd like best of all would be to replace the dumb keyboard (hit a key, get the character printed on the key cap) with smart input methods at the OS level (maybe keyboard driver level if you don't have a GUI).
For example, I should be able to type user-defined abbreviations and have the OS replace them with what they represent. I should be able to type "deja vu" and have the OS input dictionary automatically replace it with "déjà vu" and so on. We should be able to use the tab key for autocompletion and substitution, so if I type e/ then tap the tab key, it might replace e/ with é, and so on.
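The "e/ then Tab" idea can be sketched with Unicode combining marks: map a suffix key to a combining diacritic and let NFC normalization compose the pair into a single precomposed letter. The suffix table, word list, and `expand` function here are hypothetical, not any existing OS input method:

```python
import unicodedata

# Hypothetical suffix keys mapped to combining diacritics.
COMBINING = {
    "/": "\u0301",   # combining acute accent: e/ -> é
    "\\": "\u0300",  # combining grave accent: a\ -> à
    ":": "\u0308",   # combining diaeresis:    u: -> ü
}

# Hypothetical whole-phrase dictionary ("deja vu" -> "déjà vu").
WORDS = {"deja vu": "d\u00e9j\u00e0 vu"}

def expand(text: str) -> str:
    """Expand a phrase or a letter+suffix abbreviation, as if Tab were pressed."""
    if text in WORDS:
        return WORDS[text]
    if len(text) == 2 and text[1] in COMBINING:
        # NFC composes letter + combining mark into one precomposed character.
        return unicodedata.normalize("NFC", text[0] + COMBINING[text[1]])
    return text  # nothing to expand

print(expand("e/"))       # é
print(expand("u:"))       # ü
print(expand("deja vu"))  # déjà vu
```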
Yes, I know we have some of this functionality in unix shells like bash, some in emacs, some in word processors like MS-Word, etc. I'd like it at the OS level so that no matter what I was typing into, I would have a virtual keyboard much more powerful than my simple physical keyboard and one that I could optimize for the characters/words/phrases I needed most often.
Actually, I clearly stated that CuteFTP didn't work, so I had to SSH in. Read it again before you complain about it. Second, I don't really care what the W3C has to say about it. It simply doesn't work: using standard server software and standard client software, it simply does not work.
You are absolutely correct, but try explaining that to a customer that insists they want their own domain name to be on the "check out" screen, and not their hosting provider, but also refuses to buy their own cert. They won't understand, and they won't listen. Maybe allowing hosting providers to sign certs for the domains they host could be a solution.
I've had enough abrasive sigs. Kittens are cute and fuzzy.
Of course, this is easy to defeat with a simple combination of backticks, ls -1 and wc.
The best way I discovered to hide the contents of a directory in unix is:
Unix is rather unhappy trying to cd to a directory that has a / as part of its file name. Shell quoting tricks won't get you past it, since it's the kernel handling the /.
Of course, you had to un-/-ify the directory every time you wanted in, but hey, the price of security...
No. The point of the article is to try to blame this on Verisign when in fact they are doing nothing wrong.
It sounds like you don't really understand how certificates work. Verisign will NOT issue you a certificate for www.microsoft.com using some Cyrillic characters, so there is no way a site can present a certificate, signed by Verisign, indicating the site is microsoft.com.
The article tries to make out that because Verisign issues certificates, it should ALSO be verifying the domains people register.
It is a totally legitimate domain. There is nothing WRONG with it.
It's particular uses of it that can be wrong, but not the domain itself.
And as to what you said, you, directly or indirectly, implied that Verisign should not allow domains like this to be registered because they are in the certificate authority business.
Totally different things.
I don't see the connection you are drawing.
No, Bill Clinton's relationship with his wife has nothing to do with his ability to govern, and I cannot *believe* that people actually think it has an effect.
Actually, what I really think (read this carefully) is that it's a big deal because people THINK that other people think it has some effect, and don't want to appear different.
Ha, I can definitely believe that.
The other day at work I scribbled a couple of words in Korean on my notepad (I studied Korean a bit). Later my boss came by and saw it and asked what it meant. I told him and then showed it to the *Chinese* guys who work in my area to see if I had it right. They glanced and said "oh that's Korean" and my boss said "oh, I see, but can't you read it at all?" Errrr, no.
With Chinese, I can understand there is a huge learning curve, but a lot of people don't know that they could pick up basic Korean (alphabet and making words) in about a day.
I guess when people look at Chinese/Japanese/Korean they see "Chicken Scratch language" and don't really look for the obvious distinctions among them.
mark
If you want to make an apple pie from scratch, you must first create the universe. -- Carl Sagan
Someone once sent an email to my Yahoo account that looked just like the Yahoo login message. I would have fallen for it, but IE didn't auto-fill my login into their fake text field.
The Communications of the ACM article is available online at <http://www.csl.sri.com/users/neumann/insiderisks.html#140> (Inside Risks 140, CACM 45, 2, February 2002).
So... you can't respect other people's personal decisions on spirituality? Granted, the 900-numbers are gimmicky. But why should Astrology books be discredited as nonsense? Most mature people respect others' religious beliefs.
Although Astrology isn't a religion, it is faith-based, as religion is. Is Astrology scientific? No. Neither is the Bible (etc.). You might as well have worded that sentence to say "hell, astrology, christianity, and paganism still sell books...".
All I ask is that you respect other people's personal spiritual beliefs, whether that involves Astrology, Judaism, Wicca, or what have you. An exception is when you're discussing/debating spirituality or religion, but this isn't the case.
I don't believe in Christianity, but I don't attack a Christian's personal beliefs because I don't agree with them. I expect others to respect my personal beliefs the same way.
(1) Which glyphs are the same are entirely font dependent.
(2) Greek letter A lowercased should look like α; Latin letter A lowercased should look like a. There are 23 o-like characters: some symbols, some alphabetic characters, and 1 ideograph. Each of them has its own properties, and many of them may or may not look the same depending on the font.