Slashdot Mirror


Spoofing URLs With Unicode

Embedded Geek writes: "Scientific American has an interesting article about how a pair of students at the Technion-Israel Institute of Technology registered "microsoft.com" with Verisign, using the Russian Cyrillic letters "c" and "o". Even though it is a completely different domain, the two display identically (the article uses the term "homograph"). The work was done for a paper in the Communications of the ACM (the paper itself is not online). The article characterizes attacks using this spoof as "scary, if not entirely probable," assuming that a hacker would have to first take over a page at another site. I disagree: sending out a mail message with the URL waiting to be clicked ("Bill Gates will send you ten dollars!") is just one alternate technique. While security problems with Unicode have been noted here before, this might be a new twist."
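
A minimal sketch of why the two names are different strings even though they render alike. The spoofed name below is an assumption for illustration: it swaps in Cyrillic "с" (U+0441) and "о" (U+043E), which may not match the exact substitution the students registered.

```python
import unicodedata

latin = "microsoft.com"
# Hypothetical lookalike: Cyrillic U+0441 and U+043E replace Latin "c" and "o".
spoof = "mi\u0441r\u043es\u043eft.com"

def describe_non_ascii(name):
    """Return (code point, Unicode name) for each non-ASCII character in name."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in name if ord(ch) > 127]

print(latin == spoof)  # the two strings compare unequal
for entry in describe_non_ascii(spoof):
    print(entry)
```

Comparing code points rather than glyphs is the only reliable way to tell the two apart; the rendered text gives no hint.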

36 of 432 comments

  1. Our Task is Obvious by donnacha · · Score: 4, Funny


    So, what would be the Cyrillic for Slashdot.org?

  2. old trick by krokodil · · Score: 3, Interesting

    It is widely used on Russian-language IRC
    networks like RusNet. http://www.irc.net.ru/

  3. Done in DOS a long time ago by aoihai · · Score: 4, Interesting

    Anyone else remember using alt+255 and other special characters to make hard to open directories (idiot proof anyway) on shared command line systems?

    --
    You were eaten by a grue.
    1. Re:Done in DOS a long time ago by chabotc · · Score: 3

      Tabs, spaces, dots and even backspaces... *sigh* those were the days directories could still be confusing, hidden and full of illegal software.. On second thought, nothing changed really, except that now nothing on the net is hidden anymore ;-)

  4. I gave m1cr0s0ft.com my credit card number!!!! by Anonymous Coward · · Score: 4, Funny

    Should I be concerned?

    1. Re:I gave m1cr0s0ft.com my credit card number!!!! by Dr.+Awktagon · · Score: 4, Insightful

      Whew, good thing you caught it in time! Don't worry, the credit card companies can take care of it, no worries, just enter your name, credit card number, social security number, and mother's maiden name at each of the following URLs:

      • AMERlCANEXPRESS.COM
      • ClTlBANK.COM
      • FlRSTUSA.COM
      • DlSCOVERCARDS.COM

      (Those all use "ell" instead of "eye" when possible.. they look exactly the same with my fonts.. Since there are already "homographs" in plain ASCII, and plus Javascript mouseovers can be used to change the browser status area, and plus many people don't even fully understand the difference between "microsoft.com" and "microsoft.evil.com", this Unicode trick is nothing to worry (more) about!)
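
      One cheap way to expose this particular ASCII trick, sketched under the assumption that the names arrive as plain strings: lowercasing them makes every disguised "ell" stand out, because a real capital "eye" would turn into "i".

```python
# The four lookalike names from the list above, with "l" standing in for "I".
names = [
    "AMERlCANEXPRESS.COM",
    "ClTlBANK.COM",
    "FlRSTUSA.COM",
    "DlSCOVERCARDS.COM",
]

def reveal(name):
    """Lowercase the name so a lowercase 'l' posing as capital 'I' becomes visible."""
    return name.lower()

for name in names:
    print(name, "->", reveal(name))
```

      This only helps with the ell/eye swap; it does nothing for zero/oh swaps or for the Unicode case.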

  5. Workaround by neolazer · · Score: 3, Insightful

    What are InterNIC and such doing in the meantime to help prevent spoofs such as this? The legal ramifications of this are interesting. One could also post stories with false links, that most people would never even realize weren't true.

  6. WHY THIS IS IMPORTANT by Anonymous Coward · · Score: 5, Informative
    People seem to be missing the point in this thread. Here is why this is very important.

    When you pay money, say with paypal.com, you always want to check the URL. Of course someone could have a fake link like: "click here to pay with paypal" and then redirect you to their bogus site with the intention of stealing your passwords. But it would be fairly obvious from the location bar in the browser that the URL was not paypal.com. But if unicode can be used to spoof the location bar then it will rope in even cautious users.

  7. I would have thought it wasn't a problem except... by SwellJoe · · Score: 4, Informative

    I recently received an email from a confused user who had received an email that appeared to be from Apple, and was selling Apple products using Apple logos, Apple website concepts and images, etc., but was not from Apple. He didn't sign up for the list, and though it appeared to be a legitimate Apple affiliate as far as I could tell (though perhaps one that used somewhat shaky methods to reach customers), he was confused why Apple was sending him email that he didn't ask for. It was his belief that the mail had actually come from Apple, because it looked like it was from Apple.

    Non-nerds have proven to be extremely difficult to educate on the concept that "what email claims to be is not always what email is, and where it claims to come from is not always where it really came from". During the recent Klez outbreak, I even received a message from a nerd-friend saying that he thought my machine might be infected, because he received an infected message from "me". Of course it was spoofed, because I happen to be in a lot of people's address books, but since I haven't used Windows on the desktop in over three years, it clearly didn't actually originate with my box.

    Folks are just kinda thick about questioning the veracity of claims (hell, astrology still sells books and 900-number phone calls). And this could definitely be used for nasty purposes...and certainly will. Spammers will have a field day with this, because they can't help but seem 'fly by night' because they cannot establish a real brand name due to the disgusting nature of their business. If they stand still, they'll get lynched. But if they can, even for a short time, hijack a real name that people trust, and offer up a too-good-to-be-true scam under that trusted name...well, you see where I'm going with this.

    Of course, everyone here knows that unsolicited "business offers" by email are always scams run by filthy people...but my grandmother doesn't know it, nor do my parents or many of my non-nerd friends for that matter.

    Just a thought. We'll see how it plays out, I reckon...

  8. Unicode Environments by saveth · · Score: 4, Insightful

    I develop applications for a DSP company, and we've recently switched to using Unicode in our products. Unicode certainly has its quirks, and this is one of the more obvious ones. I fail to see why it has been implemented so widely, without very, very rigorous testing.

    Actions like the one described in this article could bring down a company, if a person tried hard enough. Of course, Microsoft could just call Verisign and ask them to remove the Cyrillic domain, with no problems. But, for a small company, it could be hell. An entire user group using the same character set to access a certain website would be sent to a different site. In a worst case scenario, anti-company propaganda might be posted on the spoofing site, and it would deter people from visiting the "real" site in the future.

    The only solution I can imagine is to simply prevent the translation of characters among character sets, especially in this sort of environment.

    A Russian site, such as The Moscow Times, could have its site spoofed in exactly the same manner, and everyone using the Cyrillic character set (obviously, widely used in Russia, for example) would be sent to some other site, possibly indefinitely, knowing how registrars have been acting lately. This would create havoc for the newspaper and significantly hurt revenue.

  9. Re:Terminology whine by os2fan · · Score: 3, Insightful
    Russian Cyrillic is not redundant. The other languages that use cyrillic letters have different letters, (eg Ukrainian has an "I", instead of the back-to-front N), and some of the Russian letters are uniquely Russian.

    --
    OS/2 - because choice is a terrible thing to waste.
  10. DNS was, and is, an ugly kludge by Sanity · · Score: 4, Interesting
    Amazing how many comments betray the fact that people haven't read the article.

    At the moment these unicode domain names will not be displayed correctly by web-browsers, rather you will see a bunch of confusing control codes, so this threat isn't really a problem yet.

    Of course, the underlying problem is that DNS is an ugly kludge which has long-outgrown itself. The administrative cost of constructing a massive global namespace is vast, and we can all see the opportunities for cyber-squatting it creates, to the detriment of the public interest.

    These days I am more likely to go to Google and type in a few words, rather than try to guess the URL. The task of finding the website you are interested in should be left to the specialists (like Google and other search engines), we shouldn't try to maintain an ugly, broken, monopolistic, and expensive "first come first serve" architecture like DNS.

    There is no good reason why a web user should ever need to see a URL (except perhaps momentum), any more than they need to see the HTML which makes up a document.

    1. Re:DNS was, and is, an ugly kludge by NoMoreNicksLeft · · Score: 3, Insightful

      If there is a demand for a service which locates the authoritative websites of corporations, then capitalism will provide. This is a lame argument specific to the way Google happens to work.

      If there is a demand for something that we already have at this time, for free and with no effort? In other words, you would like it if I paid for something I already get now for free... well, if you can't find a good business model, why not create an artificial one?

      What about the cyber-squatting, cost, and creation of private monopolies? DNS is an ugly ugly solution to the problem of finding IP addresses.

      Cyber-squatting is simple. Outlaw domain parking, domain transfers, false advertising (which is what registering www.books.com and pointing it at a porn site is), and enforce trademarks. If you want a domain, then use it. Use it for something other than pointing yet another name at your lame web site. Only allow registrations and de-registrations... if someone wants to try and sell the domain and someone else wants to pay money for it fine. But they don't get it, it just goes back into the unregistered pool. And if someone has a valid trademark (microsoft is valid, computers.com isn't) by all means give it back to the trademark holder. Duh. DNS is pretty handy for finding IP's, actually. It just isn't as good at making websurfing as effortless as you'd prefer. Or for keeping people from being assholes and polluting the namespace, I should add.

      Market forces will create a demand for comprehensive search-engines which aren't biased, in fact, they already have.

      Dumbass. On a fresh install of the browser of your choice (or lack thereof), you can't get everywhere you want to go only by clicking links. If the url field is hidden or disabled, which you advocate, you'll be reduced to clicking a toolbar button or a pre-loaded bookmark. I'm sure one such will be a search engine... but with M$ can you count on its integrity?

      What the hell are you ranting about? This has nothing to do with whether your ISP supports cgi.

      So sorry, I thought you might have the ability to understand non-monosyllabic words. Let me try again...

      I-S-P bad. No like us have nice web names. Must use bad homepage **DAMN* ... It can't be done.

      I'm tired, so I'll try to make this clearer. If users are only ever allowed to use crappy homepage webspace, of course half the URL's on the net will be long and ugly. I also failed to mention that many commercial sites have bad web design... this accounts for the other half being ugly.

      And if I got off on a rant, so what? I see someone like you talk out of your ass, I become a little bit upset. Well, guess what? If you want to add another protocol, pick a port number and get to work. I won't stop you. But stop ranting yourself about how the current ones are ugly, when you have no clue why they are even like they are.

      DNS isn't broken, and it isn't ugly. As a protocol, it is highly distributable, robust, and solves the IP-human readable name problem as well as anything that has ever been published. It is the foundation of many protocols and services available on the internet, only one of which is the web. We don't need a separate, incompatible system for the web, and you've offered nothing that would suffice for anything but that, and even then only poorly.

  11. Re:Terminology whine by RelliK · · Score: 4, Insightful
    The Cyrillic alphabet was developed a long time ago by a religious man (guess what his name was), because the Russian peoples he was trying to convert had no written alphabet

    That is false. Russian people had an alphabet long before Cyrillic. Incidentally, that should really be proto-Russian, or Eastern Slavic, since the people diverged into Russian, Ukrainian, and Belorussian much later.

    So it could be said that "Russian Cyrillic" is redundant.

    It is not. There are several "dialects" of the Cyrillic alphabet. They are mostly the same but a few letters are different. I already mentioned three of them above. There's also Bulgarian, Serbian, and I'm not sure what else.

    I seriously doubt the "c" and "o" characters mentioned in the article are unique to the K018R charset

    The charset is called KOI8-R. Or are you using the l33t sp3lling?

    --
    ___
    If you think big enough, you'll never have to do it.
  12. Re:The site by Servo5678 · · Score: 3, Funny
    Hey, that URL is infringing on my copyrights! It's similar to my business's name, Bq--at77w373jih7xepx7om7p6zx7oq Enterprises, Inc.

    Lousy cybersquatters...

  13. i know you're being funny, but... by Anonymous Coward · · Score: 5, Interesting

    I believe it would be something along the lines of .

  14. Re:It shouldn't really be a problem. by GigsVT · · Score: 4, Informative

    Most people just blindly click OK, because it is usually OK.

    A lot of small e-business sites want to use their hosting provider's cert, but don't want the user's browser to display the hosting company's domain rather than their own. (Yes I know it's stupid, people are picky as fuck when you are making web pages).

    Anyway, that causes the browser to warn that the cert is not valid for the domain it is being used in.

    It's kinda possible to get around this using frames, but then the browser might say something about mixed secure and unsecure items on a page. The only real way to do it right is to just let the users see the hosting provider's address, as far as I know, or have the site buy their own cert.

    --
    I've had enough abrasive sigs. Kittens are cute and fuzzy.
  15. Are international domain names even necessary? by ukryule · · Score: 4, Insightful

    From the article:

    But are international domain names even necessary? Kuhn, who is German, doesn't think so: "Familiarity with the ASCII repertoire and basic proficiency in entering these ASCII characters on any keyboard are the very first steps in computer literacy worldwide."

    That's like saying basic numeracy is the first step for computer literacy worldwide, so we should go back to using IP addresses!

    Currently email addresses and URLs are the only reason a native Chinese speaker needs to use ASCII. For someone from Germany, ASCII is pretty easy to handle, but for a lot of languages, Unicode URLs & email addresses are very necessary ...

    1. Re:Are international domain names even necessary? by plumby · · Score: 4, Insightful

      What if the Internet had started in China? Would you be happy to learn the Chinese alphabet in order to enter URLs?

  16. IDNC3 by Russ+Nelson · · Score: 5, Informative

    Dan Bernstein has a proposal for internationalized domain names which solves this problem and many other problems. It's called IDNC3. IDN stands for ``internationalized domain name.'' C3 stands for ``clean, careful, conservative.''

    --
    Don't piss off The Angry Economist
  17. Who needs a paper... this is irrelevant by wadetemp · · Score: 4, Informative

    1) Some people are not good at spelling, and wouldn't know microsoft.com from microssoft.com, especially if it's just seen in a few quick glances.

    2) There are more TLDs out now, and the same name at a .biz or .info TLD does not mean it is the same company... but no doubt a lot of people think that's true.

    3) There's always the old numeral "1" swapped for the lowercase "L" or the uppercase "I" trick, among other similar things that never involved Unicode, but rather human vision and high resolutions.

    4) The "@" symbol in the URL trick, like http:\\microsoft.com\moneyfrombil@haxor.com?action=allyourmoneyarebelongtous

    So if you haven't figured out my point yet, a good percentage of people that use the internet are going to be fooled by far simpler feats of social engineering. Who needs Unicode to do it?
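
    For the "@" trick in point 4, a small sketch of how a standard URL parser reads such an address. The URL below is a hypothetical stand-in written with forward slashes and with the "@" placed before the real host, so it is not the exact string quoted above; `haxor.example` is an invented host name.

```python
from urllib.parse import urlsplit

# Hypothetical URL: everything before "@" is userinfo, not the host.
url = "http://microsoft.com@haxor.example/moneyfrombill?action=pay"

parts = urlsplit(url)
print(parts.username)  # the part a hurried reader mistakes for the site
print(parts.hostname)  # the host the browser would actually contact
```

    The familiar name ends up in the username field, which servers are free to ignore, while the connection goes to whatever follows the "@".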

  18. Re:WHY THIS IS IMPORTANT - It's already been done by JesterOne · · Score: 4, Informative

    Even better... I seem to recall a scam that did just that with paypal. They sent out bulk mail about updating your account or something but the link was not paypa(lower case 'L').com but paypa(Capital 'I').com and had made a carbon-copy of paypal's website, hoping you would log in. The address in the location bar looks identical for both. This sounds like the same kind of thing but using Unicode to make the spoof.

  19. Think of the fun you could have with this! by chabotc · · Score: 3, Funny

    Ok, first take microsoft.com (alternate spelling), name your mail gateways identical to microsoft's, and then send out emails (as balmer@microsoft.com?) to a lot of MS employees, telling them to remove IE from XP ..

    From there on, it only gets better and better. Think of the countries you would be able to influence, technology development you could steer, and leaked memos you could fabricate..

    Damn i wish i had thought of it ;-)

  20. Different behaviour on different TLDs by ukryule · · Score: 3, Interesting

    One way to control this would be to restrict the valid characters based on the TLD.

    So for example '.uk'/'.au'/'.us' etc. can ONLY have ASCII 2nd level domains. '.de' can only have German characters, '.fr' only French, and so on ...

    Then for completely different character sets, you have new Unicode TLDs (Arabic, Greek, Chinese), which can only have their relevant characters.

    I guess you leave .com/.org/.net as ASCII; although they are meant to be global, they are based on the Latin character set.

    Of course, this adds complexity - but you can do all the testing for validity when the domain is registered (i.e. a web client can request any URL, but dodgy mixed character set domain names cannot be registered).
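
    A rough sketch of what such a registration-time check could look like. Everything here is an assumption for illustration: the `ALLOWED` policy table is invented, and guessing a character's script from the first word of its Unicode name is a crude shortcut rather than a proper script lookup.

```python
import unicodedata

def scripts_in(label):
    """Guess the script of each letter from the first word of its Unicode name."""
    found = set()
    for ch in label:
        if ch.isalpha():
            found.add(unicodedata.name(ch).split()[0])
    return found

# Hypothetical policy table: scripts a registry accepts under each TLD.
ALLOWED = {"com": {"LATIN"}, "uk": {"LATIN"}, "ru": {"CYRILLIC"}}

def may_register(domain):
    """Accept a name only if every letter belongs to a script allowed for its TLD."""
    label, _, tld = domain.rpartition(".")
    allowed = ALLOWED.get(tld)
    return allowed is not None and scripts_in(label) <= allowed

print(may_register("microsoft.com"))        # pure Latin label under .com
print(may_register("mi\u0441rosoft.com"))   # one Cyrillic letter mixed in
```

    Because the test runs once at registration, resolvers and browsers need no changes, which matches the point made above.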

  21. Re:cyrillic trivia Re:Terminology whine by os2fan · · Score: 3, Offtopic
    I'm aware of all of this. But even in the Soviet empire, there were extra letters. Compare this in the west, where Icelandic still uses thorn and etha. Thorn was used in English before the Latin alphabet arrived, and continued afterwards. edda or etha is a crossed d. Capital thorn looked something like a Y with a vertical left stroke. Hence "Ye Olde Shoppe".

    Another English letter to fade is yogh [looks like a 3] - this is the z in Menzies = Men3ies "Menges".

    Also of note is digamma. In the Greek number system, this is 6, that is, the 6th letter of the alphabet. As a letter, it appears between epsilon and zeta. Since our alphabet is derived from the Greek, one notes the letter here not only looks like digamma, but preserves much of the original sound: F. Phi was an aspirated p.

    Cyrillic bears a much closer resemblance to the classical Greek letters, and the theta, indeed, represents an f here.

    Unicode reflects current realities. There is more than one Cyrillic Alphabet, just as there is more than one Latin alphabet.

    --
    OS/2 - because choice is a terrible thing to waste.
  22. Verisign -- the company you can trust! by Corgha · · Score: 3, Interesting
    Verisign never ceases to amaze me. The first sentence on their website is:
    VeriSign, Inc. (Nasdaq:VRSN) is the leading provider of digital trust services that enable businesses and consumers to engage in commerce and communications with confidence.

    ... so it seems safe to say that trust is the foundation of their business. Essentially, we trust Verisign to ensure that we're communicating with whom we think we're communicating, and to protect us from various forms of spoofing. They should therefore, IMHO, actively avoid even the appearance of impropriety.

    However, we all remember the Microsoft certificates they mistakenly gave out to a third party.

    Now we've got them registering another domain to someone that looks just like "microsoft.com." While it's tempting to absolve Verisign of guilt in this, I think they were asking for it. After all, even I thought of this possibility when I first heard about Unicode domain names, and I'm not the sharpest knife in the drawer. You've got to think someone at Verisign raised the possibility, but they chose not to deal with it.

    Again, one might be tempted to say that this isn't their problem, if not for the fact that they are in the trust business. As the article says, "Certification agencies (which include VeriSign) ensure that encoded names are not misleading and that the registration corresponds with the correct real-world entity." It should not be technically difficult, for instance, to build a set of lists of visually similar Unicode characters and to refuse to register domains visually identical to existing ones. Maybe they should decide to forgo a relatively small amount of revenue and to refuse to sully their reputation with such inevitably deceptive domain registrations, especially considering that they interfere with Verisign's core business.
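
    A minimal sketch of the lookalike-list idea, assuming a hand-made table. The `CONFUSABLES` mapping below is hypothetical and deliberately tiny; a registrar would need a far larger one, and nothing here reflects how Verisign actually processes registrations.

```python
# Hypothetical table mapping a few Cyrillic letters to the Latin letters they resemble.
CONFUSABLES = {
    "\u0430": "a",  # Cyrillic a
    "\u0441": "c",  # Cyrillic es
    "\u0435": "e",  # Cyrillic ie
    "\u043e": "o",  # Cyrillic o
    "\u0440": "p",  # Cyrillic er
}

def skeleton(name):
    """Fold every known lookalike to one representative character."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in name.lower())

def collides(candidate, registered):
    """True if the candidate looks the same as any already registered name."""
    return skeleton(candidate) in {skeleton(existing) for existing in registered}

registered = {"microsoft.com"}
print(collides("mi\u0441r\u043es\u043eft.com", registered))  # lookalike of an existing name
print(collides("example.org", registered))                   # unrelated name
```

    Comparing skeletons instead of raw strings is what lets the check catch names that differ in code points but not in appearance.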

    Of course, none of this compares to the letters they sent out trying to fool people into switching their domains over to Verisign. The other two were negligence and foolishness, but that was an active attempt to deceive from a company that's selling trust.

    It all leaves me in a bit of shock. It's not that I'm shocked to see a company doing stupid and deceitful things; it's that trust is Verisign's primary asset. Hearing about these (colossally, in my mind) stupid decisions is like hearing that GM decided to torch all its manufacturing plants and assassinate all its employees. It leaves me with two questions: "what the hell are they thinking?" and "why does anyone continue to do business with Verisign?"
  23. Re:I fail to see by AndyElf · · Score: 3, Interesting

    Domain spoofing is one are. But what if you see an email address on a business card, say @mirsft.com? How do you know what encodings are those 'c', 'a' and 'o' are in (for those with UNICODE brain-damaged browsers the address above should look like ca@microsoft.com)? Same goes for URLs, etc. Another option -- say a Swedish company registers an URL that perfectly represent the name of the comapny in Swedish. With all those umlauts and whatever-they-are-called-those-circles-over-A. And you are sitting there with a US_en keyboard -- how are you expected to type that URL into a location field in your browser?

    For the use-cases like this I think that multilingual URLs are a Bad Idea (TM).

    --

    --AP
  24. Paper Online by AstroMage · · Score: 5, Informative
    In spite of what the heading says, the original paper is online; you can find it on Evgeniy Gabrilovich's homepage.

    That is, if you are interested in the dry, technical details... ;-)

  25. What needs to be done to solve this by Hellkitten · · Score: 3, Insightful

    Solution: Make browsers default to displaying links to sites with non-ASCII addresses differently from regular links

    Also, since link display may be overridden by style sheets, make the browser override style sheets for these links.

    Display a warning when the user follows one of these links

    If this warning is displayed as a popup and the user checks "never show this warning again", display a text that explains why this is a bad idea

    The only true way to security is to annoy your users into submission
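
    The test a browser would need for this is small. A sketch, assuming the link is available as a string; the function name `needs_warning` is made up, and treating ASCII-encoded `xn--` labels as suspicious too is an added assumption, not part of the proposal above.

```python
from urllib.parse import urlsplit

def needs_warning(url):
    """Flag links whose host has non-ASCII characters or ASCII-encoded IDN labels."""
    host = urlsplit(url).hostname or ""
    if not host.isascii():
        return True
    return any(label.startswith("xn--") for label in host.split("."))

print(needs_warning("http://microsoft.com/"))        # plain ASCII host
print(needs_warning("http://mi\u0441rosoft.com/"))   # host with a Cyrillic letter
```

    Only the host is inspected, so non-ASCII characters in the path or query do not trigger the warning.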

    --
    - We are the slashdot. Resistance is futile. Prepare to be moderated -
  26. Re:Terminology whine by VP · · Score: 3, Informative

    Can you perhaps explain why KOI8 characters are out of order?

    Because they were ordered as a transliteration for the Latin alphabet (sorry, can't put it in Cyrillic): ABCDEF instead of ABVGDE.

    My guess is that this was done to easily transform Russian text written using the Latin alphabet into Cyrillic by simply flipping a bit.....
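
    The bit trick is easy to try with Python's built-in `koi8_r` codec. A small sketch: it encodes Russian text as KOI8-R, clears the top bit of each byte, and reads what is left as ASCII. The helper name `strip_high_bit` is made up for this example.

```python
def strip_high_bit(text):
    """Encode as KOI8-R, clear bit 7 of every byte, and decode the result as ASCII."""
    raw = text.encode("koi8_r")
    return bytes(b & 0x7F for b in raw).decode("ascii")

print(strip_high_bit("Русский Текст"))  # rUSSKIJ tEKST
```

    The case comes out inverted because KOI8-R puts lowercase Cyrillic where uppercase Latin sits once the top bit is removed, but the text stays readable as a transliteration.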

  27. Re:Right.. excpet.. SSL by Alan · · Score: 4, Insightful

    Isn't the point of the article that now you can go to a Verisign approved website for (unicode of some big company) and have it check out properly because there is a verisign cert for the site (unicode of some big company)?

    People now seem to be good at knowing that if you get funny pop ups about self signed certs or certificates not matching the url that they don't put in their credit card number... now suddenly that doesn't apply, because you won't get that, and the differences aren't as obvious as those for something like paypaI.com or micros0ft.com :)

  28. Re:Still... by arivanov · · Score: 3, Interesting

    In Windows (the EU edition) - anyone. Just add the language. Your only problem is that the idiots in Redmond have yet to add a keyboard editor (something that has been present in all third party internationalisation packages since Windows 3.10). As a result you will be stuck with some extremely obscene keymap inherited from a Cyrillic typewriter. Alternatively you can pick up dlls from third party cyrillisation packages made for older Windows versions and violate the sanctity of the MSFT certificate by slapping it on top of the current ones. It usually works. And you get a proper keymap.

    Under unix it is usually a bit more p*** in the a*** because most internationalisations rely on Xmodmap and it no longer works nowadays. Once again by default you will get stuck with something you cannot use unless you have a keyboard that is engraved with the alternate characters. Once again you will need to spend half an hour with vi swearing at whoever made Xmodmap not to work any more in order to get a less obscene keymap.

    --
    Baker's Law: Misery no longer loves company. Nowadays it insists on it
    http://www.sigsegv.cx/
  29. Re:Why not stick with English? by dvdeug · · Score: 5, Insightful

    I'm trying not to sound like a lingual elite-ist by any means, but can anyone really say that we shouldn't standardize on English/ASCII?

    The 5 billion people in the world who don't have English as their native language might. Some would argue that language is a cornerstone of culture, and that when a society loses their language, they lose a significant part of their culture. I've read parts of Shakespeare in German, and was very unhappy about the destruction of the writing. I know several poets of my native tongue (Poe, in particular) would be lost completely in translation. I have no interest in condemning other people to reading the great literature of their cultures in translation.

    In any case, ASCII isn't good enough for English writing. French accents are used in English writing, as well as the ae and oe ligatures. Even in modern writing, proper quotes and apostrophes are needed, and footnote daggers often show up in English writing. For specialized work, mathematics, linguistics (even of English), historical English writing and APL all have their own body of characters outside ASCII that need to be supported.

  30. Re:Why not stick with English? by ukryule · · Score: 3, Informative

    I'm trying not to sound like a lingual elite-ist by any means, but can anyone really say that we shouldn't standardize on English/ASCII?

    Yes. It's ridiculous to ask people to learn (admittedly a small part of) a new language to use a computer. Just because English is taught in a lot (not all) of schools around the world, it doesn't mean that everyone is comfortable using it. A truly usable computer should be one which allows you to interact with it 100% in your own language.

    The internet has shrunk the barrier to exchange information, which has made diverse languages even more significant of a barrier.

    The main barrier to computer usage in a large part of the world is that it is still an elitist medium - only useable (and affordable) by the well-educated. If you are actually interested in making it easier for everyone to communicate, then the main technical issue to be solved is how to make the internet useable by anyone from any background.

    If we use UNICODE and just accept that everyone wants to use their own language, then the internet will end up as a group of national islands of information. Each group will surf their set of native language web sites.

    This already happens. Of course people surf websites in their own language! Because you (and I) only surf the English-speaking fraction of the web, you don't see it. All that international domain names adds is that a Russian accessing a Russian website can do so via a Russian URL. What could be more sensible or obvious than that?

    If no standard is agreed upon, proprietary standards will pop up all over the place, and it'll be a huge mess. In fact this is already happening - although he's the current anti-Christ of Slashdot, the big selling point of RealNames was for non-English languages, and if you believe Keith Teare's account, he was shafted by Microsoft because they wanted to control (via their browser) the translation of non-ASCII names to ASCII URLs.

  31. [OT] not quite correct by brokeninside · · Score: 3, Interesting
    IMO, the major contribution of St. Cyrill and Methodius is not the creation of an alphabet, but their disputes with the Western church and the Pope regarding the right for the different peoples to learn and practice Christianity in their own language. Up to that point only Latin, Greek and Hebrew was used in church services...


    This was only true in Western Christendom and then only true to a limited extent. For example, in the west, the first Christian missionaries to the British Isles translated the service books of the early Church to Gaelic and other Celtic languages. In the east, the generally accepted practice was to use the vernacular. This is why some of the oldest extant copies of the Bible are in one of the Ethiopic languages, Coptic, Syrian, etc.

    The Roman canon that the liturgy could only be practiced in one of the tongues spoken by the apostles was of relatively late invention and only applied to congregations under the sole apostolic see of the west, Rome. Congregations under the apostolic sees of the east always used the vernacular.

    Hence it is somewhat ironic that many eastern Churches refuse to update the liturgy from being in liturgical Greek or old Slavonic into their modern equivalents.

    Regards,

    -l
  32. Re:Because they're smart by bani · · Score: 3, Interesting

    For many language encodings the conversion to unicode is a one-way ticket, there is no roundtrip possible -- so you sometimes lose critical information about the characters.

    It's also disappointing that unicode forum dropped their official JISUTF tables. There is no longer any official translation table for japanese encodings to unicode. It's the wild west for asian languages in unicode (ever wonder why no asian data systems use unicode?)