Slashdot Mirror


Registrations Now Accepted For Asian Domain Names

Eric Sun was among the first to point out that as of Thursday evening, VeriSign has begun accepting Chinese, Japanese and Korean domain names. "This increases the possible characters from 37 (26 letters, 10 numerals, and the hyphen) to 40,282. For more information, see this AP story." snrsamy points to the same story as featured on C|Net. jamie suggests reading the technical lowdown at VeriSign.
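
A quick back-of-the-envelope check on those numbers (a sketch, not from the story): a symbol set of size n needs ceil(log2 n) bits per character, so going from 37 to 40,282 symbols means going from 6 bits to 16 bits per character. The helper name `bits_needed` is illustrative only.

```python
def bits_needed(n_symbols: int) -> int:
    """Illustrative helper: fewest bits that give each of n_symbols its own code."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2, using integers only.
    return max(1, (n_symbols - 1).bit_length())

for n in (37, 40282):
    print(f"{n:>6} symbols -> {bits_needed(n)} bits per character")
```

This prints 6 bits for 37 symbols and 16 bits for 40,282.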

138 comments

  1. Re:We need a net Pat Buchanan by Lard+Kano · · Score: 1

    To help us keep the internet English compatible.

    Come on, we invented it, we populated it, we control it, and now the Asian hordes are trying to subvert it.

    Let them make their own internet.

    Not to mention some of the domain names may belong to Al Gore :)

  2. Re:TLD by tomjgroves · · Score: 1

    But what's the POINT?? It just totally screws up existing protocols that have been tried, tested and proven to work well. And what if we want to browse to an Asian site? Seriously, let's say you have a friend in China and he has an article up, in English, that you want to read. HTF are we supposed to get to it? While these guys can view the entire Internet, they also have their own "private" section that very few people in the Western world can access. tom

  3. Re:Quick (maybe stupid) question... by jayhop · · Score: 1
    Basically you need a pair of 8-bit bytes for one Chinese character (in each byte, the first bit marks it as part of a Chinese character and the other 7 carry the actual encoding). So each character takes two bytes.
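
    The two-byte scheme described above matches EUC-style encodings such as GB2312, where both bytes of a Chinese character have the high bit set, so they can never be mistaken for ASCII. A minimal check, assuming Python's standard `gb2312` codec; the helper name is made up for illustration:

```python
def is_gb2312_double_byte(ch: str) -> bool:
    """Illustrative helper: True if ch encodes to two GB2312 bytes with the high bit set."""
    raw = ch.encode("gb2312")
    # In EUC-style encodings the high bit (0x80) flags a byte as part of a
    # double-byte character; the remaining 7 bits carry the actual code.
    return len(raw) == 2 and all(b & 0x80 for b in raw)

print(is_gb2312_double_byte("中"))  # True
print(is_gb2312_double_byte("A"))   # False: ASCII stays a single byte
```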

    However, Chinese is commonly considered more concise than English or other languages with a small character set. There are thousands of commonly used characters, each of which functions like a word in English. Many characters have more than one meaning, and their combinations (2 characters in most cases) make new words. And don't forget the amazing flexibility of the grammar (e.g. fewer stop words like "the")! We are not even talking about ancient Chinese, which is much SHORTER.

    Give me any sentence with more than 10 English words (with no words like Yugoslavia, of course), and I guarantee I can rewrite it in Chinese in less space.

    You see, this is the basic rule of information: increase the complexity of the encoding scheme, and you get more density.

    How complex is it? Well, I have to say that my 12 years of Chinese class are a painful memory.

  4. Re:Not a troll, but... by BJH · · Score: 1

    Just wanted to add that anyone who wants to know more about the whole topic should start at this page.

  5. Re:What a lot of whining! by Krilomir · · Score: 1
    > How can I enter these funky characters?
    > I dunno, just a guess, but maybe someone's already thought of this? Perhaps...

    It's easy to enter kanji if you are using Internet Explorer: just visit Windows Update and download the Japanese Input Method Editor update, and you'll be able to type kanji in your browser (using romaji, I think). I don't know how you do it with Mozilla...

  6. Extra RS-232 pins (unserious) by mrob · · Score: 1

    What would happen if someone said "Let's add 2 new data pins to RS232"?

    They already did:

    pin 14 STD Secondary transmit data
    pin 16 SRD Secondary receive data
    (also pin 19 SRTS Secondary RTS, pin 13 SCTS Secondary CTS, etc.)

    These pins can be used to double the amount of data sent through your RS-232 cable, which would be useful if you decided to (say) switch from 8-bit characters to 16-bit characters.
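
    In the same unserious spirit, here is a sketch of splitting a 16-bit character between the primary and secondary data lines. The function names and the one-byte-per-line idea are purely hypothetical; no real serial driver works this way.

```python
from typing import Tuple

def split_for_two_channels(ch: str) -> Tuple[int, int]:
    """Hypothetical: high byte goes out on TxD, low byte on STD (pin 14)."""
    code = ord(ch)
    if code > 0xFFFF:
        raise ValueError("only 16-bit (BMP) characters fit in two bytes")
    return code >> 8, code & 0xFF

def join_from_two_channels(primary: int, secondary: int) -> str:
    """Hypothetical receiver side: reassemble the character from both bytes."""
    return chr((primary << 8) | secondary)

hi, lo = split_for_two_channels("中")   # U+4E2D
print(hex(hi), hex(lo))                 # 0x4e 0x2d
print(join_from_two_channels(hi, lo))   # 中
```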

    It's not an RS-232 cable unless it has all 20 wires!!! (-: (-:

    --
    Lawyers: The Other White Trash.
  7. Re:Big5 or Unicode by Rob+Parkhill · · Score: 3

    Since nobody seems to want to read the article, or research any of the info, here is the quick low-down (since I have to deal with this at work right now...)

    - This solution is only for web browsers. It requires a special version of a web browser, or a plugin, to be able to use the new encoding scheme. It won't work for email, ftp, telnet, gopher, etc, unless a special version of the program is written.

    - DNS doesn't break. DNS still uses ASCII. This scheme uses RACE to encode the multi-lingual character set into ASCII. NSI will put a small prefix at the start of the domain name to identify it as multi-lingual (for example, eq- would be found at the start of the domain name). The exact prefix has not yet been released, to prevent squatters from snapping up names.

    - The special browsers will detect the prefix, and translate the ASCII gibberish into the specified multi-lingual character set. The browser also does the conversion back to ASCII to allow a DNS lookup.

    - WHOIS does not/will not support this. You can only use WHOIS with the ASCII encoded gibberish.

    - This is not supported by the IETF. This is a custom solution implemented by NSI. But it looks like they are going to be WAY behind schedule in actually rolling this out.

    - They are accepting registrations right now, but none of these names will resolve for at least a month, probably much longer. In other words, the system isn't usable yet, but NSI can collect money.

    - The IETF is working on their own, probably completely incompatible system, to do the same thing.
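
    The browser-side step described above (spot the prefix, turn the ASCII back into characters for display) can be sketched in Python. This is a simplified sketch, assuming the `bq--` test prefix and handling only RACE's single-row form, in which the first decoded byte is the shared high byte of every character; the draft's other compression modes are not handled, and the function names are illustrative.

```python
import base64

RACE_PREFIX = "bq--"  # assumption: the test prefix, not a confirmed production value

def race_decode_label(label: str) -> str:
    """Illustrative: decode one RACE label (single-row form only); others pass through."""
    if not label.lower().startswith(RACE_PREFIX):
        return label
    b32 = label[len(RACE_PREFIX):].upper()
    b32 += "=" * (-len(b32) % 8)          # restore the padding RACE omits
    data = base64.b32decode(b32)
    row, cells = data[0], data[1:]        # shared high byte, then low bytes
    return "".join(chr((row << 8) | cell) for cell in cells)

def to_display(hostname: str) -> str:
    """Illustrative: what a RACE-aware browser might show in its location bar."""
    return ".".join(race_decode_label(label) for label in hostname.split("."))

print(to_display("bq--gcrmxyi.com"))   # アニメ.com
print(to_display("www.slashdot.org"))  # unchanged
```

    The reverse direction (characters to ASCII before the DNS lookup) is the same transformation run backwards.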

    --
    "Tomorrow's forecast: a few sprinkles of genius with a chance of doom!" - Stewie Griffin
  8. The United States lost the Vietnam War. by prodeje · · Score: 1

    Although according to the gov't, it was never a war at all.

    --

    Bitchslapped? Give Rob a bitchslap from bitchslapped.com.

  9. Re:What a lot of whining! by Stiletto · · Score: 2

    Well, if your only connection to the Asian population is spam email, this should make your isolationism even more simple: the standard uses a standard prefix for RACE-encoded domain names; block those and you're in arrogant English/USian bliss.

    Blah. Spare us your arrogant anti-English/US attitude.

    Fact is, it is convenient to be able to block certain top-level country codes at the business gateway (or ISP) in order to cut down on spam.

    Incidentally, someone's connection to the Asian population is most likely NOT through spam, since most spam coming from Asian top-level domains is actually just U.S. spam--either routed through someone else's mail system, or with spoofed headers.

  10. Re:RFC by jsproul · · Score: 1

    Whether or not this is compliant with one or more RFCs, it is entirely noncompliant with most. Internationalization of the Internet is inevitable and a Good Thing(tm), but only when it takes place via the appropriate processes. As others have pointed out, internationalization was already happening, but it takes time.

    This is nothing more than an attempt by NSI to open another huge revenue stream without any consideration for the effect it will have on the Internet, or the long-term interests of the Internet community. After all, they see an untapped potential market and a chance to dominate it by jumping in before the standards are developed that would allow others to participate. Now their competitors will have to follow their lead or risk losing the market, and the standards process will have been neatly circumvented. The cost is borne by the Internet community, and the benefits are reaped by NSI.

    Why did I vote for Nader? Now I remember...

  11. Re:RFC by jafuser · · Score: 2
    The > and < symbols are not part of the RACE string. I tried typing anime (アニメ) into their "Multilingual Conversion Tool" and got the following result:

    Input String (UTF-8): アニメ
    Prepared String (UTF-8): アニメ
    Registration String (RACE): bq--gcrmxyi
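
    That output can be reproduced with a simplified encoder. A sketch only, assuming the `bq--` prefix and covering just RACE's single-row case (all characters share the same high byte, which holds for katakana); the name-preparation step and the draft's other compression modes are left out, and the function name is hypothetical.

```python
import base64

RACE_PREFIX = "bq--"  # assumption: the prefix shown by the conversion tool

def race_encode_label(label: str) -> str:
    """Hypothetical helper: simplified RACE encoding of one label (single-row case)."""
    if all(ord(ch) < 0x80 for ch in label):
        return label                        # pure-ASCII labels are not changed
    utf16 = label.encode("utf-16-be")
    rows, cells = utf16[0::2], utf16[1::2]  # high bytes, low bytes
    if len(set(rows)) != 1:
        raise NotImplementedError("multi-row compression not implemented")
    # Compressed form: the shared high byte once, then each low byte.
    compressed = bytes([rows[0]]) + cells
    # RACE's base32 uses a-z then 2-7 with no padding, which is RFC 4648
    # base32 lowercased with the '=' padding stripped.
    b32 = base64.b32encode(compressed).decode("ascii").lower().rstrip("=")
    return RACE_PREFIX + b32

print(race_encode_label("アニメ"))     # bq--gcrmxyi
print(race_encode_label("slashdot"))  # slashdot
```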

    --
    EFF Member #11254

    --
    Please consider making an automatic monthly recurring donation to the EFF
  12. Re:Thats a lot of characters... by Sensor · · Score: 2

    The problem isn't necessarily with buffer overflows; read Bugtraq...

    There was a report a couple of weeks ago regarding a problem with internationalised IIS servers, where Unicode representations of directory traversal characters (., /, \, etc.) were being substituted after access checks had been applied...

    Now imagine domain based trust relationships - these will be implemented in numerous sub-systems (tcp wrappers, .rhosts, sendmail.cf, etc...) each of which may perform the normalisation/access checks slightly differently.

    I imagine that this will lead to numerous security issues due to slight differences in systems support for multi-byte characters.

    Another question (which I suspect will be answered in the FAQ): do you need to register the same domain name several times to account for the differing Unicode byte widths?
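
    The check-then-decode ordering problem is easy to reproduce in miniature. The sketch below uses ordinary percent-encoding rather than the Unicode tricks from the IIS report, and both checker functions are hypothetical; the point is only that a check run before decoding sees different text than the code that later uses the decoded path.

```python
from urllib.parse import unquote

def naive_is_safe(raw_path: str) -> bool:
    """Hypothetical flawed check: looks for traversal *before* decoding."""
    return "../" not in raw_path

def decode_then_check(raw_path: str) -> bool:
    """Hypothetical fix: decode and normalise separators, then check."""
    decoded = unquote(raw_path).replace("\\", "/")
    return "../" not in decoded

attack = "/scripts/..%2f..%2fwinnt/system32/cmd.exe"
print(naive_is_safe(attack))      # True: the check is fooled
print(decode_then_check(attack))  # False
```

    Decoding once and then checking is still not a complete defence: a double-encoded `%252f` survives one round of `unquote` as `%2f` and would become `/` if something downstream decoded it again.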

  13. Re:RFC by Megane · · Score: 4
    Okay, for the PDF-challenged: it seems not to be an RFC, but it is meant to be compliant with the current RFC spec, in consideration of RFC2825, which points out that there is simply too much software out there that will break when given UTF-8 domain names.

    How it works is there is a special prefix "<rp>" (or maybe this just represents the prefix, I can't really tell from the PDF, but I didn't think < and > were valid domain name characters) that indicates a part of the domain is encoded, followed by the encoded name which only uses ASCII characters, and includes information about which character set was used (Unicode, SJIS, etc.). The algorithm is called RACE, Row-based ASCII Compatible Encoding.

    A couple of examples were given for both a domain name and a server name:

    <rp>45dfg62de34432.COM
    <rp>3df45gd345.<rp>45dfg62de34432.COM

    So I guess you can set your spam filters to block any domain starting with <rp>! :)
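
    Taken literally, such a filter only has to look for the prefix at the start of any label. A sketch, assuming the `bq--` test prefix (the production prefix had not been announced) and a hypothetical helper name:

```python
RACE_PREFIX = "bq--"  # assumption: test prefix; the production value may differ

def has_race_label(hostname: str, prefix: str = RACE_PREFIX) -> bool:
    """Hypothetical helper: True if any label of hostname starts with the prefix."""
    labels = hostname.rstrip(".").lower().split(".")
    return any(label.startswith(prefix) for label in labels)

print(has_race_label("bq--gcrmxyi.com"))      # True
print(has_race_label("www.bq--gcrmxyi.com"))  # True
print(has_race_label("slashdot.org"))         # False
```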

    --
    #naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
  14. This is not meant to sound xenophobic, but by fatphil · · Score: 1

    I think this is 'a bad thing'.
    I don't think standards like this scale well.
    What would happen if someone said
    "Let's add 2 new data pins to RS232"?

    I live in a country where we want 3 extra symbols to accommodate the language. They're all in Latin-1, of course. I don't even think that expanding to 8-bit Latin-1 is necessarily a good thing, let alone introducing an entirely new character encoding (16-bit) to the scheme of things.
    I don't want to be
    "f\0a\0t\0p\0h\0i\0l\0.\0o\0r\0g\0\0"
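
    For the record, that is what plain ASCII looks like in little-endian UTF-16 (the string above also shows a trailing terminator). UTF-8, by contrast, leaves ASCII bytes untouched, which is one reason ASCII-compatible schemes are attractive:

```python
name = "fatphil.org"

wide = name.encode("utf-16-le")
print(wide)  # b'f\x00a\x00t\x00p\x00h\x00i\x00l\x00.\x00o\x00r\x00g\x00'

# Every other byte is zero, so ASCII text doubles in size.
assert len(wide) == 2 * len(name)
assert wide[1::2] == bytes(len(name))

# UTF-8 encodes ASCII characters as themselves.
assert name.encode("utf-8") == b"fatphil.org"
```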

    We don't let Russian trains into central Europe (the tracks are wider), so why should we let Kanji into our character sets? (Yes, I know Russian trains do come to Europe; I live at the end of one of the lines, just not in central Europe!)

    Anyway, here's to 3-bit serial lines...
    (Could I patent that? I'd need to design an IC of flip-flap-flup-flipflop-catflap-flatcap-fatcat-flops first...)

    FatPhil

    --
    Also FatPhil on SoylentNews, id 863
    1. Re:This is not meant to sound xenophobic, but by dentin · · Score: 1

      > It's called evolution. Things weren't implemented properly the first time. Now we're correcting that. A lot of modern computing was invented in
      > English speaking countries, it's hardly any wonder our systems can't cater for the rest of the world. It seems rather unfair to put them at a
      > disadvantage. Besides, they will eventually force a change, and we don't want incompatibilities now, do we? Personally, I can't wait for
      > everybody to move to Unicode - it will make life as a software developer easier.

      It's called evolution. The ideographic languages weren't implemented properly the first time. Now, they should be correcting that. A lot of modern computing was invented in English-speaking countries, partially because it's so much easier to develop with an alphabet of under a hundred characters that fits conveniently in a byte. It seems rather unfair to tear down those years of development and put programmers at a disadvantage because users of ideographic languages refuse to construct a proper alphabet. Besides, eventually they will change anyway, for reasons of international communication and improving literacy, so why force programmers to introduce incompatibilities over it? Personally, I can't wait for Unicode to die a horrible death. It was a brain-damaged idea from the start.

      -dentin

      --
      Alter Aeon Multiclass MUD - http://www.alteraeon.com
    2. Re:This is not meant to sound xenophobic, but by Malc · · Score: 2

      It's called evolution. Things weren't implemented properly the first time. Now we're correcting that. A lot of modern computing was invented in English-speaking countries, so it's hardly any wonder our systems can't cater for the rest of the world. It seems rather unfair to put them at a disadvantage. Besides, they will eventually force a change, and we don't want incompatibilities now, do we? Personally, I can't wait for everybody to move to Unicode; it will make life as a software developer easier.

  15. Not a troll, but... by Speare · · Score: 3

    Will moderators shoot down the fact that I mention Microsoft?

    Windows has had a CJK-capable kanji input scheme for years. CJK: Chinese, Japanese, Korean. Windows also has had bidi (bidirectional) support for right-left and/or top-bottom languages, including Hebrew.

    If you have the appropriate cjk-input features installed, it's just a funky keyboard shortcut to open it up to enter kanji. If not, you'll probably be limited to clicking on visible links, not entering domain names or other text by hand.

    I don't know what features Linux has to handle EFIGSS (English, French, Italian, German, Spanish, Swedish) differences, never mind bidi or kanji input.

    --
    [ .sig file not found ]
    1. Re:Not a troll, but... by BJH · · Score: 3

      Kanji are usually input under Linux with kinput2 (although Netscape has always had a few... problems... in dealing with them). Luckily, Mozilla is much better in this respect.
      Some programs, like Emacs, communicate directly with the Japanese conversion server (canna, Wnn[4|6], ATOK, etc.), but there are very few apps which can do this.

  16. Re:Don't let them ruin the Internet by Lard+Kano · · Score: 1

    What happened to "AND"? I want and back!!!

  17. You think prostitution is funny? by prodeje · · Score: 1

    Man, you're pretty fucked up.

    --

    Bitchslapped? Give Rob a bitchslap from bitchslapped.com.

  18. Re:IMO about time by RCobbett · · Score: 1

    Um...just out of interest, how often do you go to Asian sites? An estimate will do: maybe once since you first logged on? For the vast majority of Internet users in the West this will have no effect whatsoever, because the vast majority don't speak Chinese, Japanese or the other languages which use different character sets. The people affected will be the ones whose scripts are being introduced, and therefore the ones who are likely to find it convenient not to have to use our system. Companies such as, say, Sony will continue to have sites which can be easily accessed by the rest of the world. A very black day for the net? Not really. More a sign that the system doesn't have to be designed by Americans for Americans.

  19. RFC by Megane · · Score: 2

    So is there an RFC on how this works?

    --
    #naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
    1. Re:RFC by sheckard · · Score: 1

      There is a technical whitepaper on the Verisign site... just look above...

    2. Re:RFC by snorked · · Score: 1
      The current memo (pre-RFC) from the Network Working Group can be found at: ftp://ftp.isi.edu/in-notes/rfc2825.txt where it clearly states that the matter of UTF-8 names is solely up to the IETF (second paragraph of the Abstract section).

      The IETF draft (clearly not an RFC) on the matter, dated 28 June 2000, can be found at: http://www.i-d-n.net/draft/draft-ietf-idn-requirements-03.txt

      The remaining questions are: a) NSI has no control over the TLD for each respective character set, so why are they offering these? b) Why are they polluting the .com, .net, and .org TLDs? c) If you already own "wine.com", does this mean they're willing to give the UTF-8 translation to Joe Blow so he can hijack all your Asian clients and ruin your otherwise good name?

      Clearly this is not well thought out at all.

      Please peruse this: http://www.emarketer.com/enews/reuters/11_09_2000.rwntz-story-bcnetinterlanguagedc.html?ref=dn and come to your own conclusion as to the real reason why. (Hint: third paragraph.)

      --

      I wasn't there. You can't prove it. So nahhhhh. :-PpPppP

    3. Re:RFC by whaley · · Score: 1

      The RACE draft says that:

      - Host parts that have no international characters are not changed.
      so it should not be possible to RACE-encode a domain name in order to hijack it.

      Of course it's still possible to describe Slashdot in Chinese and register that name :)

      See also
      http://www.i-d-n.net/draft/draft-ietf-idn-race-02.txt

    4. Re:RFC by Spyffe · · Score: 1

      Though what you say is true, it would still be interesting to see how they deal with the fact that, say, Japanese character sets provide for full-width alphanumeric characters, which, although they look the same as A,B,C,etc... except for their width, have a different encoding.
      In addition, there's the inherent difficulty in the fact that a Chinese website using a Simplified Chinese set of ideographs could hijack surfers wanting to go to a site with the same name, but with Traditional Chinese ideographs.
      In Japanese, there are hiragana, katakana, and kanji. The first two are phonetic syllabaries, and the third is an ideographic script based on Chinese characters. Generally, input methods convert from phonetic kana to kanji, often selectively, so difficult ideographs may be left as simpler phonetic symbols, though the meaning remains. One word could have lots of representations, and still mean (and read) the same!
      These issues should have been thought out before NSI started this idiocy.
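
      Unicode compatibility normalization (NFKC) does fold full-width letters into ordinary ASCII, but it does not help with the other two cases raised above: hiragana versus katakana, and Simplified versus Traditional characters, are distinct code points that NFKC leaves alone. A small sketch with Python's `unicodedata`; the `fold` helper is illustrative, not any registrar's actual rule:

```python
import unicodedata

def fold(label: str) -> str:
    """Illustrative helper: NFKC-normalize and lowercase a name label."""
    return unicodedata.normalize("NFKC", label).lower()

print(fold("ｓｌａｓｈｄｏｔ"))     # slashdot: full-width letters fold to ASCII
print(fold("あ") == fold("ア"))  # False: hiragana and katakana stay distinct
print(fold("国") == fold("國"))  # False: Simplified and Traditional stay distinct
```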

      --
      Sigmentation fault - core dumped
    5. Re:RFC by Speare · · Score: 3

      The rp is a variable. The first couple of pages note that implementation testers should assume the "RACE Prefix," or rp, is "bq--".

      --
      [ .sig file not found ]
    6. Re:RFC by snorked · · Score: 1
      To all the folks out there who are saying "but UTF-8 doesn't directly translate to ASCII!", I wonder why NSI ("for as little as $49!") will "translate" a name for me.

      So, this says that if I have foo.com, I can translate that into any of the four alternate character sets.

      Someone remind me how this isn't sanctioned and assisted domain hijacking again?

      ref: http://www.networksolutions.com/promotions/offers/multilingual/trans-req.html

      --

      I wasn't there. You can't prove it. So nahhhhh. :-PpPppP

    7. Re:RFC by Megane · · Score: 2
      Though what you say is true, it would still be interesting to see how they deal with the fact that, say, Japanese character sets provide for full-width alphanumeric characters, which, although they look the same as A,B,C,etc... except for their width, have a different encoding.

      True, they say that any name part consisting entirely of US-ASCII characters is not allowed to be encoded this way, but they would have to go out of their way if they wanted to ensure that double-wide SJIS romaji were not confusingly registered. Then again, we can already do "s1ashdot.org" with just plain ASCII.

      In addition, there's the inherent difficulty in the fact that a Chinese website using a Simplified Chinese set of ideographs could hijack surfers wanting to go to a site with the same name, but with Traditional Chinese ideographs.

      IIRC, in Unicode, Chinese and Japanese ideographs all map to the same code if they're basically the same character, with the differences considered font-specific. In the extreme case, one common radical is rendered with one less stroke in Japanese, which could have created hundreds of extra codes.

      Most simplified kanji/hanzi should be unique, but a few, at least in Japanese, use an already existing, more common character. Generally, though, this won't be a problem if Unicode is used.

      --
      #naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
    8. Re:RFC by truthsearch · · Score: 3

      Kind of ironic the algorithm is called RACE, isn't it? Can we filter by RACE? Can we browse domains of only a certain RACE? Can it be enhanced with RACISM, Row-based ASCII Compatible Interface for Stereotyping Mayhem?

  20. URLs can't hold these characters by darkeye · · Score: 1

    AFAIK the RFC describing URLs limits the valid characters in a URL to basically lower and upper case letters, digits and some marks (like slash, underscore, etc.). Not even European accented letters are allowed. If so, having a Chinese domain name is fine, but you can't have a URL pointing to it. Or can you?
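
    For the path part of a URL, the usual answer is to percent-encode the UTF-8 bytes, as in the sketch below using Python's standard `urllib.parse`. Host names are a different matter, since DNS software expects plain ASCII labels; that gap is what ASCII-compatible schemes like RACE are meant to fill.

```python
from urllib.parse import quote, unquote

path = "/アニメ/index.html"
encoded = quote(path)  # "/" is left alone by default
print(encoded)         # /%E3%82%A2%E3%83%8B%E3%83%A1/index.html
assert unquote(encoded) == path
```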

  21. Re:And these site names are entered how? by Zocalo · · Score: 1
    > you don't need a special keyboard

    Yeah, I'd kind of figured that, hence the reference to the fictional "UnicodeMap". I occasionally use character map programs for accents, and even know a few keyboard shortcuts for common ones. I can't imagine doing that for a whole line, let alone in a language I don't know well enough (at all) to have a clue where to start looking for a character that probably can't be displayed anyway because the necessary fonts are not installed. Chinese might as well be Martian in that respect.

    I don't really think it's going to be an issue though; NonLatinAlphabet.com is almost certainly going to register their domain in the DNS-supported languages of all the countries they wish to do business in and point each one to that language's version of the site. Ultimately it should make it easier for users who don't have Latin keyboards to get by on the web, and this is definitely a very good thing.

    English may well be the lingua franca of the web, but why should a Chinese speaker have to enter a URL in English to reach a Chinese web site, hosted in China, that is displayed in Chinese? All web users require some support for Latin characters, and probably always will, but as a failsafe the reverse should apply too, and we can't fall back on IP numbers because the web is supposed to be using HTTP 1.1 (with many sites sharing one IP address), isn't it?

    --
    UNIX? They're not even circumcised! Savages!
  22. Could you steal Domain Names? by decipher_saint · · Score: 1
    If I wrote "yahoo.com" in katakana (the Japanese phonetic script used for foreign words), would Yahoo be able to sue me?

    What about sites that want their corporate name in all these new languages (would Yahoo have to register its name in all the new languages)? Is there a market for this kind of registration?

    Capt. Ron

    --
    crazy dynamite monkey
  23. Re:Oh great, Japanese URLs, just what we need. by DrWiggy · · Score: 2

    Had you ever actually considered what using the Internet must be like for non-English speaking countries? Probably something equally unpleasing to the eye.

    Seeing as the Internet is supposed to be the medium that allows a break-down of barriers between nations and a free flow of information, don't you think that it might be a good idea to include as many languages as possible rather than exclude anybody who doesn't use a language that conforms to your standards?

    I think you need to realise now that English is not the only language in the world; in fact, we're in a vast minority. It's possible that at some point enough people will undertake the task of learning enough foreign languages to free up communication between ourselves, and perhaps ultimately one language will be considered the accepted standard; however, don't expect that to be English.

  24. Spamming floodgate by AntiPasto · · Score: 3
    Man I thought the long IP http://2034890234890294 thing was annoying... now I won't be able to make sense of *anything* in their damn spam. Oh well... another clue to hit delete.
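
    Those "long IP" URLs are just the four bytes of an IPv4 address written as one decimal number. A quick way to decode one with Python's `ipaddress` module; the example number is illustrative, and the number quoted above is larger than 2^32 - 1, so it cannot be a valid IPv4 address:

```python
import ipaddress

def dotless_to_dotted(n: int) -> str:
    """Illustrative helper: convert a 32-bit 'dotless' IPv4 number to dotted quad."""
    return str(ipaddress.IPv4Address(n))  # raises a ValueError subclass if out of range

print(dotless_to_dotted(3232235777))  # 192.168.1.1

try:
    dotless_to_dotted(2034890234890294)
except ValueError as err:
    print("not an IPv4 address:", err)
```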

    ----

  25. Re:IMO about time by Tet · · Score: 2
    It's nice to see that the global part of the Internet is still spreading...

    No, it's not. This is one of the most brain dead decisions ever made, in the name of political correctness, with complete disregard for the practical issues. The effect of this will be to reduce the global appeal of the web, not increase it. Western surfers will now effectively be cut off from many far Eastern domains. Sure, there's a reasonable workaround for entering non-ASCII domains on an ASCII keyboard, but it's too complex for the general public, and far Eastern companies are unlikely to publish the ASCII-fied domain anyway. This is a very black day for the net...

    --
    "The invisible and the non-existent look very much alike." -- Delos B. McKown
  26. Re:Unicode usage in Japan is close to 0% by Dahan · · Score: 1
    Characters are sorted according to the Japanese alphabet ordering (Unicode uses random ordering)

    No, the Unicode hiragana/katakana ranges are ordered in standard Japanese ordering, and the kanji in the CJK range is ordered in Chinese dictionary order (radical first, then stroke count). You do know that kanji means Chinese characters, right? It's not unreasonable to order them the Chinese way.

    In IE or Netscape, look under the encoding menu. You will find 3 choices; Shift-JIS, JIS, and EUC.

    Well, I also find Unicode (UTF-8) in IE, and both Unicode (UTF-7) and Unicode (UTF-8) in Netscape. You need to realize that Unicode is for displaying all languages, not just Japanese.
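
    Both points are easy to check in Python: the hiragana block is laid out in the traditional gojūon order (with small and voiced kana interleaved), so for basic kana, code-point order is dictionary order; and the same text produces different bytes in each legacy Japanese encoding, which is why the browser has to be told which one a page uses. The codec names are Python's:

```python
text = "あいう"

# Code points for basic kana follow the traditional Japanese order.
print(sorted("うあい") == list(text))  # True

# The same three characters in four different encodings.
for codec in ("shift_jis", "euc_jp", "iso2022_jp", "utf-8"):
    print(f"{codec:<10} {text.encode(codec).hex()}")
```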

    Most Japanese experts on this subject view Unicode as an unwanted Western imposition.

    True... also known as "Not Invented Here".

  27. Learning Asian by TheAmazingGoat · · Score: 1

    It's a shame that this happened now, instead of 5 years ago. I bet if I had spent the last 5 years on a net with Asian characters in their domain names, I would've learned more than a few words in the language just from exposure. (The only real way to learn a language, imo.)

  28. Quick (maybe stupid) question... by Kierthos · · Score: 1

    How are you supposed to be able to type all 40,000+ new characters? Are we going back to Escape-Meta-Alt-Shift for an upper case 'Q'?

    Kierthos

    --
    Mr. Hu is not a ninja.
    1. Re:Quick (maybe stupid) question... by King+of+the+World · · Score: 1

      Don't they have glyph composite symbols or something?

    2. Re:Quick (maybe stupid) question... by fatphil · · Score: 1

      Hmmm, the Chinese on the menus at my local Chinese restaurant in Cambridge took up about 4 times the space of the English. The characters had to be twice the height and also wider than the Latin characters due to resolution issues. Maybe they were just being more descriptive, but they seemed to have the same redundancy in them as the English, so I assumed they were exact equivalents.

      I can't agree with your "basic rule of information". I can see nothing about it in my copy of Cover and Thomas. Kolmogorov or Chaitin have stuff to say about this kind of thing.

      FatPhil

      --
      Also FatPhil on SoylentNews, id 863
    3. Re:Quick (maybe stupid) question... by TheLink · · Score: 1

      It's possible that the menus you mention say more descriptive or fancy stuff in Chinese than in English.

      e.g. Steamed fish vs steamed red snapper in soy sauce ;).

      As for the spoken language, Chinese is actually easier for human ears in noisy channels/environments than English, because you can detect the changes in pitch, whereas in English much of the pitch component is "wasted".

      Cheerio,
      Link.

      --
    4. Re:Quick (maybe stupid) question... by truthsearch · · Score: 3

      My Chinese co-worker has informed me that to type Chinese, he sets the desired language in whatever app to Chinese and then types phonetically. The problem is that even phonetically there are many similar words, so he basically types a few English letters to verbally spell out a word, then Chinese characters appear on the screen which he must then choose. He tells me there are also special keyboards where you hold down multiple keys.
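
      The type-phonetically-then-pick workflow can be illustrated with a toy candidate table. Everything here is hypothetical: real input methods use large dictionaries, frequency ranking and whole-phrase conversion rather than a hand-written table of single syllables.

```python
# Hypothetical miniature dictionary: pinyin syllable -> candidate characters.
CANDIDATES = {
    "ma": ["妈", "麻", "马", "骂", "吗"],
    "zhong": ["中", "种", "重", "钟"],
    "wen": ["文", "问", "闻", "稳"],
}

def candidates(pinyin: str) -> list:
    """Hypothetical: characters the user would see after typing a syllable."""
    return CANDIDATES.get(pinyin.lower(), [])

def pick(pinyin: str, choice: int) -> str:
    """Hypothetical: return the user's selection (1-based, as lists are numbered)."""
    options = candidates(pinyin)
    if not 1 <= choice <= len(options):
        raise ValueError(f"no candidate {choice} for {pinyin!r}")
    return options[choice - 1]

print(candidates("ma"))                   # five characters share one spelling
print(pick("zhong", 1) + pick("wen", 1))  # 中文
```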

    5. Re:Quick (maybe stupid) question... by BJH · · Score: 1

      Most CJK-capable computers use a pretty standard QWERTY layout...

    6. Re:Quick (maybe stupid) question... by jafuser · · Score: 1
      I'm not familiar with Chinese, but I am studying Japanese writing, and if Chinese has the same general system, then it may be all phonetics, in which case it will probably take the same or more room than the roman languages to write out.

      --
      EFF Member #11254

      --
      Please consider making an automatic monthly recurring donation to the EFF
    7. Re:Quick (maybe stupid) question... by Kierthos · · Score: 1

      Well, I know that some of the Oriental 'alphabets' have numerous different ways to represent the same concepts, but how would using glyph composite symbols help (if I understand what you mean)?

      Just because there exist two symbols 'blah' and 'thingy' in a language such that 'blahthingy' means something else doesn't mean that this standard will adopt it. It's much more likely to use the 'common' kanji. (Ob note: there are only about 50 different Japanese characters for dragon, according to a quick search on Lycos... or some really poor kanji writers.)

      That being said, it would be impossible to set up all possible configurations where composite symbols would redirect to the 'obvious' site. (i.e. www.golddragon.com, no matter how it's spelled in kanji or whatever would not necessarily all go to the same site.) It would be a neat trick if it could, but it would require registering dozens of permutations.

      Kierthos

      --
      Mr. Hu is not a ninja.
    8. Re:Quick (maybe stupid) question... by fatphil · · Score: 2

      There was such a thing as a Chinese Typewriter. It had 300 keys and required multiple presses (Shift, Ctrl, Meta, Alt, Hyper etc. style)
      to generate characters.

      This is a really crap picture of one:

      http://acc6.its.brooklyn.cuny.edu/~phalsall/images/typewrit.gif

      So many keys each one is barely distinguishable from the next (that's also poor photo quality though)

      It fell into disuse fairly swiftly because it was slower than script.

      Our typewriters were invented so that they could be faster than script.

      They lose.

      FatPhil

      --
      Also FatPhil on SoylentNews, id 863
    9. Re:Quick (maybe stupid) question... by Malc · · Score: 2

      It's easy under Windows. For everything but Win2K (and ME?) you will have to download and install Global IME from MSFT. I don't know how you do this under X, or for Lynx users, in a console. I have to admit, MSFT makes it quite easy for us developers to internationalise our products.

    10. Re:Quick (maybe stupid) question... by Psmylie · · Score: 1

      Maybe we'll just have to use a really huge keyboard.

      --

      psmylie's dictionary: Godzillion (noun) Any number large enough to destroy Tokyo

    11. Re:Quick (maybe stupid) question... by fatphil · · Score: 1

      Yes. Chinese has a tonal phonetic component.
      I think you can at least have
      "middle", "rising", "falling" and "high"
      versions of the same sound, and they can all have different characters associated with them. I'm not sure if a "low" version exists in Chinese.

      Other languages have "rising then falling" versions too!

      Woooo!

      If you can't be understood over a 6bit connection then it's a lousy language!

      6 bits is expressive enough to even include smileys in English. (Though sacrificing upper/lower case annoys some.)

      5 bits has been used historically in America and the UK. You needed escape codes to flip into numeric mode.

      I think that Colossus was 5-bit. It simply had to decypher single-case letters.
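      A quick way to check these bit counts (and the 37 vs. 40,282 figures in the story): a fixed-width code for N symbols needs the smallest b with 2^b >= N. A minimal Python sketch; the helper name bits_needed is made up for illustration:

```python
def bits_needed(symbol_count: int) -> int:
    """Smallest fixed code width b such that 2**b >= symbol_count."""
    if symbol_count < 1:
        raise ValueError("need at least one symbol")
    # The largest code value is symbol_count - 1; its bit length is the width needed.
    return (symbol_count - 1).bit_length()

for label, n in [("single-case letters", 26),
                 ("LDH hostname characters", 37),
                 ("6-bit code space", 64),
                 ("characters quoted for the new scheme", 40282)]:
    print(f"{label}: {n} symbols -> {bits_needed(n)} bits")
```

      By this count, 26 single-case letters fit in 5 bits, the 37 hostname characters need 6, and 40,282 characters need a 16-bit code, the size of a UCS-2/UTF-16 code unit.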

      FatPhil

      --
      Also FatPhil on SoylentNews, id 863
    12. Re:Quick (maybe stupid) question... by darthaya · · Score: 2

      It is easy: you use CXterm, a special program developed to input Chinese under X. And there are a number of other programs you can use to input Chinese in UNIX's console mode as well.

    13. Re:Quick (maybe stupid) question... by Happy+Monkey · · Score: 1

      I've seen one of those, on Reading Rainbow many years back, I think. It looked like one of those synthesizers for an electronic musician (anyone seen Christopher Franke's portrait on the first Babylon 5 CD?). Hopefully there have been advances in Asian keyboard technology since then.
      --
      Do ya feel happy-go-lucky, punk?
  29. Re:Ideogrammatic languages are a pain by chrischow · · Score: 2

    The commies tried with pinyin, but it doesn't work very well because of the many homophones in Chinese. Hanzi are much cooler anyway, and a more compact way of writing and representing data.

  30. Oh great, Japanese URLs, just what we need. by AFCArchvile · · Score: 2
    So will we have to extend ASCII to 65,536 from 256? Will legacy Japanese URLs look like "http://%0077%0077%0077.%0073%006F%006E%0075.%0063%006F.%006A%0070/"?

    And what will the new ones look like to us Americans? Ugh, I can't bear to think of it.
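    For what it's worth, URL paths are escaped byte by byte rather than as 16-bit %00XX units: the usual convention is to encode the text as UTF-8 and write each non-ASCII byte as %XX. Hostnames are a separate case; under this proposal they are turned into an ASCII form rather than percent-escaped. A minimal Python sketch using the standard library's urllib.parse (the wrapper name is made up for illustration):

```python
from urllib.parse import quote, unquote

def escape_path_segment(text: str) -> str:
    # quote() encodes the text as UTF-8, then writes every byte outside the
    # unreserved set (letters, digits, "_.-~") as %XX; safe="" escapes "/" too.
    return quote(text, safe="", encoding="utf-8")

segment = escape_path_segment("日本")
print(segment)            # %E6%97%A5%E6%9C%AC
print(unquote(segment))   # 日本
```

    Plain ASCII segments such as "sonu" pass through unchanged, so only the non-ASCII parts of a path get longer.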

    --
    "Ancillary does not mean you get to rule the world." --U.S. Circuit Judge Harry Edwards, speaking to the FCC's lawyer
  31. I can't wait... by Kronos. · · Score: 1

    ... to see clueless news readers reading out a URL with all these characters in it ;)

  32. Re:Big5 or Unicode by Yardley · · Score: 3

    This is probably an attempt to force migration over to Unicode. Anyways, why is Verisign behind this? Didn't we learn from Network Solutions that a privately-owned, commercial company is not the solution to internet domain name databases (and their "ownership")?

    How can one company be granted the monopoly rights to something so important to the world's economy and everyone on the Internet again? Shouldn't this be assigned to a not-for-profit entity under the auspices of ICANN?

    --
    He lives in a world where those who do not run the client software of the omnipresent meme are unacceptable.
  33. And we implement this how? by Aged+Cynic · · Score: 1

    This is fundamentally a good idea for the future. It's also a prime example of the marketeers making decisions that the technology is not yet ready to support. My understanding is they're basically telling people that "we'll take your money and register your name, but if you can't use it (and you can't, for some time yet), you don't get your money back." Foo.

  34. Re:Big5 or Unicode by Speare · · Score: 2

    Since the majority of chinese users input their chinese as big5, (eg www.ê.com) will not be the same as the unicode equivalent

    I think it's probably not too difficult for the Chinese browsers to do the conversion behind the scenes. Kinda like ASCII/EBCDIC conversions; you don't need to change the keyboard to enter text of the other variety.

    Now, which one does the registrar accept, and the DNS servers cache? Read the article: from the first couple of pages, it appears that the domain name is actually in neither Unicode nor Big5; it's translated to an ugly ASCII encoding.

    --
    [ .sig file not found ]
  35. Re:Unicode would be better. by Guy+Harris · · Score: 2
    Unless you want to register domain names in Klingon.

    Michael Everson of Everson Gunn Teoranta has proposed an encoding of Klingon in Plane 1 of ISO/IEC 10646-2; if it gets adopted, future versions of Unicode may adopt it (Everson's one of the editors and authors of Unicode 3.0).

  36. Re:Take your medicine.. by fatphil · · Score: 1

    Why do fuckwits hide behind AC?

    Wide-gauge trains physically cannot come to _CENTRAL_ Europe, where the 6" narrower gauge is used.
    However, I can hop on a wide-gauge train here in Helsinki which goes all the way to Moscow.
    You see, not all of Europe is CENTRAL Europe.
    I'm sure you'd agree that not all of America is Central America. Screw it, I don't need your agreement, your opinion is less than worthless.

    Now save me the fucking effort and go kill yourself.

    FatPhil

    --
    Also FatPhil on SoylentNews, id 863
  37. I'd argue the other way by Galvatron · · Score: 2
    Most (I would say all, but I'm not entirely certain of that) have Roman alphabet representations, usually without using accents, umlauts, or what have you. So, they can represent their languages in URLs, just in a less commonly used form. German, Spanish, French, etc. often have words that, stripped of special characters, are written identically. On top of this, it's relatively easy to write special Roman-alphabet characters on a QWERTY keyboard (I managed to figure it out through trial and error), but quite difficult to type Asian characters, so Asian-character URLs will serve to make the Internet more regional.

    I have occasion to buy an international airline ticket this year, and I refuse to use priceline because they have Will Shitner doing their ads. Give me Nemoy, Stewart, Dorn, Spiner, McFadden, anyone but shitner. Blow me priceline.

    Man, you have got some real problems, don't you? Did Shatner beat you as a child or something? I mean, I'm not crazy about Troi, but it's not like I carry some kind of grudge. And you manually typed in a .sig as an anonymous coward? That's just weird.

    --
    "The question of whether a computer can think is no more interesting than that of whether a submarine can swim" -EWD
  38. Re:IMO about time by Grumpy+Penguin · · Score: 1

    So what you're saying is that it's OK for non-English-speaking people to try and use our ASCII system but totally wrong and inappropriate for them to have their own native-language system and for us to try and learn how to use that? It would seem you embrace the global village idea... provided it is English-speaking and conforms to your native character set.


    --
    Democracy is the art of saying "nice doggy" while subtly reaching for a large stone.
  39. Times are changing. . . by cra · · Score: 1

    So, the era when humans could remember an easy, pronounceable name instead of an IP address is over, then?

    I guess I better start learning the numbers. . .


    --
    This message has been ROT-13 encrypted twice for higher security.
  40. Re:Why asian character sets? by AlanStokes · · Score: 2

    The proposal includes umlauts - it's based on a mapping to US-ASCII from any Unicode string. (Admittedly if you only wanted to represent a handful of European languages you'd come up with a different scheme, but it would obviously be less general.)

    Presumably they're pitching it at the asian market cos that's where they expect to make money.

    There are apparently good reasons for not allowing non-US-ASCII 8-bit characters in domain names - it would break too much.

    --
    - Alan
  41. Re:Asia Carrera by Zugok · · Score: 1
    Asia Carrera, and she runs Linux.

    I think she runs Solaris now. *sigh* a pornstar after my own heart

    --
    "I just can't sit while people are saying nonsense in a meeting without saying it's nonsense" J Watson, Sci Am 288:(4)51
  42. Re:argggg! by BlowCat · · Score: 1
    As far as I know the Eszett is still used, but not in the words where it can be replaced with "ss" without affecting the pronunciation.

    I.e. Gießer remains Gießer, but daß becomes dass.

  43. Re:Why asian character sets? by mutende · · Score: 1
    Wouldnt it make more sense to implement umlauts like ö/ü/ä first?
    It's been there for a while, please visit www.whats.nu for details.

    // Klaus
    --
    Unselfish actions pay back better
  44. Asia Carrera by Hairy_Potter · · Score: 1

    Hot Asian teens? I didn't know there were any. Well, maybe if you're a latent homosexual who likes flat chested, smush faced girls.

    Asia Carrera,

    and she runs Linux.

  45. Re:Dangit, now how will I get hot Asian teens? by frisket · · Score: 1

    I think there's a company in Redmond WA that claims to make one...

  46. Re:It breaks the dns-rfc. by lizrd · · Score: 2
    If I remember correctly, it does NOT allow special characters in domain names.

    Damn you're quick. Of course the whole point of this is to provide a work-around to that problem. All it does is make an ASCII representation of a different character set. These representations are flagged by having the hostname start with bq-. So if you run across a hostname that looks like bq-safjdlfaqwue72819.bq-hewaguifuifdajhks.co.jp you'll know that the hostname probably makes good sense to anyone who has a Japanese web browser. If you are in the habit of reading such pages you'll get the appropriate plugin. If you don't have the plugin, you probably couldn't read the content anyway and believe you me, there is a LOT of content on the web that's written in a language you can't read. (I'm not saying that you're stupid or anything, I'm just making the bet that there isn't anyone here who knows every language in which material has been posted to the internet, this includes Klingon)
    --
    I don't want free as in beer. I just want free beer.
  47. Re:We need a net Pat Buchanan by Lord+Omlette · · Score: 1

    I know Al Gore invented the internet in terms of convincing Congress to heavily fund the net... and some other congressman opened up the net to commercial use... but wasn't the web invented in Europe (CERN)? and aren't domain names a big part of the web? does that mean that if we try to keep asian hordes away from the net, the europeans will try to keep the crass american lummocks from using the web? ^^;; -confused
    --
    Peace,
    Lord Omlette
    ICQ# 77863057

    --
    [o]_O
  48. Re:We need a net Pat Buchanan by Ashran · · Score: 1

    The web was not invented by Americans, even if you like to believe that :p
    It was invented at CERN in Switzerland .. you know, that's in Europe

    --

    Before you email me, remember: "There is no god!"
  49. Re:Don't let them ruin the Internet by Sensor · · Score: 1


    ummm... DNS is only used in name resolution, packets are routed according to the IP address once resolved which is totally unrelated to the domain name - that happens right now - nothing has changed.

    If anything, extending the number of TLDs will reduce latency, as it will spread the load across more servers, probably on a geographical basis!

    Feel free to troll, it's your God-given right, but do try to remember that acting both jingoistic and technically ignorant in the same mail is very unlikely to get you any respect.

  50. Why Verisign is behind this. by TheLink · · Score: 1

    So that they can centralise more power to themselves.

    Verisign owns Network Solutions and Thawte.

    So they own your certs (need to be renewed) and your names (refer to Network Solutions' terms and conditions).

    And there's this push for DNSSEC, which isn't that great anyway. But it'll be a convenient tool to centralise even more power.

    Open your eyes a bit and you'll see more scary stuff.

    Soon there'll be a bigger push for certificates becoming mainstream - via smartcards and other stuff. And Windows 2000 has some nice support for that... Maybe Microsoft will buy Verisign.

    What do you think?

    Have fun,
    Link.

    --
  51. ordering RACE-encoded names with joker.com? by Zach+Fine · · Score: 1

    There are a couple of Japanese domain names I've thought of purchasing, but would rather use the CORE registrar joker.com than register.com due to the difference in price (joker.com is around $8-11 per year, depending on the exchange rate of the Euro). I was sad to see that I'd have to use register.com and spend $20 for a Japanese domain name.

    But now what's to stop me from looking through the RFC, figuring out how to encode my domain name using RACE, and then registering it using joker.com as a domain name that begins with "bq-"?

    1. Re:ordering RACE-encoded names with joker.com? by Morden · · Score: 1
      What'll stop you doing that is that the prefix will change, and your domain will be left out in the cold.

      This has already been tried - stories were doing the rounds last week of registrars doing this. When bq- changes, they'll have some very annoyed customers.

  52. Re:Why asian character sets? by Ralph+Wiggam · · Score: 1

    I'm gonna get Släshdöt.org. It has a more "heavy metal" feel to it, like "Mötley Crüe".

    -B

  53. Re:Appearance of names by Morden · · Score: 1
    On a Windows system at least with a Chinese Input Method Editor (who the hell thought of that TLA?), I *think* it will in fact display in native characters due to the IME taking control of the text input fields and clobbering the rest of the OS with a blunt stick.

    Have to try that one when I get to work tomorrow.

    (NJStar's not bad as IMEs go - at least it's not a Microsoft product)

  54. Re:IMO about time by Tet · · Score: 2
    So what you're saying is that it's OK for non-English-speaking people to try and use our ASCII system but totally wrong and inappropriate for them to have their own native-language system and for us to try and learn how to use that?

    Yes, that's *exactly* what I'm saying. I'm not saying it because I happen to use ASCII, but because ASCII is a more natural system for computers to deal with. If Western European and American languages consisted of 30,000+ characters, and those in the East consisted of some 100 or so, I'd suggest using the Eastern system at the drop of a hat, even if it weren't my native system. This has nothing to do with whether or not it's my native character set that's chosen, and everything to do with whether a good decision is made from a technical perspective.

    --
    "The invisible and the non-existent look very much alike." -- Delos B. McKown
  55. What about Cyrillic and ISO-Latin? by vlax · · Score: 2

    I want to be able to register domain names in French, German and Russian too. If they are going to support all three zillion kanji and Chinese characters, they need to at least support the various Cyrillic and Eastern European Roman alphabets, and the rest of ISO-Latin-1 (which covers all the major and most of the minor Western European languages). The Arabic-script alphabets (Arabic, Farsi, Urdu, etc.) and Hebrew are written right-to-left, so I suppose those (and Thai) won't be implemented right away, but they need to be on the drawing board.

    If all those other languages are accounted for, I view this as a good thing. If this is part of an overall shift to Unicode on the web, then all these languages are automatically supported, and I would think it an even better thing.

  56. Re:DeCSS in a domain name? by titus-g · · Score: 1

    or in a url (using directories)

    the domain name wouldn't work though, they are talking alphabetic symbols rather than length of domain name, i.e. you have the existing English alphabet of 26 letters + 10 numbers & - for _each_ char of the allowed 67, now you can also use ascii encodings of asian characters as well.

    --

    ~ppppppppö

  57. Re:What a lot of whining! by Alan+Shutko · · Score: 2

    Actually, most of my spam is from Asian top-levels (mostly cn) and in some CJK encoding. (Not being able to read it, I don't know if it's _really_ US spam in a foreign language, but....)

    Furthermore, much of that spam comes through the same set of systems which never seem to do anything about it.

  58. Re:Why asian character sets? by jesser · · Score: 1
    ö/ü/ä

    argh, i want it to be easier to tell urls apart from each other, not harder.

    --
    The shareholder is always right.
  59. Re:Actually, The Current Max Characters is 67... by Kierthos · · Score: 1

    I think the point of the original quote of 37 characters max is the 'old' number of characters in the symbol set that were allowed, not the length of the actual URL. And your article from 2600 lists a maximum URL length of 63, not 67.

    BTW, are hyphens and tildes interchangeable? Because I've seen a lot of web pages with tildes, and only some of them turn into hyphens when reloading.

    Kierthos

    --
    Mr. Hu is not a ninja.
  60. Gratuitous offensive South Park reference. by darylp · · Score: 1

    So how many ways can we now register "sucky-sucky.com"?

  61. What a lot of whining! by Speare · · Score: 2

    Within a few minutes of this story being posted, most of the posts are along the following lines.

    • Why not get European hacks like umlauts working first?
      I dunno; maybe because the Japanese don't know enough German? Why should the Asians wait for Europe to get its act together before they solve the issues they face every day?
    • Great, now I have to see even more ugly spam!
      Well, if your only connection to the Asian population is spam email, this should make your isolationism even more simple: the standard uses a standard prefix for RACE-encoded domain names; block those and you're in arrogant English/USian bliss.
    • How can I enter these funky characters?
      I dunno, just a guess, but maybe someone's already thought of this? Perhaps the people who work in kanji all day know something about entering kanji, and have hardware or software solutions around. If you don't normally have to type it, I'm sure your browser will let you CLICK on encoded links just fine.
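    The prefix-blocking idea in the second point can be sketched in a few lines of Python. This assumes the debugging prefix "bq-" mentioned in the proposal; since that prefix was expected to change, it is a parameter here, and the function name is hypothetical:

```python
def uses_ace_prefix(hostname: str, prefix: str = "bq-") -> bool:
    """True if any dot-separated label of hostname starts with the ACE prefix."""
    # Hostnames are case-insensitive, and a trailing dot marks the DNS root.
    labels = hostname.rstrip(".").lower().split(".")
    return any(label.startswith(prefix) for label in labels)

for host in ["www.bq-ag0970ag00ah07h.or.jp", "www.example.com", "BQ-ABC.example.net."]:
    print(host, uses_ace_prefix(host))
```

    An ordinary ASCII name that happens to begin with "bq-" would be flagged too, so this is a blunt instrument.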

    Missed anything?

    --
    [ .sig file not found ]
  62. Re:Thats a lot of characters... by Malc · · Score: 2

    If it's implemented properly, surely it shouldn't matter. It's not just the size of the Unicode chars, but also the big- and little-endianness. If it's implemented properly, the DNS would just determine what you're using (UCS-2BE, UCS-2LE, UCS-4BE, UCS-4LE) and convert it to its internal representation for the lookup.
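    One common way to tell these byte orders apart is a byte order mark (BOM) at the start of the data. The Python sketch below shows the idea; treating BOM-less input as UTF-16BE is an assumption made for illustration, and UCS-2 is decoded with the UTF-16 codecs, which agree with it for characters in the Basic Multilingual Plane. (As other posters point out, in this proposal names are converted to ASCII before they reach DNS, so servers would not see these encodings at all.)

```python
import codecs

# UTF-32 BOMs are checked first: the UTF-32LE BOM (FF FE 00 00)
# begins with the UTF-16LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

def to_internal(data: bytes, default: str = "utf-16-be") -> str:
    """Decode UCS-2/UCS-4 style input into a Python str, honouring a leading BOM."""
    for bom, codec in BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(codec)
    # No BOM: fall back to an assumed default byte order.
    return data.decode(default)

name = "日本"
for bom, codec in BOMS:
    # Every byte order should come back as the same internal string.
    assert to_internal(bom + name.encode(codec)) == name
```

    One known ambiguity: UTF-16LE text whose first character after the BOM is U+0000 looks exactly like a UTF-32LE BOM, so real code needs a policy for that case.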

  63. Re:Actually, The Current Max Characters is 67... by SUWAIN · · Score: 1
    My understanding of the 40,000 character thing was that there were forty thousand possible characters, not the length.

    By the way, I think it's actually 64 characters, assuming a three-character TLD. I think that they allow you to register 67 characters, but the TLD (.com, .net, .org, etc.) counts. But the dot in the TLD doesn't. Or something confusing like that...

    ...............
    SUWAIN: Slashdot User Without An Interesting Name


  64. Re:Appearance of names by ce25254 · · Score: 1

    Yeah, I assume that is the case. That's how it works with KanjiKit as well. But for those without an appropriate IME, it will look like junk. Oh well.

  65. Spam spam spam spam... by sdo1 · · Score: 2

    I guess I can look at this two ways...

    1) Oh God, there's gonna be a MASSIVE amount of spam coming from domains with characters outside of the standard 37.

    2) I can block anything and everything coming from domains with characters outside of the standard 37.

    -S

    --
    --- What parts of "shall make no law", "shall not be infringed", and "shall not be violated" don't you understand?
  66. Re:Ideogrammatic languages are a pain by titus-g · · Score: 1

    hey quit knocking pinyin, it lets us gwai lo learn easier if nothing else :P

    --

    ~ppppppppö

  67. possible to "cause damage to the Internet" by mughi · · Score: 1


    A few things worry me a bit. First there's the part of the RACE working draft where they mention that if you don't follow all the MUST and MUST NOT statements "exactly", it's "likely to cause damage to the Internet".

    Then there's the issue of the chairman of the IETF basically calling this premature...

    "Getting this work done right is more important," he said, "than getting it done quickly."
  68. Unicode usage in Japan is close to 0% by puz · · Score: 1

    You bring up a very good point. Likewise in Japan, nobody uses Unicode. The preferred encoding scheme is Shift-JIS (JIS = Japanese Industrial Standards), which has been in use since 1969. The usage is probably over 90%. The reason for its popularity is that Shift-JIS was designed in Japan and extremely well planned. Characters are sorted according to the Japanese alphabet ordering (Unicode uses random ordering), and ideograms are subdivided into compulsory, common, and extended (Unicode uses random ordering). JIS and EUC comprise the remaining 10% of usage. My site uses Shift-JIS. Yahoo Japan uses EUC. Try searching for a web site in Japan that uses Unicode. I think you will find none. Even if you do, your browser will not be capable of displaying it! I'm not kidding. In IE or Netscape, look under the encoding menu. You will find 3 choices: Shift-JIS, JIS, and EUC. Let's face it. Unicode is a badly designed standard conjured by an uninformed committee. Most Japanese experts on this subject view Unicode as an unwanted Western imposition.

    --
    Download Mazes and Puzzles from www.puz.com
    1. Re:Unicode usage in Japan is close to 0% by BJH · · Score: 1

      You're mostly right, but JIS is a standard character set. The encodings used for that set include ISO-2022-JP (which, confusingly, is also called JIS), Shift-JIS and EUC. The ISO-2022-JP encoding, which uses escape sequences, is used mainly for email; EUC is mainly found on UNIX/Linux systems, and Shift-JIS was an implementation developed and used by Microsoft (and also used by Apple) that made some unfortunate alterations to the code points for half-size katakana characters, among other things, that basically stuffed up code conversions between the various encoding schemes. The best scheme for general use is EUC, in my opinion.
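      To illustrate the point that these are three encodings of one character set, here is the same two-character word run through Python's standard codecs (the codec names are those of Python's codecs module). ISO-2022-JP switches character sets with escape sequences and stays 7-bit, while Shift-JIS and EUC-JP use 8-bit bytes:

```python
text = "日本"  # "Japan"

for codec in ("iso2022_jp", "shift_jis", "euc_jp"):
    encoded = text.encode(codec)
    # Every encoding must decode back to the same characters.
    assert encoded.decode(codec) == text
    print(f"{codec:>10}: {encoded.hex()}")
```

      Both characters sit at the same JIS code points in all three cases; only the byte-level representation differs, which is why conversion between the schemes is mechanical as long as everyone agrees on the code points.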

  69. Re:Big5 or Unicode by spitzak · · Score: 2
    Gad. We should just say that "bytes with the high bit set must be sent unchanged" through everything and scrap everything that does not obey this.

    This would allow all transports to ignore the character encoding as long as the encoding only uses bytes with the high bit set for non-ASCII. It also means that case-independence of non-ASCII would be illegal, thus stopping the emergence of a dangerous (for security) mess of incompatible implementations of equality tests for URLs.

    This would allow us to use UTF-8 for the URL, for the page contents, for email, for everything, and we would not have this horrid mess of prefixes and mime types.

    Yes, some programs, routers, etc, would not pass this stuff through. Well, tough, those should be obsolete!
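    The UTF-8 property this relies on is easy to check: every byte of a multi-byte UTF-8 sequence has the high bit set, so software that only understands ASCII never sees a stray ASCII byte inside a non-ASCII character. Some older multi-byte encodings lack this property. A brute-force Python check (the function name is made up for this sketch):

```python
def utf8_keeps_ascii_bytes_for_ascii(code_points) -> bool:
    """Check that in UTF-8, bytes below 0x80 only ever appear as ASCII characters."""
    for cp in code_points:
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogate code points cannot be encoded in UTF-8
        encoded = chr(cp).encode("utf-8")
        if cp < 0x80:
            if encoded != bytes([cp]):
                return False
        elif any(byte < 0x80 for byte in encoded):
            return False
    return True

print(utf8_keeps_ascii_bytes_for_ascii(range(0x110000)))  # True
# Shift-JIS lacks this property: the second byte of 本 is 0x7B, the ASCII code for "{".
print("本".encode("shift_jis"))
```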

  70. Other issues: ASCII fallbacks by vlax · · Score: 2

    According to the article, they're working on a substitution scheme so ASCII-only users can still type in the URLs. Does this mean that ASCII equivalents will be arbitrary and unintuitive? If so, that's a problem. Let me propose something slightly different:

    Unicode is not supposed to over-unify characters, so the ASCII fallback for Japanese could be the romaji transcription, and therefore registering a Japanese domain name would automatically register the romaji equivalent, except that some kanji have more than one possible romaji transcription.

    However, some kanji are unified with Chinese characters, which have a different pinyin transcription.

    Chinese is another problem. The logical ASCII equivalent is pinyin stripped of its diacritical marks. But then, many different characters may have the same transcription.

    All Cyrillic languages have an ASCII transcription scheme too, but it isn't unified. One character may be transcribed one way in Russian and another way in Bulgarian. Is there a unified transcription scheme for all Cyrillic languages, and is it truly one-to-one? I don't think so. Look at the character usually transcribed as "j" in Russian, and the one usually transcribed that way in Serbian.

    ISO-Latin-1 and -2 fallbacks: For ISO-Latin-1, the fallbacks are pretty obvious: "Champs-Élysées" ==> "Champs-Elysees" or in German "Düsseldorf" ==> "Duesseldorf", but in Czech it's a little less obvious. Does "C hacek" map to "Cz" or "Ch" or "Cs"?

    So, here is a possible solution: devise unified ASCII transcription schemes for each language, admitting whatever ambiguities exist in Japanese or similar languages. Then, when you register a non-ASCII name, you are asked on the form to fill out the transcribed ASCII name that corresponds to it, and it is also automatically registered to you.

    There is some potential for conflict here, if the ASCII transcription corresponds to an existing registered domain or, as in the case of Chinese more than one foreign name corresponds to the same transcription, but I think the problem is manageable.
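    A rough Python sketch of the kind of per-language fallback described above, assuming a hand-written German table plus a generic rule that drops accents after Unicode decomposition. The table and function name are illustrative only; Czech or the CJK scripts would need their own tables, and this code simply refuses characters it cannot reduce to ASCII:

```python
import unicodedata
from typing import Optional

# Hypothetical per-language table: German umlauts and sharp s get conventional digraphs.
GERMAN = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue",
                        "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"})

def ascii_fallback(name: str, table: Optional[dict] = None) -> str:
    """Reduce name to ASCII: apply a language table, then strip combining accents."""
    if table is not None:
        name = name.translate(table)
    # NFD splits e.g. "É" into "E" + U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    if not stripped.isascii():
        raise ValueError(f"no ASCII fallback for {name!r}")
    return stripped

print(ascii_fallback("Düsseldorf", GERMAN))  # Duesseldorf
print(ascii_fallback("Champs-Élysées"))      # Champs-Elysees
print(ascii_fallback("Čapek"))               # Capek (the háček is simply lost)
```

    The Czech case shows the ambiguity raised above: the generic rule maps "Č" to a bare "C", which is exactly the kind of choice a per-language table would have to settle.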

  71. The Real Web by jjr · · Score: 1

    To all the people who are complaining about how this will break things: I say good, this will help the industry realize that the USA is not the only country on the internet. I am glad to see progress.

  72. Re:English Based Systems sending E-Mail? by Speare · · Score: 4

    So how's this gonna work for systems not set up to handle the asian character set?

    Read the links.

    The proposal implements an ASCII encoding scheme, called RACE. A certain prefix (they list the debugging prefix as "bq-") indicates a RACE-encoded domain name.

    The rest of the ASCII encoding either appears in ASCII for dumb browsers, or is converted to Unicode or Big5 or whatever character set it wants.

    For "dumb browsers" (not a flame, just an indication of character-set-awareness), you'd see some crazy domain like http://www.bq-ag0970ag00ah07h.or.jp/; for "smart browsers," it would appear in your own kanji font.
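    To give a feel for how such an ASCII-compatible encoding works, here is a deliberately simplified Python sketch: it writes the label as UTF-16 big-endian bytes, base32-encodes them with a lowercase alphabet, and adds the "bq-" debugging prefix. This is not RACE itself (the actual draft also compresses the UTF-16 data before base32-encoding it, so its output differs), and the function names are hypothetical:

```python
import base64

PREFIX = "bq-"  # debugging prefix from the proposal; expected to change later

def toy_ace_encode(label: str) -> str:
    """Simplified illustration only: UTF-16BE bytes -> base32 -> lowercase, no padding."""
    if label.isascii():
        return label  # plain ASCII labels are left alone
    raw = label.encode("utf-16-be")
    b32 = base64.b32encode(raw).decode("ascii").rstrip("=").lower()
    return PREFIX + b32

def toy_ace_decode(label: str) -> str:
    """Inverse of toy_ace_encode; labels without the prefix are returned unchanged."""
    if not label.lower().startswith(PREFIX):
        return label
    b32 = label[len(PREFIX):].upper()
    b32 += "=" * (-len(b32) % 8)  # restore the padding that was stripped
    return base64.b32decode(b32).decode("utf-16-be")

encoded = toy_ace_encode("日本")
print(encoded)                  # an ASCII-only label starting with "bq-"
print(toy_ace_decode(encoded))  # 日本
```

    Each UTF-16 code unit costs 16 bits and each base32 character carries 5, so n characters take ceil(16n/5) ASCII characters plus the prefix. Under the 63-octet DNS label limit this toy version fits at most 18 CJK characters, which shows why the real scheme bothers to compress.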

    --
    [ .sig file not found ]
  73. It breaks the dns-rfc. by arcade · · Score: 2

    Has there been an update to the DNS RFC allowing this? If I remember correctly, it does NOT allow special characters in domain names.

    Furthermore, does this limit those domains to 32 characters of length? (Unicode, 2 bytes per character; the DNS allows a maximum of 64 characters for domain names .. but that should probably be interpreted as bytes.)

    Also, doesn't it kinda suck to make large parts of the net unavailable for most?
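    For reference, RFC 1035 limits each label to 63 octets and a whole name to 255 octets in wire format, and hostnames are conventionally restricted to letters, digits and hyphen (the 37 characters in the story, ignoring case). Since the encoded names are plain ASCII, they still have to pass checks like the ones in this Python sketch (the function name is hypothetical):

```python
import re

# Letters, digits and hyphen; a label may not start or end with a hyphen.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")

def hostname_problems(name: str) -> list:
    """Return a list of problems; an empty list means the name passes these checks."""
    if not name.isascii():
        return ["contains non-ASCII characters"]
    labels = name.rstrip(".").split(".")
    problems = []
    for label in labels:
        if not 1 <= len(label) <= 63:
            problems.append(f"label {label!r} is {len(label)} octets (allowed: 1-63)")
        elif not LDH_LABEL.fullmatch(label):
            problems.append(f"label {label!r} uses characters outside letters, digits, hyphen")
    # Wire format: one length octet per label plus its bytes, then a zero octet for the root.
    wire_length = sum(len(label) + 1 for label in labels) + 1
    if wire_length > 255:
        problems.append(f"name is {wire_length} octets on the wire (max 255)")
    return problems

print(hostname_problems("www.example.com"))  # []
print(hostname_problems("日本.jp"))
```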

    --paddy
    --
    "Rune Kristian Viken" - http://www.nwo.no - arca
    1. Re:It breaks the dns-rfc. by Speare · · Score: 2

      Such an authoritarian title. Are you sure? It proposes ASCII encoding, not a Unicode or other mbcs usage directly.

      Also, doesn't it kinda suck to make large parts of the net unavailable for most? Don't you think the Chinese and Japanese people could say the same thing about English?

      --
      [ .sig file not found ]
  74. Multilingual Domain Names by Smuj · · Score: 3

    A few notes...

    The Internet Society probably isn't too happy about this. They released a statement on November 8th encouraging NSI to back off and let the IETF IDN WG do its job.

    Also, there are companies that are already currently operating in this market, including WALID, which is taking registrations for Arabic domain names (AND RESOLVING THEM), and will soon be adding Hindi, Tamil, and two Chinese scripts before moving into other markets.

  75. Ideogrammatic languages are a pain by swb · · Score: 2

    Because the Mediterraneans figured out that if they came up with simple symbols that represented sounds (an alphabet) and could be strung together to transcribe spoken words, instead of separate ideograms for each spoken word, you could not only learn to read and write much more easily, you could also write down other languages with the same written symbols.

    One of the major reasons this happened was that they were trading with different peoples who used ideograms instead of alphabets. Since learning one ideogrammatic written language is hard enough and learning 5 is a single lifetime's achievement, a simpler way was found.

    The Chinese were homogeneous and didn't need to deal with anyone other than the Chinese, and hence kept their ideogrammatic written language.

    It's a simple fact that it's far easier to implement the Roman alphabet on a computer than a zillion independent symbols -- you need less RAM, simpler displays and so on.

    What the Chinese need to do is settle on a single way to transliterate spoken Chinese into the Roman alphabet (or even the Cyrillic, Hebraic or Greek if that's what they want). Ideograms are neat, but they're a pain in the ass.

    Sorry, it's not cultural imperialism, just pragmatism!

  76. Re:how do we see it? by Anonymous Coward · · Score: 1
    will english users who don't like to put in korean and/or japanese language inputs on their box3n be completely cut off from a good deal of the net?

    You already are, unless you actually speak those languages. You didn't think the internet was all English-speaking, did you?

  77. Re:Actually, The Current Max Characters is 67... by ras_b · · Score: 1

    ahhh.. the light just turned on, thank you. the verizon story is still an interesting read.

  78. Re:Why asian character sets? by FigWig · · Score: 3

    Wouldnt it make more sense to implement umlauts like ö/ü/ä first?

    I have dibs on släshdot.org!!

    --
    Scuttlemonkey is a troll
  79. Kanji by dizee · · Score: 2

    w3m, the console web browser that can format tables, frames, etc., was written by Akinori Ito. He includes support for kanji. I know because there is a #ifdef PC_KANJI that is misplaced every time I go to download and compile it without Japanese character support.

    I believe there is also an xterm counterpart for kanji.

    Mike

    "I would kill everyone in this room for a drop of sweet beer."

    1. Re:Kanji by BJH · · Score: 1

      The equivalent of xterm for Japanese systems is kterm, although some distros use a patched version of rxvt instead.

  80. canonicalisation issues by jbert · · Score: 3

    Hmm. This could lead to fun. Some character sets/character encodings allow different byte sequences to map to the same character.
    (See the Unicode bugs recently in IIS, where a unicode representation of '../' is used to navigate upwards in the directories of the server to view files outside of the server root.)
    Now, does a company have to register all possible permutations of byte sequences which all map to the same character sequence? As well as doing so in .com, .net and .org.
    We'll see.
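    Two standard defences against this, sketched in Python: decode strictly, so overlong UTF-8 forms (the kind of trick behind the IIS hole) are rejected instead of being mapped to "." or "/", and normalize to one Unicode normalization form so that, for example, a precomposed "é" and "e" plus a combining accent compare equal. Which normalization form a registry would use is a policy decision; NFC here is an assumption, and canonical_label is a made-up name:

```python
import unicodedata

def canonical_label(raw: bytes) -> str:
    """Strictly decode UTF-8, then normalize to NFC so equivalent spellings compare equal."""
    text = raw.decode("utf-8")  # strict by default: raises UnicodeDecodeError on overlong forms
    return unicodedata.normalize("NFC", text)

precomposed = "caf\u00e9".encode("utf-8")  # é as one code point
decomposed = "cafe\u0301".encode("utf-8")  # e + COMBINING ACUTE ACCENT
print(precomposed == decomposed)                                    # False
print(canonical_label(precomposed) == canonical_label(decomposed))  # True

overlong_dot = b"\xc0\xae"  # overlong two-byte encoding of "." (U+002E)
try:
    canonical_label(overlong_dot)
except UnicodeDecodeError as err:
    print("rejected:", err.reason)
```

    With both steps applied before registration and lookup, a company would need to register only the canonical form rather than every byte-level permutation.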

  81. Re:Thats a lot of characters... by Sensor · · Score: 1

    If it's implemented properly, surely it shouldn't matter

    The "if" is exactly what I meant... after support for Unicode is added to domain name encoding schemes in applications, each and every application has an opportunity to make a mistake... some of them will.

  82. English Based Systems sending E-Mail? by tomjgroves · · Score: 4

    So how's this gonna work for systems not set up to handle the asian character set? Lets say I want to send to joe.bloggs@somechinesename.net from my FBSD or Linux boxes? Not too much fun, I think...

  83. Great, if not already blocked by strredwolf · · Score: 2
    This would be great for China, if half (if not all) of its mail servers didn't relay spam back to the US (and therefore get blocked independently by ISPs and by the MAPS RSS). There's been no response from those admins who don't have the latest software (come on! Sendmail 8.10 is free! Why are you running the broken SMI Sendmail?!?).



    --
    WolfSkunks for a better Linux Kernel
    $Stalag99{"URL"}="http://stalag99.keenspace.com";

    --
    # Canmephians for a better Linux Kernel
    $Stalag99{"URL"}="http://stalag99.net";
  84. Why asian character sets? by Ashran · · Score: 3

    Wouldnt it make more sense to implement umlauts like ö/ü/ä first?
    Easier to test etc..

    --

    Before you email me, remember: "There is no god!"
  85. Dangit, now how will I get hot Asian teens? by Hairy_Potter · · Score: 1

    If I can't type in the new domain names.

    Maybe I'd better upgrade to a Unicode compatible keyboard and OS.

  86. Big5 or Unicode by Giant+Robot · · Score: 4

    How is this going to work? Since the majority of Chinese users input their Chinese as Big5, a Big5 name
    (e.g. www.ê.com) will not be the same as the Unicode equivalent.
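    The underlying problem is that the same character has a different byte sequence in each encoding. A short Python sketch with the standard `big5`, `gb2312` and `utf-8` codecs shows this for 中 ("middle"), a character that exists unchanged in both traditional and simplified Chinese. A registry therefore has to settle on one representation, such as Unicode, and convert whatever the user typed into it; `to_unicode` is a hypothetical helper for that step.

```python
CHAR = "\u4e2d"  # 中, U+4E2D

# The same character, three different byte sequences.
for name in ("big5", "gb2312", "utf-8"):
    print(f"{name:7} {CHAR.encode(name).hex(' ')}")


def to_unicode(raw: bytes, source_encoding: str) -> str:
    """Decode user input from its legacy encoding into a Unicode string."""
    return raw.decode(source_encoding)


# Bytes typed on a Big5 system and on a GB system decode to the same string.
big5_bytes = CHAR.encode("big5")
gb_bytes = CHAR.encode("gb2312")
print(big5_bytes == gb_bytes)                                            # False
print(to_unicode(big5_bytes, "big5") == to_unicode(gb_bytes, "gb2312"))  # True
```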

    1. Re:Big5 or Unicode by balls001 · · Score: 1

      Didn't Verisign acquire Network Solutions?

      Regardless, Verisign is NOT the only registrar to provide this support. OpenSRS also is registering multi-lingual domains, and I suspect there are others as well.

    2. Re:Big5 or Unicode by darthaya · · Score: 1

      You are wrong.

      The majority of Chinese users input their Chinese in GB encoding.

      It is a pain in the butt to have two different types of encoding systems.

    3. Re:Big5 or Unicode by Anm · · Score: 1

      While Verisign may be the first (loudest?) to start this, the fact that they are publicly posting their specs (kinda necessary anyway) means that any domain name registrar can also provide this support.

      In my own opinion, it is a good thing. From what I see in the specs, it is possible to use any Unicode character, which is a huge step.

      Anm

  87. Appearance of names by ce25254 · · Score: 2

    The general FAQ answers how the names will appear in a web browser, but they use a GIF to show the Chinese name. So I'm still wondering how it will look to someone without an OS that displays the characters properly. Never mind that you can download extensions to display the content in the web browser; the location will be garbage, right?

    Will this be a good kick in the butt for internationalization of your OS?

  88. Re:And these site names are entered how? by chrischow · · Score: 1
    You don't need a special keyboard; I enter Chinese into my Mac using a boring old standard USB keyboard.

    It's all s/w.

  89. Hebrew/Arabic domain names by MotyaKatz · · Score: 1

    That would be interesting to implement, considering the RTL direction.

    Could possibly mean that a domain name will have to contain some more fields, like charset and direction (RTL|LTR)
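    One alternative to a separate direction field: Unicode already assigns every character a bidirectional class, so software can work out the direction from the label itself. The Python sketch below uses `unicodedata.bidirectional` to classify a label; `label_direction` and its three-way result are a hypothetical simplification, not the full bidi rules a real RTL domain system would need.

```python
import unicodedata


def label_direction(label: str) -> str:
    """Classify a label as 'LTR', 'RTL' or 'mixed' from its characters' bidi classes.

    Simplified: digits, hyphens and other neutral characters are ignored, and a
    label with no strongly directional letters is treated as LTR.
    """
    classes = {unicodedata.bidirectional(ch) for ch in label}
    has_rtl = bool(classes & {"R", "AL"})  # Hebrew letters are R, Arabic letters AL
    has_ltr = "L" in classes
    if has_rtl and has_ltr:
        return "mixed"
    return "RTL" if has_rtl else "LTR"


print(label_direction("example"))                   # LTR
print(label_direction("\u05e9\u05dc\u05d5\u05dd"))  # RTL (Hebrew "shalom")
print(label_direction("\u0645\u062b\u0627\u0644"))  # RTL (Arabic "mithal")
print(label_direction("abc\u05d0"))                 # mixed
```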

    --
    -- "If you had fallen into a shit pit during a battle, lick yourself off and move on." - Jaroslav Hasek
  90. Re:Are You American? by davidmb · · Score: 1

    I'm running a special clinic on the central reservation of the M4, pop over on foot and we'll discuss it.

  91. www.nic.nu by emir · · Score: 1

    Actually, www.nic.nu has been doing a similar thing for .nu domains for 5-6 months now. You can register a domain under .nu with any character in ISO-8859-1 (Latin-1), and they are possibly going to add support for ISO-8859-2 (Latin-2) as well.

    BTW, a bit of info on ISO-8859-1 and ISO-8859-2: ISO-8859-1 is primarily used in Western European countries, while ISO-8859-2 is used in Southern and Eastern Europe.
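    The difference matters for domain names because the two character sets reuse the same byte values for different letters, so a raw 8-bit name is ambiguous unless its charset is known. A quick Python check with the standard `iso-8859-1` and `iso-8859-2` codecs, using byte 0xE8 as an example:

```python
raw = bytes([0xE8])

print(raw.decode("iso-8859-1"))  # è (Latin-1)
print(raw.decode("iso-8859-2"))  # č (Latin-2)

# In Unicode the two letters are distinct code points, so nothing is ambiguous.
print(hex(ord("\u00e8")), hex(ord("\u010d")))  # 0xe8 0x10d
```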

    --
    -- http://electronicintifada.net --
  92. DeCSS in a domain name? by BlowCat · · Score: 1
    This increases the possible characters from 37 (26 letters, 10 numerals, and hyphen) to 40,282

    Does anyone want to encode DeCSS in a domain name?
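    Joking aside, a bit of arithmetic shows how much room there is. The sketch takes the alphabet sizes from the quote (37 and 40,282) and the DNS limit of 63 octets per label (RFC 1035). Since the new names travel through DNS as ASCII-compatible strings of letters, digits and hyphens, the wire form of a label still carries at most about 63 × log2(37) bits, however many characters the user sees. `bits_per_symbol` and `label_capacity_bits` are hypothetical helper names.

```python
import math

MAX_LABEL_OCTETS = 63  # RFC 1035 limit for a single DNS label


def bits_per_symbol(alphabet_size: int) -> float:
    """Information carried by one symbol drawn uniformly from the alphabet."""
    return math.log2(alphabet_size)


def label_capacity_bits(alphabet_size: int, length: int = MAX_LABEL_OCTETS) -> float:
    """Upper bound on the information in one label of the given length."""
    return length * bits_per_symbol(alphabet_size)


print(f"{bits_per_symbol(37):.2f} bits per LDH character")            # 5.21
print(f"{bits_per_symbol(40282):.2f} bits per character from 40,282")  # 15.30
# The ASCII-compatible wire form only uses the 37 LDH characters:
print(f"{label_capacity_bits(37):.0f} bits per label on the wire")     # 328
```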

  93. IMO about time by RCobbett · · Score: 2

    I'm surprised it took so long for somebody to do this. I don't relish trying to learn a whole new set of shortcuts (my grasp of the 255 odd ASCII set is slipping fast, never mind kanji!). I did a story about this yesterday called over at http://www.t3.co.uk. It's nice to see that the global part of the Internet is still spreading...

  94. Easier to remember warez sites by Wolfier · · Score: 1

    No need to remember those numeric IP addresses, and surely harder for US law enforcement to track down.

    just kidding.

  95. Thats a lot of characters... by 11thangel · · Score: 1

    Can someone say buffer overflows?
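    The classic way to get one here is to size a buffer by counting characters when the data is actually stored as bytes: in UTF-8 a CJK character takes three bytes. Python has no real buffer overflows, of course; `copy_into_buffer` is a hypothetical helper that mimics a fixed-size C buffer with a `bytearray` and shows the check that has to be made on the encoded length.

```python
def copy_into_buffer(name: str, buffer: bytearray) -> int:
    """Copy the UTF-8 form of name into a fixed-size buffer.

    The length check is on the encoded bytes; checking len(name), the number of
    characters, would let multi-byte names run past the end in a language like C.
    """
    data = name.encode("utf-8")
    if len(data) > len(buffer):
        raise ValueError(f"{len(data)} bytes will not fit in {len(buffer)}")
    buffer[: len(data)] = data
    return len(data)


name = "\u4e2d\u6587.com"         # 中文.com
print(len(name))                  # 6 (characters)
print(len(name.encode("utf-8")))  # 10 (bytes)

buf = bytearray(8)
try:
    copy_into_buffer(name, buf)   # a character-count check (6 <= 8) would have passed
except ValueError as exc:
    print("refused:", exc)
```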

    --

    I am !amused.
  96. how do we see it? by Lord+Omlette · · Score: 1

    Will English users who don't like to put Korean and/or Japanese language input on their box3n be completely cut off from a good deal of the net? Japanese input screws up Windows keyboards (98SE, M$ Natural Keyboard) and Korean messes up the fonts royally. Maybe it's a good thing, so that Lego guy can get his own domain name, put his heart and soul into his work, and then he won't have to be criticized by a bunch of assholes saying he's just another obsessive-compulsive Japanese. :(
    --
    Peace,
    Lord Omlette
    ICQ# 77863057

    --
    [o]_O
  97. And these site names are entered how? by Zocalo · · Score: 1

    OK, it's easy if you have the right keyboard, but how would those of us with Latin-alphabet keyboards (or keyboards lacking any of the newly supported characters, for that matter) access a URL that contains characters not available on our keyboard?
    Where's the RFC?
    IS there an RFC?
    I can see it now; "UnicodeMap - your essential tool for surfing far-east pr0n sites with dodgy URLs and even dodgier content..."
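    Without a suitable keyboard or input method you can still produce any character from its Unicode code point, which is roughly what a "UnicodeMap" tool would do. A minimal Python sketch; `from_code_points` is a hypothetical helper, the code points are just an example (中文, "Chinese"), and the last line uses Python's `idna` codec, which implements the later IDNA 2003 standard rather than VeriSign's RACE encoding.

```python
import unicodedata


def from_code_points(*code_points: int) -> str:
    """Build a string from Unicode code points, e.g. copied from a character chart."""
    return "".join(chr(cp) for cp in code_points)


label = from_code_points(0x4E2D, 0x6587)  # 中文
print(label)
for ch in label:
    # The official character name lets you check you picked the right one.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+4E2D CJK UNIFIED IDEOGRAPH-4E2D
# U+6587 CJK UNIFIED IDEOGRAPH-6587

# The ASCII-compatible form is something any keyboard can type.
print(label.encode("idna").decode("ascii"))
```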

    --
    UNIX? They're not even circumcised! Savages!
  98. Re:We need a net Pat Buchanan by tomjgroves · · Score: 1

    "Come on, we invented it, we populated it, we control it, and now the Asian hordes are trying to subvert it." Almost right... we invented it, that's true. We control it is also true. But it only takes one quick search for a totally innocent topic (e.g. the other day I was looking for info on a bug in MSIE) and you get a shitload of Asian porn sites popping up. I wouldn't care, but I can't find the Preview button. All the more reason for them to use English-only chars :) Tom

  99. Well, let them fix their literacy first by Wolfier · · Score: 1

    The vast majority of people who understand only GB code but not Big5 are either illiterate or don't have computers.

    Simplified Chinese was, and still is, a good intent but the wrong thing to do. Often the characters before simplification are easier to learn than their simplified equivalents.

    They gotta fix the root of the problems first.

  100. Watch all the Terrified Geeks run for Cover! by Fantastic+Lad · · Score: 1
    Ha! Ha!

    Look at all the terrified geeks run for cover as their tidy little Euro-American paradigm begins to crumble before the advancing Yellow Horde.

    In reading all the above posts, you can see terms like, 'A Preposterous System!' and 'A Dark Day Indeed!'

    You fucking losers.

    What the hell did you *think* was going to happen after everybody tripped over themselves to cash in on selling billion dollar phone systems and western tech deals to China?

    Silly White Folks. We must now all perish!

    If you thought the Evil American Empire was bad, try one on for size which downright punishes the concept of individuality. . . "The nail which stands up will be hammered down."

    Embrace the Dragon, Bomb the Dragon, or be Dragon Food.

    -Fantastic Lad

    I love the Smell of Xenophobia in the Morning!

  101. Yes! But Unicode is disastrous so far. by bcrowell · · Score: 1
    Thank you for talking some sense!

    But the plain, pathetic truth is that Unicode doesn't really work yet. For example, if you want to guarantee that your web page will not display properly for a large percentage of users, try spelling non-English names like Schro:dinger with Unicode. The first thing we need is for people selling operating systems to stop thinking of Unicode as something they could add later and charge extra money for. For now, Unicode is like the third rail of computing.
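    One workaround for the Schrödinger case in a web page is to keep the HTML source pure ASCII and write non-ASCII letters as numeric character references, so the file survives any 8-bit path regardless of the declared charset. Python's standard `xmlcharrefreplace` error handler does this; `ascii_html` is a hypothetical helper. Whether the reader's browser and fonts then draw the ö correctly is a separate problem this does not solve.

```python
import html


def ascii_html(text: str) -> str:
    """Escape HTML specials, then replace non-ASCII characters with &#NNN; references."""
    return html.escape(text).encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(ascii_html("Schr\u00f6dinger"))  # Schr&#246;dinger
```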

    --

  102. Are we witnessing the segregation of the net? by Zoisite · · Score: 1

    This idea is simply preposterous.
    We're turning the web into distinct little pieces that will be unable to interact with each other, while the interest of the web is its variety and world-wide reach.

    Without a Japanese OS, inputting Japanese text, even in a Japanese version of Internet Explorer, is a _pain_ (compared to just installing Japanese fonts to read web pages in the language). So with those URLs, we are deliberately limiting the number of people who can access the pages, by making it impossible for some to even type the address. I don't want to have to memorize strings of numbers to find my way around the net.

    I don't see what the problem is with romanizing foreign words in URLs. English is the default language for international communications, and all 'language' editions of web browsers can type English characters. So this would make more sense, and keep URLs accessible to all.

    Imagine needing a browser that can accept _all_ possible character sets (Russian, Arabic, Chinese, Japanese)... o_O

    (Note: I am French and I still find those French URLs unbelievably silly)

  103. argggg! by dentin · · Score: 1

    I'm reminded of whoever it was here who had the .sig:

    "Programmers are so enthralled by the fact that they can that they seldom think about whether they should."

    I think this is a perfect example.

    The fact is that the internet operates in 8 bits, with 8-bit bytes being sent across the wires. Every piece of software on the planet (almost, anyway) uses 8-bit bytes and, for convenience, 8-bit display characters. I'd just as soon not see programmers all over the world add needless layers of complexity to support all this crap. In a hundred years there will be a global language anyway; if anything, we should be vehemently refusing to pointlessly break perfectly good code to support local quirks. Not to mention the other associated hassles of translation and maintainability.

    I wouldn't be as pissy about it if this weren't forcing Unicode adoption. I think it would be far more effective to simply throw out the idea of Unicode for any network infrastructure and force those languages that currently need it to make their own 8-bit substitutions. They are going to have to do that eventually anyway, so we might as well start right now.

    And no, I don't care if the global language is English. Esperanto or German would be fine with me. (Side note: did you know that the German language has officially thrown out a couple of characters, most notably the Eszett (ß), for similar reasons?)

    -dentin

    --
    Alter Aeon Multiclass MUD - http://www.alteraeon.com
  104. Protecting Translated Copyrights? by wizarddc · · Score: 1

    Are companies protecting their translated domain names? Companies like Coca-Cola, which have a big presence in Asia, also have different brand names to accommodate the language barrier. Could I buy the translated version of Microsoft.com (if there is one, of course)?



    www.ermac.org - pick a number.

    --
    Th
  105. RACE Encoding scheme is not very PC by ers81239 · · Score: 2

    Isn't it odd that the encoding scheme for Asian domain names is called RACE? Who's in charge over there at Verisign, the Ku Klux Klan?

    --
    there are 2 kinds of people. those who divide people into 2 kinds, and those who don't.
  106. TLD by Fjord · · Score: 2

    I noticed a promotion for this on the Network Solutions website a week or two ago. I think that this is great, but we need TLDs in these characters as well: one with the Chinese character for commercial, one for organization, one for educational. I wonder if the new TLD system they are testing will allow these characters. For $50,000, you could register one of these Chinese TLDs and probably make a lot of money.

    --
    -no broken link
  107. Actually, The Current Max Characters is 67... by ras_b · · Score: 1

    According to this story at 2600.com, the current maximum allowed number of characters for a domain name is 67. That story is a very interesting read about how Verizon sued 2600 for registering www.verizonreallysucks.com, so 2600 took advantage of the 67-character max and registered
    www.VerizonShouldSpendMoreTimeFixingItsNetworkAndLessMoneyOnLawyers.com
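    The 67 figure follows from the general DNS rule: RFC 1035 caps each label (the part between dots) at 63 octets and the whole name at 255 octets in wire format, usually quoted as 253 characters in dotted text form. The Verizon label is exactly 63 characters, and 63 plus ".com" gives 67. A minimal Python checker; `check_dns_lengths` is a hypothetical helper that only handles ASCII names and ignores a trailing root dot.

```python
MAX_LABEL = 63  # octets per label (RFC 1035)
MAX_NAME = 253  # characters in dotted text form, i.e. 255 octets on the wire


def check_dns_lengths(name: str) -> None:
    """Raise ValueError if an ASCII domain name breaks the DNS length limits."""
    if len(name) > MAX_NAME:
        raise ValueError(f"name is {len(name)} characters, limit is {MAX_NAME}")
    for label in name.split("."):
        if not 1 <= len(label) <= MAX_LABEL:
            raise ValueError(f"label {label!r} is {len(label)} characters")


label = "VerizonShouldSpendMoreTimeFixingItsNetworkAndLessMoneyOnLawyers"
print(len(label))           # 63
print(len(label + ".com"))  # 67
check_dns_lengths(label + ".com")       # passes
try:
    check_dns_lengths(label + "X.com")  # one character too many
except ValueError as exc:
    print("rejected:", exc)
```

    For the new multilingual names the same limits apply to the ASCII-compatible form, so fewer than 63 Chinese, Japanese or Korean characters fit in one label (the ACE prefix alone takes four).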