Registrations Now Accepted For Asian Domain Names
Eric Sun was among the first to point out that as of Thursday evening, VeriSign has begun accepting Chinese, Japanese and Korean domain names. "This increases the possible characters from 37 (26 letters, 10 numerals, and the hyphen) to 40,282. For more information, [see this AP story]." snrsamy points to the same story as featured on C|Net. jamie suggests reading the technical lowdown at VeriSign.
Since nobody seems to want to read the article or research any of the info, here is the quick lowdown (I have to deal with this at work right now...):
- This solution is only for web browsers. It requires a special version of a web browser, or a plugin, to be able to use the new encoding scheme. It won't work for email, ftp, telnet, gopher, etc., unless a special version of the program is written.
- DNS doesn't break. DNS still uses ASCII. This scheme uses RACE to encode the multilingual character set into ASCII. NSI will put a small prefix at the start of the domain name to identify it as multilingual (for example, eq- would be found at the start of the domain name; the exact prefix has not yet been released, to prevent squatters from snapping names up).
- The special browsers will detect the prefix, and translate the ASCII gibberish into the specified multi-lingual character set. The browser also does the conversion back to ASCII to allow a DNS lookup.
- WHOIS does not/will not support this. You can only use WHOIS with the ASCII encoded gibberish.
- This is not supported by the IETF. It is a custom solution implemented by NSI, and it looks like they are going to be WAY behind schedule in actually rolling it out.
- They are accepting registrations right now, but none of these names will resolve for at least a month, probably much longer. In other words, the system isn't usable yet, but NSI can collect money.
- The IETF is working on their own, probably completely incompatible system, to do the same thing.
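Since DNS still only sees ASCII, whatever the browser finally sends must fit the same 37-character repertoire (letters, digits, hyphen) quoted at the top of the story. A minimal Python sketch of that check, assuming the usual hostname rules from RFC 1035/1123 (labels of 1 to 63 characters, no leading or trailing hyphen, at most 253 characters overall); `is_ldh_hostname` is a hypothetical helper name:

```python
import re

# One DNS label: letters, digits and hyphen only, 1-63 characters,
# and it may not begin or end with a hyphen.
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")

def is_ldh_hostname(name: str) -> bool:
    """Return True if every dot-separated label of `name` is a valid LDH label."""
    name = name.rstrip(".")  # tolerate a trailing root dot
    if not name or len(name) > 253:
        return False
    return all(_LDH_LABEL.fullmatch(label) for label in name.split("."))
```

An encoded name such as `bq-ag0970ag00ah07h.or.jp` passes this check, while a raw name with non-ASCII characters does not, which is exactly why the encoding step has to happen before the DNS lookup.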
"Tomorrow's forecast: a few sprinkles of genius with a chance of doom!" - Stewie Griffin
Here's how it works: a special prefix "<rp>" (or maybe that just stands for the prefix; I can't really tell from the PDF, but I didn't think < and > were valid domain name characters) indicates that part of the domain is encoded. It is followed by the encoded name, which uses only ASCII characters and includes information about which character set was used (Unicode, SJIS, etc.). The algorithm is called RACE: Row-based ASCII Compatible Encoding.
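To make the "row-based" idea concrete, here is a heavily simplified Python sketch. It assumes the input has already been converted to Unicode, that a "row" means the high byte of a 16-bit code point, and that the prefix is the "bq-" debugging prefix quoted later in the thread. It is NOT the full RACE algorithm (real RACE also handles mixing one non-zero row with row 0 and enforces label length limits), so its output will not match real RACE names; the function names are hypothetical:

```python
import base64

PREFIX = "bq-"     # assumption: debugging prefix quoted in the thread
MIXED_ROWS = 0xD8  # a surrogate row, so it can never be the row of a real BMP label

def race_like_encode(label: str) -> str:
    """Toy row-based ASCII-compatible encoding of one label (not the full RACE spec)."""
    cps = [ord(ch) for ch in label]
    if not cps or any(cp > 0xFFFF or 0xD800 <= cp <= 0xDFFF for cp in cps):
        raise ValueError("this sketch only handles non-empty BMP labels")
    rows = {cp >> 8 for cp in cps}  # the "row" is the high byte of each code point
    if len(rows) == 1:
        # All characters share one 256-code-point row: store the row byte once,
        # followed by just the low byte of each character.
        raw = bytes([rows.pop()] + [cp & 0xFF for cp in cps])
    else:
        # Characters from several rows: marker byte, then plain UTF-16BE.
        raw = bytes([MIXED_ROWS]) + label.encode("utf-16-be")
    # Base32 (a-z, 2-7) keeps the result inside the letters-digits-hyphen set.
    return PREFIX + base64.b32encode(raw).decode("ascii").rstrip("=").lower()

def race_like_decode(name: str) -> str:
    """Reverse race_like_encode; labels without the prefix pass through unchanged."""
    if not name.lower().startswith(PREFIX):
        return name
    body = name[len(PREFIX):].upper()
    body += "=" * (-len(body) % 8)  # put back the padding stripped by the encoder
    raw = base64.b32decode(body)
    if not raw:
        raise ValueError("empty encoded label")
    if raw[0] == MIXED_ROWS:
        return raw[1:].decode("utf-16-be")
    return "".join(chr((raw[0] << 8) | low) for low in raw[1:])
```

For example, `race_like_encode("あいう")` takes the compact single-row path (all three kana live in row 0x30), while `race_like_encode("日本語")` falls back to the marker byte plus full UTF-16, so its encoded label is longer. That compactness for single-script names is the point of working row by row.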
A couple of examples were given for both a domain name and a server name:
<rp>45dfg62de34432.COM
<rp>3df45gd345.<rp>45dfg62de34432.COM
So I guess you can set your spam filters to block any domain starting with <rp>! :)
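Half-joking or not, such a filter is easy to sketch. The server-name example above shows the prefix on every encoded label, not just at the start of the whole name, so this Python sketch checks each label. It assumes the prefix is "bq-" (the debugging value mentioned elsewhere in the thread); `has_encoded_label` is a hypothetical name:

```python
RACE_PREFIX = "bq-"  # assumption: the debugging prefix quoted in the thread

def has_encoded_label(hostname: str, prefix: str = RACE_PREFIX) -> bool:
    """True if any dot-separated label of hostname starts with the ACE prefix."""
    labels = hostname.rstrip(".").split(".")
    return any(label.lower().startswith(prefix) for label in labels)
```

Note the obvious weakness: an ordinary ASCII name that happens to start with the same three characters would be blocked too.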
Will the moderators shoot me down for mentioning Microsoft?
Windows has had a CJK-capable kanji input scheme for years (CJK: Chinese, Japanese, Korean). Windows has also had bidi (bidirectional) support for right-to-left and/or top-to-bottom languages, including Hebrew.
If you have the appropriate CJK input features installed, it's just a funky keyboard shortcut to open the input method and enter kanji. If not, you'll probably be limited to clicking on visible links rather than typing domain names or other text by hand.
I don't know what features Linux has to handle EFIGSS (English, French, Italian, German, Spanish, Swedish) differences, never mind bidi or kanji input.
----
This is probably an attempt to force a migration to Unicode. Anyway, why is VeriSign behind this? Didn't we learn from Network Solutions that a privately owned, commercial company is not the solution for Internet domain name databases (and their "ownership")?
How can one company be granted monopoly rights to something so important to the world's economy and to everyone on the Internet, again? Shouldn't this be assigned to a not-for-profit entity under the auspices of ICANN?
--
He lives in a world where those who do not run the client software of the omnipresent meme are unacceptable.
My Chinese co-worker has informed me that to type Chinese, he sets the input language in whatever app he's using to Chinese and then types phonetically. The problem is that even phonetically there are many similar-sounding words, so he types a few English letters spelling out the sound of a word, and Chinese characters appear on the screen from which he must choose the right one. He tells me there are also special keyboards where you hold down multiple keys.
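The pick-from-a-list step can be modeled roughly as a lookup from a typed syllable to candidate characters. A toy Python sketch, assuming a pinyin-style romanization; the two-entry table and the helper names are hypothetical and far smaller than any real input method's dictionary:

```python
# Hypothetical sample table: romanized syllable -> characters with that reading.
CANDIDATES = {
    "zhong": ["中", "种", "重", "钟"],
    "wen": ["文", "问", "闻"],
}

def candidates_for(syllable: str) -> list[str]:
    """Characters the input method would offer for one typed syllable."""
    return CANDIDATES.get(syllable.lower(), [])

def choose(syllables: list[str], picks: list[int]) -> str:
    """Simulate the user picking candidate number picks[i] for each typed syllable."""
    return "".join(candidates_for(s)[i] for s, i in zip(syllables, picks))
```

Typing "zhong" then "wen" and taking the first candidate each time yields 中文; picking a different index for "zhong" gives a different character, which is the ambiguity the co-worker describes.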
The rp is a variable. The first couple of pages note that implementation testers should assume the "RACE Prefix," or rp, is "bq-".
So how's this gonna work for systems not set up to handle the Asian character set?
Read the links.
The proposal implements an ASCII encoding scheme, called RACE. A certain prefix (they list the debugging prefix as "bq-") indicates a RACE-encoded domain name.
The rest of the name is the ASCII encoding, which either appears as-is in dumb browsers or is converted to Unicode, Big5, or whatever character set the browser wants.
For "dumb browsers" (not a flame, just an indication of character-set-awareness), you'd see some crazy domain like http://www.bq-ag0970ag00ah07h.or.jp/; for "smart browsers," it would appear in your own kanji font.
A few notes...
The Internet Society probably isn't too happy about this. They released a statement on November 8th encouraging NSI to back off and let the IETF IDN WG do its job.
Also, there are companies already operating in this market, including WALID, which is taking registrations for Arabic domain names (AND RESOLVING THEM) and will soon be adding Hindi, Tamil, and two Chinese scripts before moving into other markets.
Wouldn't it make more sense to implement umlauts like ö/ü/ä first?
I have dibs on släshdot.org!!
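For comparison, the IETF work mentioned above eventually became IDNA (RFC 3490), which uses Punycode and an "xn--" prefix rather than RACE, and which handles umlauts and CJK through the same path. Python's standard library ships an "idna" codec implementing it, so a quick round trip looks like this (again, this is the IETF scheme, not NSI's):

```python
# Python's built-in "idna" codec implements IDNA 2003 with Punycode.
ace = "släshdot.org".encode("idna")  # ASCII bytes whose first label starts with b"xn--"
print(ace)
print(ace.decode("idna"))            # back to the original Unicode name
```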
Scuttlemonkey is a troll
Hmm. This could lead to fun. Some character sets/encodings allow different byte sequences to map to the same character.
(See the recent Unicode bugs in IIS, where a Unicode representation of '../' is used to navigate upwards through the server's directories and view files outside the server root.)
Now, does a company have to register every possible permutation of byte sequences that map to the same character sequence, as well as doing so in .com, .net and .org?
We'll see.
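Two quick Python demonstrations of the concern. The first shows two different code point sequences that render as the same "é" and only compare equal after Unicode normalization (a registry would presumably need to normalize before comparing names; that is an assumption, not something NSI has announced). The second shows that a strict UTF-8 decoder refuses an overlong two-byte encoding of "/", the kind of sequence used in the IIS directory-traversal bug. `decodes_strictly` is a hypothetical helper:

```python
import unicodedata

precomposed = "\u00e9"  # é as a single code point
decomposed = "e\u0301"  # e followed by COMBINING ACUTE ACCENT

print(precomposed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True

def decodes_strictly(data: bytes) -> bool:
    """True if data is well-formed UTF-8 under Python's strict decoder."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# 0xC0 0xAF is an overlong form of "/" (0x2F); strict decoders must reject it
# instead of quietly turning it into a path separator.
print(decodes_strictly(b"\xc0\xaf"))  # False
```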
So how's this gonna work for systems not set up to handle the Asian character set? Let's say I want to send mail to joe.bloggs@somechinesename.net from my FBSD or Linux boxes? Not much fun, I think...
Wouldn't it make more sense to implement umlauts like ö/ü/ä first?
Easier to test, etc.
Before you email me, remember: "There is no god!"
How is this going to work? Since the majority of Chinese users input their Chinese as Big5, a Big5-encoded name (e.g. www.ê.com) will not be the same as the Unicode equivalent...
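The bytes really do differ: the same two characters encoded as Big5 and as UTF-16 are different sequences, so presumably the browser would have to convert Big5 input to Unicode before any ASCII encoding is applied (an assumption; the thread doesn't say where that conversion happens). A short Python illustration:

```python
name = "中文"
big5 = name.encode("big5")        # what a Big5 system produces
utf16 = name.encode("utf-16-be")  # the 16-bit Unicode form
print(big5.hex(), utf16.hex())    # two different byte strings for the same text

# Decoding each with its own codec recovers the same characters, so converting
# to Unicode first gives one canonical form to encode.
print(big5.decode("big5") == utf16.decode("utf-16-be") == name)
```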
Kind of ironic the algorithm is called RACE, isn't it? Can we filter by RACE? Can we browse domains of only a certain RACE? Can it be enhanced with RACISM, Row-based ASCII Compatible Interface for Stereotyping Mayhem?