ICANN Plans Non-English Character Domain Testbed
Wanted writes: "This article reveals ICANN's plan to open registration of domain names with national characters. Actually, it's Network Solutions that is responsible for the technical side of implementing the project. Initially they want to support CJK (Chinese, Japanese, Korean), then Spanish and other European languages. I don't know why they like Spaniards; I'd rather talk about supporting ISO-8859-1 than particular languages. Nevertheless, the Internationalized Domain Names IETF Working Group should be pretty happy about it. I wonder how you would type www.wong-kar-wai.org in Chinese on a classic keyboard :)"
Numbers. 417 million people speak Spanish, 191 million speak Portuguese, 128 million each speak French and German, and no other Latin-alphabet European language has as many as 100 million speakers. It isn't that NSI prefers Spaniards, it's that it prefers larger markets over smaller ones.
CJK has a similar "numbers" vibe. Since the CJK character sets are generally handled by a single solution in software (especially since written Japanese and Korean include both native scripts, syllabic and alphabetic respectively, and Chinese ideographic script), you get Japan, Korea, and Greater China in one fell swoop. (Greater China here includes not only the PRC and Taiwan but also the Chinese-speaking communities in Malaysia, Singapore, and Indonesia.)
So why not Devanagari too? Because 1) there are a lot more CJK and Spanish language customers than Hindi/Bengali customers due to internet penetration and financial factors, and 2) the people who would buy the domains in India generally are of the educated classes that speak English. So there's less demand for Devanagari.
Steven E. Ehrbar
Does this mean I can register micrösoft.com and yàhoo.com and släshdot.org?
--
Domain names should map from something like: "Señor Hussong's Cantina.com" to "senorhussongscantina.com". Spaces, punctuation, and hyphens should be deleted. Special characters should be translated into the closest low ascii character.
This way, you can write your domain name however you want, and there isn't so much of a potential for people registering something similar.
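A minimal Python sketch of the mapping described above, assuming the standard `unicodedata` module and assuming that only the last dot (the one before the TLD) should survive. It approximates "the closest low-ASCII character" with Unicode NFKD decomposition, so characters with no ASCII base letter (such as "ß" or "ø") are simply dropped rather than transliterated. `fold_domain` is a hypothetical helper name, not part of any DNS software.

```python
import unicodedata


def fold_domain(name: str) -> str:
    """Fold a free-form name into a lowercase ASCII domain.

    Assumption: everything after the last dot is the TLD and is kept.
    """
    base, dot, tld = name.rpartition(".")
    if not dot:  # no dot at all: treat the whole string as the base name
        base, tld = name, ""
    # NFKD splits "ñ" into "n" + U+0303 COMBINING TILDE, so dropping
    # non-ASCII code points leaves the base letter behind.
    decomposed = unicodedata.normalize("NFKD", base)
    # Keep only ASCII letters and digits: spaces, punctuation, hyphens
    # and combining marks all disappear.
    folded = "".join(ch for ch in decomposed if ch.isascii() and ch.isalnum())
    return folded.lower() + dot + tld.lower()


print(fold_domain("Señor Hussong's Cantina.com"))  # senorhussongscantina.com
print(fold_domain("micrösoft.com"))                # microsoft.com
```

Under this scheme "micrösoft.com", "yàhoo.com" and "cool-shit.org" would all collapse onto their plain-ASCII spellings.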
Hyphens have got to be the dumbest idea of all time. If you have a multi-word name, you almost have to register both with and without the hyphen or you will lose visitors.
Even better would be using something like soundex, which makes a "hash" of a name so that similar sounding words map to the same value. Memorizing exact spelling is not something people are used to doing.
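For reference, here is a sketch of classic American Soundex in Python, the kind of phonetic "hash" meant here. It is English-centric and, as an assumption of this sketch, non-ASCII letters are simply skipped; `soundex` is an illustrative function name, not a DNS feature.

```python
# Digit groups of the classic American Soundex code.
_GROUPS = {
    "bfpv": "1",
    "cgjkqsxz": "2",
    "dt": "3",
    "l": "4",
    "mn": "5",
    "r": "6",
}
CODES = {ch: digit for letters, digit in _GROUPS.items() for ch in letters}


def soundex(word: str) -> str:
    """Return the four-character Soundex code of an ASCII word."""
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    # The first letter's own code still suppresses an identical next code.
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code:
            if code != prev:
                result += code
            prev = code
        elif ch not in "hw":
            # Vowels (and y) separate equal codes; h and w do not.
            prev = ""
    # Pad with zeros, then truncate to one letter plus three digits.
    return (result + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))      # R163 R163
print(soundex("slashdot"), soundex("slashdog"))  # S423 S423
```

Note that "slashdot" and "slashdog" land on the same code, which shows both the appeal of phonetic matching and how coarse it is.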
They shouldn't do CJK domain names; they should just define a standard translation, which can then be incorporated into client software and possibly into DNS systems. What's next, will I be unable to get to a site unless I also choose the correct encoding? Let's see, was that "cool-shit.org in 8859-1, or coolshit.org in Japanese encoding, or maybe cool-shit.net in Eastern European encoding? Or was it coolshít.org?"
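The worry about having to pick "the correct encoding" can be made concrete: the same visible name produces different bytes under different legacy encodings, and reading bytes with the wrong one gives a different string. A small illustration using only Python's built-in codecs, with ISO-8859-1 and UTF-8 picked as examples and "släshdot" borrowed from the earlier comment.

```python
name = "släshdot"  # "ä" is U+00E4 LATIN SMALL LETTER A WITH DIAERESIS

latin1 = name.encode("iso-8859-1")  # one byte per character
utf8 = name.encode("utf-8")         # "ä" becomes two bytes

print(latin1)  # b'sl\xe4shdot'
print(utf8)    # b'sl\xc3\xa4shdot'

# Decoding the UTF-8 bytes as if they were ISO-8859-1 yields "slÃ¤shdot".
misread = utf8.decode("iso-8859-1")
print(misread == name)  # False
```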
Slashdot has had to ban accented characters to prevent this kind of abuse; ICANN should do the same, lest a similar outbreak of mimicry infect the entire Web.
History of the "Something must be done to control the outbreak" syndrome.
Early 1990s: OMG! People are making up their own web sites in large numbers. Thousands of people will see them and be unable to distinguish fact from fiction.
Mid 1990s: OMG! People are now making up their own news sites. Millions of people are reading them and can't tell the difference between real and fake news.
Late 1990s: OMG! People are posting stock market tips which are causing market fluctuations. People will be unable to tell the difference between real and fake stock market news!
Early 2000s: OMG! People are allowed to use accented chars. Millions of people will be diverted to fake sites which use similar accented chars in their domain name, and thus be unable to tell the difference between real and fake sites!
Here, take a chill pill. Welcome to the internet, my friend.
w/m
This is a bad idea -- domain names must be interoperable on all systems, with or without Unicode or any other charset support, with or without a keyboard capable of entering certain characters. The ASCII subset allowed in DNS now is the only subset supported by absolutely all computers (even ones that natively use EBCDIC), and no matter how much the use of other charsets (and/or Unicode) expands, this is not going to change. I see it as just an attempt to promote "unicodefication" of existing standards for no good reason.
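The ASCII subset in question is the letters-digits-hyphen (LDH) rule for hostnames: labels of 1 to 63 characters drawn from A-Z, a-z, 0-9 and "-", not starting or ending with a hyphen (RFC 1035's preferred syntax, relaxed by RFC 1123 to allow a leading digit). A minimal Python check, assuming a 253-character limit on the dotted text form; `is_ldh_hostname` is a hypothetical helper.

```python
import re

# One LDH label: 1-63 chars, alphanumeric at both ends, hyphens allowed inside.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_hostname(name: str) -> bool:
    """Return True if every label of `name` follows the LDH rule."""
    if name.endswith("."):  # tolerate one trailing root dot
        name = name[:-1]
    if not name or len(name) > 253:
        return False
    return all(LDH_LABEL.fullmatch(label) for label in name.split("."))


print(is_ldh_hostname("slashdot.org"))   # True
print(is_ldh_hostname("släshdot.org"))   # False
print(is_ldh_hostname("-slashdot.org"))  # False
```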
And if anyone cares, my native language has nothing to do with ASCII.
Contrary to the popular belief, there indeed is no God.
Recently I posted this comment pointing out that there's really no reason why a domain such as www.中国.com couldn't exist. (You should see two Chinese ideograms meaning "China" between the "www." and the ".com" parts; further, if you click on this link, your browser should open a window telling you that the domain "www.中国.com" does not exist, showing the same two Chinese ideograms.)
Let us recall: first, as the HTML specification states, every HTML document, no matter what character set it is "encoded" in, is written in the all-encompassing Unicode character set. So when you write something like "中国" in HTML, it refers to the Unicode characters (decimal) 20013 and 22269, no matter what the current character encoding and font are. So that's how you write the link text. Second, as for the URL itself, although it is not (as far as I know) formally recommended by an Internet standard, it is widely recognized that URLs are written in the UTF-8 encoding format (which is then %-encoded into ASCII).
The whole process is described in this Internet Draft ("Internationalized Uniform Resource Identifiers"; WORK IN PROGRESS!) by Larry Masinter and Martin Duerst where the relationship between URIs and IURIs (Internationalized URIs) is discussed in detail.
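The two steps in the comment above can be checked with Python's standard library: `ord` gives the Unicode code points that a numeric character reference names, and `urllib.parse.quote` (which encodes as UTF-8 by default) produces the %-escaped ASCII form of the URL text. This only illustrates the encoding steps, not the draft's full IURI-to-URI procedure.

```python
import html
from urllib.parse import quote, unquote

china = "中国"  # the two ideograms meaning "China"

# Code points, as referenced in HTML by &#20013;&#22269;
print([ord(ch) for ch in china])                   # [20013, 22269]
print(html.unescape("&#20013;&#22269;") == china)  # True

# URL form: UTF-8 bytes, then %-encoded into ASCII
escaped = quote(china)
print(escaped)                    # %E4%B8%AD%E5%9B%BD
print(unquote(escaped) == china)  # True
```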
The DNS is the toughest part of all. The DNS specification (RFC1034) states (section 3.1) that DNS data is to be taken as binary for possible upward compatibility (this was wonderful foresight on Mockapetris' part!). Consequently, nothing in the standards forbids putting (UTF-8 encoded Unicode) 8-bit data in DNS labels. Except, of course, that many "buggy" implementations will have to be fixed to remove broken assumptions, *sigh*. The IDN working group suggests using a UTF-5 encoding to avoid going beyond the current domain name limits; I don't think that's a good idea, and we should stick to UTF-8 and repair the broken software.
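Whatever encoding is chosen has to fit the existing limit of 63 octets per label (RFC 1035, section 2.3.4). A quick Python calculation of how many CJK characters that leaves, assuming raw UTF-8 as proposed here and, for comparison, UTF-16 (big-endian, so no byte-order mark is added); `fits_in_label` is a hypothetical helper.

```python
LABEL_LIMIT = 63  # maximum octets in one DNS label (RFC 1035, section 2.3.4)


def fits_in_label(label: str, encoding: str = "utf-8") -> bool:
    """True if `label`, encoded with `encoding`, fits in one DNS label."""
    size = len(label.encode(encoding))
    return 0 < size <= LABEL_LIMIT


for encoding in ("utf-8", "utf-16-be"):
    per_char = len("中".encode(encoding))
    print(encoding, per_char, LABEL_LIMIT // per_char)
# utf-8 3 21
# utf-16-be 2 31
```

So a raw UTF-8 label holds at most 21 ideographs from the Basic Multilingual Plane, against 63 ASCII characters.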
Oh, and incidentally, see this page too, to find out how broken your browser's Unicode support is.
Unless I'm mistaken, Unicode is a combination of two ASCII characters to create a single one, which is how Japanese, Chinese, etc., characters are created. 255^2 is a lot of characters. (65025, to be exact.) Doesn't this mean that these domains are limited to 31 characters? Further, can BIND *support* using characters beyond [a-z0-9-.]? I sure wouldn't think that it could.
I didn't find these questions answered anywhere on ICANN or NSI's sites. Anybody have any ideas?
-Waldo
-------------------
Look, I'm Panamanian. Spanish is my first language (it's Panamá, not Panama), but I just can't agree with this, because I don't think it's practical at the moment. Take, for example, this web site we're building called galeriacentral.com. Everyone automatically knows how to access it when they hear an ad for it on the radio, but with international characters allowed, I would have to register galeriacentral.com, galeríacentral.com (the correct form), and galerìacentral.com. And then someone would register galeríacentrál.com and I'd be screwed (cybersquatting is allowed in most parts of the world)...
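The registration problem here is combinatorial. A Python sketch that enumerates accent variants of a name, assuming (purely for illustration) that each vowel may appear plain, with an acute accent, or with a grave accent; `accent_variants` is a hypothetical helper and the accent table is an assumption, not any registry's actual repertoire.

```python
from itertools import product

# Assumed variants per vowel: plain, acute, grave.
VARIANTS = {"a": "aáà", "e": "eéè", "i": "iíì", "o": "oóò", "u": "uúù"}


def accent_variants(name: str) -> list[str]:
    """All spellings of `name` obtained by swapping vowels for accented forms."""
    # Non-vowels map to themselves, i.e. a one-character choice.
    choices = [VARIANTS.get(ch, ch) for ch in name]
    return ["".join(combo) for combo in product(*choices)]


variants = accent_variants("galeriacentral")
print(len(variants))                 # 729 (six vowels, 3 ** 6)
print("galeríacentrál" in variants)  # True
```

Each of those spellings would be a separately registrable name under a scheme that does not fold accents together.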
.com/.net/.org are already abused enough without leaving more room for stuff like slashdog.org.
My recommendation would be to leave it up to the countries' TLDs. So if I want to register cualquiercosa.com.pa, then OK, but the regular
There are two kinds of people in the world: Those with good memory.
And worldsnames.net also features Japanese, Chinese, Korean, Arabic, and Cyrillic characters.
Though one of the nice "features" of the Internet is that you have the opportunity to reach a global public, which is rather hard when you use country- or language-specific characters. My 0.02