Registrations Now Accepted For Asian Domain Names
Eric Sun was among the first to point out that as of Thursday evening, VeriSign has begun accepting Chinese, Japanese and Korean domain names. "This increases the possible characters from 37 (26 letters, 10 numerals, and hyphen) to 40,282. Find more information [see this AP story]." snrsamy points to the same story as featured on C|Net. jamie suggests reading the technical lowdown at VeriSign.
To help us keep the internet English compatible.
:)
Come on, we invented it, we populated it, we control it, and now the Asian hordes are trying to subvert it.
Let them make their own internet.
Not to mention some of the domain names may belong to Al Gore
But what's the POINT?? It just totally screws up existing protocols that have been tried, tested and proven to work totally well. And what if we want to browse to an Asian site? Seriously, let's say you have a friend in China and he has an article up that you want to read that's in English. HTF are we supposed to get to it? Whereas these guys can view the entire internet, they also have their own "private" section that only very few of the western world can access.
tom
However, Chinese is commonly known to be more concise than English or other languages with a small character set. There are thousands of commonly used characters, each of which has the function of a word in English. Many characters have more than one meaning, and their combination (2 characters in most cases) makes new words. And don't forget the amazing flexibility in the grammar system (e.g. fewer stop words like "the")! We are not even talking about ancient Chinese, which is much SHORTER.
Give me any sentence with more than 10 English words (with no words like Yugoslavia of course), and I guarantee I can re-write it in Chinese in less space.
You see, this is the basic rule of information. You increase the complexity of the encoding scheme, you get more density.
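To put rough numbers on the density claim: a symbol drawn from an alphabet of N equally likely characters carries at most log2(N) bits. This is a minimal sketch using the 37 and 40,282 figures from the story summary. The function name is made up for illustration, and real text is nowhere near uniformly distributed, so treat these as upper bounds only.

```python
import math

def bits_per_symbol(alphabet_size: int) -> float:
    """Upper bound on information per symbol, assuming a uniform alphabet."""
    return math.log2(alphabet_size)

old_bits = bits_per_symbol(37)      # 26 letters, 10 numerals, hyphen
new_bits = bits_per_symbol(40282)   # figure quoted in the story summary

# Roughly how many old-alphabet symbols one new-alphabet symbol can replace
ratio = new_bits / old_bits
print(f"{old_bits:.2f} bits vs {new_bits:.2f} bits per symbol, ratio {ratio:.2f}")
```

So under these assumptions one character from the larger set does the work of just under three from the smaller one, which fits the "more complexity, more density" point.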
How complex is this? Well, I have to say that my 12 years of Chinese class are a painful memory.
Just wanted to add that anyone who wants to know more about the whole topic should start at this page.
> I dunno, just a guess, but maybe someone's already thought of this? Perhaps...
It's easy to enter kanji if you are using Internet Explorer - just visit Windows Update and download the Japanese Input Method Editor update, and you'll be able to type kanji in your browser (using romaji I think). I don't know how you do with Mozilla...
What would happen if someone said "Let's add 2 new data pins to RS232"?
They already did:
pin 14 STD Secondary transmit data
pin 16 SRD Secondary receive data
(also pin 19 SRTS Secondary RTS, pin 13 SCTS Secondary CTS, etc.)
These pins can be used to double the amount of data sent through your RS-232 cable, which would be useful if you decided to (say) switch from 8-bit characters to 16-bit characters.
It's not an RS-232 cable unless it has all 20 wires!!! (-: (-:
Lawyers: The Other White Trash.
Since nobody seems to want to read the article, or research any of the info, here is the quick low-down (since I have to deal with this at work right now...)
- This solution is only for web browsers. It requires a special version of a web browser, or a plugin, to be able to use the new encoding scheme. It won't work for email, ftp, telnet, gopher, etc, unless a special version of the program is written.
- DNS doesn't break. DNS still uses ASCII. This scheme uses RACE to encode the multi-lingual character set into ASCII. NSI will put a small prefix at the start of the domain name to identify it as multi-lingual (for example eq- would be found at the start of the domain name. The exact prefix has not yet been released to prevent squatters from snapping them up.)
- The special browsers will detect the prefix, and translate the ASCII gibberish into the specified multi-lingual character set. The browser also does the conversion back to ASCII to allow a DNS lookup.
- WHOIS does not/will not support this. You can only use WHOIS with the ASCII encoded gibberish.
- This is not supported by the IETF. This is a custom solution implemented by NSI. But it looks like they are going to be WAY behind schedule in actually rolling this out.
- They are accepting registrations right now, but none of these names will resolve for at least a month, probably much longer. In other words, the system isn't usable yet, but NSI can collect money.
- The IETF is working on their own, probably completely incompatible system, to do the same thing.
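The browser-side flow in the list above (spot the prefix, translate to the native characters for display, translate back to ASCII for the DNS lookup) could be sketched roughly like this. Big caveat: the payload encoding here is a toy stand-in (UTF-16 bytes run through base32), NOT the real RACE algorithm, and the "bq--" prefix is just the one that shows up in examples in this thread. All function names are hypothetical.

```python
import base64

PREFIX = "bq--"  # assumption: prefix seen in thread examples, not a confirmed production value

def encode_label(label: str) -> str:
    """Turn one label into an ASCII-only form (toy stand-in for RACE)."""
    if label.isascii():
        return label  # plain ASCII labels pass through untouched
    raw = label.encode("utf-16-be")
    body = base64.b32encode(raw).decode("ascii").rstrip("=").lower()
    return PREFIX + body

def decode_label(label: str) -> str:
    """Reverse encode_label; labels without the prefix are returned as-is."""
    if not label.lower().startswith(PREFIX):
        return label
    body = label[len(PREFIX):].upper()
    body += "=" * (-len(body) % 8)  # restore the padding b32decode expects
    return base64.b32decode(body).decode("utf-16-be")

def to_dns_name(hostname: str) -> str:
    """What the browser would hand to the resolver: ASCII only."""
    return ".".join(encode_label(part) for part in hostname.split("."))

def to_display_name(hostname: str) -> str:
    """What the browser would show in the address bar."""
    return ".".join(decode_label(part) for part in hostname.split("."))
```

The point of the design is that everything below the browser (resolver, DNS servers, WHOIS) only ever sees the ASCII form, which is why DNS itself doesn't break.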
"Tomorrow's forecast: a few sprinkles of genius with a chance of doom!" - Stewie Griffin
Although according to the gov't, it was never a war at all.
Bitchslapped? Give Rob a bitchslap from bitchslapped.com.
Well, if your only connection to the Asian population is spam email, this should make your isolationism even more simple: the standard uses a standard prefix for RACE-encoded domain names; block those and you're in arrogant English/USian bliss.
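For what it's worth, the "block the prefix" idea is about a two-line check. A minimal sketch, assuming the prefix is "bq--" (the value that appears in examples elsewhere in this thread; the real one may differ) and with hypothetical function names:

```python
def has_encoded_label(hostname: str, prefix: str = "bq--") -> bool:
    """True if any dot-separated label of the hostname starts with the prefix."""
    labels = hostname.lower().rstrip(".").split(".")
    return any(label.startswith(prefix) for label in labels)

def sender_domain(address: str) -> str:
    """Pull the domain part out of an email address."""
    return address.rsplit("@", 1)[-1]

# Example: decide whether a sender's domain would be caught by the filter
print(has_encoded_label(sender_domain("someone@bq--gcrmxyi.com")))
```

Checking every label rather than just the start of the hostname matters, since the encoded part could be a subdomain or sit behind "www.".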
Blah. Spare us your arrogant anti-English/US attitude.
Fact is, it is convenient to be able to block certain top-level country codes at the business gateway (or ISP) in order to cut down on spam.
Incidentally, someone's connection to the Asian population is most likely NOT through spam, since most spam coming from Asian top-levels is actually just U.S. spam--either routed through someone else's mail system, or with spoofed headers.
Whether or not this is compliant with one or more RFCs, it is entirely noncompliant with most. Internationalization of the Internet is inevitable and a Good Thing(tm), but only when it takes place via the appropriate processes. As others have pointed out, internationalization was already happening, but it takes time.
This is nothing more than an attempt by NSI to open another huge revenue stream without any consideration for the effect it will have on the Internet, or the long-term interests of the Internet community. After all, they see an untapped potential market and a chance to dominate it by jumping in before the standards are developed that would allow others to participate. Now their competitors will have to follow their lead or risk losing the market, and the standards process will have been neatly circumvented. The cost is borne by the Internet community, and the benefits are reaped by NSI.
Why did I vote for Nader? Now I remember...
Input String (UTF-8) -> Prepared String (UTF-8) -> Registration String (RACE): bq--gcrmxyi
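Out of curiosity, the registration string above can be picked apart by hand. This sketch assumes (from my reading of the RACE draft, so take it with salt) that when every character shares the same upper octet, the payload is base32 of that octet followed by each character's low octet. It handles only that one case and ignores the draft's escape and mixed-row rules; the function name is made up.

```python
import base64

def decode_single_row(label: str, prefix: str = "bq--") -> str:
    """Decode a prefixed label, assuming all characters share one upper octet."""
    body = label[len(prefix):].upper()
    body += "=" * (-len(body) % 8)        # restore base32 padding
    payload = base64.b32decode(body)
    row, low_octets = payload[0], payload[1:]
    # Rebuild each 16-bit code point from the shared row and its low octet
    return "".join(chr((row << 8) | low) for low in low_octets)

print(decode_single_row("bq--gcrmxyi"))
```

On the sample above this gives U+30A2 U+30CB U+30E1, which are three katakana characters, so the single-row reading at least looks plausible.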
--
EFF Member #11254
Please consider making an automatic monthly recurring donation to the EFF
I imagine that this will lead to numerous security issues due to slight differences in systems' support for multi-byte characters.
The problem isn't necessarily with buffer overflows, read Bugtraq...
There was a report a couple of weeks ago regarding a problem with internationalised IIS's where Unicode representations of directory traversal codes (., /, \, etc) were being substituted after access checks had been applied...
Now imagine domain based trust relationships - these will be implemented in numerous sub-systems (tcp wrappers, .rhosts, sendmail.cf, etc...) each of which may perform the normalisation/access checks slightly differently.
Another question (which I suspect will be answered in the FAQ) is do you need to register the same domain name several times to take account of the differing unicode byte widths?
How it works is there is a special prefix "<rp>" (or maybe this just represents the prefix, I can't really tell from the PDF, but I didn't think < and > were valid domain name characters) that indicates a part of the domain is encoded, followed by the encoded name which only uses ASCII characters, and includes information about which character set was used (Unicode, SJIS, etc.). The algorithm is called RACE, Row-based ASCII Compatible Encoding.
A couple of examples were given for both a domain name and a server name:
<rp>45dfg62de34432.COM
<rp>3df45gd345.<rp>45dfg62de34432.COM
So I guess you can set your spam filters to block any domain starting with <rp>! :)
#naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
I think this is 'a bad thing'.
o ps first...)
I don't think standards like this scale well.
What would happen if someone said
"Let's add 2 new data pins to RS232"?
I live in a country where we want 3 extra symbols to accommodate the language. They're all in Latin-1, of course. I don't even think that expanding to 8-bit Latin-1 is necessarily a good thing, let alone introducing an entirely new character encoding (16-bit) to the scheme of things.
I don't want to be
"f\0a\0t\0p\0h\0i\0l\0.\0o\0r\0g\0\0"
We don't let Russian trains into central Europe (the tracks are wider), so why should we let kanji into our character sets? (Yes, I know Russian trains do come to Europe, I live at the end of one of the lines, just not central Europe!)
Anyway, here's to 3-bit serial lines...
(Could I patent that? I'd need to design an IC of flip-flap-flup-flipflop-catflap-flatcap-fatcat-fl
FatPhil
Also FatPhil on SoylentNews, id 863
Will moderators shoot down the fact that I mention Microsoft?
Windows has had a CJK-capable kanji input scheme for years. CJK: Chinese, Japanese, Korean. Windows also has had bidi (bidirectional) support for right-left and/or top-bottom languages, including Hebrew.
If you have the appropriate cjk-input features installed, it's just a funky keyboard shortcut to open it up to enter kanji. If not, you'll probably be limited to clicking on visible links, not entering domain names or other text by hand.
I don't know what features Linux has to handle EFIGSS (English, French, Italian, Swedish, Spanish) differences, never mind bidi or kanji input.
What happened to "AND"? I want and back!!!
Man, you're pretty fucked up.
Bitchslapped? Give Rob a bitchslap from bitchslapped.com.
Um...just out of interest how often do you go to Asian sites? An estimate will do - maybe once since you first logged on? For the vast majority of Internet users in the West this will have no effect whatsoever at all because the vast majority don't speak Chinese, Japanese and the other languages which use different alphabet sets. The people affected will be the ones whose alphabets are being introduced, and therefore the ones who are likely to find it convenient not to have to use our system. The sites run by companies such as, say, Sony will continue to have sites which can be easily accessed by the rest of the world. A very black day for the net? Not really. More a sign that the system doesn't have to be designed by Americans for Americans.
So is there an RFC on how this works?
#naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
AFAIK the RFC describing URLs limits the valid characters in a URL to basically lower and upper case letters and some marks (like slash, underscore, etc.) But not even european letters are allowed. If so, having a chinese domain name is fine, but you can't have a URL pointing to it. Or can you?
Yeah, I'd kind of figured that, hence the reference to the fictional "UnicodeMap". I occasionally use character map programs for accents, and even know a few keyboard shortcuts for common ones. I can't imagine doing that for a whole line, let alone in a language I don't know enough (any) of to have a clue where to start looking for the character, which probably can't be displayed anyway because the necessary fonts are not installed. Chinese might as well be Martian in that respect.
I don't really think it's going to be an issue though; NonLatinAlphabet.com is almost certainly going to register their URL in the DNS supported languages of all the countries they wish to do business in and point them to that language version of the site. Ultimately it should make it easier for users who don't have Latin keyboards to get by on the web, and this is definitely a very good thing.
English may well be the lingua-franca of the web, but why should a Chinese speaker get to a Chinese web site, hosted in China, that is displayed in Chinese, by entering a URL in English? All web users require some support for Latin characters, and probably always will, but as a failsafe the reverse should apply too, and we can't fall back on IP numbers because the web is supposed to be using HTTP 1.1, isn't it?
UNIX? They're not even circumcised! Savages!
What about sites that want their corporate name in all these new languages (would Yahoo have to register its name under all the new languages?). Is there a market for this kind of registration?
Capt. Ron
crazy dynamite monkey
Had you ever actually considered what using the Internet must be like for non-English speaking countries? Probably something equally unpleasing to the eye.
Seeing as the Internet is supposed to be the medium that allows a break-down of barriers between nations and a free flow of information, don't you think that it might be a good idea to include as many languages as possible rather than exclude anybody who doesn't use a language that conforms to your standards?
I think you need to realise now, that English is not the only language in the world - in fact we're in a vast minority. It's possible that at some point enough people will undertake the task of learning enough foreign languages to free up communication between ourselves, and perhaps ultimately one language will be considered the accepted standard - however, don't expect that to be English.
----
No, it's not. This is one of the most brain dead decisions ever made, in the name of political correctness, with complete disregard for the practical issues. The effect of this will be to reduce the global appeal of the web, not increase it. Western surfers will now effectively be cut off from many far Eastern domains. Sure, there's a reasonable workaround for entering non-ASCII domains on an ASCII keyboard, but it's too complex for the general public, and far Eastern companies are unlikely to publish the ASCII-fied domain anyway. This is a very black day for the net...
"The invisible and the non-existent look very much alike." -- Delos B. McKown
No, the Unicode hiragana/katakana ranges are ordered in standard Japanese ordering, and the kanji in the CJK range is ordered in Chinese dictionary order (radical first, then stroke count). You do know that kanji means Chinese characters, right? It's not unreasonable to order them the Chinese way.
In IE or Netscape, look under the encoding menu. You will find 3 choices; Shift-JIS, JIS, and EUC.
Well, I also find Unicode (UTF-8) in IE, and both Unicode (UTF-7) and Unicode (UTF-8) in Netscape. You need to realize that Unicode is for displaying all languages, not just Japanese.
Most Japanese experts on this subject view Unicode as an unwanted Western imposition.
True... also known as "Not Invented Here".
It's a shame that this happened now, instead of 5 years ago. I bet if I had spent the last 5 years on a net with Asian characters in their domain names, I would've learned more than a few words in the language just from exposure. (The only real way to learn a language, imo.)
How are you supposed to be able to type all 40,000+ new characters? Are we going back to Escape-Meta-Alt-Shift for an upper case 'Q'?
Kierthos
Mr. Hu is not a ninja.
the commies tried with pinyin but it doesn't work very well because of the many homophones in chinese. hanzis are much cooler anyway and a more compact way of writing and representing data.
And what will the new ones look like to us Americans? Ugh, I can't bear to think of it.
"Ancillary does not mean you get to rule the world." --U.S. Circuit Judge Harry Edwards, speaking to the FCC's lawyer
... to see clueless news readers reading out a URL with all these characters in it ;)
This is probably an attempt to force migration over to Unicode. Anyways, why is Verisign behind this? Didn't we learn from Network Solutions that a privately-owned, commercial company is not the solution to internet domain name databases (and their "ownership")?
How can one company be granted the monopoly rights to something so important to the world's economy and everyone on the Internet again? Should this be assigned to a not-for-profit entity under the auspices of ICANN?
--
He lives in a world where those who do not run the client software of the omnipresent meme are unacceptable.
This is fundamentally a good idea for the future. It's also a prime example of the marketeers making decisions that the technology is not yet ready to support. My understanding is they're basically telling people that "we'll take your money and register your name, but if you can't use it (and you can't, for some time yet), you don't get your money back." Foo.
Since the majority of chinese users input their chinese as big5, (eg www.ê.com) will not be the same as the unicode equivalent
I think it's probably not too difficult for the Chinese browsers to do the conversion behind the scenes. Kinda like ASCII/EBCDIC conversions; you don't need to change the keyboard to enter text of the other variety.
Now, which one does the registrar accept, and the DNS servers cache? Read the article? From the first couple pages, it appeared that the domain name is actually not in Unicode nor Big5; it's translated to an ugly ASCII encoding.
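The behind-the-scenes conversion really is straightforward, since Big5 and the Unicode encodings are just different byte spellings of the same characters. A small sketch using Python's built-in codecs; the sample characters and the helper name are mine, not anything from the VeriSign docs:

```python
def same_text(raw_a: bytes, enc_a: str, raw_b: bytes, enc_b: str) -> bool:
    """Compare two byte strings by the characters they represent, not their bytes."""
    return raw_a.decode(enc_a) == raw_b.decode(enc_b)

name = "\u4e2d\u6587"              # two common Chinese characters
as_big5 = name.encode("big5")      # what a Big5 input method would produce
as_utf8 = name.encode("utf-8")     # what a Unicode-based system would produce

print(as_big5 != as_utf8)                            # the raw bytes differ
print(same_text(as_big5, "big5", as_utf8, "utf-8"))  # the characters do not
```

So as long as the browser knows which encoding the user typed in, it can normalise to one form before producing the ASCII name that actually gets looked up.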
Michael Everson of Everson Gunn Teoranta has proposed an encoding of Klingon in Plane 1 of ISO/IEC 10646-2; if it gets adopted, future versions of Unicode may adopt it (Everson's one of the editors and authors of Unicode 3.0).
Why do fuckwits hide behind AC?
Wide gauge trains physically cannot come to _CENTRAL_ Europe, where the 6" narrower gauge is used.
However, I can hop on a wide gauge train here in Helsinki which goes all the way to Moscow.
You see not all of Europe is CENTRAL Europe.
I'm sure you'd agree that not all of America is Central America. Screw it, I don't need your agreement, your opinion is less than worthless.
Now safe me the fucking effort and go kill yourself.
FatPhil
Also FatPhil on SoylentNews, id 863
I have occasion to buy an international airline ticket this year, and I refuse to use priceline because they have Will Shitner doing their ads. Give me Nemoy, Stewart, Dorn, Spiner, McFadden, anyone but shitner. Blow me priceline.
Man, you have got some real problems, don't you? Did Shatner beat you as a child or something? I mean, I'm not crazy about Troi, but it's not like I carry some kind of grudge. And you manually typed in a .sig as an anonymous coward? That's just weird.
"The question of whether a computer can think is no more interesting than that of whether a submarine can swim" -EWD
So what you're saying is that it's ok for non-English speaking people to try and use our ASCII system but totally wrong and inappropriate for them to have their own native language system and for us to try and learn how to use that? It would seem you embrace the global village idea... providing it is English speaking and conforms to your native character set.
--
Democracy is the art of saying "nice doggy" while subtly reaching for a large stone.
So, the era when humans could remember an easy, pronouncable name instead of an IP address is over, then?
I guess I better start learning the numbers.
---
This message has been ROT-13 encrypted twice for higher security.
The proposal includes umlauts - it's based on a mapping to US-ASCII from any Unicode string. (Admittedly if you only wanted to represent a handful of European languages you'd come up with a different scheme, but it would obviously be less general.)
Presumably they're pitching it at the asian market cos that's where they expect to make money.
There are apparently good reasons for not allowing 8-bit characters not in US-ASCII in domain names - it would break too much.
- Alan
I think she runs Solaris now. *sigh* a pornstar after my own heart
"I just can't sit while people are saying nonsense in a meeting without saying it's nonsense" J Watson, Sci Am 288:(4)51
I.e. Gießer remains Gießer, but daß becomes dass.
--
Unselfish actions pay back better
Hot Asian teens? I didn't know there were any. Well, maybe if you're a latent homosexual who likes flat chested, smush faced girls.
Asia Carrera,
and she runs Linux.
I think there's a company in Richmond WA that claims to make one...
Damn you're quick. Of course the whole point of this is to provide a work-around to that problem. All it does is make an ASCII representation of a different character set. These representations are flagged by having the hostname start with bq-. So if you run across a hostname that looks like bq-safjdlfaqwue72819.bq-hewaguifuifdajhks.co.jp you'll know that the hostname probably makes good sense to anyone who has a Japanese web browser. If you are in the habit of reading such pages you'll get the appropriate plugin. If you don't have the plugin, you probably couldn't read the content anyway and believe you me, there is a LOT of content on the web that's written in a language you can't read. (I'm not saying that you're stupid or anything, I'm just making the bet that there isn't anyone here who knows every language in which material has been posted to the internet, and this includes Klingon.)
_____________
I don't want free as in beer. I just want free beer.
I know Al Gore invented the internet in terms of convincing Congress to heavily fund the net... and some other congressman opened up the net to commercial use... but wasn't the web invented in Europe (CERN)? and aren't domain names a big part of the web? does that mean that if we try to keep asian hordes away from the net, the europeans will try to keep the crass american lummocks from using the web? ^^;; -confused
--
Peace,
Lord Omlette
ICQ# 77863057
[o]_O
The web was not invented by Americans, even if you like to believe that :p .. you know, that's in Europe
It was invented at CERN in Switzerland
Before you email me, remember: "There is no god!"
ummm... DNS is only used in name resolution, packets are routed according to the IP address once resolved which is totally unrelated to the domain name - that happens right now - nothing has changed.
If anything, extending the number of TLDs will reduce latency as it will spread the load across more servers, probably on a geographical basis!
Feel free to troll, it's your god given right, but do try to remember that acting both jingoistic and technically ignorant in the same mail is very unlikely to get you any respect.
So that they can centralise more power to themselves.
Verisign owns Network Solutions and Thawte.
So they own your certs (need to be renewed) and your names (refer to Network Solutions' terms and conditions).
And there's this push for DNSSEC, which isn't that great anyway. But it'll be a convenient tool to centralise even more power.
Open your eyes a bit and you'll see more scary stuff.
Soon there'll be a bigger push for certificates becoming mainstream - via smartcards and other stuff. And Windows 2000 has some nice support for that... Maybe Microsoft will buy Verisign.
What do you think?
Have fun,
Link.
There are a couple of Japanese domain names I've thought of purchasing, but would rather use the CORE registrar joker.com than register.com due to the difference in price (joker.com is around $8-11 per year, depending on the exchange rate of the Euro). I was sad to see that I'd have to use register.com and spend $20 for a Japanese domain name.
But now what's to stop me from looking through the RFC, figuring out how to encode my domain name using RACE, and then registering it using joker.com as a domain name that begins with "bq-"?
I'm gonna get Släshdöt.org. It has a more "heavy metal" feel to it, like "Mötley Crüe".
-B
Have to try that one when I get to work tomorrow.
(NJStar's not bad as IME's go - at least it's not a Microsoft product)
Yes, that's *exactly* what I'm saying. I'm not saying it because I happen to use ASCII, but because ASCII is a more natural system for computers to deal with. If Western European and American languages consisted of 30000+ characters, and those in the East consisted of some 100 or so, I'd suggest using the Eastern system at the drop of a hat, even if it wasn't my native system. This has nothing to do with whether or not it's my native character set that's chosen, and everything to do with whether a good decision is made from a technical perspective.
"The invisible and the non-existent look very much alike." -- Delos B. McKown
I want to be able to register domain names in French, German and Russian too. If they are going to support all three zillion kanji and Chinese characters, they need to at least support the various Cyrillic and eastern European Roman alphabets, and the rest of ISO-Latin-1 (which covers all the major and most of the minor Western European languages.) The Arabic-script alphabets (Arabic, Farsi, Urdu, etc) and Hebrew are written right-to-left, so I suppose those won't be implemented right away, and Thai has a script of its own, but they all need to be on the drawing board.
If all those other languages are accounted for, I view this as a good thing. If this is part of an overall shift to Unicode on the web, then all these languages are automatically supported, and I would think it an even better thing.
or in a url (using directories)
the domain name wouldn't work though, they are talking alphabetic symbols rather than length of domain name, i.e. you have the existing English alphabet of 26 letters + 10 numbers & - for _each_ char of the allowed 67, now you can also use ascii encodings of asian characters as well.
~ppppppppö
Actually, most of my spam is from Asian top-levels (mostly cn) and in some CJK encoding. (Not being able to read it, I don't know if it's _really_ US spam in a foreign language, but....)
Furthermore, much of that spam comes through the same set of systems which never seem to do anything about it.
argh, i want it to be easier to tell urls apart from each other, not harder.
--
The shareholder is always right.
I think the point of the original quote of 37 characters max is the 'old' number of characters in the symbol set that were allowed, not the length of the actual URL. And your article from 2600 lists a maximum URL length of 63, not 67.
BTW, are hyphens and tildes inter-changeable? Because I've seen a lot of web-pages with tildes, and only some of them turn into hyphens when reloading.
Kierthos
Mr. Hu is not a ninja.
So how many ways can we now register "sucky-sucky.com"?
Within a few minutes of this story being posted, most of the posts are along the following lines.
I dunno; maybe because the Japanese don't know enough German? Why should the Asians wait for Europe to get its act together before they solve the issues they face every day?
Well, if your only connection to the Asian population is spam email, this should make your isolationism even more simple: the standard uses a standard prefix for RACE-encoded domain names; block those and you're in arrogant English/USian bliss.
I dunno, just a guess, but maybe someone's already thought of this? Perhaps the people who work in kanji all day know something about entering kanji, and have hardware or software solutions around. If you don't normally have to type it, I'm sure your browser will let you CLICK on encoded links just fine.
Missed anything?
If it's implemented properly, surely it shouldn't matter. It's not just the size of the Unicode chars, but also the big and little endian-ness. If it's implemented properly, the DNS would just determine what you're using (UCS-2BE, UCS-2LE, UCS-4BE, UCS-4LE) and convert it to its internal representation for the lookup.
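A quick illustration of that "determine what you're using and convert" step. This is only a sketch: it uses Python's UTF-16/UTF-32 codecs as stand-ins for UCS-2/UCS-4 (they agree for characters outside the surrogate range), and the table and function names are hypothetical.

```python
# Map the wire formats named above onto Python codec names (stand-ins, see note)
WIRE_FORMATS = {
    "UCS-2BE": "utf-16-be",
    "UCS-2LE": "utf-16-le",
    "UCS-4BE": "utf-32-be",
    "UCS-4LE": "utf-32-le",
}

def to_internal(raw: bytes, wire_format: str) -> str:
    """Convert bytes in a declared wire format to one internal representation."""
    return raw.decode(WIRE_FORMATS[wire_format])

name = "\u65e5\u672c"
# The same two characters spelled four different ways on the wire
encoded = {fmt: name.encode(codec) for fmt, codec in WIRE_FORMATS.items()}

for fmt, raw in encoded.items():
    print(fmt, raw.hex(), to_internal(raw, fmt) == name)
```

All four byte strings differ, yet each converts back to the same internal name, so one registration would cover them provided the format is declared or detectable.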
By the way, I think it's actually 64 characters, assuming a three-character TLD. I think that they allow you to register 67 characters, but the TLD (.com, .net, .org, etc.) counts. But the dot in the TLD doesn't. Or something confusing like that...
SUWAIN: Slashdot User Without An Interesting Name
Yeah, I assume that is the case. That's how it works with KanjiKit as well. But for those without an appropriate IME, it will look like junk. Oh well.
I guess I can look at this two ways...
1) Oh God, there's gonna be a MASSIVE amount of spam coming from domains with characters outside of the standard 37.
2) I can block anything and everything coming from domains with characters outside of the standard 37.
-S
--- What parts of "shall make no law", "shall not be infringed", and "shall not be violated" don't you understand?
hey quit knocking pinyin, it lets us gwai lo learn easier if nothing else :P
~ppppppppö
A few things worry me a bit. First there's the part of the RACE working draft where they mention that you have to follow all the MUST and MUST NOT statements "exactly"; otherwise it's "likely to cause damage to the Internet".
Then there's the issue of the chairman of the IETF basically calling this premature...
You bring up a very good point. Likewise in Japan, nobody uses Unicode. The preferred encoding scheme is Shift-JIS (JIS = Japanese Industrial Standard) which has been in use since 1969. The usage is probably over 90%. The reason for the popularity is that shift-JIS was designed in Japan and extremely well-planned. Characters are sorted according to the Japanese alphabet ordering (Unicode uses random ordering), and ideograms sub-divided into compulsory, common, and extended (Unicode uses random ordering). JIS and EUC comprise the remaining 10% of usage. My site uses shift-JIS. Yahoo Japan uses EUC. One should try to search for a web site in Japan that uses Unicode. I think you will find none. Even if you do, your browser will not be capable of displaying it! I'm not kidding. In IE or Netscape, look under the encoding menu. You will find 3 choices; Shift-JIS, JIS, and EUC. Let's face it. Unicode is a badly designed standard conjured by an uninformed committee. Most Japanese experts on this subject view Unicode as an unwanted Western imposition.
Download Mazes and Puzzles from www.puz.com
This would allow all transports to ignore the character encoding as long as the encoding only uses bytes with the high bit set for non-ascii. It also means that case-independence of non-ascii would be illegal, thus stopping the emergence of a dangerous (for security) mess of incompatible implementations of equality tests for URLs.
This would allow us to use UTF-8 for the URL, for the page contents, for email, for everything, and we would not have this horrid mess of prefixes and mime types.
Yes, some programs, routers, etc, would not pass this stuff through. Well, tough, those should be obsolete!
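The property being relied on here is that in UTF-8 every byte of a non-ASCII character has the high bit set, so a byte below 0x80 always means a real ASCII character. A minimal sketch (function names and the sample URL are made up) that checks the property and shows a parser splitting a URL on ASCII delimiters without ever decoding it:

```python
def ascii_bytes_only_mean_ascii(text: str) -> bool:
    """Check that no non-ASCII character produces a byte below 0x80 in UTF-8."""
    for ch in text:
        encoded = ch.encode("utf-8")
        if ord(ch) < 0x80:
            if encoded != bytes([ord(ch)]):
                return False
        elif any(b < 0x80 for b in encoded):
            return False
    return True

def split_host_bytes(url_bytes: bytes) -> bytes:
    """Find the host part by scanning for ASCII delimiters, no decoding needed."""
    after_scheme = url_bytes.split(b"://", 1)[1]
    return after_scheme.split(b"/", 1)[0]

url = "http://www.\u65e5\u672c.jp/index.html".encode("utf-8")
print(split_host_bytes(url).decode("utf-8"))
```

Because "/", ":" and "." can never appear inside a multi-byte sequence, software that only looks for those delimiters keeps working even if it has never heard of UTF-8.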
According to the article, they're working on a substitution scheme so ASCII only users can still type in the URLs. Does this mean that ASCII equivalents will be arbitrary and unintuitive? If so, that's a problem. Let me propose something slightly different:
Unicode is not supposed to over-unify characters, so the ASCII fallback for Japanese could be the romaji transcription - and therefore registering a Japanese domain name automatically registers the romaji equivalent, except that some kanji have more than one possible romaji transcription.
However, some kanji are unified with Chinese characters, which have a different pinyin transcription.
Chinese is another problem. The logical ASCII equivalent is pinyin stripped of its diacritical marks. But then, many different characters may have the same transcription.
All Cyrillic languages also have an ASCII transcription scheme too, but it isn't unified. One character may be transcribed one way in Russian and another way in Bulgarian. Is there a unified transcription scheme for all Cyrillic languages, and is it truly one-to-one? I don't think so. Look at the character usually transcribed as "j" in Russian, and the one usually transcribed that way in Serbian.
ISO-Latin-1 and -2 fallbacks: For ISO-Latin-1, the fallbacks are pretty obvious: "Champs-Élysée" ==> "Champs-Elysee" or in German "Düsseldorf" ==> "Duesseldorf", but in Czech it's a little less obvious. Does "C hacek" map to "Cz" or "Ch" or "Cs"?
So, here is a possible solution: devise unified ASCII transcription schemes for each language, admitting whatever ambiguities exist in Japanese or similar languages. Then, when you register a non-ASCII name, you are asked on the form to fill out the transcribed ASCII name that corresponds to it and it is also automatically registered to you.
There is some potential for conflict here, if the ASCII transcription corresponds to an existing registered domain or, as in the case of Chinese, more than one foreign name corresponds to the same transcription, but I think the problem is manageable.
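A rough Python sketch of the Latin fallbacks described above, under my own assumptions: the function name and the German override table are hypothetical, and the default rule (strip the accent) is just one possible choice. It also shows why the Czech case is awkward: without a per-language table, "Č" simply collapses to "C".

```python
import unicodedata

# Hypothetical per-language override table (German convention)
GERMAN = {"ä": "ae", "ö": "oe", "ü": "ue",
          "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"}


def ascii_fallback(name, overrides=None):
    """Transcribe a Latin-script name to ASCII.

    Characters in `overrides` are replaced first; everything else is
    decomposed (NFKD) and its combining accents are dropped.
    """
    overrides = overrides or {}
    out = []
    for ch in name:
        if ch in overrides:
            out.append(overrides[ch])
            continue
        decomposed = unicodedata.normalize("NFKD", ch)
        out.append("".join(c for c in decomposed
                           if not unicodedata.combining(c)))
    result = "".join(out)
    # Raises UnicodeEncodeError if a character has no ASCII fallback here
    result.encode("ascii")
    return result


print(ascii_fallback("Champs-Élysée"))
print(ascii_fallback("Düsseldorf", GERMAN))
print(ascii_fallback("Č"))
```

Anything outside the Latin script (kanji, Cyrillic) raises an error in this sketch, which is where the hand-filled transcription field on the registration form would come in.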
To all the people that are complaining about how this will break things: I say good, this will help the industry realize that the USA is not the only country on the internet. I am glad to see progress.
So how's this gonna work for systems not set up to handle the asian character set?
Read the links.
The proposal implements an ASCII encoding scheme, called RACE. A certain prefix (they list the debugging prefix as "bq-") indicates a RACE-encoded domain name.
The rest of the ASCII encoding either appears in ASCII for dumb browsers, or is converted to Unicode or Big5 or whatever character set it wants.
For "dumb browsers" (not a flame, just an indication of character-set-awareness), you'd see some crazy domain like http://www.bq-ag0970ag00ah07h.or.jp/; for "smart browsers," it would appear in your own kanji font.
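I don't have the RACE details handy, so the sketch below swaps in a different scheme: Punycode with the "xn--" prefix, which is what Python's built-in "idna" codec implements. It is not RACE and not what VeriSign's testbed uses; it only illustrates the same idea of a prefix plus an ASCII-only encoding that a "dumb" resolver can carry and a "smart" browser can decode. The hostname and function names are made up for the example.

```python
def to_ascii(hostname: str) -> str:
    """Encode each label of a hostname into an ASCII-compatible form."""
    return hostname.encode("idna").decode("ascii")


def to_unicode(hostname: str) -> str:
    """Decode an ASCII-compatible hostname back to its native characters."""
    return hostname.encode("ascii").decode("idna")


# Hypothetical hostname, for illustration only
native = "bücher.example"
wire = to_ascii(native)

print(wire)                 # what an ASCII-only system would see
print(to_unicode(wire))     # what a character-set-aware browser could show
```

Plain ASCII names pass through unchanged, so existing domains are unaffected by this kind of scheme.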
Has there been an update to the DNS RFC allowing this? If I remember correctly, it does NOT allow special chars in domain names.
Furthermore, does this limit those domains to 32 chars of length? (Unicode, 2 bytes per char; the DNS system allows a maximum of 64 chars for domain names .. but that should probably be interpreted as bytes).
Also, doesn't it kinda suck to make large parts of the net unavailable for most?
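The arithmetic behind the length question, as a small Python sketch. The one fact I'm relying on is that RFC 1035 limits a single DNS label to 63 octets (close to the 64 quoted above); the helper names and the sample label are my own. An ASCII encoding with a prefix would presumably leave room for fewer characters than the raw figures here.

```python
MAX_LABEL_OCTETS = 63  # per-label limit in the DNS (RFC 1035)


def max_chars(bytes_per_char: int) -> int:
    """How many fixed-width characters fit in one DNS label."""
    return MAX_LABEL_OCTETS // bytes_per_char


def label_octets(label: str, encoding: str) -> int:
    """Size of a label in octets under a given encoding."""
    return len(label.encode(encoding))


print(max_chars(2))                         # 2-byte Unicode characters
print(label_octets("日本語", "utf-16-be"))   # 2 bytes per character here
print(label_octets("日本語", "utf-8"))       # 3 bytes per character here
```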
--paddy
--
"Rune Kristian Viken" - http://www.nwo.no - arca
A few notes...
The Internet Society probably isn't too happy about this. They released a statement on November 8th encouraging NSI to back off and let the IETF IDN WG do its job.
Also, there are companies already operating in this market, including WALID, which is taking registrations for Arabic domain names (AND RESOLVING THEM), and will soon be adding Hindi, Tamil, and two Chinese scripts before moving into other markets.
Because the Mediterraneans figured out that if they came up with simple symbols that represented sounds (an alphabet) and could be strung together to transcribe spoken words, instead of separate ideograms for each spoken word, you could not only learn to read and write much more easily, you could also write down other languages with the same written symbols.
One of the major reasons this happened was that they were trading with different peoples who used ideograms instead of alphabets. Since learning one ideogrammatic written language is hard enough and learning 5 is a single lifetime's achievement, a simpler way was found.
The Chinese were homogeneous and didn't need to deal with anyone other than the Chinese and hence kept their ideogrammatic written language.
It's a simple fact that it's far easier to implement the Roman alphabet on a computer than a zillion independent symbols -- you need less RAM, simpler displays and so on.
What the Chinese need to do is settle on a single way to transliterate spoken Chinese into the Roman alphabet (or even the Cyrillic, Hebraic or Greek if that's what they want). Ideograms are neat, but they're a pain in the ass.
Sorry, it's not cultural imperialism, just pragmatism!
You already are, unless you actually speak those languages. You didn't think the internet was all English-speaking, did you?
ahhh.. the light just turned on, thank you. the verizon story is still an interesting read.
Wouldn't it make more sense to implement umlauts like ö/ü/ä first?
I have dibs on släshdot.org!!
Scuttlemonkey is a troll
w3m, the console web browser that can format tables, frames, etc, was written by Akinori Ito. He includes support for kanji. I know because there is an #ifdef PC_KANJI that is misplaced every time I go to download and compile it without Japanese character support.
I believe there is also an xterm counterpart for kanji.
Mike
"I would kill everyone in this room for a drop of sweet beer."
Hmm. This could lead to fun. Some character sets/character encodings allow different byte sequences to map to the same character.
(See the Unicode bugs recently in IIS, where a Unicode representation of '../' is used to navigate upwards in the directories of the server to view files outside of the server root.)
Now, does a company have to register all possible permutations of byte sequences which all map to the same character sequence? As well as doing so in .com, .net and .org?
We'll see.
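Two quick Python illustrations of the "different bytes, same character" problem, both my own examples and not taken from the IIS advisory. First, "é" can be one code point or "e" plus a combining accent, and only normalization makes them compare equal. Second, the byte pair C0 AF is an overlong UTF-8 spelling of "/", the kind of thing behind the directory-traversal trick; a strict decoder has to reject it.

```python
import unicodedata

precomposed = "\u00e9"   # é as a single code point
decomposed = "e\u0301"   # e followed by a combining acute accent


def canonical(label: str) -> str:
    """Normalize to NFC so equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", label)


def accepts_overlong_slash() -> bool:
    """Return True if this decoder would accept C0 AF as a '/'."""
    try:
        b"\xc0\xaf".decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(precomposed.encode("utf-8"), decomposed.encode("utf-8"))
print(canonical(precomposed) == canonical(decomposed))
print(accepts_overlong_slash())
```

If the registry normalizes names before comparing them, a company would only need to register the one canonical form; whether every application does the same is another matter.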
"If it's implemented properly, surely it shouldn't matter." The "if" is exactly what I meant... after support for Unicode is added to domain name encoding schemes in applications, each and every application has an opportunity to make a mistake... some of them will.
So how's this gonna work for systems not set up to handle the Asian character set? Let's say I want to send to joe.bloggs@somechinesename.net from my FBSD or Linux boxes? Not too much fun, I think...
--
WolfSkunks for a better Linux Kernel
$Stalag99{"URL"}="http://stalag99.keenspace.com";
--
# Canmephians for a better Linux Kernel
$Stalag99{"URL"}="http://stalag99.net";
Wouldn't it make more sense to implement umlauts like ö/ü/ä first?
Easier to test etc..
Before you email me, remember: "There is no god!"
If I can't type in the new domain names.
Maybe I'd better upgrade to a Unicode compatible keyboard and OS.
How is this going to work? Since the majority of Chinese users input their Chinese as Big5,
(eg www.ê.com) will not be the same as the unicode equivalent..
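A small Python sketch of the Big5 versus Unicode mismatch. The sample name is my own pick; the codecs are the standard ones that ship with Python. The same two characters come out as different byte strings, so a byte-for-byte comparison fails unless both sides decode to characters first.

```python
name = "中文"  # sample label, chosen for illustration

big5_bytes = name.encode("big5")
utf8_bytes = name.encode("utf-8")


def same_name(a: bytes, enc_a: str, b: bytes, enc_b: str) -> bool:
    """Compare two encoded names by decoding them to characters first."""
    return a.decode(enc_a) == b.decode(enc_b)


print(big5_bytes.hex(), utf8_bytes.hex())
print(big5_bytes == utf8_bytes)
print(same_name(big5_bytes, "big5", utf8_bytes, "utf-8"))
```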
The general FAQ answers how the names will appear in a web browser, but they use a GIF to show the Chinese name. So I'm still wondering how it will look to someone without an OS that displays the characters properly. Never mind that you can download extensions to display the content in the web browser; the location will be garbage, right?
Will this be a good kick in the butt for internationalization of your OS?
its all s/w
That would be interesting to implement, considering the RTL direction.
Could possibly mean that a domain name will have to contain some more fields, like charset and direction (RTL|LTR)
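For what it's worth, Unicode already tags each character with a direction class, so a separate direction field may not be needed. A rough Python sketch, using a simplified "first strong character" rule that I'm assuming for illustration (the full bidirectional algorithm is more involved, and the function name is mine):

```python
import unicodedata


def direction(label: str) -> str:
    """Guess text direction from the first strongly directional character."""
    for ch in label:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):   # Hebrew-style and Arabic-style letters
            return "RTL"
        if bidi == "L":           # Latin, CJK and other left-to-right letters
            return "LTR"
    return "LTR"  # digits and punctuation only: default to left-to-right


print(direction("example"))
print(direction("\u0645\u062b\u0627\u0644"))        # an Arabic word
print(direction("\u05d3\u05d5\u05d2\u05de\u05d4"))  # a Hebrew word
```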
-- "If you had fallen into a shit pit during a battle, lick yourself off and move on." - Jaroslav Hasek
I'm running a special clinic on the central reservation of the M4, pop over on foot and we'll discuss it.
Actually, www.nic.nu has been doing a similar thing for .nu domains for 5-6 months now. You can register a domain under .nu with all characters in iso-8859-1 (latin-1), and they are possibly going to add support for iso-8859-2 (latin-2) as well.
BTW, a bit of info on iso-8859-1 and iso-8859-2: iso-8859-1 is primarily used in western European countries, while iso-8859-2 is used in southern and eastern Europe.
-- http://electronicintifada.net --
Anyone wants to encode DeCSS in a domain name?
I'm surprised it took so long for somebody to do this. I don't relish trying to learn a whole new set of shortcuts (my grasp of the 255 odd ASCII set is slipping fast, never mind kanji!). I did a story about this yesterday called over at http://www.t3.co.uk. It's nice to see that the global part of the Internet is still spreading...
No need to remember those numbered IP addresses, while surely harder to track down by US law enforcement.
just kidding.
Can someone say buffer overflows?
I am !amused.
will english users who don't like to put in korean and/or japanese language inputs on their box3n be completely cut off from a good deal of the net? Japanese input screws up windows keyboards (98se, m$ natural keyboard) and Korean messes up the fonts royally. Maybe it's a good thing, so that lego guy can get his own domain name, put his heart and soul into his work, then he won't have to be criticized by a bunch of assholes saying he's just another obsessive compulsive Japanese. :(
--
Peace,
Lord Omlette
ICQ# 77863057
[o]_O
Ok, it's easy if you have the right keyboard, but how would those of us with Latin alphabet keyboards, or anyone lacking some of the newly supported characters for that matter, access a URL that contains characters not available on our keyboard?
Where's the RFC?
IS there an RFC?
I can see it now; "UnicodeMap - your essential tool for surfing far-east pr0n sites with dodgy URLs and even dodgier content..."
UNIX? They're not even circumcised! Savages!
Come on, we invented it, we populated it, we control it, and now the Asian hordes are trying to subvert it. Almost right...we invented it, that's true. We control it is also true. But it only takes one quick search for a totally innocent topic, ie. the other day I was looking for info on a bug in MSIE, and you get a shitload of asian porn sites popping up. I wouldn't care, but I can't find the Preview button. All the more reason for them to use english only chars :)
Tom
The vast majority of people who understand only GB but not Big5 code are either illiterate or don't have computers.
Simplified Chinese was, and still is, a good intent but the wrong thing to do. Often the characters before simplification are easier to learn than their simplified equivalents.
They gotta fix the root of the problems first.
Look at all the terrified geeks run for cover as their tidy little Euro-American paradigm begins to crumble before the advancing Yellow Hoard.
In reading all the above posts, you can see terms like, 'A Preposterous System!' and 'A Dark Day Indeed!'
You fucking losers.
What the hell did you *think* was going to happen after everybody tripped over themselves to cash in on selling billion dollar phone systems and western tech deals to China?
Silly White Folks. We must now all perish!
If you thought the Evil American Empire was bad, try one on for size which downright punishes the concept of individuality. . . "The nail which stands up will be hammered down."
Embrace the Dragon, Bomb the Dragon, or be Dragon Food.
-Fantastic Lad
I love the Smell of Xenophobia in the Morning!
But the plain, pathetic truth is that Unicode doesn't really work yet. For example, if you want to guarantee that your web page will not display properly for a large percentage of users, try spelling non-English names like Schro:dinger with Unicode. The first thing we need is for people selling operating systems to stop thinking of Unicode as something they could add later and charge extra money for. For now, Unicode is like the third rail of computing.
--
Find free books.
This idea is simply preposterous.
We're turning the web into distinct little pieces that will be unable to interact with each other, while the interest of the web is its variety and world-wide reach.
Without a Japanese OS, inputting Japanese text, even in a Japanese version of Internet Explorer is a _pain_. (compared to just installing Japanese fonts to read web pages in the language) So with those URLs, we are deliberately limiting the number of people who can access the pages, by making it impossible for some to even type the address. I don't want to have to memorize strings of numbers to find my way around the net.
I don't see where the problem is with romanizing foreign words in URLs. English is the default language for international communications, and all 'language' editions of web browsers can type English characters. So this would make more sense, and keep URLs accessible to all.
Imagine needing a browser that can accept _all_ possible character sets (Russian, Arabic, Chinese, Japanese)... o_O
(Note: I am French and I still find those French URLs unbelievably silly)
I'm reminded of whoever it was here who had the .sig:
"Programmers are so enthralled by the fact that they can that they seldom think about whether they should."
I think this is a perfect example.
The fact is that the internet operates in 8 bits, with 8 bit bytes being sent across the wires. Every piece of software on the planet (almost anyway) uses 8 bit bytes. And, for convenience, 8 bit display characters. I'd just as soon not see programmers all over the world add needless layers of complexity to support all this crap. In a hundred years, there will be a global language anyway - if anything we should be vehemently refusing to pointlessly break perfectly good code to support local quirks. Not to mention the other associated hassles of translation and maintainability.
I wouldn't be as pissy about it if this wasn't forcing Unicode adoption. I think it would be far more effective to simply throw out the idea of Unicode for any network infrastructure and force those languages that currently need it to make their own 8 bit substitutions. They are going to have to do that eventually anyway, might as well start right now.
And no, I don't care if the global language is English. Esperanto or German would be fine with me. (Side note: did you know that the German language has officially thrown out a couple of characters, most notably the eszett, for similar reasons?)
-dentin
Alter Aeon Multiclass MUD - http://www.alteraeon.com
Are companies protecting their translated domain names? Companies like Coca-Cola, which have a big presence in Asia, also have different brand names to accommodate the language barrier. Could I buy the translated version of Microsoft.com (if there is one, of course)?
www.ermac.org - pick a number.
Isn't it odd that the acronym for the encoding scheme of Asian domains is RACE? Who's in charge over there at Verisign, the Ku Klux Klan?
there are 2 kinds of people. those who divide people into 2 kinds, and those who don't.
I noticed a promotion for this on the Network Solutions website a week or two ago. I think that this is great, but we need TLDs in these characters as well: one with the Chinese character for commercial, one for organization, one for educational. I wonder if that new TLD system that they are testing will allow these characters. For 50,000, you could register one of these Chinese TLDs and probably make a lot of money.
-no broken link
According to this story at 2600.com, the current maximum allowed characters for a domain name is 67. That story is a very interesting read about how Verizon sued 2600 for registering www.verizonreallysucks.com, so 2600 took advantage of the 67 character max and registered
www.VerizonShouldSpendMoreTimeFixingItsNetworkAndLessMoneyOnLawyers.com