Internationalized Domain Names Coming Soon
rduke15 writes "You think you know how to parse a domain name for validity? Well, in case you haven't noticed, things are getting tougher as registrars keep adopting IDN (Internationalized Domain Names), which uses a weird encoding named Punycode to enable accented characters in domain names. The Register reports on Switzerland, Germany and Austria's joint move to enable IDN. See the overview in English from Switch. But I guess it would be difficult to talk about this on /., since it does not even support basic Latin-1 ... :-)"
More ways for trolls to disguise goatse.cx links...
Sounds like a job for Unicode.
Unicode.org
It looks to me like the problem is that the DNS servers don't support unicode so they're using a bad implementation of it.
Why not extend DNS to support Unicode? That way there'd be no translation or other crap to go through.
Granted, software would need changing, but that would be the case with the mangled crap mentioned in the article anyway.
What am I not understanding here? Or is this just an implementation dreamed up to make life complicated?
There's a gorilla from Manila who's a fella that stinks of vanilla and has salmonella.
I'm sorry, is it just me, or do they seem to be taking a bad shortcut to get to a good end? It doesn't seem like they are doing this correctly. Why not plan to migrate to Unicode? Their choice seems shortsighted and flawed. I hope they at least considered Unicode and came up with real reasons not to use it.
But I guess it would be difficult to talk about this on /., since it does not even support basic Latin-1
Just say the ascii number?
When anger rises, think of the consequences.
Confucius (551 BC - 479 BC)
I'm not sure what all the accented letters in the alphabet are; will I have to know how to type them to access a simple website? Sorry, but this doesn't make using the net easier.
Trolling is a art,
Now the European Union will want everyone to click on the left side of the mouse, left-handers be damned.
The French will demand that "bandwidth exceeded" errors be renamed to "(web page) surrenders"
The Germans will try to take over the internet.
In a sneak attack, the Iraqis will launch a massive DDoS attack, but accidentally hard-code localhost in the trojan. The Iraqi information guru will deny everything.
Taco is a bad boy.
While it's logical for, say, Chinese companies to have a Chinese domain name and Chinese e-mail addresses, it may not be the best choice if the company wishes to expand overseas.
Unfortunate but true: if a company has a Chinese domain name, it would probably only be used within China, Taiwan, Hong Kong, Singapore, Japan (since it's Unicode), and maybe South Korea. The company would be pretty much limited to the East Asian market.
However, I suppose the company could get both a Chinese domain and an English, or rather Pinyin, domain so they could make their Chinese, or maybe other Asian clients feel "closer" while also being able to reach clients outside of East Asia.
I also think that it'd be great to give people the option of having a native-language email address. It's not too hard to set up a romanized email alias for it. An SMTP "X-Roman-Address" header could even be added to outgoing messages in case a recipient can't read the default "From" line.
There's 10 types of people in this world, those who understand binary and those who don't.
I sure hope this harebrained idea doesn't take off.
After all, now they need to worry not only about registering, say...
Microsoft.tv
Microsoft.com
Microsoft.net
Microsoft.org
etc.
But also
Microsoft.com
Microsoft.com
Well, you get the picture.
--Won't that be grand? Computers and the programs will start thinking and the people will stop. - Dr. Walter Gibbs
I'm delighted to report that Mozilla is one step ahead again: it has supported IDN since version 0.9.5. http://www.mozilla.org/projects/intl/idn_mozilla.html
I have mixed feelings about this. I am from Sweden, and it always looks kind of ugly when names lose their dots and circles in the domain name.
On the other hand, this is also quite convenient. I live in the US now, and I travel around quite a bit. I often surf on Swedish Internet sites, typically without access to a Swedish keyboard. It would not be very convenient if the domain names used non-English symbols.
Sometimes I go to Japanese sites also, and I am really glad that I don't have to install a Japanese word processor to do this...
Tor
Any Internet RFC that includes the phrase "-with-SUPER-MONKEYS" has GOT to be good. (And in case you think I'm trolling, check the link.)
I am glad to see people other than the Mesopotamians using the wheel, which was originally invented for use in Mesopotamia.
Slashdot Sig. version 0.1alpha. Use at your own risk.
U.S.A.!!! U.S.A.!!! U.S.A.!!!
If it wasn't for us we'd all be speaking German. Wait.
[ducks]
sig
Punycode *is* a Unicode encoding.
Unicode has many encodings; UTF-8 is one encoding and Punycode is another. UTF-8 aims for efficiency when most of the text is ASCII, and Punycode aims for completeness when you must fit in 63 characters and use only ASCII letters, digits, and hyphens to do it.
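To make the distinction concrete, here is a small Python sketch. Python's standard library ships both a "utf-8" codec and a raw RFC 3492 "punycode" codec, so the same label can be encoded both ways. The function name show_encodings is just for illustration.

```python
from __future__ import annotations


def show_encodings(label: str) -> tuple[bytes, bytes]:
    """Return (UTF-8 bytes, raw Punycode bytes) for one domain label."""
    utf8 = label.encode("utf-8")     # may contain bytes >= 0x80
    puny = label.encode("punycode")  # RFC 3492 output, always 7-bit ASCII
    return utf8, puny


utf8, puny = show_encodings("bücher")
print(utf8)                          # b'b\xc3\xbccher'
print(puny)                          # b'bcher-kva'
print(all(b < 0x80 for b in utf8))   # False: needs 8-bit-clean DNS
print(all(b < 0x80 for b in puny))   # True: fits the old hostname rules
```

Same characters, two byte representations: UTF-8 is shorter for mostly-ASCII text, while Punycode trades some compactness for staying inside the characters DNS already accepts.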
Yeah, but did anybody get Al Gore's approval to make these changes?
To err is human, but to really foul things up requires a computer
Now I won't have to be limited to using a hyphen! I can register d[i-circ]xiechicks.com, or dixi[e-grave]chicks.com, or maybe dixie[c-cedil]hicks.com!
That last one would be doubly good, because if I understand the Punycode spec correctly, it'll get translated to ASCII as dixiehicks-XXXX.com. Not my opinion of the group, but maybe it would attract hits from the Toby Keith crowd.
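For what it's worth, a quick check with Python's built-in "idna" codec (which implements the IDNA2003 rules) shows what the real ASCII form looks like. Punycode keeps the plain letters in their original order, adds a hyphen and a short suffix, and IDNA prefixes the label with "xn--". The helper to_ace is a made-up name for this sketch.

```python
def to_ace(label: str) -> str:
    """Hypothetical helper: IDNA2003 ToASCII of one label via Python's codec."""
    return label.encode("idna").decode("ascii")


ace = to_ace("dixieçhicks")   # c with cedilla
print(ace)
# The ASCII letters are copied first, in order; where the ç goes is
# encoded in the suffix after the last hyphen, and IDNA adds "xn--".
print(ace.startswith("xn--dixiehicks-"))   # True
# Decoding the ACE form restores the original spelling.
print(ace.encode("ascii").decode("idna"))  # dixieçhicks
```

So the group would show up as xn--dixiehicks-something.com rather than a bare dixiehicks-XXXX.com.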
Stressed? Me? Of course not. Stress is what a rubber band feels before it breaks, silly.
I, for one, welcome our new European overlords.
An Indian-American Hindu committed to non-violent thought/speech/action alarmed by the global explosion of radical Islam
It's very difficult to type... like this.
--
the strongest word is still the word "free"
> You think you know how to parse a domain name for validity?
Yes, I do, and if you _read_ the RFC you'll see that nothing changes: these domain names are encoded into the same character set the current DNS system uses, so if you give me a URL, I can validate it with existing scripts. There's an example showing that Bücher.ch would be translated to xn--bcher-kva.ch, which looks totally parseable to me.
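A minimal sketch of that point, assuming the "existing scripts" do a classic letters-digits-hyphen (LDH) check: the ACE form passes unchanged, while the raw Unicode spelling is rejected. The regex, the 253-character total limit and the name is_valid_hostname are my assumptions about what such a script checks, not text from the RFC.

```python
import re

# Assumed "existing script": each label is 1-63 letters, digits or hyphens
# and does not start or end with a hyphen (the classic LDH rule).
LDH_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_valid_hostname(name: str) -> bool:
    """Return True if the name is at most 253 chars and every label is LDH."""
    if len(name) > 253:
        return False
    labels = name.rstrip(".").split(".")
    return all(LDH_LABEL.fullmatch(label) for label in labels)


ace = "Bücher.ch".encode("idna").decode("ascii")
print(ace)                              # xn--bcher-kva.ch
print(is_valid_hostname(ace))           # True
print(is_valid_hostname("Bücher.ch"))   # False: raw Unicode fails LDH
```

The conversion to the xn-- form happens in the client before any lookup, which is why an old-style validator never sees anything outside LDH.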
John.
Personally, I can't wait to see funky Chinese-character domain names in my web logs (mostly from infected Windows machines trying to attack my Apache server).
I Am My Own Worst Enemy
thier
...after all, some people find just 26 letters and 0-9 hard enough already ;)
ludacris
femail
curce
mentaly
Since this solution doesn't break any existing implementation, only the countries that need it will have to modify their software, rather than waiting for the slow and expensive process of changing all of DNS, which a large part of the 'net isn't motivated to pay for.
I keep often-used URLs as bookmarks, and when I need some other site, it is much easier to make a guess via Google. What I am looking for is almost always on page one of Google's results.
Surely Google could find a way to handle the special characters and make an intelligent suggestion, if nothing else based on the IP address of the request. If it comes from Burundi, the chances of needing a German umlaut are slim.
Help fight continental drift.
Let's assume (and I might not be correct in this assertion) that every computer in every country can at least type & see the 26 letters used in the English language plus digits 0-9 and the dash & period signs. However, I have no idea how to type anything coherent in Chinese Simplified or Traditional (hell, it's all Chinese to me...)...
In the interest of fostering the best method to communicate your ideas, products, services, etc., would you not want to use the characters that most everybody can type?
Oh, and this raises the next question: what about languages that go right-to-left instead of left-to-right? How about Thai, Arabic, and Hebrew? Personally, I don't want to see any domain names outside of the 26 characters used in English, 0-9, and the period and dash signs.
"Yeah, let's make sure that every normal english domain name can easily be spoofed with accented characters, not to mention having everyone open up and hunt around charmap to get to these new domains"...
This isn't going to be abused, AT ALL. Worst idea ever.
The Internet (domain names, top-tier nameservers, nameserver software, web and e-mail server software, all markup documents) runs on English; there's no way to i18n it without opening up a world of hurt. Sorry, but I don't want to have to upgrade BIND to a whole new series of bugs and exploits just so that some jagoff can open up his own go~o`le'.com.
The funny part is you'd probably be the first to complain had the Internet been designed by some foreign country and you couldn't register a plain English URL. Learning a whole new language isn't a "little learning curve", it's actually pretty hard.
if you can't handle a little learning curce to access the info, IMO you aren't capable mentaly of doing anything with the info once you access it.
Next time you go to a country the native language of which you can't understand, try planning your whole trip without once reading an English translation of any map or sign. Then you possibly might see how ignorant that statement sounds.
The Internet is a world-wide resource, and like it or not, people who speak other languages have a say in how it works too.
isaac
The internet was built as a highly decentralised, noncontrolled network, so that, in the event of a nuclear war, military leaders would have unrivalled access to pornography. (3DTIAB)
Exercise your right not to vote. thinkoutside.org
OK, so you're mostly guaranteed a domain name if you own the trademark on the name (to prevent cybersquatting, right?).
Well, what about the .jp domain? How can they possibly handle this, since in Japan you cannot copyright Latin characters? (Or at least so I've heard.)
This is the reasoning I've heard for why IBM is ai-bi-emu in Japan, and Microsoft is maikurosofuto, Sony is souni, etc. (that's roomaji transliteration; sorry if you don't get why ai = I).
So what do you do in this case? Unless they can enter Shift-JIS or Unicode URLs, you're stuck having people enter roomaji versions of your name, which, remember, aren't technically trademarkable.
I'd love to hear I'm wrong on some point here, could anyone with more info clue me in?
I am unamerican, and proud of it!
It looks to me like this isn't really going to be such a big deal. Their domain names are going to be converted for DNS anyway, so it's not like we would have to type in a complicated string of characters that aren't on our keyboards. So we can't remember what to type so easily; so what? That's why we have bookmarks. Besides, this isn't really for us anyway. Its purpose seems to be to allow people in other countries to use their own native languages for their own domain names. Easier for them, right? And if we want to access their domains, we just have to remember a few extra letters and dashes. No big deal. They get to do stuff in their language, we translate to ours, the whole world speaks, and maybe something gets done.
"I like you, but I wouldn't want to see you working with subatomic particles."
Good day to answer to a troll, here goes...
26 letters and 0-9 are not the best way to communicate with computer if your native language has more than 26 letters in its alphabet. It's not about being insulted or offended, it's about being understood. The computer speaks all natural languages equally badly, after all.
Let's think for a minute about an average Nordic web shop owner who sells beds online, operating for example in Finland or Sweden. He wants to sell stuff to the locals and hence needs a domain name that has an "a" with two dots on top of it, so that the word for bed is spelled correctly in Swedish or Finnish. It might surprise some people, but there are quite a lot of people who don't speak a single word of English. So the people he wishes to sell beds to A) know how to spell "bed" in their native language, B) have a key like that on their keyboards, and, *gasp*, prefer to use correct spelling when referring to things!
So you don't have an "a" with two dots on your keyboard? That's just too bad, but then again you probably don't speak Finnish too well either. Why would you want to visit that e-bedshop then?
That's what UTF-8 is for. Why on earth invent yet another encoding?
This means that it can't possibly include ALL of the Unicode spectrum, as Unicode supports far more than just 92 extra characters.
Also, the way the encoding is going to work, you still can't register a name with ß.
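For reference, a quick experiment with Python's standard codecs on both points. Raw Punycode round-trips labels in any script, because each non-ASCII code point is encoded as a variable-length integer rather than picked from a fixed list. Separately, the IDNA2003 rules implemented by Python's "idna" codec case-fold ß to "ss" during nameprep, so "straße" ends up as plain "strasse". (The later IDNA2008 revision treats ß as a distinct character; Python's built-in codec does not implement that.) The helper name roundtrip is hypothetical.

```python
def roundtrip(label: str) -> bytes:
    """Hypothetical helper: raw Punycode-encode a label and check decoding."""
    encoded = label.encode("punycode")           # no "xn--" prefix here
    assert encoded.isascii()                     # output is plain ASCII
    assert encoded.decode("punycode") == label   # nothing is lost
    return encoded


# Swedish, Chinese, Greek and Russian labels all survive the round trip.
for word in ["räksmörgås", "中文", "ελληνικά", "русский"]:
    print(word, "->", roundtrip(word).decode("ascii"))

# IDNA2003 nameprep (Python's "idna" codec) folds ß to "ss".
print("straße".encode("idna"))    # b'strasse'
print("strasse".encode("idna"))   # b'strasse'
```

So the limitation with ß comes from the nameprep mapping step, not from Punycode's range.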
I am unamerican, and proud of it!
microsoft and microsoft, for instance, are two completely different words.
Reminds me of that Babylon 5 episode when they find a person named Zathras down on this planet. Ivanova thought she had been talking to Zathras:
"No, that was not Zathras, that was Zathras. There are 10 of us, all of family Zathras, each one named Zathras. Slight differences in how you pronounce. Zathras, Zathras, Zathras.. You are seeing now?" - Zathras, Babylon 5: Conflicts of Interest
"I'm not impatient. I just hate waiting." - My Dad
The last time I checked, binary had zero, so an offhand, uninformed (slightly prejudiced) comment like yours is even dumber when you actually think about it.
For the Maya, zero was not just a placeholder. It signified the concept of an absence of value, a.k.a. an empty set.
http://en.wikipedia.org/wiki/Zero
History
The numeral or digit zero is used in numeral systems, where the position of a digit signifies its value, with successive positions having higher values, and the digit zero is used to skip a position. By about 300 BCE the Babylonians used two slanted wedges to mark an empty place in a given sequence of positional digits. It did not function in the true sense of a number. The use of zero as a number unto itself was introduced into mathematics relatively late by Indian mathematicians. An early study of the zero by Brahmagupta dates to 628.
Zero was also used as a numeral in Pre-Columbian Mesoamerica. It was used by the Olmec and subsequent civilizations; see also: Maya numerals.
The ancient Maya civilization used a vigesimal (base-20) numeral system.
A vigesimal numeral system has a base of twenty.
sarchasm: The gulf between the author of sarcastic wit and the person who doesn't get it.
Yes, it is, because it's not just a few "umlauts". When you're talking about Asian or other non-Romanized languages, the Romanization may be totally incomprehensible even to some speakers of that language. It's one thing to lose a few accent marks, but it's quite another to translate your language into a totally incomprehensible and unrelated format. In fact, in kanji-based languages at the very least, Romanization actually LOSES information. It's not just a matter of transcribing the sounds into another format, because the kanji carry additional meaning not present in the phonetic language alone. If you've ever seen two native Chinese or Japanese speakers talk to each other, they frequently "write" kanji in the air or on the palm of the other person's hand with their fingers, because their spoken language is imprecise.
These changes are very necessary for the Internet to become a truly international phenomenon.
Sounds like a great idea.... If you're willing to re-implement the DNS code in my Win-95 box.... or on my Amiga-4000. How about my 10 year old Apollo workstation or the SUN-3 that's still working just fine, thank you. etc. etc.
A lot of old DNS implementations would choke (and properly so) on UTF-8 encoded DNS names. We probably could have planned for the future by saying that IPv6 DNS servers should support Unicode, but I think that even that boat has been missed (or is quickly leaving the dock).
In the meantime, the old DNS and its Anglo-centric presumptions and restrictions are with us for the next few years (or decades, as the case may be). Clearly some people feel the need to live within those restrictions.
Free Software: Like love, it grows best when given away.
Geeks do, but your average surfer does not. They go clicky clicky on the results returned by the search engine, or clicky clicky on the link someone emailed them, or clicky clicky on the link from some other website.
Most users don't even *know* that you can type stuff in the Address field.
I have discovered a truly marvelous sig, unfortunately the sig limit is too small to contain i
I'm glad to see that people other than the Swiss are being recognized on the web, which originally started as a Swiss scientific project...
Without the rest of the world, the Internet would have been obsolete and irrelevant by now. Deal.
"djbdns doesn't support unicode either, although it doesn't rely on standard c-libraries, so unicode support might only take a few weeks to add."
djbdns is 8-bit clean. Use UTF-8 all you want right now.
I'm asking because today I tried out the Netsol way of doing umlauts, and it doesn't work at all with my Mac OS X and Safari: none of the listed domains work. The page lists a "plugin" that every web user is supposed to install, but it's Windows-only (of course...), and it's quite silly to have a domain with umlauts if you have to tell all your customers "before visiting me, please install this plugin"...
Any idea whether this new approach works in all circumstances where the user has an international keyboard? Thanks!
...most certainly do not welcome our new Unicode-munging overlords.
I don't care what the issues are. I have had it up to HERE with charset issues! ENOUGH ALREADY!
If you can't do it using UTF-8, don't do it at all!
Dammit.
Is this truly the only Earth I can live on?
Bah. The ancient Greeks didn't need any accents, why should we?
Don't blame me; I'm never given mod points.
There's no need to put accents on things, you can spoof just as well without. For example: the Greek omicron, Russian lowercase o, and Latin lowercase o all look identical... but they are all different Unicode characters!
Unless the registries all implement some sort of canonicalization, owners of domain names containing the letter "o" are going to have a combinatorial explosion!
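A rough sketch of how a registry or browser might flag such names, assuming a simple heuristic: take the first word of each letter's Unicode character name (LATIN, GREEK, CYRILLIC, ...) and treat a label that mixes several as suspicious. This is not the real Unicode Script property or any registry's actual policy, and it would not catch a name spelled entirely with look-alike Cyrillic letters.

```python
from __future__ import annotations

import unicodedata


def scripts_in_label(label: str) -> set[str]:
    """Heuristic: first word of each letter's Unicode name (LATIN, GREEK, ...)."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


latin_o, greek_o, cyrillic_o = "o", "\u03bf", "\u043e"
for ch in (latin_o, greek_o, cyrillic_o):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
# U+006F LATIN SMALL LETTER O
# U+03BF GREEK SMALL LETTER OMICRON
# U+043E CYRILLIC SMALL LETTER O

spoof = "micr" + cyrillic_o + "soft"
print(spoof == "microsoft")               # False
print(sorted(scripts_in_label(spoof)))    # ['CYRILLIC', 'LATIN']
print(len(scripts_in_label(spoof)) > 1)   # True: flag as mixed-script
```

The three o's render identically but compare unequal, which is exactly why some form of canonicalization or mixed-script policy is needed.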
Just to diverge, I'd like to represent the non-English-speaker view here.
In most of the languages with 'funny accents' like umlauts, these characters often have a completely different pronunciation and are often considered completely different letters from the unaccented ones.
Simply 'brushing off the dirt' and removing the 'accent' thus changes the word, sometimes with weird results.
Just ask someone from the town of Moensteraas, Sweden.
Their website contains mostly municipal information intended for Swedes, but due to the restrictions of DNS, the name is instead spelt 'monsteras', which means 'monster-carcass' in Swedish.
Obviously, these people would be happier spelling it with umlauts on the o, and a ring over the a.
First of all, this opens a huge hole for url hijacking and obfuscation.
Say for instance, you get a spam that has a url to http://www.microsoft.com/freeoffers
You were tricked too: instead of a normal i, the URL uses an accented i, such as an i with a grave (Slashdot strips these, by the way). Anyone who doesn't use accents (English, Japanese, Chinese, etc.) probably won't catch the minor detail and will think it really points to www.microsoft.com.
This is very similar to, but less obvious than using:
http://www.microsoft.com@via.gra.biz/offers
Most non-tech internet users will also believe this to be Microsoft's website. Spammers will have a heyday with all of the new opportunities.
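Python's urllib.parse illustrates both tricks. Everything before the "@" in the authority is user info, so the real host in the second URL is via.gra.biz. An accented i produces a hostname that is simply not www.microsoft.com, which becomes obvious once it is converted to its xn-- form. A sketch using only the standard library:

```python
from urllib.parse import urlsplit

# The "@" trick: the part before "@" is user info, not the host.
parts = urlsplit("http://www.microsoft.com@via.gra.biz/offers")
print(parts.username)   # www.microsoft.com
print(parts.hostname)   # via.gra.biz

# The accent trick: "ì" (i with grave) instead of "i" gives a different host.
host = urlsplit("http://www.m\u00eccrosoft.com/freeoffers").hostname
print(host == "www.microsoft.com")         # False
ace = host.encode("idna").decode("ascii")  # what actually goes to DNS
print(ace.startswith("www.xn--mcrosoft-"))  # True
```

A browser that displayed the xn-- form (or flagged it) would expose the spoof; one that renders the Unicode form would not.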
The second non-technical problem: say I want to go to a Japanese website that doesn't have an English URL. If I don't know kana/kanji (and most people outside Japan don't), then I don't know what characters to type to get the correct Japanese. I would have to get a dictionary and look up each character to figure out what to type.
I agree that it's lame to have it only in English, but at this point any country that uses the internet already has the ability to type English; now they will also need to be able to type Japanese, Chinese, Russian, Greek, etc., etc., etc....
There are a couple of others, but I don't remember them offhand... So in other words, these characters are unusable for a reason.
_______________________________
"I'm not Conceited...I'm just a realist..."
I know there are times when different accents indicate different words, but I'm under the impression that it is unlikely that more than one of them would be a "good" domain name. (Am I wrong about that?)
This won't work for non-Latin characters, obviously. But UTF-8 seems like a better solution to that. (I understand that most Chinese words are 2-3 characters. The CJK Unified Ideographs block runs from U+4E00 to U+9FFF, and since only code points up to U+07FF fit in 2 bytes of UTF-8, each ideograph takes 3 bytes, so a word is 6-9 bytes: clearly less than 63 bytes.) The obvious downside is that all DNS servers and resolvers must (at least!) be 8-bit clean.
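The arithmetic is easy to check in Python. This sketch assumes the 63-octet label limit from RFC 1035 and counts the raw UTF-8 bytes; utf8_label_fits is a hypothetical helper name.

```python
MAX_LABEL_OCTETS = 63   # per-label limit in DNS (RFC 1035)


def utf8_label_fits(label: str) -> bool:
    """Hypothetical helper: does the raw UTF-8 form fit in one DNS label?"""
    return len(label.encode("utf-8")) <= MAX_LABEL_OCTETS


print(len("中".encode("utf-8")))     # 3: CJK ideographs need 3 bytes each
print(len("中文".encode("utf-8")))   # 6: a two-character word
print(utf8_label_fits("中" * 21))    # True  (63 bytes)
print(utf8_label_fits("中" * 22))    # False (66 bytes)
```

So a single label could hold up to 21 ideographs as raw UTF-8, provided every server and resolver on the path passes 8-bit bytes through untouched.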
Ohhh, the arrogance of Americans.
Here's an example of why this is good. In Sweden there is a town called Hörby (that's an 'o' with two dots over it). Their site has to be named 'horby', as it is now (without the dots). Horby means 'the village of whores' in Swedish.
Do you think that the billions of people who use alphabets other than the American one are going to agree with anything you said in your post?
This change IS a big deal, not only for small towns, but for loads of big companies, government websites and all kinds of sites you can think of.
Will code a sig generator for food
Why monsteras instead of moensteraas?
Good question. Basically, people don't think, or are too lazy to transliterate the letters properly.
Some places have the forethought to register both:
Munich in Germany has registered both "munchen.de" and "muenchen.de".
(But it's really a u with an umlaut)
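The difference between the two spellings is easy to reproduce. Below, strip_diacritics decomposes with Unicode NFD and drops the combining marks, which yields the "monsteras"/"munchen" style. transliterate uses a hypothetical table that spells umlauts out as ae/oe/ue and å as aa (one common convention, not an official rule), which yields the "moensteraas"/"muenchen" style.

```python
import unicodedata


def strip_diacritics(s: str) -> str:
    """Decompose with NFD and drop combining marks (accents, dots, rings)."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical table: spell the accent out, as in "muenchen"/"moensteraas".
TRANSLIT = {"ä": "ae", "ö": "oe", "ü": "ue", "å": "aa", "ß": "ss"}


def transliterate(s: str) -> str:
    """Lowercase, recompose with NFC, then apply the table above."""
    s = unicodedata.normalize("NFC", s.lower())
    return "".join(TRANSLIT.get(ch, ch) for ch in s)


for name in ("Mönsterås", "München"):
    print(strip_diacritics(name).lower(), transliterate(name))
# monsteras moensteraas
# munchen muenchen
```

Registering both forms, as Munich did, covers users who follow either habit.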
http://www.xn--rksmrgs-5wao1o.se/ will work if you are using a recent Mozilla
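To see what that hostname says, Python's built-in idna codec can decode the ACE form and re-encode the Unicode spelling. Here is a sketch using only the standard library (räksmörgås, Swedish for "shrimp sandwich", is a classic test word because it contains å, ä and ö):

```python
host = "www.xn--rksmrgs-5wao1o.se"
print(host.encode("ascii").decode("idna"))   # www.räksmörgås.se
print("räksmörgås".encode("idna"))           # b'xn--rksmrgs-5wao1o'
```

Only the label that starts with "xn--" is converted; "www" and "se" pass through unchanged.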
Thanks for the example. Let's do a few quick tests.
The encoded version always works, and leads to a page with an unencoded link (normal spelling with the accents).
I copied the unencoded version and tried:
On WinXP:
- Mozilla 1.4 : OK
- MSIE 6, Opera 6.2 : NO
On Linux - Red Hat 6.2 (of course, that's a pretty old system):
- lynx, ping, host, dig, ... : NO
(I cannot test Mozilla, since this server has no GUI.)
Well, I guess we'll have to live with that horrible Punycode.