Internationalized Domain Names Coming Soon
rduke15 writes "You think you know how to parse a domain name for validity? Well, in case you haven't noticed, things are getting tougher as registrars keep adopting IDN (Internationalized Domain Names), which uses a weird encoding named Punycode to enable accented characters in domain names. The Register reports about Switzerland, Germany and Austria's joint move to enable IDN. See the overview in English from Switch. But I guess it would be difficult to talk about this on /., since it does not even support basic Latin-1 ... :-)"
I'm glad to see that people other than Americans are being recognized on the internet, which originally started as an American military project...
More ways for trolls to disguise goatse.cx links...
Will this complicate some pages a little? Probably.
Will it make a lot of people a little happier? Sure.
Is it a big deal? *shrugs*
Sounds like a job for Unicode.
Unicode.org
It looks to me like the problem is that the DNS servers don't support unicode so they're using a bad implementation of it.
Why not extend DNS to support Unicode? That way there'd be no translation or other crap to go through.
Granted, software would need changing, but that would be the case with the mangled crap that's mentioned in the article too.
What am I not understanding here? Or is this just an implementation dreamed up to make life complicated?
There's a gorilla from Manilla whose a fella that stinks of vanilla and has salmonella.
I'm sorry, is it just me or do they seem to be taking a bad shortcut to get to a good end? It doesn't seem like they are doing this correctly. Why not plan to migrate to Unicode? Their choice seems shortsighted and flawed. I hope they at least considered Unicode and came up with real reasons why not to use it.
But I guess it would be difficult to talk about this on /., since it does not even support basic Latin-1
Just say the ascii number?
When anger rises, think of the consequences.
Confucius (551 BC - 479 BC)
I'm not sure what all the accents are on the alphabet, will I have to know to type them to access a simple website? Sorry, this doesn't make using the net easier.
Trolling is a art,
Now the European Union will want everyone to click on the left side of the mouse, left-handers be damned.
The French will demand that "bandwidth exceeded" errors be renamed to "(web page) surrenders"
The Germans will try to take over the internet.
In a sneak attack, the Iraqis will launch a massive DDOS attack, but accidentally hard-code localhost in the trojan. The Iraqi information guru will deny everything.
Have you ever been to a non-english speaking country?
alawys, think more talk less
Yes, but this has nothing to do with physical space. We're talking about Teh IntarWeb, remember?
Taco est un mechant garcon.
While it's logical for, say, Chinese companies to have a Chinese domain name and Chinese e-mail addresses, it may not be the best choice if the company wishes to expand overseas.
Unfortunate but true, if a company has a Chinese domain name, it would probably be only used within China, Taiwan, Hong Kong, Singapore, Japan (since it's unicode), and maybe South Korea. The company would be pretty much limited to the East Asia market.
However, I suppose the company could get both a Chinese domain and an English, or rather Pinyin, domain so they could make their Chinese, or maybe other Asian clients feel "closer" while also being able to reach clients outside of East Asia.
I also think that it'd be great to give people the option of having a native-language email address. It's not too hard to set up a romanized email alias for it. An SMTP "X-Roman-Address" header could even be added to outgoing messages in case a recipient can't read the default "From" line.
There's 10 types of people in this world, those who understand binary and those who don't.
I sure hope this harebrained idea doesn't take off.
After all, now they need not only worry about registering say...
Microsoft.com
Microsoft.net
Microsoft.org
Microsoft.tv
etc..
But also
Microsoft.com
Microsoft.com
Well, you get the picture.
--Won't that be grand? Computers and the programs will start thinking and the people will stop. - Dr. Walter Gibbs
www.slashdot.org, she will be mine... oh yes... she will be mine.
yep. They all spoke english too, although they didn't seem to enjoy it...
(France, Germany, Denmark, and of course GB)
I'm delighted to tell you that Mozilla is one step ahead again, and has supported IDN since version 0.9.5: http://www.mozilla.org/projects/intl/idn_mozilla.html
I give it a day before we're deluged with emails asking to send credit card numbers to a paypal.com site or domain registration renewal notices linking to networksolutions.com.
I have mixed feelings about this. I am from Sweden, and it always looks kind of ugly when names lose their dots and circles in the domain name.
On the other hand, this is also quite convenient. I live in the US now, and I travel around quite a bit. I often surf on Swedish Internet sites, typically without access to a Swedish keyboard. It would not be very convenient if the domain names used non-English symbols.
Sometimes I go to Japanese sites also, and I am really glad that I don't have to install a Japanese word processor to do this...
Tor
This will enable more domains and people from non english-speaking countries will be able to register their domains with their correct syntax.
One thing that kinda bugs me is that this will not be a full port to Unicode (apparently it'll be hard to port it all), but a work-around. Kind of reminds me of the entire Y2K problem... "Why write it like it's supposed to be if we can make it shorter?" Then, in a decade everybody will be worried because the work-around no longer works and they'll have lots more to do in order to port it all to Unicode.
they have their own keyboards with their alphabet, and if they don't want to go to an english site, why would they need to switch keyboard mode? think asian languages, think middle eastern, russian, greek, ...
That www.dell.com will automatically forward to www.dell.in?
this is going to add another level of complexity to things for functionality which a) isn't needed b) nobody wants.
I realize that it's important to some people to have their native language represented in what they type and in their communication, but isn't this more trouble than it's worth? As it stands now, the system works well. Sure, you may not be able to get an umlaut in your domain, but is that really a just cause to change the entire fscking DNS system?
++mse61--
Punycode
africa has a flag?
Any Internet RFC which includes the phrase, -with-SUPER-MONKEYS, has GOT to be good. (And in case you think I'm trolling, check the link.)
Is it just me, or does this totally sound like an article in The Onion?
Uh oh. You're speaking "the plain truth". People might be offended.
LOOK OUT!
Seriously, while a good portion of the Internet is English speaking, there's a need for this. Accents are notoriously hard to get into computer programs, and even some languages. For instance, ancient Greek. While this may elicit scorn and laughter, I do in fact need to type Ancient Greek into my browser and my word processor on a daily basis, and the Symbol font just won't cut it. Why? Because I need accents, both stress accents and pitch accents. Even Unicode can't really help me out. I'm glad someone's finally making a little bit of a step in the right direction, even though this probably won't help me at all.
Great
Whatever we do, let's not really solve this problem; let's just add an ill thought out work-around to complicate our lives and introduce misery and unnecessary complexity into the world.
With this carefully thought out lunacy, we can struggle with bugs and problems related to alternate character sets for years.
*Sigh*
U.S.A.!!! U.S.A.!!! U.S.A.!!!
If it wasn't for us we'd all be speaking German. Wait.
[ducks]
sig
Poland (.pl) has officially had IDN domains since 11th September 2003.
:wq
Punycode *is* a Unicode encoding.
Unicode has many encodings; UTF-8 is one encoding and Punycode is another. UTF-8 aims for efficiency when the majority of the text is ASCII, and Punycode aims for completeness when you must fit in the 63 characters of a DNS label and use only ASCII characters to do it.
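To make the UTF-8 versus Punycode contrast concrete, here is a minimal sketch using Python's built-in `utf-8` and `punycode` codecs. The label "bücher" is only an illustrative choice (it matches the Switch example quoted elsewhere in the thread), and the helper `is_ascii` is a made-up name for this sketch.

```python
# Compare two encodings of the same Unicode label.
label = "b\u00fccher"  # "bücher"; illustrative label, an assumption of this sketch

utf8_bytes = label.encode("utf-8")         # the ü becomes the two bytes 0xC3 0xBC
punycode_bytes = label.encode("punycode")  # ASCII only: basic letters first, then the encoded ü

print(utf8_bytes)
print(punycode_bytes)


def is_ascii(data: bytes) -> bool:
    """Hypothetical helper: True when every byte is 7-bit ASCII."""
    return all(b < 0x80 for b in data)


print(is_ascii(utf8_bytes), is_ascii(punycode_bytes))
```

The UTF-8 form contains bytes with the high bit set, while the Punycode form stays inside letters, digits and the hyphen, which is the property the IDN scheme relies on.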
Yeah, but did anybody get Al Gore's approval to make these changes?
To err is human, but to really foul things up requires a computer
Maybe I am confused, but, Japanese domains already exist.
http://www.lSOaZY.com...
Wow, Japanese characters don't seem to show up properly on slashdot...
Anyway, if you enter http://www.ningen-isan.com, where ningen-isan is the kanji equivalent (human treasure), both URLs will resolve correctly.
I don't know the technical details of how it works, but is this a different case?
Now I won't have to be limited to using a hyphen! I can register d[i-circ]xiechicks.com, or dixi[e-grave]chicks.com, or maybe dixie[c-cedil]hicks.com!
That last one would be doubly good, because if I understand the Punycode spec correctly, it'll get translated to ASCII as dixiehicks-XXXX.com. Not my opinion of the group, but maybe it would attract hits from the Toby Keith crowd.
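A quick way to check that reading of the spec is to run the name through Python's built-in `punycode` and `idna` codecs. This is only a sketch: the variable names are made up, and the exact suffix after the hyphen is left for the interpreter to print rather than guessed here.

```python
# dixie[c-cedil]hicks: the c with cedilla is U+00E7.
name = "dixie\u00e7hicks"

raw = name.encode("punycode")  # bare Punycode, no prefix
ace = name.encode("idna")      # the form that goes into DNS, with the "xn--" prefix

print(raw.decode("ascii"))
print(ace.decode("ascii"))

# Decoding the Punycode form gets the original label back.
print(raw.decode("punycode") == name)
```

The ASCII letters are copied to the front in their original order and the position of the accented character is packed into the suffix, so the readable part of the encoded label does come out as "dixiehicks".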
Stressed? Me? Of course not. Stress is what a rubber band feels before it breaks, silly.
you think keyboards the world over look exactly like yours? why should they?
technology is not inherently english-speaking. and even if it starts out that way, it shouldn't have to remain so. just because we made it doesn't mean others don't have the right to use it in a manner more suited to their needs.
Hell, why have screen readers? Is it too much to ask for blind people to print out braille versions of webpages? Why not make everyone use hex lookup tables to enter and read information? or punchcards?!
- - - - ..
I, for one, welcome our new European overlords.
An Indian-American Hindu committed to non-violent thought/speech/action alarmed by the global explosion of radical Islam
don't understand a lick of french? 'taco is a mean man'
It's very difficult to type [...] like this
--
the strongest word is still the word "free"
There are many Unicode representations for the same character. (Different problem than multiple Unicode characters with similar names, symbols, or meanings.)
This _is_ Unicode, but with a non-standard encoding that forces one and only one representation for each Unicode character.
Also, Unicode UTF8, UTF16, or UDF16 tend to create very long byte strings for non-western languages. The new encoding is designed to be compact.
Also, the 16 or 32 bit encoding variants of Unicode take up 100% more room for Western languages and are not backwards compatible -- only UTF8 and this new encoding preserve ASCII-only domains as readable ASCII strings and preserve backwards compatibility with old software (client and server) for all domains.
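The size and compatibility points can be checked with a small sketch. It assumes Python's standard codecs; `ENCODINGS` and `encoded_sizes` are hypothetical names, and the `idna` codec stands in for "this new encoding" (Punycode plus the xn-- prefix).

```python
# Byte length of one DNS label under several encodings.
ENCODINGS = ["utf-8", "utf-16-le", "utf-32-le", "idna"]


def encoded_sizes(label):
    """Return {encoding name: byte length} for a single label."""
    return {enc: len(label.encode(enc)) for enc in ENCODINGS}


for label in ["example", "b\u00fccher"]:
    print(label, encoded_sizes(label))

# An ASCII-only label is byte-for-byte unchanged under UTF-8 and IDNA,
# but not under the 16-bit encoding.
print("example".encode("utf-8") == b"example")
print("example".encode("idna") == b"example")
print("example".encode("utf-16-le") == b"example")
```

Whether the ASCII-compatible form is actually shorter than UTF-8 depends on the label, so the sketch prints the lengths instead of assuming a winner.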
To accomodate people who are "insulted" or "offended" that thier native language is not fully "respected" by the internet is ludacris.[...]As much as the information on the web should be free, if you can't handle a little learning curce to access the info, IMO you aren't capable mentaly of doing anything with the info once you access it
Like, for instance, figuring out what the hell YOU are saying requires a steep learning curve.
Your inability to master one language must be the root of your jingoist hatred of the idea that people from other cultures might get full access to the internet's potential too. I mean, it would be outrageous to expect you to have a bit of a learning curve to use the internet. The rest of the world, yes, but that you might need to learn a lil' something new? Folly!
You can't take the sky from me...
It seems sensible to me that, in a similar fashion to how domains are case-insensitive, accented characters, etc. should be based on an original letter, e.g. "a acute" or "a grave" should all be based on the "a" letter, as "A" is based on "a". This way, it's possible to have the domain name with accented characters or any other non-unicode letters, in exactly the same way we *can* have http://TALrIaS.Net/ which is exactly the same as http://talrias.net/ (shameless plug there).
Obviously this wouldn't work for non-Latin alphabets like greek, chinese, japanese. Thoughts on this anyone?
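The folding idea above can be sketched with the standard `unicodedata` module. This is not how IDN works (IDN keeps accented letters distinct); `fold_accents` is a hypothetical helper that only shows what "base every accented letter on its original letter" would mean in practice.

```python
import unicodedata


def fold_accents(label):
    """Lowercase a label and drop combining marks after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", label)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


print(fold_accents("TALrIaS.Net"))
print(fold_accents("\u00e0"), fold_accents("\u00e1"))  # a grave, a acute
print(fold_accents("\u03ac"))                          # Greek alpha with tonos
```

Folding only removes marks that Unicode treats as separable accents, so a Greek letter stays Greek, and in languages where the accented form counts as a separate letter the fold merges words that native speakers consider different.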
aterr - an open source threaded discussion board.
I totally agree with you.
dREI from Rome (IT)
What happens when someone registers the domain cnn.com where the c or n is actually a character from a different character set? Then it would be difficult for 99% of the population to tell the difference, say when they follow a link to http://www.cnn.com/the_world_is_ending_sell_your_soul.html
> You think you know how to parse a domain name for validity?
Yes, I do, and if you _read_ the RFC you'll see that nothing changes, these domain names are encoded into the same character set as the current DNS system. And hence if you give me a URL I can validate it with existing scripts. There's an example which shows that Bucher.ch (with an umlaut on the u) would be translated to: xn--bcher-kva.ch which looks totally parseable to me.
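As a rough illustration of that point, here is a sketch of an "existing script" style check that only knows the traditional letters-digits-hyphen rule. The regex and the name `is_ldh_hostname` are assumptions made for this example, not taken from any RFC text, and the conversion step uses Python's built-in `idna` codec.

```python
import re

# One DNS label under the letters-digits-hyphen (LDH) rule:
# 1 to 63 characters, no leading or trailing hyphen.
LDH_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_ldh_hostname(name):
    """Return True if every dot-separated label of `name` is a valid LDH label."""
    if not name or len(name) > 253:
        return False
    return all(LDH_LABEL.fullmatch(label) for label in name.split("."))


# The encoded form of the umlaut example passes the old check unchanged.
encoded = "b\u00fccher.ch".encode("idna").decode("ascii")
print(encoded, is_ldh_hostname(encoded))

# The raw Unicode form does not, so it has to be converted before validating.
print(is_ldh_hostname("b\u00fccher.ch"))
```

The validator itself needs no changes; what does change is that user input may arrive in Unicode and has to be converted before the check is applied.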
John.
MOD PARENT UP!!
Personally I can't wait to see funky chinese character domain names in my web logs (mostly from infected windows machines trying to attack my apache server).
I Am My Own Worst Enemy
thier
...after all, some people find just 26 letters and 0-9 hard enough already ;)
ludacris
femail
curce
mentaly
There is evidence the Mesopotamians are building weapons of mass destruction!
Errr, congratulations. Your Ethnocentric garbage has been modded up. English was not the first and is not the only language in the world, and just because someone who spoke English designed the standards we use today does not justify those standards excluding all other languages for all time. I'd expect someone as gung-ho on the English language to be able to spell "ludicrous" correctly. Turn off the MTV and turn on something educational.
Since this solution doesn't break any old implementation, just the countries that need it will have to modify their software, and they won't have to wait for the slow and expensive process of changing all of DNS, which a large part of the 'net isn't motivated to pay for.
"26 letters and 0-9 are the best" but what about the punctuation marks? And spaces? Imagine communicating with the computer without spaces? Well, if we all wrote Chinese we wouldn't need spaces, we wouldn't need letters either, just particles (Chinese characters can be constructed from arrangement of 1, 2 or 3 of a small amount of simple shapes), nor would we need numbers.
What's next? Everyone must accept the latin script (or rather the English bastardisation of it)? What will I do when I want to code my APL, I'll have no greek letters to use. And damn, think about the amount of money the US will lose from the lack of expansion of IT in countries which do not use the bastardised latin script??? Hell, computers were _designed_ to run in English after all. No accent marks? Tough sh!t I say.
--
FreeNET user? Comfortable with the adverse selection?
If only these people knew how ugly a domain name becomes when one can't use all the characters in the alphabet. Wonder what Microsoft would use if "f" was not available in domain names. "Microsopht"? It's things like that we have to live with.
Often-used URLs I have as bookmarks, and when I need some other site, it is much easier to make a guess via Google. What I am looking for is almost always on page one of Google's choices.
Surely Google could find a way to handle the special characters and make an intelligent suggestion, if nothing else based on the IP address of the request. If it is from Burundi, chances of needing a German umlaut are slim.
Help fight continental drift.
let me be the first to call www.hotmail.com !
In all matters of opinion, our adversaries are insane. -Oscar Wilde
You meant to say: "a) isn't needed by english speaking people b) americans don't want".
Let's assume (and I might not be correct in this assertion) that every computer in every country can at least type & see the 26 letters used in the English language plus digits 0-9 and the dash & period signs. However, I have no idea how to type anything coherent in Chinese Simplified or Traditional (hell, it's all Chinese to me...)...
In the interest of fostering the best method to communicate your ideas, products, services, etc., would you not want to use the characters that most everybody can type?
Oh, and this begs the next question - what about languages that go right-to-left instead of left-to-right? How about Thai, Arabic, and Hebrew? Personally, I don't want to see any domain names outside of the 26 chars used in English, 0-9, and the period & dash signs.
"Yeah, let's make sure that every normal english domain name can easily be spoofed with accented characters, not to mention having everyone open up and hunt around charmap to get to these new domains"...
This isn't going to be abused, AT ALL. Worst idea ever.
The Internet (domain names, top-tier nameservers, nameserver software, web and e-mail server software, all markup documents) runs on english, there's no way to i18n it without opening up a world of hurt. Sorry, but I don't want to have to upgrade BIND to a whole new series of bugs and exploits just so that some jagoff can open up his own go~o`le'.com.
> 26 letters and 0-9 are the best, most simple way to use and communicate with a computer, IMO, other than speaking binary at the CPU with a f*cking megaphone.
Just because you don't need them doesn't mean everyone else doesn't. I doubt any American websites would ever need to take advantage of this (but then again, why not?), but many European languages simply cannot be rendered properly without accents. It just makes a lot of sense to allow these for URL's as well.
> I'm all for information being free, and the web remaining a pace for a free flow of information to the whole world, but complicating the very foundation of the way the tech works to avoid some learning curve is just plain stupid.
I don't see what a learning curve has to do with anything. Most Europeans are quite adept with English (as opposed to many Americans who can't speak any foreign languages). The point is that they simply want to be able to type URL's correctly in their own language. You don't speak Hungarian? Fine, you probably won't need this, but why not let Hungarians use their own language with their own accents?
Get ready for a slew of new hair band websites...it just wasn't the same without the umlauts!
Beavis: heh heh, he said umlaut, heh heh
The funny part is you'd probably be the first to complain had the Internet been designed by some foreign country and you couldn't register a plain English URL. Learning a whole new language isn't a "little learning curve", it's actually pretty hard.
if you can't handle a little learning curce to access the info, IMO you aren't capable mentaly of doing anything with the info once you access it.
Next time you go to a country the native language of which you can't understand, try planning your whole trip without once reading an English translation of any map or sign. Then you possibly might see how ignorant that statement sounds.
The Internet is a world-wide resource, and like it or not, people who speak other languages have a say in how it works too.
isaac
No, that's boy.
Homme is man, or gentleman really.
Does Texas count?
Reminds me of name-mangling, which I saw a lot back when I was assisting on a cfront porting project (the old C++ preprocessor).
You still see mangling in C++ object file symbol tables -- that's because they wanted to keep the linker's name-space nice and simple and flat.
(I think the new politically-correct term for it now is "name decoration".)
You, sir, are a fucking idiot. I'd say it in the six languages I know if it weren't for the fact that three of them aren't representable without Unicode.
Ok, so you're mostly guaranteed a domain name if you own the trademark on the name. (To prevent cybersquatters, right?)
Well, what about the .jp domain? How can they possibly handle this, since in Japan you cannot copyright latin characters. (Or at least as far as I've heard)
This is the reasoning I've heard, as to why IBM is ai-bi-emu in Japan. And maikurosofuto, souni, etc. (roomaji transliteration there, sorry if you don't get why ai=I)
So what do you do in this case? Unless they can enter Shift-JIS or Unicode URLs, then you're stuck having people enter roomaji versions of your name, which remember, aren't technically trademarkable.
I'd love to hear I'm wrong on some point here, could anyone with more info clue me in?
I am unamerican, and proud of it!
Yes because it's so important for Hans to have an email address:
sexymutha21354828246@ge~'rm\a|n.gmn
that has the correct accents.
It looks to me like this isn't really going to be such a big deal. Their domain names are going to be converted for DNS anyway, so it's not like we would have to type in a complicated string of characters that aren't on our keyboards. So we can't remember what to type so easily, so what? That's why we have bookmarks. Besides, this isn't really for us anyway. Its purpose seems to be to allow the people in other countries to use their own native languages for their own domain names. Easier for them, right? And if we want to access their domains, we just have to remember a few extra letters and dashes. No big deal. They get to do stuff in their language, we translate to ours, the whole world speaks, and maybe something gets done.
"I like you, but I wouldn't want to see you working with subatomic particles."
Good day to answer to a troll, here goes...
26 letters and 0-9 are not the best way to communicate with computer if your native language has more than 26 letters in its alphabet. It's not about being insulted or offended, it's about being understood. The computer speaks all natural languages equally badly, after all.
Let's think for a minute about an average Nordic webshop owner who sells beds online, operating for example in Finland or Sweden. He wants to sell stuff to the native dwellers and hence needs a domain name that has an "a" with two dots on top of it so that the domain name for bed is spelled correctly in Swedish or Finnish. It might surprise some people, but there are quite a lot of people who don't speak a single word of English. So the people who he wishes to sell beds to A) know how to spell "bed" in their native language and B) have a key like that on their keyboards, and, *gasp*, prefer to use correct spelling when referring to things!
So you don't have an "a" with two dots on your keyboard? That's just too bad, but then again you probably don't speak Finnish too well either. Why would you want to visit that e-bedshop then?
as a precursor to a much greater problem?
This is a step in a direction I don't think we want to go. Imagine if this goes through, if you will. What will follow?
Next you're going to hear about programming languages being developed in other languages. Think outsourcing to India is so great? Wait til your next batch of outsourced code cannot be read, because it's not in English anymore!
One of the things about computing has been the language standardization. Sure, you can do things in other languages, but it's generally been accepted that English is the way to go for things like programming languages and domain names. Granted, this only happened because of the involvement of the US in the creation of the net, but still, it's primarily a Very Good Thing (TM).
Perhaps an international language will come out of this? That would be nice, but I see this as the first step in splintering the internet and the computing world at large.
I'm all for information being free, and the web remaining a pace for a free flow of information to the whole world
As long as the language is US English, eh?
The world is a big place. You ought to get out and see more of it.
That's what utf-8 is for. Why on earth invent yet another encoding?
This means that it can't possibly include ALL of the unicode spectrum, as Unicode supports far more than just 92 extra characters.
Also, the way the coding is going to work, you still can't register a name with B.
I am unamerican, and proud of it!
microsoft and microsoft for instance are two completely different words.
Reminds me of that Babylon 5 episode when they find a person named Zathras down on this planet. Ivanova thought she had been talking to Zathras:
"No, that was not Zathras, that was Zathras. There are 10 of us, all of family Zathras, each one named Zathras. Slight differences in how you pronounce. Zathras, Zathras, Zathras.. You are seeing now?" - Zathras, Babylon 5: Conflicts of Interest
"I'm not impatient. I just hate waiting." - My Dad
If http can be a standard, xml can be a standard, posix can be a standard, why stop there? Why not have english be the standard too? If developers have to wade through the confused babble that is the W3C recommendations, then certainly the rest of the world can drop their own native languages just as surely as we drop our own native implementations of rendering and networking engines.
English as the world language is surely as efficient as a single standards based unix as a world operating system.
This is my sig.
Why on earth are hyphens the only allowable punctuation at the moment?
Is there really any reason to continue to disallow things like:
10%.com
10off.com
#dot.org
and most importantly Andy_R.com
while allowing motorhead.com to have their umlauts?
A pizza of radius z and thickness a has a volume of pi z z a
> Think outsourcing to india is so great? Wait til your next batch of outsourced code cannot be read, because it's not in english anymore!
You've obviously never seen a program coded in Perl. *cringe*
I agree that the world has many languages and many nationalities, cultures, etc. that should have ready access to the Net, and for whom English is not their native tongue. But I must make two important points that go hand-in-hand.
:)
1) English is the most widely used language in the world, using a ROMAN LATIN alphabet that many other languages hold largely in common with it.
2) Complicating matters is antithetical to the very nature of the Net. We need an LCD for addressing. Otherwise large sections of the Net will become segmented based on national and linguistic boundaries.
I like being able to browse Japanese web sites, for instance. But if I had to use kanji to get there? I'd have a hard time doing so. Most English speakers would likewise. This is so simply b/c they already speak the most common tongue. Conversely, numerous Japanese citizens are familiar with the Roman characters, and even some basic English.
We should consider this carefully.
As my Pappy used to say, "you just can't get here from there"
isn't slashdot rendering html like shit today. shouldn't you guys be testing on a test server or something?
As far as I can tell, every client's "gethostbyname()" is going to have to be modified to support this, no matter how it's implemented. Right?
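One way to picture the client-side change: the conversion can sit in a thin wrapper in front of the existing resolver call, leaving the resolver itself untouched. The sketch below assumes Python's built-in `idna` codec; `resolve_idn` and its `resolver` parameter are hypothetical names, and the parameter exists only so the lookup can be swapped out when testing.

```python
import socket


def resolve_idn(hostname, resolver=socket.gethostbyname):
    """Convert a possibly non-ASCII hostname to its ASCII form, then resolve it.

    `resolver` defaults to the ordinary socket.gethostbyname; it only ever
    sees the ASCII-compatible name.
    """
    ascii_name = hostname.encode("idna").decode("ascii")
    return resolver(ascii_name)


# Usage with a stand-in resolver that just reports what it was asked for,
# so the example does not touch the network.
print(resolve_idn("b\u00fccher.ch", resolver=lambda name: name))
```

Under this arrangement it is the applications that accept typed names (browsers, mail clients) that need the conversion step, while the lookup path below them keeps seeing plain ASCII.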
That's a huge number of machines to be upgraded anyway. So why not do a clean design? The added cost of upgrading the DNS servers will be minuscule compared to the client-upgrade costs.
This reminds me of the flaw with Verisign's SiteFinder so-called "service" -- where they mistakenly put a client-side feature on the server-side. With Punycode translation, they seem to be making the opposite mistake -- they're applying the character translation on the client-side instead of the server-side where it belongs -- the server already provides translation services, so why not simply add in the Unicode translation as well?
It's a crappy solution to get rid of the symptoms - 8-bit brokenness of DNS and SMTP servers and web browsers. Those apps should be fixed in the first place to be 8-bit clean so we can finally use something other than ASCII that actually makes sense, such as UTF-8.
*.Dar-al-Kufr
*.Dar-al-Harb
What more do you need? :-)
--- Ban humanity.
Sounds like a great idea.... If you're willing to re-implement the DNS code in my Win-95 box.... or on my Amiga-4000. How about my 10 year old Apollo workstation or the SUN-3 that's still working just fine, thank you. etc. etc.
A lot of old DNS implementations would choke (and properly so) on UTF-8 encoded DNS names. We probably could have seeded the needs of the future by saying that IP-6 DNS servers should support unicode, but I think that even that boat has been missed. (or is quickly leaving dock).
In the meantime the old DNS and its anglo-centric presumptions and restrictions are with us for the next few years (or decades, as the case may be). Clearly some people feel the need to live within those restrictions.
Free Software: Like love, it grows best when given away.
Geeks do, but your average surfer does not. They go clicky clicky on the results returned by the search engine or clicky clicky on the link someone emailed them or clicky clicky on the link from some other website.
Most users don't even *know* that you can type stuff in the Address field.
I have discovered a truly marvelous sig, unfortunately the sig limit is too small to contain i
"djbdns doesn't support unicode either, although it doesn't rely on standard c-libraries, so unicode support might only take a few weeks to add."
djbdns is 8-bit clean. Use UTF-8 all you want right now.
I'm asking because today, I've tried out the Netsol way of doing umlauts and they don't work at all with my Mac OS X and Safari: None of the listed domains work. The page lists a "plugin" that every web user is supposed to install, but it's Win only (of course...) and it's quite silly to have a domain with umlauts if you have to tell all your customers "before visiting me, please install this plugin"...
Any idea if this new way works in all circumstances where the user has an international keyboard? Thanks!
Base64-ed UTF-8? String comparisons can still be carried out in the encoded form anyway. Nothing except the browsers needs to be updated?
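A small experiment suggests why plain Base64 is an awkward fit for hostnames. It assumes only Python's standard `base64` module; the label is an illustrative choice, and the lowercasing step stands in for the fact that DNS compares ASCII letters without regard to case.

```python
import base64

raw = "b\u00fccher".encode("utf-8")  # illustrative label, an assumption of this sketch
encoded = base64.b64encode(raw).decode("ascii")
print(encoded)

# Base64 output mixes upper and lower case and may contain '+', '/' and '=',
# none of which survive as-is in a traditional hostname.
print(any(ch in "+/=" for ch in encoded))

# Simulate a case-insensitive system that lowercases the name on the way through.
lowered = encoded.lower()
print(base64.b64decode(lowered) == raw)
```

Once the case is lost the original bytes cannot be recovered, so an ASCII-compatible scheme for DNS has to avoid depending on letter case and stay within letters, digits and the hyphen.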
...most certainly do not welcome our new Unicode-munging overlords.
I don't care what the issues are. I have had it up to HERE with charset issues! ENOUGH ALREADY!
If you can't do it using UTF-8, don't do it at all!
Dammit.
Is this truly the only Earth I can live on?
One thing that's nice about using roman-character-based languages for DNS is that there's only one REAL way to spell things. But what about in Japanese where you can use kanji, hiragana/katakana, or romaji?
For example, what if someone wants to have the domain "nanisore.com" (translated: "what is that?"). Do they get the domain in kanji/hiragana, all hiragana, or all romaji? I guess they have to buy all combinations if they want the domain exclusively. Ugh.
(That must be the point of this whole exercise. Make the domain registrars more money by creating the need to register multiple domains where one would have sufficed previously! :) )
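To see that the different spellings really are different names as far as DNS is concerned, here is a sketch using Python's built-in `idna` codec. The Japanese renderings of "nanisore" below are this sketch's own assumption about how the phrase would be written, not something taken from a registry.

```python
# Several plausible written forms of the same spoken phrase.
spellings = {
    "kanji+hiragana": "何それ",
    "hiragana": "なにそれ",
    "katakana": "ナニソレ",
    "romaji": "nanisore",
}

# Each spelling maps to its own ASCII-compatible label.
ace_forms = {
    kind: text.encode("idna").decode("ascii")
    for kind, text in spellings.items()
}

for kind, ace in ace_forms.items():
    print(kind, ace)

print(len(set(ace_forms.values())), "distinct labels")
```

Nothing in the encoding step ties the variants together, so any bundling of spellings would have to come from registry policy.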
Microsoft.com
Microsoft.com
and
Microsoft.com
It's tough to tell the difference, but it's there for those who can see it.
To signal use of this encoding, Switch et al. propose prefixing domain names that use it with the sequence "xn--".
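A minimal sketch of what the prefix buys a client: it can tell from the label alone whether to run the Punycode decoder. `ACE_PREFIX` and `to_unicode_label` are hypothetical names, and the decoding relies on Python's built-in `punycode` codec.

```python
ACE_PREFIX = "xn--"


def to_unicode_label(label):
    """Decode one DNS label if it carries the ACE prefix, else return it as-is."""
    if label.lower().startswith(ACE_PREFIX):
        # Strip the prefix and decode the remainder as Punycode.
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


for label in ["xn--bcher-kva", "example"]:
    print(label, "->", to_unicode_label(label))
```

The flip side is that any ordinary label that happens to begin with "xn--" would be treated as encoded, which is why the prefix has to be reserved.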
Does anyone else see a problem here?
<thememusic> ....
Your assignment, should you choose to accept it, is to use the xn-- prefix and punycode to register a domain in the Swiss, German, or Austrian country domains which will transliterate to g*****.** in the asian script of your choice.
There's no need to put accents on things, you can spoof just as well without. For example: the Greek omicron, Russian lowercase o, and Latin lowercase o all look identical... but they are all different Unicode characters!
Unless the registries all implement some sort of canonicalization, owners of domain names containing the letter "o" are going to have a combinatorial explosion!
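The three look-alike letters and the resulting explosion can be sketched with the standard library. `LOOKALIKE_O` and `spoof_variants` are hypothetical names, and "google" is only an example label.

```python
import itertools
import unicodedata

# Latin o, Greek omicron, Cyrillic o: three code points, near-identical glyphs.
LOOKALIKE_O = ["o", "\u03bf", "\u043e"]

for ch in LOOKALIKE_O:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))


def spoof_variants(label):
    """Return every spelling of `label` with each 'o' swapped for a look-alike."""
    choices = [LOOKALIKE_O if ch == "o" else [ch] for ch in label]
    return ["".join(combo) for combo in itertools.product(*choices)]


variants = spoof_variants("google")
print(len(variants))  # 3 ** (number of o's in the label)

# Each variant is a different name on the wire, even if it renders the same.
for v in variants:
    print(v.encode("idna").decode("ascii"))
```

With only three candidates for a single letter the count already grows as 3 to the power of the number of o's, before any other confusable letters are considered.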
Just to diverge, I'd like to represent the non-english speaker view here.
In most of the languages with 'funny accents' like umlauts, these characters often have a completely different pronunciation, and are often considered to be a completely different letter than without the 'accent'.
Simply 'brushing off the dirt' and removing the 'accent' thus changes the word. Sometimes with weird results.
Just ask someone from the town of Moensteraas, Sweden.
Their website contains mostly municipal information intended for swedes, but due to the restrictions of DNS, the name is instead spelt 'monsteras', which means 'monster-carcass' in Swedish.
Obviously, these people would be happier spelling it with umlauts on the o, and a ring over the a.
I've had a hard enough time trying to educate my family to check that a URL in an email is actually from the domain they say it is from (ebay, etc).
Now I have to deal with teaching them to recognize domains with accented characters to be fraudulent. Here's to another wave of harder to detect social engineering.
of an article I read a while back, about "URL hiding by using alternate character set". I did a little bit of searching, and came up with this
One problem with non-Latin scripts is that cybersquatters could begin registering non-Latin versions of popular domain names in order to divert viewers from intended destinations. Two Israeli students did just that in order to make an international point: They registered microsoft.com using the Russian Cyrillic "o" and "c," an international domain that looks exactly like microsoft.com in English even though it is in fact a different domain name.
Whole text can be found here
Here we are arguing about new standards supplanting old ones. We've already got esperanto, and - being technical people here on slashdot - we already understand it. Let's just make it the standard.
Oh, it didn't catch on? Damn, I thought I had this one licked.
Anyway, if the characters won't render on every machine properly, then it'll be a great day for the crooks out there who already do a pretty good job of fooling the masses (can you say paypa1.com? I know you could.)
FWIW, when talking about languages, especially the rise of English, I view most of the comments from the French as sour grapes. I mean, how insulting is it to have to admit that the Lingua Franca of the world is now American (which is somewhat distinct from, but clearly similar to "English," which is spoken by very few - mostly in primary education classes devoted to that topic).
Is it just my observation, or are there way too many stupid people in the world?
Did they really require fags to wear a pink triangle in Nazi Germany? Those crazy krauts, gotta love their sense of humor!
Only the English drive on the left side, and they aren't the EU last I checked (they don't even use euros)
Now, suppose that Cyrillic letters are added to the DNS in the future. Many people say that Unicode should be used, and that implies Cyrillic too. It is impossible to visually distinguish a Cyrillic "o" from a Latin "o", yet their Unicode code points are different. In other words, you cannot be sure that the URL containing "o" is the one you want -- it may be one that is visually the same, yet the codes behind it may be different. Thus an attacker can lure people with URLs that look proper, yet resolve to completely different IP addresses, namely the attacker's.
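For anyone who wants to see the difference that the eye can't, a small Python sketch along these lines lists where two look-alike names diverge. The spoofed string and the `describe` helper are invented for the example.

```python
import unicodedata

latin = "microsoft"
spoof = "micr\u043esoft"  # first 'o' replaced by CYRILLIC SMALL LETTER O


def describe(label: str):
    """Return (character, code point, Unicode name) for each character."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in label]


# Positions where the two names differ, with code point and name for each side.
differences = [
    (i, a[1:], b[1:])
    for i, (a, b) in enumerate(zip(describe(latin), describe(spoof)))
    if a != b
]

for position, expected, found in differences:
    print(position, expected, found)
```

Both strings render the same in most fonts, but the comparison reports one mismatch at index 4: U+006F on one side, U+043E on the other.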
First of all, this opens a huge hole for url hijacking and obfuscation.
Say for instance, you get a spam that has a url to http://www.microsoft.com/freeoffers
You too were tricked, but you'll notice that instead of a normal i, it is instead replaced with an accented i or an i with a grave (slashdot strips these btw). Anyone that doesn't use accents (english, japanese, chinese, etc) probably won't catch the minor detail and will probably think that it's really pointing to www.microsoft.com.
This is very similar to, but less obvious than using:
http://www.microsoft.com@via.gra.biz/offers
Most non-tech internet users will also believe this to be Microsoft's web-site. Spammers will have a field day with all of the new opportunities.
The second non-technical problem is that say I want to go to a Japanese web-site that doesn't have an English URL. If I don't know kana/kanji (like most people don't), then I don't know what characters to type in to get the correct Japanese. I would have to get a dictionary and look up each character to figure out what to type.
I agree that it's lame to only have it in english, but at this point, any country that uses the internet already has the ability to type english, but now they will need to be able to type in Japanese, Chinese, Russian, Greek, etc, etc, etc....
If you have a domain name like www.samba-choro.com.br here in Brazil, the registrar won't let anyone but you register www.sambachoro.com.br (without the hyphen). Hope they are smart enough not to let anyone register the accented version too.
Why not? We need more fonts, better input methods, and ideally, better keyboards. Our 26 letters, and the ten or so other symbols isn't enough anymore--for the internet guru, this restriction should feel restraining. 101-key keyboards? Please. Keyboards with more keys aren't exactly hard to make.
The only real obstacle is getting past the sentiment that English or Latin-based languages are the only important natural languages. The foundation technology for true multi-lingual access is already here.
> It might surprise some people, but there are quite a lot of people who don't speak a single word of english. ;-)
Really? I have never spoken to any of them
Oh, the typewriter days.
;-)
I'm getting all warm inside.
You know, I'm only 23 and I can still recall doing those accent tricks on a typewriter? (A good ole Dutch one with a letter "ij" under the left little finger -- now why did we have to loose that letter in the Information Revolution when the Germans still got their Umlaut? We have been walked over!)
And having a C= 64 of course. You know, my grampa has seen some history passing in his life, but so have I!
"We can confirm that Debian does *not* ship the version with the trojan horse. Our version predates it." [CA-2002-28]
scripsit isaac338:
I would add that anything Indo-European using the Latin, Greek, or Cyrillic alphabets doesn't count. A smart American can fake his or her way through a lot of signs in anything from Spanish to Russian. Try that with Hebrew or Korean.
In principio creauit Linus Linucem.
Why not just incorporate this into the IPv6 standards?
This sig no verb.
I learned that you're a confrontational twit from MTV. Oh, baby, baby...
Another ignorant american? The American historical perspective is constrained to the last century it seems. The world has been going on for a lot longer than you think, 'Joe'.
The most glaring evidence of this conclusion is the confusion in the parent about who is the historical aggressor and who does the majority of the surrendering.
France is one of the all-time most aggressive countries. They've taken on almost all of Europe at once, and prior to the last century, no single country managed to defeat them. And the only reason France collapsed so badly in WW2 is that they were hurting from the double impact of the depression and massive losses from WW1. Surrender and subversion was their only option, seeing as how the Americans did not see fit to provide backup for them.
Does this matter to the high-school-history-challenged American? Nope. Apparently not...
And Germany, a traditionally pacifist country, was involved in just a few wars in its history (and only in the two of recent times were they aggressors)
sigh...
FUCK!!!!!!
I thought this idea was put into the grave years ago. It required a crazy plugin and of course nobody had it.
Now to the problem. In Russian some letters look like Latin ones and others don't. So if you type aroma.ru you get to the company site. Now imagine that they registered the name apoma.ORG. In Cyrillic. I.e. that 'p' is a Cyrillic 'r'. And the other letters are Cyrillic too. But .ORG ISN'T!
What do I have to enter when I see (and remember later) that URL on a street ad? If I see Latin .ORG then the whole URL must be written in Latin letters (there are no 'R' or 'G' letters in Russian), no? I go to apoma.org and see something really different from a Russian wine supplier.
Even if everyone got used to domain names containing both (and I don't know how to explain that to the average dumb net user), the confusion will happen too often.
The domain name system is already messed up.
Domain name spoofing has already been done through simpler means (m1crosoft, http://www.microsoft.com@1.2.3.4/, ...) which are more than enough to fool Joe Average.
A month ago an old domain name of mine (that I hadn't renewed in time) was re-registered by someone else hours after it was released. The domainname in question was a long, flemish name with a very specific meaning, so it was a deliberate overtake. The squatters are simply comparing snapshots of the domain name DB's to find expired domain names.
The new owner was a shady company... they look like a registrar, but to buy a domain name you first have to fill in a form where you enter nothing but the wanted domainname and your email address: no doubt so they can register it before you can, and then force you to buy it at any price they like (namegiant.com).
It might not be important for English speakers, but even accents can cause differences in meaning. Plus all those 'squiggly chinese characters' are the normal way of writing that particular language. If it bothers you that you cannot read it, learn the language.
I don't see the web getting any much worse from this, yet it offers significant advantages to the rest of the world.
And germany, a traditionally pacifist country
Doh?! You obviously have never heard of Prussia, have you? They practically invented militarism. Living there basically meant being in the army.
Enlarge your code Today!!! And inches in length and diameter!! Make her scream with delight!
--
"Outlook not so good." That magic 8-ball knows everything! I'll ask about Exchange Server next.
my favorite quote from the spec is "The ratio of basic string length to extended string length is small."
That means the extended string length is much longer than the basic string length....
Somehow I don't think that's what they meant.
Never. ;) Actually, it's not so much the Holocaust that the g'parent was mentioning - more respect for Germany's war abilities, like winning a war with France as soon as it even existed as a unified nation. ;)
Americans are just tweaked at france cuz it didn't fall in line like Britain did.
Not the case - if that were true, you wouldn't find a litany of France jokes before 2001. But you do, because France has been a military joke for the last 200 years, approx. It's also the massive ingratitude following WWII displayed by France that tweaked Americans, combined with France's complete military ineptitude.
It's not France's not supporting the war that angers Americans so, at least me. It's the fact that they were more interested in playing politics simply to get revenge with America for some perceived slight, which prompted the move they made. That and France was trying to become a friend of Saddam, trying to get sanctions lifted so they could trade with Saddam, etc. I truly think that with France's support early on, the whole scenario could and would have been resolved without a war.
On the other hand, most american folks are OK with Japan these days... odd
I would say it's the way that Japan, in the wake of getting nuked, turned into a very peaceful, hard-working society. Hard not to respect that a bit.
-Looking for a job as a materials chemist or multivariat
It's not just the anglo-chauvinists who want to leave domain names in ASCII. There are strong arguments for it. Some things ... word-processors come to mind ... obviously must be able to use a large character set like Unicode. But the argument for expanding the character set that domain names can be written in is much weaker. And if any move in that direction breaks a lot of existing implementations, maybe it just shouldn't be done.
use lose
not loose
you know it makes sense kids
You realize umlauts in domain names will only bring about another wave of hair metal bands, don't you?
It goes from God, to Jerry, to me.
For Gawd's sake, this is SLASHDOT, Man! You must mean mute point!
Personally, the current implementation of IDNs is just a cop out to provide a marketable option for the folks out there. It's all just following the horrendous RFC guideline for timeliness. Wouldn't it have been better that every system just switches to accepting 8 bit, and hence unicode?
Will these domain names be Funkifiable?
For example, http://slashdot.org == http://1109654166/.
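The arithmetic behind that trick is just reading the four octets of an IPv4 address as one base-256 number. A quick Python sketch using the standard `ipaddress` module (the two helper names are mine, and whether a given browser still accepts the integer form is a separate question):

```python
import ipaddress


def dotted_to_int(address: str) -> int:
    """Read a dotted-quad IPv4 address as a single 32-bit integer."""
    return int(ipaddress.IPv4Address(address))


def int_to_dotted(number: int) -> str:
    """Turn a 32-bit integer back into dotted-quad notation."""
    return str(ipaddress.IPv4Address(number))


print(int_to_dotted(1109654166))       # 66.35.250.150
print(dotted_to_int("66.35.250.150"))  # 1109654166
```

None of this touches DNS at all, so the question about IDN names boils down to: resolve the name first, then play the same game with the address you get back.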
Join moola.com, play games to earn money.
I know there are times when different accents sometimes indicate different words -- but I'm under the impression that it is unlikely that more than one of them would be a "good" domain name. (Am I wrong about that?)
This won't work for non-Latin characters, obviously. But UTF-8 seems like a better solution to that. (I understand that most Chinese words are 2-3 characters of 3 bytes each in UTF-8 (the unified ideographs run from U+4E00 to U+9FA5, and only code points up to U+07FF fit in 2 bytes), for 6-9 bytes -- clearly less than 63 bytes.) The obvious downside is that it means that all DNS servers and resolvers must (at least!) be 8-bit clean.
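The byte counts are easy to check. This Python sketch measures a few sample labels against the 63-byte label limit; the sample names and the helper functions are only illustrative.

```python
MAX_LABEL_BYTES = 63  # per-label limit in classic DNS (RFC 1035)


def utf8_label_length(label: str) -> int:
    """Number of bytes a label would occupy if sent as raw UTF-8."""
    return len(label.encode("utf-8"))


def fits_in_label(label: str) -> bool:
    """True if the raw UTF-8 form stays within the DNS label limit."""
    return utf8_label_length(label) <= MAX_LABEL_BYTES


samples = ["example", "m\u00fcnchen", "\u4e2d\u6587"]  # ASCII, one umlaut, two ideographs
for label in samples:
    print(label, utf8_label_length(label), fits_in_label(label))
```

The umlaut costs one extra byte (8 bytes for 7 characters) and each ideograph costs three, so the two-character Chinese label comes to 6 bytes.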
Motorhead and Motley Crue rejoice!
- DRFSR
This system:
-Requires a browser plugin (therefore doesn't work with mail, IRC, etc)
-The plugin is Windows only and IE 5+ only
-shortens domain names by 4 chars + at least 2 more per every accent/umlaut/kanji
-Is mostly unreadable in its ASCII-fied form
...and this is supposed to be *better* than upgrading our DNS servers to support UTF-8?
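The length penalty in that list can be measured rather than guessed. Below is a rough Python sketch using the raw `punycode` codec from the standard library plus the "xn--" prefix; it skips the Nameprep step a real client would run first, and `to_ace` and `overhead` are names made up for the example.

```python
ACE_PREFIX = "xn--"  # the marker prefix mentioned in the article


def to_ace(label: str) -> str:
    """Encode one label with raw Punycode plus the prefix (no Nameprep step)."""
    if label.isascii():
        return label  # plain ASCII labels are left untouched
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def overhead(label: str) -> int:
    """Extra characters the encoded label needs compared with the original."""
    return len(to_ace(label)) - len(label)


for label in ["example", "m\u00fcnchen", "\u4e2d\u6587"]:
    print(label, "->", to_ace(label), f"(+{overhead(label)} chars)")
```

The exact cost depends on which characters appear and where, so treat the "4 + 2 per accent" figure as a rule of thumb rather than a formula.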
0 1 - just my two bits
I found the technical section of the Switch article didn't fully explain the encoding. After thinking that I should code up the RFC to figure it out (and perhaps gain a name for myself on CPAN), I found to my mixed delight that someone had already beaten me to it: IDNA::Punycode and Encode::Punycode.
--LP
No, wait.
.. .. ..
..
Already getting that.
' '
Digital Cable Booster - only 49.95 izc x uxqdshl
D'oh!
If "trilingual" means someone who speaks three languages, and "bilingual" means someone who speaks two language, what's English for someone who can only speak one language?
...it loses a lot in translation, but you get the idea.
American.
My post should be understandable anyway but slashdot left out the non-English characters o with two dots over it and a with a ring over it in the names Monsteras and Horby.
Karma. Moderation. Is my
Prussia hardly invented militarism... Sparta, anyone?
-BIFF
(antilamenessfilter lowercases and some more)
HEY! Who's modding a legit first post as FLAMEBAIT???
The internet is now worldwide, but it originally started as ARPANET - an American military project!
I'm American, but honestly most Americans can't see past the nose on their face!
Somebody mod me back up!
... unless we're faking our humanity for advertising purposes only, of course.
Thank God a mega-corp isn't actually demonstrating REAL humanity for advertising purposes. That would cause devastating confusion.
Let's not even get started with genuine corporate humanity: my cynicism is perfectly functional under the current arrangement.
-kgj
And here i was thinking that South East Asia is where most people live and where the fastest growing economy is.
But of course you mean that USA companies also need a Chinese domain name... to not be limited to the few people on this world that only speak English.
European Linux user, living in Antwerp
First they destroy the very concept of a clean, simple and reasonably useful markup language because they couldn't care less about organisation and contents as long as they can show shiny bits to mesmerize the consumer.
Took years to begin to fix that one.
Having DNS support other character sets is a consequence of the second of their monstrous brain farts: confusing addressing and indexing.
A domain name is NOT an index keyword. It has no meaning beyond setting up a delegation structure. 'foo.com' need have nothing to do with foos; it's just a frigging address.
My postal address is "40 Some Street, Laval, Qc, CA" Canada -> Quebec -> Laval -> exact location. Repeat after me. De-le-ga-tion.
Let's do a little brain experiment. Let's say all the postal services of the world decided to go the DNS way:
My address could be marc-andre-pelletier.programmer.
It would be a nightmare for any postal service, because rather than delegate the information to find where the house is to increasingly regional authorities, they'd have to look me up in a huge database to figure out where to send my mail.
It would also be confusing because of mark-andre-pelletier.programmer and marc-andre-peletier.programmer ad nauseam.
Of course, then there is the conflict that will ensue because there is inevitably another programmer with my name who may or may not be better known than I and will want my 'address'.
Sounds familiar?
This is what is inexorably happening with DNS. It's too late to change any of this; the marketroids already imposed that fuck-up and domain names are already worthless technically.
So what do we do? Let's ask the marketroids. "Let's increase the breadth of the flat namespace even further!"
Sigh.
Allowing more (conceptual) codepoints in domain names isn't being nice to non-English speakers. It's just adding to the confusion with no useful value.
Maybe we should petition the phone companies to get unicode phone numbers. It'll make your phone ring much less often -- not only will many people not be able to remember your number right (or even successfully write it down (how many people know how to write an uppercase epsilon?)) but many people won't even have a phone with the right symbols.
Who knows; maybe telemarketers will stop calling when they can't figure out your number.
-- MG
I'll just be the devil's advocate here:
The premise of this entire project is that we really *need* european letters for domain names. I say we don't. (I'm not being U.S.-centric... hell, I'm not even American.)
The point is:
Let's say I claim to need a ' ' (space) in my domain name. Or let's say J.Crew needs a '.' for their URL. Or let's say I need an exclamation point for my domain name here in the U.S. in order to correctly reflect my copyright.
The point is that domain names were *never intended* to completely reflect spoken or written language -- English or any other. They are and have *always* been a 'semi-representational' system.
The undercurrent of most of the discussion here is that the current lack of unicode characters reflects a sort of digital American unilateralism.
This is a crock.
Let's not forget that what the current lack of unicode actually represents is a *non-capitalist* system where brand names and precise spellings are less important.
I for one am quite happy to approximate German letters with the closest equivalent. And if Volkswagen gets upset about it then maybe their advertising agency should have been thinking globally and rebranded without an umlaut.
Pthtp!
------ The best brain training is now totally free : )
I guess the pun(s) was(were) intended by the author of the coding? Will have to Google around a bit...
Loose is loose because it has two o's.
Lose has lost because it has one o.
The solution here is to not use domain names for verification rather than trying to force the whole world to use Latin characters for naming internet sites. As you mentioned, there are already problems with using Latin. A bank system should already have a digital certificate, and that's mostly what we're worried about.
The reason not to use Unicode is that it won't be backwards compatible with the old DNS standard, which was meant to be compatible across ASCII and non-ASCII systems.
autopr0n is like, down and stuff.
Ohhh the arrogance of americans.
Here's an example of why this is good. In sweden there is a town called Horby. That's 'o' with two dots over it. Their site has to be named 'horby' as it is now (without the dots). Horby means 'the village of whores' in swedish.
Do you think that billions of people who use other alphabets than the american one, are going to agree with anything you said in your post?
This change IS a big deal, not only for small towns, but for loads of big companies, government websites and all kinds of sites you can think of.
Will code a sig generator for food
Anything that has a link to a non-"old-standard" site in the body of the message yet got sent to a plain old Latin US domain can easily be thrown into the SPAM folder.
Might not work for companies but certainly will for individuals.
Hyperom.com
Why monsteras instead of moensteraas?
Good question. Basically people don't think, or are too lazy, to transliterate the letters properly.
Some places have the forethought to register both:
Munich in Germany has registered both "munchen.de" and "muenchen.de".
(But it's really a u with an umlaut)
http://www.xn--rksmrgs-5wao1o.se/ will work if you are using a recent Mozilla
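If you are curious what that host name says once decoded, Python's built-in `idna` codec can undo the encoding. This is only a sketch of the decoding step, not of what Mozilla does internally.

```python
host = "www.xn--rksmrgs-5wao1o.se"

# Each label is decoded separately; labels without the "xn--" prefix pass
# through unchanged.
decoded = host.encode("ascii").decode("idna")

print(decoded)
```

The middle label comes out as a Swedish word with three accented vowels (a-umlaut, o-umlaut and a-ring), which is presumably why it was picked as a test name.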
Thanks for the example. Let's do a few quick tests.
The encoded version always works, and leads to a page where you have an unencoded link (normal spelling with the accents).
Copied the unencoded version, and tried:
On WinXP:
- Mozilla 1.4 : OK
- MSIE 6, Opera 6.2 : NO
On Linux - Red Hat 6.2 (of course, that's a pretty old system):
- lynx, ping, host, dig, ... : NO
(cannot test Mozilla, since this server has no GUI.)
Well, I guess we'll have to live with that horrible Punycode.
fleshes out some of the issues.
we've all switched to IPV6. =)
Seriously, who's going to use one of these domains when big chunks of your visitors can't call up your site, or at least can't call it up with the "same" address (I gather from one of the earlier posts that there is some sort of encoding to make it magically work for everyone).
Why not UTF-8? I suppose that might have broken apps that expect ASCII-only, which I guess is one of the reasons we now have base64 or quoted-printable encoding for email. But it sure would have been nice. Reading domain names in Punycode in many cases will not be a lot easier than reading binary... :(
This method allows new domain names to work with existing DNS servers and clients. While old clients won't display internationalized domains, they will still be able to use them. In other words, it doesn't break anything that currently works. While Unicode is ultimately a good solution, it requires every piece of software that uses domain names to be rewritten to handle Unicode. This means everything from mail clients, web browsers, and web servers to DNS servers, mail servers etc.
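One way to see the "doesn't break anything" point: the encoded names still pass the old letters-digits-hyphen check that legacy software applies to host names. The regular expression below is my own approximation of that rule, and the host names are only examples.

```python
import re

# Approximation of the classic host label rule: letters, digits and hyphens,
# 1 to 63 characters, no leading or trailing hyphen.
LDH_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def is_legacy_hostname(name: str) -> bool:
    """True if every label would be accepted by pre-IDN validation."""
    labels = name.rstrip(".").split(".")
    return len(name) <= 253 and all(LDH_LABEL.match(label) for label in labels)


print(is_legacy_hostname("www.xn--rksmrgs-5wao1o.se"))        # True
print(is_legacy_hostname("www.r\u00e4ksm\u00f6rg\u00e5s.se"))  # False
```

The encoded form sails through unmodified software, while the raw accented form is rejected before a DNS query is ever sent.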
The world is a big place. You ought to get out and see more of it.
This is an odd statement. The truth is that those who actually do get out and see the world are those most exposed to the problems that miscommunication causes. Far too many people tie their culture to their communication and are unwilling to change. If most countries can seem to standardize on the metric system without much problem, why can't they all standardize on a language (any single language; not necessarily English)? If we can all share Euros, why can't we all share the same word for money?
When you get to the web page it's all in this funny Kanji stuff. Clearly they should all just write in American english;)
doesn't the A in ASCII stand for American?
So instead of ANSI don't we need an INSI (I standing for international) standard?
That sounds like a very good idea. To make it shorter, we could call it ISO (International Standards Organization). Yeah, what a terrific idea. Let's get a domain name straight away. Maybe iso.org? Sounds cool? Well someone took that already: ISO.
"Here's an example of why this is good. In sweden there is a town called Horby. That's 'o' with two dots over it. Their site has to be named 'horby' as it is now (without the dots). Horby means 'the village of whores' in swedish."
rofl
those two lines convinced me why you're right
Capitalization is language dependent.
Just as the quality of being a "different letter" is language dependent.
But, certainly we can make a good compromise that most people will like, for global capitalization.
re: But if 16-bit charsets are allowed...
16-bit charsets ?
Perhaps you confuse Microsoft's implementation of outdated Unicode 2.1, and their deep confusion between character sets and character byte encodings.
But if the essence of your argument is that we should not allow Korean domain names which contain both Chinese and Hangul, your argument seems to me to be badly misguided. I argue we should give people what will please them, rather than what seems easy to implement.
AFAICT, your argument is that we should not allow anything outside of ASCII, because it will complicate your life. I assume this means denying even basic Korean domain names (say, in Hangul only). That seems worse than misguided to me -- that seems to be what we started with -- the mantra of "English only and forever".
I always find it annoying when that happens, how come all aliens speak English? Babylon five handled it pretty well as far as I can remember, in fact in a Crusade episode some English speaking aliens show up and the humans get freaked because of that. I won't even get into the humanoid aliens thing (Star Trek being the biggest offender of all).
Alien languages are always severely mistreated, it's a pity, really.
Unfortunately, djb's "proposal" boils down to "use UTF-8" with no interoperability for RFC 1035 clients (the idea that "all relevant software has been upgraded" will ever happen is simply ludicrous). Only allowing registration of dissimilar characters is a good move, but he should have gone further and mandated a mapping from prohibited to allowed characters (if the problem is that the user can't distinguish them, how are they supposed to know which one to type?)
The script direction would seem to me to be just a display issue. However the user enters the domain name, it would be encoded and transmitted in just one direction. I think that even scripts written vertically (as mainland Chinese was until the Communist Party adopted the horizontal as a standard) should be supportable.
Because standardizing on a language is just slightly more complex than standardizing on a measurement system or currency--not to mention the cultural implications (though I wouldn't expect an American to understand that).
Actually, /.ters, we should ask them for a domain name "/..org" ... i will add this: /..com. in a 66.35.250.151 to my company's dns server
... if i don't come back in 5 minutes, i know why : )
WTF am I doing replying to an AC at 5 A.M on a Friday night?
I am actually quite concerned about the push to internationalize DNS.
It is not that I don't have things to gain from it -- people would be buying more domains, and my company, among other things, sells domains. I also speak two languages; one of them requires accents in some situations. It would be nice to be able to include them.
So it's not that I don't understand the attractiveness to the various stakeholders.
BUT, from a practical perspective, I think it is a nightmare. We've already seen situations where people register paypa1.com (the last character there is a one, not the character l) and use it to grab people's info. Additional possibilities include spammers registering domains similar to others' and sending spam with a URL on that domain. Or entries in syslog. With the limited character set currently allowed, the only thing that can happen is people who aren't looking closely, or are using certain fonts that don't necessarily distinguish things as well as they could/should, get burnt. But if we implement international domains, there will be a LOT of ways to register names that are incredibly similar -- and depending on how much of unicode/utf-8 we implement, it would actually be possible that there would be two different encodings for a character that is *supposed* to appear exactly the same on screen.
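The "two different encodings for the same character" worry is real in Unicode: an accented letter can be stored precomposed or as a base letter plus a combining mark. A short Python sketch of the problem and of normalization as the usual remedy (the sample word and the `nfc` helper are mine; Nameprep itself specifies the related NFKC form):

```python
import unicodedata

composed = "caf\u00e9"     # e-acute as one code point (U+00E9)
decomposed = "cafe\u0301"  # 'e' followed by COMBINING ACUTE ACCENT (U+0301)


def nfc(text: str) -> str:
    """Normalize to the composed form so equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", text)


print(composed == decomposed)            # False: different code point sequences
print(nfc(composed) == nfc(decomposed))  # True: identical after normalization
```

So the nightmare is avoidable, but only if every registry and every client normalizes before comparing.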
What a nightmare.
SSL Certificate
I'm just happy that I can finally start my Motley Crue fan site:
www.motleycruerox.org
(slashdot isn't allowing posting of html entities/unicode, :P )
They put the "uber" in "umlaut"! Rock on!
p.s.: this was, of course, the real reason why unicode was invented.
> He claimed to have created the Internet. I fail to see the distinction between creating an invention and inventing it.
No, he claimed to have taken the initiative in Congress in creating the Internet.
Get it? Congress created the Internet; Al Gore led the charge. That much is true, it's on the public record. Without Gore's efforts, the Internet as we know it today would not exist.
Yes, he could have been less boastful and he could have worded it better. But he didn't lie, and he didn't say he 'invented the Internet.'
This has been a public service announcement.
Why not extend dns to support unicode? That way they'd be no translation or other crap to go through.
Because some total fuckhead at the patent office allowed the idea to be Patented.
echo '[q]sa[ln0=aln80~Psnlbx]16isb572CCB9AE9DB03273snlbxq' |dc
Just get everyone to remember the IP address of all sites they are ever likely to visit.
Roll on IPv6!
Ceci n'est pas un sig.
Saying that because english is not standardized does not address the problem. What we really need to do is have a world wide standard language. Esperanto sucks, but, something english like could gain considerable traction.
Some principles of this language:
a) a phonetic language. a symbol in writing should represent a sound produced by humans.
b) one sound, one symbol. I agree that the relationship between phonemes and graphemes is a complete mess. Many to many relations between the two cause needless confusion.
If all we did was to have a single set of characters represent a single set of sounds, we would be infinitely farther along than we are now. I think UNICODE solves the wrong problem.
We argue in favor of standards for all levels of human work - suggestions that some other protocol than http be carried on the web are oft met with derision, that some other database language than SQL ought to be introduced decried as heresy, and often cultural arguments for technology are dismissed as laziness. But, those same arguments can be applied to human language, and arguably, they should carry greater weight, as, what's the point of any standard if half the world cannot read, or if cultural differences in languages render meanings subtly different?
One World == One Language.
Everything else is unimportant.
This is my sig.
For the love of GOD, or just plain your eyesight, do NOT visit: this site.
If you do visit this site, please post your experiences with the rest of slashdot so we can laugh at your stupidity.
Go on. I double. Naaa. Triple. Naaaaa. Pentiple dare you to visit the above site.
Thank you. HAND.
Do foreign languages have alphabets, or is the concept generalized from its original meaning? alphabet. aka. alpha beta. aka. the first two letters of the latin, for want of a better word, alphabet.
So should other languages have squiggle1squiggle2 as their name for an alphabet?
Curious, or brain dead, people want to know.
DARPA and USC contributed money and goals, but Bob Kahn, Vint Cerf, Jon Postel, and their contemporaries had already created the Internet well before Gore's involvement.
Of course Gore deserves a lot of credit for what he did. But "creating the Internet" is not what he did (that was already done). Cerf's well-earned respect is leading him to overlook the plain meaning of Gore's statement.
And the point remains, he didn't claim to have 'created the Internet.' Congress did. Vint Cerf et al did their research with money approved by Congress. Al Gore was the major legislative backer for NSFnet, which is the basis for our modern Internet.
Of course the technologies were already invented, but the original ARPANET didn't become the "Internet" (using TCP/IP) until 1983. That was pretty much a government/academic network, not a commercial one, and the "old" Internet was subsumed by NSFNet in 1990. Thanks, in large part, to Gore.
He sort of fumbled his comment, I'll grant you. But he was talking about his congressional record at the time, and his statement was mostly on-target.
Here's the story if you have the desire to read the whole thing.
We'll just have to disagree. IMHO "the Internet" is a protocol suite and an address registry, which Cerf et al are the creators of even though the US Congress sponsored their work.
This *is* Unicode. To deal with Unicode, participants need to agree on an encoding, such as UTF-8, UTF-16, ISO-8859-1, etc. In this case, they have chosen a 7-bit clean encoding method, which means no changes need to be made to the DNS infrastructure and the names are still usable by people whose applications lack Unicode domain name support.
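A rough sketch of what "7-bit clean" means here, using the punycode codec that ships with Python. The label bücher is only an illustrative example (it is not from the article), and a real IDNA implementation would run Nameprep on the label first; this skips that step and just compares the raw bytes of the two encodings.

```python
# Illustrative label only; any label with a non-ASCII character would do.
label = "b\u00fccher"

# UTF-8 uses bytes with the high bit set for the accented character.
utf8_bytes = label.encode("utf-8")

# Punycode (RFC 3492) output is plain ASCII; IDNA adds the "xn--" prefix.
# Note: this skips Nameprep, which a full IDNA implementation applies first.
ace_bytes = b"xn--" + label.encode("punycode")


def is_7bit_clean(data):
    """Return True if every byte fits in 7 bits (plain ASCII)."""
    return all(byte < 0x80 for byte in data)


print(utf8_bytes, is_7bit_clean(utf8_bytes))
print(ace_bytes, is_7bit_clean(ace_bytes))
```

The second form is what actually travels through DNS, which is why existing servers do not need to change.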
An application that wants to support Unicode domains needs to implement RFC3490 (Internationalized Domain Names in Applications) and RFC3491 (Nameprep: A Stringprep Profile for Internationalized Domain Names (IDN)). Nameprep is sort of like an i18n version of lower(), used to compare domains unambiguously. IDNA defines the mechanism for encoding a Unicode domain into the ASCII clean representation. If people want to play, Python 2.3 has this out of the box:
(erk - Python's triple > prompt confuses SlashDot...)
And yes, Virginia, there is nothing stopping you registering Unicode domain names in .com, .net, .org etc. right now, if you know how to encode them yourself. To the DNS system and most registrars they are just perfectly valid ASCII domain names, with the decoding left to the applications and/or network libraries. Dealing with US keyboards and obsolete browsers is left as an exercise for the reader.
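Since the snippet got eaten, here is a minimal sketch of the sort of thing the built-in "idna" codec allows. It is not the poster's original code, it is written for a current Python 3 rather than 2.3, and the helper names to_ascii and to_unicode as well as the domain bücher.example are made up for illustration.

```python
# Sketch using Python's built-in "idna" codec (IDNA as in RFC 3490:
# Nameprep followed by Punycode, applied label by label).
# to_ascii / to_unicode are hypothetical helper names, not a library API.

def to_ascii(domain):
    """Encode a Unicode domain into its ASCII-compatible form."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain):
    """Decode an ASCII-compatible domain back into Unicode."""
    return domain.encode("ascii").decode("idna")


name = "b\u00fccher.example"   # illustrative domain, not a real registration
ace = to_ascii(name)

print(ace)               # the form the DNS and registrars see
print(to_unicode(ace))   # the form an IDN-aware application would display

# Nameprep folds case on the non-ASCII label, so both spellings
# end up as the same ASCII name.
print(to_ascii("B\u00dcCHER.example") == ace)
```

The encoded name is an ordinary ASCII hostname, so anything that does not know about IDN simply treats it as one.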
Actually, that makes me suspect that domain registrars must be pushing pretty hard for this.
Think about it -- a French site would have to register their domain w/ the accents AND the unaccented domain: one to be correct, and the other for the foreigners. An international Japanese company might need dozens of domains, because of the different ways their name is transliterated into various languages.
Hm. It seems like this change is going to cause a lot of problems, mostly because it's NOT easy to type the chars that will be in some of these URLs. It's not obvious at all how to type an accented e (plus it's different by OS). Why? It's a cause of frustration even for English-only speakers! "So... you will resume sending me your what??" When I was studying French in school, I used to mark all the accents by hand into papers I typed up, after printing them out. It was just so much faster than any other method I found. I can't even imagine how I'd manage to input a URL in Cyrillic or pictographs.
The thing is, though... this is the lesser of two evils. I really don't see everyone sharing a common language anytime soon, which would be the other solution. All over the world, the internet is swiftly moving from academics and geeks to businesses to common folks. Don't you WANT to be able to fully translate your website, so any bumpkin in Siberia can order your products? They aren't all going to learn a new alphabet.
Think about it. Just about every language has its own keyboard layout. All my emails from Europe looked pretty funny until I figured out how to change the keyboard layout... (yes, you can do this in many web cafes. Damn! I can't spell caf-ay without an accent!). Someone somewhere is going to have a tough time typing any given characters.
It's only fair if accessing foreign language sites is an equal pain-in-the-ass for all. The internet is not destined to be a single language medium.
Maybe the operating systems and keyboards need to improve. What a shocking thought that is. Hey, it would be a huge help to lots of frustrated kids trying to do their foreign language homework on a computer.
And really, if you think about it, how many foreign-language URLs do you type in during your day? Sometimes you get a search result that looks interesting so you translate it. Okay... you don't type that URL, you copy it and paste in the translator. Where else do we get URLs? When won't you be able to just copy it? Very rarely...
I think we can deal with it for a while.
There are only 10 types of people: those who understand decimal, those who don't, and, uh, 8 other types I forget.
I emailed taco my resume from the web cafe. Am I being naive?
(they should hire me; I know how to fix it!)
There are only 10 types of people: those who understand decimal, those who don't, and, uh, 8 other types I forget.