ICANN Under Pressure Over Non-Latin Characters
RidcullyTheBrown writes "A story from the Sydney Morning Herald is reporting that ICANN is under pressure to introduce non-Latin characters into DNS names sooner rather than later. The effort is being spearheaded by nations in the Middle East and Asia. Currently there are only 37 characters usable in DNS entries, out of an estimated 50,000 that would be usable if ICANN changed naming restrictions. Given that some bind implementations still barf on an underscore, is this really premature?" From the article: "Plans to fast-track the introduction of non-English characters in website domain names could 'break the whole internet', warns ICANN chief executive Paul Twomey ... Twomey refuses to rush the process, and is currently conducting 'laboratory testing' to ensure that nothing can go wrong. 'The internet is like a fifteen story building, and with international domain names what we're trying to do is change the bricks in the basement,' he said. 'If we change the bricks there's all these layers of code above the DNS ... we have to make sure that if we change the system, the rest is all going to work.'" Given that some societies have used non-Latin characters for thousands of years, is this a bit late in coming?
Changing a system which works is a very, very bad idea.
Won't this open up the system to many more phishing attacks involving addresses which include non-Latin characters that look similar to Latin ones?
Wait, so it's not tubes... It's a 15 story building?
Anyone else getting more lost every day?
"If you make people think they're thinking, they'll love you; But if you really make them think, they'll hate you." - DM
It won't break the whole Internet! Just DNS. DNS is overrated anyway. Now if you'll excuse me, I need to finish reading all the new posts on 66.35.250.150.
Quidquid latine dictum sit, altum sonatur.
Yes, countries that use non-English characters should be able to interact with the rest of the world using their natural language. No, they shouldn't rush the change and risk a possible crash of a large portion of the Internet. Be patient young patawans, soon you will be able to have DNS names with any character you can think of, but it will be reliable and actually work.
Space for rent, inquire within
Plans to fast-track the introduction of non-English characters in website domain names could 'break the whole internet', warns ICANN chief executive Paul Twomey
Luckily for us, GWB knows that we have some redundancy with the Internets, so if one breaks we can just use another.
If you want news from today, you have to come back tomorrow.
ICANN is trying to give a technical reason for a political problem; although the reason may be valid, that is not a very good idea. With the UN, it would be handled by international committees, and we will all be long dead before they finally agree on which country will be on that committee.
Perhaps, but I can't fault ICANN for this one, as much as I might like to. Like it or not, most internet technologies have their roots in latin speaking countries, which means systems developed there may not be tweaked to work with outside language schemes.
If the fault lies with anyone, it's with the individual contributors of the tech. Or better, with the non-Latin countries' apparent lack of interest in some of the core projects needed to push this through ICANN (specifically DNS, httpd).
Mod me down with all of your hatred and your journey towards the dark side will be complete!
- Don't be too surprised when people around you start building their own houses rather than choosing to pay rent.
DNS upheaval has been a long time coming, and the current anti-American sentiment worldwide isn't exactly helping to stabilize it. We're already seeing all sorts of ad hoc routing setups that deal with shortcomings of an Ameri-centric DNS. My guess is that within the next few years, ICANN's 'control' of the internet will be in name only, as everyone else in the world will have moved on to alternative routing and domain systems.
"Given that some societies have used non-Latin characters for thousands of years, is this a bit late in coming?"
No.
Zonk either knows zero about the histories of the Internet or DNS, or is so enamored of finishing stories with questions that he'll tack on the truly ridiculous.
What you do with a computer does not constitute the whole of computing.
For all you people saying "There's no problem, just do it" - I say watch out... there will be a rush of attacks and spoofs as soon as this is opened up. The letter "a" appears in the Unicode character set multiple times, and some of the variants are almost indistinguishable. I'm not just talking about someone registering släshdot.org, I'm talking about someone registering slashdot.org (the a is U+FF41, the fullwidth form, instead of the normal a). Good luck telling the attacks apart from the real sites.
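For anyone who wants to poke at the fullwidth case, here is a minimal Python sketch. It assumes a registry or client folds names with Unicode NFKC normalization before comparing them (as far as I know the IDNA nameprep step does something along these lines); `fold_compat` is a made-up helper name, not part of any DNS library.

```python
import unicodedata

def fold_compat(label: str) -> str:
    """Hypothetical helper: fold compatibility variants (fullwidth letters and the like)."""
    return unicodedata.normalize("NFKC", label)

real = "slashdot.org"
fullwidth_spoof = "sl\uff41shdot.org"   # U+FF41 FULLWIDTH LATIN SMALL LETTER A
umlaut_spoof = "sl\u00e4shdot.org"      # U+00E4, a with diaeresis

print(fullwidth_spoof == real)                # the raw strings differ
print(fold_compat(fullwidth_spoof) == real)   # folding collapses the fullwidth a
print(fold_compat(umlaut_spoof) == real)      # the umlaut is a different letter and survives
```

So normalization can take care of the fullwidth variant, but it does nothing for letters that are genuinely distinct and merely look alike.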
I'd be in favor of the change just because anything that undermines the Unix Tower of Babel -- the dependency on ASCII which complicates text handling sooooo much even when Windows solved the problem soooo long ago -- is good. Even Java gets it. Even Apple (finally) gets it. Unix Is Teh Problem.
And the ASCII problem isn't just bad because it forces people to use inefficient encodings like UTF-8 (THREE bytes per character?). It's bad because it allows people to write code like:
if(string[index] == '.' || string[index] == '?' || string[index] == '!') sentenceEnd = true;
(a line repeated, with subtle variations, several hundred times in the code of a certain ubiquitous editor).
And, lo and behold, the above does not work, but once it appears in a few thousand places it's impossible to fix, and a vast towering structure of fixes made by people who don't really understand why it's an issue is built.
So, even though the proposed change would be hugely inconvenient for a huge number of people, I'm in favor, because I want the world to grow the fork up and understand that text != byte array some time while I'm still alive.
Whence? Hence. Whither? Thither.
> Given that some societies have used non-Latin characters for thousands of years, is this a bit late in coming?
Had Hammurabi stored his laws on silicon instead of stone then perhaps there would be a point to that question.
Unicode has many characters that look almost exactly like characters in Latin-1.
For example, if "www.microsoft.com" is shown in your browser's address bar, how would you know for sure that the "c" is not from the Cyrillic alphabet, or the "o" is not from the Greek alphabet?
You simply won't be able to trust your browser's address bar anymore. The possibilities for phishing attacks are endless.
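One crude defence a browser could try is flagging labels that mix scripts. A rough Python sketch follows; it guesses the script from the first word of each character's Unicode name, which is an approximation I'm assuming is good enough for illustration (real script data lives in the Unicode Script property, which the standard library doesn't expose). `scripts_in` and `looks_mixed` are hypothetical names.

```python
import unicodedata

def scripts_in(label: str) -> set:
    """Approximate the scripts used in a label from Unicode character names."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # e.g. "CYRILLIC SMALL LETTER ES" -> "CYRILLIC"
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def looks_mixed(label: str) -> bool:
    """True when letters from more than one script appear in the label."""
    return len(scripts_in(label)) > 1

honest = "www.microsoft.com"
spoof = "www.mi\u0441rosoft.com"   # U+0441 CYRILLIC SMALL LETTER ES in place of "c"

print(looks_mixed(honest), scripts_in(honest))
print(looks_mixed(spoof), scripts_in(spoof))
```

This only catches mixing. A name spelled entirely in look-alike Cyrillic letters would sail straight through, so it's a mitigation, not a fix.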
Whatever happened to Punycode (Unicode in a special DNS-characters-only encoding format)? There was some hoopla about the scheme, which would require browsers to show Punycode-encoded URLs in the appropriate characters on the screen, but some naysayers said that it was a phisher's dream since many glyphs throughout Unicode look alike. I figure this issue has nothing to do with Unicode per se, but with phishing vs certified sites in general, but I haven't heard a peep from the Punycode camp for over a year.
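For reference, Python ships an "idna" codec (the older IDNA 2003 rules) that does this Punycode round trip, so you can see what actually goes over the wire. The wrapper names `to_ascii` and `to_unicode` below are mine, chosen for illustration.

```python
def to_ascii(hostname: str) -> str:
    """Encode each label to its ASCII-compatible (xn--) form via the stdlib codec."""
    return hostname.encode("idna").decode("ascii")

def to_unicode(hostname: str) -> str:
    """Decode xn-- labels back to the characters a browser would display."""
    return hostname.encode("ascii").decode("idna")

wire = to_ascii("b\u00fccher.example")   # "bücher.example"
print(wire)
print(to_unicode(wire))
print(to_ascii("slashdot.org"))          # plain ASCII names pass through untouched
```

The DNS servers only ever see the xn-- form, which stays inside the existing letters, digits and hyphen; the look-alike problem lives entirely in how the client chooses to display it.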
Prince no longer goes by that strange symbol as his name anymore.
Where were you when the voynix came?
Domain names in Tengwar, Yay!
So, how does Google work in an international Internet? If each content contributor is submitting in their native language, will I be able to search for terms anymore?
I don't think English SHOULD be the default language, but there are certainly some advantages to one language for all content. Related to that, weren't all computers supposed to be using Japanese or something by this point? There was a prediction "back in the day" along those lines for a while, something about that character set being more efficient for machines to parse...
Nice for localising, sure, but how usable will Japanese, Indian, or Arabic script URLs -- for example -- be for those who do not have access to the respective sets or keyboard layouts?
Of course it's late in coming.
But that doesn't mean it should be done hastily and badly.
http://alternatives.rzero.com/
Why would you want a domain name that the biggest users (and hence customers) on the internet worldwide cannot type on their keyboards?
As there is no current danger of the current DNS address space running out...
sticking to ASCII seems sane to me but then again this is a political not a technical problem....
"One source of the pressure was Adama..."
And he will not rest until the script of each of the 12 Colonies is properly represented with ICANN. I hear he's not too keen on Cyrillic, however.
Where were you when the voynix came?
Many of them actually do speak English
Help me take back Slashdot. When did 'News for Nerds' become 'FUD and Conspiracy Theories for Extremist Nutjobs'?
Given that some societies have used non-Latin characters for thousands of years, is this a bit late in coming?
Let's be clear. The domain name system only uses English characters. There are lots of languages in Europe (Italian, Spanish, French...) which are closer to Latin than English (which isn't really a Latin language at all) and which are not currently represented, because you can't use accents in domain names, or other letters such as the Spanish eñe (n with a squiggle, actually a distinct letter). English speakers often think accents aren't important, but they can completely change a word's meaning.
The internet was originally conceived, designed, and implemented in the USA at a time where hardware was at a premium, and corners were cut to conserve that limited resource. DNS was just one of the results of that era. However, it is the most visible because it is the front end means for people to find each other. That means there is now a very well established standard, used by people across the entire globe, that is very difficult to change.
Changing all the DNS servers in the world to switch from ASCII to Unicode is NOT trivial. The fact that some societies have used non-latin characters for thousands of years is completely and utterly irrelevant. THEY didn't make the internet. They simply bolted themselves on to an existing infrastructure.
I agree that progress needs to be made to accommodate non-Latin characters, but to have people whining about "how they want it, and want it now"... That's just ridiculous. It's like waltzing into a house that was built 40 years ago and having a tantrum because the stairs are too steep and the house is too squished. Major structural renovations take time, effort, and careful planning. And there is nothing you can do to avoid that, short of implementing cheap stop-gap measures that are virtually guaranteed to cause even bigger unintended headaches later on.
Given that some societies have used non-Latin characters for thousands of years, is this a bit late in coming?
Those societies did not build an entire economic and social infrastructure using all 50,000 of those characters in a few decades, though.
Rex is 09 F9 11 02 9D 74 E3 5B D8 41 56 C5 63 56 88 C0
"What we're trying to do is change the bricks in the basement."
It's the Internet, so it's more accurate to say we're changing the bricks in your parents' basement.
Are you...Are you some kind of genius?
No, ma'am, I'm just a regular Slashdot reader.
Why English? Why not, say, Spanish, or Mandarin?
How 'bout we all just speak English and forget about all those weird letters.
(It was a joke... well sort of)
ICANN Under Pressure Over Non-Latin Characters
You mean white people?
If Nalgene water bottles are outlawed, only outlaws will have Nalgene water bottles.
Now if you'll excuse me, I need to finish reading all the new posts on 66.35.250.150.
Base-Ten CHAUVINIST!!!
What about societies that use Base 2 [binary], or Base 8 [octal], or Base 16 [hexadecimal]?
Or entire societies, like the British empire, which use no base at all?
12 inches in a foot. 3 feet in a yard. 1760 yards in a mile...
60 seconds in a minute. 60 minutes in an hour. 24 hours in a day. 7 days in a week. 52 weeks in a year [give or take]...
Or how about base 12?
12 keys in a chromatic scale: A 440, then, logarithmically [give or take a little well-tempering]: A#, B, B# == C [kinda sorta], C#, D, D#, E, E# == F [kinda sorta], F#, G, G#, and finally A 880.
Except that on the continent, things are often just a little sharper - say A 443/444/445 & A 886/888/890...
And let's not even get into water freezing & boiling at 32 & 212 versus 0 & 100...
Were some random non-UTF8 country to make interworking with the rest of the Internet harder, it would be cutting its nose off to spite its face. For the G7 countries (yes, G7, not G8), the value of Internet connectivity to random minor countries is minimal. The value to those countries of Internet connectivity is large. Do US users care if Uzbekistan is on the Internet? No: it has zero impact on 99% of them, minimal impact on 0.9% of them, etc. Do people in Uzbekistan care about being able to access Google, Wikipedia, Amazon, CNN, the BBC? I rather think they do.
No one likes pointing out to random minor countries that their presence on the Internet is far more in their interest than it is in anyone else's. But that doesn't make it any the less true. So, in general terms, the choice they're getting is ``largely anglophone, largely UTF-8, or nothing''.
ian
Whose idea was it to use a Cyrillic character where there's an exactly identical non-Cyrillic character in the ASCII set? They just decided to stick the Cyrillic characters in a totally separate grid rather than adding only the additional characters from Cyrillic.
2. Doesn't require any change to the DNS system (other than some name policy changes).
3. Allows links to be embedded in normal web pages so that they can be cut and pasted by anyone with Latin functionality. So a Japanese person could cut and paste the link to some Arabic site that they don't have the font for.
4. While this is a kludge it has some major advantages over rebuilding the DNS system.
Storm
DNS won't break. In fact, it already works! The thing is called IDN and is supported by all modern web browsers (including IE). Try for yourself - http://www.kozowski.pl (I hope Slashcode won't cannibalize the letter "").
So DNS and Web is OK. Any breakage I can think of may appear in email systems or other domain-based forms of communication.
:wq
Because those languages are dirka, obviously
I don't see anyone complaining that air traffic control "should include non-english words", so why should the internet include non-latin characters? The system works, and I see no reason not to leave it as it is.
maybe you're stuck in os9, dunno...
Apple just somehow made "their unix" do all kinds of fancy stuff
m10
(let's keep ourselves from the argument if macosx/darwin is to be classified as a *real* unix or not - it's a completely different discussion)
It's so frustrating how often people or groups with great political power but little technical insight force changes without the capacity to truly weigh the risks involved. So very bad things continue to happen unnecessarily.
Case in point: The US publicizing advanced engineering documentation on how to build a nuke because some shmuck, who didn't grasp the consequences, ordered it posted thinking that somehow it would help his political party.
As technology becomes more ingrained in our lives, politicians will salivate even more over being able to claim a part in any major breakthroughs. There is more or less a separation of church and state in most countries. Can't we do the same with technology?
Never ascribe to malice what can be adequately attributed to ignorance. -Napoleon
... parallel multi-nets. I guess servers will have multiple domain names for same IP address, one for each culture they wish to address.
No matter what, english-language net will continue to be *the* Internet, a global Forum, direct connection between common people from all parts of the world ( Hey there! :) ).
All the other nets will have quite marginal significance. Nations will try to boost them in order to keep their citizens indoctrinated with their own traditional values, but things that do not fly by themselves usually have a short life, lose appeal and fade away. The Internet as we know it will take only a mild hit, so no worries.
All this is needed for the final globalization of the internet: reaching people of the world with only as much as elementary literacy in their own mother tongue. That is something that native English speakers take for granted - "Your grandma can use the Internet". Well, most grandmas of the world still can't, or have difficulties with it.
"we have to make sure that if we change the system, the rest is all going to work.'"
Pfttt. To quote Rocky Squirrel "that trick never works". Especially if they drop bind, and come up with something new. It's positively guaranteed that there will be a crack somewhere, waiting for a smart cracker to take advantage of it. History has well shown that the odds are extremely high there will be a back door vulnerability somewhere in the system.
*rubs his hands in greedy anticipation*
Oh yeah, here's a hint. Beware of accepting code from China (the allusion to the Trojan Horse is appropriate).
Perhaps *they* might have spent some of those thousands of years inventing computers and internetworks? If *they* want non-latin characters, let *them* build *their* own root nameservers, *their* own implementation of DNS, and go for it. Who's stopping *them*?
I thought this was what Unicode was for. The only 3 sacred characters that I wouldn't want messed with are the ":", "/", and ".". How come we don't have a Unicode DNS solution so countries could use the entire Unicode address pool for domain names? I've read postings basically bashing the non-English world for not being involved with the original tech and so being left out. So that's a valid reason to discriminate now? What used to get me excited about Slashdot was the unique solutions that you could find in the comments for real-world problems. Used to be, on Slashdot, at the mere suggestion of a Unicode DNS solution someone would either find one or write one. Nowadays we get bitching and moaning regionalism about how either the US is behind the rest of the world due to our policies or that the US is better than the rest of the world because we speak English. Ugh. Slashdot drives me batty sometimes.
I say that non-English countries should just do it. If it breaks Standard US or European IE or FireFox by putting the domain name in the address bar, no loss to your country.
At the risk of sounding like a cultural chauvenist... because we invented the damn internet, and we speak English, and use the Latin-1 character set.
If individual countries want to implement their own DNS-equivalents in their national character set, then more power to 'em, I say. However, they'll also have to deal with upgrading every DNS-capable application on every machine in the country, then find a solution to the massive problem of phishing they've just caused by introducing two identical-looking (but numerically different) characters... and then find a way to enable other nationalities to type and use those URLs without necessarily having the characters on their keyboards or character sets on their machines.
I honestly don't see a way around this.
Everything in moderation, including moderation itself
Besides, think of how well prepared DNS will be to start supporting lookups in extraterrestrial languages when the time comes if we do this now! We'll be completely compatible with Martian, Klingon, Minbari, and Vulcan networking systems the day we meet them! We should be able to view each other's pron almost immediately!
----- Connection reset by beer
The internet was developed in the US, and therefore US people designed the early protocols. The internet has been around for how long now, and these countries are just now really trying to push their character set. Couldn't you have thought about this before we had so many webpages and domain names? And with non-latin domains, aren't we setting ourselves up for something like www.@(öö)@.com? I wouldn't mind it in a hyperlink, but just try to type it.
Why should URIs in non-English-speaking countries start with "http://", "ftp://", ..., instead of the acronym of the protocol's translated name (which should be something like "ptht://", "ptf://" in French :) )?
Does ICANN control .cn (China)? Or other national TLDs? Why don't they just start registering domains in their local language? Leave .com, .org, .mil (i.e. the USA TLDs) English.
God forbid these foreigners "view source" and realize that html is largely broken English. That would really put a bee in their bonnet.
Tht ìs thê £äst thïñg wë ñèêd
Dibs on ©óm
What's this going to do for security? Didn't we have phishing attacks recently that consisted of Unicode characters being inserted into e+bay.com, for instance, that didn't get displayed? The domain e+bay.com being different from ebay.com.
"A domain name is a unique address that allows people to access a website, for example, smh.com.au"
No, a domain name is a sequence of characters mapped to an IP address. It was designed so that you won't have to remember 66.35.250.150 instead of slashdot.org. This wasn't a problem while the original Internet consisted of just four computers. DNS was never designed to provide identity. There was also the case of a stock trader hacking a DNS server and redirecting traffic from a legitimate financial site to his own, where he had duplicated the real site only with bogus information.
"He said that this could create problems where, for example, a character in Urdu looks identical to one in Arabic"
It sure could. How about totally replacing DNS with a system of online identities.
davecb5620@gmail.com
"Yes, countries that use non-English characters should be able to interact with the rest of the world using their natural language."
Why... No really. You speak as if this is a good thing. Why should they be able to use their natural language rather than English? Why shouldn't they be restricted to a limited area of local language speaking people?
The reason the Internet is useful is because everyone speaks TCP/IP. Incompatible protocols are to be actively discouraged because they balkanise the network. Language is exactly the same. The reason the Internet is useful is because everyone speaks English, the more divided it becomes the less useful it becomes.
Languages are anachronisms, the only reason we have more than one is the physical distance between locations and difficulty travelling allowed them to evolve independently. Well that isn't the world we live in any more and the different languages actually make communication far more difficult now. They're no longer beneficial. So get rid of them, insist on a common language. The most popular happens to be English at the moment. I could live with Spanish, but for those of you about to suggest Chinese, read this before deciding: http://www.pinyin.info/readings/texts/moser.html
We should be using this opportunity to actively get rid of languages.
http://en.wikipedia.org/wiki/Internationalized_domain_name
I'm in a country that sits between Europe and the Middle East. We have a few non-Latin characters in our alphabet, and it still creates problems when conveying domain names.
No wonder the Middle Eastern (Arabic) countries especially want this: the majority of inexperienced internet users there will be more likely to use these domain names, so there will be a greater incentive to control what they see, because these domains will be under national control.
Not only this, but we as IT people will be very unwilling to change all our software to adapt to the new situation because of the horrible development/testing/implementation involved, and hence won't be accepting these domains as valid in our network traffic, which will create a second internet which is, as described above, less free.
This should not be allowed.
Read radical news here
1. Physical
...
...
2. DataLink
6. Presentation
7. Application
8. Tubes
9. Bricks
10. Porn
11. Google
12. YouTube
13. ??
16. Profit
It was hard enough remembering them all back when there were only 7.
I do not know of these "patawans" of which you speak. I felt a great disturbance in the DNS as if thousands of URLs cried out in anguish, and then were silenced.
much even when Windows solved the problem soooo long ago
i18n on windows is far from "solved".
I do admit that MS had a huge benefit when they started pushing unicode.
(It takes a company with microsoft's level of clout to push around national governments )
And the ASCII problem isn't just bad because it forces people to use inefficient encodings like UTF-8 (THREE bytes per character?)
Perhaps you don't realize that UTF-8 is moving on to become the most dominant character encoding,
and the legacy cruft such as UTF-16 (designed to deal with design flaws in windows) is being phased out.
Even languages that would end up as mostly 3 byte characters tend to benefit from the savings on single byte
characters for control and formatting markup.
I'm not going to harp on about it, but a few basic web searches could enlighten you here.
if(string[index] == '.' || string[index] == '?' || string[index] == '!') sentenceEnd = true;
Code like that *works* in UTF-8, which is one of the things that makes it beautiful (among many others).
It allows you to deal with world characters sets when it matters, and allows you to ignore them when it does not.
(for example, a lexical analyzer that specifies its tokens does not want to support punctuation from every language ever conceived)
And if you think code like that doesn't exist in the Windows world, you are sadly quite naive.
In my experience internationalizing applications, it's typically far easier to update Unix applications, which
on occasion need nearly no changes at all, compared to the laborious grind and near-total rewrite often needed
for MS Windows applications.
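To back up the claim that the byte-wise punctuation check works in UTF-8, here is a small Python sketch. The reasoning, as I understand it, is that every byte of a multi-byte UTF-8 sequence has its high bit set, so an ASCII byte like '.' can never turn up in the middle of another character. `sentence_end_offsets` is a hypothetical helper standing in for the C one-liner quoted above.

```python
SENTENCE_END = b".?!"   # ASCII bytes for '.', '?' and '!'

def sentence_end_offsets(data: bytes) -> list:
    """Byte offsets where an ASCII sentence terminator occurs (hypothetical helper)."""
    return [i for i, byte in enumerate(data) if byte in SENTENCE_END]

# Japanese, French and Russian text, each ending in ASCII punctuation.
text = "\u65e5\u672c\u8a9e. \u00c7a va? \u0414\u0430!"
utf8 = text.encode("utf-8")
for i in sentence_end_offsets(utf8):
    print(i, chr(utf8[i]))   # every hit is real punctuation

# Contrast: U+012E (LATIN CAPITAL LETTER I WITH OGONEK) contains byte 0x2E in UTF-16.
tricky = "\u012e"
print(sentence_end_offsets(tricky.encode("utf-8")))      # no false hit
print(sentence_end_offsets(tricky.encode("utf-16-le")))  # false hit on the low byte
```

It still only recognises ASCII punctuation, of course; a Japanese full stop would need its own handling, which is the "deal with it when it matters" part.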
Virtual keyboards, character maps, and the Optimus keyboard, just to name a few points...
China, Persia, and the Arab nations need more security through obscurity to prevent script-kiddie cracking with Latin/Russian/English fonts and to control the internet in their sovereign totalitarian nations, for god and corruption to freely grow in their two-caste corporatist-communist (haves and have-nots, just like the US and EU) cultures.
... they aren't even spelled the same.
... to provide better tools for building "The Great Digital Divide Wall" for themselves, the Terrorist Defense Network (TDN), and other two-caste totalitarian corporatist-communist nations. God bless US one and all!
Remember: An exploitable-labor force is never a slave-labor force, I mean like, for real, exploitable and slave are two totally different words
This should help China, Persian, Arab nations and Yahoo, Google, Microsoft
Also, the collapse of the internet will be useful to developing the "New World Order" that everyone in charge really wants.
Unaccountable leaders are masters, and unrepresented people are slaves. How do US and EU fare?
Run the new one in parallel, same port just different host, and let the user decide which one to use, which is what they do now. Eventually everyone will be using the new system.
Undetectable Steganography? Yep, there's an app fo
Because it'll be the easiest way to be sure you're hitting the correct server.
If Europeans and other Westerners manage to write with a few dozen symbols, why can't Asians do the same? It would be easier for everybody (including their children). Some Asian nations are already on the right track: the Turkish and the Vietnamese people already switched to Latin writing; why can't other Asian nations do the same? Anyway, the alphabet is an Asian invention (Phoenicians were Asians - Lebanon is in Asia). The switching nations won't adopt a Western writing system, but an ancient Asian one! The sooner the better!
Regarding switching, there are discussions about backward compatibility (people won't manage to read old books, etc). Surely the Vietnamese and the Turkish peoples solved this problem in some way; other nations should ask them for advice.
The only thing inflammatory about that statement is how true it is...
But all this will lead to is more domain hijacking and phishing... these people that are complaining about not having non-latin characters in a domain have just run out of domains to hijack.
Adding Unicode to DNS names would make phishing much more difficult to detect unless all the browsers, email clients and other tools are modified to indicate that a URL may not be what the user thinks it is. It is bad enough as it is, and remember, most Internet users are not as savvy as those of us on Slashdot. I foresee a lot of security implications from adding this.
This post is encrypted twice with ROT-13. Documenting or attempting to crack this encryption is illegal.
1) the infrastructure needs to support UTF-16 and UTF-32 DNS names.
2) collisions need to be found and identified, and name-registrations that collide with existing names not allowed.
A collision is any letter that is easily confused with another, the very things scammers use to trick you. Even in the existing 37 characters, 1 and l collide in some fonts. Add uppercase, and 0 and O will collide. Unfortunately, any existing collisions will probably have to be grandfathered.
3) some standard needs to exist for accents - are they a single code or two?
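On point 3, Unicode does allow both spellings of an accented letter, and normalization is the usual way to make them compare equal. A quick Python sketch, assuming names get normalized to NFC before they are compared or registered (`same_label` is a name I made up for illustration):

```python
import unicodedata

def same_label(a: str, b: str) -> bool:
    """Compare two names after canonical composition (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

one_code = "caf\u00e9.example"     # e-acute as a single code point (U+00E9)
two_codes = "cafe\u0301.example"   # plain e followed by U+0301 COMBINING ACUTE ACCENT

print(one_code == two_codes)             # raw comparison says they differ
print(len(one_code), len(two_codes))     # the second is one code point longer
print(same_label(one_code, two_codes))   # equal once both are normalized
```

Without a rule like that, the two forms would be separate registrations that render identically, which is exactly the kind of collision point 2 is worried about.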
Trivia: In the days of yore, ancient devices called "typewriters" did not have "!" or "1". You used "l" (lowercase l) for "1" and for "!" you used "'" (single-quote, apostrophe) then backspace then "."
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
From your post it appears that you think that *someone* has this solved.
... except it didn't last.
... but even Microsoft can't abide by its inefficiency.
They don't.
Microsoft, for example, uses UTF-16. Long, long ago UTF-16 and UCS-2 were the same thing, so you could encode all Unicode characters in two bytes. Sure, it's inefficient for the bulk of text out there (which is in ASCII) and totally incompatible with the bulk of systems out there (which use ASCII), but it avoided the problems with variable-length characters.
You see, long before Microsoft managed to actually ship a real production-ready product with their wonderful commitment to two-byte fixed-length characters, it was realized that 16 bits isn't enough to encode all the characters that people care about (in particular, a lot of Asian language characters were left out). As a result UTF-16 broke away from UCS-2. UCS-2 is the fixed-length representation which can't capture all the Unicode characters, and UTF-16 became a variable-length representation which is two bytes at a minimum and four bytes at maximum. So users of UTF-16 still have all the bugs from mishandling variable-length encodings, but they just happen in less obvious ways.
Of course, you could use UCS-4 (UTF-32)
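A quick way to see the variable-length behaviour for yourself, sketched in Python. The sample characters are my own picks; U+20000 is from CJK Extension B, outside the first 65,536 code points, so UTF-16 has to spend a surrogate pair on it.

```python
def utf16_units(ch: str) -> int:
    """Number of 16-bit code units UTF-16 needs for this string."""
    return len(ch.encode("utf-16-le")) // 2

def utf32_bytes(ch: str) -> int:
    """Number of bytes UTF-32 needs for this string."""
    return len(ch.encode("utf-32-le"))

samples = {
    "ASCII A": "A",
    "CJK U+4E2D": "\u4e2d",
    "CJK Ext. B U+20000": "\U00020000",
}
for label, ch in samples.items():
    print(label, "UTF-16 units:", utf16_units(ch), "UTF-32 bytes:", utf32_bytes(ch))
```

Code that assumes one 16-bit unit per character works on the first two and quietly breaks on the third, while the 32-bit form stays fixed width at the cost of four bytes for everything.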
This is just getting out of hand. What we need to do is replace Latin characters in DNS with numbers. That way there will be international unity through the language of math. Every DNS entry will be a number that points to current IP addresses. I don't know why this hasn't already been implemented. It completely solves "the problem."
Chums up, let's do this!
And we, dickwad, invented the computer -- so you are
henceforth required to spell words like honour, colour
and aluminium, correctly when using one.
You are not at the risk of sounding like a cultural
chauvenist, you are a cultural chauvinist.
To build a standardised layer on top of DNS that translates the native alphabet into whatever subset of ASCII DNS allows?
That way none of the rest of the protocol stack would be disrupted and everybody would be able to enter all URLs on a standard (English) keyboard?
While it would be neat to be able to have addresses using non-latin characters I think there are some fundamental problems with this.
First of all, what happens to those unable to type non-latin characters. Windows and OS X both support such text entry, but how the hell would a non-Chinese speaker know how to type anything, assuming they know how to set it up or even choose the proper language? Automatically the vast majority of the World's population is for all intents and purposes blocked from Chinese sites. In most cases it may not matter, but the fact is that people will be indirectly denied access to some websites.
Secondly, aren't there fundamental problems interpreting non-latin characters at the OS-level? As I'm sure most people here know, non-Latin characters are formed by a string of what essentially looks like nonsense characters. If even one of those is lost for whatever reason that character goes missing. How will the browser, let alone a server know that a string of characters is an actual character and not gibberish. And what if it happens to coincide with something in some other language? Then there are other problems like Traditional Chinese versus Simplified Chinese, versus Japanese. These languages all share characters, but they don't interact meaning you can't copy a character typed in Japanese and paste it as Traditional Chinese.
Then there are all the forms of encoding which add to the problem. Forcing everyone to use UTF-8, for example, would cause huge problems in Taiwan because few people use it. And I think compared to some other forms of encoding it tends to have problems.
So are all non-Latin languages included? What about languages like Mongolian which are written vertically? I guess they could use Cyrillic, but then if they're going to do that they might as well just stick with latin.
The Chinese writing system, like any logogram-based language, is not suited to computers. It's far too complex to be practical. At least on an English-based operating system it isn't. But I've yet to see anyone try to make an OS specifically designed for Chinese or Japanese. What for? They've adopted the Latin system fairly well.
Not to get into wacky conspiracy theories, but I can't help but think that this is more of a political move to undermine Western control of the internet.
I think a more practical solution would be to devise a larger character set that can accommodate most major languages but is mainly derived from the Latin character set, which nearly everyone already uses with no problem. Perhaps some day we'll have a universal writing system, but we're a long way off from seeing that implemented.
Geez! It's like nobody on Slashdot has ever heard of UTF-8 or Punycode.
http://outcampaign.org/
> largely anglophone, largely UTF-8, or nothing
You probably mean Latin-1 (or a subset thereof in the case of DNS) not UTF-8 which is a 1 to 4 byte character encoding capable of representing non-latin languages.
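To make the "1 to 4 byte" bit concrete, here's a quick sketch with Python's built-in UTF-8 codec. The four sample characters are just ones I picked to land in each length class.

```python
# UTF-8 spends 1 byte on ASCII and up to 4 bytes on everything else.

def utf8_len(ch: str) -> int:
    """Number of bytes the character occupies in UTF-8."""
    return len(ch.encode("utf-8"))

for ch in ["a", "\u00e9", "\u52a9", "\U0001F600"]:
    print(f"U+{ord(ch):04X}: {utf8_len(ch)} byte(s)")

# Pure ASCII text is byte-for-byte identical in UTF-8, which is why
# existing ASCII hostnames would not change under it.
ascii_name = "example.com"
same_bytes = ascii_name.encode("utf-8") == ascii_name.encode("ascii")
print(same_bytes)
```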
It's a good thing my preferences are set to give trolls +3... otherwise I would have missed your post. So true, so true, what you say.
Why are they called English letters? In 700-600 BC, when the people of Rome used them regularly, the inhabitants of the British Isles were backward illiterates! For that matter, if you call them English, why not name them Spanish, French, Czech, Croatian, or even Turkish? Most of Europe uses them, AND THEY DID NOT GET THEM FROM THE ENGLISH!
Did the Roman Empire leave the Latin characters in its will to the British Empire or what? Maybe all nations using Latin characters should pay a license fee to Britain and/or the US!
I thought this was solved by Punycode, RFC 3492 http://www.ietf.org/rfc/rfc3492.txt, over three years ago. It is "Standards Track"; all major browsers support it; and it does not break the entire Internet.
It's pop-up hell.
One of the pop-ups says:
callto://JOIN_THE_GNAA__2005_RECRUITMENT_DRIVE
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
Begone with those foreign temperature scales, say I. We'll have no Celsius, Fahrenheit or Réaumur here. And Centigrade sounds suspiciously metric, and we'll have nothing to do with Bonaparte's schemes.
Stick to a proper British Scale, named after a Scottish Lord, no less, with water freezing at 273.15 and boiling at 373.15!
My first thought would be to use unicode, but then TFA pointed out a big problem: URLs that LOOK exactly the same but are not (as in, Oh, that's "ebay" in Latin Extended-A, not the Basic Latin "ebay"). Unicode is great for displaying things, but bad for uniquely identifying things. How many domains would you have to register?
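Here's a small sketch of the look-alike problem using Python's `unicodedata` module. I'm swapping in a Cyrillic "а" for the spoof because that's the pair I'm sure about; treat the choice of character as my assumption, not what TFA used.

```python
import unicodedata

latin = "ebay"
spoof = "eb\u0430y"  # third letter is CYRILLIC SMALL LETTER A, not Latin a

# The two strings usually render identically but compare unequal.
print(latin == spoof)

# Show which code points actually differ.
for a, b in zip(latin, spoof):
    if a != b:
        print(f"U+{ord(a):04X} {unicodedata.name(a)}")
        print(f"U+{ord(b):04X} {unicodedata.name(b)}")
```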
Prov 9:8 Do not rebuke mockers or they will hate you; rebuke the wise and they will love you.
'The internet is like a fifteen story building, and with international domain names what we're trying to do is change the bricks in the basement.
Who ever heard of building a fifteen story building out of tubes?!?!
...it's called Punycode. It's just a way of encoding Unicode into the 37 characters supported by normal DNS. Firefox, Opera, Safari, and IE7 give a transparent implementation, all with different protections against homograph attacks.
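For the curious, a minimal sketch of that encoding using Python's built-in `idna` codec (which implements the older IDNA 2003 rules). The hostname is a made-up example; the check at the end only confirms that what goes on the wire stays inside the 37 characters.

```python
import string

# The 37 characters: a-z, 0-9 and the hyphen.
ALLOWED = set(string.ascii_lowercase + string.digits + "-")

def to_dns(name: str) -> str:
    """Convert a Unicode hostname to its ASCII-compatible form."""
    return name.encode("idna").decode("ascii")

unicode_name = "b\u00fccher.example"  # "bücher.example", a hypothetical host
encoded = to_dns(unicode_name)
print(encoded)

# Every label of the encoded name uses only the 37 allowed characters.
only_allowed = all(set(label) <= ALLOWED for label in encoded.split("."))
print(only_allowed)

# And the browser can turn it back for display.
decoded = encoded.encode("ascii").decode("idna")
print(decoded == unicode_name)
```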
Just introduce a restriction according to which a valid URL can only contain symbols from one alphabet. I believe it's not too hard to determine http://www.unicode.org/charts/ which character set does a UTF-8 code belong to, and if the URL uses more than one.
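A rough sketch of that one-alphabet rule. Python's standard library has no direct "script" lookup, so this takes the first word of each character's Unicode name as a stand-in for its script. That heuristic, and the function names, are my assumptions rather than any real registry policy.

```python
import unicodedata

def scripts_used(label: str) -> set:
    """Approximate the set of scripts in a label via Unicode names."""
    scripts = set()
    for ch in label:
        if ch.isdigit() or ch == "-":
            continue  # digits and hyphens are shared by all scripts
        name = unicodedata.name(ch, "")
        scripts.add(name.split(" ")[0] if name else "UNKNOWN")
    return scripts

def is_single_script(label: str) -> bool:
    """True when the label draws its letters from at most one script."""
    return len(scripts_used(label)) <= 1

for label in ["ebay", "eb\u0430y"]:
    print(label.encode("unicode_escape").decode(), sorted(scripts_used(label)),
          is_single_script(label))
```

One catch: Japanese legitimately mixes Hiragana, Katakana and Kanji in a single word, so a strict one-script rule would need exceptions for combinations like that.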
Currently there are only 37 characters usable in DNS entries
Wrong. The usable characters are 0-9, a-z, A-Z, period (.), underscore (_), and dash (-), so 65 characters are usable in DNS entries. I know the 37 number came from TFA, but it's still flat wrong.
Since we know it's going to happen, and we know how it will happen, all it will take is planning to defeat it before it is implemented.
And it shouldn't be too difficult. Just compare the preferred unicode of the browser to the unicode of the URL and put a banner or something at the top saying that there may be a phishing problem when they don't match.
Even better, have an option to auto-deny any javascript or java or anything else for sites that don't match the unicode of the browser.
In fact, get the big players in the browser market together TODAY and get them to agree on the standard response that will be generated when the unicode doesn't match. That way everyone will only have to learn ONE warning for this attack.
ICANN should tie that standard to the release of their standard. Until the browsers agree, there will not be any change in the domain names. That way the various countries can put the pressure on the browser people to get the standard out.
We are already compatible with Klingon porn, just do a search for knives, swords, batleth (no unicode on Slashdot working for me), etc.
For Vulcan compatibility, open your favourite raw file viewer and set it to Binary mode.
Martian porn is (apparently) covered by the Periodic Table of Elements, specifically the permutations of Illudium and Pu.
Enjoy your new worlds of entertainment!
You can have it fast, accurate, or pretty. Pick any 2.
An URL is just a representation, anyway. Sure, if its meaning actually has some relation to the actual content, it might serve someone who tries to find websites by entering www..com better, but most people I know use search engines, anyway. In fact, as a native Chinese writer, I find typing the occasional Chinese character more time consuming because I have to switch to the Chinese IME; I'd rather URLs are all English.
Anyone can "stand up for what they believe", but it takes a very brave individual to change what they believe. - Loundry
see http://en.wikipedia.org/wiki/Internationalized_domain_name
IDN is backwards compatible with existing DNS-servers, and has been in use for several years. Mozilla, Firefox, Safari and Opera support it. So does Internet Exploder 7.
I just checked, I found it in my cellar, and it has ! (it was produced in the thirties in Germany)
As inconvenient as this may be, create a response form from which the person chooses the desired or intended URL. For example, one seeks a site with no accent, but 3 variations exist. The look or search would bring them all back and prompt the user to select one. Helpfully, their email entry or search term will precede or follow the domain or URL as applicable (in a locale or language rules set).
Right now, when e-mailing, people still incorrectly enter company names, even in English. So, what is their response? They fall back on a business card, a PDA entry, some scrap of paper, or even a search engine, or a phone call, or out of laziness, they give up. But, a mailing interface smart enough to seek the known registry of companies and non-commercial sites could bring back the options and let the user decide.
Currently, any foot-dragging on the issue smacks of "pre-eminence" or "we had it first and WE'LL decide what and when." Obviously, that didn't hold back on Japan or China. Now, these other countries want technology to be used to resolve their exclusion in what is a language-set exclusive club. By keeping the web to a handful of languages, particularly in English, it almost could be seen as wanting to ensure that English is the default language for business and international communications. If accented languages enter the scene, then all sorts of unforeseen permutations in business, communications, and economics might occur, much to the dismay of some (certain governments?).
Personally, I think there is PLENTY of technology and brainpower to have solved this issue 5 years ago. There just was no pressure. Now, I don't by any means think ONE single person is holding this up. I think there are many forces and "interests" asking that this be deferred as long as possible. Normally, geeks and techies like challenges. This is a big-as-hell challenge if there EVER was one, yet it's being stonewalled, just like Asian languages are or have been. If Linux can support dozens of languages at the desktop (heck, Mandriva offers multitudes from which to select not only for install, but for desktop use), then why cannot ICANN and the registries follow suit? Oh, umm... yeh...
Previously: "Linux... Toward the Sunrise..." Now: "Linux... Toward the-- No, now, part of Every Sunrise"
Hey ASSHOLE, who's the fucking TROLL? Why don't you pull your FUCKING panties out of your ASSCRACK? Or is that the only way you can get sexual gratification?
Air traffic control is not something that every other luser engages in. It is not meant for real communications. There are approximately 300 phrases that you are allowed to use, and because Americans got there first, it's in English. It could be in Korean, and there would be no difference. If you ever fly an airplane, you will see that clear communication is paramount to your safety. The internet has many purposes, and it makes no sense to continue to use the same character set.
Well, even restricting to the G7, 4 out of 7 have characters missing from the 37 allowable in DNS:
Canada: à â ç é è ê ë î ï ô û ù ü ÿ missing
France: à â ç é è ê ë î ï ô û ù ü ÿ missing
Germany: ä, ö, ü and ß missing
Italy: All letters covered (as far as I can tell)
Japan: Most letters missing
United Kingdom: All letters covered
United States: All letters covered
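The check behind that list can be sketched in a few lines of Python. The sample words are my own illustrations, one per language, and not an exhaustive survey of each alphabet.

```python
import string

# The 37 characters DNS labels allow: a-z, 0-9 and the hyphen.
DNS_CHARS = set(string.ascii_lowercase + string.digits + "-")

def missing(word: str) -> list:
    """Characters in the word that fall outside the 37 DNS characters."""
    return sorted(set(word.lower()) - DNS_CHARS)

samples = {
    "German": "stra\u00dfe",    # "straße"
    "French": "ch\u00e2teau",   # "château"
    "English": "colour",
}

for language, word in samples.items():
    gaps = missing(word)
    print(language, word, "missing:", gaps if gaps else "nothing")
```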
But restricting to the G7 is very convenient. How about the most popular languages by native speakers (taken from wikipedia)
Mandarin: 672 million native (old statistic)
English: 425 million
Spanish: 390 million
Arabic: 272 million
Indonesian: 222 million native
Portuguese: 210 million
Bengali: 194 million
Khariboli: 180 million
Russian: 145 million
Japanese: 130 million
French: 120 million
Persian: 101 million
German: 100 million native
The hilarious thing is that English is the only language out of these 13 to have no issues at all with only having the 37 dns characters.
man! could we, as a single species, possibly agree on a single charset?
redundancy is such a drag when paired with über-specificity (talk about two bazillion words for snow, or sand, or pr0n).
... was very, very nicely put. It, to me, just underscores that any foot-dragging on the accented and Chinese/Asian character sets adoption would be an unacceptable denial of additional color and nuance to the ways of accessing the Internet.
1. Since lots of characters in different languages are look-alikes, this would create lots of security problems with character substitutions. There was a demonstration a year ago or so, with the registration of paypal.com and obtaining a legal SSL certificate for it, where the a's were non-Latin. When a user cannot distinguish between the two, how can he or she trust any site on the internet?
2. Creating domains in different languages is also a bad idea for collaboration. It will create unnecessary internet segments. Essentially, people of the world now use Latin characters for accessing websites and sending emails. How in the world is an English speaker supposed to type the Chinese or Arabic characters of a domain to send an email if he or she doesn't know the alphabet or doesn't have keyboard support installed? No way... so all these domains should really be a supplement to normal Latin domains if you want to collaborate with the rest of the world...
Some only had uppercase even. I guess those did have a "1".
I've read peoples arguments about needing change, as it's a barrier to people who don't use Latin characters as the standard for their language.
However, we're looking at multiple thousands of characters potentially being allowed.
Personally, I think it's not much to ask the non-Latin-based to learn 37 new characters, rather than asking the whole planet to learn thousands. Not to mention the fact that massive amounts of information will likely become unavailable to large numbers of people.
With the current system, even if I end up at a site with foreign language, I can at least babel it, and get a general idea of what's going on. A change would likely prevent me from getting there in the first place.
ian
Kind of an interesting point. Maybe we should just let Google run the DNS system, and just replace it with a giant search engine. If we make actually typing in a web address hard enough, then that's what we're effectively doing anyway: people will just start typing everything (including the domain name of sites they want to go to) into the Google Search box at the top of their browser window, instead of the actual address bar.
Actually, DNS arguably is a giant search engine, which simply works on a 1:1 relationship and uses a distributed database (you input one piece of information, and it gives you some corresponding piece of information back). Replacing it with a 'fuzzier' search engine that would give you back a number of results, ranked by relevance, isn't that huge a leap.
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
Why has this not been modded up as +5 Absurd yet?
I think we should also standardize all programming languages. Let's just use C++ and be done with it.
I also find comedies to be useless. All fiction should be tragedy.
Currently there are more Chinese than anyone else, so we should just wipe out all non-Chinese. It would make things so much more efficient.
Read the EFF's Fair Use FAQ
Consider yourself lucky they allow anything other than ones and zeros!
Um, that'd be because it's a meritocracy rather than a democracy. The johnny come lately others couldn't be bothered inventing the system in the first place.
Deleted
I didn't find anything in the WP article. I can only guess, but are you saying that
they used Office 97 to create false documents?
The present character set, actually, does _not_ cover Latin languages! In fact, it only covers English. I don't know of a single other language that doesn't have either accents or characters not present in English.
You should rewrite the topic to say "ICANN Under Pressure to Include Other Languages Besides English", but that might give the impression that what we actually have doesn't really cover over half of the Internet, right?
(8-DCS)
Obviously, that didn't hold back on Japan or China.
And how did you come to that conclusion? Do you know what the most popular email/blogging sites are in China? Many of them are numbers. Try something like www.163.com. This is easier than having some romanized Chinese (and no, Pinyin, the official romanized version, is not allowed either, as it requires accents, so it would be some bastardized Pinyin or other such setup). Are they still doing business? Sure. Are they inconvenienced at all (held back)? Yes. It didn't stop them from getting online, but it is restricting their ability to use the Internet as freely as we do.
Learn to love Alaska
I would leave DNS for IPv4 as it is. Build a new DNS for IPv6 from the ground up, with all things implemented new and no backward compatibility. Keep both of them separate, and phase the old DNS system out when IPv4 is phased out.
the Internet is meant to enable communication. Communication requires common standards. When you start including uncommon things (Unicode), then you end up breaking the communication.
My blog. Good stuff (when I remember to update it). Read it.
It's the URL.
If the unicode of the URL does not match the browser's default unicode, then throw up the standard warning.
I don't care about "almost identical". We have that already with "l1" and "O0". I'm talking unicode.
If the unicode of the URL does not match the default unicode of the browser (because people should be most familiar with the language they browse in) then throw up the standard warning, disable java and javascript and activex and anything else until the user approves them.
It is that simple.
But *IS* there a decent way to enter Chinese characters? Last I heard people in China (& Japan) used Roman letters because as clumsy as they were, they were easier to enter than the Hiragana. (I think I got that right. Of course, even if I did that would be Japanese only, and it doesn't include Kanji.)
With most languages the problem is different. European languages all have a small and well defined set of characters..so small that they could ALL have been handled by a one-byte code. Arabic languages are more difficult, as they have multiple different context sensitive forms for certain letters. (And that's the extent of my Arabic.) Hebrew is more amenable to computerized representation...but it still means that you need to go beyond the one-byte limit (if you want to include the other languages).
OTOH! Sanskrit, Javanese, Minoan, Burmese, etc. 32 bits isn't quite enough to handle everyone. So now you're using 32 bits for each character transmitted, unless you use a culturally biased form. You've just quadrupled the overhead on the DNS.
Yes, it's possible. I don't really think it's a good idea, though. If you were just proposing the UTF-8 subset then this argument wouldn't apply, but it would then STILL be culturally biased, and it would have all sorts of "false twin" URL mappings. And various other problems (see the other posts).
I think we've pushed this "anyone can grow up to be president" thing too far.
One character == one byte, dammit!
Don't let them change that.
In the course of every project, it will become necessary to shoot the scientists and begin production.
It's exactly the point I wanted to make -- but you got there first.
here.
Why does ICANN need to do this? The task is supposed to go to the national domain registries, which control country-level domains.
That's not at all what people have been saying. I'll admit I'm not sure why not, but that's not at all what I would expect, either. None of those are part of ASCII-7 (though they're in lots of the 8-bit expansions).
Nobody complains that Java and C keywords are all English, do they?
With all respect to the Arabic speakers out there - but isn't Arabic already supported in DNS? For example I can type in this Arabic address already: http://17.112.152.32/
:-) (e.g. colour, neighbour, favour etc).
Works perfectly fine.
But seriously, if the world can adopt Arabic numerals as a standard for counting things, perhaps it should consider the humble English language for DNS domains. Even the Asian countries use Arabic numerals.
But I *do* wish everyone would use good British/International English, not that poorly spelt American cuckoo.
Whupps.. Yep, you now remind me that I have seen numeric sites. I sit corrected. I guess I'd gotten too much of Sina and Baidu...
But, when I use Google to search for a Chinese character and get pages back, they have English URLs, true.
http://www.google.com/search?q=%E5%8A%A9&ie=UTF-8&oe=UTF-8
http://www.dongailbo.co.kr/docs/magazine/weekly/2005/01/03/200501030500036/200501030500036_2.html
http://www.91985.com/
http://www.chosun.com/
and I realize now that I'd forgotten that English is STILL in these sites' names.
But, honestly, I could swear that in Mozilla I'd seen Asian or Korean fonts in the URL/location bar only a few weeks ago. I'd been playing around with explicitly searching for information using Korean fonts in the URL at:
http://bemil.chosun.com/
I'll have to reinstall Mozilla and recreate my steps.
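For what it's worth, the `%E5%8A%A9` in the Google URL above is just a Chinese character squeezed into ASCII by percent-encoding its UTF-8 bytes. A quick sketch with Python's standard `urllib.parse` shows the round trip:

```python
from urllib.parse import quote, unquote

query = "%E5%8A%A9"

# unquote() decodes percent-escapes as UTF-8 by default.
ch = unquote(query)
print(ch, f"U+{ord(ch):04X}")

# The three escaped bytes are exactly the character's UTF-8 encoding.
print(ch.encode("utf-8").hex().upper())

# quote() goes the other way, back to the ASCII-only form.
print(quote(ch))
```

So the path and query part of a URL can already carry non-Latin text this way; it's only the hostname that is stuck with the DNS character set.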
But *IS* there a decent way to enter Chinese characters? Last I heard people in China (& Japan) used Roman letters because as clumsy as they were, they were easier to enter than the Hiragana.
Have you ever watched someone type Chinese? They enter the Roman letters, but then select the Chinese character from the list of ones that match the possibilities. There is no one-to-one match, so what is typed in letters does not indicate the character; it is the person who picks from a list that is narrowed by what is typed in.
and it would have all sorts of "false twin" URL mappings.
If I were Emperor of the Universe, I would map all the visually similar characters to a single character, and map all incoming requests to the same single character so that the English A and [whatever other alphabet with the same look but different code for A] would all be mapped to the single code. No one would be able to have multiple codes for the same character image because they'd be "cleaned" in DNS and web browsers (cleaned in web browsers for convenience, cleaned in DNS for "security" by restricting ambiguity). That doesn't seem hard or time consuming. One person in a week and a couple lines of code and we're done. And I only came up with that while constructing this response, if you gave me another 10 minutes, I'm sure I could come up with 10 more ways to prevent the same problems others are complaining about. It always kills me how people presume that if they can't think of a fix to a problem that there must not be a fix to a problem.
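A toy sketch of that mapping idea, in Python. The table here is a tiny hand-picked set of Cyrillic letters that look like Latin ones, and the function names are hypothetical. A real table would have to cover every script; the Unicode Consortium publishes confusables data for this purpose.

```python
# Hypothetical mini-table: look-alike character -> canonical Latin letter.
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0443": "y",  # CYRILLIC SMALL LETTER U
}

def skeleton(label: str) -> str:
    """Fold every known look-alike to a single canonical character."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in label.lower())

def collides(a: str, b: str) -> bool:
    """True when two different labels become the same after folding."""
    return a != b and skeleton(a) == skeleton(b)

real = "paypal"
fake = "p\u0430yp\u0430l"  # both a's are Cyrillic
print(skeleton(fake))
print(collides(real, fake))
```

A registry could refuse any new name whose skeleton matches an existing one, which is the "cleaned in DNS" half of the proposal.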
As a native french speaker, I can tell you that accents ARE most definitely used in signs and capitals. Even in acronyms now (pretty much always have been in Canada, relatively recent in France)
I'm really fed up with those stupidities.
The DNS addresses alphabet was defined as 37 latin chars. And people complain that to be able
to enter this system, they want to extend it. That's as shameful as to define that there
are not enough IPv4 addresses for everybody, so let's extend the alphabet of IPv4 addresses to
1000 values for each byte, so that we will be able to have addresses from 000.000.000.000 to
999.999.999.999. Ah, it doesn't fit a byte ? let's extend the byte to 10 bits! After all, it's
already what has been done with this stupid UTF8 which breaks every mailer, including
those handling it natively !
There's a system which works as a whole and which relies on sensible technical specifications.
Changing some of them can have huge impacts on everything. Look at all the junk mail you all
receive with unreadable chars. We really don't need to get unreadable addresses.
And yes, I do already use accents and some characters that don't enter the ASCII map in my
every day life, but when I use the net, I conform to established standards. What will happen
IMHO is that people using such new and unsupported features won't have an easy access to the
rest of the net, so they will finally avoid it. On the other hand, spammers and phishers will
intensively abuse the new mechanisms. I'm sure we'll see new security options in all common
tools to allow or block usage of such domains...
That's becoming very very sad.
There's already an active market in Internationalized Domain names (IDN) with many domain speculators buying up names for investment and/or parking purposes. Sites focused on this niche include IDN Forums, the IDN Blog and the IDN Domain Blog. A lot of companies and web sites will go to register their trademarked terms in these multilingual domain names and find that they're long gone, and getting them back could mean international litigation. I think it hasn't even occurred to many corporations to register IDN domains.
RichM
Data Center Knowledge
Using RTL (right-to-left) characters in DNS names is a bad idea. In a Unicode context, once you start using RTL characters in an environment that is not exclusively RTL you need to do bidirectional reordering, converting from logical order to visual display order. This causes neutral characters like the dot character to jump to the left side of the word or the right side of the word depending on what character logically precedes it. Sometimes the only way of forcing the display to look as you intended is to insert various zero-width characters, like Right-to-Left Mark, Left-to-Right Mark etc. Another problem occurring e.g. in joined scripts like Indic languages and Arabic languages is that you sometimes need to solve ambiguities by using other zero-width characters like the ZWJ (zero-width joiner) and the ZWNJ (zero-width non-joiner). Is the proposal suggesting to allow these zero-width characters in DNS names as well? I certainly hope not... Though English is not my native language, I certainly hope that the current DNS subset will remain.
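A small sketch of the character properties behind this, using Python's `unicodedata`. It only looks up each character's bidirectional class and general category; it does not run the reordering algorithm itself. The `invisible()` helper is a hypothetical name of mine.

```python
import unicodedata

chars = {
    "Latin a": "a",
    "full stop": ".",
    "Hebrew alef": "\u05d0",
    "Arabic alef": "\u0627",
    "Right-to-Left Mark": "\u200f",
    "Left-to-Right Mark": "\u200e",
    "ZWJ": "\u200d",
    "ZWNJ": "\u200c",
}

# The dot has no strong direction of its own, so where it is drawn
# depends on its neighbours. The last four are invisible "format"
# characters (general category Cf).
for label, ch in chars.items():
    print(f"{label}: bidi={unicodedata.bidirectional(ch)} "
          f"category={unicodedata.category(ch)}")

def invisible(label: str) -> list:
    """Code points in a label that are invisible format characters."""
    return [f"U+{ord(ch):04X}" for ch in label
            if unicodedata.category(ch) == "Cf"]

print(invisible("exam\u200dple"))
```

Two names that differ only by one of these zero-width characters would look the same on screen, which is the worry raised above.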
There is at least one reason, why non-latin letters in DNS names are a bad idea: International keyboards always support at least two character sets: latin (english) and whatever native character set. Thus even if the user is localized in Cyrillic, if I publish my DNS name using latin characters, I know that he and other people all over the world can reach it. If, on the other hand, I was a businessman in China, I could possibly create a nice domain name that was entirely in chinese, even if it was well-known name that was recognized across the world. Now suddenly only people who happen to a) read chinese and b) have chinese character support turned on in their OSes (input support, not display) can access my site. Whereas I could simply transliterate my name into latin characters and reach everyone, without requiring special input methods or skills on the part of the end user.
Certainly I don't want to have to add Big-5 support to my Linux install and all the various input methods just to visit the site of a guy in China who wants to sell me radio-controlled electronics for my hobby (I do want to buy from him though).
Ehhmm... The whole of unicode can be encoded using UTF8. You mean ASCII probably. Please see wikipedia.
Last time I checked, the US created it. We didn't need non-English domains, so implying that it's late is a bit stupid - it was never part of the deal. I'm not so sure it's a good idea in any case - it's been trucking on along just fine without it and the thought of adding a whole bunch of different domains where some dumbass is simply adding an accent mark to a letter - this is BEGGING to make phishing even worse (it makes me wonder how much organized crime is behind the push on this)
I've watched people enter Chinese texts on cellphones in China, and it's amazing to watch. Whatever entry method they're using seems remarkably efficient.
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
This is nonsense.
The technical solution already exists today, it already works in both Firefox and IE and some other applications, and it is already used in e.g. the .nu and .se domains. It's up to each TLD to decide what profile of domain names they accept, and no modification of the DNS protocol is needed.
The ICANN has nothing to do with this. Move along.
There are accented versions of e and o.
Another coincidence, go figure.
Seriously, there is already a system for simulating the presence of non-Latin characters in domains... it's called the Internationalized Domain Name system
names are basically translated from Unicode to an ASCII encoding and prefixed with "xn--"
anyway .. if you want more info the Wikipedia Article is pretty good.
Why isn't this system adequate?
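The translation step is short enough to sketch by hand with Python's built-in `punycode` codec. This skips the normalization a real implementation does before encoding (case folding and so on), and `to_ace` / `from_ace` are names I made up for the sketch.

```python
ACE_PREFIX = "xn--"

def to_ace(label: str) -> str:
    """Encode one label: ASCII passes through, the rest gets xn-- + Punycode."""
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")

def from_ace(label: str) -> str:
    """Decode one label back to Unicode if it carries the xn-- prefix."""
    if label.startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label

name = "b\u00fccher.example"  # "bücher.example", a hypothetical host
wire = ".".join(to_ace(part) for part in name.split("."))
print(wire)

back = ".".join(from_ace(part) for part in wire.split("."))
print(back == name)
```

The DNS servers only ever see the `xn--` form, which is why nothing in the basement has to change.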
Instead of changing the fundamental DNS which is a programmer's and administrator's tool, not an advertising medium. It is founded, like programming languages, on a fundamental 7-bit ASCII character set, and is not intended to be used for NLS text.
A far better solution is some form of VDNS that translates NLS text names into the proper domain name at the system level. That also allows the same domain to have multiple language translations to reflect localized product and service names.
We seriously need to kick the general political community in the arse. They keep trying to impose technical decisions, and it fails as miserably as any corporate PHB's uninformed decisions. ASK the techies to propose solutions instead of shoving ill-conceived ideas down our throats.
For example -- once you mandate multibyte domains, you implicitly mandate multibyte URL components. Goodbye direct mapping of names to the directories, file systems, and servers.
Bad idea. Very bad idea.
I do not fail; I succeed at finding out what does not work.
No kidding. However, how come you're making a comment like that when Slashdot still refuses to allow non-Roman characters in posts, and insists on using charset=iso-8859-1? ... Pot, meet kettle.
Seriously, this is one of those times I really want to say fuck you to the world. Considering the DNS is based in the US, Latin it is. Make your own DNS that no one uses if you want another. Seriously, having the same character in multiple character sets, visually indistinguishable from each other, is the best thing that will happen for phishers in ages. How do you know yourbank.com is actually yourbank.com in Latin and not in some other character set? It's not just for Latin either. Imagine Japanese Kanji and Chinese in simplified then in traditional encodings. Nice! You can have nihongo(kanji) .com written in 3 different encodings and it can send you to three different sites. How do you know you are going to nihongo.com and not ribenyu.com?
Jeez this is the most retarded thing i have heard of since the UN threat to split the net a few months back.
The war with islam is a war on the beast
The war on terror is a war for peace
If the Domain Name System supported English it would allow my name to be a domain name, but not even IDN allows me to put a space in a domain name.
What, "da Silva" isn't an English name?
If not, then what is? English is a mongrel language, every name in it comes from a conqueror or settler. Should we go back to the Angles and Saxons, or perhaps to a good celtic name like Cúchulainn?
Whoops.
Of course goatse is much too easy to sucker people into if they can't read the site. I would recommend only 1/2 points for people who can't read the site.
Imagine Internet had Hawaiian origin - you'd be wanting to break the DNS to use your "odd" characters like "c" and "d". Why would you want to "fix" Hawaiian alphabet if it's not broken for Hawaiians!?
English native speakers = 5% of Humanity and going down. The dictatorship of English has lasted long enough.
... is this:
The purpose of human language is to communicate with other human beings
You are operating under the assumption that human language fulfills no other function beyond functional communication. There is plenty of evidence that language defines culture. Speakers of different languages do often see the world slightly differently. If you want everyone to think the same way, one language is a great idea. Personally, I prefer a more heterogeneous world, despite the friction languages create.
Computer languages are instructions to a computer, they aren't a human communication medium
That's an interesting interpretation of computer languages, given that they are created by humans and used by humans to create software. Choice of language informs what the human programmer can create, and has a strong effect on broader human culture. As Lessig argues, code is speech.
You seem to be looking at language as if it merely is a conduit for information. If we were all computers, programmed merely to pass information between each other, that would be the case. For humans, I think it serves many other purposes, and the profusion of languages is good for humanity. Homogeneousness, while it seems like a cure for our maladies, doesn't necessarily help us. Americans speak English; that hasn't stopped violence, disagreement, or other forms of conflict. I would also argue that American culture is so strong in part because it is continuously enriched by influences from other cultures and their languages.
Read the EFF's Fair Use FAQ
But we don't! Not everyone speaks or writes in English on the web! What kind of argument is that?! The Internet is useful to the local group exactly because it is available in their language - and not anything else! Why would a government agency publish in a foreign language? Why would a commercial website list their service in a language the customer does not understand?
Perhaps I am not seeing your point here, I am quite tired and sleepy, but your comment is really strange.
-non-latin north european (æøåòóöôéáäâ)
In Windows, you can set up Alt-Shift-Number to change to a different keyboard. I've used three different ones (Dvorak, US English, and Norwegian) for three years, no problem.
In Linux, you can even use the otherwise useless Caps Lock to rotate layouts.
The Optimus keyboard should be another stepping-stone in making non-English layouts a whole lot more mainstream.
Something similar to what you are describing exists, and is called IDN ( http://en.wikipedia.org/wiki/Internationalized_do
It exists currently and is supported in all major browsers. I would like to hear more about why IDN doesn't work for international users, and why native 16-bit DNS is needed.
To be clear, I would have no problem with a Unicode DNS system if I could see a simple way around the identical glyphs/different letters problem that didn't involve making phishing almost impossible to spot or avoid.
Alas:
1. The very nature of the problem seems intractable (how to make two things which look identical look different, without making either one look different)
2. Computers are already almost entirely latin-1-based - on the command-line/at a low level even if not always in the GUI layer
3. Thanks in part to this state of affairs most languages already have widely-known rules or guidelines for rendering words in non-Latin alphabets into the Latin-1 character set - ask any foreign person you know if they have much problem communicating with their countrymen using only normal (western) qwerty keyboards - all of my foreign friends seem adept enough at it that they've never highlighted it as a problem.
This wasn't intended as a racist "fuck 'em for not inventing the computer" rant, but as a genuine hands-thrown-in-the-air regrettably-there's-no-viable-solution-I-can-see and is-this-really-a-problem statement of opinion.
And incidentally, I already spell honour, colour and aluminium "correctly" (and a nice little bit of cultural chauvinism there too). I also buy beer by the pint and understand the meaning of the word "irony". I meant "we" as in "the English-speaking (or even Roman-alphabet-using) western world", not as in "American".
Now who feels stupid, eh?
Everything in moderation, including moderation itself
What's the weather like on the planet you're on?
Seriously, while I agree with what you say in principle, it is never going to happen in the real world.
dragonhawk@iname.microsoft.com
I do not like Microsoft. Remove them from my email address.
What does this have to do with Palestine?
Where were you when the voynix came?
I'm guessing as much use as the parts of the Internet using languages you don't speak.
It's a network of networks. The computers can all speak to each other. However, not everyone using those computers is interested in speaking to everyone else. There are people in Japan who just want to email their friends and family (also in Japan), in Japanese. They have as little interest in speaking English with you as you do in speaking Japanese with them.
Keep in mind that the Internet was not designed, and is not being used, to serve you in particular. Or anyone else in particular. As long as the computers can reach each other, the Internet is doing its job. What we use it for is up to us.
There are billions of people on this planet who do not speak English. They are not going away, no matter how much it inconveniences you. I bluntly suggest you adjust your world-view to include the whole world.
dragonhawk@iname.microsoft.com
I do not like Microsoft. Remove them from my email address.
Given that some societies have used non-Latin characters for thousands of years, is this a bit late in coming?
I would put it thus: "Given that the Internet has been around for only a tiny fraction of those thousands of years, this is actually happening dangerously fast."
{sigh} Just another example of politics overriding engineering reality.
The higher the technology, the sharper that two-edged sword.
37 characters ought to be enough for anybody!! /joke off
Every domain would be required to have a name according to the current limited scheme, as it is now. And then on top of that you add a capability to create domain aliases that use extended character sets. This is the only thing that makes sense. One should never be forced to type in a bunch of cryptic Kanji to reach a domain. It should be broadly accessible to people using older platforms and DNS standards.
The extension could eventually be wrapped up into DNS, but it would be best to develop it as a separate module for the time being, to absolutely ensure nothing breaks.
-- thinkyhead software and media
...because I've got money waiting for me at ámäzón.com & çïtîßäñk.com that I can't get thru to!
You forgot that the Latin language itself, which is WAY OLDER THAN ENGLISH and which was for 1700 years the official language for science, does not use any accents or diacritics. It has even fewer letters than English, only 23.
Only two or three hundred years ago great scientists and mathematicians like Euler or Gauss wrote their papers in Latin. Latin is not a dead language because it is still the official language of the Catholic Church; even though the Tridentine mass is not the standard mass anymore, the official Vatican documents are all issued in Latin.
Just for the guy above who must be having a bad day with his "UNIX is the problem" quotes...
http://www.fsck.co.uk/symbol5.gif
Who is Seg Fault, and what is he doing with Kernel Space?
The reason ICANN wants to do lots of testing (after having dragged their feet for years before getting started) is that IDNs fundamentally change how DNS works, and it's really important not to break too much when you do that (not that ICANN traditionally worried about that.) It's *not* simple, and you don't want to get it wrong.
DNS translates a set of strings of nominally-ASCII characters into numbers, or translates numbers into a set of strings of characters, or translates some sets of strings into other sets of strings, depending on which query you run, and uses specific data formats to represent those strings and numbers. There are restrictions on what characters can be in the strings, some for reasons that we could easily declare to be obsolete (7-bit, uppercase-to-lowercase translation), some for reasons that are harder to change (printable characters only, please), and some which are really hard (dots are used as delimiters, and nulls terminate character strings in some popular computer languages). So you can't just plug in arbitrary Unicode two-byte characters instead of pairs of ASCII bytes and skip the case-munging, because some of the bytes will have values that can't be handled, though most of the 8-bit-character alphabets can be used transparently if you don't mind people using incorrect character sets on occasion. 8-bit character sets simply aren't enough - you can handle most Western languages in ISO-8859-1, and UTF-8 is closer but apparently not quite a cigar (too bad - it would have been my preference.)
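To make the "bytes that can't be handled" point concrete, here is a minimal Python sketch. The helper name `delimiter_bytes` and the sample character are my own choices for illustration, not anything from the DNS specs. It only checks whether an encoded string happens to contain a NUL byte or a byte equal to an ASCII dot; it says nothing about case folding or label length limits.

```python
def delimiter_bytes(text, encoding):
    """Report whether the encoded form contains a NUL byte or an ASCII dot byte."""
    encoded = text.encode(encoding)
    return {"nul": b"\x00" in encoded, "dot": b"." in encoded}

# U+012E (LATIN CAPITAL LETTER I WITH OGONEK) is 0x01 0x2E in big-endian UTF-16,
# so its second byte is indistinguishable from an ASCII dot.
sample = "\u012e"

print(delimiter_bytes("a", "utf-16-be"))     # plain 'a' gains a leading NUL byte
print(delimiter_bytes(sample, "utf-16-be"))  # second byte is 0x2E, a "dot"
print(delimiter_bytes(sample, "utf-8"))      # UTF-8 encodes it as 0xC4 0xAE
```

So a naive two-byte encoding trips over both of the "really hard" restrictions, while UTF-8 at least never emits NUL or dot bytes for non-ASCII characters.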
The main IDN strategies replace this by adding one more translation layer - character-string-set IDN names are translated into ugly-but-recognizable Punycode strings, which get used with standard DNS character-string-set to number translations in the forward direction, and in the reverse direction, anything that arrived as a Punycode xn--uglystuff string usually gets fed to a Punycode-to-Unicode translator by a user interface.
Some things can be fixed by recompiling (or relinking, or re-DLLing) all of your programs with a DNS resolver library that guesses whether to convert strings or not - forward DNS knows to punycode non-ASCII characters and not to re-punycode xn--uglystuff, though reverse DNS doesn't necessarily know whether to convert it to UTF-16 or UTF-8 or just pass it on directly. If you've typed in a domain name using something other than 7-bit lowercase+digits ASCII, it knows to punycode it, and obviously any domain registry supporting punycode ought to allow anybody who registers a name that doesn't need punycode to have both the straight and punycode names. But it's still ugly.
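For anyone who wants to see that extra layer in action, here is a small sketch using the `idna` codec that ships with Python's standard library (it implements the older IDNA 2003 rules). The function names `to_wire` and `to_display` are hypothetical, picked just for this example.

```python
def to_wire(name):
    """Forward direction: Unicode name -> ASCII form that ordinary DNS can carry."""
    return name.encode("idna").decode("ascii")

def to_display(name):
    """Reverse direction: ASCII xn-- form -> Unicode for the user interface."""
    return name.encode("ascii").decode("idna")

print(to_wire("m\u00fcnchen.de"))        # xn--mnchen-3ya.de
print(to_display("xn--mnchen-3ya.de"))   # münchen.de
print(to_wire("example.com"))            # example.com (plain ASCII passes through)
print(to_wire("xn--mnchen-3ya.de"))      # xn--mnchen-3ya.de (not punycoded twice)
```

The DNS servers in the middle only ever see the ASCII form; the conversion happens at the edges, which is exactly why every program that shows or accepts names needs to know about it.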
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
It requires more data than ASCII, since you'd probably need several 7-pixel tall columns to communicate a single letter, but its advantage in radio communications is that it is "fuzzy" -- a one-bit error will just mar the letter, but probably won't make it illegible, or silently swap it for another letter. It lets the human eye and brain do the error correction, rather than trying to do it in filters.
So anyway, good idea; so good, you've been beaten to it by 86 years.
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
It's been obvious since the Europeans got DNS for their ftp and email that there was a problem, even before they invented the web, and even aside from myopic silliness like having
DNS has a couple of restrictions that may have made sense in 1985, long before Unicode was invented. Some of them are easy to fix, especially since most DNS servers in the world use versions of one of three or four server programs, but there's a lot more resolver software out there that deliberately casefolds (though you could fix most of that in two or three generations of Microsoft releases, if you knew what you wanted it to do), and you can fix some of it administratively, by having the people who register UPPERCASE-EXAMPLE.COM also register uppercase-example.com and maybe Uppercase-example.com and do a few similar things for munged Unicode.
Bill Stewart
The starts-with-a-letter restriction is gone, mostly because of 3Com, but I think there may still be restrictions against starting with a dash.
Bill Stewart
Sure, you can have browser hackery that knows to display xn--ugly-punycode-string-ewtr.cn as the Han characters for the web site. But if you're having trouble reaching the site, does your ping or traceroute program have the builtin IE hack? Or if you want to email them, and you're using an email program that's not part of your browser, can you type in their name and email them? Or can you cut&paste from your browser?
Bill Stewart
It always kills me how people presume that if they can't think of a fix to a problem that there must not be a fix to a problem.
;-). At least ASCII doesn't have 3 different codes for the letter 'b', for example.
It's not so much that people can't think of solutions, but rather that so often people's idea of their own writing system is so entrenched that they refuse to solve problems.
You can see this even in the limited English writing system. The computer field has been plagued from the start with the O/0 and I/l/1 problems, and there isn't the slightest chance that any solution will ever be accepted. We also keep using fonts that confuse "d" with "cl" (as in clear vs dear) and "m" with "rn" (as in modem vs modern), even when we realize that the font causes a problem.
If we can't straighten out the ongoing screwups caused by these charset problems in English, how can we preach to the rest of the world about how easy their problems are?
Of course, the Unicode gang made the east Asian scripts even worse in this regard than even their writing system. Many characters have 2 or 3 different codes, and I think I saw one with the same glyph for 4 codes (but I could just be dreaming this
Those who do study history are doomed to stand helplessly by while everyone else repeats it.
OK, I'm no expert, and my information is 2-3 decades old...however...
I have this strange suspicion that the stuff they enter into cell-phones is the rough equivalent to "texting" (or whatever that set of peculiar abbreviations is called).
I think we've pushed this "anyone can grow up to be president" thing too far.
using the Punycode encoding, which recent versions of all major browsers support.
If you want to register an internationalised domain name, just convert it to punycode, register the resulting domain as per usual, and you're set to go.
The major browsers even deal with IDN homograph attacks, by making sure that a url containing characters from different languages will be displayed in raw punycode rather than the internationalised string.
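A rough sketch of that mixed-script rule in Python, for illustration only. Real browsers use proper Unicode script data and more elaborate policies; here the script is guessed from the first word of each letter's Unicode character name, and the function names are hypothetical.

```python
import unicodedata

def scripts_in(label):
    """Rough script guess: first word of each letter's Unicode name (LATIN, CYRILLIC, ...)."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in label if ch.isalpha()}

def display_label(label):
    """Show the Unicode form only if all letters share one script; otherwise raw punycode."""
    if len(scripts_in(label)) > 1:
        return label.encode("idna").decode("ascii")
    return label

genuine = "paypal"
spoofed = "p\u0430yp\u0430l"  # U+0430 is CYRILLIC SMALL LETTER A, which looks like Latin 'a'

print(scripts_in(genuine), display_label(genuine))
print(scripts_in(spoofed), display_label(spoofed))
```

Note the limit of this check: a name written entirely in one look-alike script has only one script in it, so it would still be displayed as-is.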
What's more, strings containing things like combining diacriticals and different accented marks need to be normalised using the Nameprep algorithm to reduce a string that could be encoded in several different (but visually equal) ways into a single representation.
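To show the kind of reduction Nameprep performs, here is an approximation in Python. This is not the real Nameprep profile (which also has mapping tables, prohibited characters and bidi checks); it is just case folding followed by NFKC normalization, and `rough_nameprep` is a made-up name for this sketch.

```python
import unicodedata

def rough_nameprep(label):
    """Approximate Nameprep: fold case, then apply NFKC so equivalent spellings collapse."""
    return unicodedata.normalize("NFKC", label.casefold())

decomposed = "Cafe\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT
precomposed = "CAF\u00c9"   # U+00C9 LATIN CAPITAL LETTER E WITH ACUTE

print(decomposed == precomposed)                                  # False: different code points
print(rough_nameprep(decomposed) == rough_nameprep(precomposed))  # True: one representation
```

Both spellings end up as the same four code points, so they would map to the same punycode string and therefore the same registered name.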
This system has been in place for a couple of years already, and works fine. No need to go breaking anything.
Wikipedia provides a whole host of examples which your browser (if it's recent) should automatically convert to punycode when it makes the request to the page you're after, all the while displaying the nice internationalised name for you to see.
So, move along people, nothing to see here...
punycode - e.g. http://en.wikipedia.org/wiki/Punycode - is here, and it already works. It's a gigantic tempest in a teapot. (Never mind that figuring out which normalized form of Unicode to use is bad enough as it is.)
Lamest idea ever. How the hell am I supposed to type a non-latin domain in my browser? Do I rely on Google for everything, or do I keep a link to Character Map on my quick launch bar?
50,000 characters makes for a huge keyboard.
That's why I don't use it- it only works for the web browser.
OSx86 FTW
It's high time everyone else learned English anyway.
Seriously though, having lots of character sets is a stupid idea. Suddenly, you can only even type in a website's name if you have the right language pack/keyboard.
English is the dominant international language right now, and it is only sensible that there be a standard English, or at least Latin character set, interface to everything. It's one thing to provide support for other languages, but moving too far away from the standard will only increase the segmentation of the internet and decrease the information available to any one person.
Latin domain names ensure that there's a standard and simple way for anyone running any OS with any language pack to type in any domain name, if only to run it through babelfish. Most major foreign languages (Japanese, Chinese) have systems for writing in the Latin character set anyway, which are certainly sufficient for the short strings used in *domain names*.
I just thought it was an intriguing allegation, I have no idea whether this has any bearing on Palestine.
I didn't see another comment that made the distinction, so:
:-) )
Don't confuse display, transmission, and storage encoding. It was a convenience when 7-bit ASCII characters could be used for all three (seemingly). But it was never really true. The encoding on disk was never really the same as the transmission encoding or the display encoding. When you display an ASCII 'A', you don't see '0100 0001'. End users want to see and type in a pretty URL. If it's transmitted or stored as an unreadable hexadecimal hash, who cares? It's like complaining that the magnetic domains on the hard disk that represent the letter 'o' don't form a circle.
If I wanted a logo domain name, I could register an encoding of the SVG as my domain name, then write and contribute the code for firefox to decode and display my logo domain name. I wouldn't need to lobby ICANN to allow SVG or bitmapped domain names (although I might have to write an RFC for the IETF
Actually, since DNS names and IP addresses aren't trustworthy on a massive global scale anyway, we should be using public keys to identify hosts, websites, and other Internet entities. But that's a topic for another discussion.
It was just another jab at Dan Rather for his "all the evidence for the story is faked, but I tell ya, it's TRUE!!!!" . Not to mention his insistence that the faked Word 97 documents were real for two weeks after they were proven fake. Not the best way to end his long and quite distinguished career.
Where were you when the voynix came?
Been a while since I read any Italian, apart from signs & menus, seems I'm getting rusty. Still, as long as I can tell the difference between a casinò and a casino...
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
The Chinese and Japanese sections of Wikipedia are likewise nearly absent of Latin characters.
America rules the internet and domain names. Why do you think there are only latin chars? GO TEAM!
It's amazing, krell. How do you type so much with Rush Limbaugh's cock rammed down your throat?
I feel like death on a soda cracker.
....the fake story was aired on CBS News, and not Rush Limbaugh. What does Limbaugh have to do with it???? Dan Rather torpedoed himself, unless you are one of those that think Limbaugh planted the fake story to trap Dan Rather.
More precisely, DNS is supposed to be case-insensitive and case-fold requests when appropriate. For IDN purposes, the important issue is that one 8-bit byte may get transformed to a different 8-bit byte, which is fine for 7-bit ASCII characters and usually wrong for bytes that represent half of a 2-byte Unicode character. The fact that the transformation can also be implemented by a bitmask is an implementation detail that's not really in the DNS standards, but it does mean that there are bytes with values 128-255 (such as ISO-LATIN-1 bytes or halves of 2-byte Unicode) that might be undamaged by a lookup table implementation but would be damaged by a bitmask implementation.
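Here is a small Python sketch of the difference between the two implementations. Both functions are hypothetical stand-ins for what a resolver might do internally; neither is taken from any real DNS code. The sample uses a UTF-8 encoded name rather than 2-byte Unicode, but the effect on high bytes is the same.

```python
# Lookup table: only bytes 0x41-0x5A ('A'-'Z') are changed, everything else maps to itself.
FOLD_TABLE = bytes(b + 32 if 0x41 <= b <= 0x5A else b for b in range(256))

def fold_with_table(data):
    """Case-fold using a 256-entry lookup table; bytes 128-255 pass through untouched."""
    return data.translate(FOLD_TABLE)

def fold_with_bitmask(data):
    """Case-fold by setting bit 0x20 on every byte; cheap, but blind to what the byte means."""
    return bytes(b | 0x20 for b in data)

ascii_name = b"EXAMPLE.COM"
utf8_name = "m\u00fcnchen".encode("utf-8")   # the u-umlaut becomes bytes 0xC3 0xBC

print(fold_with_table(ascii_name), fold_with_bitmask(ascii_name))
print(fold_with_table(utf8_name) == utf8_name)    # True: table leaves high bytes alone
print(fold_with_bitmask(utf8_name) == utf8_name)  # False: 0xC3 became 0xE3
```

Both approaches agree on letters, digits, hyphens and dots, which is why the bitmask shortcut goes unnoticed until a byte above 127 shows up.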
Bill Stewart