ICANN Approves Non-Latin ccTLDs
Several readers including alphadogg tipped the news that ICANN has approved non-Latin ccTLDs at its meeting in Seoul. "Starting in mid-November, countries and territories will be able to apply to show domain names in their native language, a major technical tweak to the Internet designed to increase language accessibility. On Friday, the Internet's addressing authority approved a Fast-Track Process for applying for an IDN (Internationalized Domain Name) and will begin accepting applications on Nov. 16. The move comes after years of technical testing and policy development... Currently, domain names can only be displayed using the Latin alphabet letters A-Z, the digits 0-9 and the hyphen, but in future countries will be able to display country-code Top Level Domains (cc TLDs) in their native language. ... 'The usability of IDNs may be limited, as not all application software is capable of working with IDNs,' ICANN said in a 59-page proposal (PDF) dated Sept. 30 that describes the [application] process." Reader dhermann adds, "Great, now even less chance I can identify NSFW links before they are blocked by my work's big brother app and my boss is notified... again."
Arabic TLDs are a threat to national security
When a true genius appears, you can know him by this sign: that all the dunces are in a confederacy against him.
micrösöft.cöm?
I'm glad we're going with Non-Latin TLDs now, I never understood going to the website "e.pluribus.unm"
There go my plans for world domination through venividivici.vvv
ï höpé thãt slâshðõt wìll dö thís töø wìth ÜRLs!
www.íçáñn.örg
ìt wörkéð!
First non-latin top level post.
nss or Kansas for that matter
Far too much software makes the assumption that TLDs only contain [a-z0-9-], so if you want to go changing that there needs to be a damn good reason, and there is not. There are ~1369 2-letter TLDs to be shared between ~200 sovereign states and 49284 3-letter generic ones to be split between uses (.xxx .nws .org .edu, etc); there doesn't seem to be any good reason to expand that and make lots of software more complex.
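For anyone wondering what that assumption looks like in code, here is a minimal sketch of the traditional letters-digits-hyphen check. The function name is made up for illustration and the length limit follows the usual 63-character label rule. Note that an encoded "xn--" label still passes it, while a raw non-ASCII label does not.

```python
import re

# Traditional "LDH" label: ASCII letters, digits and hyphen only,
# no leading or trailing hyphen, at most 63 characters.
LDH_LABEL = re.compile(
    r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?",
    re.ASCII | re.IGNORECASE,
)

def is_ldh_label(label: str) -> bool:
    """Return True if the label passes the classic ASCII-only check (hypothetical helper)."""
    return LDH_LABEL.fullmatch(label) is not None

print(is_ldh_label("org"))         # plain ASCII TLD
print(is_ldh_label("xn--rg-eka"))  # encoded label of the kind quoted in this thread
print(is_ldh_label("\u00f6rg"))    # raw label starting with o-umlaut
```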
IranAir Flight 655 never forget!
The encoding seems weird to me:
Any DNS gurus care to explain why they wouldn't simply use UTF8?
===Reader dhermann adds, "Great, now even less chance I can identify NSFW links before they are blocked by my work's big brother app and my boss is notified... again."===
Seriously? You think people shouldn't be able to use internet in their native language because you are afraid of getting in trouble for browsing the web at work when you already know you shouldn't? I'd fire you right now if I was your boss.
... of course, is Punycode.
A comment before yours has www.íçáñn.örg, which, when entered into Firefox, turns into www.xn--n-tfarxw.xn--rg-eka. Looks like the software will still live :)
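The same kind of conversion can be tried outside a browser. This is a rough sketch using Python's built-in "idna" codec, which implements the older IDNA 2003 rules and is not necessarily what Firefox does internally. The helper names are invented for illustration.

```python
def to_ascii(domain: str) -> str:
    """Encode each dotted label to its ASCII-compatible form (hypothetical helper)."""
    return domain.encode("idna").decode("ascii")

def to_unicode(domain: str) -> str:
    """Decode an ASCII-compatible name back to its Unicode form (hypothetical helper)."""
    return domain.encode("ascii").decode("idna")

name = "www.\u00ed\u00e7\u00e1\u00f1n.\u00f6rg"  # the name posted earlier in the thread
wire = to_ascii(name)
print(wire)              # ASCII only; this is what DNS actually carries
print(to_unicode(wire))  # what a browser may choose to display

print(to_ascii("m\u00fcnchen.de"))  # xn--mnchen-3ya.de
```

Plain ASCII labels such as "www" pass through unchanged, which is why existing resolvers and caches do not need to know anything about the scheme.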
This will only make phishing attacks easier unless there are SERIOUS checks on domain name registrations. There are letters in the Cyrillic alphabet that have different character codes than their look-alike letters in the Latin alphabet. I'm sure there are other collisions as well. I'm sure they accounted for this in the proposal, but the problem always lies in the implementation. From a security standpoint, this is a VERY bad idea without proper regulation of domain name registrations, and so far it has been demonstrated that we cannot manage them properly even with only the Latin alphabet. From a cultural and usability standpoint, this is a good thing. It will be easier for someone whose native language uses a non-Latin alphabet to recognize the supposed purpose of a web site by its domain name if some of those domain names can be in their native language. A hypothetical native Tamil speaker who speaks no English will be able to recognize the purpose of a site with an appropriate domain name in Tamil, for example
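To make the look-alike problem concrete, here is a small sketch of a mixed-script check. It guesses the script from the first word of each character's Unicode name, which is a crude heuristic and not how any registry or browser is known to do it. The label "chase" is just an example.

```python
import unicodedata

def scripts_in(label: str) -> set[str]:
    """Rough script guess: first word of each letter's Unicode name (heuristic only)."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }

def looks_mixed(label: str) -> bool:
    """Flag labels whose letters appear to come from more than one script."""
    return len(scripts_in(label)) > 1

genuine = "chase"
spoofed = "ch\u0430se"  # U+0430 CYRILLIC SMALL LETTER A instead of Latin "a"

print(genuine == spoofed)   # the strings differ even if they render alike
print(scripts_in(genuine))
print(scripts_in(spoofed))
print(looks_mixed(spoofed))
```

A check like this catches the mixed case only. A label written entirely in Cyrillic look-alikes would still pass, so it is a partial defence at best.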
I wonder what impact this will have on the ever decreasing amount of IPv4 addresses available.
This will have absolutely no effect on IPv4/IPv6. This is a DNS change to allow additional characters in domain names.
The domain names get translated to ip addresses by DNS servers.
I doubt that individuals & companies said, "No! We refuse to go on the internet until we can have TLDs with non-Latin characters."
Yay!!! The door is open for an even harder to detect phishing scheme! Imagine the emails linking to http://slashd/öt.org/something...
I'm all for internationalization, but perhaps limit it to internationalized domain extensions (.jp or .es, for example)...
If a man isn't willing to take some risk for his opinions, either his opinions are no good or he's no good
So build your own damn internet.
The current RFC 1738 (http://www.faqs.org/rfcs/rfc1738.html) only allows URLs to be composed of:
" Within those parts, an octet may be represented by the chararacter which has that octet as its code within the US-ASCII [20] coded character set. In addition, octets may be encoded by a character triplet consisting of the character "%" followed by the two hexadecimal digits (from "0123456789ABCDEF") which forming the hexadecimal value of the octet. (The characters "abcdef" may also be used in hexadecimal encodings.)"
So A-Z and %XX just ain't gonna cut it.
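For reference, the "%" plus two hex digits mechanism quoted above looks like this in practice. The sketch assumes UTF-8 as the byte encoding, which is the later convention rather than something RFC 1738 itself fixes, and the path is a made-up example. It applies to the path part of a URL; host names go through a separate encoding.

```python
from urllib.parse import quote, unquote

path = "/wiki/Z\u00fcrich"      # hypothetical path containing u-umlaut
encoded = quote(path)           # non-ASCII bytes become %XX triplets
print(encoded)                  # /wiki/Z%C3%BCrich
print(unquote(encoded) == path) # the original text is recoverable
```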
Currently URLs are in the ASCII subset of utf-8. What are they going to be in in the future?
What about languages that go from right to left like Hebrew and Arabic?
Open Source Identity Management: FreeIPA.org
There are letters in the Cyrillic alphabet that have different character codes than their look-alike letters in the Latin alphabet. I'm sure there are other collisions as well. I'm sure they accounted for this in the proposal, but the problem always lies in the implementation
This is a decision made by ICANN. We've known for some time that they will willingly approve really tremendously bad ideas, if enough money is presented to them. They recently moved on a motion to start selling gTLDs, after all.
From a security standpoint, this is a VERY bad idea without proper regulation of domain name registrations, and so far it has been demonstrated that we cannot manage them properly even with only the Latin alphabet
Security is not of any concern for ICANN. Never has been, never will be. As long as they keep making money they're happy; security, spam, phishing, etc, be damned.
Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.
Yeah, Slashdot apparently needs to be internationalised too. That ".ws" should be "[U+27A1].ws" (BLACK RIGHTWARDS ARROW).
Is it my imagination, or does this proposal only apply for TLDs, like .uk and .jp? I don't see any mention of supporting it for the rest of the domain name. That seems a logical extension, but it's not been announced.
No kidding!!! What do you say at this point?
cmon how could you think "but in future countries" sounds okay.
it should be "but in the future countries"
great info though. I mean it's nice to see that the internet is starting to become more international, especially as the US cuts mandatory ties to ICANN.
... Yeah, I didn't think so.
ICANN just made another move to make everyday life on the internet slightly more difficult for many users, while making life for con artists, spammers, phishers, etc, much much easier (and more profitable). It is safe to expect that someone (probably more than one actually) at ICANN made some money on the deal.
Hell it wouldn't surprise me if they were working with some financiers to try to find a way to sell internet subprime mortgages for profit as well.
Now those countries, organizations and businesses that wish to become inaccessible to most of the world (except the native speakers of their own language) can finally do so as easily as possible. Create their own little Internet reservations and stay there :)
As long as my software (such as Firefox) obligingly converts these IDN urls into the dash-hex notation making them obviously unreadable, I am ok with that.
Disclaimer: I am a native of non-English speaking country. I am sure a few of my countrymen will use this feature based on misplaced patriotism. I am also sure that vast majority will ignore it just like they ignore potential to use non-latin domain names that exists right now.
Good news for the non-english speaking users. Though a challenge for search engines.
With blackjack and hookers! In fact, forget about the internet.
Seriously though, it is nice to have a lowest common denominator in characters, so that everyone can type every address on the Internet.
Escher was the first MC and Giger invented the HR department.
Unicode can mean many things - UTF-8, UTF-16, UTF-32 - so specifying Unicode is not detailed enough to implement, and by not specifying an encoding it is opening a can of worms IMO. UTF-8 tends to be slower and larger for non-ASCII text but has wide acceptance. It would also be the favorite for Linux/UNIX because it is very common there (my Linux box has LANG=en_US.UTF-8) and also for communication with databases (in my experience, UTF-8 is what most enterprise companies use for their database settings if they need multi-language databases). UTF-16 is worse for ASCII because it always has a second byte, but is generally faster and smaller for multibyte languages. It is also the default character encoding for MacOS and Windows (and contrary to its name, a single character can, in fact, take 4 bytes - the older format, UCS-2, was 2-byte only). It would be possible to support multiple encodings maybe on the URL, but this needs to be specified (for instance you could do something like http8:// or http16://).
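The size trade-off is easy to check. A quick sketch follows, with sample strings chosen only for illustration: an ASCII word, two Chinese characters, and one character outside the Basic Multilingual Plane.

```python
def byte_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 bytes, UTF-16 bytes without BOM) for the given text."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

samples = {
    "ascii": "slashdot",
    "chinese": "\u4e2d\u6587",
    "supplementary": "\U0001F600",  # needs a surrogate pair in UTF-16
}

for label, text in samples.items():
    utf8, utf16 = byte_sizes(text)
    print(f"{label}: utf-8={utf8} bytes, utf-16={utf16} bytes")
```

ASCII doubles in UTF-16, the Chinese sample shrinks from 6 bytes to 4, and the supplementary character takes 4 bytes in both.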
To further throw a wrench in the works, wchar_t in C has unspecified length and can be 8, 16, or 32 bits wide. On Windows it is 16 bits. On Linux, Mac and BSD UNIX it is generally 32 bits. This makes multi-platform programming using wide characters in C/C++ a bitch (and I say that from experience).
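If you want to see what your own platform does without writing any C, this small sketch asks the C runtime through Python's ctypes module. The variable name is arbitrary.

```python
import ctypes

# Size of the platform's wchar_t as seen by the C runtime Python was built against.
WCHAR_BITS = ctypes.sizeof(ctypes.c_wchar) * 8
print(f"wchar_t is {WCHAR_BITS} bits on this platform")
```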
I doubt that individuals & companies said, "No! We refuse to go on the internet until we can have TLDs with non-Latin characters."
You think that companies have only a single domain? You think that they use only a single IP?
iain@expat-tc ~ $ host www.microsoft.com.au
www.microsoft.com.au has address 203.19.66.74
iain@expat-tc ~ $ host www.microsoft.it
www.microsoft.it is an alias for microsoft.it.
microsoft.it has address 207.46.232.182
microsoft.it has address 207.46.197.32
microsoft.it mail is handled by 10 maila.microsoft.com.
Most people here seem to miss one of the big reasons for this. Just imagine what a pain it would be for you if it was required that you type 2 or 3 Kanji characters at the end of every URL that you type out manually. These are not characters that are generally available on your keyboard and you have to switch the keyboard input to try and type them, or use a software keyboard etc. Even if you are fluent in both languages, it is a pain in the arse.
If this is a common problem for you, turn off your browser's "load images" setting. Not a perfect solution, but better than a flashing neon animated GIF of bouncing boobs right as your boss walks by. Myself, I've a number of people I follow on twitter who post links and often fail to mention if they're work appropriate, so I set up PuTTY to be an SSH tunnel/SOCKS proxy (scroll down to, "PuTTY for WindowsXP") to my home file server.
See these slides about exploiting UTF-aware software.
http://www.casabasecurity.com/files/Chris_Weber_Character%20Transformations%20v1.7_IUC33.pdf
Finally, someone who gets it! Now can you please explain that to all your other non-English speaking brethren, because we only speak English here..
"But this one goes to 11!"
A lot of the debate here seems to be about English-speaking countries vs. the rest of the world, but English isn't the only language that uses the Latin alphabet. Also, the unavailability of non-Latin scripts hasn't hampered the flourishing of home-grown websites in India and China named in their many local languages - what makes ICANN think this is even necessary?
Karma fed to this user will be promptly burnt. Be warned; be wary.
RTFA. Internationalized characters in domains are encoded. See also RFC 3492.
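As a rough illustration of what that RFC does to a single label, here is a sketch using Python's built-in "punycode" codec. The "xn--" prefix is added by hand, the helper name is invented, and real IDNA processing also normalizes the input first, which this skips.

```python
ACE_PREFIX = "xn--"

def to_a_label(label: str) -> str:
    """Encode one label with Punycode and add the ACE prefix (simplified sketch)."""
    if label.isascii():
        return label  # plain ASCII labels are left alone
    return ACE_PREFIX + label.encode("punycode").decode("ascii")

print(to_a_label("b\u00fccher"))   # xn--bcher-kva
print(to_a_label("m\u00fcnchen"))  # xn--mnchen-3ya
print(to_a_label("org"))           # org
```

The ASCII letters are kept in order at the front, and the part after the last hyphen encodes which non-ASCII characters go where.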
Regardless of implementation, once/if this does go through, my biggest question is what (if anything) is being done about domain squatting? We are talking about opening up potentially millions of domain names that have never been registered and I assume the moment this begins to be possible there would be some mad dash to register everything imaginable...
Yay. Now you can register yourbankname.com with some funky characters that render in exactly the same way as the letter you are used to.
Aren't IDNs already available via Punycode encoding? (For example the ones at http://www.w3.org/2003/Talks/0425-duerst-idniri/slide12-0.html) Or am I missing something?
More places for those damn domain squatters to snatch up before we can.
This will have absolutely no effect on IPv4/IPv6.
It's not as clear as you think. The post you respond to probably thinks that having non-Latin TLDs will increase domain registrations, which might require more IP addresses. Not all new registrations will be redundant.
As I said, a complete mess.
God knows what will happen with all DNS caches full of xn--, all security risks, bugs, all unreachable websites (unless you have unicode in your system), confusion, the exponential growth of phishing, scams, and domain theft.
Unbelievable. Today Internet was such an orderly, quiet medium and now this. One day they will allow people to call each other through the Internet without using their home telephone! Can you imagine?
... not being able to enter the URL!
How exactly do you think you'll be able to type in a URL in mandarin or russian on west european keyboard?
Don't most countries already have a country code TLD? What will they do with the old code for their country? Sell it like Tonga and Tuvalu?
(And those two countries just mentioned don't have a different alphabet - I don't think they had a written language before contact with Europeans.)
apart from typing google into google?
proud caffeine whore
how is that a problem? if you can't get to the website because it's in a funny language, what makes you think you can read the contents?
could be chàse.com or cháse.com
every website i go to from now on, i need to study the url with a magnifying glass to make sure i am getting the actual site i wanted. not even as a security precaution, but just to avoid phony sites that might be spoofing a real one for all sorts of purposes, even if just humor, not all of them nefarious, but all of them certainly annoying
a with accent mark may be easy to see, but there are some subtle unicode characters that are so completely like the lowercase "L" or upper case "I" or upper and lowercase "O", etc., and each different font might render the different characters in so many subtle variations, that its almost impossible anymore to guarantee that the link you followed actually went to the site you think it did
so we have to type addresses by hand to make sure they are genuine from now on?
its not cultural imperialism to support only 30 or so characters for website addresses. think of it as a universal routing system, that is purposefully limited, simply for the sake of security and peace of mind
characters for website addresses should remain small in number. simplicity means security. now we have opened a can of worms, and i think the spoofing will actually be worse for those who use nonlatin alphabets, as they are more likely to be mixing latin and nonlatin characters in their address bar
intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
There are a lot of websites where the words don't matter.
Uh, yeah, because the keyboard you're using is a clear indicator of which language(s) you understand.
....although obviously not ... in Kansas.
Genesis 1:32 And God typed
One word: Klingon.
This will make guys like google seem like definite gateway to the internet... Everything seems to be shifting. I don't think phishing will become a huge problem for people who get some PC training. On the other hand the countries more likely to have problems with this also have less of a chance to get people trained. I also would hope that browsers will adopt a feature to protect us in a more realistic way.
How exactly do you think you'll be able to type in a URL in mandarin or russian on west european keyboard?
You enable Chinese keyboard layout (dunno what's it called), and type it. The letters printed on the keys of your keyboard aren't some sort of magic that lets your computer input languages written in them, you know.
I don't have any keyboards with Russian characters on them, but I happily type in Russian regardless (in fact, I only first realized that I do actually truly touch type when I first ran into this problem, which turned out to not be a problem in the end).
There is no Mandarin keyboard, you have to use an input method to input Chinese characters. Computers sold in China are in qwerty. You type the romanisation and you can choose from a dictionary of characters which one you want. Of course you should not have to do it to type a URL to visit a page in English, but I expect all Mandarin speakers to have a way to type Chinese on their computers so it should not be a problem.
You may have more problems with European languages, for example French and its accents. Input methods are available but not widely used because computers sold in France have the accents directly on the keyboard.
You think that companies have only a single domain? You think that they use only a single IP?
I am well aware that companies have many domain names and many ip addresses.
But like I said, I doubt that individuals & companies said, "No! We refuse to go on the internet until we can have TLDs with non-Latin characters."
The limiting factor was not non-latin domain names.
Finally we can do web addresses in Klingon!
how is that a problem? if you can't get to the website because it's in a funny language, what makes you think you can read the contents?
Ever go on holiday? Ever need to use an internet cafe in your holiday country?
I look forward to browser plugins that block or auto translate these urls for the sake of security.
As you should be able to use all non-latin TLDs as an element to filter against.
'Cause as a website visitor, I *really* want to learn all those other languages and switch my keyboard every other website I go to. *REALLY!* I've wanted to for *years*!!
And as a website owner, I want to pay to register 248+ domain names just to cover all the new TLD languages.. instead of having http://www.mywebsite.com/jp/index.php, et al.
Smells like teen scam to me. I really had to laugh when they showed the South Korean Pensioners learning how to use the internet. And complaining about having to learn english.
English: the de facto Lingua Franca of the web!
I've abandoned my search for truth; now I'm just looking for some useful delusions.
Last time I was in Thailand, the keyboards at the internet cafe had both latin and thai characters on them, and it was trivial to switch the keyboard language
No sig for the moment.
Then there are the ones written vertically, like Japanese and Chinese - yikes! :-)
It must have been something you assimilated. . . .
How about just put the icon in the URL bar (like there is for FAVICON etc) with the country flag of the non-ASCII components in a URL if it exists.
The only reason not to use a flag in the case of plain old ASCII 7 bit is because it should be the UK flag, but the USians wouldn't like that.
So if you see www.chase.com and a french flag, YOU run away. Unless you know they host a french site at that URL.
No. Why do you ask?
At least restrict the character set to UTF-8
Where can i find this list? Or does it only exist in the SLASH source code?
The SLASH source code is published in a Git repository. However, SLASH exposes several settings to the site owner, such as how much karma a "Funny" is worth (+1 in stock SLASH, 0 on Slashdot), and I'm guessing this character whitelist is one of those.
It looks to me they just threw the baby out with the bathwater.
Given how few articles on Slashdot are explicitly about internationalization, there is only enough baby to count as "acceptable collateral damage". A SLASH-based site directly about i18n issues would obviously have a wider whitelist.
If the problem was unicode's direction control characters, why not just blacklist those few control chars?
Because we don't know what additional control characters Unicode Consortium will define in the future. Also because Slashdot admins want to discourage, say, ASCII art made out of Japanese characters.
Instead we now have a whitelist so ridiculously small, it's useless.
The success of Slashdot shows that the character whitelist on Slashdot is useful for everything but talking about i18n.
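For what the blacklist side of this argument could look like, here is a sketch that filters by Unicode general category instead of listing individual code points. This is not taken from the SLASH source. The function name is made up, and it only knows about whatever Unicode version the running Python ships with, which is exactly the objection about future control characters.

```python
import unicodedata

def strip_format_chars(text: str) -> str:
    """Drop characters in category Cf (format controls such as bidi overrides)."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")

tricky = "abc\u202edef"  # contains U+202E RIGHT-TO-LEFT OVERRIDE
print(repr(strip_format_chars(tricky)))
print(repr(strip_format_chars("caf\u00e9")))  # ordinary accented text is kept
```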
h
t
t
p
:
/
/
w
w
w
.
Your computer can almost certainly display Chinese and support the same text input methods that the Chinese do. Your browser, if it's a recent version, already implements Punycode. And nobody's stopping you from learning Chinese, you know. Or from hiring people who know it to browse the web for you and help you deal with Chinese-language sites with Chinese-character URLs.
In fact, what all these standards are doing is to make it possible for you to access the same websites as everybody else, that they're going to write in a foreign language anyway. It's not like they need your permission to use their language, you know.
Are you adequate?
You know, dhermann makes a really good point. I think we should hold off on any further changes to the internet or the web, so that he can continue shirking his duties at work. Why should he be inconvenienced, just so that all these barbarians with their crazy moon languages can have domain names that make sense to them? The audacity of these people!
What a dumb asshole comment to make. Grr.
How exactly do you think you'll be able to type in a URL in mandarin or russian on west european keyboard?
You enable Chinese keyboard layout (dunno what's it called), and type it. The letters printed on the keys of your keyboard aren't some sort of magic that lets your computer input languages written in them, you know.
I don't have any keyboards with Russian characters on them, but I happily type in Russian regardless
I'm happy you'll do this. I won't, and the majority of the internet users won't either. It'll just further separate nations, because I won't go through the hassle of typing in a foreign character domain name - it'll just be a site I won't visit.
-- If we don't stand up for our rights, now, there will be no right to stand up for them later.
I'm happy you'll do this. I won't, and the majority of the internet users won't either. It'll just further separate nations, because I won't go through the hassle of typing in a foreign character domain name - it'll just be a site I won't visit.
Presumably, if a site is designed to be visited by someone who only understands English, it will use an English TLD. If it uses TLD with national characters, then most likely the content is in the language other than English as well, and you'd need to have means to input that language to fully interact with the site anyway.
I used to look here (http://www.ietf.org/rfc/rfc1738.txt) for this kind of thing...
Yup, I see jackshit.
Here is a demonstration of how non-latin characters really show here:
New Economic Perspectives
What about them? The URL is still encoded exactly the same; it's just displayed differently in the browser. Of course that would require that LTR and RTL characters aren't combined but someone already suggested that domain names should only be allowed to contain characters from one alphabet (more or less; for example there are special Unicode code points for Latin letters used in Japanese text).
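Direction is a property of each character and not of the stored string, which can be seen with a short sketch. The helper names are invented, and the rule it checks (no strong left-to-right and right-to-left letters in one label) is a simplification of the idea above, not the actual IDNA bidi rule.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}  # strong right-to-left classes (Hebrew, Arabic letters)

def is_rtl_label(label: str) -> bool:
    """True if any character has a strong right-to-left class."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in label)

def mixes_directions(label: str) -> bool:
    """True if a label combines strong LTR and strong RTL characters."""
    classes = {unicodedata.bidirectional(ch) for ch in label}
    return bool(classes & RTL_CLASSES) and "L" in classes

hebrew = "\u05d0\u05d1"   # alef, bet
mixed = "abc\u05d0"

print(is_rtl_label(hebrew), mixes_directions(hebrew))
print(is_rtl_label(mixed), mixes_directions(mixed))
print(is_rtl_label("abc"), mixes_directions("abc"))
```

The code points sit in memory in typing order either way; only the renderer decides which end of the label appears on the right.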
USE HOT GRITS WITH STATUE OF NATALIE PORTMAN (NAKED AND PETRIFIED)
Seriously though, it is nice to have a lowest common denominator in characters, so that everyone can type every address on the Internet.
Um...Why? Do you know every address on the internet? There are already web sites in other languages that you don't know about what is the problem if their address is in some other language? Does it really bother you?
Initially I thought this was cool. But then I started thinking about this and I realized that all this is going to do is fragment the internet. The existing system ensured a convenient standard that anyone could access. How the hell are non-Chinese, for example, ever going to figure out how to type a Chinese address? Unless someone provides you with an address it's not likely you'll ever figure it out.
Even being able to speak Chinese this would be a challenge for me. I expect even Chinese natives are going to have a hard time with this. I could tell someone my web address, but then I also have to explain which character I mean because there might be multiple characters for that particular phonetic. And let's not get into all the languages out there with their own unique writing systems.
The fact is that certain languages aren't quite as conducive to use with computers as others. In many cases it's probably just that nobody has made the effort to optimize input devices and system interfaces. But then when you do that you also alienate the rest of the world. It's entirely possible most foreigners wouldn't ever end up on these sites anyway but I don't like this fragmentation by what I see as dumping a standard. Technology will eventually reach a point when this is not an issue, but we're not there yet. I really don't see what was wrong with the Latin alphabet and Arabic numerals. Every computer in the world supports this by default so how exactly does this move enhance accessibility?
...nobody in the world speaks Chinese and Arabic? Last time I checked, they're not on the same keyboard. There's a multitude of languages that aren't on the same keyboard. Chinese keyboards don't always even have English on them.
"the majority of the internet users won't either."
Sorry, but that sounds like typical American ethnocentricity. The MAJORITY of internet users actually are people who don't natively speak English. Chinese speakers, Russian speakers, European people, many of whom use Cyrillic alphabets, Arabs, South Americans, Indians, and others that I'm surely missing.
How can you possibly speak for "the majority of internet users", when people who speak English as their native language constitute a pretty small percentage of the world's people? I could google, but I'm almost willing to bet that more people on this earth grow up speaking Chinese, than people who grow up speaking English as their first language.
If a guy is more comfortable using his own language, I'm all for him doing so.
"Windows is like the faint smell of piss in a subway: it's there, and there's nothing you can do about it." - Charlie Br
You enable Chinese keyboard layout (dunno what's it called), and type it.
You forgot the part where you strap on around 40,000 extra keys to cover all the different pictograms. Typing Mandarin, Cantonese and other Chinese dialects (or at least one way a colleague showed me) involves using latin characters to spell the start of the pictogram sound, then selecting from sub-menus the actual word or part word you want.
You can advertise in this sig from as little as £99.99 a month!
Not a chance spic.
I agree! Therefore I suggest that all tlds and the whole domain for that matter be binary, 1s and 0s only. In preparation for my obviously superior domain scheme being implemented I am registering 01110011 01101100 01100001 01110011 01101000 01100100 01101111 01110100 00101110 01101111 01110010 01100111 today just to be an a$$ so there!
Oh, man, how will us westerners get our fill of burkini babe pics?
IMHO, the worst decision ever made in the history of the internet was ICANN's decision to allow non-ASCII subdomains of .com, .net, and .org. Those three gTLDs, if not others as well, should be forever off-limits to any characters besides the original 26 letters a..z, the digits 0..9, and hyphen.
For other TLDs, and for national TLDs, DNS should be extended to allow a TLD's authoritative top-level registrar to authoritatively indicate which UTF8 characters (or range(s) of characters) beyond those historically-allowed for ASCII DNS are valid for its subdomains. The registrar for Spain might decide it needs accented vowels and tilde+n, but has no reason for Turkish vowels that conveniently (for phishers) look identical to ASCII characters.
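A sketch of what such a per-TLD policy might look like on the checking side. The table contents and names are entirely hypothetical and are not any registry's real policy. The point is only that each TLD adds its own extras on top of the historical ASCII set.

```python
import string

# Historical baseline allowed everywhere.
BASE_CHARS = set(string.ascii_lowercase + string.digits + "-")

# Hypothetical per-TLD additions (a-acute, e-acute, i-acute, o-acute,
# u-acute, u-diaeresis and n-tilde for .es; nothing extra for .com).
EXTRA_BY_TLD = {
    "es": set("\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc\u00f1"),
    "com": set(),
}

def label_allowed(label: str, tld: str) -> bool:
    """Check a label against the baseline plus the TLD's declared extras."""
    allowed = BASE_CHARS | EXTRA_BY_TLD.get(tld, set())
    return all(ch in allowed for ch in label.lower())

print(label_allowed("espa\u00f1a", "es"))   # n-tilde is declared for .es
print(label_allowed("espa\u00f1a", "com"))  # but not for .com
print(label_allowed("s\u0131te", "es"))     # Turkish dotless i is rejected
```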
The next step would have been the creation of a few brand new TLDs, like ".Zhong" (U+4E2D) and ".Nihongo" (U+65E5 + U+672C) -- think ".(Chinese)" and ".(Japanese)", not to mention similar TLDs for Hindi, Korean, Russian, and other languages that use non-Roman alphabets.
The point is, ICANN could have done a much, much better job with this whole mess. I think everyone can agree that international domain names are something that needs to exist, but trying to staple them onto .com/.net/.org was an incredibly bad idea.
Thank you, Captain Obvious, I had no idea!
Since you missed the sarcasm in my previous post, I assume you'll miss the sarcasm in this one as well. As such I will translate:
"Yes, I am aware of that, as is everyone involved in the thread."
before posting any more comments to this story be sure you understand what these terms mean:
* TLD = top level domain
* punycode
<grumble>...used to be news for nerds...</grumble>
I suppose I should go read the friendly A and see if ICANN has already specified all the native TLDs allowed as equivalents for country codes, and probably for .com, .org, .net, .mil, .edu, .etc., and mapped them to the equivalents.
Somehow, I doubt it.
Computer memory is just fancy paper, CPUs just fancy pens with fancy erasers; the 'net is just a fancy backyard fence.
It may be that I am not directly affected in some cases, but I'm pretty sure I'm going to hit a wall sometime trying to figure out whether the uri in some cryptographic siggy is valid or not.
So, you're telling me that there will be no documents I need to read on the website ".."?
In my case, both my keyboard and my eyeballs have no problem with the characters (unlike the slashdot software).
It is true that I could probably dig out as many as I could find of the relevant (English language) pages on the Japanese government's tax office websites and send them to my sister, were I to ask her to help me with my taxes, but even that is not always an available option.
To say nothing of the potential need to verify a uri or url written native.
We at least need to be able to map the TLDs to something more or less commonly legible.
So, you're planning to learn Chinese as your second or third language?
So you can get to all the important (by majority reasoning) websites?
Yeah, I know I'm being obtuse. There's a reason. Majority has nothing to do with the argument, on either side.
Well, for the problems with strange variations of .com, .org, .etc., don't forget that they are opening up the whole TLD space.
Ever heard of wubi, cangjie or daiyi? There are plenty of stroke-based input methods where keys are assigned groups of strokes and you compose the character that way - no Latin involved. Then there's the zhuyin/bopomofo phonetic input used in Taiwan, which uses Chinese phonetics. Once again, no Latin involved.
I know what you mean, but I suspect it won't make much difference.
Most of us find new sites either by a search engine, which is only going to look for sites with content in the language we are using to search (and mostly ignoring the domain name), or by a link from somewhere, in which case it won't matter at all. The only case that would matter is where links are printed so that you have to type them out again (or occasionally, the moronic designers who put them in graphic images).
For every expert, there is an equal and opposite expert. - Arthur C. Clarke
Then there are the ones written vertically, like Japanese and Chinese - yikes! :-)
Actually, the national specs for how Japanese and Chinese are written include horizontal left-to-right as one of the two standard layouts. True, both were primarily written vertically (starting at the upper right), and there are still publications that do that. But the European horizontal printing convention has long since been decreed legal and standard in all the countries that use Chinese characters, and it's widely used.
It's no big deal, actually. Consider that it's not unusual to see English written vertically, mostly on signs hanging above the sidewalk in front of buildings. Few English-speaking people have any trouble reading those signs. Why would you think that the Japanese or Chinese would have any trouble with their language written horizontally?
(Well, OK; the Chinese used to also write horizontally from right to left. But you mostly only see that in museums and a few historic buildings nowadays, plus the equivalent of "Ye Olde ____ Shoppe" signs. ;-)
Those who do study history are doomed to stand helplessly by while everyone else repeats it.
If you ask Google about "most widely spoken languages", you can find a number of good articles on the topic. Currently the first hit is http://www2.ignatius.edu/faculty/turner/languages.htm, which gives a number of rankings of the top languages, depending on just how you phrase the question. They point out that the number of native speakers isn't necessarily the best way to judge the importance of a language. By that simple measure, Mandarin is the top language. But it isn't used much outside of east Asia. English, French and Spanish have fewer native speakers, but are more important in most of the world, for a number of reasons.
Anyway, you can learn a lot of interesting stuff about the topic by reading a few of the results of the above Google search. It's a lot more complex than you might think, especially if you live in one of the parts of the world (e.g. the US) where most everyone speaks the same language.
ICANN haz UTF-8 domainz?
The landgrab is going to be a mess, especially in places like China. Here domains use romanization to represent domain names, but each pinyin (romanized) syllable maps to sometimes hundreds of different Chinese characters... there will definitely be a lot of jostling.
LS
It was a joke. :-) Although, using Kanji or Han-like characters might be problematic as some (many, most?) can mean entire words or concepts and are context dependent... Perhaps they'll use Romaji-like characters instead.
It must have been something you assimilated. . . .
I live in Taiwan, and I have no idea what you are talking about. "Chinese" keyboards are nothing more than your standard keyboards. They have all the same keys and everything. All you have is a different typing system. Why would their keyboards not have English? How do they use the internet right now? Most people who use simplified Chinese use roman pinyin for character entry.
I use Mandarin Phonetic Symbols for character entry. From Windows I just add that keyboard from the regional settings and I can type. In Linux I use SCIM and I can type. All on the same keyboard. I also type in Dvorak when I type English, guess what, it uses the same keyboard. I bought a keyboard in Taiwan, and guess what, I can type in both Chinese AND English. I know not of a language that exists that uses a keyboard different than our standard 104 or 108 key keyboard. It's all about the keyboard mapping. I don't know how this thread got so long without someone realizing everyone here is just a bunch of idiots.
/end rant
To address a real problem:
What if I am on a computer that I cannot add the correct language keymapping on to visit the site? That is the problem that I see. It is a software, not a hardware issue.
If the webmaster wants people of other languages to access the site he'll also register a Latin domain name. If he doesn't care, it's mainly his problem as he loses audience and possibly money.
Second: services to map non-Latin domain names into Latin ones will appear, similar to URL-shortener services. That will solve the problem for most of us.
But also think about this: Chinese-language sites have Latin domain names and Chinese-speaking users typing on Chinese keyboards now. They'll be able to let their main users type URLs in their native language. That's definitely a good thing.
I'm going to just throw it out there, but seriously, why should an American web site care about the rest of the world? Honestly, I could put a big filter on any domain that has any non-8 bit ASCII character out there and I would be utterly happy. While it might be nice to talk to the rest of the world, it's not worth the extra byte for unicode, and it's certainly not worth f---ing up polymorphism between strings and vectors just so we can have dumbass umlauts and other crap in our text.
Call me a flamebait, but seriously, for consumers, if you carved up the whole internet into 8 bit character fiefdoms, and had just the Asians deal with utf-16 or even utf-32, then, wouldn't that just actually be smarter for end users? Sure, monolithic corporations might balk at the cost of this, but why should I need to give the likes of Exxon a goddamn doubling of all of my strings just to make it easier for them to do world wide operations?
I'm in favor of ASCII, that's what I'm saying.
This is my sig.
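On the "extra byte" and "doubling of all of my strings" worry, the cost depends on which encoding is used. A quick sketch of the byte counts is below; the `encoded_sizes` helper and the sample strings are assumptions made up for illustration, and the only thing it relies on is Python's standard `str.encode`.

```python
# Sketch: how many bytes the same text takes in three Unicode encodings.
# The "-le" variants are used so no byte-order mark is added to the count.

def encoded_sizes(text: str) -> dict:
    """Return the encoded length in bytes for UTF-8, UTF-16 and UTF-32."""
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


# Illustrative samples: plain ASCII, Latin with umlauts, and Cyrillic.
samples = ["slashdot.org", "micr\u00f6s\u00f6ft.c\u00f6m", "пример.рф"]

for sample in samples:
    print(sample, len(sample), encoded_sizes(sample))
```

Pure ASCII text comes out the same size in UTF-8 as in ASCII; the doubling only shows up if everything is stored as UTF-16, and only the non-ASCII characters cost extra bytes in UTF-8.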
Ok, I should have put "one of the ways of inputting Chinese characters is..." as obviously there are others. My point was it's not just as simple as changing from UK input to French so you can do all those little curly things under your c's!
You can advertise in this sig from as little as £99.99 a month!
Although we already have non-Latin domain names, these were rather inconvenient for languages that are based on alphabets that are not derived from the Latin alphabet. E.g., if you wanted a Greek domain, you'd get something like <bunch_of_Greek_characters>.gr, which requires keyboard switching to type, and is more inconvenient than simply typing an all-Latin name. Turning that "gr" into Greek may actually make browsing Greek sites easier, as the keyboard may be left permanently switched to Greek, while typing.
The same goes for Cyrillic, Arabic, Chinese, etc.
It goes without saying that there is a lot of money to be made here. Not only are non-English web sites now going to have an incentive to actually register non-Latin domain names, they're still going to keep renewing the old Latin domain names as well, so that the sites remain accessible from the English-speaking parts of the world.