International URLs Pass First Test

Great by otacon · 2007-03-13 03:22 · Score: 5, Funny

now I have to learn second languages to look at asian porn.

--
In a world of acronyms, the words are the real victims.

Re:Great by vivaoporto · 2007-03-13 03:25 · Score: 4, Funny

I watch porn to learn foreign languages, you insensitive clod!
Re:Great by Lenneth-chan · 2007-03-13 03:28 · Score: 3, Funny

I'm pretty sure most porn sounds are the same in any language.
Re:Great by mosburn · 2007-03-13 03:32 · Score: 2, Insightful

That's what the cheesy lines at the beginning are for, typical pizza boy/plumber/etc. to get you going as your intro to a new language.
Re:Great by Anonymous Coward · 2007-03-13 03:58 · Score: 0, Insightful

I'm pretty sure most porn sounds are the same in any language.

Nuh, Japanese porn girls sound like they are crying and/or in agony when they are having sex. Quite a turn off unfortunately.
Re:Great by Anonymous Coward · 2007-03-13 04:02 · Score: 1, Funny

Quite a turn off unfortunately.

It is? You vanilla people are weird.
Re:Great by Anonymous Coward · 2007-03-13 04:12 · Score: 0

Vanilla people?
Re:Great by omeomi · 2007-03-13 04:18 · Score: 1

Yeah, that'll be just great. "Oops! I mistyped the address...I forgot the e was supposed to be #DD44AA, while the s and the x are supposed to be #FF22BB".

--
ZuluPad, the wiki notepad on crack
Re:Great by Anonymous Coward · 2007-03-13 04:28 · Score: 0

As opposed to Artichoke people or Chocolate people or Grape people, I guess.
Re:Great by nacturation · 2007-03-13 04:51 · Score: 2, Funny

Vanilla people? Vanilla typically refers to the deep brown extract from the vanilla bean, which is itself nearly black once fully dried and cured. So the original poster is clearly referring to African tribesmen.

--
Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.
Re:Great by Hoi+Polloi · 2007-03-13 05:09 · Score: 1

If there is one thing Asian porn has taught me, it is that Asian people have pixellated genitals.

--
It is by the juice of the coffee bean that thoughts acquire speed, the teeth acquire stains. The stains become a warning
Re:Great by Anonymous Coward · 2007-03-13 05:33 · Score: 0

bs - vanilla typically applies to ice cream
Re:Great by dunkelfalke · 2007-03-13 06:07 · Score: 2, Funny

http://www.youtube.com/watch?v=cfwNACNxDNA

--
Conservatism: The fear that somewhere, somehow, someone you think is your inferior is being treated as your equal.
Re:Great by Anonymous Coward · 2007-03-13 06:46 · Score: 0

I'm pretty sure most porn sounds are the same in any language.
As a matter of fact the sound Japanese catgirls make is more of a "Nyan, nyan!" than a "Meeeow!"

Happy to help.
Re:Great by don_bear_wilkinson · 2007-03-13 08:54 · Score: 1

In this context, "vanilla" refers to people who are not interested in/involved with/ experienced with BDSM. BDSM = Bondage & Discipline, Dominance and Submission and Sadomasochism. 'Kinky shit'. :)
The prior was insinuating that sounds akin to crying would be an erotic thing for some people.

--
In Nature, stupidity is a capital offense. In human society, too many get off with less than a warning.
Re:Great by Anonymous Coward · 2007-03-13 10:23 · Score: 0

Non-ASCII? This is awesome! I can't wait for the ANSI addresses to start showing up.

Or, god forbid, RIP.

Actually, come to think of it, that might not be such a bad idea... graphical URLs...
Re:Great by laparel · 2007-03-13 18:48 · Score: 1

I thought vanilla people refers to white rappers...

Phishing just got a lot more interesting by L.+VeGas · 2007-03-13 03:25 · Score: 4, Funny

Imagine all the new ways to spell bank0famerlca.com.

--
Best Windows Freeware

Re:Phishing just got a lot more interesting by slart42 · 2007-03-13 03:39 · Score: 4, Informative

>Imagine all the new ways to spell bank0famerlca.com.

This is already happening. A common example is the cyrillic lower case "?", which looks almost exactly like the latin "a" in most fonts.

See http://en.wikipedia.org/wiki/IDN_homograph_attack for more information.
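
A minimal sketch of why this works, assuming the Cyrillic letter meant here is U+0430 (CYRILLIC SMALL LETTER A). The helper name `describe` and the `paypal.com` strings are only illustrations, not taken from a real attack:

```python
import unicodedata

latin_a = "a"            # U+0061
cyrillic_a = "\u0430"    # renders almost identically in most fonts

def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

print(describe(latin_a))
print(describe(cyrillic_a))

# The two domain strings look alike on screen but are different strings,
# so they can be registered as different domains.
genuine = "paypal.com"
spoofed = "p\u0430ypal.com"
print(genuine == spoofed)  # False
```

The comparison is the whole point: software sees two unrelated names, while a person reading the address bar sees one.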
Re:Phishing just got a lot more interesting by colfer · 2007-03-13 03:42 · Score: 3, Informative

Preventing that has been part of Mozilla's IDN implementation, and I assume other browsers have addressed (ha) it as well. If a TLD, like .ie, Ireland, has a policy against phishing, and a table of lookalike letters, then Firefox will present the IDN address in the address bar in its own, non-English, language. Otherwise, Firefox displays the address in its IDN-encoded form, which is all ASCII. AFAIK, from reading bug reports on Mozilla, this is already in force.
Re:Phishing just got a lot more interesting by colfer · 2007-03-13 03:50 · Score: 2, Informative

Here are the references on IDN puny-code spoofing prevention settings in Mozilla:
http://kb.mozillazine.org/Network.IDN.blacklist_chars
http://kb.mozillazine.org/Network.IDN.whitelist.*
http://kb.mozillazine.org/Network.enableIDN
http://kb.mozillazine.org/Network.IDN_show_punycode
For example, .jp (Japan) is whitelisted but .ie (Ireland) is not. There was a debate between people that wanted to disable or hobble IDN/puny-code, for security, and people who wanted to internationalize Mozilla completely. The resulting blacklist/whitelist and configurability was a compromise.
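
The whitelist behaviour described above can be sketched roughly like this. It is not Mozilla's code: `TRUSTED_TLDS` and `display_form` are hypothetical names, the whitelist holds only the one TLD mentioned in this comment, and Python's built-in `idna` codec stands in for the browser's puny-code conversion:

```python
# Hypothetical whitelist; this comment only says .jp is on it and .ie is not.
TRUSTED_TLDS = {"jp"}

def display_form(hostname: str) -> str:
    """Return what the address bar would show for an IDN hostname."""
    # The built-in codec converts each label to its all-ASCII "xn--" form.
    ascii_form = hostname.encode("idna").decode("ascii")
    tld = ascii_form.rsplit(".", 1)[-1]
    # Whitelisted TLD: show the native spelling. Otherwise: show puny-code.
    return hostname if tld in TRUSTED_TLDS else ascii_form

print(display_form("b\u00fccher.jp"))  # native spelling kept
print(display_form("b\u00fccher.ie"))  # xn--bcher-kva.ie
```

Showing the `xn--` form for untrusted TLDs makes a lookalike name obviously odd instead of obviously familiar.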
Re:Phishing just got a lot more interesting by Alky_A · 2007-03-13 04:22 · Score: 1

I'd hope the first update all the browsers get is a 'language' label next to the address bar showing the language all the characters in the URL are from. URLs with multiple languages would then get a big flashing red bar screaming "OMG PHISH PHISH."
Re:Phishing just got a lot more interesting by Yetihehe · 2007-03-13 04:37 · Score: 1

It's not going to work. Would you visit the address bkófmrika.com? (Caution: Polish letters; if you do not have an appropriate font, you will just see a question mark, "?".)

--
Extreme Programming - Redundant Array of Inexpensive Developers
Re:Phishing just got a lot more interesting by drsquare · 2007-03-13 05:00 · Score: 4, Funny

Rubbish. Since when does a question mark look like an 'a'?
Re:Phishing just got a lot more interesting by 0xABADC0DA · 2007-03-13 05:05 · Score: 1

Informing the user that some URL might be something different from what they thought it was supposed to be so that they can hopefully recognize fakes is just plain stupid. It relies on people interpreting information and people make mistakes, so a phisher just keeps trying until somebody messes up. Plus, it presumes excellent eyesight... some people can't really tell an 'o' from a 'e' in a normal English font, either from bad eyesight or bad monitor or both. Some people can't tell a 'b' from a 'd' due to dyslexia. But I guess it's okay for those people to get their information stolen??

The solution is to not have users type in their information all the time.

Client certificates. When you register your account the first time with some site you get a certificate that your browser has to use each time you visit the site, or you can't get in without say actually calling the business to get one. The user never types in their login / password again (certificate contains their name or id number). Now user goes to a site and it says 'dude we need your password again' when it hasn't asked for that in 5 years and they get suspicious or better yet their password expired 10 days after they received it and they *can't* give the phisher access.

The only change for this to happen is for sites to actually use client certificates with SSL and for the browser and other software (hotsync, etc) to make this easier and... problem solved.
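
For what it's worth, the server half of that can be sketched with Python's standard `ssl` module. This is only an assumption about how a site might set it up; `make_server_context` and `ca_file` are made-up names, and issuing the certificates to users is not shown:

```python
import ssl
from typing import Optional

def make_server_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a TLS server context that refuses clients without a certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    # CERT_REQUIRED makes the handshake fail unless the client presents
    # a certificate that chains to a trusted CA.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file is not None:
        # CA that signed the certificates handed out at registration.
        ctx.load_verify_locations(cafile=ca_file)
    return ctx

ctx = make_server_context()
print(ctx.verify_mode)
```

A real server would also load its own certificate and key before accepting connections.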
Re:Phishing just got a lot more interesting by operagost · 2007-03-13 05:13 · Score: 1

That's why the phisher will register a domain with only one or two changed letters.

--

Gamingmuseum.com: Give your 3D accelerator a rest.
Re:Phishing just got a lot more interesting by VWJedi · 2007-03-13 05:21 · Score: 1
Client certificates. When you register your account the first time with some site you get a certificate that your browser has to use each time you visit the site, or you can't get in without say actually calling the business to get one.

Your system breaks down in a couple different scenarios:
- I don't have access to my certificate because I don't have it with me. (e.g. I usually contact the site from my laptop and I don't have it with me, the battery is dead, or the hard drive crashed.)
- I access the site on a shared computer. Either I need to remove the certificate after every use (and call the business to get a new one for next time), or someone else will have access to it.
I realize there are work-arounds to these, but it's starting to sound less than "user-friendly".
Re:Phishing just got a lot more interesting by Sukh · 2007-03-13 05:26 · Score: 1

Since Slashdot chews up non-ASCII characters?? I think he meant '' which is U+0430!
Re:Phishing just got a lot more interesting by SnowZero · 2007-03-13 06:35 · Score: 1

True, but I think the more likely spoof would be something like "bankófamerica.com". If the history of phishing is a guide, that will net some users. I don't know if they are already considering such a thing, but limiting each component in the URL to a single language character set might be a good idea.

To me, the best approach would be to limit the character sets that can appear below any given TLD. So, having simplified Chinese under ".ch" would be fine, but not under ".com" or ".us" -- The idea being that local users should be able to tell the difference. That's because the biggest problem, as I see it, lies when people get exposed to characters they aren't used to; Our brains tend to map unfamiliar symbols to the ones we know.
Re:Phishing just got a lot more interesting by tokul · 2007-03-13 07:09 · Score: 1

Rubbish. Since when does a question mark look like an 'a'?
Slashdot uses ISO-8859-1. html entities are not supported. You can't write Cyrillic characters on slashdot.org.
Re:Phishing just got a lot more interesting by chihowa · 2007-03-13 08:52 · Score: 4, Funny

Rubbish. Since when does a question mark look like an 'a'?

Didn't you even read the post? When it's lowercase. Duh.

--
If you want a vision of the future, imagine a youtube comments section scrolling - forever.
Re:Phishing just got a lot more interesting by thelamecamel · 2007-03-13 20:15 · Score: 1

But '/' looks nothing like 'a'...

Great by Azathfeld · 2007-03-13 03:25 · Score: 1

Non-ASCII? This is awesome! I can't wait for the ANSI addresses to start showing up.

Dibs! by truthsearch · 2007-03-13 03:26 · Score: 3, Funny

I got dibs on sêx.com!

--
Developers: We can use your help.

Re:Dibs! by kimba · 2007-03-13 03:54 · Score: 2, Informative

I got dibs on sêx.com!

Umm, you do realise this was registered in 2005? Such domains already exist and can be registered today.

The technical test is about having Internationalised Domain Names at the top-level, or root, of the DNS. So then you can have .sêx rather than .sex.
Re:Dibs! by VWJedi · 2007-03-13 05:27 · Score: 5, Funny

The technical test is about having Internationalised Domain Names at the top-level, or root, of the DNS. So then you can have .sêx rather than .sex.

So we could theoretically have sex at any level... but this is slashdot, so it's not likely to happen for anyone around here.

Of little use by Anonymous Coward · 2007-03-13 03:26 · Score: 0

I don't see this as being very popular. Does the average Internet user know how to get an umlaut to display?

All it's going to do is open the door for more domains for the squatters to sit on.

Re:Of little use by Anonymous Coward · 2007-03-13 03:31 · Score: 2, Interesting

I would bet the average German Internet user knows how to do that. It's pretty easy when the key is on your keyboard: http://carbon.cudenver.edu/~tphillip/GermanKeyboardLayout.html
Re:Of little use by TheThiefMaster · 2007-03-13 03:35 · Score: 1

And this is mostly for countries that don't use the same characters as English (Latin alphabet?), like Japan and China.
Re:Of little use by TheRaven64 · 2007-03-13 04:09 · Score: 1

Your average Mac user does too, since it's just option-u, followed by the letter. It was similarly easy on the Psion Series 3, but it seems harder on some other operating systems.

--
I am TheRaven on Soylent News
Re:Of little use by pclminion · 2007-03-13 05:35 · Score: 1

I don't see this as being very popular. Does the average Internet user know how to get an umlaut to display?

Yeah. All those people of the world who speak languages that use those characters have no clue how to actually type them in. Are you freaking stupid?
Re:Of little use by Anonymous Coward · 2007-03-13 06:16 · Score: 0

option-u? Who was the genius that came up with that? It should have been option-e or option-"
Re:Of little use by TheRaven64 · 2007-03-13 08:04 · Score: 1

It is the key of the vowel you are most likely to use the umlaut with. Typing ü involves holding option, tapping u, releasing option, and tapping u again. Option-e does the same thing for an accent like this é.

--
I am TheRaven on Soylent News
Re:Of little use by Anonymous Coward · 2007-03-13 08:17 · Score: 0

No, but apparently you are. Not every country has the same special characters, making these very nation-specific. Add to the fact that unless you have a special keyboard, entering these characters is annoying, at best.
Re:Of little use by fbjon · 2007-03-13 10:50 · Score: 1

Not every country has the same special characters, making these very nation-specific. Add to the fact that unless you have a special keyboard, entering these characters is annoying, at best.
Did you know that it's possible to embed links to other pages in HTML? It's actually fairly popular these days!

--
True confidence comes not from realising you are as good as your peers, but that your peers are as bad as you are.
Re:Of little use by Kirth+Gersen · 2007-03-13 15:36 · Score: 1

Well, the people who speak that language can type in the url, but probably nobody else in the world can. (It would probably lead to all foreign sites having to set up one server with an English domain name and one with a local domain name.) Heck, I'm not even positive I can type in an "accent grave" in this Slashdot message and get it to actually work. As for things like Khmer Unicode, which is poorly supported even inside Cambodia, lotsa luck.

Also, a lot of people don't know how to set up a computer for their own language, so they're stuck if they need to use a computer in a foreign country. (Even if the computer doesn't need admin rights to change the setup.)

Phishing by Alioth · 2007-03-13 03:27 · Score: 1, Redundant

In my skim through the various links, I didn't see what they are proposing to do for practical real-world problems such as phishing. What are they going to do to ensure that a phisher doesn't register a domain with characters that look almost indistinguishable from different characters in a different language, so as to trick users into visiting the phisher's site instead of the legitimate version of the site?

--
Oolite: Elite-like game. For Mac, Linux and Windows

Re:Phishing by Billosaur · 2007-03-13 03:36 · Score: 1

They'll do the same as is done right now: very little. If you're a company in this day-and-age, you have to register as many variants of your name as you can to ensure that phishers/domain squatters don't get undue traffic from your name. On the other hand, phishers don't necessarily need domain names that are close to their target domain; people don't generally read URLs that closely, just clicking on links they are sent. That's why phishing is still effective despite all the negative publicity.

--
GetOuttaMySpace - The Anti-Social Network
Re:Phishing by evought · 2007-03-13 03:50 · Score: 2, Informative

This has actually been discussed to some extent for years. One method is to only allow domains to be registered or displayed in a single language character set, such that a domain name can use latin characters or greek characters, but not both. This can be enforced at registration or when displayed in the browser (the browser can highlight improper URLs). This does not prevent attacks where the entire spelling of the domain is available in an alternate character set. One solution is for the browser to somehow tell the user what language a URL is written in.

Here is a detailed description of how IE handles this, and also a w3c page discussing general techniques and different browsers. An interesting note is the possible use of the fraction slash to add fake urls to a domain name. Of course, at the end of the day, standard phishing protection applies to domains which slip through the net.
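
The single-character-set rule can be approximated in a few lines. This is a crude sketch, not what any registry or browser actually runs: Python's standard library has no script property, so the first word of each character's Unicode name is used as a stand-in, and `scripts_in` and `is_single_script` are illustrative names:

```python
import unicodedata

def scripts_in(label: str) -> set:
    """Approximate the scripts used by the letters in a domain label."""
    found = set()
    for ch in label:
        if ch.isalpha():
            # e.g. "CYRILLIC SMALL LETTER A" -> "CYRILLIC"
            found.add(unicodedata.name(ch).split()[0])
    return found

def is_single_script(label: str) -> bool:
    return len(scripts_in(label)) <= 1

print(is_single_script("paypal"))            # True
print(is_single_script("p\u0430ypal"))       # False: Latin plus Cyrillic
print(is_single_script("\u0430\u0441\u0435"))  # True: all Cyrillic
```

The last line is the weakness already noted: a name spelled entirely in one lookalike character set passes the check. The name-prefix trick would also wrongly flag ordinary Japanese, which mixes several scripts.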
Re:Phishing by LaurieDash · 2007-03-13 04:50 · Score: 1

It's still the powers that be that create the top level domains (.com, .co.uk, etc).

If you don't allow non-ASCII domain names under the current top-level domains, and never create new ones that look similar in non-ASCII, then the problem is solved.

I don't see anyone currently rushing to have .c0m as a top-level domain?
Re:Phishing by insanecarbonbasedlif · 2007-03-13 11:47 · Score: 1

A shorter digest of the info evought provided is here.

--
Just because I doubt myself does not mean I find your position compelling.

Maybe not.. by KeepQuiet · 2007-03-13 03:29 · Score: 1

While browsers can't even properly show non-English alphabets, this doesn't seem to be a good idea. My native language contains many special characters and I usually end up deciphering the emails sent by mom to me, because along the way, servers replace these characters with funny things.

Re:Maybe not.. by LighterShadeOfBlack · 2007-03-13 03:38 · Score: 3, Insightful

While browsers can't even properly show non-English alphabets, this doesn't seem to be a good idea. My native language contains many special characters and I usually end up deciphering the emails sent by mom to me, because along the way, servers replace these characters with funny things.

Well, is it the browsers or the servers that are the issue? AFAIK any modern browser fully supports Unicode and any other encodings so there shouldn't be an issue there. If the servers are the problem then either it's the protocol that needs updating/replacing (I don't know nearly enough about SMTP, IMAP4, or POP3 protocols to comment) or the servers themselves are non-compliant. If there's a problem it should definitely be fixed, but you really need to know what the problem is first.

--
Spelling mistakes, grammatical errors, and stupid comments are intentional.
Re:Maybe not.. by walt-sjc · 2007-03-13 04:02 · Score: 1

Considering a lot of email is text, the inability to handle a character set may make it impossible for some people to email you if you have non-ascii characters in your address. Even people in your own country may have trouble. Not everyone uses the Outlook / Exchange combo...
Re:Maybe not.. by Petrushka · 2007-03-13 10:30 · Score: 2, Informative

Just about any e-mail service should enable the use of non-ascii characters. Any halfway decent e-mail client will; if you're using Thunderbird or Mail or Pegasus, just set the character set to UTF-8; I believe Pine allows UTF-8 too. (Personally I can't imagine any reason for not using UTF-8 as default; I use it all the time, even though almost all of my e-mails are in English.) Most web-interfaces allow it as well: Gmail certainly does, for example; I'm pretty sure Yahoo does.
Re:Maybe not.. by fbjon · 2007-03-13 10:53 · Score: 1

Such mail clients deserve to die already, IMHO. Not that I want to pressure anyone, but really, life and requirements move on.

--
True confidence comes not from realising you are as good as your peers, but that your peers are as bad as you are.
Re:Maybe not.. by shadow_slicer · 2007-03-14 03:56 · Score: 1

Most modern browsers can show the characters. Even IE 6 and older versions of mozilla have no problem displaying them (though older versions of mozilla may require some tweaking). The way special characters are encoded in mail was designed to be compatible with already deployed servers (with some special tags and something similar to the base64 encoding used for attachments). These servers don't see anything other than plain 7bit ASCII text, so it is unlikely it became garbled during transit. The most likely cause is either the sender is using a poorly configured mail client (that isn't setting the codepage and escaping things correctly), or you are using a poorly configured mail client (that isn't respecting the codepage specified).
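
To make the "special tags" concrete, here is a small sketch using Python's standard `email.header` module. It only shows the header form (MIME encoded-words); the sample word is arbitrary:

```python
from email.header import Header, decode_header

subject = "Gr\u00fc\u00dfe"  # contains two non-ASCII letters

# The sending client wraps the text in an ASCII-only "encoded-word"
# that names the character set it used.
encoded = Header(subject, "utf-8").encode()
print(encoded)

# The receiving client reads the tag and decodes with that character set.
raw, charset = decode_header(encoded)[0]
decoded = raw.decode(charset)
print(decoded == subject)  # True
```

Everything a relay server sees is plain 7-bit text, so garbling normally means one end skipped or ignored the character set tag.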

What about security issues? by argent · 2007-03-13 03:30 · Score: 0, Redundant

The concern I have with IDNs is that they will make it too easy to produce "lookalike" domains, like "mcrosoft.com".

Testing functionality and behaviour with "good" names is an easy bar to hurdle.

Re:What about security issues? by argent · 2007-03-13 03:33 · Score: 1

That should be "mࡺcrosoft.com". Slashdot will probably need to be upgraded to support IDNs, it seems. :)
Re:What about security issues? by Anonymous Coward · 2007-03-13 03:33 · Score: 0

Better solutions for this problem(based on hashes, caches of visited sites, or similar) are already needed. "Paypal" and "PaypaI" look exactly the same in a few fonts and are very hard to tell apart in many.
Re:What about security issues? by JanneM · 2007-03-13 03:47 · Score: 2, Insightful

Like you already have with "l", "I" and "1"; or "O" and "0"; or "V" and "U", depending on the particular font you happen to use?

Phishing attacks mostly work not because people can't see a minute difference between two lookalike letters; they work because as long as nothing is utterly, obviously, grossly out of order people just assume they're in the right place. You can have domain names that aren't even close to the real one, and websites with only superficial similarities to the original and a lot of people will still be duped.

--
Trust the Computer. The Computer is your friend.
Re:What about security issues? by 99BottlesOfBeerInMyF · 2007-03-13 03:49 · Score: 1

The concern I have with IDNs is that they will make it too easy to produce "lookalike" domains, like "mcrosoft.com".
This really seems like a pretty minor issue to me. Browsers would just need to adopt a policy of flagging URIs with mixed language character sets, highlighting that character in red or something. More dangerous is the new domain land grab as companies grab legitimate domains in other languages that natives feel the real company simply must own, but which the parent company probably does not. This can be addressed by a certificate scheme that ties identity verification to the site, however, and such a scheme really needs to be implemented on a wide scale to deal with current security problems anyway.
Re:What about security issues? by amuro98 · 2007-03-13 04:57 · Score: 1

It's worse than that, actually. Many codepages include double-byte versions of the ASCII characters that, for all intents and purposes, look IDENTICAL to your standard ASCII letters.

An example of this is Japanese's curious, and depreciated, half-width and full-width alpha-numeric characters. Both of these replicate ASCII letters using different code values. So within just Japanese alone, there are three distinct but identical-looking ways to display the letter "a" within a domain name. And other language codepages have this "feature" as well...

Short of being able to decipher raw bytes against a given encoding, you won't be able to tell where that link that says "ebay.com" will take you.
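
One way software can cope with the full-width duplicates is Unicode compatibility normalization (NFKC), which folds them back onto the plain ASCII letters. A minimal sketch with Python's standard `unicodedata` module; the variable names are just for illustration:

```python
import unicodedata

# "ebay" written with full-width letters (U+FF45, U+FF42, U+FF41, U+FF59)
fullwidth = "\uff45\uff42\uff41\uff59"

# NFKC replaces compatibility characters with their ordinary equivalents.
folded = unicodedata.normalize("NFKC", fullwidth)

print(fullwidth == "ebay")  # False: different code points
print(folded == "ebay")     # True after folding
```

This handles the width variants only. Lookalikes from genuinely different alphabets are not compatibility characters and survive NFKC unchanged.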
Re:What about security issues? by VWJedi · 2007-03-13 05:40 · Score: 0

An example of this is Japanese's curious, and depreciated, half-width and full-width alpha-numeric characters.

How do you calculate the monetary value of characters? I didn't know they could depreciate.

Or did you mean to say they are deprecated?
Re:What about security issues? by adolfojp · 2007-03-13 05:47 · Score: 1

But at least I will be able to register my last name. It is nice to see the World Wide Web becoming more... world-centric.
Re:What about security issues? by argent · 2007-03-13 05:58 · Score: 1

Like you already have with "l", "I" and "1"; or "O" and "0"; or "V" and "U", depending on the particular font you happen to use?

Indeed. This makes an existing problem much much bigger.

Phishing attacks mostly work not because people can't see a minute difference between two lookalike letters; they work because as long as nothing is utterly, obviously, grossly out of order people just assume they're in the right place.

And what people see as "obviously out of order" changes as people learn about phishing. It's like counterfeiting: when the appearance of money changes you get a period where lower quality notes can be successfully "passed", and the Treasury makes an effort to get the word out ahead of time to make sure businesses at least are familiar with the new notes.

Similarly, people can learn not to be phished. That's why you have phishers hiding the address bar, emulating the address bar, creating addresses that try and push the "root" of the name off the address bar, creating addresses like "http://microsoft.com@192.168.1.1/security", and so on.

Being able to have addresses that are visually identical but encoded differently is a real problem, and one that needs to be solved before IDNs are rolled out.
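
As an aside on the "microsoft.com@192.168.1.1" style of address quoted above: everything before the "@" is login information, not the host. A quick sketch with Python's standard `urllib.parse` shows how a URL parser reads it:

```python
from urllib.parse import urlsplit

parts = urlsplit("http://microsoft.com@192.168.1.1/security")

print(parts.username)  # microsoft.com  (just a user name)
print(parts.hostname)  # 192.168.1.1    (where the request really goes)
print(parts.path)      # /security
```

The name a reader recognises ends up in the one field that has no effect on which server is contacted.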
Re:What about security issues? by argent · 2007-03-13 06:08 · Score: 1

This really seems like a pretty minor issue to me. Browsers would just need to adopt a policy of flagging URIs with mixed language character sets, highlighting that character in red or something.

It's not a minor issue, and it's not an insoluble issue, but it's one that needs to be positively and aggressively addressed.

And it's not just browsers: you need to flag these characters in any application that renders internationalized text with or without HTML being an intermediary. Alternatively, registered domains can be restricted to distinct subranges of Unicode, so that you couldn't (for example) register a second level domain containing glyphs outside a single national character set.

The point is, this test is just verifying that an issue that nobody really thought was going to turn out to be a problem is, in fact, not a problem. It doesn't mean that widespread use of IDNs should be considered imminent.
Re:What about security issues? by Anonymous Coward · 2007-03-13 10:19 · Score: 0

As someone already pointed out, eg. the Japanese set contains the full alphanumeric alphabet so searching "glyphs outside a single national character set" would not work.

A solution would be that the domain granting organizations will have to do a non-trivial string compare, collating all possible "look-alike" characters in the whole Unicode set, to ensure the uniqueness of the URL being requested by a registrar. This could be automated so there's not really any problem with the phishing stuff, IMHO.
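
A toy sketch of that kind of compare, with heavy caveats: the `CONFUSABLES` table is a tiny hand-made sample, and `skeleton` and `collides` are hypothetical names. A real registry would need a table covering the whole Unicode set:

```python
import unicodedata

# Hand-picked lookalikes mapped to one representative character each.
CONFUSABLES = {
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic ie
    "\u043e": "o",  # Cyrillic o
    "0": "o",
    "1": "l",
    "I": "l",
    "|": "l",
}

def skeleton(name: str) -> str:
    """Reduce a name to a form where lookalike spellings compare equal."""
    folded = unicodedata.normalize("NFKC", name)  # folds full-width forms
    # Map lookalikes before lowercasing, so capital I is caught as an l.
    return "".join(CONFUSABLES.get(ch, ch) for ch in folded).lower()

def collides(candidate: str, registered) -> bool:
    """True if the candidate looks like any already-registered name."""
    target = skeleton(candidate)
    return any(target == skeleton(existing) for existing in registered)

existing = {"paypal.com"}
print(collides("p\u0430ypal.com", existing))  # True
print(collides("PaypaI.com", existing))       # True
print(collides("example.com", existing))      # False
```

The registrar would reject any request whose skeleton matches an existing registration, whichever alphabet it was typed in.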
Re:What about security issues? by argent · 2007-03-13 13:44 · Score: 1

This could be automated so there's not really any problem with the phishing stuff, IMHO.

Not if it's actually implemented.

But given some of the ratbags running domain registrars, you think they'll bother?

the Japanese set contains the full alphanumeric alphabet

There are always a few special cases. You just deal with them... for example, deny names using just those characters.

Cool new phishes by Threni · 2007-03-13 03:30 · Score: 0, Redundant

I look forward to www.paypa|.com etc etc

Great... by Anonymous Coward · 2007-03-13 03:32 · Score: 0

Slashdot; "it is what IT is" unless it isn't...

slàshdot.org
sláshdot.org
slâshdot.org
slãshdot.org
...
slashdöt.org

We should simply invade any country that doesn't use the latin alphabet and teach them English.

Re:Great... by ObsessiveMathsFreak · 2007-03-13 04:46 · Score: 1

We should simply invade any country that doesn't use the latin alphabet and teach them English.
Please don't. That's how your own country got started in the first place.

--
May the Maths Be with you!
Re:Great... by Anonymous Coward · 2007-03-13 04:50 · Score: 0

And look how she turns out! Oh wait, you were saying....?

In practice it means "national" URLs. by KokorHekkus · 2007-03-13 03:33 · Score: 1

If your company/organisation/you have any international contacts then you will NOT be using these international URLs. So you still need the old-style URLs or you'll need to explain how to get those umlauts etc to type in the url. On their national keyboard... not yours that has them. And if you've done any support you know how hard it's even to get someone to READ what's already on the screen...

Re:In practice it means "national" URLs. by leuk_he · 2007-03-13 03:39 · Score: 2, Informative

umlaut is hardly a problem if you set the keyboard to üs-ïnternätional. But asian/hebrew/arabic characters are much more difficult to enter... in my experience.

But you will still be able to click them. IDN support is available in most popular browser (although disabled for security issues.)
Re:In practice it means "national" URLs. by JanneM · 2007-03-13 03:42 · Score: 1

So you still need the old-style URLs or you'll need to explain how to get those umlauts etc to type in the url.

How often do you ever type in an URL in the first place? You get the link from another web site, from Google, in an email or wherever. And AFAIK, the fallback representation is no less readable and typeable than many current domain names.

Besides, if the website is already in the country's language, you won't be too likely to be interested in it anyway unless you know it (and, presumably, know how to type it).

--
Trust the Computer. The Computer is your friend.
Re:In practice it means "national" URLs. by kimba · 2007-03-13 03:57 · Score: 1

IDN support is available in most popular browser (although disabled for security issues.)

What browser are you referring to? IDN support is in Firefox, IE, Opera etc. and not disabled, so I am wondering what this most popular browser you are referring to is...
Re:In practice it means "national" URLs. by widhalmt · 2007-03-13 03:58 · Score: 1

So any company will do itself something good if they get the "local" and the English domain. I already have customers acting that way. Unfortunately many companies aren't even aware of this drawback. :-( I'm really curious what this switch will bring us and what problems will arise with all the little programmes which are already hard to use, like WAP browsers.

--
Feel the power of the Sun!
Re:In practice it means "national" URLs. by KokorHekkus · 2007-03-13 04:07 · Score: 1

umlaut is hardly a problem if you set the keyboard to üs-ïnternätional. But asian/hebrew/arabic characters are much more difficult to enter... in my experience.
Those who will have these "international" URLs will almost all be using their national keyboards so they will not be familiar with the US keyboard layout... or other foreign layouts. And umlauts was just one example... what about "ç" (had to paste it myself..) or "". How would they be certain how they're mapped in a foreign keyboard (not just US.. swedish, german, french etc). I think my point stands... i.e "you still need the old-style URLs" (when having international contacts).
Re:In practice it means "national" URLs. by KokorHekkus · 2007-03-13 04:25 · Score: 1

Getting the URL from a mail will be all nice and dandy if the mail comes from someone who knows the input method. But if the person who got it wants to send the link to someone else they'd need to paste it instead of typing it (I often type URLs into mails instead of pasting them). Of course you can find the URLs by other methods... but you'll just be pissing people off by making it more difficult to reach you. And there are much easier ways to do that if you want to play that game.

And I said "If you have any international contacts...". Obviously if you only use the local language you're not having much international contacts. So the last "Besides..." was pretty much a non sequitur.
Re:In practice it means "national" URLs. by Bogtha · 2007-03-13 05:42 · Score: 1

If your company/organisation/you have any international contacts then you will NOT be using these international URLs.

No, it means you can't rely on them. Which is pretty much the same for any new technology that requires client support.

A practical way of using these domains is to set up an ASCII one that you advertise, and redirect to the canonical one with the umlauts etc. That way native speakers aren't alienated by the mangling of their language and don't get errors when they type in the real name of the company + '.com'.

You might think that native speakers of non-ASCII languages would be used to mangling their words for domains by now, but it's surprising how persistent muscle memory is when it comes to spelling. Even though I've been a web developer for many years, I still occasionally get frustrated with typing 'colour' when the CSS property is 'color'*, etc. It wouldn't be so bad if CSS came out of the USA, but it was a Norwegian who created it and the international organisation who developed it was created by an English guy working in France.

--
Bogtha Bogtha Bogtha
Re:In practice it means "national" URLs. by Kadin2048 · 2007-03-13 07:21 · Score: 1

This is quite true. Only businesses who are exclusively local will have a single domain name that uses the high-order characters; everyone else will get two, at minimum -- one local one, and one that's the closest-possible ASCII approximation.

It's not just a "doing business with Americans" (or other Westerners) problem, it's a 'doing business with anyone outside your area' problem. ASCII is the only character set where you have a good chance of ensuring that some other person will be able to type it. I.e., someone using Indian localizations and another person using Japanese localizations would probably be hard-pressed to find characters in common that can be easily typed, other than ASCII.

I could see this having some interesting effects, besides the obvious (land-grab for domains, both ASCII and localized) -- since many more people are going to be using multiple domains, hosting and registration companies will probably want to bring out more tools for simplifying the management of multiple domains. Anyone who does business in n languages, will probably want [n+1] domains (unless one of their languages is English or another language which maps easily to ASCII), while today they might just have a single ASCII one. That's going to be a lot of domains to register and manage.

--
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
Re:In practice it means "national" URLs. by leuk_he · 2007-03-13 09:10 · Score: 1

international keyboard is a setting of the US layout keyboard....

type a " and e and you will get ë

Get back to work, Dubya by Anonymous Coward · 2007-03-13 03:35 · Score: 0

And quit posting as AC...we know who you are.

Re:Get back to work, Dubya by Anonymous Coward · 2007-03-13 04:34 · Score: 0

it's not like he knows English well enough to follow up with this suggestion...

Yeah well... by Anonymous Coward · 2007-03-13 03:35 · Score: 0

I'm going to be the only slashdotter in history to have se×.com

Can we have "/..org" now ? by Rastignac · 2007-03-13 03:36 · Score: 1

It can be cool (?).

--
-- Rastignac was here.

Re:Can we have "/..org" now ? by zootm · 2007-03-13 03:49 · Score: 1

/ is a separator in URLs, so I suspect not.
Re:Can we have "/..org" now ? by Anonymous Coward · 2007-03-13 04:34 · Score: 0

/ is a separator in URLs, so I suspect not.
But there are many Unicode characters that should look much like it. For example there's a separate division character available that should be interpreted like any other non-ascii letter.
I'm just not sure about the dot part.
Re:Can we have "/..org" now ? by zootm · 2007-03-13 05:08 · Score: 1

I'm sure there's plenty of Unicode characters which look like a period too, so yeah, if you just want it to look like it you're probably fine. At worst the dot could be replaced with a dot at half-line height (which would probably be more accurate to the word "dot" anyway ;) ).

They are not "international urls" by El_Muerte_TDS · 2007-03-13 03:39 · Score: 1

They are internationalized urls. If they were international urls I would be able to enter them in my browser without doing funky stuff.

Couldn't they just have encoded it? by mwvdlee · 2007-03-13 03:40 · Score: 1

Pardon my ignorance, but couldn't they have just thought of an encoding scheme? Similar to how certain characters are encoded in the path of a URL ("&"-style or "%20"-style). Possibly a more complicated scheme would have been necessary, but surely it would have been possible without requiring changes to the ASCII nature of domains.

--
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?

Re:Couldn't they just have encoded it? by Tet · 2007-03-13 03:45 · Score: 1

Pardon my ignorance, but couldn't they have just thought of an encoding scheme?
Already been done. See Punycode (RFC 3492). The problem with encoding schemes, though, is that they aren't memorable, and hence are problematic to type into, say, the location bar of a browser.

--
"The invisible and the non-existent look very much alike." -- Delos B. McKown
Re:Couldn't they just have encoded it? by slart42 · 2007-03-13 03:46 · Score: 1

>Pardon my ignorance, but couldn't they have just thought of an encoding scheme?

This is exactly what is happening behind the scenes AFAIK. It's called Punycode.

See http://en.wikipedia.org/wiki/Punycode
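
For anyone who wants to see the encoding in action, here is a minimal sketch using Python's standard library. It assumes the built-in `idna` codec (which follows the older IDNA 2003 rules) and a made-up name, `bücher.example`, that is not from this thread:

```python
# Hypothetical example name; any label with non-ASCII characters would do.
label = "b\u00fccher"            # "bücher"
name = label + ".example"

# Raw Punycode (RFC 3492) for a single label, without any prefix.
raw = label.encode("punycode")

# The "idna" codec applies Punycode label by label and adds the "xn--" prefix.
ascii_name = name.encode("idna")

print(raw)         # b'bcher-kva'
print(ascii_name)  # b'xn--bcher-kva.example'
```

The ASCII form is what actually goes into the DNS query; the readable form is only ever shown to the user.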
Re:Couldn't they just have encoded it? by Anonymous Coward · 2007-03-13 05:19 · Score: 0

The problem with encoding schemes, though, is that they aren't memorable, and hence are problematic to type into, say, the location bar of a browser.
Not really a valid objection. My browser already translates the large number of characters that aren't valid in URLs (like spaces and so forth) into the proper escaped equivalents. Internationalization of URLs is purely a user interface concern; there's no need for it to be done on the backend, with all the corresponding implementation headaches and breakage that implies.

In fact, this is exactly how IDNs work.
Re:Couldn't they just have encoded it? by Schraegstrichpunkt · 2007-03-13 11:39 · Score: 1

The funny thing about Punycode is that it isn't even necessary, at least from the perspective of the DNS protocol. Since DNS labels are encoded as length+data, you can theoretically just put arbitrary UTF-8 characters directly into a DNS name. Unfortunately, I think BIND (in its infinite brokenness) didn't handle that very well or something.
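
The "length+data" point can be sketched in a few lines of Python. This is only an illustration of the wire format for a name, with a hypothetical helper `encode_dns_name`; it says nothing about whether any particular resolver or server would accept raw UTF-8 labels:

```python
def encode_dns_name(name: str) -> bytes:
    """Encode a dotted name as DNS wire-format labels: one length byte, then data."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        data = label.encode("utf-8")      # assumption: labels carried as raw UTF-8
        if not 1 <= len(data) <= 63:      # a label is limited to 63 octets
            raise ValueError("label must be 1..63 bytes: %r" % label)
        out.append(len(data))
        out += data
    out.append(0)                         # zero-length root label ends the name
    return bytes(out)                     # note: the 255-byte total limit is not checked


wire = encode_dns_name("caf\u00e9.example")
print(wire)  # b'\x05caf\xc3\xa9\x07example\x00'
```

Nothing in that layout cares whether the data bytes are ASCII, which is the commenter's point.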

--
http://outcampaign.org/
Re:Couldn't they just have encoded it? by amorsen · 2007-03-14 01:01 · Score: 1

Unfortunately, I think BIND (in its infinite brokenness) didn't handle that very well or something.

BIND seems to handle it just fine; I don't know of any problems with UTF-8 in BIND. I still don't get why punycode was invented, and this is the one issue where I agree with Daniel J. Bernstein. See his page on the issue.

--
Finally! A year of moderation! Ready for 2019?

They could split unicode into sections by Colin+Smith · 2007-03-13 03:40 · Score: 2, Insightful

Call them, say, "character sets".

Then only allow names and queries all from the same character set.

--
Deleted

English "X" vs. Cyrillic "khah" by J.R.+Random · 2007-03-13 03:42 · Score: 3, Insightful

This is just common sense -- there's no reason why Chinese, Greeks, and Russians should have to use a character set meant for the English language. But any given URL should have a language associated with it and any character in that URL not associated with its language should be color coded. So English language URLs would get "omicron" flagged while Greek URLs would get "O" flagged. The "default" language could be English so that existing URLs are unchanged, for other languages their ISO code could precede the URL. Now this particular scheme might have some fatal flaw but something similar ought to be workable.

Re:English "X" vs. Cyrillic "khah" by Anonymous Coward · 2007-03-13 03:54 · Score: 0

Cause it's our goddamned internet.
Re:English "X" vs. Cyrillic "khah" by pavon · 2007-03-13 04:52 · Score: 2, Insightful

Agreed, although I think a dialog box should also be shown as an annoyance / deterrent. Otherwise just imagine what the Web 2.0 folks will do when they realize they can redirect their site to one with cool multi-colored URLs, thus conditioning people to ignore the colored warning. And you thought del.icio.us was overly cute :)
Re:English "X" vs. Cyrillic "khah" by glwtta · 2007-03-13 08:16 · Score: 1

Good point; never thought of this.

My suspicion is that the only way to deal with this is to completely disallow mixing of languages in the same URL (or at least in the domain name, which should be enforced by the registrars). Anything less leaves far too much room for abuse. Imagine the field-day phishers would have with this: register www.bankofamerica.com with an omicron and a digamma (ok, the lower case wouldn't look right - you know what I mean), and you control a domain visually indistinguishable from your target.
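
A rough sketch of what such a "no mixing" check might look like, in Python. It is only a heuristic: the standard library does not expose the real Unicode Script property, so the function names here are hypothetical and the script is guessed from the first word of each letter's Unicode name:

```python
import unicodedata


def scripts_in(label):
    """Guess the scripts used in a label from the first word of each letter's name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()                   # ignore digits and hyphens
    }


def is_mixed_script(label):
    """True if the letters of the label appear to come from more than one script."""
    return len(scripts_in(label)) > 1


genuine = "bankofamerica"
spoof = "bank\u03bffamerica"              # Greek small omicron instead of Latin "o"

print(scripts_in(genuine))                # {'LATIN'}
print(is_mixed_script(spoof))             # True
```

A real registrar policy would need exceptions, since ordinary Japanese names legitimately mix kanji, hiragana and katakana.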

--
sic transit gloria mundi

Glyph Masquerade by Doc+Ruby · 2007-03-13 03:44 · Score: 1

Also needed is automatic translation by, say, a Firefox extension, from the domain name's registered home language (if any) into the user's default language. How do you say "goatse" in Urdu?

A good complement to the new system to preempt the huge coming problem of "glyph masquerade" would be registrations including a list of the domain name translated into different languages. Or at least a declaration of the home language. Without enforcement (ICANN doesn't even enforce name/address veracity) it won't be proof of anything, but it would be a start. And 3rd party databases could include in trust ratings the completeness of the name entry, as well as cross-checks.

I'd like my GUI to at least indicate when a domain name is rendered in foreign glyphs, so I can try to tell whether it's really just foreign glyphs that look like a familiar English word, fooling me into clicking on something totally unrelated.

Opening the system to foreign scripts and languages will get even more worthwhile people and orgs onto the Net, so it's well worth the risks of misidentification. But the risks are real, and largely predictable. We should roll out the new, inclusive system with risk mitigations to welcome those new people in greater security.

--

--
make install -not war

Re:Glyph Masquerade by Anonymous Coward · 2007-03-13 04:34 · Score: 0

How do you say "goatse" in Urdu?

I dunno, but you can experience it by eating the beef vindaloo!
Re:Glyph Masquerade by Anonymous Coward · 2007-03-13 04:43 · Score: 0

How do you say "goatse" in Urdu?

It's: docruby
Re:Glyph Masquerade by Anonymous Coward · 2007-03-13 05:50 · Score: 0

How do you say "goatse" in Urdu?

The same way you do in many languages: "Aarrgghh!"
Re:Glyph Masquerade by Anonymous Coward · 2007-03-13 11:41 · Score: 0

Also needed is automatic translation by, say, a Firefox extension

Riiiight, because we only want this to be usable by Firefox users.
Re:Glyph Masquerade by Doc+Ruby · 2007-03-13 12:12 · Score: 1

Apparently you're not familiar with the idiom of "X, say, Y", which means "for example Y, among other appropriate measures".

Perhaps you need an automatic translation from English to, say, duh.

--
--
make install -not war

No big deal by Dancindan84 · 2007-03-13 03:45 · Score: 0

With the exception of the phishing possibilities that others have already noted, there really shouldn't be any change for English speaking internet users. Most English websites aren't going to want to use special characters. My parents have a hard enough time grasping ctrl-c and ctrl-v for copy and paste. Good luck to anyone explaining alt-145 for them to get to æon.com

--
"Always forgive your enemies; nothing annoys them so much." - Oscar Wilde

Other strange domains already in existence by Anonymous Coward · 2007-03-13 03:45 · Score: 0

I saw this domain.. ©.com (http://©.com) for me it is accessible in firefox but not IE

Re:Other strange domains already in existence by Intron · 2007-03-13 06:51 · Score: 1

I felt a great disturbance in the Net, as if millions of DNS servers suddenly cried out in terror.

--
Intron: the portion of DNA which expresses nothing useful.
Re:Other strange domains already in existence by Fred+Ferrigno · 2007-03-13 13:27 · Score: 1

Firefox was ever so nice as to convert that into Punycode for me: xn--gba.com

Strangely, accessing ©.com in IE directed me to an advertisement for VeriSign's IDN client software. xn--gba.com works just fine in IE though.

More than just non-ASCII by Anonymous Coward · 2007-03-13 03:48 · Score: 0

66.35.250.150 is non-ASCII.

Security minded questions by merc · 2007-03-13 03:50 · Score: 2, Interesting

Will having non-ASCII data in FQDN's open us up to buffer-overflow attacks in various network-aware services?

--
It's true no man is an island, but if you take a bunch of dead guys and tie 'em together, they make a good raft.

Re:Security minded questions by Anonymous Coward · 2007-03-13 04:53 · Score: 0

My guess is that for a typical network app written in C (I bring up C because it's infamously prone to buffer overflow attacks), if it's totally unaware of fancy domain names, it should just work dumbly. For example if the domains are encoded using UTF-8, strlen() will still work on a UTF-8 string, even if the application is too dumb to know that it's UTF-8 and not ASCII.

On the other side of things, if the application is written with internationalization in mind and maybe uses wchar_t's everywhere properly, that wouldn't be much of a problem.

The problem could come when apps try to change from narrow characters to wide characters, and they get it sort of moved over but miss a line or two here and there.
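
The gap between "characters" and "bytes" that this kind of bug lives in is easy to show. A small Python sketch (the original point is about C, so treat this as an analogy; the example label is made up):

```python
label = "caf\u00e9"                       # 4 characters, the last one non-ASCII
utf8 = label.encode("utf-8")

print(len(label))                         # 4  (characters)
print(len(utf8))                          # 5  (bytes; this is what strlen() would count)

# UTF-8 never emits a zero byte for a non-NUL character, which is why
# byte-oriented C string functions keep working on it.
has_nul = 0 in utf8

# A buffer sized by character count is too small for the encoded bytes.
buffer_size = len(label)
fits = len(utf8) <= buffer_size
print(has_nul, fits)                      # False False
```

Code that sizes by one count and copies by the other is where the half-converted apps would go wrong.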

Some Unanswered Questions About IDNs ... by Ron+Bennett · 2007-03-13 03:51 · Score: 2, Interesting

Below is a quick copy and paste from one of my posts on DNForum regarding IDNs ... I own some IDNs and believe they have much potential, but there are still many unanswered questions...

Excerpt from a post of mine on DNForum regarding IDNs:
http://www.dnforum.com/showthread.php?p=732080

I'm running into a lot of issues that many IDN folks aren't discussing - probably because they've not considered them ...

Various issues / threats / questions:

?? The existence of numerous diverse dialects, even totally different languages, etc. in the same country ... it's among the reasons that English dominates in some areas; some natives, even if they can understand a particular dialect, will sometimes speak a totally non-native language, such as English, instead to avoid risk of offending the other party. One can't assume one language dominates an entire region - languages can also overlap many areas ... it's one of the reasons some are pushing for language / culture based TLDs, such as .CAT (among the dumbest ideas ever, but that's another discussion for the .CAT thread running here on DNF).

?? An IDN that contains Western European characters and very closely matches a non-IDN ... i.e. cafe.com versus café.com ... what happens? Will the IDN be highlighted / blocked by default? ... likely an easy UDRP target? ... introduction of a new IDN specific dispute procedure? -perhaps there already is one?

?? Trademark issues ... ie. an IDN that is similar / exact to a trademark in another country ... less obvious, what about an IDN that translates to that of a trademarked word / phrase? -I believe there's a thread discussing such an issue now on one of the other boards here.

?? language variants (more applicable to asian languages, etc) related issues ... how good / stable are the various language variant tables?

?? what happens when a language variant table changes? -how are conflicts handled?

?? what happens if a character variant (an IDN [IDL package] technically can comprise multiple character variants [code points]) is released? ... does the current registrant get first dibs? ... even if yes, it may not be quite that simple if a character variant occurs in numerous permutations.

?? What happens if a reserved character variant is changed to a preferred character variant? - while such a change would have little to no effect on affected IDNs (IDL packages), it could result in the appearance of some IDNs changing ... probably not a biggie compared to some other issues, but one to be aware of.

?? How reliable, especially for those in languages with numerous character variants, will IDN domain resolution be? ... IDN resolution depends on much client-side APIs.

?? How well will IDN resolution APIs be regulated ... I can easily envision scenarios in which a web browser and/or other applications (email, IM, etc) implement resolution differently ... i.e. adding and/or ignoring one or more valid language associations for a particular IDN / converting similar-looking Western European characters to standard A-Z characters, etc. A related concern is language table management - I'm a little hazy on whether the tables will be internally stored by each app or remotely loaded for each session, etc.

Rambling on, but there are a lot of things that one needs to be aware of with IDNs.

H4x0rs out there rejoice... by chord.wav · 2007-03-13 03:51 · Score: 1

http://www.145/|-|D07.org

Imagine it with different ANSI colors for each char.

Balkanising the internet? by hcdejong · 2007-03-13 03:53 · Score: 3, Interesting

Would this lead to segregation of the internet into zones defined by the language used for the domain name? At the moment, I can access e.g. Japanese websites easily, even if the content of that site is in a language I don't understand [1].
If non-Roman domain names become popular, will I still be able to access them, or will they disappear behind untypeable URLs? A search engine may be able to mitigate this problem somewhat, but ATM I sometimes get search results for Japanese-language pages only because my search term is present in the URL.

1: yes, a site can still be useful in this case and no, despite the stereotype it's not just for porn.

Re:Balkanising the internet? by Churla · 2007-03-13 03:57 · Score: 1

You're looking at this from the perspective of a native English speaker.

Imagine all the Japanese who don't know English, but have to learn/type English domain names. Very unintuitive for them.

My concern would be for all the internet filtering and firewalling software which explicitly only allows ASCII in HTTP headers.

--
I'm a fiscal conservative, it's a pity we don't have a political party anymore
Re:Balkanising the internet? by kimba · 2007-03-13 03:59 · Score: 2, Informative

My concern would be for all the internet filtering and firewalling software which explicitly only allows ASCII in HTTP headers.

IDN encoding is pure ASCII, in a similar way that MIME email attachments are. The protocol layer never sees anything other than letters, numbers and hyphens. All IDN encoded domains are prepended with "xn--" so that end-user interfaces can tell them apart and convert them back and forth.
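
To make that concrete, here is a small Python sketch of what an end-user interface might do with such a name. The helper `describe` and the sample hostname are made up for illustration; the only library feature relied on is the standard `punycode` codec:

```python
import re

# Letters, digits and hyphens: the only characters the protocol layer sees.
LDH = re.compile(r"[A-Za-z0-9-]+")


def describe(hostname):
    """Return (wire_label, display_label) pairs for a dotted ASCII hostname."""
    pairs = []
    for label in hostname.split("."):
        if not LDH.fullmatch(label):
            raise ValueError("not a letters-digits-hyphen label: %r" % label)
        if label.lower().startswith("xn--"):
            # Strip the prefix and undo the Punycode step for display.
            display = label[4:].encode("ascii").decode("punycode")
        else:
            display = label
        pairs.append((label, display))
    return pairs


print(describe("xn--bcher-kva.example"))
# [('xn--bcher-kva', 'bücher'), ('example', 'example')]
```

A filter that only allows ASCII in HTTP headers would see nothing unusual in the first element of each pair.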
Re:Balkanising the internet? by tpjunkie · 2007-03-13 04:19 · Score: 1

Considering English is mandatory in schools there, the number of people who don't know any is quite low, and mainly an older segment of the populace (insert korea-old people-email joke). Also, romaji (as roman characters are called there) are used everywhere, from signs to advertising to hilarious clothing. But true enough, in many countries English domain names on a non-English keyboard could be a real pain in the ass.
Re:Balkanising the internet? by badasscat · 2007-03-13 04:19 · Score: 2, Interesting

Imagine all the Japanese who don't know English, but have to learn/type English domain names. Very unintuitive for them.

Bad example.

The Japanese are probably the *least* likely of any non-English speaking country to use non-roman url's. The fact is the standard Japanese keyboard is the same exact QWERTY keyboard we use. They can type Japanese through software, which is how they normally work when writing to each other, but there's nothing "non-intuitive" in using an English keyboard in the way that it was intended. In fact, most of them write Japanese using romanizations, then select the correct kanji through a list. So they're universally familiar with romanized url's, and like any habit, it's not going to change just because an alternative became available. Typing kanji is harder on a Japanese computer than typing a romanization.

Now, the Chinese, Russians, etc. I don't know about, so there could be better examples out there of people who would take advantage of this.
Re:Balkanising the internet? by semiotec · 2007-03-13 04:23 · Score: 1

well, I am guessing that Google and other search engine/portal sites will be wetting their pants out of excitement, if what you fear becomes prevalent, as people will have to rely more and more on searching for the sites and clicking the link rather than typing in the address. But seriously, I think most navigation these days is done through clicking anyway, rather than actually typing the address into the navigation bar, and even then the auto-complete feature means you rarely have to type the entire address. There are also tools like the useful Firefox extension that turns any text string into a url, which really reduces the necessity of having to type out an address. I think the only web address I ever type these days is my bank's address.
Re:Balkanising the internet? by amuro98 · 2007-03-13 05:31 · Score: 1

There are different types of keyboards available for Japan. One uses pretty much the standard US QWERTY layout with English letters on the keys. You then type in romaji, and the computer tries to guess what character(s) you want to use. Another keyboard has hiragana characters on the keys and acts as you'd expect.

The algorithm used to guess what characters you want to use has gotten pretty sophisticated, using a combination of statistical analysis to keep track of commonly used words (like your name) and limited context parsing, it's gotten a lot better from the early days when you would type in the letter "o" and be presented with 60+ characters ;)

Chinese is similar. You can type in pin-yin or one of the other romanization methods. You can also type in "bo-po-mo-fo" which is a phonetic script that is taught to kindergarten kids to help them learn to pronounce Chinese properly. Like Japanese, the keyboard can come in a US-QWERTY style, or with the "bo-po-mo-fo" characters. There's probably some others as well. I know that the early computer keyboards for Chinese contained keys for every radical from the Chinese characters. To type a single character, you'd punch in the radicals in the order you would normally write, and the computer would assemble them into the desired character.

Russian and other similar languages have their own keyboard layouts. Even French and German have slightly different keyboard layouts from the standard EN-US 101 key QWERTY layout. For fun, try going into Windows' regional settings, and change your victim's keyboard setting. A very nasty trick to play on a touch-typist ;)
Re:Balkanising the internet? by Ilgaz · 2007-03-13 07:05 · Score: 1

Japanese (or any international including latin based) will KEEP their current english hostnames, they will (likely) buy an additional international name and simply put their IP addresses to DNS.

Domain sales will explode. Well at least for a real and justified reason now...

dates by Anonymous Coward · 2007-03-13 03:55 · Score: 0

What's with the stupid dates - eg 7 March 2007 on that site?!
You'd think they'd use the ISO 2007.Mar.7

First Test? by Rocketship+Underpant · 2007-03-13 03:55 · Score: 1

As far as I know, Japanese URLs have been working and in use for quite some time. I've visited several myself. Mind you, I'm surprised anyone in the anglophone sphere takes notice.

--
He who lights his taper at mine, receives light without darkening me.

Romanization as DNS lingua franca by StreetStealth · 2007-03-13 04:00 · Score: 2, Interesting

Couldn't these linguistically heterogeneous domain spaces still be universally linked through romanization? I see one possible solution: An intermediary DNS conversion server; i.e. type "[those were supposed to be Japanese kanji].co.jp" and your DNS request is treated the same as "rakuten.co.jp". Beyond the inability to rake in tons of money for new registrations, what might be the disadvantages of such a system?

--
Your mind is clear / The things that you fear / Will fade with how much you / Believe what you hear

Re:Romanization as DNS lingua franca by Nimey · 2007-03-13 04:18 · Score: 2, Interesting

For some languages, like Arabic, there is no one standard for romanization. A trivial example is Qu'ran/Koran.

--
Hail Eris, full of mischief...

E pluribus sanguinem
Re:Romanization as DNS lingua franca by amuro98 · 2007-03-13 05:05 · Score: 1

Japanese has 2 romanization standards, Chinese has at least 2, as well as the "bo-po-mo-fo" method some learned in schools which is itself a non-ASCII character set... And what about languages that have no written form whatsoever? Doesn't Unicode attempt to address these languages as well? Does that mean we could eventually see something like k!ung''.com from a hunter-gatherer tribe in southern Africa?
Re:Romanization as DNS lingua franca by ConceptJunkie · 2007-03-13 05:11 · Score: 1

Reminds me of the Saturday Night Live sketch listing the various spellings of "Khadafy".

--
You are in a maze of twisty little passages, all alike.

Terrorists by Anonymous Coward · 2007-03-13 04:06 · Score: 0

Now we're just letting the terrorists win! They'll hide behind their exotic non-ascii URL names, hold secret forum meetings, etc., and there is nothing the USA can do to see them! Hopefully the NSA will get special training ("Okay. Hold down ALT. Now press these numbers on the numeric pad...")

I heard of this long time ago by guruevi · 2007-03-13 04:10 · Score: 1

And there are quite a few solutions to it. One of them (I think this is the one we're talking about) is converting the characters to ASCII and serializing them. Quite simple, let the browser do it.

--
Custom electronics and digital signage for your business: www.evcircuits.com

*top* *level* domains, not domains by Anonymous Coward · 2007-03-13 04:17 · Score: 0

What are those morons at BBC writing about? Internationalised Domain Names (IDNs) have been available for some years. Ah, that's it, from the ICANN home page:

Autonomica AB has, under a contract with ICANN, investigated whether the addition of top level domains containing encoded internationalized characters (so called IDNs) would have any impact on the operations of the root name servers providing delegations, or the iterative mode resolvers used to look up the information. No impact at all could be detected. All involved systems behaved exactly as expected.

So it's about non-ASCII top level domains, not just non-ASCII domains, i.e. . instead of .co.jp.

Some have been working on this for a while... by ketilf · 2007-03-13 04:29 · Score: 1

http://pi.cr.yp.to/

As a side note, it's interesting that Slashdot says this link is at cr.yp.to.

fingering fun by monotony · 2007-03-13 04:41 · Score: 1

so how am i, on my gb keyboard supposed to conveniently type in all sorts of foreign characters?
if there is going to be some traditional ASCII alternative url.. then just what are we doing?

i am all for versatility, but there is always talk about unification, this would just segregate the web into 'things i can type' and 'things i can't'

and considering that html is in american, and that most people take into account that english is a very common language when designing a page, are we not just creating some novelty, which after a while will annoy all but a few?

of course, dns is only a convenience anyway, we could solve all this and all start memorising ip addresses, especially when IPv6 should soon be in play. XD

Re:fingering fun by hyfe · 2007-03-13 05:16 · Score: 1

First off, welcome to the problems the rest of the world has been facing for some time.
so how am i, on my gb keyboard supposed to conveniently type in all sorts of foreign characters?

You're not. Same way there is no convenient way to write English chars on a Russian keyboard. There's nothing to do but switch charset and try to remember where the characters were.

The more important question is, why should countries with completely different alphabets than us be forced to use our alphabet? Right now, Russians have to switch alphabets based on what they're doing. Writing a document, Russian charset! Typing a URL, English charset! For SMS and short messages, they just transcribe everything into the English charset, but anything remotely official has to be decent.
i am all for versatility, but there is always talk about unification, this would just segregate the web into 'things i can type' and 'things i can't'

It's already been divided by language. If you know the language, you probably have their charset available anyway, as you presumably write it occasionally.
and considering that html is in american, and that most people take into account that english is a very common language when designing a page, are we not just creating some novelty, which after a while will annoy all but a few?

Of course, to an American the web looks English. To a Norwegian, it doesn't. It's about half'n'half between English and Norwegian content for me personally.. and I read English just as effortlessly as Norwegian, so it's all about content. (with the occasional Russian thrown in to verify that not practicing isn't magically increasing my skills).

In short, get your head out of your ass and stop whining. Have a cookie.

--
"" How about taking the safety labels off everything, and let the stupidity-problem solve itself? """
Re:fingering fun by pclminion · 2007-03-13 05:38 · Score: 1

so how am i, on my gb keyboard supposed to conveniently type in all sorts of foreign characters?

What are you saying -- that you actually TYPE URLs into the address bar? Have you never heard of del.icio.us? Or bookmarks? Or clicking on a link?
Re:fingering fun by monotony · 2007-03-14 04:18 · Score: 1

to be a bit awkward, i'm not american =P
and although i do agree with your points, i was of course referring to markup, which is written in american.

unless i've been mistaken all this time, html was all about something anybody can use... but of course you'd have to speak english, or at least the american version of it. so hadn't we already decided on a standard?

far from my intention to whine, but i shall have a cookie anyways =)

Misleading article ? by Bob-taro · 2007-03-13 04:49 · Score: 1

From the test results document (it appears to me, anyway, that this is the test they're talking about in the article):

With IDNs, the domain names stored in the DNS servers are ordinary domain names just like before. The names stored have no special properties that makes it possible for the DNS servers to single out the IDN domains. There is no reason to believe that IDNs would make the DNS system as a whole behave different from its normal behaviour. Nevertheless, for prudence ICANN has asked that it be tested that this assumption is true.

I looked at this because I wanted to see what some of these internationalized URLs looked like, and they were all regular ascii urls. I'm not really sure what this test proved.

--
Prov 9:8 Do not rebuke mockers or they will hate you; rebuke the wise and they will love you.

Re:Misleading article ? by Andy_R · 2007-03-13 05:14 · Score: 1

I spent a while blundering round the ICANN site trying to find out which characters they were going to support, and all I found out is that they never use 10 words to say something when 1000 would do. No wonder they never get anything done!

Has anyone found a list of the new characters they are planning to allow? There are loads of ASCII ones currently banned, and I'd like to know if it would allow a backdoor to registering some English domains that I might want, such as Andy_R.com

--
A pizza of radius z and thickness a has a volume of pi z z a

Already done by kahei · 2007-03-13 05:03 · Score: 2, Interesting

Once again, committees lag behind actual problems and actual solutions.

Now if you'll excuse me I'll go back to browsing .jp.

(I seem to recall that /. has issues of its own, so the ascii encoding of that would be http://xn--cckev5k8eta5k.jp/. Anyway, the point is that characters beyond ASCII have been used for ages. Mostly by people who don't mind it when users from other countries can't access their site.)
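For anyone who wants to see what is behind an ASCII form like the one above, the conversion is reversible. A small sketch, again assuming Python's built-in `idna` codec; it just decodes the quoted name and lists the code points of the first label.

```python
ace_host = b"xn--cckev5k8eta5k.jp"  # the ASCII form quoted above

# Decoding reverses the Punycode step for every label carrying the xn-- prefix.
unicode_host = ace_host.decode("idna")

print(unicode_host)
print([f"U+{ord(ch):04X}" for ch in unicode_host.split(".")[0]])
```

A browser that understands IDNs does the same round trip: it shows the Unicode form to the user and sends the ASCII form to the resolver.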

--
Whence? Hence. Whither? Thither.

Hogwash and a waste of time... by ConceptJunkie · 2007-03-13 05:17 · Score: 1

If ASCII was good enough for the Apostles Peter and Paul then it ought to be good enough for everyone.

--
You are in a maze of twisty little passages, all alike.

Multiple character sets in one URL? by Kadin2048 · 2007-03-13 05:19 · Score: 1

Okay, I'll bite. I have what I think amounts to a fairly good, if basic, understanding of how internationalized character sets and encodings work, but I don't understand how you'd encode multiple character sets into one URL.

I mean, first of all, in order to use non-Latin characters at all, you have to have some way of transmitting which character set / codepage you want to use. I can't find any place in TFA where they actually describe how this is going to work (although I didn't read the PDF, so perhaps it's in there), but my assumption was that it would be transmitted outside the actual stream of bytes that represent the URL.

So, a "URL block" might consist of some metadata about the URL that's going to be transmitted -- e.g., what character set it's written with, etc. -- and then the stream of bytes that actually represent the address. Doing it that way would by definition only allow one character set per URL, because there's no way of changing it mid-stream.

If you allow people to change character sets in the middle of the address, so as to have an address where one part was written in ASCII or Latin-1, and then another byte or two in UTF-8, and then the remainder in Latin again, would hugely complicate the standard both from an implementation and use perspective.

As long as all the alternative (that is, alternative to ASCII) encodings include within them a minimalist Latin charset, enough so that you can type the ".com" and other TLDs, then there doesn't seem to be any reason to allow mixed-charset URLs.

--
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."

Re:Multiple character sets in one URL? by Anonymous Coward · 2007-03-13 06:24 · Score: 0

I'm going to go out on a limb and say 'unicode' (probably UTF-8). It'd be ridiculous in this day and age to support every single character set in the old way. ASCII is a subset of both Latin-1 and UTF-8, and Latin-1 is *almost* (but not quite) a subset of UTF-8. How is one supposed to type 'a minimalist latin charset' in a keyboard that doesn't have them, are you proposing changing the keyboards of all those who don't use latin-based charsets? Learn a language or two, try to find out more about the world, and *then* give your opinion, at least then it'll have a basis in fact.
Re:Multiple character sets in one URL? by Kadin2048 · 2007-03-13 06:56 · Score: 1

I'm going to go out on a limb and say 'unicode' (probably UTF-8). It'd be ridiculous in this day and age to support every single character set in the old way. ASCII is a subset of both Latin-1 and UTF-8, and Latin-1 is *almost* (but not quite) a subset of UTF-8. How is one supposed to type 'a minimalist latin charset' in a keyboard that doesn't have them, are you proposing changing the keyboards of all those who don't use latin-based charsets? Learn a language or two, try to find out more about the world, and *then* give your opinion, at least then it'll have a basis in fact.
Choosing a single character encoding and bytestream transmission standard (like UCS plus UTF-8) would be the logical choice, IMO, and that's what I was getting at. However, the GGP was specifically talking about the possibility of using multiple character sets in the same URL, which I think would be wholly impractical, and unnecessary given the widespread use of the UCS.

As for your other, snarky, comment, I'll only respond by saying that I have traveled extensively, and if you had as well, you would realize that most of the rest of the world uses keyboards which look suspiciously similar to those here in the U.S., albeit with different glyphs printed on them, and different input methods for complex characters. (E.g., kana, romaji, phonetic entry, progressive-exclusion via GUI, etc.) But due to the widespread use of Latin characters, I've yet to see any computer system, anywhere, which didn't have some method for entering them. (Think about it: right now, all URLs are basically ASCII: if a computer didn't have the ability to enter those first 128 characters at all, then it would be nearly impossible to get online.) Standard Japanese keyboards, for instance, have the QWERTY Latin layouts printed below the kana glyphs. (But don't take my word for it, see for yourself.)

Also, virtually all localized charsets (including the UCS) are backwards-compatible in that they include ASCII as the first 128 positions, so ASCII TLDs are by definition the most "guaranteed safe" you can get, in terms of being able to be read and written by everyone, everywhere.
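The "ASCII as the first 128 positions" point is easy to check. A quick sketch in Python, comparing how the same text comes out as bytes under ASCII, Latin-1 and UTF-8; the sample strings are arbitrary.

```python
ascii_text = ".com"
accented = "é"  # U+00E9, present in Latin-1 but not in ASCII

# ASCII characters: identical bytes in all three encodings.
ascii_bytes = ascii_text.encode("ascii")
same_everywhere = (
    ascii_bytes == ascii_text.encode("latin-1") == ascii_text.encode("utf-8")
)

# Beyond ASCII: same code point, but the bytes on the wire differ.
latin1_bytes = accented.encode("latin-1")  # a single byte
utf8_bytes = accented.encode("utf-8")      # two bytes

print(same_everywhere, latin1_bytes, utf8_bytes)
```

This is also why Latin-1 is only "almost" a subset: its characters share code points with Unicode, but above position 127 the byte encodings no longer match UTF-8.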

--
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
Re:Multiple character sets in one URL? by Petrushka · 2007-03-13 10:24 · Score: 1

However, the GGP was specifically talking about the possibility of using multiple character sets in the same URL, which I think would be wholly impractical, and unnecessary given the widespread use of the UCS.
Speaking out of ignorance here: surely some languages require multiple character sets? If you're using English, French, German, or Spanish, fine, ASCII will do fine. But if you're using Polish, Lithuanian, Czech, Māori, or Turkish, you're going to have to use a combination of basic Roman characters plus characters with a variety of diacritics or other modifications, aren't you? I guess there's something I'm missing.

Anyway, even presuming I am missing something, it looks to me like it'd still be pretty easy to phish by using, say, Turkish for your character set, but make it look like basic Roman except that one of the 'i's is missing a dot.
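The dotless-i trick can be sketched in a few lines. This assumes Python's built-in `idna` codec and a made-up name, `mysite.example`; the look-alike swaps one "i" for U+0131.

```python
import unicodedata

real = "mysite.example"       # hypothetical legitimate name
fake = "mys\u0131te.example"  # same spelling, but with U+0131 in place of "i"

print(unicodedata.name("\u0131"))  # LATIN SMALL LETTER DOTLESS I
print(real == fake)                # False: different code points

# The two names look alike on screen but resolve as different ASCII names.
real_ace = real.encode("idna")
fake_ace = fake.encode("idna")
print(real_ace)
print(fake_ace)
```

The ASCII form gives the game away, so one possible defence is for software to show that form whenever a name mixes in unexpected characters.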
Re:Multiple character sets in one URL? by Anonymous Coward · 2007-03-13 10:29 · Score: 0

>I guess there's something I'm missing.

Yep.

http://düsseldorf.de/ by armomurha · 2007-03-13 05:37 · Score: 1

Workaround

I don't know by Vexorian · 2007-03-13 05:43 · Score: 1

English is not my native language (as if you didn't notice) I still think this is not exactly the best idea ever, I actually think it is pretty bad... Phishing has been named, but it also seems like a huge overcomplication. Most sites (aka youtube) already get to survive with totally cryptic URLs, so I don't really think this is a problem at all.

--

Copyright infringement is "piracy" in the same way DRM is "consumer rape"

Misleading summary: all this is about TLDs by pieleric · 2007-03-13 05:50 · Score: 1

Actually, as the abstract of the paper correctly states, it's about non-ascii characters in TLDs. International characters already exist in the domain names, as some posters have pointed out.

In this article they applied the same encoding used for domain names to TLDs, and they noticed it works fine. So to summarize, it's not about miçrósoft.com, it's about microsoft.çóm . That's much more fun!
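To make the "same encoding, applied to the TLD" point concrete, here is a rough sketch using Python's built-in `idna` codec. The name `example.çóm` is invented for illustration; the encoding is applied per label, so only the TLD label changes.

```python
host = "example.çóm"  # made-up name with a non-ASCII TLD label

# Each dot-separated label is encoded on its own.
labels = host.encode("idna").split(b".")

for label in labels:
    kind = "IDN label" if label.startswith(b"xn--") else "plain ASCII"
    print(label, kind)
```

The second-level label stays as it is, and the TLD becomes just another xn-- label that resolvers have to look up.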

Spam filters rejoice by denebeim · 2007-03-13 06:27 · Score: 1

The day that goes on-line I'll be able to filter scads of spam simply by refusing to resolve international domain names. Woot!

Passwords aren't any better, done right. by Kadin2048 · 2007-03-13 07:08 · Score: 1

Given the number of passwords that the average person who does a lot of stuff online needs to remember, unless they're doing something hideously insecure already (like using the same password everywhere), they can probably only sign on from a single computer anyway, because that's where their passwords are stored or written down.

The problem of certificate management is, IMO, actually more tractable than the problem of password management. There are lots of ways that you could allow people to move certificates around, if you really wanted to; you could issue USB sticks or smartcards that they could jack in to public machines (although preferably you'd create some method that never actually let the unsecure machine 'see' the certificate itself; you'd just do some sort of challenge/response with the USB key or smartcard).
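The challenge/response idea can be sketched roughly as follows. Loud assumption: a real smartcard would normally sign the challenge with a private key tied to a certificate; this sketch swaps in a shared secret and HMAC-SHA256 from the Python standard library purely to keep it short. The `Token` class and `server_verify` function are hypothetical names.

```python
import hashlib
import hmac
import secrets


class Token:
    """Stands in for the USB key or smartcard; the secret never leaves it."""

    def __init__(self, secret: bytes) -> None:
        self._secret = secret

    def respond(self, challenge: bytes) -> bytes:
        # Only the keyed digest of the challenge is handed to the host machine.
        return hmac.new(self._secret, challenge, hashlib.sha256).digest()


def server_verify(secret: bytes, challenge: bytes, response: bytes) -> bool:
    """Recompute the expected response and compare in constant time."""
    expected = hmac.new(secret, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


shared_secret = secrets.token_bytes(32)  # assumed provisioned on token and server
token = Token(shared_secret)

challenge = secrets.token_bytes(16)   # fresh random value for each login attempt
response = token.respond(challenge)   # the public machine only relays these values

print(server_verify(shared_secret, challenge, response))
```

Because every login uses a fresh challenge, a response captured on an untrusted machine should not be reusable for a later login.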

Passwords really aren't all that convenient; if you're using passwords properly (not reusing the same ones in multiple places), and you're not using a crutch like iterative generation, or just writing the things down (which basically makes it a very insecure "analog certificate"), you're probably way out on the tail-end of the bell curve of what a normal person can remember. Passwords are only "user friendly" because the way that most people use them is hideously insecure.

--
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."

But they'll still keep the ridiculous part by professorfalcon · 2007-03-13 19:20 · Score: 1

How will they translate "www."?

English Visitors by cjb110 · 2007-03-14 07:02 · Score: 1

How then are English visitors supposed to visit one of these sites? Purely by links?

Although I can kinda see the point, I can't see how this will work...all I can see is the internet fragmenting, which seems to be against the whole spirit of things!

For those that don't see why someone who can't read the language would want to visit the site...the reason is simple: Pictures tell a thousand words. Secondly technology and science is often language independent, so the specs on a Korean phone site are useful.

What we really need is a massive all-out war :), and none of these poncy half wars, where we stop fighting before we've won and 'peacekeep' for the next century. At least then there would only be the conquering language to speak.

--
----- I refuse to have an argument with an unarmed person

Slashdot Mirror

159 comments