Gmail Recognizes Addresses Containing Non-Latin Characters
An anonymous reader writes: In response to the creation in 2012 by the Internet Engineering Task Force (IETF) of "a new email standard that supports addresses incorporating non-Latin and accented Latin characters", Google has now made it possible for its Gmail users to "send emails to, and receive emails from, people who have these characters in their email addresses." Google's goal is to eventually allow its users to create Gmail addresses using these characters.
So the next lot of phishing will come from: róót@gmail.com / Àdministrator@gmail.com or BìllGàtes@gmail.com etc?
Great.
Sendmail is like emacs: A nice operating system, but missing an editor and an MTA.
Google updated their regular expression. Good for them.
Finally I can get motörhead@gmail.com!
From what I can tell, a mail server has two options when receiving this mail:
Accept it.
Reject it.
The default, with software that doesn't understand this RFC yet (which seems to be... just about everything), is to reject. So trying to use such an address is not only going to mess up every form you try to fill in online (because they won't see it as an email address either), but will quite likely just get you bouncebacks from everyone you email.
What was needed was surely a system similar to the IDN system for internationalised domain names, which would allow those with ASCII-only DNS servers etc. to STILL WORK, by converting the Unicode characters to an ASCII-compatible form and then sending the email as normal, through the entire PLANET's worth of working email servers out there that could accept it.
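For reference, IDN does this by rewriting each non-ASCII domain label into an ASCII "xn--" form (Punycode). A minimal sketch using Python's built-in `idna` codec, which implements the older IDNA 2003 rules (the third-party `idna` package covers IDNA 2008). Note that there is no standard equivalent for the part before the @, which is exactly the commenter's complaint.

```python
# Python's built-in "idna" codec implements IDNA 2003 (RFC 3490).

def domain_to_ascii(domain: str) -> str:
    """Convert a Unicode domain name to its ASCII-compatible (xn--) form."""
    return domain.encode("idna").decode("ascii")


def domain_to_unicode(domain: str) -> str:
    """Convert an xn-- domain back to Unicode, e.g. for display."""
    return domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    ascii_form = domain_to_ascii("b\u00fccher.example")
    print(ascii_form)                     # xn--bcher-kva.example
    print(domain_to_unicode(ascii_form))  # bücher.example
```

Showing the xn-- form next to the Unicode form is also the usual way to reveal what an IDN link really points at.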
Having a content negotiation option at the SMTP level, which mail servers have to implement and handle specifically, is just ridiculous. Even with Gmail's kickstart it could be decades before you can guarantee that your UTF-8 email address will work across the Internet, and even then there'll be some old legacy server that will just bounce all your email BECAUSE of that character set in your address. And it will be perfectly legitimate for it to do so.
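To make the negotiation concrete: a sending client looks for the SMTPUTF8 keyword (RFC 6531) in the server's EHLO response before handing over an address whose local part is non-ASCII. A minimal sketch using Python's `smtplib` (`ehlo_or_helo_if_needed` and `has_extn` are real `SMTP` methods); the policy itself is an illustrative assumption, not how any particular MTA decides.

```python
import smtplib


def local_part_is_ascii(address: str) -> bool:
    """True if the part before the last '@' is plain ASCII."""
    local, _, _ = address.rpartition("@")
    return local.isascii()


def server_can_accept(server: smtplib.SMTP, recipient: str) -> bool:
    """Decide whether `server` can take mail for `recipient`.

    A non-ASCII domain can be rewritten to its xn-- form, but a non-ASCII
    local part can only be sent if the server advertised SMTPUTF8.
    """
    server.ehlo_or_helo_if_needed()
    if local_part_is_ascii(recipient):
        return True
    return bool(server.has_extn("smtputf8"))
```

Usage would be along the lines of `with smtplib.SMTP("mail.example.com") as s: server_can_accept(s, "josé@example.com")` (hostname is a placeholder). Python's own `SMTP.send_message` performs a similar check and raises `SMTPNotSupportedError` when the extension is missing.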
However, as others have pointed out, if this goes through, it will be nigh-on impossible to spot phished/faked email addresses, just as it is with IDN links unless you know how to find their original ASCII encoding.
This is a real concern, and probably why Gmail is not yet allowing internationalised Gmail addresses. Most email names could be spoofed using Cyrillic characters which look exactly the same as Latin ones. How could you tell if the "c" in chrisq@gmal.com really was a Latin 'c' or a Cyrillic Es?
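One answer to "how could you tell" is to compare a "skeleton" of each address, where lookalike characters are mapped to a common form, as Unicode Technical Standard #39 describes. A minimal sketch; the six-entry `CONFUSABLES` table is a hypothetical stand-in for the full confusables data a real system would load.

```python
import unicodedata

# Hypothetical, deliberately tiny lookalike table (Cyrillic -> Latin).
# A real system would load the full confusables data from UTS #39.
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
}


def skeleton(address: str) -> str:
    """Map lookalike characters to a common form so spoofs compare equal."""
    folded = unicodedata.normalize("NFKC", address).casefold()
    return "".join(CONFUSABLES.get(ch, ch) for ch in folded)


def looks_like(candidate: str, existing: str) -> bool:
    """True if two different addresses would render (nearly) the same."""
    return candidate != existing and skeleton(candidate) == skeleton(existing)
```

A provider could refuse to register, or flag on display, any address whose skeleton matches an existing one.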
My e-mail address ends with the suffix ".name". It is perfectly correct (even if not common), but I still sometimes have issues today because some stupid website has an outdated regular expression which says that ".name" is not correct.
Now imagine this with non-latin characters (or just non-ASCII characters)... If you only write to people also using GMail, it might work.
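The ".name" problem comes from regexes that hardcode TLD length or a TLD list. A minimal sketch of a domain check that only applies the DNS label rules (letters, digits, hyphens; 1 to 63 characters per label; no leading or trailing hyphen; at most 253 characters overall) and assumes any internationalised domain has already been converted to its xn-- form.

```python
import re

# One DNS label: letters, digits and hyphens, 1-63 characters,
# not starting or ending with a hyphen (the "LDH" rule).
LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_valid_ascii_domain(domain: str) -> bool:
    """Check an ASCII (or already xn--encoded) domain without assuming
    anything about TLD length, so ".name", ".museum" etc. pass."""
    if len(domain) > 253:
        return False
    labels = domain.rstrip(".").split(".")
    return all(LABEL.fullmatch(label) for label in labels)
```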
I hope they implement the same kind of anti-phishing measures that browsers are taking for displaying domain names with non-Latin scripts. http://en.wikipedia.org/wiki/I...
Also: thin space, zero-width space, zero-width non-joiner, and combining characters that combine in such a way that they effectively do nothing. There are a lot of possibilities, and if any of them are missed it will be a disaster.
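Several of the characters listed share Unicode general categories, which makes them easy to catch as a class: zero-width space, zero-width non-joiner and the bidi overrides are format characters (`Cf`), and thin space is a space separator (`Zs`). A minimal sketch of a check built on that assumption; it does not handle the "combiners that do nothing" case.

```python
import unicodedata

# General categories that render as nothing or as blank space:
# Cf = format (e.g. U+200B, U+200C, U+202E), Cc = control,
# Zs/Zl/Zp = separators (e.g. THIN SPACE U+2009).
INVISIBLE_CATEGORIES = {"Cf", "Cc", "Zs", "Zl", "Zp"}


def invisible_chars(address: str) -> list[str]:
    """Return the Unicode names of characters a reader would not see."""
    return [
        unicodedata.name(ch, f"U+{ord(ch):04X}")
        for ch in address
        if unicodedata.category(ch) in INVISIBLE_CATEGORIES
    ]
```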
I foresee a lot of pain coming from this.
Most email names could be spoofed using Cyrillic characters which look exactly the same as latin ones. How could you tell if the "c" in chrisq@gmal.com really was a latin 'c' or a cyrillic Es?
gmail.ru (or its equivalent) will find a way to support Cyrillic
gmail.qc.ca and gmail.fr will find ways to support French accents (otherwise, Google will get sued or blocked by Quebec or France)
These details will get worked out at the local level. It will take time, but they'll get there eventually.
The Latin alphabet is not American.
How on earth am I supposed to email someone when I don't even have a key that corresponds to a letter in their email address? And no, I'm not keeping a huge chart of Alt+number combinations handy.
Of course there is probably someone in China or Korea thinking "why do I have to use this special keyboard mode with characters I don't understand to write emails".
They might mark conspicuous characters, like when multiple character sets are combined in a single domainname.
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
You cannot use solely international characters; the first one needs to be plain ASCII.
What? Where do you get that from? TFA gives examples where the whole email address is in international characters (katakana).
Hit the 'Reply-To' button, naturally.
- After adding the user to your Address book.
http://www.irongeek.com/homogl...
Maybe now my e-mails to Tutankhamun will quit bouncing.
Sheesh, evil *and* a jerk. -- Jade
It would be easy to WARN a USER if the name contains mixed alphabets, or diacritics that differ from the user's browser's preferred language. Each Unicode character has a name, e.g. "Greek Upsilon With Hook Symbol", "Latin Capital Letter R", "Cyrillic Capital Letter Es With Descender", "Arabic Letter Qaf", or "CJK Ideograph" for Chinese/Korean/Japanese.
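A minimal sketch of that warning, assuming the first word of a character's Unicode name ("LATIN", "CYRILLIC", "GREEK", "CJK", ...) is a good enough guess at its script. That is only a heuristic: Python's `unicodedata` exposes names via `unicodedata.name` but not the real Script property (UAX #24). The `expected` set stands in for whatever the user's locale implies.

```python
import unicodedata


def rough_script(ch: str) -> str:
    """Guess a character's script from the first word of its Unicode name.

    Heuristic only; non-letters (digits, '@', '.', combining marks) are
    treated as belonging to no particular script.
    """
    if not ch.isalpha():
        return "COMMON"
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def script_warnings(address: str, expected: set[str]) -> list[str]:
    """List characters whose rough script is not one the user expects."""
    allowed = expected | {"COMMON"}
    return [
        f"{ch!r}: {unicodedata.name(ch, 'UNKNOWN')}"
        for ch in address
        if rough_script(ch) not in allowed
    ]
```

For a user expecting Latin, `script_warnings("\u0441hrisq@gmail.com", {"LATIN"})` flags the first character as CYRILLIC SMALL LETTER ES, while accented Latin such as "róót" passes.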
By default Ubuntu doesn't unless your codepage requires it. Most of the 'complete' Unicode fonts aren't included by default.
Yes, warning users works really well. Especially after decades of Windows training users to click accept on alerts without reading them.
Most email names could be spoofed using Cyrillic characters which look exactly the same as latin ones. How could you tell if the "c" in chrisq@gmal.com really was a latin 'c' or a cyrillic Es?
gmail.ru (or its equivalent) will find a way to support Cyrillic
gmail.qc.ca and gmail.fr will find ways to support French accents (otherwise, Google will get sued or blocked by Quebec or France)
These details will get worked out at the local level. It will take time, but they'll get there eventually.
I don't think that would work in protecting users against attacks unless you said that only users of gmail.ru could receive emails from users with Cyrillic characters in the name, etc.
Implementing proper domain and user authentication by baking PGP or some other PKI right into the email protocols will both solve the spam problem comprehensively AND allow UTF8 domains with minimum risk of phishing /spoofing.
I hate printers.
"+" or plenty of other special characters. Stuff like quotes can even be valid if used properly, while we still have some websites that won't even accept a dash/underscore.
That "signed char" was a bad coding choice back in the day.
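For the unquoted case, the RFC 5322 rule is short enough to copy exactly rather than guess at: a local part is one or more runs of "atext" characters (letters, digits and ``!#$%&'*+/=?^_`{|}~-``) separated by single dots. A minimal sketch; it deliberately ignores quoted local parts and assumes the 64-character local-part limit from RFC 5321.

```python
import re

# RFC 5322 "atext": letters, digits and these printable symbols.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"
# dot-atom: one or more atext runs separated by single dots
DOT_ATOM = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")


def local_part_ok(local: str) -> bool:
    """Accept unquoted RFC 5322 local parts (quoted strings not handled)."""
    return len(local) <= 64 and DOT_ATOM.fullmatch(local) is not None
```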
"+" or plenty of other special characters. Stuff like quotes can even be valid if used properly, while we still have some websites that won't even accept a dash/underscore.
I had to wait nearly 10 years for my ".name" domain to be accepted by most websites (say, 99.5%).
For "+" or other funny characters, my estimate is that you will need at least 10 years starting from now.
I would not hold my breath.
Because no language ever makes use of characters from other languages; I mean, surely Latin capital letter R is only used by Latin speakers. Seriously, you should get a better understanding of what you are saying before you make bold claims about how 'easy' something is going to be. Could it be done? Maybe. Will there be oversights, bugs and glitches for people to exploit? Almost definitely.
Actually, Unicode does make a good effort at classifying characters into scripts, with some "common" characters that can appear in any script and some "inherited" characters (like diacritics) that belong to the script of the character they are applied to. Thus the Cyrillic "Es" looks like a Latin "C" but is a different Unicode character, one belonging to the Cyrillic script and the other to the Latin script. Different languages using the same script is a red herring: it doesn't matter that both French and English use the capital "R"; what does matter is that you can't put a Cyrillic character into the middle of a Latin-script string to make something that looks like a certain name but isn't. Checking whether a name contains characters from more than one script is easy. There are methods in some languages that trivialise this.
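A minimal sketch of that check in Python, which has no Script property in `unicodedata`, so the script is guessed from the first word of each letter's Unicode name (an assumption, not UAX #24). Combining marks are skipped as "inherited" and non-letters as "common", as described above. Note that legitimate Japanese text mixes Han, Hiragana and Katakana, which UTS #39 handles with allowed script combinations; this sketch does not.

```python
import unicodedata


def scripts_used(text: str) -> set[str]:
    """Rough set of scripts in `text`, skipping 'common' characters
    (digits, punctuation) and 'inherited' ones (combining marks)."""
    scripts = set()
    for ch in text:
        category = unicodedata.category(ch)
        if category.startswith("M"):      # combining mark: inherited
            continue
        if not category.startswith("L"):  # digits, '@', '.', etc.: common
            continue
        scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(label: str) -> bool:
    """True if one local part or domain label mixes scripts."""
    return len(scripts_used(label)) > 1
```

It would typically be run separately on the local part and on each domain label.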
Interoperability is why we still write to anonymous@slashdot.org!mail.comcast.net!mail.myisp.com!gateway.local instead of just having globally resolvable addresses. Upgrading the infrastructure is just too hard and will never happen.
And don't forget that maybe some Chinese dude has a problem typing English (although I think most keyboards all around the world do keep the ASCII letters and basic ASCII punctuation, so there's that at least today...)
Phonetic entry using pinyin is still the most common method, which has been greatly sped up with predictive text like on cell phones, so the most common characters can be entered with a few keystrokes. Google Pinyin in this regard is, as the kids say, the shiznit.
Prisencolinensinainciusol. Ol Rait!
You are beneath contempt, and it would be otherwise intuitive that you should be ignored as an aberration. However, it is extremely important that decent people of good will realize that their opposites, people like you, are not an aberration, that you exist in the environment as a pervasive and pernicious evil, and therefore appropriate countermeasures must be put in place and vigilance maintained.
They better be filtering out the non-printing characters that do fun stuff like reverse the text direction, overstrike, etc. How long until people start registering gmail addresses with Zalgo text?
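The direction-reversing characters (e.g. U+202E RIGHT-TO-LEFT OVERRIDE) have general category `Cf` and can simply be rejected. Zalgo text is a different trick: many combining marks stacked on one base letter. A minimal sketch that caps marks per base character after NFC normalization; the limit of 2 is a hypothetical value, since some scripts legitimately stack several marks.

```python
import unicodedata

MAX_MARKS_PER_BASE = 2  # hypothetical limit; tune per script


def has_stacked_marks(text: str, limit: int = MAX_MARKS_PER_BASE) -> bool:
    """True if any base character carries more than `limit` combining
    marks in a row, the trick behind 'Zalgo' text."""
    run = 0
    # NFC first, so ordinary accents like e + U+0301 become one character.
    for ch in unicodedata.normalize("NFC", text):
        if unicodedata.category(ch).startswith("M"):
            run += 1
            if run > limit:
                return True
        else:
            run = 0
    return False
```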
And how long until someone registers pile of poo @gmail.com?
#naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
As a webdev who gets irritated at websites that fail badly with their email validation (e.g. not allowing + in the local part, or only allowing 2 or 3 char TLDs), I do try very hard to get this right. So I've got a solid(ish) email validation function. But, I'm a bit sketchy on what to do with UTF-8.
For the domain, I'd hope that the MTA (Postfix in my case) would allow UTF-8 and convert to punycode as required, but I'm not sure it does. So currently I don't allow for that. I _could_ convert to punycode myself, but I don't.
And as for the local part, I'm fairly certain Postfix doesn't allow for UTF-8 at present... at least, not the Postfix version supported on Debian 7.
So I'm just wondering what everyone else is doing? Should I improve my support, or should I just wait for support to be added to my MTA before I bother?
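One possible policy, sketched under the assumption that the app should not silently drop such addresses: accept a UTF-8 local part but record that it needs an SMTPUTF8-capable MTA (RFC 6531), and convert the domain to its xn-- form yourself with Python's built-in `idna` codec. The `ParsedAddress` type and `parse_address` name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ParsedAddress:
    local: str            # kept exactly as typed (may be UTF-8)
    ascii_domain: str     # xn-- form that DNS and most MTAs understand
    needs_smtputf8: bool  # True if the local part is non-ASCII


def parse_address(address: str) -> ParsedAddress:
    """Split at the last '@', IDNA-encode the domain, and note whether the
    local part needs an SMTPUTF8-capable MTA.

    Raises ValueError for an obviously malformed address.
    """
    local, at, domain = address.rpartition("@")
    if not at or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    try:
        ascii_domain = domain.encode("idna").decode("ascii")
    except UnicodeError as exc:
        raise ValueError(f"bad domain in {address!r}") from exc
    return ParsedAddress(local, ascii_domain, not local.isascii())
```

The app can then store the address as typed and decide, based on what its MTA supports, whether to send, queue, or warn when `needs_smtputf8` is set.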
Probably set up so that if the Russian address gets bounced, it tries again with the Latin alphabet.
Also, the signature of all emails sent from this should have a copy of the Latin email address, so that people who don't have the Russian capability can reply.
excitingthingstodo.blogspot.com
Usually with such things it's better to whitelist than blacklist. As you add characters to the whitelist, you determine what character they should be equivalent to for conflict-management purposes.
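A minimal sketch of that whitelist: every allowed character maps to a "conflict key", and a new name is refused if its key matches one already taken. The `ALLOWED` table below is a hypothetical example, not a recommended character set.

```python
# Hypothetical whitelist: each allowed character maps to the character it
# counts as when checking whether a new name collides with an old one.
ALLOWED = {ch: ch for ch in "abcdefghijklmnopqrstuvwxyz0123456789."}
ALLOWED.update({
    "\u00e9": "e",  # é
    "\u00f6": "o",  # ö
    "\u0430": "a",  # Cyrillic а
    "\u0441": "c",  # Cyrillic с
})


def conflict_key(name: str) -> str:
    """Map a requested username to its collision key.

    Raises ValueError if any character is not on the whitelist.
    """
    key = []
    for ch in name.lower():
        if ch not in ALLOWED:
            raise ValueError(f"character {ch!r} is not allowed")
        key.append(ALLOWED[ch])
    return "".join(key)


def can_register(name: str, taken_keys: set[str]) -> bool:
    return conflict_key(name) not in taken_keys
```

With "motorhead" already taken, "motörhead" is refused as a collision, and anything not on the list (a zero-width space, say) is rejected outright.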
Out of interest, does anyone know if people actually use internationalised domain names as their main domains, or if they stick to conventional names that work with all software and which everyone can type?
note: i'm known as plugwash most places but i screwed up registering that here somehow in the past and now can't register
Now that Google has implemented 2012 i18n technology, maybe vaunted technology site Slashdot can catch up to 1998 and implement UTF-8 properly?
Nah.
Liberty in your lifetime
I don't want to sound racist, but I've never heard of Jewish suicide bombers, Jewish plane hijackings, etc.
Get free satoshi (Bitcoin) and Dogecoins
Cause while his countrymen were running around killing sparrows with sticks at the behest of an insane, leftist ruler, the capitalist west had already been working on the transistor for 10 years, and was continuing working on improving the integrated circuit it had become part of and thus had a huge head start on defining the standards that would be used in a global communications network of billions of computers.
First off, I went down the slippery, death-defying slope of email address validation recently... Our software had simple regex rules, so I thought I would just implement the RFC rules, or find a library that did... wow. The RFC is a mess... the APIs are worse.
This is a valid email address:
dude"".dude@[192.168.1.1]
so is this:
a@com
also valid:
test+test=gmail.com@test.com
None of those will work in MS Outlook or Exchange, and none of them will work with the jQuery validation plug-in; some close to that will work with the JavaMail API. Most funky but standards-compliant email addresses will pass Apache Commons Validator.
In the end, I went with a two-part validation: 1) Apache Commons Validator (mostly RFC-correct), then 2) a second pass with javax.mail, because if I can't send email to it, then what is the point of having it? We still get addresses that pass both validations and bounce at some SMTP relay due to "invalid address format."
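Two of the examples above trip up naive regexes for simple reasons: the split must happen at the last '@', and a domain can be an IP literal in square brackets (RFC 5321 writes IPv6 literals as `[IPv6:...]`). A minimal sketch of just those two steps, using Python's `ipaddress` module; it does not validate the quoted local-part grammar, and the function names are hypothetical.

```python
import ipaddress


def split_address(address: str) -> tuple[str, str]:
    """Split at the LAST '@' (a quoted local part may itself contain '@')."""
    local, at, domain = address.rpartition("@")
    if not at or not local or not domain:
        raise ValueError(f"no local part or domain in {address!r}")
    return local, domain


def domain_literal_ip(domain: str):
    """Return the IP address for a domain literal such as [192.168.1.1],
    or None if `domain` is an ordinary host name."""
    if not (domain.startswith("[") and domain.endswith("]")):
        return None
    inner = domain[1:-1]
    if inner.upper().startswith("IPV6:"):  # RFC 5321 form: [IPv6:...]
        inner = inner[5:]
    return ipaddress.ip_address(inner)     # raises ValueError if malformed
```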
I am sure internationalization will make all this better.
I wouldn't be surprised to see l'Office québécois de la langue française do something like that. I speak French and I still think they're assholes who are over-reaching their boundaries.
Isn't this something that was introduced years ago?
If a Chinese person and a Russian swap business cards and both have used their own scripts for their email addresses, are their thoughts going to be "great" or "how the fuck do I type this?"
My guess is that nationalists who don't care about the world beyond their countries' borders may adopt this; those who care about being part of the global community (or simply about interoperating with older software) will avoid it like the plague it is.
Posts like these are just random pot shots looking for a response. Chances are he doesn't even believe what he says; rather, he just wants to cause somebody to come out and speak in a righteous manner. Mission accomplished, I think?
Take "mathematical letter kappa" and "latin k" for example, do you think your mom will be able to tell the difference?
To be fair, they do have different script values, so they would be identified by the proposal.
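Styled "mathematical" letters can also be neutralised before any script check: they have compatibility decompositions, so NFKC normalization maps them to ordinary letters. A minimal sketch; note that the mathematical kappa folds to Greek kappa, not Latin k, so a Greek-versus-Latin check is still needed afterwards.

```python
import unicodedata


def fold(text: str) -> str:
    """NFKC-normalize and case-fold, collapsing styled 'mathematical'
    letters and compatibility symbols to their ordinary counterparts."""
    return unicodedata.normalize("NFKC", text).casefold()


math_kappa = unicodedata.lookup("MATHEMATICAL ITALIC SMALL KAPPA")
math_k = unicodedata.lookup("MATHEMATICAL BOLD SMALL K")

print(unicodedata.name(fold(math_kappa)))  # GREEK SMALL LETTER KAPPA
print(fold(math_k) == "k")                 # True
```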
That would probably work reasonably well for Greek and Cyrillic scripts.
For other scripts, have fun dealing with weird rules for mixing LTR and RTL characters, characters that join together into something that looks more like squiggly handwriting than what we would recognise as printed text, or a sea of thousands of characters that all look very similar to the Western eye.
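The LTR/RTL part can at least be detected mechanically from each character's bidirectional class, which Python exposes as `unicodedata.bidirectional`. A minimal sketch that flags strings mixing strong left-to-right and strong right-to-left letters; it is a much simplified version of the idea behind the IDNA Bidi rule (RFC 5893), not an implementation of it.

```python
import unicodedata


def mixes_directions(text: str) -> bool:
    """True if `text` contains both strong left-to-right ('L') and strong
    right-to-left ('R' or 'AL') characters."""
    classes = {unicodedata.bidirectional(ch) for ch in text}
    return "L" in classes and bool(classes & {"R", "AL"})
```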
The same way they email you when they don't even have a key that corresponds to a letter in your email address.
Cause now that the capitalist west has offshored the production of those billions of transistors, integrated circuits and computers to his countrymen, we need to be able to e-mail them.
I don't know about the c, but that "gmal" domain looks mighty suspicious.
Cwm, fjord-bank glyphs vext quiz
"+"-characters keep out an amazing amount of spam. Please do not teach the world to recognize them.
Finally! A year of moderation! Ready for 2019?
...between these two addresses:
firstname.lastname@gmail.com
firstnamelastname@gmail.com
I keep getting email at the former addressed to the latter. Anyone else encounter this oddity with Gmail?
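That oddity is expected: for @gmail.com addresses, Gmail ignores dots in the username, so both addresses reach the same mailbox, and anything after a "+" is treated as a tag. A minimal sketch of collapsing such aliases to one key, assuming that behaviour; other providers generally do not work this way, so it is applied only to gmail.com. The function name is hypothetical.

```python
def canonical_gmail(address: str) -> str:
    """Collapse Gmail aliases to one mailbox key.

    Assumes Gmail's behaviour for @gmail.com: dots in the username are
    ignored and anything after '+' is a tag. Other domains are returned
    unchanged.
    """
    local, _, domain = address.rpartition("@")
    domain = domain.lower()
    if domain != "gmail.com":
        return address
    local = local.split("+", 1)[0].replace(".", "").lower()
    return f"{local}@{domain}"
```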
Wuss. I prefer the old days when you could just send mail to me at c=uk cn=firstname; kids today can't parse an X.400 address to save their lives. And yes, I did have root on the UK's ADMD ;-)
The "From:" header has been spoofable in ASCII since the beginning of e-mail. Given its unreliability, you are foolish if you put much stock into it.
Fascism should more properly be called corporatism because it is the merger of state and corporate power. -- Mussolini
Of course there is probably someone in China or Korea thinking "why do I have to use this special keyboard mode with characters I don't understand to write emails".
Any educated person there, and in countries that use Cyrillic in case you wondered, will learn the Latin alphabet in school. By the way, their keyboards always have the Latin alphabet on them along with symbols for certain characters in their own writing systems.
The very first suicide bomber in occupied Palestine was in fact a Zionist Israeli Jew...
We can just use the corollary of the standard westerner's mode of improving communications with furriners of raising your voice, ALL CAPS!!!
Try using those email addresses to register on various web sites and watch them say "invalid email address".