Gmail Recognizes Addresses Containing Non-Latin Characters
An anonymous reader writes: In response to the creation in 2012 by the Internet Engineering Task Force (IETF) of "a new email standard that supports addresses incorporating non-Latin and accented Latin characters", Google has now made it possible for Gmail users to "send emails to, and receive emails from, people who have these characters in their email addresses." Google's goal is to eventually let users create Gmail addresses that use these characters.
So the next lot of phishing will come from: róót@gmail.com / Àdministrator@gmail.com or BìllGàtes@gmail.com etc?
Great.
Sendmail is like emacs: A nice operating system, but missing an editor and an MTA.
Google updated their regular expression. Good for them.
Finally I can get motörhead@gmail.com!
From what I can tell, a mail server has two options when receiving this mail:
Accept it.
Reject it.
The default, with software that doesn't understand this RFC yet (which seems to be... just about everything), is to reject. So trying to use this as an email address is not only going to mess up every form you try to fill in online (because they won't see it as an email address either), but will quite likely just get you bouncebacks from everyone you email.
What was needed was surely a system similar to the IDN system for internationalisation, which would allow those with ASCII-only DNS servers etc. to STILL WORK, by converting the Unicode characters to ASCII subsets and then sending the email as normal, through the entire PLANET-worth of working email servers out there that could accept it.
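For reference, this is roughly what the IDN-style conversion looks like for the domain part of an address. It is a minimal sketch using Python's built-in "idna" codec (which implements the older IDNA 2003 rules); the function name and the sample domain are made up for illustration. Note that this encoding applies to domain names only, not to the part before the @.

```python
def domain_to_ascii(domain: str) -> str:
    """Convert an internationalized domain name to its ASCII (punycode) form."""
    # The stdlib "idna" codec encodes each label separately and adds the
    # "xn--" prefix to labels that contain non-ASCII characters.
    return domain.encode("idna").decode("ascii")


def domain_to_unicode(ascii_domain: str) -> str:
    """Reverse the conversion, recovering the Unicode form."""
    return ascii_domain.encode("ascii").decode("idna")


print(domain_to_ascii("bücher.example"))           # xn--bcher-kva.example
print(domain_to_unicode("xn--bcher-kva.example"))  # bücher.example
```

An ASCII-only DNS server only ever sees the xn-- form, which is why IDN did not require every server to be upgraded.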
Having a content negotiation option at the SMTP level, which mail servers have to implement and handle specifically, is just ridiculous. Even with Gmail's kickstart, it could be decades before you can guarantee that your UTF-8 email address will work across the Internet, and even then there'll be some old legacy server that will just bounce all your email BECAUSE of that character set in your address. And it will be perfectly legitimate to do so.
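The negotiation in question is the SMTPUTF8 keyword from RFC 6531: a server that understands internationalized addresses lists it in its reply to EHLO, and a sender is only supposed to use such addresses if it sees the keyword. Here is a rough sketch of that check; the function name and the sample reply are invented for illustration, and the parsing is deliberately simplistic.

```python
def advertises_smtputf8(ehlo_reply: str) -> bool:
    """Return True if a multi-line EHLO reply lists the SMTPUTF8 keyword."""
    for line in ehlo_reply.splitlines():
        # Reply lines look like "250-KEYWORD params", or "250 KEYWORD" on
        # the final line; anything else is ignored in this sketch.
        if len(line) < 4 or not line.startswith("250"):
            continue
        keyword = line[4:].split(" ", 1)[0]
        if keyword.upper() == "SMTPUTF8":
            return True
    return False


# Hypothetical reply from a server that supports the extension.
SAMPLE_REPLY = (
    "250-mx.example.org\r\n"
    "250-SIZE 35882577\r\n"
    "250-8BITMIME\r\n"
    "250 SMTPUTF8"
)

print(advertises_smtputf8(SAMPLE_REPLY))
```

Against a live server, Python's smtplib offers the same check via SMTP.has_extn("smtputf8") after calling ehlo(). A server that never lists the keyword is the legacy case described above.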
However, as others have pointed out, if this goes through, it will be nigh-on impossible to spot phished/faked email addresses, just like it is with IDN links unless you know how to find the original ASCII-encoding of them.
This is a real concern, and probably why Gmail is not yet allowing internationalised Gmail addresses. Most email names could be spoofed using Cyrillic characters which look exactly the same as Latin ones. How could you tell if the "c" in chrisq@gmal.com really was a Latin 'c' or a Cyrillic Es?
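By eye you can't, but software can: the two letters are different code points with different Unicode names. A small sketch using Python's standard unicodedata module (the helper name is made up for illustration):

```python
import unicodedata
from typing import List


def describe(text: str) -> List[str]:
    """List each character's code point and official Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}" for ch in text]


print(describe("c"))        # ['U+0063 LATIN SMALL LETTER C']
print(describe("\u0441"))   # ['U+0441 CYRILLIC SMALL LETTER ES']
```

That only helps if the mail client chooses to show it, of course.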
My e-mail address ends with the suffix ".name". It is perfectly correct (even if not common), but I still sometimes have issues today because some stupid website has an outdated regular expression which says that ".name" is not correct.
Now imagine this with non-Latin characters (or just non-ASCII characters)... If you only write to people also using Gmail, it might work.
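To make the failure mode concrete, here is the kind of outdated pattern being complained about. The regex and the sample addresses are invented for illustration and are not taken from any particular website.

```python
import re

# A typical "outdated" validator: ASCII-only local part, no "+",
# and a TLD of exactly two or three letters.
NAIVE_EMAIL = re.compile(
    r"^[A-Za-z0-9._-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,3}$"
)


def naive_is_valid(address: str) -> bool:
    return NAIVE_EMAIL.match(address) is not None


for candidate in (
    "someone@example.com",    # accepted
    "someone@example.name",   # rejected: four-letter TLD
    "user+tag@example.com",   # rejected: "+" in the local part
    "motörhead@gmail.com",    # rejected: non-ASCII local part
):
    print(candidate, naive_is_valid(candidate))
```

Every rejected address above is one that a real mail system could deliver to, given the right support.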
They might flag suspicious characters, for example when multiple character sets are combined in a single domain name.
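A crude version of that check can be sketched in a few lines. Python's standard library does not expose the Unicode Script property, so this sketch assumes the first word of each character's Unicode name is a good enough stand-in for its script; the function names are hypothetical.

```python
import unicodedata


def scripts_used(label: str) -> set:
    """Collect a rough script tag for every letter in the label."""
    scripts = set()
    for ch in label:
        if not ch.isalpha():
            continue  # digits, dots and hyphens are shared across scripts
        # Heuristic: "LATIN SMALL LETTER A" -> "LATIN", and so on.
        scripts.add(unicodedata.name(ch, "UNKNOWN").split(" ")[0])
    return scripts


def looks_mixed(label: str) -> bool:
    """Flag labels whose letters come from more than one script."""
    return len(scripts_used(label)) > 1


print(looks_mixed("gmail"))        # False: all Latin
print(looks_mixed("gm\u0430il"))   # True: U+0430 is CYRILLIC SMALL LETTER A
```

A name written entirely in look-alike Cyrillic letters would pass this test, so it catches only the mixed case.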
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
Hit the 'Reply-To' button, naturally.
- After adding the user to your Address book.
As a webdev who gets irritated at websites that fail badly with their email validation (e.g. not allowing + in the local part, or only allowing 2 or 3 char TLDs), I do try very hard to get this right. So I've got a solid(ish) email validation function. But, I'm a bit sketchy on what to do with UTF-8.
For the domain, I'd hope that the MTA (Postfix in my case) would allow UTF-8 and convert to punycode as required, but I'm not sure it does. So currently I don't allow for that. I _could_ convert to punycode myself, but I don't.
And as for the local-part, I'm fairly certain Postfix doesn't allow for UTF-8 at present... at least, not the Postfix version supported on Debian 7.
So I'm just wondering what everyone else is doing? Should I improve my support, or should I just wait for support to be added to my MTA before I bother?
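One possible middle ground, sketched under the assumptions stated in the comment above (the MTA can deliver to a punycode domain but cannot carry a non-ASCII local part): convert the domain yourself and refuse UTF-8 local parts for now. The function name is hypothetical, and this is a policy sketch, not a full RFC validator.

```python
def normalize_address(address: str):
    """Return (local_part, ascii_domain), or None if the address is refused."""
    # Split on the last "@" so odd but legal local parts stay intact.
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return None
    # Assumption: the MTA cannot carry a non-ASCII local part yet.
    if not local.isascii():
        return None
    try:
        # Built-in IDNA 2003 codec; raises UnicodeError on bad labels.
        ascii_domain = domain.encode("idna").decode("ascii")
    except UnicodeError:
        return None
    return local, ascii_domain


print(normalize_address("user+tag@bücher.example"))
print(normalize_address("motörhead@gmail.com"))
```

The first call yields an address the MTA only ever sees in ASCII; the second is refused until the MTA itself gains support.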
First off, I went down the slippery, death-defying slope of email address validation recently. Our software had simple regex rules, so I thought I would just implement the RFC rules, or find a library that did. Wow. The RFC is a mess, and the APIs are worse.
This is a valid email address:
dude"".dude@[192.168.1.1]
so is this:
a@com
also valid:
test+test=gmail.com@test.com
None of those will work in MS Outlook or Exchange, and none of them will work with the jQuery validation plug-in. Some addresses close to those will work with the JavaMail API. Most funky but standards-compliant email addresses will pass Apache Commons validation.
In the end, I went with a two-part validation: first Apache Commons Validation (mostly RFC correct), then a second pass through javax.mail, because if I can't send email to it, then what is the point of having it? We still get addresses that pass both validations and then bounce at some SMTP relay due to "invalid address format."
I am sure internationalization will make all this better.
In case you are asking this for real, this is a documented Gmail feature.
https://support.google.com/mail/answer/10313?hl=en
You can actually log in with any variant of your username that includes 1 or more periods added in arbitrary locations.
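If you need to treat those variants as the same account (say, for de-duplicating sign-ups), a minimal sketch could look like this. The function name is made up, and it assumes the dot rule applies only to addresses at gmail.com.

```python
def canonical_gmail(address: str) -> str:
    """Strip periods from the local part of a gmail.com address."""
    local, sep, domain = address.rpartition("@")
    # Assumption: only gmail.com ignores dots; other domains are untouched.
    if sep and domain.lower() == "gmail.com":
        local = local.replace(".", "")
    return f"{local}{sep}{domain}"


print(canonical_gmail("john.smith@gmail.com"))      # johnsmith@gmail.com
print(canonical_gmail("j.o.h.n.smith@gmail.com"))   # johnsmith@gmail.com
print(canonical_gmail("john.smith@example.com"))    # john.smith@example.com
```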