Slashdot Mirror


Gmail Recognizes Addresses Containing Non-Latin Characters

An anonymous reader writes: In response to the creation in 2012 by the Internet Engineering Task Force (IETF) of "a new email standard that supports addresses incorporating non-Latin and accented Latin characters", Google has now made it possible for its Gmail users to "send emails to, and receive emails from, people who have these characters in their email addresses." Google's goal is to eventually allow its users to create Gmail addresses utilizing these characters.

25 of 149 comments

  1. Next wave of phishing? by CRC'99 · · Score: 4, Funny

    So the next lot of phishing will come from: róót@gmail.com / Àdministrator@gmail.com or BìllGàtes@gmail.com etc?

    Great.

    --
    Sendmail is like emacs: A nice operating system, but missing an editor and a MTA.
    1. Re:Next wave of phishing? by rvw · · Score: 2

      So the next lot of phishing will come from: róót@gmail.com / Àdministrator@gmail.com or BìllGàtes@gmail.com etc?

      It's not about bìllgàtes@outlook.com, but billgates@óutlook.com. It's the domain that is going to cause problems, not the user!

    2. Re:Next wave of phishing? by Captain_Chaos · · Score: 4, Insightful

      Worse; they will come from root@gmail.com, administrator@gmail.com or BillGates@gmail.com, only those o's and a's will be Cyrillic or something like that (can't do it here; Slashdot doesn't display them).

    3. Re:Next wave of phishing? by dejanc · · Score: 2

      That kind of phishing already exists, even more sophisticated: a bug that a lot of software contains is not distinguishing between same-looking characters in different alphabets. E.g. you can sign up on many forum/bbs platforms as Administrator if your leading A is Cyrillic A instead of Latin A. Both look the same but have different HTML entity codes and are different Unicode characters, which is true for most vowels and many consonants (e.g. Cyrillic B and Latin B, C and C, E and E...). Or, for more fun, look at this (single) character which looks exactly like "lj".
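
A minimal Python sketch of the look-alike problem described above, using only the standard library. The names `latin_name`, `spoofed_name` and `describe` are made up for illustration; the point is only that the two strings render alike while comparing unequal.

```python
import unicodedata

# Hypothetical usernames: the second starts with CYRILLIC CAPITAL LETTER A.
latin_name = "Administrator"
spoofed_name = "\u0410dministrator"


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# The strings look the same on screen but are different sequences of code points.
print(latin_name == spoofed_name)  # False
print(describe(latin_name[0]))     # U+0041 LATIN CAPITAL LETTER A
print(describe(spoofed_name[0]))   # U+0410 CYRILLIC CAPITAL LETTER A
```

A naive "is this username taken?" check that compares raw strings would treat these as two distinct accounts.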

      Those of us with customers who use two alphabets constantly have known about this problem for a long time and we've seen phishing on all different kinds of platforms using this strategy.

      IDN (internationalized domain names) solves this problem in domain names with policy: you can't register a domain which looks exactly like some other domain except for that change in character. Still though, you can register both casino.it and casinò.it and that's where the real phishing potential is. I think most people, at least most native English speakers, would probably be fooled more easily by a domain such as paypal-customer-division.com than paypàl.com.

    4. Re:Next wave of phishing? by rvw · · Score: 2

      Worse; they will come from root@gmail.com, administrator@gmail.com or BillGates@gmail.com, only those o's and a's will be Cyrillic or something like that (can't do it here; Slashdot doesn't display them).

      When you mix Latin htmail with a Cyrillic o to get hotmail, Google and all email programs should refuse that address immediately, mark it as spam, make the address red with a warning sign etc. Mixing character sets should not be allowed in a domain or in a username. So the username may be all Cyrillic or Greek, the domain name may be all Chinese or Latin, and these may mix, but no mixes in the domain name or username itself.

    5. Re:Next wave of phishing? by Chrisq · · Score: 2

      I think that's the way to go - only allow characters from a single Unicode script in the username and in the domain name. The domain name part is currently handled by registrars so that may not need any additional rules.

      However this really should be part of the RFC, or else anyone banning mixed names would be "non compliant". If the RFC does not specify this then the best that gmail (or any other system) could do would be to prevent people registering mixed names themselves and to give a warning (and maybe colour characters) if email is received from an address with mixed scripts.
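
The single-script rule discussed above could be approximated along these lines. This is only a rough sketch: Python's standard library has no real Unicode script property, so the hypothetical helper `scripts_in` uses the first word of each character's Unicode name ("LATIN", "CYRILLIC", "GREEK", ...) as a stand-in, which is an assumption and not how any mail provider is known to do it.

```python
import unicodedata


def scripts_in(text: str) -> set:
    """Approximate the set of scripts used by the letters in `text`."""
    scripts = set()
    for ch in text:
        if ch.isalpha():
            # Heuristic: first word of the character name, e.g. "LATIN".
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(address: str) -> bool:
    """True if the local part or the domain mixes letters from several scripts."""
    local, _, domain = address.rpartition("@")
    return any(len(scripts_in(part)) > 1 for part in (local, domain))


print(is_mixed_script("bob@hotmail.com"))                # False: all Latin
print(is_mixed_script("bob@h\u043etmail.com"))           # True: Cyrillic o in a Latin domain
print(is_mixed_script("\u0431\u043e\u0431@hotmail.com")) # False: Cyrillic user, Latin domain
```

Accented Latin letters such as ó still count as Latin here, so this would not flag róót@gmail.com; it only catches cross-script mixes.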

    6. Re:Next wave of phishing? by ArsenneLupin · · Score: 2

      ... and they'll use a greek lower case omicron (), rather than an accented o. The looks exactly the same as an o (except on Slashdot, of course. Slashdot hates Unicode...)

    7. Re:Next wave of phishing? by Megane · · Score: 3, Funny

      I think ròót@gmail.com is a better choice because it looks angry.

      --
      #naabhaprzrag, #sverubfr-000, #agi-fcbafberq, negvpyr[pynff*=' negvpyr-ary-'] { qvfcynl: abar !vzcbegnag; }
  2. Well, I'm impressed. by Anonymous Coward · · Score: 3, Insightful

    Google updated their regular expression. Good for them.

    1. Re:Well, I'm impressed. by Chrisq · · Score: 5, Informative

      I would imagine that they implemented RFC 6532, which involves a lot more than changing a regular expression.

  3. Metal umlaut! by Anonymous Coward · · Score: 5, Funny

    Finally I can get motörhead@gmail.com!

    1. Re:Metal umlaut! by jones_supa · · Score: 2

      I will represent myself as a shady unofficial sales representative for an Australian microphone brand.

    2. Re:Metal umlaut! by Deep+Esophagus · · Score: 2

      Finally I can get motörhead@gmail.com!

      This is exactly what is going to happen, and I don't mean that in a good way. I already see it in other chat environments, like Second Life, where the full power of Unicode allows any and all characters in usernames. It's bad enough that they substitute Latin letters with superficially similar characters from other languages so we end up with names like ££¥ and , but miles of decorative symbols drawn from Braille and mathematics... and don't even get me started about the entire upside-down alphabet. These typographic idiots don't realize or care that they are making their names (and often text) completely unreadable, as long as it looks cool.

      Thanks for stripping the illustrative part of my post out, Slashdot. The first name should have shown a Greek Beta, an i with a little circle over it (1F34), then the Pound and Yen symbols for "Billy". The next example, "Sarah", should have shown an Arabic Kaf, a Greek A with a bunch of curlicues (1F8C), the Cyrillic Ya (backwards R), another A, and a Cyrillic N (looks like H) with more curlicues (04A2).

      Anyhow, you get the idea.

  4. Sigh by ledow · · Score: 5, Insightful

    From what I can tell, a mail server has two options when receiving this mail:

    Accept it.
    Reject it.

    The default, with software that doesn't understand this RFC yet (which seems to be... just about everything), is to reject. So trying to use this as an email is not only going to mess up every form you try to fill in online (because they won't see it as an email address either), but quite likely just gets you bouncebacks from everyone you email.

    What was needed was surely a system similar to the IDN system for internationalisation, which would allow those with ASCII-only DNS servers etc. to STILL WORK, by converting the Unicode characters to ASCII subsets and then sending the email as normal, through the entire PLANET-worth of working email servers out there that could accept it.

    Having a content negotiation option at the SMTP level, that mail servers have to implement and handle specifically, is just ridiculous, and even with GMail's kickstart it could be decades before you can guarantee that your UTF-8 email address will work across the Internet and even then there'll be some old legacy server that will just bounce all your email BECAUSE of that character set in your address. And it will be perfectly legitimate to do so.

    However, as others have pointed out, if this goes through, it will be nigh-on impossible to spot phished/faked email addresses, just like it is with IDN links unless you know how to find the original ASCII-encoding of them.

    1. Re:Sigh by hawkinspeter · · Score: 3, Funny

      Nope, there's four options:

      Accept it.
      Reject it.
      Temporary failure, try again later.
      User not local, will forward to <somewhere>.
      Syntax error, command unrecognised.

      Wait, I'll come in again...

      --
      You're a temporary arrangement of matter sliding towards oblivion in a cold, uncaring universe
  5. Re:Dammit this is a terrible idea by Chrisq · · Score: 5, Insightful

    This is a real concern, and probably why gmail is not yet allowing internationalised gmail addresses. Most email names could be spoofed using Cyrillic characters which look exactly the same as Latin ones. How could you tell if the "c" in chrisq@gmail.com really was a Latin 'c' or a Cyrillic Es?

  6. Good luck by Pascal+Sartoretti · · Score: 4, Interesting

    My e-mail address ends with the suffix ".name". It is perfectly correct (even if not common), but I still sometimes have issues today because some stupid website has an outdated regular expression which says that ".name" is not correct.

    Now imagine this with non-latin characters (or just non-ASCII characters)... If you only write to people also using GMail, it might work.

    1. Re: Good luck by Pascal+Sartoretti · · Score: 2

      It's not stupid website. It's just stupid 4 char tld that shouldn't exist according to the standards.

      Please show me in RFC 1035 where you see this 3 letter limitation.

      By the way, the ".arpa" pseudo-domain has always existed.

      There are myriad of validators out there that will reject it

      No, most validators correctly implement the standard, only a handful are incorrect.

      they worked well for decades.

      Something on the web that worked well for decades has necessarily been enhanced at some point...

  7. Re:Dammit this is a terrible idea by mwvdlee · · Score: 2

    They might mark conspicuous characters, like when multiple character sets are combined in a single domain name.

    --
    Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
  8. Re:Great idea by OolimPhon · · Score: 2

    Hit the 'Reply-To' button, naturally.

    - After adding the user to your Address book.

  9. What should the rest of us do? by Zaiff+Urgulbunger · · Score: 2

    As a webdev who gets irritated at websites that fail badly with their email validation (e.g. not allowing + in the local part, or only allowing 2 or 3 char TLDs), I do try very hard to get this right. So I've got a solid(ish) email validation function. But, I'm a bit sketchy on what to do with UTF-8.

    For the domain, I'd hope that the MTA (Postfix in my case) would allow UTF-8 and convert to punycode as required, but I'm not sure it does. So currently I don't allow for that. I _could_ convert to punycode myself, but I don't.
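
For what it's worth, converting only the domain part to punycode can be sketched with Python's built-in `idna` codec. The function name `domain_to_ascii` is hypothetical, and the built-in codec follows the older IDNA 2003 rules (the third-party `idna` package implements IDNA 2008), so treat this as an illustration and not a drop-in for any particular MTA setup.

```python
def domain_to_ascii(address: str) -> str:
    """Return `address` with its domain converted to ASCII (punycode) form.

    The local part is left untouched: punycode only applies to domain names.
    """
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not a user@domain address: {address!r}")
    # The "idna" codec encodes each label, adding the "xn--" prefix where needed.
    ascii_domain = domain.encode("idna").decode("ascii")
    return f"{local}@{ascii_domain}"


print(domain_to_ascii("user@b\u00fccher.example"))  # user@xn--bcher-kva.example
print(domain_to_ascii("someone@casin\u00f2.it"))
print(domain_to_ascii("user@example.com"))          # user@example.com
```

A plain ASCII domain passes through unchanged, so the function can be applied to every address before handing it to the MTA.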

    And as for the local-part, I'm fairly certain Postfix doesn't allow for UTF-8 at present.... at least, not the Postfix version supported on Debian 7.

    So I'm just wondering what everyone else is doing? Should I improve my support, or should I just wait for support to be added to my MTA before I bother?

    1. Re:What should the rest of us do? by Richy_T · · Score: 2

      I don't see much point getting anal about email validation, especially since it's a fairly hard problem. It's been a while since I've written one but something along the lines of something@something.something is usually enough and let the mail servers sort out the rest.
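
A deliberately loose check in the something@something.something spirit might look like the sketch below. The pattern and the name `looks_like_email` are illustrative assumptions, not a standards-compliant validator.

```python
import re

# One "@", no whitespace, and at least one dot somewhere in the domain part.
LOOSE_EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def looks_like_email(value: str) -> bool:
    """Very permissive sanity check; real validation is left to the mail server."""
    return LOOSE_EMAIL.fullmatch(value) is not None


print(looks_like_email("user@example.com"))              # True
print(looks_like_email("test+test=gmail.com@test.com"))  # True
print(looks_like_email("mot\u00f6rhead@gmail.com"))      # True
print(looks_like_email("a@com"))                         # False
print(looks_like_email("no-at-sign"))                    # False
```

Because the character classes only exclude "@" and whitespace, non-ASCII local parts and long TLDs such as ".name" pass, while a dotless address like a@com is rejected even though some would argue it is technically valid.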

  10. address standards are a nightmare by netsavior · · Score: 2

    first off, I went down the slippery death defying slope of email address validation recently... Our software had simple regex rules... so I thought I would just implement RFC rules, or find a library that did... wow. RFC is a mess... APIs are worse.
    This is a valid email address:
    dude"".dude@[192.168.1.1]
    so is this:
    a@com
    also valid:
    test+test=gmail.com@test.com
    none of those will work in MS Outlook or exchange, none of them will work with jquery validation plug-in, some close to that will work with java mail API. Most funky but standards compliant email addresses will pass Apache commons validation.

    In the end, I went with a 2 part validation: 1) Apache Commons Validation (mostly RFC correct), then 2) a second pass on javax.mail because if I can't send email to it, then what is the point of having it? We still get addresses that pass both validations, and bounce at some SMTP relay due to "invalid address format."

    I am sure internationalization will make all this better.

    1. Re:address standards are a nightmare by allo · · Score: 2

      just send a mail. if it fails, discard the pending registration or whatever, possibly via "not confirmed" timeout some days later.

  11. Re:If only they recognized the difference... by devman · · Score: 2

    In case you are asking this for real, this is a documented gmail feature.

    https://support.google.com/mail/answer/10313?hl=en

    You can actually log in with any variant of your username that includes 1 or more periods added in arbitrary locations.
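
If you need to treat such variants as the same account on your own side, a small normalisation sketch is below. The function name `canonical_gmail` and the choice to apply the rule only to gmail.com are assumptions made for illustration; other providers do not generally ignore periods.

```python
def canonical_gmail(address: str) -> str:
    """Collapse period variants of a gmail.com address into one form.

    Addresses at other domains are returned with only the domain lower-cased.
    """
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError(f"not a user@domain address: {address!r}")
    domain = domain.lower()
    if domain == "gmail.com":
        # Gmail ignores periods in the username, so strip them before comparing.
        local = local.replace(".", "")
    return f"{local}@{domain}"


print(canonical_gmail("john.smith@gmail.com"))     # johnsmith@gmail.com
print(canonical_gmail("j.o.h.n.smith@Gmail.com"))  # johnsmith@gmail.com
print(canonical_gmail("john.smith@example.com"))   # john.smith@example.com
```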