Slashdot Mirror


Why Google's Gmail Phishing Warnings Give False Positives (vortex.com)

Vortex.com is one of the oldest domains on the internet -- one of the first 40 ever registered, writes Slashdot reader Lauren Weinstein. So why does Google sometimes block the email he sends? "Here's why. First, my message had the audacity to mention 'Google Account' or 'Google Accounts' in the subject and/or body of the message. And secondly, one of my mailing lists is 'google-issues' -- so some (digest format) recipients received the email from 'google-issues-request@vortex.com'... Apparently what we're dealing with here is a simplistic (and frankly, rather haphazard in this respect at least) string-matching algorithm that could have come right out of the early 1970s...! [A]t least in this case, it appears that Google is basically using the venerable old UNIX/Linux 'grep' command or some equivalent, and in a rather slipshod way, too."
In addition, the article concludes, "I've never found a way to get Google to 'whitelist' well-behaved senders against these kinds of errors, so some users see these false phishing warnings repeatedly."
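
To make the author's hypothesis concrete, here is a hypothetical sketch of the kind of context-free string matching he suspects. The rules, phrase list and function name are assumptions made up for illustration; nothing here reflects Google's actual filter. Note how both of his triggers, the list's request address and the phrase "Google Accounts", fire on raw substrings alone:

```python
# Hypothetical rules for illustration only; these are NOT Google's actual rules.
SENSITIVE_PHRASES = ("google account",)
PROTECTED_BRAND = "google"


def naive_phishing_flag(sender: str, subject: str, body: str) -> bool:
    """Grep-style check: flag on raw substring matches, ignoring context."""
    local_part, _, domain = sender.lower().partition("@")
    text = f"{subject}\n{body}".lower()
    # Rule 1: brand name anywhere in the sender's local part, from a non-Google domain.
    from_google = domain == "google.com" or domain.endswith(".google.com")
    brand_in_sender = PROTECTED_BRAND in local_part and not from_google
    # Rule 2: sensitive phrase anywhere in subject or body ("Google Accounts" matches too).
    phrase_hit = any(p in text for p in SENSITIVE_PHRASES)
    return brand_in_sender or phrase_hit


digest_flagged = naive_phishing_flag(
    "google-issues-request@vortex.com", "google-issues Digest", "Today's messages")
article_flagged = naive_phishing_flag(
    "lauren@vortex.com", "Protecting your Google Accounts", "Some advice")
plain_flagged = naive_phishing_flag("lauren@vortex.com", "Hello", "Lunch?")
```

A filter like this has no notion of who the sender is or why the words appear, which is exactly the failure mode described above.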

49 comments

  1. Probably an acceptable trade-off for Google by vadim_t · · Score: 4, Interesting

    With the huge volumes of data that Google handles, it's probably hard to do any better.

    AI-style approaches can fail in quite unpredictable ways, and I think Google much prefers blocking too much over letting through something obviously fishy that slips past the algorithm for some obscure reason.

    Sometimes simple approaches are the way to go. You're going to have false positives and false negatives no matter what; the question is how many, and in what circumstances. And this particular scenario is unlikely to be all that common.
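
The trade-off vadim_t describes can be shown with any score-based filter: the threshold decides which error you get more of. A minimal sketch, assuming a classifier that outputs a score in [0, 1]; the eight scores and labels are invented for illustration, not Gmail data:

```python
def confusion_counts(scores, labels, threshold):
    """Return (false_positives, false_negatives) at a given threshold.
    scores: classifier scores in [0, 1]; labels: True for actual phishing."""
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    return fp, fn


# Invented scores for eight messages (True = actually phishing).
scores = [0.05, 0.20, 0.35, 0.55, 0.60, 0.75, 0.90, 0.95]
labels = [False, False, False, True, False, True, True, True]

for t in (0.3, 0.5, 0.7, 0.9):
    print(t, confusion_counts(scores, labels, t))
# 0.3 (2, 0)
# 0.5 (1, 0)
# 0.7 (0, 1)
# 0.9 (0, 2)
```

Raising the threshold removes false positives only by letting more real phishing through.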

    1. Re:Probably an acceptable trade-off for Google by hey! · · Score: 2

      On the other hand, it wouldn't be hard to provide some kind of whitelist procedure; or better yet a way of slotting email verified from certain domains into certain algorithmic tracks.

      --
      Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
    2. Re:Probably an acceptable trade-off for Google by 93+Escort+Wagon · · Score: 1

      But domains - even old, established ones - change hands somewhat regularly. So maintaining a useful and effective whitelist would likely involve significantly more work than one might think.

      --
      #DeleteChrome
    3. Re:Probably an acceptable trade-off for Google by stephanruby · · Score: 1

      A false positive implies that Google is wrong. Google is not wrong in this case. Vortex.com is just not keeping its identity secure. Any email you receive from that domain should automatically be treated as suspect because it could have been sent by anyone.

      I may not be able to send email directly from vortex.com because vortex.com has an SPF record, so at least they secured that much, but anyone can easily forge an email header with the vortex.com domain name because they didn't bother to implement DomainKeys Identified Mail (DKIM).

    4. Re:Probably an acceptable trade-off for Google by hey! · · Score: 1

      Sure. And if people start *reporting* spam coming out of formerly whitelisted domains, you must un-whitelist them.

      --
      Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
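
A minimal sketch of the procedure hey! describes: whitelist a domain, count user spam reports against it, and drop it once reports pass a threshold. The class, its names and the threshold value are hypothetical; a real system would also need the safeguards raised elsewhere in this thread (domains changing hands, compromised senders).

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class DomainWhitelist:
    """Hypothetical domain whitelist with report-based revocation."""
    report_threshold: int = 3
    trusted: Set[str] = field(default_factory=set)
    reports: Dict[str, int] = field(default_factory=dict)

    def add(self, domain: str) -> None:
        domain = domain.lower()
        self.trusted.add(domain)
        self.reports[domain] = 0  # a (re-)added domain starts with a clean slate

    def report_spam(self, domain: str) -> None:
        domain = domain.lower()
        if domain not in self.trusted:
            return  # reports against untrusted domains don't affect the whitelist
        self.reports[domain] += 1
        if self.reports[domain] >= self.report_threshold:
            self.trusted.discard(domain)  # un-whitelist

    def is_trusted(self, domain: str) -> bool:
        return domain.lower() in self.trusted


wl = DomainWhitelist(report_threshold=2)
wl.add("example.com")
wl.report_spam("example.com")
trusted_after_one = wl.is_trusted("example.com")   # still trusted: 1 < 2
wl.report_spam("EXAMPLE.com")
trusted_after_two = wl.is_trusted("example.com")   # revoked on the second report
```

Revocation here is purely reactive: a domain that changes hands keeps its trust until enough reports arrive, which is the weakness the neighbouring replies point out.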
    5. Re:Probably an acceptable trade-off for Google by parkinglot777 · · Score: 1

      Sure. And if people start *reporting* spam coming out of formerly whitelisted domains, you must un-whitelist them.

      I think it is more complicated than that if you want to use domain name alone.

      What if a new domain is created; how do you get it onto the whitelist? If your domain was compromised without your knowledge and used for spamming/phishing, how do you know that it has been removed from the whitelist? And how do you prove that your domain is now safe to be put back on the whitelist? How much would you lose while you are trying to get it back on? There are many more cases like these, which is why a domain-name whitelist alone shouldn't be used.

      Big data is really huge. Sometimes people don't really feel it until they have to deal with it... I would accept false positives rather than false negatives in this case.

  2. Re: ROFL! by Anonymous Coward · · Score: 0

    Yup. +1

  3. Never whitelist good behavior by Anonymous Coward · · Score: 0

    Whitelisting good behavior leads to making it very desirable to impersonate people who have good behavior.

  4. Just a thought... by mellon · · Score: 3, Interesting

    Tweak your mailer so that it sends mail from gi-request instead of google-issues-request, and don't mention "Google Account". Granted, this sucks, but the Internet routes around brokenness, and that's what you need to do in a situation like this. Is that a sad thing? Yes, of course. If we had a mail architecture that was pull- rather than push-based, maybe we could have nice things, but until that magic day, the whole thing is bubble gum and baling wire, and it's honestly not Google's fault that that's so.

    As another example of brokenness, I often get mail that is marked spam because it went through a mailing list expander and the headers didn't get rewritten, so that it fails DKIM validation. Yes, we can all rail about how evil and awful DKIM is, but the bottom line is that if you don't want that to happen, you rewrite the headers. Again, a system that's pull-based rather than push-based would make this a lot better.
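
One common form of the header rewriting mellon recommends is what many mailing-list managers do to cope with DMARC: replace the author's From with the list's own address and move the author to Reply-To, so the list's domain is the one whose SPF and DKIM get checked. A sketch using Python's standard email package; the function name and all addresses (on the reserved example.com/example.org domains) are assumptions:

```python
from email.message import EmailMessage
from email.utils import formataddr, parseaddr


def munge_from_for_list(msg: EmailMessage, list_address: str, list_name: str) -> EmailMessage:
    """Make the list the visible sender and keep the author reachable via Reply-To."""
    display_name, author = parseaddr(str(msg["From"]))
    shown = display_name or author
    del msg["From"]
    msg["From"] = formataddr((f"{shown} via {list_name}", list_address))
    if msg["Reply-To"] is None:
        msg["Reply-To"] = author  # don't clobber an existing Reply-To
    return msg


# Placeholder addresses on reserved example domains.
post = EmailMessage()
post["From"] = "Alice <alice@example.com>"
post["Subject"] = "Digest"
post.set_content("Message body")
munge_from_for_list(post, "google-issues@example.org", "google-issues")
```

This only helps if the list server then signs its outgoing mail with DKIM for its own domain and is covered by that domain's SPF record.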

    1. Re:Just a thought... by Anonymous Coward · · Score: 0

      Pull-based doesn't scale. How many hundreds of thousands of mail servers do you want to check for mail? Millions? It's push-based for a reason.

      If you want people you don't necessarily know to send you information, you publicly announce where you'd like it to be sent, and that's where you pick it up, more or less. That's all there is to it.

    2. Re:Just a thought... by GuB-42 · · Score: 1

      It is actually a mix of both. SMTP is push-based but POP/IMAP are pull-based. A purely push-based or pull-based system would require 24/7 connectivity for all clients (for receiving and emitting respectively).
      Making the internal connectivity pull-based would just make things slower. Mail is push-based by nature: sending mail is the active part, receiving it is passive. With real-life post offices, the sender is the one who does all the procedures and pays for the stamp; the receiver just needs to check his mailbox. Compare that to buying a newspaper, where the customer is the receiver and the sender simply makes it available.

    3. Re:Just a thought... by Anonymous Coward · · Score: 0

      "the headers didn't get rewritten, so that it fails DKIM validation"

      Um, no. If you rewrite the headers, *then* DKIM will fail, since the whole point of DKIM is to identify exactly that: tweaked headers. I think what you mean is SPF failed, as it will when mail is forwarded.

      So, you get to choose: pass DKIM but fail SPF because you didn't rewrite headers; or pass SPF but fail DKIM because you rewrote the headers.

      This is why forwarded mail (and mail going through mailing list expanders) regularly gets nailed as spam: the two main standards are incompatible.

    4. Re:Just a thought... by Anonymous Coward · · Score: 0

      DKIM works just fine with mailing lists. When you configure DKIM on your outgoing SMTP server, you can specify which headers and/or the message body should be included to compute the DKIM signature. If you inspect a piece of email that is signed with DKIM, you will see a line similar to

      h=Date:From:To:Subject:References:In-Reply-To;

      This line says that these headers were used to compute the DKIM signature of the email. If any of them are altered, the DKIM validation will fail. However, you can add or modify headers not included in that list without causing problems.
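
The h= check described above can be sketched by parsing the DKIM-Signature tag list (RFC 6376 format: semicolon-separated tag=value pairs, with h= holding colon-separated header names). The signature string is a placeholder with dummy hash values, and the helper names are hypothetical:

```python
def parse_tag_list(value: str) -> dict:
    """Split a DKIM-Signature value into its tag=value pairs.
    Whitespace inside values (header folding) is dropped."""
    tags = {}
    for part in value.split(";"):
        name, sep, tag_value = part.partition("=")  # split on the first '=' only
        if sep:
            tags[name.strip()] = "".join(tag_value.split())
    return tags


def signed_header_names(value: str) -> set:
    """Lower-cased header names listed in the h= tag."""
    h = parse_tag_list(value).get("h", "")
    return {name.strip().lower() for name in h.split(":") if name.strip()}


def header_is_signed(value: str, header: str) -> bool:
    """True if the header is covered, so modifying it breaks the signature."""
    return header.lower() in signed_header_names(value)


# Placeholder signature; bh= and b= are dummy values, not real hashes.
sig = ("v=1; a=rsa-sha256; d=example.org; s=selector1; "
       "h=Date:From:To:Subject:References:In-Reply-To; bh=Ym9keQ==; b=c2ln")
```

This covers only the header side. The body is hashed separately into the bh= tag, so a list that appends a footer to the message body still invalidates the signature unless the signer used the optional l= body-length limit.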

    5. Re:Just a thought... by mellon · · Score: 1

      It's push-based because of history. The number of connections is the same either way: the sender has to announce that a message is available. The difference with a pull-based solution is that the receiver ignores announcements from senders it doesn't know, and decides when/if to pull. You use pull-based solutions every day. Facebook is pull-based. Reddit is pull-based. The reason SMTP isn't pull-based is that back in the day, we didn't realize that there would be assholes. It's really that simple and that sad.
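
The pull-based model mellon describes can be sketched as a receiver that accepts only announcements, drops those from unknown senders, and fetches the rest on its own schedule. This is a toy model under stated assumptions, not a real protocol: the classes are hypothetical, and `fetch` stands in for whatever retrieval mechanism such a system would use.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Announcement:
    sender: str
    message_id: str


@dataclass
class PullReceiver:
    """Hypothetical pull-based mailbox: senders only announce; receiver decides."""
    known_senders: Set[str]
    pending: List[Announcement] = field(default_factory=list)

    def on_announce(self, ann: Announcement) -> bool:
        # Announcements from unknown senders are dropped; nothing is ever fetched.
        if ann.sender not in self.known_senders:
            return False
        self.pending.append(ann)
        return True

    def pull_all(self, fetch: Callable[[str, str], str]) -> List[str]:
        # The receiver chooses when to retrieve the announced messages.
        messages = [fetch(a.sender, a.message_id) for a in self.pending]
        self.pending.clear()
        return messages


inbox = PullReceiver(known_senders={"alice@example.com"})
accepted = inbox.on_announce(Announcement("alice@example.com", "msg-1"))
ignored = inbox.on_announce(Announcement("spammer@example.net", "msg-2"))
fetched = inbox.pull_all(lambda sender, mid: f"{mid} from {sender}")
```

The sketch also shows the objection raised above: a sender the receiver doesn't already know has no way in at all, so introductions would need some other channel.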

    6. Re:Just a thought... by mellon · · Score: 1

      No, you'd still have servers, and it's servers that would be on 24x7. Your client would use IMAP or JMAP (hopefully not POP).

    7. Re:Just a thought... by mellon · · Score: 1

      What I mean by "the headers didn't get rewritten" is that the sender didn't get rewritten to a sender that would validate. If I forward your message from my server, you aren't sending it, and so the DKIM isn't going to validate. I have to send it as me.

    8. Re:Just a thought... by mellon · · Score: 1

      Sure. So what happens if From: isn't on that list? Answer: the message is rejected, if the recipient is e.g. yahoo or google.

  5. Right, so what? by Anonymous Coward · · Score: 0

    So one of the "first forty registered domains" can't behave poorly? Get SPF and DKIM going and these problems go away. Or just bitch about it to slashdot, whichever!

    1. Re:Right, so what? by Anonymous Coward · · Score: 0

      IDK what the 40 domains thing is supposed to be about. Just because it's old doesn't mean they can't be stupid.

      I bet they sent out some anti-trump shit to their users. And the users got ticked.

  6. Too whiny by Rick+Zeman · · Score: 5, Insightful

    C'mon, Lauren, with the tens of millions of spams that Google catches every day, some things are going to get caught by the filter that shouldn't be. Even if the filter is 99.99% effective, that means there will be 1,000 false positives in there... and yours is one of them. Shit happens. Adjust and move on.

    Concluding that "apparently what we're dealing with here is a simplistic (and frankly, rather haphazard in this respect at least) string-matching algorithm that could have come right out of the early 1970s...! [A]t least in this case, it appears that Google is basically using the venerable old UNIX/Linux 'grep' command or some equivalent, and in a rather slipshod way, too" is drawing a trend and a conclusion from one data point.
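
Rick Zeman's arithmetic, as a quick check: 99.99% accuracy means one misfiled message in every 10,000. The daily volume is an assumption standing in for "tens of millions"; integer math avoids floating-point rounding.

```python
def expected_misfiles(daily_messages: int, errors_per_10k: int = 1) -> int:
    """99.99% accuracy = 1 error per 10,000 messages."""
    return daily_messages * errors_per_10k // 10_000


print(expected_misfiles(10_000_000))  # 1000
print(expected_misfiles(50_000_000))  # 5000
```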

    1. Re: Too whiny by Anonymous Coward · · Score: 0

      Roughly 20% of the messages in my Gmail spam box are legitimate email.

      I have to download everything in order to sort out the spam.

  7. Whine whine whine by Anonymous Coward · · Score: 0

    No one cares how long the domain has existed. It could go from pristine track record to spewer of spam in a single successful hack ... or a simple change of ownership.

  8. Email is broken by Anonymous Coward · · Score: 0

    What spam wounded, anti-spam killed. Sending email has become a dark art. It is impossible to find out if an email will actually reach a recipient or if an MTA will silently drop it or if the email will be sorted away into the spam folder. If the email is suppressed in any way and you find out about it, there's often no way of determining what caused it to fall out of favor. I used to make fun of people who would call to announce that they sent you an email. If an email is important, I do that now.

  9. 50% false positive rate by Anonymous Coward · · Score: 0

    This is also my experience.

    I have my own domain, I run my own web-server, been doing so for years.

    About 50% of the time my email goes to spam, even when I'm sending to the same people, who've replied to me in the past and presumably also removed my email from spam.

    If I have to email someone at Gmail, I use ProtonMail.

    Having said that, I try not to email Gmail accounts, because of the lack of privacy.

  10. Face it, the Goog does not care about you by Anonymous Coward · · Score: 0

    You are a product, not a customer, and a product goes in whatever box the Goog wants it to.

  11. Implement SPF and DKIM by Walking+The+Walk · · Score: 4, Informative

    GMail won't normally mark your email as spam/phishing if you've implemented basic mail server identification such as SPF (Sender Policy Framework) and DKIM (DomainKeys Identified Mail). This is well known, and I guarantee that if the author bothered to search for why their mail ends up flagged by GMail he would hit at least one of these two terms in the first few results.

    --
    A recursive sig
    Can impart wisdom and truth
    Call proc signature()
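
For reference, an SPF record is a DNS TXT record listing which IP addresses may send mail for a domain, and the receiver checks the connecting IP against it. Strictly, SPF validates the envelope sender (MAIL FROM) domain, not the visible From header, which is why forged From headers are a separate problem. A simplified evaluator sketch, handling only the ip4:, ip6: and all mechanisms; a real RFC 7208 check also resolves include:, a, mx and redirect= through DNS, which is omitted here. The record and addresses are documentation placeholders:

```python
import ipaddress

QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def spf_result(record: str, client_ip: str) -> str:
    """Evaluate a simplified SPF record against the connecting IP.
    Only ip4:, ip6: and 'all' are handled; other terms are skipped.
    Malformed values raise ValueError (a real checker returns permerror)."""
    ip = ipaddress.ip_address(client_ip)
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # not an SPF record
    for term in terms[1:]:
        qualifier = "+"  # default qualifier is pass
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        mech, _, value = term.partition(":")
        mech = mech.lower()
        if mech == "all":
            return QUALIFIERS[qualifier]
        if mech in ("ip4", "ip6"):
            network = ipaddress.ip_network(value, strict=False)
            if ip.version == network.version and ip in network:
                return QUALIFIERS[qualifier]
    return "neutral"  # no mechanism matched and no 'all'


record = "v=spf1 ip4:203.0.113.0/24 ip6:2001:db8::/32 -all"
```

A domain that sends no mail at all can publish just `v=spf1 -all`, which this evaluator (like a real one) resolves to fail for every IP; that is the restrictive default suggested further down the thread.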
    1. Re:Implement SPF and DKIM by cdwiegand · · Score: 1

      So you're SPF and DKIM signed - you can still be marked by Google as spam based on content, which is what their problem is here. I know - I've had to deal with it, and it's very annoying because no ESP out there cares about senders - you're not their customer.

      --
      . Define sqrt(x) as something really evil like (x / rand()), and bury it deep. Watch your coworkers go nuts.
    2. Re:Implement SPF and DKIM by arth1 · · Score: 1

      SPF and DKIM can be useful for resenders, but they add nothing to security for original senders except maintenance costs and delays. If the A record for the sender address points to the IP of the remote MTA, it can generally be trusted to come from that domain. Unless the DNS server is taken over, in which case SPF and DKIM won't do much good either.

    3. Re:Implement SPF and DKIM by Anonymous Coward · · Score: 0

      That's great because SPF and DKIM are not security tools. If you want your email to be secure, then encrypt them with GPG or S/MIME.

      SPF, DKIM, and DMARC are great spam prevention tools and the internet would be better off if more sysadmins knew how to properly configure them. It would be a great start for all of the domains that don't send mail to have a restrictive SPF policy by default--I wish the domain registrars that provide DNS (e.g., parked domains) did this automatically.

    4. Re:Implement SPF and DKIM by Anonymous Coward · · Score: 0

      SPF, DKIM, and DMARC are spoofing protection tools. No more, no less. If spammers believe that these are treated as anti-spam signals, you can bet that they will line up to implement them and get their mail through to your users!

    5. Re:Implement SPF and DKIM by Anonymous Coward · · Score: 0

      Consider that spam is regularly initiated from computers that have been infected with viruses and need a valid domain to spoof in order to deliver their payload. SPF, DKIM, and DMARC are effective tools that limit this type of spam. That is, when email fails DKIM and/or SPF, it is a strong indicator that the email is spam. On the flip side, when an email passes DKIM and/or SPF, it gets a muted score to indicate non-spam (ham) and continues on to be inspected further, perhaps by more CPU-intensive Bayesian classification systems or similar.
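
The layered approach described above, where authentication results adjust a score and cheap checks run before costly ones, can be sketched as follows. The weights and threshold are invented for illustration and are not taken from any real filter; `content_score` is a placeholder for a heavier classifier such as a Bayesian one:

```python
from typing import Callable, Dict, Tuple

# Invented weights and threshold, for illustration only.
AUTH_WEIGHTS = {
    ("spf", "fail"): 3.0,
    ("spf", "softfail"): 1.0,
    ("spf", "pass"): -0.5,
    ("dkim", "fail"): 2.0,
    ("dkim", "pass"): -0.5,
}
SPAM_THRESHOLD = 5.0


def score_message(auth_results: Dict[str, str],
                  content_score: Callable[[], float]) -> Tuple[float, bool]:
    """Return (score, is_spam). Cheap auth checks run first; the costly
    content classifier is only invoked if they haven't already decided."""
    score = sum(AUTH_WEIGHTS.get((check, result), 0.0)
                for check, result in auth_results.items())
    if score >= SPAM_THRESHOLD:
        return score, True  # authentication failures alone are decisive
    score += content_score()
    return score, score >= SPAM_THRESHOLD
```

Passing SPF/DKIM only nudges the score down; it does not whitelist the message, which fits the point above that spammers can authenticate too.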

  12. 1970s by Known+Nutter · · Score: 5, Funny

    Apparently what we're dealing with here is a simplistic (and frankly, rather haphazard in this respect at least) string-matching algorithm that could have come right out of the early 1970s...!

    You mean like that vortex.com front page?

    --
    Beware of the Leopard.
    1. Re:1970s by Anonymous Coward · · Score: 0

      90% of the latest "AI" fad is basically grep right out of the 1970's, or could be improved by making it so.

    2. Re:1970s by msk · · Score: 1

      Easy to read, links are unambiguous.

      What's to hate?

    3. Re:1970s by Anonymous Coward · · Score: 0

      The fact that everything is haphazardly strewn across the page, maybe.

  13. Bullshit! They use AI. by Anonymous Coward · · Score: 1

    The AI is so advanced it looks exactly like a human running grep.

    1. Re: Bullshit! They use AI. by Anonymous Coward · · Score: 0

      For real, the article provides zero evidence that this is not a statistical system. The fact of the matter is that a classifier may well have learned very high weights for the features his domain and mail present. The fact that he *believes* the problem is some strings in his mail does not mean that grep is what's flagging him.
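
To illustrate how a statistical model can look like grep from the outside: a multinomial naive Bayes classifier learns a log-odds weight per token, and a token seen mostly in spam gets a large positive weight wherever it shows up. A toy sketch; the training messages are made up, and whitespace tokenization is a simplification:

```python
import math
from collections import Counter


def train_token_log_odds(spam_docs, ham_docs, alpha=1.0):
    """Per-token log(P(token|spam) / P(token|ham)) with Laplace smoothing,
    i.e. the per-feature weights of a multinomial naive Bayes classifier."""
    spam_counts = Counter(t for doc in spam_docs for t in doc.lower().split())
    ham_counts = Counter(t for doc in ham_docs for t in doc.lower().split())
    vocab = set(spam_counts) | set(ham_counts)
    spam_total = sum(spam_counts.values()) + alpha * len(vocab)
    ham_total = sum(ham_counts.values()) + alpha * len(vocab)
    return {
        t: math.log((spam_counts[t] + alpha) / spam_total)
           - math.log((ham_counts[t] + alpha) / ham_total)
        for t in vocab
    }


def message_log_odds(weights, text, prior_log_odds=0.0):
    """Sum token weights; positive means 'more spam-like'. Unseen tokens add 0."""
    return prior_log_odds + sum(weights.get(t, 0.0) for t in text.lower().split())


# Made-up training data, for illustration only.
spam = ["verify your google account now", "your google account is suspended", "claim prize now"]
ham = ["meeting notes attached", "lunch tomorrow", "google-issues digest attached"]
weights = train_token_log_odds(spam, ham)
```

With this tiny corpus, "account" ends up weighted toward spam simply because it appeared only in spam examples, so any message mentioning it is pushed in that direction.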

  14. Interface could be better by Okian+Warrior · · Score: 3, Interesting

    With the huge volumes of data that Google handles, it's probably hard to do any better.

    GMail's filtering may be "hard to do any better", but dealing with its spam handling is complex and labyrinthine.

    My client began having conversations with a vendor last week, and as a result GMail put *all* subsequent E-mails into my spam folder, including ones from my (whitelisted) client addressed to the vendor CC'ing me. I only found out by accident.

    One might *expect* a quick, easily identified control that says "whitelist this person" or "whitelist this company", but there isn't. You have to go to "Settings->Settings->Filters and blocked addresses", none of which mentions "spam", so the casual user can't just scan the headings for that term.

    You can't, apparently, just refer to the spam and say "whitelist that person", you need to create a new filter. You can't, apparently, say "@example.com" as a wildcard for the business, you have to identify an actual sender by complete address.

    And of course, you have to discover that you need to do this, because GMail doesn't give any warning. (Surprising, since every time I use GMail from a different location it sends me a warning E-mail. Every. Single. Time.)

    I'm not even sure why everything went to spam in the first place - I had sent E-mails to both the vendor and the client, so they should have been in my "recently used" list.

    GMail has a pretty cryptic interface, compared to some of the other mail readers I've used.

    1. Re:Interface could be better by Hognoxious · · Score: 3, Interesting

      If they gave you options & explanations it would spoil the flat simple look!

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    2. Re:Interface could be better by Anonymous Coward · · Score: 0

      1) It's Gmail, not GMail.
      2) Click the "not spam" button above a message.

      The fact that you're too stupid to understand Gmail isn't a valid argument.

  15. They always flag my legit Paypal emails... by Anonymous Coward · · Score: 0

    I always figured it was because some suits at Google were shorting Paypal stock or something.

  16. first impressions usually right by supernova87a · · Score: 2

    One look at Vortex.com's front page, and I would quickly classify it as spam too...

  17. Bad guys only have to succeed once. by EzInKy · · Score: 1

    A million false positives are far preferable to one false negative.

    --
    Time is what keeps everything from happening all at once.
    1. Re:Bad guys only have to succeed once. by DamonHD · · Score: 1

      Really no.

      Having all my incoming and outbound email thrown away because I might be a phisher, or because one of the people sending something to me might be, is not a good thing. (And there are random days when, for no obvious reason, some portion of mail in either direction for one of my mail accounts suddenly appears to be treated as SPAM by the large mail handlers, and I don't know why.)

      I am capable of spotting and avoiding responding to phishing attempts myself, without assistance.

      I get 10,000 SPAM delivery attempts per day (when I bother to count), so I have an informed opinion on whether such a binary approach to automated filtering is useful. Details matter.

      Rgds

      Damon

      --
      http://m.earth.org.uk/
  18. Why not? by Anonymous Coward · · Score: 0

    I’m certainly not going to change the names of my mailing lists or treat the term “Google Accounts” as somehow verboten!

    Why not? That seems like an easy way to deal with the problem.

  19. Regardless of what you think of Vortex... by Esekla · · Score: 1

    Setting the source and tone aside is crucial to any good analysis of a subject, so setting the Vortex source aside, I've noticed two things that seem to be relevant here. The first is that while Google is usually pretty good at blocking spam without false positives, it seems to be getting worse at that task, rather than better. Furthermore, it is notably bad at detecting whether an email is or is not a scam.

    The second, more important, point comes at the end of the original note, and it's that Google as a whole has virtually no functional feedback mechanism for error correction. It is very hard to get any attention at all from staff, and even if you can manage it, my recent interactions simply yielded brain-dead responses and an endless runaround. This was with a botched Google Wallet payment where the firm sent a confirmation saying "This money is now yours." but never delivered it. After many hours of investigation, there is still no real answer or means of making progress.

    Everybody gets things wrong sometimes, but Google seems to have strayed a long way from its original "Don't be evil." motto. It now seems to do whatever benefits its bottom line and costs the least, without any regard for accuracy or for allowing people to help it fix its mistakes.

  20. Don't look for constructive ideas on Slashdot by shanen · · Score: 1

    Even for today's Slashdot, that was a remarkably low-insight comment to get an insightful moderation. There are constructive approaches to consider, but at this point, why bother?

    --
    Freedom = (Meaningful - Coerced) Choice != (Speech | Beer^2), and sad sock puppets' bad mods avail them naught.