Ask Slashdot: Why Can't Google Block Spam In Gmail?
An anonymous reader writes Every day my gmail account receives 30-50 spam emails. Some of it is UCE, partially due to a couple dingbats with similar names who apparently think my gmail account belongs to them. The remainder looks to be spambot or Nigerian 419 email. I also run my own MX for my own domain, where I also receive a lot of spam. But with a combination of a couple DNSBL in my sendmail config, SpamAssassin, and procmail, almost none of it gets through to my inbox. In both cases there are rare false positives where a legit email ends up in my spam folder, or in the case of my MX, a spam email gets through to my Inbox, but these are rare occurrences. I'd think with all the Oompa Loompas at the Chocolate Factory that they could do a better job rejecting the obvious spam emails. If they did it would make checking for the occasional false positives in my spam folder a teeny bit easier. For anyone who's responsible for shunting Web-scale spam toward the fate it deserves, what factors go into the decision tree that might lead to so much spam getting through?
Agreed, I run both my companies network (mx, spf, all that jazz) and my personal through gmail, and I get maybe 1 spam message per month on each account tops. I often open them as it is usually an interesting trick that the spammer used (that google will pick up immediately and I'll never see again)
Disclosure: my name is Bruno Bowden and I managed the engineering team on Enterprise Gmail many years ago at Google before leaving to work in venture capital. My profile is www.linkedin.com/in/brunobowden. Though I didn't work on spam fighting directly, I interacted a great deal with the spam team while I worked there.
One of the main architects of the spam fighting system - Brad Taylor - published a scientific paper on "Sender Reputation in a Large Webmail Service" - http://www.ceas.cc/2006/19.pdf. This has a lot of detail about the system. We keep much of the internals secret as it reduces the chance that a spammer can reverse engineer and work around the system. If you'll allow me to be vague, the number of signals it uses was stunning to me. There's a mixture of hard wired tests (e.g. is the sender in someone's address book), reputation (domain and content), machine learning and anything else we can make work.
One of the principle improvements came when we switched to user classification through the "Report Spam" button. People have different opinions on what constitutes spam, so individual filtering is far more effective. It also avoids the politics of certain lists of domains and IPs from third parties which can be controversial. Even then it has challenges, as sometimes users will mistakenly pick out a phishing email and mark it "Report Not Spam". Because of that, Gmail now adds a red warning banner to indicate more strongly what is a likely a phishing attempt. In general, Google has tried to be very supportive of encryption, e.g. DKIM for authentication (and SPF) to STARTTLS for privacy. I would also like to mention the abuse team that works hard to prevent gmail being used as a source of spam, shutting down accounts as soon as possible after suspicious email is sent, then helping affected users to recover their account.
In general, the Gmail has received a lot of compliments on the spam filtering, I'm sure the team will be grateful for the positive comments here on Slashdot. There are still things that can confuse the system, e.g. receiving forwarded email (which might be missing source IPs) or genuine email that is sent to the wrong address. Though the system isn't perfect, I know the team will continue to work hard on it.