Ask Slashdot: Why Can't Google Block Spam In Gmail?
An anonymous reader writes Every day my gmail account receives 30-50 spam emails. Some of it is UCE, partially due to a couple dingbats with similar names who apparently think my gmail account belongs to them. The remainder looks to be spambot or Nigerian 419 email. I also run my own MX for my own domain, where I also receive a lot of spam. But with a combination of a couple DNSBL in my sendmail config, SpamAssassin, and procmail, almost none of it gets through to my inbox. In both cases there are rare false positives where a legit email ends up in my spam folder, or in the case of my MX, a spam email gets through to my Inbox, but these are rare occurrences. I'd think with all the Oompa Loompas at the Chocolate Factory that they could do a better job rejecting the obvious spam emails. If they did it would make checking for the occasional false positives in my spam folder a teeny bit easier. For anyone who's responsible for shunting Web-scale spam toward the fate it deserves, what factors go into the decision tree that might lead to so much spam getting through?
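For readers who haven't set one up: a DNSBL check is just a DNS lookup on the connecting IP with its octets reversed and the blocklist zone appended. Here is a minimal sketch of building that query name. The zone `dnsbl.example.org` is a placeholder, not a real list, and the function name is made up for illustration.

```python
import ipaddress

def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to query for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"

# Placeholder zone; a real setup would name an actual blocklist here.
print(dnsbl_query_name("192.0.2.1", "dnsbl.example.org"))
# prints 1.2.0.192.dnsbl.example.org
```

If that name resolves to an A record, the IP is listed; if the lookup returns NXDOMAIN, it is not. The actual DNS query is left out so the sketch runs offline.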
Spam folder in my Gmail catches 99.9% of all spam I receive.
As a bonus: it's also excellent about learning what I mark as spam, and dealing with false positives.
I realize that this is not a helpful response, but my Gmail account never gets spam; it's all properly filtered into the spam folder. It's been years since I even gave spam a second thought, actually. I imagine that most people's situations are similar.
This has not been my experience at all. I've found Google's email filters to be significantly better than anyone else's.
I can think of several other reasons not to use gmail - but spam filtering is not on that list.
#DeleteChrome
I think more likely what occurs is that they need to be extremely careful about false positives. So they push everything into a SPAM folder. But if you miss a critical email because Google accidentally thought something was spam when it wasn't, then Hello lawsuits. From a legal perspective, blocking anything going into their inboxen is a risk.
Agreed. I run both my company's network (MX, SPF, all that jazz) and my personal mail through Gmail, and I get maybe 1 spam message per month on each account, tops. I often open them, as it is usually an interesting trick that the spammer used (that Google will pick up immediately and I'll never see again).
Google does an excellent job of catching spam. The submitter's problem isn't that, it's that he's got other numpties giving out his email address and then he's not using the Google-supplied tool (that little "mark as spam" button) to mark unwanted email so that Gmail learns his preferences. Instead, he's Dunning-Krugered together his own solution that barely works.
Submitter's problem is PEBKAC.
Hail Eris, full of mischief...
E pluribus sanguinem
Google cannot do that because while for YOU an email in Chinese is a huge red flag, it means nothing to the Chinese American student living in New York who still gets emails from her cousin in Hong Kong.
Most of the decisions you make are like this one. For you, country, language, etc. etc. are indications of spam, but they are not true for the general population.
So a spam filter designed for your personal use will always work a lot better than one designed for all users of google.
excitingthingstodo.blogspot.com
I'm not sure what this guy is doing, but when I ran my own mail server (which I did personally and professionally for well over a decade), spam was a huge problem for me. No combination of SpamAssassin, RBLs, heuristic signature checks, virus scanning, etc. got me past 85-90% blockage. And I did everything right. And it was a constant, unending fight.
When I switched to Google Apps for my personal domain, my life changed. Google catches a HUGE amount of spam. Things still get through occasionally, and it definitely gets worse as Black Friday and Christmas campaigns kick into high gear. But the majority of the spam I get is from legitimate businesses that decide to put me on their mailing lists without my permission.
The op either has on blinders, or is baiting.
I've had the exact opposite experience. GMAIL's filters are so much better than any service out there. I get less than 1 SPAM email a month into my actual inbox.
Mike @ The Geek Pub. Let's Make Stuff!
There are lots of legitimate sites that send emails on behalf of someone not on the domain. A lot of 'email this content to someone' links work that way. Maybe Microsoft understands how email is used in the real world far better than you do.
switch over to Yahoo mail
I've seen a lot of recent spam campaigns that get through my basic scanning using the following tactics:
1. Careful design to not trigger SpamAssassin content rules, including blocks of text to fool the Bayes filter.
2. Careful omission of any identifying headers except for completely valid SPF and DKIM headers with appropriately configured DNS.
3. Real Linux mail servers dropped onto virtual hosting providers.
4. Fresh IP addresses and domains: never-used domains that are not blacklisted yet, and IP address blocks from the hosting providers that take 10-30 minutes to get blacklisted.
Then they use snowshoe spam tactics to trickle them out until they're blacklisted and then move to the next domain and address.
If your address is on the lists that the perpetrators of these campaigns are using, it's really hard to avoid spam right now. Not impossible, there are some countermeasures, but vanilla SpamAssassin and your standard appliances are going to have problems. I can imagine Google is going to have an easier time with this because of its size and volume (=more information), but it's far from trivial.
-db
You have to be careful not to break mailing lists etc. there are plenty of systems which mess up the headers.
Catching spam and filtering it is the wrong way to deal with the spam problem. At that point the spam has already been sent, already taken up storage and CPU time somewhere, and already cost you money (yes, even with a "free" email account like gmail it still costs money somewhere). And if you add in the costs of filters, with the admin time and storage they consume, it is even worse.
As I have said many times before, the only effective way to deal with spam is to approach it from an economic angle, as spam is an economic problem. Spam isn't sent out to piss you off, it is sent to make money. The spammers don't need you personally to buy anything, they just need someone else to buy something. The ROI on spam is incredible as the cost is almost nothing to send to billions of addresses, and only a couple of suckers are required in order to make money off the venture.
If you want to actually help end the spam epidemic, stop talking about filters and other crappy "solutions" that only accelerate the arms race with the spammers. The way to stop spam is to remove the profit motive. This has been done successfully already; if you can prevent the spammers from getting paid they won't send spam because it won't be worth their time. Groups have succeeded in this and the effect has been dramatic. By contrast filters just encourage spammers to employ more creative measures to get their messages through - many of which result in reducing the S:N ratio of filters.
Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.
The submitter does NOT complain about Google's ability to catch spam! He asks why Gmail does not REJECT obvious spam. Rejecting an email means that the server (in this case Gmail's) does not even accept it. In such cases the sender gets back a Delivery Status Notification from his own server, telling him that his email did not go through because of such and such error. An important point here is that the email is not lost without any notification. The sender can try to contact the recipient in another way. Actually this may be better than putting the email into a spam folder if that is not monitored regularly, or at all. Yes, this is a valid question, but almost no one has understood it.
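To make the distinction concrete, here is a rough sketch of the three outcomes a receiving server can choose between. The spam score and both thresholds are invented for illustration; nothing here reflects how Gmail actually decides. The only real-world pieces are the SMTP reply codes: 550 is a permanent failure, 250 means the message was accepted.

```python
from typing import Tuple

def smtp_verdict(spam_score: float,
                 reject_at: float = 0.99,
                 folder_at: float = 0.5) -> Tuple[int, str]:
    """Map a hypothetical spam score in [0, 1] to an SMTP reply and a fate."""
    if spam_score >= reject_at:
        # Refused during the SMTP session: the message is never stored,
        # and the sending server is responsible for notifying the sender.
        return 550, "rejected"
    if spam_score >= folder_at:
        # Accepted, so the sender sees success, but the recipient
        # only finds it by checking the spam folder.
        return 250, "spam folder"
    return 250, "inbox"

print(smtp_verdict(0.995))  # prints (550, 'rejected')
print(smtp_verdict(0.7))    # prints (250, 'spam folder')
```

The trade-off is in the first branch: the higher `reject_at` sits, the less the spam folder shrinks, but the lower it sits, the more legitimate senders get bounced outright.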
I was actually thinking of the opposite trend since a couple of years ago: even people fully capable of running their own mail servers are all using Gmail these days; I think we're easily at the breaking point where no one really knows how to run a mail server anymore.
" It cannot just mark all advertisement as spam"
Advertisements in email are competition, not revenue. Google's incentives and your own are aligned.
... and may chance you didn't read my post: (There was a LOT more to my presentation than just this; this single part presented here to convey the concept).
The trouble is - the single part that you presented is clearly broken (eg it doesn't work well with the way many mailing lists work), so if it conveys the concept of your whole presentation, people are naturally going to assume that the whole presentation was broken...
Need to type accents and special characters in Windows? Use FrKeys
This isn't spam; at worst, it's bacn with a case of mistaken identity.
As someone whose full-time job is preventing spam (I work on Akismet, which checks about 380MM Web comments per day for spam), my general response to these kinds of questions is this: Fighting spam is hard because what's spam for you is not always spam for someone else, and spammers are continually changing tactics -- what worked to prevent spam yesterday may not work as well tomorrow, so it's a constantly moving target.
In my experience, GMail's filter is just ok. I see about 50 spam per day end up in my spam folder, 3 or 4 that make it to my inbox, and maybe one false positive per month (when I bother checking). That's a 94% success rate with a 0.3% FP rate (based on my ham email activity), assuming that they're not instantly discarding blatant spam that wouldn't even merit ending up in the spam folder (which they very well might be doing). If Akismet had this same success rate filtering comments on my blog, I'd have to manually mark 230 comments as spam each day instead of Akismet's missed spam average of about one per day. I don't complain about it though, since fighting spam is hard (see above).
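For anyone checking the arithmetic on the 94% figure, here is a quick sketch using the numbers above, taking "3 or 4" missed per day as 3.5. The function name is just for illustration.

```python
def catch_rate(caught: float, missed: float) -> float:
    """Fraction of all incoming spam that the filter caught."""
    return caught / (caught + missed)

# About 50 per day land in the spam folder, 3 or 4 reach the inbox.
rate = catch_rate(caught=50, missed=3.5)
print(f"{rate:.1%}")  # prints 93.5%
```

That rounds to the 94% quoted. It only holds under the stated assumption that nothing is being silently discarded before it reaches the spam folder.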
And related ... there should be the ability for me to restrict where my email can be accessed from and where it can be sent from. I'm not going to Russia -- so why can't I block all access to my account from Russia?
Yeah, it's not quite a solution to spam, but I've had periods where I get a lot of spam in Cyrillic or Chinese/Japanese characters, and it would have been nice to be able to at least say, "If the email isn't using the Latin alphabet, treat it as suspect because I don't read any languages that use any other alphabets."
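A per-user rule like that is easy enough to sketch. The following is only an illustration of the idea, not anything Gmail offers: it counts what fraction of the letters in a message are outside the Latin script (going by Unicode character names) and flags the message when that fraction passes a threshold. The function names and the 0.5 threshold are assumptions.

```python
import unicodedata

def non_latin_fraction(text: str) -> float:
    """Fraction of alphabetic characters whose Unicode name is not LATIN."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    non_latin = sum(
        1 for ch in letters
        if not unicodedata.name(ch, "").startswith("LATIN")
    )
    return non_latin / len(letters)

def looks_suspect(text: str, threshold: float = 0.5) -> bool:
    """Flag text that is mostly written in a non-Latin script."""
    return non_latin_fraction(text) > threshold

print(looks_suspect("Привет, как дела?"))    # prints True
print(looks_suspect("Hello, how are you?"))  # prints False
```

Accented Latin letters (as in "café") still count as Latin here, so French or German mail would not be flagged.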
I've always thought part of the key to putting a dent in spam would be to make cryptographic email signatures ubiquitous. Then we could check the signature against a valid authority, and if an authority is vouching for too many spammers, then you yank its status as "a valid authority". Then it becomes the authority's job to self-police. Of course, getting people onboard with something like that is impossible.
Now how does your solution in checking "origin" compare with something like SPF? What is it checking the origin against?
And what if one of your friends goes to Russia on vacation and wants to send you an email?
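For comparison, what SPF checks the origin against is a TXT record that the sender's domain publishes in DNS, listing the hosts allowed to send mail for it. Here is a deliberately stripped-down sketch that handles only the `ip4:` and `ip6:` mechanisms; real SPF evaluation also covers `include`, `a`, `mx`, qualifiers and more, and the record string below is a made-up example using documentation addresses.

```python
import ipaddress

def spf_ip_match(spf_record: str, sender_ip: str) -> bool:
    """Return True if sender_ip falls inside an ip4:/ip6: term of the record."""
    ip = ipaddress.ip_address(sender_ip)
    for term in spf_record.split()[1:]:  # skip the leading "v=spf1"
        for prefix in ("ip4:", "ip6:"):
            if term.startswith(prefix):
                network = ipaddress.ip_network(term[len(prefix):], strict=False)
                if ip in network:
                    return True
    return False

record = "v=spf1 ip4:192.0.2.0/24 -all"  # hypothetical published record
print(spf_ip_match(record, "192.0.2.55"))    # prints True
print(spf_ip_match(record, "198.51.100.7"))  # prints False
```

Note that this is about which server handed over the mail, not which country the person was sitting in, so a friend on vacation who sends through their usual provider still passes.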
The OP wrote, "I'd think with all the Oompa Loompas at the Chocolate Factory that they could do a better job rejecting the obvious spam emails. If they did it would make checking for the occasional false positives in my spam folder a teeny bit easier." In other words, he's saying that he wants Google to reject the mail before it gets to his spam folder. He's not complaining about the efficacy of their spam filters, but is instead suggesting that Google should find a way to reject it before it even hits his spam folder.
Disclosure: my name is Bruno Bowden and I managed the engineering team on Enterprise Gmail many years ago at Google before leaving to work in venture capital. My profile is www.linkedin.com/in/brunobowden. Though I didn't work on spam fighting directly, I interacted a great deal with the spam team while I worked there.
One of the main architects of the spam fighting system - Brad Taylor - published a scientific paper on "Sender Reputation in a Large Webmail Service" - http://www.ceas.cc/2006/19.pdf. This has a lot of detail about the system. We keep much of the internals secret as it reduces the chance that a spammer can reverse engineer and work around the system. If you'll allow me to be vague, the number of signals it uses was stunning to me. There's a mixture of hard wired tests (e.g. is the sender in someone's address book), reputation (domain and content), machine learning and anything else we can make work.
One of the principal improvements came when we switched to user classification through the "Report Spam" button. People have different opinions on what constitutes spam, so individual filtering is far more effective. It also avoids the politics of certain lists of domains and IPs from third parties, which can be controversial. Even then it has challenges, as sometimes users will mistakenly pick out a phishing email and mark it "Report Not Spam". Because of that, Gmail now adds a red warning banner to indicate more strongly what is likely a phishing attempt. In general, Google has tried to be very supportive of encryption, e.g. DKIM for authentication (and SPF) to STARTTLS for privacy. I would also like to mention the abuse team that works hard to prevent Gmail being used as a source of spam, shutting down accounts as soon as possible after suspicious email is sent, then helping affected users to recover their accounts.
In general, Gmail has received a lot of compliments on its spam filtering, and I'm sure the team will be grateful for the positive comments here on Slashdot. There are still things that can confuse the system, e.g. receiving forwarded email (which might be missing source IPs) or genuine email that is sent to the wrong address. Though the system isn't perfect, I know the team will continue to work hard on it.
Another Gmail trick that may help you: when you have the account
someaccountname@gmail.com
all email of the form
someaccountname+anysuffix@gmail.com
goes to your account. The plus sign is a literal character, not a concatenation operator. The only downside to this is that some email validation suites don't allow plus signs in user IDs, even though RFC 5322 allows them. Sometimes I use the format
someaccountname+onlinestore@gmail.com
when giving my email address to OnlineStore.com so that it's clear from where particular messages should originate.
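If you want to sort on those suffixes yourself, pulling the tag back out of a recipient address is a couple of string splits. A small sketch (the function name is made up):

```python
def split_plus_tag(address: str):
    """Split 'user+tag@domain' into (user, tag, domain); tag is '' if absent."""
    local, _, domain = address.rpartition("@")
    base, _, tag = local.partition("+")
    return base, tag, domain

print(split_plus_tag("someaccountname+onlinestore@gmail.com"))
# prints ('someaccountname', 'onlinestore', 'gmail.com')
```

If mail tagged `onlinestore` starts arriving from somewhere other than OnlineStore.com, you know who passed your address along.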
Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.