Gmail Spam Filter Testing


Posted by Hemos on Monday June 14, 2004 @02:35AM from the send-the-mail-in dept.

An anonymous reader writes "What can you do with 1000MB of e-mail space on your Gmail account? One guy, by the name of Aaron Pratt ( prattboy@gmail.com ), has decided to test the spam filters of Google's Gmail service by having his Gmail account blasted with every kind of spam imaginable. He is testing to see how well Gmail's spam filters can sort out the spam from legitimate email (yes, he does get personal emails from people). As of May 25th, he was at about 30% of his Gmail account's 1GB capacity. You can track his progress on his website, http://gmail.prattboy.net (Google cache of this site: cache: gmail.prattboy.net). Here is also an article talking about Aaron's efforts from webpronews.com"

19 of 285 comments

whining? by Gothmolly · 2004-06-14 02:38 · Score: 5, Insightful

What's Google going to do to protect its users from mail bombs?

Now you're complaining that your free, 1GB-limit, access-from-anywhere email service could be mailbombed? Live with it. If Google "decides" anything more about our emails, we put on our tinfoil hats and scream. If we broadcast a bogus email address, obtained from Gmail for clearly sinister purposes, and it gets mailbombed, we whine that Google doesn't "protect" us. What's the story, or are we all just schizophrenic?

Don't want that "vulnerability"? Don't use Gmail!

--
I want to delete my account but Slashdot doesn't allow it.
1. Re:whining? by supersnail · 2004-06-14 02:47 · Score: 5, Insightful
  
  I don't think it's about protection, just practicality. Google offers a spam filter; the little pratt tested it and found it wanting.
  
  I think it's more of a problem for Google than the end users. The whole Gmail "get a gigabyte of memory free" business model is predicated on most people using only a small fraction of that gigabyte but feeling good about the capacity being there. If I open up a Gmail account, get p*ss*d off with the spam and go elsewhere without closing the account, the 1GB will fill up with spam in a couple of months, and Google will end up storing terabytes of spam for customers who no longer use the service.
  
  --
  Old COBOL programmers never die. They just code in C.
2. Re:whining? by Pharmboy · 2004-06-14 02:55 · Score: 5, Insightful
  
  Now you're complaining...
  
  That is his JOB, to point out shortcomings of the system. He is a tester, and he is doing it for FREE. Google doesn't want testers who get 3 emails a day, they want people to test the living shit out of the service and point out what is wrong with it. Everyone knows Google will try to fix all the bugs, so all the press, good or bad, is still good press.
  
  If Google barfs when handling 999 messages in 4 minutes during testing, imagine what happens when several million people have Gmail accounts. Fortunately, Google now has an event to look at to see what the problem is. When you are trying to harden a system, YOU MUST BREAK IT OVER AND OVER AGAIN, to see where it is weak. This is what is happening.
  
  My impression is that the techs at Google are spending a significant amount of time saying "oh shit, never thought of that, cool," which is the ENTIRE REASON FOR TESTING. They can't think of every situation by themselves. This is also the entire concept behind "open software is more secure". Google's Gmail is going to have bugs at this stage and lots of them, period. Google knows this, hell, everyone knows this (this is why it's in testing, and not open to the public yet, duh).
  
  It's not whining, it's stating the facts, which Google obviously WANTS him to gather, as a TESTER. Seems to me that he is going beyond the call of duty to test their servers, since he is spending a fair amount of his own time.
  
  --
  Tequila: It's not just for breakfast anymore!
3. Re:whining? by StrongAxe · 2004-06-14 15:07 · Score: 2, Insightful
  
  Good point about the problem of abandoned accounts, which won't bring Google any ad revenue. Wouldn't be surprised if they start euthanizing inactive accounts.
  
  Both Yahoo and Hotmail automatically close and erase free mail accounts that are inactive for 30 days. I wouldn't be surprised if most other free email services had similar policies.
If this guy has used 30% of his capacity... by Dagny Taggert · 2004-06-14 02:38 · Score: 3, Insightful

...how many e-mails has he received in total? I've kept spam for six months before and it totaled less than 100MB...and I get a cubic buttload of crap daily.

--
Don't be a looter...and yes, I know that it's spelled with an "A" instead of an "E".
gmail still beta by ryen · 2004-06-14 02:39 · Score: 2, Insightful

isn't gmail still in 'beta' stages? if so, isn't a review of spam filtering techniques a little premature?
Not a fair test by SWroclawski · 2004-06-14 02:40 · Score: 5, Insightful

He's not counting all the mail that Google is rejecting and not even being allowed in for further classification.
1. Re:Not a fair test by Plutor · 2004-06-14 04:09 · Score: 3, Insightful
  
  Is there any evidence that Google actually does this? I would think that would be terribly non-transparent. Auto-deleting email that it's "really sure" is spam is still dangerous. Even the best-trained Bayesian filters will have false positives sometimes. Is this just random theorizing, or does GMail really fail to deliver some emails it thinks are spam?
Re:Not that impressive by Apiakun · 2004-06-14 02:44 · Score: 5, Insightful

Don't forget that this is Google's first foray into mail software, and it is still in beta. I have so far gotten very little spam in my Gmail inbox.
What is the big deal? by Zugot · 2004-06-14 02:45 · Score: 2, Insightful

Mozilla Thunderbird or SpamAssassin will filter at least as well or even better. Is this just a test to see how quickly we can fill up Gmail's disk?

--
-- Bryan
Re:One of the best things Google/GMail could do by Anonymous Coward · 2004-06-14 02:52 · Score: 2, Insightful

Anti-Spammers have thought of this, too. Things like the Distributed Checksum Clearinghouses have "fuzzy" matching.

Google also has enough computer power to generate some sort of Bayesian filter to catch the most common spam system wide, and even a personalized filter on each account to catch the rest.
Re:One of the best things Google/GMail could do by jefe7777 · 2004-06-14 03:34 · Score: 5, Insightful

>> You think they bother?

heh heh...absolutely.

100 known good addresses are worth 10,000 "who the fuck knows" addresses.

>>It's cheaper to just send mail to everyone

no it's not.

let's pretend you are a spammer, and you want to send out spam.

If you target 1 billion questionable addresses, each time a client has a new campaign, then that's 1 billion pieces you have to deliver. every time.

what if you have 1000 clients? that's 1000 billion deliveries.

do you see where this is going? if you don't KNOW WHAT A VALID EMAIL ADDRESS IS, YOU HAVE TO GUESS.

but what if the first time you send out just a "test" to those billion addresses, and then subtract the ones that bounce?

You are left with 50,000 known good addresses.

that's gold. You now have 1/20,000th of the load, and you are now serving your clients quicker, with a helluva lot less load. you are only using an open relay for 1/20,000th of the time.

overall a smaller footprint, 1/20,000th the size.

you tell me. does it make sense to blindly blast out email?
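
The arithmetic in this comment can be checked with a quick sketch. The figures (1 billion raw addresses, 50,000 that survive the bounce test, 1000 clients) come from the comment; the function name and the assumption of one campaign per client are illustrative only.

```python
# Rough delivery-count comparison using the figures quoted in the comment.
# Assumption: each client runs one campaign and every address gets one message.

RAW_ADDRESSES = 1_000_000_000   # unverified list
GOOD_ADDRESSES = 50_000         # left after removing the bounces
CLIENTS = 1000


def deliveries(addresses: int, campaigns: int) -> int:
    """Total messages sent when every campaign goes to every address."""
    return addresses * campaigns


blind = deliveries(RAW_ADDRESSES, CLIENTS)
# Verified list: one test blast to the raw list, then only the good addresses.
verified = RAW_ADDRESSES + deliveries(GOOD_ADDRESSES, CLIENTS)

print(f"blind blasting: {blind:,} deliveries")
print(f"verified list:  {verified:,} deliveries")
print(f"per-campaign load ratio: 1/{RAW_ADDRESSES // GOOD_ADDRESSES:,}")
```

Even counting the initial test blast, the verified list comes out far cheaper once there is more than a handful of campaigns.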
Re:About spam and blocking by stevesliva · 2004-06-14 03:45 · Score: 2, Insightful

I've found whitelists, combined with treating everything as junk, to be far more useful than blacklists.

--
Who do you get to be an expert to tell you something's not obvious? The least insightful person you can find? -J Roberts
Re:One of the best things Google/GMail could do by letxa2000 · 2004-06-14 03:56 · Score: 5, Insightful

Spammer is trying to do two things: 1. break any Bayesian filter used on that mail server/inbox. Adding noise to the filter will allow more mail through as "questionable". This might still be tagged as spam, but not as readily as it would be without the added noise

Except that won't work, as anyone that understands Bayesian filtering will tell you. In the case of every message with "random words" I've checked recently, the random words actually increased the spam score of that message. Why? Because it seems the random words aren't so random and either the same spammer is using the same "random words" over and over or various spammers are using sets of the same words. Over time most of the "random words" they use actually become great indicators of spam since my real email doesn't typically contain the random words they use.

In one recent analysis, 10 random words were inserted by the spammer. He got lucky and 1 of those words actually had a very low score for my Bayesian corpus. Unfortunately (for him), the other 9 words had scores of 99.99%! His use of random words literally nuked any possibility of him getting through my filter.

Anyway, random words will not help spammers get through Bayesian filters. But it seems that many people (both spammers and non-spammers) think it will. But, hey, that's good for me: as long as "random words" is seen by spammers as a viable solution to Bayesian filters, my Bayesian filter will continue to work and will not have to deal with any innovative way to get around the filter (if any exists).
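
A minimal sketch of why those ten words backfired, assuming the filter combines per-word spam probabilities with the naive Bayes rule popularized by Paul Graham's "A Plan for Spam" (the comment does not say which filter is in use). The 0.9999 score is from the comment; the 0.01 for the one "lucky" word is an assumed stand-in for "very low".

```python
from math import prod


def combined_spam_probability(word_scores: list[float]) -> float:
    """Combine per-word spam probabilities, treating words as independent."""
    spam = prod(word_scores)
    ham = prod(1.0 - p for p in word_scores)
    return spam / (spam + ham)


# Nine "random" words the filter already associates with spam, one lucky word.
# The 0.01 value is an assumption; the comment only says "very low".
scores = [0.9999] * 9 + [0.01]
result = combined_spam_probability(scores)
print(f"combined spam probability: {result:.6f}")
```

Under this rule a single low-scoring word cannot outweigh nine words that the filter has already learned to treat as near-certain spam.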
Paid yahoo is better by Avumede · 2004-06-14 04:03 · Score: 2, Insightful

I pay the $20 for extra Yahoo email, and I have to say that their spam filtering is much better than gmail's right now. I have about 10 spams a day to clear out of gmail, where with Yahoo it's more like 1, often 0.

People that don't pay for Yahoo don't seem to get such good spam filtering, though.

Google can definitely do better.
Calculations? by haxor.dk · 2004-06-14 04:08 · Score: 2, Insightful

So, in less than a month, he has received in excess of 300 megabytes of useless junk?

I think somebody needs to recalculate exactly how much bandwidth goes to waste because of this spam plague. The cost in global comms traffic must be staggering!
More focus on false positives. by ron_ivi · 2004-06-14 04:14 · Score: 5, Insightful

Reviews of spam filters always seem to focus on how much stuff they block.
The consequences of blocking a non-spam email (a parent not hearing from a kid, the customer that would have saved your startup) are so much worse than a spam getting in that I wish the spam filter reviews would focus on those.
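
To make the distinction concrete, here is a small sketch of the two numbers a review could report side by side. All counts are made up for illustration; none of them come from the Gmail test.

```python
def filter_rates(spam_caught: int, spam_missed: int,
                 ham_blocked: int, ham_delivered: int) -> tuple[float, float]:
    """Return (spam catch rate, false positive rate) for one filter run."""
    catch_rate = spam_caught / (spam_caught + spam_missed)
    false_positive_rate = ham_blocked / (ham_blocked + ham_delivered)
    return catch_rate, false_positive_rate


# Hypothetical run: 10,000 spams and 500 legitimate messages.
catch, fp = filter_rates(spam_caught=9_900, spam_missed=100,
                         ham_blocked=5, ham_delivered=495)
print(f"spam caught: {catch:.1%}, legitimate mail blocked: {fp:.1%}")
```

A review that quotes only the first number hides the second, which is the one that costs you real mail.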
Re:One of the best things Google/GMail could do by wickidpisa · 2004-06-14 04:52 · Score: 3, Insightful

It may not increase false negatives, but it has a decent chance of increasing false positives, which is a much greater problem. My best guess is that spammers are hoping that once enough random words are classified as spam words, real emails with those words will start being classified as spam. If they can force enough false positives, people will start turning off Bayesian filtering.
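
The worry can be illustrated with a per-word score of the kind used by Graham-style Bayesian filters. This is a simplified sketch: the formula leaves out the smoothing and weighting that real filters apply, and the example word and all counts are hypothetical.

```python
def word_spam_score(spam_hits: int, spam_total: int,
                    ham_hits: int, ham_total: int) -> float:
    """Estimate P(spam | word) from how often the word appears in each corpus."""
    spam_freq = spam_hits / spam_total
    ham_freq = ham_hits / ham_total
    return spam_freq / (spam_freq + ham_freq)


# An ordinary word, say "meeting": common in real mail, rare in spam.
before = word_spam_score(spam_hits=10, spam_total=1000,
                         ham_hits=200, ham_total=1000)
# After spammers stuff it into "word salad" and those messages are trained as spam.
after = word_spam_score(spam_hits=600, spam_total=1000,
                        ham_hits=200, ham_total=1000)
print(f"before: {before:.2f}, after: {after:.2f}")
```

Whether this works in practice depends on the spammers picking words that actually show up in the victim's legitimate mail, which is the point the parent comment disputes.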
Re:New spin on the "word salad" strategy by mibus · 2004-06-14 15:01 · Score: 2, Insightful

Anyone else having trouble with these spams?

Surely it's the people who aren't having this problem that you want to hear from - they're the ones with good spam filtering ;-)