Gmail Spam Filter Testing


Posted by Hemos on Monday June 14, 2004 @02:35AM from the send-the-mail-in dept.

An anonymous reader writes "What can you do with 1000MB of e-mail space on your Gmail account? One guy, by the name of Aaron Pratt ( prattboy@gmail.com ), has decided to test the spam filters of Google's Gmail service by having his Gmail account blasted with every kind of spam imaginable. He is testing to see how well Gmail's spam filters can sort out the spam from legitimate email (yes, he does get personal emails from people). As of May 25th, he was at about 30% of his Gmail account's 1GB capacity. You can track his progress on his website, http://gmail.prattboy.net (Google cache of this site: cache:gmail.prattboy.net). There is also an article about Aaron's efforts on webpronews.com."

8 of 285 comments

whining? by Gothmolly · 2004-06-14 02:38 · Score: 5, Insightful

What's Google going to do to protect its users from mail bombs?

Now you're complaining that your free, 1GB-limit, access-from-anywhere email service could be mailbombed? Live with it. If Google "decides" anything more about our emails, we put on our tinfoil hats and scream. If we broadcast a bogus email address, obtained from gmail for clearly sinister purposes, and it gets mailbombed, we whine that Google doesn't "protect" us. What's the story, or are we all just schizophrenic?

Don't want that "vulnerability"? Don't use Gmail!

--
I want to delete my account but Slashdot doesn't allow it.
1. Re:whining? by supersnail · 2004-06-14 02:47 · Score: 5, Insightful
  
  I don't think it's about protection, just practicality. Google offers a SPAM filter; the little pratt tested it and found it wanting.
  
  I think it's more of a problem for Google than for the end users. The whole Gmail "get a gigabyte of memory free" business model is predicated on most people using only a small fraction of that gigabyte but feeling good about the capacity being there. If I open a Gmail account, get p*ss*d off with the spam, and go elsewhere without closing the account, the 1GB will fill up with spam in a couple of months, and Google will end up storing terabytes of spam for customers who no longer use the service.
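The "couple of months" estimate depends on how much spam an abandoned address attracts and how large each message is, and the post gives neither figure. A minimal sketch of the arithmetic, assuming a hypothetical `days_to_fill` helper and made-up rates (2,000 spams a day at 8 KB each):

```python
def days_to_fill(quota_mb: float, spam_per_day: int, avg_spam_kb: float) -> float:
    """Days until an unattended mailbox of quota_mb is full of spam alone."""
    quota_kb = quota_mb * 1024               # quota in kilobytes
    daily_kb = spam_per_day * avg_spam_kb    # spam volume arriving per day
    return quota_kb / daily_kb


if __name__ == "__main__":
    # Hypothetical abandoned account; these rates are assumptions, not data.
    print(days_to_fill(1000, 2000, 8))  # 64.0
```

Halve the spam rate and the fill time doubles, so the estimate is only as good as the assumed inflow.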
  
  --
  Old COBOL programmers never die. They just code in C.
2. Re:whining? by Pharmboy · 2004-06-14 02:55 · Score: 5, Insightful
  
  Now you're complaining...
  
  That is his JOB, to point out shortcomings of the system. He is a tester, and he is doing it for FREE. Google doesn't want testers who get 3 emails a day, they want people to test the living shit out of the service and point out what is wrong with it. Everyone knows Google will try to fix all the bugs, so all the press, good or bad, is still good press.
  
  If Google barfs when handling 999 messages in 4 minutes during testing, imagine what happens when several million people have gmail accounts. Fortunately, Google now has an event to look at to find out what the problem is. When you are trying to harden a system, YOU MUST BREAK IT OVER AND OVER AGAIN, to see where it is weak. This is what is happening.
  
  My impression is that the techs at Google are spending a significant amount of time saying "oh shit, never thought of that, cool," which is the ENTIRE REASON FOR TESTING. They can't think of every situation by themselves. This is also the entire concept behind "open software is more secure". Google's gmail is going to have bugs at this stage, and lots of them, period. Google knows this; hell, everyone knows this (this is why it's in testing, and not open to the public yet, duh).
  
  It's not whining, it's stating the facts, which Google obviously WANTS him to gather, as a TESTER. Seems to me that he is going beyond the call of duty to test their servers, since he is spending a fair amount of his own time.
  
  --
  Tequila: It's not just for breakfast anymore!
Not a fair test by SWroclawski · 2004-06-14 02:40 · Score: 5, Insightful

He's not counting all the mail that Google is rejecting and not even being allowed in for further classification.
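To make the objection concrete: a catch rate computed only over mail Google accepted can differ from the rate over all spam sent, because mail refused up front never appears in the recipient's mailbox at all. A minimal sketch with a hypothetical `catch_rates` helper and invented counts:

```python
def catch_rates(rejected_at_smtp: int, filtered_to_spam: int,
                reached_inbox: int) -> tuple[float, float]:
    """Return (observed, overall) spam catch rates.

    observed: share of *accepted* spam the filter moved to the spam folder,
              which is all a recipient can measure from his own mailbox.
    overall:  share of *all* spam sent that never reached the inbox, counting
              mail refused before it was ever classified.
    """
    accepted = filtered_to_spam + reached_inbox
    observed = filtered_to_spam / accepted
    overall = (rejected_at_smtp + filtered_to_spam) / (rejected_at_smtp + accepted)
    return observed, overall


# Invented counts for illustration only.
observed, overall = catch_rates(rejected_at_smtp=600, filtered_to_spam=300, reached_inbox=100)
print(f"observed {observed:.0%}, overall {overall:.0%}")  # observed 75%, overall 90%
```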
Re:Not that impressive by Apiakun · 2004-06-14 02:44 · Score: 5, Insightful

Don't forget that this is google's first foray into mail software, and it is still in beta. I have so far gotten very little spam in my gmail inbox.
Re:One of the best things Google/GMail could do by jefe7777 · 2004-06-14 03:34 · Score: 5, Insightful

>> You think they bother?

heh heh...absolutely.

100 known good addresses are worth 10,000 "who the fuck knows" addresses.

>>It's cheaper to just send mail to everyone

no it's not.

let's pretend you are a spammer, and you want to send out spam.

If you target 1 billion questionable addresses, each time a client has a new campaign, then that's 1 billion pieces you have to deliver. every time.

what if you have 1000 clients? that's 1000 billion deliveries.

do you see where this is going? if you don't KNOW WHAT A VALID EMAIL ADDRESS IS, YOU HAVE TO GUESS.

but what if, the first time, you send just a "test" to those billion addresses and then subtract the ones that bounce?

You are left with 50,000 known good addresses.

that's gold. You now have 1/20th of the load, and you are serving your clients quicker with a helluva lot less load. you are only using an open relay 1/20th of the time.

overall, a footprint 1/20th the size.

you tell me. does it make sense to blindly blast out email?
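The delivery counts above can be checked with the post's own figures (1 billion guessed addresses, 1000 clients, 50,000 survivors). This sketch assumes one "test" pass over the full list followed by one campaign per client to the cleaned list; `deliveries` is a hypothetical helper. Note that 50,000 out of 1 billion is 1/20,000 of the original list, so the "1/20th" ratio would correspond to 50 million surviving addresses.

```python
def deliveries(addresses: int, campaigns: int) -> int:
    """Messages sent when every campaign goes to the whole list."""
    return addresses * campaigns


GUESSED = 1_000_000_000  # "1 billion questionable addresses" (from the post)
CLIENTS = 1_000          # "1000 clients", one campaign each (from the post)
SURVIVORS = 50_000       # addresses left after the bounce test (from the post)

# Strategy 1: blast the full guessed list for every campaign.
blind = deliveries(GUESSED, CLIENTS)
# Strategy 2: one "test" pass over the full list, then campaigns to survivors only.
cleaned = GUESSED + deliveries(SURVIVORS, CLIENTS)

print(f"{blind:,}")          # 1,000,000,000,000
print(f"{cleaned:,}")        # 1,050,000,000
print(GUESSED // SURVIVORS)  # 20000
```

Even counting the up-front test pass, the cleaned-list strategy sends roughly 1/950th as many messages across 1000 campaigns, which supports the poster's point whichever survivor count he meant.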
Re:One of the best things Google/GMail could do by letxa2000 · 2004-06-14 03:56 · Score: 5, Insightful

Spammer is trying to do two things: 1. break any Bayesian filter used on that mail server/inbox. Adding noise to the filter will allow more mail through as "questionable". This might still be tagged as spam, but not as readily as it would be without the added noise
Except that won't work, as anyone who understands Bayesian filtering will tell you. In the case of every message with "random words" I've checked recently, the random words actually increased the spam score of that message. Why? Because it seems the random words aren't so random: either the same spammer is using the same "random words" over and over, or various spammers are using sets of the same words. Over time most of the "random words" they use actually become great indicators of spam, since my real email doesn't typically contain them.
In one recent analysis, 10 random words were inserted by the spammer. He got lucky and one of those words actually had a very low score in my Bayesian corpus. Unfortunately (for him), the other 9 words had scores of 99.99%! His use of random words literally nuked any possibility of him getting through my filter.
Anyway, random words will not help spammers get through Bayesian filters. But it seems that many people (both spammers and non-spammers) think it will. But, hey, that's good for me: as long as "random words" is seen by spammers as a viable solution to Bayesian filters, my Bayesian filter will continue to work and will not have to deal with any innovative way to get around the filter (if any exists).
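The effect described above falls out of the usual naive-Bayes combination of per-word probabilities, the approach popularized by Paul Graham's "A Plan for Spam". The poster doesn't say which filter he runs, so this is a sketch under assumptions: it combines every word (Graham-style filters typically use only the most "interesting" ones), assumes equal priors, and gives unseen words a neutral 0.5. The 0.01 score for the one lucky word is made up; the 99.99% scores come from the post.

```python
import math


def combined_spam_probability(word_probs: list[float]) -> float:
    """Naive-Bayes combination of per-word spam probabilities.

    Equivalent to prod(p) / (prod(p) + prod(1 - p)) with equal priors,
    computed in log-odds so many extreme values don't underflow.
    """
    log_odds = 0.0
    for p in word_probs:
        p = min(max(p, 1e-6), 1 - 1e-6)  # clamp so log() stays finite
        log_odds += math.log(p) - math.log(1 - p)
    # Numerically stable logistic function: 1 / (1 + e^-log_odds).
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)


# The post's case: ten inserted words, one scoring low (0.01 is a made-up
# value) and nine that the poster's corpus rates at 99.99% spam.
reused_noise = [0.01] + [0.9999] * 9
# Hypothetical contrast: ten genuinely novel words, each scored a neutral 0.5.
novel_noise = [0.5] * 10

print(combined_spam_probability(reused_noise))  # effectively 1.0
print(combined_spam_probability(novel_noise))   # 0.5
```

Truly novel words add nothing to the log-odds, while words a spammer keeps reusing accumulate spammy scores and push the message further toward spam.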
More focus on false positives. by ron_ivi · 2004-06-14 04:14 · Score: 5, Insightful

Reviews of spam filters always seem to focus on how much stuff they block.
The consequences of blocking a non-spam email (a parent not hearing from a kid, the customer who would have saved your startup) are so much worse than a spam getting in that I wish spam filter reviews would focus on false positives.
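One way to put false positives front and center is to score a filter by a weighted cost, where a blocked legitimate email counts for more than a missed spam. A minimal sketch; the `FilterResult` class, the two filters' counts, and the 100:1 cost ratio are all hypothetical:

```python
from dataclasses import dataclass


@dataclass
class FilterResult:
    """Confusion-matrix counts for one spam filter on one test mailbox."""
    spam_caught: int    # spam moved to the spam folder (true positives)
    spam_missed: int    # spam that reached the inbox (false negatives)
    ham_blocked: int    # legitimate mail moved to spam (false positives)
    ham_delivered: int  # legitimate mail that reached the inbox (true negatives)

    def spam_catch_rate(self) -> float:
        return self.spam_caught / (self.spam_caught + self.spam_missed)

    def false_positive_rate(self) -> float:
        return self.ham_blocked / (self.ham_blocked + self.ham_delivered)

    def weighted_cost(self, fp_cost: float, fn_cost: float = 1.0) -> float:
        """Charge fp_cost per blocked real email and fn_cost per missed spam."""
        return self.ham_blocked * fp_cost + self.spam_missed * fn_cost


# Hypothetical test: 10,000 spams and 1,000 legitimate messages.
aggressive = FilterResult(spam_caught=9_900, spam_missed=100, ham_blocked=10, ham_delivered=990)
cautious = FilterResult(spam_caught=9_500, spam_missed=500, ham_blocked=1, ham_delivered=999)

for fp_cost in (1, 100):
    print(fp_cost, aggressive.weighted_cost(fp_cost), cautious.weighted_cost(fp_cost))
```

With equal costs the filter that catches more spam wins (110 vs 501); once a false positive is weighted at 100 missed spams, the more conservative filter comes out ahead (1100 vs 600).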