Gmail Spam Filter Testing
An anonymous reader writes "What can you do with 1000MB of e-mail space on your Gmail account? One guy, by the name of Aaron Pratt ( prattboy@gmail.com ), has decided to test the spam filters of Google's Gmail service by having his Gmail account blasted with every kind of spam imaginable. He is testing to see how well Gmail's spam filters can sort out the spam from legitamate email (yes, he does get personal emails from people). As of May 25th, he was at about 30% of his Gmail account's 1GB capacity. You can track his progress on his website, http://gmail.prattboy.net (Google cache of this site: cache: gmail.prattboy.net). Here is also an article talking about Aaron's efforts from webpronews.com"
Psh... I've done this to my friends before. They didn't need to make a website to ask for it...
-------
"In times of universal deceit, telling the truth becomes a revolutionary act."
-- George Orwell
NMG
Google could use the Gmail data to operate a checksum blacklist. Obviously, if thousands (or millions) of their users are getting the exact same email, it's probably spam.
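A minimal sketch of that checksum-blacklist idea follows. It assumes a hypothetical shared store that sees message bodies across many accounts, and it uses an arbitrary threshold of distinct recipients. Nothing here describes how Gmail actually works. Each body is hashed with MD5, and the store counts how many different users received that exact digest:

```python
import hashlib
from collections import defaultdict


class ChecksumBlacklist:
    """Hypothetical sketch: flag a body once enough distinct users have received it."""

    def __init__(self, threshold: int = 1000) -> None:
        # Assumed threshold: number of distinct recipients that makes a body "bulk".
        self.threshold = threshold
        # MD5 digest of a body -> set of recipients who got exactly that body
        self.recipients: defaultdict[str, set[str]] = defaultdict(set)

    @staticmethod
    def digest(body: str) -> str:
        # Exact checksum: changing even one byte of the body yields a new digest.
        return hashlib.md5(body.encode("utf-8")).hexdigest()

    def observe(self, recipient: str, body: str) -> None:
        self.recipients[self.digest(body)].add(recipient)

    def is_bulk(self, body: str) -> bool:
        # .get() does not create an entry for digests we have never seen
        return len(self.recipients.get(self.digest(body), set())) >= self.threshold


blacklist = ChecksumBlacklist(threshold=3)
for user in ["a@example.com", "b@example.com", "c@example.com"]:
    blacklist.observe(user, "Buy cheap meds now!")

print(blacklist.is_bulk("Buy cheap meds now!"))   # True
print(blacklist.is_bulk("Buy cheap meds now!!"))  # False: one extra byte, new checksum
```

The code counts distinct recipients rather than raw messages, so one user who receives the same message repeatedly does not trip the threshold. The last line shows the obvious weakness of an exact checksum.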
... to the entire Slashdot community! Now he's going to be flooded with all sorts of spam and shit. LOL!
:)
Oh... right.
What's Google going to do to protect its users from mail bombs?
Now you're complaining that your free, 1GB-limit, access-from-anywhere email service could be mailbombed? Live with it. If Google "decides" anything more about our emails, we put on our tinfoil hats and scream. If we broadcast a bogus email address, obtained from Gmail for clearly sinister purposes, and it gets mailbombed, we whine that Google doesn't "protect" us. What's the story, or are we all just schizophrenic?
Don't want that "vulnerability"? Don't use Gmail!
I want to delete my account but Slashdot doesn't allow it.
...how many e-mails has he received in total? I've kept spam for six months before and it totaled less than 100MB...and I get a cubic buttload of crap daily.
Don't be a looter...and yes, I know that it's spelled with an "A" instead of an "E".
He's not counting all the mail that Google is rejecting and not even being allowed in for further classification.
Can anyone provide a link or source to the kind of filters google has working on gmail?
Let's keep in mind that patents are in place to keep lawyers employed and keep them litigating. -CatGrep
Let's all send him an email and ask him how it's working out.
Best Windows Freeware
"Here is also an article talking about Aaron's efforts from webpronews.com"
Since we are talking about spam and obtaining more spam, I don't know if I should read the site the article is on as "web pro news dot com" or "web pron ews dot com"...
I guess I'll figure it out sometime.
Seems like Gmail only filters approx. 50% of spam. That is not very impressive, since the top anti-spam software and e-mail clients (such as Outlook 2003 and Mozilla Thunderbird) can easily reach 95% accuracy in spam filtering.
I am starting to second-guess whether I should transfer everything to my Gmail account.
>isn't gmail still in 'beta' stages? if so, isn't a review of
>spam filtering techniques a little premature?
What part of Beta TEST escapes you here?
Just because you're paranoid doesn't mean they aren't out to get you
The guy who got booted off AventureMail (2GB free) for trying to test their spam filters? The story is on Kuro5hin, if anyone wants to see it.
I did some testing of my own. I forwarded a ton of spam from my personal account to my Gmail account, just to see what would get through and what would be filtered. For me, Gmail was really effective, but strangely, one Nigerian scam e-mail didn't get tagged.
:)
It was from "Mr Jubril Udeh Manager of Credit and Accounts Department of North Atlantic Securities Sarls Lome-Togo Republic."
Now, the funny part is not that the mail made it through, but that Google also decided to show me contextual ads on that account. Currently, the ads are:
- Payroll Cards a Poor Substitute for Checking Account
- Tips for Tackling Check Fraud
- Sophos hoax description: Ethiopian airline letter
- FAP non-US Investment FAQs
In the past the mail has also shown me ads on how to open an offshore bank account. I'm glad Google is willing to help me with the $10.5 million that I'm about to receive!
- "When you want something with all your heart, the entire universe conspires to give it to you" -Paulo Coelho
Checksums are nearly useless against spam. Changing a single byte changes the checksum value, and probably more than 90% of spam contains a personalization code used to check which addresses are functional. Different code = different checksum.
This doesn't mean it wouldn't be possible to create a system which would automatically detect individual spam messages based on tagging known spam, you just have to be smarter about the detection than just plain MD5ing the email body.
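Here is one way to be smarter than plain MD5, sketched under loud assumptions. It lowercases the text and drops every token that contains a digit, on the guess that personalization codes look like "A1B2C3". It then compares overlapping 3-word "shingles" with Jaccard similarity instead of an exact checksum. This is a swapped-in, well-known near-duplicate technique (w-shingling), not a description of any real provider's filter. The 0.8 threshold and k=3 are arbitrary:

```python
import re


def tokens(text: str) -> list[str]:
    # Lowercase word tokens. Any token containing a digit is dropped, on the
    # assumption that per-recipient tracking codes are alphanumeric strings.
    return [t for t in re.findall(r"[a-z0-9]+", text.lower())
            if not any(ch.isdigit() for ch in t)]


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    # Set of overlapping k-word sequences ("w-shingles").
    toks = tokens(text)
    if len(toks) < k:
        return {tuple(toks)} if toks else set()
    return {tuple(toks[i:i + k]) for i in range(len(toks) - k + 1)}


def jaccard(a: set, b: set) -> float:
    # len(A & B) / len(A | B); two empty sets are treated as identical.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def matches_known_spam(body: str, known_spam: list[str], threshold: float = 0.8) -> bool:
    body_shingles = shingles(body)
    return any(jaccard(body_shingles, shingles(s)) >= threshold for s in known_spam)


known = ["Dear friend, claim your prize today. Reference code A1B2C3"]
print(matches_known_spam("Dear friend, claim your prize today. Reference code Z9Y8X7", known))  # True
print(matches_known_spam("Lunch tomorrow at noon?", known))  # False
```

The two prize messages differ only in their tracking code, so they would get different MD5 digests but identical shingle sets. At the scale of millions of mailboxes, a system would more likely use MinHash or locality-sensitive hashing than compare each message against every known spam.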
"Although it is not true that all conservatives are stupid, it is true that most stupid people are conservative."
While we cannot block every domain name (e.g. if you get spam from $#(*$#sexphreak@yahoo.com), because that would alienate your legitimate contacts, there are many domain names that we can block (e.g. @spam-your-gmail.com). Yahoo provides email/domain name blocking, but limits this to 100 entries (unless you are paying). Do we know if Gmail will have this limitation? /., not me :)
-A
*Just for those who didn't know, the above domain names and email accounts are random; any resemblance to an actual domain or email account is purely coincidental, and if you choose to do so, you should sue
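A hypothetical sketch of the per-user blocklist described above. It blocks whole domains written as "@domain" and single addresses on shared domains, with a cap on entries. The default cap of 100 mirrors the free Yahoo limit mentioned in the comment. Whether Gmail has any such limit is unknown, and the class and method names are made up for illustration:

```python
class SenderBlocklist:
    """Hypothetical sketch of a per-user sender blocklist with a cap on entries."""

    def __init__(self, max_entries: int = 100) -> None:
        # 100 mirrors the free-tier Yahoo limit mentioned above; Gmail's limit (if any) is unknown.
        self.max_entries = max_entries
        self.domains: set[str] = set()
        self.addresses: set[str] = set()

    def add(self, entry: str) -> None:
        # "@example.com" blocks a whole domain; "user@example.com" blocks one address.
        entry = entry.strip().lower()
        if entry.startswith("@"):
            target, value = self.domains, entry[1:]
        else:
            target, value = self.addresses, entry
        if value in target:
            return  # already blocked, so it does not use up another slot
        if len(self.domains) + len(self.addresses) >= self.max_entries:
            raise ValueError(f"blocklist is full ({self.max_entries} entries)")
        target.add(value)

    def is_blocked(self, sender: str) -> bool:
        sender = sender.strip().lower()
        domain = sender.rpartition("@")[2]
        return sender in self.addresses or domain in self.domains


blocklist = SenderBlocklist()
blocklist.add("@spam-your-gmail.com")        # a domain with no legitimate senders
blocklist.add("$#(*$#sexphreak@yahoo.com")   # one address on a shared domain
print(blocklist.is_blocked("offers@spam-your-gmail.com"))  # True
print(blocklist.is_blocked("friend@yahoo.com"))            # False: yahoo.com as a whole stays open
```

Blocking the single Yahoo address instead of `@yahoo.com` keeps legitimate Yahoo contacts reachable. The cost is that every such address uses up one of the limited slots.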
I mod down so you can mod up. You're welcome.
I have subjected my e-mail address, afriguru@gmail.com, to the same abuse by redirecting all e-mail addresses that receive lots of junk mail to this one and posting the address unprotected to lots of websites and newsgroups. At first, a lot of 419 scam mails got through, but now I hardly get any spam. No false positives for me so far.
_____________________
Seun Osewa, Abeokuta Nigeria
Right! My only idea is that Google's technology is so advanced, it filters messages before they are even sent. It's gotta be a result of faster-than-light calculations. Boy, I'm gonna buy me some stock.
NMG
>legitamate
How about having Slashdot editors/Hemos test the gmail spell checker too?
Spam is unsolicited, so Google should filter none of his mail.
This guy solicited it.
Did anybody else notice that his site hasn't been updated in almost a month (since May 25)? It seems his project is no longer running. I wonder if Google booted him.
KevG
For those of you who don't have Gmail yet, there is a little "Report Spam" button you can use to, well, report spam. When Gmail gets a few million users, and even 1% of them use this little button, you are going to see the spam detection rate skyrocket.
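A hypothetical sketch of how "Report Spam" clicks from many users could be pooled follows. It keys the clicks on a message fingerprint, for example a near-duplicate hash of the body. The two thresholds are assumptions, not Gmail's settings. `min_reports` keeps a few stray or malicious clicks from filtering a message, and `min_report_rate` is the fraction of recipients who must click:

```python
from collections import Counter


class SpamReportPool:
    """Hypothetical sketch: pool "Report Spam" clicks from many users per message fingerprint."""

    def __init__(self, min_reports: int = 50, min_report_rate: float = 0.001) -> None:
        # Both thresholds are assumptions chosen for illustration.
        self.min_reports = min_reports          # ignore a handful of stray clicks
        self.min_report_rate = min_report_rate  # fraction of recipients who must click
        self.delivered: Counter[str] = Counter()
        self.reported: Counter[str] = Counter()

    def record_delivery(self, fingerprint: str) -> None:
        self.delivered[fingerprint] += 1

    def record_report(self, fingerprint: str) -> None:
        self.reported[fingerprint] += 1

    def should_filter(self, fingerprint: str) -> bool:
        reports = self.reported[fingerprint]      # Counter returns 0 for unseen keys
        deliveries = self.delivered[fingerprint]
        if deliveries == 0 or reports < self.min_reports:
            return False
        return reports / deliveries >= self.min_report_rate


# Roughly the comment's scenario: a million recipients, 1% of whom click the button.
pool = SpamReportPool()
pool.delivered["campaign-42"] = 1_000_000   # "campaign-42" is a made-up fingerprint
pool.reported["campaign-42"] = 10_000
print(pool.should_filter("campaign-42"))  # True
```

Pooling is what makes the button powerful at scale. A 1% click rate is small for any one user but produces thousands of reports across a large user base.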
No, you inverted it. You want MB/message, not messages/MB.
3778 messages / 213 MB = 17.74 messages / MB
213 MB / 3778 messages = 0.0564 MB / message
So that's pretty reasonable.
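Here is the same division worked both ways in Python, using the figures from the comment. The KB conversion assumes decimal units (1 MB = 1000 KB):

```python
messages = 3778
megabytes = 213

messages_per_mb = messages / megabytes  # the inverted ratio
mb_per_message = megabytes / messages   # the ratio you actually want

print(f"{messages_per_mb:.2f} messages/MB")  # 17.74 messages/MB
print(f"{mb_per_message:.4f} MB/message")    # 0.0564 MB/message
# Assuming decimal units (1 MB = 1000 KB):
print(f"{mb_per_message * 1000:.1f} KB/message")  # 56.4 KB/message
```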
Do you really expect the Google servers to go down because of /.? ;)
If I could stop all the spam I get...I'd feel like a whole string quartet!
Just to get you started, I'll give you a quick hint: virtually every internet discussion on spam includes some high-and-mighty moron who claims that by not giving out his email address, he never gets spam.
The problem is that for every one of those, there are plenty more who follow the same precautions and yet get plenty of spam at those accounts for a variety of reasons. Clearly, your solution is not the answer to "how to never get spam."
A good rule for using the internet is to read a few discussions before you post. This way, you will be less likely to post something that makes you look naive. So sit back, relax, and enjoy a steaming hot cup of STFU while you read and learn!
His wang is going to be huge!
Crushing my karma one post at a time.
The consequences of blocking a non-spam email (a parent not hearing from a kid, the customer who would have saved your startup) are so much worse than a spam getting in that I wish spam filter reviews would focus on false positives.
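A sketch of a review that weighs false positives separately follows. It reports the spam catch rate and the false positive rate side by side, and then computes a cost in which each blocked legitimate message counts 100 times as much as a missed spam. That weight is an arbitrary assumption. The two filters and their counts are made up for illustration, not measured results:

```python
from dataclasses import dataclass


@dataclass
class FilterResults:
    spam_caught: int    # spam sent to the spam folder (true positives)
    spam_missed: int    # spam that reached the inbox (false negatives)
    ham_blocked: int    # legitimate mail sent to the spam folder (false positives)
    ham_delivered: int  # legitimate mail that reached the inbox (true negatives)

    def spam_catch_rate(self) -> float:
        total = self.spam_caught + self.spam_missed
        return self.spam_caught / total if total else 0.0

    def false_positive_rate(self) -> float:
        total = self.ham_blocked + self.ham_delivered
        return self.ham_blocked / total if total else 0.0

    def weighted_cost(self, fp_cost: float = 100.0, fn_cost: float = 1.0) -> float:
        # Assumed weights: one lost legitimate message hurts as much as 100 spams getting in.
        return self.ham_blocked * fp_cost + self.spam_missed * fn_cost


# Two made-up filters, each tested on 1000 spams and 1000 legitimate messages.
aggressive = FilterResults(spam_caught=950, spam_missed=50, ham_blocked=5, ham_delivered=995)
cautious = FilterResults(spam_caught=900, spam_missed=100, ham_blocked=0, ham_delivered=1000)

print(aggressive.spam_catch_rate(), aggressive.false_positive_rate())  # 0.95 0.005
print(cautious.spam_catch_rate(), cautious.false_positive_rate())      # 0.9 0.0
print(aggressive.weighted_cost(), cautious.weighted_cost())            # 550.0 100.0
```

Judged only on catch rate, the aggressive filter wins. Once lost legitimate mail is priced in, the cautious one comes out ahead.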
I have Mozilla, and it has a Bayesian spam filter. Lately, it's been getting fooled more and more. The messages that make it through have one or more of the following features:
1) Several intentionally misspelled words
2) Lots of text in white (so it's invisible or nearly invisible)
3) Message in .GIF form only, no plain text
Could you add filters that look for, say, more than 10% of the words misspelled, a font color nearly equal to the background color, or no actual text in the message? These would take effect in addition to the existing Bayes filter.
A goal is a dream with a deadline
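A minimal sketch of the three extra checks proposed above follows. These would be flags raised alongside a Bayesian score, not a replacement for it. The sketch makes several simplifying assumptions. The text and background colors are taken as already extracted from the HTML. The "dictionary" is whatever word list the caller supplies, and a real one would need to be large, because names and jargon would otherwise count as misspellings. The 10% limit comes from the comment, while the colour-distance threshold of 40 is arbitrary. All function names are hypothetical:

```python
import math
import re


def misspelled_fraction(text: str, dictionary: set[str]) -> float:
    # Share of alphabetic words not found in the supplied word list.
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    return sum(1 for w in words if w not in dictionary) / len(words)


def hex_to_rgb(color: str) -> tuple[int, int, int]:
    c = color.lstrip("#")
    return int(c[0:2], 16), int(c[2:4], 16), int(c[4:6], 16)


def color_distance(a: str, b: str) -> float:
    # Euclidean distance in RGB space: 0 for identical colors, about 441.7 for black vs white.
    return math.dist(hex_to_rgb(a), hex_to_rgb(b))


def visible_text(html: str) -> str:
    # Crude tag stripper for illustration; a real filter would use an HTML parser.
    return re.sub(r"<[^>]*>", " ", html)


def heuristic_flags(html: str, text_color: str, background_color: str,
                    dictionary: set[str], max_misspelled: float = 0.10,
                    min_color_distance: float = 40.0) -> list[str]:
    flags = []
    text = visible_text(html)
    if not re.search(r"[a-zA-Z]", text):
        flags.append("no-text")  # e.g. a message that is only an <img> of a GIF
    elif misspelled_fraction(text, dictionary) > max_misspelled:
        flags.append("misspelled")
    if color_distance(text_color, background_color) < min_color_distance:
        flags.append("hidden-text")
    return flags


words = {"buy", "cheap", "meds", "now", "see", "you", "at", "lunch"}
print(heuristic_flags('<img src="offer.gif">', "#000000", "#ffffff", words))           # ['no-text']
print(heuristic_flags("<p>Buy cheep medz nowww</p>", "#000000", "#ffffff", words))    # ['misspelled']
print(heuristic_flags("<p>See you at lunch</p>", "#fefefe", "#ffffff", words))        # ['hidden-text']
```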
I tried to do the same thing with my AventureMail account but AventureMail wasn't cool with it. They deleted my account! You can check out what little data I collected before the account suspension and read the emails to and from AventureMail about the merits of the account suspension at http://3fingersalute.net/aventuremail
FoundNews.com - get paid to blog.
Right, and my Thunderbird Bayesian filter catches all of those word salad approaches. But they've come up with a new one - what I call the "encyclopedia attack."
What they do is copy an encyclopedia entry and put it at the bottom of their spam. The thing is usually a few paragraphs long, so that textually it dominates the message. The subjects are fairly random, and are occasionally educational ;)
The problem is that the text of this doesn't trip the "too many strange words" flag that's used for word salads. My Thunderbird filter is really having trouble with these. Anyone else having trouble with these spams?
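Here is a sketch of why padding can hurt a Bayesian score, and of one known countermeasure. It is not a claim about how Thunderbird's filter is implemented. Paul Graham's "A Plan for Spam" combines per-token spam probabilities as prod(p) / (prod(p) + prod(1 - p)) and uses only the 15 tokens furthest from a neutral 0.5. The sketch assumes that encyclopedia text leans slightly "hammy" (0.4 per token here) because that vocabulary also shows up in legitimate mail. Under that assumption, scoring every token lets 200 padding words swamp four spam words, while the "most interesting tokens" rule keeps the spam words in play:

```python
import math


def combine(probs: list[float]) -> float:
    """Naive-Bayes style combination, P = prod(p) / (prod(p) + prod(1 - p)), in log space."""
    log_spam = 0.0
    log_ham = 0.0
    for p in probs:
        p = min(max(p, 0.01), 0.99)  # clamp so log() never sees 0
        log_spam += math.log(p)
        log_ham += math.log(1.0 - p)
    diff = log_ham - log_spam
    if diff > 700:  # exp() would overflow; the message is overwhelmingly ham
        return 0.0
    return 1.0 / (1.0 + math.exp(diff))


def score_all_tokens(probs: list[float]) -> float:
    return combine(probs)


def score_most_interesting(probs: list[float], n: int = 15) -> float:
    # Keep only the n tokens furthest from the neutral value 0.5.
    chosen = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:n]
    return combine(chosen)


spam_words = [0.99, 0.98, 0.97, 0.95]   # hypothetical strongly "spammy" tokens
padding = [0.4] * 200                   # encyclopedia text, assumed to lean slightly hammy
message = spam_words + padding

print(f"{score_all_tokens(message):.3f}")        # 0.000: the padding drowns out the spam words
print(f"{score_most_interesting(message):.3f}")  # 1.000
```

This only helps while the padding tokens are less extreme than the spam tokens. If the pasted text contains words that are strongly associated with a user's legitimate mail, they can still crowd the spam words out of the top 15.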