Gmail Spam Filter Testing
An anonymous reader writes "What can you do with 1000MB of e-mail space on your Gmail account? One guy, by the name of Aaron Pratt ( prattboy@gmail.com ), has decided to test the spam filters of Google's Gmail service by having his Gmail account blasted with every kind of spam imaginable. He is testing to see how well Gmail's spam filters can sort out the spam from legitamate email (yes, he does get personal emails from people). As of May 25th, he was at about 30% of his Gmail account's 1GB capacity. You can track his progress on his website, http://gmail.prattboy.net (Google cache of this site: cache: gmail.prattboy.net). Here is also an article talking about Aaron's efforts from webpronews.com"
Psh... I've done this to my friends before. They didn't need to make a website to ask for it...
-------
"In times of universal deceit, telling the truth becomes a revolutionary act."
-- George Orwell
NMG
Google should use the Gmail data to operate a checksum blacklist. Obviously, if thousands (or millions) of their users are getting the exact same email, it's probably spam.
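A minimal sketch of the checksum-blacklist idea, assuming a hypothetical in-memory store and an arbitrary threshold of 1,000 mailboxes (neither is anything Google has described): hash each message body and flag a digest once it has landed in too many distinct mailboxes.

```python
import hashlib
from collections import defaultdict

# Hypothetical threshold: how many distinct mailboxes may receive an
# identical body before it is treated as bulk mail.
THRESHOLD = 1000


class ChecksumBlacklist:
    def __init__(self, threshold: int = THRESHOLD) -> None:
        self.threshold = threshold
        # digest -> set of mailboxes that received that exact body
        self.seen: dict[str, set[str]] = defaultdict(set)

    @staticmethod
    def digest(body: str) -> str:
        return hashlib.sha256(body.encode("utf-8")).hexdigest()

    def record(self, mailbox: str, body: str) -> bool:
        """Record one delivery; return True once the body looks like bulk spam."""
        d = self.digest(body)
        self.seen[d].add(mailbox)
        return len(self.seen[d]) >= self.threshold
```

Counting distinct mailboxes rather than raw deliveries keeps one user who receives the same message repeatedly from tripping the blacklist on their own.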
... to the entire Slashdot community! Now he's going to be flooded with all sorts of spam and shit. LOL!
:)
Oh... right.
What's Google going to do to protect its users from mail bombs?
Now you're complaining that your free, 1GB-limit, access-from-anywhere email service could be mailbombed? Live with it. If Google "decides" anything more about our emails, we put on our tinfoil hats and scream. If we broadcast a bogus email address, obtained from Gmail for clearly sinister purposes, and it gets mailbombed, we whine that Google doesn't "protect" us. What's the story, or are we all just schizophrenic?
Don't want that "vulnerability"? Don't use Gmail!
I want to delete my account but Slashdot doesn't allow it.
...how many e-mails has he received in total? I've kept spam for six months before and it totaled less than 100MB... and I get a cubic buttload of crap daily.
Don't be a looter...and yes, I know that it's spelled with an "A" instead of an "E".
Isn't Gmail still in 'beta'? If so, isn't a review of its spam filtering techniques a little premature?
He's not counting all the mail that Google is rejecting and not even being allowed in for further classification.
Can anyone provide a link or source to the kind of filters google has working on gmail?
Let's keep in mind that patents are in place to keep lawyers employed and keep them litigating. -CatGrep
Let's all send him an email and ask him how it's working out.
Best Windows Freeware
"Here is also an article talking about Aaron's efforts from webpronews.com""
Since we are talking about spam and obtaining more spam, I don't know if I should read the site the article is on as "web pro news dot com" or "web pron ews dot com"...
I guess I'll figure it out sometime.
cometh to Google
And now he'll have every troll and curious person here sending him spam too.
--- [Insert interesting Sig here]
Seems like Gmail only filters approx. 50% of spam. That is not very impressive, since the top anti-spam software and e-mail clients (such as Outlook 2003 and Mozilla Thunderbird) can easily reach 95% accuracy in spam filtering.
I am starting to second-guess whether I should transfer everything to my Gmail account.
Of fighting the good fight...something parallel to that of a penguin, and a gun, and millions of tiny little Windows logos charging forth...
___ In the words of Gen. Douglas MacArthur: "I'll be right back."
If I understand what he's saying on his site, what he considers spam partially includes legitimate mailing lists, which aren't really spam even if he didn't personally sign up for them (e.g., me signing him up for the gay-porn-of-the-month club). He may not want it, but I signed him up.
to be able to reserve a name without numbers attached to it.... Damn it's going to be a race :(
I mod down so you can mod up. You're welcome.
Mozilla Thunderbird or SpamAssassin will filter at least as well, if not better. Is this just a test to see how quickly we can fill up Gmail's disks?
-- Bryan
The guy who got booted off AventureMail (2GB free) for trying to test their spam filters? The story is on Kuro5hin, if anyone wants to see it.
I wonder how many, if any, of the messages marked as spam were false positives?
I did some testing of my own. I forwarded a ton of spam from my personal account to my Gmail account, just to see what would get through and what would be filtered. For me, Gmail was really effective, but strangely, one Nigerian scam e-mail didn't get tagged.
:)
It was from " Mr Jubril Udeh Manager of Credit and Accounts Department of North Atlantic Securities Sarls Lome-Togo Republic."
Now, the funny part is not that the mail made it through, but that Google also decided to show me contextual ads on that account. Currently, the ads are:
- Payroll Cards a Poor Substitute for Checking Account
- Tips for Tackling Check Fraud
- Sophos hoax description: Ethiopian airline letter
- FAP non-US Investment FAQs
In the past the mail has also shown me ads on how to open an offshore bank account. I'm glad Google is willing to help me with the $10.5 million that I'm about to receive!
- "When you want something with all your heart, the entire universe conspires to give it to you" -Paulo Coelho
Checksums are nearly useless against spam. It takes only one changed byte to change the checksum value, and probably more than 90% of spam contains a personalization code used to check which addresses are functional. Different code = different checksum.
This doesn't mean it wouldn't be possible to create a system which would automatically detect individual spam messages based on tagging known spam, you just have to be smarter about the detection than just plain MD5ing the email body.
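One way to be "smarter than plain MD5ing" is word shingling with Jaccard similarity, swapped in here as an illustration rather than anything Gmail is known to use. The sketch below assumes two hypothetical spams that differ only in their personalization code: the MD5 digests no longer match, but the sets of overlapping three-word sequences still mostly do.

```python
import hashlib
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of overlapping k-word sequences, after lowercasing."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union (1.0 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical spams with different personalization codes.
spam_a = "Dear friend, claim your prize today at our site. Ref code 83721"
spam_b = "Dear friend, claim your prize today at our site. Ref code 83722"

md5_a = hashlib.md5(spam_a.encode()).hexdigest()
md5_b = hashlib.md5(spam_b.encode()).hexdigest()
print(md5_a == md5_b)  # False: one changed character breaks the checksum
# 9 of the 11 distinct shingles are shared, so similarity stays high.
print(jaccard(shingles(spam_a), shingles(spam_b)))
```

A production system would compare compact sketches (for example MinHash signatures) instead of full shingle sets, but the principle is the same.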
"Although it is not true that all conservatives are stupid, it is true that most stupid people are conservative."
As of May 25th, he was at about 30% of his Gmail account's 1GB capacity
30% * 1000MB= 300MB of spam? I don't think I've got half of that in my life. Maybe 100MB of spam lifetime.
So, let's do more math. Average spam message = 5 KB. Therefore, 300 MB / 5 KB = (300*1024 KB) / 5 KB = 61440 messages? Am I right? That is a whole bunch!
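Written out with the poster's assumed 5 KB average message size and 1024 KB per MB, the estimate is:

```latex
\frac{300\ \text{MB} \times 1024\ \text{KB/MB}}{5\ \text{KB/message}}
  = \frac{307{,}200\ \text{KB}}{5\ \text{KB/message}}
  = 61{,}440\ \text{messages}
```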
While we cannot block every domain name (e.g., if you get spam from $#(*$#sexphreak@yahoo.com), because that would alienate your legitimate contacts, there are many domain names that we can block (e.g., @spam-your-gmail.com). Yahoo provides e-mail/domain name blocking, but limits this to 100 entries (unless you are paying). Do we know if Gmail will have this limitation? /., not me :)
-A
*Just for those who didn't know: the above domain names and e-mail accounts are random; any resemblance to an actual domain or e-mail account is purely coincidental, and if you choose to do so, you should sue
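A minimal sketch of a capped domain blocklist like the one described above. The 100-entry default only mirrors the free-tier Yahoo limit the post mentions; it is an assumption here, not a known Gmail setting, and the class is hypothetical.

```python
class DomainBlocklist:
    """Blocks mail by sender domain, with a cap on the number of entries.

    The 100-entry default is an assumption mirroring the Yahoo limit
    mentioned in the post, not a documented Gmail setting.
    """

    def __init__(self, max_entries: int = 100) -> None:
        self.max_entries = max_entries
        self.domains: set[str] = set()

    def add(self, domain: str) -> bool:
        """Add a domain such as '@spam-your-gmail.com'; return False if full."""
        domain = domain.lower().lstrip("@")
        if domain in self.domains:
            return True
        if len(self.domains) >= self.max_entries:
            return False
        self.domains.add(domain)
        return True

    def is_blocked(self, sender: str) -> bool:
        # Compare only the part after the last "@", case-insensitively,
        # so blocking a whole domain never touches yahoo.com-style senders
        # unless that domain itself is listed.
        _, _, domain = sender.rpartition("@")
        return domain.lower() in self.domains
```

Blocking on the exact domain (not a substring) is what lets you ban @spam-your-gmail.com without also catching legitimate contacts at large free-mail providers.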
I have subjected my e-mail address, afriguru@gmail.com, to the same abuse by redirecting all the e-mail addresses that receive lots of junk mail to this one and posting the address unprotected to lots of websites and newsgroups. At the initial stage a lot of 419 scam mails got through, but now I hardly get any spam. No false positives for me so far.
_____________________
Seun Osewa, Abeokuta Nigeria
Select "create a filter". Do so with the text of the bomb.
Select all the already-archived messages it shows as matching the filter (one click).
Select "Move to trash".
Voilà.
His last week stats are:
Something is off... Unless his spams contain attachments, this says that each of his e-mails was 17 MB in size.
I mean 17.73708... This is /. after all. :)
Hmmm.
... who needs spam filters anyway?
I've had no issues with my Gmail account getting spam. As of right now, I've had it for about a month, with 50 megs of messages sent/received, and I've yet to find a single spam message in my inbox.
>legitamate
How about having Slashdot editors/Hemos test the gmail spell checker too?
And it's a Gmail account filled with spam... God help us!
May the Maths Be with you!
- Home loans and refinancing
- Proven techniques help you find a date tonight - guaranteed!
- suuuper streeeeeeetch your coock
- Drive that new car today
- Give the girl what she needs
- STRAIGHT TALK ON HAIR TRANSPLANTS
- SEXUALLY-EXPLICIT: Rise N Shine, there all here
- Your Degree by Fedex shipped
- Make your man hood work right
- Rooooock Haaaaard Ereeeectiooons In 60 Seeeeeecooooooonds
- Sexually Explicit: At home mom's nude on cams
- Free Phone Free Shipping Easy Qualify
- get the p.e. nis si. ze she wants
- Clearance on 6 MegaPixel DigiCamera
- FWD: Ciialiss quazy DISCOUNTS - this is better then viagra, $2.0
** NOTICE: all subjects taken from my Yahoo e-mail account, from e-mails received this weekend. Now you see why I run my own mail server at home? **CB
free ipod and free gmail!
i mean, not that's not obvious.... testing the spam whatever capacities of gmail? talk about lame...
but yeah, if you're thinking, woah, i know that kid. you know, the crazy punk guy who always went on about tard factories... well, that's not him.
just so you know.
-- d'arcy poirot
Somebody please tell him to report PR (precision/recall) scores instead of accuracy!!!!
and is that his girlfriend in the background?
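Assuming "PR" means precision and recall, here is a minimal sketch of how they differ from a single accuracy figure: accuracy lumps false positives (legitimate mail flagged) and false negatives (spam let through) together, while precision and recall report them separately. The counts are hypothetical, not taken from his site.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision and recall for the 'spam' class.

    tp: spam correctly flagged; fp: legitimate mail wrongly flagged;
    fn: spam that slipped into the inbox.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all messages classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical counts: 900 spam and 100 legitimate messages.
tp, fn = 850, 50  # spam caught / spam missed
fp, tn = 20, 80   # legitimate mail flagged / legitimate mail delivered
print(accuracy(tp, fp, fn, tn))  # 0.93
print(precision_recall(tp, fp, fn))
```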
Anyone know how he's pulling the numbers off the page? Is there some kind of sneaky back-end that we can get stats about our account with? Is he manually entering all this info? Or maybe some kind of "screen-scraping" techniques to pull the data off the page... hmm...
I guess because his stats are about 2-3 weeks behind, it would indicate that things are leaning towards the manual procedure...
Spam is unsolicited, so google should filter none of his mail.
This guy solicited it.
I wonder how it deals with spam from other countries... say, Korean/Chinese spam?
He should have opened a free ufie.org e-mail account.
I've found GMail's filtering to be highly effective so far. I haven't received a single message yet.
No, but seriously... I've even used my Gmail account for posting to Usenet newsgroups, and initially there was minor training to be done with the SPAM filter, but ever since then I have yet to see a single UCE.
This guy gets thousands of Spam mails without a problem, yet I can't receive a simple HTML attachment without the mail being rejected (552 Illegal Attachment). Hrmm...
Did anybody else notice that his site hasn't been updated in almost a month (May 25)? Seems his project is no longer working. I wonder if Google booted him.
KevG
For those of you that don't have Gmail yet, there is a little "Report Spam" button you can use to, well, report spam. When Gmail gets a few million users, and even 1% use this little button, you are going to see the spam detect rate skyrocket.
...or a mailing list.
Sailors. Oh man!
Has anyone gotten Gmail to work in Konqueror? I can get the login page but nothing else. I just don't understand why they don't make it compatible. If it works in Safari for OS X, then shouldn't it also work in Konqueror?
I just got an invite code and they only allow Konqueror at work. Any attempt at installing a different browser will be noticed (the admins are power crazy) and the resulting beating won't be worth it.
Don't let your email address appear in a public forum of any kind that is or can be crawled. I have employed this technique and can say with a straight face that in six months I have not received ONE piece of spam.
Sure, it's:
prattboy@gmail.com
Do you really expect the Google servers to go down because of /.? ;)
If I could stop all the spam I get...I'd feel like a whole string quartet!
Personally, I guard my gmail account as if it were more valuable than Gold.
:-)
I waited so long for the invite! And I got exactly the name I wanted, given it's so early in the system's lifespan. Now I bask in the admiration of other geeks as they receive my e-mail from gmail.
I hope they aren't going to kick us all off our accounts once the beta is over....
It works really great and Spam is not an issue for me anymore (I had more than 50 Spammails in my inbox AFTER mozilla-thunderbird filtering...)
When spamming his test account, if you send him a spam-like message and it's recognized as spam, GMail might start thinking that your email address is a spam source and suddenly you won't be able to email anyone who uses Google.
Wait, so spam counts towards the limit in GMail? That sucks horribly. It's possible that you could run out of space just from spam alone? (Although it WOULD take a while...)
It would be nice if they would do what Yahoo Mail does and have the spam folder not count toward your total.
Eh, I only got 180MB worth of email and spam out of the deal though, before I decided to delete the account. The Gmail Spam filter was rather horrible at the time; catching only the most tried and true SPAM, letting tons of other SPAM through, and then randomly flagging legitimate messages from people whom it had not flagged before. I think it has improved some since then.
http://www.sampletheweb.com
His wang is going to be huge!
Crushing my karma one post at a time.
I've been reading for months about people using gmail. When are the rest of us going to get an account?
I pay the $20 for extra Yahoo email, and I have to say that their spam filtering is much better than gmail's right now. I have about 10 spams a day to clear out of gmail, where with Yahoo it's more like 1, often 0.
People that don't pay for Yahoo don't seem to get such good spam filtering, though.
Google can definitely do better.
No, I'm not keeping proper statistics. =b
The World Wide Web is dying. Soon, we shall have only the Internet.
So, in less than a month, he has received in excess of 300 megabytes of useless junk?
I think somebody needs to recalculate exactly how much bandwidth goes to waste because of this SPAM plague. The cost in global comms traffic must be staggering!
The consequences of blocking a non-spam e-mail (a parent not hearing from a kid; the customer who would have saved your startup) are so much worse than a spam getting in that I wish spam filter reviews would focus on those.
Working on Gmail...
"Please invite me to GMail!"
I have Mozilla; it has a Bayes spam filter. Lately, it's been getting fooled more and more. The messages that make it through have one or more of the following features:
1) Several intentionally misspelled words
2) Lots of text in white (so it's invisible or nearly invisible)
3) Message in .GIF form only - no plain text.
Could you add filters that look for, say, more than 10% of the words misspelled, text font color nearly equal to the background color, or no actual text in the message? These would take effect in addition to the existing Bayes filter.
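A minimal sketch of the three proposed checks, assuming the message text and its font/background colors have already been extracted from the HTML (parsing is out of scope). The tiny dictionary and the color-distance cutoff of 40 are hypothetical placeholders; the 10% misspelling threshold comes from the post.

```python
import re

# Hypothetical tiny dictionary; a real filter would load a full word list.
KNOWN_WORDS = {"buy", "cheap", "pills", "now", "the", "best", "price", "online"}


def misspelled_ratio(text: str) -> float:
    """Fraction of words not found in KNOWN_WORDS."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    unknown = sum(1 for w in words if w not in KNOWN_WORDS)
    return unknown / len(words)


def color_distance(c1: str, c2: str) -> float:
    """Euclidean RGB distance between two '#rrggbb' colors (0 = identical)."""
    a = [int(c1[i:i + 2], 16) for i in (1, 3, 5)]
    b = [int(c2[i:i + 2], 16) for i in (1, 3, 5)]
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def heuristic_flags(text: str, font_color: str, background: str) -> list[str]:
    """Flags meant to run alongside (not instead of) the Bayes filter."""
    flags = []
    if not text.strip():
        flags.append("no-text")  # e.g. a GIF-only message
    elif misspelled_ratio(text) > 0.10:  # the post's 10% threshold
        flags.append("misspelled")
    if color_distance(font_color, background) < 40:  # hypothetical cutoff
        flags.append("hidden-text")
    return flags
```

Usage: `heuristic_flags("buy cheep pilz now", "#000000", "#ffffff")` flags the message as misspelled, since two of its four words are unknown.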
A goal is a dream with a deadline
Single-user spam filters have to solve a tough problem, but Gmail can use a multi-user spam filter, which recognizes similar spams mailed to different mailboxes. The fundamental property of spam is that similar messages go to many people. Google can exploit that, much as Spamcop does.
In theory, Google should be able to recognize spam far more reliably than single-user spam filters. And this is a search problem, something Google is good at. What's wrong over there?
>And this is a search problem,
A detection problem, actually.
They can _find_ spam (and all other) messages, the problem is how to tell which ones are not legitimate while keeping false positives at minimum.
I assumed the same thing as the grandparent when I saw those emails -- that they were trying to get normal words marked as spam words, and make the filters less effective with normal messages. It would appear, though, that they're not very bright yet -- they're not targeting the low-scoring words. I expect that'll change before too long. What'll happen to your filter when all of the lowest scoring words it knows suddenly become the highest-scoring?
I don't actually know -- but I do know that you aren't the only one with access to those percentages.
I tried to do the same thing with my AventureMail account but AventureMail wasn't cool with it. They deleted my account! You can check out what little data I collected before the account suspension and read the emails to and from AventureMail about the merits of the account suspension at http://3fingersalute.net/aventuremail
FoundNews.com - get paid to blog.,
I'm sending prattboy a free DVD player (I got an ad for one) and it also offered me $1,000,000 from Publisher's Clearing House and it had the following pre-filled in and it wasn't the name I entered for the DVD player (Tester, Gmail): Girl, Pratt heow, AR 12333
Since we all know that Slashdot is a favorite place to harvest addresses for spam, listing the guy here should do the trick: prattboy@gmail.com
prattboy@gmail.com
Here is a discussion from Yale's LawMeme on the legal ramifications of Prattboy's experiment. Does asking others to sign you up for spam count as an opt-in?
Am I in the wrong month here? Doesn't his web page show the last stats as of May 25th? This is June, people; his stats are over two weeks old. Doesn't he update this page or what?
Right, and my Thunderbird Bayesian filter catches all of those word salad approaches. But they've come up with a new one - what I call the "encyclopedia attack."
What they do is copy an encyclopedia entry and put it at the bottom of their spam. The thing is usually a few paragraphs long, so that textually it dominates the message. The subjects are fairly random, and are occasionally educational ;)
The problem is that the text of this doesn't trip the "too many strange words" flag that's used for word salads. My Thunderbird filter is really having trouble with these. Anyone else having trouble with these spams?
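To see why padding with ordinary prose hurts, here is a minimal sketch of the naive-Bayes combination rule Paul Graham popularized in "A Plan for Spam", P = prod(p) / (prod(p) + prod(1 - p)). This is not Thunderbird's actual code, and it combines every token rather than selecting the most "interesting" ones; the per-token probabilities are hypothetical.

```python
import math


def combine(probs: list[float]) -> float:
    """Combine per-token spam probabilities:
    P = prod(p) / (prod(p) + prod(1 - p))."""
    spam = math.prod(probs)
    ham = math.prod(1 - p for p in probs)
    return spam / (spam + ham)


# Hypothetical per-token spam probabilities.
spam_tokens = [0.95, 0.90, 0.92]  # strongly spammy words
encyclopedia = [0.30] * 40        # ordinary prose that leans slightly hammy

print(combine(spam_tokens))                 # close to 1
print(combine(spam_tokens + encyclopedia))  # pulled toward 0 by the padding
```

Each neutral-to-hammy token multiplies the spam/ham odds by p / (1 - p) < 1, so a few paragraphs of encyclopedia text can outweigh a handful of strongly spammy words.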
I've had a Gmail account for almost 3 months now. In the first month I got 3 spam messages; they all made it through the filter. Since then I've gotten 5 more, only 1 of which made it through. It's not statistically significant yet, but to me it feels like the filter has improved. I'm already up to 5% of my 1 gig too...
What about vetting at least the image-based spam for checksumming? Scan the e-mail for image links (or images included inline). If there's a link, check it against the known list of spam links. If it's in the list, mark the message as spam. Spammers will quickly figure that trick out, though, so step two would be for Google to follow those links, and retrieve the images. Run a checksum of the image file itself; if there are a lot (say, a thousand) messages including the same image, tag it as spam. This combines spam filtering with the fun of reminding spammers that Google has an order of magnitude more bandwidth than they do. Use their own messages against them: the more you spam, the bigger the Slashdotting (Googling? Alas, that word's already taken.)
For bonus points, keep the downloaded images in the Google cache; keeps them available for the mail user, alleviating the load on the sending site for legitimate messages, and keeps them available for, well, the Google cache.
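A minimal sketch of the image-checksumming step, assuming the image bytes have already been fetched from their links or decoded from inline parts. The function name and the 1,000-message threshold are hypothetical, following the "say, a thousand" in the comment above.

```python
import hashlib
from collections import Counter

# Hypothetical: how many messages may carry the same image before it is spam.
IMAGE_THRESHOLD = 1000


def flag_image_spam(messages: dict[str, list[bytes]],
                    threshold: int = IMAGE_THRESHOLD) -> set[str]:
    """Return ids of messages containing an image seen in >= threshold messages.

    messages maps a message id to the raw bytes of the images it contains.
    """
    counts: Counter[str] = Counter()
    digests: dict[str, set[str]] = {}
    for msg_id, images in messages.items():
        # Count each distinct image once per message.
        ds = {hashlib.sha256(img).hexdigest() for img in images}
        digests[msg_id] = ds
        counts.update(ds)
    return {msg_id for msg_id, ds in digests.items()
            if any(counts[d] >= threshold for d in ds)}
```

Like any exact checksum, this is defeated by re-encoding or slightly altering each image, so it would complement rather than replace other checks.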
"Make it ten--I am only a poor corrupt official."
--Captain Louis Renault (Claude Rains), Casablanca
Why does he have a picture of Christina Aguilera for his website background image?
Nothing costs nothing
"Here is also an article talking about Aaron's efforts from webpronews.com"
:^(
Can you imagine my disappointment when I visited WebPronEws to find out what kind of porn "ews" is, and it turned out to be yet another dull tech site?
Sigh.
I'm not surprised that this is the type of thing he would be known for. This, or starting a website for people to complain about that school.
/. post this afternoon, he was unaware of it at the time.
P.S. I told him about this
-Mikey P
Through all that spam, I wonder if prattboy will notice if I asked him for a Gmail invite.
I have 3 ideas that may overcome spam.
This may require an overhaul of the e-mail system, though. One idea is to have multiple addresses bound together. You would give out one e-mail address, and only "authenticated" or approved contacts could get your second address. An e-mail sent simultaneously to both addresses would be delivered directly; mail sent to only one of the two addresses would be delivered as normal and would be subject to the normal filtering. I guess spammers would find ways to get both addresses too and defeat that, but it should be doubly difficult, if not an order of magnitude more difficult. How many people get the same spam on different e-mail addresses? This could be useful.
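The delivery rule for the bound-address idea fits in a few lines. This is a sketch of the proposal only; the function and the example addresses are hypothetical and no existing mail system is assumed to work this way.

```python
def route(recipients: set[str], public_addr: str, private_addr: str) -> str:
    """Return 'inbox' when a message is addressed to both bound addresses,
    otherwise 'filter' (deliver through normal spam filtering)."""
    rcpts = {r.lower() for r in recipients}
    if public_addr.lower() in rcpts and private_addr.lower() in rcpts:
        return "inbox"
    return "filter"
```

Usage: `route({"me@example.com", "me.private@example.com"}, "me@example.com", "me.private@example.com")` returns `"inbox"`, while mail sent to `me@example.com` alone goes to `"filter"`.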
The other is to hit spammers where it hurts, audience. By rolling out a proper ad delivery system (yuck) which was separate from email, if people used their email less for getting information about products, but had it collected by some RSS type system, the spammers would be left with a dwindling audience unless they switched too. The ads would be strictly opt in.
Or mail collection rather than mail delivery. If people collected mail rather than having it delivered to them, they could in theory just not collect spam. Why would anyone collect spam?
Lastly, education. If people kept their own whitelists of approved senders, they could in theory get rid of most spam by keeping good whitelists.
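A minimal sketch of a personal whitelist check, assuming matching on the full sender address; the function name and the "held" label are hypothetical. It trusts the From: address as given, so it is only as strong as that address is hard to forge.

```python
def classify(sender: str, whitelist: set[str]) -> str:
    """Deliver mail from whitelisted senders; hold everything else for review.

    Matching is on the full address, case-insensitively.
    """
    normalized = {addr.lower() for addr in whitelist}
    return "inbox" if sender.strip().lower() in normalized else "held"
```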
If someone wanted to receive a lot of spam, what is the most effective way to do so? Sure, you could post messages to some newsgroups including your real email address - but does anyone know of some sure-fire way to get lots of spam very quickly?
Wonderful. So we finally have an article which provides ... *drumroll* ... an actual cache to the content! Rockin'. I don't care who hosts the cache (although I suspect that Google is better equipped than most, in both cost absorption and raw bandwidth / failover capability), and in fact I don't want it to be Slashdot. That'd be stupid, IMO.
Anyway ... yay!
OK, some of us out here are begging for Gmail addresses and doing all sorts of things to get one, and this puZ is begging for spam mail. Simple solution..... mv /home/prattboys"brain" /dev/null
Last night I found an error on prattboys website that others had pointed out and I was going to email him questioning it. Woke up this morning to check my own gmail and found this...
This is an automatically generated Delivery Status Notification
Delivery to the following recipient failed permanently:
prattboy@gmail.com
----- Original message -----
WEeeeeee! Maybe they nuked his box? Or was this because of the Akamai hosing?