Slashdot Mirror


SpamArchive.org Launched

An anonymous reader writes "SpamArchive.org has just been launched. SpamArchive.org is a community resource that provides a database of known spam to be used for testing, developing, and benchmarking anti-spam tools. The goal of this project is to provide a large repository of spam that can be used by researchers and tool developers. In the past, there were a few small personal spam archives that were used. There was no large set of spam that could be used to test new anti-spam algorithms. Thus, developers could not sufficiently test their techniques across a range of messages. Also, the lack of a "standard" sample of spam made it difficult to effectively benchmark anti-spam tools."

18 of 269 comments

  1. Hard to get worked up about that by RebRachman · · Score: 5, Interesting

    Even I know how to buy a domain name and write a few paragraphs of text on a white background. There is nothing about this archive to hint at its origin or credibility. This is a /. worthy story?

  2. archive overload by ndevice · · Score: 2, Interesting

    Asking for a slashdotting is one thing, but asking to be an archive for spam is another.

    I wonder if anyone knows just how much of the stuff is out there, and if it's even possible to store all that. Of course, spam being mostly duplicates and all, maybe they have a chance. But with spammers staying ahead of the game and rotating their text, I wouldn't count on it.

    On the other hand, why not just set up a couple of hotmail accounts, bait them a bit, and just watch the spam come in? Why even bother asking for it?

  3. Who are these guys? by gomerbud · · Score: 5, Interesting

    Dude, I could have registered a similar domain and put up a comparable web page within a matter of hours. I hope they really exist.

    Wouldn't it be great if the submit email address was forwarded to someone's ex-girlfriend? That's the ultimate form of revenge...

    1) Register domain name.
    2) Put up web page advertising some kind of anti-spam database.
    3) Forward all email sent to the submit address to someone you don't like.
    4) Get slashdotted.

    The end result is that three million people send 100 spams the first hour to the submit address. Within a short amount of time, your foe has 300 million emails in his/her mailbox. Now that's spam.

    --
    Kan jeg få en pils, vær så snill?
  4. Oh i thought it was a collection.... by phunhippy · · Score: 3, Interesting

    Damn!
    And there I was thinking they were creating a historical archive of all the funny worthless spam we get in our mailboxes every day...

    See, that could turn spam into a fun thing! Set up a site where spam is ranked by popularity, measured by the number of people forwarding in the same spams they get. I think it would be interesting to see daily/hourly/weekly TOP 10 SPAM in the world graphs.
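
    A minimal sketch of that ranking, assuming forwarded messages arrive as plain strings. The names fingerprint and top_spam are made up for illustration, and the normalization only catches copies that differ in case or whitespace, not spam with rotated text.

```python
import hashlib
import re
from collections import Counter


def fingerprint(body: str) -> str:
    """Collapse case and whitespace so trivially different copies match."""
    normalized = re.sub(r"\s+", " ", body.strip().lower())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def top_spam(submissions, n=10):
    """Return the n most-forwarded messages as (fingerprint, count) pairs."""
    counts = Counter(fingerprint(body) for body in submissions)
    return counts.most_common(n)
```

    Calling top_spam(bodies, 10) once per hour, day, or week over the messages received in that window would give the data for each graph.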

    I would do this myself.. 'cept I suck at HTML.. anyone need a VoIP network built? :)

  5. What about the others ? by ltjohhed · · Score: 2, Interesting
    Like SpamHaus? It seems like a similar service, right?!

    --
    All generalizations are false
  6. What if... by serlaten · · Score: 5, Interesting

    ...spammers use the anti-spam tools to create spam that doesn't trigger the automatic spam filters.

    1. Write spam mail
    2. Filter through widely used spam filter
    3. If spam is flagged as spam, rewrite; goto 2
    4. Send
    5. Profit
  7. Good idea by arvindn · · Score: 3, Interesting


    Aside from all the bashing these guys are getting here for not having any working code, this kind of database would actually be quite a good idea.

    One main problem for anti-spam is this: humans are very good at telling spam from legitimate messages. Computers are nowhere close. Why not? Well, humans are simply better at certain types of problems like pattern recognition because of centuries of evolution. But there are ways around this: genetic algorithms and neural nets are two that I can think of. Both of these are "learning" strategies and need large databases to get started. We're talking about billions of messages or more, not the hundreds that you get every day.
    So the kind of database (one for spam, one for non-spam) that these guys are talking about would be an excellent way to develop intelligent spam-detectors.

    Sorry if this is an unpopular opinion, but we are against legal and in favor of technological solutions for most of the problems of the internet, aren't we? Then why are we waiting for anti-spam legislation to fall like manna from the sky? The best way to fight spam is using technology. Methinks this is a step in the right direction. So get off your ass and contribute. Forward your spam to them. Think of clever algorithms that can make good use of a large database. And code them. And submit patches. Isn't that what open source is for? Hey, maybe this is going to be a killer app for open source, considering how big a problem spam is going to be in the next few years :)

  8. Re:A hotmail account is just as good by Anonymous Coward · · Score: 1, Interesting

    Having had a hotmail account for a year now and receiving about 1 spam mail every 2 weeks on it (as opposed to my ISP account with 2 a day or so), I honestly have no idea what you're talking about.

  9. Won't make a difference by ch-chuck · · Score: 2, Interesting

    You might as well start up a database to catalogue all the different shapes of sand on the seashore - largely useless exercise in futility.

    What people are starting to do is block EVERYTHING that isn't on a 'whitelist'. That way granny and Junior don't get mail from anyone unless they're pre-approved. If they get mail from J.Random Stranger it's bounced with a request to put a short random token in the subject line. Thanks to marketing a good third of Internet mail traffic is useless crap. Thanks marketers!
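
    A minimal sketch of that whitelist-plus-token scheme, assuming mail is reduced to a sender address and a subject line. The class name WhitelistFilter and the "deliver"/"challenge" return values are invented for illustration, and actually sending the bounce is left out.

```python
import secrets


class WhitelistFilter:
    def __init__(self, approved):
        self.approved = {a.lower() for a in approved}
        self.pending = {}  # sender -> token we asked them to put in the subject

    def handle(self, sender, subject):
        """Return 'deliver' for approved senders, 'challenge' otherwise."""
        sender = sender.lower()
        if sender in self.approved:
            return "deliver"
        token = self.pending.get(sender)
        if token is not None and token in subject:
            # The stranger echoed the token, so treat them as pre-approved.
            self.approved.add(sender)
            del self.pending[sender]
            return "deliver"
        if token is None:
            # First contact: remember a short random token for the bounce.
            self.pending[sender] = secrets.token_hex(4)
        return "challenge"
```

    The bounce message would carry self.pending[sender]; a sender who never replies simply stays in the pending table.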

    To show just how evil and desperate unemployed, cash strapped, deep in debt spawns of satan those people are - yesterday I got a letter from my mortgage holder, Chase Manhattan bank, marked "IMPORTANT ACCOUNT DOCUMENTS ENCLOSED". It turned out to be yet another credit card pitch. ("You qualify to give us even more money!!") Bastards. It's not my fault the Msft office automation vision they bought into turned out to be way more expensive than the sales flak led them to believe.

    I wish unemployed marketers would turn to prostitution and drugs instead of spam - at least they'd be supplying things people actually WANT.

    --
    try { do() || do_not(); } catch (JediException err) { yoda(err); }
  10. I think this is just going to make spam more annoy by autopr0n · · Score: 3, Interesting

    Call me a cynic, but in my estimation, the only thing effective Spam filters based on content are going to do is make Spam more annoying. Why? Because spammers are going to have the same access to filters that regular people do. All they'll need to do is run their Spam through the filters to check and make sure they pass. In other words, if these Spam filters really work well then it won't be possible to determine what is and isn't Spam by a quick glance at the subject line or formatting of the message. Rather than "INCREDIBLE OPPORTUNITY FOR FAST EAZY MONEY$$$$$$$$$5390ANFP9O" and "HOT HORNY SLUTS WANT TO MEAT YOU" we'll get stuff like "Dude, check this out!" with a body like "hey man, long time no see. What have you been up to? I've just been hanging out, not too exciting, although I met this cool chick off the 'net. Hrm, you still looking for a gf? You should check out FriendFinder.com :). Anyway, talk to you later, bro."

    And you'll need to read the whole message before you realize it's Spam.

    You might not like to believe it, but spammers (or at least some spammers) are hackers, in both senses of the word. ESR's supposed "hacker ethics" are as much bullshit as anything else he says.

    The only way these things will work is if the vast majority of people do not use these things. I don't know how likely that will be, with MSN already promoting its 'less Spam' features.

    I think what we need is a fundamental change in the way email is handled. The current system is just way too prone to abuse, and should be replaced entirely. The new standard could use things like digital certificates and other technology to make sure you're talking to an individual (while protecting anonymity in some cases, although the receipt of anon email could be optional, etc, etc)

    --
    autopr0n is like, down and stuff.
  11. Maybe I'm being cynical.... by Maddog+Batty · · Score: 5, Interesting

    If you were a spammer and wanted to collect a large number of valid email addresses, how about this as an idea...

    1) Produce a website pretending to be antispam.

    2) Ask people to send their spam emails to the site (generally including a valid from address of course)

    3) Publish on slashdot so as to get lots of interest.

    4) ???

    5) Profit!

    (Unfortunately, we all know what stage 4 is for spammers...)

    --
    wot no sig
  12. Re:So... by Arker · · Score: 3, Interesting

    I've got two @msn.com accounts, and one @hotmail.com account. At most, I'll get two to three spam mails a week. I get more than that on my isp account (@attbi.com).

    I don't believe you.

    I'll tell you why. First, my mom has an MSN account, and it's overloaded with spam daily. Now granted, that may be her own damn fault, she could have given it out in ways she shouldn't, etc. But, I also have a hotmail account. I made it a few months ago solely to have a login to the MSN chat thingy because one particular client wanted to contact me that way. I was very careful to make sure that I read every page during sign up, and un-checked all the appropriate boxes - I opted in to NOTHING. I NEVER gave it to ANYONE, I never posted it anywhere, I never even logged into it, I only know about the email that hits it because the chat program tells you how many new mails you have when you sign in. I haven't used that either in awhile, but two weeks after creating the account, it had over 380 new messages.

    So I must say your claim is quite unbelievable.

    --
    =-=-=-=-=-=-=-=-=-=-=-=-=-=-
    Friends don't let friends enable ecmascript.
  13. How to end spam by Permission+Denied · · Score: 5, Interesting
    I've had the same email address for five years, and I receive zero spam. None whatsoever. I also advertise the email address widely (web, usenet, mailing lists).

    How does this work, you ask? I create a new email address each time I give out my email address. We have a sendmail setup that allows you to make "username+foo@example.com" go to "username@example.com" where "foo" is any arbitrary string.

    So, amazon.com thinks I'm "username+amazon@example.com", securityfocus thinks I'm "username+bugtraq@example.com" and so on. Once I receive spam on one of the addresses, it's trivial to write a filter that matches with near 100% confidence ("username+bugtraq@example.com" should only receive messages originating from securityfocus, etc.). Most times, if an address receives a spam, I can just procmail all mail to the address to /dev/null (eg, no complex rules like for the bugtraq example). This also allows me to track where spammers get their lists.

    We use sendmail. Equivalently, qmail allows "username-foo@example.com" and if you own your own domain, just use "foo@example.com".
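
    A rough sketch of the per-address check described above, not the actual sendmail or procmail configuration. The EXPECTED_SENDERS table and its domains are assumptions made up for illustration, and a From header can be forged, so this only catches spammers who don't bother to fake the expected sender.

```python
from email.utils import parseaddr

# Hypothetical table: sender domains each "+tag" is expected to hear from.
EXPECTED_SENDERS = {
    "amazon": {"amazon.com"},
    "bugtraq": {"securityfocus.com"},
}


def split_tag(recipient):
    """Split 'username+foo@example.com' into ('username', 'foo', 'example.com')."""
    local, _, domain = recipient.rpartition("@")
    user, _, tag = local.partition("+")
    return user, tag, domain


def looks_like_spam(recipient, sender_header):
    """Flag mail sent to a tagged address from a domain not expected for that tag."""
    _, tag, _ = split_tag(recipient)
    allowed = EXPECTED_SENDERS.get(tag)
    if allowed is None:
        return False  # untagged or unknown tag: this check makes no decision
    _, address = parseaddr(sender_header)
    sender_domain = address.rpartition("@")[2].lower()
    return not any(
        sender_domain == d or sender_domain.endswith("." + d) for d in allowed
    )
```

    For the qmail convention, the same idea applies with "-" in place of "+" in split_tag.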

    I find this advanced filtering stuff fascinating, from a completely academic point of view. I, of course, can't apply any of it since I don't receive any spam, but it's interesting nonetheless. I just read through how the Bayesian filter works. It is very simple: it only filters based on word (token) probabilities. So, it would assign a value to "make," "money" and "fast," but not "make money fast". Seems like you could get much better results if you do something more advanced like Markov chains or a neural net. There's lots of research out there on textual matching, and I'm not sure why people would start out with such a simple algorithm when there may be better things available (where "better" is measured not only by accuracy, but also by training time).
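
    To illustrate the point about single-token probabilities, here is a generic naive Bayes sketch. It is not necessarily the filter referred to above: the function names, the whitespace tokenizer, and the add-one smoothing are all assumptions. Each word contributes evidence on its own, so word order and phrases like "make money fast" are never seen as a unit.

```python
import math
from collections import Counter


def train(messages):
    """Count how many messages each token appears in."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts


def spam_probability(text, spam_counts, ham_counts, n_spam, n_ham):
    """Combine per-token evidence independently (the naive Bayes assumption)."""
    log_odds = math.log(n_spam / n_ham)  # prior odds from the corpus sizes
    for token in set(text.lower().split()):
        # Add-one smoothing so unseen tokens never produce log(0).
        p_spam = (spam_counts[token] + 1) / (n_spam + 2)
        p_ham = (ham_counts[token] + 1) / (n_ham + 2)
        log_odds += math.log(p_spam / p_ham)
    # Numerically stable logistic function: converts log-odds to a probability.
    if log_odds >= 0:
        return 1 / (1 + math.exp(-log_odds))
    return math.exp(log_odds) / (1 + math.exp(log_odds))
```

    Because the tokens are treated as a set, "make money fast" and "fast money make" get the same score, which is exactly the limitation that something like a Markov chain over adjacent words would address.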

  14. um, why not just use the FTC? by rakerman · · Score: 3, Interesting

    They've got gazillions of messages sent to uce@ftc.gov

    Why not just make that available to the public for creating training sets for spam?

    The idea of a central archive is good, but I don't see why there's a need to reinvent a New! Improved! wheel.

  15. I think it's already been done, but in reverse... by Pendant · · Score: 5, Interesting

    In order to counter the rising tide of spam I recently installed a spamblocker, even though I'm wary of such beasts because of the danger of false positives.

    Sure enough, I have received false positives. But only from one source: my filter traps the Network Solutions email asking for confirmation to proceed with the transfer away of a domain to another registrar. Net$ol changed the format of these emails a while back: they now start off by talking about a "special offer" and it's only towards the end that the real purpose of the message is revealed. My suspicious mind wonders whether these emails are intentionally designed to look like spam to reduce the number of successful transfers... sneaky :(

  16. Re:So... by plumby · · Score: 5, Interesting
    It may partly depend on what user name you picked. I've got two email accounts with my ISP, neither of which I've ever given to anyone. One has a common surname as the account name. The other has a collection of random gibberish as username. The first one receives several spam messages per day. The other one has probably received one in the last 3 months.

    I guess that the spammers quite probably have a standard list of common names that they put in front of @hotmail.com, @aol.com, etc.

    As a tip, though, I've just set my spam levels on hotmail to only receive emails from people that are in my address book. I've not got a single spam on that account (except from MS themselves) since I did that.

  17. Resistant Strains? by Queuetue · · Score: 3, Interesting

    Although spam eradication is a good idea in general, I wonder if bulk training will only result in resistant strains of superspam developing, much like the v-cillin resistant staphs that are popping up lately.

    If we deal with a little spam by hand today, will that keep us from having to deal with undetectable spam later? I can imagine spam systems that probe you (using actual system probes of you and your contacts, marketing history and social engineering) to target spam that you may actually believe is a recommendation for the Sony(tm) handicam from your Uncle Bowser, or really is your wife asking you to pick up some Clorox(tm) brand bleach and fabric softener on the way home...

    Luckily, neither of them is likely to be sending information about my penis to me at work.

    Much like modding the Xbox (and thus giving MS the practice they need to harden Palladium), giving the hard fight to the spammers might just backfire on us.

  18. Re:So... by wheany · · Score: 3, Interesting

    I made a hotmail account that has a long username by repeating my "real" username several times. That way it is pretty safe from aaaaaaa, aaaaaab -type attacks. I've gotten 0 spams so far.
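
    A small sketch of why length helps against that kind of enumeration, assuming the attacker tries only lowercase letters at a fixed length. The function names are invented for illustration.

```python
import itertools
import string


def candidates(length, alphabet=string.ascii_lowercase):
    """Yield usernames in the order aaaaaaa, aaaaaab, ... for the given length."""
    for letters in itertools.product(alphabet, repeat=length):
        yield "".join(letters)


def keyspace(length, alphabet_size=26):
    """Number of possible usernames of exactly this length."""
    return alphabet_size ** length
```

    Under those assumptions keyspace(7) is 8,031,810,176, and every extra character multiplies that by 26. This says nothing about dictionary-style guessing of common names, which a repeated username does not necessarily defeat.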