Slashdot Mirror


SpamArchive.org Launched

An anonymous reader writes "SpamArchive.org has just been launched. SpamArchive.org is a community resource that provides a database of known spam to be used for testing, developing, and benchmarking anti-spam tools. The goal of this project is to provide a large repository of spam that can be used by researchers and tool developers. In the past, there were a few small personal spam archives that were used. There was no large set of spam that could be used to test new anti-spam algorithms. Thus, developers could not sufficiently test their techniques across a range of messages. Also, the lack of a "standard" sample of spam made it difficult to effectively benchmark anti-spam tools."

7 of 269 comments

  1. Hard to get worked up about that by RebRachman · · Score: 5, Interesting

    Even I know how to buy a domain name and write a few paragraphs of text on a white background. There is nothing about this archive to hint at its origin or credibility. This is a /. worthy story?

  2. Who are these guys? by gomerbud · · Score: 5, Interesting

    Dude, I could have registered a similar domain and put up a comparable web page within a matter of hours. I hope they really exist.

    Wouldn't it be great if the submit email address was forwarded to someone's ex-girlfriend? That's the ultimate form of revenge...

    1) Register domain name.
    2) Put up web page advertising some kind of anti-spam database.
    3) Forward all email sent to the submit address to someone you don't like.
    4) Get slashdotted.

    The end result is that three million people send 100 spams the first hour to the submit address. Within a short amount of time, your foe has 300 million emails in his/her mailbox. Now that's spam.

    --
    Can I have a pils, please?
  3. What if... by serlaten · · Score: 5, Interesting

    ...spammers use the anti-spam tools to create spam that doesn't trigger the automatic spam filters.

    1. Write spam mail
    2. Filter through widely used spam filter
    3. If spam is flagged as spam, rewrite; goto 2
    4. Send
    5. Profit
  4. Maybe I'm being cynical.... by Maddog+Batty · · Score: 5, Interesting

    If you were a spammer and wanted to collect a large number of valid email addresses, how about this as an idea...

    1) Produce a website pretending to be antispam.

    2) Ask people to send their spam emails to the site (generally including a valid from address of course)

    3) Publish on slashdot so as to get lots of interest.

    4) ???

    5) Profit!

    (Unfortunately, we all know what stage 4 is for spammers...)

    --
    wot no sig
  5. How to end spam by Permission+Denied · · Score: 5, Interesting

    I've had the same email address for five years, and I receive zero spam. None whatsoever. I also advertise the email address widely (web, usenet, mailing lists).

    How does this work, you ask? I create a new email address each time I give out my email address. We have a sendmail setup that allows you to make "username+foo@example.com" go to "username@example.com" where "foo" is any arbitrary string.

    So, amazon.com thinks I'm "username+amazon@example.com", securityfocus thinks I'm "username+bugtraq@example.com" and so on. Once I receive spam on one of the addresses, it's trivial to write a filter that matches with near 100% confidence ("username+bugtraq@example.com" should only receive messages originating from securityfocus, etc.). Most times, if an address receives a spam, I can just procmail all mail for that address to /dev/null (i.e., no complex rules like those in the bugtraq example). This also allows me to track where spammers get their lists.

    We use sendmail. Equivalently, qmail allows "username-foo@example.com" and if you own your own domain, just use "foo@example.com".
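
    The tagged-address filter described above can be sketched roughly as follows. This is only an illustration, not the poster's actual sendmail or procmail setup: the names split_tagged_address, is_suspicious and EXPECTED_SENDERS are hypothetical, and the sender domains in the table are assumed example values.

```python
from email.utils import parseaddr

# Hypothetical table: which sender domains are expected on each tag.
# The entries are illustrative assumptions, not taken from a real config.
EXPECTED_SENDERS = {
    "amazon": {"amazon.com"},
    "bugtraq": {"securityfocus.com"},
}

def split_tagged_address(address, separator="+"):
    """Return (user, tag, domain); tag is None if the local part has no separator."""
    local, _, domain = address.rpartition("@")
    user, sep, tag = local.partition(separator)
    return user, (tag if sep else None), domain.lower()

def is_suspicious(to_header, from_header, separator="+"):
    """True when mail sent to a tagged address comes from an unexpected domain."""
    _, to_addr = parseaddr(to_header)
    _, from_addr = parseaddr(from_header)
    _, tag, _ = split_tagged_address(to_addr, separator)
    if tag is None or tag not in EXPECTED_SENDERS:
        return False  # no rule for this address; leave it to other filters
    sender_domain = from_addr.rpartition("@")[2].lower()
    # Accept the expected domain itself or any subdomain of it.
    return not any(
        sender_domain == d or sender_domain.endswith("." + d)
        for d in EXPECTED_SENDERS[tag]
    )
```

    Passing separator="-" covers the qmail-style "username-foo@example.com" form. A message flagged by is_suspicious also tells you which tag leaked, which is how the address doubles as a tracker for where spammers got their list.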

    I find this advanced filtering stuff fascinating, from a completely academic point of view. I, of course, can't apply any of it since I don't receive any spam, but it's interesting nonetheless. I just read through how the Bayesian filter works. It is very simple: it only filters based on word (token) probabilities. So, it would assign a value to "make," "money" and "fast," but not "make money fast". Seems like you could get much better results if you do something more advanced like Markov chains or a neural net. There's lots of research out there on textual matching, and I'm not sure why people would start out with such a simple algorithm when there may be better things available (where "better" is measured not only by accuracy, but also by training time).
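
    To make the "word probabilities only" point concrete, here is a minimal sketch of a per-token naive Bayes score. It is a simplified illustration and not the exact filter discussed above: the function names, the tokenizer, the add-one smoothing and the tiny training set are all assumptions made for the example, and it expects at least one spam and one non-spam training message.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z$']+", text.lower())

def train(spam_msgs, ham_msgs):
    """Count, for each token, how many spam and non-spam messages contain it."""
    spam_counts = Counter(t for m in spam_msgs for t in set(tokenize(m)))
    ham_counts = Counter(t for m in ham_msgs for t in set(tokenize(m)))
    return spam_counts, ham_counts, len(spam_msgs), len(ham_msgs)

def spam_probability(text, model):
    """Combine independent per-token evidence into one spam probability."""
    spam_counts, ham_counts, n_spam, n_ham = model
    log_odds = math.log(n_spam / n_ham)  # prior odds from the class sizes
    for token in sorted(set(tokenize(text))):
        # Add-one smoothing so unseen tokens never give a zero probability.
        p_token_spam = (spam_counts[token] + 1) / (n_spam + 2)
        p_token_ham = (ham_counts[token] + 1) / (n_ham + 2)
        log_odds += math.log(p_token_spam / p_token_ham)
    return 1 / (1 + math.exp(-log_odds))

model = train(
    ["make money fast", "fast money now", "make cash fast"],
    ["meeting at noon", "lunch money for the meeting", "see you at lunch"],
)
```

    Because each token contributes on its own, spam_probability("make money fast", model) and spam_probability("fast money make", model) give the same score. That is the limitation noted above: the phrase itself carries no extra weight unless you also feed in multi-word tokens.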

  6. I think it's already been done, but in reverse... by Pendant · · Score: 5, Interesting

    In order to counter the rising tide of spam I recently installed a spamblocker, even though I'm wary of such beasts because of the danger of false positives.

    Sure enough, I have received false positives. But only from one source: my filter traps the Network Solutions email asking me to confirm the transfer of a domain to another registrar. Net$ol changed the format of these emails a while back: they now start off by talking about a "special offer" and it's only towards the end that the real purpose of the message is revealed. My suspicious mind wonders whether these emails are intentionally designed to look like spam to reduce the number of successful transfers... sneaky :(

  7. Re:So... by plumby · · Score: 5, Interesting

    It may partly depend on what user name you picked. I've got two email accounts with my ISP, neither of which I've ever given to anyone. One has a common surname as the account name. The other has a collection of random gibberish as username. The first one receives several spam messages per day. The other one has probably received one in the last 3 months.

    I guess that the spammers quite probably have a standard list of common names that they put in front of @hotmail.com, @aol.com, etc.

    As a tip, though, I've just set my spam levels on hotmail to only receive emails from people that are in my address book. I've not got a single spam on that account (except from MS themselves) since I did that.
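
    The address-book setting amounts to an allowlist on the sender. A rough sketch of the idea follows, with hypothetical function names; it says nothing about how Hotmail actually implements the option.

```python
from email.utils import parseaddr

def normalize(address):
    """Extract the bare address from a header value and lowercase it."""
    return parseaddr(address)[1].strip().lower()

def accept(from_header, address_book):
    """Accept a message only if its sender appears in the address book."""
    allowed = {normalize(entry) for entry in address_book}
    sender = normalize(from_header)
    return sender != "" and sender in allowed
```

    For example, accept("Alice <Alice@Example.com>", ["alice@example.com"]) passes while mail from an unknown sender is dropped. The obvious costs are that first-time correspondents never get through, and that the From header can be forged.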