SpamArchive.org No More?
IrishMASMS writes "Back on November 21, 2002, Slashdot announced that SpamArchive.org had just been launched. I configured my spam filters to submit to these guys. Well, the last few I have sent were rejected with a 553 error (sorry, that domain isn't in my list of allowed rcpthosts). I did some digging and found that the SpamArchive.org site is now just a placeholder, and the WHOIS shows that virtualclicks.com aka PSI-USA, Inc. dba Domain Robot aka a Robert Farris now owns the domain. Some searching on the net indicates the fellow is a domain squatter. Anyone know the story as to what happened, and whether the Spam Archive project is now dead? Was the Spam Archive project even a benefit or value added to the fight against spam?"
That sounds like a clever way of:
But hey, maybe I'm just being cynical.
FATMOUSE + YOU = FATMOUSE
Just provide your email address, and I'll be happy to provide you with a FREE feed of my spam archive. No need to thank me, just a little service I provide.
A second issue is that you want current spam; the global characteristics of spam change from week to week. So what's the use of an ancient archive?
And perhaps the biggest problem is that SpamArchive is a hodge-podge of mail from different sources, vetted only by the people who send it in. It isn't a sample of spam in any statistical sense.
Finally, there is no scarcity of spam. Ham is what people don't want to share.
So a collection of spam, particularly an old one sent in by self-selected volunteers, is of little practical use. The hard thing to get is a collection of spam and ham from a common place.
The TREC tests use private corpora that have legitimate mixes of ham and spam. They also use public corpora in which the spam has been carefully spoofed to make it appear to have been sent to the same recipients as the ham. Collecting the spam for the corpus was easy; spoofing was not.
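To make the point about needing ham concrete, here is a minimal Python sketch. The names `error_rates` and `flag_everything` are made up for illustration and do not come from any real filter or from the TREC tooling. It shows that a spam-only corpus cannot tell you a filter's false positive rate, so a filter that flags every message looks perfect on it.

```python
from typing import Callable, Iterable, Optional, Tuple


def error_rates(
    messages: Iterable[Tuple[str, bool]],
    guess_is_spam: Callable[[str], bool],
) -> Tuple[Optional[float], Optional[float]]:
    """Return (false_positive_rate, false_negative_rate).

    messages is an iterable of (text, is_spam) pairs. A rate is None
    when the corpus has no messages of the class needed to compute it.
    """
    ham_total = ham_flagged = 0
    spam_total = spam_missed = 0
    for text, is_spam in messages:
        guess = guess_is_spam(text)
        if is_spam:
            spam_total += 1
            if not guess:
                spam_missed += 1
        else:
            ham_total += 1
            if guess:
                ham_flagged += 1
    fpr = ham_flagged / ham_total if ham_total else None
    fnr = spam_missed / spam_total if spam_total else None
    return fpr, fnr


def flag_everything(text: str) -> bool:
    """Hypothetical worst-case filter: calls every message spam."""
    return True


# Toy corpora, invented for this example.
spam_only = [("cheap pills", True), ("more inches", True)]
mixed = spam_only + [("lunch at noon?", False), ("patch attached", False)]

print(error_rates(spam_only, flag_everything))
print(error_rates(mixed, flag_everything))
```

On the spam-only corpus the false positive rate comes back as `None` because there is no ham to misclassify; only the mixed corpus exposes that this filter throws away every legitimate message.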
Was it a little bit like Archive.org?
I know I'd be interested in finding out how badly people needed more inches and V!agr4 in the good ol' days.
Saskboy's blog is good. 9 out of 10 dentists agree.
And, as others have pointed out, a big slab of spam is useless for research unless you have equal amounts of real email to compare against.
So no wonder it didn't last.
ipfilter.org is similarly going to a domain-squatting link page.
I need a filter that notices these bogus pages and blocks them.
According to Justin Mason, it didn't help SpamAssassin much, at least where testing the effectiveness of rules was concerned. The main problems were that (1) the data was too anonymized to be able to properly test header checks and (2) submissions weren't verified, meaning someone would have to go through the archive and check to make sure there wasn't any legit mail that had accidentally been dropped into the wrong folder. (And, of course, unless you're the original recipient, you can't be absolutely certain whether something was solicited or not.)
Considering that the Netcraft uptime list shows a change of hosting/IP, chances are they forgot to renew and the domain was immediately squatted.
All this talk about penetrating and Viagra is making me feel uncomfortable...