SpamArchive.org No More?
IrishMASMS writes "Back on November 21, 2002, Slashdot announced that SpamArchive.org had just been launched. I configured my spam filters to submit to these guys. Well, the last few I have sent were rejected with a 553 (sorry, that domain isn't in my list of allowed rcpthosts) error. I did some digging and found out that the SpamArchive.org site is just a placeholder, and WHOIS shows that virtualclicks.com, aka PSI-USA, Inc. dba Domain Robot, aka a Robert Farris, now owns the domain. Some searching on the net indicates the fellow is a domain squatter. Does anyone know what happened, and whether the Spam Archive project is now dead? Was the Spam Archive project even a benefit or added value in the fight against spam?"
That sounds like a clever way of:
But hey, maybe I'm just being cynical.
Considering this is the first time I've heard of it, probably not as much as it should have been. Did it help SpamAssassin? If so, then yes, it was.
;)
If it's yet another site that finally went by the wayside because no one was using it, maintaining it, or interested in it, then it might have already served its purpose and been retired.
The Internet moves fast, and new things come along all the time to replace those that are outdated and old. Some might say that about Digg and Slashdot, though.
Just provide your email address, and I'll be happy to provide you with a FREE feed of my spam archive. No need to thank me, just a little service I provide.
I'd say it's working exactly as advertised.
A second issue is that you want current spam; the global characteristics of spam change from week to week. So what's the use of an ancient archive?
And perhaps the biggest problem is that SpamArchive is a hodge-podge of mail from different sources, vetted only by the people who send it in. It isn't a sample of spam in any statistical sense.
Finally, there is no scarcity of spam. Ham is what people don't want to share.
So a collection of spam, particularly an old one sent in by self-selected volunteers, is of little practical use. The hard thing to get is a collection of spam and ham from a common place.
The TREC tests use private corpora that have legitimate mixes of ham and spam. They also use public corpora in which the spam has been carefully spoofed to make it appear to have been sent to the same recipients as the ham. Collecting the spam for the corpus was easy; spoofing was not.
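To make the ham-versus-spam point concrete, here is a minimal Python sketch, assuming a hypothetical keyword filter and a few made-up messages (none of this comes from SpamArchive or TREC). A spam-only pile can tell you how much spam a filter misses, but the false-positive rate, the number people actually care about, is undefined without ham.

```python
from typing import Callable, Dict, Iterable, Optional


def error_rates(classify: Callable[[str], bool],
                spam: Iterable[str],
                ham: Iterable[str]) -> Dict[str, Optional[float]]:
    """Return the miss rate on spam and the false-positive rate on ham.

    `classify` returns True when it thinks a message is spam.
    A rate is None when its corpus is empty, i.e. it cannot be measured.
    """
    spam, ham = list(spam), list(ham)
    missed = sum(1 for msg in spam if not classify(msg))   # spam let through
    false_pos = sum(1 for msg in ham if classify(msg))     # real mail blocked
    return {
        "spam_miss_rate": missed / len(spam) if spam else None,
        "ham_false_positive_rate": false_pos / len(ham) if ham else None,
    }


# Hypothetical toy filter: flag anything mentioning "v!agr4".
def toy_filter(msg: str) -> bool:
    return "v!agr4" in msg.lower()


spam_pile = ["Buy V!agr4 N0W", "Cheap watches here"]
ham_pile = ["Lunch at noon?", "Minutes from Tuesday's meeting"]

print(error_rates(toy_filter, spam_pile, ham_pile))
print(error_rates(toy_filter, spam_pile, []))  # spam-only archive
```

With both piles the filter misses half the spam and blocks no ham; with the spam-only archive the false-positive rate comes back as None, because there is simply nothing to measure it on.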
Was it a little bit like Archive.org?
I know I'd be interested in finding out how badly people needed more inches and V!agr4 in the good ol' days.
And, as others have pointed out, a big slab of spam is useless for research unless you have equal amounts of real email to compare against.
So no wonder it didn't last.
ipfilter.org is similarly going to a domain-squatting link page.
I need a filter that notices these bogus pages and blocks them.
...the cost of penetrating the defenses of the savvy user is much higher than just spooging "Buy! V|AGr4 N0W!@" emails all over the place, hoping some of them 'stick'...
So the odds of them bothering are lower, though not completely out of the picture. They just keep upping the ante whenever the clever ones pass effective ways to block or bounce the damn stuff down to the less clever people, because it still hasn't gotten expensive enough for these monkeys to stop flinging the electronic poo around.
According to Justin Mason, it didn't help SpamAssassin much, at least where testing the effectiveness of rules was concerned. The main problems were that (1) the data was too anonymized to be able to properly test header checks and (2) submissions weren't verified, meaning someone would have to go through the archive and check to make sure there wasn't any legit mail that had accidentally been dropped into the wrong folder. (And, of course, unless you're the original recipient, you can't be absolutely certain whether something was solicited or not.)
Considering that the Netcraft uptime list shows a change of hosting/IP, chances are they forgot to renew and the domain was immediately squatted.
We have to take into account that SpamArchive may have gone the way of Blue Security. Perhaps SA was effective enough to frustrate spammers and they took appropriate actions.
I'm sorry to see the spamarchive gone, but I do want to point out that http://www.watchmyspam.com/ (*) has been going for a few months now. I'm not trying to one-up spamarchive, as I'd never heard of them, but WMS provides an RSS feed of current spam, which makes integrating the spam into your own applications that much easier.
* While it's not fully web 2.0 compliant, it does have a shiny logo, is still in 'beta', and uses some javascript for not much real benefit.
You would think that any self-respecting cybersquatter would be collecting all the email addresses of the recipients. After all, only a real person would reject spam, hence the 'To:' address was a valid one. Most spammers would pay a bundle for a list of valid addresses!
Hey! How'd you get my email address?!?! DAMNIT SpamArchive.org!!!
Sorry to hear that SpamArchive.org is offline.
:)
However, if you want to read a lot of SPAM, I invite you to visit my site: www.testcompany.com
I've been posting all the email I've received @ testcompany.com for a few years now. If you like SPAM and feel like you don't get enough, click on through.
I ran a spam archive for a number of years, with emails dating back to around 1997. It's a lot of trouble: classifying spam isn't 100% accurate, so a few personal emails ended up in there. However, the big problems were these:
Now, all of these things can be overcome, of course (BitTorrent, no Google indexing), but I can understand why spamarchive.org wouldn't want the hassle, particularly since they probably weren't making any money out of it.
I still make my spam archive available to legitimate researchers, and I'm glad to say that a paper has been published and another is in preparation.
Rich.
"SPAM" is food; "spam" is deliberately-generated noise on our communications systems.
I wonder if he's the same Robert Farris of EFTDatalink.com and AmstarSystems.com? I once knew that Robert Farris. He had been in and out and in the business of ATMs for quite some time and operates using "curious business methods."
Who cares about SpamAssassin? spamarchive.org was good for content filters. SA is too rule-based rather than statistical.
The original hosting company went under, and its bits and pieces got swished around and sold and resold; one day you look up and nothing is like you left it, and the process for resolution requires actual pieces of paper, an adventure in the big room, and oh so much judicial BS.
sonofabitch!
I have no timetable for the resolution of the particular issue, as it is high on the headache scale and low on the business critical scale.
--adam
He's unlikely to do anything but disparage the mini-projects of a corporation competitive with his own. Perhaps it was only useful to researchers and academics. In the grand scheme of things, it was free, so if you think it was the most worthless thing ever, you can change the frikkin channel.
--adam
Well, having read all the responses (and finding a few amusing), I have come to this conclusion:
1. Spammers aren't involved.
2. The squatter in question hopped on the chance to get the domain so he could "blackmail" the original holders into paying to get it back.
Now, if he were smart, he'd monitor the incoming connections, figure out who was who, and sell the list to a spammer for a princely sum...
None of these anti-spam projects is ever going to work.
Nobody is EVER going to make some kind of magic filter to distinguish spam from real mail.
The reason is obvious - it's the exact same reason there's no vaccine for the common cold - the mal adapts to the anti-mal.
The only thing that will ever stop spam is:
a) Get rid of POP mail protocol.
b) If it costs the spammer money.
c) Users can retaliate.
(a) Could be done over the next six months if only people would get working on it and do it for the good of humanity instead of trying to lock people into a world of patents and proprietary crap.
(b) Could be done by attaching digital stamps which cost $0.01. If the stamps are re-usable then nobody will really have to pay. For the really mean, I'm sure there are plenty of companies which will give you stamps in exchange for watching adverts, ISPs will give them away as promotions "200 stamps per month!", etc. All you need is a stamp-tracking network to validate stamps. I'm sure the existing mail servers can take up the slack once they're unclogged.
(c) Something like the defunct "Blue Frog", but with teeth. I need a program which goes to spamvertised web sites and sucks their bandwidth. Bandwidth isn't free.
Bottom line: Positive action needs to be taken. No amount of "research" or conferences will help.
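Point (b) above hinges on a "stamp-tracking network to validate stamps." Here is a minimal Python sketch of how re-usable stamps could work, assuming a single hypothetical central ledger (the class and method names are made up for illustration, not any real service). Each stamp has exactly one current holder; a message is accepted only if the sender holds the stamp, and the stamp then passes to the recipient, who can spend it again. People who send and receive roughly equal amounts of mail break even; a spammer who sends millions and gets nothing back runs out.

```python
import secrets
from typing import Dict, List


class StampLedger:
    """Hypothetical central registry mapping stamp id -> current holder."""

    def __init__(self) -> None:
        self._holder: Dict[str, str] = {}

    def issue(self, holder: str, count: int) -> List[str]:
        """Create `count` new stamps (bought, or given away as an ISP promotion)."""
        stamps = [secrets.token_hex(16) for _ in range(count)]
        for stamp in stamps:
            self._holder[stamp] = holder
        return stamps

    def deliver(self, stamp: str, sender: str, recipient: str) -> bool:
        """Accept a message only if the sender currently holds the stamp.

        On acceptance the stamp moves to the recipient, who can reuse it.
        Unknown, forged, or already-spent stamps are rejected.
        """
        if self._holder.get(stamp) != sender:
            return False
        self._holder[stamp] = recipient
        return True

    def balance(self, holder: str) -> int:
        """Number of stamps `holder` can currently spend."""
        return sum(1 for who in self._holder.values() if who == holder)


ledger = StampLedger()
(stamp,) = ledger.issue("alice", 1)
print(ledger.deliver(stamp, "alice", "bob"))    # accepted
print(ledger.deliver(stamp, "alice", "carol"))  # rejected: alice already spent it
print(ledger.deliver(stamp, "bob", "alice"))    # accepted: bob reuses it on his reply
```

A real network would need distribution, signatures, and protection against the ledger itself being forged; this toy only shows the double-spend check and the hand-off that makes stamps re-usable.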
A squatter jumped it after we let it expire.