Where Does Spam Come From? No, Really?
jnazario writes "The Center for Democracy and Technology has recently put together a really neat paper studying the methods by which spammers get your email addresses. The researchers posted otherwise unused email addresses in a variety of locations, using different techniques for visibility (i.e., HTML encoding vs. plaintext), and then watched what accumulated after six months. They generated some interesting results on the methods by which spammers can track you (with publicly available websites containing your bare email address being the most popular method) and even some techniques to stop spam, such as HTML encoding your email address. A very interesting read."
If Slashdot posts the same report three times, is that slashspam?
This article is a duplicate of one posted on March 19 back when the CDT report was released:
CDT Releases New Report on Origins of Spam
Mirror
Right here
--sig fault--
"'These antispammers should get a life[...] Do their fingers hurt too much from pressing the delete key? How much time does that really take from their day?"
"By contrast, she said, '70 million people have bad credit. Guess what? Now I can't get mail through to them to help them.'"
The whole story is available at:
http://www.nytimes.com/2003/04/22/technology/22SPAM.html?pagewanted=print&position=
Also available at:
http://www.chron.com/cs/CDA/ssistory.mpl/business/1877197
Is Alyx Sachs the female Alan Ralsky?
Conclusions
1. E-mail addresses harvested from the public Web are frequently used by spammers. By an overwhelming margin, the greatest amount of spam we received was to addresses posted on the public Web.
When an address has been posted on the public Web, it can potentially be viewed by hundreds of millions of users. People who develop spam lists exploit this feature by using address-harvesting programs to surf across thousands of web sites, collecting any e-mail addresses that they encounter. Most users have no idea that their addresses have been harvested until they begin receiving spam.
2. The amount of spam received by an address posted on the public Web is directly related to the amount of traffic that Web site receives. The more visitors a Web site has in a given period of time, the greater the likelihood that an address-harvesting program used to send spam will scour it. As a result, addresses posted on high-traffic Web sites are likely to receive a greater amount of spam than addresses posted on smaller sites -- popular Web sites are more frequently "harvested," and addresses posted on those Web sites are added to a greater number of spam lists.
3. E-mail addresses harvested from the public Web appear to have a relatively short "shelf life." When e-mail addresses we posted on the public Web were removed, there was a pronounced drop in the amount of spam they received each day. The change was not absolute -- on a given day, an address might receive a few spam messages even months after it had been removed from the public Web. But such spam was on the order of 2 or 3 messages per day, compared to the thirty or more messages received by addresses still on the public Web.
4. Addresses posted in the headers of USENET messages can receive significant spam, though less than a posting on the public Web. Like most Web sites, USENET postings are publicly accessible and may be targeted by e-mail address-harvesting programs. When a user includes his or her address in the heading of a USENET message, that address can be harvested and used to send spam. Our preliminary data indicates that some USENET newsgroups are more frequently harvested for e-mail addresses than others.
5. Obscuring an e-mail address is an effective way to avoid spam from harvesters on the Web or on USENET newsgroups. Even when posted in publicly accessible areas, none of the addresses we obscured -- whether in English ("example at domain dot com") or in HTML -- received a single piece of spam. Users who want to avoid spam should consider obscuring their addresses when possible.
6. Sites that publish their policies and make choice available to users generally respected those policies. A major element of the CDT project was to submit e-mail addresses to a number of popular businesses and other organizations on the Web. Many of these sites had privacy policies describing how they handle e-mail addresses and other potentially sensitive pieces of information. While the terms of these policies varied, we found that almost all sites followed their policies. In addition, when consumers were offered choices about how their personal information would be handled, those choices were respected.
7. Domain name registration does not seem to be a major source of spam. Despite the fact that the WHOIS database is publicly accessible, our project received just a single spam message to an address that was in WHOIS for six months. This leads us to believe that, at least for some people registering new domain names, listings in the WHOIS database may not be a major source of spam. However, because our project had a relatively short duration, we were not able to examine whether additional spam would be received as a domain name approached its renewal date.
8. Even when an e-mail address has not been posted or shared in any way, it is still possible to receive spam through various "attacks" on a mail server. In our study, a "brute force" attack on the mail server generated a t
This is a consumer document meant to tell folks how to stop getting as much spam.
Useful as far as it goes, but what would be much more helpful is an objective take on how spam gets to the end-system. It's very hard to generate this information. You can come up with the list of final-hop relays, but that's not as useful as you might think, since most of the really crappy spam software out there finds open relays dynamically and routes through them.
Slightly smarter software is now appearing that performs some simple testing to determine how (or if) a given relay can reach other sites. So for example, AOL's recent blocking of Comcast customers will help them in the short term, but over time they'll find that spammers simply stop using those relays and start using the ones that can get through. As new relays pop up, they will be used... eventually you would have to simply stop accepting mail in order to fully prevent spam.
Like I say, it would have been useful to have the data on where spam is actually originating, but even without it, you can block spam with a very high degree of certainty based on the sender and relays with a much lower false positive (failure) rate than any of the bogus blacklist schemes out there. I'm about to add a module to SA to do just this, so stay tuned....
I was getting 500 spam a day. Hot damn, that is a lot. I have a bunch of URLs and I was promiscuous with my e-mail address(es). I had them up in newsgroups, message boards (even slashdot), I subscribed to crap, I bought things online, I registered at countless sites... and never with a condom. I have a paypal account, and I have registered at a few casinos (not to play, but to look for security holes - but that doesn't mean they don't still spam the hell out of me). And then my friends and I go through periods of signing each other up for things when we are asked to fill out forms - so it is hard to say how much of that has happened.
The bulk of what I was getting came through the URLs (domains) that I have registered - they were set up to forward all mail sent to any address at the domain that wasn't an actual mailbox on to my address. So I disabled that feature to some extent, and it dropped my daily spam count down to a little over 120 or so a day.
So I then got curious and went through and "unsubscribed" from a bunch of them just to see what happened. My spam went down to about 30 a day. Hot damn, it worked.
But then it came back up over time - not sure if the unsubscribing just got my name on other lists, or if it just grew over time.
So I installed spamassassin. At the time 2.5 was in devel, so I used that. Some builds were better than others, and it got me down to about 1 or 2 spam that snuck through every day.
Since then I have installed 2.6 and haven't kept up with the development builds as often since the changelog wasn't... well, wasn't changing much over the time that I was watching it.
I run it as the Perl script, not the faster C daemon. I am on a shared server and scripts have to time out after 30 seconds of CPU time. So if the Perl script is doing a lot of stuff, it gets killed, and the mail gets sent through unfiltered.
So that was the bulk of the spam I was still getting - not because spamassassin mistagged it, but because it was dying and letting the mail through that way.
So I went in and changed my settings. I disabled all of the blacklist checks (score RAZOR_CHECK 0 and score RAZOR2_CHECK 0). I raised the autolearning threshold so that it would learn less frequently. I have my good contacts on a whitelist. I lowered the required_hits spam score to 3.5 from the default of 5. I set the 90% bayes score to 3.5 and the 99% score to 4. I skipped the rbl checks, and for anything that would retry multiple times after a failure, I set the max attempts low (1-2).
As a result, it rarely kills the process now unless the server is under a lot of load - and now I get about 1 or 2 spam in a week instead of in a day.
I am a very big fan of spamassassin.
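To see why those score changes matter, here is a rough Python sketch of threshold-based tagging. It is only an illustration, not SpamAssassin's actual code: the rule names BAYES_90 and BAYES_99 are my assumption for the "90% bayes" and "99% bayes" rules mentioned above, and `total_score` and `is_spam` are made-up helper names.

```python
# Hypothetical sketch of threshold-based spam tagging; not SpamAssassin's real code.
DEFAULT_REQUIRED_HITS = 5.0   # default threshold mentioned in the comment above
TUNED_REQUIRED_HITS = 3.5     # lowered threshold from the comment above

# Per-rule scores after tuning. The rule names are assumptions for illustration.
TUNED_SCORES = {
    "BAYES_90": 3.5,      # "the 90% bayes score"
    "BAYES_99": 4.0,      # "the 99% score"
    "RAZOR2_CHECK": 0.0,  # zeroed out, so a hit contributes nothing
}


def total_score(hits, scores):
    """Sum the scores of every rule that fired; unknown rules count as 0."""
    return sum(scores.get(rule, 0.0) for rule in hits)


def is_spam(hits, scores, required_hits):
    """Tag a message as spam once its total reaches the threshold."""
    return total_score(hits, scores) >= required_hits
```

With these numbers a single strong Bayes hit is enough to tag a message at the lowered threshold of 3.5, while at the default of 5 it would still need help from other rules. That would explain how the filter can stay accurate even with the slow network checks switched off.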
There are some odd things afoot now, in the Villa Straylight.
Mirror
- you are sofa king weed todd did
http://www.hcdonline.com/jobs/DisplayJob.asp?ID=3
Category: New Media
Job Title: eMail ad designer
Job Description: Need a techy or ad person who can jam out killer ads using front page for eMail campaigns. Easy gig for someone who knows how to write and cut and paste. Good op for freelance, college, or veteran Internet or Advertising guru
Job Location: Los Angeles
Phone Number: 323-871-2000x11
Fax Number: 323-871-0625
Email: yurontv@netglobalmarketing.com
Enjoy!
--rhad
Slashdot needs to interview Natalie Portman.
The more common strategy is to either use a fake return address, or just choose a more or less random return e-mail address either belonging to someone else (an anti spammer, perhaps?) or that has been registered for the purpose at a free e-mail service.
I used to be involved in running a fairly large free e-mail service, and our main spam problem was people using addresses from our system in the From field, not people spamming our users. When a spammer sends a few million messages to invalid AOL or Hotmail accounts and one of your addresses is in the From field, you sort of notice the bounce traffic....
Making the spammers crawl invalid e-mail addresses can reduce the amount of spam they manage to send to real recipients, though, which is why there are quite a few spamtrap scripts out there that generate pages containing lots of e-mail addresses and links to other pages generated on the fly by the script.
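For the curious, a spamtrap page generator of that kind can be sketched in a few lines of Python. This is a minimal sketch under my own assumptions, not the code of any particular spamtrap script: the `/trap?page=` URL scheme and the function names are invented, and the addresses use the reserved `.invalid` top-level domain (RFC 2606) so they can never belong to a real person.

```python
import random
import string


def fake_address(rng):
    """Build a random address under the reserved .invalid TLD."""
    user = "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
    host = "".join(rng.choice(string.ascii_lowercase) for _ in range(6))
    return f"{user}@{host}.invalid"


def trap_page(seed, n_addresses=5, n_links=3):
    """Render one trap page: fake mailto links plus links to further trap pages.

    The same seed always yields the same page, so a harvester that revisits
    a URL sees stable content while every linked page looks different.
    """
    rng = random.Random(seed)
    lines = ["<html><body>"]
    for _ in range(n_addresses):
        addr = fake_address(rng)
        lines.append(f'<a href="mailto:{addr}">{addr}</a><br>')
    for _ in range(n_links):
        next_seed = rng.randrange(10**9)  # seed for the next generated page
        lines.append(f'<a href="/trap?page={next_seed}">more contacts</a><br>')
    lines.append("</body></html>")
    return "\n".join(lines)
```

A web handler would call `trap_page(int(page))` for whatever `page` value the crawler requests, so the crawler can keep following links indefinitely without the site storing anything.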
"Wouldn't that clog it up on their end with bounces? And maybe change the pages every few days with a new list, maybe there's a random email generator thing to come up with fake domains, like a password generator?"
Yes it would, but therein lies the problem. Say for example you are on someISP.net as your Internet provider. Someone else decides to start spamming through someISP.net (either by an open relay, spoofing or even by actually having an account there). Buhzillions of bouncebacks start swarming someISP.net's servers and BAM! You don't get that e-card from your mother on your birthday.
The other problem is having all those fake addresses. Let's say that spamboy sends out that proverbial "buhzillion" messages. That's all traffic that the backbones have to route. NOW since those e-mails are fake they have to bounce back...that's a "buhzillion" autogenerated messages that the servers have to route again.
Congrats, we've just doubled the spamload.
Phoenix
-- Wiccan Army, 13th Airborne Division "We will not fly silently into the night"
>>at the Center for Democracy & Technology,
>>202-637-9800, ari@cdt.org.
>hmm.. I'll be interested to know how
>much spam that generates for him/her....
First note that Ari is probably male... and then...
RTFA !!
Ari strongly recommends encoding your email address in crude HTML ASCII codes which robots don't detect yet (matter of weeks I guess - I guess not everybody on slashdot is an angel, as everywhere) but are perfectly human readable. The guy actually used the method, so it looks
on screen : ari@cdt.orgg
view source :
please note I forged his address so that robots don't harvest it here on slashdot, which parent post ignorantly forgot to do
O.
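For anyone who wants to try the encoding described above, here is a minimal Python sketch. It assumes the "HTML ASCII codes" are decimal numeric character references (`&#NNN;`); the function name `encode_address` and the sample address `user@example.com` are made up for illustration.

```python
import html


def encode_address(address):
    """Replace every character with its decimal HTML character reference."""
    return "".join(f"&#{ord(ch)};" for ch in address)


# The page source carries only the references; a browser renders them as text.
encoded = encode_address("user@example.com")

# html.unescape() decodes the references the way a browser would when rendering.
decoded = html.unescape(encoded)
```

The encoded string contains no literal "@", which is what keeps a harvester that only scans raw page source from recognizing it, while readers still see the normal address on screen.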
So let's beat them with their own weapons. Sugarplum is a WWW spambot poisoner feeding them with lots of email addresses which are faked, spam traps or addresses of known spammers and spamfriendly people - collected from spam emails or experience with spamfriendly ISPs. As a motivation, a lot of spamfriendly institutions don't see the problem "spam" as serious until they get a really high dose of unwanted email per day.
My Sugarplum installation gets scanned really often. At the moment, the French superspammer Artmarket is coming back almost every day, harvesting my Sugarplum site and dumping about 100 spams each time into my spam trap box. My ratio between spam trap and spammer is 1:50, so each time Artmarket will spam about 5000 spammers.
Some German dialer operators who had a really big spam problem half a year ago are actually trying to hire people to fight against spam they are getting on their own - no wonder, their domains were about the first to be spambaited massively in Usenet newsgroups and on WWW sites. Some 419 scam gangs who spamvertise their email addresses have to change them about once a month, as they will get flooded with "counterspam", and what is worse, they rely on the availability of their email addresses to get replies from their victims - that's why they spam.
Works for me, anyhow.
I think a much better, and more truth revealing, study would be to find out the statistics on the spammer's own email habits.
Among others, some simple stats:
* How many email accounts do they own
* How much spam do they receive per day
* How much of it do they actually bother to read and not just immediately delete
* How often do they use a bogus email address when filling out forms
But, more importantly:
* What have they done to opt-out of receiving mail from lists
* What filters/blocks do they implement and why when it is such a good legitimate business
* What are their opinions on spammers vs. telemarketers
WPoison
Lacking <sarcasm> tags,
Several years ago I set up a spam account, spamforchris@yahoo.com. Every time that I register for a web site, register software, subscribe to a newsletter, etc, I use the spam account. And when I give a friend or family member my personal email address, I ask that they do not include me in their chain-emails. I have had fewer than 20 spam messages in any of my real email accounts since college.
Moral: If you are careless with your email address, expect spam.
Simple people talk of people, better people talk of events, great people talk of ideas.
So according to the article, HTML-encoding the email addresses on your web pages can keep them from being harvested by spammers. E-Cloaker is a nice little free utility to do this for you.
Most address grabber tools do not write their own web browser/html interpreter. They simply link using IE's APIs, so anything IE can decode / unobfuscate, so can most email harvesters. The best solution is to not post email addresses on the web.
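The point about decoding can be shown with a short Python sketch. This is a hedged illustration, not how any real harvesting tool is written: the regular expression is a simplified address pattern, the function names are invented, and `html.unescape` stands in for whatever HTML decoding a harvester might borrow from a browser.

```python
import html
import re

# Simplified address pattern, for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

SAMPLE_PLAIN = "Contact: user@example.com"
# The same address written as decimal HTML character references.
SAMPLE_ENCODED = "Contact: " + "".join(f"&#{ord(c)};" for c in "user@example.com")


def naive_harvest(page_source):
    """Scan the raw page source, as a simple text-matching harvester would."""
    return EMAIL_RE.findall(page_source)


def decoding_harvest(page_source):
    """Decode HTML character references first, then scan."""
    return EMAIL_RE.findall(html.unescape(page_source))
```

Here `naive_harvest(SAMPLE_ENCODED)` finds nothing, while `decoding_harvest(SAMPLE_ENCODED)` recovers the address. Encoding only helps against harvesters that skip the decoding step, so it should be treated as a speed bump and not as a guarantee.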