That's why I feel the next step should be creating filters that automatically follow the links. Let's DDOS the web sites. This costs the spammer more money in bandwidth (it's not free... Of course, this wouldn't stop joe jobs. :(
That is really the answer, although not as a DDoS as you think of it. If you flood a spam website with requests you cross the line and become a problem yourself. OTOH if you and a million other people merely accept the explicit invitation in spam to visit the URLs, you are doing nothing more than they asked. If you are compulsive and click on every link in every page on their sites, that can't be helped -- it's inherent in publicizing URLs that anyone and everyone may visit and may click away to their heart's content.
Now... what is the difference between you sitting there clicking on all those things and having a program do it for you? Nothing, really, as long as your program appears to be a browser and properly manages its "Referer" and other headers. In fact, there are offline browsing tools such as WebWhacker that allow you to download entire websites so you can view them later at your convenience. One could even make the case that downloading an entire website *once* is far less abusive of the web server than what ordinary people do in revisiting pages many times and navigating back and forth in less than fully systematic ways.
Then... what if it becomes a popular notion to use such tools to download spam websites? Just once per received URL, mind you, and at a gentle traffic rate to avoid any suggestion of DoS?
The effect, even if only 1% or fewer of Netizens were to do it, would be to surge the monthly bandwidth requirements of the spam websites and eat into their profit margins, which already have to be thin and very subject to being wiped out.
spamsitemgr: How's it going?
spamsitegeek: Traffic is way up with that last spam campaign and we're three levels higher in the bandwidth cost tiers. I had to add memory and bump up to a faster CPU in the server after it crashed a half dozen times. The colocation company wants to know what's going on.
spamsitemgr: How about sales?
spamsitegeek: That's what I don't understand. Sales are flat -- no spike at all.
spamsitemgr: That's bad. If we're getting more traffic and paying more for bandwidth but sales are flat, we're in trouble. What do you see in the web logs?
spamsitegeek: Nothing. Nothing at all. All the traffic seems completely normal, except that a lot more of it fails to result in closed orders than last month.
spamsitemgr: Who did we use for this last spam campaign?
spamsitegeek: Pete Pondscum.
spamsitemgr: Drop him. Something's wrong with the referrals he's generating. Try somebody else.
-- a month later --
spamsitegeek: Bad news. I tried four other spammers and the results are always the same: we get a huge spike in traffic but sales remain flat. All of them used to generate results for us -- Sam Scumbag, Penny Pusbrain, Carl Crotchrot and Sybil Syphillis -- but now all they do is generate huge traffic spikes with no new sales. And each traffic surge is larger and longer than the one before. Our hosting bill has doubled.
spamsitemgr: OK, hold off on any new spam. This is actually costing us money. Ack! We may actually have to promote the website by honest means! Or shut down. This is terrible!
The key here is the headscratchingly unfathomable nature of the effect. This is much more effective than trying to turn a server into a smoking pit and thereby making the nature of the problem all too clear to the spam website operator. The idea is for the spam website operator to remain completely in the dark except for noticing that his bottom line no longer looks so good.
As for Joe jobs, if you're an individual all you have to do is look at the URLs before downloading
I like it when people use bad grammer. It make it easier to spot the things that I probably wouldn't want to read anyway.
You're perfectly welcome to use alittle or alot, but let me tell you, my first impression is (Score: -2, Can't Write).
I agree. That would even be a valuable addition to the slashdot Preferences. As it is I often have to waste time reading a line or two into a post before determining that the poster is unqualified to express an opinion. A Preference option to downgrade such posts would be very handy.
And when did ignorance and illiteracy become so well tolerated, even fashionable? It's one of the more weird aspects of Internet culture.
Most email clients with HTML rendering capability do not distinguish between images on servers and images embedded in the email message. The former, when retrieved for rendering, and if "bugged" with a code in their URLs, confirm that your email address is a "live one," guaranteeing you even more spam.
Most email clients will allow you to click on an .exe, .com, .bat, .scr, etc. attached to email if you are stupid enough to do so.
Pegasus Mail, by default, does not retrieve images from servers. In no case will it allow you to open or launch an executable of any type. If you want to run an attached executable you have to save it as a file and then run the file.
This means that Pegasus is not only a good deal more idiot proof, but that you can safely view messages that might be spam without sending HTTP confirmation to anyone that you are doing so.
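To make the server-image vs. embedded-image distinction concrete, here is a minimal sketch of the check a mail client could run before rendering. The names `ImageAudit` and `audit_images` are made up for illustration, and the rule (http/https means "on a server", cid:/data: means "embedded") is a simplifying assumption, not a description of how Pegasus or any other client actually does it.

```python
from html.parser import HTMLParser


class ImageAudit(HTMLParser):
    """Collects <img> sources from an HTML mail body, split by where they live."""

    def __init__(self):
        super().__init__()
        self.remote = []    # would trigger an HTTP request when rendered
        self.embedded = []  # carried inside the message itself

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = (dict(attrs).get("src") or "").strip()
        lowered = src.lower()
        if lowered.startswith(("http://", "https://")):
            self.remote.append(src)
        elif lowered.startswith(("cid:", "data:")):
            self.embedded.append(src)


def audit_images(html_body):
    """Return (remote, embedded) lists of image sources found in html_body."""
    parser = ImageAudit()
    parser.feed(html_body)
    parser.close()
    return parser.remote, parser.embedded
```

A client that refuses to fetch anything in the `remote` list renders the message without telling anyone it was opened; a "bugged" URL in that list is exactly the kind that confirms a live address.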
...and in the same way, if content-based Bayesian filters that fight back become equally ubiquitous, that would serve as an even stronger deterrent, without the same kind of collateral damage that accompanies blacklisting.
Filters that Fight Back (FFB) do not need to be ubiquitous to have a devastating effect on spam. There just have to be enough of them to increase the bandwidth costs of the websites that are beneficiaries of spam to take the profit margin out of the spam. And the beauty of it is that we don't have to pay any attention at all to the senders of the spam, nor to the IP addresses from which it is sent, nor to any measures to trace, locate or prosecute the spammers. When spam results in gentle, soft, but very large, very widely distributed waves of traffic to the beneficiary websites without any increase in sales but otherwise indistinguishable from the traffic they desire, it will increase their bandwidth costs and decrease their profit margins while leaving them little or no way of dealing with it.
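The margin argument can be put as a back-of-the-envelope model. This is only a sketch: the function names are mine, the flat per-gigabyte price is an assumption (real hosting is tiered), and every number in the usage note is hypothetical rather than measured.

```python
def campaign_margin(sales, profit_per_sale, visits, mb_per_visit, cost_per_gb):
    """Profit left after paying for the bandwidth that all visits consumed."""
    bandwidth_cost = visits * mb_per_visit / 1024 * cost_per_gb
    return sales * profit_per_sale - bandwidth_cost


def break_even_visits(sales, profit_per_sale, mb_per_visit, cost_per_gb):
    """Number of visits at which bandwidth cost equals the profit from sales."""
    cost_per_visit = mb_per_visit / 1024 * cost_per_gb
    return sales * profit_per_sale / cost_per_visit
```

With purely hypothetical inputs of 100 sales at $20 profit each, 2 MB per visit and $1 per GB, `break_even_visits(100, 20.0, 2.0, 1.0)` is 1,024,000 visits. Sales stay flat under FFB, so every visit past that point is a loss.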
Porn really does drive innovation in some ways... I worked at JDS Uniphase (a fiber optics company, if you didn't know) two years ago, and the joke was that porn and mp3s drove their business, and Britney Spears got in on both counts.
You (or your employer) had a peculiar definition of "porn" to have included Britney Spears. Or maybe you didn't get out much.
You're right, though, about driving technology, but it's not just "in some ways." Sex has pushed a number of breakthrough technologies to levels of viability in a very big way.
Polaroid instant cameras were a cute idea, but what really made them stand up in the marketplace was the excitement of all the people who wanted to take naughty pictures at home, the kinds of pictures one wouldn't entrust to a photo developer.
Videotape, the next handy job for porn, would also have been very slow to grow had it not been for the, um, home movie market.
The explosion of modems in the 1980s was helped significantly by the BBS marketplace. Almost all the multiline BBSs were "singles" oriented, and even some otherwise very dull single-line BBSs had very hot "door" games. There was an Adventure-type game that involved a mansion full of kinky rooms and objects, in which multiple players interacted with each other over the course of successive BBS connections and won the privilege of sending each other erotic text messages.
The rapid rise of webcam hardware and software in the late 1990s can also be attributed to the same fevered market but transposed to the Internet, where "Reach out and touch someone" took on a new virtual meaning.
Digital still cameras, although faster to find more ordinary existing applications, no doubt also firmed up a lot faster due to the heavy breathing factor.
Both videotape and digital still camera technologies of course lubricated the commercial porn industry by reducing costs and by providing instant results without waiting for chemical development and printing. And on the retail side, many early video stores were really just well-clothed fronts for the Adults Only rental room in the back that paid the bills.
One can only wonder what new technologies will be thrust into the breathless, receptive marketplace, to be warmly received by closet kinks and passionately showered with bucks to fund the deployment and deep market penetration that bring prices down and provide satisfaction for the ordinary consumer.
Income taxes are progressive and sales taxes are the most regressive of taxes.
"Progressive tax:" left-wing codeword for a stupidly, destructively graduated tax that provides a strong disincentive to engage in the activity being taxed (such as succeed and make money in the case of the graduated income tax).
"Regressive tax:" left-wing codeword for a sane, uniform tax that applies at the same rate to the same things for everyone, as specified for taxes in the U.S. Constitution.
Note: Real people do not use the word "progressive" nor refer to themselves or their ideas as "progressive." This is an affectation of the extreme, wacko left wing -- the same people who would like to tax you at 100% so they could control how and when you get some of it back with gummint strings attached.
Further note: Even sales taxes in the U.S. are somewhat graduated, since most States do not tax whatever falls within their definition of basic subsistence items, such as unprocessed food, clothing, etc. It is only the definitions that have fallen behind the times (or been whacked out of shape by slimeball politicians). The original concept was that sales tax should not burden the basic things necessary to sustain life.
That same concept also underlies the per-person "exemption" in the U.S. income tax system, although it has become largely meaningless thanks to gummint-produced inflation and failure to update the law to keep pace. In the original 1913 Income Tax the personal exemption was $3,000 for single and $4,000 for a married couple. Those numbers would be about $56,000 and $74,800 today.
The 1913 tax rate began at 1% from $3,000 to $20,000 of net income and topped out at 6% on the portion of net income over $500,000. If the promises of the proponents of the 1913 income tax had not been almost immediately broken, we would today not pay anything at all on net incomes under $56,000 and would pay only 1% from there up to about $374,000. The maximum rate of 6% would be paid on the portion of net incomes over $9,353,000.
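For anyone who wants to check the arithmetic, the adjusted figures above all follow from one multiplier. In this sketch the multiplier is simply backed out of the post's own top-bracket numbers ($9,353,000 / $500,000, about 18.7); it is not taken from any official price index, and "today" means the date of the post.

```python
# Multiplier implied by the figures in the post, not an official CPI value.
FACTOR = 9_353_000 / 500_000


def in_todays_dollars(amount_1913):
    """Scale a 1913 dollar amount by the multiplier implied in the post."""
    return amount_1913 * FACTOR


for amount in (3_000, 4_000, 20_000, 500_000):
    print(f"${amount:,} in 1913 is roughly ${in_todays_dollars(amount):,.0f}")
```

The results land within rounding distance of the $56,000, $74,800, $374,000 and $9,353,000 quoted above.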
Among the many problems with the so-called "progressive," graduated income tax is that it penalizes upward mobility. By imposing more punitive tax rates as income rises, the graduated income tax makes it more difficult for people to realize the American dream of upward mobility. So much for the "progressives" being the friends of the working man. They want to take away everyone's dreams and goals and reduce us all to some low common denominator.
Also, the present "progressive" U.S. income tax system taxes income that is saved or invested as many as four times over, and political "progressives" would give you more of that if they could.
Any time anyone approaches you talking about how great Value Added Tax is or talks about how an income tax is "progressive" or how a sales tax is "regressive", run screaming in the opposite direction, because that person is your worst nightmare come to life.
I suppose they'll enforce sales tax on us by claiming that having membership (a userID) with a website you purchased from constitutes being in the same STATE/State/state as you make the purchase?
Maybe, but that would upset the entire conceptual foundation of sales/use taxes as they presently exist. It's not the purchase or the location where the purchase takes place that gives a State its sales/use tax jurisdiction -- it's the point of delivery and use.
There's a huge body of existing case law that rests on this foundation. There may also be Constitutional problems with any kind of national sales tax, which is probably why we don't already have one.
0) Both empirical data and theoretical elaboration seem to concur that an Added Value tax would be the most efficient kind of taxation.
Surely you're making a sick joke. Value Added Tax as implemented in Europe is a horrible, loathsome thing that sucks about 20% out of virtually all retail transactions. Here in the U.S. no State has a sales tax rate higher than about 8-point-something percent, and each State has its own definitions of exempt items, usually basic subsistence things such as unprocessed food, clothing, etc. A few States have no sales tax at all.
We need to follow European models about as much as we need extra holes in our heads.
It bans "use tax". It prevents states from taxing ISPs based on people simply connecting to the internet, like they do now for phone lines.
Uh, you seem to have invented your own definition of "use tax." The real definition has nothing to do with merely "using something" like phone lines or the Internet. "Use tax" is the flip side of sales tax and applies when you buy out of State and bring merchandise into or receive merchandise in your home State for your use.
Sales tax is collected by the seller on behalf of a State that has jurisdiction over both the seller and the transaction, i.e. where the Seller fits the State's definition of "doing business" and the delivery takes place in that State.
Use tax is collected by the State wherein the "use" takes place when sales tax is not applicable. Use tax is based on the jurisdiction of a State over the user of the merchandise.
The sales/use tax laws are written in such a way that only one State will tax a given transaction, either by sales tax or by use tax.
You can see this very clearly in cases of costly, registered things like vehicles. If you buy a car out of state and arrange to take delivery of it in your home State (or simply bring it into your State), the seller may not be required to collect sales tax where you bought it but your home state will collect the "use" form of sales/use tax when you register it. Unfortunately in the case of vehicles a new sales or use tax will be paid every time a vehicle is registered by another owner.
Use tax is inherently difficult to assess and collect, but many States do conduct audits of larger taxable entities within their jurisdictions and collect use tax on merchandise purchased from sellers in other States.
Daytimer, the well-known manufacturer and mail-order seller of fine business diary/scheduler and related products, voluntarily bought into this big time, decades ago, and began collecting "sales" tax on all mail order sales to customers in all States even though the company operated only in, I think, Pennsylvania. That was why I stopped doing business with them.
In fact, one thing Paul Graham didn't address in his latest pieces is the risk of following links that can have embedded identifying information in them. So, yes, you slam their server but you may have inadvertently just told them WHO you are by requesting a certain URL.
That's why I don't look at it as a DDoS. The objective should be to affect the overall costs of running spam websites, not to try to clobber them. Let their new higher-tiered bandwidth bill send them the message.
Spider the beneficiary website a few times, looking exactly and fully like a browser, with all the expected headers like "Referer:" and "User-agent:" perhaps filled with real but varying values. In spidering, the "Referer" should follow one step behind the current request. Timing can be controlled as well. The idea is to look exactly like a human at a browser, and to actually download the results with reformed links for local offline viewing. By limiting the download rate and the number of times each file is downloaded, and by saving the spams and the downloaded websites for a reasonable time, an individual will have a defense against an allegation of DoS. In fact, things like WebWhacker were and are marketed as tools for offline browsing by downloading entire Websites. Doing so can actually be *more* gentle on the web servers than human surfing often is, but unproductive surfing or downloading by large numbers of spam recipients will radically alter the economics of running spam websites.
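Roughly, the walk described above (each page once, "Referer" one step behind, a pause between requests) could look like the sketch below. Everything named here is an assumption for illustration: `gentle_spider`, the placeholder `USER_AGENT` string, the page limit and the delay. The actual downloading and link extraction are passed in as `fetch` and `extract_links`, so this shows only the ordering and header logic, not a finished tool.

```python
import time
from collections import deque
from urllib.parse import urljoin, urlsplit

# Hypothetical placeholder; a real tool would send a genuine browser string.
USER_AGENT = "Mozilla/5.0 (compatible; example-offline-browser)"


def gentle_spider(start_url, fetch, extract_links, referer=None,
                  max_pages=50, delay_seconds=5.0, sleep=time.sleep):
    """Visit each same-host page at most once, breadth-first.

    fetch(url, headers) returns the page text; extract_links(text) returns
    the hrefs found in it. Returns the list of (url, referer) pairs visited.
    """
    host = urlsplit(start_url).netloc
    queue = deque([(start_url, referer)])
    seen = {start_url}
    visited = []
    while queue and len(visited) < max_pages:
        url, ref = queue.popleft()
        headers = {"User-Agent": USER_AGENT}
        if ref:
            headers["Referer"] = ref  # the page on which this link was found
        if visited:
            sleep(delay_seconds)      # gentle rate: pause between requests
        text = fetch(url, headers)
        visited.append((url, ref))
        for href in extract_links(text):
            link = urljoin(url, href).split("#")[0]
            if urlsplit(link).netloc == host and link not in seen:
                seen.add(link)        # never request the same file twice
                queue.append((link, url))
    return visited
```

The `seen` set is what backs the "once per file" defense, and the returned list doubles as a log of exactly what was requested and when it was linked from.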
So why not feed the entire mail source to the filters, as well as the contents of the retrieved data?
Graham does use the entire email message. He explains why in his articles on Bayesian filtering. The appearance that email headers contain a lot of stuff that looks useless is misleading. Bayesian filtering can make excellent use of strings in the headers. You don't have to look at or understand the gobbledygook -- you just say, "This message is spam" or "This message is not spam," and the Bayesian filter -- done Graham's way -- then assigns values to the tokens found in the message.
In the deluge of alleged Bayesian filtering that has hit the market since Graham's first article, it's easy to overlook the fact that a lot of implementers are complete idiots who think they are smarter than Graham.
The Bayesian part is only one element of what Graham proposed -- it's the mathematical part that computes a probability of "spam" from a list of hits against a weighted list. Just as important are A) how the input text is tokenized, B) how many and which hits are selected for use in the Bayesian computation, and C) how skew or bias is introduced in multiple places in the process to reduce both false positives and false negatives. Graham discusses all of this in the light of quite a lot of his own research. Pity that not everyone who wants to tout their "Bayesian filtering" bothers to give Graham credit for knowing what he writes about by following his whole prescription.
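To show how A, B and C fit around the Bayesian step, here is a compact sketch along the lines of Graham's "A Plan for Spam." The constants (good counts doubled, tokens seen fewer than 5 times ignored, probabilities clamped to 0.01-0.99, 0.4 for unknown tokens, the 15 most "interesting" tokens) are my recollection of that essay and should be treated as assumptions; the function names are mine, and this is not Graham's code.

```python
import re
from collections import Counter

# (A) Tokenizing: letters, digits, dashes, apostrophes and dollar signs
# are token characters; all-digit tokens are dropped.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'\-]+")


def tokenize(text):
    return [t for t in TOKEN_RE.findall(text) if not t.isdigit()]


def token_probs(spam_msgs, ham_msgs):
    """Per-token spam probability from two hand-classified corpora."""
    bad = Counter(t for m in spam_msgs for t in tokenize(m))
    good = Counter(t for m in ham_msgs for t in tokenize(m))
    nbad, ngood = max(len(spam_msgs), 1), max(len(ham_msgs), 1)
    probs = {}
    for tok in set(bad) | set(good):
        g = 2 * good[tok]           # (C) bias: good occurrences count double
        b = bad[tok]
        if g + b < 5:               # too rare to trust
            continue
        pb = min(1.0, b / nbad)
        pg = min(1.0, g / ngood)
        probs[tok] = max(0.01, min(0.99, pb / (pg + pb)))
    return probs


def spam_probability(text, probs, n=15, unknown=0.4):
    """(B) Combine only the n tokens whose probability is farthest from 0.5."""
    ps = [probs.get(t, unknown) for t in set(tokenize(text))]
    ps.sort(key=lambda p: abs(p - 0.5), reverse=True)
    prod, inv = 1.0, 1.0
    for p in ps[:n]:
        prod *= p
        inv *= 1.0 - p
    return prod / (prod + inv)
```

Feed `tokenize` the whole message source, headers included, and the filter will find its own markers there. Change any of A, B or C carelessly and the same Bayesian formula gives much worse results, which is the point of the complaint above.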
Thus it is quite easy to employ Bayesian computation of the probability of spam in a context of poor tokenizing, skipped headers, ineffective weighting or skewing, etc., and get very poor results that don't stand the test of time and new spam. It's a good bet that this is exactly what a lot of implementations represent -- Bayesian computation of poorly selected and biased data -- either because the implementers are stupid, or because they erroneously believe they are smarter than Graham is (or is that redundant?).
Remember SCO, VeriSlime, SunnComm and others, and think "Bayesian" as a buzzword that the suits believe will convert to cash. Then ask yourself how many of the sudden Bayesian filtering appearances are from people who actually know what they're doing.
The various reports of spammers "getting around the Bayesian filters" that have surfaced since the appearance of Graham's first article are completely inconsistent with Graham's own results, virtually proving that the people implementing a lot of this stuff have not RTFA.
"The author, in my opinion, does not fully appreciate the ramifications of his scheme."
The author, Paul Graham, in my opinion appreciates and understands a great deal more about this and a host of other things than do you. With all due respect. FYVM.
"But one has to consider the possibility (and, I argue, probability) that this cunning plan will not convince spammers to honor the desires of the, um, spammees."
It doesn't matter. The spammers and their clients will have two choices: implement working unsubscribe links, preferably those that are activated by the spidering FFBs, or see their operating margins plummet to complete unprofitability. Frankly, I don't give a rat's ass which they choose.
"...would probably result in spammers ignoring the costs of spidering on their servers..."
They can't. They may hijack bandwidth to send the spam out, but they have to pay for the bandwidth to get the resulting website visits. Those who use hijacked computers as temporary web servers will end up in prison pretty quickly, because they can't hide their DNS registrations fully or forever, and they can't hide the click trail or the money trail.
"...it would result in a non-trivial increase in global bandwidth usage..."
You're dreaming. Not only are there not that many spam websites in the context of the global Internet, many of them in fact converge onto a small number of the same IP addresses. If you took the trouble to do IP lookups on some of the hostnames in URLs in spam you receive, you would already know this.
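If you want to try the IP lookup exercise yourself, something like this sketch will group the hostnames from a pile of spam URLs by the address they resolve to. `group_by_ip` is a name I made up; `socket.gethostbyname` is the standard library call, and the `resolve` parameter exists only so the grouping can be tried without touching the network.

```python
import socket
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_ip(urls, resolve=socket.gethostbyname):
    """Map each resolved IP address to the sorted hostnames that point at it."""
    groups = defaultdict(set)
    for url in urls:
        host = urlsplit(url).hostname
        if not host:
            continue
        try:
            ip = resolve(host)
        except OSError:          # lookup failures, including socket.gaierror
            ip = "unresolved"
        groups[ip].add(host)
    return {ip: sorted(hosts) for ip, hosts in groups.items()}
```

A few addresses with long hostname lists behind them is the convergence I'm talking about.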
"I grew up with the Usenet warning that my posts would be relayed through hundreds of thousands of systems..."
Yeah, and people who grew up with the ARPANET used to whine at people who had sigs that were "too large and wasting bandwidth." It's a new world. There is a lot more bandwidth out there than you realize. At any point in time, nearby cities now have more bandwidth connecting them than the entire planet used to have just a very few years before.
"...return to that way of thinking and not the current wisdom that $20/month pays for anything we'd care to dump into or pull out of Earth's biggest LAN."
Ah, but it does pay for all the bandwidth we need or want, and has been doing so very nicely for about 10 years of public use through exponential growth that no one anticipated. And at the same time, every year, almost every month, fat bandwidth gets cheaper and cheaper and cheaper. Some day your $20/mo will bring you 640 Gbits/sec on fiber and even streaming HDTV won't use it all up. Then you'll be whining that spams will have entire feature films file-attached to them to entice you to visit their 3D holographic websites.
Oops! ([R]eading [TFA] again myself), I see that Graham revised the article. Mea culpa.
Either way, though, whitelist or blacklist, I think that's Difficult (TM). I think that the way it will evolve will end up as a two-level Bayesian process of classification: one for the spam and the other for the websites.
To train up the spam filter you will give it a bunch of messages you consider to be spam and a bunch of messages you consider to be nonspam.
To train up the FFB tool, you will let it retrieve web pages pointed to by the spam you have already identified and present them to you for spam/nonspam webpage classification (with no dangerous JavaScript, Java, etc. active). You will simply say "not spam" to any websites against which you don't wish to Fight Back, for whatever reason, good or bad, correct or incorrect. There's no harm in not Fighting Back against a guilty site, only in too many people Fighting Back against an innocent site. You will, of course, also be able to whitelist completely any site(s) you wish.
Then, your Bayesian email filter will segregate new incoming spam in the normal manner, and of the URLs contained in the spam, your Bayesian FFB will download (and eventually throw away) only those sites that can reliably be identified as true spam beneficiary sites.
Malicious spam that seeks to kick off a DDoS attack against innocent websites would thus have little or no effect.
So instead of two bodies of text -- spam email and nonspam email -- there would be four, to include spam web pages and nonspam web pages. This is because the tokens that indicate "spamminess" or "nonspamminess" in spam email will not necessarily be the same ones that indicate the same things in web pages, nor will they likely have the same weights. When (not "if") FFB gets implemented in this manner and with reasonable integration so the average user doesn't have to jump through a lot of hoops, it will be easy and effective.
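The two-level arrangement boils down to a small decision function. In this sketch the two classifiers and the page preview are passed in as plain callables, since how they are trained is a separate matter; `ffb_targets`, `URL_RE` and the parameter names are all hypothetical, and the URL pattern is deliberately crude.

```python
import re
from urllib.parse import urlsplit

# Crude URL matcher, an assumption for illustration only.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def ffb_targets(message, is_spam_email, is_spam_site, preview, whitelist=()):
    """Return the URLs in a message that qualify for Fighting Back.

    Level 1: the email itself must classify as spam.
    Level 2: the page behind each URL must classify as a spam site,
             and its host must not be on the user's whitelist.
    """
    if not is_spam_email(message):
        return []
    targets = []
    for url in dict.fromkeys(URL_RE.findall(message)):  # de-duplicate, keep order
        host = (urlsplit(url).hostname or "").lower()
        if host in whitelist:
            continue
        if is_spam_site(preview(url)):
            targets.append(url)
    return targets
```

An innocent site named in a malicious spam fails the second test and is never spidered, which is why a Joe job gets little or no traction.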
The FFB tool would download no more than a moderately obsessive/compulsive surfer would, and at a modest rate, looking exactly like a web browser in all respects. It will be the numbers of FFBs in service that will have the desired effect, not the download quantity or rate of any particular one.
Then someone will have to implement a tool that places phone calls to the 800 numbers in non-URL spams and sends snail mail to the physical addresses in the last category of stone age spam. Oh, and maybe Tomahawk missiles to Nigeria for the "419" spam.
"The obvious problem with this is that it provides the senders with more information."
Even if the URLs used are not cleaned up first, this is not necessarily a bad thing. It causes the spammers to send more spam to the very email addresses that cause even more unprofitable load on the beneficiary web servers.
"Using URLs that are unique per recipient..."
That's not as common as you think. The most common identifier in a spam URL is the ID of the spammer/agent, so he can get credit for any sale that results from your visit to the beneficiary website.
The next most common identifiers are very lame inclusions of your email address in the URL, very easily modified by a FFB tool. Again, though, the desired FFB effect may actually be multiplied by leaving such identifiers intact, by causing the spammers to concentrate on the email addresses that hurt them the most.
Running way in last place are URLs that contain other encodings, which may be a munge of your email address or, more likely, a database key that allows them to link back to your email address and the spam they sent you. This last, though, requires a high degree of integration between the spam sender and the beneficiary website operator, which also implies a stable and very advanced IT environment, which is not typical of spam/website operations.
If you were running FFB, you could opt to munge identifying information or not, to regulate how much spam you get and thus how much Fighting Back you do. If you don't get enough spam, you could allow your FFB to use original URLs with whatever identifying information they contain. As your spam level rises, you could tell your FFB tool to munge the identifying information.
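The "lame inclusion of your email address" case is easy to handle. A minimal sketch of the optional munging follows, assuming the address appears in plain or percent-encoded form in the query string; `munge_url` and `PLACEHOLDER` are hypothetical names, and the opaque database-key case is not covered.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical stand-in address substituted for the real one.
PLACEHOLDER = "nobody@example.invalid"


def munge_url(url, my_address, munge=True):
    """Replace my_address in the path and query values, or leave the URL alone."""
    if not munge:
        return url  # opt out: let the spammer see the original identifier
    pattern = re.compile(re.escape(my_address), re.IGNORECASE)
    parts = urlsplit(url)
    # parse_qsl decodes %40 to "@", so encoded addresses are caught as well.
    query = urlencode([(key, pattern.sub(PLACEHOLDER, value))
                       for key, value in parse_qsl(parts.query,
                                                   keep_blank_values=True)])
    path = pattern.sub(PLACEHOLDER, parts.path)
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))
```

The `munge` flag is the knob described above: leave it off while your spam level is low, switch it on when the volume gets to be too much.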
"In particular, you may be telling the spammer that you are more likely to see their message if they work hard at getting it through this particular filter (say, by not using a URL, or slightly mangling the URL)."
No, you will be diluting the information the spammer mistakenly thinks he can infer from the website visit, since you will not actually be viewing the pages (except maybe once, for Bayesian classification of the website) and there will not be a corresponding increase in completed sales.
Rather than slipping past the filters, mangling the URLs makes Bayesian filtering nearly 100% effective. Mangling only works with very stupid keyword filters. It's surprising how little the spammer programmers seem to comprehend this, since they keep trying more and more complicated mangling, only providing more and more Bayesian markers with 100% spam probability.
"If you only follow the link programatically once, and everyone else did as well, you allow the malicious to perform a DDoS an innocent server."
RTFA.
Use your brain. If someone has the desire to punish spamsites, it's trivial to review the URLs to be visited and delete any that one doesn't wish to visit because it appears to be an innocent victim of malicious spamming.
"It is unlikely that the blacklist could be maintained properly."
RTFA. Graham suggests a whitelist, not a blacklist. I think that's presently a Difficult Thing (TM), and have suggested that the upcoming tools for doing this could incorporate the same good/bad Bayesian classification for the websites that the present tools use for the email. As with training the system to identify the spam emails, there would be an initial burden of viewing and classifying, after which the system would run pretty much by itself with only occasional update training.
If the filters simply follow whatever links are in the message, and the spammers include a link with a unique tracking ID (don't they already do this sometimes?), you'd be telling them your email address was "live" just as surely as if you sent them an unsubscribe request.
(1) The point is to increase their traffic and thus their costs, without increasing their sales.
(2) If the spamsites make the grave mistake of increasing spam to email addresses that hammer their websites, they will just multiply the negative effect on their own profit margins. It's a case in which giving them exactly what they want (short of a purchase) delivers a big hit to their profit margins.
(3) Tools will optionally munge your email address if it is in the URL.
(4) Tools will optionally munge many other encodings that would allow such correlations.
(5) Right now, very little spam contains non-visible encodings of your email address or other identifier to correlate back to the spam that was sent to you. Most of the encodings are to give credit to the spam contractor/agent who sent the messages, and he will eventually be hurt by the plummeting percentage of sales that result from website visits identifying him as the originating contractor/agent.
(6) (5) is not likely to change anytime soon because the spam sender is often independent of the spam beneficiary and their systems are not integrated. Website operators who send their own spam could and some probably do encode an identifier that correlates back to your email address, but even that implies a much more integrated and stable IT system than most spammers are capable of. Also, see (2) above.
(7) Tools will optionally alter the agent identifiers, making them unreliable.
Or, it could very well be that I'm misunderstanding the whole thing...
No, you're probably only missing the little part about hits being useless if no revenue results from them.
The spammers want traffic... let's give 'em traffic. Even if a few million people did this manually with browsers (with Java and JavaScript and autoupdate etc. disabled), it would have the desired effect. But that's a lot of time and effort to expect from a large number of people. Have no doubt: tools will appear.
Load balance and make it harder for the DoS to kill that Cable Modem / DSL connection.
Methinks you mayhap miss the point. The point is not to smoke the servers or their connections; the point is to increase their traffic costs without increasing their sales. A million times "gentle" will do it just as effectively as a hundred times "massive." More effectively, actually, since the spamsite operators will have no recourse against anyone.
Much of the spam these days is being sent by trojans running on unsuspecting computers...
Irrelevant. We don't care where the spam comes from, nor who sends it. We only care about reducing the profit margins of the beneficiaries of the spam -- the sales websites to which the spam tries to attract traffic.
...and many of the web sites pointed to in spam are on systems whose owners have no idea their machines are being abused.
I doubt there is as much of that as you claim, but it still doesn't matter, as you yourself point out in a different context:
If the spammer is a trojan running on an innocent's machine, it still gets cut, with the ISP telling the user they'll be reconnected after they fix their machine.
And finally:
We need more mechanisms in place to distribute that information and block spammers.
Maybe, but that is inherently difficult and will require worldwide coordination and widespread implementation and cooperation. Punishing spam beneficiary websites is relatively simple and can be implemented by anyone willing and able to do it. With the release of tools that are sure to be coming soon, the numbers of the able will increase vastly, leaving only willingness as a requirement.
I remember a time when auto-responders were popular...
This has nothing to do with sending or replying to email. RTFA.
I know this is a different technology...
At least you got one thing more or less right. It's a different methodology than anything that has been tried before, one made possible by the classification and segregation of spam, mostly thanks to Bayesian filtering.
It doesn't matter. The spammer already has your email address. "So he works harder to get through your filters" is a misunderstanding of how spam and spammers operate. The spammer doesn't care about you, individually, and will expend no additional effort to get through your filters. Moreover, almost all the attempts to get through Bayesian filters fail utterly and simply make the spam even more easily identified. RTF related articles on Paul Graham's website.
I don't even use Bayesian filtering yet, and not a single spam message gets through my dumber filters. Nor have I had to maintain the filters in a long, long time.
Send me spam, see your website traffic increase without additional sales. Send me more spam, see more traffic. It's that simple. Multiply it by even just a million users -- a mere fraction of 1% of Internet users -- and the spam websites will be smoking craters of melted servers or will cost so much to operate that they won't be paid for by the pitifully low sales they generate.
That's standard English when constructing derivatives. If the "m" were not doubled, the result would be pronounced "spayminess" due to the vowel that would follow the single "m."
When a spammer sends you a URL, it is an explicit invitation to click on it and visit the beneficiary website. Even if the target website is unrelated to the spam and spammer and is just an innocent victim, the responsibility lies with the spammer, not with anyone clicking on the links in the spam (more on innocent victims below).
When you click on any link, this causes your browser to download the target HTML page and, usually, all referenced components of that page. The key word is download. The page usually contains additional links, which are also explicit invitations to visit other pages and download those pages and their referenced components.
There is no functional difference between clicking on links in your browser or otherwise downloading the pages and other files in response to the explicit invitation. Actually, a strong case can be made that automated downloading of a site is less demanding of the server, since human surfers often repeatedly view the same pages and repeatedly navigate back and forth.
All Paul Graham has suggested is that large numbers of spam victims use the results of their Bayesian filters to accept the explicit invitations and "visit" confirmed spam target websites. An effect will be to increase the bandwidth and server capacity costs of the spam beneficiaries without any one victim creating more load than a slightly obsessive / compulsive surfer-clicker would create.
All this would increase the costs of running a spam/website system without increasing completed sales, thus reducing margins. Although the spam senders effectively hijack the bandwidth of other people, you can bet that almost all the beneficiary spam websites pay tiered rates for the bandwidth they use. When they start having to pay for gigabytes and terabytes of traffic, the profit margin on spam will plummet.
Note: It doesn't matter at all whether the spam is sent by the operators of the websites or by their contract spam agents. All that matters is the profit margin at the point of sale -- the beneficiary website.
Note: It doesn't matter that downloading entire spam beneficiary websites may also confirm that the spam was sent to a working email address. The more spam they send to such recipients, the more traffic they will have to pay for on their web servers. There are way more of us than there are of them.
Paul Graham already pointed out that some mechanism such as a whitelist would have to be used to protect against punishing innocent websites whose URLs might be included in spam to discredit Filters That Fight Back. My own take on that is that having a reliable whitelist is a Difficult Problem. A better solution might be to employ another level of Bayesian filtering to classify the web pages pointed to by spam on an individual user basis. Like the classification of the spam itself, this would mostly be a one-time process, with occasional followup as the spam beneficiaries try new tricks on their web pages just as the spammers try new tricks in the spam they send out.
Meanwhile, though, no matter how you segregate the spam for use in a Filters That Fight Back response, it is trivial for an individual to scan a list of extracted URLs and manually exclude seemingly innocent ones.
KornShell scripts for AIX that implement FFB are being readied now for distribution. Write me if you're interested.
berzerke wrote:
That is really the answer, although not as a DDoS as you think of it. If you flood a spam website with requests you cross the line and become a problem yourself. OTOH if you and a million other people merely accept the explicit invitation in spam to visit the URLs, you are doing nothing more than they asked. If you are compulsive and click on every link in every page on their sites, that can't be helped -- it's inherent in publicizing URLs that anyone and everyone may visit and may click away to their heart's content.
Now... what is the difference between you sitting there clicking on all those things and having a program do it for you? Nothing, really, as long as your program appears to be a browser and properly manages its "Referer" and other headers. In fact, there are offline browsing tools such as WebWhacker that allow you to download entire websites so you can view them later at your convenience. One could even make the case that downloading an entire website *once* is far less abusive of the web server than what ordinary people do in revisiting pages many times and navigating back and forth in less than fully systematic ways.
Then... what if it becomes a popular notion to use such tools to download spam websites? Just once per received URL, mind you, and at a gentle traffic rate to avoid any suggestion of DoS?
The effect, even if only 1% or fewer of Netizens were to do it, would be to surge the monthly bandwidth requirements of the spam websites and eat into their profit margins, which already have to be thin and very subject to being wiped out.
spamsitemgr: How's it going?
spamsitegeek: Traffic is way up with that last spam campaign and we're three levels higher in the bandwidth cost tiers. I had to add memory and bump up to a faster CPU in the server after it crashed a half dozen times. The colocation company wants to know what's going on.
spamsitemgr: How about sales?
spamsitegeek: That's what I don't understand. Sales are flat -- no spike at all.
spamsitemgr: That's bad. If we're getting more traffic and paying more for bandwidth but sales are flat, we're in trouble. What do you see in the web logs?
spamsitegeek: Nothing. Nothing at all. All the traffic seems completely normal, except that a lot more of it fails to result in closed orders than last month.
spamsitemgr: Who did we use for this last spam campaign?
spamsitegeek: Pete Pondscum.
spamsitemgr: Drop him. Something's wrong with the referrals he's generating. Try somebody else.
-- a month later --
spamsitegeek: Bad news. I tried four other spammers and the results are always the same: we get a huge spike in traffic but sales remain flat. All of them used to generate results for us -- Sam Scumbag, Penny Pusbrain, Carl Crotchrot and Sybil Syphillis -- but now all they do is generate huge traffic spikes with no new sales. And each traffic surge is larger and longer than the one before. Our hosting bill has doubled.
spamsitemgr: OK, hold off on any new spam. This is actually costing us money. Ack! We may actually have to promote the website by honest means! Or shut down. This is terrible!
The key here is the headscratchingly unfathomable nature of the effect. This is much more effective than trying to turn a server into a smoking pit and thereby making the nature of the problem all too clear to the spam website operator. The idea is for the spam website operator to remain completely in the dark except for noticing that his bottom line no longer looks so good.
As for Joe jobs, if you're an individual all you have to do is look at the URLs before downloading
angedinoir wrote:
I agree. That would even be a valuable addition to the Slashdot Preferences. As it is, I often have to waste time reading a line or two into a post before determining that the poster is unqualified to express an opinion. A Preference option to downgrade such posts would be very handy.
And when did ignorance and illiteracy become so well tolerated, even fashionable? It's one of the more weird aspects of Internet culture.
Most email clients with HTML rendering capability do not distinguish between images on servers and images embedded in the email message. The former, when retrieved for rendering, and if "bugged" with a code in their URLs, confirm that your email address is a "live one," guaranteeing you even more spam.
Most email clients will allow you to click on an .exe, .com, .bat, .scr etc. attached to email if you are stupid enough to do so.
Pegasus Mail, by default, does not retrieve images from servers. In no case will it allow you to open or launch an executable of any type. If you want to run an attached executable you have to save it as a file and then run the file.
This means that Pegasus is not only a good deal more idiot proof, but that you can safely view messages that might be spam without sending HTTP confirmation to anyone that you are doing so.
Pegasus is also free.
AlexMax2742 wrote:
Here you go. These just in, received in SPAM this morning:
spamURL
spamURL
spamURL
spamURL
spamURL
spamURL
spamURL
spamURL
dido wrote:
Filters that Fight Back (FFB) do not need to be ubiquitous to have a devastating effect on spam. There just have to be enough of them to increase the bandwidth costs of the websites that are beneficiaries of spam to take the profit margin out of the spam. And the beauty of it is that we don't have to pay any attention at all to the senders of the spam, nor to the IP addresses from which it is sent, nor to any measures to trace, locate or prosecute the spammers. When spam results in gentle, soft, but very large, very widely distributed waves of traffic to the beneficiary websites without any increase in sales but otherwise indistinguishable from the traffic they desire, it will increase their bandwidth costs and decrease their profit margins while leaving them little or no way of dealing with it.
gid13 wrote:
You (or your employer) had a peculiar definition of "porn" to have included Britney Spears. Or maybe you didn't get out much.
You're right, though, about driving technology, but it's not just "in some ways." Sex has pushed a number of breakthrough technologies to levels of viability in a very big way.
Polaroid instant cameras were a cute idea, but what really made them stand up in the marketplace was the excitement of all the people who wanted to take naughty pictures at home, the kinds of pictures one wouldn't entrust to a photo developer.
Videotape, the next handy job for porn, would also have been very slow to grow had it not been for the, um, home movie market.
The explosion of modems in the 1980s was helped significantly by the BBS marketplace. Almost all the multiline BBSs were "singles" oriented, and even some otherwise very dull single-line BBSs had very hot "door" games. There was an Adventure-type game that involved a mansion full of kinky rooms and objects, in which multiple players interacted with each other over the course of successive BBS connections and won the privilege of sending each other erotic text messages.
The rapid rise of webcam hardware and software in the late 1990s can also be attributed to the same fevered market but transposed to the Internet, where "Reach out and touch someone" took on a new virtual meaning.
Digital still cameras, although faster to find more ordinary existing applications, no doubt also firmed up a lot faster due to the heavy breathing factor.
Both videotape and digital still camera technologies of course lubricated the commercial porn industry by reducing costs and by providing instant results without waiting for chemical development and printing. And on the retail side, many early video stores were really just well-clothed fronts for the Adults Only rental room in the back that paid the bills.
One can only wonder what new technologies will be thrust into the breathless, receptive marketplace, to be warmly received by closet kinks and passionately showered with bucks to fund the deployment and deep market penetration that bring prices down and provide satisfaction for the ordinary consumer.
Pardon me while I take a cold shower...
geoffspear wrote:
"Progressive tax:" left-wing codeword for a stupidly, destructively graduated tax that provides a strong disincentive to engage in the activity being taxed (such as succeeding and making money, in the case of the graduated income tax).
"Regressive tax:" left-wing codeword for a sane, uniform tax that applies at the same rate to the same things for everyone, as specified for taxes in the U.S. Constitution.
Note: Real people do not use the word "progressive" nor refer to themselves or their ideas as "progressive." This is an affectation of the extreme, wacko left wing -- the same people who would like to tax you at 100% so they could control how and when you get some of it back with gummint strings attached.
Further note: Even sales taxes in the U.S. are somewhat graduated, since most States do not tax whatever falls within their definition of basic subsistence items, such as unprocessed food, clothing, etc. It is only the definitions that have fallen behind the times (or been whacked out of shape by slimeball politicians). The original concept was that sales tax should not burden the basic things necessary to sustain life.
That same concept also underlies the per-person "exemption" in the U.S. income tax system, although it has become largely meaningless thanks to gummint-produced inflation and failure to update the law to keep pace. In the original 1913 Income Tax the personal exemption was $3,000 for a single person and $4,000 for a married couple. Those numbers would be about $56,000 and $74,800 today.
The 1913 tax rate began at 1% from $3,000 to $20,000 of net income and topped out at 6% on the portion of net income over $500,000. If the promises of the proponents of the 1913 income tax had not been almost immediately broken, we would today not pay anything at all on net incomes under $56,000 and would pay only 1% from there up to about $374,000. The maximum rate of 6% would be paid on the portion of net incomes over $9,353,000.
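To make the arithmetic above easy to check, here is a small sketch. It only uses the dollar figures quoted in the two paragraphs above; the function names are made up for illustration, and the brackets between $20,000 and $500,000 are not given in the post, so the sketch deliberately refuses to compute anything above the first bracket.

```python
# 1913 thresholds mapped to the present-day equivalents quoted above.
QUOTED = {
    3_000: 56_000,        # single exemption
    4_000: 74_800,        # married exemption
    20_000: 374_000,      # top of the 1% bracket
    500_000: 9_353_000,   # start of the top bracket
}


def implied_multipliers(quoted=QUOTED):
    """Return the inflation multiplier implied by each quoted pair."""
    return {then: now / then for then, now in quoted.items()}


def first_bracket_tax(net_income, exemption=56_000, bracket_top=374_000):
    """Tax under the scaled 1913 schedule, first (1%) bracket only.

    Higher brackets are not stated in the post, so incomes above
    bracket_top raise an error rather than guessing at the rates.
    """
    if net_income > bracket_top:
        raise ValueError("brackets above the first are not given in the post")
    taxable = max(0, net_income - exemption)
    return taxable / 100  # 1% of the amount over the exemption


if __name__ == "__main__":
    for then, mult in implied_multipliers().items():
        print(f"${then:,} in 1913 -> multiplier of about {mult:.2f}")
    print(first_bracket_tax(100_000))
```

All four quoted pairs imply roughly the same multiplier, a little under 19, so the figures are at least consistent with each other. Under that scaled schedule a net income of $100,000 would owe 1% of $44,000.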
Among the many problems with the so-called "progressive," graduated income tax is that it penalizes upward mobility. By imposing more punitive tax rates as income rises, the graduated income tax makes it more difficult for people to realize the American dream of upward mobility. So much for the "progressives" being the friends of the working man. They want to take away everyone's dreams and goals and reduce us all to some low common denominator.
Also, the present "progressive" U.S. income tax system taxes income that is saved or invested as many as four times over, and political "progressives" would give you more of that if they could.
Any time anyone approaches you talking about how great Value Added Tax is or talks about how an income tax is "progressive" or how a sales tax is "regressive", run screaming in the opposite direction, because that person is your worst nightmare come to life.
zealotasd wrote:
Maybe, but that would upset the entire conceptual foundation of sales/use taxes as they presently exist. It's not the purchase or the location where the purchase takes place that gives a State its sales/use tax jurisdiction -- it's the point of delivery and use.
There's a huge body of existing case law that rests on this foundation. There may also be Constitutional problems with any kind of national sales tax, which is probably why we don't already have one.
Knights who say 'INT wrote:
Surely you're making a sick joke. Value Added Tax as implemented in Europe is a horrible, loathsome thing that sucks about 20% out of virtually all retail transactions. Here in the U.S. no State has a sales tax rate higher than about 8-point-something percent, and each State has its own definitions of exempt items, usually basic subsistence things such as unprocessed food, clothing, etc. A few States have no sales tax at all.
We need to follow European models about as much as we need extra holes in our heads.
Uh, you seem to have invented your own definition of "use tax." The real definition has nothing to do with merely "using something" like phone lines or the Internet. "Use tax" is the flip side of sales tax and applies when you buy out of State and bring merchandise into or receive merchandise in your home State for your use.
Sales tax is collected by the seller on behalf of a State that has jurisdiction over both the seller and the transaction, i.e. where the Seller fits the State's definition of "doing business" and the delivery takes place in that State.
Use tax is collected by the State wherein the "use" takes place when sales tax is not applicable. Use tax is based on the jurisdiction of a State over the user of the merchandise.
The sales/use tax laws are written in such a way that only one State will tax a given transaction, either by sales tax or by use tax.
You can see this very clearly in cases of costly, registered things like vehicles. If you buy a car out of state and arrange to take delivery of it in your home State (or simply bring it into your State), the seller may not be required to collect sales tax where you bought it but your home state will collect the "use" form of sales/use tax when you register it. Unfortunately in the case of vehicles a new sales or use tax will be paid every time a vehicle is registered by another owner.
Use tax is inherently difficult to assess and collect, but many States do conduct audits of larger taxable entities within their jurisdictions and collect use tax on merchandise purchased from sellers in other States.
Daytimer, the well-known manufacturer and mail-order seller of fine business diary/scheduler and related products, voluntarily bought into this big time, decades ago. It began collecting "sales" tax on all mail-order sales to customers in all States, even though the company operated only in, I think, Pennsylvania. That was why I stopped doing business with them.
letxa2000 wrote:
That's why I don't look at it as a DDoS. The objective should be to affect the overall costs of running spam websites, not to try to clobber them. Let their new higher-tiered bandwidth bill send them the message.
Spider the beneficiary website a few times, looking exactly and fully like a browser, with all the expected headers like "Referer:" and "User-agent:" perhaps filled with real but varying values. In spidering, the "Referer" should follow one step behind the current request. Timing can be controlled as well. The idea is to look exactly like a human at a browser, and to actually download the results with reformed links for local offline viewing. By limiting the download rate and the number of times each file is downloaded, and by saving the spams and the downloaded websites for a reasonable time, an individual will have a defense against an allegation of DoS. In fact, things like WebWhacker were and are marketed as tools for offline browsing by downloading entire Websites. Doing so can actually be *more* gentle on the web servers than human surfing often is, but unproductive surfing or downloading by large numbers of spam recipients will radically alter the economics of running spam websites.
SEWilco wrote:
Graham does use the entire email message. He explains why in his articles on Bayesian filtering. The impression that email headers contain a lot of stuff that looks useless is misleading. Bayesian filtering can make excellent use of strings in the headers. You don't have to look at or understand the gobbledygook -- you just say, "This message is spam" or "This message is not spam," and the Bayesian filter -- done Graham's way -- then assigns values to the tokens found in the message.
In the deluge of alleged Bayesian filtering that has hit the market since Graham's first article, it's easy to overlook the fact that a lot of implementers are complete idiots who think they are smarter than Graham.
The Bayesian part is only one element of what Graham proposed -- it's the mathematical part that computes a probability of "spam" from a list of hits against a weighted list. Just as important are A) how the input text is tokenized, B) how many and which hits are selected for use in the Bayesian computation, and C) how skew or bias is introduced in multiple places in the process to reduce both false positives and false negatives. Graham discusses all of this in the light of quite a lot of his own research. Pity that not everyone who wants to tout their "Bayesian filtering" bothers to give Graham credit for knowing what he writes about by following his whole prescription.
Thus it is quite easy to employ Bayesian computation of the probability of spam in a context of poor tokenizing, skipped headers, ineffective weighting or skewing, etc., and get very poor results that don't stand the test of time and new spam. It's a good bet that this is exactly what a lot of implementations represent -- Bayesian computation of poorly selected and biased data -- either because the implementers are stupid, or because they erroneously believe they are smarter than Graham is (or is that redundant?).
Remember SCO, VeriSlime, SunnComm and others, and think "Bayesian" as a buzzword that the suits believe will convert to cash. Then ask yourself how many of the sudden Bayesian filtering appearances are from people who actually know what they're doing.
The various reports of spammers "getting around the Bayesian filters" that have surfaced since the appearance of Graham's first article are completely inconsistent with Graham's own results, virtually proving that the people implementing a lot of this stuff are not RTFA.
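For anyone who wants to see how the three elements listed above (tokenizing, selecting hits, and biasing) fit around the Bayesian computation, here is a minimal sketch in the spirit of Graham's description. The specific constants (doubling good counts, a minimum of 5 occurrences, 0.4 for rarely seen tokens, clamping to 0.01..0.99, the 15 most interesting tokens) are my recollection of his article and should be treated as assumptions, as should the tokenizer regex and all the function names.

```python
import re
from collections import Counter
from math import prod


def tokenize(text):
    """Split on anything that is not alphanumeric, $, ' or -.

    Feeding the whole message (headers included) through this is the point:
    header strings become tokens like any other.
    """
    return re.findall(r"[A-Za-z0-9$'\-]+", text.lower())


def train(messages):
    """Count token occurrences across a corpus of messages."""
    counts = Counter()
    for message in messages:
        counts.update(tokenize(message))
    return counts


def token_prob(tok, good, bad, ngood, nbad):
    """Per-token spam probability, biased to reduce false positives."""
    g = 2 * good[tok]          # bias: good occurrences count double
    b = bad[tok]
    if g + b < 5:
        return 0.4             # assumed value for rarely seen tokens
    good_ratio = min(1.0, g / ngood)
    bad_ratio = min(1.0, b / nbad)
    return max(0.01, min(0.99, bad_ratio / (good_ratio + bad_ratio)))


def spam_probability(text, good, bad, ngood, nbad, n=15):
    """Combine the n tokens whose probabilities are farthest from 0.5."""
    probs = [token_prob(t, good, bad, ngood, nbad) for t in set(tokenize(text))]
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    top = probs[:n]
    spammy = prod(top)
    hammy = prod(1 - p for p in top)
    return spammy / (spammy + hammy)


if __name__ == "__main__":
    good = train(["meeting agenda lunch"] * 10)
    bad = train(["cheap viagra pills"] * 10)
    print(spam_probability("cheap viagra", good, bad, 10, 10))
    print(spam_probability("lunch meeting", good, bad, 10, 10))
```

Here ngood and nbad are the numbers of messages in each corpus. Dropping the headers, changing the tokenizer, or removing the bias changes the inputs to the last function without touching the Bayesian part at all, which is why two "Bayesian" filters can perform so differently.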
Sheetrock wrote:
The author, Paul Graham, in my opinion appreciates and understands a great deal more about this and a host of other things than do you. With all due respect. FYVM.
It doesn't matter. The spammers and their clients will have two choices: implement working unsubscribe links, preferably those that are activated by the spidering FFBs, or see their operating margins plummet to complete unprofitability. Frankly, I don't give a rat's ass which they choose.
They can't. They may hijack bandwidth to send the spam out, but they have to pay for the bandwidth to get the resulting website visits. Those who use hijacked computers as temporary web servers will end up in prison pretty quickly, because they can't hide their DNS registrations fully or forever, and they can't hide the click trail or the money trail.
You're dreaming. Not only are there not that many spam websites in the context of the global Internet, many of them in fact converge onto a small number of the same IP addresses. If you took the trouble to do IP lookups on some of the hostnames in URLs in spam you receive, you would already know this.
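The IP lookup mentioned above is easy to do for yourself. Here is a rough sketch that groups the hostnames found in a list of URLs by the address they resolve to. The function name and the injectable resolve parameter are my own choices for illustration; by default it uses the standard library's socket.gethostbyname, which needs a working network and DNS.

```python
import socket
from collections import defaultdict
from urllib.parse import urlsplit


def group_hosts_by_ip(urls, resolve=socket.gethostbyname):
    """Map each resolved IP address to the set of hostnames pointing at it."""
    groups = defaultdict(set)
    for url in urls:
        host = urlsplit(url).hostname
        if not host:
            continue                 # not a URL with a hostname
        try:
            ip = resolve(host)
        except OSError:
            continue                 # hostname does not resolve; skip it
        groups[ip].add(host)
    return dict(groups)


if __name__ == "__main__":
    # Stand-in resolver so the example runs without a network connection.
    fake_dns = {
        "a.example": "192.0.2.1",
        "b.example": "192.0.2.1",
        "c.example": "192.0.2.7",
    }
    urls = ["http://a.example/x", "http://b.example/y", "http://c.example/"]
    for ip, hosts in group_hosts_by_ip(urls, resolve=fake_dns.__getitem__).items():
        print(ip, sorted(hosts))
```

If many different hostnames land in the same group, they share a server, which is the convergence described above.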
Yeah, and people who grew up with the ARPANET used to whine at people who had sigs that were "too large and wasting bandwidth." It's a new world. There is a lot more bandwidth out there than you realize. At any point in time, nearby cities now have more bandwidth connecting them than the entire planet used to have just a very few years before.
Ah, but it does pay for all the bandwidth we need or want, and has been doing so very nicely for about 10 years of public use through exponential growth that no one anticipated. And at the same time, every year, almost every month, fat bandwidth gets cheaper and cheaper and cheaper. Some day your $20/mo will bring you 640 Gbits/sec on fiber and even streaming HDTV won't use it all up. Then you'll be whining that spams will have entire feature films file-attached to them to entice you to visit their 3D holographic websites.
Brad Mace wrote:
Whitelist. Whitelist. RTFA.
All together now... W h i t e l i s t .
Oops! ([R]eading [TFA] again myself), I see that Graham revised the article. Mea culpa.
Either way, though, whitelist or blacklist, I think that's Difficult (TM). I think that the way it will evolve will end up as a two-level Bayesian process of classification: one for the spam and the other for the websites.
Then, your Bayesian email filter will segregate new incoming spam in the normal manner, and of the URLs contained in the spam, your Bayesian FFB will download (and eventually throw away) only those sites that can reliably be identified as true spam beneficiary sites.
Malicious spam that seeks to kick off a DDoS attack against innocent websites would thus have little or no effect.
So instead of two bodies of text -- spam email and nonspam email -- there would be four, to include spam web pages and nonspam web pages. This is because the tokens that indicate "spamminess" or "nonspamminess" in spam email will not necessarily be the same ones that indicate the same things in web pages, nor will they likely have the same weights. When (not "if") FFB gets implemented in this manner and with reasonable integration so the average user doesn't have to jump through a lot of hoops, it will be easy and effective.
The FFB tool would download no more than a moderately obsessive/compulsive surfer would, and at a modest rate, looking exactly like a web browser in all respects. It will be the numbers of FFBs in service that will have the desired effect, not the download quantity or rate of any particular one.
Then someone will have to implement a tool that places phone calls to the 800 numbers in non-URL spams and sends snail mail to the physical addresses in the last category of stone age spam. Oh, and maybe Tomahawk missiles to Nigeria for the "419" spam.
jafo wrote:
Even if the URLs used are not cleaned up first, this is not necessarily a bad thing. It causes the spammers to send more spam to the very email addresses that cause even more unprofitable load on the beneficiary web servers.
That's not as common as you think. The most common identifier in a spam URL is the ID of the spammer/agent, so he can get credit for any sale that results from your visit to the beneficiary website.
The next most common identifiers are very lame inclusions of your email address in the URL, very easily modified by a FFB tool. Again, though, the desired FFB effect may actually be multiplied by leaving such identifiers intact, by causing the spammers to concentrate on the email addresses that hurt them the most.
Running way in last place are URLs that contain other encodings, which may be a munge of your email address or, more likely, a database key that allows them to link back to your email address and the spam they sent you. This last, though, requires a high degree of integration between the spam sender and the beneficiary website operator, which also implies a stable and very advanced IT environment, which is not typical of spam/website operations.
If you were running FFB, you could opt to munge identifying information or not, to regulate how much spam you get and thus how much Fighting Back you do. If you don't get enough spam, you could allow your FFB to use original URLs with whatever identifying information they contain. As your spam level rises, you could tell your FFB tool to munge the identifying information.
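As an illustration of how simple the "very lame" case is to handle, here is a sketch that drops any query parameter whose value looks like an email address while leaving everything else, such as an agent ID, intact. The function name, the regular expression, and the example parameter names are hypothetical; it does nothing about opaque database keys, which cannot be recognized by pattern.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Loose pattern for "something@something.tld"; an assumption, not a validator.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")


def strip_email_identifiers(url):
    """Remove query parameters whose (decoded) value contains an email address."""
    parts = urlsplit(url)
    # parse_qsl percent-decodes values, so user%40example.com is caught too.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not EMAIL_RE.search(value)
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


if __name__ == "__main__":
    print(strip_email_identifiers(
        "http://shop.example/offer?aff=1234&e=user%40example.com"
    ))
```

Whether to apply this or leave the original URL alone is exactly the choice described above.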
RevMike wrote:
RTFA. Graham suggests a whitelist, not a blacklist. I think that's presently a Difficult Thing (TM), and have suggested that the upcoming tools for doing this could incorporate the same good/bad Bayesian classification for the websites that the present tools use for the email. As with training the system to identify the spam emails, there would be an initial burden of viewing and classifying, after which the system would run pretty much by itself with only occasional update training.
William Tanksley wrote:
Uh, and spam is not "ridiculously aggressive?"
In contrast, Filters That Fight Back is considered, measured and appropriate. And, if actually implemented, devastatingly effective.
carsont wrote:
Capt_Troy wrote:
No, you're probably only missing the little part about hits being useless if no revenue results from them.
The spammers want traffic... let's give 'em traffic. Even if a few million people did this manually with browsers (with Java and JavaScript and autoupdate etc. disabled), it would have the desired effect. But that's a lot of time and effort to expect from a large number of people. Have no doubt: tools will appear.
Robert The Coward wrote:
Methinks you mayhap miss the point. The point is not to smoke the servers or their connections; the point is to increase their traffic costs without increasing their sales. A million times "gentle" will do it just as effectively as a hundred times "massive." More effectively, actually, since the spamsite operators will have no recourse against anyone.
meldroc wrote:
Irrelevant. We don't care where the spam comes from, nor who sends it. We only care about reducing the profit margins of the beneficiaries of the spam -- the sales websites to which the spam tries to attract traffic.
I doubt there is as much of that as you claim, but it still doesn't matter, as you yourself point out in a different context:
And finally:
Maybe, but that is inherently difficult and will require worldwide coordination and widespread implementation and cooperation. Punishing spam beneficiary websites is relatively simple and can be implemented by anyone willing and able to do it. With the release of tools that are sure to be coming soon, the numbers of the able will increase vastly, leaving only willingness as a requirement.
dilvie wrote:
Wrong. This has only now been suggested. RTFA.
RTFA.
This has nothing to do with sending or replying to email. RTFA.
At least you got one thing more or less right. It's a different methodology than anything that has been tried before, one made possible by the classification and segregation of spam, mostly thanks to Bayesian filtering.
It doesn't matter. The spammer already has your email address. "So he works harder to get through your filters" is a misunderstanding of how spam and spammers operate. The spammer doesn't care about you, individually, and will expend no additional effort to get through your filters. Moreover, almost all the attempts to get through Bayesian filters fail utterly and simply make the spam even more easily identified. RTF related articles on Paul Graham's website.
I don't even use Bayesian filtering yet, and not a single spam message gets through my dumber filters. Nor have I had to maintain the filters in a long, long time.
Send me spam, see your website traffic increase without additional sales. Send me more spam, see more traffic. It's that simple. Multiply it by even just a million users -- a mere fraction of 1% of Internet users -- and the spam websites will be smoking craters of melted servers or will cost so much to operate that they won't be paid for by the pitifully low sales they generate.
That's standard English when constructing derivatives. If the "m" were not doubled, the result would be pronounced "spayminess" due to the vowel that would follow the single "m."
freality wrote:
When a spammer sends you a URL, it is an explicit invitation to click on it and visit the beneficiary website. Even if the target website is unrelated to the spam and spammer and is just an innocent victim, the responsibility lies with the spammer, not with anyone clicking on the links in the spam (more on innocent victims below).
When you click on any link, this causes your browser to download the target HTML page and, usually, all referenced components of that page. The key word is download. The page usually contains additional links, which are also explicit invitations to visit other pages and download those pages and their referenced components.
There is no functional difference between clicking on links in your browser and otherwise downloading the pages and other files in response to the explicit invitation. Actually, a strong case can be made that automated downloading of a site is less demanding of the server, since human surfers often view the same pages repeatedly and navigate back and forth.
All Paul Graham has suggested is that large numbers of spam victims use the results of their Bayesian filters to accept the explicit invitations and "visit" confirmed spam target websites. The effect would be to increase the bandwidth and server capacity costs of the spam beneficiaries without any one victim creating more load than a slightly obsessive-compulsive surfer-clicker would create.
All this would increase the costs of running a spam/website system without increasing completed sales, thus reducing margins. Although the spam senders effectively hijack the bandwidth of other people, you can bet that almost all the beneficiary spam websites pay tiered rates for the bandwidth they use. When they start having to pay for gigabytes and terabytes of traffic, the profit margin on spam will plummet.
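The arithmetic behind that claim can be sketched in a few lines of Python. Every number below (site size, sales, profit per sale, price per gigabyte) is a made-up figure for illustration, not a measurement, and a flat per-gigabyte price stands in for whatever tiered schedule a real host would charge.

```python
# All figures are hypothetical illustrations, not measured data.

def extra_traffic_gb(recipients, site_size_mb, urls_per_recipient=1):
    """Traffic added if each recipient downloads the site once per received URL."""
    return recipients * urls_per_recipient * site_size_mb / 1000.0  # MB -> GB


def margin_after_bandwidth(sales, profit_per_sale, traffic_gb, price_per_gb):
    """Profit left once the bandwidth bill is paid (flat rate assumed)."""
    return sales * profit_per_sale - traffic_gb * price_per_gb


# One million recipients each fetching an assumed 2 MB site once.
traffic = extra_traffic_gb(recipients=1_000_000, site_size_mb=2)

# An assumed 100 sales at 20.00 profit each, bandwidth at 1.50 per GB.
margin = margin_after_bandwidth(sales=100, profit_per_sale=20.0,
                                traffic_gb=traffic, price_per_gb=1.5)
```

With these assumed inputs the extra traffic is 2000 GB and the campaign ends up 1000.00 in the red. Plug in your own guesses; the point is only that traffic scales with the number of recipients while sales do not.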
Note: It doesn't matter at all whether the spam is sent by the operators of the websites or by their contract spam agents. All that matters is the profit margin at the point of sale -- the beneficiary website.
Note: It doesn't matter that downloading entire spam beneficiary websites may also confirm that the spam was sent to a working email address. The more spam they send to such recipients, the more traffic they will have to pay for on their web servers. There are way more of us than there are of them.
Paul Graham already pointed out that some mechanism such as a whitelist would have to be used to protect against punishing innocent websites whose URLs might be included in spam to discredit Filters That Fight Back. My own take is that maintaining a reliable whitelist is a Difficult Problem. A better solution might be to employ another level of Bayesian filtering to classify the web pages pointed to by spam on an individual user basis. Like the classification of the spam itself, this would mostly be a one-time process, with occasional followup as the spam beneficiaries try new tricks on their web pages just as the spammers try new tricks in the spam they send out.
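As a rough idea of what a second-level, per-user page classifier might look like, here is a minimal Python sketch. It is a plain multinomial naive Bayes with add-one smoothing, not Paul Graham's own combining formula, and the class name, labels, tokenizer, and training snippets are all my assumptions. Ties go to "innocent" so that an untrained classifier never marks a page as spam.

```python
import math
import re
from collections import Counter


class PageClassifier:
    """Per-user naive Bayes over page text (illustrative sketch only)."""

    LABELS = ("innocent", "spam")  # "innocent" first, so ties favor it

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.page_counts = {label: 0 for label in self.LABELS}

    @staticmethod
    def tokens(text):
        return re.findall(r"[a-z0-9$']+", text.lower())

    def train(self, text, label):
        self.word_counts[label].update(self.tokens(text))
        self.page_counts[label] += 1

    def log_score(self, text, label):
        counts = self.word_counts[label]
        total = sum(counts.values())
        vocab = len(set(self.word_counts["spam"]) | set(self.word_counts["innocent"]))
        pages = sum(self.page_counts.values())
        # Smoothed prior, then one smoothed likelihood term per token.
        score = math.log((self.page_counts[label] + 1) / (pages + 2))
        for word in self.tokens(text):
            score += math.log((counts[word] + 1) / (total + vocab + 1))
        return score

    def classify(self, text):
        return max(self.LABELS, key=lambda label: self.log_score(text, label))


# Hypothetical training text standing in for pages the user has judged.
clf = PageClassifier()
clf.train("cheap pills buy now discount pills", "spam")
clf.train("weather forecast news sports library catalog", "innocent")
verdict = clf.classify("buy cheap pills")
```

A real version would need far more training pages than two and would have to strip HTML before tokenizing; this only shows the shape of the idea.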
Meanwhile, though, no matter how you segregate the spam for use in a Filters That Fight Back response, it is trivial for an individual to scan a list of extracted URLs and manually exclude seemingly innocent ones.
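For what that manual step might look like with a little tooling, here is a Python sketch that pulls the URLs out of a message and drops any whose host is on a hand-kept exclusion list. The regular expression is a loose approximation rather than a full URL grammar, and the function names and example hosts are placeholders of mine.

```python
import re
from urllib.parse import urlparse

# Loose pattern: http(s) URLs up to whitespace, quotes, or angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def extract_urls(message_text):
    """Return the distinct URLs in a message, in order of first appearance."""
    seen = []
    for match in URL_PATTERN.findall(message_text):
        url = match.rstrip(".,;:!?)")  # trim trailing sentence punctuation
        if url not in seen:
            seen.append(url)
    return seen


def exclude_hosts(urls, excluded_hosts):
    """Drop URLs whose host is, or is a subdomain of, an excluded host."""
    excluded = {host.lower() for host in excluded_hosts}
    kept = []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        is_excluded = host in excluded or any(
            host.endswith("." + entry) for entry in excluded)
        if not is_excluded:
            kept.append(url)
    return kept


# Hypothetical message text, for illustration only.
text = ("Buy now at http://pills.example.invalid/buy.html! Also see "
        "http://www.example.org/news, and http://pills.example.invalid/buy.html")
urls = extract_urls(text)
remaining = exclude_hosts(urls, ["example.org"])
```

Printing `urls` before filtering gives you the list to eyeball; whatever you judge innocent goes into the exclusion list.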
KornShell scripts for AIX that implement FFB are being readied now for distribution. Write me if you're interested.