A New Type Of Realtime Blocklist: The SURBL
Glamdrlng writes "The SURBL, or "Spam URI Realtime Blocklist", represents a nexus of RBLs and content filtering that may bring us one step closer to a spam magic bullet. While traditional RBLs perform a DNS lookup on the connecting mail server, SURBLs take this a step further by parsing the text of the email looking for URIs and doing a lookup on those web servers. They also prevent "joe jobs" by maintaining a whitelist of legitimate web servers whose domain names may show up in spam messages, e.g. eBay, PayPal, Microsoft, etc. The only requirement to implement the SURBL is a plugin on your MTA, such as SpamAssassin, that can parse the body of each email. While there is no MTA that directly supports SURBLs without a plugin, the author hints at one being in development."
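For anyone wondering what the lookup described in the summary might look like, here is a minimal Python sketch. It is not the actual plugin: the zone name `multi.surbl.org`, the tiny whitelist, and the helper names are assumptions for illustration, and the "last two labels" domain reduction is a deliberate simplification. The resolver is passed in as a parameter so the logic can be tried without touching the network.

```python
import re
import socket

# Assumption: illustrative zone name; consult the SURBL site for the zone to query.
SURBL_ZONE = "multi.surbl.org"
# Hypothetical local whitelist of domains that often appear in joe-job spam.
WHITELIST = {"ebay.com", "paypal.com", "microsoft.com"}

# Capture the host part of http/https URIs found in a message body.
URI_RE = re.compile(r"https?://([A-Za-z0-9.-]+)", re.IGNORECASE)

def extract_domains(body):
    """Return the last two labels of every URI host (naive: ignores co.uk etc.)."""
    domains = set()
    for host in URI_RE.findall(body):
        labels = host.lower().strip(".").split(".")
        if len(labels) >= 2:
            domains.add(".".join(labels[-2:]))
    return domains

def is_listed(domain, resolve=socket.gethostbyname):
    """Treat a domain as listed if <domain>.<zone> resolves at all."""
    try:
        resolve(f"{domain}.{SURBL_ZONE}")
        return True
    except OSError:  # socket.gaierror (name not found) is a subclass of OSError
        return False

def spam_domains(body, resolve=socket.gethostbyname):
    """Domains in the body that are not whitelisted and are listed in the zone."""
    candidates = extract_domains(body) - WHITELIST
    return sorted(d for d in candidates if is_listed(d, resolve))
```

A real plugin would also need to handle redirectors, numeric hosts, and country-code second-level domains, none of which this sketch attempts.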
One minor thing I missed before:
The advent of Bayesian spamming brought spams that included whole paragraphs of random words, just so that your list would get more and more bloated...
How long do you think it will take spammers to add dozens of valid - but in the context of the spam nonsensical - URLs just to fill up the black-list and make it useless?
Boy - that list will be f***ed up pretty soon...
Presently the only problem with this is that there are no plug-ins for the MTAs themselves yet. The plug-in is for SpamAssassin. That means that the message has to be transferred and passed on to SpamAssassin before it can be dropped or tagged, whereas the other RBLs allow you to drop the connection before the message is transferred. This problem will be solved once there are plug-ins for the MTAs themselves.
But I have to ask, why aren't existing RBLs like Spamhaus effective? They should be far more effective than the ~40% that I am experiencing.
- (x) Users of email will not put up with it
We'll see.
- (x) Eternal arms race involved in all filtering approaches
One of the few constants is that there will be a way for money to get from the target back to the original spammer or seller. (Well, it's possible something more complex is going on and that's not the real goal of spam, but at the least, it's something that's remained constant for years, which is notable in the world of spam.) So "following the money" is really based on an acceptance of the above criticism, and a realization that the arms race can never get around the money stream.
Filters may lead to arms races, but does anyone NOT use them right now? There are few alternatives, namely things like making email non-anonymous / PKI, enacting large legal penalties along with huge international support, rejecting email from anyone you don't know, ....
- (x) Whitelists suck
Actually, it's a blacklist. Blacklists may suck, but it's possible they suck less than spam, and the proliferation of RBLs kind of implies that.
Sure, there might be a way to stop spam once and for all and then blacklists would be hated, but the very presence of an antispam-rejection-template implies that there won't be a magic bullet for a long time to come.
- (x) Sorry dude, but I don't think it would work.
The only way it CAN'T work is if money isn't the real goal of spammers, or if they make it hard enough to "follow the money" that other methods are easier/nicer.
Sounds like a great idea, especially for home users or some such, but as soon as you look at the bigger picture things start to break down. First of all, what about legitimate mailing lists? Some of them have hundreds of thousands of addresses. You want the administrator to have to go through and click a web page for each and every address on the list? Never gonna happen.
What about corporate use? Many legitimate emails go to a dozen recipients, almost like a mailing list. Think of the lost productivity with the senders clicking webpages for each reply and forward. Think of the dreaded Everyone group. Well, that would be an advantage, but you start to see what I mean.
Your idea is very similar in concept to a few others in that it requires a cost: someone reading a picture and clicking a button for the message to transfer. This scheme is better implemented by the various proposals that impose a computational cost for each message transferred, like those from AOL, Yahoo, and Microsoft, but even these proposals all have major drawbacks and no one is rushing to implement them.
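To make "a computational cost for each message" concrete, here is a rough hashcash-style sketch in Python. It is not any of the AOL, Yahoo, or Microsoft proposals; the stamp format, the use of SHA-256, and the function names are all assumptions chosen for illustration. The sender burns CPU finding a nonce, and the receiver checks it with a single hash.

```python
import hashlib
from itertools import count

def leading_zero_bits(digest):
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits

def mint_stamp(recipient, bits=12):
    """Sender side: search for a nonce whose hash has `bits` leading zero bits.

    Hypothetical stamp format: "<recipient>:<nonce>".
    """
    for nonce in count():
        stamp = f"{recipient}:{nonce}"
        if leading_zero_bits(hashlib.sha256(stamp.encode()).digest()) >= bits:
            return stamp

def verify_stamp(stamp, recipient, bits=12):
    """Receiver side: one hash, so checking is cheap compared to minting."""
    if not stamp.startswith(recipient + ":"):
        return False
    return leading_zero_bits(hashlib.sha256(stamp.encode()).digest()) >= bits
```

Each extra bit roughly doubles the sender's expected work, which is exactly why the mailing-list objection above applies to these schemes too: a list with hundreds of thousands of addresses pays the cost hundreds of thousands of times.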
See the above list? Your post fits into:
(x) Requires immediate total cooperation from everybody at once
Also, accessibility, custom SMTP clients, yadda yadda yadda... but you've already realized your mistake so I'll stop now.
We can't ever have a workable spam filter because of the adaptability of spam.
This is because the solutions of the day focus on content instead of anonymity.
I've said it before, I'll probably say it again, get rid of unauthenticated email and the spam problem becomes a thousand times easier to fight. SPF and various RMX solutions exist in design today. If people want the spam problem to go away, that can be done today. Unfortunately people would rather piss and moan and call for legislation or perfect solutions than deal with these good ones today.
In the case of spam the perfect is the enemy of the good enough. We should stop spam today.
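As a rough idea of what sender authentication buys you, here is a toy Python check in the spirit of SPF. It only understands `ip4:` terms and a trailing `-all`, and the record string is supplied by the caller rather than fetched from DNS; the function name and the three result strings are assumptions for illustration, not a full SPF evaluator.

```python
import ipaddress

def spf_ip4_check(record, client_ip):
    """Return 'pass', 'fail', or 'neutral' using only ip4: and -all terms.

    Simplified sketch: real SPF has many more mechanisms and qualifiers.
    """
    ip = ipaddress.ip_address(client_ip)
    terms = record.split()
    if not terms or terms[0] != "v=spf1":
        return "neutral"
    for term in terms[1:]:
        if term.startswith("ip4:"):
            # Sender is authorised if the connecting IP is inside a listed network.
            if ip in ipaddress.ip_network(term[4:], strict=False):
                return "pass"
        elif term == "-all":
            # Domain owner says: nothing else is allowed to send as us.
            return "fail"
    return "neutral"
```

The point is that the decision needs only the envelope sender's domain and the connecting IP, so it can be made before any message content is transferred.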
ANY technical solution is going to require extra work on the client side, so rejecting this outright is kind of ridiculous unless you're advocating a purely legal, market-based, or vigilante solution.
Spam is getting to be such a problem that techies are setting up things like SpamAssassin for themselves and friends, and major ISPs are using RBLs. So this isn't really a problem.
One thing I've noticed a lot recently is spammers including a big list of domains that have nothing to do with them in the text of their junk mail. In the past 2 weeks, I've probably got about 10 SpamCop reports for my customers, and in every case my customer has had nothing to do with the junk mail, except for being listed in a list of about 15 URLs that are not associated with the spammer. This system says it has a whitelist for places like eBay, PayPal, etc., but what about smaller people? They'd get blocked, and potentially lose business, due to something that they had absolutely nothing to do with.
Having just configured my email server with a ROBUST array of RBLs, I think this is a fantastic idea. I've been using the body_checks feature in Postfix and manually adding violating URIs to create my own blacklist for several months now. I would love to benefit from a shared list. I don't care much for the whitelist feature; that seems to me to create a backdoor for the spammer. Combined, RBLs and private blacklisting (IPs and Addys) allow me to block 6000 plus spam A DAY. That's for a mere 150 plus users. Server-side spam blocking using only Bayesian processing is an immense processor drain, as is server-side virus scanning.
Look at it this way. Spammers need to make money. To do so, they must present you with a URI to complete a transaction. They cannot easily change this URI without incurring cost, so it will always be in the spam. Spammers who try to include too much "sales" content in their spams instead of a URI will be caught by a secondary Bayesian filter.
P.S. We have been successfully blocking encrypted URIs for months now. It's an easy rule to set up, and legitimate users will never encrypt a URI. It's really quite beautiful.
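The poster doesn't say what their "encrypted URI" rule looks like, so the following Python sketch is only one guess at the idea: flag any URI whose host part is percent-encoded or is a bare integer (the "dword" form of an IP address). The regex, the helper names, and the choice of those two signals are assumptions for illustration, not the Postfix rule described above.

```python
import re

# Rough URI matcher: scheme plus everything up to whitespace or angle brackets.
URI_RE = re.compile(r"https?://[^\s<>]+", re.IGNORECASE)

def host_part(uri):
    """Return the host[:port] portion, dropping any user@ prefix and the path."""
    rest = uri.split("://", 1)[1]
    authority = re.split(r"[/?#]", rest, maxsplit=1)[0]
    return authority.rsplit("@", 1)[-1]

def looks_obfuscated(uri):
    """True if the host is percent-encoded or written as a single integer."""
    host = host_part(uri)
    if "%" in host:
        return True
    if re.fullmatch(r"\d+", host.split(":")[0]):
        return True
    return False

def obfuscated_uris(body):
    """All URIs in the body whose host looks deliberately disguised."""
    return [u for u in URI_RE.findall(body) if looks_obfuscated(u)]
```

Percent-escapes in the path (for example an encoded space) are left alone, since those do show up in legitimate mail; only the host is examined.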
If you want bloat and dysfunction like this, look at Exchange or Notes or Gropewise. It's a GUI, it's a calendar, it's a database, it's an MTA! Well, it doesn't scale and it tastes like floorwax.
This is why sendmail developed the MILTER interface. Firewall-1 had a proprietary scanning interface (easy to develop clients with their kit, but a bitch to find out the SERVER's protocol and use that). "The future" (in 1999) promised a vendor-free spec, which still isn't there.
MILTER allows sendmail to speak to an external process that can do things to the headers/envelope or bodies. External can mean to another box (or group of boxes) or just another program.
This also lets you make decisions before the SMTP session closes.
I just wish other MTAs had this available.
If people have to get passwords from you before they can contact you, then... what do you do if you're an open source author... or if an ex from college wants to hook up again and googles you, and finds your website, but STILL can't contact you... or you want to sign up for match.com so that random women can email you.
But if this approach catches on significantly, the more sophisticated spammers will just build their URLs in scripts.
And the rest of us will just block all e-mail that contains scripts. Yeah, I can't wait for that to happen...
Ummmm, the hell? It's perfectly legal to go through mail. My own mail, naturally. And it's legal to tell someone else (say, a secretary) to go through your mail and filter it. Ditto phone calls. I agree, I wouldn't want the government or any other random person or organization rummaging through my email. But I'm more than happy to run a program to do it myself. I appreciate ISPs that offer me the service. (I'm less keen on ISPs that make the service mandatory, but that's another issue.)
Attacking the source varies from extremely difficult to impossible. Spam filtering systems (especially multiple-technique systems like SpamAssassin) are a good stopgap measure. Sure, the spammer is still wasting my bandwidth, and that sucks, but having it disappear into my IN.spam folder reduces my irritation. Ultimately even legal measures have limitations as spammers move overseas. To suggest that we should just give up and drown in spam (I'm at a 2:1 spam:ham ratio these days) until we get a full solution is foolish. Much like medicine, sometimes a cure isn't a real option; all you can do is treat the symptoms. I hope a real cure is on the horizon, and we should certainly look for one, but in the meantime I'll treat my own symptoms.
- All sentences must contain at least one verb.
- All singular nouns other than proper nouns are almost always preceded by an article or a possessive noun/pronoun. There may be words in-between, but these will generally be adjectives (or adverbs modifying the adjectives).
- Prepositions are almost always followed by a noun within a handful of words, all of which are generally adjectives (or occasionally adverbs modifying those adjectives).
- Sentences are rarely more than about twenty words long.
- Sentences are rarely less than about five words long.
- Words that can be more than one part of speech are used fairly infrequently. Ten in a row is a pretty good giveaway.
The English language, if you include slang, jargon, etc., is 3 million words or more. However, the core of the language (removing slang and specialized terminology) only contains about 600k word forms (616,500 from OED2), mostly by adding trivial endings to a core of about 300k words. Of those, only about 200k words are actually in common use anywhere in the world. The average educated person knows about 20k. The average educated person uses only about 2k in any given week. So start with a core of about 20k words, including all articles and pronouns. That should be enough to do a fairly accurate job of determining the legitimacy of a string of text.
For unknown words, assign each one a probability of being a noun, verb, adjective, or adverb based on its ending and its position within the sentence. Give it only a 95% probability of being valid so that several in a row will dramatically lower the probability of a sentence being legitimate.
From this, you should be able to easily come up with a heuristic that determines with a high degree of confidence whether a string of text obeys each of the above rules. All sentences should obey most of those rules. Most sentences should obey all of those rules. If three sentences in a row break more than one rule (even with vocabulary limitations), the text is probably either random noise, Ayn Rand, or unreadable gibberish that you won't understand anyway. Some would argue that all three are an equivalence class, but this is somewhat subjective.
If random text generators get good enough to beat those rules, they will no longer be truly random and will run a high risk of being intelligible sentences---and potentially offensive ones at that---substantially reducing the likelihood of spammers using such generators.
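A very rough Python sketch of the heuristic above, covering only the rules that can be checked without a part-of-speech tagger: the sentence-length limits and a crude "contains a verb" test. The ten-word verb list stands in for the roughly 20k-word vocabulary the comment proposes, and the function names are made up for illustration; the article and preposition rules are not attempted.

```python
import re

# Hypothetical toy verb list; a real filter would use a much larger tagged vocabulary.
VERBS = {"is", "are", "was", "have", "has", "buy", "click", "send", "want", "need"}

def split_sentences(text):
    """Split on sentence-ending punctuation and drop empty pieces."""
    return [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]

def broken_rules(sentence):
    """Return the names of the rules this sentence breaks."""
    words = re.findall(r"[A-Za-z']+", sentence.lower())
    broken = []
    if not any(w in VERBS for w in words):
        broken.append("no verb")
    if len(words) > 20:
        broken.append("too long")
    if len(words) < 5:
        broken.append("too short")
    return broken

def looks_like_noise(text, run=3):
    """True if `run` consecutive sentences each break more than one rule."""
    streak = 0
    for sentence in split_sentences(text):
        streak = streak + 1 if len(broken_rules(sentence)) > 1 else 0
        if streak >= run:
            return True
    return False
```

With only these three rules, a sentence can break "more than one" only by lacking a verb and also having a bad length, so this sketch is far more lenient than the full rule set would be.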
I see one major problem with this, which is that spammers might now be able to cause problems for legitimate websites simply by including their URL in a spam.
I'm a little sensitive to this since a spammer is actually joe-jobbing one of my domains (not autopr0n), and I get hundreds of "user unknown" messages every day, along with a handful of messages telling me "my" email was blocked. It's really irritating.
But, if it's done right, it could work out pretty well. In fact, this would actually be effective against a lot of the current Spam out there, and kill Spam with off-site images.
Anyway, let me throw one countermeasure out there. Suppose spammers start including commonly mailed URLs (such as those on hotornot, yahoo, etc.) in their spams in order to decrease the usefulness of these things. If this thing gets popular, expect to see a lot of spam include a lot of random URLs the way they now include lots of random words. You'll also start to see things like "Javascript decryption" and other techniques to prevent machines from figuring out exactly which URL is being advertised, rather than random noise.