New Method of Spam Filtering

Easily spoofed? by Sam Ruby · 2004-02-19 06:35 · Score: 5, Insightful

What's to stop the From:, To:, and Cc: fields from being spoofed (like a lot of viruses do)?

--
- Sam Ruby

Re:Easily spoofed? by crymeph0 · 2004-02-19 07:13 · Score: 3, Insightful

Easily 30% of the spam I've received over the last few months has been addressed to several people in my office (and not to anyone outside the office). I'm guessing this is the result of viruses harvesting email addresses from people's computers; after that, it's a simple matter of picking out all the known addresses in a given domain. Would this break the system described here?

--
It should be illegal to say that freedom of speech should be limited.
Re:Easily spoofed? by FauxPasIII · 2004-02-19 07:21 · Score: 3, Insightful

> If you do as most spammers do and connect directly to the receiving server, then you can feed it
> whatever you like in the envelope sender, and it has no way of checking whether it's genuine or not.

Isn't it typical for the receiver to do a reverse lookup on the sender's IP, or at least a forward lookup on whatever you hand it in the HELO, to make sure you're legit? I could be mistaken here, but that's always been my perception.
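The check described above is usually called forward-confirmed reverse DNS. A minimal sketch, assuming the resolver functions are passed in so it can run offline; the defaults use the standard library's `socket.gethostbyaddr` and `socket.gethostbyname_ex`, and the names `fcrdns_ok` and `helo_matches_client` are hypothetical. Even a passing result only vouches for the connecting IP's hostname, not for whatever address the client gives in MAIL FROM, which is the point of the quoted reply.

```python
import socket
from typing import Callable, List

def _ptr(ip: str) -> str:
    return socket.gethostbyaddr(ip)[0]        # reverse (PTR) lookup

def _addrs(host: str) -> List[str]:
    return socket.gethostbyname_ex(host)[2]   # forward (A) lookup, IPv4 only

def fcrdns_ok(ip: str,
              reverse: Callable[[str], str] = _ptr,
              forward: Callable[[str], List[str]] = _addrs) -> bool:
    """True if ip -> PTR name -> A records contains ip again."""
    try:
        name = reverse(ip)
        return ip in forward(name)
    except OSError:   # socket.herror / socket.gaierror: no PTR or no A record
        return False

def helo_matches_client(helo: str, ip: str,
                        forward: Callable[[str], List[str]] = _addrs) -> bool:
    """True if the name given in HELO resolves to the connecting IP."""
    try:
        return ip in forward(helo)
    except OSError:
        return False
```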

--
25% Funny, 25% Insightful, 25% Informative, 25% Troll
Re:Easily spoofed? by DR SoB · 2004-02-19 07:38 · Score: 2, Insightful

"Please try to pay attention."

I'll try..

You're assuming too much, dude. You're assuming it's going to try to access your default DNS server, but it could be hardcoded to use any DNS server for lookups (e.g. akadns.yahoo.com).

Also, some SMTP implementations don't even bother to do MX lookups; they just assume the mail server will be either:

MAIL.[domain].[whatever]
or
MAIL1.[domain].[whatever]

And that will be correct 80% of the time. (Yes, I picked 80% off the top of my head, but let's just say I've seen enough mail servers to know.)
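For comparison with the guessing shortcut above, a sketch of what RFC 5321 (section 5.1) expects a sender to do: try MX hosts in order of preference, and fall back to the domain's own address records (the "implicit MX") only when no MX records exist. The records are passed in as plain `(preference, host)` tuples so the sketch runs offline; in practice they would come from a DNS query. The function names are hypothetical.

```python
from typing import List, Tuple

def delivery_order(domain: str, mx_records: List[Tuple[int, str]]) -> List[str]:
    """Hosts to try, best first. Lower MX preference values are tried first."""
    if not mx_records:
        # No MX at all: RFC 5321 treats the domain itself as an implicit MX.
        return [domain]
    # sorted() is stable, so equal preferences keep their input order here;
    # RFC 5321 asks senders to randomize among equal-preference hosts.
    return [host for _, host in sorted(mx_records, key=lambda rec: rec[0])]

def guessed_hosts(domain: str) -> List[str]:
    """The shortcut from the comment: skip DNS and guess common names."""
    return [f"mail.{domain}", f"mail1.{domain}"]

records = [(20, "backup.example.com"), (10, "mx1.example.com")]
print(delivery_order("example.com", records))  # ['mx1.example.com', 'backup.example.com']
print(guessed_hosts("example.com"))            # ['mail.example.com', 'mail1.example.com']
```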

--
Mod +5 Drunk

email still has to get to user by belmolis · 2004-02-19 06:39 · Score: 3, Insightful

If I understand the technique correctly, it relies on information specific to individual users. Unless there is a way for users to export their information, that means that the filtering can only be done after the email reaches its destination, not by the ISP or central mail server. So it may be helpful to individual users, but unlike some proposed techniques, it won't cut down on total email traffic.

End user's access is not the issue. by Sentosus · 2004-02-19 06:39 · Score: 3, Insightful

As an ISP, filtering between my server and my customers does nothing for me. Receiving the emails costs me bandwidth, storing them costs more, and then I have to support the dialup users who want me to clear out their POP3 accounts. Spam filtering should take place on edge servers in the hub cities, so the spam never reaches my mail server in the first place and I don't pay the bandwidth charges. In exchange, I will filter all outgoing mail on my server for spam. BTW, my mother likes spam. Reading through it is a hobby of hers; she's very entertained by the content.

Re:Volume by Dukael_Mikakis · 2004-02-19 06:41 · Score: 2, Insightful

And from the sounds of it, what makes it different from black (or white) lists? True, it's more sophisticated because it uses the whitelists of those on your whitelists, but why not just use a plain whitelist anyway?

And how does this allow email from internet transactions or other non-social sources through? The article didn't seem to address that so clearly.

Re:Sounds interesting... by rjelks · 2004-02-19 06:41 · Score: 4, Insightful

I would agree with that in terms of personal email accounts, but for a business, new contacts are pretty important. Most companies would hope a lot of real email was from new sources.

-

--

Tech News, Reviews and Tutorials

Spam from Co-workers? by Titusdot Groan · 2004-02-19 06:43 · Score: 2, Insightful

These guys are way behind the curve. A growing percentage of the spam I get appears to be coming from my coworkers.

These idiots have forgotten the basic rule of dealing with spammers (and other mail miscreants) which is:

They LIE!

They lie in the HELO, they lie in the MAIL FROM:, in the headers, etc. etc. etc.

Any method that depends on this kind of data is doomed to a quick failure in the real world.

Re:Viruses? by Xzzy · 2004-02-19 06:44 · Score: 2, Insightful

> Won't this just inspire more spammers to pursue
> virus, trojan and spyware-oriented methods of
> spamming?

Fine by me.. that puts them soundly into the lawbreaking category. Which means that after you track them down and actually find someone operating inside the borders of your country, you can DO something about it.

Since the laws being passed in the US clearly indicate that spam is, and always will be, in a grey area that's impossible to regulate, the next best solution is to make spamming so difficult that only outlaws can do it.

Re:Viruses? by MoogMan · 2004-02-19 06:44 · Score: 3, Insightful

That isn't necessarily a bad thing: automatically blacklisting a user's email address would force them to clue up on good practices regarding viruses etc. If this is coupled with a decent system to stop the From/To/Cc fields from being spoofed, then it may start solving two problems at once.

New math? by WD · 2004-02-19 06:45 · Score: 2, Insightful

The method works for only about half of all e-mails received - but in all of those cases, it sorts the mail into the right category

That has to be one of the most ridiculous statements I've heard in a while. That's like saying I've got a great new burglar alarm system. Now, it only works about half of the time, but when it does work it catches the crook with a 100% success rate!
Who's buying?

Re:My favorite filter by catdevnull · 2004-02-19 06:48 · Score: 3, Insightful

My namesake! SpamAssassin on our mail servers helps a bunch. The X- headers that we add are easy to filter on. It gets about 99% of the spam. Your mileage may vary.
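A sketch of filtering on the headers SpamAssassin adds, assuming its usual `X-Spam-Flag` and `X-Spam-Status` header names (a site can configure these differently). The folder names and function names are hypothetical.

```python
import re
from email import message_from_string
from typing import Optional

# Matches e.g. "score=12.3" inside "X-Spam-Status: Yes, score=12.3 required=5.0 ..."
_SCORE = re.compile(r"score=(-?\d+(?:\.\d+)?)")

def folder_for(raw_message: str) -> str:
    """Route to a hypothetical 'Spam' folder when SpamAssassin flagged the message."""
    msg = message_from_string(raw_message)
    flag = (msg.get("X-Spam-Flag") or "").strip().upper()
    return "Spam" if flag == "YES" else "INBOX"

def spam_score(raw_message: str) -> Optional[float]:
    """Extract the numeric score from X-Spam-Status, if present."""
    status = message_from_string(raw_message).get("X-Spam-Status") or ""
    match = _SCORE.search(status)
    return float(match.group(1)) if match else None
```

Routing on the boolean flag keeps the rule simple; reading the score instead lets a site pick its own cut-off per folder.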

--

I might know what I'm talkin' about, but then again, this is Slashdot...

This method will ruin a cool part of the net by The Wing Lover · 2004-02-19 06:49 · Score: 5, Insightful

Used to be that one of the cool things about the net was that you would get email from total strangers... "Hi, I'm from {some far away place}. I saw your {Usenet post|web page|profile on some bulletin board site} and really liked your ideas about {something}. I've also been experimenting with {something} and I have some ideas about {whatever}..."

Now, if we only have emails from our (already existing) friends or friends of friends, then how will we ever meet anybody new?

--

- In Capitalist America, law violates YOU!

Problem halved -- Yarright by ZakMcCracken · 2004-02-19 06:49 · Score: 2, Insightful

The remaining half of the e-mail then has to be filtered in a more sophisticated way. But by then the scale of the problem has been cut in half.

Solving "half" of the problem is pretty useless. Spammers (assuming this technology is ever widely adopted) wouldn't take long to find a way to get their messages into the unfiltered heap. The only ones to suffer damage would be legitimate email senders.

Says the Cat, "Instead of counting all the stars in the sky, you could just count half of them and multiply the number by two. You just halved the problem there."
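The "problem halved" arithmetic can be made concrete. A minimal sketch, assuming the first stage (the graph method) labels a fraction `coverage` of all mail and passes the rest to a second filter; the 2% second-stage error rate below is a made-up number, not a figure from the article, and the function names are hypothetical.

```python
def overall_error(coverage: float, stage1_error: float, stage2_error: float) -> float:
    """Expected fraction of ALL mail misclassified by a two-stage pipeline."""
    return coverage * stage1_error + (1 - coverage) * stage2_error

def spam_reaching_stage2(spam_total: int, spam_fraction_classified: float) -> float:
    """Spam the second filter still has to deal with."""
    return spam_total * (1 - spam_fraction_classified)

# Article-style claim: half the mail classified, with no errors in that half.
print(overall_error(0.5, 0.0, 0.02))    # 0.01, given the hypothetical 2% stage-two error
# If spammers adapt so that none of their mail is classifiable, stage two sees all of it.
print(spam_reaching_stage2(1000, 0.0))  # 1000.0
```

The first number is the optimistic reading; the second is the worry above: halving the message volume does not halve the spam volume if the spam concentrates in the unclassifiable half.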

Bigger Issue... by glpierce · 2004-02-19 07:03 · Score: 3, Insightful

While this may work for teenagers, it has no use in the business world. In the last week, I've gotten two dozen vital emails from people I did not previously know (professors at various grad programs). In that period, I haven't gotten a single message from people I know (or who know someone I know), because I talk with friends face-to-face, over the phone, or through instant messages. This sort of filtering removes the most important reason for email to exist, which is replacing snail mail, not replacing conversations.

--
G

*Sigh* by NanoGator · 2004-02-19 07:07 · Score: 2, Insightful

All this work to stop spam, and ICQ's done it for years.

Frankly, a series of filters is probably the worst approach to stopping spam. It's a game of "make the filter, defeat the filter, and risk not getting important mail." Why bother? The solution lies in a different approach: authorization. There need to be authorization layers in order to defeat spam. We need buddy lists, we need blacklists, we need the ability to request authorization, etc.

I realize this problem isn't simple to fix, given the scale at which email is used. But man, I really wish somebody would figure out how to do the transition work. I'm almost completely reliant on ICQ and private messaging on forums in order to keep up with everybody.
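A sketch of the ICQ-style authorization model described above: known senders are delivered, blocked ones rejected, and everyone else is held until the recipient approves them. The class and method names are hypothetical. Over plain SMTP this only works if the sender identity can't be forged, which is the spoofing problem raised at the top of the thread.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set

class Decision(Enum):
    DELIVER = "deliver"
    REJECT = "reject"
    HOLD = "hold for authorization"

@dataclass
class AuthorizingInbox:
    allowed: Set[str] = field(default_factory=set)               # buddy list
    blocked: Set[str] = field(default_factory=set)               # blacklist
    pending: Dict[str, List[str]] = field(default_factory=dict)  # held messages per sender

    def receive(self, sender: str, message: str) -> Decision:
        if sender in self.blocked:
            return Decision.REJECT
        if sender in self.allowed:
            return Decision.DELIVER
        # Unknown sender: hold the message as an authorization request.
        self.pending.setdefault(sender, []).append(message)
        return Decision.HOLD

    def authorize(self, sender: str) -> List[str]:
        """Approve a sender and release anything held from them."""
        self.allowed.add(sender)
        return self.pending.pop(sender, [])

    def deny(self, sender: str) -> None:
        """Block a sender and discard anything held from them."""
        self.blocked.add(sender)
        self.allowed.discard(sender)
        self.pending.pop(sender, None)
```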

--
"Derp de derp."

Mailing lists / newsletters by blorg · 2004-02-19 07:15 · Score: 4, Insightful

A mailing list would have multiple folks in the To: line, which would be easy to spot automatically.

Not necessarily; indeed, most professional ones avoid this. Many spams do contain multiple people in the To: field, but many don't. Either way, I don't think this is relevant if we are trying to compare the graph of a mailing list to that of a spammer. To take an example, user slashdot-headlines@newsletters.osdn.com sends thousands of emails to people *who don't know each other*. User enlargeyourdong@hotmail.com has exactly the same pattern. How do you tell these apart?
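A sketch of why the To:-line heuristic quoted above is weak: it can only count the recipients a message chooses to show. Using Python's standard `email` package, a newsletter sent as one copy per subscriber exposes a single address, just like a one-to-one spam. The sample messages and the function name are hypothetical.

```python
from email import message_from_string
from email.utils import getaddresses
from typing import List

def visible_recipients(raw_message: str) -> List[str]:
    """Addresses listed in the To: and Cc: headers (not the real envelope)."""
    msg = message_from_string(raw_message)
    fields = msg.get_all("To", []) + msg.get_all("Cc", [])
    return [addr for _name, addr in getaddresses(fields) if addr]

newsletter = "From: headlines@example.com\nTo: reader@example.org\nSubject: Today\n\nbody\n"
group_mail = ("From: pat@example.com\nTo: a@example.org, b@example.org\n"
              "Cc: c@example.org\nSubject: Lunch\n\nbody\n")
print(len(visible_recipients(newsletter)))  # 1
print(len(visible_recipients(group_mail)))  # 3
```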

Re:Mailing lists / newsletters by The Dakota Kidd · 2004-02-19 08:48 · Score: 3, Insightful

According to the paper this article is based on, the algorithm is effective against messages with multiple recipients in the To: or Cc: headers. This means that messages coming from slashdot-headlines@newsletters.osdn.com would probably be in the unclassifiable half. Indeed, a good chunk of spam these days would be unclassifiable according to this algorithm.

However, the whitelist that this algorithm generates would still be valid. To me, this is the real strength of the algorithm, to be able to generate a white list with no input on my part.
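A rough sketch of header-graph classification of this kind, under loudly simplifying assumptions: each address in a message's From, To and Cc headers is a node, the sender is linked to every recipient, the mailbox owner's own address is dropped, and each sender is judged by its local clustering coefficient (the fraction of pairs of its neighbours that are themselves linked). The 0.5 threshold and the node-level decision rule are hypothetical; the paper's actual procedure may differ in detail. A sender seen only in one-to-one mail ends up with no neighbours and stays unclassifiable, in line with the "unclassifiable half" mentioned here.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Optional, Set, Tuple

Graph = Dict[str, Set[str]]

def build_graph(messages: Iterable[Tuple[str, List[str]]], owner: str) -> Graph:
    """messages: (sender, To+Cc recipients). Links sender to each recipient, skipping the owner."""
    graph: Graph = {}
    owner = owner.lower()
    for sender, recipients in messages:
        s = sender.lower()
        for recipient in recipients:
            r = recipient.lower()
            if owner in (s, r) or s == r:
                continue
            graph.setdefault(s, set()).add(r)
            graph.setdefault(r, set()).add(s)
    return graph

def clustering_coefficient(graph: Graph, node: str) -> Optional[float]:
    """2E / (k(k-1)) over the node's k neighbours; None when k < 2."""
    neighbours = graph.get(node.lower(), set())
    k = len(neighbours)
    if k < 2:
        return None
    links = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
    return 2 * links / (k * (k - 1))

def classify(graph: Graph, sender: str, threshold: float = 0.5) -> str:
    c = clustering_coefficient(graph, sender)
    if c is None:
        return "unclassifiable"
    if c == 0:
        return "spam"        # none of the sender's contacts are linked to each other
    return "whitelist" if c >= threshold else "unclassifiable"

me = "me@work.example"
mailbox = [
    ("alice@work.example", [me, "bob@work.example"]),
    ("bob@work.example", [me, "carol@work.example"]),
    ("carol@work.example", [me, "alice@work.example"]),
    ("bulk@spam.example", [me, "v1@a.example", "v2@b.example"]),
    ("lone@else.example", [me]),
    ("bulk2@spam.example", [me, "alice@work.example", "bob@work.example"]),
]
g = build_graph(mailbox, me)
for sender in ("alice@work.example", "bulk@spam.example", "lone@else.example", "bulk2@spam.example"):
    print(sender, classify(g, sender))
```

This run labels alice as whitelist, bulk as spam, lone as unclassifiable, and bulk2 as whitelist. The last one is the scenario crymeph0 raised earlier in the thread: in this simplified model, spam addressed to several coworkers who already mail each other looks like a friend.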

bcc to all! by Datoyminaytah · 2004-02-19 07:16 · Score: 3, Insightful

These people don't seem to realize how SMTP works. The RCPT command doesn't distinguish between types of recipients, it's up to the sending process to "play nice" and put that information in properly created headers.

A spammer could manipulate the To and CC headers as necessary to fool filters that analyze them, without affecting the ACTUAL list of email addresses to which the email is sent.
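The point about RCPT versus headers can be seen with Python's standard library: the header recipients live inside the message text, while the envelope recipients are a separate list handed to the SMTP client. A minimal sketch with hypothetical addresses and function name; nothing is actually sent.

```python
from email.message import EmailMessage
from typing import List, Tuple

def compose(sender: str, header_to: str, envelope_to: List[str]) -> Tuple[EmailMessage, List[str]]:
    """Build a message whose To: header and real delivery list differ."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = header_to          # what header-based filters will see
    msg["Subject"] = "Hello"
    msg.set_content("Hi there.")
    # With smtplib, delivery would be e.g.
    #   smtplib.SMTP(host).send_message(msg, from_addr=sender, to_addrs=envelope_to)
    # and each address in envelope_to becomes its own RCPT TO command.
    return msg, envelope_to

msg, envelope = compose("someone@example.com", "friend@example.org",
                        ["friend@example.org", "unlisted@example.net"])
print(msg["To"])                                  # friend@example.org
print("unlisted@example.net" in msg.as_string())  # False
```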

I don't think spam can be stopped without replacing or overhauling SMTP, and then ceasing to support "old" SMTP. But that ain't gonna happen anytime soon. (sigh)

--
assert(birth_date<time-86400)

Some of us rely on e-mail from strangers by beagle72 · 2004-02-19 07:25 · Score: 5, Insightful

The proposed anti-spam clustering technique is of course a variation on whitelisting. While clever, it fails to address a problem I have not often seen addressed. Many people defend themselves from spam by obscuring their e-mail addresses in public places, and perhaps by using whitelists to prefer known senders. This may be effective for many people.

However, some of us can't avoid having a publicly available e-mail address. For example, writers such as myself rely on feedback from readers who are, in nearly all cases, strangers (and sometimes strange, but that's another story...). Avoiding false positives from strangers is very important to me. I want their messages. But since my e-mail address is published frequently (hence no reason to hide it here), I obviously receive a ton of spam.

For the past few months I have experimented with a plug-in called BayesIt! for the Windows email reader The Bat!. As the name implies, it's a bayesian filter. The nice thing about BayesIt is that I could point it to my already-stuffed spam folder and train it on thousands of messages in one go. So far it has worked out rather well. No false positives, and only about 10-20 false negatives per day (out of approx. 400 spams).
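BayesIt!'s internals aren't described here, but training a Bayesian filter on an existing spam folder generally means counting tokens in known-good and known-spam mail and turning the counts into per-token spam probabilities. A sketch following the per-token formula Paul Graham published in "A Plan for Spam" (good counts doubled, tokens seen fewer than five times skipped, probabilities clamped to [0.01, 0.99]); the lowercasing tokenizer and the tiny corpus are simplifying assumptions, and this is not necessarily how BayesIt! works.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

_TOKEN = re.compile(r"[A-Za-z0-9'$-]+")   # letters, digits, apostrophes, dollar signs, dashes

def count_tokens(messages: Iterable[str]) -> Counter:
    counts: Counter = Counter()
    for text in messages:
        counts.update(token.lower() for token in _TOKEN.findall(text))
    return counts

def token_spam_probabilities(good: List[str], spam: List[str]) -> Dict[str, float]:
    good_counts, spam_counts = count_tokens(good), count_tokens(spam)
    ngood, nbad = max(len(good), 1), max(len(spam), 1)
    probs: Dict[str, float] = {}
    for token in set(good_counts) | set(spam_counts):
        g = 2 * good_counts[token]      # doubling biases the filter against false positives
        b = spam_counts[token]
        if g + b < 5:
            continue                    # too rare to judge
        p_spam = min(1.0, b / nbad)
        p_good = min(1.0, g / ngood)
        probs[token] = max(0.01, min(0.99, p_spam / (p_good + p_spam)))
    return probs

ham = ["meeting notes attached", "lunch meeting tomorrow", "notes from the meeting"]
junk = ["cheap prize now", "prize prize offer", "claim your prize prize"]
probs = token_spam_probabilities(ham, junk)
print(probs["prize"], probs["meeting"])   # 0.99 0.01
print("lunch" in probs)                   # False
```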

Still, in the long run I support proposals that shift the economics of e-mail in ways that have minimal impact on human beings while making spam unprofitable. Changing the economic model of spam is the only sure solution; relying solely on technology will simply keep us locked in an ongoing arms race.

-Aaron

Most newsletters are one-way by blorg · 2004-02-19 07:25 · Score: 4, Insightful

Easy - those thousands of people who don't know each other also send email *back* to the mailing list. Only a few dummies send email back to the spammers.

Most mailing lists and newsletters are one-way. I'm not talking about discussion lists or listservs, but rather about the bot that sends me Slashdot headlines, Jakob Nielsen's Alertbox, Fred Langa's newsletter, and even commercial speech that I am signed up to and want to hear, such as Komplett's weekly offers or Ryanair's cheap flights.
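The rebuttal can be checked with a simple reply count: for each sender, how many messages came in from it and how many the user ever sent back. A minimal sketch over a hypothetical log of `(from, to)` pairs; it shows a one-way newsletter and a spammer producing the same zero-reply pattern, while a discussion list does not.

```python
from typing import Iterable, Tuple

def reply_pattern(address: str, user: str, log: Iterable[Tuple[str, str]]) -> Tuple[int, int]:
    """Return (messages received from address, messages the user sent to address)."""
    received = sent = 0
    for sender, recipient in log:
        if sender == address and recipient == user:
            received += 1
        elif sender == user and recipient == address:
            sent += 1
    return received, sent

me = "me@example.org"
log = ([("headlines@news.example", me)] * 30      # one-way newsletter
       + [("offers@spam.example", me)] * 5         # spammer
       + [("list@discuss.example", me)] * 12       # discussion list...
       + [(me, "list@discuss.example")] * 3)       # ...that the user posts to
for sender in ("headlines@news.example", "offers@spam.example", "list@discuss.example"):
    print(sender, reply_pattern(sender, me, log))
```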

Re:How it works - clustering coefficients by gnu-generation-one · 2004-02-19 07:29 · Score: 2, Insightful

"My only question concerns how this would deal with mailing lists, which must appear to it like spam?"

Well, mailing lists are, by definition, identical to spam as far as an automated program looking at each message is concerned. Whenever there's a test of spam-filtering programs, the "false positives" are mailing lists that the tester forgot to tell the spam filter about.

It would be useful to have some way of publishing a list of the mailing lists that have permission to send you email; I'll leave it up to the "all you need is a system of public keys..." crowd to start shouting suggestions.

And for the people who'll suggest whitelisting based on the From field, don't forget that the spammers can easily put "bugtraq@securityfocus.com" as the sender.

Re:How it works - clustering coefficients by gatekeep · 2004-02-19 07:41 · Score: 2, Insightful

Whitelist on the from field, and enforce SPF.
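A toy evaluator for the simplest kind of SPF record, to show what "enforce SPF" checks: whether the connecting IP is one the sender's domain has published as allowed. It handles only `ip4:`, `ip6:` and `all` with their qualifiers; real SPF (RFC 7208) also fetches the record from DNS and supports `a`, `mx`, `include`, `redirect` and more. The function name is hypothetical. Note that SPF validates the envelope sender (MAIL FROM) domain, not the From: header, so a From:-based whitelist still needs something tying the two together.

```python
import ipaddress

_RESULTS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}

def spf_check(client_ip: str, record: str) -> str:
    """Evaluate a tiny subset of SPF against an already-fetched TXT record."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"                       # no SPF policy published
    ip = ipaddress.ip_address(client_ip)
    for term in terms[1:]:
        qualifier = "+"                     # default qualifier is "pass"
        if term[0] in _RESULTS:
            qualifier, term = term[0], term[1:]
        mechanism, _, value = term.partition(":")
        mechanism = mechanism.lower()
        if mechanism == "all":
            return _RESULTS[qualifier]
        if mechanism in ("ip4", "ip6"):
            network = ipaddress.ip_network(value, strict=False)
            if ip.version == network.version and ip in network:
                return _RESULTS[qualifier]
        # Other mechanisms and modifiers are ignored in this sketch.
    return "neutral"                        # nothing matched
```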

Re:How it works - clustering coefficients by edrugtrader · 2004-02-19 07:44 · Score: 2, Insightful

NO NO NO NO NO NO NO.

do not filter ANYTHING at the ISP level.

this is not a suggestion, it is a demand.

--
MARIJUANA, SHROOMS, X: ONLINE?! - E

Re:How it works - clustering coefficients by orthogonal · 2004-02-19 07:50 · Score: 4, Insightful

The system would be ideal for implementation at a fairly high level, (e.g. the ISP level) where systems can aggregate email headers across many different users in order to come up with meaningful graphs. The advantage it claims of no false positives means that it would be feasible at this level.

Yeah, but I'd consider a high-level analysis of my email headers (either sent or received) to be a violation of my privacy. Whether or not I'm mailing to kinky@alterate.life.styles.com, fringe.politcal.groups.require@free.speech.too.org , unpopular.opinions@free.thinkers.net, or falun.gong@is.banned.by.my.dictator.org, it should be nobody's business but my own.

Someone will undoubtedly argue that since headers are sent in the clear anyway, it shouldn't matter, but keeping a database of who mails what to whom only makes abuse -- by freelance busybodies or government spies and censors -- that much the easier.

This is a case, I think, where the threat inherent in the cure is worse than the disease.

--
Opinions on the Twiddler2 hand-held keyboard?

Re:How it works - clustering coefficients by orthogonal · 2004-02-19 08:06 · Score: 2, Insightful

Yeah, but I'd consider a high-level analysis of my email headers (either sent or received) to be a violation of my privacy

And in reply to myself. ;)...

Since the whole point of this is to build social-connection-webs, it's ideal for government crackdown via the guilt by association angle: not only can you find everybody who is emailing to dump.ashcroft@new.american.revolution.org, you can also find -- and investigate -- all the friends of the dissenter, too.

And for anyone who isn't worried that the FBI occasionally oversteps its bounds in investigating dissent, just consider that the social affinity networks of p2p traders could also be subpoenaed: we know Joe uploads mp3s, let's subpoena his email "buddy list" and investigate all those people too.

--
Opinions on the Twiddler2 hand-held keyboard?

Re:How it works - clustering coefficients by mdfst13 · 2004-02-19 08:11 · Score: 2, Insightful

As someone who used to sysadmin a mail server, I can tell you that this (saving info about who emailed whom) is already required. I forget what the limit was, but we were supposed to keep the mail logs (which carry the from/to info) for at least six months. We actually archived them to our write-only backup system on a regular basis. AFAIK, they stayed there forever (of course, it's anyone's guess whether or not we would have been able to retrieve them; our backup system had issues, thus the "write-only" tag).

This proposal does not involve collecting or saving new info. It involves *using* the existing info at a summary data level. Also, understand that it would be the *recipient's* ISP who would do this, not your ISP. This means that they could only collect info on what you send to email addresses on that server, not cross reference it with all the email that you send.

It's also worth noting that other ISP-level SPAM filters already process this info as well. This isn't a new concept. The new part is that it is trying to use the patterns *before* putting it in the receiver's mail box rather than after it is identified as SPAM by the receiver.

Re:Only 50%, but no false positives by ichimunki · 2004-02-19 09:00 · Score: 2, Insightful

The reason it's not giving you any false positives is that it's giving up on about half of the attempts. In my mind those are false negatives because they require additional effort (i.e. the filter errs on the side of accepting the mail)... and at a 50% rate that's not much help. I don't think I've ever seen a Bayesian filter that was allowed to just give up on 50% of all inputs... and if it was, I'd bet good money that it wouldn't generate any false positives either.

Paul Graham kind of got everybody thinking about statistical filtering techniques, but people haven't really picked apart his algorithm or looked at ways to tighten it up. Personally I think that path is a lot more promising.
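The point about abstention is easy to see in code. A sketch that combines per-token spam probabilities with the formula from Paul Graham's "A Plan for Spam" (using the 15 tokens farthest from 0.5, as he described), then adds a hypothetical "unsure" band; the 0.1/0.9 cut-offs are arbitrary. Widening the band lowers the error rate on whatever does get classified, at the cost of classifying less mail.

```python
from math import prod
from typing import Iterable

def combined_spam_probability(token_probs: Iterable[float], top_n: int = 15) -> float:
    """Expects probabilities strictly between 0 and 1 (Graham clamps to [0.01, 0.99])."""
    # Keep the most "interesting" tokens: those farthest from a neutral 0.5.
    ps = sorted(token_probs, key=lambda p: abs(p - 0.5), reverse=True)[:top_n]
    spam = prod(ps)
    ham = prod(1 - p for p in ps)
    return spam / (spam + ham)

def classify_message(token_probs: Iterable[float], low: float = 0.1, high: float = 0.9) -> str:
    score = combined_spam_probability(token_probs)
    if score >= high:
        return "spam"
    if score <= low:
        return "ham"
    return "unsure"   # the filter declines to decide, like the graph method's unclassifiable half

print(classify_message([0.99, 0.99, 0.2]))   # spam
print(classify_message([0.01, 0.01, 0.8]))   # ham
print(classify_message([0.99, 0.01]))        # unsure
```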

--
I do not have a signature

And let spammers kill Orkut? by grokster · 2004-02-19 21:34 · Score: 1, Insightful

How long till the spammers come up with a way of infiltrating Orkut, and inviting random people to be their friends?

30 of 326 comments