Domain: projecthoneypot.org
Stories and comments across the archive that link to projecthoneypot.org.
Stories · 9
-
Security Service Accidentally Makes Websites 60% Faster
EastDakota writes "CloudFlare was originally conceived by the team behind the open source community. Project Honey Pot as an easy way to protect any website from hackers and spammers. The concern from the beginning was that it would add latency. It was quite a surprise when the free service launched 8 months ago and ended up speeding up websites by 60%." -
Project Honey Pot Traps Billionth Spam
EastDakota writes "Project Honey Pot today announced that it had trapped its 1 billionth spammer. To celebrate, the team behind the largest community sourced project tracking online fraud and abuse released a full rundown of statistics on the last five years of spam. Findings include: spam drops 21% on Christmas Day and 32% of New Year's Day; the most spam is sent on Mondays, the least on Saturdays; spammers found at least 956 different ways to spell VIAGRA (e.g., VIAGRA, V1AGRA, V1@GR@, V!AGRA, VIA6RA, etc.) in mail received by the Project; and much more." -
Project Honey Pot Traps Billionth Spam
EastDakota writes "Project Honey Pot today announced that it had trapped its 1 billionth spammer. To celebrate, the team behind the largest community sourced project tracking online fraud and abuse released a full rundown of statistics on the last five years of spam. Findings include: spam drops 21% on Christmas Day and 32% of New Year's Day; the most spam is sent on Mondays, the least on Saturdays; spammers found at least 956 different ways to spell VIAGRA (e.g., VIAGRA, V1AGRA, V1@GR@, V!AGRA, VIA6RA, etc.) in mail received by the Project; and much more." -
Has Google Broken JavaScript Spam Munging?
Baxil writes "For years now, Javascript munging has been a useful tool to share email addresses on the Web without exposing them to spammers. However, Google is now apparently evaluating Javascript when assembling summary text for web pages' listings, and publishing the un-munged email addresses to the world; and spammers have started to take advantage of this kind service." Anyone else seen this affecting their carefully protected email addresses? -
Major Anti-Spam Lawsuit To Be Filed In VA
Rick Zeman sends us to the Washington Post, which is reporting that a John Doe lawsuit will be filed in US District Court today in spam-unfriendly Alexandria, Virginia. The suit will be filed by Project Honey Pot, which is having a week of big announcements. The suit seeks the identity of individuals responsible for harvesting millions of e-mail addresses on behalf of spammers. From the Post: "The company is filing the suit on behalf of some 20,000 people who use its anti-spam tool. Web site owners use the project's free software to generate pages that feature unique 'spam trap' e-mail addresses each time those pages are visited. The software then records the Internet address of the visitor and the date and time of the visit. Because those addresses are never used to sign up for e-mail lists, the software can help investigators draw connections between harvesters and spammers if an address generated by a spam trap or 'honey pot' later receives junk e-mail." -
Ending Spam
Shalendra Chhabra writes "Jonathan Zdziarski has been fighting spam since before the first MIT spam conference in 2003, and has now released a full-on technical book, Ending Spam, on spam filtering. Ending Spam covers how the current and near-future crop of heuristic and statistical filters actually work under the hood, and how you can most effectively use such filters to protect your inbox." Read on for the rest of Chhabra's review. Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification author Jonathan A. Zdziarski pages 312 publisher No Starch Press rating 8 reviewer Shalendra Chhabra ISBN 1593270526 summary Very Good Book Covering Statistical Models and Techniques Implemented in Current Spam Filters
Spam (unsolicited commercial email) and phishing (fraudulent emails) are causing losses of billions of dollars to businesses. Many initiatives are currently underway for fighting this challenge. On the legal front, a Virginia court recently sentenced a prolific spammer, Jeremy Jaynes, to nine years in prison, and a Nigerian court sentenced a woman to two and a half years for phishing. Michigan and Utah have both passed laws creating "do-not-contact" registries in July/August 2005, covering e-mail addresses, instant messaging addresses and telephone numbers. Technical initiatives to fight spam include server- or client-side spam filtering, using Lists (Blacklists, Whitelists, Greylists), Email Authentication Standards (IIM, DK, DKIM, SPF, SenderID), and emerging sender reputation and accreditation services.
Ending Spam is the first book explaining the fine details of the theoretical models and machine-learning algorithms implemented in these filters. The book is divided into three parts: introduction to spam filtering, fundamentals of statistical filtering, and advanced concepts of statistical filtering.
The first section of the book discusses the history of spam, spam kings, different approaches for fighting spam such as blacklisting, whitelisting, heuristic filtering, challenge response, throttling, collaborative filtering, Authenticated SMTP, Sender Policy Framework and SenderID, spammer fingerprinting, etc. However, the author omitted any mention of locally-sensitive hash functions (such as Nilsimsa Hash) to counter spammers' random insertion of words, the use of CAPTCHA (Completely Automated Public Turing Test to Tell Computers and Humans Apart), Greylisting, Identified Internet Mail, and Domain Keys (now Domain Keys Identified Mail).
In the next chapter, the author clearly explains various components of a Language Classifier Pipeline, including the Historical Dataset (aka wordlist, database, dictionary, filter memory), Tokenizer, and the Analysis Engine with its feedback loop. However, the process flow of a language classifier could have been more generalized, e.g. incorporating an initial text-to-text transformer. This chapter also covers the advantages and disadvantages of various training modes for filters, such as Train Everything (TEFT), Train-on-Error (TOE), and Train Until No Errors (TUNE). This part concludes with the description of Paul Graham's famous spam-filtering technique using Bayesian classification (as described in "A Plan for Spam"), Gary Robinson's Geometric Mean Test, Fisher-Robinsons Inverse Chi Square (including the source code for the inversion function), and some other tricks for optimizing spam- filtering accuracy.
The second part of this book deals with the fundamentals of statistical filtering. The author explains HTML and Base64 encoding, followed by a detailed description of tokenization techniques (e.g. Sparse Binary Polynomial Hashing). Then there's a discussion of the various tricks that spammers use for penetrating filters. Although these tactics are mentioned in John Graham-Cumming's "Spammers Compendium," Jonathan has very elegantly explained why some tricks work for spammers and some don't. This part concludes by addressing some of the resource, storage and scaling concerns raised by the large number of features generated from tokenization techniques.
The third part of this book deals with advanced concepts of statistical filtering. This includes the testing criteria for measuring accuracy of an email filter, and some advanced tokenization concepts, e.g. chained tokens (taking word-pairs and phrases into account, instead of individual words) generated using a sliding 5-byte window as mentioned in Sparse Binary Polynomial Hashing. The next chapter describes the Markovian Model implemented in the CRM114 Discriminator, but the author fails to describe different weighting schemes for features implemented in the Markovian-based version of CRM114. The author then describes the Bayesian Noise Reduction Technique for purging "out of context" data from the mail text. This chapter concludes with a very nice summary of collaborative algorithms and techniques, such as Message Innoculation, Streamlined Blackhole List, Fingerprinting, Automatic Whitelisting, URL Blacklisting, and Honeypot email addresses for snaring spammers' address harvesting bots.
The most interesting part of this book is the appendix, where the author presents interviews with John Graham-Cumming of POPFile, Brian Burton of SpamProbe, Marty Lamb of TarProxy, Bill Yerazunis of CRM114 Discriminator, and Jonathan Zdziarski of DSPAM (himself). I loved this section.
The salient points of the book: it's very easy to read; each chapter begins with a very thought-provoking introduction, and concludes with a crisp "final thoughts" section. The number of technical errors are very few in this print, and the illustrations are of good quality. Since the book is geared more toward the Bayesian and statistical generation of spam filters, the absence of certain spam-busting technologies is acceptable. However, a noticeable omission is the lack of discussion about measuring spam-filter accuracy, and what impact this has on setting filtration thresholds. A section on the economics of tradeoffs, and the use of a Receiver Operating Characteristic curve (ROC) would have been very helpful.
Overall, by putting together Ending Spam, Jonathan Zdziarski has made another significant contribution (after DSPAM) to the anti-spam community. Whether you are a system administrator, anti-spam researcher, engineer or a newbie interested in fighting spam, this book is a great reference.
William S Yerazunis and Richard Jowsey also contributed to this review. Shalendra Chhabra is a Graduate Student in Department of Computer Science and Engineering at University of California, Riverside. He is on the development team of CRM114 Discriminator and has presented his work at MIT Spam Conference 2005, Cisco Systems, and Stanford University. You can purchase Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page. -
The Spam Conference 2005
dos_dude writes "This year's Spam Conference is over. As usual, the MIT provides low and high bandwidth webcasts. The talks featured a full spectrum of anything possible. From absurd to sound, from boring to entertaining, and from dead-horse-beating to brand-new. Highlights: John Graham-Cumming presented the results of the survey he did with the help of many Slashdot readers, Jon Praed gave the details of the trial against spammer Jeremy Jaynes and friends, Brian McWilliams posed the question what will happen when all spam is finally filtered, and Matthew Prince plugged Project Honeypot in a very entertaining way. Shameless but useful plug: here's the final schedule with links to the webcasts." -
New Attacks on Spam
AttackOfTheDictionaries writes "Project Honey Pot started operating back in November. The Project provides its participants with a script that generates fake webpages with unique honeypot email addresses. The end result is that Project Honey Pot can connect email harvesters' IP addresses with the spam received by those honeypot email addresses. Which is pretty nifty, but left some people asking how that would help legal attacks on spam. Well, it seems that some lawyer over at SecurityFocus has an answer." -
Tech Reporter Pursues Spammer
girish writes "Technology reporter extrordinaire, Mike Wendland, is at it again tracking down spammers. Wendland conducted the infamous interview with Alan Ralsky, the alleged mega-spammer, a few years ago. That article spawned a lively discussion on Slashdot and eventually resulted in hundreds of pieces of junk postal mail flooding Ralsky's million-dollar home. Now Wendland is using a new tool from a service called Project Honey Pot to track email address harvesters. He posted on his technology blog this morning about catching a company that is holding itself out as a legitimate bulk mailer, but appears in fact to be sending to harvested addresses and conducting on the side some other seemingly seedy businesses. Interesting stuff."