Domain: ucr.edu
Stories and comments across the archive that link to ucr.edu.
Stories · 30
-
Researchers Bypass ASLR Protection On Intel Haswell CPUs (softpedia.com)
An anonymous reader writes: "A team of scientists from two U.S. universities has devised a method of bypassing ASLR (Address Space Layout Randomization) protection by taking advantage of the BTB (Branch Target Buffer), a component included in many modern CPU architectures, including Intel Haswell CPUs, the processor they used for tests in their research," reports Softpedia. The researchers discovered that by blasting the BTB with random data, they could run a successful collision attack that reveals the memory locations where apps execute code in the computer's memory -- the very thing that ASLR protection was meant to hide. While during their tests they used a Linux PC with a Intel Haswell CPU, researchers said the attack can be ported to other CPU architectures and operating systems where ASLR is deployed, such as Android, iOS, macOS, and Windows. From start to finish, the collision attack only takes 60 milliseconds, meaning it can be embedded with malware or any other digital forensics tool and run without needing hours of intense CPU processing. You can read the research paper, titled "Jump Over ASLR: Attacking Branch Predictors to Bypass ASLR," here. -
New Device Could Greatly Improve Speech and Image Recognition
jan_jes writes: Scientists have successfully demonstrated pattern recognition using a magnonic holographic memory device, a development that could greatly improve speech and image recognition hardware. The researchers built a prototype eight-terminal device consisting of a magnetic matrix with micro-antennas to excite and detect the spin waves. The micro-antennas allow the researchers to generate and recognize any input phase pattern, a big advantage over existing practices. It takes about 100 nanoseconds for recognition, which is the time required for spin waves to propagate and to create the interference pattern. The main challenge associated with magnonic holographic memory is the scaling of the operational wavelength, which requires the development of sub-micrometer scale elements for spin wave generation and detection. -
Graphene: Reversible Method of Magnetic Doping Paves Way For Semiconductor Use
concertina226 writes: A team of physicists at University of California, Riverside have discovered how to induce magnetism in graphene in a way that still preserves the material's electronic properties, which paves the way for graphene to be used as a semiconductor.
The researchers grew a sheet of yttrium iron garnet using laser molecular beam epitaxy in a laboratory (abstract). Magnetic substances like iron are known to disrupt graphene's electrical conduction properties, but yttrium iron garnet works well as it is an electric insulator.
When a graphene sheet was placed on top of an atomically smooth sheet of yttrium iron garnet, the graphene borrowed the magnetic properties from the yttrium iron garnet and became magnetized without the need for doping. -
Researchers Hack Gmail With 92 Percent Success Rate
SternisheFan sends this report from CNET: Researchers at the University of California Riverside Bourns College of Engineering and the University of Michigan have identified a weakness they believe to exist across Android, Windows, and iOS operating systems that could allow malicious apps to obtain personal information. Although it was tested only on an Android phone, the team believes that the method could be used across all three operating systems because all three share a similar feature: all apps can access a mobile device's shared memory. "The assumption has always been that these apps can't interfere with each other easily," said Zhiyun Qian, an associate professor at UC Riverside. "We show that assumption is not correct and one app can in fact significantly impact another and result in harmful consequences for the user." To demonstrate the method of attack, first a user must download an app that appears benign, such as a wallpaper, but actually contains malicious code. Once installed, the researchers can use it to access the shared memory statistics of any process (PDF), which doesn't require any special privileges. -
Researchers Hack Gmail With 92 Percent Success Rate
SternisheFan sends this report from CNET: Researchers at the University of California Riverside Bourns College of Engineering and the University of Michigan have identified a weakness they believe to exist across Android, Windows, and iOS operating systems that could allow malicious apps to obtain personal information. Although it was tested only on an Android phone, the team believes that the method could be used across all three operating systems because all three share a similar feature: all apps can access a mobile device's shared memory. "The assumption has always been that these apps can't interfere with each other easily," said Zhiyun Qian, an associate professor at UC Riverside. "We show that assumption is not correct and one app can in fact significantly impact another and result in harmful consequences for the user." To demonstrate the method of attack, first a user must download an app that appears benign, such as a wallpaper, but actually contains malicious code. Once installed, the researchers can use it to access the shared memory statistics of any process (PDF), which doesn't require any special privileges. -
Graphene Could Be Dangerous To Humans and the Environment
Zothecula (1870348) writes "It's easy to get carried away when you start talking about graphene. Its properties hold the promise of outright technological revolution in so many fields that it has been called a wonder material. Two recent studies, however, give us a less than rosy angle. In the first, a team of biologists, engineers and material scientists at Brown University examined graphene's potential toxicity in human cells. Another study by a team from University of California, Riverside's Bourns College of Engineering examined how graphene oxide nanoparticles might interact with the environment if they found their way into surface or ground water sources." -
How Would an Astronaut Falling Into a Black Hole Die?
ananyo writes "According to the accepted account, an astronaut falling into a black hole would be ripped apart, and his remnants crushed as they plunged into the black hole's infinitely dense core. Calculations by Joseph Polchinski, a string theorist at the Kavli Institute for Theoretical Physics in Santa Barbara, California, though, point to a different end: quantum effects turn the event horizon into a seething maelstrom of particles and anyone who fell in would hit a wall of fire and be burned to a crisp in an instant. There's one problem with the firewall theory. If Polchinski is right, then either general relativity or quantum mechanics is wrong and his work has triggered a mini-crisis in theoretical physics." -
Replicating Hardest Known Biomaterial Could Improve Solar Cells and Batteries
cylonlover writes "Inspired by the tough teeth of a marine snail and the remarkable process by which they form, assistant professor David Kisailus at the University of California, Riverside is working toward building cheaper, more efficient nanomaterials. By achieving greater control over the low-temperature growth of nanocrystals (abstract), his research could improve the performance of solar cells and lithium-ion batteries, lead to higher-performance materials for car and airplane frames, and help develop abrasion-resistant materials that could be used for anything from specialized clothing to dental drills." -
University Receives $5 Million Grant To Study Immortality
Hugh Pickens writes "Humans have pondered their mortality for millennia. Now the University of California at Riverside reports that it has received a $5 million grant from the John Templeton Foundation that will fund research on aspects of immortality, including near-death experiences and the impact of belief in an afterlife on human behavior. 'People have been thinking about immortality throughout history. We have a deep human need to figure out what happens to us after death,' says John Martin Fischer, the principal investigator of The Immortality Project. 'No one has taken a comprehensive and sustained look at immortality that brings together the science, theology and philosophy.' Fischer says he going to investigate two different kinds of immortality. One is the possibility of living forever without dying. The main questions there are whether it's technologically plausible or feasible for us, either by biological enhancement such as those described by Ray Kurzweil, or by some combination of biological enhancement and uploading our minds onto computers in the future. Second would be to investigate the full range of questions about Judeo, Christian, Hindu, Buddhist, and other Asian religions' conceptions of the afterlife to see if they're theologically and philosophically consistent. 'We'll look at near death experiences both in western cultures and throughout the world and really look at what they're all about and ask the question — do they indicate something about an afterlife or are they kind of just illusions that we're hardwired into?'" -
Face Recognition Maps History Via Art
mikejuk writes "Face recognition techniques usually come with a certain amount of controversy. A new application, however, is unlikely to trigger any privacy concerns — because all of the subjects are long dead. 'FACES: Faces, Art, and Computerized Evaluation Systems' will attempt to apply face recognition software to portraits. Three University of California, Riverside researchers have just received funding to try and piece together the who's who in history. 'Almost every portrait painted before the 19th century was of a person of some importance. As families fell on hard times, many of these portraits were sold and the identities of these subjects were lost. The question we hope to answer is, can we restore these identities?' If the algorithm can be fine tuned we can look forward to the digitized collections of museums and art galleries around the world suddenly yielding a who-knew-who social network graph that could put more science, and computer science at that, into history." -
Two Dogs To Present Paper At WSEAS Conference
An anonymous reader writes "In reality, it's yet another randomly-generated paper that got into a conference with no meaningful criteria for peer review. In the world of humor, two dogs may be up for tenure soon." -
Walking Molecule Now Carries Packages
Roland Piquepaille writes "Chemists from the University of California at Riverside designed two years ago a molecule which could move straight on a flat surface — a nano-walker if you wish. Now, they've found a way to force this walking molecule to carry packages. The nano-worker can now carry two CO2 molecules. And like yourself when you carry two heavy bags, this nano-worker is slower when it carries other molecules. The researchers think their discovery will lead to reliable ways of carrying molecules, an equivalent of the conveyor belts in today's factories." -
Ending Spam
Shalendra Chhabra writes "Jonathan Zdziarski has been fighting spam since before the first MIT spam conference in 2003, and has now released a full-on technical book, Ending Spam, on spam filtering. Ending Spam covers how the current and near-future crop of heuristic and statistical filters actually work under the hood, and how you can most effectively use such filters to protect your inbox." Read on for the rest of Chhabra's review. Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification author Jonathan A. Zdziarski pages 312 publisher No Starch Press rating 8 reviewer Shalendra Chhabra ISBN 1593270526 summary Very Good Book Covering Statistical Models and Techniques Implemented in Current Spam Filters
Spam (unsolicited commercial email) and phishing (fraudulent emails) are causing losses of billions of dollars to businesses. Many initiatives are currently underway for fighting this challenge. On the legal front, a Virginia court recently sentenced a prolific spammer, Jeremy Jaynes, to nine years in prison, and a Nigerian court sentenced a woman to two and a half years for phishing. Michigan and Utah have both passed laws creating "do-not-contact" registries in July/August 2005, covering e-mail addresses, instant messaging addresses and telephone numbers. Technical initiatives to fight spam include server- or client-side spam filtering, using Lists (Blacklists, Whitelists, Greylists), Email Authentication Standards (IIM, DK, DKIM, SPF, SenderID), and emerging sender reputation and accreditation services.
Ending Spam is the first book explaining the fine details of the theoretical models and machine-learning algorithms implemented in these filters. The book is divided into three parts: introduction to spam filtering, fundamentals of statistical filtering, and advanced concepts of statistical filtering.
The first section of the book discusses the history of spam, spam kings, different approaches for fighting spam such as blacklisting, whitelisting, heuristic filtering, challenge response, throttling, collaborative filtering, Authenticated SMTP, Sender Policy Framework and SenderID, spammer fingerprinting, etc. However, the author omitted any mention of locally-sensitive hash functions (such as Nilsimsa Hash) to counter spammers' random insertion of words, the use of CAPTCHA (Completely Automated Public Turing Test to Tell Computers and Humans Apart), Greylisting, Identified Internet Mail, and Domain Keys (now Domain Keys Identified Mail).
In the next chapter, the author clearly explains various components of a Language Classifier Pipeline, including the Historical Dataset (aka wordlist, database, dictionary, filter memory), Tokenizer, and the Analysis Engine with its feedback loop. However, the process flow of a language classifier could have been more generalized, e.g. incorporating an initial text-to-text transformer. This chapter also covers the advantages and disadvantages of various training modes for filters, such as Train Everything (TEFT), Train-on-Error (TOE), and Train Until No Errors (TUNE). This part concludes with the description of Paul Graham's famous spam-filtering technique using Bayesian classification (as described in "A Plan for Spam"), Gary Robinson's Geometric Mean Test, Fisher-Robinsons Inverse Chi Square (including the source code for the inversion function), and some other tricks for optimizing spam- filtering accuracy.
The second part of this book deals with the fundamentals of statistical filtering. The author explains HTML and Base64 encoding, followed by a detailed description of tokenization techniques (e.g. Sparse Binary Polynomial Hashing). Then there's a discussion of the various tricks that spammers use for penetrating filters. Although these tactics are mentioned in John Graham-Cumming's "Spammers Compendium," Jonathan has very elegantly explained why some tricks work for spammers and some don't. This part concludes by addressing some of the resource, storage and scaling concerns raised by the large number of features generated from tokenization techniques.
The third part of this book deals with advanced concepts of statistical filtering. This includes the testing criteria for measuring accuracy of an email filter, and some advanced tokenization concepts, e.g. chained tokens (taking word-pairs and phrases into account, instead of individual words) generated using a sliding 5-byte window as mentioned in Sparse Binary Polynomial Hashing. The next chapter describes the Markovian Model implemented in the CRM114 Discriminator, but the author fails to describe different weighting schemes for features implemented in the Markovian-based version of CRM114. The author then describes the Bayesian Noise Reduction Technique for purging "out of context" data from the mail text. This chapter concludes with a very nice summary of collaborative algorithms and techniques, such as Message Innoculation, Streamlined Blackhole List, Fingerprinting, Automatic Whitelisting, URL Blacklisting, and Honeypot email addresses for snaring spammers' address harvesting bots.
The most interesting part of this book is the appendix, where the author presents interviews with John Graham-Cumming of POPFile, Brian Burton of SpamProbe, Marty Lamb of TarProxy, Bill Yerazunis of CRM114 Discriminator, and Jonathan Zdziarski of DSPAM (himself). I loved this section.
The salient points of the book: it's very easy to read; each chapter begins with a very thought-provoking introduction, and concludes with a crisp "final thoughts" section. The number of technical errors are very few in this print, and the illustrations are of good quality. Since the book is geared more toward the Bayesian and statistical generation of spam filters, the absence of certain spam-busting technologies is acceptable. However, a noticeable omission is the lack of discussion about measuring spam-filter accuracy, and what impact this has on setting filtration thresholds. A section on the economics of tradeoffs, and the use of a Receiver Operating Characteristic curve (ROC) would have been very helpful.
Overall, by putting together Ending Spam, Jonathan Zdziarski has made another significant contribution (after DSPAM) to the anti-spam community. Whether you are a system administrator, anti-spam researcher, engineer or a newbie interested in fighting spam, this book is a great reference.
William S Yerazunis and Richard Jowsey also contributed to this review. Shalendra Chhabra is a Graduate Student in Department of Computer Science and Engineering at University of California, Riverside. He is on the development team of CRM114 Discriminator and has presented his work at MIT Spam Conference 2005, Cisco Systems, and Stanford University. You can purchase Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page. -
Ending Spam
Shalendra Chhabra writes "Jonathan Zdziarski has been fighting spam since before the first MIT spam conference in 2003, and has now released a full-on technical book, Ending Spam, on spam filtering. Ending Spam covers how the current and near-future crop of heuristic and statistical filters actually work under the hood, and how you can most effectively use such filters to protect your inbox." Read on for the rest of Chhabra's review. Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification author Jonathan A. Zdziarski pages 312 publisher No Starch Press rating 8 reviewer Shalendra Chhabra ISBN 1593270526 summary Very Good Book Covering Statistical Models and Techniques Implemented in Current Spam Filters
Spam (unsolicited commercial email) and phishing (fraudulent emails) are causing losses of billions of dollars to businesses. Many initiatives are currently underway for fighting this challenge. On the legal front, a Virginia court recently sentenced a prolific spammer, Jeremy Jaynes, to nine years in prison, and a Nigerian court sentenced a woman to two and a half years for phishing. Michigan and Utah have both passed laws creating "do-not-contact" registries in July/August 2005, covering e-mail addresses, instant messaging addresses and telephone numbers. Technical initiatives to fight spam include server- or client-side spam filtering, using Lists (Blacklists, Whitelists, Greylists), Email Authentication Standards (IIM, DK, DKIM, SPF, SenderID), and emerging sender reputation and accreditation services.
Ending Spam is the first book explaining the fine details of the theoretical models and machine-learning algorithms implemented in these filters. The book is divided into three parts: introduction to spam filtering, fundamentals of statistical filtering, and advanced concepts of statistical filtering.
The first section of the book discusses the history of spam, spam kings, different approaches for fighting spam such as blacklisting, whitelisting, heuristic filtering, challenge response, throttling, collaborative filtering, Authenticated SMTP, Sender Policy Framework and SenderID, spammer fingerprinting, etc. However, the author omitted any mention of locally-sensitive hash functions (such as Nilsimsa Hash) to counter spammers' random insertion of words, the use of CAPTCHA (Completely Automated Public Turing Test to Tell Computers and Humans Apart), Greylisting, Identified Internet Mail, and Domain Keys (now Domain Keys Identified Mail).
In the next chapter, the author clearly explains various components of a Language Classifier Pipeline, including the Historical Dataset (aka wordlist, database, dictionary, filter memory), Tokenizer, and the Analysis Engine with its feedback loop. However, the process flow of a language classifier could have been more generalized, e.g. incorporating an initial text-to-text transformer. This chapter also covers the advantages and disadvantages of various training modes for filters, such as Train Everything (TEFT), Train-on-Error (TOE), and Train Until No Errors (TUNE). This part concludes with the description of Paul Graham's famous spam-filtering technique using Bayesian classification (as described in "A Plan for Spam"), Gary Robinson's Geometric Mean Test, Fisher-Robinsons Inverse Chi Square (including the source code for the inversion function), and some other tricks for optimizing spam- filtering accuracy.
The second part of this book deals with the fundamentals of statistical filtering. The author explains HTML and Base64 encoding, followed by a detailed description of tokenization techniques (e.g. Sparse Binary Polynomial Hashing). Then there's a discussion of the various tricks that spammers use for penetrating filters. Although these tactics are mentioned in John Graham-Cumming's "Spammers Compendium," Jonathan has very elegantly explained why some tricks work for spammers and some don't. This part concludes by addressing some of the resource, storage and scaling concerns raised by the large number of features generated from tokenization techniques.
The third part of this book deals with advanced concepts of statistical filtering. This includes the testing criteria for measuring accuracy of an email filter, and some advanced tokenization concepts, e.g. chained tokens (taking word-pairs and phrases into account, instead of individual words) generated using a sliding 5-byte window as mentioned in Sparse Binary Polynomial Hashing. The next chapter describes the Markovian Model implemented in the CRM114 Discriminator, but the author fails to describe different weighting schemes for features implemented in the Markovian-based version of CRM114. The author then describes the Bayesian Noise Reduction Technique for purging "out of context" data from the mail text. This chapter concludes with a very nice summary of collaborative algorithms and techniques, such as Message Innoculation, Streamlined Blackhole List, Fingerprinting, Automatic Whitelisting, URL Blacklisting, and Honeypot email addresses for snaring spammers' address harvesting bots.
The most interesting part of this book is the appendix, where the author presents interviews with John Graham-Cumming of POPFile, Brian Burton of SpamProbe, Marty Lamb of TarProxy, Bill Yerazunis of CRM114 Discriminator, and Jonathan Zdziarski of DSPAM (himself). I loved this section.
The salient points of the book: it's very easy to read; each chapter begins with a very thought-provoking introduction, and concludes with a crisp "final thoughts" section. The number of technical errors are very few in this print, and the illustrations are of good quality. Since the book is geared more toward the Bayesian and statistical generation of spam filters, the absence of certain spam-busting technologies is acceptable. However, a noticeable omission is the lack of discussion about measuring spam-filter accuracy, and what impact this has on setting filtration thresholds. A section on the economics of tradeoffs, and the use of a Receiver Operating Characteristic curve (ROC) would have been very helpful.
Overall, by putting together Ending Spam, Jonathan Zdziarski has made another significant contribution (after DSPAM) to the anti-spam community. Whether you are a system administrator, anti-spam researcher, engineer or a newbie interested in fighting spam, this book is a great reference.
William S Yerazunis and Richard Jowsey also contributed to this review. Shalendra Chhabra is a Graduate Student in Department of Computer Science and Engineering at University of California, Riverside. He is on the development team of CRM114 Discriminator and has presented his work at MIT Spam Conference 2005, Cisco Systems, and Stanford University. You can purchase Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page. -
Ending Spam
Shalendra Chhabra writes "Jonathan Zdziarski has been fighting spam since before the first MIT spam conference in 2003, and has now released a full-on technical book, Ending Spam, on spam filtering. Ending Spam covers how the current and near-future crop of heuristic and statistical filters actually work under the hood, and how you can most effectively use such filters to protect your inbox." Read on for the rest of Chhabra's review. Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification author Jonathan A. Zdziarski pages 312 publisher No Starch Press rating 8 reviewer Shalendra Chhabra ISBN 1593270526 summary Very Good Book Covering Statistical Models and Techniques Implemented in Current Spam Filters
Spam (unsolicited commercial email) and phishing (fraudulent emails) are causing losses of billions of dollars to businesses. Many initiatives are currently underway for fighting this challenge. On the legal front, a Virginia court recently sentenced a prolific spammer, Jeremy Jaynes, to nine years in prison, and a Nigerian court sentenced a woman to two and a half years for phishing. Michigan and Utah have both passed laws creating "do-not-contact" registries in July/August 2005, covering e-mail addresses, instant messaging addresses and telephone numbers. Technical initiatives to fight spam include server- or client-side spam filtering, using Lists (Blacklists, Whitelists, Greylists), Email Authentication Standards (IIM, DK, DKIM, SPF, SenderID), and emerging sender reputation and accreditation services.
Ending Spam is the first book explaining the fine details of the theoretical models and machine-learning algorithms implemented in these filters. The book is divided into three parts: introduction to spam filtering, fundamentals of statistical filtering, and advanced concepts of statistical filtering.
The first section of the book discusses the history of spam, spam kings, different approaches for fighting spam such as blacklisting, whitelisting, heuristic filtering, challenge response, throttling, collaborative filtering, Authenticated SMTP, Sender Policy Framework and SenderID, spammer fingerprinting, etc. However, the author omitted any mention of locally-sensitive hash functions (such as Nilsimsa Hash) to counter spammers' random insertion of words, the use of CAPTCHA (Completely Automated Public Turing Test to Tell Computers and Humans Apart), Greylisting, Identified Internet Mail, and Domain Keys (now Domain Keys Identified Mail).
In the next chapter, the author clearly explains various components of a Language Classifier Pipeline, including the Historical Dataset (aka wordlist, database, dictionary, filter memory), Tokenizer, and the Analysis Engine with its feedback loop. However, the process flow of a language classifier could have been more generalized, e.g. incorporating an initial text-to-text transformer. This chapter also covers the advantages and disadvantages of various training modes for filters, such as Train Everything (TEFT), Train-on-Error (TOE), and Train Until No Errors (TUNE). This part concludes with the description of Paul Graham's famous spam-filtering technique using Bayesian classification (as described in "A Plan for Spam"), Gary Robinson's Geometric Mean Test, Fisher-Robinsons Inverse Chi Square (including the source code for the inversion function), and some other tricks for optimizing spam- filtering accuracy.
The second part of this book deals with the fundamentals of statistical filtering. The author explains HTML and Base64 encoding, followed by a detailed description of tokenization techniques (e.g. Sparse Binary Polynomial Hashing). Then there's a discussion of the various tricks that spammers use for penetrating filters. Although these tactics are mentioned in John Graham-Cumming's "Spammers Compendium," Jonathan has very elegantly explained why some tricks work for spammers and some don't. This part concludes by addressing some of the resource, storage and scaling concerns raised by the large number of features generated from tokenization techniques.
The third part of this book deals with advanced concepts of statistical filtering. This includes the testing criteria for measuring accuracy of an email filter, and some advanced tokenization concepts, e.g. chained tokens (taking word-pairs and phrases into account, instead of individual words) generated using a sliding 5-byte window as mentioned in Sparse Binary Polynomial Hashing. The next chapter describes the Markovian Model implemented in the CRM114 Discriminator, but the author fails to describe different weighting schemes for features implemented in the Markovian-based version of CRM114. The author then describes the Bayesian Noise Reduction Technique for purging "out of context" data from the mail text. This chapter concludes with a very nice summary of collaborative algorithms and techniques, such as Message Innoculation, Streamlined Blackhole List, Fingerprinting, Automatic Whitelisting, URL Blacklisting, and Honeypot email addresses for snaring spammers' address harvesting bots.
The most interesting part of this book is the appendix, where the author presents interviews with John Graham-Cumming of POPFile, Brian Burton of SpamProbe, Marty Lamb of TarProxy, Bill Yerazunis of CRM114 Discriminator, and Jonathan Zdziarski of DSPAM (himself). I loved this section.
The salient points of the book: it's very easy to read; each chapter begins with a very thought-provoking introduction, and concludes with a crisp "final thoughts" section. The number of technical errors are very few in this print, and the illustrations are of good quality. Since the book is geared more toward the Bayesian and statistical generation of spam filters, the absence of certain spam-busting technologies is acceptable. However, a noticeable omission is the lack of discussion about measuring spam-filter accuracy, and what impact this has on setting filtration thresholds. A section on the economics of tradeoffs, and the use of a Receiver Operating Characteristic curve (ROC) would have been very helpful.
Overall, by putting together Ending Spam, Jonathan Zdziarski has made another significant contribution (after DSPAM) to the anti-spam community. Whether you are a system administrator, anti-spam researcher, engineer or a newbie interested in fighting spam, this book is a great reference.
William S Yerazunis and Richard Jowsey also contributed to this review. Shalendra Chhabra is a Graduate Student in Department of Computer Science and Engineering at University of California, Riverside. He is on the development team of CRM114 Discriminator and has presented his work at MIT Spam Conference 2005, Cisco Systems, and Stanford University. You can purchase Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page. -
Ending Spam
Shalendra Chhabra writes "Jonathan Zdziarski has been fighting spam since before the first MIT spam conference in 2003, and has now released a full-on technical book, Ending Spam, on spam filtering. Ending Spam covers how the current and near-future crop of heuristic and statistical filters actually work under the hood, and how you can most effectively use such filters to protect your inbox." Read on for the rest of Chhabra's review. Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification author Jonathan A. Zdziarski pages 312 publisher No Starch Press rating 8 reviewer Shalendra Chhabra ISBN 1593270526 summary Very Good Book Covering Statistical Models and Techniques Implemented in Current Spam Filters
Spam (unsolicited commercial email) and phishing (fraudulent emails) are causing losses of billions of dollars to businesses. Many initiatives are currently underway for fighting this challenge. On the legal front, a Virginia court recently sentenced a prolific spammer, Jeremy Jaynes, to nine years in prison, and a Nigerian court sentenced a woman to two and a half years for phishing. Michigan and Utah have both passed laws creating "do-not-contact" registries in July/August 2005, covering e-mail addresses, instant messaging addresses and telephone numbers. Technical initiatives to fight spam include server- or client-side spam filtering, using Lists (Blacklists, Whitelists, Greylists), Email Authentication Standards (IIM, DK, DKIM, SPF, SenderID), and emerging sender reputation and accreditation services.
Ending Spam is the first book explaining the fine details of the theoretical models and machine-learning algorithms implemented in these filters. The book is divided into three parts: introduction to spam filtering, fundamentals of statistical filtering, and advanced concepts of statistical filtering.
The first section of the book discusses the history of spam, spam kings, different approaches for fighting spam such as blacklisting, whitelisting, heuristic filtering, challenge response, throttling, collaborative filtering, Authenticated SMTP, Sender Policy Framework and SenderID, spammer fingerprinting, etc. However, the author omitted any mention of locally-sensitive hash functions (such as Nilsimsa Hash) to counter spammers' random insertion of words, the use of CAPTCHA (Completely Automated Public Turing Test to Tell Computers and Humans Apart), Greylisting, Identified Internet Mail, and Domain Keys (now Domain Keys Identified Mail).
In the next chapter, the author clearly explains various components of a Language Classifier Pipeline, including the Historical Dataset (aka wordlist, database, dictionary, filter memory), Tokenizer, and the Analysis Engine with its feedback loop. However, the process flow of a language classifier could have been more generalized, e.g. incorporating an initial text-to-text transformer. This chapter also covers the advantages and disadvantages of various training modes for filters, such as Train Everything (TEFT), Train-on-Error (TOE), and Train Until No Errors (TUNE). This part concludes with the description of Paul Graham's famous spam-filtering technique using Bayesian classification (as described in "A Plan for Spam"), Gary Robinson's Geometric Mean Test, Fisher-Robinsons Inverse Chi Square (including the source code for the inversion function), and some other tricks for optimizing spam- filtering accuracy.
The second part of this book deals with the fundamentals of statistical filtering. The author explains HTML and Base64 encoding, followed by a detailed description of tokenization techniques (e.g. Sparse Binary Polynomial Hashing). Then there's a discussion of the various tricks that spammers use for penetrating filters. Although these tactics are mentioned in John Graham-Cumming's "Spammers Compendium," Jonathan has very elegantly explained why some tricks work for spammers and some don't. This part concludes by addressing some of the resource, storage and scaling concerns raised by the large number of features generated from tokenization techniques.
The third part of this book deals with advanced concepts of statistical filtering. This includes the testing criteria for measuring accuracy of an email filter, and some advanced tokenization concepts, e.g. chained tokens (taking word-pairs and phrases into account, instead of individual words) generated using a sliding 5-byte window as mentioned in Sparse Binary Polynomial Hashing. The next chapter describes the Markovian Model implemented in the CRM114 Discriminator, but the author fails to describe different weighting schemes for features implemented in the Markovian-based version of CRM114. The author then describes the Bayesian Noise Reduction Technique for purging "out of context" data from the mail text. This chapter concludes with a very nice summary of collaborative algorithms and techniques, such as Message Innoculation, Streamlined Blackhole List, Fingerprinting, Automatic Whitelisting, URL Blacklisting, and Honeypot email addresses for snaring spammers' address harvesting bots.
The most interesting part of this book is the appendix, where the author presents interviews with John Graham-Cumming of POPFile, Brian Burton of SpamProbe, Marty Lamb of TarProxy, Bill Yerazunis of CRM114 Discriminator, and Jonathan Zdziarski of DSPAM (himself). I loved this section.
The salient points of the book: it's very easy to read; each chapter begins with a very thought-provoking introduction, and concludes with a crisp "final thoughts" section. The number of technical errors are very few in this print, and the illustrations are of good quality. Since the book is geared more toward the Bayesian and statistical generation of spam filters, the absence of certain spam-busting technologies is acceptable. However, a noticeable omission is the lack of discussion about measuring spam-filter accuracy, and what impact this has on setting filtration thresholds. A section on the economics of tradeoffs, and the use of a Receiver Operating Characteristic curve (ROC) would have been very helpful.
Overall, by putting together Ending Spam, Jonathan Zdziarski has made another significant contribution (after DSPAM) to the anti-spam community. Whether you are a system administrator, anti-spam researcher, engineer or a newbie interested in fighting spam, this book is a great reference.
William S Yerazunis and Richard Jowsey also contributed to this review. Shalendra Chhabra is a Graduate Student in Department of Computer Science and Engineering at University of California, Riverside. He is on the development team of CRM114 Discriminator and has presented his work at MIT Spam Conference 2005, Cisco Systems, and Stanford University. You can purchase Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page. -
Microsoft Researchers on Stopping Spam
TheBackBencher writes "Scientific American today has a very interesting article about "Stopping Spam" by Joshua Goodman, David Hackerman and Robert Rounthwaite from Microsoft Research. They talk about different types of spam -- spam with emails, spam on IMs, spamlinks on web pages and image based spam. They mention different techniques for spam filtering mainly fingerprinting matching techniques, n grams model, naive bayesian approach, optical character recognition, challenge/response systems and Human Interacted Proofs (HIP) in a very lucid style. They however do not mention fingerprinting approach of using Nilsimsa Hash to tackle addition of random words by spammers in emails or hypertextus interruptus technique used by spammers of splitting words using HTML comments, pairs of zero width tags, or bogus tags. Also, Spam-Research is reporting the SplitFit Technique that Spammers are using to fool Yahoo! Mail SpamGuard." -
Four Inducted Into SF Hall of Fame
maxentius writes "There are four new members of the Science Fiction Hall of Fame: Chesley Bonestell, Philip K. Dick, Ray Harryhausen, and Steven Spielberg. The Hall, once located in Lawrence, Kansas, is now a part of the Science Fiction Museum in Seattle. This brings to 40 the number of inductees; the newest members will be officially welcomed May 6. According to the SF Museum site, "The event will include a cocktail hour, seated dinner, induction ceremony, and after-party." The ceremony will occur in the middle of the Eaton Conference, a three-day presentation co-sponsored by the museum and the University of California Riverside's Eaton Collection. This year's topic is "Inventing the 21st Century: Many Worlds, Many Histories."" -
Randy Hyde's HLA Begets OS Adventure Game
jg21 writes "Paul Panks already has 30 text adventure games to his credit, and he's just written a report at LinuxWorld explaining how, when he came across Randall Hyde's website, he realized that Hyde's High Level Assembly language warranted a new departure - writing an open-source textadventure game. The result is "HLA Adventure" which he released into the public domain so anyone may contribute to the expansion of the game world, creatures within the world, and additional quests. HLA Adventure has its own Yahoo group." We recently covered HLA in our Developers section. -
High Level Assembly
dunric writes "Randall Hyde has developed a programming language called High Level Assembly (HLA). It is a great way for new programmers to develop applications for both Windows and Linux. It works with a variety of assemblers, including Gas, Fasm, Masm and others. The website for Randy's HLA is located at: http://webster.cs.ucr.edu/" -
Small Electronic Logic Blocks - eBlocks
eBlocks writes "eBlocks are small low-cost electronic devices that can be easily interconnected for a wide variety of applications such as: detecting motion, light, water, sound or magnetic fields; triggering a buzzer, a light, an electronic relay or a lock. Devices can communicate wirelessly or can be controlled remotely via the internet or a telephone. The eBlocks technology has been developed by a professor at U.C. Riverside who is looking for inspiration on its best uses. Try out the simulator. Suggestions and comments welcome!" -
Small Electronic Logic Blocks - eBlocks
eBlocks writes "eBlocks are small low-cost electronic devices that can be easily interconnected for a wide variety of applications such as: detecting motion, light, water, sound or magnetic fields; triggering a buzzer, a light, an electronic relay or a lock. Devices can communicate wirelessly or can be controlled remotely via the internet or a telephone. The eBlocks technology has been developed by a professor at U.C. Riverside who is looking for inspiration on its best uses. Try out the simulator. Suggestions and comments welcome!" -
Small Electronic Logic Blocks - eBlocks
eBlocks writes "eBlocks are small low-cost electronic devices that can be easily interconnected for a wide variety of applications such as: detecting motion, light, water, sound or magnetic fields; triggering a buzzer, a light, an electronic relay or a lock. Devices can communicate wirelessly or can be controlled remotely via the internet or a telephone. The eBlocks technology has been developed by a professor at U.C. Riverside who is looking for inspiration on its best uses. Try out the simulator. Suggestions and comments welcome!" -
Small Electronic Logic Blocks - eBlocks
eBlocks writes "eBlocks are small low-cost electronic devices that can be easily interconnected for a wide variety of applications such as: detecting motion, light, water, sound or magnetic fields; triggering a buzzer, a light, an electronic relay or a lock. Devices can communicate wirelessly or can be controlled remotely via the internet or a telephone. The eBlocks technology has been developed by a professor at U.C. Riverside who is looking for inspiration on its best uses. Try out the simulator. Suggestions and comments welcome!" -
Small Electronic Logic Blocks - eBlocks
eBlocks writes "eBlocks are small low-cost electronic devices that can be easily interconnected for a wide variety of applications such as: detecting motion, light, water, sound or magnetic fields; triggering a buzzer, a light, an electronic relay or a lock. Devices can communicate wirelessly or can be controlled remotely via the internet or a telephone. The eBlocks technology has been developed by a professor at U.C. Riverside who is looking for inspiration on its best uses. Try out the simulator. Suggestions and comments welcome!" -
A New Spin On Physical Phenomena
f00Dave writes "Researchers have discovered "a new physical phenomenon, electrostatic rotation, that, in the absence of friction, leads to spin". I'm a bit skeptical about the implied relationship between physical "spin" (as in rotation) and quantum "spin", however. Still, this is the sort of scientific advance that renews my faith in the system. Go nerds! =]" -
Trees Fall Prey to AoA
bluethundr writes "For all of the years that it has been available, the only way to read the classic instructional text known as the Art of Assembly by Randy Hyde, was to read it online or download a PDF'd copy and print it out for your own bad sef. It would seem that No Starch Press stands poised to release a pre-bound (aka BOOK) version of this most highly esteemed volumes on the arcane topic on highly convenient dead tree media. I for one am very glad about this development. While I recognize the value of hypertext for reference, when it comes to learning any complex topic at length dead tree media is the way to go, hands down IMHO! I emailed the company to get a bead on when to expect this development and this was the reply: 'No, you are not misinformed. The scheduled publication date for Art of Assembly is March, 2003. I will have something posted on the website very soon, perhaps in the next couple of days. Thanks for the interest!'" -
Finding the Viscosity of Pitch
ColdChrist writes "The University of Queensland has a page about a 72-year-old experiment on the fluidity of pitch. There's a webcam where you can try to become the first person ever to see a drop of the pitch fall; eight drops have fallen since 1930 and the ninth is now forming. The experiment 'demonstrates the fluidity and high viscosity of pitch, a derivative of tar once used for waterproofing boats. At room temperature pitch feels solid - even brittle - and can easily be shattered with a blow from a hammer', but it does flow, as the pictures demonstrate." I know this is going to bring up glass comparisons, so we'll head those off: glass is not a fluid. -
High-Density Magnets Created
Judebert writes: "University of California, Riverside scientists have created diradical magnets: magnetic particles that have two unbonded electrons instead of just one. The problem with diradical substances is that they have always been extremely chemically active, so they never stayed around longer than a few microseconds at room temperature. The new substance is stable at room temperature, even when it's in solution. And it's not even metallic. This paves the way for newer, higher-density magnetic and magneto-optical media and devices. You can help distribute the load if you visit the text mirror instead." -
Intel to Build Encryption Capabilities in Chips
Will Johnston sent us a link to a CNNfn article where you can read about Intel's plan's to incorporate encryption into it's microchips. I'm not sure about this one: The paranoid in me starts quivering, but then again, standard encryption sure would make a lot of this stuff a lot easier. What do you think?