Domain: paulgraham.com
Stories and comments across the archive that link to paulgraham.com.
Stories · 95
-
Company Takes Over Well-Known OSS Developer's Name Because the Domain Was Free
New submitter Fatalis writes: Substack is a venture capital funded startup for subscription-based newsletters, and it admittedly chose its name following advice from an article by Paul Graham (co-founder of Y Combinator) to prefer names that are not already registered in the .com zone. The same name has also been the user handle of a prolific open-source developer, who now finds themselves competing for recognition in the tech space with a capital-backed company. The lesson seems to be that developers should protect their personal brand by registering a .com domain name, since .com is perceived as the default. -
New Video Peeks 'Inside the Head' of Perl Creator Larry Wall (infoq.com)
"I was trained more as a linguist than a computer scientist," says Perl creator Larry Wall, "and some people would say it shows." An anonymous reader describes Wall's new video interview up on InfoQ: "With a natural language, you learn it as you go," Wall says. "You're not expected to know the whole language at once. It's okay to have dialects... Natural languages evolve over time, and they don't have arbitrary limits. They naturally cover multiple paradigms. There are external influences on style... It has fractal dimensionality to it. Easy things should be easy, hard things should be possible. And, you know, if you get really good at it, you can even speak CompSci."
Wall also touched on the long delay for the release of Perl 6. "In the year 2000, we said 'Maybe it's time to break backward compatibility, just once. Maybe we can afford to do that, get off the worse-is-worse cycle, crank the thing once for a worse-is-better cycle." The development team received a whopping 361 suggestions -- and was also influenced by Paul Graham's essay on the 100-year language. "We put a lot of these ideas together and thought really hard, and came up with a whole bunch of principles in the last 15 years." Among the pithy principles: "Give the user enough rope to shoot themselves in the foot, but hide the rope in the corner," and "Encapsulate cleverness, then reuse the heck out of it."
But Wall emphasized the flexibility and multi-paradigm nature that they finally implemented in Perl 6. "The thing we really came up with was... There really is no one true language. Not even Perl 6, because Perl 6 itself is a braid of sublanguages -- slangs for short -- and they interact with each other, and you can modify each part of the braid..."
Wall even demoed a sigil-less style, and argued that Perl 6 was everything from "expressive" and "optimizable" to "gradually-typed" and "concurrency aware," while supporting multiple virtual machines. He also noted that Perl 6 borrows powerful features from other languages, including Haskell (lazy evaluation), Smalltalk (traits), Go (promises and channels), and C# (functional reactive programming).
And towards the end of the interview Wall remembers how the original release of Perl was considered by some as a violation of the Unix philosophy of doing one thing and doing it well. "I was already on my rebellious slide into changing the world at that point." -
Paul Graham: Let the Other 95% of Great Programmers In
An anonymous reader writes: Y Combinator's Paul Graham has posted an essay arguing in favor of relaxed immigration rules. His argument is straightforward: with only 5% of the world's population, the U.S. can only expect about 5% of great programmers to be born here. He says, "What the anti-immigration people don't understand is that there is a huge variation in ability between competent programmers and exceptional ones, and while you can train people to be competent, you can't train them to be exceptional. Exceptional programmers have an aptitude for and interest in programming that is not merely the product of training."
Graham says even a dramatic boost to the training of programmers within the U.S. can't hope to match the resources available elsewhere. "We have the potential to ensure that the U.S. remains a technology superpower just by letting in a few thousand great programmers a year. What a colossal mistake it would be to let that opportunity slip. It could easily be the defining mistake this generation of American politicians later become famous for." -
High Tech Companies Becoming Fools For the City
theodp writes "The WSJ reports that, drawn by amenities and talent, tech firms are saying goodbye to office parks and opting for cities. Pinterest, Zynga, Yelp, Square, Twitter, and Salesforce.com are some of the more notable tech companies who are taking up residence in San Francisco. New York City's Silicon Alley is now home to more than 500 new start-up companies like Kickstarter and Tumblr, not to mention the gigantic Google satellite in the old Port Authority Building. London, Seattle, and even downtown Las Vegas are also seeing infusions of techies. So, why are tech companies eschewing Silicon Valley and going all Fool for the City? 'Silicon Valley proper is soul-crushing suburban sprawl,' Paul Graham presciently explained in 2006. 'It has fabulous weather, which makes it significantly better than the soul-crushing sprawl of most other American cities. But a competitor that managed to avoid sprawl would have real leverage.'" -
What Went Wrong At Yahoo
kjh1 writes "Paul Graham writes about what he felt went wrong at Yahoo. He has first-hand experience — his company, Viaweb, was bought by Yahoo and he worked there for a while. In a nutshell, he felt that Yahoo was too conflicted about whether they were a technology company or a media company. 'If anyone at Yahoo considered the idea that they should be a technology company, the next thought would have been that Microsoft would crush them.' This in part led to hiring bad programmers, or at least not going single-mindedly after the very best ones. They also lacked the 'hacker' culture that Google and Facebook still seem to have, and that is found in many startup tech companies. 'As long as customers were writing big checks for banner ads, it was hard to take search seriously. Google didn't have that to distract them.'" -
Micropayments For News — Holy Grail Or Delusion?
newscloud writes "Harvard's Nieman Journalism Lab sounds off on micropayments for news content, on the side of the argument that says they are a dangerous delusion: 'What does it mean for journalism? It could mean charging for different platforms, for early alerts, for special members-only access to certain premium or value-added content. But I'm pretty sure of one thing: It doesn't mean charging people fractions of a cent to read a news story, no matter how sophisticated the process.' The article provides good context on the debate over micropayments from a 2003 piece by Clay Shirky, to recent analysis and opinion by Masnick, Outing, Graham, and Reifman. Google's micropayment plans were recently discussed here." -
News Content As a Resource, Not a Final Product
Paul Graham has posted an essay questioning whether we ever really paid for "content," as publishers of news and music are saying while they struggle to stay afloat in the digital age. "If the content was what they were selling, why has the price of books or music or movies always depended mostly on the format? Why didn't better content cost more?" Techdirt's Mike Masnick takes it a step further, suggesting that the content itself should be treated as a resource — one component of many that go into a final product. Masnick also discussed the issue recently with NY Times' columnist David Carr, saying that micropayments won't be the silver bullet the publishers are hoping for because consumers are inundated with free alternatives. "It's putting up a tollbooth on a 50-lane highway where the other 49 lanes have no tollbooth, and there's no specific benefit for paying the toll." Reader newscloud points out that the fall 2009 issue of Harvard's Nieman Reports contains a variety of related essays by journalists, technologists, and researchers. -
A Hypothesis On Segway Hate
theodp writes "Admit it, IT is ingenious. Also, IT is surprisingly effective for certain uses, including real cops and mall cops. And if you tried IT, you probably smiled to yourself. So why all the Segway hate? Paul Graham looks into The Trouble with the Segway and offers a hypothesis about what prompts people to shout abuse at Segway riders: 'You look smug. You don't seem to be working hard enough.' Not that someone riding a motorcycle is working any harder, adds Graham, but because he's sitting astride it, he appears to be making an effort. When you're riding a Segway you're just standing there. Make a version that doesn't look so easy for the rider — perhaps resembling skateboards or bicycles — and Segway just might capture more of the market they hoped to reach." -
Manager's Schedule vs. Maker's Schedule
theodp writes "Ever wonder why you and the boss don't see eye-to-eye on the importance of meetings? Paul Graham explains that there are Maker Schedules (coder) and Manager Schedules (PHB), and the two are very different. With each day neatly cut into one-hour intervals, the Manager Schedule is for bosses and is tailor-made for schmoozing. Unfortunately, it spells disaster for people who make things, like programmers and writers, who generally prefer to use time in units of half a day at least. You can't write or program well in units of an hour, says Graham, since that's barely enough time to get started. So if you fall into the Maker camp, adds Graham, you'd better hope your boss is smart enough to recognize that you need long chunks of time to work in. How's that working out in your world?" Ironically enough, I have a meeting to attend in three minutes. -
Why TV Lost
theodp writes "Over the past 20 years, there's been much speculation about what the convergence of computers and TV would ultimately look like. Paul Graham says that we now know the answer: computers. 'Convergence' is turning out to essentially be 'replacement.' Why did TV lose? Graham identifies four forces: 1. The Internet's open platform fosters innovation at hacker speeds instead of big company speeds. 2. Moore's Law worked its magic on Internet bandwidth. 3. Piracy taught a new generation of users it's more convenient to watch shows on a computer screen. 4. Social applications made everybody from grandmas to 14-year-old girls want computers — in a three-word-nutshell, Facebook killed TV." -
Avoiding Mistakes Can Be a Huge Mistake
theodp writes "No doubt many will nod knowingly as they read Paul Graham's The Other Half of 'Artists Ship', which delves into the downside of procedures developed by Big Companies to protect themselves against mistakes. Because every check you put on your programmers has a cost, Graham warns: 'And just as the greatest danger of being hard to sell to is not that you overpay but that the best suppliers won't even sell to you, the greatest danger of applying too many checks to your programmers is not that you'll make them unproductive, but that good programmers won't even want to work for you.' Sound familiar, anyone?" -
AppJet Offers Browser-Based Coding How-To, Hosting
theodp writes "Know someone who wants to learn to program? Paul Graham advises programmer wannabes to check out The Absolute Beginner's Guide to Programming on the Web from AppJet, which aims to be 'the funnest and easiest way for a beginner to get started programming.' Setting the guide apart from other tutorials is the ability to edit and run any of the all-JavaScript examples directly in your browser. Newcomers to programming and experienced developers alike can also publish their AppJet creations on the web. Sure beats GE BASIC on the General Electric Time-Sharing Service!" -
People Don't Hate to Make Desktop Apps, Do They?
Annie Peterson writes "Paul Graham has been making the argument that desktop development is dead; that's his premise for declaring Microsoft dead as well, and he claims that no one out there likes to develop for the desktop anymore. But that's not true, or is it? Desktop development is easier, faster, more productive, and infinitely more enjoyable, right? The question is, since web apps were originally built on desktop applications themselves, have the tables flipped? Or is it just wishful thinking?" -
Paul Graham Claims "Microsoft is Dead"
netbuzz writes "He doesn't mean dead as in six feet under, but rather that the software giant no longer instills the kind of fear — particularly among entrepreneurs — that it did back in the day when it was making road kill out of companies like Netscape. Microsoft obits have been around for almost as long as the company, but Graham's stature, style and devoted following are likely to make this one a classic." -
First Dynamically Balancing Biped Robot
damg writes "Anybots, which is three guys led by Trevor Blackwell, has developed the first robot that walks like we do, by dynamically balancing itself rather than being pre-programmed for walking like Asimo. The video shows the robot walking and being pushed by another 'bully' robot to demonstrate that it can't easily be pushed over." -
Why Startups Condense in America
bariswheel writes "The controversial genius developer/writer/entertainer Paul Graham writes an insightful piece on Why Startups Condense in America. Here's the skinny: the US allows immigration, it is a rich country, it is not (yet) a police state, the universities are better, you can fire people, work is less identified with employment, it is not too fussy, it has a large domestic market, it has venture funding, and it has dynamic typing for careers. Inquire for details within." -
Is Silicon Valley Reproducible?
sunil99 asks: "Paul Graham, in his latest essay, looks at the ingredients which make Silicon Valley what it is. From the essay: 'Could you reproduce Silicon Valley elsewhere, or is there something unique about it? It wouldn't be surprising if it were hard to reproduce in other countries, because you couldn't reproduce it in most of the US, either. What does it take to make [a Silicon Valley]?' In his opinion: 'I think you only need two kinds of people to create a technology hub: rich people and nerds'. He concludes that if a city can attract these people, it can stand a chance of replicating Silicon Valley. What do you think of Paul's opinions? If you would like some changes to the current Silicon Valley, what would those be?" While the people are an important part of the Silicon Valley experience, they are only part of the requirement. What local characteristics must also be present if Silicon Valley is to be duplicated, even on a smaller scale? What draws technology companies to a specific location? -
The Business of Software
pankaj_kumar writes "The business of software usually gets tons of coverage in the tech media, across its various facets: products, people, organizations, its economics, business models, technology trends and myriad other related things. So one would think that it would be difficult, if not impossible, to say something original. However, this collection of blog entries by noted blogger Eric Sink, founder of SourceGear, a vendor of source code control software, and developer of a Web browser at Spyglass that later became the basis of Internet Explorer, manages to do just that. He does so by focusing on the workings of a lesser-known niche in the software business, that of privately held small ISVs, and by relating his own personal experiences in a very engaging manner." Read the rest of Pankaj's review. Eric Sink on the Business of Software author Eric Sink pages 320 publisher Apress rating 8 reviewer Pankaj Kumar ISBN 1590596234 summary compilation of essays on founding and running a small ISV
If you are like me and rely mostly on news reports, either in print or online, to track your industry, then you mostly read about VC-backed startups or large publicly traded companies. It is too easy to forget, or not realize, that software started as a cottage industry and that there are still a lot of mature, privately held small companies building and selling profitable products literally from a cottage. Their workings, the forces driving their key decisions and a lot of other things differ in significant ways from those of younger VC-backed startups or bigger publicly held companies. Eric's essays talk about this difference and the realities, both good and bad, of being small.
If you are looking at starting your own software company, or are just interested in gaining deeper insight into this segment of the industry, then go buy this book. In fact, you don't even have to do that -- most of the essays are freely available on Eric's blog. But I must mention that even though I had read some of the essays online, reading them in the book, away from the computer and the thousands of other exciting things just a click away on the net, was a much more positive experience.
Although most of the essays are original, informative and highly readable, some stand out from the crowd: Whining By a Barrel of Rocks talks about opportunities for small ISVs with the analogy of a barrel filled with large stones (i.e., big apps) that is still capable of holding many more small pebbles; Starting Your Own Business contains nuggets of street-smart advice for wannabe software entrepreneurs; Make More Mistakes recounts decisions and actions in Eric's career as an entrepreneur that didn't work out the way he had hoped; Great Hacker != Great Hire critiques the famous piece by Paul Graham and points out the considerations of developing software and doing business in the real world.
Actually, more than these four stand out, but I will leave it here. One of the quotes I like most appears in The Game is Afoot: "This issue is not a check box; it's a slider." Although the comment was made in the context of being conservative or bold, I think it applies to most issues we encounter. Very few things in life are either black or white; they need careful deliberation within a given context and a balanced response. In fact, Eric manages to illustrate this seemingly obvious but difficult-to-practice idea in the domain of small ISVs with the help of a number of analogies with popular games in The Game is Afoot essay.
As much as I liked the book, this review would not be fair without a discussion of its shortcomings or boundaries, at least as I see them. Keep in mind that the book is a compilation of blog entries based on personal experience and beliefs, not a work of research. So do not expect official or industry analyst numbers or survey results to back up the claims. Want to know how the approximate number of small ISVs in the US, and the total revenue they generate, have changed over the last 5 or 10 years? No luck. In fact, the book doesn't even mention these numbers for any year.
Also, I found the essays to lean too heavily towards desktop software. Given the emergence of the Web, its potential to disrupt established players and its friendliness to individuals and smaller organizations, it is surprising that Eric doesn't talk much about Web-based software opportunities. Lately there have been many success stories where services built and operated by single individuals or very small teams have become very popular and been bought by bigger companies.
Another inescapable idea in the world of software that finds scant mention in these essays is open source software and its famed development process. I don't necessarily mean launching a business based on open source software, but rather how to come to terms with the fact that open source software exists and that all businesses, especially the smaller ones, have to survive and thrive in the same world. This is perhaps explained by the author's own experience, recounted in the Make More Mistakes essay, where he talks about his lone effort to create the open source word processor AbiWord and how it failed, at least from a financial perspective. Perhaps therein lies his message!
Overall, I would sum up my review of this book as a nice and balanced work by an articulate software guy with deep technical expertise and keen business sense.
You can purchase Eric Sink on the Business of Software from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page. -
What is the Best Calendar?
An anonymous reader writes "In the flurry of AJAX applications being put to market, Google's new calendar has been getting quite a bit of attention. But being drowned out in this media blitz is Kiko, a startup from Paul Graham's Y Combinator program, along with spongecell, Trumba, Yahoo! calendar, and 30boxes. Which do you prefer?" Update: 04/16 14:55 GMT by Z : YCombinator link fixed. -
Paul Graham on Patents
volts writes "The always interesting Paul Graham has a new essay, 'Are Software Patents Evil?'. "A few weeks ago I found to my surprise that I'd been granted four patents. This was all the more surprising because I'd only applied for three..."" -
How to Do What You Love
fnord_ix writes "Paul Graham has another interesting essay, How to Do What You Love. He talks about the lies adults tell kids about what work is, such as the idea that work is equal to pain." From the article: "I'm not saying we should let little kids do whatever they want. They may have to be made to work on certain things. But if we make kids work on dull stuff, it might be wise to tell them that tediousness is not the defining quality of work, and indeed that the reason they have to work on dull stuff now is so they can work on more interesting stuff later." -
Good and Bad Procrastination
dtolton writes "Paul Graham has written an interesting article on procrastination. He presents three different types of procrastination, and one of them is even good! He also suggests that some types of "getting things done" are actually weak forms of procrastination. The only downside to this article is that now you'll have to look at your procrastination with an analytical eye too!" Perhaps next year's Christmas shopping can benefit from the writeup? -
Ending Spam
Shalendra Chhabra writes "Jonathan Zdziarski has been fighting spam since before the first MIT spam conference in 2003, and has now released a full-on technical book, Ending Spam, on spam filtering. Ending Spam covers how the current and near-future crop of heuristic and statistical filters actually work under the hood, and how you can most effectively use such filters to protect your inbox." Read on for the rest of Chhabra's review. Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification author Jonathan A. Zdziarski pages 312 publisher No Starch Press rating 8 reviewer Shalendra Chhabra ISBN 1593270526 summary Very Good Book Covering Statistical Models and Techniques Implemented in Current Spam Filters
Spam (unsolicited commercial email) and phishing (fraudulent emails) are causing losses of billions of dollars to businesses. Many initiatives are currently underway for fighting this challenge. On the legal front, a Virginia court recently sentenced a prolific spammer, Jeremy Jaynes, to nine years in prison, and a Nigerian court sentenced a woman to two and a half years for phishing. Michigan and Utah have both passed laws creating "do-not-contact" registries in July/August 2005, covering e-mail addresses, instant messaging addresses and telephone numbers. Technical initiatives to fight spam include server- or client-side spam filtering, using Lists (Blacklists, Whitelists, Greylists), Email Authentication Standards (IIM, DK, DKIM, SPF, SenderID), and emerging sender reputation and accreditation services.
Ending Spam is the first book explaining the fine details of the theoretical models and machine-learning algorithms implemented in these filters. The book is divided into three parts: introduction to spam filtering, fundamentals of statistical filtering, and advanced concepts of statistical filtering.
The first section of the book discusses the history of spam, spam kings, and different approaches for fighting spam such as blacklisting, whitelisting, heuristic filtering, challenge response, throttling, collaborative filtering, Authenticated SMTP, Sender Policy Framework and SenderID, spammer fingerprinting, etc. However, the author omitted any mention of locality-sensitive hash functions (such as the Nilsimsa hash) to counter spammers' random insertion of words, the use of CAPTCHA (Completely Automated Public Turing Test to Tell Computers and Humans Apart), Greylisting, Identified Internet Mail, and Domain Keys (now Domain Keys Identified Mail).
In the next chapter, the author clearly explains various components of a Language Classifier Pipeline, including the Historical Dataset (aka wordlist, database, dictionary, filter memory), Tokenizer, and the Analysis Engine with its feedback loop. However, the process flow of a language classifier could have been more generalized, e.g. by incorporating an initial text-to-text transformer. This chapter also covers the advantages and disadvantages of various training modes for filters, such as Train Everything (TEFT), Train-on-Error (TOE), and Train Until No Errors (TUNE). This part concludes with a description of Paul Graham's famous spam-filtering technique using Bayesian classification (as described in "A Plan for Spam"), Gary Robinson's Geometric Mean Test, Fisher-Robinson's Inverse Chi-Square (including the source code for the inversion function), and some other tricks for optimizing spam-filtering accuracy.
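To make the difference between these combining rules concrete, here is a minimal Python sketch; it is not the book's source code. It assumes per-token spam probabilities have already been looked up in the historical dataset, and the function names (most_interesting, graham_combine, chi2q, fisher_robinson) are hypothetical. The Graham rule multiplies the probabilities of the fifteen most interesting tokens, clamped to [0.01, 0.99] as in "A Plan for Spam"; the chi-square rule follows the form used in SpamBayes-style filters.

```python
import math
from typing import List, Sequence

def clamp(p: float, lo: float = 0.01, hi: float = 0.99) -> float:
    """Keep token probabilities away from 0 and 1 (Graham clamps to [0.01, 0.99])."""
    return min(max(p, lo), hi)

def most_interesting(probs: Sequence[float], k: int = 15) -> List[float]:
    """Pick the k token probabilities farthest from the neutral 0.5."""
    return sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:k]

def graham_combine(probs: Sequence[float]) -> float:
    """'A Plan for Spam' style: prod(p) / (prod(p) + prod(1 - p))."""
    ps = [clamp(p) for p in probs]
    spam = math.prod(ps)
    ham = math.prod(1.0 - p for p in ps)
    return spam / (spam + ham)

def chi2q(x2: float, df: int) -> float:
    """Upper tail P(X >= x2) of a chi-square distribution with an even df."""
    if df % 2:
        raise ValueError("df must be even")
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, df // 2):  # series e^-m * sum(m^i / i!) for i < df/2
        term *= m / i
        total += term
    return min(total, 1.0)

def fisher_robinson(probs: Sequence[float]) -> float:
    """Fisher-Robinson inverse chi-square combining; 0.5 means 'unsure'."""
    ps = [clamp(p) for p in probs]
    n = len(ps)
    # Sums of logs instead of products, so long messages don't underflow.
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in ps), 2 * n)  # spamminess
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in ps), 2 * n)        # hamminess
    return (s - h + 1.0) / 2.0
```

A design note, hedged: Graham's product tends to push scores hard toward 0 or 1, while the chi-square combination leaves mixed evidence near 0.5, which is what lets a filter report "unsure" rather than guess.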
The second part of this book deals with the fundamentals of statistical filtering. The author explains HTML and Base64 encoding, followed by a detailed description of tokenization techniques (e.g. Sparse Binary Polynomial Hashing). Then there's a discussion of the various tricks that spammers use for penetrating filters. Although these tactics are mentioned in John Graham-Cumming's "Spammers Compendium," Jonathan has very elegantly explained why some tricks work for spammers and some don't. This part concludes by addressing some of the resource, storage and scaling concerns raised by the large number of features generated from tokenization techniques.
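A rough sketch of the decode-then-tokenize step, using only Python's standard email module: get_payload(decode=True) undoes Base64 and quoted-printable encoding, and HTML comments are dropped before other tags so that a word split by an empty comment is reassembled. The token regular expression and the comment-stripping rule are illustrative assumptions, not the book's tokenizer.

```python
import email
import re
from typing import List

COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)  # HTML comments
TAG_RE = re.compile(r"<[^>]+>")                    # any other HTML tag
TOKEN_RE = re.compile(r"[A-Za-z0-9$'-]+")          # assumed token alphabet

def body_text(raw: str) -> str:
    """Return the decoded text of every text/* part of a raw RFC 822 message."""
    msg = email.message_from_string(raw)
    chunks = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip multipart containers, images, attachments
        data = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        if data is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            text = data.decode(charset, errors="replace")
        except LookupError:  # unknown charset label in the header
            text = data.decode("utf-8", errors="replace")
        if part.get_content_subtype() == "html":
            text = COMMENT_RE.sub("", text)  # "V<!-- -->iagra" -> "Viagra"
            text = TAG_RE.sub(" ", text)     # remaining tags become separators
        chunks.append(text)
    return "\n".join(chunks)

def tokenize(raw: str) -> List[str]:
    """Lower-cased body tokens; headers are ignored in this sketch."""
    return [tok.lower() for tok in TOKEN_RE.findall(body_text(raw))]
```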
The third part of this book deals with advanced concepts of statistical filtering. This includes the testing criteria for measuring the accuracy of an email filter, and some advanced tokenization concepts, e.g. chained tokens (taking word pairs and phrases into account, instead of individual words) generated using a sliding 5-word window, as in Sparse Binary Polynomial Hashing. The next chapter describes the Markovian Model implemented in the CRM114 Discriminator, but the author fails to describe the different weighting schemes for features implemented in the Markovian-based version of CRM114. The author then describes the Bayesian Noise Reduction Technique for purging "out of context" data from the mail text. This chapter concludes with a very nice summary of collaborative algorithms and techniques, such as Message Inoculation, Streamlined Blackhole List, Fingerprinting, Automatic Whitelisting, URL Blacklisting, and Honeypot email addresses for snaring spammers' address harvesting bots.
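For readers who haven't met SBPH, the feature-generation half can be sketched in a few lines of Python. This is an approximation for illustration only, assuming a 5-word window, a hypothetical "<skip>" placeholder for dropped words and "<pad>" padding at the start of a message; CRM114 itself hashes each feature rather than storing strings, and its handling of message boundaries may differ.

```python
from itertools import product
from typing import List, Sequence

def sbph_features(tokens: Sequence[str], window: int = 5) -> List[str]:
    """SBPH-style features (before hashing).

    For each token, look back over the previous window-1 tokens and emit
    every combination that keeps the newest token: 2**(window-1) features.
    Dropped positions become "<skip>" so word distance is preserved.
    """
    features = []
    for i, newest in enumerate(tokens):
        history = list(tokens[max(0, i - window + 1):i])
        # Assumption: pad the start of the message so every window has
        # the same shape; real CRM114 behaviour may differ.
        history = ["<pad>"] * (window - 1 - len(history)) + history
        for mask in product((False, True), repeat=window - 1):
            kept = [tok if keep else "<skip>" for tok, keep in zip(history, mask)]
            features.append(" ".join(kept + [newest]))
    return features
```

With the default window, every token yields 16 features: the all-skip mask reproduces the plain single word, and the other masks encode word pairs and phrases at fixed distances.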
The most interesting part of this book is the appendix, where the author presents interviews with John Graham-Cumming of POPFile, Brian Burton of SpamProbe, Marty Lamb of TarProxy, Bill Yerazunis of CRM114 Discriminator, and Jonathan Zdziarski of DSPAM (himself). I loved this section.
The salient points of the book: it's very easy to read; each chapter begins with a very thought-provoking introduction and concludes with a crisp "final thoughts" section. Technical errors are very few in this printing, and the illustrations are of good quality. Since the book is geared more toward the Bayesian and statistical generation of spam filters, the absence of certain spam-busting technologies is acceptable. However, a noticeable omission is the lack of discussion about measuring spam-filter accuracy, and what impact this has on setting filtration thresholds. A section on the economics of tradeoffs, and on the use of Receiver Operating Characteristic (ROC) curves, would have been very helpful.
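As an illustration of the kind of tradeoff analysis the review has in mind, the sketch below computes ROC points (false positive rate against true positive rate) for every candidate threshold, and picks the threshold with the lowest total misclassification cost. The function names and the 10:1 default cost of a lost legitimate message versus a missed spam are hypothetical, not figures from the book.

```python
from typing import List, Optional, Sequence, Tuple

def roc_points(scores: Sequence[float], is_spam: Sequence[bool]) -> List[Tuple[float, float, float]]:
    """Return (threshold, false_positive_rate, true_positive_rate) per threshold.

    A message is called spam when its score is >= threshold.
    """
    pos = sum(is_spam)
    neg = len(is_spam) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one spam and one ham example")
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, is_spam) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, is_spam) if s >= t and not y)
        points.append((t, fp / neg, tp / pos))
    return points

def cheapest_threshold(scores: Sequence[float], is_spam: Sequence[bool],
                       fp_cost: float = 10.0, fn_cost: float = 1.0) -> Tuple[float, float]:
    """Pick the threshold minimizing fp_cost * false_positives + fn_cost * false_negatives."""
    best: Optional[Tuple[float, float]] = None
    # float("inf") stands for "never call anything spam".
    for t in sorted(set(scores)) + [float("inf")]:
        fp = sum(1 for s, y in zip(scores, is_spam) if s >= t and not y)
        fn = sum(1 for s, y in zip(scores, is_spam) if s < t and y)
        cost = fp_cost * fp + fn_cost * fn
        if best is None or cost < best[1]:
            best = (t, cost)
    return best
```

With made-up scores [0.9, 0.6, 0.7, 0.1] for spam, spam, ham, ham, the default costs select 0.9 (one missed spam is cheaper than one lost legitimate message), while swapping the two costs selects 0.6.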
Overall, by putting together Ending Spam, Jonathan Zdziarski has made another significant contribution (after DSPAM) to the anti-spam community. Whether you are a system administrator, anti-spam researcher, engineer or a newbie interested in fighting spam, this book is a great reference.
William S Yerazunis and Richard Jowsey also contributed to this review. Shalendra Chhabra is a graduate student in the Department of Computer Science and Engineering at the University of California, Riverside. He is on the development team of CRM114 Discriminator and has presented his work at MIT Spam Conference 2005, Cisco Systems, and Stanford University. You can purchase Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification from bn.com. -
Ending Spam
Shalendra Chhabra writes "Jonathan Zdziarski has been fighting spam since before the first MIT spam conference in 2003, and has now released a full-on technical book, Ending Spam, on spam filtering. Ending Spam covers how the current and near-future crop of heuristic and statistical filters actually work under the hood, and how you can most effectively use such filters to protect your inbox." Read on for the rest of Chhabra's review. Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification author Jonathan A. Zdziarski pages 312 publisher No Starch Press rating 8 reviewer Shalendra Chhabra ISBN 1593270526 summary Very Good Book Covering Statistical Models and Techniques Implemented in Current Spam Filters
Spam (unsolicited commercial email) and phishing (fraudulent emails) are causing losses of billions of dollars to businesses. Many initiatives are currently underway for fighting this challenge. On the legal front, a Virginia court recently sentenced a prolific spammer, Jeremy Jaynes, to nine years in prison, and a Nigerian court sentenced a woman to two and a half years for phishing. Michigan and Utah have both passed laws creating "do-not-contact" registries in July/August 2005, covering e-mail addresses, instant messaging addresses and telephone numbers. Technical initiatives to fight spam include server- or client-side spam filtering, using Lists (Blacklists, Whitelists, Greylists), Email Authentication Standards (IIM, DK, DKIM, SPF, SenderID), and emerging sender reputation and accreditation services.
Ending Spam is the first book explaining the fine details of the theoretical models and machine-learning algorithms implemented in these filters. The book is divided into three parts: introduction to spam filtering, fundamentals of statistical filtering, and advanced concepts of statistical filtering.
The first section of the book discusses the history of spam, spam kings, different approaches for fighting spam such as blacklisting, whitelisting, heuristic filtering, challenge response, throttling, collaborative filtering, Authenticated SMTP, Sender Policy Framework and SenderID, spammer fingerprinting, etc. However, the author omitted any mention of locally-sensitive hash functions (such as Nilsimsa Hash) to counter spammers' random insertion of words, the use of CAPTCHA (Completely Automated Public Turing Test to Tell Computers and Humans Apart), Greylisting, Identified Internet Mail, and Domain Keys (now Domain Keys Identified Mail).
In the next chapter, the author clearly explains the components of a Language Classifier Pipeline: the Historical Dataset (aka wordlist, database, dictionary, or filter memory), the Tokenizer, and the Analysis Engine with its feedback loop. However, the process flow of a language classifier could have been presented more generally, e.g. by incorporating an initial text-to-text transformer. This chapter also covers the advantages and disadvantages of the various training modes for filters, such as Train Everything (TEFT), Train-on-Error (TOE), and Train Until No Errors (TUNE). This part concludes with descriptions of Paul Graham's famous Bayesian spam-filtering technique (as described in "A Plan for Spam"), Gary Robinson's Geometric Mean Test, and the Fisher-Robinson Inverse Chi-Square test (including source code for the inverse chi-square function), along with some other tricks for optimizing spam-filtering accuracy.
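To make the combination step concrete, here is a minimal Java sketch of the three scoring methods named above, as they are commonly presented. `grahamTokenProbability` follows the formula in Graham's "A Plan for Spam" (good counts doubled, results clamped to [0.01, 0.99], 0.4 for rarely seen tokens); the other methods assume per-token probabilities are already available. The class and method names are hypothetical and are not taken from the book or from any particular filter. The sketch works in log space to avoid underflow on long messages, and it omits Graham's step of keeping only the 15 most "interesting" tokens.

```java
import java.util.List;

/** Hypothetical helper: token probabilities and three message-level combiners. */
class SpamCombiners {
    // Keep probabilities away from 0 and 1 so logarithms stay finite.
    static double clamp(double p) {
        return Math.min(0.99, Math.max(0.01, p));
    }

    // Token probability per "A Plan for Spam"; assumes nGood > 0 and nBad > 0.
    static double grahamTokenProbability(int goodCount, int badCount, int nGood, int nBad) {
        double g = 2.0 * goodCount; // good counts are doubled to bias against false positives
        double b = badCount;
        if (g + b < 5) {
            return 0.4; // rarely seen tokens get the essay's default for unknown words
        }
        double bad = Math.min(1.0, b / nBad);
        double good = Math.min(1.0, g / nGood);
        return clamp(bad / (good + bad));
    }

    // Graham: prod(p) / (prod(p) + prod(1 - p)), rewritten as 1 / (1 + exp(logHam - logSpam)).
    static double graham(List<Double> probs) {
        if (probs.isEmpty()) return 0.5;
        double logSpam = 0.0, logHam = 0.0;
        for (double raw : probs) {
            double p = clamp(raw);
            logSpam += Math.log(p);
            logHam += Math.log(1.0 - p);
        }
        return 1.0 / (1.0 + Math.exp(logHam - logSpam));
    }

    // Robinson geometric mean: P = 1 - prod(1-p)^(1/n), Q = 1 - prod(p)^(1/n),
    // S = (P - Q) / (P + Q) in [-1, 1], rescaled to [0, 1].
    static double robinsonGeometric(List<Double> probs) {
        int n = probs.size();
        if (n == 0) return 0.5;
        double sumLogP = 0.0, sumLog1mP = 0.0;
        for (double raw : probs) {
            double p = clamp(raw);
            sumLogP += Math.log(p);
            sumLog1mP += Math.log(1.0 - p);
        }
        double spamminess = 1.0 - Math.exp(sumLog1mP / n);
        double hamminess = 1.0 - Math.exp(sumLogP / n);
        double s = (spamminess - hamminess) / (spamminess + hamminess);
        return (1.0 + s) / 2.0;
    }

    // Survival function P(chi-square with v degrees of freedom >= x2); v must be even.
    static double chi2Q(double x2, int v) {
        double m = x2 / 2.0;
        double term = Math.exp(-m);
        double sum = term;
        for (int i = 1; i < v / 2; i++) {
            term *= m / i;
            sum += term;
        }
        return Math.min(sum, 1.0);
    }

    // Fisher-Robinson inverse chi-square combining; 0.5 means "unsure".
    static double fisherRobinson(List<Double> probs) {
        int n = probs.size();
        if (n == 0) return 0.5;
        double sumLogP = 0.0, sumLog1mP = 0.0;
        for (double raw : probs) {
            double p = clamp(raw);
            sumLogP += Math.log(p);
            sumLog1mP += Math.log(1.0 - p);
        }
        double s = 1.0 - chi2Q(-2.0 * sumLog1mP, 2 * n); // evidence for spam
        double h = 1.0 - chi2Q(-2.0 * sumLogP, 2 * n);   // evidence for ham
        return (1.0 + s - h) / 2.0;
    }
}
```

Unlike the first two combiners, the Fisher-Robinson score gives an explicit "unsure" band around 0.5 when the spam and ham evidence are both strong or both weak.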
The second part of this book deals with the fundamentals of statistical filtering. The author explains HTML and Base64 encoding, followed by a detailed description of tokenization techniques (e.g. Sparse Binary Polynomial Hashing). Then comes a discussion of the various tricks spammers use to get past filters. Although these tactics are catalogued in John Graham-Cumming's "Spammers' Compendium," Zdziarski explains very elegantly why some tricks work for spammers and others don't. This part concludes by addressing the resource, storage, and scaling concerns raised by the large number of features these tokenization techniques generate.
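As a rough illustration of the preprocessing this part describes, the Java sketch below decodes a Base64 message body, strips HTML tags, and splits the result into lowercase tokens. The delimiter set, the lowercasing, and the class name `SimpleTokenizer` are assumptions made for illustration, not the book's tokenizer; production filters use real MIME and HTML parsers.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.Locale;

/** Hypothetical minimal tokenizer: Base64 decode, strip HTML, split into tokens. */
class SimpleTokenizer {
    static String decodeBase64(String body) {
        // MIME bodies wrap long lines; the MIME decoder ignores the line breaks.
        byte[] bytes = Base64.getMimeDecoder().decode(body);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    static String stripHtml(String html) {
        // Naive: replace anything between '<' and '>' with a space.
        return html.replaceAll("<[^>]*>", " ");
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // Assumed delimiters: anything other than letters, digits, apostrophe, '$' and '-'.
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}'$-]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    static List<String> tokenizeBase64Html(String base64Body) {
        return tokenize(stripHtml(decodeBase64(base64Body)));
    }
}
```

Replacing each tag with a space keeps words from adjacent elements apart, but it also means a tag inserted in the middle of a word splits that word in two, which is the kind of weakness spammers probe for.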
The third part of this book deals with advanced concepts of statistical filtering. These include the testing criteria for measuring the accuracy of an email filter and some advanced tokenization concepts, e.g. chained tokens (taking word pairs and phrases into account instead of individual words), generated using the sliding 5-word window of Sparse Binary Polynomial Hashing. The next chapter describes the Markovian model implemented in the CRM114 Discriminator, although the author does not describe the different feature-weighting schemes implemented in the Markovian version of CRM114. The author then describes the Bayesian Noise Reduction technique for purging "out of context" data from the mail text. This chapter concludes with a very nice summary of collaborative algorithms and techniques, such as Message Inoculation, Streamlined Blackhole List, Fingerprinting, Automatic Whitelisting, URL Blacklisting, and honeypot email addresses for snaring spammers' address-harvesting bots.
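The chained-token idea can be sketched as SBPH-style feature generation. This Java sketch assumes a five-word window: for each word it emits every combination of the up-to-four preceding words, always keeping the newest word and marking skipped positions with a `<skip>` placeholder. CRM114 hashes these features rather than storing strings, so treat this as an illustration of the feature set, not of CRM114's implementation; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of Sparse Binary Polynomial Hashing-style features (unhashed). */
class SbphFeatures {
    static final int WINDOW = 5; // assumed window: newest token plus 4 predecessors

    static List<String> features(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            int k = Math.min(WINDOW - 1, i); // predecessors available at position i
            for (int mask = 0; mask < (1 << k); mask++) {
                StringBuilder sb = new StringBuilder();
                boolean started = false; // leading skips carry no information, so drop them
                for (int j = k; j >= 1; j--) { // oldest predecessor first
                    boolean keep = (mask & (1 << (j - 1))) != 0;
                    if (keep) {
                        sb.append(tokens.get(i - j)).append(' ');
                        started = true;
                    } else if (started) {
                        sb.append("<skip> ");
                    }
                }
                sb.append(tokens.get(i)); // the newest token is always part of the feature
                out.add(sb.toString());
            }
        }
        return out;
    }
}
```

Every position with a full window yields 2^4 = 16 features, so a five-word phrase already produces 1 + 2 + 4 + 8 + 16 = 31; that multiplication is the source of the storage and scaling concerns raised in the second part of the book.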
The most interesting part of this book is the appendix, where the author presents interviews with John Graham-Cumming of POPFile, Brian Burton of SpamProbe, Marty Lamb of TarProxy, Bill Yerazunis of CRM114 Discriminator, and Jonathan Zdziarski of DSPAM (himself). I loved this section.
The salient points of the book: it is very easy to read; each chapter begins with a thought-provoking introduction and concludes with a crisp "final thoughts" section. There are very few technical errors in this printing, and the illustrations are of good quality. Since the book is geared toward the Bayesian and statistical generation of spam filters, the absence of certain other spam-busting technologies is acceptable. However, one noticeable omission is any discussion of measuring spam-filter accuracy and how it affects the setting of filtration thresholds. A section on the economics of these trade-offs, and on the use of a Receiver Operating Characteristic (ROC) curve, would have been very helpful.
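The threshold trade-off the review asks for can be sketched in a few lines. This hypothetical Java helper assumes each message has a score in [0, 1] (higher means spammier) and a known label. It computes the false-positive rate (ham flagged as spam) and true-positive rate (spam caught) at a given threshold, traces an ROC curve by sweeping the threshold over the observed scores, and estimates the area under the curve with the trapezoid rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical threshold analysis for a spam filter's scores. */
class RocSketch {
    /** Returns {falsePositiveRate, truePositiveRate} when flagging score >= threshold. */
    static double[] rates(double[] scores, boolean[] isSpam, double threshold) {
        int tp = 0, fp = 0, spam = 0, ham = 0;
        for (int i = 0; i < scores.length; i++) {
            boolean flagged = scores[i] >= threshold;
            if (isSpam[i]) {
                spam++;
                if (flagged) tp++;
            } else {
                ham++;
                if (flagged) fp++;
            }
        }
        double fpr = ham == 0 ? 0.0 : (double) fp / ham;
        double tpr = spam == 0 ? 0.0 : (double) tp / spam;
        return new double[] {fpr, tpr};
    }

    /** One ROC point per distinct score, from the strictest threshold to the most lenient. */
    static List<double[]> curve(double[] scores, boolean[] isSpam) {
        TreeSet<Double> thresholds = new TreeSet<>();
        for (double s : scores) thresholds.add(s);
        List<double[]> points = new ArrayList<>();
        points.add(new double[] {0.0, 0.0}); // threshold above every score: nothing flagged
        for (double t : thresholds.descendingSet()) {
            points.add(rates(scores, isSpam, t));
        }
        return points;
    }

    /** Area under the ROC curve by the trapezoid rule (1.0 = perfect separation). */
    static double auc(List<double[]> points) {
        double area = 0.0;
        for (int i = 1; i < points.size(); i++) {
            double[] a = points.get(i - 1), b = points.get(i);
            area += (b[0] - a[0]) * (a[1] + b[1]) / 2.0;
        }
        return area;
    }
}
```

Raising the threshold trades caught spam for fewer false positives; where to stop depends on how much a lost legitimate message costs relative to a missed spam, which is the economic question the review wanted the book to address.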
Overall, by putting together Ending Spam, Jonathan Zdziarski has made another significant contribution (after DSPAM) to the anti-spam community. Whether you are a system administrator, anti-spam researcher, engineer or a newbie interested in fighting spam, this book is a great reference.
William S. Yerazunis and Richard Jowsey also contributed to this review. Shalendra Chhabra is a graduate student in the Department of Computer Science and Engineering at the University of California, Riverside. He is on the development team of the CRM114 Discriminator and has presented his work at the MIT Spam Conference 2005, Cisco Systems, and Stanford University. -
Summer Internships - The Good, and the Bad?
loquacious d asks: "This has been a spectacular summer for open-source student internships. Google funded a huge variety of open-source projects through the Summer of Code, including GCC-CIL and other improvements to Mono, new features and fixes for Gaim, and even new packages for Common Lisp. Joel Spolsky at Fog Creek hired four interns to produce a highly modified version of VNC called Fog Creek Copilot, and Paul Graham's new venture capital firm Y Combinator helped students create their own tech companies. What internships did people enjoy this summer, and which ones didn't work out so well? Which ones would you recommend to next year's applicants, and which should they avoid?" -
What Business Can Learn from Open Source
dtolton writes "Paul Graham has written a fantastic article on what businesses can learn from open source. He covers why amateurs can outperform professionals, why the home is a better work environment than the office, and why bottom-up ideas beat top-down ones. Finally, he relates these lessons to business." Derived from a talk at OSCON 2005. From the article: "...the biggest thing business has to learn from open source is not about Linux or Firefox, but about the forces that produced them. Ultimately these will affect a lot more than what software you use. We may be able to get a fix on these underlying forces by triangulating from open source and blogging. As you've probably noticed, they have a lot in common." -
Paul Graham Describes Dangers of Spam Blacklists
CRoby writes "Paul Graham has posted an essay describing the dangers and corruption of today's main spam blacklists. It discusses MAPS and the SBL, the blacklist created to try to alleviate the abuses of MAPS, and suggests (maybe) the creation of another blacklist." -
Internships for Talented High School Students?
xeon4life asks: "I'm a high school senior in the Austin, Texas area with a slight dilemma: I need a job, I don't want what's offered at my age, and internships are not quite open to kids like me. I've recently been reading Paul Graham's essays about creating your own startup and have been motivated enough to convince two of my good friends to go into business with me later, during college. An internship would be the ideal solution for me now, but nobody is willing to take me as an intern because I'm still in high school. What am I to do?" "People have suggested that I just do what every other good American high school citizen does and take a mediocre job. The problem is, I feel it would be a waste of my talents to be stuck folding shirts at the local mall or flipping cheeseburgers when I could be helping develop a cutting-edge game, a next-generation compiler, or even the Linux kernel as an intern. I understand concepts better than most college students, and I have some real programming experience in languages like assembly and C/C++, but that isn't going to amount to anything if I can never find an interviewer who will at least listen to me. I'd appreciate any input the Slashdot readership can give me." -
Paul Graham: Hiring is Obsolete
jazznjava writes "Paul Graham has a new essay covering the influence that declining operating costs will have on startup companies, and the undervaluation of undergraduates." -
Paul Graham on PR
ralejs writes "Paul Graham takes on PR. From the article: 'Why do the media keep running stories saying suits are back? Because PR firms tell them to. One of the most surprising things I discovered during my brief business career was the existence of the PR industry, lurking like a huge, quiet submarine beneath the news. Of the stories you read in traditional media that aren't about politics, crimes, or disasters, more than half probably come from PR firms.' As always, it's an interesting, surprising, and slightly provocative read." -
Return of the Mac
Ben Gutierrez writes "Paul Graham has posted a new essay, Return of the Mac, which begins: 'All the best hackers I know are gradually switching to Macs.' Tim O'Reilly said similar things in Watching Alpha Geeks. From the article: 'My friend Robert said his whole research group at MIT recently bought themselves PowerBooks. These guys are not the graphic designers and grandmas who were buying Macs at Apple's low point in the mid 1990s. They're about as hardcore OS hackers as you can get.'" -
Summer Reading and Startup Program
putko writes "Paul Graham, Lisp hacker and creator of the company that became Yahoo! Store, has an essay on what to do while in college. Previously, he covered what high school students should do. He has also begun a summer startup program, which invites people with good ideas to try out for some startup capital. The deadline is March 26th." From the page: "We're going to call this project the Summer Founders Program, and it preserves many of the features of a conventional summer job. You have to move here (Cambridge) for the summer, as with a regular summer job. We give you enough money to live on for a summer, as with a regular summer job. You get to work on real problems, as you would in a good summer job. But instead of working for an existing company, you'll be working for your own; instead of showing up at some office building at 9 AM, you can work when and where you like; and instead of salary, the money you get will be seed funding." -
Paul Graham Explains How to Start a Startup
woginuk writes "Paul Graham has posted a new essay on his website on how to start a startup. According to him 'You need three things to create a successful startup: to start with good people, to make something customers actually want, and to spend as little money as possible. Most startups that fail do it because they fail at one of these. A startup that does all three will probably succeed.' How difficult can that be? So go start them startups." -
What You'll Wish You'd Known
sheck writes "Eminent computer scientist, author, painter, and dot-com millionaire Paul Graham has written down the things he wishes somebody had told him when he was in high school in What You'll Wish You'd Known. Among other things, he suggests that students treat school like a day job and work on interesting projects, to avoid what he has found to be the most common regret adults have about their high school days: wasting time." -
Good Bad Attitude
teidou writes "Paul Graham has posted a new essay titled 'Good Bad Attitude' on the hacker attitude toward rules and government regulation of intellectual property. Choice quote: '(Hackers) can sense totalitarianism approaching from a distance, as animals can sense an approaching thunderstorm.'" -
Java 1.5 vs C#
SexyFingers writes "Sun released Java 1.5. The non-API stuff they've added has finally made it 'catch up' with C#: since both languages are built to support OOP from the ground up, their constructs become almost identical as additional OOP 'features' are supported. So if you're doing C# and your foundations in OOP are rock-solid, there really isn't any difference whether you're coding C# or Java." Here's the list of enhancements to the Java language:
- Generics (C# 2.0 already supports this)
- Enhanced For-Loop (the foreach construct in C# 1.0, duh!)
- Autoboxing/Unboxing (C# 1.0 already has this, everything is an object, even the primitives - not really, but they do it so well...)
- Typesafe Enums (again, C# 1.0 already implemented this, but I think they've added a little more of a twist in Java, so that it's actually a better implementation)
- Varargs (C# 1.0's params construct, ellipsis construct in C++)
- Static Import (I don't know whether C# 1.0 or C# 2.0 has this, but C# has a construct for aliasing your imports - which is way cooler. Static Import actually promotes bad coding habits IMHO)
- Metadata/Annotations (this is C# 1.0's Attributes, Sun's upturned noses just gave it a fancier name - also, C#'s implementation is better and more intuitive)
They've beefed up the API some, and integrated several packages with the regular JSDK that used to be a part of a separate package or installation ---in my NSHO, the Java API has become bloated...
At this point (even before Whidbey) the deciding factor (as always) for Enterprise work, when choosing a language platform, should be the support it has behind it, in terms of IDE, tools, api, and longevity of the vendor pushing it (forget the OpenSource crap argument, those guys are too in love with Perl, Python, and Ruby - Java could become the child nobody wants to talk about if Sun dies) - right now that's C# and the .NET Framework ---
If you ask Paul Graham, though, both languages would be utter crap and fit only for idiots :) http://www.paulgraham.com/gh.html [I'm exaggerating, so hold off on those flames.]
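For readers comparing the two languages, the Java 1.5 additions listed above look roughly like this in practice. The class `Java5Features`, the enum `Verdict`, and the `@Reviewed` annotation are hypothetical names made up for this sketch; the language features themselves (static import, generics, the enhanced for-loop, autoboxing/unboxing, typesafe enums, varargs, and annotations) are standard Java 5.

```java
import static java.lang.Math.max; // static import: call max(...) without the Math. prefix
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.ArrayList;
import java.util.List;

// Metadata/annotation kept at runtime so it can be read back via reflection.
@Retention(RetentionPolicy.RUNTIME)
@interface Reviewed {
    String by();
}

// Typesafe enum: only these two constants can exist.
enum Verdict { HAM, SPAM }

class Java5Features {
    // Varargs: callers may pass any number of extra ints after the first.
    static int largest(int first, int... rest) {
        int best = first;
        for (int n : rest) {          // enhanced for-loop over an array
            best = max(best, n);      // statically imported Math.max
        }
        return best;
    }

    @Reviewed(by = "editor")
    static int sumAll(List<Integer> values) { // generics: a list that only holds Integers
        int total = 0;
        for (Integer v : values) {    // enhanced for-loop over an Iterable
            total += v;               // auto-unboxing Integer -> int
        }
        return total;
    }

    static List<Integer> boxed(int... xs) {
        List<Integer> list = new ArrayList<Integer>(); // Java 5 has no diamond operator
        for (int x : xs) {
            list.add(x);              // autoboxing int -> Integer
        }
        return list;
    }

    // Reads the annotation back at runtime.
    static boolean isReviewed() {
        try {
            return Java5Features.class.getDeclaredMethod("sumAll", List.class)
                    .isAnnotationPresent(Reviewed.class);
        } catch (NoSuchMethodException e) {
            return false;
        }
    }
}
```

Each construct has a close C# counterpart (`foreach`, `params`, attributes, generics in C# 2.0), which is the submitter's point; the main visible difference here is that Java enums are full classes rather than named integers.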
-
What The Bubble Got Right
dtolton writes "Paul Graham has written an article entitled What the Bubble Got Right. In recent years the roaring tech bubble has become a byword, yet Paul does an excellent job of articulating what it got right." -
The Age of the Essay
bluFox writes "Paul Graham has just published a new article on English literature and the role of the essay. For a change, it has nothing to do with Lisp, languages, or hackers, but it still feels like a continuation of his earlier articles." -
The Python Paradox, by Paul Graham
GnuVince writes "Paul Graham has posted a new article to his website that he called "The Python Paradox" which refines the statements he made in "Great Hackers" about Python programmers being better hackers than Java programmers. He basically says that since Python is not the kind of language that lands you a job like Java, those who learn it seek more than simply financial benefits, they seek better tools. Very interesting read." -