Domain: bme.hu
Stories and comments across the archive that link to bme.hu.
Stories · 6
-
Programming Collective Intelligence
Joe Kauzlarich writes "In 2006, the on-line movie rental store Netflix proposed a $1 million prize to whoever could write a movie recommendation algorithm that offered a ten percent improvement over their own. As of this writing, the intriguingly-named Gravity and Dinosaurs team holds first place by a slim margin of 0.07 percent over BellKor, their algorithm an 8.82 percent improvement on the Netflix benchmark. So, the question remains, how do they write these so-called recommendation algorithms? A new O'Reilly book gives us a thorough introduction to the basics of this and similar lucrative sciences." Keep reading for the rest of Joe's review.
Title: Programming Collective Intelligence
Author: Toby Segaran
Pages: 334
Publisher: O'Reilly Media Inc.
Rating: 9/10
Reviewer: Joe Kauzlarich
ISBN: 9780596529321
Summary: Introduction to data mining algorithms and techniques
Among the chief ideological mandates of the Church of Web 2.0 is that users need not click around to locate information when that information can be brought to the users. This is achieved by leveraging 'collective intelligence,' that is, in terms of recommendation systems, by computationally analyzing statistical patterns of past users to make as-accurate-as-possible guesses about the desires of present users. Amazon, Google and certainly many other organizations, in addition to Netflix, have successfully edged out more traditional competitors on this basis, the latter failing to pay attention to the shopping patterns of users and forcing customers to locate products in a trial and error manner as they would in, say, a Costco. As a further illustration, if I go to the movie shelf at Best Buy, and look under 'R' for Rambo, no one's going to come up to me and say that the Die Hard Trilogy now has a special-edition release on DVD and is on sale. I'd have to accidentally pass the 'D' section and be looking in that direction in order to notice it. Amazon would immediately tell me, without bothering to mention that Gone With The Wind has a new special edition.
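The Rambo and Die Hard illustration can be sketched as a simple co-occurrence count. This is a minimal illustration and not code from the book; the `recommend` function and the purchase data are hypothetical, and real systems weight and normalize these counts rather than using them raw.

```python
from collections import Counter

def recommend(target, baskets, top_n=1):
    """Rank titles by how often they appear in the same basket as `target`."""
    counts = Counter()
    for basket in baskets:
        if target in basket:
            # Count every other title bought alongside the target.
            counts.update(title for title in basket if title != target)
    return [title for title, _ in counts.most_common(top_n)]

# Hypothetical purchase histories, one set of titles per past customer.
baskets = [
    {"Rambo", "Die Hard"},
    {"Rambo", "Die Hard", "Predator"},
    {"Gone With The Wind", "Casablanca"},
    {"Rambo", "Predator"},
    {"Die Hard", "Rambo"},
]
suggestions = recommend("Rambo", baskets)
```

Titles that never share a basket with the target, such as Gone With The Wind here, never enter the count and so are never suggested.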
Programming Collective Intelligence is far more than a guide to building recommendation systems. Author Toby Segaran is not a commercial product vendor, but a director of software development for a computational biology firm, doing data-mining and algorithm design (so apparently there is more to these 'algorithms' than just their usefulness in recommending movies?). Segaran takes us on a friendly and detailed tour through the field's toolchest, covering the following topics in some depth:
Recommendation Systems
Discovering Groups
Searching and Ranking
Document Filtering
Decision Trees
Price Models
Genetic Programming
... and a lot more
As you can see, the subject matter stretches into the higher levels of mathematics and academia, but Segaran successfully keeps the book intelligible to most software developers, and examples are written in the easy-to-follow Python language. Further chapters cover more advanced topics, like optimization techniques, and many of the more complex algorithms are deferred to the appendix.
The third chapter of the book, 'Discovering Groups,' deserves some explanation and may enlighten you as to how the book may be of some use in day-to-day software designs. Suppose you have two sets of data that are related by a 'JOIN'. For example, certain customers may spend more time browsing certain subsets of movies. 'Discovering Groups' refers to the computational process of recognizing these patterns and sectioning data into groups. In terms of music or movies, these groups would represent genres. The marketing team may thus become aware that jazz enthusiasts buy more music at sale prices than do listeners of contemporary rock, or that listeners of late-'60s jazz also listen to '70s prog, or similar such trends.
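As a rough illustration of what sectioning data into groups can look like, here is a minimal k-means sketch. It is not taken from the book; the customer names, the browsing hours, and the choice to seed the centroids with the first k points are all assumptions made to keep the example small and deterministic.

```python
from math import dist

def kmeans(points, k, iterations=20):
    """Group points into k clusters; returns one cluster index per point."""
    # Assumption: seed centroids with the first k points so the run is deterministic.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical data: hours each customer spent browsing (jazz, rock).
customers = {
    "ann": (9.0, 1.0),
    "bob": (1.0, 8.0),
    "cat": (8.0, 2.0),
    "dan": (2.0, 9.0),
}
labels = kmeans(list(customers.values()), k=2)

# Collect customer names under the cluster each one was assigned to.
groups = {}
for name, label in zip(customers, labels):
    groups.setdefault(label, []).append(name)
```

With this data the jazz browsers (ann, cat) and the rock browsers (bob, dan) end up in separate groups. Real data usually needs randomized seeding and several restarts.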
Certainly the applications of the tools that Programming Collective Intelligence provides are broader than my imagination can handle. Insurance companies, airlines and banks are all part of massive industries that rely on precise knowledge of consumer trends and can certainly make use of the data-mining knowledge introduced in this book.
I have no major complaints about the book, particularly because it fills a gap in popular knowledge with no precursor of which I'm aware. Presentation-wise, even though Python is easy to read, pseudo-code is more timeless and even easier to read. You can't cut & paste from a paper book into a Python interpreter anyway. It may have been more appropriate to use pseudo-code in print and keep the example code on the website (I'm sure it's there anyway).
If you ever find yourself browsing or referencing your algorithms text from college or even seriously studying algorithms for fun or profit, then I would highly recommend this book depending on your background in mathematics and computer science. That is, if you have a strong background in the academic study of related research, then you might look elsewhere, but this book, certainly suitable as an undergraduate text, is probably the best one for relative beginners that is going to be available for a long time.
You can purchase Programming Collective Intelligence from amazon.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page. -
Efficiently Reading ID3v2 Tags Over HTTP?
Paul Crowley asks: "Given an HTTP URL for an MP3 file, what's the best way to read its ID3 tags on a GNU/Linux system? It shouldn't be necessary to fetch the whole file: HTTP byteranges should make it possible to fetch only the tiny fraction that's needed, for a big saving in network bandwidth. However, existing ID3v2 libraries are designed to read local files. Extending these libraries for this purpose, or implementing a new one, would be a big job. What's the clean solution - is FUSE the best way, or is there a simpler way that doesn't require root privs? Can I do it using the existing id3lib binary?" -
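One possible approach, sketched with only the Python standard library: an ID3v2 tag sits at the start of the file and begins with a 10-byte header whose last four bytes give the tag size as a synchsafe integer, so two small Range requests are enough to retrieve it. The function names here are hypothetical, and the returned bytes would still need an ID3 parser to decode individual frames.

```python
import urllib.request

def parse_id3v2_header(header):
    """Return (major_version, tag_size) from a 10-byte ID3v2 header, or None."""
    if len(header) < 10 or header[:3] != b"ID3":
        return None
    major = header[3]
    # The size is "synchsafe": four bytes with only the low 7 bits of each used.
    size = 0
    for byte in header[6:10]:
        if byte & 0x80:
            return None  # high bit set, so this is not a valid synchsafe size
        size = (size << 7) | byte
    return major, size

def fetch_range(url, start, end):
    """Fetch bytes start..end (inclusive) with an HTTP Range request."""
    request = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(request) as response:
        # 206 means the range was honoured; 200 means the whole file is coming.
        if response.status != 206:
            raise RuntimeError("server ignored the Range header")
        return response.read()

def fetch_id3v2_tag(url):
    """Download only the ID3v2 tag body from the start of the file."""
    parsed = parse_id3v2_header(fetch_range(url, 0, 9))
    if parsed is None:
        return None  # no ID3v2 tag at the start of this file
    _major, size = parsed
    if size == 0:
        return b""
    # The declared size excludes the 10-byte header itself.
    return fetch_range(url, 10, 10 + size - 1)
```

This needs no root privileges or filesystem layer, but it only works against servers that honour byte ranges. Older ID3v1 tags live in the last 128 bytes of the file instead and could be requested with a suffix range such as `bytes=-128`.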
24-hour Programming Contest
bigboyofeq writes "The Budapest University of Technology and Economics is hosting the 3rd 24-hour programming contest. For the first time, it's open to teams from all over the world. The winning team gets 4000 Euros, so it's worth a look. When I took a look at the pictures from previous years, I got really excited. They are available here (comments are in Hungarian)." -
Slashback: Deception, Fusion, Membership
Slashback arrives tonight with updates on the lukewarm path to cold fusion, one more update on what Microsoft claims is "the way out" (really, this time), a hopeful look at Mandrake's Club, and more -- read on below for the details.
"Congratulations! You may already own goats.cx!" King Mongo writes: "Well, well. First Verisign sent mail to trick domain owners into switching registrars (as described earlier on Slashdot); today I received a similar letter from Verisign asking me to renew cruel-intention.com with them. The problem is, I never bought cruel-intention.com and I've never used Verisign as a registrar. But what's this? Whois says I've owned it since September 2001? And the Technical Contact is Verisign? And it's registered for 10 years? You can bet I'll be contacting my state AG, as well as the USPS Inspectors' office; what if the domain name was offensive, or actionable (it may even be a DMCA violation)? Verisign has taken it upon themselves to hijack my identity and expose me to litigation! At least they let me know!"
Port softly, and carry a big Club. joestar writes: "Just seen in Mandrake Linux news... It seems that the recent call for Mandrake Club subscriptions had a double effect: it was a financial success for MandrakeSoft ($390,000 since the Club was first created on November 28th, 2001), and at the same time it generated lots of questions about this new approach of doing business with Free-Software. In a really interesting message, MandrakeSoft's CEO Jacques Le Marois gives all details about the Club results and why and how they are currently inventing a new business model dedicated to Free-Software oriented companies, since the traditional business models fail for these companies. Actually I'm impressed."
OK, perhaps we only have the way sideways. gh0ul writes "news.com is featuring an article regarding Microsoft and Unisys' joint venture to steer companies/individuals away from Unix and branch into the corporate servers based on Windows2000. With all the negative impact towards 'wehavethewayout.com', I'm surprised they kept it going... guess that $28 million matters..."
We've patented that way to think, sorry. An Anonymous Coward writes: "The Symantec marketing droids are on the rampage again. After patenting their definition update technology, this time they patented heuristic virus scanning. When will this insanity end? :P"
I'll believe it when it's powering my air-car. abburdlen writes: "A month ago an article in the journal Science appeared hyping the possibility of tabletop fusion. Quick summary: Sonoluminescence in heavy acetone ... collapsing bubbles reaching temperatures hotter than the Sun ... evidence of fusion. There was some excitement. There were also many initial skeptics. Looks like the doubtful win again. From the APS, 'The possibility of a major discovery has been obscured by substandard experimental techniques.' Ouch."
One day we'll all have decent bandwidth, right? Pathway writes "I know this has been looked at by Slashdot before, but here's a good update comparing the Zipp Fiber to the Terabyte Triangle in Spokane at thelocalplanet.com. In the article, they compare how one project is so successful, while the other is foundering. It's a good read."
-
BladeEnc source to be released under the GPL
BladeEnc is to be released under the LGPL. BladeEnc is an MP3 encoder which not only runs faster than the ISO code but is optimized to produce better sounding 256Kbit streams. Yes, files are larger, but they sound better. Within the category of free encoders, BladeEnc claims first place in The User Oriented MP3 encoding guide. This link was found on DemoNix -