Slashdot Mirror


Collective Intelligence in Action

lamaditx writes "The book Collective Intelligence in Action shows you how to apply theory from Machine Learning, Artificial Intelligence and Data Mining to your business. The goal is to create systems which make use of data created by groups of people (i.e. social networks) and abstract from these to gain new or additional information. Some of you might think "just another kind of Web 2.0." This is one application you might think of, but the input and output format do not matter that much. You can use these methods anywhere as long as the amount of data is big enough. You will find some examples related to the latest web technologies to explain methods, but the code is rather generic. Also, you won't find a lot disturbing details about HTML, HTTP and the like." Keep reading for the rest of Adrian's review.

Collective Intelligence in Action
author: Satnam Alag
pages: 397
publisher: Manning Publishing
rating: 8
reviewer: Adrian Lambeck
ISBN: 1933988312
summary: Shows you how to apply theory from Machine Learning, Artificial Intelligence and Data Mining to your business.

There are three main parts to Collective Intelligence in Action. The first part explains how to gather data from external sources or internal repositories. The second part, "Deriving Intelligence", explains how to analyze the collected data. This is the part where you gain information and create new ideas. This does not help you much unless you find a way to use it in your application. The third part, which is also the shortest, provides you with some information on how to use the results in order to build user-centric applications. This is obviously the best way to create a unique difference no matter what kind of services you may want to provide.

I have to admit that I waited for such a book for some time. After studying Artificial Intelligence: A Modern Approach, maybe THE book about AI, I felt that I knew a lot of theory but missed the practical aspects. Several AI concepts are used in this guide but you don't create an AI system or an agent. Don't mix up those two even though they are similar.

The "in Action" series is supposed to show how things are done in practice. You can expect a lot of Java code samples and advice. Several open source tools are introduced to enable you to build your own system. These are also Java tools. It's up to you if you prefer to use Java or some other language. From my perspective it does not really matter which language you choose because the concepts can be implemented using other languages as well. The main drawback is that you will not be able to use the Java Data Mining API (JDM), which is used extensively.

The first chapter introduces the main terms and concepts of the book. It is available here together with chapter 2 and the source code. One thing I consider to be an important prerequisite is mathematics. Most aspects are easy to read and understand if you have some knowledge about statistics and linear algebra. On the other hand you can still get it with basic math because the explanation is well written. The same holds for standard concepts and algorithms like word stemming, decision trees, Bayesian networks or k-means. These are summarized with the most important properties such that you don't require prior knowledge. You will notice that the chapter, like the following ones, ends with a large number of references.

Personally I find it hard to read formulas when they are described in words (like: take the square root of x and multiply with y) instead of the mathematical notation. This is due to the fact that you cannot look up the formula quickly, because it does not stand out from the text. It might have been better to provide the formula in words and a mathematical notation as well. You will find some formulas in mathematical notation but some are really hard to read since they are printed in a font size of about 4 while the text is written in 10.

Coming back to the content: The other sections of the first part show you how to gather data from external online sources. Of course you can apply the same concepts to offline sources or other data repositories. The key is to collect usable data to derive intelligence later on. One example is generating tags from a number of sources and associating each tag with a weight relative to the occurrence of the tag. The result will be one of the well known tag clouds.
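
To make the tag weighting concrete, here is a minimal sketch in Python rather than the book's Java. It is not the author's code: the names tag_weights, font_size and sources, the sample tags, and the choice to scale each count against the most frequent tag are all assumptions made for illustration.

```python
from collections import Counter


def tag_weights(sources):
    """Return each tag's weight relative to the most frequent tag (0 < w <= 1)."""
    # Count every tag across all sources, ignoring case.
    counts = Counter(tag.lower() for tags in sources for tag in tags)
    if not counts:
        return {}
    top = max(counts.values())
    return {tag: n / top for tag, n in counts.items()}


def font_size(weight, smallest=10, largest=30):
    """Map a weight in [0, 1] linearly onto a font size for the tag cloud."""
    return round(smallest + weight * (largest - smallest))


# Hypothetical tags gathered from three sources.
sources = [
    ["java", "data mining", "lucene"],
    ["Java", "lucene"],
    ["java", "tagging"],
]
weights = tag_weights(sources)
```

With these sample sources, "java" appears in every list and gets the full weight, so font_size would render it largest, while tags seen only once get the smallest weight. A real system would probably also stem or otherwise normalize the tags before counting.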

You will need persistent data storage, such as a database, to hold the results so that you can access them in the second step. Unsurprisingly you will find several ER diagrams to create the right data structure. A big plus is that the author explicitly states the important facts which can be derived from formulas or (ER-) diagrams. Reading the text is much more convenient this way. He also provides implications for the database design when discussing ER diagrams. You can be sure that you do not miss the important points.

The second part starts with an introduction of data mining and machine learning terminology and concepts. You are also introduced to the JDM API which proves to be helpful in the future. You may start looking for a substitute if you choose not to use Java. The extensive usage of design patterns in almost every aspect eases the change from Java to an alternative language. You get to know the common methods and how to implement them. I consider this part to be more or less craftsmanship. There is some magic to it if you have never heard anything about the utilized methods.

The only thing that caught my eye was the calculation of the inverse of a matrix. The notation is pretty common when solving linear equations, but you should never (except in rare cases) use the plain matrix inversion operation when implementing your solution. The reason is that the amount of effort to be undertaken grows exponentially. The more data is used, the larger the matrix will be — and thus the longer it will take to compute the inverse. Instead one should use, e.g., LU decomposition. The footnote points you to use the weka.core.matrix.Matrix class, which uses LU decomposition, but make sure about that if you use some other package or some other language.
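
As a rough illustration of solving a linear system without forming the inverse, the following sketch uses Gaussian elimination with partial pivoting, the same row operations that an LU decomposition records. It is written in plain Python for a small dense system and is not the weka.core.matrix.Matrix class mentioned in the footnote; the function name solve, the singularity threshold and the sample system are assumptions.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy so the caller's data is left untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]

    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:  # assumed tolerance
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the entries below the pivot.
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]

    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - known) / m[row][row]
    return x


# Hypothetical system: 2x + y = 3 and x + 3y = 5.
x = solve([[2, 1], [1, 3]], [3, 5])
```

For real workloads a tested numerical library is the safer choice; the sketch only shows that the solution of Ax = b can be computed directly, without ever building the inverse of A.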

The last 80 pages enable you to make use of your information gain and integrate it in the application. This is also the shortest part but that is due to the fact that the heavy lifting was done in parts one and two. Application means basically querying your data in the correct way to generate the right recommendations for your users. One part of that is searching and the other one is recommending. You may imagine the necessary effort to undertake if you ever happened to take a look at the way search engines work. The author deals with that by using the open source search engine Nutch together with Lucene in such a way that you just use the interfaces. This approach enables the author to keep the last part as short as it is.

I consider "Collective Intelligence in Action" to be a very good book. It is thought through from beginning to end. Examples are not just presented to the reader, but evolve step by step. You know why things are done the way they are, which enables you to change every aspect in a way you need to. From my point of view this is the right way to do it because a copy-and-paste solution would not get the job done. I pointed out some issues that could be done better such as too-small fonts in graphics or missing literature references in the text. However these are not major problems or content errors that should be blamed on the author. Finally I think you will gain from this book because it addresses Web 2.0 to some degree but is generic enough for other applications as well.

Adrian Lambeck is a graduate student in "Media and Information Technologies" and uses C# more often than Java.

You can purchase Collective Intelligence in Action from amazon.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

53 comments

  1. Congress by Rinisari · · Score: 0, Offtopic

    I clicked on this story thinking that it might be regarding Congress, but then, as it was loading, I remembered that Congress and "collective intelligence" don't belong in the same, non-sarcastic sentence, especially after the passage of the most recent economic "stimulus" spending bill.

    1. Re:Congress by Rinisari · · Score: 1

      It wasn't OK when Bush did it! His was arguably worse!

  2. Problem with Group Think. by jellomizer · · Score: 0, Offtopic

    The most charismatic person, usually wins.
    While the best ideas that actually work are thrown to the dumps.
    So the Bad Ideas succeed and the good ideas fail, Unless the Charismatic person is right.

    --
    If something is so important that you feel the need to post it on the internet... It probably isn't that important.
    1. Re:Problem with Group Think. by Anonymous Coward · · Score: 0

      Charisma helps, certainly. But even the most charismatic person in the group has to go along with the groupthink (regardless of personal opinion) or risk his or her position in the group. There is always someone there ready to take you down a peg, and stepping too far away from the groupthink, charisma or not, is a good way to get knocked down.

    2. Re:Problem with Group Think. by amilo100 · · Score: 2, Funny

      I think that we can all agree that Group Think is bad.

    3. Re:Problem with Group Think. by beav007 · · Score: 1

      I'm still undecided, so I'll go with the majority on this call...

  3. A font size of 4?! by going_the_2Rpi_way · · Score: 1, Interesting

    You will find some formulas in mathematical notation but some are really hard to read since they are printed in a font size of about 4 while the text is written in 10

    Yikes. How'd that ever happen? Is this common with Manning's Equations? (Hah!)

    Seriously though, what a nightmare.

  4. Disturbing details? by Gizzmonic · · Score: 5, Funny

    Also, you won't find a lot disturbing details about HTML, HTTP and the like."

    What "disturbing details" are there about HTTP? Does it have connections to rogue regimes or something? Is it a deadbeat dad? Does it like to dress as a clown?

    --
    (-1, Raw and Uncut is the only way to read)
    1. Re:Disturbing details? by Anonymous Coward · · Score: 0

      All of that plus it kicks puppies.

    2. Re:Disturbing details? by thermian · · Score: 4, Insightful

      Also, you won't find a lot disturbing details about HTML, HTTP and the like."

      What "disturbing details" are there about HTTP? Does it have connections to rogue regimes or something? Is it a deadbeat dad? Does it like to dress as a clown?

      You can embed flash in it?

      --
      A learning experience is one of those things that say, 'You know that thing you just did? Don't do that.' - D. Adams
    3. Re:Disturbing details? by Anonymous Coward · · Score: 0

      You can embed even IP in it.

  5. To quote the late Edward Abbey: by Anonymous Coward · · Score: 5, Funny

    One man alone can be pretty dumb sometimes, but for bona-fide stupidity, nothing beats teamwork.

    1. Re:To quote the late Edward Abbey: by Anonymous Coward · · Score: 0

      One man alone can be pretty dumb sometimes, but for bona-fide stupidity, nothing beats teamwork.

      Right. I always thought the collective intelligence of any mob was inversely proportional to its size.

    2. Re:To quote the late Edward Abbey: by NewbieProgrammerMan · · Score: 1

      That's why meetings are so awesome!

      --
      [b.belong('us') for b in bases if b.owner() == 'you']
  6. Karma whore much? Or astroturfing? by StandardDeviant · · Score: 5, Insightful

    Nearly identical copy and paste from an amazon review (with parts of another), eliding potential negatives like pointers to a competing book and this text not being useful (in that reviewer's opinion) for academia.

    1. Re:Karma whore much? Or astroturfing? by tomtermite · · Score: 0, Redundant

      awesome detective work. good job

      --
      - Ubique, Tom Termini www.bluedog.net - WebObjects / J2EE SOA / iPhone solutions for knowledge workers
    2. Re:Karma whore much? Or astroturfing? by Smidge207 · · Score: 1, Offtopic

      You may recall that last year, astroturfing, sock puppetry and other forms of fake-blog bullshit were made illegal by SCOTUS, POTUS & HOWE.

      Section 22 of the Unfair Commercial Practices Directive is absolutely clear, making it illegal to go around "falsely claiming or creating the impression that the trader is not acting for purposes relating to his trade, business, craft or profession, or falsely representing oneself as a consumer".

      Has it made a difference? Nope. Just ask Roland Piquepaille (who is currently posting as StandardDeviant).

      Roland is a blogger, and a few weeks back he did a typical blogger thing. He'd been mucked about by a health club, which messed up his cancellation and continued to charge him, so he vented his frustration on his blog.

      Time passed, and then a new poster turned up, disagreeing strongly with what he'd written in a number of messages. You're talking out of your backside, the commenter told him. I've been dealing with that firm for a million years, and they're bloody great, he said. They're the nicest, smartest, funkiest, sexiest company on the face of the Earth, and they're probably the best company in the universe, too. You're a big fat liar.

      Roland idly Googled the commenter's name. Surprise! He's a middle manager in the very company Roland was blogging about. So Roland did another typical blogger thing. He blogged about that, too, outing the manager on his blog and accusing him of astroturfing. That's when the legal letters started.

      You know the drill. You're defaming our client, the letters said. Take down the posts, or we'll sue you until you squeak.

      The thing is, the lawyers didn't say that he was wrong. Quite the opposite. The poster was indeed a manager of the very health club Roland had blogged about. They didn't dispute that.

      However, they said that the company hadn't asked the manager to post anything and certainly didn't approve of such behaviour. Because Roland had named the company, his posts about astroturfing were therefore defamatory.

      Was Roland in the right? I think so, but that doesn't really matter. He wouldn't get legal aid to fight a defamation case, so if he went to court and lost, the costs would ruin him, and even if he won, the costs would still ruin him.

      He could try to get the firm prosecuted for astroturfing, but all that stuff about 5K fines and two-year jail sentences is for the really bad guys. In a case like this, it'd be tough to persuade Trading Standards or the OFT to even investigate; if they did, the punishment wouldn't be anything more severe than a stern letter.

      Faced with a battle he couldn't possibly win, he took the posts down. It doesn't matter that his posts were honest and accurate, or that the accusations he made were absolutely true. All a company needs to do when caught astroturfing is to say "we knew nothing about it" and send the lawyers in.

      As ever, when it comes to companies behaving badly online, might beats right every time.

      =Smidge=

      --
      Is it just my observation, or is eldavojohn an idiot?
    3. Re:Karma whore much? Or astroturfing? by Anonymous Coward · · Score: 0

      Doesn't take detective work to know that smidge207 is a known professional troll.

      He's one of the smarter ones, though.

    4. Re:Karma whore much? Or astroturfing? by LionMage · · Score: 1

      True, smidge207 does appear in that lovely list of professional trolls...

      However, you could have at least made sure you replied to the correct comment thread; instead, you replied to a comment that was a sibling of the post by smidge207.

  7. Let's be thankful by MrEricSir · · Score: 3, Insightful

    ...that the article didn't mention the word "crowdsourcing."

    My collective wisdom is that "crowdsourcing" is a stupid word, and I cringe every time I see it.

    --
    There's no -1 for "I don't get it."
    1. Re:Let's be thankful by dkleinsc · · Score: 1

      My collective wisdom is that "crowdsourcing" is a stupid word, and I cringe every time I see it.

      I disagree: It's not the word that's stupid, it's the people using the word.

      --
      I am officially gone from /. Long live http://www.soylentnews.com/
    2. Re:Let's be thankful by Anonymous Coward · · Score: 0

      I tried to be thankful, but some shmuck posted the word twice in the comments!

    3. Re:Let's be thankful by Anonymous Coward · · Score: 0

      I think it's probably both. Can't we all just agree on something?

  8. I have the thing... by Anonymous Coward · · Score: 0

    It's actually a really good book. It really does cover many details of how to harness the wisdom of crowds (despite the comments above).

    1. Re:I have the thing... by nizo · · Score: 0, Redundant

      Man I'm sold; Anonymous Coward has never steered me wrong before!

  9. Chapter 11 of the book talks about Lucene... by tcopeland · · Score: 2, Interesting

    ...I've been using Sphinx a lot recently and have been really pleased with it. The indexer is fast, there's good Ruby on Rails integration, and I don't worry about scalability since if it's good enough for craigslist it's good enough for me. Definitely worth a look for your next project that needs to do full text search.

    For a quick demo of it, do some searches here.

  10. Roland's still posting? by ovu · · Score: 2, Informative

    that's a neat trick!

  11. How does this book compare to by Anonymous Coward · · Score: 0

    Programming Collective Intelligence by O'Reilly? I'm surprised that there are no comparative statements.

    1. Re:How does this book compare to by Anonymous Coward · · Score: 2, Informative

      In Programming Collective Intelligence you actually learn to build the engine for stuff like classifiers. This one abuses frameworks, and still is bigger (page count, LOC).
      I'm buying it, because I'm interested in the subject, but from what I've seen, the Python one is the one to have.
      This is just a Java version of it, catering to the ones who are too lazy to leave their Java comfort zone.

  12. Matrix inversion by mesterha · · Score: 1

    Not to be pedantic, but matrix inversion is not exponential in n, the size of the matrix. If you want to solve Ax=b, LU decomposition is roughly 3 times faster than matrix inversion. Perhaps you're thinking of Cramer's rule, which is exponential. Also, it can be hard to solve Ax=b exactly, and I vaguely remember that the size of the answer as rational numbers can require exponentially more bits than the input, though I can't find a reference.

    --

    Chris Mesterharm
    1. Re:Matrix inversion by dido · · Score: 1

      FYI, you can do matrix inversion using either Gauss-Jordan elimination or LU decomposition, and whether you use these algorithms to solve a linear system or to invert a matrix they are both O(n^3), although you are correct that with either algorithm the constant factors involved for solving a linear system are lower than that for inverting the associated matrix of the linear system. You should only calculate the inverse of a matrix A if you expect to solve the linear system Ax = b for many different values for the column vector b, since multiplying a square matrix to a column vector can be done in quadratic time.

      The Strassen algorithm for matrix multiplication can also be adapted to do matrix inversion, and it gets O(n^2.808...) running time, but it's not numerically stable and the overhead is high enough that it's not worth doing except for large matrices, and in such cases you may be better off using Gauss-Seidel iteration instead.

      --
      Qu'on me donne six lignes écrites de la main du plus honnête homme, j'y trouverai de quoi le faire pendre.
    2. Re:Matrix inversion by mesterha · · Score: 1

      Well I was more interested in exact computation or at least relative error bounded computation. An ill-conditioned matrix is going to give problems for any type of elimination based algorithms. Perhaps iterative techniques can give decent bounds on relative accuracy while still using double precision floats. If I want exact answers, it might be the case that the best algorithm is exponential if I use rational numbers to represent the inputs. Of course, if I start using infinite precision computation, I could use an elimination algorithm and just extend the precision enough to compensate for the lost bits.

      --

      Chris Mesterharm
  13. Motley Fool CAPS? by scorp1us · · Score: 1

    I've often thought that Motley Fool's CAPS was ripe for the picking. Collective intelligence of investors - and they even are rated!

    When will there be a Motley Fool CAPS fund that uses the investor intelligence?

    For that matter, where is the utility to make my trades automatic based on investor intelligence?

    --
    Slashdot's rate-of-post filter: Preventing you from posting too many great ideas at once.
    1. Re:Motley Fool CAPS? by Richy_T · · Score: 1

      Meh. Even Investors don't use the investor's intelligence

  14. None of us. by sycodon · · Score: 1

    Collective Intelligence: None of us is as dumb as all of us.

    --
    When Fascism comes to America, it will call itself Anti-Fascism, and tell you to give up your guns.
  15. collective intelligence inaction by Grendol · · Score: 1

    Isn't that what committees and legislative bodies do?

  16. I actually read the book by Anonymous Coward · · Score: 0

    but that disqualifies me from leaving a meaningful comment on slashdot.

  17. Another good book by vorpal22 · · Score: 3, Informative

    I haven't read Collective Intelligence in Action, but I have studied machine learning at university and then read Toby Segaran's Programming Collective Intelligence (linky), which I found to be an excellent, highly accessible book for learning the basic concepts of ML in a practical setting and with immediate uses being highlighted.

    Given the author's description, I'm glad that I chose Segaran's book: the programming language of choice is Python, which results in very short and readable, fully functional code samples, and builds right up from core concepts instead of hiding a lot of the underlying machinery using something like JDM. Reading example code written in Java (unless the code is specifically chosen to illustrate Java or Java APIs) sounds rather tedious.

    Collective Intelligence in Action also sounds like it might try to be too far reaching, e.g. focusing on the data model for the problem instead of on the machine learning itself. Segaran's book was strictly focused on ML, and was a very nice, informative read.

    Just thought I'd throw an alternative out there for anyone interested in machine learning, which I highly recommend studying. It's a really interesting field with loads of applications.

    1. Re:Another good book by Anonymous Coward · · Score: 0

      If you liked Toby's book then you might be interested in this one too:

      http://www.manning.com/marmanis/

      Excellent coverage of search, recommendations, and classification!

      I thoroughly enjoyed it ...

  18. Language-specificity by Slippery+Slope+Man · · Score: 1

    A question to anyone who's read this book-- is knowledge of Java absolutely required, or is it general enough to somewhat easily use with a different language? I mean, it seems relevant to some of my personal interests but I'd like to know whether I need to give myself a crash course in Java or not.

  19. I'd like to see anyone... by OneSmartFellow · · Score: 1

    ...try to use this to predict the next Lottery numbers, or even choose stocks.

    1. Re:I'd like to see anyone... by fava · · Score: 1

      Are you not then making some assumptions about the collective intelligence of the average lottery player?

  20. Wow! What a fancy name... by Jane+Q.+Public · · Score: 1

    ... for Data Mining.

  21. Book Suggestion by Ukab+the+Great · · Score: 1

    I'm not big on Programming Collective Intelligence, but I'd love a book on Debugging Collective Stupidity.

    1. Re:Book Suggestion by kalirion · · Score: 1

      What's there to debug? Terry Pratchett has already found the problem - "The IQ of a mob is the IQ of its most stupid member divided by the number of mobsters."

    2. Re:Book Suggestion by the+eric+conspiracy · · Score: 1

      That's way optimistic. It is not an O(n) denominator!!! It isn't even polynomial.

  22. Wisdom of Crowds by mahadiga · · Score: 1

    Collective Intelligence in Action

    Is this different from The Wisdom of Crowds?
    IMO, Wisdom of Crowds works only with Altruists!
    Hence Democracy is NOT Wisdom of Crowds

    --
    I'd like to buy homeland for our 10 million people. http://twitter.com/mahadiga
    1. Re:Wisdom of Crowds by PPH · · Score: 1

      Another thing that's pointed out in the text you referenced is that this collective wisdom depends on the inputs of a number of independent individuals. If some form of consensus building or opinion shaping has occurred prior to your collecting the data, the outcome may no longer be correct.

      Applying this principle to various attempts at data mining means that you have to do quite a bit of analysis of your collection methods, prior to hacking together some code, to ensure that the data is untainted. That's a major problem in the collection of data in the social sciences and one that I'm not certain (based upon a minimal perusal of TFA) is rigorously addressed in this text.

      --
      Have gnu, will travel.
  23. Coincidence? by anomalous+cohort · · Score: 1

    I'm reading this same book now. I haven't gotten very far into it yet but so far so good. There is a somewhat intelligent use of diagramming including flowcharting and class diagrams. There are statistics formulas with examples. It's not all dry, though. There are screen shots of LinkedIn and Digg and descriptions of how to incorporate or embed collective intelligence style features.

  24. Who is in the Group? by aoheno · · Score: 2, Funny

    Does this mean the collective intelligence derived from a group of Bankers will provide new ways of tanking entire global economies?

    --
    Her lips were softer than a duck's bill, but her quacks ...