Slashdot Mirror


Boiling Down Books, Algorithmically

destinyland writes "A year ago, Aaron Stanton harangued Google over his new project, a web site analyzing patterns in books to generate infallible recommendations. In March he finally finished a prototype which he showed to Google, Yahoo, and Amazon, and he's just announced that he's finally received a big contract which 'gives us a great deal of potential data to work with.' The 25-year-old's original prototype examined over 200 books, plotting 729,000 data points across 30,293 scenes — but its universe of analyzed novels is about to become much, much bigger."

22 of 177 comments

  1. Just one more erosion.... by zappepcs · · Score: 5, Insightful

    The difference between now and 100 years ago becomes more apparent each day. Then, owning books was a sign of affluence, of intelligence. Now? Everything is open to question, and should be. Analyzing books and other public material is just another step in putting intelligence out there for everyone, not just those who can afford it. I applaud it, and all the dangers it brings. Such hurdles are necessary, but we must assault them to overcome barriers that should no longer exist.

    1. Re:Just one more erosion.... by Anonymous Coward · · Score: 5, Insightful

      Knowledge, not intelligence.

    2. Re:Just one more erosion.... by blahplusplus · · Score: 5, Insightful

      What really hits a nerve with me is why the scientific community hasn't opened up all their journals for others to read. I imagine many retired and amateur scientists, engineers, hobbyists, etc. would have a lot of insight into many engineering and scientific problems and would make many discoveries as well. Intelligence is not limited to the credentialed, those of high status, or the currently employed; many discoveries happen simply through exposure to as many minds as possible, and through finding connections and errors in others' work.

    3. Re:Just one more erosion.... by dnwq · · Score: 4, Informative

      The researchers publishing these papers typically don't get much more than citations - the money mostly goes to publishers like Elsevier. Blame them instead.

    4. Re:Just one more erosion.... by Sir Holo · · Score: 5, Informative

      blahplusplus: What really hits a nerve with me is why the scientific community hasn't opened up all their journals for others to read.

      We scientists would absolutely love to have all of the journals opened up for free access to everyone. But, you see, the publishers own the copyright to our articles. The system requires us to give them the copyright in order to get our stuff published. Then you, I, and everybody else have to pay to read recent research.

      Thankfully, some established journals are going open-access.

      That's very promising. But the fact remains that publishers such as Elsevier own the copyright to many decades-worth of scientific literature. And they're not about to give any of it away.

    5. Re:Just one more erosion.... by Skreems · · Score: 4, Insightful

      If you're talking about news, you're correct. But the original article is applying this to works of fiction, which still take a decade or more to go out of date, despite the internet and the hard-on you appear to have for it. This "invention" is not about freeing information; it's basically a fancy way to mathematically calculate that if you like The Hobbit, you might also like The Lord Of The Rings. It might be beneficial to someone looking for more of the same, but it doesn't even seem to further creativity, since by design it will not recommend things that expand your horizons. Instead, it will encourage people to stay with the safety of yet another rehash of something they've already read.

      --
      Slashdot needs a "-1, Wrong" moderation option.
      The Urban Hippie
    6. Re:Just one more erosion.... by smittyoneeach · · Score: 4, Insightful

      Or wisdom, for that matter.

      --
      Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
    7. Re:Just one more erosion.... by smittyoneeach · · Score: 5, Insightful

      If you wish to spend your nights reading information from 2+ years ago, that is your problem. The rest of us want today's information, and now. Good luck with the personal library.

      It's getting to the point that you need a 2+ year filter just to dampen the noise in the signal.

      And let's give a shout out to all of the library homiez. While I'm affluent enough to afford the occasional impulse book at the store with the built-in coffee shop, I do recall many an hour of random wandering in the public library in my youth.

      --
      Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
    8. Re:Just one more erosion.... by Wrath0fb0b · · Score: 4, Insightful

      I wish it weren't so (and I submit all my papers to http://www.arxiv.org/ as well as to the journals), but the fact is, closed journals provide significant value both to the reader and to the submitting author. I'm not really trying to defend the system here, by the way; I'm just trying to explain what purpose it serves (and what an open alternative would have to match).

      Referees and Peer-Review: Referees are invaluable because someone has to objectively assess articles for basic scientific merit and rigor. The better journals can recruit referees for each submission who truly grok the subject matter and can often work very productively with an author. Quite a number of important advances are made and pitfalls avoided because a referee insisted that a researcher cover her bases before publication. Of course, nobody claims that peer-reviewed journals are bullshit-free, but they are certainly far better than unreviewed sources like arXiv.

      This function is especially important for readers in multidisciplinary fields (myself included) who often read papers on subjects in which we are not expert enough to know what constitutes sound science. When I read about some group that has extracted and crystallized some protein, I'd like to know that someone competent at the relevant techniques has scrutinized their methods, because I haven't the faintest clue (I'm a physicist by training, a biophysicist by necessity).

      Prestige and Selection: Another important function of the journals is to select articles by importance. If a paper makes Nature or Science, that's usually a good indicator that the authors have made an important advance. The benefits of this selection are twofold. First, readers can keep tabs on work at the forefront without wading through lots of papers. It sounds lazy, but most of us cannot read every paper that is published and are quite glad to outsource some filtering to the journals.

      Second, it allows authors to demonstrate to people outside their immediate field what caliber of work they've done. Even among people in the same department, it's not immediately clear what qualifies as breakthrough work (as opposed to incremental work, which I don't trash in the least, but it's not really the same thing). Publications in prestigious journals are a good substitute, especially when the alternative is to either become an expert in the field or find one and ask.

      Review Articles: Most journals have an in-house staff to write articles reviewing the state of a particular field/technique/whatever. This is also an invaluable service, because sometimes one needs a broad, textbook-level summary instead of a large number of discrete, deep papers on a topic. Given that science is done in small, insular bits, it's natural that there is room for someone to aggregate and summarize those bits and put them into a larger perspective.

      Editing: Another thankless job (the snarky comments about the /. eds belie the fact that editing is hard work). Dupes are weeded out, and researchers with poor language skills (especially when writing in an adopted language) are given help communicating their ideas. Confusing or unclear language is massaged back into form, figures are well-presented and well-labeled, text is formatted to be easy on the eyes, and references are given in a standard form. These things count more than most /.ers realize (Knuth was on to something, guys...).

      Access: In all brutal honesty, we don't really care about the access restrictions. Every university has a license to pretty much all the major journals. We can get them from wherever with a quick login, and so can everyone we know. Sorry, but that's the truth.

    9. Re:Just one more erosion.... by Man On Pink Corner · · Score: 4, Insightful

      I dunno, man. Pretty much every point you covered is Wiki-able.

    10. Re:Just one more erosion.... by Free the Cowards · · Score: 4, Insightful

      Much better to blame the researchers for not publishing in a more open medium. They're the ones who might actually change their habits, after all.

      --
      If you mod me Overrated, you are admitting that you have no penis.
    11. Re:Just one more erosion.... by Z34107 · · Score: 5, Funny

      I dunno, man. Pretty much every point you covered is Wiki-able. [Citation needed]

      --
      DATABASE WOW WOW
    12. Re:Just one more erosion.... by Virtual_Raider · · Score: 5, Interesting

      the idea of finding books you should read but don't know about seems a problem particularly poorly suited to an automated solution.

      Er... -1,Wrong* : You don't seem to be considering the impact of statistical analysis and Very Large Sets of Data (C)(TM). It's becoming increasingly possible not only to know that 125K other people all over the world bought books B, C and D along with book A that you purchased, but now you can also index and analyse their content so it will be even easier to fine tune.

      Imagine this: on the first iteration (first purchase), it can only recommend, out of the blue, the books most consistently purchased along with the one you chose. But on subsequent transactions it can remember what you bought and compare the contents of the books. Now if you bought The Silmarillion, Kontakto and The Unfolding of Language over time, it would be possible to suggest that you read Shakespeare's works in their original Klingon once it realizes that you are equally interested in languages and in fictional civilizations.

      I agree with you that the day an algorithm can make value judgements on the artistic merits of any work is still far off, but there was just recently a story about a Firefox plug-in that summarizes user reviews. Combine the two and...

      * Didn't we have this conversation before, or is it just a popular .sig? If there were a "-1, Wrong" moderation, you would be told that the info is wrong, but you would lose any insight provided by a direct reply from somebody who bothers to correct you AND post the right facts. With Slashdot being a discussion forum, it's in its best interest to actually promote discussion, so you will most likely never see that mod option implemented.

      --
      +Raider of the lost BBS
    13. Re:Just one more erosion.... by h4rm0ny · · Score: 4, Funny

      What about insight?

      What about synonyms?

      --

      Aide-toi, le Ciel t'aidera - Jeanne D'Arc.
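The first stage Virtual_Raider describes ("recommend those books more consistently purchased along with the one you chose") amounts to co-purchase counting. Here is a minimal Python sketch, assuming purchase histories are available as lists of titles; the function names and sample baskets are hypothetical and say nothing about how the project in the story, or any bookstore, actually works:

```python
from collections import Counter
from itertools import combinations


def co_purchase_counts(baskets):
    """Map each title to a Counter of titles bought in the same basket."""
    counts = {}
    for basket in baskets:
        # Deduplicate and sort so each unordered pair is counted once per basket.
        for a, b in combinations(sorted(set(basket)), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts


def recommend(counts, title, already_owned=(), k=3):
    """Return up to k titles most often co-purchased with `title`, skipping owned ones."""
    ranked = counts.get(title, Counter()).most_common()
    return [other for other, _ in ranked if other not in already_owned][:k]


# Hypothetical purchase histories, one list of titles per customer.
baskets = [
    ["A", "B", "C"],
    ["A", "B"],
    ["A", "D"],
    ["B", "C"],
]
counts = co_purchase_counts(baskets)
print(recommend(counts, "A", k=2))                  # ['B', 'C']
print(recommend(counts, "A", already_owned={"B"}))  # ['C', 'D']
```

Titles with equal counts keep the order in which they were first seen, because `Counter.most_common` sorts stably. The second stage, comparing the contents of books a customer already owns, would need a per-book content representation on top of this and is not sketched here.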
  2. Newspeak by RDW · · Score: 4, Funny

    I love how the prototype version in the link gives a 98% match between George Orwell's '1984' and the text of the USA Patriot Act!

    1. Re:Newspeak by drinkypoo · · Score: 4, Funny

      They're still working out that last 2% margin of error.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    2. Re:Newspeak by log1385 · · Score: 5, Informative

      From the FAQ:
      "Does 1984 really match the U.S. Patriot Act?
      No, that is an easter-egg. A bit of a joke on our part."

      --
      Seek and ye shall find.
  3. If you already read, you don't need this... by thereofone · · Score: 5, Insightful

    ...and if you do not read, you won't want this.

  4. I'll believe it when I see it by clarkkent09 · · Score: 4, Insightful

    I am skeptical that analyzing the content of books can lead to good recommendations, let alone "infallible" ones. Two books can be very similar in subject matter and writing style, and yet one can be great and the other awful. The difference is just too subtle for an algorithm to figure out. I hope I am wrong and it turns out to work, because it would be very useful. The same applies to movies and music. I have always found the "Customers who purchased this book also purchased...." section on Amazon to be more valuable than my personalized recommendations.

    --
    Negative moral value of force outweighs the positive value of good intentions.
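clarkkent09's objection is easy to see in a content-based scheme. This minimal Python sketch assumes each book has already been reduced to a numeric feature vector (the three features, the titles, and every value are invented for illustration and do not reflect how the project in the story scores books) and ranks books by cosine similarity. A derivative imitation lands right next to the book it imitates, because nothing in the vector measures quality:

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(target, catalog, k=2):
    """Rank every other title in `catalog` by cosine similarity to `target`."""
    scores = [
        (title, cosine_similarity(catalog[target], vec))
        for title, vec in catalog.items()
        if title != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Hypothetical per-book features: (pacing, dialogue share, action density).
catalog = {
    "Great Novel": [0.80, 0.30, 0.60],
    "Derivative Novel": [0.79, 0.31, 0.60],
    "Unrelated Novel": [0.10, 0.90, 0.20],
}
for title, score in most_similar("Great Novel", catalog):
    print(f"{title}: {score:.3f}")
```

The score says only that two books look alike on the chosen measures, not that either is any good; a quality signal would have to come from elsewhere, such as the co-purchase data he prefers.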
  5. Yet Another Pointless Dot-Com by techno-vampire · · Score: 4, Insightful

    This is just another pointless project that's going to waste the time and skull-sweat of a good but unrealistic programmer. All he's going to have when he's done is the solution to a problem that doesn't, for all practical purposes, exist. Good writers won't need it because they know what to do and how to do it, so they won't use it. It will only be used by poor writers, who won't know how to put the suggestions into effect properly. It may, possibly, tell a writer where their book needs work, or where it's not interesting enough, but I doubt it. Most likely, all it will do is tell the writer where the book is not like other successful books, because it won't be able to recognize or take into account any originality. Even if its recommendations are right, a poor writer is highly unlikely to profit from them, because by definition a poor writer won't know which suggestions are good or have the skills to take advantage of them properly. No, what a poor writer who wants to get better needs is either a good critique group or some friends who will act as beta-readers, telling him not only what doesn't work but why (something, I might add, that I find it hard to believe this program could ever do), and who will discuss things with the author until they understand each other. Mechanical criticism of literature can only result in mechanical literature, not good writing.

    --
    Good, inexpensive web hosting
  6. algorithm bombing by notgm · · Score: 4, Insightful

    How long before someone figures out how to fool the algorithm, and we all start reading books about enlarging our genitalia, but in a classy way?

  7. Who is Joe? by mustafap · · Score: 4, Interesting

    There is one persistent son of a bitch on their forum, Joe, who seems to be their nemesis. I wonder what his angle is.

    Other than that, I like their approach - involve the community *really* early on.

    Apart from Joe.

    --
    Open Source Drum Kit, LPLC deve board - mjhdesigns.com