Slashdot Mirror


An App to Boil Down Online User Reviews

An anonymous reader writes "Is this a glimpse at the future of the Semantic Web? A new startup named Pluribo has developed a technology that can auto-summarize user reviews on the internet. It is a Firefox extension that can take a webpage filled with reviews and condense it down into a couple of sentences. Currently, it just works with Amazon electronics, but the potential seems incredible. Ars Technica took an in-depth look."

9 of 82 comments

  1. Okay, now here's a request: by Penguinisto · · Score: 4, Interesting

    ...is there any way to have it filter out the obvious astroturfers and trolls?

    Seriously, any big-name product or service will have a coterie of fanboys (or paid astroturfers) who will praise something no matter what, and a flock of trolls who will point out everything wrong with it, no matter what.

    ...now how do you filter those out?

    Do that, and it'd be one hell of an advancement in filtering. :)

    /P

    --
    Quo usque tandem abutere, Nimbus, patientia nostra?
    1. Re:Okay, now here's a request: by Dachannien · · Score: 4, Interesting

      I was going to post something similar. I was apartment shopping earlier this year, and the amount of astroturfing and astrotrolling* was incredible.

      Any filter that decreases the amount of information that I can use to evaluate the "truthiness" of a review is a bad thing. What's more, if filters like this catch on, people will be selling FEO (filter engine optimization) services to game the filters with their astroturf, and then the reviews will become completely useless.

      * In case I just made up a word, what I mean by "astrotrolling" is people who post shit about a product to get people not to buy it because they have a separate axe to grind against the seller. In the case of apartments, it's often poor tenants who tore up the apartment/broke the lease/got evicted and still amazingly expected their security back.

  2. It's not summarization. by melted · · Score: 3, Interesting

    It's heavily templatized generation of language based on the automatically extracted sentiment data. The important difference here is that the language of the summary does not include phrases from the original user reviews. While this is a new twist on the old problem, automatic extraction of evaluation criteria and sentiment analysis in product reviews are not new. Heck, even Microsoft has a working system for that (electronics only):

    http://search.live.com/products/?q=nuvi%20350%20GPS%20-%20Asian%20American%20(City%2FVehicle%2C%203.5%22%20LCD)&p1=%5BCommerceService+scenario%3D%22reviews%22+docid%3D%222BECBBF6F17C98618C2E%22+p%3D%2220df8fe62a9b4e9490993ff7b91032af%22%5D&wf=Commerce&FORM=ENCA

    See the bars on the left, and be sure to click through to the individual sentences. It's spooky how accurate that thing seems to be.

    The problem with all these systems is that they're heavily domain dependent. You will use different language to write a review of a book than for a kitchen appliance. In fact, you may even use different language for different kinds of books or different kinds of kitchen appliances. Worse yet, some things are notoriously difficult to accurately measure sentiment on. Once innuendo and sarcasm become frequent, all hope is lost - you need strong AI to figure that out.

    This is not to say these systems are useless - to the contrary, they are very useful in their respective domains. This is just to say that the only new thing I see here is the generated blurb.
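
A minimal sketch of the kind of templated generation described in this comment, assuming a sentiment extractor has already produced a score in [-1, 1] for each product aspect. The aspect names, thresholds, and template wording are all hypothetical; nothing here reflects how Pluribo or the Microsoft system actually work. Note that no phrase from any original review appears in the output, which is the distinction the comment draws.

```python
def describe(aspect, score):
    """Pick a fixed template from the sign and strength of the score."""
    if score >= 0.5:
        return f"Reviewers praise the {aspect}."
    if score <= -0.5:
        return f"Reviewers complain about the {aspect}."
    return f"Opinions on the {aspect} are mixed."


def summarize(scores):
    """Build a blurb from aspect scores, strongest opinions first."""
    ranked = sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return " ".join(describe(aspect, score) for aspect, score in ranked)


# Hypothetical extractor output for one product.
scores = {"battery life": -0.7, "screen": 0.8, "price": 0.1}
print(summarize(scores))
```

The templates are where the domain dependence shows up: a blurb written for electronics would need different aspects and wording for books.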

  3. Keywords in Context by The+Raven · · Score: 2, Interesting

    I'm concerned about how well the plugin discerns the meaning of words in context. For example, just because I mention that a product is 'a portable laptop' does not mean I am impressed with its size or weight... it's just the category the product falls in. But judging from the screenshots in the article, this exact error was committed by the plugin.

    Reading natural language is hard, and I'm of the opinion that a Firefox plugin just won't cut it for understanding the nuanced opinions given by reviewers.
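
The 'portable laptop' error can be reproduced with a toy keyword matcher. This is a sketch under stated assumptions: the praise lexicon and the list of category nouns are invented for illustration, and the "fix" is a crude one-word lookahead, not a claim about how the plugin works or should work.

```python
import re

POSITIVE = {"portable", "light", "fast"}         # hypothetical praise lexicon
CATEGORY_NOUNS = {"laptop", "drive", "speaker"}  # hypothetical product categories


def words_of(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def naive_praise(text):
    """Treat every lexicon hit as praise, ignoring context."""
    return [w for w in words_of(text) if w in POSITIVE]


def contextual_praise(text):
    """Skip a lexicon word when it directly precedes a category noun."""
    words = words_of(text)
    hits = []
    for i, w in enumerate(words):
        following = words[i + 1] if i + 1 < len(words) else ""
        if w in POSITIVE and following not in CATEGORY_NOUNS:
            hits.append(w)
    return hits


review = "This is a portable laptop. Surprisingly fast."
print(naive_praise(review))       # counts "portable" as praise
print(contextual_praise(review))  # keeps only "fast"
```

Even the second version is easily fooled, for example by negation ("not fast"), which is the commenter's point about nuance.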

    --
    "I will trust Google to 'do no evil' until the founders no longer run it." Hello Alphabet.
  4. MacOS X: How to Summarize the Contents of Document by mattkime · · Score: 3, Interesting

    This document explains how to use the Summarize services available in Mac OS X applications.

    If you have a long document, you can use the Summarize service to get a summary of the contents. For example, use this to get a short version of a long page on a Web site.

    To get a summary of a document, select the text and choose Services from the application's menu, then choose Summarize.

    If the application you are using doesn't support services, copy the text to a TextEdit document to get a summary.

    Note: The information in this document comes from Mac OS X Help, the help system included with your computer. It is based on Mac OS X 10.1.2. If a different version is installed on your computer, choose Mac Help from the Help menu. Updated and expanded information may also be available in other Knowledge Base documents.

    http://docs.info.apple.com/article.html?artnum=61336

    (but i know this feature was in OS 9 or earlier)

    --
    Know what I like about atheists? I've yet to meet one that believes God is on their side.
  5. astroturfers and trolls by bcrowell · · Score: 3, Interesting

    I run a site that catalogs free books, and accepts user-submitted reviews. (See my sig.) It's a constant source of amazement to me what a low level of morality (and intelligence) some authors have. They'll add their book to the catalog even though it's not free. (The site's UI tells them very clearly that it has to be free online in order to be listed.) Then they'll post their own "review" of the book, which reads exactly like a dust-cover blurb rather than a review. Then I check the email address they used to sign up on the site, and it's the same as the email address of the author of the book -- this despite the fact that the button they had to click on to submit their review was labeled "I am not the author, and have no personal, professional, or business relationship with the author. I am submitting my review."

    About 50% of the reviews I get are like this, and I have to delete them by hand. I don't actually get that many reviews submitted, which is a good thing in a way, because if the site was really busy I'd never be able to keep up.

    I don't think there's any way of solving this problem, since the internet was designed for anonymous use, and even if it was technically feasible to verify identities on the internet, I wouldn't want to do it. Amazon tries fairly hard to deal with this problem. These days they won't let you submit reviews unless you've bought something from them, which is probably a reasonable way to stop sock puppets. They also try to get you to build up a reputation for your online persona, even if it's not publicly tied to a meatspace identity. That doesn't really work that well, though. For instance, there are certain people on Amazon who submit something like ten reviews per day, 365 days per year -- obviously they're not really reading all those books. I also don't see any way to stop the phenomenon of the author getting his friends, family, and grad students to write good Amazon reviews of his book.

    Because of all this, I'm suspicious of any statistical method of analyzing user-submitted reviews. You just have no way of knowing which reviewers are honest. You really have to look at the individual reviews and see if what they say makes sense. eBay feedback is an example of how silly this can all get, even in a community where people really do have long-term online identities that they have an interest in maintaining good reputations for. What the heck does it tell you if the seller has 99% positive feedback? Absolutely nothing. You have to read the 1% negative reviews and try to evaluate whether they sound reasonable.
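
The two manual checks mentioned in this comment (reviewer email matching the author's, and implausibly prolific reviewers) are easy to automate, though only for the careless cases. A rough sketch, assuming reviews are available as (reviewer, date) pairs; the function names, the data layout, and the default threshold of five reviews per day are all made up for illustration.

```python
from collections import Counter
from datetime import date


def is_self_review(reviewer_email, author_email):
    """Flag a review whose signup email matches the book author's email."""
    return reviewer_email.strip().lower() == author_email.strip().lower()


def prolific_reviewers(reviews, max_per_day=5):
    """Return reviewers who posted more than max_per_day reviews on any one day.

    reviews is an iterable of (reviewer, date) pairs.
    """
    per_day = Counter((who, day) for who, day in reviews)
    return sorted({who for (who, _), n in per_day.items() if n > max_per_day})


# Hypothetical data: one reviewer posts ten reviews in a day, another posts one.
reviews = [("speedreader", date(2008, 10, 1))] * 10 + [("casual", date(2008, 10, 1))]
print(prolific_reviewers(reviews))
print(is_self_review("Author@Example.com ", "author@example.com"))
```

Neither check touches the harder cases raised above, such as an author's friends and family writing the reviews, so it does not change the conclusion that the individual reviews still have to be read.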

  6. State of research in the area by Anonymous Coward · · Score: 1, Interesting

    NIST runs yearly evaluations regarding automatic summarization. Some information about that stuff is available at http://www.nist.gov/tac/ (used to be http://duc.nist.gov/).

    There are two main approaches: domain-dependent template filling plus text generation, or domain-independent statistical sentence extraction. Either way, the quality of the generated summaries is still far from what a human can write.

    While machine-learning-powered research systems are much better than Word or OSX summarization, there is still a long way to go...
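
For the second approach, domain-independent sentence extraction, here is a deliberately simple sketch: score each sentence by the average corpus frequency of its non-stopword tokens and keep the top scorers. The stopword list, the regex-based sentence splitter, and the scoring rule are assumptions chosen for brevity; the research systems evaluated at NIST are far more sophisticated than this.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "is", "it", "and", "of", "to", "in", "but", "this"}


def tokenize(sentence):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def extract_summary(text, n=1):
    """Return the n highest-scoring sentences, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(sentences, key=score, reverse=True)[:n]
    return [s for s in sentences if s in top]


text = "The battery is great. The battery lasts all day. Shipping was slow."
print(extract_summary(text, n=1))
```

Unlike template filling, the output consists only of sentences that reviewers actually wrote, so it carries over to new domains but inherits whatever the reviewers said, astroturf included.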

  7. Re:One problem by linzeal · · Score: 3, Interesting

    That is why geology geeks use different phones that may not be all fancy and la dee da but they can be dropped from moving vehicles, be submerged in water/beer and still work to call into town for more supplies/beer. I do not understand the fascination with cell phone internet access. The majority of use I see for it is people in bars looking up statistics to win arguments or endless texting and picture sending from giggling girls. I have a GPS navigation system in the car as well as a laptop for full blown internet access when I need it. 95% of the time I have no desire or reason to be online when I am away from home as I have tens of thousands of textbooks and reference books that I can use on my ebook reader. Thank you cheap SD cards.

  8. Methinks it would make it worse by Moraelin · · Score: 3, Interesting

    Since they explicitly mention Amazon, heh, my experience with Amazon's user reviews has been pretty bad to start with. Caveat: it's not about electronics, but I do buy games and the occasional DVD movie off Amazon.

    My impression is that the amount of fanboyism, astroturfing and bullshit is... epic. Monumental.

    E.g., read some reviews for a game that's not released yet. My favourite example was Gothic 3, when it wasn't even in beta yet, or even alpha. The only thing anyone had were some screenshots of what the graphics engine can do. That's it. Nobody had anything playable yet, probably not even the devs.

    Well, people were already writing reviews in which it's the greatest game ever, and the gameplay rules, the graphics are the best since Michelangelo, etc.

    When released, the game was a buggy mess that didn't even vaguely resemble those "reviews". The graphics had some major glitches. Quests could be broken because the NPC had fucked off, and I know someone who encountered that right in the freaking intro. The game had a nasty memory leak, where eventually it would start to barely crawl and eventually crash... often while saving, leaving you with a corrupt and unusable saved game. Gameplay too was a broken fuckup: e.g., combat was a broken whoever-hit-first-wins affair, because then the other would be continuously interrupted and unable to hit back or change weapons or whatever. Even a flea could probably kill you, if it hit first. Etc.

    Most of that stuff _still_ hasn't been fixed, after more than a dozen patches and the publisher giving up on it.

    But, of course, going by the user reviews, you'd think it's the greatest game ever.

    Now as a human, you can filter out the blatant bullshit, see which reviewers better reflect your taste and didn't post too much bullshit before, etc. I'm skeptical that a program can be anywhere near as good at doing the same.

    But I have an even worse fear: that once people figure out that they only need to game a program, and how, we'll see even more fanboyism, astroturfing and bullshit. Plus an army of sock-puppets to mod each other up, if the bot takes that into account. Basically, think about all the link farms and link spam on the net to game Google's PageRank. Now think the same for a bot aggregating reviews. I find that scary.

    So, no, I don't want it on Slashdot too. Basically, would you really want 300 goatse links, just so the bot includes it in the digested version?

    --
    A polar bear is a cartesian bear after a coordinate transform.