Slashdot Mirror


Text Mining the Multiverse

The NYT has a decent piece about text-mining, skimming large volumes of miscellaneous text to extract some sort of refined knowledge from it.

3 of 137 comments

  1. I didn't read the article by Mattwolf7 · · Score: 2, Insightful
    Why does Slashdot keep linking to articles that require NYT registration? Isn't there some sort of Google News out there?

    (Yes I am a lazy /. reader)

  2. Well, DUH! by djeaux · · Score: 2, Insightful
    "How well computers truly make sense of what they are reading is, of course, highly questionable, and most of those who use text-mining software say that it works best when guided by smart people with knowledge of the particular subject."

    May I offer that computers make no sense of what they are reading & that "smart people with knowledge of the particular subject" aren't optional if the results of text-mining are to be of any use whatsoever, at least in any kind of reasonable time frame.

    Otherwise, the text-mining computer is playing the old "99 monkeys with typewriters" game...

    --
    "Obviously, I'm not an IBM computer any more than I'm an ashtray" (Bob Dylan)
  3. but what about the data itself? by koekepeer · · Score: 2, Insightful

    i always wondered about this

    all right, you can take huge amounts of text and apply some smart tricks to extract patterns from it.

    but how can you determine whether the original data was trustworthy?

    take the example of genome annotation (description of gene function), which would be helped greatly by including more functional descriptions from scientific literature. how do you determine whether the original publication was backed by solid experimental research?

    by the reviewers of the articles? i don't think so, peer review is a snakepit filled with politics. by the number of people who cited it? hmmmm... so hip subjects are more true?

    personally, because i'm experienced, i can recognise bullshit articles when i see them. but how do you translate this into an algorithm... does anyone have any ideas about this? or even working solutions?

    (of course this is an example from my field of expertise - biology, but it applies to any set of text data/articles IMO)
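
The worry about citation counts can be made concrete with a toy sketch. Everything here is invented for illustration: the `papers` records, the topic labels, and the per-topic normalisation are assumptions, not anything from the article or from a real annotation pipeline. The sketch only shows that ranking by raw citations pushes papers on popular topics to the top, while dividing by the topic's mean citation count is one possible (and still crude) adjustment. Neither score says anything about whether the underlying experiments were sound.

```python
from collections import defaultdict

# Hypothetical toy records: (paper_id, topic, citation_count).
# The numbers are made up purely to illustrate the ranking effect.
papers = [
    ("A", "hot-topic", 120),
    ("B", "hot-topic", 80),
    ("C", "niche-topic", 6),
    ("D", "niche-topic", 2),
]


def raw_rank(records):
    """Return paper ids ordered by raw citation count, highest first."""
    ordered = sorted(records, key=lambda r: r[2], reverse=True)
    return [paper_id for paper_id, _, _ in ordered]


def topic_normalised_rank(records):
    """Return paper ids ordered by citations relative to their topic's mean."""
    # Collect the citation counts seen for each topic.
    by_topic = defaultdict(list)
    for _, topic, cites in records:
        by_topic[topic].append(cites)

    # Mean citations per topic, used as a rough popularity baseline.
    topic_mean = {t: sum(v) / len(v) for t, v in by_topic.items()}

    # Score each paper against its own topic's baseline.
    scored = [(pid, cites / topic_mean[topic]) for pid, topic, cites in records]
    ordered = sorted(scored, key=lambda s: s[1], reverse=True)
    return [pid for pid, _ in ordered]


if __name__ == "__main__":
    print("raw:       ", raw_rank(papers))
    print("normalised:", topic_normalised_rank(papers))
```

With raw counts both hot-topic papers outrank both niche papers; after normalisation the best-cited niche paper moves ahead of them. That supports the point being made: citation counts measure attention within a field, so some other signal would still be needed to judge trustworthiness.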