Text Mining the Multiverse

Posted by michael on Friday October 17, 2003 @08:41AM from the mother-lode dept.

The NYT has a decent piece about text-mining, skimming large volumes of miscellaneous text to extract some sort of refined knowledge from it.

3 of 137 comments

I didn't read the article by Mattwolf7 · 2003-10-17 08:43 · Score: 2, Insightful

Why does slashdot keep linking to articles that require NYT registration? Isn't there some sort of Google news out there?
(Yes I am a lazy /. reader)

Well, DUH! by djeaux · 2003-10-17 09:13 · Score: 2, Insightful

"How well computers truly make sense of what they are reading is, of course, highly questionable, and most of those who use text-mining software say that it works best when guided by smart people with knowledge of the particular subject."

May I offer that computers make no sense of what they are reading, and that "smart people with knowledge of the particular subject" aren't optional if the results of text-mining are to be of any use whatsoever, at least in any kind of reasonable time frame.
Otherwise, the text-mining computer is playing the old "99 monkeys with typewriters" game...

--
"Obviously, I'm not an IBM computer any more than I'm an ashtray" (Bob Dylan)

but what about the data itself? by koekepeer · 2003-10-17 09:35 · Score: 2, Insightful

i always wondered about this

all right, you can take huge amounts of text and apply some smart tricks to extract patterns from it.

but how can you determine whether the original data was trustworthy?

take the example of genome annotation (description of gene function), which would be helped greatly by including more functional descriptions from scientific literature. how do you determine whether the original publication was backed by solid experimental research?

by the reviewers of the articles? i don't think so, peer review is a snakepit filled with politics. by the number of people who cited it? hmmmm... so hip subjects are more true?

personally, because i'm experienced, i can recognise bullshit articles when i see them. but how do you translate this into an algorithm... does anyone have any ideas about this? or even working solutions?

(of course this is an example from my field of expertise - biology, but it applies to any set of text data/articles IMO)