Slashdot Mirror


Test Shows Big Data Text Analysis Inconsistent, Inaccurate

DillyTonto writes: The "state of the art" in big-data (text) analysis turns out to use a method of categorizing words and documents that, when tested, offered different results for the same data 20% of the time and was flat wrong another 10%, according to researchers at Northwestern. The researchers offered a more accurate method, but only as an example of how to use community detection algorithms to improve on the leading method, latent Dirichlet allocation (LDA). Meanwhile, a certain percentage of answers from all those big data installations will continue to be flat wrong until they're re-run, which will make them wrong in a different way.

1 of 60 comments

  1. Re:In other words, you're doing it wrong. by Drethon · Score: 4, Interesting

    This is what scares most people, or at least me, about ideas of using big data to predict criminals or otherwise mess up people's lives.