Slashdot Mirror


The Importance — and Limits — of Very Large Data Sets

New submitter kodiaktau writes "A recently presented paper discusses how large data sets can improve learning algorithms, but points out that researchers still need to account for bias and incompleteness before drawing conclusions. The paper also goes into the need for responsible business practices to manage these data sets. 'There's been the emergence of a philosophy that big data is all you need. We would suggest that, actually, numbers don't speak for themselves.' The full paper is available through SSRN. Of particular importance is their assertion that even huge data sets can and will be affected by filters or by the analyst who is interpreting them. '[Study co-author Kate Crawford] notes that many big data sets — particularly social data — come from companies that have no obligation to support scientific inquiry. Getting access to the data might mean paying for it, or keeping the company happy by not performing certain types of studies.'"

4 of 17 comments

  1. There's lots of data by MadKeithV · Score: 2

    There's lots of data to support this article.

  2. This is a problem with most data! by garcia · Score: 3, Insightful

    From the blurb:

    Getting access to the data might mean paying for it, or keeping the company happy by not performing certain types of studies.

    Even if you're using data from public institutions, you may still have to pay for it (to cover the staff time needed to procure the data, especially if you're asking for something they don't normally provide, which is quite often). While there won't be any limitations on what you can do with the data once you have it, the provider may not know its own data or databases well and may simply give you incomplete or inaccurate data anyway.

    So yeah, welcome to the world of using data. Move along, nothing to see here.

  3. Re:This is the most obv article ever on /. by Anonymous Coward · Score: 2, Funny

    How sure are you that your data set is adequate to make that determination?

  4. At least there ARE very large social data sets by G3ckoG33k · Score: 2

    At least there ARE very large social data sets.

    Most sociologists today tend to describe the world using 'deep' interviews with 36 people from around campus, because that way they get the result they want.

    A cynical description, yes, but not too far from the truth. So it is good to see that there ARE large data sets, somewhere.