Slashdot Mirror


How Big Data Creates False Confidence (nautil.us)

Mr D from 63 shares an article from Nautilus urging skepticism of big data: "The general idea is to find datasets so enormous that they can reveal patterns invisible to conventional inquiry... But there's a problem: It's tempting to think that with such an incredible volume of data behind them, studies relying on big data couldn't be wrong. But the bigness of the data can imbue the results with a false sense of certainty. Many of them are probably bogus -- and the reasons why should give us pause about any research that blindly trusts big data."
For example, Google's database of scanned books represents 4% of all books ever published, but in this data set, "The Lord of the Rings gets no more influence than, say, Witchcraft Persecutions in Bavaria." And the name Lanny appears to be one of the most common in early-20th century fiction -- solely because Upton Sinclair published 11 different novels about a character named Lanny Budd.

The problem seems to be skewed data and misinterpretation. (The article points to the failure of Google Flu Trends, which it turns out "was largely predicting winter".) The article's conclusion? "Rather than succumb to 'big data hubris,' the rest of us would do well to keep our skeptic hats on -- even when someone points to billions of words."

69 comments

  1. Re: PROTIP by Anonymous Coward · · Score: 0

    A key that opens many locks is a great key.
    A lock that is opened by many keys is a shitty lock.

  2. This is why by Anonymous Coward · · Score: 0

    When all the data that Google, Microsoft, etc. collect gets mined for potential criminal activity, many people's lives will be ruined by false positives. This is why I try to avoid being monitored in the first place.

    1. Re: This is why by asjk · · Score: 1

      A variant of this is the pernicious and disconcerting act of self censorship.

  3. Makes sense by Anonymous Coward · · Score: 0

    It's part of the reason why I'm always skeptical of one-off statistics being used to try to draw conclusions on things that aren't there.

    I'm reminded of this one article I read about a week ago, where it tried to draw some sort of conclusion as to how sexist the movie industry is by looking at the age ranges for male and female characters in movies, and then seeing that it was mostly women in their early 20's while men had a broader age range. And sure, it might look that way if you only go with that single statistic.

    Problem is, there's other statistical data that indicates that men of all ages find women in their early 20's more attractive, whereas women are attracted to men who are closer to their own age. Last I checked, the whole point of most movies is to make money and thus appeal to the largest number of people possible, and wouldn't you know it, it happens that leading roles tend to go to actors and actresses that others find attractive. Imagine that.

    1. Re: Makes sense by Anonymous Coward · · Score: 0

      So uh, the SEXIST bias was right....

    2. Re:Makes sense by Sique · · Score: 1

      Basically you are saying that the sexist bias that exists in the movie industry is just a reflection of the sexist bias that exists in society. So your point being?

      --
      .sig: Sique *sigh*
    3. Re: Makes sense by Anonymous Coward · · Score: 0

      If one group of people likes one thing, and a different group of people likes a different thing, that isn't "sexist bias".

  4. Newsflash: Buzzword turns out to be buzzword! by Qbertino · · Score: 2, Insightful

    Film at 11.

    --
    We suffer more in our imagination than in reality. - Seneca
    1. Re: Newsflash: Buzzword turns out to be buzzword! by Type44Q · · Score: 1
      If you think buzzwords are annoying, how about this little gem from the summary:

      But the bigness...

      I understand that language evolves over time... but when a simple word like "size" eludes someone, it's time that they returned to square one and tried again. In this particular instance, I'm afraid nothing short of crawling back in their mom's vagina is going to cut it.

  5. Failures of Unstructured Data by Anonymous Coward · · Score: 0

    In every discussion of Big Data that I've ever read it seems that the "goodness" of big and especially unstructured data sets is basically taken for granted. First off, why shouldn't unstructured data itself be considered something of a failure? If the data collection was more organized and thoughtful from the start, we wouldn't be looking for all sorts of esoteric algorithms and methods for cleaning up messes that seem mostly to be the result of laziness in the initial data collection and not any inherent complexity in the data being collected. Second, why is more information necessarily a good thing? Hasn't anyone ever heard of information overload or signal to noise? People, and especially young people, trash the old ways but I wouldn't be surprised if SQL and table oriented storage is still alive and kicking long after we're all gone.

    1. Re:Failures of Unstructured Data by sexconker · · Score: 0, Flamebait

      Non-relational databases are absolutely retarded. If your data has any value, it must have meaning. If your data has any meaning, it can be modeled in some way.

      "Big Data" retards just take in information and shit it all out into one dump, hoping to extract meaning and value later, without caring about integrity, correctness, or completeness. You end up searching for nuggets of gold in a mountain of shit, with no way to verify the nuggets you find are actually gold.

    2. Re: Failures of Unstructured Data by Anonymous Coward · · Score: 0

      See NSA. QED.

  6. Is this news? by Anonymous Coward · · Score: 0

    Lies. Damn Lies. Statistics.

  7. Wrong interpretation by Anonymous Coward · · Score: 0

    Big data doesn't have to be about an analysis of that data, it can also be about finding a needle in a haystack.

    Google's search engine searches an enormous set of data, bigger than any big data set, and people find it useful every day. Even normal people, not just scientists.

    I can go to the library and use the index to find books on a subject then flip through pages for weeks to find a how to procedure for making coal tar, or I can just search Google for it and have it in a moment.

    For me that's what book searching is about. Unfortunately Google does not offer this, and that is why I wrote my own engine for searching the Gutenberg database. That's what big data can be about; it doesn't have to be about predicting or finding trends just because some people use large data sets for that and a bunch of people decided to term the idea. The term doesn't lock it in.

  8. Skeptic by Anonymous Coward · · Score: 0

    Are you suggesting that the "skeptic hats" be worn even by those who possess minimal domain specific and analytic knowledge?

    1. Re:Skeptic by ceoyoyo · · Score: 1

      Yes. Simple statistical concepts are within reach of everyone. One very simple one will take you a long way:

      1) Is the conclusion based on a single analysis, or multiple analyses of a single dataset? If so it's interesting, but not conclusive.

  9. same thing in baseball by known_coward_69 · · Score: 2

    a lot of the big prediction sites have been predicting the kansas city royals to be average to poor the last few years. so far they have been to the world series twice and are in first place in their division this year. all with average to slightly above average mainstream stats but if you look at them they built a team using a team strategy instead of simply signing guys and looking at individual stats

    1. Re: same thing in baseball by Anonymous Coward · · Score: 0

      Projections of player production in baseball are generally awful. They're much too pessimistic in most cases. When the projections for individual players are bad, you can't accurately predict team performance. Also, the Royals are built for that stadium, as Kauffman is particularly large and favors speed over power hitting.

    2. Re:same thing in baseball by sjames · · Score: 1

      That's a serious limitation of sabermetrics. Baseball is a subtle game. The more you study it, the more subtleties you find.

      A good manager's gut feeling takes far more factors into consideration than sabermetrics even looks at.

      A perfect example is the steal. Sabermetrics followers claim the steal is a losing proposition. That may be true by the numbers being tracked, but it fails to account for the effect on the pitcher after a steal happens. Rattling the pitcher is a real thing.

    3. Re:same thing in baseball by Gospodin · · Score: 1

      Baseball is about the least subtle game around. Almost everything comes down to individual players' performance. The team aspect is nearly nonexistent. That's why you can throw a bunch of baseball players together to have an All-Star game and it's a pretty good game. The Pro Bowl sucks because that doesn't work in football: the players actually have to work as a team. And your example is far from perfect. Sabermetrics predicts a particular positive value for the steal and a negative value for the caught stealing. A steal is clearly not a losing proposition. An *attempted steal* may be, depending on your chance of success.
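
      The point about attempted steals can be put as a break-even calculation. This is only a sketch: the run values below are placeholders assumed for illustration, not figures from any sabermetric source, and the function names are made up here.

```python
def break_even_success_rate(steal_value, caught_value):
    """Success rate p at which p*steal_value + (1-p)*caught_value == 0."""
    return -caught_value / (steal_value - caught_value)


def expected_value(p_success, steal_value, caught_value):
    """Expected run value of a single steal attempt."""
    return p_success * steal_value + (1.0 - p_success) * caught_value


# Placeholder run values, assumed for illustration only.
STEAL, CAUGHT = 0.2, -0.4

p_star = break_even_success_rate(STEAL, CAUGHT)
print(f"break-even success rate: {p_star:.3f}")
```

      Under those assumed values an attempt only pays off when the runner succeeds more often than the break-even rate, which is why the steal and the attempted steal have to be valued separately.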

      --
      ...following the principles of Heisenburger's Uncertain Cat...
    4. Re:same thing in baseball by sjames · · Score: 1

      You haven't looked nearly deep enough. For one, in spite of having the best players in the game, the all-star game is filled with lackluster gameplay. It is understood to be more spectacle than sport. The Yankees often have the same problem. They pay top dollar to attract great players, you'd think they would be serious contenders for the World Series each and every year. They're not. The difference between a really good pitcher and a great pitcher often comes down to working well with a catcher to frame a ball as a strike.

      As for the steal, even an attempted steal can rattle the pitcher. That doesn't make a 'sacrifice steal' a good idea, but it does color the decision to attempt it or at least look like you might be ready to attempt it. The latter is a good way to draw a balk or an error on the throw.

      Sabermetrics says the steal is never worth the attempt. Many teams have manufactured many runs from the steal in fact. Others seem better at attempting it than at succeeding.

      Likewise, a pitcher's win/loss record is not necessarily a good measure of his quality.

      In theory, sabermetrics could become a powerful tool but in fact there are far too many variables it doesn't account for and so it remains necessary to take it with a few grains of salt.

  10. Big Data is not a substitute for Critical Thinking by Etcetera · · Score: 5, Interesting

    Getting folks in the Bay Area to realize that is still an unsolved problem. Maybe they have an AI team working on it.

    In all seriousness, I saw this a lot when working within a monitoring team, and in consulting I've done for other orgs. Big Data is great for vast, multi-dimensional analysis of massive amounts of data, but it's not a substitute for domain knowledge about *WHAT* you're monitoring, critically thinking about what you're looking for and what types of failure modes might occur, and simple(r) heuristics for triggers.

    Trend analysis is very useful as an adjunct, for example, but within a server monitoring context it's not a *substitute* for having hard limits on, say, CPU load, or HTTP response time, or memory usage.
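
    A hard limit of the kind meant here can be as plain as the sketch below. The metric names and thresholds are hypothetical placeholders, not recommendations, and a real monitoring setup would read them from its own configuration.

```python
# Hypothetical hard limits; the numbers are placeholders, not recommendations.
HARD_LIMITS = {
    "cpu_load": 8.0,           # load average
    "http_response_ms": 2000,  # milliseconds
    "memory_used_pct": 90.0,   # percent of RAM in use
}


def breached_limits(sample, limits=HARD_LIMITS):
    """Return the sorted names of metrics in `sample` that exceed their hard limit."""
    return sorted(name for name, limit in limits.items()
                  if sample.get(name, 0) > limit)


sample = {"cpu_load": 3.2, "http_response_ms": 2500, "memory_used_pct": 91.5}
print(breached_limits(sample))
```

    No history and no model are involved: the check fires on a single sample, which is what makes it a useful backstop next to trend analysis.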

    Somehow, people managed to come to conclusions and make good decisions even before we had terabytes of raw data being sifted through by statistical algorithms to come up with a result.

    To place it into a broader cultural context, I see this in parallel with "data fetishisation" where nothing at all can be possibly true unless Science. And Data. Hipster praying at the altar of data.gov as some sort of left-wing (or Millennial) shibboleth for smug certainty when the basics -- the entry-level, basic 101 class of domain knowledge for the field -- is being forgotten.

    I'm all for bringing in new tech and new analytic techniques, but you can't look at it as a panacea for failing to understand what's going on in your domain on a philosophical level.

  11. Nassim Taleb wrote of this three years ago by iMadeGhostzilla · · Score: 5, Interesting

    "Well, if I generate (by simulation) a set of 200 variables — completely random and totally unrelated to each other — with about 1,000 data points for each, then it would be near impossible not to find in it a certain number of “significant” correlations of sorts. But these correlations would be entirely spurious. And while there are techniques to control the cherry-picking (such as the Bonferroni adjustment), they don’t catch the culprits — much as regulation didn’t stop insiders from gaming the system. You can’t really police researchers, particularly when they are free agents toying with the large data available on the web.

    I am not saying here that there is no information in big data. There is plenty of information. The problem — the central issue — is that the needle comes in an increasingly larger haystack."
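
    The simulation described in the quote is easy to reproduce. The sketch below is one possible version, not Taleb's own code: it assumes independent standard normal variables and uses the large-sample approximation that a correlation is "significant" at roughly the 5% level when |r| > 1.96/sqrt(n). The function name and defaults are made up for illustration.

```python
import math
import random
from itertools import combinations


def count_spurious(n_vars=200, n_points=1000, z_crit=1.96, seed=0):
    """Count pairs of independent random variables whose sample
    correlation looks 'significant' at roughly the 5% level."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_vars):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
        mean = sum(xs) / n_points
        norm = math.sqrt(sum((x - mean) ** 2 for x in xs))
        # Centre and scale so the dot product of two columns is Pearson's r.
        columns.append([(x - mean) / norm for x in xs])

    # Large-sample approximation: under independence r has a standard
    # deviation of about 1/sqrt(n), so |r| > z/sqrt(n) counts as a hit.
    r_crit = z_crit / math.sqrt(n_points)
    n_pairs = 0
    n_hits = 0
    for a, b in combinations(columns, 2):
        n_pairs += 1
        r = sum(x * y for x, y in zip(a, b))
        if abs(r) > r_crit:
            n_hits += 1
    return n_hits, n_pairs
```

    With the defaults there are 19,900 pairs, so about 5% of them, on the order of a thousand, would be expected to pass the test even though every variable is pure noise. Calling `count_spurious()` takes a few seconds in plain Python.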

    1. Re:Nassim Taleb wrote of this three years ago by Anonymous Coward · · Score: 0

      The Molecular Modeling literature has been publishing "chance correlation" papers for decades. Any field that has relied on statistical models has had to deal with this, so I hope whoever's starting work in this area is aware of the various literature bases that exist...

    2. Re:Nassim Taleb wrote of this three years ago by Anonymous Coward · · Score: 1

      "Well, if I generate (by simulation) a set of 200 variables — completely random and totally unrelated to each other — with about 1,000 data points for each, then it would be near impossible not to find in it a certain number of “significant” correlations of sorts. But these correlations would be entirely spurious.

      Exactly - determining a question after looking at the data is statistical BS. E.g. deal any bridge hand, look at it, then ask the question "was this a random deal?"
      The odds of that exact hand (in the order dealt) are 1 in 52!/39! ~ 1 in 4e21, obviously a crooked deal at the 99.99...9 confidence level. (Even if we ignore order, the odds are ~ 1 in 6e11.)
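
      The two counts can be checked with Python's integer arithmetic. A minimal sketch, assuming a standard 52-card deck and a 13-card hand; the variable names are made up here.

```python
import math

# Ordered deal: 13 cards drawn in sequence from 52, i.e. 52!/39!.
ordered_hands = math.perm(52, 13)

# Unordered hand: the same 13 cards in any order, i.e. C(52, 13).
unordered_hands = math.comb(52, 13)

print(f"ordered:   1 in {ordered_hands:.3e}")
print(f"unordered: 1 in {unordered_hands:.3e}")
```

      Either way, every specific hand is wildly improbable before the deal, which is exactly why picking the question after seeing the data proves nothing.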

  12. Put shit in - get shit out by Anonymous Coward · · Score: 0

    Just because you have amassed the largest pile of shit ever doesn't stop it from being shit....

  13. Re:Big Data is not a substitute for Critical Think by Anonymous Coward · · Score: 0

    The problem is not some quasi-religious cult but rather the bandwagon approach toward a rising skill set in the labor market. There is a demand for skilled analysts, and there is a need for hundreds of thousands of people to find new work at the same time. Those conflicting forces reduce skill applied in jobs filled by the mostly unqualified. There is no substitute for a quality university education with an actual mathematical background, if not a statistics concentration. The errors encountered now are actually not all related to statistics; some come from blind faith in algorithms that are functional only by heuristic and not by well-developed theory, or at least not by theory accessible outside of a specialized graduate education in computer science, business skills, and math. Those skills are in fact what is required, instead of the resume filler used by the people who create the problem you've identified but falsely attributed the cause of.

  14. So many ... by Anonymous Coward · · Score: 0

    ... ignorant comments. Good to know the competition pool is not as large as I expected.

    1. Re:So many ... by Anonymous Coward · · Score: 0

      You're already out of the running if you're dumb enough to think you're competing with the other idiots on slashdot.

  15. True of anything confidence boosting by ranton · · Score: 1

    I agree this article has no significant substance. Of course having more data to run statistical models against gives more confidence. So does hiring more Ivy League grads to work as your analysts or paying pricey firms like McKinsey to help make strategic decisions.

    Everything that can help boost confidence has the potential to boost it too far. And everything that will help you make better decisions is confidence boosting. So unless you intentionally want to limit your ability to make intelligent decisions, you will have to carefully monitor whether you have too much confidence.

    --
    -- All that is necessary for the triumph of evil is that good men do nothing. -- Edmund Burke
    1. Re:True of anything confidence boosting by jonnyj · · Score: 3, Insightful

      Of course having more data to run statistical models against gives more confidence.

      Not necessarily. Data sets of a few thousand records are generally sufficient for decent p values unless you're looking for effects that are so small that they're of limited commercial value. The trouble with a really big data set is that data quality and data volume are often inversely related.

      Far more relevant than a larger data set are the answer to questions like these: Is my data set representative or does it have important biases? Is my data set stable over time? What important causal variables might be missing from my data? Am I looking at causality or common cause? Are my errors normally distributed? Are my missing data points representative of the remaining data? What does this data mean in the real world?

      In almost all cases, I would far rather work with a small, high quality data set than a large set of uncertain quality.

    2. Re:True of anything confidence boosting by Anonymous Coward · · Score: 0

      This! I don't care how much data you have. If it isn't representative of the population you're looking for insight into, it's worthless. It's sort of like the patents that are (old stuff) COMPUTER or (old computer stuff) INTERNET! Data collected without design, or with a design that isn't related to what you want to use it for, is just trash.

    3. Re:True of anything confidence boosting by ranton · · Score: 1

      Of course having more data to run statistical models against gives more confidence.

      Not necessarily. Data sets of a few thousand records are generally sufficient for decent p values unless you're looking for effects that are so small that they're of limited commercial value.

      I agree with this and the rest of your points. Confidence is an overloaded term in this context. I meant more data obviously increases human confidence, not necessarily that it should increase that confidence. More data is by itself better than less data. But just like any positive attribute, having too much of it makes people blind to aspects that aren't as positive.

      This is true of just about any positive attribute, whether it is your workers' intelligence, a potential suitor's financial status, or a car's acceleration. All are considered a positive thing by most people, but all have diminishing returns. Just like getting more data.

      --
      -- All that is necessary for the triumph of evil is that good men do nothing. -- Edmund Burke
  16. Re:Big Data is not a substitute for Critical Think by Anonymous Coward · · Score: 2, Informative

    Getting data is dead easy. I can get you gobs of it. I can store it fairly quickly.

    Now for the hard part. What are you trying to find? Have we been collecting the right data? Is it in the right form? Do we actually have enough? Is it at the proper interval? These are where most people fail and they just keep collecting more of the same data, even though it has 0 use for them, somehow magically expecting the data to organize itself.

  17. Biased data by HalAtWork · · Score: 1

    A lot of people put in crap to get past prompts, or answer ideally instead of truthfully, etc. You have to imagine a lot of this data is biased and doesn't reflect reality anyway.

  18. Big data tells you where to look but not why. by jellomizer · · Score: 1

    The key advantage of big data is the ability to show us where to look. But after that we need to dig further with much smaller data and science to see what the cause is.

    --
    If something is so important that you feel the need to post it on the internet... It probably isn't that important.
  19. Fascinating study of days of month by tgibson · · Score: 1

    There was a very interesting bit of sleuthing done to track down biases in the popularity of certain dates in scanned books, like the prevalence of Sept 11 *before* 2001.

  20. Re:Big Data is not a substitute for Critical Think by Anonymous Coward · · Score: 0

    "Those conflicting forces reduce skill applied in jobs filled by the mostly unqualified." Did you draw this conclusion by collecting a lot of data and running the data through your own analysis? Or do you have some other way to prove such a broad accusation? The biggest problem we face today is from those who use statistics to support their cause and opinions. We are constantly bombarded with poll results and nobody ever questions how these results are derived. What statistical methods are being used that allow the pollsters to take a very small sample size and project those results on very large datasets? How do you ask 500 people their opinion on something and then apply those results against 400 million people?

  21. xkcd by Anonymous Coward · · Score: 3, Funny

    https://xkcd.com/1138/

  22. Re:Big Data is not a substitute for Critical Think by Anonymous Coward · · Score: 1

    Exactly! Big Data is about finding patterns, not conclusions. The whole point is that humans are capable of searching through only so much information, and at some point you need a computer to do it for you.

    Of course, once a pattern is found, it's up to humans to determine if it makes sense -- and you'd do the same for any pattern found by a human.

    dom

  23. Re:Big Data is not a substitute for Critical Think by quax · · Score: 4, Insightful

    "data fetishisation" where nothing at all can be possibly true unless Science. And Data.

    The problem is that it's all data, very little science.

    Real scientists know how to scrutinize their data, and how to rule out false positives. Actual science will not only give you a statistical level of confidence, but use domain expertise to the uttermost to rule out systematic errors. A nice case study in that regard is the recent LIGO gravitational wave results.

    Most of the people who like to call themselves "data scientists" these days know as much about science as "computer engineers" know about proper engineering.

  24. Re:Big Data is not a substitute for Critical Think by Anonymous Coward · · Score: 0

    Try reading that again, without the political BS that you added.

  25. Haystacks by gringer · · Score: 1

    I call this approach "adding more haystacks".

    --
    Ask me about repetitive DNA
    1. Re:Haystacks by Anonymous Coward · · Score: 0

      dynamically expanding the emergent top-down haystack logic hyperspace?

    2. Re:Haystacks by gringer · · Score: 1

      Brilliant, but it needs a tiny bit more work to avoid the obviously wrong word "hyperspace":

      Dynamically expanding the search space by utilising multiple concurrent haystacks in an emergent fashion via a cloud-based neural network.

      --
      Ask me about repetitive DNA
  26. On the other side... by ctrl-alt-canc · · Score: 1

    ...false data create big confidence.

  27. So let me get this right... by Hognoxious · · Score: 1

    You measure the wrong thing ten times, and get the wrong answer. You measure the wrong thing ten billion times, and you still get a wrong answer.

    It's almost like quantity and quality are different things!
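
    A small simulation makes the same point. It is a sketch under made-up assumptions: the true value, the systematic offset and the noise level are all hypothetical, and so is the function name.

```python
import random
import statistics


def measure_wrong_thing(n, true_value=10.0, bias=2.0, noise=1.0, seed=0):
    """Average n noisy readings from an instrument with a fixed systematic offset."""
    rng = random.Random(seed)
    readings = [true_value + bias + rng.gauss(0.0, noise) for _ in range(n)]
    return statistics.fmean(readings)


for n in (10, 1_000, 100_000):
    estimate = measure_wrong_thing(n)
    print(f"n={n:>7}: estimate={estimate:.3f}, error={estimate - 10.0:+.3f}")
```

    More readings shrink the random scatter, so the average settles ever more firmly on the biased value rather than the true one.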

    --
    Confucius say, "Find worm in apple - bad. Find half a worm - worse."
  28. Re:I HAVE A GREASED UP YODA DOLL SHOVED UP MY ASS by Anonymous Coward · · Score: 0

    That's "BANANAL"

  29. "The Bible Code" by bkgoodman · · Score: 2

    I'm reminded somewhat of "The Bible Code" - the theory/idea that there is a bunch of stuff hidden in the bible, visible when viewed different ways (like when skipping characters, etc - Google it) The reality is - the bigger the dataset - the more patterns - even false patterns may be present in it. If I had a billion money's, what would they type...

    1. Re:"The Bible Code" by RuffMasterD · · Score: 1

      Good question. If your moneys were well diversified, they would likely display mostly type S risk and minimal type I risk, leading to market average results. But if your moneys cluster together, they would display more type I risk, and therefore produce volatile results.

      --
      Human Rights, Article 12: Freedom from Interference with Privacy, Family, Home and Correspondence
  30. Re:Big Data is not a substitute for Critical Think by Anonymous Coward · · Score: 0

    Getting data is dead easy. I can get you gobs of it. I can store it fairly quickly.

    Now for the hard part. What are you trying to find? Have we been collecting the right data? Is it in the right form? Do we actually have enough? Is it at the proper interval? These are where most people fail and they just keep collecting more of the same data, even though it has 0 use for them, somehow magically expecting the data to organize itself.

    Good point, also one failing I see with big data analyses is when the researcher is looking for or referencing a finding within their data that is at the very edge of the resolution of what they are able to measure, and they think they can make up for it by pouring more data in without examining whether what they are seeing is a repeatable measurement or if it is a fluke that is thrown one way or the other due to variations that are below the resolution of what they are able to measure.

    Prime example:

    The people who point to cell phone radiation causing cancer. You could appeal to ignorance and say that there is no way that we know for sure that a cellular phone can't cause cancer, however most people who use cellular phones also use microwave ovens and the radiation from a microwave oven is on the order of a million times stronger. That observation aside, you can see a correlation between people who use cellular phones and people who get cancer, however there are numerous confounding factors that make showing a causal relationship problematic in the extreme.

    Extra skepticism is required to explain the data and to make lucid assessments of the data and conclusions you draw from it, otherwise it is not science it is more along the lines of politics. Often times I see researchers trying to prove a conclusion that they have already come up with, rather than testing a hypothesis and there is a world of difference between the two concepts.

  31. Re:Big Data is not a substitute for Critical Think by cnettel · · Score: 1

    "Those conflicting forces reduce skill applied in jobs filled by the mostly unqualified." Did you draw this conclusion by collecting a lot of data and running the data through your own analysis? Or do you have some other way to prove such a broad accusation? The biggest problem we face today is from those who use statistics to support their cause and opinions. We are constantly bombarded with poll results and nobody ever questions how these results are derived. What statistical methods are being used that allow the pollsters to take a very small sample size and project those results on very large datasets? How do you ask 500 people their opinion on something and then apply those results against 400 million people?

    By assuming random sampling, that's how. Whether that assumption is correct is a critical issue, but that is the case for any universe population significantly larger than your sample set. 500,000 or 400 million really does not matter - if you are ignorant of the demographics and how those interact with your sampling strategy, you're not gonna get a correct result.

    If, on the other hand, you somehow manage to do random sampling of the true population, 400 people would be enough to nail preferences down to a few percent, (almost) no matter the total population size. And I guess this is the danger of statistics and big data. Intuition says one thing, simple statistical assumptions say another, and a more thorough treatment is rare.
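
    The arithmetic behind that claim is short. The sketch below uses the usual normal approximation for a proportion with a finite population correction, and it assumes a genuinely simple random sample, which is the critical assumption named above. The function name and defaults are made up for illustration.

```python
import math


def margin_of_error(sample_size, population_size=None, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from a simple random sample."""
    se = math.sqrt(p * (1.0 - p) / sample_size)
    if population_size is not None:
        # Finite population correction; close to 1 when the population dwarfs the sample.
        se *= math.sqrt((population_size - sample_size) / (population_size - 1))
    return z * se


for population in (500_000, 400_000_000):
    moe = margin_of_error(400, population)
    print(f"population {population:>11,}: +/- {100 * moe:.2f} points")
```

    For 400 respondents the margin comes out just under five percentage points for both population sizes. The formula says nothing about a sample that is not random, and that is where polls actually go wrong.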

  32. Re:Big Data is not a substitute for Critical Think by jma34 · · Score: 1

    The lack of thinking is somewhat appalling. I am a "data scientist". I came from one of the science fields that understands data, high energy particle physics. People are often surprised when I tell them that their fancy map-reduce tools are not particularly interesting when it comes to actually understanding your data. The tools are not interesting. Do you hear that, "big data" conference organizers? Too little time is spent understanding what the data is telling you, and how you know that it is telling you that and not something else.

    Making sense of data takes knowledge and common sense. Qualifications I often find lacking in many of the job candidates I've interviewed. They know how to run the latest tool, but can't explain what the results mean when they get them.

  33. Re:Big Data is not a substitute for Critical Think by Anonymous Coward · · Score: 0

    Well, all human decision-making is appalling, really. What one should be wary about is how others will misunderstand and abuse knowledge.

    Captcha: revoke

  34. Big data are important by thomastekster · · Score: 1

    I know a lot of humans don't like the way the big websites save their information and sell it. But there is a reason for this. Take Google: they sell their information to all who want to promote their products through Google. That means Google creates more jobs and worth in the world. Companies can promote their products and their customers can see them in Google's search engine. A lot of SEO companies are dependent on Google's way of selling more information.

    --
    PanadasDigital is a SEO bureau
  35. Re: PROTIP by RabidReindeer · · Score: 1

    Garbage In, Gospel Out

  36. Watch how financial institutions use big data by rickb928 · · Score: 1

    Many use big data systems and techniques to:

    - Identify potential new customers for products and services. Mistakes here result in poor choices and losses.

    - Identify and prevent fraud of all types. Mistakes here result in losses.

    - Identify existing customers that could be successfully marketed new or additional products and services. Mistakes here result in disgruntled customers and losses.

    Sometimes it's hard to determine if a technology is useful or even functional. Money often is a good indicator.

    --
    deleting the extra space after periods so i can stay relevant, yeah.
  37. "Big Data isn't Statistics" by HeckRuler · · Score: 1

    Oh man, I ran square into this just last week. This guy was claiming to work in big data as an economist. Said any sort of inefficiency should ultimately impact the GDP. I countered that there are lies, damned lies, and statistics, and that it might not make him comfortable, but the metrics he's using could be lying to him.

    And get this: "I work with data. Statistics is for losers". ... Can you believe that guy? Even after I pointed out that while every call to Map() might sort data very nicely, every Reduce() call AGGREGATES data into a statistic, he still stuck to his guns and claimed to be somehow holier than icky statistics.
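
    For anyone who doubts it, here is the shape of the thing in plain Python rather than any particular big data framework. The records are invented for illustration.

```python
from functools import reduce

# Hypothetical records, invented for illustration.
records = [
    {"region": "north", "sales": 120.0},
    {"region": "south", "sales": 80.0},
    {"region": "north", "sales": 100.0},
]

# Map: turn each record into a (sum, count) pair; every row is still there.
mapped = map(lambda r: (r["sales"], 1), records)

# Reduce: fold the pairs into one running (sum, count); the individual rows are gone.
total, count = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), mapped)

mean_sales = total / count  # a summary statistic, with all the usual caveats
print(mean_sales)
```

    Whatever comes out of the reduce step is a mean, a count or a sum, i.e. a statistic, whether or not anyone calls it that.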

  38. yes, but is it Big Data? by UncleGizmo · · Score: 1

    The summary example (Google scanning 4% of books), while it may be "a lot" of data, isn't really big data, is it? I understand the whole point about more data not necessarily being better, but here I don't even think the example proves the point.

    --
    Who put this thing together? Me, that's who.
  39. Re:Big Data is not a substitute for Critical Think by Anonymous Coward · · Score: 0

    We used to teach stats with the principle "data size cannot overcome data bias" back in my day ..

  40. Foundation and Empire by tmjva · · Score: 1

    Big Data and Statistics were the problems Hari Seldon ran into, didn't he? Only worked with supra-large populations.

    --
    Tracy Johnson
    Old fashioned text games hosted below:
    http://empire.openmpe.com/
    BT
  41. Re:Big Data is not a substitute for Critical Think by ZorroXXX · · Score: 1

    Big Data is about finding patterns, not conclusions.

    Gary Taubes (author of "Good calories - bad calories" and "Why we get fat and what to do about it") is my favourite scientist because he just exhibits such a healthy, integrated "given that what we believe today is correct" attitude, i.e. being totally open to being proven incorrect. There is a saying "follow those that seek the truth, run from those that claim to have found it", and Gary is most certainly a truth seeker in that respect.

    For instance in the interview https://www.youtube.com/watch?..., he says during the first minutes "That's what we should believe until we have remarkable evidence to reject it" and "Don't take my word for it, anyone can try it out for themselves", without this being specifically emphasised or made a big point of; it is just his natural way of reasoning, which I love so much.

    And now to what triggered me to answer your post: I think it is later in that interview that he points out that observational studies can only be used to form hypotheses; drawing conclusions from them is wrong, and you actually need to perform controlled experiments to do that.

    --
    When you are sure of something, you probably are wrong (search for "Unskilled and Unaware of It").