Slashdot Mirror


Why Programmers Need To Learn Statistics

David Gerard writes "Zed Shaw writes an impassioned plea to programmers: Programmers Need To Learn Statistics Or I Will Kill Them All. Quoting: 'I go insane when I hear programmers talking about statistics like they know s*** when it's clearly obvious they do not. I've been studying it for years and years and still don't think I know anything. ... I have taken a bunch of math classes, studied statistics in grad school, learned the R language, and read tons of books on the subject. Despite all of this I'm not at all confident in my understanding of such a vast topic. What I can do is apply the techniques to common problems I encounter at work. My favorite problem to attack with the statistics wolverine is performance measurement and tuning. All of this leads to a curse since none of my colleagues have any clue about what they don't understand. I'll propose a measurement technique and they'll scoff at it. I try to show them how to properly graph a run chart and they're indignant. I question their metrics and they try to back it up with lame attempts at statistical reasoning. I really can't blame them since they were probably told in college that logic and reason are superior to evidence and observation.'"

18 of 572 comments

  1. Percent probability that Zed Shaw is a jerk by Anonymous Coward · · Score: 5, Funny

    110%.

  2. correlation != causation by Hognoxious · · Score: 5, Funny

    Correlation != causation. Just repeat that and you don't need to know statistics.

    --
    Confucius say, "Find worm in apple - bad. Find half a worm - worse."
  3. Your argument is dead, Zed by BadAnalogyGuy · · Score: 5, Insightful

    Maybe the problem is in your presentation. Even here, you tell programmers that you want to kill them for not understanding a topic that even you are unwilling to acknowledge mastery of. Then you tell us how hard the topic is to understand, even though you've spent so much time trying to learn it.

    Is it any wonder that no one takes your suggestions seriously? You are practically sabotaging yourself with self-effacement.

    These aren't homework problems you're tackling here. They are business problems and you need to sell yourself and your ideas if you want to get any traction. Do you have any evidence that your methods are better than the SOP thus far? Do you have any case studies that show how effective statistical analysis is in *any* of your projects?

    Or are you simply taking something that seems like a data point and extrapolating it to cover a vast swath of applications?

    1. Re:Your argument is dead, Zed by Hurricane78 · · Score: 5, Funny

      I just found a very old hard disk. Double height. MFM/RLL. And after a “strings -n 32 /dev/hdd”, I got the following old saying, carved in the bytes of the disk:

      Computer science
      Statistics
      Social skills

      Choose one.

      ;)

      --
      Any sufficiently advanced intelligence is indistinguishable from stupidity.
  4. Or, how about... by halivar · · Score: 5, Insightful

    Statisticians need to learn programming or I will kill them all.

  5. Title fail. by girlintraining · · Score: 5, Funny

    Programmers Need To Learn Statistics Or I Will Kill Them All

    Okay, two things: First, threatening programmers never works. Management's been trying that for years. Second -- don't you mean 'kill -9' them all, or maybe demalloc(), or cast them to void*, or one of a dozen other witty things you could do besides the mundane answer of threatening stabby bits on them because you have a case of intellectual snobbery?

    --
    #fuckbeta #iamslashdot #dicemustdie
    1. Re:Title fail. by Anonymous Coward · · Score: 5, Funny

      or firefox's implementation:

      void demalloc(void *ptr)
      {
          (void)ptr;  /* no-op: the pointer is deliberately never freed */
          return;
      }

  6. Re:93% of Programmers Think You're Wrong by Anonymous Coward · · Score: 5, Interesting
  7. The reason people ignore you Zed.. by Anonymous Coward · · Score: 5, Insightful

    is not because they don't understand statistics. It is because you are a dick.

  8. Re:93% of Programmers Think You're Wrong by ShakaUVM · · Score: 5, Insightful

    A manga statistics book, eh?

    I just realized I was a nerd. I looked at the table of contents and closed it down, then realized I hadn't even looked at the short skirt-wearing protagonist.

    Sigh...

    But to answer the article's point, elementary statistics are very easy. Advanced statistics are very hard. It's kind of like how people think "knowing the difference between circles and squares" is geometry and so analytical geometry must be just more of the same, right? It's quite possible the programmers think they know statistics because they know they're vaguely supposed to do a run multiple times, and maybe average the results or something.
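
To make the "run it a few times and average" point concrete, here is a minimal sketch with made-up timings (the numbers are illustrative, not measurements from the article). It shows how a single slow run drags the mean away from what a typical run looks like, which is why the median and the spread are worth reporting too.

```python
import statistics

# Hypothetical timings in milliseconds; one run hit a slow path.
timings_ms = [12.1, 11.9, 12.3, 12.0, 48.7, 12.2, 11.8, 12.1]

mean = statistics.mean(timings_ms)      # pulled upward by the outlier
median = statistics.median(timings_ms)  # robust to the single slow run
stdev = statistics.stdev(timings_ms)    # sample standard deviation

print(f"mean   = {mean:.2f} ms")
print(f"median = {median:.2f} ms")
print(f"stdev  = {stdev:.2f} ms")
```

With data like this, the mean alone suggests every run costs about 16 ms, while seven of the eight runs took about 12 ms.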

    It's also possible the author of the article is a know-it-all douchebag who tries to solve problems with overwrought solutions.

    From TFA: "Zed: Fuck! Fuck! I have eyes! You do not! See!? No?! Exactly! Because you can't fucking see because you have no fucking eyes! Arrggh!"

    Just throwing that theory out there.

  9. He makes some good points... by SanityInAnarchy · · Score: 5, Insightful

    ...unfortunately, they are mostly lost in the irony of statements like this:

    I think women are better programmers because they have less ego and are typically more interested in the gear rather than the pissing contest.

    I doubt I've seen anyone more thoroughly entrenched in a pissing contest than Zed Shaw, of the website formerly known as "Zed's So Fucking Awesome".

    --
    Don't thank God, thank a doctor!
  10. Re:Mathematicians just need to shutup. by __aasqbs9791 · · Score: 5, Insightful

    One post calculus statistics course gives me enough grounding to know what I don't know and punt to experts when I need to.

    That's actually his argument (though I'm pretty sure he doesn't realize it, having met him a few years ago at a conference). People need to know their limits, and the strengths (and weaknesses) of others, and defer to them when they know what they're talking about, rather than talking out of their asses. As you point out, you can't know everything, but you'll defer to others who know more when you need to. I'm pretty sure Zed would like working with you based upon that fact alone (I know I value that trait and try to express it myself). Far too many people think they aren't allowed to have any weaknesses (and we all do in some area or another) so they talk a big game, and when push comes to shove, they will actively block people who actually know more than they do about the subject at hand. Working with too many people like that has driven Zed insane (IMHO) and I know I've been close to it at a couple of workplaces before (and really loved the one that was hardly like that at all).

  11. Re:Mathematicians just need to shutup. by Toonol · · Score: 5, Insightful

    But statistics is one of those fields that benefits everybody; it's a bit like probability, logic, or (further afield) history. Lack of a fundamental understanding of statistics can lead you astray in a near-infinite number of ways.

    I have sat in business meetings hundreds of times where I've seen decisions made on completely meaningless and irrelevant data, because the people involved don't understand statistics. The same holds true in your personal life; decisions with purchasing products, investing money...

    Now, I'll bet that most slashdot readers have the minimum amount of knowledge of statistics to avoid the most egregious errors; but more knowledge is certainly helpful. It will help you in a myriad of ways.

  12. Re:Statistics is HARD by thesandtiger · · Score: 5, Interesting

    I don't think it's hard - I just think it requires a different way of thinking than most programmers usually take to maths.

    As a programmer/developer who went into research (in social sciences, so it's really soft), I can say that in my experience stats is really closer to a programming language than it is to other maths. Here's why:

    1) You have a LOT of tools to pick from. What kind of analysis do you want to do? What kind will give you the most useful result? What kind is your data amenable to?

    2) You don't always have a clear choice as to which is the best for a given situation. Sometimes you need multiple different types of analysis to really get the full picture.

    3) Just because it's math doesn't always mean it's right. There's some crazy ass black-box magic stats stuff we use for one project of ours that, in theory, will let us figure out the demographic composition of an unknown target population. Maybe. Sometimes. If the wind is right. Or not.

    4) At the advanced levels, it's fucking insane. People who hack stuff like ultra optimized 3d engines with large quantities of assembler or whatever always wigged me out because my brain just doesn't work that way. With the really complex stats stuff it's the same way - I can plug and chug with the formulas, but I honestly have about as much comprehension of why some of the more advanced stuff works as my dog has of CPU design.

    5) If you know the basics, you know just enough to be dangerous and really piss off people who know what they're doing. Being able to run an anova or determine correlation makes some people think they actually know what's going on because, hey, it's math. But a lot of people who just do the basic stuff think their results are more meaningful than they actually are - falling prey to the whole "it's statistically significant therefore it must be IMPORTANT" fallacy (when you can certainly have things that are "statistically significant" but actually have virtually no impact on the outcome).

    6) Even when people know their shit, they disagree. A fine example of this would be the Space Shuttle failure rate - you had people saying that the shuttle would suffer a critical failure anywhere between 1 in 5 and 1 in 50,000 launches. And depending on what tools they used to do their analysis, they were correct. Same as with programming languages - depending on the problem, equally skilled programmers might pick entirely different languages to use because they think one part or another is more critical.
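
Point 5 in the list above (significant is not the same as important) can be sketched with a plain two-sample z-test. The function name, the 0.5 ms difference, the 50 ms standard deviation and the group size are all assumptions chosen for illustration, and the test assumes a known, shared standard deviation.

```python
import math

def two_sample_z(mean_a, mean_b, sd, n):
    """Two-sided z-test for two groups of size n sharing a known standard deviation."""
    std_err = sd * math.sqrt(2.0 / n)            # standard error of the difference in means
    z = (mean_a - mean_b) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability of a normal
    cohens_d = (mean_a - mean_b) / sd             # effect size in units of the spread
    return z, p_value, cohens_d

# Hypothetical benchmark: 100.5 ms vs 100.0 ms, sd 50 ms, a million samples per group.
z, p_value, cohens_d = two_sample_z(100.5, 100.0, 50.0, 1_000_000)
print(f"z = {z:.2f}, p = {p_value:.1e}, d = {cohens_d:.3f}")
```

With a million samples per group the p-value is vanishingly small, yet the effect size is one hundredth of a standard deviation. The result is "significant" only because the sample is huge.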

    Honestly, I really enjoy stats - if I had to do it all over again I would probably have spent a LOT more time working with stats than I did as a programmer in my younger years - but I won't pretend that it's totally clear what tools to use when. The author of TFA would do well to realize that even fellow statisticians would probably slap the shit out of him over some of his beliefs about how to properly go about utilizing stats toolsets.

    --
    Since I can't tell them apart, I treat all ACs as the same person.
  13. Re:93% of Programmers Think You're Wrong by Daniel+Dvorkin · · Score: 5, Insightful

    "Lies, damn lies and statistics" is all you need to know about statistics.

    This is right up there with "'click on the big blue e' is all you need to know about the internet."

    Speaking as both a statistician and a computer scientist, I've seen the statistics-vs.-CS argument play out many times before, and the lack of knowledge on both sides is really striking, but not all that surprising -- both are hard subjects which take a lot of work to master. The lack of mutual respect is both infuriating and pathetic, and there's no excuse for it.

    --
    The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.
  14. Re:Statistical analysis of the summary by brian_tanner · · Score: 5, Informative

    Wow. What class did you take that says if you don't know something you should assume equal probability?

    I don't know if there is an invisible elephant in my kitchen, so I guess I should assign equal probability to both outcomes. I also don't really know how Baccarat works, I guess my odds are 50/50.

    Without knowing something about him or his coworkers, you by definition cannot make any statistical statements. To make any statements, you would first need to make some observations. This is how statistics is different from logic. Statistics is grounded in data.

    I don't agree with Zed, but you may have just proved his point.

  15. Re:Everyone should learn statistics by Daniel+Dvorkin · · Score: 5, Interesting

    Resampling-based statistics haven't replaced parametric models, and I doubt they ever will, for one very simple reason: as the available processing power grows, so does the amount of data. In my field, bioinformatics, the size and complexity of the data sets follow a Moore's Law of their own, and I don't think bioinformatics is unique in this. "Just bootstrap it" is easy to say, and certainly there have been many times, when dealing with an analytically intractable distribution, that I've done just that, but if the analytical solution takes minutes and the bootstrap solution takes weeks, you have to take this into account.
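
As a rough illustration of that cost trade-off, here is a minimal sketch that computes a 95% confidence interval for a mean two ways: a normal-approximation formula and a percentile bootstrap. The function names, the simulated data and the resample count are assumptions made for the example, not anything from the comment.

```python
import math
import random
import statistics

def analytic_ci(sample, z=1.96):
    """Normal-approximation interval: one pass over the data."""
    mean = statistics.fmean(sample)
    half = z * statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - half, mean + half

def bootstrap_ci(sample, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap: n_resamples passes over resampled data."""
    rng = random.Random(seed)
    n = len(sample)
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n))  # resample with replacement
        for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

data_rng = random.Random(42)
sample = [data_rng.gauss(10.0, 2.0) for _ in range(200)]  # simulated, well-behaved data

print("analytic :", analytic_ci(sample))
print("bootstrap:", bootstrap_ci(sample))
```

On well-behaved data like this the two intervals nearly coincide, but the bootstrap touches the data `n_resamples` times. That multiplier is what turns minutes into weeks on large data sets.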

    Of course, resampling isn't the only way to look at problems non-parametrically. Often a good compromise is to go with rank-based statistics, which are fast and easy to calculate -- and you may not have an analytically tractable model for the distribution of the original data, but you don't have to, since by working with ranks you can define a distribution with good analytical properties. You still need to do some reality-checking exploratory data analysis, of course, but this is an approach that generally works well in practice.
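
One common rank-based statistic is Spearman's rank correlation, which is just Pearson correlation computed on ranks. The sketch below is a plain-Python illustration with made-up data; the helper names are hypothetical, and ties receive the average of the ranks they span.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Ordinary product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def spearman(xs, ys):
    """Spearman correlation: Pearson applied to the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

xs = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
ys = [math.exp(x) for x in xs]  # monotone but wildly nonlinear

print("pearson :", pearson(xs, ys))
print("spearman:", spearman(xs, ys))
```

Because the ranks ignore the scale of the original values, the monotone relationship shows up as a perfect rank correlation even though the raw values are dominated by one huge point.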

    --
    The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.
  16. Re:Very good (from someone who's taken BOTH)... ap by JWSmythe · · Score: 5, Informative

    1.) EASILY SKEWED (as in "4/5 dentists chew Trident", oh "sure, sure", especially when they're on the corporate payroll (or paid off to say so by said corporation so their "evidence & observation looks good"))

    and

    2.) IS THE SAMPLE SET LARGE & COMPREHENSIVE ENOUGH? (most?? Most are not, period)...

    You know, that particular citation has made me wonder in the past, but not enough to actually research it. So, I went off looking for more information and found it.

        The statistic was generated from a July 1976 survey.

        The sample group for this statistic was 1,200 dentists. These dentists were hand picked by the research company, probably with good reason.

        They were asked what advice they would give gum-chewing patients:

        1) sugared gum
        2) sugarless gum
        3) no gum at all.

        Sugarless gum got 85% of the vote. Not terribly surprising. I'd be fairly confident that their time had been paid for, or at the very least they were told "This survey is being done for Trident Sugarless Gum." That is only speculation, so hush up.

        17/20 doesn't really sound very good. It just doesn't stick in your head. 4/5 is close enough, even though it reduces your answer to 80% (ahhh, a lie). Since these are marketing folks, I'm sure they pushed all kinds of values past focus groups, until "4 in 5" was accepted as most favorable.

        As the link cites, they're fairly confident that the "sugared gum" answer got at least one response. There's always someone that'll take the obvious wrong answer. If you don't believe that, look at any Slashdot poll. :)

        What they don't say is how many of the 1,200 samples were dropped. I'm sure there were non-responses, and they could have easily added any number of unfavorable answers in as non-responses. Of course, they couldn't have 100% in their favor, so they had to keep some.
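
The arithmetic in the figures above is easy to check. The sketch below reduces 85% to a fraction, compares it with the "4 in 5" slogan, and computes the textbook 95% margin of error for a proportion. That last step assumes a simple random sample in which all 1,200 dentists answered, which the comment itself doubts, so treat the margin as a best case.

```python
import math
from fractions import Fraction

reported = Fraction(85, 100)  # share recommending sugarless gum, as cited above
slogan = Fraction(4, 5)       # the advertised "4 out of 5"
n = 1200                      # dentists surveyed, as cited above

gap = reported - slogan       # how much the slogan understates the survey result

# Best-case sampling error, assuming a simple random sample with full response.
p = float(reported)
margin = 1.96 * math.sqrt(p * (1 - p) / n)

print("reported =", reported)  # Fraction reduces 85/100 to 17/20
print("gap      =", gap)
print(f"margin   = +/- {margin:.3f}")
```

The sampling error comes out at roughly two percentage points, so the sample size is not the weak spot here. The hand-picked respondents and the unknown number of dropped answers are.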

    --
    Serious? Seriousness is well above my pay grade.