Slashdot Mirror


Why Programmers Need To Learn Statistics

David Gerard writes "Zed Shaw writes an impassioned plea to programmers: Programmers Need To Learn Statistics Or I Will Kill Them All. Quoting: 'I go insane when I hear programmers talking about statistics like they know s*** when it's clearly obvious they do not. I've been studying it for years and years and still don't think I know anything. ... I have taken a bunch of math classes, studied statistics in grad school, learned the R language, and read tons of books on the subject. Despite all of this I'm not at all confident in my understanding of such a vast topic. What I can do is apply the techniques to common problems I encounter at work. My favorite problem to attack with the statistics wolverine is performance measurement and tuning. All of this leads to a curse since none of my colleagues have any clue about what they don't understand. I'll propose a measurement technique and they'll scoff at it. I try to show them how to properly graph a run chart and they're indignant. I question their metrics and they try to back it up with lame attempts at statistical reasoning. I really can't blame them since they were probably told in college that logic and reason are superior to evidence and observation.'"

94 of 572 comments

  1. 93% of Programmers Think You're Wrong by Greyfox · · Score: 3, Interesting

    Everything I needed to know about statistics I learned playing poker.

    --

    I'm trying to teach myself to set people on fire with my mind... Is it hot in here?

    1. Re:93% of Programmers Think You're Wrong by Anonymous Coward · · Score: 5, Interesting
    2. Re:93% of Programmers Think You're Wrong by ShakaUVM · · Score: 5, Insightful

      A manga statistics book, eh?

      I just realized I was a nerd. I looked at the table of contents and closed it down, then realized I hadn't even looked at the short skirt-wearing protagonist.

      Sigh...

      But to answer the article's point, elementary statistics are very easy. Advanced statistics are very hard. It's kind of like how people think "knowing the difference between circles and squares" is geometry and so analytical geometry must be just more of the same, right? It's quite possible the programmers think they know statistics because they know they're vaguely supposed to do a run multiple times, and maybe average the results or something.

      It's also possible the author of the article is a know-it-all douchebag who tries to solve problems with overwrought solutions.

      From TFA: "Zed: Fuck! Fuck! I have eyes! You do not! See!? No?! Exactly! Because you can't fucking see because you have no fucking eyes! Arrggh!"

      Just throwing that theory out there.

    3. Re:93% of Programmers Think You're Wrong by Daniel+Dvorkin · · Score: 5, Insightful

      "Lies, damn lies and statistics" is all you need to know about statistics.

      This is right up there with "'click on the big blue e' is all you need to know about the internet."

      Speaking as both a statistician and a computer scientist, I've seen the statistics-vs.-CS argument play out many times before, and the lack of knowledge on both sides is really striking, but not all that surprising -- both are hard subjects which take a lot of work to master. The lack of mutual respect is both infuriating and pathetic, and there's no excuse for it.

      --
      The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.
    4. Re:93% of Programmers Think You're Wrong by Devout_IPUite · · Score: 4, Insightful

      "It's also possible the author of the article is a know-it-all douchebag who tries to solve problems with overwrought solutions."

      That was kinda what I got from this. Sure, my powers of ten runs to determine performance isn't statistically sound. Did I say it was? No. Why don't I care? Because my samples are cheap. Spiking vs non-spiking is something pretty easy to see when you glance at the data.

      I mean, he said we're going to die if we don't learn statistics, but he never gave a compelling argument for it.

      The best example was users, but even that was lacking. If you design a script that's as aggressive on a system as a high-use user and your system supports as many 'users' as students, you're safe; if it supports fewer, you work on qualifying the problem better then.

    5. Re:93% of Programmers Think You're Wrong by obarthelemy · · Score: 2, Informative

      I'm sure it's not 50%, and not 25%

      heads=1, tails = 0

      0-0 0-1 1-0 1-1

      so if one of them is 1, there's a 33.33% chance the other is 1 too.

      I can work it out that way for 2 binary possibilities. Couldn't generalize it to x coins with y sides :-/
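
      A minimal sketch of the same counting argument, generalized by brute-force enumeration to x coins with y sides. The function name and parameters are made up for illustration, and it assumes every side is equally likely:

```python
from fractions import Fraction
from itertools import product


def p_all_match_given_at_least_one(n_coins: int, n_sides: int, face: int = 1) -> Fraction:
    """P(every coin shows `face` | at least one coin shows `face`), by enumeration."""
    # Every equally likely outcome, e.g. (0, 0), (0, 1), (1, 0), (1, 1) for two coins.
    outcomes = list(product(range(n_sides), repeat=n_coins))
    # Keep only the outcomes that satisfy the condition "at least one shows `face`".
    at_least_one = [o for o in outcomes if face in o]
    # Of those, count the outcomes where all coins show `face`.
    all_match = [o for o in at_least_one if all(side == face for side in o)]
    return Fraction(len(all_match), len(at_least_one))


two_coins = p_all_match_given_at_least_one(2, 2)
three_coins = p_all_match_given_at_least_one(3, 2)
two_dice = p_all_match_given_at_least_one(2, 6)
print(two_coins, three_coins, two_dice)  # 1/3 1/7 1/11
```

      Counting the same way by hand gives 1 / (y^x - (y-1)^x), since only the outcomes with no matching face at all are excluded by the condition.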

      --
      The Cloud - because you don't care if your apps and data are up in the air.
    6. Re:93% of Programmers Think You're Wrong by ShakaUVM · · Score: 2, Interesting

      >>Spiking vs non-spiking is something pretty easy to see when you glance at the data.

      Yeah, in fact, the way that he presents it is bad statistics. =)

      If the problem is that one out of 1000 queries is taking a minute to return instead of 0.1 seconds, then using the std deviation to describe the problem is nonsense. It is not a Gaussian distribution!

      But of course someone who "has spent his life studying statistics and even R language" would know that, right? :p

      Instead, as you point out, any programmer who did the same testing would see that one out of a thousand queries were taking far too long, and come to the same conclusion as him, without making the ghost of Gauss cry.
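
      To put rough numbers on that, here is a small sketch with made-up data: 999 queries at 0.1 seconds and one at 60 seconds. The sample is hypothetical and only meant to show how the mean and standard deviation hide what a median, a maximum, and a simple count make obvious:

```python
import statistics

# Hypothetical sample: 999 fast queries and a single one-minute outlier.
latencies = [0.1] * 999 + [60.0]

mean = statistics.mean(latencies)
stdev = statistics.pstdev(latencies)   # population standard deviation
median = statistics.median(latencies)
slowest = max(latencies)
slow_count = sum(1 for t in latencies if t > 1.0)  # 1.0 s threshold is arbitrary

print(f"mean={mean:.3f}s stdev={stdev:.3f}s")
print(f"median={median:.1f}s max={slowest:.1f}s queries over 1s={slow_count}")
```

      The mean comes out near 0.16 s and the standard deviation near 1.9 s, which describes neither the typical query nor the bad one.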

    7. Re:93% of Programmers Think You're Wrong by somebody1 · · Score: 2, Insightful

      Flips of a fair coin are always independent (50% each) regardless of whether you flip one or a million of them. Same reason why the martingale in roulette doesn't work.

    8. Re:93% of Programmers Think You're Wrong by Hal_Porter · · Score: 2, Insightful

      He has got a point that Computer Science graduates do value logic and reason (or less charitably bullshit) over evidence and observation.

      In fact one of the best CS books I've ever read was "Computer Architecture. A Quantitative Approach" by Hennessy and Patterson precisely because all the rules of thumb in it were backed up by measurements.

      Then again most CS types have realised that with a bit of Google assisted cherry picking of the statistics they can pretty much prove any of their preconceptions to be true, i.e. their favourite ultra high level language just happens to be "potentially just as fast or faster than C++, the problem is that most people don't have the skills to do it". Sigh. It's one thing to say you like a language subjectively and are more productive in it, quite another to claim it is fast when most measurements say it just isn't.

      --
      echo -e 'global _start\n _start:\n mov eax, 2\n int 80h\n jmp _start' > a.asm; nasm a.asm -f elf; ld a.o -o a;
    9. Re:93% of Programmers Think You're Wrong by Dwonis · · Score: 4, Insightful

      Thing is: You can only be expert in ONE of them. Period.

      Hundreds of cryptologists prove you wrong.

    10. Re:93% of Programmers Think You're Wrong by genner · · Score: 2, Funny

      then realized I hadn't even looked at the short skirt-wearing protagonist.

      That sound you just heard was a million slashdotters clicking on that link at the same time.....
      except me since I'm familiar with the book in question and realized long ago that she has sharp knees.

    11. Re:93% of Programmers Think You're Wrong by Nazlfrag · · Score: 2, Funny

      Oh and by the way he's a hit with the ladies! He never has problems with them (well he is a dashing 6'2" *swoon*) and he's just such a nice guy too.

    12. Re:93% of Programmers Think You're Wrong by donaldm · · Score: 2, Insightful

      Thing is: You can only be expert in ONE of them. Period.

      You can easily be expert or well informed in more than one field.

      I for one, choose CS. Waaayy more interesting, and compared to the nerdiness level of statistics, we look like Joe Sixpack coming to the club in his sports car, with two girls in the back. ;) If I want to do statistics, I can always hire someone.

      I suppose if I really want programming done I can hire someone. The problem you have here is trusting the person you hired to have done their job properly, so you want to have some understanding of what is actually required. :)

      If you are a consultant you have to have an understanding of all the fundamentals that are required to get the job done. You don't have to be an expert in all fields but you have to be able to communicate with the people that are giving input and if that requires learning what can sometimes be a difficult field then so be it.

      Any type of computing requires knowledge of "Numerical Analysis", "Statistics and Probability", "Logical thought" and surprisingly "Art". You also should be open to input from a wide variety of sometimes conflicting ideas and have the ability to determine what is the correct solution rather than just a solution as well as having the ability to reason and sometimes compromise with all parties. This is actually called human communication (sometimes diplomacy) and no one would say this is an easy thing to do.

      --
      There ain't no such thing as proprietary standards only proprietary formats. Standards are by definition open.
    13. Re:93% of Programmers Think You're Wrong by Daniel+Dvorkin · · Score: 3, Insightful

      You can only be expert in ONE of them. Period.

      [shrug] Depends on how you define "expert," I suppose. I have one MS in CS and another in biostatistics, and am currently working on a PhD in bioinformatics, where I use the knowledge I've gained in both fields pretty much every day. If you think CS is "waaayy more interesting," that's fine for you; personally I find them equally interesting and valuable.

      --
      The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.
    14. Re:93% of Programmers Think You're Wrong by obarthelemy · · Score: 2, Informative

      I'm not assuming anything, just reading the question correctly: The question is NOT "if I flip two coins and THE FIRST ONE is heads..." (answer would then indeed be 50%), but "If I flip two coins and ONE OF THEM is heads..."

      I'm listing all 4 combinations for 2 flips, and out of the 3 that satisfy the prerequisite ("one of them is heads") counting how many combinations turn up with the other one also being heads. There's one out of 3 possibilities, so that's 33%.

      --
      The Cloud - because you don't care if your apps and data are up in the air.
    15. Re:93% of Programmers Think You're Wrong by tsalmark · · Score: 2, Insightful

      I think the conversation has devolved into a language issue: a. what is the chance of two events happening. b. after some trigger, what is the chance of one event happening. 33.3%, 50%. can I go to bed now?
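
      A quick simulation sketch of the two readings, for anyone who would rather count than argue. The seed and trial count are arbitrary choices, and both coins are assumed fair and independent:

```python
import random

rng = random.Random(42)  # fixed seed, chosen arbitrarily, so the run is repeatable
trials = 200_000

at_least_one_heads = both_given_at_least_one = 0
first_is_heads = both_given_first = 0

for _ in range(trials):
    first = rng.random() < 0.5
    second = rng.random() < 0.5
    # Reading (a): we are told only that at least one of the two is heads.
    if first or second:
        at_least_one_heads += 1
        both_given_at_least_one += first and second
    # Reading (b): we are told that one particular coin (the first) is heads.
    if first:
        first_is_heads += 1
        both_given_first += second

p_one_of_them = both_given_at_least_one / at_least_one_heads
p_first_one = both_given_first / first_is_heads
print(f"'one of them is heads':   {p_one_of_them:.3f}")  # close to 1/3
print(f"'the first one is heads': {p_first_one:.3f}")    # close to 1/2
```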

    16. Re:93% of Programmers Think You're Wrong by fbjon · · Score: 2, Funny

      Everyone knows that 98.2% of all statistics are made up on the spot.

      From this we can see that 98.2% of that statistic was made up on the spot, meaning only 1.8% of all statistics are really made up on the spot. By repeated application of this we can conclude that either:

      • A: statistics made up on the spot asymptotically reaches zero
      • B: my skills in statistics are woefully inadequate.

      My god, TFA is right!

      --
      True confidence comes not from realising you are as good as your peers, but that your peers are as bad as you are.
    17. Re:93% of Programmers Think You're Wrong by Gendou · · Score: 2, Informative

      I'm not sure why I'm wasting time responding to a troll but whatever.

      > The question is 1 coin is heads, what is the probability that the other coin is heads. In other words, your girlfriend is pregnant. What are the odds that my girlfriend is also pregnant?

      No, you read it wrong. What it's actually asking is (if we pretend all girlfriends have exactly a 50% chance of being pregnant): "two girlfriends exist. At least one of the two is pregnant. What are the odds that both girlfriends are pregnant?"

      You just read it wrong and you're too stubborn to admit that you could ever be wrong, even though this puzzle is FIFTY YEARS OLD and is well documented all over the internet. Just see the Wikipedia article on it.

    18. Re:93% of Programmers Think You're Wrong by Gendou · · Score: 2, Informative

      Please see this -- this is a well-known puzzle over 50 years old, and I'm surprised that there are people on Slashdot who weren't familiar with it already.

    19. Re:93% of Programmers Think You're Wrong by ericlondaits · · Score: 2, Insightful

      Standard deviation is useless if you're not working with samples that have a normal distribution.

      Why? No. You can measure the standard deviation of any distribution, normal or not. And it is what it is, independent of distribution, it tells you how much you should expect samples to deviate from the average.

      --
      As a Slashdot discussion grows longer, the probability of an analogy involving cars approaches one.
  2. Percent probability that Zed Shaw is a jerk by Anonymous Coward · · Score: 5, Funny

    110%.

    1. Re:Percent probability that Zed Shaw is a jerk by kandela · · Score: 4, Funny

      And by that you mean 110% +/- 10% (95% confidence interval) right?

      --
      Conservation of angular momentum makes the world go round.
  3. correlation != causation by Hognoxious · · Score: 5, Funny

    Correlation != causation. Just repeat that and you don't need to know statistics.

    --
    Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    1. Re:correlation != causation by jc42 · · Score: 2, Interesting

      So what if it is 1 out of 10 million that it will happen.

      When I hear this sort of reasoning, I like to point out that with modern computers, something that happens only 1 time out of a million can very easily mean thousands of occurrences per day, each of which will get us a support call. This usually ends the discussion really fast, and they agree to properly implementing the "unlikely" edge cases.

      I've also heard the observation that in computing, statistical behavior is generally referred to as "bugs".
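
      The arithmetic behind the "thousands of occurrences per day" point, as a small sketch. The load figure of 50,000 operations per second is a made-up example, not a number from any real system:

```python
def expected_per_day(events_per_second: float, failure_probability: float) -> float:
    """Expected number of failures per day at a steady event rate."""
    seconds_per_day = 24 * 60 * 60
    return events_per_second * seconds_per_day * failure_probability


# Hypothetical load: 50,000 operations per second, each failing 1 time in a million.
per_day = expected_per_day(50_000, 1e-6)
print(round(per_day))  # 4320
```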

      --
      Those who do study history are doomed to stand helplessly by while everyone else repeats it.
    2. Re:correlation != causation by JWSmythe · · Score: 2, Insightful

      You forgot to mention that the 9,999,999 transactions are normal billing transactions, and the one that fails is the batch that actually charges their credit cards. :)

      --
      Serious? Seriousness is well above my pay grade.
  4. Your argument is dead, Zed by BadAnalogyGuy · · Score: 5, Insightful

    Maybe the problem is in your presentation. Even here, you tell programmers that you want to kill them for not understanding a topic that even you are unwilling to acknowledge mastery of. Then you tell us how hard the topic is to understand, even though you've spent so much time trying to learn it.

    Is it any wonder that no one takes your suggestions seriously? You are practically sabotaging yourself with self-effacement.

    These aren't homework problems you're tackling here. They are business problems and you need to sell yourself and your ideas if you want to get any traction. Do you have any evidence that your methods are better than the SOP thus far? Do you have any case studies that show how effective statistic analysis is in *any* of your projects?

    Or are you simply taking something that seems like a data point and extrapolating it to cover a vast swath of applications?

    1. Re:Your argument is dead, Zed by Krishnoid · · Score: 4, Funny

      Or are you simply taking something that seems like a data point and extrapolating it to cover a vast swath of applications?

      Well yeah, that's what he was saying -- statistics!

    2. Re:Your argument is dead, Zed by superdana · · Score: 4, Insightful

      Maybe the problem is in your presentation.

      Meet Zed Shaw.

    3. Re:Your argument is dead, Zed by dbIII · · Score: 2, Insightful

      It's just the "beige box is the hard drive and the screen is the computer" problem over again. People pretend they know what they are doing and make stuff up and pretend that they are confident that it is real. This really annoys those that do know what they are doing but don't want to appear to be overconfident because they haven't written the textbooks themselves.

    4. Re:Your argument is dead, Zed by arendjr · · Score: 4, Insightful

      I don't know Zed Shaw yet, but I think you're right.

      The whole problem he is describing sounds like a big ego problem. He himself has a huge ego, and has problems when he runs across the programmers, who often have huge egos as well.

      Now, I think he does make a point though. The programmers he is ranting about indeed do sound like assholes, just like he himself is. In order to be a really good programmer (or a good statistics expert) you should also know when to put aside your ego.

    5. Re:Your argument is dead, Zed by Hurricane78 · · Score: 5, Funny

      I just found a very old hard disk. Double height. MFM/RLL. And after a “strings -n 32 /dev/hdd”, I got the following old saying, carved in the bytes of the disk:

      Computer science
      Statistics
      Social skills

      Choose one.

      ;)

      --
      Any sufficiently advanced intelligence is indistinguishable from stupidity.
    6. Re:Your argument is dead, Zed by lena_10326 · · Score: 3, Insightful

      Basically, many programmers feel that everybody else around him(or her) is a stupid asshole

      That's one of the reasons working in IT is not all that satisfying. Many problems have multiple solutions which for the most part are equivalent in function but vary on what they're attempting to optimize for (* see below) yet developers seem to latch onto the solution they thought of and become downright rude and nasty when evaluating a teammate's solution. When every developer assumes he is the smartest of the bunch and all others are morons it fosters an environment where everyone is unwilling to compromise and a 3rd person usually has to step in to break the tie. That leads to a hostile workplace where thought battles frequently occur. Losing a battle causes a teammate to become afraid of undue criticism in the future, so the next time around they over-engineer the code trying to cover all bases. This leads to large systems that solve fairly simple problems with overly complex implementations. After a few cycles of this, the software is unmanageable, which becomes evidence proving to the developer that his teammates and the ones who came before are idiots with no clue, and now it is up to that lone hot shot to bitch about fixing the mess, which of course is accompanied by many nasty critiques and insinuations.

      I am a developer with a fairly open mind and I strive to eliminate ego from the workplace by staying on the positive, helpful side, but honestly I'm getting sick of working with people who don't try to do the same.

      * Example, solutions can be optimized to target maintainability, readability, CPU/IO performance, availability, reliability, correctness/precision, recovery, automation, reduction of complexity, extensibility, cross platform, resilience to change, parallelism, security, partitioning, modularization, popular design idioms. The list is nearly endless.

      --
      Camping on quad since 1996.
  5. Or, how about... by halivar · · Score: 5, Insightful

    Statisticians need to learn programming or I will kill them all.

  6. Mathematicians just need to shutup. by HornWumpus · · Score: 4, Insightful

    We know as much statistics as we need to know.

    Some know more, some less. Each has traded off hours vs. knowledge in many fields.

    For example: Why would a programmer whose job is to automate bean counting need to know more than basic statistics? (S)he rightfully focuses his or her efforts on accounting.

    One post calculus statistics course gives me enough grounding to know what I don't know and punt to experts when I need to.

    Fucking specialists forget all the things they don't know and only look at the world through one lens.

    --
    John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
    1. Re:Mathematicians just need to shutup. by gardyloo · · Score: 2, Interesting

      We know as much statistics as we need to know.

      Some know more, some less.

      That's either the most honest, insightful comment I've ever seen, or the most useless. I'm 92% sure, with an uncertainty of about +/-5%, that it's the latter.

    2. Re:Mathematicians just need to shutup. by __aasqbs9791 · · Score: 5, Insightful

      One post calculus statistics course gives me enough grounding to know what I don't know and punt to experts when I need to.

      That's actually his argument (though I'm pretty sure he doesn't realize it, having met him a few years ago at a conference). People need to know their limits, and the strengths (and weaknesses) of others, and defer to them when they know what they're talking about, rather than talking out of their asses. As you point out, you can't know everything, but you'll defer to others who know more when you need to. I'm pretty sure Zed would like working with you based upon that fact alone (I know I value that trait and try to express it myself). Far too many people think they aren't allowed to have any weaknesses (and we all do in some area or another) so they talk a big game, and when push comes to shove, they will actively block people who actually know more than they do about the subject at hand. Working with too many people like that has driven Zed insane (IMHO) and I know I've been close to it at a couple of work places before (and really loved the one that wasn't like that hardly at all).

    3. Re:Mathematicians just need to shutup. by Toonol · · Score: 5, Insightful

      But statistics is one of those fields that benefits everybody; it's a bit like probability, logic, or (further afield) history. Lack of a fundamental understanding of statistics can lead you astray in a near-infinite number of ways.

      I have sat in business meetings hundreds of times where I've seen decisions made on completely meaningless and irrelevant data, because the people involved don't understand statistics. The same holds true in your personal life; decisions with purchasing products, investing money...

      Now, I'll bet that most slashdot readers have the minimum amount of knowledge of statistics to avoid the most egregious errors; but more knowledge is certainly helpful. It will help you in a myriad of ways.

    4. Re:Mathematicians just need to shutup. by Anonymous Coward · · Score: 2, Insightful

      Being socially adept is also a skill that benefits everybody but many programmers just aren't. I hardly know anything about statistics, but I'm not afraid to ask questions. I'm sure there's stuff that other programmers know and think equally fundamental to success that Zed doesn't. It's fantastic that he's passionate about statistics. That skill certainly comes in handy, but how much more important is it than helping everyone on the team get their job done, for example?

    5. Re:Mathematicians just need to shutup. by LostCluster · · Score: 2, Insightful

      The stats book I used in college had a table where they computed out the normal distribution equation to a table that the non calc-knowing could look up. Of course, that means that table had to be distributed on finals day.

      Now, there's a funny thing when you write out a table of values. You have to make an intentional mistake, or you're not able to have an effective copyright because the infringer could claim they did the work themselves.

      I wrote a computer program to check the values to four digits (because that was the precision of the table) and found the one mistake. Funny thing, there were people who believed everything in the book had to be perfect... they also seemed to each have a favorite religion book, but the people didn't agree on the same one. The professors were alarmed... they had a problem planned for the final that was about to use that value...
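
      A sketch of what such a checking program could look like today. This is not the original program; the table excerpt is hypothetical, and its last entry is wrong on purpose so the check has something to find:

```python
import math


def standard_normal_cdf(z: float) -> float:
    """Phi(z), the standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical excerpt of a printed four-digit table (z -> Phi(z)).
# The entry for z = 2.00 is deliberately wrong.
printed_table = {0.00: 0.5000, 1.00: 0.8413, 1.96: 0.9750, 2.00: 0.9227}

# Recompute each entry to four digits and keep the ones that disagree.
mismatches = {
    z: (printed, round(standard_normal_cdf(z), 4))
    for z, printed in printed_table.items()
    if round(standard_normal_cdf(z), 4) != printed
}
print(mismatches)  # {2.0: (0.9227, 0.9772)}
```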

  7. Title fail. by girlintraining · · Score: 5, Funny

    Programmers Need To Learn Statistics Or I Will Kill Them All

    Okay, two things: First, threatening programmers never works. Management's been trying that for years. Second -- don't you mean 'kill -9' them all, or maybe demalloc(), or cast them to void*, or one of a dozen other witty things you could do besides the mundane answer of threatening stabby bits on them because you have a case of intellectual snobbery?

    --
    #fuckbeta #iamslashdot #dicemustdie
    1. Re:Title fail. by girlintraining · · Score: 3, Informative

      Don't you mean free()?

      #include <stdlib.h>
      #include <stdhumor.h>

      void demalloc(void *ptr);

      void demalloc(void *ptr)
      {
          /* I meant to say */
          free(ptr);
      }

      --
      #fuckbeta #iamslashdot #dicemustdie
    2. Re:Title fail. by Anonymous Coward · · Score: 5, Funny

      or firefox's implementation:

      void demalloc(void *ptr)
      {
          /* noop */
          return;
      }

  8. The funny thing is he's doing exactly the same by Rix · · Score: 4, Insightful

    He's just as arrogantly claiming that he's right and they're wrong. Now, he may very well in fact be right, but he's taking the same obstinate position the people he criticizes do.

    It's important to know when your input is not desired. Even if you think it should be.

  9. The reason people ignore you Zed.. by Anonymous Coward · · Score: 5, Insightful

    is not because they don't understand statistics. It is because you are a dick.

    1. Re:The reason people ignore you Zed.. by dbarclay10 · · Score: 2, Interesting

      Your comment ("the reason people ignore you is because you're a dick") is clearly a troll, but it was also moderated Insightful ... which might also be a troll :)

      Nevertheless, assuming for a moment that you're being truthful in your expression, then I have this to say:

      This is what is wrong with the world today. Billions upon millions of morons who don't know what they're doing, and people trying to show them how to (or, hell, what the fuck - people trying to beat them into) do(ing) it the right way.

      You want these assholes who can't even figure out how to correctly measure something to build the bridge you drive over twice a day? How about the building you work in?

      Or I dunno, maybe you'd prefer having _only_ people who will point out errors when they see them working on it? How about your doctor? You want your operating room filled with maybe one smart guy who recognizes an error and six people who don't know any better? And you're saying that, when the smart guy recognizes the error and tries to point it out (no matter HOW he does it, though I'm betting the original poster isn't that much of an asshat at work), he's being a dick?

      Christ, what's wrong with you? Seriously?

      --

      Barclay family motto:
      Aut agere aut mori.
      (Either action or death.)
    2. Re:The reason people ignore you Zed.. by Anonymous Coward · · Score: 4, Insightful

      Claiming that the author is a dick is not mutually exclusive to him having a good point. The author is right in his claims that people who don't know what they're talking about often think they do and get pissy when someone claims otherwise. But the author presents this viewpoint in a really stupid manner. It is dickish to say, essentially, "Hey idiot, you're wrong", even if the person is wrong.

      Note how your response is dickish, but probably right in claiming that the world is filled with arrogant/stubborn people.

    3. Re:The reason people ignore you Zed.. by Jedi+Alec · · Score: 2, Insightful

      Ah, the good old clash between the real world and the way you(we?) think it should be.

      Pointing out that people are wrong is a sensitive process. If you do it the wrong way, you provoke an emotional response that stops the person you're trying to convince from absorbing what you're trying to tell them.

      It doesn't matter if you're right or wrong, if you convey your information in a way that is perceived as "being a dick" it will never reach its destination. That sucks, but it's just the way most human beings work. And I very much doubt that this is a root cause of "what is wrong with the world today", unless people getting pissed off because some know-it-all jackass is telling them they're a moron is a recent development.

      --

      People replying to my sig annoy me. That's why I change it all the time.
    4. Re:The reason people ignore you Zed.. by freedomlinux · · Score: 2, Insightful

      I think all statisticians should have to learn written communication skills.
      Zed sure embarrasses himself by writing such an atrocious piece of garbage.

      Maybe people would listen to Zed if he didn't:
      a.) Depend on vulgar language to emphasize an argument (and subsequently)
      b.) Prove himself a huge douchebag.

  10. Statistics is HARD by omb · · Score: 4, Informative

    Statistics is HARD, for two reasons:

    (a) Probability theory, on which all practical Statistics is based, is both (i) counter-intuitive and (ii) difficult

    (b) The very Mathematics on which it is based is obscure

    And, worst of all, it is uniformly badly taught, even in good universities, and the Statistics for XXX are uniformly awful, blind leading the blind.

    Lastly it is very hard to get a straight answer from a mathematical Statistician.

    1. Re:Statistics is HARD by codewarren · · Score: 3, Funny

      Statistics for XXX are uniformly awful, blind leading the blind.

      They have statistics for porn? (!!)

      What could be wrong with that? And blind on blind action? Strange, but interesting.

    2. Re:Statistics is HARD by radtea · · Score: 4, Insightful

      Statistics is HARD, for two reasons:

      I'd argue that probability theory isn't as hard as people make it seem, but statisticians are wankers. Most of what we think of as statistics was developed by people who were intimately engaged with empirical research, but modern statisticians are mathematicians, many of whom have never actually performed an experiment. They think the statistics are real, whereas experimental scientists know the truth: God made the Probability Distribution Functions. All else is the work of man.

      Furthermore, modern computing has made a lot of the conceptual apparatus of conventional statistics irrelevant, as it is designed to deal with the problem of reducing problems to something that can be computed by hand and finished off with a single table lookup. Today it's a rare case that we can't get at the PDFs directly, bypassing much of conventional statistics. But due to how badly the stats are taught, and how poorly probability theory is understood, we are still living in a world where p-values are the exception, not the norm, and when they are quoted they are frequently unrealistic because they are based on statistical assumptions that are not warranted given the non-idealities of the data.

      So I'd argue that statistics is basically a dead field populated by zombies who are dedicated to infecting as many students as possible. If we taught thermodynamics or mechanics with equally outmoded concepts they would be really hard too.
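
      One way to read "getting at the PDFs directly" is resampling. The sketch below is a plain percentile bootstrap for a median, named here as a swapped-in illustration rather than anything specific the comment proposes; the data are simulated and the seed is arbitrary:

```python
import random
import statistics

rng = random.Random(0)  # arbitrary fixed seed for repeatability

# Hypothetical skewed sample (say, response times in seconds).
sample = [rng.expovariate(1.0) for _ in range(200)]
sample_median = statistics.median(sample)

# Resample with replacement and recompute the statistic each time;
# the sorted results approximate the sampling distribution of the median.
boot_medians = sorted(
    statistics.median(rng.choices(sample, k=len(sample)))
    for _ in range(2000)
)

# Percentile interval: the middle 95% of the resampled medians.
low = boot_medians[int(0.025 * len(boot_medians))]
high = boot_medians[int(0.975 * len(boot_medians)) - 1]
print(f"median={sample_median:.3f}, 95% interval=({low:.3f}, {high:.3f})")
```

      No normality assumption and no table lookup are involved, which is roughly the appeal being described; the cost is a few thousand recomputations.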

      --
      Blasphemy is a human right. Blasphemophobia kills.
    3. Re:Statistics is HARD by thesandtiger · · Score: 5, Interesting

      I don't think it's hard - I just think it requires a different way of thinking than most programmers usually take to maths.

      As a programmer/developer who went into research (in social sciences, so it's really soft), I can say that in my experience stats is really closer to a programming language than it is to other maths. Here's why:

      1) You have a LOT of tools to pick from. What kind of analysis do you want to do? What kind will give you the most useful result? What kind is your data amenable to?

      2) You don't always have a clear choice as to which is the best for a given situation. Sometimes you need multiple different types of analysis to really get the full picture.

      3) Just because it's math doesn't always mean it's right. There's some crazy ass black-box magic stats stuff we use for one project of ours that, in theory, will let us figure out the demographic composition of an unknown target population. Maybe. Sometimes. If the wind is right. Or not.

      4) At the advanced levels, it's fucking insane. People who hack stuff like ultra optimized 3d engines with large quantities of assembler or whatever always wigged me out because my brain just doesn't work that way. With the really complex stats stuff it's the same way - I can plug and chug with the formulas, but I honestly have about as much comprehension of why some of the more advanced stuff works as my dog has of CPU design.

      5) If you know the basics, you know just enough to be dangerous and really piss off people who know what they're doing. Being able to run an anova or determine correlation makes some people think they actually know what's going on because, hey, it's math. But a lot of people who just do the basic stuff think their results are more meaningful than they actually are - falling prey to the whole "it's statistically significant therefore it must be IMPORTANT" fallacy (when you can certainly have things that are "statistically significant" but actually have virtually no impact on the outcome).

      6) Even when people know their shit, they disagree. A fine example of this would be the Space Shuttle failure rate - you had people saying that the shuttle would suffer a critical failure from everywhere between 1 in 5 and 1 in 50,000 launches. And depending on what tools they used to do their analysis, they were correct. Same as with programming languages - depending on the problem, equally skilled programmers might pick entirely different languages to use because they think one part or another is more critical.
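      Point 5 is easy to see with a toy simulation. This is only a sketch under made-up assumptions: two groups of 100,000 normally distributed values whose true means differ by 0.03 standard deviations, compared with a large-sample z-test (the names `two_sample_z`, `control` and `treated` are invented for the example). The p-value comes out tiny while the effect size (Cohen's d) stays negligible.

```python
import math
import random
from statistics import NormalDist, fmean, variance

def two_sample_z(a, b):
    """Large-sample z-test for a difference in means.

    Returns (two-sided p-value, Cohen's d). Assumes both samples are big
    enough that the normal approximation is reasonable.
    """
    va, vb = variance(a), variance(b)
    diff = fmean(b) - fmean(a)
    se = math.sqrt(va / len(a) + vb / len(b))
    p_value = 2 * NormalDist().cdf(-abs(diff / se))
    cohens_d = diff / math.sqrt((va + vb) / 2)
    return p_value, cohens_d

rng = random.Random(0)            # fixed seed so the run is repeatable
n = 100_000                       # hypothetical sample size per group
control = [rng.gauss(0.00, 1.0) for _ in range(n)]
treated = [rng.gauss(0.03, 1.0) for _ in range(n)]  # true shift: 0.03 sd

p_value, d = two_sample_z(control, treated)
print(f"p = {p_value:.2g}, Cohen's d = {d:.3f}")
```

      With samples this large almost any nonzero difference clears p < 0.05, which is why the effect size belongs next to the p-value.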

      Honestly, I really enjoy stats - if I had to do it all over again I would probably have spent a LOT more time working with stats than I did as a programmer in my younger years - but I won't pretend that it's totally clear what tools to use when. The author of TFA would do well to realize that even fellow statisticians would probably slap the shit out of him over some of his beliefs about how to properly go about utilizing stats toolsets.

      --
      Since I can't tell them apart, I treat all ACs as the same person.
    4. Re:Statistics is HARD by Anonymous Coward · · Score: 2, Insightful

      The mathematics behind statistics is _not_ "Calculus 2". It is measure theory and analysis.

    5. Re:Statistics is HARD by omb · · Score: 2, Insightful

      Sorry, the replies indicate just how correct what I wrote was:

      1. It is not about formulas, or Calculus xxx, it is about really understanding what you are doing, and how all the formulas were derived, and some of that is really heavy Pure Mathematics in particular Algebra and Analysis, so that, if necessary, you can work out the probability theory in new situations.

      2. In addition to the Math, there is Logic, Philosophy and Science in Experimental Design.

      The big problem is that people who just know the formulae misapply them to the wrong experimental situations.

      The most topical current example is the AGW controversy where some Climatologists, HAD-CRU, eliminated (perceived outlier) data not realising that would mean that confidence estimators on their data were thereby faulty, so all that work must be re-done.

    6. Re:Statistics is HARD by kramerd · · Score: 2, Informative

      Thought experiment 1 - this would be a significant finding, provided that you did not ask how many days have passed as the number you are asked for; if you ask for a number and they respond in a pattern, that is a significant finding. It would be statistically significant, however, that patterns are used if you instruct the person to follow such a specific pattern, because you removed the opportunity for variability by your instruction. You don't have a random sample if there is no variability in a population. I already covered this.

      A sample, on the other hand, would not be one person responding with a number day after day after day; rather it would at least be hundreds (if not thousands) if you wish to extrapolate to the population (either people or numbers). You can't do it with 1 data point.

      Your third paragraph is confusing, because it is a paraphrase of what I said, only it doesn't fit with everything else you say in your reply.

      Second thought experiment - Sex isn't male (neither are y-chromosomes), you are thinking of gender. Regardless, you would find a statistically significant relationship between males having a y-chromosome, because it's part of the definition of how you define gender. This would be like sampling a population of red cars to determine if they are red.

      Try coming up with a population that has variability, so that taking a sample makes sense, and you will see that statistical significance matters.

  11. Go ahead and try it by thetoadwarrior · · Score: 3, Insightful

    I know enough about statistics to know statistically I know I'm safe from his threats. I suspect if I were a bag of Cheetos the odds would be against me but that's not the case.

  12. It's not just statistics by im_thatoneguy · · Score: 2, Insightful

    I've found that more than just about any other degree Computer Science and to a lesser extent Medical Degrees imbue the recipient with an unnatural ego when it comes to subjects with which they are unfamiliar. I propose we remove the word Science from CS degrees and call it what it is "Computer Programming and Troubleshooting". There are far too many CS graduates who think they are actually scientists.

    1. Re:It's not just statistics by radarsat1 · · Score: 4, Insightful

      I disagree that CS is just "programming and troubleshooting", but I do agree that Computer Science is a complete misnomer. It's extremely misleading, and difficult to explain to people, "I'm a computer scientist, but no I'm not actually a scientist, instead I understand how to describe formal languages in terms of strict grammar rules and transform abstract syntax trees from one representation to another."

      It shouldn't be called Computer Science, it should be called Computational Mathematics, because that's what it is.

      (On the other hand, there is a whole branch of CS that extends very deeply into statistics called Machine Learning, but at the core I'd say it is still more mathematics than science. There is also human-machine interaction which often goes under CS, but is actually more like psychology.. so it's not so cut and dry.)

    2. Re:It's not just statistics by Dahamma · · Score: 2, Insightful

      Maybe wherever you went to school they taught "computer troubleshooting" as a degree, but some of us actually got a solid foundation in the various theoretical and practical foundations of computer software engineering.

      Though I do agree that "Computer Science" is a stupid name. They already have Mechanical Engineering, Chemical Engineering, Electrical Engineering, etc - why not just call it "Software Engineering"? [I'd say "Computer Engineering", but since that was my major and I also had to do transistor physics and VLSI design, I guess it does need to be separate...]

  13. sounds impossible to please? by v1 · · Score: 3, Insightful

    I've been studying it for years and years and still don't think I know anything.

    And yet you're expecting someone whose expertise is in a different field to know more about it than you?

    We can't all be experts in everything. If you're the expert in the field of discussion, get used to educating your coworkers on the topic, or find another job where you're surrounded by people with the same education and expertise as you.

    The average person is an expert in no more than two or three related areas. That's why people work in teams, to cover each other's blind spots.

    --
    I work for the Department of Redundancy Department.
  14. Zed Shaw is a tosser. by toby · · Score: 2, Informative

    Nothing new to see here.

    --
    you had me at #!
    1. Re:Zed Shaw is a tosser. by mhelander · · Score: 2, Insightful

      Plus that hallmark observation of wise, old men in any profession: Whenever you see a power of ten, chances are the number is completely made up.

  15. Stats? Fuck that. by delysid-x · · Score: 2, Informative

    Statistics is WAY beyond what a programmer cares about. Logic is all that matters. Statistics->logic is the problem of the software engineer, not the programmer.

  16. He makes some good points... by SanityInAnarchy · · Score: 5, Insightful

    ...unfortunately, they are mostly lost in the irony of statements like this:

    I think women are better programmers because they have less ego and are typically more interested in the gear rather than the pissing contest.

    I doubt I've seen anyone more thoroughly entrenched in a pissing contest than Zed Shaw, of the website formerly known as "Zed's So Fucking Awesome".

    --
    Don't thank God, thank a doctor!
  17. Statistical analysis of the summary by mmmmbeer · · Score: 2, Interesting

    Let's see, we have one guy complaining about how none of his programmer coworkers understand statistics, and we have X coworkers who undoubtedly disagree with him. Since we do not know him or any of his colleagues to any meaningful degree, we have to assign equal weight to each of their opinions. Statistics then tells us there is a 1/(X+1) chance of his being right, and an X/(X+1) chance of their being right. We can assume that X >= 2 based on his ranting, therefore resulting in the odds favoring them by at least 2/3, and probably much more. Therefore it is only rational to assume they are correct.

    1. Re:Statistical analysis of the summary by Ian_Mi · · Score: 2, Funny

      I think your statistics are flawed. To give equal weight to each person's opinion we should assume that each person has an independent probability, p, of being right. Then the probability of Zed being right and the others being wrong would be p (1-p)^10 while the probability of the others being right and Zed being wrong would be p^10 (1-p). Since these events are disjoint, the probability of Zed being right given that one of these two events occurred would be p (1-p)^10 / (p (1-p)^10 + p^10 (1-p)) = (1-p)^9 / ((1-p)^9 + p^9) while the probability of the others being right would be p^9 / ((1-p)^9 + p^9). Thus if p is less than 1/2 then Zed is more likely to be correct.
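      For anyone who wants to check the algebra numerically, here is a small sketch. It assumes, as the comment does, ten coworkers who are each right independently with probability p; the function name `prob_zed_right` is made up for the example.

```python
def prob_zed_right(p, others=10):
    """P(Zed is right | exactly one side is entirely right).

    Assumes every person, Zed included, is right independently with
    probability p, and that there are `others` coworkers.
    """
    zed_only = p * (1 - p) ** others        # Zed right, all others wrong
    others_only = (1 - p) * p ** others     # all others right, Zed wrong
    return zed_only / (zed_only + others_only)

for p in (0.3, 0.5, 0.7):
    closed_form = (1 - p) ** 9 / ((1 - p) ** 9 + p ** 9)
    print(p, round(prob_zed_right(p), 5), round(closed_form, 5))
```

      The two columns agree, the value is exactly 1/2 at p = 1/2, and it moves toward 1 as p drops below 1/2.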

    2. Re:Statistical analysis of the summary by brian_tanner · · Score: 5, Informative

      Wow. What class did you take that says if you don't know something you should assume equal probability?

      I don't know if there is an invisible elephant in my kitchen, so I guess I should assign equal probability to both outcomes. I also don't really know how Baccarat works, I guess my odds are 50/50.

      Without knowing something about him or his coworkers, you by definition cannot make any statistical statements. To make any statements, you would first need to make some observations. This is how statistics is different from logic. Statistics is grounded in data.

      I don't agree with Zed, but you may have just proved his point.

  18. Re:Logic and Reason *ARE* superior to evidence and by AnotherUsername · · Score: 2, Insightful

    I prefer logic and reason mixed with evidence and observation.

    If you just have logic and reason, then you get religion. Logically, it worked out when it was created. There is no evidence to counter it, so it must be true. Religion was created with logical reasoning. Some may say it was incorrect reasoning, but it was reasoning nonetheless.

    On the other hand, if you just have observable evidence, with no logical reasoning, you can have all the data in the world, but you will have nothing to use it with. True, you can see it, but you cannot understand why it is the way it is.

    Having all of one or the other is useless.

    --
    I don't like Linux. This doesn't make me a troll.
  19. lies, damned lies... by yalap · · Score: 3, Funny

    Lies, damned lies and statistics. Us programmers are too busy dealing with the first two to ever reach the third..

  20. Re:Reply from a programmer that knows no statistic by doublegauss · · Score: 2, Informative

    You probably still think I am a lunatic, but hear me out.

    You don't qualify as a lunatic; just as someone who has no idea of what he's talking about. Absolutely no idea. Your post, my friend, is so full of ideas you obviously misunderstood that I won't even attempt to make a list.

    And yes, I do statistics for a living.

  21. Summarized for people who don't want to read Zed by SanityInAnarchy · · Score: 4, Insightful

    So, since so many people don't seem to want to actually read Zed's stuff -- and I honestly don't blame you -- I'll try to summarize:

    Eventually, every major science adopted an empiricist view of the world. Except Computer Science of course.

    He tends to bitch a lot about computer scientists. I'm just starting a CS degree, and there is a Statistics class in the curriculum. Is he working with people with good degrees, people from a technical college with a "programming" degree, people from a diploma mill, or high school students with no degree at all?

    Of course, he seems to be implying it's everyone, and doing so in a typically Zed-like way.

    "All you need to do is run that test [insert power-of-ten] times and then do an average." Usually the power-of-ten is 1000...

    I don't know that I've ever heard that particular statement. But it's a good point:

    How do you know that 1000 is the correct number of iterations to improve the power of the experiment?

    Generally because it was probably closer to a million, so I'm erring on the side of taking more, rather than fewer, measurements. But without careful consideration, I could be way off.
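    One rough way to replace the guess with a number is to do a short pilot run and size the real run from its spread. The sketch below uses the textbook normal-approximation formula n = (z * s / E)^2 for estimating a mean to within +/- E. It assumes the measurements are independent and roughly stationary, which ramp-up effects can easily violate, and the function name and timings are made up.

```python
import math
from statistics import NormalDist, stdev

def runs_needed(pilot, margin, confidence=0.95):
    """Estimate how many runs put the mean within +/- margin.

    Uses the normal approximation n = (z * s / margin)^2, where s is the
    sample standard deviation of a pilot run. Assumes independent runs.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * stdev(pilot) / margin) ** 2)

# Hypothetical pilot timings in milliseconds.
pilot_ms = [102, 97, 110, 95, 121, 99, 104, 93, 108, 101]
print("runs for +/- 1 ms:", runs_needed(pilot_ms, margin=1.0))
print("runs for +/- 2 ms:", runs_needed(pilot_ms, margin=2.0))
```

    Halving the margin roughly quadruples the number of runs, so "1000" is only right by accident.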

    How are you performing the samplings?

    I think this is vastly less important than how you are dealing with the data, but it is also a good point. For example, his complaint is that an average isn't enough; with detailed enough logging, he could easily go back into my data and figure out min, max, standard deviation, histograms...

    How do you know that 1000 is enough to get the process into a steady state after the ramp-up period?

    Not a huge deal -- the "steady state" will almost certainly be faster than the "ramp-up" period. Worst case, I'm over-optimizing.

    What will you do if the 1000 tests takes 10 hours?

    Either ctrl+c, or try it 10 times.

    How does 1000 sequential requests help you determine the performance under load?

    Very good point here. It's still a useful statistic, but you still need to measure things like 1000 simultaneous requests, not just 1000 all in sequence.

    On the other hand, if your performance is acceptable with them all in sequence, you could just run it through something like Event Machine, so it's all sequential on production, too.

    The most troubling problem with these single number “averages” is that there’s two common averages and that without some form of range or variance error they are useless. If you take a look at the previous graphs you can see visually why this is a problem. Two averages can be the same, but hide massive differences in behavior...

    So yes, always make sure you can record enough statistics so that someone else can come along and use your data to give you something meaningful.

    The moral of the story is that if you give an average without standard deviations then you’re totally missing the entire point of even trying to measure something. A major goal of measurement is to develop a succinct and accurate picture of what’s going on...

    It doesn't have to be statistically accurate. It just has to be close enough.
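    To make the "same average, different behavior" complaint concrete, here is a minimal sketch with two made-up sets of response times that share a mean of exactly 100 ms. The `summarize` helper and the numbers are invented for illustration.

```python
from statistics import fmean, median, stdev

def summarize(samples):
    """Return the numbers a bare average leaves out."""
    return {
        "mean": fmean(samples),
        "median": median(samples),
        "stdev": stdev(samples),
        "min": min(samples),
        "max": max(samples),
    }

# Made-up response times in milliseconds; both lists average exactly 100.
steady = [100, 101, 99, 100, 100, 101, 99, 100]
spiky = [60, 62, 61, 63, 60, 61, 62, 371]

for name, samples in (("steady", steady), ("spiky", spiky)):
    print(name, summarize(samples))
```

    The means match, but the standard deviation, median and max show that the second service is usually faster and occasionally terrible.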

    Ah, confounding. The most difficult thing to explain to a programmer, yet the most elementary part of all scientific experimentation. It’s pretty simple: If you want to measure something, then don’t measure other shit.

    This is both a very good and a very bad idea. It ties into the peeve he had before -- ramp-up time. For example:

    If we want to take one single line of code and test it then we can. If we want to only verify one single query on a database then what’s stopping us?

    What's stopping us is that our applications don't actually work like that.

    --
    Don't thank God, thank a doctor!
  22. Re:Logic and Reason *ARE* superior to evidence and by line-bundle · · Score: 2, Funny

    No, Logic and Reason are superior to Cubase.

    It's a music joke, laugh.

  23. Knowledge isn't the problem by NitWit005 · · Score: 2, Informative

    From his complaints, I can tell knowledge isn't the real issue. Testing performance takes a huge amount of time. You need to simulate other programs running, multiple users and make sure the test matches what real users might do. Generally, this requires writing completely independent test programs and charting the logging from them. People just don't want to go to that kind of effort. It can take weeks just to create proper tests for complex programs like web servers.

  24. Re:Bitch, while you were writing all that jive by cheftw · · Score: 2, Funny

    I can vouch for this. You might think AC just spends all his time on /. but the reality is that he's a real big-shot who can afford to make ridiculous claims.

    --
    Always back up, never back down. ---- Think you're cool 'cos your uid is prime? Take mine, modulo the one digit integers
  25. 90% of the programming game by presidenteloco · · Score: 2, Funny

    is one half mental.

    of course that explains why 90% of all programs written are CRUD.

    -with apologies to Yogi Berra, Theodore Sturgeon, and a 20% apology, as a matter of principle, to a guy called Pareto.

    --

    Where are we going and why are we in a handbasket?
  26. It's the Zed Effect by greg_barton · · Score: 3, Interesting

    The Zed Effect: Whether you're right or wrong people will disagree with you just to piss you off.

  27. Everyone should learn statistics by jackchance · · Score: 4, Informative

    Before computers, stats involved using parametric tests (t-tests, anova, etc) which made assumptions like "the data comes from an underlying normal distribution". BTW, in stats terms "normal" means "Gaussian".

    Now, with cheap and fast computers, we can actually compute the confidence intervals non-parametrically through permutation tests and bootstrapping without assuming anything about underlying distributions. In most cases, this non-parametric test is the "right thing to do". Most of the time, the results are the same as using a parametric test.
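    As an illustration of the bootstrapping idea, here is a minimal percentile-bootstrap sketch for a confidence interval on the mean. It is one simple variant among several, the data are made up, and `bootstrap_ci` is a name invented for this example.

```python
import random
from statistics import fmean

def bootstrap_ci(data, stat=fmean, reps=5000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for `stat`.

    Resamples the data with replacement `reps` times and reads the
    interval off the sorted resampled statistics. No distribution is
    assumed, but the sample must be representative of the population.
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(reps))
    lo_idx = round((1 - confidence) / 2 * reps)
    hi_idx = round((1 + confidence) / 2 * reps) - 1
    return estimates[lo_idx], estimates[hi_idx]

# Hypothetical, clearly non-Gaussian measurements (two large outliers).
data = [12, 15, 11, 14, 90, 13, 12, 16, 11, 13, 14, 12, 75, 13, 15]
low, high = bootstrap_ci(data)
print(f"mean = {fmean(data):.1f}, 95% CI roughly [{low:.1f}, {high:.1f}]")
```

    Passing `stat=median` (from the `statistics` module) gives an interval for the median with no other changes.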

    However, a HUGE disaster in empirical science has been the problem of multiple comparisons. With computers it is so easy to compute correlations and significance tests between every possible slice of your data set. Many "scientists" don't have good statistical knowledge and pray at the altar of "p < 0.05". They don't know about or understand the problem of multiple comparisons. So they do 20 tests, find one that comes out p < 0.05 and write a paper about it. They don't get that if you do 20 tests you are very very very likely to find one that comes out p < 0.05.

    Anyone who has access to excel or matlab can do this little experiment.

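    samp = randn(50,1);        % 50 normally distributed random numbers (mean=0, var=1)

    sig = zeros(1,100);        % one result per test
    for x = 1:100
        test = randn(50,1);    % a fresh sample from the same distribution
        sig(x) = ttest(samp,test);   % 1 if the t-test rejects at the 5% level
    end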

    now look at the sig vector. OMG, 5% of the tests came out significant!!!

    Now you are writing a paper all about how x is linked to y. But you are essentially throwing dice and then writing a paper about why it came up '3-3'.
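    For 20 independent tests at the 5% level, the chance of at least one false positive is 1 - 0.95^20, roughly 64%. The sketch below checks that by simulation. To stay within the Python standard library it uses a z-test with known variance instead of a t-test, which is a substitution made for this example, and all names are invented.

```python
import math
import random
from statistics import NormalDist, fmean

ALPHA = 0.05

def null_p_value(rng, n=30):
    """p-value of a two-sample z-test when both samples come from N(0, 1)."""
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]
    z = (fmean(a) - fmean(b)) / math.sqrt(2 / n)  # variance is known to be 1
    return 2 * NormalDist().cdf(-abs(z))

def family_wise_rate(tests_per_study=20, studies=500, seed=1):
    """Fraction of simulated studies with at least one 'significant' null test."""
    rng = random.Random(seed)
    hits = sum(
        any(null_p_value(rng) < ALPHA for _ in range(tests_per_study))
        for _ in range(studies)
    )
    return hits / studies

expected = 1 - (1 - ALPHA) ** 20   # analytic value for 20 independent tests
simulated = family_wise_rate()
print(f"analytic: {expected:.3f}  simulated: {simulated:.3f}")
```

    A Bonferroni-style fix is to test each comparison at ALPHA / 20, which brings the family-wise rate back down to about 5%.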

    --
    1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765
    1. Re:Everyone should learn statistics by Daniel+Dvorkin · · Score: 5, Interesting

      Resampling-based statistics haven't replaced parametric models, and I doubt they ever will, for one very simple reason: as the available processing power grows, so does the amount of data. In my field, bioinformatics, the size and complexity of the data sets follows a Moore's Law of its own, and I don't think bioinformatics is unique in this. "Just bootstrap it" is easy to say, and certainly there have been many times when dealing with an analytically intractable distribution when I've done just that, but if the analytical solution takes minutes and the bootstrap solution takes weeks, you have to take this into account.

      Of course, resampling isn't the only way to look at problems non-parametrically. Often a good compromise is to go with rank-based statistics, which are fast and easy to calculate -- and you may not have an analytically tractable model for the distribution of the original data, but you don't have to, since by working with ranks you can define a distribution with good analytical properties. You still need to do some reality-checking exploratory data analysis, of course, but this is an approach that generally works well in practice.
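      A minimal sketch of one rank-based test, the Mann-Whitney U (rank-sum) test with a normal approximation. It is an illustration rather than a production implementation: the tie correction to the variance is left out, the approximation is poor for very small samples, and the function names are made up.

```python
import math
from statistics import NormalDist

def average_ranks(values):
    """1-based ranks of `values`, with ties sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(a, b):
    """Return (U statistic for `a`, two-sided p-value by normal approximation)."""
    ranks = average_ranks(list(a) + list(b))
    na, nb = len(a), len(b)
    u_a = sum(ranks[:na]) - na * (na + 1) / 2
    mean_u = na * nb / 2
    sd_u = math.sqrt(na * nb * (na + nb + 1) / 12)  # no tie correction
    z = (u_a - mean_u) / sd_u
    return u_a, 2 * NormalDist().cdf(-abs(z))

# Only the ordering matters, so a wild outlier does not change the result.
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
print(mann_whitney_u([1, 2, 3], [4, 5, 6000]))
```

      Sorting dominates the cost, so this runs in O(n log n) time, which is the "fast and easy to calculate" part.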

      --
      The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.
  28. Re:Reply from a programmer that knows no statistic by Improv · · Score: 2, Interesting

    In practice, statistics is an attempt to quantify messy, uncertain events into a figure. We can even measure the extent to which this works, roughly speaking. Your hard drive has a rough time-to-failure, based on analyses of the things that tend to go wrong in that system. Sure, any time it fails, it's not statistics that broke it; it's one of the kinds of problems captured in the statistical analysis. And sure, you could break it down further for disks and note that the controller has a different failure rate than some other component, just as a bridge has a number of possible failures. Problem is, for any of those, you could break it down further and get failure rates for subcomponents, regions, etc. So what? It's still useful to have statistical measures - the real world is complex, and statistics helps us capture things we otherwise couldn't.

    Programmers (particularly but not only young programmers) might not like to acknowledge any field but their own has any depth ("Everything is simple! Just do it my way", hence Ron Paul/Ayn Rand fanboyism and all sorts of other stupidities) - I don't know if there's a lot we can do but hope they grow out of it (It took me awhile to do it, as did a number of people I knew when I was younger, but I made it out).

    Basically, if your worldview doesn't wed empiricism and a reasonably flexible practical philosophy, your worldview is (if you err on the pro-logic end) too inflexible and you're going to miss out on standing on the shoulders of giants. Neither the logician nor the mystic understands the world.

    --
    For every problem, there is at least one solution that is simple, neat, and wrong.
  29. Re:There is a spell check in the comment box... by fast+turtle · · Score: 3, Funny

    What Spell Check? I didn't know I was writing a Spell. Is it a good or evil spell?

    Damn it's evil. Now I've got to listen to da da da de - da da de bop all the time.

    --
    Mod me up/Mod me down: I wont frown as I've no crown
  30. Re:Very good (from someone who's taken BOTH)... ap by LSD-OBS · · Score: 3, Insightful

    Yup. Also, for a guy who claims to know so much about statistics and measurement, it's weird how he judges programmers so sweepingly on the sole basis of his anecdotal experiences.

    --
    Today's weirdness is tomorrow's reason why. -- Hunter S. Thompson
  31. Zed Shaw sounds like a douche. by Evil+Shabazz · · Score: 2, Informative

    So I read through his article. Yes, the whole mindless rant. The conclusion that one should REALLY draw from it is: Zed Shaw is a douche with Asperger's who clearly feels like his own personal area of expertise is underappreciated. Hey Zed, get over it.

    --
    Down with the career politician! SUPPORT TERM LIMITS
  32. Wikipedia on Zed Shaw by Selfbain · · Score: 2, Informative

    I like how the first part of his Wikipedia article says "Zed A. Shaw is a troll" with four citations.

    --
    Well, it has never been successfully tested.
  33. Acknowledged Difficulty is a Good Sign by weston · · Score: 2, Interesting

    not understanding a topic that even you are unwilling to acknowledge mastery of.

    Personally, I think that little acknowledgment increases his credibility quite a bit. It suggests to me that he's actually spent some real time coming to grips not just with the glossy overview you get in a high school or college course but with some of the devilish subtleties of actually using the stuff.

    The funny thing about knowledge... the more it grows, the bigger you realize the frontier is. So, how good of a heuristic is apparent confidence?

    1. Re:Acknowledged Difficulty is a Good Sign by Hurricane78 · · Score: 2, Funny

      No, it doesn’t. If someone spends years and years on a topic, and still has the feeling he understands nothing at all, then clearly, he’s just too dumb for it.

      It’s like high voltage without high current. The result is a not very bright and maybe even destroyed lamp.

      --
      Any sufficiently advanced intelligence is indistinguishable from stupidity.
  34. Obligatory Stats Joke by frank249 · · Score: 4, Funny

    "I construct two sets of n=100 random samples from the normal distribution. Now, if I just take the average (mean or median) of these two sets they seem almost the same."

    So it's true. The n's justifies the means.

    --

    Today's vices may be tomorrow's virtues.

  35. yea by unity100 · · Score: 2, Insightful

    please tell me whether you would like to rely on decision theory, game theory or utilitarian techniques to handle life chances of your children or their sensitive private/critical information in a database.

  36. Re:Very good (from someone who's taken BOTH)... ap by JWSmythe · · Score: 5, Informative

    1.) EASILY SKEWED (as in "4/5 dentists chew trident", oh "sure, sure", especially when they're on the corporate payroll (or paid off to say so by said corporation so their "evidence & observation looks good")

    and

    2.) IS THE SAMPLE SET LARGE & COMPREHENSIVE ENOUGH? (most?? Most are not, period)...

    You know, that particular citation has made me wonder in the past, but not enough to actually research it. So, I went off looking for more information and found it.

        The statistic was generated from a July 1976 survey.

        The sample group for this statistic was 1,200 dentists. These dentists were hand picked by the research company, probably with good reason.

        They were asked, what advice would they give gum-chewing patients

        1) sugared gum
        2) sugarless gum
        3) no gum at all.

        Sugarless gum got 85% of the vote. Not terribly surprising. I'd be fairly confident that their time had been paid for, or at very least they were told "This survey is being done for Trident Sugarless Gum." That is only speculation, so hush up.

        17/20 doesn't really sound very good. It just doesn't stick in your head. 4/5 is close enough, even though it reduces your answer to 80% (ahhh, a lie). Since these are marketing folks, I'm sure they pushed all kinds of values past focus groups, until "4 in 5" was accepted as most favorable.

        As the link cites, they're fairly confident that the "sugared gum" answer got at least one response. There's always someone that'll take the obvious wrong answer. If you don't believe that, look at any Slashdot poll. :)

        What they don't say is how many of the 1,200 samples were dropped. I'm sure there were non-responses, and they could have easily added any number of unfavorable answers in as non-responses. Of course, they couldn't have 100% in their favor, so they had to keep some.

    --
    Serious? Seriousness is well above my pay grade.
  37. He's not claiming they are wrong - they are unset by SuperKendall · · Score: 3, Interesting

    He's just as arrogantly claiming that he's right and they're wrong.

    No he doesn't.

    He claims that programmers need to understand statistics more. The people he is talking about are therefore not wrong - they are ignorant.

    But that term is loaded with negative meaning, it's more accurate to say they are like a variable named "statistics" with a value that has never been set. Basically, they don't know what they are missing.

    It's like when programmers try to argue about how a language is bad when they've never used it. How would they know? Yet many without understanding of statistics are saying the same thing, they don't need to know any more.

    I know enough to know statistics can be a valuable tool. Why would you not want another tool that could help you? The people who refuse to do so are less than they could be (as a programmer).

    --
    "There is more worth loving than we have strength to love." - Brian Jay Stanley
  38. Is it dumbed down enough for management? by upuv · · Score: 3, Interesting

    I hear you, I do performance engineering of web based systems. The developers, the managers, the testers, the architects all have no clue. You are correct here.

    However if you can not present your "theory" of how to do something in a dumbed down enough format then who cares. Because the pretty graph is pointless. It will be mis-interpreted, mis-understood, and mis-used.

    All the stats theory on the planet will not get you past the dumb manager or developer. Don't lose sleep over this. There is no point. Simply find metrics in your analysis procedure that do mean something to these people. They may not be the total picture but they are something. Build a reputation for being correct by starting with simple things. You are always going to butt heads with a know-it-all developer / architect / manager. Fine, let them go off and waste money and time. They will be found out as morons in time. You do your thing and simply become the guy to ask about performance and how to do this.

    Being understated and consistently showing above average results for your work is how you will rise up. Being an A-hole about it is not going to help anyone. As a matter of fact I would can your butt for being a D#ck.

  39. Opposite problem here by Kludge · · Score: 2, Interesting

    I question their metrics and they try to back it up with lame attempts at statistical reasoning. I really can't blame them since they were probably told in college that logic and reason are superior to evidence and observation.

    I work with a number of statisticians and I have the opposite problem. They look at the data, apply mathematical transforms to it, and come to a conclusion, whether that conclusion makes any sense or not. They make little attempt to reason that the data may be flawed (which experiments often are), or does not really represent what we are trying to measure, or they are using the wrong statistic to summarize the effect. It is very frustrating.

  40. In 1976... by alispguru · · Score: 2, Informative

    ... I ran into a professor of statistics who said that computers were going to be a passing fad in his field.

    --

    To a Lisp hacker, XML is S-expressions in drag.
  41. damn lies = MBA statistics by AliasMarlowe · · Score: 3, Interesting

    Statistics are important; it is highly unlikely that anyone with an MBA will know how or why, but they want them.

    In fact, it is almost a certainty that any given MBA will either lack statistical expertise or will misapply it unthinkingly, cookbook style. The pseudo-statistics behind Six Sigma come immediately to mind.

    I had repeated theoretical discussions with the four MBA experts who "trained" us (a group of six PhDs in Physics & Engineering doing R&D) in the ways of Six Sigma. There were problems with the statistical theory they presented right from the start - and they were clearly unaccustomed to being contradicted along the lines of "that's not right/applicable in this case, and here's why". For instance, they failed to acknowledge that non-Gaussian distributions could exist, then refused to accept that procedures should be adapted to the data if it was non-Gaussian. Next, they adamantly refused to believe that the 1.5 Z shift hypothesis was supported only by a few studies, all relying on a single dataset from the 1950s for die-based manufacture, and totally irrelevant to most other processes. The Six Sigma books all say "many studies" over decades support the Z shift hypothesis, but fail to cite them, and our MBA experts could not cite any such studies either. Thirdly, they refused to accept that an additional mode of variability (not in the Six Sigma beliefs) existed in processes with feedback (such as recycle lines or controllers). In many cases, this mode guarantees non-Gaussian variability in the process output.
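
    For readers who have not met the 1.5 Z shift: the commonly quoted Six Sigma figure of about 3.4 defects per million corresponds to the one-sided Gaussian tail beyond 4.5 sigma, that is, 6 sigma minus the assumed 1.5 sigma shift. A minimal sketch of that arithmetic follows, assuming a Gaussian process (which is exactly the assumption being disputed above); the helper name `upper_tail` is mine, not anything from the Six Sigma literature.

```python
import math

def upper_tail(z):
    """Return P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# One-sided tail at 6 - 1.5 = 4.5 sigma, in defects per million opportunities.
dpmo_shifted = upper_tail(6.0 - 1.5) * 1e6

# Two-sided tail at a full 6 sigma with no shift, for comparison.
dpmo_unshifted = 2.0 * upper_tail(6.0) * 1e6

print(f"with 1.5 sigma shift: {dpmo_shifted:.2f} defects per million")
print(f"without shift:        {dpmo_unshifted:.4f} defects per million")
```

    The headline number depends entirely on both the shift and the Gaussian assumption; change either and it no longer holds.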

    Their advice was that to pass the course, we should ignore our knowledge of statistics (which they acknowledged was far better than theirs) and of process variability, and just "apply the documented methods". We did, and we all passed the course. Then we ignored the Six Sigma bogus statistics bullshit and got on with our jobs using proper statistics to analyze and solve problems in variability with the products we were developing.

    MBAs seem to want statistics, but the vast majority appear to lack the training in how to generate proper statistics, or how to use them competently if someone else supplies them. Most MBAs appear to think the world is described adequately using Gaussian distributions, and a few "experts" know the Weibull distribution or the t-distribution. Other distribution types (Poisson, discrete/categorical, etc.) are totally foreign, and methods of inference beyond simple unconditional analyses are also quite alien to them.

    I also understand that people who are good at it are rare.

    Perhaps not as rare as you might think. But those who have some aptitude in statistics know enough to keep their mouths shut when the data tells them to. MBAs on the other hand, ignorant of their own ignorance, are as verbally promiscuous as politicians...

    --
    Those who can make you believe absurdities can make you commit atrocities. - Voltaire
  42. is this really necessary? by Goldsmith · · Score: 2, Insightful

    I'm a physicist, I know plenty of statistics. The kinds of statistics he's talking about are not hard. If you can do algebra, you can do things like calculate the standard deviation and variance of a set of measurements.
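
    To show how little is involved, here is a minimal sketch of the calculation in Python. The measurements are made-up numbers, and `sample_variance` is just an illustrative helper using the usual n - 1 denominator.

```python
import math
import statistics

def sample_variance(xs):
    """Unbiased sample variance: squared deviations from the mean over n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two measurements")
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)

# Hypothetical repeated measurements of the same quantity.
measurements = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = sum(measurements) / len(measurements)
var = sample_variance(measurements)
sd = math.sqrt(var)

print(f"mean = {mean:.3f}, variance = {var:.3f}, std dev = {sd:.3f}")

# The standard library computes the same quantities directly.
print(statistics.variance(measurements), statistics.stdev(measurements))
```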

    Was this rant really necessary? I run into people in physics who don't take care of these details. I find that a simple "can you put a standard deviation on that number?" or "can you repeat the experiment?" generally gets the job done. If you want to be more scientific, just start with those questions, and see where it takes you... you could even add "please" if you wanted to be nice. I find threatening people with death and belittling their intellect while talking about trivial calculations doesn't generate useful data.

    To be fair, it sounds like Zed has been working as staff at a university. This has nothing to do with statistics, but it's probably the real reason he's in such a bad mood.