Why Programmers Need To Learn Statistics
David Gerard writes "Zed Shaw writes an impassioned plea to programmers: Programmers Need To Learn Statistics Or I Will Kill Them All. Quoting: 'I go insane when I hear programmers talking about statistics like they know s*** when it's clearly obvious they do not. I've been studying it for years and years and still don't think I know anything. ... I have taken a bunch of math classes, studied statistics in grad school, learned the R language, and read tons of books on the subject. Despite all of this I'm not at all confident in my understanding of such a vast topic. What I can do is apply the techniques to common problems I encounter at work. My favorite problem to attack with the statistics wolverine is performance measurement and tuning. All of this leads to a curse since none of my colleagues have any clue about what they don't understand. I'll propose a measurement technique and they'll scoff at it. I try to show them how to properly graph a run chart and they're indignant. I question their metrics and they try to back it up with lame attempts at statistical reasoning. I really can't blame them since they were probably told in college that logic and reason are superior to evidence and observation.'"
Everything I needed to know about statistics I learned playing poker.
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
110%.
Correlation != causation. Just repeat that and you don't need to know statistics.
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
Maybe the problem is in your presentation. Even here, you tell programmers that you want to kill them for not understanding a topic that even you are unwilling to acknowledge mastery of. Then you tell us how hard the topic is to understand, even though you've spent so much time trying to learn it.
Is it any wonder that no one takes your suggestions seriously? You are practically sabotaging yourself with self-effacement.
These aren't homework problems you're tackling here. They are business problems and you need to sell yourself and your ideas if you want to get any traction. Do you have any evidence that your methods are better than the SOP thus far? Do you have any case studies that show how effective statistical analysis is in *any* of your projects?
Or are you simply taking something that seems like a data point and extrapolating it to cover a vast swath of applications?
Statisticians need to learn programming or I will kill them all.
We know as much statistics as we need to know.
Some know more, some less. Each has traded off hours vs. knowledge in many fields.
For example: why would a programmer whose job is to automate bean counting need to know more than basic statistics? (S)he rightfully focuses his or her efforts on accounting.
One post-calculus statistics course gives me enough grounding to know what I don't know and punt to experts when I need to.
Fucking specialists forget all the things they don't know and only look at the world through one lens.
John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
Programmers Need To Learn Statistics Or I Will Kill Them All
Okay, two things: First, threatening programmers never works. Management's been trying that for years. Second -- don't you mean 'kill -9' them all, or maybe demalloc(), or cast them to void*, or one of a dozen other witty things you could do besides the mundane answer of threatening stabby bits on them because you have a case of intellectual snobbery?
#fuckbeta #iamslashdot #dicemustdie
Zed Shaw says: "I've been studying it for years and years and still don't think I know anything"
Don't you think this might be telling you something, like... perhaps statistics are too hard for you? Leave the real work to the people who do know what they are doing and do know something about the field: programmers.
I never took a statistics class as an undergrad. In retrospect, I think it would have been very useful, probably more so than the calculus I took (which I think is also a very good thing to know, but stats tend to be used more often).
He's just as arrogantly claiming that he's right and they're wrong. Now, he may very well in fact be right, but he's taking the same obstinate position the people he criticizes do.
It's important to know when your input is not desired. Even if you think it should be.
is not because they don't understand statistics. It is because you are a dick.
Statistics is HARD, for two reasons:
(a) Probability theory, on which all practical Statistics is based, is both (i) counter-intuitive and (ii) difficult
(b) The very Mathematics on which it is based is obscure
And, worst of all, it is uniformly badly taught, even in good universities, and the Statistics for XXX are uniformly awful, blind leading the blind.
Lastly, it is very hard to get a straight answer from a mathematical Statistician.
... something inside me wants to flame him for being a rude twat who wasted 1 minute of my lifetime, even though he has some valid points. I'd be surprised if he didn't get some responses along the lines of "cry me a river" etc.
"I love my job, but I hate talking to people like you" (Freddie Mercury)
I know enough about statistics to know that, statistically, I'm safe from his threats. I suspect that if I were a bag of Cheetos the odds would be against me, but that's not the case.
I've found that, more than just about any other degree, Computer Science and to a lesser extent medical degrees imbue the recipient with an unnatural ego when it comes to subjects with which they are unfamiliar. I propose we remove the word Science from CS degrees and call it what it is: "Computer Programming and Troubleshooting". There are far too many CS graduates who think they are actually scientists.
I was tasked recently with developing stat reports that would be used to give the best workers the most important tasks. I used their desired metric, and modified the numbers to show on a 0-100 scale where 75 is average and each standard deviation is 10 points. The result? The sample sizes were too small, and some groups had widely varying scores when every group member's performance was nearly identical. Then again, maybe I'm doing something wrong.
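The scaling described above (75 is average, each standard deviation is worth 10 points) can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `scaled_scores` and the raw numbers are hypothetical, and the sketch does not clamp to the 0-100 range. It shows one plausible reason the scores swing so widely: in a small group whose members perform almost identically, the standard deviation is tiny, so trivial differences are stretched into large point gaps.

```python
from statistics import fmean, stdev

def scaled_scores(raw, center=75.0, points_per_sd=10.0):
    """Map raw metric values to a scale where the group mean is `center`
    and one sample standard deviation is worth `points_per_sd` points."""
    mu = fmean(raw)
    sd = stdev(raw)  # sample standard deviation; needs at least two values
    if sd == 0:
        # Everyone is identical, so everyone gets the average score.
        return [center for _ in raw]
    return [center + points_per_sd * (x - mu) / sd for x in raw]

# Hypothetical group of three workers whose raw metric differs by 0.2% at most.
near_identical = [50.0, 50.1, 49.9]
scores = scaled_scores(near_identical)
print([round(s, 1) for s in scores])
print("spread in points:", round(max(scores) - min(scores), 1))
```

With only three nearly equal values, the best and worst scores end up two standard deviations apart even though the underlying performance is practically the same. Scaling against a pooled standard deviation across all groups would be one way to avoid that, if the groups are comparable.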
Seriously.
I've been studying it for years and years and still don't think I know anything.
And yet you're expecting someone whose expertise is in a different field to know more about it than you?
We can't all be experts in everything. If you're the expert in the field of discussion, get used to educating your coworkers on the topic, or find another job where you're surrounded by people with the same education and expertise as you.
The average person is an expert in no more than two or three related areas. That's why people work in teams, to cover each other's blind spots.
I work for the Department of Redundancy Department.
He cannot even write a logical, rational thought supporting why programmers need to know more than a casual level of statistics. He just rants about blue sunsets and writes the f-word a lot.
Nothing new to see here.
you had me at #!
That's ODBC, Junior. Details matter.
(And I'll bet you a thousand dollars that I earned more than you this month.)
Statistics is WAY beyond what a programmer cares about. Logic is all that matters. Statistics->logic is the problem of the software engineer, not the programmer.
...unfortunately, they are mostly lost in the irony of statements like this:
I think women are better programmers because they have less ego and are typically more interested in the gear rather than the pissing contest.
I doubt I've seen anyone more thoroughly entrenched in a pissing contest than Zed Shaw, of the website formerly known as "Zed's So Fucking Awesome".
Don't thank God, thank a doctor!
Zed Shaw writes an impassioned plea to programmers: Programmers Need To Learn Statistics Or I Will Kill Them All.
// This will never happen
Oh, say does that Star-Spangled Banner entwine / The myrtle of Venus with Bacchus's vine?
I certainly suffer from a feeling of being an expert in all fields. Deep down I guess I know I'm not, but I'd probably rather just muddle my way through it assuming I know everything there is to know. The trick is knowing when something is sufficiently out of your field that you need to defer to someone who is an expert in that field. Statistics is just one example. Certainly a little bit of knowledge in a lot of fields is a good thing, but when you have to choose between 4 years of study vs consulting someone who's already done 4 years of study, the choice should be obvious... (assuming you aren't going to spend the rest of your programming life doing heavily statistics related programming :)
For me the frustration is taking the word of an expert without understanding why and how they have arrived at that answer. I guess statistics is one field where the answer that 'feels right' is often not the answer that is right. The number of people who buy lottery tickets is a good example of that :)
I don't know how educated your colleagues are, but if they have studied computer science, then you should just shut your dumb mouth, because we learn how to analyze running times WITHOUT actually running it. Even without actually programming it, just by analyzing the problem itself. That is called "complexity theory" and (in that case) you are the one who doesn't have any clue about what you don't understand.
and go away with "tuning". You might improve running times a bit, but no little tuning hack can defeat the improvements you get by better algorithm design by an expert on algorithmics (I mean that e.g. some XOR AX AX might speed up your program by factor 2, but replacing simple backtracking with techniques to keep branching vectors small gets you exponential speed ups!)
The MAFIAA is a bunch of mindless jerks who will be the first up against the wall when the revolution comes
95% confidence in understanding statistics when applied to a business setting is often just as good as 95% confidence in actual measurements. Yes, the last 5% is the trickiest bit, but rest assured that if there is the slightest indication that a proper application is required, I won't be afraid to ask someone who knows more. It's just that it is quite rare.
For example: performance testing systems. You care way more about the degradation mode than about a statistical model of sustainable load.
Those two things are what statistics is based on in the first place as well. Evidence etcetera comes second. If you can't blow logical counterarguments away you're probably wrong and you're indeed lacking in understanding.
Let's see, we have one guy complaining about how none of his programmer coworkers understand statistics, and we have X coworkers who undoubtedly disagree with him. Since we do not know him or any of his colleagues to any meaningful degree, we have to assign equal weight to each of their opinions. Statistics then tells us there is a 1/(X+1) chance of his being right, and an X/(X+1) chance of their being right. We can assume that X >= 2 based on his ranting, therefore resulting in the odds favoring them by at least 2/3, and probably much more. Therefore it is only rational to assume they are correct.
What has Zed Shaw done for humanity?
you fix it once to handle when some Anti-Mensa card carrying twit actually makes it happen
then you fix it a second time to prevent it from happening
every time you get data from a user/outside process you should be able to handle values that make you go Eh WOT?? and then chuck those values out (and emit the correct error code)
Any person using FTFY or editing my postings agrees to a US$50.00 charge
Meh, that's what compilers are for.
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
three door problem? What about poker! ;)
You know, studying stuff in college for years doesn't make you smart. Maybe these are clever, practical people, and you're just not a good communicator?
I want to delete my account but Slashdot doesn't allow it.
Everyone needs to learn statistics. All of us who understand one iota of it are in a constant state of depression over how everyone keeps on making the most banal mistakes. But just a general gripe is not very helpful. Getting everyone to take advanced degrees in statistics is simply not going to happen. Most engineering courses include some basics, but that only helps a bit. What is needed is to teach it (to the "masses", i.e. the ones who really ought to know better) in terms of the pitfalls first, and then how to understand the workarounds. Those who have no interest in pursuing it further might still gain some insight about where to be careful, and those with potential might more easily see the point in investing in some real knowledge.
sudo ergo sum
I studied it for years, so my e-peen is bigger. It worked in school, so it has to work in reality and thus they are wrong when they tell me it does not, despite them having experience with real applications while I have not.
Ok, snideness aside. Statistics is a wonderful tool (hey, my degree is in statistics actually), but I wouldn't want to impose my metrics on real applications without first looking at whether they measure anything sensible. I turned to programming because, well, it's more suitable to me. But when I look at the metrics some of my superiors designed, cringing is all I can do.
Example: A metric that measures how much code you produce. Which is in theory nice. Who creates more code has done more work. Right? From a statistician's point of view, yes. But any programmer will tell you that it's trivial to write lots of lines or few, and they will do the same work. Most programming languages support that just fine. Does the statistician know? Probably not, unless he is a programmer too.
Example: A metric that measures the amount of code you alter. Which is in theory nice. You check out, change and check in code, and who checks out and checks in more (and does alteration in between) does more work than others. Right? No. For reference, see the Wikipedia game.
The reason why programmers scoff at metrics is that we've all seen our share of really, really crappy metrics that led to less instead of more productivity because everyone started gaming the system. Had to do that, because if you actually did sensible work, you fell behind in the metric against those that gamed (i.e. those that didn't produce in the first place).
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
I prefer logic and reason mixed with evidence and observation.
If you just have logic and reason, then you get religion. Logically, it worked out when it was created. There is no evidence to counter it, so it must be true. Religion was created with logical reasoning. Some may say it was incorrect reasoning, but it was reasoning nonetheless.
On the other hand, if you just have observable evidence, with no logical reasoning, you can have all the data in the world, but you will have nothing to use it with. True, you can see it, but you cannot understand why it is the way it is.
Having all of one or the other is useless.
I don't like Linux. This doesn't make me a troll.
Best. Troll. Ever.
You know nothing about statistics, yet want to tell us how it is a phony science?
You couldn't have taken a few minutes on wolfram, or even wikipedia to even TRY to know a little of what you are talking about?
Yes, I do think you are a lunatic.
....Zed wants everyone to be just like him.
Before you design for reuse, make sure to design it for use.
Unless they're actually programming statistical applications, most programmers probably don't need to know statistics. As long as somebody on the testing team does, all the programmer needs to understand is that function X sometimes fails to meet its timing spec (perhaps "often fails..." or "occasionally fails..." might add some value) or whatever. Then they know they need to do some optimisation. There's a natural human tendency to think that everybody should be doing what we're doing. In reality, they don't have to, because we're doing that; they need to be doing something else.
Quidnam Latine loqui modo coepi?
http://slashdot.org/comments.pl?sid=1499856&cid=30673056
"I really can't blame them since they were probably told in college that logic and reason are superior to evidence and observation." Both are superior to statistics.
Lies, damned lies and statistics. Us programmers are too busy dealing with the first two to ever reach the third..
Bridges that fail, fail predictably. It is usually just a question of collecting some data.
Good luck demonstrating that an aircraft instrument landing system is fit for purpose, then. Semiconductors might fail predictably when they're being observed under an electron microscope, but it's a bit harder in a hut by the side of an airfield.
Quidnam Latine loqui modo coepi?
" I have taken a bunch of math classes, studied statistics in grad school, learned the R language, and read tons of books on the subject. Despite all of this I'm not at all confident in my understanding of such a vast topic." I'm presented with 1 of 2 scenarios. Either he is smart and I should not bother studying statistics because it is vast and complicated and should only do research on a as needed basis. Or He is stupid. And I should just ignore the guy completely.
You probably still think I am a lunatic, but hear me out.
You don't qualify as a lunatic; just as someone who has no idea of what he's talking about. Absolutely no idea. Your post, my friend, is so full of ideas you obviously misunderstood that I won't even attempt to make a list.
And yes, I do statistics for a living.
ditto!
Please mod me 1 or troll. It's where the truth is these days, even on Slashdot. Beware the power of moderators everywh
So, since so many people don't seem to want to actually read Zed's stuff -- and I honestly don't blame you -- I'll try to summarize:
Eventually, every major science adopted an empiricist view of the world. Except Computer Science of course.
He tends to bitch a lot about computer scientists. I'm just starting a CS degree, and there is a Statistics class in the curriculum. Is he working with people with good degrees, people from a technical college with a "programming" degree, people from a diploma mill, or high school students with no degree at all?
Of course, he seems to be implying it's everyone, and doing so in a typically Zed-like way.
"All you need to do is run that test [insert power-of-ten] times and then do an average." Usually the power-of-ten is 1000...
I don't know that I've ever heard that particular statement. But it's a good point:
How do you know that 1000 is the correct number of iterations to improve the power of the experiment?
Generally because it was probably closer to a million, so I'm erring on the side of taking more, rather than fewer, measurements. But without careful consideration, I could be way off.
How are you performing the samplings?
I think this is vastly less important than how you are dealing with the data, but it is also a good point. For example, his complaint is that an average isn't enough; with detailed enough logging, he could easily go back into my data and figure out min, max, standard deviation, histograms...
How do you know that 1000 is enough to get the process into a steady state after the ramp-up period?
Not a huge deal -- the "steady state" will almost certainly be faster than the "ramp-up" period. Worst case, I'm over-optimizing.
What will you do if the 1000 tests takes 10 hours?
Either ctrl+c, or try it 10 times.
How does 1000 sequential requests help you determine the performance under load?
Very good point here. It's still a useful statistic, but you still need to measure things like 1000 simultaneous requests, not just 1000 all in sequence.
On the other hand, if your performance is acceptable with them all in sequence, you could just run it through something like Event Machine, so it's all sequential on production, too.
The most troubling problem with these single number “averages” is that there’s two common averages and that without some form of range or variance error they are useless. If you take a look at the previous graphs you can see visually why this is a problem. Two averages can be the same, but hide massive differences in behavior...
So yes, always make sure you can record enough statistics so that someone else can come along and use your data to give you something meaningful.
The moral of the story is that if you give an average without standard deviations then you’re totally missing the entire point of even trying to measure something. A major goal of measurement is to develop a succinct and accurate picture of what’s going on...
It doesn't have to be statistically accurate. It just has to be close enough.
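To make the quoted point about averages concrete, here is a minimal Python sketch. It is my own illustration, not something from Zed's essay: the two lists of response times are made-up numbers, and `summarize` is a hypothetical helper. Both lists have exactly the same mean, yet they describe very different behavior once the spread is reported alongside it.

```python
from statistics import fmean, stdev

def summarize(samples):
    """Report the mean together with the spread, not the mean alone."""
    return {
        "mean": fmean(samples),
        "stdev": stdev(samples),  # sample standard deviation
        "min": min(samples),
        "max": max(samples),
    }

# Hypothetical response times in milliseconds for two services.
steady = [100, 101, 99, 100, 100, 102, 98, 100]  # consistently around 100 ms
spiky = [60, 60, 60, 60, 60, 60, 60, 380]        # mostly fast, one very slow request

for name, samples in (("steady", steady), ("spiky", spiky)):
    s = summarize(samples)
    print(f"{name}: mean={s['mean']:.1f} stdev={s['stdev']:.1f} "
          f"min={s['min']} max={s['max']}")
```

Whether "close enough" is good enough depends on which of these two shapes the data has, and the mean by itself cannot tell you that.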
Ah, confounding. The most difficult thing to explain to a programmer, yet the most elementary part of all scientific experimentation. It’s pretty simple: If you want to measure something, then don’t measure other shit.
This is both a very good and a very bad idea. It ties into the peeve he had before -- ramp-up time. For example:
If we want to take one single line of code and test it then we can. If we want to only verify one single query on a database then what’s stopping us?
What's stopping us is that our applications don't actually work like that.
Don't thank God, thank a doctor!
Best. Troll. Ever.
Yes, I do think you are a lunatic.
Thanks, I am honored. /.ers into it too...
I actually have degrees in mathematics, and I have a sister with a Ph.D. in statistics. We have had this discussion most Yules we get together, and it is fun to get some
don't cut it off www.mgmbill.org
"... since they were probably told in college that logic and reason are superior to evidence and observation.'"
Oh, so they were taught Bayesian rather than Frequentist statistics?
"Statisticians need to learn programming or I will kill them all." - by halivar (535827) on Saturday January 09, @06:43PM (#30710618) Homepage
Well put, Halivar! Now, I'll add to it, as I have backgrounds in both areas he "bitches here" about.
First of all:
I'm in possession of degrees from both the business world (where I took STAT 1 & STAT 2 & "aced" both w/ A grades no less) & also Comp. Sci. & CIS concentration/minor (where you get exposure to a good deal of "higher mathematics" such as Calculus, & Discrete Math to name only a couple possibles)...
LOL! Man... I "just loved" (not) his "logic & reasoning is inferior to evidence & observation"...
(Especially since I know 1 VERY important thing: That stat teaches you 1 extremely IMPORTANT concept: It's ALL BASED ON SAMPLE SETS...)
As to "sample sets"? Well, those are USUALLY either:
----
1.) EASILY SKEWED (as in "4/5 dentists chew trident", oh "sure, sure", especially when they're on the corporate payroll (or paid off to say so by said corporation so their "evidence & observation looks good")
and
2.) IS THE SAMPLE SET LARGE & COMPREHENSIVE ENOUGH? (most?? Most are not, period)...
----
Simply because you cannot:
A.) Sample EVERYONE
B.) Nor can you judge the veracity & accuracy of who you are sampling!
----
E.G. #1 - Let's say I had a poll question of "Are Democrats better than Republicans?" & I sampled from a PRIMARILY REPUBLICAN AREA - So, that all "said & aside"??
What kind of answers do you think I'd get???
Would THAT be a "good/fair & representative sample set"????
Answer = Hell no!
Math people sometimes make me laugh... especially when they *THINK* they "know it all".
Life's a BALANCE people, & there are very few "absolutes", because people are not "binary". Human beings have a LOT of "shades of grey" (or, is it "gray"?? Inquiring minds, want to know, lol!)
APK
P.S.=> Personally - I feel that life's REAL answers & REAL problems, in my estimation & opinion, aren't going to even be answered by "hard sciences" alone...
I actually tend to think that the REAL ANSWERS (for the REAL problems) will come from philosophers really!
(E.G. #2 - The serious questions to answer, like "why is man unjust to man" for example).
Yes, THAT coming from me may sound weird, especially coming from someone with fairly extensive classical education in the business sciences & computer sciences here in myself, but I do hold to that (and, all the math that comes with them like STATS, CALC, DISCRETE MATH, etc. et al, from the 'hard sciences'? They're JUST TOOLS that others should definitely use, but not "base all" on them, either, because they too can be misused, as in the examples above I note from stats itself))... apk
Stats before calculus are just memorize and regurgitate.
Take stats as an undergrad but after you finish calculus so you have grounding to understand.
Not just puke formulas back out onto an exam paper.
John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
No, Logic and Reason are superior to Cubase.
It's a music joke, laugh.
From his complaints, I can tell knowledge isn't the real issue. Testing performance takes a huge amount of time. You need to simulate other programs running, multiple users and make sure the test matches what real users might do. Generally, this requires writing completely independent test programs and charting the logging from them. People just don't want to go to that kind of effort. It can take weeks just to create proper tests for complex programs like web servers.
This guy's an idiot. He admits to not knowing the subject matter well but still wants to chastise programmers for not being experts?! That's his first epic fail. His 2nd is that programmers aren't meant to be experts in every area, only at programming. People who have double degrees and years of experience in a field are the only ones who should be, and they will be in lead roles. His 3rd fail is how he makes his argument: it reminds me of a child throwing itself on its back and kicking its legs till it gets its own way.
If you mod me down, I will become more powerful than you can imagine....
The use of statistics is a means to an end that never ends. It has its uses in specific situations, and programmers trying to reach these ends in those specific situations would be well-off to know statistics? OK, I agree. If you are programming a data-mining application, then knowledge of probability and statistics seems pretty important. If you are programming a plane to land automatically on a runway, or a robot to place a chip on a board, then I want precision, not probability. (Although precision is probabilistic in itself.)
What Zed is describing is a situation where statistics could greatly improve the performance of the whole system, and he looks to be right. And that may be the real problem: He's more committed to being right than to resolving the problem.
I would say this is more a "people problem" than a programming problem. Placing blame, telling people they are ignorant, hostile language and the like are not leadership qualities.
There is another aspect here that interests me; the type of programming methodology. If this type of project were approached as a monolithic project, the scope, means and tools would be apparent before the project got to the argument stage. In an "agile" environment, the lack of pre-defined methodology would show up as part of the tweaking/improvement process. Picking the right method might be very important to alleviating the problem of the project with the "long tail" (i.e., the project that seems almost finished but there are a million little things to finish to make it deliverable).
"The mind works quicker than you think!"
Given your exposé of the facts on Slashdot, and the way you describe your colleagues and your own understanding of stats, I would say there is a 90% chance you are wrong and they are right. Or maybe 95%.
I've been doing J2EE apps for 10 years and now that we are sending a rocket to mars on our next project, I'm so sorry I didn't spend my whole life learning statistics.
" I've been studying it for years and years and still don't think I know anything."
Exactly, dumbass.
The first rule of programmers: whatever is the most expeditious path to the most usable solution is the one a programmer will take. The great skill a programmer has is the ability to assimilate and apply new information in as short a span of time as possible. If it takes years in order to not use and apply something, you can forget about a programmer ever bothering.
I scream. You scream. I assume that means we're both acquainted with the problem. We proceed.
I can vouch for this. You might think AC just spends all his time on /. but the reality is that he's a real big-shot who can afford to make ridiculous claims.
Always back up, never back down. ---- Think you're cool 'cos your uid is prime? Take mine, modulo the one digit integers
This is a vast right wing conspiracy backed by Fox news "Fair and Balanced".
is one half mental.
of course that explains why 90% of all programs written are CRUD.
-with apologies to Yogi Berra, Theodore Sturgeon, and a 20% apology, as a matter of principle, to a guy called Pareto.
Where are we going and why are we in a handbasket?
Despite all of this I'm not at all confident in my understanding of such a vast topic.
Some people are a little slow, but stick to it, you'll get there eventually.
The question of whether a computer can think is no more interesting than the question of whether a submarine can swim.
This reads a bit like the thread on the college sysadmins running the shop. Think: along the lines of over-education and not enough experience coloring one's view of the situation. See also: when you've got a hammer, everything looks like a nail.
I'd say odds are that, with someone (anyone) who's highly educated in a specific field, they tend to try to apply that discipline to everything in their lives. The welder who has metal tables and chairs, the woodworker with an oak-everything house, and the mechanic with a V8 lawn mower/snow blower are all good examples of this. Managers who think something is a "morale problem" (and not a management one) or programmers/geeks who see a social problem as one that can be fixed with computing are also examples of this.
This doesn't necessarily mean these specialized-discipline people are necessarily wrong, but it does mean they're contentious and self-righteous assholes. Statistics might help. A wireless computer in your fridge might help. So might a V8 lawn mower (that'd be fucking cool!). But chances are such things are impractical, expensive, and/or coming from an over-extension of assumption.
And sometimes, a gut feeling is as good as (or better than) a well-reasoned and thoroughly informed opinion.
Life's a crap shoot. Sometimes you can't reduce everything to numbers.
~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
The Zed Effect: Whether you're right or wrong people will disagree with you just to piss you off.
Life's a BALANCE people, & there are very few "absolutes", because people are not "binary". Human beings have a LOT of "shades of grey" (or, is it "gray"?? Inquiring minds, want to know, lol!)
The answer to this important question is grey. I read it in a book so it has to be true
Before computers, stats involved using parametric tests (t-tests, ANOVA, etc.) which made assumptions like "the data comes from an underlying normal distribution". BTW, in stats terms "normal" means "Gaussian".
Now, with cheap and fast computers, we can actually compute the confidence intervals non-parametrically through permutation tests and bootstrapping without assuming anything about underlying distributions. In most cases, this non-parametric test is the "right thing to do". Most of the time, the results are the same as using a parametric test.
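As a rough illustration of the bootstrapping idea, here is a minimal percentile-bootstrap sketch in Python using only the standard library. The function name `bootstrap_ci`, the number of resamples, and the data are all assumptions made for the example; it is the simplest variant of the technique, not the only or the most accurate one.

```python
import random
from statistics import fmean

def bootstrap_ci(data, stat=fmean, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(data)
    # Resample with replacement, recompute the statistic each time, then sort.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    tail = (1.0 - confidence) / 2.0
    lo_index = int(tail * n_resamples)
    hi_index = min(n_resamples - 1, int((1.0 - tail) * n_resamples))
    return estimates[lo_index], estimates[hi_index]

# Hypothetical response times in ms, deliberately skewed by two slow outliers.
latencies_ms = [12, 15, 11, 14, 13, 90, 12, 16, 13, 14, 11, 250]
low, high = bootstrap_ci(latencies_ms)
print(f"mean = {fmean(latencies_ms):.2f} ms, "
      f"95% CI roughly [{low:.2f}, {high:.2f}] ms")
```

No normality assumption is needed here, which matters for skewed data like this. With only a dozen points the interval is still crude, so treat it as a sketch of the mechanics rather than a recommendation.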
However, a HUGE disaster in empirical science has been the problem of multiple comparisons. With computers it is so easy to compute correlations and significance tests between every possible slice of your data set. Many "scientists" don't have good statistical knowledge and pray at the altar of "p < 0.05". They don't know about or understand the problem of multiple comparisons. So they do 20 tests, find one that comes out p < 0.05 and write a paper about it. They don't get that if you do 20 tests you are quite likely (about a 64% chance, if the tests are independent) to find at least one that comes out p < 0.05.
Anyone who has access to excel or matlab can do this little experiment.
samp = randn(50,1);               % 50 normally distributed random numbers
sig = zeros(1,100);
for x = 1:100
    test = randn(50,1);           % 50 more (mean=0, var=1)
    sig(x) = ttest2(samp, test);  % 1 if the two-sample t-test rejects at p < 0.05
end
Now look at the sig vector. OMG, roughly 5% of the tests came out significant!!!
Now you are writing a paper all about how x is linked to y. But you are essentially throwing dice and then writing a paper about why it came up '3-3'.
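For anyone without MATLAB handy, here is a rough Python sketch of the same dice-throwing, aimed at the "20 tests" case. To stay inside the standard library it swaps the t-test for a two-sample z-test with the variance assumed known (1.0); that substitution, the function names, and the trial counts are my assumptions, not part of the experiment above.

```python
import random
from statistics import NormalDist

def z_test_p(a, b):
    """Two-sided p-value for equal means, assuming both samples have known variance 1."""
    se = (1 / len(a) + 1 / len(b)) ** 0.5
    z = (sum(a) / len(a) - sum(b) / len(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def any_false_positive(rng, n_tests=20, n=50, alpha=0.05):
    """Run n_tests comparisons on pure noise; True if at least one looks 'significant'."""
    for _ in range(n_tests):
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        if z_test_p(a, b) < alpha:
            return True
    return False

rng = random.Random(1)
trials = 1000
hits = sum(any_false_positive(rng) for _ in range(trials))
expected = 1 - 0.95 ** 20   # chance of at least one false positive in 20 independent tests

print(f"simulated: {hits / trials:.2f}, analytic: {expected:.2f}")
```

Every sample here is noise with the same mean, so each "significant" result is a false positive. One common remedy is the Bonferroni correction: test each comparison at alpha divided by the number of comparisons (0.05 / 20 here).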
1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765
Hm...
Statisticians are like designers....they should stick to designing(or statistics as it were).
I.e., do what they are good at. At my work we hand off these parts as modules. Designers push back a form design. The statistician pushes back some algorithms written in a high level language. I really do treat them like library calls.
In post Patriot Act America, the library books scan you.
No, not necessarily. Running "time cp file.dat copy", with a file.dat of 88MB takes 0.083 seconds on my computer.
Would your conclusion then be that my computer has a disk capable of copying files at 1060MB/s? (It should be even faster when it ramps up!)
That would be complete nonsense of course. What happens is that the entire file goes into the write cache, cp returns almost immediately, and the kernel writes the 88MB over several seconds in the background.
If I copied a DVD image instead, it'd take a much longer time, and the "size/time" would be much closer to reality, because the file wouldn't fit in the cache.
So here you have an example where the steady state is much slower than the rampup, and where measuring too little would lead you to believe there's no performance issue at all, even if the disk is dog slow.
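A minimal way to see the cache effect for yourself, sketched in Python: time the same write once without and once with `os.fsync`, which asks the OS to flush the cached pages to storage before the timer stops. The 16 MB size and the helper name `timed_write` are arbitrary choices of mine, and the numbers depend heavily on the machine (if the temp directory lives on a RAM-backed filesystem, both figures will look fast).

```python
import os
import tempfile
import time

def timed_write(path, data, sync):
    """Return seconds taken to write data; if sync, also wait for the flush to storage."""
    start = time.perf_counter()
    with open(path, "wb") as f:
        f.write(data)
        if sync:
            f.flush()
            os.fsync(f.fileno())   # ask the OS to push cached pages to the device
    return time.perf_counter() - start

data = os.urandom(16 * 1024 * 1024)   # 16 MB of random bytes

with tempfile.TemporaryDirectory() as d:
    cached = timed_write(os.path.join(d, "cached.bin"), data, sync=False)
    synced = timed_write(os.path.join(d, "synced.bin"), data, sync=True)

mb = len(data) / 1e6
print(f"without fsync: {mb / cached:8.0f} MB/s (mostly measures the page cache)")
print(f"with fsync:    {mb / synced:8.0f} MB/s (closer to what the device sustains)")
```

Either number is a single measurement; repeating the runs and looking at the spread is the whole point of the thread.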
In practice, statistics is an attempt to quantify messy, uncertain events into a figure. We can even measure the extent to which this works, roughly speaking. Your hard drive has a rough time-to-failure, based on analyses of the things that tend to go wrong in that system. Sure, any time it fails, it's not statistics that broke it; it's one of the kinds of problems captured in the statistical analysis. And sure, you could break it down further for disks and note that the controller has a different failure rate than some other component, just as a bridge has a number of possible failures. Problem is, for any of those, you could break it down further and get failure rates for subcomponents, regions, etc. So what? It's still useful to have statistical measures - the real world is complex, and statistics helps us capture things we otherwise couldn't.
Programmers (particularly but not only young programmers) might not like to acknowledge any field but their own has any depth ("Everything is simple! Just do it my way", hence Ron Paul/Ayn Rand fanboyism and all sorts of other stupidities) - I don't know if there's a lot we can do but hope they grow out of it (It took me awhile to do it, as did a number of people I knew when I was younger, but I made it out).
Basically, if your worldview doesn't wed empiricism and a reasonably flexible practical philosophy, your worldview is (if you err on the pro-logic end) too inflexible and you're going to miss out on standing on the shoulders of giants. Neither the logician nor the mystic understands the world.
For every problem, there is at least one solution that is simple, neat, and wrong.
What Spell Check? I didn't know I was writing a Spell. Is it a good or evil spell?
Damn it's evil. Now I've got to listen to da da da de - da da de bop all the time.
Mod me up/Mod me down: I wont frown as I've no crown
The reality is that a programmer who screws up the ODBC acronym probably makes less than the everyday Joe. So the challenge, offered by this 30-something everyday Joe, still stands.
Yup. Also, for a guy who claims to know so much about statistics and measurement, it's weird how he judges programmers so sweepingly on the sole basis of his anecdotal experiences.
Today's weirdness is tomorrow's reason why. -- Hunter S. Thompson
I'm majoring in MIS at a university where Statistics is a required core course for every major, including the computer programmers. All along, I didn't get why I have to take it. I am now, and hopefully will get through it. I'd like my degree.
I run Ubuntu skinned to look like a Mac on a PC. Go figure.
Statistics in its purest sense is simply math. Very few people know very much about this.
Statistics in the wild is generally bullshit! You should not be able to give two equally qualified people the same data set and receive two different answers!
As for statistics for performance measurement? If you are doing something important then analyze worst case performance. Statistics doesn't come into play in this case.
logic and reason are the enemy of religion. the whole age of enlightenment and the demise of religion and the advent of scientific age has been moving on those two. and they have never stopped moving on their momentum up till now.
Read radical news here
Speaking as someone with a postgraduate degree in pure math, I'll be the first to admit that the subject is very hard to really understand well. Statistics is founded on probability theory, which in turn is based on measure theory, which is based on generalized integral theory and mathematical analysis. It takes 4 - 6 years of continuous hard study to cover this material and really know it all. And only people who devote their professional life to it can do that.
At most one could hope that one develops a sense for high level statistics, but that also takes several years of exposure to concrete examples, since intuition often fails miserably when it comes to even discrete probability theory.
Statistics is really useful as a scientific/theoretic method of reasoning, but convincing business people or even practicing scientists with it is futile in my opinion.
As the island of our knowledge grows, so does the shore of our ignorance.
So I read through his article. Yes, the whole mindless rant. The conclusion that one should REALLY draw from it is: Zed Shaw is a douche with Asperger's who clearly feels like his own personal area of expertise is underappreciated. Hey Zed, get over it.
Down with the career politician! SUPPORT TERM LIMITS
I like how the first part of his Wikipedia article says "Zed A. Shaw is a troll" with four citations.
Well, it has never been successfully tested.
Your point does not hit the mark at all. It only takes a simple expected value measurement to bring statistics back into play. If every customer brings in 1 dollar but the jerk who hacks into your application will cost you 1 million dollars instead, then a 1 out of 10 billion chance is a chance to take. Statistics is a very valuable asset to any developer/systems architect.
And before you come up with an even more contrived example, I suggest you take a quick glance at fields such as decision theory, game theory or other utilitarian techniques to reassess your obvious lack of understanding of the subject.
This is a replacement signature.
not understanding a topic that even you are unwilling to acknowledge mastery of.
Personally, I think that little acknowledgment increases his credibility quite a bit. It suggests to me that he's actually spent some real time coming to grips not just with the glossy overview you get in a high school or college course but with some of the devilish subtleties of actually using the stuff.
The funny thing about knowledge... the more it grows, the bigger you realize the frontier is. So, how good of a heuristic is apparent confidence?
Tweet, tweet.
And yes, I do statistics for a living.
Do you work with the statistics porn guy?
http://developers.slashdot.org/comments.pl?sid=1504756&cid=30710812
"Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
"I construct two sets of n=100 random samples from the normal distribution. Now, if I just take the average (mean or median) of these two sets they seem almost the same."
So it's true. The n's justify the means.
Today's vices may be tomorrow's virtues.
please tell me whether you would like to rely on decision theory, game theory or utilitarian techniques to handle life chances of your children or their sensitive private/critical information in a database.
Read radical news here
Well I can tell you that when I tell my boss that the project is 90% complete and I just have to finish the other 90% he, and every other SE I have said this to, knows exactly what I mean. This guy actually thinks that at times the sunset is a brilliant blue. He clearly doesn't get that how he perceives things is not the same as them actually being the way he perceives them, and so he freaks when smarter people than him don't care what he has to say. Luckily I learned from the available data I have that 100% of people named Zed Shaw want to kill me, so at least I have that going for me now ;-)
Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
Maybe Slashdot should have editors, so crap like this doesn't end up on the front page.
You forget to take into account that I'm way drunk on all the money that I made from writing the... uh... whatever the cunting fuckjizzle you kids are calling database apps these days.
If you were blocking sigs, you wouldn't have to read this.
Would your conclusion then be that my computer has a disk capable of copying files at 1060MB/s?
No, because you're not measuring disk at that point. That's confounding.
But it's a good point -- I suppose "ramp up" is a kind of confounding, anyway. I was just considering it mostly in terms like VM warm-up.
Don't thank God, thank a doctor!
Perhaps your reading skills are not so good. I was agreeing with *you*, and pointing out the lack of validity of the OP's generalisations.
Better luck being trolled next time.
Today's weirdness is tomorrow's reason why. -- Hunter S. Thompson
Or perhaps you aren't the AC I was replying to, but rather Zed? In which case:
You can happily go and suck a fuck for the breathtaking amount of swollen, tumorous ego and self-importance you're throwing about here. You do know that an "appeal to authority" is rather a logical fallacy, no? And you do realise that, even if the above list of positions and titles were valid in this argument, they are still anecdotal evidence, right? Your above diatribe contains nothing more compelling than a reactionary ad hominem attack of no argumental worth.
Today's weirdness is tomorrow's reason why. -- Hunter S. Thompson
Seriously. Seek help. You do know the meaning of the word "Yup", right?
Seriously. SEEK HELP. You have some serious people and communication issues.
Today's weirdness is tomorrow's reason why. -- Hunter S. Thompson
i.e. Chart1.DataManipulator.Statistics.InverseFDistribution(.05, 3, 4)
See, that was easy!
But seriously, I have supported a fair amount of statistical analysis in life sciences. Most programmers deal with processes that run against each one of a series of things. IMHO statistics is more like report queries where you perform groupings based on features to find favorable conditions or data falling outside of expected norms.
Could I use a solid statistician to keep me from making errors? Sure. Do I need an overbearing 'keeper of the keys' telling me I'm wrong without offering any real help? Hell no.
Wherever You Go, There You Are
You know, that particular citation has made me wonder in the past, but not enough to actually research it. So, I went off looking for more information and found it.
The statistic was generated from a July 1976 survey.
The sample group for this statistic was 1,200 dentists. These dentists were hand picked by the research company, probably with good reason.
They were asked what advice they would give gum-chewing patients:
1) sugared gum
2) sugarless gum
3) no gum at all.
Sugarless gum got 85% of the vote. Not terribly surprising. I'd be fairly confident that their time had been paid for, or at very least they were told "This survey is being done for Trident Sugarless Gum." That is only speculation, so hush up.
17/20 doesn't really sound very good. It just doesn't stick in your head. 4/5 is close enough, even though it reduces your answer to 80% (ahhh, a lie). Since these are marketing folks, I'm sure they pushed all kinds of values past focus groups, until "4 in 5" was accepted as most favorable.
As the link cites, they're fairly confident that the "sugared gum" answer got at least one response. There's always someone that'll take the obvious wrong answer. If you don't believe that, look at any Slashdot poll. :)
What they don't say is how many of the 1,200 samples were dropped. I'm sure there were non-responses, and they could have easily added any number of unfavorable answers in as non-responses. Of course, they couldn't have 100% in their favor, so they had to keep some.
Serious? Seriousness is well above my pay grade.
This looked familiar, then I remembered that I read this years ago.
http://haduken.com/board/viewtopic.php?t=934&sid=ccd988ac3fa9146e94124c1228c4ac35
..that you’re just too dumb.
Know nothing after years and years? So what’s the point then?
Sorry... I can think of several millions of more efficient, more useful and more fun things to do with my life.
I hear you, about people acting like they are experts, but actually knowing shit. Like someone having read a book about HTML, who now thinks he’s a cool programmer. Or someone who clicks together a default database front-end type application, and acts as if he could compete with someone who designs hard math algorithms in Haskell or writes an OS in C/Assembler.
But I think you put way more importance on statistics than is needed for programming. Because it’s your lovechild (nothing wrong with that). We programmers need to be good programmers. There’s only so much time in a day to keep up-to-date with all the crazy stuff going on in CS. There are few non-science jobs where you have to keep up so much. There’s simply no place for also becoming an expert in hardware design, graphics design, usability, physics, all the areas of mathematics, including statistics, etc, etc, etc.
If I need good statistics, I’ll hire you. As soon as you know that you know them. Because there is nothing more valuable, than someone who is in love with his work. Happy? :)
Any sufficiently advanced intelligence is indistinguishable from stupidity.
Look, programmers tend towards the egotistical at best most of the time. They like to argue, even about marginally different concepts. I've watched guys argue about things like for loops and while loops and ifs and switches so many times in my career that I can only try and block as much of that inanity out. When you approach developers by TELLING them how to do something using statistical analysis, you've got to first convince their supervisor/manager/etc. of the value of it and why it's better. THEN you approach them and tell them that's how you're doing it. Otherwise, you better believe they'll argue about that...everyone has their own way of doing things, and you can bet they don't care for someone else telling them that the way they've done things in the past is all wrong. The only way to make programmers learn is to do something first, have it become successful, and be able to demonstrate the value in doing things that way first. I've been on very, very few teams with developers who were constantly open to different ways of doing things. Very few colleges even bother to put emphasis on statistics...some will even let you dodge the course entirely and take an equivalent. CS and software engineering professors generally fall in line and focus on logic. Obviously, it's a comfort level thing, and you can't get through to people unless you can demonstrably prove your approach.
If you'd RTFAed, you'd have realized that Shaw isn't talking about quantifying programmers at all. Seriously, not one bit. Your whole... I don't know what the fuck it is... misses the point. And Shaw's point as well, which kind of just proves his.
Anyone who loves or hates any language, platform, or manufacturer, doesn't know what they're talking about.
so called statisticians too that have no idea what they are doing... They barely know how to define a proper sigma field so that they can use statistics on their sample set correctly.
Very few people really grasp it... maybe as bad as one per major stats bureau.
So it's not just programmers.
Not saying here that I know all of it but it sure is simple to poke holes in a lot of stuff.
... Isn't threatening to kill someone a crime in itself?
tihs isg mead fmro rcecydle tpyos
From what I've read, most of the responders here seem to have a poor grasp of what the field of statistics encompasses. Statistics is not just probability (in the form of flip a coin, choose a door, and poker hands), but can also be used to effectively design an experiment, and reduce the variation in a production line among other things. Personally, I find statistics to be a rewarding field of study and that it is easily applicable in the real world. Just don't tell that to my classmates who stare at me as if I have sprouted extra appendages when I tell them I am not graduating with them because I'm extending my engineering degree with an option in statistics...
Programmers don't know statistics. Programmers don't know quantum mechanics.... Programmers don't know aerodynamics....
I wish I had mod points to give you...
They're not rejecting statistics as a field, they're rejecting his claimed expertise in it.
He's just as arrogantly claiming that he's right and they're wrong.
No he doesn't.
He claims that programmers need to understand statistics more. The people he is talking about are therefore not wrong - they are ignorant.
But that term is loaded with negative meaning; it's more accurate to say they are like a variable named "statistics" with a value that has never been set. Basically, they don't know what they are missing.
It's like when programmers try to argue about how a language is bad when they've never used it. How would they know? Yet many without understanding of statistics are saying the same thing, they don't need to know any more.
I know enough to know statistics can be a valuable tool. Why would you not want another tool that could help you? The people who refuse to do so are less than they could be (as a programmer).
"There is more worth loving than we have strength to love." - Brian Jay Stanley
Machine learning is the logical place to take a combined knowledge of programming and statistics. It's a much rarer skill and commands a much higher salary, plus you're doing the closest thing we currently have to predicting the future for a living - and you generally still get to code plenty.
In other words, statistical knowledge can be a significant career advantage in addition to enhancing development and debugging.
AC: Nah, this guy didn't screw up. He (LSD-OBS) replied to you (AC) because he (LSD-OBS) was agreeing with you (AC). That's why he said 'Yup.' Because he (LSD-OBS) was agreeing with you (AC). When replying to a post, many people (present company included) use the word 'you' to refer to the person they are replying to. Having exhausted the second person, third-person pronouns (such as he, she, or it) are used to refer to third parties. In this case, LSD-OBS' use of the word 'he' indicated that he found the author's (Zed Shaw's) sweeping generalizations strange. Honestly, I am a little concerned that you are getting so worked up over this.
... somehow, I expect that my previous post will fall on deaf ears.
I think you just proved this guy's point! Holy Shit!
Over-the-top Response Guy! Giving "Over-the-Top Responses" since 1970.
You would be amazed how FEW samples you need with good sampling to get a good estimate,
Well, actually, I'm counting on that when I just use a "power of ten".
Sampling and results is a classic garbage in garbage out scenario. If you don't sample right your results are at best meaningless at worst they give you a completely wrong impression.
That's why it's important to record as much information as possible from each sample -- at the very least, we'd know whether it's garbage. For example:
If you wanted to know the average income of a household in the US you wouldn't just sample from people in Silicon valley just before the bust, if you did that it wouldn't matter what kind of tricks you did to your data your results would be bad.
Well, no, one obvious trick is to say, "Hey, all of this is from people in Silicon Valley just before the bust." The next obvious trick is to then combine those samples with the same people after the bust, and with other people elsewhere -- then you not only correct the error, but you get a sense of the difference between Silicon Valley and elsewhere.
My point here is that it's a hack for a programmer like me, who doesn't understand statistics (much), to make it easier to work with someone who does.
I believe the general principle here is called "data porn".
Don't thank God, thank a doctor!
I read this post a couple of years ago. Why is it just now making Slashdot? According to the wayback machine, this essay must have been written in May of 2006.
The word you are looking for is Densan.
I hear you, I do performance engineering of web based systems. The developers, the managers, the testers, the architects all have no clue. You are correct here.
However if you can not present your "theory" of how to do something in a dumbed down enough format then who cares. Because the pretty graph is pointless. It will be mis-interpreted, mis-understood, and mis-used.
All the stats theory on the planet will not get you past the dumb manager or developer. Don't lose sleep over this. There is no point. Simply find metrics in your analysis procedure that do mean something to these people. They may not be the total picture but they are something. Build a reputation for being correct by starting with simple things. You are always going to butt heads with a know-it-all developer / architect / manager. Fine, let them go off and waste money and time. They will be found out as morons in time. You do your thing and simply become the guy to ask about performance and how to do this.
Being understated and consistently showing above average results for your work is how you will rise up. Being an A-hole about it is not going to help anyone. As a matter of fact I would can your butt for being a D#ck.
You can find a reason why a programmer needs to learn anything and everything - but that's not practical. I have no qualms about hiring a statistician for special programming work - any one worth their weight is somewhat familiar with tools and languages. As a programmer I'd rather find a reason for: Why Statisticians Need To Learn Programming! The statistician has much less to learn.
...Zed Shaw is a cranky, irrelevant whiner 96.3% of the time, at least according to the lambda standard deviation of the probability factor. Or so the graph shows, when enough data points are confabulated by the denominator of the sigma variation. And he thinks HE knows statistics.
This is a hacked account, for which the owner can not be held responsible.
Degrees or Degree?
And how does your sister's education reflect on you? She's the stats person, not you.
Am I the only one who found that article hilarious?
A 6'2" "Good Looking" graduate whose extensive research in programmers has discovered that all males are innumerate neanderthals and only women really understand him.
Sigh. He's so sensitive. :-)
If only there was some other profession where people were trained in test coverage and such. We could call them "testers". Maybe I'll patent that idea.
Is the fact that most people program software by theories and think that they will get best performance when they apply their pet theories to a development project.
But what he really is saying is that in order to verify that the solution actually works it's also important to measure how well it works and time each stage in a process. That can actually yield some very surprising results and reveal that you lose a kiloton of performance on something that you never expected to be a problem.
I have several times encountered those kinds of problems - network lag, missing database indexes, stupid compiler, horrible third party database libraries, slow disks... All revealed by timing the process.
So it's actually only part of the statistics process - the part where it comes to sampling data and understanding it. There is often no need to do standard deviations and things like that when analyzing a software package. Many of the improvements you get from tuning are far bigger than 10%; you can get a 10 times improvement on some operation. Of course there are small ones too, but those are usually not worth the effort.
And sampling of data can be done with things as simple as print statements or by using a package like Purify Plus.
And no - Zed Shaw isn't a total jerk, that's wrong. But he is a pain in the ass for some people. Especially for project managers and programmers.
He is right about the importance of analyzing software, but it's not really necessary to plow into the realm of standard deviation and small differences when it comes to analyzing software. But it may be good knowledge to have when developing a software package since you may not be able to throw your data into Excel for further processing.
And you should also beware of trying to optimize too much, because one optimization may actually result in worse performance somewhere else. Just check where it will be most efficient from the overall perspective.
If builders built buildings the way programmers wrote programs, then the first woodpecker would destroy civilization.
Each of the above 3 professionals has their own area of expertise. And Statistics (such as needed in performance estimation or dimensioning of processing capacity) simply isn't part of the average software engineer's background (let alone that of a code monkey). You wouldn't want a Statistician to code up a decent interpreter, right? I mean: just look at the R interpreter. How about letting a Mathematician design and code your GUI? No takers?
By the same token you wouldn't want a programmer to design a Markov Chain Monte Carlo simulation. That's because programmers know nothing about Markov chains, the length of startup periods, periodicity of a chain, absorbing states, or invariant distributions. Worse yet, they have no way of knowing if their code spouts nonsense or the right answer with a lot of noise. It's not their area of expertise. You also don't want a mere programmer to set up a numerical approximation. I mean: just look at the jackasses that coded up the Patriot timer and made the most elementary mistake in the book of numerical analysis by using a floating-point value as a loop counter and allowed it to accumulate roundoff error. That's a mistake first-year undergraduate engineering and maths students make before they are marked down for it.
So what does that mean? Well, one approach would be to shout: "HECK Programmers Don't Know Jack About Statistics And Need To Be Educated In A Hurry". That's the approach the author of the article takes. I don't believe that's a very fruitful approach though.
Another approach (the one I prefer) is to note that some engineering projects are of necessity TEAM efforts. Where you have a project lead who knows where the problem areas are, who is qualified to solve them, and how the team effort must be managed.
And yes, that means that sometimes programmers get to work under the direction (as in "are told what to do") of a specialist like a Mechanical, Electrical, Chemical, or Civil Engineer. Or a Statistician or a Mathematician for that matter.
On the other hand those specialists needn't be heard when it comes to things like database design, semaphores, inter-process communication, communication protocols, pre- and post-conditions, latency, cache filling, access control and the need for encryption and suchlike.
On still other aspects you may expect specialists and programmers to work together and talk to each other.
So, while the problems mentioned in the article are recognizable (and indeed well known), they don't necessarily mean that programmers should get educated. They should be part of a team, and be professional enough to realize that they are members of the team, not in charge of it.
Zed's a total asshole. No wonder the programmers don't like him and won't listen to him. Maybe if he spent some of that stats time working on people skills he'd find office life much more enjoyable.
All those moments will be lost in time, like tears in rain.
Zed is full of crap. At least in my CS undergraduate program, we were required to take a "performance analysis" class that answered basically all of Zed's questions, plus a whole lot more. Effectively, it covered basic statistics as applied to performance analysis, simulations, measurement techniques, and some basic queuing theory.
There are published CS papers that lack statistical validity - that's inexcusable. Anyone publishing a paper that deals with performance should either know enough statistics to publish a valid paper or have their paper reviewed by someone that does.
Expecting all programmers to understand statistics well is not reasonable. "Programmer" can include everything from someone who hacks PHP pages together for a living to someone who does research into new ML techniques or designs complex software systems. For the person hacking PHP pages together, statistical validity isn't a huge issue since the primary goals are getting a system that works and doing so quickly and with minimal cost.
I question their metrics and they try to back it up with lame attempts at statistical reasoning. I really can't blame them since they were probably told in college that logic and reason are superior to evidence and observation.
I work with a number of statisticians and I have the opposite problem. They look at the data, apply mathematical transforms to it, and come to a conclusion, whether that conclusion makes any sense or not. They make little attempt to reason that the data may be flawed (which experiments often are), or does not really represent what we are trying to measure, or that they are using the wrong statistic to summarize the effect. It is very frustrating.
So they might realise the whole house of cards that stack ranking and HR’s beloved PRM systems are is flawed and invalid.
What will you do if the 1000 tests takes 10 hours?
Either ctrl+c, or try it 10 times.
Why 10 times? Maybe 5 times is enough or at least 20 times is required?
It doesn't have to be statistically accurate. It just has to be close enough.
How do you know that you are close enough?
One can do a benchmark a couple of times to see whether the results are more or less the same. A more sophisticated approach is to measure the standard deviation as well. However there are situations where accuracy is critical. In that case one makes a distribution assumption (e.g. Normal distribution) and then a statistical estimator is used to give a confidence interval for the estimated parameter. I.e. the confidence that the parameter will be within that interval is 95%.
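As a rough sketch of that last approach in Python: compute the mean and sample standard deviation of a handful of benchmark runs, then a 95% confidence interval under the normal assumption. The timings below are made up, and the helper names `mean_ci` and `runs_for_half_width` are mine. With only 10 runs a Student's t critical value (about 2.26 for 9 degrees of freedom) would give a somewhat wider interval than the z value of about 1.96 used here.

```python
import math
import statistics
from statistics import NormalDist

def mean_ci(samples, confidence=0.95):
    """Mean and normal-approximation confidence interval for the true mean."""
    n = len(samples)
    m = statistics.mean(samples)
    s = statistics.stdev(samples)                      # sample standard deviation
    z = NormalDist().inv_cdf(0.5 + confidence / 2)     # about 1.96 for 95%
    half = z * s / math.sqrt(n)
    return m, m - half, m + half

def runs_for_half_width(stdev, half_width, confidence=0.95):
    """Rough number of runs needed so the interval is +/- half_width."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * stdev / half_width) ** 2)

# Made-up benchmark timings in seconds, for illustration only.
runs = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 10.3, 9.7, 10.1]

m, low, high = mean_ci(runs)
print(f"mean = {m:.2f} s, 95% CI roughly [{low:.2f}, {high:.2f}] s")
print("runs needed for +/- 0.05 s:", runs_for_half_width(statistics.stdev(runs), 0.05))
```

That second function is one way to answer "why 10 times?": pick the precision you need first, and let the observed spread tell you how many runs that costs.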
This reminds me greatly of my previous assignment, where I had to work with (yet another) "difficult user". He had a Ph.D in statistics and sounded a bit like Zed. He had also done some work in datamining and data warehouses, so he started our first conversation by declaring himself an expert in my field. Great start :)
Of course, as it turned out he was just very frustrated with his colleagues because he couldn't explain his ideas. No surprise there: he tried to explain very advanced mathematics with formulas, to people who barely managed to get a high school education. After I provided an interface between the parties involved (my CS study came with a course in probability calculus so I could actually understand what he was doing) things went pretty smoothly from there on. My advice to this user when I left was "get a good communications training". He said his manager had been saying the same for about a year, but now that it was coming from me (a techie) he'd actually think about it :)
People who can communicate are paid lots of money. You can have all the skills, but if you can't access them, or combine them, you're not getting much use out of that expertise. Zed's article being a case in point.
Therefore, by the (faulty) logic you're using, you're just a cow with a keyboard - osu-neko (2604)
We did some work involving statistics to correctly report results, see http://www.itkovian.net/base/statistically-rigorous-java-performance-evaluation (OOPSLA 2007) and http://www.itkovian.net/base/java-performance-through-rigorous-replay-compilation (OOPSLA 2008).
I am the Shield Anvil. And I am not yet done.
Statistics are very important when testing a system. You really need to know (especially if the bug was intermittent) what the probability is of NOT seeing the error per test run iteration.
It's not good enough to say, "It happens one in ten times, so if I run it 11 times I will definitely see the bug if it's still there."
The probability of not seeing the bug per test is 9 in 10 i.e. 90% or 0.9. These probabilities multiply, so if you perform the experiment (do a test run) 10 times, the probability of NOT seeing the bug (with the unfixed code) is 0.9^10 i.e. 0.349 or about 35%.
Would you be confident with that?
If you wanted a 1% probability (0.01) of not seeing the bug (in the unfixed code) how many runs would it take? Well, do your logs.
0.01 = 0.9^x
x=43.7
So you would need to run the test 44 times to have a 99% confidence that you'd fixed the bug.
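The arithmetic above can be checked with a short sketch. It assumes each test run is independent and that the bug shows up with a fixed probability per run; the function name `runs_needed` is made up for illustration.

```python
import math


def runs_needed(p_bug_per_run, p_miss_target):
    """Smallest number of runs so that the chance of never seeing the bug
    (in unfixed code) is at most p_miss_target."""
    p_miss_per_run = 1.0 - p_bug_per_run
    return math.ceil(math.log(p_miss_target) / math.log(p_miss_per_run))


# Chance of ten clean runs in a row when the bug appears one time in ten.
print(f"P(no bug in 10 runs) = {0.9 ** 10:.3f}")

# Runs required to push that chance down to 1%.
print(f"runs needed for 1% = {runs_needed(0.1, 0.01)}")
```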
Stick Men
Zed fired off an angry post yesterday after noticing he was slashdotted. It looks like some sort of retaliation swing for the onslaught of pissed off programmers gunning for Zed. http://zedshaw.com/blog/2010-01-09.html
My first thought was: is Zed on some heavy duty medication? He seriously has some sort of anger problem going on and a deep-seated hatred toward his idealized concept of the "programmer". Maybe a programmer made him feel bad so now he's got a vendetta. Programmers surely can be dicks. I know because I work with them, but Zed is coming off like a dick programmer times 1000. (I chose 1000 because it's a power of 10.)
If he wants programmers to listen to him and actually change their ways, why doesn't he go with the educator approach instead of going with the approach of flame the world, stomp my feet, and call everyone stupid until they pay attention to me? The best way to get someone to ignore everything you say is to call them an idiot jackass who can't remember anything after 2 minutes. They will kindly oblige by living up to your expectation.
This Zed character may be good at some things like stats but he's damned awful at communication and demonstrating tact. I wonder if he behaves this way on the job, because I would not want to work with such a caustic person. Maybe at work he keeps the anger under wraps and behaves like a great guy, but if I were his coworker I'd lose all respect for him after reading those 2 posts.
Camping on quad since 1996.
Check out your local weather forecast. "The normal high for today is..." But what's the standard deviation? If they tell you that the normal, or the average, is 15C and today's high is 25C - wow - that's way above normal. Must be global warming. Quick, send money to AlGore. But what if they also told you that the standard deviation for today is 12 degrees? Oh. Hmm. 25C ain't that significant. Cancel the cheque to Al.
Statistics are worse than meaningless if you don't understand how to use them correctly.
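A rough sketch of the weather example above, assuming daily highs are approximately normally distributed (a simplification). The 15C, 25C and 12-degree figures come from the comment; the 3-degree alternative is a made-up value for contrast.

```python
from statistics import NormalDist


def how_unusual(value, mean, stdev):
    """Return (z-score, probability of a value at least this high)."""
    z = (value - mean) / stdev
    tail = 1.0 - NormalDist(mean, stdev).cdf(value)
    return z, tail


# Figures from the comment: normal high 15C, today 25C, stddev 12 degrees.
z, tail = how_unusual(25, 15, 12)
print(f"stddev 12: z = {z:.2f}, P(high >= 25C) = {tail:.2f}")

# Hypothetical contrast: the same 10-degree gap with a stddev of 3 degrees.
z, tail = how_unusual(25, 15, 3)
print(f"stddev 3:  z = {z:.2f}, P(high >= 25C) = {tail:.4f}")
```

With the 12-degree spread, a day at least that warm turns up about one time in five, which is the point the comment is making.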
linquendum tondere
... I ran into a professor of statistics who said that computers were going to be a passing fad in his field.
To a Lisp hacker, XML is S-expressions in drag.
theory is what is needed, otherwise statistics does not mean much to anyone...
With probability theory one models, while statistics is used to estimate the parameters of a model.
I wrote a little script that simulated this competition and on 10000 runs, if I didn't switch I won the prize 3295 times and if I did switch I won it 4997 times.
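For comparison, here is a minimal sketch of such a simulation under the standard rules, where the host always opens a non-prize door other than the contestant's pick. This is not the script mentioned above; the names `play` and `win_count` are made up. Under these rules the usual analysis gives about 1/3 wins for staying and about 2/3 for switching, so a script set up differently may report different counts.

```python
import random


def play(switch, rng):
    """Play one game and return True if the contestant wins the prize."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the pick nor the prize.
    opened = rng.choice([d for d in doors if d != pick and d != prize])
    if switch:
        # Move to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize


def win_count(switch, runs=10000, seed=0):
    """Count wins over a number of games with a fixed strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(runs))


print("stay:  ", win_count(switch=False))
print("switch:", win_count(switch=True))
```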
- Raynet --> .
Just because you are perfectly right ... doesn't mean you aren't a complete and total asshole.
As a reformed asshole myself I can tell you that condescendingly pointing out the failures of your colleagues will not get you what you want. Specifically (and I'm assuming here that your goal is the same as mine) getting your colleagues to stop acting like self-righteous fucktards. Most programmers are convinced they are geniuses. This is crucial to understand if you wish to work with them and wish to get them to do anything at all.
I am ostensibly in a senior role in my day job and I do find many things these other programmers do ... well ... fucktarded. That is they are beyond retarded since a retard would know they are a retard or at least not entertain the delusion of superiority that a fucktard does. No my friends we need to call them fucktards because they are fucking arrogant in their belief of superiority. So I can't tell these geniuses to do anything. Nope. Not at all.
You need to use psychology on these fucktards. What you need to do is something Socrates used to do with his little fucktards that he taught. Ask questions. Since the genius/fucktard seems to know so much start by asking leading questions that will do one of two things... it will lead the fucktard down a road that will show you both how stupid he is (and you can pretend they figured it out themselves they love to take credit). Or it will show you where you were wrong... and that you were the fucktard.
Remember we are after end results. So we put aside lesser things (like pride) in the search for a greater goal which should be better software and the ability to make more of it. If you can psychologically manipulate an army of fucktards you will become fucking powerful. Much more fucking powerful than you fucking are on your fucking own. I wish you good fucking luck as I can tell by the response to your post that you are a fucking powerful personality and will definitely lead your own army of fucktards one day.
Hopefully when we meet on the field we can be allies and not enemies.
[signature]
Statistics are important; it is highly unlikely that anyone with an MBA will know how or why, but they want them.
In fact, it is almost a certainty that any given MBA will either lack statistical expertise or will misapply it unthinkingly in a cook-book style. The pseudo-statistics behind Six Sigma comes immediately to mind.
I had repeated theoretical discussions with the four MBA experts who "trained" us (a group of six PhDs in Physics & Engineering doing R&D) in the ways of Six Sigma. There were problems with the statistical theory they presented right from the start - and they were clearly unaccustomed to being contradicted along the lines of "that's not right/applicable in this case, and here's why". For instance, they failed to acknowledge that non-Gaussian distributions could exist, then refused to accept that procedures should be adapted to the data if it was non-Gaussian. Next, they adamantly refused to believe that the 1.5 Z shift hypothesis was supported only by a few studies, all relying on a single dataset from the 1950s for die-based manufacture, and totally irrelevant to most other processes. The Six Sigma books all say "many studies" over decades support the Z shift hypothesis, but fail to cite them, and our MBA experts could not cite any such studies either. Thirdly, they refused to accept that an additional mode of variability (not in the Six Sigma beliefs) existed in processes with feedback (such as recycle lines or controllers). In many cases, this mode guarantees non-Gaussian variability in the process output.
Their advice was that to pass the course, we should ignore our knowledge of statistics (which they acknowledged was far better than theirs) and of process variability, and just "apply the documented methods". We did, and we all passed the course. Then we ignored the Six Sigma bogus statistics bullshit and got on with our jobs using proper statistics to analyze and solve problems in variability with the products we were developing.
MBAs seem to want statistics, but the vast majority appear to lack the training in how to generate proper statistics, or how to use them competently if someone else supplies them. Most MBAs appear to think the world is described adequately using Gaussian distributions, and a few "experts" know the Weibull distribution or the t-distribution. Other distribution types (Poisson, discrete/categorical, etc.) are totally foreign, and methods of inference beyond simple unconditional analyses are also quite alien to them.
I also understand that people who are good at it are rare.
Perhaps not as rare as you might think. But those who have some aptitude in statistics know enough to keep their mouths shut when the data tells them to. MBAs on the other hand, ignorant of their own ignorance, are as verbally promiscuous as politicians...
Those who can make you believe absurdities can make you commit atrocities. - Voltaire
Zed Shaw trolling? What a complete *fucking* surprise.
I'm a physicist, I know plenty of statistics. The kinds of statistics he's talking about are not hard. If you can do algebra, you can do things like calculate the standard deviation and variance of a set of measurements.
Was this rant really necessary? I run into people in physics who don't take care of these details. I find that a simple "can you put a standard deviation on that number?" or "can you repeat the experiment?" generally gets the job done. If you want to be more scientific, just start with those questions, and see where it takes you... you could even add "please" if you wanted to be nice. I find threatening people with death and belittling their intellect while talking about trivial calculations doesn't generate useful data.
To be fair, it sounds like Zed has been working as staff at a university. This has nothing to do with statistics, but it's probably the real reason he's in such a bad mood.
Sorry, Zed I don't need statistics to do my job. Zed jumped the shark years ago - isn't he the Rails guy? That is so 2005. This story is like having deja-vu of a bad hangover.
Just go away.
Leaving the author's lack of social skills aside, the powers-that-be in computer science education agree with him, at least for now. The Computer Science Accreditation Board lists a course in probability and statistics among its criteria (sorry, I couldn't find an online link to the latest criteria) and has for at least 20 years. I don't know how influential those criteria are outside the US (though I'd be curious, if any slashdotters can help me out), but here they are pretty important, especially for the vast majority of programs that are not at the top schools, and need the credibility that accreditation can bring them.
Not everyone is happy, though. At the 2005 OOPSLA there was a panel discussion where one thing they all could agree on was that the CS curriculum was way too mathematical. They favored something more like a software apprenticeship where "projects" were replaced with "products". That point of view does not appear to be in the ascendant in computer science yet, but it might catch on in the information science departments that are often found in business colleges.
Personally, I don't think the CS departments are likely to get less mathematical as long as there is strong demand for their graduates. There are certainly a lot of students who don't major in computer science because it is too mathematical for them, and I'm sure some of them wind up as programmers through some other route, and others find some other career. Moreover, I'd say that with one probability and statistics course that follows calculus, the students do get enough to "know what they don't know", which was what the author wanted.
1) the quality of your future coworkers
I base this on the quality of my past coworkers. I was probably lucky, though.
2) the quality of commonly held CS degrees
I'm at Iowa State University right now. It seems to be an exceptionally-good CS program. Depending on the kinds of friends I make here, I'll probably end up in a job with some of my classmates.
3) how much of their education you or anyone else remembers five to ten years after leaving college
The parts you use.
It's also much easier to re-learn something than to learn it from scratch -- thus, Zed could've said "brush up on your statistics", not "learn statistics".
Don't thank God, thank a doctor!
ummm, where are you coming from here?
Everything is complex. That's the basis of every libertarian ideology. Life is too complex for a group of politicians or 'experts' to manage.
As a result of this complexity, the reasonable thing to do is to allow people to try different approaches to solve their problems... hence looking down on things like central planning.
If you think you have a solution, you are free to prove to the world that it is correct. That is freedom... the freedom to do things to solve the problem.
The alternative is the belief that some group of experts and politicians can capture all the information in the world and formulate working policies to dictate how society should behave.
Their track record? Dismal... communism, fascism, corporatism, theocracy... They all seem to fail empirically. For one, it is rare to have such experts actually know everything. Secondly, you have to count on the experts actually having 'good will' towards the populace and not becoming corrupt or obsessed with their own power and money. Again not a trivial task.
We agree that life is complex and problems are deep. A free society demands those with solutions implement them and prove they are the best... and people will gravitate to the best (or at least good enough) solutions. Think you have a better way to run a school? Open up the school and bring in students and show people that your way is better. That is freedom.
The alternative which is what we have now? Have a bunch of experts think they can devise the best education policy, implement it within the public school system where people are taxed even if they don't attend it.
Empirically it is shown to work. School choice for example is available in many countries and places. Society does not collapse (Sweden, Chile, Alberta, British Columbia...). Yet the 'experts' who actually tend to deny empirical evidence tend to go against it in favor of theoretical arguments that society will divide if our kids don't learn together...
I used to be a socialist. Until I looked at the empirical evidence. Now I favor freedom.
The people he is talking about are therefore not wrong - they are ignorant.
I'm sure this goes against everything you've been taught, but right and wrong do exist. Just because you don't know what the right answer is - maybe there's even no way you could know what the right answer is - doesn't make your answer right or even okay. It's much simpler than that. It's just plain wrong.
Dr. Gregory House
You misunderstand the alternative. Societies have been a mix of planning and autonomy for all of human civilisation, and *that* is what has worked well. It is not perfect, but it by-and-large works. Societies that overstress planning or autonomy have never been workable. No system in the world is laissez-faire, nor is any system entirely planned, and all systems have their failures. It is not hard to find these for the systems that are closer to laissez-faire, and you'd do this if you were really interested in a fair comparison.
The invisible hand, even to the extent that it supports the public good, is not always optimal. Often it doesn't even try to and is off optimising something else.
Experimentation is good, and certain amounts of competition can be worked into state structures to allow that. If there are better ways to run schools, we should find them and implement them in the public schools. We are, however, going to insist that the schools be public, that everyone pays for them, and that everyone goes to them. It's otherwise too easy for one person who earns privilege (whether the degree of that privilege is just is another question) to turn it into a privilege passed, unearned, throughout many generations. Universal, public, mandatory, integrated schools help prevent that. They also help prevent racism by forcing people to rub shoulders, and they help prevent idiocy by preventing religious nuts from being the only people to educate their kids.
Formal freedoms are not the only ones worth considering - if you "allow" something in a system, but that same system effectively prevents you from enjoying it, then that allowance is very shallow. Having justice but having finances result in some people being unable to hire (any or a good) lawyer results in very shallow justice. Similarly with any other social good.
If you believe in the tangled libertarian notion of liberty as the only good, your philosophy might work. If you believe in any other goods, to cling tightly to libertarian traditions and hope to pick up reasonable amounts of these other goods will prove most unsatisfactory.
For every problem, there is at least one solution that is simple, neat, and wrong.
In a world where many programmers are lucky to even finish the project with working code (software projects have very high failure rates in the real world), performance tuning of the type where statistics would be useful is often an unaffordable luxury. Most programmers make a genuine effort to avoid the more obvious performance sinks with some knowledge of Big O Notation and known antipatterns, but in a world populated by demanding managers and slashed budgets that is really the best that most of us can do. If Zed wants programmers at his company to become experts on statistics and do detailed performance benchmarking then he can pay them himself for the privilege (hint: programmer cycles are vastly more expensive than processor cycles); otherwise he can, with all due respect, shove it.
He claims that programmers need to understand statistics more. The people he is talking about are therefore not wrong - they are ignorant.
And this applies to all programmers?
He's the one making generalisations based on anecdotal experiences, which is itself a poor practice in terms of statistics.
It's a perfectly fair point to say that many people need to understand statistics better (and it can be done without sounding like a snob), but there is no reason for him to target his rant at programmers. My degree was in mathematics, and I now work as a programmer in which I use mathematics - where do I fit into his box?
A programmer could just as easily write a pompous rant about "How statisticians need to understand computers better", based on a handful of anecdotes and generalisations.
I don't know why we're giving time to someone whose level of argument is "they dont know shit", and who resorts to childish ad hominems of "their confidence in their lacking knowledge is only surpassed by their lack of confidence in their personal appearance".
Statisticians need to learn about logical fallacies or I will kill them!
That doesn't sound anything like a car. You must be new here.
I thought everyone on /. is using rock-paper-scissors-lizard-spock already.
From TFA:
Almost all of the queries performed great, except one query that had sub-second response on average, but a 60 second standard deviation!
Pause and reflect on this for a moment. The average is poor and occasionally it stuffs up so severely that the stddev is pulled out by sixty seconds.
I managed to reproduce this (mean of 1.07s, stddev of 58.4). 3000 results of 1e-30s, one of 3200s (almost 1 hour).
If you need statistics to interpret the above results then you have bigger problems.
If you ACTUALLY get the above results you don't complain about the outlier and get them to rework it. Thank $DEITY, time out at a nanosecond and re-request.
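The reproduction quoted above (3000 results of 1e-30s plus one of 3200s) is easy to check. The sketch below uses the population standard deviation; the sample version differs only in the second decimal place here. Printing the median alongside shows how little the mean and stddev say about the typical query when one outlier dominates.

```python
import statistics

# Data from the comment: 3000 near-instant results and one 3200s outlier.
timings = [1e-30] * 3000 + [3200.0]

mean = statistics.fmean(timings)
stddev = statistics.pstdev(timings)  # population standard deviation
median = statistics.median(timings)

print(f"mean   = {mean:.2f}s")
print(f"stddev = {stddev:.1f}s")
print(f"median = {median}s, max = {max(timings)}s")
```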
As someone who holds two B.S. degrees {computer engineering, computer science}, I take issue with the GP's statement. The typical CS student does not learn about transistor fanout, CMOS logic, VLSI, etc.
CS is derived from St. Turing and his universal machine. CE covers how to make (and use) one of those.
Because I am a genius and lazy and don't need to study much in order to get an A.
Until the third year when I almost failed a math course ;)
Excellence is an attitude.
Personally, I love spouting statistics, but those are the ones based on logic such as "this won't matter in 99% of all circumstances". I admit that mathematical statistics just don't interest me as the stats I use don't need to be all that accurate. I do use them for real time protocol development, but for those, a cook book is good enough. No point learning the math on them. Takes too long and I don't gain enough from the effort to justify it. My math learning brain capacity is better spent focusing on differential equations and linear algebra. I don't have the brain cells left for a 3rd discipline :)
:)
I love programming, but I despise trying to implement algorithms written by math geeks. They're typically sloppy and depend heavily on background information that I just don't care about. Write some pseudo code instead of using 30 pages describing the variables in an equation. When I had to start working with wavelet transforms, I had to learn some weird french notation for math I've never seen before that looked like Polish not Greek. (and I mean polish the language, not making polish jokes)
I'm a strong believer that programmers should have at least better than generalized math skills, but I also believe that stats geeks and math geeks should be at least able to write in Mathcad or R or something. Then at least a programmer can do something with it.
If a stat geek and a code geek are expected to work with one another, they should at least have some way of speaking with one another and I genuinely believe that the stat geek can learn to program enough to make an example a lot easier than a code geek can learn to read their math.
I work in a company made up entirely of developers who have learned that instead of saying "Hmm... nope, that's not my thing, cya!" they instead say "It's not my thing, let's see if we can sort it out though." We help each other out and we solve problems. If you happen to be a math or a stats geek, we'll work with you to try and understand the garble that you're attempting to communicate, but it'll take far more than just "here's the math, cya" because then we'll just interpret it however it seems to make sense to us. And I promise you, it'll be wrong.
Teamwork solves these problems.
If I'm the good Monty Hall:
When the contestant has selected the donkey, I open another door and offer them the opportunity to switch. If they've picked the prize, I open that door.
If I'm the evil Monty Hall:
When the contestant has selected the Ferrari, I open another door and offer them the opportunity to switch. If they've picked the donkey, I open that door.
Have gnu, will travel.
I find tfa pretty clueless when it comes to a real understanding of what is needed for performance testing and tweaking. A statistical analysis is nice, especially with monte carlo type analysis, like Bungie running Halo 3 on numerous Xboxes simulating load and player interactions. However, I find that what is lacking with programmers is a basic understanding of the high levels of process analysis, such as network analysis, CPM, and PERT. Knowing a process has high levels of variance is nice, but not useful for understanding the why. Where is Zed's example of multivariate linear regression or ordered probit? Discussion on hypothesis testing? Anyone, anyone?
As a side note, Statistics in a Nutshell is the only book programmers really need on stats.
In God we trust, all others require data.
From TFA:
I never have this problem with female programmers. Maybe it’s because I’m tall (6’2”), or nicer to them, but they always speak rationally and are really keen to learn. If they disagree, they do so rationally and back up what they say. I think women are better programmers because they have less ego and are typically more interested in the gear rather than the pissing contest.
I'm also good looking and know a lot of statistics. Ladies, I really respect you and I think highly of you. If you would like some private statistics lessons, call me at (123) 456-7890.
Smooth move, Zed Shaw, smooth move.
"I see undead people" Warcraft III - Necromancer
I suggest programmers learn management too: http://www.netmba.com/
I'd like to buy homeland for our 10 million people. http://twitter.com/mahadiga
I generally think in programming it's the exceptions that cause the problems. I usually only look at averages and maximums, however it must be said many performance problems are caused by an exponential increase in execution time with a linear increase in load/dataset size. I don't really know stats but it's pretty easy to see when this is the case. There are many things that stats will never predict, i.e. when you are going to hit a wall without an underlying knowledge of where the walls are and how close you are to them and what/how you move towards them. It's all pipes and data in the end. You should know what's going to break it (exceptions to your assumptions) and where your bottlenecks are, and what path is going to get followed in what situations. That can get tricky in database queries, say oracle, with stats determining your execution plan. How often does the full table scan in a loop seem to cause a query to never return? Google oracle stats execution plan. I guess it keeps DBAs in a job.
You are wrong even though you think you are right. The question is, what are the odds that "if 1 is heads, the other one is also" - not what are the odds that both will be heads.
The latter corresponds to the analysis given. The former (the question asked) does not. In the actual question, the first two possibilities are either ruled out, because you've already stated that the first coin is 1, or, you are allowing for either coin to be one, then what are the odds of the other to be one as follows: 0 - 0 = Ruled out by the given 0 - 1 and 1 - 0 - ....
Woops! Never mind. I was reading the question incorrectly. I read "if one is heads" as "if the first is heads". Need to work on my readin comprehension, not my odds skilzzz.
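The two readings mentioned above ("if one is heads" versus "if the first is heads") can be told apart by enumerating the four equally likely outcomes of two fair coins. This is a minimal sketch; the helper name `conditional` is made up for illustration.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes for two fair coins: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))


def conditional(event, given):
    """P(event | given) by counting equally likely outcomes."""
    kept = [o for o in outcomes if given(o)]
    return Fraction(sum(1 for o in kept if event(o)), len(kept))


def both_heads(o):
    return o == ("H", "H")


# Reading 1: at least one coin is heads (either coin may be the one).
print(conditional(both_heads, lambda o: "H" in o))

# Reading 2: the first coin specifically is heads.
print(conditional(both_heads, lambda o: o[0] == "H"))
```

The first reading keeps three outcomes (HH, HT, TH) and the second keeps only two (HH, HT), which is why the answers differ.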
Over-the-top Response Guy! Giving "Over-the-Top Responses" since 1970.