A Rubric for IT Analysis
Aredridel writes "Zed A. Shaw has an insightful article on how analyses of software systems should be performed, and how they're often done wrong. It should be required reading for all IT journalists, and all readers of IT journals."
Would someone please run this rubric through the rubric and say how well it complies?
--
make install -not war
Even worse it works about as well as pricing soap at $1.95 instead of $2.00 to fool people into thinking it's cheaper.
I think $1.95 is cheaper, isn't it?
Better run it through the rubric...
8. Paper does not use the above terms correctly or calculates them incorrectly. Without the data you won't know the second part, but these 6 statistical concepts are very simple to calculate and get right.
I think it's broken.
Something for Timothy
When you have read that article, go and buy a copy of the 1954 classic How to Lie with Statistics by Darrell Huff, ISBN 0393310728.
The author of the rubric "carefully" lists examples of things that ought to be seen -- and then carefully extracts two graphs from a long analysis in order to "prove" his claim. Never mind that the things he argues one should look for would be embedded in the materials and methods or results section, not the conclusion or the paper summary. Never mind, either, that his objections are bogus (red versus black ink? Uh, wait -- if the winning system had been shown in red, it would have conveyed how burningly fast the system was.)
Oh, wait -- it's something which shows that samba 3.0 is slower than w2k3. Never mind. This is slashdot, so the editors have gotta troll for ad views.
The usage of red and green determines the meaning, if the higher statistic was red, it wouldn't be the "bad" effect he is stating.
... and the graphs aren't necessarily misleading in the aspect of spacing, the graph seems to be trying to show the ratio of difference, not the difference amount. ... aside from what looks like a bad example of bad examples... there are some good points in the article...
The statement that green is good, red is bad, is not really true. Red is an attention getter, Green is an easy, unobtrusive color (relaxing, generally).
While it is easy enough to make the leap that 'red' is bad because red is often an 'alert' color, the reason red is an alert color is because it is an attention getter, not because it means bad.
Why else do you think so many people drive red sports cars? If red was bad, why wouldn't they drive green ones?
MoM++ - A Classic Expanded - [Master of Magic 1.5]
http://mompp.sourceforge.net/
I hate it when people lie with statistics. Even the BBC did it recently when they were trying to justify 1 million GBP on their new weather program. They said 7/10 people either liked the new system the same as the old one or preferred the new one. Perhaps they could also have said 9/10 liked the new system the same as the old one or preferred the old one? Who knows when you lump categories together like that without providing the raw data?
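To make the lumping problem concrete, here is a minimal sketch. It assumes exactly three answer categories (prefer old, same, prefer new) and works in steps of 10 percentage points; the function name is made up for illustration and none of the breakdowns are real survey data. It lists the splits that all fit the "7/10 liked it the same or preferred the new one" headline.

```python
def consistent_breakdowns(same_or_new=70, step=10):
    """Yield (prefer_old, same, prefer_new) percentages that all match
    the lumped figure 'same or prefer new' = same_or_new percent.
    The three-category split is an assumption for illustration."""
    prefer_old = 100 - same_or_new
    for prefer_new in range(0, same_or_new + 1, step):
        same = same_or_new - prefer_new
        yield prefer_old, same, prefer_new


if __name__ == "__main__":
    for old, same, new in consistent_breakdowns():
        # 'same or old' is the figure the other side could quote instead
        print(f"old {old}%, same {same}%, new {new}% -> same-or-old {old + same}%")
```

One of the splits it prints is 30% old, 60% same, 10% new, which is where the "9/10 liked it the same or preferred the old one" reading comes from. Without the raw numbers you can't tell which split is the true one.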
What he's stating seems rather obvious, but then again I might not be his target audience. One thing he seems to be missing is: who is paying for the test and is the one in whose favour the test turns out to be also the one who paid for it?
see a Text Widget
I just sign my evaluations. My regular readers can get used to my way of doing things, and benchmark me :) Like, if this idiot (me) finds it easy to use, it's probably underfeatured ...
Apart from that remark, I think the linked article is well-meaning but total BS.
This is not a signature.
Aredridel, is that you?!? LOL...
Who am I to say that this is a basic set of requirements for an analysis study?
A very good question indeed, my dear friend.
You have to be kidding me. The last three jobs I had, I got dinged if I did analysis of any sort. Most software developers skipped the analysis and design part, because Managers wanted them to start coding on the first day and not stop until it was ready for QA to look it over. I called it "Seat of your pants" programming. Often I had to fix problems in other developers' programs and they did not have proper documentation, source code comments, naming conventions, flow charts, or any sort of documentation to help me figure it out at all.
Requirements kept coming in, and they changed daily. Often what I started writing at 8am was useless by 4:45PM, when the requirements changed on-the-fly and ad hoc and required me to program something else to replace it before I went home for the night. While I could have waited until the requirements were locked in, there was no such thing as that: any idea anyone had was instantly accepted by a manager and given to me to put into the program. Combo boxes became ListViews, then combo boxes again, then a text box, and then a ListView again, and then a combo box. Database names for tables and columns were always changed, and the thousands of SQL queries in my programs that accessed them needed to be changed as well.
Management didn't think anything of it, and kept their "We cannot say no to anyone, no matter how insane the request" attitude.
Analysis, hooo haaaa! Yeah I wish! Corporate America apparently does not believe in it anymore.
Remember, Slashdot does not have a -1 disagree moderation, and no, troll, flamebait, and overrated are not substitutes.
At any rate, I disagree with his complaints about graphs. Choosing an appropriate y-axis scale obviously changes the impact of the presentation, but that hardly makes one scale more intrinsically "good" than another. In this case, Samba and Windows are compared on two different servers. One is twice as fast as the other, the software packages have similar relative performance and the graphs accurately reflect that. (Note: I'm only talking about the graphs. I have no idea about the technical merits of the underlying test, and don't care.)
Certainly, this is pretty thin gruel for the epitome of dishonest presentation. Let alone his complaints about the ethics of making the Linux curves red!
Regarding the x-axis -- I agree in the sense that not using round numbers is aesthetically unpleasing. (And, for heaven's sake, people -- use the same number of decimal places in each label! I grit my teeth whenever I see 0, 0.5, 1, 1.5, ...) But there's hardly anything sleazy about kludgy spacing. It makes zero difference to the message.
What I'm listening to now on Pandora...
Unfortunately there are too few people out there with scientific training, and that especially includes many journalists and managers. Attempting to get them to apply some rigour is a futile task, especially when they have to present to an audience with no scientific training.
Standard deviations and measurement errors are for engineers. The papers you get from companies are sales tools, nothing more. Simply treat them with the scepticism (caveat emptor) they deserve and try $WHATEVER yourself with your systems and the money you were planning to spend.
Well, no, idiot. When graphed properly, they look the same. Both tests show an absolutely comparable performance ratio. What does it matter that the faster machine runs both OSes faster? How does this skew anything? Is the concept of relative speed increases a new concept for the creator of the article?
A REAL loaded graph would suppress the y-axis or something to push the lower graph further down, or to skew the proportions.
Man, is today really shit article day on slashdot?
HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
What's a rubric (in my best Bill Cosby voice)
GETPKG - Package Management for Slackware
Why won't they just stick to the basics?
Fire it up on your intel based PC, running windows. If it doesn't work at all, mark it down for requiring non-standard hardware.
In the free world the media isn't government run; the government is media run.
... the data points the author criticizes are not data points but line decorations for black and white readability.
Would somebody please write a rubric for slashdot, to help it realize that posting blog crap that is biased and generally full of inaccuracies and problems in testing isn't news?
No, really. That's how it started, usually the title of a section, paragraph or similar.
Obviously the bit of red text contained something someone thought was important so eventually the word came to mean an important rule or important passage. These days it means an important set of rules.
http://www.dictionary.com/
http://www.m-w.com/
http://www.askoxford.com/?view=uk
I think your post should be +5, as will anyone who's been there before. I started working for smaller companies, i.e., not corporations, and ran into the same problem. In the case of small companies, the problem comes from paying for analysis and documentation (specifically, the part of the quote that includes that).
You always seem to have some bootlicking know-nothing-but-thinks-he-knows-everything jackass who thinks he's on to your "scheme" advising the owner that it's a waste of money to plan things out in depth and document. Then you end up in the same situation. Well, I used to, I realized that I can be more selective in who I do business with so whenever I meet with a business and get a sense it will end up like that, I move on. You can't do a good job in that environment unless you get lucky and really, I'm sick of advising people, being ignored because of apparent up front costs, then getting blamed by the little tards after costs go through the roof because they can't commit to a roadmap, as they refuse to pay for one being drawn up.
Where is He-Nerd when you need him? LOL.
Who commissioned the study.
It's inevitably the company who comes out smelling like a rose, but it's never stated up-front.
disclaimer:
I'm not a member of the anything-but-Microsoft crowd. Microsoft products supply my income and have done so since I left the mainframe market fifteen years ago.
I will say I take no pleasure in seeing research results showing a Windows-based product to be exponentially superior to another product (e.g. Linux) without a statement as to what caused the study to be made: who commissioned it? Later, we find out after the headlines read "Study Says Windows Beats the Crap Out of Linux" the project was funded by Microsoft.
In the interest of fairness and honesty, would Microsoft have permitted that study to see the light of day if it didn't go in their favor? Look at the vendors, especially hardware, who perform tweaking to achieve special ratings during benchmarks.
I look at these situations as a problem-solver, not a statistician in the sense of a statistician making a hypothesis then determining if it's true or false. A problem-solver, however, in analyzing the data and letting the chips fall where they may - objectively - regardless of any other influence.
A better way to explain this is an example: the various groups responsible for placing traffic signals & controlling traffic flow put down the hose meters where they think the data should be collected in order to justify where they want the signals to be put. My question to those people, and the people on the city council, or any boards is this:
Suppose we collected incredible volumes of data. Subsequent analysis would show the correct place to put a traffic signal is an intersection 2 miles outside of the city limits - almost in the countryside. Doing so would eliminate bumper-to-bumper traffic during the rush hour.
Would you do it?
You already know what their answer would be. They don't want to follow the data. They want to bend the data. ("We think this is a likely place for the light. Let's check the traffic flow to validate it.")
On the news: "so-and-so's stock doubled today." So it went from 6 to 12. Although they'd likely use "rose by 2" when it went from 98 to 100.
What happens when it goes from 2 to 4? (Doubled or Grew by 100%)? Grew by 2? Grew exponentially? And this is the local news, cranking out words without realizing what they are saying.
(although if you were to make a transcript of what they say and read it, it makes no sense whatsoever)
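The stock examples above come down to absolute change versus percent change. A minimal sketch, using only the numbers from the post (the function name is made up for illustration):

```python
def describe_change(before, after):
    """Return (absolute change, percent change) between two values."""
    absolute = after - before
    percent = 100.0 * absolute / before
    return absolute, percent


if __name__ == "__main__":
    for before, after in [(6, 12), (98, 100), (2, 4)]:
        absolute, percent = describe_change(before, after)
        print(f"{before} -> {after}: {absolute:+d} points, {percent:+.1f}%")
```

Going from 2 to 4 and from 98 to 100 are both "up 2", but the first is a doubling and the second is about 2%. Neither one is "exponential" growth, since two data points can't show that.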
This article works so well considering that it follows immediately after Performance of OpenOffice.org and MS Office. I wonder what this author (excellent article, btw) would make of the data from that other "IT Analysis" paper?
exceptio probat regulam in casibus non exceptis
Don't go imagining that when experimental data get into the hands of marketing people and business executives who have essentially no knowledge of the subject matter (let alone any knowledge of statistics), that the results you see aren't sometimes, quite simply, just a pack of lies. Remember that old adage about "If it looks too good to be true..."?
You said it before I could (I know this post is a little later, but I have other things to do)...
I'm glad I'm not the only one who thinks it odd that slashdot posts an article that pretty much bashes the previous entry.
Can we call this a dupe^(-1)?
Am I open minded towards open source, or closed minded towards closed source?
This is Zed A. Shaw here posting as AC since I'm too lazy to sign up.
Just a comment that I appreciate people's feedback and hope that the essay at least gets people talking about common criteria for analysis papers. Whether I'm right or wrong is no big deal to me. Hopefully folks will look at the list and possibly start doing their own (hopefully better) criteria.
And people might also be interested in my essay for an entertaining rant with the obnoxious title of "Programmers Need To Learn Statistics Or I Will Kill Them All". Have fun!
Thanks Aredridel, you're a peach!
Nevertheless, Zed's enumeration can be extremely valuable in helping a discerning reader (who doesn't already know it all!) to critically interpret graphs in order to decide what s/he may conclude. For example, if system X appears to outperform system Y, but the difference may be within the (unpresented) deviation, one should not accept the assertion that X is superior. Instead, one may conclude that X may be better than or comparable to Y.
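A rough sketch of the "within the deviation" check described above. The sample numbers are invented for illustration, and the rule used here (difference in means no larger than the sum of the two sample standard deviations) is a deliberately crude assumption, not a proper significance test and not something taken from Zed's article.

```python
from statistics import mean, stdev


def ranges_overlap(x_samples, y_samples):
    """Crude check: True if the gap between the two means is no larger
    than the sum of the two sample standard deviations."""
    mean_x, sd_x = mean(x_samples), stdev(x_samples)
    mean_y, sd_y = mean(y_samples), stdev(y_samples)
    return abs(mean_x - mean_y) <= sd_x + sd_y


if __name__ == "__main__":
    # Hypothetical throughput measurements for systems X and Y
    system_x = [102, 98, 105, 95, 100]
    system_y = [97, 101, 94, 99, 96]
    if ranges_overlap(system_x, system_y):
        print("X may be better than or comparable to Y")
    else:
        print("X and Y differ by more than their spread")
```

With these made-up samples X averages 100 and Y averages 97.4, yet the spread of each is larger than that gap, so the cautious reading is "comparable". A paper that only reports the two averages hides exactly this.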
Zed's article can help some of us tell the difference between lies and truth. That's a good thing. The unfortunate weakness of the article is that the example is not particularly compelling. It simply doesn't illustrate the most important pitfalls.
With the scientific rigor proposed in the article, no PHP will be able to understand it. Without special "keywords", the PHPs will have to waste their precious time reading the whole document and getting all the details.
Let me propose my own list of what a successful IT article should have:
1. Name recognition. If it fails to mention a well known company, it's not worth reading. Good example: Microsoft vs. Linux. Bad example: Gentoo vs. Debian. Rule of thumb, if none of the companies/brands mentioned is traded on the stock market, the article is not worth reading.
2. Graphs and pictures. A picture is worth a thousand words. Execs don't have time to read. It's even better if underneath the graphs/pictures there are captions that draw the conclusion for the reader. In fact, things like legend and numbers can be left out as long as there is a caption. If numbers must be included, use $ as the unit of measurement.
3. Keywords. Lots of marketing keywords. Keywords allow overworked execs to zoom in on important bits and pieces. Details aren't important. Example: "Microsoft's next generation KILLER APP will SIX SIGMA all the competitors on the WEB SERVICE market. blah... blah... blah..."
For the rest of my guide to being a top notch IT consulting firm, please Paypal $500,000 to me and accept my Draconian DRM scheme. Remember, free information is communist!
EvilCON - Made Famous by
Lots and lots of good points in this article but a sloppy presentation. It appears that the standard deviation of this piece is probably quite wide. Maybe if the author knew he was going to be presented on Slashdot he would have taken more care in his work.
I mean he's correct about an improperly scaled graph conveying the wrong things. For example suppose I run some graphics test, call it BitchinFastMark 8002, on two graphics cards. One scores 10837, one scores 10921. Ok what that means is that these two cards are basically the same speed. That change is so small it might be experimental error. If I properly graph the results on a scale from, say, 0 to 12000, it'll be readily apparent that the two are almost identical. However suppose I scale the graph from 10830 to 10930. Now one card is going to have a line much, much bigger than the other. At a glance, it would appear that one was much faster, thus the graph is misleading.
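The arithmetic behind that example can be sketched in a few lines. The scores and axis ranges are the ones from the paragraph above; the function name is made up for illustration, and it assumes a plain linear bar chart.

```python
def bar_fraction(value, axis_min, axis_max):
    """Fraction of the axis length a bar for `value` occupies
    on a linear axis running from axis_min to axis_max."""
    return (value - axis_min) / (axis_max - axis_min)


if __name__ == "__main__":
    card_a, card_b = 10837, 10921
    for axis_min, axis_max in [(0, 12000), (10830, 10930)]:
        a = bar_fraction(card_a, axis_min, axis_max)
        b = bar_fraction(card_b, axis_min, axis_max)
        # How many times longer card B's bar looks than card A's
        print(f"axis {axis_min}-{axis_max}: bar B is {b / a:.2f}x bar A")
```

On the 0 to 12000 axis the second bar is about 1.01 times the first, which matches the real difference of under 1%. On the 10830 to 10930 axis the same two scores give a bar 13 times as long.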
These graphs seem to be just fine. I looked at them pretty carefully and they seem to clearly indicate higher Windows performance at all larger numbers of clients. Both axes start at 0 and the scaling is linear. I don't see how they are at all misleading.
As you noted the ink claim is bogus too. So the red system means worse? Well ok, it is, according to these. Higher throughput is better, and the red one is the one with lower throughput. You could colour it pink with yellow spots if you want, doesn't change that it is the lesser performer, according to this.
According to the same charts he lambasts, the Microsoft configuration outperformed the Samba configuration over the entire range of client connections. 900 to 600mbps at peak, and 350 to 200mbps at peak. Aside from altering the data, how can one deliberately modify or misrepresent these results?
That looks convincing at any scale, regardless of "how the x axis is ticked". What x-axis tickmarks would you like, to make any difference at all? And would aligning the triangles and squares with the tickmarks make any difference whatever to the result?
The y axes are at different intervals because the ranges are different! If you have a range of 350 and tickmarks at 100 each, how goofy would that look? According to this chart, the Samba is still getting outperformed, any way you scale it.
I don't see any sleight-of-hand here. This may be an example of a "sloppy" chart, with misaligned triangles, but it certainly isn't an example of a misleading one. If there is any misrepresentation going on, it's in the author's (mis)use of these charts.
And his green and red subliminal message conspiracy theory? Is he serious?
A lot of good points made, especially on being able to reproduce the test results, etc.
But his example just indicates he has an axe to grind. The color bias thing is just bogus. His complaints about the readability of the graph seem to miss the point that graphs show trends, tables show individual points.
I've seen far worse graphs, where they cut out entire sections of the y-axis to show you a remarkable graph where 98 is a whole lot higher than 94 because they're not showing you 1-90.
Which serves a useful lesson. Just because you don't like the results of a study, doesn't mean the study was done badly.
There are two ways I read your post.
Way 1: Yes, I've seen this sort of thing before. At one company where I took over as Engineering head, the programming teams had failed to make decent forward progress. One reason is that I counted 26 people elsewhere in the company who were empowered to change the specs with a phone call (and some of them made a habit of doing so daily).
Way 2: Maybe the problem you're solving isn't really well enough defined for anyone to have written a spec up front. Maybe flipping back and forth between combos and listboxes and so on is all part of figuring that out. There are examples of successful software where prototypes were built 6 or 7 different ways before the "right" way to do things became obvious.
And, finally, your post scares me because it refers to "thousands of SQL queries in my programs." I hope you were just exaggerating to make your point, because otherwise that's a big danger signal. Those queries should be abstracted away in EJBs or the equivalent.
No choice but to take the paycheck, get the resume tuned up, and run for the hills as soon as possible. Unbelievable story.
This sort of thing makes me sick. I don't write a blog because I don't feel worthy of pushing my opinions on the world. But at least my opinions are better than this crap.