Why You Shouldn't Use Spreadsheets For Important Work
An anonymous reader writes "Computer science professor Daniel Lemire explains why spreadsheets shouldn't be used for important work, especially where dedicated software could do a better job. His post comes in response to evaluations of a new economics tome by Thomas Piketty, a book that is likely to be influential for years to come. Lemire writes, 'Unfortunately, like too many people, Piketty used spreadsheets instead of writing sane software. On the plus side, he published his code ... on the negative side, it appears that Piketty's code contains mistakes, fudging and other problems. ... Simply put, spreadsheets are good for quick and dirty work, but they are not designed for serious and reliable work. ... Spreadsheets make code review difficult. The code is hidden away in dozens if not hundreds of little cells If you are not reviewing your code carefully and if you make it difficult for others to review it, how do expect it to be reliable?'"
"I don't know how to use spread sheets properly."
Spreadsheets are like a blank piece of paper with grid squares. Which means you can put anything down, tied together with some formulae, and it's brilliant.
Which is also why it's complete pants - the "anything goes" really does mean that.
(That, and it will tend to break when you most rely on it)
"She's furniture with a pulse"
Dunno if that's a good or bad thing, though.
I've had to take over maintenance of a few "excel" based applications. Never. Again.
If you were me, you'd be good lookin'. - six string samurai
So what's the alternative? There are no good and easy to use software packages to create simple data-intensive apps. The closest alternative was VB6 and if I had to chose between it and Excel, I'd choose Excel any day of the week.
ugh...so anger! always with the nomenclature distinctions...this is a stupid approach to a real problem
a spreadsheet is a computer program
that's it...
to criticize the act of entering data and performing computations on that data using computer software is the height of ignorance
I don't know if he's right or not, but this guy's real criticizm, once you fight through his ignorance of the issue is that in his view Pickety didn't show enough of how he got his figures...or more accurately, the TFA author had to look at the spreasheet cell to see what formula it used (gasp!)
so he probably doesn't know how to use the interface of a spreadsheet very well, which makes the act of checking a formula tedious...
then he writes some dumbass article inventing a problem to vent his frustration and reinforce his self-image...
all the while missing the real problem with economics "research" (not Pickety but others do this...) it's called "P-hacking"
P-hacking is the problem in social science/economics research, not using 'spreadsheets'
gah!
Thank you Dave Raggett
It's not "spreadsheets shouldn't be used for important work", it's "spreadsheets should not be used for work that's not suitable for spreadsheets". Tools for the job, and all that.
Just because you're paranoid doesn't mean there isn't an invisible demon about to eat your face
If the inability to code review spreadsheets was a real issue, it wouldn't be too hard to convert spreadsheet functions into a functional language. For non-programmers, a spreadsheet lowers the barrier to entry. This allows people to do something useful and productive who couldn't do so otherwise. That's a good thing.
Another major issues with spreadsheets is that they don't handle data typing issues very well. For example, if you try to add a list of numbers, and somewhere in the list you have a number encoded as text, instead of throwing an error, it won't be included in the sum. Errors should never pass silently. Unless explicitly silenced.
You're doing it wrong.
Harrison's Postulate - "For every action there is an equal and opposite criticism"
The fact that Piketty's work describes a damning indictement of the USA's most cherished concept - free market capitalism - means that thousands of neo-liberal economists will pour over every single digit and operator in his spreadsheets looking for anything to negate the findings. If they can't find anything, they'll attack him. When you hear of character attacks against Piketty or some other diversionary tactic, you'll know his data is correct.
Tell that to the entire finance and insurance industry.
I figured this out twenty-mumble years ago.
I was doing data analysis in spreadsheets, and realized that I had no way to audit them.
The data and the analysis were all just...there...in the spreadsheet.
As soon as I got a grip on my data, I changed over to C programs that I could test, and document, and validate, and run at any time to demonstrate that input X generated output Y.
My father was a wise man, and a solid programmer. He liked Basic, because it was simple, and readable (in his environment the alternatives were mainly Assembler, Cobol, and RPG). Whenever people made fun of his love for Basic, and how it resulted in bad code, he always replied “there are no bad languages, just bad programmers.”
The problem isn't the spreadsheet. The problem is people building ugly models in it. Do they seriously think that if those models were written in C, Java or Perl they would have been magnitudes better? I doubt it; you're just transplanting bad habits onto a different platform.
Of course, if he'd used trained professionals to build his models in whatever language of choice the models would be better. If he'd used trained professionals to build his spreadsheet models they would have been better as well.
I've actually written a very limited version of this. My boss likes to prototype algorithms in Excel, but I need to cram them into a machine with instructions written in a scripting language. I first use a VBA tool to tokenize and collect the Excel formulas, then over to Python to do some conversions of a few built in fuctions, then run it through a symbolic algebra toolkit (Sympy). Sympy has a nice feature where it can format its output as c-code. At that point, if I were using C I would be all done, but I have another translation step to do. Still, it beats the heck out of manually running through the stupid spreadsheet.
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
I recall a survey of (non-trivial) corporate spreadsheets in the mid-90's, it went something like 95% had a maths bug, in 80% of cases the bug made the sheet useless, 50% of the spreadsheets were used to make (incorrect) financial decisions. The reason why corporations coffers don't evaporate is that they use thousands of them so the +/-ve affect on the money buffer has a central limit of zero. It's a much more precarious situation if you using a single homespun spreadsheet to run a corner store
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
is that why we're fucked?
Harrison's Postulate - "For every action there is an equal and opposite criticism"
There are no corporate secrets below, but I stumbled upon this formula in an actuarial spreadsheet (I'm a developer with an actuarial education).
The only way this logic could be verified is by breaking the single formula into 20+ different cells with more simple calculations.
And of course it is in several thousand cells, bringing any computer at all to its knees during calculation.
A good example of how not to use Excel (but the actuaries don't have access to IT prototyping or core development).
=IF(F6="050",tiers!$D$21+IF(AND(F6="050",OR(E6="W",E6="X")),tiers!$D$29-tiers!$D$26+VLOOKUP("N"&100&VLOOKUP(L6,tiers!$B$38:$C$49,2,0)&"N/A"&M6/12&"/"&N6&"0-"&R6/1000&C6*1,tiers!$L$2:$W$20969,12,0),IF(OR(E6="W",E6="X"),VLOOKUP("N"&F6*1&VLOOKUP(L6,tiers!$B$38:$C$49,2,0)&"N/A"&M6/12&"/"&N6&"0-"&R6/1000&C6*1,tiers!$L$2:$W$20969,12,0),IF(AND(F6="050",E6="P",M6=36,N6=60),tiers!$D$29-tiers!$D$26+VLOOKUP("E"&100&VLOOKUP(L6,tiers!$B$52:$C$55,2,0)&"N/A"&M6&"/"&N6&VLOOKUP(R6,tiers!$B$57:$C$59,2,0)&C6*1,tiers!$L$2:$W$20969,12,0),IF(AND(E6="P",M6=36,N6=60),VLOOKUP("E"&F6*1&VLOOKUP(L6,tiers!$B$52:$C$55,2,0)&"N/A"&M6&"/"&N6&VLOOKUP(R6,tiers!$B$57:$C$59,2,0)&C6*1,tiers!$L$2:$W$20969,12,0),IF(AND(F6="050",E6="P"),tiers!$D$29-tiers!$D$26+VLOOKUP("E"&100&VLOOKUP(L6,tiers!$B$52:$C$55,2,0)&"N/A"&M6/12&"/"&N6&VLOOKUP(R6,tiers!$B$57:$C$59,2,0)&C6*1,tiers!$L$2:$W$20969,12,0),IF(E6="P",VLOOKUP("E"&F6*1&VLOOKUP(L6,tiers!$B$52:$C$55,2,0)&"N/A"&M6/12&"/"&N6&VLOOKUP(R6,tiers!$B$57:$C$59,2,0)&C6*1,tiers!$L$2:$W$20969,12,0),IF(AND(F6="050",E6="*",M6=6),tiers!$D$29-tiers!$D$26+VLOOKUP(B6&100&VLOOKUP(L6,tiers!$B$38:$C$49,2,0)&IF(Q6=48,"0-4","6-10")&M6&"/"&N6&"0-100"&C6*1,tiers!$L$2:$W$20969,12,0),IF(AND(E6="*",M6=6),VLOOKUP(B6&F6*1&VLOOKUP(L6,tiers!$B$38:$C$49,2,0)&IF(OR(L6="PPH08",L6="PTH08"),"0-9",IF(Q6=48,"0-4","6-10"))&M6&"/"&N6&IF(R6=125000,"100-125","0-100")&C6*1,tiers!$L$2:$W$20969,12,0),IF(AND(F6="050",E6="*"),tiers!$D$29-tiers!$D$26+VLOOKUP(B6&100&VLOOKUP(L6,tiers!$B$38:$C$49,2,0)&IF(OR(L6="PPH08",L6="PTH08"),"0-9",IF(Q6=48,"0-4","6-10"))&M6&"/"&N6&IF(AND(OR(L6="PPH08",L6="PTH08"),R6=100000),"0-100",VLOOKUP(R6,tiers!$B$64:$C$70,2,0))&C6*1,tiers!$L$2:$W$20969,12,0),IF(E6="*",VLOOKUP(B6&F6*1&VLOOKUP(L6,tiers!$B$38:$C$49,2,0)&IF(OR(L6="PPH08",L6="PTH08"),"0-9",IF(Q6=48,"0-4","6-10"))&M6&"/"&N6&IF(AND(OR(L6="PPH08",L6="PTH08"),R6=100000),"0-100",VLOOKUP(R6,tiers!$B$64:$C$70,2,0))&C6*1,tiers!$L$2:$W$20969,12,0),"ERROR")))))))))),IF(AND(F6="050",OR(E6="W",E6="X")),tiers!$D$29-tiers!$D$26+VLOOKUP("N"&100&VLOOKUP(L6,tiers!$B$38:$C$49,2,0)&"N/A"&M6/12&"/"&N6&"0-"&R6/1000&C6*1,tiers!$L$2:$W$20969,12,0),IF(OR(E6="W",E6="X"),VLOOKUP("N"&F6*1&VLOOKUP(L6,tiers!$B$38:$C$49,2,0)&"N/A"&M6/12&"/"&N6&"0-"&R6/1000&C6*1,tiers!$L$2:$W$20969,12,0),IF(AND(F6="050",E6="P",M6=36,N6=60),tiers!$D$29-tiers!$D$26+VLOOKUP("E"&100&VLOOKUP(L6,tiers!$B$52:$C$55,2,0)&"N/A"&M6&"/"&N6&VLOOKUP(R6,tiers!$B$57:$C$59,2,0)&C6*1,tiers!$L$2:$W$20969,12,0),IF(AND(E6="P",M6=36,N6=60),VLOOKUP("E"&F6*1&VLOOKUP(L6,tiers!$B$52:$C$55,2,0)&"N/A"&M6&"/"&N6&VLOOKUP(R6,tiers!$B$57:$C$59,2,0)&C6*1,tiers!$L$2:$W$20969,12,0),IF(AND(F6="050",E6="P"),tiers!$D$29-tiers!$D$26+VLOOKUP("E"&100&VLOOKUP(L6,tiers!$B$52:$C$55,2,0)&"N/A"&M6/12&"/"&N6&VLOOKUP(R6,tiers!$B$57:$C$59,2,0)&C6*1,tiers!$L$2:$W$20969,12,0),IF(E6="P",VLOOKUP("E"&F6*1&VLOOKUP(L6,tiers!$B$52:$C$55,2,0)&"N/A"&M6/12&"/"&N6&VLOOKUP(R6,tiers!$B$57:$C$59,2,0)&C6*1,tiers!$L$2:$W$20969,12,0),IF(AND(F6="050",E6="*",M6=6),tiers!$D$29-tiers!$D$26+VLOOKUP(B6&100&VLOOKUP(L6,tiers!$B$38:$C$49,2,0)&IF(Q6=
BlameBillCosby.com
Lemire is right, spreadsheets are terrible for complex models that need to be modified. He is right for precisely the reasons he outlined.
That doesn't mean that spreadsheets are useless. If you have a standard form where you're only modifying values, rather than functions, spreadsheets are great. There is a low barrier to entry and they are good for communicating results. But as soon as you need to audit or modify functions, you are jumping all over the place and it is easy to make mistakes. Yes, there are ways to consolidate your code (at least in spreadsheets that support scripting), but you are going to take so much time learning how to use the advanced features of you spreadsheet that may as well learn a dedicated programming language in those cases.
And the reality is that it's pretty easy to learn how to use programming languages these days. Not as easy as using a spreadsheet, to be sure, but even the standard Python distribution can handle most of the vulgarities of loading data into memory and storing it properly (i.e. you don't have to worry about parsing or data structures too much). By adding the appropriate modules you can do some decent visualization of data. In some cases the visualization will be better than spreadsheets, and in others spreadsheets will have the lead. And that's just Python, which I chose as an example because I'm familiar with it. The reality is that there are much more appropriate domain specific languages out there.
thanks for the input but this is still the wrong analogy...
it is not what TFA is saying, and it is incorrect in fact
Picketty is being criticized in TFA because he used a spreadsheet, which has 'cells' which contain 'formulas' which are descriptions of mathematical operations on data
TFA author is saying that, I quote again:
navigating a spreadsheet program to view a formula in a cell is *basic operation of the software*
if you cannot do that, you cannot properly use the software at all...it's like saying cars are unfair because some people don't understand how to use the brake pedal
sure, when research is "difficult to review" that is a bad thing....**EVERYONE AGREES ON THAT**...the fact that Picketty used a spreadsheet and TFA author can't use it's most basic functions is not worth discussing at all, ever
the problem in economics & social science research is P-hacking...gaming the results using your fancy 'non-spreadsheet' research software...that's a legit problem that we need more TFA's about
Thank you Dave Raggett
Yeeaaahhhh you lose.
http://www.huffingtonpost.com/2014/05/27/piketty-not-wrong_n_5397358.html
Krugman was joined by economists Justin Wolfers, James Hamilton, Gabriel Zucman, frequent Piketty critic Scott Winship and others, along with The Economist's Ryan Avent, The Washington Post's Matthew O'Brien, and The New York Times' Neil Irwin, to name a few.
No they aren't, they're constraint based systems. flow of control isn't the focus. You could easily reduce a spreadsheet to a bunch of Prolog or to almost any functional language.
"Malo periculosam, libertatem quam quietam servitutem." -- Jefferson
"I never worked in a company with normal people."
I'm guessing you haven't had the pleasure of working in the typical firm where the company's years-old ENTIRE lifetime of work and data is passed around e-mail as a 80MB Excel attachment.
So when did Slashot turn into a "the beige box is a hard drive because I say it is and fuck you elitist technical folk" site?
if it can execute the operation needed for the research then it is acceptable...if not, then no
You could probably write this computational code in a shell script, too. But it would still be a terrible idea. Why? Because it's the wrong tool for the job. Simple as that. It doesn't matter what you can and cannot do, it matters what you should do, and you shouldn't use spreadsheets for anything complicated. It's simply too easy to make stupid mistakes that are difficult to trace and correct (or even notice).
you can't blame a spreadsheet for a poorly devised experiment...you *can* blame a researcher for using an inappropriate statistical model...you *cannot* criticize the method of analysis as long as it is physically capable of the computation
TFA isn't blaming the spreadsheets, he's blaming the people who use them for using them. It's not acceptable to use a tool that works poorly and is highly susceptible to mistakes, and no one should listen to anyone who does so unless that person is damned good at that tool: yes, it is possible that someone is so fantastically good with spreadsheets they can use them for massive data analysis with no problems. They are, however, the exception, and I would generally be inclined to disbelieve the results from anyone who does large work with spreadsheets (simply because of the possibility for errors and the lack of concern for accuracy that using spreadsheets demonstrates). So, the conclusion is that you shouldn't use spreadsheets for important work. You absolutely can criticize an analysis if it uses a tool that is highly likely to introduce errors, and that's fundamentally the point (and it's underscored by the fact that that is precisely what happened in Piketty’s case).
"None can love freedom heartily, but good men; the rest love not freedom, but license." --John Milton
"Oh, that's just a right-wing smear from EVUL RETHUGLICANS!!!"?
Well the part about figures being constructed "out of thin air" is a smear (whoever it may be who claims it), as becomes clear when one reads the rest of the article you cite. The most balanced assessment of the Giles vs Picketty dispute is perhaps the piece Inequality: A Piketty problem? from The Economist.
Better to be despised for too anxious apprehensions, than ruined by too confident a security. --Edmund Burke
Is that the industry that cratered a couple years ago, taking much of our economy with them?
I don't think they need telling. They need mercy killing.
Assorted stuff I do sometimes: Lemuria.org