Why You Shouldn't Use Spreadsheets For Important Work
An anonymous reader writes "Computer science professor Daniel Lemire explains why spreadsheets shouldn't be used for important work, especially where dedicated software could do a better job. His post comes in response to evaluations of a new economics tome by Thomas Piketty, a book that is likely to be influential for years to come. Lemire writes, 'Unfortunately, like too many people, Piketty used spreadsheets instead of writing sane software. On the plus side, he published his code ... on the negative side, it appears that Piketty's code contains mistakes, fudging and other problems. ... Simply put, spreadsheets are good for quick and dirty work, but they are not designed for serious and reliable work. ... Spreadsheets make code review difficult. The code is hidden away in dozens if not hundreds of little cells If you are not reviewing your code carefully and if you make it difficult for others to review it, how do expect it to be reliable?'"
"I don't know how to use spread sheets properly."
Spreadsheets are like a blank piece of paper with grid squares. Which means you can put anything down, tied together with some formulae, and it's brilliant.
Which is also why it's complete pants - the "anything goes" really does mean that.
(That, and it will tend to break when you most rely on it)
"She's furniture with a pulse"
So what's the alternative? There are no good and easy to use software packages to create simple data-intensive apps. The closest alternative was VB6 and if I had to chose between it and Excel, I'd choose Excel any day of the week.
It's not "spreadsheets shouldn't be used for important work", it's "spreadsheets should not be used for work that's not suitable for spreadsheets". Tools for the job, and all that.
Just because you're paranoid doesn't mean there isn't an invisible demon about to eat your face
If the inability to code review spreadsheets was a real issue, it wouldn't be too hard to convert spreadsheet functions into a functional language. For non-programmers, a spreadsheet lowers the barrier to entry. This allows people to do something useful and productive who couldn't do so otherwise. That's a good thing.
Another major issues with spreadsheets is that they don't handle data typing issues very well. For example, if you try to add a list of numbers, and somewhere in the list you have a number encoded as text, instead of throwing an error, it won't be included in the sum. Errors should never pass silently. Unless explicitly silenced.
You're doing it wrong.
Harrison's Postulate - "For every action there is an equal and opposite criticism"
Maybe you should read it again?
His real criticizm is that spreadsheet software is horrible for any high end work, or with anything you want to share, and he is correct.
"so he probably doesn't know how to use the interface of a spreadsheet very well, which makes the act of checking a formula tedious..."
it is tedious, even if you are an expert and even if the user uses goof practices.
"P-hacking is the problem in social science/economics research, not using 'spreadsheets'"
I don't think you know what P-Hacking is.
The Kruger Dunning explains most post on
The fact that Piketty's work describes a damning indictement of the USA's most cherished concept - free market capitalism - means that thousands of neo-liberal economists will pour over every single digit and operator in his spreadsheets looking for anything to negate the findings. If they can't find anything, they'll attack him. When you hear of character attacks against Piketty or some other diversionary tactic, you'll know his data is correct.
Tell that to the entire finance and insurance industry.
I figured this out twenty-mumble years ago.
I was doing data analysis in spreadsheets, and realized that I had no way to audit them.
The data and the analysis were all just...there...in the spreadsheet.
As soon as I got a grip on my data, I changed over to C programs that I could test, and document, and validate, and run at any time to demonstrate that input X generated output Y.
I recall a survey of (non-trivial) corporate spreadsheets in the mid-90's, it went something like 95% had a maths bug, in 80% of cases the bug made the sheet useless, 50% of the spreadsheets were used to make (incorrect) financial decisions. The reason why corporations coffers don't evaporate is that they use thousands of them so the +/-ve affect on the money buffer has a central limit of zero. It's a much more precarious situation if you using a single homespun spreadsheet to run a corner store
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
is that why we're fucked?
Harrison's Postulate - "For every action there is an equal and opposite criticism"
There are no corporate secrets below, but I stumbled upon this formula in an actuarial spreadsheet (I'm a developer with an actuarial education).
The only way this logic could be verified is by breaking the single formula into 20+ different cells with more simple calculations.
And of course it is in several thousand cells, bringing any computer at all to its knees during calculation.
A good example of how not to use Excel (but the actuaries don't have access to IT prototyping or core development).
=IF(F6="050",tiers!$D$21+IF(AND(F6="050",OR(E6="W",E6="X")),tiers!$D$29-tiers!$D$26+VLOOKUP("N"&100&VLOOKUP(L6,tiers!$B$38:$C$49,2,0)&"N/A"&M6/12&"/"&N6&"0-"&R6/1000&C6*1,tiers!$L$2:$W$20969,12,0),IF(OR(E6="W",E6="X"),VLOOKUP("N"&F6*1&VLOOKUP(L6,tiers!$B$38:$C$49,2,0)&"N/A"&M6/12&"/"&N6&"0-"&R6/1000&C6*1,tiers!$L$2:$W$20969,12,0),IF(AND(F6="050",E6="P",M6=36,N6=60),tiers!$D$29-tiers!$D$26+VLOOKUP("E"&100&VLOOKUP(L6,tiers!$B$52:$C$55,2,0)&"N/A"&M6&"/"&N6&VLOOKUP(R6,tiers!$B$57:$C$59,2,0)&C6*1,tiers!$L$2:$W$20969,12,0),IF(AND(E6="P",M6=36,N6=60),VLOOKUP("E"&F6*1&VLOOKUP(L6,tiers!$B$52:$C$55,2,0)&"N/A"&M6&"/"&N6&VLOOKUP(R6,tiers!$B$57:$C$59,2,0)&C6*1,tiers!$L$2:$W$20969,12,0),IF(AND(F6="050",E6="P"),tiers!$D$29-tiers!$D$26+VLOOKUP("E"&100&VLOOKUP(L6,tiers!$B$52:$C$55,2,0)&"N/A"&M6/12&"/"&N6&VLOOKUP(R6,tiers!$B$57:$C$59,2,0)&C6*1,tiers!$L$2:$W$20969,12,0),IF(E6="P",VLOOKUP("E"&F6*1&VLOOKUP(L6,tiers!$B$52:$C$55,2,0)&"N/A"&M6/12&"/"&N6&VLOOKUP(R6,tiers!$B$57:$C$59,2,0)&C6*1,tiers!$L$2:$W$20969,12,0),IF(AND(F6="050",E6="*",M6=6),tiers!$D$29-tiers!$D$26+VLOOKUP(B6&100&VLOOKUP(L6,tiers!$B$38:$C$49,2,0)&IF(Q6=48,"0-4","6-10")&M6&"/"&N6&"0-100"&C6*1,tiers!$L$2:$W$20969,12,0),IF(AND(E6="*",M6=6),VLOOKUP(B6&F6*1&VLOOKUP(L6,tiers!$B$38:$C$49,2,0)&IF(OR(L6="PPH08",L6="PTH08"),"0-9",IF(Q6=48,"0-4","6-10"))&M6&"/"&N6&IF(R6=125000,"100-125","0-100")&C6*1,tiers!$L$2:$W$20969,12,0),IF(AND(F6="050",E6="*"),tiers!$D$29-tiers!$D$26+VLOOKUP(B6&100&VLOOKUP(L6,tiers!$B$38:$C$49,2,0)&IF(OR(L6="PPH08",L6="PTH08"),"0-9",IF(Q6=48,"0-4","6-10"))&M6&"/"&N6&IF(AND(OR(L6="PPH08",L6="PTH08"),R6=100000),"0-100",VLOOKUP(R6,tiers!$B$64:$C$70,2,0))&C6*1,tiers!$L$2:$W$20969,12,0),IF(E6="*",VLOOKUP(B6&F6*1&VLOOKUP(L6,tiers!$B$38:$C$49,2,0)&IF(OR(L6="PPH08",L6="PTH08"),"0-9",IF(Q6=48,"0-4","6-10"))&M6&"/"&N6&IF(AND(OR(L6="PPH08",L6="PTH08"),R6=100000),"0-100",VLOOKUP(R6,tiers!$B$64:$C$70,2,0))&C6*1,tiers!$L$2:$W$20969,12,0),"ERROR")))))))))),IF(AND(F6="050",OR(E6="W",E6="X")),tiers!$D$29-tiers!$D$26+VLOOKUP("N"&100&VLOOKUP(L6,tiers!$B$38:$C$49,2,0)&"N/A"&M6/12&"/"&N6&"0-"&R6/1000&C6*1,tiers!$L$2:$W$20969,12,0),IF(OR(E6="W",E6="X"),VLOOKUP("N"&F6*1&VLOOKUP(L6,tiers!$B$38:$C$49,2,0)&"N/A"&M6/12&"/"&N6&"0-"&R6/1000&C6*1,tiers!$L$2:$W$20969,12,0),IF(AND(F6="050",E6="P",M6=36,N6=60),tiers!$D$29-tiers!$D$26+VLOOKUP("E"&100&VLOOKUP(L6,tiers!$B$52:$C$55,2,0)&"N/A"&M6&"/"&N6&VLOOKUP(R6,tiers!$B$57:$C$59,2,0)&C6*1,tiers!$L$2:$W$20969,12,0),IF(AND(E6="P",M6=36,N6=60),VLOOKUP("E"&F6*1&VLOOKUP(L6,tiers!$B$52:$C$55,2,0)&"N/A"&M6&"/"&N6&VLOOKUP(R6,tiers!$B$57:$C$59,2,0)&C6*1,tiers!$L$2:$W$20969,12,0),IF(AND(F6="050",E6="P"),tiers!$D$29-tiers!$D$26+VLOOKUP("E"&100&VLOOKUP(L6,tiers!$B$52:$C$55,2,0)&"N/A"&M6/12&"/"&N6&VLOOKUP(R6,tiers!$B$57:$C$59,2,0)&C6*1,tiers!$L$2:$W$20969,12,0),IF(E6="P",VLOOKUP("E"&F6*1&VLOOKUP(L6,tiers!$B$52:$C$55,2,0)&"N/A"&M6/12&"/"&N6&VLOOKUP(R6,tiers!$B$57:$C$59,2,0)&C6*1,tiers!$L$2:$W$20969,12,0),IF(AND(F6="050",E6="*",M6=6),tiers!$D$29-tiers!$D$26+VLOOKUP(B6&100&VLOOKUP(L6,tiers!$B$38:$C$49,2,0)&IF(Q6=
BlameBillCosby.com
Yeeaaahhhh you lose.
http://www.huffingtonpost.com/2014/05/27/piketty-not-wrong_n_5397358.html
Krugman was joined by economists Justin Wolfers, James Hamilton, Gabriel Zucman, frequent Piketty critic Scott Winship and others, along with The Economist's Ryan Avent, The Washington Post's Matthew O'Brien, and The New York Times' Neil Irwin, to name a few.
"I never worked in a company with normal people."
I'm guessing you haven't had the pleasure of working in the typical firm where the company's years-old ENTIRE lifetime of work and data is passed around e-mail as a 80MB Excel attachment.
if it can execute the operation needed for the research then it is acceptable...if not, then no
You could probably write this computational code in a shell script, too. But it would still be a terrible idea. Why? Because it's the wrong tool for the job. Simple as that. It doesn't matter what you can and cannot do, it matters what you should do, and you shouldn't use spreadsheets for anything complicated. It's simply too easy to make stupid mistakes that are difficult to trace and correct (or even notice).
you can't blame a spreadsheet for a poorly devised experiment...you *can* blame a researcher for using an inappropriate statistical model...you *cannot* criticize the method of analysis as long as it is physically capable of the computation
TFA isn't blaming the spreadsheets, he's blaming the people who use them for using them. It's not acceptable to use a tool that works poorly and is highly susceptible to mistakes, and no one should listen to anyone who does so unless that person is damned good at that tool: yes, it is possible that someone is so fantastically good with spreadsheets they can use them for massive data analysis with no problems. They are, however, the exception, and I would generally be inclined to disbelieve the results from anyone who does large work with spreadsheets (simply because of the possibility for errors and the lack of concern for accuracy that using spreadsheets demonstrates). So, the conclusion is that you shouldn't use spreadsheets for important work. You absolutely can criticize an analysis if it uses a tool that is highly likely to introduce errors, and that's fundamentally the point (and it's underscored by the fact that that is precisely what happened in Piketty’s case).
"None can love freedom heartily, but good men; the rest love not freedom, but license." --John Milton
"Oh, that's just a right-wing smear from EVUL RETHUGLICANS!!!"?
Well the part about figures being constructed "out of thin air" is a smear (whoever it may be who claims it), as becomes clear when one reads the rest of the article you cite. The most balanced assessment of the Giles vs Picketty dispute is perhaps the piece Inequality: A Piketty problem? from The Economist.
Better to be despised for too anxious apprehensions, than ruined by too confident a security. --Edmund Burke