Why You Shouldn't Use Spreadsheets For Important Work
An anonymous reader writes "Computer science professor Daniel Lemire explains why spreadsheets shouldn't be used for important work, especially where dedicated software could do a better job. His post comes in response to evaluations of a new economics tome by Thomas Piketty, a book that is likely to be influential for years to come. Lemire writes, 'Unfortunately, like too many people, Piketty used spreadsheets instead of writing sane software. On the plus side, he published his code ... on the negative side, it appears that Piketty's code contains mistakes, fudging and other problems. ... Simply put, spreadsheets are good for quick and dirty work, but they are not designed for serious and reliable work. ... Spreadsheets make code review difficult. The code is hidden away in dozens if not hundreds of little cells If you are not reviewing your code carefully and if you make it difficult for others to review it, how do expect it to be reliable?'"
"I don't know how to use spread sheets properly."
Spreadsheets are like a blank piece of paper with grid squares. Which means you can put anything down, tied together with some formulae, and it's brilliant.
Which is also why it's complete pants - the "anything goes" really does mean that.
(That, and it will tend to break when you most rely on it)
"She's furniture with a pulse"
Dunno if that's a good or bad thing, though.
I've had to take over maintenance of a few "excel" based applications. Never. Again.
If you were me, you'd be good lookin'. - six string samurai
So what's the alternative? There are no good and easy to use software packages to create simple data-intensive apps. The closest alternative was VB6 and if I had to chose between it and Excel, I'd choose Excel any day of the week.
Of course it is, but we can't afford to do it right.
Life in Orange County
Maybe a lack of all these bloated runtimes will help filter out the terrible programmers.
I think Excel stores formulas in a zipped XML document. Someone could write a tool that extracts each cell's formula from a workbook, sorts them topologically, and spits out JavaScript, Python, or whatever your favorite scripting language is.
Or you could make an Excel spreadsheet that lists formulas in other spreadsheets.
ugh...so anger! always with the nomenclature distinctions...this is a stupid approach to a real problem
a spreadsheet is a computer program
that's it...
to criticize the act of entering data and performing computations on that data using computer software is the height of ignorance
I don't know if he's right or not, but this guy's real criticizm, once you fight through his ignorance of the issue is that in his view Pickety didn't show enough of how he got his figures...or more accurately, the TFA author had to look at the spreasheet cell to see what formula it used (gasp!)
so he probably doesn't know how to use the interface of a spreadsheet very well, which makes the act of checking a formula tedious...
then he writes some dumbass article inventing a problem to vent his frustration and reinforce his self-image...
all the while missing the real problem with economics "research" (not Pickety but others do this...) it's called "P-hacking"
P-hacking is the problem in social science/economics research, not using 'spreadsheets'
gah!
Thank you Dave Raggett
It's not "spreadsheets shouldn't be used for important work", it's "spreadsheets should not be used for work that's not suitable for spreadsheets". Tools for the job, and all that.
Just because you're paranoid doesn't mean there isn't an invisible demon about to eat your face
If the inability to code review spreadsheets was a real issue, it wouldn't be too hard to convert spreadsheet functions into a functional language. For non-programmers, a spreadsheet lowers the barrier to entry. This allows people to do something useful and productive who couldn't do so otherwise. That's a good thing.
Another major issues with spreadsheets is that they don't handle data typing issues very well. For example, if you try to add a list of numbers, and somewhere in the list you have a number encoded as text, instead of throwing an error, it won't be included in the sum. Errors should never pass silently. Unless explicitly silenced.
You're doing it wrong.
Harrison's Postulate - "For every action there is an equal and opposite criticism"
There was a farmer who came up with a system for how much of which chemicals should be used to fertilize different areas of his farm. He built his system in Excel, and modified/evolved it over the course of many years. It eventually got so crazy-complex (and slow) (and unable to run in any version of Excel after 2007) that he hired my old company to port it into a shiny new software package.
I called it The Spreadsheet from Hell. And it was. Any plugin I tried to throw at Excel just to map cell dependencies would crash. Trying to comprehend how it worked took months, only to decide that it would be better just to take the core concepts and build something from scratch.
TLDR; Just say no to spreadsheets
The fact that Piketty's work describes a damning indictement of the USA's most cherished concept - free market capitalism - means that thousands of neo-liberal economists will pour over every single digit and operator in his spreadsheets looking for anything to negate the findings. If they can't find anything, they'll attack him. When you hear of character attacks against Piketty or some other diversionary tactic, you'll know his data is correct.
I agree, a well made spreadsheet is far easier to follow than a proprietary program or even most study's results.
If you have a custom formula in a spreadsheet, create it in the program's scripting language instead of copy/pasting to tons of cells. Create the spreadsheet in a repeatable layout that is ease to understand the sections and the flow of the data.
I do not see how that is any different than using a proprietary program. At least with a spreadsheet you can look directly at the code for errors. In a proprietary program, you would need to learn what the behavior of the libraries or the specific nomenclature that is typically not always standardized.
The entire article reads more like a "why don't people use what I use" argument and not a reasonable critique.
The spreadsheet was a great invention and it can be very useful. But it's not useful for everything. There should be some kind of tool that allows you to write formulas programming style (e.g. python), but with some added table visualisation for inputting/outputting data.
Tell that to the entire finance and insurance industry.
I figured this out twenty-mumble years ago.
I was doing data analysis in spreadsheets, and realized that I had no way to audit them.
The data and the analysis were all just...there...in the spreadsheet.
As soon as I got a grip on my data, I changed over to C programs that I could test, and document, and validate, and run at any time to demonstrate that input X generated output Y.
My father was a wise man, and a solid programmer. He liked Basic, because it was simple, and readable (in his environment the alternatives were mainly Assembler, Cobol, and RPG). Whenever people made fun of his love for Basic, and how it resulted in bad code, he always replied “there are no bad languages, just bad programmers.”
The problem isn't the spreadsheet. The problem is people building ugly models in it. Do they seriously think that if those models were written in C, Java or Perl they would have been magnitudes better? I doubt it; you're just transplanting bad habits onto a different platform.
Of course, if he'd used trained professionals to build his models in whatever language of choice the models would be better. If he'd used trained professionals to build his spreadsheet models they would have been better as well.
I've actually written a very limited version of this. My boss likes to prototype algorithms in Excel, but I need to cram them into a machine with instructions written in a scripting language. I first use a VBA tool to tokenize and collect the Excel formulas, then over to Python to do some conversions of a few built in fuctions, then run it through a symbolic algebra toolkit (Sympy). Sympy has a nice feature where it can format its output as c-code. At that point, if I were using C I would be all done, but I have another translation step to do. Still, it beats the heck out of manually running through the stupid spreadsheet.
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
I think the title should be "Why You Shouldn't Use Spreadsheets for *Complicated* Work". Just because a job is important doesn't mean the calculation is complex and something that needs to be coded in, for example, matlab.
If my job is to make a pie chart, I can't see why using Excel is a bad idea. On the other hand, if I am examining the variance of several thousand data points and then plotting the residuals from a gaussian fit, then yes, I can see why using something else would be a lot better. It has nothing to do with importance. Only complexity.
I recall a survey of (non-trivial) corporate spreadsheets in the mid-90's, it went something like 95% had a maths bug, in 80% of cases the bug made the sheet useless, 50% of the spreadsheets were used to make (incorrect) financial decisions. The reason why corporations coffers don't evaporate is that they use thousands of them so the +/-ve affect on the money buffer has a central limit of zero. It's a much more precarious situation if you using a single homespun spreadsheet to run a corner store
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
That's not the problem. The real issue is that spreadsheets are effectively a big ball of goto with multiple entry points.
you're wrong on both counts...that is not his 'real' criticism and even if it was he and you would still be wrong
spreadsheets are ***computation software***
if it can execute the operation needed for the research then it is acceptable...if not, then no
end.
it's a tool to analyze data...that's ****all any of these programs are, ever****
the method of analysis is either proper or not to test the hypothesis....that is a *completely different question* to whether the software is physically capable of doing the computation
you can't blame a spreadsheet for a poorly devised experiment...you *can* blame a researcher for using an inappropriate statistical model...you *cannot* criticize the method of analysis as long as it is physically capable of the computation
your echo TFA's ignorance...and yes, I've seen people like you use rhetoric like yours to justify P-hacking throughout my time in academia
Thank you Dave Raggett
is that why we're fucked?
Harrison's Postulate - "For every action there is an equal and opposite criticism"
Dear "Computer science professor", How about you earn your title, instead of trolling about something you know nothing about. Yours truly, People who work for a living.
There are no corporate secrets below, but I stumbled upon this formula in an actuarial spreadsheet (I'm a developer with an actuarial education).
The only way this logic could be verified is by breaking the single formula into 20+ different cells with more simple calculations.
And of course it is in several thousand cells, bringing any computer at all to its knees during calculation.
A good example of how not to use Excel (but the actuaries don't have access to IT prototyping or core development).
=IF(F6="050",tiers!$D$21+IF(AND(F6="050",OR(E6="W",E6="X")),tiers!$D$29-tiers!$D$26+VLOOKUP("N"&100&VLOOKUP(L6,tiers!$B$38:$C$49,2,0)&"N/A"&M6/12&"/"&N6&"0-"&R6/1000&C6*1,tiers!$L$2:$W$20969,12,0),IF(OR(E6="W",E6="X"),VLOOKUP("N"&F6*1&VLOOKUP(L6,tiers!$B$38:$C$49,2,0)&"N/A"&M6/12&"/"&N6&"0-"&R6/1000&C6*1,tiers!$L$2:$W$20969,12,0),IF(AND(F6="050",E6="P",M6=36,N6=60),tiers!$D$29-tiers!$D$26+VLOOKUP("E"&100&VLOOKUP(L6,tiers!$B$52:$C$55,2,0)&"N/A"&M6&"/"&N6&VLOOKUP(R6,tiers!$B$57:$C$59,2,0)&C6*1,tiers!$L$2:$W$20969,12,0),IF(AND(E6="P",M6=36,N6=60),VLOOKUP("E"&F6*1&VLOOKUP(L6,tiers!$B$52:$C$55,2,0)&"N/A"&M6&"/"&N6&VLOOKUP(R6,tiers!$B$57:$C$59,2,0)&C6*1,tiers!$L$2:$W$20969,12,0),IF(AND(F6="050",E6="P"),tiers!$D$29-tiers!$D$26+VLOOKUP("E"&100&VLOOKUP(L6,tiers!$B$52:$C$55,2,0)&"N/A"&M6/12&"/"&N6&VLOOKUP(R6,tiers!$B$57:$C$59,2,0)&C6*1,tiers!$L$2:$W$20969,12,0),IF(E6="P",VLOOKUP("E"&F6*1&VLOOKUP(L6,tiers!$B$52:$C$55,2,0)&"N/A"&M6/12&"/"&N6&VLOOKUP(R6,tiers!$B$57:$C$59,2,0)&C6*1,tiers!$L$2:$W$20969,12,0),IF(AND(F6="050",E6="*",M6=6),tiers!$D$29-tiers!$D$26+VLOOKUP(B6&100&VLOOKUP(L6,tiers!$B$38:$C$49,2,0)&IF(Q6=48,"0-4","6-10")&M6&"/"&N6&"0-100"&C6*1,tiers!$L$2:$W$20969,12,0),IF(AND(E6="*",M6=6),VLOOKUP(B6&F6*1&VLOOKUP(L6,tiers!$B$38:$C$49,2,0)&IF(OR(L6="PPH08",L6="PTH08"),"0-9",IF(Q6=48,"0-4","6-10"))&M6&"/"&N6&IF(R6=125000,"100-125","0-100")&C6*1,tiers!$L$2:$W$20969,12,0),IF(AND(F6="050",E6="*"),tiers!$D$29-tiers!$D$26+VLOOKUP(B6&100&VLOOKUP(L6,tiers!$B$38:$C$49,2,0)&IF(OR(L6="PPH08",L6="PTH08"),"0-9",IF(Q6=48,"0-4","6-10"))&M6&"/"&N6&IF(AND(OR(L6="PPH08",L6="PTH08"),R6=100000),"0-100",VLOOKUP(R6,tiers!$B$64:$C$70,2,0))&C6*1,tiers!$L$2:$W$20969,12,0),IF(E6="*",VLOOKUP(B6&F6*1&VLOOKUP(L6,tiers!$B$38:$C$49,2,0)&IF(OR(L6="PPH08",L6="PTH08"),"0-9",IF(Q6=48,"0-4","6-10"))&M6&"/"&N6&IF(AND(OR(L6="PPH08",L6="PTH08"),R6=100000),"0-100",VLOOKUP(R6,tiers!$B$64:$C$70,2,0))&C6*1,tiers!$L$2:$W$20969,12,0),"ERROR")))))))))),IF(AND(F6="050",OR(E6="W",E6="X")),tiers!$D$29-tiers!$D$26+VLOOKUP("N"&100&VLOOKUP(L6,tiers!$B$38:$C$49,2,0)&"N/A"&M6/12&"/"&N6&"0-"&R6/1000&C6*1,tiers!$L$2:$W$20969,12,0),IF(OR(E6="W",E6="X"),VLOOKUP("N"&F6*1&VLOOKUP(L6,tiers!$B$38:$C$49,2,0)&"N/A"&M6/12&"/"&N6&"0-"&R6/1000&C6*1,tiers!$L$2:$W$20969,12,0),IF(AND(F6="050",E6="P",M6=36,N6=60),tiers!$D$29-tiers!$D$26+VLOOKUP("E"&100&VLOOKUP(L6,tiers!$B$52:$C$55,2,0)&"N/A"&M6&"/"&N6&VLOOKUP(R6,tiers!$B$57:$C$59,2,0)&C6*1,tiers!$L$2:$W$20969,12,0),IF(AND(E6="P",M6=36,N6=60),VLOOKUP("E"&F6*1&VLOOKUP(L6,tiers!$B$52:$C$55,2,0)&"N/A"&M6&"/"&N6&VLOOKUP(R6,tiers!$B$57:$C$59,2,0)&C6*1,tiers!$L$2:$W$20969,12,0),IF(AND(F6="050",E6="P"),tiers!$D$29-tiers!$D$26+VLOOKUP("E"&100&VLOOKUP(L6,tiers!$B$52:$C$55,2,0)&"N/A"&M6/12&"/"&N6&VLOOKUP(R6,tiers!$B$57:$C$59,2,0)&C6*1,tiers!$L$2:$W$20969,12,0),IF(E6="P",VLOOKUP("E"&F6*1&VLOOKUP(L6,tiers!$B$52:$C$55,2,0)&"N/A"&M6/12&"/"&N6&VLOOKUP(R6,tiers!$B$57:$C$59,2,0)&C6*1,tiers!$L$2:$W$20969,12,0),IF(AND(F6="050",E6="*",M6=6),tiers!$D$29-tiers!$D$26+VLOOKUP(B6&100&VLOOKUP(L6,tiers!$B$38:$C$49,2,0)&IF(Q6=
BlameBillCosby.com
Lemire is right, spreadsheets are terrible for complex models that need to be modified. He is right for precisely the reasons he outlined.
That doesn't mean that spreadsheets are useless. If you have a standard form where you're only modifying values, rather than functions, spreadsheets are great. There is a low barrier to entry and they are good for communicating results. But as soon as you need to audit or modify functions, you are jumping all over the place and it is easy to make mistakes. Yes, there are ways to consolidate your code (at least in spreadsheets that support scripting), but you are going to take so much time learning how to use the advanced features of you spreadsheet that may as well learn a dedicated programming language in those cases.
And the reality is that it's pretty easy to learn how to use programming languages these days. Not as easy as using a spreadsheet, to be sure, but even the standard Python distribution can handle most of the vulgarities of loading data into memory and storing it properly (i.e. you don't have to worry about parsing or data structures too much). By adding the appropriate modules you can do some decent visualization of data. In some cases the visualization will be better than spreadsheets, and in others spreadsheets will have the lead. And that's just Python, which I chose as an example because I'm familiar with it. The reality is that there are much more appropriate domain specific languages out there.
thanks for the input but this is still the wrong analogy...
it is not what TFA is saying, and it is incorrect in fact
Picketty is being criticized in TFA because he used a spreadsheet, which has 'cells' which contain 'formulas' which are descriptions of mathematical operations on data
TFA author is saying that, I quote again:
navigating a spreadsheet program to view a formula in a cell is *basic operation of the software*
if you cannot do that, you cannot properly use the software at all...it's like saying cars are unfair because some people don't understand how to use the brake pedal
sure, when research is "difficult to review" that is a bad thing....**EVERYONE AGREES ON THAT**...the fact that Picketty used a spreadsheet and TFA author can't use it's most basic functions is not worth discussing at all, ever
the problem in economics & social science research is P-hacking...gaming the results using your fancy 'non-spreadsheet' research software...that's a legit problem that we need more TFA's about
Thank you Dave Raggett
Yeeaaahhhh you lose.
http://www.huffingtonpost.com/2014/05/27/piketty-not-wrong_n_5397358.html
Krugman was joined by economists Justin Wolfers, James Hamilton, Gabriel Zucman, frequent Piketty critic Scott Winship and others, along with The Economist's Ryan Avent, The Washington Post's Matthew O'Brien, and The New York Times' Neil Irwin, to name a few.
No they aren't, they're constraint based systems. flow of control isn't the focus. You could easily reduce a spreadsheet to a bunch of Prolog or to almost any functional language.
"Malo periculosam, libertatem quam quietam servitutem." -- Jefferson
"I never worked in a company with normal people."
I'm guessing you haven't had the pleasure of working in the typical firm where the company's years-old ENTIRE lifetime of work and data is passed around e-mail as a 80MB Excel attachment.
I did some computer work for a company that used a single giant spreadsheet for their entire inventory system. Every time they added or removed something from the warehouse they would open this spreadsheet, edit it, and then overwrite the old file.
Somewhere along the line this file didn't save right, and dozens of columns in the spreadsheet were lost, but because it was way out in the double-letter columns, nobody noticed it for months (or even years, we never did pin it down)
I spent 2 weeks going though every single backup they had trying to find an old copy with the missing columns.
Everyone should learn how to write macros ie: script. You can easily make a very functional program in VBA to do most data processing you would need and even store the code in a separate file that can be used by other sheets/workbooks and modified or updated between projects.
I would never suggest putting processing into a cell, unless you were just trying to figure something out fast.
I look at Excel as a sandbox. Some people can make awesome sand castles and others struggle.
The dangers of knowledge trigger emotional distress in human beings.
Lemire says:
I agree with this assessment which is why the algae macroengineering part of the Diogenes Institute's comprehensive plan for energy independence and the environment was done with a customization of Mediawiki that lets each "cell" be a compete document with the value taken by the model, the formula and comments with citations.
The "cell" identifiers are complete names suffixed with the units in which the quantity is expressed.
See The full "spreadsheet" page for an idea of how this plays.
Seastead this.
navigating a spreadsheet program to view a formula in a cell is *basic operation of the software*
I noticed you used the singular of "formula" and "cell". Is it quite as basic to be able to view multiple formulas in multiple cells?
Spreadsheets tend to mess with strings that look somewhat like a date, it will automatically convert it to a date when it sees things like that. You need to be really careful about spreadsheets automatically reformatting your data, make sure you properly indicate whether a field is Text or not.
Spreadsheet bug the end result of which was layoffs when the company was suddenly out of money.
Someone correct me if I am wrong, but I have yet to see an Excel sheet that is ACID compliant. You rely on that for your core business and you are screwed.
"Oh, that's just a right-wing smear from EVUL RETHUGLICANS!!!"?
Well the part about figures being constructed "out of thin air" is a smear (whoever it may be who claims it), as becomes clear when one reads the rest of the article you cite. The most balanced assessment of the Giles vs Picketty dispute is perhaps the piece Inequality: A Piketty problem? from The Economist.
Better to be despised for too anxious apprehensions, than ruined by too confident a security. --Edmund Burke
Thanks for proving how DUMB you are but using a dum word.
It'd be helpful if you could actually state what you believe Piketty's flawed assumptions are.
Yes and no.
No: Spreadsheets are useful as a kind of financial IDE, as long as the result is fed through an analyst and ends up at a programmer.
Yes: The result often does not end up at a programmer.
No, the real reason we're fucked is and always will be human greed.
I'm assuming that you're trying to imply that the reason you can't export formulas in a CSV is that formulas will themselves contain commas. But CSV has a de facto standard way of escaping any field that has a comma. First, a CSV writer doubles up any quotation marks within the field, and then it wraps the field with a pair of quotes. Commas in such a quoted field are thus unambiguously part of the value, not field separators.
Example of a row with four elements, one of them containing a comma:
Example of a row with three elements, one of them containing quotation marks:
If that's not what you meant, could you explain further?
http://panko.shidler.hawaii.ed... there's academic work going back years "indicting" spreadsheets. Not that anyone pays attention... inside academic circles or outside.
That is a load of bull!
.bat and bash files I've seen.
Debugging that would be child's play --- rename some cell ranges to give them names that that could be debugged thoroughly in 6 hours or less.
A similar program in a sloppy programming language would be infinitely obfuscated in function calls.
But frankly, this entire article is stupid. A spreadsheet is great for prototyping and if the shoe fits, then wear it.
The above formulas are child's play compared to nest javascript turdballs or even a lot of C++ I have seen written or common PHP.
Needless to say, several casuals have been tricked or dupped into thinking those formulas would be hard to debug and modded as such. And probably by the non-programmers that visit here that can be fooled by anything, the real veterans would quickly recognize how to de-spaghetti that "code" in minutes.
And how trivial compared to even some
Priest: "Universe from nothing, no laws of physics, sped up time"+ huge discrepancies. Creationism? No. Big Bang Theory
While on the topic, why do spreadsheets use a plain grid in the first place? Why don't they use database-like tables with labled columns? The way spreadsheets mix data and presentation never made sense to me. It's like doing programming with raw memory addresses, instead of variable names and there is little you can do to structure and query the data properly. And meanwhile coming from the other side, the grid really sucks for doing layout of the data, as you are forced to use the same column and row spacing across the whole grid, even so most data actually on the grid in non-uniform.
yes, quite so
it requires higher level statistics knowledge and years of training to **interpret** the formulas
there's no getting around it...TFA is ridiculous
Thank you Dave Raggett
The problem is when morons start using spreadsheets to do tasks that are better served with the correct tool.
The "it's not the tool, it's the people" argument has one major flaw.
Tools are built so that people can perform tasks they can't otherwise do. As a result, if tool fails because it's not good enough for the task, at least part of the blame lies with tool and its creator.
No tools are built to ASSIST someone in performing a job. They aren't built to perform the job. All tools can be misused and mishandled.
A great example is a car as a tool. No one would argue that the car is a bad tool....until a drunk or a madman gets behind the wheel and ploughs into a class of pre-schoolers.
It would be stupid to continue with Spreadsheet and not realizing that SQL was a better choice.
It;s a worse problem that Piketty wanted a certain conclusion, and a segment of the world wanted that same conclusion so badly that it accepted his drivel.
That needs explanation?
The only people I know who do serious business stuff with spreadsheets are the kind of middle management airbags who drag the company down in many other ways as well. You all know them, the types where work goes better in their department when they're on holiday or ill.
I mean, seriously? Anyone with an above room temperature IQ here who thinks spreadsheets are more than hacks for getting a quick, rough estimate of your data?
Assorted stuff I do sometimes: Lemuria.org
Is that the industry that cratered a couple years ago, taking much of our economy with them?
I don't think they need telling. They need mercy killing.
Assorted stuff I do sometimes: Lemuria.org
The point is that spreadsheets can become very complex and they don't have any serious provisions for code review and debugging. Spreadsheets are not meant to be numerically stable and bug-free. They are meant for presenting and manipulating simple data. People abuse spreadsheets for database work and data transformation that are best done in a combination of SQL + something like R. I have been confronted with big spreadsheets in my research activities and the first thing I do is just convert the raw data in CSV, load it in SQL or R and take it from there, writing proper code in the process.l
I find spreadsheets are useful as a data entry UI.
I have to use Sage for my company accounts, and I find that the UI for Sage is so clunky and restrictive that it's easier to collate all the relevant data in LibreOffice, then run some Python scripts I wrote to process the .odt file to convert the information into CSVs full of payment data and balanced journal data that Sage can import
It's way easier to scrutinise all the figures in LibreOffice Calc and make sure it all reconciles and balances then import to Sage than the teeth-pulling agony of trying to get Sage to be at all helpful.
The same goes for any situation where i need to compile a long list of things that is handy to reorder / sum values. It's easier to use a spreadsheet as a data entry UI and run a simple script to validate/dump into a database than build a GUI myself for each task.
(1.21 gigawatts) / (88 miles per hour) = 30 757 874 newtons
"What he's really saying is somebody else don't know how to use spread sheets properly."
Fixed that for you!
Just which software would you suggest, as an alternative to spreadsheets, that works great when you mis-type the data you're analyzing?
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
If you work in a place where you have the time/resources for access to the ideal tools to solve every problem that pops up, I'd like to know where to send my resume.
--
"If you learn Ubuntu, you know Ubuntu; if you learn Slackware, you know Unix"
....Okay....
And how exactly is Unix relevant to a lifeform that is part of a Linux ecosystem? Unix has as much relevance to me today as the DOS I outgrew three decades ago. While I know that much of what I work with in Ubuntu and Linux in general is descended from Unix, I also know that much of the English I communicate with is descended from ancient Elizabethan dialects that only classical thespians use any more. And even they don't use the old words when ordering fries at the local MickeyDees when on break between rehearsals.
Will
Googling turned up this paper which has a table of studies of speadsheet error rates:
http://panko.shidler.hawaii.edu/SSR/Mypapers/whatknow.htm
Excel is terrible at this. It wants to *help* and makes all sorts of assumptions. Even when you change to to what it should be, it will change it to something else and be all like "Hey wasn't I helpful for you!". Which can be defeated by some weird workaround you can find on the internet that you can never remember when it inevitably comes up again.
Most database ID keys are stored as text, even though they are literally a "string" of numbers. As are a lot of other unique keys that are used. Excel loves to put that into fantastic scientific notation for you as they are really big numbers and that is what you want right? Or into Long. Or Double. Format. Re-format. Copy values. Smash face against monitor. Set Format. Import. Cry. Search Internets. Eventually figure it out again. Swear. Curse whoever gave you stupid data in the first place... Why are people sending you data in excel in the first place, in such a horrible way? Face Palm.
If someone is having a problem with excel, or with access after importing data, I would guess about 90% of it is excel's messed up formatting/Typing.
ahh the old "when all you have is a hammer, every problem looks like a nail." issue huh? You need to ask yourself if something can be prototyped in Excel, that's fine. There are tools out there, open source if you have zero budget. If you have to prototype out a solution or create a solution that's robust, that can be documented and reviewed, it's more often easier to use something like "R".
While you can get easy answers from a spreadsheet you may not necessarily get the right answer and that's where their abuse becomes apparent. I've seen railroads actually use Excel to manage daily consist for trains and when they started using it was for an emergency, but then every day suddenly had an emergency/exception and then it became operationally necessary. That was until the guy who based his whole job/career around maintaining the spreadsheet finally retired, then it was WTF? for the rest of the operation because nobody could make sense out of it.
Harrison's Postulate - "For every action there is an equal and opposite criticism"
This falls well within Microsoft's "Good Enough" computing strategy of software design. Sounds like to me someone needs a new line of work if he/she is tired of supporting the Good Enough crowd. The Good Enough era is also why M$ shares are declining...there is not enough good.
How can you possibly expect a CSV to contain a formula? It's a text file.
My Python programs are text files, and they contain plenty of expressions (or "formulas" in spreadsheet parlance). So do the JavaScript programs that add behaviors to Slashdot and other web sites. So I still don't see how a mode to export a worksheet's formulas as delimited text would be so impractical.
A lot of spreadsheets have formulas that are dependent on calculated values from the preceding row or column. In fact, the replication functions most spreadsheets have encourage you to do this.
The problem, as any well-trained computer scientist knows, is that floating-point errors can rapidly accumulate using these kinds of calculations. Very. Rapidly. That means that your answers fifty or eighty rows along might well be gibberish.
I've been to three separate meetings at three separate companies whee different people's spreadsheets gave hilariously differing answers. Faces got red, voices got raised. The reality was that no one had numbers that were even close to right.
And why should the demons of the financial industry deserve mercy?
It's past time to cull.
--
If we'd listened to all the people who said it couldn't be done there'd be no fusion power stations or cities on the moon.
I see this as related to the Financial Times now discredited attempt to attack Piketty. Even the FT sister publication The Economist basically said FT tried to do hatchet job and failed. Check the wikipedia article before the corporations rewrite it:
https://en.wikipedia.org/wiki/...
work in progress
I knew accountants that did many things with spreads that worked.
One ran many company ledgers on Lotus 123 including printing the balance sheets.
In another case one accountant ran a companies pricing on spreads even though there was a mainframe that could have done it. (Job Security). This too was Lotus spreads.
I will agree with the author that considering the bugs and problems with Excel, Access and Word speak volumes to that fact they are now just interim tools not permanent storage and retrieval.I tell people all the time Access and Excel are just work tools.
yeah...obviously you (and whoever upmodded you) have some sort of database branded software you use
what is it?
SPSS?
what are the options? what database software?
tell me then I'll explain why whatever that software does can be done on an excell spreadsheet, with all work being shown for anyone with a bachelor's able to follow easily
it's done all the time...it's math...
so tell me, what is a "non spreadsheet" software program that analyzes data
do it, so I can explain how it doesn't matter for you...this is a bullshit TFA with a pointless argument and it bothers me so many agree with it
Thank you Dave Raggett
Piketty's data is sound. His conclusions support his contextualiztion of the problem in a global context.
I'm not paying a global tax, but our nation pays money to all kinds of countries in the form of aid & econmic benefit...that's tax money...hell, the US practically armed the Egyptian army
that's our tax money...so we're already paying taxes for world peace as it stands
so you admit what no one else who criticizes me will...that your criticizm is based in your political ideology not at all about actual usage of spreadsheets in research work
Thank you Dave Raggett
https://en.m.wikipedia.org/wik...
Casteism
Too many times a researches made a fancy spread sheet, added some stuff over the years and then wants a decent application that does the same. :) :)
Many times the problems they're working on are very interesting, but it's a BIG challenge to rewrite everything to a normal application
But of course we like challenges like that
Privacy is terrorism.
I have not checked Piketty's data myself, but I have seen it criticized by people who agree with his conclusions. I have also seen his comeback to a criticism that went into detail about the mistakes in his calculations. That response was essentially, "No, you are wrong." Piketty did not actually address the specific problems pointed out in his work.
The truth is that all men having power ought to be mistrusted. James Madison
I'm imagining something closer to VHDL than a traditional programming language where you primarily define connections and initial values.
It needs a way to name cells, groups of cells, contents and a way to set cell contents.
'alias' <identifier> [ '(' <parameter list> ')'] '{' <value> '}'
'set' <cell range> '{' <value> '}'
You really just need these two mechanisms, plus some extensions for formatting. A way to have set support relative addressing in the value would be nice.
# Named Cells
set A1 {"AGE}
alias age {B1}
set A2 {"Year of Birth}
alias birth_year {B2}
set A3 {"Current Year}
alias current_year {B3}
# Named Routines
alias SUBTRACT(minuend, subtrahend) {(minuend − subtrahend)}
# Simple Logic
set birth_year {1980}
set current_year {2020}
set age {=SUBTRACT(current_year, birth_year)}
# Relative addressing idea
set C1 {0}
# Previous row
set C2:C100 {=@[-1]+1)}
# Previous column
set D1:Z1 {=@[-A]*2}
# Diagonal
set D2:Z100 {=@[-A,-1]+3}
It is sort of like a language that has manual linking specification with the concurrency of VHDL. This would at least be able to be code reviewed. Redundant sets of the same cell or alias would probably be an error. That would allow for creating a dependency tree. That makes the ordering of the statements unimportant. maybe it requires aliases to be defined before use. It wouldn't be hard to write a script that makes a spread sheet from this, but what would be better would be to add a spreadsheet plugin that lets you view this code side by side in realtime with the spreadsheet. Automatically having changes update across. and letting you use the aliases directly.
Hmm... I need to write some yacc/bison.
Ignoring trolls does work, though some of them are so persistant that I see my mental disorder theory strongly verified.
Assorted stuff I do sometimes: Lemuria.org
The Economist essentially says: "We can't be sure he intentionally fudged the numbers, and he says he didn't, so we'll take his word for it."
I don't think that's anywhere near a sufficient summary of the writer at The Economist says, no.
That is an extremely weak defense.
You have committed a very serious error of thought. Choose your presumptions with greater care!
Before you go accusing anyone of murder, rape or serious academic misconduct you had better have some real evidence to back it up. I trust you are not claiming that Giles' points of attack are sufficient to establish so serious an allegation to any reasonable level of proof?
Be sceptical about Picketty's treatment and use of data, of course, but there can be no serious suggestion that Picketty "intentionally fudged the numbers," and it is scurrilous to imply he may have. Yes, we ought certainly "take his word for it." To do otherwise, absent compelling evidence to the contrary, is beyond contempt. Seriously!
Besides, that piece was hardly a defence, but rather a fairly neutral assessment of the dispute. For a more spirited, and highly informative, defence, today's data analysis piece in the The Guardian is more in order.
Better to be despised for too anxious apprehensions, than ruined by too confident a security. --Edmund Burke
Someone who accuses another of committing "a very serious error of thought" should probably not compare academic misconduct to murder or rape. When was the last time anyone went to jail for academic misconduct?
Does Piketty's book disclose that he altered the numbers he worked from, such that they do not reflect the sources he cited? If not, that would constitute serious academic misconduct under some (relevant) standards.
Of course, he admits to altering his data after someone else noticed it, but some of his changes are apparently plain errors, and the rest are not annotated or explained. They also push the data in a direction that Piketty finds useful (long before the FT article came out, scholars criticized the book for making overblown conclusions that were not well-supported by the data it presents), which is further reason to be skeptical about the nature of the changes. Data manipulation doesn't have to be intentionally biased in order to be consistently biased.
To pick just one flaw in your new link, I cannot reproduce the 6% discrepancy that Reed claims in his The Guardian piece between data sources (b) and (c) -- I calculate it as an average 3.8% difference in the top 10%'s wealth over the eight-year overlap, dropping to 3.3% if you exclude 1974's anomalously large discrepancy, and in either case dropping a bit if you round (b) to whole percentages to match (c). On top of that, the difference goes in the other direction for the top 1% estimates, which makes suspect claims of systematic bias between the two, rather than sampling noise. I think I'll opt out of being "highly inform[ed]" by bad arithmetic, thank you very much.
Many years ago, there was a program on the Pizza Box (aka NeXT machine) called Lotus Improv. It even came out on early windows systems.
It did something wonderful to spreadsheets.
It moved the formulas out of the cells, into a formula plane.
It got rid of the 2d system, where you put different sections of data on different parts of the plane, and try to keep things straight as your database grows, and instead used a collection of n-d spaces. Each one of the collections dealt with one subject; each one could track as many as 8 dimensions, and tracking 3-5 was typical.
It worked. It worked well.
Code was readable -- nothing was duplicated in every cell, or rather, almost duplicated with slight variations in each cell that you had to hope and pray was given the same and correct slight alteration each time (and god forbid you needed to change the template spread across everything). Instead, you defined clear statements once, and it automatically adjusted for each different cell.
Improv was wonderful.
Why did it die?
Someone who accuses another of committing "a very serious error of thought" should probably not compare academic misconduct to murder or rape. When was the last time anyone went to jail for academic misconduct?
Whether an offence carries the prospect of a custodial sentence or is immaterial to the question of the presumption of innocence. You err again. However your previous error, to work instead upon the presumption of guilt, was by far the more serious. This you must not do!
The "comparison" (and there was no actual comparison) with murder and rape was to emphasise, to someone who would so easily accept an accusation that "data was pulled out of thin air," the seriousness of that allegation. There can be no more serious accusation levelled against a scholar qua scholar than academic fraud. One might not end up in prison, but such an allegation being proved, rightly spells an end to an academic career and perhaps even an ancillary career (Consider the well deserved fate of one Andrew Wakefield for instance). It is an extremely serious matter.
Before you go accusing anyone of murder, rape or serious academic misconduct you had better have some real evidence to back it up.
Does Piketty's book disclose that he altered the numbers he worked from ...
Having not read the book I would presume that he it did. As should you. Moreover the fact that he discloses his data and sources exposes any alterations to scrutiny (including that of Giles). I agree that he ought in addition be explicit about the need and methodology of adjustments made to raw data and should be disappointed if he has not.
To pick just one flaw in your new link ...
I claim no ownership of that link, and really shouldn't respond to this distraction (take it up with Howard Reed) ... however ...
I cannot reproduce the 6% discrepancy that Reed claims in his The Guardian piece between data sources (b) and (c) -- I calculate it as an average 3.8% difference in the top 10%'s wealth over the eight-year overlap
On the ten (10) year overlap, (the last eight (8) of which are contiguous), I find an average 5.4% discrepancy. On the last 8 (contiguous) years the discrepancy is 6.2%. I'm not sure what it is you are doing to get your results, are you working from the supplied spreadsheet?
I think I'll opt out of being "highly inform[ed]" by bad arithmetic, thank you very much.
Quite.
Better to be despised for too anxious apprehensions, than ruined by too confident a security. --Edmund Burke
News flash for you from the real world: Society uses different standards of proof in criminal cases than for academic misconduct investigations. Still, I never suggested we should presume that Piketty was guilty of either, and I think you are arguing in bad faith to suggest I did.
We could argue all day about whether it is better to use the original Series C numbers (as I did) or the revised Series C numbers (as Reed did, perhaps because every revision for the top 10% of wealth was a downward one). We could also argue whether the differences between data sets are because of differences in how they estimate "wealth inequality" or whether they are instead estimating different kinds of wealth inequality (Atkinson et al. used estate numbers, which quantifies things at the end of life, which is about the least relevant quantity for Piketty's proposed policies).
At any rate, none of the discrepancies are pertinent to Piketty's claim that wealth inequality has been increasing since 1980. As the same data set is available and used for the 1976-2005 period, inter-data-set differences should not affect the shape of the curve during that period -- only its level. It is also tendentious to assume, as Reed and Piketty apparently want us to, that lower wealth-inequality estimates from during and after the recent global recession mean that the measurements need to be adjusted, rather than that they reflect an actual change in the wealth distribution.
We could argue all day about ...
But we shan't. As I said it's a distraction to the point I made, take it up with Mr Reed.
News flash for you from the real world: Society uses different standards of proof in criminal cases than for academic misconduct investigations.
You are confused. I wrote nothing about standards of proof. It is yet another error to conflate presumptions with a standards of proof. Third strike ... you're out.
Better to be despised for too anxious apprehensions, than ruined by too confident a security. --Edmund Burke
If I am confused, it is because you are spewing nothing but stuff and nonsense, and because you continue to lie about "presumptions" that only you have suggested. You admit that academic misconduct is not a crime like the things you compared it to, and you implicitly concede that it is not judged by the same standards. There are a lot of things that can ruin a career, and some of them are well deserved! At least in the US, we are very tired of people who feign outrage when they cannot otherwise defend their positions, so do not expect to get away with that here.
You're out.
Better to be despised for too anxious apprehensions, than ruined by too confident a security. --Edmund Burke