The Subtle Tyranny Of Spreadsheets
pipingguy writes "I found this link on a CAD-related mailing list which questioned the current state of spreadsheet usage. Since using spreadsheets is often only one step away from PowerPoint mastery, I thought it worthy of submission." An excerpt:
"The second distortion caused by conventional spreadsheets is more subtle. It's described in a 1980s paper, written by university researcher Jeffrey Kottemann and others concerning what they called 'Performance, Beliefs, and the Illusion of Control.' The paper described an experiment in which subjects were asked to perform a planning task using different tools, some of them with elaborate what-if capability and others without it." Yup, it's a ZD/Yahoo link, but it raises good questions."
For the love of god, stop misusing spreadsheets/Excel as databases! They are for calculating numbers, not creating lists of things!!!!!!
The question is whether a tool can ever be a substitute for a good understanding of statistics and probability - or whether it will always be a case of monkeys playing with ever more sophisticated typewriters...?
"The expected Year 1 profit is $1 million, but there's a 30 percent chance of losses for the first two years."
Unfortunately or not, this is not what the bosses want to hear. They want to know that profits will be $1 million. Perhaps the spreadsheets have not adapted to uncertainties for a reason.
This reminds me of something a successful businessman told me about accountants: "Accountants know the cost of everything, and the value of nothing".
A problem occurs when people look at a spreadsheet of accounts and think it represents a business. It doesn't. A classic illustration of this is Marks & Spencer's returns policy. If you buy a pair of trousers from Marks & Spencer and then, once you've got them home, decide they don't fit or whatever, you can return them, no questions asked. To an accountant, this is just a cost. There is no identifiable figure in the accounts that you can point to and say, there's the benefit of that cost. And yet many people shop there because of the policy.
Another thing that suffers from this type of mentality is long-term R&D. Japan has had many very long-term R&D projects, which have been criticised by outsiders as being too long-term.
I've just been watching a Japanese robot demo on the TV. Very impressive. I think the fruits of their long-term investment in robotics R&D will be seen in the next decade.
Of course, this is actually an advertisement for a specific software package. But what's funny is that the story undercuts itself: it explains that people are wasting their time doing detailed future predictions with spreadsheets, then goes on to push this particular product as a way of doing detailed future predictions using statistics. They never make the case that making predictions is worthwhile in the first place, while they do provide evidence that it's a waste of time!
I don't know anyone who uses their spreadsheets for any kind of prediction. Everyone I know uses them just like the old-fashioned pen-and-paper... spreadsheet! It's a way of accounting for the here and now. How many businessmen don't understand their business prospects better than a garbage-in, garbage-out number-crunching computer?
A spreadsheet is not a statistics program. However, if your office bundle includes a hammer, everything starts to look like a nail. Excel does math; it's the hammer that makes statistics look like a spreadsheet problem. Enough said? Hammer: nail. Excel: spreadsheetable data. For statistics programs, look here for a list of some real ones. They are not spreadsheets.
http://www.wch.org.au/CEBU/software.htm
I guess it's kind of like trying to write HTML with MS Notepad. It can be done; however, other tools make the job easier.
The truth shall set you free!
The scope of the article is really limited to the use of spreadsheets in financial planning (forecasting), for which the criticisms of the author and the material he cites are pretty valid. Indeed, we all have our pet hates when it comes to how the tool is used (you have no idea how much of the financial world is ruled by one spreadsheet or another driving trading decisions!). However, the tabular representation of data is not inherently broken, and it behooves the computer scientists amongst us to ask why this form has usurped the database for the representation of simple datasets, and all too frequently complex ones.
"The first thing to do when you find yourself in a hole is stop digging."
So far I have seen Excel used for issue management, as a system requirements repository, for time tracking, time estimation, code dependency tracking, and as a system reference data and configuration data repository, ...
..and in 99% of the cases the spreadsheets don't even use the SUM function.
"There is a terrorist behind every bush"
People do this, certainly in my experience, because the only database they have for use is MS Access.
So really I don't blame them for avoiding that utter POS software.
You have to remember that people are stuck working within the confines of whatever software the business deems 'acceptable'. Although it would be great if we were all on Linux/OSX at work we're not.
What about people or (even worse) companies who insist on sending data in Excel sheets that have to be processed automatically? Columns get deleted because the data typist thinks they are no longer needed, columns get added because there is more 'important' info to include, the zipcode gets aligned into its column using spaces after the address (a hard one to spot!), and of course there are the very popular extra comments at the bottom of the data that break the import routines.
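A defensive import routine can at least refuse to guess when someone has rearranged the sheet. Below is a minimal Python sketch, assuming the sheet arrives as a CSV export and that the agreed layout is the hypothetical `name,address,zipcode`. It stops on a changed header and sets aside rows with the wrong shape or an empty field (which is what a zip code padded into the address cell leaves behind) instead of importing them.

```python
import csv
import io

# Hypothetical column layout agreed with the sender; adjust for a real feed.
EXPECTED_COLUMNS = ["name", "address", "zipcode"]


def import_rows(text):
    """Parse a CSV export of the sheet, rejecting layout changes instead of guessing."""
    reader = csv.reader(io.StringIO(text))
    header = [cell.strip().lower() for cell in next(reader)]
    if header != EXPECTED_COLUMNS:
        # Deleted or added columns: stop rather than shift every field.
        raise ValueError(f"unexpected columns: {header}")

    rows, rejected = [], []
    for row in reader:
        line_no = reader.line_num  # source line just read, so problems can be traced
        if not any(cell.strip() for cell in row):
            continue  # blank line, nothing to import
        if len(row) != len(EXPECTED_COLUMNS):
            # Free-text comments typed below the data usually have the wrong shape.
            rejected.append((line_no, row))
            continue
        cleaned = [cell.strip() for cell in row]  # drop alignment padding
        if not all(cleaned):
            # e.g. a zip code "aligned" into the address cell leaves zipcode empty.
            rejected.append((line_no, row))
            continue
        rows.append(dict(zip(EXPECTED_COLUMNS, cleaned)))
    return rows, rejected
```

Rejected rows come back with their line numbers, so someone can fix them at the source instead of the importer silently misaligning them.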
A quick google search reveals evidence of only one paper (but not the paper itself, unfortunately) entitled, "Performance, Beliefs, and the Illusion of Control", see, e.g., here:
Kottemann, J.E., Davis, F.D., & Remus, W.R. (1994). Computer-assisted decision making: Performance, beliefs, and the illusion of control. Organizational Behavior and Human Decision Processes, 57, 26-37.
Note that this paper was published in 1994; it's not a "1980s paper" as cited in the article. Careless errors like this make one wonder what else in the author's train of thought is similarly researched. Perhaps he's just incorporating uncertainty into his references, too, or maybe he considers 1994 to be statistically similar to the 1980s?
I put "enhancements" in quotes because I am skeptical that this actually represents a true improvement of either the quality of the information or user efficiency in finding and using information.
These so-called improvements gloss over the continuing problems that plague spreadsheet users:
- Spreadsheet models encourage the use of "spaghetti" logic, where cells point to cells that point to cells, and can grow into random networks of calculation logic;
- They permit lots of easy off-by-one errors;
- They generally are difficult to verify/audit;
- They do not provide good tools for managing data either in terms of consolidation or searching for specific detail;
- Perhaps most importantly, despite their convenience, spreadsheets are not a robust repository for information.
I have seen one multinational enterprise that (believe it or not) built a budgeting system atop sets of dozens of departmental spreadsheets that they would roll up into a master budget; while it's a neat extension of the technology, only a fool would try to use this to run a large enterprise. One bad link in one subsheet, and the whole house of cards could fall down. (And the "top" vendor these days, Microsoft, isn't noted for building products of industrial-grade robustness.)
The last few points point towards where I would like to see spreadsheets go. They have been, and are, very good at producing ad-hoc, one-off reports. This is a proper use of spreadsheets.
They are often being used instead as repositories for information that really ought to be managed by a database management system of some sort.
What spreadsheets should do is allow, nay encourage, the use of data extracts from external sources, notably relational databases. The use of named ranges (a venerable feature dating back at least to Lotus 1-2-3 v2.01) is of assistance; Lotus Improv was a rather complex-to-use test platform for improved "modelling" whose functionality included database extraction.
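To make the database-extract idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for "a relational database". The `budget` table and its columns are hypothetical. What it illustrates: validation (`CHECK`, `NOT NULL`, `PRIMARY KEY`) and transactions live in the repository, and the spreadsheet only receives a query result to format.

```python
import sqlite3


def build_demo_db():
    """In-memory stand-in for the single shared repository (schema is hypothetical)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        """CREATE TABLE budget (
               department TEXT    NOT NULL,
               quarter    INTEGER NOT NULL CHECK (quarter BETWEEN 1 AND 4),
               amount     REAL    NOT NULL,
               PRIMARY KEY (department, quarter)
           )"""
    )
    with con:  # one transaction: all rows are committed together, or none are
        con.executemany(
            "INSERT INTO budget VALUES (?, ?, ?)",
            [("Sales", 1, 120.0), ("Sales", 2, 80.0), ("R&D", 1, 50.0)],
        )
    return con


def extract_for_report(con):
    """Hand the spreadsheet a query result to format, not the master data."""
    return con.execute(
        "SELECT department, SUM(amount) FROM budget "
        "GROUP BY department ORDER BY department"
    ).fetchall()


if __name__ == "__main__":
    con = build_demo_db()
    for department, total in extract_for_report(con):
        print(department, total)
```

A row with `quarter = 5` is refused with `sqlite3.IntegrityError` at insert time, rather than sitting quietly in a cell until the roll-up goes wrong.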
Using external repositories permits the benefits of:
- A single repository that can be kept correct, rather than a multitude of mutually incompatible data stores;
- Data synchronization (a restatement of the last);
- All the good RDBMS "stuff" like:
  - Data modelling,
  - Field validation,
  - Maintaining field relationships,
  - Transaction logging,
  - Centralized backups,
  and perhaps even more sophisticated things such as:
  - Stored Procedures/Triggers
In effect, the real point I would propose is that the task of building a spreadsheet should involve some data modelling, with thought not just about the report at hand, but also about where the data comes from and perhaps should go to.
Spreadsheets suffer from programming flaws that we've ruthlessly stamped out in programming languages.
Some of these flaws are:
- Cryptic names for fields
- No comments
- No obvious flow of control
- No modularisation
- No capability to test spreadsheet sub-components in isolation
- No capability to do a diff to see what's changed between versions
Spreadsheets also add flaws of their own, such as unlocalised references.
If we had to design the worst possible "programming language" we'd be wise to look at spreadsheets for an example of what to include.
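The missing diff, at least, is easy to approximate once a sheet's contents are exported as a mapping from cell reference to cell content (formula text or value). How you get that export is left open; the dictionary representation is an assumption of this sketch, not something spreadsheets hand you.

```python
import re


def cell_key(ref):
    """Sort key, column by column, so that B2 comes before B10 and Z1 before AA1."""
    letters, digits = re.fullmatch(r"([A-Z]+)(\d+)", ref).groups()
    return (len(letters), letters, int(digits))


def diff_sheets(old, new):
    """Report added (+), removed (-) and changed (~) cells between two versions."""
    changes = []
    for ref in sorted(old.keys() | new.keys(), key=cell_key):
        before, after = old.get(ref), new.get(ref)
        if before == after:
            continue
        if before is None:
            changes.append(f"+ {ref}: {after}")
        elif after is None:
            changes.append(f"- {ref}: {before}")
        else:
            changes.append(f"~ {ref}: {before} -> {after}")
    return changes


if __name__ == "__main__":
    v1 = {"A1": "Revenue", "B1": "100", "B2": "=B1*0.2"}
    v2 = {"A1": "Revenue", "B1": "120", "B2": "=B1*0.25", "B3": "=B1-B2"}
    print("\n".join(diff_sheets(v1, v2)))
```

Comparing formula text rather than computed values is a deliberate choice here: it shows what someone edited, not just what moved as a consequence.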
I was recently in the market for a new car (hoorrray!). I shortlisted three vehicles to consider and asked the salespeople of the respective companies to mail me data on the service plan, warranty, replacement part prices etc. for all three vehicles. I got two replies with Excel documents and one with a printer-friendly PDF.
I am all for open standards in communication, but what shall I do? Send a reply to the salesman: "you f*ing Microserf moron, I don't want your car if you force me to buy a bloody spreadsheet just to read how much you charge for a goddamned air filter"? But is it wise to choose a car just because of the software that a salesman uses?
Finally I picked the one that was described in PDF. It was a coincidence: the decisive factor was actually that this make consistently tops the consumer surveys, while the other two are just about average. But then I started to think: maybe that's not a coincidence after all? Maybe this make tops the surveys precisely because its policy is to make all stages of the customer experience as convenient as possible, and they ask themselves the question that other car salesmen don't ask: "what if my prospective client does not use Microsoft Excel(TM) or Microsoft Word(TM)?".
Maybe it is possible for us to vote with our wallets against proprietary, closed standards?
Similarly, I don't have any problems with using Excel as a basic flat-file database (never relational though, I'm not that insane) where the visual layout of the data is more important than the flexibility of querying. That said, on a basic flat-file database you can actually perform some quite sophisticated filters using Excel's AutoFilter function.
I don't think the problem is with using a spreadsheet as a word processor, database, or any of the other uses it can be shoehorned into. The problem is simply that people correctly see a spreadsheet as a jack of all trades, but forget that this implies it's a master of none, with the possible exception of what it was designed for: crunching tabulated numbers.
UNIX? They're not even circumcised! Savages!
Oh I couldn't agree more about the presence of a product niche that has a spreadsheet-esque interface, but not only enforces relationships, but also provides all the snazzy features (statistical operations, et al).
:-p
The big questions here are then how many users actually need this, and how many naive users correctly understand the concept of "relationships" between data and the enforcing rules?
I have a sneaky feeling these features are missed mostly by developers trying to squeeze that extra bit of functionality from a spreadsheet (be it to impress, or out of sheer laziness).
http://efil.blogspot.com/
From the article:
The first distortion is the use of point values and simple arithmetic instead of probability distributions and statistical measures. So far as I know, there's no off-the-shelf spreadsheet product--certainly none in common use--that provides for input of numbers as uncertain quantities, even though almost all of our decisions rest on forecasts or on speculations.
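The article's point is that a spreadsheet cell holds one number where the input is really a range. A minimal Monte Carlo sketch shows the alternative: sample each uncertain input, compute profit per trial, and report both the expected value and the chance of a loss, the same shape as the "expected Year 1 profit ... 30 percent chance of losses" sentence quoted earlier. Every distribution and figure in it is a hypothetical placeholder, not something from the article.

```python
import random
import statistics


def simulate_profit(n_trials=100_000, seed=1):
    """Propagate uncertain inputs through a simple profit model by sampling.

    All distributions and figures are hypothetical placeholders.
    """
    rng = random.Random(seed)  # seeded so a run can be reproduced
    fixed_costs = 400_000
    unit_cost = 60
    profits = []
    for _ in range(n_trials):
        units = rng.triangular(8_000, 16_000, 10_000)  # low, high, most likely
        price = rng.uniform(90, 110)
        profits.append(units * (price - unit_cost) - fixed_costs)
    expected = statistics.fmean(profits)
    p_loss = sum(p < 0 for p in profits) / n_trials
    return expected, p_loss


if __name__ == "__main__":
    expected, p_loss = simulate_profit()
    print(f"Expected profit {expected:,.0f}, with a {p_loss:.0%} chance of a loss")
```

The point-value version of this model would multiply the "most likely" units by the midpoint price and print a single number, hiding the loss scenarios entirely.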
I am a student at this university: http://www.sgh.waw.pl/
Currently I am taking a course in the use of Excel for prediction purposes. We do a lot of different case studies. We use Monte Carlo simulations, statistical tests, Markov chains and so on. We always discuss risk (variance, value-at-risk and so on). Excel is our basic tool and it is fine. We use different tools for specific purposes, e.g. Best-Fit for distribution fitting.
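Value-at-risk, one of the measures mentioned, falls straight out of a set of simulated outcomes. The sketch below assumes profit/loss figures with losses as negative numbers and uses Python's `statistics.quantiles` (Python 3.8+) with its default interpolation; other VaR conventions give slightly different numbers, and the normally distributed P&L in the demo is purely hypothetical.

```python
import random
import statistics


def value_at_risk(outcomes, confidence_pct=95):
    """Loss that simulated outcomes exceed only (100 - confidence_pct)% of the time.

    Losses are negative in `outcomes`; the result is returned as a positive loss.
    """
    if not 1 <= confidence_pct <= 99:
        raise ValueError("confidence_pct must be a whole percent between 1 and 99")
    # 99 cut points: cut_points[k - 1] is the k-th percentile.
    cut_points = statistics.quantiles(outcomes, n=100)
    return -cut_points[100 - confidence_pct - 1]


if __name__ == "__main__":
    rng = random.Random(0)
    pnl = [rng.gauss(0, 1_000) for _ in range(10_000)]  # hypothetical daily P&L
    print(f"95% one-day VaR: {value_at_risk(pnl):,.0f}")
```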
It is not a flaw of the tool, it is a flaw of the user. As someone said, give a monkey a PC instead of a typewriter and you will get digital bullshit. I can only demand that people without proper education not be allowed to deliver multi-million business forecasts.
Fight Frist Psoting!
Browse Slashdot with 'Newest First'!
1 -- the article is a content-free advert for Whitebirch's financial toolkit
2 -- Excel is an incredibly powerful and important piece of software which many if not most large corps can't do without. There is no alternative to it. The fact that it's unpleasant to use is beside the point -- nobody has been able to come up with a better (or even comparable) replacement. In my experience, there is a large segment of the IT community that is pathologically unable to focus on business needs enough to understand this.
Whence? Hence. Whither? Thither.
Spreadsheets are also terrible at 3D rendering and at making coffee. They are however great for evaluating simple models with many variables. Don't confuse them with real programming languages.
Well, actually your guy computed the MEAN, not the MEDIAN. Mean is "sum of datapoints divided by number of datapoints". Median is "the center datapoint in an ordered set". To get the median, you sort the data and take the centermost datapoint if there is an odd number of datapoints, and the mean of the two centermost if there is an even number.
And on another note, if you have a summary report with each line having a median on it, you cannot get the grand-total median by taking the mean of the medians! It's even worse if you try to take the median of the medians! To get the grand total, you have to go back to ALL the data points, order them, and take the central one. However, if you do this, there is not a "pointy-haired boss" around who can figure out why the "numbers don't add up"....
This is not an issue of spreadsheets; this is an issue of PHBs not understanding basic math.
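A small worked example of both points, with hypothetical per-region figures standing in for the lines of a summary report: the grand-total median has to come from all the data points, and neither the mean nor the median of the per-line medians reproduces it.

```python
import statistics

# Hypothetical data behind a summary report: one line (and one median) per region.
regions = {
    "North": [1, 2, 3],
    "South": [4, 5, 6, 100, 200],
    "East": [7, 8],
}


def summary(groups):
    """Compare shortcut aggregates of per-line medians with the real figures."""
    medians = [statistics.median(values) for values in groups.values()]
    everything = [v for values in groups.values() for v in values]
    return {
        "mean_of_medians": statistics.fmean(medians),
        "median_of_medians": statistics.median(medians),
        "true_median": statistics.median(everything),  # needs ALL the data points
        "true_mean": statistics.fmean(everything),
    }


if __name__ == "__main__":
    for name, value in summary(regions).items():
        print(f"{name:>18}: {value:.2f}")
```

Here the per-region medians are 2, 6 and 7.5. Averaging them gives about 5.17 and taking their median gives 6, while the true median of all ten values is 5.5; the mean (33.6) is dragged far away from all of them by the two large values.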
I can't think of anything you can't do in Excel.
You could make the first 4 pages with formatting, then simply import the CSV values into a third source page.
I appreciate that Excel gives me the capability to make simple semi automated forms that look nice.
They prevent simple errors, and they're easy to use.
The limit of Engineering as GPA goes to zero is MBA.
Typically, math is the skill that drives that GPA down. OK, the bad joke is starting to look like a flame, and it's true that clueless big dogs with their senseless five-year plans make me angry, but please, this is a joke. Everyone has got their skill set.
The simplest example of a bad problem for a spreadsheet is billiards. Momentum transfer is easy but predicting a billiards game is impossible. Yet businessmen make this kind of mistake all the time. There is no cure for this kind of bad judgement and it's good that the people at ZDnet have pointed it out. I just wish they were not trying to promote statistical packages that people are not likely to understand as a substitute for good judgement.
Friends don't help friends install M$ junk.
The classical Slashdot debate features something-stupid-done-or-said-by-non-IT-savvy-general-managers, and then the appropriate bashing by IT-savvy Slashdotters. If there were a similar forum where my profession were in the majority, they would probably be bashing this very thread right now (I am an economist and business manager).
Just like, say, Perl or Java, spreadsheets can be used well, and they can be used poorly. Furthermore, people with good "technical" Excel skills can produce lousy spreadsheets with little analytical value, and vice versa. I have seen some fantastic spreadsheets which have totally revolutionized the way people saw a problem. At an insurance company I worked with, they used a huge spreadsheet to simulate the effects on every single customer of a planned, dramatic price increase. The result: they realized that the price increase would have much less impact than they feared. Thus, the product was kept and the employees kept their jobs. The thing with the spreadsheet was that it was developed in fast trial-and-error loops, which meant that their run-once-per-night SAS tools were not an option (this was 7 years ago).
(I have, by the way, also seen people spend 3 months developing a mega-spreadsheet for assessing the value of a company, only to use the wrong assumption for a critical value and thereby introduce an error of about 40% in the valuation [that critical value being the discount rate].)
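The discount rate is a single cell, but it multiplies through everything. A minimal discounted-cash-flow sketch makes that sensitivity visible; the cash flows, the 2% terminal growth rate and the Gordon-growth terminal value are assumptions chosen for illustration, not the valuation method from the anecdote.

```python
def present_value(cash_flows, discount_rate, terminal_growth=0.02):
    """Discounted-cash-flow value with a Gordon-growth terminal value.

    cash_flows[t] is the (hypothetical) cash flow at the end of year t + 1.
    """
    if discount_rate <= terminal_growth:
        raise ValueError("discount rate must exceed terminal growth")
    # Explicit forecast years, each discounted back to today.
    pv = sum(cf / (1 + discount_rate) ** (t + 1) for t, cf in enumerate(cash_flows))
    # Value of all later years as of the end of the forecast, then discounted back.
    terminal = cash_flows[-1] * (1 + terminal_growth) / (discount_rate - terminal_growth)
    return pv + terminal / (1 + discount_rate) ** len(cash_flows)


if __name__ == "__main__":
    flows = [100, 110, 120, 130, 140]  # hypothetical, in millions
    for rate in (0.08, 0.10, 0.12):
        print(f"discount rate {rate:.0%}: value {present_value(flows, rate):,.0f}")
```

Because the terminal value divides by the gap between discount rate and growth rate, a couple of points on the discount rate moves the total far more than any single year's cash flow does.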
I can assure all the concerned citizens of this forum that there is indeed a lot of excellent, first-rate Excel usage out there. Analytical power beyond our wildest dreams is at the fingertips of people without skills in programming at any lower level. This, believe it or not, is a good thing, because anyone who has dedicated himself to becoming great at programming is probably less skilled in disciplines such as financial analysis.
Sure, there is "bad code". Sure, people get a false sense of control. Sure, this new tool puts too many options in the hands of people who do not know how to use them. But how would that be untrue of other IT tools or programming environments? What does it matter that they use Excel as a database, as long as it gets their work done more easily than getting an SQL education and then doing it "right"?
Biases are part of all decision-making (as even economists are realizing). So what if that is the case in Spreadsheet World, too?
Err...I too work doing rates-related stuff for major banks. Blah blah blah??!! That's the entire point of the rates business - that's why those traders are employed, because they can tweak their prices to make a profit from the market.
Controlling tried to impose new tools on them to get a grip on their price calculation- all very difficult when the only data source is a "spreadsheet".
It did what? Really? A cost centre tried to impose inadequate tools (your own admission - not flexible enough) on to people who were actually generating cash for the bank? And they rejected it did they? Good Lord, how terribly surprising.
Sorry, but I'm utterly shocked at the cavalier attitude displayed here. I work doing a very similar job to the one described (writing tools to control rates pricing), and I tell you now that wandering in to our profit-producing users and saying that their rules are a load of 'blah blah blah' would, quite correctly, get me booted out of the City forever.
Cheers,
Ian
Spreadsheets are critical tools for "knowledge workers" because they allow them to explore ideas, analyze information and identify trends. The problem is that most "knowledge workers" are competent at some aspect of doing business, not at developing appropriate software tools. It is a problem when a spreadsheet is used as a multiuser shared-data application. Spreadsheets allow:
* Entrepreneurs to financially model their business plan.
* Calculations to be performed more accurately than say, in the margin of a ledger pad.
* Simple business processes to be tracked and managed using a computer instead of, say, a legal pad.
* Executives to summarize and categorize and drill down to analyze information from a database (pivot tables)
At the end of the day, I've found that spreadsheets are not the cause of business mistakes. When there is a spreadsheet failure, there are usually a few fundamental problems:
* Lack of attention to detail
* No oversight or validation
* Numbers are not reliable to begin with
* No one bothered to actually do a what-if using a reasonable range of scenarios; they only looked at the rose-colored one.
-- $G
I can explain it in one sentence: No one is trying to outlaw MS Office.
"The best argument against democracy is a five minute chat with the average voter."
--Winston Churchill
I normally lurk, but need to say that I don't think spreadsheets are inherently evil. I am an engineer at a large oil company, and we use spreadsheet models for a number of processes. Typically the spreadsheet is used as an interface, since everyone is familiar with it. The "number crunching" is done with VBA. I know that among the readership of /. VBA is a dirty word. For an engineer, though, it is not that bad of a tool. Not particularly fast, but for simple, numeric algorithm implementations it works fine.
Sure, we can and do use purpose-built models. They have their place. However, they tend to be black boxes that can't be easily modified. They also tend to be really, really, really expensive and more of a solution than you need. In other words, for some problems, they tend not to be the most cost-effective means of computation.
If transparency of the calculation method is most important and not millisecond execution speed, then I agree with a previous poster who argued that Excel and VBA tend to be "open source" in the context of "how the calculation was done".
Now fast forward to present. I would agree that people want to use Excel for everything: database, graphics, plotting, forms, as a programming environment (oh the humanity!) etc, etc. Most excel users probably don't even know how to use a formula in Excel. The other extreme is when the calculations are so complex that it would be better to switch to Mathematica or Matlab. But Excel is the only tool they know and they want to use it for everything. I can hear my boss's voice in my head "Let's use Excel for this" with the intonation that would make one believe it's the greatest idea ever. Oh, well, have to get back to work where I am forced to use Excel for most of those tasks mentioned above, yes, I am one of the guilty.
I think it's funny that here at Slashdot, the center of advocacy for open software, 95% of the discussion uses "Excel" to mean "spreadsheet". Talk about subtle bias! Apparently OO isn't good enough, or it isn't popular enough even amongst Slashdotters. Perhaps it's a mistake to give such generic names to the components of OO. If it were something like firecalc or phoenixview, then I think it would be discussed more. Instead, when you talk about an individual component you have to use the suite's name (i.e. OpenOffice Calc). No one says Microsoft Office System Excel 2004. They just call it Excel, a techno-sounding name that doesn't provide any clue as to its use.
Well.. maybe. Or Maybe not. But Definitely not sort of.
That's quite incredible!
The funny thing is that while everyone is going to look at this and say that it is ridiculous, and it is, think what people would say if it had been done with Gnumeric. The Slashdot headline would read something like "Cool Hack Lets You Play Pacman in Gnumeric" and there would be 300 comments saying how cool it is. Another 50 comments would say that the guy has too much time on his hands. People would talk about the awesome power of Gnumeric, but no one would complain that it was an absurd abuse of Gnumeric as they are here about Excel.
Just some perspective.
How about the simple idea of breaking away from the rectangular grid? Or free form cells placed on a diagram or schematic or blueprint?
--- Ban humanity.
An acquaintance criticized spreadsheets and praised pencil and paper forms because mathematical errors can crop up in either one, but with paper there is a double-entry system, running totals, and review by brains and eyeballs.
My argument is that paper is a big step backwards:
line 2 (non-paying customers): 10
line 3 (all customers; add 1+2): 400
I use a decently large spreadsheet to run Technical Video Rental, and I've certainly found bugs in it, but I've noted that the bugs are denser, and harder to find in those areas where the computation appears with more intermediate values hidden.
I think that a more confident spreadsheet programmer tends to hide more variables in complex cell formulas; as I am not a confident spreadsheet programmer, I've in many places spread formulas across multiple cells... and this has helped me find bugs.
This points out running totals as one example of good practice. Nothing could be simpler in a spreadsheet, yet we almost never see it.
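Outside a spreadsheet, running totals are a one-liner too. The Python sketch below, with hypothetical line items, keeps each intermediate value visible and adds the footing check a paper form gets from a second pair of eyes: the stated total must equal the sum of its parts, or the check fails loudly.

```python
from itertools import accumulate


def running_totals(amounts):
    """Running-total column to sit next to each line item."""
    return list(accumulate(amounts))


def check_total(parts, stated_total):
    """Cross-check a stated total against its parts, the way a paper form is footed."""
    computed = sum(parts)
    if computed != stated_total:
        raise ValueError(f"stated total {stated_total} != sum of parts {computed}")
    return computed


if __name__ == "__main__":
    items = [250, 75, -30, 120]  # hypothetical line items
    for amount, total in zip(items, running_totals(items)):
        print(f"{amount:>6} {total:>8}")
    check_total(items, 415)
```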
So: why do spreadsheet programmers not do these things?
One reason that occurs to me is that spreadsheets conflate calculation with presentation. Intermediate values use up screen real estate, and look ugly.
Yes, there are tools that *allow* one to separate calculation from presentation: one could have two separate tabs, for example.
Yet these tools allow for disambiguation of calculation and presentation in the same way that assembly programming allows for object oriented design.
Or, to rephrase it: "Hidden steps considered harmful".
I don't even like C/C++ code that puts too much computation on a single line: I want intermediate values that I can step through with a debugger.
Perhaps what's needed are much higher-level tools within the spreadsheet that let one select cells of interest on one tab, then create a presentation tab based on these? I've got visions of cool Mac-Aqua-like greying out of 90% of cells, while one drags and drops the still-crisp cells around... Another/alternate idea: it might be nice if, instead of the heavyweight tabs that most spreadsheets support, one could zoom in on a single presentation cell and investigate little "pocket tabs" which might have ~10 x ~10 cells in them. The equivalent in C/C++ would be a complex expression on one line that decomposed itself into multiple lines with intermediate values only when you walk it with a debugger.
Now, don't get me wrong: I'm not arguing for fancy presentation layers, or dancing pie charts; I'm arguing for the ability to take a huge page of calculations and tie some of the inputs, intermediate steps, and output to a much smaller summary page, or, conversely, for the ability to take spreadsheets as they are currently written and expand them into a debuggable format.
This, I argue, would make spreadsheets more useful, and decrease the number of bugs that crop up in them.
This is all within the realm of what a spreadsheet is supposed to do. Actually the spreadsheet existed as a financial tool long before the computer was even invented.
OK mod me down as flame bait, but at least I am on topic!
I catch flack each and every time I say that, but I still think it's true.
The ss has some serious advantages. In an environment of increasing number density and decreasing personal involvement, the need for a comprehensive tool for data analysis could only have given birth to the spreadsheet. We could talk all day about how handy the ss is for many of the tasks in this environment.
But the space between the substance is what concerns me. Ss have allowed us to max/min too many things without much regard for the things that are undefined and necessarily intangible, but are still entangled in the matter itself. No corporate ss takes into account the costs of pollution, unemployment and general social degradation due to uncontrolled greed.
Like handguns, ss have brought us significant personal power at the cost of a good many social problems. Hence, they seem to require more careful handling and regulation. One aspect to this is training, and in general ethics training is a good place to start. (The BBB in my area is attempting to emphasize this, but they are meeting stiff resistance from the business community.)
Ss should be used with care, and their results are suspect anyway. That's the least message I've tried to convey for years.
[You have a stable society when some nut guns down a schoolyard and the law doesn't change.]
The worst misuse of Excel is as a database. And yet Administration / HR / Marketing staffers always end up using excel to store extremely important data. Sales records, accounts receivable, timesheets, inventory, contact lists - you name it.
:-\
And they always organize the list with subtle font-weight and cell-shading. Woe unto the intern who accidentally selects Edit->Clear->Formats. Woe unto the manager who needs to sort the list by "bold" or "light-green."
Unfortunately, MS encouraged this perversity by including the menu option Data->Forms
What were they thinking?!
In the end, I have to come along with MS Access and clean up the mess. Oh well, it's a living.
The same person who does the Excel spreadsheet design could learn to use MS Access fairly easily. It's not that hard to use for simple stuff.
While the above might be true for people with a technical background, it's incredibly difficult for those who have barely any technical knowledge (other than being able to check their email, create their presentation, write their letter, etc.). It is yet another tool, and it brings with it yet another set of complexities.
You already have Excel. Chances are you have MS Access as well. It can make you that easy-to-use, feature-rich front-end. It can be the backend as well, although preferably you would use it to talk to a SQL server. It can talk to just about any database server you want over ODBC, be it Oracle, MS-SQL, MySQL (yuck), or whatever.
Oh definitely, MS Access would come typically bundled with the entire Office suite, but how many executives really are able to get their head around it? You and I may have heard of ODBC, and the capability to connect to several databases, but an average "joe" user? That's asking too much IMO.
The whole point being: why add more complexity for the end user, which is the thing they are least interested in?
What are you, retarded?
This is exactly the kind of ass backwards crap that caused the dot bomb. Who in his right mind would ever conceive of LAUNCHING F'ING EXCEL to perform a batch process?!?!?!
I mean seriously. Do you have any idea how non-scalable that is? I am a big believer in good enough being good enough, but any batch system that requires launching a visual application is at best a school project or a short term hack, NOT A SOLUTION.
The problem with any of these things is that it allows non programmers (which you obviously are) access to "programming" without any inherent understanding of what the hell they are doing.
Safer, make a table to compare all the distinct bad values and relate them to the correct one...
...
BadSize GoodSize
76x80 King
K King
KING King
KNG King
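The lookup-table approach is the safe part: every correction is explicit, and anything not in the table is flagged rather than guessed. A minimal Python sketch using only the rows shown above; `VALID_SIZES` is a hypothetical list you would fill in with the real canonical sizes, and case-insensitive matching is an assumption of this sketch.

```python
# Bad -> good pairs copied from the table above (the "..." rows are not known here).
KNOWN_FIXES = [("76x80", "King"), ("K", "King"), ("KING", "King"), ("KNG", "King")]
SIZE_FIXES = {bad.upper(): good for bad, good in KNOWN_FIXES}
VALID_SIZES = {"King"}  # hypothetical: list every legitimate size here


def normalize_size(raw):
    """Return the canonical size, or None so a human can decide."""
    value = raw.strip()
    if value in VALID_SIZES:
        return value
    return SIZE_FIXES.get(value.upper())


def normalize_column(values):
    """Fix every value we have a mapping for and collect the rest for review."""
    fixed, unresolved = [], []
    for raw in values:
        good = normalize_size(raw)
        fixed.append(good)
        if good is None:
            unresolved.append(raw)
    return fixed, unresolved
```

The unresolved list becomes the next batch of rows to add to the table, so the mapping grows from real data instead of from guesses.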