The Subtle Tyranny Of Spreadsheets
pipingguy writes "I found this link on a CAD-related mailing list which questioned the current state of spreadsheet usage. Since using spreadsheets is often only one step away from PowerPoint mastery, I thought it worthy of submission." An excerpt:
"The second distortion caused by conventional spreadsheets is more subtle. It's described in a 1980s paper, written by university researcher Jeffrey Kottemann and others concerning what they called 'Performance, Beliefs, and the Illusion of Control.' The paper described an experiment in which subjects were asked to perform a planning task using different tools, some of them with elaborate what-if capability and others without it." Yup, it's a ZD/Yahoo link, but it raises good questions."
That makes absolutely no sense at all.
Excel is perfect for creating lists of things, and being used as a way of storing simple data...
If you want to use that data for other purposes or it is at all complex, then sure, don't use Excel.
What is a set of numbers? What about a list of data with associated figures? Get real...
The problem is that the data isn't 'related' in any sense, so when a user manages to move the data in one column and it ends up out of sync with everything else, they call me up and whine....
Back in 1997, when I was a physics exchange student in Glasgow, they made me solve a *quantum mechanics* problem using Excel! It was ridiculous. I kept the spreadsheet just for its absurdity (it's the only .xls file on my entire hard drive).
The question is whether a tool can ever be a substitute for a good understanding of statistics and probability - or whether it will always be a case of monkeys playing with ever more sophisticated typewriters...?
This reminds me of something a successful businessman told me about accountants: "Accountants know the cost of everything, and the value of nothing".
A problem occurs when people look at a spreadsheet of accounts and think it represents a business. It doesn't. A classic illustration of this is Marks & Spencer's returns policy. If you buy a pair of trousers from Marks & Spencer and then, once you've got them home, decide they don't fit or whatever, you can return them, no questions asked. To an accountant, this is just a cost. There is no identifiable figure in the accounts that you can point to and say, there's the benefit of that cost. And yet many people shop there because of the policy.
I really fail to see the point of these posts about a spreadsheet program (Excel or otherwise) not being a database.
Maybe there is a genuine need for a database program (and I use this term here loosely) that provides an interface as easy to use as a spreadsheet? Not every user is a programmer, and the vagaries of any DBMS are well known. Besides, no end-user wants to meddle with software administration.
Maybe the users use it as a database simply because it provides an easy means of storage and manipulation of trivial data? Not every user (not in every case, at least) has a million records to work on.
Yes, spreadsheet tools may not have capabilities such as probability distributions or statistical measures. How many naive users need them? Oh, the average executive might need them to project forecasts, but then, is there a tool that allows this? Conversely, if this limitation has been identified (and I'm sure this must have been identified in the past and by others, as well), why do we not see this being incorporated in any mainstream spreadsheet? (Hint: there probably is not enough critical mass of users demanding such a feature.)
The other point listed in the article - "the worst nightmare of those who justify IT's return on investment - spending extra money on a more time-consuming product that yields absolutely no measurable improvement?". Well then, perhaps in that given scenario, the need wasn't evaluated correctly? Or maybe such a complexity wasn't required after all?
It's easy to point out the missing features/capabilities of any software, but if they're not asked for by average/most users, they will take a long while to be incorporated (if at all). Yes, this still leaves the issue of errors introduced by the use of such spreadsheets, whether tacit or implicit. In either case, it would be due to the user being unable to find the right tool to model the problem, or not being able to understand the problem correctly and hence not taking into account as many (if not all) parameters involved.
Spreadsheets have been and will always continue to be an extraordinarily powerful ad-hoc tool for those wishing to tabulate data with automated calculations. They are worse than useless if, for whatever reason, the user has no savvy approach to the problem at hand, or if the model which requires manipulation has no concrete representation.
After many years with little use for a spreadsheet (previously having used Supercalc and Lotus 123) I was shocked by corporate state of the art. Specifically, I was disturbed by the type system employed to represent cell values and by the way in which formatting settings can so easily obscure the values actually being processed. The way in which Excel handles dates seems particularly horrific... and OO-Spreadsheet just mimics the same mistakes. I was also amazed that modern spreadsheets haven't started to use extensible libraries to represent new data types. It seems a no-brainer for a spreadsheet to make use of pluggable C# or Java classes to allow domain specific types to be manipulated in the context of a spreadsheet environment. Am I missing something - or have we not only failed to advance the art (as suggested by the article) but actually taken several steps backwards?
Stop misusing spreadsheets/Excel as databases. They are for calculating numbers, not creating lists of things.
1. Blame AppleWorks first. Before Excel it made spreadsheets like databases.
2. If you look at the history of the spreadsheet, you will see that VisiCalc was designed for "What If?" not large scale calculating work. I was taught that spreadsheets are for the display of information - not calculation.
3. Of course I don't even need a database for storing some kinds of information. An ordinary text file is actually good enough. For example my address book is a text file.
4. I think the greatest misuse of spreadsheets is in using them to consolidate financial data. It's seductive. You get to see what you are doing, you get visual feedback, but
a. data is not protected against alteration
b. formulas are not protected against alteration
c. there is no audit trail
d. you are using explicit formulas instead of looping over data files
5. Lastly, you can say to yourself when you use a spreadsheet, "Look Mom, I'm not programming." Pretty soon you are using Macros, then Word Basic then Visual Basic for Applications. Pretty soon you have a maintenance nightmare since you have spent more time getting immediate answers than you have spent in thinking about design.
6. Yet the usual database products are a disease in themselves. I think that relational databases are not the best for transaction processing. I prefer to use programming languages with built in database support.
7. Last, using a computer gives you the illusion that numbers are real. Printed numbers assume god like authority. But of course projections are not facts or reality, except perhaps in government or the business world!
What about people or (even worse) companies who are determined to send data in Excel sheets that then has to be processed automatically? Columns get deleted because the data typist thinks they are no longer needed, columns get added because there is more 'important' info to include, the zipcode gets aligned into its column using spaces after the address (a hard one to spot!), and of course there are the very popular extra comments at the bottom of the data that break the import routines.
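For what it's worth, an import routine can at least refuse to guess when a sheet has drifted. A minimal sketch, assuming the sheet has already been read into a list of rows (each a list of cell values) and assuming a made-up three-column layout; `EXPECTED_HEADER` and `check_rows` are names invented here, not anything from a real import tool:

```python
# Hypothetical layout, for illustration only.
EXPECTED_HEADER = ["name", "address", "zipcode"]


def check_rows(rows):
    """Return (good_rows, problems) for a sheet given as a list of lists."""
    # A deleted or added column shows up as a header mismatch: stop right there.
    if not rows or [str(c).strip().lower() for c in rows[0]] != EXPECTED_HEADER:
        return [], ["header does not match expected columns"]

    good, problems = [], []
    for number, row in enumerate(rows[1:], start=2):  # row 1 is the header
        # Free-text comments under the data usually have the wrong cell count.
        if len(row) != len(EXPECTED_HEADER):
            problems.append(
                f"row {number}: expected {len(EXPECTED_HEADER)} cells, got {len(row)}"
            )
            continue
        cells = [str(c).strip() for c in row]
        # A zipcode "aligned" with spaces ends up inside the address cell,
        # leaving the real zipcode cell empty.
        if not cells[2]:
            problems.append(f"row {number}: empty zipcode cell")
            continue
        good.append(cells)
    return good, problems
```

It only reports problems rather than repairing them; deciding what the typist meant is still a job for a human.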
I put "enhancements" in quotes because I am skeptical that this actually represents a true improvement of either the quality of the information or user efficiency in finding and using information.
These so-called improvements gloss over the continuing problems that plague spreadsheet users:
- Spreadsheet models encourage the use of "spaghetti" logic, where cells point to cells that point to cells, and can grow into random networks of calculation logic;
- They permit lots of easy off-by-one errors;
- They generally are difficult to verify/audit;
- They do not provide good tools for managing data either in terms of consolidation or searching for specific detail;
- Perhaps most importantly, despite their convenience, spreadsheets are not a robust repository for information.
I have seen one multinational enterprise that (believe it or not) built a budgeting system atop sets of dozens of departmental spreadsheets that they would roll up into a master budget; while it's a neat extension of the technology, only a fool would try to use this to run a large enterprise. One bad link in one subsheet, and the whole house of cards could fall down. (And the "top" vendor these days, Microsoft, isn't noted for building products that are of industrial grade robustness.)
The last few points point towards where I would like to see spreadsheets go. They have been, and are, very good at producing ad-hoc, one-off reports. This is a proper use of spreadsheets.
They are often being used instead as repositories for information that really ought to be managed by a database management system of some sort.
What spreadsheets should do is to allow, nay encourage, the use of data extracts from external sources, notably relational databases. The use of named ranges (which are a venerable feature from at least as early as Lotus 123 v2.01) is of assistance; Lotus Improv was a rather complex-to-use test platform for improved "modelling" whose functionality included database extraction.
Using external repositories permits the benefits of:
- A single repository that can be kept correct, rather than a multitude of mutually incompatible data stores;
- Data synchronization (a restatement of the last);
- All the good RDBMS "stuff" like:
  - Field validation,
  - Maintaining field relationships,
  - Transaction logging,
  - Centralized backups,
  and perhaps even more sophisticated things such as
  - Data modelling and
  - Stored Procedures/Triggers
In effect, the real point I would propose is that the task of building a spreadsheet should involve some data modelling, with thought not just about the report at hand, but also about where the data comes from and perhaps should go to.
Spreadsheets suffer from programming flaws that we've ruthlessly stamped out in programming languages.
Some of these flaws are:
- Cryptic names for fields
- No comments
- No obvious flow of control
- No modularisation
- No capability to test spreadsheet sub-components in isolation
- No capability to do a diff to see what's changed between versions
Spreadsheets also add flaws of their own, such as unlocalised references.
If we had to design the worst possible "programming language" we'd be wise to look at spreadsheets for an example of what to include.
At university, I am taking a course in business modelling. We use Simul8 s/ware to generate thousands of Monte Carlo 'runs', then analyse the results as if they were real data.
:-/
...
But it's not real data! It's completely deterministic, even with a pseudo-random generator. The only things we deal with are simple supply-chain networks, which are just Markov chains with a few probability distributions. We're using 2000 pounds worth of s/ware to solve high-school statistics problems.
You'd get the same results, and have real justifications for the numbers, by using an HP Calculator and a pencil. Alarmingly our lecturers have yet to explain what any of the distributions mean, but they keep using words like 'proof' and 'verify'.
I'm back to linearly regressing my calculated data. It's insane, they're all insane, one day the sane people will rule, wibble
From the article:
The first distortion is the use of point values and simple arithmetic instead of probability distributions and statistical measures. So far as I know, there's no off-the-shelf spreadsheet product--certainly none in common use--that provides for input of numbers as uncertain quantities, even though almost all of our decisions rest on forecasts or on speculations.
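To make the article's distinction concrete, here is a rough sketch of the same forecast done both ways: once with single "best guess" numbers, once with each input drawn from a distribution. Every figure and every choice of distribution below is an assumption made up for illustration; none of it comes from the article or from any spreadsheet product.

```python
import random
import statistics

# Assumed inputs, for illustration only.
# Point-value version: units * (price - unit cost), one "best guess" each.
POINT_ESTIMATE = 1000 * (10.0 - 6.0)


def simulate_profit(runs=10_000, seed=1):
    """Draw each uncertain input from a distribution; return simulated profits."""
    rng = random.Random(seed)  # seeded so the runs are repeatable
    profits = []
    for _ in range(runs):
        units = rng.gauss(1000, 150)               # demand: normal, mean 1000, sd 150
        price = rng.uniform(9.0, 11.0)             # price: anywhere between 9 and 11
        unit_cost = rng.triangular(5.0, 8.0, 6.0)  # cost: low 5, high 8, most likely 6
        profits.append(units * (price - unit_cost))
    return profits


def summarize(profits):
    """Return mean, standard deviation, and the 5th and 95th percentiles."""
    cuts = statistics.quantiles(profits, n=20)  # 19 cut points at 5% steps
    return {
        "mean": statistics.mean(profits),
        "stdev": statistics.stdev(profits),
        "p05": cuts[0],
        "p95": cuts[-1],
    }


if __name__ == "__main__":
    print("point estimate:", POINT_ESTIMATE)
    print(summarize(simulate_profit()))
```

The point-value cell gives one number and nothing else. The simulated version gives a spread, and because the assumed cost distribution is skewed, the average outcome need not match the figure built from the "most likely" inputs.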
I am a student of this university: http://www.sgh.waw.pl/
Currently I am taking a course in the use of Excel for prediction purposes. We do a lot of different case studies. We use Monte Carlo simulations, statistical tests, Markov chains and so on. We always discuss risk (variance, value-at-risk and so on). Excel is our basic tool and it is fine. We use different tools for specific purposes: Best-Fit for distribution fitting.
It is not a flaw of the tool, it is a flaw of the user. As someone said, give a monkey a PC instead of a typewriter and you will get digital bullshit. I can only demand that people without proper education are not allowed to deliver multi-million business forecasts.
1 -- the article is a content-free advert for Whitebirch's financial toolkit
2 -- Excel is an incredibly powerful and important piece of software which many if not most large corps can't do without. There is no alternative to it. The fact that it's unpleasant to use is beside the point -- nobody has been able to come up with a better (or even comparable) replacement. In my experience, there is a large segment of the IT community that is pathologically unable to focus on business needs enough to understand this.
I saw my first spreadsheet on an old Osborne computer. My dad knew a guy who bought small banks, and he had the Osborne and VisiCalc.
Before this guy could buy a bank, he had to value them, and his valuations were always based on a few guesses (predictions) -- what interest rates would be, or whatever (I don't know exactly how he did it).
He told me that when he started doing this stuff with a normal calculator, a pencil, and paper, changing a guess took him a couple of days. Then he got a programmable calculator, and managed to cut it down to about 5 hours. With VisiCalc, it took a few seconds.
The point being that both the programmable calculator and the spreadsheet software gave him an edge in his work -- they made him better at buying banks. They paid for themselves.
*If* no one is using the sorts of software described in this article, and *if* the software really does make you better at making decisions, people should be able to use it to buy banks (or whatever) and do a better job than their competitors. It should give you a leg up in the market place.
That's exactly what happened with spreadsheets. That's why they're popular. A lot of dumb people have started to misuse them, apparently (that sounds plausible to me), but there's no denying that they have provided and continue to provide enormous value to users.
If this new stuff is better, then why isn't Warren Buffett using it? If the answer is "because he's too dumb", why doesn't someone else start using it, and outperform Buffett?
And then we have these PowerPoint, Excel, yada yada threads where the Slashdot crowd tends to be firmly in the "don't punish the users, it's the fault of these evil software applications" camp.
What's up with that?
Having worked as a front-office developer in a very large bank, I can give a good example of how spreadsheets can be misused. Excel spreadsheets were used by all traders on the desk I was supporting. They did not want to move to any other tool because only spreadsheets gave them the flexibility they wanted. The spreadsheets were absolutely HUGE: think in the direction of 20 or more tabs, all with hundreds of DDE links to Reuters RICs, with complicated formulas hanging off these links producing tables of data each time a DDE link updated (about once a second on average). We had to install gigabytes of RAM and dual-CPU desktops for them just so they could run their spreadsheets. Sure, Excel would crash every now and then, but not often enough to switch to a new solution.
IT tried to introduce new, more stable trading tools without success: not flexible enough, did not calculate "their" prices correctly, blah blah. Controlling tried to impose new tools on them to get a grip on their price calculation, all very difficult when the only data source is a "spreadsheet".
The most insane thing that we tried was to write a spreadsheet parser that would traverse all cells, build a dependency graph, reparse the formulas inside to translate this to another programming language. Needless to say this failed.
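For anyone wondering what the first half of that looks like, here is a toy sketch of the dependency-graph step only. It assumes the cells are already available as a dict of address to formula text, and it only recognises plain references like A1 or BC12. Ranges contribute only their endpoints, absolute ($A$1) and cross-sheet references are missed, and a function name like LOG10 would be mistaken for a cell, which hints at why the real thing was so painful. The function names are made up for this example.

```python
import re
from graphlib import CycleError, TopologicalSorter

# Plain relative references only: one to three letters followed by digits.
CELL_REF = re.compile(r"\b[A-Z]{1,3}[0-9]+\b")


def dependency_graph(formulas):
    """Map each cell to the set of cells its formula text mentions."""
    return {cell: set(CELL_REF.findall(text)) for cell, text in formulas.items()}


def evaluation_order(formulas):
    """Return the cells ordered so that every cell comes after its inputs.

    Raises graphlib.CycleError if the formulas contain a circular reference.
    """
    return list(TopologicalSorter(dependency_graph(formulas)).static_order())


if __name__ == "__main__":
    sheet = {"A1": "10", "B1": "=A1+1", "C1": "=A1+B1", "D1": "=C1*2"}
    print(evaluation_order(sheet))
    try:
        evaluation_order({"A1": "=B1", "B1": "=A1"})
    except CycleError:
        print("circular reference")
```

Getting an evaluation order is the easy part. Translating what each formula actually does into another language is where this sketch stops.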
I've written a couple of applications that use .xls files as an interface.
Idea being that you can query some relational store, put lists of default values on a back tab, set named ranges to those lists, and then, on a front tab, use data validation to constrain the users to putting Correct Stuff in data rows.
Oh, and there is no macro code in the .xls, so we don't run afoul of security settings.
This is a back-to-the-future batch system. Blank forms go out as email attachments, and come back as email attachments. They are saved to a folder inexplicably named "inbox". When the time is right, we crack them open in turn and read them into our RDBMS, and then do reporting.
If the .xls form is simple enough, in MS Access, you can have an .xls link table stub, and 'mount' each response in turn, and execute straight SQL to read it in. Very fast and secure.
More complicated stuff might require MS Access to instantiate Excel and open each .xls explicitly to map the response to the database.
I've opened some of these .xls forms under Gnumeric with great results.
Also, languages like Perl and Python can script COM objects like Access and Excel.
Furthermore, as this is very stand-alone, you could use SQLite without concurrency issues.
The biggest advantage of all is that you've blown off the whole web server mess. Obviously our problem domain is non-real-time, batch-able applications. But there are a lot of those. HTTP is great at what it does, but for schedule requests and what-I-did-this-week inputs (the two applications I've done in this mode), here is a way to do them that doesn't require much that isn't generally available and desktop-runnable.
The other key is that most business people are fairly cozy with a spreadsheet interface, and die rapidly confronted with an .mdb or similar. So the fear factor is reduced.
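A rough sketch of the receiving end of such a batch system, using SQLite from Python. It assumes each returned form has already been reduced to plain rows (reading the .xls itself is left out), and the table and column names are invented for a what-I-did-this-week example; they are not the schema from the applications described above. The database repeats the checks the form's data validation was supposed to enforce, so a tampered form gets rejected instead of loaded.

```python
import sqlite3


def make_store():
    """Create an in-memory database with a list of valid project codes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves these off by default
    conn.execute("CREATE TABLE project (code TEXT PRIMARY KEY)")
    conn.execute(
        """CREATE TABLE timesheet (
               employee TEXT NOT NULL,
               week     TEXT NOT NULL,
               project  TEXT NOT NULL REFERENCES project(code),
               hours    REAL NOT NULL CHECK (hours >= 0 AND hours <= 168)
           )"""
    )
    # Hypothetical project codes: the same list a form's back tab would carry.
    conn.executemany("INSERT INTO project VALUES (?)", [("ALPHA",), ("BETA",)])
    conn.commit()
    return conn


def load_responses(conn, rows):
    """Insert (employee, week, project, hours) rows; return the rejected ones."""
    rejected = []
    for row in rows:
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute("INSERT INTO timesheet VALUES (?, ?, ?, ?)", row)
        except sqlite3.IntegrityError as exc:
            rejected.append((row, str(exc)))
    return rejected


if __name__ == "__main__":
    store = make_store()
    bad = load_responses(
        store,
        [("ann", "W10", "ALPHA", 40.0), ("bob", "W10", "GAMMA", 8.0)],
    )
    print(bad)
```

Each row goes in under its own transaction, so one bad response doesn't block the rest of the inbox.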
5. Lastly, you can say to yourself when you use a spreadsheet, "Look Mom, I'm not programming." Pretty soon you are using Macros, then Word Basic then Visual Basic for Applications. Pretty soon you have a maintenance nightmare since you have spent more time getting immediate answers than you have spent in thinking about design.
This is a common thing, in my (corporate) experience. Not much thought is put into how the business fundamentally goes about its tasks, but there is a lot of time spent, e.g., masturbating with time sheet data for salaried employees, etc.
Making things worse, Microsoft's tools encourage instant gratification over design: VBA, Office Macros, ASP and Visual Basic lend themselves not to rapid application development, but stupid application development. It's so easy to tweak and reload that the "right" answer often ends up being the "easy" answer. It's development by instant gratification. The resulting "solutions" are often fragile and difficult to maintain. It's like PowerPoint for Programmers (referencing Tufte), in that the cognitive model of the tools distorts the outcome as much or more than it helps produce it. I'm not convinced that these convenience tools result in less time spent in development, either; quite the opposite. I think any amount of time spent in design and planning will be outweighed by all of the re-work that will usually have to be done because of the mindset the tools engender. This is overlooked because planning isn't a source of instant gratification (it seems to drag on forever, as it requires actual thinking) -- whereas development with tools like these is a source of instant gratification, thus masking their own consumption of your time.
---
There are already a lot of posts berating the use of Excel as a database. Yet, I have not seen a single clear argument why this is a Bad Thing. The closest someone has gotten to is saying how users might inadvertently delete columns or add unwanted formatting, etc.
---
Some people are too hung up on what something was designed for, and overlook what it could be used for. Presumably they're also against the Wright brothers' use of bicycle parts for the construction of the first plane.
You shouldn't be forced to use SQL for manipulating data, you should be restrained from using Excel. ;) The reality of the differences between a spreadsheet and a database is that a spreadsheet lacks the data constraints (relationships) necessary to keep a user from entering bad data. A database can control this (data integrity) to a large degree (depending on your datamodel design).
An example I fight with daily is product attributes. I maintain an ecommerce database with about 180,000 products, each of which would have, say, a color. The problem is that if I import data from a spreadsheet it might randomly insert spaces in the data (i.e. "Black " or " Black" instead of "Black"), whereas if I get the data entered through our tools, the user selects from a list of colors, and only if the choice doesn't exist do they add a new one.
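The usual workaround on the import side is to normalize before comparing. A minimal sketch, assuming the list of known colors has been pulled from the database into a Python list; `normalize` and `match_colors` are made-up names, and treating case differences as the same color is an extra assumption beyond the stray-space problem described above:

```python
def normalize(value):
    """Collapse runs of whitespace, trim the ends, and ignore case."""
    return " ".join(value.split()).casefold()


def match_colors(raw_values, known_colors):
    """Map imported values onto known colors; return (matched, unknown)."""
    canonical = {normalize(color): color for color in known_colors}
    matched, unknown = [], []
    for raw in raw_values:
        key = normalize(raw)
        if key in canonical:
            matched.append(canonical[key])  # store the database's spelling
        else:
            unknown.append(raw)  # left for a human to review, not auto-added
    return matched, unknown


if __name__ == "__main__":
    print(match_colors(["Black ", " Black", "Teal"], ["Black", "Navy Blue"]))
```

Unknown values are kept out rather than inserted, which is roughly what the pick-from-a-list tool gives you for free.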
You mention how people are doing a knee-jerk that 'DB's are sacred'. Yes, they are. So are spreadsheets; the problem is that people bastardize their use and end up confused about why they both exist, and how to use them.
Database = Data storage, data consistency, ease of data maintenance
Spreadsheet = Data analysis, data redundancy, lack of data integrity.
That's how I see it, anyhow.
"If voting could really change things, it would be illegal. " - Revolution Books, NY
Your point #5 brushes with the real problem. I work in a large -- very large -- financial organization, and we often see users sneaking business code into various 'documents.' Their favorite is, of course, Excel.
When we ask our users, "why?" the answer is always "it's too much trouble to deal with you technology folks." They're willing to forgo robustness, auditing, data validation, etc. in order to escape the technology bureaucracy: Getting budgets and resources, all those damn planning meetings, dealing with System Administrators, and so on. They generally know the risks and limitations of using Excel but feel the advantage of getting quick results is entirely worth it.