The Subtle Tyranny Of Spreadsheets
pipingguy writes "I found this link on a CAD-related mailing list which questioned the current state of spreadsheet usage. Since using spreadsheets is often only one step away from PowerPoint mastery, I thought it worthy of submission." An excerpt:
"The second distortion caused by conventional spreadsheets is more subtle. It's described in a 1980s paper, written by university researcher Jeffrey Kottemann and others concerning what they called 'Performance, Beliefs, and the Illusion of Control.' The paper described an experiment in which subjects were asked to perform a planning task using different tools, some of them with elaborate what-if capability and others without it." Yup, it's a ZD/Yahoo link, but it raises good questions."
for the love of god, stop misusing spreadsheets/excel as databases- They are for calculating numbers, not creating lists of things!!!!!!
back in 1997 when I was a physics exchange student in Glasgow, they made me solve a *quantum machanics* problem using excel! it was ridiculous. I kept the spreadsheet just for its absurdity (it's the only .xls file on my entire harddrive)
A manager at a company I worked for was presenting figures for the last year. He showed the financial breakdown for each division, with the profit being calculated as a percentage for each division. At the bottom, there was a summary line showing the total figures for the company and including the "average profit" for the company.
Which he had calculated by summing the profit column and dividing by the number of divisions.
I mentioned that this was producing a somewhat unrealistic figure, with a couple of small divisions showing very good profit margins and the largest department showing a slight loss. "No, that's the mathematical definition of 'median'," he answered.
Ceci n'est pas une signature
What's so strange about that? Both are highly unpredictable, so it should work pretty well.
The only possible explanation I can think of for some of the Excel sheets I have seen in the workplace, is that the poor fool who wrote it didn't have a clue how to use Access. The kicker is when they come to me to write code to automate it. Excel is simply evil. The only thing its good for is making pretty charts.
The question is whether a tool can ever be a substitute for a good understanding of statistics and probability - or whether it will always be a case of monkeys playing with ever more sophisticated typewriters...?
"The expected Year 1 profit is $1 million, but there's a 30 percent chance of losses for the first two years."
Unfortunately or not, this is not what the bosses want to hear. They want to know that profits will be $1 million. Perhaps the spreadsheets have not adapted to uncertainties for a reason.
This reminds me of something a successful businessman told me about accountants: "Accountants know the cost of everything, and the value of nothing".
A problem occurs when people look at a spreadsheet of accounts and think it represents a business. It doesn't. A classic illustration of this is Marks & Spencer's returns policy. If you buy a pair of trousers from Marks & Spencers and then once you've got them home decide they don't fit or whatever, you can return them, no questions asked. To an accountant, this is just a cost. There is no identifiable figure in the accounts that you can point to and say, there's the benefit of that cost. And yet many people shop there because of the policy.
I really fail to see the point in these posts about a spreadsheet program (be it Excel), not being a database.
Maybe there is a genuine need for a database program (and I use this term here loosely) that provides an interace as easy to use as spreadsheet? Not every user is a programmer, and the vagaries of the any DBMS are well known. Besides, no end-user wants to meddle with software administration.
Maybe the users use it as a database, simply because it provides an easy means of storage and manipulation of trivial data? Not ever user (not in every case, at least) has a million records to work on.
Yes, spreadsheet tools may not have capabilities such as porbability distributions or statistical measures. How many naive users need them? Oh, the average executive might need them to project forecasts, but then, is there a tool that allows this? Conversely, if this limitation has been identified (and I'm sure this must have been identified in the past and by others, as well), why do we not see this being incorporated in any mainstream spreadsheet? (hint: there probably is not enough critical mass of users demanding such a feature).
The other point listed in the article - "the worst nightmare of those who justify IT's return on investment - spending extra money on a more time-consuming product that yields absolutely no measurable improvement?". Well then, perhaps in that given scenario, the need wasn't evaluated correctly? Or maybe such a complexity wasn't required after all?
It's easy to point out the missing features/capabilities from any software, but if it's not asked for by average/most users, it will take a long while to be incorporated (if at all). Yes, this however leaves the issue of errors introduced by the use of such spreadsheets, whether tacit or implicit. In either cases, it would be due to the user being unable to find the right tool to model the problem, or not being able to understand the problem correctly and hence not taking into account as many (if not all) parameters involved.
http://efil.blogspot.com/
Another thing that suffers from this type of mentality is long term R&D. Japan has had many very long term R&D projects which has been criticised by outsiders as being too long term.
I've just been watching a Japanese robot demo on the TV. Very impressive. I think the fruits of there long term investment in robotics R&D will be seen in the next decade.
Of course this is actually an advertisement for a specific software package. But whats funny is that the story undercuts itself: It explains that people are wasting their time doing detailed future predictions with spreadsheets. Then it goes on to push this particular product as a way of doing detailed future predictions using statistics. But they never make the case that making predictions is good anyway, while they do provide evidence that its a waste of time!
I dont know anyone who uses their spreadsheets for doing any kind of predictions. Everyone I know uses it just like the old-fashioned pen-and-paper..spreadsheet! Its a way of accounting for the here-and-now. How many businessmen don't understand their business prospects better than a garbage-in-garbage out number crunching computer?
A spread sheet is not a stastics program. However if your office bundle includes a hammer, everyting starts to look like a nail. Excell does math, It's the hammer that makes stastics look like a spreadsheet problem. Enough said? Hammer - nail, Excell - spredsheetable data. For stastics programs look here for a list of some real stastics programs. They are not spreadsheets.
http://www.wch.org.au/CEBU/software.htm
I guess it's kind of like trying to write HTML with MS Notepad. It can be done, however other tools make the job easer.
The truth shall set you free!
Spreadsheets have been and will always continue to be an extraordinarily powerful ad-hoc tool for those wishing to tabulate data with automated calculations. They are worse than useless if, for whatever reason, the user has no savvy approach to the problem at hand, or if the model which requires manipulation has no concrete representation.
After many years with little use for a spreadsheet (previously having used Supercalc and Lotus 123) I was shocked by corporate state of the art. Specifically, I was disturbed by the type system employed to represent cell values and by the way in which formatting settings can so easily obscure the values actually being processed. The way in which Excel handles dates seems particularly horrific... and OO-Spreadsheet just mimics the same mistakes. I was also amazed that modern spreadsheets haven't started to use extensible libraries to represent new data types. It seems a no-brainer for a spreadsheet to make use of pluggable C# or Java classes to allow domain specific types to be manipulated in the context of a spreadsheet environment. Am I missing something - or have we not only failed to advance the art (as suggested by the article) but actually taken several steps backwards?
The scope of the article is really limited to the use of spreadsheets in financial planning (forecasting). For which the criticisms of the author and the material he cites are pretty valid. Indeed we all have our pet hates when it comes to how the tool is used (you have no idea how much of the financial world is ruled by this spreadsheet or the other driving trading decisions!) however, the tabular representation of data is not inherently broken and it behooves the computer scientists amongst us to ask why this form has usurped the database for the representation of simple datasets and all to frequently complex ones.
"The first thing to do when you find yourself in a hole is stop digging."
I particularly enjoyed it, and it made me wonder why I've always hated Excel. maybe it's time to forgive...
(I always used to like Pipdream on the Archimedes though. That was a combined spreadsheet and word processor).
I'm guessing that you've never tried to create a table in Microsoft Word. Yeah, it might seem a bit daft, but at least Excel won't randomly change your text styles, automatically adjust the margins so that half the table is off the paper, insert page breaks through the middle of rows, etc, etc, etc.
If I want a decent table in a Word document I have in the past been driven to embed an Excel file in it.
evil math within Nature's Cubic Creation!
So far I have seen Excell used for issue mangement, system requirement repository, time tracking, time estimation, code dependency tracking, system reference data and configuration data repository, ...
..and in 99% of the cases the spreadsheets don't even use the SUM function.
"There is a terrorist behind every bush"
People do this, certainly in my experience, because the only database they have for use is MS Access.
So really I don't blame them for avoiding that utter POS software.
You have to remember that people are stuck working within the confines of whatever software the business deems 'acceptable'. Although it would be great if we were all on Linux/OSX at work we're not.
To anyone that has Excel '97 - On a new Worksheet, Press F5. Type X97:L97 and hit enter. Press the tab key. Hold Ctrl-Shift. Click on the Chart Wizard toolbar button. Use mouse to fly around - Right button forward/ Left button reverse.
Excel 2000? Under file menu, do 'Save as Web Page'. Say 'Publish Sheet' and 'Add Interactivity'. Save to some htm page on your drive. Load the htm page with IE (don't give me any grief over this one- you're already screwing around with Excel so I don't want to hear it ). You should have Excel in the middle of the page. Scroll to row 2000, column WC. Select row 2000, and tab so that WC is the active column. Hold down Shift+Crtl+Alt nad click the Office logo in the upper-left. If you have DirectX, you will be playing what looks like spy hunter. Use the arrow keys to drive, space to fire, O to drop oil slicks, and when it gets dark, use H for your headlights. -B
What about people or (even worse) companies who are determined to send data in Excel sheets and it has to be processed automatically. Columns being deleted because the data typist thinks it is no longer needed, adding columns because there is more 'important' info to add, align the zipcode into its column using spaces after the address (hard one to spot!) and of course the very popular extra comments at the bottom of the data breaking the import routines.
A quick google search reveals evidence of only one paper (but not the paper itself, unfortunately) entitled, "Performance, Beliefs, and the Illusion of Control", see, e.g., here:
Kottemann, J.E., Davis, F.D., & Remus, W.R. (1994). Computer-assisted decision making: Performance, beliefs, and the illusion of control. Organizational Behavior and Human Decision Processes, 57, 26-37.
Note that this paper was published in 1994; it's not a "1980s paper" as cited in the article. Careless errors like this make one wonder what else in the author's train of thought is similarly researched. Perhaps he's just incorporating incertainty into his references, too--or, maybe he considers 1994 to be statistically similar to the 1980s?
I put "enhancements" in quotes because I am skeptical that this actually represents a true improvement of either the quality of the information or user efficiency in finding and using information.
These so-called improvements gloss over the continuing problems that plague spreadsheet users:
- Spreadsheet models encourage the use of "spaghetti" logic, where cells point to cells that point to cells, and can grow into random networks of calculation logic;
- They permit lots of easy off-by-one errors;
- They generally are difficult to verify/audit;
- They do not provide good tools for managing data either in terms of consolidation or searching for specific detail;
- Perhaps most importantly, despite their convenience, spreadsheets are not a robust repository for information.
I have seen one multinational enterprise that (believe it or not) built a budgeting system atop sets of dozens of departmental spreadsheets that they would roll up into a master budget; while it's a neat extension of the technology, only a fool would try to use this to run a large enterprise. One bad link in one subsheet, and the whole house of cards could fall down. (And the "top" vendor these days, Microsoft, isn't noted for building products that are of industrial grade robustness.)The last few points point towards where I would like to see spreadsheets go. They have been, and are very good at producing ad-hoc, one-off reports. This is a proper use of spreadsheets.
They are often being used instead as repositories for information that really ought to be managed by a database management system of some sort.
What spreadsheets should do is to allow, nay encourage, the use of data extracts from external sources, notably relational databases. The use of named ranges (which are a venerable feature from at least as early as Lotus 123 v2.01) is of assistance; Lotus Improv was a rather complex-to-use test platform for improved "modelling" whose functionality included database extraction.
Using external repositories permits the benefits of:
- A single repository that can be kept correct, rather than a multitude of mutually incompatible data stores;
- Data synchronization (a restatement of the last);
- All the good RDBMS "stuff" like:
- Data modelling and
- Stored Procedures/Triggers
In effect, the real point I would propose is that the task of building a spreadsheet should involve some data modelling, with thought not just about the report at hand, but also about where the data comes from and perhaps should go to.- Field validation,
- Maintaining field relationships,
- Transaction logging,
- Centralized backups,
and perhaps even more sophisticated things such asSpreadsheets suffer from programming flaws that we've ruthlessly stamped out in programming languages.
:
Some of these flaws are
- Cryptic names for fields
- No comments
- No obvious flow of control
- No modularisation
- No capability to test spreadsheet sub-components in isolation
- No capability to do a diff to see what's changed between versions
Spreadsheets also add flaws of their own, such as unlocalised references.
If we had to design the worst possible "programming language" we'd be wise to look at spreadsheets for an example of what to include.
I was recently on the market for a new car (hoorrray!). I shortlisted three vehicles for me to consider and I asked the salespeople of the respective companies to mail me data on service plan, warranty, replacement part prices etc. on all the three vehicles. I got two replies with Excel documents and one with a printer-friendly PDF.
I am all for open standards in communication, but what shall I do? Send a reply to the salesman "you f*ing Microserf moron, I don't want your car if you force me to buy a bloody spreadsheet just to read how much do you charge for a goddamned air filter?" But is it wise to choose a car just because of the software that a salesman uses?
Finally I picked the one that was described in PDF. It was a coincidence - a decisive factor was actually that the make of that car constantly tops in the consumer surveys, while the other two are just about average. But then I started to think - maybe that's not a coincidence after all? Maybe this make tops in surveys just because it's policy is to make all stages of customer experience as convenient as possible and they ask themselves the question that other car salesmen don't ask - "what if my prospective client does not use Microsoft Excel(TM) or Microsoft Word(TM)?".
Maybe it is possible for us to vote with our wallets against proprietary, closed standards?
Similarly, I don't have any problems with using Excel as a basic flatfile database (never relational though, I'm not that insane) where the visual layout of the data is more important than the flexibilty of querying. That said, on a basic flat-file database you can actually perform some quite sophisticated filters using Excel's auto-filter function.
I don't think the problem is with using a spreadsheet as a word processor, database, or any of the other uses it can be shoehorned into. The problem is simply that people correctly see a spreadsheet as a jack of all trades, but forget that this implies it's a master of none, with the possible exception of what it was designed for: crunching tabulated numbers.
UNIX? They're not even circumcised! Savages!
I wanted to model the characteristics of a turbocharger I was planning to install in my car. It seemed to me a spreadsheet was the ideal way to try various scenarios. Of course, modeling a turbo requires entering lots of lists of numbers. I had to fight with it, but despite my years of programming experience, figuring out Excel was easier and faster than writing my own custom app for the job.
Turbocharger Spreadsheet
Now I can just enter engine size, compression ratios, etc., select from a variety of compressor maps, and presto - power curves computed without breaking a sweat.
-- *My* journal is more interesting than *yours*...
Oh I couldn't agree more about the presence of a product niche that has a spreadsheet-esque interface, but not only enforces relationships, but also provides all the snazzy features (statistical operations, et al).
:-p
The big questions here are then how many users actually need this, and how many naive users correctly understand the concept of "relationships" between data and the enforcing rules?
I have a sneaky feeling these features are missed mostly by developers tyring to squeeze that extra bit of functionality from a spreadsheet (be it to impress, or out of sheer laziness)
http://efil.blogspot.com/
Well, for really small lists, it's an easy way to store it. Especially when someone with very little computer knowledge couldn't even begin to create an SQL statement.
How's this?
I created a spreadsheet in KSpread (awesome program, that) to plan my menu for Texas Brand Barbecue. So I used it to estimate all of my costs and my gross sales. I'm a perfect example of who this article is about, but I think I'm above the sort of planning the article is talking about. :) (I intentionally went conservative on sales and liberal on costs. I could be wrong in the end, but if I'm going to err, I prefer to err where my error makes profit rather than loss)
Then I needed a list of equipment to start up and to estimate the cost of all this equipment. So I switch to another sheet in the same workbook and create this list. A quick little formula gives me a total.
Aha, so now I wanted to keep all my data in one place, and the next few pieces of data were tabular in nature, but no formula attached. I needed a list of local area farmer's markets, locations, dates and times, market coordinator, and contact phone number (website and email if available). So what did I do? Well, I made a new sheet in the workbook and put my table there. Now I refer to it whenever I need to call someone on the list, or if a market falls through (that process is over, now) I can easily find another market for that day.
Spreadsheets, as another poster put it, are for the presentation of data. For my purpose, I could've taken a couple of weeks to write a program that would have less than half of the functionality of my spreadsheet, but why bother? The spreadsheet is there to do the job.
This isn't saying that I wouldn't like to have something better, and I intend to home-grow a better solution. First I have to write a driver that will let me download transactions from my cash register to my database, though. The program gets complex after that, but the intent is to replace my spreadsheet with it. (And release it as open source, of course. Doesn't give me a competitive edge worthy of note, and others could benefit by it)
Like what I said? You might like my music
- No comments
Umm.....Excel can add comments to individual cells, and you can rename columns/rows to something arbitrary.
At university, I am taking a course in business modelling. We use Simul8 s/ware to generate thousands of monti-carlo 'runs', then analyse the results as if they were real data.
:-/
...
But it's not real data! It's completely determistic, even with a pseudo-random generator. The only things we deal with are simple supply-chain networks, which are just malkov-chains with a few probability distributions. We're using 2000 pounds worth of s/ware to solve high-school statistics problems
You'd get the same results, and have real justifications for the numbers, by using an HP Calculator and a pencil. Alarmingly our lecturers have yet to explain what any of the distributions mean, but they keep using words like 'proof' and 'verify'.
I'm back to linearly regressing my calculated data. It's insane, they're all insane, one day the sane people will rule, wibble
I believe the term for that is 'Spreadsheet Physics'
In the early '90s when I was a student, and teaching physics by computer was still in it's experimental phase, one of the things they had us do was solve simple numerical problems (trajectory of a ball with air resistance etc.) using Quattro Pro. It did work, but it was not much faster or easier than programming it directly in C or FORTRAN and using GNUplot to draw the nice pictures, so as far as I know, they scrapped the program.
... is people who use excel as some kind of DTP package. I used to work in a school, and the deputy principal used to make report cards by typing them into excel, and fiddling around with the cell formatting properties until it looked right. Except it didn't, and when he couldn't make something work he used to DEMAND that I help him (some things were just impossible to acheive without using Publisher or something). The worst year was when he saved them all as CSV files (one of his friends told him it saved space or something) and it was up to me to get them back. Urgh...
From the article:
The first distortion is the use of point values and simple arithmetic instead of probability distributions and statistical measures. So far as I know, there's no off-the-shelf spreadsheet product--certainly none in common use--that provides for input of numbers as uncertain quantities, even though almost all of our decisions rest on forecasts or on speculations.
I am a student of this university : http://www.sgh.waw.pl/
Currently I am having a course in the use of Excel for prediction purposes. We do a lot of different case studies. We use Monte Carlo simulations, statistical tests, Markov chains and so on. We always discuss risk (variance, value-at-risk and so on). Excel is our basic tool and it is fine. We use different tools for specific purposes: Best-Fit for distribution fitting.
It is not a flaw of the tool, it is a flaw of the user. As someone said, give a monkey a PC instead of a type writer and you will get digital bullshit. I can only demand that people without proper education are not allowed to deliver multi-million business forcasts.
Fight Frist Psoting!
Browse Slashdot with 'Newest First'!
And same goes for graphics. Never ever use MS Word drawing functions for, well, for anything really. You can never be quite sure what happens if you for example resize the image... Or type more text into a textbox... Or breathe...
Always insert Powerpoint slide into the document. Unless you really need something Powerpoint can't do of course, in which case you should use whatever actual drawing program you need to.
>> And what the *fsck* does it mean, even if the data has a Gaussian distribution?
:)
that's what wikipedia is for
it's a measure of the width of the distribution. Given a gaussian distribution, a random measurement will occur within one standard deviation of the mean with a probability of around 68%. Or to put it the other way round, if you have data and are trying to calculate the distribution, there's about a 68% chance that the true value of the mean falls within 1 s.d. of the value you calculated.
If you don't have a gaussian distribution, you can still calculate a standard deviation but it will not have the same meaning with respect to the probability of you having got the right mean.
This is of particular relevance to spreadsheets, since they're often used to do calculations on financial data such as stock prices and most financial data is not gaussian--it's 'log normal', meaning that the logarithm of the data values are gaussian, but not the data itself. So most people doing standard deviation calculations on such data are probably completely misinterpreting the results...
The author of this post asserts his moral rights.
Since using spreadsheets is often only one step away from PowerPoint mastery.
.. for "Powerpoint Mastery" have a look at Tufte
"The Cognitive Style of PowerPoint".
Erm
Yes, I know it was discussed here before (as I guess), but still - it is worth a mention.
CC.
TaijiQuan (Huang, 5 loosenings)
1 -- the article is a content-free advert for Whitebirch's financial toolkit
2 -- Excel is an incredibly powerful and important piece of software which many if not most large corps can't do without. There is no alternative to it. The fact that it's unpleasant to use is beside the point -- nobody has been able to come up with a better (or even comparable) replacement. In my experience, there is a large segment of the IT community that is pathologically unable to focus on business needs enough to understand this.
Whence? Hence. Whither? Thither.
I saw my first spreadsheet on an old Osborne computer. My dad knew a guy who bought small banks, and he had the Osborne and VisiCalc.
Before this guy could buy a bank, he had to value them, and his valuations were always based on a few guesses (predictions) -- what interest rates would be, or whatever (I don't know exactly how he did it).
He told me that when he started doing this stuff with a normal calculator, a pencil, and paper, changing a guess took him a couple of days. Then he got a programmable calculator, and managed to cut it down to about 5 hours. With VisiCalc, it took a few seconds.
The point being that both the programmable calculator and the spreadsheet software gave him an edge in his work -- they made him better at buying banks. They paid for themselves.
*If* no one is using the sorts of software described in this article, and *if* the software really does make you better at making decisions, people should be able to use it to buy banks (or whatever) and do a better job than their competitors. It should give you a leg up in the market place.
That's exactly what happened with spreadsheets. That's why they're popular. A lot of dumb people have started to misuse them, apparently (that sounds plausible to me), but there's no denying that they have provided and continue to provide enormous value to users.
If this new stuff is better, then why isn't Warren Buffet using it? If the answer is "because he's too dumb", why doesn't someone else start using it, and outperform Buffet?
And then we have these PowerPoint, Excel, yada yada threads where the Slashdot crowd tends to be firmly in the "don't punish the users, it's the fault of these evil software applications" camp.
What's up with that?
So MS dominates the office software market becuase people really need the functionality they offer, right? After all why would anyone want to use something lightweight with a clean UI (like Lyx for example) which makes you use another application just to draw a diagram?
Having worked as a front-office developer in a very large bank i can give a good example of how spreadsheets can be misused Excel spreadsheets were used by all traders on the desk i was supporting. They did not want to move to any other tool because only spreadsheets gave them the flexibility they wanted. The spreadsheets were absolutely HUGE, think direction 20 or more tabs, all with hundreds of DDE Links to Reuters RICS - complicated formulas hanging off these links producing tables of data each time a DDE link updated (about once a second on average). We had to install gigabytes of ram and dual CPU's desktops for them just so they could run their spreadsheets. Sure excel would crash every now and then, but not often enough to switch to a new solution.
IT tried to introduce new more stable trading tools without success, not flexible enough-did not calculate "their" prices correctly-blahblah. Controlling tried to impose new tools on them to get a grip on their price calculation- all very difficult when the only data source is a "spreadsheet".
The most insane thing that we tried was to write a spreadsheet parser that would traverse all cells, build a dependency graph, reparse the formulas inside to translate this to another programming language. Needless to say this failed.
if you think that's perverse
That man tried to kill mah Daddy
Waltzing with bears by Tom DeMarco and Timothy Lister. And yes, they provide spreadsheets to calculate probabilities.
One of the biggest reasons is the sort function combined with [l]user error.
If a spreadsheet has more columns that fit on the screen, and is used by more than one person, at some point you can almost count on someone highlighting some, not all, of the columns and then sorting the highlighted columns, and saving the file. When that happens, the highlighted columns are sorted, the rest are left as is. Worst, the next person to use the file doesn't always realize the corruption has occurred.
This was a problem in Office 97 and earlier. I think it was a problem in Office 2k, but I don't remember. I have not tested this on Office XP or 2003.
. 62,400 repetitions make one truth -- Brave New World, Aldous Huxley
I swear to god. You can't make this stuff up. Our financial institution actually ran (past tense -- I'm converting it) a cluster of PCs all running Excel for pricing hugely complex financial products.
After finally getting my hands on the underlying VBA code, I printed it out. It was 56 pages of data movement (copy this piece of data from here to there). The actual pricing code was built as an add-in module and used as a formula.
The only reason for this system's existance is that several years ago someone heard about clustering PCs. They decided that it would be cool to do it with MS Windows and Excel. Gah. It's been an expensive mistake.
I do systems and IC design, RF mostly. A couple of years ago I was doing work for a very large semiconductor company. They were using an excel spreadsheet to do the majority of their system level simulations. It was a very intricate piece of work that had been revised by one individual over many years, spanning a couple of different employers. The company purchased Agilent's ADS and was having trouble getting engineers to abandon their tried and true excel spreadsheets.
I must admit, in some ways, for some simulations I preferred it as well. With ADS most components were black boxes, when input to output didn't behave as expected you didn't have much info to debug with. At least with the excel spreadsheet all the equations were there to study.
It dawned on me about half way through the project that the reason most of the engineers preferred the excel based system was in large part because it was (bear with me) open source. Many times in simulation you aren't sure if the problem is your model, or the simulation package. Analysis was much easier (or trustworthy) when all the calculation methods were easily viewed and tested. (All this is also why I use Octave!)
The author of the spreadsheet is now working at another high profile semiconductor company. Word around the campfire is that they are using the spreadsheet now too.
I can't think of anything you can't do in excel.
You could make the first 4 pages with formatting, then simply import the csv values into a third source page.
I appreciate that Excel gives me the capability to make simple semi automated forms that look nice.
Prevents simple errors, and they're easy to use.
The limit of Engineering as GPA goes to zero is MBA.
Typically, math is the skill that drives that GPA down. OK, the bad joke is starting to look like a flame, and it's true that clueless big dogs with their sensless five year plans make me angry, but please - this is a joke. Everyone has got their skill set.
The simplest example of a bad problem for a spreadsheet is billiards. Momentum transfer is easy but predicting a billiards game is impossible. Yet businessmen make this kind of mistake all the time. There is no cure for this kind of bad judgement and it's good that the people at ZDnet have pointed it out. I just wish they were not trying to promote statistical packages that people are not likely to understand as a substitute for good judgement.
Friends don't help friends install M$ junk.
The classical Slashdot debate features something-stupid-done-or-said-by-non-IT-savvy-gene ral-managers, and then the appropriate bashing by IT-savvy Slashdotters. If there were a similar forum where my profession were in majority, they would probably be bashing this very thread right now (I am an economist and business manager).
Just like, say, PERL or Java, spreadsheets can be used well, and they can be used poorly. Furthermore, people with good "technical" Excel skills can produce lousy spreadsheets with little analytical value, and vice versa. I have seen some fantastic spreadsheets which have totally revolutionized the way people saw a problem. At an insurance company I worked with, they used a huge spreadsheet to do a simulation of the effects on every single customer of a planned, dramatic price increase. The result: They realized that the price increase would have much less impact than they feared. Thus, the product was kept and the employees kept their jobs. The thing with the spreadsheet was that it was developed in fast trial-and-error loops, which meant that their run-once-per-night SAS tools were not an option (this was 7 years ago).
(I have, by the way, also seen people spend 3 months on developing a mega-spreadsheet for assessing the value of a company, only to use the wrong assumption for a critical value and thereby introducing an error of about 40% in the valuation [that critical value being the discount rate]).
I can assure all the concerned citizens of this forum that there is indeed a lot of excellent, first-rate Excel usage out there. Analytical power beyond our wildest dreams is at the fingertips of people without skills in programming at any lower level. This, believe it or not, is a good thing, because anyone who has dedicated himself to becoming great at programming is probably less skilled in disciplines such as financial analysis.
Sure, there is "bad code". Sure, people get a false sense of control. Sure, this new tool puts too much options in the hands of people who do not know how to use them. But how would that be untrue of other IT tools or programming environments? What does it matter that they use Excel as a database, as long as it gets their work done easier than getting an SQL education and then doing it "right"?
Biases are part of all decision-making (as even economists are realizing). So what if that is the case in Spreadsheet World, too?
VBA for Excel is very slow, but the figures you quote suggest that something is wrong in addition to the inherent slowness of the product. Have you checked the code doing the lookup? It might have been written badly.
One bottleneck that often happen in VBA for Excel routines is that the application tries to keep the screen refreshed while the routine is crunching, which slows things right down. If it's not there already, you should get a big improvement in speed by putting
application.screenupdating = false
at the beginning of the routine, and
application.screenupdating = true
at the end, which will mean that the screen will freeze when you start the routine, and only redraw once it's all finished.
evil math within Nature's Cubic Creation!
There was an issue of PC World that came out in late '83 or '84 that surveyed readers on which applications they used in various categories. Lotus 1-2-3 ranked third (something like 17% of respondents) in the word processing category. (This was not a mistake; it turned out that some users of 1-2-3 found it easier to enter a few paragraphs of text into a column of cells and use the Range Justify command, than to exit 1-2-3, change floppies, and launch WordPerfect.)
When I started at Lotus, my wife was a buyer for a local retail chain. She had to do quarterly plans where she distributed a fixed number of dollars over various styles of merchandise among several branches of stores; she had been doing this in rows and columns with pencil, paper, and eraser. I built a model for her to do this using 1-2-3, and several days work was reduced to hours; her peer buyers would visit us quarterly to take advantage of the new tool.
My model was flawed; I formatted the calculated values to 2 decimal places, and 1-2-3 rounded the displayed values accordingly. As a result, the actual sum of a column of calculated values was not equal to the sum of the displayed values. (A further example of the ongoing weakness of spreadsheets, and of my own carelessness; my numeric methods prof would have been ashamed of me.) (It should also be noted that my wife caught the mistake by eye, without even doing the math herself; I had been so trustful of the tools that I hadn't bothered to challenge the results. Another lesson learned.)
Eventually, the store's IT department rolled out their own application on the division's single 3270 terminal. My wife still prepared her model at home (since time in the 3270 seat was hard to come by), and transcribed the printed results into the terminal at work. The IT application required her to enter not only the table values but the calculated sums at the end of each column and row. If a sum did not match the contents of its row or column, the IT app reported an error, but did not provide the correct value, nor even state which of the thirty-odd values was incorrect!
You're endangering my revenue stream.
If it weren't for those organically grown excel/access nightmare programs most companies would never think of hiring a programmer. They hire us after they build those things up to their final catastrophic state and realize that they need somebody to come in a fix it up right.
I don't know about you, but if it weren't for homegrown messes like that it never would have occurred to me that anyone needed a program to import proposed insurance fee schedules and munge those values against previous fee schedules and usage data to decide whether or not a particular contract was worth considering. That sort of stuff isn't normally on my radar. Thanks Mr. P for building that supersized crappy excel program to do that witt, so I could re-write it!
I suggest you seek another university. A spread sheet is not even an adequate tool for teaching Monte Carlo (MC).
MC simulations typically have value only when used in large runs. Without a reasonable number of simulations, you usually end up with very poor statistics for your outcome. A single shoot of the roulet wheel tells you nothing. Ten million shots can come close to simulating an acutal event.
Spreadheets are absolutely the worst tool imaginable for such a task. Spreadsheets are good for simple calculations with well know quantities where you can check the intermediate results and make sure you have not made a bonehead mistake. They are best for back of the envelope, simplified model sanity checks. What you want for MC are any of the premade specialty packages, usually written in FORTRAN, that run as a batch process and have been extensively peer reviewed. I can imagine a dinky spreadheet MC tool with inputs for numbers of runs and odds, but with the underlying math hidden. UGH, you can't tell what it's doing! Anyone trying to cowboy a spreadsheet "solution" to this kind of problem is wasting their time.
I can only demand that people without proper education are not allowed to deliver multi-million business forcasts.
I second that demand.
Friends don't help friends install M$ junk.
Spreadsheets are critical tools for "knowledge workers" because they allow them to explore ideas, analyze information and identify trends. The problem is that most "knowledge workers" are competant at some aspect of doing business and not at developing appropriate software tools. It is a problem when a spreadsheet is used as a multiuser shared data application. Spreadsheets allow:
* Entrepeneuers to financially model their business plan.
* Calculations to be performed more accurately than say, in the margin of a ledger pad.
* Simple busines processes to be tracked and managed using a computer instead of say, a legal pad.
* Executives to summarize and categorize and drill down to analyze information from a database (pivot tables)
At the end of the day, I've found that spreadsheets are not the cause of business mistakes. When there is a spreadsheet failure, there are ususally a couple of fundamental problems:
* Lack of attention to detail
* No oversight or validation
* Numbers are not reliable to begin with
* No one bothered to actually do a what-if using a reasonable range of scenarios - they only looked at the rose colored one.
-- $G
I like it when the dude celebrates "processing an impossible amount of data" using Excel and other Microsoft products and the two fat chicks come out and dump the whole water cooler tank of water on him.
I normally lurk, but need to say that I don't think spreadsheets are inherently evil. I am an engineer at a large oil company, and we use spreadsheet models for a number of processes. Typically the spreadsheet is used as an interface, since everyone is familiar with it. The "number crunching" is done with VBA. I know that among the readership of /. VBA is a dirty word. For an engineer, though, it is not that bad of a tool. Not particularly fast, but for simple, numeric algorithm implementations it works fine.
Sure - we can and do use purpose-built models. They have their place. However, they tend to be black boxes that can't be easily modified. They also tend to be really, really, really expensive and more of a solution than you need. In other words, for some problems, they tend not be the most cost-effective means for computation.
If transparency of the calculation method is most important and not millisecond execution speed, then I agree with a previous poster who argued that Excel and VBA tend to be "open source" in the context of "how the calculation was done".
What truly functional advancement has Microsoft made to Excel in recent years? It seems that they make a few bug fixes, slap the newest Office look on it, and push it out the door. The advancements in spreadsheet functionality were made by other companies, which MS copied into Excel (pivot tables is an example). The one real exception to this is the VBA macro system. I find it useful for data reduction, and it remains the one reason I will not switch to other spreadsheets.
I just wish some other company would make inroads in spreadsheet design and either push Excel to improve or topple it from its current perch.
- Cryptic names for fields
How is this "stamped out in programming languages" other than convention and training? Is is forced on you?
- No comments
Excel can "comment" to some extent.
- No obvious flow of control
Some would argue that the freeform nature of a spreadsheet is what makes it so appealing.
- No modularisation
- No capability to test spreadsheet sub-components in isolation
- No capability to do a diff to see what's changed between versions
Excel can keep track of every change made - didn't we just have an article making fun of Microsoft for this feature in Word?!
Not saying the spreadsheets, or Excel specifically, is the answer to everything.
Now fast forward to present. I would agree that people want to use Excel for everything: database, graphics, plotting, forms, as a programming environment (oh the humanity!) etc, etc. Most excel users probably don't even know how to use a formula in Excel. The other extreme is when the calculations are so complex that it would be better to switch to Mathematica or Matlab. But Excel is the only tool they know and they want to use it for everything. I can hear my boss's voice in my head "Let's use Excel for this" with the intonation that would make one believe it's the greatest idea ever. Oh, well, have to get back to work where I am forced to use Excel for most of those tasks mentioned above, yes, I am one of the guilty.
I think its funny that here at slashdot the center of advocasy for open software, that 95% of the discussion here is using Excel to mean spreadsheet. Talk about subtle bias! Apparently OO isn't good enough, or it isn't popular enough even amoungst slashdotters. Perhaps, its a mistake to give such generic names to the components of OO. Now if it was something like firecalc or pheonixview then I think it would be discussed more. Instead, Now when you talk about an individual component you have to use the suite's name( IE OpenOffice Calc). No one says Micorsoft Office System Excel 2004. They just call it excell, a techno sounding name that doesn't provide any clue as to its use.
Well.. maybe. Or Maybe not. But Definitely not sort of.
It's not too hard to appreciate the difference between products that incorporate uncertainty and those that don't: On the one hand, you've got, "We predict a $1 million profit in the first year"; on the other, "The expected Year 1 profit is $1 million, but there's a 30 percent chance of losses for the first two years." These different statements will lead to quite different discussions.
The second point made is simply how what-if tools such as spreadsheets lead users to think they are making better forecasts while in fact their forecasts do not improve.
That's quite incredible!
The funny thing is that while everyone is going to look at this and say that it is ridiculous, and it is, think what people would say if it had been done with GNumeric. The Slashdot headline would read something like "Cool Hack Let's You Play Pacman in GNumeric" and there would be 300 comments saying how cool it is. Another 50 comments would say that the guy has too much time on his hands. People would talk about the awesome power of GNumeric but, no one would complain that it was an absurd abuse of Gnumeric as they are here about Excel.
Just some perspective.
How about the simple idea of breaking away from the rectangular grid? Or free form cells placed on a diagram or schematic or blueprint?
--- Ban humanity.
What I don't understand is: they make us software developers use change control (for good reason), but upper management builds their business on this fragile house of cards spreadsheet system.
Is there a good change control system for spreadsheets? Sure, we could treat the xls files as opaque binary files, but that's losing most of the power of the change control system. I'm sure it's out there. Pointers anyone?
Bryan
There's a special place in hell reserved for the people who use Excel spreadsheets as mockups for HTML pages, considering that there are numerous HTML WYSIWYG editors available.
These are the same people who complain when the final HTML page does not look EXACTLY like the Excel version.
Explaining to them the impedance mismatch between Excel and HTML is a pointless endeavor. Banging my head over and over against a doorknob is more enjoyable.
An acquaintance criticized spreadsheets and praised pencil and paper forms because mathematical errors can crop up in either one, but with paper there is a double-entry system, running totals, and review by brains and eyeballs.
My argument is that paper is a big step backwards:
line 2 (non-paying customers): 10
line 3 (all customers; add 1+2): 400
I use a decently large spreadsheet to run Technical Video Rental, and I've certainly found bugs in it, but I've noted that the bugs are denser, and harder to find in those areas where the computation appears with more intermediate values hidden.
I think that a more confident spreadsheet programmer tends to hide more variables in complex cell formula; as I am not a confident spreadsheet programmer, I've - in many places - spread formula across multiple cells...and this has helped me figure out bugs.
This points out running totals as one example of good practice. Nothing could be simpler in a spreadsheet, yet we almost never see it.
So: why do spreadsheet programmers not do these things?
One reason that occurs to me is that spreadsheets conflate calculation with presentation. Intermediate values use up screen real estate, and look ugly.
Yes, there are tools that *allow* one to separate calculation from presentation: one could have two separate tabs, for example.
Yet these tools allow for disambiguation of calculation and presentation in the same way that assembly programming allows for object oriented design.
Or, to rephrase it: "Hidden steps considered harmful".
I don't even like C/C++ code that puts too much computation on a single line: I want intermediate values that I can step through with a debugger.
Perhaps what's needed are much higher level tools with in the spreadsheet that let one select cells of interest on one tab, then create a presentation tab based on these? I've got visions of cool Mac-Aqua-like greying out of 90% of cells, while one drags and drops the still-crisp cells around... Another/alternate idea: it might be nice if instead of the heavyweight tabs that most spreadsheets support, one could open zoom in on a single presentation cell and investigate little "pocket tabs" which might have ~10 x ~10 cells in them. The equivalent in C/C++ would be a complex expression on one line that decomposed itself into multiple lines with intermediate values only when you walk it with a debugger.
Now, don't get me wrong: I'm not arguing for fancy presentation layers, or dancing pie-charts; I'm arguing for the ability to take a huge page of calculations and tie the some of the inputs, intermediate steps, and output to a much smaller summary page, or, conversely, I'm arguing for the ability to take spreadsheets as they are currently written, and expand them into a debuggable format.
This, I argue, would make spreadsheets more useful, and decrease the number of bugs that crop up in them.
I catch flack each and every time I say that, but I still think it's true.
The ss has some serious advantages. In an environment of increasing number density and decreasing personal involvement, the need to have a comprehesive tool for data analysis could only have given birth to the spreadsheet. We could talk all day about how handy the ss is for many of the tasks in this environment.
But the space between the substance is what concerns me. Ss have allowed us to max/min too many things without much regard for the things that are undefined and necessarily intangible, but are still entangled in the matter itself. No corporate ss takes into account the costs of pollution, unemployment and general social degradation due to uncontrolled greed.
Like handguns, ss have brought us significant personal power at the cost of a good many social problems. Hence, they seem to require more careful handling and regulation. One aspect to this is training, and in general ethics training is a good place to start. (The BBB in my area is attempting to emphasize this, but they are meeting stiff resistance from the business community.)
Ss should be used with care, and their results are suspect anyway. That's the least message I've tried to convey for years.
[You have a stable society when some nut guns down a schoolyard and the law doesn't change.]
The worst misuse of Excel is as a database. And yet Administration / HR / Marketing staffers always end up using excel to store extremely important data. Sales records, accounts receivable, timesheets, inventory, contact lists - you name it.
:-\
And they always organize the list with subtle font-weight and cell-shading. Woe unto the intern that accidentally Selects Edit->Clear->Formats. Woe unto the manager that needs to sort the list by "bold" or "light-green."
Unfortunately, MS encouraged this perversity by including the menu option Data->Forms
What were they thinking?!
In the end, I have to come along with MS Access and clean up the mess. Oh well, it's a living
Excel charts are generally horrible- the default values tend to include extraneous "chartjunk" (to borrow a word from Tufte). It is tedious to get a nice-looking chart from your data, and seems to be very difficult to produce any even mildly-complicated graphics. I use R for charts, and I'm familiar with several linux charting apps of varying degrees of complexity, but I'm not sure whether there are any good OSS apps for Windows.
A much bigger problem, though, is Excel's lack of statistical quality! This website provides a quick overview, with links to some more detailed references. Excel is occasionally accurate for simple analyses, but why on earth would you use an unreliable program for _anything_? The only way to be sure that Excel did your ANOVA or whatever correctly is to redo it in better stats software, and at that point I don't see the advantage.
This is an issue that comes up regularly on scientific mailing lists. Lots of people seem to take the path of least resistance and use Excel for both their analysis and presentation. Ick!
I prefer to be called Evil Scientist.
I often get non-standardized spreadsheets and tables of data with this exact same problem. King-sized can be expressed many different ways.
I always take the bull by the horns and normalize the data myself (UPDATE T_PROD SET SIZE='KING' WHERE (SIZE='K' OR SIZE='74x80' OR SIZE='KNG')).
My simple rule is "never make the data worse". If you substitution fixes 100 bad entries and creates only 1 questionable result, it is a good substitution.
Remember, when faced with non-standardized data, the data ALWAYS has lots of errors in it. When you are through normalizing the data it WILL be better than when you started. The client can then go clean up the rest (they will have to even if you don't normalize).
- I live the greatest adventure anyone could possibly desire. - Tosk the Hunted