The Subtle Tyranny Of Spreadsheets
pipingguy writes "I found this link on a CAD-related mailing list which questioned the current state of spreadsheet usage. Since using spreadsheets is often only one step away from PowerPoint mastery, I thought it worthy of submission." An excerpt:
"The second distortion caused by conventional spreadsheets is more subtle. It's described in a 1980s paper, written by university researcher Jeffrey Kottemann and others concerning what they called 'Performance, Beliefs, and the Illusion of Control.' The paper described an experiment in which subjects were asked to perform a planning task using different tools, some of them with elaborate what-if capability and others without it." Yup, it's a ZD/Yahoo link, but it raises good questions."
I particularly enjoyed it, and it made me wonder why I've always hated Excel. maybe it's time to forgive...
(I always used to like Pipdream on the Archimedes though. That was a combined spreadsheet and word processor).
To anyone that has Excel '97 - On a new Worksheet, Press F5. Type X97:L97 and hit enter. Press the tab key. Hold Ctrl-Shift. Click on the Chart Wizard toolbar button. Use mouse to fly around - Right button forward/ Left button reverse.
Excel 2000? Under file menu, do 'Save as Web Page'. Say 'Publish Sheet' and 'Add Interactivity'. Save to some htm page on your drive. Load the htm page with IE (don't give me any grief over this one- you're already screwing around with Excel so I don't want to hear it ). You should have Excel in the middle of the page. Scroll to row 2000, column WC. Select row 2000, and tab so that WC is the active column. Hold down Shift+Crtl+Alt nad click the Office logo in the upper-left. If you have DirectX, you will be playing what looks like spy hunter. Use the arrow keys to drive, space to fire, O to drop oil slicks, and when it gets dark, use H for your headlights. -B
'Fraid your friend's not very original. The original quote is from Oscar Wilde: "a cynic is a man who knows the cost of everything and the value of nothing"(
The author of this post asserts his moral rights.
I just found a very useful page discussing the history of spreadsheet software and their current problems and limitations. It also has some thoughts about possible improvements in spreadsheet architecture, and lists both free and commercial Linux-based apps. Check it out.
- No comments
Umm.....Excel can add comments to individual cells, and you can rename columns/rows to something arbitrary.
You shouldn't be sending things done in MS Word to the printer either-- use something good.
Add
All this script did was make a new script file called "key" which echoed the value.
So if you just typed the key at the command prompt the value came straight back (of course the *nix cmd line offers many variations for retrieval!).
Dread to think about inode usage if you did this on large scale thougth!
And if you thought that was boring you obviously havn't read my Journal ;-)
>> And what the *fsck* does it mean, even if the data has a Gaussian distribution?
:)
that's what wikipedia is for
it's a measure of the width of the distribution. Given a gaussian distribution, a random measurement will occur within one standard deviation of the mean with a probability of around 68%. Or to put it the other way round, if you have data and are trying to calculate the distribution, there's about a 68% chance that the true value of the mean falls within 1 s.d. of the value you calculated.
If you don't have a gaussian distribution, you can still calculate a standard deviation but it will not have the same meaning with respect to the probability of you having got the right mean.
This is of particular relevance to spreadsheets, since they're often used to do calculations on financial data such as stock prices and most financial data is not gaussian--it's 'log normal', meaning that the logarithm of the data values are gaussian, but not the data itself. So most people doing standard deviation calculations on such data are probably completely misinterpreting the results...
The author of this post asserts his moral rights.
Since using spreadsheets is often only one step away from PowerPoint mastery.
.. for "Powerpoint Mastery" have a look at Tufte
"The Cognitive Style of PowerPoint".
Erm
Yes, I know it was discussed here before (as I guess), but still - it is worth a mention.
CC.
TaijiQuan (Huang, 5 loosenings)
Waltzing with bears by Tom DeMarco and Timothy Lister. And yes, they provide spreadsheets to calculate probabilities.
One of the biggest reasons is the sort function combined with [l]user error.
If a spreadsheet has more columns that fit on the screen, and is used by more than one person, at some point you can almost count on someone highlighting some, not all, of the columns and then sorting the highlighted columns, and saving the file. When that happens, the highlighted columns are sorted, the rest are left as is. Worst, the next person to use the file doesn't always realize the corruption has occurred.
This was a problem in Office 97 and earlier. I think it was a problem in Office 2k, but I don't remember. I have not tested this on Office XP or 2003.
. 62,400 repetitions make one truth -- Brave New World, Aldous Huxley
You shouldn't be forced to use SQL for manipulating data, you should be restrained from using Excel. ;) The reality of the differences between a spreadsheet and a database is that a spreadsheet lacks the data constraints (relationships) necessary to keep a user from entering bad data. A database can control this (data integrity) to a large degree (depending on your datamodel design).
An example I fight with daily is product attributes. I maintain a n ecommerce database with about 180,000 products, each of which would have, say, a color. The problem is that if I import data from a spreadsheet it might randomly insert spaces in the data (i.e. "Black " or " Black" instead of "Black"), whereas if I get the data entered through our tools, the user selects from a list of colors, and only if the choice doesn't exist do they add a new one.
You mention how people are doing a knee-jerk that 'DB's are sacred'. Yes, they are. So are spreadsheets, the problem is that people bastard-ize their use and end up confused about why they both exist, and how to use them.
Database = Data storage, data consistency, ease of data maintenance
Spreadsheet = Data analysis, data redundancy, lack of data integrity.
That's how I see it, anyhow.
"If voting could really change things, it would be illegal. " - Revolution Books, NY
VBA for Excel is very slow, but the figures you quote suggest that something is wrong in addition to the inherent slowness of the product. Have you checked the code doing the lookup? It might have been written badly.
One bottleneck that often happen in VBA for Excel routines is that the application tries to keep the screen refreshed while the routine is crunching, which slows things right down. If it's not there already, you should get a big improvement in speed by putting
application.screenupdating = false
at the beginning of the routine, and
application.screenupdating = true
at the end, which will mean that the screen will freeze when you start the routine, and only redraw once it's all finished.
evil math within Nature's Cubic Creation!
Tools -> Share Workbook -> Allow changes by more than one user...
I suggest you seek another university. A spread sheet is not even an adequate tool for teaching Monte Carlo (MC).
MC simulations typically have value only when used in large runs. Without a reasonable number of simulations, you usually end up with very poor statistics for your outcome. A single shoot of the roulet wheel tells you nothing. Ten million shots can come close to simulating an acutal event.
Spreadheets are absolutely the worst tool imaginable for such a task. Spreadsheets are good for simple calculations with well know quantities where you can check the intermediate results and make sure you have not made a bonehead mistake. They are best for back of the envelope, simplified model sanity checks. What you want for MC are any of the premade specialty packages, usually written in FORTRAN, that run as a batch process and have been extensively peer reviewed. I can imagine a dinky spreadheet MC tool with inputs for numbers of runs and odds, but with the underlying math hidden. UGH, you can't tell what it's doing! Anyone trying to cowboy a spreadsheet "solution" to this kind of problem is wasting their time.
I can only demand that people without proper education are not allowed to deliver multi-million business forcasts.
I second that demand.
Friends don't help friends install M$ junk.
"Yet the usual database products are a disease in themselves. I think that relational databases are not the best for transaction processing. I prefer to use programming languages with built in database support."
I hope that I never have to book tickets using a credit card in any systems that you've been within ten miles of.
I also hope that your customers never trust you with business critical data.
If you can't see why, think about how you enforce data integrity and security if a customer asks you (or more likely someone else) to write, say, a new web interface to your data?
No but, yeah but, no but...
I use Excel constantly to do layouts for invoices, estimates, cards, presentation, etc. because of the precise sizing control. It looks professional, not cheesy at all, some of the stuff looks like it came from a printer.
As for the database aspect, Excel is well suited for a database table layout, that's one of it's principal uses. Not a relational database, but just simple tables, it great at. There's no reason you couldn't have an address book with hundreds of entries and a dialog box front end made with macros. I did this in the past, worked great.
- Cryptic names for fields
How is this "stamped out in programming languages" other than convention and training? Is is forced on you?
- No comments
Excel can "comment" to some extent.
- No obvious flow of control
Some would argue that the freeform nature of a spreadsheet is what makes it so appealing.
- No modularisation
- No capability to test spreadsheet sub-components in isolation
- No capability to do a diff to see what's changed between versions
Excel can keep track of every change made - didn't we just have an article making fun of Microsoft for this feature in Word?!
Not saying the spreadsheets, or Excel specifically, is the answer to everything.
It's not too hard to appreciate the difference between products that incorporate uncertainty and those that don't: On the one hand, you've got, "We predict a $1 million profit in the first year"; on the other, "The expected Year 1 profit is $1 million, but there's a 30 percent chance of losses for the first two years." These different statements will lead to quite different discussions.
The second point made is simply how what-if tools such as spreadsheets lead users to think they are making better forecasts while in fact their forecasts do not improve.
PDFCreator gives you "print to PDF" capability in Windows. It's free software.
For more information, click here.
It's called an ODBC connection. Just create a connector to you db of choice (I use Posgresql all the time). Most spreadsheets (Excel and OO Calc) then allow you view the database table through the spreadsheet.
I think the problem is that people use Excel as the database, not as a front end for a database (which it does okay). 90% of the time most people would be far better off using MS Access instead of the way they use Excel. The difference being that you have to make table fields in Access, whereas Excel you can just start typing away.
The things listed as mean and standard deviation in Excel are sample means and sample standard deviations. If you have a list of numbers, and you assume that they were drawn from some distribution with finite expectation and variance, you can calculate the sample mean (simple average), which is an estimate of the expectation. Then you substract this mean from all your numbers and take their squares. The average of that is an estimate of the variance of the distribution, take the square root for the standard deviation. The nice thing is that these estimates converge to the expectation and variance regardless of distribution. If you do make the additional assumption of Gaussianity, you can also say something about the rate of convergence and the distribution of the estimate. But the basic formulas in Excel and such for sample mean and variance work for any distribution that has a mean and variance.
A real life example that still gets on my nerves to this day is when a co-worker in different department who knew I was a "computer person" asked me about creating an excel database. After trying to correct her about 3 times to no avail, I just gave up and said "sorry, I don't really work with Microsoft Office" and ran away.
I agree wholeheartedly, which is why I can't yet use OpenOffice's spreadsheet app, Calc, all that much. For one, it doesn't hold more than 32k rows, and for two it doesn't have a very good "PivotTable/PivotChart" tool right now. I use Excel a lot to get aggregated data from millions of rows down to a more manageable 10's of thousands. From there I can begin analyzing seasonality trends, control charts, histograms of that data, etc. No database I know of has this kind of stuff built right in so that it's easy to manipulate like it is in Excel.
/. and I know I should be doing it on OpenOffice.org too, but I don't have that kind of time. Beef up that DataPilot, damnit!!!
If OpenOffice could make their spreadsheet app as stellar in it's power and formattability (I know, not a real word) as its Writer app, then I'd ditch MS Office in an instant.
I've been preaching this for a while now on
There are other good suggestions here (a linux box running samba, works well and is pretty easy; and PDFCreator), but I'll throw in one more: Cute PDF Writer
Works real well; installs as a virtual printer probably like your "print to pdf" program that you're shelling out money for...CutePDF is free and works like a charm; however it requires installing GhostScript on the PC.
I've no idea. What kinds of access is available for getting into these system and how are the data intgrity rules enforced ?
For your average RDBMS there are lots e.g. for Oracle there are SQLPLUS, Pro*C, ODBC, JDBV , ADO, etc etc. If anyone can get at your data through one of these methods and start twiddling with the data or data structures then you need to make sure that they can't twiddle things that they're not supposed to twiddle with and, if they are allowed to twiddle, then they musn't be able to break things. If you rely on implementing data rules in your client or middleware layer then someday someone will come along and build another client that twiddles around where ever it likes.
No but, yeah but, no but...
It'd actually be pretty easy in this situation... just select-all and double the row-height.
You do know you can use perl (know for it's regular expressions) and very easily interface with just about any database?
Oh and you can use html or even tk to make an interface. The other great thing about perl is that when your company finally figures out windows does _not_ have a lower TCO and you switch to linux you can still use perl.
Excel is just another bloated app from a company that really knows bloat. I worked for one company that used excel to generate some reports and it took forever. (in comparison to a database with a good front end.)
Excel is ok when the data set is small but if you have a lot of data or a lot of calculation, use something else.
Use FileMakerPro or other relational database tool for this. It's easier than Excel to set up the layout, and it's much easier to integrate the database (say of employee names and office phone numbers) that lies behind the page.
Doing a small address book in Excel, was cheaper than buying a copy of Filemaker Pro.
It was a lot more interesting in Excel, because I wrote the macros and designed the dialog box layout for the front end. You didn't know you could do that in Excel -- DID you?:) Yeah, I made a nice dialog box with the necessary fields, about 25-30 of them, and that's where you'd do your queries, and enter new records too. Actually it served the purpose wonderfully, and I could easily add fields that did math based on contacts/calls etc, which I did, to nice effect. That was a few years ago...
And then I went heavily into drug and alcohol abuse, lost all my clients and contacts, and didn't need a database after all.