Slashdot Mirror


The Subtle Tyranny Of Spreadsheets

pipingguy writes "I found this link on a CAD-related mailing list which questioned the current state of spreadsheet usage. Since using spreadsheets is often only one step away from PowerPoint mastery, I thought it worthy of submission." An excerpt: "The second distortion caused by conventional spreadsheets is more subtle. It's described in a 1980s paper, written by university researcher Jeffrey Kottemann and others concerning what they called 'Performance, Beliefs, and the Illusion of Control.' The paper described an experiment in which subjects were asked to perform a planning task using different tools, some of them with elaborate what-if capability and others without it." Yup, it's a ZD/Yahoo link, but it raises good questions."

30 of 554 comments (clear)

  1. A better article on the same point... by Singletoned · · Score: 4, Informative
    "When you're holding Excel, everything looks like a spreadsheet" by Yoz Grahame

    I particularly enjoyed it, and it made me wonder why I've always hated Excel. maybe it's time to forgive...

    (I always used to like Pipdream on the Archimedes though. That was a combined spreadsheet and word processor).

  2. Excel isn't a DB! It's a FS! by beacher · · Score: 3, Informative

    To anyone that has Excel '97 - On a new Worksheet, Press F5. Type X97:L97 and hit enter. Press the tab key. Hold Ctrl-Shift. Click on the Chart Wizard toolbar button. Use mouse to fly around - Right button forward/ Left button reverse.

    Excel 2000? Under file menu, do 'Save as Web Page'. Say 'Publish Sheet' and 'Add Interactivity'. Save to some htm page on your drive. Load the htm page with IE (don't give me any grief over this one- you're already screwing around with Excel so I don't want to hear it ). You should have Excel in the middle of the page. Scroll to row 2000, column WC. Select row 2000, and tab so that WC is the active column. Hold down Shift+Crtl+Alt nad click the Office logo in the upper-left. If you have DirectX, you will be playing what looks like spy hunter. Use the arrow keys to drive, space to fire, O to drop oil slicks, and when it gets dark, use H for your headlights. -B

  3. Re:The cost of everything by misterpies · · Score: 3, Informative


    'Fraid your friend's not very original. The original quote is from Oscar Wilde: "a cynic is a man who knows the cost of everything and the value of nothing"(

    --
    The author of this post asserts his moral rights.
  4. Linux Spreadsheets by Anonymous Coward · · Score: 1, Informative

    I just found a very useful page discussing the history of spreadsheet software and their current problems and limitations. It also has some thoughts about possible improvements in spreadsheet architecture, and lists both free and commercial Linux-based apps. Check it out.

  5. Re:Spreadsheets are the worst sort of hack program by kidgenius · · Score: 2, Informative
    - Cryptic names for fields
    - No comments

    Umm.....Excel can add comments to individual cells, and you can rename columns/rows to something arbitrary.

  6. Re:please everybody by Anonymous Coward · · Score: 1, Informative

    You shouldn't be sending things done in MS Word to the printer either-- use something good.

  7. Re:please everybody by MrIrwin · · Score: 2, Informative
    "For example my address book is a text file." I knew an AiX programmer who joted down phone numbers and many other things using a script:

    Add

    All this script did was make a new script file called "key" which echoed the value.

    So if you just typed the key at the command prompt the value came straight back (of course the *nix cmd line offers many variations for retrieval!).

    Dread to think about inode usage if you did this on large scale thougth!

    --

    And if you thought that was boring you obviously havn't read my Journal ;-)

  8. Re:The question does not deserve mod pts, but answ by misterpies · · Score: 3, Informative

    >> And what the *fsck* does it mean, even if the data has a Gaussian distribution?

    that's what wikipedia is for :)

    it's a measure of the width of the distribution. Given a gaussian distribution, a random measurement will occur within one standard deviation of the mean with a probability of around 68%. Or to put it the other way round, if you have data and are trying to calculate the distribution, there's about a 68% chance that the true value of the mean falls within 1 s.d. of the value you calculated.

    If you don't have a gaussian distribution, you can still calculate a standard deviation but it will not have the same meaning with respect to the probability of you having got the right mean.

    This is of particular relevance to spreadsheets, since they're often used to do calculations on financial data such as stock prices and most financial data is not gaussian--it's 'log normal', meaning that the logarithm of the data values are gaussian, but not the data itself. So most people doing standard deviation calculations on such data are probably completely misinterpreting the results...

    --
    The author of this post asserts his moral rights.
  9. "Powerpoint Mastery" by foobsr · · Score: 4, Informative

    Since using spreadsheets is often only one step away from PowerPoint mastery.

    Erm .. for "Powerpoint Mastery" have a look at Tufte "The Cognitive Style of PowerPoint".

    Yes, I know it was discussed here before (as I guess), but still - it is worth a mention.

    CC.

    --
    TaijiQuan (Huang, 5 loosenings)
  10. Probabilities and spreadsheets by Advocadus+Diaboli · · Score: 2, Informative
    For all those who think that there is no exact result but a lot of probabilities that has to be taken into account I recommend the following book:

    Waltzing with bears by Tom DeMarco and Timothy Lister. And yes, they provide spreadsheets to calculate probabilities.

  11. Why not to use Excel as a DB by linuxtelephony · · Score: 4, Informative

    One of the biggest reasons is the sort function combined with [l]user error.

    If a spreadsheet has more columns that fit on the screen, and is used by more than one person, at some point you can almost count on someone highlighting some, not all, of the columns and then sorting the highlighted columns, and saving the file. When that happens, the highlighted columns are sorted, the rest are left as is. Worst, the next person to use the file doesn't always realize the corruption has occurred.

    This was a problem in Office 97 and earlier. I think it was a problem in Office 2k, but I don't remember. I have not tested this on Office XP or 2003.

    --
    . 62,400 repetitions make one truth -- Brave New World, Aldous Huxley
    1. Re:Why not to use Excel as a DB by alasdair · · Score: 2, Informative

      Good news: it's fixed in Office XP (Excel 2002 SP2). Selecting Sort highlights all the content in your table by default.

  12. Re:please everybody by ChiaBen · · Score: 5, Informative

    You shouldn't be forced to use SQL for manipulating data, you should be restrained from using Excel. ;) The reality of the differences between a spreadsheet and a database is that a spreadsheet lacks the data constraints (relationships) necessary to keep a user from entering bad data. A database can control this (data integrity) to a large degree (depending on your datamodel design).

    An example I fight with daily is product attributes. I maintain a n ecommerce database with about 180,000 products, each of which would have, say, a color. The problem is that if I import data from a spreadsheet it might randomly insert spaces in the data (i.e. "Black " or " Black" instead of "Black"), whereas if I get the data entered through our tools, the user selects from a list of colors, and only if the choice doesn't exist do they add a new one.

    You mention how people are doing a knee-jerk that 'DB's are sacred'. Yes, they are. So are spreadsheets, the problem is that people bastard-ize their use and end up confused about why they both exist, and how to use them.

    Database = Data storage, data consistency, ease of data maintenance
    Spreadsheet = Data analysis, data redundancy, lack of data integrity.

    That's how I see it, anyhow.

    --
    "If voting could really change things, it would be illegal. " - Revolution Books, NY
  13. Re:Kill them all by tiled_rainbows · · Score: 2, Informative

    VBA for Excel is very slow, but the figures you quote suggest that something is wrong in addition to the inherent slowness of the product. Have you checked the code doing the lookup? It might have been written badly.

    One bottleneck that often happen in VBA for Excel routines is that the application tries to keep the screen refreshed while the routine is crunching, which slows things right down. If it's not there already, you should get a big improvement in speed by putting

    application.screenupdating = false

    at the beginning of the routine, and

    application.screenupdating = true

    at the end, which will mean that the screen will freeze when you start the routine, and only redraw once it's all finished.

  14. Re:please everybody by Anonymous Coward · · Score: 1, Informative
    I'd be interested to know how you get 10 people sharing a spreadsheet. AFAIK most spreadsheet programs will only allow one person to open it at a time


    Tools -> Share Workbook -> Allow changes by more than one user...

  15. Excel is no good for Monte Carlo simulations. by twitter · · Score: 2, Informative
    Currently I am having a course in the use of Excel for prediction purposes. We do a lot of different case studies. We use Monte Carlo simulations, statistical tests, Markov chains and so on. We always discuss risk (variance, value-at-risk and so on). Excel is our basic tool and it is fine. We use different tools for specific purposes: Best-Fit for distribution fitting.

    I suggest you seek another university. A spread sheet is not even an adequate tool for teaching Monte Carlo (MC).

    MC simulations typically have value only when used in large runs. Without a reasonable number of simulations, you usually end up with very poor statistics for your outcome. A single shoot of the roulet wheel tells you nothing. Ten million shots can come close to simulating an acutal event.

    Spreadheets are absolutely the worst tool imaginable for such a task. Spreadsheets are good for simple calculations with well know quantities where you can check the intermediate results and make sure you have not made a bonehead mistake. They are best for back of the envelope, simplified model sanity checks. What you want for MC are any of the premade specialty packages, usually written in FORTRAN, that run as a batch process and have been extensively peer reviewed. I can imagine a dinky spreadheet MC tool with inputs for numbers of runs and odds, but with the underlying math hidden. UGH, you can't tell what it's doing! Anyone trying to cowboy a spreadsheet "solution" to this kind of problem is wasting their time.

    I can only demand that people without proper education are not allowed to deliver multi-million business forcasts.

    I second that demand.

    --

    Friends don't help friends install M$ junk.

  16. Re:please everybody by DataCannibal · · Score: 2, Informative

    "Yet the usual database products are a disease in themselves. I think that relational databases are not the best for transaction processing. I prefer to use programming languages with built in database support."

    I hope that I never have to book tickets using a credit card in any systems that you've been within ten miles of.

    I also hope that your customers never trust you with business critical data.

    If you can't see why, think about how you enforce data integrity and security if a customer asks you (or more likely someone else) to write, say, a new web interface to your data?

    --
    No but, yeah but, no but...
  17. Re:please everybody by morelife · · Score: 4, Informative

    I use Excel constantly to do layouts for invoices, estimates, cards, presentation, etc. because of the precise sizing control. It looks professional, not cheesy at all, some of the stuff looks like it came from a printer.

    As for the database aspect, Excel is well suited for a database table layout, that's one of it's principal uses. Not a relational database, but just simple tables, it great at. There's no reason you couldn't have an address book with hundreds of entries and a dialog box front end made with macros. I did this in the past, worked great.

  18. Re:Spreadsheets are the worst sort of hack program by MrWa · · Score: 3, Informative
    Some minor points:

    - Cryptic names for fields
    How is this "stamped out in programming languages" other than convention and training? Is is forced on you?
    - No comments
    Excel can "comment" to some extent.
    - No obvious flow of control
    Some would argue that the freeform nature of a spreadsheet is what makes it so appealing.
    - No modularisation
    - No capability to test spreadsheet sub-components in isolation
    - No capability to do a diff to see what's changed between versions

    Excel can keep track of every change made - didn't we just have an article making fun of Microsoft for this feature in Word?!

    Not saying the spreadsheets, or Excel specifically, is the answer to everything.

  19. RTFA: *nothing* to do with using spreadsheet as DB by blorg · · Score: 2, Informative
    The first poster obviously didn't read the article. It has *nothing* to do with using spreadsheets as databases. It covers two things. Firstly, it covers the inability of spreadsheets to deal with probability in projections, and our tendency to optimistically adjust some figures up and others down until we get the result we want. The problem is that a spreadsheet can only show a single 'snapshot' state:

    It's not too hard to appreciate the difference between products that incorporate uncertainty and those that don't: On the one hand, you've got, "We predict a $1 million profit in the first year"; on the other, "The expected Year 1 profit is $1 million, but there's a 30 percent chance of losses for the first two years." These different statements will lead to quite different discussions.

    The second point made is simply how what-if tools such as spreadsheets lead users to think they are making better forecasts while in fact their forecasts do not improve.

  20. Re:please everybody by generic-man · · Score: 4, Informative

    PDFCreator gives you "print to PDF" capability in Windows. It's free software.

    --
    For more information, click here.
  21. Re:please everybody by archen · · Score: 2, Informative

    It's called an ODBC connection. Just create a connector to you db of choice (I use Posgresql all the time). Most spreadsheets (Excel and OO Calc) then allow you view the database table through the spreadsheet.

    I think the problem is that people use Excel as the database, not as a front end for a database (which it does okay). 90% of the time most people would be far better off using MS Access instead of the way they use Excel. The difference being that you have to make table fields in Access, whereas Excel you can just start typing away.

  22. Re:The underlying problem... by mst76 · · Score: 4, Informative
    Do you really know what the standard deviation is? For example, you know that most standard tools for calculating standard deviation it assume the data has a Gaussian distribution. But what if your data poisson distributed, or hypergeometric, or maxwell-boltzmann... [...]
    That is not correct. The standard deviation of a random variable is the square root of its variance. The variance is the squared expectation of the centralized variable (variable minus its mean). Calculating the variance of a random variable involves integrating (or summing) the probability density function p(x) times x^2. If it's a well known distribution, one would usually look it up in a book, or try to solve the sum or integral (by hand or with Maple or Mathematica). Note that the only thing you need to calculate the standard deviation is the distribution or density function, no actual data is involved.

    The things listed as mean and standard deviation in Excel are sample means and sample standard deviations. If you have a list of numbers, and you assume that they were drawn from some distribution with finite expectation and variance, you can calculate the sample mean (simple average), which is an estimate of the expectation. Then you substract this mean from all your numbers and take their squares. The average of that is an estimate of the variance of the distribution, take the square root for the standard deviation. The nice thing is that these estimates converge to the expectation and variance regardless of distribution. If you do make the additional assumption of Gaussianity, you can also say something about the rate of convergence and the distribution of the estimate. But the basic formulas in Excel and such for sample mean and variance work for any distribution that has a mean and variance.
  23. Re:please everybody by jtwJGuevara · · Score: 2, Informative
    The problem is that for everyone who is a non-techie and has never handled a large sized database or have been involved in extracting data from relational databases, they don't see the difference at all between a database and a spreadsheet. In the eyes of your average joe user, they both have rows and columns so they both must be used to store rows of information!

    A real life example that still gets on my nerves to this day is when a co-worker in different department who knew I was a "computer person" asked me about creating an excel database. After trying to correct her about 3 times to no avail, I just gave up and said "sorry, I don't really work with Microsoft Office" and ran away.

  24. Re:please everybody by cavemanf16 · · Score: 2, Informative

    I agree wholeheartedly, which is why I can't yet use OpenOffice's spreadsheet app, Calc, all that much. For one, it doesn't hold more than 32k rows, and for two it doesn't have a very good "PivotTable/PivotChart" tool right now. I use Excel a lot to get aggregated data from millions of rows down to a more manageable 10's of thousands. From there I can begin analyzing seasonality trends, control charts, histograms of that data, etc. No database I know of has this kind of stuff built right in so that it's easy to manipulate like it is in Excel.

    If OpenOffice could make their spreadsheet app as stellar in it's power and formattability (I know, not a real word) as its Writer app, then I'd ditch MS Office in an instant.

    I've been preaching this for a while now on /. and I know I should be doing it on OpenOffice.org too, but I don't have that kind of time. Beef up that DataPilot, damnit!!!

  25. Re:please everybody by ibennetch · · Score: 2, Informative

    There are other good suggestions here (a linux box running samba, works well and is pretty easy; and PDFCreator), but I'll throw in one more: Cute PDF Writer
    Works real well; installs as a virtual printer probably like your "print to pdf" program that you're shelling out money for...CutePDF is free and works like a charm; however it requires installing GhostScript on the PC.

  26. Re:please everybody by DataCannibal · · Score: 2, Informative

    I've no idea. What kinds of access is available for getting into these system and how are the data intgrity rules enforced ?

    For your average RDBMS there are lots e.g. for Oracle there are SQLPLUS, Pro*C, ODBC, JDBV , ADO, etc etc. If anyone can get at your data through one of these methods and start twiddling with the data or data structures then you need to make sure that they can't twiddle things that they're not supposed to twiddle with and, if they are allowed to twiddle, then they musn't be able to break things. If you rely on implementing data rules in your client or middleware layer then someday someone will come along and build another client that twiddles around where ever it likes.

    --
    No but, yeah but, no but...
  27. Re:please everybody by Anonymous Coward · · Score: 2, Informative

    It'd actually be pretty easy in this situation... just select-all and double the row-height.

  28. Re:please everybody by Daytona955i · · Score: 2, Informative

    You do know you can use perl (know for it's regular expressions) and very easily interface with just about any database?

    Oh and you can use html or even tk to make an interface. The other great thing about perl is that when your company finally figures out windows does _not_ have a lower TCO and you switch to linux you can still use perl.

    Excel is just another bloated app from a company that really knows bloat. I worked for one company that used excel to generate some reports and it took forever. (in comparison to a database with a good front end.)

    Excel is ok when the data set is small but if you have a lot of data or a lot of calculation, use something else.

  29. Re:please everybody by morelife · · Score: 2, Informative

    Use FileMakerPro or other relational database tool for this. It's easier than Excel to set up the layout, and it's much easier to integrate the database (say of employee names and office phone numbers) that lies behind the page.

    Doing a small address book in Excel, was cheaper than buying a copy of Filemaker Pro.

    It was a lot more interesting in Excel, because I wrote the macros and designed the dialog box layout for the front end. You didn't know you could do that in Excel -- DID you?:) Yeah, I made a nice dialog box with the necessary fields, about 25-30 of them, and that's where you'd do your queries, and enter new records too. Actually it served the purpose wonderfully, and I could easily add fields that did math based on contacts/calls etc, which I did, to nice effect. That was a few years ago...

    And then I went heavily into drug and alcohol abuse, lost all my clients and contacts, and didn't need a database after all.