Advanced Excel for Scientific Data Analysis
cgjherr writes "If the recent financial meltdown has left you wondering, 'When does exponential decay function stop?' then I have the book for you. Advanced Excel for Scientific Data Analysis is the kind of book that only comes along every twenty years. A tome so densely packed with scientific and mathematical formulas that it almost dares you to try and understand it all. A "For Dummies" book starts with a gentle introduction to the technology. This is more like a "for Mentats" book. It assumes that you know Excel very well. The first chapter alone will have you in awe as you see the author turn the lowly Excel into something that rivals Mathematica using VBA, brains, and a heaping helping of fortitude." Read on for the rest of Jack's review.
Advanced Excel for Scientific Data Analysis
author
Robert de Levie
pages
700
publisher
Oxford Press
rating
9
reviewer
Jack Herrington
ISBN
9780195370225
summary
Use Excel for high end scientific data analysis akin to Mathemetica
When I first opened this book my mouth just dropped. It had been years since I had seen a book typeset using LaTeX. But in an instant it made sense as the book is crammed packed with the kind of equations that would have been a nightmare to build with any other tools. Chapter after chapter has everything a really smart person needs to do curve fitting, statistical measures, differential equations, time-frequency analysis. But don't expect a play by play here. You will get the equations, set within a few dense paragraphs, with maybe a spreadsheet and a chart or two to show the results.
The first chapter concentrates on the getting the most out of Excel as a tool. All the chapters that follow dig into specific data analysis techniques. Chapters two, three and four are on least squares. Chapter five and six cover the analysis in the time domain including fourier transforms. Chapter seven covers differential equations. Chapter eight returns to Excel by digging in deeper into macros. Which leads into chapter nine, where we dig deeper into basic mathematical operations. Chapter ten covers matrix operations. And chapter eleven wraps it all up by giving you some spreadsheet best practices.
In University style there are also some exercises that you can do along the way if you want to tweak your brain pan a little more. To amuse myself I tried a few and I believe the book would have assessed my attempts 'wanting' if it had a voice to tell me.
Where most books like this would have several authors this book has just one; Roberte de Levie. This means that the tone, style and quality of the book is consistent throughout. A fact that you will come to appreciate as the book wades in ever increasingly deep data analysis concepts as the chapters roll on.
Though I would have preferred the book to have code samples in C#, I understand that the language of Excel is VBA and I guess I have to live with that. Thankfully VBA has come a long way and if you so inclined it would likely be easy to translate the code into C#, Java, or whatever else you like.
The fact that one person wrote the book left me wondering, "Who is this guy?" In my minds eye I kinda of figured he would look like one of those pulsing brain guys from Star Trek. Turns out he is a professor at Bowdoin College. And his fields of study include ionic equilibria, electrochemical kinetics, electrochemical oscillators, stochastic processes, and a whole lot more stuff that almost seems made up to sound impressive.
When this book isn't serving as an amazing reference for both Excel, scientific problem solving, or just insane equations it serves other purposes as well. It's a handy portable IQ test, as the count of pages you can grind through in one sitting, plus 90, is roughly your intelligence quotient. And if you fail at that you can always put a copy of the book, along with the Orange Bible, under your pillow and try to osmose your way to becoming the Kwisatz Haderach.
In all seriousness, this is a great book. It represents the kind of in-depth work and research we used to see in books that came out twenty years ago. Robert is to be applauded for his work. This is an excellent resource for anyone looking to do scientific data analysis but who was unaware of the powerful capabilities that Excel provides that is likely waiting just one Startup menu click away.
The book is not without fault. I would have preferred that it had been in color, or at least have one color section to show some of the more impressive visualizations that I'm sure would look great in color. In addition the index is silly short for a book that clocks in at 700 pages. But those are only minor quibbles for what is all-in-all an amazing piece of work.
You can purchase Advanced Excel for Scientific Data Analysis from amazon.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.
The first chapter concentrates on the getting the most out of Excel as a tool. All the chapters that follow dig into specific data analysis techniques. Chapters two, three and four are on least squares. Chapter five and six cover the analysis in the time domain including fourier transforms. Chapter seven covers differential equations. Chapter eight returns to Excel by digging in deeper into macros. Which leads into chapter nine, where we dig deeper into basic mathematical operations. Chapter ten covers matrix operations. And chapter eleven wraps it all up by giving you some spreadsheet best practices.
In University style there are also some exercises that you can do along the way if you want to tweak your brain pan a little more. To amuse myself I tried a few and I believe the book would have assessed my attempts 'wanting' if it had a voice to tell me.
Where most books like this would have several authors this book has just one; Roberte de Levie. This means that the tone, style and quality of the book is consistent throughout. A fact that you will come to appreciate as the book wades in ever increasingly deep data analysis concepts as the chapters roll on.
Though I would have preferred the book to have code samples in C#, I understand that the language of Excel is VBA and I guess I have to live with that. Thankfully VBA has come a long way and if you so inclined it would likely be easy to translate the code into C#, Java, or whatever else you like.
The fact that one person wrote the book left me wondering, "Who is this guy?" In my minds eye I kinda of figured he would look like one of those pulsing brain guys from Star Trek. Turns out he is a professor at Bowdoin College. And his fields of study include ionic equilibria, electrochemical kinetics, electrochemical oscillators, stochastic processes, and a whole lot more stuff that almost seems made up to sound impressive.
When this book isn't serving as an amazing reference for both Excel, scientific problem solving, or just insane equations it serves other purposes as well. It's a handy portable IQ test, as the count of pages you can grind through in one sitting, plus 90, is roughly your intelligence quotient. And if you fail at that you can always put a copy of the book, along with the Orange Bible, under your pillow and try to osmose your way to becoming the Kwisatz Haderach.
In all seriousness, this is a great book. It represents the kind of in-depth work and research we used to see in books that came out twenty years ago. Robert is to be applauded for his work. This is an excellent resource for anyone looking to do scientific data analysis but who was unaware of the powerful capabilities that Excel provides that is likely waiting just one Startup menu click away.
The book is not without fault. I would have preferred that it had been in color, or at least have one color section to show some of the more impressive visualizations that I'm sure would look great in color. In addition the index is silly short for a book that clocks in at 700 pages. But those are only minor quibbles for what is all-in-all an amazing piece of work.
You can purchase Advanced Excel for Scientific Data Analysis from amazon.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.
Don't do it!
The whole financial mess was due to the use of Excel? No wander!
If the recent financial meltdown has left you wondering, 'When does exponential decay function stop?' then I have the book for you. Advanced Excel for Scientific Data Analysis
So THAT's why we had a financial meltdown. All of those investment banks were doing their books and analysis with Excel 2007.
Just disrupt the deflector shield with a tachyon burst.
There's something hard to reconcile about the reviewer's obvious awe and the fact that the book was written by someone who thinks doing meaningful scientific data analysis in Excel is a good idea.
...but will it blend?
Talk about the wrong tool for the job. If you need to do any sort of serious data analysis, use R, not Excel.
Give me Classic Slashdot or give me death!
"The first chapter alone will have you in awe as you see the author turn the lowly Excel into something that rivals Mathematica using VBA, brains, and a heaping helping of fortitude."
Then why not just use Mathematica?
But it seems you have not seen Word 2007 equation editor. It's close, in both capabilities, and output quality, to LaTeX. Too bad the rest of Word sucks ass.
>> that would have been a nightmare to build with any other tools
Someone should tell this guy about SAGE http://www.sagemath.org/
I would urge anyone attempting significant data analysis to try a dedicated analysis software package such as ROOT. ROOT has much more support for data trees, histograms, functions, fitting, etc., and ROOT now also has a Python interface.
You see, there is a fundamental problem in science and the problem can be summarized as this: how do you get the right results in order to optimize the grants that you receive. Spreadsheets are ideal for this purpose for two reasons. First of all, they are designed to handle financial data. This is great because financial data are what grants are all about. For example: will result X allow for a conference in Hawaii or California this year.
The other big reason to use spreadsheets is that they make data more maluable. Normal scientific tools make it difficult to micromanage the data that you acquire, partially because the people who produce that software have this mistaken notion that data has to be managed in a consistent way. So you're usually stuck doing the same thing to an entire dataset, and it's even difficult to treat different datasets in different way. But spreadsheets expose all of that data, so it is easy to tweak an observation here and a variable there to get the desired result to maximize your grant.
So you see, spreadsheets are a tremendously valueable tool for scientists. It is the best tool for the job.
Does R have hooks for Perl or some other scripting language?
I have test equipment that spews data and I need to load it via a script. Excel is quite suited for this.
So? What's so special about that? You can turn C, Fortran, or even assembly language into something that rivals Mathematica using brains and a heaping helping of fortitude. This is arguably a better deal, since you don't need the VBA.
--MarkusQ
Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics
------ merkur (4T] berlin . c0m
...everything looks like a snowglobe!
Hardcore data analysis in Excel is almost always a bad idea. You can almost always find a way to do it in excel, and you can almost always find a way to do it better, faster, and cheaper somewhere else.
R, MatLab, Mathemateica, Python/Numpy, SigmaPlot, and any number of old, well written, debugged and vetted numerical libraries written in C or Fortran. I've used all of these at various times to solve something that a co-worker couldn't figure out how to do in Excel.
I fit quick linear regressions in Excel. For *anything* else, there is a better choice.
-V-
Who can decide a priori? Nobody.
-Sartre
As a graduate student in physics, I have never seen a serious researcher use excel for data analysis. Nor for that matter, is it common to see a scientist using windows for the OS--all linux and mac OS. This is akin to writing a book about publishing scientific papers with office. Instead, learn LaTex... The only group of people who use excel for large data analysis are financial types and MBAs. Need I remind you how that turned out?
for scientific data analysis.
I know it is popular and many science and engineering faculty lazily encourage their graduate students to use it. However, something like matlab beats the crap out of excel any day. Spreadsheets tend to obfuscate relationships between data, require a lot more clicking (read human intervention) and waste time that could be spent thinking about the data, and are singularly unsuited for analysis of similar sets of data (a situation any scientist faces when he has to do a series of experiments).
Matlab might take sometime to initially write the scripts, but it is so powerful and extensible that no one in their right mind would want to use excel. If you are a slave to spreadsheets, get yourself a copy of Microcal Origin or Labplot.
Excel is especially unsuited to the task of preparing figures for scientific publications. The default formatting is at once wrong for the task and hard to change. Once you set your preferences in matlab (easy to do), you are set for life.
In my experience, excel is also rarely used for anything serious outside of US. Maybe its an indictment of how lazy, slow witted and easily misled our pool of talent is becoming.
When I worked in the semiconductor industry in the late 90's, Excel nearly cost us several hundred grand. It had "helpfully" autocorrected a code in the documentation for a mask used in one of our clock buffer chip products. Had the engineers not caught this mistake in the printout, the fab of the chip would have been botched. The engineers were mad as I recall because they would change the code and Excel would change it back. If you can't prove what your tool is doing, you don't get to use it is what they taught me in engineering school.
Is perfectly safe and trusty for that kind of work. Thats why we are using it here at the Large Hadron Co
SPSS has now become the standard data analysis package for quantitative studies in social sciences. It's very crappy software, and it wouldn't take a whole lot of augmentation to get Excel do what SPSS does.
The problem is that social scientists don't want to mess with the internals too much, and SPSS made for them a point and click interface - in effect, they out-Microsofted Microsoft. They charge an insulting $1500/copy and completely dominate the universities, so they're making good money.
They seriously need some competition.
Serious researchers do not use excel. Period. Ask scientists in any national laboratory. Excel is a toy designed for simple accounting talks. It is slow, it is limited but, most importantly, it makes error prone since it refers to variables by their position on the spreadsheet, not their name or meaning or role in the workflow of the computation. I guess it is an ok tool to teach undergrads who will never go into sciences.
It's OK for simple stuff, but trying to do something like implementing a loop in a spreadsheet.
This is one reason the VB scripting turns out to be highly useful. But that said... for half the things loops might be useful for in a normal context, they're wrong in a spreadsheet. Iterating over a set of data isn't done with loops, it's done with applying formulas over a range of cells. And if you turn on iteration for the spreadsheet, it *is* possible to build flow-controlling state machines without using the scripting engine. Not particularly natural for most imperative programmers, but definitely possible.
So I don't think loops are the issue. My understanding is that the biggest problem with using Excel for detailed number crunching is in how they handle some precision/float issues.
Tweet, tweet.
Advanced Excel for Scientific Data Analysis is the kind of book that only comes along every twenty years
Excel was introduced in 1990. So, assuming that it was introduced with a book just like this one in 1990, that would be "a kind of book that comes along every eighteen years". I'm certain the poster would have realized this had he or she applied what she'd read in the first edition to do the proper calculation.
..."When does exponential decay function stop?" scientific data analysis is definitely not for you. How about "How is babby formed?"
all these years i referred to ms excel as "HEXEDcell"...
Others referred to ms office as ms orifice.
Now, if OO.o fails to at least equal mso, and if OO.o crashes Wall Street 2.0, would we call OO.o "OpenOrifice", in the name of "fairness"?
Previously: "Linux... Toward the Sunrise..." Now: "Linux... Toward the-- No, now, part of Every Sunrise"
So what you are saying is that Excel should be banned from the Congress?
-dZ.
Carol vs. Ghost
For what? The three items you discuss do different things.
Presumably nobody uses LaTeX for making charts or for performing calculations (the latter of which is really what's being addressed here).
In particular, how much is applicable to the Mac versions? 2008 dropped VBA support, I believe, which sounds like it could wipe out much of the applicability of this book.
What in the world are you talking about? :)
LaTeX is a markup language. You can express math with it, but it doesn't do anything for you in terms of analysis.
Excel is good for small data sets and quick looks at stuff - but painful to develop in.
Mathematica requires college-level calculus and linear algebra... not PhD stuff by any stretch.
Anyway, you left out Matlab - which is pretty awesome. Depending on what you are doing, there is also R, Maple, Minitab, MathCAD, yada, yada, yada. Lately I've been doing stuff in Python... SAGE is pretty nifty, and the NumPy/SciPy stuff is coming along well (it is included in SAGE).
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
Its clear that the reviewer is not qualified to review this text. First, if they were familiar with scientific data analysis, they would not use Mathematica for their ridiculous comparison. Mathematica is also the wrong tool for the job. At least it does math correctly, something Excel cannot claim.
You haven't seen a book typeset in LaTeX recently? What scientific computing books do you read that allow you to avoid LaTeX for years?
Excel is not the right tool for the job. If you are going to put in the time to learn some new math, learning a better tool along the way makes sense.
Get a different book, and a different software package.
When I was a freshman in engineering school, my intro to engineering class required us to purchase a book similar to this. We were given two class periods to work with Excel, supervised by a TA. (it was considered a lab) I remember the assignment involved proving that sin^2+cos^2=1.
If you couldn't figure out Excel within those two class periods, it was recommended that you switched your major to business administration. The business administration school had a semester long class devoted to learning Excel.
One of our competitors trademarked the term "hypothesis". From now on, we will call them "boneheaded ideas".
You cannot be serious ..
"Excel 2007, like its predecessors, fails a standard set of intermediate-level accuracy tests in three areas: statistical distributions, random number generation, and estimation"
davecb5620@gmail.com
Python for scientific analysis,
Python is the solution I recommend for everyone who looks for tips on advanced Excel uses. Excel is OK if you just want some quick and dirty solution for a small problem, but if you have to go to the trouble of reading a book, Excel is clearly not the best solution.
For scientists and engineers who need something more than what Excel (and possibly Matlab) offers, I recommend starting with either A Byte of Python or Dive Into Python.
Presumably nobody uses LaTeX for making charts or for performing calculations (the latter of which is really what's being addressed here).
The PSTricks package can make some really beautiful charts in LaTeX documents.
There's a book called Tex Unbound that gives lots of other ways to make charts in TeX/LaTeX that are way better than Excel output (or most other graphing packages IMO).
don't forget Octave! (it's more-or-less a FOSS Matlab clone, and follows more closely to Matlab syntax than SciLab)
Some of us have to crunch numbers every day and it's interesting to consider Excel as a tool for this purpose. But to then have a reviewer talk about things like "insane equations" makes it clear that the reviewer sees "equations" as some kind of esoteric icon associated with peculiar people with "pulsing brains" rather than the bread and butter of the jobs of thousands, if not millions of people the world over. How can /. post a review by such a clearly ignorant reviewer? It verges on embarassing to read. What next? A review of a book on functional programming by someone whose only experience with computers is programming the VCR?
Doesn't it make you feel good to know that our freedoms are protected by politicans, lawyers and journalists.
I need infinite loops, you insensitive clod!
Another book in the same line is E. Joseph Billo's "Excel for Scientists and Engineers," Wiley-Interscience, 2007 ISBN 978-0-471-38734-3, including CD.
You may or may not agree with using Excel, but if you do, this book will help with roots of real and imaginary equations, ordinary and partial differential equations, matrices, and statistics.
Sometimes you just don't have the luxury of using Matlab, Spotfire, etc.
Live in the Future; It's Just Starting Now!
openorifice... if only we could come up with a new graphic for their logo...
I thumbed through the book but not impressed.
The author probably has used excel as best as any one can in doing the task he intends. But for most of people, the effort to acquire the skill by reading the book is not well-spent, since one can probably learn other tools which really intended for scientific analysis.
For statistic packages, R probably is much better, though I would prefer SAS. Try a huge data set (200MB, and put it in excel, your system will crawl before excel crashes, but in SAS, it will be really fast, and provide much more statistics. How much faith do you have in Excel's statistical function anyway?
I don't.
As for differential Equations, I would try matlab, if I really doing NA, I will choose netlib's packge anytime over Excel.
One can know Excel really well, and bend it to do all kinds of job, but in the end, it can only do that much. you probably can use bash to write a trading system, but why bother?
So according to the book, here's the recipe:
1. Write your data analysis software in VBA
2. Use the Excel cells, buttons, bells, and whistles for the I/O
3. Profit!
The math is actually irrelevant. Any computational mathematics book that respects itself uses pseudocode for the examples. If it is possible to program in one programming language it should be possible in any other language too.
I tried it too, although I wasn't nearly as crazy to do any numerical computations in VBA. I wrote the program in pure good old Fortran 95, wrote some VBA scripts to read the Fortran ASCII output, and set everything else up in Excel that my boss liked (I'm a chemical engineer). There you go: it's fast to program, fast to run, easy to maintain.
I would like to see anyone try to keep up with the Microsoft paraphernalia between VBA-Excel versions, if the whole thing is written in VBA. Not to mention the problems that I had with the locale when I tried to run the VBA code in a computer running a german version of Excel that had decided that the decimal point is there as a thousands separator and the comma was used as a decimal point. The setting for it in Excel was nowhere to be seen (I still haven't really figured it out. The central Windows setting seemed to have no influence on it, although I suppose it should) and 1.234 was then 1234 and 1.2E-02 was a character string. Oh, the pain... Thankfully, my *basic* Fortran part absolutely did not care, it just worked, and only the I/O needed to be reviewed.
Try to send the program to a customer without knowing what kind of Excel version he is running. We had to go as far as Office 97 just to be sure, and there was still the problem with the locale. After a year, the I/O was useless, but who cares? It was only 1% of the code.
I would still use Excel, but for nothing other than the most trivial tasks. There are wonderful libraries out there that work with Fortran and produce very nice graphs on the fly.
Right... why can't 1 app have all that functionality? Seems like a non-brainer.
Sort of the idea behind the office "suite", but for mathematics. Too profound a concept?
In my experience, excel is also rarely used for anything serious outside of US. Maybe its an indictment of how lazy, slow witted and easily misled our pool of talent is becoming.
I recently spent some time in Japan in a design group for a large Japanese company. I was showed the massive spreadsheet used to calculate power plant capacity and consumption. I almost cried. The whole sheet was based upon one large circular reference. Nobody understood it and it referenced steam tables through a plugin but didn't show the output of these calculations.
The US is not alone in using Excel for things it wasn't meant to do. Now please excuse me while I run a couple monte carlo simulations in Excel.
Even those who arrange and design shrubberies are under considerable economic stress at this period in history.
Feeding the trolls is one thing - you're holding out a ham and calling "Here boy!".
This comment is for entertainment purposes only. Any similarity to real insight or information is purely coincidental.
I hate to admit it, but I occasionally prototype small scientific models in Excel before adding them to my (C++) program, which is kind of an integrated meta model. It is very nice to get instant feedback on changes to parameter values, to get a feel for how the model reacts.
Bounded loops are really not a problem, you just have to make all the steps explicit (each step will be a row in a sheet). It can even be nice to see all the intermediate values.
Office Home & Student's license is not limited to student activities only. If we want to talk about a student's limited license, Office Ultimate is $50 and becomes a full license upon graduation for me (http://www.umflint.edu/its/services/sales/software.htm). The Mathematica License limits you to student learning activities only and does not allow for actual research.
Look at all those posts saying "Excel is not the right tool for this" or "When all you have is a hammer...". The point was not grokked by those folks.
I'll lay it out for you, plain and simple:
This book is like installing a linux kernel onto a wristwatch.
We should be marvelling at the feat, not lambasting a tool that was "hacked" to do so much more than it is normally used for. If you can't appreciate that kind of work, maybe you should just stick to appreciating fine arts.
/.'s 10 Millionth
Excel sucks for numerical routines. If you Google, you'll find papers about it.
Neither the numerics community nor the stats community trusts it. Even financial models are done in C++, Matlab or Maple, not Excel.
Excel is a good tool when you're into accounting and need 2 digits after the decimal point.
Stop promoting this tool as if it was reliable software. It's criminal. Even worse, medical researchers might end up believing it's software you can trust.
Main difference between the BSD license and the GPL license: one is from California and the other is from Massachusetts
...and your hard work will last exactly as long as it takes Microsoft's marketing department to decide to tweak VBA for some strategic purpose. They'll take it out of Excel, or only put it into Excel Plus, or change all the functions to work only with .NET or whatever their newest strategic push is for. Or it will mysteriously happen that the current version of VBA breaks under Windows 7 and can only be fixed by upgrading Excel, which will then be segmented into seven flavors (Excel Home, Excel Home Office, Excel Small Business, Excel Enterprise), etc. with VBA being included only in Excel Premium Pro Deluxe for Enterprises for $895.00.
Write it in VBA and it will last as long as something written in Exidy Sorceror BASIC or a TI-99A program or
Write it in VBA and your math results will depend on programmers who in all likelihood have never studied numerical analysis, and whose work isn't being carefully reviewed or QA-ed numerical accuracy because Excel isn't intended for that market. (On the other hand, cents roundoff and internal rate of return will probably be very reliable).
"How to Do Nothing," kids activities, back in print!
I've been using spreadsheets for scientific calculations since my freshman physics class, when we were using Lotus 1-2-3. The first time that I saw Excel in operation (1993?),I became a convert.
I still use Excel quite a lot for everything from simple data analysis (e.g. ANOVA [but not the built-in ANOVA function - that's lame], regression analysis, &c.), data aggregation, &c. It works fine for just about any application that I throw at it -- as far as number crunching goes. Worse case scenario, I have to write a bit of VBA to handle a special function. The built in data base tools (e.g. Match(), Index(), VLOOKUP(), &c.) make it easy to model multiple scenarios, as well (and link them to e.g. a list-box of options with a couple of mouse clicks). The array functions make it easy to perform parallel operations on large related sets of data with minimum work.
Where Excel fails (and does so miserably, in my opinion) is in it's display capabilities. Have you ever tried to plot a response surface in Excel? Yuck! - it's UGLY. I have, on more than one occasion, used Excel to aggregate/crunch a large data set, then exported the results into MiniTab for report generation.
Don't knock it till you've tried it.
Oh - and for you open source fanatics, out there (I'm a fan of open source, and contribute a bit, myself) - Open Office Calc is still *far* behind Excel in this type of application. Getting better, but still not there.
MATLAB is great at advanced stuff, but ask it to do something simple, and it will be slower than a tortoise knee deep in syrup.
The paradox that always gets me with larger simulations is whether to spend 1 day writing it in MATLAB, and 6 days running it -- or spending 6 days coding it in C, and 1 day running it?
GAAH! MY PRINTER IS ON FIRE!!! PUT IT OUT! PUT IT OUT!
Take a look at the compile option in SAGE, which uses cython. If your stuff is pretty simple, it might be worth a shot.
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
Quick,somebody cue the ASCII Goatse guy!
ACs don't waste your time replying, your posts are never seen by me.
While I agree, sometimes being an engineer or analyst means working with one or two or six hands tied behind your back because ... IT-imposed user-permissions.
That's was why I had to do data analysis on the payroll of over 100k employees in VBA.
Large organisations don't just allow any new software and this was task was to be performed for only 12 months. I had to use what was available (which was Excel & Access). Books like this could be very useful for many people who have to do serious work in an environment when they don't have a choice what tool to use.
If this were really happening, what would you think?
The real problem with doing real scientific or research work with excel are the file sizes. Excel can handle more and more data each release, but my data files are gigs in size and way too big for excel.
1. Lipstick on a pig
2. Silk purse, sows ear
3. With sufficient thrust they can be made to fly
4. Polished Turds
Thanks but no thanks. I'll stick with Maple, Mathematica and MATLAB.
As the island of our knowledge grows, so does the shore of our ignorance.
When it comes to MATLAB, you have to remember that it's heavily optimized for vector and matrix math. For-loops are quite slow, and nested loops are downright painful... Always try to vectorize your code when you can.
The view was horrible and the smell was even worse; Julie severely regretted becoming a proctologist.
subject say it all.
No, it doesn't. My wife uses it in one of the 200-level classes she teaches, and she says the students go for it.
Luckily, the school has some kind of deal to get reasonably-priced licenses for the students. Anybody who's planning a career in math or something that requires a lot of math, and learns how to use Mathematica in her/his second year, is going to be way ahead of the game.
You are welcome on my lawn.
My math professor (at UNCC) used to use Excel all the time for complex stuff. Of course, our study was Numerical Analysis, so he also used to for fun have high school students program advanced stuff on the TI-82. One kid did some whiz-bang thing in like 4K of memory.
But he used to have really advanced, really cool stuff in Excel. For numerical analysis, you don't necessarily need any particular tool, the study is in how accurate you can make the algorithm on X hardware / platform.
Hell, I did some complex numerical analysis in Maple, for god's sake.
>> point and click system will always be inferior to
>> LaTeX when it comes to equations
You can use LaTeX-like syntax now to enter equations.
> It had been years since I had seen a book typeset using LaTeX.
The publishing industry (including my company) typesets books using LaTeX all the time. The reason you don't notice it (apart from the superior quality) is that it does its job of typesetting very well.
If this book has been typeset using LaTeX then I'm a Dutchman, or something has gone very wrong (and I'd like the author to contact me to let me know what).
Perhaps he was given faulty fonts, perhaps he was using a badly-written publisher's style, or perhaps he -- or his editor -- spent a long time making it look as bad as possible. Maybe OUP had it completely re-typeset in some other system without telling him. There are at least a dozen typographic faults in one paragraph alone, from unnecessary hyphenation to excessive word-spacing to bad math spacing, and LaTeX simply doesn't make those types of mistake unless you work very hard to introduce them manually.
As a test I screenshot a random paragraph that I viewed in Amazon's "Look Inside" feature, and then retyped it in LaTeX and typeset it (PDF).
As I don't have the book (and wouldn't understand it anyway :-) I'd be interested to know where the information came from that it was typeset with LaTeX; and if it really was done in LaTeX, I'd love to know WTF kind of style files, fonts, and preamble were used.
This has nothing to do with a book review.
And I'm pretty sure it doesn't have a BibTeX substitute. I use LaTeX myself, but I see a lot of researchers using Word and output looks great. You can now export the document into PDF directly from Word 2007.
I prefer LaTeX myself because of better bibliorgaphy support, cross-references and because it makes it easy to globally redefine the way the document looks by tweaking a few styles. Styles and xrefs have traditionally been a pain in Word and than hasn't changed with 2007.
Maybe some of the people yelling about how Excel is the wrong tool can give some advice for my scientific data analysis and visualization needs.
I have simulations (written in C++ and Python) that spit out tab delimited data files. I then need to analyze that data, doing things like linear regression on subsets of the data and calculations to transform the raw data into something else for plotting.
I have a Mac (with Windows XP in Parallels), I am not a student, and I don't have much budget ($500) for software. Currently I use a Mac program called Plot which is a little buggy and incomplete but has some nice plotting abilities. When I need a spreadsheet I use Apple's Numbers, but that seems sorely limited in abilities. What's a better tool for this job?
Why must we jump through hoops in order to utilize the wrong tool for the job. It's like using a dull knife to cut through a tree branch. You can do it, but why make it so hard.
Kirix Strata is by far, hands down, the most intuitive and powerful data analysis tool I've ever used. Please, don't take my word for it. Try it yourself and see how fast you are up and running using the right tool for the job.
http://kirix.com
matlab has plugins for excel so you can seamlessly move matrices back and forth between them. I guess this is a good book if you cant afford matlab.. but.. i dunno who would be using matlab and not even able to get a student license. Seems kinda pointless.
---------
No matter how thin you slice it, its still baloney.
That's a no-brainer. One day coding, 6 days running because you can be doing other things while it's running.
Of course, but for any given problem, there is only so far you can vectorize it.
Of all the places to have a HUGE bottleneck, iteration has to be one of the more painful ones. The culprit is probably their arbitrary precision numbers, but there has to be some way of optimizing at least at the places where there's limited range integer iteration.
GAAH! MY PRINTER IS ON FIRE!!! PUT IT OUT! PUT IT OUT!
"..Use Excel for high end scientific data analysis akin to Mathemetica.."
Damn... April Fools came earlier this year... Had the scientists used Excel, they would know the global warming is even stronger than suspected...
From my experience Excel is not a tool for any data analysis other than primary school science project or MBAs "data analysis" ..
Even doing simple stuff I got quite a few unsuspected results coming from Excel "knowing better".
And someone wrote 700+ pages about it - hard to believe...
OTOH, once I started doing real work, I dropped excel as inexact and slow. It was far faster, and cheaper, to write C++ routines and run the output through gnuplot. There was no workflow time saving using MS Office as MS Word was an order magnitude slower at math than Latex, and putting a picture in can be done in one line of code rather copy and pastes that are not always reliable.
I do think that this type of book falls under the heading of "when all you know how to use is a hammer, everything looks like a nail". MS office is everywhere, so users tend to redefine problems in such a way that they fit the Office philosophy. This is good and bad. For teaching, it might be bad because students only learn one tool to solve the problem, a tool that may not be nearly as good as an HP50 or mathematica. But if neither of these are available, something unlikely at the four year university level, then Excel is better than nothing.
"She's a scientist and a lesbian. She's not going to let it slide." Orphan Black
Yeah, having slow iterations is quite lame.
Regarding the precision, the default is standard IEEE double precision floating point. (You can of course set the variables to int or single precision as well.) It may be well worth the time coding parts of the programs in C/C++/Fortran (using a Mex gateway), which is actually quite easy, or even just autogenerate the C code from MATLAB and compile that. I've had some luck with that in the past, at least.
I hope I didn't misunderstand you, but if you're depending on arbitrary precision, or VPA, you're in for a world of sloooow pain. In that case, I really don't think MATLAB would be the best tool for the job.
The view was horrible and the smell was even worse; Julie severely regretted becoming a proctologist.
The choppy spacing in the equations looks a lot like what you get out of the MS Word equation editor. Personally, I think this entire book review is a clever troll written for our amusement.
I have been using pro Fit (www.quansoft.com) for over 15 years. It's the BEST plotting/data analysis program around and its price is very reasonable. Highly recommended.
Doing scientific data analysis is Excel is intellectual masturbation. I pithy some of my coworkers that do it just because they don't know anything better. Their plots invariably look horrible. (I did the same mistake in the past, writing my PhD thesis in Word....). Once you discover better tools, there's not going back. Leave Excel to the accountants.
Sorry, but using "Excel" and "advanced scientific data analysis" or even just "scientific data analysis" in one sentence (or paragraph) makes me laught. Sorry, but this is a no no no...
"If the recent financial meltdown has left you wondering..." .... no, it does not anymore wonder me.
I was about to say that - you beat me to it. MATLAB and OCTAVE are extremely powerful. Once you get the hang of their close-enough-to-C-to-confuse-you, one based array indexing, and other things, you will be able to compute complex mathematical functions within a few lines and plot them.
Octave is essentially MATLAB without the symbolic toolbox. It hurts sometimes, but when I'm using linux and need to compute something fast, it works really well!
LaTeX is turing complete, so you can do the analysis in it. Doing so would also, as the AC noted, be a chore, but if you're going to use the wrong tool for the job you may as well go whole hog.
Your assertion that Congress is smart enough to use Excel is ludicrous.
They use PowerPoint slides with less than 6 words per slide for communication with Congresscritters,
and a spam macro driving Word to produce the legislation.
As far as the book goes, it doesn't sound like it delves much into ADO, which is where there is some really interesting action. The ADOR.Recordset object really gives you a flat linked list to use.
With the Scripting.Dictionary, you come close to having a useful set of data structures to work with.
Lots of gotchas, but, when MS Office is all you have to work with, you can still get a lot done, as long as you remember what Marcellus Wallace said about pride.
Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
All books that I know that have the word "advanced" in the title are stupid. E.g.: Advanced Calculus. On the other hand, books that claim to be introductory can be arbitrarily complicated. Example: "An Introduction to the Mathematical Theory of the Navier-Stokes Equations: Volume 2: Nonlinear Steady Problems" PS: the exponential decay function never stops.
That's easy. Spend one day writing code. While your computer heats up for the next 6 days, enjoy the world :)
Try Sweave - it is actually a way to let your latex document become "living"
http://www.stat.umn.edu/~charlie/Sweave/
Last time I looked, Excel rounded down at the limits of whatever precision you specify or Excel maxes out at.
If you're doing large-scale analysis round-even is the accepted rounding for scientific analysis, and I think also for financials.
Why? because round-up will cause aggregates to become higher, whereas round-even averages out the rounding errors.
The place where Excel is excellent is in the middle of the engineering cycle. I do aeroheating and thermal analysis using codes written in FORTRAN and C++, along with CFD. Ultimately the final product is tabular input files for another code along with a .pdf with figures and explanation of methodology, results, etc.
... each data file is treated as a data stream, and your UI is a grid, and each operation is treated like a filter, which is dragged and dropped on the grid. Filters are then interconnected between data streams to manage the data. You can cross-connect streams of data of any kind by any variable (time, position, etc.). The configurations can be saved and re-used. It's very nifty once you get the hang of it. They have filters to deal with flat telemetry data, video data, etc. You can write your own filters too...
:)
However in between setting up the problem and shipping out the final product - IE, when interacting with the customer to refine the product to their liking, dotting all the i's and crossing all the t's - we use Excel a lot. We input the output from our codes into Excel and whip up plots for all the data (we had someone whip up a plot tool in VBA, point it at a directory it will open every data file, plot everything of interest in separate files, crossplot certain data points across multiple files in a single file, formatted in appropriate font point, colors, etc. and auto-generate a PowerPoint presentation of the data in sequence. Nifty.) Got the data sets in presentation format in 5 minutes. Go to meeting, present interim results. Customer often wants to see either iterpolated data our outliers or something that requires real-time data manipulation. In Excel in a matter of seconds we can show the underlying data or provide a manipulation, etc.
This is where Excel excels. I can go to any meeting room with Office and do this. I don't need to be able to hit a license server for (expensive_software_suite_X). I don't need to pre-load every presentation room across the center with (open_source_package_X).
I never liked Matlab or Octave. I tried. Honestly though the best data manipulation software I've used (for tabular data) is Pioneer, but you have to be a DoD customer to use it
Most of my work is CFD though and then you need the big guns like Tecplot and Ensight
Perhaps the author used something like LyX and managed to generate really botched up LaTex...
Excel is a very powerful tool. However it seems a bit unusual than such an avid Office fan would type a book in LaTex...
Word can do much of what LaTex does. However LaTex-like abilities (cross-referencing, standarized formatting, etc) require , require very careful disipline -- something most Word users are not accustomed. In word, one CAN use styles the same way one uses tags in LaTex.
Google "excel probability flaws" or other such key words. you will find documents like this:
http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6V8V-4S1S6FC-5&_user=10&_rdoc=1&_fmt=&_orig=search&_sort=d&view=c&_version=1&_urlVersion=0&_userid=10&md5=c20658d896f2aa649df41e54fc1219b7
putting the 'B' in LGBTQ+
How about some ASCII art:
=EO3= =EO3= =E.3= =Eo3=
and animate it! There could be a live ballet version too.
MATLAB runs just fine (native) on Linux, you know. I haven't tried Octave on Windows, though.
Under capitalism man exploits man. Under communism it's the other way around.
Obi Wan: That's no ham. It's a...
... a spreadsheet software which does NOT do computation mistakes.
bad typography seems to be a religion.
Fighting both with tradition & with vendor marketing...
( ragged-right's more readable than justified - and has been for 500 years,
space-endash-space is more matching the "this is an interjection" concept than nospace-longdash-nospace - visually, that is,
can't get good typeface families with expert chars, etc etc etc. )
BTW, if you get the chance, try Scrivener, maybe on a Frankentosh/Hackentosh or http://www.efi-x.com/ machine - best good-writing prog around http://www.literatureandlatte.com/ - if you prefer your publications be well writ.. :)
Both Corkboard & Infinite-Paper, among other things...
http://www.writerscafe.co.uk/ is *sorta* similar: its StoryLines gives one the ease of arranging the bits-of-story into several ( visually ) parallel streams of story, so one can SEE what one's work's structure be -shrug- Writer's Cafe runs on Linux/Windoze/Mac, unlike Scrivener...
I'd outright ban wordprocessors from author's submitted copy, if necessary, to get fully-worked work.
Cheers
BTW, the kerning in your PDF sucks: the word AVERAGE is particularly bad, and the word "weight" also shows bad optical distribution.
Try the font "Utopia" included with X in SuSE: it seems to be through and through quality. Interesting how clear the LaTeX is, in comparison, though, I'd never seen a a/b before, and am probably going to have to learn it, now. I owe you one. (:
Always Persue Excellence!
Although I've no idea what your point was, as each of the programs you mention serve a completely different purpose...
I would recommend taking a look at MathCAD. It enables a whole host of pretty advanced analyses, while giving really nice quality output. Has saved me a massive amount of time in the past.
If you want to keep your job that is. I have worked with some people who have experience in the financial sector and they tell me that Excel is used a LOT in this industry. If you don't know your way around VBA then you are at a competitive disadvantage.
I can just imagine being sat in front of a pointy headed boss. He likes the ideas behind my mathematical model, we wants to use it, he may well give me a bonus if this works out right so when he says
"Implement this by tomorrow and let me have the spreadsheet so I can play with it"
I am not going to say
"But sir, you really should learn Mathematica or R - only idiots use Excel"
Nope. I like my paycheck. So I am going to code it up in Excel. Of course I am mindful of its limitations so when I need to do some maths I pass control over to a decent math's library - NAG immediately springs to mind (quite easy to do - check this tutorial out). I'll just use Excel as the front end to keep my PHB happy.
Years later I will be the PHB and THEN I can demand that my minions use Mathematica, Matlab or whatever.
Who WANTS matlab syntax?! I think I'd rather slit my wrists than be forced to write in matlab again.
Octave 3 under windows is a very usable system. I prefer Scilab, in many ways to both Octave and Matlab, but at work I have Matlab and it is easier to switch between Octave and Matlab than Scilab and Matlab.
The other day I had a point cloud of about 100000 points that I wanted to plot. Not having matlab installed, I decided to "just plot it in Excel for the sake of expedience".
Not only did Excel take an unfathomably long time to load the dataset, it then popped up a friendly dialog saying that I had exceeded the 32,768 row limit.
Unbelievable. Software from 2007. A 32K limit. I was completely speechless.
(In the end, I just wrote a 20 line java+opengl app, which is what I should have done in the first place.)
That's pretty cool. I haven't really used R much, and I'm not in academia, but I might still check out Sweave if I ever find myself going from R -> LaTeX :)
W..w..W - Willy Waterloo washes Warren Wiggins who is washing Waldo Woo.
FreeMat is almost as good as Octave, and has its own plotting capabilities.
I'm in the process of writing my phd thesis at the moment and for a while was considering using LaTeX, but at this late stage in the game I decided it wouldn't be worth learning it (my only experience is with WYSIWYG programs). Care to explain what the differences are between the original and the LaTeX version you produced are? Because they're not immediately obvious to me.
as a scientist in a small company full of scientists, excel is agreat practical tool, it is something the average biologist, who doesn't know a lot of math or programming, can use.
all these other things - R maple matlab are $$ hard to learn (true if you already know excel) etc etc etc
anyone who blindly bashes excel for data analysis is a moron - like any tool excel has its place
*.docx is essentially a gzipped XML file, from what I understand.
"The first chapter alone will have you in awe as you see the author turn the lowly Excel into something that rivals Mathematica using VBA, brains, and a heaping helping of fortitude."
Yep. I'm already in awe that he didn't just use Mathematica.
People... whatever you think Excel does that is neat, there is already a specialized app that does it much better. Don't waste your time with Excel unless you're in one of those horribly managed offices where they only allow you to use Microsoft-branded stuff. In which case, you're damned to incompetence anyway, and rather than feeling sympathetic, I'm just glad I'm not you.
PS - I would have included many more "ha"s in the subject, but Slashdot would censor me. Apparently 7 "ha"s is the maximum amount of amusement I am allowed to feel here.
Why isn't it a good idea
Excel is still plagued by inaccuracies problems, bugs, etc.
In short : excel could spit the wrong answers.
and does this apply equally to OpenOffice?
In the opensource world, Gnumeric is a spreadsheet well known for it's collaboration with the R-Project in order to guarantee a good accuracy.
I don't know about OpenOffice.org Calc. I tend to use it only for basic tasks and move to more appropriate languages like R when the need arise. But last I heard about it, OpenOffice.org was criticized for simply NOT having advanced statistics.
In a short caricature : :-D )
- Microsoft Office Excel : does stats and does it wrong
- Gnumeric : does stats and does it well (excels at it
- OpenOffice.org Calc : doesn't do stats
- R/Octave/Matlab/Python+modules/Fortran : what the real statisticians use.
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
While .tex is still plain text.
Take a day or two and give LaTeX a go. It's worth it, it's just not great to get started with. It's worth checking to see if your university has a thesis style you can use, that will take care of how everything looks for you.
After you've learnt it, you just sit down and write. Maybe when you finish a section or something you'll run 'make' (or whatever) and see how it looks, but that's not usually in the front of your mind. Most of the time you just write away, and when you need to show someone a draft you then turn it into a PDF.
As for differences between the original and the LaTeX version done by the OP, the most obvious one is that the word 'column' doesn't need to be hyphenated. There are a few other slightly odd kerning things going on there too. However, the original definitely looks like it has been done in LaTeX, or at least styled after it, so I do wonder if the publisher is trying to micromanage a few too many things, to the point of breaking a lot.
I haven't tried Scilab. I recently took a look at Maxima, and I like it so far. It doesn't really cover the same range of problems, though. It's more like a replacement for my TI-89.
Under capitalism man exploits man. Under communism it's the other way around.
That's the thing, they don't have a template, just vague guidelines on margins and the like. I see what you mean about the hyphens. I might give LaTeX another go, but recently I've managed to turn kerning on in word and install a macro that replaces common ligatures - so I'm still not convinced it's going to do much for me - but I'm up for the challenge, anything to distract me from actually writing!
I'd be very surprised by any university that doesn't have a LaTeX style available. Here, it's the de facto standard for maths and computer science. It may be useful to have a look at the Otago thesis style (scroll down) which is what I use. Of course, if you tried to use it seriously, you'd need to adjust it for your university standards, but that's probably easier done when you have a starting point, rather than learning everything from scratch.
It's also nice in that you can use emacs/vim/kile/whatever else you like to write in. A laptop away from the internet, with only your editor of choice running in a VT is a great distraction-free way of getting things written.
I'm also given to understand (and I haven't used word for a very long time) that doing stable crossreferences and contents pages in word can be an exercise in frustration. With LaTeX, all those sorts of things just magically work (and, with the inclusion of hyperref) you get a hyperlinked PDF file.