Slashdot Mirror


Ask Slashdot: Statistical Analysis Packages For Libraries?

HolyLime writes "I'm a librarian in a small academic library. Increasingly, the administration is asking our department to collect data on various aspects of our activities: classes taught, students helped, circulation, collection development, and so on. This is generating a large stream of data that is making it difficult, and time-consuming, to qualitatively analyze. For anything complicated, I currently use Excel, or an analogous spreadsheet program. I am aware of statistical analysis programs, like SPSS or SAS. Can anyone give me recommendations for statistical analysis programs? I also place emphasis on anything that is open source and easy to implement, since it will allow me to bypass the convoluted purchase approval process."

12 of 146 comments

  1. R or WEKA ... Wait, What Exactly Are You Doing? by eldavojohn · · Score: 5, Informative

    R is my personal favorite, but you're going to have to get down and dirty with some high-level programming (scripting). Check out the data import package (you would probably export your spreadsheets to flat text files and import those, although the import functionality keeps growing). There's no user interface in this suggestion ... what there is, however, is a massive collection of packages for statistical analysis. Very well maintained, constantly updated and ever expanding.

    The other suggestion has a better GUI but is really heavyweight. WEKA has helped me time and time again to perform advanced statistical calculations on data sets, and it's written in Java, so it runs on just about anything. Their interface occasionally improves, too; they now have an explorer that I use to prep data and remove outliers/null data (don't worry, this isn't climate data). It's well documented.
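WEKA's Explorer does this kind of cleanup through its GUI. As a rough illustration of what "remove outliers/null data" can mean, here is a minimal Python sketch (not WEKA's own filters). It assumes a single numeric column in which blank or non-numeric cells count as missing, and it uses the common 1.5 × IQR rule (Tukey's fences) as the outlier criterion. The helper name `clean_column` and the sample figures are hypothetical.

```python
import math
import statistics


def clean_column(raw_values):
    """Drop missing cells, then drop values outside Tukey's fences (1.5 * IQR).

    Hypothetical helper for a single numeric column; blank or non-numeric
    cells are treated as missing data.
    """
    # Step 1: discard null data (blanks, text such as "n/a", NaN/inf).
    numbers = []
    for cell in raw_values:
        try:
            value = float(cell)
        except (TypeError, ValueError):
            continue
        if math.isfinite(value):
            numbers.append(value)

    # With very few points the quartiles are not meaningful; keep everything.
    if len(numbers) < 4:
        return numbers

    # Step 2: quartiles and interquartile range (statistics.quantiles, Python 3.8+).
    q1, _, q3 = statistics.quantiles(numbers, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

    # Step 3: keep values inside the fences, preserving the original order.
    return [x for x in numbers if low <= x <= high]


if __name__ == "__main__":
    daily_checkouts = ["12", "15", "", "14", "13", "n/a", "16", "240", "15"]
    print(clean_column(daily_checkouts))
```

With the sample column above, the blank and "n/a" cells are skipped and the 240 is dropped as an outlier.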

    These (probably) require an intermediate data transformation step but are open source and extensively supported. Any examples of what you want to do? Simple stuff like standard deviation, or complex stuff like principal component analysis (PCA)? I guess if it were just simple stuff, that'd be built into Excel, right? Maybe your problems are simple enough that a good macro writer could tackle them? Whatever happens, good luck!
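To make the "simple versus complex" distinction concrete, here is a hedged standard-library Python sketch. The standard deviation is a one-liner; PCA is shown only for two variables, where the eigenvalues and eigenvectors of the 2 × 2 covariance matrix have a closed form. The function names and data are hypothetical, and for more than two variables you would use a proper linear-algebra routine (R's `prcomp`, for instance) rather than this closed form.

```python
import math
import statistics


def sample_covariance(xs, ys):
    """Sample covariance with an n - 1 denominator, matching statistics.variance."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def pca_2d(xs, ys):
    """PCA for two variables via the closed-form eigensolution of [[a, b], [b, c]].

    Returns ((lam1, lam2), first_component): eigenvalues in descending order
    and the unit eigenvector belonging to the largest eigenvalue.
    """
    a = statistics.variance(xs)        # var(x)
    c = statistics.variance(ys)        # var(y)
    b = sample_covariance(xs, ys)      # cov(x, y)
    half_trace = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_trace + radius, half_trace - radius
    if b != 0:
        vx, vy = lam1 - c, b           # satisfies (C - lam1 * I) v = 0
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)  # already axis-aligned
    norm = math.hypot(vx, vy)
    return (lam1, lam2), (vx / norm, vy / norm)


if __name__ == "__main__":
    hours_open = [1, 2, 3, 4, 5]       # hypothetical: extra evening hours per week
    questions = [2, 4, 6, 8, 10]       # hypothetical: reference questions answered
    print("stdev of questions:", statistics.stdev(questions))
    (lam1, lam2), pc1 = pca_2d(hours_open, questions)
    print("share of variance on first component:", lam1 / (lam1 + lam2))
    print("first component direction:", pc1)
```

Because the two made-up series are perfectly correlated, all of the variance lands on the first component.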

    --
    My work here is dung.
    1. Re:R or WEKA ... Wait, What Exactly Are You Doing? by logical_failure · · Score: 3, Informative

      Came here for the mention of R, and leave satisfied. R is an excellent choice.

      --
      Sock Puppets: damn_registrars=pudge_confirmer=jimmy_slimmy=raiigunner=cml4524=a_klavan=red4men=ronpaulisanidiot
    2. Re:R or WEKA ... Wait, What Exactly Are You Doing? by Alan+Shutko · · Score: 4, Insightful

      In this case, you're not quite correct. The head of our statisticians wants to bring R in to supplement SAS (which we pay a lot of money for) because it is both good software and heavily used in research. As he put it, "If we started using R, we could start using new tools as soon as we read the paper, since most of the researchers are using R."

    3. Re:R or WEKA ... Wait, What Exactly Are You Doing? by Warlord88 · · Score: 3, Informative

      Why do you think R is not easy to implement? My company has been using SAS for a long time and we are finally making the change to R. As far as OP's requirements are concerned, I think R is way superior to SAS or SPSS because of its free, modular nature. It is clean, simple and suitable for a wide range of users. The commercial packages are filled with way too much business lingo garbage for me.

      I personally think commercial support is overrated. I can install software on my own. I know how to browse through manuals and other information to find what I need. For a package like R, I almost always get my questions answered on online forums within a few hours at most. So what exactly do I get from commercial support for my money? But if OP needs commercial support, there is an enterprise edition of R from Revolution Analytics: http://www.revolutionanalytics.com/products/revolution-enterprise.php. Might be worth looking into.

      Bottom line: R all the way.

  2. PSPP by Geste · · Score: 5, Informative

    Look at the free SPSS work-alike PSPP. http://www.gnu.org/software/pspp/ Sounds like R might be a bit much for your needs.

  3. What output do they want and what answer? by vlm · · Score: 3, Informative

    Blue-skying the toolset is not gonna work. Figure out what output they want, then figure out what tools can generate that output.
    If the most important thing is inserting pretty graphs into newsletters, that's one thing.
    If the most important thing is hard-core data warehousing analysis (for a library?), that's another thing.

    The other thing is: what answer do they want? They're just looking for data to back up an unpopular decision, or to glorify themselves by demonstrating their amazing management talents. So figure out what that is (by asking them?) and help them get the data they want. Don't give them a graph of declining circulation if they're trying to emphasize their brilliant leadership. Don't give them a graph of increasing student help if they're trying to justify downsizing.

    --
    "Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
  4. R and Python (Rpy2) by mpetch · · Score: 3, Interesting

    I have grown accustomed to doing statistical analysis with Python and R, using rpy2 (http://rpy.sourceforge.net/rpy2) to bridge the two.

  5. Go Ahead and List Them Then by eldavojohn · · Score: 4, Interesting

    I also place emphasis on anything that is open source and easy to implement since it will allow me to bypass the convoluted purchase approval process.

    Sorry to burst your bubble, but if you want good support and easy implementation, you have to look for normal paid-for solutions. Besides, open source is not a synonym for free. This is especially true with specialized software or something you want good support for. Open source just means you get the code as well, so you can implement your own additions (without the use of plugins) or change it.

    Your point may be valid. But what would really help your validity is mentioning some proprietary products that beat R and WEKA at their own game. Sure, I've used Matlab; it can't be beat in some respects and is heavily supported. But should someone pay for it just because it effortlessly interfaces with Excel spreadsheets, when they could get by with a simple export from Excel and run an R script on the resulting files? Not worth the cash, in my opinion. I don't go out and buy every piece of software to evaluate it, though. I'm aware of Matlab and Mathematica and have used them quite a bit ... but I still prefer R and WEKA. So, CmdrPony, go ahead and list all the proprietary point-and-click-omg-it-just-works software for our friend here. We're all waiting.

    But unless you get a product from a company that is spending money to develop it, you never get good software and good support.

    Say, friendo, have you ever heard of Linux? Eclipse? Audacity? PostgreSQL? VLC?

    No one can make both, because everything in this world costs money, and developers have to live too. The open source and free software model works well for the likes of Google and Firefox because the developers get paid with money made from advertising. Statistical analysis software and other specialized software are a different matter.

    Can you tell me what advertising model is employed to funnel money through Firefox into Google? I mean, Google makes a competing product called Chrome -- the rendering engines are even different! What in the world are you freebasing?

    --
    My work here is dung.
    1. Re:Go Ahead and List Them Then by peter+in+mn · · Score: 4, Informative

      One major advantage of R is that it's the standard teaching package for undergraduate statistics. That means the stats department (or math department, if the school is too small to have a separate stats department) will have people who can show you how to do stuff. That is, support is available, locally, for free. Also, there are teaching texts that start simple and build up to as complicated as you want. A saved R script is a reasonable way to automate the report preparation process: you can collect data in Excel, dump it to tab-delimited text, read it into R and generate a pile of pretty graphs over and over again every month. But writing the script requires a fair amount of study, and being able to talk to someone who uses it a lot will make you much happier.
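The same export-then-script loop works in other scripting languages too. As a sketch in Python rather than R, the following reads a tab-delimited export and tallies activities per month. It assumes a header row with hypothetical `date` (YYYY-MM-DD) and `activity` columns, and it leaves out the graph-drawing step to stay within the standard library.

```python
import csv
import io
from collections import Counter


def monthly_counts(tsv_text):
    """Count rows per (month, activity) in a tab-delimited export.

    Assumes hypothetical header columns 'date' (YYYY-MM-DD) and 'activity'.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = Counter()
    for row in reader:
        month = row["date"][:7]                     # 'YYYY-MM-DD' -> 'YYYY-MM'
        activity = row["activity"].strip().lower()  # treat 'Reference' and 'reference' alike
        counts[(month, activity)] += 1
    return counts


def format_report(counts):
    """Render tallies as tab-separated lines sorted by month, then activity."""
    return [f"{month}\t{activity}\t{n}" for (month, activity), n in sorted(counts.items())]


if __name__ == "__main__":
    # In practice, read the exported file instead, e.g.
    # open("export.txt", newline="", encoding="utf-8").read()
    sample = (
        "date\tactivity\n"
        "2011-09-02\treference\n"
        "2011-09-15\tInstruction\n"
        "2011-09-20\treference\n"
        "2011-10-01\treference\n"
    )
    for line in format_report(monthly_counts(sample)):
        print(line)
```

Saved as a script, this can be rerun on each month's export unchanged, which is the point of the scripted approach.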

  6. Maybe a slightly different tool by LWATCDR · · Score: 3, Interesting

    It almost seems like you are not doing statistics so much as creating reports from data.
    Maybe you should be using a database instead of a spreadsheet or a statistics program.
    The uber-geek way would be to set up a LAMP server and create a web-based system.
    The more convenient way would be something like Access.
    You can then manipulate the data as needed in Excel or in the database program itself.

    In the end, if you know Excel, you may want to stick with it. I see people use Excel for databases all the time. It drives me a bit nuts, but sometimes whatever works is just fine.
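A database doesn't have to mean a LAMP server. As a minimal sketch of the idea, Python's built-in `sqlite3` module stores records in a single file (or in memory) and turns a recurring report into a SQL query. The table `activity_log`, its columns and the sample rows are hypothetical.

```python
import sqlite3


def build_db(rows, path=":memory:"):
    """Create a hypothetical activity_log table and load (day, kind, quantity) rows.

    Pass a filename instead of ':memory:' to keep the data between sessions.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS activity_log ("
        " day TEXT NOT NULL,"          # ISO date string, e.g. '2011-09-02'
        " kind TEXT NOT NULL,"         # e.g. 'circulation', 'class', 'reference'
        " quantity INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO activity_log VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def totals_between(conn, start_day, end_day):
    """Total quantity per kind for days in [start_day, end_day].

    ISO date strings sort chronologically, so BETWEEN works on them directly.
    """
    return conn.execute(
        "SELECT kind, SUM(quantity) FROM activity_log"
        " WHERE day BETWEEN ? AND ?"
        " GROUP BY kind ORDER BY kind",
        (start_day, end_day),
    ).fetchall()


if __name__ == "__main__":
    conn = build_db([
        ("2011-09-01", "circulation", 40),
        ("2011-09-02", "circulation", 35),
        ("2011-09-02", "class", 2),
        ("2011-10-03", "circulation", 50),
    ])
    for kind, total in totals_between(conn, "2011-09-01", "2011-09-30"):
        print(kind, total)
```

The data can still be pulled back into Excel for charts; the database just keeps the raw records in one consistent place.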

    --
    See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
  7. Do NOT stick with Excel by Anonymous Coward · · Score: 5, Informative

    Excel and other spreadsheets suck at stats:

    * Burns, P. (2005). Spreadsheet Addiction.
    * Cryer, J. (2001). Problems with using Microsoft Excel for Statistics [PDF].
    * Pottel, H. (n.d.). Statistical flaws in Excel [PDF].
    * Practical Stats. (n.d.). Is Microsoft Excel an Adequate Statistics Package?
    * Heiser, D. (2008). Errors, faults and fixes for Excel statistical functions and routines.

    For a more comprehensive and technical discussion, see the papers by Yu (2008), Yalta (2008), and McCullough & Heiser in Computational Statistics and Data Analysis 52(10).
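One recurring theme in that literature is numerical accuracy. The Python sketch below shows the general kind of problem: the textbook one-pass variance formula (sum of squares minus the squared sum over n) loses all precision when the values share a large common offset, while Welford's updating algorithm does not. It illustrates the issue in general and is not a claim about how any particular Excel version computes its functions; the data are made up.

```python
import statistics


def naive_variance(xs):
    """Textbook one-pass formula (sum(x^2) - sum(x)^2 / n) / (n - 1); prone to cancellation."""
    n = len(xs)
    s = sum(xs)
    ss = sum(x * x for x in xs)
    return (ss - s * s / n) / (n - 1)


def welford_variance(xs):
    """Welford's algorithm: update the running mean and sum of squared deviations."""
    mean = 0.0
    m2 = 0.0
    for k, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / k
        m2 += delta * (x - mean)  # uses the updated mean, as Welford's method requires
    return m2 / (len(xs) - 1)


data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]  # exact sample variance is 30

if __name__ == "__main__":
    print("naive:   ", naive_variance(data))
    print("welford: ", welford_variance(data))
    print("exact:   ", statistics.variance(data))  # statistics uses exact arithmetic internally
```

The naive result is dominated by rounding error in sums near 4e18, where adjacent floating-point values are hundreds apart; it can even come out negative.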

  8. R with RKWard by binarstu · · Score: 4, Informative

    I will echo the support for the open-source statistics package R. R is incredibly powerful, and in the natural sciences it is fast becoming the standard statistics software.

    I will also echo the sentiment that, by itself, R is fairly low-level and typically requires at least some simple programming to get what you want.

    However, there is a very nice graphical front end for R called RKWard (http://rkward.sourceforge.net/). With RKWard, importing and exporting data, running basic analyses on it (descriptive statistics, linear regression, t-tests, etc.), and producing basic graphs is very straightforward and does not require detailed knowledge of the R language. Plus, RKWard is also a nice development environment for writing R code, so if you want to take your project further, you can easily do so. So, I'd recommend giving RKWard + R a look.
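For a sense of what two of those basic analyses compute, here is a hedged standard-library Python sketch of an ordinary least-squares line and Welch's two-sample t statistic (with Welch-Satterthwaite degrees of freedom). It stops short of p-values, which need the t distribution; that is exactly the kind of thing R, and therefore RKWard, provides ready-made. Function names and sample numbers are illustrative only.

```python
import math
import statistics


def least_squares(xs, ys):
    """Ordinary least-squares fit y = intercept + slope * x; returns (intercept, slope)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom (unequal variances)."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


if __name__ == "__main__":
    weeks = [1, 2, 3, 4]               # hypothetical week numbers
    visits = [3, 5, 7, 9]              # hypothetical visits (hundreds)
    print("intercept, slope:", least_squares(weeks, visits))
    with_class = [12, 14, 16]          # hypothetical scores after a library session
    without_class = [8, 10, 12]        # hypothetical scores without one
    print("t, df:", welch_t(with_class, without_class))
```

Comparing these numbers against the output of RKWard's dialogs on the same data is a cheap way to check that you are reading the R output correctly.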