Slashdot Mirror


Ask Slashdot: Statistical Analysis Packages For Libraries?

HolyLime writes "I'm a librarian in a small academic library. Increasingly the administration is asking our department to collect data on various aspects of our activities: classes taught, students helped, circulation, collection development, and so on. This is generating a large stream of data that is difficult and time-consuming to qualitatively analyze. For anything complicated, I currently use Excel, or an analogous spreadsheet program. I am aware of statistical analysis programs, like SPSS or SAS. Can anyone give me recommendations for statistical analysis programs? I also place emphasis on anything that is open source and easy to implement since it will allow me to bypass the convoluted purchase approval process."

146 comments

  1. R or WEKA ... Wait, What Exactly Are You Doing? by eldavojohn · · Score: 5, Informative

    R is my personal favorite but you're going to have to get down and dirty with some high level programming (scripting). Check out the data import package (you would probably export your spreadsheets to flat txt files and import although the functionality is ever increasing). There's no user interface in this suggestion ... what there is, however, is a massive collection of packages for statistical analysis. Very well maintained, constantly updated and ever expanding.

    The other suggestion has a better GUI but is really heavyweight. WEKA has helped me time and time again perform advanced statistical calculations on data sets and it's in Java so runs on just about anything. Their interface occasionally improves too, they now have an explorer that I use to prep data and remove outliers/null data (don't worry, this isn't climate data). It's well documented.

    These (probably) require an intermediate data transformation step but are open source and extensively supported. Any examples of what you wanted to do? Simple stuff like standard deviation or complex stuff like principal component analysis (PCA)? I guess if it was just simple stuff, that'd be built into Excel, right? Maybe your problems are simple enough to just need a good macro writer to tackle? Whatever happens, good luck!
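
    To make the "simple stuff" end of that range concrete, here is a minimal sketch of descriptive statistics in plain Python (standard library only, not R or WEKA). The circulation figures and the `summarize` helper are made up for illustration; they are not from the original question.

```python
import statistics

# Hypothetical monthly circulation counts, invented for illustration.
monthly_checkouts = [412, 388, 455, 501, 467, 390, 298, 305, 520, 548, 493, 360]


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Sample standard deviation (divides by n - 1).
        "stdev": statistics.stdev(values),
    }


summary = summarize(monthly_checkouts)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

    If numbers like these are all the administration wants, a spreadsheet or a short script is probably enough; something like PCA is where a dedicated package starts to pay off.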

    --
    My work here is dung.
    1. Re:R or WEKA ... Wait, What Exactly Are You Doing? by logical_failure · · Score: 3, Informative

      Came here for the mention of R, and leave satisfied. R is an excellent choice.

      --
      Sock Puppets: damn_registrars=pudge_confirmer=jimmy_slimmy=raiigunner=cml4524=a_klavan=red4men=ronpaulisanidiot
    2. Re:R or WEKA ... Wait, What Exactly Are You Doing? by CmdrPony · · Score: 0, Flamebait

      ... you're going to have to get down and dirty with some high level programming (scripting) ... There's no user interface in this suggestion ...Their interface occasionally improves too ... These (probably) require an intermediate data transformation step ...Maybe your problems are simple enough to just need a good macro writer to tackle ...

      He said he wants something that is easy to implement, and only reason he is going with open source is because then he doesn't have to ask for purchase approval. Which IMO is a really stupid reason and will hurt in the long run - it's insane to take worse software just because you don't want to ask your boss if it's okay to buy this one.

      I also place emphasis on anything that is open source and easy to implement since it will allow me to bypass the convoluted purchase approval process.

      Sorry to burst your bubble, but if you want good support and easy implementation, you have to look for normal paid-for solutions. Besides, open source is not a synonym for free. This is especially true with specialized software or something you want good support for. Open source just means you get the code as well, so you can implement your own additions (without use of plugins) or change it.

      But unless you get a product from a company that is spending money to develop it, you never get good software and good support. No one can make both because everything in this world costs money, and developers have to live too. Open source and free software model works well for the likes of Google and Firefox because the developments get paid by money made with advertising. Statistical analysis software, and other specialized software is a different matter.

    3. Re:R or WEKA ... Wait, What Exactly Are You Doing? by fropenn · · Score: 1

      I too like R. You might link it with TINN-R (http://sciviews.org/Tinn-R/) to simplify some of the coding process. Last I had heard there was also some work on a GUI for R but I don't know if that's progressed very far.

      SPSS is fairly easy to use and I would recommend it over SAS for basic analyses, but, as parent suggested, it really depends on what you want to do. You might be pretty happy just downloading some Excel macros which can be found through web searches (or, better yet, writing your own).

    4. Re:R or WEKA ... Wait, What Exactly Are You Doing? by garcia · · Score: 1

      I guess if it was just simple stuff, that'd be built into Excel, right? Maybe your problems are simple enough to just need a good macro writer to tackle? Whatever happens, good luck!

      Sounds like you may be correct. More information is definitely required in order to recommend the proper product.

      However, R would definitely be my go to choice when someone is asking about SPSS/SAS. Speaking of that, being a SAS guy I really need to take the time to get R experience.

      Anyone with decent recommendations, aside from R's own website, where to do a quickstart when you're a SAS geek?

    5. Re:R or WEKA ... Wait, What Exactly Are You Doing? by Anonymous Coward · · Score: 0

      Nonsense. He isn't doing heavy stats and he certainly doesn't understand what's involved. Large streams of data and MS Excel are mutually exclusive.

      The poster should have provided input data and output requirements; he'd have gotten a ton of solutions from one of us that writes commercial stats software for a living.

    6. Re:R or WEKA ... Wait, What Exactly Are You Doing? by Anonymous Coward · · Score: 2, Insightful

      Ease of deployment does NOT leave out open source. Ease of deployment simply depends on how the package was programmed. Many closed source packages are just as hard to deploy as open source ones. Saying you must buy software is useless without actually giving information on the general field of statistical analysis, like various options and comparisons between closed source and open.

      As for support: it's true open source generally has limited support, but often enough that is sufficient, since many closed software vendors also provide limited or slow support unless you're one of their larger customers. Also, sometimes you can find companies to pay for support when dealing with open source software. Simply speaking, support varies GREATLY depending on the software in question, be it open or closed source. Note, however, he did NOT mention anything about his requirements in terms of software support, so whether the issue of support exists is up in the air.

      This is also a library, probably with limited funds. Organizations like these can take an insane amount of time before software is approved, not even factoring in that the software can easily be rejected. While it is true that open source doesn't mean free, it might as well in the majority of cases. Most open source projects release the binaries for free (companies like cedega are in the extreme minority where they hide access to the source and charge for the binary). If an open source product can meet his requirements, why shouldn't he go with it? Both open and closed source take time to deploy, and the time spent trying to get closed software approved could instead be spent deploying open source software.

      *Note, I don't advocate open vs closed source. I'm just speaking for this specific case. If closed software fits and is less hassle, go for it. If open source fits and is easier to deploy (be it deployment or approval), go for that instead. It really depends on the requirements. Software packages are tools; use the one that best fits your needs.

    7. Re:R or WEKA ... Wait, What Exactly Are You Doing? by Alan+Shutko · · Score: 4, Insightful

      In this case, you're not quite correct. The head of our statisticians wants to get R in here to supplement SAS (which we pay a lot of money for) because it is both good software, and also being used heavily for research. As he put it "If we started using R, we could start using new tools as soon as we read the paper, since most of the researchers are using R."

    8. Re:R or WEKA ... Wait, What Exactly Are You Doing? by Warlord88 · · Score: 3, Informative

      Why do you think R is not easy to implement? My company has been using SAS for a long time and we are finally making the change to R. As far as OP's requirements are concerned, I think R is way superior to SAS or SPSS because of its free, modular nature. It is clean, simple and suitable for a wide range of users. The commercial packages are filled with way too much business lingo garbage for me.

      I personally think commercial support is overrated. I can install software on my own. I know how to browse through manuals and other information to find what I need. For a package like R, I almost always get any questions answered in at most a few hours on online forums. So what exactly do I get from commercial support for my money? But, if OP needs commercial support, there is an enterprise edition of R by Revolution Analytics located here: http://www.revolutionanalytics.com/products/revolution-enterprise.php. Might be worth looking into.

      Bottom line: R all the way.

    9. Re:R or WEKA ... Wait, What Exactly Are You Doing? by Anonymous Coward · · Score: 0

      R Is a great choice. You may want to use it as a PostgreSQL database Stored Procedure language. See http://www.joeconway.com/plr/
      With this combination, your data can be loaded and stored in an excellent open source database, with the analysis done in R. It is then an easy task to write some Web applications to report the results.

    10. Re:R or WEKA ... Wait, What Exactly Are You Doing? by Anonymous Coward · · Score: 2, Interesting

      He said he wants something that is easy to implement, and only reason he is going with open source is because then he doesn't have to ask for purchase approval. Which IMO is a really stupid reason and will hurt in the long run - it's insane to take worse software just because you don't want to ask your boss if it's okay to buy this one.

      Horse shit. I've seen projects die because they couldn't get software through the approval process. Better to try 10 apps that are free and run in userspace (so no need to get IT involved for an Administrator install) than to wait for management approvals, budget cycles, and IT support, and never get the project done. If I'd done that on the job, I'd have been fired for taking too long to do my work.

      I also resent the implication that "free" means "worse."

      Sorry to burst your bubble, but if you want good support and easy implementation, you have to look for normal paid-for solutions. Besides, open source is not a synonym for free. This is especially true with specialized software or something you want good support for. Open source just means you get the code as well, so you can implement your own additions (without use of plugins) or change it.

      I'm guessing you haven't used R. Not only is there a thorough user manual, but there are books from most major statistical and instructional groups on how to use R, AND the R-help mailing list answers every R question I've ever had about it, AND there are local R user groups where you can get support similar to how LUG's work.

      But unless you get a product from a company that is spending money to develop it, you never get good software and good support. No one can make both because everything in this world costs money, and developers have to live too. Open source and free software model works well for the likes of Google and Firefox because the developments get paid by money made with advertising. Statistical analysis software, and other specialized software is a different matter.

      Please shut up. If your assumption were true, R would not exist. R exists, so you're just an asshat.

      My advice to the original poster: Use R if you have any familiarity with programming. Any higher level math/stat course OR experience with basic programming will let you get started in R. If you've been doing this all in Excel already, you're probably ready to hop into R. If you're still uncomfortable, I'm sure one of the people who value your academic library could help out.

    11. Re:R or WEKA ... Wait, What Exactly Are You Doing? by FhnuZoag · · Score: 1

      In addition, mondrian is a good complement to R for some interactive data visualisations. http://rosuda.org/mondrian/ The OP really needs to make clear what he wants to do, though.

    12. Re:R or WEKA ... Wait, What Exactly Are You Doing? by Anonymous Coward · · Score: 0

      R can read in .csv files, with a few caveats around spaces in the column headings (take them out) and missing data (leave cells empty).
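
      As a rough sketch of that clean-up step before import, the plain-Python snippet below (standard library only, not R) replaces spaces in the column headings with underscores and turns a few common "missing" markers into empty cells. The column names, the sample rows, and the set of markers are assumptions made for illustration.

```python
import csv
import io

# Assumed placeholders that a spreadsheet export might use for missing data.
MISSING_MARKERS = {"NA", "N/A", "-", "null", "NULL"}


def clean_csv(text):
    """Remove spaces from column headings and blank out missing-value markers."""
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")

    header = next(reader)
    writer.writerow([h.strip().replace(" ", "_") for h in header])

    for row in reader:
        cells = [cell.strip() for cell in row]
        writer.writerow(["" if c in MISSING_MARKERS else c for c in cells])
    return out.getvalue()


# Hypothetical export from a spreadsheet.
raw = "Classes Taught,Students Helped,Month\n4,31,Jan\n N/A ,28,Feb\n6,-,Mar\n"
cleaned = clean_csv(raw)
print(cleaned)
```

      The cleaned text can then be saved as a .csv file and read into whichever package ends up being used.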

      Check out Quick-R http://www.statmethods.net/ for a nicely laid out introduction which gives some idea as to the syntax. The R graph gallery shows its versatility as a plotting software http://addictedtor.free.fr/graphiques/.

      There are also quite a few books on R, including at least 2 by O'Reilly (R in a Nutshell and the R Cookbook).

      Another advantage is that it will load and run on a wide variety of OSs - I've run it off Windows Vista and 7 (including off a USB stick), several generations of OSX, and Xandros and Ubuntu 10 on an eeePC.

    13. Re:R or WEKA ... Wait, What Exactly Are You Doing? by SirGarlon · · Score: 1

      But even if you get a product from a company that is spending money to develop it, you never get good software and good support.

      Fixed that for you.

      --
      [Sir Garlon] is the marvellest knight that is now living, for he destroyeth many good knights, for he goeth invisible.
    14. Re:R or WEKA ... Wait, What Exactly Are You Doing? by Anonymous Coward · · Score: 2, Insightful

      I was at a "Large Data Sets" conference where there was an awkward pissing contest over who had the biggest data set. Then it became a question of whether you had to time-adjust the size of a data set, since a megabyte data set used to be huge. Then someone pointed out that large is relative; what is a large data set for a stats student (or librarian) is trivial for people working on the largest of the day, but it is still large for that person. I don't know what the OP is analyzing, but for them, this is large AND it fits in Excel. (And, since an Excel sheet expanded from 2^24 to 2^34 cells, it now can hold a fairly large amount).
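
      The cell counts can be checked with a little arithmetic. The sketch below uses the documented worksheet grid limits: 65,536 rows by 256 columns for the older .xls format, and 1,048,576 rows by 16,384 columns for Excel 2007 and later.

```python
# Worksheet grid limits, rows x columns.
old_cells = 65_536 * 256          # .xls format (Excel 97-2003)
new_cells = 1_048_576 * 16_384    # .xlsx format (Excel 2007 and later)

assert old_cells == 2 ** 24
assert new_cells == 2 ** 34

# The grid grew by a factor of 2**10.
print(new_cells // old_cells)
```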

      TL;DR: "Large" is a matter of perspective, so don't think Excel makes it a small data set.

    15. Re:R or WEKA ... Wait, What Exactly Are You Doing? by geekoid · · Score: 1

      why do you assume open = worse?

      "you never get good software and good support. "
      bullshit.

      --
      The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
    16. Re:R or WEKA ... Wait, What Exactly Are You Doing? by ottothecow · · Score: 1

      While I do my work in SAS and recognize where it has strengths (especially running on a SAS server with lots of shared libraries and huge datasets...sometimes it is nice that datasets are disk-space limited and not RAM-limited)...R is better than SAS for the submitter.

      It's not a pretty and easy GUI like Excel, but it is going to be no more difficult to implement than SAS (and since the language is less archaic, it might be more intuitive) and it is free. There is a large community to provide support to get quick answers from other people who are just "dabblers". Due to the cost of running SAS, almost all of the support online is written by and for people who use it daily and as such may not feel as accessible to a newbie as the R community.

      Maybe some bastardized GUI tool like SAS Enterprise Guide could solve the problems of the user as well...and I always hear about packages like Tableau which might do what they want...but R is quite functional and completely free.

      --
      Bottles.
    17. Re:R or WEKA ... Wait, What Exactly Are You Doing? by Anonymous Coward · · Score: 1

      Try SOFA (www.sofastatistics.com) alongside R. SOFA (Statistics Open For All) focuses on making some of the most important statistical tests easy to use and understand. It also has attractive charting and report tables. Disclosure - I am the lead developer of SOFA.

    18. Re:R or WEKA ... Wait, What Exactly Are You Doing? by SecurityGuy · · Score: 1

      He said he wants something that is easy to implement, and only reason he is going with open source is because then he doesn't have to ask for purchase approval. Which IMO is a really stupid reason and will hurt in the long run - it's insane to take worse software just because you don't want to ask your boss if it's okay to buy this one.

      Uh, no. Depending on where in the economy you live, you can find an open source product that does what you need and get the actual work done LONG before purchasing can actually get the software on your desk.

    19. Re:R or WEKA ... Wait, What Exactly Are You Doing? by Anonymous Coward · · Score: 0

      How about Netlib? Wait, they have a library for statistics and not statistics for a library.

      Ultimately, you are a librarian and not a statistician. If you have a background in mathematics that includes statistics, then you might be able to use the "better" products. The problem, though, is that the person interpreting the results will also require an understanding of what the pretty charts mean. It seems unreasonable to send people to a statistics course to understand the kind of data you are collecting.

      I do not know how the statistics are being stored or calculated now. If there is a database involved, most of those can do basic statistics directly in the query.
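
      As a minimal sketch of "statistics directly in the query", the snippet below uses Python's built-in `sqlite3` module with an in-memory database. The `reference_desk` table, its columns, and the numbers are hypothetical; the same aggregate functions (`COUNT`, `SUM`, `AVG`, `MIN`, `MAX`) are standard SQL and available in most databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table of monthly reference-desk activity.
conn.execute("CREATE TABLE reference_desk (month TEXT, students_helped INTEGER)")
conn.executemany(
    "INSERT INTO reference_desk VALUES (?, ?)",
    [("Sep", 120), ("Oct", 150), ("Nov", 90), ("Dec", 40)],
)

# Basic statistics computed by the database itself.
row = conn.execute(
    "SELECT COUNT(*), SUM(students_helped), AVG(students_helped), "
    "MIN(students_helped), MAX(students_helped) "
    "FROM reference_desk"
).fetchone()

print(row)
conn.close()
```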

      The more in-depth statistics programs are often used by people who want to spend large amounts of time analyzing data. Though they can produce standard charts quickly, it often takes a bit of exploring for a way to convey the desired message.

      "Figures don't lie, but liars figure." - Mark Twain

    20. Re:R or WEKA ... Wait, What Exactly Are You Doing? by kiwigrant · · Score: 2, Informative

      Try SOFA (http://www.sofastatistics.com/) alongside R. SOFA (Statistics Open For All) focuses on making some of the most important statistical tests easy to use and understand. It also has attractive charting and report tables. There are also videos, on-line documentation, and direct support from the developer. Disclosure #1 - I am the lead developer of SOFA. #2 I already posted accidentally as AC

    21. Re:R or WEKA ... Wait, What Exactly Are You Doing? by demonbug · · Score: 2

      I second R, and would also suggest adding in R Commander. Adds a fairly usable GUI simplifying lots of common tasks, while maintaining the flexibility of R.

    22. Re:R or WEKA ... Wait, What Exactly Are You Doing? by Bucky24 · · Score: 1

      As far as support, it depends on the size of the project and how involved the developers are. There are a ton of small scale projects that are good at what they do, but are abandoned. You'll get no support for those.

      --
      All the world's a CPU, and all the men and women merely AI agents
    23. Re:R or WEKA ... Wait, What Exactly Are You Doing? by bukharin · · Score: 1

      Anyone with decent recommendations, aside from R's own website, where to do a quickstart when you're a SAS geek?

      About Quick-R

      R is an elegant and comprehensive statistical and graphical programming language. Unfortunately, it can also have a steep learning curve. I created this website for both current R users, and experienced users of other statistical packages (e.g., SAS, SPSS, Stata) who would like to transition to R. My goal is to help you quickly access this language in your work.

      http://www.statmethods.net/index.html

    24. Re:R or WEKA ... Wait, What Exactly Are You Doing? by magisterx · · Score: 1

      I second the suggestion of R. I have only dabbled with it, but it is quite powerful and has a great community. You might also want to consider something a little more general purpose though. Python with the NumPy and SciPy packages can handle just about any statistical problem you want to consider and it has the versatility to do a whole lot more, such as handle any intermediate steps. It is completely free and you can download an excellent complete package at http://code.google.com/p/pythonxy/wiki/Welcome

    25. Re:R or WEKA ... Wait, What Exactly Are You Doing? by magisterx · · Score: 1

      Sorry to burst your bubble, but if you want good support and easy implementation, you have to look for normal paid-for solutions. Besides, open source is not a synonym for free. This is especially true with specialized software or something you want good support for. Open source just means you get the code as well, so you can implement your own additions (without use of plugins) or change it.

      I think it depends on how you define "good support". Many free (both libre and gratis) applications are very well supported by the community, this includes both Python and R. If you do not like community support, most major free applications have companies that will happily sell support contracts. Red Hat is the obvious example with Linux. Logilab and ActiveState will sell support contracts for Python.

      As for the open source part, you are technically right that there is a difference between "open source" and "libre" or "gratis". But unless they specifically say otherwise at some point, most people that say open source are looking for something that is both libre and gratis, not just that there is some way to acquire the source code.

    26. Re:R or WEKA ... Wait, What Exactly Are You Doing? by crush · · Score: 1

      Sorry to burst *your* bubble but your argument fails on several fronts:

      1. There already exists a successful support company based around R: http://www.revolutionanalytics.com/
      2. The model of making money by providing BETTER support and releasing Free Software is proven by Red Hat, MySQL AB (pre-acquisition), etc
      3. The OP doesn't sound like they need anything besides out of the box functionality, which is incredibly full-featured in R (especially compared to Excel!)

    27. Re:R or WEKA ... Wait, What Exactly Are You Doing? by jvin248 · · Score: 1

      R would be the way to go for heavy lifting, or even LibreOffice, which has a database function in it, for regular things. (Scientific Linux and CAELinux are packaged with R, R's GUIs and some other useful tools; I recommend CAELinux, and you can run it directly from the DVD so there is no need to install.) There's a book I found helpful, "Introductory Statistics with R" by Dalgaard, as well as the GUI extensions rcmdr, rattle, rapid-i, and rstudio noted farther down in another post. There is also the R Journal (journal.r-project.org).

      I've spent years in the auto industry and been involved with statistical analysis .. a "six sigma black belt". You're being asked the typical question of collecting data and then reporting it, from the partial list looks like a typical management request to justify departmental staffing/budget.

      You can be very efficient in collecting data and producing reports, but the most valuable part of the exercise is defining the problem that needs solving. And only then look at the two or three variables that describe the symptoms and causes of that problem. By the time you have to pull out the statistical tools like R to answer the problem, the proposed solution doesn't move the needle enough.

      "Do you want your car to stop at the front of the bus or the back of the bus? or be five-nines confident it only goes as deep as the firewall?" When anything past the radiator is a problem. You want to have something that can pass the "Intuitively Obvious to the Casual Observer" test which a simple spreadsheet bar-chart or line-chart can often do.

      Can you do a follow-up post on what data you've collected (charts)? And what questions are being answered? I'd be curious. or send me a note if you want some help looking over some of it.

    28. Re:R or WEKA ... Wait, What Exactly Are You Doing? by stranger_to_himself · · Score: 1

      R is good for analysis (although it has a steep learning curve) - but it seems to me that the poster has more of a data management problem than an analysis one. 'Administrators' are unlikely to be wanting inference or projections; they will just be wanting informative series of data on usage (nice graphs and tables etc). So a good database solution is probably the most important step, then exporting tables into something that will make nice reports (Excel might be okay) is the next.

      I'm a statistician so I know the strengths and limitations of most stats packages - and I don't think a stats package is the correct approach here. But I would agree with GP, we need to know exactly what you are trying to do.

    29. Re:R or WEKA ... Wait, What Exactly Are You Doing? by ceoyoyo · · Score: 1

      SPSS and SAS aren't exactly point and click either. If you want to do serious stats, you're going to have to type. R actually has a fairly straightforward syntax, and is designed to be used interactively. There are also lots of good beginner tutorials.

      If the poster needs some data management help as well, there's rpy, which lets you use R from Python - all the power of a real programming language, including database access, linked to R.

    30. Re:R or WEKA ... Wait, What Exactly Are You Doing? by ceoyoyo · · Score: 1

      I tried installing SAS a few times. The licensed version from the university wouldn't install - the double click installer crashed every time, at the END of the long install. The non licensed version from tpb installed, but then I was left with SAS and it's hard to interface with self (SAS server never did work). R installs no problem, every time, and just works.

    31. Re:R or WEKA ... Wait, What Exactly Are You Doing? by Anonymous Coward · · Score: 0

      R. The learning curve is not as steep as it might look like.

    32. Re:R or WEKA ... Wait, What Exactly Are You Doing? by eulerizeit · · Score: 1

      It would seem to me like R would be like trying to use a sledge hammer to tap in finishing nails. Sounds to me like you need some sort of database that can be easily called when you need to analyze the data. Consider hiring a data modeler or some other similar IT consultant. Have them set up the db and pre-write the kinds of queries you will need. From there, what type of software you need is highly dependent on the types of statistics you are running. From the sounds of it Excel will work just fine.

    33. Re:R or WEKA ... Wait, What Exactly Are You Doing? by Silas+is+back · · Score: 1

      Good to see this as first answer, I came here to suggest R to see I'm not the only one. Go with R!

      --
      this sig is useless
    34. Re:R or WEKA ... Wait, What Exactly Are You Doing? by beniform · · Score: 1

      Along with R, consider using Rattle (the R Analytic Tool To Learn Easily)

  2. This may be a bad idea by bluefoxlucid · · Score: 2

    I find that libraries carry a lot of common information and not so much uncommon information. This sort of muckery seems to encourage concentration of information into a smaller and smaller realm, constantly sorting out first the never-used, then the minimally-used, to maximize volume of return but minimize the use of the library as a haven for obscure and long-forgotten knowledge. Effectively, like burning some books while not burning other books--removes knowledge.

    As with all things, there must be balance. A library where you don't increase holding of more useful texts is less immediately useful; although if you removed all the most used texts, you would have an interesting outcome... the obscure and oft-overlooked need retention, too.

    1. Re:This may be a bad idea by Galaga88 · · Score: 2

      Libraries don't necessarily enjoy removing materials from the collection, but the two main reasons to do so are to make sure we have current/accurate materials and make room in our always limited shelf space. (The first is of presumably higher importance in an academic library.)

      Unless libraries can get an unlimited budget for expansion of their physical space or off-site archives, weeding materials will be a necessary evil.

  3. Here's a huge list by narcc · · Score: 1

    Try this giant list

    From personal experience, I can recommend WINKS. It's ridiculously easy to use.

  4. SAGE by MetalliQaZ · · Score: 2

    Sage (formerly SAGE?) is an open source mathematical package that includes statistical functions. I wanted to add that to the usual mentions of R, etc.

    However, are you sure this is what you want? It sounds to me like your real problem is that you have too much data to store. If you're currently using Excel to process your data, and it has been working except that you are running out of space, perhaps what you really need is a database, like Access. If you want OSS, you can probably try LibreOffice, or engage a local student to design a web based system based on MySQL.

    --
    "Here Lies Philip J. Fry, named for his uncle, to carry on his spirit"
    1. Re:SAGE by Anonymous Coward · · Score: 0

      Sage also includes R. You can even use R from within the public Sage notebook at sagenb.org. Just select "R" from the dropdown box at the top of a worksheet, or preface a cell with "%r" to do straight R calculations.

      Scipy stats and the statsmodel scikit also seem interesting.

  5. already have? by Anonymous Coward · · Score: 0

    Have you checked what's already available on the institution's computers? Many have SAS and others site-licensed. SAS can certainly do small tasks, but it's the most versatile and powerful, if not exactly friendly and intuitive. If you have to acquire something and don't need heavy duty tools, I'm thinking you might fare well with R, also. It's free, so if it doesn't work out, no problem, and you'll find clues scattered all over the 'net. Also, some academic departments may have favorite tools which might not be exactly what you want but are close enough and you know where to find help.

  6. SAS is expensive by KernelMuncher · · Score: 1

    SAS is a great package but is probably prohibitively expensive. An open source version like R is probably more appropriate.

  7. A good database? by Anonymous Coward · · Score: 2, Interesting

    Hear me out. We deal with about 3 million data-producing elements and track in real-time to near-real-time. We ingest everything into MySQL (via macros, scripts, tools, etc.) and normalize the data on the way in. For analysis we simply query. Those queries may have their outcome displayed in a simple report generator, or (more often than not) via HTML5 Canvas graphs/charts, Cacti graphs, etc. What we're doing doesn't lend itself well to a SAS type solution. If you could use SAS for what you're doing, this probably wouldn't work for you.

  8. PSPP by Geste · · Score: 5, Informative

    Look at the free SPSS work-alike PSPP. http://www.gnu.org/software/pspp/ Sounds like R might be a bit much for your needs.

  9. PowerPivot maybe by AaronLS · · Score: 2

    Depending on the type of "analysis" you might be better off with something like PowerPivot. There's a lot that you can probably glean from your data without doing sophisticated statistics, but instead using PowerPivot to slice/dice/summarize/chart your data in different ways. It is easiest to use if you structure your data in a data warehouse/star schema fashion.

  10. What output do they want and what answer? by vlm · · Score: 3, Informative

    Blue skying the toolset is not gonna work. First work out what output they want, then figure out what tools can generate that output.
    If the most important thing is inserting pretty graphs into newsletters, that's one thing.
    If the most important thing is hard core data warehousing analysis (for a library?) that's another thing.

    The other thing is what answer do they want? They're just looking for data to back up an unpopular decision or glorify themselves demonstrating their amazing management talents. So figure out what that is (by asking them?) and help them get the data they want. Don't give them a graph of declining circulation if they're trying to emphasize their brilliant leadership. Don't give them a graph of increasing student help, if they're trying to justify downsizing.

    --
    "Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
    1. Re:What output do they want and what answer? by AaronLS · · Score: 1

      Agreed. A lot depends on what you want to accomplish. "Analysis" can be completely different beasts from project to project. The term "analysis" is kinda thrown around loosely and encompasses a lot of things. So it's important to not dive into the analysis if you don't have a very specific goal in mind.

    2. Re:What output do they want and what answer? by HolyLime · · Score: 1

      The library staff is currently working jointly with the school administration to determine what kinds of information we want to look at and analyze. Though it increasingly looks like the statistics we currently collect are going to be analyzed in more and various ways. I just wanted to take the initiative and come to the table with a potential solution in the form of a low cost software package capable of providing that functionality.

  11. Stick with Excel by syousef · · Score: 2, Insightful

    Seriously, stick with Excel. You and anyone who comes after you would need to learn whatever statistical package you introduce. That is either overkill for the kind of data you're collecting and analysing, or it's a full time job requiring specialist knowledge for which they should be hiring someone else.

    Excel has a few bugs but for the most part it's very capable. Ensure you run the service packs and can install the addons that come with it (analysis pack). Get them to send you on advanced short courses for Excel and Statistics. If there isn't that kind of commitment there's no room for any statistical package.

    Almost all ask slashdot stories that are work related can be answered the same way - bad idea: you're already out of your depth and if you can't be bothered to google for the information the project is doomed.

    --
    These posts express my own personal views, not those of my employer
    1. Re:Stick with Excel by bogaboga · · Score: 1

      Excel has a few bugs but for the most part it's very capable.

      Care to name some of those bugs? I have not come across a single one!

    2. Re:Stick with Excel by Anonymous Coward · · Score: 0

      Agreed. While we tech people are experts at creating situations where an employer can't afford to do without us, it makes leaving a job down the road very hard. Remember KISS and document everything you do so that someone else can come in later on and pick up where you left off.

      One other option is a Microsoft Access Database. Not open source, but you can do just about anything with a relational database & MS Access makes building queries for the novice much easier than using SQL. You can even get very fancy with user-interfaces and reports using VBA code if you want. I work at a shop that does a lot of statistical analysis. We use SPSS & SAS but I've found that for managing data long-term, it's hard to beat building an MS Access database in terms of development speed and simplicity of use & maintenance.

    3. Re:Stick with Excel by melikamp · · Score: 1

      This is an awful advice which ignores everything the submitter asked for. http://www.practicalstats.com/xlsstats/excelstats.html

    4. Re:Stick with Excel by Anonymous Coward · · Score: 0

      1) Most Librarians don't change jobs every two years like programmers.
      2) Excel caps out at a certain level with complex statistical analysis. You can buy add-ons which do more, but that's what this librarian is trying to avoid.
      3) These questions and discussions are very useful. Actually listening to what people have to say is sometimes more useful than a page of google results. It can take hours to evaluate each and every one of twenty possible hits from Google. If you hear someone say "this product is good but it doesn't do X" or "this product has a lousy user interface but great X capabilities" that's all valid information.
      4) Stop being a curmudgeon. ;)

    5. Re:Stick with Excel by syousef · · Score: 1

      Excel has a few bugs but for the most part it's very capable.

      Care to name some of those bugs? I have not come across a single one!

      You can't google Excel bugs???

      http://it.slashdot.org/story/07/09/24/2339203/Excel-2007-Multiplication-Bug
      http://www.joelonsoftware.com/items/2007/09/26b.html
      http://social.technet.microsoft.com/Forums/en-US/excel/thread/f2850183-e8f5-4a3e-a0b1-5a154347f3e9/

      --
      These posts express my own personal views, not those of my employer
    6. Re:Stick with Excel by syousef · · Score: 1

      This is an awful advice which ignores everything the submitter asked for.

      http://www.practicalstats.com/xlsstats/excelstats.html

      READ THE DAMN ARTICLE BEFORE YOU LINK TO IT. YES I AM YELLING AT YOU!!!

      "For business applications where questions might be simpler and precision not as necessary, Excel may be just fine."

      --
      These posts express my own personal views, not those of my employer
  12. R and Python (Rpy2) by mpetch · · Score: 3, Interesting

    I have grown accustomed to doing statistical analysis using Python and R using http://rpy.sourceforge.net/rpy2

  13. Access by Anonymous Coward · · Score: 0

    I hate to toe the Microsoft line here, but I'd go with Access in this case.

    True, SPSS and SAS are statistical analysis packages, but I think they're far beyond what you're looking for. You haven't mentioned T-tests, F-tests, multi-variable regression, etc. That's what those packages are for, and I doubt your administrators even know what they are.

    It sounds like you're after basic aggregation ("how many fiction books were checked out in July?", "How many overdue notices do we have to send out, on average, before they bring the book back?", etc.). Access does these just fine, and the built-in wizards will probably help you get more use out of it. You'll get better ability to select what it is you're counting than you would with Excel (want to know the percentage of books in your collection with more than 100 pages that get checked out more than twice per year? Access will do that easily. Excel... you can do it, but it's a lot of filtering, cutting/pasting, etc.)
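
    A minimal sketch of that kind of aggregation query, written in Python with the standard-library sqlite3 module rather than Access so it can be run anywhere. The checkouts table, its columns, and the sample rows are all hypothetical, invented only to show the shape of the query:

```python
import sqlite3

# Hypothetical schema and sample rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE checkouts (title TEXT, genre TEXT, checked_out TEXT)")
conn.executemany(
    "INSERT INTO checkouts VALUES (?, ?, ?)",
    [
        ("Dune", "fiction", "2011-07-02"),
        ("Emma", "fiction", "2011-07-15"),
        ("Cosmos", "nonfiction", "2011-07-20"),
        ("Dune", "fiction", "2011-08-01"),
    ],
)

# "How many fiction books were checked out in July?"
july_fiction = conn.execute(
    "SELECT COUNT(*) FROM checkouts "
    "WHERE genre = 'fiction' AND strftime('%m', checked_out) = '07'"
).fetchone()[0]

# Checkouts per genre, collected into a plain dict.
per_genre = dict(
    conn.execute("SELECT genre, COUNT(*) FROM checkouts GROUP BY genre")
)

print(july_fiction, per_genre)
```

    The same SELECT ... GROUP BY idea carries over to Access or MySQL, though date functions such as strftime are specific to SQLite.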

    1. Re:Access by Shifty0x88 · · Score: 1

      Nice response I totally agree... however I totally agree with others that Access is a pretty poor database but it works on the small-ish scale. (Although I do know of an IT company that uses it for their ticketing system... I know, I know you would think they know better, but they are IT pros not programmers and it works for them)

      If you have the skills I like the suggestions by others of: MySQL and/or a LAMP (Linux/Apache/MySQL/PHP) server for a web-based database program (although it seems a lot more complex than what you already have in place).

      I also like your basic aggregation questions, if these are accurate questions you need answered then Access can do them as Anon said above.

  14. Go Ahead and List Them Then by eldavojohn · · Score: 4, Interesting

    I also place emphasis on anything that is open source and easy to implement since it will allow me to bypass the convoluted purchase approval process.

    Sorry to burst your bubble, but if you want good support and easy implementation, you have to look for normal paid-for solutions. Besides, open source is not a synonym for free. This is especially true with specialized software or something you want good support for. Open source just means you get the code as well, so you can implement your own additions (without use of plugins) or change it.

    Your point may be valid. But what would really help your validity is mentioning some proprietary products that beat R and WEKA at their own game. Sure, I've used Matlab and it can't be beat in some respects and is heavily supported. But to suggest that just because it effortlessly interfaces with Excel spreadsheets when the person could get by with a simple export in Excel to run their R script on the resulting files? Not worth the cash, in my opinion. I don't go out and buy every piece of software to evaluate it, though. I'm aware of Matlab and Mathematica and have used them quite a bit ... but I still prefer R and WEKA. So, CmdrPony, go ahead and list all the proprietary point-and-click-omg-it-just-works software for our friend here. We're all waiting.

    But unless you get a product from a company that is spending money to develop it, you never get good software and good support.

    Say, friendo, have you ever heard of Linux? Eclipse? Audacity? PostGRES? VLC?

    No one can make both because everything in this world costs money, and developers have to live too. Open source and free software model works well for the likes of Google and Firefox because the developments get paid by money made with advertising. Statistical analysis software, and other specialized software is a different matter.

    Can you tell me what advertising model is employed to funnel money through Firefox into Google? I mean, Google makes a competing product called Chrome -- the rendering engines are even different! What in the world are you free basing?

    --
    My work here is dung.
    1. Re:Go Ahead and List Them Then by DeadDecoy · · Score: 2

      Stata is another option and it isn't too expensive. I find it more usable than R with regards to the basic tests. And it somewhat supports copy-paste functionality between excel.

    2. Re:Go Ahead and List Them Then by Anonymous Coward · · Score: 0

      I don't know for sure, but I'm guessing you were just successfully trolled, in the sense of "practice of playing a seriously misinformed or deluded user":

      http://en.wikipedia.org/wiki/Troll_%28Internet%29

    3. Re:Go Ahead and List Them Then by peter+in+mn · · Score: 4, Informative

      One major advantage of R is that it's the standard teaching package for undergraduate statistics. That means that stats department (or math department, if the school is too small to have a separate stats dept) will have people who can show you how to do stuff. That is, support is available, locally, for free. Also, there are teaching texts that start simple and build up to as complicated as you want. A saved R script is a reasonable way to automate the report preparation process. You can collect data in Excel, dump it to tab-delimited text, read it into R and generate a pile of pretty graphs over and over again every month. But writing the script requires a fair amount of study, and being able to talk to someone who uses it a lot will make you much happier.
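
      The "dump to tab-delimited text, read it in, summarize every month" workflow can be sketched roughly as follows. This uses Python rather than R, only to stay consistent with the other runnable example in this thread, and sticks to the standard library. The column names and figures are made up, and an in-memory string stands in for the exported file:

```python
import csv
import io
from collections import defaultdict

# Stand-in for a tab-delimited export from Excel.
# Column names and numbers are hypothetical.
exported = io.StringIO(
    "month\tdesk\tstudents_helped\n"
    "2011-09\treference\t40\n"
    "2011-09\tcirculation\t25\n"
    "2011-10\treference\t55\n"
)


def monthly_totals(handle):
    """Sum the students_helped column for each month."""
    totals = defaultdict(int)
    for row in csv.DictReader(handle, delimiter="\t"):
        totals[row["month"]] += int(row["students_helped"])
    return dict(totals)


totals = monthly_totals(exported)
for month in sorted(totals):
    print(month, totals[month])
```

      With a real export, open("export.txt", newline="") would replace the StringIO object, and the saved script could be rerun each month on the new file.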

    4. Re:Go Ahead and List Them Then by Anonymous Coward · · Score: 1

      I would mention Statistica and SPSS. They have really nice interfaces and have extensive statistics/data mining capabilities.

    5. Re:Go Ahead and List Them Then by Anonymous Coward · · Score: 1

      Having used Stata, SAS, and SPSS, and having spent an afternoon with R before simply giving up, I find Stata by far and away the easiest program to use.

      You just get your data in .csv form, import it appropriately, and then reg y x for the basics. If you want fancypants implementation, the GUI has a good deal of stuff, but most of it is a matter of googling.
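
      For readers unfamiliar with Stata, "reg y x" fits an ordinary least-squares line of y on x. A rough sketch of that calculation for a single predictor, in plain Python with made-up data (it does not reproduce Stata's standard errors or other output):

```python
from statistics import fmean

# Made-up data: x could be weeks into the term, y the number of help requests.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def ols_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    sxx = sum((a - x_bar) ** 2 for a in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


slope, intercept = ols_fit(x, y)
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```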

      IV, ARMA, ARIMA, Panel Data, time series, data, generating lags, all not terribly difficult. Lowest time investment.

      Free software with a good GUI is "Gretl".

    6. Re:Go Ahead and List Them Then by Anonymous Coward · · Score: 0

      Can you tell me what advertising model is employed to funnel money through Firefox into Google? I mean, Google makes a competing product called Chrome -- the rendering engines are even different! What in the world are you free basing?

      Google pays (or paid) FF to set Google as the default search engine ($50M if I remember right). The Firefox users then use Google. Google displays those users ads and makes money. In addition, Google's support of Firefox helps make Firefox more popular. This weakens Google-competitor Microsoft's hold on the browser with IE. Therefore, Google's competition is weakened which allows Google to make more money.

    7. Re:Go Ahead and List Them Then by ceoyoyo · · Score: 1

      If you pay an R guru as much as you pay for matlab, you'll probably get pretty awesome support too.

    8. Re:Go Ahead and List Them Then by tehcyder · · Score: 1

      That means that stats department (or math department, if the school is too small to have a separate stats dept) will have people who can show you how to do stuff. That is, support is available, locally, for free.

      That's an interesting definition of free. If I had a work-related piece of software I'd want proper paid for support on it, especially if I didn't know what I was doing. Relying on the helpfulness of maths geeks seems like a risky alternative.

      --
      To have a right to do a thing is not at all the same as to be right in doing it
  15. PDL by Raiford · · Score: 1

    Check out PDL (Perl Data Language). It may not be the most convenient solution but it's free and has a great, informed and responsive user group.

    --
    "player 4 hit player 1 with 0 stroms"
  16. Did you want to keep it simple? by Anonymous Coward · · Score: 0

    If you're already proficient in Excel, I suggest adding the Microsoft Office data analysis pack if you just need some statistical tools inside of Excel. This adds various ANOVA and regression tools inside of excel using an easily accessible Microsoft add-in (you may need to have your Office installation CD handy). http://office.microsoft.com/en-us/excel-help/load-the-analysis-toolpak-HP010021569.aspx contains the instructions. This will be your best and simplest bet if you just need to look at correlation, covariance, descriptive statistics, histograms, and the like.

    Alternatively, you can get pretty complicated with all this other fancy stuff or stay in Excel by using RExcel, a free package that allows R Statistical Package to play nice with Excel as the data input interface.

    Cheers!

  17. Liberals?! by Anonymous Coward · · Score: 0

    I read this first as: Statistical Analysis Packages For Liberals. Thinking it was some kind of lie with statistics deal going on...

  18. Blog and Book for SAS to R by eldavojohn · · Score: 2

    Anyone with decent recommendations, aside from R's own website, where to do a quickstart when you're a SAS geek?

    This blog explains some of the stuff you do in R and as he does it, he compares it to SAS.

    Example:

    Unlike SAS, which has DATA and PROC steps, R has data structures (vectors, matrices, arrays, dataframes) that you can operate on through functions that perform statistical analyses and create graphs. In this way, R is similar to PROC IML.

    And here's an entire book on the topic (although may be difficult to find)!

    --
    My work here is dung.
  19. Maybe a slightly different tool by LWATCDR · · Score: 3, Interesting

    It almost seems like you are not doing statistics as much as creating reports from data.
    Maybe you should be using a database instead of a spreadsheet or a statistics program.
    The Uber geek way would be to set up a LAMP server and create a web-based system.
    The more convenient way would be something like Access.
    You can then use Excel to manipulate the data as needed or the database program.

    In the end if you know excel you may want to stick with it. I see people use Excel for databases all the time. Drives me a bit nuts but sometimes whatever works is just fine.

    --
    See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
    1. Re:Maybe a slightly different tool by jgrahn · · Score: 1

      It almost seems like you are not doing statistics as much as creating reports from data. Maybe you should be using a database instead of a spreadsheet or a statistics program.

      I don't see why even a database would be needed. "Increasingly the administration is asking our department to collect data on various aspects of our activities, class taught, students helped, circulation, collection development, and so on."

      Seems to me that information already exists in a library, and the report generation is the only thing missing. And possibly looking into the database on the 1st of every month and writing down the number of books on a piece of paper.

      Or a reply to the administration "Stop asking for these statistics and let me do my job!"

    2. Re:Maybe a slightly different tool by Anonymous Coward · · Score: 2, Informative

      Agreed. Access is a sh*tty database but you seem to be saying that volume is your problem not functionality. However if you've got an Excel license you've probably got an Access license already and Access will allow you to re-use a lot of what you've put together in Excel while handling the volume of data better.

      Unfortunately I also agree with the other posters, if you're after more relevant advice you really need to give a bit more background on:
        - your skill set (Excel user/VBA hacker/Stats major/Hardcore programmer)
        - what do you mean by 'statistical analysis'? This is too broad a description
        - the data you're using (volumes, sources, complexity)

      Another option if volume is your only problem is to not use all the data. Take a random sample and work from that - this is common practice even for people/orgs with high end stats packages.
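
      A small sketch of the sampling idea, using only Python's standard library. The loan_days values are randomly generated stand-ins for real records, and the sample size of 1,000 is an arbitrary choice for illustration:

```python
import random
from statistics import fmean

random.seed(42)  # fixed seed so the run is repeatable

# Stand-in for a large data set: 100,000 made-up loan durations in days.
loan_days = [random.randint(1, 28) for _ in range(100_000)]

# Simple random sample without replacement.
sample = random.sample(loan_days, 1_000)

full_mean = fmean(loan_days)
sample_mean = fmean(sample)
print(f"full: {full_mean:.2f}  sample: {sample_mean:.2f}")
```

      The sample mean will usually land close to the full mean, which is why working from a sample is often good enough for summary figures.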

    3. Re:Maybe a slightly different tool by Anonymous Coward · · Score: 0

      I second this! It seems to me a pretty good solution - export the current database to an online one, for instance MySQL. The main statistical packages, such as R, provide means to interface with those databases. Also, you'd have an easily maintainable DB which could be queried by users of the library through a web interface (there are some pretty interfaces for MySQL out there)

    4. Re:Maybe a slightly different tool by Anonymous Coward · · Score: 0

      Completely agree... What the OP is trying to get comes out of correlating trends from a DB, not that much stats. Excel would be good to start, coding a web app relying on a DB (probably the one where the primary info comes from) once you know what you need is the following step.

  20. Do NOT stick with Excel by Anonymous Coward · · Score: 5, Informative

    Excel and other spreadsheets suck at stats:

    * Burns, P. (2005). Spreadsheet Addiction.
    * Cryer, J. (2001). Problems with using Microsoft Excel for Statistics. PDF
    * Pottel, H. (n.d.). Statistical flaws in Excel. PDF
    * Practical Stats (n.d.), Is Microsoft Excel an Adequate Statistics Package?
    * Heiser, D. (2008). Errors, faults and fixes for Excel statistical functions and routines

    For a more comprehensive and technical discussion, see the papers by Yu (2008); Yalta (2008); and McCullough & Heiser in Computational Statistics and Data Analysis 52(10).

    1. Re:Do NOT stick with Excel by Anonymous Coward · · Score: 0

      More complete references to the above papers:

      Yu-Sung Su: "It’s easy to produce chartjunk using Microsoft Excel 2007 but hard to make good graphs"

      A. Talha Yalta: "The accuracy of statistical distributions in Microsoft Excel 2007"

      B.D. McCullough, David A. Heiser: "On the accuracy of statistical procedures in Microsoft Excel 2007"

      All published in "Computational Statistics and Data Analysis", 2008

    2. Re:Do NOT stick with Excel by iroll · · Score: 1

      Did you even read the articles you linked? From Pottel:

      My overall assessment is that while Excel uses algorithms that are not robust and can lead to
      errors in extreme cases, the errors are very unlikely to arise in typical scientific data analysis.
      However, I would not advise data analysis in Excel if the final results could have a serious
      impact on business results, or on the health of patients. For students, it's my personal belief
      that the advantages of easy-to-use functions and tools counterbalance the need for extreme
      precision.

      Emphasis mine. I highly doubt that the OP's data require more than a couple of significant figures of precision. While their stats could influence resource allocation, differences of a few percent are unlikely to be deal-breakers--think about it; the library is likely to be dealing with budget items that range in the thousands of dollars, probably in blocks. You're not going to accidentally budget for a whole class based on a wiggle of a percent in attendance.

      --
      Repetition does not transform a lie into the truth. - FDR
    3. Re:Do NOT stick with Excel by syousef · · Score: 1

      Did you even read the articles you linked? From Pottel:

      He would not have been modded informative here if he'd actually read what he linked to. Some days this place really gets me down. If this is the level of quality at a site for geeks, no wonder society is in decline.

      --
      These posts express my own personal views, not those of my employer
    4. Re:Do NOT stick with Excel by syousef · · Score: 1

      PLEASE ACTUALLY READ WHAT YOU LINK TO.
      MODERATORS: LOOK AT WHAT YOU ARE CALLING INFORMATIVE.
      YEP, I'M YELLING. DEALING WITH STUPIDITY IS FRUSTRATING.

      Excel and other spreadsheets suck at stats:

      That is one camp of thought. There are others. Every package has its limitations.

      * Burns, P. (2005). Spreadsheet Addiction.

      Doesn't talk about never using spreadsheets. Talks about misusing them by pressing them past their limits. "I know there are many spreadsheets in financial companies that take all night to compute. These are complicated and commonly fail. When such spreadsheets are replaced by code more suited to the task, it is not unusual for the computation time to be cut to a few minutes and the process much easier to understand."

      * Cryer, J. (2001). Problems with using Microsoft Excel for Statistics. PDF

      Focuses on poor charting in the Excel 95 era. Title should be problems for using Excel for graphing. The article is a decade old. Excel has had several refreshes.

      * Pottel, H. (n.d.). Statistical flaws in Excel. PDF

      Another article about Excel 97 and 2000. Decade old software. Many flaws since addressed, and new flaws added. Clearly Excel bashing was popular around 2000.

      * Practical Stats (n.d.), Is Microsoft Excel an Adequate Statistics Package?

      This one suggests it's just fine for the submitter's purposes.

      "Excel’s limitations, and its errors, make this a very questionable practice for scientific applications. For business applications where questions might be simpler and precision not as necessary, Excel may be just fine"

      * Heiser, D. (2008). Errors, faults and fixes for Excel statistical functions and routines

      For a more comprehensive and technical discussion, see the papers by Yu (2008); Yalta (2008); and McCullough & Heiser in Computational Statistics and Data Analysis 52(10).

      Gets very technical, and I bet some of those remarks are valid, but if it's important you become aware of and work around the problem. If it's not, there is no problem. If you don't understand what you're asking Excel to calculate and why it might be wrong, it doesn't matter.

      The more you go into this, the more it requires specialist training. The idea that just replacing one software package with flaws and features you don't understand with another geekier more difficult product with flaws and features you don't understand is ridiculous. As is moderation on slashdot. The comments are being moderated by monkeys practicing to type up Shakespeare.

      --
      These posts express my own personal views, not those of my employer
  21. A suggestion... by esme · · Score: 2

    I suggest you post your question to the code4lib mailing list. It's going to get you much more informed and practical advice. You might even find some people who already have a good workflow who will share their tools.

    -Esme

    1. Re:A suggestion... by HolyLime · · Score: 1

      I suggest you post your question to the code4lib mailing list. It's going to get you much more informed and practical advice. You might even find some people who already have a good workflow who will share their tools.

      -Esme

      I shall try exactly that. Thank you for directing to that mailing list!

  22. Matlab or Octave by gnu-sucks · · Score: 1

    Depending on how large your dataset is, you may have luck using Matlab (or the opensource gnu octave). These programs will let you do *whatever* you want with the data (plotting, correlation, fft, etc).

    With at least Matlab, there are some MySQL plugins available that will let you get data out of your database and into arrays rather quickly. And of course, both matlab and gnu octave let you import csv and plaintext datafiles.

    Here is the matlab plugin I have used very successfully (and it's open source. No idea if it would work with octave):
    http://www.cims.nyu.edu/~almgren/mysql/

    You will need some background with math, statistics, and programming to effectively do this. If you don't have the skills, learn them or pay up for some overpriced commercial product...

  23. free SPSS clone by Anonymous Coward · · Score: 1

    I love R, but if you want something that looks more like SPSS, you could try the free SPSS clone PSPP:
    http://www.gnu.org/software/pspp/

  24. Scipy or Numpy with Python by Anonymous Coward · · Score: 0

    Typically for simple statistics, I do better staying in Python and using the Numpy package for calculation. Scipy provides a mountain of extra packages, and probably has a nice setup for exactly what you want. Certainly Python with Numpy is free, but I'll bet Scipy could be a free resource for you as well.

  25. R works with both PostgreSQL and MySQL by G3ckoG33k · · Score: 2
  26. Minitab by Anonymous Coward · · Score: 0

    Minitab

  27. python + scipy by rla3rd · · Score: 1

    if a full stats package is a bit heavy, try python + http://www.scipy.org/
    below is using the ipython shell

    In [1]: import scipy

    In [2]: x = [1,3,6,8,9,4,9,0,5,3,6,8,6,8]

    In [3]: scipy.mean(x)
    Out[3]: 5.4285714285714288

    In [4]: scipy.std(x)
    Out[4]: 2.7957693986829897

    and if you need more than that you can really delve into its stats submodule http://www.scipy.org/doc/api_docs/SciPy.stats.html.
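
    The top-level scipy.mean and scipy.std used in the session above were aliases for the NumPy functions of the same name and were deprecated in later SciPy releases, so they may not be available on a current install. A sketch of the same calculation calling NumPy directly (assuming NumPy is installed):

```python
import numpy as np

x = [1, 3, 6, 8, 9, 4, 9, 0, 5, 3, 6, 8, 6, 8]

mean = np.mean(x)
# Population standard deviation (ddof=0), the default, as in the session above.
pop_std = np.std(x)
# Sample standard deviation (divides by n - 1), which spreadsheets often report.
sample_std = np.std(x, ddof=1)

print(mean, pop_std, sample_std)
```

    The ddof argument matters when comparing against Excel: STDEV divides by n - 1, while np.std divides by n unless told otherwise.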

    1. Re:python + scipy by StripedCow · · Score: 1

      Mod parent up!

      While the examples this poster gives may seem too simple to be of much use in practice, the possibilities of using python are much greater, in the end, than learning some domain-specific language. Python has a much bigger ecosystem around it. For example, when you'd want to add a graphical user-interface, there are thousands of solutions to choose from.

      Plus, but this is personal, I think it is really a shame that the developers of those domain-specific solutions actually thought they needed to develop their own programming language. That is just plain silly.

      --
      If Pandora's box is destined to be opened, *I* want to be the one to open it.
  28. What is your Integrated Library System? by Anonymous Coward · · Score: 2, Insightful

    What is your ILS? Depending on what it is, you may already have access to just about all of what you need there along with Excel. Atriuum from Booksys has wonderful features like you are asking about, record tracking, and it exports to Excel very well. Voyager from Ex Libris had wonderful integration with Access and my boss could pull out some amazing statistics with it.

    If you don't have an ILS then seriously look at Atriuum as they are great for the smaller libraries.

    lordjim AT gmail DOT com

  29. Do you have a mathematics department? by Anonymous Coward · · Score: 0

    You may already have access to the tools you need through their licensing. Also, they may be able help get you going.

  30. Two tools I made for this... by njvack · · Score: 2

    OK, this is a horribly shameless self-plug, but hey, it's directly relevant. I started two projects aimed at tracking reference statistics: Libstats, which is PHP-based and open-source. I'm also one of the founders of Gimlet, which is hosted and closed-source, but provides a similar workflow.

    If you're looking to spend some time delving in code, Libstats is looking for maintainers -- I'm no longer working in libraries, so it's largely orphaned.

  31. Less Is More +4, Seditious by Anonymous Coward · · Score: 0

    ( Before you make the leap into "statistical" "packages" ):

    Would a spreadsheet program satisfy your needs?

    Yours In Ulanbator,
    Kilgore Trout

  32. Tableau Public by Anonymous Coward · · Score: 0

    Great package for visualizing data. ez to use. great online videos and training.
    http://www.tableausoftware.com

  33. How large is "large"? by Nutria · · Score: 1

    What you think is large might be trivial even for OpenOffice/LibreOffice.

    Also, the real solution might be to automate data collection and storage in a database. Manipulation would then sort itself out.

    If you're at a University, then you should go to the Math Dept and talk to some Statistics grad student or maybe even an econometrics grad student in the College of Business. Heck, there's probably Comp Sci undergrads looking for a project to add to their resume.

    --
    "I don't know, therefore Aliens" Wafflebox1
  34. Evergreen by Anonymous Coward · · Score: 0

    Perhaps collecting the data in a standard open source software system would be a helpful first step? http://open-ils.org/

  35. I find that ... by PPH · · Score: 2

    ... rand() serves most of my statistical needs.

    --
    Have gnu, will travel.
    1. Re:I find that ... by Anonymous Coward · · Score: 0

      rand() is not random enough!

  36. Try the JMP demo by jollespm · · Score: 2

    I use and like JMP from SAS. They offer a free 30 day demo and I think it does a good job at data visualization and statistical modeling, or as they call it, discovery. It will interface with SAS, R, Excel along with various database packages for additional capability that may not exist in the core product. I found it pretty easy to pick up with a fairly active user base to help get started.

    1. Re:Try the JMP demo by jfb2252 · · Score: 1

      I agree whole-heartedly. I've been using JMP since version 2.0. Great for exploratory data analysis. SAS differentiates it from SAS proper by limiting the data sets it can deal with to RAM, but with 4GB of RAM common these days that's not likely to be an impediment.

      Almost twenty years ago I compared the sort routine in JMP to Excel's. 30K rows, 28 columns, sort on 3 columns. JMP took about 1% of the clock time Excel did.

      Academic pricing is pretty good.

  37. DeskTracker by Anonymous Coward · · Score: 0

    Desk Tracker may be too simple a program for what you need, but I have seen it used and it does work. It even has a report section.

  38. I would suggest Rapid-I's tool suite. by sgtrock · · Score: 1

    Their product list is here. In particular, I think you would be interested in RapidMiner and RapidAnalytics. Wikipedia has a good overview of RapidMiner.

    Video tutorials for both RapidMiner and RapidAnalytics are available on their website. Those videos are a great way to get a good sense of what the product line is capable of. Searching on YouTube will find plenty more that focus on specific use cases and more advanced functionality.

    All of their software is dual licensed, with a GPL version and a closed source license available. GPLed versions of their software also have support contracts available for everything from basic troubleshooting support to full implementation. That includes both Rapid-I itself as well as partnerships with contracting companies in the U.S. and elsewhere. In addition, Rapid-I hosts a community forum that is well run and has active developer input.

    I've been using RapidMiner myself for 3 years for smaller projects. I have had occasion to use all of the free resources that I mention above. I have found them all to be very solid. The developers in particular have proven themselves to be knowledgeable and very polite. (IME, that's only to be expected of co-founders who happen to be German. :-) )

  39. Rapidminer! by Anonymous Coward · · Score: 0

    Rapidminer http://rapid-i.com/content/view/26/84/lang,en/ has a free, open-source community edition. It has an insanely easy to use, slick, very well designed graphical interface, and there are a lot of nice video tutorials (accessible directly from the help drop-down as well as from the website) for the basic functionality, to get you started. It already contains most WEKA functionality built-in, as well as a lot of other freely available algorithms of the type that you could also implement via R packages / SPSS / SAS. It can also interface directly with R via its R extension, should you need any additional R functionality that Rapidminer doesn't have (my guess is you won't for your project). I've had zero problems installing and running it on Ubuntu (from Edgy to Natty) or on Windows (from XP to 7 Pro); I have not tried Mac. It also updates itself automagically without snags via its own update mechanism on both of these platforms; it just asks you to check "ok" for the GPL for each module. It has a schweet graphical wizard for importing Excel, tab delimited text, csv, whatever and telling it what the headers etc. are, and it also has some repository/database functionality but honestly I haven't fooled with that so I can't comment.

  40. R with RKWard by binarstu · · Score: 4, Informative

    I will echo the support for the open-source statistics package R. R is incredibly powerful, and in the natural sciences it is fast becoming the standard statistics software.

    I will also echo the sentiment that, by itself, R is fairly low-level and typically requires at least some simple programming to get what you want.

    However, there is a very nice graphical front end for R called RKWard (http://rkward.sourceforge.net/). With RKWard, importing and exporting data, running basic analyses on it (descriptive statistics, linear regression, t-tests, etc.), and producing basic graphs is very straightforward and does not require detailed knowledge of the R language. Plus, RKWard is also a nice development environment for writing R code, so if you want to take your project further, you can easily do so. So, I'd recommend giving RKWard + R a look.

  41. Find out the real need and focus on that by fredrikv · · Score: 2

    It seems to me that all you need is descriptive statistics (change from last month, mean, min, max, etc and probably graphing). Using a general spreadsheet application like Excel or Calc will do the job just fine. Remember that Excel is designed to support business calculations and what you are asked to provide is exactly that! Using a dedicated statistics software for this task (in your environment) is a waste of resources. Full stop.
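
    To make "descriptive statistics" concrete, here is a small Python sketch using only the standard library. The monthly figures are invented, and the function name describe is just an illustrative choice; the same numbers are equally easy to get from spreadsheet formulas.

```python
from statistics import mean

# Made-up monthly circulation counts, oldest month first.
monthly_circulation = {
    "2011-06": 1800,
    "2011-07": 1500,
    "2011-08": 2100,
    "2011-09": 2400,
}


def describe(series):
    """Summarize a month -> count mapping (needs at least two months)."""
    values = list(series.values())
    previous, latest = values[-2], values[-1]
    return {
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
        "change_from_last_month": latest - previous,
        "percent_change": round(100 * (latest - previous) / previous, 1),
    }


for name, value in describe(monthly_circulation).items():
    print(f"{name}: {value}")
```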

    However, the problem may not be straightforward to solve in Excel or any other program. In my experience there are two main reasons:

    1. The request for data is unclear.
    Why do they "increasingly want data on various aspects of our activities"? It could be that the data you have provided so far has not provided support to decisions. Are the questions they really want answered possible to support with the data you can provide? Meet up with the actual decision makers or at least someone who knows what the statistics are actually used for and ask them WHY they need it. Is it used to support resourcing? Is it used to describe changes? Not even a university administration creates statistics for no reason. Most likely, what they really want to know is a handful of numbers like "change from last month", "overall sum", "hours spent on teaching vs information searches".

    Do this with an open mind. You will probably learn that many of the imperfections you see in the details are less important to them. When you know their true needs, suggest a package of data, graphs, free-text report or whatever is suitable. If some parts are easy to provide, be clear about that. If something is more difficult to produce, tell them that it is possible but time-consuming and costly. Get their buy-in before you spend time on producing the output.

    2. The raw data is not optimally formatted for the calculations
    First of all, if raw data quality can be improved, do that first. Update forms used for feedback, ask for output in a specific format etc. Then arrange the data and calculations in Excel to make it flexible and easy to read and troubleshoot. The trick is to structure your data and calculations in Excel in a way that is easy to follow visually and logically. In my experience it is very useful to use different tabs for data entry, data analysis and presentation.

    It seems from your examples that your input will come from a variety of sources, both manually entered and output from other systems. To get it into Excel, create separate source data tabs where you can enter or paste your raw data. For each source data tab, create a "clean up and calculate" tab where you rearrange source data and make most of the calculations. If raw data is very far from optimal or calculations are complex you may want to use several tabs or even several workbooks for this. Then create presentation tabs where you present the results from calculations in a useful format.
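
    The same separation of source data, clean-up and presentation can be sketched outside a spreadsheet too. The Python below is only an illustration of the idea: the pasted export, its column names and the function names are all invented.

```python
import csv
import io

# Made-up raw export, pasted as-is (the "source data tab").
RAW = """date,desk,questions
2011-09-01, Reference ,12
2011-09-01,circulation,30
2011-09-02,REFERENCE,
2011-09-02,Circulation,25
"""


def load_raw(text):
    """Read the pasted export into a list of dicts without changing it."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """The "clean up and calculate" step: normalize names, drop empty counts."""
    cleaned = []
    for row in rows:
        if not row["questions"].strip():
            continue  # skip rows where the count is missing
        cleaned.append({
            "date": row["date"],
            "desk": row["desk"].strip().lower(),
            "questions": int(row["questions"]),
        })
    return cleaned


def summarize(rows):
    """Total questions per desk."""
    totals = {}
    for row in rows:
        totals[row["desk"]] = totals.get(row["desk"], 0) + row["questions"]
    return totals


def present(totals):
    """The "presentation tab": a fixed-width text table, one desk per line."""
    return "\n".join(f"{desk:<12} {total:>5}" for desk, total in sorted(totals.items()))


print(present(summarize(clean(load_raw(RAW)))))
```

    Keeping the raw data untouched in its own step means a bad clean-up rule can be fixed and rerun without retyping anything.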

    I'm convinced you are suffering from both these problems. Attack them in numeric order and you are well on your way. And by all means, sign up for a course in advanced Excel that is suitable for your application. Best of luck!

    1. Re:Find out the real need and focus on that by DerekLyons · · Score: 1

      You're not suggesting a complex open-source application that will require intensive work and special skills to implement to solve a basic task? You must be new around here.

    2. Re:Find out the real need and focus on that by HolyLime · · Score: 1

      You're not suggesting a complex open-source application that will require intensive work and special skills to implement to solve a basic task? You must be new around here.

      I will concur with my colleague above.

  42. Sofa by zdammit · · Score: 2
    1. Re:Sofa by Anonymous Coward · · Score: 1

      http://www.sofastatistics.com/

      Yes this might be what you are looking for to generate your reports. Free software and comes with video tutorials

  43. GUI for R by Anonymous Coward · · Score: 0

    I don't know if anyone else mentioned the Rattle GUI for R. It can import data in a number of formats including .csv and will perform all sorts of common statistics including graph functions. The nicest thing is that as you try out new functions the interface automatically asks permission to download the appropriate CRAN packages. If there is any downside to learning R it is finding one's way through the extensive library of packages--Rattle eliminates this anxiety.

  44. r with rattle by Anonymous Coward · · Score: 0

    I think R with Rattle would be your best bet. Rattle gives a point and click interface. Also look at RStudio; it has a server version so you can run it in a browser, but its best feature is having the help panel beside the coding panel so you can look up syntax and options.

    duphenix.com

  45. R, Octave, Matlab by Virtucon · · Score: 2

    I've used them all and in terms of engineering and academia, MATLAB seems to be where most theoretical prototyping is done. The license costs for academic/student use are reasonable but it's about $2K for a commercial single seat license. Octave is the MATLAB open source alternative and for most basic functions it does well however it doesn't have the extension packages available that MATLAB does.

    My favorite and one I use all the time is "R" because it does have great open source community support and there's not a lot it can't do.

    --
    Harrison's Postulate - "For every action there is an equal and opposite criticism"
  46. JMP!!! by Anonymous Coward · · Score: 0

    JMP from SAS, Inc. would be perfect for what you are looking for.

    JMP would be a sports car, while SAS is a huge semi truck. It is the most powerful point-and-click stats package you'll come across, i.e., no languages to learn. Everything is visual. However, it is powerful enough to have a scripting language that is comparable to a lite version of SAS. SAS, SPSS, R, Matlab are overkill for what you are trying to do, like roasting a marshmallow with a rocket engine. You don't need the enterprise database incorporation, you don't need to access terabytes of data, what you need is a visual explorer of the small dataset you have. (i.e., you are not analyzing consumer spending patterns across the U.S.; you are looking at, at most, a couple hundred thousand observations).

    This is coming from someone who sits in front of a computer 8 hours a day looking at data. When you don't know what trends you are looking for and you are just exploring, having a visual interface that will scream "look at me, there's an upward trend" is much more useful than having to worry about the proper syntax for proc reg.

  47. code4lib by oneiros27 · · Score: 1

    Agreed ... odds are, they're not running homebrewed circulation software and someone in the library community has tried to extract metrics from whatever they're using.

    --
    Build it, and they will come^Hplain.
  48. What does "anything complicated" mean? + gretl by wfolta · · Score: 1

    As others have said, if you're mainly doing reports, stick with Excel or a database solution. Excel lets you look at your data from a variety of angles (pivot tables, etc), and has usable graphs. As usual, Microsoft has numerical issues, so you may get wrong answers under certain conditions, but hey, it's Excel.

    What is it that "anything complicated" means? Fancy graphs? Fancy partitioning/aggregation of data? Modeling and forecasting? Summary statistics? Graphs that aren't fancy, but Excel doesn't provide?

    An open source option that I haven't seen mentioned is gretl. It has a reasonable GUI and can make nice graphs (though not terribly customizable), give summary statistics, sample data in various ways, and do basic modeling. (It comes from an econometric world, so has quite a few time series capabilities.) If you need to do some things with time series, it would be helpful. (Though if you don't know what you're doing, it simply makes it easy to shoot yourself in the foot.)

  49. Software! by malaprohibita · · Score: 1

    I also work for a (relatively) small academic library, but our campus has free licenses for SAS and JMP. I had to go through hoops to get it (bureaucracy being what it is) but I use SAS all the time for inventory and usage data. It helps that I was a SAS programmer once upon a time, but I love it for its abilities to clean data as much as its statistical chops. Check around campus if you haven't done so already. You may find access to one or both of these to be easier than you think.

  50. More friendly faces for R by Anonymous Coward · · Score: 0

    I'm a working statistician with a fair amount of (data oriented) programming experience. I can say that R is at least initially not that friendly to use and lacks some of the conveniences of traditional statistical packages like SAS, SPSS (PASW), and STATA. Initially, I preferred to just use SAS to transform the data and then import the final data into R for analysis. There are some things that R does that SAS does not - usually things that are on the bleeding edge of statistical methods. (The project I was working on involved a rare events logistic regression - available through the Zelig package.) There are some groups that have tried to put a nicer front end on R though. R Studio and Revolution R both attempt to help with the interface a bit and are either free (R-Studio) or free for academics (Revolution R).

    I think a lot of whether or not R is right for you depends on what kind of analyses you're doing and how you get your data. If all you're doing are basic analyses and the data comes in a form that doesn't require extensive modification then R might appear tricky at first, but will likely fit your needs.

  51. Rstudio by rmcd · · Score: 2

    If you do go with R, be sure to check out Rstudio (rstudio.org), which is a very nice front-end for R.

    In response to the posters who tell you that R is low quality because it's open source, I can tell you that's nonsense. I have Stata, Matlab, and R on my machine, and access to SAS on a research server. There are times to use each, but all else equal I use R. It's not trivial to learn, but it's a powerful high-quality piece of software, widely used in the statistics community. Whether it's appropriate for your use depends on you and the task. But it's great software.

  52. MYSTAT (or SYSTAT) by dereference · · Score: 1

    Sounds like R might be a bit much for your needs.

    Agreed. Another good alternative is MYSTAT, the free "student" version of SYSTAT. Note also that many academic institutions negotiate site licenses for SYSTAT, so you might already have the full version available to you.

  53. stats and graphing package by Anonymous Coward · · Score: 0

    Graphpad Prism

  54. Re: What Exactly Are You Doing? by Anonymous Coward · · Score: 0

    I suspect that what the original poster is doing is trying to cope with bad management.

    Anyone who demands complex statistical analysis from a small academic library is creating make-work that will most likely result in diversion of resources from more meaningful tasks.

    Subservience to management hubris is probably being prioritized over service to patrons, or they would have assigned additional staff (some math students, perhaps?) to do this work, and not dumped it on the librarians.

    The phrase paralysis by analysis comes to mind....

  55. Yellowfin by sproose · · Score: 1

    Not open-source but very easy to generate reports off relational data sources. http://www.yellowfin.bi/

  56. What are you trying to do? by Registered+Coward+v2 · · Score: 0

    It would help if you provided some more details about what you are trying to do. What sort of statistics are you looking to analyze? Or are you just collecting statistics such as average number saved, variations by month, etc.? From your post it sounds like you are looking more for activity statistics than statistical analysis, in which case a stat package would just add unneeded complication to your efforts.

    A stat package isn't going to make it any easier to analyze the data; it'll just make it easier to generate results based on large data sets. With Excel, I've found it easier to break down the analysis into separate spreadsheets and then link them to get results in one sheet. This cuts down calculation time since you are not dealing with a large amount of data in one worksheet.

    --
    I'm a consultant - I convert gibberish into cash-flow.
  57. Sweave = LaTeX (or HTML) + R by Anonymous Coward · · Score: 0

    As I understand your workflow, you need a system that automatically collects data generated by your ILS and makes routine (but good looking) reports out of it, from very explicit instructions that can be audited later. You should consider a system like Sweave. You write a template for the report, plus the explicit code in R for the analysis. And let it run at whatever time or frequency you want. SAS may let you do it, with very expensive products. And SPSS is just a mess. R is miles ahead of them.
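
    The "template plus explicit analysis code" idea can be sketched in a few lines. This is not Sweave itself, which weaves R code into LaTeX; it is a rough Python analogue using the standard library's string.Template, with invented field names and sample numbers.

```python
from statistics import mean
from string import Template

# The report template: fixed wording with placeholders for computed values.
REPORT_TEMPLATE = Template(
    "Library activity report for $period\n"
    "Classes taught: $classes\n"
    "Average students per class: $avg_students\n"
)


def build_report(period, class_sizes):
    """Run the analysis and fill the template; rerun whenever new data arrives."""
    return REPORT_TEMPLATE.substitute(
        period=period,
        classes=len(class_sizes),
        avg_students=f"{mean(class_sizes):.1f}",
    )


# Made-up class sizes for one month.
print(build_report("2011-09", [18, 22, 25, 15]))
```

    Because both the wording and the calculation live in one script, the script itself serves as the auditable record of how each number was produced.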

  58. Python(x,y) by gizmo_mathboy · · Score: 1

    As a Perl guy, Python(x,y) has a complete scientific computing package. While Perl and Ruby can do these things, Python(x,y) does it in a slick way.

    It is a Windows only package as far as I can tell.

    Perl, Python and Ruby can deal with Excel and R but Python(x,y) provides a nice interface for everything.

  59. I vote for R by Anonymous Coward · · Score: 0

    head over to https://stackoverflow.com/tags/r and put your rant there. surely you'll grab someone's attention...

  60. DeduceR or R Commander by belg4mit · · Score: 1

    PSPP is a nice idea, but lacks functionality. SPSS is ridiculously priced, even with IBM's "discount" for non-profits. DeduceR and R Commander give you access to the full power of R under an R GUI. DeduceR gives you spreadsheet-like data entry and basic stats, and then you can load R Commander for a menu-driven interface to more advanced functions.

    --
    Were that I say, pancakes?
  61. Overkill? Try Epi Info 7 by Anonymous Coward · · Score: 0

    I am a public health physician and an epidemiologist. I have used excel, SPSS, R, SAS, and python's numpy and scipy. I currently use R almost exclusively. I love it but the learning curve is steep and is probably overkill for what you need. I also agree with the comment above regarding statistical analyses versus generating reports.

    Take a look at Epi Info 7. It was originally conceived for managing data related to disease outbreaks but is quite versatile (data is data is data). It is newly released and FREE from the CDC. The DOS version was awesome and revolutionary for public health worldwide. The later versions were awful but Epi Info 7 is back on the right track. It is available cross platform, can be run off a thumb drive, can create data entry forms, supports double data entry, handles spatial mapping, does basic and not so basic statistics and can pull or push data from an SQL server. If your data is already in a database then this would still be helpful for creating reports, generating descriptive statistics, stratifying your data and doing more complicated analyses if needed.

    Good luck!

  62. PowerPivot by Anonymous Coward · · Score: 0

    Microsoft has a free add-on to Excel called PowerPivot. Sure, it's not going to be as complex as say SAS, but it's way cheap- if you have Excel, it's free. Search for it, try it out. You'll be up and running in minutes and it may prove powerful enough for your needs. If you're already familiar with Pivot tables, PowerPivot takes it to the next level. It's very SQL integrated, but will work with data from nearly any source, and you can combine data from many differing sources. Good luck :)

  63. Try RapidMiner by Anonymous Coward · · Score: 0

    Well, why not use RapidMiner? It not only does stats, but also will allow you to do text analysis (data mining). It has an extension to run R as well... It is open source. Here is a link: RapidMiner

  64. Start with a good book.... by khb · · Score: 1

    Data mining with Rattle and R .... http://rattle.togaware.com/

    Most librarians were probably not math majors, and are unlikely to be expert in statistics. But if you can work your way through the book, you may get enough insight into your data to ask good questions from a local Math department. No doubt some graduate student(s) can get a paper out of it, or at least some applied class project credit.

    But if you don't understand what it is you are looking for, you probably won't coax them into figuring out what questions you ought to be asking. So start with the book.

    While a free version is on the site, support the work by buying a hardcopy for the library ;>

    1. Re:Start with a good book.... by HolyLime · · Score: 1

      Data mining with Rattle and R .... http://rattle.togaware.com/

      Most librarians were probably not math majors, and are unlikely to be expert in statistics. But if you can work your way through the book, you may get enough insight into your data to ask good questions from a local Math department. No doubt some graduate student(s) can get a paper out of it, or at least some applied class project credit.

      But if you don't understand what it is you are looking for, you probably won't coax them into figuring out what questions you ought to be asking. So start with the book.

      While a free version is on the site, support the work by buying a hardcopy for the library ;>

      I am actually curious as to how many librarians have math degrees. I have only met 2 so far; myself and my former professor back in grad school.

  65. What about using a CRM? by xof · · Score: 1

    Businesses use Customer relationship management systems. These tools also provide statistics.

  66. Spying ? by luk3Z · · Score: 0

    Wait a minute, does your administration want to spy on you? Why do they want this data?

    --
    Recipes for USA bankrupt - http://tinypaste.com/0d66f dd = dollar deluge (printed in the infinity)
  67. Tableau by Anonymous Coward · · Score: 0

    If you want something with powerful analysis and a drop dead easy interface, try Tableau. Not free, but definitely worth the money.

  68. SAS/SPSS are Analytic packages by Anonymous Coward · · Score: 0

    They are meant for modeling data and finding relationships between variables - think finding cross-sell opportunities or look-alike models. What you really need is a reporting solution, not a modeling solution. Check out Tableau or Qlik or LogiXML.

    gl

    1. Re:SAS/SPSS are Analytic packages by rjune · · Score: 1

      I have been using SAS for about 3 years and it is not suitable based on the poster's requirements:
      1. "I also place emphasis on anything that is open source and easy to implement": SAS is not open source and has a tremendously steep learning curve.
      2. "allow me to bypass the convoluted purchase approval process.": It is licensed directly from SAS; you won't find it at CDWG or JourneyEd etc.
      SAS requires a lot of effort, but will produce tremendous results. Since this is a relatively small part of the poster's job, it doesn't look like it would be a time- or cost-effective approach.

  69. You sure you need so many stats?? Stick with Excel by Anonymous Coward · · Score: 0

    SAS and SPSS (and others mentioned here like R, MATLAB, Octave, etc.) are tools used in highly complicated statistical analysis. I know so, 'cause my wife has a PhD in Biology, has written several papers in peer-reviewed journals, and has to deal with lots of statistical analysis. Even for her work, she does lots of things with Excel and the already mentioned XLStats complement (I believe it's something like a set of macros for doing statistical analysis). She will only go to SPSS or MATLAB (other software she uses is Primer-e, Surfer and a bit of SAS) when she needs to do more complicated stuff...

    I think you are going to complicate your daily task with this, buddy. Stick with Excel, LibreOffice, OpenOffice or whatever spreadsheet you want.

  70. This will depend on what you're trying to do... by rikhei · · Score: 1

    You might want to take a look at LibAnalytics (full disclosure: I work for Springshare). If you're actually trying to do statistical analysis, then I think others' recommendations will serve you better - but if you're looking for a way to track many different sorts of data and generate reports, then I think LibAnalytics would serve you very well.

  71. R is available for Free with Linux by lsatenstein · · Score: 1

    Subject line fills in the blanks omitted in other responses.

    --
    Leslie Satenstein Montreal Quebec Canada
  72. Convert to Evergreen open source by Anonymous Coward · · Score: 0

    Convert your entire library system to Evergreen (http://www.open-ils.org), a highly-scalable software for libraries that helps library patrons find library materials, and helps libraries manage, catalog, and circulate those materials, no matter how large or complex the libraries. Evergreen is open source software, freely licensed under the GNU GPL, and is already deployed at major libraries around the world.

    With Evergreen, you have better control of your library than you'll get from the various vendor-supported library circulation solutions. Because it is open source, you can implement yourself whatever features you need. Because it is being actively supported and extended by several major libraries, if you do not have the resources to extend the system, you can work with libraries that do have resources who may also be interested in implementing your feature.

    Evergreen is being used in King County Library System (KCLS) (http://www.kcls.org), the largest library system in the US by several measures, and the Library Journal's 2011 Library of the Year. KCLS is committed to developing, using, extending, and sharing the Evergreen system. KCLS is collaborating and coordinating the development and deployment of Evergreen with a dozen other major library systems around the US and the world.

  73. Sound more like a low cost datamart with stats by obscuro · · Score: 1

    Here's something that might give you a direction.

    DataMart-Tool.pdf

    For the kind of constant, ad hoc, data nagging you're talking about, the above would be a good start. Using something like MSSQL Analysis Services with Excel on top.... Or (get ready to feel really dirty) MS Access pivot tables....

    I haven't pointed someone toward Microsoft in a while, but if you don't have serious programming chops your best bet is cubes in Access, and they will take you quite far. They are extremely similar to Excel Pivot Tables. If you haven't explored Pivot Tables in Excel, there's a possibility that half your battles will be won there on the data as you currently have it.

    A more static but open source solution at the scale of Excel Pivot Tables is OpenOffice data pivot functionality. In fact, you might be able to put something together where OpenOffice Calc sits on top of MySQL! That gets you database, spreadsheet and pivot. Any super serious stats can be handled in R....
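
    For anyone unsure what a pivot table actually computes, here is a bare-bones Python sketch of the operation: group rows by one field, spread a second field across columns, and sum a third. The records and the pivot function are invented for illustration and are not how Excel, Access or OpenOffice implement it.

```python
from collections import defaultdict

# Made-up checkout records: one row per month and branch.
records = [
    {"month": "2011-09", "branch": "Main", "checkouts": 120},
    {"month": "2011-09", "branch": "Annex", "checkouts": 40},
    {"month": "2011-10", "branch": "Main", "checkouts": 150},
    {"month": "2011-10", "branch": "Annex", "checkouts": 35},
    {"month": "2011-10", "branch": "Main", "checkouts": 10},
]


def pivot(rows, row_key, col_key, value_key):
    """Sum value_key for every (row_key, col_key) combination."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[row_key]][row[col_key]] += row[value_key]
    # Convert nested defaultdicts to plain dicts for easier printing.
    return {key: dict(cols) for key, cols in table.items()}


for month, by_branch in sorted(pivot(records, "month", "branch", "checkouts").items()):
    print(month, by_branch)
```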

    --
    Every rule has more than one consequence.
  74. Orange by Anonymous Coward · · Score: 0

    Orange ( http://orange.biolab.si/ ) is an open-source app/framework for data mining, analysis and visualization. It's written in Python so it's as much a framework as an application.

  75. Not an answer but a request by TheLoneGundam · · Score: 1

    This is not the answer you're looking for... Could you post a reply here with whatever you chose to do?