Slashdot Mirror


Is Python a Legitimate Data Analysis Tool?

Back in May we discussed using Python, R, and Octave as data analysis tools, and compared the relative strengths of each. One point of contention was whether Python could be considered a legitimate tool for such work. Now, Bei Lu writes that while Python on its own may be lacking, Python with packages is very much up to the task: "My passion with Python started with its natural language processing capability when paired with the Natural Language Toolkit (NLTK). Considering the growing need for text mining to extract content themes and reader sentiments (just to name a few functions), I believe Python+packages will serve as more mainstream analytical tools beyond the academic arena." She also discusses an emerging set of solutions for R that let it better handle big data.

18 of 67 comments

  1. really? by Anonymous Coward · · Score: 2, Interesting

    Any Turing-complete language is a legitimate data analysis tool.

    1. Re:really? by Meshach · · Score: 2

      Any Turing-complete language is a legitimate data analysis tool.

      The question is not whether or not it is possible but whether or not it is realistic and practical.

      --
      "Maybe this world is another planet's hell"
      Aldous Huxley
    2. Re:really? by Billly Gates · · Score: 3, Funny

      No the question is whether it is legitimate.

      In that case, Excel, because you can email it and share it with colleagues, and it is PHB approved.

    3. Re:really? by cynyr · · Score: 2
      --
      All of the above was encrypted with a Quad ROT-13 method. Unauthorized decryption is in violation of the DMCA.
  2. It Works by mrsquid0 · · Score: 4, Insightful

    Python may not be a legitimate data analysis tool, but it is widely used for data analysis, and it gives the right results. For the most part that is what really matters.

    --
    Just because you are paranoid does not mean that no-one is out to get you.
    1. Re:It Works by mcgrew · · Score: 5, Insightful

      Python is a language. It's a tool to build other tools with, including data analysis tools.

    2. Re:It Works by ceoyoyo · · Score: 4, Insightful

      What does "legitimate data analysis tool" mean? MATLAB was included in the comparison, and MATLAB is more of an engineering tool. The built-in (excuse me, optional paid-for) stats library is pretty limited.

      R is great for doing statistical analysis, but it's not great for doing things like image analysis. Without additional libraries R isn't nearly as good as it is with libraries either.

    3. Re:It Works by Instine · · Score: 2

      or use other libraries easily and quickly. PyCUDA gives genuinely huge number-crunching power to the language. It also allows metaprogramming, which suits scripting languages and machine learning very well. http://mathema.tician.de/dl/pub/nvidia-gtc-2009.mp4

      The readability, flexibility, and speed of development are what it brings; the raw power comes from the libraries it can talk to.

      --
      Because you can - or because you should?
    4. Re:It Works by roman_mir · · Score: 3, Funny

      What does "legitimate data analysis tool" mean?

      - obviously it means to ask whether Python is legitimate or a bastard; what do you think it means? It is not asking whether Python is a 'data analysis tool', it is asking whether Python is a legitimate something or other.

      So to answer the question you have to look at Python's ancestry. You'll quickly discover that Python was actually conceived in a huge orgy of different programming paradigms, styles and languages; it's even named after a circus!

      I believe the answer is that Python is a bastard of data analysis tools, but so what, bastards are people too.

  3. Re:Call me old fashioned by ceoyoyo · · Score: 5, Interesting

    It depends how complicated the math is.

    I wrote a general linear model in Python because I was unhappy with the existing ones and I wanted an intimate knowledge of how it worked. I wrote most of a general linear mixed model, but then decided it wasn't worth the time and just used the one in R via RPy2. Then it turned out the one built into R was too slow, so I upgraded to the one in the lme R package. That exists because a lot of smart people use R.

    But sure, if your "data analysis" involves multiplication and maybe a t-test or two, it doesn't really matter what you use.
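
For readers wondering what the simplest version of such a model looks like, here is a minimal sketch of an ordinary least-squares fit, the core of a general linear model, in plain Python. It is not the commenter's code: the function name `fit_linear_model` and the sample data are made up for illustration, and real work would more likely lean on NumPy or R.

```python
def fit_linear_model(X, y):
    """Solve the normal equations (X^T X) b = X^T y by Gauss-Jordan elimination.

    X is a list of rows (include a leading 1.0 for an intercept);
    y is the list of observed responses.
    """
    n_params = len(X[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(n_params)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(n_params)
    ]
    for col in range(n_params):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n_params), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        A[col], A[pivot] = A[pivot], A[col]
        # Scale the pivot row so the pivot becomes 1.
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        # Eliminate this column from every other row.
        for r in range(n_params):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][n_params] for i in range(n_params)]


# Hypothetical data generated from y = 1 + 2x with no noise.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
coefficients = fit_linear_model(X, y)  # [intercept, slope]
```

Solving the normal equations directly is fine for a toy case like this; it becomes numerically fragile for ill-conditioned data, which is one reason to prefer an established library.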

  4. Use what works by hawguy · · Score: 5, Insightful

    Since people do use Python for data analysis (hence the data-analysis-related packages that are available), of course it's legitimate.

    Just like how when you're standing on the roof and you need to pound in a couple of nails, that heavy pair of pliers in your pocket is a legitimate tool. It may not be the best tool for the job; the best tool might be a pneumatic nail gun. But if all you have with you, and all you know how to use, is pliers, then that's the right tool. Why spend time and money learning some other "more appropriate" language (or buying an air compressor and nail gun) when you already have a tool at your fingertips that will do what you need?

    As your needs grow you might need to find another more appropriate tool, but if you can get the job done with Python, why bother searching for the "perfect" tool?

    Depending on your needs, sh, awk, sed, sort, and uniq may be all the tools you need - many log parsing, analysis and reporting programs have been written with those tools, often ingesting more rows of data per day than many small business BI systems.
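
As a rough Python counterpart to a `sort | uniq -c | sort -rn` pipeline, the sketch below counts status codes in a handful of log lines. The log layout (status code as the second-to-last field) and the `count_status_codes` helper are assumptions made up for illustration, not a real log format.

```python
from collections import Counter


def count_status_codes(lines):
    """Count status codes, assumed to be the second-to-last whitespace-separated field."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:  # skip blank or malformed lines
            counts[fields[-2]] += 1
    return counts


# Hypothetical log lines: method, path, status code, response size.
sample_log = [
    "GET /index.html 200 1043",
    "GET /missing 404 512",
    "POST /login 200 87",
    "GET /index.html 200 1043",
]
status_counts = count_status_codes(sample_log)

# Print the most frequent codes first, like `uniq -c | sort -rn`.
for status, count in status_counts.most_common():
    print(count, status)
```

For a one-off report the shell pipeline is usually shorter; the Python version starts to pay off once the parsing rules or the follow-up analysis get more involved.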

  5. Re:http://en.wikipedia.org/wiki/Betteridge's_Law_o by jdgeorge · · Score: 3, Funny
  6. Re:http://en.wikipedia.org/wiki/Betteridge's_Law_o by godrik · · Score: 2

    Tomorrow on slashdot:

    "Can all questions in headlines be answered by 'no' ?"

  7. Python can do anything by Anonymous Coward · · Score: 5, Funny

    http://xkcd.com/353/

  8. Re:Better than R by ceoyoyo · · Score: 3, Informative

    R is MUCH nicer when you use it through a bridge from Python.

  9. Re:Call me old fashioned by hey! · · Score: 3, Insightful

    Alright, you're old-fashioned. And you're mixing up apples and oranges.

    I think what most people these days are talking about is not just having some kind of online analytics data resource, but having a system where having that resource is taken as a given and the task is to use mathematics and AI to classify records, discover patterns and relationships, locate unusual data (without necessarily specifying the nature of the anomaly in advance), and whatnot.

    A spreadsheet is fine for doing simple summaries of small, heterogeneous, tabular datasets (calculating averages and whatnot). But it's not going to help you find one record out of millions where your search criteria are too complex to be expressed in a SQL WHERE clause.
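
One small illustration of "locating unusual data" without describing the anomaly in advance is a robust outlier check. The sketch below flags values by their modified z-score, computed from the median and the median absolute deviation. The `find_outliers` name and the sample readings are hypothetical, and the 0.6745 scale factor and 3.5 cutoff are a common convention rather than anything taken from this thread.

```python
from statistics import median


def find_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold."""
    med = median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no measurable spread, so nothing can be flagged this way
    return [
        i
        for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical sensor readings with one obviously odd value.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 42.0, 10.2]
outlier_indices = find_outliers(readings)
```

This only covers a single numeric column; the kind of pattern discovery described above would typically combine many fields and far more elaborate models.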

    --
    Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
  10. Perl and Python both by gizmo_mathboy · · Score: 2

    Python and Perl make great data analysis tools.

    They have a plethora of libraries to handle things: NumPy/SciPy for Python and PDL/GSL for Perl.

    They can access FORTRAN and C libraries as necessary for either performance or legacy needs.

    They are probably best because they are high-level languages, very platform neutral, and cost significantly less than other "serious" data analysis tools/languages.

  11. Re:http://en.wikipedia.org/wiki/Betteridge's_Law_o by VortexCortex · · Score: 2

    No.

    Working link for subject. In other news, How hot is vehicle theft in your area?

    "No." is the correct answer. That headline is just wrong.