Is Python a Legitimate Data Analysis Tool?
Back in May we discussed using Python, R, and Octave as data analysis tools, and compared the relative strengths of each. One point of contention was whether Python could be considered a legitimate tool for such work. Now Bei Lu writes that while Python on its own may be lacking, Python with packages is very much up to the task: "My passion with Python started with its natural language processing capability when paired with the Natural Language Toolkit (NLTK). Considering the growing need for text mining to extract content themes and reader sentiments (just to name a few functions), I believe Python+packages will serve as more mainstream analytical tools beyond the academic arena." She also discusses an emerging set of solutions for R which let it better handle big data.
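To make the text-mining idea a little more concrete, here is a minimal sketch of pulling rough "content themes" out of text by counting words. It uses only the standard library (`re`, `collections.Counter`), not NLTK; the stop-word list and sample text are made up for illustration, and NLTK provides much more (proper tokenizers, stop-word corpora, and so on).

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; real text mining would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "for", "with"}

def top_terms(text, n=3):
    """Return the n most frequent non-stop-words in text, lowercased."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    # Ties are ordered by first appearance in the text.
    return counts.most_common(n)

sample = (
    "Python is great for text mining. Text mining with Python finds themes, "
    "and themes in reader comments reveal sentiment."
)
print(top_terms(sample))  # → [('python', 2), ('text', 2), ('mining', 2)]
```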
Any Turing-complete language is a legitimate data analysis tool.
Python may not be a legitimate data analysis tool, but it is widely used for data analysis, and it gives the right results. For the most part that is what really matters.
Just because you are paranoid does not mean that no-one is out to get you.
But I put most of my data analysis stuff in a database and retrieve it with SQL. The language makes little difference. There are also people who swear by Excel and use only Excel for their calculations.
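A rough sketch of that workflow with Python as the driver: the standard-library `sqlite3` module stores the rows and SQL does the aggregation. The `sales` table and its values are hypothetical.

```python
import sqlite3

# In-memory database for the sketch; a real setup would point at a file or server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 30.0), ("south", 20.0)],
)

# SQL does the grouping and arithmetic; Python only runs the query.
rows = conn.execute(
    "SELECT region, SUM(amount), AVG(amount) FROM sales "
    "GROUP BY region ORDER BY region"
).fetchall()
conn.close()

for region, total, mean in rows:
    print(region, total, mean)
```

The same query would work unchanged from any other language with a SQLite binding, which is rather the commenter's point.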
It doesn't matter where you apply the math.
No.
Someone who knows so little about tools like R, python, etc. should spend their time learning about what is available rather than writing articles on the topic using their own cursory knowledge.
Since people do use Python for data analysis (hence the data-analysis packages that are available), of course it's legitimate.
Just like when you're standing on the roof and need to pound in a couple of nails, that heavy pair of pliers in your pocket is a legitimate tool. It may not be the best tool for the job (that might be a pneumatic nail gun), but if pliers are what you have with you and what you know how to use, then that's the right tool. Why spend time and money learning some other "more appropriate" language (or buying an air compressor and nail gun) when you already have a tool at your fingertips that will do what you need?
As your needs grow you might need to find another more appropriate tool, but if you can get the job done with Python, why bother searching for the "perfect" tool?
Depending on your needs, sh, awk, sed, sort, and uniq may be all the tools you need. Many log parsing, analysis, and reporting programs have been written with those tools, often ingesting more rows of data per day than many small-business BI systems.
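For comparison, the classic `awk '{print $1}' access.log | sort | uniq -c | sort -rn` idiom for counting hits per client has a short Python counterpart built on `collections.Counter`. The simplified log format and the lines below are invented for the example; ties may also come out in a different order than `sort -rn` would produce.

```python
from collections import Counter

# Hypothetical access-log lines in a simplified "IP METHOD PATH STATUS" format.
LOG = """\
10.0.0.1 GET /index.html 200
10.0.0.2 GET /about.html 200
10.0.0.1 GET /missing 404
10.0.0.3 GET /index.html 200
10.0.0.1 POST /login 200
"""

def hits_per_ip(lines):
    """Count requests per client IP (first field), most frequent first."""
    counts = Counter(line.split()[0] for line in lines if line.strip())
    return counts.most_common()

for ip, n in hits_per_ip(LOG.splitlines()):
    print(f"{n:7d} {ip}")
```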
No.
Working link for subject.
In other news, how hot is vehicle theft in your area?
Of course.
Tomorrow on slashdot:
"Can all questions in headlines be answered by 'no'?"
http://xkcd.com/353/
Most, but not all.
William of Ockham had no beard. The most likely explanation is that it was chewed off by squirrels every morning.
I looked at R and it's one of the most deranged languages I've ever seen in terms of syntax (up there with Erlang). At least Python is readable to the average programmer who knows C or Java.
It's a language, not a tool, although languages are tools to communicate with.
Analysis is a function of math, not language.
I work in the biosciences and we occasionally have a similar discussion.
In our context, it isn't about how one analyzes the data; it is a question of how anyone else can recreate your experiment: that is, set up the experimental system, acquire the data, and analyze it, yielding approximately the same results. It is in our best interest [and mandated by our funding agency and the journals] to publish papers that clearly define how we made our observations and how we analyzed the data.
My group concludes that any tool is fine, but it must be part of a well-described logical framework in which we generate a hypothesis, test it, and draw a conclusion.
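One way to make an analysis script easier for someone else to re-run, sketched under the assumption that the analysis draws random numbers: seed a private generator and save the seed and interpreter version alongside the result. `run_analysis` and `provenance` are hypothetical helpers, not part of any standard framework.

```python
import json
import platform
import random
import sys

def run_analysis(seed=42, n=1000):
    """Hypothetical analysis step: mean of a resample of synthetic data."""
    rng = random.Random(seed)  # private, seeded generator: same seed, same run
    data = [rng.gauss(10.0, 2.0) for _ in range(n)]
    resample = [rng.choice(data) for _ in range(n)]
    return sum(resample) / n

def provenance(seed):
    """Record what someone would need to rerun the analysis."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "seed": seed,
    }

result = run_analysis(seed=42)
print(json.dumps({"result": result, **provenance(42)}, indent=2))
```

Logging the version is more than bookkeeping: the `random` module only guarantees the same sequence from a given seed for `random()` itself, so methods such as `gauss()` may change between Python releases.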
The language of choice depends on the community. If everyone is using MUMPS, you should use MUMPS (http://en.wikipedia.org/wiki/MUMPS). You want to be able to share your work with the rest of the community.
One application I am aware of where Python is widely used is bioinformatics.
http://onlamp.com/pub/a/python/2002/10/17/biopython.html
Having said the above, Python has a lot to recommend it. It has become the initial teaching language of choice. There will be some people whose only language is Python. That's OK. It scales and can be used within almost any programming paradigm. If your only language is Python and you don't have much data to process, why would you bother learning something like R? http://en.wikipedia.org/wiki/R_(programming_language)
"legitimate" is such a disrespectful value judgment. Are you saying that people who do data analysis with Python are illegitimate? Are you calling them bastards?
No, seriously, you can have a profitable conversation all about the reasons why you think there are serious drawbacks to using Python as your data analysis tool. Lots of people might benefit from that. But when you start saying things like "That's not a legitimate data analysis tool" or "That's not a real programming language" or whatever, then you are getting down into contentless arguing, passing off disrespect as if it were legitimate discourse.
If you really think using Python as a data analysis tool is that bad, go all the way: don't try to have a serious discussion of the subject; turn it into a humorous essay about people who are so stupid and unenlightened that they can't see what is blindingly obvious to you.
A long time ago in my academic life, I took a neural networks class that did a lot of data analysis with MATLAB. I poked around with Octave, but I finally wound up writing my projects in Perl with PDL. I'm sure not many people would do that, but I just wanted the learning experience. It was legitimate for my purposes, which were learning and the joy of being able to say I did it. But you might want to mock me for it. :)
Secession is the right of all sentient beings.
If you're seriously asking this question you're over your head.
Just ask the astronomy community. They've been moving away from IDL as an analysis environment and towards the use of python with scipy (with numpy and pyfits offering similar performance). You're asking this question several years after it's already been effectively declared as such.
Is Python a Legitimate Data Analysis Tool?
Python is not a data analysis tool. It's a programming language.
You would use python (or any other reasonable programming language) to BUILD a data analysis tool.
But that doesn't mean that python IS a data analysis tool.
I can only imagine what your next question will be: "Can the x86 processor be used to run a data analysis program?" Asking such questions demonstrates a deep misunderstanding about how computing is organized into layers.
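To illustrate the distinction being drawn, here is a tiny "tool" built with the language: a hypothetical `describe` function layered on the standard `statistics` module.

```python
import statistics

def describe(values):
    """A minimal summary-statistics tool built on Python's standard library."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample (n - 1) standard deviation
    }

print(describe([2, 4, 4, 4, 5, 5, 7, 9]))
```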
Can Betteridge's Law cause a paradox?
This story seems like an echo of the one a day or so ago about Linux being critical to the success of the LHC. Something with generic programmability supports something specific, then gets discussed as a tool for that specific task. Probably a lot of the comments there apply here.
Python and Perl make great data analysis tools.
They have a plethora of libraries to handle things: NumPy/SciPy for Python and PDL/GSL for Perl.
They can access FORTRAN and C libraries as necessary for either performance or legacy needs.
They are probably best because they are high-level languages, very platform-neutral, and cost significantly less than other "serious" data analysis tools/languages.
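As a small illustration of the point about C libraries, the standard `ctypes` module can call a compiled C function directly. This sketch assumes a Unix-like system where the C math library (`libm`) can be located; NumPy and SciPy do this kind of bridging at much larger scale through their own compiled C and Fortran extensions.

```python
import ctypes
import ctypes.util

# Locate the C math library (assumes a Unix-like system such as Linux or macOS).
libm_path = ctypes.util.find_library("m")
libm = ctypes.CDLL(libm_path)

# Declare the C signature double cos(double) so ctypes converts values correctly.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0))  # → 1.0
```

Declaring `argtypes` and `restype` matters: without `restype`, ctypes assumes the function returns a C `int` and the result would be garbage.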
No.
"No." is the correct answer. That headline is just wrong.
Yes, absolutely. It's being used for all sorts of data analysis in the real world.
Check out Pandas (http://pandas.pydata.org/) the Python data analysis library.
There are also lots of machine learning libraries; scikit-learn is probably the best known (http://scikit-learn.org/).
Both of these are built on NumPy.
You should also check out the videos from the 2012 PyData workshop: http://marakana.com/s/2012_pydata_workshop,1090/index.html
Add NumPy, SciPy, scikit-*, and pyplot to make it comfortable, and perhaps pyr, since you mention R. One physics PhD in this household was defended with the help of Python.
The question is not whether or not it is possible but whether or not it is realistic and practical.
Not only is it realistic and practical but it is already in use for data analysis! Everyone on the ATLAS experiment at CERN uses python to some degree in their analysis and my grad students and I use an analysis framework almost entirely in Python with ROOT for I/O.
I love R and Python. However, both of them choke on big data sets. What they need is a built-in mechanism to store data on disk rather than in memory. There are some really convoluted ways of doing this, but they don't always work with modeling packages that weren't written with your particular convoluted approach in mind. So if the base language could store objects on disk, say with a simple flag, and that were transparent to the rest of the system, most downstream libraries and packages would still work.
The ff package in R is a good approach; maybe it should be adopted as the memory model for R.
I hate to say this but maybe R/Python can learn something from SAS here.
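Short of the transparent on-disk flag wished for above, the usual workaround is to keep the data in a file and stream through it in chunks, so memory use stays bounded however large the file gets. A minimal standard-library sketch, assuming the data are plain doubles in a raw binary file; `write_doubles`, `streaming_mean`, and `CHUNK_ITEMS` are hypothetical names. (NumPy's `numpy.memmap` is one existing file-backed array type that comes closer to the transparent approach.)

```python
import array
import os
import tempfile

CHUNK_ITEMS = 4096  # how many doubles to hold in memory at once

def write_doubles(path, values):
    """Append values to disk as raw machine doubles, one chunk at a time."""
    with open(path, "wb") as f:
        buf = array.array("d")
        for v in values:
            buf.append(v)
            if len(buf) == CHUNK_ITEMS:
                buf.tofile(f)
                buf = array.array("d")
        buf.tofile(f)  # flush the final partial chunk (may be empty)

def streaming_mean(path):
    """Mean of a file of doubles, never holding more than one chunk in memory."""
    itemsize = array.array("d").itemsize
    total, count = 0.0, 0
    with open(path, "rb") as f:
        while True:
            raw = f.read(CHUNK_ITEMS * itemsize)
            if not raw:
                break
            chunk = array.array("d")
            chunk.frombytes(raw)
            total += sum(chunk)
            count += len(chunk)
    return total / count if count else float("nan")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "values.bin")
    write_doubles(path, (float(i) for i in range(1_000_000)))
    mean = streaming_mean(path)
print(mean)  # → 499999.5
```

This only helps for computations that can be done incrementally, such as sums, means, and counts; modeling packages that expect the whole data set in memory still won't work, which is exactly the complaint.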