Python Gets a Big Data Boost From DARPA
itwbennett writes "DARPA (the U.S. Defense Advanced Research Projects Agency) has awarded $3 million to software provider Continuum Analytics to help fund the development of Python's data processing and visualization capabilities for big data jobs. The money will go toward developing new techniques for data analysis and for visually portraying large, multi-dimensional data sets. The work aims to extend beyond the capabilities offered by the NumPy and SciPy Python libraries, which are widely used by programmers for mathematical and scientific calculations, respectively. The work is part of DARPA's XData research program, a four-year, $100 million effort to give the Defense Department and other U.S. government agencies tools to work with large amounts of sensor data and other forms of big data."
I get the impression that in the Engineering and Scientific community Python is the new Fortran. I hope so, because it would be "Fortran done right".
So is this going to focus on Python 2 or 3? Might be a reason to upgrade..
Every experiment which ends in a big bang is a good experiment.
The put the money in the wrong place. They should have put it in to R which very popularly interfaces with Python.
As a full time Python developer for going on 6 years this is good to hear! Now if we can get a Python-lite to replace Javascript in the browser.
Yeah the govt needs better systems to manage the huge databases and dossiers they are building on everybody with their warrentless wiretaps and reading everybody's emails. Anybody who helps with this project is pretty damn naive if they don't think it will also be used for this.
For that matter anybody who trusts the govt and thinks the govt is your friend is pretty damn naive. Yeah I would like to believe that too. No I won't ignore the mountains of evidence to the contrary. I won't treat all the counterexamples as isolated cases. I see them for what they are: an amazingly consistent pattern. The rule, not the exception. Govt positions are really attractive to sociopath types who just love power and control and a feeling that they are important and they get that feeling by imposing their will on us.
So what you are saying is that DARPA funds will be used in a way to further the goals of DARPA/The government? Shocking. I haven't read anything that says which agencies will/won't have access to these tools - so I'd hazard a guess that any department that wants it can have it (including the famous three letter agencies).
FYI, Continuum Analytics is a company that is based on providing high-performance python-based computing to clients. Any packages they might release will either be open source (and can be checked), or closed source (in which case you don't have to use it). They aren't hijacking the Numpy/Scipy libraries. They are developing libraries/tools for a client (who happens to be DARPA). (Frankly, I'd hope that Continuum Analytics open sources their development because it might be useful to the larger community). You do know that DARPA funds also go to improve robotics, they supported ARPANET, and a lot of their space programs later got transferred to NASA?
Basically, I have no idea what you are ranting about. One government organization funded a project - it happens all the time. Do you rant about NSF/NIH/NASA money as well? If so, you'd better live in a cave - a lot of government sponsored research has gone into almost every modern convenience that we take for granted.
So, they're porting R and Perl PDL to Python, then?
~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
What is this APRANET thing? It sounds like some useless crap loaded acronym to me.
matplotlib already does this in conjunction with Numpy and Scipy - its plotting quality and flexibility compares favourably to Matlab.
Its biggest drawback is that it is pretty glacial even by Matlab's standards when rendering large datasets (think millions of points). I'm not sure whether matplotlib or the interactive backend is at fault, but anything DARPA can do to improve the situation would be welcome.
It's strange that this article focused on Python and Continuum when there is a much bigger story to be had. The XDATA program is being run in a very open source manner, and there will be a multitude of open source tools created and delivered by the end of the contract. The program is focusing on two major tasks: the analytics/algorithmic tools to process big data; and the visualization/interaction tools that go along with them.
Have they heard of Matlab?
Frankly, I'd hope that Continuum Analytics open sources their development because it might be useful to the larger community
Open sourcing is a requirement of the XDATA program.
Now China can win!
The summary and article seem to conflate Big Data with Analytics. These days the two often go together, but it's quite possible to have either one without the other. Big Data is "more data than can fit on one machine", and analytics means "applying statistics to data". E.g. many Big Data projects start out as "capture now, analyze a year or two from now," and maybe just do simple counts in the interim, which is not "analytics". And of course, many useful analytics take place in the sub-terabyte range.
The irony with this story is that Python is useful for in-memory processing, and not "Big Data" per se. To process "Big Data" typically requires (today, based on available tools, not inherent language advantages) JVM-based tools, namely Hadoop or GridGain, and distributed data processing tasks on those platforms require Java or Scala. Both of those platforms leverage the uniformity of the JVM to launch distributed processes across a heterogeneous set of computers.
The real use case here is one first reduces Big Data using the JVM platform, and only then once it can fit into the RAM of a single workstation, use Python, R, etc. to analyze the reduced data. So typically, yes, these Python libraries will be used in Big Data scenarios, but pedantically, analytics doesn't require Big Data and Python isn't even capable (generally, based on today's tools) of processing raw Big Data.