Slashdot Mirror


Python Gets a Big Data Boost From DARPA

itwbennett writes "DARPA (the U.S. Defense Advanced Research Projects Agency) has awarded $3 million to software provider Continuum Analytics to help fund the development of Python's data processing and visualization capabilities for big data jobs. The money will go toward developing new techniques for data analysis and for visually portraying large, multi-dimensional data sets. The work aims to extend beyond the capabilities offered by the NumPy and SciPy Python libraries, which are widely used by programmers for mathematical and scientific calculations, respectively. The work is part of DARPA's XData research program, a four-year, $100 million effort to give the Defense Department and other U.S. government agencies tools to work with large amounts of sensor data and other forms of big data."

180 comments

  1. Great. Just Great by Anonymous Coward · · Score: 1, Insightful

    The work is part of DARPA's XData research program, a four-year, $100 million effort to give the Defense Department and other U.S. government agencies tools to work with large amounts of sensor data and other forms of big data.

    Yeah the govt needs better systems to manage the huge databases and dossiers they are building on everybody with their warrantless wiretaps and reading everybody's emails. Anybody who helps with this project is pretty damn naive if they don't think it will also be used for this.

    For that matter anybody who trusts the govt and thinks the govt is your friend is pretty damn naive. Yeah I would like to believe that too. No I won't ignore the mountains of evidence to the contrary. I won't treat all the counterexamples as isolated cases. I see them for what they are: an amazingly consistent pattern. The rule, not the exception. Govt positions are really attractive to sociopath types who just love power and control and a feeling that they are important and they get that feeling by imposing their will on us.

  2. I get the impression that by Chrisq · · Score: 5, Interesting

    I get the impression that in the Engineering and Scientific community Python is the new Fortran. I hope so, because it would be "Fortran done right".

    1. Re:I get the impression that by BlackPignouf · · Score: 1

      I think you're right.
      I love Ruby, it's a very fun and effective language, I could write it in my sleep but there are so many cool projects that are written in Python.
      Those languages are *very* similar, and it's a shame that so much effort is being divided between communities.
      I might get to learn Python one day but I'm afraid I'd become a so-so programmer in both languages.

    2. Re:I get the impression that by Anonymous Coward · · Score: 1

      Fortran done right is fortran that's slow as hell?

      Beyond just the speed issue, I've had problems where simulations in python die right in the middle, because it had been developed on a 32 bit machine, and some of the libraries defaulted to using the architecture's precision. The problem was quite hard to debug, because it was way after the numbers had been stored. This is the kind of bullcrap you get when your language doesn't have static types.
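
      The comment does not name the libraries involved, so the following is only a minimal sketch of that failure mode, using the standard library's ctypes module as a stand-in. The helper names (add_as_int32, add_as_int64) are hypothetical. It shows a value that fits a 64-bit field silently wrapping in a 32-bit one, and why pinning the width explicitly avoids depending on the platform default.

```python
import ctypes


def add_as_int32(a, b):
    """Add two integers and store the result in a 32-bit signed field."""
    # ctypes does no overflow checking: the value silently wraps modulo 2**32.
    return ctypes.c_int32(a + b).value


def add_as_int64(a, b):
    """Same addition, stored in an explicitly 64-bit signed field."""
    return ctypes.c_int64(a + b).value


# Width of the platform's C long in bytes; this varies between platforms,
# which is the kind of default the comment above complains about.
native_long_bytes = ctypes.sizeof(ctypes.c_long)

largest_int32 = 2**31 - 1
wrapped = add_as_int32(largest_int32, 1)  # wraps around to a negative number
exact = add_as_int64(largest_int32, 1)    # fits, so the value is preserved
```

      Choosing a fixed-width type up front means the result no longer depends on where the code happened to be developed.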

    3. Re:I get the impression that by Anonymous Coward · · Score: 1

      Yea... Fortran done right is actually... Fortran done right. There's nothing wrong with the language.

    4. Re:I get the impression that by jma05 · · Score: 5, Interesting

      > I might get to learn Python one day but I'm afraid I'd become a so-so programmer in both languages.

      I empathize since I conversely only barely use Ruby. Once someone learns one of these languages, there is not that much that the other offers. But happily, one need not learn advanced Python to benefit from these projects.

      > it's a shame that so much effort is being divided between communities

      AFAIK, all scientific funding from US and Europe is/was always directed to Python, not Ruby. So Python is firmly established as a research language and there is not much effort being divided with Ruby (which seems to have a much more spotted and amateur movement in this direction), at least as far as scientific stuff is concerned (Ruby is more popular on web app side). For me the tension for scientific use is not between Python and Ruby, but between Python and R. Python community is replicating a lot of R functionality these days but R still has a much better lead in science libraries. Happily, it is quite easy to call R from Python.

    5. Re:I get the impression that by solidraven · · Score: 5, Informative

      You're dead wrong, nothing quite beats Fortran in speed when it comes to number crunching. If you need to go through hundreds of gigabytes of data and performance is important there's only one realistic choice: Fortran. Python isn't fit to run on a large cluster to simulate things, too much overhead. And let's not forget what sort of efficiency you can get if you use a good compiler (Intel Composer). You won't find Fortran on the way out over here, it's here to stay!

    6. Re:I get the impression that by Anonymous Coward · · Score: 0

      You got to be fucking kidding. You either don't know what Python is, or you are completely clueless about what Fortran is. Here's a hint: Python's main features are:
      a) it exists and it is there
      b) it is easy to use
      c) it is very popular as a scripting language

      The rest is just a natural consequence of smart people who don't have a lifetime available to wrap their head around C++ needing to pull some code together, and finding a language which doesn't demand that you sacrifice first child to write a hello world program.

    7. Re:I get the impression that by ctid · · Score: 2

      Why would Fortran be any faster than any other compiled language?

      --
      Reality is defined by the maddest person in the room
    8. Re:I get the impression that by Anonymous Coward · · Score: 1

      Well.. there's C, of course...

    9. Re:I get the impression that by Anonymous Coward · · Score: 1

      I thought NumPy (which I'm sure you would be using if doing large number crunching) was based on Fortran code (LAPACK) anyway? And with things like IPython clustering, it can run on large clusters of computers easily.

      It's probably not as fast as pure Fortran, but if it lets scientists build a model by themselves quickly instead of learning Fortran or queuing up for someone who does, then it seems like a good thing to me...

    10. Re:I get the impression that by ssam · · Score: 2

      FORTRAN does arrays in a way that's slightly easier for the compiler to optimise. But some modern techniques and data structures are much harder to do in FORTRAN compared to C++. It is also quite easy to call C, C++ or FORTRAN functions from Python.

      Writing a loop in Python is slow. If you express that loop as a NumPy array operation, you get a substantial way towards C speed. If you use numexpr, you can get something faster than a simple C version.

      Processing big data is as much about moving the data around, and minimising latency in this movement, as the raw processing speed. So a language that lets you express things efficiently will win in the end.
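
      The loop-versus-array point can be sketched as follows, assuming NumPy is installed. The function names are hypothetical and no timings are claimed; the sketch only shows the same computation written both ways. numexpr is left out.

```python
import numpy as np


def sum_of_squares_loop(values):
    """Sum of squares with an explicit Python-level loop."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_numpy(values):
    """Same computation as a single array operation executed inside NumPy."""
    arr = np.asarray(values, dtype=np.float64)
    return float(np.dot(arr, arr))


data = [float(i) for i in range(1000)]
loop_result = sum_of_squares_loop(data)
array_result = sum_of_squares_numpy(data)
```

      The second version hands the whole loop to compiled code operating on one contiguous block of doubles, which is where the speed-up described above would come from.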

    11. Re:I get the impression that by Anonymous Coward · · Score: 5, Informative

      Short answer, Fortran has stricter aliasing rules so the compiler has more optimization opportunities. Long answer, see Stack Overflow.

    12. Re:I get the impression that by martin-boundary · · Score: 1

      Why would Fortran be any faster than any other compiled language?

      Because the language is simpler, so the compiler can make assumptions and generate better automatic optimizations. C/C++ are much harder to optimize (=generate optimal assembly instructions).

    13. Re:I get the impression that by Anonymous Coward · · Score: 0

      Fortran compilers have been around for much longer than other compilers, so the optimizations are well-known. Much research has been put into generating Fortran code, so additional research has gone into keeping the Fortran code running fast, since it is hard to redo the algorithms with the amount of testing money spent verifying that the software is correct. That means that old Fortran code persists, with compiler optimization, despite the fact it limits parallelization efforts, because the algorithms can not be altered (automatic parallelization, OpenMP are nice quick-fixes, but will only impress your boss). So, the end result is a push to make sure the compiler optimizations are as fast as possible to keep the old Fortran code decent without a rewrite.

    14. Re:I get the impression that by martin-boundary · · Score: 1

      Processing big data is as much about moving the data around, and minimising latency in this movement as the raw processing speed. so a language that lets you express things efficiently will win in the end.

      If by expressing things efficiently you mean easy for the programmer to write, then you're wrong. What matters (doubly so for big data) is full control over the machine's resources, ie how data is laid out in memory, good control over i/o etc. While this has always been the key to fast performance, big data is plagued by big-oh asymptotics. For example, if you can lay out your data structures efficiently enough to keep everything in cache, your running time can easily gain a factor of ten, ie 1 day instead of 10 days. Ask Google or Facebook if they care about that...

      Scripting languages have their place where performance doesn't matter _enough_ to optimize, eg your local supermarket chain trying to datamine their customers in time for the end of the month.

    15. Re:I get the impression that by Anonymous Coward · · Score: 2, Informative

      I guess the problem is that people who speak about Fortran actually think about FORTRAN. The last FORTRAN standard was from 1977, and that shows. After that, there had been no new standard and little new development until the Fortran 90 standard (note the different capitalization). Fortran 90 got rid of the old punch card based restrictions by giving it completely new, much more reasonable code parsing rules (it still accepts old form code for backwards compatibility, but you cannot mix both forms in one file because they are too different), gave it a full set of properly nesting flow control statements (actually that was one thing already commonly available as non-standard extension to FORTRAN), and added very powerful array processing, operator overloading, and modules (and probably a few other things I don't remember right now). Later versions even added object orientation (and probably a whole set of other things; I haven't really followed Fortran development beyond Fortran 90).

    16. Re:I get the impression that by Kwyj1b0 · · Score: 4, Interesting

      Compared to plain old Python, yes. But Cython offers a lot of capabilities that improve speed dramatically - just using a type for your data in Cython gives programs a wonderful boost in speed.

      As someone who uses Matlab for most of my programming, I have come to detest languages that do not force specifying a variable type and/or declaring variables. Matlab offers neither, but it is a standard in some circles.

    17. Re:I get the impression that by LourensV · · Score: 5, Insightful

      You're probably right, but you're also missing the point. Most scientists are not programmers who specialise in numerical methods and software optimisation. Just getting something that does what they want is hard enough for them, which is why they use high-level languages like Matlab and R. If things are too slow, they learn to rewrite their computations in matrix form, so that they get deferred to the built-in linear algebra function libraries (which are written in C or Fortran), which usually gets them to within an order of magnitude of these low-level languages.

      If that still isn't good enough, they can either 1) choose a smaller data set and limit the scope of their investigations until things fit, 2) buy or rent a (virtual) machine with more CPU and more memory, or 3) hire a programmer to re-implement everything in a low-level language and so that it can run in parallel on a cluster. The third option is rarely chosen, because it's expensive, good programmers are difficult to find, and in the course of research the software will have to be updated often as the research question and hypotheses evolve (scientific programming is like rapid prototyping, not like software engineering), which makes option 3) even more expensive and time-consuming.

      So yes, operational weather forecasts and big well-funded projects that can afford to use it will continue to use Fortran and benefit from faster software. But for run-of-the-mill science, in which the data sets are currently growing rapidly, having a freely available "proper" programming language that is capable of relatively efficiently processing gigabytes of data while being easy enough to learn for an ordinary computer user is a godsend. R and Matlab and clones aren't it, but Python is pretty close, and this new library would be a welcome addition for many people.

    18. Re:I get the impression that by Terrasque · · Score: 1

      PyPy might change that in the future, especially with the Transactional Memory branch.

      --
      It's The Golden Rule: "He who has the gold makes the rules."
    19. Re:I get the impression that by Anonymous Coward · · Score: 0

      You miss the point. The pure CPU speed is important but appropriate language constructs (generators, coroutines) are equally important to deal with memory and processing complexity problems.

      While your Fortran code will choke when it hits swap, then Python code might simply nicely fly and finish the task.
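
      As an illustration of the generator point, here is a minimal sketch that processes a stream of numbers while holding only one value at a time in memory. The function names and the sample input are hypothetical.

```python
def read_values(lines):
    """Yield one float per non-empty line, without building a list."""
    for line in lines:
        line = line.strip()
        if line:
            yield float(line)


def running_mean(values):
    """Consume an iterable once, holding only a count and a running total."""
    count = 0
    total = 0.0
    for value in values:
        count += 1
        total += value
    return total / count if count else 0.0


# Any iterable of text lines works here, including an open file object.
sample_lines = ["1.0\n", "2.0\n", "\n", "3.0\n", "6.0\n"]
mean = running_mean(read_values(sample_lines))
```

      Because read_values is lazy, memory use stays constant however long the input is; whether that beats a compiled program on a given workload is a separate question.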

    20. Re:I get the impression that by Impy+the+Impiuos+Imp · · Score: 0

      Python, the indent-based, block-structured language? I have about 6 months experience with it, I guess it's not enough to see the advantage of it qua-number crunching syntax.

      Oh well, it's just 0.75% of one day's borrowing.

      --
      (-1: Post disagrees with my already-settled worldview) is not a valid mod option.
    21. Re:I get the impression that by Chrisq · · Score: 3, Informative

      The entire point of Fortran is that it has difficult-to-deal-with aliasing rules that make the compiler more free to produce optimized code. That's why it is suitable for things that require every last bit of performance you can wring out of it. Today probably you can get the same thing with C or C++ provided you are prepared to use things like restrict, but it used to be you couldn't, so Fortran ruled certain topics.

      Python is an easy-to-use system with abysmal performance - expect 10-100x slowdown for code that runs in pure Python over a similar C version. If you can get things set up so Python is only gluing other C components together and the data never has to touch native Python data structures or loops, then performance will be fine, but now you aren't really coding in Python any more.

      The point is, the purpose of Fortran and the purpose of Python are entirely opposed. They are exactly the opposite of each other. So it boggles the mind how you can think that Python can be Fortran "done right". So much so that now I suspect I got trolled. Well done, sir.

      Yes I understand, and many people made the same point. However Fortran was for a lot of scientists and engineers the hammer to crack any nut. It was used for simple "try outs" where performance wasn't needed, simply because it was the language that Engineers knew. I think the same thing is happening with Python now, it is the first and sometimes only language that many engineers know. Now for the performance issue, it will not give the best performance but packages like SciPy and NumPy do give very good performance (arguably by using these libraries you are just using Python to string C functions together, but it is properly integrated). Tests show that you are getting about a third of the performance of Fortran, (with the exception of the Fortran DGEMM matrix multiply which greatly outperforms Python and other Fortran variants). The typical engineering reaction to performance needs is to throw hardware at the problem, then optimise your algorithm, and only change language if absolutely necessary!

    22. Re:I get the impression that by SpzToid · · Score: 1

      No one seems to be pining away for Fortran programmers. At least not much anyways. A quick 'n dirty search on Dice.com yields 46 results, (and no doubt a few are doubles).

      --
      You can't be ahead of the curve, if you're stuck in a loop.
    23. Re:I get the impression that by Anonymous Coward · · Score: 0

      Well, if that's your starting point, you've already failed.

    24. Re:I get the impression that by Anonymous Coward · · Score: 1

      I think you're right.
      I love Ruby, it's a very fun and effective language, I could write it in my sleep but there are so many cool projects that are written in Python.
      Those languages are *very* similar, and it's a shame that so much effort is being divided between communities.
      I might get to learn Python one day but I'm afraid I'd become a so-so programmer in both languages.

      Both languages suffer from the global interpreter lock defect and will require a rewrite in the next 5-10 years if the languages have any chance of surviving in the servers. It will take some very serious, dedicated, low level work and I just don't see it happening. I have this fantasy where Guido and Matsumoto will sit down and write the common code together for a super-interpreter that will handle different syntax in a modular way. I know it's technically possible since GCC is doing something very similar but, again, I just can't see this happening.

      In the meantime, Go is looking mighty good...

    25. Re:I get the impression that by nadaou · · Score: 4, Insightful

      You're probably right, but you're also missing the point. Most scientists are not programmers who specialise in numerical methods and software optimisation.

      Which is exactly why FORTRAN is an excellent choice for them instead of something else fast (close to assembler) like C/C++, and why so many of the top fluid dynamics models continue to use it. It is simple (perhaps a function of its age) and because of that it is simple to do things like break up the calculation for MPI or tell the compiler to "vectorize this" or "automatically make it multi-threaded" in a way which is still a long way from maturity for other languages.

      Can you guess which language MATLAB was originally written in? You know that funny row,column order on indexes? Any ideas on the history of that?
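
      Assuming the "funny order" refers to column-major storage, which Fortran uses and MATLAB shares, the difference from row-major layout can be sketched in a few lines. The helper names are hypothetical and the indices are zero-based, unlike in Fortran or MATLAB.

```python
def column_major_offset(row, col, n_rows):
    """Offset of element (row, col) when columns are stored contiguously."""
    return col * n_rows + row


def row_major_offset(row, col, n_cols):
    """Offset of element (row, col) when rows are stored contiguously."""
    return row * n_cols + col


def flatten_column_major(matrix):
    """Flatten a list-of-rows matrix in column-major (Fortran-style) order."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    return [matrix[r][c] for c in range(n_cols) for r in range(n_rows)]


matrix = [[1, 2, 3],
          [4, 5, 6]]
flat = flatten_column_major(matrix)  # columns end up next to each other
```

      In column-major storage, walking down a column touches adjacent memory, so the loop order that is cache-friendly is the reverse of what a C programmer would write.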

      R is great and all, and is brilliant in its niche, but how's that RAM limitation thing going? It's not a solution for everything.

      MATLAB is pretty good too, as are Octave and SciLab, and it has gotten a whole lot faster recently, but ever try much disk I/O or array resizing for something which couldn't be vectorized? Becomes slow as molasses.

      If that still isn't good enough, they can either 1) choose a smaller data set and limit the scope of their investigations until things fit,

      heh. I don't think you know these people.

      2) buy or rent a (virtual) machine with more CPU and more memory,

      Many problems are I/O limited and require real machines with high speed low latency network traffic. VMs just don't cut it for many parallelized tasks which need to pass messages quickly.

      Forgive me if I'm wrong, but your post sounds a bit like you think you're pretty good on the old computers, but don't know the first thing about FORTRAN and are feeling a bit defensive about that, and attacking something out of ignorance.

      --
      ~.~
      I'm a peripheral visionary.
    26. Re:I get the impression that by nadaou · · Score: 1

      So yes, operational weather forecasts and big well-funded projects that can afford to use it will continue to use Fortran and benefit from faster software.

      I don't mean for this to be pick on LourensV day, but I have another small nit to pick. You're presuming operational weather forecasting is well funded? I don't think funding has anything to do with it. Often it's what the original author knew which chose the language.
      And have you seen what's been done to NOAA's budget over the last decade?? Well funded. LOL.

      FORTRAN is used because it's easy to get your head around so you can focus on the science not the coding. Much in the same way as Python is meant to be, as a matter of fact.

      How's that threading library in Python 2/3 doing? Still not able to actually make more than one thread and has to spawn new processes instead? Python is quite nice, and I welcome the improvements, but it still has a long way to go. Hopefully this bit of funding will bring that a little closer to reality.

      --
      ~.~
      I'm a peripheral visionary.
    27. Re:I get the impression that by lattyware · · Score: 3, Interesting

      The GIL is an overblown issue. Threading is designed to get around issues with accessing slow resources, not for serious parallel computing. Just use multiprocessing if you want to do lots of computing in parallel, problem solved.
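
      A minimal sketch of that suggestion, using the standard library's multiprocessing.Pool. The prime-counting workload and the helper names are hypothetical, chosen only because the work is CPU-bound and splits cleanly into chunks.

```python
import math
from multiprocessing import Pool


def count_primes(bounds):
    """Count primes in the half-open range [start, stop) by trial division."""
    start, stop = bounds
    count = 0
    for n in range(max(start, 2), stop):
        if all(n % d for d in range(2, math.isqrt(n) + 1)):
            count += 1
    return count


def split_range(stop, parts):
    """Split [0, stop) into at most `parts` contiguous (start, stop) chunks."""
    step = max(1, -(-stop // parts))  # ceiling division, never zero
    return [(lo, min(lo + step, stop)) for lo in range(0, stop, step)]


if __name__ == "__main__":
    chunks = split_range(100_000, 4)
    # Each chunk runs in its own process with its own interpreter,
    # so the workers do not contend for a single GIL.
    with Pool(processes=4) as pool:
        total = sum(pool.map(count_primes, chunks))
    print(total)
```

      The trade-off is that arguments and results are pickled and copied between processes, so this suits coarse-grained work far better than fine-grained sharing of large arrays.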

      --
      -- Lattyware (www.lattyware.co.uk)
    28. Re:I get the impression that by lattyware · · Score: 2

      Oh, and Python without a GIL exists, it's called Jython.

      --
      -- Lattyware (www.lattyware.co.uk)
    29. Re:I get the impression that by Anonymous Coward · · Score: 0

      Perhaps true.... with the difference being that FORTRAN is fast as hell and Python is blindingly slow.

    30. Re:I get the impression that by pjabardo · · Score: 1

      You are actually right but you are missing the point. Python doesn't compete with Fortran, it supplements it. With tools such as f2py, it is very easy to call Fortran code from Python (and there are tools that make it easy to call C/C++). This combination really brings out the potential of both languages: bottlenecks use Fortran/C/C++ and the rest Python. This combination is already popular: numpy/scipy is basically that.

      I don't think that being easy is Python's main advantage. Using a dynamic environment where you can type code that gets executed immediately and where you can explore the data is a really big help. On the other hand, the same could be done with R, Matlab, Octave or Scilab and it is done. In some ways these languages are better suited than Python because they were designed to do math, or more specifically matrices/arrays very well and might have better syntax for that. But then doing anything else increasingly becomes a pain once the problem becomes larger or more complex and that's where, IMHO, Python gains an advantage. Better module/OOP environment, better GUI, etc.

      By the way, I work on scientific computing, using spectral element methods in computational fluid dynamics and I also work on a wind tunnel and I do lots of data acquisition and processing. Right now I use C++ for lower level stuff (and bottlenecks) and R. I have been seriously considering switching to Python to have an easier environment to maintain.

    31. Re:I get the impression that by csirac · · Score: 1

      Perhaps he means it's well funded in the sense that they have dedicated programmers at all. "Run of the mill" science is done by investigating scientists or their jack-of-all-trades research assistants, collaborators or grads/post-docs, etc., most of whom are unlikely to have substantial software engineering experience or training in their background.

      Nonetheless, they write code - very useful, productive code - but it's in whatever tool or high-level language popular among their peers/discipline (matlab, R, python, perl, fortran... each corner of science has their favourite things and if you want to leverage the work of others you run with whatever everyone else is using unless you have funding and good reasons not to).

    32. Re:I get the impression that by Anonymous Coward · · Score: 0

      So the language gets a bad rap for things that weren't standardized until... 23 years ago? And we're comparing against Python? Which didn't hit version 1.0 until 1994?

    33. Re:I get the impression that by Anonymous Coward · · Score: 0

      Next up... comparing the 8086 to the i5....

    34. Re:I get the impression that by Anonymous Coward · · Score: 0

      This is what Python and Ruby programmers actually believe. It's quite pathetic.

    35. Re:I get the impression that by LourensV · · Score: 1

      You're not picking on me, you're arguing your point. That's what this thing here is for, so no hard feelings at all.

      I'll readily admit to not knowing Fortran (or much Python! ;-)); I'm a C++ guy myself, having got there through GW-Basic, Turbo Pascal and C. I now teach an introductory programming course using Matlab (and know of its history as an easy-to-use Fortran-alike), and I use R because it's what's commonly used in my field of computational ecology. I greatly dislike R, and I'm not too hot on Matlab either, as the first thing you should do when programming is to decide what the program is about, and to express that you need type definitions, which neither Matlab nor R has. From a very quick look around, at least recent versions of Fortran do have them, so that's good in my view. As for the RAM limitations in R, it seems to me that that is actually a consequence of the vectorised style of programming and the lack of lazy evaluation: you tend to get either unreadable code with enormous expressions, or a lot of temporaries which eat up lots of RAM.

      Replying to your other post, I was thinking of the many hundreds of millions that are spent on satellites and the dedicated compute clusters for weather forecasting. I've also heard of budget issues and lack of replacement satellites in that area, but it's still a lot of money compared to most grants. Over here it's big news if someone manages to get a million Euro grant, spread over a couple of years, while NOAA has a 4.7 billion USD yearly budget. Of course they do other things than weather forecasting, I'm comparing an entire government organisation to a single scientific investigation here, but it's a different level for sure.

      In the end, I suspect that we're simply in different fields, and therefore seeing different things. Generally speaking, the more physical the field, the more tech-savvy the scientists, and the more computer use. In my institute, Microsoft Excel is by far the number one data processing tool...

    36. Re:I get the impression that by Anonymous Coward · · Score: 1

      If by expressing things efficiently you mean easy for the programmer to write, then you're wrong. What matters (doubly so for big data) is full control over the machine's resources, ie how data is laid out in memory, good control over i/o etc. While this has always been the key to fast performance, big data is plagued by big-oh asymptotics. For example, if you can lay out your data structures efficiently enough to keep everything in cache, your running time can easily gain a factor of ten, ie 1 day instead of 10 days. Ask Google or Facebook if they care about that...

      Scripting languages have their place where performance doesn't matter _enough_ to optimize

      I don't think anyone would dispute that using pure python for a Big Data application would be insanity. But that's not what's happening. Continuum Analytics will be writing its performance-critical code in C (or fortran or another low-level language). They will use Python for non-performance-critical code, including (but not limited to) the API. (This is also how NumPy is written, and SciPy, etc. etc.)

    37. Re:I get the impression that by tyrione · · Score: 1

      You lost me at ``Most scientists are not programmers...'' schtick. Whether it was my Mechanical Engineering professors fluent in ADA, C, Fortran, C++ or Pascal or my EE professors in the same, to my Mathematics Professors all in the same, not a single CS Professor could hold a candle to them, unless we started dicking around with LISP, SmallTalk or VisualBasic for shits and giggles. In fact, they became proficient in these languages because they had to write custom software to model nonlinear-dynamic systems. Perhaps in the post 2000 era scientist group we have Matlab/Octave/R/Python lovers but the old school folks are hardcore in their knowledge of those languages.

      Rarely does one find an expert in software development who is an expert in any Engineering, Physics or Mathematics field of research.

    38. Re:I get the impression that by Dcnjoe60 · · Score: 1

      You're dead wrong, nothing quite beats Fortran in speed when it comes to number crunching. If you need to go through hundreds of gigabytes of data and performance is important there's only one realistic choice: Fortran. Python isn't fit to run on a large cluster to simulate things, too much overhead. And let's not forget what sort of efficiency you can get if you use a good compiler (Intel Composer). You won't find Fortran on the way out over here, it's here to stay!

      Isn't that the point of DARPA funding this project - to make it so Python is fit to run on a large cluster to simulate things? I do agree, though, that Fortran is here to stay. However, it is so specialized in what it does that a solution often requires multiple languages to get the task accomplished.

      Back in the day (1970s) I had a professor who would say that you can write anything in anything. For instance you could write a business app in Fortran and you can use COBOL for plotting trajectories to the moon. But, why would you? Each excels at what it was designed for and creates a lot of extra work when you try to make it do what it wasn't designed for.

      Something like Python is good at doing a lot of different things, but not necessarily great at large number crunching/analysis. It seems like DARPA is wanting to change that. That doesn't mean that FORTRAN will be obsolete, but if successful, it does mean that Python can be even more useful in research than it is now.

    39. Re:I get the impression that by Anonymous Coward · · Score: 0

      I know this won't get much love, but Julia is Fortran done right. There was a Slashdot article about it: http://science.slashdot.org/story/12/04/18/1423231/julia-language-seeks-to-be-the-c-for-numerical-computing

    40. Re:I get the impression that by Anonymous Coward · · Score: 1

      SciPy tries to use LAPACK (or any other such tools) wherever possible. NumPy is based in C, but does try to utilize specialty math libraries like Intel's MKL wherever possible. So, the core numerical array class (NumPy) is C, while the advanced scientific tools (SciPy) are in C and Fortran. Because of that, they are an *extremely* powerful duo.

    41. Re:I get the impression that by solidraven · · Score: 1

      Yes, but Python is still an interpreted language and very slow compared to Fortran.

    42. Re:I get the impression that by solidraven · · Score: 1

      I disagree partially with what you said based on personal experience. As an EE student I had to learn to use Fortran for my thesis. I needed to run a large EM simulation and not a single affordable commercial program was able to run on a small cluster of computers that was available. So I resorted to using Fortran and MATLAB for visualisation. So I managed to learn basic Fortran over the weekend and then use it to write a working program for a cluster, all within one week. I just don't think I could have done that with Python. Especially considering the time constraints I had in terms of runtime.

    43. Re:I get the impression that by solidraven · · Score: 1

      Nah, Fortran was designed with number crunching for scientific and engineering applications in mind. It won't choke, it won't stop. Fortran compilers are far smarter than Python when dealing with memory. The language was designed to allow the compiler to make assumptions to speed up computation and make for efficient memory management. But I'll agree that you shouldn't write the entire application in Fortran. For visualisation other languages are better suited (MATLAB/Octave comes to mind). You can have a python script to assign the tasks to the cluster. But for the actual calculations I'd still use Fortran. It's still the tool of the trade for very good reasons.

    44. Re:I get the impression that by solidraven · · Score: 1

      Sure you can, any language that has a full feature set can do any task that the system is capable of. But efficiency is also important, and Fortran simply has so many advantages over Python. Complex data structures aren't needed for most simulations while they make optimisation so much harder. Additionally interpretation is a serious bottleneck.

    45. Re:I get the impression that by K.+S.+Kyosuke · · Score: 1

      Wouldn't it be simply better for people to learn Haskell?

      --
      Ezekiel 23:20
    46. Re:I get the impression that by mattr · · Score: 1

      It would be easier to get some of that Darpa money sent over to Pynie and it will all run on Parrot (multithreaded stable as of last month apparently). Then you will be able to call Perl6 and Befunge when you get tired of indenting all the time (ducks)

    47. Re:I get the impression that by Anonymous Coward · · Score: 0

      In my experience, combining two languages to accomplish a task pretty much means the solution has a very short shelf-life. Foreign function interfaces are great and all when you need to use widely supported libraries, but otherwise you're practically begging the next guy to rewrite the whole thing.

    48. Re:I get the impression that by Anonymous Coward · · Score: 0

      The GIL is somewhat of an overblown issue. But that, the overall slowness of CPython, and other factors have soured me a bit on using python in scientific work. Why make python3 have breaking changes if you don't take the opportunity to adopt more radical improvements? Lots of good stuff in other languages that python could have adopted: everything is an expression, currying/partial applications, better function syntax, etc. Why not throw more support behind PyPy and drop CPython? Having the reference implementation also not be the best kinda sucks, and since it provided implementation specific APIs, we are now stuck with it when using numpy and scipy.

    49. Re:I get the impression that by pthisis · · Score: 2

      The core processing in SciPy/NumPy is done in compiled C or Fortran libraries (LAPACK is used extensively where available), not in Python.

      I'm unaware of a widely-used interpreted version of Python. Whether Python is byte-compiled (CPython), JIT'd (psyco, pypy, IronPython, many Jython stacks), or compiled ahead of time to machine code (Jython+gcj, ShedSkin) depends on which Python implementation you're talking about.

      --
      rage, rage against the dying of the light
    50. Re:I get the impression that by Anonymous Coward · · Score: 0

      Except that Python is an interpreted language and is awfully slow. It can be OK with vector and array based processing with these external libraries, but in Fortran the external libraries are in Fortran too, i.e. there are already two or three languages to deal with. In addition to this, not all problems are amenable to LA / vector / matrix processing; many tasks are best solved via iteration and imperativeness close to the machine level.

    51. Re:I get the impression that by SolitaryMan · · Score: 1

      Yes, but Python is still an interpreted language and very slow compared to Fortran.

      Nope, it's not. Never was.

      --
      May Peace Prevail On Earth
    52. Re:I get the impression that by ericcc65 · · Score: 1

      I get the impression that in the Engineering and Scientific community Python is the new Fortran. I hope so, because it would be "Fortran done right".

      I love Python and think that would be generally good, but there is a problem with that. Fortran runs fast, in real-time. For all the talk about how speed doesn't matter, when you're doing real-time signal processing it does. I like numpy/scipy/matplotlib for prototyping but then I have to implement algorithms in C/C++. It would be nice if a higher level language than C came along that could still compile to fast code.

    53. Re:I get the impression that by gl4ss · · Score: 1

      buying VMs on the same farm is just another way of getting access to real machines on the cheap for a limited time. it's just another way of buying time on a supercomputer now.

      --
      world was created 5 seconds before this post as it is.
    54. Re:I get the impression that by thetoadwarrior · · Score: 0

      Static typing is for bad programmers.

    55. Re:I get the impression that by steveha · · Score: 1

      I resorted to using Fortran and MATLAB for visualisation. So I managed to learn basic Fortran over the weekend and then use it to write a working program for a cluster, all within 1 week time. I just don't think I could have done that with Python.

      Python with SciPy is a lot like MATLAB. Python, the language, is far superior to MATLAB's language; I hate 1-based array indexing, for example. MATLAB's language does have a few special features for matrices that Python lacks, but that is just syntactic sugar (there are functions to do everything in Python).

      When you use SciPy, there is a bunch of compiled Fortran and C code that is running under the covers. The heavy lifting of matrix work is all done as fast as Fortran, because it is being done in Fortran; you just didn't write the Fortran.

      So Python is an expressive and pleasant language that lets you set up your calculations, and then the calculations run at full speed. This is the reason why scientists who just need to get their work done are turning to Python.

      The big win is that you don't need to write two versions, the code in MATLAB for looking at graphs, and the code in Fortran for doing the real work; you just write the code once in Python.

      I've talked to people who use Python on supercomputer clusters. There are Python libraries to support this. I haven't done this work myself so I don't want to try to say how easy or hard these libraries are, but if I needed to write a weather simulation I would try that route.

      --
      lf(1): it's like ls(1) but sorts filenames by extension, tersely
    56. Re:I get the impression that by Anonymous Coward · · Score: 0

      But efficiency is also important, and Fortran simply has so many advantages over Python. Complex data structures aren't needed for most simulations while they make optimisation so much harder. Additionally interpretation is a serious bottleneck.

      0) When you write code in SciPy you are using compiled Fortran library code. Python sets up the calls to Fortran functions, but Fortran does the number crunching.

      1) Python itself runs something like 50 times slower than Fortran, but that doesn't matter, because see point 0.

      2) Pedants will complain about your last comment, because Python isn't really interpreted. The Python compiler turns your program into bytecodes, then the Python virtual machine runs the bytecodes. But it is absolutely true that the way Python does things adds serious overhead (see comment 1 about "50 times slower"). PyPy can produce tremendous speedups using JIT compilation, but SciPy won't be running on PyPy anytime soon. Doesn't matter; SciPy is already good enough that a lot of scientists are using it. And people keep improving SciPy every day... e.g. there are libraries to apply the power of a GPU.
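
      The bytecode step in point 2 can be looked at directly from the standard library. This is only a sketch for CPython; other implementations may do things differently, and the function name add is made up for illustration.

```python
import dis


def add(a, b):
    """A tiny function whose compiled form we want to inspect."""
    return a + b


# By the time the function object exists, CPython has already compiled
# its body into a code object.
code = add.__code__

# co_code holds the raw bytecode that the virtual machine executes.
raw_bytecode = code.co_code

# dis decodes those bytes into instruction names. The exact opcodes differ
# between CPython versions, so no particular listing is assumed here.
instruction_names = [ins.opname for ins in dis.get_instructions(add)]

if __name__ == "__main__":
    print(type(raw_bytecode), len(raw_bytecode))
    dis.dis(add)
```

      Running it prints the instruction listing for add. Every one of those instructions is dispatched by the virtual machine at run time, which is where the overhead in point 1 comes from.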

    57. Re:I get the impression that by Anonymous Coward · · Score: 0

      I've been coding in R since it was in beta, and along the way have also written in Python, C/C++, Lisp, FORTRAN, and Perl. I've enjoyed coding in all these languages.

      R, MATLAB, Python, and other similar scripting frameworks are great (I wish people would use Python instead of MATLAB), but over the years I've realized how limiting they are. Their speed and memory requirements are atrocious, and these problems only get worse as datasets get larger, compounded by issues with parallel computation.

      I understand the argument that you can wrap R or Python around C or FORTRAN and use the latter for the heavy lifting, but I've realized over time that paradigm only works if your problem fits exactly with the specific algorithm you've inscribed into the C/FORTRAN code. In practice, that's almost never, so you almost inevitably end up coding part of the thing in R or Python, and then you're back to the bottleneck.

      R and Python are wonderful for certain things, but I've come to the conclusion that they're ultimately insufficient. This isn't by choice, but by having to deal with both of them over and over again and hitting my head against walls with them. Things like the Programming Language Game benchmarks aren't perfect, but if you peruse enough things like that, and write realistic benchmarks of your own, you realize how limited R and Python are. Even JavaScript is lightyears ahead of the both of them performance-wise in many ways.

      At the same time, things like C and C++ have progressed to a point where there's often little difference in how difficult it is to write something in C or R. 10 years ago I wouldn't have said that, but now the libraries available for C or C++ are amazing, and it's difficult to justify writing things in a language like R or Python.

      I suspect that as things move forward, the torch will gradually move from C/C++ to something like Go, or maybe Julia, or some natively compiled version of Scala, or something that isn't even heard of yet. But as much as I love them, R and Python are dead-ends in the long run (for numerical computing).

    58. Re:I get the impression that by Anonymous Coward · · Score: 0

      As somebody who has written a medium-sized threaded application, I thank Guido himself for the GIL. There are SO MANY race conditions that I didn't have to care about because the GIL was there to save my butt. And yet I was in no way prevented from doing things while other threads were blocked on IO.

      That's the whole point of the threads -- your app can keep going while part of it is waiting for some random IO to happen. Especially helpful when you're trying to download a dozen different pages from slow webservers.
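
      A minimal sketch of that point, assuming time.sleep as a stand-in for a slow web request (like blocking IO, it releases the GIL while waiting). The names fake_download and fetch_all are made up for illustration.

```python
import threading
import time


def fake_download(delay, results, index):
    """Stand-in for a slow web request; sleeping releases the GIL like blocking IO."""
    time.sleep(delay)
    results[index] = f"page {index}"


def fetch_all(count, delay):
    """Run `count` simulated downloads in threads; return (results, elapsed seconds)."""
    results = [None] * count
    threads = [
        threading.Thread(target=fake_download, args=(delay, results, i))
        for i in range(count)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start


if __name__ == "__main__":
    pages, elapsed = fetch_all(count=12, delay=0.2)
    print(len(pages), "pages in", round(elapsed, 2), "seconds")
```

      The waits overlap, so the total time should be close to one delay rather than twelve of them. CPU-bound work in the threads would not overlap this way under the GIL.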

    59. Re:I get the impression that by blueskies · · Score: 1

      Both languages suffer from the global interpreter lock defect and will require a rewrite in the next 5-10 years if the languages have any chance of surviving in the servers.

      You don't really understand big data, if you think it needs to run on ONE computer.

      This is only a problem if you think threading is the solution to scaling CPU computations across hundreds of computers. If you generalized your code to run on hundreds of computers, there is no reason you can't run a process per core for your multicore machines.

    60. Re:I get the impression that by Anonymous Coward · · Score: 0

      > Can you guess which language MATLAB was originally written in? You know that funny row,column order on indexes? Any ideas on the history of that?

      I didn't realize M(row,column) is a funny indexing method. That's standard in mathematics. Same with a(1). Indices in math start at 1.

    61. Re:I get the impression that by Anonymous Coward · · Score: 0

      How's that threading library in Python 2/3 doing? Still not able to actually make more than one thread and has to spawn new processes instead? Python is quite nice, and I welcome the improvements, but it still has a long way to go. Hopefully this bit of funding will bring that a little closer to reality.

      Are you talking about the thread library that has been around since python 1.4 in 1996 using pthreads? Yeah, you are 16 years too late. http://docs.python.org/release/1.4/lib/node72.html#SECTION00840000000000000000

      The problem with even considering the use of pthreads is that interestingly enough, processes seem to work better when you are talking about a large number of compute nodes. Most people encounter problems trying to run one multi-threaded process across 100 compute nodes.

    62. Re:I get the impression that by Spacelem · · Score: 1

      Sadly, with the exception of a few times where I get to sum an array, pretty much my whole model needs to be run in a fast language like C or Fortran (I use C, my supervisor uses F77). It's the kind of model (a spatial stochastic disease simulation) that doesn't really lend itself to coding up in Python. No matrices, just lots of little bits of data interrogation, calculating one event at a time, and so many loops (unavoidable) that it would just crawl in Python. If you try to start in Python and replace the slow bits with C, then before you know it it's more C than Python. In the end, I think doing the whole thing in C is just less work.

      I do all the graphs, and the non-spatial deterministic versions in Octave (R's graphs are prettier, but R is less pleasant to use), where it does take advantage of Fortran for the ODE solver, but that's the only bit. I do generally prefer the 1-based array indexing though; the only places I've found 0-based indexing useful have been in dealing with C's inability to handle multidimensional variable length arrays in an easy fashion, so I wasn't really convinced by 0-based arrays in general. Perhaps I'd have been better off with Fortran, but that's just the way it turned out.

    63. Re:I get the impression that by Anonymous Coward · · Score: 0

      I don't know anything about your problem domain, and you do, so perhaps you are right. But there are lots of people doing serious work with Python, sometimes using extensions like Cython for the performance-intensive bits.

      http://blog.perrygeo.net/2008/04/19/a-quick-cython-introduction/

    64. Re:I get the impression that by chris_mahan · · Score: 1

      Question: Why do you not need to indent Perl code?

      Answer: You put all the code on one line. /me ducks.

      --

      "Piter, too, is dead."

    65. Re:I get the impression that by butalearner · · Score: 1

      Python with SciPy is a lot like MATLAB. Python, the language, is far superior to MATLAB's language; I hate 1-based array indexing, for example. MATLAB's language does have a few special features for matrices that Python lacks, but that is just syntactic sugar (there are functions to do everything in Python).

      Even as a MATLAB user I agree, as long as we're strictly talking about the language. Many of GNU Octave's woes (though they're getting JIT now!) can be blamed on the poor language design.

      But there are many things that SciPy doesn't have. Yes, MATLAB is an unnecessarily expensive choice for data analysis, but my employer only uses it for that (not "big data," mind you) because it's already our design tool, so it's an ideal rapid prototyping environment. That's where it really shines: Simulink and code generation, which is how many of the aerospace companies do it these days. There is no Python/SciPy alternative, and the commercial alternatives can't touch it.

    66. Re:I get the impression that by solidraven · · Score: 1

      I doubt SciPy would have been as easy to expand for running on a cluster. These sorts of things come naturally to Fortran. Additionally, if I write my code in Fortran the compiler can optimize it a lot further than Python will allow me to. Hence the speed advantage is still in Fortran's hands, which is important if you don't have access to the latest hardware and time on large clusters.

    67. Re:I get the impression that by solidraven · · Score: 1

      Most of those are still interpreted. It's not because it's a bytecode that it's not interpreted. In fact even your CPU interprets complex instructions and executes them using a set of simple instructions in a lot of cases (yay for RISC/CISC hybrids). A pre-compiled generalised library will never reach the performance of real Fortran code. People often forget that a lot of Fortran's performance comes from the way it deals with memory, pre-compiled libraries can't do that. Not to mention what a few decades of optimizer development has done for Fortran.

    68. Re:I get the impression that by solidraven · · Score: 1

      Sorry, but you're wrong: it is. Or did you forget where the PYC files come from? You might want to read the official Python documentation on this one http://docs.python.org/3/glossary.html . Go to "interpreted" in case you're too lazy to find it yourself. And by the definition we use over at the electronics department Python is an interpreted language no matter what you wish to claim.

    69. Re:I get the impression that by Anonymous Coward · · Score: 0

      What they need is something better than threading.

      Inter process communications and synchronization is harder and more complicated than it has to be, but that can be fixed.

      There's seldom a need to share everything amongst threads. The way threads work is actually an unsafe default.

    70. Re:I get the impression that by pthisis · · Score: 1

      Most of those are still interpreted. It's not because it's a bytecode that it's not interpreted. In fact even your CPU interprets complex instructions and executes them using a set of simple instructions in a lot of cases (yay for RISC/CISC hybrids).

      Okay, then Fortran's an interpreted language too. What was the point of your original post, then?

      Moving the goalposts like this in the middle of a conversation is pointless--sure, there's a semi-rational definition under which x86 assembler is an interpreted language in some sense. That's not what people mean by an interpreted language, nor is it what you meant in the post I was responding to.

      --
      rage, rage against the dying of the light
    71. Re:I get the impression that by pthisis · · Score: 1

      Sorry, but you're wrong: it is. Or did you forget where the PYC files come from? You might want to read the official Python documentation on this one http://docs.python.org/3/glossary.html . Go to "interpreted" in case you're too lazy to find it yourself. And by the definition we use over at the electronics department Python is an interpreted language no matter what you wish to claim.

      You're conflating implementations with languages.

      Not every Python implementation even has .pyc files. When I compile a .py file to java bytecodes and then use gcj to generate linkable object code, that's interpreted? And if you consider the use of psyco+CPython to be interpreted, then I'd humbly suggest that the definition you use over at the electronics department is wildly out of touch with what the computer science community means by "interpreted".

      Compiled vs. VM vs. interpreted are artifacts of particular implementations, not of the language itself--Aspen has a perfectly fine Fortran interpreter, and EiC and ch have fine C interpreters.

      --
      rage, rage against the dying of the light
    72. Re:I get the impression that by Anonymous Coward · · Score: 0

      Hollerith codes.

    73. Re:I get the impression that by solidraven · · Score: 1

      First of all, if you keep pulling different implementations out of your hat you can try to prove anything. And C and Fortran interpreters, let's just not go there before this turns into a complete comedy. You butcher both languages simply by doing that. On the other hand, Python wasn't designed to be compiled like that and is inefficient at it. So yes, our definition still holds. Compiling such a language will never lead to an optimal implementation in size, memory usage or performance. If you take the path you suggest it becomes even worse. The JVM is stack based, modern x86 processors are pretty much the exact opposite. Translation only goes so far. There goes your efficiency... And it's not because it's linkable that it's fast or worth using, we all remember PHP Phalanger, don't we? That one also produced linkable code...

    74. Re:I get the impression that by solidraven · · Score: 1

      Fortran is compiled directly to machine code in most cases. The Fortran VMs and interpreters aren't used all that often as far as I'm aware. At least I haven't seen any of them used in production environments. Let's take a good example: Intel Composer actively seems to avoid microcode based instructions and goes for fast hardware implemented ones and uses all features of the hardware. Pretty interesting to see at times how much of a difference a good compiler can make. Trying to achieve the same with compiled Python will be very difficult simply due to the language's structure. Even more so if you consider how compile paths can include things like Java bytecode. Lethal for performance on x86 based systems.

      And I'm not moving goalposts at all, it's not because your definition of interpreted is so skewed that mine is. Not interpreted means it runs on bare metal without any form of code interacting with it (so no microcode either).

    75. Re:I get the impression that by Spacelem · · Score: 1

      Thanks for the link. My problem is that there isn't any one bit you can point to and say "that's the slow bit" (unless it's telling the code which parameters to use, varying the parameters, and then graphing the results when done -- I'm currently doing those parts with bash and Octave, and to be fair I would probably be better off doing both of those in Python).

      The main work is the simulation, and it's where I've got a trivially small amount of data (say a 20x20 lattice of sites containing the number of susceptible and infective animals), so I need arrays to store the numbers of individuals, the birth, death, infection, recovery, dispersal rates for each site, and one that keeps track of which sites need updating.

      The bits you might think would be the slow bits (summing arrays, checking there are no groups with negative numbers of individuals, converting rate matrices into cumulative distribution functions and using a binary search to select an event) just don't seem to have that much of an effect on the performance. The only time that a part significantly stands out is when calculating dispersal across the entire lattice, rather than a nearest neighbour dispersal. The rest is just lots of small things that need to be done randomly and frequently.

      I have profiled the model quite a bit, and the C code is over 300 times faster than the prototype written in Octave (taking advantage of vectorisation whenever possible, and using fast algorithms), but all the non-performance critical bits are either deeply embedded in the code (which is horrifically loopy by nature of the problem), or necessary for the rest to work. So Python isn't really going to help.
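
      For readers unfamiliar with the event-selection step mentioned above (rates turned into a cumulative distribution, then a binary search), here is a rough Python sketch of the general technique. It is not the poster's C code, which isn't shown; build_cdf, select_event and the sample rates are made-up names and numbers.

```python
import bisect
import itertools
import random


def build_cdf(rates):
    """Turn a flat list of non-negative event rates into running totals."""
    return list(itertools.accumulate(rates))


def select_event(cdf, u):
    """Return the index of the event picked by u, a uniform draw in [0, 1)."""
    target = u * cdf[-1]
    # bisect_right finds the first running total strictly greater than target,
    # so events with zero rate are skipped.
    index = bisect.bisect_right(cdf, target)
    # Guard against floating-point rounding at the very top of the range.
    return min(index, len(cdf) - 1)


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed only so repeated runs match
    cdf = build_cdf([0.5, 0.0, 2.0, 1.5])  # hypothetical per-event rates
    print([select_event(cdf, rng.random()) for _ in range(10)])
```

      Each selection costs one binary search, but in a simulation like the one described the rates change after nearly every event, so this step sits inside a tight per-event loop. That is the kind of code where interpreter overhead adds up.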

    76. Re:I get the impression that by DragonWriter · · Score: 1

      I love Ruby, it's a very fun and effective language, I could write it in my sleep but there are so many cool projects that are written in Python. Those languages are *very* similar, and it's a shame that so much effort is being divided between communities.

      I think I disagree. I think that it's great that both communities exist and each can develop languages in ways unconstrained by the particular historical choices that shaped the other languages (and that, in both cases, each has subcommunities around particular implementations that can experiment with things unconstrained by the historical choices that shaped other implementations.) In both the between-languages and between-implementations cases, this results in a lot of new ideas which spread between the various subcommunities that probably wouldn't happen as much if you had a monoculture around one language (or, within a language, around one implementation.)

      I might get to learn Python one day but I'm afraid I'd become a so-so programmer in both languages.

      I think, on balance, learning Python alongside Ruby (no matter which you learn first) makes you a better programmer at both: sure, you might occasionally conflate syntax constructs from one language with ones from the other, but that's more than offset by the deeper understanding of each language you get from the broader perspective that comes from experience with the different ways each does things, and the idioms common to each language.

    77. Re:I get the impression that by lattyware · · Score: 1
      --
      -- Lattyware (www.lattyware.co.uk)
    78. Re:I get the impression that by Anonymous Coward · · Score: 0

      I used to do "big data" and "cloud" computing when it was called clusters. While the biggest cluster I ever worked at was probably no more than 32 machines with 2 cores each, I can say this with a certainty: Anything other than a compiled language with low level facilities is a pure waste of time and money.

      While with Java you at least get some safety for big projects, there's absolutely no good reason to program anything remotely performance demanding on any level with python as it stands right now.

      Unless, it's bioinformatics. Then you'll be needing the worst possible software to hide the fact you're doing nothing but wasting money...

    79. Re:I get the impression that by blueskies · · Score: 1

      I used to do "big data" and "cloud" computing when it was called clusters.

      Did you run one process with multiple threads across all of those machines, or was threading less of an issue once you started thinking about distributed computing?

      I can say this with a certainty: Anything other than a compiled language with low level facilities is a pure waste of time and money.

      Isn't that what Numba does? Compiling Python code using LLVM and being able to understand numpy data structures? I'm still not sure I understand what threading has to do with this. The OP said threading was an issue, but threading doesn't

      While with Java you at least get some safety for big projects

      Safety? Job security?

    80. Re:I get the impression that by Spacelem · · Score: 1

      That funny row/column order in matrix indices (aka column major order) is because it's the correct mathematical order.

      Consider that you can only multiply two matrices if matrix A is of size [i,j], and matrix B is of size [j,k], i.e. the number of columns in A must be equal to the number of rows in B. The product C=AB is then of size [i,k]. This works for any number of matrices, so, [i,j]*[j,k]*[k,l]*[l,m] is valid, and gives [i,m].

      This naturally leads to the indexing you see in Fortran and Matlab, because it's the way mathematicians like it. If you had row major order, then [j,i]*[k,j]=[k,i], which is pretty horrible in comparison.
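
      The shape rule above ([i,j]*[j,k] gives [i,k], with the inner sizes matching) can be checked with a few lines of plain Python. This is just an illustrative sketch; product_shape and matmul are made-up helpers, and real code would use an array library instead.

```python
def product_shape(*shapes):
    """Return the (rows, columns) shape of a chained matrix product."""
    rows, cols = shapes[0]
    for next_rows, next_cols in shapes[1:]:
        # The inner dimensions of each neighbouring pair must agree.
        if cols != next_rows:
            raise ValueError(
                f"cannot multiply [{rows},{cols}] by [{next_rows},{next_cols}]"
            )
        cols = next_cols
    return rows, cols


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    product_shape((len(a), len(a[0])), (len(b), len(b[0])))
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


if __name__ == "__main__":
    print(product_shape((2, 3), (3, 4), (4, 5)))
    print(matmul([[1, 2, 3], [4, 5, 6]], [[1, 0], [0, 1], [1, 1]]))
```

      Note that Python indexes these lists from 0, while the same a[i][j] would be written with indices starting at 1 in Fortran or MATLAB.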

  3. Matlab by Anonymous Coward · · Score: 1

    Bye-bye Matlab. I liked your plotting capabilities, but that was about it.

    1. Re:Matlab by sophanes · · Score: 2

      matplotlib already does this in conjunction with Numpy and Scipy - its plotting quality and flexibility compares favourably to Matlab.

      Its biggest drawback is that it is pretty glacial even by Matlab's standards when rendering large datasets (think millions of points). I'm not sure whether matplotlib or the interactive backend is at fault, but anything DARPA can do to improve the situation would be welcome.

    2. Re:Matlab by 0100010001010011 · · Score: 1

      Still nothing for Simulink.

    3. Re:Matlab by richtopia · · Score: 1

      This is why I would like to see some money go towards Sage, allowing it to better replace Matlab in the future.

    4. Re:Matlab by 0100010001010011 · · Score: 1

      Sage doesn't do anything Simulink does.

  4. Congratulations are in order by Anonymous Coward · · Score: 1

    Seriously- the Continuum folks do great work, and after hanging out with them a bit at the last PyCon I was really impressed with where they seemed to be headed. Hope they make it there.

  5. Python 2 or 3? by toQDuj · · Score: 3, Interesting

    So is this going to focus on Python 2 or 3? Might be a reason to upgrade..

    --
    Every experiment which ends in a big bang is a good experiment.
    1. Re:Python 2 or 3? by SQL+Error · · Score: 4, Informative

      Both. The prebuilt "Anaconda" distro defaults to Python 2.7, but it also works with 3.3 and 2.6.

  6. Wrong language by Dishwasha · · Score: 4, Funny

    They put the money in the wrong place. They should have put it into R, which very popularly interfaces with Python.

    1. Re:Wrong language by Anonymous Coward · · Score: 0

      Maybe that's all the company is going to do... give new interface functions and pocket the money.

    2. Re:Wrong language by toQDuj · · Score: 1

      Perhaps. After all, it is in the nature of companies to ask as much money as possible for as little work as possible.

      --
      Every experiment which ends in a big bang is a good experiment.
    3. Re:Wrong language by SQL+Error · · Score: 3, Informative

      DARPA runs a lot of these research seed programs, putting a couple of million dollars into a bunch of different but related research projects. In this case the program budget is $100 million in total, and Continuum got $3 million for their Python work (Numba, Blaze, etc). Some of the program money may have gone to R as well; there's a couple of dozen research groups, but I don't have a full list.

    4. Re:Wrong language by csirac · · Score: 1

      Wow, I hope not. As much as I am actually a Ruby fan at heart; and as much as I appreciate the R community and everything R has done, it always seems much easier to write slow and/or memory-intensive R code than in Python. Perhaps I never quite spent enough time with it but there are many corners to the language which seem unnecessarily tedious. And no references - variables are all copied around the place, which is expensive. I know, I know... worrying about pass-by-value and efficiency of assignment statements (well, R doesn't really have statements; everything-is-an-expression) means I'm doing it wrong, but most code I debug is written by someone else who is also doing it wrong..

      Then there's pandas and the rest of the SciPy stack, which is the only reason I used Python over Ruby (I had also considered Perl+Moose) in my last project. pandas is extremely fast, and I was able to write some quite advanced data processing stuff which would normally have needed far more effort in Ruby or Perl.

    5. Re:Wrong language by hyfe · · Score: 2
      http://en.wikipedia.org/wiki/R_(programming_language)

      R is a statistical programming language. It has lots of neat methods and functions implemented, and it rules the world of statistical analysis.. which is kinda cool, since it's also open source.

      It sits pretty much halfway between Matlab and Python.. It's pretty usable and convenient because of the huge library, but as a programming language it just, well, sucks ball. Building up the objects some of the methods there need, if you get data from an unexpected source, is just an utter pain in the bottomhole.

      --
      "" How about taking the safety labels off everything, and let the stupidity-problem solve itself? """
    6. Re:Wrong language by drinkypoo · · Score: 1

      Others have complained about limitations of R in this very thread, so it doesn't seem as cut-and-dried as you make it out to be. Python is the popular language of this particular fifteen-minute period, so it's the logical choice to put the effort into. Scientists would like to benefit from language popularity too.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    7. Re:Wrong language by Anonymous Coward · · Score: 0

      That was a massive long rant about Ruby, the problem is you've got the wrong language. R != Ruby.

    8. Re:Wrong language by Anonymous Coward · · Score: 0

      The best comment I have seen about Python and R:

      I prefer Python to R for mathematical computing because mathematical computing doesn't exist in a vacuum; there's always other stuff to do. I find doing mathematical programming in a general-purpose language is easier than doing general-purpose programming in a mathematical language.

      http://www.johndcook.com/blog/2012/10/24/python-for-data-analysis/

  7. Good news for the Python community by kauaidiver · · Score: 3, Funny

    As a full time Python developer for going on 6 years this is good to hear! Now if we can get a Python-lite to replace Javascript in the browser.

    1. Re:Good news for the Python community by Anonymous Coward · · Score: 0

      Wow, I've never thought of this. Imagine modules! :O

    2. Re:Good news for the Python community by lattyware · · Score: 1

      Yeah, the issue is that Python is pretty hard to sandbox, being the hugely dynamic language it is. I imagine it would take a lot to get the browsers to stop working on their JavaScript implementations that they have sunk insane amounts of time and effort into, and start something brand new.

      Trust me, I'd love to see it happen, but I don't think it will.
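
      To illustrate why the dynamism makes sandboxing hard, here is a small sketch (my own example, not anything a browser vendor has proposed): a naive sandbox that strips the builtins from eval() blocks open(), yet plain attribute access still reaches every class loaded in the interpreter. The helper name naive_sandbox_eval is hypothetical.

```python
def naive_sandbox_eval(expr):
    """Evaluate an expression with no builtins available (NOT a real sandbox)."""
    return eval(expr, {"__builtins__": {}}, {})


# Direct use of open() fails because the name cannot be resolved ...
try:
    naive_sandbox_eval("open('some_file')")
    blocked = False
except NameError:
    blocked = True

# ... but attribute access alone walks from a tuple literal up to `object`
# and back down to every class currently loaded in the interpreter.
reachable = naive_sandbox_eval("().__class__.__base__.__subclasses__()")

print(blocked, len(reachable) > 0)
```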

      --
      -- Lattyware (www.lattyware.co.uk)
    3. Re:Good news for the Python community by Anonymous Coward · · Score: 0

      Yeah, as a full time $LANG developer for going on $RAND years this is good to hear! Now if we can get a $LANG-lite to replace $LANG_I_KNOW_BUT_DONT_MASTER in the $_PLATFORM.

    4. Re:Good news for the Python community by thetoadwarrior · · Score: 1

      It could easily be done but there are too many people who are heavily invested in JS being broken.

  8. Enthought Python by screff · · Score: 1

    I wonder how this effort compares to the work being done by Enthought Python. Hopefully it is more open and freely available to all, or better yet, incorporated into the mainline python distro.

    1. Re:Enthought Python by Anonymous Coward · · Score: 0

      What work is Enthought doing in this space?

    2. Re:Enthought Python by blueskies · · Score: 1

      Luckily all of the XData funded work is required to be open source. This looks like the numpy work: http://blaze.pydata.org/

  9. Re:Great. Just Great by Kwyj1b0 · · Score: 5, Insightful

    Yeah the govt needs better systems to manage the huge databases and dossiers they are building on everybody with their warrentless wiretaps and reading everybody's emails. Anybody who helps with this project is pretty damn naive if they don't think it will also be used for this.

    For that matter anybody who trusts the govt and thinks the govt is your friend is pretty damn naive. Yeah I would like to believe that too. No I won't ignore the mountains of evidence to the contrary. I won't treat all the counterexamples as isolated cases. I see them for what they are: an amazingly consistent pattern. The rule, not the exception. Govt positions are really attractive to sociopath types who just love power and control and a feeling that they are important and they get that feeling by imposing their will on us.

    So what you are saying is that DARPA funds will be used in a way to further the goals of DARPA/The government? Shocking. I haven't read anything that says which agencies will/won't have access to these tools - so I'd hazard a guess that any department that wants it can have it (including the famous three letter agencies).

    FYI, Continuum Analytics is a company that is based on providing high-performance python-based computing to clients. Any packages they might release will either be open source (and can be checked), or closed source (in which case you don't have to use it). They aren't hijacking the Numpy/Scipy libraries. They are developing libraries/tools for a client (who happens to be DARPA). (Frankly, I'd hope that Continuum Analytics open sources their development because it might be useful to the larger community). You do know that DARPA funds also go to improve robotics, they supported ARPANET, and a lot of their space programs later got transferred to NASA?

    Basically, I have no idea what you are ranting about. One government organization funded a project - it happens all the time. Do you rant about NSF/NIH/NASA money as well? If so, you'd better live in a cave - a lot of government sponsored research has gone into almost every modern convenience that we take for granted.

  10. Looking forward... by Anonymous Coward · · Score: 1

    ... to Python operated railguns. That would be awesome :D

    1. Re:Looking forward... by Anonymous Coward · · Score: 0

      Looking forward to Python operated railguns. That would be awesome :D

      However, when you try to abort the launch command with a CTRL^C

      It JUST LAUNCHES EVEN HARDER.

    2. Re:Looking forward... by Anonymous Coward · · Score: 0

      However, when you try to abort the launch command with a CTRL^C

      It JUST LAUNCHES EVEN HARDER.

      That's not a bug, it's a feature!

    3. Re:Looking forward... by Anonymous Coward · · Score: 0

      How would it be more awesome than operated railguns?

  11. So... by CAIMLAS · · Score: 2

    So, they're porting R and Perl PDL to Python, then?

    --
    ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
    1. Re:So... by Anonymous Coward · · Score: 1

      That'd be rather nice.

    2. Re:So... by Anonymous Coward · · Score: 0

      So, they're porting R and Perl PDL to Python, then?

      No, this is not a port of R and Perl PDL. Python, for nearly a decade, has been building its scientific computing capabilities. What's new and significant about this latest venture, is that Python will soon have a simplified interface for working with out-of-core data and maintain computational efficiency. http://blaze.pydata.org/
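
      For anyone wondering what "out-of-core" means in practice, a minimal sketch using only the standard library (this is not the Blaze API, just the general idea): the data is read in fixed-size chunks, so only one chunk is in memory at a time. The file layout (one number per line) and the function names are assumptions for the example.

```python
import itertools
import os
import tempfile


def iter_chunks(path, chunk_size):
    """Yield lists of at most chunk_size floats, reading the file lazily."""
    with open(path) as f:
        while True:
            chunk = [float(line) for line in itertools.islice(f, chunk_size)]
            if not chunk:
                return
            yield chunk


def chunked_mean(path, chunk_size=1000):
    """Mean of all values, holding only one chunk in memory at a time."""
    total, count = 0.0, 0
    for chunk in iter_chunks(path, chunk_size):
        total += sum(chunk)
        count += len(chunk)
    return total / count if count else float("nan")


# Small stand-in for a data set that would not fit in RAM.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    for i in range(10_000):
        tmp.write(f"{i}\n")
    demo_path = tmp.name

demo_mean = chunked_mean(demo_path, chunk_size=256)
os.remove(demo_path)
print(demo_mean)
```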

  12. There is also Pandas by siDDis · · Score: 1

    Pandas http://pandas.pydata.org/ is another great tool for data analysis. It uses NumPy and is highly optimized, with critical code paths written in C.

  13. Hope these guys work with Wes McKinney (Pandas) by bwbadger · · Score: 1

    This DARPA work sounds like it's in the same space as the Pandas library. I hope they can work together.

    1. Re:Hope these guys work with Wes McKinney (Pandas) by blueskies · · Score: 1

      Pandas is right under: "Scientific & Data Analysis Packages" http://docs.continuum.io/index.html

  14. Re:Great. Just Great by Anonymous Coward · · Score: 5, Funny

    What is this APRANET thing? It sounds like some useless crap loaded acronym to me.

  15. Its all going on making the documentation legible by Anonymous Coward · · Score: 0

    Only half a troll, seriously the sphinx/numpy documentation themes are terrible compared to javadoc standard.
    Finding epydoc has dropped my swearing to lines of code ratio by heaps.

  16. Re:What is this APRANET (sic) thing? by Anonymous Coward · · Score: 0

    http://en.wikipedia.org/wiki/ARPANET

  17. Re:Great. Just Great by N!k0N · · Score: 1

    you're using it now. or a derivative work, anyway.

  18. llvm by Anonymous Coward · · Score: 1

    python linking to llvm is the way to really speed it up and a few groups are seriously working on it

    1. Re:llvm by Anonymous Coward · · Score: 0

      Yes, and Continuum Analytics is one of them. Numba (http://numba.pydata.org) is their Python compiler for arrays & big data.

  19. There's more to XDATA by seekthirst · · Score: 2

    It's strange that this article focused on Python and Continuum when there is a much bigger story to be had. The XDATA program is being run in a very open source manner, and there will be a multitude of open source tools created and delivered by the end of the contract. The program is focusing on two major tasks: the analytics/algorithmic tools to process big data; and the visualization/interaction tools that go along with them.

  20. Re:Its all going on making the documentation legib by lattyware · · Score: 1

    Seriously? Sphinx makes beautiful documentation that is easy to find your way around. Compared to the ugly-ass JavaDocs that are painful to browse through, I wouldn't even give it a second thought.

    --
    -- Lattyware (www.lattyware.co.uk)
  21. Python? by Murdoch5 · · Score: 2

    Have they heard of Matlab?

    1. Re:Python? by vgerclover · · Score: 1

      Have you heard of Open Source?

    2. Re:Python? by Murdoch5 · · Score: 1

      Fine then use Octave or one of the other mathematical open source packages. The issue is that they want to adapt a system instead of using an existing one.

    3. Re:Python? by Anonymous Coward · · Score: 2, Insightful

      Okay, look. I used Octave for a long time on Linux and on Windows. On Linux (Ubuntu) it generally worked rather well and I used it for classwork where possible. On Windows, it works well as long as you don't need to plot anything. I can't tell you the number of times I installed/uninstalled various versions of Octave on Windows to find out that the plotting was broken in some way. MATLAB is great until you run into licensing issues.

      Then I found out about the combination of IPython/Numpy/Scipy/Matplotlib, which now all seems to fall under the name of "Scipy". It runs circles around Octave in just about every way, except that the syntax doesn't try to be matlab compatible. The plotting isn't as good as MATLAB's plotting, for large data sets, but for 99% of use cases, it works quite well, and for that other 1% I've been able to reduce my data set or view the data differently. Where "Scipy" destroys Octave and MATLAB is that in the same language as I do scientific computing, I have access to database libraries, asynchronous networking, good HDF5 support, GUI toolkits, multithreading, multiprocessing, etc. This is because Python is a computer language that makes it easy to integrate or "glue" things together. To the point that people created and glued together some really good numerical processing and plotting libraries. Saying "Fine then use Octave" is ridiculous because it ignores how much better "Scipy" is than Octave. Also, with Anaconda CE, you get a bunch of useful packages installed by default, available as 64-bit on every major OS. I understand that Octave is maintained by volunteers and that Numpy/Scipy have some degree of financial backing, but they're both open source, and I'm going to use the open source option that is more polished. If you don't explicitly care about trying to adhere to matlab syntax (which MathWorks continually tries to break, anyways), then I don't know why someone would choose Octave over Scipy.
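
      A small sketch of the "glue" point, kept to the standard library so it runs anywhere: sqlite3 stands in for the database library and the statistics module stands in for the NumPy/SciPy part. The table name, column names, and values are made up for the example.

```python
import sqlite3
import statistics

# In-memory database standing in for a real measurement store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 1.0), ("a", 2.0), ("a", 3.0), ("b", 10.0), ("b", 14.0)],
)


def summarize(connection, sensor):
    """Pull one sensor's values out of the database and summarize them."""
    rows = connection.execute(
        "SELECT value FROM readings WHERE sensor = ? ORDER BY rowid", (sensor,)
    ).fetchall()
    values = [row[0] for row in rows]
    return statistics.mean(values), statistics.stdev(values)


summary_a = summarize(conn, "a")
summary_b = summarize(conn, "b")
conn.close()

print(summary_a)
print(summary_b)
```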

    4. Re:Python? by naroom · · Score: 1

      Thanks! As a scientist looking to switch away from Matlab, this was really informative! Somebody get this guy some mod points :)

    5. Re:Python? by thetoadwarrior · · Score: 1

      Proprietary languages, lol.

    6. Re:Python? by steveha · · Score: 1

      Have they heard of Matlab?

      Have you heard of SciPy?

      I predict that a tipping point is coming, and after we reach that tipping point, Matlab will become a legacy language and all the new projects will be SciPy.

      Right now Matlab is benefiting from network effect: everyone uses Matlab because everyone uses Matlab. It's the standard, you expect to see everyone using it in certain industries. But it's a proprietary product controlled by a single company that is doing its best to extract maximum revenue from it.

      Meanwhile, Python and SciPy are free and open-source software. There is a generation of college students who are using SciPy for free rather than buying even a student license for Matlab, and they will be heading into the jobs market soon. In some sciences, such as astronomy, SciPy is rapidly displacing niche solutions and becoming the standard.

      The day is coming when paying for Matlab will be as unusual as paying for a web browser. (I remember the days when everyone bought Netscape Navigator, and it seemed that a real open-source free browser was a dauntingly huge difficult project that might take forever to happen. The actual tipping point was, in retrospect, pretty fast.)

      --
      lf(1): it's like ls(1) but sorts filenames by extension, tersely
    7. Re:Python? by Murdoch5 · · Score: 1

      Fine, then grab one of the other million and 1/2 great open source mathematical packages and run with it. Basically a ton of money is being sunk into something that can be solved by moving platforms. Open or not, there is software which fulfills the need and for $100 million you can get a lot of anything.

    8. Re:Python? by steveha · · Score: 1

      grab one of the other million and 1/2 great open source mathematical packages

      Okay, why?

      The scientific community is already coalescing around SciPy. You are arguing that DARPA should send money to anything but SciPy but you didn't give a reason.

      --
      lf(1): it's like ls(1) but sorts filenames by extension, tersely
    9. Re:Python? by Anonymous Coward · · Score: 0

      Have they heard of Matlab?

      Matlab is a great tool but can be prohibitively expensive and difficult to use for parallel compute. On top of that, there aren't (as of yet) good matlab routines for targeting GPU and multi-core architecture. While still nascent, this does exist in the Python community.

    10. Re:Python? by Murdoch5 · · Score: 1

      The scientific community is already coalescing around SciPy.

      Maybe some of it is, but I'm going to bet the vast majority aren't even touching Python. I would never use Python for scientific computing; it's not designed for it, simply put. Sure, you can do light scientific computing in SciPy, maybe even some more advanced functions, but if Python has to go balls to the wall, it simply won't measure up!

      So if you're looking for a system that can handle all your big data and your large storage, why not look towards a system which can handle most of it out of the box? That's not Python. You can alter any language to include those features or you can just start with the right one.

      If SciPy is so powerful then it should have no problem implementing my real time model based VHDL generation system for complex genre detection utilizing real time audio samples. I just finished designing this system in Matlab and have started porting it to Octave. A task such as mine needs powerful scientific systems, and when I was designing my project I never once even considered using SciPy because I know Python isn't designed to handle powerful scientific computing.

      So I think DARPA needs to step back and really think of a better way to approach this. Personally, I'm still going to use an open based mathematical modeling and programming system along the lines of Matlab (not open) or Octave; it's designed to handle these kinds of tasks from the get-go.

    11. Re:Python? by Anonymous Coward · · Score: 0

      So you didn't even bother to look into SciPy, but you "know Python isn't designed to handle powerful scientific computing". This will come as a surprise to all the scientists using SciPy for powerful scientific computing.

      http://conference.scipy.org/proceedings/
      http://andy.terrel.us/blog/2012/09/27/starting-with-python/
      https://us.pycon.org/2012/schedule/presentation/463/

      I'm not even going to try to persuade you. Have fun with Matlab and have a nice life.

    12. Re:Python? by Anonymous Coward · · Score: 0

      What features does Scipy lack that it's unable to generate VHDL?

      In terms of performance, you might want to check this out: The performance difference of Octave vs. Scipy seems mostly a wash: https://www.osc.edu/files/research/cse/projects/octave_python.pdf
      http://julialang.org/

      You can say "Oh, that wimpy software doesn't have enough POWER for my POWERFUL SCIENTIFIC COMPUTING!", but you should ask yourself if that's a true statement. Porting from Matlab to Octave is a good reason to choose Octave, but claiming that Scipy isn't "powerful" enough is a flimsy charge without any evidence.

    13. Re:Python? by Anonymous Coward · · Score: 0

      my real time model based VHDL generation system...when I was designing my project I never once even considered using SciPy because I know Python isn't designed to handle powerful scientific computing

      How large is this cluster you are using? Somehow I don't think you are talking the same kind of BIG data that everyone else is talking about.

  22. Re:Great. Just Great by sdaug · · Score: 5, Informative

    Frankly, I'd hope that Continuum Analytics open sources their development because it might be useful to the larger community

    Open sourcing is a requirement of the XDATA program.

  23. YAY by sproketboy · · Score: 2

    Now China can win!

  24. Re:Great. Just Great by Anonymous Coward · · Score: 0

    You have no idea what he's talking about? It was pretty clear: factions within the US government want these tools to datamine all the ISP data they have been snarfing up so they can spy on everyone in the world. Saying that you believe otherwise is a pretty extreme view and, as such, requires a very high standard of proof. Do you have that proof? No, then STFU while us adults try to figure out how to stop this obvious slide into tyranny.

  25. Big Data != Analytics by michaelmalak · · Score: 2

    The summary and article seem to conflate Big Data with Analytics. These days the two often go together, but it's quite possible to have either one without the other. Big Data is "more data than can fit on one machine", and analytics means "applying statistics to data". E.g. many Big Data projects start out as "capture now, analyze a year or two from now," and maybe just do simple counts in the interim, which is not "analytics". And of course, many useful analytics take place in the sub-terabyte range.

    The irony with this story is that Python is useful for in-memory processing, and not "Big Data" per se. To process "Big Data" typically requires (today, based on available tools, not inherent language advantages) JVM-based tools, namely Hadoop or GridGain, and distributed data processing tasks on those platforms require Java or Scala. Both of those platforms leverage the uniformity of the JVM to launch distributed processes across a heterogeneous set of computers.

    The real use case here is one first reduces Big Data using the JVM platform, and only then once it can fit into the RAM of a single workstation, use Python, R, etc. to analyze the reduced data. So typically, yes, these Python libraries will be used in Big Data scenarios, but pedantically, analytics doesn't require Big Data and Python isn't even capable (generally, based on today's tools) of processing raw Big Data.

    1. Re:Big Data != Analytics by Anonymous Coward · · Score: 0

      Big Data is "more data than can fit on one machine"

      False. Big Data for most firms is anything that exceeds their current business analytics technology stack. A recent Microsoft paper about "No one ever got fired for buying Hadoop" shows that about 50% of Hadoop cluster workloads can fit comfortably on a single machine.

      The irony with this story is that Python is useful for in-memory processing, and not "Big Data" per se. To process "Big Data" typically requires (today, based on available tools, not inherent language advantages) JVM-based tools, namely Hadoop or GridGain, and distributed data processing tasks on those platforms require Java or Scala.

      This is so wrong it's not even funny. Many organizations from finance to scientific computing all rely on Python to deal with vastly larger datasets than your average Hadoop user. Python has been used for robust data processing for over a decade, by folks like NASA and virtually every national lab. Just because there isn't a bunch of marketing hype at the latest "big data" conference about Python, doesn't mean that it's not suitable for these tasks. The irony in your refutation is that as a technology, Hadoop for disk-based ETL processing is *incredibly* slow, and anyone who's had to earn their keep doing distributed computing for the last 10 years scoffs at it. The reason for its popularity is because of the price point of HDFS and Hadoop-based ETL workflows are much, much lower than existing enterprise data warehouse software costs (e.g. SAP, Informatica, Oracle, IBM).

      The real use case here is one first reduces Big Data using the JVM platform, and only then once it can fit into the RAM of a single workstation,

      If you read the description of the Blaze project, the point of it is to deliver analytical kernels to the data in situ, without needing to do this kind of data reduction. The pipeline-oriented paradigm does not work well when you really need to do advanced analytics over truly Big Data, of the sort that DARPA and others get. If you *do* reduce your Big Data down, then it's Little Data and a tremendous amount of its value has been whittled away.

      So typically, yes, these Python libraries will be used in Big Data scenarios, but pedantically, analytics doesn't require Big Data and Python isn't even capable (generally, based on today's tools) of processing raw Big Data.

      That's funny, you should tell that to the guys that run Python on 60,000 core supercomputers.

    2. Re:Big Data != Analytics by Anonymous Coward · · Score: 0

      To process "Big Data" typically requires (today, based on available tools, not inherent language advantages) JVM-based tools, namely Hadoop or GridGain, and distributed data processing tasks on those platforms require Java or Scala. Both of those platforms leverage the uniformity of the JVM to launch distributed processes across a heterogeneous set of computers.

      I think this is why there is funding. People are tired of trying to use a (painful) map-reduce framework for every problem. No one chooses to use Hadoop. They use it because it's all there currently is.
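
      For reference, this is roughly the shape every problem has to be squeezed into under map-reduce, sketched as a single-process word count in plain Python. It is a toy illustration of the programming model, not Hadoop code, and the phase function names are my own.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Emit a (key, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each key's list of values into a single total."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}


def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle_phase(pairs))


counts = word_count(["big data big hype", "data data everywhere"])
print(counts)
```

      In a real cluster the map and reduce calls run on different machines and the shuffle moves data over the network; the sketch only shows the structure.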

    3. Re:Big Data != Analytics by Anonymous Coward · · Score: 0

      The irony with this story is that Python is useful for in-memory processing, and not "Big Data" per se.

      I RTFA (shocking I know) and they said they chose Python because the language is easy to learn and they don't expect data analysts to be hacker geeks. So they are adding "Big Data" features to Python.

      So even if you are right they are working to fix the exact problem you are complaining about!

  26. May I have a word by Anonymous Coward · · Score: 0

    Python isn't fit to run on a large cluster to simulate things, too much overhead.

    Have you heard of Stackless Python? Your presumption that Python isn't fit for large clusters to simulate things may be news to the largest single instance human participatory simulation ever done: New Eden.

    1. Re:May I have a word by solidraven · · Score: 1

      You're comparing two very different tasks. A game and a large simulation are very different things. Let's compare two extremes: EVE Online and the FDTD algorithm (EM field solver). EVE Online has a lot of conditionals. It's very unpredictable in memory usage. But the FDTD algorithm has very different properties. It needs a lot of data, but there are no conditional expressions. Additionally what's needed from the memory is known long before it's ever needed. It just goes over the data every pass without analysing it. It simply does calculations. Do you see how this can be done efficiently on a pipelined CPU? You can ensure the data shows up on the right spot at the right time. The Fortran compiler tries to analyse the implemented algorithm and optimize these sort of things, that's where its strength lies. The same sort of compiler would be very difficult to write for Python.
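
      A rough 1D sketch of the kind of FDTD inner loop described above, written in plain Python for readability rather than speed. The grid size, the Gaussian source, and the 0.5 update coefficient are illustrative assumptions, and the grid edges are simply left fixed (no absorbing boundary). Note that the two update loops contain no branches and walk memory in a fixed, predictable order.

```python
import math


def update_e(ez, hy, coeff=0.5):
    """Advance the electric field from the spatial difference of H."""
    for k in range(1, len(ez)):
        ez[k] += coeff * (hy[k - 1] - hy[k])


def update_h(ez, hy, coeff=0.5):
    """Advance the magnetic field from the spatial difference of E."""
    for k in range(len(hy) - 1):
        hy[k] += coeff * (ez[k] - ez[k + 1])


def run_fdtd(n_cells=200, n_steps=100, t0=40.0, spread=12.0):
    """Run a 1D simulation with a Gaussian pulse forced at the centre cell."""
    ez = [0.0] * n_cells
    hy = [0.0] * n_cells
    centre = n_cells // 2
    for t in range(1, n_steps + 1):
        update_e(ez, hy)
        ez[centre] = math.exp(-0.5 * ((t0 - t) / spread) ** 2)  # hard source
        update_h(ez, hy)
    return ez, hy


ez, hy = run_fdtd()
print(max(abs(v) for v in ez))
```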

    2. Re:May I have a word by Anonymous Coward · · Score: 0

      The Fortran compiler tries to analyse the implemented algorithm and optimize these sort of things, that's where its strength lies. The same sort of compiler would be very difficult to write for Python.

      Not true. Many existing Python projects already optimize array and matrix computations, building on top of the array infrastructure provided by Numpy. (e.g. NumExpr and Theano and the like.) The Blaze project is about providing a better and more general array description than what FORTRAN can easily describe (e.g. ragged arrays, variable length strings, etc.) while also supporting out-of-core processing, distributed arrays, and PGAS models. Numba is building on existing compilation approaches in the LLVM and Python worlds (including things like minivect) and targeting GPUs and vectorized CPUs.
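
      To make "ragged arrays" concrete, here is one common way to lay them out: a single flat buffer of values plus an offsets array marking where each row starts and ends. This is a generic sketch using the standard library array module; it is not Blaze's actual data description, and the class name is hypothetical.

```python
from array import array


class RaggedArray:
    """Rows of differing length stored as one flat buffer plus row offsets."""

    def __init__(self, rows):
        self.values = array("d")           # all row data, back to back
        self.offsets = array("q", [0])     # row i spans offsets[i]:offsets[i + 1]
        for row in rows:
            self.values.extend(row)
            self.offsets.append(len(self.values))

    def __len__(self):
        return len(self.offsets) - 1

    def row(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        return self.values[self.offsets[i]:self.offsets[i + 1]]

    def row_sums(self):
        return [sum(self.row(i)) for i in range(len(self))]


ragged = RaggedArray([[1.0, 2.0, 3.0], [], [4.0, 5.0]])
print(len(ragged), ragged.row_sums())
```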

  27. Imagine the research if we took all lobbying by tyrione · · Score: 1

    cash and put it to advancing applied sciences to better the nation. We piss billions down the drain marketing to morons and yet whine about spending billions on DARPA, DoE and whatnot. This country is truly too stupid for its own well-being.

  28. Re:Great. Just Great by luis_a_espinal · · Score: 0

    What is this APRANET thing? It sounds like some useless crap loaded acronym to me.

    You gotta be fucking kidding me. Either you are trolling or you are completely clueless about technology. In the case of the latter, it begs the question what are you doing in /. If you don't know what ARPANET is, you should be posting in MySpace instead of posting on a nerd/tech news site. It'd be like me posting opinions on a medicine-related site without knowing the meaning of the word 'penicillin'.

  29. GIL is a non-issue. by luis_a_espinal · · Score: 1

    I think you're right. I love Ruby, it's a very fun and effective language, I could write it in my sleep but there are so many cool projects that are written in Python. Those languages are *very* similar, and it's a shame that so much effort is being divided between communities. I might get to learn Python one day but I'm afraid I'd become a so-so programmer in both languages.

    Both languages suffer from the global interpreter lock defect and will require a rewrite in the next 5-10 years if the languages have any chance of surviving in the servers.

    Gee, because there are no distributed enterprise solutions written on Python or Ruby <rolls eyes/>

    It will take some very serious, dedicated, low level work and I just don't see it happening.

    It already has happened. The solutions aren't just in the mainstream versions, though. Take Jython. On a typical JVM, it is the fastest Python in-the-trenches implementation available. Throw that over specialized Java-focused hardware (like the Azul Vega 3), and you are on fire.

    Furthermore, a solution to the GIL problem is not necessary in the general case. In any modern system, the cost of communicating processes vs threads is no longer so much of an issue as it was a decade ago. Depending on the nature of computation, context switching between processes can be as cheap as switching between threads, and the former is typically somewhat (but not completely) free of the locking issues that are experienced with threading paradigms as seen in, say, Java/JEE solutions.

    In the back-end server arena where the greatest bottlenecks are those between http servers, app servers and database servers, there are so many tried and true solutions to the so-called GIL problem that it typically renders it as a non-issue. More processes per box, more RAM and SSDs, more boxes collocated on the same subnets running more processes, all communicating with some type of messaging queue. For these typical solutions, the issue of the GIL gets blurred into non-existence.

    It's only for those applications where you have to squeeze every last drop out of your cores that the GIL becomes an issue, and where Java/JEE shines. But for the typical bizneyty application, a platform with a GIL issue does just fine by simply scaling horizontally.
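
    A minimal sketch of the "more processes instead of threads" approach, using the standard library's concurrent.futures. Each worker process has its own interpreter and its own GIL, so CPU-bound chunks run in parallel. The prime-counting workload and the function names are stand-ins I picked for the example.

```python
import math
from concurrent.futures import ProcessPoolExecutor


def count_primes(bounds):
    """CPU-bound work: count primes in the half-open range [lo, hi)."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, math.isqrt(n) + 1)):
            count += 1
    return count


def split_range(lo, hi, parts):
    """Cut [lo, hi) into at most `parts` contiguous chunks."""
    step = max(1, (hi - lo + parts - 1) // parts)
    return [(start, min(start + step, hi)) for start in range(lo, hi, step)]


def parallel_count(lo, hi, workers=4):
    """Fan the chunks out to worker processes and add up their results."""
    chunks = split_range(lo, hi, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_primes, chunks))


if __name__ == "__main__":
    print(parallel_count(0, 200_000))
```

    The only data crossing process boundaries here is a pair of integers per chunk and one integer back, which is the cheap case; workloads that must share large mutable state are where this approach gets expensive.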

    I have this fantasy where Guido and Matsumoto will sit down and write the common code together for a super-interpreter that will handle different syntax in a modular way. I know it's technically possible since GCC is doing something very similar but, again, I just can't see this happening.

    In the meantime, Go is looking mighty good...

    Google Go looks mighty good... for systems-level programming. That's what Google intended it to be. For app development, sorry, you need more than a language. You need a tried and true app stack. Until that happens (and it will take some time for that to happen), Java, Python, Ruby and even .NET do more than fine.

    You need more than the language (however greatly designed it might be) to make potentially complex domain-specific shit happen.

    1. Re:GIL is a non-issue. by Anonymous Coward · · Score: 0

      Java/JEE never shines. It is total crap.

    2. Re:GIL is a non-issue. by Anonymous Coward · · Score: 0

      Gee, because there are no distributed enterprise solutions written on Python or Ruby <rolls eyes/>

      I think you've missed the next 5-10 years part. What I'm saying is that python and ruby are used right now not because they're good but because there's nothing really better out there right now that is ready for quick development cycles (web servers).

      It already has happened. The solutions aren't just in the mainstream versions, though. Take Jython. On a typical JVM, it is the fastest Python in-the-trenches implementation available. Throw that over specialized Java-focused hardware (like the Azul Vega 3), and you are on fire.

      Typical JVM... specialized Java-focused hardware... Python is already an interpreted language so adding another abstraction layer in the form of the jvm is hardly what I would call solution. It's what ? 2, maybe 3 years now that CPUs haven't seen mainstream speed improvements ? The hardware can't keep getting better indefinitely and eventually all those abstraction layers will need to be taken back to C. Python still has a place in the userland, prototyping and even the enterprise servers arena since C++ and Java are really awful by comparison, but it needs to resolve its problems.

      Furthermore, a solution to the GIL problem is not necessary in the general case. In any modern system, the cost of communicating processes vs threads is no longer so much of an issue as it was a decade ago. Depending on the nature of computation, context switching between processes can be as cheap as switching between threads, and the former is typically somewhat (but not completely) free of the locking issues that are experienced with threading paradigms as seen in, say, Java/JEE solutions.

      This "feature not bug" approach isn't helping. You can't keep throwing IPC on the kernel and the userland and expect miracles. Threading is already costly enough in comparison to say, coroutines, but to multi process everything regardless of how good the kernel can handle it is just crazy.
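
      To show what the coroutine alternative looks like, a small asyncio sketch: a thousand concurrent "requests" run on a single thread in a single process, each one yielding control cooperatively instead of being preempted. The handler is a placeholder of my own; real work would await I/O rather than sleep(0).

```python
import asyncio


async def handle_request(request_id, log):
    """Placeholder handler that yields to the event loop once."""
    log.append(("start", request_id))
    await asyncio.sleep(0)   # cooperative yield; no thread or process switch
    log.append(("done", request_id))
    return request_id * 2


async def serve(n_requests):
    """Run all handlers concurrently on one thread and collect the results."""
    log = []
    results = await asyncio.gather(
        *(handle_request(i, log) for i in range(n_requests))
    )
    return results, log


results, log = asyncio.run(serve(1000))
print(len(results), len(log))
```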

      In the back-end server arena where the greatest bottlenecks are those between http servers, app servers and database servers, there are so many tried and true solutions to the so-called GIL problem that it typically renders it as a non-issue. More processes per box, more RAM and SSDs, more boxes collocated on the same subnets running more processes, all communicating with some type of messaging queue. For these typical solutions, the issue of the GIL gets blurred into non-existence.

      Yes. The more RAM thing. 3 years ago I said to a friend of mine if the GIL isn't sorted soon python is headed down the drain in a couple of years. I just didn't see RAM getting this cheap and plentiful this quickly. I admit, throwing more hardware at the problem can work. In real life things rarely scale expediently so I can definitely see 3 more years of rising RAM capacity without excessive costs. But consider this, for how long ? While servers keep needing more and more computing and memory, end users have stopped adding more ram at about 4-8gb a few years ago. As demand drops, for how long can the servers afford those hardware solutions ? I can't predict the future regarding the market but I can tell you that Go and whatever else is around the corner shouldn't take more than 2-4 years to catch up on library bindings at least so python and ruby will be competing very hard against growing odds.

      Google Go looks mighty good... for systems-level programming. That's what Google intended it to be. For app development, sorry, you need more than a language. You need a tried and true app stack. Until that happens (and it will take some time for that to happen), Java, Python, Ruby and even .NET do more than fine.
      You need more than the language (however well designed it might be) to make potentially complex domain-specific shit happen.

    3. Re:GIL is a non-issue. by luis_a_espinal · · Score: 1

      Java/JEE never shines. It is total crap.

      That's an invective, not an argument. Now go back and finish your programming homework.

  30. Pypy by Anonymous Coward · · Score: 0

    So how much of that $3M will go towards development of the NumPy port to Pypy? I'm guessing 0%, which is unfortunate, since that is one of the best places to push the state of the art in speed for numerical processing with python. The Pypy community has the modest goal of raising $60k for that work (just 2% of the grant to this company), and they are still only 3/4 of the way to achieving those funds after a year with their shingle out.

    http://pypy.org/numpydonate.html

    1. Re:Pypy by Anonymous Coward · · Score: 0

      So how much of that $3M will go towards development of the NumPy port to Pypy? I'm guessing 0%, which is unfortunate, since that is one of the best places to push the state of the art in speed for numerical processing with python.

      I guess they didn't want to fragment Python and re-invent the wheel. "We are way behind, but please fund us...uh, no we don't even have numpy working on pypy and you have to recompile all of your C extension modules."

  31. Status: Won'tWork by Anonymous Coward · · Score: 0

    Speaking as someone who's been employed writing Python for nearly a decade, and prior to that was involved in porting scientific Fortran to C & Java (Dear Fortran guys -- I'm so sorry, it was the job the idiots paid for because management thought Fortran was dying).

    It won't work.

    It's not that Python can't do it. It's that without a real programmer, python is slow. Even with a real programmer, it's slower -- but that's /often/ recoverable in many ways, particularly in development time.

    Scientists that don't code have an easier time learning python. Scientists that do code (well) can learn python, but are often going to want to move into other languages because they *always* want more data and more refined models. I've seen them learn java and c -- but that's a total nightmare, worse than even python.

    Contrast to a friend who thinks they can program but can't even fizzbuzz -- they have a dataset that they think is too slow for python. It is too slow the way they do it, but they've copy-pasted an O(log log N) algorithm so badly it's at least O(N log(N)). Going out of asymptotics, there really is a constant of about "5" before that for all the extra iterations and wholly unnecessary subdivisions they do, plus the output is total shit because they don't understand what it means to work in floating point. So a process that I can finish on my desktop in a few hours as long as I have enough RAM takes them three weeks to run on a server.
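
As a small illustration of the floating-point remark above (a sketch with made-up data, not the friend's actual code): adding many values one at a time rounds at every step, so the result can drift from the exact answer, while the standard library's `math.fsum` compensates for the lost low-order bits. The helper name `naive_sum` and the sample values are assumptions chosen for the example.

```python
import math

def naive_sum(values):
    """Add values left to right, rounding to a float after every addition."""
    total = 0.0
    for v in values:
        total += v
    return total

# Hypothetical data: ten copies of 0.1, which has no exact binary representation.
values = [0.1] * 10

drifted = naive_sum(values)      # accumulates a small rounding error
corrected = math.fsum(values)    # compensated summation from the standard library

# Exact equality on floats is fragile; compare within a tolerance instead.
print(drifted == 1.0)                # False
print(corrected == 1.0)              # True
print(math.isclose(drifted, 1.0))    # True
```

The error here is tiny, but in a long-running numerical job the same effect compounds over millions of operations, which is one way research code ends up quietly wrong.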

    The thing runs -- except for the 10% of the data they drop, but it's a wholly unreadable mess.

    Some of the people that want to do this are "real programmers" -- but many are scientists that just want a visualization and don't give a damn what tool does it as long as the output looks like what they think it should.

    They're the same researchers that cut and paste from stackoverflow or expert sexchange, and who just drag and drop code around in notepad trying to get rid of errors.

    Given a CSV, they'll manage to make a beautiful clustergraph from examples or with help from a friend that knows it, but they'll still develop deeply flawed research and modeling code and never know why or catch it.

    Doing this in python may make some of the analysis more accessible as a whole, but it won't fix the 'problem' that most scientists can't actually program.

    Maybe they shouldn't have to -- but somebody does.

    The problem is really best summed up by the time I described a bug to a new programmer who wasn't great at math and was clearly used to having a single error mess them up. They figured they could change one thing and fix a totally flawed algorithm...

    "Just tell me what line the bug is on"

    The answer was : "All but these two".

    To the new-non-programmer...this answer was inconceivable. They 'knew' what they told the computer to do, and it was being unreasonable in interpreting their source according to the rules of the language. There had to be one line to fix it -- the notion that the fundamental structure of their logic was wrong was so counterintuitive they didn't believe it even when pointed out.

  32. Re:Great. Just Great by Anonymous Coward · · Score: 0
  33. 110 reasons to pick Python over Matlab by naroom · · Score: 1
    1. Re:110 reasons to pick Python over Matlab by Murdoch5 · · Score: 1

      Well, it's group voted on, so it's not like I can argue with the list. That being said, Matlab or any mathematical computing language is still better suited for big data. The lack of skill of a programmer should never be blamed on the language, it's an easy way out.

    2. Re:110 reasons to pick Python over Matlab by naroom · · Score: 1

      You may not be familiar with SciPy / NumPy. They are the scientific computing side of Python. They support matrix operations and linear algebra at least as well as Matlab does. Underneath, both NumPy and Matlab are just LAPACK anyway. Here, have a relevant wiki article.

    3. Re:110 reasons to pick Python over Matlab by Anonymous Coward · · Score: 0

      Matlab or any mathematical computing language is still better suited for big data

      Python with its scientific computing libraries is on par with those. Real Science gets done every day on that platform, and it's not like those scientists weren't handed the same Matlab crackpipe when they were students.

      the lack of skill of a programmer should never be blamed on the language

      Computer languages are tools. If a particular language's concept model is a poor fit for the tasks that certain users want to do, the right answer is not to worship the tool and blame the user. Instead, make better tools. That's what DARPA is funding.

    4. Re:110 reasons to pick Python over Matlab by Anonymous Coward · · Score: 0

      the lack of skill of a programmer should never be blamed on the language, it's an easy way out.

      Definitely agree, but how does matlab distribute data which is too big to fit into memory, too big to fit on one disk, too big to fit on one machine? Furthermore, how is computation on distributed data parallelized? I don't think they've generalized these solutions.
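
One common answer to the "too big to fit into memory" question is to process the data in chunks and keep only a small summary per chunk. The sketch below is a minimal, single-machine illustration in plain Python; the function names (`chunks`, `partial_stats`, `combine`, `streaming_mean`) and the sample range are assumptions made up for the example, not part of any library mentioned in the thread.

```python
from itertools import islice

def chunks(iterable, size):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(iterable)
    while True:
        block = list(islice(iterator, size))
        if not block:
            return
        yield block

def partial_stats(block):
    """Map step: reduce one chunk to a small (count, total) summary."""
    return len(block), sum(block)

def combine(a, b):
    """Reduce step: merge two (count, total) summaries by adding them."""
    return a[0] + b[0], a[1] + b[1]

def streaming_mean(iterable, chunk_size=1000):
    """Compute a mean while holding only one chunk in memory at a time."""
    summary = (0, 0)
    for block in chunks(iterable, chunk_size):
        summary = combine(summary, partial_stats(block))
    count, total = summary
    return total / count if count else float("nan")

# Hypothetical stand-in for a data set too large to load at once.
print(streaming_mean(range(1, 1_000_001), chunk_size=10_000))  # 500000.5
```

Because each chunk is summarized independently and the summaries merge by simple addition, the map step could in principle run on different cores or machines; distributing it is a separate problem that this sketch does not attempt.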

  34. Re:Great. Just Great by Anonymous Coward · · Score: 0

    The home of the only four-character TLD suffix, .arpa

    By the way, did you know Bill Joy wrote the BSD IP-stack in one weekend? :-)

  35. good lord no. by luis_a_espinal · · Score: 1

    Well.. there's C, of course...

    I work with C and C++ on a daily basis, and I have to ask/answer: For parallelized scientific computation or data crunching? No thank you. You don't use a Phillips screwdriver to unscrew a hexagonal bolt, do you? Know your tools, their strengths and limitations.

    1. Re:good lord no. by nr · · Score: 1

      I disagree, Fortran and C are very good for parallel scientific computations. If you are doing computations you care about speed, and the closer you are to the iron (and the OS) the faster it runs and the more work you can do in a time unit. You have nice tools like OpenMP, UPC, Cilk and MPI, etc. Posix SHM is the best for local IPC/RPC.

      Python may be a nice lang to work with, but it is a slow dog.

  36. depends by luis_a_espinal · · Score: 1

    Yeah, the issue is that Python is pretty hard to sandbox, being the hugely dynamic language it is.

    Forgive me, but JavaScript is also hugely dynamic. How does this prevent effective sandboxing in the general sense?

    I imagine it would take a lot to get the browsers to stop working on their JavaScript implementations that they have sunk insane amounts of time and effort into, and start something brand new.

    Another solution is to program in a subset of Python that gets verified at compile time with additional restrictions, and then compiled into JavaScript (the way CoffeeScript does.) That way we re-capture the investment already made in browser-side JavaScript technology.

    Trust me, I'd love to see it happen, but I don't think it will.

    That sounds more like a solution looking for a problem. No need to reinvent the browser VM wheel. Reuse what's there to the greatest extent possible and get the best ROI.

    It might not sound as cool as re-inventing browser script vm technology, but it is certainly a more pragmatic solution for which working precedents already exist. Plus, it's not as if it were trivial. Language-to-language compilers are fertile ground for very cool experimentation.
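
The "verify a restricted subset" step suggested above can be sketched with Python's standard `ast` module: parse the source, walk the tree, and reject any node type that is not on a whitelist. This is only a rough illustration of the checking half; the whitelist contents and the name `check_subset` are assumptions for the example, and the translation to JavaScript is not shown.

```python
import ast

# Hypothetical whitelist: simple assignments and arithmetic expressions only.
ALLOWED_NODES = (
    ast.Module, ast.Expr, ast.Assign, ast.Name, ast.Load, ast.Store,
    ast.Constant, ast.BinOp, ast.UnaryOp,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.USub,
)

def check_subset(source):
    """Return (line, node type) pairs for disallowed constructs; empty means accepted."""
    tree = ast.parse(source)
    violations = []
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            # Operator and context nodes carry no line number, hence the default.
            violations.append((getattr(node, "lineno", None), type(node).__name__))
    return violations

# Accepted: nothing outside the whitelist.
print(check_subset("x = 1 + 2 * 3\ny = -x / 4"))

# Rejected: imports, attribute access, and calls are not whitelisted.
print(check_subset("import os\nos.system('ls')"))
```

A whitelist is the safer default for this kind of check, since anything the verifier has never heard of is rejected rather than silently allowed.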

  37. Re:Great. Just Great by luis_a_espinal · · Score: 1

    Poe's Law.

    In /., you never know.

  38. They should have funded... by Anonymous Coward · · Score: 0

    ...this to Julia as it is made for number science anyway.

  39. Re:Great. Just Great by blueskies · · Score: 1

    Yeah the govt needs better systems to manage the huge databases and dossiers they are building on everybody with their warrantless wiretaps and reading everybody's emails. Anybody who helps with this project is pretty damn naive if they don't think it will also be used for this.

    Isn't this true of all useful open source projects?

  40. Re:Great. Just Great by blueskies · · Score: 1

    You have no idea what he's talking about? It was pretty clear: factions within the US government want these tools to datamine all the ISP data they have been snarfing up so they can spy on everyone in the world. Saying that you believe otherwise is a pretty extreme view.

    He has no idea why there is ranting about open source code that everyone in the world can use for any purposes. Did you rant about git being open source? I'm betting the gov't can use that to manage code related to data mining. Do you rant about postgres or any of the databases used by the US gov't? Would postgres suddenly become evil because the gov't threw some money their way?

  41. Re:Great. Just Great by Anonymous Coward · · Score: 0

    So what you are saying is that DARPA funds will be used in a way to further the goals of DARPA/The government? Shocking.

    First rule of Slashdot: never, ever, EVER miss the opportunity to be a condescending ass. It's much more important than the point being made, after all.

    Which, since you seem confused, the point was they want OUR HELP. No, says I, they can work on their rights-infringing projects without my personal assistance. You see, actively assisting tyranny would be the opposite of furthering MY goals.

    See how simple that is? You aren't that stupid. You are, however, too insecure and thus too eager to portray another as the fool, and this stops you from realizing what is being said. After all, the other guy is a fool so you MUST interpret what he said in the dumbest way possible. Then you can cry about how dumb he was.

    It's sort of like making an idol with your own two hands, and then bowing down and worshipping the idol you have made. You have to forget that you did in fact make it. That's how it is when you assume you're so smart and everyone else is so stupid. You have to forget that you took it upon yourself to assume that. Then you can believe in it.

    People like you are why mature, rational adult conversations are so hard to find these days. The sense of worth you're looking for is found within yourself by cultivating a healthy mind and spirit and an attitude of joy and appreciation. You will never have satisfaction, fulfillment, or security by playing the condescending ass. You will only miss the point being made.

  42. More details and discussion... by Anonymous Coward · · Score: 0

    See Travis Oliphant's announcement about this on the numpy-discussion list: http://comments.gmane.org/gmane.comp.python.numeric.general/52397

  43. Neither LANGUAGE has a GIL by DragonWriter · · Score: 1

    Both languages suffer from the global interpreter lock defect and will require a rewrite in the next 5-10 years if the languages have any chance of surviving in the servers.

    No, they don't. The CPython and MRI/YARV implementations of Python and Ruby, respectively, have global interpreter locks, but those are implementation quirks not language features. On the Python side, IronPython and Jython don't have a GIL, on the Ruby side neither JRuby, MacRuby, IronRuby nor Rubinius (the latter being particularly important, because it has been widely suggested as the next mainline Ruby platform, in the same way that the YARV-based Ruby, which replaced the old mainline interpreter from 1.9, was prior to 1.9) have a GIL.

    Further, I'm not sure the GIL is that big of an issue going forward: threadsafe native code in the runtime or extensions can release the GIL, and directly using native system threads with shared mutable state at the application level rather than using isolated task abstractions at the application level with threads managed in the runtime doesn't seem to be all that great a way to build scalable application code (there is a reason why languages designed specifically for scalable concurrency often don't directly expose threading at the language level: this is true both of newer languages like Go and Rust, and older and more widely used concurrency-focussed languages like Erlang.)

    Finally, insofar as the GIL is important, it's not like a ground-up rewrite that starts from square one would need to be done to get rid of it in the next 5-10 years: the mainline interpreters have been working on improving the thread-safety of the underlying code for years with the intent of removing the GIL in both CPython and MRI, and as noted previously, alternative implementations of the languages have already been built that don't have a GIL -- so the work of the "rewrite" has already been done and is available (multiple times, for each language.)
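
To make the point about native code releasing the GIL a little more concrete, here is a minimal sketch using only the standard library. CPython's `hashlib` documentation states that the GIL is released while hashing data larger than 2047 bytes, so several hashes can make progress on different threads at once. The tasks are handed to a pool rather than managed as raw threads, in the spirit of the "isolated task" style described above. The function name `digest` and the payload sizes are assumptions for the example, and any actual speedup depends on the machine and interpreter.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def digest(payload):
    """Hash one payload; the heavy lifting happens in native code inside hashlib."""
    return hashlib.sha256(payload).hexdigest()

# Hypothetical workload: four 1 MiB buffers, each filled with a different byte.
payloads = [bytes([i]) * (1 << 20) for i in range(4)]

# Tasks go to a pool; the application code shares no mutable state between them.
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(digest, payloads))

# Same work done one item at a time, for comparison.
sequential = [digest(p) for p in payloads]

print(threaded == sequential)  # True
```

Pure-Python work inside `digest` would still be serialized by the GIL on CPython; the benefit here comes only from the portion that runs in native code.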

  44. Re:Great. Just Great by Anonymous Coward · · Score: 0

    The home of the only four-character TLD suffix, .arpa

    Other than, you know, these:

    • .aero
    • .asia
    • .coop
    • .info
    • .jobs
    • .mobi
    • .name
    • .post

    Reference: http://en.wikipedia.org/wiki/List_of_Internet_top-level_domains