Slashdot Mirror


Ask Slashdot: Best Language To Learn For Scientific Computing?

New submitter longhunt writes "I just started my second year of grad school and I am working on a project that involves a computationally intensive data mining problem. I initially coded all of my routines in VBA because it 'was there'. They work, but run way too slow. I need to port to a faster language. I have acquired an older Xeon-based server and would like to be able to make use of all four CPU cores. I can load it with either Windows (XP) or Linux and am relatively comfortable with both. I did a fair amount of C and Octave programming as an undergrad. I also messed around with Fortran77 and several flavors of BASIC. Unfortunately, I haven't done ANY programming in about 12 years, so it would almost be like starting from scratch. I need a language I can pick up in a few weeks so I can get back to my research. I am not a CS major, so I care more about the answer than the code itself. What language suggestions or tips can you give me?"

5 of 465 comments

  1. Re:Python by shutdown -p now · Score: 5, Interesting

    "... a complete answer would be Python and C++, because numpy/scipy can't do everything and Python is still very slow for number-crunching."
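    To make the "slow for number-crunching" point concrete, here is a minimal sketch, assuming NumPy is installed. The function names and the sample data are made up for illustration. It computes the same sum of squares twice: once with an interpreted Python loop, and once with a single vectorized call that runs in compiled code.

```python
import numpy as np


def sum_of_squares_loop(values):
    """Pure-Python loop: every iteration is executed by the interpreter."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_numpy(values):
    """Same computation as one vectorized call into NumPy's compiled code."""
    arr = np.asarray(values, dtype=np.float64)
    return float(np.dot(arr, arr))


# Hypothetical sample data, only here to show that both versions agree.
data = [0.5 * i for i in range(1000)]
loop_result = sum_of_squares_loop(data)
numpy_result = sum_of_squares_numpy(data)
assert abs(loop_result - numpy_result) < 1e-6
```

    On large arrays the vectorized version is usually much faster, but the actual ratio depends on the machine and the problem, so time it on your own data (for example with the standard timeit module) before deciding what needs to move to C++.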

    The problem with using the mix (when you actually write the C++ code yourself) is that debugging it is a major pain in the ass - you either attach two debuggers and simulate stepping across the boundary by manually setting breakpoints, or you give up and resort to printf debugging.

    OTOH, if Windows is an option, PTVS (Python Tools for Visual Studio) is a Python IDE that can debug Python and C++ code side by side, with cross-boundary stepping etc. It can also do Python/Fortran debugging with a Fortran implementation that integrates into VS (e.g. the Intel one).

    (full disclosure: I am a developer on the PTVS team who implemented this particular feature)

  2. Java Java! by Latent Heat · Score: 3, Interesting

    For research engineering, I use Java to run the numerical examples of the algorithms I develop, although most of the authors in the journals I publish in use Matlab for this purpose (ewwwwww!). A long time ago I was a Turbo Pascal person, as were the engineering colleagues who later crossed over to Matlab seeking the same kind of ease of use. Me, I transitioned to Delphi, but now I am with Java and Eclipse -- the Turbo Pascal of the 21st century.

    For numeric-intensive work, I can get within 20% of the speed of C++ using the usual techniques -- minimize garbage collection by allocating variables once, use the "server" VM, perform "warmup" iterations in benchmark code to stabilize the JIT. I use the Eclipse IDE, copy and paste numeric results from the Console View into a spreadsheet program, and voila, instant journal article tables.

  3. Re:FORTRAN by Frosty Piss · Score: 5, Interesting

    Clearly you are not involved in serious science.

    And if you think FORTRAN is some ancient esoteric language, you're ignorant as well. The most recent standard, ISO/IEC 1539-1:2010, informally known as Fortran 2008, was approved in September 2010.

    Fortran is, for better or worse, the only major language out there specifically designed for scientific numerical computing. Its array handling is nice, with succinct array operations on both whole arrays and slices, comparable with Matlab or numpy but super fast. The language is carefully designed to make it very difficult to accidentally write slow code -- pointers, to take the standard example, are restricted in such a way that it's immediately obvious if there might be aliasing -- and so the optimizer can go to town on your code. Current incarnations have things like coarray Fortran, DO CONCURRENT, and FORALL built into the language, allowing distributed-memory and shared-memory parallelism, and vectorization.
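    As a rough illustration of the array style described above, here is a minimal sketch of the NumPy side of that comparison (Python, not Fortran; all variable names are made up for the example). It shows a whole-array operation, a slice operation, and the aliasing question: in NumPy a basic slice is a view onto the original data, so whether two arrays overlap is something you check at runtime.

```python
import numpy as np

a = np.arange(10, dtype=np.float64)  # 0.0, 1.0, ..., 9.0

# Whole-array operation: no explicit loop over elements.
b = 2.0 * a + 1.0

# Slice operation: difference between elements two positions apart.
diff = a[2:] - a[:-2]

# A basic slice is a view, so it shares memory with the original array...
view = a[2:5]
# ...while an explicit copy owns its own memory.
copy = a[2:5].copy()

aliases_view = np.shares_memory(a, view)
aliases_copy = np.shares_memory(a, copy)

# Writing through the view changes the original; writing to the copy does not.
view[0] = -1.0
copy[1] = -2.0
```

    After this runs, a[2] has been overwritten through the view, while a[3] is untouched by the write to the copy. This is the kind of overlap an optimizer has to worry about when it cannot rule out aliasing.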

    The downsides of Fortran are mainly the flip side of one of the upsides mentioned: Fortran has a very long history. Upside: tonnes of great libraries. Downside: tonnes of historical baggage.

    If you have to do a lot of number crunching, Fortran remains one of the top choices, which is why many of the most sophisticated simulation codes run at supercomputing centres around the world are written in it. But of course it would be a terrible, terrible, language to write a web browser in. To each task its tool.

    --
    If you want news from today, you have to come back tomorrow.

  4. C/C++ by ericcc65 · Score: 3, Interesting

    I have an MSEE and I've been working in the digital signal processing realm for the 10 years since graduating. I should mention that I haven't done a lot of low-level hardware work: I haven't programmed actual DSP cards or played with CUDA. I have written software that did real-time signal processing just on a general-purpose CPU. Everyone in my industry at this point uses C or C++. There is some legacy FORTRAN, and I shudder when I have to read it. Some old types swear by it, but it's fallen out of favor, mostly because it's antiquated, most people know C/C++, and libraries are available for it.

    For non-real-time prototypes I'd recommend learning Python (scipy, numpy, matplotlib). Perhaps Octave and/or Matlab would be useful as well.

    At some point you have to decide what your strength will be. I love learning about CS and try to improve my coding skills, but it's just not my strength. I'm hired because of my DSP knowledge, and I need to be able to program well enough to translate algorithms into programs. If you really want to squeeze out performance then you'll probably want to learn CUDA, assembly, AVX/SSE, and DSP-specific C programming. But I haven't delved to that level because, honestly, we have a somewhat different set of people at the company who are really good in those realms.

    Of course, it would be great if I could know everything. But so far it's been good enough to know C/C++ for most of our real-time signal processing. If something is taking a really long time, we might look at implementing a vectorized version. I would like to learn CUDA for when I get a platform that has GPUs, but part of me wonders if it's worth it. The reason C/C++ has been enough so far is that compilers are getting so good that you really have to know what you're doing in assembly to beat them. Casual assembly knowledge probably won't help. I might be wrong, but I envision the same being true of GPUs and parallel programming in the not too distant future.

  5. Re:Python by Jane Q. Public · Score: 3, Interesting

    "This is certainly the way of the future, not just for gene sequencing but many other quantitative sciences, although a complete answer would be Python and C++, because numpy/scipy can't do everything and Python is still very slow for number-crunching."

    I mostly agree with your conclusion, but for somewhat different reasons. I don't believe Python is "the wave of the future", but rather I'd recommend it because it has been in use by the scientific community for far longer than other similar languages, like Ruby. Therefore, there will be more pre-built libraries for it that a programmer in the sciences can take advantage of.

    I also agree that some C should go along with it, for building those portions of the code that need to be high performance. I would choose C over C++ for performance reasons. If you need OO, that's what Python is for. If you need performance, that's what the C is for. C++ would sacrifice performance for features you already have in Python.

    If it were entirely up to me, however -- that is to say, if there weren't so much existing code for the taking out there already -- I'd choose Ruby over Python. But that's just a personal preference.