Slashdot Mirror


Ask Slashdot: Best Language To Learn For Scientific Computing?

New submitter longhunt writes "I just started my second year of grad school and I am working on a project that involves a computationally intensive data mining problem. I initially coded all of my routines in VBA because it 'was there'. They work, but run way too slow. I need to port to a faster language. I have acquired an older Xeon-based server and would like to be able to make use of all four CPU cores. I can load it with either Windows (XP) or Linux and am relatively comfortable with both. I did a fair amount of C and Octave programming as an undergrad. I also messed around with Fortran77 and several flavors of BASIC. Unfortunately, I haven't done ANY programming in about 12 years, so it would almost be like starting from scratch. I need a language I can pick up in a few weeks so I can get back to my research. I am not a CS major, so I care more about the answer than the code itself. What language suggestions or tips can you give me?"

76 of 465 comments

  1. Python by curunir · · Score: 4, Insightful

    I have a friend who works for a company that does gene sequencing and other genetic research and, from what he's told me, the whole industry uses mostly python. You probably don't have the hardware resources that they do, but I'd bet you also don't have data sets that are nearly as large as theirs are.

    You might also get better results from something less general purpose like Julia, which is designed for number crunching.

    --
    "Don't blame me, I voted for Kodos!"
    1. Re:Python by the+gnat · · Score: 4, Insightful

      the whole industry uses mostly python

      This is certainly the way of the future, not just for gene sequencing but many other quantitative sciences, although a complete answer would be Python and C++, because numpy/scipy can't do everything and Python is still very slow for number-crunching. It's best to start with just Python, but eventually some C++ knowledge will be helpful. (Or just plain C, but I can't see any good reason to inflict that on myself or anyone else.)

    2. Re:Python by Anonymous Coward · · Score: 4, Insightful

      Python is the new VB.

    3. Re:Python by Garridan · · Score: 5, Informative

      I use Sage. When Python isn't fast enough, I can essentially write in C with Cython. It's gloriously easy. Have some trivially parallelizable data mining? Just use the @parallel decorator. Sage comes with a slew of fast mathematical packages, so your toolbox is massive, and you can hook it all in to your Cython code with minimal overhead.

    4. Re:Python by shutdown+-p+now · · Score: 5, Interesting

      a complete answer would be Python and C++, because numpy/scipy can't do everything and Python is still very slow for number-crunching.

      The problem with using the mix (when you actually write the C++ code yourself) is that debugging it is a major pain in the ass - you either attach two debuggers and simulate stepping across the boundary by manually setting breakpoints, or you give up and resort to printf debugging.

      OTOH, if Windows is an option, PTVS is a Python IDE that can debug Python and C++ code side by side, with cross-boundary stepping etc. It can also do Python/Fortran debugging with a Fortran implementation that integrates into VS (e.g. the Intel one).

      (full disclosure: I am a developer on the PTVS team who implemented this particular feature)

    5. Re:Python by shutdown+-p+now · · Score: 5, Insightful

      Python is VB done right.

    6. Re:Python by SJHillman · · Score: 5, Funny

      VB is feeding your scrotum to a python.

    7. Re:Python by rwa2 · · Score: 4, Informative

      Yes, I did my master's thesis using simpy / scipy, integrated with lp_solve for the number crunching, all of which was a breeze to learn and use. It was amazing banging out a new recursive algorithm crawling a new object structure and just having it work the first time without spending several precious cycles bugfixing syntax errors and chasing down obscure stack overflows.

      I used the psyco JIT compiler (unfortunately 32-bit only) to get ~100x boost in runtime performance (all from a single import statement, woo), which was fast enough for me... these days I think you can get similar boosts from running on PyPy. Of course, if you're doing more serious number crunching, python makes it easy to rewrite your performance-critical modules in C/C++.

      I also ended up making a LiveCD and/or VM of my thesis, which was a good way of wrapping up the software environment and dependencies, which could quickly grow outdated in a few short years.

    8. Re:Python by dmbasso · · Score: 3, Insightful

      The problem with using the mix (when you actually write the C++ code yourself) is that debugging it is a major pain in the ass

      Only if you don't use the C/C++ code as an independent module, as it should be. If you *must* debug it in parallel, you're designing it wrong.

      --
      `echo $[0x853204FA81]|tr 0-9 ionbsdeaml`@gmail.com
    9. Re:Python by Anonymous Coward · · Score: 3, Insightful

      VB is closed-source trash.

    10. Re:Python by shutdown+-p+now · · Score: 2

      How do you write C++ code for use from Python such that it's not an independent module?

      Anyway, regardless of how you architect it, in the end you'll have a Python script feeding data to your C++ code. If something goes wrong, you might want to debug said C++ code specifically as it is called from Python (i.e. with that data). Even if you don't ever have to cross the boundary between languages during debugging, there are still benefits to be had from a debugger with more integrated support - for example, it can show Python representations of objects that were passed to your C++ code.

    11. Re:Python by ebno-10db · · Score: 5, Insightful

      Perl is still in wide use.

      Do not use Perl for this. I've been using Perl for 15-20 years, and I love it for "scripting", text processing, etc., but using it for scientific computing sounds like an exercise in masochism.

    12. Re:Python by wanax · · Score: 3, Informative

      Sage is okay for small-midsize projects, as is R (both benefit from being free). On the whole, though, I'd really recommend Mathematica, which is purpose-built for that type of project, makes it trivial to parallelize code, is a functional language (once you learn, I doubt you'll want to go back) and scales well up to fairly large data sets (10s of gigs).

    13. Re:Python by RDW · · Score: 4, Informative

      I have a friend who works for a company that does gene sequencing and other genetic research and, from what he's told me, the whole industry uses mostly python.

      I think your friend is mistaken. Though it's essential to know a scripting language, most of the computationally expensive stuff in sequence analysis is done with code written in, as you might expect, C, C++, or Java. Perl and Python are used more for glue code, building analysis pipelines, and processing the output of the heavy duty tools for various downstream applications. R is used heavily for statistics, and especially for anything involving microarrays.

    14. Re:Python by Jane+Q.+Public · · Score: 3, Interesting

      "This is certainly the way of the future, not just for gene sequencing but many other quantitative sciences, although a complete answer would be Python and C++, because numpy/scipy can't do everything and Python is still very slow for number-crunching."

      I mostly agree with your conclusion, but for somewhat different reasons. I don't believe Python is "the wave of the future", but rather I'd recommend it because it has been in use by the scientific community for far longer than other similar languages, like Ruby. Therefore, there will be more pre-built libraries for it that a programmer in the sciences can take advantage of.

      I also agree that some C should go along with it, for building those portions of the code that need to be high performance. I would choose C over C++ for performance reasons. If you need OO, that's what Python is for. If you need performance, that's what the C is for. C++ would sacrifice performance for features you already have in Python.

      If it were entirely up to me, however -- that is to say, if there weren't so much existing code for the taking out there already -- I'd choose Ruby over Python. But that's just a personal preference.

    15. Re:Python by Garridan · · Score: 2

      I've used Sage on a supercomputer, chugging through hundreds of gigs of data. Do you know what you're talking about, or are you just recommending the shiny thing that you paid lots of money for?

    16. Re:Python by Joce640k · · Score: 4, Insightful

      Compared to C and C++, Fortran is actually more elegant for pure numerical computing.

      Unsurprising - that's what Fortran was designed for...!

      --
      No sig today...
    17. Re:Python by shutdown+-p+now · · Score: 3, Insightful

      No, it's a simple language that is easy for beginners to learn. But, unlike VB, it is not horribly designed, and is useful even once you grow out of the beginner phase.

    18. Re:Python by columbus · · Score: 2

      There are a lot of good suggestions in this discussion so far.

      I have a few points to add.
      1) compiled language vs scripting language
      In general, any compiled language is going to run faster than any scripting language. But you will probably spend more time coding and debugging to get your analysis running with a compiled language. It is useful to think about how important performance is to you relative to the value of your own time. Are you going to be doing these data mining runs repeatedly? Is it worth spending ten times as many hours getting this thing up and running if by doing so, you can get it to run really fast? If so, then choose a compiled language. You're already familiar with C so that would be a natural choice. If, after consideration, you value your development time more than processing time, stick with a scripting language. You'll probably be able to stand up a working program much faster & you can look for other ways to squeeze out extra performance.

      2) Parallelism. Your initial question explicitly said you want to use all 4 cores on a Xeon, but I've only seen 1 response so far that addresses this issue. To get good performance out of multiple cores you may need to re-work your algorithms to split the problem into pieces and crunch them down in parallel. Is your problem one that is easily amenable to parallelization? If yes, then you probably want to start thinking about multi-thread or multi-process programming. If your program will never run on something bigger than 1 server, then you will probably be OK sticking with a single multi-threaded process. I don't have experience in this myself, but I've heard that writing your program in a functional language like Haskell will make it intrinsically easy to parallelize. If you ever think your program is going to run on something bigger than that Xeon server - let's say you're thinking of ramping up to a cluster - then I would suggest building it on top of MPI from the beginning. I've had good results getting something up and running on MPI quickly using a combination of python, NumPy, SciPy and mpi4py.
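
      As a rough sketch of the "split the problem into pieces" idea on a single 4-core box, here is one way it could look with Python's standard library. The prime-counting kernel and the function names are placeholders I made up for illustration, not anything from the original question; it assumes the work divides into independent chunks.

```python
import math
from concurrent.futures import ProcessPoolExecutor


def count_primes(bounds):
    """Count primes in the half-open range [lo, hi) by trial division."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, math.isqrt(n) + 1)):
            count += 1
    return count


def split_range(lo, hi, parts):
    """Split [lo, hi) into `parts` contiguous chunks of near-equal size."""
    step, extra = divmod(hi - lo, parts)
    chunks, start = [], lo
    for i in range(parts):
        end = start + step + (1 if i < extra else 0)
        chunks.append((start, end))
        start = end
    return chunks


def count_primes_parallel(lo, hi, workers=4):
    """Farm the chunks out to `workers` processes and add up the results."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_primes, split_range(lo, hi, workers)))
```

      Call count_primes_parallel(0, 2_000_000) from under an `if __name__ == "__main__":` guard; on Windows the worker processes re-import the script, so the guard is required there. Processes rather than threads are used because CPython threads do not run CPU-bound Python code on several cores at once.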

      Good Luck.

      --
      friends don't let friends teleport drunk
    19. Re:Python by Just+Some+Guy · · Score: 5, Funny

      I wrote some Perl that looked like the output of AES once.

      --
      Dewey, what part of this looks like authorities should be involved?
    20. Re:Python by techno-vampire · · Score: 2

      I agree with you that doing the number crunching is best in a language designed for that but I don't think C is the answer because it was primarily designed for systems programming, not numeric. If you really need efficient number crunching, go with FORTRAN, especially as the OP says that he already has experience with it.

      --
      Good, inexpensive web hosting
    21. Re:Python by rwa2 · · Score: 2

      Yep. High level languages such as python are great for letting you focus on the domain-specific task you want to accomplish without spending years learning all the little poorly-documented compiler-specific idiosyncrasies of compilers and preprocessors and template languages. Once you're through the prototyping phase and have your interface definitions and unit tests set up, you can then toss things one module at a time over to one of those software weenies to turn into hand-optimized production code. And they'll probably be happier since they don't have to tax their communications skills talking to project managers while trying to figure out what's going on from a nebulous requirements definition document.

  2. Fortran by Anonymous Coward · · Score: 2, Insightful

    sorry to say, but that is a fact

    1. Re:FORTRAN by Anonymous Coward · · Score: 2, Insightful

      Yeah, sure.

      So that no one can ever check your models or replicate your results even if you publish code and initial data.

    2. Re:Fortran by shutdown+-p+now · · Score: 2

      It depends on what exactly his computationally intensive part is. It may be something that can be trivially implemented in Python in terms of standard numpy operations, for example, with performance that's "good enough".

    3. Re:FORTRAN by Frosty+Piss · · Score: 5, Interesting

      Clearly you are not involved in serious science.

      And if you think FORTRAN is some ancient esoteric language, you're ignorant as well. The most recent standard, ISO/IEC 1539-1:2010, informally known as Fortran 2008, was approved in September 2010.

      Fortran is, for better or worse, the only major language out there specifically designed for scientific numerical computing. Its array handling is nice, with succinct array operations on both whole arrays and on slices, comparable with matlab or numpy but super fast. The language is carefully designed to make it very difficult to accidentally write slow code -- pointers are restricted in such a way that it's immediately obvious if there might be aliasing, as the standard example -- and so the optimizer can go to town on your code. Current incarnations have things like coarray fortran, and do concurrent and forall built into the language, allowing distributed memory and shared memory parallelism, and vectorization.

      The downsides of Fortran are mainly the flip side of one of the upsides mentioned; Fortran has a huge long history. Upside: tonnes of great libraries. Downsides: tonnes of historical baggage.

      If you have to do a lot of number crunching, Fortran remains one of the top choices, which is why many of the most sophisticated simulation codes run at supercomputing centres around the world are written in it. But of course it would be a terrible, terrible, language to write a web browser in. To each task its tool.

      --
      If you want news from today, you have to come back tomorrow.
    4. Re:FORTRAN by Obfuscant · · Score: 4, Informative

      Upside: tonnes of great libraries.

      Those great libraries are spread across several different "FORTRAN"s. gfortran. gfortran44. Intel's fortran. f77. f90. PGI pgif90. etc. etc etc.

      Gfortran is woooonderful. It allows complete programming idiots to write functional code, since the libraries all do wonderful input error checking. Want to extract a substring from the 1 to -1 character location? gfortran will let you do it. Quite happily. Not a whimper.

      PGI pgif90 will not. PGI writes compilers that are intended to do things fast. Input error checking takes time. If you want the 1 to -1 substring, your program crashes. PGI assumes you know not to do something that stupid, and it forces you to write code that doesn't take shortcuts.

      So, if you get a program from someone else that runs perfectly for them, and you want to use it for serious work and get it done in a reasonable amount of time so you compile it with pgif90, you may find it crashes for no obvious reason. And then you have to debug seriously stupidly written code wondering how it could ever have worked correctly, until you find that it really shouldn't have worked at all. They want to extract every character in an input line up to the '=', and they never check to see if there wasn't an '=' to start with. 'index' returns zero, and they happily try to extract from 1 to index-1. Memcpy loves that.
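
      The same class of bug is easy to reproduce outside Fortran. Below is a Python analogue (my own illustration, not the code described above): the unchecked version quietly returns the wrong thing when the '=' is missing, while the checked version complains.

```python
def key_unchecked(line):
    """Take everything before '=' without checking that '=' exists."""
    idx = line.find("=")   # returns -1 when '=' is absent
    return line[:idx]      # line[:-1] silently drops the last character


def key_checked(line):
    """Same extraction, but a missing '=' is reported instead of ignored."""
    key, sep, _value = line.partition("=")
    if not sep:
        raise ValueError(f"no '=' in input line: {line!r}")
    return key
```

      With the input "nosep", key_unchecked returns "nose" and carries on, which is the kind of result that runs "perfectly" until someone looks closely at the output.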

      The other issue is what is an intrinsic function and what isn't. I've been bitten by THAT one, too.

      And someone I work with was wondering why code that used to run fine after being compiled with a certain compiler was now segment faulting when compiled with the same compiler, same data. Switching to the Intel compiler fixed it.

      Sigh. But yes, FORTRAN is a de-facto standard language for modeling earth sciences, even if nobody can write it properly.

    5. Re:FORTRAN by shutdown+-p+now · · Score: 2

      It's mainly due to more constraints that Fortran places on data structures, e.g. lack of aliasing, that let the optimizer do a better job.
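
      To see why aliasing matters to an optimizer, here is a small Python illustration (an analogy only; Python itself does not optimize this loop). The function name is made up. The loop gives different answers depending on whether dst and src are the same list, so a compiler that cannot rule out overlap has to run the iterations strictly in order instead of vectorizing them.

```python
def shift_add(dst, src):
    """Write dst[i] = src[i] + src[i - 1] for every i >= 1."""
    for i in range(1, len(src)):
        dst[i] = src[i] + src[i - 1]


# Distinct arrays: every iteration reads only original values of src.
a = [1, 2, 3, 4]
out = [0, 0, 0, 0]
shift_add(out, a)      # out is now [0, 3, 5, 7]

# Aliased arrays: each write changes what the next iteration reads.
b = [1, 2, 3, 4]
shift_add(b, b)        # b is now [1, 3, 6, 10]
```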

    6. Re:FORTRAN by Anubis+IV · · Score: 2

      Not really. My first job while still green and fresh out of high school was an internship with Lockheed Martin, working on hundreds of thousands of lines of meteorological software code that was used by NASA and was written in FORTRAN. I went in without ever having seen it before in my life, and was able to pick it up easily enough so that I was productive within a couple of weeks. I recall that having the first few columns of each line reserved for special uses threw me off the first time I saw it, as did parsing data, but I got used to it easily enough. I later did that same internship the next summer, took a class on FORTRAN during my time at university, and later, while in grad school, ended up as the Teaching Assistant on that class the very last semester it was ever offered at my university.

      So, I think it's fair to say that I've been exposed to it more than most people in the under-30 crowd, though I've never been at a point where I'd consider it my primary language or my go-to language when I want to get something done. Even so, for what it's designed to do, it's hard to compete with it. I haven't had a reason to use it in at least five years, but were I involved in scientific computing that relied on number-crunching, I'd certainly consider it seriously. To not do so would be foolish, I think.

    7. Re:Fortran by ebno-10db · · Score: 2

      if you don't care about having your code be maintained or extended by anyone under age 30

      1. There are plenty of programmers over age 30.
      2. Someone who is 30 today, likely finished his BSc in 2005. Do you think Fortran was much more popular then?
      3. People under age 30 learn Fortran if they're involved in HPC. It's still widely used, and has advantages over C/C++ (easy, built-in parallelization, etc.).

      don't plan on doing any custom visualization beyond GNUplot

      There are lots of other programs you can use besides GNUplot. In serious HPC graphics are often considered a back end that runs separately from the main program, and sometimes on a different machine.

      don't care if you ever find employment outside of academia

      1. You don't know what his major is - he may care less about putting the programming language du jour on his resume. In fact he specifically said "I haven't done ANY programming in about 12 years ... I am not a CS major".
      2. If you do HPC, Fortran could be a very useful thing to put on your resume. All the more because it's obscure these days.

      Don't think that whatever kind of code you write is the be all and end all of programming.

    8. Re:Fortran by The_Wilschon · · Score: 2

      This is exactly the right answer. Never write code that someone else has already written. If you can compose standard operations to do your calculations, then do so in a high-level language. Spend more time thinking and less time coding. OTOH, if you need to code up something custom and you're REALLY sure that you can't use standard operations to do it, then think again about whether or not you can do it with standard operations. You probably can. But, if you can't, then go with FORTRAN. Or maybe C or even C++. But probably FORTRAN. But even then, code as little as you can in FORTRAN. Don't write the whole thing in FORTRAN. Create small operations, and compose them in a high-level language as if they were the standard operations.

      --
      SIGSEGV caught, terminating

      wait... not that kind of sig.
  3. English by Anonymous Coward · · Score: 4, Funny

    Obviously.

  4. FORTRAN by Frosty+Piss · · Score: 2, Insightful

    Seriously consider FORTRAN

    --
    If you want news from today, you have to come back tomorrow.
  5. More details? by schneidafunk · · Score: 3, Informative

    Depending on your needs, R may be your best bet if it is statistical processing you are interested in.

    --
    Some people die at 25 and aren't buried until 75. -Benjamin Franklin
    1. Re:More details? by Bovius · · Score: 4, Informative

      Second this. There are numerous languages out there that are tailor-made for specific kinds of problems. You didn't quite share enough to narrow down what kinds of problems you need to solve, but the R project is geared toward number crunching, albeit with a significant bent toward statistics and graphic display.

      http://www.r-project.org/

      If that's not pointed in the right direction, some other language might be. Alternatively, there are a lot of libraries out there for the more popular languages that could help with what you're doing. Heck, 12 years ago we didn't even have the boost libraries for C++. It's difficult for me to imagine using that language without them now.

    2. Re: More details? by jonnyj · · Score: 2

      R is by far the best solution that I've found for statistical analysis and data mining. It's ugly, inconsistent, quirky and old fashioned but it's absolutely brilliant.

      The whole syntax of R is based around processing data sets without ever needing to worry about loops. Read up on data tables - not data frames - in R and you'll learn how to filter data, aggregate it, add columns, perform a regression and beautifully plot the results all in one line of code. The Zoo package will sort out your time series analysis and longitudinal analysis. With R, you can calculate the statistical significance of your hypotheses and apply the model you've developed to your hold-out sample using built-in functions. And the concept of workspaces means that you don't need to think of funky ways to store your interim results.

      Using knitr, R will produce publication quality documents and presentations. ggplot will give you the best data visualisation tools in the business.

      R is the tool that has been purpose-built for the task in front of you. Anything else might be easier to learn or more widely supported - but it won't be as effective.

  6. What are you doing? by RichMan · · Score: 3, Informative

    What do you mean by scientific computing?

    Modelling: Hard core finite element simulations or the like. Then C or Fortran and you will be linking with the math libraries.
    Log Processing: For a lot of other stuff you will be parsing data logs and doing statistics. So perl or python, then Octave.
    Data Mining: Python or other SQL front end.

    1. Re:What are you doing? by UnknowingFool · · Score: 3, Informative

      Well if your problems require statistical computing, R is the language to use. For general scientific computing, the last I checked Octave was still valid. As for multi-core processing, only a few languages and compilers support platforms like OpenMP: Fortran, C, and C++.

      --
      Well, there's spam egg sausage and spam, that's not got much spam in it.
    2. Re:What are you doing? by TheCarp · · Score: 2

      It sounds like you are saying a more specific version of what I was going to post.

      A little research goes a long way and libraries may be more important than language. I don't care how nice the language is.... the less underlying mechanisms I need to implement, and the faster I can get into the meat of what I am working on, the better.

      If you want to do RSA encryption in your code (for example) your best bet is NOT to pick a language where you can't find an RSA implementation (Applesoft basic? lol not sure what that would be these days) and implement your own.

      Sure it's not too bad, but any mistakes could sink you, and it means debugging and supporting yet more code....when you could be using a standard library that lots of other people use and has already had most of the kinks worked out, and gets updated on its own.

      Base languages are all exceedingly similar when you strip away the syntactical sugar. It's the varying quality of the different sections of their libraries that really set them apart in different areas.

      --
      "I opened my eyes, and everything went dark again"
    3. Re:What are you doing? by shutdown+-p+now · · Score: 2

      Well if your problems require statistical computing, R is the language to use.

      A lot of people seem to be pretty happy with Python+pandas lately.

      (and the advantage of going the Python way is that it's also a general purpose language that's useful elsewhere)

  7. IPython Notebook + Python Data Analysis Library by rla3rd · · Score: 3, Informative

    Install these 2 and you'll be good to go
    http://ipython.org/notebook.html
    http://pandas.pydata.org/

  8. what the rest of your team uses by peter303 · · Score: 4, Insightful

    You should all be sharing your codes to avoid rewriting and to perfect it.
    And if you are not a member of a team then I seriously question the quality of your graduate program.

  9. BAD TIM! BAD! by girlintraining · · Score: 5, Funny

    What language suggestions or tips can you give me?"

    Timothy, shame on you. You should know better than to start a holy war.

    --
    #fuckbeta #iamslashdot #dicemustdie
  10. 2 paths by johnjaydk · · Score: 3, Informative

    If you can find anything that resembles a math library with the correct tools then go with Python. Numpy is everyone's friend here.

    If you have to do the whole thing from scratch then Fortran is the fastest platform. I can't say I've met anyone who enjoyed Fortran but it's wicked fast.

    --
    TCAP-Abort
  11. Java Java! by Latent+Heat · · Score: 3, Interesting

    For research engineering, I use Java to run the numerical examples of the algorithms I develop although most of the authors in the journals I publish in are using Matlab for this purpose (ewwwwww!). Long time ago I was a Turbo Pascal person as were engineering colleagues who crossed over to Matlab seeking the same kind of ease-of-use. Me, I transitioned to Delphi but now I am with Java and Eclipse -- the Turbo Pascal of the 21st century.

    For numeric-intensive work, I can get within 20% of the speed of C++ using the usual techniques -- minimize garbage collection by allocating variables once, use the "server" VM, perform "warmup" iterations in benchmark code to stabilize the JIT. I use the Eclipse IDE, copy and paste numeric results from the Console View into a spreadsheet program, and voila, instant journal article tables.
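
    The warm-up-then-measure routine is not Java-specific. A minimal sketch of the same harness in Python follows; the function names and the default counts are arbitrary choices for illustration. The untimed warm-up runs matter most on a JIT runtime such as the JVM or PyPy, but discarding the first few runs is harmless elsewhere.

```python
import time


def benchmark(func, *args, warmup=3, repeats=5):
    """Return the best wall-clock time in seconds over `repeats` timed runs."""
    for _ in range(warmup):          # untimed runs: let caches (or a JIT) settle
        func(*args)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)


def sum_of_squares(n):
    """Stand-in numeric kernel used only to demonstrate the harness."""
    return sum(i * i for i in range(n))
```

    Something like benchmark(sum_of_squares, 1_000_000) then gives one number per kernel to paste into a table. Taking the minimum rather than the mean is a judgment call; it reduces the influence of other activity on the machine.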

    1. Re:Java Java! by Atzanteol · · Score: 4, Funny

      I tried out those benchmarks myself.

      Java:
      $ time java nbody 50000000
      -0.169075164
      -0.169059907

      real 0m8.863s
      user 0m8.820s
      sys 0m0.016s

      Not too shabby. But checkout the C++ times!
      $ time ./nbody.gpp-7.gpp_run
      Segmentation fault (core dumped)

      real 0m0.097s
      user 0m0.000s
      sys 0m0.000s

      OMG that's a ton faster!

      --
      "Ignorance more frequently begets confidence than does knowledge"

      - Charles Darwin
  12. Python, or ... by Kiliani · · Score: 2

    First suggestion: Python. Lots of nice stuff for science (NumPy, SciPy), lots of other goodies, easy to learn, many people to ask or places to get help from. Plus you can explore data interactively ("Yes Wednesday, play with your data!").

    Beyond that: CERN uses a lot of Java (sorry folks, true), they have good (and fast) tools. I am doing a project right now where I am using Jython since it is supported by the main (Java) software I have to use. I like jhepwork/SCaVis quite a bit, if you are into plotting stuff on Java.

    If you have extra free time and want to learn how to program well? I'd learn something like Smalltalk (for OOP concepts) and/or Haskell (functional programming). Scientists are often lousy programmers because they often do not learn programming properly, and/or the language allows them to get away with bad programming (I know, every language allows bad programmers to write bad code, but some make it easier than others).

    So, stick with Python, it works really well, is modern, and has good support. Plus you can read your code in 5 years time ...

    What do I program in? Python (and Jython), Perl, C, IDL (yikes!), Smalltalk, Matlab, Mathematica. I know some Lisp, but that's just for fun. And whatever allows me to load sketches on an Arduino. I like Python (gets stuff done) and Smalltalk (works actually like I think - passing messages between objects).

    Use whatever works and you don't hate :-)

    --
    Do your own thing. And overdo it!
  13. R-language by biodata · · Score: 4, Informative

    Most of the cutting edge data mining I've seen is done using R (which acts as a scripting wrapper for the C or Fortran code that the fast analysis libraries are coded in), or alternatively in python. Some people swear by MatLab if they have trained in it (so your octave would come in handy there). Have a look at some discussions at places like kaggle.com to see what the competitive machine learning community uses (if that is what you mean by data mining).

    --
    Korma: Good
    1. Re:R-language by green+is+the+enemy · · Score: 2, Insightful

      This is the correct advice: Use whatever language is most common in your research area, so you can benefit from the most existing source code. This will almost certainly be a high-level scripting language like R, MATLAB or Python, with the ability to drop down to C, FORTRAN and CUDA for the small parts of the code that need optimization. (In my case: electrical engineering = MATLAB + C and CUDA mex files)

  14. Profile by Arker · · Score: 5, Insightful

    A lot of people will propose a language because it is their favorite. Others because they believe it is very easy to learn. I will give you a third line of thought.

    I would not look for a language in this case, I would look for a library, then teach myself whatever language is easiest/quickest to access it. I would try to profile what you are building, figure out where the bottlenecks are likely to be (profiling your existing mockup can help here but don't trust it entirely) and try to find the best stable well-designed high performance library for that particular type of code.
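
    For what the profiling step could look like once a prototype exists in Python, here is a minimal sketch using the standard cProfile and pstats modules. The quadratic kernel is a made-up stand-in for the real workload, and profile_report is a helper name invented for this example.

```python
import cProfile
import io
import pstats


def pairwise_distance_sum(points):
    """Deliberately quadratic kernel standing in for the real workload."""
    total = 0.0
    for x1, y1 in points:
        for x2, y2 in points:
            total += ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
    return total


def profile_report(func, *args, top=5):
    """Run func(*args) under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(top)
    return result, out.getvalue()
```

    Printing the report for a realistic input size shows which functions dominate the cumulative time, and those are the ones worth replacing with a tuned library call.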

    --
    =-=-=-=-=-=-=-=-=-=-=-=-=-=-
    Friends don't let friends enable ecmascript.
  15. Speed incarnate by Impy+the+Impiuos+Imp · · Score: 2

    If you're using VBA in Excel, you can speed it up a ton by putting this at the beginning of your function:

    Application.Calculation = xlCalculationManual

    And restore it with ...Automatic at the end.

    Do this at the top level with a wrapper function whose only purpose is to disable and enable that, calling the real function in between.

    If you want a real speedup, I am available for part time work in C or C++.

    --
    (-1: Post disagrees with my already-settled worldview) is not a valid mod option.
  16. My favorite is CnH2n+1OH by nanospook · · Score: 3, Funny

    It takes all the work out of the computations.

    --
    Have you fscked your local propeller head today?
  17. Fortran + Python = F2PY by n1ywb · · Score: 4, Informative

    Better yet, Fortran + Python.

    http://docs.scipy.org/doc/numpy/user/c-info.python-as-glue.html#f2py

    I used it to wrap some crazy magnetometer processing code written in Fortran into a nice Python program. I ripped out all the I/O from the Fortran code and moved it into the Python layer. It worked great. Fortran is AWESOME at number crunching but SUCKS ASS at IO or well pretty much anything else, hence Python.

    --
    -73, de n1ywb
    www.n1ywb.com
    1. Re:Fortran + Python = F2PY by mvdw · · Score: 2

      OTOH, exactly how many hands do you have??

  18. Python, numpy, Pyvot by shutdown+-p+now · · Score: 4, Informative

    Since you mention VBA, I suspect that your data is in Excel spreadsheets? If you want to try to speed this up with minimum effort, then consider using Python with Pyvot to access the data, and then numpy/scipy/pandas to do whatever processing you need. This should give you a significant perf boost without the need to significantly rearchitect everything or change your workflow much.

    In addition, using Python this way gives you the ability to use IPython to work with your data in interactive mode - it's kinda like a scientific Python REPL, with graphing etc.

    If you want an IDE that can connect all these together, try Python Tools for Visual Studio. This will give you a good general IDE experience (editing with code completion, debugging, profiling etc), and also comes with an integrated IPython console. This way you can write your code in the full-fledged code editor, and then quickly send select pieces of it to the REPL for evaluation, to test it as you write it.

    (Full disclosure: I am a developer on the PTVS team)

  19. matlab by smadasam · · Score: 3, Informative

    FORTRAN used to be it back in the day, but nowadays Matlab is what many engineers use for scientific computing. Many of the math libraries are very good in Matlab and don't require you to be a computer scientist to make them run fast. I used to work with scientists in my old lab to port their Matlab code to FORTRAN or C so it could run on HPC clusters. Often the Matlab libraries smoked the BLAS/ATLAS packages that you find on Linux/UNIX machines, for instance. The same would hold true for Octave, since it just builds on the standard GNU math packages like BLAS.

    1. Re:MATLAB by burdickjp · · Score: 2

      It's also proprietary and not free under any definition, meaning you're making your development dependent on the availability of MATLAB as a resource. Learning a free language, such as Python, would free you from the cost and availability constraints, and mean you are learning a more general, and thus more useful, language.

    2. Re:Matlab by theEnguneer · · Score: 2

      Matlab is great for testing out ideas, but it is slow compared to Python or C. Also, for doing Data Mining, Matlab is a poor choice because the whole point of Matlab is to make it so the user doesn't have to worry about variable/data storage, which is the thing that data miners need to optimize.

  20. C/C++ by ericcc65 · · Score: 3, Interesting

    I'm a MSEE and I've been working in the digital signal processing realm for the last 10 years since graduating. I should mention that I haven't done a lot of low level hardware work, I haven't programmed actual DSP cards or played with CUDA. I have written software that did real-time signal processing just on a GPU. Everyone in my industry at this point uses C or C++. There is some legacy FORTRAN, and I shudder when I have to read it. Some old types swear by it, but it's fallen out of favor mostly just because it's antiquated and most people know C/C++ and libraries are available for it.

    For non-real-time prototypes I'd recommend learning python (scipy, numpy, matplotlib). Perhaps octave and/or Matlab would be useful as well.

    At some point you have to decide what your strength will be. I love learning about CS and try to improve my coding skills, but it's just not my strength. I'm hired because of my DSP knowledge, and I need to be able to program well enough to translate algorithms to programs. If you really want to squeeze out performance then you'll probably want to learn CUDA, assembly, AVX/SSE, and DSP specific C programming. But I haven't delved to that level because, honestly, we have a somewhat different set of people at the company that are really good in those realms.

    Of course, it would be great if I could know everything. But at the moment it's been good enough to know C/C++ for most of our real time signal processing. If something is taking a really long time, we might look at implementing a vectorized version. I would like to learn CUDA for when I get a platform that has GPUs but part of me wonders if it's worth it. The reason C/C++ has been enough so far is that compilers are getting so good that you really have to know what you're doing in assembly to beat them. Casual assembly knowledge probably won't help. I might be wrong, but I envision that being the case in the not too distant future with GPUs and parallel programming.

  21. Quick suggestion... by MiniMike · · Score: 2

    Do you have access to MATLAB or a similar analysis tool? Many universities have licenses, and overall it seems like it might be a good choice for you. These programs usually have a lot of built-in functionality that will be difficult to reproduce if you are not an experienced scientific programmer.

    I haven't done ANY programming in about 12 years, so it would almost be like starting from scratch.

    This is probably a bigger problem than choosing which language to use. If you don't know how to program properly and efficiently, it doesn't matter which language you choose. If you go this route I'd suggest taking a course to refresh or upgrade your skills. Since you're familiar with C that might be a good language to focus on in the course. Another factor is if you have to work with any existing libraries it might limit your choices. I program in C, FORTRAN, and VB and find that for computationally intensive programs C is usually the best fit, sometimes FORTRAN, and never VB.

    1. Re:Quick suggestion... by umafuckit · · Score: 2

      NO.

      No Matlab. Not portable, not open, and it perpetuates a vendor lock-in for quantitative scientists/engineers every bit as bad and destructive as the stranglehold Windows has enjoyed on the desktop for decades.

      I think you're over-stating things a touch. Some of the core stuff is closed source but most of the functions are open, meaning that they are readable .m scripts. e.g. if you're worried about how MATLAB implements ANOVA then you read the file and check. You can modify if needed. So MATLAB is open enough in most normal usage scenarios. You're not really locked in given that we have Octave.

      Python is more readable, more enjoyable to code, has equivalent IDEs available (Spyder), far more user-friendly features, you can use your code literally anywhere you go without worrying about a Matlab license, and the SciPy Stack has reached functional feature parity with Matlab (and is evolving well beyond in certain areas).

      I like Python and I've spent some time learning it recently and ported some of my MATLAB code. Python is not a panacea, however. For starters, there is no equivalent of the excellent MATLAB docs. For a newcomer with no programming experience, the entry barrier is definitely higher. Much more Googling is needed to get stuff to work. The plots produced by Matplotlib are good but don't do everything, e.g. I found animating data was too slow in Matplotlib and I spent ages messing about with pyqtgraph. So to get the most out of it you have to screw about with different plotting packages and that can be very time consuming. In general, the syntax for matrix operations is a lot more elegant and economical in MATLAB than in Python/numpy. I also ran into issues where seemingly equivalent code would run substantially slower in Python than MATLAB. In many cases I was able to resolve the issue and surely to a degree it was due to me being a numpy beginner, but I do feel it's easier to get the most out of MATLAB than the most out of numpy. MATLAB has now become quite smart about helping the user to optimise code. Admittedly this might make the user a less careful programmer: I definitely learned things whilst trying to get Python code to run at the same speed as my original MATLAB code. So the process was useful. I'd like to use Python more in the future, but rabidly hating on MATLAB isn't fair. Finally, they're pretty different languages: Python is a general-purpose language which has been adapted to number crunching, whereas MATLAB was designed for number-crunching from the ground up. When you use these languages, their heritage shows.

  22. Matlab by necro81 · · Score: 2

    If you are working in academia, then you probably have access to Matlab. Matlab, as a language, has both scripting abilities and programming abilities. The scripting was born from Matlab's roots in Unix, which makes it handy for batch processing lots of files. Its programming functions started off as C, but it has since incorporated features from C++, Python, and Java. The programming side of it has, in my opinion, more structure and formalism than Python, but makes certain things like file IO and data visualization (i.e., graphing) easier than straight-up C/C++. The basics of using it can be picked up in an afternoon, and the sky's the limit from there. There are lots of well-written and documented functions built in; specialized toolboxes can be had for additional fees. There's a fair bit of user-generated code out there. Plus, I expect you can find a lot of people around you who know plenty about it.

  23. R, Perl, some C by digitalhermit · · Score: 2

    I run lots of statistical analyses. Most of the code is in R with some wrappers in Perl and some specific libraries in C. The R and Perl code is pretty much all my own. The C is almost entirely open source software with very minor changes to specify different libraries (I'm experimenting with some GPU computing code from NVidia). Most of the people who are doing similar things are using Python with R (or more specifically, the people I know who are doing the same thing are using Python/R).

    An average run with a given data set takes approximately 20 minutes to complete on an 8-core AMD 8160. About 80% of the run is multi-threaded and all cores are pegged. The last bit is constrained mainly by network and disk speed.

    You may consider using something like Java/Hadoop depending on your data and compute requirements. Though my Java code is just a step above the level of a grunting walrus, I've found that the performance is actually not that bad and can be pretty good in some cases.

     

  24. PDL by swm · · Score: 3, Informative

    Perl Data Language
    The power of Perl + the speed of C

    1. Re:PDL by Roger+W+Moore · · Score: 4, Funny

      The power of Perl + the speed of C

      ...and the readability of machine code?

  25. Re:Fortran 90+ with OpenMP or Python by Eunuchswear · · Score: 2

    Fortran 77 is for weenies. Real men program in FORTRAN 66.

    --
    Watch this Heartland Institute video
  26. "Scientific Computing" is over-broad by FellowConspirator · · Score: 2

    The problem with this question is that "scientific computing" is an over-broad term. The truth is that certain languages have found specific niches in different aspects of scientific computing. Bioinformatics, for example, tends to involve R, Python, Java, and Perl (the prominence of each depends largely on the application). Big-data analytics typically involves Java or languages built on Java (Scala, Groovy). Real-time data processing is generally done in Matlab. Pharmacokinetics, some physics, and some computational chemistry are often done in FORTRAN. Instrumentation is generally controlled using C, C++, or VB.NET. Visualization is done in R, D3 (JavaScript), or Matlab. Validated clinical biostatistics are all done in SAS (!).

    Python is a nice, simple-to-learn start and very powerful, and the NumPy package is important to learn for scientific computing. R is the language of choice for many types of statistical and numerical analysis. Those are a good place to start, if incomplete. From there, I'd look at the specific fields of interest and look at what the common applications and code-base are for those.

    With regard to the OS, that's pretty easy: Linux (though OS X is a reasonable substitute). Nearly all scientific computing is done in a UNIX-like environment.

  27. Re:Fortran (plus MPI and some CUDA) by Anonymous Coward · · Score: 2, Insightful

    Fortran, and learn how to implement MPI and CUDA code if your work is parallelizable.

    DO NOT USE CUDA

    Use OpenCL

  28. Also, it is fast by Sycraft-fu · · Score: 2

    In part, this is because Intel has a compiler for it. On commodity hardware (as in desktop, laptop), you will generally get the best performance running an Intel CPU and using an Intel compiler. That means C/C++ or FORTRAN, as they are the only languages for which Intel makes compilers. C++ is easy to see, since so much is written in it but why would they make a FORTRAN compiler? Because as you say, serious science research uses it.

    When you want fast numerical computation on a desktop, FORTRAN is a good choice. We have a few researchers here who use it, and they all use the Intel Fortran Compiler because they want fast computation, but they don't have the money to buy bigass systems for every grad student. What they get out of the IFC and a regular Intel desktop chip is pretty impressive.

    Compilers matter, and Intel makes some damn good ones. So if your research calls for lots of performance on little budget, that can influence language choices. Heck, same thing on supercomputers. That is not my area of expertise, but it isn't as though all compilers for a given supercomputer will be equally good. If I were to bet, I'd say the FORTRAN compilers are some of the better ones.

  29. Matlab will fall, SciPy will rise by steveha · · Score: 2

    If you are working in academia, then you probably have access to Matlab.

    On the other hand, you definitely have access to SciPy, given that it's free.

    I predict that Python with SciPy/NumPy will completely displace Matlab within a few years.

    I say that even though I am working in one industry, digital signal processing, that is really married to Matlab and will be one of the last places to make the switch.

    Because Matlab was purpose-built for scripting with matrices, it has some nice syntactic sugar for that. In every other way, Python as a language is far superior.

    I was able to attend the SciPy conference a couple of years ago, and one thing I heard there: people like that Python works as a universal language. Sysadmins can use Python to do admin tasks; the web site guys can use Python (with Django) to make web sites; the science guys can use SciPy... it's one language that is flexible enough to do anything you might need, and it's much easier to learn than other really flexible languages like Lisp.

    Because Matlab has been around a long time and has man-centuries of work invested in it, it has very complete and well-debugged libraries available for it. SciPy is playing catch-up here. But the basics are already solid, and if SciPy will work for you, you should choose it because it is the future.

    There was a time, not that long ago, when people spent $30 to get a web browser. Now people expect web browsers to be free. I predict in the near future the same thing will happen with Matlab vs. SciPy.

    SciPy has the advantages of being free and open, as well as the advantage of being free as in beer. And Python is just a better language than the Matlab language. Mark my words: Matlab will fall and Python/SciPy will rise.

    --
    lf(1): it's like ls(1) but sorts filenames by extension, tersely
  30. VB is too slow for you? C++ then... by bobbied · · Score: 2

    I suspect that VB is NOT your problem here. But, if you have a VB program that is too slow, then I'm going to suggest you do the following:

    1. Profile your program and see if you can figure out what's taking up all the processing time. It may be possible to change the program you already have slightly and get the performance you need. It would be a shame to go through all the trouble to learn a new language and recode the whole thing if replacing some portion of your code will fix it. Do you have a geometric solution implemented when a non-geometric solution exists?

    2. Consider adding hardware - It's almost ALWAYS cheaper to throw hardware at it than to re-implement something in a language you are learning.

    3. Rewrite your program in VB - This time, looking for ways to make it perform faster (you did profile it, right? You know what is taking all the time, right?) Can you multi-thread it, or adjust your data structures to something more efficient?

    4. Throw hardware at it - I cannot stress this enough, it's almost ALWAYS easier to throw hardware at it, unless you really have a problem with geometric increases in required processing and you are just trying to run bigger data sets.

    5. If 1-4 don't fix it, then I'm guessing you are in serious trouble. If you really do not have a geometric problem, you *MIGHT* be able to learn C/C++ well enough to get an acceptable result if you re-implement your program. C/C++ will run circles around VB when properly implemented, but it can be a challenge to use C/C++ if your data structures are complex.

    6. Throw hardware at it - seriously.

    Unless you really just have a poorly written VB program or you are really doing some geometric algorithm with larger data sets (In which case, you are going to be stuck waiting no matter what you do) getting better hardware may be your only viable option. I would NOT recommend trying to pick up some new language over VB just for performance improvement unless it is simply your only option. If you do decide to switch, use C/C++ but I would consider that a very high risk approach and the very last resort.
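    To make the "geometric solution when a non-geometric solution exists" point concrete, here is a small sketch in Python. It reads "geometric" loosely as "work that grows much faster than the data", which is an assumption about what was meant, and the task (counting pairs of records that share a key) is a hypothetical example, not anything from the original program.

```python
from collections import Counter


def count_matching_pairs_quadratic(keys):
    """Compare every pair of records directly: O(n^2) comparisons."""
    pairs = 0
    for i in range(len(keys)):
        for j in range(i + 1, len(keys)):
            if keys[i] == keys[j]:
                pairs += 1
    return pairs


def count_matching_pairs_linear(keys):
    """Tally each key once, then count the pairs per key: O(n) work."""
    counts = Counter(keys)
    # A key seen c times contributes c * (c - 1) / 2 matching pairs.
    return sum(c * (c - 1) // 2 for c in counts.values())


if __name__ == "__main__":
    sample = ["a", "b", "a", "c", "b", "a"]
    print(count_matching_pairs_quadratic(sample))
    print(count_matching_pairs_linear(sample))
```

    Both functions return the same count, but doubling the input roughly quadruples the work for the first and only doubles it for the second. That kind of change helps in any language, VB included.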

    --
    "File to fit, pound to insert, paint to match" - Aircraft Maintenance 101
  31. C. Obviously. by RandCraw · · Score: 3, Insightful

    You know C. C is simple, as fast as any alternative, it's straightforward to optimize (aside from pointer abuse), and you always know what the compiler/runtime is doing. And threading libraries like pthreads or CUDA are best served via C/C++. Why use anything else?

    Another thought: scientific libraries. If you need external services/algorithms then your chosen language should support the libraries you need. C/C++ are well served by many fast machine learning libs such as FANN, LIBSVM, OpenCV, not to mention CBLAS, LinPACK, etc.

  32. R, Matlab/Octave or Python with Pandas or Numpy/Sc by joeblog · · Score: 2

    My experience with this comes from being a MOOC addict where some of the courses are in Python, others in R, and others in Matlab or its GNU counterpart Octave.

    Of these Python is my favorite since it's the language I'm most familiar with. Furthermore, you can "bolt" R to Python with the Pandas library, and you can "bolt" Matlab/Octave with the Numpy & Scipy libraries.

    A big drawback, however, is speed. The big advantage of domain specific mini-languages over "kitchen sink" languages was brought home to me by writing a Python script to simulate the popular (in statistics courses) Monty Hall problem and the same script in R. While my Python script took several seconds to simulate a couple of thousand Monty Hall game turns, the R script would give the percentage for millions the instant I hit the enter key.
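    For reference, a Monty Hall simulation of the kind described can be sketched in plain Python with only the standard library. This is not the script from the course; the function name, the seed parameter and the shortcut of never materializing the host's door are choices made for this illustration.

```python
import random


def switch_win_rate(trials, seed=0):
    """Estimate how often a player who always switches wins the car."""
    rng = random.Random(seed)  # seeded so repeated runs are reproducible
    wins = 0
    for _ in range(trials):
        car = rng.randrange(3)   # door hiding the car
        pick = rng.randrange(3)  # contestant's first choice
        # The host opens a goat door that is neither the pick nor the car,
        # so switching wins exactly when the first pick was wrong.
        if pick != car:
            wins += 1
    return wins / trials


if __name__ == "__main__":
    print(switch_win_rate(100_000))
```

    The estimate should settle near 2/3 as the number of trials grows. A vectorized NumPy or R version avoids the per-iteration interpreter overhead of this loop, which is the speed gap being described.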

    More complicated problems ended up with weird bugs in R scripts I couldn't figure out, whereas (because of my better familiarity with Python's "mutable list" problems) I tended to get correct -- albeit slower -- answers from my Python programs.

    Re Octave: whereas R has overtaken commercial versions of S, I've written off Octave as a lame "freeware" version of Matlab -- lots of features are missing, the documentation is frustrating (it seems to only be used by universities, so "gurus" on stackoverflow etc automatically assume any question is some student trying to cheat at homework) so I'm not a fan. But if I knew Octave as well as Python, I might like it.

    R, on the other hand, has obvious speed advantages for the problems it's aimed at, and probably a better selection of specialist libraries for statistical problems. But it's full of strange quirks for non-specialists.

    --
    If it works, it's obsolete
  33. Fortran by hooiberg · · Score: 2

    I have worked for almost a decade in scientific computing, and it is Fortran everywhere. Make sure you get up to the new standard. Contemporary Fortran is not the same as Fortran77. Many problems typically associated with Fortran are things of the past. ;) Next is C. Moreover, Fortran is a fairly easy language to learn. Avoid all object oriented stuff. For scientific computing, this is never used, and even shunned to a great degree. Avoid C++ and C# and all that stuff. When you work in SciCom, you will never see that anyway.

  34. Re:Fortran (plus MPI and some CUDA) by GiganticLyingMouth · · Score: 2

    Last I checked (a few years ago) CUDA had better tools and more features than OpenCL. Has this changed much since then? OpenCL didn't even support templates back then...

  35. Re:C. Obviously. by GiganticLyingMouth · · Score: 2

    It should also be noted that as of C++11 threading is part of the C++ standard library (so you usually won't have to use pthreads or any other platform-specific threads directly).