Ask Slashdot: Best Language To Learn For Scientific Computing?
New submitter longhunt writes "I just started my second year of grad school and I am working on a project that involves a computationally intensive data mining problem. I initially coded all of my routines in VBA because it 'was there'. They work, but run way too slow. I need to port to a faster language. I have acquired an older Xeon-based server and would like to be able to make use of all four CPU cores. I can load it with either Windows (XP) or Linux and am relatively comfortable with both. I did a fair amount of C and Octave programming as an undergrad. I also messed around with Fortran77 and several flavors of BASIC. Unfortunately, I haven't done ANY programming in about 12 years, so it would almost be like starting from scratch. I need a language I can pick up in a few weeks so I can get back to my research. I am not a CS major, so I care more about the answer than the code itself. What language suggestions or tips can you give me?"
I have a friend who works for a company that does gene sequencing and other genetic research and, from what he's told me, the whole industry uses mostly python. You probably don't have the hardware resources that they do, but I'd bet you also don't have data sets that are nearly as large as theirs are.
You might also get better results from something less general purpose like Julia, which is designed for number crunching.
"Don't blame me, I voted for Kodos!"
sorry to say, but that is a fact
Obviously.
Why not try tracking down a CS professor and getting paired up with an undergrad student who needs to create a capstone project?
Math
Seriously consider FORTRAN
Have you looked at Matlab? It's commercial and requires a license, but many universities have a site license you can use. It's pretty powerful and faster than VB. It's not as fast as native C/C++, but unless you're running calculations in real time, that probably isn't an issue for you.
..seems pretty self-explaining to me.
http://julialang.org/
>> I initially coded all of my routines in VBA because it 'was there'.
Are you in Access? Or Excel?
If your routines work but are just slow, I'd first look at moving the data to SQL Server and porting your VBA routines to VB.NET.
If you have more time, you may want to learn what the "Hadoop" world is all about.
Depending on your needs, R may be your best bet if it is statistical processing you are interested in.
Java (for quick prototyping), C++ (port from Java code/structure to fine-tune performance).
Check with your potential employers what language(s) was(were) used to build their current applications, and what languages (if any) they will port to.
What do you mean by scientific computing?
Modelling: Hard core finite element simulations or the like. Then C or Fortran and you will be linking with the math libraries.
Log Processing: For a lot of other work you will be parsing data logs and doing statistics. So Perl or Python, then Octave.
Data Mining: Python or other SQL front end.
Install these 2 and you'll be good to go
http://ipython.org/notebook.html
http://pandas.pydata.org/
Try Python. Make sure to use scipy (numpy really), because you don't want to do the heavy lifting in native Python.
http://www.scipy.org/
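To illustrate why the heavy lifting belongs in NumPy rather than native Python, here is a minimal sketch that computes the same sum of squared differences two ways. The function names and the sample data are made up for illustration, and how much faster the NumPy version runs will depend on your data and machine.

```python
import numpy as np


def sum_sq_diff_loop(xs, ys):
    """Pure-Python loop: every element passes through the interpreter."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += (x - y) ** 2
    return total


def sum_sq_diff_numpy(xs, ys):
    """Vectorized: the element-wise loop runs inside NumPy's compiled code."""
    diff = np.asarray(xs, dtype=float) - np.asarray(ys, dtype=float)
    return float(np.dot(diff, diff))


# Illustrative data (hypothetical, not from the original question).
xs = np.arange(100_000, dtype=float)
ys = xs + 0.5

loop_result = sum_sq_diff_loop(xs.tolist(), ys.tolist())
numpy_result = sum_sq_diff_numpy(xs, ys)
```

Both functions return the same number; the difference is where the loop executes. Timing the two calls with the standard `timeit` module on your own data is the quickest way to see whether the rewrite pays off.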
You should all be sharing your code to avoid rewriting it and to perfect it.
And if you are not a member of a team then I seriously question the quality of your graduate program.
What language suggestions or tips can you give me?"
Timothy, shame on you. You should know better than to start a holy war.
But from what I heard, it's still in development. Does someone know how usable it is atm?
'Cause that's probably the answer if you're having "computation performance problems", maybe even C++/OpenCL if you're feeling really brave...
On the other hand, why not just throw more hardware at the problem (or wait a little longer)? By the time you have recoded your VBA in something else, I'm betting the VBA code could have solved the problem running on some decent hardware.
Unless the question is "I wrote my code in VBA and it doesn't scale to a 5k node cluster, what did I do wrong". In that case you aren't really asking the right question.
Fortran, and learn how to implement MPI and CUDA code if your work is parallelizable.
"goodbye and hello, as always" ~Prince Corwin, from Zelazny's Amber series
Perl should handle literally anything you can throw at it.
If you can find anything that resembles a math library with the correct tools then go with Python. Numpy is everyone's friend here.
If you have to do the whole thing from scratch then Fortran is the fastest platform. I can't say I've met anyone who enjoyed Fortran but it's wicked fast.
TCAP-Abort
There's nothing wrong with Fortran or C. There are newer and in some cases more focused languages you may want to check into like R, Matlab, C++, Python, Perl, or Go. I'm not a fan of Java and it's not known for raw performance compared to Fortran or C, but there are probably great libraries for what you need in it.
For numeric-intensive work, I can get within 20% of the speed of C++ using the usual techniques -- minimize garbage collection by allocating variables once, use the "server" VM, perform "warmup" iterations in benchmark code to stabilize the JIT. I use the Eclipse IDE, copy and paste numeric results from the Console View into a spreadsheet program, and voila, instant journal article tables.
I would recommend learning what a programming language is. Especially if you have the time. Personally I spent a lot of time learning languages and not really seeing the abstraction that every programming language adheres to, making learning a new language difficult and time consuming. I can only really describe it as trying to learn a language rather than learning linguistics. All computer languages share common patterns all based on formalism, just like all spoken languages share common patterns. Learning formalism makes picking up new programming languages much easier since you'll not only be able to identify patterns shared between them faster, but pick up the lexicon to communicate well formed questions to other programmers. I'd recommend reading Structure and Interpretation of Computer Programs. There are other books that attempt to replicate what this does, but it really is great and I haven't seen other books get to the point of computer programming faster. It is based on LISP, which most people will never use, but it's deceptively easy to read and understand, so getting through the book for someone that hasn't used LISP before shouldn't be a problem. Good Luck!
First suggestion: Python. Lots of nice stuff for science (NumPy, SciPy), lots of other goodies, easy to learn, many people to ask or places to get help from. Plus you can explore data interactively ("Yes Wednesday, play with your data!").
Beyond that: CERN uses a lot of Java (sorry folks, true), and they have good (and fast) tools. I am doing a project right now where I am using Jython, since it is supported by the main (Java) software I have to use. I like jhepwork/SCaVis quite a bit, if you are into plotting stuff on Java.
If you have extra free time and want to learn how to program well? I'd learn something like Smalltalk (for OOP concepts) and/or Haskell (functional programming). Scientists are often lousy programmers because they often do not learn programming properly, and/or the language allows them to get away with bad programming (I know, every language allows bad programmers to write bad code, but some make it easier than others).
So, stick with Python, it works really well, is modern, and has good support. Plus you can read your code in 5 years time ...
What do I program in? Python (and Jython), Perl, C, IDL (yikes!), Smalltalk, Matlab, Mathematica. I know some Lisp, but that's just for fun. And whatever allows me to load sketches on an Arduino. I like Python (gets stuff done) and Smalltalk (works actually like I think - passing messages between objects).
Use whatever works and you don't hate :-)
Do your own thing. And overdo it!
Most of the cutting edge data mining I've seen is done using R (which acts as a scripting wrapper for the C or Fortran code that the fast analysis libraries are coded in), or alternatively in python. Some people swear by MatLab if they have trained in it (so your octave would come in handy there). Have a look at some discussions at places like kaggle.com to see what the competitive machine learning community uses (if that is what you mean by data mining).
Korma: Good
http://golang.org/ You won't regret it.
Sorry about the mess.
A lot of people will propose a language because it is their favorite. Others because they believe it is very easy to learn. I will give you a third line of thought.
I would not look for a language in this case, I would look for a library, then teach myself whatever language is easiest/quickest to access it. I would try to profile what you are building, figure out where the bottlenecks are likely to be (profiling your existing mockup can help here but don't trust it entirely) and try to find the best stable, well-designed, high-performance library for that particular type of code.
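For anyone who has not profiled before, here is a minimal sketch of what that step can look like with Python's built-in `cProfile` and `pstats` modules. The workload (`load_points`, `slow_distance_sum`) is a made-up stand-in, not the original poster's code; the point is only to show how the report singles out the function where the time goes.

```python
import cProfile
import io
import pstats


def slow_distance_sum(points):
    """Deliberately naive O(n^2) pairwise distance sum (the hot spot)."""
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        for x2, y2 in points[i + 1:]:
            total += ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
    return total


def load_points(n):
    """Stand-in for reading data; cheap compared to the distance sum."""
    return [(float(i), float(i % 7)) for i in range(n)]


def main():
    points = load_points(300)
    return slow_distance_sum(points)


profiler = cProfile.Profile()
profiler.enable()
result = main()
profiler.disable()

# Sort by cumulative time and print the five most expensive entries.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

Once the report shows which function dominates, that function is the one to match against an existing high-performance library.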
If you are doing a computationally intensive data mining problem, have you considered porting to a Hadoop solution? You may need to rewrite your code, or you may be able to use Hadoop to call your current functions. You could use an AWS Hadoop cluster; Amazon often gives free credits to students, it may cost you nothing out of pocket, and help you learn a hot new technology.
Recent versions of Fortran are very advanced, a lot easier to use than Fortran 77 and still extremely fast.
Some new features since 77: structured programming, array programming, modular programming and generic programming (Fortran 90), high performance Fortran (Fortran 95), object-oriented programming (Fortran 2003) and concurrent programming (Fortran 2008).
Free compilers: GFortran and G95
If you're using VBA in Excel, you can speed it up a ton by putting this at the beginning of your function:
Application.Calculation = xlCalculationManual
And restore it with ...Automatic at the end.
Do this at the top level with a wrapper function whose only purpose is to disable and enable that, calling the real function in between.
If you want a real speedup, I am available for part time work in C or C++.
It takes all the work out of the computations.
If you really want to do heavy lifting, you can't beat Fortran. Just stay away from Fortran 77; it's a hot mess. Fortran 90 and later are much easier to use, and they're supported by the main compilers: gfortran and ifort.
ifort is Intel's Fortran compiler. It's the fastest out there, and it runs on Windows and Linux. Furthermore, you can get it as a free download for some types of academic use. (Search around Intel's website -- it's hard to find.) That said, I usually use gfortran -- which is free and open source -- on Linux. See http://www.polyhedron.com/compare0html for a compiler comparison.
If you use Fortran, it's very easy to use OpenMP to do multiprocessing and make use of all those cores. OpenMP is supported by the main compilers.
If you're doing lighter work, SciPy/NumPy works fine; I use it a fair amount if maximum performance isn't essential. However, I can't speak to its multiprocessing ability.
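On the multiprocessing question, one common approach in Python is to split the data into chunks and hand each chunk to a separate process. The sketch below uses the standard library's `concurrent.futures.ProcessPoolExecutor`; the helper names and the sum-of-squares workload are hypothetical placeholders, and it assumes the chunks can be processed independently of each other.

```python
from concurrent.futures import ProcessPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split a list into n_chunks contiguous pieces of near-equal size."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def partial_sum_of_squares(chunk):
    """Work done by one process on its own chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(items, workers=4):
    """Fan the chunks out to `workers` processes and combine the results."""
    chunks = split_into_chunks(list(items), workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum_of_squares, chunks))
```

Call `parallel_sum_of_squares(range(1_000_000))` from under an `if __name__ == "__main__":` guard, which is required on platforms that start worker processes by re-importing the script (Windows, for example). Sending data to the workers has a cost, so this only helps when each chunk carries enough computation to outweigh it.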
For scientific stuff, FORTRAN is still the best. Simple, old things are very often the best things around. C++ is in many ways a regression, especially all the C-style stuff you can find in the average C++ program.
As soon as you need to process massive data sets or run massive simulations, all the Script languages won't cut it any longer, so you either go Fortran or C++. So, again, Fortran.
Before you C++ kids want to tell me something, read up on that Mr Kuck and his optimizers. Fortran optimizers did things about 20 years ago which C++ optimizers still cannot do.
Finally, there are tons of Fortran libraries already available for all kinds of science and engineering problems.
Use KNIME and you can probably do 90% of what you want by dragging and dropping new nodes and joining them up. KNIME does all the complicated memory caching for large filesets for you, and you can write your own Java functions to plug into it if you need something special.
R, MATLAB, SAS, Python, there's a bunch of languages you can use, and a bunch of ways to store the data (RDBMS, NOSQL, Hadoop, etc.). It really comes down to what kind of access to the data you have, how it's presented, what other resources you have available to you, and what you want to do with it.
Better yet, Fortran + Python.
http://docs.scipy.org/doc/numpy/user/c-info.python-as-glue.html#f2py
I used it to wrap some crazy magnetometer processing code written in Fortran into a nice Python program. I ripped out all the I/O from the Fortran code and moved it into the Python layer. It worked great. Fortran is AWESOME at number crunching but SUCKS ASS at IO or well pretty much anything else, hence Python.
-73, de n1ywb
www.n1ywb.com
Well, it depends. You say "computationally intensive data mining problem", but what kind of computations (arithmetic, mathematical, text-based, etc.)?
In general, for flat-out speed, toss interpreted languages (Perl, Python, Java, etc.) out the door. You'll want something that compiles to machine code, esp. if you are running on older hardware. For crunching numbers, complex math, and matrices, Fortran is the beast. If your data is arranged in lists, consider Lisp, then pick something else, as it will likely give you a migraine. The format of your data and what you need to do with it will drive your language choice.
Is finding a partner an option? Seems you should be able to work with someone from CS who needs a coding project...
I work in the industry (all our customers are scientists), and the two languages that seem to be predominant are R and Python. R has lots of cool stuff specifically for advanced number crunching, while Python is more the swiss army knife that can be used to tackle anything. I don't think you can go wrong with either, but Python will probably be more friendly (eg. it has way more books on it than R) and will serve you better in non-scientific enterprises.
Since you mention VBA, I suspect that your data is in Excel spreadsheets? If you want to try to speed this up with minimum effort, then consider using Python with Pyvot to access the data, and then numpy/scipy/pandas to do whatever processing you need. This should give you a significant perf boost without the need to significantly rearchitect everything or change your workflow much.
In addition, using Python this way gives you the ability to use IPython to work with your data in interactive mode - it's kinda like a scientific Python REPL, with graphing etc.
If you want an IDE that can connect all these together, try Python Tools for Visual Studio. This will give you a good general IDE experience (editing with code completion, debugging, profiling etc), and also comes with an integrated IPython console. This way you can write your code in the full-fledged code editor, and then quickly send select pieces of it to the REPL for evaluation, to test it as you write it.
(Full disclosure: I am a developer on the PTVS team)
then you should care about the code, as well. Choice of language can have a lot of consequences for accuracy, and floating-point rounding errors need to be accounted for; these may differ per language and per implementation version of each language.
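A small Python illustration of the rounding point: adding 0.1 ten times with ordinary floating-point addition does not give exactly 1.0, while the standard library's `math.fsum`, which compensates for the lost low-order bits, does. The list of values is just an example.

```python
import math

values = [0.1] * 10

naive_total = sum(values)        # left-to-right float addition
exact_total = math.fsum(values)  # tracks lost low-order bits

print(naive_total == 1.0)
print(exact_total == 1.0)
```

The first comparison prints `False` and the second prints `True`. The same kind of discrepancy can appear when a summation loop is ported from one language or library to another, so results should be compared with a tolerance rather than for exact equality.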
FORTRAN used to be it back in the day, but nowadays Matlab is what many engineers use for scientific computing. Many of the math libraries are very good in Matlab and don't require you to be a computer scientist to make them run fast. I used to work with scientists in my old lab to port their Matlab code to run on HPC clusters, porting it to FORTRAN or C. Often the Matlab libraries smoked the BLAS/ATLAS packages that you find on Linux/UNIX machines, for instance. The same would hold true for Octave since it just builds on the standard GNU math packages like BLAS.
If you want to be able to ask someone for help then it would be best to use the same tools they use. The point is that any programming language will work. Some languages are easier than others but the difference is negligible compared to the advantage of being able to ask your peers for assistance.
...at jsoftware.com .
It's more powerful, concise, and consistent than most languages. However, R and Matlab have larger user communities and this is an important consideration.
There was a note on the J-forum a few months ago from an astronomer who uses J to "...compute photoionization models of planetary nebulae." His code to do this is about 500 lines in about 30 modules and uses some multi-dimensional datasets, including a four-dimensional one of "...2D grids of the collisional cooling by each of 16 ions".
However, the point of his note was that he ported this code to his iPhone - and it works! Consider, too, that porting consists mainly of copying some text and data files - there would be little to no code changes.
you haven't given a lot of info on specifics of what you're trying to do, but i'm assuming something like crunching through tables of data with possible aggregations, filtering and sorting with possibly a few custom calculations based on the raw data. kind of stuff you can do in excel on a small scale.
so my first question is have you looked at excel 2010 or 2013? if not they're much better at bigger data than previous versions. but excel does have its limits....
if you have a budget for commercial software, then something like matlab might work. it is uber fast, can handle multiple cores/64-bit and is extremely well documented on their website with copious examples and documentation. the pace of updates at 2x per year is also good with steady incremental improvements.
if you have no budget then python+matplotlib+ipython+pandas is an excellent combo. it's what i use. free and productive once you learn the ropes. and you spend minimal time on learning a programming environment, etc. if you can do VBA, you can definitely do python. and with pandas it can be quite fast.
as a final thought, if you're really just doing data mining and have the data somewhere like in a database, you might want to consider some of the newer tools like tableau. no/minimal programming required to do some pretty nice analysis and it's dead simple to play around with new ways of looking at the data.
No language will give you a magic speed boost if you do not understand how it processes the numbers and data structures.
My recommendation is probably not what you want to hear: pick a language that you are comfortable with and study it so that you know how to write efficient code with it.
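As a small illustration of that point, here is a sketch in Python where the language stays the same and only the data structure changes. The data and names are invented for the example: the same membership test scans a list element by element but uses a hash lookup on a set.

```python
import timeit

ids = list(range(20_000))
id_list = ids
id_set = set(ids)
queries = list(range(0, 40_000, 40))  # 1000 lookups, half of them misses


def count_hits(container):
    """Count how many queries are present in the container."""
    return sum(1 for q in queries if q in container)


list_seconds = timeit.timeit(lambda: count_hits(id_list), number=3)
set_seconds = timeit.timeit(lambda: count_hits(id_set), number=3)

print(f"list: {list_seconds:.4f}s  set: {set_seconds:.4f}s")
```

Both versions count the same 500 hits, but the set version should finish far sooner. The exact timings depend on the machine, so none are quoted here.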
I'm a MSEE and I've been working in the digital signal processing realm for the last 10 years since graduating. I should mention that I haven't done a lot of low level hardware work, I haven't programmed actual DSP cards or played with CUDA. I have written software that did real-time signal processing just on a GPU. Everyone in my industry at this point uses C or C++. There is some legacy FORTRAN, and I shudder when I have to read it. Some old types swear by it, but it's fallen out of favor mostly just because it's antiquated and most people know C/C++ and libraries are available for it.
For non-real-time prototypes I'd recommend learning python (scipy, numpy, matplotlib). Perhaps octave and/or Matlab would be useful as well.
At some point you have to decide what your strength will be. I love learning about CS and try to improve my coding skills, but it's just not my strength. I'm hired because of my DSP knowledge, and I need to be able to program well enough to translate algorithms to programs. If you really want to squeeze out performance then you'll probably want to learn CUDA, assembly, AVX/SSE, and DSP specific C programming. But I haven't delved to that level because, honestly, we have a somewhat different set of people at the company that are really good in those realms.
Of course, it would be great if I could know everything. But at the moment it's been good enough to know C/C++ for most of our real time signal processing. If something is taking a really long time, we might look at implementing a vectorized version. I would like to learn CUDA for when I get a platform that has GPUs but part of me wonders if it's worth it. The reason C/C++ has been enough so far is that compilers are getting so good that you really have to know what you're doing in assembly to beat them. Casual assembly knowledge probably won't help. I might be wrong, but I envision that being the case in the not too distant future with GPUs and parallel programming.
If you really need fast number crunching and have a highly-parallelizable problem, consider OpenCL (or CUDA/DirectCompute, but those are less generic). There is a bit of a learning curve, but the results are worthwhile for these types of problems.
For powerful expressiveness with large datasets, I like MATLAB/Octave. For universal-ness that's easy to learn and easy for others to understand, go with Python - it's very common for certain types of simulations and models and I can understand why.
Do not even consider using Java - you will regret it.
Do you have access to MATLAB or a similar analysis tool? Many universities have licenses, and overall it seems like it might be a good choice for you. These programs usually have a lot of built-in functionality that will be difficult to reproduce if you are not an experienced scientific programmer.
I haven't done ANY programming in about 12 years, so it would almost be like starting from scratch.
This is probably a bigger problem than choosing which language to use. If you don't know how to program properly and efficiently, it doesn't matter which language you choose. If you go this route I'd suggest taking a course to refresh or upgrade your skills. Since you're familiar with C that might be a good language to focus on in the course. Another factor is if you have to work with any existing libraries it might limit your choices. I program in C, FORTRAN, and VB and find that for computationally intensive programs C is usually the best fit, sometimes FORTRAN, and never VB.
newLISP is small and can easily call most c/c++ libraries, plus Java for graphics. HTML/XML are really just LISP S-expressions for all practical purposes. Throw in a little Unix/bash and you are there.
Personally, I would do it in C unless you have Fortran libraries you want to use, then I'd use Fortran. However, if you have existing VBA code you want to leverage, I'd just use VB.Net, import the core parts of the code and run with it. There's a moderately steep learning curve going from VB6 or VBA to VB.Net; but, it'll be much less effort than learning a new language.
Perl is the second one, but if you actually mean real science, you need to learn R, or at least S, in addition to Perl.
C is a good choice too, as is C++.
We use those in real science.
It depends on what you are willing to deal with.
Python is good if you don't need very heavy array code. I know you can use Python libraries that give you access to good arrays but I think of Python as a scripting language. It's good for a quick prototype as well, but for heavy computation, I would move on to a compiled language.
Fortran 90 or Fortran 2003/08 is what will be closest to the mathematical syntax you'll use. Despite what people may tell you, it is possible to write code that is understandable and reusable in Fortran, it just takes a great deal of understanding when you design the code. Most people have only seen Fortran code that was either hacked together or is so heavily optimized that it has been obfuscated.
C++ is good as well but you'll spend more time figuring out how to express your mathematics and to use the arrays than you might find productive. In my group, we do the computer science parts of our codes in C++, but numeric calculations and heavy-duty array manipulation are done in Fortran.
The thing about taking advantage of the multiple core machine is much deeper than simply choosing a language. There are MPI and OpenMP libraries that are very good for Fortran and C++. However, producing efficient code that is parallelizable requires changing and complicating the algorithm for a well understood and functioning serial code. Writing effective parallel code will take you much more time than picking up a programming language.
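One way to estimate whether the parallel rewrite is worth the effort is Amdahl's law, which bounds the speedup by the fraction of the run that can actually execute in parallel. The sketch below is only a back-of-the-envelope model: the fractions are hypothetical, and it ignores communication and synchronization overhead.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of a run is parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical fractions on a four-core machine like the one in the question.
half_parallel = amdahl_speedup(0.50, 4)
mostly_parallel = amdahl_speedup(0.90, 4)

print(f"50% parallel: {half_parallel:.2f}x, 90% parallel: {mostly_parallel:.2f}x")
```

With half of the work parallelizable, four cores give at most a 1.6x speedup, and even at 90% the bound is about 3.08x, which is why the serial part of the algorithm deserves attention before the parallel part.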
How easily does your problem parallelize? How slow is too slow? Why, exactly, do you "want" to use all four cores?
Has someone solved a variant of your problem before? Since you're doing data mining, the answer is most likely 'yes', in which case unless you're a masochist or have something to prove to yourself, you want to adapt what they've done. Hell, it's quite likely that there's a nice-enough implementation in a standard software package already (R comes to mind). A few hours spent mathematically/conceptually massaging your problem into a canonical form can save you days of programming, and will train you to make useful analogies too. It won't be optimized, but that shouldn't be your concern now. Days or weeks of coding to save a few hours or days is not a smart investment.
It's really easy to get into a trap of focusing on coding, and frankly asking on Slashdot will probably lead you further that way. Sometimes you do need to focus on coding, but it should always be in the mindset of automating something you could (at least in principle) be doing by hand. For a quantitative researcher, programming is, itself, a subroutine, not an end in itself.
If you are working in academia, then you probably have access to Matlab. Matlab, as a language, has both scripting abilities and programming abilities. The scripting was born from Matlab's roots in Unix, which makes it handy for batch processing lots of files. Its programming functions started off as C, but it has since incorporated features from C++, Python, and Java. The programming side of it has, in my opinion, more structure and formalism than Python, but makes certain things like file IO and data visualization (i.e., graphing) easier than straight up C/C++. The basics of using it can be picked up in an afternoon, and the sky's the limit from there. There are lots of well-written and documented functions built in; specialized toolboxes can be had for additional fees. There's a fair bit of user-generated code out there. Plus, I expect you can find a lot of people around you who know plenty about it.
For a free windows compiler, go with MinGW.
On Linux, GCC is the standard compiler. You can also go with LLVM/Clang.
All of these support the C++11 standard which in turn supports multi-threading out of the box.
http://solarianprogrammer.com/2011/12/16/cpp-11-thread-tutorial/
http://en.cppreference.com/w/cpp/thread
http://cpprocks.com/wp-content/uploads/C++-concurrency-cheatsheet.pdf
As for Octave, see here: http://stackoverflow.com/questions/11889118/get-gnu-octave-to-work-with-a-multicore-processor-multithreading
If you want true horse power and are willing to work for it, invest in a compatible AMD or Nvidia card that works with OpenCL and spend some time learning that.
http://opencl.codeplex.com/wikipage?title=OpenCL%20Tutorials%20-%201
http://www.drdobbs.com/parallel/a-gentle-introduction-to-opencl/231002854
http://enja.org/2010/07/13/adventures-in-opencl-part-1-getting-started/
Let me second this one. Mathematica, Maple, Sage, Matlab / Octave... Mathematical languages are so nice for scientific computing because the languages have wonderful built-in functions.
The answer would really depend on the nature of the problem. If you are doing more statistics type processing then R is commonly used in academia. Python might be good in the short and medium term, but you will probably want to get acquainted with C++ if you are serious.
You're falling for one of the most common traps in programming. It doesn't do what I want, so I'm going to start over from scratch. You'll waste a lot of time doing things you've already done and debugged.
So what you should do is keep the VB program. Identify the slow parts, most likely the inner most section of the inner most loop, and convert that to a C or Fortran module. Ideally use a "message passing" interface so the C or Fortran code can be multi-threaded while the VB portion stays more or less as is.
Do just a little bit at a time, so you can see the actual progress. Choose between C or Fortran based on the availability of libraries that make your computation easier.
I worked as a sysadmin for a high energy physics group at the Beckman Center. Day and night, it was Fortran, on big whopping clusters, doing monte carlo simulations.
Though it ~was~ many years ago.
Elsewhere, I worked for a company doing datamining on massive datasets, over a terabyte of data back in 2000, per customer, with multiple customers and daily runs on 1-5 gig subsets. We used C + big math/vector/matrix libs for the processing because nothing else could come close, and Perl or Java for the data management: preprocessing, set creation and munging (like attempting to correct spelling mistakes, parsing date strings into a standard format, normalizing data against a standard metric, applying expert system filters, even actual machine analysis like clustering or shape detection, which to us was still just preprocessing).
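To make the "parsing date strings into a standard format" step concrete, here is a minimal Python sketch using only the standard library. The list of accepted formats and the sample strings are assumptions for illustration, not the formats that company actually handled, and the month-name format relies on an English (or C) locale.

```python
from datetime import datetime

# Formats to try, in order. Hypothetical; adjust to what your data contains.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %b %Y")


def normalize_date(text):
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if unparseable."""
    cleaned = text.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    return None


raw = ["2000-03-15", "03/15/2000", "15 Mar 2000", "not a date"]
normalized = [normalize_date(r) for r in raw]
```

Returning `None` for unparseable input keeps bad records visible so they can be counted and inspected instead of being silently dropped.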
Don't use a programming language. Use a tool like Matlab or Mathematica instead. These tools are well designed for scientific computing and have sufficient scripting built in to support the programming-language-like functionality you're probably looking for.
You won't be able to call yourself a programmer. But you're not a programmer, you're a scientist.
Stick to 'functional' flavored languages, especially ones that are geared toward making concurrent code easy to compose and that feature immutability of variables as a default, as that's particularly important when you've got a lot of working parts running in parallel. We've been pulling away from the higher-level languages reimplementing most of Lisp poorly toward languages that more and more resemble the ML family with each year that goes past. Haskell is the current darling of that group, though I'd not suggest a lazily evaluated language to a beginner as it's particularly difficult to reason about compared to the more typical eager evaluation. Still, you can try taking a crack at Learn You a Haskell if you want to see a brief glimpse at one of the most interesting languages floating around at the moment, especially since there are ample resources available for study compared to say, Standard ML, my personal favorite.
I would keep my eyes out on Rust in particular, which has taken a few of the better ideas from ML (pattern matching, type inference) and has tried to graft it to a more pragmatic set of trade-offs, such as pushing garbage collection from a required performance hindrance to a per-task elective option and has focused on strong interoperability with C and its calling conventions. It also was a day-one design decision to focus on parallelism and concurrency, which is part of why I cannot directly suggest OCaml at the moment, as that is its biggest weakness in otherwise a very robust and well-established member of the ML family.
I would avoid Go for myriad reasons, unless you want to cover a third of your source-code with explicit error handling or just silently discard them, suffer through the lack of generics (not everything has to be shit with generics like Java and C++, guys, c'mon) or just really enjoy null errors and weird versioning issues that crop up during project development because the Go build system is a bit too simplistic and their solution is to just pull all your dependencies into your own tree to avoid the issue.
I run lots of statistical analyses. Most of the code is in R with some wrappers in Perl and some specific libraries in C. The R and Perl code is pretty much all my own. The C is almost entirely open source software with very minor changes to specify different libraries (I'm experimenting with some GPU computing code from NVidia). Most of the people who are doing similar things are using Python with R (or more specifically, the people I know who are doing the same thing are using Python/R).
An average run with a given data set takes approximately 20 minutes to complete on an 8-core AMD 8160. About 80% of the run is multi-threaded and all cores are pegged. The last bit is constrained mainly by network and disk speed.
You may consider using something like Java/Hadoop depending on your data and compute requirements. Though my Java code is just a step above the level of a grunting walrus, I've found that the performance is actually not that bad and can be pretty good in some cases.
A programming language is just the start, and it can be mastered in a relatively short time (weeks). When you've completed this, start looking for third party packages.
Most of the building blocks have been created and are available on the web. Your task is to use them to build your application.
For example: people say Python, but only because that has numpy and scipy frameworks that enable you to do a lot of numerical stuff.
As others have pointed out, if you are going to do serious research,
you don't want to mess around with the latest and greatest.
Stick with something that has been around and works.
COBOL is that language!
Ha! Made you smile :P
Seriously tho, use C or FORTRAN. Nothing beats them for optimization and speed.
Cheers!
it's easy to learn, it's fast, it's suitable for almost any task because there are many ready to use libraries out there. thus python. btw I learned python in three hours! see github.com/masikh for my work.
Not only is C# easy to learn, and easy to both read and write, it also runs at a fairly high speed when it is compiled. To make use of multiple CPU cores, C# has a neat feature named Parallel.For. If your algorithm scans across a 2D data array using a for loop, Parallel.For will partition the loop's iteration range across worker threads, so that different CPU cores process different parts of the array, resulting in a much faster overall computation speed. I develop algorithms in C# and highly recommend it if you want a) a nice, readable code syntax and b) fast execution speed. I hope this helps...
Why did the chicken cross the road? Because Elon Musk put an AI chip in its head.
If your data is arranged in lists, consider lisp,
Oh please! It's not like Lisp doesn't have any other data structure, is it? You can have your multidimensional numerical arrays in CL quite easily. (I'm saying neither "use CL" nor "don't use CL", merely that your argument is pretty weak. It's easier to learn to work with lists in the language you already know (unless it's COBOL!) than to learn an entirely different one just because of lists.)
Ezekiel 23:20
The correct answer is that it depends on your algorithm, and what bits of it you need to modify. If it's something that can mostly be coded in terms of existing algorithms, then find a library that implements those, learn the appropriate language enough to modify it, and carry on. The best thing to do may be to ask other people in your group/research field to see what tools they use.
If you're looking for existing libraries for data-mining, I suspect the answer is unlikely to be Fortran, even though that's probably the best language for scientific computing in general, followed closely by C++ (which is *very* hard to learn). The answer may be Python or even MatLab, if it has the appropriate data-mining tools available.
To use all four cores, if you're very lucky, your data mining might be easy to parallelise, in that you can give separate pieces of the data to each core and use them that way. Otherwise, unless you're using an existing library, you will not be able to write a parallel code in a few weeks.
However, you said "I need a language I can pick up in a few weeks so I can get back to my research. I am not a CS major, so I care more about the answer than the code itself."
You don't say what your course/subject is. If it's anything that requires computing a lot, then I'm sorry to have to say this, but programming is part of your research. If you're experimenting with algorithms and modifying them, you're in the realms of scientific computing, and you will have to bite the bullet and learn a decent programming language (Fortran or C++ probably), and you will end up using it. On the other hand, the effort should pay off, since you will become more productive, be able to experiment with new algorithms more easily, etc.
Also, most computational scientists are not CS majors; they are probably mathematicians, physicists, or something similar initially, who realise they need to learn to program, and have the mindset and ability to do so. You should care about the code just as much as the answer. What happens when your paper is submitted to a journal and you need to make corrections? You will need to dig out the code, make modifications to it, debug it, and rerun it, 6 months after you thought you'd seen the back of it. If you haven't spent time making the code easy to read in the first place, you will come unstuck at this point.
Welcome to Scientific Computing. It's not the same as computer science, but you still get to play with big shiny computers and do lots of wonderful stuff with them.
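For the "very lucky" case above, where each core can simply be handed its own piece of the data, here is a minimal Python sketch. It assumes the work on each piece is independent of the others; score_chunk is a made-up placeholder for the real analysis, and parallel_total and split_into_chunks are illustrative names, not anything from the original poster's code.

```python
from multiprocessing import Pool


def split_into_chunks(data, n_chunks):
    """Split a list into n_chunks contiguous pieces of near-equal size."""
    size, extra = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks get one additional element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def score_chunk(chunk):
    """Hypothetical per-chunk work: here, just a sum of squares."""
    return sum(x * x for x in chunk)


def parallel_total(data, workers=4):
    """Hand one chunk to each worker process and combine the partial results."""
    chunks = split_into_chunks(data, workers)
    with Pool(processes=workers) as pool:
        partial = pool.map(score_chunk, chunks)
    return sum(partial)
```

Call parallel_total(list(range(1_000_000))) from under an `if __name__ == "__main__":` guard, which multiprocessing needs on platforms that start workers as fresh processes (Windows among them). If the pieces are not independent, this pattern does not apply and you are back in the hard case.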
English is the best language to learn it!
Those "the best language" questions are the most stupid ones.
In order to realize all possible performance from your hardware, I would suggest linux over XP.
With xeons going 64-bit around 2005, it would have to be really old to be only 32 bit.
And even if it was an ancient 32-bit only xeon, XP is still going to have issues using more than 3.5 gb ram.
XP process management seems weak to me compared to the linux side of things.
I don't have a favorite brand of linux to recommend; I would ask your professors and fellow researchers if they have a preference (because they are going to be your go-to support crew).
In any event, I would try to max out the ram your specific motherboard can handle.
And I would beg/buy/borrow/steal a modest SSD to run the OS on, you can probably get both for $100 or so.
Keep your data sets on the slower spinning-rust drives.
One especially insightful response I saw above was asking about what kind of computation you're running.
The python guys are probably right.
I suspect your problem with VB is that it will be single-threaded, and (I'm not a VB developer, I've just had to cope with it from time to time) not so generous with efficient data types.
I've had some awful experiences trying to run multi-threaded processes on XP and Java.
I think you'd get better results from ditching XP.
Your actual language doesn't matter as much as having some parallel capability does.
Finally, the good news: almost anything is certain to be better than running VB in XP.
The fact that you could implement your solution in VB suggests that it is not crazy complex.
Doing it in raw C will be a pain because you'll have to code your own process management.
I'd be very interested in seeing if numpy or perhaps "R" can do the math that you need.
Do follow up and let us know what you end up doing.
It sounds like you have control of the whole machine, which makes you the sysadmin. You don't only get to choose the programming language. You have to design a workflow. The programming language will fall out of you designing your plan of attack. You have to do so within the limitation of your advisor's budget, the assistance you can beg, etc. Take comfort in the fact that procedural languages are deep down 98% the same with different words for things, it is the libraries that get confusing. And read the library documentation like your life depends on it. It does.
The lab I work in does a lot of what qualifies as "Scientific programming", including doing a lot of computationally expensive analyses.
We use Perl as a prototyping language, and then convert the code to C++ once the prototyping is done. Perl is easier to write and develop in, but C++ is far more efficient for computationally expensive programs.
Perl vs. Python is mostly a personal preference thing. And by personal preference, I mean the preference of your PI. Some labs use one, some labs use another, and the professors tend to have very strong opinions on which one should be used. So, use whichever your PI prefers. Also check to see what his feelings are on publicly available modules -- some PIs want you to write the code yourself, even if a version already exists in the wild.
They're cheap and run on caffeine.
You said you already used VBA. While VB.NET is not a direct descendant, porting to it would be quicker than recoding in any other language. You will likely be able to copy and paste much of your code without much rewriting.
Visual Studio Express is completely free by the way.
Do you have access to MATLAB or a similar analysis tool? Many universities have licenses, and overall it seems like it might be a good choice for you. These programs usually have a lot of built-in functionality that will be difficult to reproduce if you are not an experienced scientific programmer.
I haven't done ANY programming in about 12 years, so it would almost be like starting from scratch
Look at other people in your scientific specialty. What languages are they using? What about related specialties?
For example, I don't know macrobiologists (wildlife conservation, etc.) that do much programming. What I saw was VBA, with S+ for statistics.
Microbiologists love to use python, sometimes use perl, and occasionally branch out to other languages.
In bioinformatics I saw python and matlab, with C or C++ for parallel, high-performance pieces.
In proteomics I saw parallel code written in C, run exclusively on supercomputers.
Science will require collaboration at some point, so use tools your collaborators will be familiar with. Otherwise you need a convincing reason why technology X meets your needs much better than Y.
PHP because it's fast, fun and scalable!
*ducks for cover*
There is often much to be gained from thinking about smarter ways to implement your algorithm. Do you have nested looping where parts can be unrolled? Do you recalculate values which you could store? Are you using strings where a number might work? Are you using Single precision numbers where an Integer would do? Have you looked for a library which might be smarter about computationally intensive logic? Have you built your application into an EXE rather than interpreting it?
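As an illustration of the "do you recalculate values which you could store?" question, here is a minimal Python sketch using the standard library's functools.lru_cache. The function expensive_score is a hypothetical stand-in for whatever your inner loop keeps recomputing; it is not taken from the original poster's code.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def expensive_score(n):
    """Hypothetical costly calculation, standing in for a real inner-loop computation."""
    return sum(i * i for i in range(n))


def total_score(values):
    """Sum the scores; repeated inputs are served from the cache instead of recomputed."""
    return sum(expensive_score(v) for v in values)
```

With the input [10, 10, 10, 20] the expensive function body runs only twice; expensive_score.cache_info() reports the hits and misses. This only helps when the same inputs really do recur and the function has no side effects.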
You don't specify the scientific field. My experience is from biology and what I can recommend is Python (look at the numpy and BioPython modules) and R (www.cran.org), which is an excellent statistics and data mining tool (again on the biology side it has the bioconductor toolset). MySQL may also come in handy to store data depending on the project. I find myself writing some pieces in R, some in Python and using the Rpy2 python module to glue them together. MySQL can also be accessed from both python and R.
It's lisp on the JVM, has a great statistical package already (Incanter), has pmap (parallel map), and ForkJoin support. Making something use all cores is as easy as changing (map fn data) to (pmap fn data) -- although that's a simplistic answer and some problems require other methods. You can also create an uberjar that bundles all dependencies into the jarfile and send that to the computing institute of your institution, rather than dealing with dependency hell like Perl (and to a much lesser extent Python) when you don't have root access.
Being on the JVM you also have access to just about all java libraries, Scala libraries, and possibly Jython libraries (I haven't seen nor tested). Interop is amazing, you don't have to know Java (I don't).
If that doesn't work for you, I say Python, it's quite standard, but for the heavy lifting I really enjoy Clojure and its libraries. And yes, with parallelization, I got my bad code to kick the ass of highly optimized C code (single threaded): 63 hrs in C to 5 hrs on the lab's overclocked queue for gene correlation.
The laziness of Clojure also helps you deal with very large files that won't fit into memory.
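The same idea of lazily streaming a file that won't fit into memory can be sketched in Python with generators. This is an analogue, not Clojure; read_records and count_matching are illustrative names, and the comma-separated record format is an assumption made for the example.

```python
def read_records(lines):
    """Lazily yield one parsed record per non-empty line.

    `lines` can be an open file object; only one line is held in memory at a time.
    """
    for line in lines:
        line = line.strip()
        if line:
            yield line.split(",")


def count_matching(lines, column, value):
    """Count records whose given column equals `value`, streaming the input."""
    return sum(1 for rec in read_records(lines) if rec[column] == value)
```

In practice you would pass an open file handle, e.g. `with open(path) as fh: count_matching(fh, 1, "x")`, so that the whole file is never loaded at once.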
If you haven't written any code in 12 years, and aren't even sure which language is best suited for the project you're doing, there is a larger problem.
The elephant in the room is that the algorithm you concoct to solve your problem is more likely to be the performance killer than the language or platform you pick.
You'd be better off buying a data structures book or some other language-agnostic text book and learning to be a better coder in a language you already know, than starting at square 1 in a new language, thinking your efficiency and speed problems will magically go away.
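A small Python sketch of the kind of thing a data structures book teaches: the two functions below return the same answer, but the first rescans a list for every lookup while the second builds a set once. The function names and the ID-matching task are made up for illustration.

```python
def shared_ids_slow(a, b):
    """Quadratic: every `x in b` test scans the list b from the start."""
    return [x for x in a if x in b]


def shared_ids_fast(a, b):
    """Roughly linear: build a set once, then each lookup is a hash probe."""
    b_set = set(b)
    return [x for x in a if x in b_set]
```

On a few thousand items the difference is barely noticeable; on millions it is typically the difference between seconds and hours, regardless of which language the loop is written in.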
http://www.hardocp.com/article/2012/05/08/inside_mind_stuart/
I saw it years ago, when everyone was wondering if it was real, lol.
Reminds me, I need some more nyquil...
Truth isn't Truth - Guliani
Because you want to do science instead of programming.
Type "data mining" into Wikipedia and there is a list of open-source and commercial applications for data mining. Python has a great collection of free libraries for research in data mining, if you google "Python data mining".
Perl Data Language
The power of Perl + the speed of C
As someone who regularly has to waste time compiling scientific software.
Learn autotools and set up proper build environments, scientists re-invent the wheel far too many times in new and surprisingly retarded ways.
Sometimes I'm amazed at how little reflection goes on.
At some point developers doing scientific code should go, "this is silly, maybe we should just re-do the build process".
Some scientific applications use the configure script to launch scripts that download and patch code and chain-fire a series of other scripts to do all sorts of nasty things, but nothing beats REQUIRING one to install into the source directory, which is actually very common.
What language you choose, who cares.. use what's the accepted standard in your field, and if that's Java, change fields.
write code on bare metal... assembler or machine code.
Or, if you are thinking Inception-esque....
Outer layer - Python (UI, controls and getting things started).
Down one - C/C++ (The place where control spends 90% of its time grinding away). Also handles most/all of your intensive I/O
Down two - Assembler (To make a rocket-sled you need to bring rocket engines. Real rocket engines. Saturn F1's! Not some throw away, short lived JATO). Inline your code where things run slow in C/C++ - but only where things run slow. One slip up and you could be heading full throttle to the Gamma Quadrant.
Down three - Hand-tuned machine code. The areas where the Assembler just isn't cutting it...
Properly done - the job will be so fast it will finish before it started :)
Just like Inception where each level down becomes more bizarre, time also runs perceptually slower as events occur much, much faster. Those of you who have ever programmed assembler - even just for a class at college - know what I mean.
But, if you don't need all that power/performance/maintenance - Python/C/C++/Fortran/R, etc. will still run a lot faster than VB/VBA.
If you have access to a Matlab licence then I suggest using it. The syntax is almost the same as Octave's, but it is faster and many of its built-in matrix operations use multiple cores automatically. However, if you need to purchase a licence, then maybe some other options are better.
I am a scientist who dabbles in data mining problems. I use Python with a healthy dose of C++ and the occasional Java. These are probably the three most common languages among the community. I see people using R and Matlab relatively frequently. A bunch of people in this topic have suggested Fortran, but I've never seen anyone use it seriously.
I haven't run into anyone who doesn't use a minimum of two programming languages (Python/C++, Matlab/Java, etc.).
Note that Kaggle.com (the data mining competition site) frequently posts their example solutions in Python. Failure to understand the Python solution starts you out at a healthy disadvantage.
Perl. That is all. :)
Why is it so hard to only have politicians for a few years, then have them go away?
Back when Men were Men....
Check out PDL. I've been a happy user for years and have no plans of giving up. It works, is fast enough for Real Work and (at least in my environs) there is a ton of software written using it.
SAS
We tend to use R, C, some C++, and a lot of Perl.
But then, we do real science.
-- Tigger warning: This post may contain tiggers! --
Sounds like you are working on some sort of similarity search problem.
You'll probably find most of your peers are working with C/C++.
If that's the case I'd go for that language.
You are not going to write everything from scratch by yourself. You're just not. Not if you actually want to get anything done. You're going to reuse code.
So: figure out what code you're most likely to reuse, what frameworks are useful in the field you're interested in, and let that suggest the language.
If you don't know how to get started on that: asking the question of peers in the same scientific field will get you a more useful answer than asking the question on a wide-open generic technical forum.
Another angle: look at what network databases you want to integrate with (eg. protein databases at nih.gov), and look for sample code showing how to access 'em. That'll give you a clue what other practitioners are doing.
I'm a bioinformatician and make use of bash almost exclusively. Throw in some gnu parallel and you're there..
For scientific computing, you will be doing a lot of collaboration and very likely sharing codes with other scientific programmers, very few of whom enjoy learning new programming languages all the time. To simplify/enable collaboration, you should follow what the community uses. In physics, generally that means Fortran. Anything past Fortran90 is basically modern, it's really not too bad to learn and even has basic object-oriented stuff, though not as good as C++. F77 is mostly obsolete and a major pain in the neck, but you will see it around in older codes, as well as a lot of the libraries. There are C/C++/Python/f77/etc codes around, but most physicists use >F90, especially in high performance/parallel computational work. But there are subfields of physics with their own popular tools too. My advice is to go with whatever the majority of your colleagues are using, placing a very big premium on what your adviser and group members use, which is who you will collaborate with the most. What the majority in the field uses is usually suitable for the job anyway.
It sounds like you're interested in parallel computing as well. Fortran is probably the best option then, mostly for the libraries, but you can still interface from C/C++ or whatever. Also, if you have a lot of computationally intensive stuff, you should try to get supercomputer access. Ask around, you should be able to work something out. You'll need to decide on OpenMP or MPI for parallel programming, depending e.g. on your memory, shared/distributed etc. Here's a quick rundown: http://www.dartmouth.edu/~rc/classes/intro_mpi/parallel_prog_compare.html
Most scientific hpc (high performance computing/supercomputer/parallel) is on unix/linux.
What field are you in exactly, and what is the nature of your data mining?
Especially for a beginner Fortran will make the most sense, IMHO. Here's why:
- User-friendly syntax. Especially vector and matrix syntax and operations are very intuitive.
- Strong typing will let you catch lots of errors at compile-time rather than let you hit your head against the wall at run-time.
- Fast in quite a foolproof way (just remember that arrays are stored column by column, so the innermost loop should run over the first index, or possibly even use the simpler Matlab-like array syntax and let the compiler figure it out)
- Usable with OpenMP and MPI
- Massive availability of free code on the net (visit netlib). Old code also has very good chances to run out-of-the-box or with very minor changes.
C is a close second to many of these points. I can't recommend C++ and Java though, as all the clutter will slow you down especially at the beginning. I also like Python a lot, but there's a catch, which brings me to my next point:
Do you have a plan of your program? How well do you know what you'll be programming? Is that going to change a lot along the way? Code changes are a lot harder to implement in Fortran and C than in Python. The abstraction level in Python really amazed me, as the interpreter would run anything I would throw at it. Mixing paradigms in Python makes scientific programming a breath of fresh air. There are ways to make it quite fast, too! Mixing procedural and object-oriented programming is possible in Fortran as well, but by far not as versatile as in Python. In any case, if you decide to use the OO-paradigm, you have to make sure you have the whole program figured out before you start so that you can define your objects wisely. I have mixed experience with scientific OO in C++, so I can't really recommend it.
Output is also a huge topic in scientific computing. It is a pain to make live graphs in Fortran (the intel compiler has a proprietary library that somewhat helps), but you can export the raw data and use gnuplot or another tool for visualization (even Excel for small graphs). For larger 3D+time datasets there is Paraview. These things are much more fun to do in C/C++. On the other hand, you can use the C-interoperability features of Fortran 2003 and combine them! Or use f2py or PyFort and combine Fortran and Python!
Things have moved on. Fortran 95+ is almost as easy as Matlab, definitely easier than C++ and faster than both.
And for heavily numerical algorithms it is better designed, beyond just the speed.
If you want to do programming as a career, you need to be flexible enough to be able to pick up any language, so use whatever language you feel comfortable enough using to write maintainable code.
LaTeX is a great thing to learn, but it is most emphatically NOT a remotely reasonable choice for writing number crunching code...
SIGSEGV caught, terminating
wait... not that kind of sig.
When I was in grad school I had your mindset. Now that I'm out, I (thankfully) have a very different one. You're talking about this project taking you weeks. Why not have the lab contract with an actual programmer (maybe even a grad student in the CS department) to write this for you? It'll get done faster, it'll be easier to extend/modify, plus you can do research in the meantime which will move you closer to your goal of graduating.
The problem with this question is that "scientific computing" is an over-broad term. The truth is that certain languages have found specific niches in different aspects of scientific computing. Bioinformatics, for example, tends to involve R, Python, Java, and Perl (the prominence of each depends largely on the application). Big-data analytics typically involves Java or languages built on Java (Scala, Groovy). Real-time data processing is generally done in Matlab. Pharmacokinetics, some physics, and some computational chemistry are often done in FORTRAN. Instrumentation is generally controlled using C, C++, or VB.NET. Visualization is done in R, D3 (JavaScript), or Matlab. Validated clinical biostatistics are all done in SAS (!).
Python is a nice, simple-to-learn start, very powerful, and the NumPy package is important to learn for scientific computing. R is the language of choice for many types of statistical and numerical analysis. Those are a good place to start, if incomplete. From there, I'd look at the specific fields of interest and look at what the common applications and code-base are for those.
With regard to the OS, that's pretty easy: Linux (though OS X is a reasonable substitute). Nearly all scientific computing is done in a UNIX-like environment.
Matlab (closed/paid) if you want a GUI interface to work and plot in. A lot of academics that do matrix operations use matlab as the operations are fast out of the box (you can be just as fast in other languages if you have the supporting libraries appropriately compiled). Many of the matrix operation libraries are only available for matlab (i.e. tensor toolbox, many kernel method routines).
R (open/free) if you want a function oriented language that is most similar to what scientists often use/see. It generally (changes dramatically by your field) has more mature packages, packages install more easily without sudo/admin, and is closest to a language that "just works" for most scientists. Data frames are part of the core package, so it is easy to hold all your data together in a "logical" framework.
Python (open/free) if you want an object oriented language that is close to what programmers want to see. It generally has fewer mature packages (though this is changing quickly), and I tend to run into more IT roadblocks installing packages because it installs at the machine level (not user level as with R), but it is by far the most modern programming language. Data frames are implemented as additional packages to compete with R (I believe pandas is your ticket for that).
Any of these points can be nit-picked and you will see a lot of people disagree. However, scientists generally want to code quickly with mature packages (even if the cost is slightly longer run times) and, having recently been in the same position you are in, I find these to fit the bill.
I'd look into matlab. I'm a current undergrad studying CE and we use it for everything.
If you're going to have to do any sort of GUI work, or just want a simple standard library like Java's, C++ with Qt is an excellent choice.
Julia
Religous speak to God. Insane are spoken to by God. When all shut up, one can finally hear Shostakovich in peace
A ridiculous amount of engineering is done within Matlab. I don't think it's a good idea because it's generally as slow as python and proprietary/expensive, but knowing it is probably in your best interest.
Python, as others have said, has caught on for a lot of scientific computing tasks. I'd recommend the Anaconda bundle of scientific-minded packages.
A lot of legacy code is done in Fortran, but I don't like the poor standardization among Fortran compilers. Too many support their own extensions and that can cause trouble. C++ with the Eigen numeric/linear algebra library goes a long way to replacing fortran for me.
I would consider Python. I read your whole post (unlike some commenters, it seems), and Python is your best bet. It's easy to learn, easy to implement, very effective, and it's very fast...
In part, this is because Intel has a compiler for it. On commodity hardware (as in desktop, laptop), you will generally get the best performance running an Intel CPU and using an Intel compiler. That means C/C++ or FORTRAN, as they are the only languages for which Intel makes compilers. C++ is easy to see, since so much is written in it but why would they make a FORTRAN compiler? Because as you say, serious science research uses it.
When you want fast numerical computation on a desktop, FORTRAN is a good choice. We have a few researchers here who use it, and they all use the Intel Fortran Compiler because they want fast computation, but they don't have the money to buy bigass systems for every grad student. What they get out of the IFC and a regular Intel desktop chip is pretty impressive.
Compilers matter, and Intel makes some damn good ones. So if your research calls for lots of performance on little budget, that can influence language choices. Heck, same thing on supercomputers. That is not my area of expertise, but it isn't as though all compilers for a given supercomputer will be equally good. If I were to bet, I'd say the FORTRAN compilers are some of the better ones.
The best compiler support for numbers will commonly be Fortran.
Python belongs on the list because slow functions can be coded in C
or another native language for speed. It is also a rich and portable prototyping
language.
There is value in asking your advisor.
A linux distro like CentOS is well regarded, almost any programming language
can be downloaded. Switching to Red Hat for product support has a small learning curve.
R is a statistically rich environment that you should be aware of. Python bindings for R exist so
again Python.
SUMMARY: Python and R. R may be all you need.... R makes charts and graphs, slices dices....
runs on many platforms even WinsowZ
Truth is stranger than fiction, but it is because Fiction is obliged to stick to possibilities; Truth isn't. Mark Twain.
Python Python Python Python Python It already does almost everything you could ask for, and is growing in acceptance and userbase. It is a modern language with modern language structure. It is designed to be human readable and consistent.
Your best bet is to use C. It is highly efficient. If possible use computational code like the Atlas BLAS package. This code will run circles around your own code no matter what language you use. You already know C and moving to C++ is a major problem. All the other languages are distractions from your purpose.
If possible run multiple, independent processes rather than writing parallel code. That can be a major ordeal.
If your goal is to process data as opposed to learning elaborate programming techniques, keep simplicity in mind. C is a very powerful language and you can reach maximal efficiency for many problems using Atlas BLAS and multiple processes. If your goal is to get a degree in CS, ignore what I've suggested.
Ray Seyfarth, ray.seyfarth@gmail.com, http://rayseyfarth.blogspot.com
If you are working in academia, then you probably have access to Matlab.
On the other hand, you definitely have access to SciPy, given that it's free.
I predict that Python with SciPy/NumPy will completely displace Matlab within a few years.
I say that even though I am working in one industry, digital signal processing, that is really married to Matlab and will be one of the last places to make the switch.
Because Matlab was purpose-built for scripting with matrices, it has some nice syntactic sugar for that. In every other way, Python as a language is far superior.
I was able to attend the SciPy conference a couple of years ago, and one thing I heard there: people like that Python works as a universal language. Sysadmins can use Python to do admin tasks; the web site guys can use Python (with Django) to make web sites; the science guys can use SciPy... it's one language that is flexible enough to do anything you might need, and it's much easier to learn than other really flexible languages like Lisp.
Because Matlab has been around a long time and has man-centuries of work invested in it, it has very complete and well-debugged libraries available for it. SciPy is playing catch-up here. But the basics are already solid, and if SciPy will work for you, you should choose it because it is the future.
There was a time, not that long ago, when people spent $30 to get a web browser. Now people expect web browsers to be free. I predict in the near future the same thing will happen with Matlab vs. SciPy.
SciPy has the advantages of being free and open, as well as the advantage of being free as in beer. And Python is just a better language than the Matlab language. Mark my words: Matlab will fall and Python/SciPy will rise.
lf(1): it's like ls(1) but sorts filenames by extension, tersely
There is no single language that does all -- so learn multiple. To get back in the game, you might try this new fad, MOOCs. I did (mostly for laughs, alongside much heavier courses involving programming) udacity's programming 101 and it teaches python. Might give it a whirl. Coursera also has lots of relevant courses you can use to help you brush up.
I've found that python is convenient to throw things together quickly (as is perl, but that gave me a headache, as are things like shell (for scripting, bourne shell, not bash, nor csh), sed, awk, and so on). Still, even with lots of scientific libraries available, it's not something I'd rely on for everything. My usual "tinkering for fun" language is C++, by the by. It's good for speed but to get to it you may need to know more computer internals than you care for. No extant language is suitable as your sole window into programming, so knowing more is an asset, not a burden.
Oh, don't be afraid to throw code away and re-do it, possibly in a different language. After a bit of practice you'll see why. Also learn how to use a source control system. Regardless of language, that's useful. Try a few (for example, git, mercurial, svn, given in alphabetical order).
You mentioned OSes, and personally I would pick linux (or, say, FreeBSD, or PC-BSD if you want shiny GUIs) as a platform as it gives easy access to quite a lot of free software. In fact I ran (and run, when I'm not slacking off on /.) those MOOCs on a tiny core linux booted off a usb key, including python and octave, even postgresql.
You're not a CS major, but since I mentioned MOOCs, if you're going to deal with serious amounts of data then you may want to follow (over time, not all at once) a few more courses. A course in algorithms is useful (and awesome), as is intro to relational algebra and databases. It teaches you SQL, and that's useful to know too.
Already mentioned are julia, R, octave and a bunch more. My impression with octave was that brushing up your linear algebra is rather useful. You may also want to take a look at pspp and scilab. Which you'll pick to actually use probably depends on who you're working with. Depending on just what you're doing, there's also data visualisation programs you may want to look into.
Perl is still in wide use.
Do not use Perl for this. I've been using Perl for 15-20 years, and I love it for "scripting", text processing, etc., but using it for scientific computing sounds like an exercise in masochism.
A lot of bioinformatics and computational biology uses Perl, so if you're working in those areas you're going to run into a lot of it.
For easy but slow number crunching, use MATLAB/octave.
For simulation (weather, fluids, materials, FEM), use fortran.
For statistical analysis, use R (I think--I've never used it).
For general purpose, use python.
For communication with hardware (sensors, IO cards collecting data) use C/C++.
You'll probably have to make some compromise, because your work will cross boundaries. Maybe interfacing with the hardware won't actually require using C. Or maybe it will, but the libraries you need are for a different language.
#1 reduced the field of choices (IMO) to * Matlab/Octave * R/S+ * SAS * Perl * Python * Julia
#2 gives preference to Python, R, Julia, Perl, or Octave (your situation may not be as limiting).
#3 led me to many searches that all indicated that R and Python have a rich set of libraries and lots of community support.
As for #4, Julia's website (http://julialang.org/) shows nice benchmark information that indicates that Python is pretty quick.
My conclusion was that I couldn't really go wrong with either R or Python. However, I chose Python because it was quicker, I like the syntax better, I like the libraries better (NumPy, SciPy, Pandas, Matplotlib) and it seems to play nicer with everything else. This is what worked for me and how I went about deciding.
I put the 'Physics' in 'Physical Attraction'
In addition to the excellent comments previously made, consider investigating the Center for Open Science, specifically their information for developers, and the associated Open Science Framework (note: will display only if cookies are enabled; I've no idea what value they provide in this context and will be contacting them about that).
They may not have anything that can help you. Or they might. Or you might be able to help them. Or not. YMMV, etc.
Worth taking a peek, anyway.
I'm here EdgeKeep Inc.
The best environment is one that already has the stuff for your application, so that you just cobble together calls to code written by somebody else. Perl + CPAN was the winning combination for many years. These days it seems to be Python+numpy/scipy/scikits.
I would say it largely depends on what people in your field use. I use Matlab on a desktop for data analysis and Fortran/Python for HPC number crunching (astronomy/planetary science). Recent releases of Matlab have seen heavy optimization in number crunching and the parallel processing toolbox is incredibly simple to use. The plotting and graphing tools are second to none and very intuitive if you want to visualize multi-dimensional datasets. For integration of visualization, editing and debugging in one scientifically-oriented IDE, it can't be beat. Plus it sounds like you're familiar with GNU Octave. Python is a better language in my opinion, but lacks some of the 'do-science-straight-out-of-the-box' feel that Matlab is good at. Python obviously has the advantage of being free. The best scientific package is the Enthought Python Distribution which integrates their Canopy IDE with numpy, matplotlib and other great python modules. Free licenses are available to student/academic users.
Where I'm coming from: I'm a satellite physicist working as a contractor for the USGS on the Landsat program. I work very closely with NASA.
Almost all the scientific programming we do -- and by 'we' I mean USGS and NASA -- is either in IDL/ENVI or Matlab. They're the de facto standards for scientific processing. We do need to know SQLPlus to get our data out of the databases, and we need rudimentary C++ skills in order to make prototype code for the IT coders to turn into an operational release. Sometimes it's easier to code something in C++ than in IDL or Matlab, so it's nice to be able to jump straight to that when warranted. Add Perl for text manipulation (which always turns out to be useful in some way) and that's all the programming I've done for the past ten years. Many scientists in the building swap out ARCGIS or ERDAS for IDL/ENVI. (Matlab doesn't seem to be swappable; you either need to use it or you never touch the stuff.)
I've dabbled in PHP when they asked me to prototype a web site but that never went far. I've done a little Flash programming that they eventually decided to hire out for. (I did a fine job, but they wanted the application to go bigger.) In the early days of my career FORTRAN was everywhere, you couldn't get away from it. There are still some FORTRAN programs in-house that I could fiddle with if they asked me to, although I'd blanch at the prospect.
All that said, what you need depends on what your role is. If you're a scientist like me then these self-taught languages might be enough. If you're a science-oriented IT person, you'll need more -- most importantly strong C++ skills, at least around here. And different disciplines will have different needs; I worked briefly for NIH (National Institutes of Health) and they still had COBOL programs.
I know of one person in two organizations (USGS and NASA) who knows Python, and he's an IT guy not a scientist. He's also the only person I know who has ever used Hadoop. I have never met anyone who knew R. Visual Basic is used occasionally here and there for prototyping, and almost immediately switched out with C++ as soon as management decides to support the project.
Genocide Man -- Life is funny. Death is funnier. Mass murder can be hilarious.
I suspect that VB is NOT your problem here. But, if you have a VB program that is too slow, then I'm going to suggest you do the following:
1. Profile your program and see if you can figure out what's taking up all the processing time. It may be possible to change the program you already have slightly and get the performance you need. It would be a shame to go through all the trouble to learn a new language and recode the whole thing if replacing some portion of your code will fix it. Do you have a geometric solution implemented when a non-geometric solution exists?
2. Consider adding hardware - It's almost ALWAYS cheaper to throw hardware at it than to re-implement something in a language you are learning.
3. Rewrite your program in VB - This time, looking for ways to make it perform faster (you did profile it right? You know what is taking all the time right?) Can you multi-thread it, or adjust your data structures to something more efficient?
4. Throw hardware at it - I cannot stress this enough, it's almost ALWAYS easier to throw hardware at it, unless you really have a problem with geometric increases in required processing and you are just trying to run bigger data sets..
5. If 1-4 don't fix it, then I'm guessing you are in serious trouble. If you really do not have a geometric problem, You *MIGHT* be able to learn C/C++ well enough to get an acceptable result if you re-implement your program. C/C++ will run circles around VB when properly implemented, but it can be a challenge to use C/C++ if your data structures are complex.
6. Throw hardware at it - seriously.
Unless you really just have a poorly written VB program or you are really doing some geometric algorithm with larger data sets (In which case, you are going to be stuck waiting no matter what you do) getting better hardware may be your only viable option. I would NOT recommend trying to pick up some new language over VB just for performance improvement unless it is simply your only option. If you do decide to switch, use C/C++ but I would consider that a very high risk approach and the very last resort.
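The "profile first" step above applies in any language. As a rough sketch in Python rather than VB (the names `slow_pairwise_sum` and `profile_call` are made up for illustration, and the double loop is a deliberately bad stand-in for a real hot spot), the standard-library `cProfile` module can show where the time goes before anything gets rewritten:

```python
import cProfile
import io
import pstats


def slow_pairwise_sum(values):
    """Hypothetical hot spot: a deliberately quadratic double loop."""
    total = 0.0
    for a in values:
        for b in values:
            total += a * b
    return total


def profile_call(func, *args):
    """Run func(*args) under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)  # show the top 5 entries
    return result, buffer.getvalue()


result, report = profile_call(slow_pairwise_sum, list(range(300)))
print(report)
```

In this toy case the report points straight at the double loop, and the same number can be had from `sum(values) ** 2` in linear time. That is the kind of fix that makes a port to another language unnecessary.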
"File to fit, pound to insert, paint to match" - Aircraft Maintenance 101
Matlab
You know C. C is simple, as fast as any alternative, it's straightforward to optimize (aside from pointer abuse), and you always know what the compiler/runtime is doing. And threading libraries like pthreads or CUDA are best served via C/C++. Why use anything else?
Another thought: scientific libraries. If you need external services/algorithms then your chosen language should support the libraries you need. C/C++ are well served by many fast machine learning libs such as FANN, LIBSVM, OpenCV, not to mention CBLAS, LinPACK, etc.
What do the other grad students in your field do? What does your advisor do? You can accomplish your goals with one of many different tools or languages, and the truth is that there are several good alternatives, many of which have been mentioned here.
If you choose the same tool as your colleagues, your life will be much easier.
... that is all
I work in gambling and we write everything in c++. It's as fast as anything else, and the new standard makes everything a lot easier. Threads especially. You also have tons of libraries at your fingertips: GSL, good RNGs, whatever you can think of. I think C is probably the worst thing to write in if you're just coming back to it. You'll end up spending more time with memory management than you will actually getting stuff done. Go with C++ and use standard containers.
My experience at this comes from being a MOOC addict where some of the courses are in Python, others in R, and others in Matlab or its GNU counterpart Octave.
Of these Python is my favorite since it's the language I'm most familiar with. Furthermore, you can "bolt" R to Python with the Pandas library, and you can "bolt" Matlab/Octave with the Numpy & Scipy libraries.
A big drawback, however, is speed. The big advantage of domain specific mini-languages over "kitchen sink" languages was brought home to me by writing a Python script to simulate the popular (in statistics courses) Monty Hall problem and the same script in R. While my Python script took several seconds to simulate a couple of thousand Monty Hall game turns, the R script would give the percentage for millions the instant I hit the enter key.
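For what it's worth, a gap like that often comes from looping over games one at a time in pure Python rather than from Python itself. The original scripts aren't shown, so this is only a guess at the cause. A minimal sketch with NumPy (the function name `monty_hall_switch_win_rate` is hypothetical) simulates all the games as whole-array operations, much as the R version presumably does:

```python
import numpy as np


def monty_hall_switch_win_rate(n_games, seed=0):
    """Estimate the win rate of the 'always switch' strategy."""
    rng = np.random.default_rng(seed)
    car = rng.integers(0, 3, size=n_games)   # door hiding the car
    pick = rng.integers(0, 3, size=n_games)  # contestant's first pick
    # The host always opens a goat door, so switching wins exactly
    # when the first pick was wrong.
    return float(np.mean(car != pick))


rate = monty_hall_switch_win_rate(1_000_000)
print(f"switching wins about {rate:.3f} of the time")
```

The estimate should land near 2/3, and a million games should run in a fraction of a second on ordinary hardware.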
More complicated problems ended up with weird bugs in R scripts I couldn't figure out, whereas (because of my better familiarity with Python's "mutable list" problems) I tended to get correct -- albeit slower -- answers from my Python programs.
Re Octave: whereas R has overtaken commercial versions of S, I've written off Octave as a lame "freeware" version of Matlab -- lots of features are missing, the documentation is frustrating (it seems to only be used by universities, so "gurus" on stackoverflow etc automatically assume any question is some student trying to cheat at homework), so I'm not a fan. But if I knew Octave as well as Python, I might like it.
R, on the other hand, has an obvious speed advantage for the problems it's aimed at, and probably a better selection of specialist libraries for statistical problems. But it's full of strange quirks for non-specialists.
If it works, it's obsolete
I have worked for almost a decade in scientific computing, and it is Fortran everywhere. Make sure you get up to the new standard. Contemporary Fortran is not the same as Fortran77. Many problems typically associated with Fortran are things of the past. ;)
Next is C.
Moreover, Fortran is a fairly easy language to learn.
Avoid all object oriented stuff. For scientific computing, this is never used, and even shunned to a great degree. Avoid c++ and C# and all that stuff. When you work in SciCom, you will never see that anyway.
Best is meaningless without a measure of goodness. (from Optimization) You are going to get a slew of candidate bests, but folks aren't often going to articulate what makes each one best. There will be conflicting or even mutually exclusive rubrics.
The goal of the language might include:
- inexpensive (starving college student budget)
- employable (typically used and valued in your post degree career)
- fast enough (not every grad student needs to run on a supercomputer to get their job done)
- great breadth and depth of libraries
IMO the "R" language does some of these really well.
- It imports into JMP, SAS, and Python so you can wrapper it for your job.
- It is engineered and maintained by stats/math grad students so it is wide, deep, and mostly correct
- It is open source so it is free
Personally I use MatLab, which was taught in school, and it hurts for the following reasons:
- where I work is JMP-dominant, so it is pulling teeth to stay in the $5k/yr CAL.
- nobody else here speaks the language (statistically speaking) so I have to do extensive hand-holding to share the code
- If I am not connected by VPN to the work CAL server, I can't turn on my software
As long as I am not doing CFD I find the interpreted language is good enough. Computers today are much better than the supercomputers of 15 years ago. We have smart-phones with better CPUs than a bleeding edge Pentium II of yesteryear.
I particularly like RStudio as an IDE.
http://www.rstudio.com/
I was thinking of what could give the op a performance boost while staying on a ramen budget.
*shrug* without knowing more it is really hard to say.
And I would beg/buy/borrow/steal a modest SSD to run the OS on, you can probably get both for $100 or so. Keep your data sets on the slower spinning-rust drives.
If he's going to keep the data sets on the spindles then I see no reason at all to invest in a SSD. All calculation takes place in ram, it is loaded and written to spindles... Yeah the computer will boot in 15 seconds instead of 75, but how often is this thing going to be rebooted?
Leave programming to the programmers. If you want to get science done, use LabView.
J is your language. Not because it's easy nor fast to learn... in fact it is a language for bravehearted people, but a breath of fresh air for the mind.
Otherwise, take the "library" approach: choose the library that will ease your task and a language (not C) you can best work with that library.
a) Python can be compiled to machine code.
b) Java compiles to machine code automatically with the JIT compiler that's been built into the JRE/JVM since... forever. The reason you don't use Java for scientific computation is its lack of some IEEE floating-point types/semantics.
If you need to be very flexible, which is typically when you are doing research from scratch -- devising/changing your algorithms often, visualizing the data, etc., I'd suggest MATLAB. It allows you to program and evaluate stuff very quickly. If you are able to vectorize the problem you are solving, it is also very fast, since it uses highly optimized vector/matrix handling libraries.
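The same vectorization idea carries over to NumPy, for anyone without a MATLAB license. A minimal sketch (the function names are made up, and RMS is just an arbitrary example computation) contrasts an element-by-element loop with the whole-array form that gets handed to optimized library code:

```python
import numpy as np


def rms_loop(samples):
    """Root mean square with an explicit element-by-element loop."""
    total = 0.0
    for x in samples:
        total += x * x
    return (total / len(samples)) ** 0.5


def rms_vectorized(samples):
    """Same quantity, expressed as whole-array operations."""
    return float(np.sqrt(np.mean(samples * samples)))


samples = np.linspace(-1.0, 1.0, 10_001)
print(rms_loop(samples), rms_vectorized(samples))
```

Both give the same answer up to rounding. On large arrays the vectorized form is usually much faster, because the loop runs in compiled code instead of the interpreter.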
Once you know what you want to do and how, you might want to implement your stuff in other languages, as MATLAB is cumbersome, if you for example need to process text or perform networking ... or actually do anything that cannot be vectorized. In such a case one choice would be Python, that has lots of libraries for everything.
As for C/C++ (or even Fortran :-o), I would avoid these unless you need to address a bottleneck that cannot be solved by use of an optimized library. And even in such case, I would only rewrite the bottleneck in it, nothing more, and interface with higher-level languages. Programming in C/C++ is literally a minefield for beginners. Updating/refactoring your code in C/C++ takes much more time than in higher programming languages, as you need to take care of many issues related to low-level programming (compared to Python or Java, even C++ is a low level language). Actually I'm surprised that so many people recommend it.
You did a fair amount of C? Just refresh your knowledge of C, and you'll be back to business in a few days.
Gravitation is a theory, not a fact.
Why not? It's turing complete!
6. Throw hardware at it - seriously.
This must be the most lame piece of advice on the whole page. If we were having this discussion back in 1990, then yes, "throwing hardware at it" could be the way to go. But now PCs are thankfully in the 64bit era so programs are allowed to allocate a lot of memory (of which there is always plenty in a modern system), SSDs offer fast I/O, your standard CPU has 4 cores or more and the GHz race has stagnated. And this is a very typical PC that you can get for less than 1000 bucks. So what is he going to do to "throw more hardware at it"? Get more memory? This is not going to help unless his program is paging. Should he get more cores or distribute the computation over a network? And he is going to do this with VB how, exactly? Oh, and he never mentioned VB, he mentioned VBA, which is interpreted. By Excel. Yikes.
Using the right algorithm is correct, but it's not going to help much in this case. He'll get a moderate speedup whereas picking the right language will reduce the run-time by orders of magnitude.
I would love to see it done though ;-)
It should also be noted that as of C++11 threading is part of the C++ standard library (so you usually won't have to use pthreads or any other platform-specific threads directly).
Don't be silly, you can write FORTRAN in any language..
Organization? You must be joking..
You are more likely to end up in IT than in science, so use a language that is used in IT
Obviously, it depends on what you're doing, but without knowing that, some general comments:
If I were completely hands-on ignorant of all programming languages, and could pick exactly two to achieve extremely high proficiency in, they'd probably be C++ and Python. A third might be a modern Fortran.
The C++/Python combination gets you Scipy, Numpy, and C mastery for free, and access to extensive user communities and libraries for both, including good linear algebra packages. (Adding Fortran into the mix will aid the linear algebra situation as well.) If you know C/C++ very well, and are comfortable with thinking about operations at the hardware level of whatever you're using, then you will also be able to pick up CUDA pretty quickly if you happen to find yourself working on something easily parallelizable. Your post indicates you may be interested in parallelization, but be warned that CUDA/GPU processing is not a magic wand; it comes attached to a specific hardware model which is fucking fantastic for some applications, and only so-so for others. But if you think you want to go that route, then you'll end up learning C/C++ anyway, is my guess.
(Anyone wishing to dispute the point about CUDA is invited to survey the literature and point me to a package for sparse vector-matrix multiply that shows the same performance gain over a CPU svmm package as one can readily achieve for the dense versions. No prior assumptions over the type of sparsity are allowed. A package for tensor manipulation of arbitrary order; bonus points for efficient sparse tensor handling. Seriously, help a guy out.)
I might also urge some caution with Fortran. I know modern versions of Fortran support recursion, but when I learned it, I learned on F77. F77 compilers did not UNIVERSALLY support recursion, and versions before that generally did not. I have very little detailed knowledge about how many modern Fortran libraries use recursion, which old ones have been upgraded to support it, etc. An expert would know. But it seems like something that could bite a fellow in the ass.
If it's really heavily compute-intensive, I would say use Fortran; it's a relatively simple language to pick up compared to C++ or C.
For Java, I use System.nanoTime(). For C++, I use the Windows-specific QueryPerformance() call. So
The technique profilers claim to use is to calibrate the overhead of something like System.nanoTime() with a loop and then subtract the estimated overhead from your instrumented code. The overhead to the nanoTime() calls can even be larger than the execution time of code segments you are trying to measure, but if your estimate is accurate, this works.
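The calibrate-and-subtract technique can be sketched in a few lines. This version uses Python's `time.perf_counter_ns()` as a stand-in for `System.nanoTime()`. The helper names are invented for illustration, and the subtraction is only as good as the overhead estimate:

```python
import time


def timer_overhead_ns(trials=100_000):
    """Estimate the cost of one back-to-back pair of timer reads."""
    total = 0
    for _ in range(trials):
        t0 = time.perf_counter_ns()
        t1 = time.perf_counter_ns()
        total += t1 - t0
    return total / trials


def timed_ns(func, overhead_ns):
    """Time one call to func and subtract the estimated overhead."""
    t0 = time.perf_counter_ns()
    func()
    t1 = time.perf_counter_ns()
    # Clamp at zero: for very short code the estimate can exceed the reading.
    return max(0.0, (t1 - t0) - overhead_ns)


overhead = timer_overhead_ns()
elapsed = timed_ns(lambda: sum(range(10_000)), overhead)
print(f"overhead ~{overhead:.0f} ns, work ~{elapsed:.0f} ns")
```

Note that the measured interval still includes the cost of calling `func` itself, so for very short code segments it is safer to time many repetitions and divide.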
I am doing plain add-subtract-multiply-divide in a mix of scalar and looped operations on arrays -- if you are doing largely trig or calls into a numeric library, you are timing loops and library calls, not the intrinsic performance of your language.
I am also doing a lot of the OO version of using global variables. Java is supposed to do "escape analysis" where if you allocate inside a method and don't let a reference pass outside, the JIT is supposed to recognize that as a local-context stack allocation, but I am not using an advanced Java version or the right set of JVM flags to get that to work.
1. get back in to Fortran - especially if you'll be working with other researchers on existing code.
2. python, with the scipy and numpy libraries.
Python is being suggested frequently here. If you do go with that, I'd strongly suggest taking a look at the Spyder IDE:
http://code.google.com/p/spyderlib/
It's especially useful for scientific work and entirely cross-platform. I even have it running under FreeBSD.
But, my point is that if you have a properly written program, even in VB, the performance gain from recoding into C/C++ is not going to be all that great considering the effort involved. If time is money (and it usually is) then throwing hardware at the problem is a cost effective solution that has been used for decades to get less than efficient solutions to market.
Please note the ORDER of what I suggest. Always fully evaluate your program's performance weakness and KNOW what is causing the bulk of the problem in your system. This is ALWAYS first. KNOW how your solution scales and why your performance is what it is. I'm just guessing here, but I'll be willing to BET that the issue is not his choice of tools (visual basic) but either how it was coded and/or the nature of the problem. VB is not the fastest solution for data processing out there, but it's not a total disaster in performance either.
If you are seriously suggesting that VB is orders of magnitude slower than C++, I'm going to object. (And I'm an old C programmer with decades of experience who hates VB.) If you use VB properly, it's not great, but it's not a total dog either. You should get *some* improvement but not 10X better. Further, I'm going to claim that a novice C++ programmer is extremely unlikely to be able to punch out performant C++ code that has any kind of data structures to process. So the situation is we have some performance improvements possible, but we also have a novice programmer.
Both of these issues tell me that the least risky way to take a reasonably well written VB program and improve its performance is to throw HARDWARE at it. Throwing programming resources who don't know C++ at it to convert it is a way to spend a lot of money/time and get nothing to show for it.
I hate throwing hardware at problems too. I've seen it done many times and it seems a waste. But I've also seen projects flounder because they were hesitant to re-spin the processor card and add that extra memory or faster processor where we spent many hours wringing out a few more bytes here and making that interrupt routine a few cycles shorter there. Of course it was really expensive to change custom hardware, so sometimes you just have to make it fit. Off the shelf hardware is CHEAP, and often it makes the most sense when you consider how much programming effort costs. I could be wrong, but in this case, I'd recommend trying to fix the VB code first, then throw hardware at it.
"...seriously question the quality of your graduate program." Please elaborate; specifically, is this a simple personal opinion, or do you claim some professional authority in evaluation of graduate programs?
The following is nothing but personal opinion -informed by thirty five years of observation of real-world technical problem solving.
"Teams" have their uses... ;
they are useful for 'doing', but it takes smart Individuals to figure out what needs to be done; especially when 'what' has never been done before.
Team => implementation / production
Individual => Creativity
This is not to say that Individuals cannot work toward their respective goals in a mutually supportive manner...
Take a look at Julia. http://julialang.org/
It is almost as fast as C, which makes it much faster than Matlab, Octave, R and Python.
and look into the implementation of co-arrays, functional programming, and most of the standard OOP/OOD functionality. The one thing they are talking about implementing in the 2015 standard is programming by contract. Take a look at "Scientific Software Design: The Object-Oriented Way" by Damian Rouson, Jim Xia, and Xiaofeng Xu. Much of the 2003 and 2008 standards are implemented in gcc-4.8.0 and 4.8.1. You can also look at the features and compatibility charts:
http://fortranwiki.org/fortran/show/Fortran+2008
http://fortranwiki.org/fortran/show/Fortran+2008+status
http://fortranwiki.org/fortran/show/Fortran+2003
http://fortranwiki.org/fortran/show/Fortran+2003+status
Reasonable performance - better than Python, Perl, PHP, not much worse than C/C++ or Fortran :-)
Object Oriented,readable and easy to learn quickly.
Modern day language
Widely understood in Educational field.
Can test your code on your Android phone
Donte Alistair Anderson Roberts - hi son!
Karma: Chameleon
Fortran, plus some python.
he said he was using VBA, which is not fully compiled code and is probably a set of Excel macros. VBA in Excel isn't particularly fast and is single threaded. If, however, he moves his code to one of the compiled versions of VB then he will see a performance boost and be able to spawn multiple threads.
MATLAB and Maple aren't worth it. You can use free alternatives, but if you are going to be paying, pay for Mathematica.
Last post!
Are you a Neandertal? Current Fortran standard is from 2008, and there are at least another two since F77. At least if you're trying it for performance use the latest version, or are you still using Linux 1.x.x or Windows 2?
use what others in your field use. or better yet: https://en.wikipedia.org/wiki/Root.cern C++ interpreted and compiled. It's great.
I do scientific programming for a living (Fusion Scientist) and have extensively used a lot of different languages in my research including:
Python
IDL
MATLAB
FORTRAN
C/C++
Ruby
When working on my own project my favorite setup is to use Python along with Scipy/Numpy. When I need extra speed I use Cython, and also use Cython to interface with libraries that are written in C or C++. For interaction with my codes I use ipython (assuming I need command line interaction) or QT (assuming I need GUI interaction). For libraries written in FORTRAN I use f2py. For plotting I use matplotlib.
This setup works very well for me. It is fast and powerful, almost completely platform independent, and has excellent mathematical and scientific library support. It is extremely easy to integrate C/C++ or FORTRAN code into a Python project which can be extremely useful. It is also very straightforward to do basic multithreading and parallelization. Interactive debugging is very easy and can really help both in the development and in finding problems with scientific calculations. Plotting support is fantastic and easy.
I would say that the next best option is MATLAB. This has good support, an excellent mathematical and scientific library and good plotting tools. I don't particularly like the language for large and complicated projects, and it does require a license, which can make it difficult or impossible to share codes between institutions.
Working directly in C/C++ or FORTRAN is fine for certain kinds of large projects, but is inconvenient for working on lots of small projects or numerous related calculations. Doing something simple like creating a plot requires a significant amount of programming, and debugging can be very time consuming.
I would stay away from IDL; while a nice language in many respects, it is quite out of date at this time and is no longer well supported in terms of staying current. Ruby does not have sufficient support in terms of math/science/plotting libraries at this point to work well for scientific programming.
At the end though it does really matter what kinds of projects you will be working with and what your final goals are. It also matters who else you will be working with to make sure that code can be easily shared.
Haskell
If you can't code it yourself, then hire someone. Having said that, if you want speed, C is close to your best bet, although you could argue that Fortran and Lisp are also good choices. They are all quite old languages. They can all be optimized heavily. Notice I didn't say anything about python or java or anything with a ++ in it. Python and Java are interpreted languages. They run slow because they can't be compiled into native assembly that the computer runs natively. Yelp about Java byte code all you want, it's still dog slow compared to a language like C. Oh, and there is no good reason to use C++. Numerical recipes don't benefit from object orientation, and may suffer from it. C is more deterministic. The single best thing you can do though, is use the fastest algorithms available. Languages aside, algorithms win more than anything else. Compiler or not can't compare to algorithms. High clock speeds can't compare to algorithms. Many years ago I wrote a program in a language called REXX (an interpreted language), to compute exponents to very large values. Example: 123456789.123456789 ^ 123456789.123456789. (Yes, a 9 digit -pre-decimal- exponent). On a 40 MHz cpu with 2 MB of ram, it would give the (correct!) answer in about 1 second, with a good algorithm. With a crappy algorithm, just doing the non-decimal part of the exponent on a quad-core 2.66 GHz processor could take years (like multiplying a value by 123456789.123456789 in a simple loop 123456789 times: stupid).
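To make the algorithm point concrete: the REXX program isn't shown, so the following is not the poster's method. It is a sketch of one well-known fast approach for the integer part of an exponent, square-and-multiply, next to the naive loop. Both function names are made up.

```python
def power_naive(base, exponent):
    """Multiply base by itself exponent times: O(exponent) steps."""
    result = 1
    for _ in range(exponent):
        result *= base
    return result


def power_by_squaring(base, exponent):
    """Square-and-multiply: O(log exponent) loop iterations."""
    result = 1
    while exponent > 0:
        if exponent & 1:      # low bit set: fold in the current power
            result *= base
        base *= base          # base, base^2, base^4, ...
        exponent >>= 1
    return result


print(power_by_squaring(3, 13))  # 1594323
```

For an exponent like 123456789 the squaring version needs a few dozen multiplications where the naive loop needs over a hundred million, and that gap dwarfs any difference between languages.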
I've spent a few man hours using Perl to create my truss app, works great. I was a little worried about matrix manipulation but sure enough there is module for everything. I've been using Perl since 1998.
My app is hosted here: http://design.medeek.com/calculator/calculator.pl
VB != VBA
Try LabVIEW. It's designed for scientists who want to write software but don't care about how computers work. Also, you can use multiple cores with very little effort. Downside: it costs money. But I think it's quite cheap for students, and evaluation is free.
Read the summary and my last comment again, carefully, and the comment by user "confused one" right below. He is using VBA not VB. Visual Basic for Applications. Not the same thing. And yes, a re-write from VBA in any compiled language will get him at least a speedup of one order of magnitude, maybe two. Add to this your advice on algorithms and he won't know what hit him.
It's not that I don't like updating the hardware because of cost or whatever, it's that I can't add more hardware any more. For a single-threaded application in VBA there is nothing you can do hardware-wise to make it faster. Again, algorithmic optimization is to the point but "more hardware" isn't. Even massively parallel applications hit a wall at some point where the data distribution and communication costs start to outweigh the speedup you get from the extra processors. BTW, I found out that VB can also be parallelized natively, but the re-write alone from VBA to VB will do the trick.
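The "wall" for parallel applications can be illustrated with a toy cost model. This is only a sketch under a loud assumption: that communication overhead grows linearly with the number of workers. Real overheads depend on the problem and the machine. The names `run_time` and `best_worker_count` and all the numbers are hypothetical.

```python
def run_time(workers, serial, parallel, comm_per_worker):
    """Toy model: fixed serial part + divided parallel part + overhead.

    ASSUMPTION: communication cost is comm_per_worker * workers.
    """
    return serial + parallel / workers + comm_per_worker * workers


def best_worker_count(serial, parallel, comm_per_worker, max_workers):
    """Return the worker count (1..max_workers) with the lowest modeled time."""
    return min(
        range(1, max_workers + 1),
        key=lambda p: run_time(p, serial, parallel, comm_per_worker),
    )


if __name__ == "__main__":
    # Made-up numbers: 1 unit serial, 100 units parallel, 1 unit per worker.
    for p in (1, 4, 10, 64):
        print(p, run_time(p, 1.0, 100.0, 1.0))
    print("best:", best_worker_count(1.0, 100.0, 1.0, 64))
```

With these made-up numbers the modeled time drops quickly at first and then climbs again, so past a certain point extra processors make things slower, not faster.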
Assembly
A different language will not make bad design decisions go away. I have at times written things in a sloppy but expedient way, then gone back and spent the time to re-write them with some thought, and the result performed hundreds of times faster.
I would go so far as to say that proper algorithm design will make a larger impact than a change to any of the languages listed here. Even in the case where a compiler does a particularly bad job at one or more types of operations, these tend to be well known and work-arounds exist.
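As a small illustration of the sloppy-versus-thoughtful rewrite, here is a sketch in Python. The task (checking a list for duplicate values) and the function names are made up for the example and are not from the poster's project. Both functions give the same answer, but the first compares every pair while the second makes a single pass with a set.

```python
def has_duplicate_quadratic(values):
    """Expedient version: compare every pair, about n*n/2 comparisons."""
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            if values[i] == values[j]:
                return True
    return False


def has_duplicate_hashed(values):
    """Rewritten version: one pass, remembering what has been seen."""
    seen = set()
    for v in values:
        if v in seen:
            return True
        seen.add(v)
    return False


if __name__ == "__main__":
    data = list(range(2000)) + [1999]
    print(has_duplicate_quadratic(data), has_duplicate_hashed(data))
```

On a list with no duplicates the pairwise version does work proportional to the square of the list length, so the gap between the two grows as the data set grows, regardless of which language either one is written in.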
Maybe the solution is to have someone from CS come review your code? Maybe they can sort out the problem in an afternoon and save you the hassle of re-writing the thing.
You know C. C is simple, as fast as any alternative, it's straightforward to optimize (aside from pointer abuse), and you always know what the compiler/runtime is doing. And threading libraries like pthreads or CUDA are best served via C/C++. Why use anything else?
This is just nonsense, and to see it constantly repeated and modded up is just sad.
C is only simple in the same way a written alphabet with only two letters is simple: sure, you only have to remember the letters A and B (simple!), but actually using it is not simple.
For crying out loud, in C, you can't even do A = B + C; without having a very good chance of invoking undefined behavior. Why? Because in C, overflow or underflow on signed values has undefined behavior!
Access beyond the end of an array and damage data elsewhere in the system (making it often really hard to find)? No problem!
Laboriously managing your own memory (and probably leaking it)? No problem!
What, real strings? Heck no, real men like to take the risk of overflowing the strings and their buffers!
C is filled with literally hundreds of mine fields just waiting to trap the unwary, and often forces you to write a lot of code that would only be a few lines in a higher level language.
C is not simple to use.
Javascript is the language of the UI of the future: the browser. ... lacking libraries BUT it can call C/C++ routines or the latter can be converted to JS/NodeJS using Emscripten/LLVM.
Javascript is fast enough: 2x C/C++ (on par or faster for some tasks).
Javascript is
Speed only makes sense in real-time apps, say day-trading or control systems. If you are crunching numbers, what difference does an hour or a day or a week make? Seriously, if you are sitting around idle while your numbers are crunching, you are a waste of space. You should be:
planning the next experiment,
reviewing/refactoring your code for correctness and efficiency,
writing the paper in which your results will be published (esp. intro, background, experimental set-up/procedure),
setting up the website on which you will publish the pre-print so others can review your work and maybe prevent you from publishing foolishness,
fleshing out your next steps to follow your results,
thinking of your next great hypothesis.
Screw speed, it will come with faster processors. Code, make your code correct, and make it accessible and shareable. Javascript/Nodejs will do that in spades.
"Consensus" in science is _always_ a political construct.