Ask Slashdot: Best Language To Learn For Scientific Computing?
New submitter longhunt writes "I just started my second year of grad school and I am working on a project that involves a computationally intensive data mining problem. I initially coded all of my routines in VBA because it 'was there'. They work, but run way too slow. I need to port to a faster language. I have acquired an older Xeon-based server and would like to be able to make use of all four CPU cores. I can load it with either Windows (XP) or Linux and am relatively comfortable with both. I did a fair amount of C and Octave programming as an undergrad. I also messed around with Fortran77 and several flavors of BASIC. Unfortunately, I haven't done ANY programming in about 12 years, so it would almost be like starting from scratch. I need a language I can pick up in a few weeks so I can get back to my research. I am not a CS major, so I care more about the answer than the code itself. What language suggestions or tips can you give me?"
I have a friend who works for a company that does gene sequencing and other genetic research and, from what he's told me, the whole industry uses mostly Python. You probably don't have the hardware resources that they do, but I'd bet you also don't have data sets that are nearly as large as theirs are.
You might also get better results from something less general purpose like Julia, which is designed for number crunching.
sorry to say, but that is a fact
Obviously.
Seriously consider FORTRAN
Have you looked at Matlab? It's commercial, requiring a license, but many universities have a site license available for you to use it. It's pretty powerful and faster than VB, though not as fast as native C/C++. Unless you're running some calculations in real time, that probably is not an issue for you.
>> I initially coded all of my routines in VBA because it 'was there'.
Are you in Access? Or Excel?
If your routines work but are just slow, I'd first look at moving the data to SQL Server and porting your VBA routines to VB.NET.
If you have more time, you may want to learn what the "Hadoop" world is all about.
Depending on your needs, R may be your best bet if it is statistical processing you are interested in.
What do you mean by scientific computing?
Modelling: Hard core finite element simulations or the like. Then C or Fortran and you will be linking with the math libraries.
Log Processing: For a lot of other stuff you will be parsing data logs and doing statistics. So Perl or Python, then Octave.
Data Mining: Python or other SQL front end.
Install these 2 and you'll be good to go
http://ipython.org/notebook.html
http://pandas.pydata.org/
Try Python. Make sure to use scipy (numpy really), because you don't want to do the heavy lifting in native Python.
http://www.scipy.org/
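To illustrate the "heavy lifting" point, here is a minimal sketch that computes the same sum of squares twice: once with a plain Python loop and once through NumPy. The function names and the sample data are made up for illustration; the only library calls used are `np.asarray` and `np.dot`.

```python
import numpy as np


def sum_of_squares_loop(values):
    """Pure-Python loop: every multiply and add goes through the interpreter."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_numpy(values):
    """Same computation, handed to NumPy's compiled routines in one call."""
    arr = np.asarray(values, dtype=np.float64)
    return float(np.dot(arr, arr))


# Hypothetical sample data, only here to exercise both versions.
data = [0.5 * i for i in range(1000)]
loop_result = sum_of_squares_loop(data)
numpy_result = sum_of_squares_numpy(data)
print(loop_result, numpy_result)
```

Both versions should agree; on large arrays the NumPy version is usually the much faster one, though the gap depends on the machine and the data size.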
You should all be sharing your codes to avoid rewriting and to perfect it.
And if you are not a member of a team then I seriously question the quality of your graduate program.
What language suggestions or tips can you give me?"
Timothy, shame on you. You should know better than to start a holy war.
Fortran, and learn how to implement MPI and CUDA code if your work is parallelizable.
If you can find anything that resembles a math library with the correct tools then go with Python. Numpy is everyone's friend here.
If you have to do the whole thing from scratch then Fortran is the fastest platform. I can't say I've met anyone who enjoyed Fortran but it's wicked fast.
For numeric-intensive work, I can get within 20% of the speed of C++ using the usual techniques -- minimize garbage collection by allocating variables once, use the "server" VM, perform "warmup" iterations in benchmark code to stabilize the JIT. I use the Eclipse IDE, copy and paste numeric results from the Console View into a spreadsheet program, and voila, instant journal article tables.
I would recommend learning what a programming language is, especially if you have the time. Personally, I spent a lot of time learning languages without really seeing the abstraction that every programming language adheres to, which made learning a new language difficult and time consuming. I can only really describe it as trying to learn a language rather than learning linguistics. All computer languages share common patterns, all based on formalism, just like all spoken languages share common patterns. Learning formalism makes picking up new programming languages much easier, since you'll not only be able to identify patterns shared between them faster, but also pick up the lexicon to communicate well-formed questions to other programmers. I'd recommend reading Structure and Interpretation of Computer Programs. There are other books that attempt to replicate what this does, but it really is great and I haven't seen other books get to the point of computer programming faster. It is based on LISP, which most people will never use, but it's deceptively easy to read and understand, so getting through the book for someone that hasn't used LISP before shouldn't be a problem. Good Luck!
First suggestion: Python. Lots of nice stuff for science (NumPy, SciPy), lots of other goodies, easy to learn, many people to ask or places to get help from. Plus you can explore data interactively ("Yes Wednesday, play with your data!").
Beyond that: CERN uses a lot of Java (sorry folks, true), and they have good (and fast) tools. I am doing a project right now where I am using Jython, since it is supported by the main (Java) software I have to use. I like jhepwork/SCaVis quite a bit, if you are into plotting stuff on Java.
If you have extra free time and want to learn how to program well? I'd learn something like Smalltalk (for OOP concepts) and/or Haskell (functional programming). Scientists are often lousy programmers because they often do not learn programming properly, and/or the language allows them to get away with bad programming (I know, every language allows bad programmers to write bad code, but some make it easier than others).
So, stick with Python, it works really well, is modern, and has good support. Plus you can read your code in 5 years time ...
What do I program in? Python (and Jython), Perl, C, IDL (yikes!), Smalltalk, Matlab, Mathematica. I know some Lisp, but that's just for fun. And whatever allows me to load sketches on an Arduino. I like Python (gets stuff done) and Smalltalk (works actually like I think - passing messages between objects).
Use whatever works and you don't hate :-)
Most of the cutting edge data mining I've seen is done using R (which acts as a scripting wrapper for the C or Fortran code that the fast analysis libraries are coded in), or alternatively in python. Some people swear by MatLab if they have trained in it (so your octave would come in handy there). Have a look at some discussions at places like kaggle.com to see what the competitive machine learning community uses (if that is what you mean by data mining).
http://golang.org/ You won't regret it.
Sorry about the mess.
A lot of people will propose a language because it is their favorite. Others because they believe it is very easy to learn. I will give you a third line of thought.
I would not look for a language in this case; I would look for a library, then teach myself whatever language is easiest/quickest to access it. I would try to profile what you are building, figure out where the bottlenecks are likely to be (profiling your existing mockup can help here, but don't trust it entirely), and try to find the best stable, well-designed, high-performance library for that particular type of code.
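As a rough sketch of the "profile first" step, assuming the mockup were in Python: the standard library's `cProfile` and `pstats` modules can report where the time goes. The workload below (`load_points`, `slow_distance_sum`, `run_analysis`) is entirely hypothetical and only stands in for a real routine.

```python
import cProfile
import io
import pstats


def load_points(n):
    """Hypothetical stand-in for reading a data set."""
    return [(float(i), float(i % 7)) for i in range(n)]


def slow_distance_sum(points):
    """Deliberately naive O(n^2) pairwise distance sum: the likely bottleneck."""
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        for x2, y2 in points[i + 1:]:
            total += ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
    return total


def run_analysis():
    points = load_points(300)
    return slow_distance_sum(points)


# Run the workload under the profiler and keep its return value.
profiler = cProfile.Profile()
result = profiler.runcall(run_analysis)

# Collect the five most expensive entries by cumulative time into a string.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
profile_text = report.getvalue()
print(profile_text)
```

The function that dominates the cumulative-time column is the one worth finding a tuned library for; everything else can probably stay in whatever language is most convenient.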
If you are doing a computationally intensive data mining problem, have you considered porting to a Hadoop solution? You may need to rewrite your code, or you may be able to use Hadoop to call your current functions. You could use an AWS Hadoop cluster; Amazon often gives free credits to students, it may cost you nothing out of pocket, and help you learn a hot new technology.
If you're using VBA in Excel, you can speed it up a ton by putting this at the beginning of your function:
Application.Calculation = xlCalculationManual
And restore it with ...Automatic at the end.
Do this at the top level with a wrapper function whose only purpose is to disable and enable that, calling the real function in between.
If you want a real speedup, I am available for part time work in C or C++.
It takes all the work out of the computations.
If you really want to do heavy lifting, you can't beat Fortran. Just stay away from Fortran 77; it's a hot mess. Fortran 90 and later are much easier to use, and they're supported by the main compilers: gfortran and ifort.
ifort is Intel's Fortran compiler. It's the fastest out there, and it runs on Windows and Linux. Furthermore, you can get it as a free download for some types of academic use. (Search around Intel's website -- it's hard to find.) That said, I usually use gfortran -- which is free and open source -- on Linux. See http://www.polyhedron.com/compare0html for a compiler comparison.
If you use Fortran, it's very easy to use OpenMP to do multiprocessing and make use of all those cores. OpenMP is supported by the main compilers.
If you're doing lighter work, SciPy/NumPy works fine; I use it a fair amount if maximum performance isn't essential. However, I can't speak to its multiprocessing ability.
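On the multiprocessing question, one common approach (a sketch, not the only option) is the standard library's `multiprocessing.Pool`: split the data into one chunk per core and let each worker process handle a chunk. The helper names and the sum-of-squares workload are invented for illustration; `workers=4` is an assumption matching the four cores mentioned in the question.

```python
from multiprocessing import Pool


def split_into_chunks(values, n_chunks):
    """Split a list into n_chunks contiguous pieces of nearly equal size."""
    size, extra = divmod(len(values), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def chunk_sum_of_squares(chunk):
    """Work done independently by each worker process."""
    return sum(v * v for v in chunk)


def parallel_sum_of_squares(values, workers=4):
    """Farm the chunks out to a pool of processes and combine the partial sums."""
    chunks = split_into_chunks(values, workers)
    with Pool(processes=workers) as pool:
        partial_sums = pool.map(chunk_sum_of_squares, chunks)
    return sum(partial_sums)


if __name__ == "__main__":
    # Hypothetical workload; the guard keeps worker processes from re-running it.
    data = list(range(100_000))
    print(parallel_sum_of_squares(data, workers=4))
```

This only pays off when each chunk carries enough work to outweigh the cost of shipping data between processes, so it suits coarse-grained problems better than tight inner loops.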
Use KNIME and you can probably do 90% of what you want by dragging and dropping new nodes and joining them up. KNIME does all the complicated memory caching for large filesets for you, and you can write your own Java functions to plug into it if you need something special.
R, MATLAB, SAS, Python, there's a bunch of languages you can use, and a bunch of ways to store the data (RDBMS, NOSQL, Hadoop, etc.). It really comes down to what kind of access to the data you have, how it's presented, what other resources you have available to you, and what you want to do with it.
Better yet, Fortran + Python.
http://docs.scipy.org/doc/numpy/user/c-info.python-as-glue.html#f2py
I used it to wrap some crazy magnetometer processing code written in Fortran into a nice Python program. I ripped out all the I/O from the Fortran code and moved it into the Python layer. It worked great. Fortran is AWESOME at number crunching but SUCKS ASS at IO or well pretty much anything else, hence Python.
-73, de n1ywb
www.n1ywb.com
Well, it depends. You say "computationally intensive data mining problem", but what kind of computations (arithmetic, mathematical, text-based, etc.)?
In general, for flat-out speed, toss interpreted languages (Perl, Python, Java, etc.) out the door. You'll want something that compiles to machine code, esp. if you are running on older hardware. For crunching numbers, complex math, and matrices, Fortran is the beast. If your data is arranged in lists, consider Lisp, then pick something else, as it will likely give you a migraine. The format of your data and what you need to do with it will drive your language choice.
Is finding a partner an option? Seems you should be able to work with someone from CS who needs a coding project...
I work in the industry (all our customers are scientists), and the two languages that seem to be predominant are R and Python. R has lots of cool stuff specifically for advanced number crunching, while Python is more the swiss army knife that can be used to tackle anything. I don't think you can go wrong with either, but Python will probably be more friendly (eg. it has way more books on it than R) and will serve you better in non-scientific enterprises.
Since you mention VBA, I suspect that your data is in Excel spreadsheets? If you want to try to speed this up with minimum effort, then consider using Python with Pyvot to access the data, and then numpy/scipy/pandas to do whatever processing you need. This should give you a significant perf boost without the need to significantly rearchitect everything or change your workflow much.
In addition, using Python this way gives you the ability to use IPython to work with your data in interactive mode - it's kinda like a scientific Python REPL, with graphing etc.
If you want an IDE that can connect all these together, try Python Tools for Visual Studio. This will give you a good general IDE experience (editing with code completion, debugging, profiling etc), and also comes with an integrated IPython console. This way you can write your code in the full-fledged code editor, and then quickly send select pieces of it to the REPL for evaluation, to test it as you write it.
(Full disclosure: I am a developer on the PTVS team)
FORTRAN used to be it back in the day, but nowadays Matlab is the stuff that many engineers use for scientific computing. Many of the math libraries are very good in Matlab and don't require you to be a computer scientist to make them run fast. I used to work with scientists in my old lab to port their Matlab code to run on HPC clusters, porting it to FORTRAN or C. Often the Matlab libraries smoked the BLAS/ATLAS packages that you find on Linux/UNIX machines, for instance. The same would hold true for Octave, since they just build on the standard GNU math packages like BLAS.
If you want to be able to ask someone for help then it would be best to use the same tools they use. The point is that any programming language will work. Some languages are easier than others, but the difference is negligible compared to the advantage of being able to ask your peers for assistance.
...at jsoftware.com .
It's more powerful, concise, and consistent than most languages. However, R and Matlab have larger user communities and this is an important consideration.
There was a note on the J-forum a few months ago from an astronomer who uses J to "...compute photoionization models of planetary nebulae." His code to do this is about 500 lines in about 30 modules and uses some multi-dimensional datasets, including a four-dimensional one of "...2D grids of the collisional cooling by each of 16 ions".
However, the point of his note was that he ported this code to his iPhone - and it works! Consider, too, that porting consists mainly of copying some text and data files - there would be little to no code changes.
I'm a MSEE and I've been working in the digital signal processing realm for the last 10 years since graduating. I should mention that I haven't done a lot of low level hardware work, I haven't programmed actual DSP cards or played with CUDA. I have written software that did real-time signal processing just on a GPU. Everyone in my industry at this point uses C or C++. There is some legacy FORTRAN, and I shudder when I have to read it. Some old types swear by it, but it's fallen out of favor mostly just because it's antiquated and most people know C/C++ and libraries are available for it.
For non-real-time prototypes I'd recommend learning python (scipy, numpy, matplotlib). Perhaps octave and/or Matlab would be useful as well.
At some point you have to decide what your strength will be. I love learning about CS and try to improve my coding skills, but it's just not my strength. I'm hired because of my DSP knowledge, and I need to be able to program well enough to translate algorithms to programs. If you really want to squeeze out performance then you'll probably want to learn CUDA, assembly, AVX/SSE, and DSP specific C programming. But I haven't delved to that level because, honestly, we have a somewhat different set of people at the company that are really good in those realms.
Of course, it would be great if I could know everything. But at the moment it's been good enough to know C/C++ for most of our real time signal processing. If something is taking a really long time, we might look at implementing a vectorized version. I would like to learn CUDA for when I get a platform that has GPUs but part of me wonders if it's worth it. The reason C/C++ has been enough so far is that compilers are getting so good that you really have to know what you're doing in assembly to beat them. Casual assembly knowledge probably won't help. I might be wrong, but I envision that being the case in the not too distant future with GPUs and parallel programming.
Do you have access to MATLAB or a similar analysis tool? Many universities have licenses, and overall it seems like it might be a good choice for you. These programs usually have a lot of built-in functionality that will be difficult to reproduce if you are not an experienced scientific programmer.
I haven't done ANY programming in about 12 years, so it would almost be like starting from scratch.
This is probably a bigger problem than choosing which language to use. If you don't know how to program properly and efficiently, it doesn't matter which language you choose. If you go this route I'd suggest taking a course to refresh or upgrade your skills. Since you're familiar with C that might be a good language to focus on in the course. Another factor is if you have to work with any existing libraries it might limit your choices. I program in C, FORTRAN, and VB and find that for computationally intensive programs C is usually the best fit, sometimes FORTRAN, and never VB.
newLISP is small and can easily call most c/c++ libraries, plus Java for graphics. HTML/XML are really just LISP S-expressions for all practical purposes. Throw in a little Unix/bash and you are there.
Personally, I would do it in C unless you have Fortran libraries you want to use, then I'd use Fortran. However, if you have existing VBA code you want to leverage, I'd just use VB.Net, import the core parts of the code and run with it. There's a moderately steep learning curve going from VB6 or VBA to VB.Net; but, it'll be much less effort than learning a new language.
It depends on what you are willing to deal with.
Python is good if you don't need to write very heavy array code. I know you can use Python libraries that give you access to good arrays, but I think of Python as a scripting language. It's good for a quick prototype as well, but for heavy computation, I would move on to a compiled language.
Fortran 90 or Fortran 2003/08 will be the closest to the mathematical syntax you'll use. Despite what people may tell you, it is possible to write code that is understandable and reusable in Fortran; it just takes a great deal of understanding when you design the code. Most people have only seen Fortran code that was either hacked together or is so heavily optimized that it has been obfuscated.
C++ is good as well, but you'll spend more time figuring out how to express your mathematics and use the arrays than you might find productive. In my group, we do the computer science parts of our codes in C++, but numeric calculations and heavy-duty array manipulation are done in Fortran.
The thing about taking advantage of the multiple core machine is much deeper than simply choosing a language. There are MPI and OpenMP libraries that are very good for Fortran and C++. However, producing efficient code that is parallelizable requires changing and complicating the algorithm for a well understood and functioning serial code. Writing effective parallel code will take you much more time than picking up a programming language.
If you are working in academia, then you probably have access to Matlab. Matlab, as a language, has both scripting abilities and programming abilities. The scripting was born from Matlab's roots in Unix, which makes it handy for batch processing lots of files. Its programming functions started off as C, but have since incorporated features from C++, Python, and Java. The programming side of it has, in my opinion, more structure and formalism than Python, but makes certain things like file IO and data visualization (i.e., graphing) easier than straight up C/C++. The basics of using it can be picked up in an afternoon, and the sky's the limit from there. There are lots of well-written and documented functions built in; specialized toolboxes can be had for additional fees. There's a fair bit of user-generated code out there. Plus, I expect you can find a lot of people around you who know plenty about it.
The answer would really depend on the nature of the problem. If you are doing more statistics type processing then R is commonly used in academia. Python might be good in the short and medium term, but you will probably want to get acquainted with C++ if you are serious.
I worked as a sysadmin for a high energy physics group at the Beckman Center. Day and night, it was Fortran, on big whopping clusters, doing monte carlo simulations.
Though it ~was~ many years ago.
Elsewhere, I worked for a company doing datamining on massive datasets, over a terabyte of data back in 2000, per customer, with multiple customers and daily runs on 1-5 gig subsets. We used C + big math/vector/matrix libs for the processing because nothing else could come close, and Perl or Java for the data management: preprocessing, set creation and munging (like attempting to correct spelling mistakes, parsing date strings into a standard format, normalizing data against a standard metric, applying expert system filters, even actual machine analysis like clustering or shape detection, which to us was still just preprocessing).
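Two of the munging steps mentioned above (parsing date strings into a standard format, and normalizing data against a standard metric) can be sketched with nothing but the Python standard library. This is an illustration, not the code that poster used: the accepted date formats, the choice of z-scores as the "standard metric", and the sample values are all assumptions.

```python
from datetime import datetime
from statistics import mean, pstdev

# Assumed input formats; a real data set would need its own list.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def normalize_date(text):
    """Return the date as an ISO 8601 string, or None if no format matches."""
    cleaned = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def zscores(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical raw inputs.
raw_dates = ["2000-03-15", "03/16/2000", " 17.03.2000 ", "not a date"]
clean_dates = [normalize_date(d) for d in raw_dates]
scaled = zscores([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(clean_dates)
print(scaled)
```

Returning `None` for unparseable dates, rather than raising, lets the caller count and inspect the bad rows instead of stopping the whole preprocessing run.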
Don't use a programming language. Use a tool like Matlab or Mathematica instead. These tools are well designed for scientific computing and have sufficient scripting built in to support the programming-language-like functionality you're probably looking for.
You won't be able to call yourself a programmer. But you're not a programmer, you're a scientist.
Stick to 'functional' flavored languages, especially ones that are geared toward making concurrent code particularly easy to compose and that feature immutability of variables as a default, as that's particularly important when you've got a lot of working parts running in parallel. We've been pulling away from the higher-level languages reimplementing most of Lisp poorly toward languages that more and more resemble the ML family with each year that goes past. Haskell is the current darling of that group, though I'd not suggest a lazily evaluated language to a beginner, as it's particularly difficult to reason about compared to the more typical eager evaluation. Still, you can try taking a crack at Learn You a Haskell if you want to see a brief glimpse of one of the most interesting languages floating around at the moment, especially since there are ample resources available for study compared to, say, Standard ML, my personal favorite.
I would keep my eyes out on Rust in particular, which has taken a few of the better ideas from ML (pattern matching, type inference) and has tried to graft it to a more pragmatic set of trade-offs, such as pushing garbage collection from a required performance hindrance to a per-task elective option and has focused on strong interoperability with C and its calling conventions. It also was a day-one design decision to focus on parallelism and concurrency, which is part of why I cannot directly suggest OCaml at the moment, as that is its biggest weakness in otherwise a very robust and well-established member of the ML family.
I would avoid Go for myriad reasons, unless you want to cover a third of your source-code with explicit error handling or just silently discard them, suffer through the lack of generics (not everything has to be shit with generics like Java and C++, guys, c'mon) or just really enjoy null errors and weird versioning issues that crop up during project development because the Go build system is a bit too simplistic and their solution is to just pull all your dependencies into your own tree to avoid the issue.
I run lots of statistical analyses. Most of the code is in R with some wrappers in Perl and some specific libraries in C. The R and Perl code is pretty much all my own. The C is almost entirely open source software with very minor changes to specify different libraries (I'm experimenting with some GPU computing code from NVidia). Most of the people who are doing similar things are using Python with R (or more specifically, the people I know who are doing the same thing are using Python/R).
An average run with a given data set takes approximately 20 minutes to complete on an 8-core AMD 8160. About 80% of the run is multi-threaded and all cores are pegged. The last bit is constrained mainly by network and disk speed.
You may consider using something like Java/Hadoop depending on your data and compute requirements. Though my Java code is just a step above the level of a grunting walrus, I've found that the performance is actually not that bad and can be pretty good in some cases.
As others have pointed out, if you are going to do serious research,
you don't want to mess around with the latest and greatest.
Stick with something that has been around and works.
COBOL is that language!
Ha! Made you smile :P
Seriously tho, use C or FORTRAN. Nothing beats them for optimization and speed.
Cheers!
it's easy to learn, it's fast, it's suitable for almost any task because there are many ready to use libraries out there. thus python. btw I learned python in three hours! see github.com/masikh for my work.
Not only is C# easy to learn, and easy to both read and write, it also runs at a fairly high speed when it is compiled. To make use of multiple CPU Cores, C# has a neat feature named PARALLEL.FOR. If your algorithm scans across a 2D Data Array using a FOR LOOP at all, Parallel.For will automatically break that array into smaller arrays, and have each calculated by a different CPU core, resulting in a much faster overall computation speed. I develop algorithms in C# and highly recommend it if you want a) a nice, readable code syntax and b) fast execution speed. I hope this helps...
If your data is arranged in lists, consider Lisp,
Oh please! It's not like Lisp doesn't have any other data structure, is it? You can have your multidimensional numerical arrays in CL quite easily. (I'm saying neither "use CL" nor "don't use CL", merely that your argument is pretty weak. It's easier to learn to work with lists in the language you already know (unless it's COBOL!) than to learn an entirely different one just because of lists.)
In order to realize all possible performance from your hardware, I would suggest linux over XP.
With Xeons going 64-bit around 2005, it would have to be really old to be only 32-bit.
And even if it was an ancient 32-bit-only Xeon, XP is still going to have issues using more than 3.5 GB of RAM.
XP process management seems weak to me compared to the linux side of things.
I don't have a favorite brand of linux to recommend; I would ask your professors and fellow researchers if they have a preference (because they are going to be your go-to support crew).
In any event, I would try to max out the ram your specific motherboard can handle.
And I would beg/buy/borrow/steal a modest SSD to run the OS on, you can probably get both for $100 or so.
Keep your data sets on the slower spinning-rust drives.
One especially insightful response I saw above was asking about what kind of computation you're running.
The python guys are probably right.
I suspect your problem with VB is that it will be single-threaded, and (I'm not a VB developer, I've just had to cope with it from time to time) not so generous with efficient data types.
I've had some awful experiences trying to run multi-threaded processes on XP and Java.
I think you'd get better results from ditching XP.
The actual language matters less than having some parallel capability.
Finally, the good news: almost anything is certain to be better than running VB in XP.
The fact that you could implement your solution in VB suggests that it is not crazy complex.
Doing it in raw C will be a pain because you'll have to code your own process management.
I'd be very interested in seeing if numpy or perhaps "R" can do the math that you need.
Do follow up and let us know what you end up doing.
It sounds like you have control of the whole machine, which makes you the sysadmin. You don't only get to choose the programming language. You have to design a workflow. The programming language will fall out of you designing your plan of attack. You have to do so within the limitation of your advisor's budget, the assistance you can beg, etc. Take comfort in the fact that procedural languages are deep down 98% the same with different words for things, it is the libraries that get confusing. And read the library documentation like your life depends on it. It does.
PHP because it's fast, fun and scalable!
*ducks for cover*
You don't specify the scientific field. My experience is from biology, and what I can recommend is Python (look at the numpy and BioPython modules) and R (www.cran.org), which is an excellent statistics and data mining tool (again on the biology side it has the bioconductor toolset). MySQL may also come in handy to store data, depending on the project. I find myself writing some pieces in R, some in Python, and using the Rpy2 Python module to glue them together. MySQL can also be accessed from both Python and R.
http://www.hardocp.com/article/2012/05/08/inside_mind_stuart/
I saw it years ago, when everyone was wondering if it was real, lol.
Reminds me, I need some more nyquil...
Perl Data Language
The power of Perl + the speed of C
I am a scientist who dabbles in data mining problems. I use Python with a healthy dose of C++ and the occasional Java. These are probably the three most common languages among the community. I see people using R and Matlab relatively frequently. A bunch of people in this topic have suggested Fortran, but I've never seen anyone use it seriously.
I haven't run into anyone who doesn't use a minimum of two programming languages (Python/C++, Matlab/Java, etc.).
Note that Kaggle.com (the data mining competition site) frequently posts their example solutions in Python. Failure to understand the Python solution starts you out at a healthy disadvantage.
Back when Men were Men....
We tend to use R, C, some C++, and a lot of Perl.
But then, we do real science.
Sounds like you are working on some sort of similarity search problem.
You'll probably find most of your peers are working with C/C++.
If that's the case I'd go for that language.
You are not going to write everything from scratch by yourself. You're just not. Not if you actually want to get anything done. You're going to reuse code.
So: figure out what code you're most likely to reuse, what frameworks are useful in the field you're interested in, and let that suggest the language.
If you don't know how to get started on that: asking the question of peers in the same scientific field will get you a more useful answer than asking the question on a wide-open generic technical forum.
Another angle: look at what network databases you want to integrate with (eg. protein databases at nih.gov), and look for sample code showing how to access 'em. That'll give you a clue what other practitioners are doing.
For scientific computing, you will be doing a lot of collaboration and very likely sharing codes with other scientific programmers, very few of whom enjoy learning new programming languages all the time. To simplify/enable collaboration, you should follow what the community uses. In physics, generally that means Fortran. Anything past Fortran90 is basically modern, it's really not too bad to learn and even has basic object-oriented stuff, though not as good as C++. F77 is mostly obsolete and a major pain in the neck, but you will see it around in older codes, as well as a lot of the libraries. There are C/C++/Python/f77/etc codes around, but most physicists use >F90, especially in high performance/parallel computational work. But there are subfields of physics with their own popular tools too. My advice is to go with whatever the majority of your colleagues are using, placing a very big premium on what your adviser and group members use, which is who you will collaborate with the most. What the majority in the field uses is usually suitable for the job anyway.
It sounds like you're interested in parallel computing as well. Fortran is probably the best option then, mostly for the libraries, but you can still interface from C/C++ or whatever. Also, if you have a lot of computationally intensive stuff, you should try to get supercomputer access. Ask around, you should be able to work something out. You'll need to decide on OpenMP or MPI for parallel programming, depending e.g. on your memory, shared/distributed etc. Here's a quick rundown: http://www.dartmouth.edu/~rc/classes/intro_mpi/parallel_prog_compare.html
Most scientific hpc (high performance computing/supercomputer/parallel) is on unix/linux.
What field are you in exactly, and what is the nature of your data mining?
Especially for a beginner Fortran will make the most sense, IMHO. Here's why:
- User-friendly syntax. Especially vector and matrix syntax and operations are very intuitive.
- Strong typing will let you catch lots of errors at compile-time rather than let you hit your head against the wall at run-time.
- Fast in quite a foolproof way (just remember to loop over columns first, or possibly even use the simpler Matlab-like syntax and let the compiler figure it out)
- Usable with OpenMP and MPI
- Massive availability of free code on the net (visit netlib). Old code also has very good chances to run out-of-the-box or with very minor changes.
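To make the "loop over columns first" point concrete: Fortran stores arrays column-major, so the elements of one column sit next to each other in memory. Here is a minimal Python sketch of that address arithmetic. The names `fortran_offset` and `visit_order` are made up for illustration, and indices are 0-based here, unlike Fortran's default of 1. It only shows which loop order touches memory consecutively; it says nothing about actual timings.

```python
def fortran_offset(i, j, nrows):
    """Linear memory offset of element (i, j) in a column-major array (0-based)."""
    return i + j * nrows


def visit_order(nrows, ncols, column_outer=True):
    """Return the sequence of memory offsets touched by a double loop.

    column_outer=True  -> outer loop over columns j, inner loop over rows i
    column_outer=False -> outer loop over rows i, inner loop over columns j
    """
    offsets = []
    if column_outer:
        for j in range(ncols):
            for i in range(nrows):
                offsets.append(fortran_offset(i, j, nrows))
    else:
        for i in range(nrows):
            for j in range(ncols):
                offsets.append(fortran_offset(i, j, nrows))
    return offsets


# Column-outer order walks memory with stride 1; row-outer order jumps by nrows.
print(visit_order(3, 2, column_outer=True))
print(visit_order(3, 2, column_outer=False))
```

With the column loop outermost, the inner loop steps through adjacent offsets, which is the cache-friendly pattern for column-major data.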
C is a close second to many of these points. I can't recommend C++ and Java though, as all the clutter will slow you down especially at the beginning. I also like Python a lot, but there's a catch, which brings me to my next point:
Do you have a plan of your program? How well do you know what you'll be programming? Is that going to change a lot along the way? Code changes are a lot harder to implement in Fortran and C than in Python. The abstraction level in Python really amazed me, as the interpreter would run anything I would throw at it. Mixing paradigms in Python makes scientific programming a breeze of fresh air. There are ways to make it quite fast, too! Mixing procedural and object-oriented programming is possible in Fortran as well, but by far not as versatile as in Python. In any case, if you decide to use the OO-paradigm, you have to make sure you have the whole program figured out before you start so that you can define your objects wisely. I have mixed experience with scientific OO in C++, so I can't really recommend it.
Output is also a huge topic in scientific computing. It is a pain to make live graphs in Fortran (the Intel compiler has a proprietary library that somewhat helps), but you can export the raw data and use gnuplot or another tool for visualization (even Excel for small graphs). For larger 3D+time datasets there is ParaView. These things are much more fun to do in C/C++. On the other hand, you can use the C-interoperability features of Fortran 2003 and combine them! Or use f2py or PyFort and combine Fortran and Python!
Things have moved on. Fortran 95+ is almost as easy as Matlab, definitely easier than C++ and faster than both.
And for heavily numerical algorithms it is better designed, beyond just the speed.
If you want to do programming as a career, you need to be flexible enough to be able to pick up any language, so use whatever language you feel comfortable enough using to write maintainable code.
LaTeX is a great thing to learn, but it is most emphatically NOT a remotely reasonable choice for writing number crunching code...
SIGSEGV caught, terminating
wait... not that kind of sig.
The problem with this question is that "scientific computing" is an over-broad term. The truth is that certain languages have found specific niches in different aspects of scientific computing. Bioinformatics, for example, tends to involve R, Python, Java, and Perl (the prominence of each depends largely on the application). Big-data analytics typically involves Java or languages built on Java (Scala, Groovy). Real-time data processing is generally done in Matlab. Pharmacokinetics, some physics, and some computational chemistry are often done in FORTRAN. Instrumentation is generally controlled using C, C++, or VB.NET. Visualization is done in R, D3 (JavaScript), or Matlab. Validated clinical biostatistics are all done in SAS (!).
Python is a nice, simple-to-learn start and very powerful, and the NumPy package is important to learn for scientific computing. R is the language of choice for many types of statistical and numerical analysis. Those are a good place to start, if incomplete. From there, I'd look at the specific fields of interest and at what the common applications and code bases are for those.
With regard to the OS, that's pretty easy: Linux (though OS X is a reasonable substitute). Nearly all scientific computing is done in a UNIX-like environment.
seconded. If possible this is a nice route.
Julia
Religious speak to God. Insane are spoken to by God. When all shut up, one can finally hear Shostakovich in peace
In part, this is because Intel has a compiler for it. On commodity hardware (as in desktop, laptop), you will generally get the best performance running an Intel CPU and using an Intel compiler. That means C/C++ or FORTRAN, as they are the only languages for which Intel makes compilers. C++ is easy to see, since so much is written in it but why would they make a FORTRAN compiler? Because as you say, serious science research uses it.
When you want fast numerical computation on a desktop, FORTRAN is a good choice. We have a few researchers here who use it, and they all use the Intel Fortran Compiler because they want fast computation, but they don't have the money to buy bigass systems for every grad student. What they get out of the IFC and a regular Intel desktop chip is pretty impressive.
Compilers matter, and Intel makes some damn good ones. So if your research calls for lots of performance on little budget, that can influence language choices. Heck, same thing on supercomputers. That is not my area of expertise, but it isn't as though all compilers for a given supercomputer will be equally good. If I were to bet, I'd say the FORTRAN compilers are some of the better ones.
The best compiler support for numbers will commonly be Fortran.
Python belongs on the list because slow functions can be coded in C or another native language for speed. It is also a rich and portable prototyping language.
There is value in asking your advisor.
A Linux distro like CentOS is well regarded, and almost any programming language can be downloaded for it. Switching to Red Hat for product support has a small learning curve.
R is a statistically rich environment that you should be aware of. Python bindings for R exist, so again: Python.
SUMMARY: Python and R. R may be all you need.... R makes charts and graphs, slices, dices.... and runs on many platforms, even Windows.
Truth is stranger than fiction, but it is because Fiction is obliged to stick to possibilities; Truth isn't. Mark Twain.
Python Python Python Python Python It already does almost everything you could ask for, and is growing in acceptance and userbase. It is a modern language with modern language structure. It is designed to be human readable and consistent.
Your best bet is to use C. It is highly efficient. If possible use computational code like the Atlas BLAS package. This code will run circles around your own code no matter what language you use. You already know C and moving to C++ is a major problem. All the other languages are distractions from your purpose.
If possible run multiple, independent processes rather than writing parallel code. That can be a major ordeal.
If your goal is to process data as opposed to learning elaborate programming techniques, keep simplicity in mind. C is a very powerful language and you can reach maximal efficiency for many problems using Atlas BLAS and multiple processes. If your goal is to get a degree in CS, ignore what I've suggested.
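The "multiple, independent processes" idea is language-neutral. As a rough illustration only, here it is in Python rather than C, using the standard `multiprocessing` module. The names `split_range`, `process_chunk`, and `run_in_processes` are invented for this sketch, and the sum of squares is a stand-in for whatever the real per-chunk computation is. It assumes the chunks do not need to talk to each other.

```python
from multiprocessing import Pool


def split_range(n_items, n_chunks):
    """Split range(n_items) into n_chunks contiguous (start, stop) pairs of near-equal size."""
    base, extra = divmod(n_items, n_chunks)
    bounds, start = [], 0
    for k in range(n_chunks):
        stop = start + base + (1 if k < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def process_chunk(bounds):
    """Hypothetical stand-in for the real work: sum of squares over one chunk."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def run_in_processes(n_items, n_workers=4):
    """Hand each chunk to its own worker process and combine the partial results."""
    with Pool(processes=n_workers) as pool:
        partials = pool.map(process_chunk, split_range(n_items, n_workers))
    return sum(partials)
```

On a four-core box you would call something like `run_in_processes(1_000_000, n_workers=4)` from under an `if __name__ == "__main__":` guard, which `multiprocessing` needs on platforms that spawn rather than fork.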
Ray Seyfarth, ray.seyfarth@gmail.com, http://rayseyfarth.blogspot.com
If you are working in academia, then you probably have access to Matlab.
On the other hand, you definitely have access to SciPy, given that it's free.
I predict that Python with SciPy/NumPy will completely displace Matlab within a few years.
I say that even though I am working in one industry, digital signal processing, that is really married to Matlab and will be one of the last places to make the switch.
Because Matlab was purpose-built for scripting with matrices, it has some nice syntactic sugar for that. In every other way, Python as a language is far superior.
I was able to attend the SciPy conference a couple of years ago, and one thing I heard there: people like that Python works as a universal language. Sysadmins can use Python to do admin tasks; the web site guys can use Python (with Django) to make web sites; the science guys can use SciPy... it's one language that is flexible enough to do anything you might need, and it's much easier to learn than other really flexible languages like Lisp.
Because Matlab has been around a long time and has man-centuries of work invested in it, it has very complete and well-debugged libraries available for it. SciPy is playing catch-up here. But the basics are already solid, and if SciPy will work for you, you should choose it because it is the future.
There was a time, not that long ago, when people spent $30 to get a web browser. Now people expect web browsers to be free. I predict in the near future the same thing will happen with Matlab vs. SciPy.
SciPy has the advantages of being free and open, as well as the advantage of being free as in beer. And Python is just a better language than the Matlab language. Mark my words: Matlab will fall and Python/SciPy will rise.
lf(1): it's like ls(1) but sorts filenames by extension, tersely
Python + PyQt or PyGTK is a similar, and probably easier, answer.
#1 reduced the field of choices (IMO) to: Matlab/Octave, R/S+, SAS, Perl, Python, Julia.
#2 gives preference to Python, R, Julia, Perl, or Octave (your situation may not be as limiting).
#3 led me to many searches that all indicated that R and Python have a rich set of libraries and lots of community support.
As for #4, the benchmark information on Julia's website (http://julialang.org/) indicates that Python is pretty quick.
My conclusion was that I couldn't really go wrong between R or Python. However, I chose Python because it was quicker, I like the syntax better, I like the libraries better (NumPy, SciPy, Pandas, Matplotlib) and it seems to play nicer with everything else. This is what worked for me and how I went about deciding.
I put the 'Physics' in 'Physical Attraction'
Truth is stranger than fiction. Even as a Perl lover, I would never have thought of using it w/ number crunching. Still doubt it would be my first choice.
In addition to the excellent comments previously made, consider investigating the Center for Open Science, specifically their information for developers, and the associated Open Science Framework (note: will display only if cookies are enabled; I've no idea what value they provide in this context and will be contacting them about that).
They may not have anything that can help you. Or they might. Or you might be able to help them. Or not. YMMV, etc.
Worth taking a peek, anyway.
I'm here EdgeKeep Inc.
The best environment is one that already has the stuff for your application, so that you just cobble together calls to code written by somebody else. Perl + CPAN was the winning combination for many years. These days it seems to be Python+numpy/scipy/scikits.
I would say it largely depends on what people in your field use. I use Matlab on a desktop for data analysis and Fortran/Python for HPC number crunching (astronomy/planetary science). Recent releases of Matlab have seen heavy optimization in number crunching and the parallel processing toolbox is incredibly simple to use. The plotting and graphing tools are second to none and very intuitive if you want to visualize multi-dimensional datasets. For integration of visualization, editing and debugging in one scientifically-oriented IDE, it can't be beat. Plus it sounds like you're familiar with GNU Octave. Python is a better language in my opinion, but lacks some of the 'do-science-straight-out-of-the-box' feel that Matlab is good at. Python obviously has the advantage of being free. The best scientific package is the Enthought Python Distribution which integrates their Canopy IDE with numpy, matplotlib and other great python modules. Free licenses are available to student/academic users.
Where I'm coming from: I'm a satellite physicist working as a contractor for the USGS on the Landsat program. I work very closely with NASA.
Almost all the scientific programming we do -- and by 'we' I mean USGS and NASA -- is either in IDL/ENVI or Matlab. They're the de facto standards for scientific processing. We do need to know SQLPlus to get our data out of the databases, and we need rudimentary C++ skills in order to make prototype code for the IT coders to turn into an operational release. Sometimes it's easier to code something in C++ than IDL or Matlab, so it's nice to be able to jump straight to that when warranted. Add Perl for text manipulation (which always turns out to be useful in some way) and that's all the programming I've done for the past ten years. Many scientists in the building swap out ArcGIS or ERDAS for IDL/ENVI. (Matlab doesn't seem to be swappable; you either need to use it or you never touch the stuff.)
I've dabbled in PHP when they asked me to prototype a web site but that never went far. I've done a little Flash programming that they eventually decided to hire out for. (I did a fine job, but they wanted the application to go bigger.) In the early days of my career FORTRAN was everywhere; you couldn't get away from it. There are still some FORTRAN programs in-house that I could fiddle with if they asked me to, although I'd blanch at the prospect.
All that said, what you need depends on what your role is. If you're a scientist like me then these self-taught languages might be enough. If you're a science-oriented IT person, you'll need more -- most importantly strong C++ skills, at least around here. And different disciplines will have different needs; I worked briefly for NIH (National Institutes of Health) and they still had COBOL programs.
I know of one person in two organizations (USGS and NASA) who knows Python, and he's an IT guy not a scientist. He's also the only person I know who has ever used Hadoop. I have never met anyone who knew R. Visual Basic is used occasionally here and there for prototyping, and almost immediately switched out with C++ as soon as management decides to support the project.
Genocide Man -- Life is funny. Death is funnier. Mass murder can be hilarious.
I suspect that VB is NOT your problem here. But, if you have a VB program that is too slow, then I'm going to suggest you do the following:
1. Profile your program and see if you can figure out what's taking up all the processing time. It may be possible to change the program you already have slightly and get the performance you need. It would be a shame to go through all the trouble to learn a new language and recode the whole thing if replacing some portion of your code will fix it. Do you have a geometric solution implemented when a non-geometric solution exists?
2. Consider adding hardware - It's almost ALWAYS cheaper to throw hardware at it than to re-implement something in a language you are learning.
3. Rewrite your program in VB - This time, looking for ways to make it perform faster (you did profile it right? You know what is taking all the time right?) Can you multi-thread it, or adjust your data structures to something more efficient?
4. Throw hardware at it - I cannot stress this enough, it's almost ALWAYS easier to throw hardware at it, unless you really have a problem with geometric increases in required processing and you are just trying to run bigger data sets..
5. If 1-4 don't fix it, then I'm guessing you are in serious trouble. If you really do not have a geometric problem, you *MIGHT* be able to learn C/C++ well enough to get an acceptable result if you re-implement your program. C/C++ will run circles around VB when properly implemented, but it can be a challenge to use C/C++ if your data structures are complex.
6. Throw hardware at it - seriously.
Unless you really just have a poorly written VB program or you are really doing some geometric algorithm with larger data sets (In which case, you are going to be stuck waiting no matter what you do) getting better hardware may be your only viable option. I would NOT recommend trying to pick up some new language over VB just for performance improvement unless it is simply your only option. If you do decide to switch, use C/C++ but I would consider that a very high risk approach and the very last resort.
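For what step 1 looks like in practice, here is a rough sketch in Python (not VB, which has its own tooling) using the standard `cProfile` and `pstats` modules. The functions `cheap_setup`, `slow_pairwise`, `analysis`, and `profile_report` are all made up for the example; the quadratic pair loop is just a deliberately slow hot spot so the report has something to point at.

```python
import cProfile
import io
import pstats


def cheap_setup(n):
    """Linear-time setup step."""
    return list(range(n))


def slow_pairwise(values):
    """Deliberately quadratic: compares every pair of values."""
    count = 0
    for a in values:
        for b in values:
            if a < b:
                count += 1
    return count


def analysis(n=300):
    """Toy 'program': a cheap step followed by an expensive one."""
    values = cheap_setup(n)
    return slow_pairwise(values)


def profile_report(func, *args):
    """Run func under cProfile; return (result, report text sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()
```

Calling `profile_report(analysis, 300)` and printing the second element should show `slow_pairwise` accounting for nearly all the time, which is the cue to fix that one function before touching anything else.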
"File to fit, pound to insert, paint to match" - Aircraft Maintenance 101
You know C. C is simple, as fast as any alternative, it's straightforward to optimize (aside from pointer abuse), and you always know what the compiler/runtime is doing. And threading libraries like pthreads or CUDA are best served via C/C++. Why use anything else?
Another thought: scientific libraries. If you need external services/algorithms then your chosen language should support the libraries you need. C/C++ are well served by many fast machine learning libs such as FANN, LIBSVM, OpenCV, not to mention CBLAS, LINPACK, etc.
... that is all
I work in gambling and we write everything in c++. It's as fast as anything else, and the new standard makes everything a lot easier. Threads especially. You also have tons of libraries at your fingertips: GSL, good RNGs, whatever you can think of. I think C is probably the worst thing to write in if you're just coming back to it. You'll end up spending more time with memory management than you will actually getting stuff done. Go with C++ and use standard containers.
My experience at this comes from being a MOOC addict where some of the courses are in Python, others in R, and others in Matlab or its GNU counterpart Octave.
Of these Python is my favorite since it's the language I'm most familiar with. Furthermore, you can "bolt" R to Python with the Pandas library, and you can "bolt" Matlab/Octave with the NumPy & SciPy libraries.
A big drawback, however, is speed. The big advantage of domain specific mini-languages over "kitchen sink" languages was brought home to me by writing a Python script to simulate the popular (in statistics courses) Monty Hall problem and the same script in R. While my Python script took several seconds to simulate a couple of thousand Monty Hall game turns, the R script would give the percentage for millions the instant I hit the enter key.
More complicated problems ended up with weird bugs in R scripts I couldn't figure out, whereas (because of my better familiarity with Python's "mutable list" problems) I tended to get correct -- albeit slower -- answers from my Python programs.
Re Octave: whereas R has overtaken commercial versions of S, I've written off Octave as a lame "freeware" version of Matlab -- lots of features are missing, the documentation is frustrating (it seems to only be used by universities, so "gurus" on stackoverflow etc automatically assume any question is some student trying to cheat at homework) so I'm not a fan. But if I knew Octave as well as Python, I might like it.
R, on the other hand, has an obvious speed advantage for the problems it's aimed at, and probably a better selection of specialist libraries for statistical problems. But it's full of strange quirks for non-specialists.
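For the Monty Hall comparison above, much of the gap probably comes from looping game by game in the interpreter rather than from Python as such. A hedged sketch of the vectorized alternative with NumPy follows. The function name `monty_hall_win_rates` is invented here, and the shortcut in the comments (switching wins exactly when the first pick was wrong) assumes the standard rules where the host always opens a goat door.

```python
import numpy as np


def monty_hall_win_rates(n_games, seed=0):
    """Simulate n_games at once; return (stay win rate, switch win rate)."""
    rng = np.random.default_rng(seed)
    car = rng.integers(0, 3, size=n_games)    # door hiding the car
    pick = rng.integers(0, 3, size=n_games)   # contestant's first pick
    stay_wins = pick == car
    # The host always reveals a goat, so switching wins exactly when
    # the first pick was wrong.
    switch_wins = ~stay_wins
    return stay_wins.mean(), switch_wins.mean()


stay, switch = monty_hall_win_rates(1_000_000)
print(f"stay: {stay:.3f}  switch: {switch:.3f}")
```

The whole simulation is a handful of array operations, so the per-game work happens in compiled code, which is the same trick R is using under the hood.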
If it works, it's obsolete
I have worked for almost a decade in scientific computing, and it is Fortran everywhere. Make sure you get up to the new standard. Contemporary Fortran is not the same as Fortran77. Many problems typically associated with Fortran are things of the past. ;)
Next is C.
Moreover, Fortran is a fairly easy language to learn.
Avoid all object oriented stuff. For scientific computing, this is never used, and even shunned to a great degree. Avoid C++ and C# and all that stuff. When you work in SciCom, you will never see that anyway.
Best is meaningless without a measure of goodness. (from Optimization) You are going to get a slew of candidate bests but folks aren't often going to articulate what makes it best. There will be conflicting or even mutually exclusive rubrics.
The goal of the language might include:
- inexpensive (starving college student budget)
- employable (typically used and valued in your post degree career)
- fast enough (not every grad student needs to run on a supercomputer to get their job done)
- great breadth and depth of libraries
IMO the "R" language does some of these really well.
- It imports into JMP, SAS, and Python so you can wrapper it for your job.
- It is engineered and maintained by stats/math grad students so it is wide, deep, and mostly correct
- It is open source so it is free
Personally I use MatLab, which was taught in school and it hurts for the following reason:
- where I work is JMP-dominant, so it is pulling teeth to stay in the $5k/yr CAL.
- nobody else here speaks the language (statistically speaking) so I have to do extensive hand-holding to share the code
- If I am not connected by VPN to the work CAL server, I can't turn on my software
As long as I am not doing CFD I find the interpreted language is good enough. Computers today are much better than the supercomputers of 15 years ago. We have smartphones with better CPUs than a bleeding-edge Pentium II of yesteryear.
I particularly like RStudio as an IDE.
http://www.rstudio.com/
I was thinking of what could give the op a performance boost while staying on a ramen budget.
*shrug* without knowing more it is really hard to say.
And I would beg/buy/borrow/steal a modest SSD to run the OS on, you can probably get both for $100 or so. Keep your data sets on the slower spinning-rust drives.
If he's going to keep the data sets on the spindles then I see no reason at all to invest in a SSD. All calculation takes place in ram, it is loaded and written to spindles... Yeah the computer will boot in 15 seconds instead of 75, but how often is this thing going to be rebooted?
Leave programming to the programmers. If you want to get science done, use LabView.
When searching and sorting DNA sequences, it's just [ACGT]* in a text file. Why wouldn't Perl be ideal?
by Mike Buddha -- Someday the mountain might get him, but the law never will.
I think of that as text processing rather than number crunching.
If you need to be very flexible, which is typically when you are doing research from scratch -- devising/changing your algorithms often, visualizing the data, etc., I'd suggest MATLAB. It allows you to program and evaluate stuff very quickly. If you are able to vectorize the problem you are solving, it is also very fast, since it uses highly optimized vector/matrix handling libraries.
Once you know what you want to do and how, you might want to implement your stuff in other languages, as MATLAB is cumbersome, if you for example need to process text or perform networking ... or actually do anything that cannot be vectorized. In such a case one choice would be Python, that has lots of libraries for everything.
As for C/C++ (or even Fortran :-o), I would avoid these unless you need to address a bottleneck that cannot be solved by use of an optimized library. And even in such case, I would only rewrite the bottleneck in it, nothing more, and interface with higher-level languages. Programming in C/C++ is literally a minefield for beginners. Updating/refactoring your code in C/C++ takes much more time than in higher programming languages, as you need to take care of many issues related to low-level programming (compared to Python or Java, even C++ is a low level language). Actually I'm surprised that so many people recommend it.
You did a fair amount of C? Just refresh your knowledge of C, and you'll be back to business in a few days.
Gravitation is a theory, not a fact.
Why not? It's turing complete!
>> 6. Throw hardware at it - seriously.
This must be the lamest piece of advice on the whole page. If we were having this discussion back in 1990, then yes, "throwing hardware at it" could be the way to go. But now PCs are thankfully in the 64-bit era so programs are allowed to allocate a lot of memory (of which there is always plenty in a modern system), SSDs offer fast I/O, your standard CPU has 4 cores or more and the GHz race has stagnated. And this is a very typical PC that you can get for less than 1000 bucks. So what is he going to do to "throw more hardware at it"? Get more memory? This is not going to help unless his program is paging. Should he get more cores or distribute the computation over a network? And he is going to do this with VB how, exactly? Oh, and he never mentioned VB, he mentioned VBA, which is interpreted. By Excel. Yikes.
Using the right algorithm is correct, but it's not going to help much in this case. He'll get a moderate speedup whereas picking the right language will reduce the run-time by orders of magnitude.
I would love to see it done though ;-)
It should also be noted that as of C++11 threading is part of the C++ standard library (so you usually won't have to use pthreads or any other platform-specific threads directly).
Don't be silly, you can write FORTRAN in any language..
Organization? You must be joking..
If it's really heavily compute-intensive, I would say use Fortran; it's a relatively simple language to pick up compared to C++ or C.
For Java, I use System.nanoTime(). For C++, I use the Windows-specific QueryPerformanceCounter() call. So
The technique profilers claim to use is to calibrate the overhead of something like System.nanoTime() with a loop and then subtract the estimated overhead from your instrumented code. The overhead to the nanoTime() calls can even be larger than the execution time of code segments you are trying to measure, but if your estimate is accurate, this works.
I am doing plain add-subtract-multiply-divide in a mix of scalar and looped operations on arrays -- if you are doing largely trig or calls into a numeric library, you are timing loops and library calls, not the intrinsic performance of your language.
I am also doing a lot of the OO version of using global variables. Java is supposed to do "escape analysis" where if you allocate inside a method and don't let a reference pass outside, the JIT is supposed to recognize that as a local-context stack allocation, but I am not using an advanced Java version or the right set of JVM flags to get that to work.
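The calibrate-and-subtract technique described above can be sketched in a few lines. This version is in Python with `time.perf_counter_ns()` standing in for `System.nanoTime()`; the helper names `timer_overhead_ns` and `timed_ns` are made up, and the overhead figure is only an estimate (it includes a little loop overhead too), so treat very small corrected timings with suspicion.

```python
import time


def timer_overhead_ns(samples=100_000):
    """Estimate the cost of one perf_counter_ns() call from back-to-back calls.

    The figure also includes a little loop overhead, so it is an upper-ish estimate.
    """
    start = time.perf_counter_ns()
    for _ in range(samples):
        time.perf_counter_ns()
    elapsed = time.perf_counter_ns() - start
    return elapsed / samples


def timed_ns(func, overhead_ns, repeats=1000):
    """Average per-call time of func in ns, minus the estimated timer overhead."""
    total = 0
    for _ in range(repeats):
        t0 = time.perf_counter_ns()
        func()
        t1 = time.perf_counter_ns()
        total += t1 - t0
    # Clamp at zero: for tiny code segments the estimate can exceed the measurement.
    return max(total / repeats - overhead_ns, 0.0)


overhead = timer_overhead_ns()
print(f"timer overhead ~{overhead:.0f} ns")
print(f"sum(range(1000)) ~{timed_ns(lambda: sum(range(1000)), overhead):.0f} ns")
```

The clamp is there because, as noted above, the timer overhead can be larger than the segment being measured; the subtraction only works as well as the overhead estimate does.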
1. get back in to Fortran - especially if you'll be working with other researchers on existing code.
2. python, with the scipy and numpy libraries.
Python is being suggested frequently here. If you do go with that, I'd strongly suggest taking a look at the Spyder IDE:
http://code.google.com/p/spyderlib/
It's especially useful for scientific work and entirely cross-platform. I even have it running under FreeBSD.
But, my point is that if you have a properly written program, even in VB, the performance gain from recoding into C/C++ is not going to be all that great considering the effort involved. If time is money (and it usually is) then throwing hardware at the problem is a cost effective solution that has been used for decades to get less than efficient solutions to market.
Please note the ORDER of what I suggest. Always fully evaluate your program's performance weakness and KNOW what is causing the bulk of the problem in your system. This is ALWAYS first. KNOW how your solution scales and why your performance is what it is. I'm just guessing here, but I'll be willing to BET that the issue is not his choice of tools (visual basic) but either how it was coded and/or the nature of the problem. VB is not the fastest solution for data processing out there, but it's not a total disaster in performance either.
If you are seriously suggesting that VB is orders of magnitude slower than C++, I'm going to object. (And I'm an old C programmer with decades of experience who hates VB.) If you use VB properly, it's not great, but it's not a total dog either. You should get *some* improvement but not 10X better. Further, I'm going to claim that a novice C++ programmer is extremely unlikely to be able to punch out performant C++ code that has any kind of data structures to process. So the situation is we have some performance improvements possible, but we also have a novice programmer.
Both of these issues tell me that the least risky way to take a reasonably well written VB program and improve its performance is to throw HARDWARE at it. Throwing programming resources who don't know C++ at it to convert it is a way to spend a lot of money/time and get nothing to show for it.
I hate throwing hardware at problems too. I've seen it done many times and it seems a waste. But I've also seen projects flounder because they were hesitant to re-spin the processor card and add that extra memory or faster processor where we spent many hours wringing out a few more bytes here and making that interrupt routine a few cycles shorter there. Of course it was really expensive to change custom hardware, so sometimes you just have to make it fit. Off the shelf hardware is CHEAP, and often it makes the most sense when you consider how much programming effort costs. I could be wrong, but in this case, I'd recommend trying to fix the VB code first, then throw hardware at it.
"File to fit, pound to insert, paint to match" - Aircraft Maintenance 101
Take a look at Julia. http://julialang.org/
It is almost as fast as C, which makes it much faster than Matlab, Octave, R and Python.
I'm using Julia for most of my work. It's very usable. There are good libraries for most common tasks now, and for anything that there isn't, Python functions can be called (with automatic conversion to NumPy arrays, etc) with no wrapper, using a package called PyCall. This is a feature that Julia developers should really make more noise about.
For example, suppose I want to do a simple plot of two vectors x and y:
________________
using PyCall #import the module
@pyimport matplotlib.pyplot as plt #import the Python module itself
plt.plot(x,y)
plt.show() #open the plot window
-------------
That's it. A window with the plot shows up, exactly as it would in python.
Apart from that, Julia has many innovative features. A good type system and multiple dispatch make for a very elegant and fast language. As well, like Lisp, code is data like any other, allowing for useful macros (the @pyimport is actually a macro call). I used to use Matlab, and then Python/NumPy, but I really haven't looked back after moving to Julia.
If you don't understand any of my sayings, come to me in private and I shall take you in my German mouth.
Reasonable performance - better than Python, Perl, PHP, not much worse than C/C++ or Fortran :-)
Object oriented, readable and easy to learn quickly.
Modern day language
Widely understood in Educational field.
Can test your code on your Android phone
Donte Alistair Anderson Roberts - hi son!
Karma: Chameleon
He said he was using VBA, which is not fully compiled code and is probably a set of Excel macros. VBA in Excel isn't particularly fast and is single threaded. If, however, he moves his code to one of the compiled versions of VB then he will see a performance boost and be able to spawn multiple threads.
I do scientific programming for a living (Fusion Scientist) and have extensively used a lot of different languages in my research including:
Python
IDL
MATLAB
FORTRAN
C/C++
Ruby
When working on my own project my favorite setup is to use Python along with Scipy/Numpy. When I need extra speed I use Cython, and also use Cython to interface with libraries that are written in C or C++. For interaction with my codes I use ipython (assuming I need command line interaction) or QT (assuming I need GUI interaction). For libraries written in FORTRAN I use f2py. For plotting I use matplotlib.
This setup works very well for me. It is fast and powerful, almost completely platform independent, and has excellent mathematical and scientific library support. It is extremely easy to integrate C/C++ or FORTRAN code into a Python project, which can be extremely useful. It is also very straightforward to do basic multithreading and parallelization. Interactive debugging is very easy and can really help both in development and in finding problems with scientific calculations. Plotting support is fantastic and easy.
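As a rough sketch of the basic parallelization mentioned above, assuming a four-core machine like the one in the question: split the data into chunks and hand each chunk to a worker from the standard library's concurrent.futures. The names score_chunk, split and parallel_total are made up for illustration, and the sum of squares is only a stand-in for a real calculation.

```python
# Hypothetical example: score_chunk, split and parallel_total are
# illustrative names, not something taken from the thread.
from concurrent.futures import ProcessPoolExecutor


def score_chunk(chunk):
    """Stand-in for an expensive per-chunk calculation (here: sum of squares)."""
    return sum(x * x for x in chunk)


def split(data, parts):
    """Split data into `parts` nearly equal contiguous chunks."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_total(data, workers=4, executor_cls=ProcessPoolExecutor):
    """Score each chunk in its own worker and combine the partial results."""
    with executor_cls(max_workers=workers) as pool:
        return sum(pool.map(score_chunk, split(data, workers)))


if __name__ == "__main__":
    # The guard matters on platforms that start workers by re-importing
    # this file (Windows, for example).
    data = list(range(1_000_000))
    print(parallel_total(data))
```

Processes are the default here because, in standard CPython, threads do not run pure-Python number crunching on several cores at once. Passing executor_cls=ThreadPoolExecutor is still reasonable when the heavy lifting happens in I/O or in compiled code that releases the interpreter lock.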
I would say that the next best option is MATLAB. This has good support, an excellent mathematical and scientific library and good plotting tools. I don't particularly like the language for large and complicated projects, and it does require a license, which can make it difficult or impossible to share codes between institutions.
Working directly in C/C++ or FORTRAN is fine for certain kinds of large projects, but is inconvenient for working on lots of small projects or numerous related calculations. Doing something simple like creating a plot requires a significant amount of programming, and debugging can be very time consuming.
I would stay away from IDL; while a nice language in many respects, it is quite out of date at this time and is no longer well supported in terms of staying current. Ruby does not have sufficient support in terms of math/science/plotting libraries at this point to work well for scientific programming.
In the end, though, it really does matter what kinds of projects you will be working on and what your final goals are. It also matters who else you will be working with, to make sure that code can be easily shared.
Read the summary and my last comment again, carefully, and the comment by user "confused one" right below. He is using VBA not VB. Visual Basic for Applications. Not the same thing. And yes, a re-write from VBA in any compiled language will get him at least a speedup of one order of magnitude, maybe two. Add to this your advice on algorithms and he won't know what hit him.
It's not that I don't like updating the hardware because of cost or whatever, it's that I can't add more hardware any more. For a single-threaded application in VBA there is nothing more you can do hardware-wise to make it faster. Again, algorithmic optimization is to the point but "more hardware" isn't. Even massively parallel applications hit a wall at some point where the data distribution and communication costs start to outweigh the speedup you get from the extra processors. BTW, I found out that VB can also be parallelized natively, but the re-write alone from VBA to VB will do the trick.
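To make the "hit a wall" point concrete, here is a toy model. It assumes the parallel part of the job divides evenly across workers and that communication overhead grows linearly with each extra worker. Both assumptions and all of the numbers are invented for illustration; real workloads need to be measured.

```python
# Toy model only: the linear communication term and every number below are
# assumptions chosen for illustration, not measurements.

def run_time(workers, serial_time, parallel_time, comm_cost):
    """Estimated wall time: serial part + divided parallel part + overhead."""
    return serial_time + parallel_time / workers + comm_cost * (workers - 1)


def best_worker_count(max_workers, serial_time, parallel_time, comm_cost):
    """Return the worker count in 1..max_workers with the lowest estimated time."""
    return min(
        range(1, max_workers + 1),
        key=lambda p: run_time(p, serial_time, parallel_time, comm_cost),
    )


if __name__ == "__main__":
    baseline = run_time(1, 10.0, 90.0, 0.5)
    for p in (1, 2, 4, 8, 16, 32, 64):
        t = run_time(p, 10.0, 90.0, 0.5)
        print(f"{p:3d} workers: {t:7.2f} s  speedup {baseline / t:5.2f}x")
    print("best worker count:", best_worker_count(64, 10.0, 90.0, 0.5))
```

In this model the estimated time falls at first, bottoms out, and then climbs again as the overhead term takes over, which is the wall described above.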
You know C. C is simple, as fast as any alternative, it's straightforward to optimize (aside from pointer abuse), and you always know what the compiler/runtime is doing. And threading libraries like pthreads or CUDA are best served via C/C++. Why use anything else?
This is just nonsense, and to see it constantly repeated and modded up is just sad.
C is only simple in the same way a written alphabet with only two letters is simple: sure, you only have to remember the letters A and B (simple!), but actually using it is not simple.
For crying out loud, in C, you can't even do A = B + C; without having a very good chance of invoking undefined behavior. Why? Because in C, overflow or underflow on signed values has undefined behavior!
Access beyond the end of an array and damage data elsewhere in the system (making it often really hard to find)? No problem!
Laboriously managing your own memory (and probably leaking it)? No problem!
What, real strings? Heck no, real men like to take the risk of overflowing the strings and their buffers!
C is filled with literally hundreds of mine fields just waiting to trap the unwary, and often forces you to write a lot of code that would only be a few lines in a higher level language.
C is not simple to use.
JavaScript is the language of the UI of the future: the browser. ... lacking libraries BUT it can call C/C++ routines, or the latter can be converted to JS/Node.js using Emscripten/LLVM.
JavaScript is fast enough: within about 2x of C/C++ (on par or faster for some tasks).
JavaScript is
Speed only makes sense in real-time apps, say day-trading or control systems. If you are crunching numbers, what difference does an hour or a day or a week make? Seriously, if you are sitting around idle while your numbers are crunching, you are a waste of space. You should be:
planning the next experiment,
reviewing/refactoring your code for correctness and efficiency,
writing the paper in which your results will be published (esp. intro, background, experimental set-up/procedure),
setting up the website on which you will publish the pre-print so others can review your work and maybe prevent you from publishing foolishness,
fleshing out your next steps to follow your results,
thinking of your next great hypothesis.
Screw speed, it will come with faster processors. Code, make your code correct, and make it accessible and shareable. JavaScript/Node.js will do that in spades.
"Consensus" in science is _always_ a political construct.