Julia Language Seeks To Be the C For Numerical Computing
concealment writes in with an interview with a creator of the (fairly) new language Julia designed for number crunching. Quoting Infoworld: "InfoWorld: When you say technical computing, to what type of applications are you specifically referring? Karpinski: It's a broad category, but it's pretty much anything that involves a lot of number-crunching. In my own background, I've done a lot of linear algebra but a fair amount of statistics as well. The tool of choice for linear algebra tends to be Matlab. The tool of choice for statistics tends to be R, and I've used both of those a great deal. But they're not really interchangeable. If you want to do statistics in Matlab, it's frustrating. If you want to do linear algebra in R, it's frustrating. InfoWorld: So you developed Julia with the intent to make it easier to build technical applications? Karpinski: Yes. The idea is that it should be extremely high productivity. To that end, it's a dynamic language, so it's relatively easy to program, and it's got a very simple programming model. But it has extremely high performance, which cuts out [the need for] a third language [C], which is often [used] to get performance in any of these other languages. I should also mention NumPy, which is a contender for these areas. For Matlab, R, and NumPy, for all of these options, you need to at some point drop down into C to get performance. One of our goals explicitly is to have sufficiently good performance in Julia that you'd never have to drop down into C."
The language implementation is licensed under the GPL. Lambda the Ultimate has a bit of commentary on the language, and an R programmer gives his two cents on the language.
You mean, ignored by almost every developer in the field in lieu of more "business-friendly" languages that add bloat and inefficiency?
I use Sage quite a bit. It's basically a wrapper for almost all the mathematics software available. http://www.sagemath.org/ While you still need to drop down to C for great performance, it solves a lot of the interoperability issues discussed. In other words, take the example from the summary: from Sage, you can call Matlab commands and then immediately use the results with R commands. Sage works through a web browser, and it's based on Python, which is a plus.
Three days from now?? Thats tomorrow!! ~Peter Griffin
In my opinion, the new code in Julia is easier to read than the R code because Julia has fewer syntactic quirks than R. More importantly, the Julia code runs much faster than the R code without any real effort put into speed optimization. For the sample text I tried to decipher, the Julia code completes 50,000 iterations of the sampler in 51 seconds, while the R code completes the same 50,000 iterations in 67 minutes, making the R code more than 75 times slower than the Julia code.
That certainly caught my attention!
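For anyone who wants to check the quoted ratio, here is a quick sketch of the arithmetic. It uses only the two timings given in the quote; the variable names are mine.

```python
# Timings quoted above: 50,000 sampler iterations in each language.
julia_seconds = 51
r_seconds = 67 * 60  # 67 minutes expressed in seconds

# Ratio of the two wall-clock times.
ratio = r_seconds / julia_seconds

print(f"R: {r_seconds} s, Julia: {julia_seconds} s")
print(f"R took about {ratio:.1f} times as long as Julia")
```

That comes out a little under 79, which is consistent with "more than 75 times slower".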
The XKCD comic you cite is correct for some standards but software languages are much more complex than standards and, in fact, many of them implement common sets of core standards. Once you get specific enough, you're not talking about a standard but rather a specific implementation of how to accomplish something.
My work here is dung.
From wikipedia: "FORTRAN is a general-purpose, procedural, imperative programming language that is especially suited to numeric computation and scientific computing." Sounds to me like unless there's a particular weakness in FORTRAN that doesn't lend itself to workarounds or repair in newer versions of the language, there's already a numeric computation and scientific programming language that's well documented, mature, and widely distributed.
Do not look into laser with remaining eye.
What will make or break this language is the availability of addon packages for it. A lot of people who use R don't do much coding themselves. They read in data, preprocess it a little bit, and then apply one of the packages found in CRAN.
CRAN is like CPAN, but for R instead of Perl. And we can expect similar behavior from them. Perl probably wouldn't be anyone's first choice for a project these days, but the size and scope of CPAN makes it really really easy to benefit from the work of others. This is a lot of inertia, and a big reason why Perl is still used when newer languages have significant advantages.
There's so much software, particularly academic software, implemented in R that I just don't see it going away. For example, the entire Bioconductor suite is implemented in R. Just about any bioinformatics paper you pick at random will refer to, if not contain, R code.
How much work are we going to have to reimplement if we want everyone to use the one true numerical programming language? And if we don't want that, isn't it just contributing to fragmentation?
Give me Classic Slashdot or give me death!
Robust, mature, fast, easy to use, side-by-side with .m it wins hands down, really no comparison, use Python.
Cython if you need to make it faster for the 5% of code that is too slow.
import numpy
import pylab
What is that, a hash of the source code?
Careful... wouldn't want to give the Mozilla devs any ideas.
Fear is the mind killer.
This may seem petty, but one of the biggest sources of relief to me in changing from Matlab to R and Numpy was finally leaving behind that damned operator syntax where element-wise operations need to have an extra dot prepended. That is to say, if I have an array t of times and an array x of distances, I want to be able to get the corresponding array of speeds using x / t. In Matlab and Julia I must instead use x ./ t.
It seems like no big deal, but it is unbelievable how many Matlab bugs I wrote due to that little difference. True linear algebraic operations are so rare, at least to me, that I am far happier giving them the special operators and reserving the usual operators to work element-wise.
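To make the operator difference concrete, here is a small NumPy sketch. The array values are made up for illustration. In NumPy the plain operators work element-wise and the matrix product gets the special `@` operator, which is the arrangement described above.

```python
import numpy as np

# Made-up sample data: distances and the times taken to cover them.
x = np.array([10.0, 20.0, 30.0])
t = np.array([2.0, 4.0, 5.0])

# Plain division is element-wise in NumPy (Matlab and Julia spell this x ./ t).
speeds = x / t
print(speeds)

# Two small matrices to contrast the two kinds of multiplication.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])

elementwise = A * B      # multiplies matching entries
matrix_product = A @ B   # true linear-algebra product

print(elementwise)
print(matrix_product)
```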
I also must have named arguments and default values. It's a pity, because otherwise it looks to have decent syntax, good speed and nice parallelization. For now, I'm sticking with R, numpy and C.
The other obvious language to come to mind is APL. Anyone looking to write a numerical processing language should have some APL experience.
Yes, it is a pain to learn all the symbols. Programs are incredibly dense, making them difficult to understand and debug, but there are also a lot of cool things you can do with the language. In building a new language, there's a lot of good stuff there to incorporate.
One certainly does choose MATLAB over C -- one chooses the MATLAB language over C because the former makes it much easier to represent many mathematical operations; one chooses the MATLAB libraries and execution environment because they are richer than C in mathematics building blocks. When a particular numerical analysis needs to be performed at most a few times, development time becomes a major factor in the cost, which is why people would prefer MATLAB over C -- but the MATLAB execution time might be so long that alternatives become interesting.
(I suspect that Julia and R do not have code generation for signal processors, but Mathworks and its partner companies will gladly sell you tools that will convert a subset of MATLAB code to C or an HDL to run on an embedded system or FPGA. They will even give away free stories about how awesome those tools are, while glossing over their limitations, but hey -- they are sales pitches..)
The weakness of FORTRAN is that it entirely misses out of 50+ years of research and innovation in programming languages.
OK, maybe the original version of Fortran, the one made 50+ years ago, missed out on "50+ years of research and innovation in programming languages", but you are aware that Fortran has been updated since then, right?
Fortran now includes a great number of the improvements to programming languages made since then. But don't take my word for it -- check out Wikipedia's page on it. I picked Fortran 90 as a starting point, but there have been many versions of Fortran since the first, with new features (often coming from other languages) being added all the time.
And not only is Fortran still being actively developed, but the library of well tested and optimized numerical computing code already written in it is massive.
I'm not saying that there's not room for a new language, and certainly, Fortran doesn't have all the features of some new languages, but your claim that Fortran "entirely misses out of 50+ years of research and innovation in programming languages" is completely and utterly wrong.
I should also mention that they stopped calling it FORTRAN in all caps back in 1990 or so when Fortran 90 came out. Now it's just Fortran. But even the venerable FORTRAN 77 benefited greatly from programming language developments available at the time.
Why does the Oblig always miss the <a> tag? Even in Chrome, select + goto takes more time than a simple click.
Because if you don't know the xkcd strips by number, there's a card you might be expected to hand in as you leave.
No, all distributed versioning systems use hashes for changesets. Fossil and Mercurial - distributed systems designed by and for humans - use a monotonic counter for version numbers.
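A toy sketch of the two kinds of identifier, in case the distinction isn't obvious. This is not how any real tool computes its IDs. The `changeset_id` helper, the use of SHA-1 and the commit contents are all made up for illustration.

```python
import hashlib

def changeset_id(parent_id: str, content: bytes) -> str:
    """Hypothetical content-derived ID: hash of the parent ID plus the new content."""
    return hashlib.sha1(parent_id.encode() + content).hexdigest()

history = []
parent = ""
# enumerate() supplies the human-friendly monotonic counter: 0, 1, 2, ...
for local_number, content in enumerate([b"first commit", b"second commit"]):
    cid = changeset_id(parent, content)
    history.append((local_number, cid))
    parent = cid

for local_number, cid in history:
    print(local_number, cid[:12])
```

The hash can be computed the same way on every clone, while the counter only means something within one repository's history.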
I am TheRaven on Soylent News
"(Did we mention it should be as fast as FORTRAN?)"
They want to design a language for speed, but they already made choices in the language that hamper speed dramatically, like dynamic typing. Dynamic typing adds overhead to every function call; it's fine if your functions do a lot of work, not so much if they do relatively little and are called very often.
It looks like if you want to write fairly low-level code, you'll still need to write it in C there...
It also looks like their approach to parallelization is very heavy-weight and, albeit usable in clusters, it will yield both poor scalability on large systems and poor performance on simple multi-core systems.
There is already a high-level, dynamic and accessible language for numerical computing, it's MATLAB. It wraps a lot of high-performance libraries, using them without the user even noticing it. Code in MATLAB can easily be faster than in C for some constructs because C compilers, unlike MATLAB, do not recognize some patterns and replace them by optimized library calls. For this reason, MATLAB is great when you're coding with high-level constructs, but suffers from poor performance when using low-level constructs (such as accessing data element by element) for the same reasons as pointed out above.
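The same high-level versus element-by-element split shows up in NumPy, so here is a minimal sketch in Python rather than MATLAB. Both functions compute the same sum of squares. The first visits every element in the interpreter, the second hands the whole array to one library call. The function names and data are mine, and I am not quoting timings since they vary by machine.

```python
import numpy as np

values = np.arange(1, 1001, dtype=np.float64)  # 1.0, 2.0, ..., 1000.0

def sum_of_squares_loop(a):
    """Low-level style: touch each element from interpreted code."""
    total = 0.0
    for v in a:
        total += v * v
    return total

def sum_of_squares_vectorized(a):
    """High-level style: a single call into the compiled library."""
    return float(np.dot(a, a))

loop_result = sum_of_squares_loop(values)
vectorized_result = sum_of_squares_vectorized(values)
print(loop_result, vectorized_result)
```

To compare speed on your own machine, wrap each call with `timeit` from the standard library.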
A new language for high-performance numerical computing should allow both the high-level programming of MATLAB and the possibilities of a low-level statically compiled language like C. The best contender for this is C++, which has tons of high-level and fast libraries for transcendental functions, linear algebra, statistics, image processing, signal processing, etc.
As for FORTRAN, it's great for writing one thing well and fast, but it doesn't have any mechanisms for more high-level programming or code re-use, which means it is annoying to maintain, extend, or to even guarantee consistencies between the different subroutines of a large application. It also relies a lot more on what the compiler will do, while with C/C++ there is more control on what happens with regards to vectorization, parallelization or data transfers, which can be critical for heterogeneous systems.
The physical world, and the hardware in the computer, is stateful, not stateless. There is a finite amount of storage, which can be overwritten.
The idiomatic programming model for functional languages isn't like this.
In a functional language to ensure you get fast code, you have to both have a mental model of the program, and a much more complex mental model of the transformations that your functional compiler might (or might not!) apply. This is often exceptionally hard.
A human, like a numerical programmer, has some clever knowledge about how best to order and arrange things to map to an efficient implementation in a stateful world.
Take, for example, a production-level SVD algorithm. You could probably express an SVD method in a functional way. Would it be fast, with low memory usage and no needless temporaries? (In high performance computing these always go together.) Well, maybe, but you'd have to really massage things in light of a particular implementation's optimizer and quirks. That isn't something scientists have the desire to do.
In practice, the capabilities of imperative but data-parallel languages map best to their users' knowledge and abilities, and to existing technology for quality execution.
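A small illustration of the "needless temporaries" point, using NumPy as a stand-in for an imperative, data-parallel language. The arrays are made up. The expression form allocates an intermediate array for `a + b`. The in-place form reuses one preallocated buffer through the `out=` argument that NumPy's ufuncs accept.

```python
import numpy as np

# Made-up inputs.
a = np.ones(5)
b = np.full(5, 2.0)
c = np.full(5, 3.0)

# Expression style: (a + b) creates a temporary array, then the product creates another.
result_expr = (a + b) * c

# Stateful style: the programmer decides where the result lives and overwrites it.
out = np.empty_like(a)
np.add(a, b, out=out)         # out <- a + b
np.multiply(out, c, out=out)  # out <- out * c, no new allocation

print(result_expr)
print(out)
```

Both give the same numbers. The difference is that in the second version the programmer, not the optimizer, controls the storage.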