The Power of the R Programming Language
BartlebyScrivener writes "The New York Times has an article on the R programming language. The Times describes it as: 'a popular programming language used by a growing number of data analysts inside corporations and academia. It is becoming their lingua franca partly because data mining has entered a golden age, whether being used to set ad prices, find new drugs more quickly or fine-tune financial models. Companies as diverse as Google, Pfizer, Merck, Bank of America, the InterContinental Hotels Group and Shell use it.'"
... most others keep thinking that M$ Excel is the silver bullet.
Sad, but f****** true.
R!
Growing in use? sure.
The Kruger Dunning explains most post on
Why they chose to go with R rather than T, I'll never know.
There appear to be duplicate links in the summary :)
...if at first you don't succeed, then skydiving is not for you.
Weaselmancer
Ridiculous.
My request is to those that are in the know to show me some example code, that does something useful. Then later, compare that code to code from other languages to accomplish the same task.
Include reasons to support the notion that the R language is [necessarily] better at what it does.
Now you'll have to do it again
My UID is prime... is yours?
All that scrambled verses and you forgot the part where the Nazi torpedoed Noah's Ark so it ran aground at the mountains of Ararat.
Oh god... cue pirate jokes.
Very true. This is what I try to explain to people when they can't understand why some software is given away gratis. Because if they charged for it, given the current attitudes of the market, they wouldn't stand a chance and wouldn't ever get any market share to begin with.
Billy Brown rides on. Yolanda Green bypasses Gary White.
Good thing Boeing's not using free software for aircraft simulation tools, space station labs, sub hunters, or moon rockets ;-)
education is no substitute for intelligence
Actually that wasn't why I used R, just a fun addendum. The reason to use R is the huge body of statistics, data mining and graphics facilities. Superb.
Of course, the problem with any statistical library is you have to turn your brain on first. Nothing produces "Garbage in Garbage out" quite like statistical analysis.
With R you tend to need to spend far more time thinking about why you are doing something, and what the answer means than in say vanilla C/Ruby programming.
Which is actually not a Bad Thing at all.
The worst thing about R programming is its name. Googling for "R" turns up way too much noise and way too little signal.
The language is very well documented online and the mailing lists contain thousands of examples. It is primarily for statistical analysis, and the libraries available for doing such analysis are unparalleled.
Well.. maybe. Or Maybe not. But Definitely not sort of.
Calling R a programming language is like calling Mathematica or Matlab a language. R is a system for statistical tasks that has a language and syntax, but it is not capable of producing stand-alone executables that do not require the entire R environment.
"I think it addresses a niche market for high-end data analysts that want free, readily available code," said Anne H. Milley, director of technology product marketing at SAS. She adds, "We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet."
Wow...talk about FUD. Does SAS indemnify against plane crashes?
Actually it may not suck. But having used it on and off over the past few years while not being a statistics pro, I find the R language bletcherous and annoying. <- as an assignment operator?
Anne H. Milley, director of technology product marketing at SAS ... adds, "We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet."
Help fight continental drift.
http://www.arrrrrr.com/corsair.jpg
"The worst thing about R programming is its name. Googling for "R" turns up way too much noise and way too little signal"
Try searching from http://rseek.org/ instead of directly from Google.
The statistical libraries are what I use it for, but what pisses me off is the lack of more general data structures, like dictionaries, or vectors of nontrivial types*. It's not nearly as self contained and consistent as, say, Python.
If Python's scientific libraries get better, I'll probably switch to that.
*e.g. if you want a list of functions, it has to be a linked list, which then has O(n) access time.
The "smart set" needs such a high-level lingua franca to express infinite-precision financial models of no accuracy whatsoever!
Seastead this.
I remember once years ago freaking my colleagues out with a largish app written in R... with nary a loop anywhere.
That's a feature of functional languages, a class that also includes Scheme and XSLT. The basic idea is that programs should not have state, because state makes them harder to debug. A for or while loop, by definition, has state, so you have to do your iteration some other way, namely Tail Recursion.
I suppose that makes sense, but I've never been able to teach myself to think that way. It's the main reason I never managed to get through The Wizard Book.
I retract my sole criticism of R.
Q is awesome.
x = vector(mode="list")
x[["joe"]] = y
x[["bob"]] = z #z can be a function!
x = list(joe=y)
x$bob = z
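A minimal sketch of the two idioms above, for anyone who wants something runnable. The names `ops` and `dict` and the stored values are made up for illustration. It shows a named list holding functions, plus an environment used as a hash-backed lookup table, which is one common way to get dictionary-like behaviour in base R.

```r
# A named list works as a simple dictionary; values can be functions.
ops <- list(
  double = function(v) v * 2,
  square = function(v) v^2
)
ops$negate <- function(v) -v        # add an entry after creation

sixteen <- ops[["square"]](4)       # look up by name, then call
has_double <- "double" %in% names(ops)

# An environment gives a hashed key/value store (hypothetical keys).
dict <- new.env(hash = TRUE)
assign("joe", 42, envir = dict)
joe_value <- get("joe", envir = dict)
has_bob <- exists("bob", envir = dict, inherits = FALSE)
```

Lookups by name on a list are fine for small collections; for many keys the environment version is usually the better fit.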
Or at least in the context it's made out to be in this article. Isn't it a language suited mostly to statistics? For that use, I hear that it's one of the best.
How does R compare to Python + numpy + plotting libs
The R language (yes, it's a language; an interpreted language is a language too) has developed as the language of choice by statisticians (both academics and sundry statistical researchers) around the world as their main computer language. It is used in those cases where researchers feel the need for customized computations rather than the use of a package like SAS or SPSS.
The reason that R has become popular is due to a snowball effect and history. It started as a FOSS re-implementation-from-scratch of the "S" language designed for statistical work at Bell Labs (see http://en.wikipedia.org/wiki/S_(programming_language)). Some academics and researchers of repute used it (the S language) because at that time (1975) it was very innovative and far better than most alternatives, and others followed. The S language gained a measure of acceptance among statisticians. Then when R became available the cycle intensified because of the much improved availability of the interpreter and its libraries. This cycle continued to the point that by now probably most professional statisticians use it.
As far as I can see, the R language isn't especially sophisticated or elegant, and may strike people used to more modern languages as a bit repugnant. It does however excel in three respects:
(a) it allows for easy access of Fortran and C library routines
(b) it allows you to pass large blobs of data by name
(c) it makes it easy to pass data to and from your own compiled C and Fortran routines
The first reason is particularly important because it allows one to use pre-compiled libraries, e.g. a linear algebra package like LAPACK, Fourier transforms, or special function evaluations, and thereby gain execution speeds comparable to C despite being an interpreted language (just like Matlab, Octave, Scilab, Gauss, Ox and suchlike): the hard work is carried out by a compiled library routine which is made easily accessible through the interpreted language. Any algorithm needed in statistics that's available as C or Fortran code can be linked in and called without too much effort.
The second reason is important because it slows down execution much less than any pass-by-value interpreted language would, and it allows you to change data that is passed into a function.
The third reason is particularly important because it helps researchers be more productive. Reading in your data, examining it, graphing it, tracing outliers and cleaning them up is best done in an interactive environment in an interpreted language. Coding such things in C or Fortran is an awful waste of time, and besides, researchers aren't code-monkeys and don't enjoy coding inane for-loops to read, clean, and display data. Vector and matrix primitives are far more powerful, and usually preferable unless they are so inefficient that you have to wait for the result. However, there are times when you just need to carry out standard algorithms (linear algebra, calculation of mathematical or statistical functions) or simply time-consuming repetitive algorithms that run so much faster in a genuine compiled language. You could start out by coding the algorithm in an interpreted language to check if it's working, and then isolate the computationally expensive part and code it up in C or Fortran. Using R (or Matlab or Scilab) you can *call* the compiled subroutine, pass it your (cleaned) data, and get the result back in an environment where you can easily analyze it.
That's why languages like R, Matlab, Scilab, Octave, Gauss, and Ox are so productive: you get the best of both worlds. Both the convenience, interactiveness, and terseness of a high-level interpreted language and the speed of compiled languages.
So why R, and why not Gauss or Matlab or whatever?
Well, part of that is cultural. If you're an econometrician you'll have been weane
The worst thing about R programming is its name. Googling for "R" turns up way too much noise and way too little signal.
use RSeek.org
Problem solved.
The worst thing about R programming is its name. Googling for "R" turns up way too much noise and way too little signal.
Yep. There are a couple of dedicated R search engines that can help with that: http://www.dangoldstein.com/search_r.html and http://www.rseek.org/. It may also sometimes be useful to Google on "Splus (whatever)" since most R and S+ code is pretty much interchangeable.
The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.
I think we all know how well that's turned out, eh? So is that the fault of the language or programmer error?
...find new drugs more quickly...
Drugs? Sign me up!
The worst thing about R programming is its name. Googling for "R" turns up way too much noise and way too little signal.
I see your addendum that it works better now, but searching using just a single letter or word usually doesn't work well; you need to give it context. You can enter "tank" into Google and it's going to give you a lot of mixed results because the results could be for fuel, water, waste, military armor or some other use, so you need to give Google something to narrow down the results. So "R programming language" would probably be a base to start from.
What folks are not talking about is that it is used extensively in bioinformatics, especially by the Bioconductor software project. This is highly useful for recently developed and increasingly used microarray and large scale DNA sequencing experiments in biology and disease oriented research. This field is helping to transform biomedical research and ultimately medicine.
A plea to future language developers perhaps
Please don't name your languages with 1 character like C or R; give them longer names.
something like staRtistics would probably work great in a search engine. And the intentional mispelling would give me some joy.
That sounds extremely weird: if a program has a stack, then it has a state - the location on the stack is still state. Thus, if you use recursion, you still have state. I mean, you can try to hide the fact that you have state, but I don't see how you can have a program without state.
Even the wizard book appears to have a chapter on state: http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-19.html#%25_chap_3 , but, unlike your description, instead of talking about a program without state, it considers two kinds of state: the state of objects, or the state of streams of data.
Do you happen to have a link to what you mean by "a program should not have state"? Because, I mean, that seems antithetic to the nature of a program.
Have you considered using Sage (www.sagemath.org)? It is FOSS and has a highly active community and developer support. I'd suggest reading the tour http://www.sagemath.org/tour.html and seeing what you think.
I remember once years ago freaking my colleagues out with a largish app written in R... with nary a loop anywhere.
I'm sure you had plenty of loops in your code. They were just hidden via the use of built-in functions. Not that that's a bad thing.... just saying. You have to understand the mechanics of the calculations to use them properly, and over-reliance on built-in functions can make it too easy to talk out of your ass.
You didn't have any friends in the 3rd grade.
... and has objects, and is even oriented by them...
Plus it has a huge following of lots of smart people writing libraries, functions, and tools for it... so it's simple to do something if someone else has done _exactly_ that thing.
But, I find it incredibly painful to use for anything outside of what's already been done. I spend more time trying to work around silliness than I do actually solving my problem.
Case #1 - I want to normalize gene expression data, using this cool, published algorithm. In basic terms, it does math on a table of numbers, and spits out another table of numbers. Simple? No, the only available package/example uses, start-to-finish, its own functions, and assumes that my data comes from a specific vendor. My lab didn't use that vendor to collect our data? Out of luck... maybe try shoehorning the data we do have into some of the special objects (objects - not a matrix of numbers, it has to be an object of a specific type, with no way to make the object other than directly importing from vendor-specific files that we don't have because we used a different vendor), cross fingers, and hope it works...
Case #2/3: I have more math I want to do, on other tables of numbers (unrelated). I play interactively with my code, and get my answer... but it only appears when I use the 'pretty print' function for formatting my output variable. This does make a nice-looking set of output, with various summary statistics as well as the 2 numbers that I actually care about, but I want to do this hundreds of times, using input from another set of completely different programs. Fool me, I try to access those numbers out of my 'result' object. No luck... it has subcomponents for all of the ancillary statistics, convergence results, starting parameters, etc. etc, but not the freaking answer I ran the function for in the first place! Fine... I'll run it in batch mode (the documentation for which points out that it captures all of the output in a file for you) and parse the output using either perl or a frankenpipe of grep, less, and sed. Lo and behold, when I run this from under a shell or perl script (hundreds of times, remember), it produces a zero-length output file, instead of my actual answer. I can run the exact command line for an arbitrary invocation in my shell and it works perfectly... making the parsable output file and everything.
When I google this phenomenon, I find other people with the exact same problem asking for help in forums. The response is (and I quote): "Working as designed." This is the point where I throw my hands up in the air and (rhetorically) wonder who is going to introduce these people to the idea of UNIX... I said this was cases 2 and 3 because this has happened to me twice now... yesterday it was a nonlinear estimation problem (which I _still_ don't have working noninteractively), before it was a clustering thing that I finally gave up and used SAS for.
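For what it's worth, in some cases the numbers that only show up in the pretty-printed output are computed by the summary step and stored on the summary object, not on the fit. This may or may not apply to the nonlinear estimation or clustering functions described above. The sketch below uses lm() on made-up data (the data frame `d` and all its values are hypothetical) to show where to look and how to dump a plain table of numbers to standard output.

```r
# Hypothetical data for illustration only.
set.seed(1)
d <- data.frame(x = 1:20)
d$y <- 2 + 3 * d$x + rnorm(20, sd = 0.1)

fit <- lm(y ~ x, data = d)

# Coefficients as a plain named numeric vector.
est <- coef(fit)

# Many printed statistics live on the summary object, not on the fit.
s <- summary(fit)
coef_table <- s$coefficients   # matrix: estimate, std. error, t value, p value
r2 <- s$r.squared

# str() lists the components, which helps locate a value by name.
str(s, max.level = 1)

# Emit the table as tab-separated text for downstream tools.
write.table(coef_table, file = stdout(), quote = FALSE, sep = "\t")
```

Run non-interactively (for example with Rscript), the write.table() call sends the table to standard output, where a shell pipeline can pick it up.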
I try to use open source solutions whenever possible - my graduate work in bioinformatics (computational methods development for figuring out the causes of human disease) is supported in part by public funding and as such I believe it should be as widely distributed/unencumbered as possible, but nonsense like the above makes me prefer SAS (which my university gets for free, probably partially because it was first developed here...). Even if it has many many (many) quirks, at the end of the day your inputs are tables of numbers, your outputs are tables of numbers, and it is relatively simple to go back and forth to text files (containing tables of numbers). And for the record, I started programming in both of these languages at about the same time, for about the same projects, so it isn't a case of "language A is easier because I've been using it for years, whereas B is new and frightening."
-DuctTapeBoxen
Ross Ihaka was my supervisor for my honours dissertation last year. Reading this article was a bit amusing for me. If you asked him for his opinion about R, let me say he wouldn't have written such glowing words about it! He doesn't like being in the limelight all that much either, from what I have been able to tell.
I may have some idea what is meant by the "wanting to create more advanced software" at the end of the article. At the moment he is tinkering away rebuilding the guts of R in Lisp. He reckons if "things were done properly", R would be orders of magnitude faster and more efficient. For example, when fitting a linear model, several copies of the data matrix are made when performing the matrix operations required to find all the coefficients, working out diagnostic matrices, et cetera.
So if anyone out there wants to contribute to R, now would be a good time to volunteer.
Labview is well designed for its intent. So someone with minimal programming skills can sit down and get something done in a short amount of time. Would I use it for crunching numbers or collecting terabytes of data? Probably not. But it's sure damn handy if you want to interface test equipment and get results. It's all about the best tool for the job.
Only the State obtains its revenue by coercion. - Murray Rothbard
I think you mean
R! Jim lad!
You have to play with it. As with APL you'll either love it or hate it.
If you like the idea of a language that includes relational tables as a primitive data type, that extends most operators to do the right thing when you feed them vectors and matrices, that has linear regression and equation solving built-in, you'll probably like R.
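A small taste of those features, as a sketch on made-up numbers (the data frame `df`, the matrix `A` and the vector `b` are hypothetical): a table as a built-in type, operators applied to whole vectors, a one-call linear regression, and a linear system solved with solve().

```r
# A data frame is R's built-in table type.
df <- data.frame(x = c(1, 2, 3, 4), y = c(3, 5, 7, 9))

# Arithmetic operators work element-wise on whole vectors.
doubled <- df$x * 2

# Linear regression is a single call; this data lies exactly on y = 1 + 2x.
fit <- lm(y ~ x, data = df)
coefs <- coef(fit)

# solve() handles the linear system A %*% v = b.
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # filled column by column
b <- c(3, 5)
v <- solve(A, b)
```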
I'm a Programmer. That's one level above Software Engineer and one level below Engineer.
There's nothing like it for doing math and graphs.
On one level, Matlab, or at least the Figure Windows and M-File editor and Command Windows are written in Java and the main Matlab windows are JFrames. People talk about Java having not gotten anywhere "on the client" and "where are the Java shrinkwrap apps", but a bunch of stuff -- Matlab, Mathematica, Maple -- have GUI front-ends done in Java. You can tell because when one of these packages is slow to load, you see the "flaming coffee cup" icon on the taskbar.
Word and even OO may not be written in Java, but for the apps in question, there is a big demand for having them on the Unices on account of the academic market for them, and Java appears to be the cross-platform GUI thingy of choice for this tier of commercial apps.
So, is the R front end Java-based, or are they using something else to be cross-platform?
The other thing about Matlab is that it is usable as a Java scripting environment. You can create instances of Java classes from the command window, assign these instances to Matlab variables, pass (by value) Matlab arrays in and out of Java functions without too much fuss, poke at Java class instances (that is invoke methods on them).
You can even embed Java Swing widgets in Figure Windows, although the javacomponent Matlab command is thinly documented and perhaps not yet officially supported.
Does R allow the same thing? How Java friendly/compatible/implemented is it?
I think he says "state" when he means "heap."
Don't blame me, I voted for Baltar.
Actually, calculations in R are just vectorized. Internally, there must be some looping, but the language hides it.
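To make that concrete, here is a minimal sketch with made-up vectors `x` and `y`: the vectorized expression and the explicit loop compute the same thing, the loop is just written out by hand in the second version.

```r
x <- c(1.5, 2.5, 3.5)
y <- c(10, 20, 30)

# Vectorized: one expression, no visible loop.
vec <- x * y + 1

# The same computation with the loop spelled out.
loop <- numeric(length(x))
for (i in seq_along(x)) {
  loop[i] <- x[i] * y[i] + 1
}

same <- identical(vec, loop)
```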
is that SAS is not amused of the competition
Say you realize that you need to check for another corner case that you forgot, or need to extend a function for another purpose, or whatever. In any other language, you would type a few lines of code and be done with it. Not with labview. With labview you have to move things around to make room for the new code, disconnect wires and reconnect them. NI has added stuff into the newer version to help with this (auto growing, etc) but it still turns into a mess in short order.
Other things are just easier to type than to draw, and also easier to read in text than as a schematic, like equations. So much so that they have added the ability to type portions of the code, but the amount of setup that you need to do with a code block often defeats the time benefit you get from using it.
As someone who likes "clean code" I find LabView much more tedious and time consuming to keep neat, and when dealing with other coders that are not as picky, I find that their LabView code is much messier and harder to read than Java or C code by the same developer.
Arrrrr! Yes, my matey!
What will happen to the naming convention of programming languages when we run out of letters of the alphabet?
Do you happen to have a link to what you mean by "a program should not have state"? Because, I mean, that seems antithetic to the nature of a program.
Of course there is a state, you're using a standard computer to run the program, so there must be a state somewhere. Still, the point is that even if the language implementation works by changing the computer's memory state, the abstraction you use to program isn't state-based. In a pure functional programming language, you don't program by manipulating a state, but by computing the results of functions.
Regarding the SICP book, like most functional programming languages, Scheme isn't a pure functional language. It contains constructs with side effects, which actually change the program state directly. Such constructs are available because there are problems that are very difficult (but not impossible) to handle with pure functional programming, so language designers end up making compromises.
Just my 2 (Euro) cents
Like many posters have already said, the syntax of R is terribly outdated, and that's a first problem with me (I started programming with Python, go figure). But the main problem I have with R is the performance. A lot of functions and packages are dead slow or quite memory hungry (compared to a, say, C++ equivalent - for the initiated, check out the performance of rma from Bioconductor with RMAExpress, which is written in C++).
Another issue I have is not with R itself, but with its most popular add-on, the Bioconductor suite, widely used in bioinformatics. The packages' quality varies a great deal, and there's no way to file bug reports (unlike R itself, which has a bug tracker) short of emailing the authors, who, being academics, may not even have the time/will to reply to you. I'd love to see stuff like Bioconductor in a more recent programming language, but I doubt it - doing this kind of stuff doesn't give you any funding.
A CC-licensed illustrated horror novel
i R baboon!
Labview is utterly non-deterministic in its execution. The execution order of blocks does NOT follow the data flow of the lines joining them if there are more than a handful of blocks present. In fact, the execution sequence becomes random, and changes randomly when block positions are changed (even without changing the data connectivity). This forces the use of explicit sequence structures in any non-trivial function, increasing its complexity and opacity. Just try synchronizing shared data between asynchronous loops. Even their Knowledgebase admits that there's no way to do it properly.
And let's not get started on the crappy content of Labview's documentation. It's organized and formatted tolerably well, but the content is vacuous. Hardly any functions have any suggestion of their behaviour when faulty data arrives (e.g. a NaN), for example.
Those who can make you believe absurdities can make you commit atrocities. - Voltaire
That's not what I understand to be functional programming. The basic rule of functional programming is "no side effects", so the same input(s) will always produce the same output. Or, to put it another way, a function doesn't maintain state between invocations. This isn't incompatible with loops. For example, you could write a function to calculate the Fibonacci series up to n which internally used a loop, and still call it functional.
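A sketch of that idea in R, assuming "up to n" means the first n terms. The function name `fib_upto` is made up. The loop mutates only local variables, so nothing survives between calls and the same input always gives the same output.

```r
# Returns the first n Fibonacci numbers (1, 1, 2, 3, ...).
# The loop is an internal detail: all state is local to the call.
fib_upto <- function(n) {
  stopifnot(n >= 1)
  out <- numeric(n)
  a <- 0
  b <- 1
  for (i in seq_len(n)) {
    out[i] <- b
    nxt <- a + b
    a <- b
    b <- nxt
  }
  out
}

first_ten <- fib_upto(10)
```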
"With R you tend to need to spend far more time thinking about why you are doing something, and what the answer means than in say vanilla C/Ruby programming."
C & Ruby? You're comparing Apples and Oranges sir.
Other common ones are filter and fold. For loops and the like that require state, you typically use tail recursion, and most functional programming languages trivially extend that optimization to the general case of tail call optimization: if the return value of a function F is simply the return value of a function G, you don't need to retain F's stack frame, as F's result is G's result. * C and C++ allow for the use of function pointers, which can allow a functional style. C++ is better than C in this regard, as the STL also comes with a set of functional algorithms such as map.
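Since this thread is about R: base R ships its own versions of these (vapply/Map for map, Filter for filter, Reduce for fold). A minimal sketch on a made-up vector `nums`:

```r
nums <- 1:10

# map: apply a function to every element.
squares <- vapply(nums, function(v) v^2, numeric(1))

# filter: keep the elements for which the predicate is TRUE.
evens <- Filter(function(v) v %% 2 == 0, nums)

# fold: combine all elements with a binary function.
total <- Reduce(`+`, nums)

# Map walks several inputs in parallel and returns a list.
pair_sums <- Map(function(a, b) a + b, 1:3, 4:6)
```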
R is statistics oriented, where Octave and MATLAB are more general mathematical computing environments.
if you are coming from a programming background, I assure you that you'll hate gui oriented tools like spss. if you have a slightly better understanding of probability and the notion of sampling, you'll find that the way r approaches data as a whole feels very nice for a developer.
in data analysis, you'll be transforming, filtering typecasting data. you'll be turning numbers into nominal values, you'll be sampling from complex distributions etc. writing code to do these lets you stay in complete control of what you're doing and at least for me looking at a function line by line is a much better way of seeing what I'm doing, instead of clicking on icons and selecting menu items in a particular order. the whole process is documented in code, and for me it is much easier to map the things I'm doing to statistics.
the downside of R is, as you start dealing with more and more data, things become a little bit harder since R takes all data into memory and processes it in memory. scaling into very large amounts of data is not easy, at least I can't afford as much ram as necessary (my data sets are really huge). the most important advantage of some commercial tools in my work is that they can treat data as a stream on disk, and even if it takes longer, it is possible to process huge amounts of data.
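one partial workaround in plain R, as a rough sketch only: read the file through a connection in chunks and keep running totals, so only one chunk is in memory at a time. the temporary file, the chunk size of 100 and the variable names are all made up for illustration, and this only helps for statistics that can be accumulated incrementally.

```r
# Hypothetical example data: write 1000 numbers to a temporary file.
path <- tempfile(fileext = ".txt")
writeLines(as.character(1:1000), path)

# Stream the file in chunks of 100 lines, keeping only running totals.
con <- file(path, open = "r")
total <- 0
count <- 0
repeat {
  chunk <- readLines(con, n = 100)
  if (length(chunk) == 0) break    # end of file reached
  values <- as.numeric(chunk)
  total <- total + sum(values)
  count <- count + length(values)
}
close(con)
unlink(path)

stream_mean <- total / count
```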
R forces you to think about what you are doing, instead of hiding behind spss and saying, well these are the results spss has given us, when we clicked these buttons and menus. my 2 cents of course.
Actually, R hasn't used linked lists to represent its "list" data structure for a long time. So it is very much like Python indeed.
"The worst thing about R programming is its name. Googling for "R" turns up way too much noise and way too little signal"
Try searching from http://rseek.org/ instead of directly from Google.
I usually google for "R-help" plus whatever term I'm interested in. This gets responses from the enormously popular and helpful R-help mailing list.
I remember once years ago freaking my colleagues out with a largish app written in R... with nary a loop anywhere.
That's a feature of functional languages, a class that also includes Scheme and XSLT. The basic idea is that programs should not have state, because state makes them harder to debug. A for or while loop, by definition, has state, so you have to do your iteration some other way, namely Tail Recursion.
I suppose that makes sense, but I've never been able to teach myself to think that way. It's the main reason I never managed to get through The Wizard Book.
R has a scheme-like lower layer but it feels more like APL with its array manipulation capabilities than Scheme. It does not support tail recursion.
Such constructs are available because there are problems that are very difficult (but not impossible) to handle with pure functional programming, so language designers end up making compromises.
Not that difficult, really. The main problem is sequencing, which is provided by things like function composition. The problem is the unwieldy nature in languages like Scheme of specifying sequencing using pure functions, while also handling data that doesn't require sequencing; but this is a syntactic problem, not a practical one.
The issue is handled admirably by the language Haskell, using a mathematical construct called a "monad" to allow an elegant way of handling sequencing--even a syntactic sugar "do" notation that looks vaguely imperative--while remaining 100% pure, unlike Scheme.
I think he says "state" when he means "heap."
More likely he's thinking of something along the lines of "maintaining referential transparency". That is, the only sense of identity a variable has is its value, not its storage location, meaning that complicated data structures can't be altered, only used to construct new, different versions.
This incidentally makes multithreading a lot easier, too. :)
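R itself is not a pure language, but ordinary vectors behave in roughly this value-like way. A minimal sketch, with a made-up function `set_first`: "modifying" an argument inside a function yields a new value and leaves the caller's copy untouched.

```r
# Changing the argument inside the function affects only a local copy.
set_first <- function(v) {
  v[1] <- 99
  v
}

orig <- c(1, 2, 3)
changed <- set_first(orig)   # orig keeps its original values
```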
... but for some curious reason it requires a fortran compiler as well as a C compiler to build the whole system.
Sorry R team, but there's no way I'm installing an entire Fortran compiler just to install your system.
Why can't they write the whole thing in C for heaven's sake? The days when C couldn't handle floating point very well (if that's the reason) are long gone.
No, it's P. It's like R but it's missing a leg!
Actually, R is an imperative language. You can loop just fine, as well as modify variables (state) and everything else you'd expect from a normal language. (Though functions are first-class objects and all that)
The reason you don't is that it's like SQL in that its statements can work on whole tables at a time. For example, applying colMeans() to a table gives you an array with the mean value of every column (and the array positions are named after the column names).
If you want to get the rows of an array (x) with 0 on column 1, you do x[x[,1]==0, ] (note the trailing comma, which selects rows). Basically, x[,1] gives you the first column, x[,1]==0 gives you a vector of true/false values, which you can use to index the rows of the array again.
You can do things like by(x,x[,1]==0,nrow), which gives you how many rows have 0 and how many don't on column 1. by(x,x[,1]==0,colMeans) gives you the per-column mean of the rows which have 0 on column 1 and the same for those that don't.
Anyway, you get the idea. You can loop just like you can use cursors in SQL, but your code will be slower and less readable if you do.
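A runnable sketch of the whole-table style described above, on a made-up data frame `x`. It uses colMeans() for the per-column means, on the assumption of a current R version, where mean() no longer averages the columns of a data frame.

```r
# Hypothetical table: column a holds the flag, column b the values.
x <- data.frame(a = c(0, 1, 0, 2), b = c(10, 20, 30, 40))

# Rows where column 1 is 0 (the trailing comma selects rows).
zero_rows <- x[x[, 1] == 0, ]

# Per-column means of the whole table, named after the columns.
col_means <- colMeans(x)

# Row counts and per-column means, grouped by whether column 1 is 0.
counts <- by(x, x[, 1] == 0, nrow)
group_means <- by(x, x[, 1] == 0, colMeans)
```

Printing `counts` or `group_means` shows one block per group (FALSE and TRUE).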
In a pure functional programming language, you don't program by manipulating a state, but by computing the results of functions
Which is why I always found the entire 'functional programming' paradigm absurd and useless. When you deal with I/O, you have states ('printf'). When you deal with a user (interface), you have states. Hell, even when you deal with time you have states ('before' and 'after').
If all you want to do is compute f(x), then OK, maybe, but besides that nothing useful has ever come out of functional programming except Emacs. Well, let me rephrase that: nothing useful has ever come out of functional programming, period. The amount of time and brain power wasted on things like Lisp for no result whatsoever is astounding.
Non-Linux Penguins ?
BOGUS languages are much better, more exclusive and more profitable programming languages. I encourage everyone to put that they are proficient in BOGUS languages on their resume.
to can: verb; means "to throw away", as in "we can the financial crisis on idiots".
A more complete answer would be: the R system is basically an interpreter which accepts messages and terminal connections, like a server. It has no "native" front-end (except a stark command-line window) per se.
This means that *any* application that can log in to R as a client or send the R interpreter messages and capture the response can use the R interpreter as a slave, and can hence act as a GUI. It doesn't matter if that's a Windows application, a Linux application, or a Java application, or a web-server (R can work as a back-end to a web-server too). The only thing is the amount of work needed to get things working and to actually code up the GUI. There is a package called Rserve (see http://www.rforge.net/Rserve/) that greatly facilitates this for C++ and Java. Below is a quote from the Rforge repository:
Rserve is a TCP/IP server which allows other programs to use facilities of R (see www.r-project.org) from various languages without the need to initialize R or link against R library. Every connection has a separate workspace and working directory. Client-side implementations are available for popular languages such as C/C++ and Java. Rserve supports remote connection, authentication and file transfer. Typical use is to integrate R backend for computation of statistical models, plots etc. in other applications.
The following Java code illustrates the easy integration of Rserve:
Rconnection c = new Rconnection();
double d[] = c.eval("rnorm(10)").asDoubleArray();
d now contains 10 random samples from the N(0,1) distribution if there is a running Rserve on the local machine. The Rconnection doesn't have to be created more than once in your application.
There is at least one R GUI written in Java, called "JGR" (Java GUI for R); see the post with the JGR link. JGR is FOSS too. In principle this ought to be able to run under Linux, but the last time I tried (a year or so ago) I had nothing but trouble getting it to install. The Windows installer does a nice job of it though: 5 seconds and it's up and running with all its features enabled. I don't know how easy or hard it is to add your own Java routines to JGR, to redirect R output to your own Java application under JGR, or to put them under the menu structure of JGR. I never tried, but I strongly suspect that it's possible and not hard. As far as I can see, this is not a question of interfacing with R but with JGR (a native Java application), but I never gave it much thought so you'll have to see for yourself. Sorry.
A popular Windows GUI (called TinnR) (see http://www.sciviews.org/Tinn-R/) has been written in Delphi 5.
Last but not least, R has excellent interactivity with Emacs, which works as well under Linux as it does under Windows. I personally can't stand Emacs, but lots of people swear by it (I just swear at it).
I love people who omit the fact-checking before publicly blurting out an opinion they formed long ago (after all, what was good back then is good now too, no?)
I agree with the first part of your post: to me R is something to code in when you have to, and to keep the resulting code as short and simple as possible. If I ever had to code a real application with a GUI that needed the statistical strengths of R, I would almost certainly not use R.
On the other hand I'd probably use Java and link to R as a server (see my other post about R and Java) instead of using Python.
"I think it addresses a niche market for high-end data analysts that want free, readily available code," said Anne H. Milley, director of technology product marketing at SAS. She adds, "We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet."
This idiot needs to be fired immediately. Freeware is no less reliable than closed source. For someone who works at SAS the concept of peer review of work must obviously make the work more unreliable because it's peer reviewed. Such utter idiotic nonsense.
SAS Institute Inc.
100 SAS Campus Drive
Cary, NC 27513-2414
USA
Phone: 919-677-8000
Fax: 919-677-4444
I for one have no interest in dealing with someone that stupid. Lockheed, Boeing, Ford, Toyota, and every damn bank I have ever seen and worked with has Linux in the enterprise. Someone with this sheer level of stupidity needs to go.
-=[ Who Is John Galt? ]=-
Oh My Gobbldegook!
what happened to the part about sloths and fruit bats and orangutans?
I'm an electrical engineer. Most folks in my field would use Excel and then Matlab. I have no statistical background. I started using R because it has great graphics and plotting capabilities, much better than Excel: more precision, many more types of plots, no jaggies when imported into Word, and more. See here for some great examples from Deepayan Sarkar's lattice system, an implementation of William Cleveland's trellis graphics:
http://lmdvr.r-forge.r-project.org/figures/figures.html
That's just one of several ways to create precise, good-looking graphics.
As I used it more, I started to appreciate R's language features, too. It's a great all-purpose data manipulator. It's great with 2-d tables, but handles other data nicely, too.
Along the way, I even picked up some statistics:)
Is that normal, non-programmer users who want to be able to perform various statistical operations and data manipulation operations do not have the time nor desire to learn a programming language. R is a great language. However, some people already have enough trouble clicking on a button.
There is some difficulty in understanding the target audience for R. It is not for everyone, but this seems to be implying it is, and that is where the problem enters.
In fact, the more case-oriented that data mining software becomes, like SAS Enterprise Miner or Insightful (now TIBCO) Intelligent Miner or SPSS Clementine, PolyAnalyst, DiagnosX, etc, the larger the audience of users.
While that does pull out the rows which have 0 in column 1, it packs those return values into a vector (the first element of each of the selected rows, followed by the second element of the selected rows, etc.). If you want to extract those rows of the matrix, and keep them in matrix form, then this will do it (note the extra comma):
x[x[,1]==0,]
Yes, R is pretty nice in that sense. Matlab can do many of the same tricks, e.g. the Matlab equivalent of my command above is
x(x(:,1)==0,:)
Matlab doesn't have a simple way to do your version that I know of, since Matlab doesn't automatically recycle a vector index to a matrix. You'd have to do something like x(repmat(x(:,1)==0,n,1)) where n is the number of columns in x -- or to avoid hardcoding the n, you could do x(repmat(x(:,1)==0,size(x,2),1)), which is starting to get ugly, isn't it?
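To make the difference between the two R forms concrete, here's a small sketch on a made-up 3x2 matrix (the values are arbitrary, picked only so that rows 1 and 3 have 0 in column 1):

```r
# Made-up 3x2 matrix: column 1 is (0, 5, 0), column 2 is (7, 8, 9).
x <- matrix(c(0, 5, 0,
              7, 8, 9), nrow = 3)

# Without the trailing comma the logical index is recycled over the whole
# matrix (which is stored column by column), so the result is a plain vector.
flat <- x[x[, 1] == 0]

# With the trailing comma the logical vector selects rows, and the result
# is still a matrix.
rows <- x[x[, 1] == 0, ]

# When only one row matches, drop = FALSE keeps the matrix shape.
one_row <- x[x[, 1] == 5, , drop = FALSE]

print(flat)
print(rows)
print(one_row)
```

The drop = FALSE bit is worth remembering: without it, a single matching row comes back as a vector rather than a one-row matrix.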
And even with all that aside, you're still in some kind of state. Be it Idaho or the nation state of Australia.
Some people just don't understand anything. Sheesh.
I don't know if anyone's noticed, but this post was from the 'much-better-than-Q' dept.
Q is so much better to program in than R (and more productive, especially when analysing large data sets).
Someone's got an axe to grind in the R vs Q argument...
Tell that to Eiffel, Haskell, Clojure, C++ templates, et cetera. Don't confuse what certain functional languages do with something being a defining characteristic of functional languages. Many functional languages are mutable, have loops, are not pure, have state, et cetera.
StoneCypher is Full of BS
I've come across a couple of examples of inappropriate use of Excel
Yeah, but it's only the UK economy, it's not like it's the German or the Japanese economy or anything. ;)
Firehed - Unfortunately, thanks to medical breakthroughs, common sense is not as common as it once was.
...Stata is still my choice.
Its commands are simple and intuitive. If you want to regress y on x1 and x2 you type "regress y x1 x2". In R you would type something like: "Results <- lm(Y~X1+X2, data=datasetname)".
In Stata if you want an 'option' like clustering or robust standard errors you simply change the command to "regress y x1 x2, robust"; in R it's not so simple.
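For anyone who wants to try the R side of that comparison, here's a minimal sketch using mtcars, a data set that ships with R. The choice of mpg, wt and hp as variables is mine, purely for illustration. Robust or clustered standard errors are not part of this call; as far as I know they usually come from add-on packages, which this sketch doesn't cover.

```r
# mtcars ships with R; mpg, wt and hp are columns of that data set.
results <- lm(mpg ~ wt + hp, data = mtcars)

# Coefficient table with the default (non-robust) standard errors.
print(summary(results)$coefficients)

# The fitted coefficients as a named vector.
print(coef(results))
```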
There have been several postings like this that say R has "1970's syntax", that R should "never" be used for large/complex programs, etc.
I'd like to ask for specifics:
1. State your modern-syntax language for comparison. Give 3 or 4 examples of how R is old-school (and inferior, not just different).
2. Specifically why would you not use R for complex/large projects? Is it the language, or the lack of an IDE? And are you including R's base and package routines in the calculation? (I.e. you may be able to do with 3 function calls in R what it would take two pages of coding in another language.)
Personally, I like R as a language. Anonymous functions, parameters by position or name, the "..." parameter which is fairly magical, lots of vectorization (i.e. functions that work on vectors/lists just as simply as scalars), nice data structures, and a useful OO structure.
Certainly I wouldn't recommend getting R _only_ for its programming language. But if you are working in the machine learning, statistics, etc, areas, and need things like statistical analysis, clustering, regression, data mining, etc, etc, plus very nice and flexible graphical output, why NOT use R?
nothing useful has ever come out of functional programming, period.
Lol. I generally agree with you, but map/reduce was inspired by functional programming and seems to be useful for some problems....
The power of R is in its libraries, which are often maintained by the best statistical researchers in that area.
Recently I discovered the ggplot2 graphing library, which is a huge step forward for constructing graphs of all types in R. It's very well documented and very actively maintained.
http://had.co.nz/ggplot2/
I have been using R for Geospatial statistics for about a year.
R is a good language for number crunching large amounts of data, data visualization and analysis.
It is based on Scheme and brings back fond memories of Lisp for me.
You can compile and link in your own Fortran libraries using your favorite flavor of Fortran (F95 in my case) to extend it. This means it is more a job control language in some ways.
You should not loop in R but rather use sequences or do so in your libraries.
It has a truly frightening number of open source add on libraries.
Comparing it to C, Java, Ruby or other languages misses the mark. You won't use it for e-commerce or low level controllers, but it was never designed for that.
R does have OOP capabilities though I haven't used them.
I've used Maple, SAS, Matlab and R. R wins hands down. It's powerful and fun.
Since I've only been using it a year, there are a number of other things I probably have missed.
putting the 'B' in LGBTQ+
Actually, the main reason R doesn't need (as many) loops, is that most data types are vectors. And the operations on them are typically vector operations.
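A tiny sketch of what that means in practice (the vector x is made up for the example):

```r
x <- c(1, 2, 3, 4)

# Loop version: fill the result one element at a time.
doubled_loop <- numeric(length(x))
for (i in seq_along(x)) {
  doubled_loop[i] <- x[i] * 2
}

# Vector version: the arithmetic applies to every element at once.
doubled_vec <- x * 2

# Comparisons are vectorized too, so counting matches needs no loop.
n_big <- sum(x > 2)

print(doubled_vec)
print(n_big)
```

Both versions give the same numbers; the second just says it in one line.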
Not sure, I only spent about 2 minutes looking at the site.
...
I mostly know functional programming from studying Scheme and Logo. These languages facilitate stateless programming by making it easy to avoid state variables. (Really, the word "variable" means something quite different in these languages.) Textbooks that I've read that are based on these languages (SICP being the most famous example) simply don't talk about traditional loops — all iteration is done with tail recursion. Perhaps other functional languages are different, but these are the only two I know anything about.
All beside the point. As a half-dozen posts have already pointed out, I was wrong to describe R as a functional language. It's an imperative language with lots of vector operations. That's where its looplessness comes from.
Do I have to give my karma points back?
Someone needs to write a file-sharing application in R.
Just for kicks.
I have used both SAS and R for projects large and small. Each has its own use, but I strongly disagree with the SAS marketing rep's argument that "We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet." Actually, I would find it terrifying to think about the engineers using canned packages that are subject to no external review whatsoever, rather than R and its routines, which are freely available for all to see.
When choosing between the two, there are two questions I would ask:
1) What format does my data come in?
2) Is my data small enough to fit in memory?
If the answer to 1 is "SAS dataset" or the answer to 2 is "NO" then I would use SAS. It keeps datasets on disk, so if you are handling TBs (or even GBs) of data, there is no better software package. A lot of big data dumps come in SAS, too, so while it is easy to dump to CSV, if you are repeatedly accessing the same data source, SAS makes sense. Also, if you already have significant infrastructure written in SAS (data processing, reporting, etc.) it may make sense to continue doing it that way.
In all other cases (and even in some of the above cases, when considering overall cost), R is better. Easier to program in; easier to write C/Fortran/Python/etc. extensions for; contains real functions (SAS just has macros); object oriented; and FREE (GPL and zero cost). That last bit is important because if for some reason the SAS Institute goes bust (stranger things have happened), and all of your code is in SAS, you are fucked. Whereas if you have been writing in R, you will always have your existing software and can, in principle, continue developing it yourself or hire someone to maintain/upgrade it for you.
R is great, even if it is a slightly awkward language. It easily blows Octave out of the water in almost every way, but is inferior (as a language) to python+scipy, although I believe that it has far more packages available for it. Just check out CRAN some day. It's amazing.
Zementis (http://www.zementis.com/) has been working with the R community, specifically to extend the support for the Predictive Model Markup Language (PMML) standard which allows model exchange among various statistical software tools.
Got models in R? Deploy and score them in ADAPA in minutes on the Amazon EC2 cloud computing infrastructure!
If you develop your models in R, you can easily deploy and execute these models in the Zementis ADAPA scoring engine (using the PMML standard). This not only eliminates potential memory constraints in R but also speeds execution and allows SOA-based integration. For the IT department, ADAPA delivers reliability and scalability needed for production-ready deployment and real-time predictive analytics.
How to export PMML from R? http://adapasupport.zementis.com/2008/02/how-can-i-export-pmml-code-from-r.html
What is PMML? http://knol.google.com/k/alex-guazzelli/pmml/3pz0mz6zvkz16/1
Or in the UK, out of your "Rs".
To have a right to do a thing is not at all the same as to be right in doing it
I haven't seen several things listed so far:
1. Plotting (graphing). You have several choices:
i. R's base graphics
ii. The lattice package
iii. The ggplot2 package
iv. Packages that create interesting classes tend to create plot methods for them (really a special case of item i, above)
Most environments/languages would be absolutely thrilled to have options ii or iii, and option i/iv is pretty good, too. The ggplot2 package in particular has a different (but very nice) approach to plotting.
I mean how many programs/environments with adequate graphics spawn multiple graphics packages, each of which is well-maintained and good?
2. I like named parameters, and while it may not be the neatest programming practice in the world, the "..." parameter is very clever. It stands for all the parameters you have not specified in your function. For example:
foo <- function (x, y=13, ...) { x <- bar (x, y) ; baz (x, ...) }
The code here is silly, but the idea is that you are passing x to baz, plus all of the arguments to foo that were not x or y (i.e. all of the arguments that you did not specifically list as parameters for foo).
It's a poor-man's OO tool, I guess, but it gets used a lot in plotting, in particular.
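Since foo, bar and baz above are placeholders, here's a runnable sketch of the same "..." idea. The function scaled_mean is a hypothetical helper made up for this example; it forwards whatever extra arguments it receives to the base function mean().

```r
# Hypothetical helper: scales x, then hands any extra arguments to mean().
scaled_mean <- function(x, scale = 1, ...) {
  x <- x * scale
  mean(x, ...)
}

# No extra arguments: mean() sees the NA and returns NA.
without_extra <- scaled_mean(c(1, 2, NA), scale = 2)

# na.rm is not a parameter of scaled_mean, so it travels through "..."
# and reaches mean(), which then drops the NA.
with_extra <- scaled_mean(c(1, 2, NA), scale = 2, na.rm = TRUE)

print(without_extra)
print(with_extra)
```

The wrapper never has to know which options mean() accepts, which is why the same trick shows up so often in plotting code.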
3. Documentation. I've seen complaints about the documentation for R packages, but when I've dipped my toe in the Matlab and Sage worlds, I've found package documentation that looks more like notes scribbled on a sticky note than R documentation. R packages have documentation that includes a nice TeX'd output, and usually good explanations and bibliographies. Larger packages will even have vignettes that dive into specific areas of the package.
Faster, easier, nicer.
Well, both of them ARE very Lisp-y, sooo... Yeah, that's about it.
I know tobacco is bad for you, so I smoke weed with crack.
Does anyone know of an R interpreter written in Java?
JMule user, enjoy it : http://www.jmule.org
as used by Pirates?
"Sir, you are stupid and I am going to rub your face in it!"
"Wait, I am stupid."
"Nevermind, I won't apologize, I will just call us both stupid, so as to avoid retracting my thesis that you are stupid."
Best Regards,
Internet Pussy