The Power of the R Programming Language
BartlebyScrivener writes "The New York Times has an article on the R programming language. The Times describes it as: 'a popular programming language used by a growing number of data analysts inside corporations and academia. It is becoming their lingua franca partly because data mining has entered a golden age, whether being used to set ad prices, find new drugs more quickly or fine-tune financial models. Companies as diverse as Google, Pfizer, Merck, Bank of America, the InterContinental Hotels Group and Shell use it.'"
... most others keep thinking that M$ Excel is the silver bullet.
Sad, but f****** true.
R!
Growing in use? Sure.
The Dunning-Kruger effect explains most posts on
Ridiculous.
My request to those in the know: show me some example code that does something useful, then compare it with code in other languages that accomplishes the same task.
Include reasons to support the notion that the R language is [necessarily] better at what it does.
Trying to find middle ground with C?
When all else fails, try.
All those scrambled verses, and you forgot the part where the Nazis torpedoed Noah's Ark so it ran aground on the mountains of Ararat.
Very true. This is what I try to explain to people when they can't understand why some software is given away gratis. Because if they charged for it, given the current attitudes of the market, they wouldn't stand a chance and wouldn't ever get any market share to begin with.
Good thing Boeing's not using free software for aircraft simulation tools, space station labs, sub hunters, or moon rockets ;-)
Actually that wasn't why I used R, just a fun addendum. The reason to use R is the huge body of statistics, data mining and graphics facilities. Superb.
Of course, the problem with any statistical library is you have to turn your brain on first. Nothing produces "Garbage in Garbage out" quite like statistical analysis.
With R you tend to need to spend far more time thinking about why you are doing something, and what the answer means, than in, say, vanilla C or Ruby programming.
Which is actually not a Bad Thing at all.
The worst thing about R programming is its name. Googling for "R" turns up way too much noise and way too little signal.
And I also don't know why it is called R.
The guys who originally wrote both had first names that started with R and being the jokers that they were, they thought it would be funny to give it a name very similar to S.
The language is very well documented online and the mailing lists contain thousands of examples. It is primarily for statistical analysis, and the libraries available for doing such analysis are unparalleled.
Well.. maybe. Or Maybe not. But Definitely not sort of.
"I think it addresses a niche market for high-end data analysts that want free, readily available code," said Anne H. Milley, director of technology product marketing at SAS. She adds, "We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet."
Wow... talk about FUD. Does SAS indemnify against plane crashes?
Actually, it may not suck. But having used it on and off over the past few years while not being a statistics pro, I find the R language bletcherous and annoying. <- as an assignment operator?
http://www.arrrrrr.com/corsair.jpg
"The worst thing about R programming is its name. Googling for "R" turns up way too much noise and way too little signal."
Try searching from http://rseek.org/ instead of directly from Google.
I remember once years ago freaking my colleagues out with a largish app written in R... with nary a loop anywhere.
That's a feature of functional languages, a class that also includes Scheme and XSLT. The basic idea is that programs should not have state, because state makes them harder to debug. A for or while loop, by definition, has state, so you have to do your iteration some other way, such as tail recursion or higher-order functions like map and fold.
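A minimal R sketch of that loop-free style, using a toy task chosen only for illustration (summing the squares of 1 to n): the same result computed with an explicit loop, with recursion, with the higher-order `Reduce()`, and with vectorized built-ins. One caveat: base R does not automatically optimize tail calls, so deep recursion can still exhaust the stack; idiomatic loop-free R leans on vectorized functions and the `apply` family more than on tail recursion.

```r
# Toy task (for illustration only): sum of squares of 1..n, four ways.
n <- 10

# 1. Imperative: a for loop that mutates an accumulator (explicit state).
sum_sq_loop <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + i^2
  total
}

# 2. Recursive: no mutable variable; the result is built from return values.
#    Written in tail-recursive form, although R does not eliminate the tail call.
sum_sq_rec <- function(n, acc = 0) {
  if (n == 0) acc else sum_sq_rec(n - 1, acc + n^2)
}

# 3. Higher-order fold: Reduce() threads the accumulator through for us.
sum_sq_fold <- function(n) Reduce(function(acc, i) acc + i^2, seq_len(n), 0)

# 4. Vectorized: the loop lives inside compiled code behind ^ and sum().
sum_sq_vec <- function(n) sum(seq_len(n)^2)

results <- c(loop = sum_sq_loop(n), rec = sum_sq_rec(n),
             fold = sum_sq_fold(n), vec = sum_sq_vec(n))
print(results)  # all four are 385
```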
I suppose that makes sense, but I've never been able to teach myself to think that way. It's the main reason I never managed to get through The Wizard Book.
Are you kidding me? Are you really *(*$@#ing, Grade A kidding me?
Python/Perl/Ruby require interpreters. Scheme and Lisp are frequently run within interpreters. "stand-alone executable" require HARDWARE. Any programming system requires *something* underneath it unless you are programming in a purely physical system like an automated abacus with mechanical gears that buzz and whirr.
Programming languages are defined by their Turing completeness: can they do things repeatedly, can they assign values to memory locations and perform some basic set of operations (nand works nicely), can they make decisions. Everything else is fluff.
Perl has "fluff" that handles regular expressions very well.
Python (and others) have "fluff" that make networking and database ops easy.
R has "fluff" that makes it terribly convenient to work with data.
Matlab has "fluff" that makes it very easy to do numerical methods programming.
Mathematica has "fluff" that makes it very easy to do symbolic computation.
Each and every one of these, and most well-known languages, with all their warts and beauty marks are Turing complete and are deserving of the term "programming language".
Regards,
Mark
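y = c(1, 2, 3)              # placeholder value: any R object will do
z = function(n) n * 2       # placeholder function
x = vector(mode="list")     # an empty list, used as a hash/dictionary
x[["joe"]] = y
x[["bob"]] = z #z can be a function!
x = list(joe=y)             # equivalent: build the list directly...
x$bob = z                   # ...then add elements with $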
The R language (yes, it's a language; an interpreted language is a language too) has become the language of choice for statisticians (both academics and other statistical researchers) around the world. It is used in those cases where researchers feel the need for customized computations rather than a package like SAS or SPSS.
The reason that R has become popular is a snowball effect and history. It started as a FOSS reimplementation, written from scratch, of the "S" language designed for statistical work at Bell Labs (see http://en.wikipedia.org/wiki/S_(programming_language)). Some academics and researchers of repute used S because at that time (1975) it was very innovative and far better than most alternatives, and others followed. The S language gained a measure of acceptance among statisticians. Then, when R became available, the cycle intensified because of the much improved availability of the interpreter and its libraries. This cycle continued to the point that by now probably most professional statisticians use it.
As far as I can see, the R language isn't especially sophisticated or elegant, and may strike people used to more modern languages as a bit repugnant. It does however excel in three respects:
(a) it allows for easy access to Fortran and C library routines
(b) it allows you to pass large blobs of data by name
(c) it makes it easy to pass data to and from your own compiled C and Fortran routines
The first reason is particularly important because it allows one to use, e.g., precompiled linear algebra packages like LAPACK, Fourier transforms, or special-function evaluations, and thereby gain execution speeds comparable to C despite being an interpreted language (just like Matlab, Octave, Scilab, Gauss, Ox and suchlike): the hard work is carried out by a compiled library routine that is made easily accessible through the interpreted language. Any algorithm needed in statistics that's available as C or Fortran code can be linked in and called without too much effort.
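To make the first point concrete, here is a minimal sketch of a least-squares fit in which the interpreted code is only glue: `crossprod()` is typically backed by BLAS, `solve()` calls LAPACK, and `lm.fit()` uses a compiled QR decomposition. The simulated data, the seed, and the coefficient values are arbitrary assumptions for illustration.

```r
set.seed(42)                                     # arbitrary seed
n <- 1000
X <- cbind(1, matrix(rnorm(n * 3), ncol = 3))    # intercept + 3 predictors
beta_true <- c(2, -1, 0.5, 3)                    # made-up coefficients
y <- X %*% beta_true + rnorm(n, sd = 0.1)

# Normal equations (X'X) b = X'y: the heavy lifting happens in compiled code.
beta_hat <- solve(crossprod(X), crossprod(X, y))
print(round(drop(beta_hat), 2))

# Cross-check against lm.fit(), which solves the same problem via QR.
fit <- lm.fit(X, drop(y))
stopifnot(isTRUE(all.equal(unname(drop(beta_hat)), unname(fit$coefficients))))
```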
The second reason is important because it slows down execution much less than any pass-by-value interpreted language would: R only copies an argument if the function actually modifies it (and then it changes the copy, not the caller's data).
The third reason is particularly important because it helps researchers be more productive. Reading in your data, examining it, graphing it, tracing outliers and cleaning them up is best done in an interactive environment in an interpreted language. Coding such things in C or Fortran is an awful waste of time, and besides, researchers aren't code-monkeys and don't enjoy coding inane for-loops to read, clean, and display data. Vector and matrix primitives are far more powerful, and usually preferable unless they are so inefficient that you have to wait for the result. However, there are times when you just need to carry out standard algorithms (linear algebra, calculation of mathematical or statistical functions) or simply time-consuming repetitive algorithms that run so much faster in a genuine compiled language. You could start out by coding the algorithm in an interpreted language to check if it's working, and then isolate the computationally expensive part and code it up in C or Fortran. Using R (or Matlab or Scilab) you can *call* the compiled subroutine, pass it your (cleaned) data, and get the result back in an environment where you can easily analyze it.
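A minimal sketch of the read/examine/trace/clean cycle done with vector primitives instead of hand-written loops. It assumes a small made-up vector (in a real session the data would come from something like `read.csv()`) and uses the common, but by no means universal, rule that flags points more than 1.5 IQRs outside the quartiles.

```r
# Made-up measurements containing two obvious data-entry errors.
measurements <- c(9.8, 10.1, 10.0, 9.9, 10.3, 98.0, 10.2, -1.0, 10.0, 9.7)

summary(measurements)                 # examine: quartiles, mean, range

# Flag outliers with the 1.5 * IQR rule, entirely vectorized.
q <- quantile(measurements, c(0.25, 0.75))
fence <- 1.5 * diff(q)
is_outlier <- measurements < q[1] - fence | measurements > q[2] + fence

which(is_outlier)                     # trace: positions 6 and 8
cleaned <- measurements[!is_outlier]  # clean: drop them in one expression
mean(cleaned)
```

The rule itself is a judgment call; the point is that each step is a single vectorized expression typed at the prompt.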
That's why languages like R, Matlab, Scilab, Octave, Gauss, and Ox are so productive: you get the best of both worlds. Both the convenience, interactiveness, and terseness of a high-level interpreted language and the speed of compiled languages.
So why R, and why not Gauss or Matlab or whatever?
Well, part of that is cultural. If you're an econometrician you'll have been weane
I wish it had a more googleable name. It's hard to search for help. The signal to noise ratio is low.
Your comment is absolutely wrong.
http://en.wikipedia.org/wiki/Programming_language
R is a Turing complete programming language. The fact that it requires an interpreter is completely irrelevant.
Actually, R is a real (Turing-complete) programming language like Perl, Python, Ruby, etc. It just happens to have lots of statistical libraries and matrix-oriented functions.
You put #!/usr/bin/Rscript in your first line and it can work just like any other scripting language, with command-line arguments, etc. I use it all the time as a replacement for other scripting languages (think PDL+Perl or Numpy+Python).
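A minimal sketch of such a script, assuming a hypothetical file name `stats.R` that prints summary statistics for the numbers given on the command line. `commandArgs(trailingOnly = TRUE)` returns only the arguments after the script name; on some systems the shebang is better written as `#!/usr/bin/env Rscript`, since Rscript may not live in `/usr/bin`.

```r
#!/usr/bin/Rscript
# Hypothetical tool: summary statistics for numbers passed as arguments.
summarize_args <- function(args) {
  values <- suppressWarnings(as.numeric(args))   # non-numbers become NA
  if (length(values) == 0 || anyNA(values)) {
    stop("usage: stats.R NUMBER [NUMBER ...]")
  }
  c(n = length(values), mean = mean(values), sd = sd(values))
}

args <- commandArgs(trailingOnly = TRUE)
if (length(args) > 0) print(summarize_args(args))
```

Usage: `chmod +x stats.R` and then `./stats.R 3 1 4 1 5`.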
R is an excellent language for any scientist. The syntax and semantics of the language are very well thought-out.
'R' is not a general-purpose programming language, but that hardly means it's not a language. Producing a stand-alone executable is not a feature of any language; it's a feature of the tool set.
"And I also don't know why it is called R"
"The guys who originally wrote both had first names that started with R and being the jokers that they were, they thought it would be funny to give it a name very similar to S."
Additionally, in statistics r is the letter used to denote the Pearson product-moment correlation coefficient.
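For reference, Pearson's r is the covariance of two variables divided by the product of their standard deviations. A small sketch with made-up vectors checks R's `cor()` (whose default `method` is `"pearson"`) against that definition written out by hand:

```r
x <- c(1, 2, 3, 4, 5)          # made-up data
y <- c(2, 4, 5, 4, 5)

r_builtin <- cor(x, y)          # Pearson by default
r_manual  <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

stopifnot(isTRUE(all.equal(r_builtin, r_manual)))
r_builtin
```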
In the land of the blind, the one-eyed man is king.
I would argue that GP is confusing "programming language" with "general-purpose programming language".
I bet even SQL is Turing-complete, but I wouldn't want to do more than database operations with it.
Tell me about it. Try this:
http://www.rseek.org/
Have you considered using Sage (www.sagemath.org)? It is FOSS and has a highly active community and developer support. I'd suggest reading the tour at http://www.sagemath.org/tour.html and seeing what you think.
I remember once years ago freaking my colleagues out with a largish app written in R... with nary a loop anywhere.
I'm sure you had plenty of loops in your code. They were just hidden inside built-in functions. Not that that's a bad thing... just saying. You have to understand the mechanics of the calculations to use them properly, and over-reliance on built-in functions can make it too easy to talk out of your ass.
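One small, concrete case of needing to know the mechanics: R's `sd()` and `var()` use the sample (n - 1) denominator, not the population (n) one. A sketch with an arbitrary vector spells out the computation the built-in hides:

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # arbitrary example data
n <- length(x)

sample_sd     <- sqrt(sum((x - mean(x))^2) / (n - 1))  # what sd() computes
population_sd <- sqrt(sum((x - mean(x))^2) / n)        # what it does not

stopifnot(isTRUE(all.equal(sd(x), sample_sd)))
c(sd = sd(x), sample = sample_sd, population = population_sd)
```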
are you talking about R or S? searching for "R" on google returns pretty good results--the first 6 links are all related to R. and 4 of the results on the next page are also related to R. searching for "S" on the other hand doesn't immediately come up with any relevant results.
i'd say it's fairly easy to find info on R using google considering its limited popularity relative to other languages. obviously you're not going to find a ton of information on it since it's a somewhat obscure niche language. but if you can find the r-project/CRAN website or other R resources on google, then you can probably find documentation for whatever info you need.
besides, you can always use multiple keywords and boolean search operators to narrow down your search results, like searching for "R" AND "statistics." or once you've found online documentation for "R" you can use the "site:" modifier to search that site only.
i mean, this is all pretty basic stuff. there are much harder things to search for information on--like pharmaceutical drugs. this just requires basic knowledge of search engines and a little common sense.
You didn't have any friends in the 3rd grade.
Labview is well designed for its intent: someone with minimal programming skills can sit down and get something done in a short amount of time. Would I use it for crunching numbers or collecting terabytes of data? Probably not. But it's sure damn handy if you want to interface with test equipment and get results. It's all about the best tool for the job.
You have to play with it. As with APL you'll either love it or hate it.
If you like the idea of a language that includes relational tables as a primitive data type, that extends most operators to do the right thing when you feed them vectors and matrices, that has linear regression and equation solving built-in, you'll probably like R.
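A minimal sketch of those features together, assuming a small invented data frame (R's built-in table type, the closest thing to the relational tables the comment mentions): element-wise arithmetic on a whole column, a linear regression with the built-in `lm()` formula interface, and a small linear system solved with `solve()`.

```r
# Invented data: hours of study vs. test score.
df <- data.frame(
  hours = c(1, 2, 3, 4, 5, 6),
  score = c(52, 55, 61, 64, 70, 73)
)

# Operators apply element-wise to whole columns, no loop needed.
df$score_pct <- df$score / 100

# Built-in linear regression: score modelled as a straight line in hours.
fit <- lm(score ~ hours, data = df)
coef(fit)                      # intercept and slope

# Equation solving: 2a + b = 5 and a - b = 1.
sol <- solve(matrix(c(2, 1, 1, -1), nrow = 2, byrow = TRUE), c(5, 1))
sol                            # a = 2, b = 1
```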
Matlab supports producing a stand-alone executable that does not require the Matlab environment.
Or that if you don't have Javascript enabled.
Say you realize that you need to check for another corner case that you forgot, or need to extend a function for another purpose, or whatever. In any other language, you would type a few lines of code and be done with it. Not with labview. With labview you have to move things around to make room for the new code, disconnect wires and reconnect them. NI has added stuff into the newer version to help with this (auto growing, etc) but it still turns into a mess in short order.
Other things, like equations, are just easier to type than to draw, and also easier to read as text than as a schematic. So much so that they have added the ability to type portions of the code, but the amount of setup you need to do with a code block often cancels out the time you save by using it.
As someone who likes "clean code" I find LabView much more tedious and time consuming to keep neat, and when dealing with other coders that are not as picky, I find that their LabView code is much messier and harder to read than Java or C code by the same developer.
Do you happen to have a link explaining what you mean by "a program should not have state"? Because, I mean, that seems antithetical to the nature of a program.
Of course there is a state, you're using a standard computer to run the program, so there must be a state somewhere. Still, the point is that even if the language implementation works by changing the computer's memory state, the abstraction you use to program isn't state-based. In a pure functional programming language, you don't program by manipulating a state, but by computing the results of functions.
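A small R sketch of the distinction, with hypothetical function names: the first counter keeps hidden state in an enclosing environment and mutates it with `<<-`, so the same call returns different results; the pure version takes the old count and returns the new one, so each result depends only on the argument.

```r
# Stateful: the closure mutates `count` in its enclosing environment.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1   # side effect: state changes between calls
    count
  }
}
tick <- make_counter()
tick()   # 1
tick()   # 2 (same call, different result)

# Pure: no hidden state; the "state" is passed in and returned explicitly.
next_count <- function(count) count + 1
c1 <- next_count(0)
c2 <- next_count(c1)
next_count(0)   # always 1, however often it is called
```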
Regarding the SICP book, like most functional programming languages, Scheme isn't a pure functional language. It contains constructs with side effects, which actually change the program state directly. Such constructs are available because there are problems that are very difficult (but not impossible) to handle with pure functional programming, so language designers end up making compromises.
Just my 2 (Euro) cents
Labview is utterly non-deterministic in its execution. The execution order of blocks does NOT follow the data flow of the lines joining them if there are more than a handful of blocks present. In fact, the execution sequence becomes random, and changes randomly when block positions are changed (even without changing the data connectivity). This forces the use of explicit sequence structures in any non-trivial function, increasing its complexity and opacity. Just try synchronizing shared data between asynchronous loops. Even their Knowledgebase admits that there's no way to do it properly.
And let's not get started on the crappy content of Labview's documentation. It's organized and formatted tolerably well, but the content is vacuous. Hardly any functions have any suggestion of their behaviour when faulty data arrives (e.g. a NaN), for example.
It could be worse. Try searching for the natural language processing system "Lolita".
Such constructs are available because there are problems that are very difficult (but not impossible) to handle with pure functional programming, so language designers end up making compromises.
Not that difficult, really. The main problem is sequencing, which is provided by things like function composition. In languages like Scheme, specifying sequencing with pure functions, while also handling data that doesn't require sequencing, is unwieldy; but this is a syntactic problem, not a practical one.
The issue is handled admirably by the language Haskell, using a mathematical construct called a "monad" to allow an elegant way of handling sequencing--even a syntactic sugar "do" notation that looks vaguely imperative--while remaining 100% pure, unlike Scheme.
No, it's P. It's like R but it's missing a leg!
I agree with the first part of your post: to me R is something to code in when you have to, and to keep the resulting code as short and simple as possible. If I ever had to code a real application with a GUI that needed the statistical strengths of R, I would almost certainly not use R.
On the other hand I'd probably use Java and link to R as a server (see my other post about R and Java) instead of using Python.
I tried. The search results are extremely relevant to my interests. Of course, I don't use the natural language processing system "Lolita".