The Power of the R Programming Language
BartlebyScrivener writes "The New York Times has an article on the R programming language. The Times describes it as: 'a popular programming language used by a growing number of data analysts inside corporations and academia. It is becoming their lingua franca partly because data mining has entered a golden age, whether being used to set ad prices, find new drugs more quickly or fine-tune financial models. Companies as diverse as Google, Pfizer, Merck, Bank of America, the InterContinental Hotels Group and Shell use it.'"
... most others keep thinking that M$ Excel is the silver bullet.
Sad, but f****** true.
R!
Growing in use? sure.
The Kruger Dunning explains most post on
Why they chose to go with R rather than T, I'll never know.
There appear to be duplicate links in the summary :)
...if at first you don't succeed, then skydiving is not for you.
Weaselmancer
ridiculous.
My request is to those that are in the know to show me some example code, that does something useful. Then later, compare that code to code from other languages to accomplish the same task.
Include reasons to support the notion that the R language is [necessarily] better at what it does.
Now you'll have to do it again
My UID is prime... is yours?
All that scrambled verses and you forgot the part where the Nazi torpedoed Noah's Ark so it ran aground at the mountains of Ararat.
Oh god... cue pirate jokes.
Very true. This is what I try to explain to people when they can't understand why some software is given away gratis. Because if they charged for it, given the current attitudes of the market, they wouldn't stand a chance and wouldn't ever get any market share to begin with.
Billy Brown rides on. Yolanda Green bypasses Gary White.
Good thing Boeing's not using free software for aircraft simulation tools, space station labs, sub hunters, or moon rockets ;-)
education is no substitute for intelligence
Actually that wasn't why I used R, just a fun addendum. The reason to use R is the huge body of statistics, data mining and graphics facilities. Superb.
Of course, the problem with any statistical library is you have to turn your brain on first. Nothing produces "Garbage in Garbage out" quite like statistical analysis.
With R you tend to need to spend far more time thinking about why you are doing something, and what the answer means than in say vanilla C/Ruby programming.
Which is actually not a Bad Thing at all.
The worst thing about R programming is its name. Googling for "R" turns up way too much noise and way too little signal.
The language is very well documented online and the mailing lists contain thousands of examples. It is primarily for statistical analysis, and the libraries available for doing such analysis are unparalleled.
Well.. maybe. Or Maybe not. But Definitely not sort of.
"I think it addresses a niche market for high-end data analysts that want free, readily available code," said Anne H. Milley, director of technology product marketing at SAS. She adds, "We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet."
Wow...talk about FUD. Does SAS indemnify against plane crashes?
Actually it may not suck. But having used it on and off over the past few years while not being a statistics pro, I find the R language bletcherous and annoying. <- as an assignment operator?
Anne H. Milley, director of technology product marketing at SAS ... adds, "We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet."
Help fight continental drift.
http://www.arrrrrr.com/corsair.jpg
Calling R a programming language is like calling Mathematica or Matlab a language. R is a system for statistical tasks that has a language and syntax, but it is not capable of producing stand-alone executables that do not require the entire R environment.
So, you're saying java, js, python, perl, and ruby aren't programming languages?
"The worse thing about R programming is its name. Googling for "R" turns up way to much noise and way too little signal"
Try searching from http://rseek.org/ instead of directly from Google.
The "smart set" needs such a high-level lingua franca to express infinite-precision financial models of no accuracy whatsoever!
Seastead this.
I remember once years ago freaking my colleagues out with a largish app written in R... with nary a loop anywhere.
That's a feature of functional languages, a class that also includes Scheme and XSLT. The basic idea is that programs should not have state, because state makes them harder to debug. A for or while loop, by definition, has state, so you have to do your iteration some other way, namely Tail Recursion.
I suppose that makes sense, but I've never been able to teach myself to think that way. It's the main reason I never managed to get through The Wizard Book.
I retract my sole criticism of R.
It's amazing how often I hear people refer to Matlab as a language (mostly engineering professors).
Are you kidding me? Are you really *(*$@#ing, Grade A kidding me?
Python/Perl/Ruby require interpreters. Scheme and Lisp are frequently run within interpreters. "Stand-alone executables" require HARDWARE. Any programming system requires *something* underneath it unless you are programming in a purely physical system like an automated abacus with mechanical gears that buzz and whirr.
Programming languages are defined by their Turing completeness: can they do things repeatedly, can they assign values to memory locations and perform some basic set of operations (nand works nicely), can they make decisions. Everything else is fluff.
Perl has "fluff" that handles regular expressions very well.
Python (and others) have "fluff" that makes networking and database ops easy.
R has "fluff" that makes it terribly convenient to work with data.
Matlab has "fluff" that makes it very easy to do numerical methods programming.
Mathematica has "fluff" that makes it very easy to do symbolic computation.
Each and every one of these, and most well-known languages, with all their warts and beauty marks are Turing complete and are deserving of the term "programming language".
Regards,
Mark
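# Build a named list element by element (y and z stand for any existing objects):
x = vector(mode="list")
x[["joe"]] = y
x[["bob"]] = z  # z can be a function!
# Or, equivalently:
x = list(joe=y)
x$bob = z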
Or at least in the context it's made out to be in this article. Isn't it a language suited mostly to statistics? For that use, I hear that it's one of the best.
It's also (hence the name) an open-source implementation of the much older S platform. The article distorts its history to the point of dishonesty.
What I'm listening to now on Pandora...
The R language (yes, it's a language; an interpreted language is a language too) has become the main computer language of statisticians (both academics and sundry statistical researchers) around the world. It is used in those cases where researchers feel the need for customized computations rather than the use of a package like SAS or SPSS.
R has become popular because of a snowball effect and history. It started as a FOSS re-implementation-from-scratch of the "S" language designed for statistical work at Bell Labs (see http://en.wikipedia.org/wiki/S_(programming_language)). Some academics and researchers of repute used it (the S language) because at that time (1975) it was very innovative and far better than most alternatives, and others followed. The S language gained a measure of acceptance among statisticians. Then when R became available the cycle intensified because of the much improved availability of the interpreter and its libraries. This cycle continued to the point that by now probably most professional statisticians use it.
As far as I can see, the R language isn't especially sophisticated or elegant, and may strike people used to more modern languages as a bit repugnant. It does however excel in three respects:
(a) it allows for easy access of Fortran and C library routines
(b) it allows you to pass large blobs of data by name
(c) it makes it easy to pass data to and from your own compiled C and Fortran routines
The first reason is particularly important because it allows one to use e.g. pre-compiled linear algebra packages like LAPACK, or Fourier Transforms, or special function evaluations and thereby gain execution speeds comparable to C despite being an interpreted language (just like Matlab, Octave, Scilab, Gauss, Ox and suchlike): the hard work is carried out by a compiled library routine which is made easily accessible through the interpreted language. Any algorithm needed in statistics that's available as C or Fortran code can be linked in and called without too much effort.
The second reason is important because it slows down execution much less than any pass-by-value interpreted language would, and it allows you to change data that is passed into a function.
The third reason is particularly important because it helps researchers be more productive. Reading in your data, examining it, graphing it, tracing outliers and cleaning them up is best done in an interactive environment in an interpreted language. Coding such things in C or Fortran is an awful waste of time, and besides, researchers aren't code-monkeys and don't enjoy coding inane for-loops to read, clean, and display data. Vector and matrix primitives are far more powerful, and usually preferable unless they are so inefficient that you have to wait for the result. However, there are times when you just need to carry out standard algorithms (linear algebra, calculation of mathematical or statistical functions) or simply time-consuming repetitive algorithms that run so much faster in a genuine compiled language. You could start out by coding the algorithm in an interpreted language to check if it's working, and then isolate the computationally expensive part and code it up in C or Fortran. Using R (or Matlab or Scilab) you can *call* the compiled subroutine, pass it your (cleaned) data, and get the result back in an environment where you can easily analyze it.
That's why languages like R, Matlab, Scilab, Octave, Gauss, and Ox are so productive: you get the best of both worlds. Both the convenience, interactiveness, and terseness of a high-level interpreted language and the speed of compiled languages.
So why R, and why not Gauss or Matlab or whatever?
Well, part of that is cultural. If you're an econometrician you'll have been weane
The worst thing about R programming is its name. Googling for "R" turns up way too much noise and way too little signal.
use RSeek.org
Problem solved.
The worst thing about R programming is its name. Googling for "R" turns up way too much noise and way too little signal.
Yep. There are a couple of dedicated R search engines that can help with that: http://www.dangoldstein.com/search_r.html and http://www.rseek.org/. It may also sometimes be useful to Google on "Splus (whatever)" since most R and S+ code is pretty much interchangeable.
The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.
Your comment is absolutely wrong.
http://en.wikipedia.org/wiki/Programming_language
R is a Turing complete programming language. The fact that it requires an interpreter is completely irrelevant.
I think we all know how well that's turned out, eh? So is that the fault of the language or programmer error?
What's libc again? Oh, that's right, it's something C programs generally need to run. So you're only programming in C if you don't use libc or if you link statically? How awesome it is to have an "I am actually programming" flag in your compiler and linker!
For every problem, there is at least one solution that is simple, neat, and wrong.
Note that the truth of your statement does not change when you replace "R" by "python" (and remove the word "statistical"). Nevertheless, I would still call python a programming language.
The lesson here is that a sufficiently large corporation is indistinguishable from government. --ultranova
Actually, R is a real (Turing-complete) programming language like Perl, Python, Ruby, etc. It just happens to have lots of statistical libraries and matrix-oriented functions.
You put #!/usr/bin/Rscript in your first line and it can work just like any other scripting language, with command-line arguments, etc. I use it all the time as a replacement for other scripting languages (think PDL+Perl or Numpy+Python).
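For anyone who hasn't tried this, here is a minimal sketch of what such a script might look like. The file name mean.R and the helper parse_numbers() are made up for illustration; commandArgs() is the standard way to get at the arguments, and the Rscript path depends on your install.

```r
#!/usr/bin/Rscript
# Hypothetical script: print the mean of the numbers given on the command line.
# Usage (after chmod +x mean.R):  ./mean.R 1 2 3.5

# Convert command-line strings to numbers, failing loudly on bad input.
parse_numbers <- function(args) {
  values <- suppressWarnings(as.numeric(args))
  if (length(values) == 0 || anyNA(values)) {
    stop("usage: mean.R NUMBER [NUMBER ...]")
  }
  values
}

# trailingOnly = TRUE drops R's own arguments and keeps only the user's.
args <- commandArgs(trailingOnly = TRUE)
if (length(args) > 0) {
  cat(mean(parse_numbers(args)), "\n")
}
```

Keeping the parsing in its own function means the same file can also be source()'d from an interactive session without printing anything.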
R is an excellent language for any scientist. The syntax and semantics of the language are very well thought-out.
'R' is not a general programming language but that hardly means it's not a language. Producing a stand-alone executable is not a feature of any language, it's a feature of the tool set.
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
The worst thing about R programming is its name. Googling for "R" turns up way too much noise and way too little signal.
I see your addendum that it works better now, but searching using just a single letter or word usually doesn't work well, you need to give it context. You can enter "tank" into Google and it's going to give you a lot of mixed results because the results could be for fuel, water, waste, military armor or some other use, so you need to give Google something to narrow down the results. So "R programming language" would probably be a base to start from.
I would argue that GP is confusing "programming language" with "general-purpose programming language".
I bet even SQL is Turing-complete, but I wouldn't want to do more than database operations with it.
Don't thank God, thank a doctor!
You could compile an R program if you wanted to, and if you were willing to write a compiler. Same with Matlab. Your argument is like saying that AWK is not a programming language, because it is interpreted.
Palm trees and 8
Matlab is a programming language. Let me guess, you also think that Rexx and AWK are not programming languages, because they have weird syntax and are specialized for certain tasks?
Palm trees and 8
Even ASM requires a minimal environment, it just happens that its environment is provided directly by the hardware.
Most languages have "fluff" that makes it terribly convenient to work with "data". Perl's fluff helps it deal with string data, for example. You really have to be a bit more specific.
That sounds extremely weird: if a program has a stack, then it has a state - the location on the stack is still state. Thus, if you use recursion, you still have state. I mean, you can try to hide the fact that you have state, but I don't see how you can have a program without state.
Even the wizard book appears to have a chapter on state: http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-19.html#%25_chap_3 , but, unlike your description, instead of talking about a program without state, it considers two kinds of state: the state of objects, or the state of streams of data.
Do you happen to have a link to what you mean by "a program should not have state"? Because, I mean, that seems antithetic to the nature of a program.
Have you considered using sage (www.sagemath.org)? It is FOSS and has highly active community and developer support. I'd suggest reading the tour http://www.sagemath.org/tour.html and seeing what you think.
I remember once years ago freaking my colleagues out with a largish app written in R... with nary a loop anywhere.
I'm sure you had plenty of loops in your code. They were just hidden via the use of built-in functions. Not that that's a bad thing.... just saying. You have to understand the mechanics of the calculations to use them properly, and over-reliance on built-in functions can make it too easy to talk out of your ass.
You didn't have any friends in the 3rd grade.
Ross Ihaka was my supervisor for my honours dissertation last year. Reading this article was a bit amusing for me. If you asked him for his opinion about R, let me say he wouldn't have written such glowing words about it! He doesn't like being in the limelight all that much either, from what I have been able to tell.
I may have some idea what is meant by the "wanting to create more advanced software" at the end of the article. At the moment he is tinkering away rebuilding the guts of R in Lisp. He reckons if "things were done properly", R would be orders of magnitude faster and more efficient. For example, when fitting a linear model, several copies of the data matrix are made when performing the matrix operations required to find all the coefficients, working out diagnostic matrices, et cetera.
So if anyone out there wants to contribute to R, now would be a good time to volunteer.
Labview is well designed for its intent. So someone with minimal programming skills can sit down and get something done in a short amount of time. Would I use it for crunching numbers or collecting terabytes of data? Probably not. But it's sure damn handy if you want to interface test equipment and get results. It's all about the best tool for the job.
Only the State obtains its revenue by coercion. - Murray Rothbard
You have to play with it. As with APL you'll either love it or hate it.
If you like the idea of a language that includes relational tables as a primitive data type, that extends most operators to do the right thing when you feed them vectors and matrices, that has linear regression and equation solving built-in, you'll probably like R.
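To make that concrete, here is a small sketch of those features using only base R. The table and its numbers are made up; data.frame(), lm() and solve() are the standard base/stats functions.

```r
# A data frame is R's built-in relational table: named, typed columns.
# The numbers below are made up for illustration.
cars_tbl <- data.frame(
  weight = c(1.0, 1.5, 2.0, 2.5, 3.0),
  mpg    = c(40, 35, 30, 25, 20)
)

# Arithmetic operators apply element-wise to whole vectors.
kg <- cars_tbl$weight * 1000

# Linear regression is a single call; mpg is exactly 50 - 10 * weight here.
fit <- lm(mpg ~ weight, data = cars_tbl)
coefs <- unname(coef(fit))

# solve(A, b) solves the linear system A %*% x = b.
A <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)
b <- c(3, 5)
x <- solve(A, b)
```

Whether you find this style pleasant is exactly the love-it-or-hate-it question.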
I'm a Programmer. That's one level above Software Engineer and one level below Engineer.
Matlab supports production of a stand alone executable from Matlab that does not require the Matlab environment.
Tesla was a genius. Edison however was an overrated hack who liked to torture puppies.
I think he says "state" when he means "heap."
Don't blame me, I voted for Baltar.
Actually, calculations in R are just vectorized. Internally, there must be some looping, but the language hides it.
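A tiny illustration of the point, using made-up input. All three versions compute the same thing; only the first spells the loop out in R code.

```r
x <- c(1, 4, 9, 16)

# Explicit loop: visits each element in interpreted R code.
roots_loop <- numeric(length(x))
for (i in seq_along(x)) {
  roots_loop[i] <- sqrt(x[i])
}

# Vectorised call: same result, the loop runs inside sqrt()'s compiled code.
roots_vec <- sqrt(x)

# Higher-order helper: sapply hides the iteration behind a function call.
roots_apply <- sapply(x, sqrt)
```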
is that SAS is not amused of the competition
Say you realize that you need to check for another corner case that you forgot, or need to extend a function for another purpose, or whatever. In any other language, you would type a few lines of code and be done with it. Not with labview. With labview you have to move things around to make room for the new code, disconnect wires and reconnect them. NI has added stuff into the newer version to help with this (auto growing, etc) but it still turns into a mess in short order.
Other things are just easier to type than to draw, and also easier to read in text than as a schematic, like equations. So much so that they have added the ability to type portions of the code, but the amount of setup that you need to do with a code block often defeats the time benefit you get from using it.
As someone who likes "clean code" I find LabView much more tedious and time consuming to keep neat, and when dealing with other coders that are not as picky, I find that their LabView code is much messier and harder to read than Java or C code by the same developer.
What will happen to the naming convention of programming languages when we run out of letters of the alphabet?
R's fluff helps in manipulating data in statistical contexts, while actually providing support for non-vector/matrix data, unlike the abomination that is Matlab. Seriously, just try manipulating non-matrix data in Matlab. You'd be sooner pulling your teeth out, or more likely, using R.
Do you happen to have a link to what you mean by "a program should not have state"? Because, I mean, that seems antithetic to the nature of a program.
Of course there is a state, you're using a standard computer to run the program, so there must be a state somewhere. Still, the point is that even if the language implementation works by changing the computer's memory state, the abstraction you use to program isn't state-based. In a pure functional programming language, you don't program by manipulating a state, but by computing the results of functions.
Regarding the SICP book, like most functional programming languages, Scheme isn't a pure functional language. It contains constructs with side effects, which actually change the program state directly. Such constructs are available because there are problems that are very difficult (but not impossible) to handle with pure functional programming, so language designers end up making compromises.
Just my 2 (Euro) cents
Like many posters have already said, the syntax of R is terribly outdated, and that's a first problem with me (I started programming with Python, go figure). But the main problem I have with R is the performance. A lot of functions and packages are dead slow or quite memory hungry (compared to a, say, C++ equivalent - for the initiated, check out the performance of rma from Bioconductor with RMAExpress, which is written in C++).
Another issue I have is not with R itself, but with its most popular add-on, the Bioconductor suite, widely used in bioinformatics. The packages' quality varies a great deal, and there's no way to file bug reports (unlike R itself, which has a bug tracker) short of emailing the authors, who, being academics, may not even have the time/will to reply to you. I'd love to see stuff like Bioconductor in a more recent programming language, but I doubt it - doing this kind of stuff doesn't give you any funding.
A CC-licensed illustrated horror novel
Labview is utterly non-deterministic in its execution. The execution order of blocks does NOT follow the data flow of the lines joining them if there are more than a handful of blocks present. In fact, the execution sequence becomes random, and changes randomly when block positions are changed (even without changing the data connectivity). This forces the use of explicit sequence structures in any non-trivial function, increasing its complexity and opacity. Just try synchronizing shared data between asynchronous loops. Even their Knowledgebase admits that there's no way to do it properly.
And let's not get started on the crappy content of Labview's documentation. It's organized and formatted tolerably well, but the content is vacuous. Hardly any functions have any suggestion of their behaviour when faulty data arrives (e.g. a NaN), for example.
Those who can make you believe absurdities can make you commit atrocities. - Voltaire
That's not what I understand to be functional programming. The basic rule of functional programming is "no side effects", so the same input(s) will always produce the same output. Or, to put it another way, a function doesn't maintain state between invocations. This isn't incompatible with loops. For example, you could write a function to calculate the fibonacci series up to n which internally used a loop, and still call it functional.
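A sketch of that Fibonacci example in R. The name fib_upto() is made up, and I'm reading "up to n" as "the first n terms", which is an assumption.

```r
# Returns the first n Fibonacci numbers (1, 1, 2, 3, ...).
# The loop mutates only local variables, so the function has no side
# effects: the same n always yields the same vector.
fib_upto <- function(n) {
  if (n <= 0) return(numeric(0))
  out <- numeric(n)
  for (i in seq_len(n)) {
    out[i] <- if (i <= 2) 1 else out[i - 1] + out[i - 2]
  }
  out
}
```

From the caller's side there is no way to tell whether the body loops or recurses, which is the sense in which it still counts as functional.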
Other common ones are filter and fold. For loops and the like that require state, you typically use tail recursion, and most functional programming languages trivially extend that optimization to the general case of tail call optimization: if the return value of a function F is simply the return value of a call to function G, there is no need to retain F's stack frame, as F's result is G's result.
* C and C++ allow for the use of function pointers, which can allow a functional style. C++ is better than C in this regard, as the STL also comes with a set of functional algorithms such as map.
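Since the thread is about R: base R ships these higher-order functions as Map(), Filter() and Reduce() (Reduce being the fold). A minimal sketch with made-up input:

```r
nums <- 1:10

# Filter keeps the elements for which the predicate is TRUE.
evens <- Filter(function(v) v %% 2 == 0, nums)

# Map applies a function to each element and returns a list.
squares <- unlist(Map(function(v) v^2, evens))

# Reduce folds a binary function over the values, left to right.
total <- Reduce(`+`, squares)
```

In everyday R you would more likely write sum(evens^2), since the vectorised operators already do the mapping.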
R is statistics oriented, where Octave and MATLAB are more general mathematical computing environments.
if you are coming from a programming background, I assure you that you'll hate gui oriented tools like spss. if you have a slightly better understanding of probability and the notion of sampling, you'll find that the way r approaches data as a whole feels very nice for a developer.
in data analysis, you'll be transforming, filtering typecasting data. you'll be turning numbers into nominal values, you'll be sampling from complex distributions etc. writing code to do these lets you stay in complete control of what you're doing and at least for me looking at a function line by line is a much better way of seeing what I'm doing, instead of clicking on icons and selecting menu items in a particular order. the whole process is documented in code, and for me it is much easier to map the things I'm doing to statistics.
the downside of R is, as you start dealing with more and more data, things become a little bit harder since R takes all data into memory and processes it in memory. scaling into very large amounts of data is not easy, at least I can't afford as much ram as necessary (my data sets are really huge). the most important advantage of some commercial tools in my work is that they can treat data as a stream on disk, and even if it takes longer, it is possible to process huge amounts of data.
R forces you to think about what you are doing, instead of hiding behind spss and saying, well these are the results spss has given us, when we clicked these buttons and menus. my 2 cents of course.
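On the memory point above: for simple aggregates you can work around it by streaming a file through a connection in chunks. This is only a sketch under simple assumptions (one number per line, a made-up temporary file, a hypothetical helper chunk_mean()), not a substitute for the disk-based commercial tools mentioned.

```r
# Write a small made-up data set to a temporary file, one number per line.
path <- tempfile(fileext = ".txt")
writeLines(as.character(1:1000), path)

# Stream the file in fixed-size chunks, keeping only running totals in memory.
chunk_mean <- function(path, chunk_size = 100) {
  con <- file(path, open = "r")
  on.exit(close(con))
  total <- 0
  count <- 0
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break
    values <- as.numeric(lines)
    total <- total + sum(values)
    count <- count + length(values)
  }
  total / count
}

result <- chunk_mean(path)
unlink(path)
```

This only helps for statistics that can be accumulated incrementally; anything that needs the whole data set at once still has to fit in RAM.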
Actually, R hasn't used linked lists to represent its "list" data structure for a long time. So it is very much like python indeed.
I remember once years ago freaking my colleagues out with a largish app written in R... with nary a loop anywhere.
That's a feature of functional languages, a class that also includes Scheme and XSLT. The basic idea is that programs should not have state, because state makes them harder to debug. A for or while loop, by definition, has state, so you have to do your iteration some other way, namely Tail Recursion.
I suppose that makes sense, but I've never been able to teach myself to think that way. It's the main reason I never managed to get through The Wizard Book.
R has a scheme-like lower layer but it feels more like APL with its array manipulation capabilities than Scheme. It does not support tail recursion.
Or, indeed, Java. Language != implementation.
Reality is the ultimate Rorschach.
Such constructs are available because there are problems that are very difficult (but not impossible) to handle with pure functional programming, so language designers end up making compromises.
Not that difficult, really. The main problem is sequencing, which is provided by things like function composition. The problem is the unwieldy nature in languages like Scheme of specifying sequencing using pure functions, while also handling data that doesn't require sequencing; but this is a syntactic problem, not a practical one.
The issue is handled admirably by the language Haskell, using a mathematical construct called a "monad" to allow an elegant way of handling sequencing--even a syntactic sugar "do" notation that looks vaguely imperative--while remaining 100% pure, unlike Scheme.
I think he says "state" when he means "heap."
More likely he's thinking of something along the lines of "maintaining referential transparency". That is, the only sense of identity a variable has is its value, not its storage location, meaning that complicated data structures can't be altered, only used to construct new, different versions.
This incidentally makes multithreading a lot easier, too. :)
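R is not a pure functional language, but ordinary function arguments do behave roughly this way: a function that "changes" its argument is really building a new value. A small sketch with a made-up data frame and a hypothetical helper add_column():

```r
# A function that "modifies" its argument only changes a local copy.
add_column <- function(df) {
  df$total <- df$a + df$b
  df
}

original <- data.frame(a = 1:3, b = 4:6)
extended <- add_column(original)
```

After the call, original still has only columns a and b; the new column exists only in extended.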
... but for some curious reason it requires a fortran compiler as well as a C compiler to build the whole system.
Sorry R team, but there's no way I'm installing an entire fortran compiler just to install your system.
Why can't they write the whole thing in C for heaven's sake? The days when C couldn't handle floating point very well (if that's the reason) are long gone.
No, it's P. It's like R but it's missing a leg!
Actually, R is an imperative language. You can loop just fine, as well as modify variables (state) and everything else you'd expect from a normal language. (Though functions are first-class objects and all that)
The reason you don't is that it's like SQL in that its statements can work on whole tables at a time. For example, applying mean() to a table gives you an array with the mean value of every column (and the array positions are named after the column names).
If you want to get the rows of an array (x) with 0 on column 1, you do x[x[,1]==0]. basically, x[,1] gives you a one column table with the appropriate column, x[,1]==0 gives you a vector of true,false values, which you can use to index into the array again.
You can do things like by(x,x[,1]==0,nrow), which gives you how many rows have 0 and how many don't on column 1. by(x,x[,1]==0,mean) gives you the per-column mean of the rows which have 0 on column 1 and the same for those that don't
Anyway, you get the idea. You can loop just like you can use cursors in SQL, but your code will be slower and less readable if you do.
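A runnable sketch of the same idea on a made-up table. Note the substitutions, which are mine: colMeans() for the per-column means, and table() and aggregate() in place of by(), simply because their return values are easy to inspect.

```r
# Made-up table: column "flag" is 0 or 1, "x" and "y" are measurements.
tbl <- data.frame(
  flag = c(0, 1, 0, 1, 0),
  x    = c(1, 2, 3, 4, 8),
  y    = c(10, 20, 30, 40, 80)
)

# One mean per column, returned as a vector named after the columns.
col_means <- colMeans(tbl)

# Logical vector marking the rows whose first column is 0.
is_zero <- tbl[, 1] == 0

# Row counts per group, without an explicit loop.
counts <- table(is_zero)

# Per-group column means of the measurement columns.
group_means <- aggregate(tbl[, c("x", "y")], by = list(zero = is_zero), FUN = mean)
```

None of these lines contains a loop, yet each one works over every row of the table.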
In a pure functional programming language, you don't program by manipulating a state, but by computing the results of functions
Which is why I always found the entire 'functional programming' paradigm absurd and useless. When you deal with I/O, you have states ('printf'). When you deal with a user (interface), you have states. Hell, even when you deal with time you have states ('before' and 'after').
If all you want to do is compute f(x), then OK, maybe, but besides that nothing useful has ever come out of functional programming except Emacs. Well, let me rephrase that: nothing useful has ever come out of functional programming, period. The amount of time and brain power wasted on things like Lisp for no result whatsoever is astounding.
Non-Linux Penguins ?
A more complete answer would be: the R system is basically an interpreter which accepts messages and terminal connections, like a server. It has no "native" front-end (except a stark command-line window) per se.
This means that *any* application that can log in to R as a client or send the R interpreter messages and capture the response can use the R interpreter as a slave, and can hence act as a GUI. It doesn't matter if that's a Windows application, a Linux application, or a Java application, or a web-server (R can work as a back-end to a web-server too). The only thing is the amount of work needed to get things working and to actually code up the GUI. There is a package called Rserve (see http://www.rforge.net/Rserve/) that greatly facilitates this for C++ and Java. Below is a quote from the Rforge repository:
Rserve is a TCP/IP server which allows other programs to use facilities of R (see www.r-project.org) from various languages without the need to initialize R or link against R library. Every connection has a separate workspace and working directory. Client-side implementations are available for popular languages such as C/C++ and Java. Rserve supports remote connection, authentication and file transfer. Typical use is to integrate R backend for computation of statistical models, plots etc. in other applications.
The following Java code illustrates the easy integration of Rserve:
Rconnection c = new Rconnection();
double d[] = c.eval("rnorm(10)").asDoubleArray();
d now contains 10 random samples from the N(0,1) distribution if there is a running Rserve on the local machine. The Rconnection doesn't have to be created more than once in your application.
There is at least one R GUI written in Java (called "JGR" (Java Gui for R); see the post with the JGR link; JGR is FOSS too). In principle this ought to be able to run under Linux, but the last time I tried (a year or so ago) I had nothing but trouble getting it to install. The Windows installer does a nice job of it though: 5 seconds and it's up and running with all its features enabled. I don't know how easy or hard it is to add your own Java routines to JGR and how easy or hard it is to redirect R output to your own Java application under JGR, or put them under the menu structure of JGR. I never tried, but I strongly suspect that it's possible and not hard. As far as I can see, this is not a question of interfacing with R but with JGR (a native Java application), but I never gave it much thought so you'll have to see for yourself. Sorry.
A popular Windows GUI (called TinnR) (see http://www.sciviews.org/Tinn-R/) has been written in Delphi 5.
Last but not least, R has excellent interactivity with Emacs, which works as well under Linux as it does under Windows. I personally can't stand Emacs, but lots of people swear by it (I just swear at it).
I agree with the first part of your post: to me R is something to code in when you have to, and to keep the resulting code as short and simple as possible. If I ever had to code a real application with a GUI that needed the statistical strengths of R, I would almost certainly not use R.
On the other hand I'd probably use Java and link to R as a server (see my other post about R and Java) instead of using Python.
"I think it addresses a niche market for high-end data analysts that want free, readily available code," said Anne H. Milley, director of technology product marketing at SAS. She adds, "We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet."
This idiot needs to be fired immediately. Freeware is no less reliable than closed source. For someone who works at SAS the concept of peer review of work must obviously make the work more unreliable because it's peer reviewed. Such utter idiotic nonsense.
SAS Institute Inc.
100 SAS Campus Drive
Cary, NC 27513-2414
USA
Phone: 919-677-8000
Fax: 919-677-4444
I for one have no interest in dealing with someone that stupid. Lockheed, Boeing, Ford, Toyota, and every damn bank I have ever seen and worked with have Linux in the enterprise. Someone with this sheer level of stupidity needs to go.
-=[ Who Is John Galt? ]=-
Oh My Gobbldegook!
what happened to the part about sloths and fruit bats and orangutans?
?
'tis turing complete...
ever heard of "interpreted" languages?
entia non sunt multiplicanda praeter necessitatem
While that does pull out the rows which have 0 in column 1, it packs those return values into a vector (the first element of each of the selected rows, followed by the second element of the selected rows, etc.). If you want to extract those rows of the matrix, and keep them in matrix form, then this will do it (note the extra comma):
x[x[,1]==0,]
Yes, R is pretty nice in that sense. Matlab can do many of the same tricks, e.g. the Matlab equivalent of my command above is
x(x(:,1)==0,:)
Matlab doesn't have a simple way to do your version that I know of, since Matlab doesn't automatically recycle a vector index to a matrix. You'd have to do something like x(repmat(x(:,1)==0,n,1)) where n is the number of columns in x -- or to avoid hardcoding the n, you could do x(repmat(x(:,1)==0,size(x,2),1)) which is starting to get ugly, isn't it?
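For anyone who doesn't have R or Matlab handy, here is a rough sketch in plain Python of the two behaviours discussed above. It only imitates R: the sample matrix and the helper names as_matrix and as_vector are made up for illustration, and the column-major flattening is my assumption about how to mimic R's storage order and recycling.

```python
# A made-up 3x3 matrix stored as a list of rows; rows 0 and 2 have 0 in column 1.
x = [
    [0, 10, 20],
    [1, 11, 21],
    [0, 12, 22],
]

def as_matrix(matrix, mask):
    """Keep whole rows where mask is True (imitates R's x[mask, ])."""
    return [row for row, keep in zip(matrix, mask) if keep]

def as_vector(matrix, mask):
    """Imitate R's x[mask]: recycle the mask over the column-major elements."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    # Column-major flattening: all of column 0, then column 1, and so on.
    flat = [matrix[r][c] for c in range(n_cols) for r in range(n_rows)]
    # Recycling: element i is kept if the mask entry for its row is True.
    return [v for i, v in enumerate(flat) if mask[i % n_rows]]

mask = [row[0] == 0 for row in x]  # imitates x[,1]==0

rows_kept = as_matrix(x, mask)
packed = as_vector(x, mask)
print(rows_kept)  # [[0, 10, 20], [0, 12, 22]]
print(packed)     # [0, 0, 10, 12, 20, 22]
```

The packed result comes out as the first elements of the selected rows, then the second elements, and so on, which is the vector form described above; the extra comma is what keeps the rows together.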
And even with all that aside, you're still in some kind of state. Be it Idaho or the nation state of Australia.
Some people just don't understand anything. Sheesh.
I don't know if anyone's noticed, but this post was from the 'much-better-than-Q' dept.
Q is so much better to program in than R (and more productive, especially when analysing large data sets).
Someone's got an axe to grind in the R vs Q argument...
Tell that to Eiffel, Haskell, Clojure, C++ templates, et cetera. Don't confuse what certain functional languages do with something being a defining characteristic of functional languages. Many functional languages are mutable, have loops, are not pure, have state, et cetera.
StoneCypher is Full of BS
I've come across a couple of examples of inappropriate use of Excel
Yeah, but it's only the UK economy, it's not like it's the German or the Japanese economy or anything. ;)
Firehed - Unfortunately, thanks to medical breakthroughs, common sense is not as common as it once was.
Matlab supports production of a stand alone executable from Matlab that does not require the Matlab environment.
It does until you call a function that doesn't support stand alone executables from one of the many available toolboxes.
Matlab and its toolboxes are great tools for analysis, but for direct production deployments of exes there are a great many inconvenient detours involved. (Matlab has been steadily improving this though...)
nothing useful has ever come out of functional programming, period.
Lol. I generally agree with you, but map/reduce was inspired by functional programming and seems to be useful for some problems....
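To make the map/reduce point concrete, here is a tiny sketch of the two functional primitives in Python. The word list is made-up sample data, and this shows only the building blocks, not the distributed MapReduce system.

```python
from functools import reduce

words = ["r", "is", "turing", "complete"]  # made-up sample data

# map: apply a function to every element independently.
lengths = list(map(len, words))

# reduce: fold the mapped values into a single result.
total = reduce(lambda acc, n: acc + n, lengths, 0)

print(lengths, total)  # [1, 2, 6, 8] 17
```

Because the map step touches each element independently, it is the part that can be spread across machines, which is presumably what made the idea attractive for large data sets.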
The funny part is that, as written, there is nothing wrong with the statement:
"Calling R a programming language is like calling Mathematica or Matlab a language. ..."
By strict definition they are all Turing complete. So yes, R is equivalent to Matlab and Mathematica: they are programming languages. I'm inferring from the inflection that this isn't what the author intends to say.
If I understand the intent of the original comment, the statement aptly shows the difference between those who use programming languages and people who know what they are and how they work. Hobbes' and slashdotmsiriv's comments have the flavor of people who have written or know much about a programming language's internals. daknapp appears to be saying "This doesn't look like anything I've used, must not be a programming language".
"Better to remain silent and be thought a fool than to speak out and remove all doubt." -- Abraham Lincoln
BOFH, My model for being a sysadmin :)
The power of R is in its libraries, which are often maintained by the best statistical researchers in that area.
Recently I discovered the ggplot2 graphing library, which is a huge step forward for constructing graphs of all types in R. It's very well documented and very actively maintained.
http://had.co.nz/ggplot2/
I have been using R for Geospatial statistics for about a year.
R is a good language for number crunching large amounts of data, data visualization and analysis.
It borrows its lexical scoping from Scheme and brings back fond memories of Lisp for me.
You can compile and link in your own Fortran libraries using your favorite flavor of Fortran (F95 in my case) to extend it. This means it is more a job control language in some ways.
You should avoid explicit loops in R and instead operate on whole sequences (vectors), or do the looping in your compiled libraries.
It has a truly frightening number of open source add on libraries.
Comparing it to C, Java, Ruby or other languages misses the mark. You won't use it for e-commerce or low level controllers, but it was never designed for that.
R does have OOP capabilities though I haven't used them.
I've used Maple, SAS, Matlab and R. R wins hands down. It's powerful and fun.
Since I've only been using it a year, there are a number of other things I probably have missed.
putting the 'B' in LGBTQ+
Not sure, I only spent about 2 minutes looking at the site.
...
I mostly know functional programming from studying Scheme and Logo. These languages facilitate stateless programming by making it easy to avoid state variables. (Really, the word "variable" means something quite different in these languages.) Textbooks that I've read that are based on these languages (SICP being the most famous example) simply don't talk about traditional loops; all iteration is done with tail recursion. Perhaps other functional languages are different, but these are the only two I know anything about.
All beside the point. As a half-dozen posts have already pointed out, I was wrong to describe R as a functional language. It's an imperative language with lots of vector operations. That's where its looplessness comes from.
Do I have to give my karma points back?
I have used both SAS and R for projects large and small. Each has its own use, but I strongly disagree with the SAS marketing rep's argument that "We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet." Actually, I would find it terrifying to think about the engineers using canned packages that are subject to no external review whatsoever, rather than R and its routines, which are freely available for all to see.
When choosing between the two, there are two questions I would ask:
1) What format does my data come in?
2) Is my data small enough to fit in memory?
If the answer to 1 is "SAS dataset" or the answer to 2 is "NO" then I would use SAS. It keeps datasets on disk, so if you are handling TBs (or even GBs) of data, there is no better software package. A lot of big data dumps come in SAS, too, so while it is easy to dump to CSV, if you are repeatedly accessing the same data source, SAS makes sense. Also, if you already have significant infrastructure written in SAS (data processing, reporting, etc.) it may make sense to continue doing it that way.
In all other cases (and even in some of the above cases, when considering overall cost), R is better. Easier to program in; easier to write C/Fortran/Python/etc. extensions for; contains real functions (SAS just has macros); object oriented; and FREE (GPL and zero cost). That last bit is important because if for some reason the SAS Institute goes bust (stranger things have happened), and all of your code is in SAS, you are fucked. Whereas if you have been writing in R, you will always have your existing software and can, in principle, continue developing it yourself or hire someone to maintain/upgrade it for you.
R is great, even if it is a slightly awkward language. It easily blows Octave out of the water in almost every way, but is inferior (as a language) to python+scipy, although I believe that it has far more packages available for it. Just check out CRAN some day. It's amazing.
ANSI SQL is not Turing-complete. If you add stored procedures you can get a Turing-complete language, but blech.
Exactly.
Maybe a better example is PostScript. Turing-complete, but do you really want to write more than pretty fractal patterns in it?
Don't thank God, thank a doctor!
ANSI SQL is not Turing-complete.
It is if WITH RECURSIVE is implemented.
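For the curious, here is a small sketch of what WITH RECURSIVE buys you, run through Python's built-in sqlite3 module. It uses SQLite's dialect rather than strict ANSI SQL, and the table name fact and its columns are made up for illustration. It only demonstrates iteration inside a single query; it is not a proof of Turing-completeness.

```python
import sqlite3

# In-memory database; no tables are needed, the CTE generates its own rows.
conn = sqlite3.connect(":memory:")

query = """
WITH RECURSIVE fact(n, value) AS (
    SELECT 1, 1                                  -- base case: 1! = 1
    UNION ALL
    SELECT n + 1, value * (n + 1) FROM fact      -- step: (n+1)! = n! * (n+1)
    WHERE n < 10                                 -- stop after 10!
)
SELECT n, value FROM fact ORDER BY n
"""

rows = conn.execute(query).fetchall()
conn.close()

print(rows[-1])  # (10, 3628800)
```

Each pass of the recursive member feeds on the rows produced by the previous pass, so the query behaves like a loop with an accumulator, which plain SELECT statements cannot express.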
Social scientists are inspired by theories; scientists are humbled by facts.
Or in the UK, out of your "Rs".
To have a right to do a thing is not at all the same as to be right in doing it
I haven't seen several things listed so far:
1. Plotting (graphing). You have several choices:
i. R's base graphics
ii. The lattice package
iii. The ggplot2 package
iv. Packages that create interesting classes tend to create plot methods for them (really a special case of item i, above)
Most environments/languages would be absolutely thrilled to have options ii or iii, and option i/iv is pretty good, too. The ggplot2 package in particular has a different (but very nice) approach to plotting.
I mean how many programs/environments with adequate graphics spawn multiple graphics packages, each of which is well-maintained and good?
2. I like named parameters, and while it may not be the neatest programming practice in the world, the "..." parameter is very clever. It stands for all the parameters you have not specified in your function. For example:
foo <- function (x, y=13, ...) { x <- bar (x, y) ; baz (x, ...) }
The code here is silly, but the idea is that you are passing x to baz, plus all of the arguments to foo that were not x or y (i.e. all of the arguments that you did not specifically list as parameters for foo).
It's a poor-man's OO tool, I guess, but it gets used a lot in plotting, in particular.
3. Documentation. I've seen complaints about the documentation for R packages, but when I've dipped my toe in the Matlab and Sage worlds, I've found package documentation that looks more like notes scribbled on a sticky note than R documentation. R packages have documentation that includes a nice TeX'd output, and usually good explanations and bibliographies. Larger packages will even have vignettes that dive into specific areas of the package.
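Going back to point 2, here is a rough Python analogue of the "..." idea for readers who don't write R. The bodies of bar and baz are hypothetical stand-ins, since the original leaves them undefined, and Python's **kwargs only forwards named arguments, whereas R's "..." also picks up unnamed ones.

```python
def bar(x, y):
    # Hypothetical helper: just combines x and y.
    return x + y

def baz(x, **options):
    # Hypothetical helper: reports what it was handed.
    return x, options

def foo(x, y=13, **rest):
    # foo consumes x and y itself; everything else is passed through untouched.
    x = bar(x, y)
    return baz(x, **rest)

result = foo(1, color="red", lwd=2)
print(result)  # (14, {'color': 'red', 'lwd': 2})
```

This is the pattern that plotting wrappers rely on: the wrapper handles the few arguments it cares about and hands the rest (colors, line widths and so on) to the underlying plot call.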
Faster, easier, nicer.
Well, both of them ARE very Lisp-y, sooo... Yeah, that's about it.
I know tobacco is bad for you, so I smoke weed with crack.
Does anyone know of an R interpreter written in Java?
JMule user, enjoy it : http://www.jmule.org
More than that, you can connect Eclipse to Matlab via JDWP, and debug your Java code (set breakpoints in it, stop on exceptions etc) while calling it from Matlab.
Matlab and its toolboxes are great tools for analysis, but for direct production deployments of exes there are a great many inconvenient detours involved.
I disagree. A lot of code running on-orbit is autocoded from Matlab/Simulink. This code is far easier to maintain (as long as you keep the software developers from modifying it) and more reliable than hand-written code. Analysis and development are also far easier in the Matlab environment.
Tesla was a genius. Edison however was an overrated hack who liked to torture puppies.
I hate to pile on, but this comment is in error.
In one of Dana Scott's papers from the 1980s, he presents a programming language which is equipollent with Turing machines but which consists entirely of the letter G. Its syntax was just the ability to apply expressions to other expressions, and its operational semantics was some rewriting rules. It was very cute.
Sorry, I'm a jerk. I replied to the wrong comment.