R Throwdown Challenge
theodp (442580) writes "'R beats Python!' screams the headline at Prof. Norm Matloff's Mad (Data) Scientist blog. 'R beats Julia! Anyone else wanna challenge R?' Not that he has anything against Python, Matloff adds, but he just doesn't believe that Python or Julia will become 'the new R' anytime soon, or ever. Why? 'R is written by statisticians, for statisticians,' explains Matloff. 'It matters. An Argentinian chef, say, who wants to make Japanese sushi may get all the ingredients right, but likely it just won't work out quite the same. Similarly, a Pythonista could certainly cook up some code for some statistical procedure by reading a statistics book, but it wouldn't be quite same. It would likely be missing some things of interest to the practicing statistician. And R is Statistically Correct.'"
Nothing with a name that verbose can possibly be any good.
Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
I don't see any margin of error. This claim is scientifically worthless.
An Argentinian chef is more likely to make great sushi than a Japanese automotive engineer.
You generally want to use programming languages designed by experienced programmers (even better, experienced language designers) who work closely with subject matter experts. Left to their own devices, experts are likely to get a lot of things wrong, and if the language is sufficiently popular, you are stuck with their mistakes for a long time to come.
R itself is okay, but even as a long-time user I don't think the language or environment itself is all that much to brag about. What makes it great for statistics is just that statisticians use it, which means that a lot of the packages are written by statisticians. That makes a big difference: recent papers often have R implementations, standard problems have well-maintained R packages for them with all the bells and whistles, etc. As Matloff notes, this means they often have everything that statisticians are looking for, while straightforward textbook implementations you often find in other languages often aren't nearly as thorough in how they handle the statistical models, or only handle some special cases (though there are some really good packages in other languages, just not as many).
But I don't think that has much to do with R itself being uniquely suited to statisticians. It's used for historical reasons: Bell Labs S was influential in the field way back when nothing like Python or Julia existed, and statisticians started using it because it was a lot nicer than Fortran, which is what other areas of science mostly used back then. GNU R is essentially a free-software workalike for Bell's S, and it's kept most of the community on board through a mixture of existing packages, familiarity, and inertia.
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
Why is there no "r" in statistics?
" And R is Statistically Correct" doesn't mean anything.
Python, C, Mathematica and R all have different strengths for mathematical work / numerical calculations though, and using the best tool for the job is what it's about. As always, what the best tool actually is, is also rather subjective, as which tool will best solve a specific task is always dependent on your skill with the different tools. I do agree with the professor though: even though there's quite a bit of Python hype (python + scipy/matplotlib is amazing), R is not being replaced anytime soon. It's too good at what it's good at.
"" How about taking the safety labels off everything, and let the stupidity-problem solve itself? """
because it is an inferior mish-mash for an up-start generation which was never taught the, "In the end, everything looks like LISP," maxim. And its requirement for particular whitespace offends me as someone who has spent the last decade working with accessibility groups.
I'm not really sure I see where R fits, though. For basic statistical work, SPSS is good. For advanced statistical work, surely you'd want a general purpose language with cross-language libraries?
His posts are perfectly 'random'. R itself is written in C. Python is also written in C. I can't see why one can get much better statistical correctness in R than what comes from its underlying implementation - in C.
A joke I've read recently:
I'm not sure if "R is written by statisticians, for statisticians" is a good thing e.g. "stadiums are built by footballers, for footballers"
R may be written for statisticians, but is rightly criticized for lacking the validation that SAS has (which python et al also lack). There's a good discussion here on the subject. And for what it's worth, R and SAS both lack the tools to easily hook into other systems, which really makes them good ONLY for ad hoc statistics and reports.
and pirates.
Yeah R is so different from Python, I mean everything is the same but not quite and I totally have a point and not just bullshitting because like Japanese sushi and beef Argentinian soup, brocolli.
Use the right tool for the job and stop bashing other tools that were designed for different jobs.
---- Booth was a patriot ----
So a special purpose statistics language beats out python - a general purpose language with lots of varying libraries (its real strength...)
Thats news? or worthy of some retards crowing ?
I never heard of R before and as it is statistics I see no need to know much further.
Next language or equiv I might look at is one that simulates Quantum Computing, as I want to see what applications that computing method is actually applicable to.
APK throws challenges to trolls here on hosts. Not a 1 manages to validly topple his points.
If I had to do some intense statistical analysis, then R is probably a better choice.
Now, if I have to get data via a feed or web page scraping, manipulate it, clean it, do some analysis, display it or feed it to another program, then Python makes all of that much easier and more maintainable.
Back in the old days before all these smancy fancy tools, we used this red book called something like "Mathematical Programming in C" - in the snow; uphill both ways. It had the code and algorithms to implement all the stats, engineering, and god knows what - all in C.
I don't see it on Amazon - or I got the title totally wrong.
"R is written by statisticians, for statisticians"
This is primarily why it will never gain widespread adoption, too. Most people aren't statisticians, and probably don't want to be.
unless you're a statistician or interested in writing programs for high-accuracy statistics.
"Arrrr.... fix yar name 'R' while you may, maties!!"
I may not have the belly for Deep Statistics but I do know about Internet Search noise levels. I remember trying to do research on WebDAV (believe me, there is such a thing) only to discover that folks discussing it invariably refer to it as 'dav'. Because saying "Distributed Authoring [and] Versioning" out loud makes you spit out your toothpick. Any attempt to search 'webdav' yielded only the sterile official pages, and attempts to search on 'dav' with other keywords brought up conversations from the community of Disabled American Veterans who also use the term in casual conversation, and have said an awful lot over the years. They occupied 'dav' first.
Now you may think you can pull off a 'C' where Google seems to pick off relevant results if you combine it with any computery term, but it was not always so. It has taken an incredible saturation of C, and perhaps some special coded cases on Google's part, for this to come about.
The success of Perl is due in some part to the ability of confused people to obtain help and advice about it merely by searching on its unique spelling.
So the best way to push this R language is with a refit of the name. Go with the pirate theme, it will sell many more T-shirts than those of silly camels and pearls. But stake out a bit of Keyword Real Estate that presently has a relatively low population density.
Google search result estimate counts, descending order,
r --- 2,730,000,000
ar --- 656,000,000
arr --- 24,400,000
arrrrrrrr --- 3,060,000
arrrr --- 876,000
aarr --- 638,000
arrr --- 536,000
arrrrr --- 405,000
aaarrrrr --- 267,000
arrrrrr --- 205,000
arrrrrrr --- 129,000
aarrr --- 107,000
aarrrr --- 107,000
aaarrr --- 56,600
arrrrrrrrr --- 52,400
Adding r's is not enough, since talking like a pirate is typically accomplished with a single 'a': the ar+ space is pretty well populated up to ar{5}. It looks like the best ratio is around a{3}r{3}. But even choosing the less optimal and easier to type a{2}r{3}, by using 'aarrr' instead of 'r' you have improved the signal-to-noise ratio by a factor of roughly twenty-five thousand.
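For anyone who wants to check the arithmetic, here is a minimal Python sketch. It only divides the result estimates quoted in the list above; the numbers are not re-measured, and the names `counts` and `noise_reduction` are made up for illustration.

```python
# Google result estimates as quoted in the list above (not re-measured).
counts = {
    "r": 2_730_000_000,
    "aarrr": 107_000,   # a{2}r{3}
    "aaarrr": 56_600,   # a{3}r{3}
}


def noise_reduction(old_name, new_name):
    """Return how many times fewer hits new_name has than old_name."""
    return counts[old_name] / counts[new_name]


if __name__ == "__main__":
    for candidate in ("aarrr", "aaarrr"):
        factor = noise_reduction("r", candidate)
        print(f"r -> {candidate}: about {factor:,.0f} times fewer results")
```

With these figures 'aarrr' comes out at a bit over twenty-five thousand, and 'aaarrr' at nearly twice that, which is why a{3}r{3} looks like the better ratio.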
Push the name change firmly and decisively. This means that if anyone mentions 'R' there should be immediate responses that ask, "What AARRR you talking about?" This will inject the proper searchable term into the discussion while it reminds the poster of the name change.
For an interesting 9 minute lecture that might help sell you on this idea, listen here.
<blink>down the rabbit hole</blink>
For a nice video on using ipython notebook in data analysis: https://www.youtube.com/watch?...
For a nice selection of ipython notebooks for doing various type of data analysis: https://github.com/ipython/ipy...
Having seen the state of programming in the Sciences, I really do not think that "built by statisticians" is something you would want to advertise.
Troll is not a replacement for I disagree.
A few examples are provided in TFA but it's all rather vague as to why R "beats" Python. I've been using R for years for fitting mixed effects linear models. It does this really well, it makes it easy to compare models, it's got all the cutting-edge stuff in it. The problem with R, however, is that it's shitty and unintuitive as a programming language. I do all my pre-processing in MATLAB and I only ever export to R when I have a final data frame that needs a moderately complicated statistical analysis.
soylentnews.org
I'm afraid your research neglects a huge subset of the Talk-Like-A-Pirate word space: 'yarr' has 523,000 results.
No one uses R for its amazing language*. The language sucks. R is used because it has nearly limitless, tested, and approved statistical algorithms. Want partial least squares, support vector machines, linear models, principal components analysis, Fisher's exact test? They are all there waiting to process your data. Along with hundreds of other analyses that you might really need to use but don't even know about yet.
"Python" doesn't have this stuff because it is a language, not a set of statistical methods.
*there may be a few deviants who use it for self flagellation
And to what extent are statisticians willing to use warez?
ASM makes R possible.
Henceforth, ASM is king. R is just another pretender.
Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
dickbreath???...
Well, by that logic, nothing past machine code was ever needed.
I think there's some value in having a language that allows you to express code in an efficient notation. Just like there's value in having mathematical formulas and not having to write mathematical work as prose.
Python has Beautiful Soup for web page scraping that I have not seen in any other language.
Java has JSoup
This guy must have been reading the recent stuff on Fortran and decided to jump on the bandwagon.
Fortran was written by engineers and scientists for engineers and scientists.
R is written by statisticians for statisticians.
Well, there you have it. If a language or other kind of tool was developed by practitioners of X for other practitioners of X, it’s likely that it will be better than some other tool that was designed for a different purpose.
Who would have thunk it.
He's probably right. All other things being equal a good Domain Specific Language will crush a General Purpose Language in its domain. If Julia is much faster than R and that were unfixable it would still be far easier to write a library in Julia accessible by R than to train R users in all of Julia's concepts.
General purpose languages can sometimes get close to DSLs in effectiveness, and then the greater diversity of users creates an economy of scale and deep entrenchment which drives DSLs away. But then with a large and highly diverse user base the General Purpose language isn't able to rapidly adapt, so DSLs spring up to fill niches. Some of those DSLs become incredibly successful and start to move into other domains, diversifying their purpose and user base to become General Purpose Languages, and the cycle repeats.
This is such a pointless discussion. To all the people going on and on about how horrible a programming language R is: well, it never intended to be a good programming language. It is perfect for what it is meant to do, namely, load data, do statistical analysis on it, and produce graphics summarizing the results.
If you need a _programming language_, no, R alone won't do. It is not for "programming in the large", it's not for problems that can't be expressed as vector/matrix/array manipulations, and it is not good for writing "modular" software (perfectly good for independent packages, however).
No, people are not using it simply out of momentum. Unless you define "momentum" as "using it because it does statistical analysis and graphics".
I'm shocked to learn that a purpose-built programming language might be better at its specific purpose than a general purpose programming language. Shocked I say.
I'd be even more shocked if a bunch of mathematicians had the good sense to pick a Google searchable name for their language. One PIA thing with C is how hard it is to search Google for documentation when you don't remember the exact function name.
Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
Completely right.
We use R extensively at work. Programmers talk about R's libraries, but that's not the real reason we use it. The killer blow is that the _documentation_ is written by statisticians. That means that it's reliable, easy to understand, and honestly tells you the pitfalls of the techniques you're using.
We're financial guys who are doing stuff in consumer finance that has rarely, if ever, been done in our field. The statistics aren't particularly advanced, but it's impossible to hire someone who understands the industry and knows the statistics already. Statistics textbooks tend to either be so basic that you already know what they say, or so advanced that you need a PhD to understand them. On the other hand, much of the R documentation is beautifully simple to read, and comes with brilliant worked examples - albeit from fields that are very different from our own. Whenever we're researching potential new statistical approaches, we find blogs stuffed full of examples written in R.
In short, the R ecosystem makes you a better statistician. Julia and Python can't offer that.
This joke was tired and lazy a decade ago. You're not just beating a dead horse, you've moved past that to sodomizing it.
You know you're on /. when you need to check half of the words on Wikipedia, just to be able to understand a 10-word sentence. :-)
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
We all know Raul Julia as M Bison beats them both. And Raul Julia's reading of "Mystery on the Docks" on Double "R" (Reading Rainbow) lives on in my mind as one of the great renditions.
It's one of those things that only comes up in the context of comparing programming languages. It's a feature of certain languages that the code is also a data type. That means that you can, e.g., concatenate a string of commands:
and then call
Lisp is the most notable example. Slightly more usefully, you can write a function, pass it around to another function, change (e.g.) a SUM operation to SIN, and return the new function for use elsewhere. It's related but not equivalent to having functions as first class objects, the former implies the latter but not vice-versa. There are a great many useful things one can do with code that writes code that writes code, and even a wide sea of things that are more concisely or elegantly done this way. I don't know if there is a class of problems which can only be performed in a homoiconic language, but I'd guess that there's a Turing tar-pit for anyone interested in using the wrong tool for a given job.
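To make the "change one operation to another and hand back the new function" idea concrete, here is a rough Python sketch. Python is not homoiconic, so this only approximates the Lisp trick by going through the standard `ast` module. The helpers `swap_call` and `evaluate` are made-up names, the cos-to-sin swap stands in for the SUM-to-SIN example, and `ast.unparse` assumes Python 3.9 or later.

```python
import ast
import math


def swap_call(source, old_name, new_name):
    """Parse an expression and rename every use of old_name to new_name.

    Returns the rewritten syntax tree, i.e. the code treated as data.
    """
    tree = ast.parse(source, mode="eval")
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id == old_name:
            node.id = new_name
    return tree


def evaluate(tree, env):
    """Compile a (possibly rewritten) expression tree and run it.

    Only the names supplied in env are visible to the expression.
    """
    code = compile(tree, filename="<rewritten>", mode="eval")
    return eval(code, {"__builtins__": {}}, env)


if __name__ == "__main__":
    env = {"sin": math.sin, "cos": math.cos, "x": 0.0}
    original = ast.parse("cos(x)", mode="eval")
    rewritten = swap_call("cos(x)", "cos", "sin")
    print(ast.unparse(original), "=", evaluate(original, env))
    print(ast.unparse(rewritten), "=", evaluate(rewritten, env))
```

In a Lisp the program is already a list, so none of the parse and compile ceremony is needed. That gap is roughly what "homoiconic" buys you.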
So if what you got from that is that it's something only LISP weenies and Clojure hackers care about, that's pretty much true: that weird light in their eyes is probably related. I have more pedestrian programming challenges, but some nights, I dream of destroying the divide between object and action, between coder and code.
From the summary:
And R is Statistically Correct
But Python is correct all the time.
If Pandora's box is destined to be opened, *I* want to be the one to open it.
> 'R is written by statisticians, for statisticians,'
that's why it's great for statistics but kinda sucks as a language... I have a few comments saved from the last debate about it...
"All indexing in R is base-one. Note that no error is thrown if you try to access a[0]; it always returns an atomic vector of the same type but of length zero, written like numeric(0). Unaccountably, nobody's in jail for that decision, yet. Indexing past the end of the array, by contrast, yields NA."
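For readers who don't have R handy, the indexing rules in that quote can be mimicked with a small Python sketch. This is an emulation of the described behavior, not R itself: `r_index` and `NA` are made-up stand-ins, only single non-negative indices are modeled, and R's negative (exclusion) indices are deliberately left out.

```python
NA = None  # stand-in for R's NA; an assumption of this sketch


def r_index(vec, i):
    """Mimic R's vec[i] for a single non-negative integer index.

    R vectors are one-based and indexing always returns a vector,
    so every result here is a list.
    """
    if i < 0:
        raise ValueError("negative (exclusion) indices are not modeled here")
    if i == 0:
        return []             # like numeric(0): zero-length, no error
    if i > len(vec):
        return [NA]           # past the end yields NA, no error
    return [vec[i - 1]]       # one-based lookup


if __name__ == "__main__":
    a = [10.0, 20.0, 30.0]
    for i in (1, 0, 5):
        print(f"a[{i}] ->", r_index(a, i))
```

Neither the zero index nor the out-of-range index raises anything, which is exactly the kind of silent surprise the quote is complaining about. A plain Python list would give you the first element for `a[0]` and an IndexError for `a[5]`.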
"If you ask for a numeric vector using numeric(42) or as.numeric(x), you will get a double vector. A perfect R-ism is that if you ask for a single vector, you'll still get a double-precision float vector, though it will have a flag set so that it will be passed into C APIs as single-width floats instead of doubles. There is no single-precision storage type in R."
because, you know, fuck you! that's why!
R is strange, and non-intuitive for programmers. I don't like it. Python is beautiful and simple. Even if Python is less statistics-ish, I prefer it because there are so many surprises in R - all the time.