Slashdot Mirror


R Throwdown Challenge

theodp (442580) writes "'R beats Python!' screams the headline at Prof. Norm Matloff's Mad (Data) Scientist blog. 'R beats Julia! Anyone else wanna challenge R?' Not that he has anything against Python, Matloff adds, but he just doesn't believe that Python or Julia will become 'the new R' anytime soon, or ever. Why? 'R is written by statisticians, for statisticians,' explains Matloff. 'It matters. An Argentinian chef, say, who wants to make Japanese sushi may get all the ingredients right, but likely it just won't work out quite the same. Similarly, a Pythonista could certainly cook up some code for some statistical procedure by reading a statistics book, but it wouldn't be quite same. It would likely be missing some things of interest to the practicing statistician. And R is Statistically Correct.'"

10 of 185 comments

  1. Can't use it by smittyoneeach · · Score: 5, Funny

    Nothing with a name that verbose can possibly be any good.

    --
    Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
  2. Hard to believe in these figures by CRCulver · · Score: 5, Funny

    And R is Statistically Correct.

    I don't see any margin of error. This claim is scientifically worthless.

  3. Bad analogy by Florian+Weimer · · Score: 5, Insightful

    An Argentinian chef is more likely to make great sushi than a Japanese automotive engineer.

    You generally want to use programming languages designed by experienced programmers (even better, experienced language designers) who work closely with subject matter experts. Left to their own devices, experts are likely to get a lot of things wrong, and if the language is sufficiently popular, you are stuck with their mistakes for a long time to come.

    1. Re:Bad analogy by Glock27 · · Score: 5, Interesting

      Exactly. Julia will eat R for lunch soon enough, I think. It's an elegant, well designed and efficient language. It's only been around for a couple of years, and has a very vibrant and rapidly growing community.

      Check it out for yourself: The Julia Language Homepage. It's got a lot to offer anyone with an interest in mathematics, including statisticians. It's based on LLVM, and interfaces trivially with C libraries - plus it's a very fast language in its own right, unlike R or Python.

      --
      Galileo: "The Earth revolves around the Sun!"
      Score: -1 100% Flamebait
    2. Re:Bad analogy by retchdog · · Score: 5, Interesting

      my friend uses julia, and every few weeks complains about some bug. the other day he mentioned that the latest release broke Bernoulli sampling (wtf?). the others have been pretty fundamental too.

      this is a serious problem, of course. the other one is lack of libraries. R is an abysmal pile of shit, but at least it's a standard; pretty much 95%+ of applied stats is at least partially supported by someone's hacked-up library/package. julia is far, far short of that, and it appears that much of its community is more interested in pretty graphics, meta-wankery, and interface methodology than actual working statistics (not that there's anything wrong with that per se).

      yeah, yeah, "fix it yourself," and it's on my list to write at least a basic survival analysis package for it. but i wouldn't blame anyone for not using it, and i wouldn't recommend it for doing stats as it is now.

      --
      "They were pure niggers." – Noam Chomsky
    3. Re:Bad analogy by professionalfurryele · · Score: 4, Insightful

      Sorry, but I use both R and Python in my work as a biomechanist, and while I love working with Python and hate working in R, R is not only less verbose for this task, but it is more consistent, intuitive and better documented. Very few languages beat Python for simple, easy to read code, but it is not up to the task of doing general purpose statistics. To see why this is the case, consider a problem with that blog post. All the diagnostic plots I need to check the regression are missing: no Q-Q plot, no Cook's distance, not even something simple like fitted vs. residual. Now consider what happens when I notice that, while the fit is decent, the residuals depend on what subject I'm looking at and I need to vary the error term. Or I need to switch to a mixed effects model because the intercept clearly depends on the subject.
      Seriously, when I say I hate R, I mean it. The code is ugly, it can be hard to read, and woe betide the poor git who makes the mistake of needing a plot more complicated than something lattice can do. It is still better than Python for statistics.
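
For readers who have not met the diagnostics named above, here is a minimal sketch in plain Python of what sits behind a fitted-vs-residual plot and a Cook's distance check for a one-predictor least-squares fit. The function name `ols_diagnostics` and the sample data are made up for illustration; this is not the code from the blog post, and it leaves out everything R's built-in diagnostics add on top (plots, Q-Q checks, mixed effects).

```python
from statistics import mean


def ols_diagnostics(x, y):
    """Fit y = intercept + slope * x by least squares and return
    per-point diagnostics (hypothetical helper, for illustration only)."""
    n = len(x)
    p = 2  # number of fitted parameters: intercept and slope
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    fitted = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]

    # Residual variance estimate with n - p degrees of freedom.
    mse = sum(r * r for r in residuals) / (n - p)

    # Leverage: diagonal of the hat matrix for simple regression.
    leverage = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]

    # Cook's distance: large values flag points that move the fit a lot.
    cooks = [
        (r * r / (p * mse)) * (h / (1.0 - h) ** 2)
        for r, h in zip(residuals, leverage)
    ]

    return {
        "slope": slope,
        "intercept": intercept,
        "fitted": fitted,
        "residuals": residuals,
        "leverage": leverage,
        "cooks_distance": cooks,
    }


if __name__ == "__main__":
    # Made-up data; the last point is deliberately out of line.
    xs = [1, 2, 3, 4, 5]
    ys = [2.1, 3.9, 6.2, 7.8, 15.0]
    diag = ols_diagnostics(xs, ys)
    for f, r, d in zip(diag["fitted"], diag["residuals"], diag["cooks_distance"]):
        print(f"fitted={f:.2f} residual={r:+.2f} cooks_d={d:.3f}")
```

Plotting `fitted` against `residuals` gives the fitted-vs-residual picture; sorting by `cooks_distance` shows which observations deserve a second look.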

  4. true, but not really because of R itself by Trepidity · · Score: 5, Insightful

    R itself is okay, but even as a long-time user I don't think the language or environment itself is all that much to brag about. What makes it great for statistics is just that statisticians use it, which means that a lot of the packages are written by statisticians. That makes a big difference: recent papers often have R implementations, standard problems have well-maintained R packages for them with all the bells and whistles, etc. As Matloff notes, this means they often have everything that statisticians are looking for, while the straightforward textbook implementations you find in other languages often aren't nearly as thorough in how they handle the statistical models, or only handle some special cases (though there are some really good packages in other languages, just not as many).

    But I don't think that has much to do with R itself being uniquely suited to statisticians. It's used for historical reasons: Bell Labs S was influential in the field way back when nothing like Python or Julia existed, and statisticians started using it because it was a lot nicer than Fortran, which is what other areas of science mostly used back then. GNU R is essentially a free-software workalike for Bell's S, and it's kept most of the community on board through a mixture of existing packages, familiarity, and inertia.

  5. Meh by hyfe · · Score: 5, Informative
    Statistics major who programmed Python professionally for a few years (and has an MSc in Comp. Sci.) ...

    ... this is all posturing and drama, but good on Prof. Norm Matloff for getting some attention. R is rather useful and has quite a few extremely useful features as a language, including some of the best list/indices handling I've seen anywhere. It has excellent libraries for statistical work, but it also has quite a few of the most downright abhorrent language decisions I've seen anywhere ever, with the amazingly poor string handling (for a scripted language) topping that list ( http://www.burns-stat.com/page... )

    Python, C, Mathematica and R all have different strengths for mathematical work / numerical calculations though, and using the best tool for the job is what it's about. As always, what the best tool actually is, is also rather subjective, as which tool will best solve a specific task always depends on your skill with the different tools. I do agree with the professor though: even though there's quite a bit of Python hype (python + scipy/matplotlib is amazing), R is not being replaced anytime soon. It's too good at what it's good at.

    --
    "How about taking the safety labels off everything, and let the stupidity-problem solve itself?"
  6. A joke on the subject by kav2k · · Score: 4, Funny

    A joke I've read recently:

    I'm not sure if "R is written by statisticians, for statisticians" is a good thing e.g. "stadiums are built by footballers, for footballers"

  7. If you're going to use R by Johnny+Loves+Linux · · Score: 4, Informative
    Be sure to use RStudio as the front end: http://www.rstudio.com/. Using R in a terminal is ok, but having the beautiful GUI frontend RStudio makes working with R sooooooo much better! The help system, plots, R markdown (knitr), and inspecting variables are so much easier in RStudio. As far as comparisons go,
    1. R is no competitor to python for writing generic scripts.
    2. Python (numpy, scipy, statsmodels, pandas, sklearn, matplotlib, ipython and ipython notebooks) is not yet ready to compete with R for doing statistical analysis, but give Python a couple more years and then Slashdot should do a review of how it compares.
    3. You can always call R from python using the rpy2 module. This is really easy within an ipython notebook using the %load_ext rmagic command.

    For a nice video on using ipython notebook in data analysis: https://www.youtube.com/watch?...

    For a nice selection of ipython notebooks for doing various type of data analysis: https://github.com/ipython/ipy...