Slashdot Mirror


Interviews: Ask Author and Programmer Andy Nicholls About R

Andy Nicholls has been an R programmer and consultant for Mango Solutions since 2011 (where he currently manages the R consultancy team), after a long stint as a statistician in the pharmaceutical industry. He has a serious background in mathematics, too, with a Masters in math and another in Statistics with Applications in Medicine. Andy has taught more than 50 on-site R training courses and has been involved in the development of more than 30 R packages; he's also a regular contributor to events at LondonR, the largest R user group in the UK. But since not everyone can get to London for a user group meeting, you can get some of the insights he's gained as an R expert in Sams Teach Yourself R In 24 Hours (available in print or at Safari), of which he is the lead author. Today, though, you can ask Andy about the much-lauded statistics-oriented free software (GPL) language directly -- why to use it, how to get started, how to get things done, and where those intriguing release names come from. (The about page is helpful, too.) As usual, please ask as many questions as you'd like, but one question at a time, please. Note: Slashdot is always looking for interesting interview guests. Who do you want to ask? Let us know!

13 of 187 comments

  1. R? by U2xhc2hkb3QgU3Vja3M · · Score: 5, Funny

    Is that a pirates-only language?

  2. Evolution of R by patabongo · · Score: 4, Interesting

    How has the way you use R changed over time? For myself, I don't think I've gone through an entire R session in the past six months without loading dplyr. Combine that with the pipeline operator and I think if you'd shown the R code I wrote yesterday to me of two years ago, I wouldn't have believed it was the same language.

  3. Future of R, now that programmers use it? by Anonymous Coward · · Score: 2, Insightful

    What's your take on the future of R? It used to be that it was a tool for statisticians, and now it's been discovered by programmers. As a statistician who's not a programmer, but who hangs out sometimes on slashdot and stackoverflow, it feels sometimes like it's in danger of becoming just another language for programmers, instead of a tool for statisticians. Should I be worried? Can it be both? Is this mass inflow of programmers going to change it somehow? Or am I just having a "get off my lawn" moment?

    More about me, I'm a PhD statistician at a major public research university, and use R every day for data manipulation, exploration, and analysis, and have for 10+ years. I've done a few packages and enough coding that I know most of R's quirks, but would not consider myself a programmer.

    1. Re:Future of R, now that programmers use it? by Pseudonym · · Score: 3, Insightful

      As a statistician who's not a programmer, but who hangs out sometimes on slashdot and stackoverflow, it feels sometimes like it's in danger of becoming just another language for programmers, instead of a tool for statisticians.

      As a programmer who used to research programming languages, there's no danger of that at all.

      It's not much of a stretch to say that no programmer really uses R. At most, programmers use the high-quality statistical libraries which only work with R. R is basically the best statistical packages ever written bound together by one of the worst programming languages ever developed.

      --
      sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
    2. Re:Future of R, now that programmers use it? by gumbi+west · · Score: 2

      I actually program exclusively in R and find it OK once you learn the quirks. Where it excels is in sort of "jotting" down thoughts about programs, e.g. you can define an S3 class and then make an object that only has a few of the properties, or claim your object is of a class it is not. This would drive any Java programmer bananas but it's super nice for going fast and loose.

      Similarly, the fact that it can recover your call in addition to the arguments you passed makes several functions work much better when you haven't specified all of the optional arguments. Once you have specified them then it's back to irrelevant.
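
      A minimal sketch of the two behaviours described above, assuming nothing beyond base R. The names `new_point`, `print.point`, and `show_call` are hypothetical and exist only for illustration:

      ```r
      # Hypothetical constructor: an S3 "class" is just an attribute on a list.
      new_point <- function(x, y) structure(list(x = x, y = y), class = "point")

      # S3 method for print(), dispatched on the class attribute.
      print.point <- function(x, ...) {
        cat("point at", x$x, x$y, "\n")
        invisible(x)
      }

      # Nothing checks that an object has the fields its class implies.
      partial <- structure(list(x = 1), class = "point")
      inherits(partial, "point")   # TRUE, although `y` was never set
      is.null(partial$y)           # TRUE

      # match.call() recovers the call as the caller wrote it,
      # including only the arguments that were actually supplied.
      show_call <- function(a, b = 2, ...) match.call()
      cl <- show_call(1 + 1, extra = "z")
      cl$a       # the unevaluated expression 1 + 1
      cl$b       # NULL: `b` was left at its default, so it is not in the call
      ```

      Modelling functions such as `lm()` use the same mechanism to store the original call on the fitted object.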

  4. Key advantages of R by Compuser · · Score: 5, Interesting

    In your view, what are the key advantages of R over other scientific computing languages, most notably Matlab (which has to be considered with its plethora of toolboxes of course)?

    1. Re:Key advantages of R by TeknoHog · · Score: 4, Interesting

      In your view, what are the key advantages of R over other scientific computing languages, most notably Matlab (which has to be considered with its plethora of toolboxes of course)?

      Or Python with scipy/numpy, or Julia, given their open source nature in addition to the plethora of libraries.

      --
      Escher was the first MC and Giger invented the HR department.
  5. Re:Tips for new statisticians by Shadow+of+Eternity · · Score: 2

    Hoisting the AC for asking a good question.

    To add on: R is gaining massive traction in graduate programs but so many professors teach it like it's SPSS, almost as a cargo cult coding language, and so much of the documentation is written for people who are already experienced coders. Is there any decent introduction to R for someone that doesn't already know it (or another programming language) fluently?

    --
    A bullet may have your name on it but splash damage is addressed "To whom it may concern."
  6. What about the painful side of R? by Shadow+of+Eternity · · Score: 5, Interesting

    There's an entire book, the R Inferno, dedicated to R's many "quirks" and problems. Is there ever a plan to dedicate some time to focusing on cleaning up the language and making it less painful to use?

    --
    A bullet may have your name on it but splash damage is addressed "To whom it may concern."
  7. Harsh crowd by Anonymous Coward · · Score: 2, Interesting

    In my experience (from searching for R advice online - I've never mailed the R discussion list myself) the R community is incredibly harsh and unforgiving of new users. Answers to beginners' questions are normally brusque - often extremely so. (I remember one exchange, where a user basically asked "I've read the documentation for par, and I don't understand ...", and the response was, in its entirety, "?par" -- which, for those unfamiliar with R, is the command to bring up the documentation for par.)

    On the statistical end of things, too, the community seems less than helpful. My impression is that it's normally assumed that all R users have good (graduate student-level) backgrounds on the statistical aspects, and little to no consideration is given to those who might not be up to speed on the theoretical basis of some of the functions in R, or who haven't read the (pay-walled, mathematically dense) 1963 paper where the method was first described.

    What are your thoughts on the helpfulness and "beginner friendliness" of the R community? Do you think there might be an issue with going from a very hand-holdy "Teach Yourself In 24 Hours" type work and being abruptly dumped into a much more brusque "why are you asking us? - figure it out yourself!" type environment?

  8. Impressed with R's speed by joeblog · · Score: 2

    I encountered R via Johns Hopkins University's data science series of Coursera courses which I highly recommend. The first one is at https://www.coursera.org/learn...

    As a mainly Python programmer, but someone with an eclectic interest in programming languages (I enjoy Prolog, Lisp, ML...), I've found R very intriguing: it's a very "functional" programming language, but also object oriented (using dollar signs instead of the customary dots). I've also found R to be incredibly quick -- provided you know and use the right built-in functions. I once tried to solve an assignment with a for loop and killed the process after it hadn't finished within a day. Using "aggregate" did the job within an instant of pressing enter.
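
    The assignment itself isn't shown, so the following is only a sketch of the general pattern, using made-up data and hypothetical names (`df`, `loop_sums`, `agg`). On six rows both versions are instant; the difference only shows up on large data:

    ```r
    # Made-up data: six measurements in three groups.
    df <- data.frame(group = c("a", "b", "a", "c", "b", "a"),
                     value = c(1, 2, 3, 4, 5, 6))

    # Loop version: re-scan the whole data frame once per group
    # and grow the result vector element by element.
    loop_sums <- numeric(0)
    for (g in unique(df$group)) {
      loop_sums[g] <- sum(df$value[df$group == g])
    }

    # Built-in version: a single call to aggregate() with a formula,
    # which splits by group and applies FUN to each piece.
    agg <- aggregate(value ~ group, data = df, FUN = sum)
    agg$value   # 10 7 4 for groups a, b, c
    ```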

    I've found R to have numerous strange quirks I haven't got the hang of, which sometimes produce weird results I can't debug. The Coursera course mentioned above teaches a style of R I'm not particularly fond of, one that leans on various libraries; I'm ideologically opposed to that in the same way I prefer battling with JavaScript directly rather than learning jQuery as an intermediary "dialect".

    What are your pointers for the "right way" to program in R?

    --
    If it works, it's obsolete
  9. Using R to learn statistics by twistedcubic · · Score: 3, Interesting

    What topic(s) in statistics do you think students can learn more easily today using R than years ago, when there was nothing like R widely available?

  10. How about errors and debugging? by GlobalEcho · · Score: 2

    I feel that one of the weakest points of R is the error handling, reporting, and debugging available. Do you have advice on tools or techniques for people coding in R (aside from using RStudio)? Are there plans for improvements in this area? The current facilities are reminiscent, at least to me, of using gdb back in the 1990s.

    I have in mind cases like the following, in which a confusion about list access using the [ operator (when the [[ should have been used) provides a cryptic error message with no traceback available.

    > symlog_scaler <- list(linear_to=2.5,  abscissa=2.0,
    +    scaling_function=function(x,linear_to=2.5,abscissa=2.0){
    +        y <- x; linear_to = abs(linear_to); big_ix = (linear_to<x)
    +        y[big_ix] = linear_to + log(1+(x[big_ix] - linear_to), base=abscissa)
    +        small_ix = (-linear_to>x)
    +        y[small_ix] = -(linear_to + log(1+(-x[small_ix] - linear_to),base=abscissa))
    +        y})
    > symlog_scaler$scaling_function(-5:5)
    [1] -4.307355 -3.821928 -3.084963 -2.000000 -1.000000  0.000000  1.000000  2.000000  3.084963
    [10]  3.821928  4.307355
    > symlog_scaler['scaling_function'](-5:5)
    Error: attempt to apply non-function
    > traceback()
    No traceback available
    >
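
    For readers unfamiliar with the distinction, a smaller sketch of the same mistake, using a hypothetical two-element list `scaler` rather than the function above. Single brackets return a sub-list, which cannot be called; double brackets (or `$`) extract the element itself:

    ```r
    # Hypothetical list holding a number and a function.
    scaler <- list(offset = 1, f = function(x) x + 1)

    class(scaler["f"])     # "list": `[` keeps the list wrapper
    class(scaler[["f"]])   # "function": `[[` extracts the element

    scaler[["f"]](1:3)     # 2 3 4
    scaler$f(1:3)          # same result

    # Calling the one-element list is what raises the error. tryCatch()
    # captures the condition so its message can be inspected; in an
    # English locale it reads "attempt to apply non-function".
    res <- tryCatch(scaler["f"](1:3),
                    error = function(e) conditionMessage(e))
    res
    ```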