Interviews: Ask Author and Programmer Andy Nicholls About R
Andy Nicholls has been an R programmer and consultant for Mango Solutions since 2011 (where he currently manages the R consultancy team), after a long stint as a statistician in the pharmaceutical industry. He has a serious background in mathematics, too, with a Masters in math and another in Statistics with Applications in Medicine. Andy has taught more than 50 on-site R training courses and has been involved in the development of more than 30 R packages; he's also a regular contributor to events at LondonR, the largest R user group in the UK. But since not everyone can get to London for a user group meeting, you can get some of the insights he's gained as an R expert in Sams Teach Yourself R In 24 Hours (available in print or at Safari), of which he is the lead author. Today, though, you can ask Andy about the much-lauded statistics-oriented free software (GPL) language directly -- Why to use it, how to get started, how to get things done, and where those intriguing release names come from. (The about page is helpful, too.) As usual, please ask as many questions as you'd like, but one question at a time, please.
Note: Slashdot is always looking for interesting interview guests. Who do you want to ask? Let us know!
Is that a pirates-only language?
How has the way you use R changed over time? For myself, I don't think I've gone through an entire R session in the past six months without loading dplyr. Combine that with the pipeline operator and I think if you'd shown the R code I wrote yesterday to me of two years ago, I wouldn't have believed it was the same language.
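For readers who haven't seen the piped style, here is a minimal sketch of the contrast. It assumes R 4.1 or later and uses base R's native `|>` pipe as a stand-in for the magrittr/dplyr `%>%` operator the question refers to, so that no packages are needed; the numbers are made up.

```r
# Nested base-R style: has to be read from the inside out.
nested <- round(mean(sqrt(c(1, 4, 9, 16))), 1)

# Piped style (native pipe, R >= 4.1; an assumption, the question
# itself is about magrittr's %>% as used with dplyr): reads left to right.
piped <- c(1, 4, 9, 16) |> sqrt() |> mean() |> round(1)

# Both spellings compute the same value.
identical(nested, piped)
```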
What's your take on the future of R? It used to be that it was a tool for statisticians, and now it's been discovered by programmers. As a statistician who's not a programmer, but who hangs out sometimes on Slashdot and Stack Overflow, it sometimes feels like it's in danger of becoming just another language for programmers, instead of a tool for statisticians. Should I be worried? Can it be both? Is this mass inflow of programmers going to change it somehow? Or am I just having a "get off my lawn" moment?
More about me, I'm a PhD statistician at a major public research university, and use R every day for data manipulation, exploration, and analysis, and have for 10+ years. I've done a few packages and enough coding that I know most of R's quirks, but would not consider myself a programmer.
I don't believe you're even a human any more. In fact, I'd go as far as to say you're something akin to 791.
-SR
In your view, what are the key advantages of R over other scientific computing languages, most notably Matlab (which has to be considered with its plethora of toolboxes of course)?
For those that are relatively new to R and hope to enter the field of statistics, where would you recommend focusing your R training efforts?
For example, which programming concepts, or fields of application, or packages, etc. do you feel are especially worthy of attention?
Similarly, what would you recommend we avoid?
Ken M usually does not post as anonymous.
Ken M is actually funny. Of course, as a Ken M enthusiast I'd add one more reason he's a huge improvement over APK here.
You are not alone. This is not normal. None of this is normal.
There's an entire book, the R Inferno, dedicated to R's many "quirks" and problems. Is there ever a plan to dedicate some time to focusing on cleaning up the language and making it less painful to use?
A bullet may have your name on it but splash damage is addressed "To whom it may concern."
In my experience (from searching for R advice online - I've never mailed the R discussion list myself) the R community is incredibly harsh and unforgiving of new users. Answers to beginners' questions are normally brusque - often extremely so. (I remember one exchange, where a user basically asked "I've read the documentation for par, and I don't understand ...", and the response was, in its entirety, "?par" -- which, for those unfamiliar with R, is the command to bring up the documentation for par.)
On the statistical end of things, too, the community seems less than helpful. My impression is that it's normally assumed that all R users have good (graduate student-level) backgrounds on the statistical aspects, and little to no consideration is given to those who might not be up to speed on the theoretical basis of some of the functions in R, or who haven't read the (pay-walled, mathematically dense) 1963 paper where the method was first described.
What are your thoughts on the helpfulness and "beginner friendliness" of the R community? Do you think there might be an issue with going from a very hand-holdy "Teach Yourself In 24 Hours" type work and being abruptly dumped into a much more brusque "why are you asking us? - figure it out yourself!" type environment?
I encountered R via Johns Hopkins University's data science series of Coursera courses which I highly recommend. The first one is at https://www.coursera.org/learn...
As a mainly Python programmer, but someone with an eclectic interest in programming languages (I enjoy Prolog, Lisp, ML...), I've found R very intriguing: it's a very "functional" programming language, but also object oriented (using dollar signs instead of the customary dots). I've also found R to be incredibly quick, provided you know and use the right built-in functions. I once tried to solve an assignment with a for loop and killed the process after it hadn't finished within a day. Using "aggregate" did the job within an instant of pressing enter.
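To make the loop-versus-aggregate point concrete, here is a rough sketch. The original assignment isn't described, so the data frame, its column names and its size are all invented for illustration; only `aggregate` itself comes from the comment above.

```r
# Made-up data: 10,000 values spread over five groups (hypothetical).
set.seed(1)
df <- data.frame(
  group = sample(letters[1:5], 1e4, replace = TRUE),
  value = runif(1e4)
)

# Explicit loop: compute each group's mean one group at a time.
loop_means <- numeric(0)
for (g in sort(unique(df$group))) {
  loop_means[g] <- mean(df$value[df$group == g])
}

# Built-in alternative: aggregate() does the grouping internally.
agg <- aggregate(value ~ group, data = df, FUN = mean)

# The two approaches should agree on the per-group means.
all.equal(unname(loop_means), agg$value)
```

On data this small both versions finish instantly; the difference the comment describes tends to show up when a loop touches every row and grows its result as it goes.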
I've found R to have numerous strange quirks I haven't got the hang of, sometimes resulting in weird results which I can't debug. The Coursera course mentioned above teaches a style of R I'm not particularly fond of, one that relies on various libraries, which I'm ideologically opposed to in the same way I prefer battling with JavaScript directly rather than learning jQuery as an intermediary "dialect".
What are your pointers for the "right way" to program in R?
If it works, it's obsolete
I am myself an R aficionado, but what do you answer to someone who says that Python has gone a long way toward being a good contender for data analysis tasks (SciPy, Pandas, Scikit etc...)?
I've complimented your work in the past, as a matter of fact. I'm sorry if you did not see it. It's not the application that's the problem, but the personality attached to it. If you read my posts on other topics, not related to you, you'll find that I'm quite often the reasonable one in the room. That leads me to wonder if that's really any different here.
Hopefully it wasn't me who drove you to that drink. I'm more of a gin man, myself, but I do enjoy a good rum; might I ask what you're pouring tonight?
APK quotes people (including myself) without context and should not be trusted. Just thought you should know.
There is no way that ranking is accurate. It says C is the #2 language. There is no way. C++ is over C and no one programs in Delphi since 1999.
Dude, no one has used Object Pascal since 1999.
my persona can be as nice as the next person's UNTIL I am attacked
And this is why I was trying to point out that my initial comment, months ago, was indeed a joke and not a directed attack. Ya gotta admit, ya jumped in pretty heavy at the onset.
We good?
Cruzan is good stuff, definitely one of my choice rums when I go that route. Getting any "spendier" than that is just for show. My gin of preference is Citadelle; I bought a bottle of Tanqueray #10 at 4x the cost per ounce one night when I wanted to indulge and it's ended up being a show piece, certainly not best of breed.
I miss the winter weather, but I certainly don't miss being able to wear sandals year-round. I envy your snow right now, though.
Don't believe everything you read on the Internet. No one knows what TIOBE is. I can put up a website that says anything. There is no way anyone is using Delphi in 2016.
What topic(s) in statistics do you think students can learn easier today using R than years ago when there was nothing like R widely available?
I think minitab is better. How would you convince me otherwise?
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
R has been around longer than Java, and is based on S which is older than C++. There's a huge body of existing code and libraries to leverage. But from what I gather, the real reason to use R is because the only other option you're being offered is SAS, and you don't want to deal with that mess! Or so I hear.
Bottom line, if you're not being threatened with SAS, there may be little reason to learn R. But if you are, or if you think there's any danger you might be, R is probably something you want to learn ASAP! :)
I feel that one of the weakest points of R is the error handling, reporting, and debugging available. Do you have advice on tools or techniques for people coding in R (aside from using RStudio)? Are there plans for improvements in this area? The current facilities are reminiscent, at least to me, of using gdb back in the 1990s.
I have in mind cases like the following, in which a confusion about list access using the [ operator (when the [[ should have been used) provides a cryptic error message with no traceback available.
> symlog_scaler <- list(linear_to=2.5, abscissa=2.0,
+ scaling_function=function(x,linear_to=2.5,abscissa=2.0){
+ y <- x; linear_to = abs(linear_to); big_ix = (linear_to<x)
+ y[big_ix] = linear_to + log(1+(x[big_ix] - linear_to), base=abscissa)
+ small_ix = (-linear_to>x)
+ y[small_ix] = -(linear_to + log(1+(-x[small_ix] - linear_to),base=abscissa))
+ y})
> symlog_scaler$scaling_function(-5:5)
[1] -4.307355 -3.821928 -3.084963 -2.000000 -1.000000 0.000000 1.000000 2.000000 3.084963
[10] 3.821928 4.307355
> symlog_scaler['scaling_function'](-5:5)
Error: attempt to apply non-function
> traceback()
No traceback available
>
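For anyone puzzled by the transcript, the difference between the two operators can be sketched as follows. The `scaler` list is a simplified, hypothetical stand-in for `symlog_scaler`, and wrapping the call in `tryCatch` is just one way to get hold of the error; it is not a fix for the missing traceback, which is probably empty because the failure happens at top level with no function call on the stack.

```r
# Hypothetical, simplified stand-in for the symlog_scaler list above.
scaler <- list(scaling_function = function(x) x * 2)

# Single bracket keeps the container: the result is a one-element list.
class(scaler["scaling_function"])    # "list"

# Double bracket (or $) extracts the element itself: a function.
class(scaler[["scaling_function"]])  # "function"
scaler[["scaling_function"]](1:3)    # 2 4 6

# Calling the one-element list fails; tryCatch captures the condition
# so its message can be inspected programmatically.
msg <- tryCatch(
  scaler["scaling_function"](1:3),
  error = function(e) conditionMessage(e)
)
msg
```

A quick `str(scaler["scaling_function"])` before calling is often enough to spot that a list, not a function, came back.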
Right and I'd mod myself down, too.. mhm, so sure. I have one account, one single account, a fact which only Slashdot staff will be able to prove or disprove, much like your claim that I have multiple sockpuppet accounts. You're playing yourself for stupid.
Woah, woah, cool it, Alexander. What happened to the last couple messages we exchanged last night? Really not making yourself look good here, buddy, coming at me like this after we made amends.
When did I bring up statistics, other than pointing out that R is a statistical analysis language? Whichever AC said that, I can assure you it was not me, just as whoever modded both of us down was not me. I think your "disappointment" is misdirected; you're not family and clearly have no interest in being a friend, though, so your disappointment really doesn't mean much to me. Sorry about that.
I'd gladly lay off but you started up again even now
Everything you are referring to was posted before we supposedly made amends and had already been replied to by you.
Your POST HISTORY SHOWS YOU CONSTANTLY COMING IN AFTER I HAVE BEEN IN POSTS TOO
I stood up for you in one post and directly replied to you in another, in this very topic. Aside from that, there was another thread a few days ago where we interacted, and I made one off-the-cuff remark about wishing you'd leave me alone (in a thread where that type of comment was actually quite relevant), which was also made during that little tiff.
My only posts to or about you since our supposed (and clearly meaningless) truce have been directly to you, people posting as you, or people posting attacking me for the conversation we already had, inquiring as to WTF those attacks are all about since we had come to a new understanding and were supposedly past that like a couple of grown adults.
You know what? I no longer have karma to burn to make amends with you. You're dead to me.
I have been impressed with the strong community surrounding R, and the excellent third party libraries that are available in the CRAN.
What is your view on the various third party GUIs that exist for R, such as RStudio, Tinn-R and RExcel? Do you use or recommend any of them?
"Nine times out of ten, starting a fire is not the best way to solve the problem." - my wife
I wasn't gonna go there but... since he's dead to me now... don't you know it?