Slashdot Mirror


The Power of the R Programming Language

BartlebyScrivener writes "The New York Times has an article on the R programming language. The Times describes it as: 'a popular programming language used by a growing number of data analysts inside corporations and academia. It is becoming their lingua franca partly because data mining has entered a golden age, whether being used to set ad prices, find new drugs more quickly or fine-tune financial models. Companies as diverse as Google, Pfizer, Merck, Bank of America, the InterContinental Hotels Group and Shell use it.'"

23 of 382 comments

  1. What's a pirate's favorite programming language? by Anonymous Coward · · Score: 5, Funny

    R!

  2. Re:Only for certain kind of analyst... by Samschnooks · · Score: 5, Insightful

    ... most others keep thinking that M$ Excel is the silver bullet.

    The folks I know who use Excel for analysis use it because it's the package that everyone gets in their organization, there's a shit load of material on the web that uses excel, there's plenty of add-ons for it (no need to reinvent the wheel), and when sharing data and analysis, everyone is familiar with it. An engineer I know who uses excel chose it because it was the fastest way to connect to his testing equipment. R is relatively new and as more folks come into the workforce who know it, we'll see it replace Excel for functions that it is better suited for.

  3. Well... by Weaselmancer · · Score: 5, Funny

    ...if at first you don't succeed, then skydiving is not for you.

    --
    Weaselmancer
    rediculous.
  4. Show me some example code by bogaboga · · Score: 5, Insightful

    My request is to those that are in the know to show me some example code, that does something useful. Then later, compare that code to code from other languages to accomplish the same task.

    Include reasons to support the notion that the R language is [necessarily] better at what it does.

    1. Re:Show me some example code by transonic_shock · · Score: 5, Insightful

      FTA
      "I think it addresses a niche market for high-end data analysts that want free, readily available code," said Anne H. Milley, director of technology product marketing at SAS. She adds, "We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet."

      Seriously, does this person know what she is talking about?

      1. Yes, CFD and Structural Analysis software is increasingly written using open source tools and run on open source OS (Linux running on clusters)

      2. SAS is not used to design any part of the aircraft.

      I have noticed SAS uses the same kind of FUD to counter R as M$ uses to counter Linux.

    2. Re:Show me some example code by visible.frylock · · Score: 5, Insightful

      Seriously, does this person know what she is talking about?

      Let's see, Director of technology product marketing. I'm gonna go with a big NO.

      --
      Billy Brown rides on. Yolanda Green bypasses Gary White.
    3. Re:Show me some example code by lt.+slock · · Score: 5, Informative

      I use R a great deal. Think of it as an alternative to MATLAB, or Excel, rather than C or Perl or Lisp or whatever you like to use as a general purpose language. So, compared to MATLAB, functions are first class objects (rather like Lisp), so you can write functions that take functions as arguments, and return them as well, just as though they were simple variables. It handles vectors rather easily, and has decent plotting tools.

      # Quick example

      # f: given numeric arguments a and b and a function g,
      # returns a function of x
      f <- function(a, b, g) {
          function(x) { a * x + g(b * x) }
      }

      f1 <- f(1, 2.5, sin)                  # f1(x) = x + sin(2.5 * x)
      x <- seq(-pi, pi, length.out = 100)   # 100 evenly spaced points
      plot(x, f1(x), type = 'l')            # draw the curve as a line

  5. SAS strikes out ^H^H^H er, "back" by enilnomi · · Score: 5, Informative
    FTFA:

    She [Anne H. Milley, director of technology product marketing at SAS] adds, "We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet."

    Good thing Boeing's not using free software for aircraft simulation tools, space station labs, sub hunters, or moon rockets ;-)

    --
    education is no substitute for intelligence
    1. Re:SAS strikes out ^H^H^H er, "back" by jd · · Score: 5, Informative

      Good thing NASA likewise never uses Open Source to design engines and aircraft alongside companies like Boeing. (*This product may contain nuts^H^H^H^Hsarcasm.)

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  6. Re:Only for certain kind of analyst... by jaxtherat · · Score: 5, Informative

    Sorry, but R is not relatively new. It's been around for at least 10 years; I was taught how to use R at university back in 2001. And S and later S+ (which R is a FOSS version of) have been around for even longer, since the mid-70s.

    --
    http://www.zombieapocalypse.tv/
  7. Re:Freak your colleagues out with "no loop" code.. by Anonymous Coward · · Score: 5, Informative

    "The worst thing about R programming is its name. Googling for "R" turns up way too much noise and way too little signal"

    Try searching from http://rseek.org/ instead of directly from Google.

  8. Re:r-project.org by DahGhostfacedFiddlah · · Score: 5, Funny

    the libraries available for doing such analysis are unparalleled.

    With multi-core processors becoming more and more prevalent, R's developers should remedy this as soon as possible.

  9. Re:Not a language, really by Hobbes_2100 · · Score: 5, Insightful

    Are you kidding me? Are you really *(*$@#ing, Grade A kidding me?

    Python/Perl/Ruby require interpreters. Scheme and Lisp are frequently run within interpreters. "Stand-alone executables" require HARDWARE. Any programming system requires *something* underneath it unless you are programming in a purely physical system like an automated abacus with mechanical gears that buzz and whirr.

    Programming languages are defined by their Turing completeness: can they do things repeatedly, can they assign values to memory locations and perform some basic set of operations (nand works nicely), can they make decisions. Everything else is fluff.

    Perl has "fluff" that handles regular expressions very well.

    Python (and others) have "fluff" that makes networking and database ops easy.

    R has "fluff" that makes it terribly convenient to work with data.

    Matlab has "fluff" that makes it very easy to do numerical methods programming.

    Mathematica has "fluff" that makes it very easy to do symbolic computation.

    Each and every one of these, and most well-known languages, with all their warts and beauty marks are Turing complete and are deserving of the term "programming language".

    Regards,
    Mark

  10. The R language and its uses by golodh · · Score: 5, Informative
    I'll pitch in because R deserves better than the usual Slashdot cocktail of random ignorance and immature jokes.

    The R language (yes, it's a language; an interpreted language is a language too) has become the main computer language of statisticians (both academics and sundry statistical researchers) around the world. It is used in those cases where researchers feel the need for customized computations rather than the use of a package like SAS or SPSS.

    R has become popular through a snowball effect and history. It started as a FOSS re-implementation-from-scratch of the "S" language designed for statistical work at Bell Labs (see http://en.wikipedia.org/wiki/S_(programming_language)). Some academics and researchers of repute used the S language because at that time (1975) it was very innovative and far better than most alternatives, and others followed. The S language gained a measure of acceptance among statisticians. Then when R became available the cycle intensified because of the much improved availability of the interpreter and its libraries. This cycle continued to the point that by now probably most professional statisticians use it.

    As far as I can see, the R language isn't especially sophisticated or elegant, and may strike people used to more modern languages as a bit repugnant. It does however excel in three respects:

    (a) it allows for easy access of Fortran and C library routines

    (b) it allows you to pass large blobs of data by name

    (c) it makes it easy to pass data to and from your own compiled C and Fortran routines

    The first reason is particularly important because it allows one to use pre-compiled libraries (linear algebra packages like LAPACK, Fourier transforms, special function evaluations) and thereby gain execution speeds comparable to C despite being an interpreted language (just like Matlab, Octave, Scilab, Gauss, Ox and suchlike): the hard work is carried out by a compiled library routine which is made easily accessible through the interpreted language. Any algorithm needed in statistics that's available as C or Fortran code can be linked in and called without too much effort.
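
    A minimal sketch of that first point, using only base R. As far as I know, solve() and crossprod() hand the actual number crunching to the compiled linear algebra routines (LAPACK/BLAS) that R is linked against. The data are simulated and the variable names (X, beta_true and so on) are made up for illustration.

    ```r
    set.seed(1)
    n <- 200
    X <- cbind(1, rnorm(n), rnorm(n))   # design matrix with an intercept column
    beta_true <- c(2, -1, 0.5)          # made-up coefficients for the simulation
    y <- drop(X %*% beta_true + rnorm(n, sd = 0.1))

    # Least squares via the normal equations (X'X) b = X'y.
    # The solve happens in compiled code, not in interpreted R.
    beta_hat <- drop(solve(crossprod(X), crossprod(X, y)))

    # Cross-check against R's own regression routine
    beta_lm <- unname(coef(lm(y ~ X - 1)))
    print(cbind(beta_hat, beta_lm))
    ```

    The two columns should agree to within rounding. The interpreted part is a handful of lines; the expensive part never touches the interpreter.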

    The second reason is important because it slows down execution much less than any pass-by-value interpreted language would, and it allows you to change data that is passed into a function.

    The third reason is particularly important because it helps researchers be more productive. Reading in your data, examining it, graphing it, tracing outliers and cleaning them up is best done in an interactive environment in an interpreted language. Coding such things in C or Fortran is an awful waste of time, and besides, researchers aren't code-monkeys and don't enjoy coding inane for-loops to read, clean, and display data. Vector and matrix primitives are far more powerful, and usually preferable unless they are so inefficient that you have to wait for the result. However, there are times when you just need to carry out standard algorithms (linear algebra, calculation of mathematical or statistical functions) or simply time-consuming repetitive algorithms that run so much faster in a genuine compiled language. You could start out by coding the algorithm in an interpreted language to check if it's working, and then isolate the computationally expensive part and code it up in C or Fortran. Using R (or Matlab or Scilab) you can *call* the compiled subroutine, pass it your (cleaned) data, and get the result back in an environment where you can easily analyze it.
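
    To make the "inane for-loops" versus vector primitives point concrete, here is a rough sketch in base R. The readings are simulated, and the fixed cutoff of 5 and the function names clean_loop and clean_vec are assumptions for illustration only, not a recommended outlier rule.

    ```r
    set.seed(42)
    x <- c(rnorm(1000), 25, -30, NA)   # simulated readings, two wild values, one gap

    # Loop version: walk the data one element at a time
    clean_loop <- function(v, limit = 5) {
      out <- numeric(0)
      for (i in seq_along(v)) {
        if (!is.na(v[i]) && abs(v[i]) <= limit) out <- c(out, v[i])
      }
      out
    }

    # Vectorized version: one logical index does the same job
    clean_vec <- function(v, limit = 5) v[!is.na(v) & abs(v) <= limit]

    a <- clean_loop(x)
    b <- clean_vec(x)
    print(identical(a, b))   # both versions should keep exactly the same values
    print(summary(b))
    ```

    The one-liner is what you would actually type at the prompt while poking at the data; the loop is what you would end up writing in C or Fortran.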

    That's why languages like R, Matlab, Scilab, Octave, Gauss, and Ox are so productive: you get the best of both worlds. Both the convenience, interactiveness, and terseness of a high-level interpreted language and the speed of compiled languages.

    So why R, and why not Gauss or Matlab or whatever?

    Well, part of that is cultural. If you're an econometrician you'll have been weane

    1. Re:The R language and its uses by verySmartApe · · Score: 5, Informative

      I second that. R is terribly useful for the wide variety of libraries available and esoteric statistical procedures. But you would *never* want to write a long/complex program in R.

      As you say, it's most convenient to work in some other language that's actually designed to be scalable, object-oriented, and easy to debug. It's usually straightforward to call R libraries when you need them. I find that python+scipy+rpy is an almost ideal environment for day to day scientific programming.

  11. Re:Based on S by Anonymous Coward · · Score: 5, Interesting

    I wish it had a more googleable name. It's hard to search for help. The signal to noise ratio is low.

  12. Re:Only for certain kind of analyst... by PachmanP · · Score: 5, Funny

    So we can the financial crisis on idiots who don't understand that GIGO applies in EVERY computer language?

    No, but we can the dropping of verbs on idiots who don't understand that they apply to EVERY sentence!

    --
    You're thinking small. Why miniaturize the laser, when we could instead enlarge the sharks? -John Searle
  13. Re:Based on S by spud603 · · Score: 5, Informative

    Tell me about it. Try this:
    http://www.rseek.org/

  14. Re:Only for certain kind of analyst... by Kyle3om · · Score: 5, Insightful

    The flowchart programming of LabVIEW is a pain in the butt for many looped programs and programs with complicated timings. Matlab is easier for most things (and more powerful) if you can get your external equipment to work with it without jumping through hoops.

  15. It is a pain in the ass to change. by pavon · · Score: 5, Informative

    Say you realize that you need to check for another corner case that you forgot, or need to extend a function for another purpose, or whatever. In any other language, you would type a few lines of code and be done with it. Not with labview. With labview you have to move things around to make room for the new code, disconnect wires and reconnect them. NI has added stuff into the newer version to help with this (auto growing, etc) but it still turns into a mess in short order.

    Other things are just easier to type than to draw, and also easier to read in text than as a schematic, like equations. So much so that they have added the ability to type portions of the code, but the amount of setup that you need to do with a code block often defeats the time benefit you get from using it.

    As someone who likes "clean code" I find LabView much more tedious and time consuming to keep neat, and when dealing with other coders that are not as picky, I find that their LabView code is much messier and harder to read than Java or C code by the same developer.

  16. Re:Only for certain kind of analyst... by Undead+Waffle · · Score: 5, Interesting

    I use LabView on a daily basis. I hate it.

    My coworkers like it and what they seem to have in common is that they either don't know any other languages or aren't proficient in them.

    It is a language that aims to be very simple by removing as much typed code as possible. Because of this you will spend stupid amounts of time moving little wires around and trying to make your code not look like a tangled mess. And good luck changing it later.

    Since there are no functions and the only way to reuse code is to put it in a different file, people tend not to do this. So if you want to use part of someone else's code you will usually have to copy and paste into a different file and spend a bunch of time reconnecting wires and dealing with references to variables you won't have access to in the new file.

    The visual style is also, in my opinion, much harder to read than typed code. If I'm trying to figure out some sort of formula it's easier to read it as text than try to figure out where all these wires are coming from that are connected to little "+" and "-" terminals. Also, since comments take space they tend to be short and are usually missing in more complicated sections because it's harder to route the wires around them. And control structures quickly make code virtually unreadable.

    There's also the part about writing most of your code with a mouse. Do you really enjoy having to navigate through a series of menus to do anything?

  17. Re:Labview sucks the most by Undead+Waffle · · Score: 5, Interesting

    It has plenty of other annoying behaviors.

    If you try to access an array element out of range it just gives you the default value for that data type rather than giving some indication that something is wrong.

    There is an option to automatically build an array as the output of a loop, but no way to make it *not* add a value to the array. Like when you hit a terminating condition for the loop or some value you want to skip. If you have these situations you either have to modify the array afterwards or build the array manually.

  18. Re:Based on S by digitig · · Score: 5, Funny

    It could be worse. Try searching for the natural language processing system "Lolita".

    --
    Quidnam Latine loqui modo coepi?