Slashdot Mirror


R Throwdown Challenge

theodp (442580) writes "'R beats Python!' screams the headline at Prof. Norm Matloff's Mad (Data) Scientist blog. 'R beats Julia! Anyone else wanna challenge R?' Not that he has anything against Python, Matloff adds, but he just doesn't believe that Python or Julia will become 'the new R' anytime soon, or ever. Why? 'R is written by statisticians, for statisticians,' explains Matloff. 'It matters. An Argentinian chef, say, who wants to make Japanese sushi may get all the ingredients right, but likely it just won't work out quite the same. Similarly, a Pythonista could certainly cook up some code for some statistical procedure by reading a statistics book, but it wouldn't be quite same. It would likely be missing some things of interest to the practicing statistician. And R is Statistically Correct.'"

185 comments

  1. Can't use it by smittyoneeach · · Score: 5, Funny

    Nothing with a name that verbose can possibly be any good.

    --
    Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
    1. Re:Can't use it by dmbasso · · Score: 1

      Like... hmm... C?

      --
      `echo $[0x853204FA81]|tr 0-9 ionbsdeaml`@gmail.com
    2. Re:Can't use it by Anonymous Coward · · Score: 1

      I'm waiting for R++ myself...

    3. Re:Can't use it by rudy_wayne · · Score: 2

      R#

    4. Re:Can't use it by Anonymous Coward · · Score: 0

      Have you heard of these exceptional languages?

      w
      H
      O
      o
      S
      H

    5. Re:Can't use it by FatdogHaiku · · Score: 3, Funny

      Is this the programming language of Pirates?
      Is this the programming language for Pirates?
      Is this the language for programming Pirates?
      Arrr...

      --
      You have the right to remain sentient. If you give up the right to remain sentient, you will be elected to public office
    6. Re:Can't use it by Anonymous Coward · · Score: 0

      I'm waiting for the R vs D wars.

    7. Re:Can't use it by Anonymous Coward · · Score: 0

      Do we send ASM programmers to the hague...

    8. Re:Can't use it by FatLittleMonkey · · Score: 1

      Posting to undo stupid.

      --
      Science is all about firing a drunk pig out of a cannon just to see what happens.
    9. Re:Can't use it by the_B0fh · · Score: 1

      I would wait for the next version...

      R2D2...

    10. Re:Can't use it by Mitchell314 · · Score: 1

      R.net?

      --
      I read TFA and all I got was this lousy cookie
  2. Hard to believe in these figures by CRCulver · · Score: 5, Funny

    And R is Statistically Correct.

    I don't see any margin of error. This claim is scientifically worthless.

    1. Re:Hard to believe in these figures by Anonymous Coward · · Score: 0

      only 1 in 1,225,461 people use it anyway.

    2. Re:Hard to believe in these figures by Anonymous Coward · · Score: 0

      I'd like to see a double-blind study on the chef claim, too.

    3. Re:Hard to believe in these figures by I'm+New+Around+Here · · Score: 2

      It dices. It chops. It purees. It makes my food taste better, to a not insignificant amount.

      Any other claims you want to hear from a chef*?

      .
      *Note: Worked in several restaurants during and after high school. Now I occasionally cook or make deserts at home.

      --
      If you think I voted for Trump because of this post, you're wrong. I voted for Dr. Jill Stein of the Green Party. Again.
    4. Re:Hard to believe in these figures by fuzzyfuzzyfungus · · Score: 1

      I'd like to see a double-blind study on the chef claim, too.

      I'm pretty sure that food, like music, has a fringe of...enthusiasts...who would tell you that double-blind studies just ineffably blunt the terroir in some more or less mystical way (which is of course the real reason why they have trouble performing above chance), rather than let base materialism and the plebeian theory that functionally identical outcomes can be produced by a variety of means sully the transcendent subtlety of their experience.

    5. Re:Hard to believe in these figures by Culture20 · · Score: 1

      I enjoy having my steak prepared the same way every time, but I would balk at eating a 100% reproduced steak.

    6. Re:Hard to believe in these figures by gordo3000 · · Score: 1

      Like wine, or food, or music, if you do a double blind study you may very well end up with very different preference rankings. It doesn't mean the outcomes are equivalent though. Even dishes that are usually simple are made with very different spices and ingredient balances by different chefs, and frankly, very different ingredients. Even sushi, one of the simplest foods available (literally, slice raw fish, put it on rice which has some sugar and vinegar added to it) can be markedly different. And usually, the difference is the chef being able to pick out the higher quality cut of meat (I'm assuming traditional sushi, not the westernized version with 25 ingredients and crazy names). Even the same chef can't reproduce the same outcomes, as there are factors that are beyond his control (the drink I pair the food with, the temperature it ends up being at when I eat it, these things matter).

      The Argentinian chef won't mess up the sushi because he is Argentinian, he will mess it up because he isn't as sensitive to the exact ingredient balance and isn't as knowledgeable about the cuts of fish that are best when served raw. And most likely, he will also mess up because he doesn't have a good understanding of the order in which to serve the sushi if it is a chef plate. But this is the equivalent to me writing stat software. Of course R will be better, I'm nothing more than a 2-bit coder who hates error handling. I would get the code flow wrong, would probably do things in horribly suboptimal ways, etc, etc. It doesn't matter that I understand statistics quite well.

    7. Re:Hard to believe in these figures by arglebargle_xiv · · Score: 1

      *Note: Worked in several restaurants during and after high school.

      Saying "would you like fries with that" doesn't really count as working in a restaurant though...

    8. Re:Hard to believe in these figures by Anonymous Coward · · Score: 0

      But is it Technically Correct? Because that's the best kind of correct.

    9. Re:Hard to believe in these figures by I'm+New+Around+Here · · Score: 1

      No, I worked in actual restaurants, with menus and tablecloths and everything. The real "doesn't count" part is that generally I was a dishwasher or busboy, and only did the prep-cook work as it was needed. But I didn't want to confuse the issue at the expense of a fun post.

      --
      If you think I voted for Trump because of this post, you're wrong. I voted for Dr. Jill Stein of the Green Party. Again.
    10. Re:Hard to believe in these figures by cellocgw · · Score: 1

      *Note: Worked in several restaurants during and after high school. Now I occasionally cook or make deserts at home.

      We have enough arid lands already, you insensitive clod!

      --
      https://app.box.com/WitthoftResume Code: https://github.com/cellocgw
    11. Re:Hard to believe in these figures by I'm+New+Around+Here · · Score: 1

      D'oh!

      I forgot to check that. "Just remember that 'dessert' has two s's because you want to have two servings."

      Thanks for catching that. :^)

      --
      If you think I voted for Trump because of this post, you're wrong. I voted for Dr. Jill Stein of the Green Party. Again.
    12. Re:Hard to believe in these figures by frig.neutron · · Score: 1

      why?

  3. Bad analogy by Florian+Weimer · · Score: 5, Insightful

    An Argentinian chef is more likely to make great sushi than a Japanese automotive engineer.

    You generally want to use programming languages designed by experienced programmers (even better, experienced language designers) who work closely with subject matter experts. Left to their own devices, experts are likely to get a lot of things wrong, and if the language is sufficiently popular, you are stuck with their mistakes for a long time to come.

    1. Re:Bad analogy by Glock27 · · Score: 5, Interesting

      Exactly. Julia will eat R for lunch soon enough, I think. It's an elegant, well designed and efficient language. It's only been around for a couple of years, and has a very vibrant and rapidly growing community.

      Check it out for yourself: The Julia Language Homepage. It's got a lot to offer anyone with an interest in mathematics, including statisticians. It's based on LLVM, and interfaces trivially with C libraries - plus it's a very fast language in its own right, unlike R or Python.

      --
      Galileo: "The Earth revolves around the Sun!"
      Score: -1 100% Flamebait
    2. Re:Bad analogy by retchdog · · Score: 5, Interesting

      my friend uses julia, and every few weeks complains about some bug. the other day he mentioned that the latest release broke Bernoulli sampling (wtf?). the others have been pretty fundamental too.

      this is a serious problem, of course. the other one is lack of libraries. R is an abysmal pile of shit, but at least it's a standard; pretty much 95%+ of applied stats is at least partially supported by someone's hacked-up library/package. julia is far, far short of that, and it appears that much of its community is more interested in pretty graphics, meta-wankery, and interface methodology than actual working statistics (not that there's anything wrong with that per se).

      yeah, yeah, "fix it yourself," and it's on my list to write at least a basic survival analysis package for it. but i wouldn't blame anyone for not using it, and i wouldn't recommend it for doing stats as it is now.

      --
      "They were pure niggers." – Noam Chomsky
    3. Re:Bad analogy by KingOfBLASH · · Score: 1

      Using three lines of code I can do a regression in R and get the output, including loading the data.

      Python? Fuhgeddaboutit. Can do, but with a lot more code.

      Of course, if you're looking to do stuff you'd expect of a normal scripting language, R falls flat on its face.

      The solution? R + Python. They talk to each other quite nicely, and you can get the best of both worlds.

    4. Re:Bad analogy by Gaygirlie · · Score: 3, Funny

      my friend uses julia, and every few weeks complains about some bug.

      He should tell Julia to wear protection and be more careful with who she spends time with so as not to catch so many bugs.

    5. Re:Bad analogy by K.+S.+Kyosuke · · Score: 1

      How much R package code is written in R? Would it be such a problem to take an R parser and generate Julia code out of it as a first iteration? Then, people could refactor it - if necessary - while keeping the first version around for regression testing. Even if the original R APIs are horrible, at least they have the benefit of people being familiar with them, as you rightly point out.

      --
      Ezekiel 23:20
    6. Re:Bad analogy by I'm+New+Around+Here · · Score: 1

      Considering Florian Weimer didn't make an analogy in his post, your post is what happens when a /. geek tries to make an argument based on his own skills in reading comprehension.

      --
      If you think I voted for Trump because of this post, you're wrong. I voted for Dr. Jill Stein of the Green Party. Again.
    7. Re:Bad analogy by tomhath · · Score: 2

      Python? Fuhgeddaboutit. Can do, but with a lot more code.

      Yea, with Python it takes up to nine lines of code to calculate the regression and generate a plot

    8. Re:Bad analogy by Anonymous Coward · · Score: 0

      Two of those lines are import statements you would only need to do once if doing a lot of related work (or just put in your start-up script or use ipython which pulls in most of pylab and numpy anyway, so taking zero lines of code). It looks nice to define things on a separate line, but you don't really need to have a separate line just for an arange definition or making the points needed to plot the line. And different plotting settings mean you don't need the show command. You end up back to about three lines of code: one to define/load your data, one to run the regression, and one to plot it.
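
      The regression step itself is small even without numpy or pylab. As a minimal sketch, assuming a single predictor and made-up data (fit_line is a hypothetical helper, not part of any package mentioned here), a closed-form least-squares fit in plain Python could look like this:

```python
# Simple least-squares fit y = intercept + slope * x, standard library only.
# The data below is made up for illustration.
from statistics import mean


def fit_line(xs, ys):
    """Return (intercept, slope) of the ordinary least-squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # points lying exactly on y = 1 + 2x
intercept, slope = fit_line(xs, ys)
print(intercept, slope)
```

      This covers only the fit; the standard errors, t values and diagnostics that a full summary reports are where the dedicated packages earn their keep.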

    9. Re:Bad analogy by KingOfBLASH · · Score: 3, Informative

      You're just getting a plot. I'm talking about output that looks like this:


      Call:
      lm(formula = new_day_return ~ prior_day_return + rsi_under_10 +
              rsi_under_20 + rsi_under_30 + rsi_over_70 + rsi_over_80 +
              rsi_over_90 + fourteen_day_rsi, data = mydata5)

      Residuals:
           Min      1Q  Median      3Q     Max
          -100      -1       0       1  205700

      Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
      (Intercept)      -9.845e+01  3.742e+02  -0.263    0.792
      prior_day_return -4.143e-04  3.434e-03  -0.121    0.904
      rsi_under_10     -1.916e-01  3.798e+00  -0.050    0.960
      rsi_under_20      2.195e-02  1.447e+00   0.015    0.988
      rsi_under_30     -2.291e-01  6.915e-01  -0.331    0.740
      rsi_over_70      -2.364e-01  3.348e-01  -0.706    0.480
      rsi_over_80       5.135e-03  4.820e-01   0.011    0.991
      rsi_over_90       7.162e-03  8.650e-01   0.008    0.993
      fourteen_day_rsi  4.193e-04  3.434e-03   0.122    0.903

      Residual standard error: 163.7 on 1581663 degrees of freedom
          (137 observations deleted due to missingness)
      Multiple R-squared: 5.397e-07, Adjusted R-squared: -4.518e-06
      F-statistic: 0.1067 on 8 and 1581663 DF, p-value: 0.999
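
      As a sanity check on the last two lines of that summary, the adjusted R-squared and the F-statistic can be recomputed from the reported R-squared and degrees of freedom. This is only a sketch, assuming the textbook formulas for a linear model with an intercept; the variable names are illustrative:

```python
# Values copied from the lm() summary above.
r2 = 5.397e-07        # Multiple R-squared
p = 8                 # predictors (the "8" in "8 and 1581663 DF")
resid_df = 1581663    # residual degrees of freedom, n - p - 1
n = resid_df + p + 1  # observations actually used in the fit

# Textbook formulas for a linear model with an intercept (assumed here).
adj_r2 = 1 - (1 - r2) * (n - 1) / resid_df
f_stat = (r2 / p) / ((1 - r2) / resid_df)

print(f"{adj_r2:.3e}  {f_stat:.4f}")
```

      Both values agree with the summary to the printed precision (about -4.518e-06 and 0.1067), which is what one would expect from a model that explains essentially none of the variance.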

    10. Re:Bad analogy by Anonymous Coward · · Score: 0

      > You generally want to use programming languages designed by experienced programmers (even better, experienced language designers)

      That's why Python has such a good design... oh, wait!

    11. Re:Bad analogy by Anonymous Coward · · Score: 0

      C++ & java are good proof of that problem.

    12. Re:Bad analogy by Anonymous Coward · · Score: 0

      An Argentinian chef could cook sushi just fine. I would expect a Japanese chef could make Empanadas just fine too. I don't see how nationality or race would make any difference at all.

      The skill of the chef, and the level of experience making the dish in question, are going to be much more significant determinants of the outcome.

    13. Re:Bad analogy by professionalfurryele · · Score: 4, Insightful

      Sorry but I use both R and python in my work as a biomechanist and while I love working with python and hate working in R, R is not only less verbose for this task, but it is more consistent, intuitive and better documented. Very few languages beat python for simple, easy to read code, but it is not up to the task of doing general purpose statistics. To see why this is the case consider a problem with that blog post. All the diagnostic plots I need to do to check the regression are missing, no qq, no cook's, not even something simple like fitted vs. residual. Now consider what happens when I notice that while the fit is decent the residuals depend on what subject I'm looking at and I need to vary the error term. Or need to switch to a mixed effects model because there is clearly a dependence on the intercept by subject.
      Seriously, when I say I hate R, I mean it. The code is ugly, it can be hard to read and woe betide the poor git who makes the mistake of needing a plot more complicated than something lattice can do. It is still better than python for statistics.

    14. Re:Bad analogy by Antique+Geekmeister · · Score: 1

      Like the f2c toolkit, for converting Fortran to C?

      I don't think you could write the parser in R, or in Julia.

    15. Re:Bad analogy by Anonymous Coward · · Score: 0

      "woe betide the poor git who makes the mistake of needing a plot more complicated than something lattice can do."

      Base R can give you almost any plot you want.

    16. Re:Bad analogy by K.+S.+Kyosuke · · Score: 1

      Or f2cl? ;-) I don't see a reason why one shouldn't be able to write the parser in Julia. It seems perfectly equipped even for such tasks. It even has macros, come to think of it.

      --
      Ezekiel 23:20
    17. Re:Bad analogy by fuzzyfuzzyfungus · · Score: 1

      Given that these languages are (primarily, obviously anything Turing-complete can be turned to the same purposes as anything else, if somebody feels like it) used for statistics work, I'd be inclined to wonder whether that is the easiest or best way to go about it:

      If something is already implemented in R, and you want to more or less blindly feed it a new target, or re-run it to see how it works, R was apparently not broken enough to stop it, because it's already done.

      If you want to implement some, currently unsupported, aspect of statistics in Julia, with API or binary compatibility with R not a consideration, you could potentially end up in a situation where being reasonably sure that your translated version works, does what it is supposed to, and is vaguely human readable might take longer or be more difficult, or both, than starting with the math you wish to implement and building something non-broken from scratch.

    18. Re:Bad analogy by K.+S.+Kyosuke · · Score: 1

      Julia isn't strictly numerical. It sure as hell isn't "primarily for statistics work". It has a numerical bent, but so far I haven't seen any limitation in the sense that something general and non-numeric in it would be possible (in the sense of Turing completeness) but impractical. Indeed, the very fact that Julia has been designed with support for Lisp-like macros in mind should be a hint to you that perhaps expecting it to have at least generous facilities for manipulating and transforming syntactic trees and structures is not entirely unwarranted. The only obstacle I see is a dearth of parsing tools in (or for) it, but that is the least of my worries (the thought of OMeta immediately comes to my mind).

      --
      Ezekiel 23:20
    19. Re:Bad analogy by Anonymous Coward · · Score: 0

      He totes did, and so did GP. Reading is fundamental.

    20. Re:Bad analogy by Anonymous Coward · · Score: 1
      Then you just need to use a different package not meant for the people who want a quick single regression, e.g. use statsmodels. The example there is rather verbose, as some people prefer that, especially when learning, but you can easily do a less verbose version similar to R:

      import numpy as np
      import statsmodels.formula.api as smf
      results = smf.ols('Lottery ~ Literacy + np.log(Pop1831)', data=dat).fit()
      print(results.summary())

    21. Re:Bad analogy by professionalfurryele · · Score: 1

      You can. For me the primitives are a pain to work with compared with matplotlib. Not that anything I've used has good 2D primitives for plotting, just gradations of less crappy.

    22. Re:Bad analogy by Jmstuckman · · Score: 1

      Although much R package code is written in R, many of the important bits are living in FORTRAN libraries (many of which date back to the 1980s) which are linked into the packages.

    23. Re:Bad analogy by Anonymous Coward · · Score: 0

      R is an abysmal pile of shit

      Could be worse. Could be MATLAB.

    24. Re:Bad analogy by KingOfBLASH · · Score: 1

      OK. How about loading of data?

      In R I just type mydata <- read.table("./foo", header=TRUE, sep=",")

      What about messing around with models? Python you're either executing your script over and over again, or you're using it in interactive mode, and it gets a bit messy.

      I use R because it seems easiest to me. If you can make my life easier with Python, I'm all ears...

      What would you recommend?

    25. Re:Bad analogy by kukulcan · · Score: 1

      Come on, you haven't looked at Python for this kind of work have you?

      R has a lot more libraries, that cover specific needs, but the basics (and more) are all covered in Python, and are extremely easy to use.

      You need: Python + Numpy (or Scipy) + statsmodels + pandas.
      Or get a pre-packaged distribution that has all that, like Anaconda or Enthought (haven't used them).

      As for your question, you need Pandas, which is similar to R's data tables: http://pandas.pydata.org/

      import pandas as pd
      df = pd.read_csv('foo.csv', header=[0], sep=',')

      Pretty similar.

    26. Re:Bad analogy by LetterRip · · Score: 1

      import pandas as pd

      df = pd.read_csv("./foo", header=0, sep=',')

    27. Re:Bad analogy by LetterRip · · Score: 1

      Look at ipython notebook,

      it is a lot like the workflow for mathematica or R.

      Look at pandas + scipy stack.

      pandas replicates the functionality of R dataframes but integrates many features found in external R packages in a beautiful and intuitive way.

      notebook + scipy stack (scipy, numpy, sklearn, matplotlib w seaborn or use ggplot if you prefer) + pandas is enough to largely eliminate the need for R for most people doing machine learning or statistical analysis (there are still occasional times when I need something from R but it is rare).

    28. Re:Bad analogy by CadentOrange · · Score: 1

      With judicious use of semicolons, you could fit all that into a single line.

      You might have to scroll horizontally a lot, but it's still a single line!

    29. Re:Bad analogy by HiThere · · Score: 1

      Julia is an excellent design for a specific range of problems. I was considering using it for a couple of days, so I looked over the design. It is good for handling matrices of identical types of element doing the same thing on each entry. This is a pretty broad class of problem, but it's far from descriptive of all problems, and even within that class I'm skeptical that they will ever be able to optimise some of the operations.

      OTOH, I must admit that I didn't even consider using R. I wasn't considering a statistical problem. Julia is plausibly optimal over a much larger portion of problem space than is R. (Note that I'm not talking about the current implementation of either, but rather the implementation as it approaches its design limitations.)

      That said, for the areas where R is more nearly optimal, I would expect it to be shorter and clearer. This is a side effect of its narrower focus of coverage of problem space.

      --

      I think we've pushed this "anyone can grow up to be president" thing too far.
    30. Re:Bad analogy by Anonymous Coward · · Score: 0

      Julia is homoiconic. You can write a metacircular evaluator in it.

    31. Re:Bad analogy by Anonymous Coward · · Score: 0

      They are both quite capable of doing the "basics." If all you want to do is simple stuff, with a pretty expansive definition of simple, then it doesn't matter which one you use, as both have options for simple syntax (including Python packages that use essentially R syntax). Pick whichever one you are familiar with, or like more, or is just already installed. If you are doing something more advanced, then like nearly any general programming problem, it is going to come down to which one has the best library for your specific problem already made, and sufficiently documented. Otherwise, neither language is going to make your life magically easier.

      Needing something like loading various formats from basic separated text files, to other IDE/work environments (another already recommended ipython notebook) is something any common language is going to have. What languages can't you open a csv file in one line?
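
      To make the one-line claim concrete, here is a minimal sketch using only Python's standard csv module. The file contents are made up and held in an in-memory buffer standing in for a file such as "./foo", so the example is self-contained:

```python
import csv
import io

# Hypothetical file contents standing in for "./foo"; an in-memory buffer
# keeps the sketch self-contained.
text = "x,y\n1,2.5\n2,4.0\n3,6.5\n"

# The one-line load: each row becomes a dict keyed by the header names.
rows = list(csv.DictReader(io.StringIO(text)))

# csv yields strings, so numeric columns still need an explicit conversion.
ys = [float(row["y"]) for row in rows]
print(rows[0], ys)
```

      The load really is one line; what differs between environments is how much of the follow-up work (type inference, missing values) is done for you.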

    32. Re:Bad analogy by Beck_Neard · · Score: 1

      > It is good for handling matrices of identical types of element doing the same thing on each entry.

      Actually, that's MATLAB. Julia does not give matrices any special treatment - it has a type system that is rich enough that you can define an entire matrix domain-specific-language inside it (which is exactly what they did - Julia's matrix operations are defined entirely in Julia itself, yet are still blazing fast because they call external libraries). http://julia.readthedocs.org/e...

      Plus, whereas MATLAB kind of forces you to do vector operations, in Julia you can do it either way. You can do it the C way if you like, or the MATLAB way. The 'right' way depends on what you want to achieve (code clarity, performance, etc.)

      The benefit of this is that you can define a language for dealing with data (similar to R) inside Julia. That's what the DataFrames.jl package provides.

      --
      A fool and his hard drive are soon parted.
    33. Re:Bad analogy by DexterIsADog · · Score: 1

      Yeah? Go back and read it again. But sober this time. Put the cherry schnapps back in mom's liquor cabinet.

    34. Re:Bad analogy by Anonymous Coward · · Score: 0

      If only there was an integration package for using Julia in R like Rcpp (which is extremely good), then people who depend on R's large set of libraries and have an existing code base could start migrating.

    35. Re:Bad analogy by arglebargle_xiv · · Score: 1

      An Argentinian chef is more likely to make great sushi than a Japanese automotive engineer.

      There's an even closer-to-food analogy for this: If you want a good Italian pizza, get a Greek to make it. I have no idea why this works, but the best Italian pizzas always tend to be made by someone called Nikos or Costas.

    36. Re:Bad analogy by I'm+New+Around+Here · · Score: 2

      Yeah? Go back and read it again.

      OK. FW said:

      An Argentinian chef is more likely to make great sushi than a Japanese automotive engineer.

      You generally want to use programming languages designed by experienced programmers (even better, experienced language designers) who work closely with subject matter experts. Left to their own devices, experts are likely to get a lot of things wrong, and if the language is sufficiently popular, you are stuck with their mistakes for a long time to come.

      Upon rereading it, I still don't see an analogy. So let's break it down and verify.

      An Argentinian chef is more likely to make great sushi than a Japanese automotive engineer.

      Not an analogy. A statement of fact, with some supposition. It is possible Japanese auto engineers are all required to be master sushi artists, but unlikely. Still not analogy.

      You generally want to use programming languages designed by experienced programmers

      Again, not an analogy. A statement of personal opinion, which may or may not be factually accurate.

      (even better, experienced language designers)

      More personal opinion. But it certainly makes sense.

      who work closely with subject matter experts.

      Conclusion of personal opinion.
      Still not an analogy.

      Left to their own devices, experts are likely to get a lot of things wrong,

      Again, supposition used to bolster an argument that supports FW's opinion mentioned previously.
      Still not an analogy.

      and if the language is sufficiently popular, you are stuck with their mistakes for a long time to come.

      A final statement of fact. It does assume that mistakes are great enough they can't be fixed in a simple revision, but not so severe that they render the programming language unusable.
      That is still not an analogy.

      But sober this time. Put the cherry schnapps back in mom's liquor cabinet.

      I haven't touched your mother's liquor in several months. Besides, she prefers rum when we do body shots.

      --
      If you think I voted for Trump because of this post, you're wrong. I voted for Dr. Jill Stein of the Green Party. Again.
    37. Re:Bad analogy by DexterIsADog · · Score: 0

      Dude, you're an idiot. In more ways than I'm inclined to enumerate.

      Take your meds, and go to bed.

    38. Re:Bad analogy by StripedCow · · Score: 1

      How can a non-functional language be _the_ platform for mathematical computing?

      --
      If Pandora's box is destined to be opened, *I* want to be the one to open it.
    39. Re:Bad analogy by StripedCow · · Score: 1

      (While you may be right, following slashdot conventions the analogy was intended as a car-analogy, not a food-analogy.)

      --
      If Pandora's box is destined to be opened, *I* want to be the one to open it.
    40. Re:Bad analogy by I'm+New+Around+Here · · Score: 1

      I won't dispute that, but I still know what an analogy is.

      --
      If you think I voted for Trump because of this post, you're wrong. I voted for Dr. Jill Stein of the Green Party. Again.
    41. Re:Bad analogy by Anonymous Coward · · Score: 0

      An Argentinian chef is more likely to make great sushi than a Japanese automotive engineer. Not an analogy. A statement of fact, with some supposition. It is possible Japanese auto engineers are all required to be master sushi artists, but unlikely. Still not analogy.

      Most analogies are statements of fact... that is the whole point, to translate a problem that is complex or obscure into a situation more familiar or approachable, resulting in an analogous statement of fact that illustrates something in the original problem. It might not be the greatest analogy, and others might make arguments that it fails to connect to the original problem or that it isn't even correct in the analogous case, but that doesn't change that it is an analogy.

      When discussing networking and difference between latency and bandwidth, someone inevitably says. "But a semi-truck carries freight a lot faster than a race car." It is a statement of fact, but clearly meant to be an analogy. Unless you think people like to drop random, irrelevant statements of facts as part of a reasonable argument.

    42. Re:Bad analogy by I'm+New+Around+Here · · Score: 1

      An Argentinian chef is more likely to make great sushi than a Japanese automotive engineer.

      Not an analogy. A statement of fact, with some supposition. It is possible Japanese auto engineers are all required to be master sushi artists, but unlikely. Still not analogy.

      Most analogies are statements of fact... that is the whole point, to translate a problem that is complex or obscure into a situation more familiar or approachable, resulting in an analogous statement of fact that illustrates something in the original problem. It might not be the greatest analogy, and others might make arguments that it fails to connect to the original problem or that it isn't even correct in the analogous case, but that doesn't change that it is an analogy.

      Yes, you are correct in all this. But that isn't what the OP did, it is what the story submitter did. Theodp wrote this analogy.

      An Argentinian chef, say, who wants to make Japanese sushi may get all the ingredients right, but likely it just won't work out quite the same. Similarly, a Pythonista could certainly cook up some code for some statistical procedure by reading a statistics book, but it wouldn't be quite same.

      The OP Florian Weimer replied that the analogy was flawed, and gave a reason why. I don't see his response as an analogy in itself.

      When discussing networking and difference between latency and bandwidth, someone inevitably says. "But a semi-truck carries freight a lot faster than a race car." It is a statement of fact, but clearly meant to be an analogy. Unless you think people like to drop random, irrelevant statements of facts as part of a reasonable argument.

      Again, this is basically what the summary did. Florian Weimer simply responded to it.

      Maybe I'm wrong on whether the post was just a response to an analogy, or a furtherance of the analogy. It seems to me though that Florian would have had to re-write the original analogy, incorporating his version, for his line to be an analogy itself.

      Anyhow, thanks for taking the time to discuss this.

      --
      If you think I voted for Trump because of this post, you're wrong. I voted for Dr. Jill Stein of the Green Party. Again.
    43. Re:Bad analogy by HiThere · · Score: 1

      No, that's Julia, at least for the portion of problem space that I was considering. It allows parallel execution over matrices of many kinds of operation. Perhaps it's also Mathlab. I don't know and have never used Mathlab. For my purposes, however, its parallel execution capabilities were not flexible enough. I've ended up in the process of writing a multi-threaded program in D. (In particular I'm using std.concurrency.) I would have preferred a language where more libraries were easily accessible, though in every other way D appears to be a superior language for my part of the problem space.

      --

      I think we've pushed this "anyone can grow up to be president" thing too far.
    44. Re:Bad analogy by Anonymous Coward · · Score: 0

      this is in python, 3 lines if you have the data in a pandas DataFrame

      http://statsmodels.sourceforge.net/devel/examples/notebooks/generated/formulas.html starting at line [8]

      >>> from statsmodels.formula.api import ols  # df is assumed to be an existing pandas DataFrame
      >>> mod = ols('Lottery ~ Literacy + Wealth + Region', data=df)
      >>> res = mod.fit()
      >>> print(res.summary())

      results omitted because of "Filter error: Please use fewer 'junk' characters."

    45. Re:Bad analogy by Beck_Neard · · Score: 1

      A tender suggestion: perhaps you should consider learning Matlab (not mathlab) and/or GNU Octave (a similar language to Matlab, but open source); if nothing else they would give you an idea of what's possible in technical computing (e.g. virtually automatic parallelization of code if you write stuff correctly). For most problems there's no need to get knee-deep in OS parallelization primitives hell.

      --
      A fool and his hard drive are soon parted.
    46. Re:Bad analogy by romons · · Score: 1

      Left to their own devices, experts are likely to get a lot of things wrong

      Like arrays that start at 1. Grrrr. Julia got it wrong too. I mean, who wants to say i-1 everywhere? Even Wirth got this wrong. So did maxima.

      Python got it right, and got lots of other things right too. They just missed the fact that 1/2 > 0. Small price to pay.
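
      A quick sketch of both points, assuming the "1/2 > 0" gripe is about Python 2's integer division (Python 3 later changed `/` to true division). The variable names are only for illustration.

```python
# Python 3 semantics. In Python 2, 1/2 evaluated to 0 for ints unless you
# wrote `from __future__ import division`.
half = 1 / 2          # true division
floor_half = 1 // 2   # floor division, i.e. what Python 2 did for ints

items = ["a", "b", "c"]
first = items[0]              # zero-based: the first element is index 0
last = items[len(items) - 1]  # the one spot where n - 1 still shows up
```

      With zero-based indexing, `items[i]` inside a `range(len(items))` loop needs no `i-1` adjustment.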

      --
      Go to Heaven for the climate, Hell for the company -- Mark Twain
    47. Re:Bad analogy by retchdog · · Score: 1

      a lot of it, but many packages use C routines for tight loops and such.

      frankly i wouldn't trust translated numerical code from a parser without a lot of auditing, especially since R is a clusterfuck of kludges and julia is a moving target under development. there are so many R packages (currently 5,579) written by so many people without much development experience that even a minimal amount of manpowered auditing/revision would be too much, especially since many of the packages are under active development

      a better solution would probably be to have julia provide an R emulator. the speed and memory hits will be substantial, but R is already crappy for speed and memory.

      --
      "They were pure niggers." – Noam Chomsky
    48. Re:Bad analogy by retchdog · · Score: 1

      i couldn't agree more.

      --
      "They were pure niggers." – Noam Chomsky
    49. Re:Bad analogy by HiThere · · Score: 1

      OK. I've looked at Octave. It doesn't seem to like utf8. It makes it difficult to determine the character class of a character within a string. I'll grant that you CAN, but it appears quite difficult to split a string in the way that I need, which includes keeping the "split-at" boundary values. Possibly this is a result of its inherent capability for parallelism. Mailing list comments as late as 2014 indicate that Unicode characters in comments cause problems.

      Etc. It is heavily biased towards numerical and statistical manipulations. That's not what I'm doing. A lot of what I'm doing is inherently serial, but many of the serial steps can be done in parallel. D's std.concurrency seems a MUCH better match than does Octave. And it handles unicode quite well. And it allows "easy" manipulation of strings of utf8 chars, including determining their general character class (in unicode terminology).

      Even with that said, I would like something a bit more dynamic. And there's a reason that I put "easy" in quotes. I wish the Linux version of Objective C were more active and better documented. That might give D some competition. Or that Python were better at handling parallel execution. For some reason languages that SHOULD be better at parallel execution seem to miss the boat. Go should be such a good language, but it doesn't actually implement concurrency. Racket Scheme has all the needed features to program concurrently...but it doesn't implement it. Etc. (Is there a Scheme that actually implements concurrency? Does it facilitate handling of utf8 strings [including determination of the general character class of a character]? Is there a decent way to generate program documentation [for developers, not for end users]?) I'd even consider Vala, except that it doesn't look like it will ever get out of beta.

      The highly specialized languages always make assumptions about what I'm doing that aren't accurate, and this has, so far, always rendered them unusable.

      --

      I think we've pushed this "anyone can grow up to be president" thing too far.
  4. true, but not really because of R itself by Trepidity · · Score: 5, Insightful

    R itself is okay, but even as a long-time user I don't think the language or environment itself is all that much to brag about. What makes it great for statistics is just that statisticians use it, which means that a lot of the packages are written by statisticians. That makes a big difference: recent papers often have R implementations, standard problems have well-maintained R packages for them with all the bells and whistles, etc. As Matloff notes, this means they often have everything that statisticians are looking for, while straightforward textbook implementations you often find in other languages often aren't nearly as thorough in how they handle the statistical models, or only handle some special cases (though there are some really good packages in other languages, just not as many).

    But I don't think that has much to do with R itself being uniquely suited to statisticians. It's used for historical reasons: Bell Labs S was influential in the field way back when nothing like Python or Julia existed, and statisticians started using it because it was a lot nicer than Fortran, which is what other areas of science mostly used back then. GNU R is essentially a free-software workalike for Bell's S, and it's kept most of the community on board through a mixture of existing packages, familiarity, and inertia.

    1. Re:true, but not really because of R itself by Anonymous Coward · · Score: 1

      I'm not a heavy R user, but I do appreciate how functions such as lapply are so easily scaled. I'm a peripheral stats user with some minor programming experience, but I was even able to use the NVidia GPU libraries and then R on Hadoop with minor code changes. For that, I'm happy.

    2. Re:true, but not really because of R itself by jythie · · Score: 3, Interesting

      *nods* who uses a language has more impact on its usefulness than anything inherent to the language. Libraries, support community, ease of hiring people who both know the language and have domain-specific skills are much more important than what kind of sugar the language has.

    3. Re:true, but not really because of R itself by HuguesT · · Score: 3, Interesting

      R has some pretty unique graphing packages. Nothing that I know of matches the way you can do 2D and 3D plots in R. Not Python, not Gnuplot, not Julia, not Matlab, not Excel, not Mathematica, nothing.

    4. Re:true, but not really because of R itself by Trepidity · · Score: 2

      Around here Python's matplotlib has been making some inroads in the plotting category, even among people who use R for the actual data analysis, but it's admittedly not as featureful as the whole suite of R plotting packages.

    5. Re:true, but not really because of R itself by Anonymous Coward · · Score: 0

      Yes, the advantage of R is the plotting.

    6. Re:true, but not really because of R itself by shrewdsheep · · Score: 1

      I definitely concur. The depth of implementations of statistical methods dwarfs that of other languages (with Matlab coming closest). Two more aspects to add: R can be used to program in functional style. Together with being a vectorized language this can make programs more compact while still readable. This was what made me stick with R, which is now one of my preferred languages. There is also the micro-DSL called formulas in R. Unless another programming language implements something similar, R will always be superior in specifying and working with statistical models. In particular, implementing new statistical methods is made much simpler using this machinery (plus tailor-made data handling: merging, missing data, etc.).

    7. Re:true, but not really because of R itself by Anonymous Coward · · Score: 0

      C is shitty, but used by OSs. Other compiled language idioms would be better from a security perspective (not putting your return pointer on the fucking parameter stack, for one), however not only do OS devs use C, the hardware itself implements the C way of function stack-frames, and even has opcodes specifically for management thereof.

      Thus, in a way you are correct. However, in another way, given that hardware can influence software paradigms (and vice versa), you are also wrong.

    8. Re:true, but not really because of R itself by StripedCow · · Score: 1

      A true "statistical" programming language allows the user to define statistical processes in the language and then compute their statistical properties.
      For example:

      x = random() /* a random number between 0 and 1, uniformly distributed */
      y = x*x
      print(E(y)) /* print the expected value of y */

      R is nowhere near that.
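
      For contrast, here is roughly what you get in a general-purpose language today: a Monte Carlo estimate rather than the symbolic E(y) asked for above. This is a sketch only; `expected_value` and its parameters are made-up names, not any library's API. The exact answer for this example is the integral of x^2 over [0, 1], which is 1/3.

```python
import random

def expected_value(sample_fn, n=200_000, seed=0):
    """Estimate E[sample_fn] by averaging n independent draws."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return sum(sample_fn(rng) for _ in range(n)) / n

def y(rng):
    x = rng.random()  # uniform on [0, 1)
    return x * x

estimate = expected_value(y)  # approximates 1/3, does not derive it
```

      The estimate only approaches 1/3 as n grows; a language with first-class random variables would return the exact value.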

      --
      If Pandora's box is destined to be opened, *I* want to be the one to open it.
    9. Re:true, but not really because of R itself by Anonymous Coward · · Score: 0

      I'm sure such constructs exist in other languages too, but there are packages in Python that follow that convention. It is particularly useful in some MCMC work, where a variable can be defined as some random distribution (continuous or discrete), and math operations automatically produce a new distribution.

  5. If R was written by statisticians... by Anonymous Coward · · Score: 0

    Why is there no "r" in statistics?

  6. truth, lies, and statistics by Anonymous Coward · · Score: 0

    " And R is Statistically Correct" doesn't mean anything.

    1. Re:truth, lies, and statistics by Anonymous Coward · · Score: 0

      Yes it does. They are referring to its functional implementations being mathematically correct stemming from the fact there is extensive peer review in the statistics community of the freely available libraries. This is not so much the case with other systems. Hence Statistically Correct. As someone before mentioned, the language is a function of its users.

  7. Meh by hyfe · · Score: 5, Informative
    Statistics major who programmed Python professionally for a few years (and have a MsC in Comp.Sci) ...

    ... this is all posturing and drama, but good on Prof. Norm Matloff for getting some attention. R is rather useful and has quite a few extremely useful features as a language, including some of the best list/indices handling I've seen anywhere. It has excellent libraries for statistical work, but it also has quite a few of the most downright abhorrent language decisions I've seen anywhere ever, with the amazingly poor string handling (for a scripted language) topping that list ( http://www.burns-stat.com/page... )

    Python, C, Mathematica and R all have different strengths for mathematical work / numerical calculations though, and using the best tool for the job is what it's about. As always, what the best tool actually is, is also rather subjective, as which tool will best solve a specific task is always dependent on your skill with the different tools. I do agree with the professor though: even though there's quite a bit of Python hype (python + scipy/matplotlib is amazing), R is not being replaced anytime soon. It's too good at what it's good at.

    --
    "" How about taking the safety labels off everything, and let the stupidity-problem solve itself? """
    1. Re:Meh by Pseudonym · · Score: 1

      [...] using the best tool for the job is what it's about.

      Ah, but from the point of view of a computer scientist, the "best tool for the job" isn't necessarily the best tool that currently exists. R is a fabulous set of well-documented algorithms linked together with a bizarre, poorly-specified and inadequately-documented language with a flaky, abstraction-leaking, poorly-performing implementation.

      I think it's great that R is written by statisticians for statisticians, and that statisticians find it useful. But you shouldn't then be surprised if it doesn't do parallelism, even though many statistical problems on large data sets should intuitively be easy to do in a parallel kind of way.

      Of course Python doesn't beat R; it doesn't even compete in the same space. (Oh, and its formal semantics are almost as bad as that of R.)

      Of course Julia doesn't beat R. It was designed to be a modern SISAL (which in turn was designed to be a modern Fortran). Again, it doesn't pretend to do the same thing as R.

      The thing is, the computer science community knows it can do better than R. The problem is convincing the statisticians that a better R is in their best interests.

      --
      sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
    2. Re:Meh by PerlPunk · · Score: 1

      I agree with the above about R. But as regards reliability, I would prefer SAS to R, even though I hate SAS even more than R. Yes, R has lots and lots of features, good documentation, better libraries than any other out there. But sometimes I find discrepancies between R and SAS in performing the same operations, and when I test which is right SAS always seems to win. That is to say that R as an open source platform has the same problems open source platforms tend to have -- buggy code, sometimes inconsistent or barely-there documentation. Vendor-supported software like SAS does have a quality advantage.

      Also, Matlab / Octave rocks when it comes to matrix manipulation. It beats R hands down in working with matrices.

  8. I dislike Python by Anonymous Coward · · Score: 0

    because it is an inferior mish-mash for an up-start generation which was never taught the, "In the end, everything looks like LISP," maxim. And its requirement for particular whitespace offends me as someone who has spent the last decade working with accessibility groups.

    I'm not really sure I see where R fits, though. For basic statistical work, SPSS is good. For advanced statistical work, surely you'd want a general purpose language with cross-language libraries?

    1. Re:I dislike Python by jythie · · Score: 3, Insightful

      Hrm. I never thought about the whitespace requirements in python from an accessibility perspective.

    2. Re:I dislike Python by Pinky's+Brain · · Score: 1

      In the end most people will still use anything but LISP.

    3. Re:I dislike Python by KingOfBLASH · · Score: 3, Interesting

      Believe it or not, most statisticians are not programming wizards.

      Most stats guys use R, matlab, mathematica, or something similar. Even if it takes days to run a program that would take 20 minutes in C. Sort of like how the business guys will use VBA when they need anything, because that's what they know.

      Languages like R are used because they are accessible. And once they reach a critical mass, everyone learns them in a field.

      Sort of like how Fortran just won't die.

    4. Re:I dislike Python by Anonymous Coward · · Score: 0

      SPSS had a bug in repeated measures ANOVA they failed to correct for >20 years. If you can't see the code you can't trust it...

    5. Re:I dislike Python by pla · · Score: 1

      because it is an inferior mish-mash for an up-start generation which was never taught the, "In the end, everything looks like LISP," maxim.

      I have to suspect you as trolling here, because although I do indeed know Lisp (and Scheme, and Tcl) - Very, very little of my code ends up looking anything like Lisp.


      And its requirement for particular whitespace offends me as someone who has spent the last decade working with accessibility groups.

      I will fully agree with you that required whitespace offends me, but that has fuck-all to do with accessibility. Any programming language that doesn't let you write the entire program on one line with zero whitespace (not that you ever should do that, Perl notwithstanding) has some serious damage.


      I'm not really sure I see where R fits, though. For basic statistical work, SPSS is good. For advanced statistical work, surely you'd want a general purpose language with cross-language libraries?

      Statisticians != Programmers. TFA's rant very much looks like the newbie programmer after mastering his first language, who then tries to apply that particular hammer to every problem he comes across. "Damnit, that screw will get pounded in! Yes, I can chop through this 2x4 by striking it repeatedly with the claw-end! Yes, I can trick pure C into supporting something vaguely like an associative array!"

      Good programmers will eventually realize that the job defines the tool to use. Poor programmers will stay trapped forever in an interpreted language with garbage collection. And Statisticians will go to their grave believing that whatever language they learn first counts as the best choice ever; Physicists have the same problem, thus you often see the most powerful supercomputers on the planet running... Fortran.

    6. Re:I dislike Python by fuzzyfuzzyfungus · · Score: 1

      Hrm. I never thought about the whitespace requirements in python from an accessibility perspective.

      I know that Python's approach to whitespace is very...polarizing; but I've always wondered how much it would cause trouble either for people who really loathe it, or for specialized situations that tend to crop up under 'accessibility' (where the path from text file to user is likely going through one or more atypical transformations, anywhere from simple contrast bumps up through text to speech or the like).

      Given that the whitespace has to have an unambiguous meaning to the python interpreter, your editor could presumably convert, in either direction, any notation you desire, so long as it covers the same possible meanings (and, ideally, doesn't clash with characters python uses to mean something else, since then it'd have to convert those as well, potentially sending you chasing down the road to something that looks utterly different).

      It's not as though what you see on the screen bears much resemblance to the actual underlying sequence of bits.
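
      As a rough illustration of the "editor could convert it" idea: Python's own `tokenize` module already reports where indentation opens and closes, so a tool can re-express it in some other notation. The brace markers and the `explicit_blocks` name below are invented for this sketch, and it assumes one statement per physical line.

```python
import io
import tokenize

def explicit_blocks(source):
    """List each statement, with "{" / "}" wherever the tokenizer reports
    that an indented block opens or closes."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.INDENT:
            out.append("{")
        elif tok.type == tokenize.DEDENT:
            out.append("}")
        elif tok.type == tokenize.NEWLINE:
            # NEWLINE ends a logical line; tok.line is the physical line it is on
            out.append(tok.line.strip())
    return out

sample = "def f(x):\n    if x:\n        return 1\n    return 0\n"
blocks = explicit_blocks(sample)
```

      A screen reader front end could speak or display those markers instead of counting spaces; going the other way (markers back to indentation) is the same mapping in reverse.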

    7. Re:I dislike Python by Taxman415a · · Score: 1

      What's the accessibility problem with Python's whitespace? I don't code, but my screen reader reads space, tab, and newline to me just fine. I use VoiceOver.

    8. Re:I dislike Python by BitterOak · · Score: 1

      I'm not really sure I see where R fits, though. For basic statistical work, SPSS is good.

      It's good if you have the money. R is free, while SPSS is fairly expensive, as is its main competitor SAS. I see R as competing not with general purpose languages like Python, but rather with commercial statistics packages like SPSS and SAS. While it may have more of a learning curve than these packages, it is free software, which makes it very attractive for many users.

      --
      If I can be modded down for being a troll, can I be modded up for being an orc, or a balrog?
    9. Re:I dislike Python by phantomfive · · Score: 1

      I have to suspect you as trolling here, because although I do indeed know Lisp (and Scheme, and Tcl) - Very, very little of my code ends up looking anything like Lisp.

      You should try making your code look functional sometime (that is, write it as functions with no side-effects), you might find you have fewer bugs.

      I'm really interested why you combined Tcl with Lisp and Scheme though, those languages don't seem to have much together

      --
      "First they came for the slanderers and i said nothing."
    10. Re:I dislike Python by Anonymous Coward · · Score: 0

      You should try making your code look functional sometime (that is, write it as functions with no side-effects), you might find you have fewer bugs.

      In the real world there are plenty of programmers who have inherited large codebases written in C, PHP, ASP.Net, COBOL, Fortran, etc. Good luck convincing any of them it's a good idea to start focusing on making their code look functional with no side-effects.

      If you're a good programmer and are writing the whole thing all by yourself then I'm sure Lisp is great for that.

      Top programmers can pick languages that are great for the code they write. Average programmers should pick languages that are great for all the code they DON'T have to write, document and support/debug ;).

      If all those libraries, frameworks are written by above average programmers, it means less code written by crappy programmers like me, which means fewer bugs.

    11. Re:I dislike Python by phantomfive · · Score: 1

      Good luck convincing any of them it's a good idea to start focusing on making their code look functional with no side-effects.

      Thanks.

      If you're a good programmer and are writing the whole thing all by yourself then I'm sure Lisp is great for that.

      If you're a great programmer, then you will write great code in any language. The language is less important than the skill of the person using it.

      Top programmers can pick languages that are great for the code they write. Average programmers should pick languages that are great for all the code they DON'T have to write, document and support/debug ;). If all those libraries, frameworks are written by above average programmers, it means less code written by crappy programmers like me, which means fewer bugs.

      If you are writing new code in an existing project, it is up to you whether you want to write functions without side-effects or not. Sometimes it's hard because you have to call a function that has side effects, but in that case you can communicate to anyone who might call your code what the side effects might be (either use comments or name the function in a way that the side-effects are clear, but let them know somehow).

      No one gets to choose the language they will write in unless they are the ones starting the project.
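
      A small sketch of the naming advice above, in Python for the sake of argument; the function names are hypothetical. One version has no side effects, the other says in its name that it mutates its argument.

```python
def normalized(values):
    """No side effects: returns a new list and leaves `values` untouched."""
    total = sum(values)
    return [v / total for v in values]

def normalize_in_place(values):
    """Side effect, advertised in the name: overwrites `values`."""
    total = sum(values)
    for i, v in enumerate(values):
        values[i] = v / total

data = [1, 1, 2]
shares = normalized(data)   # data is still [1, 1, 2] afterwards
normalize_in_place(data)    # now data itself has changed
```

      Either style can be dropped into an existing codebase one function at a time; nothing about it needs Lisp.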

      --
      "First they came for the slanderers and i said nothing."
    12. Re:I dislike Python by pla · · Score: 1

      I'm really interested why you combined Tcl with Lisp and Scheme though, those languages don't seem to have much together

      Although you can force it to behave imperatively, Tcl primarily counts as a functional language (though I have to agree, a bit of an oddball due to its object oriented side).

    13. Re:I dislike Python by countach · · Score: 1

      "If you're a great programmer, then you will write great code in any language. The language is less important than the skill of the person using it."

      This is commonly claimed, but I'm damned if I could ever do anything elegant in perl. I've done lovely stuff in Java, but none of it is as great as what I did in scheme. The right language can make a good programmer better and a better programmer great.

    14. Re:I dislike Python by Pseudonym · · Score: 1

      Statisticians != Programmers.

      Yes, Dr Statistician, I know you're not a programmer, but that thing you're writing is a program, and you will use revision control or you are not working on my project.

      Woah, sorry, had a flashback there.

      --
      sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
    15. Re:I dislike Python by Pseudonym · · Score: 1

      If you're a great programmer, then you will write great code in any language.

      If you're a truly great programmer, then you will refuse to write any code in certain languages.

      --
      sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
    16. Re:I dislike Python by phantomfive · · Score: 1

      Like Python?

      --
      "First they came for the slanderers and i said nothing."
    17. Re:I dislike Python by phantomfive · · Score: 1

      This is commonly claimed, but I'm damned if I could ever do anything elegant in perl.

      You're not a great programmer.

      --
      "First they came for the slanderers and i said nothing."
    18. Re:I dislike Python by Pseudonym · · Score: 1

      You said it, not me.

      --
      sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
  9. Hah by Anonymous Coward · · Score: 0

    His posts are perfectly 'random'. R itself is written in C. Python is also written in C. I can't see why one can get much better statistical correctness in R than what comes from its underlying implementation - in C.

  10. A joke on the subject by kav2k · · Score: 4, Funny

    A joke I've read recently:

    I'm not sure if "R is written by statisticians, for statisticians" is a good thing e.g. "stadiums are built by footballers, for footballers"

    1. Re:A joke on the subject by Anonymous Coward · · Score: 0

      Yeah, not sure I want to hop on a plane made by passengers for passengers...

    2. Re:A joke on the subject by Anonymous Coward · · Score: 0

      Come on - some basic Venn diagrams will add clarity here...

  11. Don't throw down R if you won't talk SAS by Anonymous Coward · · Score: 0

    R may be written for statisticians, but is rightly criticized for lacking the validation that SAS has (which Python et al also lack). There's a good discussion here on the subject. And for what it's worth, R and SAS both lack the tools to easily hook into other systems, which really makes them good ONLY for ad hoc statistics and reports.

    1. Re:Don't throw down R if you won't talk SAS by Trepidity · · Score: 1

      You can't talk SAS unless you've got a big bank account, though. A one-year, individual (single-desktop) license costs upwards of $5,000, which makes it a non-starter for a lot of people. Also, it's not open source.

    2. Re:Don't throw down R if you won't talk SAS by retchdog · · Score: 1

      yes, R is written for people who know what they are doing.

      --
      "They were pure niggers." – Noam Chomsky
    3. Re:Don't throw down R if you won't talk SAS by Pseudonym · · Score: 1

      On the contrary, R is written for people who don't know they're programming. That's why it's such a pain to write maintainable programs in.

      --
      sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
  12. R is for statisticians... by Anonymous Coward · · Score: 0

    and pirates.

  13. so different by bucket_brigade · · Score: 1

    Yeah R is so different from Python, I mean everything is the same but not quite and I totally have a point and not just bullshitting because like Japanese sushi and beef Argentinian soup, brocolli.

    1. Re:so different by Anonymous Coward · · Score: 0

      I know what you mean. It really is the broccoli that sets it apart from the rest, tomato cucumber.

      Sausage & potato chutney cracked solder joints!

  14. Who really f-ing cares? by nurb432 · · Score: 3, Insightful

    Use the right tool for the job and stop bashing other tools that were designed for different jobs .

    --
    ---- Booth was a patriot ----
    1. Re:Who really f-ing cares? by Anonymous Coward · · Score: 0

      Who really f-ing cares? People who want to write tools that make people more productive care, because these kinds of articles highlight the advantages and shortcomings of using the tool. Also, discovering the 'right tool for the job' may not be straightforward, so having the aforementioned facts to hand can help.

  15. yawn by Anonymous Coward · · Score: 0

    So a special purpose statistics language beats out python - a general purpose language with lots of varying libraries (its real strength...)

    Thats news? or worthy of some retards crowing ?

    I never heard of R before and as it is statistics I see no need to know much further.

    Next language or equiv I might look at is one that simulates Quantum Computing, as I want to see what applications that computing method is actually applicable to.

  16. Wrong site to note a challenge by Anonymous Coward · · Score: 0

    APK throws challenges to trolls here on hosts. Not a 1 manages to validly topple his points.

  17. Data mining by Anonymous Coward · · Score: 0

    If I had to do some intense statistical analysis, then R is probably a better choice.

    Now, if I have to get data via a feed or web page scraping, manipulate it, clean it, do some analysis, display it or feed it to another program, then Python makes all of that much easier and more maintainable.

    Back in the old days before all these smancy fancy tools, we used this red book called something like "Mathematical Programming in C" - in the snow; uphill both ways. It had the code and algorithms to implement all the stats, engineering, and god knows what - all in C.

    I don't see it on Amazon - or I got the title totally wrong.

    1. Re:Data mining by Anonymous Coward · · Score: 2, Interesting

      You got the title wrong.

      _Numerical Recipes in C_, by Press, W. et al

      http://www.amazon.com/Numerical-Recipes-Scientific-Computing-Edition/dp/0521431085

      IIRC there was also a _Numerical Recipes in FORTRAN_ as well.

      Also see http://www.nr.com/ . I think they only have a single book now called _Numerical Recipes_ and it is in its third edition.

    2. Re:Data mining by plopez · · Score: 1

      The Numerical Recipes series was much more than algorithms and code. It told you more about the how and *why* of an algorithm, and the when, as in when it should be used. The commentary alone is enough reason to buy them even if you never actually use any code from them.

      --
      putting the 'B' in LGBTQ+
    3. Re:Data mining by Anonymous Coward · · Score: 1

      The problem with it is that if you include code from that book, you can't publish your source code (at least the portions from that book). Those must be binary. My previous advisor was shocked when I pointed out that bit in the introduction of the book.

  18. I've found the problem... by Anonymous Coward · · Score: 0

    "R is written by statisticians, for statisticians"

    This is primarily why it will never gain widespread adoption, too. Most people aren't statisticians, and probably don't want to be.

    1. Re:I've found the problem... by gnupun · · Score: 2

      "R is written by statisticians, for statisticians"

      Does R invent new syntactic constructs that make it useful for handling/generating statistical data? So far I've not seen any new syntax in R that warrants creating a new programming language -- it's just a rehash of various scripting languages already available.

      From a programmer's perspective, R should just be an easy to use library that you can use in various languages like Python, Julia, Ruby, etc. There's no need to learn new syntax if it's not that new and useful.

    2. Re:I've found the problem... by HuguesT · · Score: 2

      How about the syntax for specifying a model?

      lmfit = lm( change ~ setting + effort )

    3. Re:I've found the problem... by Anonymous Coward · · Score: 0

      well to be fair (though i pretty much agree with you) the R syntax is a lot older than the 3 languages you specified.

    4. Re:I've found the problem... by fuzzyfuzzyfungus · · Score: 2

      Don't forget the influence of history: R wasn't designed for superiority to Python, Julia, and Ruby; but in large part to be a GNU-acceptable implementation of S, which may well have been designed for superiority to APL and FORTRAN; and which has existed since somewhere between the-before-time-when-the-gods-were-young and the start of the Second Trilobite War.

    5. Re:I've found the problem... by Anonymous Coward · · Score: 0

      More than one python package supports similar syntax, e.g. statsmodels, with the only minor issue being you need to put string quotes around it.

    6. Re:I've found the problem... by Anonymous Coward · · Score: 0

      import statsmodels.formula.api as smf
      model = smf.ols('change ~ setting + effort', data=mydata)
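
      For anyone unfamiliar with the formula notation, both the lm() call and the statsmodels call above describe the same linear model: change = b0 + b1*setting + b2*effort. Below is a rough standard-library sketch of what such a formula string encodes. It is not how lm() or statsmodels work internally; parse_formula, solve, ols and the mydata values are all made up for illustration, and only plain "y ~ a + b" formulas are handled.

```python
def parse_formula(formula):
    """Split 'y ~ a + b' into the response name and a list of predictor names."""
    response, rhs = formula.split('~')
    return response.strip(), [term.strip() for term in rhs.split('+')]

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(formula, data):
    """Least-squares fit with an intercept, via the normal equations X'X b = X'y."""
    response, terms = parse_formula(formula)
    y = data[response]
    X = [[1.0] + [data[t][i] for t in terms] for i in range(len(y))]
    k = len(terms) + 1
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return dict(zip(['Intercept'] + terms, solve(xtx, xty)))

# Made-up data constructed so that change = 1 + 2*setting + 3*effort exactly.
mydata = {
    'setting': [0.0, 1.0, 2.0, 3.0, 4.0],
    'effort': [1.0, 0.0, 2.0, 1.0, 3.0],
}
mydata['change'] = [1 + 2 * s + 3 * e
                    for s, e in zip(mydata['setting'], mydata['effort'])]

fit = ols('change ~ setting + effort', mydata)
print(fit)
```

      The fitted coefficients should come back close to 1, 2 and 3. The real packages add everything this leaves out: standard errors, factors, interactions, missing-data handling.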

  19. So basically R doesn't beat Python, or anything.. by Anonymous Coward · · Score: 0

    unless you're a statistician or interested in writing programs for high-accuracy statistics.

  20. With R... every day is Talk Like A Pirate Day! by TheRealHocusLocus · · Score: 3, Funny

    "Arrrr.... fix yar name 'R' while you may, maties!!"

    I may not have the belly for Deep Statistics but I do know about Internet Search noise levels. I remember trying to do research on WebDAV (believe me, there is such a thing) only to discover that folks discussing it invariably refer to it as 'dav'. Because saying "Distributed Authoring [and] Versioning" out loud makes you spit out your toothpick. Any attempt to search 'webdav' yielded only the sterile official pages, and attempts to search on 'dav' with other keywords brought up conversations from the community of Disabled American Veterans who also use the term in casual conversation, and have said an awful lot over the years. They occupied 'dav' first.

    Now you may think you can pull off a 'C' where Google seems to pick off relevant results if you combine it with any computery term, but it was not always so. It has taken an incredible saturation of C, and perhaps some special coded cases on Google's part, for this to come about.

    The success of Perl is due in some part to the ability of confused people to obtain help and advice about it merely by searching on its unique spelling.

    So the best way to push this R language is with a refit of the name. Go with the pirate theme, it will sell many more T-shirts than those of silly camels and pearls. But stake out a bit of Keyword Real Estate that presently has a relatively low population density.

    Google search result estimate counts, descending order,
    r --- 2,730,000,000
    ar --- 656,000,000
    arr --- 24,400,000
    arrrrrrrr --- 3,060,000
    arrrr --- 876,000
    aarr --- 638,000
    arrr --- 536,000
    arrrrr --- 405,000
    aaarrrrr --- 267,000
    arrrrrr --- 205,000
    arrrrrrr --- 129,000
    aarrr --- 107,000
    aarrrr --- 107,000
    aaarrr --- 56,600
    arrrrrrrrr --- 52,400

    Adding arrrs is not enough, since talking like a pirate is typically accomplished with a single 'a': the ar+ space is pretty well populated up to ar{5}. It looks like the best ratio is around a{3}r{3}. But even choosing the less-optimum and easier to type a{2}r{3}, by using 'aarrr' instead of 'r' you have improved the signal to noise ratio by a factor of twenty-five thousand.
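
    The "factor of twenty-five thousand" can be checked from the counts in the list above. A small sketch, assuming (as the comment does) that the raw result count is a fair proxy for search noise; the hits dictionary and the noise_reduction helper are just illustrative names.

```python
# Google result estimates as quoted in the list above.
hits = {
    'r': 2_730_000_000,
    'ar': 656_000_000,
    'arr': 24_400_000,
    'aarrr': 107_000,
    'aaarrr': 56_600,
}

def noise_reduction(baseline, candidate):
    """How many times fewer competing results `candidate` has than `baseline`."""
    return hits[baseline] / hits[candidate]

ratio = noise_reduction('r', 'aarrr')
print(f"'aarrr' vs 'r': about {ratio:,.0f}x fewer results")  # about 25,514x
```

    By the same arithmetic 'aaarrr' does roughly twice as well again, which is the a{3}r{3} sweet spot mentioned above.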

    Push the name change firmly and decisively. This means that if anyone mentions 'R' there should be immediate responses that ask, "What AARRR you talking about?" This will inject the proper searchable term into the discussion while it reminds the poster of the name change.

    For an interesting 9 minute lecture that might help sell you on this idea, listen here.

    --
    <blink>down the rabbit hole</blink>
    1. Re:With R... every day is Talk Like A Pirate Day! by TheRealHocusLocus · · Score: 1

      For an interesting 9 minute lecture that might help sell you on this idea, listen here.

      Certificate warnings freak you out? Try this link instead, now with matching wildcard, calmer seas and less mogul.

      --
      <blink>down the rabbit hole</blink>
    2. Re:With R... every day is Talk Like A Pirate Day! by wisnoskij · · Score: 2

      It is scary sometimes how much control the limitations of Google Search have over our lives.

      For example, the best anti pirating system you can use for any game or film is to name it with less than 3 characters. It then becomes very hard to search for it.

      It took me days to find "9" (and I know others who had similar problems), and I think I never did end up seeing "B".

      --
      Troll is not a replacement for I disagree.
    3. Re:With R... every day is Talk Like A Pirate Day! by Anonymous Coward · · Score: 0

      WebDAV is at the core of Subversion source control web access. I've not seen anything else use it in the last 8 years.

    4. Re:With R... every day is Talk Like A Pirate Day! by Bite+The+Pillow · · Score: 1

      If google search is limiting your pirating, you may want to investigate something a little more specialized. I assume you're talking about the 2009 film with Jennifer Connelly, not the 2005 short nor the video game - either would be two clicks away after less than a minute.

      And if Google Search is really impacting your life in any meaningful way, you should step away from the keyboard for a weekend.

      I think this is more a case where you detected a pattern from two events, and extrapolated to assume that everyone has the same problem all the time. It's normal and natural to do so, but not correct.

    5. Re:With R... every day is Talk Like A Pirate Day! by Anonymous Coward · · Score: 1

      "Yarr" also has a nice acronymization: "Yet Another R Rebranding".

    6. Re:With R... every day is Talk Like A Pirate Day! by wisnoskij · · Score: 1

      Well I specifically mean the specialised searchers.

      Go to The Pirate Bay. Search "9", Search "9 2009".
      Neither of those return any useful results.

      And I guarantee you that that would have affected the number of people who torrented it.

      --
      Troll is not a replacement for I disagree.
    7. Re:With R... every day is Talk Like A Pirate Day! by TheRealHocusLocus · · Score: 1

      This joke was tired and lazy a decade ago. You're not just beating a dead horse, you've moved past that to sodomizing it.

      And you've been everywhere and seen it all -- and have come back to tell us how you've been everywhere and seen it all -- and have come back to tell us how you've been everywhere and seen it all -- and have come back to tell us how you've -- been.

      Sorry to hear it. Get a leg up into the world of wonder and whimsy. Join us!

      --
      <blink>down the rabbit hole</blink>
  21. If you're going to use R by Johnny+Loves+Linux · · Score: 4, Informative
    Be sure to use RStudio as the front end: http://www.rstudio.com/. Using R in a terminal is ok, but having the beautiful GUI frontend RStudio makes working with R sooooooo much better! The help system, plots, R markdown (knitr), and inspecting variables are all so much easier in RStudio. As far as comparisons go,
    1. R is no competitor to python for writing generic scripts.
    2. Python (numpy, scipy, statsmodels, pandas, sklearn, matplotlib, ipython and ipython notebooks) is not yet ready to compete with R for doing statistical analysis but give Python a couple more years and then slashdot should do a review of how it compares.
    3. You can always call R from python using the rpy2 module. This is really easy within an ipython notebook using the %load_ext rmagic command.

    For a nice video on using ipython notebook in data analysis: https://www.youtube.com/watch?...

    For a nice selection of ipython notebooks for doing various types of data analysis: https://github.com/ipython/ipy...

    1. Re:If you're going to use R by dpryan · · Score: 1

      The GUIs are fine unless you need to run anything in parallel (e.g., mcmapply). Those almost never work in a GUI.

    2. Re:If you're going to use R by Anonymous Coward · · Score: 0

      If your arch = ARM, you'll need to build RStudio yourself.

  22. State of Programming in the Sciences by wisnoskij · · Score: 2

    Having seen the state of programming in the Sciences, I really do not thing that "built by statisticians" is something you would want to advertise.

    --
    Troll is not a replacement for I disagree.
    1. Re:State of Programming in the Sciences by Anonymous Coward · · Score: 0

      I noun predicate isn't a statement...

      ERROR: Overflow detected at Line 1.
      Stacktrace:
          Read parent post.
          Write child post.
          Compose line 1.
          Ellipsis encountered.
          Reconsider line 1.
          Reconsider line 1.
          Reconsider line 1.
          void* std::realloc(int);
          Out of Qbits error.

  23. Beats python at what? by umafuckit · · Score: 3, Interesting

    A few examples are provided in TFA but it's all rather vague as to why R "beats" Python. I've been using R for years for fitting mixed effects linear models. It does this really well, it makes it easy to compare models, it's got all the cutting-edge stuff in it. The problem with R, however, is that it's shitty and unintuitive as a programming language. I do all my pre-processing in MATLAB and I only ever export to R when I have a final data frame that needs a moderately complicated statistical analysis.

    1. Re:Beats python at what? by Anonymous Coward · · Score: 0

      Why not clone R and give the clone better syntax? Or else add R functionality to a Pythonesque language?

    2. Re:Beats python at what? by Anonymous Coward · · Score: 0

      Oh yeah, rpy2. So what's the problem?

  24. With R... every day is Talk Like A Pirate Day! by iggymanz · · Score: 1

    I'm afraid your research neglects a huge subset of the Talk-Like-A-Pirate word space, 'yarr' has 523,000 results

  25. MIssed point Apples - Oranges by WatchMaster · · Score: 1

    No one uses R for its amazing language*. The language sucks. R is used because it has nearly limitless, tested, and approved statistical algorithms. Want partial least squares, support vector machines, linear models, principal components analysis, Fisher's exact test? They are all there waiting to process your data. Along with hundreds of other analyses that you might really need to use but don't even know about yet.

    "Python" doesn't have this stuff because it is a language, not a set of statistical methods.

    *there may be a few deviants who use it for self flagellation

    1. Re:MIssed point Apples - Oranges by stevebyan · · Score: 1

      No one uses R for its amazing language*. The language sucks.

      From Morandat, Hill, Osvald, Vitek, "Evaluating the Design of the R Language", http://r.cs.purdue.edu/pub/eco...

      "R is a dynamic language for statistical computing that combines lazy functional features and object-oriented programming. This rather unlikely linguistic cocktail would probably never have been prepared by computer scientists, yet the language has become surprisingly popular. With millions of lines of R code available in repositories, we have an opportunity to evaluate the fundamental choices underlying the R language design. Using a combination of static and dynamic program analysis we assess the success of different language features."

  26. Can't spell warez without R by tepples · · Score: 2

    And to what extent are statisticians willing to use warez?

    1. Re:Can't spell warez without R by Mitchell314 · · Score: 1

      Many of them are or were college students. What do you think? :P

      --
      I read TFA and all I got was this lousy cookie
  27. ASM is superior by Khyber · · Score: 0

    ASM makes R possible.

    Henceforth, ASM is king. R is just another pretender.

    --
    Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
    1. Re:ASM is superior by Anonymous Coward · · Score: 0

      I program in microcode you insensitive clod!

  28. Re:Right tool for the right job. by Anonymous Coward · · Score: 0

    dickbreath???...

  29. Re:Right tool for the right job. by Anonymous Coward · · Score: 0

    Well, by that logic, nothing past machine code was ever needed.

    I think there's some value in having a language that allows you to express code in an efficient notation. Just like there's value in having mathematical formulas and not having to write mathematical work as prose.

  30. Re:Right tool for the right job. by Anonymous Coward · · Score: 0

    Python has Beautiful Soup for web page scraping that I have not seen in any other language.

    Java has JSoup

  31. Fortran throwdown challenge! by Theovon · · Score: 1

    This guy must have been reading the recent stuff on Fortran and decided to jump on the bandwagon.

    Fortran was written by engineers and scientists for engineers and scientists.
    R is written by statisticians for statisticians.

    Well, there you have it. If a language or other kind of tool was developed by practitioners of X for other practitioners of X, it’s likely that it will be better than some other tool that was designed for a different purpose.

    Who would have thunk it.

  32. DSLs by jbolden · · Score: 3, Insightful

    He's probably right. All other things being equal a good Domain Specific Language will crush a General Purpose Language in its domain. If Julia is much faster than R and that were unfixable it would still be far easier to write a library in Julia accessible by R than to train R users in all of Julia's concepts.

    General purpose languages can sometimes get close to DSLs in effectiveness and then the greater diversity of users creates an economy of scale and deep entrenchment which drives DSLs away. But then with a large and highly diverse user base the General Purpose language isn't able to rapidly adapt so DSLs spring up to fill niches. Some of those DSLs become incredibly successful and start to move into other domains diversifying their purpose and user base to become General Purpose Languages and the cycle repeats.

    1. Re:DSLs by Anonymous Coward · · Score: 0

      The thing about general purpose versus domain specific is that getting "close", as you say, can be really, really close. In languages that have decent string manipulation, at worst you end up passing around some strings to account for syntax quirks in many DSLs. At least in packages in Python that are meant to handle R related work, they were able to duplicate model definitions using the same syntax this way. Then it is only a matter of which one has the problem specific packages you need or want.

    2. Re:DSLs by jbolden · · Score: 1

      That for the general purpose language creates the two language problem.

      Library X has a syntax Y, but syntax from language Z keeps bleeding through in practice, vs. in a DSL where Y is clean.

    3. Re:DSLs by Anonymous Coward · · Score: 0

      The half decently written libraries typically have syntax like both the DSL they are inspired by and the language they are actually in. For example statsmodels in python can define models either using a compact syntax like from R, or a bit more verbose way that is plain python. It is your choice if you want it to be clean or to be compact/familiar. Unless you are using another library on top, which made a choice for you, but then if that is the library you need you don't have much choice which language to use in the end.

  33. Programming language vs. statistical computing by Anonymous Coward · · Score: 1

    This is such a pointless discussion. To all the people going on and on about how horrible a programming language R is: well, it never intended to be a good programming language. It is perfect for what it is meant to do, namely, load data, do statistical analysis on it, and produce graphics summarizing the results.

    If you need a _programming language_, no, R alone won't do. It is not for "programming in the large", it's not for problems that can't be expressed as vector/matrix/array manipulations, it is not good for writing "modular" software (perfectly good for independent packages, however).

    No, people are not using it simply out of momentum. Unless you define "momentum" as "using it because it does statistical analysis and graphics".

    1. Re:Programming language vs. statistical computing by Pseudonym · · Score: 1

      It is perfect for what it is meant to do, namely, load data, do statistical analysis on it, and produce graphics summarizing the results.

      And that's also its biggest problem. It's perfect for what it's meant to do, but it's distinctly imperfect for many of the uses that it does do, and poorly designed for the uses that it could do.

      An example of the former is maintaining large codebases. People do maintain large R codebases, but this is despite the language, not because of it.

      An example of the latter is parallelism. Many statistical problems could naturally parallelise on large data sets in a well-designed R-like language. A clean core language (perhaps with craploads of syntactic sugar) could allow this parallelism to happen mostly automatically. But it can't, because R started leaking implementation details into the language very early on. Now it's impossible, and anyone who wants parallelism had better hope that their native-language library writer knows what they're doing.
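
      As a rough illustration of "naturally parallelise": many summaries, such as the mean and variance, can be built from per-chunk counts and sums that are computed independently and then combined. A minimal Python sketch of that decomposition, with assumed names (chunk_stats, parallel_mean_var) and made-up data. It uses threads only to show the shape of the computation; CPython threads will not actually speed this up, and the sum-of-squares formula is numerically fragile for real data.

```python
from concurrent.futures import ThreadPoolExecutor

def chunk_stats(chunk):
    """Per-chunk summary: count, sum, and sum of squares."""
    return len(chunk), sum(chunk), sum(x * x for x in chunk)

def parallel_mean_var(data, n_chunks=4):
    """Mean and sample variance from independently summarised chunks.

    Assumes at least two observations.
    """
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(chunk_stats, chunks))
    # Combine step: the per-chunk summaries simply add up.
    n = sum(p[0] for p in parts)
    total = sum(p[1] for p in parts)
    total_sq = sum(p[2] for p in parts)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, variance

data = [float(x) for x in range(1, 101)]  # made-up data: 1.0 .. 100.0
mean, variance = parallel_mean_var(data)
print(mean, variance)
```

      The point is only that nothing in the chunk step depends on any other chunk, so a language with a clean core could in principle schedule it however it likes.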

      Statisticians are no different from any other software customer. They don't always know what they want.

      --
      sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
  34. shocked by Spazmania · · Score: 1

    I'm shocked to learn that a purpose-built programming language might be better at its specific purpose than a general purpose programming language. Shocked I say.

    I'd be even more shocked if a bunch of mathematicians had the good sense to pick a Google searchable name for their language. One PIA thing with C is how hard it is to search Google for documentation when you don't remember the exact function name.

    --
    Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
    1. Re:shocked by cellocgw · · Score: 1

      I'd be even more shocked if a bunch of mathematicians had the good sense to pick a Google searchable name for their language

      You young punks have any idea by how many years R precedes the existence of google (or even alta vista)? Same goes for the c language, FWIW.

      --
      https://app.box.com/WitthoftResume Code: https://github.com/cellocgw
    2. Re:shocked by Spazmania · · Score: 1

      By the first stable release of R (2/2000) folks recognized the problems using the search engines to find C documentation. It would have been a wise time to pick a searchable name.

      --
      Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
  35. true, but not really because of R itself by jonnyj · · Score: 3, Insightful

    Completely right.

    We use R extensively in work. Programmers talk about R's libraries, but that's not the real reason we use it. The killer blow is that the _documentation_ is written by statisticians. That means that it's reliable, easy to understand, and honestly tells you the pitfalls of the techniques you're using.

    We're financial guys who are doing stuff in consumer finance that has rarely, if ever, been done in our field. The statistics aren't particularly advanced, but it's impossible to hire someone who understands the industry and knows the statistics already. Statistics text books tend to either be so basic that you already know what they say, or so advanced that you need a PhD to understand them. On the other hand, much of the R documentation is beautifully simple to read, and comes with brilliant worked examples - albeit from fields that are very different from our own. Whenever we're researching potential new statistical approaches, we find blogs stuffed full of examples written in R.

    In short, the R ecosystem makes you a better statistician. Julia and Python can't offer that.

  36. With R... every day is Talk Like A Pirate Day! by Anonymous Coward · · Score: 0

    This joke was tired and lazy a decade ago. You're not just beating a dead horse, you've moved past that to sodomizing it.

  37. Slashdot by DrYak · · Score: 1

    You know you're on /. when you need to check half of the words on wikipedia, just to be able to understand a 10-word sentence. :-)

    --
    "Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
  38. R Julia beats both R and Julia by PaddyM · · Score: 1

    We all know Raul Julia as M Bison beats them both. And Raul Julia's reading of "Mystery on the Docks" on Double "R" (Reading Rainbow) lives on in my mind as one of the great renditions.

  39. Actor and Object, An Artificial Divide? by Anonymous Coward · · Score: 0

    It's one of those things that only comes up in the context of comparing programming languages. It's a feature of certain languages that the code is also a data type. That means that you can, e.g., concatenate a string of commands:

    a = 'b *' + 'c'

    and then call

    a()

    Lisp is the most notable example. Slightly more usefully, you can write a function, pass it around to another function, change (e.g.) a SUM operation to SIN, and return the new function for use elsewhere. It's related but not equivalent to having functions as first class objects, the former implies the latter but not vice-versa. There are a great many useful things one can do with code that writes code that writes code, and even a wide sea of things that are more concisely or elegantly done this way. I don't know if there is a class of problems which can only be performed in a homoiconic language, but I'd guess that there's a Turing tar-pit for anyone interested in using the wrong tool for a given job.
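
    Python is not homoiconic, but its standard ast module gives a rough approximation of both ideas: building code from strings, and taking an expression apart, swapping one operation for another, and putting it back together. A sketch under those assumptions; the SwapCall class and rewrite helper are made-up names, and the swap here is sqrt to sin rather than SUM to SIN so that both functions take the same kind of argument.

```python
import ast
import math

# Build source text by concatenation, as in the 'b *' + 'c' snippet above.
source = 'b *' + 'c'
product = eval(compile(source, '<generated>', 'eval'), {'b': 3, 'c': 4})

class SwapCall(ast.NodeTransformer):
    """Rename every call to `old` into a call to `new`."""

    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_Call(self, node):
        self.generic_visit(node)  # rewrite nested calls first
        if isinstance(node.func, ast.Name) and node.func.id == self.old:
            replacement = ast.Name(id=self.new, ctx=ast.Load())
            node.func = ast.copy_location(replacement, node.func)
        return node

def rewrite(expr_source, old, new):
    """Parse an expression, swap one function for another, return compiled code."""
    tree = ast.parse(expr_source, mode='eval')
    tree = ast.fix_missing_locations(SwapCall(old, new).visit(tree))
    return compile(tree, '<rewritten>', 'eval')

env = {'sqrt': math.sqrt, 'sin': math.sin, 'x': 4.0}
original = compile('sqrt(x) + 1', '<original>', 'eval')
rewritten = rewrite('sqrt(x) + 1', 'sqrt', 'sin')

print(product)               # b * c with b=3, c=4
print(eval(original, env))   # sqrt(4.0) + 1
print(eval(rewritten, env))  # sin(4.0) + 1
```

    In Lisp the parse and compile steps disappear, because the program already is the list structure being edited.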

    So if what you got from that is that it's something only LISP weenies and Clojure hackers care about, that's pretty much true: that weird light in their eyes is probably related. I have more pedestrian programming challenges, but some nights, I dream of destroying the divide between object and action, between coder and code.

  40. Flaky by StripedCow · · Score: 2

    From the summary:

    And R is Statistically Correct

    But Python is correct all the time.

    --
    If Pandora's box is destined to be opened, *I* want to be the one to open it.
  41. without computer scientists... by Anonymous Coward · · Score: 0

    > 'R is written by statisticians, for statisticians,'

    that's why it's great for statistics but kinda sucks as a language... I have a few comments saved from the last debate about it...

    "All indexing in R is base-one. Note that no error is thrown if you try to access a[0]; it always returns an atomic vector of the same type but of length zero, written like numeric(0). Unaccountably, nobody's in jail for that decision, yet. Indexing past the end of the array, by contrast, yields NA."
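
    For contrast with a zero-based language, here is a small Python emulation of the behaviour that quote describes. It is only a sketch: r_index and the NA stand-in are hypothetical names, it handles a single non-negative index, and it does not model R's negative (exclusion) indices.

```python
NA = None  # stand-in for R's NA

def r_index(vec, i):
    """Emulate the quoted R behaviour for one non-negative index."""
    if i < 0:
        raise ValueError("negative (exclusion) indices are not modelled here")
    if i == 0:
        return []            # like numeric(0): an empty vector, no error
    if i > len(vec):
        return [NA]          # past the end: NA rather than an error
    return [vec[i - 1]]      # base-one: index 1 is the first element

a = [10.0, 20.0, 30.0]
print(r_index(a, 1))  # first element
print(r_index(a, 0))  # empty, silently
print(r_index(a, 4))  # NA, silently

# Plain Python, by comparison, complains loudly about the out-of-range case.
try:
    a[3]
except IndexError as err:
    print("Python:", err)
```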

    "If you ask for a numeric vector using numeric(42) or as.numeric(x), you will get a double vector. A perfect R-ism is that if you ask for a single vector, you'll still get a double-precision float vector, though it will have a flag set so that it will be passed into C APIs as single-width floats instead of doubles. There is no single-precision storage type in R."

    because, you know, fuck you! that's why!

  42. R is less optimal for programmers by Anonymous Coward · · Score: 0

    R is strange and non-intuitive for programmers. I don't like it. Python is beautiful and simple. Even if Python is less statistics-ish, I prefer it because there are so many surprises in R, all the time.