Slashdot Mirror


Microsoft Announces R Tools For Visual Studio (technet.com)

theodp writes: A year after its acquisition of Revolution Analytics, Microsoft announced a slew of R-related product offerings, and noted that Revolution R Open is giving up her maiden name and will henceforth be known as Microsoft R Open. Tucked away in the announcement was the news that R is coming to Visual Studio. Microsoft has released a teaser video for R Tools for Visual Studio (RTVS) and is taking sign-ups for early access.

5 of 105 comments

  1. Re:R vs. Python vs. other by i.r.id10t · · Score: 4, Informative

    R is pretty much pure statistics. While it has a built-in interpreter to load data from csv files or user input or whatever and then run its functions against the data set, it really shines as a library to be used in other "real" programming languages where you have logic, loops, etc. available to you.

    And since there are R interfaces for Python (http://rpy.sourceforge.net/) it isn't a "versus" situation.... what a bargain!

    --
    Don't blame me, I voted for Kodos
  2. Re:R vs. Python vs. other by bangular · · Score: 5, Informative

    R is preferred by statisticians. Many statisticians are at the forefront of creating new traditional machine learning algorithms (not the GPU-driven or MapReduce stuff "hip" companies are dealing with right now): things like supervised classification and clustering algorithms. This usually means you have access to a researcher's implementation of a new algorithm fairly quickly, long before it's in a commercial package. It also means you have to deal with a lot of one-off code and work out whether a given function wants a row vector or a column vector.

    Python seems to be much more popular with those having a computer science background. There are far fewer machine learning algorithms available in Python. However, if you are going to design a large system, it's generally much more convenient to do in Python. There are Python interfaces to R as well.

    Julia is new on the scene and attempts to solve the shortcomings of Python and R (insert xkcd comic here). Performance is good and it has interfaces to many languages. I've used it a few times and it's maturing, but it's definitely risky to do any long-term project in Julia.

    Then there's Java. Weka is a popular machine learning package with a GUI and all of the algorithms available as jar files. It has a very consistent API and includes pre-processing tools. Weka also has a marketplace for new algorithms. However, many times you just have to write a one-time script for data cleanup or to compare algorithms, and that's definitely not convenient to do in Java. I haven't seen many pure Java people doing this type of work in the wild. The final implementation may end up in Java, but the initial work seems to almost always be in R and Python.
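
The row-vector versus column-vector friction mentioned in the comment above can be made concrete with a small sketch. This is plain Python with nested lists, not R and not any particular package's API; the helper name `as_column` is hypothetical, and real numeric code would more likely lean on an array library. The idea is simply to normalize whatever orientation the caller has before handing it to one-off research code.

```python
def as_column(values):
    """Coerce a flat list, a 1 x n row, or an n x 1 column into n x 1 form.

    Hypothetical helper for illustration only.
    """
    if not values:
        return []
    # Flat list of scalars, e.g. [1, 2, 3]
    if not isinstance(values[0], (list, tuple)):
        return [[x] for x in values]
    # Single row, e.g. [[1, 2, 3]]
    if len(values) == 1:
        return [[x] for x in values[0]]
    # Already a column, e.g. [[1], [2], [3]]
    if all(len(row) == 1 for row in values):
        return [list(row) for row in values]
    # Anything else has more than one row and more than one column.
    raise ValueError("expected a vector, got a matrix")


if __name__ == "__main__":
    print(as_column([1, 2, 3]))
    print(as_column([[1, 2, 3]]))
    print(as_column([[1], [2], [3]]))
```

All three calls produce the same column layout, so downstream code only has to handle one shape; a genuine matrix is rejected rather than silently reshaped.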

  3. Re:R vs. Python vs. other by Anonymous Coward · · Score: 2, Informative

    If you know both a scripting language (e.g. Perl or even bash) and a spreadsheet program, then R is almost unbelievably easy to learn. I was doing semi-"advanced" stuff in less than a day, and I am not that bright.
    The real problem will be understanding the specific statistics you are wielding, which can be a lot harder.

    This is one of the nicer intros I have found, but tastes vary:
    http://www.statmethods.net/

  4. Re:R vs. Python vs. other by reve_etrange · · Score: 5, Informative

    MATLAB is amazing for general 'data science,' and is very widely used for certain tasks, such as image processing. It provides a huge array of already-implemented algorithms for computer vision, statistics, machine learning, and simulation. Many academic labs use it, and many students receive MATLAB training. On the other hand, MATLAB is proprietary and quite expensive. (It's semi-open source because most of its functions are MATLAB scripts themselves.) The language is very readable, except maybe the native array syntax, and comes with extremely good documentation, but it's clunky for general-purpose programming. It has an OK IDE and one of the best debuggers in any language. The runtime is redistributable, so you *can* make portable applications, but again, it's a little clunky. The open-source GNU Octave and Scilab environments are also (mostly) code-compatible with MATLAB. All in all, it scores highly in all three aspects you mentioned, but it's very expensive.

    Python is also very good, once the numpy, scipy, matplotlib, pandas and ipython/jupyter packages are installed. Like MATLAB, Python is widely used in academia, and lots of students receive training. There are many functions/algorithms already available, though somewhat fewer than in MATLAB. For example, the statistics capabilities are similar, but MATLAB has more image processing functions. Plotting and visualization also haven't quite caught up to MATLAB yet. Python has the great advantage of being totally free and open-source, and there are a large number of IDEs and debuggers available. Python is also a great general-purpose language for self-contained, portable applications that may grow out of data analysis code. The documentation can be lacking in some modules, but there's good free support online via e.g. stackoverflow. Python is readable and easy to learn. It scores about the same as MATLAB, weaker in some areas, stronger in others, and is completely free. There's active development of the analytics modules, and going forward Python will probably become more popular for data science.

    R is a bit of a special case. It has excellent statistics and machine learning capabilities, and there are a lot of extension packages available with specialized features, but it's really not as general as MATLAB or Python. I'm unaware of anyone using R for image processing, for example. As a language, it's very declarative, and the analyst doesn't need to understand statistical methods or their implementations in order to use them. That's great for beginners and convenient for experts, but can lead beginning/intermediate users astray if they don't appreciate the distinctions between significance and effect size, between different measures of significance/effect size, independence of variables, etc. Plots and visualizations in R tend to look nice when printed as PDF, but they're essentially non-interactive. R isn't general purpose at all, and personally I don't like its language conventions. I had the same experience with Mathematica: some people really like it and it's great for certain things, but I just can't stand the language. Back to R, I think the usefulness is great for statistics, less so for other tasks. Maintainability is OK - IMHO the language is not as intuitive as MATLAB or Python. My impression is that fewer people receive training with R, and it's a little less popular in general. It's the only one of these three languages I didn't see until grad school.

    My first choice for any new data analysis task is Python. I think it has the brightest future, and it's available to everyone for free. I'll use MATLAB if one of its built-in functions will save me a ton of time, or if I need to prototype something very rapidly (I guess it's still my strongest language). R I only use if I absolutely need something from one of its third-party modules. Lately, I've been experimenting with Julia, but it's not close to mature enough for my academic projects, let alone commercial ones. Sometimes I use external visualization tools, like LLNL VisIt, if I need to make high-quality, interactive visualizations of very large data sets. Hope that helps, sorry for the wall of text.

    --
    .: Semper Absurda :.
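
The distinction between significance and effect size raised in the comment above is easy to show numerically. The following is a minimal sketch using only Python's standard library; the function names `two_sample_summary` and `make_groups`, the sample size, and the shift are illustrative assumptions, and the data are synthetic rather than taken from any real study. It uses a large-sample z-test (normal approximation) for the p-value and Cohen's d for the effect size.

```python
import math
from statistics import NormalDist, fmean, variance


def two_sample_summary(a, b):
    """Return (p_value, cohens_d) for two independent samples.

    The p-value comes from a two-sided z-test, which is only a
    reasonable approximation when both samples are large.
    """
    na, nb = len(a), len(b)
    mean_diff = fmean(b) - fmean(a)
    var_a, var_b = variance(a), variance(b)
    # Standard error of the difference in means.
    se = math.sqrt(var_a / na + var_b / nb)
    z = mean_diff / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    # Cohen's d: mean difference in units of the pooled standard deviation.
    pooled_sd = math.sqrt(((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2))
    return p_value, mean_diff / pooled_sd


def make_groups(n=20_000, shift=0.03):
    """Build two deterministic, roughly standard-normal groups.

    Group b is group a moved up by `shift` (hypothetical values).
    """
    std_normal = NormalDist()
    a = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    b = [x + shift for x in a]
    return a, b


if __name__ == "__main__":
    a, b = make_groups()
    p, d = two_sample_summary(a, b)
    print(f"p-value = {p:.4f}, Cohen's d = {d:.3f}")
```

With samples this large, a shift of a few hundredths of a standard deviation clears the usual 0.05 threshold even though the effect itself is tiny, which is the trap the comment describes.
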
  5. Re:Managers Hate Niche Languages by i.r.id10t · · Score: 3, Informative

    "If you enjoy what you do, you'll never work a day in your life"

    --
    Don't blame me, I voted for Kodos