Slashdot Mirror


Book Review: R Graphs Cookbook

RickJWagner writes "Once upon a time, I thought communication was one of my strong suits. Alas, a few years into my programming career I realized I'm more of the head-down codeslinging type, not one of the schmoozing managerial types. So when I have a point to make, I really like to have my data ready to do the talking for me. In that capacity, this book is a very good weapon to have in my arsenal." Read on for the rest of Rick's review.

R Graphs Cookbook
author: Hrishi Mittal
pages: 272
publisher: Packt Publishing
rating: 8/10
reviewer: RickJWagner
ISBN: 1849513066
summary: An invaluable reference book for expert R users

Right away, you should realize this is not a book that teaches R. R (an excellent open source statistical language) is a great tool for any technician. I've used it to analyze logs, find performance bottlenecks, and make sense of mountains of nearly unrecognizable data. But this book doesn't teach R, it teaches R graphing.

It turns out R has excellent graphing capabilities. You can draw scatter plots, line plots, pie graphs, bar charts, histograms, box and whisker plots, heat maps, contour maps and 'regular' maps. These are all good for demonstrating data in different ways, and the book lightly explains which graph will help you illustrate which point.

If you're getting a little interested, you'll also want to know that all this graphing can be scripted and scheduled. So you can get data-driven reports on a schedule, easily accomplished once you know how to write the graphing scripts (which are then scheduled using cron or a similar facility). One small caveat: To prepare your data for presentation, I think it's usually necessary to partner R with another language that's better for text extraction and manipulation. I prefer Python for this task; you might like another language.
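As a rough illustration of that partnership, here is a minimal sketch of the extraction step in Python. The log format, the field names, and the function name log_to_csv are all assumptions made up for this example; they do not come from the book. The idea is only to reduce messy log lines to a clean CSV that a graphing script can load.

```python
import csv
import io
import re

# Hypothetical log format (an assumption for illustration only):
#   2012-01-31 14:05:09 GET /index.html 200 123ms
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3}) (?P<ms>\d+)ms$"
)

# Columns to keep in the CSV; the HTTP method is parsed but dropped.
FIELDS = ["date", "time", "path", "status", "ms"]


def log_to_csv(lines, out):
    """Write one CSV row per matching log line; return the number of rows written."""
    writer = csv.DictWriter(
        out, fieldnames=FIELDS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    written = 0
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed format
        writer.writerow(match.groupdict())
        written += 1
    return written


if __name__ == "__main__":
    sample = [
        "2012-01-31 14:05:09 GET /index.html 200 123ms",
        "this line is noise and gets skipped",
        "2012-01-31 14:05:10 POST /login 500 87ms",
    ]
    buffer = io.StringIO()
    log_to_csv(sample, buffer)
    print(buffer.getvalue(), end="")
```

Writing to a real file instead of the in-memory buffer would give you something R can load with read.csv(), and both steps can then be run from the same scheduled job.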

The book is exceptionally easy to read and work with. This doesn't mean it's simplistic, though. Anyone who's tangled with R's graphing without a good example will testify that figuring out the various functions and arguments necessary to wrangle a descriptive graph can be really difficult. This book gives you the kind of graphs you need, with the bells and whistles you're going to want, in a series of snippets you can run immediately.

The book is written in Packt's "Recipe" format. In a nutshell, this means that it's a series of how-to sections worded in a templated form. There are headings for sections that inform you what you're going to accomplish, how it's done, and why it worked. You quickly realize it's a repetitive format, but it serves to make the book an excellent resource for quick reference.

Another really nice feature of the book is the downloadable source code and matching data. Knowing the data is half the battle, really. The specific formulas given are certainly useful, but without knowing how the underlying data is formatted you really wouldn't get nearly as much practical value. For that reason, I urge anyone using this book to be sure they examine the underlying data for at least the first few formulas. After that, it'll be automatic: you'll know you want to look at that data when you're trying to master some graph type. Then when you go to make your own data ready for graphing, you reach for that secondary language like Python, extract the fields you want in a way similar to your example data set, and presto-- you've got the graph you want.
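One quick way to examine a sample data file before imitating it is to look at its column names and first few rows. The sketch below does that in Python; the function name describe_csv and the inline sample data are hypothetical stand-ins, not files from the book's download.

```python
import csv
import io
import itertools


def describe_csv(handle, preview_rows=3):
    """Return (column names, first few data rows) from an open CSV handle."""
    reader = csv.reader(handle)
    header = next(reader, [])  # an empty file yields an empty header
    preview = list(itertools.islice(reader, preview_rows))
    return header, preview


if __name__ == "__main__":
    # Hypothetical stand-in for one of the example data sets.
    sample = io.StringIO("month,sales\nJan,10\nFeb,12\nMar,9\nApr,14\n")
    header, preview = describe_csv(sample)
    print("columns:", header)
    for row in preview:
        print(row)
```

Once you know the column layout a recipe expects, you can shape your own extracted data to match it.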

The book starts out with a first chapter that introduces the kinds of graphs you'll be able to produce and situations where each type is most useful. The next chapters, up until the final one, are in-depth sections on each of the graph types. Maps are treated to a different chapter than pie graphs, for instance. The final chapter covers putting final touches on your graphs, including saving them in different formats (PDF, PNG, JPEG, etc.) and niceties like adding scientific notations, mathematical symbols, etc.

The book states that the target audience is experienced R programmers. I really don't think that's necessary, though. There is an obligatory R installation section, and I think that a reasonably competent programmer with Google at his disposal could get off the ground (for graphing purposes) with this book and a little bumbling. If you already know R, then you needn't worry at all; there is nothing here that will look foreign to you.

If I could change one thing about the book, I'd want a comprehensive index of all the functions and arguments that augment the basic core functions that produce the example graphs. These functions and arguments tweak the basic functions in ways that make the graphs much more appealing than what the basic functions alone can provide. But the book isn't able to show each and every combination with each graphing function, so it's up to the reader to figure out how to pick some of the options from one recipe and apply them to another. It's not difficult to do, but having an index to help you find the options you want would make this process easier.


16 of 64 comments

  1. This afternoon's Slashdot content... by PCM2 · · Score: 3, Insightful

    ...is brought to you once again by the letter Packt and the number RickJWagner.

    --
    Breakfast served all day!
    1. Re:This afternoon's Slashdot content... by Compaqt · · Score: 2

      Are there no more O'Reilly books being published anymore? How about some reviews of new instant classics (like the Camel book)?

      It's been all Packt, all the time for how long now?

      As a side note, PacktLib (all you can eat package) is more expensive at $220/yr than O'Reilly's Safari Online Books. Safari is $110/yr for the base package--5 books at a time. That's strange, since O'Reilly books have usually been considered the best tech books. Also they have books from a whole lot of other publishers, while Packt is mostly just barely edited PDFs.

      --
      I'm not a lawyer, but I play one on the Internet.
    2. Re:This afternoon's Slashdot content... by i.r.id10t · · Score: 2

      I pay $0 for an all-I-can-eat Safari ... of course, it is bought by my county library system, and I pay taxes and donate to the library system specifically, so I guess I do pay something...

      --
      Don't blame me, I voted for Kodos
  2. Or you can use Excel by AdamInParadise · · Score: 3, Informative

    Or any other spreadsheet program.

    Now of course I admit that Excel is probably not as flexible as R. However, unless your job is to produce stunning, tailor-made graphs, a spreadsheet application will deliver results a lot faster.

    --
    Nobox: Only simple products.
    1. Re:Or you can use Excel by Anonymous Coward · · Score: 2, Insightful

      If your data set is so small that a spreadsheet can open it, then your data set is a toy data set.

      Where's the +1,Smugly Superior mod option when you need it...

      Seriously, any data set that you encounter "in the wild" is by definition not a toy data set. There are many instances where using a spreadsheet to quickly visualize some figures is fine, just like there are many instances where using a word processor to write a letter instead of firing up TeX is fine.

      (Granted, you can easily write letters with TeX, too, and better-looking ones than what LibreOffice etc. will come up with, but that's because TeX has packages for just that sort of thing; you don't actually have to wrestle with raw TeX. R has a much steeper learning curve and thus may well be overkill. The right tool for the right job, people!)

    2. Re:Or you can use Excel by Beryllium Sphere(tm) · · Score: 4, Informative

      People who know more about statistics than I do severely criticize Excel, e.g. http://www.stat.uiowa.edu/~jcryer/JSMTalk2001.pdf

    3. Re:Or you can use Excel by Anonymous Coward · · Score: 2, Informative

      Wrong row limit. Sure you can _have_ 1M+ rows but you can still only graph 32K of them at a time.

    4. Re:Or you can use Excel by plopez · · Score: 3, Informative

      I like R because:
      1) It can handle the large (million or more) data sets I need to crunch and compare

      2) Seriously, the latest versions of Excel seem to choke on larger datasets. The "Oh no! Excel is bogging down and getting ready to crash!" sensation is far too frequent. R is much more stable

      3) You can do nice graphics in R you can't do in Excel. See http://addictedtor.free.fr/graphiques/

      4) There is a huge number of pre-rolled *serious* statistical libraries already written, and open sourced (including GPL'd) for it. FFT, geospatial stats, multivariate linear and non-linear statistical modeling, time series analysis, linear algebra, and more. Including OOP. I am just exploring how R does OOP now.

      5) The scripting language is in the Lisp family. It works the way I think.

      6) You can compile and link in your own packages in Fortran (pick your flavor: 77, 90, 95, '03, or '08), C, C++, etc. If it links, you can link it.

      Sweet. Also more stable than Matlab (and cheaper), and more user friendly than SAS.

      --
      putting the 'B' in LGBTQ+
    5. Re:Or you can use Excel by jrcoyle · · Score: 2

      For lattice graphics, get Lattice: Multivariate Data Visualization with R, by the author of the lattice package in R. However, I would recommend instead the ggplot2 package, and the book ggplot2: Elegant Graphics for Data Analysis by its author. ggplot has all the functionality that lattice does, it produces prettier plots by default, and it's easier to specify graphs and edit them with a minimal change in code.

    6. Re:Or you can use Excel by cellocgw · · Score: 2

      I'm sorry, but if you think Excel's graphs are good for much of anything, or you think they are easy to edit and reformat, you are grossly mistaken. I'm no novice: I've written spreadsheets with named variables so I can change the content of Excel graphs by changing names or data in cells.

      Before you get snarky about R, at least take the time to find one of the web sites dedicated to displaying charts, maps, and graphs generated with R. Most of them are far beyond anything Excel can do.
      If all you want are Enterpris-ey pie charts and bar charts (both worthless pieces of crap that only make PHBs happy), then use Excel. But if you've learned enough to know the difference between a line chart and a scatterplot, time to move up to a real tool such as R, Origins, Mathematica, Numpy, etc.

      --
      https://app.box.com/WitthoftResume Code: https://github.com/cellocgw
    7. Re:Or you can use Excel by dtdmrr · · Score: 2

      a spreadsheet application will deliver results a lot faster.

      Not really, particularly if you have the data already entered. Running:
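      R
      # read.csv() returns a data frame; hist() needs a numeric vector,
      # so plot a single column (here the first, assumed to be numeric)
      data <- read.csv("data.csv")
      hist(data[[1]])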

      takes far less time than selecting your columns, dragging the mouse over to the graph button, selecting the region for your plot, and then trudging through a multi-stage wizard. Even if you actually want to type in some data in a spreadsheet, it's frequently faster to save the table and load it up in R or gnuplot to graph it. And if you do want something like a histogram or a boxplot, Excel doesn't stand a chance (gnumeric at least supports boxplots).

      I might accept that creating a slightly prettied up graph might be a little quicker in a gui spreadsheet. But for quick and dirty and higher quality graphs they are slower, if they work at all. Once you start encoding your style preferences in little scripts that you load before graphing, you'll find even higher quality graphs take less time than mediocre graphs from a spreadsheet. And really there's something satisfying about tweaking one line in a single file and that automatically updating the style of 20 graphs in an article.

      I generally find that when plotting, if doing it once it's a coin toss whether to write a script or manipulate the data and plot manually; twice and scripting definitely breaks even; and of course more than that and scripting just gets more and more valuable. R (and many other environments) save your history, so that if you do decide a day later you should have just written a script, it's already there; you just need to copy the commands out of the history file. In Excel, well, at least you learned from experience what to do that next day.

      As I see it there are two reasons to graph in a spreadsheet. First, if you're actually working in a spreadsheet and just want a quick look at some data (not debating the merits of that, separate discussion). Second, when you're not sure what you want and are unfamiliar with the tools available, a GUI gives you something to poke at blindly with a mouse. In that second case, I think one should accept the pitfalls of ignorance with an intent to learn more and improve. Stubbornly grasping your spreadsheets, knowing there's a better world out there, will only hurt you in the long run.

    8. Re:Or you can use Excel by subreality · · Score: 2

      Now of course I admit that Excel is probably not as flexible as R. However, unless your job is to produce stunning, tailor-made graphs, a spreadsheet application will deliver results a lot faster.

      R is not a graphing language. It's a statistics language. If you just want to plot your sales growth by quarter, sure, a spreadsheet is much more convenient. But professional-quality graphs aren't the only (or even the primary) reason for R.

      R has an enormous library of very well refined statistics functions. Spreadsheets are not designed to handle hundreds of thousands of data points, cross-correlations, advanced data transforms, and all kinds of analysis that spreadsheets don't (and shouldn't) have.

    9. Re:Or you can use Excel by garcia · · Score: 2

      And R is free and SAS and/or Excel are not. For most here that would be the big deal breaker.

      While I use SAS myself, it's because it's available to me. However, I would not use Excel to build charts simply because if you have to change something it's very likely you will have to recreate the chart too. Personally I like running a block of code and having the output get e-mailed to the report's recipient each day/week/month/quarter/foo w/o me having to do anything manually.

      Excel = manual and that scares the shit out of me.

      YMMV.

  3. R makes great graphs, but... by proxima · · Score: 2

    R makes great graphs functionally speaking, but without mucking about with the options and some post-processing they are not the most attractive. Open up your favorite financial/data intensive news source and look at the visuals and you'll find that generating that style with just code is fairly difficult.

    Until about Office 2007, the defaults in Excel charts were also atrocious. OpenOffice.org is still pretty bad, and Matlab is not much better than R. The good news is that you can generate PDFs from each of these and easily open them in Inkscape/Illustrator, where making vector-based edits is easy.

    Anyone who regularly visualizes data needs to pick up resources on how to clearly organize and display your data, like "The Visual Display of Quantitative Information" by Edward Tufte (though some of his examples are a little dated). Books like that are full of examples that would be very tricky to replicate without any post processing, because it usually involves eliminating excessive lines and cluttering detail.

    --
    "The universe seems neither benign nor hostile, merely indifferent." --Carl Sagan
    1. Re:R makes great graphs, but... by dondelelcaro · · Score: 2

      R makes great graphs functionally speaking, but without mucking about with the options and some post-processing they are not the most attractive.

      Base graphics aren't that nice looking, but that's why ggplot and lattice exist. You can fairly easily produce publication quality graphs with them without spending much time dealing with additional options. There are also packages which produce many of the plots which Tufte promulgates.

      --
      http://www.donarmstrong.com
  4. Far, far too basic. by dondelelcaro · · Score: 3, Informative

    Just from examining the few preview pages on amazon.com, this book appears to be far too basic for anyone who has actually done any serious work with R. I personally would forgo this entire book, and spend the time wandering through the R Graph Gallery which has far more examples with source code and underlying data. It's also rather odd that this book doesn't cover ggplot, grid graphics, lattice, or any of the more commonly used tools in advanced R graphics.

    Perhaps this book could be useful as your first foray into graphing with R... but I'm unconvinced it even covers that well.

    --
    http://www.donarmstrong.com