R In a Nutshell
joel.neely writes "R is a statistical computing environment that is fully-compliant with state-of-the-art buzzwords: free, open-source, cross-platform, interactive, graphics, objects, closures, higher-order functions, and more. It is supported by an impressive collection of user-supplied modules through CRAN, the 'Comprehensive R Archive Network.' And now it has its own O'Reilly Nutshell book, R in a Nutshell, written by Joseph Adler. I am pleased to report that Adler has risen to the challenge of the highly-regarded 'Nutshell' franchise. As is traditional for the series, this title mixes introduction, tutorial, and reference material in a style that is well suited to a reader who already has a background in programming, but is a new or occasional user of R." Read on for the rest of Joel's review.
R in a Nutshell
author: Joseph Adler
pages: 672
publisher: O'Reilly
rating: 9/10
reviewer: Joel Neely
ISBN: 978-0-596-80170-0
summary: A practical and engaging introduction to the R statistical system and its usage
As a curious newcomer to R who wanted to get going quickly, I was well-served by Part 1, which provided an R kickstart. Chapter 1 covers the process of getting and installing R. It is short, to the point, and just works, addressing Windows, Mac OS X, and Linux/Unix with equal attention. Chapter 2, on the R user interface, introduces the range of options for interacting with R: the GUI (both the standard version and some enhanced alternatives), the interactive console, batch mode, and the RExcel package (which supports R inside a certain well-known spreadsheet). Chapter 3 uses a set of interactive examples to provide a quick tour of the R language and environment, establishing a task-oriented theme that carries through the rest of the book. The last chapter of Part 1 covers R packages. It summarizes the standard pre-loaded packages, introduces the tools to explore repositories and install additional packages, and concludes by explaining how to create new packages.
As a polyglot programmer who is always interested in seeing how a new language approaches programs and their construction, I enjoyed Part 2, which described the R language. This section begins with an overview in chapter 5, and then devotes a chapter each to R syntax, R objects, symbols and environments (central to understanding the dynamic nature of R), functions (including higher-order functions), and R's own approach to object-oriented programming. This section closes in chapter 11, with a discussion of techniques and tips for improving performance.
As a busy professional with data sitting on my hard drive that I'd like to understand better, I appreciated Part 3, with its practical emphasis on using R to load, transform, and visualize data. Chapter 12 presented alternatives for loading, editing, and saving data, from the built-in data editor, through file I/O in a variety of formats, to a mature set of database access options. Chapter 13 illustrated a range of techniques for manipulating, organizing, cleaning, and sorting data, in preparation for presentation or more detailed analysis. Chapter 14 introduces the reader to the wealth of graphical presentation options built into the R environment. There are so many charting types and details that this chapter could have been overwhelming, but Adler keeps the interest high and the mood light by drawing on an engaging variety of data: toxic chemical levels, baseball statistics, the topography of Yosemite Valley, demographic data, and even turkey prices. Chapter 15 is devoted to lattice graphics, the R implementation of the "trellis graphics" technique for data visualization developed at Bell Labs. This chapter illustrates the power of lattice graphics by exploring the question of why more babies are born on weekdays than weekends.
As a non-statistician who still occasionally needs to do some number-crunching, I'm sure I'll be returning to Part 4, with its detailed explanations and illustrations of analysis tools and techniques, almost two hundred pages' worth. In chapters 16 through 20, Adler surveys topics in data analysis, probability, statistics, power tests, and regression modeling. As someone who has been offered too many medications and lost fortunes, I found much to enjoy in chapter 21, which used a variety of spam-detection techniques to illustrate the concepts of classification. Chapter 22, on machine learning, discusses several of the data mining techniques that R supports. Chapter 23 covers time series analysis, which may be used to identify trends or periodic patterns in data. Finally, chapter 24 offers an overview of Bioconductor, an open-source project focused on genomic data.
The book closes with a detailed reference to the standard R packages.
This is an impressive piece of work. In a volume of this size (about 650 pages), navigation is crucial, and I found both the organization of the chapters and the index up to the task. I was able to follow the instructions and examples through the first several chapters of the book essentially without a hitch, and in the later chapters the variety of illustrations and data sources added interest to what could have been very dull going.
I won't claim perfection for this book. There were a couple of explanations that could have been clearer, and one or two odd turns of phrase or rough edits. Out of all the code examples that I tried, I found exactly one that didn't seem to work without a minor correction. For a work of this size, that's actually pretty amazing!
As a long-time O'Reilly reader, I see Joseph Adler's R in a Nutshell as a welcome addition to the menagerie.
You can purchase R in a Nutshell: A Desktop Quick Reference from amazon.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.
In a volume of this size (about 650 pages)
Not to criticize the reviewer but there's not enough written above to do this book justice. From the author's emphasis on preprocessing the data in another language (like Perl I think he uses in the Chapter 3 tutorial) so that it can be effortlessly ingested by R to the very last pages on machine learning in R, it's a good book. I actively lament that in college I was relegated to Matlab instead of R today and the many packages available on CRAN.
I too would give this book a 9/10. It sometimes tries to inject tutorials in what should probably stick to being a reference and it might have too large of a scope for a single volume (I've read sets of books on machine learning and classification models) but this book is great for R beginners and R intermediates and as an R reference.
Seriously if you know a statistician who codes or if you know a developer who values statistics then this is their book. Given the nature of the subject matter and the GPL'd beauty of R, you'll undoubtedly have a hard time finding a negative review of this book anywhere.
My work here is dung.
I mean, how can they resist?
"Waste not one watt!" - CZ
going to be called R-square?
"the GPL'd beauty of R"
LOL, wut?
...but can it architect a paradigm to leverage Web 2.0 community-driven crowdsourcing, using social graphs to monetize the cloud, thus rendering a modular-frameworked, enterprise-grade solution?
R is an excellent language to learn for just about every field. Its ability to import and export data to MS based resources such as Access, Excel, MS-SQL and other non-MS sources makes it a versatile tool. Its commercial parent is S-PLUS, and the two are nearly syntax identical with minor variations. Buy the book, use the tool, impress your Eve Online players by pinning down the July Tritanium prices and hitting the weekly averages within .5 ISK by doing time series analysis using regression plus ARIMA on the residuals. Find out cool things like Hulkageddon impacts frigate prices more than exhumers and MORE!
FUN FOR THE WHOLE FAMILY (Except your big sister because she's icky and into boys....)
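For anyone wondering what "regression plus ARIMA on the residuals" looks like in practice, here is a minimal sketch. It is written in plain Python rather than R so that it runs without any packages, and it swaps in the simplest possible stand-in: a straight-line trend fitted by least squares, followed by an AR(1) model (the ARIMA(1,0,0) special case) on what is left over. The function names are made up for illustration, and a real analysis in R would more likely lean on `lm()` and `arima()`.

```python
def fit_line(ys):
    """Least-squares line through the points (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def fit_ar1(residuals):
    """Least-squares estimate of phi in r[t] = phi * r[t-1] + noise."""
    num = sum(cur * prev for cur, prev in zip(residuals[1:], residuals[:-1]))
    den = sum(prev * prev for prev in residuals[:-1])
    return num / den if den else 0.0


def forecast_next(ys):
    """One-step-ahead forecast: trend value plus the AR(1) residual carry-over."""
    intercept, slope = fit_line(ys)
    residuals = [y - (intercept + slope * t) for t, y in enumerate(ys)]
    phi = fit_ar1(residuals)
    return intercept + slope * len(ys) + phi * residuals[-1]
```

Calling `forecast_next(prices)` on a list of weekly prices returns the trend extrapolated one step, nudged by however much of the last residual the AR(1) term expects to persist. A full ARIMA fit also handles differencing and moving-average terms, which this sketch leaves out.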
For those who want to do Google searches but find 'R' difficult to search for, there is the rseek.org site, plus a few quick links to get you started while you wait for the nutshell book to arrive in the mail.
R Intro : http://www.itc.nl/~rossiter/teach/R/RIntro_ov.pdf
Programming in R: http://manuals.bioinformatics.ucr.edu/home/programming-in-r
R Graph Gallery: http://addictedtor.free.fr/graphiques/
Big Resource I use: http://www.math.yorku.ca/SCS/StatResource.html
The Little Handbook: http://www.tufts.edu/~gdallal/LHSP.HTM
The Big N: http://www.itl.nist.gov/div898/handbook/
There are hundreds of PDF references out there that can help as well, too many to list. Good luck, have fun.
-=[Idgarad]=- I Am A Savage In A Brave New World!
Let me just say: wow, thanks for actually providing a review, rather than a blurb copied from the amazon listing.
Seriously. Thanks.
As an (occasional) R user, I am excited to see a well-reviewed O'Reilly book on the language. I went and checked the major ebook stores - Amazon, BN, and Stanza, and none had the title.
It turns out that in addition to the Safari books service, O'Reilly also sells DRM-free copies in epub, mobi, and PDF formats. This book is available here. It's not a huge discount over the printed version on Amazon ($6.50 less), though. I'm surprised, then, that it isn't available via the major stores.
"The universe seems neither benign nor hostile, merely indifferent." --Carl Sagan
R!
<drops pin...>
I have a number of other good R books and this book has been a really useful addition (works better as a reference than many of the others). At first I wasn't going to buy this given the up and down nature of the O'Reilly series, but "R in a Nutshell" is definitely up.
This book always sits right on my desk.
R is a language that more people should really learn. The statistics community has definitely gravitated strongly to it. These days, with the thousands of packages on CRAN, it's much superior in functionality compared to other packages like STATA or SAS (I won't even go into people who use matlab for statistics), not to mention open source.
It still is a bit slower than matlab for some matrix operations, but hopefully that will be improved in the future.
If you don't understand any of my sayings, come to me in private and I shall take you in my German mouth.
Based on Wikipedia, only G, H, N, O, P, U, V, W one-letter programming language names are left! Time to invent a new language :)
there be rum and free software matey... aRRRRRRRRRgh
$ unzip, strip, touch, finger, grep, mount, fsck, more, yes,fsck,fsck,fsck,umount, sleep
Based on Wikipedia, only G, H, N, O, P, U, V, W one-letter programming language names are left! Time to invent a new language :)
Ah, but we have unicode now, so we have a _lot_ of single-letter names to go.
More than a decade ago I gave a talk on using R, Octave, MuPAD and other software in the classroom environment. It's a great package. Back then I used it to get through stats courses and plot disk usage in a graph. Now I'm using it to hammer through stock market data each night. To do the same with some commercial packages would cost thousands of dollars.
Not having read the O'Reilly book, I can't draw a comparison between the two, but I have been extremely pleased with "R In Action" by Robert Kabacoff, which can be found here:
http://www.manning.com/kabacoff/
It's a work in progress, in that some 90% of the book is written. Pre-ordering the electronic version gives you the ability to download chapters as they are written, plus a final e-copy (or hard copy if you pay more) when it's completed.
I have a high degree of familiarity with SPSS and SAS, and am learning R to get around the crazy licensing issues of the aforementioned programs. I have been very pleased with Kabacoff's book, as I had *no* familiarity with R before grabbing "R in Action." The publisher/author support a forum where purchasers can identify errors and/or make suggestions for improvements before the book goes to final press.
Not sure if it is competition for "R in a Nutshell" or simply an additional reference, but worth checking out if you want to learn R. It's been very helpful for me.
jeff
:V (the dots were above it) -- was a cross platform byte-code compiled language used for voice processing applications (DOS & Unix)
Old age and treachery almost always overcome youth and skill.
650 pages in a Nutshell book?
Following the Murphy's Law "Anything which can be put in a Nutshell belongs there" I avoid Nutshell books, but that's a different topic.
Is there a way to integrate R programs with another high level language like Java, for example to bind an R object to a Java interface? I have basic familiarity with R, and I would like to use programs written in R directly with other programs written in an object-oriented language, as opposed to using file I/O as the bridge between them.
The general idea is to be able to take Java objects and pass them to R and do all the stats numbercrunching with smaller R programs, that are somehow integrated with a Java program. The results then get back as other objects that can be further processed in Java.
Are there any possibilities for that?
I only wanted to say that having learned R and LaTeX just for doing my PhD thesis (I am a PM now and I have never used them since my dissertation), I would strongly recommend them, especially to those never planning on going back to academia...it's a once-in-a-lifetime opportunity to do beautiful AND useful coding, feel proud of it, and be able to brag to non geeks/nerds. All at the same time.
PS: my PhD was nothing to do with CS by the way.
Often the reason people get involved in statistical analysis is that there is a body of data and no clue where to start ... as inhabitants of the information age, with cheap storage ... there's lots of material and often little clue or thought as to what the stored data might mean.
http://rattle.togaware.com/ is a website dedicated to "rattle" which is an R package (and togaware has a PDF book that's a great introduction) to a GUI based datamining tool.
Very handy, and the book is very lucid.
I mean, how can they resist?
"R" is resistance, so R apps should be able to resist just fine.
I only wanted to say that having learned R and LaTeX just for doing my PhD thesis (I am a PM now and I have never used them since my dissertation), I would strongly recommend them, especially to those never planning on going back to academia...it's a once-in-a-lifetime opportunity to do beautiful AND useful coding, feel proud of it, and be able to brag to non geeks/nerds. All at the same time. Just priceless..... PS: my PhD had nothing to do with CS by the way.
I don't know about java, but when I have to use a statistics library available in R, I use rpy. It's a python module that lets you automagically call r functions very easily, and directly get back python objects or R objects for further processing with R methods. Python's introspection capabilities make this sleek and transparent, I doubt a Java binding could be as cool (though if you need java, there probably are solutions).
and honestly, i'm so glad i don't have to use R directly... TFS says it is object oriented, but as far as I can recall all the library methods i tried just returned heterogeneous matrixes, with no real user-defined types. And the function calling semantics are mind-boggling, with mixing of keyword and positional arguments leading to all sorts of weirdness...
Statistical computing is where it is at but this is the wrong paradigm for it.
To quote Gauss: "Mathematics is the study of relations."
Relational programming is the right paradigm. In statistical terms, a relationship is a "case" and a set of relationships is a "relation". To use SQL terms, a case is a "row" and a "table" is a random variable or more accurately a set of random variables in a relation.
Once you have that idea, you can express your statistical propositions -- the things on which you make observations of cases -- in something relevant like the propositional calculus.
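As a rough illustration of the "cases are rows, relations are tables" view described above, the sketch below stores a handful of cases in an in-memory SQLite table and lets a relational query compute per-group counts and means. It uses Python's standard `sqlite3` module. The table name `cases`, its columns, and the `summarize` helper are all made up for the example, and this is only one way of reading the comment, not a full relational statistics system.

```python
import sqlite3


def summarize(rows):
    """Load (group, value) cases into a table and return per-group count and mean."""
    conn = sqlite3.connect(":memory:")
    # Each row is one "case"; the table as a whole is the "relation".
    conn.execute("CREATE TABLE cases (grp TEXT, value REAL)")
    conn.executemany("INSERT INTO cases VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT grp, COUNT(*), AVG(value) FROM cases GROUP BY grp ORDER BY grp"
    ).fetchall()
    conn.close()
    return result
```

Calling `summarize([("a", 1.0), ("a", 3.0), ("b", 10.0)])` returns one tuple per group with its count and mean. The statistical question ("what is the average per group?") is phrased entirely as a query over the relation, which is the point the comment is making.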
The <- approach is interesting, but what's the R notation for "less than negative six"?
And so, the quest continues. Pascal's := might be the best, although I hate to admit it because Pascal is my "had to deal with in school and was struggling so I hate it" language.
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
Serious question here.
I do a lot of statistical analyses, including some I've authored. The book is for the programmer, but R is for statistics and that means someone who actually uses the numbers for something.
SAS has its own language as well as a GUI with menus and can interchange data structures with many common programs.
SPSS has all these, plus it can record what's pulled down from the menus and generate code in its own language, which is easy to understand, comes as a text file, and can be edited and cut-and-pasted into batch files.
Why should I care about R?
And for the Matlab users: it was never meant to be a stats program. The stats add-on package requires you to learn to write M code first, then learn the package, which means months of learning to get through it all, and you have to bring a full complement of statistical know-how with you. If someone made you use Matlab for stats and the two weren't already your bent, someone needs spanked.
"I may be synthetic, but I'm not stupid." -- Bishop 341-B
Let me take this opportunity to say that unicode can fuck right off.
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
R is a very impressive, mature program that does a hell of a job.
I best liked connecting R data sets to a PostgreSQL database for my PhD thesis, and then doing statistical analysis on SQL selections without bothering about the SQL bits any more.
Also, I see lots of universities in Germany step up and teach R, which I think is good.
- Hubert