Domain: r-project.org
Stories and comments across the archive that link to r-project.org.
Comments · 217
-
Why don't you use R, Julia, ...?
-
sloth is eternal
The default behavior is to treat the field as whatever you've told the spreadsheet that it is. By default, every cell is set up for numeric data types.
... The problem is misuse of tools, not a problem with the tool.
A process of "five whys" applied to the present discussion immediately reveals "default numeric" as bad policy in academic research.
A sane default would be "untyped" or "exactly as entered" which shifts sins of omission into sins of commission, this being far more compatible with the culture and standards of scientific journal publication than what Microsoft originally chose, mainly for the convenience of boutique-reseller power demos. Also, the more collaborative the environment, the more important it becomes to enforce a strong-typed, sin-of-commission data model.
This is all covered in the first week of Graybeard 101 as taught with slate tablets back in the stone age. I was there in 1985. Microsoft has had wool in its ears since forever. Still doesn't make it right, does it?
Furthermore, anyone who really cares about data pipeline integrity writes an export function from the derived format back to the raw input format, until they come out exact, or every difference is adjudicated and signed off, which is incorporated into an automatic validation task which can be repeated at any point in time for the life of the project.
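A minimal sketch of that round-trip check in base R. The data frame, its column names, and the temporary file are hypothetical stand-ins, not anything from the comment; the only point is that reading with colClasses = "character" keeps every field exactly as entered, while default type guessing silently changes it.

```r
# Hypothetical raw input: every field stored exactly as entered (character).
raw <- data.frame(
  id    = c("a01", "a02", "a03"),
  value = c("0012", "1e5", "3.10"),
  stringsAsFactors = FALSE
)

# Export to a flat file, then read it back two ways.
path <- tempfile(fileext = ".csv")
write.csv(raw, path, row.names = FALSE, quote = FALSE)

# "Exactly as entered": no type guessing on import.
back <- read.csv(path, colClasses = "character", stringsAsFactors = FALSE)

# Default behaviour: column types are guessed from the contents.
guessed <- read.csv(path, stringsAsFactors = FALSE)

# The automatic validation step: the round trip must come out exact.
round_trip_ok <- isTRUE(all.equal(raw, back))
cat("round trip exact:", round_trip_ok, "\n")

# With guessing, "0012" is no longer the string that was entered.
cat("guessed column type:", class(guessed$value), "\n")
```

A check like this could be rerun at any point in a project; any difference it reports is something to adjudicate by hand.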
CRAN Task View: Reproducible Research
LaTeX was originally written in the early 1980s by Leslie Lamport at SRI International.
Leslie Lamport won the Turing Award in 2013 for his uber graybeard rectitude, if anyone cares to notice. Douglas McIlroy made his seminal contributions in 1968 (Bill Gates was thirteen, but perhaps he was already set in his ways). John Backus delivered his Turing Award lecture "Can programming be liberated from the von Neumann style?" in 1977, which inaugurated the modern tradition in functional languages (Bill Gates was then twenty-one).
Competence is hard. Sloth is eternal. We continue to seek a third way.
-
Re:BronsCon why're you talking out your ass?
I have an advanced degree in statistical sciences. First, understand that there's quite a lot more to statistics than you learned in intro courses. Having taken 'stat I/II' does not qualify you to speak with authority on statistical computing as a whole, regardless of the grades you got.
I, and many of my colleagues, use R as our daily language for statistical computing. It's interpreted, so it's very easy to work with program code and results at the same time. It also has a tremendous number of libraries (7905 at the moment, just counting CRAN: https://cran.r-project.org/) that are under continual development by leaders in the field of statistics. We use it because it is the language of our field. Could we write code to do the same thing in Delphi? Yes, certainly. But that's not the language we chose.
R is not an appropriate language for writing device drivers or complex applications. It's not particularly good for programming user interfaces. It's not (typically) compiled, so it's likely to be slower than efficient code written in compiled languages. There are many other things that R isn't very good at. But, as statisticians, we don't want to do those things; we want to do statistics. R has its warts, but regardless of your personal opinion of it, statisticians are not going to switch to Delphi any time soon.
-
Re:Seems counter-productive
Just use ape.
-
Re:Easy to explain
This is two groups: Gangolf Jobb and the editors of BMC Evolutionary Biology fully exercising their rights.
Gangolf Jobb has every right to license his software in any way he sees fit.
The editors of BMC Evolutionary Biology have every right to set the publication policy for their journal.
Everyone has a right to look like an ass in public.
And nothing of value was lost. Just use ape.
-
RColorBrewer
For R users, see the RColorBrewer package for an easy way to use these palettes in maps and charts.
-
Re:R wont run on linux soon
though I would expect that they provide some administrative support in some form (perhaps in similar manner that the FSF does for many open source projects
What "administrative support" do you think the FSF provides for "many open source projects"? All they ever seem to want to do is for you to transfer your copyright to them based on bogus justifications.
As far as the R Project is concerned, I don't see them listed as benefactors or supporting institution:
http://www.r-project.org/found...
Furthermore, the R copyright hasn't even been transferred to the FSF, it's held by the R Foundation.
I think this confusion over R illustrates again how the FSF likes to misrepresent its contributions and significance.
Being associated with GNU and the FSF used to be a positive thing; these days, I think it's a net negative for any project.
-
Re:R wont run on linux soon
http://www.r-project.org/ also states that "R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues." So obviously the GNU project itself doesn't do a lot of actual development, though I would expect that they provide some administrative support in some form (perhaps in similar manner that the FSF does for many open source projects).
-
Re:Why oh Why
This is about R. That would be difficult to do with a list of contributors this long. Not impossible... well, yeah, probably impossible.
http://www.r-project.org/contr...
-
Re:Portmasterz luv R
You haven't lived until you've ported a 3-D shooter like crysis over to R!!!
This is a collection of R games and other funny stuff, such as the classical Mine sweeper and sliding puzzles.
-
Re:Here's a link to a story about it.
Hey... Citation was requested... I provided.
A citation was requested, but you did not provide any citation worthy of consideration.
No idea to whom the website belongs.
It doesn't matter to whom the website belongs. What matters is whether the citation is either to a recognised (eg ISI listed) peer reviewed journal appropriate to the subject matter, or to some similar source of data carrying due authority and credibility. I mean a citation to someone's slashdot comment, for instance, would hardly be admissible would it?
Right this moment - the global warming appears to have leveled-off. These are simply facts... no parlor tricks here.
Just for a quick check throw the yearly anomalies (here's the GISTEMP data) into R and see if the slope is flat. Here... I'll make it easy for you to get started (but do improve on this and double check my numbers for the likely transcription error %-) ):
year <- c(1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013)
anom <- c(33, 46, 62, 41, 41, 53, 62, 61, 52, 67, 60, 63, 50, 60, 67, 55, 58, 61)
Then plot it and draw a line of fit. (For interest you can check the correlation using cor(year, anom).)
plot(year, anom)
fit <- lm(anom ~ year)
abline(fit)
Does that even look flat to you?!
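For anyone who would rather read numbers than eyeball the plot, here is a rough sketch that pulls the slope, its 95% confidence interval, and the p-value for a zero slope out of the same fit. It reuses the anomaly values quoted above (transcription errors and all) and is plain ordinary least squares, which ignores autocorrelation, so treat it as a first look rather than a proper trend test.

```r
# Values copied from the comment above; the commenter warns they may
# contain transcription errors.
year <- 1996:2013
anom <- c(33, 46, 62, 41, 41, 53, 62, 61, 52, 67, 60, 63, 50, 60, 67, 55, 58, 61)

# Ordinary least-squares fit of anomaly against year.
fit <- lm(anom ~ year)

# Fitted slope (anomaly units per year) and its 95% confidence interval.
slope <- unname(coef(fit)["year"])
ci <- confint(fit, "year", level = 0.95)

# p-value for the null hypothesis that the slope is zero.
p_value <- summary(fit)$coefficients["year", "Pr(>|t|)"]

cat("slope per year:", slope, "\n")
cat("95% CI:", ci[1], "to", ci[2], "\n")
cat("p-value for zero slope:", p_value, "\n")
```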
Now given that this is part of a curve which is showing an unequivocal rise over the last 50 years, let alone the entire record, please devise a test to demonstrate that these 18 years show any significant "levelling off" of the long-term trend. And then get back to me with the code. Hell no, get back to the scientific community, with your code... fame awaits you!
The real question you ought to ask however, is what relevance so short a period (15, 16, 17 or even 18 years) has to data which is not only extremely noisy, but is known to be subject to multi-decadal cycles? If someone asks you to look at climate data over a period of less than at least half a century... grab your wallet tightly!
Facts? No parlor tricks? Having examined the data for yourself, do you still believe that?
-
Re:Spreadsheets as a software development platform
ahh the old "when all you have is a hammer, every problem looks like a nail." issue huh? You need to ask yourself if something can be prototyped in Excel, that's fine. There are tools out there, open source if you have zero budget. If you have to prototype out a solution or create a solution that's robust, that can be documented and reviewed, it's more often easier to use something like "R".
While you can get easy answers from a spreadsheet you may not necessarily get the right answer and that's where their abuse becomes apparent. I've seen railroads actually use Excel to manage daily consist for trains and when they started using it was for an emergency, but then every day suddenly had an emergency/exception and then it became operationally necessary. That was until the guy who based his whole job/career around maintaining the spreadsheet finally retired, then it was WTF? for the rest of the operation because nobody could make sense out of it.
-
even better: R
I'm a bit surprised to find that, 60 comments in, nobody's yet suggested R, e.g. http://cran.r-project.org/ as an alternative. There are several different GUIs available for R (Rstudio, Rcommander, Rjava,...), it's 100% open source, and frankly most of us R users find the syntax and flexibility to be far better than MatLab. And the graphics have to be seen to be believed. You can do anything and then some.
-
Cowtan & Way 2013 trend is inside HadCRUT4 err
Cowtan and Way 2013 compensated for missing HadCRUT4 surface temperature measurements in places like the Arctic and Africa by using the spatial pattern of satellite data to produce a hybrid satellite/surface dataset. Jane and Lonny ponder the differences between Cowtan and Way's hybrid dataset and HadCRUT4:
I keep asking: what's wrong with my basic premise: that if your measurements are shown to be off by 100%, there's something wrong with your science? That was my point. [Jane Q. Public]
... They are saying that it is not the 0.05 degrees C per decade that the AR5 report gives for the last 15 years, but that it is, instead, 0.12 degrees C. Which is actually a difference of not 100% but 140%, for the most recent 15 years. [Jane Q. Public]
@ScienceChannel @jimmygle PLEASE tell the Anthropogenic Global Warmists! Yet another report surfaced saying their "science" was off by 140% [Lonny Eachus]
Jane and Lonny's basic premise wrongly ignores the large error bars on these noisy, short-term trends. The SkS trend calculator can calculate the trends and error bars from 1997 through (including) 2012 for both HadCrut4 and Cowtan and Way's hybrid dataset:
1997-2013 HadCRUT4 Trend: 0.049 ± 0.126 C/decade
1997-2013 HadCRUT4 hybrid Trend: 0.119 ± 0.150 C/decade
The hybrid dataset's central estimate is inside the error bars of the original HadCRUT4 estimate.
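The arithmetic behind that statement can be sketched in a few lines of R. This assumes the second number in each trend line is the plus-or-minus half-width of the error bar; the variable and function names are made up for the illustration.

```r
# Central estimates and error-bar half-widths as quoted above (C/decade).
# Assumption: the second number on each line is a +/- half-width.
hadcrut4 <- c(trend = 0.049, err = 0.126)
hybrid   <- c(trend = 0.119, err = 0.150)

# Turn "trend +/- err" into a lower and upper bound.
interval <- function(x) {
  c(lower = x[["trend"]] - x[["err"]], upper = x[["trend"]] + x[["err"]])
}

h4 <- interval(hadcrut4)  # about -0.077 to 0.175
hy <- interval(hybrid)    # about -0.031 to 0.269

# Is the hybrid central estimate inside the HadCRUT4 error bars?
inside <- hybrid[["trend"]] >= h4[["lower"]] && hybrid[["trend"]] <= h4[["upper"]]
cat("hybrid trend inside HadCRUT4 interval:", inside, "\n")
```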
... they haven't been right yet... They admit that they have no explanation why their models, which projected continued if not increased warming, do not explain why it has dropped by more than half (0.12 to 0.05 deg. C / decade) over the last 15 years. Or, for that matter, why their margin of error (-0.05 to +0.15 deg. C) for the last decade and a half is 4 times the size of their actual estimated warming. Nope... it's pretty damned clear. Something is wrong with their science. [Jane Q. Public]
I calculated error bars on UAH trends. The black line on the second page shows the UAH trend ending in 2012, for different starting years. The error bars are shown in red; they're 95% confidence uncertainty bounds. Note that error bars on longer trends are smaller than the large error bars on shorter trends.
Anyone can reproduce my results by downloading the free "R" programming language used by professional statisticians. Then save this code as "significance.r":
# run using R CMD BATCH significance.r
# outputs to Rplots.pdf and significance.r.Rout
# load custom functions
# for generalised least squares
library(nlme)
# options
xunits="year"
textsize=1.4
titlesize=1.8
colfit="red"
pch1=20#points
# read basin data
indata = read.table("greenland2013/GIS_climate.nasa.txt",header=T)
title="Greenland mass"
yunits="gigatons"
tlims=c(-350,-190)
alims=c(-60,0)
#indata = indata[which(indata$x>2002.0),]
# remove mean
indata$y = indata$y - mean(indata$y)
n = length(indata$x)
n
midpoint=(indata$x[n]-indata$x[1])/2.0+indata$x[1]
# fit model
fit=gls(y~x,data=indata,corr=corARMA(p=1,q=1))
#fit=gls(y~x+sin(2*pi*x)+cos(2*pi*x),data=indata,corr=corARMA(p=1,q=1))
#fit=gls
-
Re:What The Holy Fuck?
The submitter presumably thought that enough people on /. should be familiar with R, the most popular statistical programming language, or from the context (i.e. R is mentioned together with Perl), would infer that it's a language, and google, "r language", or something along those lines. These assumptions seem pretty reasonable. Here's a bit of help:
http://www.r-project.org/
https://en.wikipedia.org/wiki/R_(programming_language)
-
Re:More details?
Second this. There are numerous languages out there that are tailor-made for specific kinds of problems. You didn't quite share enough to narrow down what kinds of problems you need to solve, but the R project is geared toward number crunching, albeit with a significant bent toward statistics and graphic display.
If that's not pointed in the right direction, some other language might be. Alternatively, there are a lot of libraries out there for the more popular languages that could help with what you're doing. Heck, 12 years ago we didn't even have the boost libraries for C++. It's difficult for me to imagine using that language without them now.
-
Re:How modern!
I recently switched my scientific programming from R to Python with NumPy and Matplotlib, as I couldn't bear programming in such a misdesigned and underdocumented language any more. R is fine as a statistical analysis system, i.e. as a command line interface to the many ready-made packages available in CRAN, but for programming it's a perfect example of how not to design and implement a programming language. It's also unusably slow unless you vectorise your code or have a tiny amount of data. Unfortunately, vectorisation is not always possible (i.e. the algorithm may be inherently serial), and even when it is, it tends to yield utterly unreadable code. Then there is the dysfunctional memory management system which leads you to run out of memory long before you should, and documentation even of the core library that leaves you no choice but to program by coincidence.
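To make the vectorisation point concrete, here is a small sketch in base R. The input values and the logistic-map recurrence are illustrative assumptions, not anything from this comment: the first half shows the same computation written as a loop and as a single vectorised expression, the second half shows a recurrence where each step needs the previous result, which is the kind of code that is hard to vectorise.

```r
x <- c(1.5, 2.0, 3.5, 4.0)

# Element-by-element loop: one interpreted iteration per value.
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# Vectorised form: one call operating on the whole vector.
squares_vec <- x^2

cat("same result:", identical(squares_loop, squares_vec), "\n")

# A step-dependent recurrence (logistic map, hypothetical example):
# y[i] cannot be computed before y[i - 1] is known, so the loop stays.
r <- 3.7
y <- numeric(10)
y[1] <- 0.2
for (i in 2:length(y)) {
  y[i] <- r * y[i - 1] * (1 - y[i - 1])
}
print(y)
```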
As an example of a fundamental problem, here's an R add-on package that has as its goal to be "[..] a set of simple wrappers that make R's string functions more consistent, simpler and easier to use. It does this by ensuring that: function and argument names (and positions) are consistent, all functions deal with NA's and zero length character appropriately, and the output data structures from each function matches the input data structures of other functions.". Needless to say that there is absolutely no excuse for having such problems in the first place; if you can't write consistent interfaces, you have no business designing the core API of any programming language, period.
Python has its issues as well, but it's overall much nicer to work with. It has sane containers including dictionaries (R's lists are interface-wise equivalent to Python's dictionaries, but the complexity of the various operations is...mysterious.) and with NumPy all the array computation features I need. Furthermore it has at least a rudimentary OOP system (speaking of Python 2 here, I understand they've overhauled it in 3, but I haven't looked into that) and much better performance than R. On the other hand, for statistics you'd probably be much better off with R than with Python. I haven't looked at available libraries much, but I don't think the Python world is anywhere near R in that respect.
Anyway, for doing statistics I don't really think there's anything more extensive out there than R, proprietary or not, although some proprietary packages have easier to learn GUIs. In that field, R is not going to go anywhere in the foreseeable future. For programming, almost anything is better than R, and I agree that those improvements you mention are not doing much to improve R's competitiveness in that area.
-
Re:Great
Not that I disagree with your comment about Mathematica, but I do want to point out that there are institutions for which it makes sense to download R http://www.r-project.org/ for free, even if the machine itself is way more expensive than their combined salaries.
LibreOffice had 16% market share already in 2010 next to MS Office 56% http://store.steampowered.com/hwsurvey?platform=combined And now from http://www.theregister.co.uk/2013/02/08/libreoffice_40_ships/
"LibreOffice 4.0 ships with new features, better looks. Slowly closing the gap with Microsoft Office"
-
R; apt-get install r-base
If you're not afraid of programming (and it sounds like you're not): R. Gimme more details if you want to know what packages to use for graphing and stuff but installing R is incredibly easy. At the risk of tooting my own horn, you can read through this post, the corresponding story and the replies to it. There are a ton of packages for producing graphs. Are you going for accuracy? Beauty? Speed? What?
Lastly, please don't hate on the TI-84. I still have mine as well as a TI-89 and while they were both expensive, they are beautiful and trustworthy devices. Both have outlasted countless other computing machines that have passed through my usage.
-
I freakin' love Kaggle
I've been working on the Heritage Health Prize that Kaggle is running for over a year now. It's a fantastic way to learn data science and tackle real world problems with real data and a co-op-etitive spirit. The forums and winning solutions are great for learning the art, and if you've never used R, it's a great opportunity to learn it and talk to people that have a ton of experience in the area.
-
Re:Good
if Excel is not a "quality tool" what is?
-
Seriously?
When you have R, you hardly need any lousy calculators like this.
-
Re:first post?
So true. Also R is great for vector-based data as well, and does some stuff quite a bit better (and some stuff quite a bit worse) than scipy/matplotlib.
-
Re:R or WEKA ... Wait, What Exactly Are You Doing?
Along with R, consider using Rattle (the R Analytic Tool To Learn Easily)
-
Re:R or WEKA ... Wait, What Exactly Are You Doing?
He said he wants something that is easy to implement, and only reason he is going with open source is because then he doesn't have to ask for purchase approval. Which IMO is a really stupid reason and will hurt in the long run - it's insane to take worse software just because you don't want to ask your boss if it's okay to buy this one.
Horse shit. I've seen projects die because they couldn't get software through the approval process. Better to try 10 apps that are free and run in userspace (so no need to get IT involved for an Administrator install) than to wait for management approvals, budget cycles, and IT support, and never get the project done. If I'd done that on the job, I'd have been fired for taking too long to do my work.
I also resent the implication that "free" means "worse."
Sorry to burst your bubble, but if you want good support and easy implementation, you have to look for normal paid-for solutions. Besides, open source is not a synonym for free. This is especially true with specialized software or something you want good support for. Open source just means you get the code as well, so you can implement your own additions (without use of plugins) or change it.
I'm guessing you haven't used R. Not only is there a thorough user manual, but there are books from most major statistical and instructional groups on how to use R, AND the R-help mailing list answers every R question I've ever had about it, AND there are local R user groups where you can get support similar to how LUGs work.
But unless you get a product from a company that is spending money to develop it, you never get good software and good support. No one can make both because everything in this world costs money, and developers have to live too. Open source and free software model works well for the likes of Google and Firefox because the developments get paid by money made with advertising. Statistical analysis software, and other specialized software is a different matter.
Please shut up. If your assumption were true, R would not exist. R exists, so you're just an asshat.
My advice to the original poster: Use R if you have any familiarity with programming. Any higher level math/stat course OR experience with basic programming will let you get started in R. If you've been doing this all in Excel already, you're probably ready to hop into R. If you're still uncomfortable, I'm sure one of the people who value your academic library could help out.
-
R works with both PostgreSQL and MySQL
-
R - There is nothing that beats it on any platform
R
There is nothing that beats it on any platform. Some links:
http://www.sr.bham.ac.uk/~ajrs/R/r-gallery.html
http://addictedtor.free.fr/graphiques/index.php
http://opencpu.org/
https://r-forge.r-project.org/
http://hlplab.wordpress.com/
http://rseek.org/
http://www.r-bloggers.com/
-
R or WEKA ... Wait, What Exactly Are You Doing?
R is my personal favorite but you're going to have to get down and dirty with some high level programming (scripting). Check out the data import package (you would probably export your spreadsheets to flat txt files and import although the functionality is ever increasing). There's no user interface in this suggestion... what there is, however, is a massive collection of packages for statistical analysis. Very well maintained, constantly updated and ever expanding.
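As a rough idea of what that workflow looks like, here is a sketch using only base R. The tab-separated file is a made-up stand-in for an exported spreadsheet, and the column names and values are hypothetical; it imports the flat file, then runs one simple statistic (standard deviation) and one more involved analysis (principal component analysis).

```r
# Hypothetical stand-in for a spreadsheet exported as tab-separated text.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "height\tweight\tage",
  "170\t65\t30",
  "182\t80\t41",
  "165\t58\t25",
  "175\t72\t36",
  "190\t95\t52"
), path)

# read.delim() expects a header row and tab separators by default.
dat <- read.delim(path)

# Simple: standard deviation of each column.
sds <- sapply(dat, sd)
print(sds)

# More involved: principal component analysis on centred, scaled columns.
pca <- prcomp(dat, center = TRUE, scale. = TRUE)
print(summary(pca))
```

Real exports usually need more care (missing values, text columns, stray formatting), so this is a starting point rather than a recipe.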
The other suggestion has a better GUI but is really heavyweight. WEKA has helped me time and time again perform advanced statistical calculations on data sets and it's in Java so runs on just about anything. Their interface occasionally improves too, they now have an explorer that I use to prep data and remove outliers/null data (don't worry, this isn't climate data). It's well documented.
These (probably) require an intermediate data transformation step but are open source and extensively supported. Any examples of what you wanted to do? Simple stuff like standard deviation or complex stuff like principal component analysis (PCA)? I guess if it was just simple stuff, that'd be built into Excel, right? Maybe your problems are simple enough to just need a good macro writer to tackle? Whatever happens, good luck!
-
Add R to the list
Most new additions to the R project are highly academic works, many coming from bioinformatics research as well.
However, some of the modules which people find really useful are rewritten by the core team, so one could say that they were not an output of the PhD/Masters.
In the larger scheme of things, the solutions by academics remain solutions for academics only until they are widely adopted. Then they permeate textbooks and become the standard solutions to a useful problem. For this, there will exist software (probably a rewrite) which has been optimized to within an inch of its life.
So the ideas behind the software live on, while the actual lines of code might not.
-
Re:Too Old to learn a programming language at 40?
Get your dad on R for statistical analysis. Even if you love to program (and I do), doing it in C can be a grind. R, like Perl and Ruby, has a HUGE library which is dead simple to use (just about as easy to use as RubyGems), and very high quality. Plots are easy to do and look beautiful (especially if you use Hadley Wickham's ggplot2 library). We use it in our department because when it comes time to do the analysis, we want to be focusing on the math, not whether we have some null pointer dereference hiding somewhere. If you taught yourself Scala and Erlang, then R will be a piece of cake.
-
Re:Let's hope that 15%...
why not have the researchers break windows for a living?
there is good natural language research. this, however, could be done (given the data) by one person in a few hours with prepack software: http://cran.r-project.org/web/packages/textcat/index.html
-
Have a look at R
Have a look at R, http://www.r-project.org/, which is math related.
Their developers page is at http://developer.r-project.org/
The R Project has again participated in the Google Summer of Code during 2010 http://www.r-project.org/soc10/index.html which had several projects in C++
I think R could be nice as it combines a high level approach with the lower level of C++.
They also have a forum at http://groups.google.com/group/gsoc-r/topics
-
Re:Stress?
I read the first letter of your post but got distracted after I googled it and discovered the molar gas constant (8.314472 m^2 kg s^-2 K^-1 mol^-1) and the r-project for statistical computing – wait, what was I saying again? Ooh, what’s this “Submit” button do...
-
Re:Probability in computers: it's called a float
Agree on the GPU part. If you haven't seen them, check out the gputools and cudaBayesreg R packages. They don't look too easy to use now, but eventually this will become mainstream.
-
Re:SPSS since 1968!!!
Most of the statisticians I know use R.
-
Re:Looking for a good book on statistics
Like Daniel Dvorkin has said Devore's book Probability and Statistics for Engineering and the Sciences is an excellent starting point.
Definitely learn to use R; since it's free, you don't have to worry about paying licensing fees. It is also widely used (no matter what you hear from SAS, Minitab, SPSS, etc).
Books I would recommend that I think fit his other suggestions are Bowerman/O'Connell Linear Statistical Models: An Applied Approach and Wackerly et al Mathematical Statistics with Applications
Devore talks about Bayes Rule as does Wackerly and Wackerly's last chapter talks about some Bayesian techniques, but these are merely primers for what is typical in a Bayesian course. So I recommend these two books as analogous with Devore's: Bolstad Introduction to Bayesian Statistics and to Wackerly's: Hoff A First Course in Bayesian Statistical Methods
Some things you need from mathematics are the ability to integrate, work with matrices and matrix operations, and algebraic manipulation. Familiarity with transformations and operators, especially linear ones, is useful since many procedures in statistics are linear operators. The highest levels of statistics will get even more math intense, using mathematical results from areas like ODE/PDE, Galois theory, or general measure theory.
The wikipedia's statistics articles are pretty good overall, but as Dvorkin noted some are more technical than what would be friendly to those that are new to statistics. When you feel that's the case try using the sources linked as citations in the article or google confusing parts and it is generally possible to find an explanation for almost any background level.
However, if you can get through these texts, your background would be pretty strong.
-
Re:Looking for a good book on statistics
Devore's Probability and Statistics for Engineering and the Sciences is probably the best one-volume, undergrad-level intro to statistics out there. Get a copy (I think it's on the sixth or seventh edition now; you can pick up a fifth edition for cheap) and work your way through that, and you'll have a pretty good idea of where all those formulae come from and how they're used. Get a copy of R and check out the "Devore*" packages in the package list too. If you want to learn more after that, I recommend Kutner et al.'s Applied Linear Statistical Models for applications, and Casella and Berger's Statistical Inference for theory.
The Wikipedia stats pages are pretty good for most things, but many of them are written with the assumption of a lot of background knowledge. If you open up a page on a particular stats subject and you comprehend it, great; if not, be prepared to do a lot of digging outside of Wikipedia, because trying to figure out the subject from the links to other WP pages is an exercise in circularity.
-
One stop shopping: R
R.
-
Insufficient information
By "interesting things", we'll assume you mean interesting to him.
Given his age, that probably means something webish, so JavaScript is the obvious choice for the kind of instant gratification a 12-year-old will need.
If he's into games, then the language of choice is probably whatever will let him mod his favourite.
If he likes to play with numbers, it's VBA and Excel--or R.
Is he into computer graphics (not digital painting)? Then you want to introduce him to Processing.
Lots of choices
-
Re:RealClimate has a big reply on this
McIntyre's paper (23 pages)
It comes down to the thought that Mann in 1998 got his math wrong when performing principal component analysis over the data, particularly while using bristlecone pine cores. (Which I have to say is an amazing organism. They live at least 5,000 years. Wow. Just wow.) The 2008 Mann paper does two different analyses, one with tree cores and one without. The hockey stick remains in both.
Mann should have given McIntyre the data, which he started to do, but then stopped for some reason. Why, I don't know. I suspect there was some personality clash, but that's just speculation on my part. Just release the data. Who cares? If someone wants to examine the samples directly, then let them if they have real credentials (i.e. a PhD in climatology or some other related field). It's just impractical to give access to every Joe down at the bar; samples can be damaged. It's a scientific resource worth millions, not an exhibit at a hands-on museum.
The conclusion of all the investigations was that Mann didn't do PCA right, as McIntyre said, but McIntyre didn't do it right either.
Perhaps the most interesting for you would be this link, which contains links to Mann's data and the statistics source code to analyze the data correctly. You just have to download R.
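For anyone who hasn't met principal component analysis before, here is a minimal generic sketch of the usual steps (center each column, form the covariance matrix, take its leading eigenvector). It is plain Python rather than R, the numbers are invented, and the function names are hypothetical. It is not Mann's or McIntyre's code or data, and it says nothing about which step either of them got wrong.

```python
import math


def center(rows):
    """Subtract each column's mean, computed over all rows."""
    n = len(rows)
    p = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n = len(centered)
    p = len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def leading_component(cov, iters=200):
    """Leading eigenvector of cov by power iteration, scaled to unit length."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    return v


# Made-up toy data: two columns that rise together, so the first
# principal component should point roughly along the diagonal.
data = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.8], [5.0, 5.1]]
pc1 = leading_component(covariance(center(data)))
print("first principal component:", pc1)
```

Power iteration is used here only to keep the sketch free of dependencies; in R, `prcomp` does the centering and decomposition in one call.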
-
Derivative works and interpreted languages
One thing that's often confused me is the exact relationship between the GPL and interpreted languages. For example, if I write a perl script which calls perl functionality which is part of the base interpreter, my script need not be distributed under the terms of the GPL. This is akin to using a GPL word processor or other software, where the output of a program is not subject to the GPL.
If, on the other hand, my script calls a perl function which is itself written in perl (licensed under the GPL), the FSF argues that this constitutes a derivative work akin to dynamic linking. Thus, my script (if distributed) must be distributed under a GPL-compatible license.
I can see it both ways. On the one hand, calling a function written in the same interpreted language is very much like calling a function in a library from a compiled binary. On the other, it's strange to think that there's a distinction based on whether the function being called is written as part of the interpreter (in, for example, C) versus the interpreted language itself. In addition, there seems to be disagreement about whether the GPL really binds the way the FSF claims. Lots of interpreted code gets released under the GPL when it seems likely that the LGPL is what the authors really intend; that is, they do not want to restrict scripts and functions which call the code.
A good example of this is R. This statistical language has a fairly small interpreter and a large set of both included and downloadable packages, themselves written in R (and licensed under the GPL). Clearly most of the primary authors do not intend for all R scripts using the most basic of functionality to be released in a GPL-compatible way; for one, they license the header files necessary for writing C-based libraries for use in R under the LGPL, to explicitly allow such libraries to be non-free. In addition, they are fine with a large number of downloadable packages which restrict commercial use (obviously not allowed under the GPL). Their interpretation of the GPL seems at odds with the FSF's. Even if you want to release all your code in a GPL-compatible way, it may be (IANAL) that you cannot call both code restricted from commercial use and GPL-licensed code (basically unavoidable) in the same project.
-
Re:Eyecandy in cost of usability
R can do a lot more than Excel can.
-
Re:University of brussels does
In the courses no software that doesn't run under Linux is being used by the CS department, but for courses like statistics with SPSS we're pretty much pooped. Luckily we had to make a task about Machine Learning instead of messing with SPSS, but that doesn't count for people not studying CS.
If you want something like SPSS but open/free, you might try R. It is command-line, but has some GUI front ends available, and it might very well meet your needs.