Google Snaps Up Stats Tool from Swedish Charity

Posted by Zonk on Sunday March 18, 2007 @07:53AM from the at-least-he's-rich-in-spirit dept.

paulraps writes "A stats program that began as a teaching aid for a university lecture has just been bought by Google for an undisclosed sum. The statistics tool, Trendalyzer, was developed by a professor and his son at Stockholm's Karolinska Institute. Unfortunately for the developers, the project has been run under the auspices of a charity, Gapminder, and financed over the last seven years by public money. Maybe that seemed smart at the time, but the professor, admitting that he won't see a dime of Google's cash, now seems regretful. As for what Google has purchased: 'Public organizations around the world invest 20 billion dollars a year producing different kinds of statistics. Until now, nobody has thought of collecting all the information in the same place. That should be possible with Trendalyzer, which will be able to present that quantity of data in a clear way as well as giving the user the ability to compare many different kinds of information.'"

Ulterior Motives by Seumas · 2007-03-18 08:02 · Score: 5, Insightful

'Public organizations around the world invest 20 billion dollars a year producing different kinds of statistics. Until now, nobody has thought of collecting all the information in the same place. That should be possible with Trendalyzer, which will be able to present that quantity of data in a clear way as well as giving the user the ability to compare many different kinds of information.'

Of course people have thought of collecting all of that information in one single place. Just because none of the services have achieved such massive market share that they essentially did collect all of the stats around the world doesn't mean that wasn't and isn't their goal.

Google, I dig you for now, but I'm not really sure that I care for the idea of having google own nearly all of the search data for every search done by every individual around the planet in the history of google and beyond combined with all of the world-wide traffic analysis data.

And as someone who would be targeted for this service -- why would I bother? There are plenty of free open source utilities out there that provide every ounce of data you could ever want and they're incredibly easy to configure and deploy.

No, the benefit here seems to be less for the end-users deploying the service and more for whoever google then turns around and sells the massive amounts of correlated information to. For instance, let's see every bit of data about a specific user so we can see everything from each search he does to his entire browsing trail. Bet we could sell that for a lot of money!

Hopefully you will still have a simple way as a user to prevent google from collecting this information just like you can do with their stupid Urchin service (by blocking it). And, sadly, people will still continue to use this new service because they'll sell out their mother's medical history and offer up a sample of their own blood and cholesterol ratings if it means getting something "for free".
1. Re:Ulterior Motives by Short+Circuit · 2007-03-18 09:00 · Score: 5, Informative
  
  "No, the benefit here seems to be less for the end-users deploying the service and more for whoever google then turns around and sells the massive amounts of correlated information to. For instance, let's see every bit of data about a specific user so we can see everything from each search he does to his entire browsing trail. Bet we could sell that for a lot of money!"
  
  You've got it backwards...statistics aren't useful when you zoom in to focus on individuals, they're useful when you zoom out to focus on groups. Marketing is rarely about selling to an individual; it is about selling to the masses. Individuals have too many quirks and preferences to make per-individual marketing efforts worthwhile. Why spend all that effort to guarantee a sale to one individual, when you can spend the same amount to sell to two or three percent of a group of a few thousand?
  
  "Targeted" advertisements are still group-based efforts. Your individual browsing history is only valuable up to the point where you can be lumped into a marketing stereotype.
  
  About ten years ago, I went online searching for prices on printer ribbons for an IBM Proprinter II. The email address I supplied one website is still receiving spam from that one encounter, not for Proprinter ribbons, not for dot matrix supplies, but for inkjets and toner cartridges. I got lumped into a "shops for printer supplies online" marketing group; nobody's ever sent me an offer for supplies for my Proprinter II. (Though, once he found out I had a use for it, a guy handed me a box of 8.5"x11" tractor feed paper yesterday.)
  
  --
  tasks(723) drafts(105) languages(484) examples(29106)
What does it do? by dour+power · 2007-03-18 08:05 · Score: 5, Informative

Neither the article nor the summary explains what Trendalyzer actually does. The animated mapping of stats at http://tools.google.com/gapminder is a little more illustrative.
1. Re:What does it do? by ghoti · 2007-03-18 08:16 · Score: 5, Informative
  
  If you want to know what this is about, watch Hans Rosling's ("the professor") excellent talk. This is about bringing lots of data that were collected with public money online so they can actually be used. Rosling uses simple but effective visualization tools (and is a great speaker) to get people interested in the data.
  
  --
  EagerEyes.org: Visualization and Visual Communication
2. Re:What does it do? by ghoti · 2007-03-18 08:25 · Score: 4, Interesting
  
  This has nothing at all to do with the CIA World Factbook. This is not just about collecting data (which it does, of course, and more data over longer time than the Factbook), but about understanding the world through that data. A collection of data is worthless if it isn't used to figure out how to help people in Africa, for example. Rosling shows very clearly that Africa isn't just starving children, and that development aid therefore must be adapted to the exact population it is for. He also has a lot of interesting things to say about the developments in Asia, how health care and economy are connected, etc.
  
  Don't dismiss this without knowing anything about it.
  
  --
  EagerEyes.org: Visualization and Visual Communication
If this was developed with public money... by ciggieposeur · 2007-03-18 08:15 · Score: 4, Insightful

...why isn't it already in the public domain?
1. Re:If this was developed with public money... by ciggieposeur · 2007-03-18 09:28 · Score: 4, Interesting
  
  "could you give us any insight into why you think it would be in the public domain?"
  
  The law regarding software and publicly-funded inventions has not always been as it is now. It used to be the case that most significant publicly-funded software HAD to be in the public domain, which AFAIK is why we have the BSD license today. Also witness early versions of Gaussian (quantum chemistry).
  
  These days lots of 100% publicly-funded software is not automatically released to the public domain but instead held ransom by the author or university with a separate license permitting unlimited government use. This directly affects me: essentially ALL of the current quantum chemistry code that produces publishable results is no longer free for everyone to use. Though most programs come with source (they have to for some of the systems we need to run it on), their license restrictions are very onerous for developers: only the PI can register to download it, or it costs 5000 euros per seat, or it cannot be ported to other platforms, etc. One program even revokes licenses from academics who use competing software in the same domain! And this is almost ALL software written by tenured professors and their graduate students funded from government grants.
  
  I think we all did much better with the old formula. University-developed code should be available for everyone to use, even if that means someone can later come along and compete with a closed-source version.
  
  I'm curious if the Swedish system more closely resembles the current USA system or the old USA system.
People Do Things For Different Reasons by hduff · 2007-03-18 08:30 · Score: 5, Insightful

Lots of people have great ideas that never reach fruition for reasons that have nothing to do with them. And sometimes, those ideas can take off and be promoted for reasons that have nothing to do with them. Often these things offend our sense of fairness.

Yet life is not fair and often people have regrets and indulge in "what if" fantasies.

For something like this, even if the fellow gets no money, he can get publicity and recognition and might be able to leverage that into something to get him more money if that's what he wants.

The past is past and the price for obtaining "justice" and "fairness" can be quite high and more than one should have to pay; you can lose your future doing it.

Learn from the past, develop a plan to move forward, and leverage the lessons learned; the best revenge is always living well.

--
"I believe in Karma. That means I can do bad things to people all day long and I assume they deserve it." : Dogbert
Significance levels and missing data by Baldrson · 2007-03-18 08:50 · Score: 4, Interesting
I wrote a primitive version of such a site several years ago which I called Laboratory of the States since the goal was to gather lots of demographic variables by State and present ecological correlations.
Shortly thereafter, a site called Nation Master cropped up, with a bit flashier and simpler user interface, but focused on CIA World Fact Book data, rather than the States of the US. (The same folks later did State Master using similar UI technology.)
Finally, Google tested Gapminder with an even spiffier and simpler UI -- again focusing on by Nation correlations.
Aside from the usual complaints about "The Ecological Fallacy" (a fallacy that cuts both ways BTW) there are two big pitfalls for this stuff:
1. Dealing with missing data.
2. Estimating statistical significance.
What I did about missing data was simply eliminate any data points where data was missing from one or both of the variables being correlated. This reduces the sample size, hence statistical significance, but it bypasses arguments over what sort of missing data should be used. The Netflix Prize is coming up with really good algorithms to compute missing data efficiently and accurately so maybe there is hope for something more effective here.
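A minimal Python sketch of that pairwise-deletion approach. The variable names and numbers are made up for illustration and are not taken from Laboratory of the States; missing values are assumed to be represented as None:

```python
import math

def pairwise_complete(xs, ys):
    """Keep only the positions where both variables have a value."""
    return [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]

def pearson_r(pairs):
    """Pearson correlation coefficient over a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-state variables; None marks a missing observation.
income = [1.0, 2.0, None, 4.0, 5.0]
lifespan = [2.0, None, 6.0, 8.0, 10.0]

pairs = pairwise_complete(income, lifespan)  # two of the five rows are dropped
r = pearson_r(pairs)
print(len(pairs), r)
```

The cost is visible in len(pairs): every dropped row shrinks the sample the correlation is computed on.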
Statistical significance is more difficult to deal with. Usually one must look at tables for statistical significance of correlations under the assumption that the variables each follow a normal distribution. Unfortunately, many variables follow polynomial (like squared) or exponential distributions, so you have to do things like take the sqrt or log of one or both of the variables to try to normalize them. However, when you are looking for correlations, sometimes it is the relationship that is polynomial or exponential -- in which case you can apply sqrt or log to get the maximum correlation coefficient at the sacrifice of normality of one or both of the variables. Unfortunately, there is no simple arithmetic formula for calculating the significance level of a correlation given a non-normal distribution -- you can't just plug in the skewness, kurtosis, etc. as well as sample size and correlation coefficient, and get out a valid statistical significance. Therefore it is hard to make good statements about many very important correlations without watering them down to meaninglessness.
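To illustrate the transform step, here is a rough Python sketch with synthetic data (an exponential relationship with a little made-up multiplicative noise). It compares the correlation before and after taking the log, then computes the textbook t statistic, t = r * sqrt((n - 2) / (1 - r^2)), which is the quantity those tables are built on and which is only valid under the normality assumption:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """Normal-theory t statistic for a correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Synthetic data: y grows exponentially in x, with illustrative noise factors.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
noise = [1.1, 0.9, 1.05, 0.95, 1.1, 0.9, 1.05, 0.95]
ys = [math.exp(0.5 * x) * f for x, f in zip(xs, noise)]

r_raw = pearson_r(xs, ys)        # correlation on the raw, curved relationship
log_ys = [math.log(y) for y in ys]
r_log = pearson_r(xs, log_ys)    # correlation after linearizing with log

t_log = t_statistic(r_log, len(xs))
print(r_raw, r_log, t_log)
```

The log transform raises the correlation coefficient here, but the t statistic still has to be read with the caveat above: it says nothing reliable if the transformed variables are far from normal.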
Also, a complaint about the "simple" user interfaces:
Some of the worst reporting from news media comes when they refuse to report statistics in terms remotely related to anything meaningful -- for example you will frequently hear statements to the effect that "California has the most orange trees in the nation." or some such. Such statistics are nonsense for the purposes of correlation studies since the size of the ecology (California state) is all you are really measuring with such statements. You have to divide by the population or divide by the total GDP or something to rationalize the ecology against other ecologies.
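A small Python sketch of that point. The regions and figures below are invented purely for illustration; the raw count and the per-capita rate can pick different "winners":

```python
# Hypothetical regions with invented counts; not real statistics.
regions = {
    "Bigland":   {"orange_trees": 900_000, "population": 30_000_000},
    "Midland":   {"orange_trees": 400_000, "population": 5_000_000},
    "Smallland": {"orange_trees": 100_000, "population": 1_000_000},
}

def per_capita(stats, variable):
    """Divide a raw count by population to make regions comparable."""
    return stats[variable] / stats["population"]

# Ranking by the raw count mostly measures how big the region is.
top_raw = max(regions, key=lambda name: regions[name]["orange_trees"])

# Ranking by the normalized rate compares like with like.
top_rate = max(regions, key=lambda name: per_capita(regions[name], "orange_trees"))

print(top_raw, top_rate)
```

Dividing by total GDP or land area instead of population would follow the same pattern; which denominator is appropriate depends on the question being asked.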
In Laboratory of the States, I did this with all my variables but I also left the raw variables around and allowed people to do arithmetic on them -- like dividing them -- to get their own rational comparisons if for some reason my choices were not adequate. This problem isn't as bad with Gapminder as it is with Nation Master and State Master -- but Gapm
--
Seastead this.