Laying the Groundwork For Data-Driven Science
aarondubrow writes The ability to collect and analyze massive amounts of data is transforming science, industry and everyday life. But what we've seen so far is likely just the tip of the iceberg. As part of an effort to improve the nation's capacity in data science, NSF today announced $31 million in new funding to support 17 innovative projects under the Data Infrastructure Building Blocks (DIBBs) program, including data infrastructure for education, ecology and geophysics. "Each project tests a critical component in a future data ecosystem in conjunction with a research community of users," said Irene Qualters, division director for Advanced Cyberinfrastructure at NSF. "This assures that solutions will be applied and use-inspired."
I just wish the PI could get funding to create a big enough sample to get conclusive evidence, instead of torturing the statistics to keep dragging out what is effectively the same study.
To lay a groundwork for "data driven science" shows the bankruptcy of modern education and culture.
I want to delete my account but Slashdot doesn't allow it.
They just want an excuse to keep building supercomputers with government money.
This sounds suspiciously like something written by someone with an online MBA: "Each project tests a critical component in a future data ecosystem in conjunction with a research community of users," said Irene Qualters, division director for Advanced Cyberinfrastructure at NSF. "This assures that solutions will be applied and use-inspired."
If we want the public to continue to support federal funding of the sciences we have to do better than this. I understand the point, but it is needlessly laden with buzz-phrases and it is clumsy.
Alan Alda and others have done work with scientists when it comes to communicating to non-scientists, and I'm grateful for it.
from TFA:
"In fiscal year (FY) 2014, its budget is $7.2 billion. NSF funds reach all 50 states through grants to nearly 2,000 colleges, universities and other institutions. Each year, NSF receives about 50,000 competitive requests for funding, and makes about 11,500 new funding awards. NSF also awards about $593 million in professional and service contracts yearly."
and: "awards support research in 22 states"
This particular investment is a tiny fraction of the budget. A low priority.
Note that each congressperson attempts to get government funding for his/her state as part of the obligatory re-election process. Often this funding is for nonsense activity that may provide jobs or incentive for corporate supporters.
Not saying that any of this is pork, but I'd like to know.
...omphaloskepsis often...
So where does all this "data" to be anal-ized come from? I hope it's nothing like empiricism - that wouldn't be real science, like the drawer studies mentioned above.
If the NSF grant process is like the one for NASA, there's still a little bit of flexibility for the program manager after they've gotten the scores.
I know because I was on a panel that specifically gave two proposals 'poor' reviews (the lowest possible), and the program manager asked us to consider changing it. In this case, he's a rather nice guy, and it may just be that he didn't want to have to write the 'your proposal sucks' letter to them ... but those of us on the panel knew that there is _no_ way for them to fund a 'poor'. They have leeway with any other score, and could give something with a marginal rating some seed money (fund 'em for a year, so they might be able to put in a more competitive bid next round).
We told the program manager that no, we wanted to make sure that there was no possible way that those two proposals could get funded.
Build it, and they will come^Hplain.
Thanks.
... is that data isn't evidence. And the simple fact that most people don't understand that simply underscores the danger of it.
Now, science must be empirical. It must be based on observation, experimentation, and the results should drive theory.
However, something that has been worrying for years is a lazy tendency for people... scientists included... to grab a data set, point out some correlating variables, and then conclude a discovery... or propose a theory that is supposed to be taken seriously.
That is wrong. And we all know that is wrong. I'm fine with it if we don't take the study seriously or if they don't just cite correlative statistics. But they do that with depressing consistency. Correlative statistics are not evidence. They are data. But basically anything is data. Having data isn't an accomplishment. It is having some readings or information that could mean anything, including nothing at all.
Serious efforts have to be taken to ensure the data is free of distorting influences. And then you have to set up devil's advocate tests/experiments to make sure that there is some causation going on in the data. Often as not, this isn't happening, especially with the "data driven" science, which so far as I have seen is code for people who sit at their computers calling up spreadsheets and then concluding things from them. That isn't good enough.
Here someone is going to say I know nothing and that people who call up spreadsheets are doing entirely valid science. Which ignores all sorts of points, such as them not knowing exactly where the data came from or how it was collected. Oh sure, they might have some notation that says how that was done... but who really knows. Scientists need to be willing to get their hands dirty and get the data themselves. The armchair stuff is good to a point when the data is known to be good or when someone else went to the effort to sort out all the problems with it. But that isn't terribly common. Most of the sets have issues even after being declared good.
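To make the "devil's advocate test" concrete, here is a minimal sketch, not anyone's actual method. It assumes a made-up setup where a hidden variable z drives both x and y, and x has no effect on y at all. The names pearson, residuals and simulate are hypothetical helpers, and the data is simulated, not real. The raw correlation looks impressive; once z is regressed out, it should collapse toward zero.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, zs):
    """What is left of ys after a least-squares straight-line fit on zs."""
    n = len(ys)
    my = sum(ys) / n
    mz = sum(zs) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [(y - my) - slope * (z - mz) for y, z in zip(ys, zs)]


def simulate(n=5000, seed=0):
    """Simulated data (an assumption): z causes x and z causes y, nothing else."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, 0.5) for zi in z]
    y = [zi + rng.gauss(0, 0.5) for zi in z]
    r_raw = pearson(x, y)
    # The devil's advocate check: control for the suspected common cause.
    r_controlled = pearson(residuals(x, z), residuals(y, z))
    return r_raw, r_controlled


if __name__ == "__main__":
    r_raw, r_controlled = simulate()
    print(f"raw correlation of x and y: {r_raw:.2f}")
    print(f"after controlling for z:    {r_controlled:.2f}")
```

Of course this only works when you already suspect z and have measured it, which is rather the point about knowing how the data was collected.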
Anyway... I hope this all works out for the best and that my fears in this matter are unfounded. Truly. I just worry that this is going to be more of a giant waste of time.
I've decided to stop wasting my time responding to AC trolls/sockpuppets... so if you want a response from me... log in.
Perhaps they should start with an annual walrus census (see story following this one).
This sounds suspiciously like something written by someone with an online MBA: "Each project tests a critical component in a future data ecosystem in conjunction with a research community of users," said Irene Qualters, division director for Advanced Cyberinfrastructure at NSF. "This assures that solutions will be applied and use-inspired."
If we want the public to continue to support federal funding of the sciences we have to do better than this. I understand the point, but it is needlessly laden with buzz-phrases and it is clumsy.
I understand your point about the technobabble. However, Ms. Qualters' résumé appears to be somewhat less fluffy than the quote would suggest.
If it weren't for deadlines, nothing would be late.
This is the AC.
I posted as AC from my phone knowing that I would make three or more mistakes from my location on the fly. Credit to you for not taking the cheap shot at the missing characters in my subject or the autocorrect on my mid-sentence gaffe.
My criticism wasn't meant to suggest that NSF staff are illiterate, although many technical folks aren't particularly good at communicating their ideas. Nor that they are unintelligent.
The problem is that intelligent scientists need to learn how to express the common-sense version of the importance of their work *in layman's terms*, as if to a smart brother or sister, or we will keep losing support to denialists, anti-big research types, etc. The kind of people who didn't want us to go to the moon or send rovers to Mars, don't see the point of vaccines, don't want to invest in clean global water, etc...
"correlation does not imply causation" is the typical way to phrase the problem. Talking about it doesn't prevent people from trying, due to the financial gain involved.
All science is data driven. Without data there is no hypothesis, and without hypothesis there is nothing to test (falsify). This is just another hype, like nanotechnology or now nanobiotechnology etc. Nearly all molecules are nanoscale: their size is measured in nanometers, and in the same way all science is data driven.
There is nothing wrong with good old "science driven science" where people think, do experiments, and think again.
Science may be data-driven, but historically scientists have not been trained to be good data custodians. They know reasonably well how to use data, but they don't know how to store it, label it, transfer it, etc. Go pick an article from 5 years ago which is data-heavy and try to get the original dataset from the authors: 95 times out of a hundred you'll spend a month emailing people and you'll end up with nothing. Four more out of the 100 you'll get an Excel spreadsheet without labels on the columns. Scientists desperately need to become better at managing data.
Personally, I think that this program is targeting a small subset of the people who need help, and as such it won't be very effective. These look like infrastructure projects, but infrastructure only drives trends in extremely rare cases. Here's a quote from one funded proposal:
This project develops web-based building blocks and cyberinfrastructure to enable easy sharing and streaming of transient data and preliminary results from computing resources to a variety of platforms, from mobile devices to workstations, making it possible to quickly and conveniently view and assess results and provide an essential missing component in High Performance Computing and cloud computing infrastructure.
Will that project help teach scientists they shouldn't email files to themselves as a method of long-term archival? Yes, that really is extremely common. We should be focusing on building data tools which are extremely simple, extremely broad in scope, and encourage or force adoption of those tools.
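For what it's worth, "extremely simple" could look something like the sketch below. It is only an illustration under my own assumptions, not anything from the funded proposals: save_dataset, verify_dataset and the ".meta.json" sidecar layout are hypothetical names and conventions. It writes a CSV with a labeled header row, plus a small JSON file recording what each column means and a SHA-256 checksum so a later reader can tell whether the file has changed.

```python
import csv
import hashlib
import json
from pathlib import Path


def save_dataset(csv_path, columns, rows, description):
    """Write rows to a CSV with a header, plus a JSON sidecar describing it.

    columns is a list of dicts with "name", "unit" and "description" keys
    (a hypothetical convention chosen for this sketch).
    """
    csv_path = Path(csv_path)
    with csv_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([col["name"] for col in columns])  # labeled columns
        writer.writerows(rows)

    meta = {
        "description": description,
        "columns": columns,
        "row_count": len(rows),
        "sha256": hashlib.sha256(csv_path.read_bytes()).hexdigest(),
    }
    meta_path = csv_path.with_suffix(".meta.json")
    meta_path.write_text(json.dumps(meta, indent=2))
    return meta_path


def verify_dataset(csv_path):
    """Return True if the CSV still matches the checksum in its sidecar."""
    csv_path = Path(csv_path)
    meta = json.loads(csv_path.with_suffix(".meta.json").read_text())
    return hashlib.sha256(csv_path.read_bytes()).hexdigest() == meta["sha256"]
```

Calling save_dataset("counts.csv", columns, rows, "field counts") would leave counts.csv and counts.meta.json side by side. None of this fixes the habit of emailing files to yourself, but it would at least mean the spreadsheet that turns up five years later has labels on its columns.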
I can save the NSF a bunch of money with this initiative. There's a data center in Utah that's not being used (for anything legal) with a huge amount of data storage capacity. The NSF should have it.