Brown Dog: a Search Engine For the Other 99 Percent (of Data)
aarondubrow writes: We've all experienced the frustration of trying to access information on websites, only to find that the data is trapped in outdated, difficult-to-read file formats and that metadata (the critical data about the data, such as when, how, and by whom it was produced) is nonexistent. Led by Kenton McHenry, a team at the National Center for Supercomputing Applications (NCSA) is working to change that. The team, which received a $10 million, five-year award from the National Science Foundation in 2013, is developing software that lets researchers manage and make sense of vast amounts of digital scientific data currently locked in such formats. The NCSA team recently demonstrated two publicly available services that make the contents of uncurated data collections accessible.
The problem is that 99%* of data is actually trapped behind paywalls...
Which is more of a problem than the format. If the data was available without the paywall, then the format probably wouldn't matter as much.
GrpA
*99% is a made-up statistic - just like the original article. I assume it means "lots..."
Isn't gathering, indexing, and trying to make heads or tails of data what Splunk is designed for? It is a commercial utility, and not cheap by any means... but at least it is one software package meant to sift through data and generate reports, graphs, etc. from it.
Disclaimer: I'm not associated with them, but I have ended up using their products at multiple installations with very good results (mainly keeping customers happy with a morning PDF report that all is well, with the charts to prove it).