Ask Carl Malamud About Shedding Light On Government Data
If you've ever tried to look up public records online, you may have run into byzantine sign-up procedures, proprietary formats, charges just to view what are ostensibly public documents, and generally the sense that you're in a snooty library with closed stacks. Carl Malamud of Public.Resource.Org has for years been forging a path through the grey goo of U.S. government data, helping to publicize the need for accessible digital archives, not just awkward, fee-per-page access. (Mother Jones calls him a "badass.") Malamud has (with help) been making it easier to get to the huge swathes of data in government sources like PACER, EDGAR, and the U.S. Patent Office. He's got a new initiative now to establish a "Federal Scanning Commission," whose task would be to assess the scope and outcomes of a large-scale effort to digitize and make available online as much of the U.S. government's vast holdings as is practical. ("If we were able to put a man on the moon, why can't we launch the Library of Congress into cyberspace?") Ask Malamud your questions below about his plans and the challenges of disseminating public information. (But please, post unrelated questions separately, lest ye be modded down.)
Didn't Obama already mandate that all government agencies must digitize their records and develop plans within 4 months? http://www.simplysecurity.com/2011/12/28/obama-administration-pushes-for-digital-records-management-overhaul/
The US actually does a good job of sharing data on regulations and rulemaking on regulations.gov. You can search pretty much any of the regulatory dockets from most departments, and even access public comments and supporting material. You can even follow regulatory policy updates and eRulemaking Program activities in your Twitter stream. Wouldn't this be a good model to follow to systematically publish everything online? I'm thinking publishing everything online on a government website would make for a great summer job for students, and help boost the economy and employment stats, no?
Which government agency is the worst to get information from?
The amount of information you're trying to free is entirely staggering and consists, largely, of tables of numbers. These numbers are incredibly significant, but people generally can't see them.
After you free all of this information and make it available to the public (as it should be), then what? What do you expect the public to do with these numbers? Tables of information are not nearly as useful as graphs. This data needs to be seen, but, more importantly, it needs to be understood.
Do you have any ideas for how to disseminate this information? Perhaps a team-up with someone like gapminder.org's Hans Rosling might be particularly valuable for all of us.
Three closely related questions about the rare books collections at the Library of Congress:
1. I know there is some kind of effort going on to digitize the rare books collections, but can it be sped up? There are many high-quality low-cost archival book scanners out there (such as the ones developed at diybookscanner.org).
2. It gets really annoying to receive paper copies of books whenever copies are requested. Why not DVDs of high-quality images?
3. Why is there no outreach by the LoC to smaller, cheaper book scanning efforts? The Internet Archive, DIYBookscanner.org, and Decapod all come to mind.