Slashdot Mirror


Princeton Researchers Say Feds Need Data Standard

dcblogs writes "The federal government's data-sharing efforts are a mess, and if Barack Obama really wants a useful 'Google for government,' he would have to set the government's vast amount of data free by exposing it and ensuring it complies with standards. Once that happens, commercial sites, aggregators, bloggers and everyone else will be able to access it, use it and transform it, argue a group of Princeton researchers (follow Download link for full PDF)."

9 of 49 comments

  1. Looks like a job for Microsoft! by erroneus · · Score: 3, Insightful

    What you need is not one [set of] standard(s) but one vendor controlling and maintaining those standards... they know what is best for all of us because they are paid professionals, not hack hobbyists.

    (Yes! I am kidding!!)

    1. Re:Looks like a job for Microsoft! by Daniel Dvorkin · · Score: 2, Insightful

      You're kidding ... but Microsoft isn't.

      It makes me nervous when people say things like this line from TFA: "Private actors, either nonprofit or commercial, are better suited to deliver government information to citizens ..." Uh, no, governments are better suited to deliver information about themselves, and no matter how bureaucratic or obstructionist the US government may be it's still more open and credible, on a dollar-for-dollar basis, than a lot of the "private actors" who would just looove to charge us an arm and a leg for information we've already paid our taxes for.

      By all means, the government should make raw data as well as user-friendly aggregations available. And if "private actors" can do a better job of aggregating the raw data than the government can, and they can do it well enough to get people to pay them for it, then good for them. But by no means should the large number of (generally well-laid-out and quite informative) government data aggregation websites be shut down as a benefit for corporations that want to sell our own information back to us, which I strongly suspect is the idea that's being pushed here.

      --
      The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.
  2. Re:Welcome to the "duh" department by Talchas · · Score: 4, Insightful

    No, but it takes at least a team of researchers to get the government to listen.

    --
    As the Americans learned so painfully in Earth's final century, free flow of information is the only safeguard against...
  3. Re:Add them to the buying spree. by wisty · · Score: 3, Insightful

    You can't get the Fed buying Google; only losses get socialized!
    In the meantime, if you want the government to produce useful data, don't insist that they standardize. Government employees are not particularly good at standardization, and if publishing requirements slow them down, then they just won't release data. Free, standard, and available are all possible, as long as you only want 2.

  4. Imagine a beowulf cluster of lobbyists by Crash Culligan · · Score: 2, Insightful

    Remember the good old days, when transparency in government could be safely considered a good thing?

    Generally, I'm still for it. Absolutely we need transparency in our government, and anything that brings us closer to point-and-click convenience over what we have now (FOIA requests left behind the radiator for 9–18 months to age and mellow) is for the best.

    Furthermore, an open, accessible standard (i.e. no copyrighted DTDs, and I'm looking at you, Microsoft) will allow government resources to be brought together in interesting and inspiring ways. You know all those Facebook apps and Google Maps mashups? Imagine those applied to governance. The idea behind them is to put information together in new and interesting ways. If not only those in government, but the citizenry, can create government hacks like that, there would be great benefit.

    Now let's talk hazards.

    When was the last time you published your name and address online? See any good uses of microformats on any major sites lately? That's because there are some people on the Internet who are <sarcasm class="churchlady">not so nice</sarcasm>, and might willingly abuse whatever information they can find. The "government hack" alluded to above is an invitation to abuse. And we really can't afford to put government in that kind of position.

    Another consideration, and I've stated this before, is that a wide line must be maintained between security and transparency. Security means that everything that must be kept secret is really kept secret. Transparency means that everything that doesn't have to be secure is made available somehow. If things aren't secured, the government becomes ineffectual and even detrimental. If things aren't kept transparent, the government itself can become abusive. A freely searchable infrastructure would make the transparency all that much more powerful, and make any breaches in security that much more severe.

    --
    You cannot truly appreciate Dilbert until you read it in the original Klingon.
  5. Re:Welcome to the "duh" department by lysergic.acid · · Score: 2, Insightful

    only if you take the long route. the quickest way to get the government to listen is with an unattended briefcase full of cash.

    that is, if you don't already have former board members in the White House.

  6. Re:Add them to the buying spree. by WhatAmIDoingHere · · Score: 3, Insightful

    Well, we pay tax dollars to the government, meaning anything that it does is NOT free. That leaves "standard and available" as the only two options, and I'm fine with that.

    --
    Not a Twitter sockpuppet... but I wish I was.
  7. Good advice doesn't always have good result by feenberg · · Score: 3, Insightful

    I use lots of government-supplied data in my work, and one constant has always been that the more work the agency does to make the data easily available, the harder the data are to use. Spreadsheets get posted with labels and data mixed, because that looks better in print. Spreadsheets get posted as PDFs, because that looks better in print. Footnotes and other textual material are mixed into numeric fields, because that is the way the material will be published in hardcopy. etc etc etc.

    Databases get posted to the web with "interfaces" that allow single rows to be downloaded, but require months of screen scraping to get the entire database. Databases get released with (Windows-only, of course) software with the same effect. etc etc etc.

    The reason is mostly that agencies want to discourage outside analysis of the data - they would prefer to avoid inconsistent messages getting to OMB or Congress.
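To make the "footnotes mixed into numeric fields" complaint in comment 7 concrete, here is a minimal Python sketch of the cleanup step a data user ends up writing. The cell values, the marker styles it accepts (`*`, `[2]`, `(p)`), and the function name `parse_cell` are all assumptions made for illustration; they are not taken from any particular agency's files.

```python
import re

# Assumed cell shape: an optional "$", a number with optional thousands
# separators and decimals, then an optional trailing footnote marker such
# as "*", "[2]" or "(p)". Anything else is treated as text, not data.
_NUMERIC_CELL = re.compile(
    r"^\$?(-?\d[\d,]*(?:\.\d+)?)\s*(?:\[\w+\]|\(\w+\)|[*†‡]+)?$"
)


def parse_cell(cell):
    """Return the cell's numeric value as a float, or None for labels/text."""
    match = _NUMERIC_CELL.match(cell.strip())
    if match is None:
        return None
    # Drop thousands separators before converting.
    return float(match.group(1).replace(",", ""))


# Hypothetical row as it might come out of a print-oriented spreadsheet.
row = ["Total", "1,234*", "56.7 [2]", "(p)", "-3"]
values = [parse_cell(cell) for cell in row]
```

The sketch deliberately returns `None` for anything ambiguous (for example `12a`) instead of guessing, since a silently wrong number is worse than a missing one.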

  8. Re:Add them to the buying spree. by Anonymous Coward · · Score: 1, Insightful

    I totally disagree. Who cares what format the data is in? The real challenge is obtaining data that you need in the first place. This is *NOT* about pedantic Word document format wars. It's about access to raw datasets.

    You can already download thousands of large datasets in CSV form via FTP from many government information sites. The raw data needs work before you can generate user-readable reports from it, but anyone with half a clue about processing data can easily do that. Format has always been a non-issue in my experience.

    You can't snap your fingers and tell everyone to use the same schema, because all the data is different and each dataset requires its own domain-specific considerations.

    The argument that the average Joe with no experience working with datasets would be able to download and process this information is nonsense. As TFA mentioned, people with experience managing data can make the datasets available to the public online in a friendlier manner, with searching, report generation, etc., in ways useful to Mom, Dad and Grandma. Competing sites can offer even easier-to-use services or better insights.

    As an aside, anything XML-based sucks when you process most of these datasets because they are typically **huge**, with millions of records. CSV or fixed-width formats are where it's at with most currently available data from the US government.
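For readers wondering what "processing" a multi-million-record CSV looks like in practice, the sketch below aggregates one column while reading a row at a time, so memory use does not grow with file size. The column names (`state`, `amount`), the sample data, and the helper `sum_by_key` are hypothetical; a real dataset would be opened with `open(path, newline="")` rather than the in-memory sample used here.

```python
import csv
import io
from collections import defaultdict


def sum_by_key(lines, key_field, value_field):
    """Sum value_field per key_field, reading rows one at a time."""
    totals = defaultdict(float)
    # DictReader pulls one line at a time from `lines`; the full file is
    # never held in memory, only the running totals are.
    for row in csv.DictReader(lines):
        totals[row[key_field]] += float(row[value_field])
    return dict(totals)


# Hypothetical stand-in for a large government CSV file.
sample = io.StringIO("state,amount\nNJ,10.5\nNY,4\nNJ,1.5\n")
totals = sum_by_key(sample, "state", "amount")
```

This only illustrates the CSV side of the commenter's point; it makes no claim about how a streaming XML parser would compare on the same data.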