Slashdot Mirror


Princeton Researchers Say Feds Need Data Standard

dcblogs writes "The federal government's data-sharing efforts are a mess, and if Barack Obama really wants a useful 'Google for government,' he would have to set the government's vast amount of data free by exposing it and ensuring it complies to standards. Once that happens, commercial sites, aggregators, bloggers and everyone else will be able to access it, use it and transform it, argue a group of Princeton researchers (follow Download link for full PDF)."

49 comments

  1. Rank and filed. by Anonymous Coward · · Score: 1, Interesting

    I'm not certain I would agree with setting it all free. However, it all does need to be standardized.

    As for the rest. WEB 2.0!

    1. Re:Rank and filed. by PitaBred · · Score: 1

      No one said to set it ALL free. But for the data that citizens should be able to access freely, it needs to be in an open format that anyone can use.

  2. Add them to the buying spree. by bigtallmofo · · Score: 4, Funny

    Barack Obama really wants a useful 'Google for government,'

    Well, so far the government has bought parts of Bear Stearns and AIG. Maybe it's time they diversify into some technology companies like Google? Hell, let's buy them too!

    --
    I'm a big tall mofo.
    1. Re:Add them to the buying spree. by Anonymous Coward · · Score: 0

      Aw, come on mods, you can't help but laugh at the current American socialist revolution caused by unpopular uprising.

    2. Re:Add them to the buying spree. by wisty · · Score: 3, Insightful

      You can't get the fed buying google, only losses get socialized!

      In the meantime, if you want the government to produce useful data, don't insist that it standardize. Government employees are not particularly good at standardization, and if publishing requirements slow them down, they just won't release data. Free, standard, and available are all possible, as long as you only want two.

    3. Re:Add them to the buying spree. by betterunixthanunix · · Score: 4, Interesting

      It is more a question of whether citizens will be able to access government data in a meaningful way. If the government wants to standardize its data, it can, assuming it contracts with a company that actually knows what it is doing (this is the real hitch). Government employees need to be able to keep doing what they normally do and have the standardization happen automatically, such as an MS .doc-to-ODF converter that silently makes the conversion whenever a file is saved, or another tool that automatically indexes files as they are saved. Such tools already exist; it is just a matter of deploying them at the scale of the government.
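      A minimal Python sketch of the "indexes files as they are saved" idea: an inverted index that maps each word to the set of documents containing it. The class, its method names, and the on-save hook that would call add_document are hypothetical illustrations, not any existing government or vendor tool; a real deployment would use an established search engine rather than this toy.

```python
import re
from collections import defaultdict
from typing import Dict, Set

# Assumption: words are runs of lowercase letters and digits.
_WORD = re.compile(r"[a-z0-9]+")


class InvertedIndex:
    """Maps each lowercase word to the set of document IDs containing it."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)
        self._doc_words: Dict[str, Set[str]] = {}

    def add_document(self, doc_id: str, text: str) -> None:
        """(Re)index a document; meant to be called from a hypothetical on-save hook."""
        self.remove_document(doc_id)  # drop stale words from a previous save
        words = set(_WORD.findall(text.lower()))
        self._doc_words[doc_id] = words
        for word in words:
            self._postings[word].add(doc_id)

    def remove_document(self, doc_id: str) -> None:
        """Forget every posting for doc_id (no-op if it was never indexed)."""
        for word in self._doc_words.pop(doc_id, set()):
            self._postings[word].discard(doc_id)
            if not self._postings[word]:
                del self._postings[word]

    def search(self, query: str) -> Set[str]:
        """Return IDs of documents containing every word in the query."""
        words = _WORD.findall(query.lower())
        if not words:
            return set()
        result = set(self._postings.get(words[0], set()))
        for word in words[1:]:
            result &= self._postings.get(word, set())
        return result


if __name__ == "__main__":
    idx = InvertedIndex()
    idx.add_document("contract-001", "Road repair contract, Example County")
    idx.add_document("grant-017", "Research grant for road safety")
    print(sorted(idx.search("road")))
```

      Re-saving a document re-indexes it from scratch, which keeps the sketch simple; an incremental indexer would diff old and new word sets instead.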

      --
      Palm trees and 8
    4. Re:Add them to the buying spree. by magarity · · Score: 1

      You can't get the fed buying google, only losses get socialized
       
      Sure you can; it just takes a while to set up. Require Google to feature sites that don't bring it ad revenue (not that a well-meaning government would ever meddle, just for social engineering, in a market that was otherwise working reasonably well), and then when Google fails, blame it on its own greed and socialize it.

    5. Re:Add them to the buying spree. by Anonymous Coward · · Score: 0

      Aw, come on mods, you can't help but laugh at the current American socialist revolution caused by unpopular uprising.

      Would they come in their pants or shit in their britches if we in the U.S. went full tilt communist and then proceeded to export the people's revolution worldwide?

      Maybe a bit of both, once they realize they've gotten what they wanted all along, and that you must be careful what you wish for, because sometimes you get it.

    6. Re:Add them to the buying spree. by Anonymous Coward · · Score: 0

      Maybe it's time Google bought the American government.

    7. Re:Add them to the buying spree. by WhatAmIDoingHere · · Score: 3, Insightful

      Well, we pay tax dollars to the government, meaning anything that it does is NOT free. That leaves "standard and available" as the only two options, and I'm fine with that.

      --
      Not a Twitter sockpuppet... but I wish I was.
    8. Re:Add them to the buying spree. by Hurricane78 · · Score: 0, Flamebait

      Well... the EU, Australia and Japan/Korea would shit their pants (because they would become the non-believing axis of evil), and China and Cuba would come all over the place from having a new ally.
      Everyone else would not care and say "same old shit, different day".

      On the other hand, as the EU imitates the US in every way it can, maybe it would simply become communist too, just in a more bureaucratic way... Hey, in the EU's government, because of its bureaucracy, communism could actually work. ;)

      --
      Any sufficiently advanced intelligence is indistinguishable from stupidity.
    9. Re:Add them to the buying spree. by Anonymous Coward · · Score: 1, Insightful

      I totally disagree. Who cares what format the data is in? The real challenge is obtaining the data you need in the first place. This is *NOT* about pedantic Word document format wars. It's about access to raw datasets.

      You can already download thousands of large datasets in CSV form via FTP from many government information sites. The raw data takes some work to process into user-readable reports, but anyone with half a clue about data processing can easily do that. Format has always been a non-issue in my experience.

      You can't snap your fingers and tell everyone to use the same schema, because the datasets all differ and each requires its own domain-specific considerations.

      The argument that the average Joe with no experience working with datasets would be able to download and process this information is nonsense. As TFA mentioned, people with experience managing data can make the datasets available to the public online in a friendlier manner, with searching, report generation, etc., in ways useful to Mom, Dad and Grandma. Competing sites can offer even easier-to-use services or better insights.

      As an aside, anything XML-based sucks for processing most of these datasets, because they are typically **huge**, with millions of records. CSV or fixed-width formats are where it's at for most currently available data from the US government.
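      To illustrate the point that plain CSV copes with millions of records, here is a minimal Python sketch that streams a file row by row with the standard csv module and keeps a running total per group, so memory use depends on the number of groups rather than on the file size. The column names state and amount are hypothetical; real government files will have their own layouts.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, TextIO


def total_by_group(stream: TextIO, group_col: str, value_col: str) -> Dict[str, float]:
    """Sum value_col for each distinct group_col, reading one row at a time."""
    totals: Dict[str, float] = defaultdict(float)
    for row in csv.DictReader(stream):
        raw = (row.get(value_col) or "").strip()
        if not raw:
            continue  # skip blank cells instead of aborting the whole run
        totals[row[group_col]] += float(raw)
    return dict(totals)


if __name__ == "__main__":
    # Tiny in-memory sample; for a real file use open(path, newline="").
    sample = io.StringIO("state,amount\nNJ,10.5\nNY,4\nNJ,2\nNY,\n")
    print(total_by_group(sample, "state", "amount"))
```

      Because DictReader yields one row at a time, the same function works unchanged on a multi-gigabyte download.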

    10. Re:Add them to the buying spree. by M-RES · · Score: 1

      They and many other very wealthy companies already have... hehe ;p

  3. GOXML by Anonymous Coward · · Score: 5, Funny

    I hear that Microsoft is already working on the problem with their proposed "Government Open XML" standard. This should not be confused with GOXMLb ("Google Open XML beta") because Microsoft would never try to confuse people on such an issue.

    It is going up for ISO vote next week. Be there*.

    (*) it will be very profitable for you to "be there".... nudge, nudge... wink, wink...

    1. Re:GOXML by LifesABeach · · Score: 1

      The problem that the U.S. Government faces is a lack of understanding about all things data. Obviously, there is secret data and public data. The secret stuff we will never see, but the public stuff? Now THAT's worth considering. XML is an excellent format for data, and could work very well hiding information. But someone is going to have to convert written documents to XML; "mind-deadening" would be the most polite term for it. One nice thing that could be applied here is the number of U.S. citizens who are out of work and could use a job. Not everyone in the U.S. wants to do what is required to be a mortgage banker or brain surgeon. Some folks can, for a time, contribute to the conversion until things get better for them. The methods used to build the Interstate Highway System could be applied here.

    2. Re:GOXML by Anonymous Coward · · Score: 0

      The mind-deadening work is merely scanning/OCR for the majority of documents, so it's not such a bad job. Think of all the mind-deadened individuals in the world at the moment doing nothing but data entry; it's just more of that work, only with less typing involved. :D

  4. Looks like a job for Microsoft! by erroneus · · Score: 3, Insightful

    What you need is not one [set of] standard(s) but one vendor controlling and maintaining those standards... they know what is best for all of us because they are paid professionals, not hack hobbyists.

    (Yes! I am kidding!!)

    1. Re:Looks like a job for Microsoft! by Daniel+Dvorkin · · Score: 2, Insightful

      You're kidding ... but Microsoft isn't.

      It makes me nervous when people say things like this line from TFA: "Private actors, either nonprofit or commercial, are better suited to deliver government information to citizens ..." Uh, no, governments are better suited to deliver information about themselves, and no matter how bureaucratic or obstructionist the US government may be it's still more open and credible, on a dollar-for-dollar basis, than a lot of the "private actors" who would just looove to charge us an arm and a leg for information we've already paid our taxes for.

      By all means, the government should make raw data as well as user-friendly aggregations available. And if "private actors" can do a better job of aggregating the raw data than the government can, and they can do it well enough to get people to pay them for it, then good for them. But by no means should the large number of (generally well-laid-out and quite informative) government data aggregation websites be shut down as a benefit for corporations that want to sell our own information back to us, which I strongly suspect is the idea that's being pushed here.

      --
      The correlation between ignorance of statistics and using "correlation is not causation" as an argument is close to 1.
    2. Re:Looks like a job for Microsoft! by davolfman · · Score: 1

      Actually, if most of the private actors are corporations, I think neither can be trusted to be competent. Rule by committee is rule by committee. My current pet idea is that any large group of people making decisions can only be considered to have the net intelligence of an animal, as they never seem to get beyond stimulus-response.

  5. Welcome to the "duh" department by betterunixthanunix · · Score: 0, Redundant

    Seriously, it took a team of researchers to figure this out?

    --
    Palm trees and 8
    1. Re:Welcome to the "duh" department by Talchas · · Score: 4, Insightful

      No, but it takes at least a team of researchers to get the government to listen.

      --
      As the Americans learned so painfully in Earth's final century, free flow of information is the only safeguard against...
    2. Re:Welcome to the "duh" department by lysergic.acid · · Score: 2, Insightful

      Only if you take the long route. The quickest way to get the government to listen is with an unattended briefcase full of cash.

      That is, if you don't already have former board members in the White House.

  6. Shakespeare:To share or not to share? by Ostracus · · Score: 2, Interesting

    One thought has occurred to me as part of this "sharing". Privacy and the other is Security.

    --
    Shai Schticks:"You don't make peace with friends, you make peace with enemies"
    1. Re:Shakespeare:To share or not to share? by betterunixthanunix · · Score: 4, Informative

      We are not talking about the government sharing data on individual citizens or on military secrets. We are talking about things involving government spending, contracts, loans, grants, etc. Things that citizens should have access to, but have trouble organizing.

      --
      Palm trees and 8
    2. Re:Shakespeare:To share or not to share? by Ostracus · · Score: 2, Informative

      "The District of Columbia is far ahead of its federal government overlord in bringing data to standard XML formats and RSS-enabling it. DC's government has what it calls a "data catalog" offering live data feeds of crime reports, construction reports, building permits and many other types of information. "

      I can see some of these that, if not screened carefully, could cause problems in this "sharing" environment.

      Also, don't forget there have been cases where citizen information was accidentally leaked. It was soon retracted, but "sharing" only means the mistake propagates faster.

      --
      Shai Schticks:"You don't make peace with friends, you make peace with enemies"
    3. Re:Shakespeare:To share or not to share? by TapeCutter · · Score: 1

      "The District of Columbia is far ahead of its federal government overlord in bringing data to standard XML formats and RSS-enabling it. DC's government has what it calls a "data catalog" offering live data feeds of crime reports, construction reports, building permits and many other types of information."

      "To share" is a GoodThing(TM), but my brain keeps showing me pictures of the Vogon destructor fleet.

      --
      And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
    4. Re:Shakespeare:To share or not to share? by dthomas9 · · Score: 1

      We are not talking about the government sharing data on individual citizens or on military secrets. We are talking about things involving government spending, contracts, loans, grants, etc. Things that citizens should have access to, but have trouble organizing.

      Since the government spends, contracts, lends, and grants to individual citizens and to military contractors, we are in fact talking about sharing sensitive data. That's why it needs to be carefully reviewed before publishing it. SSNs have already appeared on the web as a result of efforts to share government grant and contract data.
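      A rough sketch of the kind of pre-publication check described here: scan every field of every record for strings shaped like a US Social Security number (NNN-NN-NNNN) and hold flagged records back for human review. The regular expression and record layout are assumptions and deliberately crude; the pattern will miss unformatted nine-digit numbers and may flag other identifiers with the same shape, so it supplements careful review rather than replacing it.

```python
import re
from typing import Dict, List, Tuple

# Assumed pattern: 3 digits, 2 digits, 4 digits, dash-separated, on word boundaries.
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

Record = Dict[str, str]


def split_for_review(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Return (clean, flagged); a record is flagged if any field looks like an SSN."""
    clean: List[Record] = []
    flagged: List[Record] = []
    for rec in records:
        if any(SSN_LIKE.search(str(value)) for value in rec.values()):
            flagged.append(rec)
        else:
            clean.append(rec)
    return clean, flagged


if __name__ == "__main__":
    grants = [
        {"recipient": "Acme Labs", "award": "250000"},
        {"recipient": "J. Doe, 123-45-6789", "award": "5000"},
    ]
    ok, held = split_for_review(grants)
    print(len(ok), "publishable,", len(held), "held for review")
```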

    5. Re:Shakespeare:To share or not to share? by Anonymous Coward · · Score: 0

      One thought has occurred to me as part of this "sharing". Privacy and the other is Security.

      Two thoughts occurred to me while reading your "Shakespeare". Not to be.

    6. Re:Shakespeare:To share or not to share? by ThaddaeusV · · Score: 2, Funny

      "The Internet has made it possible to make more mistakes, faster, than any other invention in history, with the possible exceptions of handguns and tequila."
      -- A Usenet .sig I remember from years ago

      --
      Thaddaeus A. Vick, Speaker for the Coyote
    7. Re:Shakespeare:To share or not to share? by SpacePunk · · Score: 1

      It's in the best interest of the government, its employees, and contractors that this information be as hard to find as possible. The last thing they want is for the people to be able to find out what is really going on.

  7. Librarians by jbolden · · Score: 2, Interesting

    The government could hire librarians to organize the data. Librarians are highly trained in taking large quantities of nonstandard data and organizing it so that people can find what they want.

    1. Re:Librarians by Anonymous Coward · · Score: 0

      I'd love to hire a librarian to organize our data. Sadly, they don't work for free and no one wants to give us that money. They expect us to publish our data for free. We do, but I'm not investing months in standardizing it for free.

    2. Re:Librarians by jbolden · · Score: 1

      Creating large CRM systems is not free. Very little can be done other than "dump it online" if the cost has to be kept very low.

  8. Oh god no. by Anonymous Coward · · Score: 0

    The government in the recent past standardized the way grants are applied for. The NSF, DOE, NIH all had different ways of doing things, but now it's through one process. Strangely enough, once they standardized everything, we went from 1 accountant who spent about 1/4 of his time fixing and processing prof's grant apps, to almost 3 full time people.

    When a standard encompasses too many domains and departments, it just gets too large to be useful.

    1. Re:Oh god no. by Anonymous Coward · · Score: 0

      >we went from 1 accountant who spent about 1/4 of his time fixing and processing prof's grant apps, to almost 3 full time people.

      So not only did it create jobs, it's responsible for enormous job creation growth: almost a factor of twelve!

  9. Imagine a beowulf cluster of lobbyists by Crash+Culligan · · Score: 2, Insightful

    Remember the good old days, when transparency in government could be safely considered a good thing?

    Generally, I'm still for it. Absolutely we need transparency in our government, and anything that brings us closer to point-and-click convenience over what we have now (FOIA requests left behind the radiator for 9–18 months to age and mellow) is for the best.

    Furthermore, an open, accessible standard (i.e. no copyrighted DTDs, and I'm looking at you, Microsoft) will allow government resources to be brought together in interesting and inspiring ways. You know all those Facebook apps and Google Maps mashups? Imagine those applied to governance. The idea behind them is to put information together in new and interesting ways. If not only those in government but ordinary citizens too can create government hacks like that, there would be great benefit.

    Now let's talk hazards.

    When was the last time you published your name and address online? See any good uses of microformats on any major sites lately? That's because there are some people on the Internet who are <sarcasm class="churchlady">not so nice</sarcasm>, and might willingly abuse whatever information they can find. The "government hack" alluded to above is an invitation to abuse. And we really can't afford to put government in that kind of position.

    Another consideration, and I've stated this before, is that a wide line must be maintained between security and transparency. Security means that everything that must be kept secret is really kept secret. Transparency means that everything that doesn't have to be secure is made available somehow. If things aren't secured, the government becomes ineffectual and even detrimental. If things aren't kept transparent, the government itself can become abusive. A freely searchable infrastructure would make the transparency all that much more powerful, and make any breaches in security that much more severe.

    --
    You cannot truly appreciate Dilbert until you read it in the original Klingon.
  10. Good advice doesn't always have good result by feenberg · · Score: 3, Insightful

    I use lots of government-supplied data in my work, and one constant has always been that the more work the agency does to make the data easily available, the harder the data are to use. Spreadsheets get posted with labels and data mixed, because that looks better in print. Spreadsheets get posted as PDFs, because that looks better in print. Footnotes and other textual material are mixed into numeric fields, because that is how the material will be published in hardcopy. etc etc etc.

    Databases get posted to the web with "interfaces" that allow single rows to be downloaded but require months of screen scraping to get the entire database. Databases get released with (Windows-only, of course) software to the same effect. etc etc etc

    The reason is mostly that agencies want to discourage outside analysis of the data - they would prefer to avoid inconsistent messages getting to OMB or congress.
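    As an illustration of the cleanup this forces on data users, here is a small Python sketch that recovers numbers from print-formatted cells where thousands separators and footnote markers were mixed into numeric fields. The conventions handled (comma separators, a trailing marker such as "(a)" or "*", and "-", "NA", "(NA)" or "(X)" meaning missing) are assumptions; each agency's tables will need their own rules, and anything unrecognized raises an error rather than being silently misread.

```python
import re
from typing import Optional

# Assumed layout: optional minus, digits (comma-grouped or plain), optional
# decimals, then an optional footnote marker like "(a)" or "*".
_CELL = re.compile(
    r"^(-?\d{1,3}(?:,\d{3})*(?:\.\d+)?|-?\d+(?:\.\d+)?)\s*(?:\([A-Za-z0-9]+\)|\*+)?$"
)
# Assumed placeholders for suppressed or missing values.
_MISSING = {"", "-", "NA", "(NA)", "(X)"}


def parse_cell(cell: str) -> Optional[float]:
    """Return the numeric value of a print-formatted cell, or None if missing."""
    text = cell.strip()
    if text.upper() in _MISSING:
        return None
    match = _CELL.match(text)
    if match is None:
        raise ValueError(f"unrecognized cell: {cell!r}")
    return float(match.group(1).replace(",", ""))


if __name__ == "__main__":
    for raw in ["1,234 (a)", "12.5*", "(NA)", "-7"]:
        print(repr(raw), "->", parse_cell(raw))
```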

    1. Re:Good advice doesn't always have good result by jimmyhat3939 · · Score: 1

      Really, I just think there's so much data out there that it's hard to do this effectively without compromising things. I'm not sure what the solution is, but letting a search engine go wild on the government's data seems possibly worse than the current state of affairs.

      --
      Free Conference Call -- No Spam, High Quality
  11. Various Departments.... by glitch23 · · Score: 1

    have already created information-sharing data model standards for law enforcement and justice purposes. These include NIEM (National Information Exchange Model) and GJXDM (Global Justice XML Data Model). If the government can create these, then additional models can be created for sharing information with its citizens. Someone (or some group) just has to take the lead.

    --
    this nation, under God, shall have a new birth of freedom. -- Lincoln, Gettysburg Address
  12. you mean like this for example? by plopez · · Score: 1
    --
    putting the 'B' in LGBTQ+
  13. Princeton PhDs says so, we must do by recharged95 · · Score: 1

    Come on, a bunch of Princeton researchers, after spending $X million in grants from the US gov't over 5 years, now say we need a data standard?

    Dublin Core?

    FEA? (Federal Enterprise Architecture, the other DRM)

    OAIS?

    And talk to anyone in gov't IT today about fed data problems, and they'll give you better info on how to solve the data issues than these researchers can. Note to the Princeton researchers: stick to solving the semantic web problems, because that's something we can ignore for the next 10 yrs.

    And Google for Gov't may not be a good thing: Google tracks everything. Now you know why the gov't wants the same system.

  14. Standard. by supernova_hq · · Score: 1

    ensuring it complies to standards

    Hopefully he knows the meaning of a TRUE standard, as opposed to the other kind!

  15. WTF...WTI - DCBlogs is wrong or the PRs are idiots by OldHawk777 · · Score: 1

    "The federal government's data-sharing efforts are a mess" This is very well known and real PRs would not need to state the obvious.

    We the People really want a useful government, but we continue to elect tent-revivalist and pickpocket politicians (most, not all, are in on the $7B scam).

    Setting the government's vast amount of data free is total bullshit.

    Globally exposing Gov/Mil data/content IS NOT REQUIRED to ensure web-services/SOA and data/content complies with "Open" standards like UTF, ODT, PNG/SVG..., syntax/XML..., semantics/OWL..., synergy/collaboration, topology, ontology... [START HERE: en.wikipedia.org/wiki/OpenDocument_technical_specification, then don't stop RTFM... continue ISO, W3C, OASIS... google some of the words/concepts above... CONTINUE].

    "Open" (as in 100% non-Proprietary) ISO/OASIS/W3C... hardware, software, and services International Standards (not corporate/MS/ATT/IBM...) can meet all NetCentric, GIG, ISE requirements without exposing data/content.

    Once "Open" Standards happen, corporations will lose a large part of their stranglehold on US/EU Government operations, contracts, and acquisitions. Gov-transformation will happen, but for now much of the Defense Industry (EU and US) acts more like the enemy than like reliable, patriotic allies of the US & EU.

    "A group of Princeton researchers (follow Download link for full PDF)" is either (or all) not understood by DCBlogs, an early 20090401 prank, or a corporate BS attempt to keep the Government (technology-clueless C*Os/managers) consuming bullshit from (technology-clueless) Biz-Buzz marketeers.

    --
    Unaccountable leaders are masters, and unrepresented people are slaves. How do US and EU fare?
  16. I hate the standard game by Anonymous Coward · · Score: 0

    Government agencies do a lot of good stuff. A few bad apples pass contracts only to friends.
    Effect: Agencies have to publicly request bids.

    Now, we know that a three-man team is not capable of building a power plant, so there might be more to consider than simply the price of the bid.

    This is where standards came in: in order to lock the small companies out, the government requires bidders to adhere to ridiculous standards like ISO 9000 or CMMI Maturity Level shit, whose overhead only giant companies can afford, since the others are busy working. The inventors get their asses stuffed full of money in certification fees, and another big chunk of tax dollars goes into useless forms that are required by a "mature process".

    Now some researchers claim that everything will be better if we make every agency adhere to standards. So now everyone will get the information they need because some office uses XML? I don't think so! This will only force software onto government agencies that they don't want (often enough for good reasons) and force more companies into the perverted process of standardization, which again is a big companies' game.

    Standards came into being because people wanted to agree on something that would be useful for everybody. If an unneeded standard was passed, it was simply ignored. Making something a standard doesn't make it good. I don't care if OOXML is a standard; Mondopoint shoe sizes are also a standard, and nobody uses them. This kind of pseudo-scientific claim that standards are eo ipso good is something I hate.

  17. I'm surprised by PPH · · Score: 1

    follow Download link for full PDF

    they didn't publish this using Silverlight.

    --
    Have gnu, will travel.
  18. Well, its already happening in some places by Jesse+Rudolph · · Score: 2, Informative

    The United States Army uses 'PureEdge', which I guess IBM replaced with 'Lotus Forms', as there is no canonical link to the software anymore. It's an XML-based forms system. It's not really used in any standardized way, other than electronically saving forms and filling stuff in before printing them. It could be, though, because the Army, at least, does little to no documentation that isn't on some kind of standardized form. Now that the forms are machine-parsable, I can definitely see the feds adopting some kind of organization and retrieval system.

    The problem is that the government doesn't want to organize its documentation that well. Obfuscation is still a large part of information security in certain circles, and the possibility of a leak is much greater when information flows so fluidly. Unclassified does not mean it's not of a sensitive nature; it just means that it doesn't fall under any of the standard security classifications. That's the reason we shred EVERYTHING.

    It's archaic, but not necessarily ineffective.

  19. CSV eXtended by cyclomedia · · Score: 1

    What would be a good start would be to standardise the publication of tabular data, for example population statistics, with ways of defining column types, data types and units whilst retaining a tabular structure instead of bastardising the tree-structure of XML. I guess we could take CSV, add a couple of header blocks and call it Extended CSV. Though it'd need an X in it to sound 21st century... so how about CSVX?

    If anyone googles that my web site will go down in flames...
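    Purely to make the suggestion concrete, here is a Python sketch of what a reader for such an "extended CSV" might look like. The layout is invented for illustration and is not any published standard: row one holds column names, row two a type for each column (str, int or float), row three the units, and data rows follow. The file stays flat and tabular while still carrying typed, unit-labelled columns; the sample values are placeholders, not real statistics.

```python
import csv
import io
from typing import Any, Callable, Dict, List, TextIO, Tuple

# Hypothetical type names allowed in the second header row.
_CONVERTERS: Dict[str, Callable[[str], Any]] = {"str": str, "int": int, "float": float}


def read_csvx(stream: TextIO) -> Tuple[Dict[str, str], List[Dict[str, Any]]]:
    """Read names/types/units header rows, then convert each data row by column type."""
    reader = csv.reader(stream)
    names, types, units = next(reader), next(reader), next(reader)
    if not (len(names) == len(types) == len(units)):
        raise ValueError("header rows must have the same number of columns")
    converters = [_CONVERTERS[t.strip()] for t in types]  # KeyError on unknown type
    rows: List[Dict[str, Any]] = []
    for row_no, raw in enumerate(reader, start=4):
        if not raw:
            continue  # tolerate blank lines between records
        if len(raw) != len(names):
            raise ValueError(f"row {row_no}: expected {len(names)} fields, got {len(raw)}")
        rows.append({name: conv(value) for name, conv, value in zip(names, converters, raw)})
    return dict(zip(names, units)), rows


if __name__ == "__main__":
    sample = io.StringIO(
        "county,year,population\n"
        "str,int,int\n"
        ",,persons\n"
        "Example County,2000,12345\n"
    )
    units, rows = read_csvx(sample)
    print(units["population"], rows[0]["population"])
```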

    --
    If you don't risk failure you don't risk success.