Slashdot Mirror


Where's the Open Data?

blamanj asks: "There's a lot of open-source code around, and generally, it's quite easy to find. Finding open source data, on the other hand, can be quite a pain. Why isn't there a common repository for public domain data sets? I'm thinking of things like lists of world cities, dictionaries of stemmed words, population data, etc."

6 of 56 comments

  1. Missing the point by Lavos · · Score: 1, Interesting

    I understand the importance of Google and the like, but let me give an example from the MS side of things and explain why we love it.

    SQL Server and Access both come with the Northwind database. If I'm trying to write some new query, for instance one that returns a different random number of products for each product category, it is pretty darn handy to have a standardized data set to pull from for my example code.

    Otherwise, I have to include DDL and DML just to create the example data. Instead, I can just say "Run this against Northwind."

    The same applies for training and learning. Northwind is a pretty well known database, and most established developers won't have to learn a new schema in order to demonstrate a new concept.

    So rephrase the question from "Where can I find some data?" to "Where can I find a data set that other developers are using so we can more intelligently exchange information?"

    --
    "Tax preparation software eliminates errors your[SIC] may make...." From IRS home page.
  2. Where's the financial data?? by grammar nazi · · Score: 4, Interesting
    Screw all of the other data mentioned above! I want to run pricing models on historical financial data, e.g., intraday option prices and volatilities, dividend schedules going back 30 years, and intraday stock prices going way back.

    This is stuff you can't download for free from Yahoo, CBOE, or other places.

    If I can just get access to this data, then I will make enough money to purchase the other data.

    --

    Keeping /. free of grammatical errors for ~5 years.
    1. Re:Where's the financial data?? by Anonymous Coward · · Score: 1, Interesting

      Some sources:

      US Company Security Filings:
      http://www.sec.gov/edgar.shtml

      Historical SEC filings in XML format:
      http://bulk.resource.org/edgar/

      Limited historical stock prices (15 to 30 years) are available from Yahoo.

      What I'm looking for is historical stock buyback lists.

  3. Boardgame/Parlor game data? by ClioCJS · · Score: 3, Interesting
    For a long time I've wanted a website that supplies open data for playing board games / parlor games...

    For example, ever run out of trivia questions in your copy of Trivial Pursuit? Or used up all the word cards in Taboo, etc.?

    If you run out of data for your board game, it would be nice to be able to download more. (And this would make a cool website.)

    Especially if they came with PalmPilot/Windows versions that would administer the game for you. For example, each Taboo card has a word that you must get the other players to say, plus 7 words that you CANNOT say as a clue. The word may be "George Bush", and you can't say "Texan", "President", etc. The game is fun, but we calculate we'll use up all the data that comes with it in about 30 hours. The "electronic" version is $40, which is hardly worth it. If we could just download data, we could play forever. So... um... yes, I want open data.

    --
    -Clio
    Karma: Bad (mostly from not giving a fuck)
    Blog: http://clintjcl.wordpress.com
  4. Timelines and the 'Necessary Web' by RobotWisdom · · Score: 4, Interesting
    I think you'd lose more than you'd gain if you tried to centralise this process; it's hard enough to keep a local webpage up-to-date.

    I agree in theory that we need a Semantic Web where content is easier to find, but I don't think XML-etc can really help. [rant]

    My current theory is that individuals need to build the 'Necessary Web' which consists, like an encyclopedia, of a page for each topic (or many pages by different authors, on their own websites). Four special traits make a page qualify as 'Necessary':

    -- an attempt to be FAQ-like, and briefly cover all the important subtopics on a single page.

    -- an attempt to sort through and link all the best web resources on the topic. (By reducing the link text to one- or two-word [text buttons] you can fit hundreds of links into a useful page.)

    -- a timeline, to present the most possible data in the neatest possible way. [theory]

    -- The Open Web Content License to encourage others to recycle-and-update your content, requiring only that they clearly link your page as one of the original sources.

    Most recent example of this format: Linux/Unix (timeline w/100s of links)

    I believe that once a critical mass of authors adopt this format, taking on the most useful topics, there will be a rapid shift from the current search-frustrations to something very much like the Semantic-Web ideal, without even requiring any fancier technology than simple HTML.

  5. Electronic music databases by heikkih · · Score: 2, Interesting
    When it comes to electronic music (house/techno/idm/electro etc) there are some excellent user-contributed databases out there.

    Check out and add to: