Slashdot Mirror


Where's the Open Data?

blamanj asks: "There's a lot of open-source code around, and generally, it's quite easy to find. Finding open source data, on the other hand, can be quite a pain. Why isn't there a common repository for public domain data sets? I'm thinking of things like lists of world cities, dictionaries of stemmed words, population data, etc., etc."

14 of 56 comments

  1. Your Tax Dollars At Work by Dr. Bent · · Score: 5, Informative
  2. Soon you'll be able to try... by damien_kane · · Score: 4, Informative

    MIT's SuperArchive

    Grabbed the link off of rootprompt, in case any of you care.

  3. NIMA and NOAA too by stanwirth · · Score: 5, Informative

    NOAA provides bathymetry data and (vectorized) electronic navigation charts, and NIMA (that's right, .mil; NIMA used to be the Defense Mapping Agency) provides city lists and populations for all the countries in the world, as well as DEMs (digital elevation models, i.e. gridded topography). The National Atlas project provides boundaries of federal lands, outlines of states, locations of major cities, stuff like that.

    ENJOY!

  4. How 'common' do you need it? by tswinzig · · Score: 5, Funny

    Why isn't there a common repository for public domain data sets?

    There is, it's right here.

    (aka The Internet)

    --

    "And like that ... he's gone."
    1. Re:How 'common' do you need it? by tswinzig · · Score: 3, Troll

      Slashdot ought to implement a new filter for its comments section in preferences: Score penalty for reference to Google in Ask Slashdot question.

      Unfortunately, most of the Ask Slashdots are so lame they can be answered with a simple Google search.

      The editor who posts an Ask Slashdot should first check whether the question can be easily answered with a Google search before posting the article.

      --

      "And like that ... he's gone."
    2. Re:How 'common' do you need it? by mabinogi · · Score: 4, Insightful

      > Unfortunately, most of the Ask Slashdots are so lame they can be answered with a simple Google search.

      But someone submitting a question to Ask Slashdot doesn't want a bunch of links from Google. They want opinions... opinions from real people who may or may not (most likely) know what they're talking about.
      They want discussion... you can't get that by searching on Google.

      --
      Advanced users are users too!
  5. here ya go by zogger · · Score: 5, Informative

    Here's a great site: refdesk.com. Matt Drudge's father runs this site, AFAIK. It has a boatload of data links.

  6. Amen by pizza_milkshake · · Score: 5, Informative

    That kind of data is out there if you want it, but it isn't always in the format you want, and sometimes it's hard to find (less hard with Google). I started a site for this kind of data at www.tonsoflists.com, which contains data in MySQL tables that you can format into different kinds of sets, orderings, formats, etc. I just did it for fun, but I haven't gotten any feedback.

  7. A little surprised . . . by Discoflamingo13 · · Score: 3, Informative

    nobody posted this - standardized data sets for training AI. It's a start, anyways - useful for comparing one machine learning system to another. Maybe you could use it for something else?

  8. Where's the financial data?? by grammar nazi · · Score: 4, Interesting
    Screw all of the other data mentioned above! I want to run pricing models on historic financial data... e.g. intraday option prices and vols, dividend schedules 30 years back, intraday stock prices for way back.

    This is stuff you can't download for free from Yahoo, CBOE, or other places.

    If I can just get access to this data, then I will make enough money to purchase the other data.

    --

    Keeping /. free of grammatical errors for ~5 years.
  9. Boardgame/Parlor game data? by ClioCJS · · Score: 3, Interesting
    For a long time I've wanted a website to supply open data for the purpose of playing board games / parlor games...

    For example, ever run out of trivia questions in your version of Trivial Pursuit? Or used up all the word cards in Taboo... etc etc.

    I think in the event of running out of data for your board game, it would be nice to download more. (And this would make a cool website.)

    Especially if they came with PalmPilot/Windows versions that would administer the game for you. For example Taboo consists of a word that you must get the other people to say, but there are 7 words that you CANNOT say as a clue. For example the word may be "George Bush" and you can't say "Texan", "President", etc. This game is fun but we calculate we'll use up all the data that comes with it in about 30 hours. The "electronic" version is $40. That's hardly worth it. If we could just download data, we could play forever. So .. um .. yes.. I want open data.

    --
    -Clio
    Karma: Bad (mostly from not giving a fuck)
    Blog: http://clintjcl.wordpress.com
  10. It should be clearly labeled by jki · · Score: 4, Insightful
    There's a lot of open-source code around, and generally, it's quite easy to find. Finding open source data, on the other hand, can be quite a pain

    When you go to Google to find software to fill some specific need, you already know quite clearly how to search. The problem with finding "open data" is that there currently is no commonly used, clear label on such texts, research and articles. I tend to mention that the content is released under the GNU Free Documentation License, or FDL, when I want to release something to be freely utilized by anyone. One such case is, for example, the Amazon Discoveries series. Not that it would be of any use to anyone :) This problem is a bit related to the problem of releasing your idea or concept under such a license: there does not seem to be any clear practice for how to go about this, i.e. what to do if your idea might be unique but you do not want to patent it. We have that exact problem with, for example, the Openchallenge concept submissions. Any ideas on what practices to use in that case would help us out.

  11. Timelines and the 'Necessary Web' by RobotWisdom · · Score: 4, Interesting
    I think you'd lose more than you'd gain if you tried to centralise this process-- it's hard enough to keep a local webpage up-to-date.

    I agree in theory that we need a Semantic Web where content is easier to find, but I don't think XML-etc can really help. [rant]

    My current theory is that individuals need to build the 'Necessary Web' which consists, like an encyclopedia, of a page for each topic (or many pages by different authors, on their own websites). Four special traits make a page qualify as 'Necessary':

    -- an attempt to be FAQ-like, and briefly cover all the important subtopics on a single page.

    -- an attempt to sort thru and link all the best web-resources on the topic. (By reducing the linktext to one- or two-word [text buttons] you can fit hundreds of links into a useful page.)

    -- a timeline, to present the most possible data in the neatest possible way. [theory]

    -- The Open Web Content License to encourage others to recycle-and-update your content, requiring only that they clearly link your page as one of the original sources.

    Most recent example of this format: Linux/Unix (timeline w/100s of links)

    I believe that once a critical mass of authors adopt this format, taking on the most useful topics, there will be a rapid shift from the current search-frustrations to something very much like the Semantic-Web ideal, without even requiring any fancier technology than simple HTML.

  12. not an easily distributed task by joe094287523459087 · · Score: 3, Insightful

    it costs more money to get data than to make code.