Where's the Open Data?

Your Tax Dollars At Work by Dr. Bent · 2002-11-11 15:18 · Score: 5, Informative

Right Here.

planning a terrorist attack are we? by Anonymous Coward · 2002-11-11 15:24 · Score: 0

you can't have that data. you might use it to kill someone.

die fucker. information should be banned.

It's called Google by Anonymous Coward · 2002-11-11 15:27 · Score: 0

Follow da links. Use Google's cache.

Data = Money :(

Soon you'll be able to try... by damien_kane · 2002-11-11 15:30 · Score: 4, Informative

MIT's SuperArchive

Grabbed the link off of rootprompt, in case any of you care.

NIMA and NOAA too by stanwirth · 2002-11-11 15:33 · Score: 5, Informative

NOAA provides bathymetry data and electronic navigation charts (vectorized), and NIMA (that's right, .mil -- NIMA used to be the Defense Mapping Agency) provides city lists and populations for all the countries in the world, as well as DEMs (digital elevation models, i.e. gridded topography). The National Atlas project provides boundaries of federal lands, outlines of states, locations of major cities, stuff like that.

ENJOY!

How 'common' do you need it? by tswinzig · 2002-11-11 15:37 · Score: 5, Funny

> Why isn't there a common repository for public domain data sets?

There is, it's right here.

(aka The Internet)

--

"And like that ... he's gone."

Re:How 'common' do you need it? by Anonymous Coward · 2002-11-11 16:22 · Score: 1, Funny

Slashdot ought to implement a new filter for its comments section in preferences: Score penalty for reference to Google in Ask Slashdot question.
Re:How 'common' do you need it? by tswinzig · 2002-11-11 17:53 · Score: 3, Troll

> Slashdot ought to implement a new filter for its comments section in preferences: Score penalty for reference to Google in Ask Slashdot question.

Unfortunately, most of the Ask Slashdots are so lame they can be answered with a simple Google search.

The editor who posts the Ask Slashdot should first see if he can easily answer the question with a Google search before posting the article.

--

"And like that ... he's gone."
Re:How 'common' do you need it? by mabinogi · 2002-11-11 22:23 · Score: 4, Insightful

> Unfortunately, most of the Ask Slashdots are so lame they can be answered with a simple Google search.

But someone submitting a question to Ask Slashdot doesn't want a bunch of links from Google; they want opinions... opinions from real people that may or may not (most likely) know what they're talking about.
They want discussion... you can't get that by searching on Google...

--
Advanced users are users too!
Re:How 'common' do you need it? by tswinzig · 2002-11-12 01:15 · Score: 2

> They want discussion... you can't get that by searching on Google...

Yes, you can!

--

"And like that ... he's gone."

Have you tried by Apreche · 2002-11-11 15:48 · Score: 2

the world almanac? Oh no, a book? made of paper? what's that? Just because it isn't in digital form doesn't mean it's not out there. Try the public library sometime.

--
The GeekNights podcast is going strong. Listen!

Re:Have you tried by Anonymous Coward · 2002-11-11 15:57 · Score: 0

Oh, gee, thanks, that's so helpful. So next time I write an application that figures out a user's city and state by zip code, I should just go ahead and type the USPS manual into my database. Thanks again for the valuable suggestion.
Re:Have you tried by Chipaca · 2002-11-12 03:20 · Score: 1

on Monday November 11, Apreche said:
> Try the public library sometime.

For many people that is usually not a valid option. Public libraries are, too often, one of the first "luxuries" a government cuts back on, and are rendered not useful by sheer neglect.

Missing the point by Lavos · 2002-11-11 16:09 · Score: 1, Interesting

I understand the importance of Google and the like, but let me give an example from the MS side of things, and why we love it.

SQL Server and Access both come with the Northwind database. If I have some new query that I'm trying to write, for instance randomly returning different numbers of products for each product category, it is pretty darn handy to have a standardized data set to pull from for my example code.

Otherwise, I have to include DDL and DML just to create the example data. Instead, I can just say "Run this against Northwind."
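
To make the point concrete, here is a minimal sketch of that kind of query (a random number of products from each category) in Python with sqlite3. It does not use the real Northwind schema: the tables, columns and rows below are invented stand-ins, which is exactly the boilerplate a shared data set would let you skip.

```python
import random
import sqlite3


def build_sample_db():
    """Build a tiny in-memory stand-in for a shared sample database.

    The schema and rows are made up for this example; they are not Northwind.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE products (
            id INTEGER PRIMARY KEY,
            name TEXT,
            category_id INTEGER REFERENCES categories(id)
        );
        """
    )
    conn.executemany(
        "INSERT INTO categories (id, name) VALUES (?, ?)",
        [(1, "Beverages"), (2, "Condiments")],
    )
    conn.executemany(
        "INSERT INTO products (id, name, category_id) VALUES (?, ?, ?)",
        [(1, "Tea", 1), (2, "Coffee", 1), (3, "Cola", 1),
         (4, "Ketchup", 2), (5, "Mustard", 2)],
    )
    return conn


def random_products_per_category(conn, rng):
    """Return {category name: a randomly sized random sample of its products}."""
    picked = {}
    categories = conn.execute(
        "SELECT id, name FROM categories ORDER BY id"
    ).fetchall()
    for cat_id, cat_name in categories:
        rows = conn.execute(
            "SELECT name FROM products WHERE category_id = ? ORDER BY id",
            (cat_id,),
        ).fetchall()
        names = [row[0] for row in rows]
        if not names:
            picked[cat_name] = []  # nothing to sample from
            continue
        how_many = rng.randint(1, len(names))  # different count per category
        picked[cat_name] = rng.sample(names, how_many)
    return picked
```

Most of this is setup. With a data set everyone already has, only the second function would need to be posted.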

The same applies for training and learning. Northwind is a pretty well known database, and most established developers won't have to learn a new schema in order to demonstrate a new concept.

So rephrase the question from "Where can I find some data?" to "Where can I find a data set that other developers are using so we can more intelligently exchange information?"

--
"Tax preparation software eliminates errors your[SIC] may make...." From IRS home page.

Re:Missing the point by Lavos · 2002-11-11 16:11 · Score: 1

Oh yeah, and I realize that the article's poster/asker really is just wanting data, but I would like to see this turned into another topic entirely.

--
"Tax preparation software eliminates errors your[SIC] may make...." From IRS home page.

here ya go by zogger · 2002-11-11 16:17 · Score: 5, Informative

--here's a great site. refdesk.com. Matt Drudge's father runs this site, AFAIK, a boatload of data links.

Re:here ya go by tunah · 2002-11-11 17:36 · Score: 1

> Matt Drudge's father runs this site, AFAIK, a boatload of data links.
Sorry, I'm having trouble visualising that.

--
Free Java games for your phone: Tontie, Sokoban

You're Using It Right Now by marvinx · 2002-11-11 16:44 · Score: 0

Not only the WWW, but the entire Internet is at your fingertips. It's not that the data is available, it's that it's sometimes hidden.

Amen by pizza_milkshake · 2002-11-11 17:00 · Score: 5, Informative

That kind of data is out there if you want it, but it isn't always in the format you want and sometimes it's hard to find (less hard with Google). I started a site for this kind of data at www.tonsoflists.com, which contains data in MySQL tables that you can format into different kinds of sets, orderings, formats, etc. Just did it for fun, but haven't gotten any feedback.

Re:Amen by stefanlasiewski · 2002-11-11 17:50 · Score: 1

Hey, nice resource. I've actually had need for some of that information in the last year.

--
"Can of worms? The can is open... the worms are everywhere."
Re:Amen by nathanm · 2002-11-12 13:22 · Score: 2
> Just did it for fun, but haven't gotten any feedback.
I just checked the site out. Interesting, but I have a couple issues:
1. There isn't any contact info on the site.
2. The capital of MN is St Paul, not St Louis. I hope that's not representative of the rest of your data.

A little surprised . . . by Discoflamingo13 · 2002-11-11 17:15 · Score: 3, Informative

nobody posted this - standardized data sets for training AI. It's a start, anyways - useful for comparing one machine learning system to another. Maybe you could use it for something else?

Re:A little surprised . . . by Zurk · 2002-11-12 10:39 · Score: 1

interesting...lots of financial and other data types but not much linguistic data.

Open NASA data by Anonymous Cowdog · 2002-11-11 17:45 · Score: 2

What's the deal with NASA data? Especially Hubble data? Sure would be nice to make some screensavers without those unsightly logos on them.

Oh, yeah... just remembered a nice bookmark! :-)

http://earthobservatory.nasa.gov/Newsroom/NewImages/images_index.php3

The NASA Earth Observatory. Don't know how open, though.

Where's the financial data?? by grammar nazi · 2002-11-11 18:00 · Score: 4, Interesting

Screw all of the other data mentioned above! I want to run pricing models on historic financial data... e.g. intraday option prices and vols, dividend schedules 30 years back, intraday stock prices for way back.

This is stuff you can't download for free from Yahoo, CBOE, or other places.

If I can just get access to this data, then I will make enough money to purchase the other data.

--

Keeping /. free of grammatical errors for ~5 years.

Re:Where's the financial data?? by Anonymous Coward · 2002-11-11 20:12 · Score: 1, Interesting

Some sources:

US Company Security Filings:
http://www.sec.gov/edgar.shtml

Historical SEC filings in XML format:
http://bulk.resource.org/edgar/

Limited stock prices (15-30 years) are available from Yahoo.

What I'm looking for is historical stock buyback lists.

Boardgame/Parlor game data? by ClioCJS · 2002-11-11 18:38 · Score: 3, Interesting

For a long time I've wanted a website to supply open data for the purpose of playing board games / parlor games...

For example, ever run out of trivia questions in your version of Trivial Pursuit? Or used up all the word cards in Taboo... etc etc.

I think in the event of running out of data for your board game, it would be nice to download more. (And this would make a cool website.)

Especially if they came with PalmPilot/Windows versions that would administer the game for you. For example Taboo consists of a word that you must get the other people to say, but there are 7 words that you CANNOT say as a clue. For example the word may be "George Bush" and you can't say "Texan", "President", etc. This game is fun but we calculate we'll use up all the data that comes with it in about 30 hours. The "electronic" version is $40. That's hardly worth it. If we could just download data, we could play forever. So .. um .. yes.. I want open data.

--
-Clio
Karma: Bad (mostly from not giving a fuck)
Blog: http://clintjcl.wordpress.com

Re:Boardgame/Parlor game data? by spike2131 · 2002-11-12 02:08 · Score: 1

> For example, ever run out of trivia questions in your version of Trivial Pursuit? Or used up all the word cards in Taboo... etc etc.

> I think in the event of running out of data for your board game, it would be nice to download more. (And this would make a cool website.)

You know, I came up with this idea just two days ago, after discovering how lame the new questions are in the Trivial Pursuit 20th Anniversary Edition. It's really time these questions got open sourced.

What I had in mind was a system whereby people could go to a website and contribute trivia questions. After some human screening, the questions would go into a database, which would then be used to generate PDF files of Trivial Pursuit cards that could be freely downloaded and printed at the user's home computer. Alternately, users could pay 10 bucks to have a deck of pre-printed Trivial Pursuit cards sent to their house.

I'm interested in starting this project over at Sourceforge. Anyone else like the idea?

--
SpyDock: Scientific Python in a Docker container
Re:Boardgame/Parlor game data? by ClioCJS · 2002-11-12 05:57 · Score: 1

Well hopefully the questions would undergo some sort of evaluation system so they could be grouped by difficulty & subject, then a database would manage them such that certain questions could be used for games OTHER than trivial pursuit.
The idea would be to support as many different games as possible...

--
-Clio
Karma: Bad (mostly from not giving a fuck)
Blog: http://clintjcl.wordpress.com
Re:Boardgame/Parlor game data? by spike2131 · 2002-11-12 07:17 · Score: 1

Trouble is, different games have different requirements for their questions: Trivial Pursuit is a simple question and answer, Jeopardy is an answer-then-question combination, some would prefer a multiple choice format, and games like Taboo are a separate animal entirely. Writing for all these different games would be like cross-platform software; it would make the task more difficult, but the advantages are clear as well. If questions added for one game could be easily ported to other game formats, considerable effort could be saved in having to go out and find new trivia questions. How best to implement that is an interesting conundrum, though.
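
One way to picture the porting problem is a rough Python sketch that stores each item once and renders it per game. Everything here is hypothetical (the `TriviaItem` name, its fields, the output layouts); in particular, the Jeopardy-style clue has to be written by hand, since it can't be derived mechanically from the question.

```python
from dataclasses import dataclass, field


@dataclass
class TriviaItem:
    """One stored trivia item; all field names are invented for this sketch."""
    question: str
    answer: str
    distractors: list = field(default_factory=list)  # wrong options, if any
    clue: str = ""                     # hand-written answer-style clue
    response_prefix: str = "What is"   # or "Who is", chosen by the author


def as_question_and_answer(item):
    """Plain question/answer card, Trivial Pursuit style."""
    return f"Q: {item.question}\nA: {item.answer}"


def as_multiple_choice(item):
    """Question followed by up to four lettered options, sorted alphabetically."""
    options = sorted([item.answer] + item.distractors)
    lines = [item.question]
    for letter, option in zip("ABCD", options):
        lines.append(f"  {letter}. {option}")
    return "\n".join(lines)


def as_answer_then_question(item):
    """Clue first, response phrased as a question; None if no clue was written."""
    if not item.clue:
        return None
    return f"{item.clue}\n{item.response_prefix} {item.answer}?"
```

An item with no distractors or no clue still works for the plain format, so the optional fields decide which games a given question can be ported to.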

--
SpyDock: Scientific Python in a Docker container
Re:Boardgame/Parlor game data? by Cy Guy · 2002-11-12 10:32 · Score: 2
I like the idea a lot. And I have an idea for a starting point: rec.games.trivia. They run quizzes in the group, which should provide a substantial base.

I'd recommend converting the whole thing into XML with fields for:
- Major genre
- minor genre
- the question
- a hint
- some numerically scaled level of difficulty
- multiple choice options A, B, C, & D, and
- the correct answer
That should allow anyone who wants to build an interface a lot of leeway as to how they want to structure the quiz.
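
As a rough illustration of the field list above, here is a sketch in Python using the standard xml.etree.ElementTree module. The element and attribute names (`question`, `major_genre`, `choices`, and so on) are invented for the example and are not any existing trivia format.

```python
import xml.etree.ElementTree as ET


def question_to_xml(record):
    """Turn a dict with the fields listed above into a <question> element."""
    q = ET.Element("question", difficulty=str(record["difficulty"]))
    ET.SubElement(q, "major_genre").text = record["major_genre"]
    ET.SubElement(q, "minor_genre").text = record["minor_genre"]
    ET.SubElement(q, "text").text = record["question"]
    ET.SubElement(q, "hint").text = record["hint"]
    choices = ET.SubElement(q, "choices")
    for letter, option in record["choices"].items():
        ET.SubElement(choices, "choice", id=letter).text = option
    ET.SubElement(q, "answer").text = record["answer"]
    return q


def question_from_xml(elem):
    """Inverse of question_to_xml: read a <question> element back into a dict."""
    return {
        "major_genre": elem.findtext("major_genre"),
        "minor_genre": elem.findtext("minor_genre"),
        "question": elem.findtext("text"),
        "hint": elem.findtext("hint"),
        "difficulty": int(elem.get("difficulty")),
        "choices": {c.get("id"): c.text for c in elem.find("choices")},
        "answer": elem.findtext("answer"),
    }
```

An interface that only wants plain question-and-answer cards can ignore the `choices` and `hint` elements; a multiple choice quiz can use all of them.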

A long time ago, there used to be an IRC-based game that was run like Jeopardy. Don't know if the games were archived though.
--
Work for Change & GET PAID!

It should be clearly labeled by jki · 2002-11-11 19:49 · Score: 4, Insightful

> There's a lot of open-source code around, and generally, it's quite easy to find. Finding open source data, on the other hand, can be quite a pain

When you go to Google to find software to fill some specific need, you already know quite clearly how to search. The problem with finding "open data" is that there is currently no commonly used, clear label on such texts, research and articles. I tend to mention that the content is released under the GNU Free Documentation License, or FDL, when I want to release something to be freely utilized by anyone. One such case is, for example, the Amazon Discoveries series. Not that it would be of any use to anyone :) This problem is a bit related to the problem of releasing your idea or concept under such a license: there does not seem to be any clear practice for how to go about this, i.e. what to do if your idea might be unique but you do not want to patent it. We have that exact problem with, for example, the Openchallenge concept submissions. Any ideas on what practices to use in that case would help us out.

Timelines and the 'Necessary Web' by RobotWisdom · 2002-11-11 19:55 · Score: 4, Interesting

I think you'd lose more than you'd gain if you tried to centralise this process-- it's hard enough to keep a local webpage up-to-date.

I agree in theory that we need a Semantic Web where content is easier to find, but I don't think XML-etc can really help. [rant]

My current theory is that individuals need to build the 'Necessary Web' which consists, like an encyclopedia, of a page for each topic (or many pages by different authors, on their own websites). Four special traits make a page qualify as 'Necessary':

-- an attempt to be FAQ-like, and briefly cover all the important subtopics on a single page.

-- an attempt to sort thru and link all the best web-resources on the topic. (By reducing the linktext to one- or two-word [text buttons] you can fit hundreds of links into a useful page.)

-- a timeline, to present the most possible data in the neatest possible way. [theory]

-- The Open Web Content License to encourage others to recycle-and-update your content, requiring only that they clearly link your page as one of the original sources.

Most recent example of this format: Linux/Unix (timeline w/100s of links)

I believe that once a critical mass of authors adopt this format, taking on the most useful topics, there will be a rapid shift from the current search-frustrations to something very much like the Semantic-Web ideal, without even requiring any fancier technology than simple HTML.

not an easily distributed task by joe094287523459087 · 2002-11-11 22:00 · Score: 3, Insightful

it costs more money to get data than to make code.

WWW by Anonymous Coward · 2002-11-12 01:45 · Score: 0

The World Wide Web. Here's a library: http://promo.net/pg/ Go try google. Did you just make this question to see if you could get in?

ibiblio by hysterion · 2002-11-12 02:14 · Score: 2

Surprise. No one as yet seems to have mentioned ibiblio?

--
Timeo idiotikOS et dona ferentes

Electronic music databases by heikkih · 2002-11-12 02:19 · Score: 2, Interesting

When it comes to electronic music (house/techno/idm/electro etc) there are some excellent user-contributed databases out there.

Check out and add to:

Baseball by Gabey · 2002-11-12 02:44 · Score: 2

I realize this is Slashdot, but maybe there are other stats geeks out there that like baseball. Baseball-Reference is hands down the best stats site out there. And it's based on the Lahman database, which is freely available (newest version coming soon).

-Gabe

Obligatory "Homer J" reference by Anonymous Coward · 2002-11-12 03:18 · Score: 0

> the entire Internet is at your fingertips

AND, it's on computers now.

CRACKHEAD MODERATOR by Lavos · 2002-11-12 03:28 · Score: 0, Offtopic

Yeah, *THIS* post might be offtopic, but the above wasn't. That's the first post that I'd ever had moderated down, and this will likely be the second.

"Hmmmm, he mentioned MS in a fashion not suited to bashing, he must be moderated down." I bash MS quite often, but I will give them credit when it's due.

Why don't you reply so we can have a little discussion instead of hiding behind anonymity? Oh, never mind, I forgot that this is slashdot.

--
"Tax preparation software eliminates errors your[SIC] may make...." From IRS home page.

Re:CRACKHEAD MODERATOR by Anonymous Coward · 2002-11-12 06:22 · Score: 0

So let's see, you first post one way Offtopic comment, then you reply to yourself twice where the second reply is inflammatory. And you are mad at the Moderators? Such curious behavior from someone with such a low User #.

trivial matter by zogger · 2002-11-12 03:37 · Score: 2

--I play trivial pursuit once a week on irc with some friends. We do it ad hoc round robin style, and the good part is, you come up with your own questions. It's your "turn" as long as you stump people, whenever someone gets the answer, it's their turn, and so on. Seems to work great. You start a session by locking in the theme of the questions. Everyone gets a chance to both share knowledge they have, and also to learn from the others.

I guess if you wanted to get it in some sort of doc form you could use a session log and tweak it.

Re:trivial matter by spike2131 · 2002-11-12 04:10 · Score: 1

Where would I find such logs? What are the licensing restrictions on such things?

--
SpyDock: Scientific Python in a Docker container
Re:trivial matter by zogger · 2002-11-12 04:56 · Score: 2

--you do it yourself when you are in the channel, save the session-the conversation- to a logfile of your choice on your machine, or the OP does and you can get it from them. Try it out, go to an irc channel someplace on some subject that interests you then search your client IRC program you have running for whichever menu button you mash or command to type to save the session to file. The few different ones I've used all seem to have that function. I'm not an irc guru, but they are there. Perhaps one of the olden tymes unix guys here who've used irc for like forever can be more specific on the saving and tweaking part. The log files are a shade clunky but readable, you might want to cut and paste a lot to turn it into a normal easier to read html page, not sure if there's an automatic way to do that, but it's doable that way at least and not that hard unless it's a humongous log file, then it's just tedious..
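
For the "tweak it into an html page" part, something like the following Python sketch would probably do. It assumes the saved log has one message per line in the form `<nick> message`; real client logs often carry timestamps or other prefixes, so the pattern (and the made-up `log_to_html` name) would need adjusting to whatever your client actually writes.

```python
import html
import re

# Assumed log line shape: "<nick> message". Anything else is skipped.
LINE_RE = re.compile(r"^<(?P<nick>[^>]+)>\s*(?P<text>.*)$")


def log_to_html(log_text):
    """Convert chat lines of the assumed shape into a simple HTML list."""
    rows = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # joins, parts, blank lines and other non-chat noise
        nick = html.escape(match["nick"])
        text = html.escape(match["text"])
        rows.append(f"<li><b>{nick}</b>: {text}</li>")
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"
```

Escaping with html.escape matters here, since chat text can contain `<`, `>` and `&` that would otherwise break the page.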

Here's an example of where the logging function is found in x-chat, the client I use under linux, the path is > Settings > Setup > Options, you'll see the logging function then.

Now licensing ya got me, ask permission if it's not yours is the best bet. I can't see folks getting real anal over it, either, but ya never know. Probably just depends what you intend to do with it: share it around, they'll probably say have at it; try to develop a commercial product and sell it, you need more serious advice and a contract I guess. Like "yo, zeke, mind if I take this logfile and tweak it and come up with a nice page of trivial pursuit questions?" "Sure man goferit, send me the url when you finished I want a copy" "thanks" "swell". Beyond that you're gonna need a contract of some sort. Cash changes reality. Asking permission is always the safest bet, IMO.
Re:trivial matter by spike2131 · 2002-11-12 07:21 · Score: 1

always better to ask permission than forgiveness, i suppose.

--
SpyDock: Scientific Python in a Docker container

Plenty of good data from the government by jutulen · 2002-11-12 05:00 · Score: 1

There are plenty of government sources of data that are free and open to anyone: the Census Bureau, Energy Information Administration, Commerce Department. A good starting place is FirstGov.

In addition, most state governments and even county level governments publish large amounts of data.

--
"The old forget, the young don't know" --Japanese Proverb

UCI Repositories by scruffy · 2002-11-12 05:35 · Score: 2

Someone should mention the UCI Machine Learning Repository and the UCI Knowledge Discovery in Databases Archive.

CIA's World fact book by StalinJoe · 2002-11-12 07:16 · Score: 2, Informative

Lots of great links there, but you left out the CIA's World Fact Book. They publish as much as they can so that anyone (including their own agents) can access the needed information, from anywhere. World Fact Book http://www.cia.gov/cia/publications/factbook/index.html

--
"Those who cast the votes decide nothing; those who count the votes decide everything." - Josef Stalin

Found this on the web by spike2131 · 2002-11-12 11:58 · Score: 1

While looking for available domain names, I came across OpenTrivia.com. I like the spirit of what he is doing, though the license is a little more restrictive than what I would prefer.

Much as you suggested, it uses XML to allow people leeway on structuring a quiz. It doesn't offer multiple choices, though. That's not really a concern for the Trivial Pursuit application, but still, I'd like to have it as an option.

--
SpyDock: Scientific Python in a Docker container

Dear Slashdot: who will do my homework? by Anonymous Coward · 2002-11-12 12:16 · Score: 0

Hi Slashdot. I'm in junior high school and have a book report due tomorrow and I can't seem to get logged into cheater.com. I'm really pissed I can't find some l33t paperz on the web to hand in as my work, can you help?

for ubiquitous open data to happen... by zonker · 2002-11-12 14:35 · Score: 0

...you have to have open data types that are ubiquitous as well. i see a lot of the in-fighting and disagreement over the 'best way' to do things in the open source community as sometimes counterproductive to this end...

i am awaiting the obvious response that a beowulf cluster of xml will save the world...

--

Large print giveth, and the small print taketh away

Bioinformatics databases by cramped bowels · 2002-11-13 00:35 · Score: 1

A good starting URL - http://www.biotech.ubc.ca/bioinform.html#public

Way offtopic? by Lavos · 2002-11-13 07:00 · Score: 1

How the hell was the first comment offtopic? Oh, we aren't supposed to discuss stuff? The second reply came after the moderation, and I already said it was likely to be offtopic.

I'm sorry I've been around for a while and don't Karma Whore with everyone else. I post my opinion. Always. I don't cater to what I think will get posted up or down.

Use your points to mod interesting stuff up instead of wasting them by modding down stuff you don't agree with. Christ, there were how many trolls that weren't touched when my comment got modded down? My comment was the best one that could be found to be modded down?

This comment will be my third to get moderated down, and in the same thread no less. There are far more deserving comments that should have had the point used to be modded up.

Yeah yeah, YHBT, YHL, HAND, but I don't really care.

--
"Tax preparation software eliminates errors your[SIC] may make...." From IRS home page.

56 comments