Where's the Open Data?
blamanj asks: "There's a lot of open-source code around, and generally, it's quite easy to find. Finding open source data, on the other hand, can be quite a pain. Why isn't there a common repository for public domain data sets? I'm thinking of things like lists of world cities, dictionaries of stemmed words, population data, etc., etc."
Right Here.
MIT's SuperArchive
Grabbed the link off of rootprompt, in case any of you care.
NOAA provides bathymetry data and electronic navigation charts (vectorized), and NIMA (that's right, .mil; NIMA used to be the Defense Mapping Agency) provides city lists and populations for all the countries in the world, as well as DEMs (digital elevation models, i.e. gridded topography). The National Atlas project provides boundaries of federal lands, outlines of states, locations of major cities, stuff like that.
ENJOY!
Here's a great site: refdesk.com. Matt Drudge's father runs it, AFAIK, and it has a boatload of data links.
That kind of data is out there if you want it, but it isn't always in the format you want, and sometimes it's hard to find (less hard with Google). I started a site for this kind of data at www.tonsoflists.com, which contains data in MySQL tables that you can format into different kinds of sets, orderings, formats, etc. Just did it for fun, but haven't gotten any feedback.
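For what it's worth, once list data sits in a table, re-ordering and re-formatting it is usually the easy part. Here is a minimal Python sketch of that idea, not the code behind any particular site: the `ROWS` table, its column names, and the `export` helper are all hypothetical, and the values are placeholders rather than real figures.

```python
import csv
import io
import json

# Hypothetical rows standing in for one table of list data.
# Names and numbers are placeholders, not real figures.
ROWS = [
    {"city": "Alphaville", "country": "AA", "population": 1200},
    {"city": "Betatown", "country": "BB", "population": 45000},
    {"city": "Gammaburg", "country": "AA", "population": 300},
]


def export(rows, order_by, fmt="csv", descending=False):
    """Return rows sorted by one column and serialized as CSV or JSON."""
    ordered = sorted(rows, key=lambda r: r[order_by], reverse=descending)
    if fmt == "json":
        return json.dumps(ordered, indent=2)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(
            buf, fieldnames=list(rows[0].keys()), lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(ordered)
        return buf.getvalue()
    raise ValueError(f"unsupported format: {fmt}")


if __name__ == "__main__":
    # Same data, two orderings, two formats.
    print(export(ROWS, "population", fmt="csv", descending=True))
    print(export(ROWS, "city", fmt="json"))
```

The same table can feed as many output formats as you care to add a branch for; the hard part remains finding and cleaning the data in the first place.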
Nobody posted this: standardized data sets for training AI. It's a start, anyway, and useful for comparing one machine learning system to another. Maybe you could use it for something else?
Lots of great links there, but you left out the CIA's World Fact Book. They publish as much as they can so that anyone (including their own agents) can access the needed information from anywhere. World Fact Book: http://www.cia.gov/cia/publications/factbook/index.html
"Those who cast the votes decide nothing; those who count the votes decide everything." - Josef Stalin