On Finding Semantic Web Documents
Anonymous Coward writes "A research group at the University of Maryland has published a blog describing the latest approach to finding and indexing Semantic Web documents. They published it in reaction to the view of Peter Norvig (director of search quality at Google) on the Semantic Web (Semantic Web Ontologies: What Works and What Doesn't): 'A friend of mine [from UMBC] just asked can I send him all the URLs on the web that have dot-RDF, dot-OWL, and a couple other extensions on them; he couldn't find them all. I looked, and it turns out there's only around 200,000 of them. That's about 0.005% of the web. We've got a ways to go.'"
Every user of a LiveJournal-based website running recent code has a FOAF file. Let's look at how many users that is:
* LiveJournal.com: 5751567
* GreatestJournal.com: 717406
* DeadJournal.com: 474435
* Weedweb.net: 22650
* InsaneJournal.com: 12970
* JournalFen.net: 7629
* Plogs.net: 7086
* journal.bad.lv: 4530
(This list is most likely incomplete.)
In addition to this, every Typepad user has an account: according to the 6A merger stories, that's another million users. Add in the RDF from all the Typepad RSS files, and that's another 1 million.
All WordPress blogs have a feed, located at /feed/rdf or /wp-rdf.php, which is in RDF. Movable Type comes preinstalled with an RSS 1.0 feed. Each of these has at least a couple thousand users.
So, we've got, just as a guess, about 9 million RDF files out there in the blogging world alone. Throw in a hell of a lot of scientific data, and everything on RDFdata.org, and you start to get an idea that the world is a lot more Semantic Web enabled than you seem to think it is.
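For anyone who wants to check the back-of-the-envelope figure, here is a rough Python sketch of the tally. The per-site counts are the ones quoted in the list; the two Typepad values are the round one-million estimates from the comment, not measured numbers, and the variable names are made up for illustration. WordPress and Movable Type are left out because no concrete count is given for them.

```python
# Per-site user counts exactly as quoted in the list of LiveJournal-based sites.
livejournal_based = {
    "LiveJournal.com": 5751567,
    "GreatestJournal.com": 717406,
    "DeadJournal.com": 474435,
    "Weedweb.net": 22650,
    "InsaneJournal.com": 12970,
    "JournalFen.net": 7629,
    "Plogs.net": 7086,
    "journal.bad.lv": 4530,
}

# Round estimates taken from the comment (assumptions, not measurements).
typepad_accounts = 1_000_000   # "another million users"
typepad_rss_feeds = 1_000_000  # "another 1 million" from the Typepad RSS files

# One FOAF file per user on the LiveJournal-based sites.
lj_total = sum(livejournal_based.values())

# Add both Typepad estimates to get the overall guess.
grand_total = lj_total + typepad_accounts + typepad_rss_feeds

print(f"LiveJournal-based sites: {lj_total:,}")
print(f"With Typepad estimates:  {grand_total:,}")
```

The LiveJournal-based sites alone come to just under 7 million, and the two Typepad estimates bring the total to just under 9 million, which is where the "about 9 million" guess comes from.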
-- Christopher Schmidt