Six Degrees of Wikipedia
An anonymous reader notes that someone has applied the game Six Degrees of Kevin Bacon to the articles in Wikipedia. Instead of the relation being "in the same film," he used "is linked to by." From the blog post: "We'll call the 'Kevin Bacon number' from one article to another the 'distance' between them. It's then possible to work out the 'closeness' of an article in Wikipedia as its average distance to any other article. I wanted to find the centre of Wikipedia, that is, the article that is closest to all other articles (has minimum [distance])."
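For anyone who wants to play with the idea, here is a minimal sketch of "closeness" as described in the quote, computed with a breadth-first search over a link graph. The graph `toy_links` and the function names are made up for illustration, and two things are assumptions rather than anything the blog post confirms: distance is measured by following links outward from the article, and articles that cannot be reached are simply left out of the average.

```python
from collections import deque

def distances_from(graph, start):
    """Breadth-first search: clicks needed to reach each article from start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, ()):
            if target not in dist:
                dist[target] = dist[page] + 1
                queue.append(target)
    return dist

def closeness(graph, start):
    """Average distance from start to every other reachable article.

    Assumption: unreachable articles are ignored instead of counted as infinite.
    """
    dist = distances_from(graph, start)
    others = [d for page, d in dist.items() if page != start]
    return sum(others) / len(others) if others else float("inf")

# Toy link graph (hypothetical data): article -> articles it links to.
toy_links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}

# The "centre" is the article with the smallest average distance.
centre = min(toy_links, key=lambda page: closeness(toy_links, page))
print(centre, closeness(toy_links, centre))
```

On the full Wikipedia link graph this means one breadth-first search per article, so the real computation is presumably a lot heavier than this toy suggests.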
Ignoring obvious stuff like the main page, index, etc., is it not possible that there could be two articles that are not in the same transitive closure at all?
It's sometimes eerie to think of an idea and then see that someone has done it over the weekend and posted it on Slashdot.
Last Friday at work I was researching different chemicals on Wikipedia (a favorite pastime of mine) and thought it would be pretty neat if there was a way to find how related two articles were, or to have some way to query the links between two articles to find similarities.
What I really wanted was a very simple query. My SQL is very rusty, so a plain-English version might be 'show links where link exists in article_a and article_b'.
Is there a way to execute SQL queries on Wikipedia without having to actually download the entire database? I asked Google, but was presented with the SQL page on Wikipedia...
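Not an answer to the "without downloading" part, but the 'show links where link exists in article_a and article_b' idea comes out as a self-join. Here is a rough sketch using Python's built-in sqlite3 module. The table `links(src, dst)` is a hypothetical schema invented for this example, not Wikipedia's actual database layout, and the rows are toy data.

```python
import sqlite3

def common_links(conn, article_a, article_b):
    """Return link targets that appear in both article_a and article_b."""
    rows = conn.execute(
        """
        SELECT DISTINCT a.dst
        FROM links AS a
        JOIN links AS b ON a.dst = b.dst
        WHERE a.src = ? AND b.src = ?
        ORDER BY a.dst
        """,
        (article_a, article_b),
    )
    return [dst for (dst,) in rows]

# Hypothetical schema and toy rows, only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO links VALUES (?, ?)",
    [
        ("Benzene", "Carbon"),
        ("Benzene", "Hydrogen"),
        ("Benzene", "Aromaticity"),
        ("Toluene", "Carbon"),
        ("Toluene", "Hydrogen"),
        ("Toluene", "Methyl group"),
    ],
)

shared = common_links(conn, "Benzene", "Toluene")
print(shared)
```

The number of shared targets could serve as a crude "how related are these two articles" score, though that is just one possible measure.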
Slashdot's favorite Star Wars prequel actress Natalie Portman "... is among a very small number of professional actors with a defined Erdős–Bacon number."
...
Math AND movies. Mmmm
The distance going from Article A to Article B is not necessarily the same as from Article B to Article A. For example, the Slashdot page links to the HTTP page, but not vice versa. It would be interesting to know if he took that into consideration when counting links, or whether he would have counted it as one in either direction.
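To make the asymmetry concrete, here is a small sketch that measures the distance both ways on a directed link graph, and then again after treating every link as two-way. Only the Slashdot to HTTP link comes from the comment above. The other links in `toy_links` are invented for illustration, and which convention the original project used is not known here.

```python
from collections import deque

def distance(graph, source, target):
    """Fewest clicks from source to target, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        page, clicks = queue.popleft()
        for nxt in graph.get(page, ()):
            if nxt == target:
                return clicks + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, clicks + 1))
    return None

def undirected(graph):
    """Copy of the graph in which every link can be followed both ways."""
    both = {}
    for page, targets in graph.items():
        for target in targets:
            both.setdefault(page, set()).add(target)
            both.setdefault(target, set()).add(page)
    return both

# Toy directed graph: only Slashdot -> HTTP is taken from the comment,
# the remaining links are hypothetical.
toy_links = {
    "Slashdot": ["HTTP"],
    "HTTP": ["Internet"],
    "Internet": ["Slashdot"],
}

print(distance(toy_links, "Slashdot", "HTTP"))              # follows links as written
print(distance(toy_links, "HTTP", "Slashdot"))              # has to go the long way round
print(distance(undirected(toy_links), "HTTP", "Slashdot"))  # links counted in either direction
```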
01110000 01010111 01101110 00110011 01100100
While the results are interesting (I won't spoil it by posting the answers, although I'm sure someone else has already cut to the chase and done it), the way they arrived at their results is more interesting. I'm sure this could be extended to some pretty maps of what links where, or deep/shallow topics in different fields. I had tried to find the number of links between Kevin Bacon and Nuclear Physics, but it didn't like my input. Instead, I discovered that it takes 3 clicks to go from Bacon to Physics, passing through Columbia University and BDSM on the way.
Off-topic, but this is as good a place as any: There was a project hosted on some academic server a few years ago that linked song lyrics together. Clicking on the lyric 'creep' in the lyrics of the Radiohead song of the same title would bring up links to the TLC and Stone Temple Pilots songs of the same title, as well as any other song that used that word in their lyrics. Two songs that shared certain words would be linked by at most 2 clicks. I'm sure it has been buried in Google-cruft in the years since someone figured out that lyrics pages could be slurped up and turned into banner ad farms, but I had been thinking about how this could be re-implemented using a Wiki that would turn every word into a link and then link to a 'what links here' page. Does anyone know where this original project is or what happened to it? Any hints on re-implementing the behavior with a wiki?
fair.org counterpunch.com truthout.com indymedia.org salon.com
eff.org guerrilla.net debian.org gentoo.org
You're not the only one with this problem, I fear.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
The next thing to consider is that Wikipedia is produced by self-selecting contributors who are (necessarily) selective as to what facts (and what references) are to be used, making this a definitely non-random sample using incomplete data out of a population that may have unexpected biases.
What matters, then, is that even under heavily sub-optimal conditions, we are getting the same results as we'd expect from near-perfect data. What also matters is that the incompleteness of the data is not significantly perturbing the distance between any two articles. You would expect it to, but it doesn't.
The 6 degrees theory claims that everyone in the world is connected. That means you'd have to include every Wikipedia page in other languages as well, not just English.
I tested some random Japanese Wikipages and the test failed. I then tried some very common English pages and those failed as well: "Unknown article...". So I think their server might be having the /. effect.
In any case it doesn't look like they included other languages in their setup.
You want fun, go home and buy a monkey!
I've been delighted to find out the shortest path from A to B. Two clicks, through ASCII. So it's not a straight line, as people try to make us believe...
Tis women makes us love, Tis Love that makes us sad, Tis sadness makes us drink, And drinking makes us mad.
The path from Rob Zombie to Dusty Springfield isn't that long:
- Rob Zombie covered Blitzkrieg Bop by Ramones
- Ramones covered Surf City by Jan & Dean
- Jan & Dean covered Lightnin' Strikes by Lou Christie
- Lou Christie covered If Wishes Could Be Kisses by Dusty Springfield
http://covertrek.com/findLinksBetween.html
Funny that. Getting from Start to End takes five clicks.
Shortest path from start to end
Start
Start signal
Code
Computer printer
Black
End
5 clicks needed
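A listing like the one above can be produced by a breadth-first search that remembers which page each article was first reached from, then walks those pointers back from the target. This is only a sketch of that general technique, not the site's actual code. The graph `toy_links` contains just the chain from the output above, not real Wikipedia link data.

```python
from collections import deque

def shortest_path(graph, source, target):
    """Return one shortest click path from source to target, or None."""
    parent = {source: None}  # page -> page it was first reached from
    queue = deque([source])
    while queue:
        page = queue.popleft()
        if page == target:
            path = []
            while page is not None:
                path.append(page)
                page = parent[page]
            return path[::-1]
        for nxt in graph.get(page, ()):
            if nxt not in parent:
                parent[nxt] = page
                queue.append(nxt)
    return None

# Toy graph holding only the chain reported above (hypothetical link data).
toy_links = {
    "Start": ["Start signal"],
    "Start signal": ["Code"],
    "Code": ["Computer printer"],
    "Computer printer": ["Black"],
    "Black": ["End"],
    "End": [],
}

path = shortest_path(toy_links, "Start", "End")
print("\n".join(path))
print(f"{len(path) - 1} clicks needed")
```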