Six Degrees of Wikipedia


Posted by kdawson on Tuesday May 27, 2008 @09:44AM from the finding-the-center dept.

An anonymous reader notes that someone has applied the game Six Degrees of Kevin Bacon to the articles in Wikipedia. Instead of the relation being "in the same film," he used "is linked to by." From the blog post: "We'll call the 'Kevin Bacon number' from one article to another the 'distance' between them. It's then possible to work out the 'closeness' of an article in Wikipedia as its average distance to any other article. I wanted to find the centre of Wikipedia, that is, the article that is closest to all other articles (has minimum [distance])."

19 of 296 comments

Why wouldn't there be disjoint partitions? by Palmyst · 2008-05-27 09:54 · Score: 3, Interesting

Ignoring obvious stuff like the main page, index, etc., is it not possible that there could be two articles that are not in the same transitive closure at all?
1. Re:Why wouldn't there be disjoint partitions? by Intron · 2008-05-27 09:59 · Score: 4, Interesting
  
  In theory. I haven't found two articles with a separation greater than 4, tho.
  
  Orca
  Argentina
  Saxophone
  Oboe
  3 clicks needed
  
  --
  Intron: the portion of DNA which expresses nothing useful.
2. Re:Why wouldn't there be disjoint partitions? by stedo · 2008-05-27 10:25 · Score: 2, Interesting
  
  Yes, there are. Read the rest of TFA for exactly how this is handled, but the gist is: closeness of an article = [total length of all shortest paths from this article]/[number of articles reachable from here]. There are a couple of disjoint sets, but they don't actually affect the results much as they're all tiny (disambig pages, etc)
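
The closeness formula quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code from TFA: the toy link table is made up, the start article is not counted as "reachable", and an article that reaches nothing is given an infinite score.

```python
from collections import deque

def distances_from(links, start):
    """Breadth-first search: clicks needed from start to each reachable article."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in dist:
                dist[target] = dist[page] + 1
                queue.append(target)
    return dist

def closeness(links, start):
    """Total length of all shortest paths from start / number of articles reachable."""
    dist = distances_from(links, start)
    reachable = len(dist) - 1  # assumption: the start article itself is not counted
    if reachable == 0:
        return float("inf")    # assumption: dead-end articles score worst
    return sum(dist.values()) / reachable

# Hypothetical toy link table: article -> articles it links to.
toy_links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["A"],  # D links out, but nothing links to D
}

scores = {page: closeness(toy_links, page) for page in toy_links}
centre = min(scores, key=scores.get)  # the article with the smallest closeness
```

Dividing by the number of reachable articles, rather than by the total article count, is what keeps the small disjoint sets from skewing the result.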
3. Re:Why wouldn't there be disjoint partitions? by mfarah · 2008-05-27 10:36 · Score: 4, Interesting
  
  So far, my "personal best" has been 5 clicks:
  
  Shortest path from Pelagius of Asturias to Pham Nuwen
  
  Pelagius of Asturias
  Iberian Peninsula
  Africa
  Zheng He
  A Deepness in the Sky
  Pham Nuwen
  
  5 clicks needed
  
  I've found several others that require 5 links.
  
  I wish Stephen Dolan had posted which article(s) have the BIGGEST number as well...
  
  --
  "Trust me - I know what I'm doing."
  - Sledge Hammer
4. Re:Why wouldn't there be disjoint partitions? by Gat0r30y · 2008-05-27 10:43 · Score: 2, Interesting
  
  Those aren't linked from any other articles, but they do link to other Wikipedia articles. Since it's a directed graph he's using (from what I gathered), it would appear to me that these would only be disjoint in a one-way sort of style, i.e. you can get from A to B in a finite number of steps but you cannot get from B to A. He appears to measure the minimum distance. However, I was able to get these:
  
  Shortest path from Agassaim to bananas
  No path found
  
  However, that is not always the case for "orphaned" pages:
  
  Shortest path from Aldous to Gould
  Aldous
  Aldous Huxley
  1949
  Western Pacific Railroad
  Gould
  4 clicks needed
  
  And since he is using a directed graph:
  
  Shortest path from Gould to Aldous
  No path found
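
A minimal sketch of how a one-way path like the one quoted above can exist in a directed link graph. The `links` table is hypothetical, built only to mirror the Aldous/Gould result, and `shortest_path` is an illustrative name rather than anything from the tool itself.

```python
from collections import deque

def shortest_path(links, source, target):
    """Return the articles along a shortest click path, or None if no path exists."""
    previous = {source: None}  # maps each visited article to the one we came from
    queue = deque([source])
    while queue:
        page = queue.popleft()
        if page == target:
            path = []
            while page is not None:  # walk back to the source
                path.append(page)
                page = previous[page]
            return path[::-1]
        for nxt in links.get(page, ()):
            if nxt not in previous:
                previous[nxt] = page
                queue.append(nxt)
    return None

# Hypothetical link table: links only ever point "forward".
links = {
    "Aldous": ["Aldous Huxley"],
    "Aldous Huxley": ["1949"],
    "1949": ["Western Pacific Railroad"],
    "Western Pacific Railroad": ["Gould"],
    "Gould": [],
}

forward = shortest_path(links, "Aldous", "Gould")   # five articles, so 4 clicks
backward = shortest_path(links, "Gould", "Aldous")  # None: "No path found"
clicks = len(forward) - 1
```

Breadth-first search follows links only in the direction they point, which is why the reverse query can fail even though the forward one succeeds.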
  
  --
  Prediction: The real iPhone killer is going to be sex robots from Japan. Think about it.
5. Re:Why wouldn't there be disjoint partitions? by Redacted · 2008-05-27 11:04 · Score: 2, Interesting
  
  Shortest path from Nikon D300 to Ossa
  
  No path found
  
  What do I win?
6. Re:Why wouldn't there be disjoint partitions? by Anonymous+Cowpat · 2008-05-27 11:11 · Score: 2, Interesting
  
  well, according to "Complexity vs. stability in small-world networks." (Sitabhra Sinha. Journal of Physics. 2004):
  
  The number of (bi-directional) links per node, k, must be >> ln(N), where N is the number of nodes, to avoid a fragmented network (assuming undirected link distribution).
  So: figure out the number of pages (nodes) in Wikipedia, take the natural log, and double it, since each bi-directional link counts as two one-way links. You would need many more links per page than that to avoid fragmentation.
  
  So, you need much more than ~29 links per node to ensure no fragmentation.
  That leads me to conclude that there are well over 61.2m individual inter-article links on wikipedia.
  I wonder if that's accurate.
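
For anyone who wants to redo the arithmetic above, here is a rough sketch. The article count is an assumption back-solved from the comment's own ~29 and 61.2m figures; it is not stated in the comment or in the story.

```python
import math

n_articles = 2_110_000  # assumption: roughly 2.1 million articles, inferred from the figures above

threshold = math.log(n_articles)  # ln(N), the bound on bi-directional links per node
per_page = 2 * threshold          # doubled to count one-way links; about 29
total = n_articles * per_page     # one-way links across the whole site at that bound

print(f"ln(N) = {threshold:.1f}, links per page = {per_page:.1f}, total = {total / 1e6:.1f}m")
```

Keep in mind that the condition is k >> ln(N), so these numbers are floors that have to be exceeded by a wide margin, not estimates of the real link count.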
  
  Also, I thought of that algorithm first and it's called HPSAUCE!
  
  --
  FGD 135
7. Re:Why wouldn't there be disjoint partitions? by MrAnnoyanceToYou · 2008-05-27 11:19 · Score: 2, Interesting
  
  Meh. There's a path I found manually in about a minute. There's probably a shorter one, though;
  Ossa
  Motorcycle
  Toyota
  Honda
  Nikon
  Nikon D300
  
  --
  My little site.
Where All... by TheLazySci-FiAuthor · 2008-05-27 09:54 · Score: 3, Interesting

It's sometimes eerie to think of an idea and then see that someone has done it over the weekend and posted it on slashdot.

Last Friday at work I was researching different chemicals on Wikipedia (a favorite pastime of mine) and thought it would be pretty neat if there was a way to find how related two articles were, or to have some way to query the links between two articles to find similarities.

What I really wanted was a very simple query. My SQL is very rusty, so a plain English version might be 'show links where link exists in article_a and article_b'.

Is there a way to execute SQL queries on Wikipedia without having to actually download the entire database? I asked Google, but was presented with the SQL page on Wikipedia....
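
The plain English query above ('show links where link exists in article_a and article_b') amounts to a set intersection. A minimal sketch, assuming the outgoing links of each article have already been extracted somehow; the link names below are placeholders, not real Wikipedia data.

```python
# Hypothetical outgoing-link sets for two articles (placeholder names only).
outgoing_links = {
    "article_a": {"Link 1", "Link 2", "Link 3", "Link 4"},
    "article_b": {"Link 2", "Link 4", "Link 5"},
}

def shared_links(links, first, second):
    """Links that appear in both articles, sorted for stable output."""
    return sorted(links[first] & links[second])

common = shared_links(outgoing_links, "article_a", "article_b")
print(common)
```

This does not answer the question of how to get the link data without downloading the database; it only shows that the query itself is cheap once the two link lists are in hand.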

--
Read my Very Short "Stories"
Interestingly by Anonymous Coward · 2008-05-27 09:58 · Score: 1, Interesting

Slashdot's favorite Star Wars Prequel actress Natalie Portman "... is among a very small number of professional actors with a defined Erdős–Bacon number."

Math AND movies. Mmmm ...
Link distance by ninjapiratemonkey · 2008-05-27 10:01 · Score: 5, Interesting

The distance going from Article A to Article B is not necessarily the same as from Article B to article A. For example, the Slashdot page links to the HTTP page, but not vice versa. It would be interesting to know if he took that into consideration when counting links, or whether he would have counted it as one in either direction.

--
01110000 01010111 01101110 00110011 01100100
1. Re:Link distance by jd · 2008-05-27 10:34 · Score: 3, Interesting
  
  In mathematical terms, this makes Wikipedia a non-simply-connected space. This has two consequences. Firstly, it makes the topology much harder to describe. Secondly, it means that topologists should have enough research material to write books and papers on the dynamics of Wikispace for years to come.
  
  --
  It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
From Bacon to Physics, 3 clicks. by certron · 2008-05-27 10:04 · Score: 2, Interesting

While the results are interesting (I won't spoil it by posting the answers, although I'm sure someone else has already cut to the chase and done it), the way they arrived at their results is more interesting. I'm sure this could be extended to some pretty maps of what links where, or deep/shallow topics in different fields. I had tried to find the number of links between Kevin Bacon and Nuclear Physics, but it didn't like my input. Instead, I discovered that it takes 3 clicks to go from Bacon to Physics, passing through Columbia University and BDSM on the way.

Off-topic, but this is as good a place as any: There was a project hosted on some academic server a few years ago that linked song lyrics together. Clicking on the lyric 'creep' in the lyrics of the Radiohead song of the same title would bring up links to the TLC and Stone Temple Pilots songs of the same title, as well as any other song that used that word in their lyrics. Two songs that shared certain words would be linked by at most 2 clicks. I'm sure it has been buried in Google-cruft in the years since someone figured out that lyrics pages could be slurped up and turned into banner ad farms, but I had been thinking about how this could be re-implemented using a Wiki that would turn every word into a link and then link to a 'what links here' page. Does anyone know where this original project is or what happened to it? Any hints on re-implementing the behavior with a wiki?
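
On the re-implementation question, one possible starting point is an inverted index from words to songs, which is roughly what a 'what links here' page per word would give you. This is a sketch under assumptions: the song names and lyrics are invented placeholders, and `build_word_index` and `songs_sharing_word` are hypothetical helpers, not part of any wiki software.

```python
from collections import defaultdict

def build_word_index(lyrics_by_song):
    """Map each lower-cased word to the set of songs whose lyrics contain it."""
    index = defaultdict(set)
    for song, lyrics in lyrics_by_song.items():
        for raw in lyrics.lower().split():
            word = raw.strip(".,!?;:'\"")  # drop surrounding punctuation
            if word:
                index[word].add(song)
    return index

def songs_sharing_word(index, song, word):
    """Other songs reachable from `song` by clicking on `word`."""
    return sorted(index.get(word.lower(), set()) - {song})

# Invented placeholder lyrics, not taken from any real song.
toy_lyrics = {
    "Song A": "I saw a creep tonight",
    "Song B": "Creep, creep along the wall",
    "Song C": "nothing in common here",
}

index = build_word_index(toy_lyrics)
neighbours = songs_sharing_word(index, "Song A", "creep")
```

Click 1 goes from a lyric to the word's page, click 2 from there to another song, which matches the "at most 2 clicks" behaviour described above.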

--

fair.org counterpunch.com truthout.com indymedia.org salon.com
eff.org guerrilla.net debian.org gentoo.org
"What is the use... by jd · 2008-05-27 10:17 · Score: 2, Interesting

...in staying up all night arguing over whether there is or isn't a God, if the machine only gives you his bleedin' phone number in the morning!"
You're not the only one with this problem, I fear.

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Well, that depends. by jd · 2008-05-27 10:32 · Score: 2, Interesting

Six degrees of separation is an easily misunderstood concept, so it is important that what people are looking for is also what they think they are looking for.
The next thing to consider is that Wikipedia is produced by self-selecting contributors who are (necessarily) selective as to what facts (and what references) are to be used, making this a definitely non-random sample using incomplete data out of a population that may have unexpected biases.
What matters, then, is that even under heavily sub-optimal conditions, we are getting the same results as we'd expect from near-perfect data. What also matters is that the incompleteness of the data is not significantly perturbing the distance between any two articles. You would expect it to, but it doesn't.

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
What about language? by kylehase · 2008-05-27 10:38 · Score: 5, Interesting

The 6 degrees theory claims that everyone in the world is connected. That means you'd have to include every Wikipedia page in other languages as well, not just English.

I tested some random Japanese Wikipages and the test failed. I then tried some very common English pages and those failed as well "Unknown article...". So I think their server might be having the /. effect.

In any case it doesn't look like they included other languages in their setup.

--
You want fun, go home and buy a monkey!
Re:I know the center by Slashidiot · 2008-05-27 20:04 · Score: 2, Interesting

I've been delighted to find out the shortest path from A to B. Two clicks, through ASCII. So it's not a straight line, as people try to make us believe...

--
Tis women makes us love, Tis Love that makes us sad, Tis sadness makes us drink, And drinking makes us mad.
Another type of six degrees of freedom... by teapot7 · 2008-05-27 22:38 · Score: 2, Interesting

It exists for rock/pop/whatever music and cover versions too:
The path from Rob Zombie to Dusty Springfield isn't that long:
- Rob Zombie covered Blitzkrieg Bop by Ramones
- Ramones covered Surf City by Jan & Dean
- Jan & Dean covered Lightnin' Strikes by Lou Christie
- Lou Christie covered If Wishes Could Be Kisses by Dusty Springfield
http://covertrek.com/findLinksBetween.html
Re:I know the center by Fumus · 2008-05-28 00:41 · Score: 3, Interesting

Funny that. Start to end has five clicks needed.

Shortest path from start to end
Start
Start signal
Code
Computer printer
Black
End
5 clicks needed