Slashdot Mirror


The Math Behind PageRank

anaesthetica writes "The American Mathematical Society is featuring an article with an in-depth explanation of the type of mathematical operations that power PageRank. Because about 95% of the text on the 25 billion pages indexed by Google consists of the same 10,000 words, determining relevance requires an extremely sophisticated set of methods. And because the links constituting the web are constantly changing and updating, the relevance of pages needs to be recalculated on a continuous basis."

131 comments

  1. 10,000 words by ambivalentduck · · Score: 5, Funny

    But 9,000 of those words are slang for parts of the human anatomy.  Go figure.

    1. Re:10,000 words by Anonymous Coward · · Score: 0, Insightful

      Dear The Zon,

      You are not funny. Quit your lame endless attempts at humor.

      The Other 1.05 million readers

    2. Re:10,000 words by MadAhab · · Score: 1

      On the other hand: explain Gallagher and Carrot Top. "Apparently" they are funny, because they have "careers". Yet everyone with an actual sense of humor knows they are just waiting to unhinge their jaws and swallow you whole.

      --
      Expanding a vast wasteland since 1996.
    3. Re:10,000 words by Anonymous Coward · · Score: 0

      The confusing thing about retro-active signature changes is that in the discussion where it is changed, you can't immediately see what came first.

      For a moment there, I thought AC was quoting The Zon's sig...

    4. Re:10,000 words by binaryacid · · Score: 1
      Not directly related to this reply, but putting it here for visibility. Not self-promotion. Just would like to provide some useful reference:

      The Anatomy of a Large-Scale Hypertextual Web Search Engine
      http://infolab.stanford.edu/~backrub/google.html
      - This paper tells you what PageRank really is, by the original author.

      Efficient Computation of PageRank
      http://dbpubs.stanford.edu:8090/pub/1999-31
      - This paper tells you how they efficiently compute it

      And as far as I know about information retrieval, the magic you see on Google today isn't primarily contributed by PageRank, since people fake it so much nowadays with domain farms.

    5. Re:10,000 words by Anonymous Coward · · Score: 0

      The Eskimo had 600 words for ice...

  2. PageRank doesn't seem to be based on keywords by dada21 · · Score: 4, Informative

    I have sites with a PR of 6, and I can tell you that they got that way because of inbound links from other sites. In fact, when other sites dropped those links, my PR dropped (to 5, and even to 4). Getting more inbound links brought the PR back.

    Think about those links, too. How often do you use common words in an HREF? I don't think there's a lot of weeding out of common words since the link to a site is usually either its name, or a description containing some important keywords.

    I love seeing these technoscientists think they understand PageRank, but just like TimeCube, they're way, way off.

    1. Re:PageRank doesn't seem to be based on keywords by pilkul · · Score: 0, Offtopic

      I love slashdot fools that post as quickly as possible to have a better chance of being moderated up, but they don't RTFA and are way, way off.

    2. Re:PageRank doesn't seem to be based on keywords by markov_chain · · Score: 2, Informative

      There has been a PageRank paper out there since 2000 or so, so it's not exactly a secret how it works. Basically an initial set of relevant pages is pulled from the database and ranked by doing some computation on a connectivity matrix. The trick is to come up with a good initial set; and unless they managed to implement an all-knowing oracle they probably do it by doing a keyword search. Here's where the article summary makes sense; if most pages have the same keywords, a keyword search is going to come up with an awfully large initial set.

      The article might have details, maybe someone who has actually read it can fill in :)

      --
      Tsunami -- You can't bring a good wave down!
    3. Re:PageRank doesn't seem to be based on keywords by Anonymous Coward · · Score: 3, Interesting

      It's not secret.

    4. Re:PageRank doesn't seem to be based on keywords by kimvette · · Score: 4, Funny
      --
      The Christian Right is Neither (Christian nor right). See: Matthew 23, Matthew 25, Ezekiel 16:48-50
    5. Re:PageRank doesn't seem to be based on keywords by zootm · · Score: 1

      If you're referring to the article, it focuses on the "links" aspect when describing the PageRank algorithm. The summary on here is pretty misleading in that way.

    6. Re:PageRank doesn't seem to be based on keywords by Pollardito · · Score: 1
      Think about those links, too. How often do you use common words in an HREF?
      interestingly, it appears that Adobe Acrobat leads the list of results when you search for "here" on Google (you can download it here).

      and who would have expected this
    7. Re:PageRank doesn't seem to be based on keywords by MyEyesTheyBurn · · Score: 0

      Inbound links definitely help, although I think it's a combination of a dozen other things as well. We changed one of our sites' front page to contain more of our "key words" and within a few weeks shot up to the first 1-3 pages on Google for a variety of keywords.

    8. Re:PageRank doesn't seem to be based on keywords by Raenex · · Score: 1

      I'm behind on my Slashdot reading, but I wanted to offer you a supportive comment even if it isn't timely. You're right, the original poster only read the summary and got modded up for a stupid comment based on not RTFA.

      That said, your comment contained more insult than explanation (yeah, he didn't RTFA, but point out the discrepancy in his argument). The more inflammatory your message, the less likely it will be considered. I know, it's tempting to flame, and I do it myself now and then, but not nearly as much as I used to. It's amazing how much better a response you get by just sticking to the argument and not attacking the person, even if it means ignoring insults thrown your way.

  3. Pagerank is cool by pap3rw8 · · Score: 0, Interesting

    Whatever google's doing with PageRank, it seems to be doing it right. At least from my experience.

    1. Re:Pagerank is cool by silentounce · · Score: 5, Interesting

      Interestingly enough, google thinks so, too.

      Of course, yahoo has its own opinion.
       
      Although, altavista seems to almost agree. Check the second non-advertised result.
       
      I do find this amusing though. Third place, how humble.
       
      I didn't expect such interesting results. The site with the search term in its url was tops for av and yahoo, but not google. Yahoo ranked the wiki entry above google, but av reversed that decision, google of course thought itself was more important than the wiki. Google's own reference site was number one in its own search and near the top in the other two, but pagerank.net wasn't even in the top 10 for google's search. I'm not sure what conclusions can be drawn from all that, but it is definitely food for thought.

      --
      There are many tongues to talk, and but few heads to think. -Victor Hugo
    2. Re:Pagerank is cool by ben+there... · · Score: 1
      I do find this amusing though. Third place, how humble.

      What I found interesting about that link was the description listed for google's entry:
      Google - 11:54pm
      Enables users to search the Web, Usenet, and images. Features include PageRank, caching and translation of results, and an option to find similar pages.
      www.google.com/ - 5k - Dec 5, 2006 - Cached - Similar pages

      Where did they get that text from? It's not anywhere to be found in the source. Did they cheat? Or are they just tricky?
    3. Re:Pagerank is cool by McDutchie · · Score: 1
      Where did they get that text from? It's not anywhere to be found in the source. Did they cheat? Or are they just tricky?

      They got it from the Google category at the Open Directory Project at dmoz.org, mirrored at directory.google.com. Google is a user of dmoz.org data but has completely de-emphasized that as of late.

      It's actually against the dmoz license agreement to use their data without a link back to the source, but nobody seems to care.

    4. Re:Pagerank is cool by flood6 · · Score: 1

      Whenever possible, Google uses the DMOZ description for the snippet shown in the results.

    5. Re:Pagerank is cool by krishn_bhakt · · Score: 1

      Things change with key word "search engine"... MSN is 1st! [ http://www.google.com/search?hl=en&lr=&q=search+engine ]

      --
      The Answer Lies in The Genome
    6. Re:Pagerank is cool by MotF+Bane · · Score: 1

      Use the Google search, try for "best search engine". They don't list themselves. Then go try Yahoo search, try for "best search engine"....

  4. Bad summary by Knights+who+say+'INT · · Score: 5, Interesting

    The article specifically says the PageRank eigenvector is only recalculated once a month, approximately. Even though Google uses some clever numerics to calculate the eigenvectors to a 25 billion by 25 billion matrix by iteration, it still takes several hours to finish.

    1. Re:Bad summary by The+Zon · · Score: 2, Funny

      Even though Google uses some clever numerics to calculate the eigenvectors to a 25 billion by 25 billion matrix by iteration, it still takes several hours to finish.

      Please. I can do that on paper in, like, five minutes.

      --
      Some attitudes replaced or by cgi optimizes
    2. Re:Bad summary by Firehed · · Score: 1

      Several hours for 25b x 25b? Jeez, it took Slashdot the better part of a day to update the comment id field type in their database... 16.7m by 1. OSTG, we demand that the servers running Slashdot be upgraded to something that could actually withstand a Slashdotting!

      --
      How are sites slashdotted when nobody reads TFAs?
    3. Re:Bad summary by Anonymous Coward · · Score: 0

      It takes several hours to compute and yet it is only run once a month or so?

    4. Re:Bad summary by Anonymous Coward · · Score: 0

      OK you ARE new here. Us old-timers did not use to read the article. But all these young'uns have started the tradition of not reading the summary also. You broke both the codes! The norm is to just read the mis-leading headline, make up a story in your head and comment on that.

    5. Re:Bad summary by trentblase · · Score: 1

      Please, I can do that in my mind in, like, 5 seconds.

    6. Re:Bad summary by martin-boundary · · Score: 4, Insightful
      It's nowhere near like that. A web matrix is very sparse, so if you did a true 25Bx25B matrix power iteration, you'd be multiplying zero by zero a gazillion times. Optimization is about not doing things you don't need to do, and optimizing PageRank is about figuring out clever ways to not do the full multiplication. Moreover, PageRank is calculated in parallel over a computer farm. Overall, you can expect a single iteration to take on the order of an hour, and you can expect around 50-80 iterations before Google gives up and says it's converged. You can also try and reuse the previous "converged" PageRank vector to cut down on the 50-80 iterations after you've crawled new pages.

      If google used a single computer to do all the work, and truly did 80*25B^2 operations, they'd be morons.
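
      A minimal sketch of that idea, assuming adjacency lists for the link structure, a damping factor of 0.85, and evenly spread rank for pages with no out-links (all three are illustrative choices, not Google's actual code). The inner loop visits only links that exist, so the zero entries of the matrix are never touched, and `start` lets a previously converged vector seed the next run.

```python
def pagerank_sparse(out_links, damping=0.85, tol=1e-10, start=None, max_iter=200):
    """Power iteration over adjacency lists; only existing links are visited."""
    pages = sorted(out_links)
    n = len(pages)
    if start is None:
        rank = {p: 1.0 / n for p in pages}
    else:
        # Warm start: reuse old values, give unseen pages 1/n, then renormalize.
        raw = {p: start.get(p, 1.0 / n) for p in pages}
        total = sum(raw.values())
        rank = {p: v / total for p, v in raw.items()}

    for iteration in range(1, max_iter + 1):
        # Rank sitting on pages without out-links is spread evenly (assumption).
        dangling = sum(rank[p] for p in pages if not out_links[p])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}
        for src in pages:
            targets = out_links[src]
            for dst in targets:  # work is proportional to the number of links
                new_rank[dst] += damping * rank[src] / len(targets)
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:
            return rank, iteration
    return rank, max_iter


# Hypothetical four-page web, then the same web after "crawling" one new page.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
cold, cold_iters = pagerank_sparse(web)

web2 = dict(web, e=["a"])
warm, warm_iters = pagerank_sparse(web2, start=cold)   # seeded with old vector
fresh, fresh_iters = pagerank_sparse(web2)             # started from scratch
print(cold_iters, warm_iters, fresh_iters)
```

      Both runs on `web2` end at the same vector; printing the iteration counts shows how much, if anything, the warm start saves on a given graph.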

    7. Re:Bad summary by Anonymous Coward · · Score: 0

      What size paper? Assuming, for example, each cell of the matrix is 1mm square, that's a piece of paper about 15000 miles square meaning its area is 241,313,849 square miles, and the surface of the land on earth is about 93,000,000 square miles. You're saved if you're using waterproof paper since the total SA of earth is around 320,000,000 square miles.

      (Posting AC because I probably screwed the math up very badly and I have a fragile ego)
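
      For anyone who wants to check the arithmetic, a quick sketch. The 1 mm cell size comes from the comment; the Earth figures in the code are commonly cited approximations (an assumption on my part) and are smaller than the ones quoted above.

```python
MM_PER_MILE = 1_609_344  # 1 mile = 1609.344 m

cells_per_side = 25_000_000_000              # 25 billion rows and columns
side_miles = cells_per_side / MM_PER_MILE    # each cell is 1 mm wide
area_sq_miles = side_miles ** 2

# Commonly cited approximate figures (assumptions, not taken from the comment).
EARTH_LAND_SQ_MILES = 57.5e6
EARTH_TOTAL_SQ_MILES = 197e6

print(f"side: {side_miles:,.0f} miles")
print(f"area: {area_sq_miles:,.0f} square miles")
print("fits on land:", area_sq_miles <= EARTH_LAND_SQ_MILES)
print("fits on whole surface:", area_sq_miles <= EARTH_TOTAL_SQ_MILES)
```

      The side length and area agree with the figures in the comment; with these surface-area estimates the sheet does not fit, waterproof or not.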

    8. Re:Bad summary by Patent-Monkey · · Score: 1

      Interestingly, Google does a lot of reindexing using existing searches and then builds upon a search listing and a page indexing review. For example in US Patent 6,526,440, "The search engine obtains an initial set of relevant documents by matching a user's search terms to an index of a corpus. A re-ranking component in the search engine then refines the initially returned document rankings so that documents that are frequently cited in the initial set of relevant documents are preferred over documents that are less frequently cited within the initial set."

      I agree that they are doing a lot of things to avoid computational load.
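
      A rough sketch of the re-ranking step as the quoted passage describes it: count, for each document in the initial set, how many other documents in that same set cite it, and prefer the frequently cited ones. The function name, document ids and link data are made up for illustration, and the patent may well weight citations differently.

```python
def rerank_by_local_citations(initial_set, links):
    """Order documents by how many other documents in initial_set cite them."""
    members = set(initial_set)
    cited = {doc: 0 for doc in initial_set}
    for src in initial_set:
        for dst in set(links.get(src, [])):
            # Links leaving the initial set, and self-links, are ignored.
            if dst in members and dst != src:
                cited[dst] += 1
    # sorted() is stable, so ties keep their original (keyword-match) order.
    return sorted(initial_set, key=lambda doc: -cited[doc])


# Hypothetical initial result set from a keyword match, plus outgoing links.
initial = ["d1", "d2", "d3", "d4"]
links = {"d1": ["d3"], "d2": ["d3", "d4", "x9"], "d3": [], "d4": ["d3", "d4"]}
reranked = rerank_by_local_citations(initial, links)
print(reranked)  # ['d3', 'd4', 'd1', 'd2']
```

      The work scales with the size of the initial set rather than with the whole index, which fits the point about avoiding computational load.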

    9. Re:Bad summary by sasdrtx · · Score: 1

      42

      --
      Most people don't even think inside the box.
  5. Nouns maybe? by Bryansix · · Score: 3, Insightful

    It seems like it would be the nouns, pronouns, etc. that Google should be paying attention to. Who cares about all the verbs, adjectives, etc. that just muddy the indexing waters?

    1. Re:Nouns maybe? by kramulous · · Score: 1

      I believe that a race is on at the moment for semantic searching. Not only nouns, verbs etc, but whether the phrases are subjective or objective. I know a blog search company that is working on this. They wanted to borrow some of my code.

      --
      .
    2. Re:Nouns maybe? by abshnasko · · Score: 2, Insightful

      Searching for pill and the pill should yield very different results. Yes nouns are more important, but articles and other words cannot be disregarded.

    3. Re:Nouns maybe? by WaXHeLL · · Score: 1

      RTFA please. It deals with determining relevance, not the optimal method of indexing pages.

      In regards to your comment:
      Verbs play an extremely important role when dealing with relevancy based on phrases.

      The small snippet that was posted was just cut and pasted from the opening hook of the article. It just leads into a mathematical discussion how to sort through the thousands of results that are returned.

      --
      The troll with karma.
    4. Re:Nouns maybe? by gfody · · Score: 1

      The is a stop word and will most likely be excluded from your search term.

      --

      bite my glorious golden ass.
    5. Re:Nouns maybe? by Anonymous Coward · · Score: 0

      Try "The Who" vs the who.

    6. Re:Nouns maybe? by WaXHeLL · · Score: 1

      It's not entirely excluded.

      An index of "the pill" and "pill" are two different queries because matching the whole phrase will get you more relevant results. This is built into the code that interprets queries (this is completely different from PageRank, which deals with cross linking between sites to get the highest probability of relevance -- AFTER the query is interpreted and a set of pages is generated). Almost all search engines work that way.

      --
      The troll with karma.
    7. Re:Nouns maybe? by Anonymous Coward · · Score: 0

      you're right. stop words are so 1990

    8. Re:Nouns maybe? by svindler · · Score: 1

      So if I want to look for dwarf throwing I'll have to wade through all dwarf related pages because throwing is not relevant for the pagerank?

    9. Re:Nouns maybe? by Bryansix · · Score: 1

      I actually thought about that after I posted. I know all the words are important for indexing. I'm just saying that looking at keywords and placing more importance on those is a part of the mix too. Those keywords are almost always nouns.

  6. A bit late? by kramulous · · Score: 1

    I read about this some time ago ... I think the paper was entitled "The 10 billion dollar Eigenvector: The math behind google" or something to that effect. Sorry, but I've got a new laptop and cannot find the exact title. It was an excellent introduction for beginner computational scientists to an application of the eigenvector. I forget the American university responsible.

    --
    .
    1. Re:A bit late? by Anonymous Coward · · Score: 0

      The paper was in SIAM Review -- it's referenced at the bottom of the page that prompted this thread.

    2. Re:A bit late? by mochan_s · · Score: 1

      Here's the bibtex reference.

      @article{bryan:569,
      author = {Kurt Bryan and Tanya Leise},
      collaboration = {},
      title = {The $25,000,000,000 Eigenvector: The Linear Algebra behind Google},
      publisher = {SIAM},
      year = {2006},
      journal = {SIAM Review},
      volume = {48},
      number = {3},
      pages = {569-581},
      keywords = {linear algebra; PageRank; eigenvector; stochastic matrix},
      url = {http://link.aip.org/link/?SIR/48/569/1},
      doi = {10.1137/050623280}
      }

    3. Re:A bit late? by kramulous · · Score: 1

      Cheers for that. Helpful.

      Kind regards

      --
      .
  7. I joke a lot on Slashdot, but serious question by CrazyJim1 · · Score: 2, Interesting

    I skimmed the article and didn't find what I wanted to find. If you make a webpage that you want ranked high, what do you do? Do you make 100 geocities accounts and provide links to your main website, or what? I'm just wondering this out of curiosity, not out of need.

    1. Re:I joke a lot on Slashdot, but serious question by Larry+Lightbulb · · Score: 1

      At a very basic level a site's page rank is a reflection of how relevant other sites think it is, and is based on how important the sites are that link to it. Get a link from the BBC, CNN, or somewhere like that and it's worth thousands or millions of links from Geocities sites.

    2. Re:I joke a lot on Slashdot, but serious question by mojodamm · · Score: 1

      That's kinda what I thought at first as well, but looking over the lower two-thirds of the article, I started to get a different impression. They talked about a 'strong web' idea, where if your webpage is disconnected from the 'main' web and set up in a sort of 'secondary web' with just your Geocities accounts, for instance, linking to it, then the actual websites that interconnected within your site matrix would rank a 0 overall.

      Not sure if this is correct or not, just the impression that I got from what little of the TFA I could skim at work...

      --
      I'd rather be an ignorant moron than an anonymous coward.
    3. Re:I joke a lot on Slashdot, but serious question by x_MeRLiN_x · · Score: 1

      That wouldn't work, because they'd all be coming from the same domain.

    4. Re:I joke a lot on Slashdot, but serious question by Anonymous+Brave+Guy · · Score: 5, Informative

      The underlying idea behind page rank is pretty well-exposed at this point, and is described in TFA. Essentially, it's a big set of simultaneous equations: each incoming link to your page gets a score that is roughly the rank of the source page divided by the number of outgoing links on that page, and then the rank of your page is roughly the sum of the scores of all incoming links.

      Various fudge factors are introduced along the way. For example, if you break Google's rules about displaying the same content to bots as to humans, you can get slapped right down. More subtly, newly registered domains take a modest hit for a while. More nobody-knows-ly, Google's handling of redirects is unclear: information about exactly what adjustments are made is pretty scarce, and there's a lot of conjecture around. One thing that's pretty certain is that they penalise for duplicate content, which is why some webmasters do apparently unnecessary things like redirecting http://www.theircompany.com/ to http://theircompany.com/ or vice versa.

      So, if you want to get a page with a high rank yourself, then ideally you would get many established, highly-ranked pages to link to your page and no others. In your example, all those Geocities sites wouldn't help a lot, because (a) they'd have negligible rank themselves, and (b) they'd be penalised for being new and lose some of that negligible rank before they even started. Many times negligible is still negligible, and so would be your target page's rank. OTOH, get a few links from university sites, big news organisations and the like, and your rank will suddenly be way up there. Alternatively, get a grass-roots movement going where a gazillion individuals with small personal sites link to you, and the cumulative effect will kick in.
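
      To make the "simultaneous equations" part concrete, here is a small sketch of just the basic rule from the first paragraph, with none of the fudge factors. The three-page web and the function name are invented for illustration; repeating the round until the numbers stop changing is one simple way to solve the equations on a graph this small.

```python
def link_scores_round(rank, out_links):
    """One round: each link passes on (rank of source) / (links on source)."""
    new_rank = {page: 0.0 for page in rank}
    for src, targets in out_links.items():
        for dst in targets:
            new_rank[dst] += rank[src] / len(targets)
    return new_rank


# Hypothetical web: a links to b and c, b links to c, c links back to a.
out_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
rank = {page: 1 / 3 for page in out_links}
for _ in range(100):
    rank = link_scores_round(rank, out_links)

print({page: round(score, 3) for page, score in rank.items()})
# settles at a = 0.4, b = 0.2, c = 0.4
```

      Page b only receives half of a's rank, while c collects the other half plus all of b's, which is why c ends up level with a.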

      --
      If you disagree, post your argument. (-1, Overrated) isn't your personal censorship tool for views you don't like.
    5. Re:I joke a lot on Slashdot, but serious question by TheLink · · Score: 2, Interesting

      "if you break Google's rules about displaying the same content to bots as to humans"

      I notice many sites that do that and don't get slapped down - esp subscription sites. And it seems Google doesn't cache those, so it's probably collusion.

      You see the keywords and paragraphs in the search, but click on it you get a login page.

      They should have to pay a special rate or be marked differently from the other search results. It's a waste of time otherwise.

      --
    6. Re:I joke a lot on Slashdot, but serious question by Anonymous Coward · · Score: 0

      You get pages with high pagerank to link to you. (If you get apple.com, microsoft.com and google.com linking to you you're home safe).
      Thus the junk mail you get if you have a reasonably high ranking page:
      We have linked you from ``crappyplace.com'' please link to ``crappyplace.com'' from your page.

    7. Re:I joke a lot on Slashdot, but serious question by linhux · · Score: 1

      If those 100 geocities pages each have a PageRank of 0 (which they would if they aren't linked to from other high-ranking pages), their total contribution to your main page PageRank will be 0.

    8. Re:I joke a lot on Slashdot, but serious question by oni · · Score: 4, Interesting

      I notice many sites that do that and don't get slapped down - esp subscription sites.

      I wonder, if I changed my useragent to be whatever the googlebot reports itself to be - would I get by the registration screen on websites like the NYTimes??

    9. Re:I joke a lot on Slashdot, but serious question by kimvette · · Score: 1

      No, because they check the IP you're coming from as well now - they grew wise to user agent spoofing years ago.

      Google for the "bugmenot" Firefox extension.

      --
      The Christian Right is Neither (Christian nor right). See: Matthew 23, Matthew 25, Ezekiel 16:48-50
    10. Re:I joke a lot on Slashdot, but serious question by l0cust · · Score: 2, Insightful

      Thanks for the informative post. I have one question though. How does it help find the relevant information unless that information just happens to be on a popular page too? What I mean to say is that the idea behind grading/filtering systems like PageRank is to provide the most relevant information about the thing you are trying to search for on the net. Now suppose Mr. A is looking for some obscure Indian text written in Sanskrit and Mr. B has (recently or not) put up a website with that text as one of the contents, but it's not a popular blogsite, nor a mainstream ebook source site etc. And there are a gazillion hugely popular sites out there which mention that particular text while talking about a totally different book or text.

      So it means the only way that Mr. A will come across Mr. B's website is if he keeps looking through 100s of result pages or if he just chances upon it via something he read about earlier. Doesn't it defeat the purpose of making searches more relevant, especially since lots of webmasters actively use the PageRank system to get a better ranking in the search index? Where does that leave the people with worthwhile content but not much popular backing?

      (I would like to make clear that I am not trying to knock this system or anything, I am just curious about the implications for small-but-good-content website owners)

      --
      Politicians and Pedophiles: Two groups of exploitive bastards who are most dangerous when they're thinking of children.
    11. Re:I joke a lot on Slashdot, but serious question by XorNand · · Score: 2, Informative

      As pointed out, the Times site isn't fooled, but there are a good many out there that are. Sometimes when you do a Google search, one of the results will contain a keyword or two. However, when you click on the link, you'll find yourself redirected to a subscription page. Useragent spoofing can frequently show you the same page that Google indexed.

      If you're a FF user, grab the Useragent Switcher extension and add in a UA of "Mozilla/5.0 (compatible; googlebot/2.1; +http://www.google.com/bot.html)". You'll then be two clicks away from seeing what was previously registration-only.

      --
      Entrepreneur : (noun), French for "unemployed"
    12. Re:I joke a lot on Slashdot, but serious question by jZnat · · Score: 1

      Googlebot doesn't use the same IP address all the time (several servers running Googlebot I'd imagine), so filtering based on IP addresses would be infeasible (at least according to Google).

      --
      'Yes, firefox is indeed greater than women. Can women block pops up for you? No. Can Firefox show you naked women? Yes.'
    13. Re:I joke a lot on Slashdot, but serious question by suggsjc · · Score: 2, Interesting
      Here is an email with associated response I received from Google on roughly this topic.

      This is a very general question. I'm creating a website. It is going to be a blogging platform. Obviously, the content of the site(s) is the most important thing. I've already started making the content of my site dynamic in the sense that I tailor it to the requesting agent (via the user-agent header). My intention for doing this is to make sure that the content renders correctly for *any* browser that accesses the site. I've built the site modularly, so tailoring the content to the requesting agent isn't a big deal. However that leads me to my question(s) and the reason I am emailing you. FYI, I have no ulterior motives for being able to tailor my content, other than making sure that the user gets the most useful information.

      That said. When a "bot" (ie your crawler) accesses my site. I'm going to treat it like I would a mobile browser. I'm going to give the minimal markup and the css will be very simple. I'm going to make sure that my content comes before my navigation, advertisements, etc in the source.

      My real question is does the fact that I'm presenting you the content of my site differently from other browsers make a difference? If so, (then again my reasoning is to make sure that my users get the correct content) how do I prevent this from hurting me in your rankings? If not, then how do you protect yourself from other sites taking advantage of this "hole"? Meaning I could make my site appear legit when I knew your bots were crawling me, but give "alternate" content when real users were visiting.

      Last question. Do you have any idea when your ads will work correctly with xhtml?
      Hope you weren't expecting a straightforward answer (like I was), because here is what I got back

      Hello Jonathon,

      Thanks for your email about the website you're creating.

      First, since you asked when our program will support XHTML, I wanted to
      let you know that we're unable to say if we'll support XHTML pages in the
      future.

      While the AdSense team isn't able to answer your questions about your
      site's ranking in the Google search index, I'd recommend visiting
      http://www.google.com/support/webmasters . I also wanted to let you know
      that our advertising programs are independent of our search results.
      Participation in AdWords and AdSense doesn't affect inclusion or ranking
      in the Google search index.

      I've also included answers to some of the most common questions AdSense
      publishers have asked.

      How can I improve my site's ranking?
      Answer:
      http://www.google.com/support/webmasters/bin/answer.py?answer=34432&hl=en_US

      How do I add my site to Google's search results?
      Answer:
      http://www.google.com/support/webmasters/bin/answer.py?answer=34397&hl=en_US

      My site is no longer included in the search results. What happened?
      Answer:
      http://www.google.com/support/webmasters/bin/answer.py?answer=34443&hl=en_US

      Why doesn't my site show up for a specific keyword?
      Answer:
      http://www.google.com/support/webmasters/bin/answer.py?answer=34434&hl=en_US

      For additional questions, I'd encourage you to visit the AdSense Help
      Center (http://www.google.com/adsense_help), our complete resource center
      for all AdSense topics. Alternatively, feel free to post your question on
      the forum just for AdSense publishers: the AdSense Help Group
      (http://groups.google.com/group/adsense-help).

      Sincerely,

      Jake
      The Google AdSense Team
      --
      When I have a kid, I want to put him in one of those strollers for twins and then run around the mall looking frantic.
    14. Re:I joke a lot on Slashdot, but serious question by Jotii · · Score: 1

      Still, Google has a few IP-ranges which are only for Google.

      --
      [sig]
    15. Re:I joke a lot on Slashdot, but serious question by Anonymous Coward · · Score: 0

      Uh, how about making a good website that people genuinely want to visit, instead of trying to cheat the system? Real consumer appeal - there's an idea!

    16. Re:I joke a lot on Slashdot, but serious question by Hyperspite · · Score: 1

      If you read the entire article carefully, they deal with that by changing the way they search through the web. Instead of following every link, they assign a probability of .85 to following it. This makes their eigenvectors have nonzero entries because the search can jump out of the strong web and get back on track (if the random number falls into the .15 category it goes to a random indexed page from the entire internet). So yea, making a web of geocities accounts wouldn't do much more than you'd think it should - and thank god, because MySpace would pwn the internet otherwise :P
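
      A toy simulation of that random surfer, assuming the .85/.15 split described above and a made-up five-page web in which two "geocities" pages link only to each other. Page names, step count and seed are arbitrary; this is a sketch of the idea, not of how the real computation is done.

```python
import random


def random_surfer(out_links, steps=200_000, follow_prob=0.85, seed=0):
    """Fraction of time a surfer spends on each page.

    With probability follow_prob the surfer follows a random link on the
    current page; otherwise (or if the page has no links) the surfer jumps
    to a page chosen uniformly at random.
    """
    rng = random.Random(seed)
    pages = sorted(out_links)
    visits = {p: 0 for p in pages}
    current = rng.choice(pages)
    for _ in range(steps):
        visits[current] += 1
        targets = out_links[current]
        if targets and rng.random() < follow_prob:
            current = rng.choice(targets)
        else:
            current = rng.choice(pages)
    return {p: visits[p] / steps for p in pages}


web = {
    "main1": ["main2"],
    "main2": ["main3"],
    "main3": ["main1", "main2"],
    "geo1": ["geo2"],  # island: these two pages only link to each other
    "geo2": ["geo1"],
}
share = random_surfer(web)
for page, fraction in share.items():
    print(page, round(fraction, 3))
```

      The island pages end up with a nonzero share (about 0.2 each here, the same as the 1/5 they would get from random jumps alone), so linking them to each other buys them nothing extra.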

    17. Re:I joke a lot on Slashdot, but serious question by cvos · · Score: 1

      a webpage that you want ranked high, what do you do? Do you make 100 geocities accounts and provide links to your main website

      No, this would definitely not work. The reason is that 100 new geocities websites would have a value of 0, so using the PageRank algorithm you would effectively have 100 links x 0 PR. Incoming links only have a positive impact if they have weight independent of other websites. This is why it is so crucial to have your own website in the oldest dataset possible. It takes a long time for websites created in 1995 to disappear.
      --
      I'm just here for the sigs
  8. There's math?? by Anonymous Coward · · Score: 0

    I didn't think pigeons had much mathematical ability. Or does this mean they've abandoned the biological approach?

  9. Does PageRank count? by matr0x_x · · Score: 2, Interesting

    As a self proclaimed SEO expert - I honestly don't believe PageRank counts nearly as much as it did a few years ago! You'll find lots of PR5 sites ahead in the SERPS of PR9 sites!

    --
    LINUX ONLINE POKER: Linux Poker
    1. Re:Does PageRank count? by Trieuvan · · Score: 4, Insightful

      The pagerank that's reported by the toolbar is really old. Google never wants to let you know the real number, or it would be easy to spam ...

    2. Re:Does PageRank count? by dbmasters · · Score: 1

      PageRank is worthless in terms of SEO. What it can do is tell you if there is a problem, if you have a PR of 0 or 1 or something, but thinking it somehow affects your SERPs is a delusion far too many people fall into. Concentrate on SERPs, not PR, ASAP for SEO on the WWW.

      --
      dB Masters
    3. Re:Does PageRank count? by Anonymous Coward · · Score: 2, Funny

      Concentrate on SERPs, not PR, ASAP for SEO on the WWW

      I searched on Google but I cannot find what "on", "not", "for" and "the" mean...

    4. Re:Does PageRank count? by HalfBrown · · Score: 1
      The pagerank that's reported from toolbar is really old.

      I think that at least part of this is indicative of the "Google Sandbox" (if you believe it exists). I've noticed, with the Google Toolbar in IE and Firefox, that some sites seem to have stagnant PRs (even with noticeable increases/decreases in traffic), but others move along in a relatively consistent manner.

      Just my 2 cents.

      --
      HalfBrown
      100% Mestizo
  10. KJI by Anonymous Coward · · Score: 0

    As I was reading the article summary I thought it was going to say 95% of the 25 billion pages indexed by Google consist of spam blogs.

  11. Old guys bully new comers. by eat+bugs · · Score: 1

    I asked some math websites to put up a link to http://www.mathpotd.org/ (Math Problem of the Day) -- they didn't bother to do so. They know the math and use it.

  12. Pagerank by Skythe · · Score: 5, Funny

    Because about 95% of the text on the 25 billion pages indexed by Google consist of the same 10,000 words, determining relevance requires an extremely sophisticated set of methods.

    They use a set of nested if-else statements
    *ducks*

    1. Re:Pagerank by darekana · · Score: 0

      They use a set of nested if-else statements

      No, that would be waaay too many if-elses to write by hand...
      they use IoC and code generation tools.

  13. Here it is... Google's PageRank formula by Reverend99 · · Score: 1, Funny

    SELECT advertiser, description, link, adcost
    FROM tblAdvertisers
    WHERE adword LIKE %searchstring%
    ORDER BY adcost

    1. Re:Here it is... Google's PageRank formula by 8ball629 · · Score: 1

      I'd be mad if I were advertising...

      Shouldn't it be "ORDER BY adcost DESC"?

    2. Re:Here it is... Google's PageRank formula by kaizenfury7 · · Score: 1

      Interesting, I never knew the formula was

      #1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '%searchstring% LIMIT 0, 30' at line 1
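
      For the record, a version of the joke query that actually parses could look like the sketch below. It uses Python's built-in sqlite3 module as a stand-in for MySQL, the extra adword column type and the sample rows are made up, and find_ads is a hypothetical helper. The two fixes are quoting the LIKE pattern (here built from a bound parameter) and ORDER BY adcost DESC so the biggest spender comes first.

```python
import sqlite3

# In-memory database with a made-up version of the joke table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblAdvertisers "
    "(advertiser TEXT, description TEXT, link TEXT, adcost REAL, adword TEXT)"
)
conn.executemany(
    "INSERT INTO tblAdvertisers VALUES (?, ?, ?, ?, ?)",
    [
        ("Acme", "Cheap widgets", "http://acme.example/", 0.10, "widgets"),
        ("Bolt", "Premium widgets", "http://bolt.example/", 0.75, "blue widgets"),
        ("Cord", "Rope", "http://cord.example/", 0.50, "rope"),
    ],
)


def find_ads(searchstring):
    """Return matching ads, most expensive first."""
    query = (
        "SELECT advertiser, description, link, adcost "
        "FROM tblAdvertisers "
        # The pattern is a string: '%' || ? || '%' concatenates in SQLite.
        "WHERE adword LIKE '%' || ? || '%' "
        "ORDER BY adcost DESC"
    )
    return conn.execute(query, (searchstring,)).fetchall()


results = find_ads("widgets")
for row in results:
    print(row)
```

      In MySQL the || operator is a logical OR by default, so there the pattern would be built with CONCAT('%', ?, '%') instead.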

  14. In Poor Taste by Reverend99 · · Score: 0, Offtopic
    James Kim
    1971-2006
    From Crave to Grave.

  15. Re: [OT] for whoever runs mathpotd by Anonymous Coward · · Score: 0
    It would help if problem 11 had the right answer. From the link I've quoted,
    In a high school auditorium, 1 junior and 2 sophomores are seated randomly together in a row. What is the probability that the 2 sophomores are seated next to each other?
    Assuming we're using the standard meanings of "random", "together" and "row", there are exactly 3 combinations (note: there's no need to distinguish between the sophomores as individuals).

    1. J S S = sophomores together
    2. S J S = not together
    3. S S J = sophomores together

    Thus, the correct answer is 2/3. But for some reason, the site insists that the answer is 1/3.
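
    The count above can be checked by brute force. A small Python sketch, assuming every seating order is equally likely; the labels S1 and S2 are only there to tell the two sophomores apart while counting:

```python
from fractions import Fraction
from itertools import permutations

# Label the two sophomores so that every seating is equally likely.
students = ["J", "S1", "S2"]
seatings = list(permutations(students))


def sophomores_adjacent(seating):
    """True if the two sophomores occupy neighbouring seats."""
    i = seating.index("S1")
    j = seating.index("S2")
    return abs(i - j) == 1


favourable = sum(1 for s in seatings if sophomores_adjacent(s))
probability = Fraction(favourable, len(seatings))
print(favourable, "of", len(seatings), "seatings ->", probability)
```

    With the sophomores labelled there are 6 seatings, 4 of which put them together, which reduces to the same 2/3.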
  16. Very simple mathematics by Anonymous Coward · · Score: 0

    Sure is no string theory. For some strange reason most useful mathematics applied to computers is rather simple; compare that with the sophisticated mathematical tools used in theoretical physics!

    The fact that the mathematics involved is simple is irrelevant; the only thing that matters is the practical usefulness.

  17. you forgot.. by gfody · · Score: 4, Funny

    ORDER BY adcost DESC

    --

    bite my glorious golden ass.
    1. Re:you forgot.. by Reverend99 · · Score: 1, Troll

      Yeah... you and the sans-humor moderator who called this "Flamebait" should get together. I'm sure you'd make a match made in anal heaven.

  18. Re:Send email to a dead man? by treeves · · Score: 0, Offtopic

    I don't have any Mod points right now, but isn't a reply to an Offtopic post pretty much automatically offtopic? Go ahead and mod me Offtopic, I'll consider that an affirmative answer.

    --
    ...the future crusty old bastards are already drinking the Kool-Aid.
  19. Re:The two that matter by Anonymous Coward · · Score: 1, Interesting

    There's only two that really reflect the power of Pagerank: Click here.
    About 1.2 billion pages, and surprise surprise, Acrobat Reader tops the list, followed by a who's who of internet applications and plugins. But around result #30 it gets a bit more interesting, and when you're a few dozen pages in, "new patterns begin to emerge."

    And to explain why not to use "click here", I found this buried on page 45. Thanks for the proof pudding guys, it's delicious.

  20. Sorely dissapointed by Anonymous Coward · · Score: 0

    With a name like markov_chain I thought you were going to lay the math-smackdown on the GP. After all, page rank is basically a Markov chain where the graph it is acting on is the internet.
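
    To make the Markov chain remark concrete, here is a rough Python sketch of a "random surfer" walking a tiny made-up link graph. It assumes the surfer follows a link with probability 0.85 and otherwise jumps to a random page; random_surfer, toy_web and the fixed seed are illustrative choices, not anything from the article. The fraction of time spent on each page approximates the chain's stationary distribution.

```python
import random


def random_surfer(links, steps=100_000, follow_prob=0.85, seed=0):
    """Simulate the chain and return the fraction of steps spent on each page."""
    rng = random.Random(seed)
    pages = sorted(links)
    visits = {p: 0 for p in pages}
    current = pages[0]
    for _ in range(steps):
        visits[current] += 1
        outgoing = links[current]
        if outgoing and rng.random() < follow_prob:
            # Follow one of the links on the current page.
            current = rng.choice(outgoing)
        else:
            # Get bored (or hit a dead end) and jump to a random page.
            current = rng.choice(pages)
    return {p: count / steps for p, count in visits.items()}


# Hypothetical four-page web; nothing links to "d".
toy_web = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["a"],
    "d": ["a"],
}
frequencies = random_surfer(toy_web)
for page, freq in sorted(frequencies.items()):
    print(page, round(freq, 3))
```

    Page "d" is only ever reached by a random jump, so it should come out near 0.15 / 4 = 0.0375, while the pages with incoming links collect most of the visits.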

  21. OK, but... by indigest · · Score: 1, Informative

    The algorithms behind PageRank are no secret. Why not just read about them from the source?

  22. Pagerank is Broken by Anonymous Coward · · Score: 0

    http://www.google.com/search?hl=en&q=coupons

    Take a look at the top 20 results. You'll notice that 5 of these 20 results are from the same guys.

    1. Re:Pagerank is Broken by RegularFry · · Score: 1

      Why does that make PageRank broken? That's not the problem it tries to solve. Google might be broken for slavishly adhering to PageRank, but that's a different matter entirely...

      --
      Reality is the ultimate Rorschach.
  23. "The Who" vs the who by kenb215 · · Score: 1

    It seems that in searching "The Who", only that exact phrase is returned, but when searching the who, both words are searched, i.e. "the" appears as if it is being searched like a normal word here. If you try searching for the best, "the" is counted when used as part of the phrase "the best", but appears not to be counted when it appears by itself. The Google algorithm is apparently a lot more complicated than the usual explanations are.

  24. Only three articles about Google on one page? by colourmyeyes · · Score: 3, Funny

    I think we can get four or five tomorrow.

    --
    My grandmother used anecdotal evidence all the time, and she lived to be 120 years old.
  25. Thanks for all the replies by CrazyJim1 · · Score: 1

    I now have a nice basic understanding of Google's page ranking system. That's all I was asking for.

  26. evolution by drDugan · · Score: 1

    Great article.

    The character of online content is now changing rapidly. We used to be in an Internet where mostly only the site provider determined the content on the pages they served (/. being a notable, early exception). Now, with the rise of "2.0" systems, user-generated content, and empowerment of the individual - the content being served on many sites is coming in from wide groups, and being moderated and curated by those groups.

    So... a thought: as user-submitted and group-moderated content continues to rise on the Internet - the main premise behind the PageRank system will change. To remain relevant, Google will need to continue to evolve how they do their rankings to match the structure of data in the online world. Will/Can they?

    1. Re:evolution by the_womble · · Score: 1

      I could not disagree more. Most of the sort of information people search for is not user generated: when did you last do a Google search for which a Slashdot comment was the appropriate answer?

      The only exception that I can think of (from my searches) is forums that have answers to software problems. Google seems to have no problem finding these for me.

    2. Re:evolution by Vintermann · · Score: 1

      Sometimes you want to search through your old posts. Not all sites let you do that (slashdot does if you pay up, I think), and often forums are even norobots space.

      --
      xkcd is not in the sudoers file. This incident will be reported.
    3. Re:evolution by drDugan · · Score: 1

      The meme that Google helps us find all the information is a huge marketing Spin.

      Compared to "exactly the information you want, when and how you want it" - Google sucks. It is better than anything else now, but it still is not anywhere close to really solving the information access problem generally.

    4. Re:evolution by Anonymous Coward · · Score: 0

      post 1 of 2 in the serial art series

    5. Re:evolution by Anonymous Coward · · Score: 0

      post 2 of 2 in the serial art series

  27. It's the World's Largest Matrix Computation by MadMagician · · Score: 2, Informative

    For a different, somewhat more technical, but more succinct discussion, Cleve Moler [of Matlab fame] wrote another view of this topic, about 5 years ago.

    The math is the same, of course, but two points of view may provide a greater sense of perspective. So to speak. And Cleve is always worth listening to.

    1. Re:It's the World's Largest Matrix Computation by jfengel · · Score: 1

      Actually, I'm not so sure it's the largest matrix computation. Weather and nuclear bomb simulations are done with matrix algebra, and it wouldn't surprise me to discover that they do some months-long calculations with even larger matrices.

  28. Other google technologies by quakehead3 · · Score: 1, Redundant
  29. Shameless plug for my uni by sat1308 · · Score: 0

    Just thought I'd add this shameless plug here for my uni...I'm currently taking an undergraduate course in linear algebra at Stanford (Math 51; it's taken by a lot of freshmen) and we studied almost everything the article talked about earlier in the quarter. So, moral of the story - if you want to learn interesting stuff, come to Stanford!

    I thought some people might be interested...

    1. Re:Shameless plug for my uni by CptPicard · · Score: 1

      Frankly, any university with a CS program worth anything will have students take a linear algebra course in math as the first thing. It's a good weed-out-the-weak exercise early on, gets you up to speed with university level mathematics, and the stuff in itself comes in handy, for example in computer graphics. Being good at manipulating matrices has a lot of use in algorithmics too.

      Please, try to impress me about Stanford some other way once you've progressed further ;-)

      --
      I want to play Free Market with a drowning Libertarian.
    2. Re:Shameless plug for my uni by Anonymous Coward · · Score: 1, Insightful

      Blah. Ugly red clothes... Go Bears!

    3. Re:Shameless plug for my uni by retiarius · · Score: 1

      shameless math plug(s) from my alma mater:

                - cal berkeley leads stanford in william lowell putnam competition fellows

                - as for killer math events

                                  stanford had streleski (v.i.z. wikipedia)
                                  but berkeley topped him with kaczynski (!)

      seriously, best wishes for the cardinals shepherding
      the putnam team under prof. vakil last saturday.

  30. pretty cool by Anonymous Coward · · Score: 0

    neat search! page 28, at the bottom, right in a row, state of texas, gop.com, then realplayer

    there's some funny stuff in there! What humans really think is worth clicking here for and close linkages in group mindset

  31. Stop the obsession by Anonymous Coward · · Score: 0

    Pagerank is not a number between 1 and 10. There are lots of other good citations to back me up but I'm too lazy to find them right now. And too lazy to log in.

  32. Re: [OT] for whoever runs mathpotd by eat+bugs · · Score: 1

    A system error caused the problem. It didn't insist it be 1/3 -- it was because choice B (which didn't correspond to any choice) was given as the correct answer. The website support has corrected the problem.

  33. Interesting Appendix: Page and Brin on Advertising by jbourj · · Score: 1
    One of the references for the article is The Anatomy of a Large-Scale Hypertextual Web Search Engine (http://infolab.stanford.edu/pub/papers/google.pdf), published in Computer Networks and ISDN Systems. At the end of the paper, they have a very interesting appendix: "Advertising and Mixed Motives"

    Currently, the predominant business model for commercial search engines is advertising. The goals of the advertising business model do not always correspond to providing quality search to users. For example, in our prototype search engine one of the top results for cellular phone is "The Effect of Cellular Phone Use Upon Driver Attention", a study which explains in great detail the distractions and risk associated with conversing on a cell phone while driving. This search result came up first because of its high importance as judged by the PageRank algorithm, an approximation of citation importance on the web [Page, 98]. It is clear that a search engine which was taking money for showing cellular phone ads would have difficulty justifying the page that our system returned to its paying advertisers. For this type of reason and historical experience with other media [Bagdikian 83], we expect that advertising funded search engines will be inherently biased towards the advertisers and away from the needs of the consumers.

    Since it is very difficult even for experts to evaluate search engines, search engine bias is particularly insidious. A good example was OpenText, which was reported to be selling companies the right to be listed at the top of the search results for particular queries [Marchiori 97]. This type of bias is much more insidious than advertising, because it is not clear who "deserves" to be there, and who is willing to pay money to be listed. This business model resulted in an uproar, and OpenText has ceased to be a viable search engine. But less blatant bias are likely to be tolerated by the market. For example, a search engine could add a small factor to search results from "friendly" companies, and subtract a factor from results from competitors. This type of bias is very difficult to detect but could still have a significant effect on the market. Furthermore, advertising income often provides an incentive to provide poor quality search results. For example, we noticed a major search engine would not return a large airline's homepage when the airline's name was given as a query. It so happened that the airline had placed an expensive ad, linked to the query that was its name. A better search engine would not have required this ad, and possibly resulted in the loss of the revenue from the airline to the search engine. In general, it could be argued from the consumer point of view that the better the search engine is, the fewer advertisements will be needed for the consumer to find what they want. This of course erodes the advertising supported business model of the existing search engines. However, there will always be money from advertisers who want a customer to switch products, or have something that is genuinely new. But we believe the issue of advertising causes enough mixed incentives that it is crucial to have a competitive search engine that is transparent and in the academic realm.
  34. The most interesting..... by Anonymous Coward · · Score: 0

    part of the original text is
    8. Appendix A: Advertising and Mixed Motives

    They discuss the motives of search engines and the inability of a for-profit search engine to correctly answer a search if it is in direct competition with its advertisers.

  35. Popularity != Quality by Anonymous Coward · · Score: 0

    Popularity != Quality. That's the way I see pagerank and other search response methods.
    I think pagerank gives relatively bad results because of a flawed notion that something
    that is popular is authoritative for determining what you are searching for.

    Unfortunately computers are not very good at comprehending the data they read
    and understanding in context.... yet.

    I think, all we have so far in search engine ranking is a bunch of
    "near enough is good enough tricks".

    I hope someone in a garage somewhere finds a better trick, I am getting bored of
    the way search engines search.

    1. Re:Popularity != Quality by namco · · Score: 1

      Porn has no quality (just cheese ;p) and is very popular! I'd love to see pagerankings on those every month!

  36. Did anyone else... by Anonymous Coward · · Score: 0

    Did anyone else read that as "The MYTH behind Pagerank"?

  37. Why doesn't Brin get some credit?? by moeinvt · · Score: 1

    ??

    Seems unfair that something Brin and Page developed together would bear only one of their names.

    "Page-rank"

    ??

  38. Re:James Kim by Anonymous Coward · · Score: 0

    And they predictably bitch-slapped this thread.

  39. Re:Bad summary (MOD UP???) by Anonymous Coward · · Score: 0

    ???
    6 (almost 7) hours later and this post wasn't modded freaking hilarious??

    I apologize for that sir, you're +500 funny by me.

    And yes I'm posting AC too..
    And yes I just got on reading this..
    And no I don't have any mod friggin points :/

  40. Pages that don't exist anymore by namco · · Score: 2, Interesting

    I've seen links on google searches that don't exist anymore but were ranked highly when they DID exist and still exist in the top 10 of the query. What happens to those? Do they stay at their ranking till they get overtaken by other more popular pages on the same search? Get their ranking slowly reduced because they don't exist?