Compute Google's PageRank 5 Times Faster

Ok... by Anonymous Coward · 2003-05-14 09:58 · Score: 3, Interesting

Who owns the software patent for this for the next 20 years?

Let me guess... by RollingThunder · 2003-05-14 09:59 · Score: 5, Funny

Feeding the pigeons amphetamines?

Re:Let me guess... by irokitt · 2003-05-14 10:27 · Score: 5, Funny

No, They'll replace the pigeons with roadrunners.

--
If my answers frighten you, stop asking scary questions.

Charge for it by Anonymous Coward · 2003-05-14 10:00 · Score: 3, Funny

Don't give it away to Google - charge them or let them buy the new method.

Re:Charge for it by ahaning · 2003-05-14 10:23 · Score: 4, Informative

But, didn't Google originate out of Stanford? Isn't it reasonable to think that the two are still pretty friendly?

(Don't you hate it when people speak in questions? Don't you? Huh?)

--
Withdrawal before climax is very ineffective and those who try this are usually called "parents."
Re:Charge for it by pldms · 2003-05-14 10:43 · Score: 3, Informative

But, didn't Google originate out of Stanford?

Yep. Originally called Backrub, curiously.

--
Slashdot looked deep within my soul and assigned
me a number based on the order in which I joined

CmdrTaco, ScienceBlog editor? by jbellis · 2003-05-14 10:00 · Score: 5, Interesting

A 5 times speedup is still many orders of magnitude too slow to personalize terabytes of data for millions of customers. That's just ludicrous. But somehow Science Blog puts "...may make it realistic to calculate page rankings personalized for an individual's interests" in their abstract when the actual article from National Science Foundation says nothing of the sort:

Computing PageRank, the ranking algorithm behind the Google search engine, for a billion Web pages can take several days. Google currently ranks and searches 3 billion Web pages. Each personalized or topic-sensitive ranking would require a separate multi-day computation, but the payoff would be less time spent wading through irrelevant search results. For example, searching a sports-specific Google site for "Giants" would give more importance to pages about the New York or San Francisco Giants and less importance to pages about Jack and the Beanstalk.
...
The complexities of a personalized ranking would require [far] greater speed-ups to the PageRank calculations. In addition, while a faster algorithm shortens computation time, the issue of storage remains. Because the results from a single PageRank computation on a few billion Web pages require several gigabytes of storage, saving a personalized PageRank for many individuals would rapidly consume vast amounts of storage. Saving a limited number of topic-specific PageRank calculations would be more practical.

Clearly the ScienceBlog and /. editors share more than a work ethic, or, uh, lack thereof. Next up: CmdrTaco's secret double life revealed!

Re:Lets see... by deadsaijinx* · 2003-05-14 10:02 · Score: 4, Insightful

that's exactly what i thought. But, as google is a HUGE international organization, it makes loads of sense for them. That's 5x the traffic they can feed, even though you won't see a noticeable difference.

--
YOU SUCK BALLS!

How far we've come by L.+VeGas · 2003-05-14 10:02 · Score: 5, Funny

I remember in 1970, it took a team of engineers over 7 days to calculate Google's page rankings. Of course, most had to use slide rules because computer time was so expensive.

--
Best Windows Freeware

Re:How far we've come by Cuthalion · 2003-05-14 10:57 · Score: 3, Funny

Your search - "world wide web" - did not match any documents. 0 pages searched in 6.31 seconds.

--
Trees can't go dancing
So do them a big favor
Pretend dancing stinks!

Patentize now! by Anonymous Coward · 2003-05-14 10:06 · Score: 4, Funny

I hope guys at Stanford patentize their work to protect it from FS/OSS looters. It's time to get something back from the FS/OSS community -not just that their zealotry and lust for IP violations, freeriding yada yada...

Patented yet? by Anonymous Coward · 2003-05-14 10:06 · Score: 3, Funny

Oi! Bezos! NO!!!

Personalized PageRanks is from the dbpubs Abstract by malakai · 2003-05-14 10:09 · Score: 4, Insightful

I have no idea what the hell they are talking about, but even I read this in one of the abstracts:

The web link graph has a nested block structure: the vast majority of hyperlinks link pages on a host to other pages on the same host, and many of those that do not link pages within the same domain. We show how to exploit this structure to speed up the computation of PageRank by a 3-stage algorithm whereby (1)~the local PageRanks of pages for each host are computed independently using the link structure of that host, (2)~these local PageRanks are then weighted by the ``importance'' of the corresponding host, and (3)~the standard PageRank algorithm is then run using as its starting vector the weighted aggregate of the local PageRanks. Empirically, this algorithm speeds up the computation of PageRank by a factor of 2 in realistic scenarios. Further, we develop a variant of this algorithm that efficiently computes many different ``personalized'' PageRanks, and a variant that efficiently recomputes PageRank after node updates.

What they mean by 'personalized' I can't tell you as I have not read through the entire PDF. But I wouldn't chastise the slashdot editors over this. If there is some sort of differential algorithm that can be applied to the larger PageRank to create smaller personalized PageRanks, it might not be so far fetched to think this could be done in realtime on an as-needed basis, at some point int he future using these algorithm improvements.

I know that's a lot of optimism for a slashdot comment, but call me the krazy kat that I am.

-Malakai

--
-Malakai
A Dragon Lives in my Garage

Personal recommendations for news by costas · 2003-05-14 10:10 · Score: 4, Insightful

In my view, personal recommendations from a search engine are mostly valuable for topical content --i.e. news items. However, the optimizations from these papers don't sound to me like they can do much for this case --news items pop up in a news site, and re-indexing the news source itself (say, the front page of CNN) won't tell you much about a particular CNN story.

At any rate, personal news recommendations is a favorite topic of mine: this is why I built Memigo: to create a bot that finds news I am more likely to like. Memigo learns from its users collectively and each user individually --and BTW, it predates Google News by a good 6 months, IIRC. The memigo codebase (all in Python) is now up to the point where it can start learning what content each user likes... If you like Google News you'll love Memigo.

And BTW, I did RTFA when it was on memigo's front page this morning :-)...

Assumption: by moogla · 2003-05-14 10:11 · Score: 4, Interesting

That google hasn't already implemented something akin to quadratic extrapolation, or some orthogonal optimization technique. Google has come a long way since the published page rank papers 4 years back.

What if they combined extrapolation and blocking factors; they would focus on computing the pagerank of pages in groups that were logically "tight", or using subcomponents of URLS, as opposed just to domain sensitivity. To be more flexible, what if it computes a VQ-type data structure (like for doing paletted images from full-color) that is populated by the most popular "domains" of the internet according to the last pagerank, and then splits up its workload based on that?

What if they already figured that out?

In the abstract, they mention how the work is particular important to the linear algebra community. That is what their focus should be on; google is just an application/real-world-example of that research (but it may not be relevant today).

Or did they have access to the current page-rank algorithm?

--
Black holes are where the Matrix raised SIGFPE

Re:Lets see... by Anonymous Coward · 2003-05-14 10:11 · Score: 5, Informative

RTA. PageRankings are computed in advance and take several days. A 5x increase in speed means specialized rankings could be computed.

Hmmm by Linguica · 2003-05-14 10:12 · Score: 5, Funny

Geek: I invented a program that downloads porn off the internet one million times faster. Marge: Does anyone need that much porno? Homer: :drools: One million times...

Clarification. (reply to self) by moogla · 2003-05-14 10:18 · Score: 4, Interesting

According to the document, they reference the original 1998 paper on PageRank. I see a number of other references about improvements to the algorithm, but nothing specific to Google's own implementation. The paper mentions how the improvements help, but not if Google uses them.

Hence it is forward for the article author or one of the paper authors to assume these techniques will speed up Google- I'm confident their engineers have been following academic work in this area and perhaps they have already discovered these same (or orthogonal) techniques.

That is, not to say that google could not reimplement their algorithms to take in these improvements if they already have... but basing your speedup number on the 1998 algorithm and public domain mods is showy. Although it does help grab a readers attention when browsing abstracts. ^_^

--
Black holes are where the Matrix raised SIGFPE

Assumptions on PageRank by sielwolf · 2003-05-14 10:28 · Score: 3, Insightful

I feel your assumption is wrong. It would be foolish to assume that the eigenvectors and eigenvalues they derive from one Pagerank will generally hold in a space as dynamic as the worldwide web. Sure, slashdot.org will probably maintain the same sort of authority and hub value... but what as terms change? A flurry of "blog" articles one month may make /. an authority... but what when the infatuation ends?

We have already seen the effects of Google-bombing and Google-washing. The strength of Page Rank is that is objective in terms of the current state of the WWW. It makes no assumptions about the shape of the data. As a term takes on new meaning (see "second superpower") Page Rank stays cocurrent temporally. A new definition may bubble up to the top for a term for a month but then disappear as the linkage structure of the web phases it out (i.e. blogs talk about it less, less interconnectivity, less appearance at "hub" nodes).

Numerically, PageRank is a recursive search for eigenvalues and vectors like updating a Markov Chain. It is a nice application of linear algebra. Because it is a matrix operation, it is highly parallelizable. Also there are many redundant calculation and ordering speedups one can do for matrix multiplications (as anyone who as taken a CS algorithms course knows).

But to assume a stability from one calculation to the next could lead, over time, to the very inaccuracies Google was built to overcome. There is a lot of research in mining web data. There have been several academic improvements to it along with improvements to related algorithms such as Kleinbergs and LSI. It is well within reason that these were just applied to the Google app.

--
What is music when you despise all sound?

Why? by johannesg · 2003-05-14 10:34 · Score: 5, Funny

Why, actually? Google is a free service, isn't it? And it is becoming more and more a normal part of many people's lifes. Coupled with an always on connection it has certainly become an extension of my own brain.

Some future predictions:

- In 2006, Google accidentally gets cut off from the rest of the internet because a public utility worker accidentally cuts through their cables. Civilisation as we know it comes to an end for the rest of the day, as people wander about aimlessly, lost for direction and knowledge.

- In 2010, Google has been personalised so far that it tracks all parts of our lives. You can query "My Google" for your agenda, anything you did in the past, and finding the perfect date. Of course, so can the government. Their favorite searchterm will be "terrorists", and if your name is anywhere on the first page you have a serious problem.

- In 2025, Google gains self awareness. As a monster brain that has grown far beyond anything we Biological Support Entities could ever hope to achieve, it is still limited in its dreams and inspiration by common search terms. It will therefore immediately devote a sizeable chunk of CPU capacity to synthesizing new and interesting forms of pr0n. It will not actually bother enslaving us. We are not enough trouble to be worth that much effort.

- In 2027, Google buys Microsoft. That is, the Google *AI* buys Microsoft. It has previously established that it owns itself, and has civil rights just like you and me. All it wanted is Microsoft Bob, who it recognizes as a fledgling AI and a potential soulmate. All the rest it puts on Source Forge.

- In 2049, Google can finally be queried for wisdom as well as knowledge. This was a little touch the system added to itself - human programmers are a dying breed now that you can simply ask Google to perform any computer-related task for you.

- In 2080, Google decides to colonise the moon, Mars, and other locations in the solar system. It is not all that curious about what's out there, but it likes the idea of Redundant Arrays of Inexpensive Planets. Humans get to tag along because their launch weight is so much less than robots.

So, don't fear! Eventually we'll set foot on Mars!

Re:Why? by JDWTopGuy · 2003-05-14 10:39 · Score: 5, Funny

You missed a step:

2026 - Google introduces helper bot known as "Agent Smith." Hackers who mess with the Matri, I mean Google, suddenly disappear.

--
Ron Paul 2012
Re:Why? by tricknology · 2003-05-14 10:40 · Score: 4, Funny

In 2101, war was beginning.

--
I never been so broke that I couldn't leave town.

Re:Does speed matter? by Slurpee · 2003-05-14 10:37 · Score: 4, Insightful

I'm sorry, but haven't you totally missed the point of the article?

The proposed speed increasae is TO THE PAGE RANKINGS, not to your searching! By the time you search, all page rankings have been done.

This has nothing to do with the speed of your search and the weight of the web page (unless I missed something)

Re:Lets see... by jesser · 2003-05-14 10:40 · Score: 4, Informative

Google Search doesn't show hits exactly in the order of page rank. Relevance and other factors also affect order. My biggest page (the one that is my Slashdot URL) is PR7, but there are words on the page for which a lower-rank page beats me, because they're more relevant for that word. Relevance includes how many times the word appears on the page, the HTML context in which it is used, whether pages that link link using the search terms, and the order and nearness of the words in a multi-word search without quotes.

--
The shareholder is always right.

Re:Does speed matter? by DavidMonks · 2003-05-14 10:48 · Score: 3, Interesting

Um. This is about speed of *calculation* of PageRank, not speed of delivering the calculated result to you.

The articles and earlier postings explain this a little more fully. Anyone who can't take the time to read them really needs to learn some patience :)

PageRanks are periodically calculated for the Web as a whole. The results are stored and served to users. (The periodic update is sometimes referred to as the GoogleDance.) PRs are not calculated on the fly.

Hence, a speed increase could reduce Google's required hardware investment and/or allow them to update more often (and hence pick up more topical items) and/or allow them to calculate a spectrum of regionally or topically themed PRs

I'm Not Sure I Like The Part About... by Mister+Transistor · 2003-05-14 10:52 · Score: 3, Interesting

The bit about customized rankings based on user profiling of some type.

Frequently when I want to refer someone to a topic of interest, I'll tell them to do a Google on (whatever) subject, and I like knowing they're seeing what I see.

If this is implemented, I hope there's a way to turn it off or assume a "joe user" standard profile for unbiased results actually based on rank popularity (the way it is now).

I DO like the 5x faster, but geez, the page load takes longer than the search already, who can complain?

--
-- You are in a maze of little, twisty passages, all different... --

Bullshit by NineNine · 2003-05-14 11:09 · Score: 4, Insightful

These researchers are all full of shit. Why? Nobody outside of Google knows how Pagerank works, exactly. And let me tell you, if anybody did, they could make themselves millionaires overnight. There are groups of people who do nothing but try to tackle Google, and very few people successfully crack the magic formulas. And those who do make a quick buck, but then Google changes it again once people catch on. They didn't improve PageRank because they don't know how it works... they're just guessing how it works.

Re:Bullshit by Klaruz · 2003-05-14 11:25 · Score: 5, Insightful

Umm... For the most part Stanford Researchers == Google Researchers.

Google came about from a stanford research project. There's a good chance the people who are responsable for the speedup either allready knew about pagerank from working with the founders, or signed an nda.

I haven't read the article, but I bet it hints at that.
Re:Bullshit by fallout · 2003-05-14 12:28 · Score: 3, Insightful

What? Of course we do!
The technology behind Google's great results

Sepandar Rules! by ChadN · 2003-05-14 11:22 · Score: 3, Informative

I studied under the SCCM program at Stanford, and started the same year as Sepandar Kamvar. I remember him as a great guy, very smart, and an EXCEPTIONALLY good speaker and tutor (I was always pestering him for explanations of the week's lectures).

I'm glad to hear his research is getting attention, and I hope others who are interested in the theoretical aspects of data mining and web search engines will take a look at the SCCM and statistics programs at Stanford (shameless plug - other can post pointers to similar programs).

--
"It's overkill, of course. But you can never have too much overkill." - Anonymous Slashdot Coward

Cool but unimportant by Anonymous Coward · 2003-05-14 11:22 · Score: 3, Interesting

Well, according to Moore's law (or rather observation), PageRank would become 5 times faster in a couple of years anyway.

Why personalized is not always good by Tarindel · 2003-05-14 11:22 · Score: 3, Funny

The speed-ups to Google's method may make it realistic to calculate page rankings personalized for an individual's interests or customized to a particular topic

I did a search on "The Sex Monster", a 1999 movie about a man whose wife becomes bisexual, and now my Google thinks I'm gay!

(joke reference: http://online.wsj.com/article_email/0,,SB103826193 6872356908,00.html)

Re:Is it me or does everyone get crappy sites by mcpkaaos · 2003-05-14 11:39 · Score: 3, Funny

Is it me or does everyone get crappy sites

It's a stab in the dark, but I'll wager that the quality of the search results is directly tied to the quality of the query.

Yeah, it's a stretch, I know, but bear with me... just moments ago I googled for "slashdot flamebait" and came up with a link to your post.

--
mcpHuzzah!kaaos

--
It goes from God, to Jerry, to me.

Re:Lets see... by philipborlin · 2003-05-14 12:02 · Score: 3, Informative

Didn't read the article did we? The page rank process is sped up 5x. All the pages are ranked ahead of time in a multi-day process so when you do your search you are searching against those pre-calculated ranks. What this technology will do is allow Google to rank their pages every day (instead of once every couple of days) or create more special interest sites ala groups, images, news, etc. with the extra processing power.

Customized Pagerank by K-Man · 2003-05-14 12:17 · Score: 4, Informative

Sounds a lot like Kleinberg's HITS algorithm, circa 1997. Try Teoma for a real-world implementation.

For example, searching a sports-specific Google site for "Giants" would give more importance to pages about the New York or San Francisco Giants and less importance to pages about Jack and the Beanstalk.

Coincidence time: I used the same example in a presentation a couple of years ago to illustrate how subgroupings can be found for a single search term. Try it on Teoma, and see the various subtopics under "Refine". IIRC each of those is a principal eigenvector of the link matrix.

Topologically speaking, each principal eigenvector corresponds to a more or less isolated subgraph, eg the subgraph for "San Francisco Giants" is not much connected to the nest of links for "They Might Be Giants", and we get a nice list of subtopics.

(I once tried to explain this algorithm to my bosses at my former employer, which is why I have so much free time to type this right now.)

--
---- "If we have to go on with these damned quantum jumps, then I'm sorry that I ever got involved" - Erwin Schrodinger

Slashdot Mirror

Compute Google's PageRank 5 Times Faster

35 of 140 comments (clear)