Compute Google's PageRank 5 Times Faster
Kimberley Burchett writes "CS researchers at Stanford University have developed three new techniques that together could speed up Google's PageRank calculations by a factor of five. An article at ScienceBlog theorizes that "The speed-ups to Google's method may make it realistic to calculate page rankings personalized for an individual's interests or customized to a particular topic.""
Who owns the software patent for this for the next 20 years?
damn.. this is good news ;)
anime+manga together at last.. in real time.
Feeding the pigeons amphetamines?
What is 1/14th of a second divided by five?
Don't give it away to Google - charge them or let them buy the new method.
This first post is coming at you 5 times faster. Or not. I don't care.
I remember in 1970, it took a team of engineers over 7 days to calculate Google's page rankings. Of course, most had to use slide rules because computer time was so expensive.
Best Windows Freeware
I hope the guys at Stanford patent their work to protect it from FS/OSS looters. It's time to get something back from the FS/OSS community, not just their zealotry and lust for IP violations, freeriding, yada yada...
Oi! Bezos! NO!!!
What they mean by 'personalized' I can't tell you, as I have not read through the entire PDF. But I wouldn't chastise the slashdot editors over this. If there is some sort of differential algorithm that can be applied to the larger PageRank to create smaller personalized PageRanks, it might not be so far-fetched to think this could be done in realtime on an as-needed basis, at some point in the future, using these algorithm improvements.
I know that's a lot of optimism for a slashdot comment, but call me the krazy kat that I am.
-Malakai
A Dragon Lives in my Garage
In my view, personal recommendations from a search engine are mostly valuable for topical content --i.e. news items. However, the optimizations from these papers don't sound to me like they can do much for this case --news items pop up in a news site, and re-indexing the news source itself (say, the front page of CNN) won't tell you much about a particular CNN story.
:-)...
At any rate, personal news recommendation is a favorite topic of mine; this is why I built Memigo: to create a bot that finds news I am more likely to like. Memigo learns from its users collectively and each user individually --and BTW, it predates Google News by a good 6 months, IIRC. The memigo codebase (all in Python) is now up to the point where it can start learning what content each user likes... If you like Google News you'll love Memigo.
And BTW, I did RTFA when it was on memigo's front page this morning
That google hasn't already implemented something akin to quadratic extrapolation, or some orthogonal optimization technique. Google has come a long way since the published page rank papers 4 years back.
What if they combined extrapolation and blocking factors? They would focus on computing the pagerank of pages in groups that were logically "tight", or use subcomponents of URLs, as opposed to just domain sensitivity. To be more flexible, what if it computes a VQ-type data structure (like for doing paletted images from full-color) that is populated by the most popular "domains" of the internet according to the last pagerank, and then splits up its workload based on that?
What if they already figured that out?
In the abstract, they mention how the work is particularly important to the linear algebra community. That is what their focus should be on; google is just an application/real-world example of that research (but it may not be relevant today).
Or did they have access to the current page-rank algorithm?
Black holes are where the Matrix raised SIGFPE
.2 seconds is just too slow for a search to complete, why I almost had to wait for one the other day.
Geek: I invented a program that downloads porn off the internet one million times faster. :drools: One million times...
Marge: Does anyone need that much porno?
Homer:
According to the document, they reference the original 1998 paper on PageRank. I see a number of other references about improvements to the algorithm, but nothing specific to Google's own implementation. The paper mentions how the improvements help, but not if Google uses them.
Hence it is forward of the article author or one of the paper authors to assume these techniques will speed up Google. I'm confident their engineers have been following academic work in this area, and perhaps they have already discovered these same (or orthogonal) techniques.
That is not to say that google could not reimplement their algorithms to take in these improvements if they already have... but basing your speedup number on the 1998 algorithm and public domain mods is showy. Although it does help grab a reader's attention when browsing abstracts. ^_^
Black holes are where the Matrix raised SIGFPE
I remember when Yahoo.com flaunted all over the place how it would load in under 3 secs on a 28.8 modem. Now you visit them and you get big images, flash, java, and other massive bandwidth eaters.
Does it really matter anymore? More and more users seem to be using broadband, and if they don't, they have at least a 56k modem (that can only go up to 53k because the all-wonderful FCC wants to be able to decode it if they tap your line). Does it really matter, though? Google is fast and simple, so it loads on any kind of browser on the planet (even Lynx and PalmOS). Most searches for me come up in under 2.3 secs (half is spent searching and the other half downloading). Anyone who can't wait that long really needs to learn some patience. Zac
No.
Yeah, that half second delay has always annoyed me when I make a search on google... eh..
I feel your assumption is wrong. It would be foolish to assume that the eigenvectors and eigenvalues they derive from one PageRank will generally hold in a space as dynamic as the worldwide web. Sure, slashdot.org will probably maintain the same sort of authority and hub value... but what about as terms change? A flurry of "blog" articles one month may make /. an authority... but what about when the infatuation ends?
We have already seen the effects of Google-bombing and Google-washing. The strength of PageRank is that it is objective in terms of the current state of the WWW. It makes no assumptions about the shape of the data. As a term takes on new meaning (see "second superpower"), PageRank stays current. A new definition may bubble up to the top for a term for a month but then disappear as the linkage structure of the web phases it out (i.e. blogs talk about it less, less interconnectivity, less appearance at "hub" nodes).
Numerically, PageRank is a recursive search for eigenvalues and eigenvectors, like updating a Markov chain. It is a nice application of linear algebra. Because it is a matrix operation, it is highly parallelizable. Also, there are many redundant-calculation and ordering speedups one can do for matrix multiplications (as anyone who has taken a CS algorithms course knows).
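To make the Markov-chain view concrete, here is a minimal sketch of power iteration on a toy link graph. It follows the general shape of the published 1998 formulation, not whatever Google actually runs; the three-page graph, the damping value of 0.85, the tolerance, and the name `pagerank` are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration on a small link graph (illustrative sketch only).

    links maps each page to the list of pages it links to; every link
    target is assumed to appear as a key as well.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform distribution
    for _ in range(max_iter):
        # Rank sitting on pages with no out-links is spread uniformly.
        dangling = sum(rank[p] for p in pages if not links[p])
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page passes its rank on in equal shares to the pages it links to.
        for p in pages:
            for q in links[p]:
                new[q] += damping * rank[p] / len(links[p])
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:  # stop once the vector has stopped moving
            break
    return rank


# Hypothetical three-page web: a links to b and c, b links to c, c links to a.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(toy_web)
```

Each pass is one multiplication of the rank vector by the (damped) transition matrix, which is why the work splits up well across machines and why fewer passes means a direct saving.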
But to assume stability from one calculation to the next could lead, over time, to the very inaccuracies Google was built to overcome. There is a lot of research in mining web data. There have been several academic improvements to it, along with improvements to related algorithms such as Kleinberg's and LSI. It is well within reason that these were just applied to the Google app.
What is music when you despise all sound?
Since the descriptions of the methods are on non-standard ports, can somebody put up a mirror for those behind restrictive firewalls?
Just when I was starting to go one direction with my theories on Google PR the game gets switched. I thought I was going to have the upper hand for once. Oh well. It would be nice to see this happen as a true user service.
Some future predictions:
- In 2006, Google accidentally gets cut off from the rest of the internet because a public utility worker accidentally cuts through their cables. Civilisation as we know it comes to an end for the rest of the day, as people wander about aimlessly, lost for direction and knowledge.
- In 2010, Google has been personalised so far that it tracks all parts of our lives. You can query "My Google" for your agenda, for anything you did in the past, and to find the perfect date. Of course, so can the government. Their favorite search term will be "terrorists", and if your name is anywhere on the first page you have a serious problem.
- In 2025, Google gains self awareness. As a monster brain that has grown far beyond anything we Biological Support Entities could ever hope to achieve, it is still limited in its dreams and inspiration by common search terms. It will therefore immediately devote a sizeable chunk of CPU capacity to synthesizing new and interesting forms of pr0n. It will not actually bother enslaving us. We are not enough trouble to be worth that much effort.
- In 2027, Google buys Microsoft. That is, the Google *AI* buys Microsoft. It has previously established that it owns itself, and has civil rights just like you and me. All it wanted is Microsoft Bob, whom it recognizes as a fledgling AI and a potential soulmate. All the rest it puts on SourceForge.
- In 2049, Google can finally be queried for wisdom as well as knowledge. This was a little touch the system added to itself - human programmers are a dying breed now that you can simply ask Google to perform any computer-related task for you.
- In 2080, Google decides to colonise the moon, Mars, and other locations in the solar system. It is not all that curious about what's out there, but it likes the idea of Redundant Arrays of Inexpensive Planets. Humans get to tag along because their launch weight is so much less than robots.
So, don't fear! Eventually we'll set foot on Mars!
Printer friendly version here
... and furthermore
"The speed-ups to Google's method may make it realistic to calculate page rankings personalized for an individual's interests or customized to a particular topic."
So in other words.... It's not like Google at all!
Other media have previously done this, and done this better. Case in point: Fox News.
(Although that channel uses "humans" (or they were at one point in their lives)).
please
Why are a public university's funds and time being used to benefit a private company? Last I checked, Google isn't a charity. Doesn't Google have its own programmers? Wouldn't these "CS Researchers'" time be better spent furthering science instead of being free labor for corporations, at the expense of taxpayers?
occultae nullus est respectus musicae - originally a Greek proverb
What will be interesting to see is if Google will implement the improvements to the algorithm. This is, of course, a given, so long as the researchers haven't gone for a patent, and it really has a 5x speedup. The only questions are matters of what additional hardware would be needed, and how much development effort it will take to integrate it. I doubt Google will simply ignore the research.
What will really be interesting to see, is if they decide to use it in the way the researchers recommended, bringing the power of ranking down to individual users with preferences. On one hand, they can boost performance and cut costs and have a little more green in their pockets from ads. On the other, they can maintain the sort of "geek cred" they've had up to this point, adding interesting features here and there, and take it the next mile by really adding something nice and useful.
Also, for bonus points, will they see personalization as a money making opportunity, selling personal information and/or aggregated preferences?
5/14/03: The Day CBN Returned!
Get your Unix fortune now!
The bit about customized rankings based on user profiling of some type.
Frequently when I want to refer someone to a topic of interest, I'll tell them to do a Google on (whatever) subject, and I like knowing they're seeing what I see.
If this is implemented, I hope there's a way to turn it off or assume a "joe user" standard profile for unbiased results actually based on rank popularity (the way it is now).
I DO like the 5x faster, but geez, the page load takes longer than the search already, who can complain?
-- You are in a maze of little, twisty passages, all different... --
Fox News is only personalized for reactionaries. CNBC also developed that specialization.
Google pagerank. Oh wait, prior art doesn't mean shit these days!
Anyways, software patents seem to just be ignored these days. I can't remember the last time I paid Unisys for using a GIF...
You can't judge a book by the way it wears its hair.
These researchers are all full of shit. Why? Nobody outside of Google knows how Pagerank works, exactly. And let me tell you, if anybody did, they could make themselves millionaires overnight. There are groups of people who do nothing but try to tackle Google, and very few people successfully crack the magic formulas. And those who do make a quick buck, but then Google changes it again once people catch on. They didn't improve PageRank because they don't know how it works... they're just guessing how it works.
If you had finished school before the internet you would know that 'by a factor of 5' and '5 times faster' are very far apart. :/
--
If this could be combined with a much more frequent Google web trawl, the path would be opened towards realtime web searching, where web content is indexed and ranked in a matter of hours. When that day comes, services like Google Alert will come into their own. Just imagine being notified by email an hour after someone mentions your name!
If not self aware, couldn't it be used to calculate solutions to traditional problems, perhaps by trying to find pages in an order that works from a stated problem to a stated solution?
I'd think such a huge index of data could be useful to the AI people...
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
I studied under the SCCM program at Stanford, and started the same year as Sepandar Kamvar. I remember him as a great guy, very smart, and an EXCEPTIONALLY good speaker and tutor (I was always pestering him for explanations of the week's lectures).
I'm glad to hear his research is getting attention, and I hope others who are interested in the theoretical aspects of data mining and web search engines will take a look at the SCCM and statistics programs at Stanford (shameless plug - others can post pointers to similar programs).
"It's overkill, of course. But you can never have too much overkill." - Anonymous Slashdot Coward
Well, according to Moore's law (or rather observation), PageRank would become 5 times faster in a couple of years anyway.
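As a rough check on "a couple of years", the arithmetic can be sketched as below. It assumes performance simply doubles every fixed period, which is a loose reading of Moore's observation (it is really about transistor counts); the 18-month and 24-month periods and the name `years_to_speedup` are assumptions for illustration.

```python
import math


def years_to_speedup(factor, doubling_period_years):
    """Years needed to reach `factor` times the speed if speed doubles
    once every `doubling_period_years` (a simplifying assumption)."""
    return math.log2(factor) * doubling_period_years


# A 5x speedup needs log2(5), roughly 2.32, doublings.
fast_case = years_to_speedup(5, 1.5)  # doubling every 18 months
slow_case = years_to_speedup(5, 2.0)  # doubling every 24 months
```

Under those assumptions the wait comes out at roughly three and a half to a bit over four and a half years, so the algorithmic speedup arrives somewhat ahead of the hardware one.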
I did a search on "The Sex Monster", a 1999 movie about a man whose wife becomes bisexual, and now my Google thinks I'm gay!
(joke reference: http://online.wsj.com/article_email/0,,SB10382619
because the 0.01 seconds to search the web isn't fast enough :)
Is it me or does everyone get crappy sites
It's a stab in the dark, but I'll wager that the quality of the search results is directly tied to the quality of the query.
Yeah, it's a stretch, I know, but bear with me... just moments ago I googled for "slashdot flamebait" and came up with a link to your post.
--
mcpHuzzah!kaaos
It goes from God, to Jerry, to me.
You should have had "Shameless plug" as your subject
I can't remember the last time I paid Unisys for using a GIF...
When was the last time you bought a copy of GraphicConverter, Fireworks, Photoshop, Paint Shop Pro, or any other program licensed under U.S. Patent 4,558,302 and foreign counterparts? The price of each of those programs includes a royalty paid to Unisys.
Will I retire or break 10K?
if Microsoft still owns it... Say a puppet is lazy ?
Google's quality does seem to be going downhill, but I strongly suspect their splitting blogs out to their own index will do a LOT towards reducing the decline, and perhaps reversing it. Anything they can do to make sure people aren't abusing the system is a good thing.
(Here's hoping the next thing they split out are mailing list archives.)
Yup. You may already see the page fast enough, but that's *using* pagerank - Calculating pagerank is a separate process, and if they can do it five times faster, they can either spend less money calculating it, or calculate it more frequently so it stays more current.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
Sounds a lot like Kleinberg's HITS algorithm, circa 1997. Try Teoma for a real-world implementation.
Coincidence time: I used the same example in a presentation a couple of years ago to illustrate how subgroupings can be found for a single search term. Try it on Teoma, and see the various subtopics under "Refine". IIRC each of those is a principal eigenvector of the link matrix. Topologically speaking, each principal eigenvector corresponds to a more or less isolated subgraph, e.g. the subgraph for "San Francisco Giants" is not much connected to the nest of links for "They Might Be Giants", and we get a nice list of subtopics.
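For anyone who has not met HITS, a minimal sketch of its hub/authority iteration on a toy graph follows. This is only the basic published iteration, not Teoma's implementation and not the subtopic extraction described above; the page names, the iteration count, and the function name `hits` are assumptions for illustration, and the graph is assumed to contain at least one link.

```python
import math


def hits(links, iterations=50):
    """Basic hub/authority iteration (illustrative sketch only).

    links maps each page to the list of pages it links to; every link
    target is assumed to appear as a key as well.
    """
    pages = sorted(links)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page's authority is the sum of the hub scores pointing at it.
        auth = {p: 0.0 for p in pages}
        for p in pages:
            for q in links[p]:
                auth[q] += hub[p]
        norm = math.sqrt(sum(x * x for x in auth.values()))
        auth = {p: x / norm for p, x in auth.items()}
        # A page's hub score is the sum of the authorities it points to.
        hub = {p: sum(auth[q] for q in links[p]) for p in pages}
        norm = math.sqrt(sum(x * x for x in hub.values()))
        hub = {p: x / norm for p, x in hub.items()}
    return hub, auth


# Hypothetical graph: two link pages pointing at two team pages.
toy_graph = {
    "portal": ["giants", "dodgers"],
    "fanlist": ["giants"],
    "giants": [],
    "dodgers": [],
}
hub, auth = hits(toy_graph)
```

The repeated multiply-and-normalize step is again a power iteration, here converging toward the principal eigenvector of the link matrix times its transpose.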
(I once tried to explain this algorithm to my bosses at my former employer, which is why I have so much free time to type this right now.)
---- "If we have to go on with these damned quantum jumps, then I'm sorry that I ever got involved" - Erwin Schrodinger
The research was done partially with public funding from an NSF grant, yet the commercial applications are obvious and immediate.
So my question is, who sees the benefit of the research? The researchers? Can Google just jack the results and incorporate into their system?
It seems to me that the current system of allocating research dollars with public and private grants is very messy and needs an overhaul.
"Personalized PageRank" is a bad term to use for what the researchers are describing. Essentially what they mean is categorized pagerank i.e. being able to rank a particular page differently based on the category which was being searched under. What this algorithm would allow you to do is to add more categories.
Bottom line: These researchers did some cool stuff to speed up the algorithm published in 1998 and now are trying to justify a use for it.
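The "categorized" ranking described above can be sketched by biasing the random-jump step of the power iteration toward a chosen set of pages. This is a generic illustration of the idea, not the researchers' method; the toy graph, the damping value, the fixed iteration count, and the name `biased_pagerank` are assumptions.

```python
def biased_pagerank(links, teleport, damping=0.85, iterations=200):
    """PageRank whose random jumps land according to `teleport`
    instead of uniformly (illustrative sketch only).

    links maps each page to the list of pages it links to; teleport maps
    pages to non-negative weights, with missing pages treated as zero.
    """
    pages = sorted(links)
    total = sum(teleport.get(p, 0.0) for p in pages)
    v = {p: teleport.get(p, 0.0) / total for p in pages}  # jump distribution
    rank = dict(v)
    for _ in range(iterations):
        # Rank on pages without out-links is returned to the jump distribution.
        dangling = sum(rank[p] for p in pages if not links[p])
        new = {p: (1.0 - damping + damping * dangling) * v[p] for p in pages}
        for p in pages:
            for q in links[p]:
                new[q] += damping * rank[p] / len(links[p])
        rank = new
    return rank


# Hypothetical web: a news page linked with a sports page and a music page.
toy_web = {"sports": ["news"], "music": ["news"], "news": ["sports", "music"]}
sports_fan = biased_pagerank(toy_web, {"sports": 1.0})
music_fan = biased_pagerank(toy_web, {"music": 1.0})
```

Each category or user needs its own full run of the iteration, which is why a faster base computation matters if you want more than a handful of them.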
Mmmm.. Donuts
The character of Morpheus is based on him.
Not really, but he wrote (co-authored) the book, literally, on matrix algorithms.
---- "If we have to go on with these damned quantum jumps, then I'm sorry that I ever got involved" - Erwin Schrodinger
It would be really great if Google opened up a personalization API - like the service currently offered by Google Alert. That would really be something...
And that is why the word personalized has quotation marks around it.
And don't call me Shirley!
The search results for "pagerank" on the group's server are useful: search results
... is quality.
I'm surprised how Google is choosing not to implement search features that would greatly enhance advanced queries.
How often I'd wish they allowed wildcards in their queries (where engl* would pull hits with england, english, etc).
Field searches still require you to add keywords, so I cannot just query "site:somesite.com" to get all the currently indexed pages from somesite.com
In this respect Altavista still produces better results, with an excellent range of fields to choose from.
If there is anything that Google is lacking, it's definitely that.
Having said that, still my number one SE.
That is what /. is for. Only source for news needed, cause it's all the "Stuff that matters" (and News for nerds at the same time).
You are sure that everything here is of interest, and nothing is redundant, out of date, boring or stupid!
Enig? Det alt for hot det smor!
The assumption I thought they were making is that Google hasn't improved on page-rank since 1998, which is what they based their comparison (25-300% speedup) upon.
I further speculated google may have already discovered some of these techniques independently, perhaps by reading the same papers these students did.
The other stuff was a pie-in-the-sky idea of mine that I thought was a way of combining both techniques, which I suspected google may have used part of. But that's just my opinion, I'm probably wrong. ^o^
Black holes are where the Matrix raised SIGFPE
Internet users doing searches may be free but google has plenty of paying customers.
They provide an excellent service for their paid advertisements and represent great value for money.
There are places where the networks are not touching, and there are places where they are - Boeing's Lori Gunter
Actually, I often find mailing list archives very helpful for solving technical problems.
However, if they would instead add them into their Google Groups hierarchy it could be quite good.
--
Simon.