Slashdot Mirror


Computing PageRank on your PC?

An anonymous reader writes "A group of CS researchers at the University of Milan has found a way to compress web graphs to 3 bits per link, and to access them in compressed form. They provide data sets representing real snapshots of portions of the web with one hundred million nodes and one billion links. You just need some bandwidth to download a few hundred megabytes of data, and you can compute PageRank with your PC. All the code involved is GPL'd, and the data are public: everybody can grok PageRank now!"
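The summary does not say how the 3 bits per link are reached. As a rough illustration only, and not a description of the Milan group's actual method, one common ingredient of graph compression is to sort each page's successor list and store the gaps between consecutive IDs with a variable-length code. The sketch below uses Elias gamma coding; the function names and the toy successor list are made up for the example.

```python
def gamma_encode(n):
    """Elias gamma code for a positive integer, returned as a bit string."""
    if n < 1:
        raise ValueError("gamma codes are defined for n >= 1")
    binary = bin(n)[2:]                      # e.g. 5 -> '101'
    return "0" * (len(binary) - 1) + binary  # length prefix in unary, then value


def gamma_decode_all(bits):
    """Decode a concatenation of gamma codes back into a list of integers."""
    values, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":                # count the unary length prefix
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values


def encode_successors(successors):
    """Gap-encode a list of distinct, non-negative successor IDs."""
    previous = -1
    out = []
    for node in sorted(successors):
        out.append(gamma_encode(node - previous))  # every gap is >= 1
        previous = node
    return "".join(out)


def decode_successors(bits):
    """Rebuild the sorted successor list from its gap encoding."""
    nodes, previous = [], -1
    for gap in gamma_decode_all(bits):
        previous += gap
        nodes.append(previous)
    return nodes


# Hypothetical page whose links point at pages with nearby IDs.
links = [1000, 1001, 1002, 1004, 1005]
bits = encode_successors(links)
print(len(bits) / len(links), "bits per link on this toy list")
```

Small gaps cost only a bit or two each, so lists of links to nearby pages compress well; the first, large gap dominates the cost in this tiny example.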

186 comments

  1. The major thing missing from Mozilla by Anonymous Coward · · Score: 5, Interesting

    Is a way to look at Google's pagerank. That's the only real thing the IE Google toolbar has over the Mozilla alternative.

    1. Re:The major thing missing from Mozilla by FroMan · · Score: 1

      Heck, at only 12 megs for a download of Mozilla, maybe we should incorporate this "feature" into Mozilla. What's a couple hundred more megs, right? :-)

      </sillyness>

      This was posted by mozilla, don't worry about modding me down for teasing the browser.

      --
      Norris/Palin 2012
      Fact: We deserve leaders who can kick your ass and field dress your carcass.
    2. Re:The major thing missing from Mozilla by Anonymous Coward · · Score: 1, Insightful
      Is a way to look at Google's pagerank. That's the only real thing the IE Google toolbar has over the Mozilla alternative.

      The Google toolbar for IE has to ask google.com for the PageRank of each page you view, via XML-RPC. One of the fields in the XML-RPC request is a checksum. Without that checksum, google.com rejects the request. So it's just a matter of finding out how the toolbar calculates the checksum based on your URL. Then you could write a standalone (or Mozilla-based) tool for fetching PageRanks.

    3. Re:The major thing missing from Mozilla by Anonymous Coward · · Score: 3, Informative

      http://googlebar.mozdev.org

    4. Re:The major thing missing from Mozilla by Anonymous Coward · · Score: 0
      Sig: it should be "I like swearing *in* french"

      Makes more sense, eh? :)

    5. Re:The major thing missing from Mozilla by FroMan · · Score: 1

      sic

      Check the second definition supplied. Is this making more sense to you?

      --
      Norris/Palin 2012
      Fact: We deserve leaders who can kick your ass and field dress your carcass.
    6. Re:The major thing missing from Mozilla by Anonymous Coward · · Score: 0

      Why not just use the Google XML-RPC API?

    7. Re:The major thing missing from Mozilla by wherley · · Score: 3, Informative

      Tried it...but it provides no pagerank. They say:
      "We currently have no plans to implement pagerank"

      Still - a cool addition to mozilla.

    8. Re:The major thing missing from Mozilla by Anonymous Coward · · Score: 0

      Notice I said Mozilla alternative? Notice it lacks PageRank? That was my entire point.

    9. Re:The major thing missing from Mozilla by Badge+17 · · Score: 1

      Hey, as long as we're being fussy and using [sic], shouldn't that be "it's," rather than "its"?

      "it's" := "it is"

    10. Re:The major thing missing from Mozilla by JamesDotCom · · Score: 2, Interesting

      The problem is that the Google toolbar's checksum changes constantly. So if you were to find out exactly how the Google toolbar computes the checksum for PageRank requests, all it takes is for Google's official toolbar to change it, and your tool won't work anymore. The catch, however, is that if you send a wrong checksum to Google, they don't send back an error message of any sort; instead they send back a fake PageRank. So you wouldn't really know whether it was still working or not.

    11. Re:The major thing missing from Mozilla by Anonymous Coward · · Score: 0

      And 'French' should be spelt 'Freedom' too.

    12. Re:The major thing missing from Mozilla by FroMan · · Score: 1

      Yep. Good call. :-) Thanks.

      --
      Norris/Palin 2012
      Fact: We deserve leaders who can kick your ass and field dress your carcass.
  2. This sounds cool.. by xchino · · Score: 5, Funny

    Now if I can just think of a reason why I would need this..

    --
    Everyone is entitled to their own opinion. It's just that yours is stupid.
    1. Re:This sounds cool.. by Daniel_Staal · · Score: 5, Funny
      Now if I can just think of a reason why I would need this..

      And you call yourself a geek. *Sigh*.

      It doesn't matter why you need it. It's technical, GPLed, and has to do with Google. That's all the reason you need.

      --
      'Sensible' is a curse word.
    2. Re:This sounds cool.. by imAck · · Score: 1

      This is really cool, not just for page rank. Finding pre-compiled data sources like this can be a great catalyst in scientific research. Just my two cents.

      --

      It's hard to tell the cool to chill, my favorite hotel room has a view to an ill.

    3. Re:This sounds cool.. by (trb001) · · Score: 5, Funny

      It's technical, GPLed, and has to do with Google

      It's a geek hattrick!

    4. Re:This sounds cool.. by inerte · · Score: 1

      1) Soccer/BasketBall/Any Ball game: Put a sensor on everyone's shoes (or hands) and one at the ball. Consider each player as a webpage and each pass as a link. The ball is the vertex. Then you can find what player (from what position) makes a good pass, and to who; Optimize your strategy :)

      2) P2P: Store and analyze peer positions (both geographically or between network connections/routes) to find the best combination. Consider the download as link, and each peer is a vertex. For example, "cluster" peers with the same content (if you make the file hash as the link initiator), or from the same country.

      3) User Interface: By streaming and saving numerous screenshots from people using their computers, you can work out which tasks are most commonly performed in an application, or how "difficult" it is to use. For example, how many pixels the mouse pointer usually travels, either when you overshoot where you wanted to click, or from button to button. If you plot a web graph showing that users usually go from button "A" to "B", and they are 20 pixels apart, you could reduce the distance.

      4) Compression (I guess?): Find on your hard drive common byte sequences between files, and create "webpages" for them. When you open a file, it will look for its bytes on a vertex (the "link" from the webgraph). The more files you have, the better they are compressed.

      Also good for defrag: Keep the most accessed vertexes on the start of the hard drive, or its files "around";

      5) Surveys: If each answer and the profile of who's answering is a webpage, you can create vertexes of interests

      6) Spam: Consider each word from the spam as a vertex. Parse the email and connect each word with a vertex. Store this information (how many links a vertex has). When an email comes in, check whether it is spam by looking at how many vertexes its words connect to (and how "big" the vertexes are, i.e. previous connections).

  3. Dumb Question: by Xesdeeni · · Score: 5, Interesting

    What's Page Rank? Does this indicate how often my page is visited?

    Xesdeeni

    1. Re:Dumb Question: by Chris_Stankowitz · · Score: 5, Informative

      Do you mods ever stop to wonder if this guy could have been asking a legit question? It's possible he doesn't know. Also possible that others don't. I know...I know..., this is /., how could he not know, right? It is still very possible. I'm not saying he should have been modded up, but by modding him down someone may miss the chance to read his post and reply to it with an intelligent answer. All of that being said, I would answer his question. But now that I think about it, I'm not sure what it is. I 'think' I know. But I think he and I are in the same boat. I also thought about posting this as an AC, but I won't. Then surely someone will just think that it was the original poster posting as an AC. He may be trolling. He may not be. It won't hurt to answer the question.

    2. Re:Dumb Question: by Andorion · · Score: 1

      Well said =)

      ~Berj

    3. Re:Dumb Question: by Anonymous Coward · · Score: 5, Funny

      Jesus, you created a second account just to defend yourself!

    4. Re:Dumb Question: by kevin_conaway · · Score: 1

      I didn't know what it was either. Mod parent and grandparent up.

    5. Re:Dumb Question: by Xesdeeni · · Score: 1

      Thanks for the defense, but I was kinda enjoying my first post labelled as a "Troll" :).

      So that's how Google ranks its pages. I didn't realize they tracked the number of links. I didn't really think about it long, but I figured they just used how many times the query string appeared, maybe the age of the page, or whatever.

      I wonder if this data would be hugely different from the number of visits a page receives, considering easily typed-in page addresses (fewer links needed), or the possibility that a single link is followed many times, while another is almost never followed.

      Anyway, thanks to AC for the info.

      Xesdeeni

    6. Re:Dumb Question: by biomass · · Score: 2, Informative

      Page rank, to a first order of approximation, ranks your page by "popularity". Using a voting system, it counts the number of links to your page.

      To a second order of approximation, it weights the votes of the referencing links by their popularity.

      To a third order of approximation, it is a Markov chain that measures the long-term likelihood of your arriving at a page if you were to traverse the net at random: following random links out of each page and occasionally (1 time in 20?) jumping to an arbitrary URL.
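The random-surfer description above can be sketched as a short power iteration. This is a minimal illustration on a made-up four-page web, not Google's implementation; the `jump` default of 0.05 simply follows the "1 in 20" guess in the comment (the original PageRank paper uses a jump probability of about 0.15), and the handling of pages without outlinks is one common convention among several.

```python
def pagerank(links, jump=0.05, iterations=100):
    """Power iteration for the random-surfer model on a dict of outlinks."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}      # start with equal rank
    for _ in range(iterations):
        # Rank sitting on pages without outlinks is spread over every page.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        base = jump / n + (1.0 - jump) * dangling / n
        new_rank = {node: base for node in nodes}
        for source, targets in links.items():
            if not targets:
                continue
            # Each page splits its rank evenly among the pages it links to.
            share = (1.0 - jump) * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: page -> pages it links to.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

Page "d" has no incoming links, so its only rank comes from random jumps, while "c" collects votes from three pages and ends up on top.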

    7. Re:Dumb Question: by Anonymous Coward · · Score: 0

      All this and you didn't answer the question either... You don't know, do you?

    8. Re:Dumb Question: by sig+cop · · Score: 3, Funny
      I didn't know what is it either.

      Mod parent and grandparent and great-grandparent up.

      Also, mod parents children up.

      Also, mod great-great-grandparents great-great-granddaughters up.

      Also, say up unto them verily, that the mod of the parent will be cast down the generations to be a mod on the children, and on the children's children, and on the children's children's chilluns.

      And also, mod up the nephews of the parents of the siblings of the grandparent, for though they be trolls or flamebait, they are righteous in the eyes of the moderators.

      And thou shalt visit the mods onto the descendents on through the generations, for I, your Mod, have smote upon thee a mod pestilence that shalt not be lifted until the second coming of the JonKats.

      Thanks be to Mod, Amen

    9. Re:Dumb Question: by Anonymous Coward · · Score: 0

      /me spits chicken pot pie out his nose...

    10. Re:Dumb Question: by breandandalton · · Score: 0, Offtopic
      >What's Page Rank? Does this indicate how often my page is visited?

      Ah, come on guys, who modded this "interesting"? Come on, own up! It's "funny" and "troll"!

      Or is this a slashdot moderation troll?

    11. Re:Dumb Question: by iphayd · · Score: 1

      I'm sure Jesus has better things to do than defending himself on Slashdot.

      Or posting on Slashdot, for that matter.

    12. Re:Dumb Question: by Anonymous Coward · · Score: 0

      You'd like to think so, but he really doesn't. Apparently, he's basically just sitting around waiting for people to stop wearing crosses.
      "You think when Jesus come back he wants to see another fucking cross? ...'I'm not going, Dad. No, they're still wearing crosses - they've totally missed the point.'" -the late Bill Hicks

    13. Re:Dumb Question: by Anonymous Coward · · Score: 0

      Bravo...

      Proof that the funny mod desires Karma points.

      AC

    14. Re:Dumb Question: by Anonymous Coward · · Score: 0

      No shit, is this the same Chris Stankowitz that went to NIU and skated with Andrew (and me late at night in the art building)? And lived with the Karate Kid for a while?

    15. Re:Dumb Question: by Ravioli · · Score: 1

      ... while forgetting to login with the third account for all the karma points! ;/
      --
      I am too lame to make a .signature!
    16. Re:Dumb Question: by The+Cydonian · · Score: 1

      Think of it as Google-bestowed karma on your website. :-D

    17. Re:Dumb Question: by Anonymous Coward · · Score: 0

      Mod parent up!

    18. Re:Dumb Question: by Where's+my+towel · · Score: 1

      Some papers:

      http://citeseer.nj.nec.com/page98pagerank.html
      The original definition of pagerank.

      http://www.google.com/technology/
      The layman's definition.

      http://dbpubs.stanford.edu:8090/pub/1999-31
      How to compute it quickly.

      http://citeseer.nj.nec.com/haveliwala02topicsensitive.html
      How to make it topic-specific

    19. Re:Dumb Question: by jonadab · · Score: 1

      > I wonder if this data would be hugely different from the
      > number of visits a page receives

      Dunno. But there's no way for Google to know how many visits
      a page receives; whereas, they can calculate how well-linked
      it is.

      It would be interesting to do a study on the relationship between
      pagerank and page loads and number of distinct visitors, but that
      would require having log data from all the servers involved; it
      would be pretty easy to get log data from a small number of servers,
      but getting those data for a decent subset of the internet would be
      significantly less easy.

      --
      Cut that out, or I will ship you to Norilsk in a box.
  4. Tee Hee by teamhasnoi · · Score: 5, Funny
    I bet the Searchking is steaming right about now...

    "Finally, proof!!"

    1. Re:Tee Hee by Catiline · · Score: 0, Redundant

      Too bad that their case was dismissed already.

  5. why is rank/rating necessary? by The+Terrorists · · Score: 0, Troll
    You get sort of a self reinforcing cycle of wankage the more we increase the "relevance", "awareness" or "utility" of pageranking. Post good info for its own sake, not for popularity's sake. Slashdot is a good example of the latter over the past few years. It could have receded back into the depths and maintained quality but it put page-ranking first, attempting to attract and contain a particular audience.

    TV without Nielsen ratings would be better too, for similar reason.

    1. Re:why is rank/rating necessary? by Anonymous Coward · · Score: 0

      Because other search engines pre-PageRank didn't work as well as Google does with it. Is it possible that something could be better? Maybe, but it's not out there yet. Google didn't become popular because of a catchy name, a cute logo, or any crap like it. It became popular because it works, and PageRank is a part of that.

    2. Re:why is rank/rating necessary? by TopShelf · · Score: 4, Funny

      You get sort of a self reinforcing cycle of wankage...

      For a second there I thought you were just talking like Elmer Fudd! "wating and wanking incwease the welevance of pagewanking..."

      --
      Stop by my site where I write about ERP systems & more
    3. Re:why is rank/rating necessary? by Pionar · · Score: 2, Insightful

      It could have receded back into the depths and maintained quality but it put page-ranking first, attempting to attract and contain a particular audience.

      I disagree. In case you haven't noticed, the title of the /. front page is "News for Nerds, Stuff that Matters." So, of course /. is attracting a particular audience. That's a Good Thing.

      Target audience is one of the most important decisions when designing a web site. "Good info" is a subjective concept. What's good to you is not necessarily good to me. But chances are, if I search for something that I'm looking for, PageRank can provide a sense of the more authoritative pages for that subject.

      Also, putting stuff up for popularity's sake is a great reason to put something up. If I didn't want my employer's site to be seen, I wouldn't have put it up there. Attracting eyeballs is the only way to get good info. The more eyeballs, the better the accuracy of information. Why do you think peer review is such a big deal in scientific arenas (and it is, as I know from working for a big-name medical school)? If I was a scientist reviewing another scientist's work, then I would look at the writing aspects of the work. A little bit of style often makes information more credible to others. Don't ask me why, just know that it's human nature.

    4. Re:why is rank/rating necessary? by FearUncertaintyDoubt · · Score: 1
      You get sort of a self reinforcing cycle of wankage the more we increase the "relevance", "awareness" or "utility" of pageranking.

      That reminds me of Asimov's psychohistory and the Second Foundation: the First Foundation had to be unaware of the influence of the Second Foundation for it to work. Maybe that makes Searchking the Mule?

    5. Re:why is rank/rating necessary? by Spazmania · · Score: 1

      TV without Nielsen ratings would be better too, for similar reason.

      We have TV without Nielsen ratings. We call it "PBS."

      Is PBS better? Sometimes. Perhaps even often in recent years. Certainly no one has ever referred to PBS' content as mindless drivel, the way we talk about things like Survivor and American Idol.

      But let me ask you this: If you could have only one TV station and you had to choose between ABC, CBS, FOX, NBC and PBS would you choose PBS? Didn't think so.

      --
      Moderating "-1, Disagree" is simple censorship. Have the guts to post your opinion.
    6. Re:why is rank/rating necessary? by baka_boy · · Score: 4, Interesting

      I really shouldn't rise to this bait, but I can't resist: yes, given the choice between those networks, I would choose PBS. Just as I would take a non-profit-driven Internet, public radio over Clear Channel and its ilk, and community mesh wireless networks over 3G mobile phone service.

      Google has been, so far at least, a rare exception in the world of privatized communications utilities, by consistently showing an amazing lack of intention to lock people into their service, using either exclusivity agreements of some sort or the simple expedient of proprietary technology (i.e., "increase your PageRank by 10% if you support new encrypted GoogleML tags on your site!"). Nothing is permanent, though, and as we all know, single points of failure are a no-no.

      So, to bring all this back somewhere in the general neighborhood of the main story: further distributing the capability to build "mini-Googles", or specialized, community-maintained (but still fairly large-scale in terms of number of pages and links indexed) search tools is very interesting, and a useful body of technology to perpetuate.

      Or, even more generally, the technology needed to do large-scale storage, analysis, and manipulation of directed graph structures is a very useful tool. Software analysis often relies heavily on large graphs showing dependencies, caller-callee relationships, variable accesses, etc., as do any number of AI subdomains like knowledge representation and planning systems.

  6. Some webmasters/SEO's are obsessive by Anonymous Coward · · Score: 5, Interesting

    If Google tweaks one thing, causing result 97 to shift to result 98, they notice. They'd be doing this daily to check on their pages.

  7. does this mean... by Dreadlord · · Score: 1

    I wonder if this goes as it's planned to, is it the end of search engines, and the beginning of peap to pear search?

    --
    The IT section color scheme sucks.
    1. Re:does this mean... by Anonymous Coward · · Score: 0

      Not that many people need to find pears.

    2. Re:does this mean... by Diego_27182818 · · Score: 0, Offtopic
      and the beginning of peap to pear search?


      So the only definitions for peap I could find were as acronyms

      PEAP
      1. Positive-End Airway Pressure
      2. Protected Extensible Authentication Protocol

      And I'm not sure how either one of those would search to a pear.
      --
      Warning, cape does not enable user to fly
    3. Re:does this mean... by IMarvinTPA · · Score: 1

      I was going with it as an autamotapea(sp?, and no the spell checker didn't know either). So pick a random chirping bird or better yet, those Easter Marshmellow birds seeking out pears.

      IMarv

    4. Re:does this mean... by critter_hunter · · Score: 1

      Was the word you were looking for "onomatopoeia"? You weren't even close, but I can hardly blame you.

      --
      Karma: Could be worse (could be raining)
    5. Re:does this mean... by Anonymous Coward · · Score: 0

      The word you're looking for:

      Main Entry: on·o·mato·poe·ia
      1 : the naming of a thing or action by a vocal imitation of the sound associated with it (as buzz, hiss)
      2 : the use of words whose sound suggests the sense

      [Merriam-Webster Online]

    6. Re:does this mean... by JamesOfTheDesert · · Score: 4, Funny
      ... pear search

      ... to find the fruits of your labor?

      What a grape idea! Orange you glad you thought of it?

      .

      .

      .

      Ok. Groan fest is over.

      --

      Java is the blue pill
      Choose the red pill
    7. Re:does this mean... by nacturation · · Score: 1
      ... and the beginning of peap to pear search?
      Depends on whether it's pear connections use they're bandwidth effectively. What happens when they loose the connection? I have too say that their way to many pear too pear programs out they're too shut them down all ready.
      --
      Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.
    8. Re:does this mean... by Davak · · Score: 1

      PEAP really isn't used anymore...

      PEEP is now the common term.

      PEEP - Positive End-Expiratory Pressure

      Like anybody gives a flip.

      Davak

    9. Re:does this mean... by Anonymous Coward · · Score: 0

      Sure you can.

      Onomatopoeia is easy to spell, you just have to remember the mnemonic: Sing the letters to the music of Old McDonald.

      Old-Mc-Don-ald-had-a-farm
      O-n-o-m-a-t-o

      Ee-Oh-Ee-Eye-Aye
      p-o-e-i-a

  8. Which sites are the Root(s)? by amembleton · · Score: 5, Interesting

    When these Web Graph or Page Rank things are drawn up which sites do they use as the roots?

    I mean they've got to start with some site(s) and then go through each link from there.

    1. Re:Which sites are the Root(s)? by marcs · · Score: 1

      The ODP/dmoz would probably be a good place to start.

    2. Re:Which sites are the Root(s)? by cristofer8 · · Score: 1

      I would assume they have multiple roots. They have to seed google with some list of sites. In fact, I would guess that every site google sees gets added as a root, as well as any site added by google employees.

    3. Re:Which sites are the Root(s)? by Anonymous Coward · · Score: 0

      "...which sites do they use as the roots?"

      They simply use Google. Duh.

    4. Re:Which sites are the Root(s)? by warkda+rrior · · Score: 5, Informative

      It is a graph, not a tree, so there is no one root. Maybe you are looking for the seed site, i.e. the first site added to the webgraph they construct. You can choose any site you prefer, although something well-connected is better. It seems to me that Yahoo! would be a good starting point.
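The seed idea can be sketched as a breadth-first walk over outlinks. This is only an illustration on a made-up link table (the page names are hypothetical), not how any real crawler is built, but it shows why the choice of seed matters: pages that nothing reachable links to are never discovered.

```python
from collections import deque


def crawl(seed, outlinks):
    """Breadth-first discovery of every page reachable from `seed`."""
    seen = {seed}
    order = [seed]
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        for target in outlinks.get(page, []):
            if target not in seen:       # a graph has cycles, so track visits
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


# Hypothetical toy web: page -> pages it links to.
toy_web = {
    "portal": ["news", "blog"],
    "news": ["blog", "portal"],
    "blog": [],
    "island": ["portal"],
}
print(crawl("portal", toy_web))   # "island" links in but is never linked to
print(crawl("island", toy_web))
```

Starting from "portal" misses "island" entirely, while starting from "island" reaches all four pages, which is why a well-connected seed (or several seeds) is the better choice.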

      --
      You need to install an RTFM interface.
    5. Re:Which sites are the Root(s)? by kentsin · · Score: 0

      I wonder if this is connected? Or how many connected graphs are there?

    6. Re:Which sites are the Root(s)? by menscher · · Score: 2, Interesting

      Google starts their webcrawl with the Stanford University home page. (Info based on a talk given by Craig Silverstein, the director of technology at Google.)

    7. Re:Which sites are the Root(s)? by jonadab · · Score: 1

      Backwards you have it. You start at leaf nodes (pages that don't
      link to anything) and work backwards to pages that link to the leaf
      nodes, pages that link to those, and so forth.

      Initially you have lots of webs -- one for each leaf node. As you
      add pages that link to them, these will tend to get joined as you
      find that many of them belong to some of the same trees.

      Oh, and you reserve known search engines for last.

      --
      Cut that out, or I will ship you to Norilsk in a box.
  9. One Billion Links? by Anonymous Coward · · Score: 1, Funny

    There are more links than that just on Microsoft's support page. Although I don't know if you can call them links if they only send you around in a circle.

  10. beyond PageRank... by rfischer · · Score: 3, Interesting

    ... I would be interested in how the links change over time. Maybe take a new snapshot every day or week, see the web evolve.

    1. Re:beyond PageRank... by big_gibbon · · Score: 2, Interesting

      That would be amazingly cool. The only problem (and it's not really a problem) would be that generally people never, or rarely, remove links. If you limited this to links only (say) a month old or younger, you could see the paths of memes round the web . . . for example right now, you'd probably see a lot of BitTorrent hotspots, whereas a couple of years ago there'd be lots focussed on "all your base" . . .

      Anyone got a lot of processing power and some spare time? ;)

      P
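Once two snapshots exist, finding the young links is the cheap part. A minimal sketch, assuming each snapshot is available as a collection of (source, target) pairs; the function name and the sample data are hypothetical.

```python
def diff_snapshots(old_links, new_links):
    """Return (added, removed) links between two snapshots of a web graph."""
    old_set, new_set = set(old_links), set(new_links)
    added = sorted(new_set - old_set)      # links that appeared since last time
    removed = sorted(old_set - new_set)    # links that disappeared
    return added, removed


# Hypothetical snapshots taken a month apart.
last_month = [("blog", "news"), ("news", "portal"), ("fanpage", "allyourbase")]
this_month = [("blog", "news"), ("news", "portal"), ("blog", "torrents")]

added, removed = diff_snapshots(last_month, this_month)
print("new links:", added)
print("dropped links:", removed)
```

Keeping only the `added` set from each period gives the "month old or younger" view of the graph described above.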

    2. Re:beyond PageRank... by Niet3sche · · Score: 1

      That'd be COO... wait, no, it's called
      www.archive.org
      And they have the "wayback machine" or some such.
      It's been done already.

  11. PageRank is part of Google's algo by Anonymous Coward · · Score: 5, Informative

    It's basically how well linked to your page is, and how well linked to the pages linking to you are, and so on. It's an advanced form of link popularity. The idea is that the more people that link to something, the more influential/important it is. Some sites have high PageRanks of 10 (like Google), while Slashdot is something like an 8. Many pages are in the 4-6 range. Every link you create is like a "vote" for another web page.

    1. Re:PageRank is part of Google's algo by e2d2 · · Score: 4, Informative

      Google's PageRank was actually named after Larry Page, the creator of their system for ranking pages. Pun was obviously intended.

    2. Re:PageRank is part of Google's algo by kisrael · · Score: 1

      Is there a way to see a given site's absolute PageRank w/o using that toolbar? (as opposed to its relative PageRank of where it shows up on your search results.)

      --
      SO YOU'RE GOING TO DIE: The Comic for Dealing with Death
    3. Re:PageRank is part of Google's algo by Anonymous Coward · · Score: 2, Informative

      If your site is in the Google Directory (based on DMOZ), it may have a pagerank listed next to it.

    4. Re:PageRank is part of Google's algo by nacturation · · Score: 1

      Riiiight... and hypertext was so-called because someone's kid was always hyper. I suppose next you'll be telling us that Dell Computers was named after... er, wait... never mind.

      --
      Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.
    5. Re:PageRank is part of Google's algo by Anonymous Coward · · Score: 0

      Git one of yer friends with der winderz compudor to luk at it on der IE toolbar.

    6. Re:PageRank is part of Google's algo by brakk · · Score: 1

      OK, now that we know what it is, can someone tell me what is useful about being able to calculate it on my PC?

    7. Re:PageRank is part of Google's algo by thumperward · · Score: 1

      Informative, it gets! Informative!

      - Chris

    8. Re:PageRank is part of Google's algo by Anonymous Coward · · Score: 0

      That's right, Frank Computers

    9. Re:PageRank is part of Google's algo by InfoCynic · · Score: 2, Informative

      PageRank is a one-dimensional recursive weighting for a web page. Initially you assume all pages were created equal. Now for each page, compute an updated PageRank based on indegree (number of pages linking to the site). You usually also introduce a weighting factor which is designed to simulate some random chance that you "jump" to the next page by just typing a URL, not following a link. After that, you typically normalize the scores (sum of squares must equal one is the preferred norm).


      Now you have to iterate, but on subsequent iterations, you're no longer concerned purely with indegree. You care about the PageRank of pages linking to you. Pages that are "popular" and have high PageRank will boost your score. Typically you iterate until the values converge to within a given threshold. If you know linear algebra, you can also cheat and compute the principal eigenvector directly, but that's not the point.


      There are better algorithms, like Kleinberg's, which gives each page a "hub" and "authority" score, where just linking to a page isn't enough, and you can learn more about that in a Web Algorithms course.
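For comparison, the hub and authority idea can be sketched in a few lines. This is a simplified illustration of Kleinberg-style scoring on a made-up link table, using the sum-of-squares normalization mentioned above; it skips the query-specific subgraph selection that the full algorithm starts from.

```python
import math


def hits(links, iterations=50):
    """Iteratively score pages as hubs (good linkers) and authorities (well linked)."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    hub = {node: 1.0 for node in nodes}
    auth = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the pages linking to you.
        auth = {node: 0.0 for node in nodes}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        # Hub: sum of the authority scores of the pages you link to.
        hub = {node: sum(auth[t] for t in links.get(node, [])) for node in nodes}
        # Normalize each score vector so its sum of squares is one.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for node in scores:
                scores[node] /= norm
    return hub, auth


# Hypothetical toy web: two directory pages linking to three sites.
toy_web = {"dir1": ["a", "b"], "dir2": ["a", "b", "c"]}
hub, auth = hits(toy_web)
print("hubs:", {k: round(v, 3) for k, v in hub.items()})
print("authorities:", {k: round(v, 3) for k, v in auth.items()})
```

The directory pages come out as hubs with no authority, and the sites they agree on ("a" and "b") get more authority than the one only "dir2" links to.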

      --

      "Recta non toleranda futuaris nisi irrisus ridebis"

  12. I can see it now... by AyeRoxor! · · Score: 5, Funny

    "[...] even on a PC with as little as 256 Mbytes of RAM."

    Somewhere in 1980, milk shoots out of Bill Gates' nose for no apparent reason.

    1. Re:I can see it now... by Anonymous Coward · · Score: 0

      It was very strange, since he was drinking a glass of iced tea at the time.

    2. Re:I can see it now... by nacturation · · Score: 1

      Damn, that's about the funniest thing I've read all day. Rather Douglas Adams-ish -- well done.

      --
      Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.
    3. Re:I can see it now... by Anonymous Coward · · Score: 0

      Cute.

  13. exactly by Anonymous Coward · · Score: 1

    This is what we need to talk about at our little IRC chat session tonight, commander.

    web graphs at 3 bits per link, that's a paddling...

    compute PageRank with your PC, that's a paddling....

    grokking PageRank, you better believe that's a paddling...

  14. Google with feedback by Sanity · · Score: 3, Interesting
    Doesn't Google have a patent on PageRank?

    Anyway, forgive the opportunism, but this is reasonably on-topic. Last weekend I set myself the ambitious task of improving on Google. I came up with a Google front-end which allows you to give feedback on the quality of search results, and thus refine your search. I could really use people's help to test it out - you can find it here. Feedback would really be appreciated.

    1. Re:Google with feedback by YoJ · · Score: 3, Insightful
      The whole point of patents is to encourage inventors to publish their inventions in a safe way. In some respects, PageRank is a good example of how the system is supposed to work. They publish the algorithm, people examine it and experiment further with it, but the inventors still have protection against people ripping off their work.

      The problem is that the GPL does not allow distribution of patent-encumbered technology. The authors of the code in question have every right to release their code with whatever license they want (I believe this is a free-speech issue, especially since the purpose of releasing the code is for doing research). People who receive their code may not use the code in a way that violates the patent, and in addition may not redistribute the code at all (since it would violate the GPL).

      The other issue is that PageRank is really a mathematical formula, and as such is unpatentable. What they actually patented is an algorithm for computing PageRank. If someone finds another way of computing the same formula, I think the patent holders would have a very hard time showing infringement.

    2. Re:Google with feedback by sjlutz · · Score: 1

      The only problem with getting feedback from anyone is that it would be very easy to reduce a search engine's hits on specific things. For example, say I work for Pepsi. I search for cola, and say "BAD" for every Coca-Cola result. I script it to submit hundreds of "bads" an hour. Now, I search for cola, and only get Pepsi results.

  15. This is good, but... by Prince_Ali · · Score: 5, Funny
    This is good, but I'd rather have the google cache compressed to 3 bits per page.

    "I'll be there in a minute! I'm downloading the Internet!"

    1. Re:This is good, but... by Jerf · · Score: 1

      Here ya go:

      011

      Proper decompression is left as an exercise for the reader.

      Now, let us discuss payment schemes...

    2. Re:This is good, but... by misterhaan · · Score: 1
      --

      track7.org has all kinds of interesting stuff!

  16. Live XML Version by amembleton · · Score: 2, Insightful
    data sets representing real snapshots of portions of the web

    If these are snapshots then you'll need to keep downloading them for your Page Rank system to be up to date. The web is constantly changing and therefore so is Page Rank. I can't see having a data set on your computer being all that useful as it'll soon expire.

    It would be far better to be able to link to a data set via XML and query it. That way you would have live up-to-the-minute Page Ranks. I know that Google already does a live Page Rank system, but being able to access it and query it would be useful.

    1. Re:Live XML Version by sporty · · Score: 1
      If these are snapshots then you'll need to keep downloading them for your Page Rank system to be up to date. The web is constantly changing and therefore so is Page Rank. I can't see having a data set on your computer being all that useful as it'll soon expire.


      You are absolutely right. This is what Google does on your behalf. They have the computing power, storage-wise as well as processing-wise, to do the needed updating. Not that Google can do it 24/7, but they do better than I can with my 4 computers. :\


      It would be far better to be able to link to a data set via XML and query it. That way you would have live up-to-the-minute Page Ranks. I know that Google already does a live Page Rank system, but being able to access it and query it would be useful.


      And expensive. Remember, queries more complex than "what is related to this" and doing the computation on the data require resources to move the data from Google to you PLUS the additional query. It'd be a bandwidth hog if not compressed significantly. Google's shared API is prolly more sufficient.
      --

      -
      ping -f 255.255.255.255 # if only

  17. Google patents? by PaulBu · · Score: 4, Interesting

    All the code involved is GPL'd, and the data are public: everybody can grok PageRank now!

    GPL'd? Hmm, I thought that Google did patent the PageRank algorithm (correct me if I am wrong), so re-implementing THEIR algorithm even more efficiently would be incompatible with GPL. OTOH, if it is not THEIR algorithm, it can not be called 'PageRank'.
    Oh, the evils of software patents...
    Paul B.

    1. Re:Google patents? by JoeBuck · · Score: 3, Interesting

      Google hasn't exactly patented the algorithm for all uses, and no court has determined that the code infringes the patent, and software patents aren't valid in most countries, so it's not clear whether or not there is any compatibility.

      It would seem that anyone who uses the code to build a search engine would be infringing, but even that is something that lawyers can argue about.

    2. Re:Google patents? by egomaniac · · Score: 1

      OTOH, if it is not THEIR algorithm, it can not be called 'PageRank'.

      Unless the term is trademarked (is it?), you can call whatever the hell you want "PageRank" and nobody can do a thing about it.

      --
      ZFS: because love is never having to say fsck
  18. Doesn't actually calculate PageRank? by Vultan · · Score: 5, Informative

    As best as I can tell from the website, the API is only for storing and interacting with a large graph. Nothing there is actually involved with PageRank. You could use this API presumably to write your own PageRank code, but to say "everybody can grok PageRank now!" is misleading at best.

    Moreover, IANAL, but isn't the PageRank algorithm patented by Google? Wouldn't this prevent anyone from releasing GPL code that computes PageRank?

    1. Re:Doesn't actually calculate PageRank? by Chundra · · Score: 0, Flamebait

      As best as I can tell from the website, the API is only for storing and interacting with a large graph. Nothing there is actually involved with PageRank.

      Dude, how do you think pagerank works? You might want to go read the original paper before you make such idiotic claims.

    2. Re:Doesn't actually calculate PageRank? by Anonymous Coward · · Score: 0

      Yet another moderator on crack. Pagerank is an algorithm that ranks the nodes in a large graph based on the structure of the graph. So, "storing and interacting with a large graph" is very much a part of the algorithm.

    3. Re:Doesn't actually calculate PageRank? by morzel · · Score: 1
      Moreover, IANAL, but isn't the PageRank algorithm patented by Google? Wouldn't this prevent anyone from releasing GPL code that computes PageRank?
      It would prevent anyone in the US from releasing that code. Software patents don't apply everywhere (yet)

      --
      Okay... I'll do the stupid things first, then you shy people follow.
      [Zappa]
  19. Isn't that illegal? by anthony_dipierro · · Score: 1

    PageRank is patented, isn't it?

    1. Re:Isn't that illegal? by Anonymous Coward · · Score: 2, Informative

      Yes, and it's trademarked, too. Here's a bunch more info on PageRank.

  20. Not Page Rank (?) by SirTwitchALot · · Score: 1

    I don't think this is pagerank, reading the link, this looks more like another rating system that is similar to pagerank. It's great for study, but I don't think reading through the source and finding ways to 'trick' this algorithm will necessarily work on Google. Correct me if I'm wrong someone.

    --
    Go away, or I will replace you with a very small shell script.
  21. has to be said by madHomer · · Score: 4, Funny

    It's just not the same without the pigeons...

  22. By the way... by Sanity · · Score: 2, Informative

    ...it isn't on a fat pipe, so please understand if it's slow.

    1. Re:By the way... by el-spectre · · Score: 1

      heh, we need a new mod level, -1, CheapShot

      --
      "Faith: Belief without evidence in what is told by one who speaks without knowledge, of things without parallel." - A.B.
    2. Re:By the way... by Anonymous Coward · · Score: 0

      ...it isn't on a fat pipe, so please understand if it's slow.

      You are a brave, brave, stupid man...

    3. Re:By the way... by fanpoe · · Score: 1

      I'm sure some would prefer it to be +1, CheapShot ;)

    4. Re:By the way... by el-spectre · · Score: 1

      probably everyone except the guy who gets mocked :)

      then again, a whole lot of "MS SUCKS" posts might qualify.

      --
      "Faith: Belief without evidence in what is told by one who speaks without knowledge, of things without parallel." - A.B.
  23. Proof of concept only by Saganaga · · Score: 5, Informative

    I think this project is really just a proof of concept. As another post pointed out, to make this really useful you'd need to regularly update your local data set, which isn't very practical for most people.

    Also, if the downloadable dataset only covers a small portion of the web, how can this system's utility really compare to Google's?

    That said, I think computer science proof-of-concept type projects are very useful and serve a valuable purpose in getting the ideas out there for others to improve upon.

    1. Re:Proof of concept only by Anonymous Coward · · Score: 0
      Also, if the downloadable dataset only covers a small portion of the web, how can this system's utility really compare to Google's?

      By limiting your domain. You could, for example, make a specialized engine for RSS feeds.

  24. Explains it all... by Spokehedz · · Score: 2, Informative

    It uses Slashdot as a root, of course. ;)

    Seriously, I don't know. Here's a page on how Google works though.

    http://www.google.com/technology/index.html

  25. Wow! The site is not /.'d! by tyroneking · · Score: 1

    Though after a quick read, I can see why ...

  26. You're an optimist by niom · · Score: 1

    Maybe take a new snapshot every day or week, see the web evolve.

    How much time do you think is needed to take a snapshot of the Web? Most certainly much longer than a day or even a week. My bet would be several months at the very least.

    --
    -- Repeat with me: "There is no right to profits".
    1. Re:You're an optimist by jonadab · · Score: 1

      > How much time do you think it's needed to take a snapshot of
      > the Web? Most certainly much longer than a day or even a week.
      > My bet would be several months at the very least.

      I strongly suspect some pages get re-evaluated much more often than
      others. At least, it _ought_ to be that way. For one thing, pages
      with a higher PageRank are more important (to the evaluation of
      other pages) and thus should get redone more often. Additionally,
      pages that are known to change frequently should be done more often
      than pages that are not known to have changed previously. How to
      balance those two considerations is an open question.

      --
      Cut that out, or I will ship you to Norilsk in a box.
  27. What a mess by Ignorant+Aardvark · · Score: 2, Funny

    Sure, now everybody can grok PageRank, but I, for the life of me, cannot grok grok.

    1. Re:What a mess by dpbsmith · · Score: 4, Informative

      Just in case this wasn't an implied rhetorical question... the term, as far as I know, was invented by Robert Heinlein in his novel _Stranger in a Strange Land,_ where it is an expression used by Martians. It literally means "to drink," but the Martians use it to mean an understanding that is both very deep and very complete.

    2. Re:What a mess by mbourgon · · Score: 2, Informative

      Yes. Basically, "to share water with", which on Mars meant you were more than brothers. Considering how little water was/is on Mars, it was a great honor.

      --
      "Sometimes a woman is a kind of religion, she can save your soul & set you free from all your sins" - Bad Examples
    3. Re:What a mess by Anonymous Coward · · Score: 0

      Just in case this wasn't an implied rhetorical question

      You will please observe that the phrase "cannot grok grok" uses "grok" in the canonical way. Ergo, he already grokked grok.

    4. Re:What a mess by nonetheless · · Score: 1
      Indeed.

      OED Online:

      grok, v. U.S. slang. Also grock.

      [Arbitrary formation by Heinlein (see quot. 1961).]

      a. trans. (also with obj. clause) To understand intuitively or by empathy; to establish rapport with. b. intr. To empathize or communicate sympathetically (with); also, to experience enjoyment.

      1961 R. HEINLEIN Stranger in Strange Land iii. 18 Smith had been aware of the doctors but had grokked that their intentions were benign. Ibid. xxiv. 250 Now that he knew himself to be self he was free to grok ever closer to his brothers. 1968 T. WOLFE Electric Kool-Aid Acid Test vi. 86 Instead they are all rapping and grokking over the sound it made..as if they had synched into a never-before-heard thing, a unique thing. 1968 Playboy June 80 He met her at an acid-rock ball and she grokked him, this ultracool miss loaded with experience and bereft of emotion. 1969 New Yorker 15 Mar. 35, I was thinking we ought to get together somewhere, Mr. Zzyzbyzynsky, and grok about our problems. 1975 D. LODGE Changing Places iv. 137 Nestling earth couple would like to find water brothers to grock with in peace. 1984 InfoWorld 21 May 32 There isn't any software! Only different internal states of hardware. It's all hardware! It's a shame programmers don't grok that better.

    5. Re:What a mess by Anonymous Coward · · Score: 0


      There are no martians. Nobody lives on Mars.

    6. Re:What a mess by MyHair · · Score: 1

      I never knew the source but figured it had to be Emacs. grep, grok, natch?

      I feel so much more enlightened now, and now I have less reason to learn Emacs. Viva vi!

  28. bile@netscape.com by radiumhahn · · Score: 1

    It doesn't mean a lot to me when my brother says he is going to double his efforts to find a job. This is especially true if you know my brother.

  29. Google's algorithms have changed quite a bit by HiKarma · · Score: 3, Insightful

    Since their original papers, according to all posted reports. So I don't think you're really going to get the exact google number from a basic algorithm and this data set.

    They also use terms that appear in links as a major key in ranking searches.

    (Among other things.)

    Not that it is not interesting to see these rankings, and note the most widely linked to sites on the net.

    Which, by the way, after the obvious winners like Yahoo, include Adobe and Real networks, which have gotten immense numbers of sites to link to them with "Get acrobat reader" style links.

    I've often wondered if the makeashorterlink and tinyurl folks are doing it just for the googlejuice.

    In reverse, many sites now use javascript links in order to preserve their googlejuice.

    Very much a heisenberg phenomenon here.

  30. I wonder... by crashnbur · · Score: 4, Interesting

    ...how this can be used to discover the percentage of broken links on the web at any given moment in time.

  31. Re:Cannot divide by zero by Anonymous Coward · · Score: 0

    Common trolls! Let's keep this thread rollin'!!!!

  32. Feedback is local by Sanity · · Score: 1

    Your feedback is local to your search, it doesn't affect other people's searches.

  33. You say Power-Law graph of a billion pages... by Salamanders · · Score: 1

    and I say "Dammit, where are all the pretty pictures."

  34. can anyone explain what a web graph is? by bongoras · · Score: 1

    I get this from the article: "A set of flat codes, called ζ codes, which are particularly suitable for storing web graphs (or, in general, integers with power-law distribution in a certain exponent range). The fact that these codes work well can be easily tested empirically, but we also try to provide a detailed mathematical analysis."

    Maybe it's my ADD. Maybe it's my inherent dumbassedness... but I can't grok that.

    So what is a web graph? How is that related to PageRank? If I download all this data, what the hell do I use it for?

    1. Re:can anyone explain what a web graph is? by Anonymous Coward · · Score: 0

      A webgraph is a graph, where the edges are the links and the vertices the urls.

      And a graph is a set of vertices and edges, where each (in this case oriented) edge connects two vertices.
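
      To make that definition concrete, here is a minimal Python sketch that stores a toy web graph as successor lists and counts how many bits the links take once each list is gap-encoded. It uses Elias gamma codes as a simple stand-in, not the ζ codes from the paper, and the four-node graph and the function names are made up for illustration.

```python
def elias_gamma(n: int) -> str:
    """Elias gamma code of a positive integer, returned as a bit string."""
    if n < 1:
        raise ValueError("gamma codes are defined for n >= 1")
    binary = bin(n)[2:]                        # e.g. 5 -> "101"
    return "0" * (len(binary) - 1) + binary    # unary length prefix, then the value


def encode_successors(successors: list[int]) -> str:
    """Gap-encode a sorted list of distinct node ids, then gamma-code each gap."""
    bits = []
    previous = -1                   # so the first gap is successors[0] + 1 >= 1
    for node in successors:
        gap = node - previous       # always >= 1 for sorted, distinct ids
        bits.append(elias_gamma(gap))
        previous = node
    return "".join(bits)


# Toy web graph (made-up data): node id -> sorted ids of the pages it links to.
graph = {
    0: [1, 2, 3],
    1: [0, 2],
    2: [3],
    3: [0, 1, 2, 3],
}

total_bits = sum(len(encode_successors(succ)) for succ in graph.values())
total_links = sum(len(succ) for succ in graph.values())
print(f"{total_bits} bits for {total_links} links "
      f"({total_bits / total_links:.2f} bits per link)")
```

      Small gaps get short codes, which is why sorted successor lists whose ids sit close together compress well. A real decoder would also need each list's length, which this sketch leaves out.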

    2. Re:can anyone explain what a web graph is? by Wesley+Felter · · Score: 1

      Yeah, I think they should have explained that. Presumably a Web graph is what you get when you treat each URL as a node and each link as an edge in a graph. PageRank is an algorithm used by Google that takes a Web graph as input.

    3. Re:can anyone explain what a web graph is? by lordbrain · · Score: 5, Informative

      A graph is made up of two things: edges and vertices.

      In a web graph, vertices are webpages and edges are hyperlinks.

      PageRank determines how many incoming edges a vertex has. Given the nature of the web, this is a nontrivial problem because a vertex only knows its outgoing edges.

      The assumption for PageRank is that the more incoming edges a vertex has, the more popular it is. So you would use this to figure out how popular a particular vertex is.

      Given this you could do like Google and combine it with a search engine to prioritize the results.
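
      As a rough illustration of the above, here is a small Python sketch of the power-iteration form of PageRank from the original paper, run on a made-up four-page graph. The damping factor of 0.85, the iteration count, and the page names are assumptions for the example; this is not Google's production code. Note that in the published formula a page's score depends on the rank of the pages linking to it, not only on how many there are.

```python
def pagerank(graph: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Power-iteration PageRank on a small graph given as page -> outgoing links.

    Assumes every link target also appears as a key of `graph`.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread over all pages.
        dangling = sum(rank[node] for node in nodes if not graph[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n
                    for node in nodes}
        # Each page passes an equal share of its rank along every outgoing link.
        for node in nodes:
            for target in graph[node]:
                new_rank[target] += damping * rank[node] / len(graph[node])
        rank = new_rank
    return rank


# Hypothetical four-page web: every key is a page, every value its outgoing links.
web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

      In this toy graph "a" and "b" each have a single incoming link, yet "a" should come out well ahead because its one link comes from the top-ranked page "c".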

      --

      Thank you. Thank you. Please no applause; just throw money
  35. Can google sue for reverse engineering pagerank by asscroft · · Score: 1

    even if it was improved upon? Can the idea of ranking based on link popularity be patented? Did Google patent it? If not, how much longer before some asshole lawyer in Menlo Park or Amazon/MS/AOL does and tries to shut down Google?

    --
    because I have been enjoined by this Holy Office to abandon the false opinion which maintains that the Sun is the centre
    1. Re:Can google sue for reverse engineering pagerank by ktorn · · Score: 1

      Hold on, if someone can prove that they had that idea first, how can a patent force any shutdown?

  36. why is rank/rating necessary?-Pick a peck of peers by Anonymous Coward · · Score: 0

    "The more eyeballs, the better the accuracy of information. "

    That's assuming they are peers.

    "Why do you think peer review is such a big deal in scientific arenas "

    Because they are peers.

    "With enough eyeballs, all bugs are shallow" only works well when people competent enough to understand (peers) do the work.

  37. Besides - not all browsers show page rank by kiddailey · · Score: 2, Interesting

    And let's not forget... not all of us even get exposed to page rank regularly.

    On my Mac for example, I can't see it at all. On my Wintel I can, thanks to the Google toolbar.

  38. Ask and ye shall receive... by Theaetetus · · Score: 4, Informative
    and I say "Dammit, where are all the pretty pictures."

    Here (for free)

    Here too (for free)

    This one too (for free)

    This one also (free)

    And don't forget this classic ($30 poster)

    -T

  39. WebGraph has a PR of NULL by mcguyver · · Score: 1

    If WebGraph can inflate their google PR to 10 then I'm a believer. Until then this looks like one of the many tools available to analyse your PR.

    The WebGraph tool may be interesting for college students, but webmasters who are interested in SEO techniques are going to find little use for this tool.

  40. Google tries but it's a long chalk by Anonymous Coward · · Score: 0

    My awstats tells me that google doesn't look at everything every month. I have 500,000+ pages and google tries but I think it's managed about half of them in the last three months

    Here's my stats anyway
    Robot / Hits / last hit

    Googlebot / 95049 / 11 Jun 2003 - 21:17
    Inktomi Slurp / 41456 / 12 Jun 2003 - 20:15
    Scooter / 5067 / 12 Jun 2003 - 19:43
    Tcl W3 Robot / 3579 / 12 Jun 2003 - 14:45
    ia_archiver / 801 / 12 Jun 2003 - 02:44
    WISENutbot / 565 / 12 Jun 2003 - 20:00
    Road Runner: The ImageScape Robot / 305 / 12 Jun 2003 - 00:47
    Jeeves / 121 / 08 Jun 2003 - 02:31
    IBM_Planetwide / 96 / 03 Jun 2003 - 12:47
    Unknown robot / 65 / 12 Jun 2003 - 04:44
    Walhello appie / 18 / 12 Jun 2003 - 02:02
    LinkWalker / 9 / 12 Jun 2003 - 05:36
    Fast-Webcrawler / 7 / 12 Jun 2003 - 07:02
    arks / 6 / 10 Jun 2003 - 08:53
    The Python Robot / 3 / 05 Jun 2003 - 00:39
    Lycos / 2 / 12 Jun 2003 - 17:05
    GetURL / 1 / 06 Jun 2003 - 08:10

  41. My Results by Anonymous Coward · · Score: 0

    about:blank seems to be the winner on my system.

  42. pagerank for the masses by yoyo81 · · Score: 1

    I don't know what the big deal is. I've always been able to do pagerank on my computer...

  43. Just use Google by kc0dxh · · Score: 1

    Why reinvent the wheel?

    --

    --- "1.21 Jigawatts!" -Doc

    1. Re:Just use Google by Anonymous Coward · · Score: 0

      NIH! That's why.

    2. Re:Just use Google by Anonymous Coward · · Score: 0

      Well, you could copyright your WheelCode(R) and then sue General Motors for infringement.

  44. Im sorry, I have to say it tho...... by reality-bytes · · Score: 0, Troll

    *Cough*

    Technically speaking you'd be downloading the WEB, not the Internet.



    Trouble is, with Google's web cache there's no pics; just think of all the beautiful images of pet dogs and holidays at Weston-super-Mare you'd be missing out on!

    --
    Ripping an new rectum in the fabric of spacetime.
  45. Could I someday use it for my PC - Re:This soun by leoaugust · · Score: 2, Interesting

    I wonder if I can use pagerank algorithm for the smaller universe of my harddrive itself?

    I have over 6,000 files on my PC, many of which link to each other, and I am adding more links between them as time goes by. The collection is now so big that I can't even revisit my own files and reason out the implications of the links between pages, because of the huge time it would take to spend even a minute on each saved file.

    I wonder if something like PageRank will let the important files that are linked by many others on my PC rise "up" like cream, so to speak, so that I can avoid having to use keywords and categories to wade through all the clutter on my harddrive ...

    Any other ideas of how to study the relationship between my 6,000+ files?

    I also have quite a few articles, e.g. news items saved from the web itself. I wonder if the pagerank of google for those saved articles could also help me flush out the important "external" articles on my harddrive itself.
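
    One way to start, sketched in Python under some loud assumptions: the files are HTML, links between them are plain relative href values that match file names exactly, and the contents are already loaded into a dict. The names `LinkCollector`, `build_link_graph`, and `inlink_counts` are hypothetical helpers, not part of any existing tool.

```python
from collections import Counter
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in one HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def build_link_graph(pages: dict[str, str]) -> dict[str, list[str]]:
    """Map each file name to the other known files it links to."""
    graph = {}
    for name, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        # Keep only links that point at other files in the collection.
        graph[name] = [link for link in collector.links
                       if link in pages and link != name]
    return graph


def inlink_counts(graph: dict[str, list[str]]) -> Counter:
    """Count how many distinct files link to each file."""
    counts = Counter({name: 0 for name in graph})
    for name, targets in graph.items():
        for target in set(targets):
            counts[target] += 1
    return counts


# Made-up stand-ins for files on disk: file name -> HTML contents.
pages = {
    "notes.html": '<a href="todo.html">todo</a> <a href="ideas.html">ideas</a>',
    "todo.html": '<a href="ideas.html">ideas</a>',
    "ideas.html": '<a href="http://example.com/">elsewhere</a>',
}

graph = build_link_graph(pages)
for name, count in inlink_counts(graph).most_common():
    print(f"{name}: linked from {count} file(s)")
```

    The resulting graph is in "file -> outgoing links" form, so it could be handed to any PageRank-style ranking routine instead of the plain in-link count used here.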

    --
    To see a world in a grain of sand, and then to step back and see the beach where the sand lies ...
  46. page rank isn't real-time. by jasonhamilton · · Score: 1

    Have you ever developed a website and monitored SE traffic to it? PageRank is not a real time process. If you're running your own version of it, updating it once per month is more than enough.

    --
    SearchIRC - Now with live chat directory!
  47. Patents brought to you... by Anonymous Coward · · Score: 0

    ...by the largest totalitarian system existent today.

  48. In other news... by Anonymous Coward · · Score: 0

    GPLed software uses minicab data to allow hackers to choose their favourite late-night ride back from downtown. Developers name it "TaxiRank". ;-)

  49. Re:You have one now ! by Anonymous Coward · · Score: 0

    That is funny, I voted for whom I wanted.

  50. You mean google's spyware by MushMouth · · Score: 1

    Don't think for a moment that google is not tracking and saving this.

    1. Re:You mean google's spyware by hkmwbz · · Score: 1
      Spyware? How can it be "spying" on the user when the installer practically assaults the user with warnings that it will send the URL to Google so it can return the pagerank? How exactly is it supposed to send the pagerank if it can't send the URL to Google anyway?

      Ridiculous. You "anti"-spyware freaks are more dangerous than most spyware because you stir up shit and cause hysteria by accusing everything and everyone of being spyware. You remove the focus from the actual spyware and sleazeware out there.

      You are actually helping spyware companies by causing hysteria and confusion.

      Great job, mister spyware posterboy.

      --
      Clever signature text goes here.
  51. Three-bit compression for web pages. by stripmarkup · · Score: 1

    Here's the algorithm:

    000: page is spam. Ignore it.
    001: page is porn. Porn is all the same, show porn page from disk.
    010: page is pop-up ad. Block it.
    011: page is a 404.
    100: page has javascript. Show random javascript error.
    101: page is Slashdot.
    110: page is Slashdot.
    111: page belongs to the .000001% of uncompressible pages, store it as is (full page follows).

    --
    See charts for twitter trends on Trendistic
    1. Re:Three-bit compression for web pages. by teorth · · Score: 1
      000: page is spam. Ignore it.
      001: page is porn. Porn is all the same, show porn page from disk.
      010: page is pop-up ad. Block it.
      011: page is a 404.
      100: page has javascript. Show random javascript error.
      101: page is Slashdot.
      110: page is Slashdot.
      111: page belongs to the .000001% of uncompressible pages, store it as is (full page follows).

      I think your code for 110 is incorrect, it should be: 110: page is Slashdotted.

      Terry

  52. Googling your harddisk by |>>? · · Score: 2, Interesting
    While calculating PageRank seems like a nice idea, I'm much more interested in having a google search available over my harddisk. I recall that AltaVista in the mid-90's had a programme that created an index over your whole disk - it dealt with many filetypes including .doc, .pdf, .mbox and basically gave you an AltaVista search over all your harddisk content.

    Anyone know of anything like that?

    --
    |>>? ..EBCDIC for Onno..
    1. Re:Googling your harddisk by MyHair · · Score: 1

      I'm much more interested in having a google search available over my harddisk.

      I thought I remember Google having a product like this, but I can't find it now.

      MS Win2k and WinXP have an indexing service that's supposed to do just what you want. It's not enabled by default in 2k; not sure about XP. I've been afraid to try it for various paranoia and stability reasons.

      HTdig was my next thought. It's designed for web pages, but I bet you could restrict it to your hard disk. However, the site says they don't index non-text files yet.

      For some reason I felt like searching Freshmeat and came up with SWISH++. It says it can index hard drives and non-text files "such as Microsoft Office documents", although the method they describe they use is not one I'm sure would work since Office docs can be in Unicode.

      Both HTdig and SWISH++ are GPL. There were other possibilities on Freshmeat, too.

    2. Re:Googling your harddisk by Anonymous Coward · · Score: 0

      Ht://dig will (with some config tweaking) index (at least) PDF - and DOC, not that you'd want any of those on your disk.

  53. Re:You have one now ! by Anonymous Coward · · Score: 0

    Funnier still, your vote didn't count ! !

    The election was decided by the supreme court justices which G. Bush the 1st installed !

  54. Hattrick by Daniel_Staal · · Score: 1

    Of course it's a hattrick. Two reasons just isn't impressive, and four is starting to get cumbersome (like you are droning on). Now 7-boredom are funny again too (depending on how funny the reasons themselves are), but I wasn't sure I could come up with seven on short notice. Three reasons was the optimum length for that post.

    --
    'Sensible' is a curse word.
  55. Can someone explain..... by reality-bytes · · Score: 1

    Exactly how my last post was trolling?

    --
    Ripping an new rectum in the fabric of spacetime.