Slashdot Mirror


Google's Technology Explored

RobotWisdom writes "Internetnews offers a moderately detailed peek at Google's technology. For example, they use stripped-down Red Hat on a massively redundant network, and they're starting to have success with automatic clustering of concepts, so that pages can match even if none of the words in your query actually appear on the page." Additional analysis on InformationWeek and C|Net. From the article: "As a search query comes into the system, it hits a Web server, then is split into chunks of service. One set of index servers contains the index; one set of machines contains one full index. To actually answer a query, Google has to use one complete set of servers. Since that set is replicated as a fail-safe, it also increases throughput, because if one set is busy, a new query can be routed to the next set, which drives down search time per box."
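
The routing idea in the quoted passage can be sketched in a few lines. This is only an illustration of "if one set is busy, route the query to the next set"; the class names, the set names, and the capacity figure are all hypothetical and are not taken from the article or from Google.

```python
from dataclasses import dataclass


@dataclass
class IndexSet:
    """One complete copy of the index (hypothetical model of a replicated server set)."""
    name: str
    capacity: int        # assumed: how many queries this set can serve at once
    in_flight: int = 0   # queries currently being served

    def is_busy(self) -> bool:
        return self.in_flight >= self.capacity


class Router:
    """Sends each query to the first replica set that is not busy."""

    def __init__(self, sets):
        self.sets = list(sets)

    def route(self, query: str) -> str:
        for index_set in self.sets:
            if not index_set.is_busy():
                index_set.in_flight += 1
                return index_set.name
        raise RuntimeError("all replica sets are busy")


# Two replicas of the full index, each assumed to handle two queries at a time.
router = Router([IndexSet("set-a", capacity=2), IndexSet("set-b", capacity=2)])
assignments = [router.route(f"query {i}") for i in range(4)]
print(assignments)  # ['set-a', 'set-a', 'set-b', 'set-b']
```

The same replication gives both effects the article mentions: a second set can take over if the first fails, and it absorbs overflow when the first is saturated.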

294 comments

  1. PigeonRank(TM) by Kimos · · Score: 5, Funny

    That's not how google does it! This is their REAL secret:
    http://www.google.com/technology/pigeonrank.html

    1. Re:PigeonRank(TM) by Tackhead · · Score: 5, Funny
      > That's not how google does it! This is their REAL secret: http://www.google.com/technology/pigeonrank.html

      That was pre-IPO.

      We'd like you to meet Bubba. Bubba's fully vested, and as this article says, he's, uh... he's grown somewhat.

    2. Re:PigeonRank(TM) by eric_brissette · · Score: 5, Funny

      Their technology for waste management alone must be revolutionary.

    3. Re:PigeonRank(TM) by generic-man · · Score: 1

      Wrong animal. Bubba was the 23-pound lobster found in Pittsburgh, who died while being transported to a zoo for treatment. Truly an American icon.

      --
      For more information, click here.
    4. Re:PigeonRank(TM) by Anonymous Coward · · Score: 1, Funny

      " Truly an American icon. "

      And one of the few whose funeral was graced with melted butter.

    5. Re:PigeonRank(TM) by ramblin+billy · · Score: 1


      A joke.....maybe not!

      "Google uses its massive architecture to learn from data"

      They're using the user input to improve their product, discover otherwise impossible to recognize connections, and increase value.

      If User=Pigeon then Profit

      billy - having strange urges concerning the statue in the courtyard

    6. Re:PigeonRank(TM) by Anonymous Coward · · Score: 0

      It's crappy.

    7. Re:PigeonRank(TM) by googisgod · · Score: 1
      no, actually, google's dirty little secret is:


      http://www.fuckedgoogle.com/

    8. Re:PigeonRank(TM) by IamTheRealMike · · Score: 1

      Where do you think their electricity comes from? That's a lot of bio-fuel right there.

    9. Re:PigeonRank(TM) by 0siris · · Score: 1

      It explains all in the link - the Google white background is...err... maybe you should just read it ;-)

    10. Re:PigeonRank(TM) by Matthaeus · · Score: 1

      Their technology for waste management alone must be revolutionary.



      When companies employ revolutionary waste-management techniques, the shit hits the fan.

  2. /. effect by Anonymous Coward · · Score: 4, Funny

    If we could /. google, that would impress me

    1. Re:/. effect by SmokeHalo · · Score: 5, Interesting

      It's been tried. From TFA:

      One literal meltdown -- a fire at a datacenter in an undisclosed location -- brought out six fire trucks but didn't crash the system.

      --
      I'm not good in groups. It's difficult to work in a group when you're omnipotent. - Q
    2. Re:/. effect by Kimos · · Score: 1
      Everyone together now!
      perl -e 'while (true) { system("wget www.google.com") }'
    3. Re:/. effect by Manan+Shah · · Score: 0

      More likely, your ISP would be pissed off at the bandwidth before Google notices.

    4. Re:/. effect by Anonymous Coward · · Score: 2, Insightful

      Perl is a great language, and I love it, but that does not mean that you have to use it for everything.

      while true; do wget www.google.com; done

      seems better to me.

    5. Re:/. effect by Anonymous Coward · · Score: 5, Funny

      Computer programming languages are great, and I love them, but that does not mean that you have to use them for everything

      open browser at www.google.com
      get a drinking duck thing that bobs up and down hitting F5 every second

      seems better to me.

    6. Re:/. effect by Kimos · · Score: 1

      Agreed, but to make it effective you'd need a drinking bird that could do hundreds of refreshes a second. *pictures that bird in action*

    7. Re:/. effect by Anonymous Coward · · Score: 0

      Drinking ducks are great, and I love them, but that does not mean that you have to use them for everything...

    8. Re:/. effect by NoOneInParticular · · Score: 1

      Better still, use the avian carrier protocol to transmit packets to google. If you select carriers attractive enough, I'm sure it will distort google's search technology.

    9. Re:/. effect by thedustbustr · · Score: 1
      mod parent (+1, Awesome)

      That... was totally awesome!

      --
      This sig is false.
    10. Re:/. effect by generic-man · · Score: 1

      You want to DDOS Google with birds?

      --
      For more information, click here.
    11. Re:/. effect by NoOneInParticular · · Score: 1

      yeah, that might work as well.

    12. Re:/. effect by mini+me · · Score: 1
      Graphical applications are where it's at these days...
      konqueror & while true; do dcop konqueror-$! konqueror-mainwindow#1 openURL www.google.com; done
    13. Re:/. effect by Anonymous Coward · · Score: 0

      I think you just pictured Taco's bird in action. Excuse me while I wash my brain out with soap.

    14. Re:/. effect by Anonymous Coward · · Score: 2, Informative

      The undisclosed location was Santa Clara. I won't get more specific than that, sorry. They had a room jam packed with gear that was improperly cabled and spaced, and they didn't want to pay for redundant cooling. Then again, it wasn't a production site. Someone was almost overcome by the heat just walking between rows of cabinets.

    15. Re:/. effect by Dolly_Llama · · Score: 2, Funny

      a datacenter in an undisclosed location

      Is Dick Cheney in the IT business now?

      --

      Somewhere, something incredible is waiting to be known. -- Carl Sagan

    16. Re:/. effect by leonmergen · · Score: 1

      Then again, it wasn't a production site.

      Hmmm, yeah, I can imagine 'the system' not crashing if the fire didn't even hit production servers...

      --
      - Leon Mergen
      http://www.solatis.com
    17. Re:/. effect by fulldecent · · Score: 1
      How much would it suck if...
      /
      | [JavaScript Application]
      |
      | Due to abuse of the system, your IP
      | has been blocked from google.com
      \
      In this situation I'd probably kill myself, or change my IP... which is basically the same thing.
      --

      -- I was raised on the command line, bitch

    18. Re:/. effect by Anonymous Coward · · Score: 0

      It's been tried.

      So, where does it say it has been tried?

      Oh, datacenter fire == slashdotting.

      Now I get it. :-p

    19. Re:/. effect by dagnathan · · Score: 1

      Perhaps we could /. this person instead: Google Sucks!

    20. Re:/. effect by WhiteDragon · · Score: 1
      while true; do wget www.google.com; done
      this will only send one request at a time. You will get more requests (at the expense of more cpu usage) by doing:
      while true; do wget -q www.google.com & done
      Also note that this will just hit the web servers. If you also want to hit the index servers, do this:
      while true; do wget -q http://www.google.com?q=$RANDOM & done
      but note that many shells will limit you to about 40 or so concurrent jobs.
      --
      Did you mount a military-grade, variable-focus MASER on an unlicensed artificial intelligence?
  3. Truly Amazing. by iibbmm · · Score: 5, Interesting

    It really is amazing to think of the amount of information and data that we can access so quickly these days. When I stop and think about what my little search query goes through to bring me an almost instant response, it almost seems impossible. Of course the search engine side of this is only one example, but it's a nifty insight into how powerful our infrastructure is these days. Bravo, mankind.

    1. Re:Truly Amazing. by HD+Webdev · · Score: 1

      Brings up an interesting point... Last night we were offered "free" Guinness pint glasses with each Guinness purchased. We were told they were 20oz glasses. Pints aren't 20oz we said.

      I was in a restaurant and my girlfriend ordered a small orange juice while I ordered a large one.

      She drank hers rather quickly and for some reason, I thought the glasses looked odd even though mine was larger. So, I poured my large orange juice in her empty glass, and (go figure) it fit.

      Even after demonstrating this to the manager, all he would say was 'it must be a defective glass'.

      --
      This is not a dream, not a dream...we are transmitting from the year 1-9-9-9.
    2. Re:Truly Amazing. by fataugie · · Score: 1

      Well, you ordered a "large glass of orange juice" did you not? And you got a large glass with orange juice?

      Just because the volume was the same as the small glass, that doesn't mean you didn't get your large glass.

      It means you're too smart to eat there.

      BTW, what's the name of that Jip Joint (so I don't ever go there)?

      --

      WTF? Over?

    3. Re:Truly Amazing. by Anonymous Coward · · Score: 0

      It really is amazing to think of the amount of food we can produce so quickly these days. When I stop and think that farmers are going bankrupt because food is so inexpensive, it almost seems impossible. Of course we *must* keep this fiction known as an "economy" going at all costs, even if it means throwing away food while people starve. Bravo, mankind.

    4. Re:Truly Amazing. by HD+Webdev · · Score: 1

      BTW, what's the name of that Jip Joint (so I don't ever go there)?

      I don't recall the name, (years ago), but it was on the East side of the main part of State Street in Santa Barbara. I do distinctly recall it had a wooden balcony covering part of the back of the place. The wooden staircase was on left side as you go in. I remember that because I had to holler for service from the balcony.

      --
      This is not a dream, not a dream...we are transmitting from the year 1-9-9-9.
    5. Re:Truly Amazing. by Darby · · Score: 1

      I don't recall the name, (years ago), but it was on the East side of the main part of State Street in Santa Barbara. I do distinctly recall it had a wooden balcony covering part of the back of the place. The wooden staircase was on left side as you go in. I remember that because I had to holler for service from the balcony.

      Calypso?

    6. Re:Truly Amazing. by VultureMN · · Score: 1

      In case ya didn't know, an Imperial Pint is 20 oz, and is the preferred size of a Guinness serving. For that matter, it's the preferred serving size of ANY beer, IMO.

    7. Re:Truly Amazing. by ikkonoishi · · Score: 1

      Yeah google's servers are so powerful they can answer a question that took the second most powerful computer in the universe millions of years to solve in just 0.09 seconds.

      Also they can produce in just 0.17 seconds.

      Google rocks.

  4. interesting by slapout · · Score: 3, Funny

    and they're starting to have success with automatic clustering of concepts, so that pages can match even if none of the words in your query actually appear on the page

    So that's why I can search on the result page for my original query and find nothing. And all this time I was blaming Internet Explorer!

    --
    Coder's Stone: The programming language quick ref for iPad
    1. Re:interesting by vidarlo · · Score: 1
      So that's why I can search on the result page for my original query and find nothing. And all this time I was blaming Internet Explorer!

      Heh, you're not the only one. Though, using Google's cache is nice for finding the search phrase, since it highlights it.

    2. Re:interesting by InfiniteWisdom · · Score: 2, Interesting

      What's interesting is that the notice "Google is not affiliated with the authors of this page nor responsible for its content." goes away when you look at the cache of Google.com! That's a change from the last time I looked at Google's cache of Google a couple of years or so ago.

  5. Whats really impressive by mattmentecky · · Score: 5, Funny

    The technology that is truly astounding is Google's ability to cache itself. Yeah, think about THAT one for a while.

    1. Re:Whats really impressive by Anonymous Coward · · Score: 0

      Do they cache their cached pages...?

    2. Re:Whats really impressive by Anonymous Coward · · Score: 1, Interesting

      Uh? Google cache is runned by bbernal.com not Google. This is a little better: http://64.233.161.104/search?q=cache:64.233.161.104 but still not surprising if you think about it for a while.

    3. Re:Whats really impressive by MillionthMonkey · · Score: 4, Funny

      I don't see what's astounding about this.

      Reminds me of a radio interview I once heard with the Google founders. The host was curious about what the "I'm feeling lucky!" button was about. She claimed she typed in "Google" into the search box and clicked "I'm feeling lucky!", and nothing happened, so it didn't work!

    4. Re:Whats really impressive by danheskett · · Score: 2, Informative

      That was Terry Gross on NPR's Fresh Air... That was one of my favorite interviews ever, I think. Terry is one of the least technical people, probably ever. Yet her interview was still interesting thanks to little tidbits like that!

    5. Re:Whats really impressive by Rufus88 · · Score: 1

      What if Google decided to cache only those sites that don't cache themselves. Would google cache itself then?

    6. Re:Whats really impressive by jesushaces · · Score: 1

      Has anyone noticed the "in memoriam Jef Raskin" at the bottom of the cache page?

      Jef's Wikipedia entry

    7. Re:Whats really impressive by Anonymous Coward · · Score: 0

      Is that the beginning of Google's self consciousness?

    8. Re:Whats really impressive by lgw · · Score: 1

      Oh, you think you're very clever, but next you'll go on to prove that black is white and get killed in a zebra crossing!

      The answer to your question, BTW, is that while the phrase "site that caches all sites that don't cache themselves" parses as valid English, it does not describe a possible thing. It's not a very interesting paradox, merely a logical contradiction.

      --
      Socialism: a lie told by totalitarians and believed by fools.
    9. Re:Whats really impressive by Anonymous Coward · · Score: 0

      Google cache is runned by

      Runned? What the hell is that? Are you in grade 2 or what?

    10. Re:Whats really impressive by Rufus88 · · Score: 1

      You may not find it interesting, but it's closely related to Russell's Paradox, which was of serious concern to set-theoreticians. This, in turn, is closely related to Godel's Incompleteness Theorem and also to the Halting Problem, which place fundamental limits on mathematics and computability.

    11. Re:Whats really impressive by lgw · · Score: 3, Interesting

      I've done more studying in that area than most. There has been a lot of over-reacting to paradoxes such as this. Godel's Incompleteness theorem is only narrowly interesting: as soon as you start talking about physical things, these paradoxes are much less important.

      A set which contains all sets which do not contain themselves may be a conundrum, but a catalog that lists all catalogs that do not list themselves is merely impossible (trivially impossible, in fact). There are plenty of things that can be described in English that aren't possible things, and most of them aren't very interesting.

      The important consequence of Godel's Theorem to physical things was that mathematics is not a completely accurate model of physical objects. One physical object plus one physical object equals two physical objects, but not every equation describes the physically possible (OK, it was already known that this was the case, but Godel showed it was the case more often than expected).

      --
      Socialism: a lie told by totalitarians and believed by fools.
    12. Re:Whats really impressive by cagle_.25 · · Score: 1

      You stole my sig!

      --
      Human being (n.): A genetically human, genetically distinct, functioning organism.
    13. Re:Whats really impressive by bhsx · · Score: 1

      What's really, REALLY impressive is typing google into the Google-cached site of itself and hitting "I'm Feeling Lucky."
      That's redundancy even GNU would be proud of.

      --
      put the what in the where?
    14. Re:Whats really impressive by UranusReallyHertz · · Score: 1

      The fact that not every possible mathematical equation describes reality is not very surprising, but the REAL question is the inverse, namely whether everything possible is described by an equation. This gets to the heart of the limitations of math at describing the universe.

      --
      Smoking is an expensive, slow, and unreliable method of suicide.
    15. Re:Whats really impressive by D+H+NG · · Score: 1

      What will happen when somebody Google-bombs a certain keyword so that the Google I'm feeling lucky link for that keyword is the first result? What will happen when you click on "I'm feeling lucky" for that keyword? Will Google self-destruct?

    16. Re:Whats really impressive by lgw · · Score: 1

      Well, that sort of depends on what you mean by "everything possible described by an equation". Equations do a poor job of describing love, for example.

      However, within the subset of "everything" that comprises physical objects, it does seem that everything possible is described by an equation. However, the problem is that when you manipulate that equation to make an inference about the universe, you might end up with a new equation that no longer describes the physically possible.

      Of course, if you had a way to determine whether an equation described a possible physical thing by some property of equations, you'd be golden. However, further work on Godel's theorem shows this is not possible in the general case: more proof of the perversity of nature.

      --
      Socialism: a lie told by totalitarians and believed by fools.
    17. Re:Whats really impressive by 1110110001 · · Score: 1

      No, and it's easy to find out. Take the URI http://64.233.161.104/search?q=cache:zhool8dxBV4J:www.google.com/+google&hl=en&start=1 and change the path to robots.txt -> http://64.233.161.104/robots.txt

      You see /search is not allowed for spiders. That's why google won't index its index.
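
      The check described above can be reproduced offline with Python's standard urllib.robotparser. This is a minimal sketch: the two-line robots.txt excerpt below is an assumption based on the statement that /search is disallowed, not a copy of the real file, which is longer and may have changed.

```python
from urllib.robotparser import RobotFileParser

# Assumed excerpt of the robots.txt file; only the rule discussed above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A cache lookup lives under /search, so a well-behaved spider must skip it.
cache_allowed = parser.can_fetch(
    "*", "http://64.233.161.104/search?q=cache:www.google.com"
)
# The front page is not covered by the rule.
home_allowed = parser.can_fetch("*", "http://64.233.161.104/")

print(cache_allowed, home_allowed)  # False True
```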

      b4n

    18. Re:Whats really impressive by UranusReallyHertz · · Score: 1

      I'm kinda an extremist materialist in that I firmly believe that even things like emotions and mental states can be fully described in terms of the physical state of the brain. We just can't examine the operating brain in enough detail yet.

      --
      Smoking is an expensive, slow, and unreliable method of suicide.
  6. Picked up a Microsoftie by solomonrex · · Score: 2, Informative

    This article explained to me why they would pick up a Microsoft guy who worked on NT. Yes, I'm sure Google's OS and NT have nothing in common, but all the same, this guy seems motivated and smart. And if they have their own custom OS, I'm sure they're not going to make their own distribution, they just need to work in house.

    http://news.yahoo.com/news?tmpl=story&u=/zd/20050303/tc_zd/146950

    blog:

    http://mark-lucovsky.blogspot.com/2005/02/shipping-software.html

    1. Re:Picked up a Microsoftie by FranksChickenHouse · · Score: 0

      Well the yahoo link was interesting, here's the fix for it http://news.yahoo.com/news?tmpl=story&u=/zd/20050303/tc_zd/146950.
      However the blog seems empty and the link you provided (as well as the one in the Yahoo story) seems broken. Wonder what's up with that?

  7. Meltdown? by Ironsides · · Score: 3, Interesting

    Google's redundancy theory works on a meta level, as well, according to Hoelzle. One literal meltdown -- a fire at a datacenter in an undisclosed location -- brought out six fire trucks but didn't crash the system.

    Gee.. I wish our /.ing could do this. On the other hand, they have a level of redundancy and up time many businesses would kill for.

    --
    Fly me to the moon Let me sing among those stars Let me see what spring is like On jupiter and mars
    1. Re:Meltdown? by Ignignot · · Score: 3, Funny

      One literal meltdown -- a fire at a datacenter in an undisclosed location -- brought out six fire trucks but didn't crash the system.

      Gee.. I wish our /.ing could do this.

      It is my belief that data center fires are caused by slashdot every day!

      --
      I submitted this story last night, and it didn't get posted.
    2. Re:Meltdown? by Anonymous Coward · · Score: 1, Insightful

      Any company could have that kind of uptime - with the right amount of money....

    3. Re:Meltdown? by Anonymous Coward · · Score: 0
      I wish our /.ing could do this. On the other hand, they have a level of redundancy and up time many businesses would kill for.
      Well Slashdot has a level of redundancy that most other news outlets would die to avoid.
  8. Also Amazing: How much we miss by Ieshan · · Score: 5, Interesting

    It's also amazing how much of the general knowledge of the world we *can't* access, because it's unconnected or unpublished.

    Just think about how vast and extensive Google's search is, and then think about how little of the World's knowledge and creative achievement it actually can access.

    The quantity and breadth of human knowledge is breathtaking, no?

    1. Re:Also Amazing: How much we miss by iibbmm · · Score: 5, Insightful

      That's why projects like wikipedia are so important, and so impressive.

      Only a few years ago it could take forever to find any kind of decent information on some topics online or even in libraries. Today, I go to wiki and I'm almost assured to have a FAIRLY reliable source for information, as it's cross checked by peers who have some kind of a personal interest in the subject.

      However, there's a downside.

      Back when I was in school, researching a subject typically meant going through encyclopedia after encyclopedia, which wasn't a bad thing. I learned quite a bit by being FORCED to over-research topics. Today, I can generally straight-shoot to whatever I need to find, giving my brain a good set of blinders to everything else along the way.

    2. Re:Also Amazing: How much we miss by garcia · · Score: 2, Funny

      Oh come now! You can always do a site:slashdot.org and search Google. All the knowledge about ANYTHING is right there at your fingertips. Sometimes in duplicate and triplicate!

      What more could you need?

    3. Re:Also Amazing: How much we miss by Anonymous Coward · · Score: 1, Funny

      Substitute "pron" for "knowledge" and the statement still stands.

    4. Re:Also Amazing: How much we miss by natedubbya · · Score: 2, Insightful
      The quantity and breadth of human knowledge is breathtaking, no?

      Well, I think you haven't studied enough if you think this. When you start to realize we actually know very little, then you're getting somewhere.

    5. Re:Also Amazing: How much we miss by bmorton · · Score: 1
      Back when I was in school, researching a subject typically meant going through encyclopedia after encyclopedia, which wasn't a bad thing. I learned quite a bit by being FORCED to over-research topics. Today, I can generally straight-shoot to whatever I need to find, giving my brain a good set of blinders to everything else along the way.

      The same thing is still possible thru the Wikipedia. Don't forget that the whole thing is hyperlinked! I often find myself looking up stuff on Wikipedia only to get lost in it following the thread of links to some obscure topic.

      -B
    6. Re:Also Amazing: How much we miss by aspx · · Score: 1

      I find the bulk and stinkiness of human bullsh** even more remarkable.

    7. Re:Also Amazing: How much we miss by Skim123 · · Score: 2, Interesting
      Also with computers there's the whole cut and paste thing... at least with a printed encyclopedia you had to read the content when writing your report.

      Technology has the ability to improve everyone's collective IQ, but also has the ability to dumb down the populace. Kind of like TV. I remember tutoring an elementary student when I was a high school student back in '95 or so, and he couldn't do simple math (addition, subtraction, etc.) without his calculator. Sad...

      --

      I could not justify my existence if I were a turkey farmer. Would I terminate myself? Undoubtably, yes.

    8. Re:Also Amazing: How much we miss by jon787 · · Score: 2, Interesting

      Not only that, but all the information we index and then can't retrieve!

      "We have an embarrassment of riches in that we're able to store more than we can access. Capacities continue to double each year, while access times are improving at 10 percent per year. So, we have a vastly larger storage pool, with a relatively narrow pipeline into it." -- Jim Gray, Microsoft Research.

      --
      X(7): A program for managing terminal windows. See also screen(1).
    9. Re:Also Amazing: How much we miss by johansalk · · Score: 1

      I'm wondering what the grey matter mass of an average individual of the future will look like once we have everything accessible and computable on demand and using one's own biological memory or mental faculties would be a less frequent event.

    10. Re:Also Amazing: How much we miss by Anonymous Coward · · Score: 0

      > FAIRLY reliable source for information

      That's the problem. It isn't reliable. For example, one local journalist got burned badly by using that piece of crap to do research during the election. An idiot from WY changed the number of electoral votes on Wiki to be 57! Some of the moderators agreed with that piece of misinformation. What's the point in trying to use something where people are rewarded for posting misinformation?

    11. Re:Also Amazing: How much we miss by MillionthMonkey · · Score: 0, Flamebait

      I do this all the time. Before I buy anything electronic, for example, I type its model number or maker's name into Google and search site:slashdot.org to find out why it sucks.

    12. Re:Also Amazing: How much we miss by ralphus · · Score: 1

      Yes, the quantity and breadth of human knowledge is breathtaking. I often get bummed out when I think about how much knowledge we'll never have access to since the library at Alexandria was burned.

      --
      Revolutions are never about freedom or justice. They're about who's going to be top dog. -- Kilgore Trout
    13. Re:Also Amazing: How much we miss by Anonymous Coward · · Score: 0

      The Wikipedia is a joke; I wouldn't assume that any given bit of information in it is reliable. The technical stuff (e.g. what a MAC address is)is probably more reliable, I'll grant that. But there is a whole range of subjects that I'll never trust the wikipedia for.

      The idea of the encyclopedia is that it is an authoritative source of information that is pretty darn reliable. If you already have a fair amount of knowledge I can see how the wikipedia would help: it might give you a few links to relevant sources. But you would never be able to cite the wikipedia as a reliable source in a school paper...well, maybe in high school, or if your prof didn't really know what it was. How much of it is really knowledge, as opposed to opinion?

      That's why I would consider Google's initiative to commence the indexing of academic libraries as far more significant than so-called collaborative knowledge. Anyone who has done a liberal arts degree knows how important access to scholarly work is when doing research, and timely access at that.

      Google seems to be the only search engine that has the determination to make this leap: from indexing the *mostly* intellectually crappy free-for-all discussion of the internet to something more significant, with the technical might to make it usable.

    14. Re:Also Amazing: How much we miss by Jugalator · · Score: 2, Insightful

      > FAIRLY reliable source for information

      That's the problem. It isn't reliable. For example, one local journalist got burned badly by using that piece of crap to do research during the election.


      Correction: It's "often" reliable.

      You want a better source?

      Sorry, you won't find one. Not a single one at least.

      What you're speaking of is not a problem with Wikipedia; it's a problem with a journalist who doesn't know how to properly research a subject. If a journalist relies on any single source to be perfectly correct, well, what can I say... We've been over this exact thing multiple times before on Slashdot, and the most recent article posted here that touched the subject was about a 12 year old finding actual undeniable flaws in Encyclopedia Britannica. The only difference here is that, as opposed to Wikipedia, the flaws there can survive on a damn bookshelf for decades. Or at a minimum a year or so. You take risks in both cases; with Wikipedia it's due to the fluctuating medium, in other cases it may instead be outdated information. If there's anything a researcher has had hammered into his head during education, it's that theories and knowledge are rarely "final" or "ultimate". And here lie the disadvantages, which are generally greater in sources other than Wikipedia than in Wikipedia itself due to how they're revised.

      --
      Beware: In C++, your friends can see your privates!
    15. Re:Also Amazing: How much we miss by Jugalator · · Score: 1

      What more could you need?

      Pr0n!

      --
      Beware: In C++, your friends can see your privates!
    16. Re:Also Amazing: How much we miss by Kazoo+the+Clown · · Score: 2, Interesting

      I think it might be pretty amazing to find out what we can't easily access, even that which is published on the net. A simple example: you can't differentiate "net" from ".net" on google, and net is an extremely common word so it is next to useless as a qualifier if you're searching for info on the ".net" equivalent to anything common. Or try searching for the smiley face: ":-)". While those may be trivial and uninteresting specific examples, they illustrate at least one area where "you can't find it through Google". There are entire categories of things you can't find on Google, sometimes not because it's not indexed at all, but because you find too much and the needed qualifier isn't alphabetic.

      Some areas have gotten better, a search for "furniture polish" does return different results than "polish furniture" (even when both are unquoted in the search), and I seem to remember having gotten stuck on one like that before. Quotes don't always do the trick because sometimes you don't expect the words to be near each other on the desired pages.

      Certainly we've come a long way, but it still can, and should, get even better.

    17. Re:Also Amazing: How much we miss by Anonymous Coward · · Score: 0

      > finding actual undeniable flaws in Encyclopedia Britannica

      The difference is that Encyclopedia Britannica isn't biased. The people that spend the most time w/ Wiki are. That's why they do it. That's what drives them. They hate the truth. They want to push their skewed view of the world. That's why, for example, a Wiki article claimed that cutting carbohydrates to less than 30 grams per day would kill you. That is incorrect and has been proven wrong many times before. A Wiki poster with an AMA bias intentionally posted that incorrect information. How do you stop that type of thing when you have a large group of people that are driven by that type of hate? You can't.

    18. Re:Also Amazing: How much we miss by ikkonoishi · · Score: 1

      Grey matter is responsible for processing data; white matter transmits information. Thus having more information on hand would actually bulk the brain up on grey matter. Also the brain would devote more of its total mass to processing rather than memory retention. So we would end up with weaker memory skills, but the ability to handle far greater amounts of raw data.

    19. Re:Also Amazing: How much we miss by Anonymous Coward · · Score: 0

      Word! The only qualifications you need to write for the wikipedia are: a computer, internet connection and a bit of time on your hands. Kinda like....slashdot....

      Criticizing the Encyclopedia Britannica is like criticizing academia. It isn't perfect by any means, but it beats getting your information from the guy in the bar with the loudest voice.

    20. Re:Also Amazing: How much we miss by Nefarious+Wheel · · Score: 1

      We're not asked to calculate logarithms much anymore, either -- and yet we're doing far more calculations today than we ever did in Napier's time. A power drill does not build a house; you still need a carpenter.

      --
      Do not mock my vision of impractical footwear
    21. Re:Also Amazing: How much we miss by Skim123 · · Score: 1

      I'm not a luddite, saying that technology dumbs the masses. As my own post said, it can be used as a great educator, but reliance on it is a path that leads to ignorance.

      --

      I could not justify my existence if I were a turkey farmer. Would I terminate myself? Undoubtably, yes.

    22. Re:Also Amazing: How much we miss by Anonymous Coward · · Score: 0

      Just FYI, academic search engines certainly do already exist. For example, CiteSeer.

    23. Re:Also Amazing: How much we miss by ninjamonkey · · Score: 1


      Back when I was in school, researching a subject typically meant going through encyclopedia after encyclopedia, which wasn't a bad thing. I learned quite a bit by being FORCED to over-research topics. Today, I can generally straight-shoot to whatever I need to find, giving my brain a good set of blinders to everything else along the way.

      Wikipedia has a few nice features for this: the Today's Featured Article, Did You Know..., and my favorite, the Random Page.

      When I have a few minutes with nothing to do, I like to click a few random pages and see what I come up with.

    24. Re:Also Amazing: How much we miss by Thing+1 · · Score: 1
      Okay!
      (.)(.)
      She seems to be sagging a bit...
      --
      I feel fantastic, and I'm still alive.
    25. Re:Also Amazing: How much we miss by danila · · Score: 1

      No, it isn't. It's just that with advanced technology many people have the luxury of being ignorant and still living well. The problem is not in technology, the problem is bad education systems, lack of constitutional protection for the right to learn and not recognizing that the society is responsible for personal development of its members.

      --
      Future Wiki -- If you don't think about the future, you cannot have one.
    26. Re:Also Amazing: How much we miss by danila · · Score: 1

      But you would never be able to cite the wikipedia as a reliable source in a school paper...well, maybe in high school, or if your prof didn't really know what it was.

      Wikipedia as an academic source - a small list of academic works citing Wikipedia as a source.

      You are simply incapable of seeing the future. It's bad for you, but it's not abnormal. At any given moment most people are dead wrong about the future. If you asked people in 1990 whether Internet will be important, 99% would say it won't. If you asked people in 1900 whether planes are likely to carry hundreds of millions of people each year, 100% would say they aren't. Nevertheless, after some time passed, the percentage of people answering "yes" has increased to something around 100%. The same happens all the time. Take any technology that will be BIG in 2020 and ask people about it. Most will say that it's nonsense and will never be done. And they will all be wrong.

      --
      Future Wiki -- If you don't think about the future, you cannot have one.
  9. More useless search results? by SerialEx13 · · Score: 4, Insightful

    so that pages can match even if none of the words in your query actually appear on the page.

    Even pages that come up in my search results now that contain my query don't even have anything to do with what I am looking for. Isn't this just adding to the problem?

    How about a Did you mean? option that doesn't compare against spelling, but related topics instead?

    1. Re:More useless search results? by InfiniteWisdom · · Score: 4, Informative

      It says they're using clustering, so it might help eliminate pages that contain the words you're looking for but aren't relevant to your current query, in addition to including pages that are relevant but don't contain the words. For example, the word "tree" may either refer to a data structure (binary, B-, red-black, etc.) or to the stuff forests are made of. If my query is "search tree", the words search and tree may show up on a page about people searching for some kind of a tree and on pages about search trees. Assuming they're both popular classes of pages, you're going to end up with some mishmash of results from both classes.

      Instead, the clustering algorithm might notice (based on other words that appear on the pages, for example) that pages with 'search' and 'tree' in them fall into two classes. That doesn't help if "search tree" is all it has to go by. But now if I add the words "data structure" to the query, it knows which class of pages I'm interested in, because many pages about binary trees contain the words "data structure" whereas almost none about the quest for trees do. Now it can return pages from the right cluster that it knows are relevant, even if they don't contain the words "data structure" in them.
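
      A toy sketch of the idea above, under heavy assumptions: the pages are made up, the two clusters are given by hand instead of being learned, and the scoring rule (the fraction of a cluster's pages containing each query word) is only one simple choice. None of this is claimed to be how Google does it.

```python
from collections import Counter

# Toy pages already grouped into two clusters. The grouping is assumed here;
# a real system would have to learn it from the text.
CLUSTERS = {
    "computer science": [
        "binary search tree data structure insert delete",
        "red-black tree balancing data structure",
        "balanced tree index lookup algorithm",
    ],
    "forestry": [
        "search for the oldest tree in the forest",
        "tree planting guide for forest owners",
    ],
}


def cluster_vocab(pages):
    """Count, for each word, how many pages of the cluster contain it."""
    counts = Counter()
    for page in pages:
        counts.update(set(page.split()))
    return counts


def pick_cluster(query):
    """Score each cluster by the share of its pages containing each query word."""
    words = query.lower().split()
    scores = {}
    for name, pages in CLUSTERS.items():
        vocab = cluster_vocab(pages)
        scores[name] = sum(vocab[word] / len(pages) for word in words)
    return max(scores, key=scores.get), scores


best, scores = pick_cluster("search tree data structure")
results = CLUSTERS[best]  # every page in the chosen cluster, matching words or not
```

      With "search tree" alone the two scores are close; adding "data structure" tips the choice toward the computer science cluster, which then also returns the "balanced tree index lookup algorithm" page even though it never mentions "data structure".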

    2. Re:More useless search results? by Anonymous Coward · · Score: 0

      Ah, but what if I need to develop some sort of searchable database for the forestry service?

      Sorry, just being a smart-ass. You did give a fairly good answer....

    3. Re:More useless search results? by GiantMonkey · · Score: 1

      sounds a lot like vivisimo: search tree

  10. no AND needed by tehshen · · Score: 4, Interesting

    From the summary:

    they're starting to have success with automatic clustering of concepts, so that pages can match even if none of the words in your query actually appear on the page.

    From the help guide:

    By default, Google only returns pages that include all of your search terms.

    Which of these is correct? If it's the summary, is there any way to turn this behaviour off? I find it immensely annoying.

    --
    Guy asked me for a quarter for a cup of coffee. So I bit him.
    1. Re:no AND needed by Ironsides · · Score: 4, Informative

      they're starting to have success with automatic clustering of concepts, so that pages can match even if none of the words in your query actually appear on the page.

      I think what they mean is that they are working on search algorithms that will implement this. Not that they have already made it publicly available. They want it to work first, and be released second. The problem that you have cropping up most likely occurs with pages that put info in the metadata, and hence don't show up in the page itself.

      --
      Fly me to the moon Let me sing among those stars Let me see what spring is like On jupiter and mars
    2. Re:no AND needed by M00TP01NT · · Score: 3, Insightful

      I don't know if this is what TFA was getting at, but in a google cache page you may from time to time see the phrase "These terms only appear in links pointing to this page: ...".

      For example, try searching for "miserable failure" on Google. The first result is George Bush's biography on www.whitehouse.gov.

      However, the term "miserable failure" doesn't actually show up (yet) in the biography. But, pages that POINT to the biography do include those terms.

      As a result, pages can match your search query even if none of the words in your query actually appear on the page.
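
      A minimal sketch of that link-text effect, assuming the index simply credits each word of a link's anchor text to the page the link points at. The URLs, the link list and the function names are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical link graph: (source page, anchor text, target page).
LINKS = [
    ("blog-a.example", "miserable failure", "whitehouse.example/bio"),
    ("blog-b.example", "miserable failure", "whitehouse.example/bio"),
    ("news.example", "official biography", "whitehouse.example/bio"),
]

# Hypothetical page contents; note the target never says "miserable failure".
PAGE_TEXT = {
    "whitehouse.example/bio": "biography of the president",
}


def build_index(page_text, links):
    """Map each word to the pages that contain it or are linked to with it."""
    index = defaultdict(set)
    for url, text in page_text.items():
        for word in text.lower().split():
            index[word].add(url)
    for _source, anchor, target in links:
        for word in anchor.lower().split():
            index[word].add(target)
    return index


def search(index, query):
    """Return the pages credited with every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


with_links = search(build_index(PAGE_TEXT, LINKS), "miserable failure")
without_links = search(build_index(PAGE_TEXT, []), "miserable failure")
```

      With the link text indexed, the biography matches the query; with only the page's own text indexed, nothing matches.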

    3. Re:no AND needed by amiable1 · · Score: 1

      What about prepending "+" to the word? Maybe this is not so redundant.

    4. Re:no AND needed by Anonymous Coward · · Score: 0
      is there any way to turn this behaviour off? I find it immensely annoying.
      Of course. Try "allintext:" operator.

    5. Re:no AND needed by tehshen · · Score: 1

      Prepending "+" just ensures the word is seen, as Google rejects 'I' and 'from' etc. It doesn't affect things like this.

      "allintext:" before everything works. Thanks to a helpful AC.

      --
      Guy asked me for a quarter for a cup of coffee. So I bit him.
    6. Re:no AND needed by jc42 · · Score: 1

      This does remind me of a funny story from the early days of search sites. There was an entomologist (studies insects) who put a paper he had written on his web site, and a few days later, it was getting a million hits per day, completely clogging his server. Finally, a colleague explained it by suggesting he go to a search site and type in "explicit sex image". His paper was the first URL returned. And, sure enough, it did contain those three words - in three different paragraphs. This is sometimes used as an example of an important limit to keyword indexing.
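
      The limit in that story can be sketched in a few lines, assuming a plain "all words present" match with no notion of position. The sample text is invented for illustration and is not the actual paper.

```python
def tokens(text):
    """Lowercase and split on whitespace (paragraph breaks count as whitespace)."""
    return text.lower().split()


def matches_all_words(doc, query):
    """Keyword match: every query word occurs somewhere in the document."""
    return set(tokens(query)) <= set(tokens(doc))


def matches_phrase(doc, query):
    """Phrase match: the query words occur next to each other, in order."""
    doc_words, query_words = tokens(doc), tokens(query)
    width = len(query_words)
    return any(
        doc_words[i:i + width] == query_words
        for i in range(len(doc_words) - width + 1)
    )


# Invented stand-in for the paper: the three words sit in three paragraphs.
PAPER = (
    "we give explicit counts of beetles per trap\n\n"
    "the sex of each specimen was recorded\n\n"
    "every image was taken under a microscope"
)

keyword_hit = matches_all_words(PAPER, "explicit sex image")
phrase_hit = matches_phrase(PAPER, "explicit sex image")
```

      The keyword test says the paper matches; the phrase test says it does not, which is roughly the distinction that quoting a query is meant to provide.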

      Now, people like him will find their server swamped by searches for a set of words that they don't even use. I'm not sure they will necessarily see this as an improvement.

      But the idea that "miserable failure" returns www.whitehouse.gov as the first match is pretty funny, and by itself might make the effort worthwhile. Now if it would only return www.whitehouse.org as the second match ...

      (Right now, it returns a link to Jimmy Carter's bio as the second link, indented below Dubya's. I guess someone else had a similar idea ...)

      --
      Those who do study history are doomed to stand helplessly by while everyone else repeats it.
    7. Re:no AND needed by rp · · Score: 1

      I can't agree more. The *only*, I repeat: *only* reason Google and Altavista work is that you know what you're asking for, namely, for pages that actually contained the word(s) or phrase you are specifying. Lately I've noticed that Google is starting to return pages that do *not* literally contain the terms I'm specifying.

      Let me put it to you straight dear Google: this is *fucking retarded* and I'm going to seriously search for a competitor that doesn't do this unless you cease to do this *real soon now*. The nice age-old horse-beaten-to-death IR concepts about search engines outsmarting their users do not work, period.

      Thank you.

    8. Re:no AND needed by mikey13 · · Score: 1

      But they have publicly implemented a search operator that searches for a given term or any of its synonyms. So even if the word doesn't appear on the page, if the page includes a related word, you get the result. Just prefix a word with a tilde.
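
      A rough sketch of how a tilde operator like that could work, assuming a hand-written synonym table. The table, the colon handling and the function names are all hypothetical; the real operator's word lists are not public.

```python
# Hypothetical synonym table; a real one would be far larger and learned.
SYNONYMS = {
    "class": {"classes", "course", "courses"},
    "cooking": {"cuisine", "cookery"},
}


def expand(query):
    """Turn each query term into the set of words allowed to satisfy it."""
    groups = []
    for term in query.lower().split():
        if term.startswith("~"):
            word = term[1:]
            groups.append({word} | SYNONYMS.get(word, set()))
        else:
            groups.append({term})
    return groups


def matches(doc, query):
    """A document matches if it contains at least one word from every group."""
    words = set(doc.lower().replace(":", " ").split())
    return all(group & words for group in expand(query))


TITLE = "Berkeley courses: vegetarian cuisine"
plain = matches(TITLE, "class cooking")
expanded = matches(TITLE, "~class ~cooking")
```

      Without the tilde, neither "class" nor "cooking" appears in the title, so it does not match; with the tilde on both terms, "courses" and "cuisine" satisfy them.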

  11. Oops by Daedala · · Score: 5, Funny

    Theoretically, he said, if someone searches for "Bay Area cooking class," the system should know that "Berkeley courses: vegetarian cooking" is a good match even though it contains none of the query words.

    One word: cooking.

    I'm sure the principle is sound. I just think the example is a leetle bit flawed.

    --
    What I say does not represent the views of my employers, my friends, my cats, or myself.
    1. Re:Oops by Anonymous Coward · · Score: 0

      The person who wrote the article was a little sloppy and should have said "search term" instead of "query word".

    2. Re:Oops by Anonymous Coward · · Score: 0

      Yes but I think the example was more for "Bay Area" != "Berkeley" than "cooking" == "cooking"...

    3. Re:Oops by swimmar132 · · Score: 1

      You're missing a few other connections it needs.

      1) The connection between "courses" and "class".

      2) The connection between "Bay Area" and "Berkeley".

    4. Re:Oops by ahem · · Score: 3, Interesting
      The actual quote from the article that I saw was:

      The company also is applying machine learning to its system to give better results. Theoretically, he said, if someone searches for "Bay Area cooking class," the system should know that "Berkeley courses: vegetarian cuisine" is a good match even though it contains none of the query words.

      FYI.

      --
      Not A Sig
    5. Re:Oops by itoleck · · Score: 1

      "a leetle bit flawed"
      Luckily though, Google does give suggestions for bad spelling.

    6. Re:Oops by Daedala · · Score: 2, Interesting

      Hmm. It must have been corrected; I did a direct copy/paste for my quote.

      --
      What I say does not represent the views of my employers, my friends, my cats, or myself.
    7. Re:Oops by apok04 · · Score: 0

      Make sure you quote the example correctly before you call it flawed. From TFA: "Berkeley courses: vegetarian cuisine"

      --
      It's not a bug, it's a feature
  12. Too celver for their own good? by Mirk · · Score: 2, Insightful
    From the article summary:
    They're starting to have success with automatic clustering of concepts, so that pages can match even if none of the words in your query actually appear on the page.

    I hate that. Don't you hate that? When you type in a search keyword, isn't it because you want that keyword to appear in the documents you find?

    This "find tangentially related documents" feature will be fine so long as they make it optional and set it to be off by default. Otherwise, I don't want their idea of what pages I should be looking at polluting my results list.

    I call "innovation for the sake of innovation".

    --
    What short sigs we have -
    One hundred and twenty chars!
    Too short for haiku.
    1. Re:Too celver for their own good? by huge · · Score: 1
      This "find tangentially related documents" feature will be fine so long as they make it optional and set it to be off by default. Otherwise, I don't want their idea of what pages I should be looking at polluting my results list.
      Agreed. What next, Google Clippy?
      --
      -- Reality checks don't bounce.
    2. Re:Too celver for their own good? by mythosaz · · Score: 2, Insightful
      The entire point of a search engine like Google is that they do give you their idea of what pages your query should return.

      That's how it works...

    3. Re:Too celver for their own good? by Halo- · · Score: 1
      Well, I agree that if Google starts ranking these derived results more highly than ones which actually contain the search terms it could be annoying. If I do a search for say "Roofer Austin" and there are no roofers in Austin found, I'd rather have something related like "Contractors Central Texas" returned than nothing at all. If for no other reason than that it helps me reword my search.

      I do agree it should be an option though. Google (in my opinion) has been pretty good about not being obtrusive, so I suspect they won't piss people off with this.

    4. Re:Too celver for their own good? by Anonymous Coward · · Score: 0

      You _have_ the option. Use "allintext:".
      If you just had RTFM... [sigh]

    5. Re:Too celver for their own good? by Jugalator · · Score: 1

      When you type in a search keyword, isn't it because you want that keyword to appear in the documents you find?

      Hmm. Well, for many it's to find documents that match what you're looking for when giving it the keywords. Which may not necessarily be just the keywords you input themselves.

      As an actual example, the top link on Google when I search for "jfk death" was www.1underground.com/jfk.shtml, but if Google had understood I meant "John F. Kennedy" better, it would've maybe given me the page that comes up early on a search on "John F Kennedy death"; http://www.jfk-assassination.de/, which seems to be a much better resource than the former one.

      I call "innovation for the sake of accuracy". If Google didn't believe in that as well, they would obviously not implement the feature. A search engine's success depends a lot on its accuracy. There are always competitors around the corner if it starts to suck or fall behind.

      --
      Beware: In C++, your friends can see your privates!
    6. Re:Too celver for their own good? by myov · · Score: 1

      I often have to repeat queries with different sets of keywords to solve what I'm looking for. For example, to search for something related to my operating system, I could search for "MacOS X", "Mac OS X", "10.3", "Panther", "Apple", etc. Or maybe it's a similar problem on linux, but I haven't used that as a keyword so I never see it.

      Google is in the business of organizing information. If it can improve my search, go for it.

      --
      I use Macs to up my productivity, so up yours Microsoft!
    7. Re:Too celver for their own good? by shish · · Score: 1
      When you type in a search keyword, isn't it because you want that keyword to appear in the documents you find?

      Personally, I use google to find useful information, not specific words; When was the last time someone said "find me a web page which contains the word 'foo'" as opposed to "find some information related to foo"?

      --
      I mod down anyone who says "I will be modded down for this", regardless of the rest of their comment
    8. Re:Too celver for their own good? by Paul+03244 · · Score: 1

      Umm--speaking of 'celver', actually they DO offer what you've described as an *advanced* search option:
      On the Google main page, click the "Advanced Search" link on the right, & on the linked page you will see four fields offering these options:

      with all of the words
      with the exact phrase
      with at least one of the words
      without the words

      all set Bucky? ;)

    9. Re:Too celver for their own good? by mvdw · · Score: 1

      Like the way MS broke the Explorer search in XP from the perfectly-good, easy-to-use, specific search tool in 2000.

  13. Yeah, I noticed that by rde · · Score: 3, Informative

    I've been putting movie reviews on my web page for a while now, and I've increasingly noticed that google will point people at them even though they search for stuff that isn't on the page. For example, I've had a number of hits where people search for 'AvP review' (or suchlike), even though I never include the phrase 'AvP' in my review of Aliens vs Predator.

    I was mightily impressed, and not just because it means more people read my stuff. Or at least surf to it.

    1. Re:Yeah, I noticed that by Anonymous Coward · · Score: 0

      Problem is, if you know EXACTLY what you're looking for, this keyword clustering thing can get in the way. Add a freaking way to turn it off in the advanced search preferences.

    2. Re:Yeah, I noticed that by CdBee · · Score: 2, Informative

      I bet you'll find someone's linked to you and put the phrase AvP in the link. Google references that as well...

      --
      I have been a user for about 10 years. This ends Feb 2014. The site's been ruined. I'm off. Dice, FU
    3. Re:Yeah, I noticed that by generic-man · · Score: 2, Interesting

      Try doing a search for a Macintosh software product. Even though "Mac OS X" was not one of your search terms, Google boldfaces it as though it were!

      I can't reproduce this with another term. I wonder whether this was a manual fix by Google programmers.

      --
      For more information, click here.
    4. Re:Yeah, I noticed that by arkanes · · Score: 1

      Ahem. No it doesn't. It bolds "macintosh" and "opera".

    5. Re:Yeah, I noticed that by generic-man · · Score: 1

      Well, I'll be damned. I just searched again and "Mac OS X" wasn't highlighted. I swear that it was when I searched earlier in the day.

      That's the weird thing about new unannounced Google features; they must deploy them to only a few machines out of a cluster at first, so users don't see them evenly.

      --
      For more information, click here.
  14. Semantic Search Technology by Anonymous Coward · · Score: 1, Informative

    Another interesting read on search engine technology.

    http://www.sigsemis.org/columns/swsearch/SSE1104/document_view/

  15. Video about some of the backend stuff by otisg · · Score: 5, Interesting

    Here it is, from one of the Google guys:
    Google: A Behind-the-Scenes Look.

    --
    Simpy
    1. Re:Video about some of the backend stuff by otisg · · Score: 1

      Before /. effect kills that server, consider using Coral (www.coralcdn.org): here.

      --
      Simpy
    2. Re:Video about some of the backend stuff by ad0gg · · Score: 1

      I find it quite funny they are doing a power point presentation with screen shots of IE.

      --

      Have you ever been to a turkish prison?

    3. Re:Video about some of the backend stuff by adpowers · · Score: 1

      Somehow I don't think that is going to happen. Also, I don't think Coral works with streaming video (which this is). That said, I'd recommend people watch this video. I attended the lecture and it was really interesting. If you read a lot about Google and are observant, not much is new, but it is all put together nicely. However, you do get to see some interesting idea clustering stuff they have in their backend. I had never seen this before. Also, Jeffrey Dean is a funny guy.

  16. Google Lunar by Barryke · · Score: 4, Funny
    They're hiring.
    http://www.google.com/jobs/lunar_job.html
    a snippet:
    Google Copernicus Center is hiring
    Google is interviewing candidates for engineering positions at our lunar hosting and research center, opening late in the spring of 2007. This unique opportunity is available only to highly-qualified individuals who are willing to relocate for an extended period of time, are in top physical condition and are capable of surviving with limited access to such modern conveniences as soy low-fat lattes, The Sopranos and a steady supply of oxygen.
    --
    Hivemind harvest in progress..
    1. Re:Google Lunar by Anonymous Coward · · Score: 0

      why does the old lame ass pigeon joke get modded as +5 and this newer and still lame ass joke not?

      This is why I use http://www.gizoogle.com/ Much better search technology.

  17. "Celver"? Did I say "celver"? by Mirk · · Score: 1

    Looks like I was too dmub for my own good.

    --
    What short sigs we have -
    One hundred and twenty chars!
    Too short for haiku.
  18. Want the dean.pdf without a USENIX account? by robmandu · · Score: 2, Informative
    --
    Break the rules. Keep the faith. Fight for love.
  19. Question... by kryogen1x · · Score: 4, Interesting
    Moreover, Google has created its own patches for things that haven't been fixed in the original kernel.

    Do they share these patches with everyone else?

    1. Re:Question... by TreeHead · · Score: 2, Insightful

      ;i was wondering the same thing. do modifications of this sort fall under the GPL? if so, isn't google required to share them with the public, or are "patches" not considered "modifications" to the software?

      ;treehead

      --

      "If any part Linux was stolen, then Windows was the biggest heist in history."

    2. Re:Question... by Jussi+K.+Kojootti · · Score: 1

      Did you read the paragraph you linked to? As long as they're just using the modifications themselves they are under no obligation to publish them.

    3. Re:Question... by limbostar · · Score: 5, Informative

      They're not obligated to share unless they are planning on redistributing the software. They are perfectly free to patch their own software and use the patched versions for their servers without sharing those modifications.

      The GPL does not force them to do anything unless they wish to redistribute the software.

      --
      this is a sig.
    4. Re:Question... by TuringTest · · Score: 1

      "patches" are considered "modifications", but since they're not distributing the code they're not forced to provide the source.

      --
      Singularity: a belief in the "God" idea with the "demiurge" relation inverted.
    5. Re:Question... by generic-man · · Score: 2, Funny

      They will, once the patches are out of beta.

      --
      For more information, click here.
    6. Re:Question... by GeckoX · · Score: 1

      If they are only using the modified software internally why would they be required to share them?

      --
      No Comment.
    7. Re:Question... by TreeHead · · Score: 1

      ;i did indeed read that paragraph, and i linked to it precisely because google does distribute their software, at least at some level. the more technically accurate question then, i suppose, is whether *distributed* google products run modded linux.

      ;treehead

      --

      "If any part Linux was stolen, then Windows was the biggest heist in history."

    8. Re:Question... by Anonymous Coward · · Score: 0

      That's one of the reasons why the idea of the GPL is flawed. Essentially Google can take all the work of all the people that contributed to the GPL code they use and use it internally without any sort of restitution. You guys are working for the corporations for free. I am sure the newly minted billionaires and millionaires at Google keep you in their thoughts though and give you a toast at holiday time.

    9. Re:Question... by lgw · · Score: 1

      The wording of the GPL is unclear, IMO, for a large organization "distributing" things internally.

      If my company's IT department puts code on my desktop, and it's modified GPL code, are they required to make the source code available to me? Or has my company not "distributed" the code, as it's only used internally? The intent of the GPL is that the user always has the ability to investigate and tinker, but I'm not sure it works out that way.

      --
      Socialism: a lie told by totalitarians and believed by fools.
    10. Re:Question... by Anonymous Coward · · Score: 1, Informative

      The FSF specifically addresses this question in their GPL FAQ, and notes that internal distribution does not require releasing source.

    11. Re:Question... by lgw · · Score: 2, Interesting

      Sure, but what are the bounds of "internal distribution" when a maze of subcontractors and wholly-owned subsidiaries are involved?

      --
      Socialism: a lie told by totalitarians and believed by fools.
    12. Re:Question... by southpolesammy · · Score: 1

      Do their appliances qualify as redistribution?

      --
      Rule #1 -- Politics always trumps technology.
    13. Re:Question... by sploo22 · · Score: 1

      Jerk.

      It's called free software for a reason. It's free for anybody to use and modify. If Google wants to use it, more power to them. How would you like it if you couldn't change a single character of your kernel's code without posting it on a website? Should it be illegal to modify GPL code if you don't have an internet connection or other means to publish? Where would you draw the line?

      --
      Karma: Segmentation fault (tried to dereference a null post)
    14. Re:Question... by dakirw · · Score: 2, Informative

      Do their appliances qualify as redistribution?


      Technically, they're leasing a black box to you, so they still own the appliance. We have one here at the office, and we're not allowed to open up those pizza boxes. If there's a problem, they ship us another one or send a tech over.

    15. Re:Question... by asdfghjklqwertyuiop · · Score: 1

      That's one of the reasons why the idea of the GPL is flawed. Essentially Google can take all the work of all the people that contributed to the GPL code they use and use it internally without any sort of restitution. You guys are working for the corporations for free. I am sure the newly minted billionaires and millionaires at Google keep you in their thoughts though and give you a toast at holiday time.


      The "flaw" you're talking about isn't in the GPL. Contrary to what you'd learn by reading most commercial EULAs, Copyright only gives you control of the rights to make and distribute copies. It doesn't grant you all kinds of control over what people can do with the copies you've given them. Those are their copies.

    16. Re:Question... by Anonymous Coward · · Score: 0

      This is nowt more than idle speculation, ask a lawyer for real advice.

      I would guess internal would be if you were working for them (directly or indirectly). However, it sounds like a grey area because it's not 100% clear when a partner becomes a customer...

      Either way your company is bound to know where internal starts and stops if only for tax reasons...

  20. Sure? by ferar · · Score: 5, Funny

    I always thought that they used NT + Access Database.

    1. Re:Sure? by Anonymous Coward · · Score: 0

      They did, then they had the 6 fire trucks outside their data center.

    2. Re:Sure? by BearJ · · Score: 1
      I thought it was just a big Excel file, then they have people reading the requests coming in off the web, hitting CTRL+F in Excel, and away you go!

      Or...maybe not

      --
      Stand clear of the doors. The doors are now closing.
    3. Re:Sure? by ggvaidya · · Score: 1

      You mean it's not a flatfile?

    4. Re:Sure? by Anonymous Coward · · Score: 0

      And they use Peachtree for their accounting...

      Sorry, guys, still a little bitter about that one.

  21. gCluster by RobiOne · · Score: 5, Informative

    They should make a googleCluster Live CD.. ala clusterKnoppix.. ..or perhaps use more of clusterKnoppix features or openmosix..share cpu/mem..
    sourceforge is begging for something like this..

    Their engineer desktops have special Google builds of Linux which help them compile things insanely fast with g4, i.e. a hacked p4 (Perforce).

    They also have one of the best intranet sites I've seen. Lots of info and services the employees can use, apart from email.

    The internal blogs really help with keeping track of projects you're not working on, and what others are doing. Their mailing lists are often useful too, for example there's a lost and found, for sale, and biking partners list. All kinds of useful little stuff, taking care of the people with little nice things. Lots of reading too.

    -- Robi

    --
    -- Robi
    1. Re:gCluster by Anonymous Coward · · Score: 0

      I'm sure they have a slipstreamed installer.

      As for a live cd, are you insane? Those things suffer performance lag like nobody's business. And, once you move it to non-cd, you're pretty much talking a slipstreamed installer.

  22. kernel patches? by alphan · · Score: 4, Insightful
    Moreover, Google has created its own patches for things that haven't been fixed in the original kernel.

    and the obvious question:

    where are the patches?

    Anybody know? This is not a GPL question, just an ethical one.

    1. Re:kernel patches? by Anonymous Coward · · Score: 1, Insightful

      It _might_ be a GPL question. It depends if Google is distributing their patches in their corporate intranet search applications.

    2. Re:kernel patches? by ornil · · Score: 1

      If they make custom patches for their own use and don't sell them, I don't see what the problem is. They have a legal and moral right to do that, it seems to me.

    3. Re:kernel patches? by DeKO · · Score: 2, Insightful

      If you consider the "freedom" involved in Free Software, you'll notice that they use their modified software for their own purposes. They are free to use the software in any way, they are free to modify it. And they aren't distributing it, so they aren't distributing the source code of their changes. I don't see any problem with it.

    4. Re:kernel patches? by The+Bungi · · Score: 3, Insightful
      where are the patches?

      They'll tell you as soon as you point out where or how they are distributing them (yes, that's why it wasn't a GPL question).

      Why should Google be "ethical"? Likely these modifications are part of their IP trove, which keeps them ahead of the (already heated up) competition.

      If you don't like the way someone uses the software you're giving away then perhaps you shouldn't give it away, or maybe it's just that the license is flawed. It's dumb to expect people who run billion-dollar publicly traded corporations to be "ethical". Mom and pop shops are "ethical".

      The whole concept of "free software" as encoded by the GPL is increasingly being outmoded by things like server-bound distributed applications (see that clumsy Affero GPL) and companies like Google which have strategic interests in the stuff. It's called progress.

    5. Re:kernel patches? by rk · · Score: 2, Interesting

      On their own servers, then they're obeying the rules.

      The question is: Do they use these patches on the search appliances they sell, and does that count as "distribution"? I honestly don't know the answer to that question, and I'd like to think Google has sharp legal advisors to go with their sharp technical people.

    6. Re:kernel patches? by alphan · · Score: 1
      On their own servers, then they're obeying the rules.

      You mean GPL. The other "rules" are pretty subjective. I for one, would like to see Google act in favor of the community assuming the kernel patches are not the core of their technology.

      The question is: Do they use these patches on the search appliances they sell, and does that count as "distribution"? I honestly don't know the answer to that question, and I'd like to think Google has sharp legal advisors to go with their sharp technical people.

      Probably they do. But I am still curious about some of the hardware they sell.

    7. Re:kernel patches? by lgw · · Score: 1

      They'll tell you as soon as you point out where or how they are distributing them (yes, that's why it wasn't a GPL question).

      How is rolling out a program to thousands of machines not "distributing" it? Friends of mine responsible for corporate networks certainly use the word "distribute" when talking about moving a patch to all of the machines they're responsible for. Further, they talk about "distributing a patch externally" for things going beyond their labs to their "customers" (at the same company). Do those "customers" have the right to see the source code, if it's GPLed?

      --
      Socialism: a lie told by totalitarians and believed by fools.
    8. Re:kernel patches? by AsimovBesterClarke · · Score: 2, Insightful

      > and the obvious question:
      >
      > where are the patches?

      No. The obvious question is "WHAT are those patches?" Followed by "where are the patches?"

      --
      Ads are broken.
    9. Re:kernel patches? by digidave · · Score: 2, Insightful

      Stop trying to make it a semantic argument. Distributing according to the GPL is not the same as patching your own systems and I'm sure you know that.

      The only question is whether or not Google is selling these patches as part of their appliances.

      --
      The global economy is a great thing until you feel it locally.
    10. Re:kernel patches? by Anonymous Coward · · Score: 0

      Their patches probably aren't needed and maybe aren't
      used on their appliances.
      It could be that their patches only have applications in distributed
      computing. I would like to know for sure, though.

      If they are making billions of dollars off linux,
      morally they should anyway, besides legal requirements.
      The GPL version 3 should address this.

    11. Re:kernel patches? by Anonymous Coward · · Score: 0

      This is not a GPL question just an ethical one.

      It's not even an ethical one, it's a practical one.

      When a new version comes out, they can't use it because it doesn't have their patches. So they are either stuck with old versions, or they have to keep altering new versions. If they submitted the patches upstream, they'd reduce maintenance.

    12. Re:kernel patches? by lgw · · Score: 1

      How often is a legal argument involving contracts *not* a semantic argument? I'm not really concerned with Google here, but I wonder how well the GPL works in a large corporate environment, where "patching your own systems" commonly involves a dozen companies with intricate ownership relationships.

      --
      Socialism: a lie told by totalitarians and believed by fools.
    13. Re:kernel patches? by Anonymous Coward · · Score: 0
      "And they aren't distributing it, so they aren't distributing the source code of their changes."

      What about those appliances they sell?

    14. Re:kernel patches? by colores · · Score: 2, Informative
      From "The Google File System" (pdf), page 14:
      "When appropriate, we improve the kernel and share the changes with open source community"

      A "grep -R google *" In my 2.6.5 kernel tree returns back:
      drivers/net/arcfour.c: * by Frank Cusack
      drivers/net/ppp_mppe_compress.c: * By Frank Cusack

      As established in the links, he works in the Network Working Group of Google.
    15. Re:kernel patches? by Anonymous Coward · · Score: 0

      Cisco uses a modified Linux on some of their equipment (and will be releasing at least 1 more product running it) and they don't share the code; in fact, it is internal information only.

    16. Re:kernel patches? by Random832 · · Score: 1

      sure, it's legal, and it's moral... but is it cool? I think not

      --
      We've secretly replaced Slashdot with new Folgers Crystals - let's see if it notices.
  23. "The text you entered was not found." by Doc+Ruby · · Score: 4, Interesting

    " pages can match even if none of the words in your query actually appear on the page"

    The main flaw I've found in Google's results has been when it returns pages without one of my query words, which doesn't respond to the sense of my query. Sometimes it's changed page content at the same URL, so I go back and get the "cached" page, if it exists. The cached pages reveal in their headings whether the page matched only because the query word was found only in another page linking to the returned page. I'd like their immediate results to show that distinction, and to have links in the results to click around those pages related by my complete query. The current click/back/"cache" combinations are frustratingly disconnected, conflicting with Google's otherwise smooth immediacy.

    --
    make install -not war

    1. Re:"The text you entered was not found." by nkh · · Score: 1

      The other thing I hate is when Google returns different results based on the page you came from: Google is trying to autodetect the page of my country (Google.uk, Google.se, Google.fr, Google.de... I hate this) and then gives me results in the language of the detected page.

      When I ask for Google.com, I'd like Google.com, not Google.random_country! (but I know, it's not a bug, it's a feature)

    2. Re:"The text you entered was not found." by omahajim · · Score: 1

      Isn't it simply doing this by geolocation of your IP address?

    3. Re:"The text you entered was not found." by Anonymous Coward · · Score: 0

      Actually yes, but unfortunately IP addresses have nothing logically to do with location and are only relevant by maintaining HUGE databases of ranges vs. locations, which are changing all the time and thus frequently wrong.

      WHERE THE HELL IS IPv6 DAMMIT?!?!?!?

    4. Re:"The text you entered was not found." by Anonymous Coward · · Score: 0

      Hmmm, IIRC if you click the Go to Google.com link on a country page it sets a cookie that won't redirect you to the country page when you hit google.com.

  24. University TV by tim256 · · Score: 1
    If you pick up the University TV channel, google did a presentation a couple of years ago that describes their server setup and the basics of their search technology. It was still very technical.

    I saw this one hour presentation on one of the 9000 channels offered with Dish TV. I think the show was something like "Computer Engineering Technology". They'll probably run that episode again.

    Anyways, I thought it was interesting and if you get that channel (I think it's by Washington U), you can see it too.

  25. Re:"Celver"? Did I say "celver"? by tehshen · · Score: 1

    You should have used Google ;)

    --
    Guy asked me for a quarter for a cup of coffee. So I bit him.
  26. Google Maps - Designed to protect data centres by Matt+Clare · · Score: 5, Funny

    Google's redundancy theory works on a meta level, as well, according to Hoelzle. One literal meltdown -- a fire at a datacenter in an undisclosed location -- brought out six fire trucks but didn't crash the system.

    "You don't have just one data center," he said, "you have multiples."

    The real idea behind Google Maps is that as the server catches fire it uses its last cycles to send an eMail to the nearest fire chief and include a map. I think it would also throw in a GMail invite as an incentive.

    --
    .\.\att Clare
  27. Re:"Celver"? Did I say "celver"? by Quiet_Desperation · · Score: 1
    No, you're fine. Celver is the old Gaelic spelling, and I'm not going to argue with burly men in kilts, are you?

    No, wait, Gaelic is Ireland. Never mind.

    Anyway, you're fine, but the alternative definition for celver is "take one's own sister carnally" so maybe you were being dmub.

    Look, I dunno what I'm talking about. What are you reading this for anyway? Get back to work!

  28. Question -- Is any of this considered P2P? by Didion+Sprague · · Score: 1, Interesting

    Question -- and this may be a dumb one, but I'm going to ask it anyway:

    How much of what Google is doing -- the clustering, the redundancy, the sub-categorization -- how much of this (if any) could be described -- could fit under the mantle of "Peer-to-Peer"? Is anything that Google is doing here remotely considered P2P? (Even if the P2P is what's going on on their own, in-house servers?)

    Obviously, I ask this because of the upcoming Supreme Court case. And I ask because it struck me as I read the article that what Google is doing *seems* to be breaking down complex tasks and simplifying them so that they work across the network -- their network, your network -- and I wonder if this is (in theory?) what Peer-to-Peer is doing?

    (I'm thinking, too, of the Google concept of "shards" and how their data is distributed.)

    1. Re:Question -- Is any of this considered P2P? by Anonymous Coward · · Score: 0

      No. It's not P2P.

    2. Re:Question -- Is any of this considered P2P? by MrAnnoyanceToYou · · Score: 2, Insightful

      Interesting addendum to that question - Is Google infringing upon copyrighted information by caching EVERY page they run across? That seems like pulling massive amounts of copyrighted Java code or design code or images or etc. into their server for 'personal' use...? Does this break any laws?

    3. Re:Question -- Is any of this considered P2P? by odin53 · · Score: 1

      Section 512(b) (part of the DMCA) specifically exempts system caching on the part of certain service providers, as long as the provider complies with certain requirements. It's not entirely clear that Google's cache falls under the exemption in section 512(b) -- e.g., Google might not be the kind of provider that's contemplated in section 512(b), and also it goes out and caches content before someone asks for the content -- but I would guess the exemption is what Google principally relies on. Otherwise, it might be fair use, though that's not clear either.

    4. Re:Question -- Is any of this considered P2P? by Anonymous Coward · · Score: 0

      Even if it was do you think a webmaster is gonna whine because Google was caching them? Kinda shooting yourself in the foot.

    5. Re:Question -- Is any of this considered P2P? by MrAnnoyanceToYou · · Score: 1

      Bloggers might not, but the Museum of Modern Art or the RIAA or the MPAA might, were Google to start caching songs / videos in small amounts for video searching. Could bring a lot of copyright issues into play that shouldn't be in play at all.

  29. considering.... by WindBourne · · Score: 2, Insightful

    that the virus which used google could not do it with tens of thousands of computers, it is not likely that /. can do it.

    --
    I prefer the "u" in honour as it seems to be missing these days.
    1. Re:considering.... by Anonymous Coward · · Score: 0, Funny

      oh ye of little faith

      -Parent AC

    2. Re:considering.... by jesser · · Score: 1

      Google was down for hours the day that virus hit, at least for me. Maybe some of Google's datacenters did better than others.

      --
      The shareholder is always right.
  30. Re:Has slashdot... by Matt+Clare · · Score: 1

    I think CmdrTaco uses Safari to administer the site.*

    When you look at Safari what do you see? Apple + Google. One is always shaped by one's environment.

    --------------------
    *I saw it when I was trying to like the new Screen Savers. I've since returned my digital cable box.

    --
    .\.\att Clare
  31. MSN Search by jlramirez · · Score: 1

    I'd love to see the original article's server log file to see how many hits come from the MSN Search dev team.

    --
    "Me claiming Satan exist is just as valid as you claiming an atom exists" - 1inChrist
  32. no matches, yet matches by pedantic+bore · · Score: 1
    ... pages can match even if none of the words in your query actually appear...

    Let me guess... the pages that match just happen to point to advertisers?

    --
    Am I part of the core demographic for Swedish Fish?
  33. In some cases it's VERY useful by fdrebin · · Score: 1

    Sometimes this IS what I want. For instance, maybe I don't know what I'm looking for, thus finding similar concepts can be very handy.

    Perhaps Google Search Exact and Google Search General buttons in addition to the Do you feel lucky, Punk? button?

    --
    Stupidity... has a habit of getting its way.
  34. Frugal Google by Sundroid · · Score: 3, Insightful

    The word, "cheap", is used 4 times in the C/Net article that describes Google's "secret of success" -- "buying relatively cheap machines", "cheap commodity PCs", "(Power) becomes a factor in running cheaper operations", "not just buying cheaper components".

    They say being frugal is a virtue, which Google has, evidently. What is the lesson here? Holding down the cost and being innovative never fail. I guess.

    1. Re:Frugal Google by Anonymous Coward · · Score: 0

      "Never fail"? WTF? Innovation virtually ALWAYS fails. In rare cases it pays off big, but usually it's so far "out there" that no one knows what to do with it.

      Being innovative is like putting a million $ on #18 on a roulette wheel. Whereas being non-innovative is more like betting black/red. The riches will be much lower, but much surer.

      Google as a single data point for supporting your conclusion is completely useless.

    2. Re:Frugal Google by godless+dave · · Score: 1

      I think efficient is a better word than cheap. I've seen plenty of companies go for a cheap short-term solution that ends up costing them more long term. Pennywise, pound-foolish and all that.

      --
      "If it's real, then it gets more interesting the closer you examine it. If it's not real, just the opposite is true." -
  35. define: cheap machines by TreeHead · · Score: 1

    "The downside to cheap machines is, you have to make them work together reliably," Hoelzle said. "These things are cheap and easy to put together. The problem is, these things break.

    ;when he refers to "cheap machines," is he speaking of software or hardware (or both)? the reason i find this interesting is that linux has a reputation for being very stable as a server operating system and i'm wondering what exactly is "failing" on these so-called "cheap machines"--the operating system or the hardware....

    ;treehead

    --

    "If any part Linux was stolen, then Windows was the biggest heist in history."

    1. Re:define: cheap machines by canadiangoose · · Score: 5, Interesting
      I read somewhere that early Google datacentres were built by filling their racks with plywood shelves, then filling each shelf with one power supply running four motherboards each with one HDD. They didn't even use cases. This allowed them to build massively dense datacentres very cheaply. At one point they decided it wasn't worth it to replace dead hardware, so they started placing the racks too close together to be accessible. Why dig through and replace things when you can just keep adding more?

      Anyhow, the article mentioned that in these early datacentres they experienced something like a 25% hardware failure rate, but that it didn't matter because the software worked around it and the hardware was cheap.

      Here's a link to the page where I read all this neat stuff. It's probably mostly about the same stuff as the article we've all just slashdotted, but I won't be able to tell for a while....

      --
      Never eat more than you can lift -- Miss Piggy
    2. Re:define: cheap machines by not-real-sure · · Score: 1

      The thing that fails the most is hardware, specifically the drives. They are using basic desktop systems, maybe with a little more RAM. I read an article here on /. that covered this. It was very informative.

      --
      My Doom. The gift that keeps on giving
    3. Re:define: cheap machines by Anonymous Coward · · Score: 0

      I know you're not expecting this to be the case, but software fails quite frequently as well. I don't have to work for them to know they find bugs in the kernel, filesystem, memory manager, drivers, and probably every part of the system because they can make any software fail.

      And it's worse when it does, because in most cases hardware can be replaced when it fails. Software has to be debugged when it fails, and if these bugs were easy to find, they wouldn't still be there.

      When you have 10,000 custom motherboards with an Ethernet chipset that has a buggy driver, you are going to have 10,000 times as many problems as the person who only has one of them. And Google pushes their machines to the breaking point so they notice immediately when the network card scribbles all over kernel memory, while most users wouldn't see the crash until sometime later and would never attribute it to the net driver.

      If you had a motherboard with a faulty driver that the manufacturer couldn't figure out how to fix, you could just swap it for a better one. Google would have to swap out 10,000 of them, so it's easier to just live with the flaw until it's fixed.

      dom

  36. Re:Truly Amazing - Sad Story by fdrebin · · Score: 1

    What I find to be truly amazing is that there are people who don't believe in 'black magic' like this.
    Where I work, we needed a revision in how data was stored for our applications. What we came up with was rather similar to what Google does, though on a little smaller scale.
    What happened to the project? It was torpedoed, sabotaged, generally screwed in the ... functionality, because a couple old goats didn't understand it. Seriously.

    --
    Stupidity... has a habit of getting its way.
  37. Impressive technology but the algorithms aren't by Anonymous Coward · · Score: 0

    I've pretty much given up hope. All search engines do these days is spit out ad-populated and commercial websites trying to sell something. I'm not trying to single out google here.. but their search results are not much different from any other query engine. Try any search today, any topic, and the first 20 results will be for pages trying to sell something, useless portals filled with links or *slightly* relevant pages absolutely crammed full of ads on the left and right, top and bottom.. google, yahoo, a9, whatever... they're all pretty much the same.

    I guess the good old days are long gone... it's too bad.

    Google might do itself and its users a favour.... Rank down any page with a '$' in it... and any page with more than 15-20 links.... Just dump those into the abyss.

    r.a.s.1974

    1. Re:Impressive technology but the algorithms aren't by TheAwfulTruth · · Score: 2, Interesting

      Heh, well they could NEVER do that :)

      Here's another great idea you inspired that they could also never do (being a commercial company themselves and all).

      When I am searching I virtually always want to do one of two distinct things:

      1) Search only commercial sites for a product to purchase.

      2) Search everything but commercial sites for information.

      There really should be a "$" flag that you could add (or at least a "!$" flag) to control whether you see commercial or non-commercial sites in the results list.

      --
      Contrary to popular belief, coding is not all free blow-jobs and beer. Those things cost MONEY!
    2. Re:Impressive technology but the algorithms aren't by WormholeFiend · · Score: 1

      you mean there are websites on the Internet that don't try to get any money from anyone?

    3. Re:Impressive technology but the algorithms aren't by Anonymous Coward · · Score: 0

      Well you wouldn't know it by using Google but, yes, there are one or two. :)

    4. Re:Impressive technology but the algorithms aren't by white1827 · · Score: 1

      www.froogle.com is for searching for products.

  38. No, only electronic voting uses that system by Anonymous Coward · · Score: 0

    It's obviously not nearly as important as Google's search engine, so that's ok.

  39. Obligatory link to Google research paper by Anonymous Coward · · Score: 1, Interesting
  40. Nothing else innovating but google? by Anonymous Coward · · Score: 1, Insightful

    Why is there so much "google" on slashdot? I don't get it. Are they these days all the industry has to offer?

    Google == great, but not everything.

  41. MapReduce by iluvcapra · · Score: 2, Informative

    A lot of this stuff is application of SAN/RAID/Failover technology, which is cool (and we've never seen it so pervasively implemented), but not horribly revolutionary. I think the slickest thing they've developed, though it might not get the most attention, is their MapReduce framework. The abstract from their paper:

    MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a _map_ function that processes a key/value pair to generate a set of intermediate key/value pairs, and a _reduce_ function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper.

    It seems that the hard part of building massively parallel applications is efficiently separating the parallel aspects of a problem from the necessarily serial aspects. If you start with a programming framework+runtime that handles this automatically, this could be a major boon to people running massively parallel applications. Could anyone who does this sort of thing often post their opinion on this?
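
    To make the abstract concrete, here is a minimal single-process sketch of the model in Python. It is not Google's implementation: the names map_words, reduce_counts and run_mapreduce are made up for illustration, and the "runtime" here is just a loop that groups intermediate values by key, where the real thing would spread the map and reduce calls over many machines.

```python
from collections import defaultdict

def map_words(doc_id, text):
    """Map step: emit an intermediate (word, 1) pair for every word."""
    for word in text.lower().split():
        yield word, 1

def reduce_counts(word, counts):
    """Reduce step: merge all values that share one intermediate key."""
    return word, sum(counts)

def run_mapreduce(inputs, map_fn, reduce_fn):
    """Toy stand-in for the runtime: map, group by key, then reduce."""
    groups = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    return dict(reduce_fn(k, v) for k, v in sorted(groups.items()))

# Hypothetical input: (document id, document text) pairs.
docs = [
    ("page1", "the quick brown fox"),
    ("page2", "the lazy dog and the fox"),
]
word_counts = run_mapreduce(docs, map_words, reduce_counts)
```

    The point of the split is that each map_words call and each reduce_counts call depends only on its own arguments, so a real runtime is free to run them in parallel; only the grouping step has to see everything.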

    All google has to do now is figure out a way to charge for it.

    --
    Don't blame me, I voted for Baltar.
    1. Re:MapReduce by solomonrex · · Score: 1

      You don't sell your core technology. Period.

  42. Re:Has slashdot... by Anonymous Coward · · Score: 0

    You mean DotDot.org

  43. Re:Stripped-down Red Hat? by Anonymous Coward · · Score: 0

    "lacks the backing of a serious, committed enterprise"

    yes, because Google is not a serious, committed enterprise, right?

    I would have modded parent as funny.

  44. Re:Result: by Anonymous Coward · · Score: 0

    How do I find the book?

  45. hardware by r00t · · Score: 1, Interesting

    Google really slaps together a pile of junk.
    Parts fail left and right, and nobody bothers
    to fix them. The software hides all this from
    the users.

    Google even checksums the data, on the assumption
    that it is frequently getting corrupted by all the
    junk hardware they buy.
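
    A rough sketch of the checksum idea in Python, assuming each chunk of data is stored alongside a checksum and kept as several replicas. CRC32 and the store_chunk/read_chunk helpers are illustrative assumptions, not a description of what Google actually uses.

```python
import zlib

def store_chunk(data: bytes) -> dict:
    """Keep the checksum next to the data so corruption can be detected later."""
    return {"data": data, "crc": zlib.crc32(data)}

def read_chunk(replicas) -> bytes:
    """Return the data from the first replica whose checksum still matches."""
    for replica in replicas:
        if zlib.crc32(replica["data"]) == replica["crc"]:
            return replica["data"]
    raise IOError("every replica failed its checksum")

good = store_chunk(b"index shard 42")
# Simulate a flaky disk: the data changed but the stored checksum did not.
bad = dict(good, data=b"index shard 4X")

recovered = read_chunk([bad, good])
```

    The reader never has to know which box went bad; it just skips the replica that fails verification, which is how the software can hide broken hardware from users.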

    1. Re:hardware by slim · · Score: 1

      Google really slaps together a pile of junk.
      Parts fail left and right, and nobody bothers
      to fix them. The software hides all this from
      the users.

      Google even checksums the data, on the assumption
      that it is frequently getting corrupted by all the
      junk hardware they buy.


      I find this self-healing incredibly elegant. I understand that it makes better economic sense for Google to simply ignore broken hardware than to attend to it. I read that they would not even send someone round to turn off a broken machine.

      But - the environmentalist in me is repelled by the idea. As time passes, Google will amass a huge and growing number of completely useless energy wasting units... what's their plan to dispose of antiquated datacentre equipment?

    2. Re:hardware by r00t · · Score: 1

      Google discards computers in bulk too. The true
      numbers are secret of course, but an example that
      seems to fit: after 2 years, sell a whole room full
      of old servers to a company that specializes in
      stripping old computers for parts. Perhaps 1/3 of
      the computers are dead, usually because one part
      has failed.

      Bulk operations make things easy on both Google
      and on the computer recycling company.

  46. Not Bubba! by antdude · · Score: 1

    His name is Hercule. :P

    --
    Ant(Dude) @ Quality Foraged Links (AQFL.net) & The Ant Farm (antfarm.ma.cx / antfarm.home.dhs.org).
  47. Laziness, ignorance or by sporty · · Score: 2, Insightful

    I think the only reason other companies don't do as well as google is due to either laziness or ignorance of some basic things and some advanced things. An index is not the most ground breaking thing in the world. Job delegation and breaking up work is not that ground breaking either. Clustering has been around in concept since forever. Now I ask you, the public, not just you iibbmm, how many applications have you done that use these concepts? Most biz concepts are very simple. They don't try to implement vertex cover or try to solve NP-complete problems like 3-SAT.

    Not to downplay google. Google did a great job of implementing a lot of these things: indexing, job delegation and maybe a good bureaucracy. Larger companies either are lazy, ignorant or simply don't have to. I've worked for a few companies that "don't have to", but lord, if those places weren't so ignorant or lazy, they could be powerhouses just by what they could do...

    --

    -
    ping -f 255.255.255.255 # if only

    1. Re:Laziness, ignorance or by Kashif+Shaikh · · Score: 4, Insightful

      None of the concepts of computer science are new, but what is ground breaking is Google touching all aspects of computer science to solve a problem. Distributed Databases, Replicated Filesystems, Clustering, Learning algorithms, job scheduling, map/reduce languages, etc. are not new. But they applied each of these sub-domains to 'searching' and 'lots of data'. Using old ideas in _new_ ways is ground breaking. That's what everyone does (like Carmack and DOOM3).

    2. Re:Laziness, ignorance or by akirchhoff · · Score: 3, Insightful

      In my experience, you can add, "don't want to pay for". Some of the places I have worked for aren't lazy or ignorant of the possibilities; they have made a deliberate decision to work cheap. They will accept the downtime from a quick and dirty design, rather than pay for better design. It's all in the numbers: how much will we lose if we are down?

  48. AI in the making by Anonymous Coward · · Score: 0

    I believe history will show that Google is man's first successful attempt at a system that has some amount of AI.

  49. I ordered fish because I want to taste the fish by hymie3 · · Score: 0

    so that pages can match even if none of the words in your query actually appear on the page

    Look, I put the phrase in quotes because *that's the phrase I'm looking for*. Lately, I've been getting results (even the cached version!) from searches which don't have the quoted strings I'm looking for. Grr.

  50. Google and its 1980s search literal-mindedness by Theovon · · Score: 2, Insightful

    My wife is studying Library Information Science. In one class, she studied information retrieval. Here's what's interesting: It appears that although Google has much success with determining relevance by using PageRank, it's still very literal about the words you pick. Although it appears to do stemming (i.e. 'runner' matches 'running'), it doesn't do anything about synonyms. Now, here, I'll point out that the textbook for my wife's class was written in like 1995. In the SECOND CHAPTER, they talk about basic query techniques that make use of patterns in documents and AUTOMATICALLY derive what words are synonyms or in some way semantically related. These are long-solved problems. Some search engines employ human-generated lists of synonyms, and there are whole databases you can download that contain semantic networks.

    So, WHY, I ask, is google only now getting around to using these techniques?

  51. "we can't crawl as fast as we would like" by SnprBoB86 · · Score: 3, Interesting

    Why not enhance the robots.txt format to include a max crawl rate variable? Let the webmaster specify how often a robot is allowed to crawl a page.

    --
    http://brandonbloom.name
    1. Re:"we can't crawl as fast as we would like" by pe1chl · · Score: 1

      Why not?
      Because this already can be specified in html metadata:

    2. Re:"we can't crawl as fast as we would like" by pe1chl · · Score: 1

      Why not?
      Because this already can be specified in html metadata:
      <meta name="revisit-after" content="7 days">

    3. Re:"we can't crawl as fast as we would like" by SnprBoB86 · · Score: 1

      Well excellent... put this in the Google FAQ for webmasters:

      http://www.google.com/webmasters/faq.html#toofast

      --
      http://brandonbloom.name
    4. Re:"we can't crawl as fast as we would like" by nokilli · · Score: 2, Interesting
      That appears to have been done.

      Take a look at slashdot's robots.txt. First I've seen of the crawl-delay instruction.

      (and isn't it interesting how Google, MSN, and Yahoo have access to content on /. that all the other search engines are prohibited from crawling?)
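
      For anyone who hasn't seen crawl-delay before, here is a small Python sketch of how a polite crawler could read it, using the standard library's urllib.robotparser. The robots.txt content and the bot names below are invented for the example (this is not slashdot's actual file), and Crawl-delay is a non-standard extension, so a given crawler may or may not honour it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a slow default rate and a faster rate for one bot.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 20
Disallow: /private/

User-agent: ExampleBot
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Seconds the site asks each crawler to wait between requests.
example_delay = parser.crawl_delay("ExampleBot")
default_delay = parser.crawl_delay("SomeOtherBot")

# The same file also carries the usual allow/disallow rules.
private_allowed = parser.can_fetch("SomeOtherBot", "/private/page.html")
```

      A crawler would then sleep for the returned number of seconds between fetches from that host, which is roughly the per-site throttle being asked for above.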

  52. Crawling rate... by advocate_one · · Score: 1
    "In parallel, clusters of document servers contain copies of Web pages that Google has cached. Hoelzle said that the refresh rate is from one to seven days, with an average of two days. That's mostly dependent on the needs of the Web publishers.

    "One surprising limitation is we can't crawl as fast as we would like, because [smaller] webmasters complain," he said. "

    well, we could introduce a setting into robots.txt where we can tell google how often they can spider your site...

    --
    Donald 'Duck' Dunn: We had a band powerful enough to turn goat piss into gasoline.
  53. Re:Slightly related by Anonymous Coward · · Score: 0
    This is why we shouldn't let Mac users post to slashdot.

    How are Mac users ever going to overcome their reputation as blithering idiots if we keep letting them talk?

  54. And Debian in Gmail servers? by stm2 · · Score: 1

    When I tried to log into my gmail account at the beginning of the beta program, I got a Debian welcome screen.

    Posted in my blog.

    --
    DNA in your Linux: DNALinux
    1. Re:And Debian in Gmail servers? by googisgod · · Score: 1
      Once again, I have to ask the question- why do the slashdot editors insist on just publishing fluff pieces on Google? Why not mention the other side of the coin?

      There is a world of news out there that doesn't worship Google as the second coming of Christ, fer chrissake. Namely:

      http://www.fuckedgoogle.com/

    2. Re:And Debian in Gmail servers? by Anonymous Coward · · Score: 0

      I read FuckedGoogle weekly. However, whoever is behind it could be a bit more intelligent and go about the posting in a tad less flammy way.

  55. That's an English pint, you yob. by aristus · · Score: 1

    Bloody Imperial, not your wimpy pints.

    --
    Sometimes seventeen/Syllables aren't enough to/Express a complete
  56. Re:Has slashdot... by Anonymous Coward · · Score: 0

    Nope, that's the Indian version of Slashdot, you Insensitive CLOD!

  57. My all time favorite blog is back! by Anonymous Coward · · Score: 0
  58. Desktop distributed backup by Anonymous Coward · · Score: 0
    I'd like to see Google apply their knowledge to this area.

    Corporate desktops often have > 80gb of space per system

    Much of that space is going unused (if an average of 40gb/system is unused, even 100 desktops present us with 4tb of unused space!)

    With tech similar to Google's index/sharding/chunking concepts we could easily put that extra space to good use as a backup repository, and have adequate redundancy

    We'd need to add some good encryption, though
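
    A rough sketch of the idea, assuming fixed-size chunks, a simple round-robin placement, and made-up host names. This is not how Google does it, just one way to picture chunking with redundancy. Encryption is left out entirely, and every function name here is hypothetical.

```python
import hashlib


def usable_capacity_gb(desktops, free_gb_each, replicas):
    """Spare space left for unique data once every chunk is stored `replicas` times."""
    return desktops * free_gb_each / replicas


def split_chunks(data, chunk_size):
    """Cut a byte string into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_chunks(chunks, hosts, replicas=3):
    """Assign each chunk to `replicas` distinct hosts, round-robin.

    Returns {chunk_index: (sha256_hex, [host, ...])}.
    """
    if replicas > len(hosts):
        raise ValueError("need at least as many hosts as replicas")
    placement = {}
    for index, chunk in enumerate(chunks):
        digest = hashlib.sha256(chunk).hexdigest()  # lets a restore verify the chunk
        targets = [hosts[(index + r) % len(hosts)] for r in range(replicas)]
        placement[index] = (digest, targets)
    return placement


def survives_loss(placement, failed_hosts):
    """True if every chunk still has at least one copy on a working host."""
    failed = set(failed_hosts)
    return all(
        any(host not in failed for host in targets)
        for _digest, targets in placement.values()
    )


if __name__ == "__main__":
    hosts = ["pc1", "pc2", "pc3", "pc4"]  # hypothetical desktop names
    placement = place_chunks(split_chunks(b"abcdefghij", 4), hosts, replicas=2)
    print(survives_loss(placement, ["pc1"]))
    print(usable_capacity_gb(100, 40, 3))
```

    Note the cost of redundancy: with three copies of everything, the 4tb of spare space above holds only about a third of that in unique data.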

  59. "Each set of (servers)...one copy of the Web..." by getAttr · · Score: 1

    Reminds me of the old Steven Wright routine, where he says, "I have a full size map of the world at home...the scale says, "One mile equals one mile"..."

  60. What's worse by grahamsz · · Score: 1

    Trying to find non-commercial sites with information about a product you wish to purchase. It can be virtually impossible sometimes.

  61. Usefulness of classification by harmonica · · Score: 1

    With the additional keywords data and structure the search will automatically result in pages about computer science search trees only because the two words data and structure most likely will not appear on pages about forests. I don't see why any clustering technology is necessary. Maybe there is a better example? I have an intuitive feeling that making the computer "understand" groups of related topics can be of importance, but I don't quite see how to integrate that feature into a search engine (once you've solved the classification problem).

    1. Re:Usefulness of classification by InfiniteWisdom · · Score: 1

      Let me be more specific:

      There are pages that:
      A. Contain the words search, tree, data, structure -- Clearly related to the computer trees
      B. Contain the words search, tree -- the clustering algorithm figures out they're related to computer trees.
      C. Contain the words search, tree -- relate to forestry

      Now if I search for "search tree", the clustering doesn't help, and the search engine returns results from all three classes of pages.

      If I search for "search tree data structure", without clustering, the search can only return pages from class A if all keywords have to be present. If all keywords don't have to be present, it would return pages from A, B and C which gives lots of junk pages.

      With clustering, pages from classes A and B would get clustered together, so you can relax the keyword matching and allow pages with some keywords missing to appear in the results as long as they are in the same cluster. This allows you to get pages from class A and class B, which is closer to what you want than either just A or A+B+C
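
      The three cases can be sketched in a few lines of Python. This is only a toy: the cluster labels are assumed to be given (the clustering algorithm itself is not shown), and the page names, labels, and function names are invented for the example.

```python
# Toy index: each page has a set of words and a pre-assigned cluster label.
PAGES = {
    "A": {"words": {"search", "tree", "data", "structure"}, "cluster": "cs"},
    "B": {"words": {"search", "tree"}, "cluster": "cs"},
    "C": {"words": {"search", "tree"}, "cluster": "forestry"},
}


def strict_search(query, pages):
    """Return pages containing every query term."""
    terms = set(query.split())
    return sorted(name for name, page in pages.items() if terms <= page["words"])


def any_term_search(query, pages):
    """Return pages containing at least one query term."""
    terms = set(query.split())
    return sorted(name for name, page in pages.items() if terms & page["words"])


def clustered_search(query, pages):
    """Relax matching, but only within clusters that hold a full match."""
    terms = set(query.split())
    full_matches = [name for name, page in pages.items() if terms <= page["words"]]
    clusters = {pages[name]["cluster"] for name in full_matches}
    return sorted(
        name
        for name, page in pages.items()
        if page["cluster"] in clusters and terms & page["words"]
    )


if __name__ == "__main__":
    query = "search tree data structure"
    print(strict_search(query, PAGES))
    print(any_term_search(query, PAGES))
    print(clustered_search(query, PAGES))
```

      For the four-word query, strict matching gives only A, any-term matching gives A, B and C, and the clustered version gives A and B. For plain "search tree" all three functions return every page, which is the case where clustering doesn't help.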

  62. Non-matching search results... by statemachine · · Score: 2, Interesting

    "they're starting to have success with automatic clustering of concepts, so that pages can match even if none of the words in your query actually appear on the page."

    I have yet to see a "hit" served up by Google that contained none of the words I searched for and was still relevant. It's especially annoying when I search for exact phrases (such as an error message) and I get something completely different. It's a waste of time so far.

  63. Mod up, and pay attention, Google! by alienmole · · Score: 1

    Those of us with sites that can handle it want Google to index us! Bring it on, Google! Make my server your little Google-bitch!

  64. LISP on crack is nicer by SparafucileMan · · Score: 1

    d x (wget google.com) x

  65. it is not the clustering by vehakki · · Score: 1

    it is the double "o" . what is up with that? yah oo, g oo gle , micr o s o ft, n o rt o n (symantec) etc. what is in c o mm o n?

  66. Follow the IPO by alienmole · · Score: 1

    Google recently went public. If you've drunk sufficient kool-aid, this makes them the last best hope of the tech industry. Imagine the '90s Internet bubble all focused on one company.

    Let's just hope Google doesn't decide to exploit this laser-like market focus by spinning off a host of baby Googles: then you'll be able to buy a GoogleBox PC running a browser-based GoogleOS and make phone calls over your Googlenet connection, using the Googlephone service, and you'll look up phone numbers using Whoogle (for any Google IP lawyers reading this, call me for a license on that last one).

    I, for one, welcome our new over-Googlords.

  67. Re:Result: by alienmole · · Score: 1

    Alternatively, you could learn how to search properly.

  68. Map/Reduce by njord · · Score: 1

    I wonder where they got these ideas?

  69. Here's another article by Opus01 · · Score: 1

    The magic that makes Google tick http://www.zdnet.com.au/insight/software/0,39023769,39168647,00.htm

  70. don't read this message. by Anonymous Coward · · Score: 0

    so if i post my url here, http://www.dapoker.com , i will get better search results? Hey guys, don't go to my website, am just wondering if google will find this url and link this with my website.

  71. did for me by goon · · Score: 2, Informative
    • ... The "I'm Feeling Lucky™" button automatically takes you to the first web page returned for your query.

      An "I'm Feeling Lucky" search means less time searching for web pages and more time looking at them. ...

    from the "I'm Feeling Lucky™" button. Guess they changed it.
    --
    peterrenshaw ~ Another Scrappy Startup
    1. Re:did for me by Anonymous Coward · · Score: 0

      Type google as your query and click I'm Feeling Lucky.

  72. Re:Result: by nagora · · Score: 1
    Alternatively, you could learn how to search properly.

    Yeah, that's right. I've forgotten how to search. Nothing to do with PageRank being useless.

    TWW

    --
    "Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"
  73. Re:Google and it's 1980's search literal-mindednes by darkmeridian · · Score: 1

    Many things are much easier said than done. The techniques have existed forever, but dependable and accurate implementation on affordable hardware that can handle the high traffic of searches on large datasets with high reliability is much harder than just pointing to a few pages in a textbook that gives out theory.

    --
    A NYC lawyer blogs. http://www.chuangblog.com/
  74. Re:Google and it's 1980's search literal-mindednes by shish · · Score: 2, Insightful
    *cough*

    It's not a great example, but my mind seems to have gone temporarily blank of words that have many synonyms :(

    --
    I mod down anyone who says "I will be modded down for this", regardless of the rest of their comment
  75. Re:Result: by alienmole · · Score: 1

    That's right, blame the tools. It couldn't possibly be you. Tell me what you want to find and I'll give you a short tutorial on how to find it. Google's not an AI, you know.

  76. Their Secret Is Out by SEWilco · · Score: 1

    Now that you've told everyone how it works, everyone will build one.

  77. Cooling Google by mparaz · · Score: 1

    Perhaps they have to switch to cooler processors or make modifications to CPU heatsinks.

  78. Re:Slightly related by Anonymous Coward · · Score: 0

    looks like someone's jealous of his maxed out karma, haha

  79. Re:Result: by nagora · · Score: 1
    Tell me what you want to find and I'll give you a short tutorial on how to find it. Google's not an AI, you know.

    I want authority. I think perhaps you've missed the point of my original post. There is so much uninformed yakking about things, mainly in blogs, that Google is no longer of any use to someone looking for even slightly non-trivial information. As an example of what I mean, the Titanic did not break apart on the surface. The idea that it did was about before the film was made but it rested on the testimony of a woman who was four at the time; no one else claimed to have seen the stern actually fall into the sea, with the large wave that would have produced.

    Now, look for information on the web about the sinking. How long do you have to look before you find out that information? It is there, but it's buried deep and I doubt that you would find it if you didn't know it was there. And a search engine that only finds things you were expecting is not much use really, is it?

    TWW

    --
    "Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"
  80. Re:Result: by alienmole · · Score: 1

    Well, your original post was rather concise, which made it tough to divine what was behind it. When it comes to authority, I think you've misunderstood what Google is. It's an index of the WWW, not an encyclopedia. Besides, encyclopedias, newspapers, TV news & documentaries etc. get things wrong too, sometimes spectacularly, and more often than you might think. To take the Titanic as an example, problems with the history of its sinking apparently go back to the original press coverage at the time, such as the way in which Hearst's media spun the story. Given that, it's hard to see how the problem you're describing has anything to do with Google, or even with the modern-day presence of blogs.

    If you want ultimate authority, there are a number of gods I can introduce you to, although you have to agree to believe unquestioningly (have "faith") in the authority of whichever one you pick. I'll note that if you picked Google as your god in this sense, you wouldn't go as far wrong as with some of the other choices out there. Short of that, google for epistemology, and you'll find that the problem you're concerned about isn't going to be solved by any algorithm, ever.

  81. Re:Result: by alienmole · · Score: 1

    I forgot about the tutorial I promised. It doesn't take checking more than a few sources to find significant differences in the account of the Titanic's sinking. That's not uncommon, and means you have to do some research.

    It's quite possible, on something like a historical point of detail, that Googling casually won't find you any answers. However, it will find you many sources, which have references, which you can track down.

    In this particular case, the information you're looking for is actually more like informed speculation & analysis, since the available eyewitness reports are unreliable and inconsistent. So, applying epistemological principles, you might say to yourself "how can I verify whether the few eyewitness reports about the breakup of the ship make sense?" Some answers to that are likely to be found in a technical analysis of the sinking, so you look for those.

    You can also ask yourself questions like "why didn't the stern make a huge wave which would have swamped the lifeboats?" Inconsistencies in your working hypothesis are useful for drilling down towards a more accurate model of the information you're looking for. You don't need to know in advance what you're looking for, but you need to know how to recognize when you don't know something.

    You test the information you find against your current model(s), and there can be give and take on both sides, i.e. you might use a model to provisionally reject certain information.

    There's no predetermined way of knowing when you've reached a final conclusion. Ultimately, it comes down to how much work you want to put into it, how much information is available, etc.

    When you arrive at a final conclusion, you might even find that it doesn't match any single information source out there. Who's right, you or they? It's difficult to say. If you were intent on answering that question, you'd need to examine the processes used to arrive at other accounts, if possible. A less rigorous process is likely to arrive at a less reliable answer.

    As an example, when Brian Williams reported on NBC news the other night that the lead judge in Saddam Hussein's tribunal had been killed, they had apparently received the information from multiple US officials, who in turn had received the information from government sources in Baghdad. Turns out they were wrong, it wasn't the lead judge. But NBC news reported the information as "confirmed", apparently based on having spoken to multiple US government sources. Obviously, their grasp of these issues is rather limited -- even a superficial analysis would indicate that multiple sources in the US government might match merely because they all got their information from the same place. They made a mistake in reporting the identity of the victim as "confirmed", which they could have avoided if they had applied proper procedures, including asking how reliable their sources are, and whether their sources might be contaminated by common factors. If you don't apply that level of diligence to gathering information, you have to accept that the quality of your information will be lower.

  82. Re:Result: by nagora · · Score: 1
    I forgot about the tutorial I promised. It doesn't take checking more than a few sources to find significant differences in the account of the Titanic's sinking.

    The point I'm trying to make is that before Cameron's film version there was no issue with the story about the stern: it was simply not accepted by anyone who had studied the event, and witness statements were not conflicting on this - only one four-year-old said it happened, everyone else didn't. Now, searching on Google does not tell you anything about that; it returns the modern-day controversy, which is wholly unfounded. The point being that all Google's much-vaunted powers, which is what the story is about, are of no help if it cannot tell the difference between wittering idiots and authoritative studies when it ranks the pages. This makes the effort they're putting into searching seem rather misguided. In many cases returning the results ordered by date of indexing would be vastly more useful than whatever algorithm they really use, so why bother with it?

    The big problem is that many people seem to think Google is a research tool and it just isn't. It's really good at confirming preconceived notions which are inherent in one's search terms but it is worse than useless for telling you what the important ideas or opinions are in many fields. Fields that have been touched by Hollywood are particularly badly mangled by the PageRank system, which ranks a page about a historical event like the Titanic sinking lower than a page about the film simply because more people link to the latter.

    TWW

    --
    "Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"
  83. A change in the Index by GoogleAdvisor · · Score: 1

    Has anyone noticed a lot more forum and blog threads having higher rankings on Google since about early February? Perhaps it's just me, but the Index seems cluttered with these threads of late. Brad,