Slashdot Mirror


Google's Bigger Index

WebGangsta writes "Google Inc. today announced it expanded the breadth of its web index to more than 6 billion items. This innovation represents a milestone for Internet users, enabling quick and easy access to the world's largest collection of online information."

412 comments

  1. Here's hoping by r_glen · · Score: 5, Interesting

    ... this will lead to an increase in the integrity of PageRank(TM), and vintage Google will return in all her glory.

    1. Re:Here's hoping by Rotting · · Score: 2, Funny

      Perhaps these are the pesky folks behind the Half Life 2 source code theft and the Windows source code theft and now their search engine includes LANs.

    2. Re:Here's hoping by Destoo · · Score: 5, Interesting

      So it's not just me..

      First, the reindex that happened a few months ago removed all cross-reference with accents.
      (where google would find the same number of links for both the word and the unaccentuated word... right now: soupcon: 9,750 - soupcon: 88,500)

      Then, when searching for anything regarding RAS error messages, I get 30 links from spammers and then the real stuff.
      Example: 711 error yields multiple links for similar pages...
      "Your one stop resource for all things error 711 remote access connection
      management related. ... error 711 remote access connection management. ... "

      Vintage Google.. in Net years, that's 15-16 months ago, right?

      --
      Nouvelles de jeux et technologies en français. TC
    3. Re:Here's hoping by Anonymous Coward · · Score: 0

      I have been running into this a lot as well recently. I've been trying to repair an old Compaq laptop, and searches concerning stuff like ac adapters, cooling fans and error messages return huge loads of false positives (like "we're redesigning our informative site on Compaq laptop batteries, but while visiting why don't you buy a fuckload of stuff you're not even interested in") on top. This has got to stop.

    4. Re:Here's hoping by thestarz · · Score: 0, Troll

      From the front page of Google: (C)2004 Google - Searching 4,285,199,774 web pages

      Am I missing something here? Or is the above mentioned integrity indeed gone?

      --

      c++; /* this makes c bigger but returns the old value */
    5. Re:Here's hoping by thestarz · · Score: 3, Informative

      Yes, you are missing something. They have reached 6 billion items, only 4 billion of those are web pages, the rest are pictures, usenet messages, etc. RTFA!

      --

      c++; /* this makes c bigger but returns the old value */
    6. Re:Here's hoping by Anonymous Coward · · Score: 0

      This is probably just their new stupid inclusion of dynamically searching other sites/databases into their "pages".

      I fucking HATE how in the last few months, when I search for something, Google doesn't give me the top search results for it - but just gives me a page full of links to other search engines. Many of these engines are merely link farms that coil around endlessly like a self-eating snake and contain NO content.

      If I wanted to search other engines, I'd search other engines.

    7. Re:Here's hoping by DeadSea · · Score: 3, Interesting
      Google does deal with spammers of the sort that you pointed out. It does take some prodding though. Last time that I found one of these, I submitted it to their problem report form on google.com. After a month nothing had been done. I then posted it in a slashdot comment that got modded up. A day later all the spammers were gone.

      Google search: 711 error

      Come on, Google. Stop reading slashdot and fix the problems.

    8. Re:Here's hoping by vanillacoke · · Score: 1

      That's not googles doing, that's someone google bombing google (hehe) and making sites like search-now-really-fast.net or a long-domain-name-simliar-to-wording-of-your-search .com rank at tops of list for stuff like toms hardware and bias....

      These people don't want to pay teh google for a legitimate place on the right side of the search engine and for that they don't get one dime from me....

      --
      The secret to getting modded up is to allways say i've got karma to burn in your sig..
    9. Re:Here's hoping by Anonymous Coward · · Score: 0

      Talking to yourself? What the hell?

    10. Re:Here's hoping by Destoo · · Score: 1

      today: search for soupcon and soupc,on (with the cedille) both yield the same number of results..

      Let's forget about bug reports on google.
      We just need to report the bugs on slashdot!!

      --
      Nouvelles de jeux et technologies en français. TC
  2. It could be much smaller ;-) by ChaoticChaos · · Score: 5, Funny

    ...yeah, but it would only be 2 billion items if all the Janet Jackson stuff was removed. ;-)

    1. Re:It could be much smaller ;-) by Lev13than · · Score: 5, Funny

      ...yeah, but it would only be 2 billion items if all the Janet Jackson stuff was removed. ;-)

      And if they'd just stop indexing blogs, the entire Internet would fit onto a CD.

      --
      When you have nothing left to burn you must set yourself on fire
    2. Re:It could be much smaller ;-) by kilonad · · Score: 5, Funny

      But... but... this company called AOL keeps shipping me the entire internet on a CD all the time!

    3. Re:It could be much smaller ;-) by fredrikj · · Score: 5, Funny

      And if they'd just stop indexing blogs, the entire Internet would fit onto a CD.

      You could fit the blogs on a CD as well. Just store a template blog and include a program to generate random variations, e.g. "my dog has fluffy fur today" vs "my cat has fluffy fur today".

      Technically, this would be "lossy compression" (since some data is deprecated but no one will notice the difference). Though on the other hand, it could even be argued that removing blogs entirely would be a form of "lossless compression".

    4. Re:It could be much smaller ;-) by kevin_ka · · Score: 5, Funny

      And if all the pron was removed there would be only 1 website left and that would be a petition to bring back the porn

    5. Re:It could be much smaller ;-) by tds67 · · Score: 1
      ...yeah, but it would only be 2 billion items if all the Janet Jackson stuff was removed.

      The increase should reduce the number of "Google malfunctions".

    6. Re:It could be much smaller ;-) by lavaface · · Score: 4, Funny

      Interesting? Interesting?!? Great jeebus, the legends are true. The swarms of AOL subscribers have discovered Slashdot and are slowly assimilating OSDN into AOL Time Warner!

    7. Re:It could be much smaller ;-) by Anonymous Coward · · Score: 0
      You could fit the blogs on a CD as well. Just store a template blog and include a program to generate random variations, e.g. "my dog has fluffy fur today" vs "my cat has fluffy fur today".
      Hey! Don't forget the endlessly fascinating "Bush sucks" "No Kerry sucks" blog-debate. How could we survive without the biting satire our nations bloggers provide.
    8. Re:It could be much smaller ;-) by jdavidb · · Score: 4, Funny

      Work has already been done on this. Have you seen the Markov blogger on use Perl? Soon all bloggers will be replaced with a Perl script.
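      For readers who have not met the idea, a Markov text generator picks each next word at random from the words that followed the current word in some training text. The block below is a minimal sketch of that technique only. It is not the use Perl script mentioned above, and the corpus, the function names `build_chain` and `generate`, and the `seed` parameter are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, start, max_words=12, seed=None):
    """Walk the chain from `start`, picking a random successor each step."""
    rng = random.Random(seed)
    word, output = start, [start]
    for _ in range(max_words - 1):
        successors = chain.get(word)
        if not successors:
            break  # dead end: this word never had a successor in the corpus
        word = rng.choice(successors)
        output.append(word)
    return " ".join(output)

# Hypothetical training text, borrowed from the examples in this thread.
corpus = (
    "my dog has fluffy fur today . "
    "my cat has fluffy fur today . "
    "my cat has a new toy today ."
)
chain = build_chain(corpus)
print(generate(chain, "my", seed=1))
```

      Every adjacent word pair in the output also occurs somewhere in the corpus, which is why the result tends to read as a plausible variation rather than as noise.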

    9. Re:It could be much smaller ;-) by Anonymous Coward · · Score: 0

      Hey, there is more than one country in the world.

    10. Re:It could be much smaller ;-) by BuckaBooBob · · Score: 1

      Generating Blogs from templates would be gainy compression rather than lossy :) Lossy you loose information... gainy would be I guess gain more content than you originally had :) In fact you might even come up with some interesting blogs that haven't even been written yet :)

      "My hairless cat has fully fur today" :)

      --
      Who needs WiFi when we can have Packet Over Sheep! http://datacomm.org/PoS-InternetDraft.txt
    11. Re:It could be much smaller ;-) by double-oh+three · · Score: 4, Funny

      No, two websites; Fark would be hosting the petition, and Slashdot would be redirecting all internet traffic there.

      --
      "For years, I struggled with reality... but I'm happy to say I finally won out over it." -- Elwood P. Dowd
    12. Re:It could be much smaller ;-) by Anonymous Coward · · Score: 0

      >yeah, but it would only be 2 billion items if all the Janet Jackson stuff was removed

      From Google.com---(C)2004 Google - Searching 4,285,199,774 web pages

      YOU'RE CLOSE!

    13. Re:It could be much smaller ;-) by Just+Some+Guy · · Score: 1

      The writers of NBC's "Scrubs" thank you for using their (unattributed) quote.

      --
      Dewey, what part of this looks like authorities should be involved?
    14. Re:It could be much smaller ;-) by poot_rootbeer · · Score: 1, Funny

      Though on the other hand, it could even be argued that removing blogs entirely would be a form of "lossless compression".

      You do realize that Slashdot itself qualifies as a blog, right?

      Stop for a moment and think about what you're sug+++ NO CARRIER

    15. Re:It could be much smaller ;-) by warpath · · Score: 1

      Thank you.

      I was trying to remember where I heard that before.

    16. Re:It could be much smaller ;-) by HalliS · · Score: 0
      --


      My other UID is 1337
    17. Re:It could be much smaller ;-) by CreatureComfort · · Score: 1

      And there are those who believe this has already happened...

      --
      "Unheard of means only it's undreamed of yet,
      Impossible means not yet done." ~~ Julia Ecklar
    18. Re:It could be much smaller ;-) by Anonymous Coward · · Score: 0

      It's called "Time Warner" now. No more AOL.

    19. Re:It could be much smaller ;-) by Anonymous Coward · · Score: 0

      Frankly, I didn't think she was that......big.

    20. Re:It could be much smaller ;-) by Anonymous Coward · · Score: 0
      Lossy you loose information...

      Wow. You actually said 'lossy' and 'loose' in the same sentence. Please, either get them both right, or both wrong!

    21. Re:It could be much smaller ;-) by damiam · · Score: 1

      Maybe by someone's definition, but I think of Slashdot as a news site. A blog is a place where people (usually one person) post about what they've been doing, or their ideas. Slashdot is a place where a bunch of editors pick through external submissions and post them. They're really not all that similar.

      --
      It's hard to be religious when certain people are never incinerated by bolts of lightning.
    22. Re:It could be much smaller ;-) by fredrikj · · Score: 2, Funny
    23. Re:It could be much smaller ;-) by DotQuantum · · Score: 1

      well, it would only be two web sites for a short time till /. kills fark.

      --
      -- Ben --
  3. how many? by QuantumRiff · · Score: 4, Interesting

    How many of these 6 billion items are in the form of www.massivepopups.com/your_search_term.html

    --

    What are we going to do tonight Brain?
    1. Re:how many? by sensei_brandon · · Score: 5, Funny

      exactly. I searched for "diode wave shaper" one time and got three hits -- all for porn. I had no idea diodes were so fap-worthy.

    2. Re:how many? by Anonymous Coward · · Score: 5, Informative
      That sort of search result spamming is getting out of hand.

      Maybe if more people used Google's Search Quality feedback form, it would help weed them out.

    3. Re:how many? by Anonymous Coward · · Score: 1, Informative

      Your google is broken. Mine gets me a PDF of a wave-shaper circuit layout see

    4. Re:how many? by Anonymous Coward · · Score: 4, Funny

      So, you're into diode wave shapers, heh? You kinky bastard!

    5. Re:how many? by Anonymous Coward · · Score: 0

      I do!

      And I'm as disgusted as the rest of you at those websites spamming google.

      Personally, I wish there was some way to bury all spammers under a mountain of SPAM (the Hormel kind) for the rest of their lives or something.

    6. Re:how many? by hippycow · · Score: 0

      You may want to proof-read your searches for Freudian slips before hitting ENTER. This may cut down on searches for "Dildo Wave Shapers" and the like.

    7. Re:how many? by BuckaBooBob · · Score: 2, Interesting

      I would like to see a new element added to reduce ranking based on the number of pop-ups contained in pages indexed or linked to on sites :) That would really kill a lot of the garbage sites that skew their rankings and in the same breath reduce the need for pop-up blockers :)

      --
      Who needs WiFi when we can have Packet Over Sheep! http://datacomm.org/PoS-InternetDraft.txt
    8. Re:how many? by Anonymous Coward · · Score: 0


      Hmm, I just did it and it seems fine.

      > Results 1 - 10 of about 1,630.
      > Search took 0.10 seconds

      ???

    9. Re:how many? by UserGoogol · · Score: 1

      He searched the exact phrase "diode wave shaper." You did not.

      --
      "Never attribute to malice that which can be adequately explained by stupidity." -- Hanlon's Razor
    10. Re:how many? by Anonymous Coward · · Score: 0, Redundant

      What are you talking about? Diode wave shaping is natural and beautiful. It's never anything to be ashamed of.

      Triode wave shaping, on the other hand... now that's a totally different story. It should have been outlawed from day one.

    11. Re:how many? by dapyx · · Score: 1

      I searched for vibrant string and got Bikini-Panties.

      --
      I'm sorry, the number you have dialed is an imaginary number. Please rotate your phone 90 degrees and dial again.
    12. Re:how many? by Anonymous Coward · · Score: 0

      Can someone explain to me how the hell those pages work anyway? Surely it can't be that hard for google to block them if they wanted to.

  4. I am in here somewhere... by Anonymous Coward · · Score: 0

    Try to find me! ;-)

    No... er... wait, no, on second thoughts, don't search for me I don't want " these pictures " to surface again... I was young and I needed the money and all that... ;-)

    [Posted anonymously to protect the guilty, of course!]

  5. Heh by PaintyThePirate · · Score: 5, Interesting

    Anyone else find it funny that Google has around one item for every man woman and child on earth?

    1. Re:Heh by Xtraneous · · Score: 4, Funny

      The pigeons that they use in the Pigeon Ranking are preparing to attack

      --
      .noitacidem deen uoy siht daer nac uoy fI
    2. Re:Heh by etnoy · · Score: 1

      Sure! It's interesting
      hope I didn't get a popup page tho

      --
      Quantum hacker.
    3. Re:Heh by Attaturk · · Score: 5, Insightful

      Anyone else find it funny that Google has around one item for every man woman and child on earth?

      I'd find it funnier if every man woman and child on earth at least had unrestricted access to Google and everything it links to.

    4. Re:Heh by Doesn't_Comment_Code · · Score: 4, Funny

      Anyone else find it funny that Google has around one item for every man woman and child on earth?

      With my luck, I bet my one item is a page with prescription drugs and weight-loss supplements at bargain prices.

      I hope your item is better.

      --

      Slashdot Syndrome: the sudden, extreme urge to correct someone in order to validate one's self.
    5. Re:Heh by Anonymous Coward · · Score: 3, Funny

      You must be a humanist. As a geek, I found it funny that Google has around one item for every bit on a CD-ROM.

    6. Re:Heh by perdelucena · · Score: 1

      Is it more than a googol? If not I am still not impressed.

    7. Re:Heh by Anonymous Coward · · Score: 0

      The first government link on a Google search for "world population counter" yielded a current count of 6,348,948,021 at 2/17/04 at 16:31:05 GMT. So Google will need to get about 350,000,000 more if they want to keep up with your statistic, much less keeping up with the population growth of about 200,500 people every day (according to last month).

      Ref-
      http://www.census.gov/cgi-bin/ipc/popclockw

    8. Re:Heh by kfg · · Score: 4, Insightful

      In the same sense that I find it funny that my book collection contains about 6 billion words, one for every man, woman and child on earth.

      In other words, no, can't say that I do.

      Not only is it an entirely artificial milestone devoid of meaning even in the sense of interesting coincidence, it's an artificially created "milestone" for the purpose of pointing it out.

      Any marketing department can churn out such by the barrel full.

      KFG

    9. Re:Heh by Anonymous Coward · · Score: 5, Funny

      One page for every man woman and child. That sounds exactly like the thinking of a machine to me.

    10. Re:Heh by rylin · · Score: 5, Funny

      My page was taken offline by the .cx registry

    11. Re:Heh by betelgeuse-4 · · Score: 1

      Some unlucky group of people will be assigned the goatse.cx pages. This could be a very large group depending on how many mirrors of it there are.

    12. Re:Heh by Eslyjah · · Score: 4, Interesting

      Well, we're a bit over 6 billion now. It's more like 6,348,951,839. Wait. Now it's 6,348,951,840. And now 6,348,951,841...

    13. Re:Heh by byolinux · · Score: 1

      It was offline, I think it probably still is.

      Just do a Google Image Search on 'Goatse' and you'll see why you shouldn't have.

      It's a shame to see it go... now what am I going to offer to my enemies as links?

    14. Re:Heh by Threni · · Score: 1

      > I'd find it funnier if every man woman and child on earth at least had
      > unrestricted access to Google and everything it links to.

      If your definition of `unrestricted` allows `during library opening hours` then your dream is currently a reality in the UK.

    15. Re:Heh by Threni · · Score: 0

      So it's true - there IS one born every minute!

    16. Re:Heh by Anonymous Coward · · Score: 0

      lemonparty.org isn't quite as scary, but is disturbing none-the-less.

    17. Re:Heh by Anonymous Coward · · Score: 0

      I found a replacement link that is not as disgusting, or at least as much but in a different direction and more work friendly but equally shocking ;)

    18. Re:Heh by ktanmay · · Score: 2, Interesting

      With more than 50% of them not even aware of what google is.
      If a few hundred million people can generate more than 6 billion pages, just imagine what number all of humanity can produce?

    19. Re:Heh by Anonymous Coward · · Score: 0

      yeah right

      at least one library i know filters internet access (and yes i am in the uk) they filter it less for adults than for kids but it is filtered

    20. Re:Heh by negacao · · Score: 0

      But the medication that makes me not read that interferes with the medication that makes the voices go away!

    21. Re:Heh by glpierce · · Score: 1

      Anything below the hundred-thousands place is worthless - it's not verifiable and no one can claim it is accurate. I suppose it's just an attempt to make the transition between big, round numbers seem more "real".

      --
      G
    22. Re:Heh by greenhide · · Score: 1

      If your definition of "every man woman and child on earth" encompasses only people living in the UK, then obviously you have no clue what the parent poster was talking about.

      --
      Karma: Chevy Kavalierma.
    23. Re:Heh by Anonymous Coward · · Score: 0

      Harlequin fetus! Dude!

    24. Re:Heh by permaculture · · Score: 1

      MORPHEUS: Why not? [an item] for every man, woman, and child in Zion. That sounds exactly like the thinking of a machine to me.

      --
      Environmentalism is the new Victorianism. Everyone ties on a green corset and pretends we're virtuous.
    25. Re:Heh by cynicalmoose · · Score: 0, Redundant

      Or has anyone else noticed that Google's front page still says Searching 4,285,199,774 web pages

      Doesn't sound like 6 billion to me

      --
      Exercise your right not to vote. thinkoutside.org
    26. Re:Heh by builderbob_nz · · Score: 0

      And this time Keanu Reeves isn't here to help us... oh wait, that's a good thing isn't it?

      --

      Karma? Hey I just call it as I see it.
    27. Re:Heh by CdnYoda · · Score: 1

      Think, one must. =) It took several tens of thousands of years for people to reach six billion. It has taken considerably shorter time for the internet to reach 6 billion items...and consider that just as war, disease, etc. have eliminated many people, many pages have been deleted, taken off line, etc. I think the number of internet items is going to rapidly outpace human population growth...millions of people making millions of web pages, etc. each day...I just wonder if google et al. will be able to keep up. We are going to be deluged with more and more results in the future, far beyond world population numbers... Spoken, I have! There is much more to learn, my young padawans... =)

      --
      -- "May the Source be with you!"
    28. Re:Heh by jez9999 · · Score: 1

      Oh dear, what were you up to then? NaughtyGirlsOfChristmasIsland.cx?

    29. Re:Heh by FreakWent · · Score: 1

      our library filters chat sites and porn sites in a reactive manner, ie when we find someone using it, we block it.

      So there's no actively filtering software. Generally, chat and e-mail is a bigger headache than porn, it ties up a PC for longer.

    30. Re:Heh by Anonymous Coward · · Score: 0

      Are you aware that your post was the 8,305,633rd on /.?

  6. Most press-release like post ever by Chris_Jefferson · · Score: 5, Insightful

    While I love google, this is so obviously just a link to a press release, and even worse the first line of the press release cut-and-pasted onto slashdot's page. And is going past 6 billion really that important?

    --
    Combination - fun iPhone puzzling
    1. Re:Most press-release like post ever by twilight30 · · Score: 5, Insightful

      What sucks about the press release (indeed, makes it sooo press releasy) is the total lack of anything that makes it useful:
      * "...to 6bn" : From what number before?

      And I still can't find what I'm looking for! (pun definitely not intended)

      --
      ========================================
      Death will come, and will have your eyes
      -- Pavese
    2. Re:Most press-release like post ever by CuOsc · · Score: 1

      The bit that gets me is the use of the word innovation.

      There's nothing fundamentally new here - and it's parroted press release crap like this that lets companies get away with 'innovating' pop-up blockers.

      Anyway, I'm off to innovate some food for dinner...

    3. Re:Most press-release like post ever by glinden · · Score: 2, Interesting
      • this is so obviously just a link to a press release
      It really is an uninformative press release. Surprising it made it to Slashdot.

      I would have liked to see some information about the underlying technology that allowed this bigger index, especially if it allowed the broader coverage without a reduction in search result quality.
    4. Re:Most press-release like post ever by Zardoz44 · · Score: 1
      My view of Google says:

      (C)2004 Google - Searching 4,285,199,774 web pages

      I seem to remember this going from 2-something, to 3-something, and I could swear they were in the 4's for a while now.

      Just think of how big this will get then they finally buy that 64-bit system. They seem to be approaching their 32-bit limit. (/joke)

    5. Re:Most press-release like post ever by harmonica · · Score: 1

      It's interesting however that the image index has been updated. That happens not so often and was long overdue.

    6. Re:Most press-release like post ever by gantrep · · Score: 1

      Last week they were around 3 billion for sure. I had a bet with a friend over it. I think it was 3.3 actually.

    7. Re:Most press-release like post ever by Anonymous Coward · · Score: 0
      It really is an uninformative press release. Surprising it made it to Slashdot.
      It is? ;-)
    8. Re:Most press-release like post ever by jdogs60 · · Score: 1

      5,999,999,999?

    9. Re:Most press-release like post ever by Anonymous Coward · · Score: 1, Informative

      "From 3.4bn to 6bn"

      That number will likely exceed 10 billion in the near future. Some Google projects are resource constrained, which is astounding considering that the company's computational resources are actually *greater* than publicly disclosed. The scale of the operation is something that most people (in IT, or otherwise) can hardly imagine. Suffice to say that Google is unusual in that marketing people routinely *understate* the numbers that competitors would gleefully overstate.

      It is disturbing that no one, not even Microsoft, may be able to catch up to Google for quite some time, simply because of the orchestrated efficiency of Google's processes and the scale of the deployed infrastructure (sorry, I cannot offer any more specifics, it is in my NDA). That is not good for competition, especially with pond-scum word-spammers and useless blog fluff posing a structural challenge to PageRank.

      Society could do worse than having a Googleopoly on search. Google is run by good people (ask anyone who works for Google, or any of Google's vendors) and puts a lot of effort into doing the Right Things. Nonetheless, healthy competition is preferable to a comfortable stagnation.

    10. Re:Most press-release like post ever by FreakWent · · Score: 1

      Can you comment on the "good people" side of orkut, given how many people have been complaining about it?

    11. Re:Most press-release like post ever by twilight30 · · Score: 1

      Thanks. Didn't see that, obviously wasn't reading carefully enough.

      --
      ========================================
      Death will come, and will have your eyes
      -- Pavese
  7. googlebombs away by Perianwyr+Stormcrow · · Score: 1, Funny

    And of course, 2 billion of that is goddamn blogs.

    --

    What we call folk wisdom is often no more than a kind of expedient stupidity.-Edward Abbey

    1. Re:googlebombs away by Doesn't_Comment_Code · · Score: 0, Offtopic

      And of course, 2 billion of that is goddamn blogs.

      I run a website for a living. And while I really don't like blogs, Google's PageRank algorithm (or what's left of it) normalizes all the pages it knows of on the web. So in order for me to make my page have higher rank, there have to be a whole lot of crappy pages out there with lower rank.

      I'm not thrilled with all the junk/info overload out there. But there is a silver lining.

      --

      Slashdot Syndrome: the sudden, extreme urge to correct someone in order to validate one's self.
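      The normalization point in the comment above can be illustrated with the textbook power-iteration form of PageRank, in which all scores sum to 1, so one page's share can only grow relative to the others. This is a sketch under stated assumptions: the damping factor 0.85, the toy link graph, and the function name `pagerank` are illustrative, and nothing here claims to match what Google actually runs.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread over every page.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            for target in targets:
                # Each page splits its rank evenly among its outgoing links.
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank

# Hypothetical toy web: three filler pages all link to "mysite".
toy_web = {
    "mysite": ["blog_a"],
    "blog_a": ["mysite"],
    "blog_b": ["mysite"],
    "blog_c": ["mysite"],
}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page:8s} {score:.3f}")
print("total:", round(sum(ranks.values()), 6))
```

      Because the total is fixed, adding more low-ranked pages that link to a site raises that site's share, which is the silver lining the comment describes.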
  8. Google thumping its chest? by LostCluster · · Score: 4, Interesting

    What's going on here? This isn't like Google to put out a press release just because the index size just passed a round number.

    Is Google setting up for its IPO and therefore becoming less like the Google we know and love?

    1. Re:Google thumping its chest? by Joel+Bruick · · Score: 2, Interesting

      It did the same thing over two years ago. Please, Google and stock market trolls, think before writing.

  9. The real question by Anonymous Coward · · Score: 3, Interesting

    Did they hit some sort of internal limit just above 4 billion? Were they using an unsigned int? Is that why all these extra items are in a "supplemental" index?
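    The arithmetic behind this question is easy to check. The sketch below compares the page count quoted from Google's front page elsewhere in this thread with the largest value a 32-bit unsigned integer can hold. The closeness of the two numbers is only suggestive: whether Google's document IDs were ever 32-bit is an assumption of the question, not something the press release states.

```python
# Page count shown on Google's front page, as quoted in this thread.
PAGES_SHOWN = 4_285_199_774
# Largest value representable in a 32-bit unsigned integer.
UINT32_MAX = 2**32 - 1

headroom = UINT32_MAX - PAGES_SHOWN
print(f"uint32 max : {UINT32_MAX:,}")
print(f"pages shown: {PAGES_SHOWN:,}")
print(f"headroom   : {headroom:,} ({headroom / UINT32_MAX:.2%} of the range)")
```

    The quoted count sits a little under ten million below the 32-bit ceiling, which is why the number invites the question.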

    1. Re:The real question by Anonymous Coward · · Score: 0

      64-bit all the way

    2. Re:The real question by autocracy · · Score: 1

      As far as other indexes go, probably more because every time I see "supplemental result," the content is basically the same as all the others. I think it's just a nice high number they can brag about otherwise.

      --
      SIG: HUP
  10. 6 Billion by use_compress · · Score: 2, Funny

    There will soon be more web pages indexed in Google than people. I, for one, welcome our HTML overlords!

  11. Google, over 6 billion served. by Anonymous Coward · · Score: 5, Funny

    They beat McDonalds.

  12. Bah... by Anonymous Coward · · Score: 1, Funny

    Man, just imagine.. how much of this information are the thoughts of stereotypical, American, teenage girls... And webpages about how so much stuff is "cute" and how great their webpage is.... ... Someone forgot to flush ...

    1. Re:Bah... by Anonymous Coward · · Score: 1, Funny

      Man, just imagine... how much of this information is pictures of teenage girls... And webpages about how many "cute" friends they have, and how great their webpage is.... ... Someone forgot the tissues...

  13. Milestone by Doesn't_Comment_Code · · Score: 3, Funny

    Google Inc. today announced it expanded the breadth of its web index to more than 6 billion items.

    One for every man, woman, and child. Sounds exactly like the thinking of a machine to me.

    --

    Slashdot Syndrome: the sudden, extreme urge to correct someone in order to validate one's self.
    1. Re:Milestone by nolife · · Score: 1

      Damn, same exact comment posted twice and both made +5. I was going to wait until the dupe story of "Google's Bigger Index" and repost it then but it would be too obvious now ;)

      --
      Bad boys rape our young girls but Violet gives willingly.
    2. Re:Milestone by Doesn't_Comment_Code · · Score: 1

      Ah, but my post had a Matrix reference, which appeals to our demographic!

      --

      Slashdot Syndrome: the sudden, extreme urge to correct someone in order to validate one's self.
    3. Re:Milestone by Anonymous Coward · · Score: 0

      Ummm... both comments were exactly the same. How could yours be a Matrix reference, and the other one not be?

  14. Related? by SkiddyRowe · · Score: 5, Funny

    In a related story Booble's index just expanded to a Double-D.

    Little boys across the globe will have sore arms tomorrow.

    1. Re:Related? by vinlud · · Score: 1

      I also heard they're going to offer a 'Peter North toolbar' soon

      --
      Repeat after me: We are all individuals
  15. Marching In Step by JavaSavant · · Score: 1

    At least *try* to obscure the fact that this was taken from a press release. Slashdot is beginning to sound like the Iraqi Information Minister...

    1. Re:Marching In Step by WebGangsta · · Score: 4, Interesting
      My comment was left off the posting indicating that I noticed the change in "number of hamburgers served" message on the Google home page this morning, leading me to wonder what other changes we should be looking for today (and hence leading me to this news, albeit a press release - Search Engine Watch didn't have it mentioned on their home page at the time).

      And the press release doesn't say that they're indexing over 6B pages, so anyone who's saying that here is mistaken.

    2. Re:Marching In Step by Anonymous Coward · · Score: 0

      The infidels have not indexed our database. They are throwing down their keyboards. They turn away at the walls of our datacenter.

    3. Re:Marching In Step by kevin_ka · · Score: 1

      Slashdot is beginning to sound like the Iraqi Information Minister...
      Nah, if that were the case then we would be getting stories like "Linux on the Desktop up to 70%", "www.windows.com uses Apache" ...

    4. Re:Marching In Step by millette · · Score: 1

      that's 6e6 items: pictures, usenet posts, etc.

    5. Re:Marching In Step by millette · · Score: 1
      that's 6e9 items: pictures, usenet posts, etc.

      P.S.: fixed a typo, oups...

  16. It's only a matter of time.. by pacsman · · Score: 5, Interesting

    I'm waiting for them to come up with a sound search and an image search that look at the subject of the image rather than its file name. After that I'm not sure what's left. Maybe comparative searches for sounds and images, where you can upload a source to compare? Who knows! I hope these guys don't follow the normal path of spiralling into inconsequence after they go public.

    1. Re:It's only a matter of time.. by Tango42 · · Score: 2, Insightful

      A subject-based image search would require people to state what the subject was. That might be an important step towards a semantic web, if you include everything on the web, rather than just images.

    2. Re:It's only a matter of time.. by misof · · Score: 5, Insightful

      As far as I know, image search in the way you want it is still only a dream. But. Approx 2 years ago I attended a conference focused (mainly) on theoretical computer science. I saw some researchers (I think they were from Italy, not sure) present an early implementation of their algorithm to look for similar images to the one you select.

      The idea behind it: For a computer, it's not easy to tell what exactly an image contains. E.g. take all those "type the word you see above inside this box to prove you are not a bot" registration forms. If there are no working algorithms to tell "this image contains the word SLASHDOT written in yellow and blue stripes on a pink-dotted black background", the chances of creating an algorithm to tell "this is a game of tennis, it is probably played in the afternoon somewhere in England" are really low.

      However, by using various approaches from CG (comp. graphics), you MAY be able to tell whether two images are similar or not -- as simple examples consider edge detection, color spectrum, etc. As I already mentioned, such algorithms have already been implemented and their success ratio is already reasonably high. I expect that it won't take long until we see them on google.

      Note that using the ideas above you CAN search for an image with a given subject -- it just requires two stages. Suppose you want an image of a sun setting down somewhere in the mountains. Stage 1. You enter "sunset" into google's present search engine. You get lots of sunsets, several dogs named Sunset, a chinese girl Sun Set, etc. Then you select one of the sunsets most resembling the image you want and you tell google (or some other engine) to find all similar images. Et voila.
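      The "color spectrum" comparison mentioned above can be sketched roughly as follows. This is only a toy illustration, not the researchers' algorithm or anything Google is known to run: the function names, the 4-level quantization, and the hand-made pixel lists are all assumptions made for the example, and each image is assumed to be a non-empty list of RGB tuples.

      ```python
      from collections import Counter

      def color_histogram(pixels, levels=4):
          """Quantize each RGB channel into `levels` buckets and return
          the fraction of pixels that falls into each (r, g, b) bucket."""
          step = 256 // levels
          counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
          total = len(pixels)  # assumed non-empty
          return {bucket: n / total for bucket, n in counts.items()}

      def histogram_intersection(h1, h2):
          """Sum of per-bucket minimums: 1.0 for identical color
          distributions, 0.0 when no bucket is shared."""
          return sum(min(share, h2.get(bucket, 0.0)) for bucket, share in h1.items())

      # Hypothetical stand-ins for real images: lists of RGB pixels.
      sunset_a = [(250, 120, 30)] * 6 + [(40, 20, 60)] * 4
      sunset_b = [(240, 110, 20)] * 5 + [(50, 30, 50)] * 5
      forest = [(20, 130, 40)] * 10

      query = color_histogram(sunset_a)
      for name, image in [("sunset_b", sunset_b), ("forest", forest)]:
          score = histogram_intersection(query, color_histogram(image))
          print(name, round(score, 3))
      ```

      In the two-stage search, stage 2 would rank every candidate by this score against the sunset you picked. Note that a score like this only captures "similar colors", which is exactly why it cannot tell you what the subject of an image is.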

    3. Re:It's only a matter of time.. by autocracy · · Score: 1

      I don't think Google going public will be a big issue - they only offered up a small non-controlling portion to raise funds, and even then that's been postponed indefinitely (read: cancelled).

      --
      SIG: HUP
    4. Re:It's only a matter of time.. by harmonica · · Score: 1

      I saw some researchers (I think they were from Italy, not sure) present an early implementation of their algorithm to look for similar images to the one you select.

      Content-based image retrieval isn't that new. Check out GIFT. It should even be possible on a grand scale, given Google's resources (hardware and know-how). However, personally I don't think finding similar images is that useful. I never had the necessity to find similar images. At least not the kind of similarity retrieved by those tools: similar in the sense of "another image containing a dog catching a frisbee" doesn't work, because a second matching image could have totally different characteristics from the first one.

      Searching with keywords I do find useful. But it's a long time until image understanding will really work, I'm afraid. Until then, Google's approach (keywords in the file name and text near the image) must suffice.

    5. Re:It's only a matter of time.. by Anonymous Coward · · Score: 0

      >I'm waiting for them to come up with a sound search
      Actually, how hard could a music search be? I was thinking something along the lines of humming a tune into your microphone; the computer would break it into notes and compare them with a db.

    6. Re:It's only a matter of time.. by Anonymous Coward · · Score: 0

      I know this is picky, but I feel obliged to point out that the sound of a fricative "s" followed by a plosive "t" doesn't exist in Chinese. So, you wouldn't find a reference to a Chinese girl named Sun Set. While the letters "sun" can easily represent a homonym for many Chinese characters, the same cannot be said for "set." Perhaps in another asian language, but not in Chinese.

    7. Re:It's only a matter of time.. by ksiddique · · Score: 1

      This isn't exactly a search engine but it helps out trying to identify music. I've used it with my CDs and it works surprisingly well.

      " MusicBrainz is a community music metadatabase that attempts to create a comprehensive music information site. "

    8. Re:It's only a matter of time.. by Anonymous Coward · · Score: 0

      car keys...

  17. And let's hope it stays that way! by dot-magnon · · Score: 1

    I have Google as my number one source of information on the internet. I hope they will keep going like they have for years and years, and (no, I dislike monopolies) that they will withstand competition from others such as MSN. Which I believe will be the truth.

    Not that this is something to celebrate, because having 6 billion pages alone does not tell me that they're the greatest of all search engines and will exist for a long time. And it's not like it's some kind of jubilee. But still: Way to go Google! :)

    1. Re:And let's hope it stays that way! by Walkiry · · Score: 3, Funny

      I have Google as my number one source of information on the internet.

      Whatever happened to The Onion?

      --
      ---- Take the Space Quiz!
    2. Re:And let's hope it stays that way! by dot-magnon · · Score: 1

      I don't know. It hurts, but it made me cry.

  18. One page/human by etnoy · · Score: 1, Redundant

    6 billion would mean about one page for every person in the world! Wehee! (according to UN, the world's popultaion is about 6 000 000)

    --
    Quantum hacker.
    1. Re:One page/human by Tango42 · · Score: 1

      Wrong number of 0s. The world's population is about 6 000 000 000, which is indeed 6 billion. (in fact, it's more like 6.3 billion)

    2. Re:One page/human by Tow_cow · · Score: 1
      (according to UN, the world's popultaion is about 6 000 000)

      I don't know what 'popultaion' is, but the population is about 6 000 000 000

  19. A company spokesman added... by Boing · · Score: 5, Funny

    ...that remarkably, a full five-sixths of the content consisted of different versions of the Google logo.

    1. Re:A company spokesman added... by graniteMonkey · · Score: 1

      The rest of the content consisted of webpages offering to help you search for different versions of the Google logo.

      --

      This is a manual virus. Copy it to your sig and help me spread!
  20. 4.28 billion web pages... by hanssprudel · · Score: 3, Interesting

    2^32 = 4.29 x 10^9

    Does it sound to anybody else like the rumours of Google hitting a dead end in the number of index positions for the websearch are true? Especially given that it has been more than a year since they announced 4 billion.

    Apparently pagerank assigns an unsigned int to every page as id, and their index is so huge they cannot convert it to a 64-bit number. (You wonder why they didn't think of that 2 billion pages ago, when a UTF-8-like solution would still have been possible.)

    1. Re:4.28 billion web pages... by JediTrainer · · Score: 5, Funny

      That reminds me of an old Dilbert (paraphrasing here, forgive the small errors):

      PHB: We've run out of accounting codes! We can't do anything without one!

      Dilbert: Why not upgrade the system to accept larger codes?

      PHB: To do that we'd need a budget and an accounting code

      Dilbert: Why can't we reuse a code from an old finished project?

      PHB: Strangely enough, we've never finished a project.

      --

      You can accomplish anything you set your mind to. The impossible just takes a little longer.
    2. Re:4.28 billion web pages... by Anonymous Coward · · Score: 0

      Mod parent up!

      Google claims it's indexing 4,285,199,774 web pages.

      log_base_2(4285199774) = 31.996715
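      The arithmetic is easy to re-check. A minimal sketch, assuming the rumoured document ID really is a 32-bit unsigned integer (that width is the thread's speculation, not a confirmed detail of Google's index); the variable names are made up for illustration:

      ```python
      import math

      ID_BITS = 32                   # assumed width of the rumoured unsigned page ID
      claimed_pages = 4_285_199_774  # counter quoted from the Google home page

      capacity = 2 ** ID_BITS
      remaining = capacity - claimed_pages
      bits_needed = math.log2(claimed_pages)

      print(f"capacity:    {capacity:,}")
      print(f"remaining:   {remaining:,}")
      print(f"used:        {claimed_pages / capacity:.4%}")
      print(f"log2(pages): {bits_needed:.6f}")
      ```

      If the assumption holds, fewer than ten million IDs would be left before the counter wraps.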

  21. What I want to know... by Bob+McCown · · Score: 5, Interesting

    ...is how to get rid of those pseudo-pages in Google. The ones with names like "thing_that_youre_searching_for.html", and all they are is either a page of dead links to crap on ebay, or a "Hey, we do great searches for your stuff".

    1. Re:What I want to know... by Doesn't_Comment_Code · · Score: 1

      I agree! Mod parent up!

      They are a terrible disservice. They waste our time searching, and should be on Google's top ten list. Google has made a lot of public statements about integrity of search, and returning the most relevant, information-rich pages. Yet a couple of those pages always wind up on the first SERP!

      --

      Slashdot Syndrome: the sudden, extreme urge to correct someone in order to validate one's self.
    2. Re:What I want to know... by ctishman · · Score: 5, Informative

      Use that "Dissatisfied with your search results? Help us improve." link at the bottom of the page. Voila.

    3. Re:What I want to know... by Chris+Croome · · Score: 3, Informative

      ...is how to get rid of those pseudo-pages in Google. The ones with names like "thing_that_youre_searching_for.html", and all they are is either a page of dead links to crap on ebay, or a "Hey, we do great searches for your stuff".

      +1

      There are things that you just can't use Google for any more because these googlespam sites score so well... it's like being back in the days before google...

      --
      Check out MKDoc a mod_perl CMS
    4. Re:What I want to know... by samcentral2000 · · Score: 5, Insightful

      I totally agree. These days, whenever I use google, I always include "-search" in my search. Cleans it right up :)

    5. Re:What I want to know... by Anonymous Coward · · Score: 1, Interesting

      Just how do these sites know what I was searching for anyway? They don't have a cross-referenced page for every word in the dictionary, do they?

  22. "...represents a milestone..." by stratjakt · · Score: 5, Insightful

    No it doesn't. It represents a pretty reasonable upgrade for Google.

    It's expected as the web grows, so will the search engines.

    This isn't exactly a man-on-the-moon accomplishment.

    --
    I don't need no instructions to know how to rock!!!!
    1. Re:"...represents a milestone..." by LostCluster · · Score: 1

      It represents the fact that the total of the ever increasing numbers at the bottom of their main search pages passed a round number, that's all.

    2. Re:"...represents a milestone..." by daeley · · Score: 1, Funny

      Jeez, buddy, did somebody whiz in your Cheerios this morning? Give 'em a break, it's at least cool enough to merit a /. story, even if it doesn't meet up with the apparent need to land a man on the moon to warrant the barest modicum of excitement.

      --
      I watched C-beams glitter in the dark near the Tannhauser gate.
    3. Re:"...represents a milestone..." by stratjakt · · Score: 0, Troll

      it's at least cool enough to merit a /. story

      No it's not, it's some marketing drivel in a press release targeted at investors who might be interested in their upcoming IPO.

      "Microsoft announced today that windows will maximize your internet experience with over 32 bits of color, representing a milestone in OS design!"

      "Intel announced today that the new Prescott processors will give you the power to make the most of the internet, representing a milestone in processor design!"

      "Slashdot announced today that they will post anyone's press release as news for a mere nickel, representing a milestone in for-hire shill technology!"

      --
      I don't need no instructions to know how to rock!!!!
    4. Re:"...represents a milestone..." by Anonymous Coward · · Score: 0
      No it doesn't. It represents a pretty reasonable upgrade for Google.

      It's expected as the web grows, so will the search engines.

      This isn't exactly a man-on-the-moon accomplishment.


      I agree. I see no reason why Google would lie about this.
    5. Re:"...represents a milestone..." by I+confirm+I'm+not+a · · Score: 1

      At least if /. waits until there's another man on the moon before posting a story, I might get some work done...
      ;)

      --
      This is where the serious fun begins.
    6. Re:"...represents a milestone..." by KFury · · Score: 2, Insightful

      Perhaps you should look up the definition of a 'milestone'. It's a marker by the side of the road, indicating the passing of a cognitive reference point (mile, or other round measure).

      6 billion items is just that, a milestone.

    7. Re:"...represents a milestone..." by nomadic · · Score: 1

      Hey, I only subscribe to The Moon Landing Times...

    8. Re:"...represents a milestone..." by Anonymous Coward · · Score: 0


      Yea, and I just dropped my 64k'th turd in the crapper, but that's not on Slash's frontpage now is it?

  23. is it just me? by trans_err · · Score: 5, Interesting

    Google has become so flooded with internet crap that it's quickly losing its status as a useful tool. Google needs some form of moderation to move out the superfluous blog entries and advertising fronts so it can someday become as useful as it always was.

    1. Re:is it just me? by Anonymous Coward · · Score: 2, Insightful

      I've gathered information from blogs that isn't available anywhere else. When searching on how to set up my wireless SMC network card with Linux, the only source I could find was a blog hit, and it got me running in no time. Don't discount blogs so quickly!

    2. Re:is it just me? by ajagci · · Score: 2, Insightful

      Google has become so flooded with internet crap that it's quickly losing its status as a useful tool. Google needs some form of moderation to move out the superfluous blog entries and advertising fronts so it can someday become as useful as it always was.

      Ah, right. Then the various zealots that you already get on Slashdot can moderate pages they don't like out of existence. You know, the people who have a pet platform and will call anybody a "Troll" that is critical of their pet platform.

    3. Re:is it just me? by WoTG · · Score: 1

      Well, the more recent versions of the GoogleBar (only for IE) have voting buttons - cute smiley buttons at that! Whether or not it's a preliminary move to some sort of moderation-like system has been debated in many a Google forum.

    4. Re:is it just me? by Anonymous Coward · · Score: 1, Funny

      Troll

    5. Re:is it just me? by ajagci · · Score: 1

      See what I mean? You don't even have to name the platform--the people in question already know who they are and react completely predictably.

    6. Re:is it just me? by vinlud · · Score: 1

      Maybe we can expand Slashdot moderation throughout the internet?

      --
      Repeat after me: We are all individuals
    7. Re:is it just me? by The+Cydonian · · Score: 1

      I think it's time for a -1, post-modernist humour meta-moderation option. :-)

  24. Their search has apparently improved as well ! by phoxix · · Score: 4, Informative

    Search for any normal product name with Google. What did you used to get? Billions of useless sites that cross-link to each other and have the same bloody reviews from amazon.com

    That seems to have changed!

    I just tried a search on television antennas and for once the results seem relevant.

    Hooray!! Google is back!! :^)

    Sunny Dubey

    1. Re:Their search has apparently improved as well ! by trans_err · · Score: 2, Interesting

      Television antennas Information at Business.com Television antennas industry web links for business products, services, information and resources. ... Television antennas. FEATURED LISTINGS, ... www.business.com/directory/media_and_entertainment/television/equipment_and_supplies/television_antennas/ - 28k - Cached - Similar pages

      -- wow, a false advertising front... how USEFUL!

    2. Re:Their search has apparently improved as well ! by Anonymous Coward · · Score: 0

      Holy crap, even more... prurient... searches work again. Hooray! ;)

      Damn straight I'm posting this as AC...

    3. Re:Their search has apparently improved as well ! by ElliotLee · · Score: 1

      Indeed, I read about people making $thousands off their Amazon affiliate links by getting listed in Google. It was serious abuse of the system. They were quite angry when Google dropped them all late last year (a couple months ago). I'm glad.

  25. Faked URLs by Professr3 · · Score: 3, Interesting
    Surely a lot of these results are for search engines that prey on google. You can't run a lookup on anything these days without finding a link that goes straight to some other search page, filled with ads of course. Is this a problem, and is Google actually counting those pages in the 6 billion figure?

    </curious>

  26. Most of it dynamically generated crap? by Anonymous Coward · · Score: 0

    Any dude can configure his cgi script to have an ".html" address.

    How can we be sure that those billion new pages don't come from a dynamically generated list of prime numbers? (And, more interestingly, how does Google know how to stop before infinity?)

  27. Still nok by mirko · · Score: 5, Interesting
    • I own a forum on top of which I put a robots.txt file which is supposed to STOP any spider from visiting it.
      I however find my post while googling for words they also contain.
      How can one explicitly forbid Google from indexing a site ?
    • My wife developed 2 web sites which never got indexed even though we submitted these using Google's interface. As they might not be linked, I suppose Google just considers that if nobody mentions a site, then the site should not be registered as existing ? Do Google think it actually is the web ?

    Sorry, I'll keep using Altavista.
    --
    Trolling using another account since 2005.
    1. Re:Still nok by happystink · · Score: 2, Informative

      Just check the IPs googlebot comes from and ban those if they're not honoring your robots file; that works fine. They have a very set range they use, anything starting with 216.39 or something, I think.

      --

      sig:
      See the "..for smart people" banners Wired runs here? Look elsewhere guys.

    2. Re:Still nok by bad-badtz-maru · · Score: 3, Informative

      If googlebot crawls your site, then your robots.txt file is either wrong or in the wrong location. There is no doubt that googlebot follows the robots.txt standard.

      It can take a very long time for a site to be spidered after it is submitted via the "add a url" form.
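      One way to check whether a robots.txt says what you think it says is to run it through a parser. A minimal sketch using Python's standard `urllib.robotparser`; the helper name `is_allowed`, the sample rules and the example.com URLs are made up for illustration, and this only shows how a standards-following crawler would read the file, not how googlebot is actually implemented:

      ```python
      from urllib.robotparser import RobotFileParser

      def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
          """Return True if `user_agent` may fetch `url` under the given rules."""
          parser = RobotFileParser()
          parser.parse(robots_txt.splitlines())
          return parser.can_fetch(user_agent, url)

      # Hypothetical rule sets. The file must be served from the site root,
      # i.e. http://www.example.com/robots.txt, or crawlers will not find it.
      BLOCK_ALL = "User-agent: *\nDisallow: /\n"
      BLOCK_FORUM_ONLY = "User-agent: *\nDisallow: /forum/\n"

      topic = "http://www.example.com/forum/topic1.html"
      home = "http://www.example.com/index.html"

      print(is_allowed(BLOCK_ALL, "Googlebot", topic))         # whole site blocked
      print(is_allowed(BLOCK_FORUM_ONLY, "Googlebot", topic))  # forum blocked
      print(is_allowed(BLOCK_FORUM_ONLY, "Googlebot", home))   # rest of site open
      ```

      If the first two calls do not come back `False` for your own file, the rules are wrong; if they do, the next thing to check is the location.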

    3. Re:Still nok by GerritHoll · · Score: 1

      You want to have a NOARCHIVE header:

    4. Re:Still nok by Anonymous Coward · · Score: 0

      I own a forum on top of which I put a robots.txt file which is supposed to STOP any spider from visiting it.
      I however find my post while googling for words they also contain.

      You have to wait until Google reindexes your page for it to notice the robots.txt file. What else would you propose? That every time somebody searched, Google should check for a robots.txt file for every result in that search?

      I suppose Google just considers that if nobody mentions a site, then the site should not be registered as existing ?

      It's sensible logic. If nobody has found it interesting enough to link to, what makes you think people searching would find it interesting? It's not as if a single inbound link is hard to come by.

      Do Google think it actually is the web ?

      What utterly bizarre logic. You don't like the results it serves, so you diagnose it with megalomania? Get some perspective!

    5. Re:Still nok by radish · · Score: 1

      Actually, that will only prevent google caching the page, it won't stop it being in the index. To do that, use "NOINDEX, NOFOLLOW" instead of "NOARCHIVE". Check out google for full details - note that the robots.txt is a much better way of doing things than using the meta tags.

      --

      ---- Den ene knappen er powerknapp, den andre er Bender voice knapp "Bite My Shiny Metal Ass"

    6. Re:Still nok by naoiseo · · Score: 1

      ya, you're right, google probably just ignored your robots.txt... yeesh.

      if some other site on the web has linked to your page, a robots.txt will not help you. robots.txt will stop the bot from spidering around your site, but if you think someone from the outside world might link to it, you need a noarchive tag on every page you don't want archived.

      altavista has no magic technology to know what pages you don't want indexed either, so enjoy using it, but don't get confused.

    7. Re:Still nok by justMichael · · Score: 2, Interesting
      My wife developed 2 web sites which never got indexed even though we submitted these using Google's interface. As they might not be linked, I suppose Google just considers that if nobody mentions a site, then the site should not be registered as existing ? Do Google think it actually is the web ?

      Put it in your sig as a link, get a few high rated posts and google will visit.
    8. Re:Still nok by Anonymous Coward · · Score: 0

      if some other site on the web has linked to your page, a robots.txt will not help you. robots.txt will stop the bot from spidering around your site, but if you think someone from the outside world might link to it, you need a noarchive tag on every page you don't want archived.

      What on earth are you talking about? The robots.txt file handles that just fine. When google finds a link to your site, it retrieves the robots.txt file before doing anything else.

      altavista has no magic technology to know what pages you don't want indexed either

      No, it uses robots.txt as well. It's not magic, it's just common sense.

    9. Re:Still nok by Psychic+Burrito · · Score: 1

      Since Google browses Slashdot at "1, threaded, oldest first", you don't need many high-rated posts, you need many early root posts not modded below 1.

    10. Re:Still nok by elemental23 · · Score: 1
      You sure about that? From http://gnuart.org/robots.txt:
      Not Found

      The requested URL /robots.txt was not found on this server.

      Or are you talking about a site other than the one in your .sig?
      --
      I like my women like my coffee... pale and bitter.
    11. Re:Still nok by mirko · · Score: 1

      I am talking about a site which I do not want to see linked, a private forum I share with French-speaking friends :)

      --
      Trolling using another account since 2005.
  28. Great news by Illserve · · Score: 1

    Those Link farms were starting to sweat a bit about increasing the scale of their operations by another factor of 10.

  29. No Good... by Mork29 · · Score: 4, Interesting

    I don't want MORE things to search for, I want it to return more relevant searches. I know that the information I usually search for is out there; the problem is that there's so much chaff out there that I can't find what I want. No matter what I search for, there are at least 2 or 3 responses related to porn. I understand that there is a lot of variety of porn out there, but come on... Search engines are getting even worse by throwing in search results that are hardly relevant, just because they got paid money by the company. I would even be willing to pay for a "google membership" if they eliminated the advertisers mixed in with search results and maybe gave me another special feature or 2. I'd want a search engine that returns just 1 or 2 good results over one that returns 5 good results mixed in with 200 bad ones.

    1. Re:No Good... by glinden · · Score: 4, Informative
      • I want it to return more relevant searches.
      Have you tried some of the Google alternatives? Vivisimo is particularly interesting with its clustering of search results. Teoma is also quite good.
    2. Re:No Good... by per11 · · Score: 1

      Search engines are getting even worse by throwing in search results that are hardly relevant, just because they got paid money by the company.

      Google doesn't "throw in" advertisements. Their ads are either on the side or clearly marked as sponsored links.

  30. Book search by Henry+V+.009 · · Score: 1

    "Google Print"? Sounds neat. Is there a beta page for that yet?

    1. Re:Book search by byolinux · · Score: 1

      http://print.google.com/ - it's quite amusing too.

  31. They said 6 billion items, not webpages. by LostCluster · · Score: 5, Informative

    Notice that they claim that they search 6 billion items, but the home page only claims that they're "Searching 4,285,199,774 web pages".

    To find the rest, we need to use Google's other services. The image search is claiming "Searching 880,000,000 images". Google Groups says it's "Searching 845,000,000 messages". Add those to the count and you get 6,010,199,774 items total.
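    For anyone who wants to redo the sum, a quick sketch; the three figures are the counters quoted from the Google pages, and the dictionary keys are just labels chosen for this example:

    ```python
    # Counters quoted from the three Google search pages.
    counts = {
        "web pages": 4_285_199_774,
        "images": 880_000_000,
        "Usenet messages": 845_000_000,
    }

    total = sum(counts.values())
    print(f"{total:,} items")  # 6,010,199,774 items
    ```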

    1. Re:They said 6 billion items, not webpages. by Jugalator · · Score: 2, Interesting

      Yes, and while the press release says they doubled their image search index size, I'm more interested in how much their regular web index increased in size? I have a vague feeling it was around 4 billion before too? :-/

      --
      Beware: In C++, your friends can see your privates!
    2. Re:They said 6 billion items, not webpages. by kevin_ka · · Score: 1

      As long as they're replacing useless indexes (like "your_search.xxxx.com") with decent ones I don't mind if they don't grow very fast. 4*10^9 web index should be enough for anyone :-)

    3. Re:They said 6 billion items, not webpages. by Anonymous Coward · · Score: 1, Insightful

      Why was this informative?

      The summary says 6 billion items, not webpages... and the linked-to article explicitly breaks down the 6 billion items into those same stats.

      If only people would read the actual article.

    4. Re:They said 6 billion items, not webpages. by Anonymous Coward · · Score: 0

      I know there are some useful usenet groups, but really, usenet is not my first stop when I'm looking for information.

      You can scrap images, usenet etc, you can't scrap web pages. Web pages is what it is all about. And even there, the more crap you index, the harder it will be to keep the crap out of the search results.

    5. Re:They said 6 billion items, not webpages. by Anonymous Coward · · Score: 0

      It would be easier to just read the article (press release). You only have to read the third sentence. It's not like you discovered some great secret.

    6. Re:They said 6 billion items, not webpages. by Anonymous Coward · · Score: 0

      Moin,

      Google: "Searching 4,285,199,774 web pages"

      Me:

      perl -Mbignum -le 'print 2 ** 32'
      4294967296

      Uhoh...

      Cheers,

      Tels

    7. Re:They said 6 billion items, not webpages. by zunger · · Score: 1

      Previous size was about 3.3b, I believe. The old image index was 435m; I don't remember what the previous groups index was. So it's about a 50% increase in web, and more than double the size in images.

      (And that means all the links in imagesearch are working again...)

    8. Re:They said 6 billion items, not webpages. by srvivn21 · · Score: 1

      Some time between June 30, 2003 and August 23, 2003 the number on the site changed from "3,083,324,652" to "4,285,199,774".

      Feel free to look for more changes at http://web.archive.org/web/*/www.google.com.

    9. Re:They said 6 billion items, not webpages. by Anonymous Coward · · Score: 0

      Article? what? Got a link?

  32. Sort out their indexing problems first by jolyonr · · Score: 5, Interesting

    I do hope they manage to sort out their recent indexing problems first. For many searches altavista is now showing far more relevant search results than google - since their attempted cull of 'spam' sites last December, which kind of backfired. They have improved things this year, but the quality of their search results is not as good as it was last year. Now, they need to figure out how to get rid of all the useless sites that are just shopping directories full of espotting URLs and similar and with no real content. Funnily enough, their anti-spamsite code seemed to actually promote these up the rankings on many search terms, while penalising many sites containing genuine content.

    Many people said that Google were using deliberate tactics to encourage small e-commerce websites to spend more on adwords, but I believe this wasn't deliberate - their index is so big that they simply can't tell what the results of their changes are going to do to the search orders for all the search options that people are going to use - and they simply didn't realise in advance the problems they were going to cause. And google have made efforts to minimise the damage since then, but they still need to do more.

    Jolyon

    --


    Please read my Canon EOS tech blog at http://www.everyothershot.com
    1. Re:Sort out their indexing problems first by bad-badtz-maru · · Score: 1

      They rolled out a new algorithm over the weekend that is supposed to be a vast improvement over the post-November result set. The November update wasn't to catch spam sites - it introduced stemming, some of the latent semantic indexing stuff, and also shifted weight from incoming links to on-page content.

  33. Makes a lot of sense to me.... by Anonymous Coward · · Score: 0

    Now convince the government to do it. I think this plan is good but it would have to get past the greed of corporations and politicians. But it does make sense - I say do it

  34. Since when did bigger == innovation? by Moderation+abuser · · Score: 5, Insightful

    It just means bigger. There may well be innovation in the technology which allows bigger, that might have been news for nerds, but bigger itself isn't innovative.

    --
    Government of the people, by corporate executives, for corporate profits.
    1. Re:Since when did bigger == innovation? by Chuck+Bucket · · Score: 1

      hmmm...I'm not going to touch that one, sounds too much like some spam related email I got the other day...

      CB

    2. Re:Since when did bigger == innovation? by HarveyBirdman · · Score: 0, Funny
      There's probably a really good pr0n comment to put here.

      I wish I knew what it was.

      --
      --- Ban humanity.
  35. What about page ranking? by Anonymous Coward · · Score: 1

    Hope they fix the "search engine spamming", there's no way to find anything in Google anymore.

  36. Thanks by KillerHamster · · Score: 5, Funny

    so much for the link to Google, I never would have found it otherwise.

    1. Re:Thanks by ixplodestuff8 · · Score: 1

      Yeah I kept going to googles.com and finding some weird kids site. Slashdot saves the day again!

    2. Re:Thanks by De+Lemming · · Score: 1

      Hey, I need my favourite search engine - I don't want them to be slashdotted!

    3. Re:Thanks by MrAngryForNoReason · · Score: 1

      You could have just searched for it .... oh...right.

  37. Run out of indexing space? by rqqrtnb · · Score: 5, Interesting

    I heard that Google is using 4-byte ints for DOCids and they have been running out of indexing space since they are pretty close to 2^32 pages already. Is that true?

    1. Re:Run out of indexing space? by kindofblue · · Score: 3, Interesting
      Not likely. I would imagine that each item has a unique id, not just each web page, since there needs to be some way to identify what the target of a link is. Just because a link ends in pdf, or jpg, or gif, does not mean that it is of that type. The crawlers undoubtedly record the content-type of fetched resources.

      So I would guess that they already use more than 32 bits per item with everything in a single item ID space, or they use 32 bits plus some code indicating the ID space, or perhaps a variable-length code depending on the item type, e.g. like UTF-8. In any case, they should have exceeded 32-bits long ago.

    2. Re:Run out of indexing space? by dtfinch · · Score: 2, Informative

      Since they said they have 4.28 billion searchable pages in the index, and 32 bit integers have a range of about 4.29 billion possible values, I'd say they're pretty close to having to make another upgrade, unless they decide there will never be more than 4.29 billion pages online that searchers would be interested in.
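
      The arithmetic above can be checked in a few lines. This is only a sketch: it assumes IDs are unsigned 32-bit integers (the thread's premise, not a confirmed fact about Google) and uses the rounded 4.28 billion figure from the press release.

```python
# Headroom left in an unsigned 32-bit ID space.
# Assumption: one ID per web page, IDs are unsigned 32-bit integers.
ID_BITS = 32
id_space = 2 ** ID_BITS            # number of distinct 32-bit values
indexed_pages = 4_280_000_000      # "4.28 billion web pages" (rounded)

remaining = id_space - indexed_pages
used_fraction = indexed_pages / id_space

print(f"ID space:  {id_space:,}")
print(f"Remaining: {remaining:,}")
print(f"Used:      {used_fraction:.2%}")
```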

    3. Re:Run out of indexing space? by That's+Unpossible! · · Score: 1

      In any case, they should have exceeded 32-bits long ago.

      I don't really understand what you are trying to say, but if they are grappling with a 32-bit integer problem on their webpage indexing, I don't see what it has to do with their separate index of images and usenet posts. I imagine they DO use more than one database table?!

      --
      Ironically, the word ironically is often used incorrectly.
    4. Re:Run out of indexing space? by kindofblue · · Score: 2, Informative
      Hypothetically, if web pages were identified with an 8-bit code of 0x01 along with a 32-bit identifier, then one could just assign another code to signify web pages. e.g. codes 0x00-0x7f could be web page codes, 0x80 for PDFs, 0x81 for GIFs, etc. Each code would be combined with a 32-bit int identifier that is unique relative to that code, giving a 40-bit identifier space.

      As for the space required, they must have gone beyond 32-bits for on-disk identifiers. URLs and cached pages easily take a lot more space than a 5-byte to 8-byte (64-bits) identifier, so they've definitely got the storage. For archival purposes, 64-bits is ample space and small.

      But a good reason to keep identifier sizes small is so that they don't take up much RAM space. That's why variable sized IDs would be useful. They are a simple fast form of compression. UTF-8 is a variable sized encoding that uses 8 bits to encode the vast majority of characters used in English (ASCII) and uses between 2 bytes and 4 bytes for other less common character codes (symbols and other language characters). This is done by using the leading bits of the first byte to indicate how many bytes that variable-sized character occupies. (I don't remember the details, however.) The effect is that on average for English, most strings would consume slightly more than 8 bits per character.

      The same principle would work for any variable sized identifier, e.g. useful for DOC Ids or word/term ids. The most common web pages (yahoo, hotmail, msn, nytimes, etc) would have very high page rank and could be given small ids, e.g. 16-bits (2-bit code, 14-bit id). Same thing for frequent words, "whether", "while", "with", "over", or closed-class words. Compress them to small ids.

      Anyway the point is that you could have an effective id space of much greater than 32 bits and yet use much less than 32-bits per identifier on average. Every search engine must have dispensed with the 32-bit barrier by their beta phase, unless they're run by idiots. Maybe that's Microsoft's problem.
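
      The "2-bit code, 14-bit id" idea above can be sketched as follows. This is an illustration only, not any search engine's real format: the size table (2, 4, 6, or 8 bytes) and the function names `encode_id` / `decode_id` are assumptions made up for the example.

```python
from typing import Tuple

# Hypothetical scheme: the top 2 bits of the first byte select the total
# size of the identifier; the remaining bits hold the id itself.
SIZES = (2, 4, 6, 8)  # total bytes for length codes 0, 1, 2, 3


def encode_id(n: int) -> bytes:
    """Encode n using the smallest size whose payload can hold it."""
    if n < 0:
        raise ValueError("id must be non-negative")
    for code, size in enumerate(SIZES):
        payload_bits = size * 8 - 2
        if n < (1 << payload_bits):
            value = (code << payload_bits) | n
            return value.to_bytes(size, "big")
    raise ValueError("id does not fit in 62 bits")


def decode_id(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one id starting at pos; return (id, position after it)."""
    code = buf[pos] >> 6                 # top 2 bits of the first byte
    size = SIZES[code]
    payload_bits = size * 8 - 2
    value = int.from_bytes(buf[pos:pos + size], "big")
    return value & ((1 << payload_bits) - 1), pos + size
```

      With this layout, ids below 2^14 cost 2 bytes each, while the overall id space is 62 bits, so frequently used ids stay small without capping the total.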

    5. Re:Run out of indexing space? by Anonymous Coward · · Score: 0

      Actually google only checks extensions.

    6. Re:Run out of indexing space? by Alomex · · Score: 1

      I doubt it. Search engine land was taken over by the 64 bit federation in 1996 or so...

      Seriously, I bet they delta-encode most of their page numbers, which means that for most space sensitive usage both 4-byte and 8-byte result in the same amount of space usage...

      Then again, barring a definitive statement from a googler it could be either way.
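
      A minimal sketch of the delta-encoding idea above, assuming sorted document ids stored as gaps in a 7-bits-per-byte variable-length integer format. Whether Google does anything like this is a guess, as the comment says; the function names are made up for illustration.

```python
def varint(n: int) -> bytes:
    """Encode a non-negative int, 7 bits per byte, low bits first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)   # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_postings(doc_ids) -> bytes:
    """Store sorted ids as the gaps between consecutive ids."""
    out = bytearray()
    prev = 0
    for doc_id in sorted(doc_ids):
        out += varint(doc_id - prev)
        prev = doc_id
    return bytes(out)


def decode_postings(data: bytes) -> list:
    """Rebuild the sorted id list from the gap stream."""
    ids, current, shift, prev = [], 0, 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            prev += current
            ids.append(prev)
            current, shift = 0, 0
    return ids
```

      Because only the gaps are stored, ids above 2^32 cost no more than small ones once the first entry is written, which is why the nominal id width matters little for space-sensitive structures.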

  38. pre slashdotted copy by ThomK · · Score: 0, Redundant

    February 17, 2004 08:02 AM US Eastern Timezone

    Google Achieves Search Milestone with Immediate Access to More Than 6 Billion Items

    MOUNTAIN VIEW, Calif.--(BUSINESS WIRE)--Feb. 17, 2004--
    Google Connects Searchers to World's Most Comprehensive Index; Increases Web Page and Image Collections

    Google Inc. today announced it expanded the breadth of its web index to more than 6 billion items. This innovation represents a milestone for Internet users, enabling quick and easy access to the world's largest collection of online information.

    "People worldwide can find more information with Google than with any other search engine," said Larry Page, Google co-founder and president of Products.

    Google's collection of 6 billion items comprises 4.28 billion web pages, 880 million images, 845 million Usenet messages, and a growing collection of book-related information pages. Web surfers worldwide can now search across Google's collection of items using the following services:

    -- Google Web Search: The company's flagship search service now offers 4.28 billion web pages. Google's powerful and scalable technology searches this information and delivers a list of relevant results in an instant. Google Web Search also enables users to search for numerous non-HTML files, including PDF, Microsoft Office, and Corel documents.
    -- Google Image Search: Comprising more than 880 million images, Google Image Search enables users to find electronic images relevant to a wide variety of topics. Advanced features include search by image size, format (JPEG and/or GIF), coloration, and the ability to restrict searches to specific sites or domains.
    -- Google Groups: This 20-year archive of Usenet conversations is the largest of its kind and serves as a powerful reference tool, while offering insight into the history and culture of the Internet. Google Groups offers more than 845 million postings in more than 35,000 topical categories.
    -- Google Print: A test service that enables Google users to immediately access a range of book related information, such as first chapters, reviews, and bibliographic information. These pages also offer users links to directly purchase titles.

    "Google Image Search has been significantly updated," said Sergey Brin, Google co-founder and president of Technology. "We've doubled the index to more than 880 million images, enhanced search quality, and improved the user interface."

    Today's news follows the announcement last week that Google received eight awards in the 4th Annual Search Engine Watch Awards, which recognize outstanding achievements in web searching. Google was recognized as the "Outstanding Search Service," for outstanding performance in helping internet users locate information from across the Web. Google has received this distinction every year since the awards were initiated in 2000. Google AdWords was also given top honors for value, targeting, tools and overall advertiser satisfaction.

    About Google Inc.

    Google's innovative search technologies connect millions of people around the world with information every day. Founded in 1998 by Stanford Ph.D. students Larry Page and Sergey Brin, Google today is a top web property in all major global markets. Google's targeted advertising program, which is the largest and fastest growing in the industry, provides businesses of all sizes with measurable results, while enhancing the overall web experience for users. Google is headquartered in Silicon Valley with offices throughout North America, Europe, and Asia. For more information, visit www.google.com.

    Google is a trademark of Google Inc. All other company and product names may be trademarks of the respective companies with which they are associated.

    --

    TK

    1. Re:pre slashdotted copy by goodbye_kitty · · Score: 1

      Are you seriously suggesting that google is likely to be slashdotted? =P

  39. get it all! by PhuckH34D · · Score: 0

    I wonder how long it would take to do a "wget" with their database as input :)

    --
    You're old school? I beta tested the motherf***ing abacus!
  40. Good for Google...but: by master_p · · Score: 4, Interesting

    I am still waiting for a search engine that does topic matching instead of text matching. In other words, I would like the search engine to return a list of urls with related topics instead of related text. As it is right now, all search engines, including Google, return pages that contain text equal or related to the input but they might be 98% unrelated. I still can't consider the Internet as a library of knowledge due to this fact.

    For example, if one searches for "TCP/IP tutorials", it would return many unrelated links like posts in newsgroups, college lectures, etc.

    1. Re:Good for Google...but: by BenjyD · · Score: 2, Informative

      That's what directories like dmoz.org do. IIRC, google does use directory information, but it is far too hard a problem to automate topic finding without a lot of human editors.
      I saw some research recently at a conference that used complex vocabulary matching algorithms to automatically extract topics and organise large numbers of documents into topic hierarchies and present summary reports, but I think that might be a bit too processor intensive and cutting edge, even for google.

    2. Re:Good for Google...but: by bad-badtz-maru · · Score: 1

      Google is working on this with the Latent Semantic Indexing technology they purchased late last year.

    3. Re:Good for Google...but: by Anonymous Coward · · Score: 0

      True, but I'd guess at least a few of those posts and lectures would reference something more like what you were looking for. Sure, it's one step removed from Google popping up exactly what you were looking for in the first few results, but sometimes you've got to expect a little work on your part.

    4. Re:Good for Google...but: by zarr · · Score: 1

      I fully agree that "topic search" or "concept search" is the way to go for search engines, and believe me, a lot of people are working on this.

      Your example isn't very good though. I can't remember ever having searched for "[some-technical-term] tutorial" and not gotten just what I was looking for. "[some-technical-term] reference" is also a sure bet.

    5. Re:Good for Google...but: by madmaxx · · Score: 1

      Google almost does topic-searches, try: ~"TCP/IP tutorials" Which attempts to do a synonym-search, which is darned close to a topic search (in the way that I use it anyway).

      --
      mx
    6. Re:Good for Google...but: by afeeney · · Score: 1
      I thought that Mooter gets pretty close--it clusters results around specific areas that it finds in the surrounding text. Still very much in development but, like Vivisimo, a good next step.

      Mooter

    7. Re:Good for Google...but: by evilviper · · Score: 1
      I am still waiting for a search engine that does topic matching instead of text matching.


      So you weren't reading /. when the story about http://vivisimo.com/ was front-page?

      Search for TCP/IP, and one of the categories is "IP Tutorial", another is "Training", and yet another is "Introduction to TCP".

      I bookmarked the site, and find it a good idea, but I haven't checked it out yet, since Google hasn't failed to find what I have recently been looking for, yet.
      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
  41. Re:waste of time and energy by happystink · · Score: 1

    Also you don't get slashdotted by just having a (lame) link in the discussion, especially if it's modded to -1 as this will be, but even if it's at 2. You only get the mad hits from front page links, there isn't a magical thing where any link on any page containing slashdot in its URL gets you 10,000 hits.

    --

    sig:
    See the "..for smart people" banners Wired runs here? Look elsewhere guys.

  42. Just in Case by Anonymous Coward · · Score: 1, Funny
  43. Good for them by Anonymous Coward · · Score: 0

    Too bad they're about to lose their shirt to SCO in an end user lawsuit.

  44. another press release reported as news by Anonymous Coward · · Score: 0
    This innovation represents a milestone for Internet users, enabling quick and easy access to the world's largest collection of online information.


    This press release represents a milestone for slashdot users, enabling quick and easy access to marketing drivel reported as news.


    Honestly guys, this isn't that hard. Could you at least try?

  45. Google Print by blorg · · Score: 5, Informative
    "Google's collection of 6 billion items comprises 4.28 billion web pages, 880 million images, 845 million Usenet messages, and a growing collection of book-related information pages."

    I was interested that they mentioned Google Print, which is Google's answer to Amazon's Search Inside feature, but hasn't got much press, and is pretty well hidden in Google itself.

    You can check it out by limiting results to site print.google.com, e.g. searchterm site:print.google.com. (Not quite at Amazon-type numbers yet.)

  46. How much of it is crap? by coolerthanmilk · · Score: 1, Funny

    from the how-much-of-it-is-crap dept.

    Oh, come on now, this is Google we're talking about. Just look it up yourself. Here it is in one click without having to expend all that effort to type all four letters yourself. (Warning: The answer is not pretty)

    Crap on Google
    1. Re:How much of it is crap? by dan+dan+the+dna+man · · Score: 0

      Worryingly I had followed one link in the top 10 already :/

      --
      I don't read your sig, why do you read mine?
  47. Caveat Emptor by erick99 · · Score: 5, Insightful
    Google is my favorite search engine. That said, I hope that most folks understand that just because they "google" something does not make that something a fact. Also, the first few pages of any search can be the result of manipulation to get in the top 10, 20 or 100. It is really, really important to consider the source when doing any kind of research on the 'net. I am homeschooling my 13 year old and having a hell of a time getting these lessons across to him. He can research almost anything in a fraction of a second, but it takes a bit longer to separate the wheat from the chaff.

    Happy Trails!

    Erick

    --
    http://www.busyweather.com/
    1. Re:Caveat Emptor by yarbo · · Score: 1

      Have you checked out the Wikipedia? When I'm looking for something serious and educational, I look there first.

  48. How much space do they use for caching? by The+One+KEA · · Score: 4, Interesting

    With 6 billion pages indexed and cached, and maybe an average of 50K per page (which is probably pretty conservative - it's probably twice that in some cases), that's nearly 30TB, IICIC!!!

    The hard disk and RAID folks must LOVE Google....

    --
    SCREW THE ADS! http://adblock.mozdev.org/ Proud user of teh Fox of Fire - Registered Linux User #289618
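
    A quick check of the back-of-envelope estimate above, as a sketch: it assumes decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes) and takes the 6 billion items and 50 KB average from the comment at face value. The function name is made up for illustration. Under those assumptions the product lands in the hundreds of terabytes, roughly ten times the figure quoted.

```python
# Cache size estimate: items * average size, in decimal terabytes.
BYTES_PER_KB = 1_000
BYTES_PER_TB = 10 ** 12


def cache_size_tb(items: int, avg_kb: float) -> float:
    """Total storage in TB for `items` pages averaging `avg_kb` kilobytes."""
    return items * avg_kb * BYTES_PER_KB / BYTES_PER_TB


estimate_tb = cache_size_tb(6_000_000_000, 50)
print(f"{estimate_tb:.0f} TB")
```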
    1. Re:How much space do they use for caching? by stratjakt · · Score: 2, Interesting

      They don't cache images and shockwave/java bloat though, just text. I'd say most pages are well under 10k. But who cares.

      --
      I don't need no instructions to know how to rock!!!!
    2. Re:How much space do they use for caching? by ediron2 · · Score: 4, Interesting
      With 6 billion pages indexed and cached, and maybe an average of 50K per page (which is probably pretty conservative - it's probably twice that in some cases), that's nearly 30TB, IICIC!!! The hard disk and RAID folks must LOVE Google....
      30tb... at a buck a gig, those $30,000 sure do look appetizing to all the hard drive and raid makers.

      Not!

      Hell, even doing 2x or 3x this amount for server-class drives still leaves us talking lame amounts. Just one Hitachi/Sun 9980 Fiber Channel drive costs several times more than this.

      Seriously, everything I've heard indicates that google's methods hinge on a lot of white boxes, each one covering a subset of the google data. Put another way, drivespace per server isn't the limiting factor. A distributed system with several hundred white box servers can't HELP but have tens of terabytes of storage, given drive capacities of tens and hundreds of gigs each.

      A client just bought a Hitachi 9980. As sweet as the Hitachi arrays are, I thought it was the most horrendous waste of cash I'd ever seen, considering this client's more modest needs. THOSE are the customers that raid/drive makers love... all it takes is one IT guy with hardware lust who has the trust of a Fortune-500 firm.

    3. Re:How much space do they use for caching? by dildatron · · Score: 2, Informative

      I'm a storage engineer, and, to the enterprise, 30TB is peanuts. On a busy day, I have provisioned 30TB in one day to various computers. A typical high-end array (an EMC/Hitachi/HP/etc) usually tops out at around 150TB, but you can have a bunch of them on the same storage area network.

      The trick is how to back it all up in shortening backup windows. Things like truecopy work, but take twice the disk space.

      --


      If you had nuts on your chin, would they be chin nuts?
    4. Re:How much space do they use for caching? by Anonymous Coward · · Score: 0

      Yeah, I was surprised it's ONLY 30TB. Is that possible? If that's the case, then it's no big deal. One could serve this much data using less than 64 Intel-based servers... It's mostly read-only so it can be fast.

      Before I thought it must have been more like 300TB...

      Backup: snapshot disks that do diff snapshot are quite useful, you need the delta only (say u've got 30TB and every day only 2TB changes, so you get a 3TB snapshot space and ur done, no?)

    5. Re:How much space do they use for caching? by RedWizzard · · Score: 2, Interesting
      30tb... at a buck a gig, those $30,000 sure do look appetizing to all the hard drive and raid makers.
      I've heard that it's all kept in RAM. 30TB of RAM is going to cost a lot more than $30,000. If it is also on disk would they use cheap IDE disks or a server class solution?
  49. Thank You MS! by pararox · · Score: 1

    Though I'm likely to get hammered down around here for such a sentiment, I really think this is a result of MS declaring their intentions of ruling the websearching space.

    Without the fear of competition, it's very likely that Google would stagnate - thanks Microsoft ;)

  50. SPEED is the answer by codeshack · · Score: 3, Insightful

    Google's value seems to be in cutting out the crap in its bandwidth... look at their page loads (2.6k plus 8.4k for the image) versus Yahoo! (30k plus images, plus ads). And the less said about AV or Lycos in that regard, the better. Not to mention that Yahoo has basically just co-opted Google, but with more fat around the edges.

  51. Whee, it's a press release by Omnifarious · · Score: 2, Insightful

    A press release complete with corporate speak!

    "This innovation represents a milestone for Internet users, enabling quick and easy access to the world's largest collection of online information.".

    This is just google doing what they are already well known for doing best. There's nothing new or 'innovative' here. While it's a fine accomplishment, and I'm pleased google has indexed that much stuff, it's hardly innovative for them.

    1. Re:Whee, it's a press release by Threni · · Score: 1

      I just want to know what the world's second largest collection of online information is called.

      I notice, however, that there's a shocking lack of pornography on Google's image search.

  52. Is /. pro Google? by dark-br · · Score: 5, Informative

    "Google currently does not allow outsiders to gain access to raw data because of privacy concerns. Searches are logged by time of day, originating I.P. address (information that can be used to link searches to a specific computer), and the sites on which the user clicked. People tell things to search engines that they would never talk about publicly -- Viagra, pregnancy scares, fraud, face lifts. What is interesting in the aggregate can seem an invasion of privacy if narrowed to an individual."


    That's a quote from the NYtimes (free req. yada yada) also posted as is here

    If any other site were to track the stuff Google does, /. would be up in arms protesting!

    Please note, this isn't a troll, and I'm not wearing a tin-foil hat (maybe I should?). Imagine the following scenario: a bomb goes off in the US. By tracing searches for "anarchist cookbook" to zipcodes within the area of the bomb blast, the FBI could have access to information that makes TIA look like a better alternative.

    Maybe this isn't such a good feature after all...

    1. Re:Is /. pro Google? by selderrr · · Score: 2, Interesting

      It all depends on how often they rotate their logs and how long they store their backups. I honestly don't believe they can keep logs longer than a few weeks. Any longer and they'd need a 2nd server farm to store the archive. And no terrorists would go from a google query to a bomb in a few weeks. So I guess you're quite toptinfoiled indeed.

    2. Re:Is /. pro Google? by Anonymous Coward · · Score: 0

      And now they can combine what they have logged while you are searching with what you give them signing up with orkut.

    3. Re:Is /. pro Google? by Anonymous Coward · · Score: 0

      How do you reckon that? I think they can keep them forever - for example, 200 million searches a day * 1KB of log per search... 190GB/day? That's what - about $200 of IDE storage a day? Peanuts.
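
      The estimate above works out as follows, as a rough sketch. It assumes the comment's own inputs (200 million searches a day, 1 KB of log per search) and binary units (1 KB = 1,024 bytes, 1 GB = 2^30 bytes); neither figure is a confirmed Google number.

```python
# Daily log volume: searches per day * bytes logged per search.
searches_per_day = 200_000_000
bytes_per_search = 1024            # "1KB of log per search"

bytes_per_day = searches_per_day * bytes_per_search
gb_per_day = bytes_per_day / 2 ** 30
tb_per_year = bytes_per_day * 365 / 2 ** 40

print(f"{gb_per_day:.1f} GB/day, about {tb_per_year:.1f} TB/year")
```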

    4. Re:Is /. pro Google? by hirebrand · · Score: 1
      Imagine the following scenario: a bomb goes off in the US. By tracing searches for "anarchist cookbook" to zipcodes within the area of the bomb blast, the FBI could have access to information that makes TIA look like a better alternative. Maybe this isn't such a good feature after all...
      So you are against the FBI finding terrorists?
    5. Re:Is /. pro Google? by alexo · · Score: 1

      >> Imagine the following scenario: a bomb goes off in the US. By tracing
      >> searches for "anarchist cookbook" to zipcodes within the area of the bomb
      >> blast, the FBI could have access to information that makes TIA look like a
      >> better alternative. Maybe this isn't such a good feature after all...
      >
      > So you are against the FBI finding terrorists?


      Although it smells like a troll, I'll take that question at face value.

      Let's imagine that you live within walking distance of the bomb blast...

      Let's imagine that you saw a reference to "anarchist cookbook" on /. and decided to google it out of curiosity...

      Let's imagine you bought some fertilizer for your flower patch (or veggie garden) two weeks earlier and used google to find a cheap outlet...

      Let's imagine that your neighbour is an immigrant from a Muslim country...

      Let's imagine that the FBI gets hold of these facts and decides that you are a likely suspect...

    6. Re:Is /. pro Google? by Eivind · · Score: 1
      Why? There's no way the raw logs will fill more than say 100GB a day or so. That amount of data is peanuts for a company the size of Google, especially since searching is the core of their business.

      Even if you stored it on triple-redundant raid-arrays, and kept it online forever, that is, no migration to tape-libraries or similar, it'd still only be a cost of around $500/day.

      The thing is, even people who are quite into technology are unable to wrap their heads around how mind-bogglingly cheap storage has become.

  53. Google's strategy becoming clear by polymorpheus · · Score: 2, Insightful

    We've got over 6 billion entries, but let's return garbage for most queries, making sure the good stuff is in the "sponsored links" or sidebars. At least it's a good business model.

  54. but... by Savatte · · Score: 5, Funny

    have they beaten Ron Jeremy?

    1. Re:but... by Anonymous Coward · · Score: 1, Interesting

      I'd rather they start beating the spammers...

      Random advertising sites are working 24/7 to flood Google with crap :(

      Some sites are even using sneaky things to display special pages that only those with the Google spider's user-agent will ever see...

      Dastardly--I only caught them due to a broken PHP script seen in the cache...

    2. Re:but... by first.last · · Score: 0

      I really don't want to know if they've beaten off Ron.

      --
      Wishing I was a millionaire since 1969.
    3. Re:but... by PornMaster · · Score: 1

      Not if you count individual sperm served. WHO estimates are of 20 million sperm per ejaculation, and as of 25 years ago, estimates were five times that. So I'd say that if the number served is sperm (as would be burgers), then Ron Jeremy's served far more... as have I.

    4. Re:but... by Anonymous Coward · · Score: 0

      Serving your hand or a kleenex doesn't really count as serving though...

  55. information by Anonymous Coward · · Score: 0

    You may be willing to pay for it, but that doesn't mean that they can provide it. I mean, google is already losing money as that phenomenon dilutes both the value of their service (junk hits) and their advertising model (ads on the side not as appealing as ads mixed into the results).

    Contrary to your (apparent) belief, google doesn't mix those advertisers in; they mix themselves in by exploiting the google search heuristic.

  56. Here's hoping by Destoo · · Score: 0, Redundant

    So it's a conspiracy to eliminate the french.

    that should have been a "c - cedille" and not a regular c on the second term.
    (which would either translate to "dumb soup" or "suspicion", depending if the right character is used)

    --
    Nouvelles de jeux et technologies en français. TC
  57. Google pulled us out of "The Dark Ages" by leoaugust · · Score: 4, Interesting

    There is an interesting article in Wash Post Search For Tomorrow on Google, and possible AI in search.

    Some excerpts:

    We stumbled around in libraries. We lifted from the World Book Encyclopedia. We paged through the nearly microscopic listings in the heavy green volumes of the Readers' Guide to Periodical Literature. We latched onto hearsay and rumor and the thinly sourced mutterings of people alleged to be experts. We guessed. We conjectured. And then we gave up, consigning ourselves to ignorance.

    Only now in the bright light of the Google Era do we see how dim and gloomy was our pregooglian world. In the distant future, historians will have a common term for the period prior to the appearance of Google: the Dark Ages.

    There have been many fine Internet search engines over the years -- Yahoo!, AltaVista, Lycos, Infoseek, Ask Jeeves and so on -- but Google is the first to become a utility, a basic piece of societal infrastructure like the power grid, sewer lines and the Internet itself.


    --
    To see a world in a grain of sand, and then to step back and see the beach where the sand lies ...
  58. However.... by Anonymous Coward · · Score: 0

    It was reported that 1.4 billion of these were dud links redirecting to ebay.

  59. The NY Times is partly wrong by Anonymous Coward · · Score: 1, Insightful

    They generally do not track where people click. There are exceptions (ads and in the occasional quality control), but most of the time, your links are direct to the page. They can't track that.

    Second, the other information is the same information most websites collect in their logs.

    1. Re:The NY Times is partly wrong by Madmanz123 · · Score: 1

      And blocking cookies from google will also help a bit (though I'm not sure how much).

  60. It's worth mentioning... by dark-br · · Score: 4, Informative
    that not everything about Google is so visible.

    One should have a look at Google-Watch (tinfoil? maybe...) but they have some good points:

    According to DEA, Google is breaking the law

    Google Evil cookie

    We got your number!

    And so on...

    Not to troll but rather a thought. Mod as you wish.

    1. Re:It's worth mentioning... by Comsn · · Score: 2, Informative
      One should also have a look at Google-Watch-Watch

      which states

      Meet Daniel Brandt. He is a self-proclaimed public interest activist and the owner of Google-Watch.org. Mr. Brandt founded Google-Watch.org after his own site, Namebase.org, did not get a good Google PageRank.
  61. And five billion of those pages are... by orangesquid · · Score: 1

    Your search for: "slashdot effect research whitepapers" returned approximately 40,000 results.

    1. Hot wild cum girls ... erotic slashdot whitepapers cum girls hot lesbian potato bookcase effect dildo research
    Cached - Similar Pages
    http://www.zebra-hot-dog-fetish-commander.net/ ...

    --
    --TheOrangeSquid Is it any wonder things seem so awry? We swim in a sea of confusion and don't have to think to survive
  62. oh, come on by ajagci · · Score: 2, Insightful

    This really isn't a big deal and it happens all the time when building large systems. I don't know how their system works specifically, but you just change the transient in-memory representations to 64bit by recompiling, and for the on-disk stuff you create a new format using 64bits but still recognize the old format. That way, you have to convert nothing and you will be migrating to 64bit representations as needed. I'm sure Google has managed to deal with much more complex engineering problems than that.
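
    The migration approach described above might look roughly like this. It is a sketch under stated assumptions: the one-byte version header, the little-endian layout, and the names `write_ids` / `read_ids` are invented for the example and say nothing about Google's real formats.

```python
import struct

# Hypothetical on-disk layout: one version byte, then fixed-width ids.
# Version 1 is the old 32-bit format, version 2 the new 64-bit one.
FORMATS = {
    1: struct.Struct("<I"),   # unsigned 32-bit, little-endian
    2: struct.Struct("<Q"),   # unsigned 64-bit, little-endian
}


def write_ids(ids, version: int = 2) -> bytes:
    """Serialize ids in the given format version (new files use version 2)."""
    fmt = FORMATS[version]
    return bytes([version]) + b"".join(fmt.pack(i) for i in ids)


def read_ids(blob: bytes) -> list:
    """Read either format; callers always get plain ints back."""
    fmt = FORMATS[blob[0]]
    body = blob[1:]
    return [fmt.unpack_from(body, off)[0]
            for off in range(0, len(body), fmt.size)]
```

    Old files keep working unchanged because the reader dispatches on the version byte, so nothing has to be converted up front; data moves to the wider format only when it is rewritten.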

  63. I doubt it by Aqua+OS+X · · Score: 1, Informative

    I doubt it. Google may have more things indexed, but its web search still sucks when compared to Teoma's and its image search still sucks when compared to AllTheWeb's.

    Google is most non triumphant.

    --
    "Things are more moderner than before- bigger, and yet smaller- it's computers-- San Dimas High School football RULES!"
  64. Re:My pages are not indicized :( by ixplodestuff8 · · Score: 0, Redundant

    I suggest linking to it from a site visited by google a lot, you know one like slashdot.

    Step 1: complain about not being visited by google
    Step 2: post a link on a site that gets almost every page spidered by google
    Step 3: ???
    Step 4: Profit!

  65. big but far from complete. by selderrr · · Score: 4, Informative

    I wrote a project for our univ and submitted the url to google about 3 months ago. It still doesn't show up.

    1. Re:big but far from complete. by K-Man · · Score: 1

      Unfortunately, those submit-a-site programs are routinely ignored by the search engines who claim to be soliciting urls. I worked at one portal-wannabe in the 90's, and one of my duties was to evaluate auto-classification tools for the submit-a-site program. We went through several rounds of meetings, bids, etc., and then after we had finally selected a tool, somebody higher up pulled the plug on the program. There was simply no profit in it, as it was mainly a free submit-a-spam pipeline. Not long after, the idea of paying for inclusion was born, and all the backlogged submissions were dumped.

      My guess is your url is sitting in a log file somewhere, several levels removed from ever being touched again.

      --
      ---- "If we have to go on with these damned quantum jumps, then I'm sorry that I ever got involved" - Erwin Schrodinger
  66. Moving Goalposts by Dorf+on+Perl · · Score: 2, Funny

    I had only read through 1,673,233,497 items by last Friday, and now this. I'll never catch up now! Thanks for your "service," Google.

  67. Thanks alot editors! by Innova · · Score: 2, Funny

    How could you link directly to Google on the front page of slashdot? Couldn't you have used a mirror? Do you realize what will happen if Google gets slashdotted? The entire internet infrastructure will come to a screeching halt! You insensitive clods!

    /sarcasm

  68. It was mine by lordrich · · Score: 0, Funny

    And on the very same day my latest website gets into Google! Coincidence?

  69. Why not just... by Anonymous Coward · · Score: 0

    google for it!

    Google - indexing more web pages than any other type of document!

    1. Re:Why not just... by Anonymous Coward · · Score: 0

      http://66.102.11.104/search?q=cache:zhool8dxBV4J:www.google.com/+google&hl=en&ie=UTF-8

      Google is not affiliated with the authors of this page nor responsible for its content.

  70. Going Public & Pay Per Search by mslinux · · Score: 2, Redundant

    I've heard rumors (from very reliable sources) that Google will be going to a "Pay Per Search" business model when they go Public... anyone else heard this?

    1. Re:Going Public & Pay Per Search by /dev/trash · · Score: 2, Interesting

      Like, I search for, say, "perl code" and they instead present me with a page to log in with my credit card number?

      I highly doubt that. I'd no longer use Google, and I bet a lot of others wouldn't either. Free is pretty addictive, even if they do have a lot of stuff indexed.

  71. (C)2004 Google - Searching 4,285,199,774 web pages by Anonymous Coward · · Score: 0

    6 Billion - 4.2 Billion Web Pages = 1.8 Bill Other

    Other What???

    FTP Sites? Archie Servers ? Web Mail Servers?
    Gopher Servers??

    Blogs are web pages
    Pron is web pages..

    Just last night it was 3 Billion 300 million something...

    Sad that I actually pay attention to that...

  72. Size and Criteria are good, but... by mugnyte · · Score: 5, Insightful


    Too bad the article doesn't mention how google is trying to fight gaming the PageRank system or any of the other problems like commercials in the results. Still a great search tool though.

  73. How do we know for sure? by Eric_Cartman_South_P · · Score: 1
    How do we know they are for real, and are telling the truth?

    We don't.

  74. Google avoids that problem by using referrals... by thrill12 · · Score: 1

    It is referrals to a page that give a page credit, and describe the actual content a page is about. That is what google is trying to do at least.

    Is it not true that when you hear something from 3 different people, you believe it more than when you hear it from one person? It's this "hearsay" that makes google so powerful, yet susceptible to mischief like that of the many spam-sites that lay a network of referrals around themselves.

    A possible improvement could perhaps be the use of a system that proves the referring website is unique in nature, and is not copied all over the place (call it a "jury" :). But doing this would mean some central authority (a "judge") collecting some ID on websites or the like.
    Better than this would be a formula that achieves this goal without such authentication, so everyone can go about their ways just as usual, and no longer have to pay attention to these shouters in the crowd ("evidence").
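
    To make the "hearsay" idea concrete, here is a rough sketch of referral-based scoring, assuming a simplified PageRank-style iteration. The toy graph, the damping value of 0.85 and the name rank_pages are all made up for illustration; this is not Google's actual formula.

```python
def rank_pages(links, damping=0.85, iterations=50):
    """Simplified PageRank-style scoring (illustrative, not Google's formula).

    links maps each page to the list of pages it refers to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its credit evenly among the pages it refers to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its credit over everyone.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical graph: three independent pages vouch for "good", one for "lonely".
graph = {
    "a": ["good"],
    "b": ["good"],
    "c": ["good"],
    "d": ["lonely"],
    "good": [],
    "lonely": [],
}
scores = rank_pages(graph)
```

    In this toy graph "good" outscores "lonely" simply because more pages refer to it, which is also why a ring of spam sites referring to each other can inflate a score.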

    --
    Slashdot: stuff for news, nerds that matter, matter for news, stuff that nerd
  75. Image search: What's your experiences? by GQuon · · Score: 4, Interesting

    Both Google and Fast have image and picture search. They're all right. But I have had more luck with Lycos.

    What are your experiences?

    Of course, none of these services search in the image data itself. They search filenames, special features (like image size), and the content of the pages they are found in.
    What is the state of searching in images today? Facial recognition systems have existed for a while, but they are made for a specific purpose.

    How long before we can take a picture of that piece of your IKEA furniture and find the same model in pictures of celebrity houses, Babylon 5 sets and crime scenes? Or taking a picture of that familiar-looking person walking down the street, searching for her, and remembering that she was in that "reality" series two years ago.

    --
    Irene KHAAAAAAN!
  76. search indexing by stefanmi · · Score: 0, Informative

    Also one of the main problems Google is currently having with their search results is that too many blogs are ending up in the top results, often ranking higher than the primary site that contains the information that the blogs refer to (due to many blog users heavily cross-linking amongst themselves, which ups their rating). To combat this they've already discussed creating a separate category for blogs to help separate these. Good to see them taking a proactive stance -- get enough people using your service and you've suddenly got a category of blogs already identified and indexed. I'm giving them the benefit of the doubt as they've always been quite responsible with ads, and while it's a potential revenue stream I don't think they'll ever be as intrusive as other free sites/services.

  77. Mac users' image search by saddino · · Score: 4, Informative

    "Google Image Search has been significantly updated," said Sergey Brin, Google co-founder and president of Technology. "We've doubled the index to more than 880 million images, enhanced search quality, and improved the user interface."

    For Mac users, I recommend using Beholder to power your Google image search. Google's minimal UI changes notwithstanding.

    (Mod +1 Self-Promotive)

  78. Really? :) by SeanDuggan · · Score: 0, Funny
    Well, we're only talking about the important one. We'll get to yours later.

    And yes, I am an ugly American.

    --
    This sig has absolutely no significance and serves only to take up screen space and waste the time of the reader.
  79. I really don't agree with that article by PollGuy · · Score: 4, Insightful

    I read that article and really disagreed with the premise. Google is good for indexing what's available online, but only a tiny fraction of recorded human knowledge is available online. I work for a digital libraries project, and after visiting the Joint Conference on Digital Libraries, I can tell you that it's a librarian's wet dream to be in the kind of situation that the article describes: where all the information that we have to stumble around libraries and microfiches for is Googlable. But the full texts of almost no books are available. Who's going to scan in millions of volumes? Who's going to pay for that? And most importantly, how are the publishers going to allow it? US and world copyright laws are keeping almost all the content from being eligible for online publication, even if their profit windows are long closed.

    I encourage all of you who are in high school or have college papers to write to look beyond Google the next time you have to research something. You will find about fifty times as much information by looking in published volumes. Here's the technique I always use: visit a University library. Use the electronic card catalog to find a couple of titles that seem to match your topic. They will likely all have similar call numbers. Then, go browse the stacks around those call numbers. That will give you access to all the books available that are related to your topic, and on the next shelf over, are books that are tangentially related. Every time I do that, I find some fascinating angle on the subject matter I never even knew existed. The books you find will have references, and you can follow those to immense amounts of material more specifically related to the angle you've chosen. And none of it is on Google.

    If you have trouble, go ask one of the friendly research librarians. They do a lot more than go around and "shhh!" you.

    Google is a useful tool, but if you want real depth, from people who aren't tech savvy enough to put their full academic works online, the library is the only place to find it. Put in the time!

    1. Re:I really don't agree with that article by millette · · Score: 1

      what more can I say... mod parent up!

    2. Re:I really don't agree with that article by shadowbearer · · Score: 1


      I have something to add:

      Support your local library.

      Donate money, time, books. Whatever you can.

      Always make sure to remind your local city politicians and citizens of how valuable the library is. Libraries are rarely, if ever, funded enough.

      Frequent the book sales, buy whatever you can afford, and let the librarians know that you'll loan it out (assuming you have the space to keep it and the time to keep track of it) - you'll be amazed at the people you'll meet. I have collected old science books for years and recently got a request from a geology student doing a project on the history of geology (pre plate-tectonics). He was a farmer's son and I was immersed in free food for weeks :)

      I was (and am) a library geek 30+ years ago - long before I was a computer geek. Libraries are quite possibly the most valuable resource the human race possesses.

      SB

      --
      It's old. The more humans I meet, the more I like my cats. At least they are honest.
    3. Re:I really don't agree with that article by Ronny+Cook · · Score: 1
      Actually the degree of depth you'll get from a research library vs. from Google is highly dependent on the subject matter.

      Classic Computer Science. Literature. Criticism. Mathematics. Ninety percent of subject matter in fact - the research library is your friend.

      Up-to-the-minute technical data such as bugs databases or parts catalogues. Movie reviews. Pop culture. Basically anything where information will be outdated after one to two years - for such subject matter the Internet is probably your best reference.

      And then of course there's the sort of topics which your library won't carry because it's either censored or politically hazardous - porn foremost... I would say topics such as explosives except that a really *good* research library *will* cover such topics...

      ...Ronny

  80. But should be still be using Google? by mshiltonj · · Score: 1, Interesting

    Is Google becoming a task master for Big Brother?

    Et tu, Google?

  81. "miserable failure" top 5, update by Anonymous Coward · · Score: 1, Interesting

    Let's see if the 'new' index adds interesting stuff:

    1.- Michael Moore (still no. 1)
    2.- Dubya
    3.- Jimmy Carter
    4.- Sen. Hillary R. Clinton
    5.- Howard Dean

    PS: litigious bastards still clean, just pointing to litigiousbastards.com.

  82. META Tags by JSkills · · Score: 2, Insightful
    I thought this re-index would finally pick up our "description" meta tag and actually use it. Nope. Instead we still get the same concatenated list of links that are in our left nav bar as our description when people find us in google search results. They have a "description" listed, but it looks like something they made up themselves?

    Guess I better call the whaaaaambulance :-(

    BTW - can you believe that a large number of visitors we get come from people who do a search on "goofball.com". Wow.

  83. Betty's bunnies have fluffy fur today by GQuon · · Score: 0, Offtopic
    --
    Irene KHAAAAAAN!
    1. Re:Betty's bunnies have fluffy fur today by Anonymous Coward · · Score: 0

      THIS is as sick as people breeding hairless cats. Sometimes I wonder why we can screw with nature that much without some serious smiting going on :(

    2. Re:Betty's bunnies have fluffy fur today by johnlcallaway · · Score: 2, Funny

      Oh .. that link was just too funny.

      Some people just have way too much time on their hands. (This from a guy who spent 6 hours last night building a new computer from scratch [have to get those cables just right you know], and will probably spend another 6 tonight trying to get WinBloze to load.)

      --
      I rarely read replies, it's my opinion and if you thought about your opinion a little more, I'm OK with that.
    3. Re:Betty's bunnies have fluffy fur today by Anonymous Coward · · Score: 0


      and will probably spend another 6 tonight trying to get WinBloze to load.

      If you're loading it voluntarily then you should refer to it by its proper name. Otherwise you're just posturing.

    4. Re:Betty's bunnies have fluffy fur today by johnlcallaway · · Score: 1

      Normally, I would agree with your comment. I want to load W2K or XP so that I can use MS Money and a few other MS specific applications. No big deal, I can use either OS for whatever is needed.

      But Linux loaded up just fine the first time and I have spent over 8 hours trying to get either W2K or XP to load. W2K freezes so bad that even the caps lock key won't work, and XP keeps BSODing on me and telling me that I have bad memory, which is odd since Linux works just fine.

      Now, to be fair, my guess is that I have the wrong memory chips. Both Kingston and Crucial suggest other chips than what MWave shipped with the board. I think the CL value, which is 3, should be 2.5 and this is what is causing the problem.

      But I would love to know why Linux worked just fine yet XP and W2K have such problems. (I say worked because I took it off just in case it was part of the problem somehow, you know the old bit about removing anything but just what you need to solve a problem.)

      --
      I rarely read replies, it's my opinion and if you thought about your opinion a little more, I'm OK with that.
  84. Remember the start of the net? by MongooseCN · · Score: 0

    Although this w.

  85. Teoma by Anonymous Coward · · Score: 0

    Teoma is the search engine that does exactly what you are asking. It breaks the internet down into topics and subjects, and it only counts links within these subjects. For example sites about the apples farmers grow probably link to other sites of these same type of apples, whereas sites about Apple Computers probably link to each other more, and this is how Teoma can recognize each as a different subject because of its link farm. Teoma gives refinement suggestions to help you navigate through its subject related clusters, and lists pages on a subject with lots of relevant links, under its link collection.

  86. Mailing lists by ajs · · Score: 4, Interesting

    The thing that is starting to bother me is not the search-spam (easily removed over time with increasingly smart ranking), but the mailing lists. If 20 sites around the net archive the same mailing list, then I'll get the first 20 hits in most technical searches from the same list. Google really needs some way to identify duplicate archives (which is hard given that they're all formatted differently) and treat them as one "site".
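
    One common way to spot the same message under different formatting is to compare overlapping word sequences ("shingles") rather than raw bytes. A minimal sketch, assuming the archive's markup has already been stripped; the function names, the shingle size of 3 and the 0.8 threshold are illustrative guesses, not anything Google has described.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word sequences in text, ignoring case and layout."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of overlap divided by size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def same_message(text_a, text_b, threshold=0.8):
    """Treat two texts as duplicates when their shingle sets mostly overlap."""
    return jaccard(shingles(text_a), shingles(text_b)) >= threshold


# Hypothetical archive copies: same words, different wrapping and case.
copy_a = "Use logrotate with a weekly schedule\nand compress the old files."
copy_b = "  use logrotate with a weekly\n   schedule and compress the old files. "
other = "Try cron with a nightly job and delete the old files."
```

    Here same_message(copy_a, copy_b) is true even though the line breaks differ, while copy_a and other share only a single three-word run. Real archives also wrap each message in their own headers and navigation, which is the hard part this sketch skips.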

    1. Re:Mailing lists by Just+Some+Guy · · Score: 1

      No kidding. And out of those 20, 19 will be to Geocrawler in all its cruddy non-threaded, un-intuitive glory, meaning that I usually revert to searching groups.google.com for the same post so that I can read it in context.

      --
      Dewey, what part of this looks like authorities should be involved?
    2. Re:Mailing lists by pediddle · · Score: 1

      Google does already filter out "similar" pages -- if you ever get to the bottom of a short list of results, you see that link to include the duplicate pages. It's obviously not perfect though, but IMHO it's better that they're conservative about it to avoid filtering possibly significant information.

    3. Re:Mailing lists by CvD · · Score: 1

      Oh yeah, that totally pisses me off.. most of the mailing lists can be found back on usenet anyway... or many articles on them, anyway. I mean, if I search for a technical computing term, I don't want endless lists of crap, all with the same fucking message. I want a simple page where someone has documented how to do something. If I can't find that, next stop is google groups. If not there, then I might resort to sifting through mailing lists, which have a nasty interface and are a pain in the ass to browse. Anyone know why these are always at the top of the list?

    4. Re:Mailing lists by millette · · Score: 1

      One thing I appreciate is google now only shows a couple of hits for each subdomain by default. That helps keep the clutter away.

    5. Re:Mailing lists by ajs · · Score: 1

      Yeah, they just need to expand that to the special case of mailing lists where formatting can be wildly different, but content is the same.

  87. Google has a page about this... by SilentT · · Score: 2, Informative

    Go here for instructions on removal from their index.

  88. New layout (slightly OT) by smart.id · · Score: 0

    This is slightly offtopic, but is anyone seeing a slimmer Google? The blue tabs are missing on the front page, and the search result pages are slightly different. I only see it when I use Firefox, IE and Firebird still show the old layout. Anyone else seeing this?

    --
    blog & fiction: jd87
  89. Re:Remember the start of the net? (retry) by MongooseCN · · Score: 1

    For some reason moz wiped out the part of the message that was highlighted in my above post when submitting. Here's what I was saying:

    Although this was before my time, the university I went to (WPI) used to have a board on it with all the known urls at the time. Every few days someone would add another url to the board. Ah, the days when you really could print the whole Internet out.

  90. Number One by Michael.Forman · · Score: 2, Interesting


    The upgrade has been quite good to me! Before the upgrade a search for my name would rank my website many pages down and then only secondary links not the root site. Now I rank number one! It looks like all my slashdot posting has finally paid off.

    Ahh. The small victories of the computer geek.

    Michael.

    --
    Linux : Mac :: VW : Mercedes
  91. Re:I think it is important by kfg · · Score: 1

    In 1959, a year and a half after IBM announced their first fully transistorized computer, I was issued a federal ID number. I rather considered that the proof that I could be databased quite easily, for any purpose whatsoever.

    KFG

  92. Sturgeon's Law by sarastro_us · · Score: 1, Informative

    90% of everything is crap...

    1. Re:Sturgeon's Law by tehcyder · · Score: 1

      Except on the internet where the figure's 99.9%

      --
      To have a right to do a thing is not at all the same as to be right in doing it
  93. Google mostly ignores META tags by friedegg · · Score: 1

    Your description comes from the Google Directory, which comes from DMOZ.

    --
    Google doesn't index user sigs, so stop trying to "Google Bomb" with them.
  94. Okay, here's how it's all implemented. by Jerky+McNaughty · · Score: 1

    All of this time, I thought google was actually doing something interesting. It turns out, these guys aren't really doing anything at all! I took a tour the other day and here's what things really look like behind the scenes.

    They have 2 front end web servers running Apache on some eMachines they got at Best Buy. They have a backend MySQL server running on a really big eMachine (2 GHz, if I recall correctly). The backend MySQL machine has two IDE hard drives, but they are like 200 GB each. They're hooked up via a 256K frac T1, but I hear they are behind on the monthly payments.

    Each time you hit google's page and do a search, it issues an SQL query like this:

    SELECT * FROM the_web
    WHERE text REGEXP [what you entered]

    They just moved out of one of the guy's apartments into some small rented office space in a shady part of town outside of Mountain View, CA. Mark my words, these google guys will be out of business in like two months.

    I'm going on tours of Yahoo and Amazon in the next couple of weeks. I'll get to the bottom of this internet hype, don't you worry.

    1. Re:Okay, here's how it's all implemented. by Anonymous Coward · · Score: 0

      I still can't figure out if this is a troll, or you were trying to be funny. Either way, this has to get the award for most retarded statement of the day.

    2. Re:Okay, here's how it's all implemented. by Anonymous Coward · · Score: 0

      I was trying to be funny. I guess it didn't work.

  95. Tinfoil... maybe? Hahaha. Try yes! by Anonymous Coward · · Score: 2, Insightful

    Why are people getting so upset about Google logging the exact same information as most other websites? Yes, they log your ip, your browser, what you got, where you came from and when you were there. So do I! So does Slashdot! So does every other major search engine. And, if someone is so worried about cookies, disable them. It's easy enough to do. This GoogleWatch site is incredibly biased and simply draws on people's fears. If you don't like Google, don't use it.

  96. Google alternatives: Gigablast by MikeCapone · · Score: 2, Informative

    My favourite right now is GigaBlast.

    It's still smaller than most other search engines, but it's quite fast, has good relevance and it indexes stuff in real time.

    Besides, if you don't find what you are looking, you can do the same search with 5 other search engines just by clicking on links at the bottom of the results page.

    But what I like with Gigablast is that it's always getting better and I feel like part of something that has potential.

    1. Re:Google alternatives: Gigablast by Anonymous Coward · · Score: 0

      You're right... Interesting...
      I "discovered" several "Google GUI" web sites but they were too bloated. This one's light, faster and good.

    2. Re:Google alternatives: Gigablast by glinden · · Score: 1

      Amazing thing about Gigablast is that it's a one man effort.

    3. Re:Google alternatives: Gigablast by MikeCapone · · Score: 1

      Amazing thing about Gigablast is that it's a one man effort.

      Yes, you can read Matt Wells' technical blog in the "about" section on the main Gigablast page.

      It's also pretty impressive that he could make the project on very limited hardware (I think he has 6 Linux boxes, although he's planning on buying more hardware (if it's not already done)).

  97. Already been done, sort of by first.last · · Score: 1, Interesting

    Kind of like this?

    --
    Wishing I was a millionaire since 1969.
    1. Re:Already been done, sort of by devilspgd · · Score: 2, Funny

      Without the dupes it's just not the kind of slashdot I think I could associate myself with on a regular basis.

      --
      Give a man a fish, he'll eat for a day, but teach a man to phish...
  98. So.. by Anonymous Coward · · Score: 0
    So you mean they just indexed a few billion gateway pages and other auto-generated keyword pages like example.com/digital-camera-purchase-fuji-finepix-review.html

    Hurray.

  99. Re:Come on! by Anonymous Coward · · Score: 0

    Yeah. Only slightly more pathetic than the "did anyone else read this as 'Boogle's Gigger Index'?" retards.

  100. cable descrambler by nycsubway · · Score: 1

    search for "ntsc cable descrambler". you will get thousands of results, and ALL are spam. some search terms are just more likely to produce spam results.

    1. Re:cable descrambler by opello · · Score: 2, Insightful

      the folks at google could invent a -spam option, so those searching for 'diode wave guide' wouldn't have to put -dildo, but just include a -spam

  101. Perhaps they can fix those profiting from spam.. by Anonymous Coward · · Score: 0

    This is about the third Google related spam I've seen.
    I know it's an SEO company that's doing it, but given there is going to be a route to them, perhaps Google can Cease and Desist em?

    Thanks Sergey and Larry!

    <x-html><!x-stuff-for-pete base="" src="" id="0" charset="iso-8859-1/macintosh"><html>
    <head>
    <title>gam nszikvtojz fxyqkqbfmrsz u shzjaq x ib r xxxq
    vp h sm ip hnuigzykilvfxsctsmb q clean</title>
    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
    </head>

    <body>
    <p>&nbsp;</p>
    <p><a href="http://www.globalmarketing2000.biz/cashinwithgoogle/">Cash
    in with Google</a> makes earning an affiliate income very simple. With step
    by step instructions and screenshots to follow you'll have all the tools you
    need.</p>
    <p></p>
    <p><font size="2">no more <a href="http://www.globalmarketing2000.biz/remove.html">emails</a>
    please </font></p>
    </body>
    </html>
    envpw

    </x-html>

  102. Another milestone soon? by color+of+static · · Score: 1

    I want to know when Google is going to have more machines in their server farms than all the other domains on the Internet put together? Having seen one of their installations (I don't think it was a big one either), it can't be far off. Then they should be able to Index the Internet :-).

  103. slashdotted by cubyrop · · Score: 2, Funny

    could someone plz mirror google.com? looks like it got /.'ed

    --
    If I could make this sig kill you, I would.
    1. Re:slashdotted by jcuervo · · Score: 1

      Hmm. Wonder if Google has a cache of itself...

      Yup.

      Funny: Google is not affiliated with the authors of this page nor responsible for its content.

      --
      Assume I was drunk when I posted this.
  104. The real innovation is... by warpSpeed · · Score: 4, Funny
    When http://pr0n.google.com/ goes live

    1. Re:The real innovation is... by spood · · Score: 1

      It's currently aliased to http://www.booble.com.

      --
      ---- Just another spud server.
  105. Innovation? by bkhl · · Score: 2, Funny

    Since when is making something bigger innovation?

    I'm just going to go innovate some more tea into my mug.

  106. Bah. It's not the size... by harmonica · · Score: 1

    ...of the index but what you do with it. ;-)

  107. Better way to tell Google of bad results by sam1am · · Score: 3, Insightful
    Better than that spam report form for problems with particular searches is the Quality Feedback Form which includes the information about your search for better followup:
    At the bottom of the page, under the second search box, is a phrase "Dissatisfied with your search results? Help us improve." - Follow it and the form will ask you to:
    1. Please tell us what specific information you were seeking. Also tell us why you were dissatisfied with the search results.
    2. Were you looking for a specific URL that wasn't listed in the search results? If so, please enter the URL here..
    --
    HUMANS do it better
    1. Re:Better way to tell Google of bad results by geoffspear · · Score: 1

      If I was looking for a specific URL that I already know, why would I have been using a search engine in the first place?

      --
      Don't blame me; I'm never given mod points.
    2. Re:Better way to tell Google of bad results by pbrammer · · Score: 1

      Maybe you found the URL after the 30th page, of which 29.5 pages were all spam/pr0n/etc... So, you thought you'd post quality feedback to the team at Google.

  108. PNG! by pmsyyz · · Score: 4, Interesting

    ... Advanced features include search by image size, format (JPEG and/or GIF) ...

    They didn't mention PNG, the turbo-studly image format which Google Image Search does indeed support.

    It seems they used to have very few PNGs in their database, but now a search for +a filetype:png returns 700,000 results!

    --
    Phillip
  109. Damn, I'll never catch up now! by jocknerd · · Score: 0

    It will take me 190 years to see it all. And that's just a second each. Depressing.
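
    The 190-year figure checks out as back-of-envelope arithmetic. A quick sketch, assuming exactly 6 billion items (the announcement says "more than"), one second per item with no breaks, and a 365.25-day year:

```python
ITEMS = 6_000_000_000                     # "more than 6 billion items"; assumed exact here
SECONDS_PER_ITEM = 1                      # one second per item, no sleep, no lunch
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # assumes a 365.25-day year

years = ITEMS * SECONDS_PER_ITEM / SECONDS_PER_YEAR
print(f"{years:.1f} years")
```

    That comes out to a little over 190 years of nonstop viewing.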

  110. M1: Get the context by Anonymous Coward · · Score: 0

    The moderators are brutal today. Did they even read the grandparent?

  111. Mod parent up by Uksi · · Score: 1

    I wicked agree... that GoogleWatch site is full of crap. Same stuff applies to Yahoo, IMDB, AltaVista, any other search site.

    The fact is that your search queries are logged no matter what search engine you are using. If you follow a link from one website to another, the other web site can log where you came from.

  112. Interesting ?????? by Anonymous Coward · · Score: 0

    Man are moderators on crack

  113. Re:he's got a point! by ffub · · Score: 1

    It's fine. I checked. A quick google of mirko, forums and gnuart threw up his forum, which it shouldn't do because its /robots.txt file reads:

    User-Agent: *
    Disallow: /
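
    For anyone wanting to check what those two lines mean to a well-behaved crawler, here is a small sketch using Python's standard urllib.robotparser. The example.com URL and the "Googlebot" agent string are placeholders, and real crawlers may interpret robots.txt with their own parsers.

```python
from urllib.robotparser import RobotFileParser

# Feed the two rules quoted above straight into the standard-library parser.
rules = RobotFileParser()
rules.parse([
    "User-Agent: *",
    "Disallow: /",
])

# Placeholder URL; under these rules any path on the site is off limits.
blocked = not rules.can_fetch("Googlebot", "http://example.com/forum/index.html")
```

    With "Disallow: /" under "User-Agent: *", can_fetch returns False for every path, so a compliant spider should not be indexing the forum at all.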

  114. Google "search engine" by Anonymous Coward · · Score: 1, Interesting

    And click on "I'm feeling lucky"

  115. Mirror - don't want to /. Google by jimicus · · Score: 1
  116. Quality, not quantity by Espectr0 · · Score: 1

    Nobody would ever need that many searching.

    What i would like to have is for those spam sites to stop being linked in google. Search for something like "free motorola ringtones" to see what i mean

  117. Similar image search by dargaud · · Score: 1

    Last week I tested some of those progs (freeware and shareware) and was pleasantly surprised by how well they worked. I have gone through 4 generations of slide scanners, so when I get a new one, I rescan all my best slides and want to give it the same name as the old file. We are talking thousands of png files at 4000dpi, typically 40Mb each. It took less than 30 minutes to search through 100Gb of images with only 10% false positives (okay, I got a fast machine with a fast drive). Can't remember the name of the app right now, sorry.

    --
    Non-Linux Penguins ?
    1. Re:Similar image search by Anonymous Coward · · Score: 0

      One good program is D'peg. Indispensable when you download tons of child porn from KaZaA and want to remove the duplicates (be they recompressed, resized, cropped, etc.).

  118. sco fell in "litigious bastard" search. by morcheeba · · Score: 2, Informative

    When you search for "litigious bastards", you now get a website promoting the googlebomb technique listed first. The sco group was listed first, but now it's ranked about 47. I'm not sure if they are reducing the relevance of the link-text, or if the ranking has been lowered because the sco group probably doesn't point back at any of the blogs that link to it.

    1. Re:sco fell in "litigious bastard" search. by Anonymous Coward · · Score: 0

      Looks like they are still #1, liar.

    2. Re:sco fell in "litigious bastard" search. by morcheeba · · Score: 1

      Maybe I should amend what I said. When I search for "litigious bastards", I don't get sco first. Since their database is distributed, you might be getting an old copy that still has sco ranked first. Or, I could be getting an old copy, but I doubt it because SCO has been up there a while.

  119. Assuming by Snaller · · Score: 1


    You aren't searching for more than 10 words.

    --
    If Google really cared they would fix Android Chrome to reflow text, instead of discriminating
  120. adsense is making sense by DrSkwid · · Score: 3, Interesting


    Google's adsense service https://www.google.com/adsense/overview

    is certainly a winner

    The ads presented are similar to the paid ads shown on a standard google search but using the keywords of the page displayed and also tailored to the country of the viewer via their ip address.

    In this way webmasters can maximize the global potential of their website.

    We have some very highly ranked pages (i.e. top 10) but for UK only content. Now our visitors who find us via search engines and discover we aren't quite what they want are presented with a relevant exit strategy and we get a commission!

    We're getting an average 1.7% click through rate which is translating into a nice tidy sum.

    go google! keep kicking MSN's dirty butt

    --
    There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
  121. You searched for by bonaldi · · Score: 3, Funny

    We have batteries and accessories for your Google's Bigger Index. Buy now from our extensive selection of Google's Bigger Index, and when you buy your Google's Bigger Index you get free shipping. Buy now. Google's Bigger Index.

    God, google sucks nowadays.

  122. this is google's response to yahoo? by mcguyver · · Score: 1

    Yahoo turned on inktomi on yahoo.com, meaning their search results do not depend on google's algo. It's a little odd, however probably just a coincidence, that this google announcement came just as yahoo flipped the switch.

    1. Re:this is google's response to yahoo? by polymorpheus · · Score: 1

      Yeah, Yahoo flipping the switch today(?) is a plausible reason for the Google (non) announcement. Not a coincidence at all, IMO. Google should worry a bit, but not too much since they're still way ahead.

  123. Is Google the biggest? by Ed+Avis · · Score: 1

    Six billion items... how does that compare with other search engines like Alltheweb or Teoma, or even the venerable Altavista?

    --
    -- Ed Avis ed@membled.com
  124. That is suspiciously close by K-Man · · Score: 2, Informative

    It's probably not a big deal to expand the capacity, but it certainly looks like it's pegged to 2^32 for this release.

    --
    ---- "If we have to go on with these damned quantum jumps, then I'm sorry that I ever got involved" - Erwin Schrodinger
  125. Most of which is inaccurate... by Frobozz0 · · Score: 1

    "This innovation represents a milestone for Internet users, enabling quick and easy access to the world's largest collection of online information."

    Most of which has broken links, is wildly inaccurate, and contains completely unresearched information.

    Kudos. We are blessed indeed. :-)

    --
    "Politicians find new names for institutions which under old names have become odious to the people."
  126. Re:he's got a point! by radish · · Score: 1

    Of course the robots.txt is only read when the site is spidered. Is it possible the site was up for a while before the file was added?
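
    A rough sketch of that timing issue, using Python's standard urllib.robotparser. The site, paths, rules, and bot name are all made up for illustration. The only point is that a crawler applies whatever robots.txt it saw at fetch time, so a page spidered before the file existed would have been allowed.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_lines, url, agent="ExampleBot"):
    """Return True if `agent` may fetch `url` under the given robots.txt lines."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


# Hypothetical site: the first crawl happens while robots.txt is still empty.
robots_before = []
# Later the owner adds a rule, but it only matters on the next crawl.
robots_after = ["User-agent: *", "Disallow: /private/"]

url = "http://example.com/private/notes.html"
print(allowed(robots_before, url))  # True: nothing told the crawler to stay out
print(allowed(robots_after, url))   # False: the rule now applies
```

    Anything indexed during the first crawl can linger in the index until the crawler comes back and sees the new file.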

    --

    ---- Den ene knappen er powerknapp, den andre er Bender voice knapp "Bite My Shiny Metal Ass"

  127. Massive computing power by dj245 · · Score: 2, Insightful
    While I would love to see such a thing on Google, I do not think such a thing is really plausible at this time. First of all, it takes massive computing power to process such a vast quantity of data. When I process my database of images for duplicates and similar images, it usually takes over 10 hours to generate index JPGs and CRCs and to compare them on my Athlon 1800+ with 640 MB of RAM. And I "only" have about 130,000 pictures of, uh, family and friends.

    How many pictures does google have to index again? A lot. Sure, google has huge racks of clusters, but they are expanding pretty fast as it is. Does Google really want to add a bunch of racks to add a feature that maybe 20% of the people would use? I honestly don't know. I do know that google, like any company, will add features that are easy and cheap to implement, but probably won't if it means adding rack upon rack of servers.
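
    For a sense of the cheap half of that job, here is a minimal sketch of exact-duplicate detection by CRC. It assumes Python and its standard zlib module, and the function names are made up; it is not the tool described above. It only flags byte-identical candidates. Finding merely similar images needs thumbnails and pixel comparisons, which is where the hours go.

```python
import zlib
from collections import defaultdict
from pathlib import Path


def crc32_of_file(path, chunk_size=1 << 16):
    """Compute the CRC-32 of a file without loading it all into memory."""
    crc = 0
    with open(path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return crc


def find_duplicates(root):
    """Group files under `root` that share both size and CRC-32."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            key = (path.stat().st_size, crc32_of_file(path))
            groups[key].append(path)
    # Only groups with more than one member are duplicate candidates.
    return [paths for paths in groups.values() if len(paths) > 1]
```

    CRC-32 can collide, so a careful version would confirm each candidate group with a byte-for-byte comparison before deleting anything.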

    --
    Even those who arrange and design shrubberies are under considerable economic stress at this period in history.
  128. Worried about reliability by xihr · · Score: 2, Interesting

    Especially with this announcement, I'm starting to get worried about the reliability of Google. More and more groups are taking advantage of quirks in Google's ranking system, as has been mentioned in previous Slashdot articles. It has reached the point where, if you're searching for something even a little outside of the pop-culture mainstream (where you will be inundated with valid hits), you will find tons and tons of automatically generated garbage hits on "providers" who boost their rankings by feeding links to each other. Google is a great service; I hope that, in its desire to keep expanding its dominance of the search engine market, it doesn't get complacent and let its search technology go stale, becoming so abused that you need to look elsewhere for reliable results.

  129. For those of you who were wondering/complaining... by Afromelonhead · · Score: 3, Informative
    According to Google's cache of Google, there used to be only 3,307,998,701 pages in their index, as opposed to the 4,285,199,774 (as of writing) in the index.

    It's also interesting to note that both have a copyright date of 2004, which would imply that Google has added just under 1 billion pages in a month and a half.
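
    The subtraction behind "just under 1 billion" is easy to check. The comparison with 2^32 is an extra, prompted by the observation elsewhere in this thread that the count looks pegged to a 32-bit limit; the variable names are only for illustration.

```python
OLD_COUNT = 3_307_998_701  # page count shown on Google's cached copy of itself
NEW_COUNT = 4_285_199_774  # page count shown on the live page at time of writing

added = NEW_COUNT - OLD_COUNT
headroom = 2**32 - NEW_COUNT  # distance from the largest 32-bit unsigned count

print(f"{added:,}")     # 977,201,073
print(f"{headroom:,}")  # 9,767,522
```

    So the new figure sits less than ten million short of 2^32, which fits the idea of a 32-bit document ID but does not prove it.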

    --
    Procrastination sucks.
  130. Google is still #1 by SphericalCrusher · · Score: 0

    I think this just ensures that Google.com is the leading online search engine. Even though they didn't make the sites themselves, this still greatly helps in the enlargement of the internet.

    --
    "Instant gratification takes too long." - Carrie Fisher
  131. Ftp Search by J2000_ca · · Score: 2

    I'm still looking for ftp search to be included into google.

  132. Report Faked URLs by delfstrom · · Score: 1
    You can report faked, spoofed, and otherwise deceptive spamming of Google's index by going to http://www.google.com/contact/spamreport.html.

    They are very receptive to reports and I've seen deceptive sites removed from the index in less than 12 hours.

  133. Even better way to report by delfstrom · · Score: 4, Informative
    The "help us improve" link is okay, but a little general. Most of us slashdot readers know when a search result is truly bogus, and there's a more advanced form we can use for reporting abusers directly:

    http://www.google.com/contact/spamreport.html

    This will give you options of reporting cloaked pages, doorway pages, deceptive redirects, misleading or repeated words, hidden text, etc. You have to be more specific than the "help us improve" link at the bottom of search results. Using this form I've seen abusive sites disappear from Google's index in less than 12 hours.

  134. URL Please by rixstep · · Score: 2, Funny

    What's Google's URL, please?

    I can't find it on Google.

  135. Oh really by jkovach · · Score: 1

    While Link 1 is the admittedly useful "Television Antenna Frequently Asked Questions", Links 2, 3, and 4 are spam, link 5 is a press release, link 6 is more spam, and the useful links start somewhere after that. From this end, it seems like nothing much has changed...

  136. Latest search terms added: variations of Viagra by thirty2bit · · Score: 1

    Vigara, viiagra, viagara, veragra, v1agra, viaagra...

    All were taken from the 2004 edition of the SPAMmers' Dictionary.

  137. I don't know about the rest of you... by dupper · · Score: 1

    ... but, these days, I find myself using Google's Newsgroup search before their main search for most things. The main search is so full of crap, but the Usenet search contains much less spam. It's often more insightful and informative, often containing direct links to helpful websites, so you don't need to wade through the crap in a main search. On any given topic, there are usually some great old posts from 1994, or something, when the Newsgroups were (or seem to have been, given the results I've been getting) damned near specialized periodicals from actual professionals in whatever topic a given group had. Hell, you can even do a porn search on the Googled Newsgroups, not get spammed crap, and actually find helpful information. You haven't been able to do that on a main web search since the early days of Yahoo's golden years, and even then it was mostly crap.

  138. You hum it, we'll find the mp3 or midi.. by zcat_NZ · · Score: 2, Interesting

    Waikato University has a music recognition system that would be awesome on google - if you can hum a few notes, it'll match it with the original tune. Remember all those emusic tunes that ended up as 'elevator' music? A lot of them are free downloads and still available on the artist's websites, but if you hear a tune you like while you're waiting on hold how do you find it?

    Also, it would be cool if I could upload a text-overlayed, renamed thumbnail from usenet and google could find the matching full-size image for me.

    --
    455fe10422ca29c4933f95052b792ab2
  139. Libraries, whatever... by schmiddy · · Score: 1

    I'm going to get modded down into oblivion for saying this, but whatever.

    For some reason I see a bunch of /.'ers coming out of the woodwork in support of the local libraries and expounding on how incredibly useful dead tree literature is. Another attitude that seems to crop up both here and in University classrooms a lot is that "stuff on the Internet is all unverifiable crap and should never be used in real papers."

    In the words of the parent:

    The books you find will have references, and you can follow those to immense amounts of material more specifically related to the angle you've chosen. And none of it is on Google.

    Know what? I'm calling bollocks on this attitude. Wake up folks. It's the 21st century, not the 17th. I'm a college student. I've been there, done that. I know it's f-in tough to be forced to crank out a bunch of bullshit papers. That's life. But the Internet has made it all easier. If I need to look up information on, say, Hamlet, volumes of information are a few clicks away. Yeah, I've heard the usual jabber about how the barrier to entry on the Internet is practically nonexistent and anyone can publish any useless crap, yadda yadda yadda.

    Again, wake up people. That's what Google and Pagerank are for. I'm not a total idiot. I know bullshit online when I see it. Moreover, if I look over an opinion online, and I think it's enlightening, there's a pretty good chance that my Professor will too. Yeah, if I wanted to invest hours and hours in a three page paper I could go out through the cold and snow to the library, hunt through the antiquated card catalogue for what I'm looking for, and actually read a real resource for factual, honest-to-god information.

    Forget it. Know where I turn to when I can't figure out what the hell is wrong with my network card? Where I go when I want to know trivia, like whether the original Goldfish were "cheddar" or "plain"? Right. It ain't the public library. And it's the same place I go when I want to find some useful information about Hamlet or any other serious research, for that matter. Instead of manually flipping through pages of some damn research books I've got the clusters of Google grep'ing through god only knows how many pages. Yeah, there are crazy ideas out there, but again, I'm not an idiot. And neither is Google for that matter. Pagerank is your friend, library-lovers.

    And another thing. The parent whined about how there's not enough material available online due to copyright crap.

    US and world copyright laws are keeping almost all the content from being eligible for online publication, even if their profit windows are long closed.

    He's absolutely right. Libraries will be dead the minute copyright law gets toned down to a 10 year span and every legal book on the planet has been OCR'ed. In the meantime, put your stuff out there for free. It makes a difference. Write a big research paper? An English paper? Science paper? Put it under GPL/Creative Commons/BSD/whatever, and let people have it. You don't need it anymore. Don't be so damn possessive. You won't be able to take the stupid papers with you when you die.

    I'm not just blowing smoke. I do this on my own site. Yeah, the papers I've put out are just stuff that I or others have written, but it helps. Information is a good thing. Let Google decide if your paper is good enough to show up in a search.

    --
    http://cltracker.net -- powerful craigslist multi-city search
    1. Re:Libraries, whatever... by PollGuy · · Score: 1

      So... if I can sum up, you are saying that what I said is baloney because any person can filter the junk from the gems using pagerank.

      That's fine for precision, but what I'm talking about is recall. Sure, you can find some stuff on Hamlet, which is huge, but how about entire volumes of the history of 17th century french poetry? A detailed history of mythological studies? Profs can spot shallow research as easily as bad sources. The mass of volumes simply does not exist online. The info just is not there.

      You say: "Libraries will be dead the minute copyright law gets toned down to a 10 year span and every legal book on the planet has been OCR'ed. In the meantime, put your stuff out there for free. It makes a difference." I can't see how that is a rebuttal of my main point, though. In the here and now, the information does not exist online. When copyright gets toned down and every legal book gets OCR'd, then I will join you in celebration of the full sufficiency of Google. But don't expect the latter for at least 20 years, and I don't even want to hazard a guess on the former.
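
      To make the precision/recall distinction concrete, here is a small sketch using the standard information-retrieval definitions. The numbers and names are invented for illustration: suppose ten relevant works exist in total and a search turns up three of them and nothing else.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of results against the relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Precision: what fraction of what you found is actually relevant.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: what fraction of everything relevant you actually found.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical: ten relevant works exist, but only three are findable online.
relevant = {f"work{i}" for i in range(10)}
retrieved = {"work0", "work1", "work2"}

print(precision_recall(retrieved, relevant))  # (1.0, 0.3)
```

      Perfect filtering gets precision to 1.0, but recall stays at 0.3 because the other seven works were never there to be found.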

  140. Altavista by rp · · Score: 1

    2000 is not "before Google".
    The oldest Altavista page in the wayback archive,

    http://web.archive.org/web/19961022174810/http://www.altavista.com/

    is much smaller, but it already features ads. Early Altavista was adless.

  141. HTTP Authentication? by mparaz · · Score: 1

    Maybe you'd want nobody else to get in. How about using HTTP Authentication?

    1. Re:HTTP Authentication? by mirko · · Score: 1

      No.
      People may come to it but it should not be searchable.

      --
      Trolling using another account since 2005.
  142. Re:Come on! by use_compress · · Score: 1

    I thought it was funny because there was a hint of truth in it. All facets of our life are quickly being tied into and run through the Internet. Posting on Slashdot and further developing the Internet is, in a sense, welcoming our new HTML overlords. It's funny and insightful because it was an unexpected truth wrapped in a Simpsons quote/Slashdot cliche.