Slashdot Mirror


Google's Bigger Index

WebGangsta writes "Google Inc. today announced it expanded the breadth of its web index to more than 6 billion items. This innovation represents a milestone for Internet users, enabling quick and easy access to the world's largest collection of online information."

37 of 412 comments

  1. Here's hoping by r_glen · · Score: 5, Interesting

    ... this will lead to an increase in the integrity of PageRank(TM), and vintage Google will return in all her glory.

    1. Re:Here's hoping by Destoo · · Score: 5, Interesting

      So it's not just me..

      First, the reindex that happened a few months ago removed all cross-reference with accents.
      (where Google would find the same number of links for both the word and the unaccented word... right now: soupçon: 9,750 - soupcon: 88,500)
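      The accent cross-referencing described above can be sketched as accent folding at index time. This is only a minimal illustration using Python's standard `unicodedata` module; the function names `fold_accents` and `index_key` are hypothetical, and nothing here is known to be how Google actually does it.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Return text with combining marks (accents, cedillas) removed."""
    # NFD splits "ç" into "c" followed by a combining cedilla.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def index_key(word: str) -> str:
    """Hypothetical index key: accent-folded and case-insensitive."""
    return fold_accents(word).casefold()


if __name__ == "__main__":
    # Both spellings map to the same key, so they would share one hit list.
    print(index_key("soupçon") == index_key("soupcon"))
```

      With a key like this, a query for either spelling would return the same set of documents, which is the behaviour the poster says was lost.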

      Then, when searching for anything regarding RAS error messages, I get 30 links from spammers and then the real stuff.
      Example: 711 error yields multiple links for similar pages...
      "Your one stop resource for all things error 711 remote access connection
      management related. ... error 711 remote access connection management. ... "

      Vintage Google.. in Net years, that's 15-16 months ago, right?

      --
      Nouvelles de jeux et technologies en français. TC
  2. It could be much smaller ;-) by ChaoticChaos · · Score: 5, Funny

    ...yeah, but it would only be 2 billion items if all the Janet Jackson stuff was removed. ;-)

    1. Re:It could be much smaller ;-) by Lev13than · · Score: 5, Funny

      ...yeah, but it would only be 2 billion items if all the Janet Jackson stuff was removed. ;-)

      And if they'd just stop indexing blogs, the entire Internet would fit onto a CD.

      --
      When you have nothing left to burn you must set yourself on fire
    2. Re:It could be much smaller ;-) by kilonad · · Score: 5, Funny

      But... but... this company called AOL keeps shipping me the entire internet on a CD all the time!

    3. Re:It could be much smaller ;-) by fredrikj · · Score: 5, Funny

      And if they'd just stop indexing blogs, the entire Internet would fit onto a CD.

      You could fit the blogs on a CD as well. Just store a template blog and include a program to generate random variations, e.g. "my dog has fluffy fur today" vs "my cat has fluffy fur today".

      Technically, this would be "lossy compression" (since some data is deprecated but no one will notice the difference). Though on the other hand, it could even be argued that removing blogs entirely would be a form of "lossless compression".

    4. Re:It could be much smaller ;-) by kevin_ka · · Score: 5, Funny

      And if all the pron was removed there would be only 1 website left and that would be a petition to bring back the porn

  3. Heh by PaintyThePirate · · Score: 5, Interesting

    Anyone else find it funny that Google has around one item for every man woman and child on earth?

    1. Re:Heh by Attaturk · · Score: 5, Insightful

      Anyone else find it funny that Google has around one item for every man woman and child on earth?

      I'd find it funnier if every man woman and child on earth at least had unrestricted access to Google and everything it links to.

    2. Re:Heh by Anonymous Coward · · Score: 5, Funny

      One page for every man woman and child. That sounds exactly like the thinking of a machine to me.

    3. Re:Heh by rylin · · Score: 5, Funny

      My page was taken offline by the .cx registry

  4. Most press-release like post ever by Chris_Jefferson · · Score: 5, Insightful

    While I love google, this is so obviously just a link to a press release, and even worse the first line of the press release cut-and-pasted onto slashdot's page. And is going past 6 billion really that important?

    --
    Combination - fun iPhone puzzling
    1. Re:Most press-release like post ever by twilight30 · · Score: 5, Insightful

      What sucks about the press release (indeed, makes it sooo press releasy) is the total lack of anything that makes it useful:
      * "...to 6bn" : From what number before?

      And I still can't find what I'm looking for! (pun definitely not intended)

      --
      ========================================
      Death will come, and will have your eyes
      -- Pavese
  5. Google, over 6 billion served. by Anonymous Coward · · Score: 5, Funny

    They beat McDonalds.

  6. Related? by SkiddyRowe · · Score: 5, Funny

    In a related story Booble's index just expanded to a Double-D.

    Little boys across the globe will have sore arms tomorrow.

  7. It's only a matter of time.. by pacsman · · Score: 5, Interesting

    I'm waiting for them to come up with a sound search and an image search that look at the subject of the image rather than its file name. After that I'm not sure what's left. Maybe comparative searches for sounds and images, where you can upload a source to compare? Who knows! I hope these guys don't follow the normal path of spiralling into inconsequence after they go public.

    1. Re:It's only a matter of time.. by misof · · Score: 5, Insightful

      As far as I know, image search in the way you want it is still only a dream. But. Approx 2 years ago I attended a conference focused (mainly) on theoretical computer science. I saw some researchers (I think they were from Italy, not sure) present an early implementation of their algorithm to look for similar images to the one you select.

      The idea behind it: For a computer, it's not easy to tell exactly what an image contains. E.g. take all those "type the word you see above inside this box to prove you are not a bot" registration forms. If there are no working algorithms to tell "this image contains the word SLASHDOT written in yellow and blue stripes on a pink-dotted black background", the chances of creating an algorithm to tell "this is a game of tennis, it is probably played in the afternoon somewhere in England" are really low.

      However, by using various approaches from CG (comp. graphics), you MAY be able to tell whether two images are similar or not -- as simple examples consider edge detection, color spectrum, etc. As I already mentioned, such algorithms have already been implemented and their success ratio is already reasonably high. I expect that it won't take long until we see them on google.

      Note that using the ideas above you CAN search for an image with a given subject -- it just requires two stages. Suppose you want an image of a sun setting down somewhere in the mountains. Stage 1. You enter "sunset" into google's present search engine. You get lots of sunsets, several dogs named Sunset, a Chinese girl Sun Set, etc. Then you select one of the sunsets most resembling the image you want and you tell google (or some other engine) to find all similar images. Et voila.
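      The second stage ("find all similar images") could be sketched with one of the simple cues mentioned above, the colour spectrum. This is a toy illustration, not the researchers' algorithm: images are assumed to be plain lists of (r, g, b) tuples, similarity is histogram intersection, and the names `color_histogram`, `histogram_similarity` and `most_similar` are made up for the example.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4):
    """Normalised histogram over coarse RGB bins."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {bin_: n / total for bin_, n in counts.items()}


def histogram_similarity(h1, h2):
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint."""
    return sum(min(share, h2.get(bin_, 0.0)) for bin_, share in h1.items())


def most_similar(query_pixels, candidates):
    """Return candidate names ordered from most to least similar to the query."""
    query_hist = color_histogram(query_pixels)
    scores = {
        name: histogram_similarity(query_hist, color_histogram(pixels))
        for name, pixels in candidates.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Tiny synthetic "images": mostly orange sunsets versus a brown dog.
    sunset_a = [(250, 120, 30)] * 90 + [(20, 20, 60)] * 10
    sunset_b = [(240, 110, 40)] * 80 + [(30, 30, 50)] * 20
    dog = [(120, 90, 60)] * 100
    print(most_similar(sunset_a, {"dog": dog, "sunset_b": sunset_b}))
```

      A colour histogram ignores layout entirely, so a real system would presumably combine it with other cues such as edge detection.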

  8. Re:how many? by sensei_brandon · · Score: 5, Funny

    exactly. I searched for "diode wave shaper" one time and got three hits -- all for porn. I had no idea diodes were so fap-worthy.

  9. A company spokesman added... by Boing · · Score: 5, Funny

    ...that remarkably, a full five-sixths of the content consisted of different versions of the Google logo.

  10. What I want to know... by Bob McCown · · Score: 5, Interesting

    ...is how to get rid of those pseudo-pages in Google. The ones with names like "thing_that_youre_searching_for.html", and all they are is either a page of dead links to crap on ebay, or a "Hey, we do great searches for your stuff".

    1. Re:What I want to know... by ctishman · · Score: 5, Informative

      Use that "Dissatisfied with your search results? Help us improve." link at the bottom of the page. Voila.

    2. Re:What I want to know... by samcentral2000 · · Score: 5, Insightful

      I totally agree. These days, whenever I use google, I always include "-search" in my search. Cleans it right up :)

  11. "...represents a milestone..." by stratjakt · · Score: 5, Insightful

    No it doesn't. It represents a pretty reasonable upgrade for Google.

    It's expected that as the web grows, so will the search engines.

    This isn't exactly a man-on-the-moon accomplishment.

    --
    I don't need no instructions to know how to rock!!!!
  12. is it just me? by trans_err · · Score: 5, Interesting

    Google has become so flooded with internet crap that it's quickly losing its status as a useful tool. Google needs some form of moderation to move out the superfluous blog entries and advertising fronts so it can someday become as useful as it always was.

  13. Still nok by mirko · · Score: 5, Interesting
    • I own a forum on top of which I put a robots.txt file which is supposed to STOP any spider from visiting it.
      However, I still find my posts when googling for words they contain.
      How can one explicitly forbid Google from indexing a site?
    • My wife developed 2 web sites which never got indexed even though we submitted them using Google's interface. As they might not be linked from anywhere, I suppose Google just considers that if nobody mentions a site, then the site should not be registered as existing? Does Google think it actually is the web?
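    For the robots.txt question above, one way to check what a given file actually tells a crawler is Python's standard `urllib.robotparser`. This is a sketch under assumptions: the host `forum.example.com` and the helper `is_allowed` are hypothetical, and the file must be served from the site root as /robots.txt to be picked up at all. Note that robots.txt is advisory; it only stops spiders that choose to honour it.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that asks every crawler to stay away from the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# A robots.txt that only blocks one directory.
BLOCK_PRIVATE = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://forum.example.com/viewtopic?id=1"
    print(is_allowed(BLOCK_ALL, "Googlebot", url))
    print(is_allowed(BLOCK_PRIVATE, "Googlebot", url))
```

    If a check like this says the forum is blocked and posts still show up, the file's location or syntax on the live server would be the first things to look at.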

    Sorry, I'll keep using Altavista.
    --
    Trolling using another account since 2005.
  14. They said 6 billion items, not webpages. by LostCluster · · Score: 5, Informative

    Notice that they claim that they search 6 billion items, but the home page only claims that they're "Searching 4,285,199,774 web pages".

    To find the rest, we need to use Google's other services. The image search is claiming "Searching 880,000,000 images". Google Groups says it's "Searching 845,000,000 messages". Add those to the count and you get 6,010,199,774 items total.
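    The addition is easy to check. A short sketch using only the three figures quoted in this comment (the dictionary labels are just for readability):

```python
# Counts as quoted from Google's own pages in the comment above.
counts = {
    "web pages": 4_285_199_774,
    "images": 880_000_000,
    "Usenet messages": 845_000_000,
}

total = sum(counts.values())

if __name__ == "__main__":
    print(f"{total:,}")
```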

  15. Re:how many? by Anonymous Coward · · Score: 5, Informative
    That sort of search result spamming is getting out of hand.

    Maybe if more people used Google's Search Quality feedback form, it would help weed them out.

  16. Sort out their indexing problems first by jolyonr · · Score: 5, Interesting

    I do hope they manage to sort out their recent indexing problems first. For many searches AltaVista is now showing far more relevant results than Google, ever since their attempted cull of 'spam' sites last December, which kind of backfired. They have improved things this year, but the quality of their search results is not as good as it was last year. Now, they need to figure out how to get rid of all the useless sites that are just shopping directories full of espotting URLs and similar and with no real content. Funnily enough, their anti-spamsite code seemed to actually promote these up the rankings on many search terms, while penalising many sites containing genuine content.

    Many people said that Google were using deliberate tactics to encourage small e-commerce websites to spend more on adwords, but I believe this wasn't deliberate - their index is so big that they simply can't tell what the results of their changes are going to do to the search orders for all the search options that people are going to use - and they simply didn't realise in advance the problems they were going to cause. And google have made efforts to minimise the damage since then, but they still need to do more.

    Jolyon

    --


    Please read my Canon EOS tech blog at http://www.everyothershot.com
  17. Since when did bigger == innovation? by Moderation abuser · · Score: 5, Insightful

    It just means bigger. There may well be innovation in the technology which allows bigger, that might have been news for nerds, but bigger itself isn't innovative.

    --
    Government of the people, by corporate executives, for corporate profits.
  18. Thanks by KillerHamster · · Score: 5, Funny

    so much for the link to Google, I never would have found it otherwise.

  19. Run out of indexing space? by rqqrtnb · · Score: 5, Interesting

    I heard that Google is using 4-byte ints for DOCids and they have been running out of indexing space since they are pretty close to 2^32 pages already. Is that true?
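    The arithmetic behind the question can be checked directly. This sketch only compares the web page count quoted on Google's home page with the range of an unsigned 4-byte integer; whether Google really uses 32-bit document IDs is the poster's rumour, not something confirmed here.

```python
DOC_ID_BITS = 32  # assumption taken from the rumour above

max_doc_ids = 2 ** DOC_ID_BITS    # distinct values of an unsigned 4-byte int
indexed_pages = 4_285_199_774     # "Searching 4,285,199,774 web pages"

headroom = max_doc_ids - indexed_pages
used_fraction = indexed_pages / max_doc_ids

if __name__ == "__main__":
    print(f"capacity: {max_doc_ids:,}")
    print(f"headroom: {headroom:,}")
    print(f"used:     {used_fraction:.2%}")
```

    On these numbers fewer than ten million IDs would remain, and a signed 4-byte int (2^31 values) would already be far too small.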

  20. Google Print by blorg · · Score: 5, Informative
    "Google's collection of 6 billion items comprises 4.28 billion web pages, 880 million images, 845 million Usenet messages, and a growing collection of book-related information pages."

    I was interested that they mentioned Google Print, which is Google's answer to Amazon's Search Inside feature, but hasn't got much press, and is pretty well hidden in Google itself.

    You can check it out by limiting results to site print.google.com, e.g. searchterm site:print.google.com. (Not quite at Amazon-type numbers yet.)

  21. Caveat Emptor by erick99 · · Score: 5, Insightful
    Google is my favorite search engine. That said, I hope that most folks understand that just because they "google" something does not make that something a fact. Also, the first few pages of any search can be the result of manipulation to get in the top 10, 20 or 100. It is really, really important to consider the source when doing any kind of research on the 'net. I am homeschooling my 13 year old and having a hell of a time getting these lessons across to him. He can research almost anything in a fraction of a second, but it takes a bit longer to separate the wheat from the chaff.

    Happy Trails!

    Erick

    --
    http://www.busyweather.com/
  22. Is /. pro Google? by dark-br · · Score: 5, Informative

    "Google currently does not allow outsiders to gain access to raw data because of privacy concerns. Searches are logged by time of day, originating I.P. address (information that can be used to link searches to a specific computer), and the sites on which the user clicked. People tell things to search engines that they would never talk about publicly -- Viagra, pregnancy scares, fraud, face lifts. What is interesting in the aggregate can seem an invasion of privacy if narrowed to an individual."


    That's a quote from the NYtimes (free req. yada yada) also posted as is here

    If any other site were to track the stuff Google does, /. would be up in arms protesting!

    Please note, this isn't a troll, and I'm not wearing a tin-foil hat (maybe I should?). Imagine the following scenario: a bomb goes off in the US. By tracing searches for "anarchist cookbook" to zipcodes within the area of the bomb blast, the FBI could have access to information that makes TIA look like a better alternative.

    Maybe this isn't such a good feature after all...

  23. but... by Savatte · · Score: 5, Funny

    have they beaten Ron Jeremy?

  24. Re:4.28 billion web pages... by JediTrainer · · Score: 5, Funny

    That reminds me of an old Dilbert (paraphrasing here, forgive the small errors):

    PHB: We've run out of accounting codes! We can't do anything without one!

    Dilbert: Why not upgrade the system to accept larger codes?

    PHB: To do that we'd need a budget and an accounting code

    Dilbert: Why can't we reuse a code from an old finished project?

    PHB: Strangely enough, we've never finished a project.

    --

    You can accomplish anything you set your mind to. The impossible just takes a little longer.
  25. Size and Criteria are good, but... by mugnyte · · Score: 5, Insightful


    Too bad the article doesn't mention how google is trying to fight gaming the PageRank system or any of the other problems like commercials in the results. Still a great search tool though.