Google's Bigger Index
WebGangsta writes "Google Inc. today announced it expanded the breadth of its web index to more than 6 billion items. This innovation represents a milestone for Internet users, enabling quick and easy access to the world's largest collection of online information."
... this will lead to an increase in the integrity of PageRank(TM), and vintage Google will return in all her glory.
...yeah, but it would only be 2 billion items if all the Janet Jackson stuff was removed. ;-)
How many of these 6 billion items are in the form of www.massivepopups.com/your_search_term.html?
What are we going to do tonight Brain?
Anyone else find it funny that Google has around one item for every man, woman and child on earth?
eclecti.cc
While I love Google, this is so obviously just a link to a press release, and even worse, the first line of the press release is cut-and-pasted onto Slashdot's page. And is going past 6 billion really that important?
Combination - fun iPhone puzzling
What's going on here? It isn't like Google to put out a press release just because the index size passed a round number.
Is Google setting up for its IPO and therefore becoming less like the Google we know and love?
They beat McDonalds.
In a related story Booble's index just expanded to a Double-D.
Little boys across the globe will have sore arms tomorrow.
I'm waiting for them to come up with a sound search and an image search that look at the subject of the image rather than its file name. After that I'm not sure what's left. Maybe comparative searches for sounds and images, where you can upload a source to compare? Who knows! I hope these guys don't follow the normal path of spiralling into inconsequence after they go public.
...that remarkably, a full five-sixths of the content consisted of different versions of the Google logo.
...is how to get rid of those pseudo-pages in Google. The ones with names like "thing_that_youre_searching_for.html", and all they are is either a page of dead links to crap on ebay, or a "Hey, we do great searches for your stuff".
No it doesn't. It represents a pretty reasonable upgrade for Google.
It's to be expected: as the web grows, so will the search engines.
This isn't exactly a man-on-the-moon accomplishment.
I don't need no instructions to know how to rock!!!!
Google has become so flooded with internet crap that it's quickly losing its status as a useful tool. Google needs some form of moderation to move out the superfluous blog entries and advertising fronts so it can someday become as useful as it once was.
transmission_err
Search for any normal product name with Google. What did you used to get? Billions of useless sites that cross-link to each other and have the same bloody reviews from amazon.com.
:^)
That seems to have changed!
I just tried a search on television antennas and for once the results seem relevant.
Hooray!! Google is back!!
Sunny Dubey
I, however, find my posts while googling for words they also contain.
How can one explicitly forbid Google from indexing a site?
Sorry, I'll keep using Altavista.
Trolling using another account since 2005.
I don't want MORE things to search for, I want it to return more relevant results. I know that the information I usually search for is out there; the problem is that there's so much chaff out there that I can't find what I want. No matter what I search for, there are at least 2 or 3 responses related to porn. I understand that there are a lot of varieties of porn out there, but come on... Search engines are getting even worse by throwing in search results that are hardly relevant, just because they got paid money by the company. I would even be willing to pay for a "Google membership" if they eliminated the advertisers mixed in with search results and maybe gave me another special feature or 2. I'd want a search engine that returns just 1 or 2 good results over one that returns 5 good results mixed in with 200 bad ones.
Notice that they claim that they search 6 billion items, but the home page only claims that they're "Searching 4,285,199,774 web pages".
To find the rest, we need to use Google's other services. The image search is claiming "Searching 880,000,000 images". Google Groups says it's "Searching 845,000,000 messages". Add those to the count and you get 6,010,199,774 items total.
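For anyone who wants to check the sum, here is a minimal sketch in Python. The three counters are the ones quoted above from Google's own pages; the variable names are just illustrative.

```python
# Counters as quoted above from Google's home page, image search and Groups.
web_pages = 4_285_199_774
images = 880_000_000
group_messages = 845_000_000

# Sum the three services to get the "items" figure.
total_items = web_pages + images + group_messages

print(f"{total_items:,}")
```

So the "more than 6 billion items" claim only works if images and Usenet messages are counted alongside web pages.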
I do hope they manage to sort out their recent indexing problems first. For many searches Altavista is now showing far more relevant results than Google, ever since Google's attempted cull of 'spam' sites last December, which kind of backfired. They have improved things this year, but the quality of their search results is not as good as it was last year. Now, they need to figure out how to get rid of all the useless sites that are just shopping directories full of espotting URLs and similar and with no real content. Funnily enough, their anti-spamsite code seemed to actually promote these up the rankings on many search terms, while penalising many sites containing genuine content.
Many people said that Google were using deliberate tactics to encourage small e-commerce websites to spend more on AdWords, but I believe this wasn't deliberate - their index is so big that they simply can't tell what their changes are going to do to the result order for all the searches that people are going to run - and they simply didn't realise in advance the problems they were going to cause. And Google have made efforts to minimise the damage since then, but they still need to do more.
Jolyon
Please read my Canon EOS tech blog at http://www.everyothershot.com
It just means bigger. There may well be innovation in the technology which allows bigger, that might have been news for nerds, but bigger itself isn't innovative.
Government of the people, by corporate executives, for corporate profits.
so much for the link to Google, I never would have found it otherwise.
I heard that Google is using 4-byte ints for DOCids and they have been running out of indexing space since they are pretty close to 2^32 pages already. Is that true?
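A rough sketch of the arithmetic behind that question, in Python. Whether Google really uses 32-bit document IDs is only the rumour repeated above, not something confirmed here; the sketch assumes unsigned 32-bit IDs and uses the "Searching 4,285,199,774 web pages" figure quoted elsewhere in this thread.

```python
# Assumption: document IDs are unsigned 32-bit integers (unconfirmed rumour).
DOC_ID_BITS = 32
max_doc_ids = 2 ** DOC_ID_BITS

# Web page count quoted elsewhere in this thread from Google's home page.
indexed_pages = 4_285_199_774

# How many IDs would be left, and how full the ID space would be.
remaining_ids = max_doc_ids - indexed_pages
fraction_used = indexed_pages / max_doc_ids

print(f"ID space:  {max_doc_ids:,}")
print(f"Remaining: {remaining_ids:,}")
print(f"Used:      {fraction_used:.2%}")
```

Under that assumption the web index alone would sit within about ten million IDs of the ceiling, which is at least consistent with the rumour, though it proves nothing about how Google actually stores its IDs.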
I am still waiting for a search engine that does topic matching instead of text matching. In other words, I would like the search engine to return a list of URLs with related topics instead of related text. As it is right now, all search engines, including Google, return pages that contain text equal or related to the input, but the pages might be 98% unrelated. I still can't consider the Internet a library of knowledge because of this.
For example, if one searches for "TCP/IP tutorials", it would return many unrelated links like posts in newsgroups, college lectures, etc.
And the press release doesn't say that they're indexing over 6B pages, so anyone who's saying that here is mistaken.
I was interested that they mentioned Google Print, which is Google's answer to Amazon's Search Inside feature, but hasn't got much press, and is pretty well hidden in Google itself.
You can check it out by limiting results to site print.google.com, e.g. searchterm site:print.google.com. (Not quite at Amazon-type numbers yet.)
Happy Trails!
Erick
http://www.busyweather.com/
With 6 billion pages indexed and cached, and maybe an average of 50K per page (which is probably pretty conservative - it's probably twice that in some cases), that's around 300TB, IICIC!!!
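A back-of-the-envelope check of that estimate, as a small Python sketch. It takes the figures above at face value (6 billion cached pages at an average of 50K each) and assumes decimal units, so 1 KB is 1,000 bytes and 1 TB is 1,000^4 bytes.

```python
# Figures taken from the estimate above; both are rough guesses.
pages = 6_000_000_000
avg_page_bytes = 50 * 1000  # "50K" read as 50,000 bytes (decimal units)

# Total cache size in bytes, then converted to decimal terabytes.
total_bytes = pages * avg_page_bytes
terabytes = total_bytes / 1000 ** 4

print(f"{terabytes:,.0f} TB")
```

Doubling the average page size, as suggested above, would simply double the result.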
The hard disk and RAID folks must LOVE Google....
SCREW THE ADS! http://adblock.mozdev.org/ Proud user of teh Fox of Fire - Registered Linux User #289618
That's a quote from the NYtimes (free req. yada yada) also posted as is here
If any other site were to track the stuff Google does,
Please note, this isn't a troll, and I'm not wearing a tin-foil hat (maybe I should?). Imagine the following scenario: a bomb goes off in the US. By tracing searches for "anarchist cookbook" to zipcodes within the area of the bomb blast, the FBI could have access to information that makes TIA look like a better alternative.
Maybe this isn't such a good feature after all...
have they beaten Ron Jeremy?
There is an interesting article in the Washington Post, "Search For Tomorrow", on Google and possible AI in search.
Some excerpts:
To see a world in a grain of sand, and then to step back and see the beach where the sand lies
That reminds me of an old Dilbert (paraphrasing here, forgive the small errors):
PHB: We've run out of accounting codes! We can't do anything without one!
Dilbert: Why not upgrade the system to accept larger codes?
PHB: To do that we'd need a budget and an accounting code
Dilbert: Why can't we reuse a code from an old finished project?
PHB: Strangely enough, we've never finished a project.
You can accomplish anything you set your mind to. The impossible just takes a little longer.
One should have a look at Google-Watch (tinfoil? maybe...) but they have some good points:
According to DEA, Google is breaking the law
Google Evil cookie
We got your number!
And so on...
Not to troll but rather a thought. Mod as you wish.
I wrote a project for our univ and submitted the URL to Google about 3 months ago. It still doesn't show up.
When will I end this grieving ? When will my future begin ?
Too bad the article doesn't mention how Google is trying to fight gaming of the PageRank system or any of the other problems, like commercials in the results. Still a great search tool though.
Both Google and Fast have image and picture search. They're all right. But I have had more luck with Lycos.
What are your experiences?
Of course, none of these services search in the image data itself. They search filenames, special features (like image size), and the content of the pages they are found in.
What is the state of searching in images today? Facial recognition systems have existed for a while, but they are made for a specific purpose.
How long before we can take a picture of that piece of your IKEA furniture and find the same model in pictures of celebrity houses, Babylon 5 sets and crime scenes? Or taking a picture of that familiar-looking person walking down the street, searching for her, and remembering that she was in that "reality" series two years ago.
Irene KHAAAAAAN!
"Google Image Search has been significantly updated," said Sergey Brin, Google co-founder and president of Technology. "We've doubled the index to more than 880 million images, enhanced search quality, and improved the user interface."
For Mac users, I recommend using Beholder to power your Google image search. Google's minimal UI changes notwithstanding.
(Mod +1 Self-Promotive)
I read that article and really disagreed with the premise. Google is good for indexing what's available online, but only a tiny fraction of recorded human knowledge is available online. I work for a digital libraries project, and after visiting the Joint Conference on Digital Libraries, I can tell you that it's a librarian's wet dream to be in the kind of situation that the article describes: where all the information that we have to stumble around libraries and microfiches for is Googlable. But the full texts of almost no books are available. Who's going to scan in millions of volumes? Who's going to pay for that? And most importantly, how are the publishers going to allow it? US and world copyright laws are keeping almost all the content from being eligible for online publication, even if their profit windows are long closed.
I encourage all of you who are in high school or have college papers to write to look beyond Google the next time you have to research something. You will find about fifty times as much information by looking in published volumes. Here's the technique I always use: visit a University library. Use the electronic card catalog to find a couple of titles that seem to match your topic. They will likely all have similar call numbers. Then, go browse the stacks around those call numbers. That will give you access to all the books available that are related to your topic, and on the next shelf over, are books that are tangentially related. Every time I do that, I find some fascinating angle on the subject matter I never even knew existed. The books you find will have references, and you can follow those to immense amounts of material more specifically related to the angle you've chosen. And none of it is on Google.
If you have trouble, go ask one of the friendly research librarians. They do a lot more than go around and "shhh!" you.
Google is a useful tool, but if you want real depth, from people who aren't tech savvy enough to put their full academic works online, the library is the only place to find it. Put in the time!
The thing that is starting to bother me is not the search-spam (easily removed over time with increasingly smart ranking), but the mailing lists. If 20 sites around the net archive the same mailing list, then I'll get the first 20 hits in most technical searches from the same list. Google really needs some way to identify duplicate archives (which is hard given that they're all formatted differently) and treat them as one "site".
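To make the idea concrete, here is a minimal sketch of one common approach to near-duplicate detection: compare pages by the overlap (Jaccard similarity) of their word shingles, so that differing archive chrome around the same message matters less. This is not a description of anything Google does; the function names, the shingle size of 3, the 0.5 threshold and the sample texts are all assumptions chosen for illustration.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in text, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def looks_duplicate(text_a, text_b, threshold=0.5):
    """Guess whether two pages carry the same message (threshold is arbitrary)."""
    return jaccard(shingles(text_a), shingles(text_b)) >= threshold


# Hypothetical sample: one message wrapped in two different archive layouts.
message = ("the patch fixes the buffer overflow in the parser "
           "when the input line exceeds the static limit")
archive_a = "List Archive | Prev | Next\n" + message
archive_b = "<< Thread index >>\n" + message + "\nHosted by example.org"
unrelated = "anyone know a good recipe for sourdough bread that works at high altitude"

print(looks_duplicate(archive_a, archive_b))
print(looks_duplicate(archive_a, unrelated))
```

Real archives wrap each message in far more navigation and boilerplate than this toy sample, so a practical version would need to strip page furniture first or work at a much larger scale than pairwise comparison allows.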
... Advanced features include search by image size, format (JPEG and/or GIF) ...
They didn't mention PNG, the turbo-studly image format which Google Image Search does indeed support.
It seems they used to have very few PNGs in their database, but now a search for +a filetype:png returns 700,000 results!
Phillip
http://www.google.com/contact/spamreport.html
This will give you options of reporting cloaked pages, doorway pages, deceptive redirects, misleading or repeated words, hidden text, etc. You have to be more specific than the "help us improve" link at the bottom of search results. Using this form I've seen abusive sites disappear from Google's index in less than 12 hours.