Google's Bigger Index
WebGangsta writes "Google Inc. today announced it expanded the breadth of its web index to more than 6 billion items. This innovation represents a milestone for Internet users, enabling quick and easy access to the world's largest collection of online information."
While I love google, this is so obviously just a link to a press release, and even worse the first line of the press release cut-and-pasted onto slashdot's page. And is going past 6 billion really that important?
Combination - fun iPhone puzzling
No it doesn't. It represents a pretty reasonable upgrade for Google.
It's expected: as the web grows, so will the search engines.
This isn't exactly a man-on-the-moon accomplishment.
I don't need no instructions to know how to rock!!!!
It just means bigger. There may well be innovation in the technology which allows bigger, that might have been news for nerds, but bigger itself isn't innovative.
Government of the people, by corporate executives, for corporate profits.
Anyone else find it funny that Google has around one item for every man woman and child on earth?
I'd find it funnier if every man woman and child on earth at least had unrestricted access to Google and everything it links to.
A subject-based image search would require people to state what the subject was. That might be an important step towards a semantic web, if you include everything on the web, rather than just images.
Happy Trails!
Erick
http://www.busyweather.com/
Google's value seems to be in cutting out the crap in its bandwidth... look at their page loads (2.6k plus 8.4k for the image) versus Yahoo! (30k plus images, plus ads). And the less said about AV or Lycos in that regard, the better. Not to mention that Yahoo has basically just co-opted Google, but with more fat around the edges.
A press release complete with corporate speak!
"This innovation represents a milestone for Internet users, enabling quick and easy access to the world's largest collection of online information."
This is just google doing what they are already well known for doing best. There's nothing new or 'innovative' here. While it's a fine accomplishment, and I'm pleased google has indexed that much stuff, it's hardly innovative for them.
Need a Python, C++, Unix, Linux develop
We've got over 6 billion entries, but let's return garbage for most queries, making sure the good stuff is in the "sponsored links" or sidebars. At least it's a good business model.
In the same sense that I find it funny that my book collection contains about 6 billion words, one for every man, woman and child on earth.
In other words, no, can't say that I do.
Not only is it an entirely artificial milestone devoid of meaning even in the sense of interesting coincidence, it's an artificially created "milestone" for the purpose of pointing it out.
Any marketing department can churn out such by the barrel full.
KFG
I've gathered information from blogs that isn't available anywhere else. When searching on how to set up my wireless SMC network card with Linux, the only source I could find was a blog hit, and it got me running in no time. Don't discount blogs so quickly!
They generally do not track where people click. There are exceptions (ads and in the occasional quality control), but most of the time, your links are direct to the page. They can't track that.
Second, the other information is the same information most websites collect in their logs.
This really isn't a big deal and it happens all the time when building large systems. I don't know how their system works specifically, but you just change the transient in-memory representations to 64bit by recompiling, and for the on-disk stuff you create a new format using 64bits but still recognize the old format. That way, you have to convert nothing and you will be migrating to 64bit representations as needed. I'm sure Google has managed to deal with much more complex engineering problems than that.
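To make the "new format, but still recognize the old one" idea concrete, here is a minimal sketch. I have no idea what Google's real on-disk format looks like, so everything here is an assumption made up for illustration: the one-byte version tag, the record layout, and the function names are all hypothetical.

```python
import struct

# Hypothetical on-disk layout, invented for this example:
#   1 byte format version, followed by one document ID.
#   Version 1 stores a 32-bit unsigned ID, version 2 a 64-bit unsigned ID.
V1 = 1
V2 = 2


def write_record_v1(doc_id: int) -> bytes:
    """Old writer: 32-bit IDs only (raises struct.error above 2**32 - 1)."""
    return struct.pack("<BI", V1, doc_id)


def write_record(doc_id: int) -> bytes:
    """New writer: always emits the 64-bit format."""
    return struct.pack("<BQ", V2, doc_id)


def read_record(buf: bytes) -> int:
    """Reader that understands both formats, so old data needs no conversion."""
    version = buf[0]
    if version == V1:
        (doc_id,) = struct.unpack_from("<I", buf, 1)
    elif version == V2:
        (doc_id,) = struct.unpack_from("<Q", buf, 1)
    else:
        raise ValueError(f"unknown record version {version}")
    return doc_id
```

Old records keep working as they are, and anything that gets rewritten comes out in the 64-bit format, which is the "migrate as needed" part.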
Google has become so flooded with internet crap that it's quickly losing its status as a useful tool. Google needs some form of moderation to move out the superfluous blog entries and advertising fronts so it can someday become as useful as it once was.
Ah, right. Then the various zealots that you already get on Slashdot can moderate pages they don't like out of existence. You know, the people who have a pet platform and will call anybody a "Troll" that is critical of their pet platform.
As far as I know, image search in the way you want it is still only a dream. But. Approx 2 years ago I attended a conference focused (mainly) on theoretical computer science. I saw some researchers (I think they were from Italy, not sure) present an early implementation of their algorithm to look for similar images to the one you select.
The idea behind it: for a computer, it's not easy to tell what exactly an image contains. E.g. take all those "type the word you see above inside this box to prove you are not a bot" registration forms. If there are no working algorithms to tell "this image contains the word SLASHDOT written in yellow and blue stripes on a pink-dotted black background", the chances of creating an algorithm to tell "this is a game of tennis, it is probably played in the afternoon somewhere in England" are really low.
However, by using various approaches from CG (comp. graphics), you MAY be able to tell whether two images are similar or not -- as simple examples consider edge detection, color spectrum, etc. As I already mentioned, such algorithms have already been implemented and their success ratio is already reasonably high. I expect that it won't take long until we see them on google.
Note that using the ideas above you CAN search for an image with a given subject -- it just requires two stages. Suppose you want an image of a sun setting down somewhere in the mountains. Stage 1. You enter "sunset" into google's present search engine. You get lots of sunsets, several dogs named Sunset, a chinese girl Sun Set, etc. Then you select one of the sunsets most resembling the image you want and you tell google (or some other engine) to find all similar images. Et voila.
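For the curious, stage 2 can be sketched with one of the simple cues mentioned above, the color spectrum. This is not the algorithm those researchers presented (I don't remember its details); it is just a plain color-histogram comparison, and the function names, the bin count, and the representation of an image as a non-empty list of (r, g, b) tuples are all my own assumptions.

```python
from collections import Counter


def color_histogram(pixels, bins=4):
    """Normalized histogram over coarse RGB buckets.

    `pixels` is assumed to be a non-empty list of (r, g, b) tuples, 0-255 each.
    """
    step = 256 // bins
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bucket: n / total for bucket, n in counts.items()}


def similarity(h1, h2):
    """Histogram intersection: 1.0 for identical color mixes, 0.0 for disjoint ones."""
    return sum(min(share, h2.get(bucket, 0.0)) for bucket, share in h1.items())


def find_similar(query_pixels, candidates, top_n=3):
    """Rank candidate images (name -> pixel list) by similarity to the query image."""
    query_hist = color_histogram(query_pixels)
    scored = [
        (similarity(query_hist, color_histogram(pixels)), name)
        for name, pixels in candidates.items()
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_n]]
```

You would feed it the sunset you picked in stage 1 as `query_pixels` and the rest of the index as `candidates`. Color alone is a crude cue (a sunset and an orange sofa look alike to it), which is why real systems combine it with things like edge detection.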
Too bad the article doesn't mention how google is trying to fight gaming the PageRank system or any of the other problems like commercials in the results. Still a great search tool though.
Why was this informative?
The summary says 6 billion items, not webpages... and the linked-to article explicitly breaks down the 6 billion items into those same stats.
If only people would read the actual article.
I totally agree. These days, whenever I use google, I always include "-search" in my search. Cleans it right up :)
I read that article and really disagreed with the premise. Google is good for indexing what's available online, but only a tiny fraction of recorded human knowledge is available online. I work for a digital libraries project, and after visiting the Joint Conference on Digital Libraries, I can tell you that it's a librarian's wet dream to be in the kind of situation that the article describes: where all the information that we have to stumble around libraries and microfiches for is Googlable. But the full texts of almost no books are available. Who's going to scan in millions of volumes? Who's going to pay for that? And most importantly, how are the publishers going to allow it? US and world copyright laws are keeping almost all the content from being eligible for online publication, even if their profit windows are long closed.
I encourage all of you who are in high school or have college papers to write to look beyond Google the next time you have to research something. You will find about fifty times as much information by looking in published volumes. Here's the technique I always use: visit a University library. Use the electronic card catalog to find a couple of titles that seem to match your topic. They will likely all have similar call numbers. Then, go browse the stacks around those call numbers. That will give you access to all the books available that are related to your topic, and on the next shelf over, are books that are tangentially related. Every time I do that, I find some fascinating angle on the subject matter I never even knew existed. The books you find will have references, and you can follow those to immense amounts of material more specifically related to the angle you've chosen. And none of it is on Google.
If you have trouble, go ask one of the friendly research librarians. They do a lot more than go around and "shhh!" you.
Google is a useful tool, but if you want real depth, from people who aren't tech savvy enough to put their full academic works online, the library is the only place to find it. Put in the time!
Guess I better call the whaaaaambulance :-(
BTW - can you believe that a large number of the visitors we get come from people who do a search on "goofball.com"? Wow.
Why are people getting so upset about Google logging the exact same information as most other websites? Yes, they log your ip, your browser, what you got, where you came from and when you were there. So do I! So does Slashdot! So does every other major search engine. And, if someone is so worried about cookies, disable them. It's easy enough to do. This GoogleWatch site is incredibly biased and simply draws on people's fears. If you don't like Google, don't use it.
the folks at google could invent a -spam option, so those searching for 'diode wave guide' wouldn't have to put -dildo, but just include a -spam
At the bottom of the page, under the second search box, is a phrase "Dissatisfied with your search results? Help us improve." - Follow it and the form will ask you to:
- Please tell us what specific information you were seeking. Also tell us why you were dissatisfied with the search results.
- Were you looking for a specific URL that wasn't listed in the search results? If so, please enter the URL here.
--HUMANS do it better
How many pictures does google have to index again? A lot. Sure, google has huge racks of clusters, but they are expanding pretty fast as it is. Does Google really want to add a bunch of racks to add a feature that maybe 20% of the people would use? I honestly don't know. I do know that google, like any company, will add features that are easy and cheap to implement, but probably won't if it means adding rack upon rack of servers.
Even those who arrange and design shrubberies are under considerable economic stress at this period in history.