Google's Bigger Index

Most press-release like post ever by Chris_Jefferson · 2004-02-17 04:24 · Score: 5, Insightful

While I love google, this is so obviously just a link to a press release, and even worse the first line of the press release cut-and-pasted onto slashdot's page. And is going past 6 billion really that important?

--
Combination - fun iPhone puzzling

Re:Most press-release like post ever by twilight30 · 2004-02-17 04:28 · Score: 5, Insightful

What sucks about the press release (indeed, makes it sooo press releasy) is the total lack of anything that makes it useful:
* "...to 6bn" : From what number before?

And I still can't find what I'm looking for! (pun definitely not intended)

--
========================================
Death will come, and will have your eyes
-- Pavese

"...represents a milestone..." by stratjakt · 2004-02-17 04:26 · Score: 5, Insightful

No it doesn't. It represents a pretty reasonable upgrade for Google.

It's expected that as the web grows, so will the search engines.

This isn't exactly a man-on-the-moon accomplishment.

--
I don't need no instructions to know how to rock!!!!

Re:"...represents a milestone..." by KFury · 2004-02-17 06:49 · Score: 2, Insightful

Perhaps you should look up the definition of a 'milestone'. It's a marker by the side of the road, indicating the passing of a cognitive reference point (mile, or other round measure).

6 billion items is just that, a milestone.

--

Kevin Fox

Since when did bigger == innovation? by Moderation+abuser · 2004-02-17 04:29 · Score: 5, Insightful

It just means bigger. There may well be innovation in the technology which allows bigger, that might have been news for nerds, but bigger itself isn't innovative.

--
Government of the people, by corporate executives, for corporate profits.

Re:Heh by Attaturk · 2004-02-17 04:31 · Score: 5, Insightful

Anyone else find it funny that Google has around one item for every man woman and child on earth?

I'd find it funnier if every man woman and child on earth at least had unrestricted access to Google and everything it links to.

Re:It's only a matter of time.. by Tango42 · 2004-02-17 04:32 · Score: 2, Insightful

A subject-based image search would require people to state what the subject was. That might be an important step towards a semantic web, if you include everything on the web, rather than just images.

Caveat Emptor by erick99 · 2004-02-17 04:34 · Score: 5, Insightful

Google is my favorite search engine. That said, I hope that most folks understand that just because they "google" something does not make that something a fact. Also, the first few pages of any search can be the result of manipulation to get in the top 10, 20 or 100. It is really, really important to consider the source when doing any kind of research on the 'net. I am homeschooling my 13 year old and having a hell of a time getting these lessons across to him. He can research almost anything in a fraction of a second, but it takes a bit longer to separate the wheat from the chaff.

Happy Trails!

Erick

--
http://www.busyweather.com/

SPEED is the answer by codeshack · 2004-02-17 04:38 · Score: 3, Insightful

Google's value seems to be in cutting out the crap in its bandwidth... look at their page loads (2.6k plus 8.4k for the image) versus Yahoo! (30k plus images, plus ads). And the less said about AV or Lycos in that regard, the better. Not to mention that Yahoo has basically just co-opted Google, but with more fat around the edges.

Whee, it's a press release by Omnifarious · 2004-02-17 04:38 · Score: 2, Insightful

A press release complete with corporate speak!

"This innovation represents a milestone for Internet users, enabling quick and easy access to the world's largest collection of online information.".

This is just google doing what they are already well known for doing best. There's nothing new or 'innovative' here. While it's a fine accomplishment, and I'm pleased google has indexed that much stuff, it's hardly innovative for them.

--
Need a Python, C++, Unix, Linux develop

Google's strategy becoming clear by polymorpheus · 2004-02-17 04:39 · Score: 2, Insightful

We've got over 6 billion entries, but let's return garbage for most queries, making sure the good stuff is in the "sponsored links" or sidebars. At least it's a good business model.

Re:Heh by kfg · 2004-02-17 04:42 · Score: 4, Insightful

In the same sense that I find it funny that my book collection contains about 6 billion words, one for every man, woman and child on earth.

In other words, no, can't say that I do.

Not only is it an entirely artificial milestone devoid of meaning even in the sense of interesting coincidence, it's an artificially created "milestone" for the purpose of pointing it out.

Any marketing department can churn out such by the barrel full.

KFG

Re:is it just me? by Anonymous Coward · 2004-02-17 04:43 · Score: 2, Insightful

I've gathered information from blogs that isn't available anywhere else. When searching on how to set up my wireless smc network card with linux the only source I could find was a blog hit and it got me running in no time. Don't discount blogs so quickly!

The NY Times is partly wrong by Anonymous Coward · 2004-02-17 04:43 · Score: 1, Insightful

They generally do not track where people click. There are exceptions (ads and in the occasional quality control), but most of the time, your links are direct to the page. They can't track that.

Second, the other information is the same information most websites collect in their logs.

oh, come on by ajagci · 2004-02-17 04:46 · Score: 2, Insightful

This really isn't a big deal and it happens all the time when building large systems. I don't know how their system works specifically, but you just change the transient in-memory representations to 64bit by recompiling, and for the on-disk stuff you create a new format using 64bits but still recognize the old format. That way, you have to convert nothing and you will be migrating to 64bit representations as needed. I'm sure Google has managed to deal with much more complex engineering problems than that.
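A minimal sketch of the on-disk half of that idea, in Python. Everything here is an assumption for illustration: the `IDX1`/`IDX2` tags, the flat list-of-IDs layout, and the function names are hypothetical and say nothing about how Google's index is actually stored. The point is only that new files get 64-bit fields while the reader still accepts the old 32-bit format, so nothing has to be converted up front.

```python
import struct

MAGIC_V1 = b"IDX1"  # hypothetical tag for the old 32-bit format
MAGIC_V2 = b"IDX2"  # hypothetical tag for the new 64-bit format


def write_ids(ids, wide=True):
    """Serialize document IDs; pass wide=False only to produce old-style data."""
    magic, fmt = (MAGIC_V2, "<Q") if wide else (MAGIC_V1, "<I")
    return magic + b"".join(struct.pack(fmt, i) for i in ids)


def read_ids(blob):
    """Read either format and return plain Python ints."""
    magic, body = blob[:4], blob[4:]
    if magic == MAGIC_V1:
        fmt, size = "<I", 4   # old files: unsigned 32-bit fields
    elif magic == MAGIC_V2:
        fmt, size = "<Q", 8   # new files: unsigned 64-bit fields
    else:
        raise ValueError("unknown index format")
    return [struct.unpack_from(fmt, body, off)[0]
            for off in range(0, len(body), size)]
```

Old files stay readable as they are, and any ID past 2^32 - 1 simply cannot be written in the old format, so such data ends up in the new one as needed.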

Re:is it just me? by ajagci · 2004-02-17 04:50 · Score: 2, Insightful

Google has become so flooded with internet crap that it's quickly losing its status as a useful tool. Google needs some form of moderation to move out the superfluous blog entries and advertising fronts so it can someday become as useful as it always was.

Ah, right. Then the various zealots that you already get on Slashdot can moderate pages they don't like out of existence. You know, the people who have a pet platform and will call anybody a "Troll" that is critical of their pet platform.

Re:It's only a matter of time.. by misof · 2004-02-17 04:51 · Score: 5, Insightful

As far as I know, image search in the way you want it is still only a dream. But. Approx 2 years ago I attended a conference focused (mainly) on theoretical computer science. I saw some researchers (I think they were from Italy, not sure) present an early implementation of their algorithm to look for similar images to the one you select.

The idea behind it: for a computer, it's not easy to tell what exactly an image contains. E.g. take all those "type the word you see above inside this box to prove you are not a bot" registration forms. If there are no working algorithms to tell "this image contains the word SLASHDOT written in yellow and blue stripes on a pink-dotted black background", the chances of creating an algorithm to tell "this is a game of tennis, it is probably played in the afternoon somewhere in England" are really low.

However, by using various approaches from CG (comp. graphics), you MAY be able to tell whether two images are similar or not -- as simple examples consider edge detection, color spectrum, etc. As I already mentioned, such algorithms have already been implemented and their success ratio is already reasonably high. I expect that it won't take long until we see them on google.

Note that using the ideas above you CAN search for an image with a given subject -- it just requires two stages. Suppose you want an image of a sun setting down somewhere in the mountains. Stage 1. You enter "sunset" into google's present search engine. You get lots of sunsets, several dogs named Sunset, a chinese girl Sun Set, etc. Then you select one of the sunsets most resembling the image you want and you tell google (or some other engine) to find all similar images. Et voila.
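To make stage 2 concrete, here is a rough Python sketch of one of the simple measures mentioned above, the color spectrum. It is not the algorithm from the conference talk. It assumes images are already decoded into lists of (r, g, b) tuples, and the function names and the 4-buckets-per-channel choice are made up for illustration. Each image is reduced to a coarse color histogram, and candidates are ranked by histogram intersection against the image the user picked.

```python
def color_histogram(pixels, bins=4):
    """Count pixels per coarse RGB bucket and normalize to fractions."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1
    total = len(pixels) or 1
    return [count / total for count in hist]


def similarity(hist_a, hist_b):
    """Histogram intersection: near 1.0 for the same color mix, 0.0 for disjoint."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


def most_similar(query, candidates):
    """Rank candidate images (name -> pixels) against the picked image."""
    q = color_histogram(query)
    scored = [(similarity(q, color_histogram(p)), name)
              for name, p in candidates.items()]
    return [name for score, name in sorted(scored, reverse=True)]
```

A color histogram ignores where the colors sit in the picture, so a real system would likely combine it with something shape-aware such as the edge detection mentioned above.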

Size and Criteria are good, but... by mugnyte · 2004-02-17 04:59 · Score: 5, Insightful

Too bad the article doesn't mention how google is trying to fight gaming the PageRank system or any of the other problems like commercials in the results. Still a great search tool though.

Re:They said 6 billion items, not webpages. by Anonymous Coward · 2004-02-17 05:01 · Score: 1, Insightful

Why was this informative?

The summary says 6 billion items, not webpages... and the linked-to article explicitly breaks down the 6 billion items into those same stats.

If only people would read the actual article.

Re:What I want to know... by samcentral2000 · 2004-02-17 05:04 · Score: 5, Insightful

I totally agree. These days, whenever I use google, I always include "-search" in my search. Cleans it right up :)

I really don't agree with that article by PollGuy · 2004-02-17 05:14 · Score: 4, Insightful

I read that article and really disagreed with the premise. Google is good for indexing what's available online, but only a tiny fraction of recorded human knowledge is available online. I work for a digital libraries project, and after visiting the Joint Conference on Digital Libraries, I can tell you that it's a librarian's wet dream to be in the kind of situation that the article describes: where all the information that we have to stumble around libraries and microfiches for is Googlable. But the full texts of almost no books are available. Who's going to scan in millions of volumes? Who's going to pay for that? And most importantly, how are the publishers going to allow it? US and world copyright laws are keeping almost all the content from being eligible for online publication, even if their profit windows are long closed.

I encourage all of you who are in high school or have college papers to write to look beyond Google the next time you have to research something. You will find about fifty times as much information by looking in published volumes. Here's the technique I always use: visit a University library. Use the electronic card catalog to find a couple of titles that seem to match your topic. They will likely all have similar call numbers. Then, go browse the stacks around those call numbers. That will give you access to all the books available that are related to your topic, and on the next shelf over, are books that are tangentially related. Every time I do that, I find some fascinating angle on the subject matter I never even knew existed. The books you find will have references, and you can follow those to immense amounts of material more specifically related to the angle you've chosen. And none of it is on Google.

If you have trouble, go ask one of the friendly research librarians. They do a lot more than go around and "shhh!" you.

Google is a useful tool, but if you want real depth, from people who aren't tech savvy enough to put their full academic works online, the library is the only place to find it. Put in the time!

META Tags by JSkills · 2004-02-17 05:15 · Score: 2, Insightful

I thought this re-index would finally pick up our "description" meta tag and actually use it. Nope. Instead we still get the same concatenated list of links that are in our left nav bar as our description when people find us in google search results. They have a "description" listed, but it looks like something they made up themselves?

Guess I better call the whaaaaambulance :-(

BTW - can you believe that a large number of the visitors we get are people who do a search on "goofball.com"? Wow.

Tinfoil... maybe? Hahaha. Try yes! by Anonymous Coward · 2004-02-17 05:33 · Score: 2, Insightful

Why are people getting so upset about Google logging the exact same information as most other websites? Yes, they log your ip, your browser, what you got, where you came from and when you were there. So do I! So does Slashdot! So does every other major search engine. And, if someone is so worried about cookies, disable them. It's easy enough to do. This GoogleWatch site is incredibly biased and simply draws on people's fears. If you don't like Google, don't use it.

Re:cable descrambler by opello · 2004-02-17 06:02 · Score: 2, Insightful

the folks at google could invent a -spam option, so those searching for 'diode wave guide' wouldn't have to put -dildo, but just include a -spam
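A toy sketch of how such an option could behave, in Python. This is purely hypothetical: no `-spam` operator is claimed to exist, and `SPAM_TERMS`, `parse_query` and `matches` are invented names. A `-word` token excludes that word, and `-spam` is treated as shorthand for a whole list of exclusions.

```python
# Hypothetical blocklist that a "-spam" option could expand to.
SPAM_TERMS = ("cheap", "buy", "discount")


def parse_query(query):
    """Split a query into required words and '-'-prefixed excluded words."""
    required, excluded = [], []
    for token in query.lower().split():
        if token == "-spam":
            excluded.extend(SPAM_TERMS)   # one flag stands in for many exclusions
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            required.append(token)
    return required, excluded


def matches(text, query):
    """True if text has every required word and none of the excluded ones."""
    words = set(text.lower().split())
    required, excluded = parse_query(query)
    return (all(w in words for w in required)
            and not any(w in words for w in excluded))
```

The hard part is of course deciding what goes on the list, which a fixed set of words like this does not solve.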

Better way to tell Google of bad results by sam1am · 2004-02-17 06:05 · Score: 3, Insightful

Better than that spam report form for problems with particular searches is the Quality Feedback Form which includes the information about your search for better followup:
At the bottom of the page, under the second search box, is a phrase "Dissatisfied with your search results? Help us improve." - Follow it and the form will ask you to:

Please tell us what specific information you were seeking. Also tell us why you were dissatisfied with the search results.
Were you looking for a specific URL that wasn't listed in the search results? If so, please enter the URL here..

--
HUMANS do it better

Massive computing power by dj245 · 2004-02-17 08:29 · Score: 2, Insightful

While I would love to see such a thing on Google, I do not think such a thing is really plausible at this time. First of all, it takes massive computing power to process such a vast quantity of data. When I process my database of images for duplicates and similar images, it usually takes over 10 hours to generate index jpgs and CRCs and to compare them on my athlon 1800+ with 640mb ram. And I "only" have about 130,000 pictures of, uh, family and friends.
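For the exact-duplicate half of that job, a rough Python sketch of the CRC approach looks like this. It assumes the files are already loaded as a dict of name to bytes, which is an illustration choice and not how the poster's tool works. Grouping by CRC-32 avoids comparing every file against every other one, and since two different files can share a CRC, each bucket is confirmed with a real byte comparison.

```python
import zlib
from collections import defaultdict


def find_duplicates(files):
    """Return groups of names with identical bytes; files maps name -> bytes."""
    by_crc = defaultdict(list)
    for name, data in files.items():
        by_crc[zlib.crc32(data)].append(name)

    groups = []
    for names in by_crc.values():
        # CRCs can collide, so confirm against the first file in the bucket.
        first = files[names[0]]
        same = [n for n in names if files[n] == first]
        if len(same) > 1:
            groups.append(sorted(same))
    return groups
```

This only catches byte-identical copies. Finding similar images, such as resized or recompressed ones, needs a content-based comparison and is where most of the computing time would go.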

How many pictures does google have to index again? A lot. Sure, google has huge racks of clusters, but they are expanding pretty fast as it is. Does Google really want to add a bunch of racks to add a feature that maybe 20% of the people would use? I honestly don't know. I do know that google, like any company, will add features that are easy and cheap to implement, but probably won't if it means adding rack upon rack of servers.

--
Even those who arrange and design shrubberies are under considerable economic stress at this period in history.
