Google's Bigger Index
WebGangsta writes "Google Inc. today announced it expanded the breadth of its web index to more than 6 billion items. This innovation represents a milestone for Internet users, enabling quick and easy access to the world's largest collection of online information."
... this will lead to an increase in the integrity of PageRank(TM), and vintage Google will return in all her glory.
...yeah, but it would only be 2 billion items if all the Janet Jackson stuff was removed. ;-)
How many of these 6 billion items are in the form of www.massivepopups.com/your_search_term.html
What are we going to do tonight Brain?
Try to find me! ;-)
;-)
No... er... wait, no, on second thoughts, don't search for me I don't want " these pictures " to surface again... I was young and I needed the money and all that...
[Posted anonymously to protect the guilty, of course!]
Anyone else find it funny that Google has around one item for every man woman and child on earth?
eclecti.cc
While I love google, this is so obviously just a link to a press release, and even worse the first line of the press release cut-and-pasted onto slashdot's page. And is going past 6 billion really that important?
Combination - fun iPhone puzzling
And of course, 2 billion of that is goddamn blogs.
What we call folk wisdom is often no more than a kind of expedient stupidity.-Edward Abbey
What's going on here? It isn't like Google to put out a press release just because the index size passed a round number.
Is Google setting up for its IPO and therefore becoming less like the Google we know and love?
Did they hit some sort of internal limit just above 4 billion? Were they using an unsigned int? Is that why all these extra items are in a "supplemental" index?
There will soon be more web pages indexed in Google than people. I, for one, welcome our HTML overlords!
They beat McDonalds.
Man, just imagine.. how much of this information is the thoughts of stereotypical American teenage girls... And webpages about how so much stuff is "cute" and how great their webpage is.... ... Someone forgot to flush ...
Google Inc. today announced it expanded the breadth of its web index to more than 6 billion items.
One for every man, woman, and child. Sounds exactly like the thinking of a machine to me.
Slashdot Syndrome: the sudden, extreme urge to correct someone in order to validate one's self.
In a related story Booble's index just expanded to a Double-D.
Little boys across the globe will have sore arms tomorrow.
At least *try* to obscure the fact that this was taken from a press release. Slashdot is beginning to sound like the Iraqi Information Minister...
I'm waiting for them to come up with a sound search and an image search that look at the subject of the image rather than its file name. After that I'm not sure what's left. Maybe comparative searches for sounds and images, where you can upload a source to compare? Who knows! I hope these guys don't follow the normal path of spiralling into inconsequence after they go public.
I have Google as my number one source of information on the internet. I hope they will keep going like they have for years and years, and (no, I dislike monopolies) that they will withstand competition from others such as MSN. Which I believe will be the truth.
:)
Not that this is something to celebrate, because having 6 billion pages alone does not tell me that they're the greatest of all search engines and will exist for a long time. And it's not like it's some kind of jubilee. But still: Way to go Google!
6 billion would mean about one page for every person in the world! Wehee! (According to the UN, the world's population is about 6 000 000 000.)
Quantum hacker.
...that remarkably, a full five-sixths of the content consisted of different versions of the Google logo.
2^32 = 4.29 x 10^9
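To put numbers on the unsigned-int theory, here is a quick sketch comparing the 32-bit limit with the page count from Google's home page as quoted elsewhere in this thread. Whether Google really stores document IDs in 32 bits is pure speculation here; the function name `headroom` is made up for illustration.

```python
# Number of distinct values a 32-bit unsigned integer can hold.
UINT32_VALUES = 2 ** 32

# Page count shown on the Google home page, as quoted in this thread.
HOMEPAGE_COUNT = 4_285_199_774


def headroom(count=HOMEPAGE_COUNT, bits=32):
    """Return how many more IDs fit into an unsigned integer of `bits` bits.

    A negative result means `count` does not fit at all.
    """
    return 2 ** bits - count


if __name__ == "__main__":
    print(f"2^32 = {UINT32_VALUES:,}")
    print(f"IDs left below 2^32: {headroom():,}")
    print(f"6 billion items fit in 32 bits: {headroom(6_000_000_000) >= 0}")
    print(f"6 billion items fit in 64 bits: {headroom(6_000_000_000, bits=64) >= 0}")
```

If the theory were right, the web index would be sitting within about ten million pages of the ceiling, which is at least consistent with the rumours, though it proves nothing.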
Does it sound to anybody else like the rumours of Google hitting a dead end in the number of index positions for the websearch are true? Especially given that it has been more than a year since they announced 4 billion.
Apparently pagerank assigns an unsigned int to every page as id, and their index is so huge they cannot convert it to a 64 bit number. (You wonder why they didn't think of that 2-billion pages ago when a UTF8 like solution would still have been possible).
...is how to get rid of those pseudo-pages in Google. The ones with names like "thing_that_youre_searching_for.html", and all they are is either a page of dead links to crap on ebay, or a "Hey, we do great searches for your stuff".
No it doesn't. It represents a pretty reasonable upgrade for Google.
It's expected as the web grows, so will the search engines.
This isn't exactly a man-on-the-moon accomplishment.
I don't need no instructions to know how to rock!!!!
Google has become so flooded with internet crap that it's quickly losing its status as a useful tool. Google needs some form of moderation to move out the superfluous blog entries and advertising fronts so it can someday become as useful as it always was.
transmission_err
Search for any normal product name with Google. What did you use to get? Billions of useless sites that cross-link to each other and have the same bloody reviews from amazon.com
:^)
That seems to have changed!
I just tried a search on television antennas and for once the results seem relevant.
Hooray!! Google is back!!
Sunny Dubey
</curious>
Any dude can configure his cgi script to have an ".html" address.
How can we be sure that those billion new pages don't come from a dynamically generated list of prime numbers? (And more interestingly, how does Google know how to stop before infinity?)
I however find my post while googling for words they also contain.
How can one explicitly forbid Google from indexing a site?
Sorry, I'll keep using Altavista.
Trolling using another account since 2005.
Those Link farms were starting to sweat a bit about increasing the scale of their operations by another factor of 10.
I don't want MORE things to search for, I want it to return more relevant results. I know that the information I usually search for is out there; the problem is that there's so much chaff out there that I can't find what I want. No matter what I search for, there are at least 2 or 3 responses related to porn. I understand that there is a lot of variety of porn out there, but come on... Search engines are getting even worse by throwing in search results that are hardly relevant, just because they got paid money by the company. I would even be willing to pay for a "google membership" if they eliminated the advertisers mixed in with search results and maybe gave me another special feature or 2. I'd want a search engine that returns just 1 or 2 good results over one that returns 5 good results mixed in with 200 bad ones.
"Google Print"? Sounds neat. Is there a beta page for that yet?
Notice that they claim that they search 6 billion items, but the home page only claims that they're "Searching 4,285,199,774 web pages".
To find the rest, we need to use Google's other services. The image search is claiming "Searching 880,000,000 images". Google Groups says it's "Searching 845,000,000 messages". Add those to the count and you get 6,010,199,774 items total.
I do hope they manage to sort out their recent indexing problems first. For many searches altavista is now showing far more relevant search results than google, since their attempted cull of 'spam' sites last december kind of backfired. They have improved things this year, but the quality of their search results is not as good as it was last year. Now, they need to figure out how to get rid of all the useless sites that are just shopping directories full of espotting URLs and similar and with no real content. Funnily enough, their anti-spamsite code seemed to actually promote these up the rankings on many search terms, while penalising many sites containing genuine content.
Many people said that Google were using deliberate tactics to encourage small e-commerce websites to spend more on adwords, but I believe this wasn't deliberate - their index is so big that they simply can't tell what the results of their changes are going to do to the search orders for all the search options that people are going to use - and they simply didn't realise in advance the problems they were going to cause. And google have made efforts to minimise the damage since then, but they still need to do more.
Jolyon
Please read my Canon EOS tech blog at http://www.everyothershot.com
Now convince the government to do it. I think this plan is good but it would have to get past the greed of corporations and politicians. But it does make sense - I say do it
It just means bigger. There may well be innovation in the technology which allows bigger, that might have been news for nerds, but bigger itself isn't innovative.
Government of the people, by corporate executives, for corporate profits.
Hope they fix the "search engine spamming", there's no way to find anything in Google anymore.
so much for the link to Google, I never would have found it otherwise.
I heard that Google is using 4-byte ints for DOCids and they have been running out of indexing space since they are pretty close to 2^32 pages already. Is that true?
February 17, 2004 08:02 AM US Eastern Timezone
Google Achieves Search Milestone with Immediate Access to More Than 6 Billion Items
MOUNTAIN VIEW, Calif.--(BUSINESS WIRE)--Feb. 17, 2004--
Google Connects Searchers to World's Most Comprehensive Index; Increases Web Page and Image Collections
Google Inc. today announced it expanded the breadth of its web index to more than 6 billion items. This innovation represents a milestone for Internet users, enabling quick and easy access to the world's largest collection of online information.
"People worldwide can find more information with Google than with any other search engine," said Larry Page, Google co-founder and president of Products.
Google's collection of 6 billion items comprises 4.28 billion web pages, 880 million images, 845 million Usenet messages, and a growing collection of book-related information pages. Web surfers worldwide can now search across Google's collection of items using the following services:
-- Google Web Search: The company's flagship search service now offers 4.28 billion web pages. Google's powerful and scalable technology searches this information and delivers a list of relevant results in an instant. Google Web Search also enables users to search for numerous non-HTML files, including PDF, Microsoft Office, and Corel documents.
-- Google Image Search: Comprising more than 880 million images, Google Image Search enables users to find electronic images relevant to a wide variety of topics. Advanced features include search by image size, format (JPEG and/or GIF), coloration, and the ability to restrict searches to specific sites or domains.
-- Google Groups: This 20-year archive of Usenet conversations is the largest of its kind and serves as a powerful reference tool, while offering insight into the history and culture of the Internet. Google Groups offers more than 845 million postings in more than 35,000 topical categories.
-- Google Print: A test service that enables Google users to immediately access a range of book related information, such as first chapters, reviews, and bibliographic information. These pages also offer users links to directly purchase titles.
"Google Image Search has been significantly updated," said Sergey Brin, Google co-founder and president of Technology. "We've doubled the index to more than 880 million images, enhanced search quality, and improved the user interface."
Today's news follows the announcement last week that Google received eight awards in the 4th Annual Search Engine Watch Awards, which recognize outstanding achievements in web searching. Google was recognized as the "Outstanding Search Service," for outstanding performance in helping internet users locate information from across the Web. Google has received this distinction every year since the awards were initiated in 2000. Google AdWords was also given top honors for value, targeting, tools and overall advertiser satisfaction.
About Google Inc.
Google's innovative search technologies connect millions of people around the world with information every day. Founded in 1998 by Stanford Ph.D. students Larry Page and Sergey Brin, Google today is a top web property in all major global markets. Google's targeted advertising program, which is the largest and fastest growing in the industry, provides businesses of all sizes with measurable results, while enhancing the overall web experience for users. Google is headquartered in Silicon Valley with offices throughout North America, Europe, and Asia. For more information, visit www.google.com.
Google is a trademark of Google Inc. All other company and product names may be trademarks of the respective companies with which they are associated.
TK
I wonder how long it would take to do a "wget" with their database as input :)
You're old school? I beta tested the motherf***ing abacus!
I am still waiting for a search engine that does topic matching instead of text matching. In other words, I would like the search engine to return a list of URLs with related topics instead of related text. As it is right now, all search engines, including Google, return pages that contain text equal or related to the input, but the pages themselves might be 98% unrelated. I still can't consider the Internet a library of knowledge because of this.
For example, if one searches for "TCP/IP tutorials", it would return many unrelated links like posts in newsgroups, college lectures, etc.
Also you don't get slashdotted by just having a (lame) link in the discussion, especially if it's modded to -1 as this will be, but even if it's at 2. You only get the mad hits from front page links; there isn't a magical thing where any link on any page containing slashdot in its URL gets you 10,000 hits.
sig:
See the "..for smart people" banners Wired runs here? Look elsewhere guys.
Google Cache
Too bad they're about to lose their shirt to SCO in an end user lawsuit.
This press release represents a milestone for slashdot users, enabling quick and easy access to marketing drivel reported as news.
Honestly guys, this isn't that hard. Could you at least try?
I was interested that they mentioned Google Print, which is Google's answer to Amazon's Search Inside feature, but hasn't got much press, and is pretty well hidden in Google itself.
You can check it out by limiting results to site print.google.com, e.g. searchterm site:print.google.com. (Not quite at Amazon-type numbers yet.)
Oh, come on now, this is Google we're talking about. Just look it up yourself. Here it is in one click without having to expend all that effort to type all four letters yourself. (Warning: The answer is not pretty)
Crap on Google
Happy Trails!
Erick
http://www.busyweather.com/
With 6 billion pages indexed and cached, and maybe an average of 50K per page (which is probably pretty conservative - it's probably twice that in some cases), that's roughly 300TB, IICIC!!!
The hard disk and RAID folks must LOVE Google....
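The back-of-the-envelope estimate is easy to spell out. This sketch takes the poster's own assumptions at face value (every one of the 6 billion items cached at an average of 50K, with 1K read as 1,000 bytes); the function name `cache_size_tb` is invented for illustration, and none of this reflects Google's real storage figures.

```python
# Assumptions taken from the comment above, not from Google.
ITEMS = 6_000_000_000       # items in the index
AVG_PAGE_BYTES = 50 * 1000  # "50K per page", with 1K = 1,000 bytes


def cache_size_tb(items=ITEMS, avg_bytes=AVG_PAGE_BYTES):
    """Return the total cache size in decimal terabytes (10**12 bytes)."""
    return items * avg_bytes / 1e12


if __name__ == "__main__":
    print(f"At 50K per page:  {cache_size_tb():.0f} TB")
    print(f"At 100K per page: {cache_size_tb(avg_bytes=100 * 1000):.0f} TB")
```

So the guess lands in the hundreds of terabytes before any compression or replication, which would only push the raw disk count further in one direction or the other.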
SCREW THE ADS! http://adblock.mozdev.org/ Proud user of teh Fox of Fire - Registered Linux User #289618
Though I'm likely to get hammered down around here for such a sentiment, I really think this is a result of MS declaring their intentions of ruling the websearching space.
;)
Without the fear of competition, it's very likely that Google would stagnate - thanks Microsoft
Google's value seems to be in cutting out the crap in its bandwidth... look at their page loads (2.6k plus 8.4k for the image) versus Yahoo! (30k plus images, plus ads). And the less said about AV or Lycos in that regard, the better. Not to mention that Yahoo has basically just co-opted Google, but with more fat around the edges.
A press release complete with corporate speak!
"This innovation represents a milestone for Internet users, enabling quick and easy access to the world's largest collection of online information.".
This is just google doing what they are already well known for doing best. There's nothing new or 'innovative' here. While it's a fine accomplishment, and I'm pleased google has indexed that much stuff, it's hardly innovative for them.
Need a Python, C++, Unix, Linux develop
That's a quote from the NYtimes (free req. yada yada) also posted as is here
If any other site were to track the stuff Google does,
Please note, this isn't a troll, and I'm not wearing a tin-foil hat (maybe I should?). Imagine the following scenario: a bomb goes off in the US. By tracing searches for "anarchist cookbook" to zipcodes within the area of the bomb blast, the FBI could have access to information that makes TIA look like a better alternative.
Maybe this isn't such a good feature after all...
We've got over 6 billion entries, but let's return garbage for most queries, making sure the good stuff is in the "sponsored links" or sidebars. At least it's a good business model.
have they beaten Ron Jeremy?
You may be willing to pay for it, but that doesn't mean that they can provide it. I mean, google is already losing money as that phenomenon dilutes both the value of their service (junk hits) and their advertising model (ads on the side not as appealing as ads mixed into the results).
Contrary to your (apparent) belief, google doesn't mix those advertisers in; they mix themselves in by exploiting the google search heuristic.
So it's a conspiracy to eliminate the french.
that should have been a "c - cedille" and not a regular c on the second term.
(which would either translate to "dumb soup" or "suspicion", depending if the right character is used)
Nouvelles de jeux et technologies en français. TC
There is an interesting article in Wash Post Search For Tomorrow on Google, and possible AI in search.
Some excerpts:
To see a world in a grain of sand, and then to step back and see the beach where the sand lies
It was reported that 1.4 billion of these were dud links redirecting to ebay.
They generally do not track where people click. There are exceptions (ads and in the occasional quality control), but most of the time, your links are direct to the page. They can't track that.
Second, the other information is the same information most website collects in its logs.
One should have a look at Google-Watch (tinfoil? maybe...) but they have some good points:
According to DEA, Google is breaking the law
Google Evil cookie
We got your number!
And so on...
Not to troll but rather a thought. Mod as you wish.
Your search for: "slashdot effect research whitepapers" returned approximately 40,000 results.
... erotic slashdot whitepapers cum girls hot lesbian potato bookcase effect dildo researchn et/ ...
1. Hot wild cum girls
Cached - Similar Pages
http://www.zebra-hot-dog-fetish-commander.
--TheOrangeSquid Is it any wonder things seem so awry? We swim in a sea of confusion and don't have to think to survive
This really isn't a big deal and it happens all the time when building large systems. I don't know how their system works specifically, but you just change the transient in-memory representations to 64bit by recompiling, and for the on-disk stuff you create a new format using 64bits but still recognize the old format. That way, you have to convert nothing and you will be migrating to 64bit representations as needed. I'm sure Google has managed to deal with much more complex engineering problems than that.
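A minimal sketch of that "new format, still read the old one" approach, assuming a made-up file layout: a 4-byte magic tag followed by little-endian document IDs. The tags `DOC1`/`DOC2` and the function names are hypothetical; nothing here is based on how Google actually stores its index.

```python
import struct

# Hypothetical format tags: v1 stores 32-bit IDs, v2 stores 64-bit IDs.
MAGIC_V1 = b"DOC1"
MAGIC_V2 = b"DOC2"
FORMATS = {MAGIC_V1: "<I", MAGIC_V2: "<Q"}


def write_ids(ids, version=2):
    """Serialize document IDs; new data is written in the 64-bit format by default."""
    magic = MAGIC_V1 if version == 1 else MAGIC_V2
    fmt = FORMATS[magic]
    return magic + b"".join(struct.pack(fmt, doc_id) for doc_id in ids)


def read_ids(blob):
    """Read IDs from either format, so old files never need converting."""
    magic, body = blob[:4], blob[4:]
    if magic not in FORMATS:
        raise ValueError(f"unknown format tag: {magic!r}")
    fmt = FORMATS[magic]
    size = struct.calcsize(fmt)
    return [struct.unpack_from(fmt, body, off)[0] for off in range(0, len(body), size)]


if __name__ == "__main__":
    old_file = write_ids([1, 2, 3], version=1)  # legacy 32-bit data
    new_file = write_ids([2 ** 32 + 5])         # would not fit in 32 bits
    print(read_ids(old_file), read_ids(new_file))
```

Old files stay untouched on disk and get rewritten in the wider format only when they are next updated, which is the "migrate as needed" part.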
I doubt it. Google may have more things indexed, but its web search still sucks when compared to Teoma's and its image search still sucks when compared to AllTheWeb's.
Google is most non triumphant.
"Things are more moderner than before- bigger, and yet smaller- it's computers-- San Dimas High School football RULES!"
I suggest linking to it from a site visited by google a lot, you know, one like slashdot.
Step 1: complain about not being visited by google
Step 2: post a link on a site that gets almost every page spidered by google
Step 3: ???
Step 4: Profit!
I wrote a project for our univ and submitted the url to google about 3 months ago. It still doesn't show up
When will I end this grieving ? When will my future begin ?
I had only read through 1,673,233,497 items by last Friday, and now this. I'll never catch up now! Thanks for your "service," Google.
How could you link directly to Google on the front page of slashdot? Couldn't you have used a mirror? Do you realize what will happen if Google gets slashdotted? The entire internet infrastructure will come to a screeching halt! You insensitive clods!
/sarcasm
See my Home Theater
And on the very same day my latest website gets into Google! Coincidence?
google for it!
Google - indexing more web pages than any other type of document!
I've heard rumors (from very reliable sources) that Google will be going to a "Pay Per Search" business model when they go Public... anyone else heard this?
6 Billion - 4.2 Billion Web Pages = 1.8 Bill Other
Other What???
FTP Sites? Archie Servers ? Web Mail Servers?
Gopher Servers??
Blogs are web pages
Pron is web pages..
Just last night it was 3 Billion 300 million something...
Sad that I actually pay attention to that...
Too bad the article doesn't mention how google is trying to fight gaming the PageRank system or any of the other problems like commercials in the results. Still a great search tool though.
We don't.
It is referrals to a page that give a page credit, and describe the actual content a page is about. That is what google is trying to do at least.
Is it not true that when you hear something from three different people, you believe it more than when you hear it from one person? It's this "hearsay" that makes Google so powerful, yet susceptible to mischief like that of the many spam sites that build a network of referrals around themselves.
A possible improvement could perhaps be the use of a system that proves the referring website is unique in nature, and is not copied all over the place (call it a "jury" :). But doing this would mean some central authority (a "judge") collecting some ID on websites or the like.
Better than this would be a formula that achieves this goal without such authentication, so everyone can go about their ways just as usual, and no longer have to pay attention to these shouters in the crowd ("evidence").
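For anyone who hasn't seen the "referrals give a page credit" idea written down, here is a minimal sketch of the textbook PageRank power iteration. It is the simplified published form, not whatever Google runs in production; the damping factor of 0.85 and the tiny link graph are illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Returns a dict of page -> rank, with ranks summing to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its credit evenly among the pages it refers to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its credit over everyone.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
    for page, score in sorted(pagerank(graph).items()):
        print(page, round(score, 3))
```

The weakness the comment describes is visible in the code: nothing checks whether two referring pages are independent, so a ring of copies that all link to one target passes it credit just as real referrals would.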
Slashdot: stuff for news, nerds that matter, matter for news, stuff that nerd
Both Google and Fast have image and picture search. They're all right. But I have had more luck with Lycos.
What are your experiences?
Of course, none of these services search in the image data itself. They search filenames, special features (like image size), and the content of the pages they are found in.
What is the state of searching in images today? Facial recognition systems have existed for a while, but they are made for a specific purpose.
How long before we can take a picture of that piece of your IKEA furniture and find the same model in pictures of celebrity houses, Babylon 5 sets and crime scenes? Or taking a picture of that familiar-looking person walking down the street, searching for her, and remembering that she was in that "reality" series two years ago.
Irene KHAAAAAAN!
Also one of the main problems Google is currently having with their search results is that too many blogs are ending up in the top results, often ranking higher than the primary site that contains the information that the blogs refer to (due to many blog users heavily cross-linking amongst themselves, which ups their rating). To combat this they've already discussed creating a separate category for blogs to help separate these. Good to see them taking a proactive stance -- get enough people using your service and you've suddenly got a category of blogs already identified and indexed. I'm giving them the benefit of the doubt as they've always been quite responsible with ads, and while it's a potential revenue stream I don't think they'll ever be as intrusive as other free sites/services.
"Google Image Search has been significantly updated," said Sergey Brin, Google co-founder and president of Technology. "We've doubled the index to more than 880 million images, enhanced search quality, and improved the user interface."
For Mac users, I recommend using Beholder to power your Google image search. Google's minimal UI changes notwithstanding.
(Mod +1 Self-Promotive)
And yes, I am an ugly American.
This sig has absolutely no significance and serves only to take up screen space and waste the time of the reader.
I read that article and really disagreed with the premise. Google is good for indexing what's available online, but only a tiny fraction of recorded human knowledge is available online. I work for a digital libraries project, and after visiting the Joint Conference on Digital Libraries, I can tell you that it's a librarian's wet dream to be in the kind of situation that the article describes: where all the information that we have to stumble around libraries and microfiches for is Googlable. But the full texts of almost no books are available. Who's going to scan in millions of volumes? Who's going to pay for that? And most importantly, how are the publishers going to allow it? US and world copyright laws are keeping almost all the content from being eligible for online publication, even if their profit windows are long closed.
I encourage all of you who are in high school or have college papers to write to look beyond Google the next time you have to research something. You will find about fifty times as much information by looking in published volumes. Here's the technique I always use: visit a University library. Use the electronic card catalog to find a couple of titles that seem to match your topic. They will likely all have similar call numbers. Then, go browse the stacks around those call numbers. That will give you access to all the books available that are related to your topic, and on the next shelf over, are books that are tangentially related. Every time I do that, I find some fascinating angle on the subject matter I never even knew existed. The books you find will have references, and you can follow those to immense amounts of material more specifically related to the angle you've chosen. And none of it is on Google.
If you have trouble, go ask one of the friendly research librarians. They do a lot more than go around and "shhh!" you.
Google is a useful tool, but if you want real depth, from people who aren't tech savvy enough to put their full academic works online, the library is the only place to find it. Put in the time!
Is Google becoming a task master for Big Brother?
Et tu, Google?
Software Wars
Let's see if the 'new' index adds interesting stuff:
1.- Michael Moore (still no. 1)
2.- Dubya
3.- Jimmy Carter
4.- Sen. Hillary R. Clinton
5.- Howard Dean
PS: litigious bastards still clean, just pointing to litigiousbastards.com.
Guess I better call the whaaaaambulance :-(
BTW - can you believe that a large number of visitors we get come from people who do a search on "goofball.com". Wow.
Betty's bunnies have fluffy fur today
Irene KHAAAAAAN!
Although this w.
Outdoor digital photography, mostly in New Engl
Teoma is the search engine that does exactly what you are asking. It breaks the internet down into topics and subjects, and it only counts links within these subjects. For example sites about the apples farmers grow probably link to other sites of these same type of apples, whereas sites about Apple Computers probably link to each other more, and this is how Teoma can recognize each as a different subject because of its link farm. Teoma gives refinement suggestions to help you navigate through its subject related clusters, and lists pages on a subject with lots of relevant links, under its link collection.
The thing that is starting to bother me is not the search-spam (easily removed over time with increasingly smart ranking), but the mailing lists. If 20 sites around the net archive the same mailing list, then I'll get the first 20 hits in most technical searches from the same list. Google really needs some way to identify duplicate archives (which is hard given that they're all formatted differently) and treat them as one "site".
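One common way to spot the same message under different formatting is to compare overlapping word sequences ("shingles") rather than raw text. A rough sketch, assuming plain-text input, 3-word shingles, and made-up sample messages; whether Google does anything like this is not known from this thread.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word sequences in `text`, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    archive_a = ("Re: kernel panic on boot. Try passing noapic to the kernel "
                 "and check the BIOS settings.")
    archive_b = ("[linux-list] Re: kernel panic on boot -- Try passing noapic to "
                 "the kernel and check the BIOS settings. (archived by example.org)")
    unrelated = "Cheap antennas for sale, best prices on television antennas and cables."
    print(similarity(archive_a, archive_b))  # high: same message, different wrapper
    print(similarity(archive_a, unrelated))  # low: different content
```

Pages scoring above some threshold could then be grouped and shown as a single result, which is the "treat them as one site" behaviour asked for above. Real archives would need their HTML and navigation stripped first.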
Go here for instructions on removal from their index.
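The usual mechanism is a robots.txt file at the root of the site, which well-behaved crawlers read before fetching anything. A small sketch using Python's standard `urllib.robotparser` to check what a given rule set permits; the rules and the example.com URLs are invented for illustration, and the helper name `allowed` is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules: shut Googlebot out entirely, and keep every
# other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow: /private/
"""


def allowed(agent, url, robots_txt=ROBOTS_TXT):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(allowed("Googlebot", "http://example.com/index.html"))
    print(allowed("OtherBot", "http://example.com/index.html"))
    print(allowed("OtherBot", "http://example.com/private/notes.html"))
```

Note that robots.txt is only a request; it keeps out crawlers that choose to honour it and does nothing against ones that don't.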
This is slightly offtopic, but is anyone seeing a slimmer Google? The blue tabs are missing on the front page, and the search result pages are slightly different. I only see it when I use Firefox, IE and Firebird still show the old layout. Anyone else seeing this?
blog & fiction: jd87
For some reason moz wiped out the part of the message that was highlighted in my above post when submitting. Here's what I was saying:
Although this was before my time, the university I went to (WPI) used to have a board on it with all the known urls at the time. Every few days someone would add another url to the board. Ah, the days when you really could print the whole Internet out.
Outdoor digital photography, mostly in New Engl
The upgrade has been quite good to me! Before the upgrade a search for my name would rank my website many pages down and then only secondary links not the root site. Now I rank number one! It looks like all my slashdot posting has finally paid off.
Ahh. The small victories of the computer geek.
Michael.
Linux : Mac
In 1959, a year and a half after IBM announced their first fully transistorized computer, I was issued a federal ID number. I rather considered that the proof that I could be databased quite easily, for any purpose whatsoever.
KFG
90% of everything is crap...
Your description comes from the Google Directory, which comes from DMOZ.
Google doesn't index user sigs, so stop trying to "Google Bomb" with them.
All of this time, I thought google was actually doing something interesting. It turns out, these guys aren't really doing anything at all! I took a tour the other day and here's what things really look like behind the scenes.
They have 2 front end web servers running Apache on some eMachines they got at Best Buy. They have a backend MySQL server running on a really big eMachine (2 GHz, if I recall correctly). The backend MySQL machine has two IDE hard drives, but they are like 200 GB each. They're hooked up via a 256K frac T1, but I hear they are behind on the monthly payments.
Each time you hit google's page and do a search, it issues an SQL query like this:
SELECT * FROM the_web
WHERE text REGEXP [what you entered]
They just moved out of one of the guy's apartments into some small rented office space in a shady part of town outside of Mountain View, CA. Mark my words, these google guys will be out of business in like two months.
I'm going on tours of Yahoo and Amazon in the next couple of weeks. I'll get to the bottom of this internet hype, don't you worry.
Why are people getting so upset about Google logging the exact same information as most other websites? Yes, they log your ip, your browser, what you got, where you came from and when you were there. So do I! So does Slashdot! So does every other major search engine. And, if someone is so worried about cookies, disable them. It's easy enough to do. This GoogleWatch site is incredibly biased and simply draws on people's fears. If you don't like Google, don't use it.
My favourite right now is GigaBlast.
It's still smaller than most other search engines, but it's quite fast, has good relevance and it indexes stuff in real time.
Besides, if you don't find what you are looking for, you can do the same search with 5 other search engines just by clicking on links at the bottom of the results page.
But what I like with Gigablast is that it's always getting better and I feel like part of something that has potential.
Treehugger? Treehugger... Treehugger!
Kind of like this?
Wishing I was a millionaire since 1969.
Hurray.
Yeah. Only slightly more pathetic than the "did anyone else read this as 'Boogle's Gigger Index'?" retards.
search for "ntsc cable descrambler". you will get thousands of results, and ALL are spam. some search terms are just more likely to produce spam results.
http://github.com/gbook/nidb
This is about the third Google related spam I've seen.
i tle>gam nszikvtojz fxyqkqbfmrsz u shzjaq x ib r xxxq
t hgoogle/">Casht ml">emails</a>
I know it's an SEO company that's doing it, but given there is going to be a route to them, perhaps Google can cease-and-desist 'em?
Thanks Sergei and Larry!
<x-html><!x-stuff-for-pete base="" src="" id="0" charset="iso-8859-1/macintosh"><html>
<head>
<t
vp h sm ip hnuigzykilvfxsctsmb q clean</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body>
<p> </p>
<p><a href="http://www.globalmarketing2000.biz/cashinwi
in with Google</a> makes earning an affiliate income very simple. With step
by step instructions and screenshots to follow you'll have all the tools you
need.</p>
<p></p>
<p><font size="2">no more <a href="http://www.globalmarketing2000.biz/remove.h
please </font></p>
</body>
</html>
envpw
</x-html>
I want to know when Google is going to have more machines in their server farms than all the other domains on the Internet put together. Having seen one of their installations (I don't think it was a big one either), it can't be far off. Then they should be able to index the Internet :-).
could someone plz mirror google.com? looks like it got /.'ed
If I could make this sig kill you, I would.
Since when is making something bigger innovation?
I'm just going to go innovate some more tea into my mug.
...of the index but what you do with it. ;-)
At the bottom of the page, under the second search box, is a phrase "Dissatisfied with your search results? Help us improve." - Follow it and the form will ask you to:
- Please tell us what specific information you were seeking. Also tell us why you were dissatisfied with the search results.
- Were you looking for a specific URL that wasn't listed in the search results? If so, please enter the URL here..
--HUMANS do it better
... Advanced features include search by image size, format (JPEG and/or GIF) ...
They didn't mention PNG, the turbo-studly image format which Google Image Search does indeed support.
It seems they used to have very few PNGs in their database, but now a search for +a filetype:png returns 700,000 results!
Phillip
It will take me 190 years to see it all. And that's just a second each. Depressing.
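For anyone who wants to check that figure, here is a rough sketch. It assumes exactly 6 billion items (the announcement only says "more than 6 billion") and an average year of 365.25 days; both numbers and the function name are my own choices, not anything from the press release.

```python
# Back-of-the-envelope check: one second of viewing time per indexed item.
ITEMS = 6_000_000_000                     # assumed: "more than 6 billion", rounded down
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # assumed: average year length in seconds


def years_to_view(items: int, seconds_each: float = 1.0) -> float:
    """Return how many years it takes to look at every item back to back."""
    return items * seconds_each / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"{years_to_view(ITEMS):.1f} years")
```

Under those assumptions the result lands just above 190 years, with no sleep and no coffee breaks.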
The moderators are brutal today. Did they even read the grandparent?
I wicked agree... that GoogleWatch site is full of crap. Same stuff applies to Yahoo, IMDB, AltaVista, any other search site.
The fact is that your search queries are logged no matter what search engine you are using. If you follow a link from one website to another, the other web site can log where you came from.
Man are moderators on crack
It's fine. I checked. A quick google of mirko, forums and gnuart threw up his forum, which it shouldn't do because its /robots.txt file reads:
User-Agent: *
Disallow: /
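To see what a well-behaved crawler should make of those two lines, here is a minimal sketch using Python's standard `urllib.robotparser`. The user agent string and the example.com URL are placeholders of mine, not the real forum address, and this says nothing about how Google's own crawler is implemented.

```python
from urllib.robotparser import RobotFileParser

# The two rules quoted above.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URL; any path on the site is covered by "Disallow: /".
    print(is_allowed(ROBOTS_TXT, "Googlebot", "http://example.com/forum/"))
```

With `Disallow: /` under `User-Agent: *`, every compliant robot is told to stay out of the whole site, so a page turning up in the index suggests it was crawled before the file existed or was linked from elsewhere.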
And click on "I'm feeling lucky"
Mirror here
Nobody would ever need that many searching.
What I would like is for those spam sites to stop being listed in Google. Search for something like "free motorola ringtones" to see what I mean.
Open Source Java Web Forum with LDAP authentication
Last week I tested some of those progs (freeware and shareware) and was pleasantly surprised by how well they worked. I have gone through 4 generations of slide scanners, so when I get a new one, I rescan all my best slides and want to give each new scan the same name as the old file. We are talking thousands of PNG files at 4000 dpi, typically 40 MB each. It took less than 30 minutes to search through 100 GB of images with only 10% false positives (okay, I've got a fast machine with a fast drive). Can't remember the name of the app right now, sorry.
Non-Linux Penguins ?
When you search for "litigious bastards", you now get a website promoting the googlebomb technique listed first. The SCO Group was listed first, but now it's ranked about 47. I'm not sure if they are reducing the relevance of the link text, or if the ranking has been lowered because the SCO Group probably doesn't point back at any of the blogs that link to it.
HIV Crosses Species Barrier... into Muppets
You aren't searching for more than 10 words.
If Google really cared they would fix Android Chrome to reflow text, instead of discriminating
Google's adsense service https://www.google.com/adsense/overview
is certainly a winner
The ads presented are similar to the paid ads shown on a standard google search but using the keywords of the page displayed and also tailored to the country of the viewer via their ip address.
In this way webmasters can maximize the global potential of their website.
We have some very highly ranked pages (i.e. top 10) but for UK only content. Now our visitors who find us via search engines and discover we aren't quite what they want are presented with a relevant exit strategy and we get a commission!
We're getting an average 1.7% click through rate which is translating into a nice tidy sum.
go google! keep kicking MSN's dirty butt
There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
We have batteries and accessories for your Google's Bigger Index. Buy now from our extensive selection of Google's Bigger Index, and when you buy your Google's Bigger Index you get free shipping. Buy now. Google's Bigger Index.
God, google sucks nowadays.
Yahoo turned on Inktomi on yahoo.com, meaning their search results no longer depend on Google's algo. It's a little odd, though probably just a coincidence, that this Google announcement came just as Yahoo flipped the switch.
Six billion items... how does that compare with other search engines like Alltheweb or Teoma, or even the venerable Altavista?
-- Ed Avis ed@membled.com
It's probably not a big deal to expand the capacity, but it certainly looks like it's pegged to 2^32 for this release.
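A quick sketch of why 2^32 keeps coming up in this thread. Whether Google really keys its documents with a 32-bit integer is pure speculation on our part; the 6 billion figure is the rounded number from the announcement and the function name is made up for illustration.

```python
# How many distinct IDs fit in an unsigned 32-bit integer, and is that enough?
UINT32_MAX = 2**32 - 1          # largest value of a 32-bit unsigned int
INDEX_SIZE = 6_000_000_000      # assumed: the announced figure, rounded


def bits_needed(count: int) -> int:
    """Smallest bit width that gives each of `count` items a distinct ID (0..count-1)."""
    return max(1, (count - 1).bit_length())


if __name__ == "__main__":
    print("uint32 holds", UINT32_MAX + 1, "distinct IDs")
    print("bits needed for the announced index:", bits_needed(INDEX_SIZE))
```

Six billion does not fit in 32 bits, so an index that size either uses wider IDs or is split into more than one ID space, which would be consistent with the "supplemental" index guess above.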
---- "If we have to go on with these damned quantum jumps, then I'm sorry that I ever got involved" - Erwin Schrodinger
"This innovation represents a milestone for Internet users, enabling quick and easy access to the world's largest collection of online information."
:-)
Most of which have broken links, are wildly inaccurate, and contain completely unresearched information.
Kudos. We are blessed indeed.
"Politicians find new names for institutions which under old names have become odious to the people."
Of course the robots.txt is only read when the site is spidered. Is it possible the site was up for a while before the file was added?
---- Den ene knappen er powerknapp, den andre er Bender voice knapp "Bite My Shiny Metal Ass"
How many pictures does google have to index again? A lot. Sure, google has huge racks of clusters, but they are expanding pretty fast as it is. Does Google really want to add a bunch of racks to add a feature that maybe 20% of the people would use? I honestly don't know. I do know that google, like any company, will add features that are easy and cheap to implement, but probably won't if it means adding rack upon rack of servers.
Even those who arrange and design shrubberies are under considerable economic stress at this period in history.
Especially with this announcement, I'm starting to get worried about the reliability of Google. More and more groups are taking advantage of quirks in Google's ranking system, as has been mentioned in previous Slashdot articles, to the point now where if you're searching for something even a little outside of the pop-culture mainstream (where you will be inundated with valid hits) you will find tons and tons of automatically generated garbage hits on "providers" who boost their rankings by feeding links to each other. Google is a great service; I hope that in its desire to continue its ever-expanding dominance of the search engine market, it doesn't get too complacent and let its search technology go stale, to the point where it is so abused that for reliable results you need to look elsewhere.
It's also interesting to note that both have a copyright date of 2004, which would imply that Google has found just under 1 billion websites in a month and a half, which seems like an interesting fact.
Procrastination sucks.
I think this just confirms that Google.com is the leading online search engine. Even though they didn't make the sites themselves, this still greatly helps the growth of the internet.
"Instant gratification takes too long." - Carrie Fisher
I'm still looking for ftp search to be included into google.
They are very receptive to reports and I've seen deceptive sites removed from the index in less than 12 hours.
http://www.google.com/contact/spamreport.html
This will give you options of reporting cloaked pages, doorway pages, deceptive redirects, misleading or repeated words, hidden text, etc. You have to be more specific than the "help us improve" link at the bottom of search results. Using this form I've seen abusive sites disappear from Google's index in less than 12 hours.
What's Google's URL please.
I can't find it on Google.
While Link 1 is the admittedly useful "Television Antenna Frequently Asked Questions", Links 2, 3, and 4 are spam, link 5 is a press release, link 6 is more spam, and the useful links start somewhere after that. From this end, it seems like nothing much has changed...
Vigara, viiagra, viagara, veragra, v1agra, viaagra...
All were taken from the 2004 edition of the SPAMmers's Dictionary.
... but, these days, I find myself using Google's Newsgroup search before their main search for most things. The main search is so full of crap, but the Usenet search contains much less spam. The posts are often more insightful and informative, often containing direct links to helpful websites, so you don't need to wade through the crap in a main search. On any given topic, there are usually some great old posts from 1994, or something, when the newsgroups were (or seem to have been, given the results I've been getting) damned near specialized periodicals from actual professionals in whatever topic a given group covered. Hell, you can even do a porn search on the Googled Newsgroups, not get spammed crap, and actually find helpful information. You haven't been able to do that on a main web search since the early days of Yahoo's golden years, and even then it was mostly crap.
Waikato University has a music recognition system that would be awesome on Google - if you can hum a few notes, it'll match it with the original tune. Remember all those emusic tunes that ended up as 'elevator' music? A lot of them are free downloads and still available on the artists' websites, but if you hear a tune you like while you're waiting on hold, how do you find it?
Also, it would be cool if I could upload a text-overlayed, renamed thumbnail from usenet and google could find the matching full-size image for me.
455fe10422ca29c4933f95052b792ab2
I'm going to get modded down into oblivion for saying this, but whatever.
For some reason I see a bunch of /.'ers coming out of the woodwork in support of the local libraries and expounding on how incredibly useful dead tree literature is. Another attitude that seems to crop up a lot, both here and in University classrooms, is that "stuff on the Internet is all unverifiable crap and should never be used in real papers".
In the words of the parent:
The books you find will have references, and you can follow those to immense amounts of material more specifically related to the angle you've chosen. And none of it is on Google.
Know what? I'm calling bollocks on this attitude. Wake up folks. It's the 21st century, not the 17th. I'm a college student. I've been there, done that. I know it's f-in tough to be forced to crank out a bunch of bullshit papers. That's life. But the Internet has made it all easier. If I need to look up information on, say, Hamlet, volumes of information are a few clicks away. Yeah, I've heard the usual jabber about how the barrier to entry on the Internet is practically nonexistent and anyone can publish any useless crap, yadda yadda yadda.
Again, wake up people. That's what Google and PageRank are for. I'm not a total idiot. I know bullshit online when I see it. Moreover, if I look over an opinion online and think it's enlightening, there's a pretty good chance that my Professor will too. Yeah, if I wanted to invest hours and hours in a three page paper I could go out through the cold and snow to the library, hunt through the antiquated card catalogue for what I'm looking for, and actually read a real resource for factual, honest-to-god information.
Forget it. Know where I turn to when I can't figure out what the hell is wrong with my network card? Where I go when I want to know trivia, like whether the original Goldfish were "cheddar" or "plain"? Right. It ain't the public library. And it's the same place I go when I want to find some useful information about Hamlet or any other serious research, for that matter. Instead of manually flipping through pages of some damn research books I've got the clusters of Google grep'ing through god only knows how many pages. Yeah, there are crazy ideas out there, but again, I'm not an idiot. And neither is Google for that matter. Pagerank is your friend, library-lovers.
And another thing. The parent whined about how there's not enough material available online due to copyright crap.
US and world copyright laws are keeping almost all the content from being eligible for online publication, even if their profit windows are long closed.
He's absolutely right. Libraries will be dead the minute copyright law gets toned down to a 10 year span and every legal book on the planet has been OCR'ed. In the meantime, put your stuff out there for free. It makes a difference. Write a big research paper? An English paper? Science paper? Put it under GPL/Creative Commons/BSD/whatever, and let people have it. You don't need it anymore. Don't be so damn possessive. You won't be able to take the stupid papers with you when you die.
I'm not just blowing smoke. I do this on my own site. Yeah, the papers I've put out are just stuff that I or others have written, but it helps. Information is a good thing. Let Google decide if your paper is good enough to show up in a search.
http://cltracker.net -- powerful craigslist multi-city search
2000 is not "before Google".
www.altavista.com/
The oldest Altavista page in the wayback archive,
http://web.archive.org/web/19961022174810/http://
is much smaller, but it already features ads. Early Altavista was adless.
Maybe you'd want nobody else to get in. How about using HTTP Authentication?
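For what it's worth, here is a minimal sketch of what HTTP Basic authentication looks like on the wire, assuming the Basic scheme is what's meant. The function name is my own; the credentials are the well-known sample pair from the HTTP authentication RFC, not anything real.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an 'Authorization' header for HTTP Basic auth."""
    # Basic auth is just "user:password", base64-encoded. It is not encryption.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


if __name__ == "__main__":
    # Sample credentials for illustration only.
    print("Authorization:", basic_auth_header("Aladdin", "open sesame"))
```

A crawler that doesn't send this header gets a 401 from the server and never sees the pages, which is a harder barrier than robots.txt because it doesn't rely on the robot being polite. Since the credentials are only encoded, not encrypted, this is best used over HTTPS.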
I thought it was funny because there was a hint of truth in it. All facets of our life are quickly being tied into and run through the Internet. Posting on Slashdot and further developing the Internet is, in a sense, welcoming our new HTML overlords. It's funny and insightful because it was an unexpected truth wrapped in a Simpsons quote/Slashdot cliche.