geekfiend writes "Today Google updated their website to indicate over eight billion pages crawled, cached and indexed. They've also added an entry to their blog explaining that they still have tons of work to do."
The page is named that way because of recent (well, a few months old) changes in the Blogger.com software. Since Google owns Blogger.com, it's natural that they would use the Blogger.com software to generate their own blog.
Whether or not it has anything to do with where your page will rank on Google is pure speculation.
In related news, the sun has set for today and will rise again tomorrow. The web is growing. Google is indexing it. It isn't news, it's a factoid.
--
In Soviet America the banks rob you!
More pages v.s more relevant pages
by
xiando
·
· Score: 5, Insightful
Personally I find that the lack of relevant pages is the biggest problem with search engines, not the lack of pages with information. It seems I always find what I'm looking for eventually; what I need improved is the time I spend looking through spam-bomb pages before I find a page with the correct information.
These spam pages seem to be increasing; I mean those pages with just a bunch of keywords or the output of some search system.
Re:More pages v.s more relevant pages
by
Kithraya
·
· Score: 5, Insightful
I'm especially irritated by the increasing number of highly-ranked pages that are nothing more than another search engine's results. If Google could find some way to identify and remove these from my result set, Google's usefulness to me would increase 10 times over.
Re:More pages v.s more relevant pages
by
metlin
·
· Score: 2, Interesting
Google has a problem with this because some of those searches are actually useful.
For instance, when I search for something technical, I often run into search results from DBLP, arXiv, CiteSeer and the like -- although these are really search results within themselves, they're immensely useful to me.
Since we both effectively have a conflict of interest, Google would need to figure out a way to strike a balance.
Re:More pages v.s more relevant pages
by
Eric+Giguere
·
· Score: 1
Absolutely. This is why I always tell people to "think like a librarian" when it comes to finding information in a search engine, whether it be Google or not. That said, I don't know how much is being taught about libraries and library organization these days, so maybe that's a meaningless thing to say.
Re:More pages v.s more relevant pages
by
__aahlyu4518
·
· Score: 2, Informative
Personally I find that the lack of relevant pages is the biggest problem with search engines, not the lack of pages with information.
Actually.... information IS relevant data. If it's not relevant to what you want, then it is just data...
Re:More pages v.s more relevant pages
by
juglugs
·
· Score: 1, Funny
What's a library?
-- This sig is in Spanish when you're not looking....
Re:More pages v.s more relevant pages
by
corrie
·
· Score: 2, Interesting
However, results from places like Starware Search are not useful, and they elevate my blood pressure with all the attempts at spamming me.
Just because I use Firefox and Adblock doesn't mean I now want to visit all possible spam sites in existence.
I don't care if Starware and friends make their money from advertising or not. The point is that Google is ALREADY a search engine, and a pretty good one at that. What is the point of returning results from another search engine, especially if the other one does not even have a specialised domain?
Re:More pages v.s more relevant pages
by
jez9999
·
· Score: 4, Interesting
One thing that would really help me sometimes would be if Google allowed you to do an 'exact match' search. No, I don't mean enclosing something in double quotes; that still ignores capitalization, whitespace, and most non-letter characters. I'd like to be able to search for pages that have the EXACT string '#windows EFNET', for example, or '/usr/bin/' or whatever. '/Usr/biN' wouldn't match, and nor would '#windows^^EFNET' (where ^ is equal to a space :-) ).
I sent an e-mail to Google about this and the guy who replied didn't seem to think it was possible... anyone know if it is?
Re:More pages v.s more relevant pages
by
fishbot
·
· Score: 1
It's what we'd get if we printed everything linked to by Google :)
Re:More pages v.s more relevant pages
by
Anonymous Coward
·
· Score: 1, Funny
I don't want to start an old discussion again... but here is where rdf,... can play a role. At my school they are starting to deposit articles,... in a repository that has metadata based on the Dublin Core. Hope this will help searching for that kind of info: papers,... ?
Anyway, I believe google also has a personalised search:
http://labs.google.com/personalized
Maybe this can help.
Re:More pages v.s more relevant pages
by
Anonymous Coward
·
· Score: 1, Funny
That's why engines showing clustered results may well end up beating Google at its own game.
Re:More pages v.s more relevant pages
by
MoobY
·
· Score: 1
The same goes for duplicate information. I don't want 200 versions of Wikipedia listed when I'm looking for a specific article, nor the same man page 200 times when I'm researching something about a Unix command other than its man page.
-- ---
Sigmentation Fault - Comments Dumped
Re:More pages v.s more relevant pages
by
PsychoSlashDot
·
· Score: 5, Insightful
What I've read on the Google help pages seems to indicate that they don't index punctuation or capitalization. When you search for something, your string is looked for within an existing index, and appropriate reference materials are shown. Including punctuation wouldn't result in any hits within their index, meaning no results.
Now, obviously, it is theoretically possible to do just about anything. But in this case, with the architecture they have in place, anyone ever doing what you're asking would require a full-text search through their multi-TB dataset, which I suspect is highly impractical.
My point is that as I understand it, Google has coded a number of shortcut tricks which allow reasonable search times, and full-text string-exact searching would prevent them from using those shortcuts, resulting in search times they don't seem to think are reasonable.
-- "Oh no... he found the .sig setting."
Re:More pages v.s more relevant pages
by
Anonymous Coward
·
· Score: 1, Funny
Exact string matching is an interesting problem. For a short piece of text it is relatively simple: just call strstr on a chunk of text. The problem is that Google likely does not index large bodies of text. Instead, Google indexes bags of terms. Each term is likely a stemmed word that no longer resembles the original word. In this way, Google compresses the document, saving space while making it faster to look up keywords in a document. The only way I think Google could provide exact string matching is to search the Google cache. The limitation of the Google cache, in case you didn't notice, is that Google does not cache every page, hence the word cache. While disk space is cheap, it is also slow to access, so even though it is feasible that Google could store all 8 billion pages on disk, it is unlikely you would want to wait that long for your exact match.
There are some tricks that could be used to narrow down which documents to run exact string checking on. First they take the string you passed in and tokenize it as normal, breaking it down into parts. Then they come up with a result set. Now they can start doing exact string matching within that result set. The issue is that it is unpredictable how long that process will take, as each document is of arbitrary size. The best they could do would be an exact string match on the summary text, returning the documents that match first, followed by the other documents, which is very close to what they actually do.
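To illustrate why a term index cannot answer an exact-string query on its own, here is a minimal sketch of a "bag of terms" index. It is only an assumption about the general shape of such an index, not Google's actual design: the `tokenize` and `build_index` names and the sample documents are hypothetical, and stemming is left out for brevity.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and keep only runs of letters and digits.

    Case, whitespace and punctuation are thrown away at this step,
    which is why the index cannot tell '/usr/bin/' from '/Usr/biN'.
    """
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each normalized term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

# Hypothetical sample documents.
docs = {
    1: "Join #windows EFNET for help",
    2: "Windows on efnet: a history",
    3: "Binaries live in /usr/bin/",
}
index = build_index(docs)
```

After normalization both `'/usr/bin/'` and `'/Usr/biN'` reduce to the same two terms, so any distinction between them has to be recovered from the stored page text rather than from the index.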
Re:More pages v.s more relevant pages
by
maxwell+demon
·
· Score: 1
The stuff you find in /usr/lib :-)
-- The Tao of math: The numbers you can count are not the real numbers.
Re:More pages v.s more relevant pages
by
Erasmus+Darwin
·
· Score: 3, Interesting
"But in this case, with the architecture they have in place, anyone ever doing what you're asking would require a full-text search through their multi-TB dataset, which I suspect is highly impractical."
Actually, they could cut that down considerably. For example, say we were doing an exact search for '#windows EFNET' as in the original example. The first thing they could do is start with a traditional search on "#windows EFNET". At that point, they've cut their multi-TB dataset down to just a few megs or less of likely matches (in this case, only 10 pages matched). Then they could do a full-text check on each result, looking for an exact match and discarding all the rest.
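The two-step idea above can be sketched as follows. This is a toy illustration under stated assumptions, not how Google works: `candidate_ids` scans a small in-memory dictionary where a real engine would consult its index, the full page text is assumed to be available for the second pass, and all names and sample documents are hypothetical.

```python
import re

def tokenize(text):
    """Normalize text to lower-case alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def candidate_ids(docs, query):
    """Phase 1: ids of documents containing every normalized query term.

    A linear scan stands in for the index lookup here.
    """
    terms = set(tokenize(query))
    return [doc_id for doc_id, text in docs.items()
            if terms <= set(tokenize(text))]

def exact_search(docs, query):
    """Phase 2: keep only candidates whose text contains the literal string."""
    return [doc_id for doc_id in candidate_ids(docs, query)
            if query in docs[doc_id]]

# Hypothetical sample documents.
docs = {
    1: "Join #windows EFNET for help",
    2: "join #Windows efnet today",   # wrong capitalization
    3: "#windows  EFNET",             # two spaces
    4: "Linux on EFNET",              # missing a term
}
```

With these documents, the first phase keeps 1, 2 and 3, and the literal check then keeps only document 1. The expensive substring test runs only on the small candidate set, never on the whole collection.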
Re:More pages v.s more relevant pages
by
LeoNomis
·
· Score: 1
I'm especially irritated by the increasing number of highly-ranked pages that are nothing more than another search engine's results. If Google could find some way to identify and remove these from my result set, Google's usefulness to me would increase 10 times over.
Don't you mean 20 times over?
Re:More pages v.s more relevant pages
by
cavemanf16
·
· Score: 1
What you guys don't realize is that a capitalization-sensitive search costs orders of magnitude more, which makes it unreasonable for Google to attempt. A long while back our CRM application was consistently getting hung on queries that involved customer first/last name combinations because it WAS capitalization sensitive. You see, when you tell a computer to search for "Joe JingleheiMerScHmIdT" WITH a capitalization sensitive search, it has to go through every single combination of capitalization in that name. But when all it needs to do is match "J" to "j or J", "o" to "o or O", and so forth, the search takes MUCH less time.
While it is somewhat frustrating that Google can't do this (and do it in 0.2s), the reality is that you gain a whole lot more processing power for Google's algorithm to do its thing in presenting you with the best results when you're not sure exactly what you want out of the search. I think Google has struck a good balance so far.
Re:More pages v.s more relevant pages
by
Zemplar
·
· Score: 1
I agree that this would be a nice feature, but the fact of the matter is that the vast majority of users don't have an OS (Windows) that is case sensitive, save for a very small list of exceptions. I would also go so far as to suggest that most users don't even think with a thought process that distinguishes data by capitalization.
Re:More pages v.s more relevant pages
by
ShecoDu
·
· Score: 1
I don't know a lot about search engines, but maybe somebody else can update my comment.
Google uses an advanced indexing algorithm based on words and coordinates, some hashes here and around and you get the relevant results. They could add a few characters to their valid-word-chars list, but trying to get something more exact would need a new algorithm.
You should try reading a technical document about search algorithms, just for kicks :)
Re:More pages v.s more relevant pages
by
PMuse
·
· Score: 2, Insightful
How about a NEAR operator? Sure, AND OR NOT are nice, but my results would be a lot more relevant if I could eliminate results where the search terms appeared a thousand words apart.
-- "We reject as false the choice between our safety and our ideals." --The American President (20.1.2009)
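A NEAR operator of the kind requested above needs word positions, not just word presence. The following is a minimal sketch under that assumption; the `positions` and `near` helpers and the `max_gap` parameter are hypothetical names for illustration, not an existing search-engine API.

```python
import re

def positions(text):
    """Map each lower-cased word to the list of word offsets where it occurs."""
    table = {}
    words = re.findall(r"[a-z0-9]+", text.lower())
    for offset, word in enumerate(words):
        table.setdefault(word, []).append(offset)
    return table

def near(text, first, second, max_gap):
    """Return True if the two terms occur within max_gap words of each other."""
    table = positions(text)
    a = table.get(first.lower(), [])
    b = table.get(second.lower(), [])
    return any(abs(i - j) <= max_gap for i in a for j in b)

# Hypothetical page: two terms adjacent, a third a thousand words away.
page = "google index " + "filler " * 1000 + "spam"
```

Here `near(page, "google", "index", 5)` holds, while `near(page, "google", "spam", 5)` does not, which is exactly the kind of result a plain AND would not filter out.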
Re:More pages v.s more relevant pages
by
Spoing
·
· Score: 1
I'm especially irritated by the increasing number of highly-ranked pages that are nothing more than another search engine's results. If Google could find some way to identify and remove these from my result set, Google's usefulness to me would increase 10 times over.
Agreed. I'd like to add these sites to a global block list; stumble on them during a search -- GRRRR! -- click 'block host' and never see the site again (bonus if the link can be removed or marked as 'already read').
-- A firewall can not protect you from yourself. Turn off what you do not need. Do not use the firewall to do your work.
Re:More pages v.s more relevant pages
by
Anonymous Coward
·
· Score: 0
Try using "." it doesn't solve all your problems, but it can help. If you said foo.bar, then google treats it as one 7 character word, where the '.' can be a single space or certain other characters. It's great for finding.exact.quotes
Re:More pages v.s more relevant pages
by
tsiolkovsky
·
· Score: 1
Even if this were true, it seems that it would still be possible to take the query results and run them through another filter for punctuation, capitalization etc.
This wouldn't require a whole rewrite, just stick one more filter between the user's query and the final presentation of the data in HTML. You know that the query system has to be highly modular. The data passes through several modules before it is sent back to the user. They just need to add one more module that filters out results that don't meet the query's exact capitalization and punctuation.
Re:More pages v.s more relevant pages
by
KaiSeun
·
· Score: 1
That would be a good idea for them to do, but on the user side, the best solution in my opinion would be to just develop the ability to skim through the short descriptions, while identifying what looks like spam, and what is real. Do this fast enough, and you could go through 10 results maybe every 5 seconds. Not terribly efficient, nor does it solve the problem that Google hasn't done anything yet, but a solution is needed now.
Of course, people who currently do this probably are also able to skim through articles and pick up key points.
Re:More pages v.s more relevant pages
by
Anonymous Coward
·
· Score: 0
It indexes the URL I leave on my Slashdot comments. It's quite useful actually; I just managed to reclaim my home page from Google's cache! I lost it in a format! I'm really happy now, even though it is a small piece of poo!
I've always wanted to build a dead-man-switch email system. Something that pings you every week to see if you're still alive, and if you don't respond it sends emails. Something to protect you if you're blackmailing somebody, or let your boss know what you really think of him now that you're beyond retribution. Or maybe just a sappy final love letter to your wife. That sort of thing.
But boy would you have to build safeguards into that. "Uh, sorry, I never meant to admit my homosexual attraction to you, but see I went on vacation and forgot about the deadman switch..."
I've always wanted to build a dead-man-switch email system.
I always thought that people just wanted their porn to disappear when they do.
Do this affect how fresh their index will be?
by
Jugalator
·
· Score: 3, Insightful
I wonder if it'll take longer to index twice as many pages? Or if they, along with this change, improved their spider and/or added hardware. Otherwise I'm not sure this change is for the better, unless you like to search for really obscure topics.
-- Beware: In C++, your friends can see your privates!
Re:Do this affect how fresh their index will be?
by
andres32a
·
· Score: 1
Actually no. Better search results mean fewer necessary searches, which in turn will make the entire process more time-effective. And anyway, you can't just stop indexing webpages just because it might take longer to index them. You just need to improve on hardware or the technology itself.
Re:Do this affect how fresh their index will be?
by
Jugalator
·
· Score: 1
Better search results mean fewer necessary searches, which in turn will make the entire process more time-effective.
Search results? Are you talking about a person searching? I was mostly concerned about how quickly Google can update their complete index now that it doubled in size. I understand for my part it might get better, as long as the index is kept up-to-date.
And anyway, you can't just stop indexing webpages just because it might take longer to index them. You just need to improve on hardware or the technology itself.
Yes, I realize this too, however I just wonder if Google made the necessary hardware/tech changes to maintain their current freshness of the index so we aren't getting an index, say, twice as big but taking twice as long to reflect all the always ongoing fluctuations on the web. I'm not sure if that would really be an improvement. More broken links and all that.
-- Beware: In C++, your friends can see your privates!
What is new about this.
by
hanssprudel
·
· Score: 3, Interesting
What the article does not point out is why this is something important. For just about forever Google's store has been converging on 2**32 documents. Some people have speculated that Google simply could not update their 100,000+ servers with a new system that allowed more. Apparently they have now done the necessary architecture changes to allow for identifying documents by 64-bit (or larger) identifiers and are back in the business of making their search more comprehensive.
Good timing to coincide with MSN's attempt to start a new search engine too!
Re:What is new about this.
by
Jugalator
·
· Score: 3, Interesting
Good timing to coincide with MSN's attempt to start a new search engine too!
Yes, they'd better fight back, as they now have a serious competitor in MSN. It's giving very accurate results.
Doesn't anyone find it strange that Google gave the same top result there a while back?
MSN must be using a very similar algorithm.
Maybe a bit too similar...?
*tinfoil hat on*
-- Beware: In C++, your friends can see your privates!
Re:What is new about this.
by
slavemowgli
·
· Score: 2, Insightful
I don't quite believe that Google would've limited themselves that way (using 32 bit identifiers for documents) - that would've been incredibly short-sighted.
-- quidquid latine dictum sit altum videtur.
Re:What is new about this.
by
Anonymous Coward
·
· Score: 4, Interesting
For just about forever Google's store has been converging on 2**32 documents. Some people have speculated that Google simply could not update their 100,000+ servers with a new system that allowed more. Apparently they have now done the necessary architecture changes to allow for identifying documents by 64-bit (or larger) identifiers and are back in the business of making their search more comprehensive.
As someone who routinely follows these things, I couldn't agree more with your statement. My company operates a number of sites, and over the past 6 months, we've seen an obvious trend. Sites with, say, 5000+ pages, which used to be entirely indexed in Google, gradually had pages lost from Google. A search for site:somesite.com would return 5000 results 6 months ago. 3 or 4 months ago, the same search gave maybe 1000 results. This month maybe 500 or 600. We were definitely of the opinion that Google's index was "maxxed out" and was dropping large portions of indexed sites in favor of attempting to index new sites.
Now after seeing this story, I did a search and found literally all 5000+ pages are indexed once again. This is a huge step forward for webmasters everywhere. If your site had been slowly edged out of Google's index it's most likely back in its entirety now.
"You'll note that other versions of Linux are languishing at version 6.3 or even 2.2 - only Be Dope Linux Version 27.1 with AVN (Advanced Version Numbering) brings you a version of Linux numbered at 27.1".
Re:What is new about this.
by
Jugalator
·
· Score: 2, Insightful
Wow, Microsoft must have fixed it... It now no longer shows microsoft.com as top hit.
Haha, I guess the joke reached MS headquarters. :-P
-- Beware: In C++, your friends can see your privates!
Re:What is new about this.
by
Anonymous Coward
·
· Score: 0
Actually, it first gave Microsoft.com as top hit. It now gives Be Dope with that search.
However, the first Google bomb mentioned in the popular press may have occurred accidentally in 1999, when users discovered that the query "more evil than Satan" returned Microsoft's home page. Now, it returns links to several news articles on the discovery.
As you see on the MSN search page, the same is happening here. I doubt they've made changes to target this exact query.
-- "Stop failing the Turing test!" -- Dilbert
Re:What is new about this.
by
Anonymous Coward
·
· Score: 0
Is Google not allowed to be short-sighted?
Re:What is new about this.
by
bighoov
·
· Score: 3, Interesting
Probably not short sighted, but rather a space and CPU efficiency issue. Space - if you have 64-bit doc ids, even if you index 2^48 documents you're still wasting 16 bits per stemmed word per document. CPU - dealing with 64-bit integers on 32-bit hardware usually involves multiple loads, and decreases what can fit in the hardware data caches.
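The space argument can be made concrete with a rough sketch. The figures below are illustrative assumptions only: the one-million posting count is made up, and `id_bytes` and `unused_bits` are hypothetical helper names. Nothing here describes Google's real index layout.

```python
import struct

# Sizes in bytes of fixed-width unsigned integers in a packed layout.
BYTES_32 = struct.calcsize("<I")
BYTES_64 = struct.calcsize("<Q")

def id_bytes(postings, id_size):
    """Bytes spent on document ids for a given number of index postings."""
    return postings * id_size

def unused_bits(id_bits, documents):
    """Bits of each id that can never be set for a collection of this size."""
    return id_bits - max(documents - 1, 0).bit_length()

# Hypothetical posting count, chosen only to make the difference visible.
postings = 1_000_000
extra = id_bytes(postings, BYTES_64) - id_bytes(postings, BYTES_32)
```

For this made-up posting count, widening the ids costs an extra 4,000,000 bytes, and `unused_bits(64, 2**48)` gives the 16 idle bits per id mentioned above.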
Re:What is new about this.
by
Dayflowers
·
· Score: 1
It might have been an option of compromise.
One thing is for sure: they stayed on the 4bill for a looooong time.
--
I am a speak english. Do you not? - Saroto
Re:What is new about this.
by
goatpunch
·
· Score: 1
Yes, before the leap to 8 billion pages they were indexing 4285199744 pages, which is 99.8% of 2^32 (4294967296) - these numbers seem too close to be a coincidence (they differ by about 9.8 million, roughly 0.2%).
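The comparison can be checked with a few lines of arithmetic. The page count is the figure quoted in the comment; the variable names are just for illustration.

```python
# Size of a 32-bit identifier space versus the page count quoted above.
ID_SPACE = 2 ** 32
reported_pages = 4_285_199_744

gap = ID_SPACE - reported_pages          # ids left unused
fraction = reported_pages / ID_SPACE     # share of the id space in use

print(f"{gap:,} ids unused, {fraction:.1%} of 2^32 in use")
```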
Re:no update on the images
by
Anonymous Coward
·
· Score: 0
You really got the hots for Lyndie England huh?
Re:no update on the images
by
Anonymous Coward
·
· Score: 0
Yeah, I really like their image search, but the ancientness of it makes it less and less useful as time passes. Anyone know of a decent alternative that has a more up-to-date index?
Google makes minor change to website - news at 11!
by
Sanity
·
· Score: 3, Insightful
Does every minor Google or Apple related thing deserve a slashdot story? Can slashdot create a "Fanboy" section for insignificant stories advocating Google (with their software patent) and Apple (with their iTunes DRM)? That way I could filter them out more easily.
You mean Spock has a goatse...picture of Jim Kirk.
Quality - not quantity
by
seanyboy
·
· Score: 3, Insightful
Google needs to stop obsessing about the number of indexed pages, and start concentrating on the quality. Since pagerank was switched off, 2 out of 5 searches now seem to be jammed with pages full of nothing but random words and adverts. It's even more galling when the adverts are Google Ads. Much as I love Google, they're becoming increasingly less effective as a tool.
-- Training monkeys for world domination since 1439
Re:Quality - not quantity
by
Ingolfke
·
· Score: 3, Funny
I agree, search engines are so 1990. I rely exclusively on word of mouth to find websites. If Firefox would add a button to the toolbar that said 'Cool Sites', maybe with an icon of a pair of glasses, and have the button link to a webpage with links to the latest cool sites on the net, that would certainly be the end of Google and their 8 billion pages. Pah!
Re:Quality - not quantity
by
Onionesque
·
· Score: 2, Insightful
To paraphrase Churchill, Google is the worst system devised by the wit of man, except for all the others. Where else would you go? Yahoo? Hey, how about AltaVista?
The problems faced by Google in their battle against the scumbags who would game the system are faced by every other search engine. Google, IMHO, handles them better.
Re:Quality - not quantity
by
seanyboy
·
· Score: 1
Agreed, they still need to know when people are being frustrated by the search results they're being given. And I'm finding it increasingly difficult to find what I want with Google.
-- Training monkeys for world domination since 1439
Re:Quality - not quantity
by
Anonymous Coward
·
· Score: 0
No, this is a good thing. It's taken them longer and longer to get pages in the damn index.
Re:Quality - not quantity
by
dabadab
·
· Score: 3, Informative
Tried StumbleUpon? It has a plugin for Firefox, IIRC.
Re:Quality - not quantity
by
Anonymous Coward
·
· Score: 0
He was referencing the old Netscape default setup, I believe.
Re:Quality - not quantity
by
seanyboy
·
· Score: 4, Interesting
My bad. I'd skimmed a few things on the web, and assumed that it had been switched off. Looks instead as though Google have changed how it works. See PageRank is dead. I need to investigate further.
-- Training monkeys for world domination since 1439
That was actually how Yahoo! got started. A few college drop-outs started making a webpage linking to their favorite sites... and their friends started going to it, and their friends' friends, and their friends' friends' friends... and then somebody offered to pay them to advertise on the site. And we ended up with this.
Re:Quality - not quantity
by
WhiteDragon
·
· Score: 1
At the bottom of every search results page, there is a link that says, "Dissatisfied? Help us improve". I've clicked on it once or twice when encountering a particularly spammed keyword, and they have fixed it!
-- Did you mount a military-grade, variable-focus MASER on an unlicensed artificial intelligence?
Re:Quality - not quantity
by
melvster
·
· Score: 0
Remember over 50% of google employees work in advertising or marketing.
They are an advertising company. The search is just a teaser to get you to their site.
StumbleUpon.com... you can thank me (or demonize me!) later. :^)
--Robert
Re:Quality - not quantity
by
Anonymous Coward
·
· Score: 0
For you young ones... anyone who was on the net before 1995 remembers the time when "cool site of the day" was the best way to find new interesting pages. In other words, it was a joke for us old fogeys.
Re:Quality - not quantity
by
Anonymous Coward
·
· Score: 0
You know that Zawodny works for Yahoo's search team, right? Not exactly the most impartial source to trust..
First, I have to give reluctant kudos to MSN for parsing long boolean queries such as (((A AND B) NOT (C OR D)) AND E) Google needs to play catch-up here.
Second, we need SORT OPTIONS. It's not that hard to allow sorting by date, title, file type, and number of hits. Again, MSN has stolen a march on Google in this area.
-- "We reject as false the choice between our safety and our ideals." --The American President (20.1.2009)
So they're using Slashcode's dupe-checking module?
--
--- DRM is like antifreeze, to the MPAA/RIAA it's sweet, to the consumers it's poison.
Makes you wonder...
by
manmanic
·
· Score: 5, Insightful
Does this mean that I've been missing a huge amount of important information until now? I'd just assumed that Google covered the entire relevant web but now it seems to cover the whole same amount again. My Google alerts also seem to have started producing a lot more results, which suggests that a lot of these new pages are rated quite highly. Who knows how much more quality content on the web we're just not seeing?
Re:Makes you wonder...
by
jlar
·
· Score: 5, Interesting
"Does this mean that I've been missing a huge amount of important information until now?"
Maybe the steep increase is due to all the new file formats they are indexing now. That might be useful for some people (although I sometimes find it kind of annoying that a search returns MS-Word documents).
That might be useful for some people (although I sometimes find it kind of annoying that a search returns MS-Word documents).
This isn't Google's fault. I'd rather that people didn't put documents on the web in Word format, but people do it. I still need the information that's in the document, though, and I would like Google to index it. Same with PDFs, or any other format that contains text. An option would be nice for those who are looking for HTML only (or similar).
Re:Makes you wonder...
by
RedWizzard
·
· Score: 2, Informative
Maybe the steep increase is due to all the new file formats they are indexing now.
The steep increase is probably due to an architecture change. Google has, for a long time, been indexing around 4 billion pages. That implies that they have been giving each page a 32-bit unique identifier, and had exhausted that id space. It would be a lot of work for them to seamlessly upgrade all their software to support a larger id, and it has taken them a long time to do so. Now that they have, the large jump in pages is simply due to the fact that they can index much more of the web.
Re:Google makes minor change to website - news at
by
timdorr
·
· Score: 1
Maybe it's just me, but I'd call the doubling of information available for me to search a pretty significant improvement. Especially when the last update was only a 1b increase ("only" is a relative term, of course...).
Re:Google makes minor change to website - news at
by
Anonymous Coward
·
· Score: 0
I guess it's all in the wording
Re:Google makes minor change to website - news at
by
Anonymous Coward
·
· Score: 0
Gentoy and this weeks kiddies favourite Umbongo Linux come pretty close.
Google needs your cookie badly
by
Anonymous Coward
·
· Score: 2, Informative
Until today you could save your google settings without loosing your privacy.
You can still save those settings but google refuses to use them when you block their cookie. In my case I get 10 search results although I like to receive 100. Seems that they are making many dollars on a user's cookie, and now they are a public company my privacy is less important than "stock holders' interests".
Re:Google needs your cookie badly
by
Anonymous Coward
·
· Score: 3, Informative
You can still save those settings but google refuses to use them when you block their cookie. In my case I get 10 search results although I like to receive 100.
Give it the keyword 100, then type 100 search_term in the address bar to use it.
Re:Google needs your cookie badly
by
marc252
·
· Score: 0
well, If you consider all your privacy goes away because a cookie from google, you better stop using cellular phones, regular telephone lines, credit cards, internet connections from private places, snail mail, bank accounts, and ahh! don't you go walking through city streets, there are cameras monitoring your activity....
Once you've done all this you will find yourself living alone in a forest, then you can surely shout out loud "I'm free!!!" but,
be careful, don't shout too loud, there might be some satellite monitoring forests for people who try to live aside from the system!
Good luck
Re:Google needs your cookie badly
by
Anonymous Coward
·
· Score: 0
Great solution! http://www.google.com/search?num=100&hl=enl&q= %s works too. Thanks.
Re:Google needs your cookie badly
by
Anonymous Coward
·
· Score: 0
> save your google settings without loosing your privacy
How does saving the settings make your privacy not tight? That doesn't make sense.
Re:Google needs your cookie badly
by
Anonymous Coward
·
· Score: 0
> and now they are a public company my privacy is less important than > "stock holders' interests".
May I treat you to the obligatory "Duh!" ?
Re:Google needs your cookie badly
by
Anonymous Coward
·
· Score: 0
You are so the man. Cheers!
Re:Google needs your cookie badly
by
oojah
·
· Score: 1
Presuming you are using Mozilla (I don't use Firefox, but I guess it should work the same), find the file searchplugins/google.src in your mozilla directory.
After the line starting
You may also want to change the updateCheckDays value as well as it looks as though it will overwrite your modified google.src file (although I'm not sure about this).
This modifies the default google search behaviour that you get when you type in the URL bar, press up then return.
Cheers,
Roger
-- Do you have any better hostages?
Re:Google needs your cookie badly
by
Everyman
·
· Score: 1
The instructions for cookie-less preferences at Google-Watch have been updated. By editing your bookmark and adding four characters, the Google sabotage is defeated.
Re:Google makes minor change to website - news at
by
Zork+the+Almighty
·
· Score: 1
The extra 3 billion pages are probably link farms.
--
In Soviet America the banks rob you!
Re:Google makes minor change to website - news at
by
dotmike
·
· Score: 2, Funny
At the same time, can Slashdot create a "Curmudgeon" section for those who like to gripe about the less than monumental significance of some story topics?
Google domination.
by
Anonymous Coward
·
· Score: 2, Informative
Local tabloid Aftonbladet is running a poll on search engine use:
Google (81.4 %)
Yahoo (2.2 %)
MSN (3.8 %)
Other (11.4 %)
Don't know (1.2 %)
61730 votes so far.
I'm a little surprised, either the masses who use the "default" (MSN?) aren't bothering to answer, or google is simply very very dominant and those "default using masses" do not exist [in this country].
Re:Google domination.
by
Mostly+a+lurker
·
· Score: 2, Insightful
the masses who use the "default" (MSN?) aren't bothering to answer
I think it is more that many users of IE just do not twig that their failed page access resulted in an automatic query to MSN.
In reality, most users make occasional deliberate queries to Google and more frequent accidental queries to MSN.
I watched a friend of mine type in the name of a website wrong so of course it brought up the MS search engine.
In the MS search box she then proceeded to type in google and hit enter. Does anybody else see the incredible irony in this?
If I kept eating so much spam...
by
dos_dude
·
· Score: 2, Funny
... my weight would probably double, too.
Re:This is news ?
by
Ford+Prefect
·
· Score: 2, Interesting
A bigger index does not equal better search results, however, with the press this will generate, it will equal profits.
It would be terribly easy to get trillions of pages indexed. For instance, a site I've been working on has a public calendar system, with results fished out of a database. There are very few actual events in it at the moment, but with the 'Previous' and 'Next' links it'll run from 1970 to 2038. A naïve web-crawler would index every single month for every single year, but Google would appear to have crawled over just a few, presumably flagging the pages as too similar to warrant further investigation.
With stuff like public web forums, Slashdot and the like, I can easily imagine comparatively small sites producing thousands of pages apiece. Is there useful information in there? Quite possibly, but it definitely needs treating in a different manner to an old-fashioned, static-pages-only site...
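The "too similar" flagging guessed at above can be sketched with word shingles and Jaccard similarity. This is only an illustration of one common near-duplicate technique, not a description of what Google actually does; the calendar template, the page texts and the 0.6 threshold are all made up for the example.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(page_a, page_b, threshold=0.6):
    """True when two pages share most of their shingles (threshold is arbitrary)."""
    return jaccard(shingles(page_a), shingles(page_b)) >= threshold


# Hypothetical calendar pages: same template, only the month differs.
TEMPLATE = ("Public calendar for {month} 1970 . No events scheduled . "
            "Previous month Next month Site map Contact us About this site")
jan = TEMPLATE.format(month="January")
feb = TEMPLATE.format(month="February")
article = "Google announced today that its index has grown to more than eight billion pages"

print(is_near_duplicate(jan, feb))      # two empty calendar months
print(is_near_duplicate(jan, article))  # calendar page vs. unrelated text
```

A crawler using something like this could stop following the 'Next' link once several consecutive months come back as near-duplicates of each other.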
A lot of people have been asking what the point of the article is and why it matters. Well, possibly because Microsoft announced the launch of their search engine http://news.bbc.co.uk/1/hi/technology/4000015.stm and are claiming more pages indexed than Google (5 billion), so Google have responded by effectively doubling their pages indexed.
In a statement Microsoft said its search engine returned results from five billion web pages - more than any other search engine.
But this quickly won a response from Google which announced that its index has now grown to more than 8 billion pages.
Prior to the Microsoft announcement, Google was only indexing 4,285,199,774 web pages.
Steve Ballmer is soon to announce that his daddy is one hundrad years old, and kan kick your daddy's ass...
-- Offtopic, Inflammatory, Inappropriate, Illegal, or Offensive comments might be moderated up.
Re:Mine is bigger than yours!!!
by
Anonymous Coward
·
· Score: 0
Prior to the Microsoft announcement, Google was only indexing 4,285,199,774 web pages.
Interesting world we live in when 4 billion is a small enough figure to be prefixed with "only".
Re:Mine is bigger than yours!!!
by
qbwiz
·
· Score: 1
A large world, actually. 4 billion is less than one per person.
-- Ewige Blumenkraft.
Re:This is news ?
by
Anonymous Coward
·
· Score: 0
Dictionary definition of redundant re.dun.dant adj 1a: exceeding what is necessary or normal: SUPERFLUOUS
b: characterized by or containing an excess; specif: using more words than necessary
c: characterized by similarity or repetition
d chiefly Brit: being out of work: laid off 2: PROFUSE, LAVISH 3: serving as a duplicate for preventing failure of an entire system (as a spacecraft) upon failure of a single component -- re.dun.dant.ly adv
Searching LiveJournal.com
by
hackrobat
·
· Score: 4, Informative
Looks like they've added a gazillion LiveJournal pages to their index. I used to have a Google search box on my LJ that didn't throw up relevant results until last week or so. Now it works perfectly, just like builtin search (like what you see in MT and WordPress).
Re:Searching LiveJournal.com
by
grazzy
·
· Score: 1
Re:Searching LiveJournal.com
by
cavemanf16
·
· Score: 2, Insightful
MSN's "msnbot" has been crawling/spidering my webserver (which runs Geeklog and is just another blog of my random crap) pretty extensively for weeks now. (Like 5 times a day, it seems.) Searching on Google for my site's name now reveals more results from my site, but not a lot of those circle-jerk style search results pages that are just trying to generate some ad revenues. However, using the beta.search.msn.com site DOES yield a lot more random crap (mostly blogs and personal webservers) that somehow generated some kind of link to my site because of the title of one of my articles, someone linking to my site in one of their blog posts, etc.
I have a feeling MSN's new search site is gonna be mostly blogs and advertisements, not relevant information. I think it's good Google has indexed more pages, but I still believe their algorithm will continue to provide more USEFUL results than MSN. (BTW, the googlebot doesn't hit my site too frequently which tells me Google's bot understands that my site isn't updated too frequently, nor is it linked to from other important sites)
Geeks who understand marketing
by
Mostly+a+lurker
·
· Score: 1
What Google has going for them is that they combine technical know how with marketing smarts. I still use Google as my primary search engine because it produces better results. Google understands though that, in the market at large, they need to play the numbers game. Fine they say. Within hours of the Microsoft announcement, out comes this.
Frankly, I love it any time someone can best Microsoft. The next big thing may well be consumers putting their data on servers provided by the likes of Google, Microsoft and Yahoo -- running their applications there and having PCs that are little more than very easy to use display devices. If so, I would not mind seeing Google with the dominant market share. I trust them with that kind of power a lot more than Microsoft.
Doubled? Wait a minute...
by
't+is+DjiM
·
· Score: 5, Funny
From 4 to 8 billion pages... I guess they just indexed the google cache...
-- --Use ant to make.war
Re:Doubled? Wait a minute...
by
fronti
·
· Score: 1
rotf...
perhaps it's a bug in the indexer..
But when I take a look in my logfiles, there is a real "fight": googlebot vs. new-msnbot vs. Ask Jeeves..
and all 3 of them index the whole transcode, mplayer, xvid mailing list archive ( http://www.itdp.de )
tons of small files :)
(ok I know about robots.txt)
Competing with Microsoft's 5bn?
by
Richard+W.M.+Jones
·
· Score: 4, Informative
On the same day that this story hits the BBC. In that story Microsoft claim that they have 5 billion pages indexed, more than the 4.2 billion pages indexed (at that point) by Google. The BBC have just updated the story with the 8bn figure.
Yes, this is probably a troll, but anyway... I take it you've never heard of the robots.txt file? You sound like you might want to read up on it. It's designed to help control the spidering of your pages for whatever reason, particularly cases like yours or situations where a spider would get confused and end up doing something stupid (recursive stuff, etc).
-ReK
-- md5sum -c reality.md5
reality: FAILED
md5sum: WARNING: 1 of 1 computed checksum did NOT match
Re:Google thieves my bandwidth
by
MobiusClark
·
· Score: 1
Erm... Have you considered putting a robots.txt file on your webserver?
The Googlebot is quite well designed and should honour any instructions you put in it.
meta-no-archive
by
Anonymous Coward
·
· Score: 3, Interesting
Apparently my sites will never get a good ranking on google because I don't want the search engine to cache them, so I'm using meta no-archive tags. That's the only reason I can figure out for why the sites rank so poorly on google when they come up in the top 10-20 hits on yahoo and other search engines. The keywords for the searches are valid, the sites are relevant to the keyword searches, yet the sites don't show in the top 100-300 on google.
I've avoided all the usual spam type of tags (auto refreshing, hidden text, cloaking, etc.) and the sites are legitimate and on the up and up, and yet the only page or two that google is spidering are the one or few that appear to be without the no-archive tags and possibly the revisit/expire tags.
Is google's policy "allow us to cache your site, or get penalized"? Anyone else run into a similar problem or can shed some light on this? The only other thing I can think of is the robots text file, which keeps googlebot, and then other spiders through a *, from entering images directories. The spiders, including googlebot, aren't restricted from entering any other directories; they are given free rein.
Anyone else with problems with no-cache, no archive, tight revisit/expire times, or similar non-spam tags that result in penalties in google ranking?
I've been using google exclusively for a few years now. But the poor page ranking of sites on my server got me wondering about other sites that may be relevant to my own searches which may be excluded or penalized by google. So I've started using Yahoo search again, as much as I hate Yahoo (what they do with advertising to Yahoo groups and Yahoo mail is a shame). It appears that Yahoo is including better results because other sites show up with higher ranking that actually are relevant. So I've learned that Google isn't as perfect as I thought it was, which was disappointing in itself. It was easy using one search site. Now I have to use two to make sure I'm getting good results. Anyone know if there is a plugin for Firefox with both Google and Yahoo search boxes on the toolbar?
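For anyone wanting to double-check which pages actually carry the no-archive tag mentioned in this comment, here is a rough sketch that pulls the robots directives out of a page's meta tags. It assumes the usual `<meta name="robots" content="noarchive">` form; the sample HTML is invented, and the code says nothing about whether the tag affects ranking.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            # The content attribute is a comma-separated list of directives.
            for directive in attr.get("content", "").split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots meta directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Invented sample pages for illustration.
cached_off = '<html><head><meta name="robots" content="noarchive"><title>a</title></head></html>'
hidden = '<html><head><meta name="ROBOTS" content="NOINDEX, NOFOLLOW"></head></html>'

print(robots_directives(cached_off))
print(robots_directives(hidden))
```

A page that turns out to carry noindex rather than noarchive would explain missing results far more directly than any ranking penalty.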
I use "no-archive" on several websites, and they don't seem to get penalized. From my observations one of the most important things for a good Google ranking is still page rank. I would check the number and quality of backlinks of your site and the sites ranking higher than yours. Maybe that's the reason.
This is exactly why I don't use Google anymore - not since 2002, or thereabouts. I suspect it has something to do with who buys adverts and who doesn't. What I seemed to observe was that when I searched on Google I would always get a lot of results that were irrelevant or only remotely relevant, but which pointed to commercial sites. The most grotesque was once when I searched for nonsense words (just to see what happened) and I got results like 'Buy books about [nonsense-word] on Amazon'. I mean, that is simply totally worthless; at least to me. Not to mention deceitful.
Yahoo, which I prefer now, does the same, but at least they are honest about it and display these links separately as 'Sponsored Links'.
Re:meta-no-archive
by
justMichael
·
· Score: 2, Informative
I'm sure I've seen some way of doing a sort of backwards search on a page, that will show all the pages in Google that link to it.
Re:meta-no-archive
by
Anonymous Coward
·
· Score: 0
You do notice that these are *on* the side, right? Not with the regular search results? As opposed to Yahoo which puts them right above their regular results, only slightly separating them. I guess putting "Sponsored Links" right above it isn't much of a giveaway either?
Re:meta-no-archive
by
Anonymous Coward
·
· Score: 0
Google probably doesn't penalize you for not allowing a cache, but they should.
Nothing pisses me off more than sites that allow google to view their logged in information (many scientific journals do this) but not let google cache it. If you click on the link, you can't get any information unless you pay to subscribe.
If you don't want your information available to the public, then why have it in a search engine???
If you don't mind people seeing your website, then why not let google cache it??
Re:meta-no-archive
by
Anonymous Coward
·
· Score: 0
What does selling a subscription have to do with what I posted?
I'm not charging anyone to see the content.
In case you haven't noticed, google doesn't update their search engine on-the-fly, nor do they do it very often. If you've been paying attention at all, you'll know that google doesn't update very frequently, in fact takes months to update, because when they do, the howls of protest come out from some sites losing ranking, while others stay quiet while they get a better rank. Some of it has to do with changing algorithms, and some may think they don't do this often, but I'd say they change their algorithms daily, tweaking all the time, so every index update changes ranking.
So this should give you a hint as to why not let google cache it. Web sites that have information that changes frequently is the #1 reason. Google honors the revisit tag, and possibly the expire tag. But if you have a revisit tag that says revisit in one day, and they don't change their index for 6 weeks, and you change your site on November 11 for Christmas, and their last update was November 1, with the next update January 4th, they can spider your site daily and you are still screwed for the Christmas shopping season, when 40-60% of sales happen for some businesses.
There are a lot of other reasons, but the reason outlined above is the #1 reason for the sites on my server. It's just the opposite of what you state, we want the information available to the public, and we don't want outdated, old, no longer accurate information available to the public.
Thanks to google penalizing the refresh tag, (actually thanks to the spammers abusing it, but google is the one penalizing it), I still have sites and directories out there linking to pages that either have moved, or had their urls renamed due to a misspelling. Yet I still have hits coming in for the misspelled urls, and they are ignoring the server side redirection that Apache does through the httpd.conf file. So what would you rather have, a missing file with a cached page that shows nothing, or a listing that shows the up-to-date site, including the correctly spelled url sub-page? The cache feature won't work in the above example.
Scientific journals are selling their journals, not giving it away as free content. If they allowed caching, would anyone buy what they could get free through the cache? I personally think that scientific journals should be free, not subscription. I think a lot of important information is locked away because of this. But it appears, thanks to the internet, that scientific journals are being forced to evolve along with a lot of other industries. From what I read in the last year or two, the subscription only journals are coming under pressure from freely published journals. So they may be charging the content providers (the authors) to publish, and publishing openly in the future. This would be a good thing for everyone, I'm sure you would agree. And I use the cache for slow loading sites since the cache is faster. And I use it for other purposes. But if you use google enough, you'll realize that some industries, and some fields/categories don't use, or avoid the cache entirely. When you see this, take a look at the web sites, what their content is, and maybe you can figure out why they avoid the cache also, which can be any one of a number of reasons, but the reasons are usually similar for particular industries/fields/categories, and which may be completely different for other industries/fields/categories.
The sites on my server are being served by apache on linux. Their uptimes are measured in months to years, with better than 99.9% availability. I've had less luck with google itself than with the sites on my server. And that doesn't include the backup servers. So availability is not an issue.
It's even worse than that!
by
Anonymous Coward
·
· Score: 0
I don't quite believe that Google would've limited themselves that way (using 32 bit identifiers for documents) - that would've been incredibly short-sighted.
What was even more short sighted was their use of two digits to store the year value for the file dates. Something about the amount of space saved by not using those extra two bytes (four for unicode).
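For what it's worth, the 32-bit speculation comes from how close the old page count sits to 2**32. The quick check below only shows that the numbers are close; it is no evidence of how Google's index is actually built. The old figure is the one quoted in this thread, and the new one is "more than 8 billion" rounded down.

```python
OLD_INDEX_SIZE = 4_285_199_774  # page count quoted in the thread
NEW_INDEX_SIZE = 8_000_000_000  # "more than 8 billion", rounded down (assumption)

UINT32_LIMIT = 2 ** 32          # number of distinct unsigned 32-bit identifiers

# How many more documents a 32-bit identifier could have addressed.
headroom = UINT32_LIMIT - OLD_INDEX_SIZE

# Minimum number of bits needed to hold the new count.
bits_needed = NEW_INDEX_SIZE.bit_length()

print(f"2**32         = {UINT32_LIMIT:,}")
print(f"old index     = {OLD_INDEX_SIZE:,}")
print(f"headroom left = {headroom:,}")
print(f"bits for 8bn  = {bits_needed}")
```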
It isn't about having a better search engine, so much as it is knowing how to use it. If you are looking for information on a recipe for oriental rice using asian spice, how would you search?
Bad search example:
oriental rice recipe asian spice
Good search example:
recipe+"oriental rice"+spice
See the difference? google tries its best to get rid of the spam pages, but it won't ever combat them all. Half of the work has to be done with you understanding the best way to describe to the search engine, what it is you want to do. The better you explain it, the better it can search for you.
-- "We're breaking out the ramen noodles. . . "
"Really? Is it someone's birthday?"
Re:Google thieves my bandwidth
by
Rakshasa+Taisab
·
· Score: 2, Insightful
You can rant all you want, but Google still has a fair use right to your images. They are reduced resolution images and therefore legal for non-commercial use.
Not to mention robots.txt, but that is so obvious it shouldn't need to be mentioned.
-- -
These characters were randomly selected.
great but where are the .txt and directories?
by
js7a
·
· Score: 2, Informative
Google won't be within reach of the pinnacle until they index .txt files, directory listings, and anonymous ftp sites.
Re:great but where are the .txt and directories?
by
geminidomino
·
· Score: 2, Informative
One out of 3 ain't a bad start. Add a few more keywords to narrow down the google-crawling.
Re:great but where are the .txt and directories?
by
Lehk228
·
· Score: 1
They *do* index directory listings; just search for "index of"
-- Snowden and Manning are heroes.
Re:great but where are the .txt and directories?
by
Anonymous Coward
·
· Score: 0
or "parent directory" to catch those fugly IIS autogenerated indexes, too.
Re:great but where are the .txt and directories?
by
dandman
·
· Score: 1
For directories and other files (including, interestingly but worryingly zips etc) I found much unexpected data in the Wayback Machine
Now while it's not exactly a search engine itself, it's in the same family, and I use it instead of GoogleCache when needed. Most informative were the snapshots you can find of sites recorded whilst they were in development (ie, before they turned off directory listings and turned on security settings) Good for retrieving any backups you forgot to make - although a bit hard (and slow) to re-assemble if using a web-whacker to grab the bits automatically - the mirror files are all over the place.
Re:great but where are the .txt and directories?
by
irc.goatse.cx+troll
·
· Score: 1
You get better results with intitle:"Index Of" Saves you from some spam traps, anyways.
-- Pain lasts, kid. Its how you know you're alive.
Sometimes I think this growing up thing is just pain management-TheMaxx
Re:Google thieves my bandwidth
by
jvj24601
·
· Score: 5, Informative
Well, if you know that Google is indexing your site and "stealing" your bandwidth, then you must have looked at the server logs, right? You'd see the name of the search bot is googlebot. Search for it, and you'll find that the first relevant link explains how to prevent googlebot from accessing your site.
The logs would probably also show failed attempts to find the file /robots.txt. Similar info is gained from searching on that term as well.
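Checking the logs as described above could look something like the sketch below. It assumes the server writes Apache-style Combined Log Format; the sample lines, addresses and byte counts are invented for the example, and `crawler_hits` is just a name picked for this sketch.

```python
import re
from collections import Counter

# Combined Log Format: host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def crawler_hits(lines, crawler="googlebot"):
    """Count (path, status) pairs requested by user agents containing `crawler`."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and crawler in match.group("agent").lower():
            hits[(match.group("path"), match.group("status"))] += 1
    return hits


# Invented log lines for illustration only.
SAMPLE = [
    '192.0.2.10 - - [11/Nov/2004:10:00:00 +0000] "GET /robots.txt HTTP/1.0" 404 209 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.10 - - [11/Nov/2004:10:00:05 +0000] "GET /index.html HTTP/1.0" 200 5120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '198.51.100.7 - - [11/Nov/2004:10:01:00 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (X11; U; Linux i686)"',
]

hits = crawler_hits(SAMPLE)
for (path, status), count in sorted(hits.items()):
    print(path, status, count)
```

A 404 against /robots.txt in the output is the "failed attempt" mentioned above: the crawler asked for the file and the site simply doesn't have one.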
Search terms: oriental rice recipe asian spice
Search Results: Results 1 - 10 of about 254,000 for oriental rice recipe asian spice. (0.40 seconds)
Search Effectiveness: REASONABLE. Good list of relevant items matched.
Search terms: recipe+"oriental rice"+spice
Search Results: Your search - recipe+"oriental rice"+spice - did not match any documents.
Search Effectiveness: UTTER SHITE
The user wants SIMPLICITY. If google cannot give decent results for simple search criteria, then people will go elsewhere.
The examples were only examples, nothing more, which is why I said example. I'm quite sure most readers (that aren't out with a jackboot) will get the drift of what I am saying.
If a user wants simplicity, then they will get a simple search. If a user wants an advanced and refined search, then that requires advanced knowledge of google.
If people go elsewhere, oh well. Those who know how to use the search engine properly will still be here, educating those who do not know how to use it. Know why? Because eliminating all spam and fake pages from searches won't happen. It just won't, due to the time it would take to check each and every page for content, much less content-defeating methods.
-- "We're breaking out the ramen noodles. . . "
"Really? Is it someone's birthday?"
Actually, no, it appears I'm the muppet. Google is one of the stories you have to have. I apologise!
J.
-- You're only jealous cos the little penguins are talking to me.
So, to sum up...
by
kahei
·
· Score: 5, Insightful
I am feeding this troll because there are people who really _do_ think like that and I wish I could yell at them to their faces:)
You put content in a place where it is publicly accessible. You explicitly and proactively made that content available to everyone, including 'the average surfer' and googlebots. You took no steps to make it available only to the select few of whom you approve.
Now you are all cross and bothered because average surfers / googlebots have read / copied your content, such as it is.
The solution is to drown yourself in a bucket. I have a bucket.
-- Whence? Hence. Whither? Thither.
Re:So, to sum up...
by
Anonymous Coward
·
· Score: 0
Hear hear.
This person belongs with the idiots who use right-click blockers to 'protect' their precious 'content', and the whiny teens who have a LiveJournal or blog and use it to publicly slag off everyone and everything, and then throw a tantrum when someone reads it and responds...
If you don't want people to read it, and possibly even *gasp* save it for future use, don't post it publicly.
I regularly watch where my nickname, full name, parents' names, etc. come up in google. I've noticed in the past couple of months that my hits have DRASTICALLY reduced. They just disappeared from the database. But over the past 2 days, I've gotten notifications (thanks google alerts) about new pages being indexed and voila! They come up in a search again.
Proximity search will help
by
Sai+Babu
·
· Score: 3, Insightful
This is why I've been begging google folks to implement NEAR operator!
Re:Proximity search will help
by
Spy+Hunter
·
· Score: 1
Google makes these kinds of operators almost entirely redundant. I can't remember the last time using search operators gave me better Google results; even quotes are unnecessary 99% of the time. Google already prioritizes the pages which contain your search words in the order you specify, and pages which contain your search words in close proximity, and I believe it even does this in a phrase-sensitive way (so if you search for two common two-word phrases, Google recognizes this and prioritizes results accordingly, instead of prioritizing results based on one four-word phrase). This works much better than a NEAR operator because it is automatic based on the phrases actually used in the pages you're searching for. Notice if you search for "ahi fish recipie" on Google you get results about the same as your example MSN search; no operators necessary.
Erm, that's only because of the bizarre plus signs the grandparent poster put in - try this. Note to grandparent: Just about any modern search engine assumes words not prefixed by anything are to be included in the Boolean search query. No need for +.
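Part of the confusion over the plus signs may be that "+" means two different things. Inside a URL query string it is just an encoded space, while a "+" typed into the search box is a literal character that gets sent as %2B. The sketch below shows only the encoding step; the `num`, `hl` and `q` parameters are taken from the URL posted earlier in this thread, `search_url` is a made-up helper, and how the search engine then interprets the literal plus is not something this code can show.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def search_url(query, num=100, lang="en"):
    """Build a search URL; urlencode turns spaces into '+' and literal '+' into '%2B'."""
    params = {"num": num, "hl": lang, "q": query}
    return "http://www.google.com/search?" + urlencode(params)


plain = search_url('recipe "oriental rice" spice')
with_plus = search_url('recipe+"oriental rice"+spice')

print(plain)
print(with_plus)

# Decoding the query string gives back exactly what was typed in each case.
print(parse_qs(urlsplit(plain).query)["q"][0])
print(parse_qs(urlsplit(with_plus).query)["q"][0])
```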
When a search engine announces it has increased its index of pages, it advertises a deficiency....
"Oh, if you just added several billion pages, were you giving me crap before? How many more billions of pages are you not indexing right now?"
Google's announcement merely gives its users reason to question the size and comprehensiveness of Google's index.
Re:Advertising a deficiency
by
skraps
·
· Score: 1
Riiight.. because search engines are supposed to be birthed by God with a complete index already in place. None of that crawling business to make the index larger.
-- Karma: -2147483648 (Mostly affected by integer overflow)
Re:Advertising a deficiency
by
fleener
·
· Score: 1
No, I said making a big public announcement of that sort is advertising a deficiency, not that building the index is bad. It's negative public relations. Read before criticizing.
The problem is, you tell people to use quotes and pluses and cryptic search terms.
When google cannot find anything, it comes up and tells them the opposite:
Tip: Try removing quotes from your search to get more results.
People don't need to know the quoting syntax, or the inclusion format rules, they just need to click the "Advanced Search".:)
When you make a comparison regarding how much better your way is than everybody else's, make sure your facts are clear. I agree it was a mistake, and I agree with your sentiment, but most users don't even know how to type a quote character.
-- liqbase:: faster than paper
Re:Google makes minor change to website - news at
by
shrykk
·
· Score: 1
You can block Apple stories in your user preferences page.
-- #define struct union/* Reduce memory usage */
Re:Google thieves my bandwidth
by
rdc_uk
·
· Score: 1
"Google still has a fair use right to your images. They are reduced resolution images and therefor legal for non-commercial use."
FYI; nothing google does is "non commercial".
Even the stuff they let out "for free" serves to funnel their adverts to you, which is their source of revenue. i.e. it is a commercial activity.
Ergo; their use of other people's data (or data ABOUT other people's data, such as a thumbnail of someone else's copyright imagery) is in NO WAY non-commercial.
Do web crawlers really have a future?
by
Anonymous Coward
·
· Score: 0
It seems to me the larger and more dynamic web sites become, the less and less useful web crawlers will become. I suspect it will get to the point where site admins will have to regularly submit a keyword list to the search engines.
Re:Google makes minor change to website - news at
by
kjamez
·
· Score: 0, Offtopic
i don't know if it's news or not, but c|net news was reporting gmail now offers free pop access.
that's cool.
i have gmail invites for free that require no ipod or free lcd signup. i just have no one to give them to. everyone i *know* has one already. i have six.
and it's more fun to talk to someone than just to submit them to the gmailinvitecache.
-- you can't have everything, where would you put it?
Re:In other news...
by
Anonymous Coward
·
· Score: 0
I am happy because this is the first google update that has indexed some files on one of my websites that are going to be used for a program I wanted to write. I registered the domain and created the basic website 11 months ago and have been waiting since!!! So finally I will be able to get to work on it.
The real reason...
by
Anonymous Coward
·
· Score: 0
Every kid in China has been asked to make a webpage about their family as a school project.
Yes, the plus sign on google is only used to force a search for "common words", which are otherwise filtered out. These are usually simple words like are, is, how, what etc.
There IS a + operator, and you are modded "informative"....
gus
-- .. if only.
I know why it has doubled...
by
jmcmunn
·
· Score: 2, Interesting
Because every blogger in the universe has added at least 3 pages since the last index. I fail to see how it is significant to me that there are now 8 billion mostly worthless sites out there. The number of actually useful sites has not gone up considerably.
Web API
by
Anonymous Coward
·
· Score: 0
Just noticed this on the google pages. Does anyone know where their web API came from: http://news.google.com/apis/ Oh, and why do they now own http://www.keyhole.com/ ? Now then, a picture with each google local response?
That's gotta be the quickest dupe!!
by
mcoko
·
· Score: 0, Offtopic
Amazing... Just as fast as Google hit 8 Billion, Slashdot duped the story. Subscribers will see it two to three stories about this one.
-- www.fotoforay.com
Still censoring images?
by
Anonymous Coward
·
· Score: 0
Yeah, but are they still censoring stuff? Like pictures of american war crimes in Iraq (just try a search for abu graib and lyndie england, google still returns none of the pictures of her torturing "inmates"). Google appears to be very open to censorship by commercial and government interests. I'm afraid unless this changes I am going to have to stop using them.
Read more carefully.
by
Anonymous Coward
·
· Score: 0
As you are the foremost of several identical replies, somehow marked (+5 insightful) instead of (-1 redundant), I will answer you. But consider this a response for all of you who have replied.
Firstly, read my words. I am fully aware of the existence of robots.txt. The clue was where I said "Unless I adhere to their own arbitrary rules". In your rush to correct me you all seem to have lost your reading comprehension. Thanks should also go to the moderators who have yet again branded me a troll because 5 or 6 of you hot-headedly misresponded to the same mistaken point.
The real question here is why I, a UKian, should have to KNOW about robots.txt at all. Why should I have to find out, or somehow just instinctively know, this arcane piece of information, just to host my own website for my friends to visit? Why is this an opt-out list instead of an opt-in list? Why am I automatically expected to want these ridiculous bots crawling over my personal space, violating my privacy? This is an American company with dubious notions of personal data privacy and no clear data retention policy, indexing the contents of my site for all to see when this is against my wishes, and basing its decision on its own ridiculous opt-out list. That is the scandal. Next time, perceive the beam instead of the mote etc. etc.
And when they make the original mistake, how easy do you think it is to make them cleanse the site from their archives, cache, etc? I can tell you, I hope you are never in that situation. In the end it took a legal threat for them to take notice of me.
Google are thieves. Just because they thieve from everyone, that their thievery is diluted a trillion times does not make it OK. They take our words, our information and our images and they use it to make money for themselves. And we do not see one red cent of it. It is paid to the rich backers.
I am aware that anonymity on the Internet is the motivating cause of many insults. But you should consider your words carefully, I was offended by being told to go and drown in a bucket. A family member drowned recently and it brought back to me the horror of a loss of life, especially in one so young. Please try to be more polite in future. I debated with myself to include this paragraph, as it would just open me up to more abuse from you, but I will give you the benefit of the doubt. We often say things in haste we repent slowly.
I'll give you the benefit of the doubt in assuming your thoughts and words are genuine, and you're not trolling.
The internet is not a private place. Other than the security procedures the developer implements, there is no "opt in" for any web site, any more than you can only allow people on your whitelist to call your phone number. Robots.txt is not JUST a Google or an American tool. It is recognized by many international search engines or other indexing spiders. Having a web site comes with a certain amount of responsibility, including protecting the information you want to keep private, and telling the spiders that index sites that you don't want them. I think the vast majority of users will agree that Google provides a valuable service, and by conforming to the rules they allow a way to keep sites out that don't want to be included in searches. Just add this:
User-agent: *
Disallow: /
to the file to keep all robots out. Now, robots that don't pay attention to your requests are a legitimate problem.
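For anyone who wants to check what those two lines actually do, here is a minimal sketch using Python's standard urllib.robotparser module. The bot name "ExampleBot" and the example.com URL are made up for illustration, and this only shows how a well-behaved spider would read the file; it does nothing about robots that ignore it.

```python
from urllib.robotparser import RobotFileParser

# The two rules from the post above: every robot, nothing allowed.
KEEP_OUT = """\
User-agent: *
Disallow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" and the URL are hypothetical.
    print(may_fetch(KEEP_OUT, "ExampleBot", "http://example.com/photos/"))
```

A polite crawler runs a check like this before every request and skips any URL for which it gets a negative answer.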
Re:Read more carefully.
by
Mant
·
· Score: 4, Insightful
Robots.txt isn't something that only applies to Google; it is supposed to be honoured by all search engines, and it uses the Robots Exclusion Standard. So, when you claim these are Google's arbitrary rules, you are in fact wrong. They are neither Google's nor arbitrary (at least no more than any web standard).
So your clue is not so much of a clue, as robots.txt doesn't fit your description.
As for why you should know about it, you are putting up a web site, it is part of running a web site. You might as well complain why you need to know about HTML, CSS or registering a domain name. Quite what coming from the UK has to do with it (something I also do), I have no idea.
"I simply do not want the average surfer to be able to visit my site, I am not interested in serving my pages to them, they simply would not appreciate or understand what it is I am showing."
Then a publicly accessible website is the wrong place. It is not your personal space, and it isn't private. You made it available to the world; nobody made you. To turn around and complain when (some of) the world visits it is hypocracy.
It's like putting up posters around a town, then running around complaining that all these people are looking at them, won't appreciate them, and you don't want them to. It also comes across as condescending and arrogant, which probably explains the nastiness of some of the responses.
You opted in when you put up the publicly accessible website. If all search engines had to be opt in, nobody could find anything on the web, and it would lose a lot of its utility. You're assumed to want them crawling because the vast majority of people do; they want their site to be found. If you don't though, no problem, just use the standards for stopping searches, or password protect the site. No scandal at all, just hysterics.
Showing the low res thumbnail of your image isn't violating your copyright either. The only legitimate claim you have is the amount of time it took to remove something from the cache.
The "thieves" accusation is even more ridiculous. If you put something up on the web people can see for free, you can't complain. There are options if you want to protect it. Google doesn't claim your work as theirs (which would be 'stealing' or at least copyright violation), they help people find your public web site.
If you don't want a public website but made one, whose fault is that? If you are going to run a website and can't be bothered to find out how to do it properly, you can't blame Google.
Re:Read more carefully.
by
Anonymous Coward
·
· Score: 0
If you wanted to keep the information private, why did you post it on a publicly accessible webpage in the first place? You should have included an authentication mechanism to limit access, or made it a private BBS instead of part of the internet.
Re:Read more carefully.
by
Anonymous Coward
·
· Score: 0
"The "thieves" accusation is even more ridiculous. If you put something up on the web people can see for free, you can't complain. There are options if you want to protect it. Google doesn't claim you work as theirs (which would be 'stealing' or at least copyright violation), they help people find you public web site."
I hardly agree with this. If someone writes a book and gives it away for free, someone else can't simply put "John Doe wrote: " in front of the book and then use it to make money, whether by selling it, giving it away when you buy something, or even giving it away when you read their ads. The point is moot though, because again you can opt out of being cached.
*Posted anonymously because I modded in this thread.
Re:Read more carefully.
by
Anonymous Coward
·
· Score: 0
I think you mean hypocrisy. If hypocracy meant anything, it might be a deficient system of government.
Re:Read more carefully.
by
Anonymous Coward
·
· Score: 0
Hear hear! Stupid self-centered egocentric selfish grandparent BS. "I published information for the world, how dare they steal it from me!!!"
Google also started implementing POP access for Gmail today (my account has it enabled already). There's no IMAP yet, and we know there were ways of doing this before, but it's an interesting direction for Google to take. As stated in the article, they don't intend to start charging for POP access or mail forwarding in the future. So how can Gmail's ad-based business model continue to be viable when its users can read their mail from external clients and via external addresses?
-- ----
scrm
GOOGLE DUPE STORY HIDDEN, BUT STILL THERE!
by
Anonymous Coward
·
· Score: 0
Somebody quickly mirror this link before Michael destroys it!
but... you don't know how to use it properly. the plus sign is not used in google searches in the way you specify. from google's own help page which you obviously haven't read yourself:
" + " Searches
Google ignores common words and characters such as "where" and "how", as well as certain single digits and single letters, because they tend to slow down your search without improving the results. Google will indicate if a common word has been excluded by displaying details on the results page below the search box.
If a common word is essential to getting the results you want, you can include it by putting a "+" sign in front of it. (Be sure to include a space before the "+" sign.)
Another method for doing this is conducting a phrase search, which simply means putting quotation marks around 2 or more words. Common words in a phrase search (e.g., "where are you") are included in the search.
"if the index has doubled so has our supply of information. Information rules!"
Not to be a spelling Nazi but you misspelled pr0n.
-- I laughed at the weak who considered themselves good because they lacked claws.
Re:What?
by
Anonymous Coward
·
· Score: 0
The user wants SIMPLICITY. If google cannot give decent results for simple search criteria, then people will go elsewhere.
Which user? The same people who threatened to move to Canada if Bush was reelected? I know I want relevance.
Where will that user go? Yahoo? MSN? Or any other search engine that is no easier to search or to get relevant results from? Google is still the best, but people want more.
Google is not hard to master if you spend a few minutes to read their guide.
Search for (abu ghraib), and find only pictures of a harsh wartime prison. None of the famous torture pictures which appear on the web, even though there are several showing Iraqis happily celebrating. Either their excuse that their index is just too old is now obviously bogus, or their image search should never have lost its "Beta" label. Either way, it's obviously dangerous to rely on Google, or any one Web filter, for any accuracy. A much more useful search system would include a multi-index client behind the browser's "Address" input widget. It would query multiple competing search indices (from among a user-defined list with popular defaults), returning collated results including the "messages" (ads) sent by the responding index. Accepting multiple different query formats, like Google, Yahoo, and others (and translating to query the respective indices), it could completely take over the search function, as long as it didn't play favorites with one engine over another (like the locked-in pages from these engines today). Mozilla plugin, anyone?
--
--
make install -not war
He who mods down
by
Anonymous Coward
·
· Score: 0
...doesn't get the joke....also has a small penis, most likely.
courts have ruled in google's favor
by
Anonymous Coward
·
· Score: 0
Thumbnails are fair use
Beginning of the end for Google!
by
Anonymous Coward
·
· Score: 0
Google was great a few years ago, now with 8 billion pages search results consist mostly of irrelevant nonsense such as web logs, links to stupid pages from lousy personal web sites like Angelfire and Geocities, etc.
For some reason Google ranks these junk pages higher than the primary source of information users are searching for. So much for Google's superior search algorithms, they are next to useless now that they have this much data to search.
I rarely use Google anymore because I waste too much time filtering through crap results!
...a search for Phil John now yields two results before my slashdot user page.:o(
-- I am NaN
Re: Case-sensitive search takes more effort?
by
Alwin+Henseler
·
· Score: 1
What you guys don't realize is the orders of magnitude higher that it takes to perform the whole "capitalized/not capitalized" search
I beg your pardon? You didn't ever follow any basic programming courses, did you? What you're saying is nonsense.
Case-sensitive searching is just EXACT comparison of text strings, if you compare:
"Joe JingleheiMerScHmIdT" with
"Joe JinGleheiMerScHmIdT"
there's no match, because the "G" doesn't match "g". This kind of searching is easy, simple & fast. Case-INsensitive comparison just means filtering the strings through "make all uppercase" or "make all lowercase" (or other filters) before doing the comparison. This is EXTRA work, but for most applications, insignificant (fast, simple & easy as well).
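A minimal sketch of the point above, in Python: the exact comparison is a plain equality test, and the case-insensitive one just normalises both strings first. The function names are made up for this example, and casefold() is only one possible "filter".

```python
def equal_exact(a: str, b: str) -> bool:
    """Case-sensitive comparison: the strings must match character for character."""
    return a == b


def equal_ignoring_case(a: str, b: str) -> bool:
    """Case-insensitive comparison: normalise both sides first (the extra work)."""
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    left = "Joe JingleheiMerScHmIdT"
    right = "Joe JinGleheiMerScHmIdT"
    print(equal_exact(left, right))
    print(equal_ignoring_case(left, right))
```

Either way the cost is linear in the length of the strings, so the extra normalisation step does not change the order of magnitude of the work.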
A long while back our CRM application was consistently getting hung on queries that involved customer first/last name combinations because it WAS capitalization sensitive.
You're confusing the programming technique itself with a badly coded implementation (your CRM app).
Google already had more than 5 billion
by
Dryth
·
· Score: 1
For quite some time now, searching for extremely common words (i.e. "the" by itself) would turn up a page count in the area of ~5,400,000,000. The number on Google's front page seems to update less frequently than the actual number indexed.
Still, I suppose it isn't unreasonable for most people to go by the number on the front page.
Re:Google already had more than 5 billion
by
izomiac
·
· Score: 1
Whoa, and that doesn't even count many of the non-english sites Google indexes... maybe the only thing they did was update the number on the bottom of the webpage...
Re:Google already had more than 5 billion
by
adpowers
·
· Score: 1
I'd believe it. Notice how when you search for [the] now, it returns exactly 8 billion pages? Who wants to bet Google has code in there that limits the number of results listed (not that you can view them anyway) so no one really knows /how/ much they have indexed. They are secretive about their computer count for competition reasons, so I wouldn't be surprised if they limit how much the public knows about the number of indexed pages. However, I, like others, have noticed a bunch of new Google Alerts being sent out, so maybe they did update, but it may be much more than 8 billion.
Andrew
Re:Google thieves my bandwidth
by
Rakshasa+Taisab
·
· Score: 1
I can't find those adverts you are talking about, perhaps you are talking of some other Google image search?
-- -
These characters were randomly selected.
Re:Google thieves my bandwidth
by
roman_mir
·
· Score: 1
What is more interesting to me is why search engines don't work the other way around. Why do we need a robots.txt file to tell the robots to sod off, when anyone can come up with a new spider overnight?
I think it would be more appropriate to have the robots.txt file with invitations, so that the spiders would always check first and if they are welcome, only then they would crawl this site.
I wonder if they are going to take some actions based on blogs. It seems to be skewing the results a bit. Google bombing is still popular, I would think that google would do something to clean up the problem, and give less weight for a link from a blog to the text they use to define it.
I have found it annoying that sometimes I will be searching for something and my own website is a hit. I admit I am not surprised, because I put things in the blog that interest me and then search for the same things.
you don't know what you are talking about
by
Anonymous Coward
·
· Score: 0
the + in front of a word means it MUST be included. without it all words are optional.
just an FYI, an - in front of a word means it MUST NOT be included.
Re:you don't know what you are talking about
by
toddestan
·
· Score: 1
the + in front of a word means it MUST be included. without it all words are optional.
Anyone who has done some Googlewhacking knows that is false. But for search engines other than Google, this might be true.
no its not
by
Anonymous Coward
·
· Score: 0
the + sign is used to FORCE a word to HAVE to appear in the results.
searching for
military victories +french
will only bring back results that MUST HAVE french in them, making the results MORE RELEVANT
military victories -french
does the opposite, it says I only want military victories that DON'T have the word french in them!
god, people can't even read some simple documentation.
god, people can't even read some simple documentation.
Yeah, like yourself. This is straight from the most basic of the Google search instructions found here:
"The Basics of Google Search
To enter a query into Google, just type in a few descriptive words and hit the 'enter' key (or click on the Google Search button) for a list of relevant web pages. Since Google only returns web pages that contain all the words in your query, refining or narrowing your search is as simple as adding more words to the search terms you have already entered. [...]
Google ignores common words and characters such as "where" and "how", as well as certain single digits and single letters, because they tend to slow down your search without improving the results. Google will indicate if a common word has been excluded by displaying details on the results page below the search box.
If a common word is essential to getting the results you want, you can include it by putting a "+" sign in front of it. (Be sure to include a space before the "+" sign.)"
Fucking idiot.
-- Martin
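To make the quoted help text concrete, here is a toy sketch in Python of the behaviour it describes: every ordinary word is required, common words are dropped unless prefixed with "+", and a leading "-" excludes a word. The STOP_WORDS set and both function names are assumptions made up for this example; this is not Google's actual query parser.

```python
# Hypothetical list of "common words"; the real list is not given in the thread.
STOP_WORDS = {"where", "how", "the", "a"}


def parse_query(query: str) -> tuple[list[str], list[str]]:
    """Split a query into (required words, excluded words)."""
    required, excluded = [], []
    for token in query.split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:].lower())   # forced, even if common
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:].lower())   # must not appear
        elif token.lower() not in STOP_WORDS:
            required.append(token.lower())       # ordinary word: required
    return required, excluded


def matches(query: str, page_text: str) -> bool:
    """True if the page has every required word and no excluded word."""
    required, excluded = parse_query(query)
    words = set(page_text.lower().split())
    return all(w in words for w in required) and not any(w in words for w in excluded)


if __name__ == "__main__":
    print(parse_query("military victories -french"))
    print(parse_query("+where is waldo"))
```

Under this reading, adding "+" to an ordinary word such as french changes nothing, because the word was already required; the "+" only matters for words that would otherwise be ignored.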
When will they get around to indexing blogspot?
by
.killedkenny
·
· Score: 1
Looks like Google doesn't even index their own blog hosting site. The title of my blogspot blog is nowhere to be found in a Google search, and that blog is over 6 months old.
Re: Case-sensitive search takes more effort?
by
delphi125
·
· Score: 1
I think you are the confused one.
Great-Great Grandparent said Google uses an index.
Great Grandparent said that the index could be used to prefilter results.
Grandparent posted a problem with this - admittedly not as clearly as he could have.
You posted some crap which displays that you don't know what you are talking about at all, and insulting someone who is raising a valid objection.
The magic of Google is that using a very clever index, it can find relevant results amazingly quickly - as do all search engines. The whole point is that they don't search everything.
While prefiltering could in theory help, searching for exact matches is still far more expensive than is realistic. The reason for this is not the very specific searches proposed above, but rather people wanting to search for '#Windows' (with a capital W).
I'm sure you understand that searching for such a term in all documents which match the (case-insensitive, indexed) term 'windows' would be prohibitively expensive even if only a million such queries were put to Google each day.
The only sensible way in which such a thing could be achieved is if Google 'randomly' selected some searches to 'improve'. No keyword or special symbol, just IF there is CPU time and for a suitable (case sensitive) search term, do the post-processing. Having said that, it would be better to recognize 'common' case sensitive keywords, such as pH, PhD, EFNET, or whatever, and simply use those as separate keywords in the index.
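The post-processing idea can be sketched in a few lines of Python: look the term up in a case-insensitive index first, then run the exact-case check only over the candidate documents. The tiny document set, build_index and search are all hypothetical names for this example and say nothing about how Google's index is really built.

```python
from collections import defaultdict


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each case-folded word to the ids of the documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[word.casefold()].add(doc_id)
    return index


def search(index, docs, term: str, case_sensitive: bool = False) -> list[int]:
    """Case-insensitive index lookup, with an optional exact-case post-filter."""
    candidates = index.get(term.casefold(), set())
    if not case_sensitive:
        return sorted(candidates)
    # Post-processing: only the candidates are re-read, not the whole collection.
    return sorted(d for d in candidates if term in docs[d].split())


if __name__ == "__main__":
    docs = {
        1: "Windows is an operating system",
        2: "please open the windows",
        3: "measure the pH before the PhD defence",
    }
    index = build_index(docs)
    print(search(index, docs, "Windows"))
    print(search(index, docs, "Windows", case_sensitive=True))
```

The cost of the second step grows with the number of candidates, which is exactly the objection above: for a term as common as 'windows' the candidate set is enormous.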
What is more interesting to me is why search engines don't work the other way around. Why do we need a robots.txt file to tell the robots to sod off, when anyone can come up with a new spider overnight?
I think it would be more appropriate to have the robots.txt file with invitations, so that the spiders would always check first and if they are welcome, only then they would crawl this site.
That is what they do. If you don't want to let any spiders crawl your site, you can say:
User-agent: *
Disallow: /
If you want something fancier, you could have robots.txt served up by a program. But the features of robots.txt suffice for most.
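An invitation-style robots.txt is possible with the same two directives: name the spiders you welcome, then shut everyone else out. The sketch below checks such a file with Python's standard urllib.robotparser. The choice of Googlebot as the invited spider, "SomeNewSpider" and the example.com URL are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Opt-in style: an empty Disallow means "nothing is off limits" for the
# named robot, and the catch-all record keeps every other robot out.
OPT_IN = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt invites user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://example.com/index.html"  # hypothetical site
    print(allowed(OPT_IN, "Googlebot", url))
    print(allowed(OPT_IN, "SomeNewSpider", url))
```

This still relies on the spider choosing to read and obey the file, so it is an invitation list only for robots that play by the rules.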
-- Ironically, the word ironically is often used incorrectly.
Re:Google makes minor change to website - news at
by
mcc
·
· Score: 1
Uh...
Google, the foremost search engine, makes a pretty serious change (doubling their index size while suddenly announcing a new dedication to increasing their index) on the exact same day that an intended major competitor of theirs (MSN search, which slashdot already had a story on) launches...
I'd say that's pretty significant. Since you apparently don't want any Apple or Google news, why not just disable those stories...?
"The documents in Google's index are in dozens of file types from HTML to PDF, including PowerPoint, Flash, PostScript and JavaScript."
I'm sure people write a lot of great information in JavaScript. Is this a sign of Google going down hill?
How to get rich: 1) Create expensive, hyped-up IPO for search company 2) Announce you're getting double the pages since you now index JavaScript 3) Watch as share prices double 4) ??? 5) Enjoy yourself on out-of-the-way tropical island, with your GOOG shares sold just before the crash.
OT: hotmail storage increased to 250MB today
by
peter303
·
· Score: 1
Seems like lots of M$ and Google services are being enhanced today. I welcome the new storage.
Finally stopped using 32-bit int
by
glyph42
·
· Score: 1
I guess they finally got around to changing their page indexing scheme from a 32-bit unsigned integer. The number sat at 2^32 - epsilon for what seemed like an eternity! I expect the sudden doubling of pages is simply the backlog that built up while they waited for the conversion to 64-bit, or whatever.
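The arithmetic behind that guess is easy to check. The sketch below assumes, as the post speculates, that each page gets an unsigned 32-bit identifier; nothing in the thread confirms how Google actually numbers pages. The figure 4,285,199,774 is the page count quoted elsewhere in this discussion.

```python
UINT32_LIMIT = 2**32  # number of distinct values in an unsigned 32-bit integer


def fits_in_uint32(page_count: int) -> bool:
    """True if every page could get its own unsigned 32-bit identifier."""
    return 0 <= page_count <= UINT32_LIMIT


if __name__ == "__main__":
    old_count = 4_285_199_774   # count quoted in this discussion
    new_count = 8_000_000_000   # "over eight billion", rounded down
    print(UINT32_LIMIT)
    print(UINT32_LIMIT - old_count)   # identifiers left over under the old count
    print(fits_in_uint32(old_count), fits_in_uint32(new_count))
```

The old count sits just under the 32-bit ceiling and the new one is far above it, which is consistent with the speculation but does not prove it.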
-- Music speeds up when you yawn, but does not change pitch.
I just did a couple of comparative searches on google and the new http://beta.search.msn.com/ and it is the first time lately when I saw another search engine returning more results and faster than google. At least for some keywords.
No holds barred, find anything. One of the best sites on the web (totally private, no ads).
http://www.searchlore.org/
Knock politely, ask for Fravia.
http://fravia.2113.ch/phplab/mbs.php3/mb001
Win-Linux centric.
P.S. Some friendly advice, don't piss off the natives... you've been warned.
http://www.searchlore.org/tools.htm
-- ~hylas
Re: Case-sensitive search takes more effort?
by
Alwin+Henseler
·
· Score: 1
I think you are the confused one.
No confusion here, but misunderstanding perhaps. For clarification, let me summarise:
jez9999 writes he/she would like to 1) "search the web" for exact match '#windows EFNET'. Obviously a massive amount of work, impossible for quick search queries.
Google uses an index, which is updated/refreshed every so many weeks, and only contains a very limited/filtered subset of "the web". Logical, this is the whole point of using an index.
PsychoSlashDot writes that Google's index works in a way that doesn't allow search 1)
Erasmus Darwin proposes to do a traditional search, and then use 2) the retrieved subset of Google's index to do full-text search 1) on. I think it's very important to make a distinction here between 2) this subset of Google's index, and 3) the actual web content that this subset of Google's index refers to. 2) would be quickly accessible to Google, although using it differently could require major changes in Google's hardware/software infrastructure. To do full-text search on 3), you'd have to actually retrieve and process the web content itself, which could become a huge task quickly unless the search involves only a very small number of results.
I think we can agree that 3) can be regarded as very time-consuming, but that 2) may be possible, or not (ask Google).
cavemanf16's comment may have been meant to point this out (valid point), but what cavemanf16 actually wrote (CRM app stuff) says that full-text search becomes way more expensive if you include case-sensitivity. That is plain nonsense.
Searching the web content that the subset of Google's index refers to (3) may be too much work for Google, but maybe adding case-sensitive or punctuation search within the subset of Google's index itself (2) IS do-able. Again, only Google knows.
They must have crawled into Slashdot's -1 threshold cache.
-- Chika Chik-ah... do-e ow ow.
Whoa! Did they disclose this to investors?
by
rbrome
·
· Score: 1
Whoa - wait a minute...
If that is true - if Google's system has had a design flaw limiting it to 4.3 billion pages until now - then that is a really huge weakness, risk, and vulnerability that the company has had until now.
Thinking back, they must have known about this for a long time - before they went public. If that's the case, did they disclose this weakness/risk to investors in their S-1? If not, did they break the law by not doing so?
Re:Google makes minor change to website - news at
by
Anonymous Coward
·
· Score: 0
The big issue with Google is that their page count has been stuck at around 4 billion for a few years now. Which, as covered elsewhere, indicated that someone was using 32bit unsigned integers somewhere...
Nice to see that they've finally patched everything up and can now index beyond 4 billion pages.
Client-side full-text search
by
gottabeme
·
· Score: 1
Seems like what is needed is a client-side app to take the first X number of Google results and do a case-sensitive search on the client computer, instead of on Google's servers. Sure, it'd take a while to download the first X number of results, but it would work if you really needed a more specific search.
-- "Those who consume the bulk of goods are those who make them. We must never forget this secret of our prosperity."
It's not because these spaces are "bizarre", it's because that instead of recipe+"oriental rice"+spice, the great grandparent should have put in some spaces around the pluses: recipe +"oriental rice" +spice
I spelled it write you ass. You should learn how to spell (or how to use a dictionarie sight for that matter)
Surely you mean you spelt it right. And surely you mean dictionary.
If you are so fond of the dictionary, try using it occassionally to ensure that you not only have the right spelling, but the right homophone (or is that the write homophone?)
-- This sig is intentionally blank
Page rank is good
by
Anonymous Coward
·
· Score: 0
I should've included that info in the top post. The particular site in question is linked by other sites fairly well, and some of the linking sites are highly ranked directories that place my site as a premier or feature site. And if a competitor complained about the site, and google looked at it for problems, they should've left it alone because the keyword results leading to the site are highly relevant and no one is led to the site through misleading keywords or other methods.
The number and quality of backlinks are the first things I checked and double checked to make sure everything stayed ok with the site. It isn't easy to fake or mislead on this, which is one of the reasons google uses this method, and one of the reasons why I would be expecting much better results explicitly because of this. But I highly doubt that google is alone in using this method for search results, since this has been talked about for more than a year. I'm sure that Yahoo is using this as one of their algorithms as well, which would explain the excellent ranking of the site in question on their search engine.
Something else is going on with google. And that is what got me wondering about search results when I use the search engine myself.
Yes because we at /. love Google..
Google is a constant source of information and a geeks friend - if the index has doubled so has our supply of information. Information rules!
Have they updated image search yet?
9/11: Never forget it was a false-flag operation
8 billion pages and not a single link to my blog.
Can't figure out if I should just shoot myself or maybe just open a subscription to /.
TC - My Photos..
I wonder if it'll take longer to index twice as many pages? Or if they, along with this change, improved their spider and/or added hardware. Otherwise I'm not sure this change is for the better, unless you like to search for really obscure topics.
Beware: In C++, your friends can see your privates!
What the article does not point out is why this is something important. For just about forever Google's store has been converging on 2**32 documents. Some people have speculated that Google simply could not update their 100,000+ servers with a new system that allowed more. Apparently they have now made the necessary architecture changes to allow for identifying documents by 64-bit (or larger) identifiers and are back in the business of making their search more comprehensive.
Good timing to coincide with MSN's attempt to start a new search engine too!
Unfortunately they didn't update the image-search yet.
Does every minor Google or Apple related thing deserve a slashdot story? Can slashdot create a "Fanboy" section for insignificant stories advocating Google (with their software patent) and Apple (with their iTunes DRM)? That way I could filter them out more easily.
I made my internet mirror world reable.
Google needs to stop obsessing about the number of indexed pages, and start concentrating on the quality. Since pagerank was switched off, 2 out of 5 searches now seem to be jammed with pages full of nothing but random words and adverts. It's even more galling when the adverts are Google Ads. Much as I love Google, they're becoming increasingly less effective as a tool.
Training monkeys for world domination since 1439
No, wait, they are our internet search overlords since, like, 1999?
Mhm to anonymous coward or not to anonymous coward?
Will moderators smack my karma below zero?
Nothing beats Apple. It is superior and makes people instant geeks without knowing shiat.
A bigger index does not equal better search results, however, with the press this will generate, it will equal profits.
They already have.
Training monkeys for world domination since 1439
In case of slashdotting use this mirror.
In Soviet America the banks rob you!
They doubled the index by counting all the stuff on your hard drive indexed and sent to them by Google Desktop Search.
Yeah, but it'd be news if the sun set twice in one night or rose twice as bright.
It's more the exponential increase in the size of the index rather than the piecemeal addition.
over eight billion pages crawled
You don't just go from 4 billion to 8 billion overnight.
They are probably just crawling the same 4 billion twice.
Blearf. Blearf, I say.
Does this mean that I've been missing a huge amount of important information until now? I'd just assumed that Google covered the entire relevant web but now it seems to cover the whole same amount again. My Google alerts also seem to have started producing a lot more results which suggest that a lot of these new pages are rated quite highly. Who knows how much more quality content on the web we're just not seeing?
You are damned right that'd be news. It means we slipped out of reality and headed into the twilight zone. (CUE MUSIC)
stuff
Maybe it's just me, but I'd call the doubling of information available for me to search a pretty significant improvement. Especially when the last update was only a 1b increase ("only" is a relative term, of course...).
Tim Dorr
Owner/Manger
A Small Orange
I guess its all in the wording
Gentoy and this weeks kiddies favourite Umbongo Linux come pretty close.
Until today you could save your Google settings without losing your privacy. You can still save those settings, but Google refuses to use them when you block their cookie. In my case I get 10 search results although I would like to receive 100. It seems that they are making many dollars on a user's cookie, and now that they are a public company my privacy is less important than "stock holders' interests".
The extra 3 billion pages are probably link farms.
In Soviet America the banks rob you!
At the same time, can Slashdot create a "Curmudgeon" section for those who like to gripe about the less than monumental significance of some story topics?
Local tabloid Aftonbladet is running a poll on search engine use:
61730 votes so far.
I'm a little surprised, either the masses who use the "default" (MSN?) aren't bothering to answer, or google is simply very very dominant and those "default using masses" do not exist [in this country].
... my weight would probably double, too.
It would be terribly easy to get trillions of pages indexed. For instance, a site I've been working on has a public calendar system, with results fished out of a database. There are very few actual events in it at the moment, but with the 'Previous' and 'Next' links it'll run from 1970 to 2038. A naïve web-crawler would index every single month for every single year, but Google would appear to have crawled over just a few, presumably flagging the pages as too similar to warrant further investigation.
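One way a crawler could flag pages as "too similar" is to compare overlapping word sequences (shingles) between two pages and measure how many they share. The sketch below is a hypothetical illustration in Python; the thread does not say what Google really does, and the sample page texts and function names are invented for this example.

```python
def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word sequences that occur in text."""
    words = text.split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the two shingle sets, between 0.0 and 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    march = "Calendar for March 1970 Previous Next No events this month"
    april = "Calendar for April 1970 Previous Next No events this month"
    forum = "Re: Case-sensitive search takes more effort? I think you are confused"
    print(similarity(march, april))   # two empty calendar months
    print(similarity(march, forum))   # unrelated page
```

A crawler using something like this could stop following 'Previous' and 'Next' links once a run of pages scores above some chosen threshold, though picking that threshold is its own problem.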
With stuff like public web forums, Slashdot and the like, I can easily imagine comparatively small sites producing thousands of pages apiece. Is there useful information in there? Quite possibly, but it definitely needs treating in a different manner to an old-fashioned, static-pages-only site...
Tedious Bloggy Stuff - hooray?
He was just speaking what the rest of us were thinking. Right now, we're thinking that you don't know how to spell "repetitive".
Google respects the robots.txt file. Use it.
No, you don't know shit infinity.
A lot of people have been asking what the point of the article is and why it matters. Possibly it is because Microsoft announced the launch of their search engine (http://news.bbc.co.uk/1/hi/technology/4000015.stm) and is claiming more pages indexed than Google (5 billion), so Google has responded by effectively doubling their pages indexed.
Of which 80% is V1AGR@ advertising,
and 19% is pr0n.
There's debate if the remaining 1% contains pirated music and movie or plans for DIY nukes.
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
They are probably just crawling the same 4 billion twice
From the article: These are not just copies of the same pages, but truly diverse results that give more information.
From BBC News here.
In a statement Microsoft said its search engine returned results from five billion web pages - more than any other search engine.
But this quickly won a response from Google which announced that its index has now grown to more than 8 billion pages.
Prior to the Microsoft announcement, Google was only indexing 4,285,199,774 web pages.
Steve Ballmer is soon to announce that his daddy is one hundrad years old, and kan kick your daddy's ass...
Offtopic, Inflammatory, Inappropriate, Illegal, or Offensive comments might be moderated up.
Dictionary definition of redundant
re.dun.dant adj
1a: exceeding what is necessary or normal: SUPERFLUOUS
b: characterized by or containing an excess; specif: using more words than necessary
c: characterized by similarity or repetition
d chiefly Brit: being out of work: laid off
2: PROFUSE, LAVISH
3: serving as a duplicate for preventing failure of an entire system (as a spacecraft) upon failure of a single component
-- re.dun.dant.ly adv
Learn what it means, mods.
Now it's going to be even harder to get my name in the top spot. Why was I cursed with the surname Smith!
I used to have a better sig but it broke.
Looks like they've added a gazillion LiveJournal pages to their index. I used to have a Google search box on my LJ that didn't throw up relevant results until last week or so. Now it works perfectly, just like builtin search (like what you see in MT and WordPress).
Frankly, I love it any time someone can best Microsoft. The next big thing may well be consumers putting their data on servers provided by the likes of Google, Microsoft and Yahoo -- running their applications there and having PCs that are little more than very easy to use display devices. If so, I would not mind seeing Google with the dominant market share. I trust them with that kind of power a lot more than Microsoft.
From 4 to 8 billion pages... I guess they just indexed the google cache...
--Use ant to make
I smell competition!
Rich.
libguestfs - tools for accessing and modifying virtual machine disk images
Does this mean twice as many pages with "Search for 'printer problem linux' on Kelkoo"?
Yes, this is probably a troll, but anyway... I take it you've never heard of the robots.txt file? You sound like you might want to read up on it. It's designed to help control the spidering of your pages for whatever reason, particularly cases like yours or situations where a spider would get confused and end up doing something stupid (recursive stuff, etc).
-ReK
md5sum -c reality.md5
reality: FAILED
md5sum: WARNING: 1 of 1 computed checksum did NOT match
Erm... Have you considered putting a robots.txt file on your webserver?
The Googlebot is quite well designed and should honour any instructions you put in it.
Take a look http://www.google.com/search?q=robots.txt
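In case it helps anyone, a minimal sketch of what such a file looks like and how a well-behaved spider reads it, using Python's standard `urllib.robotparser`. The directory names, the example.com URLs and the `allowed` helper are hypothetical; whether any particular bot honours the file is up to that bot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep every spider out of two directories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /calendar/
Disallow: /images/
"""


def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed(ROBOTS_TXT, "Googlebot", "http://example.com/calendar/1970/01"))
print(allowed(ROBOTS_TXT, "Googlebot", "http://example.com/index.html"))
```

The file has to sit at the root of the site (/robots.txt) for spiders to find it.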
Apparently my sites will never get a good ranking on google because I don't want the search engine to cache them, so I'm using meta no-archive tags. That's the only reason I can come up with for why the sites rank so poorly on google when they come up in the top 10-20 hits on yahoo and other search engines. The keywords for the searches are valid, the sites are relevant to the keyword searches, yet the sites don't show in the top 100-300 on google.
I've avoided all the usual spam type of tags (auto refreshing, hidden text, cloaking, etc.) and the sites are legitimate and on the up and up, and yet the only page or two that google is spidering are the one or few that appear to be without the no-archive tags and possibly the revisit/expire tags.
Is google's policy "allow us to cache your site, or get penalized"? Has anyone else run into a similar problem, or can anyone shed some light on this? The only other thing I can think of is the robots.txt file, which keeps googlebot, and other spiders through a *, out of the images directories. The spiders, including googlebot, aren't restricted from entering any other directories; they are given free rein.
Anyone else with problems with no-cache, no archive, tight revisit/expire times, or similar non-spam tags that result in penalties in google ranking?
I've been using google exclusively for a few years now. But the poor page ranking of sites on my server got me wondering about other sites that may be relevant to my own searches which may be excluded or penalized by google. So I've started using Yahoo search again, as much as I hate Yahoo (what they do with advertising to Yahoo groups and Yahoo mail is a shame). It appears that Yahoo is including better results because other sites show up with higher ranking that actually are relevant. So I've learned that Google isn't as perfect as I thought it was, which was disappointing in itself. It was easy using one search site. Now I have to use two to make sure I'm getting good results. Anyone know if there is a plugin for Firefox with both Google and Yahoo search boxes on the toolbar?
I don't quite believe that Google would've limited themselves that way (using 32 bit identifiers for documents) - that would've been incredibly short-sighted.
What was even more short sighted was their use of two digits to store the year value for the file dates. Something about the amount of space saved by not using those extra two bytes (four for unicode).
It isn't about having a better search engine, so much as it is knowing how to use it. If you are looking for information on a recipe for oriental rice using asian spice, how would you search?
Bad search example:
oriental rice recipe asian spice
Good search example:
recipe+"oriental rice"+spice
See the difference? google tries its best to get rid of the spam pages, but it won't ever combat them all. Half of the work has to be done with you understanding the best way to describe to the search engine, what it is you want to do. The better you explain it, the better it can search for you.
"We're breaking out the ramen noodles. . . "
"Really? Is it someone's birthday?"
You can rant all you want, but Google still has a fair use right to your images. They are reduced resolution images and therefore legal for non-commercial use.
Not to mention robots.txt, but that is so obvious it shouldn't need to be mentioned.
- These characters were randomly selected.
Google won't be within reach of the pinnacle until they index .txt files, directory listings, and anonymous ftp sites.
Well, if you know that Google is indexing your site and "stealing" your bandwidth, then you must have looked at the server logs, right? You'd see the name of the search bot is googlebot. Search for it, and you'll find that the first relevant link explains how to prevent googlebot from accessing your site.
/robots.txt. Similar info is gained from searching on that term as well.
The logs would probably also show failed attempts to find the file
I see the difference...
Search terms: oriental rice recipe asian spice
Search Results: Results 1 - 10 of about 254,000 for oriental rice recipe asian spice . (0.40 seconds)
Search Effectiveness: REASONABLE. good list of relevant items matched.
Search terms: recipe+"oriental rice"+spice
Search Results: Your search - recipe+"oriental rice"+spice - did not match any documents.
Search Effectiveness: UTTER SHITE
The user wants SIMPLICITY. If google cannot give decent results for simple search criteria, then people will go elsewhere.
It's the KISS principle in effect.
liqbase
The examples were only examples, nothing more, which is why I said "example". I'm quite sure most readers (that aren't out with a jackboot) will get the drift of what I am saying.
If a user wants simplicity, then they will get a simple search. If a user wants an advanced and refined search, then that requires advanced knowledge of google.
If people go elsewhere, oh well. Those who know how to use the search engine properly will still be here, educating those who do not know how to use it. Know why? Because eliminating all spam and fake pages from searches won't happen. It just won't, due to the time it would take to check each and every page for content, much less for content-defeating methods.
"We're breaking out the ramen noodles. . . "
"Really? Is it someone's birthday?"
You can customise your page to only have stories in your interests, and Google is one of the story types.
;-)
I'm moderating at the mo, and I'd have moderated you 'muppet', but I thought I'd be useful instead
J.
You're only jealous cos the little penguins are talking to me.
I am feeding this troll because there are people who really _do_ think like that and I wish I could yell at them to their faces
You put content in a place where it is publicly accessible. You explicitly and proactively made that content available to everyone, including 'the average surfer' and googlebots. You took no steps to make it available only to the select few of whom you approve.
Now you are all cross and bothered because average surfers / googlebots have read / copied your content, such as it is.
The solution is to drown yourself in a bucket. I have a bucket.
Whence? Hence. Whither? Thither.
I regularly watch where my nickname, full name, parents' names, etc. come up in google. I've noticed that in the past couple of months my hits have DRASTICALLY reduced. They just disappeared from the database. But over the past 2 days, I've gotten notifications (thanks google alerts) about new pages being indexed and voila! They come up in a search again.
This is why I've been begging google folks to implement NEAR operator!
Here is an example msn search: http://search.msn.com/results.aspx?FORM=SMCRT&q=f
Now I'm the grandest Tiger in the Jungle!
There is no (+) operator to use with Google. It is being used by other search engines, but not the way you wrote it.
Virus infects both Windows and Linux!
Erm, that's only because of the bizarre plus signs the grandparent poster put in - try this. Note to grandparent: Just about any modern search engine assumes words not prefixed by anything are to be included in the Boolean search query. No need for +.
== Jez ==
Do you miss Firefox? Try Pale Moon.
Well, it seems Microsoft has dropped to rank 5 in spiritual ranking, should I sell my stock?
I'm still trying to figure out what people mean by 'social skills' here.
It is not clear to me how I can help them improve. Suggest they switch their servers to Linux?
"Oh, if you just added several billion pages, were you giving me crap before? How many more billions of pages are you not indexing right now?"
Google's announcement merely gives its users reason to question the size and comprehensiveness of Google's index.
The problem is, you tell people to use quotes and pluses and cryptic search terms.
:)
When google cannot find anything, it comes up and tells them the opposite:
Tip: Try removing quotes from your search to get more results.
People don't need to know the quoting syntax, or the inclusion format rules, they just need to click the "Advanced Search".
When you make a comparison regarding how much better your way is than everybody else's, make sure your facts are clear. I agree it was a mistake, and I agree with your sentiment, but most users don't even know how to type a quote character.
liqbase
You can block Apple stories in your user preferences page.
#define struct union
"Google still has a fair use right to your images. They are reduced resolution images and therefor legal for non-commercial use."
FYI; nothing google does is "non commercial".
Even the stuff they let out "for free" serves to funnel their adverts to you, which is their source of revenue. i.e. it is a commercial activity.
Ergo; their use of other people's data (or data ABOUT other people's data, such as a thumbnail of someone else's copyright imagery) is in NO WAY non-commercial.
It seems to me that the larger and more dynamic web sites become, the less useful web crawlers will become. I suspect it will get to the point where site admins will have to regularly submit a keyword list to the search engines.
i don't know if it's news or not, but c|net news was reporting gmail now offers free pop access.
that's cool.
i have gmail invites for free that require no ipod or free lcd signup. i just have no one to give them to. everyone i *know* has one already. i have six.
and it's more fun to talk to someone than just to submit them to the gmailinvitecache.
you can't have everything, where would you put it?
Including dictionary.com it would seem...
I am happy because this is the first google update that has indexed some files on one of my websites that are going to be used for a program I wanted to write. I registered the domain and created the basic website 11 months ago and have been waiting ever since! So finally I will be able to get to work on it.
Every kid in China has been asked to make a webpage about their family as a school project.
Yes there is, try to search for
The Doctor
vs
+the doctor
Hmmm... actually:
http://www.google.com/help/refinesearch.html
There IS a + operator, and you are modded "informative"....
gus
.. if only.
Because every blogger in the universe has added at least 3 pages since the last index. I fail to see how it is significant to me that there are now 8 billion mostly worthless sites out there. The number of actually useful sites has not gone up considerably.
Just noticed this on the google pages, does anyone know where their web API came from:
http://news.google.com/apis/
oh, and why do they now own:
http://www.keyhole.com/
no then, a picture with each google local response?
Amazing... Just as fast as Google hit 8 Billion, Slashdot duped the story. Subscribers will see it two to three stories about this one.
www.fotoforay.com
Yeah, but are they still censoring stuff? Like pictures of american war crimes in Iraq (just try a search for abu graib and lyndie england, google still returns none of the pictures of her torturing "inmates"). Google appears to be very open to censorship by commercial and government interests. I'm afraid unless this changes I am going to have to stop using them.
As you are the foremost of several identical replies, somehow marked (+5 insightful) instead of (-1 redundant), I will answer you. But consider this a response for all of you who have replied.
Firstly, read my words. I am fully aware of the existence of robots.txt. The clue was where I said "Unless I adhere to their own arbitrary rules". In your rush to correct me you all seem to have mislaid your reading comprehension. Thanks should also go to the moderators who have yet again branded me a troll because 5 or 6 of you hot-headedly responded to the same mistaken point.
The real question here is why I, a UKian, should have to KNOW about robots.txt at all. Why should I have to find out about, or somehow just instinctively know, this arcane piece of information just to host my own website for my friends to visit? Why is this an opt-out list instead of an opt-in list? Why am I automatically expected to want these ridiculous bots crawling over my personal space, violating my privacy? This is an American company with dubious notions of personal data privacy and no clear data retention policy, indexing the contents of my site for all to see against my wishes, and basing its decision on its own ridiculous opt-out list. That is the scandal. Next time, perceive the beam instead of the mote etc. etc.
And when they make the original mistake, how easy do you think it is to make them cleanse the site from their archives, cache, etc? I can tell you, I hope you are never in that situation. In the end it took a legal threat for them to take notice of me.
Google are thieves. Just because they thieve from everyone, that their thievery is diluted a trillion times does not make it OK. They take our words, our information and our images and they use it to make money for themselves. And we do not see one red cent of it. It is paid to the rich backers.
I am aware that anonymity on the Internet is the motivating cause of many insults. But you should consider your words carefully, I was offended by being told to go and drown in a bucket. A family member drowned recently and it brought back to me the horror of a loss of life, especially in one so young. Please try to be more polite in future. I debated with myself to include this paragraph, as it would just open me up to more abuse from you, but I will give you the benefit of the doubt. We often say things in haste we repent slowly.
you can click on the google search bar and it will bring down a choice of search engines, including yahoo.
click me
The dupe has been dropped :)
2x google is enough for anyone.
Do the moderation points given in that article get returned?
liqbase
It seems that the dupe of this article http://slashdot.org/comments.pl?sid=129334 "Google Cranks Up Index" has been removed! Is this the first time that has happened on Slashdot?
Does this mean the end of the googlewhack? Or the beginning of a whole new googlewhacky world?
Quidquid latine dictum sit, altum videtur.
Google also started implementing
POP access for Gmail today (my account has it enabled already). There's no IMAP yet, and we know there were ways of doing this before, but it's an interesting direction for Google to take. As stated in the article, they don't intend to start charging for POP access or mail forwarding in the future. So how can Gmail's ad-based business model continue to be viable when its users can read their mail from external clients and via external addresses?
---- scrm
that's what you get for being an arrogant dick.
"if the index has doubled so has our supply of information. Information rules!" Not to be a spelling Nazi but you misspelled pr0n.
I laughed at the weak who considered themselves good because they lacked claws.
Which user? The same people who threatened to move to Canada if Bush was reelected? I know I want relevance.
Where will that user go? Yahoo? MSN? Or any other search engine that is no easier to search or to get relevant results from? Google is still the best, but people want more.
Google is not hard to master if you spend a few minutes to read their guide.
if you want up-to-date results, screw google and try Technorati, then you'll know who's talking about you...
:D
still, it seems that you are the only one talking about you!
- live from Costa Rica !
There is the need for the + sign if you want to force Google to include the word in the search when normally it would class it as an ignored word.
Useful if you want to find things that incorporate "noise words" in their names:
eg "+the guru" compared to "guru"
(film)
asian nurses spice
No fancy pluses or quotes needed. But I see we're looking for different things.
My porn resources just doubled!
"Double your pleasure... double your fun..."
How can anyone prove this?
Is there any way to spider their spider, to prove they have that many pages in an index?
Why isn't there a headline on Microsoft's search engine which directly competes with Google?
...
Seems you guys love biting the hand that feeds you
Updated figures for 67256 (+5526) respondents:
Has google made any progress in indexing the so-called "Dark Net"?
http://news.bbc.co.uk/1/hi/sci/tech/1721006.stm
...weird. I thought this was the "fanboy section".
Search for (abu ghraib), and find only pictures of a harsh wartime prison. None of the famous torture pictures which appear on the web, even though there are several showing Iraqis happily celebrating. Either their excuse that their index is just too old is now obviously bogus, or their image search should never have lost its "Beta" label. Either way, it's obviously dangerous to rely on Google, or any one Web filter, for any accuracy. A much more useful search system would include a multi-index client behind the browser's "Address" input widget. It would query multiple competing search indices (from among a user-defined list with popular defaults), returning collated results including the "messages" (ads) sent by the responding index. Accepting multiple different query formats, like Google, Yahoo, and others (and translating to query the respective indices), it could completely take over the search function, as long as it didn't play favorites with one engine over another (like the locked-in pages from these engines today). Mozilla plugin, anyone?
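A minimal sketch of the collating step such a multi-index client would need, assuming each engine has already returned a ranked list of URLs. The `collate` name and the example URLs are invented for illustration, and actually querying the engines and translating query formats is left out. Taking one result from each engine in turn is one simple way to avoid playing favorites.

```python
from itertools import zip_longest


def collate(result_lists):
    """Interleave ranked result lists round-robin, dropping repeated URLs.

    result_lists: one ranked list of URLs per search index.
    """
    seen = set()
    merged = []
    # zip_longest pads shorter lists with None so no engine's tail is lost.
    for same_rank in zip_longest(*result_lists):
        for url in same_rank:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


engine_a = ["http://example.com/1", "http://example.com/2", "http://example.com/3"]
engine_b = ["http://example.com/2", "http://example.com/4"]
print(collate([engine_a, engine_b]))
```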
--
make install -not war
...doesn't get the joke. ...also has a small penis, most likely.
Thumbnails are fair use
Google was great a few years ago, now with 8 billion pages search results consist mostly of irrelevant nonsense such as web logs, links to stupid pages from lousy personal web sites like Angelfire and Geocities, etc.
For some reason Google ranks these junk pages higher than the primary source of information users are searching for. So much for Google's superior search algorithms, they are next to useless now that they have this much data to search.
I rarely use Google anymore because I waste too much time filtering through crap results!
...a search for Phil John now yields two results before my slashdot user page. :o(
I am NaN
I beg your pardon? You didn't ever follow any basic programming courses, did you? What you're saying is nonsense.
Case-sensitive searching is just EXACT comparison of text strings, if you compare:
"Joe JingleheiMerScHmIdT" with
"Joe JinGleheiMerScHmIdT"
there's no match, because the "G" doesn't match "g". This kind of searching is easy, simple & fast. Case-INsensitive comparison just means filtering the strings through "make all uppercase" or "make all lowercase" (or other filters) before doing the comparison. This is EXTRA work, but for most applications, insignificant (fast, simple & easy as well).
A long while back our CRM application was consistently getting hung on queries that involved customer first/last name combinations because it WAS capitalization sensitive.
You're confusing the programming technique itself with a badly coded implementation (your CRM app).
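To make the point concrete, a small sketch in Python (the function names are made up for the example). The exact comparison is a plain equality test; the case-insensitive one just normalizes both strings first, which is the "extra work" described above.

```python
def equal_exact(a, b):
    """Case-sensitive comparison: a straight character-by-character test."""
    return a == b


def equal_ignoring_case(a, b):
    """Case-insensitive comparison: normalize both sides, then compare."""
    return a.casefold() == b.casefold()


first = "Joe JingleheiMerScHmIdT"
second = "Joe JinGleheiMerScHmIdT"

print(equal_exact(first, second))          # False: "g" does not match "G"
print(equal_ignoring_case(first, second))  # True once case is normalized
```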
For quite some time now, searching for extremely common words (i.e. "the" by itself) would turn up a page count in the area of ~5,400,000,000. The number on Google's front page seems to update less frequently than the actual number indexed.
Still, I suppose it isn't unreasonable for most people to go by the number on the front page.
I can't find those adverts you are talking about, perhaps you are talking of some other Google image search?
- These characters were randomly selected.
What is more interesting to me is why search engines don't work the other way around. Why do we need a robots.txt file to tell the robots to sod off, when anyone can come up with a new spider overnight?
I think it would be more appropriate to have the robots.txt file contain invitations, so that the spiders would always check first and only crawl the site if they are welcome.
You can't handle the truth.
I wonder if they are going to take some action on blogs. They seem to be skewing the results a bit. Google bombing is still popular; I would think that google would do something to clean up the problem, and give less weight to the link text when the link comes from a blog.
I have found it annoying that sometimes I will be searching for something and my own website is a hit. I admit I am not surprised, because I put things in the blog that interest me and then search for the same things.
the + in front of a word means it MUST be included.
without it all words are optional.
just an FYI, an - in front of a word means it MUST NOT be included.
the + sign is used to FORCE a word to HAVE to appear in the results.
searching for
military victories +french
will only bring back results that MUST HAVE french in them, making the results MORE RELEVANT
military victories -french
does the opposite, it says I only want military victories that DON'T have the word french in them!
god, people can't even read some simple documentation.
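A toy model of the semantics described above, just to pin down what + and - are being claimed to do. This is not Google's actual matching logic (other posts in this thread dispute whether unprefixed words are optional); `parse_query` and `matches` are made-up names and the documents are invented.

```python
def parse_query(query):
    """Split a query into required (+word), excluded (-word) and plain terms."""
    required, excluded, plain = set(), set(), set()
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.add(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.add(token[1:])
        else:
            plain.add(token)
    return required, excluded, plain


def matches(query, document):
    """True if document satisfies the query under the toy rules above."""
    required, excluded, plain = parse_query(query)
    words = set(document.lower().split())
    if not required <= words:   # every +word MUST appear
        return False
    if excluded & words:        # no -word may appear
        return False
    # Plain words are treated as optional here: at least one must appear.
    return not plain or bool(plain & words)


doc_a = "famous french military victories"
doc_b = "roman military victories"
print(matches("military victories +french", doc_a), matches("military victories +french", doc_b))
print(matches("military victories -french", doc_a), matches("military victories -french", doc_b))
```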
Looks like Google doesn't even index their own blog hosting site. The title of my blogspot blog is nowhere to be found in a Google search, and that blog is over 6 months old.
I think you are the confused one.
Great-Great Grandparent said Google uses an index.
Great Grandparent said that the index could be used to prefilter results.
Grandparent posted a problem with this - admittedly not as clearly as he could have.
You posted some crap which displays that you don't know what you are talking about at all, and insulting someone who is raising a valid objection.
The magic of Google is that using a very clever index, it can find relevant results amazingly quickly - as do all search engines. The whole point is that they don't search everything.
While prefiltering could in theory help, searching for exact matches is still far more expensive than is realistic. The reason for this is not the very specific searches proposed above, but rather people wanting to search for '#Windows' (with a capital W).
I'm sure you understand that searching for such a term in all documents which match the (case-insensitive, indexed) term 'windows' would be prohibitively expensive even if only a million such queries were put to Google each day.
The only sensible way in which such a thing could be achieved is if Google 'randomly' selected some searches to 'improve'. No keyword or special symbol, just IF there is CPU time and for a suitable (case sensitive) search term, do the post-processing. Having said that, it would be better to recognize 'common' case sensitive keywords, such as pH, PhD, EFNET, or whatever, and simply use those as separate keywords in the index.
What is more interesting to me is why search engines don't work the other way around. Why do we need a robots.txt file to tell the robots to sod off, when anyone can come up with a new spider overnight?
I think it would be more appropriate to have the robots.txt file contain invitations, so that the spiders would always check first and only crawl the site if they are welcome.
That is what they do. If you don't want to let any spiders crawl your site, you can say:
User-agent: *
Disallow: /
If you want something fancier, you could have robots.txt served up by a program. But the features of robots.txt suffice for most.
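For anyone who wants to check this, a sketch using Python's standard `urllib.robotparser`. Under the usual robots.txt convention, "Disallow: /" shuts a spider out of everything while an empty "Disallow:" line permits everything, so an "invitation list" can be approximated by blocking * and leaving an empty Disallow for the bots you welcome. The `can_crawl` helper, the "SomeNewSpider" name and the URL are made up, and this only describes spiders that choose to obey the file.

```python
from urllib.robotparser import RobotFileParser

BLOCK_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"

# Approximation of an opt-in list: only Googlebot is invited.
INVITE_ONLY = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def can_crawl(robots_txt, user_agent, url="http://example.com/page.html"):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for name, text in [("block all", BLOCK_ALL), ("allow all", ALLOW_ALL), ("invite only", INVITE_ONLY)]:
    print(name, can_crawl(text, "Googlebot"), can_crawl(text, "SomeNewSpider"))
```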
Ironically, the word ironically is often used incorrectly.
Uh...
Google, the foremost search engine, makes a pretty serious change (doubling their index size while suddenly announcing a new dedication to increasing their index) on the exact same day that an intended major competitor of theirs (MSN search, which slashdot already had a story on) launches...
I'd say that's pretty significant. Since you apparently don't want any Apple or Google news, why not just disable those stories...?
Irritable, left-wing and possibly humorous bumper stickers and t-shirts
"The documents in Google's index are in dozens of file types from HTML to PDF, including PowerPoint, Flash, PostScript and JavaScript."
I'm sure people write a lot of great information in JavaScript. Is this a sign of Google going down hill?
How to get rich:
1) Create expensive, hyped-up IPO for search company
2) Announce you're getting double the pages since you now index JavaScript
3) Watch as share prices double
4) ???
5) Enjoy yourself on out-of-the-way tropical island, with your GOOG shares sold just before the crash.
Seems like lots of M$ and Google services are being enhanced today. I welcome the new storage.
I guess they finally got around to changing their page indexing scheme from a 32-bit unsigned integer. The number sat at 2^32 - epsilon for what seemed like an eternity! I expect the sudden doubling of pages is simply the backlog that built up while they waited for the conversion to 64-bit, or whatever.
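The arithmetic behind that guess, for anyone who wants to check it. The 4,285,199,774 figure is the page count quoted from the BBC elsewhere in this discussion; whether Google really used a 32-bit identifier is speculation, and the variable names are just for illustration.

```python
UINT32_LIMIT = 2**32          # number of distinct 32-bit unsigned values
reported_pages = 4_285_199_774  # count quoted before the jump to 8 billion

headroom = UINT32_LIMIT - reported_pages


def fits_in_uint32(n):
    """True if n can be stored in a 32-bit unsigned integer."""
    return 0 <= n < UINT32_LIMIT


print(UINT32_LIMIT)                    # 4294967296
print(headroom)                        # 9767522 ids left under the limit
print(fits_in_uint32(8_000_000_000))   # False: 8 billion needs more bits
```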
Music speeds up when you yawn, but does not change pitch.
I just did a couple of comparative searches on google and the new http://beta.search.msn.com/ and it is the first time lately when I saw another search engine returning more results and faster than google. At least for some keywords.
Try it yourself.
quite wrong. *BZZZRT* thanks for playing, take your FUD with you on the way out.
No holds barred, find anything.
... you've been warned.
One of the best sites on the web (totally private, no ads).
http://www.searchlore.org/
Knock politely, ask for Fravia.
http://fravia.2113.ch/phplab/mbs.php3/mb001
Win-Linux centric.
P.S. Some friendly advice, don't piss off the natives
http://www.searchlore.org/tools.htm
~hylas
No confusion here, but misunderstanding perhaps. For clarification, let me summarise:
jez9999 writes he/she would like to 1) "search the web" for exact match '#windows EFNET'. Obviously a massive amount of work, impossible for quick search queries.
Google uses an index, which is updated/refreshed every so many weeks, and only contains a very limited/filtered subset of "the web". Logical, this is the whole point of using an index.
PsychoSlashDot writes that Google's index works in a way that doesn't allow search 1)
Erasmus Darwin proposes to do a traditional search, and then use 2) the retrieved subset of Google's index to do full-text search 1) on. I think it's very important to make a distinction here between 2) this subset of Google's index, and 3) the actual web content that this subset of Google's index refers to. 2) would be quickly accessible to Google, although using it differently could require major changes in Google's hardware/software infrastructure. To do full-text search on 3), you'd have to actually retrieve/process the web content itself, which could become a huge task quickly unless the search involves only a very small number of results.
I think we can agree that 3) can be regarded very time-consuming, but that 2) may be possible, or not (ask Google).
cavemanf16's comment may have been meant to point this out (valid point), but what cavemanf16 actually wrote (CRM app stuff), says that full-text search becomes way more expensive if you include case-sensitivity. That is plain nonsense.
Searching the subset of web content referred to by Google's index (3) may be too much work for Google, but maybe adding case-sensitive or punctuation search within the subset of Google's index (2) IS do-able. Again, only Google knows.
They must have crawled into Slashdot's -1 threshold cache.
Chika Chik-ah... do-e ow ow.
Whoa - wait a minute...
If that is true - if Google's system has had a design flaw limiting it to 4.3 billion pages until now - then that is a really huge weakness, risk, and vulnerability that the company has had until now.
Thinking back, they must have known about this for a long time - before they went public. If that's the case, did they disclose this weakness/risk to investors in their S-1? If not, did they break the law by not doing so?
The big issue with Google is that their page count has been stuck at around 4 billion for a few years now. Which, as covered elsewhere, indicated that someone was using 32bit unsigned integers somewhere...
Nice to see that they've finally patched everything up and can now index beyond 4 billion pages.
I thought that was "Google enlarged"
Seems like what is needed is a client-side app to take the first X number of Google results and do a case-sensitive search on the client computer, instead of on Google's servers. Sure, it'd take a while to download the first X number of results, but it would work if you really needed a more specific search.
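Roughly what the filtering half of such an app could look like, assuming the first X results have already been downloaded as (url, page text) pairs. The downloading itself is left out, and `case_sensitive_filter`, `first_x` and the sample pages are invented for this sketch.

```python
from itertools import islice


def case_sensitive_filter(results, phrase, first_x=50):
    """Keep the URLs whose downloaded text contains phrase exactly as typed.

    results: iterable of (url, text) pairs in the order the engine ranked them.
    Only the first first_x results are examined.
    """
    return [url for url, text in islice(results, first_x) if phrase in text]


# Hypothetical downloaded pages.
pages = [
    ("http://example.com/a", "Join us in #Windows on EFNET tonight"),
    ("http://example.com/b", "how to clean your windows"),
    ("http://example.com/c", "Windows tips, no channel mentioned"),
]
print(case_sensitive_filter(pages, "#Windows"))
```

The server still does the cheap case-insensitive lookup; the expensive exact match only runs on the handful of pages the client fetched.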
"Those who consume the bulk of goods are those who make them. We must never forget this secret of our prosperity."
2.7 billion from dealtime.net, 5 billion from slickdeals.net, and 250 million from blogs, and 50 million from real web sites?
Guess 50 million ain't bad.
A search engine now indexes more web pages than there are members of the human race.
It's not because the pluses are "bizarre"; it's because, instead of recipe+"oriental rice"+spice, the great grandparent should have put some spaces before the pluses: recipe +"oriental rice" +spice
I spelled it write you ass. You should learn how to spell (or how to use a dictionarie sight for that matter)
Surely you mean you spelt it right. And surely you mean dictionary.
If you are so fond of the dictionary, try using it occasionally to ensure that you not only have the right spelling, but the right homophone (or is that the write homophone?)
This sig is intentionally blank
I should've included that info in the top post. The particular site in question is linked by other sites fairly well, and some of the linking sites are highly ranked directories that place my site as a premier or feature site. And if a competitor complained about the site, and google looked at it for problems, they should've left it alone because the keyword results leading to the site are highly relevant and no one is led to the site through misleading keywords or other methods.
The number and quality of backlinks are the first things I checked and double checked to make sure everything stayed ok with the site. It isn't easy to fake or mislead on this, which is one of the reasons google uses this method, and one of the reasons why I would be expecting much better results explicitly because of this. But I highly doubt that google is alone in using this method for search results, since this has been talked about for more than a year. I'm sure that Yahoo is using this as one of their algorithms as well, which would explain the excellent ranking of the site in question on their search engine.
Something else is going on with google. And that is what got me wondering about search results when I use the search engine myself.