GoogleGuy · Slashdot Mirror

This story has been refuted on Google's Silent Monopoly · 2006-12-08 04:47 · Score: 1

Matt Cutts has debunked this story, and Google's AdWords team has also posted to their blog to debunk this. I think it's funny that people beat up on Google for buying ads, when Yahoo just takes the screen real estate for free. Try a search for [online advertising] on Yahoo. They hard-code a shortcut to their own products.

Google emailed this site on Google De-indexes Talk.Origins, Won't Say Why UPDATED · 2006-12-03 22:44 · Score: 5, Informative

If you dig deeper, it turns out that Google emailed talkorigins.org to alert the site that it had been hacked and was stuffed with rape and animal porn spam. Google's head of webspam has posted a full write-up.

Re:My humble advise to Yahoo! and Google on Yahoo Rejects Microsoft Search Offer · 2006-05-14 04:24 · Score: 1

Fair points. In my experience at Google, we try to crawl at least a little bit from any site that might prove useful, but PageRank is also a large factor in crawling, so that helps to avoid infinite spaces.

I do think you make an important distinction between crawling and indexing though, because they don't have to be identical. Anyway, if you did the study--nice job. We all enjoyed reading it at Google. :)
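The crawling trade-off described above (sample a little from every site, but let an importance score such as PageRank decide where most of the budget goes) can be sketched as a crawl planner. This is a minimal illustration, not Google's crawler. The function name `plan_crawl`, the `(url, site, score)` inputs, `budget`, and `min_per_site` are all hypothetical.

```python
import heapq
from collections import defaultdict


def plan_crawl(urls, budget, min_per_site=1):
    """Pick up to `budget` URLs to fetch (hypothetical planner).

    urls: iterable of (url, site, score) tuples, where `score` stands in
    for a PageRank-like importance value.
    Each site first gets up to `min_per_site` of its best URLs, so even
    low-scoring sites are sampled. The rest of the budget goes to the
    highest-scoring URLs overall, so a huge low-value subtree (an
    "infinite space") cannot consume the whole budget.
    """
    by_site = defaultdict(list)
    for url, site, score in urls:
        by_site[site].append((score, url))

    guaranteed, rest = [], []
    for entries in by_site.values():
        entries.sort(reverse=True)  # best-scoring URLs of this site first
        guaranteed.extend(entries[:min_per_site])
        rest.extend(entries[min_per_site:])

    # Guaranteed picks first (best first, in case there are more sites
    # than budget), then fill what is left with the top leftovers.
    guaranteed.sort(reverse=True)
    picked = guaranteed[:budget]
    picked += heapq.nlargest(budget - len(picked), rest)
    return [url for _, url in picked]
```

With a budget of 3, one ordinary site, and a "tree" site with a thousand near-empty node pages, the planner takes one page from each site and spends the last slot on the ordinary site's next-best page, not on the tree.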

Re:My humble advise to Yahoo! and Google on Yahoo Rejects Microsoft Search Offer · 2006-05-13 15:26 · Score: 1

I really enjoyed the crawl analysis on drunkmenworkhere.org (and lots of Googlers enjoyed reading it), but I wouldn't necessarily agree that Slurp is the deepest crawler in the general case. In the case of a (mostly) empty website with a huge binary tree under it, quite a few of my colleagues would argue that it's good to have some pages from the tree, but that you don't want to have too many. Most of the pages were pretty empty other than the spambot comments, so you could argue that it might be better to crawl fewer of those low-content pages.

Look at it from another perspective: here's another infinite spider trap:
http://infinitetree.com/big-tree/node/01011000110/
A good index selection algorithm would probably notice the near-duplicate nature of many of these pages, and select only a sampling for indexing. Crawling too many of those low-content pages would be a bad idea.
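One common way to notice near-duplicates, sketched here as an assumption and not as Google's actual index selection, is to compare sets of word shingles (runs of k consecutive words) with Jaccard similarity and keep a page only if it is not too similar to a page already kept. The names `shingles`, `jaccard`, `select_for_index`, and the values `k=3` and `threshold=0.7` are hypothetical; production systems usually approximate this with MinHash or simhash, since the pairwise comparison below is quadratic.

```python
import re


def shingles(text, k=3):
    """Set of k-word shingles from lowercased text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a, b):
    """|A intersect B| / |A union B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def select_for_index(pages, threshold=0.7, k=3):
    """pages: list of (url, text). Keep a page only if its similarity to
    every page kept so far is below `threshold`."""
    kept, kept_shingles = [], []
    for url, text in pages:
        s = shingles(text, k)
        if all(jaccard(s, other) < threshold for other in kept_shingles):
            kept.append(url)
            kept_shingles.append(s)
    return kept
```

Fed 50 tree pages that differ only in their node id, plus one unrelated article, this keeps the first tree page and the article and drops the other 49 tree pages as near-duplicates.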

Re:Censorship not just in China on Google's China Problem · 2006-04-22 19:48 · Score: 1

The original story isn't true. A url went live and Google indexed it 3-4 days later. The people who wrote about Charlie Sheen and 9/11 assumed that because it didn't appear faster, Google was somehow making a statement about Charlie Sheen and the 9/11 story. Nope. Google didn't do anything special (good or bad) for that url. Sometimes it takes time for a search engine to crawl/index/return pages.

I haven't checked out the newest claim, but given the typos on the page you point to, such as "apparant" and "its" (needs an apostrophe) and "supressing" I'm inclined to be skeptical (again). For example, news stories exit the Google News index after a few weeks as part of normal operations. Also, just because prisonplanet.com is a news source doesn't mean that infowars.com is a news source. And even if prisonplanet.com is a news source, that doesn't mean that Google knows how to index articles on every section of prisonplanet.com. It's unclear exactly what that url is claiming, but I would first look for non-tinfoil-hat explanations, such as the three that I just mentioned.

Any examples? on Google to be Added to S&P 500 Index · 2006-03-25 08:45 · Score: 1

I posted elsewhere on this thread, but if you want to give a couple of examples of searches that didn't work well, I'll ask someone to check them out.

Examples? on Google to be Added to S&P 500 Index · 2006-03-25 08:42 · Score: 1

If you want to post some specific examples of poor search results, I'd be happy to pass them on for someone to check out.

Re:We shopped for an SEO on NewsWeek Looks at Search Engine Optimization · 2005-12-12 10:26 · Score: 1

Shane, you need to put this up on a blog somewhere; that's pretty funny. The whole notion of sending a no-name proposal and a big-name proposal is really wild.

Re:Trust Yahoo? on Is Yahoo Actively Supporting Adware? · 2005-09-20 13:37 · Score: 1

If you want to get to the main version of Google and skip the localized version, here's a page that describes how to do it without allowing cookies.
http://www.tech-recipes.com/google_tips769.html
Hope that helps; if you want to search without a cookie, that's fine with us. :)
GoogleGuy

Re:I'm not feeling sorry on Google Blacklists CNet Reporters · 2005-08-05 14:13 · Score: 2, Informative

"Nobody has proved what little factual information they [CNET] conveyed was false."

Not so--CNET admits it themselves. If you read the article in question, CNET did correct false information from their original article. CNET added this disclaimer after they wrote the article:

"Correction: The original article incorrectly implied that Google Desktop Search can track what's stored on a user's PC. The service does not expose a user's content to Google or anyone else without the user's explicit permission."

I can see both sides of this issue; I just wanted to point out that CNET did imply incorrect things in their privacy article; points to them for adding the correction afterwards.

Re:Source code? on 'DVD Jon' Breaks Google Video Lock · 2005-06-29 03:52 · Score: 1

Nice. Very nice. :)

So if I understand the Yahoo News story, you can go to http://code.google.com/vlc-diff.txt, around line 389, and comment that out. The patch presumably just makes the change in the executable to comment this out, yah?

Re:Spreading the goodness too thin on Google Scholar: Not Ready for Prime Time? · 2005-06-15 14:26 · Score: 1

We definitely do read Slashdot. I'm sure the Google Scholar folks will read the original article, plus the comments here.

Re:What I would like to see on Looking for Answers in the Age of Search · 2005-06-13 14:53 · Score: 1

I checked into this. It turns out that the product name is 5062AF, and the page you wanted only has "5062AF" on the page--not 5062 by itself surrounded by whitespace. If you do the query [concord 5062af] then the page that you wanted shows up at #2, after a PC Magazine review, which would be a pretty solid result too.

It's interesting to think about indexing 5062af as 5062 as well, but some searches would probably become less precise because we added in more general matches.
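The precision trade-off above can be illustrated with a toy inverted index. This is a hypothetical sketch, not how Google tokenizes: `tokens`, `build_index`, `search`, the `split_model_numbers` flag, and the sample documents are all made up. With the flag on, a token like "5062af" is also indexed under its leading digits "5062", so a query for [concord 5062] starts matching pages about the 5062AF as well as pages about some unrelated product literally called 5062.

```python
import re
from collections import defaultdict


def tokens(text, split_model_numbers=False):
    """Lowercase alphanumeric tokens; optionally also emit the digit
    prefix of tokens like '5062af' (hypothetical expansion rule)."""
    out = []
    for tok in re.findall(r"[a-z0-9]+", text.lower()):
        out.append(tok)
        if split_model_numbers:
            m = re.fullmatch(r"(\d+)([a-z]+)", tok)
            if m:
                out.append(m.group(1))  # also index "5062af" as "5062"
    return out


def build_index(docs, split_model_numbers=False):
    """Map each token to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for tok in tokens(text, split_model_numbers):
            index[tok].add(doc_id)
    return index


def search(index, query):
    """Conjunctive (AND) match over all query terms."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for t in terms[1:]:
        result &= index.get(t, set())
    return result
```

With documents such as "Concord 5062AF digital camera" and "Concord 5062 model kit instructions", the strict index answers [concord 5062] with only the kit page, while the expanded index returns both, which helps the original searcher but is less precise for someone who really meant the 5062.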

Re:What does Creative Commons mean here? on Google Launches Google Sitemaps · 2005-06-03 05:18 · Score: 1

We're also offering a python2.2 program that will run on your computer and generate a Sitemap for you. I think that's what has the Creative Commons license. Google wants Sitemaps to be open/available to anyone that's interested in creating or using them (including other search engines, if they're interested).
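For a sense of what such a generator produces, here is a minimal Python 3 sketch that writes a Sitemap XML file from a list of URLs. It is not Google's python2.2 script. It assumes the sitemaps.org 0.9 namespace (the 2005 tool may have emitted a different schema URL), and the function name `build_sitemap` and the dict-based input are hypothetical; `xml.etree.ElementTree` takes care of escaping characters like `&` in URLs.

```python
import xml.etree.ElementTree as ET

# Assumed namespace: the sitemaps.org 0.9 schema.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """entries: iterable of dicts with a required 'loc' key and optional
    'lastmod', 'changefreq', and 'priority' keys."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    body = ET.tostring(urlset, encoding="unicode")  # escapes &, <, >
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

A site owner would typically write the returned string to a `sitemap.xml` file at the site root and submit that URL.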

Re:502 Server Error! on Google Launches Google Sitemaps · 2005-06-03 05:07 · Score: 1

I think the Sitemaps pages are served from a "normal" webserver, as opposed to our custom setup. Plus the Sitemaps stuff is using https. It also looks like a bigger wave of interest than a typical Slashdotting. I alerted the Sitemaps team, but you may have to wait for the techie stampede to subside. :)

Re:what's the basis of the license? on Google Launches Google Sitemaps · 2005-06-03 05:04 · Score: 1

There's python2.2 code to generate Sitemaps for people. I believe that's what was released under Creative Commons. The intent is to make this open and widely available to anyone that wants to use it.

Re:Google now advertising with Aurora on Google's Secret Lab · 2005-06-01 07:29 · Score: 1

Thanks for mentioning this. I forwarded your url within Google, and someone is investigating. They're checking for agencies or spots that we're using for the Google Desktop Search.

Thanks again--I appreciate you noticing this.
GoogleGuy

Secret Labs on Google's Secret Lab · 2005-06-01 06:53 · Score: 1

Huh. I guess the sharks with frickin' lasers on their heads must have let their guards down.

Seriously, every search engine does evaluation on their results. It's a good way to test that relevance is high, especially in different languages and locations. The fact that Google does lots of testing and evaluation of our results in tons of different ways shouldn't be a surprise. That's part of the 70-30 breakdown where ~70% of our effort is on the core areas of search and advertising, but we usually don't talk about that 70% work to improve our results or validate their quality. So keep it quiet; I hear some other search engines read Slashdot too. ;)
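As one illustration of evaluating result quality, a widely used metric for graded human relevance ratings is NDCG (normalized discounted cumulative gain). This sketch is an assumption about the general technique, not a description of Google's internal evaluation; the ratings, the linear gain (some variants use 2^rating - 1), and the function names are hypothetical.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: rank 1 is divided by log2(2) = 1,
    rank 2 by log2(3), and so on."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(ranked, ratings, k):
    """ranked: result ids in ranked order; ratings: id -> graded
    relevance from human raters (unrated ids count as 0)."""
    gains = [ratings.get(doc, 0) for doc in ranked[:k]]
    ideal = sorted(ratings.values(), reverse=True)[:k]
    best = dcg(ideal)
    return dcg(gains) / best if best > 0 else 0.0
```

Computing this per query, and per language or location, lets you compare two rankings of the same results: the ordering that puts the best-rated pages first scores 1.0, and any worse ordering scores lower.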

Re:Redirection loop on Google's New Personalized Homepage · 2005-05-19 18:36 · Score: 1

I'll pass on the feedback--thanks for mentioning it.

20% time works pretty well in my opinion. on Software Development Practices At Google · 2005-03-25 10:58 · Score: 2, Insightful

The 20% projects work well in my experience. Sometimes you have to take the initiative to make sure you take that time, but you usually end up doing fun, search-y type stuff. And you end up meeting other people from different parts of Google, and getting familiar with new/different bits of the Google code base. It's also a good way to break out of a rut and make sure that you think about "bigger picture" issues. If you end up crunching on an important project, you can also bank that 20% time and use it up later.

Re:Actually an example has been posted on Millions of Pages Google Hijacked using ODP Feed · 2005-03-23 13:12 · Score: 2, Insightful

claus, I'm glad that you mentioned this search. I looked through those 100 results. Every example that I saw in those results was from a while ago--they were all listed with the Supplemental Result tag. So this is already handled correctly in our main index, and as urls are updated in the supplemental index, those examples should be handled correctly as well.

Thanks for mentioning this search; it's a good point. We've already made some changes to improve our heuristics, and you can see that improvement in the fact that current urls look better than the supplemental urls.

Re:clsc.net seems to be down... on Millions of Pages Google Hijacked using ODP Feed · 2005-03-23 10:18 · Score: 1

allinurl:foo.com says "show me all the results you know of that have foo.com in the url." And since com is a stopword, I wouldn't be surprised if this really just said "show me all the results with foo in the url"--that is, without the .com. You could force the .com to also match by using allinurl:foo-com to make it a phrase match, I believe.

So bar.com/dir1/dir2/foo.com would be a valid result for that search, for example. But that doesn't mean that we've confounded bar.com with foo.com. bar.com may do a 302 to foo.com or it may not, but it's not a hijacking. We're just showing all the results we know of with "foo.com" in the url; the fact that some of those results are not on foo.com isn't really a problem. Now if you did site:foo.com and saw results from bar.com, that's something worth emailing to us.
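The difference between the two operators described above can be modeled roughly: allinurl: checks whether the query's terms appear anywhere in the url (with stopwords like "com" dropped), while site: checks the url's host. This is a simplified, hypothetical model, not Google's implementation; the stopword list beyond "com" and the function names are assumptions, and phrase matching (the foo-com trick) is not modeled.

```python
import re
from urllib.parse import urlsplit

# Hypothetical stopword list; the comment above only says "com" is one.
STOPWORDS = {"com", "www", "http", "https"}


def url_terms(text):
    """Alphanumeric terms from a url or query, minus stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def allinurl_match(url, query):
    """True if every non-stopword query term appears somewhere in the url."""
    terms = set(url_terms(url))
    return all(t in terms for t in url_terms(query))


def site_match(url, domain):
    """True if the url's host is `domain` or a subdomain of it."""
    host = urlsplit(url).hostname or ""
    return host == domain or host.endswith("." + domain)
```

Under this model, bar.com/dir1/dir2/foo.com matches [allinurl:foo.com] (and so does foo.net, since "com" is dropped), but it does not match [site:foo.com], because its host is bar.com.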

Re:OK, an example on Millions of Pages Google Hijacked using ODP Feed · 2005-03-23 10:10 · Score: 1

Different folks often hit different data centers because of load balancing and stuff like that. I'll certainly keep an eye on this search myself too though.
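To see why two people can get different results for the same search, imagine a hypothetical setup where each client is sent to a data center by hashing its IP address, and the data centers are serving different snapshots of the index (for example, mid-rollout). The data-center names, index labels, and hashing scheme below are all made up for illustration; real load balancing also weighs geography and load, and this only shows the "different users land in different places" effect.

```python
import hashlib

# Hypothetical data centers, each serving its own index snapshot.
DATA_CENTERS = {"dc-a": "index-2005-03-20", "dc-b": "index-2005-03-23"}


def pick_data_center(client_ip, centers=DATA_CENTERS):
    """Deterministically map a client to one data center by hashing its IP."""
    names = sorted(centers)
    digest = hashlib.sha256(client_ip.encode()).digest()
    return names[int.from_bytes(digest[:4], "big") % len(names)]
```

The same client always lands on the same data center, but a different client may land on the other one and see results from a different index snapshot.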

Re:THANK YOU! on Millions of Pages Google Hijacked using ODP Feed · 2005-03-23 09:56 · Score: 2, Informative

You bet. If you want to make sure that we have the info to check it out, go to google.com/support, and when you get to a form where you can enter info, just use canonicalpage as the subject line. We are collecting the data sent to user support to build up a test set for checking any changes we want to try.

Re:clsc.net seems to be down... on Millions of Pages Google Hijacked using ODP Feed · 2005-03-23 09:05 · Score: 2, Informative

It's me. I've had the GoogleGuy handle since Jan 19th, 2005. As for the K5 article, the allinurl: stuff isn't true; allinurl: just looks for terms in the url. So [allinurl:imatix.com] can show results from any site that has imatix in the url.

User: GoogleGuy

Comments · 50