Slashdot Mirror


User: GoogleGuy

GoogleGuy's activity in the archive.

Stories: 0 · Comments: 50

Comments · 50

  1. Re:Ugh. This is so not true. on Millions of Pages Google Hijacked using ODP Feed · · Score: 1, Offtopic

    Hi bigbloggingbuggar, I believe RTFA is a related term to RTFM. I prefer to think of it as "read the fine article". ;) I try to be polite at most of the places that I post (WebmasterWorld, Danny Sullivan's Search Engine Watch forums, etc.), but RTFA is a well-established piece of lingo at Slashdot, used to encourage people to review the basic facts presented in the submitted story. As part of blending in, also look for me to slip in references to Cowboy Neal, Soviet Russia, Natalie Portman, grits, etc. etc. I've been reading Slashdot for years, but I've only had the GoogleGuy handle on Slashdot since Jan 19th.

  2. Re:Ugh. This is so not true. on Millions of Pages Google Hijacked using ODP Feed · · Score: 4, Informative

    One example is http://www.doi.org/, because people want to have a persistent url like dx.doi.org/10.1226/1588290972, but then be able to have that url do a 302 redirect to a destination page like http://doi.contentdirections.com/mr/humana.jsp?doi=10.1226/1588290972. The destination urls might change, so it's handy to have a persistent digital object identifier on doi.org.
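
    As a rough illustration of the persistent-url idea, here is a minimal sketch in Python. The lookup table and the resolve function are hypothetical names made up for this example, and it does not describe how doi.org is actually implemented. It only shows the shape of the technique: the persistent identifier stays fixed, and the resolver answers with a 302 pointing at whatever the current destination is.

```python
# Hypothetical lookup table: persistent identifier -> current destination.
# The single entry mirrors the example in the comment above; a real resolver
# would keep this mapping in a database that publishers can update.
DESTINATIONS = {
    "10.1226/1588290972": "http://doi.contentdirections.com/mr/humana.jsp?doi=10.1226/1588290972",
}


def resolve(path):
    """Return (status, headers) for a request path such as '/10.1226/1588290972'."""
    identifier = path.lstrip("/")
    target = DESTINATIONS.get(identifier)
    if target is None:
        return 404, {}
    # 302 is a temporary redirect: clients should keep using the persistent url,
    # because the destination may change later.
    return 302, {"Location": target}
```

    If the destination page moves, only the table entry changes; every link to the persistent url keeps working.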

  3. Re:OK, an example on Millions of Pages Google Hijacked using ODP Feed · · Score: 3, Informative

    Just to follow up, I saw your email come through the queue from user support. The engineer who checked it out basically said "They appear at the top of the results when I do a search. Still, just because their website only has that one phrase on it doesn't guarantee that their site will appear at the top of the results." So this isn't a "302 hijacking," but I hope our user support will reply in addition to my post. :)

  4. Re:OK, an example on Millions of Pages Google Hijacked using ODP Feed · · Score: 4, Informative

    Thanks for the concrete example. As someone else pointed out:
    - for the search imatix I see you at number one.
    - for the search "Strategic solutions for a complex world" I see you at number one.
    - for the search allinurl:imatix.com, that operator (and its sister operator inurl:) only looks for the words in the url. So it's perfectly fine to show results like "real-imatix.com/" because they contain the word imatix. These results are not hijacking results--this is expected behavior for inurl and allinurl.

    Hope this helps,
    GoogleGuy

  5. Re:My site is affected on Millions of Pages Google Hijacked using ODP Feed · · Score: 4, Informative

    Yeah, this is a common misconception. allinurl: and its sister operator inurl: look for terms matching in the url. For a search like [allinurl:thehumorarchives.com], a result like www.stumbleupon.com/url/www.thehumorarchives.com/forums/ is a fine result, and doesn't have anything to do with this.
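
    To make the point concrete, here is a small Python sketch of what "terms matching in the url" could look like. The tokenization (splitting on anything that is not a letter or digit) and the function names are assumptions for illustration only, not a description of Google's implementation.

```python
import re


def url_terms(text):
    """Split a url (or a query) into a set of lowercase alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches_allinurl(query, url):
    """Return True if every term of the query also appears as a term in the url."""
    return url_terms(query) <= url_terms(url)
```

    Under these assumptions, matches_allinurl("thehumorarchives.com", "www.stumbleupon.com/url/www.thehumorarchives.com/forums/") is true because both "thehumorarchives" and "com" occur in the url, even though the page lives on a different host.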

  6. Re:Not a surprise on Millions of Pages Google Hijacked using ODP Feed · · Score: 5, Insightful

    Hey, if you've run across spammy sites, have you filled out a spam report and used the keyword slashdot? I mentioned in an earlier comment from a different story that you can do this. We got eight reports last time, and the responses are on their way. We do check that data to look for new tricks that spammers are trying.

  7. Re:Ugh. This is so not true. on Millions of Pages Google Hijacked using ODP Feed · · Score: 5, Informative

    Okay, I'll talk about this whole "millions of webpages hijacked! Film at 11!" piece of scaremongering. If you RTFA, the author (and the submitter of the story?) claims that some scraper sites have pulled down a copy of the dmoz RDF, gotten the urls, and are doing 302 redirects to sites in an attempt to hijack them. Note that this does not mean that lots of pages were hijacked at all.

    Here's the skinny on "302 hijacking" from my point of view, and why you pretty much only hear about it on search engine optimizer sites and webmaster forums. When you see two copies of a url or site (or you see redirects from one site to another), you have to choose a canonical url. There are lots of ways to make that choice, but it often boils down to wanting to choose the url with the most reputation. PageRank is a pretty good proxy for reputation, and incorporating PageRank into the decision for the canonical url helps to choose the right url.

    A lot of sites that try to spam search engine indices get caught, and their PageRank goes lower and lower as their reputation suffers. We do a very good job of picking canonical urls for normal sites; sites with their PageRank going toward zero are more likely to have a different canonical url picked, though, and to a webmaster I understand that it can look like "hijacking" even though the base cause is usually your reputation declining. For a long time, it was hard to get anyone to report canonicalization problems, because the site that got "hijacked" would be free-cheap-texas-holdem-plus-viagra-and-payday-loans-as-well.com type sites. In fact, I had to offer to ignore the spamminess of any reported sites in order to get people to send in any real data.
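
    A toy Python sketch of the idea, under loud assumptions: the reputation scores are made-up numbers, pick_canonical is a hypothetical helper, and the tie-breaking rule is invented for determinism. It is not Google's canonicalization logic, only an illustration of "choose the url with the most reputation" and of why a site whose score drops toward zero can lose that choice.

```python
def pick_canonical(candidates):
    """Pick one url from a dict mapping url -> reputation score (hypothetical values).

    The highest score wins. Ties go to the shorter url, then to the
    alphabetically smaller one, so the result is deterministic.
    """
    return min(candidates, key=lambda url: (-candidates[url], len(url), url))
```

    With scores of 0.8 for the original page and 0.1 for a redirecting copy, the original is picked. If the original's score falls to 0.0, the same rule picks the redirecting url, which is the situation that can look like "hijacking" from the outside.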

    But even though I suspected that this issue affected very few sites, we still wanted to collect feedback to see how big of a problem it was, and to see if we could improve our url canonicalization. So starting a while ago, we offered a way to report "302 hijacking" to Google; I mentioned the method on several webmaster forums. You contact user support and use the keyword "canonicalpage" in your report. Then I created a little mailing list with some engineers on it, and user support passes on emails that meet the criteria to the mailing list.

    So how many reports has all this work (including posting multiple times on lots of webmaster boards to request data) gotten me? The last time I checked, it was under 30. Not a million pages. Not even a hundred reports. Under 30. Don't get me wrong, we're still looking at how we can do better: one engineer proposed a way that might help these sites, and he's got a testset of sites that would be affected by changes in how we canonicalized urls. A few of us have been looking through it to see if we can improve things, but please know that this is not a wildfire issue that will result in the web melting down.

    As a side note, I'm getting a little tired of debunking the source of this story (NickW at threadwatch). For example, he claimed that Google had removed Greg Duffy from Google's index. When I pointed out that he was making an assertion of fact without evidence, he started out revising the story by sprinkling in words like "appears" and eventually pulled the story at http://www.threadwatch.org/node/1822 off his front page. But given that this is the third link to NickW's site from Slashdot in the last couple weeks, I'm guessing that he's tasted the Slashdot effect and wants more.

  8. Ugh. This is so not true. on Millions of Pages Google Hijacked using ODP Feed · · Score: 2, Informative

    This is a placeholder. I'll include more details of why you shouldn't listen to Threadwatch.org in a bit, and debunk this some. Let me get this posted and I'll follow up.

    (Yes, I am GoogleGuy.)

  9. Re:How to report spam on A Search Engine Manipulator's Tale · · Score: 2, Interesting

    I could probably get Google to pay for it, but I've been reading Slashdot forever, so it was probably good for me to do my part to thank Slashdot for years of letting me enjoy geeky distraction instead of Real Work. So I just paid out of my pocket.

  10. Re:How to report spam on A Search Engine Manipulator's Tale · · Score: 5, Informative

    I wouldn't necessarily say that I'm high up, but I am an engineer at Google. The googleguy.de fellow nicely let me have the GoogleGuy identity at Slashdot. I think (hope) that we sent him some schwag to say thanks.

    So yes: from now on, when you see GoogleGuy on Slashdot, it is the original, tried and true GoogleGuy. I even subscribed and everything.

  11. How to report spam on A Search Engine Manipulator's Tale · · Score: 5, Informative

    If you find a page in Google that violates our quality guidelines (cloaking, sneaky redirects, hidden text, hidden links, etc.), please let us know by reporting it at our spam report form.

    If you include the word slashdot in the "Additional details" section, I'll ask someone to do an additional check this weekend for Slashdot-reported spam.

    We use spam report data to improve our quality directly, but also to look for new types of spam and ways to improve our scoring algorithms.

  12. My favorite is perftools on Google Launches Google Code · · Score: 3, Interesting

    You can see code that Google is opening up here. My favorite is the perftools code because it helps with things like heap profiling. Very handy stuff, and it's hosted with Sourceforge. I'm pretty sure these four projects were just added in the last day or so.

  13. Re:A suggestion.... on 'Online Poker' Googlebomb · · Score: 1

    Unfortunately, we have to optimize our features according to how often users take advantage of them. This is a rare enough request that I remember seeing your comments on this a few months ago--I can't remember anyone else bringing this up recently. So I'm sorry to say that you probably shouldn't expect a -inanchor checkbox any time soon.

    However, you could hack up something for yourself that does this using the Google WebAPI. Or you might be able to rig up a search box on your home page that just transforms all your queries from ["A B C"] to ["A B C" -inanchor:"A B C"].
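
    A minimal Python sketch of the query transformation described above. The function names are hypothetical, and the search url simply uses the familiar q parameter on www.google.com/search; this is the home-page search box variant, not the Google WebAPI.

```python
from urllib.parse import urlencode


def exclude_anchor_matches(phrase):
    """Turn the phrase A B C into the query "A B C" -inanchor:"A B C"."""
    quoted = f'"{phrase}"'
    return f"{quoted} -inanchor:{quoted}"


def search_url(phrase):
    """Build a search url for the transformed query."""
    return "https://www.google.com/search?" + urlencode({"q": exclude_anchor_matches(phrase)})
```

    A search box on a personal home page could call search_url on whatever the user typed and redirect the browser to the result.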

  14. Re:Googlebombing is part of Google's design flaw. on 'Online Poker' Googlebomb · · Score: 1

    I'm saying that if I search for [stanford univ] then stanford.edu is a relevant result that Google can return because of anchors to that site. Regarding ["to be or not to be"], the results that are brought up because of anchors are quite relevant in my opinion (e.g. tobeornottobe.com), but if you want only on-page matches, just use -inanchor.

    I think I understand what you're saying, but I believe that the majority of people searching for ["to be or not to be"] would be content with results like tobeornottobe.com. We provide a way to know whether the match was on-page or not (via looking in the header of the cached page), and we provide a way to turn off anchor matches (via -inanchor). I'm sorry the default behavior isn't what you'd prefer, but that behavior is intended and I wouldn't expect it to change.

  15. Re:Googlebombing is part of Google's design flaw. on 'Online Poker' Googlebomb · · Score: 1

    ["to be or not to be" atariamarok] is an interesting search--AtariAmarok really really hates this search. AtariAmarok, if you check out this link, someone answered how this happens several months ago. The answer was:

    "If you view the cached page, you'll see a message in the header: 'These terms only appear in links pointing to this page: to be or not to be'. That's how you can tell."

    Being able to search for something like [stanford univ] and being able to return Stanford University (even though the word "univ" might not occur on the stanford.edu page) is usually a nice win for search quality. However, you clearly want only on-page matches. So if you type the search ["to be or not to be" -inanchor:"to be or not to be"] then you get the search that you want, AtariAmarok.

  16. Re:Didn't mean to post as AC on Google Punishes Self for Cloaking · · Score: 2, Interesting

    Just to chime in: Google didn't remove Greg Duffy's site from our index. I've said as much today on Metafilter, Kuro5hin, Threadwatch, Greg's own site, and now I'm happy to say it here.

  17. Re:Nice to see... on Google Punishes Self for Cloaking · · Score: 1

    Of course Googlers read Slashdot--we're geeks. Personally, I haven't really started out the day right until I've caught up on Slashdot, Techdirt, Bloglines, Freshnews.org, and so forth.

  18. Re:I checked into this. on Is Google Breaking Their Own Rules? · · Score: 1

    Next time I'll try to make sure I post faster. I guess you gotta move fast before attention moves to the next Slashdot post..

  19. I checked into this. on Is Google Breaking Their Own Rules? · · Score: 2, Informative

    We inadvertently showed additional information on product support pages to both Google's site search crawler and Google's main web crawler. The additional information shown on the product support pages was intended only for the site search crawler, not the main web crawler. They're in the process of changing it so that the pages show only the same information that users get.

  20. Re:Gmail's forced "basic HTML view" - and a soluti on Google Weather Service And GMail Improvements · · Score: 1

    I passed this feedback on to the Gmail team. Thanks for mentioning this.

  21. Glad I subscribed.. on Google Still Ahead In Search Competition · · Score: 0, Offtopic

    I'm glad I subscribed to Slashdot so I can read articles like this early. Now I need to grep my subscriber RSS feed so it drops me an email whenever Slashdot mentions Google or Gmail. :)
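
    A rough Python sketch of that kind of feed filter. It assumes a plain RSS 2.0 document with un-namespaced item and title elements, which may not match the real subscriber feed, and it stops at returning the matching titles; sending the email is left out. matching_titles is a hypothetical helper name.

```python
import re
import xml.etree.ElementTree as ET

# Whole-word, case-insensitive match on either keyword.
KEYWORDS = re.compile(r"\b(google|gmail)\b", re.IGNORECASE)


def matching_titles(rss_xml):
    """Return the titles of feed items that mention Google or Gmail."""
    root = ET.fromstring(rss_xml)
    titles = [item.findtext("title", default="") for item in root.iter("item")]
    return [title for title in titles if KEYWORDS.search(title)]
```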

  22. Re:Is the result valid HTML/XHTML? on Google Cans Comment Spam · · Score: 1

    Yup. rel= is nicely compliant--thanks for pointing this out. One tidbit I didn't realize is that if you have multiple values for rel, W3C says to separate them with spaces. Google will accept spaces, commas, or probably any other punctuation separator in order to be safe.
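
    A minimal Python sketch of lenient rel parsing along those lines. The exact separator set is an assumption (anything other than letters, digits, underscore, and hyphen is treated as a separator), the helper names are hypothetical, and nofollow is used as the example value because that is what the story is about. It is not Google's parser.

```python
import re


def rel_values(rel):
    """Split a rel attribute on whitespace, commas, and other punctuation."""
    # Hyphens and underscores are kept, since a value may contain them.
    return [value.lower() for value in re.split(r"[^A-Za-z0-9_-]+", rel) if value]


def is_nofollow(rel):
    """Return True if nofollow is one of the rel values."""
    return "nofollow" in rel_values(rel)
```

    Both rel="external nofollow" (the spec-compliant form) and rel="external,nofollow" yield the same two values under this sketch.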

  23. IE Lock in on OD2 Launches Penny-Per-Song Streaming Jukebox · · Score: 2, Interesting

    Good idea, but it's only accessible with Microsoft Internet Explorer. WTF? I use Firefox and am not going to switch back ;-)

  24. Compare the Algorithms manually on Yahoo! Vs. Google: Algorithm Standoff · · Score: 4, Informative

    The challenge for Google and Yahoo is to filter out the SEO spam (Doorways, cloaking, ...)

    Check out the algorithms yourself by comparing Google and Yahoo search results side by side.

  25. Compare the search results on Yahoo! Switches Search Engines · · Score: 5, Informative

    Hi,

    here's a small tool to compare the search results of Google and Yahoo.

    Have fun.