Slashdot Mirror


Google Now Searches JavaScript

mikejuk writes "Google has been improving the way that its Googlebot searches dynamic web pages for some time, but it seems to be attracting extra interest just at the moment. In the past Google has encouraged developers to avoid using JavaScript to deliver content or links to content because of the difficulty of indexing dynamic content. Over time, however, the Googlebot has incorporated ways of searching content that is provided via JavaScript. Now it seems that it has become so good at the task that Google is asking us to allow the Googlebot to scan the JavaScript used by our sites. Working with JavaScript means that the Googlebot has to actually download and run the scripts, and this is more complicated than you might think. This has led to speculation about whether it might be possible to include JavaScript on a site that could use the Google cloud to compute something. For example, imagine that you set up a JavaScript program to compute n digits of pi, or a Bitcoin miner, and had the result formed into a custom URL, which the Googlebot would then try to access as part of its crawl. By looking at, say, the query part of the URL in the log you might be able to get back a useful result."

25 of 114 comments

  1. Really? by Anonymous Coward · · Score: 5, Insightful

    Googlebot will have a very quick timeout on scripts and probably won't be more powerful than a standard home computer. How would that be useful for calculating digits of pi or Bitcoin mining? It would take far longer than doing it the conventional way.

  2. Incremental and/or parallel computing? by SlovakWakko · · Score: 5, Interesting

    You can always cut the whole process into smaller steps, each providing a URL that will initiate the next step. Or you can provide several URLs and have the Google cloud compute a problem for you in parallel...
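
As a rough illustration of the splitting idea, here is a minimal sketch, assuming the job is a computation over a range of indices that can be cut into independent chunks. The function name `splitWork`, the `/work` path, and the `start`/`end` parameters are made up for this example; nothing here reflects how Googlebot actually schedules or limits scripts.

```javascript
// Hypothetical helper: cut the index range [0, total) into chunks and
// describe each chunk as a URL that a crawler could be pointed at.
// The "/work" path and the parameter names are assumptions for illustration.
function splitWork(total, chunkSize) {
  if (!Number.isInteger(total) || !Number.isInteger(chunkSize) || total < 0 || chunkSize <= 0) {
    throw new RangeError("total must be >= 0 and chunkSize must be > 0");
  }
  const urls = [];
  for (let start = 0; start < total; start += chunkSize) {
    const end = Math.min(start + chunkSize, total);
    const params = new URLSearchParams({ start: String(start), end: String(end) });
    urls.push("/work?" + params.toString());
  }
  return urls;
}

// Listing all chunk URLs on one page would be the "parallel" variant;
// handing them out one at a time would be the incremental one.
const workUnits = splitWork(2500, 1000);
console.log(workUnits);
```

Whether a crawler would ever follow such links quickly enough to be useful is a separate question, which the replies below argue about.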

    1. Re:Incremental and/or parallel computing? by Anonymous Coward · · Score: 5, Funny

      I already do this using a system of CNAMEs in a .xxx domain.

    2. Re:Incremental and/or parallel computing? by ThatsMyNick · · Score: 5, Interesting

      Anyone wanting to do this would be doing it on a dedicated website. They won't care about the domain or IP address being blacklisted by Google. And good luck with the theft of service charge: they never asked Google to index them. They did not even agree to any terms of service from Google. As I said, good luck.

    3. Re:Incremental and/or parallel computing? by truedfx · · Score: 4, Informative

      No, that's not what opting in means. Opting in means you're asking Google to visit your site. Opting out means you're asking Google not to visit your site. When you're not asking for anything, merely hoping, you're neither opting in nor opting out.

    4. Re:Incremental and/or parallel computing? by ThatsMyNick · · Score: 2

      I think you missed the "just for the heck of it". I understand my approach is not the practical one, and any sane person would just use their resources to do what little can be done and implement it on their own hardware. But does that mean it cannot be done in a no-loss way? Say I want to calculate the last 100 digits of Graham's number: it can be split into multiple calculations, and a sub-result calculation can take less than a second (which is what I assume Google will limit the runtime to). The bandwidth requirements are less
       
      And I really don't understand why you believe this cannot be done. If your argument is that this can be done in another, easier way, I agree. If not, I don't really understand your argument.

    5. Re:Incremental and/or parallel computing? by ThatsMyNick · · Score: 3, Insightful

      Your JS would generate HTML on the client side. Just generate a link that your server can understand. Googlebot, doing what it does, will try to load this URL. When it does, the server stores the result and generates a new problem for Googlebot to solve. This is the basis for the article and the entire comment thread.
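
The round trip described above might look roughly like this. It is only a sketch under assumptions: the work unit is a slice of the Leibniz series for pi (chosen because it splits cleanly), and `computeChunk`, `resultLink`, `readResult`, and the `/result` path are invented names. In a browser the link would be written into the page; here it is just returned as a string so the snippet runs anywhere.

```javascript
// Client side (would run inside the crawled page): sum the Leibniz terms
// 4 * (-1)^k / (2k + 1) for k in [start, end).
function computeChunk(start, end) {
  let sum = 0;
  for (let k = start; k < end; k++) {
    sum += (k % 2 === 0 ? 4 : -4) / (2 * k + 1);
  }
  return sum;
}

// Client side: encode the finished chunk into a link for the crawler to follow.
// "/result" and the parameter names are hypothetical.
function resultLink(start, end, value) {
  const params = new URLSearchParams({
    start: String(start),
    end: String(end),
    value: String(value),
  });
  return "/result?" + params.toString();
}

// Server side: recover the chunk from the requested URL (as seen in the log)
// and decide which chunk to hand out next.
function readResult(requestUrl) {
  const query = requestUrl.slice(requestUrl.indexOf("?") + 1);
  const params = new URLSearchParams(query);
  const start = Number(params.get("start"));
  const end = Number(params.get("end"));
  const value = Number(params.get("value"));
  return { start, end, value, nextStart: end, nextEnd: end + (end - start) };
}

// Simulate three crawler visits and accumulate the partial sums.
let total = 0;
let range = { start: 0, end: 1000 };
for (let visit = 0; visit < 3; visit++) {
  const link = resultLink(range.start, range.end, computeChunk(range.start, range.end));
  const result = readResult(link);
  total += result.value;
  range = { start: result.nextStart, end: result.nextEnd };
}
console.log(total); // an approximation of pi from the first 3000 terms
```

The simulation loop stands in for the crawler; in the scenario from the thread, each iteration would be a separate page load with the result arriving in the server's access log.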

    6. Re:Incremental and/or parallel computing? by ThatsMyNick · · Score: 2

      Er, they are looking for JS that generates HTML (so this is not an assumption). The purpose of Googlebot is to index. If they run the JS and don't even index the results, it makes no sense.
       
      Would you mind specifically mentioning what assumption I am making? And there is no way to sanitize JS (JS is a Turing-complete language; there is no way, at least as far as present-day research goes, to sanitize it in any reasonable way).

  3. A much more likely application by maxwell+demon · · Score: 5, Interesting

    Send Google JavaScript which generates different results for Google than for normal visitors, in order to rank up the site.

    --
    The Tao of math: The numbers you can count are not the real numbers.

    1. Re:A much more likely application by aaronb1138 · · Score: 4, Funny

      What is this method you have written, "sudo_mod_me_up?"

    2. Re:A much more likely application by slapyslapslap · · Score: 2

      This is already being done, but in reverse. Google doesn't like it much either. Get caught, and you are de-listed.

  4. Re:Simply another example by Zero__Kelvin · · Score: 5, Funny

    Well, I think the bigger problem is that you are writing arbitrary code.

    --
    Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun

  5. Re:Not convinced by mwvdlee · · Score: 2

    By "gracefully degrading" do you mean "if (useragent == 'googlebot') { random-spamwords(); paywalled-content(); links-to-every-parsable-uri(); }"?

    --
    Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?

  6. so much for by Anonymous Coward · · Score: 5, Insightful

    using javascript to hide or obfuscate email addresses to help protect them from spammers, scammers and bots.

    thanks fer nuttin, google.
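
For context, the kind of obfuscation being mourned here usually looks something like the sketch below: the address never appears in the page source and is only assembled when the script runs. The function name and the example address are invented. The point is simply that any client that executes the script, whether a browser or a crawler, ends up with the plain address.

```javascript
// Hypothetical obfuscation helper: the page source only contains the pieces,
// stored reversed, so a plain-text scraper looking for "user@host" finds nothing.
function assembleMailto(reversedUser, reversedHost) {
  const unreverse = (s) => s.split("").reverse().join("");
  return "mailto:" + unreverse(reversedUser) + "@" + unreverse(reversedHost);
}

// In a browser this string would be assigned to a link's href after page load.
// Anything that actually runs the script sees the assembled address.
const href = assembleMailto("ofni", "moc.elpmaxe");
console.log(href); // mailto:info@example.com
```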

    1. Re:so much for by VortexCortex · · Score: 2

      robots.txt

    2. Re:so much for by MattskEE · · Score: 2

      Do you think spammers scraping the web for email addresses respect robots.txt?

    3. Re:so much for by John+Bokma · · Score: 2

      Uhm, years ago one could already do that using SpiderMonkey and some Perl. It's what I used to report nasty redirects in Blogspot/Blogger to Google (thousands and thousands). It took me some time, but Google did see the light and the problem was resolved.

      Why do people keep thinking that spammers are retards? If it can be abused, it will be. And spammers/cybercriminals are among the first to do so.

  7. Google has been doing this for quite some time by Anonymous Coward · · Score: 2, Interesting

    Although maybe not quite in the same context. Google used to display javascript-munged email addresses in their search results until some of the larger sites involved, such as Rootsweb, complained.

  8. Re:I noticed this already some time ago. by RoccamOccam · · Score: 3, Funny

    Also, the dry cleaning that you dropped off on Thursday is ready for pick-up and your driver's license expires in three months.

    Sincerely,
    The Slashdot Citizens Brigade

  9. Chrome by The+MAZZTer · · Score: 2

    If you check out some of the thumbnails, it looks like Googlebot is using a customized version of Chrome now. You can see it blocking plugins.

  10. Re:I noticed this already some time ago. by marcosdumay · · Score: 2

    Now that you mention it, the preview Google shows of one of my sites has all the CSS applied, including some that is applied by JavaScript after the page loads.

  11. They don't need to run the scripts by Hentes · · Score: 2

    You don't need to actually run the scripts; most of the time it's enough to just scrape the strings and links out of them.
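
A crude version of that idea, as a sketch: pull quoted string literals out of the script source with a regular expression and keep the ones that look like absolute URLs. `extractStrings` and `extractLinks` are made-up names, and the regex ignores template literals, comments, and strings built by concatenation, which is exactly where this approach falls short of running the code.

```javascript
// Match single- or double-quoted string literals, allowing backslash escapes.
// This is a simplification: template literals and regex literals are not handled.
const STRING_LITERAL = /"((?:[^"\\\n]|\\.)*)"|'((?:[^'\\\n]|\\.)*)'/g;

function extractStrings(source) {
  const found = [];
  for (const match of source.matchAll(STRING_LITERAL)) {
    found.push(match[1] !== undefined ? match[1] : match[2]);
  }
  return found;
}

// Keep only literals that look like absolute http(s) URLs.
function extractLinks(source) {
  return extractStrings(source).filter((s) => /^https?:\/\//.test(s));
}

const script = `
  var home = "http://example.com/index.html";
  var label = 'Read more';
  document.write('<a href="' + home + '">' + label + '</a>');
`;
console.log(extractLinks(script)); // [ 'http://example.com/index.html' ]
```

Note that the link text and the markup fragments come out as separate strings; without executing the script there is no way to know they end up in the same anchor element.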

  12. Here's your sign by Zero__Kelvin · · Score: 2

    "There is no reason to believe, as the research is scant at best, that Google even respects a robots.txt file."

    From the preceding link: "Make use of the robots.txt file on your web server. This file tells crawlers which directories can or cannot be crawled. Make sure it's current for your site so that you don't accidentally block the Googlebot crawler. Visit http://code.google.com/web/controlcrawlindex/docs/faq.html to learn how to instruct robots when they visit your site. You can test your robots.txt file to make sure you're using it correctly with the robots.txt analysis tool available in Google Webmaster Tools."

    --
    Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun

  13. Spammers! by xenobyte · · Score: 3, Informative

    They've been testing this for a while. We've already had the first complaints about someone spamming an email address that exists in exactly one place: online, as the result of some (trivial) JavaScript. It turned out that if you Googled the page, the result snapshot included the JavaScript-generated email address. In other words, it's already there, and this will effectively kill JavaScript as a way of hiding functioning mailto links. Okay, it would be fairly simple to add a condition based on the User-Agent, as Googlebot is easily identified, but it will make things a bit more complicated for the average user.

    --
    "For every complex problem, there is a solution that is simple, neat, and wrong." -- H.L. Mencken (1880-1956) --

  14. Re:GET vs POST by xOneca · · Score: 2

    Maybe put the JavaScript functions in a separate file and use robots.txt to deny bots access to it.
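
What that might look like, as a sketch: the script lives in its own file and robots.txt disallows that path. The file name `/js/contact.js` and the tiny matcher below are assumptions for illustration; the matcher handles only `User-agent` and `Disallow` lines with plain prefix matching, not the full rules real crawlers apply. As noted elsewhere in the thread, this only affects crawlers that choose to honor robots.txt.

```javascript
// Example robots.txt content; the path is hypothetical.
const robotsTxt = [
  "User-agent: *",
  "Disallow: /js/contact.js",
].join("\n");

// Simplified check: does any Disallow rule in a group that applies to
// `userAgent` match `path` by prefix? Allow rules, wildcards inside paths, and
// "most specific group wins" precedence are deliberately ignored.
function isDisallowed(robots, userAgent, path) {
  const ua = userAgent.toLowerCase();
  let applies = false;      // does the current group apply to this agent?
  let inAgentLines = false; // are we still reading a run of User-agent lines?
  for (const rawLine of robots.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim();
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    if (field === "user-agent") {
      const token = value.toLowerCase();
      const matches = token === "*" || (token !== "" && ua.includes(token));
      applies = inAgentLines ? applies || matches : matches;
      inAgentLines = true;
    } else {
      inAgentLines = false;
      if (field === "disallow" && applies && value !== "" && path.startsWith(value)) {
        return true;
      }
    }
  }
  return false;
}

console.log(isDisallowed(robotsTxt, "Googlebot", "/js/contact.js")); // true
console.log(isDisallowed(robotsTxt, "Googlebot", "/index.html"));    // false
```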