Google Now Searches JavaScript
mikejuk writes "Google has been improving the way that its Googlebot searches dynamic web pages for some time, but it seems to be attracting added interest just at the moment. In the past Google has encouraged developers to avoid using JavaScript to deliver content or links to content because of the difficulty of indexing dynamic content. Over time, however, the Googlebot has incorporated ways of searching content that is provided via JavaScript. Now it seems that it has got so good at the task that Google is asking us to allow the Googlebot to scan the JavaScript used by our sites. Working with JavaScript means that the Googlebot has to actually download and run the scripts, and this is more complicated than you might think. This has led to speculation about whether it might be possible to include JavaScript on a site that could use the Google cloud to compute something. For example, imagine that you set up a JavaScript program to compute the first n digits of pi, or a Bitcoin miner, and had the result formed into a custom URL, which the Googlebot would then try to access as part of its crawl. By looking at, say, the query part of the URL in the server log you might be able to get back a useful result."
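The scheme in the summary could be sketched roughly as follows. This is only an illustration under stated assumptions: the host `example.com`, the path `/result`, and all function names are made up, and a small sum of squares stands in for the pi or Bitcoin computation. Whether Googlebot would actually follow such a link is exactly the open question.

```javascript
// Hypothetical sketch: compute something in page script, then expose the
// answer as the query part of a link that a crawler might later request.

// Stand-in for the real work (digits of pi, hashing, ...): sum of squares 1..n.
function compute(n) {
  let total = 0;
  for (let i = 1; i <= n; i++) total += i * i;
  return total;
}

// Encode the input and the result into a URL on our own (assumed) host.
function resultUrl(n) {
  const url = new URL("http://example.com/result");
  url.searchParams.set("n", String(n));
  url.searchParams.set("value", String(compute(n)));
  return url.toString();
}

// Later, recover the result from the request path found in the server log.
function readResultFromLog(requestPath) {
  const url = new URL(requestPath, "http://example.com");
  return Number(url.searchParams.get("value"));
}
```

In a real page the URL would be written into a link so the crawler can discover it; the log-reading half would run on the server side against whatever request paths the crawler produced.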
Googlebot will have a very quick timeout on scripts and probably won't be more powerful than a standard home computer. How would that be useful for calculating digits of pi or Bitcoin mining? It would take far longer than doing it the conventional way.
You can always cut the whole process into smaller steps, each providing a URL that will initiate the next step. Or you can provide several URLs and have the Google cloud compute a problem for you in parallel...
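A minimal sketch of the step-chaining idea, assuming each step does a bounded amount of work and hands its state to the next step through the URL. The host, the `/step` path, the parameter names, and `STEP_SIZE` are all hypothetical, and summing integers stands in for the real problem. The `crawl` function only simulates a crawler that keeps following the next link.

```javascript
// Hypothetical sketch: split a long computation into short steps, where each
// step's output is a URL carrying the state needed to start the next step.

const STEP_SIZE = 1000; // iterations per step, assumed small enough to beat a timeout

// Do one bounded chunk of work: add the integers in [start, start + STEP_SIZE),
// capped at `end`, to the running total.
function runStep(state) {
  const stop = Math.min(state.start + STEP_SIZE, state.end);
  let total = state.total;
  for (let i = state.start; i < stop; i++) total += i;
  return { start: stop, end: state.end, total };
}

// Serialize the state into the URL of the next step (host and path are made up).
function nextUrl(state) {
  const params = new URLSearchParams({
    start: String(state.start),
    end: String(state.end),
    total: String(state.total),
  });
  return "http://example.com/step?" + params.toString();
}

// Rebuild the state from a step URL.
function parseState(url) {
  const p = new URL(url).searchParams;
  return {
    start: Number(p.get("start")),
    end: Number(p.get("end")),
    total: Number(p.get("total")),
  };
}

// Simulate a crawler that keeps following the next-step link until done.
function crawl(url) {
  let state = parseState(url);
  while (state.start < state.end) {
    state = parseState(nextUrl(runStep(state)));
  }
  return state.total;
}
```

For the parallel variant, a page could emit several such URLs with disjoint `start`/`end` ranges and add up the totals that come back.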
Send Google JavaScript which generates different results for Google than for normal visitors, in order to rank up the site.
The Tao of math: The numbers you can count are not the real numbers.
Well, I think the bigger problem is that you are writing arbitrary code.
Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
By "gracefully degrading" do you mean "if (useragent == 'googlebot') { random-spamwords(); paywalled-content(); links-to-every-parsable-uri(); }"?
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
using javascript to hide or obfuscate email addresses to help protect them from spammers, scammers and bots.
thanks fer nuttin, google.
Although maybe not quite in the same context. Google used to display javascript-munged email addresses in their search results until some of the larger sites involved, such as Rootsweb, complained.
Also, the dry cleaning that you dropped off on Thursday is ready for pick-up and your driver's license expires in three months.
Sincerely,
The Slashdot Citizens Brigade
If you check out some of the thumbnails, it looks like Googlebot is using a customized version of Chrome now. You can see it blocking plugins.
Now that you mention it, the preview Google shows of one of my sites has all the CSS applied, including some that is applied by JavaScript after the page load.
Rethinking email
You don't need to actually run the scripts; most of the time it's enough to just scrape the strings and links out of them.
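As a rough sketch of that scraping approach (not a claim about what Googlebot does), the code below pulls quoted string literals out of script source without executing it and keeps the ones that look like links. The function names and the regular expression are illustrative assumptions.

```javascript
// Hypothetical sketch: pull string literals out of script source without
// executing it, then keep the ones that look like links.

// Matches simple single- or double-quoted literals. Escaped quotes inside a
// literal are handled; template literals and concatenation are not.
const STRING_LITERAL = /(["'])((?:\\.|(?!\1)[^\\])*)\1/g;

// Return the contents of every quoted literal found in the source text.
function extractStrings(source) {
  return Array.from(source.matchAll(STRING_LITERAL), (m) => m[2]);
}

// Keep only literals that start like an absolute URL or a site-relative path.
function extractLinks(source) {
  return extractStrings(source).filter((s) => /^(https?:\/\/|\/)/.test(s));
}
```

The obvious limit is that anything assembled at run time, such as an address concatenated from several pieces, never appears as a single literal, so a scraper like this would miss it.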
From the preceding link: "Make use of the robots.txt file on your web server. This file tells crawlers which directories can or cannot be crawled. Make sure it's current for your site so that you don't accidentally block the Googlebot crawler. Visit http://code.google.com/web/controlcrawlindex/docs/faq.html to learn how to instruct robots when they visit your site. You can test your robots.txt file to make sure you're using it correctly with the robots.txt analysis tool available in Google Webmaster Tools."
Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
They've been testing this for a while. We've already had the first complaints about someone spamming an email address that exists in exactly one place: online, as the result of some (trivial) JavaScript. It turned out that if you Googled the page, the result snapshot included the JavaScript-generated address... In other words, it's already there, and this will effectively kill JavaScript as a way of hiding functioning mailto links. Okay, it would be fairly simple to add a condition based on the user agent, as Googlebot is easily identified, but it will make things a bit more complicated for the average user.
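A minimal sketch of the user-agent condition described above, assuming the address is assembled at run time and simply withheld when the user agent string contains "Googlebot". The function names and the placeholder address are made up for illustration.

```javascript
// Hypothetical sketch: assemble a mailto link at run time so the full address
// never appears in the page source, and skip it for a crawler's user agent.

// The parts are placeholders; a real page would use its own address.
function buildMailto(user, domain) {
  return "mailto:" + user + "@" + domain;
}

// Googlebot's user agent string contains "Googlebot"; matching on that
// substring is an assumption about how one might detect it.
function isGooglebot(userAgent) {
  return /Googlebot/i.test(userAgent);
}

// Return the link for ordinary visitors and null for the crawler.
function mailtoFor(userAgent, user, domain) {
  return isGooglebot(userAgent) ? null : buildMailto(user, domain);
}
```

In a page, the first argument would be `navigator.userAgent`. Any bot that sends a different user agent and runs scripts would still see the address, so this only deals with crawlers that identify themselves.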
"For every complex problem, there is a solution that is simple, neat, and wrong." -- H.L. Mencken (1880-1956) --
Maybe put the JavaScript functions in a separate file and use robots.txt to deny bots access to it.
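A minimal robots.txt along those lines, assuming the scripts live under a hypothetical /js/ directory:

```
User-agent: *
Disallow: /js/
```

This only restrains crawlers that choose to honor robots.txt, and it is the opposite of what the summary says Google is now asking site owners to do.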