Slashdot Mirror


Google Now Searches JavaScript

mikejuk writes "Google has been improving the way that its Googlebot searches dynamic web pages for some time, but the subject seems to be attracting extra interest at the moment. In the past Google encouraged developers to avoid using JavaScript to deliver content, or links to content, because dynamic content was difficult to index. Over time, however, Googlebot has incorporated ways of indexing content that is provided via JavaScript. Now it seems to have become so good at the task that Google is asking us to allow Googlebot to scan the JavaScript used by our sites. Working with JavaScript means that Googlebot has to actually download and run the scripts, and this is more complicated than you might think. This has led to speculation about whether it might be possible to include JavaScript on a site that could use the Google cloud to compute something. For example, imagine that you set up a JavaScript program to compute the first n digits of pi, or a Bitcoin miner, and had the result formed into a custom URL, which Googlebot would then try to access as part of its crawl. By looking at, say, the query part of the URL in your server log, you might be able to get back a useful result."
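A minimal sketch of the round trip the summary describes. Everything named here is hypothetical: the collector URL `https://example.com/r`, the `task` and `v` query parameters, and both helper functions. The page script would put a computed value into a link, and the site owner would later pull it back out of an ordinary access-log line (Common Log Format assumed). Nothing in it is a Google API; it only shows where the "result" would travel.

```javascript
// Hypothetical names throughout: the collector path and the "task"/"v"
// parameters are illustrative, not anything Googlebot understands.

// Client side: encode a computed value into a crawlable URL.
function buildResultUrl(base, taskId, value) {
  const url = new URL(base);
  url.searchParams.set("task", String(taskId));
  url.searchParams.set("v", String(value));
  return url.toString();
}

// Server side: recover the value from one Common Log Format line, e.g.
// 203.0.113.5 - - [10/Oct/2012:13:55:36 +0000] "GET /r?task=7&v=42 HTTP/1.1" 200 512
function resultFromLogLine(line) {
  const match = /"GET (\S+) HTTP\/[\d.]+"/.exec(line);
  if (!match) return null;
  // The logged request target is path-relative, so resolve it against a dummy origin.
  const url = new URL(match[1], "http://localhost");
  if (!url.searchParams.has("task") || !url.searchParams.has("v")) return null;
  return { task: Number(url.searchParams.get("task")), value: url.searchParams.get("v") };
}

const link = buildResultUrl("https://example.com/r", 7, 42);
console.log(link); // https://example.com/r?task=7&v=42
const logLine = '203.0.113.5 - - [10/Oct/2012:13:55:36 +0000] "GET /r?task=7&v=42 HTTP/1.1" 200 512';
console.log(resultFromLogLine(logLine)); // { task: 7, value: '42' }
```

Using the query string keeps the "answer" visible in any standard server log, so no extra server code is needed beyond reading the log.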

114 comments

  1. Really? by Anonymous Coward · · Score: 5, Insightful

    Googlebot will have a very quick timeout on scripts and probably won't be more powerful than a standard home computer. How would that be useful for calculating digits of pi or bitcoin mining? It would take far longer than doing it the conventional way.

    1. Re:Really? by multicoregeneral · · Score: 1

      Depends how often they hit your site. Google has been known to check sites pretty regularly.

      --
      This signature intentionally left blank.
    2. Re:Really? by Sloppy · · Score: 1

      Wait a minute, are you suggesting that having spiders run my javascript x86 emulator which runs jruby scripts which mines bitcoins, isn't practical?

      --
      As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
  2. Incremental and/or parallel computing? by SlovakWakko · · Score: 5, Interesting

    You can always cut the whole process into smaller steps, each providing a URL that will initiate the next step. Or you can provide several URLs and have the Google cloud compute a problem for you in parallel...
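One problem that really does split this way is digit extraction for pi. The Bailey-Borwein-Plouffe (BBP) formula gives the hexadecimal (not decimal) digit at position n without computing the earlier ones, so each position could be its own unit of work, one per URL. This is an illustration under stated limits: it uses ordinary doubles, so it is only trustworthy for modest n, and the function names are made up for this example.

```javascript
// 16^e mod m by square-and-multiply; exact while m * m < 2^53.
function powMod16(e, m) {
  let result = 1 % m;
  let base = 16 % m;
  while (e > 0) {
    if (e % 2 === 1) result = (result * base) % m;
    base = (base * base) % m;
    e = Math.floor(e / 2);
  }
  return result;
}

// Fractional part of sum_k 16^(n-k) / (8k + j), the building block of BBP.
function bbpSeries(j, n) {
  let s = 0;
  for (let k = 0; k <= n; k++) {
    const d = 8 * k + j;
    s = (s + powMod16(n - k, d) / d) % 1; // head: keep only the fraction
  }
  for (let k = n + 1; ; k++) {
    const term = Math.pow(16, n - k) / (8 * k + j); // tail: rapidly shrinking terms
    if (term < 1e-17) break;
    s += term;
  }
  return s - Math.floor(s);
}

// Hex digit of pi at position n + 1 after the point (n = 0 gives "2").
function piHexDigit(n) {
  let x = 4 * bbpSeries(1, n) - 2 * bbpSeries(4, n) - bbpSeries(5, n) - bbpSeries(6, n);
  x -= Math.floor(x);
  return Math.floor(16 * x).toString(16);
}

// Each position is independent, so these calls could be handed out one per request.
const digits = Array.from({ length: 10 }, (_, n) => piHexDigit(n)).join("");
console.log(digits); // 243f6a8885  (pi = 3.243F6A8885... in hex)
```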

    1. Re:Incremental and/or parallel computing? by Anonymous Coward · · Score: 5, Funny

      I already do this using a system of CNAMEs in a .xxx domain.

    2. Re:Incremental and/or parallel computing? by Anonymous Coward · · Score: 0

      I realise that the kind of idiots who like Bitcoins will be the same fools who drool over Google, and that these same monkeys won't see any problem with providing an algorithm which generates a secret to a third party for execution, but exactly why do they think any significant sort of compute time will be dedicated to their shitty little site?

    3. Re:Incremental and/or parallel computing? by Anonymous Coward · · Score: 1

      The same reason why 72 hours of video is uploaded to YouTube every minute.

    4. Re:Incremental and/or parallel computing? by Anonymous Coward · · Score: 0

      Not worth it, IMO. You have to serve it a small fraction of the problem and then get back the answer and use it. Using your own resources would be faster and cheaper.

    5. Re:Incremental and/or parallel computing? by Zero__Kelvin · · Score: 1, Interesting

      Even if this is possible, you would certainly be violating Google's guidelines and have your site blacklisted from Googlebot pretty quickly. Furthermore, you could be charged with theft of services.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    6. Re:Incremental and/or parallel computing? by Anonymous Coward · · Score: 0, Flamebait

      I don't think so. Google downloaded my script and ran it on their servers. How is that my fault if it uses their resources?

      --
      Sundar Pichai is the utter asshole whose incompetence has resulted in the shutdown of Google's Atlanta office.

    7. Re:Incremental and/or parallel computing? by ThatsMyNick · · Score: 5, Interesting

      Anyone wanting to do this would be doing it on a dedicated website. They won't care about the domain or IP address being blacklisted from Google. And good luck with the theft of service charge: they never asked Google to index them. They did not even agree to any terms of service from Google. As I said, good luck.

    8. Re:Incremental and/or parallel computing? by Zero__Kelvin · · Score: 1

      Intent.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    9. Re:Incremental and/or parallel computing? by Zero__Kelvin · · Score: 0

      Thanks, but I won't need luck. If I don't set up a robots.txt file telling them not to index it, then I have opted in to be indexed. If my code is clearly designed to exploit Google's bot, then I had intent. The two combined equals guilt.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    10. Re:Incremental and/or parallel computing? by truedfx · · Score: 4, Informative

      No, that's not what opting in means. Opting in means you're asking Google to visit your site. Opting out means you're asking Google not to visit your site. When you're not asking for anything, merely hoping, you're neither opting in nor opting out.

    11. Re:Incremental and/or parallel computing? by rtfa-troll · · Score: 1

      What if the URL triggers, for example, a Slashdot posting, and then you use another external JavaScript interpreter to gather all the results? Sort of map-reduce. Incredibly inefficient, but you don't have to pay, so who cares? Even better if an XSS or similar attack on a website can be used to parcel out the work.

      It seems to me, though, that there's no reason to limit this to Googlebot; any JavaScript interpreter will do. I'd be surprised if somebody in the blackhat community doesn't already have this up and running for password cracking or similar.

      --
      =~ s,(.*),<sarcasm>$1</sarcasm>,g if any_point_you_wish();
    12. Re:Incremental and/or parallel computing? by postbigbang · · Score: 0

      There is no reason to believe, as the research is scant at best, that Google even respects a robots.txt file. They are a vacuum hose attached to an analytic engine, easily likened to Stephen King's Langoliers.

      --
      ---- Teach Peace. It's Cheaper Than War.
    13. Re:Incremental and/or parallel computing? by Anonymous Coward · · Score: 0

      That only applies if you target it specifically at Google. Which is harder than making a generic calculate-all-the-pi-digits javascript that simply outputs to the browser, with a different colour background for each digit. Then you do exactly the above, and you get the result on the server, and there is no malicious intent; only the intent to compute pi digits.

    14. Re:Incremental and/or parallel computing? by Anonymous Coward · · Score: 0

      You're a bit of a moron aren't you. I think you need to read up on the concept of 'opt in' vs 'opt out'.

      Herpa derp!

    15. Re:Incremental and/or parallel computing? by dreamchaser · · Score: 1

      Intent.

      Prove it. Seriously. You wouldn't be able to.

    16. Re:Incremental and/or parallel computing? by Anonymous Coward · · Score: 0

      Shouldn't be that way. Having no robots.txt should be the same as...

      User-agent: *
      Disallow: /

      WITH a robots.txt, the bot behavior should be specified explicitly, for example:

      User-agent: googlebot
      Allow: /

      User-agent: *
      Disallow: /
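For reference, under the Robots Exclusion Protocol (since standardized as RFC 9309) a missing robots.txt means everything may be crawled; `Allow` and `Disallow` take URL path prefixes, not bot names, and per-bot rules go in separate `User-agent` groups. The matcher below is a deliberately simplified sketch of that decision (no `*`/`$` wildcards, no merging of duplicate groups, substring matching on the bot name, hypothetical function names); real crawlers implement more of the spec.

```javascript
// Simplified robots.txt matcher, a sketch rather than a full RFC 9309 implementation.
function parseRobots(text) {
  const groups = [];
  let current = null;
  let lastWasAgent = false;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // drop comments
    const colon = line.indexOf(":");
    if (colon < 0) continue; // blank or malformed line
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    if (field === "user-agent") {
      // Consecutive User-agent lines share one group of rules.
      if (!lastWasAgent) {
        current = { agents: [], rules: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else {
      // An empty "Disallow:" restricts nothing, so it adds no rule.
      if ((field === "allow" || field === "disallow") && current && value !== "") {
        current.rules.push({ allow: field === "allow", path: value });
      }
      lastWasAgent = false;
    }
  }
  return groups;
}

function isAllowed(groups, botName, path) {
  const bot = botName.toLowerCase();
  // Prefer a group naming this bot; fall back to the "*" group.
  const group =
    groups.find((g) => g.agents.some((a) => a !== "*" && a !== "" && bot.includes(a))) ||
    groups.find((g) => g.agents.includes("*"));
  if (!group) return true; // no applicable rules: crawling is allowed
  // The longest matching path prefix wins; on a tie, Allow wins.
  let best = null;
  for (const rule of group.rules) {
    if (!path.startsWith(rule.path)) continue;
    if (best === null || rule.path.length > best.path.length ||
        (rule.path.length === best.path.length && rule.allow)) {
      best = rule;
    }
  }
  return best === null || best.allow;
}

const robots = "User-agent: googlebot\nAllow: /\n\nUser-agent: *\nDisallow: /\n";
const groups = parseRobots(robots);
console.log(isAllowed(groups, "Googlebot", "/index.html"));    // true
console.log(isAllowed(groups, "SomeOtherBot", "/index.html")); // false
console.log(isAllowed(parseRobots(""), "AnyBot", "/"));         // true: no file, no restrictions
```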

    17. Re:Incremental and/or parallel computing? by Zero__Kelvin · · Score: 1

      "Anyone wanting to do this would be doing it on a dedicate website. They wont care about the domain or IP address being blacklisted from Google."

      So you are saying that someone would go through all the trouble of registering the domain, creating the code, and getting (or waiting for) Google to index it, then wouldn't care that Google would cease to execute the actual code before the desired results are obtained? Re-read what I wrote. I merely said it would be blacklisted quickly. I didn't say that it would affect some tangential portion of the site. The purpose of the attempt itself would be defeated quickly. That is the point.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    18. Re:Incremental and/or parallel computing? by ThatsMyNick · · Score: 1

      So far blacklisting has worked pretty well for Google. Google has used it well to punish black hat SEO techniques.

      In this case though, if I don't care about my PageRank, I would simply register tons of long domain names for pennies (plus ICANN fees). I would use a few at a time and wouldn't care if Google blacklisted a few at a time (I would be storing partial results, just like one of the parent posts mentioned, so the takeover should be seamless). It doesn't take a lot to recoup your domain name fees if your task is purely computational.

    19. Re:Incremental and/or parallel computing? by Zero__Kelvin · · Score: 1

      It doesn't take a lot to recoup your domain name fees if your task is purely computational.

      Dedicated hardware is cheap, and designing software costs a lot of money and time. What you are proposing would be ridiculously convoluted and costly, even disregarding the legal ramifications. We software engineers often talk about using the right tool for the right job. Your outlandish proposal ignores numerous sound engineering principles, not the least of which is adhering to this simple maxim.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    20. Re:Incremental and/or parallel computing? by Dwonis · · Score: 1

      I realise that the kind of idiots who like Bitcoins will be the same fools who drool over Google, and that these same monkeys won't see any problem with providing an algorithm which generates a secret to a third party for execution,

      Bitcoin mining doesn't involve any secret information.

      I'm not sure why you're slagging "idiots who like Bitcoins" so much, either. Sure, Bitcoin has attracted some cranks, anarchists, people who don't trust government-issued money, and speculators who will say all manner of things in attempts to influence the price of Bitcoins (both up and down), but have you actually looked at the crypto and the system of incentives built into the Bitcoin system? It's brilliant, and it's basically the micropayment system that everyone wanted back in the 1990s, but couldn't have because it didn't exist.

    21. Re:Incremental and/or parallel computing? by Dwonis · · Score: 1

      Guilt under what section of what law, specifically?

    22. Re:Incremental and/or parallel computing? by amRadioHed · · Score: 1

      Research is scant? It's ridiculously easy for anyone with a webserver to verify if Google respects robots.txt.

      --
      We hope your rules and wisdom choke you / Now we are one in everlasting peace
    23. Re:Incremental and/or parallel computing? by ThatsMyNick · · Score: 1

      Maybe not. But if someone wanted to do it just for the heck of it, it can be done. It may not scale very well; otherwise I don't see any issues with it.

    24. Re:Incremental and/or parallel computing? by Clogoddess · · Score: 0

      Blah blah blah Google employee alert yawn.

    25. Re:Incremental and/or parallel computing? by ThatsMyNick · · Score: 1

      I feel honored to have been considered a Google employee. Well, not really. Is there something about my point that sounds fanboyish or employee-ish?

    26. Re:Incremental and/or parallel computing? by Zero__Kelvin · · Score: 1

      You must be from another country. In the US, they have a smorgasbord of them from which they can choose now. But in this case I was thinking theft of service as I already stated quite explicitly.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    27. Re:Incremental and/or parallel computing? by Zero__Kelvin · · Score: 1

      As I was trying to explain, there is a huge difference between the problems you can see with it and the actual long list of problems that any moderately competent software engineer could quickly point out.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    28. Re:Incremental and/or parallel computing? by ThatsMyNick · · Score: 2

      I think you missed the "just for the heck of it". I understand my approach is not the practical one, and any sane person would just use their own resources, do what little can be done, and implement it on their own hardware. But that doesn't mean it cannot be done at no loss. Say I want to calculate the last 100 digits of Graham's number: it can be split into multiple calculations, and each sub-result can take less than a second (which is what I assume Google will limit the runtime to). The bandwidth requirements are small, too.

      And I really don't understand why you believe this cannot be done. If your argument is that this can be done in other, easier ways, I agree. If not, I don't really understand your argument.
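The Graham's-number example really does break into small steps. Graham's number is a power tower of 3s, and because the multiplicative order of 3 modulo 10^k divides 10^k (for k >= 2), the last digits can be found by repeatedly applying x -> 3^x mod 10^k until the value stops changing. Each step needs only the previous x, so the partial result could be carried from one request to the next. A minimal sketch using BigInt; the function names are hypothetical.

```javascript
// Modular exponentiation on BigInt: base^exp mod mod.
function powMod(base, exp, mod) {
  let result = 1n;
  base %= mod;
  while (exp > 0n) {
    if (exp & 1n) result = (result * base) % mod;
    base = (base * base) % mod;
    exp >>= 1n;
  }
  return result;
}

// Last d digits of a very tall tower 3^3^...^3 (Graham's number is one).
// For 10^k with k >= 2 the order of 3 divides 10^k, so 3^x mod 10^k only
// depends on x mod 10^k; that is why x can stay reduced at every level.
function lastDigitsOfTowerOf3(d) {
  const k = Math.max(d, 2);
  const mod = 10n ** BigInt(k);
  let x = 3n; // tower of height 1
  for (;;) {
    // One more level of the tower: a natural "one request, one step" unit.
    const next = powMod(3n, x, mod);
    if (next === x) break; // fixed point: taller towers give the same digits
    x = next;
  }
  return x.toString().padStart(k, "0").slice(-d);
}

console.log(lastDigitsOfTowerOf3(10)); // 2464195387
```

Stopping at the first repeated value is safe: once x maps to itself, every further level of the tower produces the same digits.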

    29. Re:Incremental and/or parallel computing? by Zero__Kelvin · · Score: 1

      Let's start with the simplest problem. You plan on having Googlebot load and run your client side code. Great. Now how do you plan to get Googlebot to feed you the result?

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    30. Re:Incremental and/or parallel computing? by ThatsMyNick · · Score: 3, Insightful

      Your JS would generate HTML on the client side. Just generate a link that your server can understand. Googlebot, doing what it does, will try to load this URL. When it does, the server stores this result and generates a new problem for Googlebot to solve. This is the basis for the article and the entire comment thread.
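A minimal sketch of the server half of that loop, with hypothetical names (`createDispatcher`, the `task` and `v` query parameters): each incoming hit may carry a finished result, which is stored, and the response names the next unsolved task. Turning that task into a page script and a link is left out, and nothing here depends on Googlebot specifically; any client that follows the link would drive it.

```javascript
// Hypothetical task dispatcher: stores results reported in the query string
// and names the next task that still has no result. Sequential: every visitor
// gets the same pending task until someone reports it.
function createDispatcher(taskCount) {
  const results = new Map(); // task id -> reported value (as a string)

  function handle(requestTarget) {
    const url = new URL(requestTarget, "http://localhost");
    const rawTask = url.searchParams.get("task");
    const value = url.searchParams.get("v");
    // Only accept a plain non-negative integer id within range.
    const task = /^\d+$/.test(rawTask || "") ? Number(rawTask) : -1;
    if (task >= 0 && task < taskCount && value !== null && !results.has(task)) {
      results.set(task, value);
    }
    for (let id = 0; id < taskCount; id++) {
      if (!results.has(id)) return { nextTask: id };
    }
    return { nextTask: null }; // everything has been answered
  }

  return { handle, results };
}

const dispatcher = createDispatcher(3);
console.log(dispatcher.handle("/"));              // { nextTask: 0 }
console.log(dispatcher.handle("/r?task=0&v=17")); // { nextTask: 1 }
console.log(dispatcher.results.get(0));           // 17
```

A parallel version would need to hand different pending tasks to different visitors (for example with leases that expire), which this sketch does not attempt.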

    31. Re:Incremental and/or parallel computing? by Zero__Kelvin · · Score: 1

      "Your JS would generate HTML on the client side."

      Like I said, you are making assumptions about Googlebot. You seem to think that they have no idea how to sanitize an input and will just execute whatever you send them byte for byte. That's not going to happen.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    32. Re:Incremental and/or parallel computing? by ThatsMyNick · · Score: 2

      Er, they are looking for JS that generates HTML (so this is not an assumption). The purpose of Googlebot is to index. If they run the JS and don't even index the results, it makes no sense.

      Would you mind specifically mentioning what assumption I am making? And there is no way to sanitize JS: it is a Turing-complete language, and there is no way (at least as far as present-day research goes) to sanitize it in any reasonable way.

    33. Re:Incremental and/or parallel computing? by ThatsMyNick · · Score: 1

      Sorry about the typos; I guess I need to get some sleep.

    34. Re:Incremental and/or parallel computing? by Dark$ide · · Score: 1

      What happens when Google chooses to ignore my carefully crafted robots.txt?

      If they then download my JavaScript experiment and run it at their cost, that's their problem.

      When I can trust crawlers to not ignore my robots.txt I'll stop using fail2ban on my apache logs.

      --

      Sigs. We don't need no steenking sigs.

  3. Simply another example by Anonymous Coward · · Score: 1

    why having other parties fetch your arbitrary code and execute it is such a wonderful idea.

    1. Re:Simply another example by Zero__Kelvin · · Score: 5, Funny

      Well, I think the bigger problem is that you are writing arbitrary code.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
  4. A much more likely application by maxwell+demon · · Score: 5, Interesting

    Send Google JavaScript which generates different results for Google than for normal visitors, in order to rank up the site.

    --
    The Tao of math: The numbers you can count are not the real numbers.
    1. Re:A much more likely application by Anonymous Coward · · Score: 0

      That's an interesting idea and much more insidious than mine, which was to simply send nothing to Google and fuck 'em.

    2. Re:A much more likely application by aaronb1138 · · Score: 4, Funny

      What is this method you have written, "sudo_mod_me_up"?

    3. Re:A much more likely application by Anonymous Coward · · Score: 0

      Or you could do it the easy way - robots.txt.

    4. Re:A much more likely application by The+Mighty+Buzzard · · Score: 1

      Wait, GoogleBot gets mod points now? This explains soooo much.

      --
      Violence is like duct tape. If it doesn't solve the problem, you didn't use enough.
    5. Re:A much more likely application by Anonymous Coward · · Score: 1

      You don't need JavaScript for that. A lot of servers serve different HTML to Google than to us. It's especially noticeable when searching for a rare term; Google will show you results that appear to contain the term, but without relevant context (only mystifying unrelated terms) and when you open it the page turns out to have some completely different subject.

    6. Re:A much more likely application by Anonymous Coward · · Score: 1

      I noticed this in a PHP attack script earlier this year. It installs a script pointing to a Russian malware domain, but only inserts it in the page if the user agent is not GoogleBot or a few other spiders. It also checks for some Google IP ranges. Surely Google must be combating this by doing some stealth spidering; otherwise SEO and malware providers will game them if they stick to their classic robot rules.

    7. Re:A much more likely application by slapyslapslap · · Score: 2

      This is already being done, but in reverse. Google doesn't like it much either. Get caught, and you are de-listed.

    8. Re:A much more likely application by multicoregeneral · · Score: 1

      Or one that generates useful looking links to other sites you own (on different servers and subnets, of course).

      --
      This signature intentionally left blank.
    9. Re:A much more likely application by multicoregeneral · · Score: 1
      --
      This signature intentionally left blank.
    10. Re:A much more likely application by Anonymous Coward · · Score: 1

      That's an interesting idea and much more insidious than mine, which was to simply send nothing to Google and fuck 'em.

      Not allow your site to be indexed by Google? Yeah, that'd really fuck Google up good, wouldn't it?

    11. Re:A much more likely application by squidinkcalligraphy · · Score: 1

      I would be surprised if the googlebot didn't try everything to appear to the server like a normal user browser. Even better would be to crawl a site while in disguise, then again while not disguised. Differences would affect the sites ranking negatively.

      --
      "I think it would be a good idea" Gandhi, on Western Civilisation
    12. Re:A much more likely application by maxwell+demon · · Score: 1

      The point is, with Google executing JavaScript you could make it less obvious, by just having the JavaScript depend on some difference between Google's and a browser's JavaScript execution (maybe timings of certain rendering operations).

      Also, it might be used through XSS, to have competitors delisted.

      --
      The Tao of math: The numbers you can count are not the real numbers.
    13. Re:A much more likely application by maxwell+demon · · Score: 1

      Serving different content based on IP or self-identification is possible even without JavaScript. However, if the detection makes use of peculiar behavior of the JavaScript implementation (and the JavaScript implementation will have to have some differences, or else it won't find content which is initially hidden but unhidden by a user interaction), just fetching from a different IP or with a different browser/spider identification doesn't work.

      And BTW, the spider will certainly expose itself from the very fact that it accesses robots.txt

      --
      The Tao of math: The numbers you can count are not the real numbers.
  5. Not convinced by Anonymous Coward · · Score: 0

    As a programmer, this still sounds like an extremely dirty hack to me. For the time being, I'll stick to creating gracefully degrading sites, thank you very much.

    1. Re:Not convinced by mwvdlee · · Score: 2

      By "gracefully degrading" do you mean "if (useragent == 'googlebot') { random-spamwords(); paywalled-content(); links-to-every-parsable-uri(); }"?

      --
      Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
    2. Re:Not convinced by Anonymous Coward · · Score: 0

      They're probably doing it to deal with retards like the Meteor developers.

      Go ahead and look at the source of that page, and note how the page is completely invisible to search engines.

      Really, who wouldn't want their hip new Web 2.0 site to be completely unfindable through search engines? The accessibility degradation is just a bonus.

    3. Re:Not convinced by TheLink · · Score: 1

      Would it be possible to filter out sites like this? I personally don't want to find sites like these in my Google search results.

      --
    4. Re:Not convinced by dave420 · · Score: 1

      Google is not there to represent your own idea of what the internet is. Sites like that will become more and more common, whether you like it or not.

    5. Re:Not convinced by Anonymous Coward · · Score: 0

      When I visit that site I get only a background image; consequently, unless it's something I really need, I surf away.

      If they want my business, they'd better make sure the basic content is available without JS (and that goes double for links)

  6. I noticed this already some time ago. by jimbauwens · · Score: 1

    When I was looking at the page previews (in Google) of my JavaScript network scanner, I noticed it listed some IPs, indicating that it was running the script. Just google "http://bwns.be/jim/scanning_printing/detect_range.html" and look at the preview. (Also, most of those IPs probably exist, as my script indicates it is sure about them.)

    1. Re:I noticed this already some time ago. by C18H27NO3+ · · Score: 1

      You typoed your url. You have detect_range.html which is actually detect-range.html

    2. Re:I noticed this already some time ago. by jimbauwens · · Score: 1

      Oh :P Thanks :)

    3. Re:I noticed this already some time ago. by RoccamOccam · · Score: 3, Funny

      Also, the dry cleaning that you dropped off on Thursday is ready for pick-up and your driver's license expires in three months.

      Sincerely,
      The Slashdot Citizens Brigade

    4. Re:I noticed this already some time ago. by marcosdumay · · Score: 2

      Now that you mention it, the preview Google shows of one of my sites has all the CSS applied, including some that is applied by JavaScript after the page loads.

  7. Probably, but limited in use. by Anonymous Coward · · Score: 0

    I remember once I was going to try to use Google Cache to see if I could store backups on it.

    I still haven't actually bothered doing it, to be honest.
    And caching in general doesn't seem to show up as often as it used to on websites.
    I feel they are only caching larger, active, or social websites.

    It would most likely require a bit of trial and error.

  8. so much for by Anonymous Coward · · Score: 5, Insightful

    using javascript to hide or obfuscate email addresses to help protect them from spammers, scammers and bots.

    thanks fer nuttin, google.
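For context, the kind of obfuscation being mourned here usually looks something like this hypothetical helper (not any particular library): the address only exists once the script runs, which stops naive regex scrapers of the raw HTML but not anything that executes JavaScript.

```javascript
// A typical "hide it from harvesters" trick: the address never appears
// literally in the HTML, only after the script runs. Names are illustrative.
function revealAddress(reversedUser, reversedDomain) {
  const unreverse = (s) => s.split("").reverse().join("");
  return unreverse(reversedUser) + "@" + unreverse(reversedDomain);
}

// In a page this would be written into a mailto: link at load time;
// a crawler that executes the script ends up with the same plain text
// a browser shows.
console.log(revealAddress("ofni", "moc.elpmaxe")); // info@example.com
```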

    1. Re:so much for by Anonymous Coward · · Score: 0

      Exactly. Or you could modify the script to detect the googlebot and then generate garbage, bogus addresses, or maybe the e-mail address to Google's complaint department (must be the size of a small planet by now).

    2. Re:so much for by VortexCortex · · Score: 2

      robots.txt

    3. Re:so much for by MattskEE · · Score: 2

      Do you think spammers scraping the web for email addresses respect robots.txt?

    4. Re:so much for by John+Bokma · · Score: 2

      Uhm, years ago one could already do that using SpiderMonkey and some Perl. It's what I used to report nasty redirects in Blogspot/Blogger to Google (thousands and thousands). It took me some time, but Google did see the light and the problem was resolved.

      Why do people keep thinking that spammers are retards? If it can be abused, it will be. And spammers/cybercriminals are among the first to do so.

    5. Re:so much for by Anonymous Coward · · Score: 0

      No, but they probably haven't implemented JavaScript execution either, so if you prevent them from using the results of Google's crawl (by telling Google not to crawl the page in the first place) they're back to square one.

    6. Re:so much for by Anonymous Coward · · Score: 0

      This whole thread is under the premise that spammers and scammers use the results of Google's crawl to harvest email addresses. How does that work? Which query are they using?

    7. Re:so much for by goaxcap · · Score: 1

      Use images or Flash to display email addresses.

    8. Re:so much for by Anonymous Coward · · Score: 0

      ... how long have you been using google? search for:

      *@hotmail.com

      even with a unique strike rate of 0.0001%, at apparently 15 billion results you'll harvest 15,000 email addresses
      that's just looking up the one domain. do it for every domain ever and see how many you can uncover
      noting that something like this will probably be more accurate for targeting specific sites:

      site:domaintoscrape.com *@domaintoscrape.com

    9. Re:so much for by Anonymous Coward · · Score: 0

      If they can OCR CAPTCHA they can OCR your images. You'll only succeed in making your page harder for the legit users.

  9. Evaluate JavaScript on the client by Anonymous Coward · · Score: 1

    Now that Google controls the client, the search engine and the analytics, it should not be too difficult for them to see how traffic is flowing between sites. Pages need not even be physically linked for Google to see a connection; e.g. reading an article on the BBC may cause people to search for a company. With people signing into Chrome, Google must have some very rich logs.

  10. Google has been doing this for quite some time by Anonymous Coward · · Score: 2, Interesting

    Although maybe not quite in the same context. Google used to display javascript-munged email addresses in their search results until some of the larger sites involved, such as Rootsweb, complained.

  11. GET vs POST by Anonymous Coward · · Score: 1

    I really hope website developers and web application developers know the difference between GET and POST requests.

    Else, this could turn ugly.

    1. Re:GET vs POST by physburn · · Score: 1

      I've often programmed "write new article" or "add item" actions as GET links, and also as JavaScript actions, which would mean Google is going to be spamming forums and databases. What's the robots.txt command to prevent Googlebot from running the JavaScript on a page?

    2. Re:GET vs POST by xOneca · · Score: 2

      Maybe put the JavaScript functions in a separate file and use robots.txt to ban bots from accessing it.
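The distinction matters because HTTP defines GET as a safe method, so anything that follows links (crawlers, prefetchers, a JavaScript-executing Googlebot) may issue GET requests freely. A minimal sketch with a hypothetical "add item" endpoint backed by an in-memory array: only POST changes state, and anything else gets 405 Method Not Allowed with the required `Allow` header.

```javascript
// Hypothetical "add item" handler: state changes only on POST, so a crawler
// that follows every GET link it finds cannot create items.
function handleAddItem(method, item, store) {
  if (method !== "POST") {
    // 405 responses must say which methods are allowed.
    return { status: 405, headers: { Allow: "POST" } };
  }
  store.push(item);
  return { status: 201, headers: {} };
}

const items = [];
console.log(handleAddItem("GET", "spam", items).status);    // 405, nothing stored
console.log(handleAddItem("POST", "widget", items).status); // 201
console.log(items); // [ 'widget' ]
```

The page that shows the "add item" form would be a separate GET route; only the action that writes to the database is restricted.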

  12. Silly by Anonymous Coward · · Score: 0

    That's a silly idea anyway.

    What I expect from Google is to basically download the page and process it, as if it was Chrome, and then diff it against the unprocessed page to figure out which section of content is changed.

    Secondly, to avoid stupidity, keep a whitelist and a blacklist of scripts that should and should not be processed. For example, whitelist scripts on Twitter, Facebook, and Disqus to read comments, but blacklist login pages, local storage, and advertisements. This would let Google figure out which content is part of the page and which dynamically added content is of contextual value. Google can do its own OAuth and log in to sites that allow authentication from G+ as "The GoogleBot", which would also allow users to ban it from accessing any data they don't want it to see.
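A rough sketch of the diff step proposed above, restricted to links for brevity. The function names are hypothetical, the regex is a simplification (a real crawler would use an HTML parser), and producing the rendered HTML, for example with a headless browser, is assumed and not shown.

```javascript
// Collect href values from anchor tags (regex-based simplification).
function extractLinks(html) {
  const links = new Set();
  const re = /<a\s[^>]*href\s*=\s*["']([^"']+)["']/gi;
  let m;
  while ((m = re.exec(html)) !== null) links.add(m[1]);
  return links;
}

// Links that exist only after scripts ran, i.e. added by JavaScript.
function scriptAddedLinks(rawHtml, renderedHtml) {
  const before = extractLinks(rawHtml);
  return [...extractLinks(renderedHtml)].filter((href) => !before.has(href));
}

const raw = '<a href="/about">About</a><div id="comments"></div>';
const rendered = '<a href="/about">About</a><div id="comments"><a href="/comment/42">#42</a></div>';
console.log(scriptAddedLinks(raw, rendered)); // [ '/comment/42' ]
```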

  13. Google adding potential security holes in its bot? by Kergan · · Score: 1

    I can already picture hackers drooling at the idea of turning Google's cloud into the ultimate zombie network.

  14. Chrome by The+MAZZTer · · Score: 2

    If you check out some of the thumbnails, it looks like Googlebot is using a customized version of Chrome now. You can see it blocking plugins.

  15. I for one welcome the Javascript spamming. by multicoregeneral · · Score: 1

    It's inevitable. Someone will figure out a way to abuse the system that google hasn't thought to make contingencies for yet. I'm on the fence as to whether this is a good idea. I just hope they know what they're doing.

    --
    This signature intentionally left blank.
    1. Re:I for one welcome the Javascript spamming. by dave420 · · Score: 1

      Yeah, it's true - Google clearly knows nothing about searching the internet. ;)

    2. Re:I for one welcome the Javascript spamming. by multicoregeneral · · Score: 1

      Dave, every time they make a change like this, they get hammered. They made some big changes the release before "Panda" and the site was useless for almost a year.

      --
      This signature intentionally left blank.
  16. Chrome from users used as web spider by Anonymous Coward · · Score: 0

    I thought that every (yet unknown) url that is visited by a user from inside Google Chrome is reported back to Google. I guess that could also be used for crawling javascript by using the client's computer for that.

  17. They don't need to run the scripts by Hentes · · Score: 2

    You don't need to actually run the scripts, most of the time it's enough to just scrape the strings and links out of them.
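A minimal sketch of that cheaper approach (the function name is hypothetical): pull quoted string literals out of the script source without executing anything, and keep the ones that look like URLs or site-relative paths. Its blind spot is exactly the case the article is about: URLs assembled at run time, such as `"/page/" + id`.

```javascript
// Find quoted string literals that start with "http://", "https://" or "/".
// Regex-based, so URLs built at run time are missed and unusual quoting
// (escaped quotes, template literals) can confuse it.
function urlLikeStrings(source) {
  const found = [];
  const re = /(["'])((?:https?:\/\/|\/)[^"'\s]*)\1/g;
  let m;
  while ((m = re.exec(source)) !== null) found.push(m[2]);
  return found;
}

const js = 'var next = "/articles/2"; loadMore(\'https://example.com/feed.json\'); var n = 5;';
console.log(urlLikeStrings(js)); // [ '/articles/2', 'https://example.com/feed.json' ]
```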

  18. WTF? by Johann+Lau · · Score: 1

    Oh yeah, fuck accessibility. Fuck the web in general. "It's better for everybody". That's literally all you need to know. "Just go ahead and remove that from your robots.txt".

    I'm not saying there may not be good reasons (e.g. having the CSS and Javascript actually makes it possible to detect invisible text and whatnot, without that search engines may not even have a chance), but I really would appreciate some good reasoning, not being talked to like a fucking 5 year old.

    Or hey, how about adding that "of course, not having a unique URL for relevant content is a noob fucking mistake, and generally a cancer everybody is looking forward to eradicating, and irrelevant gimmick content is hardly interesting for search, so if you just went ahead and made a site that doesn't suck butt, that would be fine, too." --- *something* to indicate he isn't fucking clueless.

  19. Here's your sign by Zero__Kelvin · · Score: 2

    "There is no reason to believe, as the research is scant at best, that Google even respects a robots.txt file.

    From the preceding link: "Make use of the robots.txt file on your web server. This file tells crawlers which directories can or cannot be crawled. Make sure it's current for your site so that you don't accidentally block the Googlebot crawler. Visit http://code.google.com/web/controlcrawlindex/docs/faq.html to learn how to instruct robots when they visit your site. You can test your robots.txt file to make sure you're using it correctly with the robots.txt analysis tool available in Google Webmaster Tools."

    --
    Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
  20. Stop trying to teach what you don't understand by Zero__Kelvin · · Score: 1

    "Opting in means you're asking Google to visit your site. "

    Right. That is exactly what I said. The standard for the internet is well defined. You should read about it. If you make a web page available to the internet without a password, captcha, firewall, etc., you are making it available to all. You have already purposely accepted the condition ahead of time. This is opting in. The robots.txt file allows you to opt out instead. By placing content on the internet where web crawlers can reach it and not opting out with a robots.txt entry, you opt in to having that data accessible to all, including, but by no means limited to, Googlebot.

    --
    Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    1. Re:Stop trying to teach what you don't understand by truedfx · · Score: 1

      No, that isn't what you said. Allowing Google to access your site and asking Google to access your site are two different things. By neither opting in nor opting out, you're allowing Google to access your site, because the default is to allow it and you haven't told Google otherwise, but that's not opting in. Hint: what does opt mean? Who has chosen that the default is to allow anyone to visit your site? If it isn't you, then you didn't opt.

    2. Re:Stop trying to teach what you don't understand by Zero__Kelvin · · Score: 1

      You are confusing e-mail with the internet, which, it turns out, isn't a series of tubes, by the way. In an e-mail scenario opting in involves a specific request to receive information. In a web publishing scenario you are opting in to having your published data and information read by all.

      "Who has chosen that the default is to allow anyone to visit your site? If it isn't you, then you didn't opt."

      Why do you keep reiterating my point for me and then saying I didn't make my point? If you don't create a mechanism to keep Google out (e.g. robots.txt) then - by your own admission - you have opted to allow Googlebot to read what you publish to the world.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    3. Re:Stop trying to teach what you don't understand by Anonymous Coward · · Score: 0

      if it's not a series of tubes then what is it?

    4. Re:Stop trying to teach what you don't understand by bingoUV · · Score: 1

      If you don't create a mechanism to keep Google out (e.g. robots.txt) then - by your own admission - you have opted to allow Googlebot to read what you publish to the world.

      Allowing Google to do something does not mean asking Google to do it. Allowing does not involve "service".

      --
      Bingo Dictionary - Pragmatist, n. A myopic idealist.
    5. Re:Stop trying to teach what you don't understand by Zero__Kelvin · · Score: 1

      That's great. Now show me where I said it does mean asking them to do it. You are confusing the e-mail version of opt-in (opting to have others send to you) with the web version of opt-in, which is opting to have others view your content. When you don't block access to your site you are opting to have your site crawled. Any web content designer who doesn't know this is incompetent.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    6. Re:Stop trying to teach what you don't understand by bingoUV · · Score: 1

      You said "theft of service". If you had read the second sentence, it said allowing does not involve service.

      For example, you "allowed" me to pray for you. I pray for a fee. You are hereby charged with theft of service.

      --
      Bingo Dictionary - Pragmatist, n. A myopic idealist.
    7. Re:Stop trying to teach what you don't understand by Zero__Kelvin · · Score: 1

      You might want to go back and read the whole thread and study it. As I quite explicitly stated, theft of service doesn't come in until you write your code to run on and leverage Google's systems. Obviously, allowing their bot to crawl your site is not theft of service.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    8. Re:Stop trying to teach what you don't understand by bingoUV · · Score: 1

      I have "studied" it quite well. So you are saying you cannot be charged with theft of service until you arrange with god to benefit from my prayers.

      Google is doing what it wants to do. It doesn't become theft of service just because someone benefits from it.

      --
      Bingo Dictionary - Pragmatist, n. A myopic idealist.
    9. Re:Stop trying to teach what you don't understand by Zero__Kelvin · · Score: 1

      "So you are saying you cannot be charged with theft of service until you arrange with god to benefit from my prayers."

      No. I am saying that you shouldn't try to make analogies, because you suck at it.

      "I have "studied" it quite well. .... It doesn't become theft of service just because someone benefits from it."

      Also, don't waste your time studying things if your definition of 'quite well' results in the level of complete misunderstanding you have managed to achieve. Just accept that you aren't smart enough to understand what I wrote and move on.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    10. Re:Stop trying to teach what you don't understand by bingoUV · · Score: 1

      Please try to be funny when you troll.

      --
      Bingo Dictionary - Pragmatist, n. A myopic idealist.
    11. Re:Stop trying to teach what you don't understand by Zero__Kelvin · · Score: 1

      Please try to learn what the term troll means. More importantly, please don't reply back with your interpretation of what you have "learned" when you do.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    12. Re:Stop trying to teach what you don't understand by bingoUV · · Score: 1

      You are demonstrating the act of trolling quite well. Only thing left is for the observer to know the name of this internet behavior. For an experienced internet user like me, it was very simple, thank you. Your posts can go into textbooks to illustrate trolling to help people less well informed than me. Thanks for community service.

      --
      Bingo Dictionary - Pragmatist, n. A myopic idealist.
    13. Re:Stop trying to teach what you don't understand by Zero__Kelvin · · Score: 1

      I'm truly glad I could help with your book, which I have no doubt will be published once you pay the fee. Coincidentally, I'm writing a book on clueless morons, so I really value the examples you have provided for me as well! It isn't often that a Slashdot disagreement turns into this kind of a win-win situation!

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    14. Re:Stop trying to teach what you don't understand by bingoUV · · Score: 1

      Sorry, I am not writing the book I mentioned. I just hoped someone would. Though you can add your above post as an illustration in your own book, as I hadn't mentioned I intend to write any book like mentioned but you concluded it anyway. You are quite the person to write a book on "clueless morons". Being one yourself is quite a help, I am sure.

      --
      Bingo Dictionary - Pragmatist, n. A myopic idealist.
    15. Re:Stop trying to teach what you don't understand by Zero__Kelvin · · Score: 1

      "Sorry, I am not writing the book I mentioned."

      That's OK. It's probably for the best. I've seen your writing.

      "I just hoped someone would. "

      It is always good to have hopes and dreams, even if they are phenomenally unrealistic. For example, I hope you get a clue someday.

      "Though you can add your above post as an illustration in your own book, as I hadn't mentioned I intend to write any book like mentioned but you concluded it anyway."

      You really are a dim bulb there, Sherlock. Have a nice life in fantasy land!

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    16. Re:Stop trying to teach what you don't understand by bingoUV · · Score: 1

      You really are a dim bulb there

      Even though it was you that drew the wrong conclusions?

      Anyway, don't worry. This is the best you can come up with, at the moment, but next year you are sure to think of a witty reply. Keep trying.

      --
      Bingo Dictionary - Pragmatist, n. A myopic idealist.
    17. Re:Stop trying to teach what you don't understand by Zero__Kelvin · · Score: 1

      "Even though it was you that drew the wrong conclusions?"

      I didn't draw the wrong conclusion. I was making the point that the only way a book will be published that uses my post as an example of trolling is if you write one yourself, and the only way any book you write would be published is if you pay someone to publish it. Alas, you are too dim to figure these things out, so:

      PLONK

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    18. Re:Stop trying to teach what you don't understand by bingoUV · · Score: 1

      the only way a book will be published that uses my post as an example of trolling is if you write one yourself

      Unsubstantiated

      the only way any book you write would be published is if you pay someone to publish it

      Ditto. Also, a "book" need not be "paid published" to be called a book in these days of e-books.

      Anyway, I was just making fun of your stupidity. Alas, the same quality of yours makes you unable to understand it.

      --
      Bingo Dictionary - Pragmatist, n. A myopic idealist.
  21. How Are They Doing This? by Anonymous Coward · · Score: 0

    I wonder if they're using a standard browser to load pages now or if they have incorporated V8 and WebKit (or something similar) into Googlebot?

    Imagine if you could get this on your local machine as a web crawler app, but with filtering capabilities. Traditional web crawlers only work with static content precisely because they're not advanced enough to load the entire page, including running JavaScript; there is also the overhead of that additional processing, which can really kill your crawling time.
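    To illustrate the static-content limitation, a minimal Python sketch of a link extractor built on the standard html.parser module: it sees an <a> tag that is present in the markup, but not one that a script would write at run time, because the parser treats script contents as raw text and never executes them. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class StaticLinkExtractor(HTMLParser):
    """Collect href values from <a> tags without executing any JavaScript."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def static_links(html: str) -> list[str]:
    """Return the links a non-rendering crawler would find in html."""
    parser = StaticLinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = ('<a href="/static">s</a>'
            '<script>document.write(\'<a href="/dynamic">d</a>\');</script>')
    print(static_links(page))
```

    Finding "/dynamic" as well would require actually running the script in a JavaScript engine with a DOM, which is the extra processing cost described above.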

    I really hope more details are released on how they're doing this (but not on how they're ranking anything since Google is protective of that).

    1. Re:How Are They Doing This? by dave420 · · Score: 1

      There are command-line WebKit-based parsers out there, which allow you to process any URL or file as a browser would, and take either a screenshot of the page or access the DOM or whatever. They're not new.

  22. Spammers! by xenobyte · · Score: 3, Informative

    They've been testing this for a while. We've already had the first complaints about someone spamming an email address that only exists in exactly one place: online, as the output of some (trivial) JavaScript. It turned out that if you Googled the page, the result snapshot included the JavaScript-generated address. In other words, it's already there, and this will effectively kill JavaScript as a way of hiding working mailto links. Okay, it would be fairly simple to add a condition based on the User-Agent, as Googlebot is easily identified, but it will make things a bit more complicated for the average user.
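    A sketch of the User-Agent condition mentioned above, written as server-side Python under the assumption that the page is produced by code that can see the request's User-Agent header. The helper names and the placeholder text are hypothetical, and because the header is supplied by the client and easily spoofed, this is a heuristic rather than a reliable way to identify Googlebot.

```python
from html import escape

# Lower-cased product token that Google's crawler includes in its User-Agent.
CRAWLER_TOKENS = ("googlebot",)


def looks_like_googlebot(user_agent: str) -> bool:
    """Case-insensitive substring check; trivially spoofed by any client."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def contact_html(user_agent: str, address: str) -> str:
    """Leave the mailto link out when the request appears to come from Googlebot."""
    if looks_like_googlebot(user_agent):
        return "<p>Contact details are shown to visitors using a browser.</p>"
    safe = escape(address)  # escape &, <, > and quotes before embedding
    return f'<p><a href="mailto:{safe}">{safe}</a></p>'


if __name__ == "__main__":
    print(contact_html("Mozilla/5.0 (compatible; Googlebot/2.1)", "me@example.com"))
    print(contact_html("Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0", "me@example.com"))
```

    This only keeps the address out of Google's snapshot; any other harvester that sends a browser-like User-Agent still gets the link.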

    --
    "For every complex problem, there is a solution that is simple, neat, and wrong." -- H.L. Mencken (1880-1956) --