Google To Host Ajax Libraries
ruphus13 writes "So, hosting and managing a ton of Ajax calls, even when working with mootools, dojo or scriptaculous, can be quite cumbersome, especially as they get updated, along with your code. In addition, several sites now use these libraries, and the end-user has to download the library each time. Google now will provide hosted versions of these libraries, so users can simply reference Google's hosted version. From the article, 'The thing is, what if multiple sites are using Prototype 1.6? Because browsers cache files according to their URL, there is no way for your browser to realize that it is downloading the same file multiple times. And thus, if you visit 30 sites that use Prototype, then your browser will download prototype.js 30 times.
Today, Google announced a partial solution to this problem that seems obvious in retrospect: Google is now offering the "Google Ajax Libraries API," which allows sites to download five well-known Ajax libraries (Dojo, Prototype, Scriptaculous, Mootools, and jQuery) from Google. This will only work if many sites decide to use Google's copies of the JavaScript libraries; if only one site does so, then there will be no real speed improvement.
There is, of course, something of a privacy violation here, in that Google will now be able to keep track of which users are entering various non-Google Web pages.' Will users adopt this, or is it easy enough to simply host an additional file?"
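The caching problem the summary describes can be sketched with a toy model. This is only an illustration, assuming a cache that uses the URL as its sole key; the site and CDN hostnames and the file contents are made-up placeholders.

```python
# Minimal model of a browser cache that is keyed by URL only.
class UrlKeyedCache:
    def __init__(self):
        self.store = {}      # URL -> file contents
        self.downloads = 0   # number of network fetches performed

    def fetch(self, url, origin_files):
        # origin_files stands in for the network: URL -> bytes served there.
        if url not in self.store:
            self.store[url] = origin_files[url]
            self.downloads += 1
        return self.store[url]


PROTOTYPE = b"/* prototype.js 1.6 (placeholder contents) */"

# 30 hypothetical sites, each hosting its own byte-identical copy.
self_hosted = {
    f"http://site{i}.example/js/prototype.js": PROTOTYPE for i in range(30)
}
# One hypothetical shared URL that all 30 sites reference instead.
SHARED_URL = "http://cdn.example/prototype/1.6/prototype.js"
shared = {SHARED_URL: PROTOTYPE}

cache_a = UrlKeyedCache()
for url in self_hosted:
    cache_a.fetch(url, self_hosted)

cache_b = UrlKeyedCache()
for _ in range(30):
    cache_b.fetch(SHARED_URL, shared)

print(cache_a.downloads, cache_b.downloads)  # 30 1
```

The files are identical in both runs; only the URLs differ, which is why the shared URL helps only when many sites actually point at it.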
Compared to all the other crappy media that sites tend to have these days, centralizing distribution of a bunch of Javascript libraries makes almost no sense. I doubt it would even appreciably reduce your bandwidth costs.
If you want to improve the speed of downloading, how about removing 70% of the code which just encodes/decodes from XML and start using simple and efficient delimiters? I was a fan of Xajax, but I had to re-write it from scratch... XML is too verbose when you control both endpoints.
It is not a problem to host an additional file, and this only gives Google more information than they need... absolutely no good reason for this.
"There is, of course, something of a privacy violation here..."
Yeah, it's Google, so let's just talk about privacy. Does not matter if it's relevant to the story or not. You see, it's Google.
...the blurb: There is, of course, something of a privacy violation here, in that Google will now be able to keep track of which users are entering various non-Google Web pages.
Ha. News at 11.
This is only a partial solution. The real solution is for sites using AJAX to get away from this habit of requiring hundreds of kilobytes of script just to visit the home page. Couldn't you design a modular AJAX system that would bring in functions as they are needed? That way, someone visiting just a couple pages wouldn't have to download the entire library. Have each function in its own file, and then when an AJAX call is done, make it smart enough to figure out which functions need to be downloaded to run the resulting Javascript. The problem with Google hosting everything is that everybody has to use the versions that Google has posted, and that you can't do any custom modifications to the components. I think that what Google is doing would help. But the solution is far from optimal.
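The "figure out which functions need to be downloaded" step might look roughly like the sketch below. It is not how any of the named libraries work; the dependency table, the function names, and the one-file-per-function naming are all hypothetical.

```python
# Hypothetical dependency table: function name -> functions it calls.
DEPENDS_ON = {
    "fadeIn": ["setOpacity", "animate"],
    "animate": ["setTimer"],
    "setOpacity": [],
    "setTimer": [],
    "dragDrop": ["animate"],
}


def files_to_fetch(needed, already_loaded):
    """Return the function files still missing, dependencies first."""
    order = []
    seen = set(already_loaded)

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in DEPENDS_ON[name]:
            visit(dep)
        order.append(name + ".js")  # assumed layout: one file per function

    for name in needed:
        visit(name)
    return order


first_page = files_to_fetch(["fadeIn"], already_loaded=[])
print(first_page)

# A later page needs dragDrop; everything it depends on is already loaded.
later_page = files_to_fetch(
    ["dragDrop"], already_loaded=["fadeIn", "setOpacity", "animate", "setTimer"]
)
print(later_page)
```

Note the trade-off: every entry in the returned list is a separate HTTP request, which is the cost other comments in this thread object to.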
Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
Now if only this could be done with GWT. Rather than building on a base-library, GWT vomits a slew of files all with hashed names. Since no two compiles are the same, you end up with an ever growing set of JS and HTML files sitting in the component directory. This is particularly annoying as all these files interact poorly with version control systems. (Even one as advanced as, say, Mercurial.)
At the very least, a standard ANT plugin so that GWT could be built at build-time rather than dev-time would do wonders for the project.
The DTD files for the basic XML schemas had been hosted centrally at Netscape and w3.org since forever. No one cares or, indeed, notices (until they go down, that is).
I can assure you, the best way to get rid of dragons is to have one of your own.
Surely this doesn't open the door to Google much wider than it already was. Don't they already know about every page you hit that serves up their ads?
Yeah, but what if Google decides that nobody is using these -- or they can't legally host them for whatever reason -- or they just decide that they don't want to do this anymore?
I like Google too -- and this is nice of them -- but I like the idea of a website being as self-sufficient as possible (not relying on other servers, which introduce extra single-points-of-failure into the process.)
At the risk of sounding like an old curmudgeon, whatever happened to good ol' HTML?
Paleotechnologist and connoisseur of pretty shiny things.
As a developer, privacy of my users is of paramount importance. I have grown increasingly concerned with Google's apparently incessant need to pry into my searches and my browsing habits. Where once I was a major Google supporter, I have trimmed my use of their service back from email and toolbars to simple searches and now even won't use their service at all if I am searching for anything that may be misconstrued at some point by guys in dark suits with plastic ID badges. The last thing I am going to do as a developer is force my users into a situation where they can feed the Google Logging Engine.
public void karmaWhore(String url){addSlashdotComment(fetchContent(url));}
Yes, you've gotta be careful with those incompetent sysadmins that Google are hiring.
After all, they're constantly getting the servers hacked.
If I were worried about bandwidth, why wouldn't I just use one of the packed down files? They're as small, if not smaller, than most of the images that will appear on a web page.
I didn't see no slashdot article when yahoo put up hosted YUI packages served off their CDN.
I guess it's because google is hosting non-google libraries?
Quidquid latine dictum sit, altum videtur
I see this as more of a value add for people using other outward facing google products -- namely google apps and google pages. Why have a brazillion copies of these things on their servers (and using up their customers' storage limit) when they can offer it up once.
It also ensures that all web-sites using these projects can keep up to date automatically, so any security hole or bug gets fixed immediately for sites that take advantage of this.
As well, I can see this as a benefit for users of noscript and the like. If you've already white listed "code.google.com" (or wherever it's being hosted) on one site's implementation, any other site using it will automatically be cool too.
Besides, you can already do this with their Google Code repository. Go look at Dean Edwards's projects. All of them are hosted on Google Code, and he specifically recommends pointing to the Google server from your site. This seems to be just an extension for other open source projects. Sure, this could be handled by the individual projects themselves, on their own servers. But why have your site hammered by the infinite visitors of the sites that use your product when Google is willing to absorb the hammering for you?
With their own YUI libraries. See here. Anyone have any experience with this? I'm a bit wary of trusting Yahoo, although I guess it's easy enough to swap it out.
Yeah, but what if Google decides that nobody is using these -- or they can't legally host them for whatever reason -- or they just decide that they don't want to do this anymore?
Think broader. What happens when:
But, yes- you're right. This is a scary new dependency. For a company full of PhD geniuses supposedly Doing No Evil, nobody at Google seems to understand how dangerous they are to the health of the web. In fact, I'd suggest they do, and they don't care- because they seem hell-bent on making everything on the web touch/use/rely upon Google in some way. This is no exception.
A lot of folks don't even realize how Google is slowly weaning open-source projects into relying on them, too (with Google Summer of Code.)
Please help metamoderate.
What I would expect is that this will be useful for many people and that there is no drawback in using (yet another) Google service especially not if Adsense or Analytics already let Google track your visitors.
If there are reasons not to use it (privacy, control), you probably already know this yourself because you have carefully picked where to host your site (possibly in-house) and/or partnered with a CDN (even if just S3) to optimise content delivery. Or you have an intranet application where there is hardly any advantage to this.
Basically, you won't use this if you believe you know what you're doing, which you (yes, you) and I both do.
Additionally, if you're using compression, it is likely that one large file will compress more effectively than a collection of smaller files. (You *are* using compression, aren't you?)
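The compression point is easy to check for yourself. The sketch below uses zlib as a stand-in for HTTP gzip and five made-up "files" that share boilerplate the way script files tend to; real libraries will give different numbers.

```python
import zlib

# Five made-up "library files" that share a lot of boilerplate.
files = [
    ("function helper%d(id) { return document.getElementById(id); }\n" % i).encode() * 20
    for i in range(5)
]

# Each file compressed on its own, as if served separately.
separate_total = sum(len(zlib.compress(f)) for f in files)
# The same bytes concatenated and compressed once.
combined_total = len(zlib.compress(b"".join(files)))

print("compressed separately:", separate_total, "bytes")
print("compressed as one file:", combined_total, "bytes")
```

The combined stream can refer back to text it has already seen in earlier files, while each separate stream has to start from scratch and pay its own header overhead.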
or someone else not trying to be not evil
when sysadmin blocks google...
... sysadmin blocks yahoo and other yahoo properties .. so sites that use yahooapis.com are blocked also.
your site won't be rendered properly.
on our corporate network
i know you're not supposed to use the company's internet connection --- but who else is working and sometimes visits other sites? like slashdot
eg http://o.aolcdn.com/dojo/1.1.1/dojo/dojo.xd.js
This is really fast - I think they cache on distributed servers. Much faster than from my own server.
Anybody have more info on this? Is Google going to do something similar? Is AOL harvesting data on my clients' users?
Verbum caro factum est
SSL might not like referencing remote libraries...
A far better solution would be to add a meta-tag to a call, which the browser could check to see if it has it. For security reasons you need to define it always to use it, so if you don't define it, there will never be a mixup.
Eg:
<script type="text/javascript" src="prototype.js" origin="http://www.prototype.com/version/1.6/" md5="..............."></script>
When another site wants to use the same lib, it can then use the same origin, and the browser will not download the file again from the new site. It's crucial to use the md5 (or other method), which the browser must calculate itself the first time it downloads the file. Otherwise it would be easy to create a bogus file and get it run on another site.
Of course this approach is only as secure as the hash.
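A rough sketch of the lookup this proposal implies is below. It is a toy model, not a description of any browser: the class and method names are invented, and SHA-256 is swapped in for the md5 in the example tag because it is the stronger hash.

```python
import hashlib


class HashKeyedCache:
    """Toy cache keyed by content digest instead of URL (hypothetical design)."""

    def __init__(self):
        self.by_digest = {}   # hex digest -> bytes
        self.downloads = 0

    def load(self, declared_digest, download):
        # Reuse any copy whose digest this cache computed itself earlier.
        if declared_digest in self.by_digest:
            return self.by_digest[declared_digest]
        body = download()
        self.downloads += 1
        actual = hashlib.sha256(body).hexdigest()
        if actual != declared_digest:
            # A bogus file never enters the cache under someone else's digest.
            raise ValueError("digest mismatch, refusing to cache or run")
        self.by_digest[actual] = body
        return body


lib = b"/* prototype.js 1.6 (placeholder contents) */"
digest = hashlib.sha256(lib).hexdigest()

cache = HashKeyedCache()
cache.load(digest, lambda: lib)          # first site: downloaded and verified
cache.load(digest, lambda: b"unused")    # second site: served from the cache
print(cache.downloads)  # 1

# A malicious site declares the real digest but serves something else.
fresh = HashKeyedCache()
try:
    fresh.load(digest, lambda: b"evil()")
    poisoned = True
except ValueError:
    poisoned = False
print(poisoned)  # False
```

The key detail is that the cache trusts only digests it computed from bytes it downloaded, never the digest a page merely claims.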
The web really needs some sort of link to a SHA-256 hash or something. If that kind of link were allowed ubiquitously it could solve the Slashdot effect and also make caching work really well for pictures, Ajax libraries and a whole number of other things that don't change that often.
Need a Python, C++, Unix, Linux develop
I know it is not obvious, but sites that are sensitive to bandwidth issues may find this a cost saving measure.
Google, of course, gets even more information about everyone.
Win-win, except for us privacy people. I guess we have to trust "do no evil," huh?
Ah, darnit, I forgot that almighty Google is totally hackerproof by definition. My bad.
It's really foolish to replicate these libraries all over the place.
No one says you have to use google's service. It's just an idea. They eliminate library management problems for you and you give them a little data.
So what. Do you think that COMCAST and other companies that are throttling BitTorrent and hijacking DNS queries aren't mining and selling all your UNENCRYPTED CAN BE READ WITH NOTEPAD AND TCPDUMP HTTP get requests?
not related to the story?
Yeah, so it downloads some Ajax library twice, or even ten times, or a hundred. So what? The ads on your typical webpage are ten times as much in size and bandwidth.
Thanks, but I prefer that my site works even if some other site I have nothing to do with is unreachable today. Granted, Google being unreachable is unlikely, but think about offline copies, internal applications, and all the other perfectly normal things that this approach suddenly turns into special cases.
Assorted stuff I do sometimes: Lemuria.org
...your script are belong to us
You feel sleepy. Close your eyes. The opinions stated above are yours. You cannot imagine why you ever felt otherwise.
First, I block all google-related content, period. This type of thing would render many sites non-operational.
Second, I've always had this complaint about the whole external javascript file approach. When you're already downloading a 50K html page, another 10K of javascript code in the same file inline downloads at full-speed. The external file requires yet another hit to the server, and everything involved therein. It almost never makes any sense. Even as a locally cached file, on a broadband connection, downloading the extra 10K is typically faster than opening and reading the locally cached file!
But still, hosting a part of your corporate web-site with google simply breaches most of your confidentiality and non-disclosure agreements that you have with your clients and suppliers. It's that simple. Find the line that reads "shall not in any way disclose Confidential Information to any third party at any time, including consultants and contractors, copy and/or merge the Confidential Information/business relationship with any other technology, software or materials, except contractors with a specific need to know..."
Simply put, if your Confidential client conversations go over gmail, you're in breach. If google tracks/monitors/sells/organizes/eases your business with your clients or suppliers, you're in breach -- i.e. it's illegal, and your own clients/suppliers can easily sue you for giving google their trade secrets.
Obviously it's easier to out-source everything and do nothing. But there's a reason that google and other such companies offer these services for free -- it's free as in beer, at the definite cost of every other free; and it's often illegal for businesses.
The whole point of AJAX is to reduce the amount of data you need to send to the user, not necessarily to reduce the amount of code. Yes, the browser will need to download the entire library, but only once. Caching takes it from there.
Compared to data, code is small. This is not a universal truth -- you can have a white pages site with a tremendously weighty interface that displays nothing but "Jenny 867-5309" -- but it is a valid assumption in the general case. With AJAX, data is effectively unbounded.
If you're using AJAX just to make your collection of 42 casual haiku look pretty, that's one thing. If you're using AJAX more along the lines of Google Maps (where there is almost unfathomably more data than code), that's a horse of a different color. I imagine most people are somewhere in between, but it seems readily obvious that it would be incorrect to think of the AJAX designs in the present using the assumptions of the now distant past.
Currently it's either use a popular open source library, which adds some extra bandwidth overhead, or reinvent the wheel yourself.
Isn't pulling javascript from different domains a fundamentally dumb idea? I disable javascript for everything, then enable on a per site basis if the javascript provides something useful to me. Pulling javascript from multiple domains makes it a pain in the backside having to find where all the javascript is coming from and enable javascript execution from that domain.
Well, one effect of this would be to allow google to execute scripts in the security context of any site using their copy of the code. The same issue occurs for urchin.js etc. If your site needs to comply with regulations like PCI DSS or similar then you shouldn't be doing this as it means google has access to your cookies, can change your content etc. etc.
For many common sites that aren't processing sensitive information however, sharing this code is probably a very good idea. Even better would be if google provided a signed version of the code so that you could see if it has been changed.
This was a dumb feature in Javascript. In LISP, there's the "reader", which takes in a string and generates an S-expression, and there's "eval", which runs an S-expression through the interpreter. The "reader" is safe to run on hostile data, but "eval" is not. In Javascript, "eval" takes in a string and runs it as code. Not safe on hostile data.
JSON is a huge security hole if read with "eval". Better libraries try to wrap "eval" with protective code that looks for "bad stuff" in the input. Some such libraries actually work. Maybe. The process of checking "JSON" input for "bad stuff" is complicated enough that just parsing the input without "eval" can be simpler.
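The reader-versus-eval distinction can be shown in a few lines. This sketch uses Python as a stand-in for the JavaScript case, so `json.loads` plays the role of a real JSON parser and Python's `eval` plays the role of JavaScript's; the hostile string is deliberately harmless.

```python
import json

good = '{"user": "jenny", "phone": "867-5309"}'
hostile = '__import__("os").getcwd()'   # an expression, not JSON

# A parser only ever builds data; it has no way to run what it reads.
phone = json.loads(good)["phone"]
print(phone)

try:
    json.loads(hostile)
    parsed = True
except json.JSONDecodeError:
    parsed = False
print(parsed)  # False: the parser rejects it as malformed input

# eval() treats the same string as code and executes it.
ran = isinstance(eval(hostile), str)    # harmless here, but it did run
print(ran)  # True
```

Nothing in the parser path inspects the input for "bad stuff"; it simply has no code path that executes anything.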
I asked Google to do this a long time ago:
http://www.tallent.us/blog/?p=7
This will enable web developers to support richer, cross-browser apps without the full "hit" of additional HTTP connections and bandwidth.
Users gain the benefit of faster rendering on every site that uses these libraries--both due to proper caching, and because their browser can open more simultaneous HTTP connections.
If Google goes down, change your header/footer scripts. BFD.
In an age where Flash/Silverlight/etc. are supposed to be the "next big thing," I'm glad at least one company is not abandoning HTML-based apps.
One site covering this noted plans to 'stay up to date with the most recent bug fixes' of the hosted libraries -- this sounds like blindly upgrading the hosted libraries to new versions, which is a very bad idea.
As a commenter there noted, it's a much better idea to use version-specific URIs, allowing users to choose the versions they wish to use -- otherwise version mismatches will occur between user apps and the Google-hosted libs, creating bugs and the classic 'dependency hell' that would be familiar to anyone who remembers the days of 'DLL hell'.
Single-point-of-failure, DNS-cache-poisoning, host-file-redirects, etc. etc.
You are not thinking this through!
The only difference is that it would not be obvious to you when it happens. If you think otherwise you're basically fooling yourself through obscurity.
You're absolutely within your rights to decline to participate by blocking GA. Just don't think you're accomplishing anything of substance by doing so. If you really don't want your site access used for marketing, your only option is to not go to the site. You could use an anonymizing proxy to break the connection to you personally, but the use patterns would still be recorded.
Build a man a fire, he's warm for one night. Set him on fire, and he's warm for the rest of his life.
Hmm... I can see some security issues with this!
Imagine:
- Hacker creates homebrew prototype.js with malicious code and puts it somewhere on a server.
- Hacker now proceeds to hack into an ISP's or company's DNS server, changing prototype.google.com or whatever it will be called to the desired IP.
Voila, everyone visiting a site where the Google version of prototype.js is used will be loading the malicious code.
Perhaps I am too paranoid about this but...
If you and I have an NDA, and I place a call to you from my cell phone, the mere existence of that call does not constitute "confidential information" or a "trade secret." My cell company and your phone provider (at a minimum) would have logged the call, although not the contents. You're right that sending confidential information via Gmail may constitute breach, but by that standard, sending confidential information via ANY unencrypted e-mail may constitute breach since it traverses the public Internet, including both of your ISPs--where it may be subject to caching and deep inspection by spam filters. Simply put, only end-to-end encryption protects confidential information. If you have that, you can send the encrypted data any way you want.
I applaud the desire to consider confidentiality and contractual obligations, but overreaching can be needlessly complex and costly. Reacting so strongly to ANY third party vendor--without consideration of the details--is sort of like "your computer is broadcasting its IP address!" It's true, but of no serious consequence.
Build a man a fire, he's warm for one night. Set him on fire, and he's warm for the rest of his life.
This isn't something Google came up with. It's great that they're doing it, but YUI did it quite a while ago. http://developer.yahoo.com/yui/articles/hosting/
In fact, the cache headers specify that the JS libs don't expire for A YEAR, so Google will only see the first site you visit with X library Y version in an entire year.
Is this information really that valuable?
Mind you, this assumes you're hard-coding the google-hosted URLs to the JS libs, instead of using http://www.google.com/jsapi -- but that's a perfectly valid and supported approach.
If you use their tools to wildcard the library version, etc. etc. then they get a ping each time that JSAPI script is loaded (again, no huge amount of info for them, but still you can decide whether you want the extra functionality or not).
The google-hosted JS libraries have headers that will only make your browser resolve & hit the server once a YEAR per library version.
This is different from analytics and ad servers for that reason -- those are NEVER cached because they want every browser to hit them, every time.
The whole idea of having a single URI for these very common .js files is that they can be cached, and not just on your local computer. Any router with the ability to follow the HTTP1.1 cache protocol would serve these pages out of a local cache.
Moreover, if this idea catches on, web browsers will begin shipping with these well-known URIs preinstalled, perhaps even with optimized versions of the scripts that cut out all the IE6 cruft. What is really needed to make this work is a high bandwidth, high availability server that has enough name recognition to get themselves on slashdot. Google sounds like the right choice to me.
If this works, in 5 years most of the requests for these URIs will never even leave your computer, and you cannot beat that kind of privacy.
Strive to make your client happy, not necessarily give them what they ask for
XSS, SQL Injection, not escaping various characters, bad ideas, etc, etc
We're all doomed!!!!!!! *yawn*
all SMTP passes in the open right?
google or not-- emails pass through third parties all the time.
every day http://en.wikipedia.org/wiki/Special:Random
Dojo has been available on the AOL CDN for quite a while now... it's even the suggested way of using it, in the current manuals.
Also, since Google is opening up its platform, makes sense to host popular AJAX toolkits for all to use -- why the big fuss? If you host your apps there, might as well use the common libraries -- if you don't want to, fine.
And what does that have to do with opening yourself up to unnecessary risks despite Google having competent admins? Do you really want a Google outage to cause your site to become unusable, whether that is due to an outage at Google itself or caused by a problem on the way to Google, like the recent bogus route announcements? Do you want your website to be in the same boat when a single DNS poisoning makes millions of websites load trojaned AJAX libraries? As a user, do you want a single universal hosts entry to be able to subvert millions of websites? This is a massively bad idea, considering that you only save a split second load time once per visit.
No one said they were hackerproof. However, how often do you hear about them getting hacked? They put a significant amount of energy into security.
Wouldn't it make more sense to create a Firefox extension that preloads this and/or redirects pages to use your local copy of these libraries?
:)
It would be even easier with Google's hosting. Just hit them on startup, without referrer info, and then any site that links to Google's version is a little bit faster, and no privacy concerns exist.
You could even have the extension check if the file is changed and pop up a warning that google is being evil
There is valuable data that you can get from an analytics package, but not all analytics packages need to be invasive at the client-side. A ton of accurate info (including bot traffic) can be obtained from server-side packages like Webalizer and AWstats. These do not invade the user's privacy and give you an accurate idea of what is hitting your servers. That being said, it is not as easy to process the information just from the server side. However, best of all, these apps are free and Open Source!
Hosting files externally is a huge security risk and causes a warning/error pop up in the browser. Not to mention that putting the page load time of your site in someone else's hands is also a risk.
Yahoo does the same thing with its YUI. It's optional. But careful: Yahoo (or Google in this case) can see all the referrer information, since it's something you include on *every* page, including generated ones. Google/Yahoo could be viewing parameters to your dynamic pages. Then again, that's true of any site that displays third-party advertising. http://noscript.net/ takes care of this problem in Firefox.
While I think google has identified a legitimate problem (comparatively large and widely used ajax libraries aren't going anywhere anytime soon), their proposed solution seems like a weak hack.
Coming up with a real solution will require changes to the browser and/or the server, and it will take time and thought to get everything working and cover the edge cases; but the problem is hardly insurmountable.
Ideally, we could add a little bit of extra, optional, data to the browser caching system, and markedly increase its ability to support reuse of what are basically libraries across domains. Off the cuff, I'd suggest hashing. The browser would take an MD5 or SHA1 of each object it is caching. When the browser hits a new page, the page could refer to http://urloffoolibrary.com/foo.js [SHA1 of foo.js] and, based on that, the browser could check to see if it already has foo.js (even if downloaded from http://foomirror.com/).
The other option, of course, would be to recognize that ajax stuff is basically growing its way toward equivalency with desktop apps, and bite the bullet of essentially adding package management to the browser cache. That would be rather heavierweight, and would present a number of possible issues; but might also allow a more elegant approach.
You do have all the rights you're talking about. But if the biz model doesn't work for the provider, and they can find a way to exclude you for not accepting Google ads, or for not accepting images, or... well, pretty much anything else, there's no law that says they have to allow you to browse your way.
Of course, if their content is compelling enough, either you'll accept their terms, negotiate new terms with them (i.e. pay for no-ad versions) or find a way to defeat their protection mechanisms. If it's the last option, they'll eventually figure it out and create yet another layer of protection.
Heck, some sites will even refuse you if you don't use a particular version of IE. Blocking access to unprofitable visitors shouldn't be too difficult.
The CB App. What's your 20?
I design sites so that when Javascript fails the site still works. Same with CSS.
I agree about DNS poisoning though. It's a pity that we haven't fixed DNS yet.
they decide to do, and put some malicious code in the libraries? Or if one of the libraries somehow enables XSS attacks on google mail accounts or the like?
I don't like it.
One very important metric google analytics doesn't include is "Who doesn't have Javascript enabled?". Another thing to keep in mind: the whole "hosting scripts for global caching" thing was already done by Yahoo! with their YUI libraries, so keep in mind you should apply all your google-directed conspiracy hate at them as well.
Uh, the problem isn't "bandwidth costs". The problem is user experience -- lots of sites use the same JavaScript libraries, but browsers cache based on URL. If a centralized repository for the JavaScript is used, the average load time of every page using the repository is reduced compared to if each site hosted the library locally.
The idea isn't that every request for a library has to go through Google, the idea is that almost all requests for the libraries can hit the user's local cache and not hit the network at all. (Google, I suspect, will get more, though still rough, information about website popularity than information about individual browsing habits from this.)
Really? Sure, they can tell which web page using the Google-hosted library a user hits first, if they care, but how are they going to tell which ones they hit later when the requests for the library are satisfied by the user's local cache rather than resulting in any network request at all, which is the whole point here?
Incorrect. Attempting to read a cached page may still result in a hit to the server, depending on the configuration of the browser. Most browsers have a configurable setting as to when they will check for modified pages, and "on every access" or the equivalent is not, IME, usually the default setting, so, without an active decision by the user to do so, the browser will not hit the server on every request for a cached page, and most browsers are easily configured to never hit the server if a non-stale version of a resource is cached, though the usual default is somewhere in between the extremes.
The HTTP spec disagrees with your position, which lays out a position contrary to a major purpose of HTTP caching (RFC 2616, Sec. 13: "The goal of caching in HTTP/1.1 is to eliminate the need to send requests in many cases, and to eliminate the need to send full responses in many other cases." [emphasis added].) Under the requirements for cache correctness (Sec. 13.1.1 of RFC 2616), a correct cache need not send a request to the server if it has a cached version that is "fresh enough" as defined by the most restrictive of the requirements of the client, origin server, or the cache itself (for a client local cache, of course, the first and last of these should be the same), and indeed, under the spec, a cache that has a fresh-enough version available should return it even when it has no connectivity to the origin server.
You seem to think that HTTP caching never reduces the number of requests sent, only the number of full responses returned. This position is not supported by the HTTP 1.1 specification (RFC 2616).
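The freshness rule being argued about can be sketched as follows. This is a deliberately simplified model, assuming a single explicit max-age and ignoring heuristics, validators, and user overrides; the one-year lifetime is the figure other comments in this thread give for the hosted libraries.

```python
ONE_YEAR = 365 * 24 * 60 * 60  # assumed max-age in seconds


def needs_network(age_seconds, max_age_seconds):
    """True if a cache must contact the server before reusing a stored response."""
    # Fresh means the freshness lifetime is still greater than the current age.
    return age_seconds >= max_age_seconds


def simulate(visit_times, max_age_seconds):
    """Count requests actually sent for visits at the given times (in seconds)."""
    requests_sent = 0
    fetched_at = None
    for t in visit_times:
        if fetched_at is None or needs_network(t - fetched_at, max_age_seconds):
            requests_sent += 1      # full fetch or revalidation
            fetched_at = t
        # Otherwise the copy is served locally and the server sees nothing.
    return requests_sent


daily_visits = [day * 86400 for day in range(30)]
long_lived = simulate(daily_visits, ONE_YEAR)
always_stale = simulate(daily_visits, 0)
print(long_lived, always_stale)  # 1 30
```

With a long lifetime the server is contacted once; with a zero lifetime every visit produces a request, which is the behavior analytics and ad scripts rely on.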
http://www.cert.org/homeusers/email_postcard.html
there are, between my ISP and the destination ISP-- many many waypoints that a bored tech can use to copy all the packets moving through--
every day http://en.wikipedia.org/wiki/Special:Random
And that bored tech would be performing an illegal activity. Plain and simple.
Not to mention that my e-mail is sent via my e-mail server -- which is a web-server that I pay for and manage -- which then connects pretty-well directly to the destination e-mail server. The only way-points are my server as the source, and my client's server as the destination. So again we're left only with the actual node-to-node transmission through the ISP.
It's not trusting. It's liability. I don't need to trust that the bored tech in your case isn't reading the e-mail. I need to know that if he does, and he does something with the information, it's illegal and probably criminal which means that I'm not liable for the damages to my client that may result.
That's the same reason for the cameras and security alarm systems here. It's not to stop the burglar, who's wearing a mask anyway. It's to prove to the insurance company that someone did indeed break in and steal something, and that I'm not committing insurance fraud.
in windows (if you have it)
type in tracert gmail.com
EVERY computer/ip item listed-- takes your plaintext SMTP and passes it to the next item, and can keep a copy if they want to.
you really think this is wholly different than keeping it in google's gmail server?
every day http://en.wikipedia.org/wiki/Special:Random