CDN Optimizing HTML On the Fly
Caerdwyn writes "Cotendo, which is a content distribution network, has taken to altering HTML as it passes through their CDN to optimize web pages for faster rendering. This is essentially a repackaging of the Apache module mod_pagespeed (from Google), with the critical difference being that the rewriting of HTML occurs inline rather than at the web server. We all know that well-written HTML can result in much better rendering of whatever your content is; the questions are 'Will this automatic rewriting cause other problems, i.e. browser quirks?' and 'Assuming that only the web pages of Cotendo's customers are altered, are there nonetheless potential legal troubles with someone rewriting HTML before delivery to a browser?'"
It doesn't really matter if it's a technically good move; if this sticks, we might be getting lots of ISP-inserted ads, iframed toolbars and other "value-added" stuff in non-encrypted HTTP traffic pretty soon.
If I write a web page, however much it sucks, that's exactly how it should be delivered.
If I see a bad website that takes 20 minutes to load, then I will never buy anything from that site or its company. If they can't hire a decent web programmer, they don't deserve my money.
However, if you change the page to make it render faster, the ISP is lying FOR the shitty company and their shitty website by making it appear to be a well crafted site.
tl;dr: Leave the shit shitty. It'll put bad programmers out of business which we need.
"Freedom in the USA is not the ability to do what you want. It is the ability to stop others from doing what THEY want"
Instead of doing it over and over again on the fly, why not just do it once and shoot the "fixed" html back to the authors, and firmly insist that they update their pages? This seems like a much better way to accomplish the same thing.
the questions are 'Will this automatic rewriting cause other problems, i.e. browser quirks?' and 'Assuming that only the web pages of Cotendo's customers are altered, are there nonetheless potential legal troubles with someone rewriting HTML before delivery to a browser?'
I couldn't give a rat's ass about legal troubles. Slashdot is still a tech forum, right?
There are LOADS of much more interesting questions to ponder, such as: what exactly is the speed improvement? And does it work for Javascript and CSS too? And wouldn't it be much better to work on images instead? Or is that too computationally intensive? What kind of algorithm do they use? In what language is it implemented? Et cetera. Legal troubles shmegal smougles.
8 of 13 people found this answer helpful. Did you?
If you voluntarily upload your web site to a CDN that tells you it is going to optimise your code, what legal issues could there be? The arrangement is entirely mutually consensual. If you don't want your site optimised, then don't use that CDN.
This seems like an ad for Cotendo disguised as an inflammatory post.
Any webmaster worth their salt is using a variety of tools to improve loading speed - minification of html/css/js, combining scripts, CSS optimization, js packing, compressing PNGs with better tools and using CSS sprites.
I use W3 Total Cache for two of my blogs and the speed increase is substantial.
While we are at it, I wish developers would think it through before using JQuery for trivial stuff. Loading JQuery + bunch of plugins to do simple (and I mean simple) fades or form validations is pointless. Here's an example of what I mean.
So if they're doing this transparently, it's all the better.
They need a case of whupass opened up on them. What a bunch of maroons !! It's fucking idiots like this that give Canada a bad name !!
'Assuming that only the web pages of Cotendo's customers are altered, are there nonetheless potential legal troubles with someone rewriting HTML before delivery to a browser?'
Why should there be? They're not selling bandwidth. They're selling an optimization service (at least, according to their press release, that's what they're selling). This seems to be a clear opt-in situation for their customers. Also, their customers are the ones who are going to be saving money because of this, probably not Cotendo.
Just think of all the possibilities with steganography in poorly written html?/> tags, empty span and font tags, the number of 's - casing in css styling and everything!
vs
and
Just look to North Korea!
File not found. Fake it(Y/N)? _
... in browsers with an incomplete implementation of the rendering engine, of course. i.e. IE
If you're paying a CDN to do this for you, then you obviously want them to do it, otherwise you'd take your business elsewhere. This literally has nothing to do with ISPs filtering web pages, because they aren't. It's a completely (useful) opt-in service.
Some "aggressive" optimizations could end up doing more harm than good. The Apache extension's features are very configurable, and some of them carry risks, especially if you work with very dynamic HTML. Of course, their optimizations could mostly be the safe ones, and whoever builds the pages is the one hiring their services. With this one in particular (it seems to be optional), clients should be aware of exactly which optimizations are applied and what to take into account when building and maintaining their sites.
In general, it seems to be a good idea to do it, one way or another. And if you can't go through all the trouble of doing the optimizations yourself, and your own servers can't run this module for some reason, the CDN could be your last chance to do that.
Encode as binary HTML (fastinfoset or exi) & transcode jpg images to jpeg-2000 (approx. 50% saving on image bandwidth vs. optimized jpeg).
Simples.
Computers are fast as shit these days. Most of my time is spent on news sites or community sites rendering mainly a wall of text anyway. The bottleneck is -delivering- the page (and all those goddamn barnacles from *.adservice.* that have to come down with the content I -actually- want to read). Once the page reaches my laptop rendering is instantaneous. So why bother altering the content?
It's an Apache module, so they should be using it in a reverse proxy role or something like that. No disrespect to Apache, but I thought they would be using something like Varnish for that task.
The design mechanisms of many ad-laden sites *cough*cough*cough*ebay*cough*yahoo*cough*google*cough*aol*cough*cough* are extremely inefficient, but they are still fast because of optimizations. However go to your average "I know how to use dreamweaver" idiot developed site and watch as all the broken tables and stuff make the browser client crawl.
Any site that allows user-submitted-content, eg wordpress, blogger, ebay listings, are like this. This is the very content that the apache module can fix, but it can only do so much against poor coding.
The ad companies, eg adsdaq/contextweb give their customers shitty javascript that looks like this:
<script src="http://tag.contextweb.com/TagPublish/getjs.aspx?action=VIEWAD&cwrun=200&cwadformat=728X90&cwpid=XXXXXXXX&cwwidth=728&cwheight=90&cwpnet=1&cwtagid=XXXXXXXXX"></script>
Note lack of escape characters and required attributes.
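For anyone wondering what the escaped version would look like, here is a minimal Python sketch. It assumes "escape characters" means writing `&` as `&amp;` inside the attribute value and that the missing required attribute is `type` (required on script elements in HTML 4). The helper name `script_tag` and the shortened URL are made up for illustration.

```python
from html import escape

# Shortened, illustrative version of the ad-tag URL quoted above.
raw_url = "http://tag.contextweb.com/TagPublish/getjs.aspx?action=VIEWAD&cwrun=200"

def script_tag(url: str) -> str:
    """Build a script tag with the URL escaped for use inside an attribute."""
    safe_url = escape(url, quote=True)  # turns & into &amp; and escapes quotes
    return '<script type="text/javascript" src="{}"></script>'.format(safe_url)

tag = script_tag(raw_url)
print(tag)
```

Browsers cope with the unescaped form, which is presumably why ad networks get away with it, but a validator will flag every bare ampersand.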
the questions are 'Will this automatic rewriting cause other problems, i.e. browser quirks?'
A snippet from mod_pagespeed's "rewrite CSS" filter:
"CSS minification is considered moderate risk. Specifically, there is an outstanding bug that the CSS parser can silently lose data on malformed CSS. This only applies to malformed CSS or CSS that uses proprietary extensions. All known examples have been fixed, but there may be more examples not discovered yet. Without this the risk would be low. Some JavaScript code depends upon the exact URLs of resources. When we minify CSS we will change the leaf name of the file (although leave it in the same directory). This could break such JavaScript."
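To make the leaf-name point concrete, here is a rough Python sketch of content-hash renaming. This is not mod_pagespeed's actual naming scheme; the function `rewrite_leaf`, the hash length, and the paths are all assumptions chosen for illustration.

```python
import hashlib
import posixpath

def rewrite_leaf(url_path: str, minified_css: str) -> str:
    """Keep the directory, but embed a content hash in the file name."""
    directory, leaf = posixpath.split(url_path)
    stem, ext = posixpath.splitext(leaf)
    # Hypothetical scheme: first 10 hex digits of an MD5 over the minified CSS.
    digest = hashlib.md5(minified_css.encode("utf-8")).hexdigest()[:10]
    return posixpath.join(directory, "{}.{}{}".format(stem, digest, ext))

original = "/static/css/site.css"
rewritten = rewrite_leaf(original, "body{margin:0}")

# A script that looks for the stylesheet by its exact original URL stops matching.
script_still_finds_it = rewritten.endswith("/site.css")
print(rewritten, script_still_finds_it)
```

The directory survives, so relative references inside the stylesheet keep working; only code that matches on the exact file name is at risk.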
Yet their own examples show other risks, as they rewrite CSS selectors from longer selectors to completely different short selectors (i.e. from "div.class span" to "#id"), which even in their examples are only superficially equivalent, and not taking into account any dynamic content the page may render via JS. So any application utilizing JS would suffer a range of awkward issues, even if your code is perfectly valid.
Talking about valid code, one of their other filters:
"The quote removal filter eliminates unnecessary quotation marks (either "" or '') from HTML attributes. While required by the various HTML specifications, browsers permit their omission when the value of an attribute is composed of a certain subset of characters (alphanumerics and some punctuation characters)."
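A toy Python version of that idea, to show what kind of output it produces. The "safe" character set here is my own conservative guess, not the set mod_pagespeed uses, and the regexes only handle a single well-formed tag rather than arbitrary HTML.

```python
import re

# Assumed safe subset: alphanumerics plus a little punctuation.
SAFE_VALUE = re.compile(r"^[A-Za-z0-9._:-]+$")
# Matches name="value" or name='value' preceded by whitespace.
ATTR = re.compile(r"""(\s[A-Za-z][A-Za-z0-9-]*)=(["'])(.*?)\2""")

def remove_quotes(tag: str) -> str:
    """Drop attribute quotes only when the value is made of safe characters."""
    def repl(match):
        name, value = match.group(1), match.group(3)
        if SAFE_VALUE.match(value):
            return "{}={}".format(name, value)
        return match.group(0)  # leave anything with spaces, slashes, etc. alone
    return ATTR.sub(repl, tag)

result = remove_quotes('<img src="logo.png" alt="Site logo" width="90">')
print(result)
```

The saving is two bytes per attribute, and by the filter's own description the result no longer matches what the specifications require.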
The rest is predictable and of little use: cache headers, image compression, JS minification, CSS/JS outlining/inlining, whitespace collapsing, and removing attributes that specify defaults. Those are all basic and low yield optimizations that any web developer would, for the most part, have done in his original source to begin with.
Unfortunately Google's expertise in searching doesn't automatically transfer into other areas, and this clumsy tool, which at best does little, and at worst produces broken code, is definitely one of their weaker efforts.
If Cotendo intends to force this option on their users without an opt-in (the press release doesn't clearly say), then that's one distribution network I'm definitely not using.
I installed mod_pagespeed recently on a server, and it had some unintended results, to put it mildly.
Initial page loads were slower, perhaps because mod_pagespeed was analyzing each page and figuring out an optimization plan. However, that wasn't the worst problem: mod_pagespeed sometimes BROKE javascript code. It attempted to combine and minify javascript files and code, and modern web browsers started producing javascript errors and not working as expected.
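One well-known way that combining scripts can break them (not necessarily what happened on this server) is a file that ends without a semicolon followed by a file that starts with a parenthesis: once joined, JavaScript parses the second file as a call on the first file's last expression. A minimal Python sketch of a combiner that guards against this; the function name and the sample sources are hypothetical.

```python
def combine_scripts(sources):
    """Join script sources with an explicit statement separator between files."""
    cleaned = [src.strip() for src in sources if src.strip()]
    # The semicolon sits on its own line so a trailing // comment cannot swallow it.
    return "\n;\n".join(cleaned)

file_a = "var init = function () { return 1 }"      # no trailing semicolon
file_b = "(function () { window.ready = true })()"  # starts with a parenthesis

naive = file_a + "\n" + file_b          # file_b becomes a call on file_a's function
combined = combine_scripts([file_a, file_b])
print(combined)
```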
Needless to say, mod_pagespeed was immediately removed from our server. I expect that the Cotendo CDN will also have these sorts of issues, which will drive website owners and developers crazy. It will be interesting to see how major content providers react when their content breaks because of the CDN.
Matthew Clark
http://gorges.us/
Sounds like a Man-in-The-Middle attack to me...
Interesting that this is from Google. One of the most frequent causes of delay I see is links to external sites. In recent times, I have specifically noticed lots of web pages waiting for Google Analytics.
Enjoy life! This is not a dress rehearsal.
I think it's an interesting feature that might help Cotendo stand out among the pack. Don't use them if this kind of feature bothers you. But if the changes don't introduce any bugs and your website loads faster for it, why would you possibly take that as a bad thing?
I went to eat some animal crackers and the box said, "Do not eat if seal is broken." I opened the box and sure enough..
What happened to view source, the browser function that built the web?
I think it is not nice to deliver unreadable code to your users. Removed line breaks and indenting spaces, obfuscated javascript variables, automatic changing of meaningful file names to some hash-gibberish ... do not like.
It's only optimization of images, CSS and some caching... but still pretty COOL.
Write a webpage that in a very inefficient way fails to display goatse.
If pages load slow, it's very seldom because their HTML has too much white space.
Most page load delays today come from waits for loads from third-party sites. Usually ads, of course. Or because they're doing something in Javascript that's eating time in the browser.
Now, rewriting the page to remove ads - that would really speed things up. Or just replace all images from ad sites. The server still reads the image from the ad site, so the ad site thinks it delivered the image, but there's no need to ship it down the last mile to the user.
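A rough Python sketch of the simplest variant of that rewrite, which drops image tags pointing at known ad hosts. The host list, function name, and sample page are invented, and a regex is a stand-in for the real HTML parser a proxy would need.

```python
import re
from urllib.parse import urlparse

AD_HOSTS = {"ads.example.com"}  # hypothetical blocklist
IMG_TAG = re.compile(r"""<img\b[^>]*\bsrc=["']([^"']+)["'][^>]*>""", re.IGNORECASE)

def strip_ad_images(html: str) -> str:
    """Remove <img> tags whose src points at a host on the blocklist."""
    def repl(match):
        host = urlparse(match.group(1)).hostname or ""
        return "" if host in AD_HOSTS else match.group(0)
    return IMG_TAG.sub(repl, html)

page = '<p>News</p><img src="http://ads.example.com/banner.gif"><img src="/logo.png">'
cleaned = strip_ad_images(page)
print(cleaned)
```

Fetching the image server-side so the ad site still counts an impression would be a separate step that this sketch does not attempt.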
We all know that well-written HTML can result in much better rendering of whatever your content is
I did not expect to read that on Slashdot.
So what, been doing this for clients with F5 BigIP for the last 4 years....
What is very interesting about this, is that <rest of comment optimized away>
If Pandora's box is destined to be opened, *I* want to be the one to open it.
Now you may know the infrastructure on CDN, but who really knows if they need CDN? Here's how http://bit.ly/97GMxA