Are Web Pages Getting Larger?
An anonymous reader asks: "I work for a large multinational in a remote part of the world. Our connectivity to the outside world (the Internet as well as company communications) is all done via a single E1 line - that's 2Mbps. Thousands of users. The company keeps access pretty well screwed down for security reasons, and because our link to the outside world costs almost $300K/year! Our growing problem is Internet traffic. While policing of non-business use is very active, Internet traffic continues to grow. I'm becoming convinced that one of our problems is that average web page size is growing. As more of the world enjoys broadband access, I think web developers have less reason to limit the size of their web pages. Large images, flash animations and other size-increasing content seem increasingly common. Am I right? Can anyone point to a recent study that would support my theory, and help me convince my management that we just plain need more bandwidth?"
I think the answer to your question lies within the technology itself, and the obvious answer is "yes", web pages are getting larger. Consider that:
So, yes, the web universe is "expanding" in very nearly every dimension. To your specific question, will you need to petition for more bandwidth? Undoubtedly. And, I can't imagine it isn't doable at today's rates. It sounds like a balky bureaucracy, not a question of need. Good luck.
I think maybe the better question to ask is what has happened to the general psyche of the average employee, and how do you address it? If I had to guess (see, I'm not proving anything with this post!), I'd guess the technology has easily stepped up to the task of underpinning the network use, but people still have not learned how to modulate and attenuate the siren that is the internet. (Maybe that would help decelerate your need to upgrade and expand bandwidth.)
You could add a local caching proxy server and/or set browsers to cache longer to reduce bandwidth. Have you done an analysis on how much of the traffic is people just pulling up the same pages?
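For the "how much is people pulling up the same pages" question, here is a rough sketch of the analysis in Python. It assumes you can export your gateway or proxy log as (URL, size) pairs; the sample log is made up for illustration. It ignores whether a response is actually cacheable (no-cache headers, expiry), so treat the result as an upper bound on what a shared cache could save.

```python
def repeat_share(requests):
    """Return the fraction of bytes that an ideal shared cache could have served.

    `requests` is an iterable of (url, size_in_bytes) pairs in log order.
    The first fetch of each URL counts as a miss; every later fetch of the
    same URL counts as a potential cache hit.
    """
    seen = set()
    total_bytes = 0
    repeat_bytes = 0
    for url, size in requests:
        total_bytes += size
        if url in seen:
            repeat_bytes += size
        else:
            seen.add(url)
    return repeat_bytes / total_bytes if total_bytes else 0.0


# Hypothetical sample log, for illustration only
sample_log = [
    ("http://example.com/", 40000),
    ("http://example.com/logo.png", 10000),
    ("http://example.com/", 40000),
    ("http://example.org/news", 10000),
]
print(f"{repeat_share(sample_log):.0%} of bytes were repeat fetches")
```

If the share is high, a caching proxy probably pays for itself; if it is low, caching alone is unlikely to fix the link.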
-- these are only opinions and they might not be mine.
Lynx
A website and all of its pages can be expected to grow over its lifetime, but a lot of newer sites are a lot smaller than previous generations. The wide adoption of CSS, and all the user-friendliness tech evangelism emphasizing simplicity over noise, has been paying off for those who listen. There are still a lot of sites, such as web forums, where the attitude seems to be to have really complex themes with almost no CSS and let mod_gzip/deflate deal with the task of making it small.
You need to change policy, not spend more money. Change the cache settings on clients. Insert caching proxy servers. Make sure mail, DNS, etc. is local. Et cetera. You should find a solution that does not have a linear (at best) relationship with the number of users.
Web pages are getting larger. It might be what is causing your increase in utilization, but to me it's hard to believe (although if your users are viewing a lot of embedded videos, that's another story). And, if it's hard for me to believe, it's probably going to be hard for your PHB to believe, too.
"Duh! Here's more content"
Broadband now reaches at least 25% of home users, and maybe up to 40%, though I haven't looked at those numbers in some time. That would be a contributing factor to the fact that yes, web pages are getting bigger.
One way to see proof of this is using the wayback machine.
http://www.waybackmachine.org/
I took a quick sampling of the NYTimes homepage, and noticed that the HTML size has increased by a few kilobytes per year, from 56K in 2001, to 67K in 2003, to 83K in 2005. That's not even counting images. They've added more ad banners since the old days. If you google search, I'm sure you will find stuff.
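To put a rate on those NYTimes numbers, here is a quick compound-growth calculation in Python. The function name is just illustrative, and three data points from one site are a very thin sample, so take it as a back-of-the-envelope figure only.

```python
def annual_growth(start_kb, end_kb, years):
    """Compound annual growth rate between two page sizes."""
    return (end_kb / start_kb) ** (1 / years) - 1


# HTML sizes quoted above for the NYTimes homepage (images excluded)
rate = annual_growth(56, 83, 2005 - 2001)
print(f"about {rate:.1%} per year")
```

That works out to roughly 10% a year for the HTML alone, which is steady growth but probably not enough by itself to explain a saturated link.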
Ad banners have increased in size and complexity over time. Streaming content is another addition, as well as more services running over the network.
You probably have a number of contributing factors affecting your bandwidth, in addition to web pages.
- Unless you have an internal instant messaging environment, you may have many people chatting away on outside services that use your bandwidth.
- Email for personal use. Jokes, funny attachments, and worms clogging up things.
Here are a couple of suggestions to try and improve traffic:
- block services that shouldn't be run at the office like streaming music content.
- block websites that you can see have an impact on traffic and that you believe users should not be visiting, e.g. sites serving quicktime movies.
- block your daytrading slacker coworkers.
- block ad servers entirely! this should drastically improve your situation, and be the easiest to implement.
- switch to an internal instant messaging service, if you haven't done so already.
- disable unnecessary services.
- ensure that you have an internet policy that prohibits users from using their work computers for personal use.
- cache often used content.
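As a sketch of the "block ad servers" suggestion, this is the kind of hostname check a filtering proxy or DNS blocker performs. The function and the blocklist entries are hypothetical; real ad-server names would have to come from your own traffic logs, and in practice you would configure this in your proxy or DNS server rather than write it yourself.

```python
def is_blocked(host, blocked_domains):
    """True if host equals a blocked domain or is a subdomain of one."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in blocked_domains)


# Hypothetical blocklist; real ad-server names would come from your own logs
BLOCKED = {"ads.example.com", "banners.example.net"}

for name in ("ads.example.com", "img.ads.example.com", "www.example.com"):
    print(name, "->", "block" if is_blocked(name, BLOCKED) else "allow")
```

Matching on a leading dot keeps an unrelated host such as badads.example.com from being caught by the ads.example.com entry.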
this is one of the dumbest /. stories ever
I mean wtf....are they getting bigger? you mean...you mean as more and more people attach databases? and...and...java? etc...
I mean really...how stupid is this
We seldom regret saying too little but often regret saying too much.
A T1(or E1) in a downtown metro area == cheap.
A T1 out in, say, Montana in the US? NOT cheap.
It all depends where you are.
INsigNIFICANT
according to archive.org/waybackmachine:
html size (doesn't include images/dependencies)
Year    slashdot.org   yahoo.com   microsoft.com
1996    -              7k          11k
1997    -              9k          -
1998    23k            10k         20k
1999    35k            10k         20k
2000    36k            12k         17k
2001    41k            16k         21k
2002    39k            17k         28k
2003    39k            32k         31k
2004    51k            33k         38k
Today   19k            14k         22k
the trend has certainly been up, but lately big sites' main pages seem to be slimming down, due to CSS as well as a tendency to store style and javascript in separate files
http://www.google.com/search?hl=en&lr=&safe=off&q=average+web+page+size
has some good results
http://www.sims.berkeley.edu/research/projects/how-much-info/
has info from 2000 and a link to the same info from 2003
specific internet 2000
http://www.sims.berkeley.edu/research/projects/how-much-info/internet.html
and 2003
http://www.sims.berkeley.edu/research/projects/how-much-info-2003/internet.htm
it's worth noting, these types of statistics can take a year or more to compile..
every day http://en.wikipedia.org/wiki/Special:Random
Web developers (and programmers in general) don't care about optimizing anymore, they just want it to be done so they can get paid. Worrying about such trivial things as a few kbytes or making valid and accessible HTML is asking too much of them.
From a web-designer standpoint, a lot of size can be reduced without altering the content.
Are you serving up nicely formatted HTML with indentations? That's wasteful. Strip whitespace and carriage returns.
Are you using HTML comments? Why? Does the customer really need to see them? Do you need to waste that bandwidth? Delete them or use comments in your server-side scripting language of choice.
Are you using GIF's where PNG's would be smaller? Or PNG's where GIF's would be smaller?
Have you optimized your PNGs, JPEGs and GIFs? (I don't remember a GIF optimizer, but there are plenty of non-destructive ones).
A 50x50 JPEG preview of an item does not need embedded comments, thumbnails, or EXIF data.
If you must use animated GIF's, be sure they are optimized and not full-frame!
Are you using pictures of words, when actual stylized text could convey the same message?
Are you using inline JavaScript or CSS, rather than calling it from a cacheable external file?
Are you using PDF, Flash or Java when it's not ABSOLUTELY necessary?
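The whitespace and comment stripping mentioned above can be sketched in a few lines of Python. This is a deliberately naive version with a made-up function name and sample page; it assumes the markup has no <pre> or <textarea> blocks, inline scripts, or conditional comments, all of which it would damage.

```python
import re


def shrink_html(html):
    """Naively drop HTML comments and collapse whitespace runs.

    Assumption: the page has no <pre>, <textarea>, inline scripts, or
    conditional comments, all of which this simple approach would damage.
    """
    without_comments = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    return re.sub(r"\s+", " ", without_comments).strip()


page = """<html>
  <!-- navigation starts here -->
  <body>
      <p>Hello,   world</p>
  </body>
</html>"""

small = shrink_html(page)
print(len(page), "->", len(small), "bytes")
```

Note that if the server already gzips its output, the extra saving from stripping whitespace is usually much smaller, since runs of spaces compress very well.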
From a user's standpoint, the best solution, short of getting more bandwidth: use less bandwidth. Turn off image loading or use a text-based browser. Don't browse the web as much. If you have a choice of sites to use, use the one that is smallest. Use a proxy. blah blah.
Slashdot: Failed Car Analogies. Amateur Lawyering. Anecdote Battles.
You need a smart gateway. Your E1's border router, or a gateway immediately behind it, needs traffic shaping and queueing. Pretty much any circuit anywhere needs traffic queueing. Either side of your E1 could probably benefit from a compressed virtual circuit such as maybe a VPN. Compress all traffic that way. If you locally host your web servers, you can use a reverse proxy that includes mod_gzip and other stuff to strip whitespace from their content. You can also control your users' behaviors with caching proxies like squid and with a layer 7 packet filter. The layer 7 filter will protect against p2p and such. If you think the network is being abused but you want to encourage self-censorship, make the squid logs public. :)
The un-adoption of mod_gzip and whatever IIS *should* use is also prevalent.
Ticking the box used to crash IIS, but these days it actually works, not that you'd notice:
Response Headers - http://www.microsoft.com/
Date: Thu, 08 Dec 2005 10:30:40 GMT
Content-Length: 23186
Content-Type: text/html; charset=utf-8
Cache-Control: private
Server: Microsoft-IIS/6.0
P3P: CP="ALL IND DSP COR ADM CONo CUR CUSo IVAo IVDo PSA PSD TAI TELo OUR SAMo CNT COM INT NAV ONL PHY PRE PUR UNI"
X-Powered-By: ASP.NET
X-AspNet-Version: 2.0.50727
200 OK
Response Headers - http://slashdot.org/
Transfer-Encoding: chunked
Date: Thu, 08 Dec 2005 10:40:11 GMT
Content-Type: text/html; charset=iso-8859-1
Cache-Control: no-cache
Server: Apache/1.3.33 (Unix) mod_gzip/1.3.26.1a mod_perl/1.29
SLASH_LOG_DATA: mainpage
X-Powered-By: Slash 2.005000090
X-Fry: Where's Captain Bender? Off catastrophizing some other planet?
Pragma: no-cache
Vary: User-Agent,Accept-Encoding
Content-Encoding: gzip
200 OK
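The difference between those two responses is the Content-Encoding: gzip line. To get a feel for what that buys, here is a small Python sketch using the standard gzip module. The sample markup is invented and far more repetitive than a real page, so the ratio it prints is much better than you should expect in practice.

```python
import gzip


def gzip_ratio(body: bytes) -> float:
    """Return compressed size divided by original size (1.0 for empty input)."""
    if not body:
        return 1.0
    return len(gzip.compress(body)) / len(body)


# Hypothetical repetitive markup standing in for a forum page
html = b"<tr><td class='comment'>Some comment text here</td></tr>\n" * 400
print(f"{len(html)} bytes raw, ratio {gzip_ratio(html):.3f} after gzip")
```

Running the same function over a few pages saved from your own proxy cache would give a more honest number for your traffic mix.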
There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
An E1 data circuit via a satellite channel to Africa or the Middle East will run about US$125k to US$200k/year, in satellite costs, uplink and downlink station maintenance, and the actual internet connection in Europe or NYC.
Compressors, TCP (packet shaping) optimisers, proxy caches, DNS/email caching, webvertising blocks, QoS and aggressive firewall rules are pretty much a given for any kind of expensive satellite connection. On the luser end, to really make use of the web they can set their browsers to not automatically load images, change their TCP window to something huge, and a bunch of other tricks to keep themselves happy. Remote stations with large numbers of geeks have NNTP servers locally to keep up on the non-web world. IRC/IM are quite widely used, because they don't use much bandwidth at all (although I've heard of remote stations banning MSN messenger because it won't work without constantly loading advertising images).
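The reason for the "huge" TCP window is the bandwidth-delay product: the number of bytes that must be in flight to keep the link full. A worked example in Python follows. The 600 ms round-trip time is my assumption for a geostationary satellite hop, not a figure from the original post, and the names are illustrative.

```python
def bdp_bytes(bits_per_second: int, rtt_ms: int) -> int:
    """Bandwidth-delay product: bytes in flight needed to fill the link."""
    return bits_per_second * rtt_ms // 8000  # /1000 for ms, /8 for bits


E1_BPS = 2_048_000        # E1 line rate
SAT_RTT_MS = 600          # assumed geostationary round trip, not from the thread
DEFAULT_WINDOW = 65_535   # largest TCP window without window scaling

needed = bdp_bytes(E1_BPS, SAT_RTT_MS)
print(needed, "bytes needed vs", DEFAULT_WINDOW, "byte unscaled window")
```

Under that assumption a single connection limited to an unscaled window cannot fill even half of the E1 by itself, which is why window tuning matters on satellite links.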
But really, US$300k per year for an E1 circuit? There isn't any place on earth that is still that expensive. Drop me an email, we'll do lunch.
the AC
Hemos is like...sci-fi fans;he thinks technology is cool, but he hasn't bothered to understand the science it's based on