How Much Bandwidth is Required to Aggregate Blogs?
Kevin Burton writes "Technorati recently published that they're seeing 900k new posts per day. PubSub says they're seeing 1.8M. With all these posts per day how much raw bandwidth is required? Due to inefficiencies in RSS aggregation protocols a little math is required to understand this problem." And more importantly, with millions of posts, what percentage of them have any real value, and how do busy people find that .001%?
So I wouldn't say ANY site using Apache... but probably most. The real problem there is the compression load on the servers... gzip compression doesn't just happen, you know; it takes CPU cycles that could be used to just push data rather than encode it.
I run the spiders at Technorati, and it is 0.9 million posts a day, which Kevin Burton had correct in the post cited. Is this the no dot effect?
If your weblog server implements ETag and Last-Modified, my spider can send a one packet request with the values I last saw from you, and you can send a one packet 304 response if nothing has changed.
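For anyone who wants to see what that exchange looks like from the spider's side, here is a minimal sketch in Python using only the standard library. The function names are made up for illustration and this is not Technorati's actual code; it just assumes the spider has stored the ETag and Last-Modified values from its previous fetch of the feed.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build the validator headers for a conditional GET (hypothetical helper)."""
    headers = {}
    if etag:
        # Echo back the ETag the server sent last time.
        headers["If-None-Match"] = etag
    if last_modified:
        # Echo back the Last-Modified date the server sent last time.
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(url, etag=None, last_modified=None, timeout=10):
    """Fetch a feed only if it changed since the stored validators.

    Returns (status, body, etag, last_modified). On a 304 the body is None
    and the previously stored validators are returned unchanged.
    """
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return (
                response.status,
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        # urllib reports 304 Not Modified as an HTTPError; treat it as
        # "nothing new" rather than as a failure.
        if err.code == 304:
            return (304, None, etag, last_modified)
        raise
```

The spider would keep the returned ETag and Last-Modified next to each feed URL and pass them back in on the next poll. When the server answers 304 there is no body to download, which is where the bandwidth saving comes from.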
Charles Miller explained this well a few years ago.
(I run the spiders at Technorati).
This effect is called the long tail effect, and is visible all over the web. For instance, Amazon.com says that every day, it sells more books that didn't sell yesterday than the sum of books sold that *also* sold yesterday. In other words, they sell more in total of the items that each sell less than once every other day than of the items that each sell more often than that.
Eivind.
Doubting the existence of evolution is like doubting the existence of China: It just shows that you're uninformed.