How Much Bandwidth is Required to Aggregate Blogs?
Kevin Burton writes "Technorati recently published that they're seeing 900k new posts per day. PubSub says they're seeing 1.8M. With all these posts per day, how much raw bandwidth is required? Due to inefficiencies in RSS aggregation protocols a little math is required to understand this problem." And more importantly, with millions of posts, what percentage of them have any real value, and how do busy people find that .001%?
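The inefficiency Kevin mentions comes from polling: an aggregator re-fetches each feed on a schedule and, without conditional requests or compression, downloads the whole feed even when only one post (or none) is new. Below is a back-of-envelope model in Python. It assumes every poll transfers the full feed; the function names and inputs are hypothetical, and only the posts-per-day figures come from the post above.

```python
def polling_bytes_per_day(num_feeds: int, polls_per_day: int, avg_feed_bytes: int) -> int:
    """Bytes moved if every poll downloads the full feed (no 304s, no gzip)."""
    return num_feeds * polls_per_day * avg_feed_bytes


def new_content_bytes_per_day(posts_per_day: int, avg_post_bytes: int) -> int:
    """Bytes of genuinely new content: the floor an ideal protocol could reach."""
    return posts_per_day * avg_post_bytes


def overhead_ratio(num_feeds: int, polls_per_day: int, avg_feed_bytes: int,
                   posts_per_day: int, avg_post_bytes: int) -> float:
    """Bytes polled for every byte of new content (all inputs are assumptions)."""
    polled = polling_bytes_per_day(num_feeds, polls_per_day, avg_feed_bytes)
    return polled / new_content_bytes_per_day(posts_per_day, avg_post_bytes)
```

With Technorati's 900,000 posts a day, `new_content_bytes_per_day(900_000, avg_post_bytes)` gives the floor; `overhead_ratio` shows how far above that floor plain polling sits for whatever feed count, poll rate and feed size you plug in.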
order of magnitude out there, fella... better try again with this newfangled "math" stuff
So I wouldn't say ANY site using Apache... but probably most. The real problem there is the compression load on the servers: gzip compression doesn't just happen, you know. It takes CPU cycles that could otherwise be spent pushing data rather than encoding it.
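To put a rough number on that CPU cost, here is a small timing sketch using Python's standard-library `gzip` module. The payload is synthetic, `time_gzip_levels` is a hypothetical helper, and levels 1, 6 and 9 simply span fastest to smallest; absolute timings depend entirely on the machine, so only compare them against each other.

```python
import gzip
import time
from typing import Dict, Iterable, Tuple


def time_gzip_levels(payload: bytes, levels: Iterable[int] = (1, 6, 9),
                     repeats: int = 50) -> Dict[int, Tuple[int, float]]:
    """Return {level: (compressed_size, seconds_per_compress)} for each gzip level."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    results = {}
    for level in levels:
        start = time.perf_counter()
        for _ in range(repeats):
            packed = gzip.compress(payload, compresslevel=level)
        per_call = (time.perf_counter() - start) / repeats
        results[level] = (len(packed), per_call)
    return results


# Synthetic, highly repetitive feed-like payload, made up for this demo.
payload = b"<item><title>Hello</title><link>http://example.com/</link></item>\n" * 2000
for level, (size, secs) in time_gzip_levels(payload).items():
    print(f"level {level}: {size} bytes, {secs * 1000:.2f} ms per response")
```

Higher levels usually shave a bit more off the size in exchange for more CPU per response, which is the trade-off being described.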
Answer: not enough to justify the cost of doing it. Which goes to show that if a site as popular as Slashdot can't save money doing this, no other site on the net should bother converting to XHTML, economically speaking of course.
"Though a few KB doesn't sound like a lot of bandwidth, let's add it up. Slashdot's FAQ, last updated 13 June 2000, states that they serve 50 million pages in a month. When you break down the figures, that's ~1,612,900 pages per day or ~18 pages per second. Bandwidth savings are as follows:
Savings per day without caching the CSS files: ~3.15 GB bandwidth
Savings per day with caching the CSS files: ~14 GB bandwidth
Most Slashdot visitors would have the CSS file cached, so we could ballpark the daily savings at ~10 GB bandwidth. A high volume of bandwidth from an ISP could be anywhere from $1 - $5 cost per GB of transfer, but let's calculate it at $1 per GB for an entire year. For this example, the total yearly savings for Slashdot would be: $3,650 USD!"
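The quoted arithmetic is easy to reproduce. A short Python check, assuming a 31-day month (which is what ~1,612,900 pages/day implies) and the quote's ~10 GB/day and $1/GB figures. Note that 50 million pages over a 31-day month works out to about 18.7 pages per second.

```python
PAGES_PER_MONTH = 50_000_000   # Slashdot FAQ figure from the quote
DAYS_PER_MONTH = 31            # assumption implied by ~1,612,900 pages/day
SECONDS_PER_DAY = 24 * 60 * 60


def pages_per_day(pages_per_month: int = PAGES_PER_MONTH, days: int = DAYS_PER_MONTH) -> float:
    return pages_per_month / days


def pages_per_second(pages_per_month: int = PAGES_PER_MONTH, days: int = DAYS_PER_MONTH) -> float:
    return pages_per_day(pages_per_month, days) / SECONDS_PER_DAY


def yearly_savings_usd(gb_saved_per_day: float, usd_per_gb: float, days_per_year: int = 365) -> float:
    return gb_saved_per_day * usd_per_gb * days_per_year


print(f"{pages_per_day():,.0f} pages/day")          # 1,612,903 pages/day
print(f"{pages_per_second():.1f} pages/second")     # 18.7 pages/second
print(f"${yearly_savings_usd(10, 1):,.0f}/year")    # $3,650/year
```

At the quote's upper bound of $5/GB, the same 10 GB/day would come to $18,250 a year.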
I run the spiders at Technorati, and it is 0.9 million posts a day, which Kevin Burton had correct in the post cited. Is this the no-dot effect?
First, as some AC points out, 0.001 percent of 9 million is 90.
Second, those would be posts. I'm assuming the intelligent stuff tends not to be spread across 90 separate people, but to come as multiple intelligent posts from the same person.
Third, since the original poster somehow messed up and cited 9 million instead of the correct figure, 900,000, that number drops to 9 posts a day, a reasonable amount to read.
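The percentages in this sub-thread are simple to check: 0.001% is one post in every 100,000. A tiny Python check using the post counts mentioned in the thread (the helper name is just illustrative):

```python
def top_sliver(posts_per_day: int) -> float:
    """0.001% of posts, i.e. one post in every 100,000."""
    return posts_per_day / 100_000


for label, posts in [("misquoted figure", 9_000_000),
                     ("Technorati", 900_000),
                     ("PubSub", 1_800_000)]:
    print(f"{label}: {posts:,} posts/day -> {top_sliver(posts):g} posts")
```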
It has absolutely sod-all to do with XHTML. HTML 4.01 and XHTML 1.0 are functionally identical. You can use table layouts and <font> elements with XHTML 1.0 and you can use CSS with HTML 4.01.
You are referring to separating the content and the presentation through the use of stylesheets. This has nothing to do with XHTML, although it would save a hell of a lot of bandwidth if Slashdot implemented it. They are implementing it.
Bogtha Bogtha Bogtha
Are you saying that you read the logs directly/manually?
See AWStats
If your weblog server implements ETag and Last-Modified, my spider can send a one-packet conditional request (If-None-Match / If-Modified-Since) with the values I last saw from you, and you can send a one-packet 304 response if nothing has changed.
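Both halves of that exchange fit in a few lines. A minimal sketch in Python using only the standard library: `conditional_headers` is what a spider might send after a previous 200 response, and `is_not_modified` is how a server might decide to answer with an empty 304 instead of the full feed. Both function names are hypothetical, this is not Technorati's spider code, and the ETag parsing is deliberately naive.

```python
from email.utils import parsedate_to_datetime
from typing import Dict, Optional


def conditional_headers(etag: Optional[str], last_modified: Optional[str]) -> Dict[str, str]:
    """Client side: echo back the validators saved from the last 200 response."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def _opaque(tag: str) -> str:
    # If-None-Match uses weak comparison, so ignore a leading W/ marker.
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def is_not_modified(request_headers: Dict[str, str], etag: str, last_modified: str) -> bool:
    """Server side: True means answer 304 Not Modified with no body."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If both validators are present, If-None-Match takes precedence (RFC 7232).
        # Naive comma split: fine for ordinary ETags that contain no commas.
        tags = [_opaque(t) for t in if_none_match.split(",")]
        return "*" in tags or _opaque(etag) in tags
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            return parsedate_to_datetime(last_modified) <= parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return False  # unparseable date: fall back to sending the full feed
    return False
```

If `is_not_modified` returns True, the server sends `304 Not Modified` with no body, so the spider pays only for the headers.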
Charles Miller explained this well a few years ago.
(I run the spiders at Technorati).
Most sane web servers gzip the content, and XML compresses extremely well. (In other words, gzipped XML is just as efficient space-wise as a binary memory dump, and much easier for mere people to understand.)
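A quick way to check the compression claim is to gzip a feed and compare sizes. The sketch below builds a synthetic RSS 2.0 document with Python's standard library (the `sample_rss` helper, the titles and the example.com URLs are all made up) and reports the compressed size. A real feed will compress differently, so the printed ratio is only indicative.

```python
import gzip


def sample_rss(num_items: int) -> bytes:
    """Build a synthetic RSS 2.0 document; all content is placeholder text."""
    items = "".join(
        f"<item><title>Post number {i}</title>"
        f"<link>http://example.com/archives/{i}.html</link>"
        f"<description>Body text of post {i}.</description></item>"
        for i in range(num_items)
    )
    return (
        '<?xml version="1.0"?><rss version="2.0"><channel>'
        "<title>Example weblog</title><link>http://example.com/</link>"
        "<description>Placeholder feed</description>"
        f"{items}</channel></rss>"
    ).encode("utf-8")


raw = sample_rss(15)
packed = gzip.compress(raw)
assert gzip.decompress(packed) == raw  # lossless round trip
print(f"{len(raw)} bytes raw, {len(packed)} bytes gzipped ({len(packed) / len(raw):.0%})")
```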
My other car is first.
This effect is called the long tail effect, and it is visible all over the web. For instance, Amazon.com says that every day it sells more copies of books that didn't sell yesterday than of books that *also* sold yesterday. In other words, taken together, the items that sell less often than once every other day outsell the items that sell more often than that.
Eivind.
Doubting the existence of evolution is like doubting the existence of China: It just shows that you're uninformed.