Slashdot Mirror


How Much Bandwidth is Required to Aggregate Blogs?

Kevin Burton writes "Technorati recently published that they're seeing 900k new posts per day. PubSub says they're seeing 1.8M. With all these posts per day how much raw bandwidth is required? Due to inefficiencies in RSS aggregation protocols a little math is required to understand this problem." And more importantly, with millions of posts, what percentage of them have any real value, and how do busy people find that .001%?

10 of 209 comments

  1. All at once by someonewhois · · Score: 5, Interesting

    It would make a lot more sense to have a protocol where you check one file that has a list of links to another XML file, and then the aggregator figures out which of those URLs has NOT been aggregated, then it downloads the other XML file which has the post-specific info, which it proceeds to display. That would save a lot of bandwidth, I'm sure.
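A minimal sketch of the index-then-fetch idea described above, assuming a hypothetical index format (an `<index>` element holding `<link>` entries) and a caller-supplied `fetch` function. None of these names come from an existing protocol; they are only there to illustrate the flow.

```python
import xml.etree.ElementTree as ET


def parse_index(index_xml):
    """Return the post URLs listed in a (hypothetical) index document."""
    root = ET.fromstring(index_xml)
    return [link.text.strip() for link in root.iter("link") if link.text]


def fetch_new_posts(index_xml, seen, fetch):
    """Fetch only the posts whose URLs are not already in `seen`.

    `fetch` is any callable mapping a URL to the post's XML, so the
    network layer can be swapped out. `seen` is updated in place.
    """
    new_posts = {}
    for url in parse_index(index_xml):
        if url in seen:
            continue  # already aggregated, no download needed
        new_posts[url] = fetch(url)
        seen.add(url)
    return new_posts


# Usage with an in-memory stand-in for the network (hypothetical URLs).
INDEX = """<index>
  <link>http://example.org/posts/1.xml</link>
  <link>http://example.org/posts/2.xml</link>
  <link>http://example.org/posts/3.xml</link>
</index>"""

FAKE_SERVER = {
    "http://example.org/posts/1.xml": "<post>one</post>",
    "http://example.org/posts/2.xml": "<post>two</post>",
    "http://example.org/posts/3.xml": "<post>three</post>",
}

seen = {"http://example.org/posts/1.xml"}
new = fetch_new_posts(INDEX, seen, FAKE_SERVER.__getitem__)
print(sorted(new))
```

Only the small index is downloaded on every poll; the per-post files are downloaded once each, which is where the saving would come from.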

    1. Re:All at once by broward · · Score: 4, Interesting

      The bandwidth isn't going to matter much.

      The blog wave is close to an inflection point,
      probably within six to twelve months...
      which means that total bandwidth will probably
      top out at about TWICE the current rate.

      http://www.realmeme.com/Main/miner/preinflection/blogDejanews.png

      I suspect that even now, many blogs are
      starved for readership as new blogs come online
      and steal mental bandwidth.

  2. Bandwidth wasted for non-xhtml pages? by bdigit · · Score: 5, Interesting

    How much bandwidth is /. wasting every month by not creating a standard xhtml page, even though someone created one for them already?

    1. Re:Bandwidth wasted for non-xhtml pages? by A+beautiful+mind · · Score: 4, Interesting

      Normally you would be right, but now you're banging open doors. CmdrTaco and others are actively working on a new CSS-using formatting of slashdot.

      --
      It takes a man to suffer ignorance and smile
      Be yourself no matter what they say
    2. Re:Bandwidth wasted for non-xhtml pages? by A+beautiful+mind · · Score: 4, Interesting

      Oh yea, here is the link about it.

      --
      It takes a man to suffer ignorance and smile
      Be yourself no matter what they say
  3. Don't forget the robots by astrashe · · Score: 4, Interesting

    I used to have a blog that I recently shut down because no one read it.

    No one read it, but I got a ton of hits -- all from indexing services. WordPress pings a service that lets lots of indexing systems know about new posts. Some of them -- Yahoo, for example -- were constantly going through my entire tree of posts, and hitting links for months, subjects, and so on.

    It didn't bother me, because the bandwidth wasn't an issue, and it wasn't like they were hammering my vps or anything. It mostly just made it really hard to read the logs, because finding human readers was like looking for a needle in a haystack.

    But bandwidth is cheap, and RSS is really useful, so it seems at least as good of a use for the resource as p2p movie exchanges.

  4. Rather than assuming... by llZENll · · Score: 5, Interesting

    Rather than making all these assumptions, why not just email Bob Wyman and ask him?

    "How much data is this? If we assume that the average HTML post is 150K this will work out to about 135G. Now assuming we're going to average this out over a 24 hour period (which probably isn't realistic) this works out to about 12.5 Mbps sustained bandwidth.

    Of course we should assume that about 1/3 of this is going to be coming from servers running gzip content compression. I have no stats WRT the number of deployed feeds which can support gzip (anyone have a clue?). My thinking is that this reduces us down to about 9Mbps which is a bit better.

    This of course assumes that you're not fetching the RSS and just fetching the HTML. The RSS protocol is much more bloated in this regard. If you have to fetch 1 article from an RSS feed you're forced to fetch the remaining 14 additional posts that were in the past (assuming you're not using the A-IM encoding method which is even rarer). This floating window can really hurt your traffic. The upside is that you have to fetch less HTML.

    Now let's assume you're only fetching pinged blogs and you don't have to poll (polling itself has a network overhead). The average blog post would probably be around 20k I assume. If we assume the average feed has 15 items, only publishes one story, and has a 10% overhead we're talking about 330k per fetch of an individual post.

    If we go back to the 900k posts per day figure we're talking a lot of data - 297G most of which is wasted. Assuming gzip compression this works out to 27.5Mbps.

    That's a lot of data and a lot of bloat which is unnecessary. This is a difficult choice for smaller aggregator developers as this much data costs a lot of money. The choice comes down to cheap HTML indexing with the inaccuracy that comes from HTML or accurate RSS which costs 2.2x more.

    Update: Bob Wyman commented that he's seeing 2k average post size with 1.8M posts per day. If we are to use the same metrics as above this is 54G per day or around 5Mbps sustained bandwidth for RSS items (assuming A-IM differentials aren't used)."
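The arithmetic in the quoted post can be re-run with a short sketch. It assumes K means 1000 bytes and G means 10^9 bytes, which reproduces the quoted figures. The function names are made up for illustration, and the gzip ratio of 0.16 is back-solved from the "about 9Mbps" figure rather than taken from the source.

```python
SECONDS_PER_DAY = 24 * 60 * 60
K = 1000  # assumption: decimal kilobytes


def sustained_mbps(bytes_per_day):
    """Average megabits per second needed to move bytes_per_day in 24 hours."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1e6


def rss_bytes_per_fetch(post_bytes, items_per_feed=15, overhead=0.10):
    """Bytes pulled to get one new post when the feed carries a full window."""
    return post_bytes * items_per_feed * (1 + overhead)


def with_partial_gzip(mbps, fraction_gzipped, ratio):
    """Bandwidth when `fraction_gzipped` of traffic shrinks to `ratio` of its size."""
    return mbps * ((1 - fraction_gzipped) + fraction_gzipped * ratio)


html_per_day = 900_000 * 150 * K                     # 135G
rss_per_day = 900_000 * rss_bytes_per_fetch(20 * K)  # 330k per fetch, 297G total
update_per_day = 1_800_000 * 15 * 2 * K              # 54G, no 10% overhead applied

print(round(sustained_mbps(html_per_day), 1))    # HTML only
print(round(sustained_mbps(rss_per_day), 1))     # RSS, full 15-item window
print(round(sustained_mbps(update_per_day), 1))  # Bob Wyman's 2k posts
print(round(rss_per_day / html_per_day, 1))      # RSS cost relative to HTML
# Back-solved assumption: 1/3 of traffic gzipped down to 16% of its size.
print(round(with_partial_gzip(sustained_mbps(html_per_day), 1 / 3, 0.16), 1))
```

Run this way, 297G per day works out to 27.5 Mbps before any gzip saving is applied, and the 54G figure in the update matches 15 items of 2k each without the 10% overhead.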

  5. Slashdot = blog = ironic by Lovejoy · · Score: 3, Interesting

    Does anyone else wonder why Slashdot editors seem to have it in for blogs? Is it because in Internet years, Slashdot is as old and sclerotic as the Dinomedia? Is Slashdot the Dinomedia of the new media?

    Does anyone else consider it ironic that the Slashdot editorship HATES blogs, but Slashdot is actually a blog?

    Anyone else getting tired of these questions?

  6. Value by lakin · · Score: 5, Interesting

    what percentage of them have any real value

    I had for a while held the view that most blogs out there are pointless. Some can be insightful and some are basically used as company press releases, but most are people talking about their day's activities that few people really care about, and a few of my friends have blogs like these. When I asked one what's the point, she said she just blogs stuff she would normally mention to many people on MSN throughout the day. It's not meant to have value to anyone on Slashdot, be hugely insightful, or detail some breathtaking new hack; it's simply another way for her to talk to friends (that doesn't involve repeating herself).

    --
    Paul
  7. Re:How much? If everyone GZipped, a lot less! by ZorbaTHut · · Score: 4, Interesting

    As I remember, www.livejournal.com has experimented with gzip compression several times. They've discovered that the price of the CPU far exceeds the price of the bandwidth.

    Bandwidth is cheap. Computers, not so much.
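For anyone who wants to weigh that trade-off on their own hardware, here is a rough sketch that measures bytes saved against CPU time spent for one feed-sized payload. The payload is synthetic and highly repetitive, so it compresses far better than real posts would; `gzip_cost` is a made-up helper name, and the timing is only indicative.

```python
import gzip
import time


def gzip_cost(payload, level=6):
    """Return (compressed_size, seconds_spent) for gzip-compressing payload."""
    start = time.perf_counter()
    compressed = gzip.compress(payload, compresslevel=level)
    elapsed = time.perf_counter() - start
    return len(compressed), elapsed


# Synthetic 15-item feed; real feeds will compress less well than this.
item = (
    "<item><title>Post</title><description>"
    + "blah blah " * 200
    + "</description></item>"
)
feed = ("<rss><channel>" + item * 15 + "</channel></rss>").encode("utf-8")

size, seconds = gzip_cost(feed)
print(f"{len(feed)} bytes -> {size} bytes in {seconds * 1000:.3f} ms")
```

Multiplying the per-response time by requests per second gives a rough CPU budget to compare against the bandwidth saved.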

    --
    Breaking Into the Industry - A development log about starting a game studio.