Slashdot Mirror


Is RSS Doomed by Popularity?

Ketchup_blade writes "As RSS is becoming more known to the mainstream users and press, the bandwidth issue reported by many sites (Eweek, CNet, InternetNews) related to feeds is becoming a reality. Stats from sites like Boing Boing are showing a real concern regarding feeds bandwidth usage. Possible solutions to this problem are emerging slowly, like RSScache (feed caching proxy) and KnowNow (even-driven syndication). RSScache seems to offer a realistic solution to the problem, but can this be enough to help RSS as it reaches an even bigger user base in the upcoming year?"

18 of 351 comments (clear)

  1. Welcome to the internet by Anonymous Coward · · Score: 2, Informative

    Where we use "push" technologies for everything that functionally pulling information, and "pull" technologies for everything that functionally pushes information.

    Whee!

    And the funny thing here is, if RSS had-- at its conception-- included caching and push-based update notification and all the other smart features that would have prevented this sort of thing from becoming a problem now, [i]it would never have been adopted[/i], because the only reason RSS succeeded where the competing standards to do the same thing failed was because RSS was so simple.

  2. About time for asynchronous by benow · · Score: 3, Informative

    Asynchronous event driven models are the way to go for changing content. They're trickier to code, but require less bandwidth and are more responsive. Perhaps a bit of a privacy issue, at some level (registration with source), but easy to implement, failure resistent distributed asynchronous networks have much applicability, not just to RSS.

  3. Solutions by markfletcher · · Score: 5, Informative

    There are several ways to mitigate the bandwidth issues. First, all aggregators should support gzip compression and the HTTP last-modified and etags headers. That'll take care of a lot of the problems. The other solution is to get people to use server based aggregators, like Bloglines, which only fetch a feed once per iteration, regardless of how many subscribers there are. As a bonus, there are several things that server-based aggregators can do that desktop based aggregators can't do, like provide personalized recommendations. I like this solution, but of course I'm biased since I'm the founder of Bloglines. :)

  4. A simple fix by jd · · Score: 2, Informative
    What you have is a large number of subscribers accessing a common data source at or around the same time. The simplest fix would be to have a reliable multicast version of RSS, which is broadcast to all subscribers to that feed. Then, you only have to transmit the updates once. The network would take care of it from then on.


    New subscribers would receive the initial copy of the feed via traditional unicast TCP, because that would be the least CPU-intensive way of handling a few requests at a time.


    A caching system won't work for the same reason web caches have never caught on in the US - people are terrified of being sued to smithereens for potential copyright infringement. Even if any case would be thrown out of court instantly (by no means certain in the US) the costs would be prohibitive and malicious plaintifs rarely ever get asked to pay costs.


    The main problem with the multicast solution is that although multicasting is enabled across the backbone, most ISPs disable it - for reasons known only to them, because it costs nothing to switch it on. Persuading ISPs to behave intelligently is unlikely, to say the least.

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  5. Solved, move on by Jeremiah+Blatz · · Score: 3, Informative
    Shrook for the Mac has already solved this issue with "distributed checking". Popular sites are checked once every 5 minutes, if the site is updated, everyone gets the latest content, otherwise, nobody touches it.

    As another poster has pointed out, banning users who check too frequently is an excellent fallback. A tiny site won't know to install the software, but it won't be an issue for a tiny site.

  6. Re:They just need to follow ./'s lead by NeoSkandranon · · Score: 3, Informative

    If the server initiated the connection then RSS would be useless to nearly everyone who's behind a router or firewall that they do not administer.

    The server would also need to have a list of clients to send the refresh to, which means you'd need to "sign up" so the server puts you on the list.

    Nevermind the difficulties that dynamic IP addresses would cause. It's generally easier if the user initiates things.

    --
    If you can't see the value in jet powered ants you should turn in your nerd card. - Dunbal (464142)
  7. Re:RSS readers don't cache! by maskedbishounen · · Score: 5, Informative

    To some extent, this could be blamed on the feed itself. Ideally, it works like this..

    When you request the feed, you first get sent your normal HTTP header. If properly configured, it will return a 304 if you have the most recent version -- however, as many feeds are generated in PHP[1], this header is defaulted off, and you'll end up with your standard 200, or go ahead, code. This single handedly wastes a metric tonne of bandwidth needlessly.

    Even if you're trying to rape a feed, you'll only be wasting a few hundred bytes at most every half hour, than the whole 50K or whatnot size it is.

    See here for a more detailed explanation.

    [1] This is not a PHP specific issue; a lot of dynamic content, and even static content, fails to do this properly. But this is what it's there for, after all.

    --
    "An infinite number of monkeys typing into GNU emacs would never make a good program."
  8. Re:RSS readers don't cache! by IO+ERROR · · Score: 4, Informative
    For instance, the GPL blog software Word Press doesnt do ANY cacheing.

    Technically true but misleading. WordPress allows user agents to cache the RSS/Atom feeds, and will only serve a newer copy if a post has been made to the blog since the time the user agent says it last downloaded the feed. Otherwise it sends a 304. This is in 1.3-alpha5. I dunno what 1.2.1 does.

    Not to mention, a lot of these RSS readers are big sites like bloglines, newgator, etc who should be respecting bandwidth limits, but really have no incentive to do so.

    Not coincidentally, these are the egregious worst offenders I mentioned. Bloglines grabs my RSS2 and Atom feeds hourly, and doesn't cache or even pretend to. Firefox Live Bookmarks appears to cache feeds, but your aggregator plugins might not. I can't (yet) tell the difference from the server logs between Firefox and the various aggregator plugins.

    The best ones are the syndication sites that only grab my feeds after being pinged. Too bad I can't ping everybody. That could solve the problem if there was some way to do that.

    --
    How am I supposed to fit a pithy, relevant quote into 120 characters?
  9. Re:They just need to follow ./'s lead by interiot · · Score: 2, Informative
    This question has been asked many times, and has been answered better than I'm able to.

    But the gist of it is that push-media and multicast are either a thankfully-dead-fad, or are a technology whose time has yet to come. Push media, in particular, was salivated over quite a bit in the late 90's (eg. see Wired's 1997 cover article on it), so it's not as if it's a new idea. Despite this, push and multicast haven't gained wide success yet. Lots of people have various reasons why, and some of them are actually quite insightful. Google more if you want, but at least be aware that if one simply repeats the thoughts of the past in this area, one isn't likely to be successful.

  10. Re:They just need to follow ./'s lead by interiot · · Score: 4, Informative

    You know what happens then? The same thing they do when you hamper your RSS feed in any other way, they scrape your HTML and create their own feeds. Slashdot doesn't monitor their front page as closely as they do their rss page, so you can get away with quite a bit of abuse, at least for a while. They've blacklisted my IP ocassionally when I got overzealous though.

  11. Re:They just need to follow ./'s lead by Electroly · · Score: 5, Informative

    HTTP 1.1 already supports this. A conditional HTTP request can be made which basically asks the server if the file has been updated. The server can then respond a 304 Not Modified and avoid sending the entire RSS file again. Unfortunately, poorly written RSS aggregators don't implement this, and it is those aggregators that are the real problem here. They typically are the ones with the default 5 minute update time, too.

  12. Slashdot's RSS blocking policy by jamie · · Score: 4, Informative
    Slashdot blocks your IP from accessing RSS if you access our site more than fifty times in one hour. I think that's reasonable, don't you? Especially since our FAQ tells you to request a feed only twice an hour.

    Every complaint about this that I've investigated has turned out to be either a broken RSS reader or an IP that's proxying a ton of traffic (which we usually do make an exception for).

    Oh, and if you want to read sectional stories in RSS, then:

    • create a user if you haven't already,
    • edit your homepage to include sectional stories you like (and exclude those you don't),
    • then reload the homepage and copy that "rss" link at the very bottom of the page. It will be customized to your exact specs!

    Slashdot's RSS traffic, like Boing Boing's, is huge, and blocking broken readers has saved us a ton of bandwidth, which of course means money. We were one of the first sites to do this but (as this story suggests) you'll see a lot more sites doing it in the future. I think our policy is fair.

    1. Re:Slashdot's RSS blocking policy by jamie · · Score: 2, Informative

      Sorry, I goofed, that feature I described is subscriber-only. I'd forgotten that. I'll update the FAQ to describe it.

  13. I agree with you. by mcmasuda · · Score: 3, Informative

    I just fired up ethereal and refreshed my RSS reader. Out of the dozen or so feeds I monitor, a few of them are using Etags and sensible cache-control headers, so I just get 304 Not Modified. Of the rest, not a single one is compressing even though my client is specifying gzip and deflate in its Accept-Encoding header.

    HTTP compression will work even better here than it does for regular pages - RSS is basically all text so every response is going to be compressible. Looking at a handful of my feeds, some quick messing about with wget & gzip gives me an average compression ratio of 3:1. That's a 66% reduction in bandwidth utilization. If just half of your clients support HTTP compression, it's still a significant savings.

    Now, the article is talking about poorly designed aggregators that don't check whether the content's changed (I'm assuming he's talking about Etags). There's not much you can do about that, but using compression for capable clients sure seems like it would be a good thing.

  14. Re:Swarming (Like BitTorrent) is the answer by Jerf · · Score: 3, Informative

    BitTorrent works great for distributing media like ISOs. Folks, it can distribute "little" stuff, too.

    No, it "can't". Or at least, it can't serve it with any benefit. Tracker overhead swamps any gains you might make. BitTorrent is unsuitable for use with small files, unless the protocol has radically changed since I last looked at it. In the limiting case, like 1K per file, it can even be worse or much worse than just serving the file over HTTP.

    Inside the RSS file, users can try to get the webpage (and all its images, etc.) through the torrent first.

    Oh, here's the problem, you don't know what you're talking about or how these technologies work. When an RSS file has been retrieved, there is nothing remotely like "get the webpage" that takes place in the rendering. The images are retrieved but those are typically too small to be usefully torrented too.

    Regretably, solving the bandwidth problem involves more than invoking some buzzwords; when you're talking about a tech scaling to potentially millions of users you really have to design carefully. Frankly, the best proof of my point is that it was as easy as you say it is, it'd be done now. But it's not, it's hard, and will probably require a custom solution... which is what the article talks about, coincidentally.

  15. RFC3229+feed defines "delta" encoding for RSS by bobwyman · · Score: 2, Informative

    Your suggestion is precisely what is defined by RFC3229+feed (i.e. an RSS-specific extension to RFC3229 " delta encoding for HTTP). I maintain a list of implementation of RFC3229+feed on my blog. You can also find some empirical evidence showing massive bandwidth savings as a result of RFC3229+feed use.

    This is a well known and "solved" issue...

    bob wyman

  16. RSS Throttling Script by TheLoneGundam · · Score: 2, Informative

    Glenn Fleishman, of Wi-Fi Networking News has written a script to throttle the poorly-behaved aggregators and writes about it on his personal blog.

  17. There are ways to mitigate the impact by TNLNYC · · Score: 2, Informative

    I'm reproducing this article from my own site (all the links are on the site):

    Capacity planning and RSS
    September 9, 2004

    Robert Scoble points to MSDN having issues with full entry RSS. What it comes down to is a capacity planning exercise.

    In his note, he says that RSS is broken. I personally believe that at issue is not whether RSS is working or not. RSS is working but it has complicated the bandwidth issue. At issue is the fact that RSS feeds are generally generating more traffic to a site. Because RSS readers are polling the site to check if a feed has been updated, the traffic patterns change, with increased numbers of spikes on a hourly basis. This is similar to some of the issues network administrators started facing when Pointcast first appeared.

    There are a number of ways to mitigate the issue.

    HTTP Conditional GET for RSS
    First of all, one of the things to consider when using RSS is to create conditional HTTP headers on RSS feeds. This helps mitigate some of the impact by ensuring that feeds are only served if the content has changed.

    Feed Compression
    The next item to think of is to use compression when serving feeds. By doing so, one reduces the size of the payload, which ends up being much better in terms of managing bandwidth. In my own experience, because RSS is primarily text, I've seen a reduction of 80% of the bandwidth when delivering RSS feeds in a compressed format. That represents a fairly large gain in bandwidth that can then accommodate more users.

    Change the polling schedule
    The RSS 2.0 specification already offers a number of optional elements to give RSS readers a better idea as to when to get content. For example, the pubDate element offers information as to when a feed was last published, as does the lastBuildDate one. ttl (aka. time to live) can also be used to indicate to the software that this feed should live for a certain amount of time. Finally, skipHours and skipDays offers more pointers as to when RSS reader software should not poll. With all those mechanisms in place, it looks like a lot of flexibility exists in the format to accommodate scalability.

    When all else fails, reduce
    If all of the above still fail, RSS publishers should look at reducing the size of their feeds. There are two ways you can do this. First, you can just say that you're not going to offer full-text feeds. This seems to be the option that Scoble hates. Another way to do things is to offer both abbreviated feeds and full-text feeds or offer more detailed feeds, as I do on TNL.net.

    An important consideration when doing something like this is how to address them. By default, users who just use the RSS autodiscovery feature will only get the abbreviated feed. However, they still have the option to go and get the full-text version. The compromise here is that users who just want to subscribe quickly can do so at a lower bandwidth costs, while power users can seek out the fuller feed and subscribe to that. The result, in my experience, is that most people use the autodiscovery feature, grabbing the smaller feed. Some power users do seek out the fuller feed and subscribe to that instead (based on the numbers, I'm seeing a 5% usage of the full-text feed as opposed to the default abbreviated one. This is a compromise solution that seems to accomodate everyone involved to date.

    Final considerations
    When publishing RSS feeds, your audience grows, which results in traffic growth too. One of the thing to realize is that RSS feeds are generally stickier than the rest of a site. What this means is that, for every new subscriber you get, you will see an on-going increase in your overall site traffic stats. This is not a bad thing as messages emanating from your site do get a higher passive readership. One of the thing that new syndication standards should consider is a follow-up on this. While RSS publisher know how many feeds are being pushed out, there is littl

    --
    Check out http://www.tnl.net/blog