Is RSS Doomed by Popularity?
Ketchup_blade writes "As RSS is becoming more known to mainstream users and the press, the bandwidth issue reported by many sites (Eweek, CNet, InternetNews) related to feeds is becoming a reality. Stats from sites like Boing Boing are showing a real concern regarding feed bandwidth usage. Possible solutions to this problem are emerging slowly, like RSScache (feed caching proxy) and KnowNow (event-driven syndication). RSScache seems to offer a realistic solution to the problem, but can this be enough to help RSS as it reaches an even bigger user base in the upcoming year?"
Remember all the hype about "push" technology back in the mid-nineties? Nobody was interested, but RSS feeds are being used in much the same way now. I'm thinking there are two significant differences: 1) with RSS, the user feels like they're in control of what's going on; with push, users felt like they were at the mercy of whatever money-grabbing corporations wanted to throw at them, and 2) a hell of a lot of people now have an always-on Internet connection with plenty of bandwidth to spare. When you've got a 33.6kbps dialup connection, you use the Internet differently than when you've got DSL or cable.
How much bandwidth does Slashdot's RSS feed use?
It looks like the RSS feed on my home page has a small handful of subscribers. Neat.
$x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
$x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
And institute jackboot banning policies if you access them more than x times per y hours.
404 File Not Found
The requested URL (developers/04/12/08/2321217.shtml?tid=126&tid =8) was not found.
If you feel like it, mail the url, and where ya came from to pater@slashdot.org.
...do as Slashdot?
Ban everyone querying its RSS feed more than once an hour?
Where we use "push" technologies for everything that is functionally pulling information, and "pull" technologies for everything that functionally pushes information.
Whee!
And the funny thing here is, if RSS had, at its conception, included caching and push-based update notification and all the other smart features that would have prevented this sort of thing from becoming a problem now, it would never have been adopted, because the only reason RSS succeeded where the competing standards to do the same thing failed was because RSS was so simple.
One thing that would help immensely is if RSS readers/aggregators would actually cache the RSS feed and not download a new copy if they already have the most current one. I could go through my server logs and point out the most egregious problem aggregators if anyone's interested.
How am I supposed to fit a pithy, relevant quote into 120 characters?
doesn't stop me from hitting refresh a million times on Slashdot.
Doesn't Usenet deal with exactly this kind of problem? It's not quite instant, but neither is rss, since most people aren't polling non-stop.
Tim
Do most sites have some sort of limit on how many times you can access the RSS feed in a given period of time? It seems like limiting requests to once an hour would cut down bandwidth considerably. There are always those people who think they need up-to-the-second updates.
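A per-client limit like the one described here could be sketched roughly as follows. This is only an illustration: the class name `FeedRateLimiter`, the idea of keying on an IP address, and the one-request-per-hour default are assumptions for the example, not how Slashdot or any other site actually implements its policy.

```python
import time


class FeedRateLimiter:
    """Allow each client at most `max_requests` feed fetches per window.

    Hypothetical helper for illustration; a real site would likely also
    whitelist known proxies and expire idle clients.
    """

    def __init__(self, max_requests=1, window_seconds=3600):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = {}  # client id -> timestamps of recent allowed requests

    def allow(self, client_id, now=None):
        """Return True if this request is within the limit, else False."""
        now = time.time() if now is None else now
        cutoff = now - self.window_seconds
        # Keep only the requests that still fall inside the window.
        recent = [t for t in self._hits.get(client_id, []) if t > cutoff]
        allowed = len(recent) < self.max_requests
        if allowed:
            recent.append(now)
        self._hits[client_id] = recent
        return allowed
```

A server would call `allow(client_ip)` before serving the feed and answer with an error or a cached "slow down" response when it returns False.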
Hacker Media
rsstorrent -- distributed rss,echoing bittorrent?
So first BoingBoing gets in trouble because of all the RSS traffic.. and now they're about to be slashdotted. Tough luck
It probably won't help the sites to slashdot them :)
I'm sure someone's already thought of this, but what if the RSS reader was required to submit the "code" of the latest feed that was received? Then the only thing sent to the reader would be the more recent articles.
If they're serving pages in HTML (or whatever) anyway, who cares what the format is? Who cares what pulls the info off their server?
Oh, yeah, advertisers. RSS could be an annoying reminder to them that the internet used to be ad-free.
What you're seeing right now are teething troubles. Nothing more, nothing less. The bandwidth and consumption experienced right now will be laughed off a couple of years from now as minuscule.
Take the BBC News website for example. On September 11th 2001 its traffic was way beyond anything it had experienced to that point. Within a year or so, it was comfortably serving more requests and seeing more traffic every day. Proof if it was needed that capacity isn't the issue when it comes to Internet growth, and won't be for the foreseeable future.
RSS is in its infancy. Just because people didn't anticipate it being adopted as fast as it has been that doesn't make it "doomed". By that rationale, the Internet itself, DVDs, digital photography, etc are all "doomed" too.
"Accept that some days you are the pigeon, and some days you are the statue." - David Brent, Wernham Hogg
Just what is the scaling behavior of RSS, anyway?
Instead of downloading the entire RSS feed every time, why not have aggregators indicate to the server the timestamp of the last time the RSS feed was downloaded, or the timestamp of the last item in the feed the aggregator knows about, and then the server can dynamically generate the RSS with only new content for that client. Increases processing load while reducing bandwidth, but processing time is what most servers have lots of, not to mention it's far cheaper to increase than bandwidth.
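As a rough sketch of the idea above, the server-side filtering step might look like this. The function name `items_since` and the item layout (a dict with a `published` datetime) are assumptions made for the example; how the client would actually transmit its timestamp is left open, as it is in the comment.

```python
from datetime import datetime, timezone


def items_since(items, last_seen):
    """Return only the items published after `last_seen`, newest first.

    `items` is assumed to be a list of dicts, each with a timezone-aware
    `published` datetime. `last_seen` is the timestamp the client reports.
    """
    newer = [item for item in items if item["published"] > last_seen]
    return sorted(newer, key=lambda item: item["published"], reverse=True)


if __name__ == "__main__":
    feed = [
        {"title": "Older story", "published": datetime(2004, 12, 8, 9, 0, tzinfo=timezone.utc)},
        {"title": "Newer story", "published": datetime(2004, 12, 8, 13, 0, tzinfo=timezone.utc)},
    ]
    client_last_seen = datetime(2004, 12, 8, 12, 0, tzinfo=timezone.utc)
    for item in items_since(feed, client_last_seen):
        print(item["title"])
```

The trade-off is the one the comment names: the server spends CPU building a per-client response in exchange for sending fewer bytes.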
Asynchronous event-driven models are the way to go for changing content. They're trickier to code, but require less bandwidth and are more responsive. Perhaps a bit of a privacy issue, at some level (registration with source), but easy-to-implement, failure-resistant distributed asynchronous networks have much applicability, not just to RSS.
There we go. You now have version control.
Keep copies of the RSS on the server for 30 days.
http://www.mysite.com/requestfeed?myversion=200
Diff the new version from the old version. Send what's changed.
How fucking hard is that people?
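For what it's worth, the diff step itself is short. Here is a minimal sketch using Python's standard difflib, assuming the server keeps old versions of the feed as text; the function name `feed_delta` and the version labels are made up for the example, and a line-based diff of XML is a simplification.

```python
import difflib


def feed_delta(old_feed, new_feed, old_label="v200", new_label="v201"):
    """Return a unified diff (list of lines) from old_feed to new_feed.

    An empty list means the client's version is already current.
    The version labels are hypothetical and only appear in the diff header.
    """
    return list(
        difflib.unified_diff(
            old_feed.splitlines(),
            new_feed.splitlines(),
            fromfile=old_label,
            tofile=new_label,
            n=0,          # no context lines: send only what changed
            lineterm="",  # inputs have no trailing newlines
        )
    )


if __name__ == "__main__":
    old = "<item>Story 1</item>\n<item>Story 2</item>"
    new = "<item>Story 3</item>\n<item>Story 1</item>\n<item>Story 2</item>"
    print("\n".join(feed_delta(old, new)))
```

The harder parts are outside this sketch: the client has to apply the patch reliably, and the server has to store every version a client might still claim to have.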
RSS feeds are meant as a way to strip all the nonsense from a site and offer easy syndication, right? Basically, present the relevant news from a full-fledged webpage in a smaller file size? If such is the case, this isn't an RSS issue, really. I see it more as a bandwidth issue. I mean, people are going to get their news one way or the other.. either with a bunch of images and lots of markup via HTML or with just the bare minimum of text and markup via RSS. I would prefer RSS over HTML any day of the week! But perhaps RSS makes syndication TOO simple. Thus everyone does it and that eats additional bandwidth that normally would be reserved for those browsing the HTML a site offers.
And you could implement bans on people who request the RSS feed more than X times per hour as someone suggested (Doesn't /. do this?), but I don't think that gets around the bandwidth issue. I mean, those who want the news will either go with RSS or simply hit the site. Again, RSS is the preferred alternative to HTML.
So here's my suggestion.. go to nothing but RSS and no HTML!
What is your penile percentile?
"Is RSS Doomed by Popularity?"
:)
"Is Instant Messaging Doomed by Popularity?"
"Is E-Mail Doomed by Popularity?"
"Is Usenet Doomed by Popularity?"
"Is The Internet Doomed by Popularity?"
"Is Linux Doomed by Popularity?"
"Is Apple Doomed by Popularity?"
"Is Netcraft Doomed by Popularity?"
"Is Sex with Geeks Doomed by Popularity?"
... it's always too busy.
I hereby place the above post in the public domain.
Following along the same line of reasoning, why not have the RSS reader send one request, and then changes are pushed to the reader after that? The reader can cache the change so if the user hits reload they get the most recent cache rather than hitting the server again.
There are several ways to mitigate the bandwidth issues. First, all aggregators should support gzip compression and the HTTP last-modified and etags headers. That'll take care of a lot of the problems. The other solution is to get people to use server based aggregators, like Bloglines, which only fetch a feed once per iteration, regardless of how many subscribers there are. As a bonus, there are several things that server-based aggregators can do that desktop based aggregators can't do, like provide personalized recommendations. I like this solution, but of course I'm biased since I'm the founder of Bloglines. :)
New subscribers would receive the initial copy of the feed via traditional unicast TCP, because that would be the least CPU-intensive way of handling a few requests at a time.
A caching system won't work for the same reason web caches have never caught on in the US - people are terrified of being sued to smithereens for potential copyright infringement. Even if any case would be thrown out of court instantly (by no means certain in the US) the costs would be prohibitive and malicious plaintiffs rarely ever get asked to pay costs.
The main problem with the multicast solution is that although multicasting is enabled across the backbone, most ISPs disable it - for reasons known only to them, because it costs nothing to switch it on. Persuading ISPs to behave intelligently is unlikely, to say the least.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Nothing a little tweaking can't fix...
http://blog.glennf.com/mtarchives/004540.html
I love RSS. I use it all the time with Firefox, Thunderbird and Trillian Pro. I just adore it.
Kamran A
Just an idea that popped into my head without further thought: why not RSSoP2P (RSS over P2P)? Sharing streams across peer users.
Although the caching solution seems intriguing, the onus should really be on the aggregator authors to do at least local caching for RSS access between "refresh" intervals and even better, use HTTP conditional GETs. It's also important to use sane default "refresh" intervals and constraints.
During our product's development, our debugging refresh interval was 5 minutes and hardcoded to Slashdot. As you can imagine, it didn't take us long to discover Slashdot's unique banning mechanism -- it woke us up to the problem of letting people check feeds way too often (this was also before we had implemented local caching).
However, if this bandwidth issue keeps getting worse, someone like Akamai is certainly going to think of a corporate solution.
Everyone loves RSS. If the sites with lots of users taking up bandwidth ask for some donations to keep it alive, I'm sure they'll get some help.
Interweb fanatics tend to be very generous.
As another poster has pointed out, banning users who check too frequently is an excellent fallback. A tiny site won't know to install the software, but it won't be an issue for a tiny site.
People polling my site for updates via RSS would be good for my bandwidth usage, because users will be retrieving a small amount of data and noticing nothing has changed... instead of doing a full access to my site, requesting images, etc.
That would be savings at the long term. Or not? What's the deal going on here?
The Podcasters need it too. I'm subscribed to a couple dozen feeds and have well over 4GB of files in my cache right now.
The biggest problem with Bittorrent and podcasts is that the RSS aggregators need to be Bittorrent aware. Unfortunately, few are.
A firewall can not protect you from yourself. Turn off what you do not need. Do not use the firewall to do your work.
Are Question Marks
sulli
RTFJ.
Seems like bittorrent, or a bittorrent-alike protocol would be useful here. Turn the RSS feed into a tracker/seed and then all it has to keep track of is who has the latest version of the content and it could redirect feeders to each other, always preferring the latest updated version. Eventually, you will have the same scaling problems that bittorrent has (single tracker), but at least you stretch things out a few months or a year until a better solution comes around.
When I use my computer, I'm either sshed in or logged in locally through GDM. How can I record how long I'm on over a certain period of time? I'm using Ubuntu.
Okay, what about a distributed RSS feeder system?
Say you have 100 people who want to get feeds from 10 sites. Regular RSS has 100 people hitting each of those sites once per minute. They all get the same data.
However, if you have a system where a group of people all want the same site feeds, you could group them by their interest in sites. Within the pool, only x% of the sites interested in, for example, eweek.com would request the feed. Then, they would be available to distribute the feed to y% of the sites who would distribute to z% of the sites until everyone in the pool is up to date.
The same goes for other sites. Another subgroup in that pool would pick up slashdot.org and distribute that out to the rest of the pool in waves.
It doesn't change the overall bandwidth used, but it does decentralize it a great deal.
Alito: A vote for Alito is a punch in the eye to put that bitch back in her place!
Ex: Slashdot RSS via Coral
This seems like a fair method of reducing the amount of throughput... only permitting a certain number of requests per hour per user, or whatever time period one wishes.
I'm pretty sure there are other ways of going about it, though.
1. Send a header which specifies when the feed was last downloaded from this location. If I downloaded the feed an hour ago, I don't need the feed to contain articles which occurred half a day ago.
2. Include fewer articles in the RSS.
3. Push the RSS updates to users, using XMPP or similar, as sites like PubSub.com are starting to do.
But realistically, what would you want more: (a) someone fetching 1kb of RSS once every 10 minutes, or (b) someone fetching 10-50kb of HTML and assorted crap once every 10 minutes? It seems to me that for every RSS download a user makes, they're actually saving you bandwidth!
Karma: It's all a bunch of tree-huggin' hippy crap!
When I want updates from sites, I subscribe to an email feed, and stick it in its own mailbox. I agree that some standardized format and display would be nice, but you can send XML over email too, so what's needed is a reader that I can point to an IMAP mailbox full of XML mails.
An alternate approach would be to do the same thing with a news server. Why keep refreshing a feed for updates instead of letting it notify you when it has updates?
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
How about the development of an RSS torrent? Let's decentralize the entire concept.
I, for one, welcome our old USENET overlords!
I dub thee... Sir Phobos, Knight of Mars, Beater of Ass.
If these sites would just take advantage of HTTP Compression and buy some cheaper bandwidth they wouldn't have a problem.
These places are all so uppity they'd never consider anything less than highlevel colocation with high bandwidth costs, or in-house connections. And while that's fine for what they normally do, they shouldn't be complaining about RSS eating up much bandwidth if they won't consider all their options.
I really hate it when my web site gets a lot of traffic. I publish my content in hopes that nobody will look at it, but because of my stupidity I took advantage of that little "live bookmarks" feature of Firefox, and ever since I've regretted it. Granted, I *could* put advertisements on the pages linked by the live bookmarks, but that would be too easy. I'd rather sulk and bitch about my bandwidth usage because there are no providers out there that offer more than 500MB transfer for $0/month.
Please stop being interested in my content. Please?
Slashdot user GaryM posted a related question elsewhere about 20 months ago. At that time, in that forum, commenters dismissed his proposed solution, the use of NNTP, on the grounds that NNTP is deficient, but others continue to see NNTP as a possible solution nevertheless.
A lawyer & digital forensics examiner. Also an expert on open source software (OSS).
Now if only they'd bring back the $$$ from the mid 90s too.... :)
Is this whole in Korea.. to be the new in Soviet Russia..? Tell me it's not.
What is your penile percentile?
Every complaint about this that I've investigated has turned out to be either a broken RSS reader or an IP that's proxying a ton of traffic (which we usually do make an exception for).
Oh, and if you want to read sectional stories in RSS, then:
Slashdot's RSS traffic, like Boing Boing's, is huge, and blocking broken readers has saved us a ton of bandwidth, which of course means money. We were one of the first sites to do this but (as this story suggests) you'll see a lot more sites doing it in the future. I think our policy is fair.
How big are RSS files normally? I'd be surprised if the bandwidth involved in tracking and coordinating a whole bunch of clients would be significantly less than the RSS itself.
By the time you've told a client to "go and ask these other clients" you may as well have just sent it the RSS file.
Boffoonery - downloadable Comedy Benefit for Bletchley Park
Nick's got it covered
Devise, Repair, Solve, Build
Create a new first level domain ( like alt, comp, talk etc ) named "rss" and use an extra header to identify the originating rss feed URL. The latter header could be used by the RSS/NNTP reader to select which article bodies to download and to verify each RSS entry to identify fake posts.
I see some people talking about bit torrent like networks for rss feeds. Really, why not just zip it. If it were normal text, I'd expect to shrink file size down to 10% but we're talking about XML which has a lot more redundancy.
Of course we blocked your IP when you hammered our server. And we'll do it again. Duh. We monitor abuse on the whole site, not just RSS.
This still baffles me. BitTorrent works great for distributing media like ISOs. Folks, it can distribute "little" stuff, too.
A content creator (say Slashdot) has webpages and it has an RSS feed. They create a torrent for each page. They sign the RSS file and each torrent (and its content) with a private key. They post their public key on their homepage.
Now, you can cache the RSS file on other sites that support you yet the users can still be confident that it really came from you. Inside the RSS file, users can try to get the webpage (and all its images, etc.) through the torrent first. When the page loads locally in your browser, it could still go out and get ads if you are an ad sponsored site.
If you are a popular site and have a "fan base", you should have no problem implementing something along these lines. If you are a site that has these problems, you are probably popular and have a fan base. Given the right software and the buy-in from users, the problem solves itself.
The main issue is that some clients check the site far too often or download the whole content without checking the change time. Ban those and you will be fine. It's the same kind of issue dyndns.org or some other site was having with Linksys clients.
Internet2
Well, RSS was simple, and everything you're talking about (caching, push-based update, etc) are application-level issues. Even though that stuff is defined in HTTP 1.1, it took years for HTTP 1.1 to come out.
If the web started with HTTP 1.1, it would never have gone anywhere because it's too complicated. There are parts of 1.0 that probably aren't implemented very well.
If you want to improve things, adopt an RSS reader project and add those features.
Bloglines is quite good, and I appreciate that it's very chummy with Firefox, but I'm not 100% satisfied with it. I wish I could articulate what bugged me about it (especially to the founder, heh), but I find it's slower for me to check bloglines than it is to just swoop through my bookmarks every once in a while.
By far the best RSS experience I've had has been with the Konfabulator RSS widget, which pops up when it finds a new entry and hides away when there's nothing new. It's elegant and simple. Bloglines is a fantastic service for aggregating large amounts of information, but it's still not very efficient at providing it to me quickly.
On topic, the problem with RSS seems to me to be that it's a solution in search of a problem that, in turn, creates more problems (through the non-caching, etc). I suppose it's useful for keeping tabs on all those LJ blogs you read (but don't admit to reading), but, honestly, if the BBC posts a news item about, say, a nuclear explosion in Karachi (apologies to both Indians and Pakistanis reading this), I don't want to wait for my RSS client to cycle. I guess there's some element of push and some element of browse that need to be mixed together.
Why does everybody seem to feel the need to have their last 20-25 posts in their feed? It's just going to mean wasted bandwidth, especially for websites that update infrequently. I'd say the last five posts would be sufficient for most weblogs and 10 for news sites like Slashdot and The Register.
Feed readers are the other issue. Many set their default refresh to an hour. I use SharpReader, which has an adequate 4-hour default. I adjust that on a per-feed basis. Some feeds update once per day, and one load a day is all I need to get the latest.
I just fired up ethereal and refreshed my RSS reader. Out of the dozen or so feeds I monitor, a few of them are using Etags and sensible cache-control headers, so I just get 304 Not Modified. Of the rest, not a single one is compressing even though my client is specifying gzip and deflate in its Accept-Encoding header.
HTTP compression will work even better here than it does for regular pages - RSS is basically all text so every response is going to be compressible. Looking at a handful of my feeds, some quick messing about with wget & gzip gives me an average compression ratio of 3:1. That's a 66% reduction in bandwidth utilization. If just half of your clients support HTTP compression, it's still a significant savings.
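The wget-and-gzip measurement described here can be reproduced with a few lines of Python. This is a sketch only: `compression_savings` is a made-up helper name, and the sample payload below is synthetic, so its ratio says nothing about real feeds.

```python
import gzip


def compression_savings(payload: bytes):
    """Return (compressed_size, fraction_saved) for a feed body.

    fraction_saved is 1 - compressed/original, so a 3:1 ratio is about 0.67.
    """
    if not payload:
        raise ValueError("payload must not be empty")
    compressed = gzip.compress(payload)
    return len(compressed), 1 - len(compressed) / len(payload)


if __name__ == "__main__":
    # Synthetic, highly repetitive sample; substitute the bytes of a real feed.
    sample = b"<item><title>Story</title><link>http://example.com/</link></item>\n" * 200
    size, saved = compression_savings(sample)
    print(f"{len(sample)} bytes -> {size} bytes ({saved:.0%} saved)")
```

Running it against the saved body of an actual feed would give the figure for that feed. Savings only materialize for clients that send an Accept-Encoding header the server honors.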
Now, the article is talking about poorly designed aggregators that don't check whether the content's changed (I'm assuming he's talking about Etags). There's not much you can do about that, but using compression for capable clients sure seems like it would be a good thing.
Many, many bandwidth issues that were HTML-related were solved by incorporating proxies between the viewers and servers. I fail to see why you couldn't do the same with index.xml.
"Draco dormiens nunquam titillandus."
I'd be interested in seeing how many of these hits are for complete feeds rather than If-Modified-Since the last time it was downloaded. I suspect that if the RSS readers were behaving like nice User-Agents, we wouldn't see such reports.
Perhaps particularly offending User-Agents should be denied access to feeds. If I saw particular User-Agents consistently sending requests without If-Modified-Since, I'd ban them.
My car gets 40 rods to the hogshead, and that's the way I likes it!
What's the problem here, is everybody loading the full feed each time?
Wouldn't a client include a If-Modified-Since HTTP header in the GET request?
We're talking 200 bytes for a not-modified query.
Is it these 200-some-odd byte requests that people are complaining about?
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
Does Atom use significantly more or less bandwidth than RSS?
Hail Eris, full of mischief...
E pluribus sanguinem
A single peer to peer ad-hoc worldwide bittorrent style feed through which RSS stories are passed and spread. A great vision for the future.
The GeekNights podcast is going strong. Listen!
RSS is such a valuable service that not having it is unacceptable. Perhaps it could provide a better argument for Multicast. Service providers should all seriously accept the need for a widescale multicast solution. It's technology that we have today, but nothing substantial seems to be emerging in terms of native connectivity. This kind of service would provide a great delivery mechanism for traffic of this nature.
If I load a page through my Firefox, all the advertisements get blocked. So they surely aren't getting any revenue from my downloading of the entire HTML.
Meanwhile, if I load the story page via the RSS reader in Thunderbird, I can't block the ads. :-)
So clearly it can work both ways.
Karma: It's all a bunch of tree-huggin' hippy crap!
Why is there such concern over accessing RSS feeds as opposed to accessing the web site? Take for instance Slashdot: as of this writing, the main page is 65K and the RSS feed is 14K. Isn't this the case for most websites? So why the big fuss? If people are continuously refreshing the RSS feed, at least less bandwidth is consumed than if they were continuously refreshing the main page.
... Or is this one of those things where geeks have become so enamored with the technology that they go completely overboard with it? Are people refreshing the RSS feeds every two seconds or something?
"Slashdot's RSS traffic, like Boing Boing's, is huge, and blocking broken readers has saved us a ton of bandwidth, which of course means money."
So's using correct HTML, and CSS.
Anyone else read it as "Is RMS Doomed by Popularity?"
Now that would be something.
I wouldn't doubt that eventually someone will build an RSS caching device & sell it to the corporate market. Given how big a drain RSS is to the supplier, the corporate market has the money and determination not to permit it to become a problem for them.
Chip H.
for well over a year http://scottraymond.net/archive/4745#4745
I use the Sage RSS reader extension for Firefox, and my web sites' feeds use Last-Modified and ETag headers which Sage respects. If I have 10 feeds from my site and none have updates, Sage merely checks the header and the server sends back a "HTTP/1.0 304 Not Modified". If all RSS readers cached feeds and if all servers serving RSS feeds used Last-Modified, then the load drops massively.
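The client half of that exchange is small. Below is a sketch of the bookkeeping a reader would need, with the network call left out so it stays self-contained. The function names and the cache layout are assumptions for illustration and are not taken from Sage or any other reader.

```python
def conditional_headers(cached):
    """Build request headers from whatever validators were cached last time."""
    headers = {"Accept-Encoding": "gzip"}
    if cached.get("etag"):
        headers["If-None-Match"] = cached["etag"]
    if cached.get("last_modified"):
        headers["If-Modified-Since"] = cached["last_modified"]
    return headers


def apply_response(cached, status, headers, body):
    """Return the cache entry to keep after a fetch.

    On 304 the server sent no body, so the existing entry is reused as-is.
    Otherwise the new body and its validators replace the old entry.
    """
    if status == 304:
        return cached
    return {
        "etag": headers.get("ETag"),
        "last_modified": headers.get("Last-Modified"),
        "body": body,
    }
```

The first fetch sends no validators and stores whatever ETag and Last-Modified come back. Every later fetch echoes them, and an unchanged feed costs only the headers of a 304.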
This gets a lot of caching behavior automatically.
- David A. Wheeler (see my Secure Programming HOWTO)
Everybody writing an RSS client or server script should read this and make it one of their main priorities.
I imagine even more bandwidth could be saved if the next version of the RSS or ATOM standards mandated rsync support.
The roots of education are bitter, but the fruit is sweet.
--Aristotle
One thing that would help is if people would stop stupidly going gaga over data wrappers that contain multiple times more metadata than data. XML (which RSS really is) and related technologies are blatantly dumb ideas - both in terms of reinventing a wheel that never needed to be invented in the first place, and in terms of being a massive waste of bandwidth.
-- Ed Carp, N7EKG erc@pobox.com PGP KeyID: 0x0BD32C9B What I'm up to: http://intuitives.mine.nu
A caching system won't work for the same reason web caches have never caught on in the US - people are terrified of being sued to smithereens for potential copyright infringement. Even if any case would be thrown out of court instantly (by no means certain in the US) the costs would be prohibitive and malicious plaintiffs rarely ever get asked to pay costs.
Maybe the caching is just so damn good that you don't notice it. UIUC has a transparent web cache that I doubt 90% (of ~8,000) of the dorm students realize is there.
Also, Akamai is everywhere now. Is it not a caching solution?
I don't think web caches have failed... I just don't think they've been applied to the RSS problem.
Well, I've been thinking about this since RSS first came on my radar a few years ago, and it seems to me something like Jabber might be part of the solution.
I.e., instead of polling slashdot.org every hour, you maintain a persistent connection to your local Jabber server (at your ISP perhaps), which registers with slashdot's Jabber server. When a new story is published, slashdot's server notifies all the registered servers with the new story, which then distribute the content to each local news reader.
It would look and feel just like RSS readers do today.
And you might think "that's too complicated".. well, RSS today works over an HTTP server that you have to install and maintain, and you don't worry about the details of HTTP, why not a Jabber server?
I wish folks would think of this stuff BEFORE they start using RSS, but what can ya do.
Also, BitTorrent could be involved somehow to reduce bandwidth even more. For instance in the example above, Slashdot wouldn't have to distribute to EVERY listener, just enough for them to start downloading from each other.
All it takes is for a couple of the big RSS reader authors to add this and it will happen.. you just gotta have the guts to try. Next version of Mac OS will have Jabber libraries built-in I believe.. there's your chance!!!!
http://boingboing.net/stats/ says that Firefox contributes to 31.5% of user agents :-0
Push, if memory serves right, kept the connection open "forever" and when it was time for an update, the server just dropped the new content down the pipe. This may not be an appropriate solution, really...
:)
Since the whole thing as it is now is just HTTP requests anyways, I don't see why
a) clients couldn't poll at sane intervals, and
b) something like if-modified-since couldn't be used. Web servers have it out of the box if you serve static RSS manually, otherwise it is just one extra header in your CGI. And on the client side it is as easy, just one simple header to test against *first*.
I mean, come on.
Ok, so I haven't actually done any research. But it would surprise me greatly if those things are being done and there still is a problem.
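On the server side, the "one extra header" check for a dynamically generated feed might look roughly like this. It is a sketch under assumptions: `respond` is a hypothetical function name, the feed's modification time is taken to be a UTC datetime the script already knows, and ETag handling is omitted.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(feed_modified, if_modified_since=None):
    """Return (status, headers) for a feed request.

    `feed_modified` must be a timezone-aware UTC datetime.
    `if_modified_since` is the raw header value from the client, if any.
    """
    # HTTP dates have one-second resolution, so drop microseconds first.
    feed_modified = feed_modified.replace(microsecond=0)
    if if_modified_since:
        try:
            client_time = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            client_time = None  # unparseable header: fall through to a full response
        if client_time is not None:
            if client_time.tzinfo is None:
                client_time = client_time.replace(tzinfo=timezone.utc)
            if feed_modified <= client_time:
                return 304, {}
    return 200, {"Last-Modified": format_datetime(feed_modified, usegmt=True)}
```

A script would send the feed body only on the 200 branch. On 304 it sends just the status line and headers.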
Spine World
The problems with pointcast were more than just technological. Intercast was a better push technology for the pre-broadband era, and even now with HDTV and cable/satellite. It still may be a good choice.
It was meant for syndication. So that one website could gather syndicated news from other sites. It was not meant for individual readers to use it as a news update service. Simply using an appropriate protocol would solve this problem, but due to the blogtard community, this will never happen. And so RSS is doomed to be used stupidly just like it is now.
Conditional GET requests and centralized feed reading services like the ones mentioned in this post are important to keep syndication bandwidth down. Also, beating people that update feeds every 5 minutes is good too.
"I'll say it again for the logic-impaired." -- Larry Wall.
Back in September Microsoft blogging evangelist Robert Scoble warned that RSS is broken, saying the sky was falling and RSS bandwidth usage was forcing Microsoft to skinny down its feeds. Turns out it wasn't quite true. Microsoft's IT folks thought 400KB feeds were excessive, and RSS feeds are no big deal compared to 106 million downloads of the 75MB SP2 update. But the ensuing debate produced some useful discussion among RSS enthusiasts about ways to make clients smarter and give more server-side control. See the writeup at Netcraft (Slashdot is noted as an early adopter).
RichM
Data Center Knowledge
This might be a good time to point out that I've often had trouble with Slashdot's RSS feed. I use the News plugin for Trillian Pro to access it every 32 minutes... Yet I'm often being banned. Have you heard of problems with this reader before? Or is it slashdot that's broken?
Use RSS (or rather, Atom) over XMPP (Jabber!)
Like, Duh!
I have thought that some form of randomizing function in an RSS reader would be a good idea, so that the readers aren't all hammering at the door at the same time (i.e. the top of the hour).
Don't know if that would make much of a dent or not.
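A randomized refresh of this kind takes only a few lines. The sketch below is an assumption about how a reader might do it; the function name, the one-hour base interval, and the 25% spread are all made up for the example.

```python
import random


def next_poll_delay(base_seconds=3600, jitter_fraction=0.25, rng=random):
    """Return the delay before the next poll, spread around the base interval.

    With the defaults, each delay falls between 45 and 75 minutes, so
    clients that started in sync drift apart instead of all arriving
    at the top of the hour.
    """
    spread = base_seconds * jitter_fraction
    return base_seconds + rng.uniform(-spread, spread)
```

This does not reduce the total number of requests. It only flattens the peaks, which matters when a server is sized for its busiest minute.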
You've just made me a subscriber again!
The internet community had better start looking at how to integrate BitTorrent into the internet itself. It could very well save it from choking on itself. I believe that eventually, in return for using the net, you should be required to spread the love. This model also makes it a bit harder for illicit enterprises to operate.
411 Y0UR 8453 4R3 8310NG 70 U5!! -NSA
I wonder how long it takes to come into effect. Still just index.rss here.
Close enough to be a dupe? You Decide.
Nevertheless, this is not an issue, but like the unwashed shrills squawking that the end of Social Security is nigh, RSS is far from being dead. The issue is that ignorant (maybe I should say 'stupid') people did not bother to implement the spec properly in their RSS reader code. I'm not talking about the RSS spec, but the HTTP spec. This is a simple two-step process (credit Charles Miller):
If the RSS feed has not been updated since you last polled, you will get a 304: Not Modified in response, but no RSS feed (because it has not changed, duh).
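On the reader side, the usual way to get that 304 is to save the validators from the last response and send them back on the next poll. A rough sketch using Python's standard `urllib`; the `cache` dict and the function names are hypothetical, and this is one plausible reading of the two steps, not Charles Miller's text:

```python
import urllib.error
import urllib.request


def conditional_headers(cache):
    """Build request headers from the validators saved on the previous poll."""
    headers = {}
    if cache.get("etag"):
        headers["If-None-Match"] = cache["etag"]
    if cache.get("last_modified"):
        headers["If-Modified-Since"] = cache["last_modified"]
    return headers


def poll(url, cache):
    """Fetch url; return the new body, or None if the server answered 304."""
    request = urllib.request.Request(url, headers=conditional_headers(cache))
    try:
        with urllib.request.urlopen(request) as response:
            body = response.read()
            # Remember the validators so the next poll can be conditional.
            cache["etag"] = response.headers.get("ETag")
            cache["last_modified"] = response.headers.get("Last-Modified")
            return body
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: no feed body was sent
            return None
        raise
```

The first call with an empty `cache` is an ordinary GET; later calls only cost a few hundred bytes of headers when nothing has changed.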
It's like in The Army, you know--The Great Prince issues commands, founds states, vests families with fiefs. Inferior people should not be employed (creating broken RSS readers).
Yeah, right.
I assume the complainers are using it?
51894b boingboing.rss.xml
17842b boingboing.rss.xml.gz
No, I did not read the f***ing article!
But, the folks involved with RSS / Atom are "wire protocols should be xml" types who like the idea of using an XML-RPC call too much to give it up easily.
It's too bad, since it doesn't really need to be death to a site to have too many people subscribe. If I post an article on Usenet, 100 million people could read it tomorrow and it wouldn't cost me a cent. There are some problems with it, but problems that could have been solved with a lot less work than what it's going to take to fix syndication now.
Here is some more of my ranting about this.
http://www.informatik.uni-oldenburg.de/~ulli/why-rss-sucks.html
Your suggestion is precisely what is defined by RFC3229+feed (i.e. an RSS-specific extension to RFC3229, "delta encoding for HTTP"). I maintain a list of implementations of RFC3229+feed on my blog. You can also find some empirical evidence showing massive bandwidth savings as a result of RFC3229+feed use.
This is a well known and "solved" issue...
bob wyman
first and foremost, use a publish/subscribe system. no more polling! jabber may be a good framework for this. upon joining an rss subscription you get a refresh and after that you get deltas.
as has been mentioned many times already, a swarming delivery service would be helpful.
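The refresh-then-deltas part of the publish/subscribe idea can be shown without any network at all. A toy in-memory sketch; `FeedChannel` and its methods are invented here and have nothing to do with Jabber's actual pubsub API:

```python
class FeedChannel:
    """Toy publish/subscribe channel: full refresh on join, deltas afterwards."""

    def __init__(self):
        self.items = []
        self.subscribers = []

    def subscribe(self, callback):
        """Register callback and send it the current items as a refresh."""
        self.subscribers.append(callback)
        callback(list(self.items))

    def publish(self, item):
        """Store a new item and push only that delta to every subscriber."""
        self.items.append(item)
        for callback in self.subscribers:
            callback([item])
```

The point is that the server does work once per new item rather than once per poll, so a quiet feed costs nothing no matter how many subscribers it has.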
Yeah, "RSS was a really stupid protocol".
As was HTTP, and the idea of putting protocol specifiers in names, and Napster, and Microsoft Dfs and a dozen other protocols which were designed to varying degrees of poor.
People invent these non-scaling, incredibly wasteful protocols that seem like they work fine for screwing around with their three buddies when you're willing to dedicate $300 worth of server hardware and 1 Mb/s of network bandwidth per user.
But when you try to handle hundreds of thousands of users for $1 each, those protocols won't ever work.
Of course, if you're a physicist, or a freshman, or an MBA, you're likely to assume that one, ten, a thousand users, it's all the same.
And never stop to think about tens or hundreds of thousands of users.
You'll need to parse the file every time someone asks for it. So you're just trading excessive bandwidth usage for excessive CPU load.
I've always been held back from using RSS by the fact that there is no way to syndicate the "page of the day". For example, as I write this, it is Thursday in Europe and Australia but Wednesday in California and points west. Whatever time of day the feed is retrieved, it's going to be wrong for someone.
Do I have to create 28 separate feeds, one for each timezone, or has someone come up with a better solution?
Yes.
Pretend that something especially witty is here. Thanks.
Using the Coral Web cache for RSS is simple and requires NO modifications to web sites and RSS clients. The server load and bandwidth usage can be reduced in no time. See Making RSS scale with Coral.
It should be possible for the RSS feed server to store and read cookies... why not just store the date of last access and only send new items, like this rss-cache site does with IPs?
Well, Jabber is a solution for many tasks. For me, it's gathering RSS feeds. There is an RSS transport on a particular server; it gets the feeds about hourly and sends the new messages to me. If everyone would move their RSS needs to Jabber (and also help to develop the transports!), the bandwidth problem would cease to exist!
P2P RSS... hmm sounds familiar.... oh yeah... It's called usenet... nothing new here but extra xml...
--------------------------------- Born Again Bourne Again Believer: New Life, GNU/Linux Be Free!
Very simple solution: combine an RSS client with BitTorrent.
XML munches up bandwidth like a lardy butter lover. Yes, yes, RSS feeds are handy, but they don't actually do anything that couldn't be achieved with a much leaner binary format. It's 2004, we don't have byte compatibility issues any more.
See Roedy Green's (one-time comp.java.lang FAQ maintainer) excellent essay on why XML causes these problems.
Jamie, if you need help securing /., I have just your man. He is a smarty like you.
Possible solutions to this problem are emerging slowly, like RSScache (feed caching proxy) and KnowNow (event-driven syndication).
Didn't Usenet solve this sort of thing decades ago?
One might want to modernise it slightly so you get shorter lag times etc., but the basic distribution problem is the same and the algorithms chosen have worked well almost since the start of the internet without eating up the capacity.
sudo ergo sum
Was I the only one who liked the idea of using (let's say) 500K of bandwidth per hour while my screen saver was on anyway (and so I wasn't really doing anything) so that if I saw something interesting, I could have it NOW (and not in the 10 seconds it takes to load a page)? Was I the only one who actually liked their screensaver (yes, it was shiny and flashy, but when you look across the room at it, this stuff is important)?
Actually, this post is serious, not sarcastic. Are there any similar replacements?
-- Is "Sig" copyrighted by www.sig.com?
Hey, what about not sending the whole RSS xml (which could be huge) but just the diffs with previous? Like cvs or cvsup. This will save a ton of bandwidth.
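Server-side, the "just the diffs" idea boils down to picking the items the client hasn't seen yet. A small sketch under some assumptions: the feed is held as a newest-first list of dicts with an `id` key, and the client somehow tells the server the last id it saw (the field name and the transport are invented here; this is not the RFC3229+feed wire format):

```python
def items_since(items, last_seen_id):
    """Return only the items published after last_seen_id.

    items is ordered newest first, as in a typical feed. If last_seen_id
    is None or has already scrolled out of the feed, everything is sent.
    """
    new_items = []
    for item in items:
        if item["id"] == last_seen_id:
            return new_items
        new_items.append(item)
    return new_items  # id not found: fall back to the full feed
```

A client that is fully up to date gets an empty list back, which is about as small as a response can get.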
I don't get the popularity of RSS.
I'd personally rather have content sans a GUI, but most people consider going from a web site to text a step backwards.
Some sites also get RSS feeds from other sites, but why bother? Why not just go to that other site? Usually they do a better job.
So, what is the appeal or am I misunderstanding RSS?
They already started doing this on AListApart a while back.
Can't happen a moment too soon!
Organic free-range music... yum!
Could someone please explain?
I did some measurements across several hundred thousand of the most prominent RSS feeds, and I found that only a few actually return a compressed feed when so requested.
On average, compressed feeds are 30.42% of the size of the original, as you can see in graphical form here.
Better support for mod_gzip would certainly help to reduce the impact of RSS polling, but then again so would proper use of conditional get.
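Anyone who wants to check the compression claim against their own feed can do it with the standard `gzip` module. A quick sketch; the sample feed below is synthetic and very repetitive, so it will compress better than a real one (for comparison, the Boing Boing file sizes quoted earlier in the thread work out to 17842/51894, roughly 34%):

```python
import gzip


def gzip_ratio(payload: bytes) -> float:
    """Return the gzip-compressed size as a fraction of the original size."""
    return len(gzip.compress(payload)) / len(payload)


# Synthetic stand-in for a real feed; swap in the bytes of your own file.
ITEM = ("<item><title>Story {n}</title>"
        "<link>http://example.com/{n}</link>"
        "<description>Some text about story {n}</description></item>")
feed = ('<rss version="2.0"><channel>'
        + "".join(ITEM.format(n=n) for n in range(50))
        + "</channel></rss>").encode("utf-8")

print(f"{len(feed)} bytes raw, {gzip_ratio(feed):.0%} of that after gzip")
```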
Really, it's like people look for things to complain about.
What do you think uses more bandwidth, 20,000 people loading a webpage with the latest news, or 20,000 people loading an RSS feed?
Bit Torrents.
When we showed our corporate overlords what RSS was, the first thing they asked was "How can we get ads in there" ...
It took some time to explain why RSS was good without ads.
Glenn Fleishman, of Wi-Fi Networking News has written a script to throttle the poorly-behaved aggregators and writes about it on his personal blog.
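I haven't seen his script, so the following is not it, but the general shape of a throttle is simple enough. A hypothetical sketch that allows each client one full fetch per interval; the class name, the 30-minute default and keying by IP or user agent are all assumptions:

```python
import time


class FeedThrottle:
    """Allow each client at most one full fetch per min_interval seconds."""

    def __init__(self, min_interval=1800, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        self.last_served = {}  # client key (e.g. IP address) -> last served time

    def allow(self, client_key):
        """Return True if this client may be served the full feed now."""
        now = self.clock()
        previous = self.last_served.get(client_key)
        if previous is not None and now - previous < self.min_interval:
            return False  # the caller decides what to send instead
        self.last_served[client_key] = now
        return True
```

What to answer when `allow()` returns False (a 304, a 403, or a tiny "slow down" feed) is a policy choice this sketch leaves open.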
I've seen many RSS URLs pull from a site's database to build the XML each time it's hit. This is fixed simply by creating a CRON job that builds the RSS XML on a periodic basis, then serving the resulting file. If you're just throwing a file back, then server bandwidth isn't as much of a problem, especially when you consider that browsers themselves cache files.
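Here's roughly what the cron-driven approach could look like. A sketch only: the item list stands in for whatever database query the site runs, and `build_feed` / `write_atomically` are names made up for this example:

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def build_feed(title, link, items):
    """Return RSS 2.0 bytes for a list of (item_title, item_link) pairs."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = title
    for item_title, item_link in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = item_link
    return ET.tostring(rss, encoding="utf-8")


def write_atomically(path, data):
    """Write to a temp file, then rename, so readers never see half a feed."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    os.chmod(tmp_path, 0o644)  # mkstemp creates the file owner-only
    os.replace(tmp_path, path)
```

Run from cron every few minutes, this leaves a plain file for the web server to hand out, and a static file also gets Last-Modified handling from most web servers without extra code.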
Steve Magruder, Metro Foodist
The second option seems safer for the server. At an interval of hours or days, the server could check to verify the existence of the client.
Beta is broken and the link to classic doesn't work. Stop wasting our time or there won't be anybody left here.
I'm reproducing this article from my own site (all the links are on the site):
Capacity planning and RSS
September 9, 2004
Robert Scoble points to MSDN having issues with full entry RSS. What it comes down to is a capacity planning exercise.
In his note, he says that RSS is broken. I personally believe that at issue is not whether RSS is working or not. RSS is working but it has complicated the bandwidth issue. At issue is the fact that RSS feeds are generally generating more traffic to a site. Because RSS readers are polling the site to check if a feed has been updated, the traffic patterns change, with increased numbers of spikes on an hourly basis. This is similar to some of the issues network administrators started facing when Pointcast first appeared.
There are a number of ways to mitigate the issue.
HTTP Conditional GET for RSS
First of all, one of the things to consider when using RSS is to create conditional HTTP headers on RSS feeds. This helps mitigate some of the impact by ensuring that feeds are only served if the content has changed.
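On the publishing side, the decision comes down to comparing the validators the client sent with the current ones. A simplified sketch, not tied to any particular web server: `respond` and `make_etag` are hypothetical names, and the date check is an exact string match, where a real server would parse and compare the dates:

```python
import hashlib


def make_etag(feed_body):
    """Derive a quoted ETag from the feed contents."""
    return '"' + hashlib.sha256(feed_body).hexdigest() + '"'


def respond(request_headers, feed_body, etag, last_modified):
    """Return (status, headers, body) for one feed request."""
    headers = {"ETag": etag, "Last-Modified": last_modified}
    if request_headers.get("If-None-Match") == etag:
        return 304, headers, b""
    # Only fall back to the date when the client sent no ETag at all.
    if ("If-None-Match" not in request_headers
            and request_headers.get("If-Modified-Since") == last_modified):
        return 304, headers, b""
    return 200, headers, feed_body
```

A 304 carries headers but no body, so an unchanged feed costs a few hundred bytes per poll instead of the full payload.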
Feed Compression
The next item to think of is to use compression when serving feeds. By doing so, one reduces the size of the payload, which ends up being much better in terms of managing bandwidth. In my own experience, because RSS is primarily text, I've seen a reduction of 80% of the bandwidth when delivering RSS feeds in a compressed format. That represents a fairly large gain in bandwidth that can then accommodate more users.
Change the polling schedule
The RSS 2.0 specification already offers a number of optional elements to give RSS readers a better idea as to when to get content. For example, the pubDate element offers information as to when a feed was last published, as does the lastBuildDate one. ttl (aka. time to live) can also be used to indicate to the software that this feed should live for a certain amount of time. Finally, skipHours and skipDays offer more pointers as to when RSS reader software should not poll. With all those mechanisms in place, it looks like a lot of flexibility exists in the format to accommodate scalability.
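To make this concrete, here is how a reader might honor ttl, skipHours and skipDays. A sketch under assumptions: `may_poll` is a made-up name, the 60-minute fallback when ttl is absent is my choice rather than anything in the spec, and `now` is expected to be in GMT/UTC, which is how RSS 2.0 defines the skip hours:

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def may_poll(feed_xml, last_fetch, now):
    """Decide whether a polite reader should fetch the feed again at `now`."""
    channel = ET.fromstring(feed_xml).find("channel")
    ttl_minutes = int(channel.findtext("ttl", default="60"))
    if now - last_fetch < timedelta(minutes=ttl_minutes):
        return False  # the cached copy is still within its time to live
    skip_hours = {int(hour.text) for hour in channel.findall("skipHours/hour")}
    skip_days = {day.text for day in channel.findall("skipDays/day")}
    if now.hour in skip_hours or DAY_NAMES[now.weekday()] in skip_days:
        return False  # the publisher asked not to be polled right now
    return True


SAMPLE = ("<rss version='2.0'><channel><title>t</title><ttl>30</ttl>"
          "<skipHours><hour>3</hour></skipHours>"
          "<skipDays><day>Sunday</day></skipDays></channel></rss>")
```

The reader needs the previously fetched copy of the feed to do this, so the hints only help from the second poll onward.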
When all else fails, reduce
If all of the above still fail, RSS publishers should look at reducing the size of their feeds. There are two ways you can do this. First, you can just say that you're not going to offer full-text feeds. This seems to be the option that Scoble hates. Another way to do things is to offer both abbreviated feeds and full-text feeds or offer more detailed feeds, as I do on TNL.net.
An important consideration when doing something like this is how to address them. By default, users who just use the RSS autodiscovery feature will only get the abbreviated feed. However, they still have the option to go and get the full-text version. The compromise here is that users who just want to subscribe quickly can do so at a lower bandwidth cost, while power users can seek out the fuller feed and subscribe to that. The result, in my experience, is that most people use the autodiscovery feature, grabbing the smaller feed. Some power users do seek out the fuller feed and subscribe to that instead (based on the numbers, I'm seeing 5% usage of the full-text feed as opposed to the default abbreviated one). This is a compromise solution that seems to accommodate everyone involved to date.
Final considerations
When publishing RSS feeds, your audience grows, which results in traffic growth too. One of the things to realize is that RSS feeds are generally stickier than the rest of a site. What this means is that, for every new subscriber you get, you will see an on-going increase in your overall site traffic stats. This is not a bad thing as messages emanating from your site do get a higher passive readership. One of the things that new syndication standards should consider is a follow-up on this. While RSS publishers know how many feeds are being pushed out, there is littl
Check out http://www.tnl.net/blog
Massive polling for updates leads to scalability problems? Big surprise! We need to learn that HTTP is not always the best technology for the job. Just-in-time content delivery requires a different set of tools. There's already an Internet-Draft for sending Atom feeds over XMPP (a.k.a. Jabber), and the same "publish-subscribe" technology could be used for RSS (or a smart service could translate to Atom so your client doesn't need to parse all those RSS formats). Check out PubSub.com for a real-life implementation of the basic concept (they track 3+ million feeds and notify you when a feed you're interested in has changed, and even do handy keyword-based monitoring). And one added benefit of using the XMPP pubsub extension is that these are all open protocols with many open-source implementations. In this problem-space at least, HTTP is so second-millennium!
In 2000 I tried to invent a spam-proof usenet. The result of my efforts was Miski. The idea of Miski was that users would have addresses on servers representing what are effectively RSS channels, and other users would subscribe to these channels through their servers. There would be a DNS extension for the naming of servers. Channels would have names like username@example.com/"Java Programming". The system would be spam-proof because your server would only send you what you had subscribed to. It would be "push", because as soon as you posted something to a channel, your server would pass the message on to the servers of those who had subscribed to your channel. Only the notifications would be push: ordinary http would be used to retrieve the actual content.
Miski also had the important concept of "reposting", whereby if you saw something you liked, you could press a single button in your client to repost the notification, so that any subscribers to you could know about the item being reposted, if they had not already heard about it from somewhere else. The presumption was that the client (or the reader's server) would trim out duplicates, so that people posting would have no inhibitions about reposting stuff that maybe many of their subscribers already knew about.
Miski was more than just an attempt to create scalable-push RSS, or a spam-proof equivalent of Usenet: it was a vision of the "global brain". Using posting and reposting, notification of a new "interesting" idea could spread very quickly from the inventor of the idea to almost anyone in the world likely to be interested in that idea, even if the inventor was not well known. We would all be like neurons in the brain, with signals passing from one person to the next as fast as possible. It was an attempt to solve the dual problems of "How can I tell the world what I have to say when I have to compete against the efforts of all those other people trying to tell the world stuff?" and "How can I find out new stuff that's really interesting to me from among all this junk that I am getting from all these people trying to tell stuff to the world?".
I asked the question "How fast is the Internet?" Although packets can travel from one computer to another in seconds, or even less, information can still take days, weeks, months or even years to travel from the person who created it to another person who is interested in it. One way to measure this is to consider how often you find a document on the web which is interesting, but which you did not know about, and which has nevertheless been available for months or years, and which would have been interesting to you even when it was originally posted on the web.
Sadly Miski was never implemented, and I reduced my ambitions to write Womcat Bookmarks, which attempted to be a less dynamic version of Miski, but has ended up being just another RSS reader.
Music: a super-stimulus for the perception of musicality. Musicality: a perceived aspect of speech.
blame XML!
That's XML for ya!
the best way is to optimize your rss feeds to a max of 10 items, and stick to TITLE and LINK fields only.
Tom's hardware had a feed that was over 500kb, and they wonder why they had bandwidth issues.
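Trimming an existing feed down like that is easy to script. A rough sketch with the standard library, assuming a plain RSS 2.0 document with no namespaced extras; `slim_feed` is a made-up name, and 10 items with title and link are just the numbers from the suggestion above:

```python
import xml.etree.ElementTree as ET


def slim_feed(feed_xml, max_items=10, keep=("title", "link")):
    """Return the feed with at most max_items items, each cut down to `keep`."""
    root = ET.fromstring(feed_xml)
    channel = root.find("channel")
    for index, item in enumerate(channel.findall("item")):
        if index >= max_items:
            channel.remove(item)  # drop everything past the newest max_items
            continue
        for child in list(item):
            if child.tag not in keep:
                item.remove(child)  # e.g. description, pubDate
    return ET.tostring(root, encoding="unicode")
```

Dropping the description makes the feed tiny but also a lot less useful to read, so this is a trade-off rather than a free win.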
During the Dot-bomb era, all these companies popped up. Remember them? The ones like Pets.com, medsonline.com, etc. The philosophy was that all this new technology was so going to permeate into every home in the world by 2000 - and every time someone needed a Tylenol for their headache they would be going online to buy it.
Turns out, as all of us know, that was dead wrong. But the companies, before going bust, kept yelling one message to the Telcos and IP providers - "WE NEED BANDWIDTH" and "WE NEED LOTS OF IT". Come on - there are literally millions of miles of fiber optics out there - each capable of handling TONS of gigabits of capacity.
Want reality? I don't know the exact numbers or anything, but I seem to recall from somewhere that over 60% of the fiber out there is DARK. Yes, that's right - DARK FIBER! Capable of handling those OC-192+s.
Ok, the laws of economics are supposed to tell us that if you have a demand the price goes up, and more of a supply and the price goes down. If carriers would only open more supply up (which, mind you, they already have), prices would drop for that bandwidth, and VOILA - we don't have bandwidth problems any more because the companies it's sucking on (Cnet, /., etc.) can afford more bandwidth!
It seems silly to me that we're fussing over bandwidth issues when we have literally gigabits of the stuff lying under the roads we drive every day that's not turned on. Now, if I could only get some of that optical stuff in front of my house I'd be doing good!
Yes, but that overhead would then be handled by the p2p clients, not by the central server. It's become Somebody Else's Problem. Not necessarily good for the internet as a whole, but a solution for the server
--LWM
The solution is twofold:
1: Conditional get(s) enable only new data to be sent.
2: Random (or pseudorandom) refresh. Most people set their readers to get headlines every 30 or 60 minutes & consequently sites are overwhelmed at the hourly & half-hourly marks. I personally set my reader to update at 100 minute intervals. I would support efforts of RSS developers to enact random requeuing (the server gives the reader a random time to recheck for updates).
BTW, Eweek had relatively the same response in September... Clicky
Cheers!
Daniel Lott,
Service Computers, LLC.
"Nobody goes there any more, it's too crowded."
Other quotes by Yogi (You've heard many of them.)
It's easier to be a result of the past, but more fun to be a cause of the future! http://www.spacefinancegroup.com/