Is RSS Doomed by Popularity?

Push by Phroggy · 2004-12-08 12:56 · Score: 5, Insightful

Remember all the hype about "push" technology back in the mid-nineties? Nobody was interested, but RSS feeds are being used in much the same way now. I'm thinking there are two significant differences: 1) with RSS, the user feels like they're in control of what's going on; with push, users felt like they were at the mercy of whatever money-grabbing corporations wanted to throw at them, and 2) a hell of a lot of people now have an always-on Internet connection with plenty of bandwidth to spare. When you've got a 33.6kbps dialup connection, you use the Internet differently than when you've got DSL or cable.

How much bandwidth does Slashdot's RSS feed use?

It looks like the RSS feed on my home page has a small handful of subscribers. Neat.

--
$x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
$x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;

Re:Push by Anonymous Coward · 2004-12-08 13:12 · Score: 5, Insightful

Pointcast sent way too much data at the time, and now we all have orders of magnitude more bandwidth.

Most of the problems come from a few older RSS readers that don't support Conditional GET, gzip, etc. With modern readers, there's essentially no problem (I've measured it on a few sites I run). Yes, they poll every hour or two, but the bandwidth is a tiny, tiny fraction of what we get from, say, putting up a small QuickTime.

There seem to be lots of people who freak out way too quickly about a few bytes. RSS sends some unnecessary data, but if you've configured things correctly, it's much smaller than lots of other things we do on our networks...
Re:Push by sploxx · 2004-12-08 13:14 · Score: 2, Insightful

Yes, maybe this way 'feels technically different', but if you have an RSS aggregator/news ticker applet whatever on your desktop, it usually hides the implementation details completely from the user. Do you really think of "ok, now my client makes a http request, that travels through the call hierarchy of the libraries, gets a tcp socket open, gets a kernel call of the driver to send a SYN packet??". Even if I may have detailed knowledge about the inner workings of an application, I usually don't care about it.

BTW, it's the same about eMail and another good reason why the SMTP/POP suite should be replaced soon (besides spam).
Re:Push by ikewillis · 2004-12-08 13:31 · Score: 2, Interesting

http://beacon.sf.net/ tries to do this using UDP and filesystem monitoring. It waits for the RSS document to change then sends a UDP datagram to notify everyone that a new version is available. It's better than everyone polling the server via HTTP anyway.
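To make the notify-instead-of-poll idea concrete, here is a minimal sketch of a UDP change notification. It is not Beacon's actual protocol: the `UPDATED <version>` message format and all function names are assumptions made up for illustration.

```python
import socket

def encode_notification(feed_version):
    """Build the (hypothetical) datagram payload announcing a new feed version."""
    return f"UPDATED {feed_version}".encode("ascii")

def parse_notification(data):
    """Return the announced version, or None if the datagram is not a notification."""
    text = data.decode("ascii", errors="replace")
    kind, _, version = text.partition(" ")
    return version if kind == "UPDATED" and version else None

def notify_subscribers(sock, subscribers, feed_version):
    """Send one small datagram to each (host, port) pair in subscribers."""
    payload = encode_notification(feed_version)
    for address in subscribers:
        sock.sendto(payload, address)

def wait_for_update(sock, timeout=1.0):
    """Wait up to timeout seconds for a notification on a bound UDP socket."""
    sock.settimeout(timeout)
    try:
        data, _sender = sock.recvfrom(512)
    except socket.timeout:
        return None
    return parse_notification(data)
```

A reader would still fetch the feed over HTTP after a notification; the datagram only replaces the polling. UDP is unreliable, so a real client would presumably keep an infrequent fallback poll.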
Re:Push by rlanctot · 2004-12-08 13:54 · Score: 2, Interesting

My suggestion is to revamp RSS to use a P2P format of publishing, so you spread out the load.
Re:Push by jasonwea · 2004-12-08 14:33 · Score: 2, Interesting

This seems like a far better idea than the UDP notification one. Port forwarding for an RSS feed? No thanks.

There is almost always a DNS cache at the ISP, so the polling interval can be completely controlled by the TTL of the record. This uses the existing distributed caching of DNS, whereas a large percentage of users are not behind HTTP caches.

I see two potential problems with this idea:

1. A lot of people are stuck behind HTTP proxies with limited or no DNS. This isn't too bad as they could fall back to the current system.

2. Access to the DNS server zone file. Unless you are running your own server, this might be a difficult thing to do as a lot of hosts do not allow direct access to the zone file and would probably frown on lots of changes to the file. If you have a static IP address you could host your own DNS server to get around this however. For someone with bandwidth problems from RSS feeds, this is unlikely to be an issue.
Re:Push by Jahf · 2004-12-08 16:21 · Score: 2, Interesting

The problem with many of these mechanisms is that (as you mention) smaller sites may not have the facilities to do it.

On the other hand it seems like everyone and their dog can do P2P.

A P2P-ish RSS system that:

* Attempts to make each client capable (but not always used) of functioning as a caching server for the feed

* Has a top-level owner of a feed who has sole rights to update the feed. Perhaps passing public/private keys with the feed to ensure no tampering. Anyone who wanted to subscribe to the feed would need to connect to the top-level one time to get the keys before using RSS-P2P caches.

* Hopefully has some intelligence to determine the closest feasible cache (perhaps based on # of hops and # of retries) so that we are peering out bandwidth usage as best as possible

* Use a standard port and open protocol such that a large organization can route any RSS-P2P requests through a main RSS-P2P cache at the router (further enhancing the ability to minimize traffic ... and also giving a polite way for an organization to shut it off ... just like HTTP)

* Possibly can push a "refresh notification" packet to any clients that have connected to the cache ... if a client fails to pull a refresh after X # of notification packets, assume it went away ... push a "norefresh notification" every X (minutes|hours|etc) to make sure that the client knows the cache is still viable ... if the client doesn't get a (norefresh|refresh) notification after X number of (minutes|hours|etc) then assume that the server has gone down and find a new one

* Probably obvious but the RSS-P2P cache would be able to select which caches it wanted to host (though I can see use for a mode where it is told to proxy and cache any RSS-P2P request it receives)

* Since there are existing RSS (not RSS-P2P) setups out there, we could possibly enhance them by allowing the RSS-P2P cache to speak and send RSS over existing mechanisms (HTTP). Further, any RSS-P2P cache that has this mode enabled could, if willing, send a notification to the top-level RSS-P2P server (which would always be maintained by the authoritative feed owner) and be added to a round-robin DNS for the normal RSS feeds so that it helps share the load for normal RSS as well. Only people willing to be "supercaches" would do this, but it allows larger sites to help spread the load.
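The refresh/norefresh notification scheme in the list above can be sketched from the client's side. This is only an illustration under assumptions: `CacheWatcher`, the two notification names, and the injectable clock are all hypothetical, since no RSS-P2P protocol actually exists.

```python
import time

class CacheWatcher:
    """Client-side view of one RSS-P2P cache (hypothetical design)."""

    def __init__(self, max_silence_seconds, clock=time.monotonic):
        self._max_silence = max_silence_seconds
        self._clock = clock
        self._last_heard = clock()
        self.needs_refresh = False

    def on_notification(self, kind):
        # Both kinds prove the cache is still alive; only "refresh"
        # means there is new content to pull.
        if kind not in ("refresh", "norefresh"):
            raise ValueError(f"unknown notification kind: {kind!r}")
        self._last_heard = self._clock()
        if kind == "refresh":
            self.needs_refresh = True

    def cache_looks_dead(self):
        """True once nothing has been heard for longer than the allowed silence."""
        return self._clock() - self._last_heard > self._max_silence
```

When `cache_looks_dead()` turns true, the client would go looking for another cache, as the list describes.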

Or I could be way off base. Been known to happen.

--
It is more productive to voice thoughtful opinions (reply) than to judge (moderate) others.
Re:Push by some+guy+I+know · 2004-12-08 19:32 · Score: 2, Funny

My suggestion is to revamp RSS to use a P2P format of publishing, so you spread out the load.
RSS Torrent!

--
Those who sacrifice security to condemn liberty deserve to repeat history or something. - Benjamin Santayana
Re:Push by kimba · 2004-12-08 22:54 · Score: 2, Insightful

DNS expiries and retries are completely configurable. You can set your zone to expire every 5 minutes if you want to. That is how these dynamic DNS places do it.

Just because you have set up your zone to refresh every 12 hours doesn't mean it's mandatory.

They just need to follow ./'s lead by Neil+Blender · 2004-12-08 12:57 · Score: 5, Insightful

And institute jackboot banning policies if you access them more than x times per y hours.

Re:They just need to follow ./'s lead by Hatta · 2004-12-08 13:07 · Score: 2, Insightful

And institute jackboot banning policies if you access them more than x times per y hours.

I don't know much about RSS, but it seems kind of silly to have the user refresh. Doesn't that defeat the purpose? Why not just have the server send out new news as it gets it?

--
Give me Classic Slashdot or give me death!
Re:They just need to follow ./'s lead by NeoSkandranon · 2004-12-08 13:13 · Score: 3, Informative

If the server initiated the connection then RSS would be useless to nearly everyone who's behind a router or firewall that they do not administer.

The server would also need to have a list of clients to send the refresh to, which means you'd need to "sign up" so the server puts you on the list.

Never mind the difficulties that dynamic IP addresses would cause. It's generally easier if the user initiates things.

--
If you can't see the value in jet powered ants you should turn in your nerd card. - Dunbal (464142)
Re:They just need to follow ./'s lead by interiot · 2004-12-08 13:23 · Score: 2, Informative

This question has been asked many times, and has been answered better than I'm able to.
But the gist of it is that push-media and multicast are either a thankfully-dead-fad, or are a technology whose time has yet to come. Push media, in particular, was salivated over quite a bit in the late 90's (eg. see Wired's 1997 cover article on it), so it's not as if it's a new idea. Despite this, push and multicast haven't gained wide success yet. Lots of people have various reasons why, and some of them are actually quite insightful. Google more if you want, but at least be aware that if one simply repeats the thoughts of the past in this area, one isn't likely to be successful.
Re:They just need to follow ./'s lead by interiot · 2004-12-08 13:30 · Score: 4, Informative

You know what happens then? The same thing they do when you hamper your RSS feed in any other way, they scrape your HTML and create their own feeds. Slashdot doesn't monitor their front page as closely as they do their rss page, so you can get away with quite a bit of abuse, at least for a while. They've blacklisted my IP occasionally when I got overzealous though.
Re:They just need to follow ./'s lead by Electroly · 2004-12-08 13:38 · Score: 5, Informative

HTTP 1.1 already supports this. A conditional HTTP request can be made which basically asks the server if the file has been updated. The server can then respond with a 304 Not Modified and avoid sending the entire RSS file again. Unfortunately, poorly written RSS aggregators don't implement this, and it is those aggregators that are the real problem here. They typically are the ones with the default 5 minute update time, too.
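For reference, a well-behaved aggregator's conditional request might look roughly like the sketch below, using only Python's standard library. The function names are made up for illustration; the caller is assumed to store the `ETag` and `Last-Modified` values from the previous successful fetch.

```python
import gzip
import urllib.error
import urllib.request

def build_conditional_headers(etag=None, last_modified=None):
    """Headers that let the server answer 304 instead of resending the feed."""
    headers = {"Accept-Encoding": "gzip"}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers

def fetch_if_changed(url, etag=None, last_modified=None, timeout=10):
    """Return (body, etag, last_modified); body is None when nothing changed."""
    request = urllib.request.Request(
        url, headers=build_conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read()
            if response.headers.get("Content-Encoding") == "gzip":
                body = gzip.decompress(body)
            return (body,
                    response.headers.get("ETag"),
                    response.headers.get("Last-Modified"))
    except urllib.error.HTTPError as err:
        # urllib reports 304 Not Modified as an HTTPError.
        if err.code == 304:
            return None, etag, last_modified
        raise
```

On a 304 the client keeps its cached copy and validators, so an unchanged feed costs only the request and response headers.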

Welcome to the internet by Anonymous Coward · 2004-12-08 12:58 · Score: 2, Informative

Where we use "push" technologies for everything that is functionally pulling information, and "pull" technologies for everything that functionally pushes information.

Whee!

And the funny thing here is, if RSS had-- at its conception-- included caching and push-based update notification and all the other smart features that would have prevented this sort of thing from becoming a problem now, it would never have been adopted, because the only reason RSS succeeded where the competing standards to do the same thing failed was because RSS was so simple.

Re:Welcome to the internet by Svet-Am · 2004-12-08 13:03 · Score: 2, Insightful

depends on your perspective. If I imagine myself to be a server, I'm pushing information to a client and pulling information from a client, like the name implies.

you're interpreting it from the client perspective, which is not where the name came from.

--
[move .sig! for great justice, take off every .sig!]

RSS readers don't cache! by IO+ERROR · 2004-12-08 12:58 · Score: 4, Insightful

One thing that would help immensely is if RSS readers/aggregators would actually cache the RSS feed and not download a new copy if they already have the most current one. I could go through my server logs and point out the most egregious problem aggregators if anyone's interested.

--
How am I supposed to fit a pithy, relevant quote into 120 characters?

Re:RSS readers don't cache! by gad_zuki! · 2004-12-08 13:11 · Score: 4, Insightful

Sometimes you can't tell if you have the newest file, depending on the web server/config.

The problem is, of course, server-side. For instance, the GPL blog software WordPress doesn't do ANY caching. Its RSS is a PHP script. So if you get 10,000 requests for that RSS, then you're running a script 10,000 times. That's ridiculous and poor planning. Other RSS generators are guilty of this crime too.

Yes, there is a plugin (which doesn't work at nerdfilter nor at the other WordPress site I run) and a savvy person could just make a cron job and redirect RSS requests to a static file, but that's all beside the point. This should all be done "out of the box." This is a software problem that should be addressed server side first, client side later.
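The "generate once, serve many times" idea can be sketched in a few lines. This is not how WordPress or any plugin works; `CachedFeed` and its parameters are hypothetical, and `generate` stands in for whatever expensive script builds the feed.

```python
import time

class CachedFeed:
    """Regenerate the feed at most once per max_age_seconds."""

    def __init__(self, generate, max_age_seconds=300, clock=time.monotonic):
        self._generate = generate      # expensive callable returning feed bytes
        self._max_age = max_age_seconds
        self._clock = clock
        self._body = None
        self._built_at = None

    def get(self):
        now = self._clock()
        if self._body is None or now - self._built_at >= self._max_age:
            self._body = self._generate()
            self._built_at = now
        return self._body
```

With this in front of the script, 10,000 requests inside one cache window run the generator once instead of 10,000 times, at the cost of the feed being up to `max_age_seconds` stale.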

Not to mention, a lot of these RSS readers are big sites like Bloglines, NewsGator, etc. who should be respecting bandwidth limits, but really have no incentive to do so. RSS really doesn't scale too well for big sites. What they should be doing is denying connections for IPs that hit it too often or changing the RSS format to give server instructions like "Don't request this more than x times a day" in the header for the clients to obey. x would be a low number for a site not updated often and high for a site updated very often.
Re:RSS readers don't cache! by maskedbishounen · 2004-12-08 13:15 · Score: 5, Informative

To some extent, this could be blamed on the feed itself. Ideally, it works like this..

When you request the feed, you first get sent your normal HTTP header. If properly configured, it will return a 304 if you have the most recent version -- however, as many feeds are generated in PHP[1], this header is defaulted off, and you'll end up with your standard 200, or go ahead, code. This single handedly wastes a metric tonne of bandwidth needlessly.

Even if you're hammering a feed, you'll only be wasting a few hundred bytes at most every half hour, rather than the whole 50K or whatever size it is.

See here for a more detailed explanation.

[1] This is not a PHP specific issue; a lot of dynamic content, and even static content, fails to do this properly. But this is what it's there for, after all.
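As a rough illustration of what "properly configured" means on the server side, a feed script could compare the client's validator with the current feed before sending anything. This is a simplified sketch: the names are invented, and a real `If-None-Match` header may carry several tags or `*`, which is ignored here.

```python
import hashlib

def make_etag(feed_bytes):
    """Derive a strong validator from the feed content."""
    return '"' + hashlib.sha1(feed_bytes).hexdigest() + '"'

def respond(feed_bytes, request_headers):
    """Return (status, headers, body) for one feed request."""
    etag = make_etag(feed_bytes)
    headers = {"ETag": etag}
    if request_headers.get("If-None-Match") == etag:
        # Client already has this exact version: headers only, no body.
        return 304, headers, b""
    headers["Content-Type"] = "application/rss+xml"
    return 200, headers, feed_bytes
```

The first request gets a 200 and the full body along with the `ETag`; repeat requests carrying that tag get a bodiless 304 until the feed content changes.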

--
"An infinite number of monkeys typing into GNU emacs would never make a good program."
Re:RSS readers don't cache! by IO+ERROR · 2004-12-08 13:17 · Score: 4, Informative

For instance, the GPL blog software WordPress doesn't do ANY caching.

Technically true but misleading. WordPress allows user agents to cache the RSS/Atom feeds, and will only serve a newer copy if a post has been made to the blog since the time the user agent says it last downloaded the feed. Otherwise it sends a 304. This is in 1.3-alpha5. I dunno what 1.2.1 does.
Not to mention, a lot of these RSS readers are big sites like Bloglines, NewsGator, etc. who should be respecting bandwidth limits, but really have no incentive to do so.

Not coincidentally, these are the egregious worst offenders I mentioned. Bloglines grabs my RSS2 and Atom feeds hourly, and doesn't cache or even pretend to. Firefox Live Bookmarks appears to cache feeds, but your aggregator plugins might not. I can't (yet) tell the difference from the server logs between Firefox and the various aggregator plugins.
The best ones are the syndication sites that only grab my feeds after being pinged. Too bad I can't ping everybody. That could solve the problem if there was some way to do that.

--
How am I supposed to fit a pithy, relevant quote into 120 characters?

rsstorrent will solve it all by RangerWest · 2004-12-08 13:00 · Score: 4, Interesting

rsstorrent -- distributed rss,echoing bittorrent?

Doomed? It's barely got off the ground... by WIAKywbfatw · 2004-12-08 13:02 · Score: 5, Insightful

What you're seeing right now are teething troubles. Nothing more, nothing less. The bandwidth consumption experienced right now will be laughed off a couple of years from now as minuscule.

Take the BBC News website for example. On September 11th 2001 its traffic was way beyond anything it had experienced to that point. Within a year or so, it was comfortably serving more requests and seeing more traffic every day. Proof if it was needed that capacity isn't the issue when it comes to Internet growth, and won't be for the foreseeable future.

RSS is in its infancy. Just because people didn't anticipate it being adopted as fast as it has been that doesn't make it "doomed". By that rationale, the Internet itself, DVDs, digital photography, etc are all "doomed" too.

--

"Accept that some days you are the pigeon, and some days you are the statue." - David Brent, Wernham Hogg

Limit download to new content by zoips · 2004-12-08 13:04 · Score: 5, Interesting

Instead of downloading the entire RSS feed every time, why not have aggregators indicate to the server the timestamp of the last time the RSS feed was downloaded, or the timestamp of the last item in the feed the aggregator knows about, and then the server can dynamically generate the RSS with only new content for that client. Increases processing load while reducing bandwidth, but processing time is what most servers have lots of, not to mention it's far cheaper to increase than bandwidth.
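A minimal sketch of that server-side filtering, assuming each item carries a `published` timestamp and the client sends the newest timestamp it has already seen (the item layout and function name are made up for illustration):

```python
from datetime import datetime, timezone

def items_newer_than(items, last_seen):
    """Return only the items published after last_seen.

    items: list of dicts with a timezone-aware 'published' datetime.
    last_seen: datetime of the newest item the client has, or None
    for a first-time subscriber who should get everything.
    """
    if last_seen is None:
        return list(items)
    return [item for item in items if item["published"] > last_seen]

# Example data for illustration only.
EXAMPLE_ITEMS = [
    {"title": "Older story", "published": datetime(2004, 12, 8, 12, 0, tzinfo=timezone.utc)},
    {"title": "Newer story", "published": datetime(2004, 12, 8, 13, 0, tzinfo=timezone.utc)},
]
```

The trade-off is the one described above: every response is now generated per client, so it can no longer be served as a single static file or shared through a cache.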

About time for asynchronous by benow · 2004-12-08 13:05 · Score: 3, Informative

Asynchronous event driven models are the way to go for changing content. They're trickier to code, but require less bandwidth and are more responsive. Perhaps a bit of a privacy issue, at some level (registration with source), but easy to implement, failure resistant distributed asynchronous networks have much applicability, not just to RSS.

Not a problem with RSS.. just humans. by dustinbarbour · 2004-12-08 13:06 · Score: 4, Interesting

RSS feeds are meant as a way to strip all the nonsense from a site and offer easy syndication, right? Basically, present the relevant news from a full-fledged webpage in a smaller file size? If such is the case, this isn't an RSS issue, really. I see it more as a bandwidth issue. I mean, people are going to get their news one way or the other.. either with a bunch of images and lots of markup via HTML or with just the bare minimum of text and markup via RSS. I would prefer RSS over HTML any day of the week! But perhaps RSS makes syndication TOO simple. Thus everyone does it and that eats additional bandwidth that normally would be reserved for those browsing the HTML a site offers.

And you could implement bans on people who request the RSS feed more than X times per hour as someone suggested (Doesn't /. do this?), but I don't think that gets around the bandwidth issue. I mean, those who want the news will either go with RSS or simply hit the site. Again, RSS is the preferred alternative to HTML.

So here's my suggestion.. go to nothing but RSS and no HTML!

--
What is your penile percentile?

Re:Not a problem with RSS.. just humans. by kardar · 2004-12-08 13:24 · Score: 2, Insightful

I wonder if advertising has anything to do with it - if you go to a news site just to see "what's up", you might get banner ads, google ads, so on and so forth - but RSS just makes a nice neat webpage for you or something similar.

I have to point out how much I love "Sage", the Mozilla Firefox plugin for RSS - you can even rightclick on that XML thing that tries to tell you to save the page and bookmark it under "Sage Feeds" and then Alt-S and you have your RSS.

I started using Sage for /., Groklaw, and a couple others and it's very cool. Very very cool. I hope the advertising revenue doesn't hurt people or whatever, but it's almost one of those things that would be worth money in how much time and aggravation it saves you having to deal with web designs that aren't as great as they could be.

I've heard a lot about how people complain about Slashdot and the interface and the web design and so on, but Sage cuts down significantly on the time spent here, more or less - or anywhere, for that matter - I think it makes the /. or Groklaw or whatever experience BETTER, not worse.

Only thing I can think of is advertising revenue.

Pop Fly by Anonymous Coward · 2004-12-08 13:07 · Score: 5, Funny

"Is RSS Doomed by Popularity?"

"Is Instant Messaging Doomed by Popularity?"

"Is E-Mail Doomed by Popularity?"

"Is Usenet Doomed by Popularity?"

"Is The Internet Doomed by Popularity?"

"Is Linux Doomed by Popularity?"

"Is Apple Doomed by Popularity?"

"Is Netcraft Doomed by Popularity?"

"Is Sex with Geeks Doomed by Popularity?" :)

Solutions by markfletcher · 2004-12-08 13:07 · Score: 5, Informative

There are several ways to mitigate the bandwidth issues. First, all aggregators should support gzip compression and the HTTP Last-Modified and ETag headers. That'll take care of a lot of the problems. The other solution is to get people to use server based aggregators, like Bloglines, which only fetch a feed once per iteration, regardless of how many subscribers there are. As a bonus, there are several things that server-based aggregators can do that desktop based aggregators can't do, like provide personalized recommendations. I like this solution, but of course I'm biased since I'm the founder of Bloglines. :)
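The "fetch once per iteration, regardless of subscribers" property can be shown with a toy model. This is not Bloglines' code; `SharedFetcher` and its methods are hypothetical, and `fetch` is any callable that takes a URL and returns the feed body.

```python
class SharedFetcher:
    """Toy server-based aggregator: one upstream fetch per feed per iteration."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._subscribers = {}  # feed URL -> set of user ids

    def subscribe(self, user, url):
        self._subscribers.setdefault(url, set()).add(user)

    def run_iteration(self):
        """Fetch every subscribed feed exactly once and fan the result out."""
        deliveries = {}  # user id -> {feed URL: body}
        for url, users in self._subscribers.items():
            body = self._fetch(url)
            for user in users:
                deliveries.setdefault(user, {})[url] = body
        return deliveries
```

From the publisher's point of view, a thousand subscribers behind such an aggregator look like a single polling client.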

A simple fix by jd · 2004-12-08 13:08 · Score: 2, Informative

What you have is a large number of subscribers accessing a common data source at or around the same time. The simplest fix would be to have a reliable multicast version of RSS, which is broadcast to all subscribers to that feed. Then, you only have to transmit the updates once. The network would take care of it from then on.

New subscribers would receive the initial copy of the feed via traditional unicast TCP, because that would be the least CPU-intensive way of handling a few requests at a time.

A caching system won't work for the same reason web caches have never caught on in the US - people are terrified of being sued to smithereens for potential copyright infringement. Even if any case would be thrown out of court instantly (by no means certain in the US) the costs would be prohibitive and malicious plaintiffs rarely ever get asked to pay costs.

The main problem with the multicast solution is that although multicasting is enabled across the backbone, most ISPs disable it - for reasons known only to them, because it costs nothing to switch it on. Persuading ISPs to behave intelligently is unlikely, to say the least.

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)

Solved, move on by Jeremiah+Blatz · 2004-12-08 13:13 · Score: 3, Informative

Shrook for the Mac has already solved this issue with "distributed checking". Popular sites are checked once every 5 minutes, if the site is updated, everyone gets the latest content, otherwise, nobody touches it.

As another poster has pointed out, banning users who check too frequently is an excellent fallback. A tiny site won't know to install the software, but it won't be an issue for a tiny site.

RSS + Bittorrent -- works for Podcasts... by Spoing · 2004-12-08 13:15 · Score: 2, Interesting

Or, is in the works now on Dave Slusher's Evil Genius Chronicles Podcast. [Podcasts = RSS subscription feeds for time shifted radio blogging.]

The Podcasters need it too. I'm subscribed to a couple dozen feeds and have well over 4GB of files in my cache right now.

The biggest problem with Bittorrent and podcasts is that the RSS aggregators need to be Bittorrent aware. Unfortunately, few are.

--
A firewall can not protect you from yourself. Turn off what you do not need. Do not use the firewall to do your work.

Re:RSS + Bittorrent -- works for Podcasts... by Wesley+Felter · 2004-12-08 17:21 · Score: 2, Insightful

Too bad podcasts are totally different from normal RSS feeds, because podcasts are about 100x larger. BitTorrent doesn't work for normal RSS feeds because they are too small and change too often.

Bittorrent by Jherek+Carnelian · 2004-12-08 13:16 · Score: 3, Insightful

Seems like bittorrent, or a bittorrent-alike protocol would be useful here. Turn the RSSfeed into a tracker/seed and then all it has to keep track of is who has the latest version of the content and it could redirect feeders to each other, always preferring the latest updated version. Eventually, you will have the same scaling problems that bittorrent has (single tracker), but at least you stretch things out a few months or a year until a better solution comes around.

what's wrong with the old subscription model? by Trepidity · 2004-12-08 13:21 · Score: 2, Interesting

When I want updates from sites, I subscribe to an email feed, and stick it in its own mailbox. I agree that some standardized format and display would be nice, but you can send XML over email too, so what's needed is a reader that I can point to an IMAP mailbox full of XML mails.

An alternate approach would be to do the same thing with a news server. Why keep refreshing a feed for updates instead of letting it notify you when it has updates?

--
10 PRINT CHR$(205.5+RND(1)); : GOTO 10

Re:Duh... Simple solution by moojj · 2004-12-08 13:25 · Score: 2, Interesting

I think the biggest reason people are offering RSS feeds is because it's a standard XML file on the webserver. No need to make additional scripts, no need to setup additional services -- just upload the XML file. When you start complicating the "Really Simple Syndication" model you start making it less simple. In my opinion the easiest way to limit bandwidth is to supply the XML file on servers that support gzip compression and the "ETag" header function. This way RSS readers will only download a compressed XML file, and only when it has been modified. Larger sites could go one step further and ban polling by RSS clients that don't support the ETag lookup feature before requesting the XML file. Then, there's always the obvious solution: cut down the number of items inside the XML file, thus lowering the amount of bandwidth per hit.

This issue was previously discussed elsewhere by Paul+Bain · 2004-12-08 13:36 · Score: 4, Insightful

As RSS [becomes] more known to the mainstream users and press, the bandwidth issue reported by many sites . . . related to feeds is becoming a reality. Stats from sites like Boing Boing are showing a real concern regarding feeds bandwidth usage. Possible solutions to this problem are emerging slowly, like RSScache (feed caching proxy) and KnowNow (event-driven syndication). RSScache seems to offer a realistic solution to the problem, but [will it] be enough . . . ?

Slashdot user GaryM posted a related question elsewhere about 20 months ago. At that time, in that forum, commenters dismissed his proposed solution, the use of NNTP, on the grounds that NNTP is deficient, but others continue to see NNTP as a possible solution nevertheless.

--

A lawyer & digital forensics examiner. Also an expert on open source software (OSS).

Solution! by Quixote · 2004-12-08 13:37 · Score: 2, Funny

I have an idea... let's start a company which pushes data to the consumers... from a central point. We'll call it "pointcast".

Now if only they'd bring back the $$$ from the mid 90s too.... :)

Slashdot's RSS blocking policy by jamie · 2004-12-08 13:43 · Score: 4, Informative

Slashdot blocks your IP from accessing RSS if you access our site more than fifty times in one hour. I think that's reasonable, don't you? Especially since our FAQ tells you to request a feed only twice an hour.
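For anyone wanting to try a similar policy on their own site, a per-IP limit of fifty requests per rolling hour can be sketched as below. This is not Slashdot's implementation, which isn't described here; the class name and the rolling-window approach are assumptions.

```python
import time
from collections import defaultdict, deque

class HourlyLimiter:
    """Allow at most max_requests per IP within a rolling time window."""

    def __init__(self, max_requests=50, window_seconds=3600, clock=time.monotonic):
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock
        self._hits = defaultdict(deque)  # ip -> timestamps of allowed requests

    def allow(self, ip):
        now = self._clock()
        hits = self._hits[ip]
        # Forget requests that have aged out of the window.
        while hits and now - hits[0] >= self._window:
            hits.popleft()
        if len(hits) >= self._max:
            return False
        hits.append(now)
        return True
```

A reader polling twice an hour never comes near the limit; only clients making their 51st request within the same hour get refused.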

Every complaint about this that I've investigated has turned out to be either a broken RSS reader or an IP that's proxying a ton of traffic (which we usually do make an exception for).

Oh, and if you want to read sectional stories in RSS, then:

create a user if you haven't already,
edit your homepage to include sectional stories you like (and exclude those you don't),
then reload the homepage and copy that "rss" link at the very bottom of the page. It will be customized to your exact specs!

Slashdot's RSS traffic, like Boing Boing's, is huge, and blocking broken readers has saved us a ton of bandwidth, which of course means money. We were one of the first sites to do this but (as this story suggests) you'll see a lot more sites doing it in the future. I think our policy is fair.

Re:Slashdot's RSS blocking policy by jamie · 2004-12-08 14:50 · Score: 3, Insightful

Is there a reason Slashdot doesn't cache better? I'd think that'd save a lot of bandwidth.

Not really. Our cache hit rate would be about zero. We update the homepage about once a minute, and the same goes for any page that any reader would be likely to reload within a reasonable time.
Re:Slashdot's RSS blocking policy by jamie · 2004-12-08 14:59 · Score: 2, Interesting

The limit was bumped up a couple months ago, I don't remember exactly when. (And if abuse gets worse, of course, we'll take it back down... but hopefully in 2004 we're no longer on the bleeding edge and client application authors will get more friendly...)
If you'd like me to check it out, I will. I've set up a Firefox live bookmark for myself and I'll check the logs for my own accesses and see what happens. If you do the same and get banned, go ahead and email me directly -- as soon as possible so our logs don't roll over -- and I'll take a look.
Re:Slashdot's RSS blocking policy by jamie · 2004-12-08 15:43 · Score: 2, Informative

Sorry, I goofed, that feature I described is subscriber-only. I'd forgotten that. I'll update the FAQ to describe it.
Re:Slashdot's RSS blocking policy by bill_mcgonigle · 2004-12-08 15:54 · Score: 2, Interesting

Sorry, I goofed, that feature I described is subscriber-only.

That's OK, I'm a subscriber... still don't see how the custom RSS works. From my RSS reader how does Slashdot know I'm a subscriber? Special URL?

--
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)

Solution: RSS over Usenet news by NZheretic · 2004-12-08 13:50 · Score: 4, Interesting

One solution would be to use an existing infrastructure that was built for flood filling content - the Usenet news server network.

Create a new top-level hierarchy (like alt, comp, talk, etc.) named "rss" and use an extra header to identify the originating RSS feed URL. The latter header could be used by the RSS/NNTP reader to select which article bodies to download and to verify each RSS entry to identify fake posts.
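As a rough sketch of what such an article might look like, the snippet below builds one with Python's standard `email.message` module. The header name `X-RSS-Feed-URL` and the newsgroup naming are assumptions invented for illustration, not an existing convention.

```python
from email.message import EmailMessage

FEED_URL_HEADER = "X-RSS-Feed-URL"  # hypothetical header name

def make_article(feed_url, newsgroup, title, summary):
    """Wrap one RSS entry as a news article tagged with its source feed."""
    article = EmailMessage()
    article["Newsgroups"] = newsgroup
    article["Subject"] = title
    article[FEED_URL_HEADER] = feed_url
    article.set_content(summary)
    return article

def wanted(article, subscribed_urls):
    """Decide from the headers alone whether the body is worth downloading."""
    return str(article[FEED_URL_HEADER]) in subscribed_urls
```

Note that the header only supports selection. Anyone can set it, so spotting fake posts would still require checking the entry against the originating feed or a signature.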

Re:They just need to follow /.'s lead by jamie · 2004-12-08 13:54 · Score: 4, Funny

Of course we blocked your IP when you hammered our server. And we'll do it again. Duh. We monitor abuse on the whole site, not just RSS.

Swarming (Like BitTorrent) is the answer by MS_leases_my_soul · 2004-12-08 13:54 · Score: 4, Interesting

This still baffles me. BitTorrent works great for distributing media like ISOs. Folks, it can distribute "little" stuff, too.

A content creator (say Slashdot) has webpages and it has an RSS feed. They create a torrent for each page. They sign the RSS file and each torrent (and its content) with a private key. They post their public key on their homepage.

Now, you can cache the RSS file on other sites that support you yet the users can still be confident that it really came from you. Inside the RSS file, users can try to get the webpage (and all its images, etc.) through the torrent first. When the page loads locally in your browser, it could still go out and get ads if you are an ad sponsored site.

If you are a popular site and have a "fan base", you should have no problem implementing something along these lines. If you are a site that has these problems, you are probably popular and have a fan base. Given the right software and the buy-in from users, the problem solves itself.

Re:Swarming (Like BitTorrent) is the answer by Jerf · 2004-12-08 14:36 · Score: 3, Informative

BitTorrent works great for distributing media like ISOs. Folks, it can distribute "little" stuff, too.

No, it "can't". Or at least, it can't serve it with any benefit. Tracker overhead swamps any gains you might make. BitTorrent is unsuitable for use with small files, unless the protocol has radically changed since I last looked at it. In the limiting case, like 1K per file, it can even be worse or much worse than just serving the file over HTTP.

Inside the RSS file, users can try to get the webpage (and all its images, etc.) through the torrent first.

Oh, here's the problem, you don't know what you're talking about or how these technologies work. When an RSS file has been retrieved, there is nothing remotely like "get the webpage" that takes place in the rendering. The images are retrieved but those are typically too small to be usefully torrented too.

Regrettably, solving the bandwidth problem involves more than invoking some buzzwords; when you're talking about a tech scaling to potentially millions of users you really have to design carefully. Frankly, the best proof of my point is that if it were as easy as you say it is, it'd be done by now. But it's not, it's hard, and will probably require a custom solution... which is what the article talks about, coincidentally.

Re:Swarming (Like BitTorrent) is the answer by Quixotic · 2004-12-08 16:15 · Score: 2, Insightful

I'm not sure the overhead of maintaining a torrent would be less than just serving up a single RSS feed (or webpage, image, whatever small file). If I'm not mistaken, each client still needs to download the torrent from the main site to determine where it should download the payload from... and if you're going to do that, you might as well just serve up the small file.

Also, using a torrent might not work so well for sites like Slashdot, which allow users to customize the homepage and/or feeds...

You're talking application-level by mveloso · 2004-12-08 13:58 · Score: 4, Interesting

Well, RSS was simple, and everything you're talking about (caching, push-based updates, etc.) is an application-level issue. Even though that stuff is defined in HTTP 1.1, it took years for HTTP 1.1 to come out.

If the web started with HTTP 1.1, it would never have gone anywhere because it's too complicated. There are parts of 1.0 that probably aren't implemented very well.

If you want to improve things, adopt an RSS reader project and add those features.

I agree with you. by mcmasuda · 2004-12-08 14:03 · Score: 3, Informative

I just fired up Ethereal and refreshed my RSS reader. Out of the dozen or so feeds I monitor, a few of them are using ETags and sensible Cache-Control headers, so I just get 304 Not Modified. Of the rest, not a single one is compressing even though my client is specifying gzip and deflate in its Accept-Encoding header.

HTTP compression will work even better here than it does for regular pages: RSS is basically all text, so every response is going to be compressible. Looking at a handful of my feeds, some quick messing about with wget & gzip gives me an average compression ratio of 3:1. That's roughly a 67% reduction in bandwidth utilization. If just half of your clients support HTTP compression, it's still a significant savings.
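The wget-and-gzip measurement above can be approximated in a few lines. This is only a sketch: the sample text below is made up and highly repetitive, so its ratio says nothing about real feeds, and `gzip_savings` is a hypothetical helper name. The useful part is the relationship between the two numbers: a ratio of r:1 means a saving of 1 - 1/r, so 3:1 is about 67%.

```python
import gzip


def gzip_savings(payload: bytes):
    """Return (compression ratio, fraction of bytes saved) for a payload."""
    compressed = gzip.compress(payload)
    ratio = len(payload) / len(compressed)
    saved = 1 - len(compressed) / len(payload)
    return ratio, saved


# Made-up, repetitive feed text; a real feed will compress differently.
sample = (
    "<item><title>Example story</title>"
    "<description>Some text about the story.</description></item>\n" * 50
).encode("utf-8")

ratio, saved = gzip_savings(sample)
print(f"ratio {ratio:.1f}:1, {saved:.0%} fewer bytes")
```

To try it on a real feed, read the downloaded file in binary mode and pass its bytes to `gzip_savings`.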

Now, the article is talking about poorly designed aggregators that don't check whether the content's changed (I'm assuming he's talking about ETags). There's not much you can do about that, but using compression for capable clients sure seems like it would be a good thing.

If-Modified-Since, User-Agent by pbryan · 2004-12-08 14:06 · Score: 3, Insightful

I'd be interested in seeing how many of these hits are for complete feeds rather than If-Modified-Since the last time it was downloaded. I suspect that if the RSS readers were behaving like nice User-Agents, we wouldn't see such reports.

Perhaps particularly offending User-Agents should be denied access to feeds. If I saw particular User-Agents consistently sending requests without If-Modified-Since, I'd ban them.
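One way to find the offenders described above is to tally, per User-Agent, how many requests arrive without any validator header. The sketch below assumes a hypothetical log format (pairs of user agent and a dict of request headers) and a made-up 90% cutoff; real access logs would need their own parsing, and the reader names are invented.

```python
from collections import Counter


def unconditional_share(requests):
    """Map each User-Agent to the fraction of its requests lacking a validator.

    `requests` is an iterable of (user_agent, headers) pairs, where headers
    is a dict of request header names to values (a hypothetical log format).
    """
    total = Counter()
    unconditional = Counter()
    for user_agent, headers in requests:
        names = {name.lower() for name in headers}
        total[user_agent] += 1
        # A request counts as conditional if it carries either validator.
        if not names & {"if-modified-since", "if-none-match"}:
            unconditional[user_agent] += 1
    return {ua: unconditional[ua] / total[ua] for ua in total}


# Invented sample data for illustration only.
log = [
    ("GoodReader/1.0", {"If-Modified-Since": "Wed, 08 Dec 2004 12:00:00 GMT"}),
    ("GoodReader/1.0", {"If-None-Match": '"abc"'}),
    ("GreedyReader/0.1", {}),
    ("GreedyReader/0.1", {"Accept": "*/*"}),
]
shares = unconditional_share(log)
offenders = [ua for ua, share in shares.items() if share > 0.9]
print(offenders)
```

A first-time fetch legitimately has no validator, so a cutoff below 100% would wrongly flag readers that have only just subscribed.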

--

My car gets 40 rods to the hogshead, and that's the way I likes it!

Re:If-Modified-Since, User-Agent by phixus · 2004-12-08 19:13 · Score: 2

In a lot of cases, it doesn't matter if the client supplies If-Modified-Since headers, or If-None-Match headers either. Slashdot is a prime example. You can supply all of the bandwidth-reducing headers you want, but Slashdot's server will give you the full feed every single damn time. Servers should also be gzipping their output, but most don't. Slashdot sure doesn't. Slashdot's RSS feed would go from 12K to 3K if they did (I checked). Most of those 12K full-feed results would be reduced to about 200 bytes if they honored the caching headers.
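To put the figures above side by side, here is a rough worked calculation. The 12K, 3K and 200-byte sizes come from the comment (treated here as 12,000, 3,000 and 200 bytes); the number of polls per day and the share of polls that actually see new content are pure assumptions chosen for illustration.

```python
def daily_bytes(polls, changed_fraction, full_size, not_modified_size=200):
    """Bytes served per day when unchanged polls get a small 304 response."""
    changed = polls * changed_fraction
    unchanged = polls - changed
    return changed * full_size + unchanged * not_modified_size


POLLS = 100_000   # hypothetical polls per day
CHANGED = 0.1     # hypothetical: 1 poll in 10 finds new content

# Validators ignored: every poll gets the full feed.
baseline = daily_bytes(POLLS, 1.0, 12_000)
# Still ignoring validators, but gzipping the feed.
gzip_only = daily_bytes(POLLS, 1.0, 3_000)
# Gzip plus honoring If-Modified-Since / If-None-Match.
both = daily_bytes(POLLS, CHANGED, 3_000)

for label, value in [("baseline", baseline), ("gzip only", gzip_only), ("gzip + 304", both)]:
    print(f"{label:>10}: {value / 1e6:.1f} MB/day")
```

Under these assumptions the two fixes compound: compression shrinks each full response, and honoring validators removes most full responses altogether.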

Slashdot's RSS blocking policy-$$$$ Kaching. by Anonymous Coward · 2004-12-08 14:23 · Score: 4, Insightful

"Slashdot's RSS traffic, like Boing Boing's, is huge, and blocking broken readers has saved us a ton of bandwidth, which of course means money."

So's using correct HTML, and CSS.

corporate caching by chiph · 2004-12-08 14:25 · Score: 2, Insightful

I wouldn't doubt that eventually someone will build an RSS caching device & sell it to the corporate market. Given how big a drain RSS is on the supplier, the corporate market has the money and determination not to permit it to become a problem for them.

Chip H.

Compression by yem · 2004-12-08 16:30 · Score: 2, Insightful

I assume the complainers are using it?

51894b boingboing.rss.xml
17842b boingboing.rss.xml.gz

--
No, I did not read the f***ing article!

Re:Jabber and/or BitTorrent ! by hildjj · 2004-12-08 17:27 · Score: 2, Interesting

And here's a first cut at an Internet draft to make it happen. Very small amounts of code, if you have a pubsub service already.

http://xmpp.org/drafts/draft-saintandre-atompub-notify-01.html

RFC3229+feed defines "delta" encoding for RSS by bobwyman · 2004-12-08 18:32 · Score: 2, Informative

Your suggestion is precisely what is defined by RFC3229+feed (i.e. an RSS-specific extension to RFC3229, delta encoding for HTTP). I maintain a list of implementations of RFC3229+feed on my blog. You can also find some empirical evidence showing massive bandwidth savings as a result of RFC3229+feed use.
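As a rough illustration of the idea (not a reference implementation of RFC3229+feed): the client sends its last ETag in If-None-Match plus an `A-IM: feed` header, and a server that understands it answers 226 IM Used with only the entries added since that ETag. The data model below, where each entry remembers the feed ETag at the time it was added, and the name `respond` are assumptions made for the sketch. Header lookup is case-sensitive here for brevity, which real HTTP handling should not be.

```python
def respond(entries, request_headers, current_etag):
    """Return (status, headers, entry_list) for a feed request.

    `entries` is a list of (etag_when_added, xml_text), oldest first.
    """
    client_etag = request_headers.get("If-None-Match")
    accepted = [t.strip() for t in request_headers.get("A-IM", "").split(",")]
    wants_delta = "feed" in accepted

    if client_etag == current_etag:
        # Nothing new: ordinary conditional GET behaviour.
        return 304, {"ETag": current_etag}, []

    known = [etag for etag, _ in entries]
    if wants_delta and client_etag in known:
        # Send only what was added after the version the client holds.
        newer = entries[known.index(client_etag) + 1:]
        headers = {"ETag": current_etag, "IM": "feed"}
        return 226, headers, [xml for _, xml in newer]

    # Unknown ETag or no delta support: fall back to the full feed.
    return 200, {"ETag": current_etag}, [xml for _, xml in entries]


entries = [
    ('"v1"', "<item>one</item>"),
    ('"v2"', "<item>two</item>"),
    ('"v3"', "<item>three</item>"),
]
status, headers, items = respond(
    entries, {"If-None-Match": '"v1"', "A-IM": "feed"}, '"v3"'
)
print(status, items)
```

Clients that do not send `A-IM: feed` still get a normal 200 or 304, so the extension degrades gracefully for older readers.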

This is a well known and "solved" issue...

bob wyman

Actually, this is a more general xml problem by evil_one666 · 2004-12-08 22:11 · Score: 2, Interesting

XML munches up bandwidth like a lardy butter lover. Yes, yes, RSS feeds are handy, but they don't actually do anything that couldn't be achieved with a much leaner binary format. It's 2004, we don't have byte compatibility issues any more.

See Roedy Green's (one-time comp.java.lang FAQ maintainer) excellent essay on why XML causes these problems.

Re:They just need to follow /.'s lead by the+angry+liberal · 2004-12-08 22:28 · Score: 2, Funny

Jamie, if you need help securing /., I have just your man. He is a smarty like you.

RSS Throttling Script by TheLoneGundam · 2004-12-09 04:44 · Score: 2, Informative

Glenn Fleishman, of Wi-Fi Networking News, has written a script to throttle the poorly behaved aggregators and writes about it on his personal blog.
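The script itself isn't reproduced here, so the following is not Fleishman's code. It is only a guess at the general shape of such throttling: remember when each client last received the feed and refuse requests that come back too soon. The class name, the client key format and the 30-minute interval are all assumptions.

```python
class FeedThrottle:
    """Allow each client at most one feed fetch per minimum interval."""

    def __init__(self, min_interval_seconds):
        self.min_interval = min_interval_seconds
        self.last_served = {}  # client key -> timestamp of last allowed fetch

    def allow(self, client_key, now):
        """Return True if this client may be served at time `now` (seconds)."""
        last = self.last_served.get(client_key)
        if last is not None and now - last < self.min_interval:
            return False
        self.last_served[client_key] = now
        return True


throttle = FeedThrottle(min_interval_seconds=1800)
client = "192.0.2.1|ExampleReader/1.0"   # hypothetical key: IP plus User-Agent
print(throttle.allow(client, now=0))      # first request
print(throttle.allow(client, now=600))    # ten minutes later
print(throttle.allow(client, now=1800))   # thirty minutes after the first
```

Keying on IP address alone would penalize many users behind one corporate proxy, which is why the sketch combines it with the User-Agent; a real deployment would also need to expire old entries.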

RSS hits that directly hit databases are flawed by smagruder · 2004-12-09 04:47 · Score: 2, Insightful

I've seen many RSS URLs pull from a site's database to build the XML each time they're hit. This is fixed simply by creating a cron job that builds the RSS XML on a periodic basis, then serving the resulting file. If you're just throwing a file back, then server bandwidth isn't as much of a problem, especially when you consider that browsers themselves cache files.
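A minimal sketch of the "build it periodically, serve a static file" approach follows. It assumes the XML has already been generated from the database by some other code; `publish_feed` is a hypothetical helper, and the point it illustrates is writing to a temporary file and swapping it into place so the web server never serves a half-written feed.

```python
import os
import tempfile


def publish_feed(xml_text, target_path):
    """Write the feed to a temp file, then atomically swap it into place."""
    directory = os.path.dirname(os.path.abspath(target_path))
    # Same directory as the target, so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(xml_text)
        # mkstemp creates the file owner-only; make it readable by the web server.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, target_path)
    except Exception:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A cron entry would then run the build script every few minutes and call `publish_feed` with the fresh XML and the path the web server exposes. A side benefit is that a static file gives the web server a real modification time to use for Last-Modified.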

--
Steve Magruder, Metro Foodist

There are ways to mitigate the impact by TNLNYC · 2004-12-09 05:02 · Score: 2, Informative

I'm reproducing this article from my own site (all the links are on the site):

Capacity planning and RSS
September 9, 2004

Robert Scoble points to MSDN having issues with full entry RSS. What it comes down to is a capacity planning exercise.

In his note, he says that RSS is broken. I personally believe that at issue is not whether RSS is working or not. RSS is working but it has complicated the bandwidth issue. At issue is the fact that RSS feeds are generally generating more traffic to a site. Because RSS readers are polling the site to check if a feed has been updated, the traffic patterns change, with increased numbers of spikes on an hourly basis. This is similar to some of the issues network administrators started facing when Pointcast first appeared.

There are a number of ways to mitigate the issue.

HTTP Conditional GET for RSS
First of all, one of the things to consider when using RSS is to support HTTP conditional GET on RSS feeds. This helps mitigate some of the impact by ensuring that feeds are only served in full if the content has changed.
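On the server side, conditional GET boils down to comparing the validators the client sends with the current state of the feed. The sketch below is one simplified way to do that, not a complete HTTP implementation: `conditional_response` is a hypothetical function, the ETag is just a hash of the body, weak ETags are ignored, and header names are matched case-sensitively.

```python
import hashlib
from datetime import timezone
from email.utils import formatdate, parsedate_to_datetime


def conditional_response(body, last_modified_ts, request_headers):
    """Return (status, headers, body), answering 304 when nothing changed."""
    etag = '"' + hashlib.sha1(body).hexdigest() + '"'
    headers = {
        "ETag": etag,
        "Last-Modified": formatdate(last_modified_ts, usegmt=True),
    }

    # If-None-Match takes precedence over If-Modified-Since when present.
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        candidates = [token.strip() for token in if_none_match.split(",")]
        if etag in candidates or "*" in candidates:
            return 304, headers, b""
        return 200, headers, body

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable date: ignore the header
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if int(last_modified_ts) <= since.timestamp():
                return 304, headers, b""

    return 200, headers, body
```

A well-behaved reader stores the ETag and Last-Modified values from a 200 response and echoes them back as If-None-Match and If-Modified-Since on its next poll.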

Feed Compression
The next item to think of is to use compression when serving feeds. By doing so, one reduces the size of the payload, which ends up being much better in terms of managing bandwidth. In my own experience, because RSS is primarily text, I've seen a reduction of 80% of the bandwidth when delivering RSS feeds in a compressed format. That represents a fairly large gain in bandwidth that can then accommodate more users.
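Compression should only be applied when the client says it can handle it. The following sketch shows one way to check the Accept-Encoding header and gzip the feed accordingly. Both function names are hypothetical, and the parsing is deliberately simplified: it ignores the `*` wildcard and other encodings.

```python
import gzip


def accepts_gzip(accept_encoding):
    """Return True if the Accept-Encoding value allows gzip (simplified)."""
    for part in accept_encoding.split(","):
        token, _, params = part.strip().partition(";")
        if token.strip().lower() in ("gzip", "x-gzip"):
            params = params.strip().lower().replace(" ", "")
            if params.startswith("q="):
                # A quality value of 0 means the encoding is refused.
                try:
                    return float(params[2:]) > 0
                except ValueError:
                    return False
            return True
    return False


def encode_feed(body, accept_encoding):
    """Return (headers, body), gzipping the body when the client allows it."""
    # Vary tells caches that the response depends on Accept-Encoding.
    headers = {"Vary": "Accept-Encoding"}
    if accepts_gzip(accept_encoding):
        headers["Content-Encoding"] = "gzip"
        body = gzip.compress(body)
    return headers, body
```

In practice most web servers can do this through configuration rather than application code; the sketch is only meant to show what that configuration is doing.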

Change the polling schedule
The RSS 2.0 specification already offers a number of optional elements to give RSS readers a better idea as to when to get content. For example, the pubDate element offers information as to when a feed was last published, as does the lastBuildDate one. ttl (aka time to live) can also be used to indicate to the software that this feed should live for a certain amount of time. Finally, skipHours and skipDays offer more pointers as to when RSS reader software should not poll. With all those mechanisms in place, it looks like a lot of flexibility exists in the format to accommodate scalability.
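On the reader side, honoring these hints could look roughly like the sketch below. It reads ttl (minutes), skipHours (hours 0 to 23, GMT) and skipDays (English day names) from the last copy of the feed and decides whether a new poll is due. The function name and the sample feed are assumptions, and both datetimes are expected to be in UTC.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def should_poll(feed_xml, last_fetch, now):
    """Decide whether to poll, based on hints in the last fetched feed."""
    channel = ET.fromstring(feed_xml).find("channel")
    if channel is None:
        return True  # no hints available

    ttl = (channel.findtext("ttl") or "").strip()
    if ttl.isdigit() and now - last_fetch < timedelta(minutes=int(ttl)):
        return False

    skip_hours = {
        int(h.text) for h in channel.findall("skipHours/hour")
        if h.text and h.text.strip().isdigit()
    }
    if now.hour in skip_hours:
        return False

    skip_days = {
        d.text.strip() for d in channel.findall("skipDays/day") if d.text
    }
    if DAY_NAMES[now.weekday()] in skip_days:
        return False

    return True


# Made-up feed for illustration.
FEED = (
    '<rss version="2.0"><channel><title>Example</title>'
    "<ttl>60</ttl>"
    "<skipHours><hour>3</hour></skipHours>"
    "<skipDays><day>Sunday</day></skipDays>"
    "</channel></rss>"
)
last = datetime(2004, 12, 8, 12, 0, tzinfo=timezone.utc)
print(should_poll(FEED, last, last + timedelta(minutes=30)))
print(should_poll(FEED, last, last + timedelta(hours=2)))
```

These elements are only advisory, so they help with cooperative readers and do nothing about the badly behaved ones.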

When all else fails, reduce
If all of the above still fail, RSS publishers should look at reducing the size of their feeds. There are two ways you can do this. First, you can just say that you're not going to offer full-text feeds. This seems to be the option that Scoble hates. Another way to do things is to offer both abbreviated feeds and full-text feeds or offer more detailed feeds, as I do on TNL.net.

An important consideration when doing something like this is how to address them. By default, users who just use the RSS autodiscovery feature will only get the abbreviated feed. However, they still have the option to go and get the full-text version. The compromise here is that users who just want to subscribe quickly can do so at a lower bandwidth cost, while power users can seek out the fuller feed and subscribe to that. The result, in my experience, is that most people use the autodiscovery feature, grabbing the smaller feed. Some power users do seek out the fuller feed and subscribe to that instead (based on the numbers, I'm seeing a 5% usage of the full-text feed as opposed to the default abbreviated one). This is a compromise solution that seems to accommodate everyone involved to date.

Final considerations
When publishing RSS feeds, your audience grows, which results in traffic growth too. One of the things to realize is that RSS feeds are generally stickier than the rest of a site. What this means is that, for every new subscriber you get, you will see an ongoing increase in your overall site traffic stats. This is not a bad thing as messages emanating from your site do get a higher passive readership. One of the things that new syndication standards should consider is a follow-up on this. While RSS publishers know how many feeds are being pushed out, there is littl

--
Check out http://www.tnl.net/blog

Miski: client2server2server2client by Philip+Dorrell · 2004-12-09 08:07 · Score: 2, Interesting

In 2000 I tried to invent a spam-proof usenet. The result of my efforts was Miski. The idea of Miski was that users would have addresses on servers representing what are effectively RSS channels, and other users would subscribe to these channels through their servers. There would be a DNS extension for the naming of servers. Channels would have names like username@example.com/"Java Programming". The system would be spam-proof because your server would only send you what you had subscribed to. It would be "push", because as soon as you posted something to a channel, your server would pass the message on to the servers of those who had subscribed to your channel. Only the notifications would be push: ordinary http would be used to retrieve the actual content.

Miski also had the important concept of "reposting", whereby if you saw something you liked, you could press a single button in your client to repost the notification, so that any subscribers to you could know about the item being reposted, if they had not already heard about it from somewhere else. The presumption was that the client (or the reader's server) would trim out duplicates, so that people posting would have no inhibitions about reposting stuff that maybe many of their subscribers already knew about.

Miski was more than just an attempt to create scalable-push RSS, or a spam-proof equivalent of Usenet: it was a vision of the "global brain". Using posting and reposting, notification of a new "interesting" idea could spread very quickly from the inventor of the idea to almost anyone in the world likely to be interested in that idea, even if the inventor was not well known. We would all be like neurons in the brain, with signals passing from one person to the next as fast as possible. It was an attempt to solve the dual problems of "How can I tell the world what I have to say when I have to compete against the efforts of all those other people trying to tell the world stuff?" and "How can I find out new stuff that's really interesting to me from among all this junk that I am getting from all these people trying to tell stuff to the world?".

I asked the question "How fast is the Internet?" Although packets can travel from one computer to another in seconds, or even less, information can still take days, weeks, months or even years to travel from the person who created it to another person who is interested in it. One way to measure this is to consider how often you find a document on the web which is interesting, but which you did not know about, and which has nevertheless been available for months or years, and which would have been interesting to you even when it was originally posted on the web.

Sadly Miski was never implemented, and I reduced my ambitions to write Womcat Bookmarks, which attempted to be a less dynamic version of Miski, but has ended up being just another RSS reader.

--
Music: a super-stimulus for the perception of musicality. Musicality: a perceived aspect of speech.
