Slashdot Mirror


Is RSS Doomed by Popularity?

Ketchup_blade writes "As RSS is becoming more known to mainstream users and the press, the bandwidth issue reported by many sites (Eweek, CNet, InternetNews) related to feeds is becoming a reality. Stats from sites like Boing Boing are showing a real concern regarding feed bandwidth usage. Possible solutions to this problem are emerging slowly, like RSScache (feed caching proxy) and KnowNow (event-driven syndication). RSScache seems to offer a realistic solution to the problem, but can this be enough to help RSS as it reaches an even bigger user base in the upcoming year?"

351 comments

  1. Push by Phroggy · · Score: 5, Insightful

    Remember all the hype about "push" technology back in the mid-nineties? Nobody was interested, but RSS feeds are being used in much the same way now. I'm thinking there are two significant differences: 1) with RSS, the user feels like they're in control of what's going on; with push, users felt like they were at the mercy of whatever money-grabbing corporations wanted to throw at them, and 2) a hell of a lot of people now have an always-on Internet connection with plenty of bandwidth to spare. When you've got a 33.6kbps dialup connection, you use the Internet differently than when you've got DSL or cable.

    How much bandwidth does Slashdot's RSS feed use?

    It looks like the RSS feed on my home page has a small handful of subscribers. Neat.

    --
    $x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
    $x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
    1. Re:Push by Anonymous Coward · · Score: 5, Insightful

      Pointcast sent way too much data at the time, and now we all have orders of magnitude more bandwidth.

      Most of the problems come from a few older RSS readers that don't support Conditional GET, gzip, etc. With modern readers, there's essentially no problem (I've measured it on a few sites I run). Yes, they poll every hour or two, but the bandwidth is a tiny, tiny fraction of what we get from, say, putting up a small QuickTime.

      There seem to be lots of people who freak out way too quickly about a few bytes. RSS sends too much unnecessary data, but if you've configured things correctly, it's much smaller than lots of other things we do on our networks...

    2. Re:Push by sploxx · · Score: 2, Insightful

      Yes, maybe this way 'feels technically different', but if you have an RSS aggregator/news ticker applet whatever on your desktop, it usually hides the implementation details completely from the user. Do you really think of "ok, now my client makes a http request, that travels through the call hierarchy of the libraries, gets a tcp socket open, gets a kernel call of the driver to send a SYN packet??". Even if I may have detailed knowledge about the inner workings of an application, I usually don't care about it.

      BTW, it's the same about eMail and another good reason why the SMTP/POP suite should be replaced soon (besides spam).

    3. Re:Push by sploxx · · Score: 1

      Ahh, and I forgot: Multicast is also a very nice idea for such applications.
      And, did I forget to mention that IPv6 should be implemented ASAP?

      There are sometimes reasons besides DRM and user control for new protocols, standards and formats :-)

    4. Re:Push by ikewillis · · Score: 2, Interesting

      http://beacon.sf.net/ tries to do this using UDP and filesystem monitoring. It waits for the RSS document to change then sends a UDP datagram to notify everyone that a new version is available. It's better than everyone polling the server via HTTP anyway.

    5. Re:Push by AvitarX · · Score: 1

      WooHoo port forward my RSS.

      Never let my roommates know what RSS is, so I can get the forward.

      --
      Wow, sent an e-mail as suggested when clicking on "use classic" banner, and got a fast response that addressed my msg
    6. Re:Push by rlanctot · · Score: 2, Interesting

      My suggestion is to revamp RSS to use a P2P format of publishing, so you spread out the load.

    7. Re:Push by nomel · · Score: 1

      I never understood why they didn't use IP's or DNS entries to tell if a new article was ready.

      For instance,
      They could use a dynamic dns entry. The client would poll the ip of some domain until the ip changed. After a change, the client would go get the new article from some other ip. This wouldn't be very good for small timespans between articles since, looking at my no-ip domain, it takes about 5 mins between updates.

      Or, the RSS client gets the current "waiting" ip at the first poll. Then, it tries to connect to this waiting ip. The waiting ip will not respond until a new article is ready. Then, the waiting ip is incremented/changed. Or something like that. I don't know much about networking, so, I don't know who would get the bandwidth bill.

      Or something similar.

      Any comments?

    8. Re:Push by eugene.y.jen · · Score: 1

      "Push" is not enough to solve the problem. What is necessary is a mechanism of brokers to distribute information from RSS publishers to RSS readers at Internet scale. It may look like NNTP, but NNTP does not provide content-based distribution, only topic-based distribution. People at PubSub Concepts Inc. have been working on this problem for a while and they have already created an interesting system using Jabber to send out subscriptions from publishers. I think they would welcome any criticism and participation in creating such systems.

    9. Re:Push by jasonwea · · Score: 2, Interesting

      This seems far better than the UDP notification idea. Port forwarding for an RSS feed? No thanks.

      There is almost always a DNS cache at the ISP so the polling interval can be completely controlled by the TTL of the record. This uses the existing distributed caching of DNS, whereas a large percentage of users are not behind HTTP caches.

      I see two potential problems with this idea:

      1. A lot of people are stuck behind HTTP proxies with limited or no DNS. This isn't too bad as they could fallback to the current system.

      2. Access to the DNS server zone file. Unless you are running your own server, this might be a difficult thing to do as a lot of hosts do not allow direct access to the zone file and would probably frown on lots of changes to the file. If you have a static IP address you could host your own DNS server to get around this however. For someone with bandwidth problems from RSS feeds, this is unlikely to be an issue.

    10. Re:Push by Anonymous Coward · · Score: 0

      p2p adds major overhead on a packet that small.... plus sync issues would be rather difficult to sort out.

      p2p is not a magic bullet. hence why we don't have p2p http.... for web pages and even most images it just isn't worth the overhead to p2p it.

    11. Re:Push by Jahf · · Score: 2, Interesting

      The problem with many of these mechanisms is that (as you mention) smaller sites may not have the facilities to do it.

      On the other hand it seems like everyone and their dog can do P2P.

      A P2P-ish RSS system that:

      * Attempts to make each client capable (but not always used) of functioning as a caching server for the feed

      * Has a top-level owner of a feed who has sole rights to update the feed. Perhaps passing public/private keys with the feed to ensure no tampering. Anyone who wanted to subscribe to the feed would need to connect to the top-level one time to get the keys before using RSS-P2P caches.

      * Hopefully has some intelligence to determine the closest feasible cache (perhaps based on # of hops and # of retries) so that we are peering out bandwidth usage as best as possible

      * Use a standard port and open protocol such that a large organization can route any RSS-P2P requests through a main RSS-P2P cache at the router (further enhancing the ability to minimize traffic ... and also giving a polite way for an organization to shut it off ... just like HTTP)

      * Possibly can push a "refresh notification" packet to any clients that have connected to the cache ... if a client fails to pull a refresh after X # of notification packets, assume it went away ... push a "norefresh notification" every X (minutes|hours|etc) to make sure that the client knows the cache is still viable ... if the client doesn't get a (norefresh|refresh) notification after X number of (minutes|hours|etc) then assume that the server has gone down and find a new one

      * Probably obvious but the RSS-P2P cache would be able to select which caches it wanted to host (though I can see use for a mode where it is told to proxy and cache any RSS-P2P request it receives)

      * Since there are existing RSS (not RSS-P2P) setups out there, we could possibly enhance them by allowing the RSS-P2P cache to speak and send RSS over existing mechanisms (HTTP). Further, any RSS-P2P cache that has this mode enabled could, if willing, send a notification to the top-level RSS-P2P server (which would always be maintained by the authoritative feed owner) and be added to a round-robin DNS for the normal RSS feeds so that it helps share the load for normal RSS as well. Only people willing to be "supercaches" would do this, but it allows larger sites to help spread the load.

      Or I could be way off base. Been known to happen.

      --
      It is more productive to voice thoughtful opinions (reply) than to judge (moderate) others.
    12. Re:Push by tylernt · · Score: 1

      "Do you really think of "ok, now my client makes a http request, that travels through the call hierarchy of the libraries, gets a tcp socket open, gets a kernel call of the driver to send a SYN packet??"."

      Er... yeah, actually, I do think that. Is that bad?

      --
      DRM 'manages access' in the same way that a prison 'manages freedom'
    13. Re:Push by some+guy+I+know · · Score: 2, Funny
      My suggestion is to revamp RSS to use a P2P format of publishing, so you spread out the load.
      RSS Torrent!
      --
      Those who sacrifice security to condemn liberty deserve to repeat history or something. - Benjamin Santayana
    14. Re:Push by Taladar · · Score: 1

      When it comes to troubleshooting I do it too. But naturally not for every click in my browser. If you know all the layers you can choose the necessary abstraction level for the task at hand. As a user you choose high abstraction, as admin trying to find out why something doesn't work lower abstraction levels are sometimes really useful.

    15. Re:Push by Hast · · Score: 1

      Torrents are good for big files. They are extremely unsuited for something like RSS feeds.

    16. Re:Push by Hast · · Score: 1

      The lag for a DNS change can be several hours, 12+ hours is standard to wait for a change to take effect. Not particularly useful if you ask me.

    17. Re:Push by some+guy+I+know · · Score: 1

      'Twas a joke.

      --
      Those who sacrifice security to condemn liberty deserve to repeat history or something. - Benjamin Santayana
    18. Re:Push by kimba · · Score: 2, Insightful

      DNS expiries and retries are completely configurable. You can set your zone to expire every 5 minutes if you want to. That is how these dynamic DNS places do it.

      Just because you have set up your zone to refresh every 12 hours doesn't mean it's mandatory.

    19. Re:Push by Hast · · Score: 1

      Ah, it wasn't quite clear from context. And quite a lot of other people made the suggestion in a serious manner later on.

    20. Re:Push by Ilgaz · · Score: 1

      I have known about multicast for ages, as an ordinary user.

      I think it's a conspiracy of hosting providers and mainstream TV stations that multicast isn't implemented at regular users' ISPs.

      The radio station I am currently listening to has to charge $5 through Real Radiopass for broadband because it can't handle the huge bandwidth charges.

      If multicast were available to all, they would pay a really cheap price and could STILL cover station salaries and RIAA money with ads.

    21. Re:Push by bhtooefr · · Score: 1

      Actually, that's kinda what I was thinking - rather than BitTorrent over RSS (well, not OVER, but the files will be listed on the feed), RSS over BitTorrent. Just grab slashdot.rdf.torrent, DL it, and load it.

    22. Re:Push by gnu-generation-one · · Score: 1

      A P2P-ish RSS system that:
      * Attempts to make each client capable (but not always used) of functioning as a caching server for the feed
      * Has a top-level owner of a feed who has sole rights to update the feed. Perhaps passing public/private keys with the feed to ensure no tampering. Anyone who wanted to subscribe to the feed would need to connect to the top-level one time to get the keys before using RSS-P2P caches.


      Like this one?

    23. Re:Push by Pxtl · · Score: 1

      The only catch with multicast is that it would spawn a whole new world of spamming. Consider multicasting portscans, Net Send spam, and a host of other malevolent activities.

      Still, yeah, multicast is important. To me, I think one big nice use is in P2P networking - if you have a fixed peergroup, you can message all the peers with the ease of messaging only one.

    24. Re:Push by Pxtl · · Score: 1

      Agreed. I've always thought that the web needed better support for real-time, and RSS was such an unpleasant band-aid solution for the problem. For one thing, sending XML in plaintext form is a fugly waste of bandwidth. And polling, imho, is always a sign that something is wrong.

      There are many new open P2P systems that promise to give users a way around port forwarding (jxta for example) - so really, having the server dispatch notification of a new update to all the clients is doable. The same P2P framework could then be used to disseminate the information to clients who request details on the update, with the feed acting as a tracker.

      IMHO, I think a better use than simply sending random little headlines would be to work with the existing framework of HTML - have it send out updates to a shared dynamic page. Rather than resending all the html files, send a diff. Basically, eliminate the "refresh" button mashing and streamline the process. Make the Fark comments page into a hybrid of web and irc. Of course, it would have poor ping compared to IRC, and you wouldn't want to track individualised pages for each user like Slashdot makes (heh, I could see Frames coming back to compensate for this), but a system like that would be so much cooler - making the Web actually *interactive* - than just RSS.

      But nobody's gonna do it. RSS succeeded _because_ it had a small, refined scope that made it easy to implement.

    25. Re:Push by Ilgaz · · Score: 1

      I don't even speak about p2p, multicast is something you BROADCAST over TCP/IP. Broadcast. Like TV station.

    26. Re:Push by Pxtl · · Score: 1

      Right - so why wouldn't it be a problem that it could allow hackers and spammers to ramp up their activities to a larger scale? Spamming and hacking actions are usually very high in the output and low in the feedback - hitting every port until one acks, net sending to a thousand users, etc. Why wouldn't multicast be useful for that? Especially since most users have a bigger downpipe to handle responses than up-pipe.

      And as for P2P, imagine you've a shared session with a bunch of users, and you make a change to the shared document - you can multicast the change out to all the users rather than sending it to each one or bouncing it off of a server.

    27. Re:Push by Tinidril · · Score: 1

      Setting your domain to expire every 5 minutes does not mean that every ISP will honor that setting. Several major ISPs routinely ignore short expire settings. Also, this doesn't solve the original problem. Now your DNS server will get pounded instead of your RSS box.

      --
      XML is the best data format; unless your data needs to be read or written by a human or a computer.
    28. Re:Push by Ilgaz · · Score: 1

      You had better review the multicast FAQ.

    29. Re:Push by sootman · · Score: 1

      My favorite issue of all time. Also has good articles on ReBoot (cartoon) and a great interview with Tim Berners-Lee.
      W: Do you wish you'd started the Web as a business?
      TBL: If I'd started "Web Inc." it would have been just another proprietary system. You wouldn't have had this universality. For something like the Web to exist, it has to be based on public, nonproprietary standards.

      --
      Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
    30. Re:Push by Jahf · · Score: 1

      Interesting ... combined with Konspire (from another response) this would be very much what I was getting at. Neither by themselves is quite there but taken together it looks like it is almost all covered.

      Thanks for the link :)

      --
      It is more productive to voice thoughtful opinions (reply) than to judge (moderate) others.
    31. Re:Push by devilspgd · · Score: 1

      As cool as the DNS option (or UDP push alternatives) would be, the reality of it is that it's beyond what most users could easily accomplish on your average hosting company.

      Not everybody that has an RSS feed even has anything other than static content -- I've got one HTML file I update locally which I also offer via RSS -- I just upload a new RSS file as well.

      Why not use HTTP's native caching support?

      Client connects to server, provides the date/time of the last cached copy. The server either says 304/Not Modified, or 200/OK and provides the updated content. There is no reason to transmit the whole XML shebang on every request.

      Ideally, it would be nice if the server could tell the client when the next expected update is too, either on the original request, or with an HTTP extension to include it in the 304/Not Modified response as well.

      For a feed like slashdot the time is hard to predict, but for some feeds we know in advance that the feed only updates every Wednesday morning.

      Users don't care enough to find out how often the feed updates, they usually just accept whatever their client's default is -- Why not integrate this into the protocol?

      Cutting the number of data bytes transmitted from a few KB to a couple hundred bytes would be a huge savings.
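
A minimal server-side sketch of the exchange described above, in Python and under stated assumptions: the function name `respond_to_feed_request` and its arguments are hypothetical, and a real web server would normally make this decision itself for static files. It only illustrates comparing the client's `If-Modified-Since` date against the feed's last change and answering 304 or 200.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond_to_feed_request(feed_body, feed_modified, if_modified_since=None):
    """Return (status, headers, body) for one feed request.

    feed_modified is a timezone-aware datetime of the feed's last change.
    if_modified_since is the raw request header value, or None if absent.
    (Hypothetical helper for illustration only.)
    """
    # HTTP dates are in GMT with one-second resolution.
    feed_modified = feed_modified.astimezone(timezone.utc).replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(feed_modified, usegmt=True)}

    if if_modified_since:
        try:
            client_copy = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            client_copy = None  # Unparseable header: fall back to a full reply.
        if client_copy is not None and client_copy.tzinfo is None:
            client_copy = client_copy.replace(tzinfo=timezone.utc)
        if client_copy is not None and feed_modified <= client_copy:
            # The client's copy is current: send headers only, no body.
            return 304, headers, b""

    return 200, headers, feed_body
```

The 304 branch returns an empty body, which is where the saving from a few KB down to a couple hundred bytes of headers would come from.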

      --
      Give a man a fish, he'll eat for a day, but teach a man to phish...
    32. Re:Push by XFilesFMDS1013 · · Score: 1

      There seem to be lots of people who freak out way too quickly about a few bytes. RSS sends too much unnecessary data, but if you've configured things correctly, it's much smaller than lots of other things we do on our networks...

      Like...downloading porn?

    33. Re:Push by cloudmaster · · Score: 1

      What "other guy" is trying to say is that multicast is just a "broadcast", much like all of those TV/radio channels out there (but not really the same as a broadcast ping). You have a TV, but you don't get everything that's on every channel, all at once. You just get the channel(s) you've asked to receive. Similarly, multicast capabilities are not likely to affect spam production. You can't just say "send this to every computer on the internet" and have it actually do that - "every computer on the internet" would have to be listening for your broadcast.

      If the client's not listening for a broadcast, the client won't receive it. For things like SMTP - where spam comes from - multicast wouldn't be a benefit. For things like "net send", well, I don't care about net send spam. People affected by that are the kinds of people who will always have internet problems (and, there's no likelihood that it'll change to work with multicasting anyway). Multicast is and will be implemented only for applications where it makes sense, like streaming of high bandwidth data to multiple clients. Luckily, that describes 0 situations where spam is a real problem.

    34. Re:Push by Anonymous Coward · · Score: 0
      Like...downloading porn?

      Hoo boy! That's a good one!! Hang on while I stop laughing. OK there. How did you come up with that joke?!? Man, nobody's ever thought of that one before! You must be some sort of comedic genius!!

    35. Re:Push by Anonymous Coward · · Score: 0
      a great interview with Tim Berners-Lee.

      How about the part where he said "why isn't all email hypertext?"

  2. They just need to follow ./'s lead by Neil+Blender · · Score: 5, Insightful

    And institute jackboot banning policies if you access them more than x times per y hours.

    1. Re:They just need to follow ./'s lead by Hatta · · Score: 2, Insightful

      And institute jackboot banning policies if you access them more than x times per y hours.

      I don't know much about RSS, but it seems kind of silly to have the user refresh. Doesn't that defeat the purpose? Why not just have the server send out new news as it gets it?

      --
      Give me Classic Slashdot or give me death!
    2. Re:They just need to follow ./'s lead by NeoSkandranon · · Score: 3, Informative

      If the server initiated the connection then RSS would be useless to nearly everyone who's behind a router or firewall that they do not administer.

      The server would also need to have a list of clients to send the refresh to, which means you'd need to "sign up" so the server puts you on the list.

      Nevermind the difficulties that dynamic IP addresses would cause. It's generally easier if the user initiates things.

      --
      If you can't see the value in jet powered ants you should turn in your nerd card. - Dunbal (464142)
    3. Re:They just need to follow ./'s lead by oliverthered · · Score: 1

      HEAD /myfeed.rss HTTP/1.1

      --
      thank God the internet isn't a human right.
    4. Re:They just need to follow ./'s lead by interiot · · Score: 2, Informative
      This question has been asked many times, and has been answered better than I'm able to.

      But the gist of it is that push-media and multicast are either a thankfully-dead-fad, or are a technology whose time has yet to come. Push media, in particular, was salivated over quite a bit in the late 90's (eg. see Wired's 1997 cover article on it), so it's not as if it's a new idea. Despite this, push and multicast haven't gained wide success yet. Lots of people have various reasons why, and some of them are actually quite insightful. Google more if you want, but at least be aware that if one simply repeats the thoughts of the past in this area, one isn't likely to be successful.

    5. Re:They just need to follow ./'s lead by slashkitty · · Score: 1

      Yahoo news uses a ping mechanism. They only update your feed every once in a while, but, if you want it updated faster, you ping them when you have an update.

      --
      -- these are only opinions and they might not be mine.
    6. Re:They just need to follow ./'s lead by Red+Alastor · · Score: 1

      It could be done in a more efficient way however. The first few bytes could tell you if there's something new (a number that increments each time something changes) and you would only fetch the whole file if there's something new.

      --
      Slashdot anagrams to "Sad Sloth"
    7. Re:They just need to follow ./'s lead by interiot · · Score: 4, Informative

      You know what happens then? The same thing they do when you hamper your RSS feed in any other way, they scrape your HTML and create their own feeds. Slashdot doesn't monitor their front page as closely as they do their rss page, so you can get away with quite a bit of abuse, at least for a while. They've blacklisted my IP occasionally when I got overzealous though.

    8. Re:They just need to follow ./'s lead by Electroly · · Score: 5, Informative

      HTTP 1.1 already supports this. A conditional HTTP request can be made which basically asks the server if the file has been updated. The server can then respond a 304 Not Modified and avoid sending the entire RSS file again. Unfortunately, poorly written RSS aggregators don't implement this, and it is those aggregators that are the real problem here. They typically are the ones with the default 5 minute update time, too.
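
A rough client-side sketch of the conditional request described above, using only Python's standard library. The function names `conditional_headers` and `fetch_feed` are made up for illustration; the assumption is that the server returns an `ETag` and/or `Last-Modified` header that the reader saves and sends back on the next poll.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build the validator headers for a conditional GET."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_feed(url, etag=None, last_modified=None, timeout=30):
    """Fetch a feed; return (body, etag, last_modified).

    body is None when the server answers 304 Not Modified, in which case
    the caller keeps using its cached copy and the validators it passed in.
    """
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib reports 304 as an HTTPError
            return None, etag, last_modified
        raise
```

An aggregator would call `fetch_feed` with whatever validators it stored from the previous poll, and reparse the feed only when the returned body is not `None`.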

    9. Re:They just need to follow ./'s lead by Anonymous Coward · · Score: 0
      poorly written RSS aggregators don't implement this, and it is those aggregators that are the real problem here.
      Any evidence of that?
    10. Re:They just need to follow ./'s lead by JanneM · · Score: 1

      It could be done in a more efficient way however. The first few bytes could tell you if there's something new (a number that increments each time something changes) and you would only fetch the whole file if there's something new.

      That is more or less how it is working already. The problems mentioned above seem to be mostly due to some feed readers not actually polling for the last change timestamp (or ignoring it) before downloading the entire feed.

      --
      Trust the Computer. The Computer is your friend.
    11. Re:They just need to follow ./'s lead by Anonymous Coward · · Score: 0

      Yeah, it's a shame they're banning multiple connections from behind the same proxy as well.

    12. Re:They just need to follow ./'s lead by Jeff+DeMaagd · · Score: 1

      I really have never been hit with it.

    13. Re:They just need to follow ./'s lead by Gojira+Shipi-Taro · · Score: 1

      No kidding. I've never even SEEN the RSS feed from /. and when I tried to add an RSS bookmark with Firefox earlier today, I got a "your RSS client has been BANNED"

      Yea you guys are REAL damned competent. Your explanation page goes on about X times in Y minutes. I haven't even QUERIED yet.

      Idiots.

      --
      "Oh my God. This is terrible. This is the end of my Presidency. I'm fucked."; ~ Donald J. Trump
    14. Re:They just need to follow ./'s lead by nesthigh · · Score: 1

      Would wget -N http://www.foobar.baz/rss.xml still grab the whole file or would it honor the 304?? Just curious, I use a bash script (once a day) that calls wget to grab podcasts.

    15. Re:They just need to follow ./'s lead by the+angry+liberal · · Score: 1

      And institute jackboot banning policies if you access them more than x times per y hours.

      Insightful, that is, until someone spoofs a few hundred thousand SRC addresses and takes your site down completely.

      I have never met an automatic IP banning tool which could not be used against the administrator far more effectively than he can use it on the lusers.

    16. Re:They just need to follow ./'s lead by Anonymous Coward · · Score: 0


      Same here. IP banning does not work.

      Thought the geniuses here at /. would have worked that out by now.

    17. Re:They just need to follow ./'s lead by Mike1024 · · Score: 1

      You know what happens then? The same thing they do when you hamper your RSS feed in any other way, they scrape your HTML and create their own feeds. Slashdot doesn't monitor their front page as closely as they do their rss page, so you can get away with quite a bit of abuse, at least for a while. They've blacklisted my IP occasionally when I got overzealous though.

      From http://paperlined.org/rss/feeds/

      slashdot.rss (perl source) (updated every 10 minutes) (if this gets hammered, it will be removed from public use)

      Ironic.

      And by ironic, I mean hypocritical.

      Michael

      --
      "Goodness me, how unlike the FBI to abuse the trust of the American public." -- The Onion
    18. Re:They just need to follow ./'s lead by droleary · · Score: 1

      Would wget -N http://www.foobar.baz/rss.xml still grab the whole file or would it honor the 304?

      Since it's just a plain request with no state information, it would get the whole file. In theory you could add the necessary parts with --header arguments, but you would still have to maintain state (for every URL) somewhere. Odds are that once a day isn't a big burden on whatever servers you're hitting anyway. It's people polling servers multiple times in an hour that need to act more friendly.
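
One way the per-URL state mentioned above might be kept is a small JSON file holding the validators returned for each feed. This is only a sketch: the file layout and the names `load_state`, `remember` and `save_state` are assumptions for illustration, not part of any existing tool.

```python
import json
from pathlib import Path


def load_state(path):
    """Return the saved {url: {"etag": ..., "last_modified": ...}} mapping."""
    path = Path(path)
    if not path.exists():
        return {}  # First run: nothing remembered yet.
    return json.loads(path.read_text(encoding="utf-8"))


def remember(state, url, etag=None, last_modified=None):
    """Record the validators the server returned for one feed URL."""
    state[url] = {"etag": etag, "last_modified": last_modified}
    return state


def save_state(path, state):
    """Write the mapping back to disk for the next run."""
    Path(path).write_text(json.dumps(state, indent=2), encoding="utf-8")
```

A once-a-day script could load this file at startup, pass the saved values as request headers, and save the updated mapping before exiting.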

  3. syndicate this! by Anonymous Coward · · Score: 0


    404 File Not Found
    The requested URL (developers/04/12/08/2321217.shtml?tid=126&tid =8) was not found.

    If you feel like it, mail the url, and where ya came from to pater@slashdot.org.

  4. Should all sites... by romcabrera · · Score: 1, Redundant

    ...do as Slashdot?
    Ban everyone querying its RSS feed more than once an hour?

    1. Re:Should all sites... by Anonymous Coward · · Score: 0

      Once an hour? Good lord man you would make one sadistic webmaster.

    2. Re:Should all sites... by Anonymous Coward · · Score: 0

      Is it possible to use gzip compression with RSS? I know it works fine with most browsers and with pure text we're talking about 50 - 80% bandwidth savings.

    3. Re:Should all sites... by jamie · · Score: 1

      Your number is a bit off... facts here...

    4. Re:Should all sites... by OrenWolf · · Score: 1

      Yep, that's exactly what BoingBoing does. I simply configured Apache to gzip .xml as well as .html.

      Of course, the readers *themselves* have to support it, too..
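
To get a feel for what gzip does to feed-like text, here is a small Python sketch. The feed below is made up and highly repetitive, so the ratio it prints is only indicative and real feeds will differ; the helper names `sample_feed` and `gzip_savings` are hypothetical.

```python
import gzip


def sample_feed(item_count=20):
    """Build a small, made-up RSS 2.0 document as bytes."""
    items = "".join(
        f"<item><title>Story {n}</title>"
        f"<link>http://example.com/story/{n}</link>"
        f"<description>Summary of story {n}.</description></item>"
        for n in range(item_count)
    )
    return (
        '<?xml version="1.0"?><rss version="2.0"><channel>'
        "<title>Example feed</title><link>http://example.com/</link>"
        "<description>Made-up feed for illustration</description>"
        f"{items}</channel></rss>"
    ).encode("utf-8")


def gzip_savings(data):
    """Return (compressed_bytes, fraction_of_bytes_saved)."""
    compressed = gzip.compress(data)
    return compressed, 1 - len(compressed) / len(data)


if __name__ == "__main__":
    feed = sample_feed()
    compressed, saved = gzip_savings(feed)
    print(f"{len(feed)} bytes raw, {len(compressed)} bytes gzipped, {saved:.0%} saved")
```

As noted above, the saving only materialises when the reader sends `Accept-Encoding: gzip` and can decompress the response.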

  5. Welcome to the internet by Anonymous Coward · · Score: 2, Informative

    Where we use "push" technologies for everything that is functionally pulling information, and "pull" technologies for everything that functionally pushes information.

    Whee!

    And the funny thing here is, if RSS had-- at its conception-- included caching and push-based update notification and all the other smart features that would have prevented this sort of thing from becoming a problem now, [i]it would never have been adopted[/i], because the only reason RSS succeeded where the competing standards to do the same thing failed was because RSS was so simple.

    1. Re:Welcome to the internet by Svet-Am · · Score: 2, Insightful

      depends on your perspective. If I imagine myself to be a server, I'm pushing information to a client and pulling information from a client, like the name implies.

      you're interpreting it from the client perspective, which is not where the name came from.

      --
      [move .sig! for great justice, take off every .sig!]
    2. Re:Welcome to the internet by Anonymous Coward · · Score: 1, Insightful

      And the funny thing here is, if RSS had-- at its conception-- included caching and push-based update notification and all the other smart features that would have prevented this sort of thing from becoming a problem now

      What are you talking about? RSS had caching built-in from day one - it uses HTTP as its transport mechanism.

      Frankly, I fail to see the point of this article. It already mentions that there are significant ways around the bandwidth problem. Any decent non-polling technique such as pubsub offers a solution, not to mention the fact that virtually everybody complaining about RSS bandwidth use hasn't bothered implementing best practice - things like 304 Not Modified, Cache-Control headers, etc.

      So the executive summary of this story is: some people are complaining about RSS bandwidth. Here are links to solutions. Oh no, sky is falling because I just have to include an ominous yet completely unsubstantiated prediction about next year.

    3. Re:Welcome to the internet by Pxtl · · Score: 1

      The fundamental problem is this: RSS was simple. That's why it was adopted - it's easy to serve, easy to implement.

      Unfortunately, that's also why it sucks. It just doesn't scale.

      My dream RSS replacement:

      -serves out HTML, so you use it with a modified web browser instead of an RSS aggregator.
      -uses a p2p system like JXTA to send out update notifications to clients, rather than polling.
      -uses a p2p system or multicasting to disseminate content, minimizing server load on updates.
      -sends out only differential information from the last user request. With each update to the page, it logs the current page and the last diff - users requesting updates receive either an aggregate of the diffs since last request, or the full page if it would be smaller. Updates are sent gzipped.

      With features like that, you could serve comment threads off of the damn thing, updated per comment, much less news headlines. Only catch is that you'd want the site to have one single page shared between all users (like Fark) so that it wouldn't have to log a diff history per-user.

  6. RSS readers don't cache! by IO+ERROR · · Score: 4, Insightful

    One thing that would help immensely is if RSS readers/aggregators would actually cache the RSS feed and not download a new copy if they already have the most current one. I could go through my server logs and point out the most egregious problem aggregators if anyone's interested.

    --
    How am I supposed to fit a pithy, relevant quote into 120 characters?
    1. Re:RSS readers don't cache! by Dorothy+86 · · Score: 1

      I'm rather interested in how bad the built-in Firefox reader (i.e. "Live Bookmarks") is, if you wouldn't mind.

    2. Re:RSS readers don't cache! by The-Pheon · · Score: 1

      I am interested in which aggregators you think are the worst offenders. The only true solution to the bandwidth problem, even with well-behaved aggregators, is moving away from a polling framework. Syndication should be pubsub event-based to solve the problem. Q.E.D.

      ----
      Dynamic DNS from ThatIP.

    3. Re:RSS readers don't cache! by gad_zuki! · · Score: 4, Insightful

      Sometimes you can't tell if you have the newest file, depending on the web server/config.

      The problem is, of course, server-side. For instance, the GPL blog software WordPress doesn't do ANY caching. Its RSS is a PHP script. So if you get 10,000 requests for that RSS, then you're running a script 10,000 times. That's ridiculous and poor planning. Other RSS generators are guilty of this crime too.

      Yes, there is a plugin (which doesn't work at nerdfilter nor at the other WordPress site I run) and a savvy person could just make a cron job and redirect RSS requests to a static file, but that's all beside the point. This should all be done "out of the box." This is a software problem that should be addressed server side first, client side later.

      Not to mention, a lot of these RSS readers are big sites like Bloglines, NewsGator, etc. who should be respecting bandwidth limits, but really have no incentive to do so. RSS really doesn't scale too well for big sites. What they should be doing is denying connections from IPs that hit the feed too often, or changing the RSS format to give instructions like "Don't request this more than x times a day" in the header for the clients to obey. x would be a low number for a site not updated often and high for a site updated very often.
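
      A minimal sketch of the "deny IPs that hit it too often" idea, assuming the server can call a check before serving the feed. The class name `FeedRateLimiter` and its parameters are made up for illustration; they are not part of any real server or of RSS itself.

```python
import time
from collections import defaultdict, deque


class FeedRateLimiter:
    """Allow each client at most `max_requests` feed fetches per `window` seconds.

    Hypothetical helper: the defaults (24 fetches per day) are only an example.
    """

    def __init__(self, max_requests=24, window=86400.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self.hits = defaultdict(deque)  # client IP -> times of recent fetches

    def allow(self, client_ip):
        now = self.clock()
        recent = self.hits[client_ip]
        # Forget fetches that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False  # caller would reject the request instead of serving the feed
        recent.append(now)
        return True
```

      A site that updates rarely would pass a small `max_requests`; a busy one would pass a larger value, which matches the "x times a day" suggestion.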

    4. Re:RSS readers don't cache! by maskedbishounen · · Score: 5, Informative

      To some extent, this could be blamed on the feed itself. Ideally, it works like this..

      When you request the feed, you first get sent your normal HTTP header. If properly configured, it will return a 304 if you have the most recent version -- however, as many feeds are generated in PHP[1], this header is defaulted off, and you'll end up with your standard 200, or go ahead, code. This single-handedly wastes a metric tonne of bandwidth.

      Even if you're hammering a feed, you'll only be wasting a few hundred bytes at most every half hour, rather than the whole 50K or whatever size it is.

      See here for a more detailed explanation.

      [1] This is not a PHP specific issue; a lot of dynamic content, and even static content, fails to do this properly. But this is what it's there for, after all.

      --
      "An infinite number of monkeys typing into GNU emacs would never make a good program."
    5. Re:RSS readers don't cache! by IO+ERROR · · Score: 4, Informative
      For instance, the GPL blog software WordPress doesn't do ANY caching.

      Technically true but misleading. WordPress allows user agents to cache the RSS/Atom feeds, and will only serve a newer copy if a post has been made to the blog since the time the user agent says it last downloaded the feed. Otherwise it sends a 304. This is in 1.3-alpha5. I dunno what 1.2.1 does.

      Not to mention, a lot of these RSS readers are big sites like Bloglines, NewsGator, etc. who should be respecting bandwidth limits, but really have no incentive to do so.

      Not coincidentally, these are the egregious worst offenders I mentioned. Bloglines grabs my RSS2 and Atom feeds hourly, and doesn't cache or even pretend to. Firefox Live Bookmarks appears to cache feeds, but your aggregator plugins might not. I can't (yet) tell the difference from the server logs between Firefox and the various aggregator plugins.

      The best ones are the syndication sites that only grab my feeds after being pinged. Too bad I can't ping everybody. That could solve the problem if there was some way to do that.

      --
      How am I supposed to fit a pithy, relevant quote into 120 characters?
    6. Re:RSS readers don't cache! by pridkett · · Score: 1

      Even if the file is generated on the fly, you can still avoid having to retransmit it by using the ETag response header and the If-None-Match request header. The ETag is an opaque validator, typically a hash of the file contents, and If-None-Match takes precedence over If-Modified-Since. Simple solution: make WordPress generate an ETag for the feed and then compare it.
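
      Roughly what that could look like on the server side, as a sketch only. The function names `make_etag` and `respond` are invented here, the hash choice is arbitrary, and the header parsing is simplified (it splits on commas and ignores everything else in the request).

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the generated feed bytes."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def _strip_weak(tag: str) -> str:
    # If-None-Match uses weak comparison, so a W/ prefix is ignored.
    return tag[2:] if tag.startswith("W/") else tag


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for one feed request."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        candidates = [_strip_weak(t.strip()) for t in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            return 304, headers, b""  # client already has this version
    return 200, headers, body
```

      The script still runs on every request, so this saves bandwidth but not CPU; the cron approach further down saves both.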

      Anyway, you're right, it's not a bandwidth issue; for the most part it's a software issue. I'm tracking some weblogs for research and crawl the RSS feeds once a day. Most sites only update their feed once every few days (individual weblogs we're talking about here), but don't handle either If-Modified-Since or If-None-Match. This is largely because the software that generates the RSS feed does so on the fly.

      I was running into similar problems on my site using Pyblosxom. The solution was just a little cron magic combined with some mod_rewrite foo. Now the feed gets updated at most once an hour and all those remote sites can cache to their hearts' content.

      --
      My Slashdot account is old enough to drink...
    7. Re:RSS readers don't cache! by Knightking · · Score: 1

      Considering that a large number of users could be getting the feed from bloglines, it really isn't a problem. If it isn't saving you bandwidth even with the lack of caching, then the bandwidth taken up by RSS probably doesn't matter.

    8. Re:RSS readers don't cache! by IO+ERROR · · Score: 1
      The bloglines user-agent field tells you how many people have subscribed. And I find it patently ridiculous for them to fetch my RSS feeds hourly for...wait, they should hit me any minute now...

      216.148.212.188 - - [08/Dec/2004:20:29:12 -0600] "GET /feed/rss2/ HTTP/1.1" 200 17300 "-" "Bloglines/2.0 (http://www.bloglines.com; 1 subscriber)"
      216.148.212.188 - - [08/Dec/2004:20:29:13 -0600] "GET /feed/atom/ HTTP/1.1" 200 11531 "-" "Bloglines/2.0 (http://www.bloglines.com; 1 subscriber)"

      One subscriber!

      --
      How am I supposed to fit a pithy, relevant quote into 120 characters?
    9. Re:RSS readers don't cache! by Anonymous Coward · · Score: 0

      The best ones are the syndication sites that only grab my feeds after being pinged. Too bad I can't ping everybody. That could solve the problem if there was some way to do that.

      Man, I sure hope some light bulbs are going off in some coder's head somewhere. This is exactly the problem with doing "push" over a "pull" network.

      It's the same thing as polling a queue vs. getting an event when an item is added to the queue, etc.

      Something like Jabber is the solution perhaps?

    10. Re:RSS readers don't cache! by Hooded+One · · Score: 1

      I would think (hope?) that Firefox would use its built-in caching and HTTP 1.1 conditional GET for Live Bookmarks, since those things are all there for the browser already.

    11. Re:RSS readers don't cache! by Anthony+Boyd · · Score: 1
      If properly configured, it will return a 304 if you have the most recent version -- however, as many feeds are generated in PHP[1], this header is defaulted off, and you'll end up with your standard 200, or go ahead, code. This single handedly wastes a metric tonne of bandwidth needlessly.

      Yes, this is a problem for lots of generated feeds, PHP or otherwise. In fact, in my phpBB Blog tool, this is one of its weaknesses. However, the solution is surprisingly easy: generate the RSS file as an actual .rss file that sits in the filesystem. If you output to the filesystem instead of to the browser, then Apache (or whatever Web server you use) takes over. Apache will serve the file in optimal fashion. This will be a change I make for the next revision of my code, and hopefully many others do the same.
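
      A sketch of the "write it to the filesystem" approach, assuming the blog tool calls something like this whenever a post is saved. `publish_feed` and the paths are placeholders, not code from any particular blog package.

```python
import os
import tempfile


def publish_feed(xml_text: str, path: str) -> None:
    """Write the feed to a temp file, then rename it over the live file.

    Readers never see a half-written feed, and the web server can answer
    conditional requests from the file's modification time on its own.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(xml_text)
        os.chmod(tmp_path, 0o644)   # mkstemp creates the file owner-only
        os.replace(tmp_path, path)  # atomic when both are on one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

      Calling `publish_feed(feed_xml, "/var/www/blog/index.rss")` after each post (the path is just an example) means the script runs once per update instead of once per request.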

    12. Re:RSS readers don't cache! by dangermouse · · Score: 1
      Are you actually updating the posts in your feed that frequently? If not, you might want to figure out why your server isn't sending 304s. Here's what bloglines looks like to me:

      216.148.212.188 - - [03/Dec/2004:09:15:43 -0500] "GET /index.xml HTTP/1.1" 304 0 "-" "Bloglines/2.0 (http://www.bloglines.com; 1 subscriber)"
      216.148.212.188 - - [03/Dec/2004:10:16:00 -0500] "GET /index.xml HTTP/1.1" 304 0 "-" "Bloglines/2.0 (http://www.bloglines.com; 1 subscriber)"
      216.148.212.188 - - [03/Dec/2004:11:16:03 -0500] "GET /index.xml HTTP/1.1" 304 0 "-" "Bloglines/2.0 (http://www.bloglines.com; 1 subscriber)"
      216.148.212.188 - - [03/Dec/2004:12:15:40 -0500] "GET /index.xml HTTP/1.1" 200 21331 "-" "Bloglines/2.0 (http://www.bloglines.com; 1 subscriber)"
      216.148.212.188 - - [03/Dec/2004:13:15:54 -0500] "GET /index.xml HTTP/1.1" 304 0 "-" "Bloglines/2.0 (http://www.bloglines.com; 1 subscriber)"
      216.148.212.188 - - [03/Dec/2004:14:15:28 -0500] "GET /index.xml HTTP/1.1" 304 0 "-" "Bloglines/2.0 (http://www.bloglines.com; 1 subscriber)"
      216.148.212.188 - - [03/Dec/2004:15:16:27 -0500] "GET /index.xml HTTP/1.1" 304 0 "-" "Bloglines/2.0 (http://www.bloglines.com; 1 subscriber)"

      Last month, the most popular blog on my server transferred about 200MB of data (very few images on this site). 900k of that went to bloglines, which polled every hour all month long.

  7. RSS by Anonymous Coward · · Score: 0

    doesn't stop me from hitting refresh a million times on Slashdot.

  8. Usenet? by froody · · Score: 0

    Doesn't Usenet deal with exactly this kind of problem? It's not quite instant, but neither is rss, since most people aren't polling non-stop.

    Tim

    1. Re:Usenet? by Jeffrey+Baker · · Score: 1

      Yes. RSS is not doomed by popularity, it's doomed by its own naïve polling architecture. The designers of RSS certainly did not learn anything from history.

    2. Re:Usenet? by Anonymous Coward · · Score: 1, Interesting

      Exactly. The Web has long needed a newsfeed-style protocol that defines a path of caches that distribute data. It need not be quite as rigid as the Usenet setup. For example, the initial fetch of data by user request could establish the cache path, rather than having it explicitly administered. But the core idea of pushing a lot of data a lot of people want closer to their node makes sense. The "ball of endpoints" view of the 'Net unfortunately does not encompass this sort of distribution network.

      Current web caches tend to be local to the user (their ISP) or to the source (Akamai hosting, etc). It's the intermediate caches that are missing.

  9. Limiting by Kizzle · · Score: 0, Redundant

    Do most sites have some sort of limit on how many times you can access the RSS feed in a given period of time? It seems like limiting requests to once an hour would cut down bandwidth considerably. There are always those people who think they need up-to-the-second updates.

  10. rsstorrent will solve it all by RangerWest · · Score: 4, Interesting

    rsstorrent -- distributed rss,echoing bittorrent?

    1. Re:rsstorrent will solve it all by DietFluffy · · Score: 1

      i don't think a bittorrent-type solution would work very well because bt was designed for the transfer of relatively large files. the fixed cost of having to negotiate a connection with peers would probably be larger than the tiny individual rss feeds themselves.

  11. First RSS, now Slashdot by Anonymous Coward · · Score: 1, Funny

    So first BoingBoing gets in trouble because of all the RSS traffic.. and now they're about to be slashdotted. Tough luck

  12. I'm guessing by xXDarkNinjaXx · · Score: 1

    It probably won't help the sites to slashdot them :)

  13. New idea? by pmazer · · Score: 1

    I'm sure someone's already thought of this, but what if the RSS reader was required to submit the "code" of the latest feed that was received? Then the only thing sent to the reader would be the more recent articles.

    1. Re:New idea? by Anonymous Coward · · Score: 0

      Good, but how do you get a new reader "subscribed"? You have to give out a first, full copy of the feed sometime.

    2. Re:New idea? by Doctor+Crumb · · Score: 1

      HTTP incorporates just such a thing; it's implemented in every single webserver out there. All serious browsers support this; many RSS readers do not. Check out HTTP status 304 (Not Modified): http://www.w3.org/Protocols/HTTP/HTRESP.html

  14. I just don't see the problem by Anonymous Coward · · Score: 0

    If they're serving pages in HTML (or whatever) anyway, who cares what the format is? Who cares what pulls the info off their server?

    Oh, yeah, advertisers. RSS could be an annoying reminder to them that the internet used to be ad-free.

  15. Doomed? It's barely got off the ground... by WIAKywbfatw · · Score: 5, Insightful

    What you're seeing right now are teething troubles. Nothing more, nothing less. The bandwidth consumption experienced right now will be laughed off a couple of years from now as minuscule.

    Take the BBC News website for example. On September 11th 2001 its traffic was way beyond anything it had experienced to that point. Within a year or so, it was comfortably serving more requests and seeing more traffic every day. Proof if it was needed that capacity isn't the issue when it comes to Internet growth, and won't be for the foreseeable future.

    RSS is in its infancy. Just because people didn't anticipate it being adopted as fast as it has been that doesn't make it "doomed". By that rationale, the Internet itself, DVDs, digital photography, etc are all "doomed" too.

    --

    "Accept that some days you are the pigeon, and some days you are the statue." - David Brent, Wernham Hogg
    1. Re:Doomed? It's barely got off the ground... by Anonymous Coward · · Score: 0

      The key problem is that people poll RSS sites when the site should be pushing updates to clients instead.

      It's just a backwards lame design.

    2. Re:Doomed? It's barely got off the ground... by B747SP · · Score: 1
      By that rationale, the Internet itself, DVDs, digital photography, etc are all "doomed" too.

      And *BSD... you forgot to mention that *BSD is dying too!

      --
      I find your ideas intriguing and I wish to subscribe to your newsletter.
    3. Re:Doomed? It's barely got off the ground... by pipingguy · · Score: 1


      Within a year or so, it was comfortably serving more requests and seeing more traffic every day.

      Nah, I think we have spammers to thank for bandwidth increases. [/joke]

  16. Maybe someone should explain... by jefp · · Score: 1

    Just what is the scaling behavior of RSS, anyway?

    1. Re:Maybe someone should explain... by Anonymous Coward · · Score: 0

      Doubling the users "subscribing" to a site quadruples the load because the new users check for updates more often than the old ones. Ah the curse of having the unwashed masses using their unlimited always-on high bandwidth connections to the max. Fortunately the spam-bots, zombies, and various spyware programs limit the available capacity for RSS use somewhat.

  17. Limit download to new content by zoips · · Score: 5, Interesting

    Instead of downloading the entire RSS feed every time, why not have aggregators indicate to the server the timestamp of the last time the RSS feed was downloaded, or the timestamp of the last item in the feed the aggregator knows about, and then the server can dynamically generate the RSS with only new content for that client. Increases processing load while reducing bandwidth, but processing time is what most servers have lots of, not to mention it's far cheaper to increase than bandwidth.

    1. Re:Limit download to new content by Duncan3 · · Score: 1

      Because that's way too obvious to anyone with a second grade education ;)

      Everyone admits (now) that RSS was a really stupid protocol even as protocols go.

      Oh, and the timestamp thing adds far less processing overhead than the reduction of packets would save.

      --
      - Adam L. Beberg - The Cosm Project - http://www.mithral.com/
    2. Re:Limit download to new content by Fweeky · · Score: 1

      This is what conditional GET does. The client sends If-Modified-Since: $last_timestamp and If-None-Match: $e_tag_of_last_download headers and the server can respond with Not Modified as it sees fit.
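
      For the aggregator side, a minimal sketch using only the standard library. `conditional_headers` and `fetch_feed` are illustrative names; a real reader would also persist the ETag and Last-Modified values between runs.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build the validators to send along with a feed request."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_feed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None if nothing changed."""
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified))
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return (response.read(),
                    response.headers.get("ETag"),
                    response.headers.get("Last-Modified"))
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib reports Not Modified as an HTTPError
            return None, etag, last_modified
        raise
```

      On a 304 the reader keeps showing its cached copy and the transfer is just the response headers.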

    3. Re:Limit download to new content by JanneM · · Score: 1

      I actually find myself wanting to refer back to earlier entries quite often (feeds for software releases, for instance, or go back to check some link on a blog that I didn't have time for earlier).

      --
      Trust the Computer. The Computer is your friend.
    4. Re:Limit download to new content by RollingThunder · · Score: 1

      That's not quite what he's talking about.

      If you previously had a copy of the feed with items A through X, and now A has dropped off and Y has been added, you'll pull the entire thing, B through Y, when all you really need is:

      remove A
      add Y, data follows

      Think more along the lines of "diff since $last_timestamp".
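
      A sketch of that "remove A, add Y" delta, assuming each item has a stable identifier (a GUID or permalink) and that both sides hold the feed as a mapping from identifier to content. The function names and the delta layout are invented for illustration; nothing like this is part of RSS.

```python
def feed_delta(old_items, new_items):
    """Compute what a client holding `old_items` needs to reach `new_items`.

    Both arguments map item identifier -> item content.
    """
    removed = [guid for guid in old_items if guid not in new_items]
    added = {guid: item for guid, item in new_items.items()
             if guid not in old_items}
    changed = {guid: item for guid, item in new_items.items()
               if guid in old_items and old_items[guid] != item}
    return {"remove": removed, "add": added, "change": changed}


def apply_delta(old_items, delta):
    """Client side: merge a delta into the locally cached items."""
    items = {guid: item for guid, item in old_items.items()
             if guid not in delta["remove"]}
    items.update(delta["change"])
    items.update(delta["add"])
    return items
```

      The catch is that the server has to know which version each client last saw, which is what the timestamp in the request would be for.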

    5. Re:Limit download to new content by qbwiz · · Score: 1

      Just because the server wouldn't send old items every time, doesn't mean that the client can't cache them and display them anyway.

      --
      Ewige Blumenkraft.
    6. Re:Limit download to new content by goon+america · · Score: 1

      I do this with my web site... it's dynamically generated by PHP, but it sets the HTTP headers to appear like a static file, setting the Last-Modified header to the date of the latest item in the list (and it will not execute the remainder of the file if it's getting a HEAD request).

      So that saves a little bit of bandwidth. Now all we need is RSS readers that actually obey this protocol.

    7. Re:Limit download to new content by zoips · · Score: 1

      At my last job we did the same basic thing for what was essentially a specialized RSS aggregator for the product. The client software would always do a HEAD request first, then compare the value of the Last-Modified header to the timestamp of the last item. If it was newer, it would do a GET request with the timestamp of the last item it had as part of the URL in the request. The server would then run an XSLT stylesheet on the RSS file on the server, copying out only items that would be new for the client, and send those. The client itself would then merge those items into its local version of the RSS feed.
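
      The filtering step can be sketched without XSLT. This is not the code from that project, just an illustration: it assumes RSS 2.0 with RFC 822 `pubDate` values that carry an explicit timezone offset, and `items_since` is a made-up name.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def items_since(rss_xml: str, since: str) -> str:
    """Return the feed with every item published at or before `since` removed.

    `since` uses the same RFC 822 date format as pubDate. Items without a
    pubDate are kept, since their age is unknown.
    """
    cutoff = parsedate_to_datetime(since)
    root = ET.fromstring(rss_xml)
    channel = root.find("channel")
    for item in list(channel.findall("item")):
        published = item.findtext("pubDate")
        if published is not None and parsedate_to_datetime(published) <= cutoff:
            channel.remove(item)
    return ET.tostring(root, encoding="unicode")
```

      The client would then merge the returned items into its cached copy of the feed.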

      There were some other things (such as the server could indicate when an RSS item was considered dropped from the feed, or even modified), and aside from using XSL-T, instead of a servlet like we used for most everything else in the project, it was a good solution that saved on bandwidth.

    8. Re:Limit download to new content by Wesley+Felter · · Score: 1

      This has been suggested before, and software is starting to support it. I think RSSCache does this and IIRC there's a WordPress plugin to do it. It will just take a while for this feature to become common practice.

    9. Re:Limit download to new content by cheekyboy · · Score: 1

      Or better yet, together with the request, add the hash/CRC sum of what you have, so the server can respond and say, "sorry dude, you already have the latest"

      ie

      GET /readnews.rss?mycrc=9287429874

      --
      Liberty freedom are no1, not dicks in suits.
    10. Re:Limit download to new content by psetzer · · Score: 1
      I was thinking something similar. Multi-file RSS with hashing. Of course, this isn't my area of expertise (or even competence), so this may be a little stupid.

      OK, suppose you ping some website with my proposed system. It gets the request, and sends back a list of the hashes of the current article files. Your computer hashes your files, and then it sends back the hashes which don't match up. Those are the files you don't have. The server then ships the files back to you, and you have your blog fix, or whatever it is.
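
      The exchange described above could be sketched like this, with both sides in one process for simplicity. The function names and the choice of SHA-256 are assumptions; the post does not specify either.

```python
import hashlib


def digest(article: bytes) -> str:
    return hashlib.sha256(article).hexdigest()


def server_manifest(articles):
    """Step 1: the server lists the hashes of its current article files."""
    return [digest(article) for article in articles]


def missing_hashes(manifest, local_articles):
    """Step 2: the client reports which listed hashes it does not hold."""
    have = {digest(article) for article in local_articles}
    return [h for h in manifest if h not in have]


def serve(requested, articles):
    """Step 3: the server ships only the requested files.

    Hashes it does not recognise ("nonsense hashes") are silently skipped.
    """
    by_hash = {digest(article): article for article in articles}
    return [by_hash[h] for h in requested if h in by_hash]
```

      The manifest itself still has to be fetched on every poll, so the polling problem is reduced rather than removed.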

      There are some possible problems. One, you can still ask for the hashes every few seconds. I'd just set up some system where you automagically ban any IP address that's doing that. Two, you could send back nonsense hashes, which would be akin to asking for files that aren't there. Three, it seems like I'm reinventing the wheel here. Any informed (or uninformed) opinions?

      --
      "Anyone who attempts to generate random numbers by deterministic means is living in a state of sin." -- John von Neumann
  18. About time for asynchronous by benow · · Score: 3, Informative

    Asynchronous event-driven models are the way to go for changing content. They're trickier to code, but require less bandwidth and are more responsive. There is perhaps a bit of a privacy issue at some level (registration with the source), but easy-to-implement, failure-resistant distributed asynchronous networks have much applicability, not just to RSS.

    1. Re:About time for asynchronous by 21mhz · · Score: 1

      Indeed, especially as the technological base needed for asynchronous delivery on a subscription basis has been just about completely laid out.
      Heck, it can even deliver the same RSS items.

      --
      My exception safety is -fno-exceptions.
  19. Duh... Simple solution by Talez · · Score: 1



    There we go. You now have version control.

    Keep copies of the RSS on the server for 30 days.

    http://www.mysite.com/requestfeed?myversion=200412061753

    Diff the new version against the old version. Send what's changed.

    How fucking hard is that people?

    1. Re:Duh... Simple solution by moojj · · Score: 2, Interesting

      I think the biggest reason people are offering RSS feeds is because it's a standard XML file on the webserver. No need to make additional scripts, no need to set up additional services -- just upload the XML file. When you start complicating the "Really Simple Syndication" model you start making it less simple. In my opinion the easiest way to limit bandwidth is to supply the XML file on servers that support gzip compression and the ETag header. This way RSS readers download a compressed XML file, and only when it has been modified. Larger sites could go one step further and ban polling by RSS clients that don't send the ETag validator when requesting the XML file. Then there's always the obvious solution: cut down the number of items inside the XML file, thus lowering the amount of bandwidth per hit.
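
      The gzip half of that can be sketched as below. Most web servers can do this through configuration, so this only illustrates the mechanism; `compress_feed` is a made-up name and the Accept-Encoding parsing ignores quality values such as `gzip;q=0`.

```python
import gzip


def compress_feed(xml_bytes: bytes, accept_encoding: str = ""):
    """Return (payload, extra_headers), gzipping only if the client allows it."""
    offered = [part.split(";")[0].strip().lower()
               for part in accept_encoding.split(",")]
    if "gzip" in offered:
        headers = {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
        return gzip.compress(xml_bytes), headers
    return xml_bytes, {}
```

      Feeds are repetitive XML, so they tend to compress well, but the actual saving depends on the content.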

    2. Re:Duh... Simple solution by Talez · · Score: 1

      Ummmm....

      If a site is being hammered by RSS traffic do you think it would be a bit too much to just implement a script that implements crude version control?

      After all, scripts have to make the RSS XML files in the first place. Why you couldn't just drop in version control into that I don't know.

  20. Not a problem with RSS.. just humans. by dustinbarbour · · Score: 4, Interesting

    RSS feeds are meant as a way to strip all the nonsense from a site and offer easy syndication, right? Basically, present the relevant news from a full-fledged webpage in a smaller file size? If such is the case, this isn't an RSS issue, really. I see it more as a bandwidth issue. I mean, people are going to get their news one way or the other.. either with a bunch of images and lots of markup via HTML or with just the bare minimum of text and markup via RSS. I would prefer RSS over HTML any day of the week! But perhaps RSS makes syndication TOO simple. Thus everyone does it and that eats additional bandwidth that normally would be reserved for those browsing the HTML a site offers.

    And you could implement bans on people who request the RSS feed more than X times per hour as someone suggested (Doesn't /. do this?), but I don't think that gets around the bandwidth issue. I mean, those who want the news will either go with RSS or simply hit the site. Again, RSS is the preferred alternative to HTML.

    So here's my suggestion.. go to nothing but RSS and no HTML!

    1. Re:Not a problem with RSS.. just humans. by yup+that's+me · · Score: 1

      The thing about RSS is that it's a lot easier to check a whole heap of sites with the click of one button than it is to keep accessing each one individually. Ergo, people check sites more often, and bandwidth consumption is higher. Presumably above a certain threshold of users, more bandwidth is being consumed by the frequent checks than was used on sporadic non-RSS checks.

    2. Re:Not a problem with RSS.. just humans. by IO+ERROR · · Score: 1
      /. will temporarily block you if you pull the feed more than 48 times in a day or something like that. It works out to once a half hour.

      And excuse me, but lose HTML? The whole web as RSS feeds? You must be kidding. There's way too much content out there that simply can't be put into an RSS feed. It's static, it represents downloadable files, or documentation, or useless marketing hype, or whatever.

      --
      How am I supposed to fit a pithy, relevant quote into 120 characters?
    3. Re:Not a problem with RSS.. just humans. by kardar · · Score: 2, Insightful

      I wonder if advertising has anything to do with it - if you go to a news site just to see "what's up", you might get banner ads, google ads, so on and so forth - but RSS just makes a nice neat webpage for you or something similar.

      I have to point out how much I love "Sage", the Mozilla Firefox plugin for RSS - you can even rightclick on that XML thing that tries to tell you to save the page and bookmark it under "Sage Feeds" and then Alt-S and you have your RSS.

      I started using Sage for /., Groklaw, and a couple others and it's very cool. Very very cool. I hope the advertising revenue doesn't hurt people or whatever, but it's almost one of those things that would be worth money in how much time and aggravation it saves you having to deal with web designs that aren't as great as they could be.

      I've heard a lot about how people complain about Slashdot and the interface and the web design and so on, but Sage cuts down significantly on the time spent here, more or less - or anywhere, for that matter - I think it makes the /. or Groklaw or whatever experience BETTER, not worse.

      Only thing I can think of is advertising revenue.

    4. Re:Not a problem with RSS.. just humans. by dustinbarbour · · Score: 1

      Yes, I was kidding about losing HTML.

    5. Re:Not a problem with RSS.. just humans. by rmarll · · Score: 1

      So here's my suggestion.. go to nothing but RSS and no HTML!

      Except it'll be an hour before someone implements images in RSS feeds, and then it's 1990 all over again.

    6. Re:Not a problem with RSS.. just humans. by JanneM · · Score: 1

      Except it'll be an hour before someone implements images in RSS feeds, and then it's 1990 all over again.

      We have that already. Boing Boing (mentioned above), for instance, sends image links with their feed, which shows up nicely in Sage.

      --
      Trust the Computer. The Computer is your friend.
    7. Re:Not a problem with RSS.. just humans. by Anonymous Coward · · Score: 1, Insightful

      RSS feeds are meant as a way to strip all the nonsense from a site and offer easy syndication, right? Basically, present the relevant news from a full-fledged webpage in a smaller file size? If such is the case, this isn't an RSS issue, really.

      RSS has different use patterns to normal website visits.

      If you visit a website on a daily basis, you might pull down a single hit. You might lose interest after a couple of days. You might not visit at weekends.

      But if you subscribe to the feed (RSS, Atom, whatever), chances are, you'll be requesting that feed every hour for as long as you have your mail client/newsreader/web browser open. And even if you lose interest, a lot of people will still remain subscribed and just skim over what doesn't interest them.

      Newsfeeds make visitors "sticky". Normally, that's a good thing, but it's actually far better at doing so than need be, "capturing" visitors that really aren't all that interested, and inducing normal visitors to "visit" far more frequently than usual.

      That's the inherent difference between serving, say a cut-back XHTML Basic document, and serving an RSS feed that people can subscribe to.

    8. Re:Not a problem with RSS.. just humans. by drew · · Score: 1

      maybe there needs to be some sort of QOS policy on the server. never allow more than x% of available bandwidth to be dedicated to rss feeds; that would allow sites to make sure they always give preferential treatment to real html page views. since rss aggregation happens mostly in the background, this wouldn't overly affect rss readers, while ensuring that the people browsing the website aren't being slowed down noticeably.

      as far as i am aware, there is no way to do this with any current webserver software, unless you use separate programs or server instances to serve rss feeds and normal pages.

      --
      If I don't put anything here, will anyone recognize me anymore?
    9. Re:Not a problem with RSS.. just humans. by elemental23 · · Score: 1

      Not only that, but since I started using an RSS reader, I'm constantly on the lookout for new and interesting sites that have RSS feeds. Half the sites I read RSS from are sites I didn't visit normally before, and probably wouldn't if it weren't for RSS. I only have enough time in the morning to browse a few sites normally, but I can keep up on three times as many using RSS.

      Judging from my experience then, RSS may reduce the bandwidth used by your regular readers (because they aren't downloading images, etc), but that savings will be used up by new readers who are only there because of the feed.

      --
      I like my women like my coffee... pale and bitter.
    10. Re:Not a problem with RSS.. just humans. by m50d · · Score: 1

      No, there is an important difference. This site may be an exception, but normally people don't keep a web browser in the background continuously refreshing a page for when it changes. They read the page and move on. RSS is designed to let people do exactly that, and it's what they do. RSS is for truly dynamic pages, which HTML isn't really designed for although, again, this site shows it can work. And it has been argued that the old "client requests page, server sends it" model isn't suitable for this kind of use.

      --
      I am trolling
  21. Pop Fly by Anonymous Coward · · Score: 5, Funny

    "Is RSS Doomed by Popularity?"

    "Is Instant Messaging Doomed by Popularity?"

    "Is E-Mail Doomed by Popularity?"

    "Is Usenet Doomed by Popularity?"

    "Is The Internet Doomed by Popularity?"

    "Is Linux Doomed by Popularity?"

    "Is Apple Doomed by Popularity?"

    "Is Netcraft Doomed by Popularity?"

    "Is Sex with Geeks Doomed by Popularity?" :)

    1. Re:Pop Fly by kyouteki · · Score: 1

      "Is Sex with Geeks Doomed by Popularity?" Not until it's actually popular. :/

      --
      A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
    2. Re:Pop Fly by Anonymous Coward · · Score: 0

      Doom Doomed by popularity?

    3. Re:Pop Fly by nEoN+nOoDlE · · Score: 1

      So that's what happened with sex with geeks! damn, I joined geekdom way too late, and now the orgies are gone.

      --
      Don't trust a bull's horn, a doberman's tooth, a runaway horse or me.
    4. Re:Pop Fly by Wesley+Felter · · Score: 1

      I know the parent post is a joke, but there are some legitimate differences between RSS and other forms of Internet communication.

      "Is Instant Messaging Doomed by Popularity?"

      No, because it doesn't generate traffic when you're not sending messages.

      "Is E-Mail Doomed by Popularity?"

      No, for the same reason as IM.

      "Is Usenet Doomed by Popularity?"

      It has been for some time. :-)

      "Is Netcraft Doomed by Popularity?"

      Netcraft confirms it: Netcraft is dying.

    5. Re:Pop Fly by flynniec6 · · Score: 1

      Thought you'd slip the last one by without anyone noticing, didn't you.

  22. Nobody goes there anymore... by wirelessbuzzers · · Score: 1

    ... it's always too busy.

    --
    I hereby place the above post in the public domain.
  23. Mod parent up by commodoresloat · · Score: 1
    This is a big problem and it would be substantially mitigated with such a simple solution.

    Following along the same line of reasoning, why not have the RSS reader send one request, and then changes are pushed to the reader after that? The reader can cache the change so if the user hits reload they get the most recent cache rather than hitting the server again.

    1. Re:Mod parent up by gad_zuki! · · Score: 1

      >why not have the RSS reader send one request, and then changes are pushed to the reader after that?

      Well, they tried this way back when. I think they called it web casting. RSS is really just a lo-fi form of webcasting. You don't need to have any open ports on your machine, no special service running on the web server, just a flat file in the RSS format.

      Webcasting may replace RSS, but then we would probably have the opposite problem. "Why is slashdot slashdotting me!!"

    2. Re:Mod parent up by Ctrl-Z · · Score: 1

      Following along the same line of reasoning, why not have the RSS reader send one request, and then changes are pushed to the reader after that?

      The problem I see with that is network users who are behind firewalls. You can't very well push RSS data to them now, can you?

      --
      www.timcoleman.com is a total waste of your time. Never go there.
  24. Solutions by markfletcher · · Score: 5, Informative

    There are several ways to mitigate the bandwidth issues. First, all aggregators should support gzip compression and the HTTP Last-Modified and ETag headers. That'll take care of a lot of the problems. The other solution is to get people to use server-based aggregators, like Bloglines, which only fetch a feed once per iteration, regardless of how many subscribers there are. As a bonus, there are several things that server-based aggregators can do that desktop-based aggregators can't do, like provide personalized recommendations. I like this solution, but of course I'm biased since I'm the founder of Bloglines. :)
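
    For aggregator authors, the client side of this might look roughly like the sketch below: remember the validators from the last fetch, send them back, and accept gzip. The function names and the plain-dict cache and header representation are assumptions made for illustration; a real HTTP library would handle header case and transport for you.

```python
import gzip


def conditional_headers(cached):
    """Build request headers from what was stored on the previous fetch."""
    headers = {"Accept-Encoding": "gzip"}
    if cached.get("etag"):
        headers["If-None-Match"] = cached["etag"]
    if cached.get("last_modified"):
        headers["If-Modified-Since"] = cached["last_modified"]
    return headers


def apply_response(cached, status, headers, body):
    """Return the updated cache entry after a fetch."""
    if status == 304:
        return cached  # nothing changed; the server sent no body
    if headers.get("Content-Encoding") == "gzip":
        body = gzip.decompress(body)
    return {
        "etag": headers.get("ETag"),
        "last_modified": headers.get("Last-Modified"),
        "body": body,
    }


# First fetch: nothing cached, so no validators are sent.
cache = {}
print(conditional_headers(cache))
cache = apply_response(
    cache, 200,
    {"ETag": '"abc123"', "Content-Encoding": "gzip"},
    gzip.compress(b"<rss>...</rss>"),
)
# Second fetch: the ETag goes out, and a 304 leaves the cache untouched.
print(conditional_headers(cache))
cache = apply_response(cache, 304, {}, b"")
print(cache["body"])
```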

    1. Re:Solutions by Anonymous Coward · · Score: 0

      hey why doesn't bloglines play nice and publish all its own content as rss?

    2. Re:Solutions by markfletcher · · Score: 1
      We do publish a lot of stuff as RSS. All clip blogs have RSS feeds, of the form http://www.bloglines.com/blog/BLOGNAME/rss.

      The top links displays also have RSS feeds (see http://www.bloglines.com/rss/{toplinks_inc,toplinks,toplinks_dec}).

      And, through the Bloglines API, (see our API section), you can access all of your subscriptions as RSS 2.0 feeds. There are several open-source projects based on the Bloglines API, and several desktop-based aggregators have integrated the API as well.

      Hope this helps.

    3. Re:Solutions by themurph17 · · Score: 1

      i tried some stand alone aggregators and Firefox Live Bookmarks and then finally Bloglines on a co-worker's recommendation. Bloglines rocks. accessible from any web browser, mobile version (great for Pocket PC on the train), and of course the Firefox extensions. i see no need for a stand-alone aggregator program. the only thing that interests me is the RSS "scroller" in the new Netscape preview. i'm surprised there wasn't more mention of Bloglines. does the general Slashdot population have something against it?

    4. Re:Solutions by the_truk_stop · · Score: 1

      there are several things that server-based aggregators can do that desktop based aggregators can't do

      I assume you're speaking of server-vs-desktop in the same relationship as webmail-vs-pop3. Yes, server-based aggregators can probably do a whole lot more because of the ability to network people together with common interests and watch usage statistics, etc.

      That said, there's a reason I use RSS feeds (and preferably Atom, if available) over visiting my friends' Xangas (ugh) and Livejournals. Most of my friends have an inability to color-coordinate their blogs. One person I know has an awful background image that repeats poorly, and consistently has varying colors from yellow-on-white to purple-on-black. Thanks to my desktop aggregator, I'm able to view the content in a readable color scheme. I would assume that a server-side aggregator would impose its own color scheme on me.

      Just some thoughts from a POP3 user. *grin*

      And as a suggestion: please make sure Bloglines validates properly against validator.w3.org.



  25. A simple fix by jd · · Score: 2, Informative
    What you have is a large number of subscribers accessing a common data source at or around the same time. The simplest fix would be to have a reliable multicast version of RSS, which is broadcast to all subscribers to that feed. Then, you only have to transmit the updates once. The network would take care of it from then on.


    New subscribers would receive the initial copy of the feed via traditional unicast TCP, because that would be the least CPU-intensive way of handling a few requests at a time.


    A caching system won't work for the same reason web caches have never caught on in the US - people are terrified of being sued to smithereens for potential copyright infringement. Even if any case would be thrown out of court instantly (by no means certain in the US) the costs would be prohibitive and malicious plaintiffs rarely ever get asked to pay costs.


    The main problem with the multicast solution is that although multicasting is enabled across the backbone, most ISPs disable it - for reasons known only to them, because it costs nothing to switch it on. Persuading ISPs to behave intelligently is unlikely, to say the least.

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    1. Re:A simple fix by AeiwiMaster · · Score: 1


      Or just send out messages on jabber or psyc.

  26. Solved? by Anonymous Coward · · Score: 0

    Nothing a little tweaking can't fix...
    http://blog.glennf.com/mtarchives/004540.html

  27. RSS rules. by SubKamran · · Score: 0

    I love RSS. I use it all the time with Firefox, Thunderbird and Trillian Pro. I just adore it.

    --
    Kamran A
    1. Re:RSS rules. by version5 · · Score: 1

      Wow, really? I like pistachios! They are tasty, and I eat them all the time!

      --

      "It's Dot Com!"

  28. RSSoP2P? by Anonymous Coward · · Score: 0


    Just an idea that popped out of my head without further thought: why not RSSoP2P (RSS over P2P)? Sharing streams across peer users.

    1. Re:RSSoP2P? by cyfer2000 · · Score: 0, Redundant

      Same here. It should be a good choice.

      --
      There is a spark in every single flame bait point.
    2. Re:RSSoP2P? by Siener · · Score: 1

      Just an idea that popped out of my head without further thought: why not RSSoP2P (RSS over P2P)? Sharing streams across peer users.

      It's the first thing I thought of as well. Create an RSS reader that works along the same lines as bittorrent - a technology that has already proven it can greatly reduce the load on servers.

  29. Aggregator writers are to blame by saddino · · Score: 1

    Although the caching solution seems intriguing, the onus should really be on the aggregator authors to do at least local caching for RSS access between "refresh" intervals and even better, use HTTP conditional GETs. It's also important to use sane default "refresh" intervals and constraints.

    During our product's development, our debugging refresh interval was 5 minutes and hardcoded to Slashdot. As you can imagine, it didn't take us long to discover Slashdot's unique banning mechanism -- it woke us up to the problem of letting people check feeds way too often (this was also before we had implemented local caching).
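
    One way an aggregator might enforce "sane" intervals is to clamp whatever the user asks for and to serve from the local copy in between. This is only a sketch: the constants and function names are assumptions, with the 30-minute floor picked to match the "twice an hour" figure quoted elsewhere in this discussion.

```python
MIN_INTERVAL = 30 * 60      # assumed floor: at most twice an hour
DEFAULT_INTERVAL = 60 * 60  # assumed default: hourly


def effective_interval(user_setting=None):
    """Clamp a user-chosen refresh interval (in seconds) to a polite minimum."""
    if user_setting is None:
        return DEFAULT_INTERVAL
    return max(MIN_INTERVAL, user_setting)


def should_fetch(last_fetch, now, user_setting=None):
    """True only if the feed was never fetched or the interval has elapsed."""
    if last_fetch is None:
        return True
    return now - last_fetch >= effective_interval(user_setting)


# A user asks for 5-minute refreshes; the reader still waits 30 minutes
# and shows the locally cached copy until then.
print(effective_interval(5 * 60))
print(should_fetch(last_fetch=0, now=10 * 60, user_setting=5 * 60))
print(should_fetch(last_fetch=0, now=30 * 60, user_setting=5 * 60))
```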

    However, if this bandwidth issue keeps getting worse, someone like Akamai is certainly going to think of a corporate solution.

    1. Re:Aggregator writers are to blame by PornMaster · · Score: 1

      HTTP conditional GETs combined with something like mod_gzip to compress the content would be nice.

  30. It'll be fine. by eeg3 · · Score: 0

    Everyone loves RSS. If the sites with lots of users taking up bandwidth ask for some donations to keep it alive, i'm sure they'll get some help.

    Interweb fanatics tend to be very generous.

    1. Re:It'll be fine. by Anonymous Coward · · Score: 0

      Interweb fanatics tend to be very generous.

      I'm riding on a lollercoaster.

    2. Re:It'll be fine. by Anonymous Coward · · Score: 0

      roffle waffle!

  31. Solved, move on by Jeremiah+Blatz · · Score: 3, Informative
    Shrook for the Mac has already solved this issue with "distributed checking". Popular sites are checked once every 5 minutes, if the site is updated, everyone gets the latest content, otherwise, nobody touches it.

    As another poster has pointed out, banning users who check too frequently is an excellent fallback. A tiny site won't know to install the software, but it won't be an issue for a tiny site.

    1. Re:Solved, move on by PingXao · · Score: 1

      Shrook for the Mac. Please. Nothing "for the Mac" solves anything above and beyond the tiny percentage of people who use a Mac. The problem cited in the article is a bandwidth problem related to a rather new technology and it's not something you can "solve" by deploying a "solution" that 99% of internet users will never use. Mac heads. They never learn.

  32. Shouldn't it be just the opposite?? by romcabrera · · Score: 1

    People polling my site for updates via RSS would be good for my bandwidth usage, because users will be retrieving a small amount of data and noticing nothing has changed... instead of doing a full access to my site, requesting images, etc.
    That would be savings in the long term. Or not? What's the deal going on here?

    1. Re:Shouldn't it be just the opposite?? by dustinbarbour · · Score: 1

      The only thing I can think of is perhaps RSS makes syndication TOO simple. Perhaps fewer people would be requesting updates on the news (or whatever) from a site if they didn't offer the feeds. I mean, without RSS, many users would rather go without the updates than access the site and get the images and everything.

      That is the only argument I can come up with and I think it's fallacious.

  33. RSS + Bittorrent -- works for Podcasts... by Spoing · · Score: 2, Interesting
    Or, is in the works now on Dave Slusher's Evil Genius Chronicles Podcast. [Podcasts = RSS subscription feeds for time shifted radio blogging.]

    The Podcasters need it too. I'm subscribed to a couple dozen feeds and have well over 4GB of files in my cache right now.

    The biggest problem with Bittorrent and podcasts is that the RSS aggregators need to be Bittorrent aware. Unfortunately, few are.

    --
    A firewall can not protect you from yourself. Turn off what you do not need. Do not use the firewall to do your work.
    1. Re:RSS + Bittorrent -- works for Podcasts... by Refrag · · Score: 1

      You insensitive clod, don't link to an RSS feed.

      --
      I have a website. It's about Macs.
    2. Re:RSS + Bittorrent -- works for Podcasts... by Wesley+Felter · · Score: 2, Insightful

      Too bad podcasts are totally different from normal RSS feeds, because podcasts are about 100x larger. BitTorrent doesn't work for normal RSS feeds because they are too small and change too often.

  34. I Don't Know by sulli · · Score: 0, Offtopic

    Are Question Marks

    --

    sulli
    RTFJ.
    1. Re:I Don't Know by mskfisher · · Score: 1

      good call - I was about to post that very thing, but it appears you took the downmod instead...

      --
      0x0D 0x0A
  35. Bittorrent by Jherek+Carnelian · · Score: 3, Insightful

    Seems like bittorrent, or a bittorrent-alike protocol, would be useful here. Turn the RSS feed into a tracker/seed and then all it has to keep track of is who has the latest version of the content and it could redirect feeders to each other, always preferring the latest updated version. Eventually, you will have the same scaling problems that bittorrent has (single tracker), but at least you stretch things out a few months or a year until a better solution comes around.

  36. Ask Slashdot: Tracking time using computer by Anonymous Coward · · Score: 0

    When I use my computer, I'm either sshed in or logged in locally through GDM. How can I record how long I'm on over a certain period of time? I'm using Ubuntu.

    1. Re:Ask Slashdot: Tracking time using computer by Anonymous Coward · · Score: 0

      Use last in shell. It'll show login and logout times. -b0lt

    2. Re:Ask Slashdot: Tracking time using computer by Anonymous Coward · · Score: 0

      That's perfect, thanks

  37. What About... by the_mad_poster · · Score: 0, Redundant

    Okay, what about a distributed RSS feeder system?

    Say you have 100 people who want to get feeds from 10 sites. Regular RSS has 100 people hitting each of those sites once per minute. They all get the same data.

    However, if you have a system where a group of people all want the same site feeds, you could group them by their interest in sites. Within the pool, only x% of the sites interested in, for example, eweek.com would request the feed. Then, they would be available to distribute the feed to y% of the sites who would distribute to z% of the sites until everyone in the pool is up to date.

    The same goes for other sites. Another subgroup in that pool would pick up slashdot.org and distribute that out to the rest of the pool in waves.

    It doesn't change the overall bandwidth used, but it does decentralize it a great deal.
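
    To put rough numbers on the "waves", here is a back-of-the-envelope sketch. The function and its parameters are hypothetical, and it assumes every up-to-date member forwards to the same number of new peers in each wave, which a real pool would not guarantee.

```python
def distribution_waves(pool_size, seed_fraction, fanout):
    """Return (origin_requests, waves) for a pool relaying one feed update.

    seed_fraction: share of the pool that polls the origin site directly.
    fanout: how many new peers each up-to-date member forwards to per wave.
    """
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    seeds = max(1, round(pool_size * seed_fraction))
    updated, waves = seeds, 0
    while updated < pool_size:
        updated = min(pool_size, updated * (1 + fanout))
        waves += 1
    return seeds, waves


seeds, waves = distribution_waves(pool_size=100, seed_fraction=0.05, fanout=2)
print(f"origin serves {seeds} requests; pool is current after {waves} waves")
```

    The origin only sees the seed requests; the remaining copies are transferred between pool members, which is the decentralization described above.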

    --
    Alito: A vote for Alito is a punch in the eye to put that bitch back in her place!
  38. Coral and planning for feed growth by mlinksva · · Score: 1
    Coralized feeds ought to help sites with lots of subscribers. One nice thing about RSS traffic, I suspect, is that it is less bursty than requests for web pages. You can plan for traffic as the number of subscribers to your feed increases vs. your web pages getting /.'d.

    Ex: Slashdot RSS via Coral

  39. Seems somewhat reasonable. by Trejkaz · · Score: 1

    This seems like a fair method of reducing the amount of throughput... only permitting a certain number of requests per hour per user, or whatever time period one wishes.

    I'm pretty sure there are other ways of going about it, though.

    1. Send a header which specifies when the feed was last downloaded from this location. If I downloaded the feed an hour ago, I don't need the feed to contain articles which occurred half a day ago.

    2. Include fewer articles in the RSS.

    3. Push the RSS updates to users, using XMPP or similar, as sites like PubSub.com are starting to do.

    But realistically, what would you want more: (a) someone fetching 1kb of RSS once every 10 minutes, or (b) someone fetching 10-50kb of HTML and assorted crap once every 10 minutes? It seems to me that for every RSS download a user makes, they're actually saving you bandwidth!
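
    Option 1 above could be sketched on the server side like this: take the date the client says it last fetched and return only newer items. The function name and the dict-based item format are assumptions for illustration; the dates are parsed with Python's standard `email.utils` helper.

```python
from email.utils import parsedate_to_datetime


def items_since(items, since_header):
    """Keep only items published after the date the client last fetched."""
    if not since_header:
        return list(items)  # first visit: send everything
    cutoff = parsedate_to_datetime(since_header)
    return [i for i in items if parsedate_to_datetime(i["pubDate"]) > cutoff]


feed = [
    {"title": "Old story", "pubDate": "Thu, 09 Dec 2004 01:00:00 GMT"},
    {"title": "New story", "pubDate": "Thu, 09 Dec 2004 12:30:00 GMT"},
]
recent = items_since(feed, "Thu, 09 Dec 2004 11:30:00 GMT")
print([i["title"] for i in recent])
```

    The catch is that the response now differs per client, so it can no longer be shared by an ordinary cache.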

    --
    Karma: It's all a bunch of tree-huggin' hippy crap!
    1. Re:Seems somewhat reasonable. by Whyrph · · Score: 0

      (b) gets you ad revenue; (a) does not (well, it does if people read the stories, but not as much as (b)). And I think a lot of people just look at the summaries.

  40. what's wrong with the old subscription model? by Trepidity · · Score: 2, Interesting

    When I want updates from sites, I subscribe to an email feed, and stick it in its own mailbox. I agree that some standardized format and display would be nice, but you can send XML over email too, so what's needed is a reader that I can point to an IMAP mailbox full of XML mails.

    An alternate approach would be to do the same thing with a news server. Why keep refreshing a feed for updates instead of letting it notify you when it has updates?

    1. Re:what's wrong with the old subscription model? by Anonymous Coward · · Score: 0
      When I want updates from sites, I subscribe to an email feed, and stick it in its own mailbox. I agree that some standardized format and display would be nice, but you can send XML over email too, so what's needed is a reader that I can point to an IMAP mailbox full of XML mails.

      An alternate approach would be to do the same thing with a news server. Why keep refreshing a feed for updates instead of letting it notify you when it has updates?

      So you're just shifting the server that handles the burden of your poll. Instead of polling the direct provider of the content you're polling the mail/news server.

      Another problem with the mail server approach is when servers go offline. Assuming no backup MX, now the outgoing bandwidth is being eaten up with re-transmission attempts. Also (for mail) you have to manage subscriptions. With RSS people just ask when they want it, hopefully doing a short "has it changed" query first.

  41. Torrents by cartzworth · · Score: 0, Redundant

    How about the development of an RSS torrent? Let's decentralize the entire concept.

  42. What about the odds? by Anonymous Coward · · Score: 0
    and KnowNow (even-driven syndication).
    Only allowing even-numbered articles is as useless as outdated articles. All or nothing!!
  43. RSS is dead, long live the king! by Murphy+Murph · · Score: 1

    I, for one, welcome our old USENET overlords!

    --
    I dub thee... Sir Phobos, Knight of Mars, Beater of Ass.
  44. Why is this a problem? by Guspaz · · Score: 0

    If these sites would just take advantage of HTTP Compression and buy some cheaper bandwidth they wouldn't have a problem.

    These places are all so uppity they'd never consider anything less than high-level colocation with high bandwidth costs, or in-house connections. And while that's fine for what they normally do, they shouldn't be complaining about RSS eating up much bandwidth if they won't consider all their options.

  45. I want to die. by Anonymous Coward · · Score: 0

    I really hate it when my web site gets a lot of traffic. I publish my content in hopes that nobody will look at it, but because of my stupidity I took advantage of that little "live bookmarks" feature of Firefox, and ever since I've regretted it. Granted, I *could* put advertisements on the links shown by the live bookmarks, but that would be too easy. I'd rather sulk and bitch about my bandwidth usage because there are no providers out there that offer more than 500MB transfer for $0/month.

    Please stop being interested in my content. Please?

  46. This issue was previously discussed elsewhere by Paul+Bain · · Score: 4, Insightful

    As RSS [becomes] more known to the mainstream users and press, the bandwidth issue reported by many sites . . . related to feeds is becoming a reality. Stats from sites like Boing Boing are showing a real concern regarding feeds bandwidth usage. Possible solutions to this problem are emerging slowly, like RSScache (feed caching proxy) and KnowNow (even-driven syndication). RSScache seems to offer a realistic solution to the problem, but [will it] be enough . . . ?

    Slashdot user GaryM posted a related question elsewhere about 20 months ago. At that time, in that forum, commenters dismissed his proposed solution, the use of NNTP, on the grounds that NNTP is deficient, but others continue to see NNTP as a possible solution nevertheless.

    --

    A lawyer & digital forensics examiner. Also an expert on open source software (OSS).
    1. Re:This issue was previously discussed elsewhere by MikeBabcock · · Score: 1

      I still think HTTP 1.2 should include the rsync protocol. If a website changes, it usually changes very little. My proxy / browser cache should say "I have version 1.1, what's new?" and the webserver should say "Here's the whole thing" or "Here's the two line diff".
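
      The "whole thing or a two-line diff" decision could be sketched as below. Note this uses a plain line-based unified diff from Python's `difflib`, not rsync's rolling-checksum algorithm, and the function name is made up for the example.

```python
import difflib


def feed_delta(old_text, new_text):
    """Return a unified diff, or None when the whole document is no larger."""
    diff = "".join(difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile="cached", tofile="current",
    ))
    return diff if len(diff) < len(new_text) else None


old = "".join(f"<item>story {n}</item>\n" for n in range(40))
new = old.replace("story 7<", "story 7 (updated)<")
delta = feed_delta(old, new)
print("send diff" if delta is not None else "send whole document")
```

      The server would also need to know exactly which version the client holds, which is the part HTTP of this era does not provide.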

      --
      - Michael T. Babcock (Yes, I blog)
  47. Solution! by Quixote · · Score: 2, Funny
    I have an idea... let's start a company which pushes data to the consumers... from a central point. We'll call it "pointcast".

    Now if only they'd bring back the $$$ from the mid 90s too.... :)

    1. Re:Solution! by DarthWiggle · · Score: 1

      Bless you, Quixote, for stirring these passions so long laid dormant in my loins. I thought Pointcast was lost to us forever.

    2. Re:Solution! by One+Louder · · Score: 1

      Pointcast never actually pushed - it was periodic HTTP polling, just like RSS. The difference is that Pointcast responded with entire articles with graphics, whether or not they were going to ever be read, while RSS typically responds with only headline and summary, with a link to the actual content.

  48. Not it didn't.. by dustinbarbour · · Score: 0, Offtopic

    Is this whole in Korea.. to be the new in Soviet Russia..? Tell me it's not.

  49. Slashdot's RSS blocking policy by jamie · · Score: 4, Informative
    Slashdot blocks your IP from accessing RSS if you access our site more than fifty times in one hour. I think that's reasonable, don't you? Especially since our FAQ tells you to request a feed only twice an hour.

    Every complaint about this that I've investigated has turned out to be either a broken RSS reader or an IP that's proxying a ton of traffic (which we usually do make an exception for).

    Oh, and if you want to read sectional stories in RSS, then:

    • create a user if you haven't already,
    • edit your homepage to include sectional stories you like (and exclude those you don't),
    • then reload the homepage and copy that "rss" link at the very bottom of the page. It will be customized to your exact specs!

    Slashdot's RSS traffic, like Boing Boing's, is huge, and blocking broken readers has saved us a ton of bandwidth, which of course means money. We were one of the first sites to do this but (as this story suggests) you'll see a lot more sites doing it in the future. I think our policy is fair.
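
    A rule like "no more than fifty requests per hour per IP, with exceptions for known proxies" can be sketched as a rolling-window counter. This is not Slashdot's actual code; the class, its parameters and the exemption list are assumptions for illustration only.

```python
from collections import defaultdict, deque


class HourlyLimiter:
    """Refuse a client once it exceeds `limit` requests in any rolling window."""

    def __init__(self, limit=50, window=3600, exempt=()):
        self.limit = limit
        self.window = window
        self.exempt = set(exempt)        # e.g. proxies granted an exception
        self.hits = defaultdict(deque)   # ip -> timestamps of recent requests

    def allow(self, ip, now):
        if ip in self.exempt:
            return True
        recent = self.hits[ip]
        while recent and now - recent[0] >= self.window:
            recent.popleft()             # forget requests older than the window
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True


limiter = HourlyLimiter(limit=50)
results = [limiter.allow("192.0.2.1", now=t) for t in range(51)]
print(results.count(True), results[-1])
```

    In this sketch refused requests are not recorded, so a client that backs off is let in again as soon as its older requests age out of the window.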

    1. Re:Slashdot's RSS blocking policy by bill_mcgonigle · · Score: 1

      then reload the homepage [slashdot.org] and copy that "rss" link at the very bottom of the page. It will be customized to your exact specs!

      OK, that's completely cool. Kudos to whoever implemented that. Now I don't have to bitch about it on this thread. :)

      But when I follow your link, I get

      http://developers.slashdot.org/index.rss

      and if I go to the normal homepage I still have

      http://slashdot.org/index.rss

      I'd expect there to be a ?user= or something. How does the RSS generator know it's me? My RSS reader doesn't share my browser's cookies... I'm missing something.

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
    2. Re:Slashdot's RSS blocking policy by MikeFM · · Score: 1

      Don't normal web proxies work just fine for caching RSS traffic? I just looked at Slashdot's feed and it seems to cache well - much better than the rest of Slashdot apparently. [Is there a reason Slashdot doesn't cache better? I'd think that'd save a lot of bandwidth.]

      So, to me, it looks like there is no need for a RSS proxy. RSS readers just need to learn to use regular web proxies and users need to be convinced that using such proxy servers is to their benefit. Good luck given the low number of users that bother using such proxy servers for their web browsing.

      Going with the system you described I might suggest limiting an IP to two hits per resource an hour unless that IP is from a known proxy server. If a lot of popular sites did this, and provided some directions, then it'd not be long before users began using the proxy servers and anyway it'd help cut back on bandwidth usage.

      On my own websites I like to compromise by including a link to users not logging from a proxy server that gives them information on using a proxy server.

      --
      At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
    3. Re:Slashdot's RSS blocking policy by Fnkmaster · · Score: 1

      Jamie, I'm not sure if you're mistaken or if something has been changed in the last month or two, but your IP blocking provisions certainly were kicking in WAY before 50 accesses in one hour.

      I had a Slashdot RSS feed live bookmark in Firefox (supposedly gets checked once an hour, or when the browser is started up), and that got me temporarily banned (perhaps I had restarted the browser several times in an hour for some reason, but it certainly wasn't 50 times!).

      Like I said, hopefully you have upped the ban factor to something more sane like 50, but the rep you guys got for your "jackbooted" policy on RSS was well deserved.

    4. Re:Slashdot's RSS blocking policy by am+2k · · Score: 1
      Every complaint about this that I've investigated has turned out to be either a broken RSS reader or an IP that's proxying a ton of traffic (which we usually do make an exception for).

      Well, I got banned twice while developing some RSS-reader, because I wanted to make sure that it works with slashdot's feed, and relaunched it about 51 times in an hour... a co-worker of mine got hit with the same problem, so we ended up removing slashdot from the default feed while developing the application.

      Note that this application supports the HTTP standard for checking for updates, but that didn't really help in that case.

    5. Re:Slashdot's RSS blocking policy by jamie · · Score: 3, Insightful
      Is there a reason Slashdot doesn't cache better? I'd think that'd save a lot of bandwidth.

      Not really. Our cache hit rate would be about zero. We update the homepage about once a minute, and the same goes for any page that any reader would be likely to reload within a reasonable time.

    6. Re:Slashdot's RSS blocking policy by Dr.+Awktagon · · Score: 1

      then reload the homepage and copy that "rss" link at the very bottom of the page. It will be customized to your exact specs!

      This doesn't work for me, I get just "http://slashdot.org/index.rss" .. the top of the page says "This page was generated by a Squadron of Attack Elephants for Dr. Awktagon (233360)" so I must be logged in, right?

      Also like a poster above I was developing an RSS reader and testing on slashdot's feed and I got shut out after like 5-10 hits, I guess you check for a RATE exceeding 50 hits/hour, rather than actually waiting an hour, right?

    7. Re:Slashdot's RSS blocking policy by jamie · · Score: 2, Interesting
      The limit was bumped up a couple months ago, I don't remember exactly when. (And if abuse gets worse, of course, we'll take it back down... but hopefully in 2004 we're no longer on the bleeding edge and client application authors will get more friendly...)

      If you'd like me to check it out, I will. I've set up a Firefox live bookmark for myself and I'll check the logs for my own accesses and see what happens. If you do the same and get banned, go ahead and email me directly -- as soon as possible so our logs don't roll over -- and I'll take a look.

    8. Re:Slashdot's RSS blocking policy by Vr6dub · · Score: 1

      What about us people that read the RSS feeds from work? My company has tens of thousands of employees... all leaving through a couple of IP addresses. I'm replying because I sent an email already and have had no response. Additionally, I have been banned from my computer at home as well (using the Firefox RSS). So something is not right.

    9. Re:Slashdot's RSS blocking policy by Gojira+Shipi-Taro · · Score: 1

      It would be reasonable, but I got banned before I had accessed the RSS feed ONCE.

      That's fucking retarded.

      --
      "Oh my God. This is terrible. This is the end of my Presidency. I'm fucked."; ~ Donald J. Trump
    10. Re:Slashdot's RSS blocking policy by jamie · · Score: 1

      Argh! Sorry, I just described a subscriber-only feature. My bad. Never mind.

    11. Re:Slashdot's RSS blocking policy by jamie · · Score: 2, Informative

      Sorry, I goofed, that feature I described is subscriber-only. I'd forgotten that. I'll update the FAQ to describe it.

    12. Re:Slashdot's RSS blocking policy by bill_mcgonigle · · Score: 2, Interesting

      Sorry, I goofed, that feature I described is subscriber-only.

      That's OK, I'm a subscriber... still don't see how the custom RSS works. From my RSS reader how does Slashdot know I'm a subscriber? Special URL?

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
    13. Re:Slashdot's RSS blocking policy by Dr.+Awktagon · · Score: 1
      I am a subscriber! In fact I remember asking about this feature in a slashdot IRC chat.

      You have paid for a total of 19000 pages and so far 16835 have been used up (10 today). Thank you for supporting Slashdot! We appreciate your contribution very much.

    14. Re:Slashdot's RSS blocking policy by Cecil · · Score: 1

      Your policy may be fair (I'd argue against that, personally, but there's not much point in doing so) however your implementation of that policy varies wildly, or at least it used to, and if you don't acknowledge at least that it used to suck (because it certainly did) then you're going to come across as if you don't actually know the situation well enough to know whether it sucks now either.

      My friend, Neil Fraser has had no end of troubles with a fairly legitimate utility he created. Which is even linked on slashdot's code page as being under "ultramode".

    15. Re:Slashdot's RSS blocking policy by Anonymous Coward · · Score: 0

      I'm developing an RSS aggregator using the Python Universal Feed Parser (http://www.feedparser.org/). During my work I've been banned by Slashdot, like many other developers.

      You claim that /. would not benefit from caching because the cache hit ratio would approach zero. However, the modified field in your RSS feed, or more correctly the dc:date field, is, as far as I have been able to see, NOT updated every minute.

      Hence, it appears to me that having your server accept conditional HTTP GET should save some bandwidth. If nothing else, it may reduce the number of users you are forced to block.

      Am I totally missing something or do I make sense?

      Lars Brenna, Norway.

    16. Re:Slashdot's RSS blocking policy by (trb001) · · Score: 1

      I would assume that your RSS feeder can use cookies, I know mine does. I get NYTimes articles all the time that you have to be logged in for, and they do that using cookies, I believe.

      --trb

    17. Re:Slashdot's RSS blocking policy by jamie · · Score: 1

      Check that "rss" link at the bottom of the homepage again. As of now, it should be a new dynamic URL that looks like index.pl?content_type=rss&logtoken= etc.

    18. Re:Slashdot's RSS blocking policy by jamie · · Score: 1

      Sorry, I jumped the gun a bit. But the feature's in place now. Check that "rss" link at the bottom of the homepage again, it should be a dynamic URL now, which you can paste into your RSS reader.

    19. Re:Slashdot's RSS blocking policy by jamie · · Score: 1
      Oh, I was talking about caching webpages. But the cache hit rate would still be near-zero for RSS pages, assuming your software respects the limits we recommend:

      <syn:updatePeriod>hourly </syn:updatePeriod>
      <syn:updateFrequency>1 </syn:updateFrequency>

      While developing RSS reading software, it's best to pull your data from your own network, not a public site.
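
      For reader authors, the recommended limits quoted above can be turned into a minimum wait between polls. This is only a rough sketch: the function name and the period table are made up for illustration, and the month and year lengths are approximations.

```python
# Hypothetical helper: convert the syn:updatePeriod / syn:updateFrequency
# values from a feed into a minimum polling interval in seconds.
PERIOD_SECONDS = {
    "hourly": 3600,
    "daily": 86400,
    "weekly": 7 * 86400,
    "monthly": 30 * 86400,   # approximation
    "yearly": 365 * 86400,   # approximation
}


def min_poll_interval(update_period, update_frequency=1):
    """Seconds a polite reader should wait between two fetches."""
    if update_frequency < 1:
        raise ValueError("updateFrequency must be a positive integer")
    # Tolerate stray whitespace and case differences in the element text.
    period = PERIOD_SECONDS[update_period.strip().lower()]
    return period / update_frequency


# "hourly" with a frequency of 1 means at most one fetch per hour.
print(min_poll_interval("hourly", 1))
```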

      The dc:date field is not updated because it should not be. If you mean the Last-Modified field in the HTTP header, we send that correctly. We also send an ETag header.

      And we do HTTP conditional GET correctly too, as far as I can tell:

      $ telnet slashdot.org 80
      Trying 66.35.250.150...
      Connected to slashdot.org.
      Escape character is '^]'.
      GET /yro.rss HTTP/1.1
      Host: slashdot.org
      If-Modified-Since: Thu, 09 Dec 2004 09:11:06 GMT

      HTTP/1.1 304 Not Modified
      Date: Thu, 09 Dec 2004 15:44:53 GMT
      Server: Apache/1.3.29 (Unix) mod_gzip/1.3.26.1a mod_perl/1.29
      Connection: close
      ETag: "1ae265-3464-41b816a9"
      Cache-Control: max-age=1800

      Connection closed by foreign host.
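
      The same exchange can be sketched from the reader's side. This is a minimal example using only Python's standard library, not the code of any particular reader; the function names are invented, and the caller is assumed to store the ETag and Last-Modified values between polls.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build the validator headers from whatever the last fetch returned."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_feed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None when nothing changed."""
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        # urllib reports "304 Not Modified" as an HTTPError.
        if err.code == 304:
            return None, etag, last_modified
        raise
```

      A reader would keep the returned validators next to its cached copy and pass them back in on the next poll.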

    20. Re:Slashdot's RSS blocking policy by jamie · · Score: 1
      Oh, sure, it used to suck. But I'm fairly confident that the early bugs are squashed now.

      I'll email your friend and if his script is still having troubles I'll try to sort out what's going on.

    21. Re:Slashdot's RSS blocking policy by bill_mcgonigle · · Score: 1

      I would assume that your RSS feeder can use cookies, I know mine does. I get NYTimes articles all the time that you have to be logged in for, and they do that using cookies, I believe.

      Agreed, but how would they get set in the first place? If my RSS reader (NewsFire) has a cookie store, it's not shared with my web browser (Mozilla).

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
    22. Re:Slashdot's RSS blocking policy by bill_mcgonigle · · Score: 1

      Check that "rss" link at the bottom of the homepage again. As of now, it should be a new dynamic URL that looks like index.pl?content_type=rss&logtoken= etc.

      Perfect! Thanks, Jamie.

      --
      My God, it's Full of Source!
      OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
    23. Re:Slashdot's RSS blocking policy by (trb001) · · Score: 1

      That I don't know. I use SharpReader and I'm guessing it does use the IE cookie store. Either that or I've been using it long enough that I don't remember logging in to some sites. Or it uses an IE browser plugin, of sorts, that uses the cookie store. I know I can right click in the preview pane and have the full set of IE options.

      --trb

    24. Re:Slashdot's RSS blocking policy by shokk · · Score: 1

      Honestly, who the hell needs to ping Slashdot more than once an hour for stories?!? Can we admit that, other than FP idiots, the stories are really not so engaging that we have to be hanging on to get the first glance at a site that is no longer running due to the /. effect?

      Once you've got 30 or so feeds in your list, Slashdot becomes just one of many. Once an hour is really more than enough to keep the tide of info coming at you when you hit that level of RSS use.

      --
      "Beware of he who would deny you access to information, for in his heart, he dreams himself your master."
    25. Re:Slashdot's RSS blocking policy by Winders · · Score: 1

      Try the NewsFire RSS reader on OS X. Despite being set up to ask for feeds only once an hour, I get blocked twice a week or so.

    26. Re:Slashdot's RSS blocking policy by Cecil · · Score: 1

      Cool, thanks Jamie.

    27. Re:Slashdot's RSS blocking policy by MikeFM · · Score: 1

      Doesn't Slashdot get a lot of hits per minute? If the pages cached well and more users were made to use proxies I'd think it'd still save a significant amount of bandwidth. Even if you only had to send the page once per minute instead of ten times per minute that'd seem like a huge savings in bandwidth to me. I'd imagine Slashdot gets a lot more than ten hits in a minute on the major pages.

      --
      At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
    28. Re:Slashdot's RSS blocking policy by FTL · · Score: 1
      > I'll email your friend and if his script is still having troubles I'll try to sort out what's going on.

      Thanks Jamie & Cecil. I've increased my polling to six times an hour and will let you know if I start getting errors again.

      Stats show that usage of my Slashdot feed is slowly declining. This is most likely due to the gradual migration from Mozilla (which has that great sidebar) to Firefox (which has its own RSS abilities).

      --
      Slashdot monitor for your Mozilla sidebar or Active Desktop.
    29. Re:Slashdot's RSS blocking policy by jamie · · Score: 1

      If you like, contact its author and, assuming this is reproducible on an IP that isn't used for other RSS requests, have him/her email me directly about this.

    30. Re:Slashdot's RSS blocking policy by Anonymous Coward · · Score: 0

      Not really. Our cache hit rate would be about zero. We update the homepage about once a minute, and the same goes for any page that any reader would be likely to reload within a reasonable time.

      That's only because you smash everything together into one huge file. If you put your styles in a .css file they'd have a hit rate of about 99%.

    31. Re:Slashdot's RSS blocking policy by sdhankin · · Score: 1

      I've tried for two weeks to use this. By default, when added to my toolbar menu on Firefox 1.0, I was banned. I asked for the ban to be lifted, and it was, until the next day. Then I asked again, and was able to use it again, for a day. The third time I asked, I got a reply that my IP address doesn't show up in the logs, so it must be my reader. Firefox 1.0 is my reader, and it's hard to imagine I'm the only one.

      So now I use it as an RSS reader for half a dozen other RSS sites, but not for /., which I had to delete. It was useless to me, after all, and there appears to be no interest in fixing my problem. Too bad. It was cool while it lasted.

    32. Re:Slashdot's RSS blocking policy by Paradise+Pete · · Score: 1
      Slashdot blocks your IP from accessing RSS if you access our site more than fifty times in one hour. I think that's reasonable, don't you?

      I get blocked occasionally. And I check less frequently than the minimums. But I suspect that large parts of the country appear as one user, as I believe we all go through the same NAT. If that's the case though, I'm surprised it doesn't happen more often.
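
      A per-IP limit like the one quoted could be sketched as a sliding one-hour window. This is a guess at the general technique, not Slashdot's actual implementation; the class name is made up. It also shows why a shared NAT address hurts: everyone behind it draws from the same budget.

```python
import time
from collections import defaultdict, deque


class HourlyLimiter:
    """Allow at most `limit` requests per IP in any `window` seconds."""

    def __init__(self, limit=50, window=3600.0):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # ip -> timestamps of allowed requests

    def allow(self, ip, now=None):
        now = time.time() if now is None else now
        recent = self.hits[ip]
        # Forget requests that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True


limiter = HourlyLimiter()
# Fifty requests from one address pass; the fifty-first is refused.
print([limiter.allow("192.0.2.1", now=t) for t in range(51)].count(False))
```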

    33. Re:Slashdot's RSS blocking policy by Anonymous Coward · · Score: 0
      While developing RSS reading software, it's best to pull your data from your own network, not a public site.

      This is sooo true. As a software developer, I try to skip real-world interaction testing whenever possible, especially before I ship a product to customers.
    34. Re:Slashdot's RSS blocking policy by Anonymous Coward · · Score: 0

      The NYT feed uses special URLs that bypass the registration system. So links directly from the feed don't need cookies or anything else to recognize you.

      Most feed readers don't support cookies at all.

  50. Doesn't sound that useful by Chuck+Chunder · · Score: 1

    How big are RSS files normally? I'd be surprised if the bandwidth involved in tracking and coordinating a whole bunch of clients would be significantly less than the RSS itself.

    By the time you've told a client to "go and ask these other clients" you may as well have just sent it the RSS file.

    --
    Boffoonery - downloadable Comedy Benefit for Bletchley Park
  51. Solution by vandalman · · Score: 1

    Nick's got it covered

    --
    Devise, Repair, Solve, Build
  52. Solution: RSS over Usenet news by NZheretic · · Score: 4, Interesting
    One solution would be to use an existing infrastructure that was built for flood filling content - the Usenet news server network.

    Create a new top-level hierarchy (like alt, comp, talk, etc.) named "rss" and use an extra header to identify the originating RSS feed URL. That header could be used by the RSS/NNTP reader to select which article bodies to download and to verify each RSS entry to identify fake posts.

    1. Re:Solution: RSS over Usenet news by Anonymous Coward · · Score: 0

      So, the solution to the RSS problem is to invent an entirely new technology, unrelated to current RSS readers?

      Yeah, that sounds like it'll work.

  53. just zip it by Darth_Burrito · · Score: 1

    I see some people talking about BitTorrent-like networks for RSS feeds. Really, why not just zip it? If it were normal text, I'd expect to shrink the file size down to 10%, but we're talking about XML, which has a lot more redundancy.

  54. Re:They just need to follow /.'s lead by jamie · · Score: 4, Funny

    Of course we blocked your IP when you hammered our server. And we'll do it again. Duh. We monitor abuse on the whole site, not just RSS.

  55. Swarming (Like BitTorrent) is the answer by MS_leases_my_soul · · Score: 4, Interesting

    This still baffles me. BitTorrent works great for distributing media like ISOs. Folks, it can distribute "little" stuff, too.

    A content creator (say Slashdot) has webpages and it has an RSS feed. They create a torrent for each page. They sign the RSS file and each torrent (and its content) with a private key. They post their public key on their homepage.

    Now, you can cache the RSS file on other sites that support you yet the users can still be confident that it really came from you. Inside the RSS file, users can try to get the webpage (and all its images, etc.) through the torrent first. When the page loads locally in your browser, it could still go out and get ads if you are an ad sponsored site.

    If you are a popular site and have a "fan base", you should have no problem implementing something along these lines. If you are a site that has these problems, you are probably popular and have a fan base. Given the right software and the buy-in from users, the problem solves itself.

    1. Re:Swarming (Like BitTorrent) is the answer by Tzarius · · Score: 1

      Yes, but the bandwidth saved would be negligible when serving imageless pages, and there would still be the issue of misbehaving clients scraping the tracker, not to mention integrating a BitTorrent-esque client with the user's browser of choice.

      The far more basic solution (as mentioned on BoingBoing) is to simply cache IPs that have downloaded the current RSS feed, then flush the cache at the next update.
      This requires readers that cache feeds though, as many users would get caught out if they tried getting the feed too often.
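
      The idea above could be sketched roughly like this. The class and method names are invented for illustration, and it assumes the server can tell when the feed has been republished.

```python
class FeedGate:
    """Remember which IPs already fetched the current version of the feed."""

    def __init__(self):
        self.served = set()

    def publish(self):
        # A new version of the feed is out: everyone may fetch it once more.
        self.served.clear()

    def should_send(self, ip):
        """True the first time an IP asks for the current version."""
        if ip in self.served:
            return False
        self.served.add(ip)
        return True


gate = FeedGate()
print(gate.should_send("192.0.2.7"))  # first request for this version
print(gate.should_send("192.0.2.7"))  # repeat request before the next update
```

      A repeat request would get an empty or "not modified" style answer, which is why readers that don't keep their own copy of the feed would get caught out.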

    2. Re:Swarming (Like BitTorrent) is the answer by Jerf · · Score: 3, Informative

      BitTorrent works great for distributing media like ISOs. Folks, it can distribute "little" stuff, too.

      No, it "can't". Or at least, it can't serve it with any benefit. Tracker overhead swamps any gains you might make. BitTorrent is unsuitable for use with small files, unless the protocol has radically changed since I last looked at it. In the limiting case, like 1K per file, it can even be worse or much worse than just serving the file over HTTP.

      Inside the RSS file, users can try to get the webpage (and all its images, etc.) through the torrent first.

      Oh, here's the problem, you don't know what you're talking about or how these technologies work. When an RSS file has been retrieved, there is nothing remotely like "get the webpage" that takes place in the rendering. The images are retrieved but those are typically too small to be usefully torrented too.

      Regrettably, solving the bandwidth problem involves more than invoking some buzzwords; when you're talking about a tech scaling to potentially millions of users you really have to design carefully. Frankly, the best proof of my point is that if it were as easy as you say it is, it'd be done by now. But it's not, it's hard, and will probably require a custom solution... which is what the article talks about, coincidentally.

    3. Re:Swarming (Like BitTorrent) is the answer by Quixotic · · Score: 2, Insightful

      i'm not sure the overhead of maintaining a torrent would be less than just serving up a single rss feed (or webpage, image, whatever small file). if i'm not mistaken, each client still needs to download the torrent from the main site to determine where it should download the payload from... and if you're going to do that, you might as well just serve up the small file.

      also, using a torrent might not work so well for sites like slashdot, which allows users to customize the homepage and/or feeds...

      --
      --
    4. Re:Swarming (Like BitTorrent) is the answer by Anonymous Coward · · Score: 0

      Umm... don't RSS files have a text summary of each webpage (usually the first paragraph), the URL to the file, and various other metadata? Don't you then go and retrieve that URL in your browser? Where is he wrong?

      If I understand what he is saying, the RSS file gets signed and distributed in a p2p system. That has pros and cons, but it could technically work. It will use more "total bandwidth" from the standpoint of the entire Internet, but it will take load off the hosting server.

      Taking what he said and giving it specifics, the RSS file has a node with nodes underneath it. I think that he is saying that we would be adding a node under the node. This node would include all the data of a .torrent file (The url of the tracker, name, files, length, path, piece length, pieces, etc.)

      You could now get the RSS file over a P2P system. You would bring the webpage "package" down with BT.

      This only solves the problem of bandwidth used to constantly poll the RSS file. You still need to address the "versioning" and/or updating of the RSS on the P2P system. You also still have the problems found with trackers today. It does offload the bandwidth for the individual pages to BT peers, so you could include larger images in your pages.

      I don't think this is the solution, but it does make an interesting point or two that are good starting points for a discussion.

    5. Re:Swarming (Like BitTorrent) is the answer by Jerf · · Score: 1

      Umm... don't RSS files have a text summary of each webpage (usually the first paragraph), the URL to the file, and various other metadata?

      Yes.

      Don't you then go and retrieve that URL in your browser?

      No.

      You can, but it isn't automatic, nor is it even normal.

      You still need to address the "versioning" and/or updating of the RSS on the P2P system.

      A huge problem, completely (and correctly) unaddressed by BitTorrent.

      You also still have the problems found with trackers today.

      Only worse, because now you're using your trackers for no gain. RSS files are small. (Or they should be.)

      It does offload the bandwidth for the individual pages to BT peers, so you could include larger images in your pages.

      RSS does not have images embedded in them. Large images aren't the bandwidth problem, anyhow, it's the text getting hammered.

      With all due respect, you don't know what you're talking about either. There is nothing intrinsically wrong with that, but since you don't even understand the problem, why do you think you can solve it? Or criticising my explanation, no less?

    6. Re:Swarming (Like BitTorrent) is the answer by Anonymous Coward · · Score: 0

      So you are saying that people do not get an RSS file, see an article that sounds interesting and go pull it up in their browser? OK, sorry, I will stop doing that on a daily basis.

      Also, I was saying that you *DO NOT* host the RSS files on BT, but some other P2P type of technology (using the pre-built Gnutella2 code?). But, as I said, just doing a search for "Slashdot.rss" will not automatically get you the latest and greatest RSS file. The file would need a TTL.

      Lastly, I was not saying the RSS file should have images built into it. I said the torrent should have the images included in it. Each webpage (or a daily snapshot of the whole site, for that matter) would be built into a multi-file torrent.

      I was talking about:
      - combining a more traditional P2P type codebase to allow search and exchange of the RSS files,
      - wrapping up a page and all its included content (images, etc.) into a torrent. Or you could snapshot the whole site on some interval and bundle it into a single multi-file torrent.
      - having the website act as a node/hub on the "traditional" side of the network and as a seed on the BT side.

      I went on to point out that these choices will take the load off the host, but it introduces all kinds of other problems.

      Where do I not know what I am talking about?

  56. ban abusive clients by zoftie · · Score: 1

    The main issue is that some clients check the site far too often, or download the whole content without checking the change time. Ban those and you will be fine. It's the same kind of issue dyndns.org or some other site was having with Linksys clients.

  57. Internet2 by Anonymous Coward · · Score: 0

    Internet2

  58. You're talking application-level by mveloso · · Score: 4, Interesting

    Well, RSS was simple, and everything you're talking about (caching, push-based update, etc) are application-level issues. Even though that stuff is defined in HTTP 1.1, it took years for HTTP 1.1 to come out.

    If the web started with HTTP 1.1, it would never have gone anywhere because it's too complicated. There are parts of 1.0 that probably aren't implemented very well.

    If you want to improve things, adopt an RSS reader project and add those features.

  59. OT: Re:Solutions by DarthWiggle · · Score: 1

    Bloglines is quite good, and I appreciate that it's very chummy with Firefox, but I'm not 100% satisfied with it. I wish I could articulate what bugged me about it (especially to the founder, heh), but I find it's slower for me to check bloglines than it is to just swoop through my bookmarks every once in a while.

    By far the best RSS experience I've had has been with the Konfabulator RSS widget, which pops up when it finds a new entry and hides away when there's nothing new. It's elegant and simple. Bloglines is a fantastic service for aggregating large amounts of information, but it's still not very efficient at providing it to me quickly.

    On topic, the problem with RSS seems to me to be that it's a solution in search of a problem that, in turn, creates more problems (through the non-caching, etc). I suppose it's useful for keeping tabs on all those LJ blogs you read (but don't admit to reading), but, honestly, if the BBC posts a news item about, say, a nuclear explosion in Karachi (apologies to both Indians and Pakistanis reading this), I don't want to wait for my RSS client to cycle. I guess there's some element of push and some element of browse that need to be mixed together.

    1. Re:OT: Re:Solutions by markfletcher · · Score: 1
      Thanks for the feedback. :)

      On the issue of getting updates quickly, there's a new project dedicated to that. It's called Feedmesh, and it's being done on a Yahoo Group, http://groups.yahoo.com/group/feedmesh.

      Basically, the idea is to share/distribute new item pings. Most blogging software has the capability to ping services when you post a new item. Sites like blo.gs and weblogs.com currently act as 'clearinghouses' for these pings. The Feedmesh project's goals are to distribute these more, while at the same time concentrating all the pings, so that everybody has access to all available pings. This will lead to faster updates in aggregators as well as decreased bandwidth for content providers.
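
      For anyone who hasn't seen one, a new-item ping is tiny. The sketch below only builds the XML-RPC payload and sends nothing; it assumes the widely used weblogUpdates.ping call (weblog name, weblog URL), and the blog name and URL are placeholders. Whether Feedmesh itself will use this exact call is not something stated here.

```python
import xmlrpc.client


def build_ping(blog_name, blog_url):
    """Return the XML-RPC request body for a weblogUpdates.ping call."""
    return xmlrpc.client.dumps(
        (blog_name, blog_url), methodname="weblogUpdates.ping"
    )


payload = build_ping("Example Blog", "http://example.org/")
# A few hundred bytes, versus every reader re-fetching the whole feed.
print(len(payload.encode("utf-8")), "bytes")
```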

    2. Re:OT: Re:Solutions by DarthWiggle · · Score: 1

      Thanks for replying. :)

      What I'd like is to have one, centralized server-based clearinghouse (of my links; not talking about the clearinghouse from your post) that would keep track of the feeds I'm interested in, and, based on some sort of flag, fire off SMS, IM, or some other proprietary alert to me wherever I am. In-browser is where it gets tricky for me as a user: I don't want my aggregator to take up a whole window. I'd much rather have a sidebar or floating window interface than one that fills the screen... I'm just looking for headlines here and maybe the briefest blurb. No doubt, there is a product out there that does some of this (maybe even Bloglines - I haven't explored it fully), but I'm moderating my wishlist with the knowledge that RSS is not an entirely mature culture (even if the technology is good to go).

      So, in addition to death by its own success, RSS faces the problem of crusty users like me who have work habits that RSS is currently fighting (kind of like using a TV-based media navigator when you're used to the extremely rapid and efficient navigation of iTunes or winamp).

      Anyway, I'm rambling.

    3. Re:OT: Re:Solutions by Anonymous Coward · · Score: 0

      I know...I got to this article using Bloglines.

      However, for customized feeds where each user gets a different feed, even Bloglines won't help reduce the number of RSS hits.

  60. Smaller Feeds by idiotfromia · · Score: 1

    Why does everybody seem to feel the need to have their last 20-25 posts in their feed? It's just going to mean wasted bandwidth, especially for websites that update infrequently. I'd say the last five posts would be sufficient for most weblogs and 10 for news sites like Slashdot and The Register.

    Feed readers are the other issue. Many set their default refresh to an hour. I use SharpReader, which has an adequate 4-hour default. I adjust that on a per-feed basis. Some feeds update once per day, and one fetch a day is all I need to load the latest.

  61. I agree with you. by mcmasuda · · Score: 3, Informative

    I just fired up Ethereal and refreshed my RSS reader. Out of the dozen or so feeds I monitor, a few of them are using ETags and sensible Cache-Control headers, so I just get 304 Not Modified. Of the rest, not a single one is compressing even though my client is specifying gzip and deflate in its Accept-Encoding header.

    HTTP compression will work even better here than it does for regular pages - RSS is basically all text so every response is going to be compressible. Looking at a handful of my feeds, some quick messing about with wget & gzip gives me an average compression ratio of 3:1. That's a 66% reduction in bandwidth utilization. If just half of your clients support HTTP compression, it's still a significant savings.
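
    The wget & gzip check is easy to repeat in a few lines. This sketch uses a synthetic feed built in memory, which is an assumption made so it runs offline; such repetitive sample data compresses far better than a real feed would, so only the method carries over, not the numbers. The helper names are invented.

```python
import gzip


def saving_from_ratio(ratio):
    """Fraction of bandwidth saved at a given compression ratio (3.0 = 3:1)."""
    return 1 - 1 / ratio


def compression_stats(raw):
    """Return (raw size, gzipped size, compression ratio) for some bytes."""
    packed = gzip.compress(raw)
    return len(raw), len(packed), len(raw) / len(packed)


# Synthetic stand-in for a downloaded feed.
item = (
    "<item><title>Story {n}</title>"
    "<link>http://example.org/story/{n}</link>"
    "<description>Summary text for story {n}.</description></item>"
)
feed = (
    '<rss version="2.0"><channel>'
    + "".join(item.format(n=n) for n in range(20))
    + "</channel></rss>"
).encode("utf-8")

raw_size, packed_size, ratio = compression_stats(feed)
print(raw_size, packed_size, round(ratio, 1))
# A 3:1 ratio means roughly two thirds of the bytes are never sent.
print(round(saving_from_ratio(3.0), 3))
```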

    Now, the article is talking about poorly designed aggregators that don't check whether the content's changed (I'm assuming he's talking about ETags). There's not much you can do about that, but using compression for capable clients sure seems like it would be a good thing.

  62. How is this any different than HTML? by Matey-O · · Score: 1

    Many, many bandwidth issues that were HTML-related were solved by incorporating proxies between the viewers and servers. I fail to see why you couldn't do the same with index.xml.

    --
    "Draco dormiens nunquam titillandus."
  63. If-Modified-Since, User-Agent by pbryan · · Score: 3, Insightful

    I'd be interested in seeing how many of these hits are for complete feeds rather than If-Modified-Since the last time it was downloaded. I suspect that if the RSS readers were behaving like nice User-Agents, we wouldn't see such reports.

    Perhaps particularly offending User-Agents should be denied access to feeds. If I saw particular User-Agents consistently sending requests without If-Modified-Since, I'd ban them.
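
    Finding those User-Agents could look something like the sketch below. It assumes the server has been set up to log, for each feed request, the User-Agent and whether an If-Modified-Since or If-None-Match header was present (a default access log usually does not record that). The function names and thresholds are arbitrary choices for illustration.

```python
from collections import Counter


def unconditional_share(requests):
    """Map each User-Agent to the fraction of its requests sent without validators.

    `requests` is an iterable of (user_agent, had_conditional_header) pairs.
    """
    total = Counter()
    unconditional = Counter()
    for agent, had_conditional in requests:
        total[agent] += 1
        if not had_conditional:
            unconditional[agent] += 1
    return {agent: unconditional[agent] / total[agent] for agent in total}


def offenders(requests, min_requests=100, threshold=0.95):
    """User-Agents with enough traffic that almost never send validators."""
    requests = list(requests)
    totals = Counter(agent for agent, _ in requests)
    shares = unconditional_share(requests)
    return sorted(
        agent
        for agent, share in shares.items()
        if totals[agent] >= min_requests and share >= threshold
    )
```

    The minimum request count is there so that a reader seen only once or twice (a first fetch is always unconditional) doesn't end up on the list.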

    --

    My car gets 40 rods to the hogshead, and that's the way I likes it!

    1. Re:If-Modified-Since, User-Agent by phixus · · Score: 2

      In a lot of cases, it doesn't matter whether the client supplies If-Modified-Since or If-None-Match headers. Slashdot is a prime example. You can supply all of the bandwidth-reducing headers you want, but Slashdot's server will give you the full feed every single damn time. Servers should also be gzipping their output, but most don't. Slashdot sure doesn't. Slashdot's RSS feed would go from 12K to 3K if they did (I checked). Most of those 12K full-feed responses would be reduced to about 200 bytes if they honored the caching headers.
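
      On the server side, honoring If-None-Match can be sketched in a few lines. This is a simplified illustration, not any site's real code: the function names are invented, the ETag is just a hash of the feed body, and it ignores multi-value If-None-Match headers, "*" and weak validators.

```python
import hashlib


def make_etag(body):
    """Derive a quoted ETag from the feed bytes (one possible scheme)."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def respond(body, request_headers):
    """Return (status, headers, payload) for a feed request."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if request_headers.get("If-None-Match") == etag:
        # The client already has this exact version: send headers only.
        return 304, headers, b""
    return 200, headers, body


feed = b"<rss version='2.0'><channel></channel></rss>"
status, headers, payload = respond(feed, {})
print(status, len(payload))
status, _, payload = respond(feed, {"If-None-Match": headers["ETag"]})
print(status, len(payload))
```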

  64. HTTP? by bill_mcgonigle · · Score: 1

    What's the problem here, is everybody loading the full feed each time?

    Wouldn't a client include a If-Modified-Since HTTP header in the GET request?

    We're talking 200 bytes for a not-modified query.

    Is it these 200-some-odd byte requests that people are complaining about?

    --
    My God, it's Full of Source!
    OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
    1. Re:HTTP? by Mercano · · Score: 1

      In a perfect world, yes. However, there are problems. Not all clients are well behaved and include an If-Modified-Since field in their request header, probably due to programmer laziness (You want me to store what I get? Why? Bandwidth is cheap and I don't feel like writing the extra code), and not all RSS servers look at the If-Modified-Since field, especially if they are generating the feed from a script that's called on demand, once again due to laziness (You want me to store what I generate? Why? Processing is cheap and I don't feel like writing the extra code).

      --
      #include <signature.h>
  65. Atom's bandwidth usage? by Nimey · · Score: 1

    Does Atom use significantly more or less bandwidth than RSS?

    --
    Hail Eris, full of mischief...

    E pluribus sanguinem
  66. BitTorrent + RSS by Apreche · · Score: 1

    A single peer to peer ad-hoc worldwide bittorrent style feed through which RSS stories are passed and spread. A great vision for the future.

    --
    The GeekNights podcast is going strong. Listen!
  67. Could Multicast be a Saviour? by ethzer0 · · Score: 1

    RSS is such a valuable service that not having it is unacceptable. Perhaps it could provide a better argument for multicast. Service providers should all seriously accept the need for a wide-scale multicast solution. It's technology that we have today, but nothing substantial seems to be emerging in terms of native connectivity. This kind of service would provide a great delivery mechanism for traffic of this nature.

    1. Re:Could Multicast be a Saviour? by Darkangael · · Score: 0

      I don't think this will happen before IPv6 becomes commonplace and some sort of standardized solution for people behind routers is agreed upon. IPv6 apparently has some features in it especially for multicast, which I think would be a great solution, and it means the ISPs and backbones only need to do one major upgrade to fix both problems (IPv6 migration and multicasting RSS). Then of course all we need to do is make all the people using "old" RSS change over. This may be the hardest part.

  68. That's one possibility, yes. by Trejkaz · · Score: 1

    If I load a page through my Firefox, all the advertisements get blocked. So they surely aren't getting any revenue from my downloading of the entire HTML.

    Meanwhile, if I load the story page via the RSS reader in Thunderbird, I can't block the ads. :-)

    So clearly it can work both ways.

    --
    Karma: It's all a bunch of tree-huggin' hippy crap!
  69. Bandwidth concerns? by Cereal+Box · · Score: 1

    Why is there such concern over accessing RSS feeds as opposed to accessing the web site? Take for instance Slashdot: as of this writing, the main page is 65K and the RSS feed is 14K. Isn't this the case for most websites? So why the big fuss? If people are continuously refreshing the RSS feed, at least less bandwidth is consumed than if they were continuously refreshing the main page.

    ... Or is this one of those things where geeks have become so enamored with the technology that they go completely overboard with it? Are people refreshing the RSS feeds every two seconds or something?

  70. Slashdot's RSS blocking policy-$$$$ Kaching. by Anonymous Coward · · Score: 4, Insightful

    "Slashdot's RSS traffic, like Boing Boing's, is huge, and blocking broken readers has saved us a ton of bandwidth, which of course means money."

    So's using correct HTML, and CSS.

    1. Re:Slashdot's RSS blocking policy-$$$$ Kaching. by steveyT · · Score: 1

      If Slashdot were updated to use well-formatted XHTML, I could easily see that reducing page weight by way more than half, considering that the CSS would be cached after the first request. Imagine the cost savings and reduction in page load times!

  71. Is RMS Doomed by Popularity? by clayasaurus · · Score: 1

    Anyone else read it as "Is RMS Doomed by Popularity?"
    Now that would be something.

  72. corporate caching by chiph · · Score: 2, Insightful

    I wouldn't doubt that eventually someone will build an RSS caching device & sell it to the corporate market. Given how big a drain RSS is on the supplier, the corporate market has the money and determination not to permit it to become a problem for them.

    Chip H.

  73. so useful that people have been talking about it by Anonymous Coward · · Score: 0
  74. It's All in the RSS Reader by Chris+Fritz · · Score: 1

    I use the Sage RSS reader extension for Firefox, and my web sites' feeds use Last-Modified and ETag headers which Sage respects. If I have 10 feeds from my site and none have updates, Sage merely checks the header and the server sends back a "HTTP/1.0 304 Not Modified". If all RSS readers cached feeds and if all servers serving RSS feeds used Last-Modified, then the load drops massively.

  75. Simple solution: pregenerate RSS feed by dwheeler · · Score: 1
    There's an even simpler solution: pregenerate your RSS feed. Whenever info that you use to generate your RSS changes (e.g., you add a blog entry), regenerate a static file. This is no big deal; if anyone asked for the info, you'd have to generate this anyway. Then serve the static file.

    This gets a lot of caching behavior automatically.
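
    A rough sketch of the regenerate-on-change step follows. The function names, the minimal RSS 2.0 layout and the example paths are placeholders; the point is only that the file is rebuilt when content changes and swapped into place atomically, so the web server can serve it as an ordinary static file.

```python
import os
import tempfile
from xml.sax.saxutils import escape


def render_feed(title, link, entries):
    """Build a minimal RSS 2.0 document from (title, url) pairs."""
    items = "".join(
        "<item><title>{}</title><link>{}</link></item>".format(escape(t), escape(u))
        for t, u in entries
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<rss version="2.0"><channel>'
        "<title>{0}</title><link>{1}</link><description>{0}</description>"
        "{2}</channel></rss>"
    ).format(escape(title), escape(link), items)


def write_feed(path, xml_text):
    """Write the feed to a temp file, then rename it over the old one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(xml_text)
        os.replace(tmp_path, path)  # readers never see a half-written file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

    Calling write_feed(path, render_feed(...)) from whatever code saves a new blog entry would be enough; nothing needs to run when a reader polls.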

    --
    - David A. Wheeler (see my Secure Programming HOWTO)
  76. Here's how by Mitchell+Mebane · · Score: 1

    Everybody writing an RSS client or server script should read this and make it one of their main priorities.

    I imagine even more bandwidth could be saved if the next version of the RSS or ATOM standards mandated rsync support.

    --

    The roots of education are bitter, but the fruit is sweet.
    --Aristotle
    1. Re:Here's how by Mitchell+Mebane · · Score: 1

      Oops, didn't notice the above post. Oh, well.

      --

      The roots of education are bitter, but the fruit is sweet.
      --Aristotle
  77. one thing that would help by erc · · Score: 1

    One thing that would help is if people would stop stupidly going gaga over data wrappers that contain multiple times more metadata than data. XML (which RSS really is) and related technologies are blatantly dumb ideas - both in terms of reinventing a wheel that never needed to be invented in the first place, and in terms of being a massive waste of bandwidth.

    --
    -- Ed Carp, N7EKG erc@pobox.com PGP KeyID: 0x0BD32C9B What I'm up to: http://intuitives.mine.nu
  78. Web caching hasn't caught on? News to me... by _defiant_ · · Score: 1

    A caching system won't work for the same reason web caches have never caught on in the US - people are terrified of being sued to smithereens for potential copyright infringement. Even if any case would be thrown out of court instantly (by no means certain in the US) the costs would be prohibitive and malicious plaintiffs rarely ever get asked to pay costs.

    Maybe the caching is just so damn good that you don't notice it. UIUC has a transparent web cache that I doubt 90% (of ~8,000) of the dorm students realize is there.

    Also, Akamai is everywhere now. Is it not a caching solution?

    I don't think web caches have failed... I just don't think they've been applied to the RSS problem.

    1. Re:Web caching hasn't caught on? News to me... by Anonymous Coward · · Score: 0

      I completely agree with your analysis of caching use, but:

      I don't think web caches have failed... I just don't think they've been applied to the RSS problem.

      They have, but these people complaining about the bandwidth RSS sucks up are clueless about caching and they don't apply it to the RSS "problem". Everyone else uses 304s etc and saves a lot of bandwidth, instead of not knowing what the fuck they are doing and complaining about it.

  79. Jabber and/or BitTorrent ! by Anonymous Coward · · Score: 1, Interesting

    Well, I've been thinking about this since RSS first came on my radar a few years ago, and it seems to me something like Jabber might be part of the solution.

    I.e., instead of polling slashdot.org every hour, you maintain a persistent connection to your local Jabber server (at your ISP perhaps), which registers with slashdot's Jabber server. When a new story is published, slashdot's server notifies all the registered servers with the new story, which then distribute the content to each local news reader.

    It would look and feel just like RSS readers do today.

    And you might think "that's too complicated".. well, RSS today works over an HTTP server that you have to install and maintain, and you don't worry about the details of HTTP, why not a Jabber server?

    I wish folks would think of this stuff BEFORE they start using RSS, but what can ya do.

    Also, BitTorrent could be involved somehow to reduce bandwidth even more. For instance in the example above, Slashdot wouldn't have to distribute to EVERY listener, just enough for them to start downloading from each other.

    All it takes is for a couple of the big RSS reader authors to add this and it will happen.. you just gotta have the guts to try. Next version of Mac OS will have Jabber libraries built-in I believe.. there's your chance!!!!

    1. Re:Jabber and/or BitTorrent ! by hildjj · · Score: 2, Interesting
      And here's a first cut at an Internet draft to make it happen. Very small amounts of code, if you have a pubsub service already.

      http://xmpp.org/drafts/draft-saintandre-atompub-notify-01.html

  80. Browser stats for boingboing.net by Anonymous Coward · · Score: 0

    http://boingboing.net/stats/ says that Firefox contributes to 31.5% of user agents :-0

    1. Re:Browser stats for boingboing.net by La+Camiseta · · Score: 1

      It also says that the top two search keyphrases to get to boingboing are anal and henai, with boingboing coming third and blowjobs coming fourth. I just want to know, what the hell?

  81. Nah, not Push. It's overkill for this. :) by Kristoffer+Lunden · · Score: 1

    Push, if memory serves right, kept the connection open "forever" and when it was time for an update, the server just dropped the new content down the pipe. This may not be an appropriate solution, really...

    Since the whole thing as it is now is just HTTP requests anyways, I don't see why

    a) clients couldn't poll at sane intervals, and
    b) something like if-modified-since couldn't be used. Web servers have it out of the box if you serve static RSS manually, otherwise it is just one extra header in your CGI. And on the client side it is as easy, just one simple header to test against *first*.

    I mean, come on.

    Ok, so I haven't actually done any research. But it would surprise me greatly if those things are being done and there still is a problem. :)

    1. Re:Nah, not Push. It's overkill for this. :) by Erik+Hollensbe · · Score: 1

      just do it at the server level - http as a protocol is not meant for access control.

      eg, my RSS feed updates, I allow new requests from that IP. Until then, I send 403 responses (or something more exact) and their client can figure it out. mod_access tied to a dbm (or something more customized) should be able to facilitate this rather well.

      The fact that clients like Thunderbird allow you to refresh every minute (and don't allow per-feed refresh tuning) really sickens me, as most people aren't manually bombing RSS feeds, it's the client giving them these options which makes it convenient, and they "know not what they do".

      I love the fact that despite the fact that I publish an article every couple of days (or sometimes, weeks), people slam my RSS feed sometime every 15 minutes for no reason. Really folks, this isn't a hard problem to solve with the client - those who really like crap like that are going to use some super-powered client that allows them to do it, and probably have a clue of what they're doing.

    2. Re:Nah, not Push. It's overkill for this. :) by tongue · · Score: 1

      well, honestly it's not a terribly hard problem to solve at the server either. a script that sends back content-type text/xml and checks the ip address against recent requests would save your bandwidth.
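
      Something along these lines, as a sketch only: the class name and the 30-minute default are invented for illustration, and the table lives in memory, so it would need expiry and shared storage on a real site. Everyone behind one NAT address gets counted as a single client.

```python
import time


class PollThrottle:
    """Remember when each IP was last served and refuse requests that come too soon."""

    def __init__(self, min_interval=1800.0, clock=time.monotonic):
        self.min_interval = min_interval  # seconds between full responses per IP
        self.clock = clock                # injectable so it can be tested
        self.last_served = {}

    def allow(self, ip):
        now = self.clock()
        last = self.last_served.get(ip)
        if last is not None and now - last < self.min_interval:
            return False  # too soon: caller should send an error or empty response
        self.last_served[ip] = now
        return True
```

      The calling script would send the feed when allow() returns True and a short refusal otherwise.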

      Of course, then you're trading bandwidth for processing power, which may be six of one, half-dozen of the other depending on your popularity.

      Ultimately I think blacklisting IPs that overindulge is probably the best way to go, unless you're someone like the New York Times and can't lose the ad impressions (though you can insert ads into your feed really easily. see moreover.com).

    3. Re:Nah, not Push. It's overkill for this. :) by Darren+Winsper · · Score: 1

      But what do you do about people behind a NAT? Personally, I think this is all a problem that really shouldn't exist. It was drilled into my head at uni that polling = BAD, not to mention that polling scales badly, as we're seeing here.

    4. Re:Nah, not Push. It's overkill for this. :) by Erik+Hollensbe · · Score: 1

      Actually, the solution I proposed suggests leaving the current polling system in place and restricting the work to the server.

      NATs, while commonplace, unfortunately are always going to have a problem with IP-based restrictions - vote on a poll here on slashdot and you will know.

      Full support of the HTTP standard allows for a lot more flexibility, such as using cookies, but as anyone who works with HTTP knows, Set-Cookie: directives are easily ignored and Cookie: headers are not sent.

      So, anything that I did in this regard would amount to, "If you're not getting your feed, sign up for an account and send us cookie headers so we know to send them."

      Really though, the problem is that HTTP, in general, is a horrible protocol. It's way too simple for all the things that travel over it nowadays, and its stateless nature makes it very hard to enforce any kind of access restriction - something like SMTP, which is not much better but *is* stateful, would be a better alternative...

      The HTTP/1.2 agenda is supposed to cover stateful transmission and I really, really hope they put a lot more momentum towards getting this out there. They've been futzing around with it for years now.

  82. Solution!-Intel Intercast. by Anonymous Coward · · Score: 0

    The problems with pointcast were more than just technological. Intercast was a better push technology for the pre-broadband era, and even now with HDTV and cable/satellite. It still may be a good choice.

  83. RSS has already failed. by Anonymous Coward · · Score: 1, Insightful

    It was meant for syndication. So that one website could gather syndicated news from other sites. It was not meant for individual readers to use it as a news update service. Simply using an appropriate protocol would solve this problem, but due to the blogtard community, this will never happen. And so RSS is doomed to be used stupidly just like it is now.

    1. Re:RSS has already failed. by Anonymous Coward · · Score: 0
      It was meant for syndication.

      Big Fucking Deal.

      The Atom bomb was designed to blow the shit out of people and the principles are used to generate energy to power your ill-used computer.
      Just because that's what it was meant for doesn't mean someone can't find another use for it.

      The real problem is retarded RSS readers that keep downloading the whole thing over and over instead of asking if it's been updated.

    2. Re:RSS has already failed. by Anonymous Coward · · Score: 0

      What an amazingly shitty analogy. This is nothing to do with principles, atom bombs are not used to generate electricity, RSS is used for a news update service. And besides, the principles behind the atom bomb actually played no real role in hydroelectric generators, which are making the electricity to power my computer.

      And just making RSS readers less damaging is just a bandaid. Simply not being stupid in the first place is a solution. A simple app can connect to a server, and listen for events, and events can be sent out only as they occur, instead of anyone ever asking anything. Heaven forbid people create useful new programs and protocols, thus progressing technology, instead of hijacking other protocols for absurd uses.

    3. Re:RSS has already failed. by Anonymous Coward · · Score: 0
      atom bombs are not used to generate electricity

      No Shit Sherlock

      Go back and read the comment. And read it again. And realize that the PRINCIPLES are used to generate energy.

      And just making RSS readers less damaging is just a bandaid. Simply not being stupid in the first place is a solution. A simple app can connect to a server, and listen for events, and events can be sent out only as they occur, instead of anyone ever asking anything.

      Your suggestion is broken because the server runs out of sockets and/or RAM due to keeping track of all those connections. RSS (such as slashdot uses it) gives a short list of URLs that uses considerably less bandwidth and processing power than loading the front page every few minutes.

      Only polling for the full RSS feed when it has changed isn't a band-aid, it's the right thing to do.

    4. Re:RSS has already failed. by Anonymous Coward · · Score: 0

      You go back and read moron, like I said, this has nothing to do with using the principles learned from something in something else, it's using the thing, exactly as is, for a purpose it doesn't suit well.

      And you clearly have no idea what you are talking about, you can use tens of thousands of sockets just fine, you will not run out of sockets or RAM unless you have set your system limits low, or are running on underpowered hardware.

    5. Re:RSS has already failed. by chefmonkey · · Score: 1
      ...[Y]ou clearly have no idea what you are talking about, you can use tens of thousands of sockets just fine, you will not run out of sockets or RAM unless you have set your system limits low, or are running on underpowered hardware


      So there are only tens of thousands of slashdot readers? Wow. There sure is a lot of noise around here for that -- not to mention user IDs that are somewhat inflated.

      The point he was trying to make -- and he's right -- is that, if you keep persistent TCP connections open for every interested client, you'll run out of local resources long before you exhaust network bandwidth. You've managed to reduce a difficult-to-solve problem to an impossible-to-solve problem.

      Trust me on this one; I design large-scale network servers for a living.
    6. Re:RSS has already failed. by Anonymous Coward · · Score: 0

      RSS has already failed? You might be a bit more convincing if you didn't post it on a website that syndicates dozens of other websites using RSS.

  84. Conditional GET and feed services by PeekabooCaribou · · Score: 1

    Conditional GET requests and centralized feed reading services like the ones mentioned in this post are important to keep syndication bandwidth down. Also, beating people that update feeds every 5 minutes is good too.

    --
    "I'll say it again for the logic-impaired." -- Larry Wall.
  85. Remember Scoble's false alarm about this? by miller60 · · Score: 1

    Back in September Microsoft blogging evangelist Robert Scoble warned that RSS is broken, saying the sky was falling and RSS bandwidth usage was forcing Microsoft to skinny down its feeds. Turns out it wasn't quite true. Microsoft's IT folks thought 400KB feeds were excessive, and RSS feeds are no big deal compared to 106 million downloads of the 75MB SP2 update. But the ensuing debate produced some useful discussion among RSS enthusiasts about ways to make clients smarter and give more server-side control. See the writeup at Netcraft (Slashdot is noted as an early adopter).

  86. Fifty times? by Anonymous Coward · · Score: 0

    This might be a good time to point out that I've often had trouble with Slashdot's RSS feed. I use the News plugin for Trillian Pro to access it every 32 minutes... Yet I'm often being banned. Have you heard of problems with this reader before? Or is it slashdot that's broken?

  87. Event driven! by Aredridel · · Score: 1

    Use RSS (or rather, Atom) over XMPP (Jabber!)

    Like, Duh!

    1. Re:Event driven! by bobwyman · · Score: 1

      "Atom over XMPP" is supported by PubSub.com and is the subject of an Internet Draft. See: http://www.ietf.org/internet-drafts/draft-saintandre-atompub-notify-01.txt

      We offer a free FireFox/IE sidebar for receiving "Atom over XMPP" on our website at: http://www.pubsub.com/sidebar_firefox.php

      bob wyman

  88. Random retrieval times? by TheLoneIguana · · Score: 1

    I have thought that some form of randomizing function in an RSS reader would be a good idea, so that the readers aren't all hammering at the door at the same time (i.e. the top of the hour).
    Don't know if that would make much of a dent or not.
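
    A sketch of what that could look like in a reader, in Python. The function name and the 25% spread are arbitrary choices for illustration, not taken from any existing client.

```python
import random


def next_poll_delay(base_seconds=3600.0, jitter_fraction=0.25, rng=random):
    """Return the wait before the next poll: the base interval plus or minus a random spread."""
    spread = base_seconds * jitter_fraction
    return base_seconds + rng.uniform(-spread, spread)
```

    With the defaults, each reader waits somewhere between 45 and 75 minutes, so polls drift apart instead of all landing on the hour.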

  89. Well... by Xenex · · Score: 1

    You've just made me a subscriber again!

  90. Distributed Networking by Cow007 · · Score: 1

    The internet community had better start looking at how to integrate Bit Torrent into the internet itself. It could very well save it from choking on itself. I believe that eventually, in return for using the net, you should be required to spread the love. This model also makes it a bit harder for illicit enterprises to operate.

    --
    411 Y0UR 8453 4R3 8310NG 70 U5!! -NSA
  91. Now... by Xenex · · Score: 1

    I wonder how long it takes to come into effect. Still just index.rss here.

    1. Re:Now... by jamie · · Score: 1

      Try it now.

  92. Non-Sequitur... again by Safety+Cap · · Score: 1

    Close enough to be a dupe? You Decide.

    Nevertheless, this is not an issue, but like the unwashed shrills squawking that the end of Social Security is nigh, RSS is far from being dead. The issue is that ignorant (maybe I should say 'stupid') people did not bother to implement the spec properly in their RSS reader code. I'm not talking about the RSS spec, but the HTML spec. This is a simple two step process (credit Charles Miller):

    1. When you first pull your RSS feed, store the values you get for Last-Modified ( = A) and ETag (= B).
    2. When you want to next poll your feed, send If-Modified-Since: A and If-None-Match: B.

    If the RSS feed has not been updated since you last polled, you will get a 304: Not Modified in response, but no RSS feed (because it has not changed, duh).
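
    The two steps above, sketched with nothing but the Python standard library. FeedPoller is a made-up name, the validators are kept in memory only (a real reader would persist them between runs), and error handling is minimal.

```python
import urllib.error
import urllib.request


class FeedPoller:
    """Poll one feed URL, remembering the validators from the last full fetch."""

    def __init__(self, url):
        self.url = url
        self.last_modified = None  # "A" in the steps above
        self.etag = None           # "B" in the steps above

    def build_request(self):
        request = urllib.request.Request(self.url)
        if self.last_modified:
            request.add_header("If-Modified-Since", self.last_modified)
        if self.etag:
            request.add_header("If-None-Match", self.etag)
        return request

    def poll(self):
        """Return the feed bytes if it changed, or None on 304 Not Modified."""
        try:
            with urllib.request.urlopen(self.build_request(), timeout=30) as response:
                self.last_modified = response.headers.get("Last-Modified")
                self.etag = response.headers.get("ETag")
                return response.read()
        except urllib.error.HTTPError as err:
            if err.code == 304:
                return None  # nothing new; keep the cached copy
            raise
```

    urllib reports a 304 as an HTTPError, which is why the unchanged case is handled in the except branch.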

    It's like in The Army, you know--The Great Prince issues commands, founds states, vests families with fiefs. Inferior people should not be employed (creating broken RSS readers).

    --
    Yeah, right.
    1. Re:Non-Sequitur... again by TheLink · · Score: 1

      Minor note: it's an HTTP spec not an HTML spec.

      It helps in some cases to keep the two things distinct (e.g. stuff like how to quote characters).

      --
    2. Re:Non-Sequitur... again by Safety+Cap · · Score: 1

      You're right; thanks for the correction!

      --
      Yeah, right.
  93. Compression by yem · · Score: 2, Insightful

    I assume the complainers are using it?

    51894b boingboing.rss.xml
    17842b boingboing.rss.xml.gz

    --
    No, I did not read the f***ing article!
  94. Problems are political, not technical. by SteveX · · Score: 1
    Massively distributing content without making the publisher pay for distribution is a problem that's been solved a few times now - by Usenet, and by IRC for example. Adapting one of the existing solutions to solve syndication wouldn't be that tough.

    But, the folks involved with RSS / Atom are "wire protocols should be xml" types who like the idea of using an XML-RPC call too much to give it up easily.

    It's too bad, since it doesn't really need to be death to a site to have too many people subscribe. If I post an article on Usenet, 100 million people could read it tomorrow and it wouldn't cost me a cent. There are some problems with it, but problems that could have been solved in a lot less work than what it's going to take to fix syndication now.

    Here is some more of my ranting about this.

  95. RSS sucks anyway by Anonymous Coward · · Score: 0

    http://www.informatik.uni-oldenburg.de/~ulli/why-rss-sucks.html

  96. RFC3229+feed defines "delta" encoding for RSS by bobwyman · · Score: 2, Informative

    Your suggestion is precisely what is defined by RFC3229+feed (i.e. an RSS-specific extension to RFC3229, delta encoding for HTTP). I maintain a list of implementations of RFC3229+feed on my blog. You can also find some empirical evidence showing massive bandwidth savings as a result of RFC3229+feed use.

    This is a well known and "solved" issue...

    bob wyman

  97. rss distribution by dgp · · Score: 1

    first and foremost, use a publish/subscribe system. no more polling! jabber may be a good framework for this. upon joining an rss subscription you get a refresh and after that you get deltas.
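
    to show the shape of it, here is a toy in-process version in Python. FeedHub and its methods are invented names, and there is no network or jabber in it at all. it only illustrates "full refresh on subscribe, deltas afterwards".

```python
class FeedHub:
    """Toy publish/subscribe hub: new subscribers get everything once, then only deltas."""

    def __init__(self):
        self.items = []        # everything published so far
        self.subscribers = []  # callables that accept a list of items

    def subscribe(self, callback):
        self.subscribers.append(callback)
        callback(list(self.items))  # initial refresh: the current full feed

    def publish(self, item):
        self.items.append(item)
        for callback in self.subscribers:
            callback([item])  # delta: just the new item
```

    nobody polls in this model. the publisher does work only when a story actually appears.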

    as has been mentioned many times already, a swarming delivery service would be helpful.

  98. obvious to anyone with a second grade education by peccary · · Score: 1

    Yeah, "RSS was a really stupid protocol".
    As was HTTP, and the idea of putting protocol specifiers in names, and Napster, and Microsoft Dfs and a dozen other protocols which were designed to varying degrees of poor.

    People invent these non-scaling, incredibly wasteful protocols that seem like they work fine for screwing around with their three buddies when you're willing to dedicate $300 worth of server hardware and 1 Mb/s of network bandwidth per user.

    But when you try to handle hundreds of thousands of users for $1 each, those protocols won't ever work.

    Of course, if you're a physicist, or a freshman, or an MBA, you're likely to assume that one, ten, a thousand users, it's all the same.

    And never stop to think about tens or hundreds of thousands of users.

  99. Version Control = trading one problem for another by goofrider · · Score: 1

    You'll need to parse the file every time someone asks for it. So you're just trading excessive bandwidth usage for excessive CPU load.

  100. Date-dependent RSS? by Cardbox · · Score: 1

    I've always been held back from using RSS by the fact that there is no way to syndicate the "page of the day". For example, as I write this, it is Thursday in Europe and Australia but Wednesday in California and points west. Whatever time of day the feed is retrieved, it's going to be wrong for someone.

    Do I have to create 28 separate feeds, one for each timezone, or has someone come up with a better solution?

    1. Re:Date-dependent RSS? by Anonymous Coward · · Score: 0

      I've always been held back from using RSS by the fact that there is no way to syndicate the "page of the day". For example, as I write this, it is Thursday in Europe and Australia but Wednesday in California and points west. Whatever time of day the feed is retrieved, it's going to be wrong for someone.

      What do you mean? No matter what timezone you are in, it'll still only be published once a day. Can you give an example or two?

  101. Doomed, I tell you! Doomed! by don.g · · Score: 1

    Yes.

    --
    Pretend that something especially witty is here. Thanks.
  102. Easy: use Coral by Anonymous Coward · · Score: 0

    Using the Coral Web cache for RSS is simple and requires NO modifications to web sites and RSS clients. The server load and bandwidth usage can be reduced in no time. See Making RSS scale with Coral.

  103. cookies ? by Anonymous Coward · · Score: 0

    It should be possible for the RSS feed server to store/read cookies... why not just store the date of last access and only send new items, like this rss-cache site does with IPs?

  104. Jabber by pronik · · Score: 1

    Well, Jabber is a solution for many tasks. For me, it's gathering RSS feeds. There is an RSS transport on a particular server, it gets the feeds about hourly and sends the new messages to me. If everyone would move their RSS needs to Jabber (and also help to develop the transports!), the bandwidth problem will cease to exist!

  105. P2P RSS by kurisudes · · Score: 1

    P2P RSS... hmm sounds familiar.... oh yeah... It's called usenet... nothing new here but extra xml...

    --
    --------------------------------- Born Again Bourne Again Believer: New Life, GNU/Linux Be Free!
  106. Solution: by BristolCream · · Score: 1

    Very simple solution: combine an RSS client with BitTorrent.

  107. Actually, this is a more general xml problem by evil_one666 · · Score: 2, Interesting

    XML munches up bandwidth like a lardy butter lover. Yes, yes, RSS feeds are handy, but they don't actually do anything that couldn't be achieved with a much leaner binary format. It's 2004, we don't have byte compatibility issues any more.

    See Roedy Green's (one-time comp.java.lang FAQ maintainer) excellent essay on why XML causes these problems.

    1. Re:Actually, this is a more general xml problem by Hast · · Score: 1

      That's why you compress the XML data. Sure it is mentioned in that article but his comment is that zip is too CPU heavy and RAM intensive to be useful and that no-one uses it.

      While XML has problems you can use a binary system, but then you just get a completely different set of problems. Perhaps it would be best to define data structures with XML and then compress that into a binary format. That way you could (at least in some ways) get the best of both worlds.

      A straight binary format is bad though. You lose a lot of flexibility just to make it smaller and a little faster to parse. With XML you can trade size for speed with compression.

      The basic issue is that it's a lot easier to get a binary format wrong. And debugging a binary format over e.g. a network is just a bitch.

    2. Re:Actually, this is a more general xml problem by (trb001) · · Score: 1

      XML munches more bandwidth than byte streams, true, but we're still talking about a total in the thousands of kilobyte range, not megs, polling periodically. I'll be happy to look at some hard numbers, but unless you're downloading hundreds upon hundreds of constantly updating RSS feeds, I don't think it's anything to worry about.

      --trb

    3. Re:Actually, this is a more general xml problem by evil_one666 · · Score: 1

      It is. That is the whole point of this thread and the article. Please RTFA

    4. Re:Actually, this is a more general xml problem by hummassa · · Score: 1
      [...] zip is too CPU heavy and RAM intensive [...]

      I don't know if the article is specifying zip or gzip, because gzip is NOT cpu heavy nor ram intensive, and every site/browser should enable transparent gzip of xml and html.
      --
      It's better to be the foot on the boot than the face on the pavement. ~~ tkx Kadin2048
    5. Re:Actually, this is a more general xml problem by EastCoastSurfer · · Score: 1

      I agree. Haven't FTP servers been doing this for years?

    6. Re:Actually, this is a more general xml problem by CrazyWingman · · Score: 1

      a total in the thousands of kilobyte range, not megs

      Hmmm...let's look this up here real quick... Yep - 1 MB is still equal to 1024 KB. So, I guess "thousands of kilobytes" is still around a few megs. ;)

    7. Re:Actually, this is a more general xml problem by baadfood · · Score: 1

      It's called ASN.1, which, incidentally, already has a defined mapping to and from XML. Its binary encoding is designed to include as little redundant data as possible while retaining the benefits of tagged hierarchical data, and it is already used for storing and transmitting things like certificates in SSL.

    8. Re:Actually, this is a more general xml problem by (trb001) · · Score: 1

      I'm a moron...thousands of bytes is what I meant.

      --trb

    9. Re:Actually, this is a more general xml problem by ivan256 · · Score: 1

      With the speed-optimized settings in zlib, XML containing mostly ASCII content (like most RSS feeds) compresses to about 15% of its original size.

      Don't think about the downloader, think about the server which may have tens of thousands of users polling every few seconds. Do the math, that's *gigabytes* a second. Now, if you were the host for that feed, wouldn't you like to cut that traffic to less than a sixth of its current size?

      Compressed XML compares favorably in size with comparable binary protocols.

    10. Re:Actually, this is a more general xml problem by ivan256 · · Score: 1

      Sure it is mentioned in that article but his comment is that zip is too CPU heavy and RAM intensive to be useful and that no-one uses it.


      Mentioned, but it's total bull.

      The server could cache the compressed copy so that it only runs the (sub-second) compression once per feed update. A 2 GHz P4 can compress about a megabyte of XML with zlib down to about 150 KB in under a tenth of a second. If you're only doing that once every few minutes it's negligible. Sure, the client has to decompress, but so what? Even the most modest machine these days can decompress that in under a second, and the server doesn't care how much processing time the client is using. With an RSS reader, the processing is happening in the background anyway, and the user doesn't realize they are waiting.

      Plus a lot of times compressed XML is smaller than the comparable binary format. This is the kind of stuff that zlib was designed for. I used to be anti-XML as a protocol too, until I actually tried it and compared the numbers. Binary formats are for when you need low latency (parsing XML is a bitch CPU and memory wise), but if you don't care so much about latency and you're optimizing for size, compressed XML is the way to go.
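
      A sketch of the compress-once idea using Python's gzip module. The class name is invented for illustration, and the Accept-Encoding parsing is simplified (it ignores q-values), so treat it as an outline only.

```python
import gzip


class CompressedFeedCache:
    """Compress the feed once per update and hand out the cached bytes per request."""

    def __init__(self):
        self.raw = b""
        self.gzipped = gzip.compress(b"", compresslevel=1)

    def update(self, xml_bytes):
        # Runs only when the feed changes, not on every request.
        self.raw = xml_bytes
        self.gzipped = gzip.compress(xml_bytes, compresslevel=1)  # speed-optimized

    def body_for(self, accept_encoding):
        """Pick the cached variant based on the client's Accept-Encoding header value."""
        tokens = [t.split(";")[0].strip().lower() for t in accept_encoding.split(",")]
        if "gzip" in tokens:
            return self.gzipped, {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
        return self.raw, {"Vary": "Accept-Encoding"}
```

      Per request, the server only picks one of two byte strings it already has, so the CPU cost of compression stops depending on how many readers are polling.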

    11. Re:Actually, this is a more general xml problem by anomalous+cohort · · Score: 1
      XML munches up bandwidth like a lardy butter lover

      Is this really a bandwidth issue or a scalability issue? I was under the impression that the problem comes with bursts of simultaneous requests. A previous /. topic discussed this.

      Now, if the clients could all agree to ask at different times like CSMA/CD, then you would have a scalable solution. It's too bad that CDF never became popular because that format does about the same thing as RSS but with some additional information on the window with which requests should be made.

  108. Re:They just need to follow /.'s lead by the+angry+liberal · · Score: 2, Funny

    Jamie, if you need help securing /., I have just your man. He is a smarty like you.

  109. reinventing the wheel by famebait · · Score: 1

    Possible solutions to this problem are emerging slowly, like RSScache (feed caching proxy) and KnowNow (even-driven syndication).

    Didn't Usenet solve this sort of thing decades ago?

    One might want to modernise it slightly so you get shorter lag times etc. but the basic distribution problem is the same and the algorithms chosen have worked well almost since the start of the internet without eating up the capacity.

    --
    sudo ergo sum
  110. I liked pointcast... by mikeage · · Score: 1

    was I the only one who liked the idea of using (let's say) 500K of bandwidth per hour while my screen saver was on anyway (and so I wasn't really doing anything) so that if I saw something interesting, I could have it NOW (and not in the 10 seconds it takes to load a page)? Was I the only one who actually liked their screensaver (yes, it was shiny and flashy, but when you look across the room at it, this stuff is important).

    Actually, this post is serious, not sarcastic. Are there any similar replacements?

    --
    -- Is "Sig" copyrighted by www.sig.com?
    1. Re:I liked pointcast... by Anonymous Coward · · Score: 0

      PointCast was cool.
      PointCast was a pig.

      RSS readers are the same damn thing. But, technology has moved forward and now we have the ram and disk space to do it. But in the end it's the same damn thing.

  111. Diffs? by Nephrite · · Score: 1

    Hey, what about not sending the whole RSS xml (which could be huge) but just the diffs with previous? Like cvs or cvsup. This will save a ton of bandwidth.
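
    Roughly what I have in mind, sketched in Python. It assumes every item has a stable id and the client tells the server the newest id it already holds. The function name and the (id, fragment) layout are made up for illustration.

```python
def items_since(items, last_seen_id):
    """Return only the items newer than last_seen_id.

    items is a list of (item_id, xml_fragment) tuples, newest first.
    If the id is missing or unknown, fall back to sending everything.
    """
    fresh = []
    for item_id, fragment in items:
        if item_id == last_seen_id:
            return fresh  # everything from here down the client already has
        fresh.append((item_id, fragment))
    return fresh
```

    A client that is fully up to date gets an empty list back, and one that has been away too long simply gets the whole feed again.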

  112. What is the point? by beforewisdom · · Score: 1

    I don't get the popularity of RSS.

    I personally would rather have content sans a gui, but most people consider going from a web site to text a step backwards.

    Some sites also get RSS feeds from other sites, but why bother? Why not just go to that other site? Usually they do a better job.

    So, what is the appeal or am I misunderstanding RSS?

    1. Re:What is the point? by Tazzy531 · · Score: 1

      If you read a large number of blogs or content per day it may be a lot easier to get the feed rather than visit each of the sites to check if it has been updated or not. At work, we have a news feed and a slashdot feed. There's no point in constantly hitting reload on /. to see if there is a new story. Now, I only visit if there is a new story and it is something that I'm interested in.

      --


      _______________________________
      "I'm not Conceited...I'm just a realist..."
  113. Standards Compliant Slashdot by mu-sly · · Score: 1

    They already started doing this on AListApart a while back.

    Can't happen a moment too soon!

  114. FOR THE LOVE OF GOD WHAT IS RSS!!! by Anonymous Coward · · Score: 0

    Could someone please explain?

    1. Re:FOR THE LOVE OF GOD WHAT IS RSS!!! by trongey · · Score: 1

      Try clicking the link in the posting.

      --
      You never really know how close to the edge you can go until you fall off.
  115. Compression Makes a Big Difference by JeffBarr · · Score: 1

    I did some measurements across several hundred thousand of the most prominent RSS feeds, and I found that only a few actually return a compressed feed when so requested.

    On average, compressed feeds are 30.42% of the size of the original, as you can see in graphical form here.

    Better support for mod_gzip would certainly help to reduce the impact of RSS polling, but then again so would proper use of conditional get.

  116. RSS is using too much bandwidth? by SeaFox · · Score: 1

    Really, it's like people look for things to complain about.

    What do you think uses more bandwidth, 20,000 people loading a webpage with the latest news, or 20,000 people loading an RSS feed?

  117. Bandwidth Solutions? by jeffpostcn · · Score: 1

    Bit Torrents.

  118. RSS and revenue by happyfrogcow · · Score: 1

    When we showed our corporate overlords what RSS was, the first thing they asked was "How can we get ads in there" ...

    It took some time to explain why RSS was good without ads.

  119. RSS Throttling Script by TheLoneGundam · · Score: 2, Informative

    Glenn Fleishman, of Wi-Fi Networking News, has written a script to throttle the poorly-behaved aggregators and writes about it on his personal blog.

  120. RSS hits that directly hit databases are flawed by smagruder · · Score: 2, Insightful

    I've seen many RSS URLs pull from a site's database to build the XML each time it's hit. This is fixed simply by creating a CRON job that builds the RSS XML on a periodic basis, then serving the resulting file. If you're just throwing a file back, then server bandwidth isn't as much of a problem, especially when you consider that browsers themselves cache files.
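
    A minimal sketch of that approach, assuming Python is available on the server. The items list stands in for whatever database query the site really runs, and build_rss, write_atomically and the output path are hypothetical names chosen for the example.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def build_rss(title, link, items):
    """Build a minimal RSS 2.0 document from (title, link) pairs."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = title
    for item_title, item_link in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = item_link
    return ET.tostring(rss, encoding="utf-8", xml_declaration=True)

def write_atomically(path, data):
    """Write to a temp file in the same directory, then rename over the target.

    Readers never see a half-written feed, because the rename replaces
    the old file in a single step.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(data)
    os.replace(tmp_path, path)

# Placeholder for the real database query.
items = [("Story A", "http://example.com/a"), ("Story B", "http://example.com/b")]
feed_bytes = build_rss("Example Site", "http://example.com/", items)

# A temp directory keeps the demo harmless; a real job would target the web root.
out_path = os.path.join(tempfile.mkdtemp(), "index.rss")
write_atomically(out_path, feed_bytes)
```

    A crontab entry along the lines of `*/15 * * * * python3 /path/to/build_feed.py` (path hypothetical) would then rebuild the file every 15 minutes, and the web server only ever hands out a static file.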

    --
    Steve Magruder, Metro Foodist
  121. A leaner binary format == one UDP or TCP packet by SgtChaireBourne · · Score: 1
    XML munches up bandwidth like a lardy butter lover.
    A fat UDP or TCP packet could probably hold enough info to do the job. There are several questions about how to do it, but the one I can think of first would be whether the client polls the server in a ping-like manner or if the client subscribes to a server and then waits for an event notice. Either way there are safety issues and error handling to figure out.

    The second option seems safer for the server. At an interval of hours or days, the server could check to verify the existence of the client.

    --
    Beta is broken and the link to classic doesn't work. Stop wasting our time or there won't be anybody left here.
  122. There are ways to mitigate the impact by TNLNYC · · Score: 2, Informative

    I'm reproducing this article from my own site (all the links are on the site):

    Capacity planning and RSS
    September 9, 2004

    Robert Scoble points to MSDN having issues with full entry RSS. What it comes down to is a capacity planning exercise.

    In his note, he says that RSS is broken. I personally believe that at issue is not whether RSS is working or not. RSS is working but it has complicated the bandwidth issue. At issue is the fact that RSS feeds are generally generating more traffic to a site. Because RSS readers are polling the site to check if a feed has been updated, the traffic patterns change, with increased numbers of spikes on an hourly basis. This is similar to some of the issues network administrators started facing when Pointcast first appeared.

    There are a number of ways to mitigate the issue.

    HTTP Conditional GET for RSS
    First of all, one of the things to consider when using RSS is to create conditional HTTP headers on RSS feeds. This helps mitigate some of the impact by ensuring that feeds are only served if the content has changed.
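
    As an illustration, here is a minimal server-side sketch of the decision, written in Python with no web framework. The respond function and its arguments are hypothetical; a real server would hook the same logic into its own request handling, and would also need to handle lists of ETags and the "*" wildcard, which this sketch skips.

```python
import hashlib
from email.utils import formatdate, parsedate_to_datetime

def respond(request_headers: dict, body: bytes, mtime: float):
    """Return (status, headers, payload) for a request for the feed.

    `body` is the current feed and `mtime` its last-modified time
    as a Unix timestamp.
    """
    etag = '"' + hashlib.sha1(body).hexdigest() + '"'
    headers = {"ETag": etag, "Last-Modified": formatdate(mtime, usegmt=True)}

    if_none_match = request_headers.get("If-None-Match")
    if_modified_since = request_headers.get("If-Modified-Since")

    not_modified = False
    if if_none_match is not None:
        # Simplification: compares a single ETag only.
        not_modified = if_none_match == etag
    elif if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since).timestamp()
            not_modified = int(mtime) <= since
        except (TypeError, ValueError):
            not_modified = False  # unparseable date: send the full feed

    if not_modified:
        return 304, headers, b""  # client copy is current, no body sent
    return 200, headers, body
```

    A reader that stores the ETag or Last-Modified value from one response and sends it back on the next poll gets an empty 304 reply whenever nothing has changed.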

    Feed Compression
    The next item to think of is to use compression when serving feeds. By doing so, one reduces the size of the payload, which ends up being much better in terms of managing bandwidth. In my own experience, because RSS is primarily text, I've seen a reduction of 80% of the bandwidth when delivering RSS feeds in a compressed format. That represents a fairly large gain in bandwidth that can then accommodate more users.
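
    A small sketch of the idea in Python, assuming the server checks the request's Accept-Encoding header itself (in practice a module such as mod_gzip would do this). The encode_feed function and the sample feed are invented for the example, and the actual savings depend on the feed.

```python
import gzip

def encode_feed(body: bytes, accept_encoding: str = ""):
    """Return (headers, payload), gzipping only if the client offers gzip."""
    # Simplification: q-values such as "gzip;q=0" are ignored here.
    offered = {token.split(";")[0].strip().lower()
               for token in accept_encoding.split(",")}
    headers = {"Vary": "Accept-Encoding"}
    if "gzip" in offered:
        headers["Content-Encoding"] = "gzip"
        return headers, gzip.compress(body)
    return headers, body

# Hypothetical feed body: 50 short items of repetitive markup.
sample_feed = "".join(
    f"<item><title>Story {n}</title><link>http://example.com/story/{n}</link></item>\n"
    for n in range(50)
).encode("utf-8")

headers, payload = encode_feed(sample_feed, "gzip, deflate")
print(f"{len(sample_feed)} bytes raw, {len(payload)} bytes gzipped")
```

    Clients that do not advertise gzip still get the plain feed, so older readers keep working.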

    Change the polling schedule
    The RSS 2.0 specification already offers a number of optional elements to give RSS readers a better idea as to when to get content. For example, the pubDate element offers information as to when a feed was last published, as does the lastBuildDate one. ttl (a.k.a. time to live) can also be used to indicate to the software that this feed should live for a certain amount of time. Finally, skipHours and skipDays offer more pointers as to when RSS reader software should not poll. With all those mechanisms in place, it looks like a lot of flexibility exists in the format to accommodate scalability.
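
    To make this concrete, here is a sketch of how a reader might honor ttl, skipHours and skipDays before polling. It is written in Python; should_poll and the sample feed are hypothetical, and the times are assumed to be in GMT/UTC, as the RSS 2.0 specification describes for skipHours.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def should_poll(feed_xml: str, last_fetch: datetime, now: datetime) -> bool:
    """Decide whether to fetch again, based on hints in the last copy of the feed."""
    channel = ET.fromstring(feed_xml).find("channel")

    # skipHours holds <hour> elements (0-23); skipDays holds <day> names.
    skip_hours = {int(h.text) for h in channel.findall("skipHours/hour")}
    skip_days = {d.text.strip() for d in channel.findall("skipDays/day")}
    if now.hour in skip_hours or DAY_NAMES[now.weekday()] in skip_days:
        return False

    # ttl is the number of minutes the feed may be cached.
    ttl = channel.findtext("ttl")
    if ttl is not None and now - last_fetch < timedelta(minutes=int(ttl)):
        return False
    return True

# Hypothetical feed: cache for 60 minutes, skip 03:00 and all of Sunday.
FEED = """<rss version="2.0"><channel>
<title>Example</title><link>http://example.com/</link><description>d</description>
<ttl>60</ttl>
<skipHours><hour>3</hour></skipHours>
<skipDays><day>Sunday</day></skipDays>
</channel></rss>"""

thursday_noon = datetime(2004, 9, 9, 12, 0, tzinfo=timezone.utc)
print(should_poll(FEED, thursday_noon - timedelta(minutes=30), thursday_noon))
```

    The hints only help if reader software actually checks them, which is up to each aggregator.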

    When all else fails, reduce
    If all of the above still fail, RSS publishers should look at reducing the size of their feeds. There are two ways you can do this. First, you can just say that you're not going to offer full-text feeds. This seems to be the option that Scoble hates. Another way to do things is to offer both abbreviated feeds and full-text feeds or offer more detailed feeds, as I do on TNL.net.

    An important consideration when doing something like this is how to address them. By default, users who just use the RSS autodiscovery feature will only get the abbreviated feed. However, they still have the option to go and get the full-text version. The compromise here is that users who just want to subscribe quickly can do so at a lower bandwidth cost, while power users can seek out the fuller feed and subscribe to that. The result, in my experience, is that most people use the autodiscovery feature, grabbing the smaller feed. Some power users do seek out the fuller feed and subscribe to that instead (based on the numbers, I'm seeing a 5% usage of the full-text feed as opposed to the default abbreviated one). This is a compromise solution that seems to accommodate everyone involved to date.

    Final considerations
    When publishing RSS feeds, your audience grows, which results in traffic growth too. One of the things to realize is that RSS feeds are generally stickier than the rest of a site. What this means is that, for every new subscriber you get, you will see an on-going increase in your overall site traffic stats. This is not a bad thing as messages emanating from your site do get a higher passive readership. One of the things that new syndication standards should consider is a follow-up on this. While RSS publishers know how many feeds are being pushed out, there is littl

    --
    Check out http://www.tnl.net/blog
  123. HTTP is Not the Answer by stpeter · · Score: 1

    Massive polling for updates leads to scalability problems? Big surprise! We need to learn that HTTP is not always the best technology for the job. Just-in-time content delivery requires a different set of tools. There's already an Internet-Draft for sending Atom feeds over XMPP (a.k.a. Jabber), and the same "publish-subscribe" technology could be used for RSS (or a smart service could translate to Atom so your client doesn't need to parse all those RSS formats). Check out PubSub.com for a real-life implementation of the basic concept (they track 3+ million feeds and notify you when a feed you're interested in has changed, and even do handy keyword-based monitoring). And one added benefit of using the XMPP pubsub extension is that these are all open protocols with many open-source implementations. In this problem-space at least, HTTP is so second-millennium!

  124. Miski: client2server2server2client by Philip+Dorrell · · Score: 2, Interesting

    In 2000 I tried to invent a spam-proof usenet. The result of my efforts was Miski. The idea of Miski was that users would have addresses on servers representing what are effectively RSS channels, and other users would subscribe to these channels through their servers. There would be a DNS extension for the naming of servers. Channels would have names like username@example.com/"Java Programming". The system would be spam-proof because your server would only send you what you had subscribed to. It would be "push", because as soon as you posted something to a channel, your server would pass the message on to the servers of those who had subscribed to your channel. Only the notifications would be push: ordinary http would be used to retrieve the actual content.

    Miski also had the important concept of "reposting", whereby if you saw something you liked, you could press a single button in your client to repost the notification, so that any subscribers to you could know about the item being reposted, if they had not already heard about it from somewhere else. The presumption was that the client (or the reader's server) would trim out duplicates, so that people posting would have no inhibitions about reposting stuff that maybe many of their subscribers already knew about.
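
    Since Miski was never implemented, the following is only a toy model of the posting, reposting and duplicate-trimming behavior described above, collapsed into a single in-memory server in Python. The Server class, its method names and the example channels are all invented for illustration; the real design called for server-to-server delivery.

```python
from collections import defaultdict

class Server:
    """Toy single-server model: pushes notifications, trims duplicates."""

    def __init__(self):
        self.subscribers = defaultdict(set)  # channel -> subscribed users
        self.inbox = defaultdict(list)       # user -> notified URLs, in order
        self.seen = defaultdict(set)         # user -> URLs already delivered

    def subscribe(self, user, channel):
        self.subscribers[channel].add(user)

    def post(self, channel, url):
        """Push a notification for `url` to every subscriber of `channel`."""
        for user in self.subscribers[channel]:
            if url in self.seen[user]:
                continue  # already heard about it elsewhere: drop the duplicate
            self.seen[user].add(url)
            self.inbox[user].append(url)

    # A repost is just a post of someone else's item on your own channel.
    repost = post

server = Server()
server.subscribe("alice", 'bob@example.com/"Java Programming"')
server.subscribe("alice", 'carol@example.com/"Links"')

server.post('bob@example.com/"Java Programming"', "http://example.com/article")
server.repost('carol@example.com/"Links"', "http://example.com/article")
print(server.inbox["alice"])
```

    Only the notification (a URL) is pushed here; fetching the content itself would still be an ordinary HTTP request.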

    Miski was more than just an attempt to create scalable-push RSS, or a spam-proof equivalent of Usenet: it was a vision of the "global brain". Using posting and reposting, notification of a new "interesting" idea could spread very quickly from the inventor of the idea to almost anyone in the world likely to be interested in that idea, even if the inventor was not well known. We would all be like neurons in the brain, with signals passing from one person to the next as fast as possible. It was an attempt to solve the dual problems of "How can I tell the world what I have to say when I have to compete against the efforts of all those other people trying to tell the world stuff?" and "How can I find out new stuff that's really interesting to me from among all this junk that I am getting from all these people trying to tell stuff to the world?".

    I asked the question "How fast is the Internet?" Although packets can travel from one computer to another in seconds, or even less, information can still take days, weeks, months or even years to travel from the person who created it to another person who is interested in it. One way to measure this is to consider how often you find a document on the web which is interesting, but which you did not know about, and which has nevertheless been available for months or years, and which would have been interesting to you even when it was originally posted on the web.

    Sadly Miski was never implemented, and I reduced my ambitions to write Womcat Bookmarks, which attempted to be a less dynamic version of Miski, but has ended up being just another RSS reader.

    --
    Music: a super-stimulus for the perception of musicality. Musicality: a perceived aspect of speech.
  125. Don't blame RSS! by Anonymous Coward · · Score: 0

    blame XML!

  126. It's not RSS's fault! by Anonymous Coward · · Score: 1, Insightful

    That's XML for ya!

    The best way is to optimize your RSS feeds to a max of 10 items, and stick to TITLE and LINK fields only.
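
    A quick sketch of that trimming in Python, assuming a plain RSS 2.0 feed with items directly under the channel. The slim_feed function and the sample feed are made up for the example.

```python
import xml.etree.ElementTree as ET

def slim_feed(feed_xml: str, max_items: int = 10) -> str:
    """Keep the first `max_items` items and only their title and link."""
    root = ET.fromstring(feed_xml)
    channel = root.find("channel")
    for index, item in enumerate(channel.findall("item")):
        if index >= max_items:
            channel.remove(item)  # drop everything past the limit
            continue
        for child in list(item):
            if child.tag not in ("title", "link"):
                item.remove(child)  # strip description, content, etc.
    return ET.tostring(root, encoding="unicode")

# Hypothetical bloated feed: 15 items, each with a long description.
fat_feed = (
    '<rss version="2.0"><channel><title>Example</title>'
    + "".join(
        f"<item><title>Story {n}</title><link>http://example.com/{n}</link>"
        f"<description>{'Lots of text. ' * 50}</description></item>"
        for n in range(15)
    )
    + "</channel></rss>"
)

slim = slim_feed(fat_feed)
print(f"{len(fat_feed)} characters before, {len(slim)} after")
```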

    Tom's Hardware had a feed that was over 500 KB, and they wonder why they had bandwidth issues.

  127. The Bandwidth is there.... by jwb4273 · · Score: 1
    if people would only loosen up on it.

    During the Dot-bomb era, all these companies popped up. Remember them? The ones like Pets.com, medsonline.com, etc. The philosophy was that all this new technology was so going to permeate into every home in the world by 2000 - and every time someone needed a Tylenol for their headache they would be going online to buy it.

    Turns out, as all of us know, that was dead wrong. But the companies, before going bust, kept yelling one message to the Telcos and IP providers - "WE NEED BANDWIDTH" and "WE NEED LOTS OF IT". Come on - there are literally millions of miles of fiber optics out there - each capable of handling TONS of gigabits of capacity.

    Want reality? I don't know the exact numbers or anything, but I seem to recall from somewhere that over 60% of the fiber out there is DARK. Yes, that's right - DARK FIBER! Capable of handling those OC-192+s.

    Ok, the laws of economics are supposed to tell us that more demand pushes the price up, and more supply pushes the price down. If carriers would only open more supply up (which, mind you, they already have), prices would drop for that bandwidth, and VOILA - we don't have bandwidth problems any more because the companies it's sucking on (Cnet, /., etc.) can afford more bandwidth!

    It seems silly to me that we're fussing over bandwidth issues when we have literally gigabits of the stuff lying under the roads we drive every day that's not turned on. Now, if I could only get some of that optical stuff in front of my house I'd be doing good!

  128. Torrenting small files is great...for the server! by lilmouse · · Score: 1
    Tracker overhead swamps any gains you might make.


    Yes, but that overhead would then be handled by the p2p clients, not by the central server. It's become Somebody Else's Problem. Not necessarily good for the internet as a whole, but a solution for the server :)

    --LWM
  129. A solution to the RSS bandwidth problem by ToterSan · · Score: 1

    The solution is twofold:
    1: Conditional get(s) enable only new data to be sent
    2: Random (or pseudorandom) refresh. Most people set their readers to get headlines every 30 or 60 minutes & consequently sites are overwhelmed at the hourly & half-hourly marks. I personally set my reader to update at 100 minute intervals. I would support efforts of RSS developers to enact random requeuing (the server gives the reader a random time to recheck for updates).
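
    For point 2, a client-side sketch in Python of what a randomized refresh could look like. The next_poll_delay function and the 25% jitter figure are assumptions for illustration, not something any particular reader is known to do.

```python
import random

def next_poll_delay(base_minutes: float = 60.0, jitter: float = 0.25,
                    rng=random) -> float:
    """Return a delay in minutes, drawn uniformly from base +/- jitter * base.

    With the defaults, each poll is scheduled 45 to 75 minutes after the
    previous one, so readers drift apart instead of all hitting the
    server on the hour.
    """
    spread = base_minutes * jitter
    return rng.uniform(base_minutes - spread, base_minutes + spread)

# Simulate a handful of readers picking their next poll time.
for reader in range(5):
    print(f"reader {reader}: next poll in {next_poll_delay():.1f} minutes")
```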

    BTW, Eweek had relatively the same response in September... Clicky
    Cheers!
    Daniel Lott,
    Service Computers, LLC.

  130. Reminds me of a quote by Yogi Berra: by garyebickford · · Score: 1

    "Nobody goes there any more, it's too crowded."

    Other quotes by Yogi (You've heard many of them.)

    --
    It's easier to be a result of the past, but more fun to be a cause of the future! http://www.spacefinancegroup.com/