Slashdot Mirror


When RSS Traffic Looks Like a DDoS

An anonymous reader writes "Infoworld's CTO Chad Dickerson says he has a love/hate relationship with RSS. He loves the changes to his information production and consumption, but he hates the behavior of some RSS feed readers. Every hour, Infoworld "sees a massive surge of RSS newsreader activity" that "has all the characteristics of a distributed DoS attack." So many requests in such a short period of time are creating scaling issues. " We've seen similar problems over the years. RSS (or as it should be called, "Speedfeed") is such a useful thing, it's unfortunate that it's ultimately just very stupid.

82 of 443 comments

  1. RSS maybe by Anonymous Coward · · Score: 3, Funny

    RSS may be ultimately stupid but you didn't get first post did you! rookie!

  2. Yesterday by ravan_a · · Score: 3, Interesting

    Does this have anything to do with /.'s problems yesterday?

    --
    -ravan_a
    1. Re:Yesterday by afidel · · Score: 2, Interesting

      Oh, how prophetic: I went to check the first reply to your post and Slashdot again did the white page thing (top and left borders with a white page and no right border). Earlier today (around noon EST) I was getting nothing but 503s. This new code has not been good to Slashdot.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
  3. netcraft article by croddy · · Score: 4, Informative
  4. Can't this be throttled? by xplosiv · · Score: 2, Interesting

    Can't one just write a small PHP script or something that returns an error (e.g. a 500)? That's less data to send back, and hopefully the reader would just try again later.

    1. Re:Can't this be throttled? by jcain · · Score: 3, Insightful

      That kind of eliminates the point of having the RSS at all, as the user no longer gets up-to-the-minute information.

      Also, I doubt that the major problem here is bandwidth, more the number of requests the server has to deal with. RSS feeds are quite small (just text most of the time). The server would still have to run that PHP script you suggest.

    2. Re:Can't this be throttled? by mgoodman · · Score: 4, Insightful

      Then their RSS client would barf on the input and the user wouldn't see any of the previously downloaded news feeds, in some cases.

      Or rather, anyone that programs an RSS reader so horribly as to make it so that every client downloads information every hour on the hour would probably also barf on the input of a 500 or 404 error.

      Most RSS feeders *should* just download every hour from the time they start, making the download intervals between users more or less random and well-dispersed. And if you want it more than every hour, well then edit the source and compile it yourself :P

      --
      01100111 01100101 01110100 00100000 01101111 01110101 01110100 00100000 01101101 01101111 01110010 01100101 00101110
    3. Re:Can't this be throttled? by ameoba · · Score: 4, Insightful

      It seems kinda stupid to have the clients basing their updates on clock time. Doing an update on client startup and then every 60min after that would be just as easy as doing it on the clock time & would basically eliminate the whole DDOSesque thing.

      --
      my sig's at the bottom of the page.
    4. Re:Can't this be throttled? by TREE · · Score: 2, Insightful

      500 or 404 won't work for RSS, since most readers just eat the error and try again later.

      What would really, really be effective would be a valid RSS feed that contained an error message in-line describing why your request was rejected. A few big sites doing this would rapidly get the rest of the users and clients to be updated.
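A minimal sketch of that idea in Python, assuming RSS 2.0 and the standard library's xml.etree.ElementTree. The function name `rejection_feed` and the message text are hypothetical. The server would return this document with a 200 status, so even readers that silently swallow HTTP errors end up displaying the explanation to the user.

```python
import xml.etree.ElementTree as ET

def rejection_feed(channel_title: str, site_link: str, reason: str) -> str:
    """Build a minimal, valid RSS 2.0 document whose single item explains a rejected poll."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required RSS 2.0 channel elements.
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = site_link
    ET.SubElement(channel, "description").text = "Feed temporarily unavailable to this client"
    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = "Your RSS reader is polling too often"
    ET.SubElement(item, "description").text = reason
    return ET.tostring(rss, encoding="unicode")

print(rejection_feed("Example News", "http://www.example.com/",
                     "Please upgrade your reader or lower its polling frequency."))
```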

    5. Re:Can't this be throttled? by mblase · · Score: 4, Insightful

      Most RSS feeders *should* just download every hour from the time they start

      That's also a problem, though, since most people start work at their computer desks on the hour, or very close to it. The better solution would be for the client (1) to check once at startup, then (2) pick a random number between one and sixty (or thirty or whatever) and (3) start checking the feed, hourly, after that many minutes. That's the only way to ensure a decently random distribution of hits.
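The three steps above could be sketched like this in Python. `poll_schedule` is a hypothetical helper, and times are expressed as minutes after midnight purely for illustration; passing a seeded `random.Random` keeps the example reproducible.

```python
import random
from typing import List, Optional

def poll_schedule(start_minute: float, interval: float = 60.0, count: int = 5,
                  rng: Optional[random.Random] = None) -> List[float]:
    """Return the first `count` poll times (minutes after midnight, illustrative).

    Step 1: poll once at startup.
    Step 2: wait a random delay between 1 and `interval` minutes.
    Step 3: poll every `interval` minutes after that first delayed poll.
    """
    rng = rng or random.Random()
    times = [start_minute]                                  # step 1: check at startup
    next_time = start_minute + rng.uniform(1.0, interval)   # step 2: random first delay
    while len(times) < count:
        times.append(next_time)                             # step 3: fixed interval thereafter
        next_time += interval
    return times

# Someone who starts the reader at 9:00 (minute 540) gets a startup poll, then
# one somewhere between 9:01 and 10:00, then one every hour after that.
print(poll_schedule(540, rng=random.Random(1)))
```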

    6. Re:Can't this be throttled? by hunterx11 · · Score: 4, Funny
      From now on, instead of telling people to fuck off I'll just say:

      User-agent: You
      Disallow: /
      --
      English is easier said than done.
    7. Re:Can't this be throttled? by Fat+Cow · · Score: 2, Informative

      I think that the problem is the peak load - unfortunately the rss readers all download at the same time (they should be more uniformly distributed within the minimum update period). This means that you have to design your system to cope with the peak load, but then all that capacity is sitting idle the rest of the time.

      The electricity production system has the same problem.
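A quick, hypothetical simulation makes the peak-load point concrete: the same number of clients polling once an hour generates the same total work either way, but when they all fire at minute 0 the busiest minute carries every request, whereas spreading the poll minute brings the peak down near the per-minute average. Python, standard library only; the client count is arbitrary.

```python
import random
from collections import Counter

def peak_requests_per_minute(clients: int, aligned: bool, seed: int = 0) -> int:
    """Simulate one hour of hourly polling; return the busiest minute's request count.

    aligned=True models readers that all fire at minute 0 (on the hour);
    aligned=False models readers whose poll minute is uniformly random.
    """
    rng = random.Random(seed)
    per_minute = Counter(
        0 if aligned else rng.randrange(60)    # minute of the hour this client polls
        for _ in range(clients)
    )
    return max(per_minute.values())

print(peak_requests_per_minute(6000, aligned=True))   # all 6000 requests in one minute
print(peak_requests_per_minute(6000, aligned=False))  # near the 100-per-minute average
```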

      --
      stay frosty and alert
    8. Re:Can't this be throttled? by Anonymous Coward · · Score: 2, Insightful

      How about having the SERVER tell the client when to download next? Sort of like DHCP, but more intelligent: the server would even out the TTLs with some sort of Gaussian algorithm, and in that way save itself!

      If certain users want news more often (say every 15 minutes, versus every hour), have the client say that it would like news every 15 minutes, and the server will schedule it (almost like a calendar) and send the client a TTL that is almost 15 minutes (but close enough). In fact, this might be the better route: fundamentally change the way RSS works, so that newsreaders are REQUIRED to RSVP, and the ones that don't get an error message (telling the client which newsreaders are supported).
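One way a server could do that scheduling, sketched in Python under loud assumptions: `FetchScheduler` and its method are hypothetical, slots are whole minutes of the hour, and nothing ever expires from the load table (a real server would age entries out). It uses a plain least-loaded rule rather than anything Gaussian: each client gets the least-crowded minute inside the interval it asked for, with ties going to the longest delay, i.e. the one closest to what was requested.

```python
from collections import defaultdict

class FetchScheduler:
    """Hypothetical server-side scheduler that hands each client its next-poll delay."""

    def __init__(self, slots_per_hour: int = 60) -> None:
        self.slots = slots_per_hour
        # minute-of-the-hour -> number of clients told to poll then (never expired here)
        self.load = defaultdict(int)

    def assign(self, now_minute: int, wanted_every: int) -> int:
        """Return how many minutes (at least 1) the client should wait before polling."""
        window = range(1, min(wanted_every, self.slots) + 1)
        # Least-loaded slot first; on ties, prefer the longest delay.
        delay = min(window, key=lambda d: (self.load[(now_minute + d) % self.slots], -d))
        self.load[(now_minute + delay) % self.slots] += 1
        return delay

scheduler = FetchScheduler()
# Thirty clients all asking at minute 0 for "every 15 minutes" get spread over 15 slots.
delays = [scheduler.assign(now_minute=0, wanted_every=15) for _ in range(30)]
print(delays)
```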

  5. Simple HTTP Solution by inertia187 · · Score: 3, Informative

    The readers should send a HEAD request to see if the Last-Modified header changed... And the feed rendering engines should make sure their Last-Modified is accurate.

    --
    A programmer is a machine for converting coffee into code.
    1. Re:Simple HTTP Solution by skraps · · Score: 5, Insightful

      This "optimization" will not have any long-lasting benefits. There are at least three variables in this equation:

      1. Number of users
      2. Number of RSS feeds
      3. Size of each request

      This optimization only addresses #3, which is the least likely to grow as time goes on.

      --
      Karma: -2147483648 (Mostly affected by integer overflow)
    2. Re:Simple HTTP Solution by ry4an · · Score: 3, Informative
      Better than that, they should use the If-Modified-Since header from RFC 2616 (section 14.25) in their GETs. That way, if the feed has changed, the new content comes back in that same request with no subsequent GET, and if it hasn't, the server simply answers 304 Not Modified.

      Someone did a nice write-up about doing so back in 2002.
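A sketch of a client-side conditional GET using only Python's urllib, assuming the feed server sends Last-Modified and/or ETag. `FeedCache`, `conditional_headers` and `fetch_feed` are hypothetical names; the sample validators in the test are the ones from the Pastiche headers quoted further down the thread. urllib reports a 304 Not Modified as an HTTPError, which this code treats as "keep the cached copy".

```python
import urllib.error
import urllib.request
from typing import Dict, Optional

class FeedCache:
    """Validators and body from the last successful fetch of one feed."""
    def __init__(self) -> None:
        self.last_modified: Optional[str] = None
        self.etag: Optional[str] = None
        self.body: Optional[bytes] = None

def conditional_headers(cache: FeedCache) -> Dict[str, str]:
    """Request headers for a conditional GET (RFC 2616, sections 14.25 and 14.26)."""
    headers = {}
    if cache.last_modified:
        headers["If-Modified-Since"] = cache.last_modified
    if cache.etag:
        headers["If-None-Match"] = cache.etag
    return headers

def fetch_feed(url: str, cache: FeedCache, timeout: float = 30.0) -> Optional[bytes]:
    """One GET per poll; the body is only transferred if the feed changed."""
    request = urllib.request.Request(url, headers=conditional_headers(cache))
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            cache.body = response.read()
            # Remember whatever validators the server offered, for the next poll.
            cache.last_modified = response.headers.get("Last-Modified")
            cache.etag = response.headers.get("ETag")
    except urllib.error.HTTPError as err:
        if err.code != 304:       # 304 Not Modified: keep the cached body
            raise
    return cache.body
```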

    3. Re:Simple HTTP Solution by johnbeat · · Score: 3, Informative

      So, he's writing from infoworld and complaining that RSS feed readers grab feeds whether the data has changed or not. So, I went to look for infoworld's RSS feeds. Found them at:

      http://www.infoworld.com/rss/rss_info.html

      Trying the top news feed, got back:

      date -u ; curl --head http://www.infoworld.com/rss/news.xml
      Tue Jul 20 19:51:44 GMT 2004
      HTTP/1.1 200 OK
      Date: Tue, 20 Jul 2004 19:48:30 GMT
      Server: Apache
      Accept-Ranges: bytes
      Content-Length: 7520
      Content-Type: text/html; charset=UTF-8

      How do I write an RSS reader that only downloads this feed if the data has changed?

      Jerry

    4. Re:Simple HTTP Solution by poot_rootbeer · · Score: 2, Insightful

      There are at least three variables in this equation:
      1. Number of users
      2. Number of RSS feeds
      3. Size of each request


      And I'll add:
      4. Time at which each request occurs

      If RSS requests were evenly distributed throughout the hour, the problems would be minimal. When every single RSS reader assumes that updates should be checked exactly at X o'clock on the hour, you get problems.

    5. Re:Simple HTTP Solution by jesser · · Score: 3, Insightful

      Even if every RSS reader used HEAD (or if-modified-since) correctly, servers would still get hammered on the hour when the RSS feed has been updated during the hour. If-modified-since saves you bandwidth over the course of a day or month, but it doesn't reduce peak usage.

      --
      The shareholder is always right.
    6. Re:Simple HTTP Solution by blowdart · · Score: 2, Informative
      You're missing the point I assume the original poster was making.

      Not all web servers provide last-modified or etag headers. Infoworld doesn't, so even a well written RSS reader has to bring the whole feed down as they have no way to know if it has changed or not.

    7. Re:Simple HTTP Solution by johnbeat · · Score: 2, Informative

      Uh, no.

      Pastiche knows when the document was last modified and can support my writing an rss reader that checks last-modified:

      curl --head http://fishbowl.pastiche.org/nerdfull.xml
      HTTP/1.1 200 OK
      Date: Tue, 20 Jul 2004 22:16:33 GMT
      Server: Apache/1.3.26 (Unix) Debian GNU/Linux mod_gzip/1.3.19.1a mod_jk/1.1.0
      Last-Modified: Mon, 19 Jul 2004 02:52:46 GMT
      ETag: "28620-8faa-40fb377e"
      Accept-Ranges: bytes
      Content-Length: 36778
      Content-Type: text/xml

      But infoworld does not. As far as I can tell from the headers I displayed in the previous post, infoworld's server does not provide such data. Without the last-modified or etag or something similar, there is no way to ask for a conditional get, because there is nothing to base the conditional on, and most likely the server doesn't know how to compare the conditional anyway since it clearly is not keeping track of when the document was last modified.

      I could easily be getting the syntax wrong, but whenever I request that it only send me the xml feed if it has been last modified in the last fraction of a second, I still get the page back:

      date > datestamp; curl --time-cond datestamp http://www.infoworld.com/rss/news.xml

      This returns a bunch of xml.

      Running the same command on Pastiche's xml feed returns, as I would expect, absolutely nothing:

      date > datestamp; curl --time-cond datestamp http://fishbowl.pastiche.org/nerdfull.xml

      Jerry
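The server half of this, roughly: a feed publisher has to (1) send validators and (2) compare them against the client's conditional headers before choosing between 200 and 304. A Python sketch with hypothetical function names; it assumes header names arrive already normalized, checks If-None-Match first (RFC 7232 says If-Modified-Since is ignored when both are present), and skips weak ETags and other edge cases.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Mapping

def validator_headers(last_modified: datetime, etag: str) -> Dict[str, str]:
    """Headers that give clients something to base a conditional GET on."""
    # last_modified must be a timezone-aware UTC datetime for usegmt=True.
    return {"Last-Modified": format_datetime(last_modified, usegmt=True), "ETag": etag}

def conditional_status(request_headers: Mapping[str, str],
                       last_modified: datetime, etag: str) -> int:
    """Return 304 if the client's cached copy is still current, else 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # When both headers are sent, If-None-Match wins (RFC 7232, section 3.3).
        tags = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if etag in tags or "*" in tags else 200
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return 200                      # unparseable date: ignore the header
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution, so drop microseconds before comparing.
        if last_modified.replace(microsecond=0) <= since:
            return 304
    return 200
```

With the Pastiche values, `validator_headers(datetime(2004, 7, 19, 2, 52, 46, tzinfo=timezone.utc), '"28620-8faa-40fb377e"')` yields a Last-Modified of "Mon, 19 Jul 2004 02:52:46 GMT", matching the response shown above.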

  6. Still haven't tried these newfangled RSS readers.. by Rezonant · · Score: 2, Interesting

    ...so could someone recommend a couple of really good ones for Windows and *nix?

  7. Call me stupid by nebaz · · Score: 4, Informative

    This is helpful.

    --
    Rhymes that keep their secrets will unfold behind the clouds.There upon the rainbow is the answer to a neverending story
  8. Editorializing in the blurb by Patik · · Score: 2, Insightful

    I don't really care for RSS either, but damn, was that necessary?

    1. Re:Editorializing in the blurb by sohojim · · Score: 2, Funny
      Oh, you mean editors editorializing? Probably.

  9. Over the years? How about over the weekend? by Marxist+Hacker+42 · · Score: 5, Informative

    We've seen similar problems over the years. RSS (or as it should be called, "Speedfeed") is such a useful thing, it's unfortunate that it's ultimately just very stupid.

    And it seems to have gotten worse since the new code was installed- I get 503 errors at the top of every hour now on slashdot.

    --
    SJW: a person who perceives an injustice, and while correcting it, commits a greater injustice.
  10. What about a scheduler? by el-spectre · · Score: 4, Interesting

    Since many clients request the new data every 30 minutes or so... how about a simple system that spreads out the load? A page that, based on some criteria (domain name, IP, random seed, round robin), gives each client a time it should check for updates (e.g. 17 minutes past the hour).

    Of course, this depends on the client to respect the request, but we already have systems that do (robots.txt), and they seem to work fairly well, most of the time.
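A stateless variant of that scheduler page, sketched in Python: hash a client identifier (IP address or hostname, as suggested above) to a fixed minute within the polling interval. `assigned_minute` is hypothetical. hashlib is used instead of the built-in `hash()`, because the latter is salted per process for strings and would hand the same client a different minute after every server restart.

```python
import hashlib

def assigned_minute(client_id: str, interval_minutes: int = 30) -> int:
    """Map a client identifier to a stable poll offset within the interval.

    For a 30-minute interval, a result of 17 would mean "poll at :17 and :47".
    A cryptographic hash spreads clients roughly evenly without per-client state.
    """
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % interval_minutes

print(assigned_minute("192.0.2.10"))
```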

    --
    "Faith: Belief without evidence in what is told by one who speaks without knowledge, of things without parallel." - A.B.
    1. Re:What about a scheduler? by cmdr_beeftaco · · Score: 5, Funny

      Bad idea. Everyone knows that most headlines are made at the top of the hour. Thus, A.M. radio always gives news headlines "at the top of the hour." RSS readers should be given the same timely updates.
      Related to this is the fact that most traffic accidents happen "on the twenties." Human nature is a curious and seemingly very predictable thing.

    2. Re:What about a scheduler? by JimDabell · · Score: 2, Informative

      RSS already supports the <ttl> element type, which indicates how long a client should wait before looking for an update. Additionally, HTTP servers can provide this information through the Expires header.

      Furthermore, well-behaved clients issue a "conditional GET" that only requests the file if it has been updated, which cuts back on bandwidth quite a bit, as only a short response saying it hasn't been updated is necessary in most cases.
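How a client might honor both hints, as a Python sketch. It reads `<ttl>` (minutes, per the RSS 2.0 specification) and an optional Expires header, then waits for whichever is later. Choosing "later", the 60-minute default, and the name `next_poll_time` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

def next_poll_time(feed_xml: str, fetched_at: datetime,
                   expires_header: Optional[str] = None,
                   default_minutes: int = 60) -> datetime:
    """Earliest moment a polite client should poll again (fetched_at must be tz-aware)."""
    wait = timedelta(minutes=default_minutes)
    ttl = ET.fromstring(feed_xml).findtext("channel/ttl")   # RSS 2.0: minutes to cache
    if ttl is not None and ttl.strip().isdigit():
        wait = timedelta(minutes=int(ttl.strip()))
    earliest = fetched_at + wait
    if expires_header:
        try:
            expires = parsedate_to_datetime(expires_header)
        except (TypeError, ValueError):
            expires = None                      # malformed Expires: ignore it
        if expires is not None:
            if expires.tzinfo is None:
                expires = expires.replace(tzinfo=timezone.utc)
            earliest = max(earliest, expires)   # honor whichever hint is later
    return earliest
```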

    3. Re:What about a scheduler? by Anonymous Coward · · Score: 2, Funny

      maybe military time? 20 = 5 pm

      I sure hope you're not in the military. If you are then I highly recommend you make sure you haven't missed any appointments recently.

    4. Re:What about a scheduler? by cephyn · · Score: 2, Insightful

      No, I think he's being serious. Since most people's schedules are based on the hour marks, it stands to reason that most people are rushing to get to their destination 20 minutes before the hour, and rushing out of wherever they are 20 minutes after the hour. So, since the schedules are all synched, the traffic volume quickly swells 20 minutes before/after the hour and bam -- that's when you get the most accidents.

      Most major cities I think have traffic reports more often than just on 20/40.

      --
      Moo.
  11. RSS needs better TCP stacks by Russ+Nelson · · Score: 3, Interesting

    RSS just needs better TCP stacks. Here's how it would work: when your RSS client connects to an RSS server, it would simply leave the connection open until the next time the RSS data got updated. Then you would receive a copy of the RSS content. You simply *couldn't* fetch data that hadn't been updated.

    The reason this needs better TCP stacks is because every open connection is stored in kernel memory. That's not necessary. Once you have the connecting ip, port, and sequence number, those should go into a database, to be pulled out later when the content has been updated.
    -russ

    --
    Don't piss off The Angry Economist
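The post is about kernel TCP state, which can't be shown in a few lines, but the application-level behavior ("hold the request until the feed changes") can. A Python sketch with a hypothetical `FeedHub` class: each waiting client blocks on a condition variable and is woken only by `publish()`, so no client ever fetches unchanged data. It runs inside one process and says nothing about how many idle connections a server can afford, which is the objection raised in the replies.

```python
import threading
from typing import Optional, Tuple

class FeedHub:
    """In-process model of 'answer the request only once the feed has changed'."""

    def __init__(self, content: bytes = b"") -> None:
        self._cond = threading.Condition()
        self._version = 0
        self._content = content

    def publish(self, content: bytes) -> None:
        """Store new content and wake every waiting client."""
        with self._cond:
            self._version += 1
            self._content = content
            self._cond.notify_all()

    def wait_for_update(self, seen_version: int,
                        timeout: Optional[float] = None) -> Optional[Tuple[int, bytes]]:
        """Block until the version exceeds seen_version; return None on timeout."""
        with self._cond:
            changed = self._cond.wait_for(lambda: self._version > seen_version, timeout)
            if not changed:
                return None
            return self._version, self._content
```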
    1. Re:RSS needs better TCP stacks by genixia · · Score: 3, Funny

      Yeah, because there's nothing like using a sledgehammer to crack a hazelnut.

      For starters, how about the readers play nice and spread their updates around a bit instead of all clamoring at the same time.

    2. Re:RSS needs better TCP stacks by EnderWiggnz · · Score: 3, Insightful

      not needing user intervention is the effing POINT of rss.

      it's like saying - "java is great, except let's make it compiled, and platform specific"

      --
      ... hi bingo ...
    3. Re:RSS needs better TCP stacks by Salamander · · Score: 5, Insightful

      Leaving thousands upon thousands of connections open on the server is a terrible idea no matter how well-implemented the TCP stack is. The real solution is to use some sort of distributed mirroring facility so everyone could connect to a nearby copy of the feed and spread the load. The even better solution would be to distribute asynchronous update notifications as well as data, because polling always sucks. Each client would then get a message saying "xxx has updated, please fetch a copy from your nearest mirror" only when the content changes, providing darn near optimal network efficiency.

      --
      Slashdot - News for Herds. Stuff that Splatters.
    4. Re:RSS needs better TCP stacks by Anonymous Coward · · Score: 2, Insightful

      Yeah, just use a database backend for TCP, good idea. Oh! I know! Let's use XML instead! Jesus Christ, if you are this stupid, just shut your hole. Don't propose retarded solutions to problems you don't understand just cause you are bored.

    5. Re:RSS needs better TCP stacks by Russ+Nelson · · Score: 2, Interesting

      Sigh. You don't get it, do you? You suggest different protocols, when TCP works just fine. The reason you want to stay with TCP is because of the infinite number of ways people have implemented TCP. Just as one HUGE example, consider all the people behind NAT. How are you going to "distribute asynchronous update notifications"?

      I'd like to hear one person, just one person say "Hmmm.... I wonder why Russ didn't suggest asynchronous update notifications?" And then maybe go on to answer themselves by saying "Oh! I get it! Russ is right! Hey, that's a great idea! It's backwards compatible and yet does exactly what is needed to turn RSS into a packet-efficient protocol."

      Instead, you get weenies who say something slightly more erudite than "duh" but which could be summarized thusly. You also get people (stand up and take a bow, Salamander) who say "Geez, that idea has OBVIOUS PROBLEMS" even though I obviously anticipate those OBVIOUS PROBLEMS and suggest a solution. Honestly, I see why people have such a low opinion of slashdot posters. Yer all a bunch of dummies!
      -russ
      p.s. pant, pant, pant, pant, okay, I feel better now.

      --
      Don't piss off The Angry Economist
  12. Idea by iamdrscience · · Score: 4, Interesting

    Well maybe somebody should set something up to syndicate RSS feeds via a peer to peer service. BitTorrent would work, but it could be improved upon (people would still be grabbing a torrent every hour, so it wouldn't completely solve the problem).

    1. Re:Idea by ganhawk · · Score: 5, Interesting

      You could have a system based on JXTA. Instead of the BitTorrent model, it would be something like P2P Radio. When the user asks for a feed, a neighbour who just received it can give it to the user (overlay network, JXTA based), or the server can point to one of the users who just received it. (This is similar to BitTorrent, but the user gets the whole file from a peer instead of parts, and does not come back to the server at all if the transfer is successful. The problem is that this user need not serve others and can just leech.)

      I feel an overlay network scheme would work better than a BitTorrent/tracker based system. In an overlay network scheme each group of the network has its own ultra peer (JXTA rendezvous), which acts as the tracker for all files in that network. I wanted to do this for the Slashdot effect (p2pbridge.sf.net), but somehow the project has been delayed for a long time.

      --
      Python script to convert photos into "artsy" portraits: http://p2pbridge.sf.net/pyPortrait/
    2. Re:Idea by kingman · · Score: 2, Informative

      Shrook for Mac OS X appears to do almost that, where a central server collects updates and has ONE randomly-chosen client check for updates as frequently as every five minutes, but all other clients just refer to the central server to see if feeds are updated.

  13. One hour interval? by anynameleft · · Score: 2, Insightful
    Why have developers made their RSS readers so that they query the master site at each hour sharp? Why haven't they done it like Opera or Konqueror, i.e. query the server every sixty minutes after the application has been started?

    Or did the RSS reader authors hope that their applications wouldn't be used by anybody except for a few geeks?

  14. "it's the connection overhead, stupid" by SuperBanana · · Score: 4, Informative

    ...is what one would say to the designers of RSS.

    Mainly: even IF your client is smart enough to communicate that it only needs part of the page, guess what? The pages are small, especially after gzip compression (which, with mod_gzip, can even be done ahead of time). The real overhead is all the nonsense, both at the protocol level and for the server in terms of CPU time, of opening and closing a TCP connection.

    It's also the fault of the designers for not including strict rules as part of the standard for how frequently the client is allowed to check back, and, duh, the client shouldn't be user-configured to check at common times, like on the hour.

    Bram figured this out with BitTorrent- the server can instruct the client on when it should next check back.

  15. it's the PULL,stupid by kisrael · · Score: 3, Interesting

    "Despite 'only' being XML, RSS is the driving force fulfilling the Web's original promise: making the Web useful in an exciting, real-time way."

    Err, did I miss the meeting where that was declared as the Web's original promise?

    Anyway, the trouble is pretty obvious: RSS is just a polling mechanism to do fakey Push. (Wired had an interesting retrospective on their infamous "PUSH IS THE FUTURE" hand cover about PointCast.) And that's expensive, the cyber equivalent of a horde of screaming children asking "Are we there yet? Are we there yet? How about now? Are we there yet now? Are we there yet?" It would be good if we had an equally widely used "true Push" standard, where remote clients would register as listeners, and then the server can actually publish new content to the remote sites. However, in today's heavily firewall'd internet, I dunno if that would work so well, especially for home users.

    I dunno. I kind of admit to not really grokking RSS, for me, the presentation is too much of the total package. (Or maybe I'm bitter because the weird intraday format that emerged for my own site doesn't really lend itself to RSS-ification...)

    --
    SO YOU'RE GOING TO DIE: The Comic for Dealing with Death
  16. Proposed Solution by Dominatus · · Score: 2, Interesting

    Here's a solution: Have the RSS readers grab data every hour or half hour starting from when they are started up, not on the hour. This would of course distribute the "attacks" on the server.

  17. random check intervals? by Hunterdvs · · Score: 2, Insightful

    Why not have RSS readers that check on startup, then check again at user-specified intervals, but only after a random amount of time has passed?
    user starts program at 3:15 and it checks rss feed.
    user sets check interval to 1 hour.
    rand()%60 minutes later (let's say 37) it checks feed
    every hour after that it checks the feed.

    simplistic sure, but isn't rss in general?

    on an aside, any of you (few) non-programmers interested in creating rss feeds, i put out some software that facilitates it.
    hunterdavis.com/ssrss.html

  18. Re:Revision of the Standard by cmdr_beeftaco · · Score: 2, Interesting

    And there is a one word solution, peer to peer. The whole torrent concept is what is needed.

  19. Push, not pull! by mcrbids · · Score: 4, Interesting

    The basic problem with RSS is that it's a "pull" method - RSS clients have to make periodic requests "just to see". Also, there's no effective way to mirror content.

    That's just plain retarded.

    What they *should* do...

    1) Content should be pushed from the source, so only *necessary* traffic is generated. It should be encrypted with a certificate so that clients can be sure they're getting content from the "right" server.

    2) Any RSS client should also be able to act as a server, NTP style. Because of the certificate used in #1, this could be done easily while still ensuring that the content came from the "real" source.

    3) Subscription to the RSS feed could be done on a "hand-off" basis. In other words, a client makes a request to be added to the update pool on the root RSS server. It either accepts the request, or redirects the client to one of its already set-up clients. Whereupon the process starts all over again. The client requests subscription to the service, and the request is either accepted or deferred. Wash, rinse, repeat until the subscription is accepted.

    The result of this would be a system that could scale to just about any size, easily.
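The hand-off subscription in step 3 amounts to building a fan-out tree. A Python sketch under stated assumptions: `RelayNode` is hypothetical, each node accepts a fixed number of direct subscribers (at least one) and otherwise defers the newcomer to its least-loaded child, and the certificate signing from step 1 is left out entirely.

```python
from typing import Dict, List

class RelayNode:
    """Hypothetical subscriber/relay in a hand-off subscription tree (root = feed origin)."""

    def __init__(self, name: str, capacity: int = 3) -> None:
        self.name = name
        self.capacity = capacity              # max direct subscribers; must be >= 1
        self.children: List["RelayNode"] = []

    def subscribe(self, newcomer: "RelayNode") -> "RelayNode":
        """Walk down the tree until some node accepts; return the accepting node."""
        node = self
        while len(node.children) >= node.capacity:
            # Deferred: try the child currently serving the fewest subscribers.
            node = min(node.children, key=lambda child: len(child.children))
        node.children.append(newcomer)
        return node

    def push(self, item: str, delivered: Dict[str, str]) -> None:
        """Forward an update; each node only ever sends to its own children."""
        for child in self.children:
            delivered[child.name] = item
            child.push(item, delivered)
```

The origin only ever talks to its `capacity` direct subscribers, however many clients join below them.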

    Anybody want to write it? (Unfortunately, my time is TAPPED!)

    --
    I have no problem with your religion until you decide it's reason to deprive others of the truth.
    1. Re:Push, not pull! by stratjakt · · Score: 3, Interesting

      Too many firewalls in today's world for "push" anything to work.

      Too many upstream bandwidth restrictions, especially on home connections. Last thing people want is getting AUPped because they're mirroring slashdot headlines.

      My solution? Multicast IPs. Multicast IPs solve every problem that's ever been encountered by mankind. Join Multicast, listen till you've heard all the headlines (which repeat ad nauseum), move on with life. Heck, keep listening if ya want. All we have to do is make it work.

      Frankly, who said you have to let everyone in the world on your RSS feed? If your server can't handle X concurrent RSS requests, it's hardly the protocol's "fault", IMO.

      --
      I don't need no instructions to know how to rock!!!!
    2. Re:Push, not pull! by stratjakt · · Score: 2, Insightful

      Doubly so if I want RSS content on multiple machines behind NAT. One person gets slashdot headlines, another CNN or whatever. Simple port forwarding won't solve that problem.

      "Push" is dead. "Push" was stillborn. The very climate w.r.t internet security is not disposed to "hey lets let remote servers push stuff into our network!"

      --
      I don't need no instructions to know how to rock!!!!
    3. Re:Push, not pull! by laird · · Score: 3, Informative

      The ICE syndication protocol has solved this. See http://www.icestandard.org.

      The short version is that ICE is far more bandwidth efficient than RSS because:
      - the syndicator and subscriber can negotiate whether to push or pull the content. So if the network allows for true push, the syndicator can push the updates, which is most efficient. This eliminates all of the "check every hour" that crushes RSS syndicators. And while many home users are behind NAT, web sites aren't, and web sites generate tons of syndication traffic that could be handled way more efficiently by ICE. Push means that there are many fewer updates transmitted, and that the updates that are sent are more timely.
      - ICE supports incremental updates, so the syndicator can send only the new or changed information. This means that the updates that are transmitted are far more efficient. For example, rather than responding to 99% of updates with "here are the same ten stories I sent you last time" you can reply with a tiny "no new stories" message.
      - ICE also has a scheduling mechanism, so you can tell a subscriber exactly how often you update (e.g. hourly, 9-5, Monday-Friday). This means that even for polling, you're not wasting resources being polled all night. This saves tons of bandwidth for people doing pull updates.
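The incremental-update idea, reduced to a Python sketch. This is not ICE's wire format; it only shows the principle of sending new or changed items and replying with an empty list ("no new stories") otherwise. Items are assumed to be dicts with a stable `guid` key, and the subscriber's state is a map from guid to a content fingerprint; both are illustrative choices.

```python
import hashlib
from typing import Dict, List

def fingerprint(item: Dict[str, str]) -> str:
    """Content hash of one item, so an edited story counts as changed."""
    blob = "\x00".join(f"{key}={item[key]}" for key in sorted(item))
    return hashlib.sha1(blob.encode("utf-8")).hexdigest()

def incremental_update(current: List[Dict[str, str]],
                       subscriber_has: Dict[str, str]) -> List[Dict[str, str]]:
    """Return only the items that are new or changed since the subscriber's last sync.

    subscriber_has maps each item's 'guid' to the fingerprint it last received;
    an empty result is the tiny "no new stories" reply.
    """
    return [item for item in current
            if subscriber_has.get(item["guid"]) != fingerprint(item)]
```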

  20. I seem to remember... by Misch · · Score: 4, Interesting

    I seem to remember Windows scheduler being able to randomize scheduled event times within a 1 hour period. I think our RSS feeders need similar functions.

    --

    --You will rephrase your request for me to go to hell. Goto statements are not acceptable programming constructs
  21. Re:Still haven't tried these newfangled RSS reader by maharg · · Score: 3, Interesting

    RSSOwl - http://rssowl.sourceforge.net/ is pretty good.

    --

    $ strings FTP.EXE | grep Copyright
    @(#) Copyright (c) 1983 The Regents of the University of California.
  22. Re:Still haven't tried these newfangled RSS reader by Dr.+Sp0ng · · Score: 4, Informative

    On Windows I use RSS Bandit. Haven't found a non-sucky one for *nix, although I haven't looked all that hard. On OS X I use NetNewsWire, which while not great, does the job.

  23. Oh, come on by aiken_d · · Score: 5, Interesting

    My guess is that InfoWorld is dynamically generating the RSS for each request. A simple host-side cache of the generated XML, so hits just talk to the HTTP server and not the app server, would probably make this a non-issue.
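The host-side cache suggested here could be as small as the following Python sketch. `CachedFeed` is hypothetical, `generate` stands in for whatever renders the XML on the app server, and the five-minute default TTL is arbitrary; the injectable clock exists only to make it easy to test. It is not thread-safe, so a real server would add a lock or let the HTTP layer do the caching.

```python
import time
from typing import Callable, Optional

class CachedFeed:
    """Serve a generated feed from memory, regenerating at most once per `ttl` seconds."""

    def __init__(self, generate: Callable[[], bytes], ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._generate = generate      # the (assumed expensive) XML renderer
        self._ttl = ttl
        self._clock = clock
        self._body: Optional[bytes] = None
        self._built_at = 0.0

    def get(self) -> bytes:
        """Return the cached XML, rebuilding it only when it is missing or stale."""
        now = self._clock()
        if self._body is None or now - self._built_at >= self._ttl:
            self._body = self._generate()
            self._built_at = now
        return self._body
```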

    Or are they *really* getting more RSS hits than image requests? If -- somehow -- that's the case, spend $500/mo on Akamai or Speedera, point the RSS stuff there, and give the CDN a reasonable timeout (30 minutes or something). That guarantees you no more than about 500 hits per timeout period, or roughly one every 3.6 seconds. Surely the app server can handle that.

    Then again, what do I know? I only worked there for five years, including two on infoworld.com. It's been a few years, but unless things have changed dramatically, that is one messed up IT organization.

    Cheers
    -b

    --
    If I wanted a sig I would have filled in that stupid box.
    1. Re:Oh, come on by Mitchell+Mebane · · Score: 2, Informative

      Or maybe something like this.

      --

      The roots of education are bitter, but the fruit is sweet.
      --Aristotle
  24. Re:Still haven't tried these newfangled RSS reader by Eslyjah · · Score: 2, Informative

    If you're using NetNewsWire on OS X, try the Atom Beta, which, I'm sure it will come as no shock to you, adds support for Atom feeds.

  25. It just ain't broadcast.. by wfberg · · Score: 4, Interesting

    Complaining about people connecting to your RSS feeds "impolitely" is missing the mark a bit, I think. Even RSS readers that *do* check when the file was last changed, still download the entire feed when so much as a single character has changed.

    There used to be a system where you could pull a list of recently posted articles off of a server that your ISP had installed locally, and only get the newest headers, and then decide which article bodies to retrieve.. The articles could even contain rich content, like HTML and binary files. And to top it off, articles posted by someone across the globe were transmitted from ISP to ISP, spreading over the world like an expanding mesh.

    They called this.. USENET..

    I realize that RSS is "teh hotness" and Usenet is "old and busted", and that "push is dead" etc. But for Pete's sake, don't send a unicast protocol to do a multicast (even if it is at the application layer) protocol's job!

    It would of course be great if there was a "cache" hierarchy on usenet. Newsgroups could be styled after content providers' URLs (e.g. cache.com.cnn, cache.com.livejournal.somegoth) and you could just subscribe to crap that way. There's nothing magical about what RSS readers do that the underlying stuff has to be all RSS-y and HTTP-y..

    For real push you could even send the RSS via SMTP, and you could use your ISP's outgoing mail server to multiply your bandwidth (i.e. BCC).

    --
    SCO employee? Check out the bounty
    1. Re:It just ain't broadcast.. by fiftyvolts · · Score: 4, Insightful

      You make some very good points. The old saying "When all you have is a hammer, everything looks like a nail" seems to ring true time and time again. These days it seems that everyone wants to use HTTP for everything and quite frankly it's not equipped to do that.

      RSS over SMTP sounds pretty cool. Heck, just sending a list of subscribers an email of RSS and let their mail clients sort it out would be pretty nice.

      Heh, my favorite posts are when someone suggests something that sounds totally novel and then someone else points out "Yeah! Like <insert old and underused technology>. It seems to do that damn well." The internet cannot forget its roots!

    2. Re:It just ain't broadcast.. by mbauser2 · · Score: 2, Insightful

      1) The RSS-developer community has a completely irrational fear of MIME. They never completed the registration of the application/rss+xml media type, and they've shown no interest in doing so. Winer and the gang want to use text/xml for everything, which makes it harder to separate RSS out of a newsgroup (or anything else; more on that below).

      2) The RSS developer community can't picture themselves using anything except HTTP. I've tried mentioning other protocols to them; they don't respond.

      3) NOBODY MAKES RSS READERS THAT WORK IN A PIPE! Seriously. Is it really so hard to envisage somebody piping an RSS file in from the command line? Apparently it is for the people who write RSS readers: they make you cut-and-paste URIs into a form before you can do anything with an RSS file.
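For illustration, a reader that works in a pipe can be very small. This is a hypothetical sketch (the function and script names are made up) that reads an RSS 2.0 document on standard input and prints the item titles; RSS 1.0/RDF feeds use XML namespaces and would need a different lookup.

```python
import sys
import xml.etree.ElementTree as ET


def item_titles(data: bytes) -> list[str]:
    """Return the <title> of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(data)
    # RSS 2.0 layout: <rss><channel><item><title>...</title></item>...</channel></rss>
    return [item.findtext("title", default="") for item in root.iter("item")]


if __name__ == "__main__" and sys.stdin is not None and not sys.stdin.isatty():
    data = sys.stdin.buffer.read()
    if data:  # nothing piped in: do nothing rather than fail
        for title in item_titles(data):
            print(title)
```

Invoked as, say, `curl -s http://example.com/index.xml | python rss_titles.py`.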

      Seriously, RSS over netnews wouldn't really require any new Big Ideas, just a smart re-application of the Old Ideas:

      1) Post RSS files to Usenet with proper "Content-Type" and "Supersedes" headers to an appropriate newsgroup. (Maybe some new RSS-friendly newsgroups; maybe the old ones. We can figure that out later. The important thing is: This wouldn't be any more difficult than posting a FAQ is.)

      2) Use newsgroup-capable RSS-readers to poll the newsgroups, and/or use regular newsreaders to pipe RSS files to dedicated RSS-readers.

      3) Profit! Or at least, Fewer Accidental DDoS attacks!
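Step 1 could look roughly like this, assuming Python's `email` package for building the article. The newsgroup, addresses and helper name are placeholders; `Supersedes` carries the previous post's Message-ID so news servers can drop the stale copy, as with periodic FAQ postings. Posting itself is not shown (Python's `nntplib`, which once handled that, was removed in Python 3.13).

```python
from email.message import EmailMessage
from email.utils import make_msgid
from typing import Optional


def build_feed_article(feed_xml: bytes, newsgroup: str,
                       previous_id: Optional[str]) -> EmailMessage:
    """Build a netnews article carrying an RSS file (hypothetical helper)."""
    art = EmailMessage()
    art["From"] = "feeds@example.com"      # placeholder poster address
    art["Newsgroups"] = newsgroup
    art["Subject"] = "RSS feed"
    art["Message-ID"] = make_msgid(domain="example.com")
    if previous_id:
        # Ask servers to replace the earlier copy instead of piling up new ones.
        art["Supersedes"] = previous_id
    # A proper media type lets a newsreader pipe the body to an RSS reader.
    art.set_content(feed_xml, maintype="application", subtype="rss+xml")
    return art
```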

      I could do Step 1 now, without significant effort. (It's no more difficult than posting a newsgroup FAQ.) Step 2 requires a real programmer, which I am not.

      (In fact, you know what would be great? A combined newsgroup/RSS reader. It makes more sense than all those RSS readers patterned after e-mail programs. But I digress.)

      Maybe I'm getting cynical in my old age, but I'm beginning to think this is the UNIX/Windows divide all over again. A lot of the RSS developer community comes from a Windows/Mac developer background, so they just don't see the potential of the toolbox approach, even while they're rambling about the extensibility of XML and its "user-centric" design.

      Take, for example, the refusal to get a real media type for RSS: a unique MIME type would help web browsers, too, because browsers can use media types to decide which plug-in gets which file. Instead of making a user cut-and-paste URIs from his browser to his reader (which is a dreadfully Windows-ish way of doing it), the user could just click on the RSS link and the web browser could launch the RSS reader by itself (which presumably would do something smart, like ask the user if they want to subscribe to a new feed). Just like all those other plug-ins and non-HTML formats on the Web!

      Makes sense, yes? But it doesn't register with anybody creating RSS readers. Some programmers still advocate the cutting-and-pasting of URIs. Some advocate auto-discovery by reading HTML "link" elements. Some advocate complicated cloud/stream schemes. But nobody wants to talk about re-using basic, functional tools that we've had in the toolbox for 10, 15, or 25 years.

      Some days, it's like the "RSS developers" are from another planet. And I want to send them all back.

      --
      Proud to be / Smiley-free / Since Nineteen / Ninety-Three
  26. You know what would be nice.. by ToadMan8 · · Score: 2, Interesting

    /. is especially pissy with this, but I want breaking news, not whatever is new each hour. So I hammer the shit out of it with my client (and get banned). I'd like to see a service where I download one client (that has front-ends in the Gnome panel, the Windows tray, etc.) that the site (/., cspan, etc.) _pushes_ new updates to when I sign up. Those w/ dynamic IPs could, when they sign on, have their client automagically connect to a server that maps their unique user ID to their current IP.

    --
    I haven't posted in so long, my sig is out of date.
  27. RSS is like a DDoS attack on my brain by PCM2 · · Score: 5, Interesting

    Am I the only one who finds it easier to get the information I want from the home pages of the sites I trust, rather than relying on an RSS feed? For one thing, in an RSS feed every story has the same priority ... stories keep coming in and I have no idea which ones are "bigger" than others. Sites like News.com, on the other hand, follow the newspaper's example of printing the headlines for the more important stories bigger. With RSS, it's just information overload, especially with the same stories duplicated at different sources, etc. Everyone seems really excited about RSS, but when I tried it I just couldn't figure out how to use it such that it would actually give me some real value vs. the resources I already have.

    --
    Breakfast served all day!
    1. Re:RSS is like a DDoS attack on my brain by damiangerous · · Score: 2, Interesting

      Nope. I just don't get RSS either. Every time there's a story about it I give another reader another shot, and every time I just end up thinking "how is this different than checking my bookmarks regularly?"

  28. Re:RHS by shadowcabbit · · Score: 2, Funny

    we need RHS... really HARD syndication

    That's nothing compared to RMS, which (according to RMS) stands for GNU/Recursive Meta-Syndication.

    --
    "Why Subscribe?" Good question...
  29. Re:Still haven't tried these newfangled RSS reader by bbh · · Score: 2, Informative

    I'm using Liferea version 0.5.1 under Linux right now. Compiles from source fine on Fedora Core 2 and has worked great for me so far.

    bbh

  30. Re:Over the years? How about over the weekend? by Marxist+Hacker+42 · · Score: 2, Interesting

    Overall traffic isn't what anybody is complaining about: as I noted, the 503 errors seem to come at the top of every hour (I just got through not being able to read Slashdot for a few minutes), which means, essentially, Slashdot is receiving a slashdotting. Do I know that RSS is doing it? Not from this location, which has limited investigation tools or capability to figure out what's really going on. But it might explain recent behavior of the site.

    --
    SJW: a person who perceives an injustice, and while correcting it, commits a greater injustice.
  31. Re:Won't help by AndroidCat · · Score: 2, Interesting
    Maybe not--it depends on how the programs work. If they check a feed an hour from the start of the last check rather than an hour from when the last check ended, they won't drift.
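The difference can be made concrete. A small sketch, assuming a fixed fetch duration (a simplification) and a hypothetical function name, comparing the two scheduling rules over a few rounds:

```python
def schedule(n: int, interval: float, fetch_seconds: float, from_start: bool) -> list[float]:
    """Start times (in seconds) of n successive feed checks.

    from_start=True: the next check is `interval` after the previous check
    began (fixed rate, stays in phase). from_start=False: `interval` after it
    ended, so each round slips later by `fetch_seconds` (fixed delay, drifts).
    """
    starts = [0.0]
    for _ in range(n - 1):
        prev = starts[-1]
        base = prev if from_start else prev + fetch_seconds
        starts.append(base + interval)
    return starts
```

With an hourly interval and a 5-second fetch, the first rule gives 0, 3600, 7200 and the second gives 0, 3605, 7210. Whether clients drift apart therefore depends on which rule a given reader implements.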

    However, the smart money is on Murphy. :)

    --
    One line blog. I hear that they're called Twitters now.
  32. Publish/Subscribe by dgp · · Score: 4, Informative

    That is mind-bogglingly inefficient. It's like POP clients checking for new email every X minutes. Polling is wrong wrong wrong! Check out the select() libc call. Does the Linux kernel go into a busy wait loop listening for every ethernet packet? No! It gets interrupted when a packet is ready!

    http://www.mod-pubsub.org/
    The apache module mod_pubsub might be a solution.

    From the mod_pubsub FAQ:
    What is mod_pubsub?

    mod_pubsub is a set of libraries, tools, and scripts that enable publish and subscribe messaging over HTTP. mod_pubsub extends Apache by running within its mod_perl Web Server module.

    What's the benefit of developing with mod_pubsub?

    Real-time data delivery to and from Web Browsers without refreshing; without installing client-side software; and without Applets, ActiveX, or Plug-ins. This is useful for live portals and dashboards, and Web Browser notifications.

    Jabber also saw a publish/subscribe mechanism as an important feature.
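The publish/subscribe idea itself fits in a few lines. This is a toy, in-process sketch, not mod_pubsub's API (the class and method names are invented): subscribers register a callback per topic, and the publisher pushes each new item to them instead of waiting to be polled.

```python
from collections import defaultdict
from typing import Callable


class Broker:
    """Toy in-process publish/subscribe broker (illustrative only)."""

    def __init__(self) -> None:
        self._subs: dict[str, list[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        self._subs[topic].append(callback)

    def publish(self, topic: str, message: str) -> int:
        """Push `message` to every subscriber of `topic`; return how many got it."""
        callbacks = self._subs.get(topic, [])
        for callback in callbacks:
            callback(message)  # delivered on publish, no polling involved
        return len(callbacks)
```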

  33. Re:Over the years? How about over the weekend? by anomalous+cohort · · Score: 2, Informative
    it's unfortunate that it (RSS) is ultimately just very stupid.

    The folks over at Netscape and/or UserLand should have studied the CDF standard first. Then they would have realized the value of specifying schedule information.

  34. Common Sense? by djeaux · · Score: 3, Informative
    I publish 15 security-related RSS feeds (scrapers) at my website. In general, they are really small files, so bandwidth is usually not an issue for me. I do publish the frequency with which the feeds are refreshed (usually once per hour).

    I won't argue with those who have posted here that some alternative to the "pull" technology of RSS would be very useful. But...

    The biggest problem I see isn't newsreaders but blogs. Somebody throws together a blog, inserts a little gizmo to display one of my feeds & then the page draws down the RSS every time the page is reloaded. Given the back-and-forth nature of a lot of folks' web browsing pattern, that means a single user might draw down one of my feeds 10-15 times in a 5 minute span. Now, why couldn't the blogger's software be set to load & cache a copy of the newsfeed according to a schedule?
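What the blogger's software could do, sketched under assumptions: a hypothetical `CachedFeed` wrapper that refetches the remote feed at most once per `ttl_seconds` and serves the stored copy in between. The `fetch` callable stands in for whatever download code the blog engine uses; the injectable clock exists only to make the behaviour testable.

```python
import time
from typing import Callable, Optional


class CachedFeed:
    """Serve a remote feed from a local copy, refetching at most once per TTL."""

    def __init__(self, fetch: Callable[[], str], ttl_seconds: float = 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # hypothetical downloader supplied by the caller
        self._ttl = ttl_seconds
        self._clock = clock
        self._body: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        now = self._clock()
        # Only go back to the origin when there is no copy or it has expired;
        # repeated page views in between reuse the cached text.
        if self._body is None or now - self._fetched_at >= self._ttl:
            self._body = self._fetch()
            self._fetched_at = now
        return self._body
```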

    The honorable mention for RSS abuse goes to the system administrator who set up a newsreader screen saver that pulled one of my feeds. He then installed the screen saver on every PC in every office of his company. Every time the screen saver activated, POW! One feed drawn down...

    --
    "Obviously, I'm not an IBM computer any more than I'm an ashtray" (Bob Dylan)
  35. Trivial solution! by Maljin+Jolt · · Score: 2, Interesting

    Random intervals. I already patched my desktop RSS reader to request new feed every 73+-13 minutes.
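The randomized interval is easy to reproduce. A sketch with a hypothetical function name, using Python's `random.uniform` to draw each delay between 60 and 86 minutes (73 ± 13):

```python
import random
from typing import Optional


def next_poll_delay_seconds(mean_min: float = 73.0, spread_min: float = 13.0,
                            rng: Optional[random.Random] = None) -> float:
    """Seconds until the next poll, uniform in [mean - spread, mean + spread] minutes."""
    if rng is None:
        rng = random.Random()
    return 60.0 * rng.uniform(mean_min - spread_min, mean_min + spread_min)
```

Random spreading lowers the odds that every reader polls at :00, although, as another commenter in this thread points out, independent random choices still cluster.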

    --
    There you are, staring at me again.
  36. Not to flame... by T3kno · · Score: 3, Interesting

    But isn't this what TCP/IP multicast was invented for? I've never really understood why multicast has never really taken off. Too complicated? Instead of entering an rss server to pull from just join a multicast group and have the RSS blasted once every X minutes. Servers could even send out updates more often because there are only a few connections to send to. Of course I could be completely wrong and multicast may be the absolute wrong choice for this sort of application, it's been a while since I've read any documentation about it.

    --
    (B) + (D) + (B) + (D) = (K) + (&)
    1. Re:Not to flame... by Ernesto+Alvarez · · Score: 2, Informative

      TCP cannot multicast. It's impossible due to its connection oriented, two way properties.

      IP can multicast, but it needs support from the network to do that. The problem is that the internet is not under one authority that can say "from today onwards, we do multicast in such and such way". There have been experiments with multicasting (the MBone), but some things cannot be solved easily (e.g. how do you register as a multicast client, and, the important part, how do you make every router from source to destination know about it and act accordingly, remembering that those routers are NOT under the same authority?). So even though you could multicast with UDP/IP, some logistical problems make it very difficult to do.

      However, within an autonomous system (which IS under a single authority) you could multicast, provided the network supports it. In fact, the standard routing protocols (OSPF and RIP), as well as NTP, can multicast and have multicast groups assigned to them.

      It's too bad, but that's how the real world is....

    2. Re:Not to flame... by stienman · · Score: 2, Informative

      The practical problem with multicast was that it requires an intelligent network and dumb clients. In other words: routers have to be able to keep a table of information on which links to relay multicast information, and that has to be dynamically updated periodically.

      There is a multicast overlay on top of the internet which consists of routers that can handle this load.

      But the combination of no hardware/software support in the network, and no real huge push for this technology left multicast high and dry.

      Brief idea of how multicast works:
      1) A source sends out an "I have a multicast feed" message to its immediate routers. Those routers 'publish' this feed to their connected routers until every segment on the internet has seen this feed broadcast.
      2) At the end points, individual computers see this message on their segment. They can subscribe to the feed by sending a message to their upstream router. This router places an entry in its table saying, "Someone on segment X wants feed Y, which I get from segment Z" It then sends a subscribe message to the router it got the original broadcast from, which does the same thing on upward until it hits the originating server.
      3) Each router, when it sees a multicast packet, consults its table to see which (if any) segments it should forward the packet to. Eventually the packet makes its way to all the endpoints of the network.
      4) The publish broadcast is initiated periodically. Each router also periodically checks its table for entries that haven't received a re-subscribe message since the last publish broadcast. If no one resubscribes, the table entry is not refreshed; there is no unsubscribe, so if you no longer want the feed, just ignore it and it'll go away if no one else on your segment wants it. Only one subscriber on each segment needs to subscribe, so if my coworker and I both want it and I see his subscribe packet before I send mine, I won't send mine out, since the feed will be put on my segment anyway.
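A toy model of the per-router table described in steps 2 to 4, for illustration only: it is not any real multicast routing protocol, and the feed and segment names and the hold time are assumptions. It shows the soft-state property: entries live only as long as subscriptions keep being refreshed, with no explicit unsubscribe. Upstream propagation and duplicate-subscribe suppression are left out.

```python
class MulticastTable:
    """Soft-state subscription table for one router (illustrative only).

    Maps feed -> {segment: time of last (re)subscribe}. Entries that are not
    refreshed within `hold_time` simply expire.
    """

    def __init__(self, hold_time: float) -> None:
        self.hold_time = hold_time
        self._entries: dict[str, dict[str, float]] = {}

    def subscribe(self, feed: str, segment: str, now: float) -> None:
        # A subscribe and a re-subscribe are the same operation: refresh the timestamp.
        self._entries.setdefault(feed, {})[segment] = now

    def expire(self, now: float) -> None:
        for feed in list(self._entries):
            alive = {seg: t for seg, t in self._entries[feed].items()
                     if now - t < self.hold_time}
            if alive:
                self._entries[feed] = alive
            else:
                del self._entries[feed]  # nobody downstream wants this feed any more

    def forward_to(self, feed: str) -> set[str]:
        """Segments that a packet for `feed` should be copied onto."""
        return set(self._entries.get(feed, {}))
```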

      It's quite elegant, but when a router is dealing with 40+ Gbps of packets it barely has time to figure out where each packet goes, never mind statefully inspecting multicast packets and forwarding them appropriately. Not impossible, but it hasn't been rolled out, and few providers see any money in supporting it.

      -Adam

  37. PulpFiction by Cadre · · Score: 2, Informative

    I recommend PulpFiction for an RSS/Atom reader on OS X. I much prefer the interface and how it treats the news compared to NNW.

    --
    All editorial writers ever do is come down from the hill after the battle is over and shoot the wounded.
  38. A few dont get it by Bluelive · · Score: 2, Insightful

    It seems a few people here don't get it. RSS is the file format, not the transfer via HTTP. The whole pull problem is a problem with HTTP; in theory you could make an IRC-like protocol and transmit via that, solving some of the subscription, distribution and pull problems.

  39. Re:Still haven't tried these newfangled RSS reader by timothyf · · Score: 2, Informative

    If you don't use one computer all the time and you want to check your feeds from other places, I'd recommend going with a web-based news-aggregation service. I personally use BlogLines, but there are other services out there as well.

  40. Solution: HTTP 503 Response for Flow Control by Orasis · · Score: 3, Insightful

    The main problem here is that RSS lacks any sort of distributed flow control, much as the Internet did back in the early days with tons of UDP packets flying around everywhere and periodically bringing networks to their knees.

    One completely backwards-compatible fashion to add flow-control to RSS would be to use the HTTP 503 response when server load is getting too high for your RSS files. The server simply sends an HTTP 503 response with a Retry-After header indicating how long the requesting client should wait before retrying.

    Clients that ignore the retry interval or are overly aggressive could be punished by further 503 responses thus basically denying those aggressive clients access to the RSS feeds. Users of overly aggressive clients would soon find that they actually provide less fresh results and would place pressure on implementors to fix their implementations.
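A sketch of both halves of this proposal, assuming Python and hypothetical tuning knobs (`capacity`, `retry_after_s`): the server answers 503 with `Retry-After` once too many feed requests are in flight, and the client converts the header into a wait time. `Retry-After` may be either a number of seconds or an HTTP-date, so both forms are handled.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def feed_response(active_requests: int, capacity: int,
                  retry_after_s: int = 600) -> tuple[int, dict[str, str]]:
    """Server side: shed load with 503 + Retry-After when the feed is too busy."""
    if active_requests >= capacity:
        return 503, {"Retry-After": str(retry_after_s)}
    return 200, {"Content-Type": "application/rss+xml"}


def retry_delay_seconds(headers: dict[str, str],
                        now: Optional[datetime] = None) -> Optional[float]:
    """Client side: seconds to wait before retrying, or None if no Retry-After."""
    value = headers.get("Retry-After")
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():  # delta-seconds form, e.g. "600"
        return float(value)
    when = parsedate_to_datetime(value)  # HTTP-date form
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```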

  41. Told Ya So by cmacb · · Score: 3, Interesting

    I think this was more or less the first thought I had about RSS when I first looked into it and found out that it was a "pull" technology rather than a "push" as the early descriptions of it implied.

    Yes, it's "cool" that I can set up a page (or now use a browser plug-in) to automatically get a lot of content from hundreds of web pages at a time when I really opened up the browser to check my e-mail.

    What would have REALLY been cool would be some sort of technology that would notify me when something CHANGED. No effort on my part, no *needless* effort on the server's part.

    Oh wait... We HAD that, didn't we? I think they were called Listservers, and they worked just fine. (They still do, actually, as I get a number of updates, including Slashdot, that way.) RSS advocates (and I won't mention any names) keep making pronouncements like "e-mail is dead!" simply because they have gotten themselves and their hosting companies on some black hole lists. Cry me a river now that your bandwidth costs are going through the roof and yet nobody is clicking through on your web page ads, because, guess what? Nobody is visiting your page. They have all they need to know about your updates via your RSS feeds.

  42. Feed on Feeds (Web based) by Poulpy · · Score: 2, Informative

    Neither Windows nor Unix, but I've set up Feed on Feeds on my webserver and I like it!

    It's a "PHP/MySQL server side RSS/Atom aggregator", so you can read your feeds wherever you are, you only need a web browser on the client side.

    Pros:
    1) you don't need to synchronize the state between the multiple workstations you might use.
    2) no platform/os problem on the client side.

    Cons:
    1) you need some web hosting with PHP and MySQL available (I pay 45 a year for my domain name + 30MB Webspace + 30MB FTP + 30MB MySQL base + 100*25MB pop/imap accounts + SSL everywhere).
    2) no installer, so you'll need some computing skills to set it up (not that hard).
    3) no automated update: you have to click "Update", so you may miss some news if you are offline (i.e. away from any internet access) for a long period...

    It changed my online life, as I no longer have to install anything on the client side (useful when away from your home/office) or synchronize my feeds, either with some removable storage (my USB key failed after 250+ daily syncs) or through the net (BottomFeeder, a Smalltalk implementation which works on every platform I ever came across, allows syncing with an FTP location).

    Regards,
    Poulpy.

  43. Random != Distributed by grcumb · · Score: 2, Interesting

    True story:

    We ran a network operations center to provide support for several hundred servers spread over two continents. Each hour, every server would 'phone home' to see if it needed updates or configuration changes. This was a fairly data-heavy operation, requiring many database lookups. We knew that we didn't want every server calling at the same time, so we had each server derive its own random integer between 1 and 59, and to use that as the minute of the hour to contact the NOC.

    Before long we found that the NOC was dragging itself into a death spiral of overwork. The problem? By chance, an unusually large number of servers chose a very small range of numbers. Worse, they just happened to choose numbers close to 05, which just happened to be when some very large cron tasks were running as well.

    Try rolling a die 100 times. Even though the odds are the same before every roll, the actual frequency of occurrence of the individual numbers is not even. Leaving the choice of retrieval time to the client does not reliably reduce the chance of a server being overwhelmed. In fact, it more or less guarantees traffic spikes.
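The effect described above is the familiar balls-into-bins imbalance. A quick simulation, with the server count and seeds chosen arbitrarily and a hypothetical function name: even when every server picks its minute uniformly at random, the busiest minute usually ends up well above the average load of n/59.

```python
import random
from collections import Counter


def busiest_minute_load(n_servers: int, seed: int) -> int:
    """Most servers sharing a single minute when each picks randint(1, 59) independently."""
    rng = random.Random(seed)
    counts = Counter(rng.randint(1, 59) for _ in range(n_servers))
    return max(counts.values())
```

For 300 servers a perfectly even spread would put at most 6 in any minute; comparing that with `busiest_minute_load(300, seed)` for a few seeds shows how much higher the random peak tends to be.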

    I'm not intimately familiar with RSS client or server implementations, but I suspect that it would be fairly easy to format a suggested refresh interval and refresh time on the server and send that to the client.
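RSS 2.0 does in fact define a channel-level `<ttl>` element (the number of minutes the channel may be cached before refreshing), plus `<skipHours>` and `<skipDays>`. A minimal client-side sketch, with a hypothetical function name, that honours `<ttl>` when present and otherwise falls back to a default; assigning each client its own time slot within the interval is not part of that spec and is not shown.

```python
import xml.etree.ElementTree as ET


def refresh_interval_minutes(feed_xml: bytes, default: int = 60) -> int:
    """Minutes to wait before refetching, taken from the RSS 2.0 <ttl> if valid."""
    channel = ET.fromstring(feed_xml).find("channel")
    ttl = channel.findtext("ttl") if channel is not None else None
    if ttl is not None and ttl.strip().isdigit() and int(ttl) > 0:
        return int(ttl)
    return default  # no usable hint from the publisher
```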

    --
    Crumb's Corollary: Never bring a knife to a bun fight.
  44. it's the RSS _client_ being stupid by cs · · Score: 2, Insightful
    Clients polling on the hour? How stupid is that?

    Even for a poll at hourly intervals, this should be staggered across a given hour according to when the client starts. Also, a client should probably not be polling every 3600 seconds (or whatever interval) but polling with a 3600-second gap between the end of one poll and the start of the next. In this way a loaded server will smear the clients out simply by having slower response, and the load will even out on its own.

    It's always bad to have lots of agents doing things in synchrony when that involves an outside resource. Contact the client authors, give them a clue, let the upgrades push the bugfix out.

    Finally, isn't RSS done over HTTP anyway? So why aren't these clients going through their ISP's proxy and doing conditional GETs (If-Modified-Since)? The target server should see only a fraction of the spike even with bad clients. Unless they're very very bad...
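On the server side, conditional GET support might look like this sketch. The function name and the SHA-1-based ETag are assumptions, not any particular server's behaviour; the logic follows the usual HTTP rule that `If-None-Match` takes precedence over `If-Modified-Since`, and an unchanged feed is answered with `304 Not Modified` and no body.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_feed_response(body: bytes, last_modified: datetime,
                              request_headers: dict[str, str]) -> tuple[int, dict[str, str], bytes]:
    """Return (status, headers, body), answering 304 when the client's copy is current."""
    if last_modified.tzinfo is None:  # treat naive timestamps as UTC
        last_modified = last_modified.replace(tzinfo=timezone.utc)
    last_modified = last_modified.astimezone(timezone.utc)
    etag = '"%s"' % hashlib.sha1(body).hexdigest()
    headers = {"ETag": etag,
               "Last-Modified": format_datetime(last_modified, usegmt=True)}

    inm = request_headers.get("If-None-Match")
    if inm is not None:
        # If-None-Match wins over If-Modified-Since when both are sent.
        tags = [t.strip() for t in inm.split(",")]
        if "*" in tags or etag in tags:
            return 304, headers, b""
        return 200, headers, body

    ims = request_headers.get("If-Modified-Since")
    if ims is not None:
        try:
            since = parsedate_to_datetime(ims)
        except (TypeError, ValueError):
            since = None  # unparseable date: ignore the header
        # HTTP dates have one-second resolution, so drop microseconds first.
        if since is not None and since.tzinfo is not None \
                and last_modified.replace(microsecond=0) <= since:
            return 304, headers, b""
    return 200, headers, body
```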

    None of these things is a direct flaw in RSS, just crap quality of implementation in RSS clients.

    --
    Cameron Simpson, DoD#743 cs@cskk.id.au http://www.cskk.ezoshosting.com/cs/
  45. Re:Still haven't tried these newfangled RSS reader by The+Grassy+Knoll · · Score: 2, Funny

    >On Windows I use RSS Bandit

    Pronounced "ArseBandit"?
    That's priceless, to a Brit at least.


    --
    They will never know the simple pleasure of a monkey knife fight