Slashdot Mirror


When RSS Traffic Looks Like a DDoS

An anonymous reader writes "Infoworld's CTO Chad Dickerson says he has a love/hate relationship with RSS. He loves the changes to his information production and consumption, but he hates the behavior of some RSS feed readers. Every hour, Infoworld "sees a massive surge of RSS newsreader activity" that "has all the characteristics of a distributed DoS attack." So many requests in such a short period of time are creating scaling issues." We've seen similar problems over the years. RSS (or as it should be called, "Speedfeed") is such a useful thing, it's unfortunate that it's ultimately just very stupid.

11 of 443 comments

  1. netcraft article by croddy · · Score: 4, Informative
  2. Simple HTTP Solution by inertia187 · · Score: 3, Informative

    Readers should send a HEAD request to see whether the Last-Modified value has changed... And the feed rendering engines should make sure the Last-Modified they report is accurate.

    --
    A programmer is a machine for converting coffee into code.
    1. Re:Simple HTTP Solution by ry4an · · Score: 3, Informative
      Better than that, they should use the HTTP/1.1 (RFC 2616) If-Modified-Since: header in their GETs, as specified in section 14.25. That way, if the feed has changed, they don't have to do a subsequent GET.

      Someone did a nice write-up about doing so back in 2002.
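
A minimal sketch of the conditional GET described above, using only Python's standard library. The function names, the `opener` parameter, and the choice to also send If-None-Match when an ETag is known are assumptions made for illustration; they are not taken from the write-up mentioned in the comment.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, last_modified=None, etag=None):
    """Build a GET that lets the server answer 304 if nothing changed."""
    req = urllib.request.Request(url, method="GET")
    if last_modified:
        # Echo back the Last-Modified value from the previous response.
        req.add_header("If-Modified-Since", last_modified)
    if etag:
        # ETag validators are sent back in If-None-Match.
        req.add_header("If-None-Match", etag)
    return req


def fetch_if_changed(url, last_modified=None, etag=None,
                     opener=urllib.request.urlopen):
    """Return (body, last_modified, etag); body is None on 304 Not Modified."""
    req = build_conditional_request(url, last_modified, etag)
    try:
        with opener(req) as resp:
            return (resp.read(),
                    resp.headers.get("Last-Modified"),
                    resp.headers.get("ETag"))
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # urllib reports 304 as an HTTPError; keep the cached copy.
            return (None, last_modified, etag)
        raise
```

A reader would store the returned validators next to its cached copy and pass them back on the next poll, so an unchanged feed costs one small 304 response instead of the full body.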

    2. Re:Simple HTTP Solution by johnbeat · · Score: 3, Informative

      So, he's writing from infoworld and complaining that RSS feed readers grab feeds whether the data has changed or not. So, I went to look for infoworld's RSS feeds. Found them at:

      http://www.infoworld.com/rss/rss_info.html

      Trying the top news feed, got back:

      date -u ; curl --head http://www.infoworld.com/rss/news.xml
      Tue Jul 20 19:51:44 GMT 2004
      HTTP/1.1 200 OK
      Date: Tue, 20 Jul 2004 19:48:30 GMT
      Server: Apache
      Accept-Ranges: bytes
      Content-Length: 7520
      Content-Type: text/html; charset=UTF-8

      How do I write an RSS reader that only downloads this feed if the data has changed?

      Jerry
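
To make the problem in that header dump concrete, here is a small sketch that checks a response for cache validators. The helper names are hypothetical; the header text is copied from the curl output above. Without a Last-Modified or ETag header, a client has nothing to send back in a conditional request.

```python
def parse_headers(raw):
    """Parse a raw HTTP header dump into a dict keyed by lowercase name."""
    headers = {}
    for line in raw.strip().splitlines():
        if ":" not in line:
            continue  # skip the status line, e.g. "HTTP/1.1 200 OK"
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def cache_validators(headers):
    """Return whichever validators (Last-Modified, ETag) the server sent."""
    return {k: headers[k] for k in ("last-modified", "etag") if k in headers}


# Header dump copied from the curl --head output quoted above.
INFOWORLD_HEAD = """\
HTTP/1.1 200 OK
Date: Tue, 20 Jul 2004 19:48:30 GMT
Server: Apache
Accept-Ranges: bytes
Content-Length: 7520
Content-Type: text/html; charset=UTF-8
"""

if __name__ == "__main__":
    found = cache_validators(parse_headers(INFOWORLD_HEAD))
    print(found or "no validators: every poll must download the full feed")
```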

  3. Call me stupid by nebaz · · Score: 4, Informative

    This is helpful.

    --
    Rhymes that keep their secrets will unfold behind the clouds. There upon the rainbow is the answer to a neverending story
  4. Over the years? How about over the weekend? by Marxist+Hacker+42 · · Score: 5, Informative

    We've seen similar problems over the years. RSS (or as it should be called, "Speedfeed") is such a useful thing, it's unfortunate that it's ultimately just very stupid.

    And it seems to have gotten worse since the new code was installed: I now get 503 errors at the top of every hour on Slashdot.

    --
    SJW: a person who perceives an injustice, and while correcting it, commits a greater injustice.
  5. "it's the connection overhead, stupid" by SuperBanana · · Score: 4, Informative

    ...is what one would say to the designers of RSS.

    Mainly, even IF your client is smart enough to communicate that it only needs part of the page, guess what? The pages are small, especially after gzip compression (which, including with mod_gzip, can be done ahead of time). The real overhead is all the nonsense, both on a protocol level and for the server in terms of CPU time, of opening and closing a TCP connection.

    It's also the fault of the designers for not including strict rules as part of the standard for how frequently the client is allowed to check back, and, duh, the client shouldn't be user-configured to check at common times, like on the hour.

    Bram figured this out with BitTorrent: the server can instruct the client on when it should next check back.
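
One way a client could avoid the top-of-the-hour stampede is to add random jitter to its polling interval and to honor a server-supplied "check back in N seconds" hint when one exists. The sketch below is only an illustration: the function name, the default values, and the `server_hint` parameter are assumptions, not part of RSS or of any particular reader.

```python
import random


def next_poll_delay(base_seconds=3600, jitter_fraction=0.25,
                    server_hint=None, rng=random):
    """Return seconds to wait before the next poll.

    The delay is never shorter than the server's hint (if any), and the
    random offset spreads clients out so they do not all arrive together.
    """
    interval = max(base_seconds, server_hint or 0)
    # Jitter is only added, never subtracted, so the hint is respected.
    return interval + rng.uniform(0, interval * jitter_fraction)
```

Scheduling the next check relative to the last fetch, rather than at a wall-clock time such as "on the hour", is what keeps independent clients from synchronizing.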

  6. Re:Still haven't tried these newfangled RSS reader by Dr.+Sp0ng · · Score: 4, Informative

    On Windows I use RSS Bandit. Haven't found a non-sucky one for *nix, although I haven't looked all that hard. On OS X I use NetNewsWire, which while not great, does the job.

  7. Publish/Subscribe by dgp · · Score: 4, Informative

    That is mind-bogglingly inefficient. It's like POP clients checking for new email every X minutes. Polling is wrong wrong wrong! Check out the select() libc call. Does the Linux kernel go into a busy-wait loop listening for every Ethernet packet? No! It gets interrupted when a packet is ready!

    http://www.mod-pubsub.org/
    The apache module mod_pubsub might be a solution.

    From the mod_pubsub FAQ:
    What is mod_pubsub?

    mod_pubsub is a set of libraries, tools, and scripts that enable publish and subscribe messaging over HTTP. mod_pubsub extends Apache by running within its mod_perl Web Server module.

    What's the benefit of developing with mod_pubsub?

    Real-time data delivery to and from Web Browsers without refreshing; without installing client-side software; and without Applets, ActiveX, or Plug-ins. This is useful for live portals and dashboards, and Web Browser notifications.

    Jabber also saw a publish/subscribe mechanism as an important feature.
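
The select() idea from this comment can be sketched with Python's standard library: the subscriber blocks until data is ready instead of asking repeatedly. This is only an illustration of waiting on a socket; it is not mod_pubsub's API, and the function name is hypothetical.

```python
import select
import socket


def wait_for_message(sock, timeout=None):
    """Block until sock is readable, then return its bytes (None on timeout)."""
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None  # nothing was published within the timeout
    return sock.recv(4096)


if __name__ == "__main__":
    # A connected socket pair stands in for a publisher and a subscriber.
    publisher, subscriber = socket.socketpair()
    publisher.sendall(b"new story posted")
    print(wait_for_message(subscriber, timeout=1.0))
    publisher.close()
    subscriber.close()
```

The subscriber uses no CPU and sends no requests while it waits; the cost is that the connection has to stay open, which is the trade-off a publish/subscribe design accepts.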

  8. Common Sense? by djeaux · · Score: 3, Informative
    I publish 15 security-related RSS feeds (scrapers) at my website. In general, they are really small files, so bandwidth is usually not an issue for me. I do publish the frequency with which the feeds are refreshed (usually once per hour).

    I won't argue with those who have posted here that some alternative to the "pull" technology of RSS would be very useful. But...

    The biggest problem I see isn't newsreaders but blogs. Somebody throws together a blog, inserts a little gizmo to display one of my feeds & then the page draws down the RSS every time the page is reloaded. Given the back-and-forth nature of a lot of folks' web browsing pattern, that means a single user might draw down one of my feeds 10-15 times in a 5 minute span. Now, why couldn't the blogger's software be set to load & cache a copy of the newsfeed according to a schedule?
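
The "load & cache on a schedule" idea could look roughly like the sketch below: the blog software keeps one copy of the feed and only refetches it after a maximum age has passed. The class name, the `fetch` callable, and the one-hour default are assumptions chosen for illustration.

```python
import time


class CachedFeed:
    """Serve a cached feed body, refetching at most once per max_age_seconds."""

    def __init__(self, fetch, max_age_seconds=3600, clock=time.monotonic):
        self._fetch = fetch            # callable that downloads the feed
        self._max_age = max_age_seconds
        self._clock = clock            # injectable so it can be tested
        self._body = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        expired = (self._fetched_at is None
                   or now - self._fetched_at >= self._max_age)
        if expired:
            self._body = self._fetch()
            self._fetched_at = now
        return self._body
```

With this in place, 10-15 page reloads in five minutes would trigger a single download of the feed rather than one per reload.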

    The honorable mention for RSS abuse goes to the system administrator who set up a newsreader screen saver that pulled one of my feeds. He then installed the screen saver on every PC in every office of his company. Every time the screen saver activated, POW! one feed drawn down...

    --
    "Obviously, I'm not an IBM computer any more than I'm an ashtray" (Bob Dylan)
  9. Re:Push, not pull! by laird · · Score: 3, Informative

    The ICE syndication protocol has solved this. See http://www.icestandard.org.

    The short version is that ICE is far more bandwidth efficient than RSS because:
    - the syndicator and subscriber can negotiate whether to push or pull the content. So if the network allows for true push, the syndicator can push the updates, which is most efficient. This eliminates all of the "check every hour" that crushes RSS syndicators. And while many home users are behind NAT, web sites aren't, and web sites generate tons of syndication traffic that could be handled way more efficiently by ICE. Push means that there are many fewer updates transmitted, and that the updates that are sent are more timely.
    - ICE supports incremental updates, so the syndicator can send only the new or changed information. This means that the updates that are transmitted are far more efficient. For example, rather than responding to 99% of updates with "here are the same ten stories I sent you last time" you can reply with a tiny "no new stories" message.
    - ICE also has a scheduling mechanism, so you can tell a subscriber exactly how often you update (e.g. hourly, 9-5, Monday-Friday). This means that even for polling, you're not wasting resources being polled all night. This saves tons of bandwidth for people doing pull updates.
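
The incremental-update and scheduling ideas in the list above can be illustrated in a few lines. This is a sketch of the concepts only, not ICE's message format or API: the function names, the (id, title) item shape, and the assumption that ids increase over time are all hypothetical.

```python
from datetime import datetime


def items_since(items, last_seen_id):
    """Return only the items newer than last_seen_id.

    Assumes items are (id, title) tuples whose ids increase over time.
    An empty list is the tiny "no new stories" reply.
    """
    return [item for item in items
            if last_seen_id is None or item[0] > last_seen_id]


def in_update_window(when, start_hour=9, end_hour=17, weekdays=range(0, 5)):
    """True if `when` falls in the publisher's stated update window.

    Defaults model "9-5, Monday-Friday" (Monday is weekday 0).
    """
    return when.weekday() in weekdays and start_hour <= when.hour < end_hour


if __name__ == "__main__":
    stories = [(1, "first"), (2, "second"), (3, "third")]
    print(items_since(stories, last_seen_id=2))
    print(in_update_window(datetime(2004, 7, 20, 19, 51)))
```

A polling client that knows the window can simply skip its checks outside it, and one that sends its last seen id receives only what changed.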