When RSS Traffic Looks Like a DDoS

Posted by ryuzaki0 on Tuesday July 20, 2004 @07:32AM from the which-is-most-of-the-time dept.

An anonymous reader writes "Infoworld's CTO Chad Dickerson says he has a love/hate relationship with RSS. He loves the changes to his information production and consumption, but he hates the behavior of some RSS feed readers. Every hour, Infoworld "sees a massive surge of RSS newsreader activity" that "has all the characteristics of a distributed DoS attack." So many requests in such a short period of time are creating scaling issues." We've seen similar problems over the years. RSS (or as it should be called, "Speedfeed") is such a useful thing, it's unfortunate that it's ultimately just very stupid.

22 of 443 comments

netcraft article by croddy · 2004-07-20 07:34 · Score: 4, Informative

another article
Call me stupid by nebaz · 2004-07-20 07:35 · Score: 4, Informative

This is helpful.

--
Rhymes that keep their secrets will unfold behind the clouds. There upon the rainbow is the answer to a neverending story
Over the years? How about over the weekend? by Marxist+Hacker+42 · 2004-07-20 07:37 · Score: 5, Informative

We've seen similar problems over the years. RSS (or as it should be called, "Speedfeed") is such a useful thing, it's unfortunate that it's ultimately just very stupid.

And it seems to have gotten worse since the new code was installed: I get 503 errors at the top of every hour now on Slashdot.

--
SJW: a person who perceives an injustice, and while correcting it, commits a greater injustice.
What about a scheduler? by el-spectre · 2004-07-20 07:37 · Score: 4, Interesting

Since many clients request the new data every 30 minutes or so, how about a simple system that spreads out the load? A page that, based on some criteria (domain name, IP, random seed, round robin), gives each client a time it should check for updates (e.g., 17 minutes past the hour).

Of course, this depends on the client to respect the request, but we already have systems that do (robots.txt), and they seem to work fairly well, most of the time.
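
A minimal sketch of the slot-assignment page, assuming the server identifies clients by IP address and hashes it with SHA-256 (both are illustrative choices; `assigned_minute` is a hypothetical name). Hashing gives a client the same slot on every visit, so the server needs no per-client state:

```python
import hashlib


def assigned_minute(client_id: str, period_minutes: int = 60) -> int:
    """Map a client identifier (assumed: its IP address) to a stable minute offset.

    A cryptographic hash spreads clients roughly uniformly over the period,
    and the same client always lands in the same slot.
    """
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    # Use the first 4 bytes of the digest as an unsigned integer, then reduce.
    return int.from_bytes(digest[:4], "big") % period_minutes
```

For example, `assigned_minute("192.0.2.17")` always returns the same minute between 0 and 59; the server could put that number in the page it hands out, and a cooperating client would poll at that minute past each hour.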

--
"Faith: Belief without evidence in what is told by one who speaks without knowledge, of things without parallel." - A.B.
1. Re:What about a scheduler? by cmdr_beeftaco · 2004-07-20 07:50 · Score: 5, Funny
  
  Bad idea. Everyone knows that most headlines are made at the top of the hour. Thus, A.M. radio always gives news headlines "at the top of the hour." RSS readers should be given the same timely updates.
  Related to this is the fact that most traffic accidents happen "on the twenties." Human nature is a curious and seemingly very predictable thing.
Idea by iamdrscience · 2004-07-20 07:38 · Score: 4, Interesting

Well maybe somebody should set something up to syndicate RSS feeds via a peer to peer service. BitTorrent would work, but it could be improved upon (people would still be grabbing a torrent every hour, so it wouldn't completely solve the problem).
1. Re:Idea by ganhawk · 2004-07-20 07:53 · Score: 5, Interesting
  
  You could have a system based on JXTA. Instead of the BitTorrent model, it would be something like P2P Radio. When the user asks for the feed, a neighbour who just received it can give it to the user (overlay network, JXTA-based), or the server can point to one of the users who just received it (similar to BitTorrent, but the user gets the whole file from a peer instead of parts). If the transfer is successful, the user does not come back to the server at all. The problem is that this user need not serve others and can just leech.
  
  I feel an overlay network scheme would work better than a BitTorrent/tracker-based system. In the overlay network scheme, each group in the network would have its own ultrapeer (JXTA rendezvous), which acts as the tracker for all files in that network. I wanted to do this for the Slashdot effect (p2pbridge.sf.net), but somehow the project has been delayed for a long time.
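
A minimal sketch of the "server points to a user who just received it" variant, assuming the origin keeps a short list of recent downloaders for the current feed version. `FeedTracker` is a hypothetical name; real code would also need a fallback to the origin when a peer transfer fails, and some answer to the leeching problem described above:

```python
from collections import deque
from typing import Optional


class FeedTracker:
    """Hypothetical origin-side tracker: hand new requesters to peers that
    recently downloaded the current version of the feed."""

    def __init__(self, max_peers: int = 50):
        self.version = 0
        # Bounded list of peers assumed to hold the current version.
        self.recent_peers = deque(maxlen=max_peers)

    def publish(self) -> None:
        """A new version makes every peer's copy stale."""
        self.version += 1
        self.recent_peers.clear()

    def request(self, peer: str) -> Optional[str]:
        """Return a peer address to fetch from, or None to be served directly."""
        source = self.recent_peers[0] if self.recent_peers else None
        if source is not None:
            self.recent_peers.rotate(-1)  # spread requests across known holders
        # Assume the requester's transfer succeeds, so it can serve others next.
        self.recent_peers.append(peer)
        return source
```

Only the first requester after each update touches the origin; everyone after that is redirected, which is the whole point, and also why a peer that refuses to serve breaks the scheme.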
  
  --
  Python script to convert photos into "artsy" portraits: http://p2pbridge.sf.net/pyPortrait/
"it's the connection overhead, stupid" by SuperBanana · 2004-07-20 07:39 · Score: 4, Informative

...is what one would say to the designers of RSS.
Mainly: even if your client is smart enough to communicate that it only needs part of the page, guess what? The pages themselves are small, especially after gzip compression (which, with mod_gzip, can be done ahead of time). The real overhead is all the nonsense, both at the protocol level and for the server in terms of CPU time, of opening and closing a TCP connection.
It's also the fault of the designers for not including strict rules as part of the standard for how frequently the client is allowed to check back, and, duh, the client shouldn't be user-configured to check at common times, like on the hour.
Bram figured this out with BitTorrent: the server can instruct the client on when it should next check back.
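
BitTorrent trackers do this with an `interval` value in their responses. RSS 2.0 has a rough equivalent in the optional `<ttl>` channel element, the number of minutes a feed may be cached before refreshing. The sketch below swaps in that mechanism (it is not something proposed above): it reads `<ttl>` and falls back to an assumed 60-minute default when the element is missing or malformed:

```python
import xml.etree.ElementTree as ET

DEFAULT_MINUTES = 60  # assumed fallback when the feed gives no hint


def next_poll_minutes(rss_xml: str, default: int = DEFAULT_MINUTES) -> int:
    """Return how long to wait before the next poll, per the feed's <ttl>."""
    root = ET.fromstring(rss_xml)
    ttl = root.findtext("channel/ttl")  # None if the element is absent
    if ttl is None:
        return default
    try:
        return max(int(ttl), 1)  # never poll more often than once a minute
    except ValueError:
        return default  # e.g. <ttl>soon</ttl> or an empty element
```

A client that honors this lets the publisher, not the user, decide how often it is hit.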

--
Please help metamoderate.
Re:Can't this be throttled? by mgoodman · 2004-07-20 07:40 · Score: 4, Insightful

Then their RSS client would barf on the input and the user wouldn't see any of the previously downloaded news feeds, in some cases.

Or rather, anyone that programs an RSS reader so horribly as to make it so that every client downloads information every hour on the hour would probably also barf on the input of a 500 or 404 error.

Most RSS feeders *should* just download every hour from the time they start, making the download intervals between users more or less random and well-dispersed. And if you want it more than every hour, well then edit the source and compile it yourself :P
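
One way to avoid the barfing described above is a reader that treats a failed fetch as "no news" rather than "no feed". A sketch, assuming downloading and parsing are wrapped in a single `fetch` callable (hypothetical). urllib's `HTTPError` and `URLError` are both subclasses of `OSError`, so catching that covers a throttled 503 as well as a refused connection:

```python
from typing import Callable, List


class FeedReader:
    """Keeps showing the last good copy when the server throttles or fails."""

    def __init__(self, fetch: Callable[[], List[str]]):
        self.fetch = fetch            # hypothetical: returns parsed item titles
        self.items: List[str] = []    # last successfully downloaded items

    def refresh(self) -> List[str]:
        try:
            self.items = self.fetch()
        except OSError:
            # Server said 503 (or was unreachable): keep the previous items
            # and simply try again at the next scheduled poll.
            pass
        return self.items
```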

--
01100111 01100101 01110100 00100000 01101111 01110101 01110100 00100000 01101101 01101111 01110010 01100101 00101110
Re:Simple HTTP Solution by skraps · 2004-07-20 07:45 · Score: 5, Insightful
This "optimization" will not have any long-lasting benefits. There are at least three variables in this equation:
1. Number of users
2. Number of RSS feeds
3. Size of each request
This optimization only addresses #3, which is the least likely to grow as time goes on.
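
To put this in numbers: if each of U users polls F feeds and each response is S bytes, traffic per polling round grows as the product of the three (a back-of-the-envelope model that ignores connection overhead). Shrinking S is a one-time constant factor, while doubling both U and F quadruples the load:

```latex
T \approx U \cdot F \cdot S,
\qquad
\frac{T_{\text{after}}}{T_{\text{before}}} = \frac{(2U)(2F)\,S}{U\,F\,S} = 4
```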
--
Karma: -2147483648 (Mostly affected by integer overflow)
Push, not pull! by mcrbids · 2004-07-20 07:46 · Score: 4, Interesting

The basic problem with RSS is that it's a "pull" method - RSS clients have to make periodic requests "just to see". Also, there's no effective way to mirror content.

That's just plain retarded.

What they *should* do...

1) Content should be pushed from the source, so only *necessary* traffic is generated. It should be encrypted with a certificate so that clients can be sure they're getting content from the "right" server.

2) Any RSS client should also be able to act as a server, NTP style. Because of the certificate used in #1, this could be done easily while still ensuring that the content came from the "real" source.

3) Subscription to the RSS feed could be done on a "hand-off" basis. In other words, a client asks the root RSS server to be added to the update pool. The server either accepts the request or redirects the client to one of its already-subscribed clients, whereupon the process starts all over again: the client requests subscription to the service, and the request is either accepted or deferred. Wash, rinse, repeat until the subscription is accepted.

The result of this would be a system that could scale to just about any size, easily.

Anybody want to write it? (Unfortunately, my time is TAPPED!)
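
A rough sketch of the hand-off subscription scheme in point 3, assuming each node accepts a fixed number of direct subscribers (`capacity`, which must be at least 1) and otherwise defers the newcomer to its least-loaded child. The certificate check from point 1 is omitted, and `Node` is a hypothetical name:

```python
from typing import List, Optional


class Node:
    """A subscriber that can also relay the feed to a few children."""

    def __init__(self, name: str, capacity: int = 3):
        self.name = name
        self.capacity = capacity          # assumed fan-out limit, >= 1
        self.children: List["Node"] = []

    def subscribe(self, newcomer: "Node") -> "Node":
        """Accept the newcomer if there is room, else hand it off downward.

        Returns the node that finally accepted the subscription.
        """
        node = self
        while len(node.children) >= node.capacity:
            # Defer to the child with the fewest direct subscribers.
            node = min(node.children, key=lambda c: len(c.children))
        node.children.append(newcomer)
        return node

    def push(self, item: str, delivered: Optional[List[str]] = None) -> List[str]:
        """Push an item down the tree; each node forwards to its own children."""
        delivered = [] if delivered is None else delivered
        for child in self.children:
            delivered.append(child.name)
            child.push(item, delivered)
        return delivered
```

The root only ever talks to `capacity` clients, no matter how large the tree grows; each update costs it a fixed number of pushes.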

--
I have no problem with your religion until you decide it's reason to deprive others of the truth.
I seem to remember... by Misch · 2004-07-20 07:46 · Score: 4, Interesting

I seem to remember Windows scheduler being able to randomize scheduled event times within a 1 hour period. I think our RSS feeders need similar functions.

--

--You will rephrase your request for me to go to hell. Goto statements are not acceptable programming constructs
Re:Still haven't tried these newfangled RSS reader by Dr.+Sp0ng · 2004-07-20 07:49 · Score: 4, Informative

On Windows I use RSS Bandit. Haven't found a non-sucky one for *nix, although I haven't looked all that hard. On OS X I use NetNewsWire, which while not great, does the job.
Oh, come on by aiken_d · 2004-07-20 07:54 · Score: 5, Interesting

My guess is that InfoWorld is dynamically generating the RSS for each request. A simple host-side cache of the generated XML, so hits just talk to the HTTP server and not the app server, would probably make this a non-issue.
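
A minimal sketch of that host-side cache, assuming the feed can be produced by a single call (`render` is a hypothetical stand-in for the app-server work). However many requests arrive, the expensive step runs at most once per `max_age` seconds. It is not thread-safe as written:

```python
import time
from typing import Callable, Optional


class CachedFeed:
    """Serve a pre-rendered RSS document, regenerating it at most once per max_age."""

    def __init__(self, render: Callable[[], bytes], max_age: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.render = render        # hypothetical: the expensive generation step
        self.max_age = max_age      # seconds a rendered copy stays valid
        self.clock = clock          # injectable so the cache can be tested
        self._body: Optional[bytes] = None
        self._built_at = 0.0

    def get(self) -> bytes:
        now = self.clock()
        if self._body is None or now - self._built_at >= self.max_age:
            self._body = self.render()
            self._built_at = now
        return self._body
```

Writing the rendered file to disk on each publish and letting the web server serve it statically achieves the same thing with even less code.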

Or are they *really* getting more RSS hits than image requests? If, somehow, that's the case, spend $500/mo on Akamai or Speedera, point the RSS traffic there, and give the CDN a reasonable timeout (30 minutes or something). That guarantees you no more than about 500 hits per timeout period, or roughly one every three to four seconds. Surely the app server can handle that.

Then again, what do I know? I only worked there for five years, including two on infoworld.com. It's been a few years, but unless things have changed dramatically, that is one messed up IT organization.

Cheers
-b

--
If I wanted a sig I would have filled in that stupid box.
It just ain't broadcast.. by wfberg · 2004-07-20 07:54 · Score: 4, Interesting

Complaining about people connecting to your RSS feeds "impolitely" is missing the mark a bit, I think. Even RSS readers that *do* check when the file was last changed, still download the entire feed when so much as a single character has changed.
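
The "last changed" check is an HTTP conditional GET. A minimal client-side sketch using Python's urllib, assuming the server sends `ETag` or `Last-Modified` headers. Note the limitation described here: any change at all, even one character, means the next response carries the whole feed:

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def conditional_request(url: str, etag: Optional[str] = None,
                        last_modified: Optional[str] = None) -> urllib.request.Request:
    """Build a GET that lets the server answer 304 Not Modified."""
    req = urllib.request.Request(url)
    if etag:
        req.add_header("If-None-Match", etag)               # ETag from last response
    if last_modified:
        req.add_header("If-Modified-Since", last_modified)  # Last-Modified from last response
    return req


def fetch_if_changed(url: str, etag: Optional[str] = None,
                     last_modified: Optional[str] = None
                     ) -> Tuple[Optional[bytes], Optional[str], Optional[str]]:
    """Return (body, etag, last_modified); body is None if nothing changed."""
    try:
        with urllib.request.urlopen(conditional_request(url, etag, last_modified),
                                    timeout=30) as resp:
            return resp.read(), resp.headers.get("ETag"), resp.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib reports 304 as an HTTPError
            return None, etag, last_modified
        raise
```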

There used to be a system where you could pull a list of recently posted articles off of a server that your ISP had installed locally, and only get the newest headers, and then decide which article bodies to retrieve.. The articles could even contain rich content, like HTML and binary files. And to top it off, articles posted by someone across the globe were transmitted from ISP to ISP, spreading over the world like an expanding mesh.

They called this.. USENET..

I realize that RSS is "teh hotness" and Usenet is "old and busted", and that "push is dead" etc. But for Pete's sake, don't send a unicast protocol to do a multicast (even if it is at the application layer) protocol's job!

It would of course be great if there were a "cache" hierarchy on Usenet. Newsgroups could be named after content providers' URLs (e.g. cache.com.cnn, cache.com.livejournal.somegoth) and you could just subscribe to crap that way. There's nothing magical about what RSS readers do that requires the underlying stuff to be all RSS-y and HTTP-y..
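
The naming scheme is easy to mechanize. A sketch, assuming the group name is just the feed's host name reversed under a `cache.` prefix, with a leading `www` dropped so it matches the examples above; `cache_group` is a hypothetical helper:

```python
from urllib.parse import urlsplit


def cache_group(feed_url: str, prefix: str = "cache") -> str:
    """Map a feed URL to a Usenet-style group name, e.g. cache.com.cnn."""
    host = urlsplit(feed_url).hostname or ""  # hostname is lower-cased
    labels = host.split(".")
    if labels and labels[0] == "www":
        labels = labels[1:]  # www.cnn.com and cnn.com share a group
    return ".".join([prefix] + list(reversed(labels)))
```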

For real push you could even send the RSS via SMTP, and you could use your ISP's outgoing mail server to multiply your bandwidth (via BCC).
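
A sketch of the SMTP variant using Python's standard `email` and `smtplib` modules. The relay host `mail.example.net` and both helper names are assumptions. `send_message()` does take its recipients from the To, Cc, and Bcc headers and leaves the Bcc header out of what it transmits, so the relay does the fan-out and subscribers never see each other:

```python
import smtplib
from email.message import EmailMessage
from typing import Iterable


def build_feed_mail(feed_xml: str, sender: str, subscribers: Iterable[str]) -> EmailMessage:
    """Wrap one RSS document in a single message addressed to every subscriber."""
    msg = EmailMessage()
    msg["Subject"] = "Feed update"
    msg["From"] = sender
    msg["To"] = sender                      # visible recipient; the list stays hidden
    msg["Bcc"] = ", ".join(subscribers)
    msg.set_content(feed_xml, subtype="xml")  # sent as text/xml
    return msg


def send_feed_mail(msg: EmailMessage, relay: str = "mail.example.net") -> None:
    """Hand the message to an outgoing relay (hypothetical host name)."""
    with smtplib.SMTP(relay) as smtp:
        smtp.send_message(msg)
```

The publisher uploads one copy; the mail relay handles the multiplication.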

--
SCO employee? Check out the bounty
1. Re:It just ain't broadcast.. by fiftyvolts · 2004-07-20 08:28 · Score: 4, Insightful
  
  You make some very good points. The old saying "When all you have is a hammer, everything looks like a nail" seems to ring true time and time again. These days it seems that everyone wants to use HTTP for everything and quite frankly it's not equipped to do that.
  
  RSS over SMTP sounds pretty cool. Heck, just sending a list of subscribers an email of RSS and let their mail clients sort it out would be pretty nice.
  
  Heh, my favorite posts are when someone suggests something that sounds totally novel and then someone else points out, "Yeah! Like <insert old and underused technology>. It seems to do that damn well." The Internet cannot forget its roots!
  
  --
  100% Crunchier
Re:Can't this be throttled? by ameoba · 2004-07-20 07:58 · Score: 4, Insightful

It seems kinda stupid to have the clients basing their updates on clock time. Doing an update on client startup and then every 60min after that would be just as easy as doing it on the clock time & would basically eliminate the whole DDOSesque thing.

--
my sig's at the bottom of the page.
Re:RSS needs better TCP stacks by Salamander · 2004-07-20 07:58 · Score: 5, Insightful

Leaving thousands upon thousands of connections open on the server is a terrible idea no matter how well-implemented the TCP stack is. The real solution is to use some sort of distributed mirroring facility so everyone could connect to a nearby copy of the feed and spread the load. The even better solution would be to distribute asynchronous update notifications as well as data, because polling always sucks. Each client would then get a message saying "xxx has updated, please fetch a copy from your nearest mirror" only when the content changes, providing darn near optimal network efficiency.
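
A toy, in-process sketch of the "notify, then fetch from the nearest mirror" scheme. The names `Mirror`, `UpdateBus`, and `make_client` are hypothetical, and "nearest" is assumed to mean lowest measured round-trip time; a real deployment would need an actual network protocol for both steps:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Mirror:
    name: str
    rtt_ms: float  # assumed: round-trip time as measured by this client
    content: Dict[str, str] = field(default_factory=dict)


class UpdateBus:
    """Origin replicates data to mirrors, then sends a tiny notification."""

    def __init__(self, mirrors: List[Mirror]):
        self.mirrors = mirrors
        self.subscribers: List[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        self.subscribers.append(callback)

    def publish(self, feed_id: str, body: str) -> None:
        for mirror in self.mirrors:
            mirror.content[feed_id] = body  # copy the data out once per mirror
        for notify in self.subscribers:
            notify(feed_id)                 # notification carries no payload


def make_client(mirrors: List[Mirror], inbox: List[Tuple[str, str]]) -> Callable[[str], None]:
    """Return a callback that fetches updates from the lowest-latency mirror."""
    nearest = min(mirrors, key=lambda m: m.rtt_ms)

    def on_update(feed_id: str) -> None:
        inbox.append((nearest.name, nearest.content[feed_id]))

    return on_update
```

Nothing moves until content changes, and when it does, the origin's outbound traffic is one copy per mirror plus a few bytes per subscriber.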

--
Slashdot - News for Herds. Stuff that Splatters.
RSS is like a DDoS attack on my brain by PCM2 · 2004-07-20 08:05 · Score: 5, Interesting

Am I the only one who finds it easier to get the information I want from the home pages of the sites I trust, rather than relying on an RSS feed? For one thing, in an RSS feed every story has the same priority ... stories keep coming in and I have no idea which ones are "bigger" than others. Sites like News.com, on the other hand, follow the newspaper's example of printing the headlines for the more important stories bigger. With RSS, it's just information overload, especially with the same stories duplicated at different sources, etc. Everyone seems really excited about RSS, but when I tried it I just couldn't figure out how to use it such that it would actually give me some real value vs. the resources I already have.

--
Breakfast served all day!
Publish/Subscribe by dgp · 2004-07-20 08:19 · Score: 4, Informative

That is mind-bogglingly inefficient. It's like POP clients checking for new email every X minutes. Polling is wrong wrong wrong! Check out the select() libc call. Does the Linux kernel go into a busy-wait loop listening for every Ethernet packet? No! It gets interrupted when a packet is ready!
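
The difference, in a few lines of Python: `select.select()` blocks until the socket is readable or the timeout expires, so the process sleeps instead of spinning. The socket pair stands in for a hypothetical notification connection:

```python
import select
import socket
from typing import Optional


def wait_for_update(sock: socket.socket, timeout_s: float) -> Optional[bytes]:
    """Sleep until sock has data (or the timeout passes) instead of busy-waiting."""
    readable, _, _ = select.select([sock], [], [], timeout_s)
    if not readable:
        return None  # nothing arrived, and no CPU was burned while waiting
    return sock.recv(4096)
```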

http://www.mod-pubsub.org/
The Apache module mod_pubsub might be a solution.

From the mod_pubsub FAQ:
What is mod_pubsub?

mod_pubsub is a set of libraries, tools, and scripts that enable publish and subscribe messaging over HTTP. mod_pubsub extends Apache by running within its mod_perl Web Server module.

What's the benefit of developing with mod_pubsub?

Real-time data delivery to and from Web Browsers without refreshing; without installing client-side software; and without Applets, ActiveX, or Plug-ins. This is useful for live portals and dashboards, and Web Browser notifications.

Jabber also saw a publish/subscribe mechanism as an important feature.
Re:Can't this be throttled? by mblase · 2004-07-20 08:25 · Score: 4, Insightful

Most RSS feeders *should* just download every hour from the time they start

That's also a problem, though, since most people start work at their computer desks on the hour, or very close to it. The better solution would be for the client (1) to check once at startup, then (2) pick a random number between one and sixty (or thirty or whatever) and (3) start checking the feed, hourly, after that many minutes. That's the only way to ensure a decently random distribution of hits.
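
The three steps as a sketch. Times are minutes on whatever clock the client uses, and `poll_schedule` is a hypothetical name; the offset is drawn uniformly between one and sixty minutes, as suggested:

```python
import random
from typing import List, Optional


def poll_schedule(start_minute: float, count: int, period: float = 60.0,
                  rng: Optional[random.Random] = None) -> List[float]:
    """Return the first `count` poll times (count >= 1).

    (1) poll once at startup, (2) pick a random offset in [1, period],
    (3) poll every `period` minutes starting at start + offset.
    """
    rng = rng or random.Random()
    offset = rng.uniform(1.0, period)
    return [start_minute] + [start_minute + offset + k * period
                             for k in range(count - 1)]
```

Because every client draws its own offset, readers started at 9:00 end up polling at scattered minutes rather than all at 10:00.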
Re:Can't this be throttled? by hunterx11 · 2004-07-20 08:52 · Score: 4, Funny

From now on, instead of telling people to fuck off I'll just say:
User-agent: You
Disallow: /

--
English is easier said than done.