When RSS Traffic Looks Like a DDoS
An anonymous reader writes "Infoworld's CTO Chad Dickerson says he has a love/hate relationship with RSS. He loves the changes to his information production and consumption, but he hates the behavior of some RSS feed readers. Every hour, Infoworld "sees a massive surge of RSS newsreader activity" that "has all the characteristics of a distributed DoS attack." So many requests in such a short period of time are creating scaling issues." We've seen similar problems over the years. RSS (or as it should be called, "Speedfeed") is such a useful thing, it's unfortunate that it's ultimately just very stupid.
RSS may be ultimately stupid but you didn't get first post did you! rookie!
Does this have anything to do with /. problems yesterday?
-ravan_a
another article
they aren't just being /.ed?
we need RHS... really HARD syndication
Can't one just write a small php script or something which returns an error (i.e. 500), less data to send back, and hopefully the reader would just try again later.
The readers should HEAD to see if the last modified changed... And the feed rendering engines should make sure their last modified is accurate.
A programmer is a machine for converting coffee into code.
...so could someone recommend a couple of really good ones for Windows and *nix?
This is helpful.
Rhymes that keep their secrets will unfold behind the clouds. There upon the rainbow is the answer to a neverending story
Every hour, random sites "see a massive surge of /.'s news reader activity" that "has all the characteristics of a distributed DoS attack."
Slashdot (or as it should be called, "Sitefsck") is such a useful thing, it's unfortunate that it's ultimately just very stupid.
I don't really care for RSS either, but damn, was that necessary?
We've seen similar problems over the years. RSS (or as it should be called, "Speedfeed") is such a useful thing, it's unfortunate that it's ultimately just very stupid.
And it seems to have gotten worse since the new code was installed- I get 503 errors at the top of every hour now on slashdot.
SJW: a person who perceives an injustice, and while correcting it, commits a greater injustice.
Since many clients request the new data every 30 minutes or so... how about a simple system that spreads out the load? A page that, based on some criteria (domain name, IP, random seed, round robin) gives each client a time it should check for updates (i.e. 17 past the hour).
Of course, this depends on the client to respect the request, but we already have systems that do (robots.txt), and they seem to work fairly well, most of the time.
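A minimal sketch of the slot-assignment idea above, assuming the server keys on some stable client identifier such as an IP address. The function name `assigned_minute` and the choice of SHA-256 are illustrative assumptions, not part of any RSS specification.

```python
import hashlib


def assigned_minute(client_id: str, slots: int = 60) -> int:
    """Map a client identifier to a stable minute past the hour.

    Hashing spreads clients roughly evenly over the available slots,
    and the same client always gets the same slot.
    """
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % slots


if __name__ == "__main__":
    # 192.0.2.x is a documentation-only address range.
    for ip in ("192.0.2.1", "192.0.2.2", "192.0.2.3"):
        print(ip, "-> check at minute", assigned_minute(ip))
```

As with robots.txt, this only helps if clients choose to honor the slot they are given.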
"Faith: Belief without evidence in what is told by one who speaks without knowledge, of things without parallel." - A.B.
Why not make it standard that the starting time is chosen randomly or assigned by the remote site? "Forty-three minutes after the hour is pretty empty, from now on you can check the news at that time" or something similar.
RSS just needs better TCP stacks. Here's how it would work: when your RSS client connects to an RSS server, it would simply leave the connection open until the next time the RSS data got updated. Then you would receive a copy of the RSS content. You simply *couldn't* fetch data that hadn't been updated.
The reason this needs better TCP stacks is because every open connection is stored in kernel memory. That's not necessary. Once you have the connecting ip, port, and sequence number, those should go into a database, to be pulled out later when the content has been updated.
-russ
Don't piss off The Angry Economist
RSS readers and aggregators shouldn't gather new feeds every hour on the hour. They should gather them when the application is first run and then every hour after that (probably not on the hour). I'd hope most GUI applications already run this way. I guess most of this traffic just comes from daemon processes -- and that should be changed.
mbbac
RSS is in fact living up to what it was made for; however, it's being used like a Chevy S-10 pulling a semi trailer.
PHP just had a major overhaul, so there's no reason why RSS2 shouldn't be on the drawing board. This time, though, more thought should be given to the scale of its use.
--
if the deluge of traffic that RSS causes makes RSS "stupid," posting an article about the deluge of traffic RSS is causing on Slashdot is, at the very least, "ironic."
Well maybe somebody should set something up to syndicate RSS feeds via a peer to peer service. BitTorrent would work, but it could be improved upon (people would still be grabbing a torrent every hour, so it wouldn't completely solve the problem).
"Really stupid syndication"
----------------------------------
I'd rather not take sides until I hear the monkey's version - PHB
I used to have an RSS feed for google news that I loved and used all the time, but it was taken down due to this effect. It's a shame that these things can't be handled better. (the RSS feed may be back up, I haven't checked in months)
Or did the RSS reader authors hope that their applications wouldn't be used by anybody except for a few geeks?
...is what one would say to the designers of RSS.
Mainly, IF your client is smart enough to communicate that it only needs part of the page, guess what? The pages are small, especially after gzip compression (which, including with mod_gzip, can be done ahead of time). The real overhead is all the nonsense, both on a protocol level and for the server in terms of CPU time, of opening and closing a TCP connection.
It's also the fault of the designers for not including strict rules as part of the standard for how frequently the client is allowed to check back, and, duh, the client shouldn't be user-configured to check at common times, like on the hour.
Bram figured this out with BitTorrent- the server can instruct the client on when it should next check back.
Please help metamoderate.
The guy who came up with the idea for RSS should be sent back to comp. sci. 101. It should have been readily apparent from day 1 that this would be a problem.
Some sort of peer-to-peer event-driven model would be a better match for this problem.
Karma: -2147483648 (Mostly affected by integer overflow)
"Despite 'only' being XML, RSS is the driving force fulfilling the Web's original promise: making the Web useful in an exciting, real-time way."
Err, did I miss the meeting where that was declared as the Web's original promise?
Anyway, the trouble is pretty obvious: RSS is just a polling mechanism to do fakey Push. (Wired had an interesting retrospective on their infamous "PUSH IS THE FUTURE" hand cover about PointCast.) And that's expensive, the cyber equivalent of a horde of screaming children asking "Are we there yet? Are we there yet? How about now? Are we there yet now? Are we there yet?" It would be good if we had an equally widely used "true Push" standard, where remote clients would register as listeners, and then the server can actually publish new content to the remote sites. However, in today's heavily firewall'd internet, I dunno if that would work so well, especially for home users.
I dunno. I kind of admit to not really grokking RSS, for me, the presentation is too much of the total package. (Or maybe I'm bitter because the weird intraday format that emerged for my own site doesn't really lend itself to RSS-ification...)
SO YOU'RE GOING TO DIE: The Comic for Dealing with Death
It's 'simple,' stupid. :)
Here's a solution: Have the RSS readers grab data every hour or half hour starting from when they are started up, not on the hour. This would of course distribute the "attacks" on the server.
I'd really, really like to.
Obviously, I can't, but boy would I like to.
Stupid RSS.
We use poisson distribution to even out the load our scripts generate.
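One reading of the Poisson suggestion above is to draw the gap between polls from an exponential distribution, so that requests from many clients arrive as a Poisson process instead of in lockstep. This is a sketch under that assumption; `next_delay` is a hypothetical helper name.

```python
import random


def next_delay(mean_interval: float, rng: random.Random) -> float:
    """Return a random wait (in seconds) before the next poll.

    Exponentially distributed gaps with this mean give Poisson-process
    arrivals, so independent clients do not stay synchronized.
    """
    return rng.expovariate(1.0 / mean_interval)


if __name__ == "__main__":
    rng = random.Random()
    # Five successive waits for a client that polls hourly on average.
    print([round(next_delay(3600.0, rng)) for _ in range(5)])
```

The average polling rate stays the same; only the timing of individual requests is smeared out.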
In any commons, co-operation is key. I doubt most people will update their clients to work with HEAD or some sort of checksumming without reason, so the first obvious step is to block clients for a period. If a client retrieves information from a host, place a ban on all requests from said client until either the information changes, or a timeout expires.
;)
On the client side, the software needs to be written to check for updates to the data before pulling the data. This will lessen the burden.
The other side of the problem is the fact that the clients default to asking for data at the top of the hour. As this scales up, even with checks to see if data has changed, you'll be seeing a synchronized rise in traffic which leads to a DDoS effect on systems. The fix is the same as the way we fixed message IDs: the intervals at which clients check the data should be seeded, semi-random intervals such that no more than a subset n of the total i clients are checking for new data or transferring new data at any given time. This is something else that can be mitigated by having smarter server-side data blocks until users update to smarter clients. Otherwise the servers risk being DDoSed by these legions of stupid clients.
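The server-side block described above (refuse repeat requests from a client until the feed changes or a timeout expires) could be sketched roughly as follows. The class name `RepeatRequestGate`, the use of a version string to detect changes, and the in-memory dictionary are all assumptions made for illustration.

```python
class RepeatRequestGate:
    """Refuse repeat fetches until the feed changes or a timeout passes."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        # client_id -> (feed version last served, time it was served)
        self._seen = {}

    def allow(self, client_id: str, feed_version: str, now: float) -> bool:
        last = self._seen.get(client_id)
        if last is not None:
            version, served_at = last
            unchanged = version == feed_version
            too_soon = now - served_at < self.timeout
            if unchanged and too_soon:
                return False  # same content, asked again too early
        self._seen[client_id] = (feed_version, now)
        return True


if __name__ == "__main__":
    gate = RepeatRequestGate(timeout=3600)
    print(gate.allow("192.0.2.1", "v1", now=0))    # first fetch is served
    print(gate.allow("192.0.2.1", "v1", now=60))   # repeat is refused
```

A real deployment would also need to expire old entries and decide what to send a refused client; neither is shown here.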
--
Internet Explorer (n): Another bug -- that is, a feature that can't be turned off -- in Windows.
Newsreaders and users could both use random request times, rather than defaulting to the top of the hour.
Off topic indeed, this clearly should have been an Ask Slashdot.
Why not have rss readers that check on startup, then check again at user-specified intervals, after a random amount of time has passed?
user starts program at 3.15 and it checks rss feed.
user sets check interval to 1 hour.
rand()%60 minutes later (let's say 37) it checks feed
every hour after that it checks the feed.
simplistic sure, but isn't rss in general?
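The schedule in the steps above can be sketched like this, working in whole minutes as the example does. The name `poll_times` is hypothetical, and the random offset is drawn once at startup, which is one way to read "rand()%60 minutes later".

```python
import random


def poll_times(start_minute: int, interval_minutes: int, count: int,
               rng: random.Random) -> list:
    """Return the first `count` poll times, in minutes since midnight.

    The reader checks once at startup, again after a random offset
    shorter than one interval, and then once per interval after that.
    """
    if count <= 0:
        return []
    times = [start_minute]
    first_scheduled = start_minute + rng.randrange(interval_minutes)
    for i in range(count - 1):
        times.append(first_scheduled + i * interval_minutes)
    return times


if __name__ == "__main__":
    # Started at 3:15 (minute 195) with a one-hour interval.
    print(poll_times(195, 60, 4, random.Random()))
```

Because each user starts the program at a different time and draws a different offset, the polls stop lining up at the top of the hour.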
on an aside, any of you (few) non-programmers interested in creating rss feeds, i put out some software that facilitates it.
hunterdavis.com/ssrss.html
Arse-feed
The basic problem with RSS is that it's a "pull" method - RSS clients have to make periodic requests "just to see". Also, there's no effective way to mirror content.
That's just plain retarded.
What they *should* do...
1) Content should be pushed from the source, so only *necessary* traffic is generated. It should be encrypted with a certificate so that clients can be sure they're getting content from the "right" server.
2) Any RSS client should also be able to act as a server, NTP style. Because of the certificate used in #1, this could be done easily while still ensuring that the content came from the "real" source.
3) Subscription to the RSS feed could be done on a "hand-off" basis. In other words, a client makes a request to be added to the update pool on the root RSS server. It either accepts the request, or redirects the client to one of its already set-up clients. Whereupon the process starts all over again. The client requests subscription to the service, and the request is either accepted or deferred. Wash, rinse, repeat until the subscription is accepted.
The result of this would be a system that could scale to just about any size, easily.
Anybody want to write it? (Unfortunately, my time is TAPPED!)
I have no problem with your religion until you decide it's reason to deprive others of the truth.
I seem to remember Windows scheduler being able to randomize scheduled event times within a 1 hour period. I think our RSS feeders need similar functions.
--You will rephrase your request for me to go to hell. Goto statements are not acceptable programming constructs
We have way too much traffic from dumb P2P schemes today, considering the relatively small volume of new content being distributed.
Ask RSS reader writers to program into their programs a suggestion that the refresh not be on the hour. It would distribute the load more evenly. Getting people to actually do this is another problem. Sounds to me like a little bit of lazy coding (not checking modified times in header), and a little bit of ignorance (RSS isn't big enough to cause a problem.... so doing this on the hour is OK right?) have just snowballed.
Help I'm a rock.
RSSOwl - http://rssowl.sourceforge.net/ is pretty good.
$ strings FTP.EXE | grep Copyright
@(#) Copyright (c) 1983 The Regents of the University of California.
On Windows I use RSS Bandit. Haven't found a non-sucky one for *nix, although I haven't looked all that hard. On OS X I use NetNewsWire, which while not great, does the job.
Still haven't tried these newfangled RSS readers.. (Score:3, Informative) ...so could someone recommend a couple of really good ones for Windows and *nix?
by Rezonant (775417) on 2004-07-20 12:35 (#9752026)
Ok. How is a question informative? Or is the fact that Rezonant has never used an RSS reader informative? Here's a +5 Informative for you: I haven't used an RSS reader either.
Of course, XML bloat has nothing to do with this.
It might not seem like it's worth much effort if a bunch of your customers are all downloading a few hundred bytes of headlines every hour, but it probably matters when they're all downloading movie trailers or OS updates. The caching of small stuff, to keep from contributing to someone else's slashdotting, is just a bonus.
Oh, and if an RSS is ten minutes old instead of "real time": The 1% of the population that actually cares, can just elect to not use the proxy.
It can even be a really cheap box, too, since it doesn't need to be reliable. Use cheap consumer-grade crap. If once per year a drive fails and you lose all your cache, so what?
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
RSS Bandit (Windows)
Syndigator (X)
There is also an rss thunderbird extension, Formzilla, but you have to be using a version of thunderbird built with the xmlextras extension... it is all described in the post.
RSS is like a hi-jack of majordomo, by marketing dweebs.
E-mail - yes folks, good old fashioned SMTP, can be used for these things that RSS is supposedly 'good for'.
We do not need yet another protocol for transferring messages to each other. A properly defined X-Protocol addition, which allows for embedded XML in the Body text, would solve this distribution problem entirely.
Mail scales well. Like it or not, but it does. It's a perfect model for RSS.
; -- the corruption of government starts with its secrets. a truly free people keep no secrets. --
My guess is that InfoWorld is dynamically generating the RSS for each request. A simple host-side cache of the generated XML, so hits just talk to the HTTP server and not the app server, would probably make this a non-issue.
Or are they *really* getting more RSS hits than image requests? If -- somehow -- that's the case, spend $500/mo on Akamai or Speedera and point RSS stuff there, and give the CDN a reasonable timeout (30 minutes or something). That guarantees you no more than about 500 hits per timeout period, or maybe one every 10 seconds. Surely the app server can handle that.
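A host-side cache of the generated XML, as suggested above, might look roughly like the following. This is a sketch only: `CachedFeed` and its `render` callback are hypothetical names, and the 30-minute lifetime simply reuses the timeout figure mentioned in the comment.

```python
import time


class CachedFeed:
    """Serve a stored copy of the feed and regenerate it only when stale."""

    def __init__(self, render, max_age: float, clock=time.monotonic):
        self._render = render        # expensive call into the app server
        self._max_age = max_age      # seconds a rendered copy stays fresh
        self._clock = clock
        self._body = None
        self._rendered_at = 0.0

    def get(self) -> str:
        now = self._clock()
        stale = now - self._rendered_at >= self._max_age
        if self._body is None or stale:
            self._body = self._render()
            self._rendered_at = now
        return self._body


if __name__ == "__main__":
    feed = CachedFeed(lambda: "<rss version=\"2.0\"></rss>", max_age=1800)
    print(feed.get())  # rendered once
    print(feed.get())  # served from the cached copy
```

With this in place the app server renders the feed at most once per `max_age` window, no matter how many readers ask for it.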
Then again, what do I know? I only worked there for five years, including two on infoworld.com. It's been a few years, but unless things have changed dramatically, that is one messed up IT organization.
Cheers
-b
If I wanted a sig I would have filled in that stupid box.
That way I can snag the story from someone who just downloaded it. :)
Bit Torrent + RSS = Problem Solved
If you're using NetNewsWire on OS X, try the Atom Beta, which, I'm sure it will come as no shock to you, adds support for Atom feeds.
www.google.com
This amazing site gets a number of news feeds but it also empowers you to find your own damn reader.
Complaining about people connecting to your RSS feeds "impolitely" is missing the mark a bit, I think. Even RSS readers that *do* check when the file was last changed, still download the entire feed when so much as a single character has changed.
There used to be a system where you could pull a list of recently posted articles off of a server that your ISP had installed locally, and only get the newest headers, and then decide which article bodies to retrieve.. The articles could even contain rich content, like HTML and binary files. And to top it off, articles posted by some-one across the globe were transmitted from ISP to ISP, spreading over the world like an expanding mesh.
They called this.. USENET..
I realize that RSS is "teh hotness" and Usenet is "old and busted", and that "push is dead" etc. But for Pete's sake, don't send a unicast protocol to do a multicast (even if it is at the application layer) protocol's job!
It would of course be great if there was a "cache" hierarchy on usenet. Newsgroups could be styled after content providers' URLs (e.g. cache.com.cnn, cache.com.livejournal.somegoth) and you could just subscribe to crap that way. There's nothing magical about what RSS readers do that the underlying stuff has to be all RSS-y and HTTP-y..
For real push you could even send the RSS via SMTP, and you could use your ISP's outgoing mail server to multiply your bandwidth (i.e. BCC).
SCO employee? Check out the bounty
How about combining RSS with BitTorrent? The RSS feeder would act more as a BT tracker.. Simply point the client to the nearest dood with a copy of the feed.....
Reality is in the mind of the beholder - me 1996
It's "informative by association." His post will attract answers. (Well, answers and people who bitch about moderation.)
Have RSS readers use a local proxy that is willing to cache RSS data.
Treat this like an HTML problem.
Demand your ISP support native multicast. As most ISPs are now owned by cable companies, and native multicast would enable efficient, scalable video distribution, don't bet on them being too receptive, but here again is another application for which native multicast would excel.
I use wTicker for my Windows computer and KNewsTicker for my Linux boxes. The latest version of wTicker won't run on my XP computer, but an older version does. It's still in beta and a little clunky, but the crawler takes up far less screen space than any other RSS reader I've tried.
It's all fun and games until someone loses the key to the handcuffs.
The main problem with most RSS feeds is that they update all information. Most of these run off a simple JavaScript that will run on a timer to get all the data again and again. A better solution would be to implement an XML RSS (or any language really) that uses a simple ID system for news items. When it's time to update the news feed, find any new IDs; don't retrieve existing data, only new data. This would cut down a large chunk of bandwidth. A better idea would be to implement some type of component that accesses their database (or web server through ISAPI etc.) that will update the content on the external server that requires the RSS. This would again cut out a large level of data transfer, and requests that would normally slow the server down. Yes, this would need to be installed on the external web server, but if people need the news feeds, then you can force them to do it your way. This is similar to what I do with our system: we have 12,000 machines every few minutes accessing a database to send and receive new information, and we have very few problems with it.
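The ID-based idea above (send only items the client has not seen) could be sketched as follows, assuming item IDs are integers that increase with each new story. That ordering, and the name `items_after`, are assumptions for illustration; RSS itself does not guarantee either.

```python
def items_after(items, last_seen_id: int):
    """Return only the items newer than the last ID the client has seen.

    `items` is a list of (item_id, title) pairs; IDs are assumed to grow
    as stories are published.
    """
    return [item for item in items if item[0] > last_seen_id]


if __name__ == "__main__":
    feed = [(101, "First story"), (102, "Second story"), (103, "Third story")]
    # A client that already has story 101 receives only the later two.
    print(items_after(feed, 101))
```

A client that is fully up to date gets an empty list back, which costs almost nothing to transfer.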
TruePunk | Games
Reimplementing TCP using a database is excessive. Making a light connectionless protocol that does similar to what you described would be a lot simpler and not require reimplementing everyone's TCP stack.
Also, as much as I hate the fad of labelling everything P2P, having a P2P-ish network for this would help, too. The original server can just hand out MD5's, and clients propagate the actual text throughout the network.
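If the original server hands out only an MD5 digest, a client that obtains the feed text from a peer can check it against that digest before trusting it. A small sketch of that check, with the hypothetical helper name `matches_published_digest`:

```python
import hashlib


def matches_published_digest(feed_bytes: bytes, published_md5_hex: str) -> bool:
    """Check a copy of the feed obtained from a peer against the digest
    published by the original server."""
    actual = hashlib.md5(feed_bytes).hexdigest()
    return actual == published_md5_hex.strip().lower()


if __name__ == "__main__":
    original = b"<rss version=\"2.0\"></rss>"
    digest = hashlib.md5(original).hexdigest()  # what the server would publish
    print(matches_published_digest(original, digest))
    print(matches_published_digest(b"<rss>tampered</rss>", digest))
```

MD5 is used here only because the comment names it; it detects accidental corruption but is no longer considered safe against deliberate tampering, so a stronger hash would be the usual choice today.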
Of course (and this relates to the P2P stuff), every newfangled toy these days is just a pathetic reimplementation of some original Internet protocol. Like, say, NNTP. Which does all of this already, and has for years. Ah well.
Don't think of it as a flame---it's more like an argument that does 3d6 fire damage
We need to have rss published out to be picked up by clients very much like IRC.
The RSS readers would connect up to the IRC and when new headlines are published into the IRC they would be picked up. (Yes, they may have to be signed, but how hard is that? Not hard.)
this idea is very much a MOM (Message oriented Middleware)
but there are few/no good open source MOM products that can work without needing java.
/. is especially pissy with this but I want breaking news, not whatever is new each hour. So I hammer the shit out of it with my client (and get banned). I'd like to see a service where I download one client (that has front-ends in Gnome panel, the Windows tray, etc.) that the site (/., cspan, etc.) _pushes_ new updates to when I sign up. Those w/ dynamic IPs could, when they sign on, have their client automagically connect to a server that holds their unique user ID with their IP.
I haven't posted in so long, my sig is out of date.
Well, sure if you want to the absolute second, but if you spread the requests across 5 minutes, say, or something similar, it would certainly help, and I doubt most people would complain.
ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
With both it and RSS (and Slashdot and the pages it links to), some kind of data is released by a server and because of the nature of the medium (being live) all clients want to get this data as soon as it becomes available. The result: DDoS.
Now I bet that not all of these thousands of people have a direct LAN connection to the server in question. Now isn't it stupid to send thousands of identical data packages, through the same connection, at the same time? The only thing it is, is a waste of money and resources.
Therefore, as I see it, there are two things that could really solve this problem:
Don't use a poll-based system, but a system in which a client registers with a server, and from then on the server initiates the transfers.
Have a possibility for one IP packet to reach a possibly huge amount of hosts.
For RSS, this can actually be implemented quite easily. You only need to cache the RSS feeds in HTTP proxies, and then you need a system with which the server can notify the clients that a new feed is available. And oh, clients would need to actually use these proxies for a change.
I leave open the task of designing a similar system for Web TV, I don't know enough about those protocols.
Using the distributed DNS system, or a system like DNS, we can push RSS content down to local servers. You still have to go to the site for the actual content, but the headlines are distributed.
This would be an ideal solution, since most RSS feeds are a few K. There's room for a lot of RSS in 1 megabyte.
Of course, a caching proxy server would do the same thing.
Slashdot's rate-of-post filter: Preventing you from posting too many great ideas at once.
"Speedfeed" is such a useful thing, it's unfortunate that it's ultimately just very stupid.
Yeah, it is stupid, which is why most of us just call it RSS.
Why can't I moderate something "Wrong" or at least "Grossly Misinformed"?
All RSS feeds should look at implementing PUBSUB instead. RSS's major flaw is that it has to constantly poll sources to find updates. This is what creates the DOS effect. If they instead used a PUBSUB method (google if you'd like to know more) those that have subscribed would be notified of an update.
Am I the only one who finds it easier to get the information I want from the home pages of the sites I trust, rather than relying on an RSS feed? For one thing, in an RSS feed every story has the same priority ... stories keep coming in and I have no idea which ones are "bigger" than others. Sites like News.com, on the other hand, follow the newspaper's example of printing the headlines for the more important stories bigger. With RSS, it's just information overload, especially with the same stories duplicated at different sources, etc. Everyone seems really excited about RSS, but when I tried it I just couldn't figure out how to use it such that it would actually give me some real value vs. the resources I already have.
Breakfast served all day!
People, if you are going to serve up a popular RSS feed, use a separate server (or servers). You can't control the clients so you have to be prepared to handle the worst case. Other than bandwidth, an RSS feed should never cause your site to stop handling requests for your site. Be prepared to put a caching appliance in front of your RSS feed if it's *really* popular (/.?)
That said, the client software should poll at intervals related to the start of the application and they should not retrieve the RSS unless it has changed since the last retrieve (hint: use HTTP HEAD). Developers should be shot if they are too lazy to implement these simple 'net friendly features.
I was just thinking the exact same thing. This seems like it's perfect for P2P. Query the trackers for a feed aged X minutes or less; if no matching feed is available, go to the original feed site and seed a new copy.
- Despite popular opinion, I am not perfect.
Hmmm. I'm neck-deep in DNS code anyway; is there any interest in a protocol that would encode update times -- probably not the updates themselves -- in DNS?
The concept is that every time you updated your blog, you'd do a Dynamic DNS push to a RSS name, say, rss.www.slashdot.org's TXT record, containing the Unix time in seconds of the last update (alternatively, and this is how I'd probably implement it in my custom server, lookups to rss.www.slashdot.org would cause a date-check on the entry). The TTL of the DNS entry could be increased to limit the update frequency of clients.
If this is cool (I'm sure some RSS dev's are trolling these comments), throw me an email or reply here. I'll do the server side if someone will integrate support for it into their client.
--Dan
RSS is great, you can easily look over 20 news sources quickly, in a common format. Unfortunately people update feeds in 5 minute intervals, but honestly I would say I hit slashdot every 20 minutes with or without rss. People want information quickly, and this creates hits regardless. This growth is good.
Better than doing a HEAD first to see if the feed has been updated is to use the If-Modified-Since and/or ETag headers. If the feed hasn't been updated, the server sends a very small response saying so (roughly the same size as the response to HEAD), and doesn't send the whole feed--that all happens in one request/response. Doing HEAD first, and then GET if the feed has been updated requires two requests and two responses any time the feed has been updated.
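The conditional GET exchange described above can be sketched without any networking: the client remembers the validators from its last fetch and sends them back, and the server answers 304 with an empty body when nothing has changed. The helper names `conditional_headers` and `handle_get` are hypothetical, and the server side is deliberately simplified to an exact match on a single ETag (real servers also handle lists of tags, weak validators, and If-Modified-Since).

```python
def conditional_headers(etag=None, last_modified=None) -> dict:
    """Build the request headers a polite reader sends on a repeat fetch."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def handle_get(request_headers: dict, current_etag: str, feed_body: bytes):
    """Return (status, body): 304 with no body if the client's copy is
    current, otherwise 200 with the full feed."""
    if request_headers.get("If-None-Match") == current_etag:
        return 304, b""
    return 200, feed_body


if __name__ == "__main__":
    body = b"<rss version=\"2.0\"></rss>"
    print(handle_get(conditional_headers(), '"v1"', body))            # first fetch
    print(handle_get(conditional_headers(etag='"v1"'), '"v1"', body))  # unchanged
```

Either way it is one request and one response, which is the advantage over HEAD followed by GET.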
Convert RSS to HTML - integrate webfeeds into your website
That, and I don't grant the premise that "it's ultimately just very stupid." It's no stupider than a web browser or anything else when set on default settings. I think it could be argued that RSS actually reduces overall traffic ... not to mention that I don't think RSS has the sort of traction people are guessing it does, at least, not yet anyway.
I'm using Liferea version 0.5.1 under Linux right now. Compiles from source fine on Fedora Core 2 and has worked great for me so far.
bbh
Wasn't there a /. article a while back about one of the ntp servers out there (some .edu in Washington IIRC) that was getting DDoS'd by a bunch of home-user grade DSL/Cable routers updating their clocks all the time? Isn't this basically the same problem?
Don't blame me, I voted for Kodos
There's a variety of ways to deal with this issue. The solution many seem to be suggesting is to randomize request times so that there aren't big spikes in traffic every hour at the hour. That's certainly a good idea. Clients should also respect the ttl (polling at the interval that is listed in the feed), support conditional GET, and handle 304 (not modified) responses to minimize the number of requests they make for the full feed.
But the primary solution will end up being caching. With the exception of personalized RSS feeds, RSS feeds easily can be cached. Web-based RSS readers like Bloglines and My Yahoo already only read the RSS feed once, cache it, and display it to multiple readers. But popular RSS feeds are also easily proxy cached just like web pages, reducing the load on the original source servers.
Overall traffic isn't what anybody is complaining about- as I noted, the 503 errors seem to come at the top of every hour (I just got through not being able to read slashdot for a few minutes), which means, essentially, slashdot is receiving a slashdotting. Do I know that RSS is doing it? Not from this location which has limited investigation tools or capability to figure out what's really going on. But it might explain recent behavior of the site.
SJW: a person who perceives an injustice, and while correcting it, commits a greater injustice.
To quote their webpage:
a blog with unlimited bandwidth
blogs are software systems that allow you to easily post a series of documents to your website over time. Many people use blogs to display daily thoughts, rants, news stories, or pictures. If you run a blog, your readers can return to your site regularly to see the new content that you have posted. Before blogs came along, maintaining a website (and updating it regularly) was a relatively tedious process. Some might call blogging a social revolution---even if you do not buy the hype, you must admit that blogs are causing quite a stir.
kast is similar to a blogging system in that you can use it to regularly "post" new content to a group of readers. Of course, a blog, like any website, has limited bandwidth. Thus, the kinds of content you can post to a blog are usually limited to text and pictures, especially for popular blogs that are read by many people. By leveraging the distribution power of konspire2b, you can use kast to post files of any size to essentially as many readers as you want.
on your blog, you might have a "picture of the day". On your konspire2b channel, you can have a "movie trailer of the day" or even a "gnu/linux distribution of the day". Bandwidth limitations are essentially taken out of the equation.
and, thanks to kast's web-based user interface, you can use HTML comments to describe each broadcast and link back to relevant information on the web. In fact, the layout of kast's "received folder" interface almost looks like a blog.
That sounds an awful lot like fixing the inherent problem with RSS!
Jabber could work nicely for this type of thing.
Providers set up a Jabber presence for their individual RSS feeds.
Clients subscribe to Providers RSS presence.
Provider generates new news, and makes an announcement via its jabber presence. This could even include all of the information that is normally in a RSS feed, making the need for RSS unnecessary!
Clients then can repoll the RSS for new news if they like.
If there is a problem with too many subscribers at once... then you limit the number of clients that can subscribe to any individual RSS jabber presence. If you try and subscribe to a full account, you get an auto-message informing you of the currently open account. The provider sets up 1 RSS jabber presence per 1000 subscribers for instance, and then only announces the new news to each jabber account every 5 or 10 min, spreading out the hit on the RSS file.
ever thought about that...?
i have found, you can find,happiness in slavery!
Every hour, Infoworld "sees a massive surge of RSS newsreader activity" that "has all the characteristics of a distributed DoS attack."
If one RSS file is parsed by tens of thousands of readers every hour on the hour, how is that anything like "all the characteristics" of a DDoS? If only real DDoSes REALLY had all the characteristics of an RSS reader (i.e., loads one document at a specific interval, and doesn't do anything else for another hour).
Others have already mentioned that RSS is an attempt to fake a "push" in a technology that is all "pull".
I have what to my 10 minutes of thought on the subject appears to be a better solution - every web site that currently publishes an RDF page should instead push new entries to an NNTP newsgroup. I'd suggest that a hierarchy be created for it, then sort of a reverse of the URL for the group name, like rdf.org.slashdot or rdf.uk.co.theregister. Then the articles get propagated in a distributed manner and people read a copy on their nearest news server instead of hammering your web site over and over looking to see if there are updates.
Feel free to tear this idea to shreds.
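For what it's worth, the "reverse of the URL" naming is easy to pin down. A minimal sketch, assuming the group name is just the hostname labels reversed under an rdf. prefix; the function name feed_newsgroup and the decision to drop a leading "www" are my own assumptions, not part of the proposal:

```python
def feed_newsgroup(hostname, prefix="rdf"):
    """Map a site hostname to a newsgroup name by reversing its labels."""
    labels = hostname.lower().split(".")
    # Assumption: a leading "www" carries no information, so drop it.
    if labels and labels[0] == "www":
        labels = labels[1:]
    return ".".join([prefix] + labels[::-1])


print(feed_newsgroup("slashdot.org"))
print(feed_newsgroup("www.theregister.co.uk"))
```

The two calls give rdf.org.slashdot and rdf.uk.co.theregister, matching the examples in the comment.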
The next Cmdr Taco duplicate will be ready soon, but subscribers can beat the rush and see it early!
Read this for some more thoughts on this..
Ya I think its great, no ads.
"I use a Mac because I'm just better than you are."
..but it doesn't help if the clients don't support either vanilla RSS syndication tags (ttl, skipDays, skipHours) or the tags defined by the optional syndication module (updatePeriod, updateFrequency, updateBase).
But even if every client obeyed these and used and respected appropriate HTTP headers (If-Modified-Since, Last-Modified, Expires), it would only make the request flood more synchronized. On the other hand, if the RSS generator randomized the syndication settings, it could distribute the load better and even preemptively shift load off the peak times.
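A rough sketch of what "randomizing the syndication settings" could mean on the generator side, assuming an RSS 2.0 feed whose ttl element is in minutes. The names jittered_ttl and ttl_element and the 25% spread are invented for illustration, and it only helps if the value is re-rolled per response (or at least per regeneration) rather than baked into one static file forever:

```python
import random


def jittered_ttl(base_minutes, spread=0.25, rng=random):
    """Pick a ttl (minutes) uniformly within +/- spread of base_minutes."""
    low = max(1, int(base_minutes * (1 - spread)))
    high = max(low, int(base_minutes * (1 + spread)))
    return rng.randint(low, high)


def ttl_element(base_minutes=60):
    # Each response carries a slightly different <ttl>, so clients that
    # honor it drift apart instead of all coming back in lockstep.
    return "<ttl>%d</ttl>" % jittered_ttl(base_minutes)
```

With the defaults, a nominal 60-minute feed hands out values between 45 and 75 minutes.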
rss=real slow syndication ... ? :)
My Linux box is mostly there to sneer at, so I haven't gotten around to setting up anything over there yet.
Liberty in our Lifetime
Use FeedBurner as your public newsfeed to let their smart servers handle the brunt of the attack, plus you get stats and format independence (publish both RSS & Atom from 1 feed).
If the problem is not really one of bandwidth but of server speed, then have your scripts update some static file instead of generating the thing on the fly. Have the server cache the static file in memory, and then it can serve it out nearly instantly.
If you have a PHP script generating an RSS XML document every time anybody hits it, you're just begging to be DoS'd.
- Give a man a fire and he's warm for a day, but set him on fire and he's warm for the rest of his life.
"When RSS Feeders Attack".. news at 11:00..
Since it is periodic "burst" bandwidth (as opposed to regular visitor traffic), they might be able to provide a better deal than what Slashdot has to pay their provider for the extra bandwidth accommodation. And with all these RSS sites, there's enough of a market for an Akamai-like service to code some kind of RSS proxy server and to offer this package.
Or better yet, force everyone to upgrade to an RSS client that randomizes within 15 minutes the times it checks in.
That is mind-bogglingly inefficient. It's like POP clients checking for new email every X minutes. Polling is wrong wrong wrong! Check out the select() libc call. Does the Linux kernel go into a busy-wait loop listening for every ethernet packet? No! It gets interrupted when a packet is ready!
http://www.mod-pubsub.org/
The apache module mod_pubsub might be a solution.
From the mod_pubsub FAQ:
What is mod_pubsub?
mod_pubsub is a set of libraries, tools, and scripts that enable publish and subscribe messaging over HTTP. mod_pubsub extends Apache by running within its mod_perl Web Server module.
What's the benefit of developing with mod_pubsub?
Real-time data delivery to and from Web Browsers without refreshing; without installing client-side software; and without Applets, ActiveX, or Plug-ins. This is useful for live portals and dashboards, and Web Browser notifications.
Jabber also saw a publish/subscribe mechanism as an important feature.
Take a look at Konspire. It has a lot of the properties that you describe. They claim they are more scalable than Bittorrent.
The folks over at Netscape and/or UserLand should have studied the CDF standard first. Then they would have realized the value of specifying schedule information.
Emacs GNUS, you can even read /. through it!
If you use Firefox I would recommend a Firefox extension called Sage (http://sage.mozdev.org/) to view RSS feeds.
There's a two-fold effect to this problem, that even a PUSH solution would not solve. With everyone simultaneously grabbing, you have to deal with the initial precursor blast of traffic (RSS fetch), and then you have to deal with the big shock wave of people coming in to get the actual content (content fetch).
A Push method may stop the precursor, but you're still going to have to deal with everyone jamming into your site at the same time... probably even worse because if it became a 'standard' for clients, you would be faced with a lot more simultaneous content fetches than with a mixed Pull-on-the-half-hour/Pull every 30 mins crowd.
I feel that the best method is to enforce RSS frequency through the delivered XML (I was actually quite dumbfounded when I didn't find that in the RSS 2.0 spec), and to have clients not operate on the hands of the clock, but to be distributed based on app start time. Additionally, site designers should be implementing caching and quick-delivery schemes for their RSS feeds, and be using HTTP headers w/ content expiration times (not that you can fully expect all clients to adhere to them).
- JR
Instead of IRC, why not Usenet? (Or perhaps more like Usenet back in the days of UUCP.)
One line blog. I hear that they're called Twitters now.
I use SlashDock on Mac OS X. It's really great, because I just right-click on the icon in the dock when I want to check for new developments on my favourite websites. It's very convenient, especially when at work. Oh yeah, you can choose the update frequencies individually, and it starts the schedule on app launch, so it's randomized properly too.
Load-balancing between web servers is eventually limited by how many ports you can have open at once - depending on how you adjust your settings, anywhere from 10,000 to nearly 60,000 per IP address.
So, five front-end load balancers, with twenty IP addresses assigned to each, and a couple of spares to take over if one fails. That gives you what, six *million* concurrent connections? Your bandwidth issues/bills are going to GREATLY eclipse your serving capacity at that point.
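The back-of-the-envelope number checks out. A tiny sketch of the arithmetic, using only the figures from the comment (5 balancers, 20 IPs each, 10,000 to 60,000 ports per IP); the function name is made up:

```python
def max_concurrent_connections(balancers, ips_per_balancer, ports_per_ip):
    """Upper bound on simultaneous connections across all front-end balancers."""
    return balancers * ips_per_balancer * ports_per_ip


low = max_concurrent_connections(5, 20, 10_000)   # conservative port setting
high = max_concurrent_connections(5, 20, 60_000)  # near the top of the range
print(low, high)
```

That is one million connections at the conservative setting and six million at the high end, before counting the spares.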
Then it's just a matter of shoving cheap, non-redundant, commodity machines in a rack to handle the requests. Say, a bunch of low-power, tiny Epia-based machines all pulling from a central file server.
So, someone updates the RSS file on the file server. Each of the 5/10/100/however-many front-end servers access it once, at which point it goes into file cache. Then they all dish it out like crazy.
Yes, this is an expensive solution: But I like it. I like it because it puts the cost on the person that wants to distribute the RSS feed. Other solutions, like uploading it to NNTP servers, shifts the cost to nearly everyone BUT the provider of information.
steve
Oh, you're not stuck, you're just unable to let go of the onion rings.
I use Shrook, a lovely RSS reader for MacOS X. It uses distributed checking to get around this problem. From their FAQ:
A central server maintains a database of when each channel was last updated. To keep it up to date, every so often, the server chooses a computer to check for new items and report back. The frequency of this varies from every 5 minutes for popular channels, to every half hour for channels with only one online subscriber, and it tries to use a different computer each time. At the other end, each copy of Shrook checks in with the server every 5 minutes, and if any of its channels are out of date, it reloads them.
Nice. So not only does it stop DDoSing the web server, it means I get updates within five minutes instead of every half hour.
--
Karma: Chameleon (you come and go)
Bloglines avoids this problem completely by only fetching a feed once per iteration, regardless of the number of subscribers. We're also able to provide subscriber stats to feed publishers, something that you can't do with desktop aggregators. And no messy software to install.
I won't argue with those who have posted here that some alternative to the "pull" technology of RSS would be very useful. But...
The biggest problem I see isn't newsreaders but blogs. Somebody throws together a blog, inserts a little gizmo to display one of my feeds & then the page draws down the RSS every time the page is reloaded. Given the back-and-forth nature of a lot of folks' web browsing pattern, that means a single user might draw down one of my feeds 10-15 times in a 5 minute span. Now, why couldn't the blogger's software be set to load & cache a copy of the newsfeed according to a schedule?
The honorable mention for RSS abuse goes to the system administrator who set up a newsreader screen saver that pulled one of my feeds. He then installed the screen saver on every PC in every office of his company. Every time the screen saver activated, POW! one feed drawn down...
"Obviously, I'm not an IBM computer any more than I'm an ashtray" (Bob Dylan)
as already mentioned, HTTP/1.1's If-Modified-Since header field would be a first hint
another way of minimizing the DDoS would be to use the Syndication module, which informs client programs when to update the feed - this would still mean that everyone tries to GET the RSS at the same time, but what about randomizing updateBase?
get out more.
Opera now has RSS built in. Just click on an rss link and it automatically adds it to your list of newsfeeds.
where the hell is all that Pull technology now?
eventing/callbacks/message passing/[async technology of choice] are always the better option.
Random intervals. I already patched my desktop RSS reader to request a new feed every 73 +/- 13 minutes.
There you are, staring at me again.
Netcraft actually confirms it!
But isn't this what TCP/IP multicast was invented for? I've never really understood why multicast has never really taken off. Too complicated? Instead of entering an rss server to pull from just join a multicast group and have the RSS blasted once every X minutes. Servers could even send out updates more often because there are only a few connections to send to. Of course I could be completely wrong and multicast may be the absolute wrong choice for this sort of application, it's been a while since I've read any documentation about it.
(B) + (D) + (B) + (D) = (K) + (&)
I recommend PulpFiction for an RSS/Atom reader on OS X. I much prefer the interface and how it treats the news compared to NNW.
All editorial writers ever do is come down from the hill after the battle is over and shoot the wounded.
It seems a few people here don't get it. RSS is the file format, not the transfer via HTTP. The whole pull problem is a problem with HTTP; in theory you could make an IRC-like protocol and transmit via that, solving some of the subscription, distribution and pull problems.
If you don't use one computer all the time and you want to check your feeds from other places, I'd recommend going with a web-based news-aggregation service. I personally use BlogLines, but there are other services out there as well.
I get this error now when my user cookie is set, no matter what time of the day. Quite irritating.
The main problem here is that RSS lacks any sort of distributed flow control, much as the Internet did back in the early days with tons of UDP packets flying around everywhere and periodically bringing networks to their knees.
One completely backwards-compatible fashion to add flow-control to RSS would be to use the HTTP 503 response when server load is getting too high for your RSS files. The server simply sends an HTTP 503 response with a Retry-After header indicating how long the requesting client should wait before retrying.
Clients that ignore the retry interval or are overly aggressive could be punished by further 503 responses thus basically denying those aggressive clients access to the RSS feeds. Users of overly aggressive clients would soon find that they actually provide less fresh results and would place pressure on implementors to fix their implementations.
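A minimal sketch of the server-side decision, assuming the server already has some load number to compare against a threshold. The function name, the load measure, and the randomized 5 to 15 minute Retry-After range are my assumptions; the comment itself only asks for a 503 plus a Retry-After header:

```python
import random


def flow_control_response(current_load, max_load, feed_body, rng=random):
    """Return (status, headers, body) for one feed request.

    current_load and max_load are whatever load measure the server
    already tracks (requests in flight, load average, ...).
    """
    if current_load > max_load:
        # Retry-After is given in seconds; randomizing it is an extra
        # touch so deferred clients do not all return at the same instant.
        retry_after = rng.randint(300, 900)
        return 503, {"Retry-After": str(retry_after)}, b""
    return 200, {"Content-Type": "text/xml"}, feed_body
```

The overloaded path sends no body at all, which is the whole point: turning a client away costs a few header bytes instead of a full feed.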
OmniWeb 5.0 has an RSS reader built right into the bookmarks manager, which is really neat. If only they could let us change the length of the headlines that show up in the Dock menu to something greater than just 30 characters...
Interesting. Today at work I was browsing and was getting errors constantly at Slashdot. Pages would load but not have backgrounds, I would get "you should not be here" etc errors. This was IIRC about noon. Slashdot in general has just seemed less responsive. The only time it flies anymore is late at night.
413 Request Entity Too Large
The server is refusing to process a request because the request entity is larger than the server is willing or able to process. The server MAY close the connection to prevent the client from continuing the request.
If the condition is temporary, the server SHOULD include a Retry-After header field to indicate that it is temporary and after what time the client MAY try again.
This suggests to me an overwhelmed RSS server could return a 413 error code with a randomized Retry-After header on the order of a couple of minutes. It seems to me that such a reply would help lessen the load on the server, as it wouldn't have to deal with content, even if said content was already cached.
If the problem is simply that of having too many open connections, then yeah, I guess you're hosed.
You're recommending a Microsoft spec on slashdot? Are you new? :D
Quite simply, your reply is good for the next two years or so, or maybe even ten years. However, it doesn't address the scalability issue, which is a concern in and of itself.
To address the scalability issue, perhaps there should be a distributed response network, to handle the distributed overload.
One idea? The RSS news feeder should always store the IP address of computers that it sends its most recent feed to, and append a random IP of one of the other computers that downloaded its feed.
Now, the news reader then turns around and the next time it goes to download its feed, it first asks one of the other sources. Three failures, and a source gets thrown out. Nonetheless, there will build up a distributed network of computers that first look for their feeds from another source.
Now, that brings along with it the question of verifying the RSS feed. I mean, how do you know that the RSS news feed doesn't direct you to the latest Windows Spam Takeover Site? However, that can be handled with another kind of RSS or similar technology, namely PGP.
Now, this all involves new technology. But the technology isn't all that difficult -- it's been done before, in bits and pieces. So it could be done again, I suppose. The real question, to me, is whether it's all worth it.
Correct Horse Battery Staple: 72 bits of entropy. Enter "Correct H" into google. When it generates the phrase, that's
Many sites withstand slashdotting that involves movies, images and dynamically generated pages, so this kind of problem can only result from extraordinary stupidity of both client and server.
Start by running RSS reader on a cheaper separate server hosted by another ISP. If clients connect at random times, great. If they connect exactly on the hour, the ones that get through will only get the "news" about an RSS reader that will fix that problem.
NNTP already has all the issues solved pat.
DNA just wants to be free...
[sarcasm]That can't possibly be true - it's a client Java app![/sarcasm]
Galileo: "The Earth revolves around the Sun!"
Score: -1 100% Flamebait
The real problem is that RSS is a file format, not a protocol. This means that if the client wants "all entries since X" it'll get "the last Y", which might not include everything it wants or might be way too much. RSS and Atom need to be complemented by a protocol. This protocol could very well be based on RSS or Atom but include the most trivial feature of letting the client specify what it wants.
www.akamai.com
Seriously. You've got a bunch of people trying to download the exact same file from all over the world. This problem has been solved.
I think the suggestions for a whole new RSS specific protocol are somewhat off base. To me, the beauty of an RSS feed is that you don't really need anything special to supply one, just a web server, which you're probably already running, and a script to actually generate the file. I don't think there'd be a lot of success if RSS required special protocols etc. Maybe if you solve the general case of providing a push service and then distribute RSS data over that, but that's a lot of work. A previous poster mentioned using SMTP, and that could be an interesting solution.. Mail updates to an aggregation service and then read via IMAP? I don't make heavy use of RSS, so I don't know if that's a useful way to look at the data or not.
Having the server notify the clients that something has changed doesn't really fix the problem (think thundering herd trying to acquire a mutex), unless it is smart enough to delay notification to clients to even out the load.
I think the real solution is to have the server dictate when the client should check back, and enforce that time delay. If the server just asks nicely, many clients will ignore that. My solution is to use one time passwords that are only valid in a certain window. With the RSS data, the client gets the next password and the window. The passwords would be randomly generated and the algorithm for choosing the window could be arbitrary. Regular HTTP AUTH would be sufficient here. A cron job could manage the list of valid passwords. You'd still have to provide an unprotected stream to get the first password or if the window has expired. To prevent abuse of this, you could limit accesses per IP per day, delay the first window, provide a smaller (empty?) stream, etc.. This could probably provide a backwards compatibility path too. Maybe make it a degraded stream to encourage users to upgrade their clients, or have the first article be "Update your client!" or something..
Just a few thoughts.. Anyone care to comment?
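To make the one-time-password idea a bit more concrete, here is a sketch of just the bookkeeping, assuming times are plain seconds and the HTTP AUTH plumbing lives elsewhere. The class and method names are invented, and expired passwords are dropped on first use rather than by the cron job the comment mentions:

```python
import secrets


class WindowedPasswords:
    """Issue one-time passwords that only work inside an assigned time window."""

    def __init__(self):
        self._valid = {}  # password -> (window_start, window_end)

    def issue(self, now, delay, width):
        """Hand out a password valid from now+delay for `width` seconds."""
        password = secrets.token_hex(8)
        window = (now + delay, now + delay + width)
        self._valid[password] = window
        return password, window

    def redeem(self, password, now):
        """True if the password is known and `now` falls inside its window."""
        window = self._valid.get(password)
        if window is None:
            return False
        start, end = window
        if now < start:
            return False           # too early: the password stays valid for later
        del self._valid[password]  # used or expired: either way it is gone
        return now <= end
```

The server would send the next password and window along with each feed, and choose `delay` however it likes to spread clients out.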
I use an RSS reader; it is a heavily modified version of rnews, which I customized for my own needs.
What my RSS reader does is it limits how often I make the request. So, it won't make a request until X minutes after the last time I made a request.
To be a good netizen, I don't set that to anything less than 30 minutes.
But the real beauty is that if I don't bother looking at the news for, say, 3 hours, it won't bother retrieving it. It will retrieve it when I look (so what if I have to wait a few seconds for it to download the feed?) and resets the timer to not check (no matter how often I reload the page) until X minutes have passed.
If people adopt an "on-demand" policy instead of scheduled, it should help push out the lifespan of this tech in its current form.
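The on-demand policy is only a few lines. A sketch under the assumption that the reader calls something like view() whenever the user actually looks at the page; the class name, the fetch callable and the injectable clock are illustrative and not taken from the modified rnews:

```python
import time


class OnDemandFeed:
    """Fetch only when the user looks, and never more often than min_interval seconds."""

    def __init__(self, fetch, min_interval=30 * 60, clock=time.time):
        self.fetch = fetch                # callable returning the feed contents
        self.min_interval = min_interval  # 30 minutes by default
        self.clock = clock
        self.last_fetch = None
        self.cached = None

    def view(self):
        now = self.clock()
        if self.last_fetch is None or now - self.last_fetch >= self.min_interval:
            self.cached = self.fetch()
            self.last_fetch = now         # the timer restarts at the actual fetch
        return self.cached
```

Nothing runs on a schedule here, so a reader left idle for three hours costs the server nothing during that time.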
Diplomacy is the art of saying, "Nice doggie!" until you can find a rock.
Authors of RSS feed-grabbing software should do the responsible thing: allow the user to set a desired refresh-rate down to as low as say 1 hour, but always use a random 30 minute window for the actual refresh time.
So, if the user specifies a two hour refresh time, and the application just got done pulling a feed, it should sleep for 105 minutes plus a random amount of time between 0 and 30 minutes, which means the feed is actually updated randomly in the window of 1:45-2:15 after the previous update.
At the very very least if they don't do the above, they should at least base their refreshes on time intervals since startup (or since the last pull), so that you don't see the global synchronizations on
I use the "random 30 minute window" technique similarly at the office to distribute the load on an rsync server and it works wonderfully (all the machines in our whole environment wake up on a cronjob at a wee hour of the morning, each sleeps for a random 0-30 minutes time interval and then fires off its rsync request - the result is that rsync load is distributed evenly over a 30 minute period instead of the server getting pounded into the dust for 5 minutes straight.).
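The "random 30 minute window" rule can be written down directly. A minimal sketch, with the function name and the choice of minutes as the unit being mine:

```python
import random


def next_refresh_delay(refresh_minutes, window_minutes=30, rng=random):
    """Minutes to sleep before the next pull.

    The delay is uniform over a window centered on refresh_minutes,
    e.g. a 120 minute setting gives something between 105 and 135.
    """
    start = refresh_minutes - window_minutes / 2
    return start + rng.uniform(0, window_minutes)
```

For the two-hour example this is exactly "sleep 105 minutes plus a random 0 to 30", i.e. a pull somewhere between 1:45 and 2:15 after the previous one.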
11*43+456^2
this has already been the topic of much discussion
It'd be pretty amusing to read, and helpful to figure out if rendering or other problems (like the recent rash of 503s) have anything to do with me or if everyone gets them.
I think this was more or less the first thought I had about RSS when I first looked into it and found out that it was a "pull" technology rather than a "push" as the early descriptions of it implied.
Yes, it's "cool" that I can set up a page (or now use a browser plug-in) to automatically get a lot of content from hundreds of web pages at a time when I really opened up the browser to check my e-mail.
What would have REALLY been cool would be some sort of technology that would notify me when something CHANGED. No effort on my part, no *needless* effort on the server's part.
Oh wait... We HAD that, didn't we? I think they were called Listservers, and they worked just fine. (Still do actually, as I get a number of updates, including Slashdot, that way.) RSS advocates (and I won't mention any names) keep making pronouncements like "e-mail is dead!" simply because they have gotten themselves and their hosting companies on some black hole lists. Cry me a river now that your bandwidth costs are going through the roof and yet nobody is clicking through on your web page ads, because, guess what? Nobody is visiting your page. They have all they need to know about your updates via your RSS feeds.
The reason why RSS took off is because everyone has a friggin' web server. Most ISPs throw in some web space with your dial-up/DSL account. They don't throw in a multicast server.
RSS wasn't originally invented to handle InfoWorld's traffic. It was a blogging thang. Most blogs don't get that much traffic so it's a fine solution.
This has been known since the beginning of networking.
If you overload a resource, you back off - EXPONENTIALLY and RANDOMLY DISTRIBUTED. The more congested, the longer you back off. And you do NOT want lockstepping, so you add a random component.
Do RSS servers know how to tell the clients "I'm congested"?
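For reference, a sketch of exponential, randomized backoff on the client side. The base of 60 seconds, the one-hour cap, and the choice to draw from the upper half of the range are assumptions for illustration; what counts as "congested" (a 503, a timeout) is left to the caller:

```python
import random


def backoff_delay(failures, base=60.0, cap=3600.0, rng=random):
    """Seconds to wait after `failures` consecutive congested responses."""
    # Exponential part: the ceiling doubles with every failure, up to the cap.
    ceiling = min(cap, base * (2 ** failures))
    # Random part: pick a point in the upper half of the range so clients
    # that failed together do not retry together.
    return rng.uniform(ceiling / 2, ceiling)
```

A client would reset its failure count to zero after the first successful fetch.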
We already know the data is going to be small, with lots of clients wanting to get the information -- seems like something that logically should be p2p.
Something simple, like the first time you connect, you try the main server, and are given a list of partners available to get the feed from. The next client does the same thing, and now you're one of the list given to them. If all your partners are unavailable, or none of them have the data, you connect back to the server and start over.
RandomAndInteresting.com - defending the world from stupidity since 1979
This isn't hard. RSS content in Infoworld's case isn't personalised (i.e. tailored to a given user) -- so why not just push out static RSS files to a web server file system when content which is included in the RSS feed changes?
In which case, if Infoworld's web server can't keep up with a bunch of GET and HEAD requests for static content then perhaps Infoworld should think about getting a new static content hosting provider.
Now, I'd be betting that Infoworld calculates the RSS request on every hit -- and there lies the problem.
(In fairness, RSS clients should support e-tags and Last-Modified headers and gzip compression -- any clients not doing this are brain dead -- but even then if you can't handle a few extra hits every hour from every reader of your website, you have a fundamental problem with your infrastructure -- RSS can be improved, sure, but don't blame RSS for a lack of infrastructure or serving it dynamically each time.)
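Here is roughly what the server side of ETag / Last-Modified support amounts to. A simplified sketch: the function names are invented, the ETag is just an MD5 of the body, and If-Modified-Since is compared as an exact string rather than parsed as a date, which a real server would do more carefully:

```python
import hashlib


def make_etag(body):
    """Derive a quoted ETag from the feed bytes."""
    return '"%s"' % hashlib.md5(body).hexdigest()


def conditional_response(request_headers, body, last_modified):
    """Return (status, headers, body); 304 with no body if the client's copy is current."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Last-Modified": last_modified}
    if request_headers.get("If-None-Match") == etag:
        return 304, headers, b""
    # Only fall back to the date check when the client sent no ETag at all.
    if ("If-None-Match" not in request_headers
            and request_headers.get("If-Modified-Since") == last_modified):
        return 304, headers, b""
    return 200, headers, body
```

A well-behaved reader polling an unchanged feed then gets a handful of header bytes back instead of the whole document.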
That's still way too bloated to scale properly to the multiple thousands of users you're talking about. Anything that requires a separate network connection for each user, and/or requires the server to keep track of all the "listeners" is not very practical.
The better technology for this is multicasting. And specifically the new and much improved multicasting technology built into IPv6.
Neither Windows nor Unix, but I've set up Feed on Feeds on my webserver and I like it!
It's a "PHP/MySQL server side RSS/Atom aggregator", so you can read your feeds wherever you are, you only need a web browser on the client side.
Pros:
1) you don't need to synchronize the state between the multiple workstations you might use.
2) no platform/os problem on the client side.
Cons:
1) you need some web hosting with PHP and MySQL available (I pay 45 a year for my domain name + 30MB Webspace + 30MB FTP + 30MB MySQL base + 100*25MB pop/imap accounts + SSL everywhere).
2) no installer, so you'll need some computing skills to set it up (not that hard).
3) no automated update, you have to click "Update", so you may miss some news when you're offline (i.e. away from any internet access) for a long period...
Changed my online life as I no longer have to install anything on the client side (useful when away from your home/office) or have to synchronize my feeds either with some removable storage (my USB key failed after 250+ daily syncs) or through the net (BottomFeeder, a Smalltalk implementation which works on every platform I ever came across, allows syncing with an FTP location).
Regards,
Poulpy.
Email me if you're an RSS developer.
I just downloaded and installed it. And I'm using it. It is pretty good. I was able to import my subscriptions from Bloglines into RSSOwl. It uses Mozilla 1.4 as a built-in browser on Linux, and IE 5+ on Windows. Neat.
And, BTW, client-side Java is pretty good. I'm happy to see an SWT-based GUI application other than the Eclipse IDE itself. It's a proof-of-concept (and you have the source). Now if you want to write a multi-OS GUI app in Java, you know what to refer to.
what about making a client that just uses bittorrent files to get the RSS data, use 1 random client to get the RSS feed, then he seeds it to all the others requesting using BT, would be pretty awesome and really fast
Excuse me, I don't mean to impose, but I am the ocean
That's why we needed a push technology. Unfortunately, we were too stupid to realise it during the dot-com craze and now Netscape will probably refuse to reimplement it for us...
Future Wiki -- If you don't think about the future, you cannot have one.
True story:
We ran a network operations center to provide support for several hundred servers spread over two continents. Each hour, every server would 'phone home' to see if it needed updates or configuration changes. This was a fairly data-heavy operation, requiring many database lookups. We knew that we didn't want every server calling at the same time, so we had each server derive its own random integer between 1 and 59, and to use that as the minute of the hour to contact the NOC.
Before long we found that the NOC was dragging itself into a death spiral of overwork. The problem? By chance, an unusually large number of servers chose a very small range of numbers. Worse, they just happened to choose numbers close to 05, which just happened to be when some very large cron tasks were running as well.
Try rolling a die 100 times. Even though the odds are the same every time before you roll, the actual frequency of occurrence of the individual numbers is not even. Leaving the choice of retrieval time to the client does not reliably reduce the chance of a server being overwhelmed. In fact, it more or less guarantees traffic spikes.
I'm not intimately familiar with RSS client or server implementations, but I suspect that it would be fairly easy to format a suggested refresh interval and refresh time on the server and send that to the client.
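One way the server could pick the suggested refresh time so that the spread is even by construction rather than by luck: hand out minute slots in rotation. This is only a sketch; the class name is made up, and how the suggested minute reaches the client (a feed element, a header) is left open:

```python
import itertools


class RefreshScheduler:
    """Suggest minute-of-the-hour slots in strict rotation."""

    def __init__(self, slots=60):
        self._next = itertools.cycle(range(slots))

    def suggest_minute(self):
        # Unlike each client rolling its own die, consecutive responses
        # get consecutive minutes, so no slot can fill faster than another.
        return next(self._next)
```

After 600 responses every minute of the hour has been suggested exactly 10 times, which independent random choices would not give you.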
Crumb's Corollary: Never bring a knife to a bun fight.
Even for a poll at hourly intervals this should get staggered across a given hour according to when the client starts. Also, a client should probably not be polling every 3600 seconds (or whatever interval) but polling with a 3600 second gap between end of one poll and start of the next. In this way a loaded server will smear the clients out simply by having slower response, and the load will even out on its own.
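The end-to-start gap is the difference between sleeping after the response arrives and waking on a fixed timetable. A minimal sketch, with the function name and the rounds/sleep parameters added only so the loop can be exercised without waiting an hour:

```python
import time


def poll_with_gap(poll, gap_seconds=3600, rounds=None, sleep=time.sleep):
    """Call poll(), then wait gap_seconds measured from when it finished."""
    done = 0
    while rounds is None or done < rounds:
        poll()              # takes however long the server needs to answer
        sleep(gap_seconds)  # the gap starts only after the response arrives
        done += 1
```

If a poll takes 40 seconds on a struggling server, the next one starts 3640 seconds after the previous start, so slow responses push that client later and later relative to the rest.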
It's always bad to have lots of agents doing things in synchrony when that involves an outside resource. Contact the client authors, give them a clue, let the upgrades push the bugfix out.
Finally, isn't RSS done over HTTP anyway? So why aren't these clients going through their ISP's proxy and doing a conditional GET (If-Modified-Since)? The target server should see only a fraction of the spike even with bad clients. Unless they're very very bad...
None of these things is a direct flaw in RSS, just crap quality of implementation in RSS clients.
Cameron Simpson, DoD#743 cs@cskk.id.au http://www.cskk.ezoshosting.com/cs/
So, it's kind of like an hourly slashdotting. Speaking of which....
They that quote Benjamin Franklin on liberty and safety deserve neither.
Why not add to the content an extra file with details of when the data gets updated, or when the next update will happen, as well as the md5 of the current content, so the client can decide whether or not to download the new content?
So just one request for /get_nextupdatetime.xml and one for /get_content_md5.xml to decide whether to do the whole big download of /get_our_realcontent.xml
Oh, and the server can randomize the result of get_nextupdatetime.xml by jittering the real time by +/- 10 minutes so it's not always exact, and all clients will get slightly different values.
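A sketch of both halves of this, assuming the digest file simply contains the hex MD5 of the feed bytes and the update time is a Unix timestamp. The function names are invented, and the fetching of the two small files is left out:

```python
import hashlib
import random


def content_md5(content):
    """Hex MD5 of the feed bytes, as the server would publish it."""
    return hashlib.md5(content).hexdigest()


def needs_download(published_md5, cached_content):
    """Client side: fetch the big file only if the published digest differs."""
    if cached_content is None:
        return True
    return content_md5(cached_content) != published_md5


def jittered_update_time(real_time, jitter_seconds=600, rng=random):
    """Server side: shift the announced next-update time by up to +/- 10 minutes."""
    return real_time + rng.uniform(-jitter_seconds, jitter_seconds)
```

The small files still get polled, of course, so this trades one big request for two tiny ones plus an occasional big one.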
Liberty freedom are no1, not dicks in suits.
These days it seems that everyone wants to use HTTP for everything and quite frankly it's not equipped to do that.
What else goes through common Internet firewalls as cleanly as HTTP? Many providers provide WWW access at a discounted rate compared to Internet access.
Opera has RSS built in.
A possibility is to try to reinvent the NNTP model over HTTP. Have big "super-nodes" which poll the originating feeds and store the result in some big database, and then allow users to pull down one big feed containing stuff from all of the sources they are interested in.
Of course, there'd have to be something in it for the super-nodes, and I suspect what would happen is that they would charge a nominal fee, or perhaps bundle it alongside some other, similar service. One example of this is LiveJournal, which currently distributes RSS and Atom feed content to any interested LiveJournal user via the "friends page" mechanism from a single database, so there's only one poll per hour (or so). All they need now is some kind of feed of the "friends view" and you have a special version of a feed distributor with some value-add: you get all of your LiveJournal friends' content in there too.
Bare RSS isn't set up for this since it can't support per-item source information, but Atom can do it and RSS can be extended through namespaces to contain the relevant info as long as it becomes popular enough that clients support it.
In fact, if this were to be done, it would also be useful to have an optional "intelligent poll" mode where the client tells the server a magic token it got on its last poll which the server can then use to give a delta feed rather than a fixed feed. This would have to be optional, since the CPU burn of it vs. just copying a static file to a socket would probably only be a win on big sites like Slashdot and BBC News Online.
In fact, it looks like the Atom guys already thought of all this. Check out AggregateFeeds, SuperAggregator and the overly-long PossibleHTTPExtensionForEfficientFeedTransfer entries in the Atom Wiki. I didn't read it all through in depth, but it looks like they're talking about the same thing I'm talking about.
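The "intelligent poll" with a magic token might look something like the following. This is purely a sketch of the idea, not of anything in the Atom Wiki: it assumes each item has an increasing serial number, that the token is simply the highest serial the client has seen, and that a client without a token gets the usual fixed-size feed:

```python
def delta_feed(items, token=None, full_size=15):
    """Select entries for one poll.

    items: list of (serial, entry) pairs in ascending serial order.
    token: highest serial the client saw on its last poll, or None.
    Returns (entries, new_token).
    """
    if token is None:
        selected = items[-full_size:]  # fixed feed: just the latest few
    else:
        selected = [(s, e) for s, e in items if s > token]  # only what is new
    new_token = items[-1][0] if items else token
    return [entry for _, entry in selected], new_token
```

A client that is already up to date gets an empty list back, which is where the bandwidth saving would come from.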
I don't use RSS (and Atom) for reading the news, I use it to monitor posts in people's weblogs and sometimes even the comments to them if the content producer has been nice enough to make those available.
I agree that using RSS to read "real news" is like trying to read someone else's newspaper from the other end of a train car.
I'm happy to see an SWT-based GUI application other than the Eclipse IDE itself.
Maybe you should try Azureus, for your BitTorrent needs (I know, many are happy with btdownloadcurses :P this is just an example of another SWT-based app)
http://blogs.law.harvard.edu/tech/bitTorrent
Heh, just like the car alarm. A great idea that never should have been invented. One of those, "What were they thinking?" kinda things.
--
If I actually could spell I'd have spelled it right in the first place.
http://www.wired.com/news/infostructure/0,1377,62651,00.html
http://blogs.law.harvard.edu/tech/bitTorrent
http://slashdot.org/article.pl?sid=04/06/21/0150243
http://www.digitalbloc.com/200403/rss-bittorrent-non-stop-downloading.shtml
The site serving RSS could always report Status: site busy try later, and RSS readers could come back and have another go later.
After all, it's not like a user is actually reading his RSS feeds on the hour. A manual refresh will reset the back-off and try again.
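Here's a minimal sketch of the reader side of that, in Python. It assumes the server signals "busy" with an HTTP 503, optionally with a Retry-After value in seconds; the `FeedPoller` class and its default intervals are made up for illustration and don't describe any existing reader.

```python
class FeedPoller:
    """Hypothetical polling schedule for one feed."""

    def __init__(self, base_interval=3600, max_interval=24 * 3600):
        self.base_interval = base_interval   # normal hourly poll, in seconds
        self.max_interval = max_interval     # cap for our own back-off
        self.busy_count = 0                  # consecutive "busy" answers

    def next_delay(self, status, retry_after=None):
        """Seconds to wait before the next automatic poll."""
        if status == 503:
            self.busy_count += 1
            delay = min(self.base_interval * 2 ** self.busy_count,
                        self.max_interval)
            # If the server asked for a longer wait, respect it.
            if retry_after is not None:
                delay = max(delay, retry_after)
            return delay
        # Any other answer ends the back-off.
        self.busy_count = 0
        return self.base_interval

    def manual_refresh(self):
        """The user clicked refresh: reset the back-off and try again."""
        self.busy_count = 0
```

Each busy answer doubles the wait, so a reader that keeps hitting an overloaded server gets out of the way quickly, and one successful fetch or one manual refresh puts it back on the normal schedule.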
I can't recommend anything for Windows, but centericq has support for RSS feeds (and a whack of LiveJournal support, not to mention irc/ICQ/ypager/MSN plus some other protocols I don't use, and it's text-based).
Unfortunately, I haven't been successful in getting it to send newsitems via e-mail (although I did succeed with all the chat protocols), so I have a crontab running rss2email. (The reason I like sending all this via e-mail is that I have a Motorola T900.)
I've been using aKregator for some time now. It is progressively sucking less and less. It's almost at a point where it's good.
Gee... it couldn't be the size of the feed itself, could it? Hell, I'll just throw a thousand records into an xml structure. RSS, while convenient, is inefficient in my book.
schedule.
If I have to explain that...
Reduce Bandwidth by Combining RSS with BitTorrent.
RSS is very easily cacheable. That is to say, it can be treated as static content that is updated every so often.
A simple solution would be to update a static file every 5 minutes or so. Once that's done, you can write a very small C program that listens on a port and, as soon as it receives a packet, returns a static HTTP response with the RSS file, which is already cached in memory. This small C program would update the cached copy it had in memory every 5 minutes from the updated version. During the update, the file could be gzipped to reduce bandwidth consumption; since the file is PRE-gzipped, there is no increase in CPU usage to serve the gzipped file.
That done, the CPU and memory requirements should be minimal. There would remain several possible bottlenecks, which might include bandwidth, and system-wide TCP/IP issues.
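To make the idea concrete, here is a sketch in Python rather than C (a swap made only for brevity). The `CachedFeed` class, the `serve` function, the port and the file name are all assumptions for illustration. It also sends gzip unconditionally, which a real server should only do for clients that send `Accept-Encoding: gzip`.

```python
import gzip
import socket
import time


class CachedFeed:
    """Holds one complete, pre-gzipped HTTP response in memory."""

    def __init__(self, path, max_age=300):
        self.path = path            # static RSS file written by the site
        self.max_age = max_age      # reload at most every 5 minutes
        self._loaded_at = None
        self._response = b""

    def response(self, now=None):
        now = time.time() if now is None else now
        if self._loaded_at is None or now - self._loaded_at >= self.max_age:
            with open(self.path, "rb") as f:
                body = gzip.compress(f.read())   # gzip once per reload
            head = (
                "HTTP/1.0 200 OK\r\n"
                "Content-Type: application/rss+xml\r\n"
                "Content-Encoding: gzip\r\n"
                f"Content-Length: {len(body)}\r\n"
                "Connection: close\r\n\r\n"
            ).encode("ascii")
            self._response = head + body
            self._loaded_at = now
        return self._response


def serve(feed, host="0.0.0.0", port=8080):
    """Answer every connection with the cached bytes, whatever was asked."""
    with socket.create_server((host, port)) as srv:
        while True:
            conn, _ = srv.accept()
            with conn:
                conn.recv(4096)                  # read and ignore the request
                conn.sendall(feed.response())


# Example (blocks forever): serve(CachedFeed("feed.xml"))
```

Per request, the work is one `recv` and one `sendall` of bytes that already exist, which is the whole point of the approach.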
The bandwidth issue can be solved for a reasonable price. If the site involved does not have sufficient bandwidth to handle the load, a very cheap solution is to rent a dedicated server at one of a variety of providers. Generally, looking at the most popular providers, you will pay under 100$ for a server with over 1000GB/mth of transfer on a 100mbit connection.
Assuming that every hour, one hundred thousand users attempt to get an updated RSS feed, and that RSS feed is 10KB gzipped to 5KB, a dedicated server with about 350GB/mth would suffice. Assuming the server had a 100mbit connection, it would take about 40 seconds to serve all those clients. Considering that the clients will not all request the file at exactly the same time, but due to slightly different system clocks, at varying times, this should be sufficient.
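For anyone who wants to check those numbers, the arithmetic is short enough to write out in Python. The inputs are the figures from the paragraph above; decimal units (1 KB = 1000 bytes), a 30-day month and zero HTTP/TCP overhead are my simplifying assumptions, and the function names are made up.

```python
def monthly_transfer_gb(requests_per_hour, bytes_per_request, days=30):
    """Total transfer per month in GB (decimal units)."""
    return requests_per_hour * bytes_per_request * 24 * days / 1e9


def burst_seconds(requests, bytes_per_request, link_bits_per_second):
    """Time to push one hour's worth of responses through the link."""
    return requests * bytes_per_request * 8 / link_bits_per_second


requests_per_hour = 100_000
gzipped_size = 5_000            # 10KB feed gzipped to 5KB
link = 100_000_000              # 100mbit connection

print(monthly_transfer_gb(requests_per_hour, gzipped_size))   # 360.0
print(burst_seconds(requests_per_hour, gzipped_size, link))   # 40.0
```

So the hourly burst is 500 MB, which comes to roughly 360 GB a month (about 350 if you count in powers of two) and takes about 40 seconds at full line rate.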
If the load is too great for a 100mbit connection to handle, it may be possible to get a gigabit connection from a more professional provider for a reasonable cost, on the assurance that it is only for very short bursts. The cost would be significantly greater, but generally still reasonable for a site large enough to have a hundred thousand RSS requests per hour.
Or, of course, if all the above is too complicated, you could just rent a big-assed server from a dedicated server provider and turn on Apache's server-side caching.
I've been working my own site completely based on consolidating all the newsfeeds popping up on the web and making it all so much more usable
http://fooey.net/NewsArchives
It makes it easier to stay up to date on all your favorite sites, and includes searching and page caching (take that slashdot effect!)
another big plus is many many users can all use the same RSS feeds, yet only one request per hour has to be made
I'm planning on making it so you can create an account and pick the feeds you want, and after that greatly increase the number of feeds provided, but real life and real work keep interrupting
Anyways, I built it for myself originally, but it didn't take long before everyone at work had it bookmarked. So I figure if other people find it useful, I'd be glad to share! =)
...if the server is smart enough to support it, too. A lot of "big" content management systems -- possibly like the one Infoworld uses -- don't; they generate their RSS feeds on the fly for every request.
I'm not sure whether Slashdot supports the conditional get, but from a cursory examination of NetNewsWire's bandwidth report, the answer is no. Part of the problem Commander Taco is complaining about in his commentary on this could be addressed by making Slashcode smarter.
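For reference, the server-side half of a conditional GET is not much code. Below is a sketch in Python using the standard HTTP validators (`ETag` / `If-None-Match` and `Last-Modified` / `If-Modified-Since`). The `conditional_response` function is hypothetical, it expects request headers as a plain dict with exactly these spellings, it ignores weak ETags, and it has nothing to do with how Slashcode or Infoworld's CMS actually work.

```python
import hashlib
from datetime import timezone
from email.utils import formatdate, parsedate_to_datetime


def conditional_response(request_headers, body, last_modified_ts):
    """Return (status, headers, body) for a feed request.

    Answers 304 with an empty body when the client's cached copy is
    still current, 200 with the full feed otherwise.
    """
    etag = '"' + hashlib.sha1(body).hexdigest() + '"'
    headers = {
        "ETag": etag,
        "Last-Modified": formatdate(last_modified_ts, usegmt=True),
    }

    not_modified = False
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match takes precedence over If-Modified-Since.
        candidates = [t.strip() for t in if_none_match.split(",")]
        not_modified = etag in candidates
    else:
        since = request_headers.get("If-Modified-Since")
        if since:
            try:
                when = parsedate_to_datetime(since)
                if when.tzinfo is None:
                    when = when.replace(tzinfo=timezone.utc)
                not_modified = int(last_modified_ts) <= when.timestamp()
            except (TypeError, ValueError):
                not_modified = False   # unparseable date: send the feed

    if not_modified:
        return 304, headers, b""
    return 200, headers, body
```

A CMS that generates the feed on the fly for every request can still use this, as long as it can cheaply tell when the newest item was published.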
What about my Mail.app checking my IMAP and POP accounts every minute? If a heavy traffic server like .Mac's can handle the load balancing of thousands of users simultaneously, can't they do the same for RSS? Load balancing isn't a Web-serving specific solution either.
>On Windows I use RSS Bandit
Pronounced "ArseBandit"?
That's priceless, to a Brit at least.
.
They will never know the simple pleasure of a monkey knife fight
Opera's mail client M2 has an integrated RSS feedreader. New articles appear as a new mail.
I believe that there is a module for RSS that allows you to add scheduling info. But it doesn't really help, because it just shifts when the fetch occurs from on-the-hour to on-the-day or on-the-month. Just as big a peak, just at a different time.
It means that RSS software actually fails at its job of collecting all updates from a source. Plain old mailing lists don't fail here, where RSS fails miserably. If your old washing machine dies and cuts your electricity with it just when you go out for the weekend, then when you come back you have lost all the weekend's changes in your RSS feeds, but your mailing list updates are safely waiting for you on your mail server!
Still sticking with just HTTP and RSS as it is now, some kind of if-modified-since HTTP request would greatly reduce the load. That or a checksum. Or a date-time stamp.
It would also be possible and more complex to make a TCP or UDP based RSS designed to be robust and minimize effects of heavy use. A lot of information can be crammed into a single UDP packet, or it could just be a checksum or even just a date-time stamp.
Beta is broken and the link to classic doesn't work. Stop wasting our time or there won't be anybody left here.
BitTorrent has so much setup overhead that it's silly to use it for small files like RSS feeds. You have to connect to a tracker, get a list of peers, and wait for a peer to optimistically unchoke you. Just the "connect to tracker" part of the BitTorrent handshake probably requires as much work for a server as just sending out the RSS over HTTP. So you would be trading a slashdotting of your web server for a slashdotting of your BitTorrent tracker.
Also, using BitTorrent for RSS doesn't solve the firewall problem, which is why other "push" approaches to RSS distribution won't work. Most enterprises are not going to allow any type of push protocol into their networks, and 95% of home users won't be able to figure out how to do all the firewall shenanigans necessary to make BitTorrent work.
It seems to me that everybody on Slashdot wants to use BitTorrent for everything these days, even though BitTorrent is only good at one thing: decreasing the bandwidth required for distributing large files (not small ones).
Using e-mail to push an RSS feed has its own downsides:
This seems to be the case. I went to a talk by James Robertson (of the BottomFeeder RSS client) last night, and his opinion was that this was their problem and that they were not understanding the issue. Cache it as static content, use mod_gzip and let Apache handle it.
Xix.
"Everything is adjustable, provided you have the right tools"