When RSS Traffic Looks Like a DDoS

RSS maybe by Anonymous Coward · 2004-07-20 07:33 · Score: 3, Funny

RSS may be ultimately stupid, but you didn't get first post, did you! Rookie!

Re:RSS maybe by Anonymous Coward · 2004-07-20 07:41 · Score: 1, Insightful

First post, for once finally used in the correct context of a story, and it's modded offtopic. Damn. Thought I had a winner.
Re:RSS maybe by andufo82 · 2004-07-20 14:56 · Score: 0

That's what you think, smart azz

--
Temet Nosce

Yesterday by ravan_a · 2004-07-20 07:34 · Score: 3, Interesting

Does this have anything to do with the /. problems yesterday?

--
-ravan_a

Re:Yesterday by Anonymous Coward · 2004-07-20 07:41 · Score: 0

Yesterday? What about today too :)
Re:Yesterday by afidel · 2004-07-20 07:53 · Score: 2, Interesting

Oh how prophetic, I went to check the first reply to your post and slashdot again did the white page thing (top and left borders with a white page and no right border). Earlier today (around noon EST) I was getting nothing but 503's. This new code has not been good to Slashdot.

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Re:Yesterday by Anonymous Coward · 2004-07-20 07:55 · Score: 0

I have been having all kinds of problems for the last maybe 4 days.
Re:Yesterday by cuzality · 2004-07-20 07:59 · Score: 1

Slashdot must have been farked...

...
Read about Scandal@Gmail
Re:Yesterday by proj_2501 · 2004-07-20 08:39 · Score: 1

I'm pretty sure the white page thing is a Gecko bug. I only see this in Firefox and Camino. If you go to another page then hit back, everything shows up as normal.
Re:Yesterday by vk2 · 2004-07-20 09:00 · Score: 1

Taco, at least you should not post as Anonymous Coward.
I have been having all kinds of problems for the last maybe 4 days.

--
No Sig for you.!
Re:Yesterday by dcam · 2004-07-20 14:06 · Score: 1

Are you sure? I think what is actually happening is that /. is sending part of the page and then dropping the connection. Moz/Fire* then renders what it has been given. Unfortunately a view source in Mozilla results in a new request, so this is hard to verify.

--
meh
Re:Yesterday by proj_2501 · 2004-07-20 16:02 · Score: 1

well, on my connection at work, it comes up a LOT faster if I surf to another page, then go back, leading me to believe that it's pulled from cache.

i suppose i could yank my net cable out if it happens again and then see what shows up, but it doesn't bother me that much
Re:Yesterday by stoborrobots · 2004-07-20 19:13 · Score: 1

I actually deduced over the weekend that the entire page is actually getting rendered (for me, at least), but the width is like twice what it should be.

I discovered this because sometimes, this happens and leaves a scrollbar at the bottom, allowing me to find the remainder of the page, (but sometimes it doesn't). However, even if there is no scroll bar, you can still use find-as-you-type to find the links, and activate them...

I also noticed that it happens far more often when at home on dial-up... and almost never here at work on a broadband link.

I suspect that the ads/images are choking the link, and due to timeouts of something in the rendering process, the page gets rendered wider than the screen, and then the scrollbars get taken away... At least if I watch carefully, that's what happens...

--
"Go to CNN [for a] spell-checked, fact-checked summary" -- CmdrTaco
Re:Yesterday by snake_dad · 2004-07-20 19:20 · Score: 1

FWIW, an annoying bug in rendering Slashdot was recently fixed in Mozilla. At least, I don't see it anymore, using Mozilla 1.8a2.

--
karma capped .sig seeking available Slashdot poster for long-term relationship.
Re:Yesterday by dcam · 2004-07-21 12:00 · Score: 1

I've been having the same problems today, and after a little messing around I have come to the conclusion that it is Moz. When I opened a page in a new tab, it rendered fine. When I moved to another tab and then back to the original tab, the page was having issues. One less mystery.

--
meh

netcraft article by croddy · 2004-07-20 07:34 · Score: 4, Informative

another article

Are they sure by foidulus · 2004-07-20 07:34 · Score: 0, Troll

they aren't just being /.ed?

Re:Are they sure by Saeed+al-Sahaf · 2004-07-20 07:54 · Score: 1

Are they sure they aren't just being /.ed?
Yes, of course! That's it! They should have known better than to run Infoworld off a 286 and a DSL conx in some guy's basement.

--
"Who are in control, they are not in control of anything - they don't even control themselves!" - Glen Beck

RHS by Anonymous Coward · 2004-07-20 07:34 · Score: 1, Funny

we need RHS... really HARD syndication

Re:RHS by croddy · 2004-07-20 07:36 · Score: 1

It stands for RDF Site Summary.
Re:RHS by Anonymous Coward · 2004-07-20 07:43 · Score: 0

it's called irony, d00d. discuss.
Re:RHS by The+Other+White+Boy · 2004-07-20 07:55 · Score: 1

sure it is, alanis.
Re:RHS by shadowcabbit · 2004-07-20 08:08 · Score: 2, Funny

we need RHS... really HARD syndication

That's nothing compared to RMS, which (according to RMS) stands for GNU/Recursive Meta-Syndication.

--
"Why Subscribe?" Good question...

Can't this be throttled? by xplosiv · 2004-07-20 07:34 · Score: 2, Interesting

Can't one just write a small PHP script or something which returns an error (e.g. a 500)? That's less data to send back, and hopefully the reader would just try again later.

Re:Can't this be throttled? by jcain · 2004-07-20 07:39 · Score: 3, Insightful

That kind of eliminates the point of having the RSS at all, as the user no longer gets up-to-the-minute information.

Also, I doubt that the major problem here is bandwidth, more the number of requests the server has to deal with. RSS feeds are quite small (just text most of the time). The server would still have to run that PHP script you suggest.
Re:Can't this be throttled? by mgoodman · 2004-07-20 07:40 · Score: 4, Insightful

Then their RSS client would barf on the input and the user wouldn't see any of the previously downloaded news feeds, in some cases.

Or rather, anyone that programs an RSS reader so horribly as to make it so that every client downloads information every hour on the hour would probably also barf on the input of a 500 or 404 error.

Most RSS feeders *should* just download every hour from the time they start, making the download intervals between users more or less random and well-dispersed. And if you want it more than every hour, well then edit the source and compile it yourself :P

--
01100111 01100101 01110100 00100000 01101111 01110101 01110100 00100000 01101101 01101111 01110010 01100101 00101110
Re:Can't this be throttled? by ameoba · 2004-07-20 07:58 · Score: 4, Insightful

It seems kinda stupid to have the clients basing their updates on clock time. Doing an update on client startup and then every 60min after that would be just as easy as doing it on the clock time & would basically eliminate the whole DDOSesque thing.

--
my sig's at the bottom of the page.
Re:Can't this be throttled? by TREE · 2004-07-20 08:05 · Score: 2, Insightful

500 or 404 won't work for RSS, since most readers just eat the error and try again later.

What would really, really be effective would be a valid RSS feed that contained an error message in-line describing why your request was rejected. A few big sites doing this would rapidly get the rest of the users and clients to be updated.
Re:Can't this be throttled? by lukewarmfusion · 2004-07-20 08:05 · Score: 0, Redundant

And if you want it more than every hour, well then edit the source and compile it yourself

My newsreader is proprietary, you insensitive clod!
Re:Can't this be throttled? by Zaiff+Urgulbunger · 2004-07-20 08:16 · Score: 0, Troll

"...a valid RSS feed that contained an error message in-line describing why your request was rejected"

What, like, "Fuck off! Just F-U-C-K O-F-F and take your stupid fuck-wit, fuckity news reader fucking software with you! Fucking Cunt!". Would that be fair?

I do think it's important to be *firm* and *unambiguous* with your message!! :-D
Re:Can't this be throttled? by mblase · 2004-07-20 08:25 · Score: 4, Insightful

Most RSS feeders *should* just download every hour from the time they start

That's also a problem, though, since most people start work at their computer desks on the hour, or very close to it. The better solution would be for the client (1) to check once at startup, then (2) pick a random number between one and sixty (or thirty or whatever) and (3) start checking the feed, hourly, after that many minutes. That's the only way to ensure a decently random distribution of hits.
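
The three steps above can be sketched roughly as follows. This is only an illustration, not any real reader's scheduler: the function name `check_times` and its parameters are made up for the example, and times are expressed as minutes after client startup.

```python
import random

def check_times(num_hourly_checks, rng=None):
    """Minutes after startup at which a client would poll the feed.

    Hypothetical helper: one check at startup, then hourly checks that
    begin after a random delay of 1 to 60 minutes.
    """
    rng = rng or random.Random()
    offset = rng.randint(1, 60)   # step (2): random delay in minutes, inclusive
    times = [0]                   # step (1): check once at startup
    # step (3): hourly checks, all shifted by the same random offset
    times += [offset + 60 * k for k in range(num_hourly_checks)]
    return times
```

Because the offset is drawn once per client, two clients started at the same moment still end up polling at different minutes of the hour.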
Re:Can't this be throttled? by Bishop923 · 2004-07-20 08:26 · Score: 1

Most RSS feeders *should* just download every hour from the time they start, making the download intervals between users more or less random and well-dispersed.

While it would certainly help, you would still see spikes during the week from workers starting their computers at 9 AM +/- 5 minutes or so. A better method would be to add or subtract a random value of between 5 and 15 minutes from the hourly check, which should spread it even more (of course in a worst-case scenario it would be 90 minutes from one check to another, but we aren't talking about mission-critical tasks here...)
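
A rough sketch of that jitter, assuming each nominal hourly check is shifted independently by 5 to 15 minutes in either direction. The name `jittered_check_times` is hypothetical; times are minutes after startup.

```python
import random

def jittered_check_times(num_checks, base_minutes=60, rng=None):
    """Shift each nominal hourly check by a random 5-15 minutes, early or late."""
    rng = rng or random.Random()
    times = []
    for k in range(1, num_checks + 1):
        # Pick a direction and a magnitude for this check only.
        shift = rng.choice((-1, 1)) * rng.uniform(5, 15)
        times.append(k * base_minutes + shift)
    return times
```

With this scheme, two consecutive checks are between 30 and 90 minutes apart, which matches the 90-minute worst case mentioned above.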
Re:Can't this be throttled? by hunterx11 · 2004-07-20 08:52 · Score: 4, Funny

From now on, instead of telling people to fuck off I'll just say:
User-agent: You
Disallow: /

--
English is easier said than done.
Re:Can't this be throttled? by Zaiff+Urgulbunger · 2004-07-20 08:53 · Score: 1

The above post was intended to be read with tongue firmly in cheek! Re-read it... honestly, it's funny!
Re:Can't this be throttled? by Fat+Cow · 2004-07-20 08:58 · Score: 2, Informative

I think that the problem is the peak load - unfortunately the RSS readers all download at the same time (they should be more uniformly distributed within the minimum update period). This means that you have to design your system to cope with the peak load, but then all that capacity is sitting idle the rest of the time.
The electricity production system has the same problem.

--
stay frosty and alert
Re:Can't this be throttled? by mgoodman · 2004-07-20 09:03 · Score: 1

I actually mentioned this method in another, later post, as I couldn't edit this one. D'oh heh.

--
01100111 01100101 01110100 00100000 01101111 01110101 01110100 00100000 01101101 01101111 01110010 01100101 00101110
Re:Can't this be throttled? by Burpmaster · 2004-07-20 09:32 · Score: 1

Nah, we don't need any complicated solution like random distribution. Just have everyone set their RSS reader to update 30 minutes after every hour. That'll fix it!
Re:Can't this be throttled? by FryGuy1013 · 2004-07-20 11:09 · Score: 1

This is what slashdot does when you try to check their feeds too often.

--
bananas like monkeys.
Re:Can't this be throttled? by Anonymous Coward · 2004-07-20 11:20 · Score: 2, Insightful

How about having the SERVER tell the client when to download next? Sort of like DHCP, but more intelligent: the server will even out the TTL by some sort of Gaussian algorithm, and in that way save itself!

If certain users want news more often (say every 15 minutes, versus every hour), have the client say that it would like news every 15 minutes, and the server will schedule it (almost like a calendar) and send the client a TTL that is not exactly 15 minutes, but close enough. In fact, this might be the better route: fundamentally change the way RSS works, so that newsreaders are REQUIRED to RSVP, and the ones that don't get an error message (telling the client about newsreaders that are supported).
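
One way to read the "Gaussian" idea is sketched below. Everything here is an assumption made for illustration: the function name `assign_ttl_minutes`, the 5% spread, and the clamp to within 20% of the requested interval are not part of RSS or of any server.

```python
import random

def assign_ttl_minutes(requested, spread_fraction=0.05, rng=None):
    """Return a TTL close to the interval the client asked for.

    The TTL is drawn from a normal distribution centred on the request,
    then clamped so no client is pushed too far from what it wanted.
    """
    rng = rng or random.Random()
    ttl = rng.gauss(requested, requested * spread_fraction)
    low, high = requested * 0.8, requested * 1.2
    return min(max(ttl, low), high)
```

A client asking for 15 minutes would get something between 12 and 18 minutes, usually close to 15, so clients that asked at the same moment drift apart over successive polls.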
Re:Can't this be throttled? by Spy+Hunter · 2004-07-20 15:56 · Score: 1

What they *should* do is for the first minute of every hour add a top headline reading "Please configure your news reader to download our news feed at a different time."

--
main(c,r){for(r=32;r;) printf(++c>31?c=!r--,"\n":c<r?" ":~c&r?" `":" #");}
Re:Can't this be throttled? by dglaude · 2004-07-20 18:03 · Score: 1

Ever since there have been network protocols using fixed timers, we have seen global synchronisation effects generating fluctuations in network performance.
Any new protocol that includes a timer should use a sufficiently random interval.

RIP: 30 s
OSPF: 30 min
STP: 2 s
RSS: 1 hour
This is all wrong.

--
Don't let the computer/expert control the election. Information for Belgium in french: http://www.poureva.be/
Re:Can't this be throttled? by nwbvt · 2004-07-20 18:13 · Score: 1

Up to the minute information is not the point of RSS at all. In fact, people who think it is (and who thus set their aggregators to check the feeds every 10 minutes) are part of the problem.

--
Mathematics is made of 50 percent formulas, 50 percent proofs, and 50 percent imagination.
Re:Can't this be throttled? by trashme · 2004-07-28 02:37 · Score: 1

Having the server schedule the feed retrieval has lots of little problems:

- The server must keep state on every client, whether they want to retrieve news right now or not.

- The server either has to initiate the connection with the client (firewalls make this method a problem) or you keep persistent connections open (which wastes ports since most will not be in use).

Simple HTTP Solution by inertia187 · 2004-07-20 07:34 · Score: 3, Informative

The readers should HEAD to see if the last modified changed... And the feed rendering engines should make sure their last modified is accurate.

--
A programmer is a machine for converting coffee into code.

Re:Simple HTTP Solution by skraps · 2004-07-20 07:45 · Score: 5, Insightful
This "optimization" will not have any long-lasting benefits. There are at least three variables in this equation:
1. Number of users
2. Number of RSS feeds
3. Size of each request
This optimization only addresses #3, which is the least likely to grow as time goes on.
--
Karma: -2147483648 (Mostly affected by integer overflow)
Re:Simple HTTP Solution by ry4an · 2004-07-20 07:48 · Score: 3, Informative

Better than that, they should use the HTTP If-Modified-Since: header in their GETs, as specified in RFC 2616 section 14.25. That way, if it has changed, they don't have to do a subsequent GET.

Someone did a nice write-up about doing so back in 2002.
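
A minimal sketch of such a conditional GET using only Python's standard library. The helper names `conditional_headers` and `fetch_if_changed` are made up for this example, and sending If-None-Match alongside If-Modified-Since is an addition beyond what the post describes.

```python
import urllib.error
import urllib.request

def conditional_headers(last_modified=None, etag=None):
    """Build request headers from the validators saved on the previous fetch."""
    headers = {}
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    if etag:
        headers["If-None-Match"] = etag
    return headers

def fetch_if_changed(url, last_modified=None, etag=None):
    """Return (body, last_modified, etag); body is None on 304 Not Modified."""
    request = urllib.request.Request(
        url, headers=conditional_headers(last_modified, etag))
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return (response.read(),
                    response.headers.get("Last-Modified"),
                    response.headers.get("ETag"))
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # urllib reports 304 as an HTTPError; the cached copy is still good.
            return None, last_modified, etag
        raise
```

The reader stores the returned Last-Modified and ETag values and passes them back on the next poll. If the server sends neither header, every poll downloads the whole feed.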
Re:Simple HTTP Solution by johnbeat · 2004-07-20 07:55 · Score: 3, Informative

So, he's writing from infoworld and complaining that RSS feed readers grab feeds whether the data has changed or not. So, I went to look for infoworld's RSS feeds. Found them at:

http://www.infoworld.com/rss/rss_info.html

Trying the top news feed, got back:

date -u ; curl --head http://www.infoworld.com/rss/news.xml
Tue Jul 20 19:51:44 GMT 2004
HTTP/1.1 200 OK
Date: Tue, 20 Jul 2004 19:48:30 GMT
Server: Apache
Accept-Ranges: bytes
Content-Length: 7520
Content-Type: text/html; charset=UTF-8

How do I write an RSS reader that only downloads this feed if the data has changed?

Jerry
Re:Simple HTTP Solution by poot_rootbeer · 2004-07-20 08:01 · Score: 2, Insightful

There are at least three variables in this equation:
1. Number of users
2. Number of RSS feeds
3. Size of each request

And I'll add:
4. Time at which each request occurs

If RSS requests were evenly distributed throughout the hour, the problems would be minimal. When every single RSS reader assumes that updates should be checked exactly at X o'clock on the hour, you get problems.
Re:Simple HTTP Solution by jesser · 2004-07-20 08:08 · Score: 3, Insightful

Even if every RSS reader used HEAD (or if-modified-since) correctly, servers would still get hammered on the hour when the RSS feed has been updated during the hour. If-modified-since saves you bandwidth over the course of a day or month, but it doesn't reduce peak usage.

--
The shareholder is always right.
Re:Simple HTTP Solution by Jane_Dozey · 2004-07-20 08:27 · Score: 1

Do you think it would work if, on the initial request, the RSS feeder replied with a time at which each individual reader should check? Say, on the hour, 1 minute past the hour, 2 minutes past the hour etc. If a standard was made to incorporate this then they would get staggered traffic and more control over the requests.
Just a thought.

--
Silly rabbit
Re:Simple HTTP Solution by Mandomania · 2004-07-20 09:17 · Score: 1

It's called a Conditional GET.

--
Mando
Re:Simple HTTP Solution by blowdart · 2004-07-20 09:44 · Score: 2, Informative

You're missing the point I assume the original poster was making.
Not all web servers provide last-modified or etag headers. Infoworld doesn't, so even a well written RSS reader has to bring the whole feed down as they have no way to know if it has changed or not.
Re:Simple HTTP Solution by Mandomania · 2004-07-20 10:02 · Score: 1

Not all web servers provide last-modified or etag headers. Infoworld doesn't...

According to the parent's post, Infoworld uses Apache, which happens to provide both. They are also both a part of the HTTP/1.1 spec., which any webserver should support.

It is a problem for those that don't provide them, tho.
Re:Simple HTTP Solution by johnbeat · 2004-07-20 10:34 · Score: 2, Informative

Uh, no.

Pastiche knows when the document was last modified and can support my writing an rss reader that checks last-modified:

curl --head http://fishbowl.pastiche.org/nerdfull.xml
HTTP/1.1 200 OK
Date: Tue, 20 Jul 2004 22:16:33 GMT
Server: Apache/1.3.26 (Unix) Debian GNU/Linux mod_gzip/1.3.19.1a mod_jk/1.1.0
Last-Modified: Mon, 19 Jul 2004 02:52:46 GMT
ETag: "28620-8faa-40fb377e"
Accept-Ranges: bytes
Content-Length: 36778
Content-Type: text/xml

But infoworld does not. As far as I can tell from the headers I displayed in the previous post, infoworld's server does not provide such data. Without the last-modified or etag or something similar, there is no way to ask for a conditional get, because there is nothing to base the conditional on, and most likely the server doesn't know how to compare the conditional anyway since it clearly is not keeping track of when the document was last modified.

I could easily be getting the syntax wrong, but whenever I request that it only send me the xml feed if it has been last modified in the last fraction of a second, I still get the page back:

date > datestamp; curl --time-cond datestamp http://www.infoworld.com/rss/news.xml

This returns a bunch of xml.

Running the same command on Pastiche's xml feed returns, as I would expect, absolutely nothing:

date > datestamp; curl --time-cond datestamp http://fishbowl.pastiche.org/nerdfull.xml

Jerry
Re:Simple HTTP Solution by Mandomania · 2004-07-20 10:41 · Score: 1

I should have looked at curl before I posted. I thought that perhaps the Last-Modified and ETag headers were somehow being stripped or something.

Touche' :).
Re:Simple HTTP Solution by bergeron76 · 2004-07-20 11:10 · Score: 1

Another idea:

Have your pages serve up a warning message as a news item at the :00 of each hour. Ask your users to stagger their downloader / parser activity and not to pull it down "on the hour" or "on the half-hour".

Alternatively, you could just not serve feeds at :00 and :30. However, your readership wouldn't know why they were getting flaky performance from your feed.

--
Don't think that a small group of dedicated individuals can't change the world. It's the only thing that ever has.
Re:Simple HTTP Solution by Bluelive · 2004-07-20 11:41 · Score: 1

Most readers do that already. At the same time, some servers that host feeds don't support them.
Re:Simple HTTP Solution by Anonymous Coward · 2004-07-20 15:51 · Score: 0

Apache only provides them for static content (or if you create them with mod_headers/expires, etc). It does not create them for dynamic content... that's up to you.
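
For dynamic content, the application can produce the validator itself. The sketch below assumes the feed body is already rendered as bytes and derives an ETag from a hash of it. The function `respond` is hypothetical, and it only handles a single ETag in If-None-Match (no lists, no `*`).

```python
import hashlib

def respond(body, if_none_match=None):
    """Return (status, headers, payload) for a dynamically generated feed."""
    # A strong validator derived from the rendered bytes of the feed.
    etag = '"' + hashlib.sha1(body).hexdigest() + '"'
    if if_none_match == etag:
        # The client already has this exact content: send headers only.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Content-Type": "text/xml"}, body
```

This still costs the server the work of rendering the feed, so it saves bandwidth rather than CPU.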
Re:Simple HTTP Solution by nwbvt · 2004-07-21 06:34 · Score: 1

"And the feed rendering engines should make sure their last modified is accurate."
I'm working on an aggregator right now and I'm using a couple of feeds to test it on. Currently my aggregator only gets the RSS file if it has been modified since it was last checked (or at least it should; I haven't finished testing it). The problem is that it almost always finds that the file has been modified, even if no new items have been added. I manually checked the feeds and found that yes, they had been updated in the last hour, at least according to the pubDate and lastBuildDate elements. I guess the feed rendering engine is updating the feed regardless of whether any real changes were made, so my aggregator downloads it when there is nothing new available.

--
Mathematics is made of 50 percent formulas, 50 percent proofs, and 50 percent imagination.
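
One possible client-side workaround for a feed whose timestamps change on every build is to fingerprint only the items. The sketch below assumes an un-namespaced RSS 2.0 document; `items_fingerprint` is a made-up helper name, not part of any aggregator.

```python
import hashlib
import xml.etree.ElementTree as ET

def items_fingerprint(rss_text):
    """Hash the identity of each <item>, ignoring channel-level timestamps."""
    root = ET.fromstring(rss_text)
    keys = []
    for item in root.iter("item"):
        # Prefer <guid>, fall back to <link>, then <title>.
        key = (item.findtext("guid") or item.findtext("link")
               or item.findtext("title") or "")
        keys.append(key.strip())
    return hashlib.sha1("\n".join(keys).encode("utf-8")).hexdigest()
```

If the fingerprint matches the one stored from the previous fetch, the aggregator can skip reprocessing even though lastBuildDate moved. It does not avoid the download itself.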

Still haven't tried these newfangled RSS readers.. by Rezonant · 2004-07-20 07:35 · Score: 2, Interesting

...so could someone recommend a couple of really good ones for Windows and *nix?

Call me stupid by nebaz · 2004-07-20 07:35 · Score: 4, Informative

This is helpful.

--
Rhymes that keep their secrets will unfold behind the clouds.There upon the rainbow is the answer to a neverending story

Oh really? by Anonymous Coward · 2004-07-20 07:37 · Score: 0

Every hour, random sites "see a massive surge of /.'s news reader activity" that "has all the characteristics of a distributed DoS attack."

Slashdot (or as it should be called, "Sitefsck") is such a useful thing, it's unfortunate that it's ultimately just very stupid.

Editorializing in the blurb by Patik · 2004-07-20 07:37 · Score: 2, Insightful

I don't really care for RSS either, but damn, was that necessary?

Re:Editorializing in the blurb by sohojim · 2004-07-20 07:55 · Score: 2, Funny

Oh, you mean editors editorializing? Probably.
Re:Editorializing in the blurb by Anonymous Coward · 2004-07-20 17:19 · Score: 0

Hey, it came from CmdrTalkhole... what did you expect? Just be glad it wasn't michael!

Over the years? How about over the weekend? by Marxist+Hacker+42 · 2004-07-20 07:37 · Score: 5, Informative

We've seen similar problems over the years. RSS (or as it should be called, "Speedfeed") is such a useful thing, it's unfortunate that it's ultimately just very stupid.

And it seems to have gotten worse since the new code was installed- I get 503 errors at the top of every hour now on slashdot.

--
SJW: a person who perceives an injustice, and while correcting it, commits a greater injustice.

What about a scheduler? by el-spectre · 2004-07-20 07:37 · Score: 4, Interesting

Since many clients request the new data every 30 minutes or so... how about a simple system that spreads out the load? A page that, based on some criteria (domain name, IP, random seed, round robin), gives each client a time it should check for updates (e.g. 17 past the hour).

Of course, this depends on the client to respect the request, but we already have systems that do (robots.txt), and they seem to work fairly well, most of the time.

--
"Faith: Belief without evidence in what is told by one who speaks without knowledge, of things without parallel." - A.B.

Re:What about a scheduler? by el-spectre · 2004-07-20 07:40 · Score: 1

Just thinking about it... all you really need is a script that has a cycling counter from 0-59, and responds to a GET. It would take about 2 minutes to write in the language of your choice.
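
A sketch of that cycling counter, leaving out the HTTP part. The class name `MinuteAssigner` is hypothetical, and the lock is an assumption that requests may be handled by several threads.

```python
import itertools
import threading

class MinuteAssigner:
    """Hand out minutes 0..59 in round-robin order, one per request."""

    def __init__(self):
        self._minutes = itertools.cycle(range(60))
        self._lock = threading.Lock()

    def next_minute(self):
        # The lock keeps two concurrent requests from getting the same slot.
        with self._lock:
            return next(self._minutes)
```

A GET handler would call `next_minute()` and return the result as the minute past the hour at which that client should poll.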

--
"Faith: Belief without evidence in what is told by one who speaks without knowledge, of things without parallel." - A.B.
Re:What about a scheduler? by cmdr_beeftaco · 2004-07-20 07:50 · Score: 5, Funny

Bad idea. Everyone knows that most headlines are made at the top of the hour. Thus, A.M. radio always gives news headlines "at-the-top-of-hour." RSS readers should be given the same timely updates.
Related to this is the fact that most traffic accidents happen "on the twenties." Human nature is a curious and seemingly very predictable thing.
Re:What about a scheduler? by Retric · 2004-07-20 07:57 · Score: 1

Just check every hour after you log on / start it up. Sure, there would be a minor bias toward people logging on on the hour, as many people get to work at 7:50 +/- 5 min or so, but it's not all that bad. If you want to cover this one, just add +/- 2 min to the interval, i.e. make it 58 - 62 min, and people will spread out quickly.

This really has a lot more to do with 100,000 people checking the site within 30 seconds of each other than anything else.
Re:What about a scheduler? by jesser · 2004-07-20 08:04 · Score: 1

It would be simpler for RSS readers to generate a random time themselves rather than asking a server for a random time. I assume that RSS readers have a user option for when to check for updates; all that is needed is for the default to be random instead of "0 past the hour" for everyone.

--
The shareholder is always right.
Re:What about a scheduler? by dogas · 2004-07-20 08:05 · Score: 0, Redundant

well, I suppose a p2p-like solution could be coded. But of course, that solution would soon be illegal.

--
'When the going gets weird, the weird turn pro.' -HST
Re:What about a scheduler? by MarsDefenseMinister · 2004-07-20 08:07 · Score: 1

most traffic accidents happen "on the twenties."

Please explain further. What does this mean, and how do you know it?

--
No weapon in the arsenals of the world is so formidable as the will and moral courage of free men.-Ronald Reagan
Re:What about a scheduler? by el-spectre · 2004-07-20 08:26 · Score: 1

I dunno, in LA it's "On the ones"... maybe we'll schedule by call sign too?

--
"Faith: Belief without evidence in what is told by one who speaks without knowledge, of things without parallel." - A.B.
Re:What about a scheduler? by JimDabell · 2004-07-20 08:31 · Score: 2, Informative

RSS already supports the <ttl> element type, which indicates how long a client should wait before looking for an update. Additionally, HTTP servers can provide this information through the Expires header.

Furthermore, well-behaved clients issue a "conditional GET" that only requests the file if it has been updated, which cuts back on bandwidth quite a bit, as only a short response saying it hasn't been updated is necessary in most cases.
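
A small sketch of honouring <ttl> on the client side, assuming an un-namespaced RSS 2.0 feed where <ttl> is a number of minutes. The function name `feed_ttl_minutes` and the 60-minute fallback are choices made for this example.

```python
import xml.etree.ElementTree as ET

def feed_ttl_minutes(rss_text, default=60):
    """Return the channel's <ttl> in minutes, or `default` if absent or invalid."""
    root = ET.fromstring(rss_text)          # root element is <rss>
    text = root.findtext("channel/ttl")
    try:
        ttl = int(text.strip()) if text else default
    except ValueError:
        return default
    return ttl if ttl > 0 else default
```

A reader would wait at least this many minutes before polling the feed again, whatever interval the user configured.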
Re:What about a scheduler? by the+chao+goes+mu · 2004-07-20 08:41 · Score: 1

This is amusing. I have read three or four posts talking about "Everyone starts at the same time +/- 5 minutes", but the "same" start times they give range from 7:50 AM to 9:00 AM. Thus, it appears there is well over an hour of variation in the time "everyone" starts...

--
Boys from the City. Not yet caught by the Whirlwind of Progress. Feed soda pop to the thirsty pigs.
Re:What about a scheduler? by Anonymous Coward · 2004-07-20 08:44 · Score: 0

maybe military time? 20 = 5 pm (when most traffic accidents do occur). Beats me what the poster means exactly.
Re:What about a scheduler? by el-spectre · 2004-07-20 08:49 · Score: 1

True, but if the server did it, the site could self-adjust (too many folks clustered around :45? bump 1/2 of 'em to :15).
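
A server that self-adjusts could, for instance, hand each new client the least crowded minute. The sketch below assumes the server keeps a 60-entry list of how many clients are assigned to each minute; `assign_least_loaded_minute` is a hypothetical name.

```python
def assign_least_loaded_minute(load_by_minute):
    """Pick the minute with the fewest assigned clients and record the new one.

    `load_by_minute` is a list of 60 counts, indexed by minute past the hour.
    Ties go to the earliest minute.
    """
    minute = min(range(60), key=lambda m: load_by_minute[m])
    load_by_minute[minute] += 1
    return minute
```

This only balances assignments. Moving clients that are already clustered around :45 would also need a way to tell them their new slot on a later request.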

--
"Faith: Belief without evidence in what is told by one who speaks without knowledge, of things without parallel." - A.B.
Re:What about a scheduler? by cephyn · 2004-07-20 08:55 · Score: 0

The poster means that traffic accidents most commonly happen 20 minutes after and 20 minutes before the hour.

20 minutes after as everyone is leaving somewhere, and 20 minutes before as everyone is hurrying to get somewhere.

--
Moo.
Re:What about a scheduler? by awtbfb · 2004-07-20 08:56 · Score: 1

...how about a simple system that spreads out the load? A page that, based on some criteria (domain name, IP, random seed, round robin) gives each client a time it should check for updates (i.e. 17 past the hour).

That may require a change in standards/clients. At a simpler, more near term level, developers could just agree to have their client programs call a randomization function for the initial poll start time for the initial preference setting. All it would take would be the major 3-4 developers to agree to do this and you'd spread traffic out a lot. Most users would not change this number. Of course /. would still be screwed since most of us would.
Re:What about a scheduler? by el-spectre · 2004-07-20 09:01 · Score: 1

Do ya think it might be a reference to when the radio commonly gives traffic reports?

--
"Faith: Belief without evidence in what is told by one who speaks without knowledge, of things without parallel." - A.B.
Re:What about a scheduler? by Idarubicin · 2004-07-20 09:05 · Score: 1

Related to this is the fact that most traffic accidents happen "on the twenties." Human nature is a curious and seemingly very predictable thing.
For the sibling posts that don't understand what the parent is saying here, it's a tongue-in-cheek remark. Many news radio stations will deliver news, traffic, and/or weather reports at a consistent time after the hour. Presumably his local radio station gives traffic reports on the hours and at 20 and 40 minutes past--hence, 'on the twenties'.
In Toronto, the local AM radio news station (680 kHz) does traffic and weather 'on the ones'--at 1, 11, 21, 31, 41, and 51 minutes past the hour. (Actually, the reports are a little bit after that, since the station always plays an ad first.)
Similarly, his remark that most headlines occur at the top of the hour is equally facetious--he's just referring to the practice of many radio stations of giving a summary of the news headlines on the hour.

--
~Idarubicin
Re:What about a scheduler? by cephyn · 2004-07-20 09:10 · Score: 1

no. he said traffic accidents happen on the twenties. not traffic reports. it seems pretty unambiguous to me, I'm not sure what the confusion is.

--
Moo.
Re:What about a scheduler? by Anonymous Coward · 2004-07-20 09:14 · Score: 2, Funny

maybe military time? 20 = 5 pm

I sure hope you're not in the military. If you are then I highly recommend you make sure you haven't missed any appointments recently.
Re:What about a scheduler? by el-spectre · 2004-07-20 09:21 · Score: 1

it was (probably) a joke that most folks missed...

--
"Faith: Belief without evidence in what is told by one who speaks without knowledge, of things without parallel." - A.B.
Re:What about a scheduler? by Anonymous Coward · 2004-07-20 09:21 · Score: 0

that's the joke. since you only hear about accidents on the "20s" it follows that accidents only happen on the "20s." Haha, right?
Re:What about a scheduler? by Anonymous Coward · 2004-07-20 09:25 · Score: 0

This is amusing. I have read three or four posts talking about "Everyone starts at the same time +/- 5 minutes", but the "same" start times they give range from 7:50 AM to 9:00 AM. Thus, it appears there is well over an hour of variation in the time "everyone" starts...

Two words: Government Workers

(It's amazing how punctual they can be when it comes to arriving just in time for work and leaving right at 4pm.)
Re:What about a scheduler? by cephyn · 2004-07-20 09:27 · Score: 2, Insightful

no, I think he's being serious. Since most people's schedules are based on the hour marks, it stands to reason that most people are rushing to get to their destination 20 minutes before the hour, and rushing out of wherever they are 20 minutes after the hour. So, since the schedules are all synched, the traffic volume quickly swells 20 min before/after the hour and bam -- that's when you get the most accidents.

Most major cities I think have traffic reports more often than just on 20/40.

--
Moo.
Re:What about a scheduler? by Huogo · 2004-07-20 09:36 · Score: 1

But it holds true to an extent still - Some people I work with have to get there at 8:00, I have to be there at 8:30, some others at 10:00, etc. When was the last time you had to be at work for 8:23? If you do 1 hour after login, you're still going to have surges on the hour and half hour, because that's when people have to be at work. That's what people are saying.
Re:What about a scheduler? by Aerion · 2004-07-20 09:44 · Score: 1

It's also well-known that the vast majority of weather events happen "on the ones." But then why can't the guy on TV predict them very well?
Re:What about a scheduler? by costas · 2004-07-20 10:02 · Score: 1

As a provider of personalized, customized RSS feeds, I really do like your idea. Essentially a "Next-Modification" HTTP header that would tie together with the "Last-Modified" header would help greatly. Right now, the best you can do is hope that the RSS client respects 304 (Not Modified) responses and goes away when you tell it there is no updated content. That doesn't stop it, though, from coming back 15 minutes later... RFC? :-)
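
A minimal sketch of what a polite client could do with that, in Python. "Next-Modification" is the hypothetical header proposed above, not part of HTTP, and the function names and the one-hour default are made up for illustration:

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

DEFAULT_INTERVAL = timedelta(hours=1)  # assumed fallback polling interval


def conditional_headers(last_modified):
    """Headers for a conditional GET; an unchanged feed can be answered with 304."""
    if last_modified is None:
        return {}
    return {"If-Modified-Since": format_datetime(last_modified, usegmt=True)}


def next_poll_time(response_headers, now):
    """Honor the hypothetical Next-Modification hint if it is a valid future date."""
    hint = response_headers.get("Next-Modification")
    if hint:
        try:
            when = parsedate_to_datetime(hint)
            if when > now:
                return when
        except (TypeError, ValueError):
            pass  # unparsable or naive date: fall back to the default
    return now + DEFAULT_INTERVAL


now = datetime(2004, 7, 20, 10, 2, tzinfo=timezone.utc)
print(conditional_headers(now))
print(next_poll_time({}, now))
```

The point of the design is that the server, not the client, decides when the next fetch is worth making; a client that ignores the hint still behaves like today's readers.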
Re:What about a scheduler? by JDevers · 2004-07-20 10:05 · Score: 1

Yea, but that is only national weather events...everyone knows that local weather happens "on the eights"
Re:What about a scheduler? by cephyn · 2004-07-20 10:44 · Score: 0

No, I think you're missing his point totally. He's not trying to be funny.

Bad idea. Everyone knows that most headlines are made at the top of the hour. Thus, A.M. radio always give news headlines "at-the-top-of-hour." RSS reader should be given the same timely updates.

What he's saying here is that people are conditioned to expect news headlines at the top of the hour. Ergo, they EXPECT their RSS newsfeeds to grab news at the top of every hour. People are conditioned to check the news every hour, on the hour... that's why it's the 5 o'clock news, the 6 o'clock news, highlights at 11. No one expects headlines at 3:28 pm, or highlights at 9:37. Similarly, they don't expect their newsfeeds to update at 8:22 and 9:14, they expect it at 8:00 and 9:00.

His traffic comment is that most traffic accidents DO happen 20 after and 20 before the hour -- it has nothing to do with when the radio announces it. See a sibling post for the full explanation.

--
Moo.
Re:What about a scheduler? by Anonymous Coward · 2004-07-20 11:54 · Score: 0

Thank god for my screwed up body clock. The only time I ever rush anywhere is when I'm running out the door in the morning for work. And since I'm on flex time, that's a 4 hour window...
Re:What about a scheduler? by adavidw · 2004-07-20 12:47 · Score: 1

Stop it. It was a joke. You didn't get the joke. Now stop it.
Re:What about a scheduler? by wolrahnaes · 2004-07-20 13:53 · Score: 1

What he's saying here is that people are conditioned to expect news headlines at the top of the hour.

Personally I expect my computer to do things better than my radio or TV. Rather than only getting my headlines on the hour, I want it to give them to me when they happen.

Also, I think the traffic comment was intended to be a joke, just also happened to be supportable with facts.

--
I used to get high on life, but I developed a tolerance. Now I need something stronger.
Re:What about a scheduler? by mutende · 2004-07-20 18:00 · Score: 1

Or simply have your feed reader check the RSS every, say, 71 minutes instead of every hour.
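
A quick Python illustration of why an odd interval helps (the start time and function name are just assumptions for the example): a 60-minute timer hits the same minute forever, while a 71-minute timer walks around the clock.

```python
from datetime import datetime, timedelta


def poll_times(start, interval_minutes, count):
    """Return the first `count` poll times for a fixed polling interval."""
    step = timedelta(minutes=interval_minutes)
    return [start + i * step for i in range(count)]


start = datetime(2004, 7, 20, 8, 0)
hourly = [t.minute for t in poll_times(start, 60, 6)]
offbeat = [t.minute for t in poll_times(start, 71, 6)]
print(hourly)   # [0, 0, 0, 0, 0, 0]
print(offbeat)  # [0, 11, 22, 33, 44, 55]
```

This only spreads the load if readers do not all start at the same instant; a thousand clients launched at 8:00 sharp with a 71-minute timer would still arrive together.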

--
Unselfish actions pay back better
Re:What about a scheduler? by Sputum · 2004-07-20 22:01 · Score: 1

I'm not too fussed about getting my updates right on the hour, particularly if my news reader's going to be open all day. I can wait until 9:20 for the 9:00 news, thanks.

Providers could set up an interface for you to ask to get your updates earlier.

Definitely, if the solution is going to be spreading the load, it'd be best if it involves some sort of quality of service control system where the server can say when it's ready to process requests. And other than spreading the load, what can you do other than having proxy servers?

--
"What we imagine is order is merely the prevailing form of chaos"
Re:What about a scheduler? by parksie · 2004-07-21 02:37 · Score: 1

Mmmmmm 420. Too many stoned drivers %-)

They get hit every hour? by Anonymous Coward · 2004-07-20 07:37 · Score: 1, Interesting

Why not make it standard that the starting time is chosen randomly or assigned by the remote site? "Forty-three minutes after the hour is pretty empty, from now on you can check the news at that time" or something similar.
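
One way to get the "random but stable" variant without any server cooperation, sketched in Python. The client ID, the feed URL and the function name are all hypothetical; the idea is just to hash something unique to the reader into a minute of the hour:

```python
import hashlib


def assigned_minute(client_id, feed_url):
    """Map a (client, feed) pair to a stable minute in 0..59."""
    digest = hashlib.sha256(f"{client_id}|{feed_url}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 60


print(assigned_minute("reader-1", "http://example.org/index.rss"))
```

Because the hash is deterministic, a given reader always checks at the same minute, and a large population of readers ends up spread roughly evenly over the hour. A server-assigned slot would need a protocol change; this does not.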

Re:They get hit every hour? by shadowcabbit · 2004-07-20 08:14 · Score: 1

That just time-shifts the attack. The idea is to spread the onslaught of requests evenly over the hour or over multiple sources.

RSS is a good idea in principle but this sort of problem is why publishing middle-men were created for paper-type publications. Content producers submit to a syndicator, who has the capability to distribute to the masses who want the content. I don't see why you couldn't do that for RSS feeds; you'd just have to make sure the company who holds the syndicating servers isn't evil. (Google GRSS, anyone?)

--
"Why Subscribe?" Good question...

RSS needs better TCP stacks by Russ+Nelson · 2004-07-20 07:37 · Score: 3, Interesting

RSS just needs better TCP stacks. Here's how it would work: when your RSS client connects to an RSS server, it would simply leave the connection open until the next time the RSS data got updated. Then you would receive a copy of the RSS content. You simply *couldn't* fetch data that hadn't been updated.

The reason this needs better TCP stacks is because every open connection is stored in kernel memory. That's not necessary. Once you have the connecting ip, port, and sequence number, those should go into a database, to be pulled out later when the content has been updated.
-russ

--
Don't piss off The Angry Economist

Re:RSS needs better TCP stacks by Russ+Nelson · 2004-07-20 07:42 · Score: 1

Several people have pointed out that you want a schedule. No, you don't. That just foments the stampeding herd of clients. You really want to allow people to connect whenever they want, and then receive data only when you're ready and able to send it back.

Basically, you use the TCP connection as a subscription. Call it "repeated confirmation of opt-in" if you want. Every time the user re-connects to get the next update (which they will probably do immediately; may as well) that's an indication that they want another copy. Everybody gets updates as soon as possible, just like email, only it's not possible to force data on everybody, as we've seen happen with email.

--
Don't piss off The Angry Economist
Re:RSS needs better TCP stacks by ganhawk · 2004-07-20 07:43 · Score: 1

I assume it will be DDoS'ed (intentionally or even otherwise due to internet hotspots).

Especially since scaling is a problem. Imagine millions of active connections with a database storing the state of each connection. You need far more resources for that than the current system.

--
Python script to convert photos into "artsy" portraits: http://p2pbridge.sf.net/pyPortrait/
Re:RSS needs better TCP stacks by genixia · 2004-07-20 07:45 · Score: 3, Funny

Yeah, because there's nothing like using a sledgehammer to crack a hazelnut.

For starters, how about the readers play nice and spread their updates around a bit instead of all clamoring at the same time.
Re:RSS needs better TCP stacks by mgoodman · 2004-07-20 07:48 · Score: 1

I'm not sure the server could handle having that many open connections...hence its current process of providing an extremely small text file, creating a connection, transferring the file, and destroying the connection.

Correct me if I'm wrong.

--
01100111 01100101 01110100 00100000 01101111 01110101 01110100 00100000 01101101 01101111 01110010 01100101 00101110
Re:RSS needs better TCP stacks by EnderWiggnz · 2004-07-20 07:55 · Score: 3, Insightful

not needing user intervention is the effing POINT of rss.

it's like saying - "java is great, except let's make it compiled, and platform specific"

--
... hi bingo ...
Re:RSS needs better TCP stacks by Salamander · 2004-07-20 07:58 · Score: 5, Insightful

Leaving thousands upon thousands of connections open on the server is a terrible idea no matter how well-implemented the TCP stack is. The real solution is to use some sort of distributed mirroring facility so everyone could connect to a nearby copy of the feed and spread the load. The even better solution would be to distribute asynchronous update notifications as well as data, because polling always sucks. Each client would then get a message saying "xxx has updated, please fetch a copy from your nearest mirror" only when the content changes, providing darn near optimal network efficiency.

--
Slashdot - News for Herds. Stuff that Splatters.
Re:RSS needs better TCP stacks by Anonymous Coward · 2004-07-20 08:04 · Score: 2, Insightful

Yeah, just use a database backend for TCP, good idea. Oh! I know! Lets use XML instead! Jesus christ, if you are this stupid, just shut your hole. Don't propose retarded solutions to problems you don't understand just cause you are bored.
Re:RSS needs better TCP stacks by normal_guy · 2004-07-20 08:07 · Score: 1

Need better TCP stacks? I don't understand why you would break standard TCP in order to accomplish a scattered update. If it's really an issue, the server should just set a hard limit on hits to uselessinfo.rss and cease to return anything other than a 20-byte error message once that limit has been reached. There's _much_ more potential for DDoS with a modified TCP stack. Sounds like you learned a little bit about protocols in your compsci class and now everything is a TCP issue -- the problem is poor coding and obsessive geeks, and not the transport or network layers.
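
A rough Python sketch of that hard limit, assuming a fixed one-hour window and a 503 with a Retry-After header as the tiny error response. The class name and the numbers are illustrative, not taken from any real server:

```python
import math
import time


class HitLimiter:
    """Allow at most `limit` full responses per fixed time window."""

    def __init__(self, limit, window_seconds=3600, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.window_start = clock()
        self.hits = 0

    def handle(self, feed_body):
        """Return (status, headers, body) for one request."""
        now = self.clock()
        if now - self.window_start >= self.window:
            # A new window has begun: reset the counter.
            self.window_start = now
            self.hits = 0
        if self.hits >= self.limit:
            wait = max(1, math.ceil(self.window_start + self.window - now))
            return 503, {"Retry-After": str(wait)}, b"busy, retry later\n"
        self.hits += 1
        return 200, {}, feed_body


limiter = HitLimiter(limit=2)
for _ in range(3):
    print(limiter.handle(b"<rss/>")[0])
```

The over-limit body is 18 bytes, so the server spends almost nothing on requests it refuses. Whether clients actually respect Retry-After is a separate question.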

--

Linux: Free if your time is worthless.
Re:RSS needs better TCP stacks by Wesley+Felter · 2004-07-20 08:15 · Score: 1

aka Newswire (maybe someday it will even be released).
Re:RSS needs better TCP stacks by Salamander · 2004-07-20 08:34 · Score: 1

Yes, like that. :-)

--
Slashdot - News for Herds. Stuff that Splatters.
Re:RSS needs better TCP stacks by arrow · 2004-07-20 08:35 · Score: 1

Karma be damned...

What the *fuck* are you talking about?

--
symetrix. We are building a religion, a limited edition.
Re:RSS needs better TCP stacks by timothyf · 2004-07-20 08:39 · Score: 1

RSS is implemented with XML and is generally transported via HTTP (although you could probably transport it over just about anything), like a web page. What does the TCP stack have to do with it?
Re:RSS needs better TCP stacks by HokieJP · 2004-07-20 08:42 · Score: 1

Pushing updates wouldn't work so well for people behind firewalls.

Plus, with your system, the server would have to send a billion updates every time any new content appeared. For frequently updated sites, this would probably increase the total traffic.

Sometimes stupid is the smartest way to go.
Re:RSS needs better TCP stacks by Ernesto+Alvarez · 2004-07-20 08:53 · Score: 1

Why would you want to change standard TCP to behave better when using a SPECIFIC layer 7 protocol, when it should be independent? What would happen under normal loads or when using other protocols? After all TCP is a stream/connection oriented protocol, not a "repeated confirmation of opt-in". TCP is not the place to implement that. Imagine if you had to use, say, SSH using this "delayed TCP". I push a key, it goes to the server, the server gets the connection data from the database, services my request, commits the state (remember, TCP has a constantly changing state, congestion windows, receiver windows, SEQ numbers), and then does the same in order to send the answer. It would be an unusable session.

Besides, that activity looks like a slashdotting, which has some of the properties of congestion. If you tried keeping all the connection data, you would simply end up with a lot of pending connections and service to a little minority (basic networking lesson: infinite buffers do not solve congestion).
It would be better to reject some sessions to lighten the load hoping that the clients will retry in a reasonable time.

TCP, UDP and IP have remained unchanged for almost 25 years, and that is not coincidence. The people who invented them knew very well how to do their jobs.

--
GPG 0x1B479C78
Re:RSS needs better TCP stacks by Malc · 2004-07-20 09:47 · Score: 1

Just to give an example of a real world situation. We had a bug in a server-side application. With 25,000 connections open we had 900MB of memory allocated on a machine with only 512MB of physical RAM. Of course, this wasn't just the TCP connection overhead, but also things like IIS's EXTENSION_CONTROL_BLOCK per connection, etc. If a connection isn't needed again within a short period of time, close it!
Re:RSS needs better TCP stacks by Bill+Kendrick · 2004-07-20 10:04 · Score: 1

The real solution is to use some sort of distributed mirroring facility so everyone could connect to a nearby copy of the feed and spread the load.

Like Akamai and other similar distributed content providers. That's what they were invented for. :^)

-bill!
Re:RSS needs better TCP stacks by Salamander · 2004-07-20 10:28 · Score: 1

Like Akamai and other similar distributed content providers. That's what they were invented for.

...except that they don't really propagate update notifications. Just data. It's really a cache-consistency problem in disguise, but I didn't expect most slashdotters to grok that.

--
Slashdot - News for Herds. Stuff that Splatters.
Re:RSS needs better TCP stacks by sploxx · 2004-07-20 10:33 · Score: 1

Maybe RSS should instead use multicasting?
Re:RSS needs better TCP stacks by Anonymous Coward · 2004-07-20 10:42 · Score: 0

What are all these guys saying "wrong layer" "destroying the TCP standard" and what not bitching about?

What the f difference does it make where you store the TCP connection information? In a block of memory or in a database? How does allowing millions of simultaneous connections instead of a thousand change TCP standards compliance??

Although unlikely to be ever implemented and not without disadvantages, at least the OP is looking and talking in the right direction!

Problem now: RSS is based on polling --> leads to demand peaks (often while there isn't even any news). Non-polling isn't possible because of all the NAT boxes/firewalls on the client side.

Solution: when the server already has an open connection to all clients (they are "subscribed"), it simply sends them the data when there is any.

Is this so hard to understand or something?? Stupid I-wanna-sound-like-I-understand-something-nay-sayers!
Re:RSS needs better TCP stacks by Salamander · 2004-07-20 10:50 · Score: 1

Plus, with your system, the server would have to send a billion updates every time any new content appeared.

Only if there were a billion separate subscribers querying the origin server directly, but you raise a good point. In the architecture description for a project at my last job, which was pretty closely related to this problem, I described two fundamental causes of wasted bandwidth. #1 was sending the same data over the same link multiple times when one would have been sufficient - the problem with RSS as it currently exists. #2 was sending data over a link once when zero would have been sufficient (i.e. it's never needed at the far end). I think I also referred to these as Scylla and Charybdis, but maybe that was something else. Anyway, the point is that neither extreme works optimally. What you have to look at - this is a variant of the first rule of optimization as described in Hennessy and Patterson's Computer Architecture: A Quantitative Approach - is the relative frequency of each error. How many wasted requests are made with the current system, vs. how many wasted change notifications would there be in the system I proposed above? Also, how would those wasted requests/notifications be distributed?

I still contend that the "distributed push" system would be an improvement over the status quo, but the approach I actually used in the aforementioned project might be even better. There, it was primarily a "pull" model, though push was supported as well when a reasonable prediction could be made about a need for data (e.g. to deal with demand as people around the world get to work and start their morning surf). Pulling data didn't just pull it to the original requester, though; it also pulled copies into a series of intermediate caches so that future requests near that first requester wouldn't have to go all the way to the origin. Note that such an approach can use either polling or asynchronous invalidation (or both) for consistency, and still benefit from the distributed caching. I know it works, even with full consistency, because I did it over two years ago and it scaled just fine. The subsequent fate of that project has to do with the incredible short-sightedness of corporate weasels at a certain large storage vendor, and doesn't reflect on the technology at all. Maybe if I had some spare time I'd apply some of those ideas (the ones that aren't being patented) to a better distributed-RSS system.
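
A toy Python sketch of the pull-through caching described above, under heavy simplifying assumptions: everything is in one process, there is no expiry, and the class names are invented for the example. It is not the design of the project mentioned in the post, only an illustration of the general idea.

```python
class Origin:
    """Stand-in for the origin server; counts how often it is actually hit."""

    def __init__(self):
        self.feeds = {}
        self.fetches = 0

    def publish(self, url, body):
        self.feeds[url] = body

    def fetch(self, url):
        self.fetches += 1
        return self.feeds[url]


class CacheNode:
    """Pull-through cache: a miss is fetched from the parent and kept."""

    def __init__(self, parent):
        self.parent = parent
        self.children = []
        self.store = {}
        if isinstance(parent, CacheNode):
            parent.children.append(self)

    def fetch(self, url):
        if url not in self.store:
            self.store[url] = self.parent.fetch(url)
        return self.store[url]

    def invalidate(self, url):
        """Invalidation pushed down the tree when the origin changes."""
        self.store.pop(url, None)
        for child in self.children:
            child.invalidate(url)


origin = Origin()
origin.publish("/index.rss", "<rss>v1</rss>")
regional = CacheNode(origin)
edge_a, edge_b = CacheNode(regional), CacheNode(regional)

for edge in (edge_a, edge_b, edge_a):
    edge.fetch("/index.rss")
print(origin.fetches)  # 1
```

Three reader requests cost the origin a single fetch; the other two are served by the intermediate caches. Consistency here comes from invalidate(), but the same tree would work with periodic polling between levels instead.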

--
Slashdot - News for Herds. Stuff that Splatters.
Re:RSS needs better TCP stacks by Russ+Nelson · 2004-07-20 15:09 · Score: 1

I'm not suggesting that users need to do anything special. The solution to the "people are hitting my web server for RSS feeds every ten minutes, or every hour on the hour" can be solved on a level much lower than the user level.
-russ
p.s. effing get it?

--
Don't piss off The Angry Economist
Re:RSS needs better TCP stacks by Russ+Nelson · 2004-07-20 15:17 · Score: 1

Dude, go google for "russ" if you think I'm some sort of newbie non-anonymous non-coward.
-russ

--
Don't piss off The Angry Economist
Re:RSS needs better TCP stacks by Russ+Nelson · 2004-07-20 15:21 · Score: 1

Honest to goodness, I wish idiots would do a little googling before they resorted to ad-hominem argumentation. At least know something about the person you're arguing against.
-russ
p.s. I'm not suggesting that anybody should break standard TCP. TCP the protocol is perfectly fine. TCP the implementations typically cannot hold millions of connections open.

--
Don't piss off The Angry Economist
Re:RSS needs better TCP stacks by Russ+Nelson · 2004-07-20 15:24 · Score: 1

Please read the subject that you posted this under. Did I say "RSS needs better TCP"? Or did I say "RSS needs better TCP stacks".

I'd be happy to reply in detail once we're talking about the same thing.
-russ

--
Don't piss off The Angry Economist
Re:RSS needs better TCP stacks by Russ+Nelson · 2004-07-20 15:32 · Score: 1

Try this:

echo 1000000 > /proc/sys/fs/file-max

and then see if your Linux box can handle a million simultaneous TCP connections. When you have rebooted your machine, you will be enlightened.

--
Don't piss off The Angry Economist
Re:RSS needs better TCP stacks by Russ+Nelson · 2004-07-20 15:37 · Score: 1

Clearly they haven't, given the replies to the main story and to my posting. AC: "Hey RSS should have a schedule for updates!" Russ: "Yeah! Just like the DNS!"
-russ

--
Don't piss off The Angry Economist
Re:RSS needs better TCP stacks by Russ+Nelson · 2004-07-20 15:45 · Score: 2, Interesting

Sigh. You don't get it, do you? You suggest different protocols, when TCP works just fine. The reason you want to stay with TCP is because of the infinite number of ways people have implemented TCP. Just as one HUGE example, consider all the people behind NAT. How are you going to "distribute asynchronous update notifications"?

I'd like to hear one person, just one person say "Hmmm.... I wonder why Russ didn't suggest asynchronous update notifications?" And then maybe go on to answer themselves by saying "Oh! I get it! Russ is right! Hey, that's a great idea! It's backwards compatible and yet does exactly what is needed to turn RSS into a packet-efficient protocol."

Instead, you get weenies who say something slightly more erudite than "duh" but which could be summarized thusly. You also get people (stand up and take a bow, Salamander) who say "Geez, that idea has OBVIOUS PROBLEMS" even though I obviously anticipate those OBVIOUS PROBLEMS and suggest a solution. Honestly, I see why people have such a low opinion of slashdot posters. Yer all a bunch of dummies!
-russ
p.s. pant, pant, pant, pant, okay, I feel better now.

--
Don't piss off The Angry Economist
Re:RSS needs better TCP stacks by TheLink · 2004-07-20 15:57 · Score: 1

Heh, I wonder how many people here actually understand what you're talking about. Looks like most don't.

You'd still have to change the RSS protocol. Plus I think HTTP keepalives aren't usually expected to stay alive for that long (even if it's allowed, many systems may make assumptions).

An issue you may have to deal with is some firewalls may time out and close the connection. Unless you send dummy packets, that'll complicate things.

Also I'm not sure if popular load balancers would like that sort of thing - these are used to more actively balance/distribute traffic to webservers - typically they also keep track of connections and response times. I daresay a few news sites use these things (rather than the DNS balancing method).

Perhaps this problem could be reduced if there was better and more transparent web caching spread about the world (ISPs etc). Then the caches take the load where possible.
--
- Too many replies beneath your current threshold
Re:RSS needs better TCP stacks by Lehk228 · 2004-07-20 17:06 · Score: 1

You ummm.... want to alter the network stack on the server OS so RSS works better... I really hope you were kidding (though it certainly seems from the tone of your post that you are, and the other replies didn't pick up on it)

--
Snowden and Manning are heroes.
Re:RSS needs better TCP stacks by normal_guy · 2004-07-20 18:21 · Score: 1

TCP the implementations typically cannot hold millions of connections open.
My apologies for the compsci comment. It was uncalled for.
Regardless, replacing the 'close connection' portion of TCP with a database write to be used for subsequent re-attaching of an existing session merely moves the problem up a few layers. Portions of the millions of connections will still need to be reestablished -- requested by the clients (status quo), or with your method. The burden still exists.
The BitTorrent method is discussed elsewhere, I think one part of it is the best (application layer) solution. The normal TCP connect/data/disconnect-to-free-resource flow occurs, but part of the data is an update timer changed dynamically by load.
If a network layer change is needed, why not just have the clients listen for UDP packets, while the server sends out staccato connectionless uselessnews.rss based on the database you discussed?
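
For the load-driven update timer mentioned above, a small Python sketch. RSS 2.0 already has a ttl element (a cache lifetime in minutes), so the server could simply vary that value; the load scale, the 30 and 720 minute bounds and the function names are assumptions for illustration.

```python
def suggested_ttl_minutes(load, minimum=30, maximum=720):
    """Scale the advertised refresh interval with load (0.0 idle, 1.0 saturated)."""
    load = min(max(load, 0.0), 1.0)  # clamp out-of-range measurements
    return int(minimum + (maximum - minimum) * load)


def ttl_element(load):
    """Render the value as an RSS 2.0 <ttl> element for the channel."""
    return f"<ttl>{suggested_ttl_minutes(load)}</ttl>"


print(ttl_element(0.0))  # <ttl>30</ttl>
print(ttl_element(0.5))  # <ttl>375</ttl>
print(ttl_element(1.0))  # <ttl>720</ttl>
```

This keeps the normal connect/fetch/disconnect flow and only works for readers that honor ttl, which is entirely up to the client authors.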

--

Linux: Free if your time is worthless.
Re:RSS needs better TCP stacks by zatz · 2004-07-20 18:57 · Score: 1

It's a different problem, your analogy is not helpful.

DNS was designed for efficiently querying tiny records which are updated rarely. You don't have thousands of people interested in being alerted the moment your DNS records change, and you can even anticipate updates and plan around them.

A major use of RSS, on the other hand, is to make updates (which are often too large to fit in one datagram) visible as soon as possible, while facilitating a little aggregation on the client side. Caching is less useful than with DNS, because the data are larger and change much faster. Readers still outnumber writers, so caching can help, but I wouldn't characterize this as entirely a "cache-consistency problem".

--

Java: the COBOL of the new millenium.
Re:RSS needs better TCP stacks by zatz · 2004-07-20 20:05 · Score: 1

That's a cool idea. It's a pity most readers completely misunderstood you.

Unfortunately it still requires modifying RSS clients, and I think there are lots of lower-hanging fruit if you can do that.

Anyway, an IP, port, and two sequence numbers is only 14 bytes per connection; if only that was everything. The hard part is keeping track of all the other details required to implement TCP, and in less space and time than the kernel. MTU negotiation, reassembly, window size, retransmission timeouts... it adds up fast. Perhaps you could slide a "window" of a few thousand along your giant array of open connections... send them a TCP windowful of data when they enter, and if there are still ACKs outstanding when they reach the end of that "window", too bad. That way you can talk to a lot of clients without increasing your resident set size. I think you would find it very challenging to remain a compliant TCP implementation while doing all of this chicanery, though.
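
The 14-byte figure is easy to check in Python, assuming IPv4 and 32-bit sequence numbers. The record layout and function names below are just an illustration, not a real kernel structure:

```python
import socket
import struct

# Remote IPv4 address (4 bytes), remote port (2 bytes),
# send and receive sequence numbers (4 bytes each), network byte order.
CONN_RECORD = struct.Struct("!4sHII")


def pack_connection(ip, port, snd_seq, rcv_seq):
    """Serialize the bare minimum needed to identify a parked connection."""
    return CONN_RECORD.pack(socket.inet_aton(ip), port, snd_seq, rcv_seq)


def unpack_connection(blob):
    """Inverse of pack_connection."""
    raw_ip, port, snd_seq, rcv_seq = CONN_RECORD.unpack(blob)
    return socket.inet_ntoa(raw_ip), port, snd_seq, rcv_seq


print(CONN_RECORD.size)                    # 14
print(CONN_RECORD.size * 1_000_000 / 1e6)  # 14.0 (MB for a million connections)
```

So a million parked connections would need about 14 MB for this part alone; as noted above, the window sizes, timers and retransmission state are what make the real cost much higher.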

--

Java: the COBOL of the new millenium.
Re:RSS needs better TCP stacks by Salamander · 2004-07-20 23:01 · Score: 1

This might come as a blow to your ego, Russ, but it's not safe to assume that everyone has read every single suggestion you've ever made. I have no idea what you're talking about when you say you've suggested a solution to some obvious problems that I never referred to...and I don't care. If you wanted to be more helpful than combative maybe you could provide a link to this perfect solution in the places where you attempt to lambaste people for not knowing it.

--
Slashdot - News for Herds. Stuff that Splatters.
Re:RSS needs better TCP stacks by Salamander · 2004-07-20 23:12 · Score: 1

Ahhh, I forgot that you were the idiot who suggested leaving many thousands of TCP connections open all the time. I guess I assumed that was so beneath contempt that it couldn't possibly be what you were talking about. Tell you what, Russ. Why don't you go implement a few TCP stacks, then come back to us? Yes, I've done it. Your hand-wave about doing it out in user space with a database doesn't really solve the problem either. For one thing, vendors will not recompile their web servers to use your library for TCP. Just as importantly, putting it out in user space doesn't make the memory-consumption problem go away; as another poster already pointed out, it is not just TCP itself but other context that gets layered on top of each connection. People really do have better things to do with their memory and swap space than use it to store context for idle connections, and better things to do with their development time than solve the management problems that such large numbers involve.

You really do need to get some of those connections off the origin server altogether, which is where hierarchical or mesh approaches come in. As I mentioned in a previous post which you obviously didn't read before posting more ignorant garbage, that can still be a pull model using TCP if you feel you have to - it's just that not everyone is pulling directly from the origin. Take a bow yourself, Russ, for being the person in this thread who is most obviously uninformed, unqualified, and uninterested in listening to what others who overcame those failings a decade ago have to say. You're the George Bush of this thread.

--
Slashdot - News for Herds. Stuff that Splatters.
Re:RSS needs better TCP stacks by whitis · 2004-07-21 02:36 · Score: 1
RSS just needs better TCP stacks. Here's how it would work: when your RSS client connects to an RSS server, it would simply leave the connection open until the next time the RSS data got updated. Then you would receive a copy of the RSS content. You simply *couldn't* fetch data that hadn't been updated.
Happy anniversary, Russ! It looks like in 6 days it will be the 2^4th anniversary of your first Packet Driver release. For those who are still a bit wet behind the ears or (like me) are terrible with names, back in the MS-DOS days most DOS TCP/IP stacks used Packet Drivers to talk to the network card and Russ Nelson was the primary author of a large collection of public domain packet drivers.
Your suggestion is interesting and addresses some long standing problems which affect other services besides RSS feeds. I see two variations of it:
- Unmodified Client App., Unmodified Server App., Modified server TCP/IP stack. Unmodified client TCP/IP stack. Client still stupidly contacts the server every hour on the hour but the server is free to accept the connection and then blow the client off until, say, 17 minutes after the hour when it has finished the prior requests. As long as the client doesn't time out and the client doesn't hang until it gets a response, everything is fine. To some extent, the backlog parameter on listen() can be used to give this effect but there may be hard limits or practical maximums.
- Modified client app, modified server, and modified server TCP/IP stack. Unmodified client TCP/IP stack. This works as you describe and seriously cuts down on unnecessary net traffic and minimizes latency in headlines arriving at client. Only TCP/IP keep alive packets need to be sent every 20 minutes or so.
The first variation has the advantage that it would be immediately deployable by overburdened web sites, once a kernel patch is available, and the web sites are not at the mercy of many client developers doing the right thing in fixing their software and far more users actually upgrading their software. The first variation would pave the way for the second variation. A more limited but useful version of the first variation is almost available right now without kernel mods: increase backlog to as high as possible and limit the number of server processes forked to prevent overloading the CPU to the point where it can't perform other required tasks. Unfortunately, the kernel limit SOMAXCONN will stop you at 128 connections backlogged.
In one sense, your suggestion is to hop out of the frying pan into the fire. Servers would experience the heavy load (TCP/IP connection-wise, not application-wise) they currently see at the top of the hour for the entire hour. But I am reminded of another metaphor from a submariner. I was once told that all sea vessels can go underwater and the speaker would rather be in a vessel that was at least designed to go underwater.
Another advantage to your approach of spiffing up the TCP/IP stack is that many other services could benefit. Forget HTTP Refresh bullshit; the browser will get new data as soon as it is available and won't waste the server's time when new data isn't available. Back before client pull, we had server push which was much better in many ways; unfortunately, OS limits on number of TCP connections/file handles severely affected the scalability of this approach. And think of what an improved TCP stack would do for IM servers such as Jabber where huge numbers of clients connect and yet many sit idle much of the time. POP, IMAP, and NNTP clients could also benefit from persist

Scheduling by mbbac · 2004-07-20 07:37 · Score: 1

RSS readers and aggregators shouldn't gather new feeds every hour on the hour. They should gather them when the application is first run and then every hour after that (probably not on the hour). I'd hope most GUI applications already run this way. I guess most of this traffic just comes from daemon processes -- and that should be changed.

--

mbbac

Re:Scheduling by Neon+Spiral+Injector · 2004-07-20 07:41 · Score: 1

Or update 23 or 25 times a day, so it always hits on a different minute mark.
Re:Scheduling by stratjakt · 2004-07-20 07:46 · Score: 1

No doubt it's cron jobs and the like.

Does any flavor of cron have a "randomizing" function? Like, for instance, tell it "every hour on the hour, give or take 30 minutes"?

So it might look at 1:11, 2:25, 2:51, etc...
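
If the reader runs as a long-lived process rather than straight from cron, the "give or take 30 minutes" part is a few lines of Python. This is only a sketch; the function name and the defaults are assumptions:

```python
import random


def next_delay_seconds(base=3600, spread=1800, rng=random):
    """Delay until the next fetch: `base` seconds, give or take `spread`."""
    return base + rng.uniform(-spread, spread)


delay = next_delay_seconds()
print(f"next fetch in {delay / 60:.0f} minutes")
```

A fetch loop would call this after every fetch and sleep for the result, so each reader drifts independently instead of lining up on the hour. On average the server still sees one request per reader per hour.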

--
I don't need no instructions to know how to rock!!!!
Re:Scheduling by anomalous+cohort · 2004-07-21 01:40 · Score: 1

In CDF, the feed author specifies a window of time and the reader is supposed to randomly choose when in that window to fetch the data. Not as fancy as media filtering protocols such as those used by BitTorrent but better than what RSS provides.
Re:Scheduling by takkaria · 2004-07-21 04:43 · Score: 1

Mm, sorry. I didn't realise this from my brief look at the CDF spec.

Revision of the Standard by novalogic · 2004-07-20 07:37 · Score: 1

RSS is in fact living up to what it was made for; however, it's getting used like a Chevy S-10 pulling a semi trailer.

PHP just had a major overhaul, so there's no reason why RSS2 shouldn't be on the drawing board. This time, though, more thought should be given to the scale of its use.

--
--

Re:Revision of the Standard by cmdr_beeftaco · 2004-07-20 07:46 · Score: 2, Interesting

And there is a one word solution, peer to peer. The whole torrent concept is what is needed.
Re:Revision of the Standard by Anonymous Coward · 2004-07-20 08:16 · Score: 1, Funny

that's three words, idiot
Re:Revision of the Standard by MntlChaos · 2004-07-20 09:38 · Score: 1

And there is a one word solution, peer to peer. The whole torrent concept is what is needed.

No, the amount of overhead for something as small as an RSS file makes a torrent or other p2p network impractical.

You know how much traffic a p2p network deals with due to searches? Now multiply those searches every hour. Ouch. As for torrents, you still need a central tracker. BitTorrent works well for serving large files, because a small amount of control data controls a large amount of data transfer. Here you'd have a small amount of control data for a small amount of data transfer. So no, P2P would not work for distributing RSS feeds.

deluge of traffic... by Anonymous Coward · 2004-07-20 07:37 · Score: 0

if the deluge of traffic that RSS causes makes RSS "stupid," posting an article about the deluge of traffic RSS is causing on Slashdot is, at the very least, "ironic."

Idea by iamdrscience · 2004-07-20 07:38 · Score: 4, Interesting

Well maybe somebody should set something up to syndicate RSS feeds via a peer to peer service. BitTorrent would work, but it could be improved upon (people would still be grabbing a torrent every hour, so it wouldn't completely solve the problem).

Re:Idea by ganhawk · 2004-07-20 07:53 · Score: 5, Interesting

You could have a system based on JXTA. Instead of the BitTorrent model, it would be something like P2P radio. When the user asks for a feed, a neighbour who just received it can give it to the user (overlay network, JXTA based), or the server can point to one of the users who just received it. (This is similar to BitTorrent, but the user gets the whole file from a peer instead of parts. The user also does not come back to the server at all if the transfer is successful. The problem is that this user need not serve others and can just leech.)

I feel an overlay network scheme would work better than a BitTorrent/tracker-based system. In an overlay network scheme, each network group will have its own ultra peer (JXTA rendezvous) which acts as tracker for all files in that network. I wanted to do this for the Slashdot effect (p2pbridge.sf.net) but somehow the project has been delayed for long.

--
Python script to convert photos into "artsy" portraits: http://p2pbridge.sf.net/pyPortrait/
Re:Idea by Anonymous Coward · 2004-07-20 08:01 · Score: 0

However, given the size of the RSS data vs the size of a query to search for a peer with the data, you're probably looking at 200-1000% overhead (vs just RSS from the source)

then you have to add junk like signing to avoid
"hot new models at New York Times.rss.pif.exe"

and the faked files with junk like:
"Bush said something smart today!",
"Bin Laden says he can't handle Bush's high IQ", "So-Damn-Insain is actually Bin Laden, we weren't so misdirrected after all",
"Tin-foil hats add 4 inches to your man pipe!"

-Joe 2-Keg
Re:Idea by iamdrscience · 2004-07-20 08:01 · Score: 1

neat.
Re:Idea by RedWizzard · 2004-07-20 08:31 · Score: 1

Well maybe somebody should set something up to syndicate RSS feeds via a peer to peer service. BitTorrent would work, but it could be improved upon (people would still be grabbing a torrent every hour, so it wouldn't completely solve the problem).
Peer to peer was my first thought too. It's designed to solve the exact same problem that RSS faces - too many connections hitting the server and using too much bandwidth. But rather than use an existing P2P system to distribute RSS content as you've suggested, I think RSS should be replaced with a dedicated P2P system. I'd have a central descriptor (like a torrent file) so that the publisher can track all subscribers. I'd have persistent connections so that the system can be mainly push rather than the current pull model. BitTorrent and other P2P systems are designed to efficiently allow large numbers of users to download a file, this P2P-RSS would be a system designed to efficiently allow large numbers of users to synchronize a file as it changes over time.
Re:Idea by Salamander · 2004-07-20 08:56 · Score: 1

BitTorrent as currently constituted would not work, because it's necessary to propagate change notification as well as data and BitTorrent doesn't do that. The closest technological fit, amusingly enough, predates P2P as most people know it. It's good old NNTP, which had to deal with the exact same problem over a decade ago and still does so even for very large networks.

--
Slashdot - News for Herds. Stuff that Splatters.
Re:Idea by kingman · 2004-07-20 08:58 · Score: 2, Informative

Shrook for Mac OS X appears to do almost that, where a central server collects updates and has ONE randomly-chosen client check for updates as frequently as every five minutes, but all other clients just refer to the central server to see if feeds are updated.
Re:Idea by Jerf · 2004-07-20 09:01 · Score: 1

I've looked into this.

One of the first defenses against bandwidth attack is to shrink the file, so things on the order of 5-10 K are not uncommon. As of this writing, index.rss on Slashdot is 12,605 bytes.

This causes a problem with BitTorrent; the overhead for communicating with the tracker eats that pretty fast. Remember it's not just about bytes; if downloading the index.rss entirely takes 11 packets, and BT-Tracker communication takes 50 packets but only uses 5KB total, you're still not gaining much and you may well lose net throughput. And a gain of 2x in the face of exponential growth isn't much, anyhow; to "win", you need the majority of your consumers to not hit your servers at all.

The other problem is that RSS content is constantly changing, which BT handles poorly as it isn't designed for that. This also screws up Freenet and most other file-sharing protocols.

(Actually, Freenet is doable, I once coded a proof of concept, but it is inefficient and requires constant nursing from the author, so I don't consider it practical in the general case.)

There is also the issue of verifiability once you are downloading from a non-primary source, which isn't insurmountable but nobody has put all the pieces together yet.

I think a specialized P2P protocol would be necessary; you can fork from an existing system (and for that I'd recommend starting from BT, but you're still going to need to make a lot of changes) but no system I know of to date (and that includes several academic P2P systems you've never heard of most likely) that has an implementation matches this problem. The "constantly changing file of the same name" really throws a wrench in the whole thing; most systems can't handle that, those that can either don't scale or exist only as academic papers or restricted-access academic prototypes.

(And to those about to whack on the reply button explaining how it is just soooooooooo easy, please bear two things in mind: One, I did this about a year ago so there may be a system I didn't analyse, though I don't think things have fundamentally changed, and two, I did this research as an implementer, not a theorizer, including a couple of (anti-)proof-of-concept implementations. Vague theorizing and handwaving without code isn't going to impress me; been here, tried to actually make it work.)
Re:Idea by Anonymous Coward · 2004-07-20 09:12 · Score: 0

I have already seen this being worked on. Using BitTorrent to pull down RSS news feeds and allow for attachments with the news feeds. This could be done at night so in the morning your news and attachments are ready. Take it to the next step and you have a distributed RSS news feed, which means the bigger the subscriber base, the wider the distribution. Allowing for BitTorrent technology means that you get your feed from multiple sources and you feed multiple sources. You would also be able to control the dissemination of the data because you control when it is released into the BitTorrent stream. After that it is all good..
Re:Idea by Anonymous Coward · 2004-07-20 09:20 · Score: 0

Should have searched /. before I posted. see story http://slashdot.org/article.pl?sid=04/03/17/178225&tid=95&tid=185

link is for a bittorrent/RSS application.
Re:Idea by Anonymous Coward · 2004-07-20 09:59 · Score: 0

that sir, is a good idea
RSS over NNTP
alt.binary.rss.[country].[category]
with signed rss attachments and a subject line with the domain name.

add a rss-feed-description file with the public key for signing, the group and the subject to watch for which can be downloaded from the site in xml for import into rss-over-nntp reader.

category being stuff like 'tech','business','politics','weblog' and not 'news', they're all 'news'...
Re:Idea by RyanK · 2004-07-20 10:43 · Score: 1

This may just be a situation where Konspire actually makes sense.

When this surfaced on Slashdot about a year ago, a lot of noise was made about it, but the fact is that it is tied to distribution cycles. Meaning a file/stream/whatever isn't passed on until someone has a complete copy.

This obviously doesn't work well if you want to distribute a DVD to a network, but would probably be reasonable for smaller files. Tiny files like RSS feeds could reach a very wide audience VERY quickly.
Re:Idea by Nurgled · 2004-07-20 12:21 · Score: 1

BitTorrent is the wrong model. What you want is something more like the NNTP model, but with persistent connections rather than polling. Content producers would sign their content (to prove it hasn't been modified/faked) and push it into a Peer-to-Peer cloud. The content item then gets shoved around the cloud so that interested parties can store it.

The most efficient approach is probably to create small clouds surrounding each content source rather than one big cloud shoving everything around, so the problem just becomes handling the initial bootstrap (like a BitTorrent tracker) rather than handling the content distribution to each client individually. However, the "one big cloud" solution also has advantages. It means that eventually people will be able to push in content which isn't from any website at all and have others subscribe to it. Clients would watch all passing data and snag what they are interested in. Replies to articles wouldn't go back through the originating server, they'd just get pushed around the cloud with a reply marker and clients would be able to produce a threaded display much as a USENET newsreader does.

Of course, this all needs clever swarming/distribution algorithms to avoid swamping all users with loads of data they don't care in the slightest about, but there's plenty of prior art out there to base it on.

Ahh, So that's what it stands for... by cedmond · 2004-07-20 07:38 · Score: 1

"Really stupid syndication"

--
----------------------------------
I'd rather not take sides until I hear the monkey's version - PHB

Google News by Dominatus · 2004-07-20 07:38 · Score: 1

I used to have an RSS feed for google news that I loved and used all the time, but it was taken down due to this effect. It's a shame that these things can't be handled better. (the RSS feed may be back up, I haven't checked in months)

One hour interval? by anynameleft · 2004-07-20 07:38 · Score: 2, Insightful

Why have developers made their RSS readers so that they query the master site at each hour sharp? Why haven't they done it like Opera or Konqueror, e.g. query the server every sixty minutes after the application has been started?

Or did the RSS reader authors hope that their applications wouldn't be used by anybody except for a few geeks?

Re:One hour interval? by r00zky · 2004-07-20 07:48 · Score: 1

At first I didn't understand the article...
You mean RSS readers are programmed to fetch the feed at hour xx:00??

That's fantastic

Some programmers should be shot...

--
I'm a chainsmokin' alcoholic sociopath, so-ci-o-path
Re:One hour interval? by Anonymous Coward · 2004-07-20 07:52 · Score: 0

Because, unfortunately, on the Macintosh, on the hour is the only option available (from Safari!)
Re:One hour interval? by AndroidCat · 2004-07-20 07:58 · Score: 1

I believe that's what SharpReader does. One thing I personally do is adjust the refresh rate for each feed from the one hour default. There's no point in banging on a feed every hour when it changes a few times a week.
One good idea would be for the protocols to allow each feed to suggest a default refresh rate. That way slow changing or overloaded sites could ask readers to slow down a little. A minimum refresh warning rate would be good too. (i.e. Refreshing faster than that rate might get you nuked.) I know that some things are already in the protocols, but a better set of Netiquette for Blogreaders would be a good idea.

--
One line blog. I hear that they're called Twitters now.
Re:One hour interval? by mayotte · 2004-07-20 07:59 · Score: 1

Or how about every 60 minutes +/- 5 minutes
Re:One hour interval? by gyrojoe · 2004-07-20 08:31 · Score: 1

Perhaps everyone just opens their news reader on the hour!

"it's the connection overhead, stupid" by SuperBanana · 2004-07-20 07:39 · Score: 4, Informative

...is what one would say to the designers of RSS.

Mainly, IF your client is smart enough to communicate that it only needs part of the page, guess what? The pages, especially after gzip compression(which, including with mod_gzip, can be done ahead of time)...the real overhead is all the nonsense, both on a protocol level and for the server in terms of CPU time, of opening+closing a TCP connection.

It's also the fault of the designers for not including strict rules as part of the standard for how frequently the client is allowed to check back, and, duh, the client shouldn't be user-configured to check at common times, like on the hour.

Bram figured this out with BitTorrent- the server can instruct the client on when it should next check back.

--
Please help metamoderate.

Re:"it's the connection overhead, stupid" by Russ+Nelson · 2004-07-20 07:44 · Score: 1

No, it's not necessary to add scheduling. All that's needed is better TCP stacks which can handle millions of concurrent open connections. Presumably this would happen in a database in userland, and not in the kernel.
-russ

--
Don't piss off The Angry Economist
Re:"it's the connection overhead, stupid" by Wesley+Felter · 2004-07-20 08:09 · Score: 1

It's also the fault of the designers for not including strict rules as part of the standard for how frequently the client is allowed to check back

Inevitably the most popular clients are the most poorly-written ones which ignore as much of the spec as possible. Telling them what they should do is useless, because they don't listen.

As an example, consider all the broken BitTorrent implementations out there.
Re:"it's the connection overhead, stupid" by the+chao+goes+mu · 2004-07-20 08:58 · Score: 1

Millions of lookups in a userland database would be more efficient than persistent connections?

--
Boys from the City. Not yet caught by the Whirlwind of Progress. Feed soda pop to the thirsty pigs.
Re:"it's the connection overhead, stupid" by myov · 2004-07-20 09:06 · Score: 1

It's also the fault of the designers for not including strict rules as part of the standard for how frequently the client is allowed to check back, and, duh, the client shouldn't be user-configured to check at common times, like on the hour.

IIRC, there is a how-frequently-this-feed-is-updated header available, but clients don't actually use it.

I'm wondering why clients don't have the ability to change the refresh rate for feeds - I like to update the main /. page hourly, but the Apple feed doesn't change that often. My trick has been to setup my webserver to wget the other feeds once a day, and hit my own server hourly.

I agree with your idea though - the end user shouldn't be able to decide how frequently to update. Why hit a web server every 30 minutes if the feed changes once a day?

--
I use Macs to up my productivity, so up yours Microsoft!
Re:"it's the connection overhead, stupid" by devnullify · 2004-07-20 09:33 · Score: 1

The real overhead is more probably the database query and code that runs on the server to generate the RSS. I highly doubt this is a bandwidth issue, especially since they're mentioning scalability issues. They're having trouble distributing the processing and aren't happy that they even have to for a once-an-hour peak that never gets approached 98% of the time.
Re:"it's the connection overhead, stupid" by Anonymous Coward · 2004-07-20 10:47 · Score: 0

You wouldn't need "millions" of lookups. It's simply "SELECT IP, Port FROM Connection WHERE interested in story x" _once_ there is a new story x.

And returning "millions" of records is what databases do.
Re:"it's the connection overhead, stupid" by Russ+Nelson · 2004-07-20 15:07 · Score: 1

I'm not sure you understand the problem. Try compiling your kernel to support a million persistent connections.
-russ

--
Don't piss off The Angry Economist
Re:"it's the connection overhead, stupid" by subreality · 2004-07-20 15:35 · Score: 1

...the real overhead is all the nonsense, both on a protocol level and for the server in terms of CPU time, of opening+closing a TCP connection.

Those who do not understand TCP are doomed to reinvent it, poorly. The overhead isn't nonsense, and in any modern IP stack running on a modern CPU, it doesn't require a lot of CPU time.

The bottlenecks that most small sites I've run encounter, in order, are:

1) Upstream bandwidth
2) Application server CPU utilization
3) Firewall state table size
4) Limits on TIME_WAIT states

Look up T/TCP for probably the best example of an improved TCP for small transactions. It has a drawback: it's defenseless against real DDoSes.

Good luck with your reinvention of layer 4. If you create something with significantly better connection performance, without catastrophic drawbacks, draft an RFC, and you'll go down in geek history.
Re:"it's the connection overhead, stupid" by bedessen · 2004-07-28 16:17 · Score: 1

Bram figured this out with BitTorrent- the server can instruct the client on when it should next check back. ... and most clients ignore it completely. I've seen clients reannounce every 5 minutes even though the tracker says 30. Nice idea, but in the real world it doesn't work that way.

Why are we writing polling software in 2004? by skraps · 2004-07-20 07:39 · Score: 1, Flamebait

The guy who came up with the idea for RSS should be sent back to comp. sci. 101. It should have been readily apparent from day 1 that this would be a problem.

Some sort of peer-to-peer event-driven model would be a better match for this problem.

--
Karma: -2147483648 (Mostly affected by integer overflow)

Re:Why are we writing polling software in 2004? by Gooner · 2004-07-20 07:46 · Score: 1

Y'know the parent comment is one time I'm more than willing to let Dave Winner grab all the "glory" within the creation of RSS. Oh and if this turns into a flamewar over who invented what then let me get my CDF shout out, erm, out there.
Re:Why are we writing polling software in 2004? by Anonymous Coward · 2004-07-20 08:16 · Score: 0

a) What does peer-to-peer have to do with it?
b) Eventing over the internet would be nice. There are many reasons it wouldn't work. Good luck coming up with a method that's fast, scalable, secure, and portable.

Yeah, polling is inefficient. That's well known. If you think you can come up with a good way of doing eventing over the internet when nobody else has, please go for it. I'd be very happy to see it.

it's the PULL,stupid by kisrael · 2004-07-20 07:40 · Score: 3, Interesting

"Despite 'only' being XML, RSS is the driving force fulfilling the Web's original promise: making the Web useful in an exciting, real-time way."

Err, did I miss the meeting where that was declared as the Web's original promise?

Anyway, the trouble is pretty obvious: RSS is just a polling mechanism to do fakey Push. (Wired had an interesting retrospective on their infamous "PUSH IS THE FUTURE" hand cover about PointCast.) And that's expensive, the cyber equivalent of a horde of screaming children asking "Are we there yet? Are we there yet? How about now? Are we there yet now? Are we there yet?" It would be good if we had an equally widely used "true Push" standard, where remote clients would register as listeners, and then the server can actually publish new content to the remote sites. However, in today's heavily firewall'd internet, I dunno if that would work so well, especially for home users.

I dunno. I kind of admit to not really grokking RSS, for me, the presentation is too much of the total package. (Or maybe I'm bitter because the weird intraday format that emerged for my own site doesn't really lend itself to RSS-ification...)

--
SO YOU'RE GOING TO DIE: The Comic for Dealing with Death

Re:it's the PULL,stupid by archen · 2004-07-20 08:34 · Score: 1

I think you're right that the problem would probably be lessened by a "push" structure. But as you say, firewalls are a problem there.

Now I haven't thought about this really deeply, but there is a push system that would probably make it through firewalls. Basically just send the feed out by email. You "bcc" a message to "X" amount of clients; they can then hammer their own POP3 servers or receive it directly. You can have some sort of validation system by certificates. Not as simple as just connecting to a server, but as far as hacks go, it would alleviate the firewall issue.
Re:it's the PULL,stupid by temojen · 2004-07-20 09:12 · Score: 1
Anyway, the trouble is pretty obvious: RSS is just a polling mechanism to do fakey Push. (Wired had an interesting retrospective on their infamous "PUSH IS THE FUTURE" hand cover about PointCast.) And that's expensive, the cyber equivalent of a horde of screaming children asking "Are we there yet? Are we there yet? How about now? Are we there yet now? Are we there yet?" It would be good if we had an equally widely used "true Push" standard, where remote clients would register as listeners, and then the server can actually publish new content to the remote sites. However, in today's heavily firewall'd internet, I dunno if that would work so well, especially for home users.

Actually, it's more like someone saying over and over: "What's New?"

How to make this work well without overloading anything is the trick. I figure there are two things to do.
1. On the Server Side make the RSS feed a static file, updated automatically only when there are changes. Then serve it with a standard cache/webserver setup.
2. On the client side make RSS readers properly use the If-Modified-Since header. It's important that this be a request for a static file because most webservers will always run the script if the URL refers to one.
This changes the conversation in most cases from "What's New?"-"(thinking...thinking...)These Things are relatively new." to "What's New?"-"Nothing." most of the time.
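
For the client half (point 2), a rough sketch using only Python's standard library might look like the following. The function names are invented for illustration, and it assumes the server sends a Last-Modified header and answers 304 Not Modified when nothing has changed.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, last_modified=None):
    """Build a GET that asks for the feed only if it changed since
    `last_modified` (an HTTP date string saved from an earlier fetch)."""
    headers = {}
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return urllib.request.Request(url, headers=headers)


def fetch_if_changed(url, last_modified=None):
    """Return (body, last_modified). body is None when the server
    answers 304, i.e. "Nothing." in the conversation above."""
    request = build_conditional_request(url, last_modified)
    try:
        with urllib.request.urlopen(request) as response:
            return response.read(), response.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib reports 304 as an HTTPError
            return None, last_modified
        raise
```

The reader keeps the returned Last-Modified value next to the cached feed and passes it back on the next poll, so an unchanged feed costs a few hundred bytes instead of the whole file.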
Re:it's the PULL,stupid by antiher0 · 2004-07-20 09:22 · Score: 1

It seems to me that the technology already exists for true push. The functionality you mention (specifically the registering of listeners and the notification of changes or pushing of content) is part of XML web services. There's a standard out there, but I'm going to point you to some Microsoft resources out of spite. :)
Re:it's the PULL,stupid by sploxx · 2004-07-20 10:36 · Score: 1

As I asked above, isn't it not only missing "push" but also missing multicast support?
RSS is mostly news and for example /. should profit considerable from only one outgoing RSS-multicast-stream...
Re:it's the PULL,stupid by Bluelive · 2004-07-20 11:31 · Score: 1

Push is easy to make. Getting Push used is something that is very hard. For the same reason why SOAP seems more popular than RPC, we'll have to wait a while for it. People are just too happy piggybacking stuff onto HTTP.
Re:it's the PULL,stupid by Eythian · 2004-07-20 17:23 · Score: 1

Or have the clients use the jabber network. The client program logs on using a special resource, registers itself with a bot, and when there is data, the bot sends it to that resource if it is online.

(In jabber, a resource is a marker to separate the same user being logged on multiple times, e.g. I have Uni and Home as resources to my jabber ID, depending on where I am, so people can send me a message to reach me specifically at home if they like)

it's not stupid by Tumbleweed · 2004-07-20 07:40 · Score: 1

It's 'simple,' stupid. :)

Proposed Solution by Dominatus · 2004-07-20 07:41 · Score: 2, Interesting

Here's a solution: Have the RSS readers grab data every hour or half hour starting from when they are started up, not on the hour. This would of course distribute the "attacks" on the server.

As a self-appointed representative of RSS, ... by burgburgburg · 2004-07-20 07:41 · Score: 1

I'd like to dispute the characterization of my client as stupid.

I'd really, really like to.

Obviously, I can't, but boy would I like to.

Stupid RSS.

Poisson distribution by Anonymous Coward · 2004-07-20 07:42 · Score: 1, Insightful

We use poisson distribution to even out the load our scripts generate.

Re:Poisson distribution by JeffWhitledge · 2004-07-20 09:42 · Score: 0

We use ball-bearings. Everybody's using ball-bearings now-a-days.

Hey! It's all ball bearings nowadays. Now you prepare that Fetzer valve with some 3-in-1 oil and some gauze pads, and I'm gonna need 'bout ten quarts of anti-freeze, preferably Prestone. No, no make that Quaker State.

--
These comments do express the opinions of my employers, and, personally, I think they're complete rubbish.

Server side and client side fixes. by Inoshiro · 2004-07-20 07:42 · Score: 1

In any commons, co-operation is key. I doubt most people will update their clients to work with HEAD or some sort of checksumming without reason, so the first obvious step is to block clients for a period. If a client retrieves information from a host, place a ban on all requests from said client until either the information changes, or a timeout expires.

On the client side, the software needs to be written to check for updates to the data before pulling the data. This will lessen the burden.

The other side of the problem is the fact that the clients default to asking for data at the top of the hour. As this scales up, even with checks to see if data has changed, you'll be seeing a synchronized rise in traffic which leads to a DDoS effect on systems. The fix is the same way we fixed message ids: the interval that clients check the data on should be seeded semi-random intervals such that no more than subset n of the total i clients are checking for new data or transferring new data at any given time. This is something else that can be mitigated by having smarter server-side data blocks until users update to smarter clients. Otherwise the servers risk being DDoSed by these legions of stupid clients ;)

--
--
Internet Explorer (n): Another bug -- that is, a feature that can't be turned off -- in Windows.

Random request time by sanpitch · 2004-07-20 07:42 · Score: 1

Newsreaders and users could both use random request times, rather than defaulting to the top of the hour.

Re:Still haven't tried these newfangled RSS reader by Anonymous Coward · 2004-07-20 07:43 · Score: 0

Off topic indeed, this clearly should have been an Ask Slashdot.

random check intervals? by Hunterdvs · 2004-07-20 07:45 · Score: 2, Insightful

Why not have RSS readers that check on startup, then check again at user-specified intervals, after a random amount of time has passed?
user starts program at 3.15 and it checks rss feed.
user sets check interval to 1 hour.
rand()%60 minutes later (let's say 37) it checks feed
every hour after that it checks the feed.
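
Roughly, the schedule above could be sketched like this. The function name and parameters are invented for illustration, and it uses randrange in place of the rand()%60 in the post.

```python
import random


def check_offsets_minutes(interval_minutes=60, checks=5, rng=None):
    """Minutes after startup at which the reader polls the feed:
    once at startup, once after a random 0-59 minute gap, then
    every interval_minutes after that."""
    rng = rng or random.Random()
    offsets = [0]                       # check on startup
    offsets.append(rng.randrange(60))   # the rand()%60 step
    while len(offsets) < checks:
        offsets.append(offsets[-1] + interval_minutes)
    return offsets[:checks]
```

With the numbers in the example (started at 3:15, random gap of 37), the checks would fall at 3:15, 3:52, 4:52, 5:52 and so on, so the reader never settles on the top of the hour unless the dice happen to land there.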

simplistic sure, but isn't rss in general?

on an aside, any of you (few) non-programmers interested in creating rss feeds, i put out some software that facilitates it.
hunterdavis.com/ssrss.html

I thought it should be called by Anonymous Coward · 2004-07-20 07:45 · Score: 0

Arse-feed

Push, not pull! by mcrbids · 2004-07-20 07:46 · Score: 4, Interesting

The basic problem with RSS is that it's a "pull" method - RSS clients have to make periodic requests "just to see". Also, there's no effective way to mirror content.

That's just plain retarded.

What they *should* do...

1) Content should be pushed from the source, so only *necessary* traffic is generated. It should be encrypted with a certificate so that clients can be sure they're getting content from the "right" server.

2) Any RSS client should also be able to act as a server, NTP style. Because of the certificate used in #1, this could be done easily while still ensuring that the content came from the "real" source.

3) Subscription to the RSS feed could be done on a "hand-off" basis. In other words, a client makes a request to be added to the update pool on the root RSS server. It either accepts the request, or redirects the client to one of its already set-up clients. Whereupon the process starts all over again. The client requests subscription to the service, and the request is either accepted or deferred. Wash, rinse, repeat until the subscription is accepted.

The result of this would be a system that could scale to just about any size, easily.
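
To make the hand-off in #3 a bit more concrete, here is a toy in-memory sketch. Everything in it is hypothetical (the FeedNode class, the capacity limit, the round-robin choice of whom to redirect to); it ignores the network, certificates, and peers that go offline.

```python
class FeedNode:
    """A feed source or client that can relay to `capacity` others."""

    def __init__(self, name, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.name = name
        self.capacity = capacity
        self.subscribers = []
        self._next = 0  # round-robin pointer for redirects

    def request(self, client):
        """Return (True, self) if accepted, else (False, node to try)."""
        if len(self.subscribers) < self.capacity:
            self.subscribers.append(client)
            return True, self
        target = self.subscribers[self._next % len(self.subscribers)]
        self._next += 1
        return False, target


def subscribe(root, client):
    """Follow redirects from the root until some node accepts.
    Returns (accepting node, number of redirects followed)."""
    node, hops = root, 0
    while True:
        accepted, node = node.request(client)
        if accepted:
            return node, hops
        hops += 1
```

Each node only ever pushes to a handful of direct subscribers, so the root's load stays flat while the tree underneath it grows.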

Anybody want to write it? (Unfortunately, my time is TAPPED!)

--
I have no problem with your religion until you decide it's reason to deprive others of the truth.

Re:Push, not pull! by stratjakt · 2004-07-20 07:58 · Score: 3, Interesting

Too many firewalls in today's world for "push" anything to work.

Too many upstream bandwidth restrictions, especially on home connections. Last thing people want is getting AUPped because they're mirroring slashdot headlines.

My solution? Multicast IPs. Multicast IPs solve every problem that's ever been encountered by mankind. Join Multicast, listen till you've heard all the headlines (which repeat ad nauseum), move on with life. Heck, keep listening if ya want. All we have to do is make it work.

Frankly, who said you have to let everyone in the world on your RSS feed? If your server can't handle X concurrent RSS requests, it's hardly the protocol's "fault", IMO.

--
I don't need no instructions to know how to rock!!!!
Re:Push, not pull! by ikegami · 2004-07-20 07:59 · Score: 1

Pushed contents sounds like the obvious solution, but it's hard to push content to a client behind NAT (home "routers") or a firewall.
Re:Push, not pull! by stratjakt · 2004-07-20 08:02 · Score: 2, Insightful

Doubly so if I want RSS content on multiple machines behind NAT. One person gets slashdot headlines, another CNN or whatever. Simple port forwarding won't solve that problem.

"Push" is dead. "Push" was stillborn. The very climate w.r.t internet security is not disposed to "hey lets let remote servers push stuff into our network!"

--
I don't need no instructions to know how to rock!!!!
Re:Push, not pull! by Kentamanos · 2004-07-20 08:14 · Score: 1

You are correct.

Multicast was designed for this type of thing (delivering the same content to LOTS of people without needing LOTS of bandwidth). It has a lot of the push model features without the headache.
Re:Push, not pull! by dgp · 2004-07-20 08:31 · Score: 1

NAT is not a problem. The client connects to the RSS server and holds a TCP connection open. Then updates are "pushed" at will.
Re:Push, not pull! by guard952 · 2004-07-20 08:32 · Score: 1

How about all feeds are encrypted with the GPG key of the original server, and passed along as binaries. Clients would only need to read the public GPG key from the original server once to check that what they were reading was an original source.
Re:Push, not pull! by rfsayre · 2004-07-20 08:33 · Score: 1

The basic problem with RSS is that it's a "pull" method

That may be true, but the basic problem with Infoworld's server is that it returns 200 for every single fetch of a feed. Using conditional GETs and Cache-Control headers would be a big help for them.

Robert Sayre
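
A minimal sketch of the conditional GET handling mentioned above, written as a server-side decision function. The `respond` name, its arguments, and the 30-minute `max-age` are assumptions made for illustration; this is not InfoWorld's code, only the general `If-None-Match` / `If-Modified-Since` pattern from HTTP.

```python
from datetime import datetime, timezone
from email.utils import formatdate, parsedate_to_datetime


def respond(request_headers, feed_body, etag, last_modified):
    """Return (status, headers, body) for one feed request.

    request_headers: dict of the client's headers (hypothetical shape).
    etag / last_modified: validators for the current version of the feed.
    """
    headers = {
        "ETag": etag,
        "Last-Modified": formatdate(last_modified.timestamp(), usegmt=True),
        "Cache-Control": "max-age=1800",  # assumed 30-minute freshness
    }
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # When present, If-None-Match takes precedence over If-Modified-Since.
        if if_none_match == etag:
            return 304, headers, b""
        return 200, headers, feed_body
    since = request_headers.get("If-Modified-Since")
    if since is not None:
        try:
            since_dt = parsedate_to_datetime(since)
            if since_dt.tzinfo is None:
                since_dt = since_dt.replace(tzinfo=timezone.utc)
        except (TypeError, ValueError):
            since_dt = None  # unparseable date: fall through to a full reply
        if since_dt is not None and last_modified.replace(microsecond=0) <= since_dt:
            return 304, headers, b""
    return 200, headers, feed_body


# Example validators for a feed last changed on the day of this thread.
example_modified = datetime(2004, 7, 20, 8, 0, 0, tzinfo=timezone.utc)
```

A client that echoes back the `ETag` or `Last-Modified` it received gets an empty 304 instead of the whole feed, which is the saving the comment is pointing at.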
Re:Push, not pull! by dgp · 2004-07-20 08:33 · Score: 1

push doesn't have to mean the server opens a connection to the client.

push can mean the client opens a connection to the server, like any other HTTP transaction, and the client leaves the connection open. the server 'pushes' data whenever it wants over this open TCP connection.
Re:Push, not pull! by laird · 2004-07-20 08:37 · Score: 3, Informative

The ICE syndication protocol has solved this. See http://www.icestandard.org.

The short version is that ICE is far more bandwidth efficient than RSS because:
- the syndicator and subscriber can negotiate whether to push or pull the content. So if the network allows for true push, the syndicator can push the updates, which is most efficient. This eliminates all of the "check every hour" that crushes RSS syndicators. And while many home users are behind NAT, web sites aren't, and web sites generate tons of syndication traffic that could be handled way more efficiently by ICE. Push means that there are many fewer updates transmitted, and that the updates that are sent are more timely.
- ICE supports incremental updates, so the syndicator can send only the new or changed information. This means that the updates that are transmitted are far more efficient. For example, rather than responding to 99% of updates with "here are the same ten stories I sent you last time" you can reply with a tiny "no new stories" message.
- ICE also has a scheduling mechanism, so you can tell a subscriber exactly how often you update (e.g. hourly, 9-5, Monday-Friday). This means that even for polling, you're not wasting resources being polled all night. This saves tons of bandwidth for people doing pull updates.
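
To put a rough number on the scheduling point in the list above, here is a small sketch that counts polls in one week with and without an advertised update window. It does not use ICE's actual message format; `in_update_window` and `polls_in_week` are hypothetical helpers, and the 9-5, Monday-Friday window is taken from the example in the post.

```python
from datetime import datetime, timedelta


def in_update_window(t, start_hour=9, end_hour=17):
    """True during the publisher's advertised hours: 9-5, Monday-Friday."""
    return t.weekday() < 5 and start_hour <= t.hour < end_hour


def polls_in_week(week_start, interval, allowed=lambda t: True):
    """Count the polls a subscriber makes in 7 days at a fixed interval."""
    count, t = 0, week_start
    end = week_start + timedelta(days=7)
    while t < end:
        if allowed(t):
            count += 1
        t += interval
    return count


monday = datetime(2004, 7, 19)  # a Monday
hourly = timedelta(hours=1)
print(polls_in_week(monday, hourly))                    # every hour, all week
print(polls_in_week(monday, hourly, in_update_window))  # only inside the schedule
```

Under these assumptions an hourly poller drops from 168 requests a week to 40, before incremental updates save anything further.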

--
Enable 3D printed prosthetics!
Re:Push, not pull! by Anonymous Coward · 2004-07-20 08:43 · Score: 0

You want to hold open potentially tens of thousands of TCP connections? I thought we were trying to avoid making this act like a DDoS.
Re:Push, not pull! by ivan256 · 2004-07-20 08:44 · Score: 1

Too many firewalls in todays world for "push" anything to work.

Which explains why e-mail doesn't work.
Re:Push, not pull! by stratjakt · 2004-07-20 08:45 · Score: 1

Since the root of the problem is that the server can't handle the load in the first place, I don't see how this would help whatsoever.

--
I don't need no instructions to know how to rock!!!!
Re:Push, not pull! by dgp · 2004-07-20 09:02 · Score: 1

It's the load of receiving a new TCP connection and processing its request for new data that is the problem.

Say you read boingboing.net's RSS feed over one work week and it has two new articles a day. the server load would be something like this:

Monday:
8am 1. TCP Session open
8am 2. Receive request for new articles, send any new articles.
10am 3. Send new article
12pm 4. Send new article

Tuesday-Thursday:
10am 1. Send new article
12pm 2. Send new article

Friday:
10am 1. Send new article
12pm 2. Send new article
5pm 3. Client closed TCP session

I know this is really vague but that's about 12 "steps" of work.

Now instead, for the same 2 new articles a day, have the RSS client check every 30 minutes.

Monday-Friday
12:30am 1. Client Opens TCP session
12:30am 2. Client asks for new articles, send any new articles
12:31am 3. Client closes TCP session

That's 3 x 24 half-hours in a day x 5 days or 360 "steps"!

using a publish-subscribe mechanism would greatly reduce the load on the server, even if the clients are still initiating the TCP connection.
Re:Push, not pull! by dgp · 2004-07-20 09:04 · Score: 1

whoops that should be 48 half hours in a day :). 3x48x5 = 720 steps.
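
The step counting in the two posts above can be written out as a quick sketch. The function names are made up for illustration, and "step" is the same loose unit the post uses (one open, one request, one send, or one close). Counting the persistent-connection schedule literally gives 13, which the post rounds to "about 12".

```python
def persistent_steps(articles_per_day, days):
    """One session open, one initial request, one close, plus one send per article."""
    return 3 + articles_per_day * days


def polling_steps(polls_per_day, days, steps_per_poll=3):
    """Each poll costs an open, a request/response, and a close."""
    return steps_per_poll * polls_per_day * days


held_open = persistent_steps(articles_per_day=2, days=5)
half_hourly = polling_steps(polls_per_day=48, days=5)
print(held_open, half_hourly, round(half_hourly / held_open))
```

This only counts server-side events per client; it says nothing about the cost of holding the connection open, which is the objection raised elsewhere in the thread.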
Re:Push, not pull! by .pentai. · 2004-07-20 09:08 · Score: 1

Email isn't a push system...

You connect to a mail server and pull your mail down.
At least, on client machines. Servers get the mail pushed to them, but guess what, people aren't viewing RSS data on their mail servers which have access to port 25 allowed through the firewall, it's the 15 people in the office behind the firewall that want to know what news is out there...

My email isn't pushed to my home laptop, it goes to my mail server, where my mail client pulls it from...
Re:Push, not pull! by ikegami · 2004-07-20 09:24 · Score: 1

Keeping an open connection to the server may work with 1000 clients, but the article is discussing scalability. Can servers handle a million simultaneously open connections?
Re:Push, not pull! by ivan256 · 2004-07-20 09:47 · Score: 1

Email isn't a push system...

Yes it is. The entire delivery of a message to you from the sender's perspective is push. If you choose to use a pull style system to read your mail, such as POP3, that is your choice, and not required in any way. Technically, nothing is stopping you from doing what thousands of users have been doing for decades and just opening the mailspool with a client that is running on the server, or having your client machine be the server.

Sure, a pull based approach to reading mail may be what makes firewalls a non-issue, but that's internal to your network and doesn't impose any costs on the content generator. In the context of this discussion, pull based e-mail would be like you logging into the server of everybody who may want to send you e-mail and checking to see if they have a message for you.
Re:Push, not pull! by krogoth · 2004-07-20 11:56 · Score: 1

Reading websites is also a pull method - if you have the same access patterns, RSS traffic is much more efficient. Frequent automated checks increase the load, but you don't really need those unless you're running a news ticker...

--

They that quote Benjamin Franklin on liberty and safety deserve neither.
Re:Push, not pull! by PW2 · 2004-07-20 12:13 · Score: 1

Maybe something like RSS could be extended to work like usenet where the headlines are cached locally and updated in a distributed manner.
Re:Push, not pull! by LionKimbro · 2004-07-20 14:56 · Score: 1

You might want to investigate DingDing, an Event System.

It supports XML-RPC, Query -based Subscription, and Transparent Messaging.

I'm in the middle of working v5 of it, which has a more consistent API, security and privacy features, and more "exhibition" as well- you can look at the server, and see who's subscribed, get a whole lot of other data as well. It also has an ULI port, so it's easy to query from Jabber, IRC, the command line, wherever you can communicate a line.

I seem to remember... by Misch · 2004-07-20 07:46 · Score: 4, Interesting

I seem to remember Windows scheduler being able to randomize scheduled event times within a 1 hour period. I think our RSS feeders need similar functions.

--

--You will rephrase your request for me to go to hell. Goto statements are not acceptable programming constructs

Re:I seem to remember... by burns210 · 2004-07-20 10:58 · Score: 1

why not have each client RSS reader be offset a random amount during install... 1 client might check at 13 minutes after the hour, while another might check at 57 minutes after the hour... randomly distributing the load.
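
A sketch of the per-install random offset suggested above, assuming the reader stores one offset at install time and then always checks at that second of the hour. `pick_install_offset` and `next_check` are hypothetical names, and times are plain epoch seconds to keep the arithmetic visible.

```python
import random


def pick_install_offset(rng=random):
    """Chosen once at install time and saved: a second within the hour, 0..3599."""
    return rng.randrange(3600)


def next_check(now_epoch, offset_seconds, interval=3600):
    """First time at or after now_epoch that falls on this install's offset."""
    base = now_epoch - (now_epoch % interval) + offset_seconds
    return base if base >= now_epoch else base + interval


# A client whose offset is 13 minutes past the hour (780 seconds).
print(next_check(7200, 780))  # at the top of an hour: waits until xx:13
print(next_check(8000, 780))  # just after xx:13: waits for the next hour's xx:13
```

With many installs the offsets spread roughly evenly over the hour, so no single minute sees more than a small share of the clients.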
Re:I seem to remember... by da_fiend · 2004-07-20 11:07 · Score: 1

Can this really be? A slashdot post suggesting Microsoft's implementations are better than those of open standard? And Rated more than a two as well... I think my connection is playing up again.

Won't help by Animats · 2004-07-20 07:47 · Score: 1, Interesting

Doesn't matter. If lots of people poll every hour, eventually the polls will synch up. There's a neat phase-locking effect. Van Jacobson analyzed this over twenty years ago.

We have way too much traffic from dumb P2P schemes today, considering the relatively small volume of new content being distributed.

Re:Won't help by AndroidCat · 2004-07-20 08:19 · Score: 2, Interesting

Maybe not--it depends on how the programs work. If they check a feed an hour from the start of the last check rather than an hour from when the last check ended, they won't drift.
However, the smart money is on Murphy. :)

--
One line blog. I hear that they're called Twitters now.
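
The distinction drawn above (scheduling from the start of the last check versus from its end) can be sketched with a toy timer. `check_times` is a hypothetical helper and the 5-second fetch time is an assumption; this does not model the phase-locking effect described in the parent post, only whether a single client's check time moves.

```python
def check_times(first, count, interval=3600, fetch_seconds=0, from_end=False):
    """Epoch seconds at which each check starts.

    from_end=False: next check = start of last check + interval (no drift).
    from_end=True:  next check = end of last check + interval (drifts by the
                    fetch time every cycle).
    """
    times, t = [], first
    for _ in range(count):
        times.append(t)
        t += interval + (fetch_seconds if from_end else 0)
    return times


start = 37 * 60  # xx:37, as in the reply below
fixed = check_times(start, 4, fetch_seconds=5)
drifting = check_times(start, 4, fetch_seconds=5, from_end=True)
print([t % 3600 for t in fixed])     # same second of the hour every time
print([t % 3600 for t in drifting])  # creeps later by the fetch time per cycle
```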
Re:Won't help by DaCool42 · 2004-07-20 08:22 · Score: 1

Why would that be? If I start at 1:37, and check every hour, It will always be xx:37. There's no drifting.

--

----
All of whose base are belong to the what-now?

Simple Solution by prichardson · 2004-07-20 07:47 · Score: 0, Redundant

Ask RSS reader writers to program into their programs a suggestion that the refresh not be on the hour. It would distribute the load more evenly. Getting people to actually do this is another problem. Sounds to me like a little bit of lazy coding (not checking modified times in header), and a little bit of ignorance (RSS isn't big enough to cause a problem.... so doing this on the hour is OK right?) have just snowballed.

--
Help I'm a rock.

Re:Still haven't tried these newfangled RSS reader by maharg · 2004-07-20 07:47 · Score: 3, Interesting

RSSOwl - http://rssowl.sourceforge.net/ is pretty good.

--

$ strings FTP.EXE | grep Copyright
@(#) Copyright (c) 1983 The Regents of the University of California.

Re:Still haven't tried these newfangled RSS reader by Dr.+Sp0ng · 2004-07-20 07:49 · Score: 4, Informative

On Windows I use RSS Bandit. Haven't found a non-sucky one for *nix, although I haven't looked all that hard. On OS X I use NetNewsWire, which while not great, does the job.

Re:Still haven't tried these newfangled RSS reader by Neil+Blender · 2004-07-20 07:50 · Score: 0

Still haven't tried these newfangled RSS readers.. (Score:3, Informative)
by Rezonant (775417) on 2004-07-20 12:35 (#9752026) ...so could someone recommend a couple of really good ones for Windows and *nix?

Ok. How is a question informative? Or is the fact that Rezonant has never used an RSS reader informative? Here's a +5 Informative for you: I haven't used an RSS reader either.

XML Bloat... by Anonymous Coward · 2004-07-20 07:51 · Score: 0

Of course, XML bloat has nothing to do with this.

Cache it by Sloppy · 2004-07-20 07:51 · Score: 1

IMHO, most ISPs (and their ISPs) should run caching proxies for http. (And of course, servers need to be less stupid about advising against caching content.) I just don't understand why they don't. Most of them already run a nameserver, mail server, maybe a usenet spool, etc. What's one more service?

It might not seem like it's worth much effort if a bunch of your customers are all downloading a few hundred bytes of headlines every hour, but it probably matters when they're all downloading movie trailers or OS updates. The caching of small stuff to keep from contributing to someone else's slashdotting is just a bonus.

Oh, and if an RSS is ten minutes old instead of "real time": The 1% of the population that actually cares, can just elect to not use the proxy.

It can even be a really cheap box, too, since it doesn't need to be reliable. Use cheap consumer-grade crap. If once per year a drive fails and you lose all your cache, so what?

--
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.

Re:Still haven't tried these newfangled RSS reader by coolguy81 · 2004-07-20 07:53 · Score: 1

RSS Bandit (Windows)
Syndigator (X)

There is also an RSS Thunderbird extension, Formzilla, but you have to be using a version of Thunderbird built with the xmlextras extension... it is all described in the post.

Its totally stupid. by torpor · 2004-07-20 07:53 · Score: 1

RSS is like a hi-jack of majordomo, by marketing dweebs.

E-mail - yes folks, good old fashioned SMTP, can be used for these things that RSS is supposedly 'good for'.

We do not need yet another protocol for transferring messages to each other. A properly defined X-Protocol addition, which allows for embedded XML in the Body text, would solve this distribution problem entirely.

Mail scales well. Like it or not, but it does. It's a perfect model for RSS ...

--
; -- the corruption of government starts with its secrets. a truly free people keep no secrets. --

Oh, come on by aiken_d · 2004-07-20 07:54 · Score: 5, Interesting

My guess is that InfoWorld is dynamically generating the RSS for each request. A simple host-side cache of the generated XML, so hits just talk to the HTTP server and not the app server, would probably make this a non-issue.

Or are they *really* getting more RSS hits than image requests? If -- somehow -- that's the case, spend $500/mo on Akamai or Speedera and point RSS stuff there, and give the CDN a reasonable timeout (30 minutes or something). That guarantees you no more than about 500 hits per timeout period, or maybe one every 10 seconds. Surely the app server can handle that.

Then again, what do I know? I only worked there for five years, including two on infoworld.com. It's been a few years, but unless things have changed dramatically, that is one messed up IT organization.

Cheers
-b

--
If I wanted a sig I would have filled in that stupid box.
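
A sketch of the host-side cache guessed at in the comment above: regenerate the feed at most once per timeout and serve every other hit from memory. `CachedFeed` and the `generate` callback are hypothetical, and the 30-minute TTL is borrowed from the CDN timeout the comment suggests. It is single-threaded for brevity; a real server would need locking around regeneration.

```python
import time


class CachedFeed:
    """Serve a generated feed from memory until it is ttl_seconds old."""

    def __init__(self, generate, ttl_seconds=1800, clock=time.monotonic):
        self._generate = generate  # expensive call into the app server
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the cache can be tested
        self._body = None
        self._expires = 0.0

    def get(self):
        now = self._clock()
        if self._body is None or now >= self._expires:
            self._body = self._generate()
            self._expires = now + self._ttl
        return self._body


if __name__ == "__main__":
    feed = CachedFeed(lambda: b"<rss version='2.0'></rss>")
    for _ in range(3):
        feed.get()  # only the first call regenerates
```

However many readers arrive at the top of the hour, the app server is asked for the XML once per TTL; whether that was InfoWorld's bottleneck is disputed in the replies.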

Re:Oh, come on by Anonymous Coward · 2004-07-20 08:38 · Score: 0

From the description of the problem, the primary issue is that the vast majority of the RSS requests are coming in at the top of the hour. If we assume, say, a 10 second window, that is a 360-fold concentration of requests. Or if it is a half-minute window, that would still be a 120-fold concentration. If the "average" load is just 1% of excess server capacity, then the peak load would obviously overload the server.

It would be a waste to blow $500 a month that you are only going to use for 12 minutes a day (assuming a 30 second peak load window at the top of each hour). Like he is saying, the format seems to be pretty flawed in the scalability department.
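
The figures in the two paragraphs above follow from simple ratios; a short sketch, with made-up function names, for anyone who wants to try other window sizes:

```python
def concentration(window_seconds, period_seconds=3600):
    """How many times denser the load is if a whole period's requests
    arrive inside one window."""
    return period_seconds / window_seconds


def busy_minutes_per_day(window_seconds, peaks_per_day=24):
    """Total time per day spent inside the peak windows, in minutes."""
    return window_seconds * peaks_per_day / 60


print(concentration(10))         # 10-second window
print(concentration(30))         # half-minute window
print(busy_minutes_per_day(30))  # minutes per day at peak
# An average load of 1% of capacity, concentrated 120-fold, exceeds 100%.
print(0.01 * concentration(30) > 1.0)
```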

A simple system would be for the RSS server to send a timestamp with each feed, which is returned with the next request. The server could quickly check the timestamp against the most recent update, and even just return any entries that occurred since that time.
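
A sketch of the timestamp scheme proposed above, assuming the server keeps entries as `(timestamp, title)` pairs sorted oldest first. `entries_since` is a hypothetical helper, not part of any RSS specification; HTTP's `If-Modified-Since` covers the "nothing changed" half of the same idea.

```python
def entries_since(entries, client_timestamp):
    """Return (latest_timestamp, new_entries) for one request.

    entries: list of (timestamp, title), oldest first.
    client_timestamp: the value the client got last time, or None on first fetch.
    """
    latest = entries[-1][0] if entries else 0
    if client_timestamp is not None and client_timestamp >= latest:
        return latest, []  # cheap path: nothing new, no feed body needed
    fresh = [e for e in entries
             if client_timestamp is None or e[0] > client_timestamp]
    return latest, fresh


feed = [(100, "first story"), (200, "second story"), (300, "third story")]
print(entries_since(feed, None))  # first fetch: everything
print(entries_since(feed, 200))   # returning client: only the newest entry
print(entries_since(feed, 300))   # up to date: empty list
```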
Re:Oh, come on by krails · 2004-07-20 08:54 · Score: 1

Yep, things really have changed that dramatically since you were here. =)

These are static pages being generated, and the issue is just that the majority of RSS clients are too dumb to allow randomly timed checks. Selecting "once an hour" means at xx:00. Selecting "once every 4 hours" means at 12:00, 4:00, etc.

The guys designing these programs really should think about how the polling works. Do a check on startup, then every hour, half hour, etc after that. Probably 90% of our RSS traffic comes in during the first minute of every hour.

---
http://kevin.railsback.com/blog/
Re:Oh, come on by Mitchell+Mebane · 2004-07-20 09:10 · Score: 2, Informative

Or maybe something like this.

--

The roots of education are bitter, but the fruit is sweet.
--Aristotle
Re:Oh, come on by smitty45 · 2004-07-20 09:40 · Score: 1

"I only worked there for five years, including two on infoworld.com."

So did I, and I helped Chad and other folks redesign their front-facing stuff, including the migration from the Solaris/NetscapeServer-based sucky CMS-published content to Apache on Linux.

Unless you worked for Chad, I would guess that you would not at all recognize the entire IT organization, forget about the infrastructure, if you were to visit.

Are you implying that you're armchair critiquing InfoWorld's RSS issues in 5 lines of a Slashdot comment ?
Re:Oh, come on by Bill+Kendrick · 2004-07-20 10:02 · Score: 1

spend $500/mo on Akamai or Speedera

Exactly. First thing to pop into my head was distributing the content via Akamai. :)

-bill!
Re: Oh, come on by Nevyn · 2004-07-20 10:49 · Score: 1

So what are you complaining about ... the fix seems obvious to me. Just bandwidth limit all RSS requests connects between x:00:00 and x:01:00 to 1KB/s or something. Or just block those IPs for an hour. This isn't html, if I happen to open my RSS reader during that window one day ... so what. I'll get it the next, or later that day if I open it again.

All the retarded RSS readers get what they deserve, sucks to be them or their users ... the coders will work out how to stop being so retarded, or their users will work out how to find an RSS reader written by someone who has.

I'd also bet that you'd only need to have this policy for a while, and then the problem will stop (or so be much smaller you don't care anyway).

--
ustr: Managed string API with ave. 44% overhead over strdup(), for 0-20B
Re: Oh, come on by smitty45 · 2004-07-20 11:50 · Score: 1

Here's where an alarm should go off in your head:

when you think that you're solving a technical problem in a couple lines in a Slashdot comment, that declares the problem to be "simple" or "obvious", that the CTO of InfoWorld and his team can't solve right away.

Let's have RSS torrents -- by Anonymous Coward · 2004-07-20 07:54 · Score: 0

That way I can snag the story from someone who just downloaded it. :)

How about this by humpTdance · 2004-07-20 07:54 · Score: 1

Bit Torrent + RSS = Problem Solved

Re:Still haven't tried these newfangled RSS reader by Eslyjah · 2004-07-20 07:54 · Score: 2, Informative

If you're using NetNewsWire on OS X, try the Atom Beta, which, I'm sure it will come as no shock to you, adds support for Atom feeds.

Re:Still haven't tried these newfangled RSS reader by cmdr_beeftaco · 2004-07-20 07:54 · Score: 0, Troll

www.google.com
This amazing site gets a number of news feeds but it also empowers you to find your own damn reader.

It just ain't broadcast.. by wfberg · 2004-07-20 07:54 · Score: 4, Interesting

Complaining about people connecting to your RSS feeds "impolitely" is missing the mark a bit, I think. Even RSS readers that *do* check when the file was last changed, still download the entire feed when so much as a single character has changed.

There used to be a system where you could pull a list of recently posted articles off of a server that your ISP had installed locally, and only get the newest headers, and then decide which article bodies to retrieve.. The articles could even contain rich content, like HTML and binary files. And to top it off, articles posted by some-one across the globe were transmitted from ISP to ISP, spreading over the world like an expanding mesh.

They called this.. USENET..

I realize that RSS is "teh hotness" and Usenet is "old and busted", and that "push is dead" etc. But for Pete's sake, don't send a unicast protocol to do a multicast (even if it is at the application layer) protocol's job!

It would of course be great if there was a "cache" hierarchy on usenet. Newsgroups could be styled after content providers' URLs (e.g. cache.com.cnn, cache.com.livejournal.somegoth) and you could just subscribe to crap that way. There's nothing magical about what RSS readers do that the underlying stuff has to be all RSS-y and HTTP-y..

For real push you could even send the RSS via SMTP, and you could use your ISP's outgoing mail server to multiply your bandwidth (i.e. BCC).

--
SCO employee? Check out the bounty

Re:It just ain't broadcast.. by fiftyvolts · 2004-07-20 08:28 · Score: 4, Insightful

You make some very good points. The old saying "When all you have is a hammer, everything looks like a nail" seems to ring true time and time again. These days it seems that everyone wants to use HTTP for everything and quite frankly it's not equipped to do that.

RSS over SMTP sounds pretty cool. Heck, just sending a list of subscribers an email of RSS and let their mail clients sort it out would be pretty nice.

Heh, my favorite posts are when someone suggests something that sounds totally novel and then someone else points out "Yeah! Like <insert old and underused technology>. It seems to do that damn well." The internet cannot forget its roots!

--
100% Crunchier
Re:It just ain't broadcast.. by MenTaLguY · 2004-07-20 09:47 · Score: 1

So, serious question ... NNTP servers are still out there and not totally dead... what's to stop an implementation of RSS-over-NNTP now, while we still have the infrastructure?

--

DNA just wants to be free...
Re:It just ain't broadcast.. by wfberg · 2004-07-20 10:31 · Score: 1

So, serious question ... NNTP servers are still out there and not totally dead... what's to stop an implementation of RSS-over-NNTP now, while we still have the infrastructure?

Nothing.

Although if you want the groups to be carried commonly, on that existing infrastructure, it might be a good idea to use a single alt.* group (in time, perhaps, a big-8 group).

In that case, you'd want to include a unique id (URL for example) of the RSS file in the subject for easy filtering (the From: header is another obvious place for the feed's ID, but you'd have to figure out a way to translate a URL to an e-mail address).

Using the "Supersedes:" header, you could just post existing RSS files, although it would be nicer if each article was posted as a single article (which would work a lot better with a complete hierarchy, so that the Subject header is freed up for the actual article's subject).

That's on the content provider's side. On the client side, the RSS reader needs to, well, support NNTP.

So what's to stop it? A well-thought-out specification (RFC stylee) and an implementation.

--
SCO employee? Check out the bounty
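
A rough sketch of what the content provider's side described above might build before handing an article to a news server. Nothing here is an agreed specification: `build_feed_article`, the `example.invalid` addresses, and the use of `application/rss+xml` as the content type are all assumptions. Only Python's standard `email` package is used, and actually posting the article is left out.

```python
from email.message import EmailMessage
from email.utils import make_msgid


def build_feed_article(rss_bytes, feed_url, newsgroup, supersedes=None):
    """Wrap a whole RSS file as one netnews article.

    The feed URL goes in Subject for easy filtering, as suggested above;
    supersedes is the Message-ID of the previous posting of the same feed.
    """
    msg = EmailMessage()
    msg["From"] = "feed-poster@example.invalid"  # placeholder address
    msg["Newsgroups"] = newsgroup
    msg["Subject"] = feed_url
    msg["Message-ID"] = make_msgid(domain="example.invalid")
    if supersedes:
        msg["Supersedes"] = supersedes
    # Binary payload with an explicit media type (base64-encoded by default).
    msg.set_content(rss_bytes, maintype="application", subtype="rss+xml")
    return msg


first = build_feed_article(b"<rss/>", "http://example.com/feed.rss", "alt.test")
update = build_feed_article(b"<rss/>", "http://example.com/feed.rss", "alt.test",
                            supersedes=str(first["Message-ID"]))
```

Each new posting replaces the previous one on servers that honor `Supersedes`, so a subscriber sees only the current copy of the feed.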
Re:It just ain't broadcast.. by AndroidCat · 2004-07-20 10:49 · Score: 1

Hey, and to jazz up Usenet, people can make it available as an RSS feed too! (Maybe with some P2P as well.) And then make it so that anyone can moderate, but you only see moderation from people you trust via FOAF RDF.
(I hope I'm kidding, but...)

--
One line blog. I hear that they're called Twitters now.
Re:It just ain't broadcast.. by AndroidCat · 2004-07-20 10:53 · Score: 1

And if Google could carry that newsgroup, distribution would be even easier. Let's see now, XML RSS over NNTP to Google and out via HTTP...

--
One line blog. I hear that they're called Twitters now.
Re:It just ain't broadcast.. by MenTaLguY · 2004-07-20 10:57 · Score: 1

So what's to stop it? A well-thought-out specification (RFC stylee) and an implementation.

Interested? I'm toying with the idea of doing just that, but I can't do it on my own, and you seem to have a much deeper knowledge of NNTP than I.

--

DNA just wants to be free...
Re:It just ain't broadcast.. by Bluelive · 2004-07-20 11:34 · Score: 1

IRC seems to me a better fit with its channels and server model, except for what to use to retrieve the headers that you'd miss when you're offline. SMTP and NNTP are both still pull technologies for the last mile, and much higher latency than IRC.
Re:It just ain't broadcast.. by wfberg · 2004-07-20 12:12 · Score: 1

IRC seems to me a better fit with its channels and server model, except for what to use to retrieve the headers that you'd miss when you're offline. SMTP and NNTP are both still pull technologies for the last mile, and much higher latency than IRC.

But IRC isn't really suited to the kinds of volumes that RSS feeds would impose on it -- at least not on a per-channel basis. IRCopers are also rather protective of people raping their bandwidth by using IRC as a transport layer. Usenet on the other hand was made for newsgroups.

The last mile of NNTP doesn't need to be pull; you could run servers carrying only RSS groups that accept IHAVE commands from RSS clients. But if you're running edge servers yourself rather than using the ISP's news servers, then another platform (like jabber or some of that jxta stuff) might be a better idea.

There's something to the latency issue, though. On the other hand, the alternative is using a regular RSS reader that checks at hourly intervals; whereas most ISPs won't mind if you check a newsgroup for new articles every few seconds.

The line between push/pull is rather thin, after all. If a server takes its time sending stuff to you, you might not see the hottest news as it breaks (e.g. mailing lists that sometimes take hours between mailing subscriber 1 and mailing subscriber 12000); on the other hand, if you hit refresh often enough.. Some versions of push are really pull; like those blackberry devices. The mail is pushed to you, but only after it's been pulled off of pop3 at scheduled intervals. If I use pine/mutt, aren't I pulling the mail from my spoolfile, where it has been pushed using SMTP?

IRC is a neat protocol (though sometimes poorly implemented) but usenet is just made for syndication. You don't even need to format stuff in RSS, just post MIME HTML (in special purpose groups!), no problemo. Cool or what?

Of course, there's no reason whatsoever RSS couldn't be embedded in IRC, DNS, or whatever..

--
SCO employee? Check out the bounty
Re:It just ain't broadcast.. by wfberg · 2004-07-20 12:38 · Score: 1

So what's to stop it? A well-thought-out specification (RFC stylee) and an implementation.

Interested? I'm toying with the idea of doing just that, but I can't do it on my own, and you seem to have a much deeper knowledge of NNTP than I.

I'm not much of a blogger, I have no idea which blogging software it would be most important to get a post2usenet plugin for.. There are NNTP functions in PHP though, that's always good.

There's already a project that pulls RSS feeds from the web and then posts individual descriptions+links to usenet groups, at http://www.methodize.org/nntprss/.

The scope of this thing is rather grand
1) design the protocol (making decisions like, do we want to post RSS or HTML? full-body HTML or just the blog-entry? one article per blog entry, or Supersede old posts with the most recent blog-entries? use a single newsgroup, or use a hierarchy? what about security/spoofing?)
2) get bloggers to publish to NNTP
3) get clients to use NNTP
4) get ISPs to carry the newsgroup (easier with an alt.* group, but hard to prevent spoofing) or the hierarchy (usenet operators dislike new hierarchies); or set up an entirely new network (like IRC people do all the time).

It would be pivotal if a popular blogging application (movabletype, pivot, sunlog) or services (blogger? google groups?) would add a post2usenet feature. Clients would hopefully soon follow, and from there ISPs would be more willing to carry groups.

Unfortunately I don't have the political clout (or time) to get involved with all those blogging tools people...

--
SCO employee? Check out the bounty
Re:It just ain't broadcast.. by mbauser2 · 2004-07-20 16:45 · Score: 2, Insightful

1) The RSS-developer community has a completely irrational fear of MIME. They never completed the registration of the application/rss+xml media type, and they've shown no interest in doing so. Winer and the gang want to use text/xml for everything, which makes it harder to separate RSS out of a newsgroup (or anything else; more on that below).

2) The RSS developer community can't picture themselves using anything except HTTP. I've tried mentioning other protocols to them; they don't respond.

3) NOBODY MAKES RSS READERS THAT WORK IN A PIPE!. Seriously. Is it really so hard to envisage somebody piping an RSS file in from the command line? Apparently, it is for the people who write RSS readers: they make you cut-and-paste URIs into a form before you can do anything with an RSS file.
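
For what point 3 above is asking for, a minimal sketch of a reader that works as a filter: RSS on stdin, one tab-separated headline and link per line on stdout. `headlines` and the script name are hypothetical, and it only handles un-namespaced RSS 0.9x/2.0 `<item>` elements (RSS 1.0/RDF feeds would need namespace handling).

```python
import sys
import xml.etree.ElementTree as ET


def headlines(xml_data):
    """Return [(title, link), ...] for every <item> in an RSS document."""
    root = ET.fromstring(xml_data)
    found = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        found.append((title, link))
    return found


def main():
    if sys.stdin.isatty():
        print("usage: some-command | python rss_titles.py", file=sys.stderr)
        return
    data = sys.stdin.buffer.read()  # bytes, so the XML encoding declaration is honored
    if not data.strip():
        return
    for title, link in headlines(data):
        print(f"{title}\t{link}")


if __name__ == "__main__":
    main()
```

Anything that can write an RSS file to stdout (a newsreader's pipe command, a fetch tool, `cat`) can feed it, and the output is easy to pass on to `grep`, `sort`, or `mail`.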

Seriously, RSS over netnews wouldn't really require any new Big Ideas, just a smart re-application of the Old Ideas:

1) Post RSS files to Usenet with proper "Content-Type" and "Supersedes" headers to an appropriate newsgroup. (Maybe some new RSS-friendly newsgroups; maybe the old ones. We can figure that out later. The important thing is: This wouldn't be any more difficult than posting a FAQ is.)

2) Use newsgroup-capable RSS-readers to poll the newsgroups, and/or use regular newsreaders to pipe RSS files to dedicated RSS-readers.

3) Profit! Or at least, Fewer Accidental DDoS attacks!

I could do Step 1 now, without significant effort. (It's no more difficult than posting a newsgroup FAQ.) Step 2 requires a real programmer, which I am not.

(In fact, you know what would be great? A combined newsgroup/RSS reader. It makes more sense than all those RSS readers patterned after e-mail programs. But I digress.)

Maybe I'm getting cynical in my old age, but I'm beginning to think this is the UNIX/Windows divide all over again. A lot of the RSS developer community comes from a Windows/Mac developer background, so they just don't see the potential of the toolbox approach, even while they're rambling about the extensibility of XML and its "user-centric" design.

Take, for example, the refusal to get a real media type for RSS: A unique MIME type would help web browsers, too, because browsers can use media types to decide which plug-in gets which file. Instead of making a user cut-and-paste URIs from his browser to his reader (which is a dreadfully Windows-ish way of doing it), the user could just click on the RSS link and the web browser could launch the RSS reader by itself (which presumably would do something smart, like ask the user if they want to subscribe to a new feed). Just like all those other plug-ins and non-HTML formats on the Web!

Makes sense, yes? But it doesn't register with anybody creating RSS readers. Some programmers still advocate the cutting-and-pasting of URIs. Some programmers advocated auto-discovery by reading HTML "link" elements. Some advocate complicated cloud/stream schemes. But nobody wants to talk about re-using basic, functional tools that we've had in the toolbox for 10, 15, or 25 years.

Some days, it's like the "RSS developers" are from another planet. And I want to send them all back.

--
Proud to be / Smiley-free / Since Nineteen / Ninety-Three
Re:It just ain't broadcast.. by Barto · 2004-07-20 18:34 · Score: 1

WebDAV's XML properties could be used to alleviate the problems with vanilla HTTP. PROPFIND the modification date on the RSS feed, if it is later than the local copy download it again.
Re:It just ain't broadcast.. by SQL+Error · 2004-07-20 19:59 · Score: 1

Damn, my mod points have expired. Someone mod mbauser2 up.

Yes, NNTP is the absolute friggin' obvious solution to the problem. Tested, scalable, works, already in place, widely supported. Grrr. Grrr, I say!

rss+bittorent? by DougMelvin · 2004-07-20 07:54 · Score: 1, Redundant

How about combining RSS with Bittorent? The RSS feeder would act more as a BT tracker.. Simply point the client to the nearest dood with a copy of the feed.....

--
Reality is in the mind of the beholder - me 1996

Re:rss+bittorent? by slungsolow · 2004-07-20 08:16 · Score: 1

I don't see how distributed bandwidth would help with what is essentially a small marked up text file. If someone is serious about their RSS feed they can just dedicate a box for it. Perhaps a better way to think of the RSS feed would be to serve it up "buffet" style. All you can eat for a dollar.. or in this case.. all you can download for a set amount of time (cycles whatever).. it's a horrible idea.

3 Good Ideas by cameronc · 2004-07-20 07:54 · Score: 1

Readers should obey HTTP Cache and Expires Directives
Readers should use Head requests to determine change
Future RSS formats should specify update frequency
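
A sketch of the first idea in the list above from the reader's side: derive the next poll time from the server's `Cache-Control` or `Expires` header instead of a fixed clock time. `seconds_until_next_poll`, the one-hour default, and the five-minute floor are assumptions made for illustration.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def seconds_until_next_poll(headers, now, default=3600, minimum=300):
    """How long a polite reader should wait before fetching again.

    headers: dict of response headers from the last fetch.
    now: timezone-aware datetime of that fetch.
    """
    match = re.search(r"\bmax-age\s*=\s*(\d+)",
                      headers.get("Cache-Control", ""), re.IGNORECASE)
    if match:
        return max(minimum, int(match.group(1)))
    expires = headers.get("Expires")
    if expires:
        try:
            delta = (parsedate_to_datetime(expires) - now).total_seconds()
            return max(minimum, int(delta))
        except (TypeError, ValueError):
            pass  # malformed or timezone-less date: fall back to the default
    return default


fetched_at = datetime(2004, 7, 20, 8, 0, tzinfo=timezone.utc)
print(seconds_until_next_poll({"Cache-Control": "public, max-age=1800"}, fetched_at))
print(seconds_until_next_poll({"Expires": "Tue, 20 Jul 2004 09:00:00 GMT"}, fetched_at))
```

The floor keeps a misconfigured `max-age=0` from turning the reader into the kind of client the article complains about.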

Re:3 Good Ideas by Anonymous Coward · 2004-07-20 08:04 · Score: 0

RSS feeds like /. should also respect such ideas...

Last I heard slashdot doesn't use rss cache... never looked at the head so I can't comment on that...

But... why is it that if you refresh a slashdot rss feed three times in two hours they think you're DDoSing them, but you could refresh their home page every other second for the next 20 years and it's all peachy...

The problem is twofold... RSS feed-reader apps are really silly as that's not the intended purpose of rss... and then you have silly rss feed rules (like /.'s)

But these three good ideas are good ideas for BOTH website developers AND rss feed developers to follow.
Re:3 Good Ideas by cosmo7 · 2004-07-20 08:23 · Score: 1

Disclosure: I am the author of NewsYouCanUse, a menubar RSS reader for OS X. NewsYouCanUse downloads using a timer, not a clock.

One of the problems with following these sensible suggestions is that many feeds don't actually update their headers. Some feeds do all sorts of crazy things, such as mixing up their character encoding, using unreliable servers, occasionally serving an html page instead of xml, malforming tags, reusing urls and so on. Developers will only use header information when there is some assurance that the headers have been created properly, especially as any mistake in reading a feed appears to the user to be an application bug.
Re:3 Good Ideas by cameronc · 2004-07-20 11:55 · Score: 1

So it sounds like the problem isn't really with RSS, it's with bad web applications, bad web servers, and in some cases, bad reader software. Implementing RSS feeds for high traffic sites involves using an application server to generate the feed and a cache proxy to serve it. Proper HTTP cache headers are sent along with the response, and the number of requests which can be handled is only limited by the number of cache proxies sitting in front of the app. Good readers should be able to use the header info and behave properly, regardless of the effectiveness of the web server or web application. It's a question of good application development on both ends. Eventually both ends of the equation will catch up, but just because the web servers aren't serving proper info no-one should use that as an excuse to develop misbehaving readers...

Re:Still haven't tried these newfangled RSS reader by Anonymous Coward · 2004-07-20 07:56 · Score: 0, Funny

It's "informative by association." His post will attract answers. (Well, answers and people who bitch about moderation.)

Site proxy server by Anonymous Coward · 2004-07-20 07:56 · Score: 0

Have RSS readers use a local proxy that is willing to cache RSS data.
Treat this like an HTML problem.

bah. Multicast! by Anonymous Coward · 2004-07-20 07:57 · Score: 0

Demand your ISP support native multicast. As most ISPs are now owned by cable companies, and native multicast would enable efficient, scalable video distribution, don't bet on them being too receptive, but here again is another application for which native multicast would excel.

Re:Still haven't tried these newfangled RSS reader by harley_frog · 2004-07-20 07:57 · Score: 1

I use wTicker for my Windows computer and KNewsTicker for my Linux boxes. The latest version of wTicker won't run on my XP computer, but an older version does. It's still in beta and a little clunky, but the crawler takes up far less screen space than any other RSS reader I've tried.

--
It's all fun and games until someone loses the key to the handcuffs.

Too many problems. by blanks · 2004-07-20 07:58 · Score: 1

The main problem with most RSS feeds is that they update all information. Most of these run off a simple JavaScript that will run on a timer to get all the data again and again. A better solution would be to implement an XML RSS (or any language really) that uses a simple ID system for news items. When it's time to update the news feed, check for new IDs; don't retrieve existing data, only new data. This would cut down a large chunk of bandwidth. A better idea would be to implement some type of component that accesses their database (or web server through isapi etc etc) and updates the content on the external server that requires the RSS. This would again cut out a large level of data transfer, and requests that would normally slow the server down. Yes, this would need to be installed on the external web server, but if people need the news feeds, then you can force them to do it your way. This is similar to what I do with our system: we have 12,000 machines every few minutes accessing a database to send and receive new information, and we have very few problems with it.
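
The ID idea might look something like this on the server side. This is only a sketch: it assumes each news item carries a numeric id that increases over time, and items_since is a hypothetical helper, not part of any RSS spec or library.

```python
def items_since(items, last_seen_id):
    """Return only the items the client has not seen yet.

    items        -- list of dicts, each with an integer "id" (assumed to grow over time)
    last_seen_id -- highest id the client reported having already
    """
    return [item for item in items if item["id"] > last_seen_id]


feed = [
    {"id": 101, "title": "Old story"},
    {"id": 102, "title": "Newer story"},
    {"id": 103, "title": "Newest story"},
]

# A client that already has everything up to 101 only gets two items back.
delta = items_since(feed, 101)
```

The client would send its highest known ID with each request and get an empty list when nothing has changed.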

--
TruePunk | Games

Wrong target, but good solution by oGMo · 2004-07-20 07:59 · Score: 1

Reimplementing TCP using a database is excessive. Making a light connectionless protocol that does similar to what you described would be a lot simpler and not require reimplementing everyone's TCP stack.

Also, as much as I hate the fad of labelling everything P2P, having a P2P-ish network for this would help, too. The original server can just hand out MD5's, and clients propagate the actual text throughout the network.
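
A minimal sketch of the client side of that MD5 hand-out, assuming the origin server publishes a hex digest and the feed body arrives from some untrusted peer. The name verify_feed is invented here; a stronger hash from hashlib could be swapped in the same way.

```python
import hashlib


def verify_feed(content, expected_md5):
    """Check peer-supplied feed bytes against the digest from the origin server."""
    return hashlib.md5(content).hexdigest() == expected_md5.lower()
```

A client would discard the copy and try another peer (or the origin) when this returns False.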

Of course (and this relates to the P2P stuff), every newfangled toy these days is just a pathetic reimplementation of some original Internet protocol. Like, say, NNTP. Which does all of this already, and has for years. Ah well.

--

Don't think of it as a flame---it's more like an argument that does 3d6 fire damage

Re:Wrong target, but good solution by arrow · 2004-07-20 08:40 · Score: 1

Making a light connectionless protocol that does similar to what you described would be a lot simpler and not require reimplementing everyone's TCP stack.

That's a great idea! Let's call it UDP.

--
symetrix. We are building a religion, a limited edition.
Re:Wrong target, but good solution by Russ+Nelson · 2004-07-20 15:11 · Score: 1

What I suggest is not reimplementing everybody's TCP stack! Just the people who are publishing extremely popular RSS feeds. Mine is a much simpler solution than yours because it continues to rely on TCP, and doesn't create a new protocol.
-russ

--
Don't piss off The Angry Economist
Re:Wrong target, but good solution by Anonymous Coward · 2004-07-20 17:46 · Score: 0

Mine is a much simpler solution than yours because it continues to rely on TCP, and doesn't create a new protocol.

Can I touch you?
Re:Wrong target, but good solution by Russ+Nelson · 2004-07-20 18:15 · Score: 1

No, and I'll thank you to call me Sir Nelson. A little in advance of my knighthood, but practice wouldn't hurt.
-russ

--
Don't piss off The Angry Economist

Re:Idea use IRC by Anonymous Coward · 2004-07-20 07:59 · Score: 0

We need to have rss published out to be picked up by clients very much like IRC.
The RSS readers would connect up to the IRC network, and when new headlines are published into the channel they would be picked up. (Yes, they may have to be signed, but how hard is that? Not hard.)

This idea is very much a MOM (Message Oriented Middleware),
but there are few/no good open source MOM products that can work without needing Java.

You know what would be nice.. by ToadMan8 · 2004-07-20 07:59 · Score: 2, Interesting

/. is especially pissy with this but I want breaking news, not whatever is new each hour. So I hammer the shit out of it with my client (and get banned). I'd like to see a service where I download one client (that has front-ends in GNOME panel, the Windows tray, etc.) that the site (/., cspan, etc.) _pushes_ new updates to when I sign up. Those w/ dynamic IPs could, when they sign on, have their client automagically connect to a server that holds their unique user ID with their IP.

--
I haven't posted in so long, my sig is out of date.

Re:You know what would be nice.. by Anonymous Coward · 2004-07-20 08:42 · Score: 0

Yeah! And you could make it a screensaver or something, so that your news is on your screen! Oh wait, that was Pointcast.
Re:You know what would be nice.. by CerebusUS · 2004-07-20 14:00 · Score: 1

OMG you just just (re-)invented PointCast!

Hmmmm. by SatanicPuppy · 2004-07-20 08:01 · Score: 1

Well, sure, if you want it to the absolute second, but if you spread the requests across 5 minutes, say, or something similar, it would certainly help, and I doubt most people would complain.
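
Spreading the requests could be as simple as adding a random offset to each client's polling interval. A sketch, assuming a one-hour base interval and the 5-minute window mentioned above; jittered_delay is a made-up name.

```python
import random


def jittered_delay(base_seconds=3600, spread_seconds=300, rng=random):
    """Return the wait before the next poll: the base interval plus random jitter.

    rng can be any object with a uniform(a, b) method, which makes the
    function easy to test with a seeded random.Random instance.
    """
    return base_seconds + rng.uniform(0, spread_seconds)
```

Each client draws its own offset, so polls that would all land on the hour end up scattered over the following five minutes.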

--
ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.

RSS and Web TV: similarities by anynameleft · 2004-07-20 08:02 · Score: 1

Okay, I had already posted, but there is something else I just thought about: Live streaming video/audio.

With both it and RSS (and Slashdot and the pages it links to), some kind of data is released by a server and because of the nature of the medium (being live) all clients want to get this data as soon as it becomes available. The result: DDoS.

Now I bet that not all of these thousands of people have a direct LAN connection to the server in question. Now isn't it stupid to send thousands of identical data packages, through the same connection, at the same time? The only thing it is, is a waste of money and resources.

Therefore, as I see it, there are two things that could really solve this problem:

Don't use a poll-based system, but a system in which a client registers with a server, and from then on the server initiates the transfers.

Have a possibility for one IP packet to reach a possibly huge amount of hosts.

For RSS, this can actually be implemented quite easily. You only need to cache the RSS feeds in HTTP proxies, and then you need a system with which the server can notify the clients that a new feed is available. And oh, clients would need to actually use these proxies for a change.

I leave open the task of designing a similar system for Web TV, I don't know enough about those protocols.

RSS+DNS by scorp1us · 2004-07-20 08:02 · Score: 1

Using the distributed DNS system, or a system like DNS, we can push RSS content down to local servers. You still have to go to the site for the actual content, but the headlines are distributed.

This would be an ideal solution, since most RSS feeds are a few K. There's room for a lot of RSS in 1 megabyte.

Of course, a caching proxy server would do the same thing.

--
Slashdot's rate-of-post filter: Preventing you from posting too many great ideas at once.

Re:RSS+DNS by Anonymous Coward · 2004-07-20 08:49 · Score: 0

Create a distributed system like DNS all you want, but I wish people would stop suggesting that the actual DNS system could be used for their pet project of the week.

Speedfeed? by GeorgeH · 2004-07-20 08:04 · Score: 1

"Speedfeed" is such a useful thing, it's unfortunate that it's ultimately just very stupid.

Yeah, it is stupid, which is why most of us just call it RSS.

--
Why can't I moderate something "Wrong" or at least "Grossly Misinformed"?

Re:Speedfeed? by Trejkaz · 2004-07-20 13:14 · Score: 1

The idiots who make up words like "Speedfeed" for this are the same idiots who make up words like "txt" for equally easy words like SMS.

--
Karma: It's all a bunch of tree-huggin' hippy crap!

RSS - PUBSUB by jamincollins · 2004-07-20 08:04 · Score: 1

All RSS feeds should look at implementing PUBSUB instead. RSS's major flaw is that it has to constantly poll sources to find updates. This is what creates the DOS effect. If they instead used a PUBSUB method (google if you'd like to know more) those that have subscribed would be notified of an update.
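
To show the shape of the idea, here is a toy in-process publish/subscribe hub. It is only a sketch: FeedHub and its methods are invented for illustration, and a real deployment would need a network transport between publisher and subscribers.

```python
from collections import defaultdict


class FeedHub:
    """Keeps a list of callbacks per feed and notifies them on publish."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, feed, callback):
        """Register callback to be called with every new item on feed."""
        self._subscribers[feed].append(callback)

    def publish(self, feed, item):
        """Push item to every subscriber of feed; return how many were notified."""
        callbacks = self._subscribers.get(feed, ())
        for callback in callbacks:
            callback(item)
        return len(callbacks)
```

The point is that work happens once per update rather than once per client per polling interval.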

RSS is like a DDoS attack on my brain by PCM2 · 2004-07-20 08:05 · Score: 5, Interesting

Am I the only one who finds it easier to get the information I want from the home pages of the sites I trust, rather than relying on an RSS feed? For one thing, in an RSS feed every story has the same priority ... stories keep coming in and I have no idea which ones are "bigger" than others. Sites like News.com, on the other hand, follow the newspaper's example of printing the headlines for the more important stories bigger. With RSS, it's just information overload, especially with the same stories duplicated at different sources, etc. Everyone seems really excited about RSS, but when I tried it I just couldn't figure out how to use it such that it would actually give me some real value vs. the resources I already have.

--
Breakfast served all day!

Re:RSS is like a DDoS attack on my brain by damiangerous · 2004-07-20 08:56 · Score: 2, Interesting

Nope. I just don't get RSS either. Every time there's a story about it I give another reader another shot, and every time I just end up thinking "how is this different than checking my bookmarks regularly?"
Re:RSS is like a DDoS attack on my brain by Anonymous Coward · 2004-07-20 14:09 · Score: 0

I read a whole lot of blogs. It's much easier to read them in RSS Owl, aggregating the different categories of blogs I read.
Re:RSS is like a DDoS attack on my brain by Anonymous Coward · 2004-07-20 22:40 · Score: 0

Well, its real value can only be seen by bored office workers.

Sort of like Slashdot, really.

Simple solutions by Sirrion · 2004-07-20 08:05 · Score: 1

People, if you are going to serve up a popular RSS feed, use a separate server (or servers). You can't control the clients so you have to be prepared to handle the worst case. Other than bandwidth, an RSS feed should never cause your site to stop handling requests for your site. Be prepared to put a caching appliance in front of your RSS feed if it's *really* popular (/.?)

That said, the client software should poll at intervals related to the start of the application and they should not retrieve the RSS unless it has changed since the last retrieve (hint: use HTTP HEAD). Developers should be shot if they are too lazy to implement these simple 'net friendly features.

Mod the parent up! by spookymonster · 2004-07-20 08:05 · Score: 1

I was just thinking the exact same thing. This seems like it's perfect for P2P. Query the trackers for a feed aged X minutes or less; if no matching feed is available, go to the original feed site and seed a new copy.

--
- Despite popular opinion, I am not perfect.

DNS Polling? by Effugas · 2004-07-20 08:06 · Score: 1

Hmmm. I'm neck-deep in DNS code anyway; is there any interest in a protocol that would encode update times -- probably not the updates themselves -- in DNS?

The concept is that every time you updated your blog, you'd do a Dynamic DNS push to a RSS name, say, rss.www.slashdot.org's TXT record, containing the Unix time in seconds of the last update (alternatively, and this is how I'd probably implement it in my custom server, lookups to rss.www.slashdot.org would cause a date-check on the entry). The TTL of the DNS entry could be increased to limit the update frequency of clients.

If this is cool (I'm sure some RSS dev's are trolling these comments), throw me an email or reply here. I'll do the server side if someone will integrate support for it into their client.

--Dan

Re:DNS Polling? by Effugas · 2004-07-20 08:09 · Score: 1

Actually, I'll make this even easier, and use the fact that 32 bit IP addresses fit 32 bit unix timestamps juuuuuuuust fine. So you'd do a gethostbyname and recast the response!
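
The recast can be sketched in Python with the standard struct and socket modules. The two function names are invented for illustration, and this only shows the encoding, not the DNS lookup itself.

```python
import socket
import struct


def timestamp_to_ipv4(ts):
    """Encode an unsigned 32-bit Unix timestamp as a dotted-quad address."""
    return socket.inet_ntoa(struct.pack("!I", ts))


def ipv4_to_timestamp(addr):
    """Decode a dotted-quad address back into the 32-bit timestamp."""
    return struct.unpack("!I", socket.inet_aton(addr))[0]
```

A client would resolve the name, pass the returned address to ipv4_to_timestamp, and fetch the feed only if the value is newer than its last fetch.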

--Dan

Hits by Anonymous Coward · 2004-07-20 08:06 · Score: 0

RSS is great, you can easily look over 20 news sources quickly, in a common format. Unfortunately people update feeds in 5 minute intervals, but honestly I would say I hit slashdot every 20 minutes with or without rss. People want information quickly, and this creates hits regardless. This growth is good.

If-Modified-Since and ETag headers by Kakurenbo+Shogun · 2004-07-20 08:06 · Score: 1

Better than doing a HEAD first to see if the feed has been updated is to use the If-Modified-Since and/or ETag headers. If the feed hasn't been updated, the server sends a very small response saying so (roughly the same size as the response to HEAD), and doesn't send the whole feed--that all happens in one request/response. Doing HEAD first, and then GET if the feed has been updated requires two requests and two responses any time the feed has been updated.
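
Roughly how the exchange works, sketched without any network code. The client echoes a stored ETag back in If-None-Match and a stored Last-Modified value in If-Modified-Since; the server answers 304 with an empty body when nothing changed. Both function names are made up, and the server side here compares ETags only.

```python
def conditional_headers(etag=None, last_modified=None):
    """Build request headers from validators saved after the previous fetch."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def respond(request_headers, current_etag, body):
    """Toy server logic: 304 and no body if the client's copy is current."""
    if request_headers.get("If-None-Match") == current_etag:
        return 304, b""
    return 200, body
```

On a 304 the reader keeps its cached copy; on a 200 it stores the new body together with the new validators.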

--
Convert RSS to HTML - integrate webfeeds into your website

Re:Over the years? How about over the weekend? by SuperRob · 2004-07-20 08:06 · Score: 1

That, and I don't grant the premise that "it's ultimately just very stupid." It's no stupider than a web browser or anything else when set on default settings. I think it could be argued that RSS actually reduces overall traffic ... not to mention that I don't think RSS has the sort of traction people are guessing it does, at least, not yet anyway.

Re:Still haven't tried these newfangled RSS reader by bbh · 2004-07-20 08:09 · Score: 2, Informative

I'm using Liferea version 0.5.1 under Linux right now. Compiles from source fine on Fedora Core 2 and has worked great for me so far.

bbh

Similar problems with other stuff? by i.r.id10t · 2004-07-20 08:09 · Score: 1

Wasn't there a /. article a while back about one of the ntp servers out there (some .edu in Washington IIRC) that was getting DDoS'd by a bunch of home-user grade DSL/Cable routers updating their clocks all the time? Isn't this basically the same problem?

--
Don't blame me, I voted for Kodos

RSS scalability by glinden · 2004-07-20 08:09 · Score: 1

There's a variety of ways to deal with this issue. The solution many seem to be suggesting is to randomize request times so that there aren't big spikes in traffic every hour at the hour. That's certainly a good idea. Clients should also respect the ttl (polling at the interval that is listed in the feed), support conditional GET, and handle 304 (not modified) responses to minimize the number of requests they make for the full feed.

But the primary solution will end up being caching. With the exception of personalized RSS feeds, RSS feeds easily can be cached. Web-based RSS readers like Bloglines and My Yahoo already only read the RSS feed once, cache it, and display it to multiple readers. But popular RSS feeds are also easily proxy cached just like web pages, reducing the load on the original source servers.
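
Respecting the ttl is not much code on the client. A sketch using the standard library XML parser, assuming an RSS 2.0 document whose channel may carry a ttl element in minutes; feed_ttl_minutes and the 60-minute fallback are choices made here for illustration.

```python
import xml.etree.ElementTree as ET


def feed_ttl_minutes(rss_xml, default=60):
    """Return the channel's <ttl> in minutes, or default if absent or invalid."""
    root = ET.fromstring(rss_xml)
    text = root.findtext("channel/ttl")
    if text is None:
        return default
    try:
        return int(text)
    except ValueError:
        return default
```

The reader would then wait at least that many minutes before asking for the feed again.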

Re:Over the years? How about over the weekend? by Marxist+Hacker+42 · 2004-07-20 08:10 · Score: 2, Interesting

Overall traffic isn't what anybody is complaining about- as I noted, the 503 errors seem to come at the top of every hour (I just got through not being able to read slashdot for a few minutes), which means, essentially, slashdot is receiving a slashdotting. Do I know that RSS is doing it? Not from this location which has limited investigation tools or capability to figure out what's really going on. But it might explain recent behavior of the site.

--
SJW: a person who perceives an injustice, and while correcting it, commits a greater injustice.

Wasn't this the whole point of Konspire2b? by Alzheimers · 2004-07-20 08:10 · Score: 1

To quote their webpage:

a blog with unlimited bandwidth

blogs are software systems that allow you to easily post a series of documents to your website over time. Many people use blogs to display daily thoughts, rants, news stories, or pictures. If you run a blog, your readers can return to your site regularly to see the new content that you have posted. Before blogs came along, maintaining a website (and updating it regularly) was a relatively tedious process. Some might call blogging a social revolution---even if you do not buy the hype, you must admit that blogs are causing quite a stir.

kast is similar to a blogging system in that you can use it to regularly "post" new content to a group of readers. Of course, a blog, like any website, has limited bandwidth. Thus, the kinds of content you can post to a blog are usually limited to text and pictures, especially for popular blogs that are read by many people. By leveraging the distribution power of konspire2b, you can use kast to post files of any size to essentially as many readers as you want.

on your blog, you might have a "picture of the day". On your konspire2b channel, you can have a "movie trailer of the day" or even a "gnu/linux distribution of the day". Bandwidth limitations are essentially taken out of the equation.

and, thanks to kast's web-based user interface, you can use HTML comments to describe each broadcast and link back to relevant information on the web. In fact, the layout of kast's "received folder" interface almost looks like a blog.

That sounds an awful lot like fixing the inherent problem with RSS!

Jabber RSS subscriptions by porter235 · 2004-07-20 08:10 · Score: 1

Jabber could work nicely for this type of thing.

Providers set up a Jabber presence for their individual RSS feeds.

Clients subscribe to Providers RSS presence.

Provider generates new news, and makes an announcement via its jabber presence. This could even include all of the information that is normally in an RSS feed, making the need for RSS unnecessary!

Clients then can repoll the RSS for new news if they like.

If there is a problem with too many subscribers at once... then you limit the number of clients that can subscribe to any individual RSS jabber presence. If you try and subscribe to a full account, you get an auto-message informing you of the currently open account. The provider sets up 1 RSS jabber presence per 1000 subscribers for instance, and then only announces the new news to each jabber account every 5 or 10 min, spreading out the hit on the RSS file.
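
The account-per-1000-subscribers bookkeeping could be sketched like this. It has nothing Jabber-specific in it; assign_presence is a hypothetical helper that just tracks how many subscribers each account holds.

```python
def assign_presence(subscriber_counts, capacity=1000):
    """Place one new subscriber and return the index of the account used.

    subscriber_counts -- mutable list, one subscriber count per account
    capacity          -- maximum subscribers per account
    """
    for index, count in enumerate(subscriber_counts):
        if count < capacity:
            subscriber_counts[index] += 1
            return index
    # Every existing account is full, so open a new one.
    subscriber_counts.append(1)
    return len(subscriber_counts) - 1
```

Announcements would then go out to one account at a time, a few minutes apart.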

Re:Jabber RSS subscriptions by Anonymous Coward · 2004-07-20 08:17 · Score: 0

yeah... pubsub.com does that. And they are already facing the scalability issues of XMPP's poor interserver protocol (even IRC is much better at distributing things)
Re:Jabber RSS subscriptions by porter235 · 2004-07-20 08:54 · Score: 1

OK, but that is a different issue... As well, how does pubsub.com know that there is new content... by polling RSS in a traditional manner? Or do the content providers actually inform pubsub? Are they sending all of the messages at once or distributed over time like I suggested? Is XMPP the bottle neck, or their polling and matching?

p2p-rss ? by edson+at+lies.cl · 2004-07-20 08:11 · Score: 0

ever thought about that...?

--
i have found, you can find,happiness in slavery!

Because DDoS's Attack every hour on the hour by chadpnet · 2004-07-20 08:11 · Score: 1

Every hour, Infoworld "sees a massive surge of RSS newsreader activity" that "has all the characteristics of a distributed DoS attack."

If one RSS file is parsed by tens of thousands of readers every hour on the hour, how is that anything like "all the characteristics" of a DDoS? If only real DDoSes REALLY had all the characteristics of an RSS reader (i.e., load one document at a specific interval, and do nothing else for another hour).

Wrong solution to wrong problem by ptomblin · 2004-07-20 08:11 · Score: 1

Others have already mentioned that RSS is an attempt to fake a "push" in a technology that is all "pull".

I have what to my 10 minutes of thought on the subject appears to be a better solution - every web site that currently publishes an RDF page should instead push new entries to an NNTP newsgroup. I'd suggest that a hierarchy be created for it, then sort of a reverse of the URL for the group name, like rdf.org.slashdot or rdf.uk.co.theregister. Then the articles get propagated in a distributed manner and people read a copy on their nearest news server instead of hammering your web site over and over looking to see if there are updates.

Feel free to tear this idea to shreds.

--
The next Cmdr Taco duplicate will be ready soon, but subscribers can beat the rush and see it early!

Re:Wrong solution to wrong problem by wfberg · 2004-07-20 08:52 · Score: 1

I have what to my 10 minutes of thought on the subject appears to be a better solution - every web site that currently publishes an RDF page should instead push new entries to an NNTP newsgroup. I'd suggest that a hierarchy be created for it, then sort of a reverse of the URL for the group name, like rdf.org.slashdot or rdf.uk.co.theregister. Then the articles get propagated in a distributed manner and people read a copy on their nearest news server instead of hammering your web site over and over looking to see if there are updates.

I'm not saying my idea is terribly original, but would your 10 minutes of thought perhaps have included reading this comment, seeing as it was posted about 15 minutes prior to yours? ;-)

Keep up the good work in spreading the word though. Just don't patent it (actually, if you did come up with it without reading my comment, all the more proof how obvious my ideas are) ;-)

--
SCO employee? Check out the bounty
Re:Wrong solution to wrong problem by ptomblin · 2004-07-20 09:08 · Score: 1

actually, if you did come up with it without reading my comment, all the more proof how obvious my ideas are

Actually, I did come up with the idea without seeing your post. But since I've been a Usenet news administrator for 18 years, I suppose it's no surprise that I would think of NNTP as a solution.

It's a damn sight better use of NNTP than distributing porn and copyright violations, anyway.

--
The next Cmdr Taco duplicate will be ready soon, but subscribers can beat the rush and see it early!
Re:Wrong solution to wrong problem by wfberg · 2004-07-20 09:23 · Score: 1

Actually, I did come up with the idea without seeing your post. But since I've been a Usenet news administrator for 18 years, I suppose it's no surprise that I would think of NNTP as a solution.

18 years? Yikes!! Come to think of it, I should perhaps rephrase my idea in terms of FIDOnet echos.. ;-)

Not really so surprising though, since newsgroups are pretty suited for.. well.. news.. (I'm still a bit sad that the clari.* newsfeed went away, that was cool).

--
SCO employee? Check out the bounty
Re:Wrong solution to wrong problem by danlyke · 2004-07-20 10:17 · Score: 1

This is exactly why I provide an NNTP version of my blog and have made an open offer of NNTP syndication for anyone else who wants to do their 'blog that way. I believe there are also a couple of people doing RSS to NNTP feeds, but none of us have ever gotten enough traction to make doing this as more than a novelty worth our while.

RSS doesn't scale by SteveX · 2004-07-20 08:12 · Score: 1

We need a way to make RSS scale, and the sooner the better, before the mainstream browsers make it easy for a hundred million people to subscribe to a popular feed. One option is distributing feeds using something like NNTP, so that users can poll a server near them and new items propagate out to closer servers, rather than every user polling the source.

Read this for some more thoughts on this..

Re:Over the years? How about over the weekend? by 0racle · 2004-07-20 08:12 · Score: 1

Ya I think its great, no ads.

--
"I use a Mac because I'm just better than you are."

RSS has ways to control clients.. by Shadeborn · 2004-07-20 08:14 · Score: 1

..but it doesn't help if the clients don't support either vanilla RSS syndication tags (ttl, skipDays, skipHours) or the tags defined by the optional syndication module (updatePeriod, updateFrequency, updateBase).

But even if every client obeyed these and used and respected appropriate HTTP headers (If-Modified-Since, Last-Modified, Expires), it would only make the request flood more synchronized. On the other hand, if the RSS generator randomized the syndication settings, it could distribute the load better and even preemptively shift load off the peak times.
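
One way the generator might randomize its settings: hand each response a slightly different ttl so that well-behaved clients drift apart instead of returning in lockstep. A sketch only; randomized_ttl, the 60-minute base and the 15-minute spread are all assumptions.

```python
import random


def randomized_ttl(base_minutes=60, spread_minutes=15, rng=random):
    """Pick a ttl in minutes within +/- spread_minutes of the base value.

    rng can be any object with a randint(a, b) method, e.g. a seeded
    random.Random instance for repeatable tests.
    """
    return base_minutes + rng.randint(-spread_minutes, spread_minutes)


def ttl_element(minutes):
    """Render the value as the RSS 2.0 <ttl> element."""
    return "<ttl>%d</ttl>" % minutes
```

The feed generator would call randomized_ttl once per response and embed the result in the channel.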

rss by abhinavmodi · 2004-07-20 08:14 · Score: 1

rss=real slow syndication ... ? :)

Re:Still haven't tried these newfangled RSS reader by StJefferson · 2004-07-20 08:15 · Score: 1

On Windows, I've been using SharpReader for the past several weeks, and have been reasonably happy with it.

My Linux box is mostly there to sneer at, so I haven't gotten around to setting up anything over there yet.

--

Liberty in our Lifetime

Offload the DDOS burden by desiderius7 · 2004-07-20 08:16 · Score: 1

Use FeedBurner as your public newsfeed to let their smart servers handle the brunt of the attack, plus you get stats and format independence (publish both RSS & Atom from 1 feed).

Agreed... smarter server would help. by Otto · 2004-07-20 08:17 · Score: 1

If the problem is not really one of bandwidth but of server speed, then have your scripts update some static file instead of generating the thing on the fly. Have the server cache the static file in memory, and then it can serve it out nearly instantly.

If you have a PHP script generating an RSS XML document every time anybody hits it, you're just begging to be DOS'd.
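
The static-file approach might look like this in Python: write the new feed to a temporary file in the same directory, then rename it over the old one so the web server never sees a half-written document. publish_feed is a made-up name, and regenerating only when content changes is left to the caller.

```python
import os
import tempfile


def publish_feed(xml_text, path):
    """Atomically replace the static feed file at path with xml_text."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(xml_text)
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything went wrong before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The script that posts a new story calls this once, and every reader hit after that is a plain file read.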

--
- Give a man a fire and he's warm for a day, but set him on fire and he's warm for the rest of his life.

I can see the Fox special now by jrexilius · 2004-07-20 08:17 · Score: 1, Funny

"When RSS Feeders Attack".. news at 11:00..

Re:I can see the Fox special now by imnoteddy · 2004-07-20 09:01 · Score: 1

"When RSS Feeders Attack".. news at 11:00.
"When RSS Feeders Attack".. DDOS on the hour every hour.

--
No electrons were harmed creating this post, though some may have been subjected to electrical and/or magnetic fields.

Sounds like a job for Akamai by Douglas+Simmons · 2004-07-20 08:19 · Score: 1

So we've got a growing demand for chronic (half-hourly) high bursts of RSS pings among Slashdot and other dynamic sites. Since Slashdot would prefer to pay only for the resources to handle their typical flow of traffic without these extreme clumps of bursts, why not outsource this and use one of those Akamai-like unlimited bandwidth setups to act as a proxy RSS server, for us and all the other RSS-using sites?

Since it is periodic "burst" bandwidth (as opposed to regular visitor traffic), they might be able to provide a better deal than what Slashdot has to pay their provider for the extra bandwidth accommodation. And with all these RSS sites, there's enough of a market for an Akamai-like service to code some kind of RSS proxy server and to offer this package.

Or better yet, force everyone to upgrade to an RSS client that randomizes within 15 minutes the times it checks in.

Re:Sounds like a job for Akamai by arrow · 2004-07-20 08:56 · Score: 1

Akamai requires a minimum commitment of 10 Mb/sec. They expect your "peak" times to be much much higher than that.

They also charge accordingly.

Also, if you have multiple Mb/sec of RSS traffic, you might want to reconsider providing an RSS feed, since it generates little to no income on its own.

--
symetrix. We are building a religion, a limited edition.

Publish/Subscribe by dgp · 2004-07-20 08:19 · Score: 4, Informative

That is mind bogglingly inefficient. It's like POP clients checking for new email every X minutes. Polling is wrong wrong wrong! Check out the select() libc call. Does the Linux kernel go into a busy wait loop listening for every ethernet packet? No! It gets interrupted when a packet is ready!

http://www.mod-pubsub.org/
The apache module mod_pubsub might be a solution.

From the mod_pubsub FAQ:
What is mod_pubsub?

mod_pubsub is a set of libraries, tools, and scripts that enable publish and subscribe messaging over HTTP. mod_pubsub extends Apache by running within its mod_perl Web Server module.

What's the benefit of developing with mod_pubsub?

Real-time data delivery to and from Web Browsers without refreshing; without installing client-side software; and without Applets, ActiveX, or Plug-ins. This is useful for live portals and dashboards, and Web Browser notifications.

Jabber also saw a publish/subscribe mechanism as an important feature.
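
For anyone who hasn't used select(), a small Python sketch of waiting on a socket instead of polling it. It assumes the client already holds an open connection to some notification server; wait_for_update is an invented name and the 4096-byte read is arbitrary.

```python
import select
import socket


def wait_for_update(sock, timeout=None):
    """Block until sock has data (or timeout seconds pass), then return it.

    Returns the received bytes, or None if the timeout expired first.
    No CPU is burned while waiting; the kernel wakes us when data arrives.
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    if readable:
        return sock.recv(4096)
    return None
```

With timeout=None the call simply sleeps until the server pushes something down the connection.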

konspire by akb · 2004-07-20 08:19 · Score: 1

Take a look at Konspire. It has a lot of the properties that you describe. They claim they are more scalable than Bittorrent.

Re:Over the years? How about over the weekend? by anomalous+cohort · 2004-07-20 08:19 · Score: 2, Informative

it's unfortunate that it (RSS) is ultimately just very stupid.

The folks over at Netscape and/or UserLand should have studied the CDF standard first. Then they would have realized the value of specifying schedule information.

Re:Still haven't tried these newfangled RSS reader by Anonymous Coward · 2004-07-20 08:20 · Score: 0

Emacs GNUS, you can even read /. through it!

Re:Still haven't tried these newfangled RSS reader by koogunmo · 2004-07-20 08:21 · Score: 1

if you use firefox I would recommend using a firefox extension called sage http://sage.mozdev.org/ to view RSS feeds.

As a site admin with RSS Feeds.... by Anonymous Coward · 2004-07-20 08:22 · Score: 1, Interesting

There's a two-fold effect to this problem, that even a PUSH solution would not solve. With everyone simultaneously grabbing, you have to deal with the initial precursor blast of traffic (RSS fetch), and then you have to deal with the big shock wave of people coming in to get the actual content (content fetch).

A Push method may stop the precursor, but you're still going to have to deal with everyone jamming into your site at the same time... probably even worse because if it became a 'standard' for clients, you would be faced with a lot more simultaneous content fetches than with a mixed Pull-on-the-half-hour/Pull every 30 mins crowd.

I feel that the best method is to enforce RSS frequency through the delivered XML (I was actually quite dumbfounded when I didn't find that in the RSS 2.0 spec), and to have clients not operate on the hands of the clock, but to be distributed based on app start time. Additionally, site designers should be implementing caching and quick-delivery schemes for their RSS feeds, and be using HTTP headers /w content expiration times (not that you can fully expect all clients to adhere to them).

- JR

Re:Idea use IRC by AndroidCat · 2004-07-20 08:23 · Score: 1

Instead of IRC, why not Usenet? (Or perhaps more like Usenet back in the days of UUCP.)

--
One line blog. I hear that they're called Twitters now.

I'll recommend SlashDock by lukket · 2004-07-20 08:24 · Score: 1

I use SlashDock on Mac OS X. It's really great, because I just right-click on the icon in the dock when I want to check for new developments on my favourite websites. It's very convenient, especially when at work. Oh yeah, you can choose the update frequencies individually, and it starts the schedule on app launch, so it's randomized properly too.

Tell me if this won't work.... by NerveGas · 2004-07-20 08:24 · Score: 1

Load-balancing between web servers is eventually limited by how many ports you can have open at once - depending on how you adjust your settings, anywhere from 10,000 to nearly 60,000 per IP address.

So, five front-end load balancers, with twenty IP addresses assigned to each, and a couple of spares to take over if one fails. That gives you what, six *million* concurrent connections? Your bandwidth issues/bills are going to GREATLY eclipse your serving capacity at that point.
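The six million figure holds only at the top of that port range; a quick check of the arithmetic, using the numbers from this post (the constant names are just for illustration):

```python
LOAD_BALANCERS = 5
IPS_PER_BALANCER = 20
PORTS_PER_IP_LOW = 10_000    # conservative setting
PORTS_PER_IP_HIGH = 60_000   # "nearly 60,000" per IP address

def max_connections(balancers, ips_each, ports_per_ip):
    """Upper bound on concurrent connections across all front ends."""
    return balancers * ips_each * ports_per_ip

low = max_connections(LOAD_BALANCERS, IPS_PER_BALANCER, PORTS_PER_IP_LOW)
high = max_connections(LOAD_BALANCERS, IPS_PER_BALANCER, PORTS_PER_IP_HIGH)
print(f"{low:,} to {high:,} concurrent connections")
```

So the range is one million to six million, depending on how the port settings are tuned.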

Then it's just a matter of shoving cheap, non-redundant, commodity machines in a rack to handle the requests. Say, a bunch of low-power, tiny Epia-based machines all pulling from a central file server.

So, someone updates the RSS file on the file server. Each of the 5/10/100/however-many front-end servers access it once, at which point it goes into file cache. Then they all dish it out like crazy.

Yes, this is an expensive solution: But I like it. I like it because it puts the cost on the person that wants to distribute the RSS feed. Other solutions, like uploading it to NNTP servers, shifts the cost to nearly everyone BUT the provider of information.

steve

--
Oh, you're not stuck, you're just unable to let go of the onion rings.

Distributed checking by deepstephen · 2004-07-20 08:26 · Score: 1

I use Shrook, a lovely RSS reader for MacOS X. It uses distributed checking to get around this problem. From their FAQ:

A central server maintains a database of when each channel was last updated. To keep it up to date, every so often, the server chooses a computer to check for new items and report back. The frequency of this varies from every 5 minutes for popular channels, to every half hour for channels with only one online subscriber, and it tries to use a different computer each time. At the other end, each copy of Shrook checks in with the server every 5 minutes, and if any of its channels are out of date, it reloads them.

Nice. So not only does it stop DDoSing the web server, it means I get updates within five minutes instead of every half hour.

--
Karma: Chameleon (you come and go)

Bloglines and other centralized aggregators by markfletcher · 2004-07-20 08:26 · Score: 1

Bloglines avoids this problem completely by only fetching a feed once per iteration, regardless of the number of subscribers. We're also able to provide subscriber stats to feed publishers, something that you can't do with desktop aggregators. And no messy software to install.

Re:Bloglines and other centralized aggregators by easyfrag · 2004-07-20 09:14 · Score: 1

Bloglines rocks! If you use multiple computers a lot, a centralized aggregator is a must, otherwise you'll be looking at the same items over and over. Bloglines is perfect with tabbed browsers like Mozilla and Opera: middle-click items that you want to read and continue scanning. The clipping feature offers a storage place for items that you don't have time for immediately or you want to keep as reference. I easily keep track of over 100 feeds and my bookmarks folders have gotten a lot smaller.

Common Sense? by djeaux · 2004-07-20 08:27 · Score: 3, Informative

I publish 15 security-related RSS feeds (scrapers) at my website. In general, they are really small files, so bandwidth is usually not an issue for me. I do publish the frequency with which the feeds are refreshed (usually once per hour).

I won't argue with those who have posted here that some alternative to the "pull" technology of RSS would be very useful. But...

The biggest problem I see isn't newsreaders but blogs. Somebody throws together a blog, inserts a little gizmo to display one of my feeds & then the page draws down the RSS every time the page is reloaded. Given the back-and-forth nature of a lot of folks' web browsing pattern, that means a single user might draw down one of my feeds 10-15 times in a 5 minute span. Now, why couldn't the blogger's software be set to load & cache a copy of the newsfeed according to a schedule?
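Something along these lines is all the blog software would need. This is only a sketch: `CachedFeed` and the injected `fetch` callable are hypothetical, and the one-hour TTL is assumed to match the hourly refresh mentioned above.

```python
import time

class CachedFeed:
    """Serve a stored copy of a feed and refetch only after `ttl_seconds`."""

    def __init__(self, fetch, ttl_seconds=3600, clock=time.time):
        self._fetch = fetch          # callable that actually downloads the feed
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the example can fake time
        self._body = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._ttl:
            self._body = self._fetch()
            self._fetched_at = now
        return self._body

# Fifteen page reloads inside the hour cost the publisher one request.
calls = []
def fake_fetch():
    calls.append(1)
    return "<rss/>"

fake_now = [0.0]
feed = CachedFeed(fake_fetch, ttl_seconds=3600, clock=lambda: fake_now[0])
for _ in range(15):
    feed.get()
fetches_within_hour = len(calls)

fake_now[0] = 3600.0   # an hour later the cache refreshes once
feed.get()
fetches_after_hour = len(calls)
```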

The honorable mention for RSS abuse goes to the system administrator who set up a newsreader screen saver that pulled one of my feeds. He then installed the screen saver on every PC in every office of his company. Every time the screen saver activated, POW! one feed drawn down...

--
"Obviously, I'm not an IBM computer any more than I'm an ashtray" (Bob Dylan)

possible solutions by gohai · 2004-07-20 08:29 · Score: 1

as already mentioned, HTTP/1.1's If-Modified-Since header field would be a first hint

another way of minimizing the DDoS would be to use the Syndication module, which informs client programs when to update the feed - this would still mean that everyone tries to GET the RSS at the same time, but what about randomizing updateBase?
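a rough sketch of what a client could do with the Syndication module: read updatePeriod and updateFrequency, then pick its own random offset inside the interval instead of everyone sharing one updateBase. the sample feed and function names are made up; the defaults (daily, 1) are what I understand the module to assume when the elements are missing.

```python
import random
import xml.etree.ElementTree as ET

SY = "http://purl.org/rss/1.0/modules/syndication/"
PERIOD_SECONDS = {
    "hourly": 3600,
    "daily": 86400,
    "weekly": 604800,
    "monthly": 2592000,   # 30 days, an approximation
    "yearly": 31536000,   # 365 days
}

def update_interval_seconds(feed_xml):
    """Seconds between updates according to sy:updatePeriod / sy:updateFrequency."""
    root = ET.fromstring(feed_xml)
    period = root.findtext(f".//{{{SY}}}updatePeriod", default="daily").strip()
    frequency = int(root.findtext(f".//{{{SY}}}updateFrequency", default="1"))
    return PERIOD_SECONDS[period] // max(frequency, 1)

def first_poll_delay(interval_seconds, rng=random):
    """Random phase within the interval, so clients do not all hit at once."""
    return rng.uniform(0, interval_seconds)

SAMPLE = """<rss version="2.0" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/">
  <channel>
    <title>Example</title>
    <sy:updatePeriod>hourly</sy:updatePeriod>
    <sy:updateFrequency>2</sy:updateFrequency>
  </channel>
</rss>"""

interval = update_interval_seconds(SAMPLE)   # hourly, twice per period
delay = first_poll_delay(interval)
```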

I'd Like to Solve the Puzzle by trippinonbsd · 2004-07-20 08:30 · Score: 1

get out more.

Re:I'd Like to Solve the Puzzle by mgoodman · 2004-07-20 09:07 · Score: 0, Offtopic

indeed, we all should...speaking of which...

--
01100111 01100101 01110100 00100000 01101111 01110101 01110100 00100000 01101101 01101111 01110010 01100101 00101110

Re:Still haven't tried these newfangled RSS reader by HokieJP · 2004-07-20 08:32 · Score: 1

Opera now has RSS built in. Just click on an rss link and it automatically adds it to your list of newsfeeds.

pull by LordMyren · 2004-07-20 08:38 · Score: 1

where the hell is all that Pull technology now?

eventing/callbacks/message passing/[async technology of choice] are always the better option.

Trivial solution! by Maljin+Jolt · 2004-07-20 08:39 · Score: 2, Interesting

Random intervals. I already patched my desktop RSS reader to request new feed every 73+-13 minutes.

--
There you are, staring at me again.

Wow by Anonymous Coward · 2004-07-20 08:42 · Score: 0

Netcraft actually confirms it!

Re:Wow by hunterx11 · 2004-07-20 08:49 · Score: 1

RSS is dying...

--
English is easier said than done.
Re:Wow by Anonymous Coward · 2004-07-20 09:25 · Score: 0

No, RSS is killing. It's the servers that are dying.

Not to flame... by T3kno · 2004-07-20 08:42 · Score: 3, Interesting

But isn't this what TCP/IP multicast was invented for? I've never really understood why multicast has never really taken off. Too complicated? Instead of entering an rss server to pull from just join a multicast group and have the RSS blasted once every X minutes. Servers could even send out updates more often because there are only a few connections to send to. Of course I could be completely wrong and multicast may be the absolute wrong choice for this sort of application, it's been a while since I've read any documentation about it.

--
(B) + (D) + (B) + (D) = (K) + (&)

Re:Not to flame... by Ernesto+Alvarez · 2004-07-20 09:23 · Score: 2, Informative

TCP cannot multicast. It's impossible due to its connection oriented, two way properties.

IP can multicast, but it needs support from the network to do that. The problem with that is that the internet is not under one authority that can say "from today onwards, we do multicast in such and such way". There have been experiments with multicasting (mbone), but there are some things that cannot be solved easily: how do you register as a multicast client, and (important part here) how do you make every router from source to destination know about it, and act accordingly? (Remember, those routers are NOT under the same authority.) So, even though you could multicast with UDP/IP, some logistics problems make it very difficult to do.

However, within an autonomous system (which IS under a single authority) you could multicast, provided the net supports it. In fact, both standard routing protocols (OSPF and RIP) as well as NTP can multicast, and have multicast groups assigned to them.

It's too bad, but that's how the real world is....

--
GPG 0x1B479C78
Re:Not to flame... by Anonymous Coward · 2004-07-20 09:43 · Score: 1, Informative

there's another more practical reason why multicast is not supported over the internet; it can be very easily used to do a DDoS attack.

Imagine being able to send a ping with a forged return header to the IP address *.*.*.* and getting four billion replies sent to the person who owns the forged address.
Re:Not to flame... by stienman · 2004-07-20 10:43 · Score: 2, Informative

The practical problem with multicast was that it requires an intelligent network and dumb clients. In other words: routers have to be able to keep a table of information on which links to relay multicast information, and that has to be dynamically updated periodically.

There is a multicast overlay on top of the internet which consists of routers that can handle this load.

But the combination of no hardware/software support in the network, and no real huge push for this technology left multicast high and dry.

Brief idea of how multicast works:
1) A source sends out an "I have a multicast feed" message to its immediate routers. Those routers 'publish' this feed to their connected routers until every segment on the internet has seen this feed broadcast.
2) At the end points, individual computers see this message on their segment. They can subscribe to the feed by sending a message to their upstream router. This router places an entry in its table saying, "Someone on segment X wants feed Y, which I get from segment Z" It then sends a subscribe message to the router it got the original broadcast from, which does the same thing on upward until it hits the originating server.
3) Each router, when it sees a multicast packet, consults its table to see which (if any) segments it should forward the packet to. Eventually the packet makes its way to all the endpoints of the network
4) The publish broadcast is initiated periodically. Each router also periodically checks the table to see if they haven't received a re-subscribe message since the last publish broadcast. If no one resubscribes then the table entry is not refreshed - there is no unsubscribe, if you no longer want the feed just ignore it and it'll go away if no one else on your segment wants it. Only one subscriber on each segment needs to subscribe, so if I want it and my co worker wants it then if I see his subscribe packet before I send mine out then I won't send mine out since it'll be put on my segment anyway.
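
The table handling in steps 2 to 4 can be sketched as a toy model. This is a simplification for illustration only, not how any real router or multicast protocol is implemented; the class name, `hold_time`, and the tick-based expiry are all assumptions.

```python
class MulticastRouter:
    """Toy soft-state subscription table: entries vanish unless refreshed."""

    def __init__(self, hold_time=2):
        self.hold_time = hold_time
        self.table = {}  # feed -> {segment: publish rounds left before expiry}

    def subscribe(self, feed, segment):
        # A subscribe (or re-subscribe) resets the entry's lifetime.
        self.table.setdefault(feed, {})[segment] = self.hold_time

    def tick(self):
        # One publish round passes; drop entries nobody refreshed.
        for feed in list(self.table):
            segments = self.table[feed]
            for segment in list(segments):
                segments[segment] -= 1
                if segments[segment] <= 0:
                    del segments[segment]
            if not segments:
                del self.table[feed]

    def forward(self, feed):
        # Segments a packet for `feed` should be copied to.
        return sorted(self.table.get(feed, {}))

router = MulticastRouter(hold_time=2)
router.subscribe("feedY", "segX")
router.subscribe("feedY", "segW")
both = router.forward("feedY")

router.tick()
router.subscribe("feedY", "segX")   # only segX re-subscribes
router.tick()
only_x = router.forward("feedY")    # segW has silently expired

router.tick()
nobody = router.forward("feedY")    # no refresh at all: feed is forgotten
```

There is no unsubscribe call anywhere, which matches the "just ignore it and it'll go away" behavior described in step 4.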

It's quite elegant, but when a router is dealing with 40+Gbps of packets it barely has time to figure out where each packet goes, never mind statefully inspecting multicast packets and forwarding them appropriately. Not impossible, but it hasn't been rolled out and few providers see any money in supporting it.

-Adam
Re:Not to flame... by Bluelive · 2004-07-20 11:38 · Score: 1

Most ISPs can't be arsed to set up their routers correctly for multicast to work. That, and there are few low-bandwidth examples of multicast software, and none popular.
Re:Not to flame... by Ernesto+Alvarez · 2004-07-20 13:09 · Score: 1

That's not multicasting. That's broadcasting to 255.255.255.255. In theory it would be addressed to every device in the net, however it is really used as a "local broadcast", because routers don't forward these kinds of requests.

If you wanted to do your DDOS attack, you would somehow need to get everyone on a multicast group, not an easy task (and certainly detectable).

--
GPG 0x1B479C78
Re:Not to flame... by Anonymous Coward · 2004-07-20 13:34 · Score: 0

You're actually looking for RSS over DNS more than multicast.. That would be proper use of technology, err concepts.

DNS/RSS-pull.
BGP/RIP-push.

The difference is there for a reason -- which Dave Winer decided to gloss over in his quest to forcibly inject a standard into/onto the net.

DNS folks tend to be like the guys who build the steel frames from which we hang buildings and no amount of rationality may be enough to let them challenge the fear that the net will break if anything dynamic/interactive gets mushed into their C code..

A lot of net functionality... www/rss/p2p would be better off with smarter DNS-like proxies, no?
Re:Not to flame... by Spy+Hunter · 2004-07-20 15:26 · Score: 1

Internet TV is the killer app for multicast. Winamp 5 is bringing a working implementation of Internet TV to the masses today. If you haven't seen it yet try it out, it's cool :-). It works fine without multicast for small audiences (~100 people), but I'm sure you need tons of bandwidth. With real IP multicast Winamp TV could become much more than it is today. It could allow anybody with a home broadband connection to broadcast DVD-quality TV to the world. It could make cable TV obsolete.
I really, really hope that multicast will be feasible someday. Without it, the Internet is incomplete as a communication medium. With multicast, a whole new world of distribution possibilities would open up for people who don't have a spare arm and leg to spend on bandwidth. With working multicast and one of those fiber-to-the-home connections Verizon is offering, you would have worldwide distribution power to equal any media conglomerate. With that kind of power in the hands of the people, there's no telling what could happen.

--
main(c,r){for(r=32;r;) printf(++c>31?c=!r--,"\n":c<r?" ":~c&r?" `":" #");}
Re:Not to flame... by Spy+Hunter · 2004-07-20 15:50 · Score: 1

Multicast is the right solution to all sorts of problems that the Internet is facing today (RSS, streaming radio and streaming TV, large file distribution to lots of people ala bittorrent). Unfortunately, it has never been implemented over the public Internet, so you can't use it. If multicast was available today, the Internet would be a completely different place. You would be able to download huge popular files from places like FilePlanet without all the waiting. Distribution of huge files would be easy, BitTorrent would be obsolete. You would be able to watch streaming TV at DVD quality from hundreds of stations set up by normal people only on their DSL lines. Unfortunately, ISPs are dragging their feet, or something. I'm not sure what the holdup is exactly.

--
main(c,r){for(r=32;r;) printf(++c>31?c=!r--,"\n":c<r?" ":~c&r?" `":" #");}
Re:Not to flame... by amorsen · 2004-07-21 02:51 · Score: 1

Show me a router which can handle, say, 100 million multicast groups, and I'll agree that the problem with multicast is merely political. I'll be impressed if you can. Peer to peer is the solution -- it puts the complexity in the hosts, not in the routers.

--
Finally! A year of moderation! Ready for 2019?
Re:Not to flame... by Spy+Hunter · 2004-07-21 14:56 · Score: 1

Hey, I never said it was merely political; I just don't know. Peer-to-peer is a lousy solution; it's far too inefficient, it is often made impossible by firewalls, it is prone to "leeching," and it doesn't solve all the same problems that multicast does. I can't really imagine a working real-time video streaming P2P application, scalable to millions of users from one broadcast source.

--
main(c,r){for(r=32;r;) printf(++c>31?c=!r--,"\n":c<r?" ":~c&r?" `":" #");}
Re:Not to flame... by amorsen · 2004-07-21 20:02 · Score: 1

I don't see the inefficiency. I have no problems imagining a working real-time video streaming P2P application. Skype seems to work just fine, although that is admittedly only sound so far. You need decent upload bandwidth from the hosts, and that may be a problem. However upgrading the last mile to 512kbps upload seems a lot simpler to me than upgrading the core routers to handle 100 million multicast groups -- particularly because I have no idea how you would build hardware to handle that. Core routers are already strained by the few hundred thousand unicast prefixes. Multicast would require magnitudes of performance improvement.

--
Finally! A year of moderation! Ready for 2019?
Re:Not to flame... by Spy+Hunter · 2004-07-22 22:10 · Score: 1

Skype works because it is one peer to one peer. Streaming video works OK too in this environment (at least I hear that iChat AV does it well). I'm talking about real Internet TV, just like broadcast TV with the same scalability. One source (with upload bandwidth just a little more than required for one stream), millions of listeners, all in real-time. Since the source and most of the peers can only send out one or two streams at a time (and some won't even have the bandwidth for one, not to mention congestion problems in between hosts), there will need to be a *lot* of peers involved in distribution; probably around 50%. Any time one of them leaves the channel, the peers behind it will see a hiccup in the stream. Important peers leaving would result in intermittent service interruptions for large numbers of people watching the broadcast. Large amounts of peer turnover would make the stream completely unwatchable. Plus after you add in four or five hops backwards and forwards across the entire Internet you can forget about real time.
Like most Internet problems it can be solved by simply adding more bandwidth, but you're going to need more than 512 kbps per peer to broadcast TV-quality video reliably using a P2P method. You need more than that just for one stream. (I've seen Winamp TV broadcasts at 800 kbps; the compression artifacts were still quite obvious). You need each peer to be able to distribute at least two streams on average so you can get a nice tree going, and for redundancy in case of peers leaving the network three or four would be much better. I would estimate 5-10 Mbps per peer of upload bandwidth would be necessary for a workable, reliable P2P TV system. I doubt that kind of upload bandwidth will be widespread 10 years from now, especially outside of major urban areas. And the network backbones would have to be able to handle all of that traffic, which would be at least an order of magnitude larger than the same system using multicast. Would having to switch an order of magnitude less packets help some when implementing multicast?

--
main(c,r){for(r=32;r;) printf(++c>31?c=!r--,"\n":c<r?" ":~c&r?" `":" #");}
Re:Not to flame... by amorsen · 2004-07-23 00:17 · Score: 1

The amount of packets that can be routed in backbone routers these days is really huge. The Juniper T640 is specced for 770Mpps. Forwarding rate is not really a problem. Particularly because each packet forwarded can be billed. Multicast on the other hand just slows your routers down whether you actually get to forward packets or not -- and there is no easy way to bill that cost to anyone. So no, there is little interest in decreasing the amount of bandwidth used. Bandwidth is cheap and plentiful.
Anyway, if you want to send 5Mbps streams to someone, you have basically lost before you start. Almost no one has that kind of download bandwidth. And those who do generally have a shared bottleneck on the ISP side of the connection. Streaming video will work fine (with multicast) as long as you can get all the subscribers to agree on which stream they want to watch...
In the end, I bet that at the time when 10Mbps download bandwidth becomes common, decent upload bandwidth will be common as well. We are 5 years away from that point, at least.

--
Finally! A year of moderation! Ready for 2019?

PulpFiction by Cadre · 2004-07-20 08:44 · Score: 2, Informative

I recommend PulpFiction for an RSS/Atom reader on OS X. I much prefer the interface and how it treats the news compared to NNW.

--
All editorial writers ever do is come down from the hill after the battle is over and shoot the wounded.

A few dont get it by Bluelive · 2004-07-20 08:45 · Score: 2, Insightful

It seems a few people here don't get it. RSS is the file format, not the transfer via HTTP. The whole pull problem is a problem with HTTP; in theory you could make an IRC-like protocol and transmit via that, solving some of the subscription, distribution and pull problems.

Re:A few dont get it by AndroidCat · 2004-07-20 09:01 · Score: 1

You could even transmit it over television signals. It's old tech, I worked on a project in 1987 that sent advertising to LED display signs that way. More recently Microsoft had a Big Feature in Windows 98 where PC's with tuner cards could tap an information feed from PBS that way. (Did anyone ever give that a try? Me neither.)
It's not a high speed channel (9600bps) and it's probably shared by Closed Captioning now, but it could be done and would have nice uses combined with TV. (A newsfeed for TV news, wow, what a concept!)

--
One line blog. I hear that they're called Twitters now.

Re:Still haven't tried these newfangled RSS reader by timothyf · 2004-07-20 08:49 · Score: 2, Informative

If you don't use one computer all the time and you want to check your feeds from other places, I'd recommend going with a web-based news-aggregation service. I personally use BlogLines, but there are other services out there as well.

Re:Over the years? How about over the weekend? by nvrrobx · 2004-07-20 08:50 · Score: 1

I get this error now when my user cookie is set, no matter what time of the day. Quite irritating.

Solution: HTTP 503 Response for Flow Control by Orasis · 2004-07-20 08:51 · Score: 3, Insightful

The main problem here is that RSS lacks any sort of distributed flow control, much as the Internet did back in the early days with tons of UDP packets flying around everywhere and periodically bringing networks to their knees.

One completely backwards-compatible fashion to add flow-control to RSS would be to use the HTTP 503 response when server load is getting too high for your RSS files. The server simply sends an HTTP 503 response with a Retry-After header indicating how long the requesting client should wait before retrying.

Clients that ignore the retry interval or are overly aggressive could be punished by further 503 responses thus basically denying those aggressive clients access to the RSS feeds. Users of overly aggressive clients would soon find that they actually provide less fresh results and would place pressure on implementors to fix their implementations.
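
A minimal sketch of the server side, assuming some load measurement is already available. The `respond` function, the load threshold, and the randomized 60 to 300 second retry window are illustrative assumptions; only the 503 status and the Retry-After header come from HTTP itself.

```python
import random

def respond(current_load, max_load, feed_body, rng=random):
    """Return (status, headers, body) for a feed request.

    Under heavy load, answer 503 with a Retry-After hint instead of the
    feed. The delay is randomized so that deferred clients do not all
    come back in the same second.
    """
    if current_load > max_load:
        retry_after = rng.randint(60, 300)
        return 503, {"Retry-After": str(retry_after)}, b""
    return 200, {"Content-Type": "application/rss+xml"}, feed_body

ok_status, ok_headers, ok_body = respond(10, 100, b"<rss/>")
busy_status, busy_headers, busy_body = respond(250, 100, b"<rss/>")
```

The 503 carries no body, so an overloaded server spends almost nothing on each deferred client.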

Re:Solution: HTTP 503 Response for Flow Control by GeekDork · 2004-07-20 09:37 · Score: 1

Overly aggressive clients could get a short RSS snippet with an "In other news, your client suxx0rz!" and nothing else... One problem that I see with your proposal is that servers would need to keep track of all connections in times when they're overloaded anyway to properly enforce the mechanism.

--
Fight hunger. Filet a politician and send him to a 3rd world country of your choice.
Re:Solution: HTTP 503 Response for Flow Control by Orasis · 2004-07-20 09:54 · Score: 1

One problem that I see with your proposal is that servers would need to keep track of all connections in times when they're overloaded anyway to properly enforce the mechanism.

Not true, we've already implemented something like this and it works just great. You simply allocate a fixed amount of memory to keeping track of aggressive clients and use an LRU structure to forget about the less aggressive ones.
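
Roughly like this, though this is a from-scratch sketch and not our actual code; the class name, capacity, and 30-minute minimum interval are assumptions made for the example.

```python
from collections import OrderedDict

class AggressiveClientTracker:
    """Fixed-size record of recent requesters, evicted in LRU order."""

    def __init__(self, capacity=1000, min_interval=1800):
        self.capacity = capacity
        self.min_interval = min_interval
        self._last_seen = OrderedDict()  # ip -> time of last request

    def is_too_soon(self, ip, now):
        last = self._last_seen.get(ip)
        self._last_seen[ip] = now
        self._last_seen.move_to_end(ip)          # mark as most recently seen
        if len(self._last_seen) > self.capacity:
            self._last_seen.popitem(last=False)  # forget the quietest client
        return last is not None and now - last < self.min_interval

tracker = AggressiveClientTracker(capacity=2, min_interval=1800)
first = tracker.is_too_soon("10.0.0.1", now=0)       # never seen before
repeat = tracker.is_too_soon("10.0.0.1", now=60)     # back after one minute
tracker.is_too_soon("10.0.0.2", now=100)
tracker.is_too_soon("10.0.0.3", now=200)             # evicts 10.0.0.1
forgotten = tracker.is_too_soon("10.0.0.1", now=300)
```

Memory stays bounded by `capacity` no matter how many clients show up; a client that keeps hammering stays near the fresh end of the structure and keeps getting flagged.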

Re:Still haven't tried these newfangled RSS reader by MinutiaeMan · 2004-07-20 08:52 · Score: 1

OmniWeb 5.0 has an RSS reader built right into the bookmarks manager, which is really neat. If only they could let us change the length of the headlines that show up in the Dock menu to something greater than just 30 characters...

Re:Over the years? How about over the weekend? by Anonymous Coward · 2004-07-20 08:55 · Score: 0

Interesting. Today at work I was browsing and was getting errors constantly at Slashdot. Pages would load but not have backgrounds, I would get "you should not be here" etc errors. This was IIRC about noon. Slashdot in general has just seemed less responsive. The only time it flies anymore is late at night.

Server side fix by Mignon · 2004-07-20 08:56 · Score: 1

I would assume that changing the server is easier than changing the client. So I took a look at HTTP status codes and I see

413 Request Entity Too Large

The server is refusing to process a request because the request entity is larger than the server is willing or able to process. The server MAY close the connection to prevent the client from continuing the request.

If the condition is temporary, the server SHOULD include a Retry-After header field to indicate that it is temporary and after what time the client MAY try again.

This suggests to me an overwhelmed RSS server could return a 413 error code with a randomized Retry-After header on the order of a couple of minutes. It seems to me that such a reply would help lessen the load on the server, as it wouldn't have to deal with content, even if said content was already cached.

If the problem is simply that of having too many open connections, then yeah, I guess you're hosed.

Re:Over the years? How about over the weekend? by Anonymous Coward · 2004-07-20 09:00 · Score: 0

You're recommending a Microsoft spec on slashdot? Are you new? :D

That reply simply delays the scalability issue. by MickLinux · 2004-07-20 09:03 · Score: 1

Quite simply, your reply is good for the next two years or so, or maybe even ten years. However, it doesn't address the scalability issue, which is a concern in and of itself.

To address the scalability issue, perhaps there should be a distributed response network, to handle the distributed overload.

One idea? The RSS news feeder should always store the IP address of computers that it sends its most recent feed to, and append a random IP of one of the other computers that downloaded its feed.

Now, the news reader then turns around and the next time it goes to download its feed, it first asks one of the other sources. Three failures, and a source gets thrown out. Nonetheless, there will build up a distributed network of computers that first look for their feeds from another source.

Now, that brings along with it the questionability of verification of the RSS feed line. I mean, how do you know that the RSS news feed doesn't direct you to the latest Windows Spam Takeover Site? However, that can be handled with another kind of RSS or similar technology, namely PGP.

Now, this all involves new technology. But the technology isn't all that difficult -- it's been done before, in bits and pieces. So it could be done again, I suppose. The real question, to me, is whether it's all worth it.

--
Correct Horse Battery Staple: 72 bits of entropy. Enter "Correct H" into google. When it generates the phrase, that's

Re:That reply simply delays the scalability issue. by NuclearDog · 2004-07-20 18:29 · Score: 0

Well, we need to make this hamburger into a fighting machine, so maybe if we just take off the buns and replace them with titanium armour, and take out the hamburger and replace it with some electronics...

My point is, although your idea is good, they are probably looking for a fix to the RSS system, not a whole new, much more complex one :)

ND

--
This statement is forty-five characters long.

A server fails to send 1K of data to 1% of users by iamacat · 2004-07-20 09:06 · Score: 1

When many sites withstand slashdotting that involves movies, images and dynamically generated pages. This kind of problem can only result from extraordinary stupidity of both client and server.

Start by running RSS reader on a cheaper separate server hosted by another ISP. If clients connect at random time, great. If they connect exactly on the hour, the ones that get through will only get the "news" about an RSS reader that will fix that problem.

Use RSS-over-NNTP by MenTaLguY · 2004-07-20 09:06 · Score: 1

NNTP already has all the issues solved pat.

--

DNA just wants to be free...

Re:Still haven't tried these newfangled RSS reader by Glock27 · 2004-07-20 09:10 · Score: 1

RSSOwl - http://rssowl.sourceforge.net/ is pretty good.

[sarcasm]That can't possibly be true - it's a client Java app![/sarcasm]

--
Galileo: "The Earth revolves around the Sun!"
Score: -1 100% Flamebait

RSS is a fileformat, not a protocol by BoxedFlame · 2004-07-20 09:11 · Score: 1

The real problem is that RSS is a fileformat, not a protocol. This means that if the client wants "all entries since X" it'll get "the last Y", which might not include everything it wants or might be way too much. RSS and ATOM need to be complemented by a protocol. This protocol could very well be based on RSS or ATOM but would include the most trivial feature of specifying what it wants.
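
The server side of such a "since X" request could be as small as this. It is purely hypothetical: no such parameter exists in RSS or ATOM, and the entry structure here is invented for the example.

```python
from datetime import datetime

def entries_since(entries, since):
    """Return only the entries published after `since`, oldest first."""
    newer = [e for e in entries if e["published"] > since]
    return sorted(newer, key=lambda e: e["published"])

ENTRIES = [
    {"title": "Story A", "published": datetime(2004, 7, 20, 7, 0)},
    {"title": "Story B", "published": datetime(2004, 7, 20, 8, 0)},
    {"title": "Story C", "published": datetime(2004, 7, 20, 9, 0)},
]

# A client that last checked at 07:30 gets B and C, nothing more.
fresh = entries_since(ENTRIES, datetime(2004, 7, 20, 7, 30))
# A client that is already up to date gets an empty reply.
nothing = entries_since(ENTRIES, datetime(2004, 7, 20, 9, 0))
```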

IMPORTANT LINK FOR MR. DICKERSON by Anonymous Coward · 2004-07-20 09:11 · Score: 0

www.akamai.com

Seriously. You've got a bunch of people trying to download the exact same file from all over the world. This problem has been solved.

A new daemon not the answer... by Lurch00 · 2004-07-20 09:11 · Score: 1

I think the suggestions for a whole new RSS specific protocol are somewhat off base. To me, the beauty of an RSS feed is that you don't really need anything special to supply one, just a web server, which you're probably already running, and a script to actually generate the file. I don't think there'd be a lot of success if RSS required special protocols etc. Maybe if you solve the general case of providing a push service and then distribute RSS data over that, but that's a lot of work. A previous poster mentioned using SMTP, and that could be an interesting solution.. Mail updates to an aggregation service and then read via IMAP? I don't make heavy use of RSS, so I don't know if that's a useful way to look at the data or not.

The problem with the server notifying the clients that something has changed doesn't really fix the problem (think thundering herd trying to acquire a mutex), unless it is smart enough to delay notification to clients to even out the load.

I think the real solution is to have the server dictate when the client should check back, and enforce that time delay. If the server just asks nicely, many clients will ignore that. My solution is to use one time passwords that are only valid in a certain window. With the RSS data, the client gets the next password and the window. The passwords would be randomly generated and the algorithm for choosing the window could be arbitrary. Regular HTTP AUTH would be sufficient here. A cron job could manage the list of valid passwords. You'd still have to provide an unprotected stream to get the first password or if the window has expired. To prevent abuse of this, you could limit accesses per IP per day, delay the first window, provide a smaller (empty?) stream, etc.. This could probably provide a backwards compatibility path too. Maybe make it a degraded stream to encourage users to upgrade their clients, or have the first article be "Update your client!" or something..
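
To make that concrete, here is a rough sketch of the password bookkeeping only. The class name and the 30-minute wait and 10-minute window are invented for the example; the HTTP AUTH plumbing, the cron cleanup of expired passwords, and the unprotected bootstrap stream are left out.

```python
import secrets

class PollWindowGate:
    """Hand out one-time passwords that only work inside a future window."""

    def __init__(self, min_wait=1800, window=600):
        self.min_wait = min_wait   # seconds before the password becomes valid
        self.window = window       # seconds it then stays valid
        self._valid = {}           # password -> (opens_at, closes_at)

    def issue(self, now):
        """Called while serving a feed; returns what to embed in the reply."""
        password = secrets.token_hex(8)
        opens_at = now + self.min_wait
        closes_at = opens_at + self.window
        self._valid[password] = (opens_at, closes_at)
        return password, opens_at, closes_at

    def redeem(self, password, now):
        """True if the password is known, unused, and inside its window."""
        window = self._valid.get(password)
        if window is None:
            return False
        opens_at, closes_at = window
        if not (opens_at <= now <= closes_at):
            return False
        del self._valid[password]   # one-time use
        return True

gate = PollWindowGate(min_wait=1800, window=600)
password, opens_at, closes_at = gate.issue(now=0)
too_early = gate.redeem(password, now=100)
on_time = gate.redeem(password, now=1900)
reused = gate.redeem(password, now=1950)
```

Choosing `opens_at` differently per client is where the server would spread the load.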

Just a few thoughts.. Anyone care to comment?

Problem is *scheduled* pulls... by Not_Wiggins · 2004-07-20 09:14 · Score: 1

I use an RSS reader; it is a heavily modified version of rnews, which I customized for my own needs.

What my RSS reader does is it limits how often I make the request. So, it won't make a request until X minutes after the last time I made a request.

To be a good netizen, I don't set that to anything less than 30 minutes.

But the real beauty is that if I don't bother looking at the news for, say, 3 hours, it won't bother retrieving it. It will retrieve it when I look (so what if I have to wait a few seconds for it to download the feed?) and resets the timer to not check (no matter how often I reload the page) until X minutes have passed.

If people adopt an "on-demand" policy instead of scheduled, it should help push out the lifespan of this tech in its current form.
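
The on-demand policy can be sketched in a few lines of Python. This is not the poster's actual reader; `OnDemandFeed`, `fetch` and the injectable `clock` are hypothetical names used for illustration.

```python
import time


class OnDemandFeed:
    """Fetch only when the user looks, and at most once per min_interval."""

    def __init__(self, fetch, min_interval=30 * 60, clock=time.monotonic):
        self.fetch = fetch                # callable that downloads the feed
        self.min_interval = min_interval  # seconds between real requests
        self.clock = clock
        self._last_fetch = None
        self._cached = None

    def read(self):
        """Called when the user opens the page; never called on a timer."""
        now = self.clock()
        if self._last_fetch is None or now - self._last_fetch >= self.min_interval:
            self._cached = self.fetch()
            self._last_fetch = now
        return self._cached
```

Nothing runs in the background, so a reader left idle for three hours makes no requests at all during that time.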

--
Diplomacy is the art of saying, "Nice doggie!" until you can find a rock.

Responsible clients by photon317 · 2004-07-20 09:19 · Score: 1

Authors of RSS feed-grabbing software should do the responsible thing: allow the user to set a desired refresh-rate down to as low as say 1 hour, but always use a random 30 minute window for the actual refresh time.

So, if the user specifies a two hour refresh time, and the application just got done pulling a feed, it should sleep for 105 minutes plus a random amount of time between 0 and 30 minutes, which means the feed is actually updated randomly in the window of 1:45-2:15 after the previous update.

At the very very least if they don't do the above, they should at least base their refreshes on time intervals since startup (or since the last pull), so that you don't see the global synchronizations on :00 as badly.

I use the "random 30 minute window" technique similarly at the office to distribute the load on an rsync server and it works wonderfully (all the machines in our whole environment wake up on a cronjob at a wee hour of the morning, each sleeps for a random 0-30 minutes time interval and then fires off its rsync request - the result is that rsync load is distributed evenly over a 30 minute period instead of the server getting pounded into the dust for 5 minutes straight.).
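
The random 30 minute window can be sketched as below. The function name and parameters are made up for illustration; only the arithmetic (105 minutes plus a random 0 to 30) comes from the post.

```python
import random


def next_refresh_delay(refresh_minutes, window_minutes=30, rng=random):
    """Minutes to sleep before the next pull.

    The nominal interval is shifted back by half the window, then a random
    amount between 0 and the full window is added, so a 120 minute setting
    yields a delay somewhere between 105 and 135 minutes.
    """
    half_window = window_minutes / 2
    return refresh_minutes - half_window + rng.uniform(0, window_minutes)
```

Because the delay is measured from the end of the previous pull rather than from the wall clock, clients also stop lining up on :00.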

--
11*43+456^2

Re:Responsible clients by mike3k · 2004-07-20 09:27 · Score: 1

NetNewsWire checks every hour (or whatever interval you select) from the time it started up or woke from sleep, so if you start it at 9:42 it will check at :42 after each hour.

rss + bit torrent by k2enemy · 2004-07-20 09:19 · Score: 1

this has already been the topic of much discussion

A website devoted to talking about Slashdot? by swb · 2004-07-20 09:19 · Score: 1

It'd be pretty amusing to read, and helpful to figure out if rendering or other problems (like the recent rash of 503s) have anything to do with me or if everyone gets them.

Told Ya So by cmacb · 2004-07-20 09:27 · Score: 3, Interesting

I think this was more or less the first thought I had about RSS when I first looked into it and found out that it was a "pull" technology rather than a "push" as the early descriptions of it implied.

Yes, it's "cool" that I can set up a page (or now use a browser plug-in) to automatically get a lot of content from hundreds of web pages at a time when I really opened up the browser to check my e-mail.

What would have REALLY been cool would be some sort of technology that would notify me when something CHANGED. No effort on my part, no *needless* effort on the server's part.

Oh wait... We HAD that didn't we, I think they were called Listservers, and they worked just fine. (Still do actually as I get a number of updates, including Slashdot, that way.) RSS advocates (and I won't mention any names) keep making pronouncements like "e-mail is dead!" simply because they have gotten themselves and their hosting companies on some black hole lists. Cry me a river now that your bandwidth costs are going through the roof and yet nobody is clicking through on your web page ads, because, guess what? Nobody is visiting your page. They have all they need to know about your updates via your RSS feeds.

It's simple - HTTP is ubiquitous by jamezilla · 2004-07-20 09:37 · Score: 1

Obviously there are better solutions to handle the syndication problem. There's no question about it.

The reason why RSS took off is because everyone has a friggin' web server. Most ISPs throw in some web space with your dial-up/DSL account. They don't throw in a multicast server.

RSS wasn't originally invented to handle InfoWorld's traffic. It was a blogging thang. Most blogs don't get that much traffic so it's a fine solution.

Exponential backoff!!!!!!!!!! by hta · 2004-07-20 09:38 · Score: 1

This has been known since the beginning of networking.
If you overload a resource, you back off - EXPONENTIALLY and RANDOMLY DISTRIBUTED. The more congested, the longer you back off. And you do NOT want lockstepping, so you add a random component.
Do RSS servers know how to tell the clients "I'm congested"?
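
A small Python sketch of randomised exponential backoff, under stated assumptions: the base delay, the cap, and the choice to draw from the upper half of the current ceiling are illustrative, not something any RSS client is known to implement.

```python
import random


def backoff_delay(failures, base=60.0, cap=6 * 3600.0, rng=random):
    """Seconds to wait after `failures` consecutive failed or refused polls.

    The ceiling doubles with every failure (exponential) up to `cap`, and
    the actual delay is drawn at random from the upper half of that range
    so that clients do not retry in lockstep.
    """
    ceiling = min(cap, base * (2 ** failures))
    return rng.uniform(ceiling / 2, ceiling)
```

A server could signal congestion with an HTTP 503 response (optionally carrying a Retry-After header); the client would then increment `failures` and reset it to zero after the next successful poll.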

Re:Exponential backoff!!!!!!!!!! by Derek+Mason · 2004-07-20 18:06 · Score: 1

Excellent point. I sometimes see serious load on our proxy from individual RSS readers who find that a feed doesn't exist, and then keep polling it as often as every minute, to see if it suddenly magically appeared. Exponential (or hell with it, even quadratic) backoff is the way to go.

Seems like RSS should be P2P by np_bernstein · 2004-07-20 09:39 · Score: 1

We already know the data is going to be small, with lots of clients wanting to get the information -- seems like something that logically should be p2p.

Something simple, like the first time you connect, you try the main server, and are given a list of partners available to get the feed from. The next client does the same thing, and now you're one of the list given to them. If all your partners are unavailable, or none of them have the data, you connect back to the server and start over.

--
RandomAndInteresting.com - defending the world from stupidity since 1979

Maybe they need to think this through by Anonymous Coward · 2004-07-20 09:44 · Score: 0

This isn't hard. RSS content in Infoworld's case isn't personalised (i.e. tailored to a given user) -- so why not just push out static RSS files to a web server file system when content which is included in the RSS feed changes?

In which case, if Infoworld's web server can't keep up with a bunch of GET and HEAD requests for static content then perhaps Infoworld should think about getting a new static content hosting provider.

Now, I'd be betting that Infoworld calculates the RSS request on every hit -- and there lies the problem.

(In fairness, RSS clients should support e-tags and Last-Modified headers and gzip compression -- any clients not doing this are brain dead -- but even then if you can't handle a few extra hits every hour from every reader of your website, you have a fundamental problem with your infrastructure -- RSS can be improved, sure, but don't blame RSS for a lack of infrastructure or serving it dynamically each time.)
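
For reference, this is roughly what honouring ETag and Last-Modified looks like on the serving side. It is a simplified Python sketch (the function name is hypothetical, and weak validators and other edge cases of the HTTP spec are ignored), not a description of InfoWorld's setup.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_response(request_headers, etag, last_modified, body):
    """Return (status, headers, body) for a GET on a static feed file.

    `last_modified` must be a timezone-aware UTC datetime with no
    sub-second part, because HTTP dates only carry whole seconds.
    """
    headers = {
        "ETag": etag,
        "Last-Modified": format_datetime(last_modified, usegmt=True),
    }
    if_none_match = request_headers.get("If-None-Match")
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_none_match is not None:
        # If-None-Match takes precedence over If-Modified-Since.
        unchanged = etag in [tag.strip() for tag in if_none_match.split(",")]
    elif if_modified_since is not None:
        try:
            unchanged = last_modified <= parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            unchanged = False  # unparseable date: send the full feed
    else:
        unchanged = False
    if unchanged:
        return 304, headers, b""  # Not Modified: headers only, no body
    return 200, headers, body
```

A client that stores the two headers from its last 200 response and sends them back as If-None-Match and If-Modified-Since receives an empty 304 whenever the feed has not changed.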

Try IPv6 multicasting instead. by dmeranda · 2004-07-20 09:45 · Score: 1

That's still way too bloated to scale properly to the multiple-thousand users you're talking about. Anything that requires a separate network connection for each user, and/or requires the server to keep track of all the "listeners" is not very practical.

The better technology for this is multicasting. And specifically the new and much improved multicasting technology built into IPv6.

Feed on Feeds (Web based) by Poulpy · 2004-07-20 09:53 · Score: 2, Informative

Neither Windows nor Unix, but I've set up Feed on Feeds on my webserver and I like it!

It's a "PHP/MySQL server side RSS/Atom aggregator", so you can read your feeds wherever you are, you only need a web browser on the client side.

Pros:
1) you don't need to synchronize the state between the multiple workstations you might use.
2) no platform/os problem on the client side.

Cons:
1) you need some web hosting with PHP and MySQL available (I pay €45 a year for my domain name + 30MB Webspace + 30MB FTP + 30MB MySQL base + 100*25MB pop/imap accounts + SSL everywhere).
2) no installer so you'll need some computing skills to set it up (not that hard).
3) no automated update, you have to click "Update" so you may miss some news when you're offline (i.e. away from any internet access) for a long period...

Changed my online life as I no longer have to install anything on the client side (useful when away from your home/office) or have to synchronize my feeds either with some removable storage (my USB key failed after 250+ daily syncs) or through the net (BottomFeeder, a Smalltalk implementation which works on every platform I ever came across, allows syncing with an FTP location).

Regards,
Poulpy.

Re:Feed on Feeds (Web based) by exhilaration · 2004-07-20 10:09 · Score: 1

Set up a cron job for automated updates. The instructions are in the README.
Re:Feed on Feeds (Web based) by Poulpy · 2004-07-20 18:26 · Score: 1

Well, I don't have access to the crontabs on this host (which is shared among several other customers of the web hosting company).
Re:Feed on Feeds (Web based) by exhilaration · 2004-07-21 04:01 · Score: 1

That's too bad. I'm paying $10/month and I have access to just about everything.
Re:Feed on Feeds (Web based) by Poulpy · 2004-07-21 04:15 · Score: 1

That's $10 * 12 months = $120 per year.

I pay €45 (about $55) per year which is half what you're paying, hence the difference in service.

Regards,
Poulpy.

PS: Just noticed the euro sign isn't slashdot friendly (must use "&euro;").

Done by Effugas · 2004-07-20 09:59 · Score: 1

Email me if you're an RSS developer.

Re:Still haven't tried these newfangled RSS reader by hackrobat · 2004-07-20 10:06 · Score: 1

I just downloaded and installed it. And I'm using it. It is pretty good. I was able to import my subscriptions from Bloglines into RSSOwl. It uses Mozilla 1.4 as a built-in browser on Linux, and IE 5+ on Windows. Neat.

And, BTW, client-side Java is pretty good. I'm happy to see an SWT-based GUI application other than the Eclipse IDE itself. It's a proof-of-concept (and you have the source). Now if you want to write a multi-OS GUI app in Java, you know what to refer to.

Distributed (ala bittorrent?) by Eric(b0mb)Dennis · 2004-07-20 10:06 · Score: 1

what about making a client that just uses bittorrent files to get the RSS data, use 1 random client to get the RSS feed, then he seeds it to all the others requesting using BT, would be pretty awesome and really fast

--
Excuse me, I don't mean to impose, but I am the ocean

Push by danila · 2004-07-20 10:27 · Score: 1

That's why we needed a push technology. Unfortunately, we were too stupid to realise it during the dot-com craze and now Netscape will probably refuse to reimplement it for us...

--
Future Wiki -- If you don't think about the future, you cannot have one.

Random != Distributed by grcumb · 2004-07-20 10:31 · Score: 2, Interesting

True story:

We ran a network operations center to provide support for several hundred servers spread over two continents. Each hour, every server would 'phone home' to see if it needed updates or configuration changes. This was a fairly data-heavy operation, requiring many database lookups. We knew that we didn't want every server calling at the same time, so we had each server derive its own random integer between 1 and 59, and to use that as the minute of the hour to contact the NOC.

Before long we found that the NOC was dragging itself into a death spiral of overwork. The problem? By chance, an unusually large number of servers chose a very small range of numbers. Worse, they just happened to choose numbers close to 05, which just happened to be when some very large cron tasks were running as well.

Try rolling a die 100 times. Even though the odds are the same every time before you roll, the actual frequency of occurrence of the individual numbers is not even. Leaving the choice of retrieval time to the client does not reliably reduce the chance of a server being overwhelmed. In fact, it more or less guarantees traffic spikes.

I'm not intimately familiar with RSS client or server implementations, but I suspect that it would be fairly easy to format a suggested refresh interval and refresh time on the server and send that to the client.
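
A quick way to see the difference is to compare client-chosen random minutes with server-assigned slots. The Python sketch below is illustrative only; the function names and the round-robin assignment are assumptions, not how the NOC in the story worked.

```python
import random
from collections import Counter


def random_minutes(n_servers, rng):
    """Each server independently picks a minute between 1 and 59."""
    return Counter(rng.randint(1, 59) for _ in range(n_servers))


def assigned_minutes(n_servers):
    """The central server hands out minutes round-robin instead."""
    return Counter(1 + (i % 59) for i in range(n_servers))


def busiest(counts):
    """Number of servers calling home during the most crowded minute."""
    return max(counts.values())
```

With 300 servers, `busiest(assigned_minutes(300))` is 6, the best that can be done with 59 slots. Running `busiest(random_minutes(300, random.Random()))` a few times shows how much taller the worst minute tends to get when every client rolls its own dice.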

--
Crumb's Corollary: Never bring a knife to a bun fight.

it's the RSS _client_ being stupid by cs · 2004-07-20 11:22 · Score: 2, Insightful

Clients polling on the hour? How stupid is that?

Even for a poll at hourly intervals this should get staggered across a given hour according to when the client starts. Also, a client should probably not be polling every 3600 seconds (or whatever interval) but polling with a 3600 second gap between end of one poll and start of the next. In this way a loaded server will smear the clients out simply by having slower response, and the load will even out on its own.
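
The difference between the two schedules is easy to see with a toy calculation. In this Python sketch (function name and inputs hypothetical), `durations` is how long each request took to complete, in seconds.

```python
def poll_starts(durations, interval=3600, gap_based=True):
    """Start times of successive polls, given how long each request took.

    gap_based=True  waits `interval` seconds after a poll finishes.
    gap_based=False fires every `interval` seconds regardless of load.
    """
    starts, t = [], 0
    for duration in durations:
        starts.append(t)
        t = t + duration + interval if gap_based else t + interval
    return starts
```

For example, `poll_starts([5, 120, 5])` gives `[0, 3605, 7325]`: the slow 120 second response pushes that client's next poll later, whereas the fixed schedule `poll_starts([5, 120, 5], gap_based=False)` stays at `[0, 3600, 7200]` and keeps every client in step.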

It's always bad to have lots of agents doing things in synchrony when that involves an outside resource. Contact the client authors, give them a clue, let the upgrades push the bugfix out.

Finally, isn't RSS done over HTTP anyway? So why aren't these clients going through their ISP's proxy and doing conditional GETs (If-Modified-Since)? The target server should see only a fraction of the spike even with bad clients. Unless they're very very bad...

None of these things is a direct flaw in RSS, just crap quality of implementation in RSS clients.

--
Cameron Simpson, DoD#743 cs@cskk.id.au http://www.cskk.ezoshosting.com/cs/

Regular slashdotting by krogoth · 2004-07-20 11:51 · Score: 1

So, it's kind of like an hourly slashdotting. Speaking of which....

--

They that quote Benjamin Franklin on liberty and safety deserve neither.

i have a better idea.... by cheekyboy · 2004-07-20 12:22 · Score: 1

Why not add to the content an extra file with details of when the data gets updated, or when the next update will happen, plus the md5 of the current content so the client can decide whether or not to download the new content?

So just one request for /get_nextupdatetime.xml
and one for /get_content_md5.xml to decide if to do the whole big download of /get_our_realcontent.xml

Oh and the server can randomize the result of get_nextupdatetime.xml by shifting the real time by up to 10 minutes so it's not always exact, and all clients will get slightly different values.
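
A rough Python sketch of both halves of that idea. The helper names are hypothetical, and shifting the advertised time in either direction is an assumption about what the post intends.

```python
import hashlib
import random


def feed_md5(content):
    """Hex digest the server would publish at /get_content_md5.xml."""
    return hashlib.md5(content).hexdigest()


def should_download(cached_content, published_md5):
    """Client side: fetch the big file only if our copy is missing or stale."""
    return cached_content is None or feed_md5(cached_content) != published_md5


def advertised_update_time(real_time, spread=600, rng=random):
    """Server side: shift the real next-update time (in seconds) by up to
    `spread` seconds either way, so clients get slightly different values."""
    return real_time + rng.uniform(-spread, spread)
```

Note that this costs two small requests per poll instead of one; a conditional GET on the feed itself achieves much the same saving in a single request.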

--
Liberty freedom are no1, not dicks in suits.

Re:i have a better idea.... by Russ+Nelson · 2004-07-20 15:34 · Score: 1

No. DNS TTLs.

Now do you understand why my suggestion is better? Or must I explain it in greater detail?
-russ

--
Don't piss off The Angry Economist

Firewalls are the issue by tepples · 2004-07-20 12:27 · Score: 1

These days it seems that everyone wants to use HTTP for everything and quite frankly it's not equipped to do that.

What else goes through common Internet firewalls as cleanly as HTTP? Many providers provide WWW access at a discounted rate compared to Internet access.

Re:Firewalls are the issue by hadaso · 2004-07-20 23:40 · Score: 1

> What else goes through common Internet firewalls as cleanly as HTTP?

EMAIL does!

Re:Still haven't tried these newfangled RSS reader by Anonymous Coward · 2004-07-20 12:30 · Score: 0

Opera has RSS built in.

Aggregate Feeds by Nurgled · 2004-07-20 12:35 · Score: 1

A possibility is to try to reinvent the NNTP model over HTTP. Have big "super-nodes" which poll the originating feeds and store the result in some big database, and then allow users to pull down one big feed containing stuff from all of the sources they are interested in.

Of course, there'd have to be something in it for the super-nodes, and I suspect what would happen is that they would charge a nominal fee, or perhaps bundle it alongside some other, similar service. One example of this is LiveJournal, which currently distributes RSS and Atom feed content to any interested LiveJournal user via the "friends page" mechanism from a single database, so there's only one poll per hour (or so). All they need now is some kind of feed of the "friends view" and you have a special version of a feed distributor with some value-add: you get all of your LiveJournal friends' content in there too.

Bare RSS isn't set up for this since it can't support per-item source information, but Atom can do it and RSS can be extended through namespaces to contain the relevant info as long as it becomes popular enough that clients support it.

In fact, if this were to be done, it would also be useful to have an optional "intelligent poll" mode where the client tells the server a magic token it got on its last poll which the server can then use to give a delta feed rather than a fixed feed. This would have to be optional, since the CPU burn of it vs. just copying a static file to a socket would probably only be a win on big sites like Slashdot and BBC News Online.
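
One way the "magic token" poll could work, sketched in Python. Everything here is hypothetical: the token is simply the sequence number of the newest item the client has seen, which is only one of many possible encodings.

```python
def delta_feed(items, token=None):
    """Return (entries, new_token) for one poll.

    items: list of (sequence_number, entry) pairs, oldest first.
    token: opaque string from the client's previous poll, or None.
    """
    try:
        last_seen = int(token) if token is not None else -1
    except ValueError:
        last_seen = -1  # unknown token: fall back to the full feed
    fresh = [entry for seq, entry in items if seq > last_seen]
    new_token = str(items[-1][0]) if items else token
    return fresh, new_token
```

A client without a token (or with one the server does not recognise) simply gets the full feed, which keeps the mode optional as suggested above.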

In fact, it looks like the Atom guys already thought of all this. Check out AggregateFeeds, SuperAggregator and the overly-long PossibleHTTPExtensionForEfficientFeedTransfer entries in the Atom Wiki. I didn't read it all through in depth, but it looks like they're talking about the same thing I'm talking about.

Not so good for news by Nurgled · 2004-07-20 12:41 · Score: 1

I don't use RSS (and Atom) for reading the news, I use it to monitor posts in people's weblogs and sometimes even the comments to them if the content producer has been nice enough to make those available.

I agree that using RSS to read "real news" is like trying to read someone else's newspaper from the other end of a train car.

Re:Not so good for news by Eythian · 2004-07-20 17:47 · Score: 1

I use knewsticker for news. It just has a scrolling thing at the bottom, where a word may catch my eye. However, usually I right-click it to bring up a list of all the articles, if I want to see if anything new is on slashdot/the register/wired/whatever. I find it a lot nicer than trolling through browser bookmarks, however I'm effectively treating it like a set of bookmarks that update themselves.

Re:Still haven't tried these newfangled RSS reader by Anonymous Coward · 2004-07-20 12:53 · Score: 0

I'm happy to see an SWT-based GUI application other than the Eclipse IDE itself.

Maybe you should try Azureus, for your BitTorrent needs (I know, many are happy with btdownloadcurses :P this is just an example of another SWT-based app)

Bittorrent+RSS by jonasmit · 2004-07-20 13:11 · Score: 1

http://blogs.law.harvard.edu/tech/bitTorrent

Great but stupid technology by X-Nc · 2004-07-20 13:19 · Score: 1

> [RSS] is such a useful thing, it's unfortunate that it's ultimately just very stupid.

Heh, just like the car alarm. A great idea that never should have been invented. One of those, "What were they thinking?" kinda things.

--
-- If I actually could spell I'd have spelled it right in the first place.

RSS+BitTorrent by jonasmit · 2004-07-20 13:19 · Score: 1

http://www.wired.com/news/infostructure/0,1377,62651,00.html http://blogs.law.harvard.edu/tech/bitTorrent http://slashdot.org/article.pl?sid=04/06/21/0150243 http://www.digitalbloc.com/200403/rss-bittorrent-non-stop-downloading.shtml

Ever heard of exponential back-off? by Anonymous Coward · 2004-07-20 13:19 · Score: 0

The site serving RSS could always report Status: site busy try later, and RSS readers could come back and have another go later.

After all, it's not like a user is actually reading his RSS feeds on the hour. A manual refresh will reset the back-off and try again.

Re:Still haven't tried these newfangled RSS reader by Anonymous Coward · 2004-07-20 13:29 · Score: 0

I can't recommend anything for Windows, but centericq has support for RSS feeds (and a whack of LiveJournal support, not to mention irc/ICQ/ypager/MSN plus some other protocols I don't use, and it's text-based).

Unfortunately, I haven't been successful in getting it to send newsitems via e-mail (although I did succeed with all the chat protocols), so I have a crontab running rss2email. (the reason I like sending all this via e-mail is because I have a Motorola T900)

Re:Still haven't tried these newfangled RSS reader by Pento · 2004-07-20 14:06 · Score: 1

Dr. Sp0ng wrote:
Haven't found a non-sucky one for *nix, although I haven't looked all that hard.

I've been using aKregator for some time now. It is progressively sucking less and less. It's almost at a point where it's good.

XML? by NoelWeb · 2004-07-20 14:24 · Score: 1

Gee... it couldn't be the size of the feed itself, could it? Hell, I'll just throw a thousand records into an xml structure. RSS, while convenient, is inefficient in my book.

Reinventing the staggered pseudorandom request by blair1q · 2004-07-20 15:29 · Score: 1

schedule.

If I have to explain that...

Reduce Bandwidth by EvilTofu · 2004-07-20 15:50 · Score: 1

Reduce Bandwidth by Combining RSS with BitTorrent.

A possible solution by Guspaz · 2004-07-20 16:27 · Score: 1

RSS is very easily cacheable. That is to say, it can be treated as static content that is updated every so often.

A simple solution would be to update a static file every 5 minutes or so. Once that's done, you can write a very small C program that listens on a port, and as soon as it receives a packet, returns a static HTTP response with the RSS file, that is already cached in memory. This small C program would update the cached copy it had in memory every 5 minutes from the updated version. During the update, the file could be gzipped to reduce bandwidth consumption; since the file is PRE-gzipped, there is no increase in CPU usage to serve the gzipped file.
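
The core of that idea, sketched in Python rather than C to keep it short. `FeedCache` and its methods are hypothetical names, and the socket loop that would call `response()` for each connection is left out.

```python
import gzip


class FeedCache:
    """Hold the feed and a pre-gzipped copy in memory."""

    def __init__(self):
        self.raw = b""
        self.gzipped = b""

    def refresh(self, xml_bytes):
        # Run every few minutes; compression happens once per refresh,
        # not once per request.
        self.raw = xml_bytes
        self.gzipped = gzip.compress(xml_bytes)

    def response(self, accept_encoding=""):
        """Build the complete HTTP response bytes for one request."""
        use_gzip = "gzip" in accept_encoding.lower()
        body = self.gzipped if use_gzip else self.raw
        head = [
            "HTTP/1.0 200 OK",
            "Content-Type: text/xml",
            f"Content-Length: {len(body)}",
        ]
        if use_gzip:
            head.append("Content-Encoding: gzip")
        return ("\r\n".join(head) + "\r\n\r\n").encode("ascii") + body
```

Serving a request is then just a memory copy to the socket, with the gzipped copy only sent to clients whose Accept-Encoding header asks for it.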

That done, the CPU and memory requirements should be minimal. There would remain several possible bottlenecks, which might include bandwidth, and system-wide TCP/IP issues.

The bandwidth issue can be solved for a reasonable price. If the site involved does not have sufficient bandwidth to handle the load, a very cheap solution is to rent a dedicated server at one of a variety of providers. Generally, looking at the most popular providers, you will pay under $100 for a server with over 1000GB/mth of transfer on a 100mbit connection.

Assuming that every hour, one hundred thousand users attempt to get an updated RSS feed, and that RSS feed is 10KB gzipped to 5KB, a dedicated server with about 350GB/mth would suffice. Assuming the server had a 100mbit connection, it would take about 40 seconds to serve all those clients. Considering that the clients will not all request the file at exactly the same time, but due to slightly different system clocks, at varying times, this should be sufficient.
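
Those figures can be checked with a few lines of Python. The calculation assumes decimal units, a 30 day month, and ignores HTTP and TCP overhead.

```python
users_per_hour = 100_000
feed_bytes = 5 * 1000           # 5 KB gzipped feed
link_bits_per_s = 100_000_000   # 100 Mbit/s connection

# Data sent in one hourly round of polls: 500 MB.
bytes_per_hour = users_per_hour * feed_bytes

# Monthly transfer: 500 MB * 24 hours * 30 days = 360 GB.
gb_per_month = bytes_per_hour * 24 * 30 / 1e9

# Time to push one round through a saturated link: 40 seconds.
seconds_to_serve = bytes_per_hour * 8 / link_bits_per_s
```

So the estimate of roughly 350GB/mth and about 40 seconds per hourly burst holds up, provided the link really can be saturated for that long.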

If the load is too great for a 100mbit connection to handle, it may be possible to get a gigabit connection from a more professional provider for a reasonable cost, on the assurance that it is only for very short bursts. The cost would be significantly greater, but generally still reasonable for a site large enough to have a hundred thousand RSS requests per hour.

Or, of course, if all the above is too complicated, you could just rent a big-assed server from a dedicated server provider and turn on Apache's server-side caching.

I think webbased is the way to go by Fo0eY · 2004-07-20 17:01 · Score: 1

I've been working on my own site completely based on consolidating all the newsfeeds popping up on the web and making it all so much more usable

http://fooey.net/NewsArchives

It makes it easier to stay up to date on all your favorite sites, and includes searching and page caching (take that slashdot effect!)

another big plus is many many users can all use the same RSS feeds, yet only one request per hour has to be made

I'm planning on making it so you can create an account and pick the feeds you want, and after that greatly increase the number of feeds provided but real life and real work keep interrupting

Anyways, I built it for myself originally, but it didn't take long before everyone at work had it bookmarked. So I figure if other people find it useful, I'd be glad to share! =)

Conditional GET only works... by Watts+Martin · 2004-07-20 18:47 · Score: 1

...if the server is smart enough to support it, too. A lot of "big" content management systems -- possibly like the one Infoworld uses -- don't; they generate their RSS feeds on the fly for every request.

I'm not sure whether Slashdot supports the conditional get, but from a cursory examination of NetNewsWire's bandwidth report, the answer is no. Part of the problem Commander Taco is complaining about in his commentary on this could be addressed by making Slashcode smarter.

IMAP by mr100percent · 2004-07-20 19:01 · Score: 1

What about my Mail.app checking my IMAP and POP accounts every minute? If a heavy traffic server like .Mac's can handle the load balancing of thousands of users simultaneously, can't they do the same for RSS? Load balancing isn't a Web-serving specific solution either.

Re:IMAP by hadaso · 2004-07-20 20:38 · Score: 1

IMAP has real push technology. The IMAP IDLE command lets the server notify the client about a new message in a mailbox. No need for periodic mail checking. A major difference between a POP/IMAP server and a web server is that an email server has to deal only with a limited number of users, the ones that have their mailboxes on that server. A web server might have to serve many millions of users.

I never used RSS, and it seems like a wrong model: instead of information really being pushed like in sending it by email, everyone is checking every website every hour for updates. I understand that people are afraid of getting spam if they subscribe to many services. But this problem is not really inherent to the use of email: it's just a problem of people not understanding how to use email. Every subscription should be made with a different email address, and filtering can prevent any info but the requested info from being received at a particular address.

An application with a user interface looking exactly the same as an RSS reader can handle these kinds of things for the user, from whose point of view it would look exactly the same as RSS. But from the point of view of the information supplier it would be a totally different story: instead of having to deal with unexpected loads, the provider has total load control. Another advantage of email, from the user's point of view, is that information is updated not once every hour, but as soon as the information provider is ready to send it.

I think that the way this should really be handled is with email clients/servers transparently handling multiple email addresses and filtering for the email user. One would perhaps need a protocol for clients and servers to communicate relevant information about addresses and filters, and perhaps a protocol for communicating changes in subscriptions with information providers (so an email client can securely communicate an address change, or perhaps more information, when an address needs to change, say when a user clicks a "this is spam" button). The first spam message received would be enough to cancel the address. The user wouldn't even have to be aware of the address change. The spammers would have perhaps billions of unusable addresses and the trading of mailing lists would be impossible.

Re:Still haven't tried these newfangled RSS reader by The+Grassy+Knoll · 2004-07-20 20:56 · Score: 2, Funny

>On Windows I use RSS Bandit

Pronounced "ArseBandit"?
That's priceless, to a Brit at least.

.

--
They will never know the simple pleasure of a monkey knife fight

Re:Still haven't tried these newfangled RSS reader by 216pi · 2004-07-20 21:36 · Score: 1

Opera's mail client M2 has an integrated RSS feedreader. New articles appear as a new mail.

Scheduling by takkaria · 2004-07-20 23:02 · Score: 1

I believe that there is a module for RSS that allows you to add scheduling info. But it doesn't really help, because it just shifts when the fetch occurs from on-the-hour to on-the-day or on-the-month. Just as big a peak, just at a different time.

That's a VERY important observation! by hadaso · 2004-07-20 23:50 · Score: 1

It means that actually RSS software fails to do its role, of collecting all updates from a source. Plain old mailing lists don't fail here where RSS miserably fails. If your old washing machine dies and cuts your electricity with it just when you go out for the weekend, then when you come back you have lost all the weekend changes in your RSS feeds, but your mailing lists updates safely wait for you on your mail server!

It can be throttled, but RSS could be improved by SgtChaireBourne · 2004-07-21 03:10 · Score: 1

RSS can be throttled either by the server or by the firewall. It is just HTTP traffic. But RSS still transmits redundant information, especially if the server is polled often.

Still sticking with just HTTP and RSS as it is now, some kind of if-modified-since HTTP request would greatly reduce the load. That or a checksum. Or a date-time stamp.

It would also be possible and more complex to make a TCP or UDP based RSS designed to be robust and minimize effects of heavy use. A lot of information can be crammed into a single UDP packet, or it could just be a checksum or even just a date-time stamp.

--
Beta is broken and the link to classic doesn't work. Stop wasting our time or there won't be anybody left here.

Re:Distributed (ala bittorrent?) by Thundersnatch · 2004-07-21 03:51 · Score: 1

BitTorrent has so much setup overhead that it's silly to use it for small files like RSS feeds. You have to connect to a tracker, get a list of peers, and wait for a peer to optimistically unchoke you. Just the "connect to tracker" part of the BitTorrent handshake probably requires as much work for a server as just sending out the RSS over HTTP. So you would be trading a slashdotting of your web server for a slashdotting of your BitTorrent tracker.

Also, using BitTorrent for RSS doesn't solve the firewall problem, which is why other "push" approaches to RSS distribution won't work. Most enterprises are not going to allow any type of push protocol into their networks, and 95% of home users won't be able to figure out how to do all the firewall shenanigans necessary to make BitTorrent work.

It seems to me that everybody on Slashdot wants to use BitTorrent for everything these days, even though BitTorrent is only good at one thing: decreasing the bandwidth required for distributing large files (not small ones).

E-mail has its own issues by tepples · 2004-07-21 05:26 · Score: 1

Using e-mail to push an RSS feed has its own downsides:

IMAP/POP3 is a polling based technology, no better than HTTP. Many ISPs will limit how often a particular customer can poll the e-mail server in order to get away with running less expensive servers that process fewer mail transactions per second (on broadband) or in order to more accurately detect inactive users (on dial-up).
SMTP in destination mode requires the ability to accept incoming connections as a server, which many monopoly or duopoly broadband ISPs expressly prohibit in the acceptable use policy, which is non-negotiable for residential service.
Many ISPs allow outgoing connections on the registered SMTP ports (25/tcp and 587/tcp) only to the ISP's own mail server, which will often 1. change the From: address to an ISP-issued address that may look unprofessional to the owner of a domain and 2. limit to how many addresses a residential user may send a message, in order to reduce the use of bandwidth by unsolicited bulk e-mail.

Re:IMAP murdering Islamic killer by Anonymous Coward · 2004-07-21 18:31 · Score: 0

baby raping murdering islamic killer. you are a pedophile, and when you cant get boy ass you fuck men, and your father, fucker.

Yes, they are clueless by xixax · 2004-07-22 15:55 · Score: 1

My guess is that InfoWorld is dynamically generating the RSS for each request. A simple host-side cache of the generated XML, so hits just talk to the HTTP server and not the app server, would probably make this a non-issue.

This seems to be the case. I went to a talk by James Robertson (aka Bottomfeeder RSS client) last night and his opinion was that this was their problem, and they were not understanding the issue. Cache it as static content, use mod_gzip and let Apache handle it.

Xix.

--
"Everything is adjustable, provided you have the right tools"

Re:Yes, they are clueless by krails · 2004-07-22 19:02 · Score: 1

Actually it is all static XML that we generate and serve up with Apache. We just have tons of RSS subscribers, and most RSS clients do their checks at xx:00 instead of spreading the checks out. All it would take is for it to do a full refresh on startup, and then once an hour (or whatever your preference is) after that.

I fixed two minor issues today... I removed .xml from being processed for ServerSideIncludes which allowed Apache to send ETags again, and I also added Expires info as well. Those will help with caching and some of the RSS aggregators, but won't fix the dumb client designs. =)

As Chad explained in his column and again in his blog, this is not a bandwidth problem. mod_gzip won't make a difference.

http://kevin.railsback.com/blog
Re:Yes, they are clueless by smitty45 · 2004-07-23 12:30 · Score: 1

it's not InfoWorld's problem. It's not a bandwidth issue. It's the timing, and the format.

443 comments