How Much Bandwidth is Required to Aggregate Blogs?

All at once by someonewhois · 2005-08-14 10:30 · Score: 5, Interesting

It would make a lot more sense to have a protocol where you check one file that has a list of links to another XML file, and then the aggregator figures out which of those URLs has NOT been aggregated, then it downloads the other XML file which has the post-specific info, which it proceeds to display. That would save a lot of bandwidth, I'm sure.

Re:All at once by ranson · 2005-08-14 10:42 · Score: 4, Insightful

I'm trying to understand how this would help, because if everyone followed generally accepted HTTP practices in their XML generation scripts (e.g., including Last-Modified and/or Expires headers, providing an ETag, etc.), the aggregators could use conditional GET requests with If-Modified-Since to save an unthinkable amount of bandwidth. As it is right now, since most RSS feeds are generated on the fly from some database, that doesn't happen, and the aggregators just have to pull the entire XML at regular intervals to ensure nothing was missed. I find it silly that basic functionality of the WWW like smart caching rules started being ignored when RSS came along.
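For anyone wondering what the server side of this looks like, here is a minimal sketch, assuming the feed script can cheaply work out an ETag and a last-modified time for the current feed. The function name `conditional_response` and the sample values are made up for illustration; only the header names come from HTTP. It compares ETags by simple string equality, which is a simplification.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_response(request_headers, etag, last_modified):
    """Return (status, headers) for a feed request.

    request_headers: dict of header name -> value sent by the aggregator.
    etag: validator string for the current feed, e.g. '"v1"'.
    last_modified: UTC-aware datetime of the newest post.
    """
    response_headers = {
        "ETag": etag,
        "Last-Modified": format_datetime(last_modified, usegmt=True),
    }

    # If-None-Match takes precedence over If-Modified-Since when both are sent.
    client_etag = request_headers.get("If-None-Match")
    if client_etag is not None:
        return (304 if client_etag == etag else 200), response_headers

    since = request_headers.get("If-Modified-Since")
    if since is not None:
        try:
            since_dt = parsedate_to_datetime(since)
        except (TypeError, ValueError):
            since_dt = None  # unparseable date: fall through to a full response
        if since_dt is not None:
            if since_dt.tzinfo is None:
                since_dt = since_dt.replace(tzinfo=timezone.utc)
            # HTTP dates have one-second resolution, so drop microseconds.
            if last_modified.replace(microsecond=0) <= since_dt:
                return 304, response_headers

    return 200, response_headers
```

A 304 carries no body, so the feed only has to be generated and sent when the status comes back as 200.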
Re:All at once by G-Licious! · 2005-08-14 11:01 · Score: 2, Interesting

I don't think you need a list of links or even a separate file. An easier solution might be to just pass a format string in a separate link tag on the HTML page announcing the feed. For example, right now we have (taken straight from the linked article):

<link rel="alternate" type="application/atom+xml" title="Atom" href="http://www.feedblog.org/atom.xml" />
<link rel="alternate" type="application/rss+xml" title="RSS" href="http://www.feedblog.org/index.rdf" />

And we could introduce a new relationship type, say "recent-feed", with a strftime-like format string:

<link rel="recent-feed" type="application/atom+xml" title="Atom" href="http://www.feedblog.org/atom.xml?date=%Y-%m-%d&time=%H:%M:%S" />
<link rel="recent-feed" type="application/rss+xml" title="RSS" href="http://www.feedblog.org/index.rdf?date=%Y-%m-%d&time=%H:%M:%S" />

Of course, that'd require the blog feed to be a dynamic page of some sort (PHP, Python, Ruby, Perl, whatever...), but that shouldn't be a problem; I can't think of a single blog with a bandwidth problem that is using static pages.
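A rough sketch of what an aggregator would do with such a format string, assuming it remembers when it last fetched the feed. The "recent-feed" relationship is only the proposal above, not an existing standard, and `expand_feed_url` is a hypothetical helper name.

```python
from datetime import datetime


def expand_feed_url(template, last_seen):
    """Fill a strftime-style URL template with the time of the last fetch."""
    # Caveat: any literal percent-encoding in the URL would have to be
    # written as %% in the template, since strftime interprets % directives.
    return last_seen.strftime(template)


template = "http://www.feedblog.org/atom.xml?date=%Y-%m-%d&time=%H:%M:%S"
url = expand_feed_url(template, datetime(2005, 8, 14, 11, 1, 0))
print(url)  # http://www.feedblog.org/atom.xml?date=2005-08-14&time=11:01:00
```

The server would then only return entries newer than that date and time.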
Re:All at once by broward · 2005-08-14 11:21 · Score: 4, Interesting

The bandwidth isn't going to matter much.

The blog wave is close to an inflection point,
probably within six to twelve months...
which means that total bandwidth will probably
top out at about TWICE the current rate.

http://www.realmeme.com/Main/miner/preinflection/blogDejanews.png

I suspect that even now, many blogs are
starved for readership as new blogs come online
and steal mental bandwidth.
Re:All at once by jrockway · 2005-08-14 19:45 · Score: 2, Informative

Most sane webservers GZIP the content. XML compresses extremely well. (In other words, gzipped XML is just as efficient space-wise as a binary memory dump. And much easier for mere people to understand.)

--
My other car is first.

How much? If everyone GZipped, a lot less! by ranson · 2005-08-14 10:30 · Score: 4, Insightful

How much bandwidth is required? A lot less if everyone would take the 5 minutes required to implement GZip compression on their Apache servers. It saves you bandwidth, it speeds up your site for users (especially those on dialup), and it saves the bandwidth of aggregators (assuming they send an Accept-Encoding header listing gzip and deflate).

So my plea to the internet community today: make sure your web server is configured to send gzipped content. TFA says he doesn't know how many RSS feeds can support gzip. The answer is easy, really: any feed being served by Apache (plus a LOT of other webservers; AOLserver even added gzip support recently). Here's how to set up Apache and here's where to check if your site is using GZip and get an idea of the bandwidth savings you should see. If your site isn't gzipping, show your admin (if it's someone else) the 'how-to' above and ask them to implement it -- it's an absolute no-brainer win-win for everyone that takes no time at all to set up. It's really absurd IMO that it's not enabled in Apache by default.
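To get a feel for the kind of saving being claimed, here is a small sketch that gzips a synthetic 15-item feed and reports the fraction of bytes saved. The feed content and the helper name `gzip_savings` are invented for illustration, and a synthetic feed is more repetitive than a real one, so treat the result as optimistic.

```python
import gzip


def gzip_savings(payload: bytes) -> float:
    """Fraction of bytes saved by gzip-compressing the payload."""
    compressed = gzip.compress(payload)
    return 1 - len(compressed) / len(payload)


# Made-up feed: 15 items with similar markup, as a stand-in for a real RSS file.
item = (
    "<item><title>Post {n}</title>"
    "<link>http://example.com/{n}</link>"
    "<description>Some text for post {n}</description></item>"
)
feed = (
    "<rss><channel>"
    + "".join(item.format(n=n) for n in range(15))
    + "</channel></rss>"
).encode("utf-8")

print(f"{len(feed)} bytes raw, saving {gzip_savings(feed):.0%}")
```

Running the same function over a real feed file is the quickest way to see what your own site would save.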

Re:How much? If everyone GZipped, a lot less! by TCM · 2005-08-14 10:38 · Score: 3, Insightful

Of course every server is powerful enough that CPU time can't possibly become an issue, right?

--
Of course it runs NetBSD. BTC: 1NT7QvbetmANwaMzhpVL6
Re:How much? If everyone GZipped, a lot less! by Madd+Scientist · 2005-08-14 10:40 · Score: 5, Informative

i used gzip with apache at an old job and we ran into a problem with it... some obscure header problem in conjunction with mod_rewrite.
so i wouldn't say ANY site using apache... but probably most. the real problem there is with compression load on the servers... gzip compression doesn't just happen you know, it takes CPU cycles that could be being used to just push data rather than encode it.
Re:How much? If everyone GZipped, a lot less! by ranson · 2005-08-14 10:54 · Score: 2, Insightful

>Of course every server is powerful enough that CPU time can't possibly become an issue, right?

On moderately busy servers, most have found that mod_gzip helps with both CPU and RAM, since users stay connected to your server for shorter durations, resulting in overall fewer concurrent connections.
Re:How much? If everyone GZipped, a lot less! by ZorbaTHut · 2005-08-14 11:06 · Score: 4, Interesting

As I remember, www.livejournal.com has experimented with gzip compression several times. They've discovered that the price of the CPU far exceeds the price of the bandwidth.

Bandwidth is cheap. Computers, not so much.

--
Breaking Into the Industry - A development log about starting a game studio.
Re:How much? If everyone GZipped, a lot less! by jandrese · 2005-08-14 11:45 · Score: 2, Insightful

That depends a lot on what you're hosting your servers on. CPU time is expensive on Tandems and to a lesser extent Suns. On PCs the CPU is cheap, especially since most PC installations are clusters and even 1U boxes tend to come with overpowered processors.

One thing is for certain though, for many users bandwidth is NOT cheap.

--

I read the internet for the articles.
Re:How much? If everyone GZipped, a lot less! by magefile · 2005-08-14 12:00 · Score: 2, Insightful

Erm ... if it's static, just store 2 copies and route accordingly. You're not serving gzipped stuff to save space, you're serving it to save bandwidth.
Re:How much? If everyone GZipped, a lot less! by grcumb · 2005-08-14 12:13 · Score: 2, Interesting

"Compared to keeping a connection state, gzipping is _way_ more expensive. I find it very hard to believe that there is a case where keeping the connection longer was more expensive than gzipping the content."

I'm prone to agree. But I also suspect that my CTO is going to agree that it's cheaper to pay once for more processing power than it is to pay every day for higher bandwidth use. YMMV, of course. Bandwidth is relatively cheap in some parts of the US, but in other parts of the world it's hideously expensive.

In short, I agree with your conclusion, but I think that the GP is right, if not for the reasons he provided. In some cases it actually does make sense to cope with a little less efficiency in one part of the system than it is to cope with constantly higher costs in another.

--
Crumb's Corollary: Never bring a knife to a bun fight.
Re:How much? If everyone GZipped, a lot less! by womby · 2005-08-14 12:22 · Score: 3, Insightful

With the least intensive compression algorithms html can end up almost 10 times smaller
That results in a 10 times shorter transfer time,
Which results in 10 times fewer simultaneous connections,
Which results in 10 times fewer apache processes,
Which results in massively reduced memory and processor requirements.

That unused processor and memory is what would be used to perform the gzip operations. Let's say, for argument's sake, that compressing the output doubles the processor usage (a ridiculously high number). Cutting the number of apache processes by an order of magnitude then only has to reduce CPU requirements by 50% to come out on top.

If the gzip operation only inflicts a 10% overhead cutting the apache processes by ten only needs to free more than 9% to come out on top.

Look at your server, would cutting the number of apache processes from 400 to 40 save more than 10% of the CPU usage, would it save more than 50%?

[All numbers in this post were selected for ease of calculation, not for their real-world precision.]
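The break-even arithmetic above can be written out as a one-liner. This is only a sketch of the post's own reasoning: if gzip multiplies CPU use by (1 + overhead), the fraction of that new total that has to be freed elsewhere to get back to the original load is overhead / (1 + overhead). The function name is hypothetical.

```python
def break_even_saving(gzip_overhead: float) -> float:
    """Fraction of the post-gzip CPU load that must be freed to break even.

    gzip_overhead: extra CPU as a fraction of the original load
                   (1.0 means gzip doubles CPU use, 0.10 means +10%).
    """
    return gzip_overhead / (1.0 + gzip_overhead)


for overhead in (1.0, 0.10):
    print(f"gzip overhead {overhead:.0%}: need to free {break_even_saving(overhead):.1%}")
# gzip overhead 100%: need to free 50.0%
# gzip overhead 10%: need to free 9.1%
```

That matches the 50% and "more than 9%" figures used in the post.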

--
**** lying is wrong even for sleeping dogs
Re:How much? If everyone GZipped, a lot less! by ZorbaTHut · 2005-08-14 12:57 · Score: 4, Insightful

That's true. LJ is a very CPU-heavy site (surprisingly), and therefore anything that can spare CPU is welcomed. A site that mostly transmitted static pages would probably find gzipping to be an obvious win.

--
Breaking Into the Industry - A development log about starting a game studio.
Re:How much? If everyone GZipped, a lot less! by jp10558 · 2005-08-14 13:05 · Score: 3, Insightful

Couldn't you GZIP each page once per change? That's obviously no good for dynamic pages, but for blogs each post would only need to be done once. Unless you get comments like on Slashdot, it's unlikely you'd have to gzip more than once every few minutes or so. Then you serve that file like you would any other file.
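A minimal sketch of that idea, assuming the pages live as plain files on disk: keep a `.gz` copy next to each file, rebuild it only when the source is newer, and pick a variant based on the request's Accept-Encoding header. Both function names are hypothetical, and the header parsing is simplified (it ignores q-values such as `gzip;q=0`).

```python
import gzip
import os
import shutil


def ensure_precompressed(path):
    """Create or refresh path + '.gz' only when the source file is newer.

    Returns (gz_path, rebuilt) where rebuilt says whether compression ran.
    """
    gz_path = path + ".gz"
    stale = (
        not os.path.exists(gz_path)
        or os.path.getmtime(gz_path) < os.path.getmtime(path)
    )
    if stale:
        with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return gz_path, stale


def pick_variant(path, accept_encoding):
    """Choose which file to serve for a given Accept-Encoding header value."""
    encodings = [
        token.split(";")[0].strip().lower()
        for token in accept_encoding.split(",")
    ]
    if "gzip" in encodings:
        gz_path, _ = ensure_precompressed(path)
        return gz_path
    return path
```

The CPU cost of compression is then paid once per change instead of once per request, which is the point being made here and in the "store 2 copies" reply above.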

--
Opera, Proxomitron-Grypen,GPG 0x0A1C6EE3
Re:How much? If everyone GZipped, a lot less! by TooncesTheCat · 2005-08-14 18:48 · Score: 2, Funny

God you really are trying to argue semantics on a fucking moot point.

I'm too tired to explain to you how retarded that comment is in context to a multi-million dollar business like a datacenter. You think that they care if you are using 30 more Watts of electricity which doesnt equate to them having an extra 100 dollars on their power bill. They dont care / would never raise rates because of their power bill....They only raise rates when bandwidth availability / rackspace becomes a premium or their demand goes up. Not just because of something as trivial as your 30 extra Watts of power being used because your using Gzip.

And you're acting like Gzip would be maxing your CPU out 99% of the time.

People that argue semantics piss me off.
Re:How much? If everyone GZipped, a lot less! by JPDeckers · 2005-08-14 20:50 · Score: 2, Interesting

Another nice and strange problem is that IE totally ignores ETag headers on gzipped pages (it does not send an If-None-Match header back).
So effectively IE requests each and every page again if it's gzipped.
Nice to know that this bandwidth-reduction solution has the opposite effect...
See my blog for more info.
Re:How much? If everyone GZipped, a lot less! by tsm_sf · 2005-08-14 21:35 · Score: 2, Funny

yah, I see that:

1) you're trying to have a conversation about two separate topics w/ 2 separate people
2) you've mixed up both the topics and the people already
3) you've replied to your OWN posts when you meant to reply to someone else's
4) you really like the word 'semantics'

Have to say that I'm really enjoying the fact that you work in IT but get pissed off by ppl arguing over linguistics. The irony is maxing my CPU out.

--
Literalism isn't a form of humor, it's you being irritating.

Bandwidth wasted for non-xhtml pages? by bdigit · 2005-08-14 10:31 · Score: 5, Interesting

How much bandwidth is /. wasting every month by not creating a standard XHTML page, even though someone already created one for them?

Re:Bandwidth wasted for non-xhtml pages? by llZENll · 2005-08-14 10:42 · Score: 2, Informative

Answer: Not enough to justify the cost of doing it. Which goes to show that if a site as popular as Slashdot can't save money doing this, no other site on the net has any business converting to XHTML, economically speaking of course.

"Though a few KB doesn't sound like a lot of bandwidth, let's add it up. Slashdot's FAQ, last updated 13 June 2000, states that they serve 50 million pages in a month. When you break down the figures, that's ~1,612,900 pages per day or ~18 pages per second. Bandwidth savings are as follows:

Savings per day without caching the CSS files: ~3.15 GB bandwidth
Savings per day with caching the CSS files: ~14 GB bandwidth
Most Slashdot visitors would have the CSS file cached, so we could ballpark the daily savings at ~10 GB bandwidth. A high volume of bandwidth from an ISP could be anywhere from $1 - $5 cost per GB of transfer, but let's calculate it at $1 per GB for an entire year. For this example, the total yearly savings for Slashdot would be: $3,650 USD!"
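The quoted figures are easy to re-derive. A quick sketch, using only numbers from the quote; the 31-day month is an assumption, chosen because it is what the quoted ~1,612,900 pages per day implies.

```python
PAGES_PER_MONTH = 50_000_000   # from the quoted Slashdot FAQ figure
DAYS_PER_MONTH = 31            # assumption implied by ~1,612,900 pages/day
SAVED_GB_PER_DAY = 10          # the quoted ballpark saving
COST_PER_GB_USD = 1.0          # the quoted low-end price

pages_per_day = PAGES_PER_MONTH / DAYS_PER_MONTH
pages_per_second = pages_per_day / 86_400
yearly_savings_usd = SAVED_GB_PER_DAY * COST_PER_GB_USD * 365

print(f"{pages_per_day:,.0f} pages/day, {pages_per_second:.1f} pages/s")
print(f"${yearly_savings_usd:,.0f} saved per year")
```

At the quoted upper price of $5 per GB the same calculation gives five times as much, which still would not pay for much developer time.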
Re:Bandwidth wasted for non-xhtml pages? by A+beautiful+mind · 2005-08-14 10:44 · Score: 4, Interesting

Normally you would be right, but now you're banging open doors. CmdrTaco and others are actively working on a new CSS-using formatting of slashdot.

--
It takes a man to suffer ignorance and smile
Be yourself no matter what they say
Re:Bandwidth wasted for non-xhtml pages? by A+beautiful+mind · 2005-08-14 10:50 · Score: 4, Interesting

Oh yea, here is the link about it.

--
It takes a man to suffer ignorance and smile
Be yourself no matter what they say
Re:Bandwidth wasted for non-xhtml pages? by Bogtha · 2005-08-14 11:04 · Score: 2, Informative

It has absolutely sod-all to do with XHTML. HTML 4.01 and XHTML 1.0 are functionally identical. You can use table layouts and <font> elements with XHTML 1.0 and you can use CSS with HTML 4.01.

You are referring to separating the content and the presentation through the use of stylesheets. This has nothing to do with XHTML, although it would save a hell of a lot of bandwidth if Slashdot implemented it. They are implementing it.

--
Bogtha Bogtha Bogtha
Re:Bandwidth wasted for non-xhtml pages? by oh_bugger · 2005-08-14 19:44 · Score: 2, Funny

A more serious question is how much bandwidth /. is wasting by hosting the large quantity of duped articles

--
Go home and shave your giant head of smell with your bad self

Slashdot? by djsmiley · 2005-08-14 10:31 · Score: 4, Insightful

"And more importantly, with 9M posts, what percentage of them have any real value, and how do busy people find that .001%?"

On slashdot.... Oh wait....

--
- http://www.milkme.co.uk

Re:Slashdot? by Propaganda13 · 2005-08-14 10:51 · Score: 2, Funny

If you'd check out my blog, you could read about the blogs I've read today thus saving yourself a lot of time.

900k a day, not 9m by Anonymous Coward · 2005-08-14 10:32 · Score: 2, Informative

order of magnitude out there, fella... better try again with this new fangled "math" stuff

Don't forget the robots by astrashe · 2005-08-14 10:33 · Score: 4, Interesting

I used to have a blog that I recently shut down because no one read it.

No one read it, but I got a ton of hits -- all from indexing services. WordPress pings a service that lets lots of indexing systems know about new posts. Some of them -- Yahoo, for example -- were constantly going through my entire tree of posts, hitting links for months, subjects, and so on.

It didn't bother me, because the bandwidth wasn't an issue, and it wasn't like they were hammering my vps or anything. It mostly just made it really hard to read the logs, because finding human readers was like looking for a needle in a haystack.

But bandwidth is cheap, and RSS is really useful, so it seems at least as good of a use for the resource as p2p movie exchanges.

Re:Don't forget the robots by lukewarmfusion · 2005-08-14 12:03 · Score: 2, Informative

Are you saying that you read the logs directly/manually?

See AWStats
Re:Don't forget the robots by doktor-hladnjak · 2005-08-14 14:00 · Score: 2, Insightful

Who says a whole lot of people need to read your blog? Only a small handful of friends read mine, mostly people I live far away from. It's a weirdly indirect way of keeping in touch with those people (I read theirs, they read mine). Still, I find my blog to be more of a diary to keep track of things that happen in my life for my own personal purposes more than anything else.

Rather than assuming... by llZENll · 2005-08-14 10:33 · Score: 5, Interesting

Rather than making all these assumptions, why not just email Bob Wyman and ask him?

"How much data is this? If we assume that the average HTML post is 150K this will work out to about 135G. Now assuming we're going to average this out over a 24 hour period (which probably isn't realistic) this works out to about 12.5 Mbps sustained bandwidth.

Of course we should assume that about 1/3 of this is going to be coming from servers running gzip content compression. I have no stats WRT the number of deployed feeds which can support gzip (anyone have a clue?). My thinking is that this reduce us down to about 9Mbps which is a bit better.

This of course assumes that you're not fetching the RSS and just fetching the HTML. The RSS protocol is much more bloated in this regard. If you have to fetch 1 article from an RSS feed, you're forced to fetch the remaining 14 additional posts that were in the past (assuming you're not using the A-IM encoding method, which is even rarer). This floating window can really hurt your traffic. The upside is that you have to fetch less HTML.

Now let's assume you're only fetching pinged blogs and you don't have to poll (polling itself has a network overhead). The average blog post would probably be around 20k, I assume. If we assume the average feed has 15 items, only publishes one story, and has a 10% overhead, we're talking about 330k per fetch of an individual post.

If we go back to the 900k posts per day figure we're talking a lot of data - 297G most of which is wasted. Assuming gzip compression this works out to 27.5Mbps.

That's a lot of data and a lot of unnecessary bloat. This is a difficult choice for smaller aggregator developers, as this much data costs a lot of money. The choice comes down to cheap HTML indexing with the inaccuracy that comes from HTML, or accurate RSS which costs 2.2x more.

Update: Bob Wyman commented that he's seeing 2k average post size with 1.8M posts per day. If we are to use the same metrics as above this is 54G per day or around 5Mbps sustained bandwidth for RSS items (assuming A-IM differentials aren't used)."
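The bandwidth figures in the quote can be checked with a few lines. This sketch uses only the quoted inputs and assumes decimal units (1k = 1,000 bytes, 1G = 10^9 bytes) and a flat 24-hour average; the function name is made up. Note that the Wyman figure, like the quote, is computed without the 10% overhead.

```python
SECONDS_PER_DAY = 86_400
K = 1_000


def sustained_mbps(bytes_per_day):
    """Average megabits per second needed to move bytes_per_day in 24 hours."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1e6


# HTML only: 900k posts/day at 150k each.
html_bytes = 900_000 * 150 * K

# RSS: 15 items of 20k per feed plus 10% overhead, fetched once per new post.
rss_fetch_bytes = 15 * 20 * K * 110 // 100
rss_bytes = 900_000 * rss_fetch_bytes

# Bob Wyman's figures: 1.8M posts/day, 2k per post, still 15 items per fetch.
wyman_bytes = 1_800_000 * 15 * 2 * K

for label, total in [("HTML", html_bytes), ("RSS", rss_bytes), ("Wyman", wyman_bytes)]:
    print(f"{label}: {total / 1e9:.0f}G/day, {sustained_mbps(total):.1f} Mbps")
```

This reproduces 135G and 12.5 Mbps, 297G and 27.5 Mbps, and 54G and 5 Mbps; the 2.2x in the quote is the ratio of the first two.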

Some Answers by RAMMS+EIN · 2005-08-14 10:35 · Score: 3, Insightful

``How Much Bandwidth is Required to Aggregate Blogs?''

Less than it currently takes, what with pull, HTTP, and XML used instead of more efficient technologies.

``what percentage of them have any real value, and how do busy people find that .001%?''

Using a scoring system, like Slashdot's?

It's not like all of this is rocket science. It's just that people go along with the hyped technology that's "good enough for any conceivable purpose", ignoring the superior technology that had been invented before and wasn't hyped as much. Nothing new here.

--
Please correct me if I got my facts wrong.

Definition of quality and value == arbitrary by davecrusoe · 2005-08-14 10:37 · Score: 3, Insightful

And more importantly, with 9M posts, what percentage of them have any real value, and how do busy people find that x%

Well, the significant percent is probably much larger than you might think. For example, if you aren't a chef, chances are you won't desire to read anything that relates to cooking. So, knock off X% of all blogs. You might not be interested in knitting, so deduct another X%.

In actuality, my guess is that there are few blogs you might decide to visit, and of those you do, several may have content you find worthwhile. Remember, worthwhile is all in the perception of the reader - there is no real definition for quality or value. Perhaps through trial and error - in essence digital tinkering - you find and derive your own value.

cheers, --dave

Slashdot = blog = ironic by Lovejoy · 2005-08-14 10:45 · Score: 3, Interesting

Does anyone else wonder why Slashdot editors seem to have it in for blogs? Is it because in Internet years, Slashdot is as old and sclerotic as the Dinomedia? Is Slashdot the Dinomedia of the new media?

Does anyone else consider it ironic that the Slashdot editorship HATES blogs, but Slashdot is actually a blog?

Anyone else getting tired of these questions?

--
Yes, it's a blog. Sorry if that offends you.

Re:Slashdot = blog = ironic by Kiaser+Wilhelm+II · 2005-08-14 10:56 · Score: 2

On the contrary, the questions being raised about the quality of blogs are quite valid.

The average blog is just some random joe telling us about his day or various bits of intellectual sophistry about things he doesn't understand (politics, science, etc).

Sorry, quantity != quality. A million monkeys at a million typewriters, only a few of them are producing the works of Shakespeare.

--
Lord High Crapflooder The Right Honourable Vlad Craig Esther McDavenpherson III
Destroyer of Mercatur.Net

Answer: Not much by Anonymous Coward · 2005-08-14 10:51 · Score: 4, Funny

The bandwidth savings from using html+css are hugely exaggerated.

Slashdot is switching to html+css for the front page, but not for any dynamic pages like the one you're on now. Because slashcode was written by totally incompetent programmers, the markup for comment pages is not separated from the logic. Making any changes is therefore a huge undertaking and the people who wrote it are far too busy maintaining the high journalistic standards slashdot is known for to do it.

That's 900,000 posts by epeus · 2005-08-14 10:57 · Score: 3, Informative

I run the spiders at Technorati, and it is 0.9 million posts a day, which Kevin Burton had correct in the post cited. Is this the no-dot effect?

Re:busy people read 9000 blogs per day?? by cyberfunk2 · 2005-08-14 10:57 · Score: 2, Informative

First, as some AC points out, 0.001 PERCENT of 9 million is 90.

Secondly, that would be posts. I'm assuming the intelligent stuff tends not to be in 90 separate posts, but in multiple intelligent posts from the same person.

Third, since the original poster somehow messed up and cited the number 9 million instead of the correct number, 900,000 , that number is reduced to 9 posts a day, a reasonable amount to read.

Finding the Worthwhile Content in Blogs by Rob+Carr · 2005-08-14 10:58 · Score: 4, Insightful

Most blogs are both drivel and worthwhile, depending upon the individual reading them (including mine). They become worthwhile in context.

If a friend is going through cancer treatment, her blog is worthwhile. If you find a youth group leader like yourself and can learn from his posts, his blog is worthwhile. A mother fighting for her health so that she can take care of her two sons and husband can share insights that are worthwhile. Someone fighting depression might have a worthwhile blog. A grandmother might have a view of the world that makes her blog worthwhile, just to get a different view. Perhaps a blog by someone who totally disagrees with you will be worthwhile, just to stretch your mind.

I've just described why I read the blogs on my blog roll. You can choose differently.

Top political blogs? You can find them easily among Technorati's top 100 list. Tags at Technorati will let you pick out specialties like science or "Master Blasters" or diabetes or the Tour de France. Google will turn up blogs if you search right, which is the trick for using Google.

"Worthwhile" is a much more difficult variable to calculate than "bandwidth." Perhaps it's the sheer variety of blogs that makes them interesting, because they are so individual and someone, somewhere will speak to your mind or your heart.

Worthwhile is what's worthwhile to you, and maybe to very few others. Not everyone will agree, and that's not a bad thing.

--
This sig seemed like a good idea at the time....

Value by lakin · 2005-08-14 11:01 · Score: 5, Interesting

what percentage of them have any real value

I had for a while held the view that most blogs out there are pointless. Some can be insightful and some are basically used as company press releases, but most are people talking about their day's activities that few people really care about, and a few of my friends have blogs like these. When I asked one what the point is, she said she just blogs stuff she would normally mention to many people on MSN throughout the day. It's not meant to have value to anyone on Slashdot, be hugely insightful, or detail some breathtaking new hack; it's simply another way for her to talk to friends (that doesn't involve repeating herself).

--
Paul

Wheat from chaff by StikyPad · 2005-08-14 11:22 · Score: 4, Funny

search query: blog -1337 -teh -kewl -hugz -omg -bored -lol -lmao -"can't wait to get my drivers license"

--
https://www.eff.org/https-everywhere

Re:Wheat from chaff by Rosco+P.+Coltrane · 2005-08-14 11:44 · Score: 5, Funny

search query: blog -1337 -teh -kewl -hugz -omg -bored -lol -lmao -"can't wait to get my drivers license"

Ah! I guess you missed the following blog entry then:
Hi everybody, it's Sunday today and I'm bored. So I guess I'll get on with my homemade engine that runs on water. As you know, it's almost finished, and I expect it to put out as much as 1337 horsepower. The reliability of the motor should be good too: my friend, Ray Kewl in engineering, said it should provide well beyond 10,000 TEH (total engine hours). Update: the engine is in the car, and it runs! on nothing but water! OMG I'm so happy! check the pictures and the diagrams to build your own. I can't wait to get my drivers license renewed so I can take it for a spin!
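For the record, here is a small sketch of how a naive exclusion filter would treat that entry. It assumes simple case-insensitive word and phrase matching, which is an assumption about how such a query behaves and not a description of any real search engine. The entry text is an excerpt of the post above.

```python
import re

EXCLUDED_WORDS = {"1337", "teh", "kewl", "hugz", "omg", "bored", "lol", "lmao"}
EXCLUDED_PHRASES = ["can't wait to get my drivers license"]


def excluded_by(text):
    """Return the sorted exclusion terms that would reject this post."""
    lowered = text.lower()
    words = set(re.findall(r"[a-z0-9']+", lowered))
    hits = EXCLUDED_WORDS & words
    hits.update(phrase for phrase in EXCLUDED_PHRASES if phrase in lowered)
    return sorted(hits)


entry = (
    "Hi everybody, it's Sunday today and I'm bored. "
    "I expect it to put out as much as 1337 horsepower. "
    "My friend, Ray Kewl in engineering, said it should provide well beyond "
    "10,000 TEH (total engine hours). OMG I'm so happy! "
    "I can't wait to get my drivers license renewed so I can take it for a spin!"
)

print(excluded_by(entry))
```

Six of the nine exclusion terms match, so the water engine never shows up in the results.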

--
"A door is what a dog is perpetually on the wrong side of" - Ogden Nash
Re:Wheat from chaff by ebrandsberg · 2005-08-14 11:57 · Score: 2, Insightful

Yes, but with a post like that, it should end up on Slashdot in a few days anyway... after every news site has posted it a few days earlier.

semi off topic by cookiepus · 2005-08-14 12:26 · Score: 2, Interesting

Since we're on the subject of blog aggregation, can someone recommend a GOOD way to aggregate?

Every single RSS aggregator I've come across treats my RSS world similar to an e-mail reader, where each blog is a 'folder' and each entry is equivalent to an e-mail.

This is decidedly NOT what I want and I don't understand why everyone's writing the same thing.

My friend is running PLANET, which builds a front page out of the RSS feeds. It looks kind of like the Slashdot front page, where adjacent stories come from different sources and are sorted in chronological order (newest on top).

PLANET seems to be a server-side implementation. My buddy's running Linux and he made a little page for me but it's not right for me to bug him every time I want to add a feed.
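The front-page layout described here boils down to merging all entries and sorting by date. A minimal sketch of that merge, assuming the feeds have already been fetched and parsed into (published, title) pairs; this is not how Planet itself is implemented, and the feed names and entries are invented.

```python
from datetime import datetime


def river_of_news(feeds):
    """Flatten {feed_name: [(published, title), ...]} into one list, newest first."""
    merged = [
        (published, feed_name, title)
        for feed_name, entries in feeds.items()
        for published, title in entries
    ]
    merged.sort(key=lambda row: row[0], reverse=True)
    return merged


# Hypothetical, already-parsed feeds.
feeds = {
    "alpha": [
        (datetime(2005, 8, 14, 9, 0), "Alpha morning post"),
        (datetime(2005, 8, 14, 12, 0), "Alpha noon post"),
    ],
    "beta": [
        (datetime(2005, 8, 14, 10, 30), "Beta post"),
    ],
}

for published, feed_name, title in river_of_news(feeds):
    print(published.isoformat(), feed_name, title)
```

Adding a feed is then just one more key in the dictionary, whatever operating system the script runs on.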

Is there anything like what I want that would run on Windows? And if not, why the heck not?

By the same token, why doesn't del.icio.us have any capacity to know when my links have been updated?

For what it's worth, here's my del.icio.us BLOGS area with some blogs I find good.

http://del.icio.us/eduardopcs/BLOG

--
Ecce Europa - Web Design for Business

Gzip helps, but the real win is conditional get by epeus · 2005-08-14 12:55 · Score: 4, Informative

If your weblog server implements ETag and Last-Modified, my spider can send a one packet request with the values I last saw from you, and you can send a one packet 304 response if nothing has changed.

Charles Miller explained this well a few years ago.

(I run the spiders at Technorati).
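A minimal sketch of the client side of a conditional GET using Python's standard library, assuming the spider stored the ETag and Last-Modified values from its previous fetch. The function names are hypothetical and this is not Technorati's code.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, etag=None, last_modified=None):
    """Attach the validators from the previous fetch, if there are any."""
    request = urllib.request.Request(url)
    if etag:
        request.add_header("If-None-Match", etag)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    return request


def fetch_if_changed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None on a 304 response."""
    request = build_conditional_request(url, etag, last_modified)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as err:
        # urllib reports 304 Not Modified as an HTTPError.
        if err.code == 304:
            return None, etag, last_modified
        raise
```

When nothing has changed, the whole exchange is a small request and a bodyless 304, instead of the full feed.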

Re:If you poll, at least do it well... by bobwyman · 2005-08-14 16:55 · Score: 2, Interesting

Baricom: What you're looking for is the "cloud" interface defined at: http://blogs.law.harvard.edu/tech/soapMeetsRss
The documentation there is, I think, about as good as you'll find. While it says that it can be implemented in either XML-RPC or SOAP, I am aware only of XML-RPC implementations.

The cloud provides a means for blogs to notify subscribers of updates and should eliminate the need for polling -- except that the subscriptions must be renewed at least every 25 hours. Of course, this cloud stuff isn't terribly useful in most cases since it relies on the blog server being able to send an HTTP message to a remote client (subscriber). In most cases, those messages would be blocked by firewalls. This is, of course, why the "Atom over XMPP" stuff makes sense. It relies on a connection established from the client to the server -- in the same manner as is done with instant messaging clients. Thus, there are many fewer issues with firewalls.

Of course, having lots of sessions open between a client program and all of the various blogs it reads probably doesn't make much sense. Neither does it make sense for every blog to maintain a list of all of its "cloud" readers and go to the work of sending them all messages whenever the blog is updated. Thus, the most sensible way to do this push business is to have the individual blogs publish to a common network of aggregating servers and then have clients establish connections to the common service. Overall bandwidth consumption is thus reduced to the absolute minimum. That's what we're building at PubSub.com.
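A back-of-the-envelope sketch of why a shared hub cuts traffic compared with every reader polling every blog. All the numbers below are invented for illustration and are not PubSub.com figures; the model counts messages only and ignores their size.

```python
def polling_requests_per_day(blogs, readers_per_blog, polls_per_day):
    """Every reader polls every blog it follows, whether or not it changed."""
    return blogs * readers_per_blog * polls_per_day


def hub_messages_per_day(blogs, readers_per_blog, posts_per_blog_per_day):
    """Each post goes to the hub once, then once to each subscribed reader."""
    posts = blogs * posts_per_blog_per_day
    return posts + posts * readers_per_blog


# Hypothetical workload: 1,000 blogs, 50 readers each, hourly polling,
# one new post per blog per day.
polling = polling_requests_per_day(1_000, 50, 24)
pushed = hub_messages_per_day(1_000, 50, 1)
print(f"polling: {polling:,} requests/day, hub push: {pushed:,} messages/day")
```

The gap grows with polling frequency, since pushed traffic depends only on how often blogs actually update.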

bob wyman

Miski Client-Server-Server-Client protocol by Philip+Dorrell · 2005-08-14 20:18 · Score: 2, Interesting

As I explained (as long ago as 2000) in Miski: A White Paper, we need a system with the following features:

Each producer of link suggestions has a unique address, something like channel/user@example.com. (This implies resolution via DNS, but probably people will end up using the URL of an XML file.)
The channel address points to the producer's server.
The subscriber to a channel tells their server to subscribe to the channel. The subscriber's server talks to the producer's server.
When the producer makes a new link suggestion, their client pushes it to their server, which pushes it to all the servers whose subscribers have subscribed to the channel.
Each server pushes the link suggestion to their clients (by whatever means).

The pattern of client to server to server to client is a bit like the architecture of email, but it is quite spam-proof because you only ever receive what you asked for.

Additionally, subscribers can instantly "repost" a suggestion to their own channel, which will be read by their subscribers. To avoid reading duplicate posts, servers will optionally filter out duplicates. However, this has a major consequence, which is that subscribers are only ever guaranteed to see the URL, which means that anything you want to say about the content of a new page has to go into the URL. The current system of RSS titles and descriptions will not work under reposting and duplicate filtering.
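The push and duplicate-filtering steps described above can be sketched as a toy in-memory model. The `Server` class and its method names are hypothetical and only illustrate the client to server to server to client flow; they are not taken from the Miski white paper.

```python
class Server:
    """Toy server that relays link suggestions and filters duplicate URLs."""

    def __init__(self, name):
        self.name = name
        self.channel_peers = {}  # channel -> servers with subscribers to it
        self.local_subs = {}     # channel -> local client names
        self.inboxes = {}        # client name -> URLs delivered, in order
        self.seen = {}           # client name -> URLs already delivered

    def subscribe(self, client, channel, producer_server):
        """A local client subscribes; this server registers with the producer's."""
        self.local_subs.setdefault(channel, set()).add(client)
        producer_server.channel_peers.setdefault(channel, set()).add(self)

    def publish(self, channel, url):
        """Called on the producer's server: push to every subscribed server."""
        for peer in self.channel_peers.get(channel, ()):
            peer.receive(channel, url)

    def receive(self, channel, url):
        """Deliver to local subscribers, skipping URLs they already received."""
        for client in self.local_subs.get(channel, ()):
            seen = self.seen.setdefault(client, set())
            if url in seen:
                continue
            seen.add(url)
            self.inboxes.setdefault(client, []).append(url)


producer = Server("producer.example")
reader_server = Server("reader.example")
reader_server.subscribe("alice", "channel/bob@example.com", producer)
reader_server.subscribe("alice", "channel/carol@example.com", producer)

producer.publish("channel/bob@example.com", "http://example.com/new-idea")
# Carol reposts the same URL to her own channel; alice sees it only once.
producer.publish("channel/carol@example.com", "http://example.com/new-idea")
print(reader_server.inboxes["alice"])
```

Because the filter is keyed on the URL alone, any title or description attached to a repost is lost, which is the consequence the post points out.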

The combination of real-time pushing and reposting could lead to a speeded up Internet, where exciting new ideas spread from one user to the next in a matter of minutes, without having to go through the bottlenecks of centralised attention and popular websites (such as Slashdot). This could be enough to turn the Internet into a "Global Brain", and perhaps even trigger the Technological Singularity.

I invented Miski to solve the problem of getting people to take notice of new ideas without having to engage in a massive publicity effort, but unfortunately I've failed to get anyone to take any notice of the Miski idea.

--
Music: a super-stimulus for the perception of musicality. Musicality: a perceived aspect of speech.

The long tail by Eivind+Eklund · 2005-08-14 20:28 · Score: 3, Informative

I think most of these blogs have something of interest to somebody, and that the value of blogs is in their diversity - in a lot of things having value to a small number of people.

This effect is called the long tail effect, and is visible all over the web. For instance, Amazon.com says that every day it sells more books that didn't sell yesterday than the sum of books sold that *also* sold yesterday. In other words, they sell (in sum) more of the items selling less than one every other day than of items selling (by type) more than that.

Eivind.

--
Doubting the existence of evolution is like doubting the existence of China: It just shows that you're uninformed.

s/blog/website by ubernostrum · 2005-08-14 22:04 · Score: 2, Insightful

Time to ditch the World Wide Web, right?

Slashdot Mirror
