Slashdot Mirror


How Much Bandwidth is Required to Aggregate Blogs?

Kevin Burton writes "Technorati recently published that they're seeing 900k new posts per day. PubSub says they're seeing 1.8M. With all these posts per day how much raw bandwidth is required? Due to inefficiencies in RSS aggregation protocols a little math is required to understand this problem." And more importantly, with millions of posts, what percentage of them have any real value, and how do busy people find that .001%?
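A rough sketch of the arithmetic the summary alludes to. Only the 900k and 1.8M posts/day figures come from the story. The feed count, polling interval and fetch size are hypothetical placeholders, chosen to show how quickly full-feed polling adds up.

```python
# Back-of-the-envelope numbers for the figures quoted above.
# ASSUMPTION: feed count, polls per day and average fetch size are
# hypothetical placeholders, not figures from Technorati or PubSub.

SECONDS_PER_DAY = 86_400


def posts_per_second(posts_per_day: int) -> float:
    """Average arrival rate of new posts."""
    return posts_per_day / SECONDS_PER_DAY


def polling_bytes_per_day(feeds: int, polls_per_day: int, avg_feed_bytes: int) -> int:
    """Bytes downloaded when every feed is re-fetched in full on each poll."""
    return feeds * polls_per_day * avg_feed_bytes


def to_mbit_per_s(bytes_per_day: int) -> float:
    """Convert a daily byte count to an average rate in megabits per second."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1_000_000


if __name__ == "__main__":
    for label, n in (("Technorati", 900_000), ("PubSub", 1_800_000)):
        print(f"{label}: {posts_per_second(n):.1f} posts/s")
    # Hypothetical: 1M feeds, polled hourly, 20 KB per full fetch.
    daily = polling_bytes_per_day(1_000_000, 24, 20_000)
    print(f"{daily / 1e9:.0f} GB/day, about {to_mbit_per_s(daily):.0f} Mbit/s")
```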

209 comments

  1. All at once by someonewhois · · Score: 5, Interesting

    It would make a lot more sense to have a protocol where you check one file that has a list of links to other XML files. The aggregator figures out which of those URLs have NOT been aggregated, then downloads only those XML files, which contain the post-specific info, and displays them. That would save a lot of bandwidth, I'm sure.
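A minimal sketch of the index-file idea above. It assumes a hypothetical plain-text index with one post URL per line; no such format is standardized. The aggregator keeps a set of URLs it has already processed and fetches only the difference.

```python
# ASSUMPTION: the index format (one URL per line) and the example URLs
# are hypothetical; this only illustrates the "fetch what you haven't seen" step.


def parse_index(index_text: str) -> list[str]:
    """Return the post URLs listed in the index, ignoring blank lines."""
    return [line.strip() for line in index_text.splitlines() if line.strip()]


def urls_to_fetch(index_text: str, seen: set[str]) -> list[str]:
    """URLs present in the index but not yet aggregated, in index order."""
    return [url for url in parse_index(index_text) if url not in seen]


index = """
http://example.org/posts/101.xml
http://example.org/posts/102.xml
http://example.org/posts/103.xml
"""
seen = {"http://example.org/posts/101.xml", "http://example.org/posts/102.xml"}
print(urls_to_fetch(index, seen))
```

The index itself still has to be polled. This only pays off when the index is much smaller than the full feed it replaces.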

    1. Re:All at once by ranson · · Score: 4, Insightful

      I'm trying to understand how this would help, because if everyone incorporated generally accepted HTTP practices into their XML generation scripts (e.g., sending Last-Modified and/or Expires headers, providing an ETag, etc.), the aggregators could use conditional GET requests (If-Modified-Since / If-None-Match) to save an unthinkable amount of bandwidth. As it is right now, since most RSS feeds are generated on the fly from some database, that doesn't happen, and the aggregators just have to pull the entire XML at regular intervals to ensure nothing was missed. I find it silly that basic functionality of the WWW like smart caching rules started being ignored when RSS came along.
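A sketch of an aggregator's conditional GET using Python's standard urllib. The cache dict and its keys are hypothetical. The mechanism itself is standard HTTP/1.1: echo the server's ETag in If-None-Match and its Last-Modified in If-Modified-Since, then treat a 304 response as "nothing new".

```python
import urllib.error
import urllib.request

# ASSUMPTION: `cache` is a hypothetical dict holding what the server sent
# last time: "etag", "last_modified" and the cached "body".


def conditional_headers(cache: dict) -> dict:
    """Build If-None-Match / If-Modified-Since from stored validators."""
    headers = {}
    if cache.get("etag"):
        headers["If-None-Match"] = cache["etag"]
    if cache.get("last_modified"):
        headers["If-Modified-Since"] = cache["last_modified"]
    return headers


def poll(url: str, cache: dict) -> bytes:
    """Fetch url, reusing the cached body if the server answers 304."""
    req = urllib.request.Request(url, headers=conditional_headers(cache))
    try:
        with urllib.request.urlopen(req) as resp:
            body = resp.read()
            cache["etag"] = resp.headers.get("ETag")
            cache["last_modified"] = resp.headers.get("Last-Modified")
            cache["body"] = body
            return body
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: almost nothing was transferred
            return cache["body"]
        raise
```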

    2. Re:All at once by G-Licious! · · Score: 2, Interesting

      I don't think you need a list of links or even a separate file. An easier solution might be to just pass a format string in a separate link tag on the HTML page announcing the feed. For example, right now we have (taken straight from the linked article):

      <link rel="alternate" type="application/atom+xml" title="Atom" href="http://www.feedblog.org/atom.xml" />
      <link rel="alternate" type="application/rss+xml" title="RSS" href="http://www.feedblog.org/index.rdf" />

      And we could introduce a new relationship type, say "recent-feed", with a strftime-like format string:

      <link rel="recent-feed" type="application/atom+xml" title="Atom" href="http://www.feedblog.org/atom.xml?date=%Y-%m-%d&time=%H:%M:%S" />
      <link rel="recent-feed" type="application/rss+xml" title="RSS" href="http://www.feedblog.org/index.rdf?date=%Y-%m-%d&time=%H:%M:%S" />

      Of course, that'd require the blog feed to be a dynamic page of some sort (PHP, Python, Ruby, Perl, whatever..), but that shouldn't be a problem; I can't think of a single blog with a bandwidth problem that is using static pages.
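A sketch of the client side of this proposal: expand the strftime-style template with the time of the last check. The recent-feed relation and the date/time parameter names are the poster's hypothetical suggestion, not an existing standard. Times are taken as UTC here, which the proposal does not specify.

```python
import time

# ASSUMPTION: the "recent-feed" template and its date/time parameters are
# hypothetical; times are formatted in UTC.


def recent_feed_url(template: str, last_checked: float) -> str:
    """Fill the strftime-style placeholders with the last check time (UTC)."""
    return time.strftime(template, time.gmtime(last_checked))


template = "http://www.feedblog.org/atom.xml?date=%Y-%m-%d&time=%H:%M:%S"
print(recent_feed_url(template, 1_128_170_096))
```

A literal % anywhere else in the URL would have to be written as %% for this to work.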

    3. Re:All at once by G-Licious! · · Score: 1

      I forgot to mention, the date/time there is of course the last time the feed was checked, so the feed can be a regular RSS or Atom feed generated with just the articles published or modified since then.
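The server-side counterpart, sketched under the same assumptions: parse the date/time parameters and return only entries modified after that moment. The (title, modified) tuples are hypothetical stand-ins for whatever the blog's database holds.

```python
from datetime import datetime, timezone

# ASSUMPTION: entries are hypothetical (title, modified) pairs with UTC times,
# and the query parameters follow the format proposed above.


def parse_since(date: str, time_: str) -> datetime:
    """Turn the date=...&time=... query values into an aware UTC datetime."""
    naive = datetime.strptime(f"{date} {time_}", "%Y-%m-%d %H:%M:%S")
    return naive.replace(tzinfo=timezone.utc)


def entries_since(entries, since: datetime) -> list[str]:
    """Titles of entries modified strictly after the last check."""
    return [title for title, modified in entries if modified > since]


entries = [
    ("Old post", datetime(2005, 9, 30, 8, 0, tzinfo=timezone.utc)),
    ("New post", datetime(2005, 10, 1, 13, 0, tzinfo=timezone.utc)),
]
print(entries_since(entries, parse_since("2005-10-01", "12:34:56")))
```

A strict > avoids re-sending an entry stamped exactly at the last check, but it could miss one published in that same second. A real feed would probably allow some overlap.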

    4. Re:All at once by broward · · Score: 4, Interesting

      The bandwidth isn't going to matter much.

      The blog wave is close to an inflection point,
      probably within six to twelve months...
      which means that total bandwidth will probably
      top out at about TWICE the current rate.

      http://www.realmeme.com/Main/miner/preinflection/blogDejanews.png

      I suspect that even now, many blogs are
      starved for readership as new blogs come online
      and steal mental bandwidth.

    5. Re:All at once by jo42 · · Score: 1

      You'd also save a buttload of bandwidth by using a more efficient method than XML. All the XML crudge I see is easily 60-80% fsckin XML tags, the rest being real content.
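One rough way to check a ratio like this on a feed you have saved locally: count the characters of text content and compare them with the size of the whole document. The sample feed below is made up. Whitespace between tags is counted as text, so treat the result as approximate.

```python
import xml.etree.ElementTree as ET

# The sample feed is hypothetical; run text_fraction() on a real feed to
# see how much of it is markup.


def text_fraction(xml_text: str) -> float:
    """Share of the document's characters that are text content, not tags."""
    root = ET.fromstring(xml_text)
    text_chars = sum(len(t) for t in root.itertext())
    return text_chars / len(xml_text)


sample = """<rss version="2.0"><channel>
<item><title>Hello</title><link>http://example.org/1</link><description>First post.</description></item>
</channel></rss>"""
frac = text_fraction(sample)
print(f"{frac:.0%} text, {1 - frac:.0%} markup")
```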

    6. Re:All at once by Enrico+Pulatzo · · Score: 1

      It's amazing to me how many problems would be solved if applications (client and server) just understood http 1.1 more fully.

    7. Re:All at once by BrokenHalo · · Score: 1
      Your argument makes sense... but from the original post:

      And more importantly, with millions of posts, what percentage of them have any real value, and how do busy people find that .001%?

      I'm not so sure why anyone would consider Slashdot to be able to offer any insight on that one.

      *ducks* :-D

    8. Re:All at once by Anonymous Coward · · Score: 0

      I'm not so sure why anyone would consider Slashdot to be able to offer any insight on that one

      slashmash can :)

    9. Re:All at once by jrockway · · Score: 2, Informative

      Most sane webservers GZIP the content. XML compresses extremely well. (In other words, gzipped XML is just as efficient space-wise as a binary memory dump. And much easier for mere people to understand.)
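A quick way to see this with Python's gzip module. The feed is synthetic and very repetitive, so it compresses better than a real feed with varied prose would. The comparison with binary formats is not tested here.

```python
import gzip

# Synthetic, highly repetitive feed: an optimistic case for compression.
items = "".join(
    f"<item><title>Post {i}</title><link>http://example.org/{i}</link>"
    f"<description>Entry number {i}.</description></item>"
    for i in range(50)
)
feed = f'<rss version="2.0"><channel>{items}</channel></rss>'.encode("utf-8")

compressed = gzip.compress(feed)
print(f"{len(feed)} bytes -> {len(compressed)} bytes gzipped "
      f"({len(compressed) / len(feed):.0%} of original)")
assert gzip.decompress(compressed) == feed  # lossless round trip
```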

      --
      My other car is first.
    10. Re:All at once by TheRaven64 · · Score: 1

      One way to hugely reduce bandwidth would be to use XMPP publish-subscribe for RSS, rather than HTTP. That way, you don't have to poll every 30 minutes or so to see changes, and you don't have to download a complete RSS file just to get one new article.
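A back-of-the-envelope comparison of polling against push for one subscriber and one feed. Every size and rate here is a hypothetical placeholder. The push figure ignores XMPP stanza and connection overhead.

```python
# All numbers are hypothetical; protocol overhead is ignored on both sides.

SECONDS_PER_DAY = 86_400


def polling_bytes(interval_s: int, feed_bytes: int) -> int:
    """Full feed downloaded on every poll, whether or not anything changed."""
    return (SECONDS_PER_DAY // interval_s) * feed_bytes


def push_bytes(new_posts: int, post_bytes: int) -> int:
    """Only new entries are delivered to the subscriber."""
    return new_posts * post_bytes


poll = polling_bytes(30 * 60, 25_000)  # every 30 minutes, 25 KB feed
push = push_bytes(3, 2_000)            # 3 new posts a day, 2 KB each
print(poll, push, poll // push)
```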

      --
      I am TheRaven on Soylent News
    11. Re:All at once by thing12 · · Score: 1

      Sure, but gzip'd JSON is 15-25% smaller than gzip'd XML. Take a look at the (contrived) examples and try gzipping any of them. It's closer to a binary format itself and is really just as, if not more, readable than XML.
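A sketch for trying the comparison yourself: the same made-up entries serialized as XML and as JSON, measured raw and gzipped. It is not meant to reproduce the 15-25% figure. The gap after compression depends heavily on the data.

```python
import gzip
import json

# Made-up entries; real feeds carry longer, less repetitive content.
entries = [{"title": f"Post {i}", "link": f"http://example.org/{i}"} for i in range(50)]

xml_doc = "<feed>" + "".join(
    f"<entry><title>{e['title']}</title><link>{e['link']}</link></entry>"
    for e in entries
) + "</feed>"
json_doc = json.dumps(entries)

for name, doc in (("XML", xml_doc), ("JSON", json_doc)):
    raw = doc.encode("utf-8")
    print(f"{name}: {len(raw)} raw, {len(gzip.compress(raw))} gzipped")
```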

  2. How much? If everyone GZipped, a lot less! by ranson · · Score: 4, Insightful

    How much bandwidth is required? A lot less if everyone would take the 5 minutes required to implement GZip compression on their Apache servers. It saves you bandwidth, it speeds up your site for users (especially those on dialup), and saves the bandwidth of aggregators (assuming they send an "Accept-Encoding: gzip, deflate" header).

    So my plea to the internet community today.. make sure your web server is configured to send gzipped content. TFA says he doesn't know how many RSS feeds can support gzip. The answer is easy really: any feed being served by Apache (plus a LOT of other webservers; AOLserver even added gzip support recently). Here's how to set up Apache, and here's where to check if your site is using GZip and get an idea of the bandwidth savings you should see. If your site isn't gzipping, show your admin (if it's someone else) the 'how-to' above and ask them to implement it -- it's an absolute no-brainer win-win for everyone that takes no time at all to set up, really. It's really absurd IMO that it's not enabled in Apache by default.
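A rough sketch of the server-side decision, with hypothetical helper names. The server compresses only when the client lists gzip in Accept-Encoding with a non-zero q-value. It also sends Vary: Accept-Encoding so caches keep the two variants apart. This is simplified (no * wildcard, no x-gzip alias) and is not how mod_gzip or mod_deflate are actually implemented.

```python
import gzip
from typing import Optional

# Hypothetical helpers illustrating content-coding negotiation.


def accepts_gzip(accept_encoding: Optional[str]) -> bool:
    """True if the header lists gzip with q > 0 (q defaults to 1)."""
    if not accept_encoding:
        return False
    for part in accept_encoding.split(","):
        coding, _, params = part.strip().partition(";")
        if coding.strip().lower() != "gzip":
            continue
        q = 1.0
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        return q > 0
    return False


def encode_body(body: bytes, accept_encoding: Optional[str]) -> tuple[bytes, dict]:
    """Return the bytes to send plus the headers describing them."""
    headers = {"Vary": "Accept-Encoding"}
    if accepts_gzip(accept_encoding):
        headers["Content-Encoding"] = "gzip"
        return gzip.compress(body), headers
    return body, headers
```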

    1. Re:How much? If everyone GZipped, a lot less! by TCM · · Score: 3, Insightful

      Of course every server is powerful enough that CPU time can't possibly become an issue, right?

      --
      Of course it runs NetBSD. BTC: 1NT7QvbetmANwaMzhpVL6
    2. Re:How much? If everyone GZipped, a lot less! by Madd+Scientist · · Score: 5, Informative
      i used gzip with apache at an old job and we ran into a problem with it... some obscure header problem in conjunction with mod_rewrite.

      so i wouldn't say ANY site using apache... but probably most. the real problem there is with compression load on the servers... gzip compression doesn't just happen you know, it takes CPU cycles that could be being used to just push data rather than encode it.
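If CPU cost is the worry, it can be measured rather than guessed. Below is a small timing sketch using Python's gzip module at different compression levels. The page is a synthetic placeholder, and the timings depend entirely on your hardware.

```python
import gzip
import time

# Synthetic page; substitute real output from your own site.
page = ("<p>Some blog entry text, repeated for the benchmark.</p>\n" * 2000).encode()

for level in (1, 6, 9):
    start = time.perf_counter()
    for _ in range(20):
        out = gzip.compress(page, compresslevel=level)
    per_call_ms = (time.perf_counter() - start) / 20 * 1000
    print(f"level {level}: {len(out)} bytes, {per_call_ms:.2f} ms per page")
```

Lower levels usually give up a little size for noticeably less CPU, so the level is worth tuning before deciding compression is too expensive.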

    3. Re:How much? If everyone GZipped, a lot less! by Have+Blue · · Score: 1

      I think you mean "enable". *Implementing* GZip takes a hell of a lot longer than 5 minutes :)

    4. Re:How much? If everyone GZipped, a lot less! by Guspaz · · Score: 1

      With or without gzip, 12.5 Mbit/s is easy and cheap. A 2.4 GHz Celeron with a 20 Mbit/s unmetered Cogent connection goes for $239 US/month at ServerMatrix. For these big sites complaining about bandwidth, $239 per month is peanuts.
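A small sketch of the arithmetic for comparing a sustained rate with monthly transfer. It assumes decimal units (1 GB = 10^9 bytes), a 30-day month, and a link saturated around the clock, which real traffic never is.

```python
# Decimal units and a 30-day month are assumptions for the comparison.


def gb_per_month(mbit_per_s: float, days: int = 30) -> float:
    """Transfer volume of a link running at mbit_per_s all month."""
    bytes_per_s = mbit_per_s * 1_000_000 / 8
    return bytes_per_s * 86_400 * days / 1e9


for rate in (12.5, 20):
    print(f"{rate} Mbit/s sustained = {gb_per_month(rate):,.0f} GB/month")
```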

    5. Re:How much? If everyone GZipped, a lot less! by ranson · · Score: 2, Insightful

      "Of course every server is powerful enough that CPU time can't possibly become an issue, right?"

      On moderately busy servers, most have found that mod_gzip helps with both CPU and RAM, since users stay connected to your server for shorter durations, resulting in overall fewer concurrent connections.
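The connection-count argument is an instance of Little's law: average concurrent connections = request rate x time each request holds a connection. Below is a sketch with illustrative, unmeasured numbers. It uses a 56k modem link (ignoring modem compression), a hypothetical 60 KB page, and a hypothetical 5x gzip ratio.

```python
# Little's law with illustrative numbers; nothing here is measured.


def concurrent_connections(requests_per_s: float, seconds_per_request: float) -> float:
    """Average number of connections open at once."""
    return requests_per_s * seconds_per_request


def transfer_seconds(body_bytes: int, client_bytes_per_s: float) -> float:
    """How long one response occupies the connection."""
    return body_bytes / client_bytes_per_s


dialup = 56_000 / 8  # about 7 KB/s, ignoring modem compression
page = 60_000        # hypothetical uncompressed page size in bytes
gz_page = 12_000     # hypothetical gzipped size (5x smaller)

for label, size in (("plain", page), ("gzip", gz_page)):
    t = transfer_seconds(size, dialup)
    print(f"{label}: {t:.1f} s per request, "
          f"{concurrent_connections(20, t):.0f} connections at 20 req/s")
```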

    6. Re:How much? If everyone GZipped, a lot less! by quanticle · · Score: 1

      Yes, but at least some of those CPU cycles will be made up for by the fact that you have to push less data to the user.

      --
      We all know what to do, but we don't know how to get re-elected once we have done it
    7. Re:How much? If everyone GZipped, a lot less! by TCM · · Score: 1

      Do you have _any_ sources to back this up? Compared to keeping a connection state, gzipping is _way_ more expensive. I find it very hard to believe that there is a case where keeping the connection longer was more expensive than gzipping the content.

      --
      Of course it runs NetBSD. BTC: 1NT7QvbetmANwaMzhpVL6
    8. Re:How much? If everyone GZipped, a lot less! by ZorbaTHut · · Score: 4, Interesting

      As I remember, www.livejournal.com has experimented with gzip compression several times. They've discovered that the price of the CPU far exceeds the price of the bandwidth.

      Bandwidth is cheap. Computers, not so much.

      --
      Breaking Into the Industry - A development log about starting a game studio.
    9. Re:How much? If everyone GZipped, a lot less! by TCM · · Score: 1

      The cycles you save by sending $orig_size - $gzip_size less data are negligible compared to the overhead of gzipping.

      Where the hell do you get the idea these two processes are comparable cpu-wise?

      *shakes head*

      --
      Of course it runs NetBSD. BTC: 1NT7QvbetmANwaMzhpVL6
    10. Re:How much? If everyone GZipped, a lot less! by rtaylor · · Score: 1

      so i wouldn't say ANY site using apache... but probably most. the real problem there is with compression load on the servers... gzip compression doesn't just happen you know, it takes CPU cycles that could be being used to just push data rather than encode it.

      Most web clients take gzipped content, so if it's static you should gzip by default and store compressed on the filesystem.

      For browsers taking compressed content (most of them) serve as is and for those that don't you can uncompress the content on the fly. Incidentally, it's usually faster to decompress than to compress it anyway, so it could be a savings even if you do get a flood of Netscape 3 browsers visiting.
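A minimal sketch of the approach described above. An in-memory dict stands in for files on disk, which is a hypothetical simplification. Pages are compressed once when published, served as-is to gzip-capable clients, and inflated on the fly for the rest.

```python
import gzip

# STORE is a hypothetical stand-in for precompressed files on disk.
STORE: dict[str, bytes] = {}


def publish(path: str, html: bytes) -> None:
    """Compress once, at publish time."""
    STORE[path] = gzip.compress(html)


def serve(path: str, client_accepts_gzip: bool) -> tuple[bytes, dict]:
    """Send stored bytes as-is, or inflate them for non-gzip clients."""
    stored = STORE[path]
    if client_accepts_gzip:
        return stored, {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
    return gzip.decompress(stored), {"Vary": "Accept-Encoding"}


publish("/index.html", b"<html><body>Hello</body></html>")
body, headers = serve("/index.html", client_accepts_gzip=False)
print(body, headers)
```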

      --
      Rod Taylor
    11. Re:How much? If everyone GZipped, a lot less! by jandrese · · Score: 2, Insightful

      That depends a lot on what you're hosting your servers on. CPU time is expensive on Tandems and to a lesser extent Suns. On PCs the CPU is cheap, especially since most PC installations are clusters and even 1U boxes tend to come with overpowered processors.

      One thing is for certain though, for many users bandwidth is NOT cheap.

      --

      I read the internet for the articles.
    12. Re:How much? If everyone GZipped, a lot less! by GodGell · · Score: 1

      a "no-brainer win-win"? wait a sec. i have to look for useful documents about gzip on the net which is a pain in the ass itself, then have to write a gzip compressor, then test it, then build it into my server's code, and then test, test, test. it takes at least a day (~6 hours) and is not nearly the "piece o' cake" this guy says it is.

      --
      [SHOW SOME LENIENCY TOWARDS ... I mean, FUCK BETA] Eat. Survive. Reproduce. GOTO 10
    13. Re:How much? If everyone GZipped, a lot less! by magefile · · Score: 2, Insightful

      Erm ... if it's static, just store 2 copies and route accordingly. You're not serving gzipped stuff to save space, you're serving it to save bandwidth.

    14. Re:How much? If everyone GZipped, a lot less! by Tony+Hoyle · · Score: 1

      Your howto specifically states how to *not* use mod_gzip and to create .gz copies of every page.

      Not so useful on a dynamic site.

    15. Re:How much? If everyone GZipped, a lot less! by grcumb · · Score: 2, Interesting

      "Compared to keeping a connection state, gzipping is _way_ more expensive. I find it very hard to believe that there is a case where keeping the connection longer was more expensive than gzipping the content."

      I'm prone to agree. But I also suspect that my CTO is going to agree that it's cheaper to pay once for more processing power than it is to pay every day for higher bandwidth use. YMMV, of course. Bandwidth is relatively cheap in some parts of the US, but in other parts of the world it's hideously expensive.

      In short, I agree with your conclusion, but I think that the GP is right, if not for the reasons he provided. In some cases it actually does make more sense to accept a little less efficiency in one part of the system than to cope with constantly higher costs in another.

      --
      Crumb's Corollary: Never bring a knife to a bun fight.
    16. Re:How much? If everyone GZipped, a lot less! by TooncesTheCat · · Score: 1

      Last time I checked, the whole point of the gzip idea was to save bandwidth. Does using your extra CPU cycles cost you extra money?

      Fuck no it doesn't - extra bandwidth costs you money, CPU cycles being used don't.

      Bam BOOYAH

    17. Re:How much? If everyone GZipped, a lot less! by noidentity · · Score: 1

      "...less if everyone would take the 5 minutes required to implement GZip compression on their Apache servers. It saves you bandwidth, it speeds up your site for users (especially those on dialup)

      Even when they have the modem compression enabled (something that's been available at least since V.32)?

    18. Re:How much? If everyone GZipped, a lot less! by Siniset · · Score: 0

      Increased cpu cycles costs more in energy costs. So yes, increased cpu cycles does cost you more.

    19. Re:How much? If everyone GZipped, a lot less! by womby · · Score: 3, Insightful

      With the least intensive compression algorithms html can end up almost 10 times smaller
      That results in a 10 times shorter transfer time,
      Which results in 10 times fewer simultaneous connections,
      Which results in 10 times fewer apache processes,
      Which results in massively reduced memory and processor requirements.

      That unused processor and memory is what would be used to perform the gzip operations. Let's say for argument's sake compressing the output doubles the processor usage (a ridiculously high number): cutting the number of apache processes by an order of magnitude only has to reduce CPU requirements by 50% to come out on top.

      If the gzip operation only inflicts a 10% overhead cutting the apache processes by ten only needs to free more than 9% to come out on top.

      Look at your server, would cutting the number of apache processes from 400 to 40 save more than 10% of the CPU usage, would it save more than 50%?

      [All numbers in this post were selected for ease of calculation, not for their real-world precision.]
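The break-even arithmetic above can be written down directly. This keeps the post's own simplified model: gzip multiplies CPU cost by (1 + overhead), and fewer Apache processes free a fraction of the original load. It is a sketch of that model, not a measurement.

```python
# The post's simplified model: new CPU load = (1 - freed) * (1 + overhead),
# relative to the original load.


def relative_cpu(overhead: float, freed: float) -> float:
    """CPU load after enabling gzip, as a multiple of the original."""
    return (1 - freed) * (1 + overhead)


def break_even_freed(overhead: float) -> float:
    """Smallest freed fraction at which gzip stops costing extra CPU."""
    return overhead / (1 + overhead)


for overhead in (1.0, 0.1):
    print(f"overhead {overhead:.0%}: must free > {break_even_freed(overhead):.1%}")
```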

      --
      **** lying is wrong even for sleeping dogs
    20. Re:How much? If everyone GZipped, a lot less! by Dwonis · · Score: 1

      IIRC, you can configure your server to do the compression once per file, instead of every time the page is served.

    21. Re:How much? If everyone GZipped, a lot less! by Craig+Ringer · · Score: 1

      I don't imagine it would be simple - network protocols rarely are.

      That said, HTTP is pretty much made for it with its support for `Content-Encoding:', and the clear headers/body separation. You still have to do things like make sure the client understands gzip, of course.

      The job would also be vastly simplified by zlib. I'd consider implementing your own gzip compressor pretty extreme when zlib is free and extremely well tested.
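A sketch of how little code the zlib route takes, using Python's zlib bindings. Passing wbits=31 (16 + 15) makes zlib write a gzip header and trailer, so the output is valid for Content-Encoding: gzip. A streaming compressobj also fits dynamic pages that are generated in chunks. HTTP header handling is left out.

```python
import gzip
import zlib

# wbits=31 selects the gzip container; HTTP headers are not handled here.


def gzip_chunks(chunks):
    """Compress an iterable of byte chunks into one gzip stream, incrementally."""
    comp = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=31)
    for chunk in chunks:
        out = comp.compress(chunk)
        if out:
            yield out
    yield comp.flush()


parts = [b"<html><body>", b"<p>Hello</p>" * 100, b"</body></html>"]
body = b"".join(gzip_chunks(parts))
assert gzip.decompress(body) == b"".join(parts)  # a normal gzip stream
print(len(b"".join(parts)), "->", len(body))
```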

    22. Re:How much? If everyone GZipped, a lot less! by TooncesTheCat · · Score: 1

      How many colocation providers or dedicated server providers charge you for electricity? None that I have ever fucking seen, and I have run over 10 colo'd boxes in my life and hundreds of dedicateds at some of the most prominent datacenters in the US.

    23. Re:How much? If everyone GZipped, a lot less! by ZorbaTHut · · Score: 4, Insightful

      That's true. LJ is a very CPU-heavy site (surprisingly), and therefore anything that can spare CPU is welcomed. A site that mostly transmitted static pages would probably find gzipping to be an obvious win.

      --
      Breaking Into the Industry - A development log about starting a game studio.
    24. Re:How much? If everyone GZipped, a lot less! by indigoid · · Score: 1

      I'm not sure which way to lean WRT connection lifetimes vs. gzip cpu utilisation. However...

      cpu time is NOT cheap in either case. sure, fast CPUs might be cheap, and the servers containing them also reasonably cheap, but there is a pretty big difference in operating cost between a mostly idle cluster of 1u boxes and a well loaded cluster of 1u boxes.

      the well loaded cluster will want LOTS more power and generate LOTS more heat. not only that, but in the Dells we use here (eg. PE1850, PE2850, PE6850, etc) if just one fan dies, ALL of the other fans in that server will run at full speed in an attempt to compensate, sucking even more power. Then as things heat up, your AC requirements also go up. This costs many beans.

      In short, you can't just look at one aspect in isolation. There are many flow-on effects!

      --
      P-plate adventurer
    25. Re:How much? If everyone GZipped, a lot less! by jp10558 · · Score: 3, Insightful

      Couldn't you GZIP each page once per change? (Obviously no good for dynamic pages, but for blogs each post would only need to be done once.) Unless you get comments like on Slashdot, it's unlikely you'd have to gzip more than once every few minutes or so, and then you'd serve that file like you would any other file.
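A minimal sketch of compressing once per change. It assumes pages are cached by a hash of their content, which is a hypothetical choice; keying on the post's modification time would also avoid regenerating the page. Eviction is omitted.

```python
import gzip
import hashlib

# Hypothetical cache keyed by content hash; no eviction.


class GzipCache:
    def __init__(self) -> None:
        self._cache: dict[str, bytes] = {}
        self.compressions = 0  # how many times gzip actually ran

    def get(self, content: bytes) -> bytes:
        """Return gzipped content, compressing only if this exact content is new."""
        key = hashlib.sha1(content).hexdigest()
        if key not in self._cache:
            self._cache[key] = gzip.compress(content)
            self.compressions += 1
        return self._cache[key]


cache = GzipCache()
for _ in range(1000):
    cache.get(b"<html>unchanged post</html>")
cache.get(b"<html>post with a new comment</html>")
print(cache.compressions)
```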

      --
      Opera, Proxomitron-Grypen,GPG 0x0A1C6EE3
    26. Re:How much? If everyone GZipped, a lot less! by Anonymous Coward · · Score: 0
      "...less if everyone would take the 5 minutes required to implement GZip compression on their Apache servers. It saves you bandwidth, it speeds up your site for users (especially those on dialup)

      Even when they have the modem compression enabled (something that's been available at least since V.32)?

      I wonder how available modem compression really is. Consider this brief timeline of Internet connectivity:

      Then: ISA (16-bit) hardware modems with compression (what I use)

      Now: broadband (not for my rural ass)

      Between then and now: $5 PCI WinModems. No compression because paying royalties would've added too many pennies to the cost. (Yes, I know the LZW patent is now expired.)

      Incidentally, it would not surprise me to learn that the creator of yEnc had just such a cheap, compression-less WinModem.

    27. Re:How much? If everyone GZipped, a lot less! by 0x20 · · Score: 1

      You're missing what is meant by "gzip". The above posters are referring to on-the-fly gzip compression of the http stream by the web server as the pages are requested. They're not talking about gzipping the individual files and offering them for download that way.

      Anyone running Apache can install mod_gzip, which compresses the served content and sends it to the browser, which decompresses and renders it. For further info, see this rather old article.

    28. Re:How much? If everyone GZipped, a lot less! by pediddle · · Score: 1

      Well, the GP raises an interesting question. Is there (or should there be) an implementation of mod_gzip which caches the compressed copies?

      On that note, using a standard web cache like Squid which supports compression in front of your webserver could solve the same problem.

    29. Re:How much? If everyone GZipped, a lot less! by SmittyTheBold · · Score: 1

      *Implementing* GZip takes a hell of a lot longer than 5 minutes :)

      Nah, dude, zlib. Two minutes, max.

      =P

      --
      ± 29 dB
    30. Re:How much? If everyone GZipped, a lot less! by TCM · · Score: 1

      Why do you automatically assume that I _have_ extra CPU cycles?

      --
      Of course it runs NetBSD. BTC: 1NT7QvbetmANwaMzhpVL6
    31. Re:How much? If everyone GZipped, a lot less! by xanthan · · Score: 1

      Then use a compression accelerator like NetScaler (www.netscaler.com). I've seen it used on similarly large sites to handle load balancing and acceleration. The servers don't have to do squat and you don't have to change the app. For large enough sites, the savings should more than make up for the cost of the box.

    32. Re:How much? If everyone GZipped, a lot less! by TooncesTheCat · · Score: 1

      Quit being semantic. Doesn't matter if you're using 99% of your cycles, or 3%.

      It's not going to cost you more money.

    33. Re:How much? If everyone GZipped, a lot less! by GodGell · · Score: 1

      i'll take a look into zlib. but i imagine even without having to implement gzip it's not as easy as the original parent says. :)

      --
      [SHOW SOME LENIENCY TOWARDS ... I mean, FUCK BETA] Eat. Survive. Reproduce. GOTO 10
    34. Re:How much? If everyone GZipped, a lot less! by uid8472 · · Score: 1

      If anything supported LZO, that might help with the CPU usage.

    35. Re:How much? If everyone GZipped, a lot less! by municio · · Score: 1

      I think your problems have more to do with the choice of your servers. While buying Xeons from Dell might seem an economic choice, in the long run it's not.

      You should consider migrating to Opteron chips. Power consumption and heat are not such big problems for Opterons. In the long run you will also save a lot of cash in AC/electricity costs and even get a significant performance improvement.

    36. Re:How much? If everyone GZipped, a lot less! by Anonymous Coward · · Score: 0

      Hi All,

      Using mod_deflate (the Apache 2 version of mod_gzip) is a piece of cake. You have to add virtually nothing to your httpd.conf file, and Fedora Core 4 (and maybe earlier versions) does it by default.

      ---First, in the LoadModule section:
      LoadModule deflate_module modules/mod_deflate.so

      ---Second, in your virtual hosts:
      <Directory /var/www/html/whatever/>
      SetOutputFilter DEFLATE
      BrowserMatch ^Mozilla/4 gzip-only-text/html
      BrowserMatch ^Mozilla/4\.0[678] no-gzip
      BrowserMatch \bMSI[E] !no-gzip !gzip-only-text/html
      SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png|zip)$ no-gzip dont-vary
      Header append Vary User-Agent env=!dont-vary
      </Directory>

      ---That's it, you're now gzipping your pages but not messing around trying to compress already compressed images and zip files. Test it with the link in the parent...

      Now, what was hard about that?
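A small Python sketch for checking that the configuration above took effect: request a page with Accept-Encoding: gzip and look at Content-Encoding. Python's urllib does not decompress responses itself, so the byte count is the size actually sent over the wire. The example URL is a placeholder.

```python
import gzip
import urllib.request


def summarize(encoding: str, raw: bytes) -> dict:
    """Compare on-the-wire size with the decoded page size."""
    decoded = gzip.decompress(raw) if encoding.lower() == "gzip" else raw
    return {"encoding": encoding, "wire_bytes": len(raw), "page_bytes": len(decoded)}


def check_gzip(url: str) -> dict:
    """Fetch url asking for gzip and report what the server actually sent."""
    req = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(req) as resp:
        encoding = resp.headers.get("Content-Encoding", "identity")
        raw = resp.read()
    return summarize(encoding, raw)


# Example (needs network access; the host is a placeholder):
# print(check_gzip("http://www.example.com/"))
```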

    37. Re:How much? If everyone GZipped, a lot less! by tsm_sf · · Score: 1

      How many colocation providers or dedicated server providers charge you for electricity?

      I'm sure they pass the cost of electricity on to you, the customer. If their bills start going up I'd bet they raise your rates.

      --
      Literalism isn't a form of humor, it's you being irritating.
    38. Re:How much? If everyone GZipped, a lot less! by TooncesTheCat · · Score: 2, Funny

      God you really are trying to argue semantics on a fucking moot point.

      I'm too tired to explain to you how retarded that comment is in the context of a multi-million dollar business like a datacenter. You think they care if you are using 30 more Watts of electricity, which doesn't equate to them having an extra 100 dollars on their power bill? They don't care / would never raise rates because of their power bill....They only raise rates when bandwidth availability / rackspace becomes a premium or their demand goes up. Not because of something as trivial as your 30 extra Watts of power being used because you're using Gzip.

      And you're acting like Gzip would be maxing your CPU out 99% of the time.

      People that argue semantics piss me off.

    39. Re:How much? If everyone GZipped, a lot less! by TooncesTheCat · · Score: 1

      2:57:36 AM: yes
      2:57:44 AM: it happens, power is getting expensive
      2:58:21 AM: I'm saying though...
      2:58:27 AM: I mean for individual servers
      2:58:36 AM: not for the individual servers, but usually for everyone
      2:58:44 AM: Well here's what I mean
      2:58:46 AM: Well I'm posting on slashdot about using mod_gzip for less bandwidth usage....and some guy replied back saying that using mod_gzip increases CPU cycles, I argued back that who cares....using extra CPU cycles doesn't cost you money from a datacenter...bandwidth does...he counters back with a stupid argument saying that using more CPU cycles uses more electricity on the server...which equates to money, to which I replied no datacenters ever charge you for electricity / overages on power....he replies back that
      "I'm sure they pass the cost of electricity on to you, the customer. If their bills start going up I'd bet they raise your rates."
      2:58:49 AM: though some that can measure power usage can do that, but i think that's just an excuse
      2:59:18 AM: DCs never charge for you using your full power / wattage of your PSU
      2:59:18 AM: It's an averaging-out thing
      2:59:30 AM: nobody is going to invest in equipment to measure the power of each server, it simply is stupid
      2:59:55 AM: He is trying to argue semantics....saying that using mod_gzip will lead to a whole entire datacenter raising its rates on the whole
      3:00:09 AM: Or just for you (but no DC has ever charged to my knowledge on a per user basis)
      3:00:21 AM: It's a stupid argument
      3:00:26 AM: Yea I know :/
      3:00:32 AM: the electricity use is simply insignificant
      3:00:38 AM: Exactly

      -------

      Straight from a friend of mine who works at a prominent datacenter in California.

      BAM

    40. Re:How much? If everyone GZipped, a lot less! by nacturation · · Score: 1

      Microsoft IIS caches the gzipped versions of static files and only recompresses when the original changes.

      --
      Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.
    41. Re:How much? If everyone GZipped, a lot less! by JPDeckers · · Score: 2, Interesting
      Another nice and strange problem is that IE totally ignores ETag headers on gzipped pages (it does not send an If-None-Match header back).
      So effectively IE requests each and every page again if it's gzipped.

      Nice to know that this bandwidth-reduction solution has the opposite effect...
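For reference, a sketch of the server side of ETag validation when a page is available both plain and gzipped. The two variants are different bytes, so here each gets its own ETag (a hash of the bytes sent). Setting mtime=0 keeps the gzip output, and therefore its ETag, stable between requests. This does not work around the IE behaviour described above, which happens on the client side; it only shows what a server is expected to answer.

```python
import gzip
import hashlib
from typing import Optional

# Simplified: a single If-None-Match value, no weak validators or "*".


def etag(body: bytes) -> str:
    """Strong ETag derived from the exact bytes being sent."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def respond(page: bytes, wants_gzip: bool, if_none_match: Optional[str]):
    """Return (status, headers, body) for one request."""
    body = gzip.compress(page, mtime=0) if wants_gzip else page
    tag = etag(body)
    headers = {"ETag": tag, "Vary": "Accept-Encoding"}
    if if_none_match == tag:
        return 304, headers, b""  # client's copy is current
    if wants_gzip:
        headers["Content-Encoding"] = "gzip"
    return 200, headers, body
```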

      See my blog for more info.

    42. Re:How much? If everyone GZipped, a lot less! by tsm_sf · · Score: 2, Funny

      yah, I see that:

        1) you're trying to have a conversation about two separate topics w/ 2 separate people
        2) you've mixed up both the topics and the people already
        3) you've replied to your OWN posts when you meant to reply to someone else's
        4) you really like the word 'semantics'

      Have to say that I'm really enjoying the fact that you work in IT but get pissed off by ppl arguing over linguistics. The irony is maxing my CPU out.

      --
      Literalism isn't a form of humor, it's you being irritating.
    43. Re:How much? If everyone GZipped, a lot less! by uhlume · · Score: 1

      "People that argue semantics piss me off."

      I do not think that word means what you think it means.

      (Oh, the irony.)

      --
      SIERRA TANGO FOXTROT UNIFORM
    44. Re:How much? If everyone GZipped, a lot less! by baadger · · Score: 1

      I think you can achieve this using the AddEncoding directive for .gz files on Apache (together with MultiViews).

      Just have your web application gzip its output to a file with a .gz extension when it's done, then have Apache serve .gz files with priority over .html files... although I haven't tried it yet.

      This assumes of course your web application is not dynamic at runtime on a visitor to visitor basis.
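A sketch of the publish step that suggestion relies on: write page.html and page.html.gz side by side when a post is published. How the web server is told to prefer the .gz copy is configuration not shown here. As noted, this only fits pages that are the same for every visitor.

```python
import gzip
import shutil
import tempfile
from pathlib import Path

# Writes both variants; server-side selection of the .gz file is not shown.


def publish(out_dir: Path, name: str, html: str) -> tuple[Path, Path]:
    """Write name and name.gz into out_dir and return both paths."""
    plain = out_dir / name
    plain.write_text(html, encoding="utf-8")
    gz = out_dir / (name + ".gz")
    with plain.open("rb") as src, gzip.open(gz, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return plain, gz


out = Path(tempfile.mkdtemp())
plain, gz = publish(out, "index.html", "<html><body>New post</body></html>")
print(plain.stat().st_size, gz.stat().st_size)
```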

    45. Re:How much? If everyone GZipped, a lot less! by cbreaker · · Score: 1

      I agree with you - someone trying to say electricity is a factor *at all* is just ignorant.

      I think mod_gzip is great, and I think the benefits outweigh the costs in all regards. We're not alone; I've found that a LOT of sites are using gzip now.

      Not to mention the fact that the client/browser computers are going to get pages faster.

      But, man.. the chat log with some dude that works wherever (and you can't tell who's talking - no names) needs to go. You don't need to continuously prove a point that IS correct. Being correct is enough =)

      --
      - It's not the Macs I hate. It's Digg users. -
    46. Re:How much? If everyone GZipped, a lot less! by Bob+Uhl · · Score: 1
      Your page doesn't say how to use mod_gzip (or mod_deflate, the Apache 2.0 version). I figured it out, though, and here's what to do for mod_deflate:
      LoadModule deflate_module modules/mod_deflate.so

      <Location />
      SetOutputFilter DEFLATE

      # Netscape 4.x has some problems...
      BrowserMatch ^Mozilla/4 gzip-only-text/html

      # Netscape 4.06-4.08 have some more problems
      BrowserMatch ^Mozilla/4\.0[678] no-gzip

      # MSIE masquerades as Netscape, but it is fine
      BrowserMatch \bMSI[E] !no-gzip !gzip-only-text/html

      # don't compress images (already compressed)
      SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png)$ no-gzip dont-vary
      # don't compress PDFs
      SetEnvIfNoCase Request_URI \.pdf$ no-gzip dont-vary
      # don't compress already-compressed files
      SetEnvIfNoCase Request_URI \.(?:exe|t?gz|zip|bz2|sit|rar)$ no-gzip dont-vary

      # Make sure proxies don't deliver the wrong content
      Header append Vary User-Agent env=!dont-vary
      </Location>

      The only thing I can't figure out is the appropriate IfModule line--mod_deflate.c and deflate.c don't seem to work properly.

    47. Re:How much? If everyone GZipped, a lot less! by Madd+Scientist · · Score: 1

      and why did you colo 10 boxes instead of just 1? probably because you ran out of CPU cycles on the rest. so your costs went up 10 times because of more boxes needed. more boxes = more space = more costs. and more heat and more electricity = more fans = more electricity = more heat. if you can't understand even a very simple economic system, then don't respond to my genius insight.

    48. Re:How much? If everyone GZipped, a lot less! by Madd+Scientist · · Score: 1

      i have dealt with ISPs that charge you a fee based on the % CPU time your user on the shared box used. so yes, CPU cycles do directly = more cost in some contracts.

    49. Re:How much? If everyone GZipped, a lot less! by Madd+Scientist · · Score: 1

      (Oh, the semantics.)

    50. Re:How much? If everyone GZipped, a lot less! by Madd+Scientist · · Score: 1

      semantics cost everyone money

    51. Re:How much? If everyone GZipped, a lot less! by Madd+Scientist · · Score: 1
      these are the kind of insights that bosses are infuriated by.

      "no, we can't store 2 copies... then we'll have a sync issue... no no no, we have to decompress realtime to browsers that don't accept gzip"

      "uh sir, do you know what 'realtime' even means? and that is just stupid."

      "JOHNSON!!#^%!#%!#^ GRRRRRR#@^%#^#^@!#^@"

    52. Re:How much? If everyone GZipped, a lot less! by Madd+Scientist · · Score: 1
      using mod_rewrite we were able to send the right header to solve this problem, but we were using the apache as a file server too and we couldn't modify the content type header for some reason if the content could have changed for that URL.

      so again, i wouldn't say all Apache servers would have this problem either... but some would.

    53. Re:How much? If everyone GZipped, a lot less! by Madd+Scientist · · Score: 1
      FROM http://www.aplushosting.com/english/policies-misuse-use.phtml... CPU cycles do cost money. that was just the first result searching for "shared hosting CPU cost"

      CPU/RAM usage

      The CPU/RAM consumption of a web site may in no way affect the performance of a server. This means that the implementation of CGIs on a web site must be carefully studied by the account's owner.

      Examples of what can cause high CPU/RAM usage are: CGIs that handle too much data from text files. An example of this is a message board. The UBB (Ultimate Bulletin Board) is well known to consume very high amounts of CPU anytime it has to handle its flat file database (like when someone posts a message). Perl scripts constantly executed consume a lot of server resources, since the Perl interpreter must be launched each time the script is called. Scripts with long loops. Scripts that are constantly calling a database and doing complex queries.

      We reserve the right to disable any CGI script that affects normal shared server operation without notice.

      Web sites with excessive traffic will consume plenty of CPU and RAM, since the webserver has to handle more. Any web site with more than 140,000 hits per day will be considered a high traffic web site. Special prices apply in such case. The user will be alerted if his/her site goes beyond such limit, and will be given 3 days to either upgrade his/her plan to accommodate the added traffic or to lower the traffic. If not, the account's traffic consumption will be controlled, or, the account might get cancelled (sudden peaks in traffic are usually due to the mirroring of large files, which are against policies and usually result in cancellation).

      If a script (and its instances) is found to be consuming too much CPU power (more than 5%) and/or RAM, and it's affecting the overall performance of a server, A+ Hosting has the right to move/block the script, and warn the account's owner. The warning will consist of a request to improve the performance of the script, or to replace it with a more efficient one. If the account's user pays no attention to a warning (or several ones) and puts the script back to operation, A+ Hosting has the right to again block the script and/or change the account's password until the user complies. If, after that, the user still pays no attention to the warning and puts the script back, A+ Hosting will delete the account. No refunds for the current month of service will be given (advanced payments will be refunded) and a $35 clean up fee might be charged.

    54. Re:How much? If everyone GZipped, a lot less! by uhlume · · Score: 1

      (Yes, precisely.)

      --
      SIERRA TANGO FOXTROT UNIFORM
    55. Re:How much? If everyone GZipped, a lot less! by rtaylor · · Score: 1

      You're not serving gzipped stuff to save space, you're serving it to save bandwidth.

      Actually, I am. I didn't feel like getting more disk space for infrequently accessed material.

      --
      Rod Taylor
    56. Re:How much? If everyone GZipped, a lot less! by magefile · · Score: 1

      Why not have a script watching the file to see when it changes, and update the gzipped stuff then?
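      magefile's suggestion could look something like this minimal Python sketch: a polling loop that re-compresses a file only when its modification time is newer than the existing `.gz` copy. The `.gz`-next-to-the-original naming, the helper names, and the polling interval are assumptions; how the web server is then configured to hand the precompressed copy to gzip-capable clients is not shown.

```python
import gzip
import os
import shutil
import time

def refresh_gzip_copy(path: str) -> bool:
    """Regenerate path + '.gz' if it is missing or older than path.

    Returns True if a new compressed copy was written.
    """
    gz_path = path + ".gz"
    if os.path.exists(gz_path) and os.path.getmtime(gz_path) >= os.path.getmtime(path):
        return False  # compressed copy is already up to date
    tmp_path = gz_path + ".tmp"
    with open(path, "rb") as src, gzip.open(tmp_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # Swap the finished file into place so readers never see a half-written .gz.
    os.replace(tmp_path, gz_path)
    return True

def watch(paths, interval_seconds: float = 5.0) -> None:
    """Hypothetical polling loop: check each file every few seconds, forever."""
    while True:
        for p in paths:
            refresh_gzip_copy(p)
        time.sleep(interval_seconds)
```

Polling keeps the sketch portable; an OS file-change notification mechanism would avoid the periodic `stat` calls, at the cost of platform-specific code.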

  3. Bandwidth wasted for non-xhtml pages? by bdigit · · Score: 5, Interesting

    How much bandwidth is /. wasting every month by not creating a standard xhtml page, even though someone created one for them already?

    1. Re:Bandwidth wasted for non-xhtml pages? by llZENll · · Score: 2, Informative

      Answer: Not enough to justify the cost to do it. Which goes to show you that if a site as popular as slashdot can't save money doing this, no other site on the net should bother converting to xhtml, economically speaking of course.

      "Though a few KB doesn't sound like a lot of bandwidth, let's add it up. Slashdot's FAQ, last updated 13 June 2000, states that they serve 50 million pages in a month. When you break down the figures, that's ~1,612,900 pages per day or ~18 pages per second. Bandwidth savings are as follows:

      Savings per day without caching the CSS files: ~3.15 GB bandwidth
      Savings per day with caching the CSS files: ~14 GB bandwidth
      Most Slashdot visitors would have the CSS file cached, so we could ballpark the daily savings at ~10 GB bandwidth. A high volume of bandwidth from an ISP could be anywhere from $1 - $5 cost per GB of transfer, but let's calculate it at $1 per GB for an entire year. For this example, the total yearly savings for Slashdot would be: $3,650 USD!"
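      The arithmetic in that quote is easy to check. A minimal Python sketch, using only the figures quoted above; the 31-day month is an assumption chosen because it reproduces the quoted ~1,612,900 pages per day:

```python
# Inputs quoted in the comment above.
PAGES_PER_MONTH = 50_000_000
DAYS_PER_MONTH = 31        # assumption: matches the quoted ~1,612,900 pages/day
SAVED_GB_PER_DAY = 10      # the "ballpark" daily savings
USD_PER_GB = 1             # low end of the quoted $1-$5 per GB range

pages_per_day = PAGES_PER_MONTH / DAYS_PER_MONTH
pages_per_second = pages_per_day / 86_400
yearly_savings_usd = SAVED_GB_PER_DAY * 365 * USD_PER_GB

print(f"{pages_per_day:,.0f} pages/day, {pages_per_second:.1f} pages/s")
print(f"${yearly_savings_usd:,} per year")
```

The per-second figure comes out nearer 18.7 than 18, which does not change the conclusion; the $3,650 total follows directly from 10 GB x 365 days x $1.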

    2. Re:Bandwidth wasted for non-xhtml pages? by A+beautiful+mind · · Score: 4, Interesting

      Normally you would be right, but now you're pushing at an open door. CmdrTaco and others are actively working on a new CSS-based formatting of slashdot.

      --
      It takes a man to suffer ignorance and smile
      Be yourself no matter what they say
    3. Re:Bandwidth wasted for non-xhtml pages? by A+beautiful+mind · · Score: 4, Interesting

      Oh yeah, here is the link about it.

      --
      It takes a man to suffer ignorance and smile
      Be yourself no matter what they say
    4. Re:Bandwidth wasted for non-xhtml pages? by Bogtha · · Score: 2, Informative

      It has absolutely sod-all to do with XHTML. HTML 4.01 and XHTML 1.0 are functionally identical. You can use table layouts and <font> elements with XHTML 1.0 and you can use CSS with HTML 4.01.

      You are referring to separating the content and the presentation through the use of stylesheets. This has nothing to do with XHTML, although it would save a hell of a lot of bandwidth if Slashdot implemented it. They are implementing it.

      --
      Bogtha Bogtha Bogtha
    5. Re:Bandwidth wasted for non-xhtml pages? by Isldeur · · Score: 1

      How much bandwidth is /. wasting every month by not creating a standard xhtml page, even though someone created one for them already?

      and from here.

      Ask an IT person if they know what Slashdot's tagline is and they'll reply, "News for Nerds. Stuff that Matters." Slashdot is a very prominent site, but underneath the hood you will find an old jalopy that could benefit from a web standards mechanic.

      This is going to sound like a flame, and it isn't meant to be one. But it seems obvious at this point that the people running Slashdot have made their money, or grown tired of it, and have moved on to other things. Just look at the duplicates or the fact that the site really hasn't developed at all for years.

      One would think this would be one of the sites that experiment with web technologies. I just don't see it anymore.

    6. Re:Bandwidth wasted for non-xhtml pages? by Anonymous Coward · · Score: 0

      Answer: Not enough to justify the cost to do it. Which goes to show you that if a site as popular as slashdot can't save money doing this, no other site on the net should bother converting to xhtml, economically speaking of course.

      Does that mean that if the Army is still using NT3 somewhere, that everybody should use NT3? Just because some organization is big doesn't mean they're right.

    7. Re:Bandwidth wasted for non-xhtml pages? by sootman · · Score: 1

      Just look at... the fact that the site really hasn't developed at all for years.

      Whaddya mean? Haven't you seen the glorious new IT color scheme?

      --
      Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
    8. Re:Bandwidth wasted for non-xhtml pages? by burtdub · · Score: 0

      How much bandwidth is wasted for duplicate /. articles?

    9. Re:Bandwidth wasted for non-xhtml pages? by Anonymous Coward · · Score: 0

      to answer your questions,
      Yes and no.

      but you didn't hear that from me, actually I think I'll check that anonymous box

    10. Re:Bandwidth wasted for non-xhtml pages? by oh_bugger · · Score: 2, Funny

      A more serious question is how much bandwidth /. is wasting by hosting the large quantity of duped articles

      --
      Go home and shave your giant head of smell with your bad self
    11. Re:Bandwidth wasted for non-xhtml pages? by Ulrich+Hobelmann · · Score: 1

      So what? The slashcode link above (alistapart) exists TODAY and it looks really good (light-years better than most other forums on the web).

      Why wait another ten years for Taco and others to do some NIH?

    12. Re:Bandwidth wasted for non-xhtml pages? by terpri · · Score: 0

      So, considering that it's already been done for them, does this mean that CmdrTaco is going to be duping Slashdot's layout? =P

    13. Re:Bandwidth wasted for non-xhtml pages? by elemental23 · · Score: 1

      Probably because slashcode has years of cruft and mixed presentation and logic built into it. Not to take anything away from the AListApart guys -- I think they did a great job -- but they're working with a static, already-generated page. That's orders of magnitude easier than rewriting an application with HTML tags scattered all through it.

      That said, Slashcode is using the new templates and it's a testing ground for stuff that is coming to this site. CmdrTaco spoke at Linuxworld about the upcoming conversion which should be happening Real Soon Now.

      --
      I like my women like my coffee... pale and bitter.
    14. Re:Bandwidth wasted for non-xhtml pages? by fbjon · · Score: 1

      Wait, do you mean bandwidth would be saved by using XHTML, or by separating content and style?

      --
      True confidence comes not from realising you are as good as your peers, but that your peers are as bad as you are.
  4. Slashdot? by djsmiley · · Score: 4, Insightful

    "And more importantly, with 9M posts, what percentage of them have any real value, and how do busy people find that .001%?"

    On slashdot.... Oh wait....

    --
    - http://www.milkme.co.uk
    1. Re:Slashdot? by Propaganda13 · · Score: 2, Funny

      If you'd check out my blog, you could read about the blogs I've read today thus saving yourself a lot of time.

    2. Re:Slashdot? by Anonymous Coward · · Score: 0

      for slashdot, slashmash is at 17.38% at this time

  5. busy people read 9000 blogs per day?? by Anonymous Coward · · Score: 1, Insightful

    9M*0.001 = 9000...

    1. Re:busy people read 9000 blogs per day?? by Anonymous Coward · · Score: 0

      0.001% of 9M is 90 you idiot.

    2. Re:busy people read 9000 blogs per day?? by Erno_Rubaiyat · · Score: 1

      As is pointed out in another post, the article lists 900k posts per day, and you forgot two decimal places: .001% is .00001 * 900k = 9 blog posts per day, which is actually a little low but right in the ballpark.

      j

    3. Re:busy people read 9000 blogs per day?? by cyberfunk2 · · Score: 2, Informative

      First, as some AC points out, 0.001 PERCENT of 9 million is 90.

      Secondly, those would be posts; I'm assuming the intelligent stuff tends to be not in 90 separate posts, but in multiple intelligent posts from the same person.

      Third, since the original poster somehow messed up and cited the number 9 million instead of the correct number, 900,000, that number is reduced to 9 posts a day, a reasonable amount to read.

    4. Re:busy people read 9000 blogs per day?? by cyberfunk2 · · Score: 1

      Erm, I meant "not in 90 separate blogs", not "90 separate posts".
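      The sticking point in this sub-thread is that "0.001%" means a fraction of 0.00001, not 0.001. A quick Python check with the daily figures that come up in the discussion (the 9M quoted from the summary, PubSub's 1.8M, and Technorati's 900k); `percent_of` is just an illustrative helper:

```python
def percent_of(percent: float, total: float) -> float:
    """Return `percent` percent of `total` (0.001 means 0.001%, i.e. a fraction of 1e-5)."""
    return total * percent / 100

for posts_per_day in (9_000_000, 1_800_000, 900_000):
    print(f"0.001% of {posts_per_day:,} = {percent_of(0.001, posts_per_day):g} posts")
```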

  6. Easy by gonaddespammed.com · · Score: 0

    Visit /.

  7. 900k a day, not 9m by Anonymous Coward · · Score: 2, Informative

    order of magnitude out there, fella... better try again with this newfangled "math" stuff

  8. Don't forget the robots by astrashe · · Score: 4, Interesting

    I used to have a blog that I recently shut down because no one read it.

    No one read it, but I got a ton of hits -- all from indexing services. WordPress pings a service that lets lots of indexing systems know about new posts. Some of them -- Yahoo, for example -- were constantly going through my entire tree of posts, and hitting links for months, subjects, and so on.

    It didn't bother me, because the bandwidth wasn't an issue, and it wasn't like they were hammering my VPS or anything. It mostly just made it really hard to read the logs, because finding human readers was like looking for a needle in a haystack.

    But bandwidth is cheap, and RSS is really useful, so it seems at least as good of a use for the resource as p2p movie exchanges.

    1. Re:Don't forget the robots by croddy · · Score: 1
      I think this anecdote might provide a good idea of how many of those blog posts are actually useful.

      Almost none.

      Don't worry about it, guys. If people ever start clamoring for MORE blog posts, you'll know.

    2. Re:Don't forget the robots by Bradmont · · Score: 1

      robots.txt?
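      For the archive-crawling problem described above, a robots.txt along these lines might help. The `/date/` and `/category/` paths are hypothetical stand-ins for a blog's month and subject archive URLs, and `Crawl-delay` is a non-standard extension that not every crawler honors. Python's standard `urllib.robotparser` can be used to check how a compliant crawler would read the rules:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: allow the posts themselves, keep crawlers out of
# the date and category archive pages, and ask for a pause between requests.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /date/
Disallow: /category/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in ("http://example.com/2005/08/my-post/",
            "http://example.com/date/2005/08/",
            "http://example.com/category/misc/"):
    print(url, "allowed" if parser.can_fetch("Slurp", url) else "blocked")
print("crawl delay:", parser.crawl_delay("Slurp"))
```

The archive pages are only duplicate views of the posts, so excluding them should not cost the blog any search coverage, assuming the individual post URLs stay crawlable.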

    3. Re:Don't forget the robots by lukewarmfusion · · Score: 2, Informative

      Are you saying that you read the logs directly/manually?

      See AWStats
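      If a full analyzer is more than you want, a rough Python sketch can pull the probable human hits out of an Apache "combined" format log by skipping user agents that look like crawlers. The marker list is a deliberately crude, hypothetical one; real log analyzers maintain much longer curated lists, and some bots disguise their user agent entirely.

```python
import re
from collections import Counter

# Apache "combined" log format: the user agent is the last double-quoted field.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)
# Hypothetical crawler markers, matched case-insensitively as substrings.
BOT_MARKERS = ("bot", "crawl", "spider", "slurp", "feedfetcher")

def human_hits(lines):
    """Yield (host, request) for log lines whose user agent doesn't look like a bot."""
    for line in lines:
        m = LOG_LINE.match(line.rstrip("\n"))
        if not m:
            continue  # skip lines that aren't in combined format
        agent = m.group("agent").lower()
        if any(marker in agent for marker in BOT_MARKERS):
            continue
        yield m.group("host"), m.group("request")

sample = [
    '1.2.3.4 - - [10/Aug/2005:13:55:36 -0700] "GET /2005/08/post/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1)"',
    '5.6.7.8 - - [10/Aug/2005:13:56:01 -0700] "GET /date/2005/08/ HTTP/1.0" 200 9000 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp)"',
]
print(Counter(request for _, request in human_hits(sample)))
```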

    4. Re:Don't forget the robots by doktor-hladnjak · · Score: 2, Insightful

      Who says a whole lot of people need to read your blog? Only a small handful of friends read mine, mostly people I live far away from. It's a weirdly indirect way of keeping in touch with those people (I read theirs, they read mine). Still, I find my blog to be more of a diary to keep track of things that happen in my life for my own personal purposes more than anything else.

    5. Re:Don't forget the robots by Anonymous Coward · · Score: 0

      Who says a whole lot of people need to read your blog?

      Still, I find my blog to be more of a diary to keep track of things that happen in my life for my own personal purposes more than anything else.

      I'd have to agree about the numbers. I actually took my entire site, along with its blog, offline recently after having been online since around '97. I took it offline, but the site and blog still live on and are used almost daily on a local setup of Apache/PHP/MySQL etc.

      My readership has gone from a few hundred unique hits a day to just me, but it's still as useful as it ever was, if not more so. Like you, I used the blog mainly to keep track of day-to-day activities, and with the site offline and access restricted to just myself, I was much more inclined to do so.

      And while sure, there are programs to record notes and write to-do lists without the memory footprint of Apache, MySQL, and etc... it allows me the ease of use, and ease of expansion involved with HTML, PHP, and etc.

  9. Rather than assuming... by llZENll · · Score: 5, Interesting

    Rather than making all these assumptions, why not just email Bob Wyman and ask him?

    "How much data is this? If we assume that the average HTML post is 150K this will work out to about 135G. Now assuming we're going to average this out over a 24 hour period (which probably isn't realistic) this works out to about 12.5 Mbps sustained bandwidth.

    Of course we should assume that about 1/3 of this is going to be coming from servers running gzip content compression. I have no stats WRT the number of deployed feeds which can support gzip (anyone have a clue?). My thinking is that this reduces us to about 9Mbps, which is a bit better.

    This of course assumes that you're not fetching the RSS and just fetching the HTML. The RSS protocol is much more bloated in this regard. If you have to fetch 1 article from an RSS feed you're forced to fetch the remaining 14 additional posts that were in the past (assuming you're not using the A-IM encoding method, which is even rarer). This floating window can really hurt your traffic. The upside is that you have to fetch less HTML.

    Now let's assume you're only fetching pinged blogs and you don't have to poll (polling itself has a network overhead). The average blog post would probably be around 20k, I assume. If we assume the average feed has 15 items, only publishes one story, and has a 10% overhead, we're talking about 330k per fetch of an individual post.

    If we go back to the 900k posts per day figure we're talking a lot of data - 297G most of which is wasted. Assuming gzip compression this works out to 27.5Mbps.

    That's a lot of data and a lot of bloat which is unnecessary. This is a difficult choice for smaller aggregator developers, as this much data costs a lot of money. The choice comes down to cheap HTML indexing with the inaccuracy that comes from HTML, or accurate RSS which costs 2.2x more.

    Update: Bob Wyman commented that he's seeing 2k average post size with 1.8M posts per day. If we are to use the same metrics as above this is 54G per day or around 5Mbps sustained bandwidth for RSS items (assuming A-IM differentials aren't used)."
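    The quoted estimates can be reproduced in a few lines of Python. This sketch assumes "K" and "G" are decimal units (1e3 and 1e9 bytes) and that traffic is spread evenly over 24 hours, as the quote does. Worked through this way, 27.5 Mbps is what the uncompressed 297G comes to, so gzip would presumably bring the RSS figure lower; Wyman's 54G matches 1.8M x 2k x 15 items without the 10% overhead.

```python
SECONDS_PER_DAY = 86_400

def sustained_mbps(bytes_per_day: float) -> float:
    """Average rate in decimal megabits/s if traffic were spread evenly over 24 hours."""
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1e6

html_daily = 900_000 * 150e3          # 900k posts x 150K of HTML each
rss_fetch = 20e3 * 15 * 1.10          # 15 items of 20K each, plus 10% overhead
rss_daily = 900_000 * rss_fetch
wyman_daily = 1_800_000 * 2e3 * 15    # Wyman: 1.8M posts/day, 2K each, 15-item window

for label, daily in (("HTML", html_daily), ("RSS window", rss_daily), ("Wyman RSS", wyman_daily)):
    print(f"{label:10s} {daily / 1e9:6.1f} GB/day  {sustained_mbps(daily):5.1f} Mbps")
print(f"RSS/HTML ratio: {rss_daily / html_daily:.1f}x")
```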

    1. Re:Rather than assuming... by Jesus+IS+the+Devil · · Score: 1

      You're forgetting: most colocation data centers charge you by the 95th percentile. With most of the traffic bunched up during the weekday hours (most likely), the guy is probably paying for many more Mbps than what you're calculating.

      --

      eTrade SUCKS
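      The 95th-percentile point can be illustrated with a short Python sketch. It uses the common convention of 5-minute samples with the top 5% discarded and the highest remaining sample billed; individual providers' exact rules vary, and the traffic pattern below is invented.

```python
def percentile_95(samples_mbps):
    """95th-percentile billing: sort, discard the top 5% of samples, bill the highest remaining."""
    if not samples_mbps:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_mbps)
    index = (95 * len(ordered) + 99) // 100 - 1  # ceil(0.95 * n) - 1 in integer arithmetic
    return ordered[index]

# Hypothetical day of 5-minute samples (288 in total):
# 8 busy hours at 20 Mbps, the other 16 hours at 2 Mbps.
day = [20.0] * 96 + [2.0] * 192
print("average:", sum(day) / len(day), "Mbps")
print("billed (95th percentile):", percentile_95(day), "Mbps")
```

In this made-up case the average is 8 Mbps but the bill is based on 20 Mbps, because the busy period covers far more than 5% of the samples.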
    2. Re:Rather than assuming... by VolciMaster · · Score: 1
      You've also forgotten about caching. The aggregator I use on my website caches feeds whenever it can (Magpie RSS), and only refreshes the feed when the expiration for the feed items has been reached. It substantially reduces the external network load to other servers. Magpie also only checks for new versions when the page is refreshed, not on a set schedule like stand-alone RSS aggregators typically do.

      It's not a perfect solution, but it's better than most alternatives I looked at before going with this system.
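      The caching behavior described above can be sketched in Python, here combined with an HTTP conditional GET (`If-None-Match`/`ETag`) so that even a revalidation of an unchanged feed costs only a 304 response. This is not Magpie's code or API; `FeedCache`, `http_fetch`, and the 30-minute default TTL are illustrative assumptions, and the fetcher is injectable so the logic can be exercised without a network.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Dict, Optional, Tuple

# A fetcher takes (url, etag) and returns (status, body, etag).
Fetcher = Callable[[str, Optional[str]], Tuple[int, Optional[str], Optional[str]]]

def http_fetch(url: str, etag: Optional[str]) -> Tuple[int, Optional[str], Optional[str]]:
    """Conditional GET: send If-None-Match so an unchanged feed costs only a 304."""
    headers = {"If-None-Match": etag} if etag else {}
    request = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(request, timeout=30) as resp:
            body = resp.read().decode("utf-8", "replace")
            return resp.status, body, resp.headers.get("ETag")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib surfaces 304 Not Modified as an HTTPError
            return 304, None, etag
        raise

class FeedCache:
    """Serve feeds from memory until `ttl` seconds pass, then revalidate upstream."""

    def __init__(self, fetch: Fetcher = http_fetch, ttl: float = 1800.0,
                 clock: Callable[[], float] = time.monotonic):
        self.fetch = fetch
        self.ttl = ttl
        self.clock = clock
        self._entries: Dict[str, Tuple[float, str, Optional[str]]] = {}

    def get(self, url: str) -> str:
        now = self.clock()
        cached = self._entries.get(url)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]  # still fresh: no network request at all
        status, body, etag = self.fetch(url, cached[2] if cached else None)
        if status == 304 and cached is not None:
            body, etag = cached[1], etag or cached[2]  # unchanged upstream: reuse cached copy
        self._entries[url] = (now, body, etag)
        return body
```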

  10. 900K, not 9M...RTFA by Anonymous Coward · · Score: 0

    Technorati points out there are 900k blog posts per day:
    You'd think at least the submitter would read the post.

  11. Some Answers by RAMMS+EIN · · Score: 3, Insightful

    ``How Much Bandwidth is Required to Aggregate Blogs?''

    Less than it currently takes, what with pull, HTTP, and XML used instead of more efficient technologies.

    ``what percentage of them have any real value, and how do busy people find that .001%?''

    Using a scoring system, like Slashdot's?

    It's not like all of this is rocket science. It's just that people go along with the hyped technology that's "good enough for any conceivable purpose", ignoring the superior technology that had been invented before and wasn't hyped as much. Nothing new here.

    --
    Please correct me if I got my facts wrong.
    1. Re:Some Answers by Anonymous Coward · · Score: 0

      God you're a genius. Go tell mommy.

  12. 9 M? by cyberfunk2 · · Score: 1

    By which I assume you mean 9 million....

    the cited article discusses volumes of 900k, i.e.: thousands...

    from whence comes this discrepancy ?

    1. Re:9 M? by Subrafta · · Score: 1
      By which I assume you mean 9 million.... the cited article discusses volumes of 900k, i.e.: thousands... from whence comes this discrepancy ?

      What a doof -- he was using the Roman Numeral M which is 1000 so he underestimated by 891,000.

      Never underestimate the value of those Liberal Arts classes...

      --
      Vuja De: That sinking feeling that this is going to happen again. Often occurs in meetings with Product Managers.
    2. Re:9 M? by nacturation · · Score: 1

      from whence comes this discrepancy ?

      Aforementioned discrepancy cometh from thine arse, which be white as the first winter snow.

      --
      Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.
  13. RTFA by Anonymous Coward · · Score: 0

    It's 900k, not 9m. Please, RTFA before hurrying over here to post.

  14. Definition of quality and value == arbitrary by davecrusoe · · Score: 3, Insightful
    And more importantly, with 9M posts, what percentage of them have any real value, and how do busy people find that x%
    Well, the significant percent is probably much larger than you might think. For example, if you aren't a chef, chances are you won't desire to read anything that relates to cooking. So, knock off X% of all blogs. You might not be interested in knitting, so deduct another X%.

    In actuality, my guess is that there are few blogs you might decide to visit, and of those you do, several may have content you find worthwhile. Remember, worthwhile is all in the perception of the reader - there is no real definition for quality or value. Perhaps through trial and error - in essence digital tinkering - you find and derive your own value.

    cheers, --dave
    1. Re:Definition of quality and value == arbitrary by Anonymous Coward · · Score: 0

      Yeah, that's what they meant, but what percentage of those 0.9M posts are and aren't childish drivel? Like most of Xanga for example...

      I would think that's the proper question.

  15. Slashdot = blog = ironic by Lovejoy · · Score: 3, Interesting

    Does anyone else wonder why Slashdot editors seem to have it in for blogs? Is it because in Internet years, Slashdot is as old and sclerotic as the Dinomedia? Is Slashdot the Dinomedia of the new media?

    Does anyone else consider it ironic that the Slashdot editorship HATES blogs, but Slashdot is actually a blog?

    Anyone else getting tired of these questions?

    1. Re:Slashdot = blog = ironic by Kiaser+Wilhelm+II · · Score: 2

      On the contrary, the questions being raised about the quality of blogs are entirely valid.

      The average blog is just some random joe telling us about his day or various bits of intellectual sophistry about things he doesn't understand (politics, science, etc).

      Sorry, quantity != quality. A million monkeys at a million typewriters, only a few of them are producing the works of Shakespeare.

      --
      Lord High Crapflooder The Right Honourable Vlad Craig Esther McDavenpherson III
      Destroyer of Mercatur.Net
  16. 1.69+ Terabytes a Month by HunbunFunland · · Score: 0
    1. Re:1.69+ Terabytes a Month by Anonymous Coward · · Score: 0

      I think this shitty blog is in need of a GNAA crapflood troll.

  17. Answer: Not much by Anonymous Coward · · Score: 4, Funny

    The bandwidth savings from using html+css are hugely exaggerated.

    Slashdot is switching to html+css for the front page, but not for any dynamic pages like the one you're on now. Because slashcode was written by totally incompetent programmers, the markup for comment pages is not separated from the logic. Making any changes is therefore a huge undertaking and the people who wrote it are far too busy maintaining the high journalistic standards slashdot is known for to do it.

  18. That's 900,000 posts by epeus · · Score: 3, Informative

    I run the spiders at Technorati, and it is 0.9 million posts a day, which Kevin Burton had correct in the post cited. Is this the no-dot effect?

    1. Re:That's 900,000 posts by Anonymous Coward · · Score: 0


      Were you the first Apple employee to get fired because of blogging?

      http://epeus.blogspot.com/2003_08_01_epeus_archive.html#106133327387671172

  19. Finding the Worthwhile Content in Blogs by Rob+Carr · · Score: 4, Insightful
    Most blogs are both drivel and worthwhile, depending upon the individual reading them (including mine). They become worthwhile in context.

    If a friend is going through cancer treatment, her blog is worthwhile. If you find a youth group leader like yourself and can learn from his posts, his blog is worthwhile. A mother fighting for her health so that she can take care of her two sons and husband can share insights that are worthwhile. Someone fighting depression might have a worthwhile blog. A grandmother might have a view of the world that makes her blog worthwhile, just to get a different view. Perhaps a blog by someone who totally disagrees with you will be worthwhile, just to stretch your mind.

    I've just described why I read the blogs on my blog roll. You can choose differently.

    Top political blogs? You can find them easily among Technorati's top 100 list. Tags at Technorati will let you pick out specialties like science or "Master Blasters" or diabetes or the Tour de France. Google will turn up blogs if you search right, which is the trick for using Google.

    "Worthwhile" is a much more difficult variable to calculate than "bandwidth." Perhaps it's the sheer variety of blogs that makes them interesting, because they are so individual and someone, somewhere will speak to your mind or your heart.

    Worthwhile is what's worthwhile to you, and maybe to very few others. Not everyone will agree, and that's not a bad thing.

    --
    This sig seemed like a good idea at the time....
    1. Re:Finding the Worthwhile Content in Blogs by Anonymous Coward · · Score: 0

      If you find a youth group leader like yourself and can learn from his posts, his blog is worthwhile.

      Yeah, because those paedophiles need to share their tips and tricks amongst each other after "work".

    2. Re:Finding the Worthwhile Content in Blogs by Rob+Carr · · Score: 1
      Yeah, because those paedophiles need to share their tips and tricks amongst each other after "work"

      It sounds like you've had a terrible experience, and I'm truly sorry to hear that. If that's what happened to you, I hope you were able to get someone to listen to you and get the abuser thrown out of the program and incarcerated. I wish you well in working through what must be a terrible experience.

      I was part of a church committee that worked on policies designed to prevent children being abused. The church worked hard to make sure the children are safe. Background checks are run on leaders. Spot checks are made on classrooms by roving leaders. All classroom doors have windows and must remain unlocked. Leaders must not be alone with an individual student at any time.

      Any accusation of abuse of any kind results in the local police being called immediately and an investigation initiated with full cooperation of the church.

      Pedophiles are a problem. They are attracted to work with youth in the church, scouting programs, and teaching positions. They must be stopped.

      To assume that all who work with children are pedophiles is unfair to the people who care for the wellbeing of these children and are willing to give of their time and their hearts.

      It also makes recruiting volunteers far more difficult and helping children almost impossible. Who wants to volunteer when the immediate assumption is you might be a pervert? Children can no longer be hugged for fear that such contact might be misconstrued. The results of all this can be heartbreaking. Imagine having a child so happy to see you that they run up to give you a hug. As a responsible leader, you must step back and block the attempted hug. The smaller kids don't understand that. They wonder what they did wrong. How do you explain what you just did to a three or four year old?

      It's messed up, but it's what must be done.

      --
      This sig seemed like a good idea at the time....
  20. Do we even understand what "value" means? by MoralHazard · · Score: 1

    with 9M posts, what percentage of them have any real value, and how do busy people find that .001%?

    Either I don't understand this question, or it's a completely idiotic question. What the fuck does "real value" mean? The maxim "One man's trash is another man's treasure" is especially important when talking about information--the asymmetry of value from person to person is even bigger than when you're talking about physical goods.

    Considering the second half of the question, though, one might re-phrase the whole thing as "How do you find the posts that have value to you, individually?" That IS an important question... but like most econ majors, I figure the market will probably solve it.

    1. Re:Do we even understand what "value" means? by Brontosaurus+Jim · · Score: 1

      Yet another fucking blogger that can't even handle making a post without screwing up a tag.

  21. Value by lakin · · Score: 5, Interesting

    what percentage of them have any real value

    I had for a while held the view that most blogs out there are pointless. Some can be insightful and some are basically used as company press releases, but most are people talking about their day's activities that few people really care about, and a few of my friends have blogs like these. When I asked one what's the point, she said she just blogs stuff she would normally mention to many people on msn throughout the day. It's not meant to have value to anyone on slashdot, be hugely insightful, or detail some breathtaking new hack; it's simply another way for her to talk to friends (that doesn't involve repeating herself).

    --
    Paul
    1. Re:Value by C0llegeSTUDent · · Score: 1

      Hate to break it to you, but your friend is a blog-whore.

    2. Re:Value by Anonymous Coward · · Score: 0

      So it's a way of bastardizing interpersonal communications. That's even more sad. :(

    3. Re:Value by phobos13013 · · Score: 1

      In a different time, before the internet, if you had merely changed the word blog to the word mall or some other social place to meet, you would have called them a social-whore. The fact is the 'blogosphere', for all its modern wonders and entrapments, is merely a digital extension of our previous social world. I too for awhile was on about that whole 'oh I'm too cool for a blog because I don't care if nobody cares about my boring life' mentality, but the fact is, call it a diary, call it a blog, call it a social networking node, it's just a natural use of the internet.

      For those that call blogs merely boring drivel with no purpose (oh well, except for the blogs *I* read), those are just the same misanthropes who sat in the back of class heckling everyone (yes i was one too), or who goes to the bar and drinks by himself and doesn't talk to anyone except for the 'regulars' because they are the only REAL patrons there. But whether or not people have embraced it for this reason, blogging or reading blogs is simply a natural extension of sociality. Like the person who, sitting in line at the bank, strikes up a useless conversation with you just to pass the time, or the old guy at the bus stop who rambles away his life story during the long wait. Blogs are healthy and natural no matter how boring or useless the internet l33t think they are.

      And consider this: those that tend to heckle the blogosphere to seem ahead of the cutting edge or what have you of the internet, we can be sure these are the same dcsks we see in IRC trying to seem the most knowledgeable or on the forums trying to talk the loudest, which are, hey, just other forms of social media available on the internet. So quit the blog-slamming and either read them or go back to yr pr0n surfing. the end.

      --
      ...and it should be known by now
    4. Re:Value by Vitriol+Angst · · Score: 1

      Blogs to me are kind of like "your life on a bumper sticker". A shout in the dark, in case people are listening.

      But, in a philosophical sense, in the grand scheme, I think they are like the subconscious of a world mind. If you cross-referenced a lot of these blogs, you could get an idea of how a certain group is thinking. Not so much in what each person is saying, but in what they are looking at, and how they are saying it. Perhaps people distort their thinking when they write it down ... but that is no different than how all of our communication is distorted by whom and how we are communicating what we think. They are doing some predictive research using just this concept on http://www.halfpasthuman.com/ It is pretty interesting and has been strangely better than a few wacko psychics in predicting earthquakes and a power outage in China. Fascinating stuff.

      If everyone had a blog, you might have a better barometer of the world consciousness than anything yet realized. I know that sounds grandiose when talking about some high school girl blogging about the cute guy she saw on the way to school ... but, if you were going to see the beginnings of a world mind, how would it start? If the sampling were more frequent, say from a wireless PDA updating a blog every 30 minutes, and more people had blogs, then add a social situation where people become more forthcoming about what they think... you would start getting a lot of useless chatter that would become more useful in aggregate.

      --
      >>"ad space available -- low rates!!!"
  22. Re:we've tried gzip on our server... by Anonymous Coward · · Score: 1, Insightful

    I call BS. Gzip compresses streams in memory. It can't corrupt your hard drive.

    This reads like a generic troll. "We actually had been using $PRODUCT_NAME for quite a long time on a server at home..."

  23. Rod Serling says by nagora · · Score: 1
    "Remember that nut that sat beside you on the bus? The guy that had a water-powered car but was too afraid to go public in case the oil companies came after him? You remember how glad you were when your stop came and you got off? This morning, though, when you walk into town for the Sunday papers, today, that nut is everyone you meet!

    You've just woken up in....The Blogosphere! De-de-de-de, de-de-de-de.....

    The answer to the article's question is: nothing; there's no point in wading through the output of blogs, so don't bother aggregating it; stick the whole lot in the bin. There, wasn't that easy?

    TWW

    --
    "Encyclopedia" is to "Wikipedia" what "Library" is to "Some people at a bus stop"
    1. Re:Rod Serling says by Anonymous Coward · · Score: 1, Interesting

      that if Google hadn't destroyed Usenet we'd not have all these goofy blogs and millions of people trying to make a dime off of them.

  24. Please, mod... by gardyloo · · Score: 1

    Making any changes is therefore a huge undertaking and the people who wrote it are far too busy maintaining the high journalistic standards slashdot is known for to do it. ...+5, nougat-filled sarcasm.

  25. Judgemental much? by fermion · · Score: 1
    This is just a bunch of numbers spouted, with no useful context, and then a broad statement made about value.

    The days when 9 megabytes or 5 MPS sustained for a popular server was considered out of line are long gone. People want to communicate, and they will use whatever resources are needed. How many resources do we use so that we can guarantee that Tuan will get his present from grandma? How many resources do we use so that an arbitrary firm can mail a postcard to everyone in the country? How many resources do we use so that everyone can keep up with every move of their favorite celebrity?

    As far as figuring out what is of value to a particular person, whatever judgemental figure one wishes to place on it, one can browse, or take the time-tested method of using a proxy. Find a trusted source and pay them to publish all the content that they think is of interest to a particular group. Obviously the busy person does not have time to look everywhere, so the service is of value. And, if you do not wish to actually pay for the service, perhaps the trusted source can convince others to pay in exchange for the opportunity to have control over the content or a portion of a page to promote their particular interest.

    --
    "She's a scientist and a lesbian. She's not going to let it slide." Orphan Black
  26. Wheat from chaff by StikyPad · · Score: 4, Funny

    search query: blog -1337 -teh -kewl -hugz -omg -bored -lol -lmao -"can't wait to get my drivers license"

    1. Re:Wheat from chaff by Rosco+P.+Coltrane · · Score: 5, Funny

      search query: blog -1337 -teh -kewl -hugz -omg -bored -lol -lmao -"can't wait to get my drivers license"

      Ah! I guess you missed the following blog entry then:

      Hi everybody, it's Sunday today and I'm bored. So I guess I'll get on with my homemade engine that runs on water. As you know, it's almost finished, and I expect it to put out as much as 1337 horsepower. The reliability of the motor should be good too: my friend, Ray Kewl in engineering, said it should provide well beyond 10,000 TEH (total engine hours).

      Update: the engine is in the car, and it runs! on nothing but water! OMG I'm so happy! check the pictures and the diagrams to build your own. I can't wait to get my drivers license renewed so I can take it for a spin!

      --
      "A door is what a dog is perpetually on the wrong side of" - Ogden Nash
    2. Re:Wheat from chaff by ebrandsberg · · Score: 2, Insightful

      Yes, but with a post like that, it should end up on Slashdot in a few days anyway... after every news site has posted it a few days earlier.

    3. Re:Wheat from chaff by jpsowin · · Score: 1

      [joke] I wonder why nothing comes up when I search for that in Google? [/joke]

      Wait--real life is more humorous--the GOP is the first listing!

    4. Re:Wheat from chaff by hywel_ap_ieuan · · Score: 1

      Ah! I guess you missed the following blog entry then:

      Hi everybody, it's Sunday today and I'm bored. So I guess I'll get on with my homemade engine that runs on water....

      So the filter works on trolls and scammers as well as lamers? Sweet!

  27. how? by BlackShirt · · Score: 1

    http://slashdot.org/metamod.pl

    is quite useful (if /. = blog)

  28. What about static html pages? by MTO_B. · · Score: 1

    The link you posted explains how to compress php files, but what if my site's server is 98% static html files? Is it possible then?

    thanks

    1. Re:What about static html pages? by leonmergen · · Score: 1

      Yes, just use mod_gzip with apache.

      --
      - Leon Mergen
      http://www.solatis.com
    2. Re:What about static html pages? by tylers · · Score: 1

      The post above mentions mod_gzip - that will work if you're using apache 1. If you're using apache 2, the module you'll need is mod_deflate.

      It's not hard to set up and works well.
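      Roughly what mod_gzip/mod_deflate do for each static response, sketched in Python for illustration (the function names and sample feed are hypothetical, and real modules also honor q-values and stream large files): if the request's Accept-Encoding lists gzip, the body is compressed and labeled Content-Encoding: gzip; otherwise it goes out unchanged.

```python
import gzip

def accepts_gzip(accept_encoding: str) -> bool:
    """Rough Accept-Encoding check; ignores q-values, which real servers honor."""
    codings = [part.split(";")[0].strip().lower() for part in accept_encoding.split(",")]
    return "gzip" in codings

def negotiate_gzip(body: bytes, accept_encoding: str):
    """Return (headers, payload), compressing only when the client allows it."""
    headers = {"Content-Type": "application/rss+xml", "Vary": "Accept-Encoding"}
    if accepts_gzip(accept_encoding):
        payload = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    else:
        payload = body
    headers["Content-Length"] = str(len(payload))
    return headers, payload

# Hypothetical feed: repetitive XML markup is the kind of text gzip shrinks well.
feed = ("<rss><channel>" + "".join(
    f"<item><title>Post {i}</title><link>http://example.com/{i}</link></item>"
    for i in range(50)) + "</channel></rss>").encode("utf-8")

headers, payload = negotiate_gzip(feed, "gzip, deflate")
print(len(feed), "bytes raw ->", len(payload), "bytes gzipped")
```

      Sending Vary: Accept-Encoding tells caches in between that the compressed and uncompressed responses are different variants of the same URL.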

  29. MOD PARENT UP by Anonymous Coward · · Score: 0

    Do it. He's right.

  30. a little math... by digitalderbs · · Score: 1

    clearly, the answer is the number of posts times 6.25.

  31. Re:we've tried gzip on our server... by Anonymous Coward · · Score: 0

    This reads like a generic troll. "We actually had been using $PRODUCT_NAME for quite a long time on a server at home..."

    No shit, Sherlock.

    I call BS. Gzip compresses streams in memory. It can't corrupt your hard drive.

    So why did you have to say this?

  32. The Answer by simpl3x · · Score: 1

    It would make you very rich. Nobody thinking about crap like that! No sir!

    Like most of life, building networks of trust takes time. Aren't issues like this really part of the problem? Charging for bandwidth... My server has something like 100gig of transfer, and unless I get Slashdotted several times a month is this really a problem? And, if I do, why aren't I getting some ads in place to pay for it?

  33. Technorati =900K not 9 million! by bobwyman · · Score: 1

    Technorati only claims to process 900,000 new entries per day - not 9 million. Burton has the number correct in his posting. The /. article quotes him incorrectly. On the other hand, the numbers cited for PubSub are correct. We have processed an average of 1,796,574 (1.8 million) new entries per day over the last 30 days. Many of our statistics are available on our site and updated daily. The "new entries per day" data can be found graphed at: http://www.pubsub.com/linkcounts_graphs.php?type=newentries and for more graphs and tables, see: http://pubsub.com/linkcounts.php

    bob wyman
    CTO, PubSub.com

  34. Re:we've tried gzip on our server... by Craig+Ringer · · Score: 1

    mod_gzip is a C program. Like any C program, some types of programming error can cause stack corruption, which can leave unexpected crud on the stack that is then executed by the CPU. It is not impossible that system calls will be part of said crap.

    That said, it's not overly likely, and you'd have to be pretty unlucky to have something like that happen even when testing it and having it crash repeatedly.

    The more important point is "What? You were running Apache as root?"

  35. Running Apache as root by Craig+Ringer · · Score: 1

    Er... what?

    I'm not going to make the argument another fellow did (it can't corrupt your disk) 'cos it's not true, though you'd have to be pretty darn unlucky.

    The more important point is - you were running Apache as root?! If so, I don't blame your boss. If not, how exactly did it corrupt the disk? I'd be putting my money on an unrelated error (without information to the contrary) personally - early disk failure, etc.

    In general, though, firing someone for implementing software they've approved on a production server after testing is stupidity. That's not the sort of place I'd want to work anyway, frankly. He was middle management in a larger company I presume?

  36. What people even blog by TooncesTheCat · · Score: 1

    You should not have a blog if you do not meet one or more of the following requirements :

    - An eyepatch ( both eyes are even better )

    - A pegleg ( both is even better )

    - A huge scar that has some sort of badass story behind it

    - Are a bona fide certified porn star

    - Have a 15 inch penis...in a flaccid state

    Blogs really are pointless, there is a reason people are called the "average american" or "average joe schmoe".

    No one wants to read how your day went if you sit inside a cubicle all day filing reports and creaming over the hot ladies in the office without getting some of it.

    Blogs are gay.

    1. Re:What people even blog by Anonymous Coward · · Score: 0
      Blogs are gay.

      Sez the slashdot poster with the tastes of a 14-year-old boy. Wow, you are pathetic.

    2. Re:What people even blog by An+Onerous+Coward · · Score: 1

      I find your comments to be both clever and perspicacious. The way you have taken a complex phenomenon and boiled it down to its bare basics, while still remaining true to that phenomenon, demonstrates a truly remarkable talent.

      Have you considered writing a blog? People would read it.

      --

      You want the truthiness? You can't handle the truthiness!

    3. Re:What people even blog by TooncesTheCat · · Score: 1

      How are those the tastes of a 14 year old boy? What man on Earth does not want to be a certified porn star with a 15 inch flaccid penis? NOT A SINGLE GODDAMN ONE.

      Pirates are kickass too ( And I'm not copying Maddox - I always wanted to move to Malaysia and become a pirate ) What man would not want to have some badass scar with some really truly fantastic story behind it?

      "How did you get that scar Bob?"

      "I was chasing down a suspected robber and beat him to death with my peg arms and legs. The scar is from him managing to blindside me due to my eyepatch, he caught the best of me with a knife to the chest"

      No guy on Earth would not want to have a badass scar with a badass story.

      You apparently are a metrosexual sir. Good day to you.

      Real men would want to be anything I listed in the parent post.

    4. Re:What people even blog by pestie · · Score: 1

      Thank you! I actually searched the comments for the word "suck," in the hopes that someone had posted a "blogs suck" comment, but your "Blogs are gay" is even better! Heh...

      Look, folks: blogs are gay. Really. They're not the "next big" anything, they're not a subversive medium that's rapidly undermining traditional media - they're a toy for people with nothing better to do than talk to themselves in public. Yes, the occasional bit of cleverness or coolness shows up on a blog here and there from time to time, but Jesus, the signal-to-noise ratio is worse than that of my first cheap Walkman clone I got back in '83! Even with RSS aggregators it's nearly impossible to sort through the crap-wave to find anything worth a damn.

      And would someone please explain to me why the user interfaces of the major blogging sites make no goddamned sense at all? Yes, I'm sure they're "easy once you get the hang of it," but I've seen countless web sites (Slashdot included) that seem to have relatively straightforward, intuitively obvious user interfaces. But oh no, God forbid that LiveJournal or MySpace have any such thing! Every time one of my friends convinces me, against my better judgement, to try one of these sites, I always give up in frustration. But I notice there's no shortage of advertising on these sites which force me to click through 17 pages to do what could have been done in 2! Hmmm...

      Blogs are for people who read Wired and take it seriously. I think that about sums it up.

      Mod me as a troll if you want - fuck it, I've got karma to burn - but I swear, the next time I see the word "blogosphere" in print, I'm going to take a dump on my desk. I have no idea how that's going to improve my mood any, but I swear I'll do it.

    5. Re:What people even blog by TooncesTheCat · · Score: 1

      God will bless you sir with intelligent babies.

      How do I know?

      The fact that you made a really intelligent post on how gay blogs are =]

  37. semi off topic by cookiepus · · Score: 2, Interesting

    Since we're on the subject of blog aggregation, can someone recommend a GOOD way to aggregate?

    Every single RSS aggregator I've come across treats my RSS world similar to an e-mail reader, where each blog is a 'folder' and each entry is equivalent to an e-mail.

    This is decidedly NOT what I want and I don't understand why everyone's writing the same thing.

    My friend is running PLANET, which builds a front page out of the RSS feeds (it looks kind of like the Slashdot front page, where adjacent stories come from different sources and are sorted in chronological order, newest on top).

    PLANET seems to be a server-side implementation. My buddy's running Linux and he made a little page for me but it's not right for me to bug him every time I want to add a feed.

    Is there anything like what I want that would run on Windows? And if not, why the heck not?

    By the same token, why doesn't del.icio.us have any capacity to know when my links have been updated?

    For what it's worth, here's my del.icio.us BLOGS area with some blogs I find good.

    http://del.icio.us/eduardopcs/BLOG

    1. Re:semi off topic by Ksevio · · Score: 1

      If you've ever tried the latest version of Safari there's a very nice RSS aggregator built in.

  38. Gzip helps, but the real win is conditional get by epeus · · Score: 4, Informative

    If your weblog server implements ETag and Last-Modified, my spider can send a one packet request with the values I last saw from you, and you can send a one packet 304 response if nothing has changed.

    Charles Miller explained this well a few years ago.

    (I run the spiders at Technorati).
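    A minimal Python sketch of that exchange, with the server modeled as a plain function rather than a real web server (FeedVersion, serve_feed and conditional_headers are illustrative names, not any particular library's API). The client replays the ETag and Last-Modified values it saw last time as If-None-Match and If-Modified-Since, and the server answers 304 with an empty body while they still match. For brevity it compares a single ETag, whereas real If-None-Match headers may carry a list or "*".

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

@dataclass
class FeedVersion:
    body: bytes
    etag: str                 # opaque validator, e.g. a quoted hash of the body
    last_modified: datetime   # must be timezone-aware (UTC)

def serve_feed(feed: FeedVersion, request_headers: dict):
    """Return (status, headers, body) following HTTP conditional GET rules."""
    validators = {"ETag": feed.etag,
                  "Last-Modified": format_datetime(feed.last_modified, usegmt=True)}
    inm = request_headers.get("If-None-Match")
    ims = request_headers.get("If-Modified-Since")
    if inm is not None:
        # If-None-Match takes precedence; If-Modified-Since is then ignored.
        if inm == feed.etag:
            return 304, validators, b""
    elif ims is not None:
        if feed.last_modified.replace(microsecond=0) <= parsedate_to_datetime(ims):
            return 304, validators, b""
    return 200, validators, feed.body

def conditional_headers(cached_validators: dict) -> dict:
    """Build the next request's headers from validators saved from the last 200."""
    headers = {}
    if "ETag" in cached_validators:
        headers["If-None-Match"] = cached_validators["ETag"]
    if "Last-Modified" in cached_validators:
        headers["If-Modified-Since"] = cached_validators["Last-Modified"]
    return headers

feed = FeedVersion(b"<rss>...</rss>", '"v1"',
                   datetime(2005, 7, 17, 12, 0, tzinfo=timezone.utc))
status, validators, body = serve_feed(feed, {})            # first fetch: full body
status2, _, body2 = serve_feed(feed, conditional_headers(validators))  # repeat poll
```

    The second poll costs only the request and a bodiless 304, which is where the savings come from.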

  39. If you poll, at least do it well... by bobwyman · · Score: 1

    While there are some great ideas in RSS, one of the worst is polling. As discussed in Burton's post, polling results in a ridiculous waste of bandwidth. A Push approach, like the one defined in "Atom over XMPP" would result in a massively more efficient distribution system like the one that we implement in the PubSub Sidebars. But, if you insist on polling, then the best efficiency can be had by combining Gzip with the A-IM or RFC3229+feed as described on my blog. Using RFC3229+feed, your server would only serve up "unread" entries not everything in your feed. Please read and implement: http://bobwyman.pubsub.com/main/2004/09/implementations.html

    bob wyman
    CTO, PubSub.com
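    A rough sketch of the RFC3229+feed idea as a plain Python function: the client sends A-IM: feed plus the If-None-Match ETag from its last fetch, and the server answers 226 IM Used (with an IM: feed header) carrying only the entries newer than that ETag. The ETag scheme here (the quoted id of the newest entry), the function name and the newest-first entry list are assumptions made for the example; see the linked post for the actual proposal.

```python
def serve_feed_delta(entries, request_headers):
    """entries: list of (entry_id, xml) pairs, newest first.
    Hypothetical ETag scheme: the quoted id of the newest entry."""
    if not entries:
        return 200, {}, []
    current_etag = f'"{entries[0][0]}"'
    headers = {"ETag": current_etag}
    client_etag = request_headers.get("If-None-Match")
    if client_etag == current_etag:
        return 304, headers, []                      # nothing new at all
    a_im = [t.strip() for t in request_headers.get("A-IM", "").split(",")]
    if "feed" in a_im and client_etag is not None:
        ids = [entry_id for entry_id, _ in entries]
        seen_id = client_etag.strip('"')
        if seen_id in ids:
            # Only the entries published after the one the client already has.
            headers["IM"] = "feed"
            return 226, headers, entries[:ids.index(seen_id)]
    # Plain request, or an ETag we no longer recognize: send the whole feed.
    return 200, headers, entries

entries = [("e3", "<entry>3</entry>"), ("e2", "<entry>2</entry>"), ("e1", "<entry>1</entry>")]
status, headers, delta = serve_feed_delta(entries, {"A-IM": "feed", "If-None-Match": '"e1"'})
```

    Falling back to a full 200 response keeps the scheme safe for clients whose last-seen entry has already scrolled out of the feed.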

    1. Re:If you poll, at least do it well... by Baricom · · Score: 1

      RSS has a method for pushing new posts via SOAP or XML-RPC. I haven't found good documentation for it, however.

      Dave, where are you when we need you?

    2. Re:If you poll, at least do it well... by bobwyman · · Score: 2, Interesting

      Baricom: What you're looking for is the "cloud" interface defined at: http://blogs.law.harvard.edu/tech/soapMeetsRss
      The documentation there is, I think, about as good as you'll find. While it says that it can be implemented in either XML-RPC or SOAP, I am aware only of XML-RPC implementations.

      The cloud provides a means for blogs to notify subscribers of updates and should eliminate the need for polling -- except that the subscriptions must be renewed at least every 25 hours. Of course, this cloud stuff isn't terribly useful in most cases since it relies on the blog server being able to send an HTTP message to a remote client (subscriber). In most cases, those messages would be blocked by firewalls. This is, of course, why the "Atom over XMPP" stuff makes sense. It relies on a connection established from the client to the server -- in the same manner as is done with instant messaging clients. Thus, there are many fewer issues with firewalls.
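      A sketch of the subscriber side in Python: read the RSS 2.0 cloud element's attributes (domain, port, path, registerProcedure, protocol) and serialize the XML-RPC registration call. The parameter order (notifyProcedure, port, path, protocol, urlList) follows my reading of the soapMeetsRss page and should be checked against it; every host name, procedure name and port below is hypothetical. Nothing is actually sent here.

```python
import xml.etree.ElementTree as ET
import xmlrpc.client
from datetime import datetime, timedelta

RENEWAL_WINDOW = timedelta(hours=25)   # registrations lapse after 25 hours

def read_cloud(rss_xml: str) -> dict:
    """Return the <cloud> attributes from an RSS 2.0 channel."""
    cloud = ET.fromstring(rss_xml).find("channel/cloud")
    if cloud is None:
        raise ValueError("feed does not advertise a cloud")
    return dict(cloud.attrib)

def cloud_endpoint(cloud: dict) -> str:
    """Where the registration request would be POSTed."""
    return f"http://{cloud['domain']}:{cloud['port']}{cloud['path']}"

def please_notify_call(cloud, notify_procedure, port, path, feed_urls) -> str:
    """Serialize the XML-RPC registration request body."""
    params = (notify_procedure, port, path, "xml-rpc", feed_urls)
    return xmlrpc.client.dumps(params, methodname=cloud["registerProcedure"])

def needs_renewal(registered_at: datetime, now: datetime,
                  margin=timedelta(hours=1)) -> bool:
    """Re-register a little before the 25-hour window closes."""
    return now >= registered_at + RENEWAL_WINDOW - margin

feed = """<rss version="2.0"><channel><title>Example</title>
<cloud domain="rpc.example.com" port="80" path="/RPC2"
       registerProcedure="example.rssPleaseNotify" protocol="xml-rpc"/>
</channel></rss>"""
cloud = read_cloud(feed)
body = please_notify_call(cloud, "myClient.feedUpdated", 5337, "/RPC2",
                          ["http://example.com/rss.xml"])
```

      Note that the notify port and path belong to the subscriber, which is exactly the part a firewall or NAT box tends to block.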

      Of course, having lots of sessions open between a client program and all of the various blogs it reads probably doesn't make much sense. Neither does it make sense for every blog to maintain a list of all of its "cloud" readers and go to the work of sending them all messages whenever the blog is updated. Thus, the most sensible way to do this push business is to have the individual blogs publish to a common network of aggregating servers and then have clients establish connections to the common service. Overall bandwidth consumption is thus reduced to the absolute minimum. That's what we're building at PubSub.com.

      bob wyman
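      A toy in-memory model of that hub arrangement, for illustration only (the Hub class and feed URLs are hypothetical, not PubSub.com's actual system): each blog publishes once to the hub, each client holds one subscription there, and the hub fans an entry out only to clients that asked for that feed.

```python
from collections import defaultdict

class Hub:
    """Toy aggregation hub: one publish per update, one delivery per interested client."""
    def __init__(self):
        self.subscribers = defaultdict(set)   # feed URL -> client ids
        self.inboxes = defaultdict(list)      # client id -> entries (stands in for an open connection)
        self.deliveries = 0

    def subscribe(self, client_id, feed_url):
        self.subscribers[feed_url].add(client_id)

    def publish(self, feed_url, entry):
        # The blog talks only to the hub; it never keeps its own list of readers.
        for client_id in self.subscribers[feed_url]:
            self.inboxes[client_id].append((feed_url, entry))
            self.deliveries += 1

hub = Hub()
hub.subscribe("alice", "http://a.example/feed")
hub.subscribe("bob", "http://a.example/feed")
hub.subscribe("bob", "http://b.example/feed")
hub.publish("http://a.example/feed", "post 1")
hub.publish("http://b.example/feed", "post 2")
hub.publish("http://c.example/feed", "nobody reads this")
```

      Traffic scales with updates times interested readers, rather than with readers times polling intervals whether or not anything changed.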

  40. CPU time by Craig+Ringer · · Score: 1

    There are gzip accelerator PCI cards available for cases where CPU is an issue. Whether they're cheaper in large clusters than just adding some hosts or getting a bigger pipe, I don't know ... but they're another option.

  41. Louis Waweru is spamming slashdot (INFO HERE) by Anonymous Coward · · Score: 0

    HunbunFunland, AnonDotOrg, Aberfoyle, TGIFF, Gestures, CarbonBasedSoda, and BorgGates are all sock puppet accounts of the same guy who is trying to use the Slashdot comment system as his/her own personal ad agency by constantly making posts that are nothing more than thinly veiled excuses to attract traffic to his blog. His name is Louis Waweru and his information is listed below:

    WHOis info:
    -----------
    Registrant:
          Louis Waweru
          525 W. 7th Street
          Suite 2116
          Charlotte, North Carolina 28202
          United States

          Registered through: GoDaddy.com
          Domain Name: OVERHEARDINTHEUK.COM
                Created on: 16-Jul-05
                Expires on: 17-Jul-06
                Last Updated on: 16-Jul-05

          Administrative Contact:
                Waweru, Louis youngbonzi@earthlink.net
                625 W. 113th Street
                Suite 3R
                New York, New York 10025
                United States
                (646) 339-8190
          Technical Contact:
                Waweru, Louis youngbonzi@earthlink.net
                625 W. 113th Street
                Suite 3R
                New York, New York 10025
                United States
                (646) 339-8190

          Domain servers in listed order:
                NS8.ZONEEDIT.COM
                NS17.ZONEEDIT.COM

    Further Contact info:
    -----------
    youngbonzi@earthlink.net
    user-0c8h4ji.cable.mindspring.com
    AOL: louislogicnyc
    YM: lushlouis
    DOB 11/09/1981

  42. Oops! by Rob+Carr · · Score: 1
    it should end up on Slashdot in a few days anyway

    It's going to.

    Someone is going to link to the original post on their blog. That article will be recopied a few times until any link to Slashdot is lost.

    Some news reporter, hoping to pick up on the "next big thing" will take it to be a legitimate report.

    When you watch the cable news and see an over-hyped story about a car that runs on water, ask yourself if it started out as a joke on Slashdot.

    --
    This sig seemed like a good idea at the time....
    1. Re:Oops! by ebrandsberg · · Score: 1

      Then it will probably be duped on http://www.primidi.com/, submitted, and posted as a dupe on slashdot again. Sigh.

  43. How much bandwidth... by svenjob · · Score: 1

    ...does it take to get to the center of the blog aggregate? 1... 2... 3.

    --

    Totally Life!

    ALL replies

  44. We don't by supabeast! · · Score: 1

    "And more importantly, with millions of posts, what percentage of them have any real value, and how do busy people find that .001%?"

    Busy people don't waste time on blogs. Blogs are the realm of internet kooks ranting about the latest conspiracy behind secret intelligence memos, not sane people with limited free time.

    1. Re:We don't by wasted+time · · Score: 1
      Busy people don't waste time on blogs.

      Hey, I resent^Wresemble that remark!



      Blogs are the realm of internet kooks ranting about the latest conspiracy behind secret intelligence memos

      It's official then, Slashdot is the mother of all blogs.

      --
      The Stone Age did not end because humans ran out of stones. - William McDonough
    2. Re:We don't by slim · · Score: 1

      It's official then, Slashdot is the mother of all blogs.

      Well, Slashdot is a blog (which is why I always chuckle at comments saying "I don't read blogs"), and if it wasn't the first website to use the "newest article appears at the top, pushes previous ones down" format that characterises a blog, then it was certainly among the first.

  45. One problem - mod_ssl by vlad_petric · · Score: 1
    If you want ssl you either need to disable compression for https requests or do a weird hack with mod_proxy.

    In theory, the two should work together seamlessly. In practice, they don't.

    --

    The Raven

    1. Re:One problem - mod_ssl by tylers · · Score: 1

      I've used mod_ssl and mod_deflate together with apache 2.0.50 and above on slackware, debian, and solaris 9 without any issues. If you're using an older version of apache, you might try upgrading to see if it fixes the issue.

  46. How? you know how... by Duncan3 · · Score: 1

    and how do busy people find that .001%?

    They don't, they really have better things to do. The media actually does that for us already... what me worry?

    --
    - Adam L. Beberg - The Cosm Project - http://www.mithral.com/
  47. What's the webhost's motivation to use GZip? by AaronLawrence · · Score: 1

    If they use Gzip, then all their customers are suddenly using much less bandwidth and they make less money.

    Of course this would not be true if their bandwidth charges were the same as their costs ... not very likely.

    --
    For every expert, there is an equal and opposite expert. - Arthur C. Clarke
  48. I thought we knew this already! by Anonymous Coward · · Score: 0

    We don't use HTML and RSS for blogs because they are efficient, we use them because they are easy. They are low budget replacements for SOAP / .Net / J2EE and don't require the installation of new server software.

    We get away with it because it wasn't designed to have one central server running the blogs for millions of people - it was designed so that Joe Blogs (sic) can easily update a website that their /. reading brethren set up for them - for that purpose it is more than adequate.

    We have reached an interesting juncture with RSS and Blogs. People like the technologies, they are successful, but they are hacks. Here are some things I'd like to happen:

    • A decent stateful HTTP like web technology
    • HTTP modified so you can send the hash of a page to the server, and it will only send the bits of the page that have changed.
    • W3C to create a web server standard, that demands that only standards compliant HTML can be delivered - governments and corporations like standards. Hopefully MS will start to listen then
    • An RSS protocol that can ask for chunks of a feed
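    The second wish covers ground similar to RFC 3229 (delta encoding in HTTP). As a minimal sketch under stated assumptions, here is a hypothetical line-based delta in Python using difflib rather than any standard delta format: the client sends a hash of its cached copy, and a server that remembers that version replies with copy instructions plus only the changed lines. DeltaServer, digest and apply_delta are illustrative names.

```python
import difflib
import hashlib

def digest(text: str) -> str:
    """Fingerprint the client sends instead of re-downloading the page."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

class DeltaServer:
    """Hypothetical server that remembers past versions of a page by hash."""
    def __init__(self):
        self.history = {}
        self.current = ""

    def publish(self, text: str):
        self.current = text
        self.history[digest(text)] = text

    def get(self, client_hash=None):
        old = self.history.get(client_hash)
        if old is None:
            return "full", self.current          # unknown hash: send everything
        if old == self.current:
            return "unchanged", None             # the equivalent of a 304
        a = old.splitlines(keepends=True)
        b = self.current.splitlines(keepends=True)
        ops = []
        for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, a, b).get_opcodes():
            if tag == "equal":
                ops.append(("copy", i1, i2))     # client already has these lines
            elif tag in ("replace", "insert"):
                ops.append(("add", b[j1:j2]))    # only changed lines travel
            # "delete": nothing is sent; the client simply skips those lines
        return "delta", ops

def apply_delta(old: str, ops) -> str:
    """Rebuild the new version from the cached copy plus the delta."""
    a = old.splitlines(keepends=True)
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(a[op[1]:op[2]])
        else:
            out.extend(op[1])
    return "".join(out)

server = DeltaServer()
v1 = "<item>first</item>\n<item>second</item>\n"
server.publish(v1)
server.publish("<item>third</item>\n" + v1)
kind, ops = server.get(digest(v1))
```

    The obvious cost is server-side memory: the server has to keep old versions around for as long as it wants to answer deltas against them.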
  49. Miski Client-Server-Server-Client protocol by Philip+Dorrell · · Score: 2, Interesting
    As I explained (as long ago as 2000) in Miski: A White Paper, we need a system with the following features:
    • Each producer of link suggestions has a unique address, something like channel/user@example.com. (This implies resolution via DNS, but probably people will end up using the URL of an XML file.)
    • The channel address points to the producer's server.
    • The subscriber to a channel tells their server to subscribe to the channel. The subscriber's server talks to the producer's server.
    • When the producer makes a new link suggestion, their client pushes it to their server, which pushes it to all the servers whose subscribers have subscribed to the channel.
    • Each server pushes the link suggestion to their clients (by whatever means).

    The pattern of client to server to server to client is a bit like the architecture of email, but it is quite spam-proof because you only ever receive what you asked for.

    Additionally, subscribers can instantly "repost" a suggestion to their own channel, which will be read by their subscribers. To avoid reading duplicate posts, servers will optionally filter out duplicates. However, this has a major consequence, which is that subscribers are only ever guaranteed to see the URL, which means that anything you want to say about the content of a new page has to go into the URL. The current system of RSS titles and descriptions will not work under reposting and duplicate filtering.
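    A toy Python model of the client-server-server-client flow with reposting and duplicate filtering, for illustration only (MiskiServer and the channel names are hypothetical, not taken from the white paper). Because duplicates are filtered by URL, the URL is the only thing a subscriber is guaranteed to see.

```python
from collections import defaultdict

class MiskiServer:
    """Toy model: users subscribe via their own server, which subscribes to the producer's server."""
    def __init__(self, name, dedupe=True):
        self.name = name
        self.dedupe = dedupe
        self.local_subs = defaultdict(set)   # channel -> local users subscribed to it
        self.remote_subs = defaultdict(set)  # channel hosted here -> subscribing servers
        self.inbox = defaultdict(list)       # local user -> URLs delivered
        self.seen = defaultdict(set)         # local user -> URLs already delivered

    def subscribe(self, user, channel, home_server):
        # The user's own server subscribes on the user's behalf.
        self.local_subs[channel].add(user)
        home_server.remote_subs[channel].add(self)

    def post(self, channel, url):
        # A new suggestion (or a repost) on a channel hosted here is pushed to every subscribing server.
        for server in self.remote_subs[channel]:
            server.deliver(channel, url)

    def deliver(self, channel, url):
        for user in self.local_subs[channel]:
            if self.dedupe and url in self.seen[user]:
                continue                      # already arrived via another channel
            self.seen[user].add(url)
            self.inbox[user].append(url)

a, b = MiskiServer("a.example"), MiskiServer("b.example")
b.subscribe("carol", "links/ann@a.example", a)
b.subscribe("carol", "links/dave@b.example", b)
a.post("links/ann@a.example", "http://example.org/idea")
b.post("links/dave@b.example", "http://example.org/idea")    # dave reposts it
b.post("links/dave@b.example", "http://example.org/other")
```

    Carol receives the reposted URL only once, which is the duplicate filtering described above.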

    The combination of real-time pushing and reposting could lead to a speeded up Internet, where exciting new ideas spread from one user to the next in a matter of minutes, without having to go through the bottlenecks of centralised attention and popular websites (such as Slashdot). This could be enough to turn the Internet into a "Global Brain", and perhaps even trigger the Technological Singularity.

    I invented Miski to solve the problem of getting people to take notice of new ideas without having to engage in a massive publicity effort, but unfortunately I've failed to get anyone to take any notice of the Miski idea.

    --
    Music: a super-stimulus for the perception of musicality. Musicality: a perceived aspect of speech.
    1. Re:Miski Client-Server-Server-Client protocol by Kent+Recal · · Score: 1

      Congratulations, you have just reinvented usenet...

  50. The long tail by Eivind+Eklund · · Score: 3, Informative
    I think most of these blogs have something of interest to somebody, and that the value of blogs is in their diversity - in a lot of things having value to a small number of people.

    This effect is called the long tail effect, and is visible all over the web. For instance, Amazon.com says that every day, it sells more books that didn't sell yesterday than the sum of books sold that *also* sold yesterday. In other words, summed together, the items selling less than one copy every other day outsell the items selling more than that.

    Eivind.

    --
    Doubting the existence of evolution is like doubting the existence of China: It just shows that you're uninformed.
    1. Re:The long tail by glinden · · Score: 1

      The trick is finding the right blogs for the right people. Tricky problem, matching content to the people who are interested and no one else.

  51. Your sig says by ubernostrum · · Score: 1

    Please correct me if I got my facts wrong.

    You got your facts wrong. When feed readers use conditional GET and respect HTTP Last-Modified headers, and when feed publishers use gzip encoding (XML, like most plain text formats, compresses wonderfully), the bandwidth requirement for aggregation is minimal; the technologies themselves, then, are not inefficient; the inefficiency is in how they are being used. And the alternative you hint at, push, is nowhere near being "more efficient" since it would require an overhaul of IP, universally adopted, to implement reliably.

    1. Re:Your sig says by RAMMS+EIN · · Score: 1

      :-) Someone on /. takes sigs seriously. Good!

      I can't say I agree with you, though. Maybe I should clarify my points a bit.

      1. Push distribution should be more efficient than pull distribution, because it only sends when something has actually changed. You could argue that pull distribution can be more efficient, because multiple updates can be bundled, but the same can obviously be done with push distribution as well.

      2. XML is more verbose than, for example, s-expressions. RSS is not terrible, but when I eyeball a typical RSS newsfeed, it seems to me there is about 30% overhead in the encoding. Sure, redundancy compresses well, but I'd be surprised if it would be smaller than a compressed version of something that didn't have the redundancy in the first place.

      3. HTTP is not necessarily a very verbose protocol, but many clients insist on adding a bunch of headers that aren't necessarily useful to the server, and the same goes for server-to-client communication. All in all, the headers in a typical HTTP request-response sequence can easily reach a couple of hundred bytes, which is a significant amount compared to a typical newsfeed item.
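      To put rough numbers on points 1 and 3, a back-of-envelope calculation in Python. Every input is a hypothetical assumption chosen for illustration (a poll every 30 minutes, 400 bytes of request plus response headers per exchange, three new items a day at 500 bytes each); polling is charged the header cost even for an empty 304, and new items are assumed to be fetched only once.

```python
def polling_bytes(polls_per_day, header_bytes, updates_per_day, item_bytes):
    # Every poll pays for headers whether or not anything changed (a 304 still has headers);
    # each new item is assumed to be downloaded once, as with a delta-capable feed.
    return polls_per_day * header_bytes + updates_per_day * item_bytes

def push_bytes(updates_per_day, header_bytes, item_bytes):
    # One message per actual update, each carrying its own header overhead.
    return updates_per_day * (header_bytes + item_bytes)

POLLS = 24 * 60 // 30            # one poll every 30 minutes -> 48 polls a day
HEADERS, UPDATES, ITEM = 400, 3, 500

pull = polling_bytes(POLLS, HEADERS, UPDATES, ITEM)
push = push_bytes(UPDATES, HEADERS, ITEM)
print(f"pull: {pull} bytes/day, push: {push} bytes/day per subscriber")
```

      Under these assumptions most of the polling cost is headers on empty polls (48 × 400 = 19,200 of 20,700 bytes); lengthening the interval shrinks it, at the price of staler updates.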

      I think that should clarify why I said that currently used technologies are inefficient.

      Then there's this:

      ``And the alternative you hint at, push, is nowhere near being "more efficient" since it would require an overhaul of IP, universally adopted, to implement reliably.''

      I don't know where you get that from. IP certainly doesn't care if it's the receiver initiating the dialog (pull), or the sender (push). Push distribution for newsfeeds could easily be implemented using existing protocols, or even a dedicated protocol.

      --
      Please correct me if I got my facts wrong.
    2. Re:Your sig says by ubernostrum · · Score: 1
      1. "Only send when something has changed" is available and widely implemented right now; it's called the "If-Modified-Since" header. More on why push won't work in a moment.
      2. XML is verbose by design, much as s-expressions are not. And again, with gzip encoding or other compression applied, this is not a problem; feeds compress nicely (and in case you're worried about the performance hit to the server having to gzip the feed, keep in mind you can cache that fairly easily).
      3. I'd rather deal with the bandwidth cost of conditional GET than the headache that would be implementing push. Plus it's really not that much.
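      On the CPU point in item 2, one way to cache the compression, sketched in Python under the assumption that each feed version can be identified by an ETag (here a hash of the body, to keep the sketch self-contained; CompressedFeedCache is a hypothetical name): gzip each version once and reuse the bytes for every request until the feed changes.

```python
import gzip
import hashlib

class CompressedFeedCache:
    """Compress each feed version once; serve the cached bytes until the content changes."""
    def __init__(self):
        self._etag = None
        self._gzipped = None
        self.compressions = 0            # how many times gzip actually ran

    def get(self, feed_xml: bytes):
        etag = '"' + hashlib.sha256(feed_xml).hexdigest() + '"'
        if etag != self._etag:           # new version: compress once and remember it
            self._gzipped = gzip.compress(feed_xml)
            self._etag = etag
            self.compressions += 1
        return etag, self._gzipped

cache = CompressedFeedCache()
feed = b"<rss><channel><item><title>Hello</title></item></channel></rss>"
for _ in range(1000):                    # a thousand aggregators polling the same version
    etag, body = cache.get(feed)
```

      The same ETag can double as the validator for conditional GET, so unchanged polls never need the compressed body at all.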

      IP certainly doesn't care if it's the receiver initiating the dialog (pull), or the sender (push).

      No, but if you've got a foolproof way to do push to clients who aren't always-on, or to clients who are behind a variety of NAT and NAT-like configurations, I'd love to see it.

    3. Re:Your sig says by RAMMS+EIN · · Score: 1

      ``"Only send when something has changed" is available and widely implemented right now; it's called the "If-Modified-Since" header.''

      Oh I see. And you know in advance when things are going to change, so you can postpone making a query until then, and your HTTP-requests don't generate any traffic, right? Cause those are the two advantages of push over pull: the transfer is initiated by the side who _knows_ when there is something to transfer, and communication need only happen in one direction.

      Moreover, unless I'm mistaken, If-Modified-Since still causes the whole feed to be transferred, not just the parts that have changed since the time indicated in that header.

      ``XML is verbose by design, much as s-expressions are not.''

      I know that. Irrespective of whether you consider that an advantage for other reasons, it's a loss as far as bandwidth is concerned.

      ``And again, with gzip encoding or other compression applied, this is not a problem;''

      I'll grant you that, once you apply suitable compression, the difference is nil, especially since one of the compressions you could apply is converting to s-expressions before transmission. That does not negate the fact that RSS adds a significant amount of overhead to a news item.

      ``I'd rather deal with the bandwidth cost of conditional GET than the headache that would be implementing push. Plus it's really not that much.''

      Ok, but that's a matter of opinion. My opinion is that push is the correct distribution mechanism here, and therefore should be used in favor of what is, essentially, an imitation of push semantics on a pull protocol.

      As for the overhead not being much, that wholly depends. As I stated earlier, many clients and servers will send headers that can total up to a few hundred bytes (and these will not be compressed with standard HTTP). That, to me, is a significant overhead when a news item is also in the hundreds of bytes range.
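      To put rough numbers on that header overhead, here is a back-of-the-envelope sketch. Hourly polling, about 300 bytes of headers in each direction, and a 500-byte news item are illustrative assumptions, not measurements. TCP/IP packet overhead is ignored.

```python
def daily_poll_overhead(polls_per_day, request_header_bytes, response_header_bytes):
    """Header bytes per feed per day when every poll gets a bodiless 304 reply."""
    return polls_per_day * (request_header_bytes + response_header_bytes)


# Illustrative assumptions: hourly polling, ~300 bytes of headers each way.
overhead = daily_poll_overhead(24, 300, 300)
item_bytes = 500  # assumed size of one short news item
print(overhead, overhead / item_bytes)  # 14400 28.8
```

      Under these assumptions, a subscriber to a feed that never changes spends roughly 29 items' worth of bytes per day on headers alone. Whether that is "much" is the cost/benefit question.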

      ``No, but if you've got a foolproof way to do push to clients who aren't always-on, or to clients who are behind a variety of NAT and NAT-like configurations, I'd love to see it.''

      Many aggregators are always on and always reachable. There are plenty of solutions for the rest. I would probably prefer a solution where one could request "new items since ...", and request real-time updates from now on. I realize that such a solution involves extra complexity compared to one that relies on pull or push exclusively. But then, this is a direct consequence of the complexity introduced by receivers that aren't always available.

      Anyway, I get the feeling that this is another discussion in which neither side is going to convince the other. It's a pretty typical instance of Worse is Better vs. the Right Thing.

      --
      Please correct me if I got my facts wrong.
    4. Re:Your sig says by ubernostrum · · Score: 1

      Moreover, unless I'm mistaken, If-Modified-Since still causes the whole feed to be transferred, not just the parts that have changed since the time indicated in that header.

      What you're unhappy with here is not HTTP or feed formats. You're unhappy with how content-management software commonly handles feeds. It would be trivial for a CMS to either look at the If-Modified-Since header or accept an extra URL parameter for "only items since this timestamp" (e.g., 'example.com/feed/?since=2005-08-16T07:47:19Z') and treat it as a request to only return items newer than the timestamp.
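      A minimal server-side sketch of that "since" handling follows. `parse_since`, `items_since`, and the item dict layout are hypothetical. The query-parameter name and timestamp format follow the example URL above. The filter applies only when the client asks for it, so a plain request still gets the full feed.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from urllib.parse import parse_qs, urlparse


def parse_since(url, if_modified_since=None):
    """Cutoff from ?since=YYYY-MM-DDTHH:MM:SSZ, else from an If-Modified-Since header, else None."""
    query = parse_qs(urlparse(url).query)
    if "since" in query:
        stamp = datetime.strptime(query["since"][0], "%Y-%m-%dT%H:%M:%SZ")
        return stamp.replace(tzinfo=timezone.utc)
    if if_modified_since:
        return parsedate_to_datetime(if_modified_since)  # HTTP dates are always GMT
    return None


def items_since(items, cutoff):
    """Items published strictly after cutoff; no cutoff means the full feed."""
    if cutoff is None:
        return list(items)
    return [item for item in items if item["published"] > cutoff]


posts = [
    {"title": "older", "published": datetime(2005, 8, 16, 7, 0, tzinfo=timezone.utc)},
    {"title": "newer", "published": datetime(2005, 8, 16, 8, 0, tzinfo=timezone.utc)},
]
cutoff = parse_since("http://example.com/feed/?since=2005-08-16T07:47:19Z")
print([p["title"] for p in items_since(posts, cutoff)])  # ['newer']
```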

      Also, you seem to have a very narrow view of the uses of feeds; "new item at example.com, click this link to read it" is, while a popular use, only one of many uses. For example, the Atom syndication format closely interoperates with the Atom API; Atom feeds contain complete representations of content objects and can be used for a variety of purposes including data import/export for any amount of content up to and including an entire site. So mandating "only send what's changed since time t" would be disastrous.

      And perhaps this is a pointless argument to have, but I actually see it as a cost/benefit analysis rather than the old "worse is better" debate. Even if push technology were widely implemented outside of email, the bandwidth cost of a conditional HTTP GET, even the bandwidth cost of millions of conditional HTTP GETs, would not equal the infrastructure cost of implementing a push system for notifications of site updates. So the proposed benefit of implementing that push system would not outweigh its costs, and so it should not be implemented.

  52. s/blog/website by ubernostrum · · Score: 2, Insightful

    Time to ditch the World Wide Web, right?

    1. Re:s/blog/website by Abcd1234 · · Score: 1

      Indeed... it's so ironic. The original promise of the web, what made it incredibly exciting, was the idea that anyone, anywhere could have a voice. It allowed for one person to reach millions, cutting out the middle-man.

      Well, now blogs have made that dream come true by making the technology easily accessible to the masses. And what happens? People bitch and complain about the quality of blogs. Oh well, that's intellectual elitism for you...

  53. -1, redundant by WillerZ · · Score: 1

    Definition of whence: From where.

    So, you can say:

    Whence comes this discrepancy?

    but please don't use

    From whence...

    because it's redundant.

    --
    I guess today is a passable day to die.
    1. Re:-1, redundant by cyberfunk2 · · Score: 1

      Well.. there's a bit of a debate over that actually..

      both uses tend to be accepted...
      see: http://www.worldwidewords.org/qa/qa-fro2.htm

  54. The Big Count by base_chakra · · Score: 1

    Technorati recently published that they're seeing 900k new posts per day. PubSub says they're seeing 1.8M.

    PubSub later admitted they may have been double-counting.

    1. Re:The Big Count by bobwyman · · Score: 1

      PubSub have not been "double-counting" and have "admitted" no such thing. Your post is a complete lie. I can only wonder whose interests are served by spreading such lies about the service we provide...

      bob wyman

    2. Re:The Big Count by base_chakra · · Score: 1

      PubSub have not been "double-counting" and have "admitted" no such thing. Your post is a complete lie. I can only wonder whose interests are served by spreading such lies about the service we provide...

      Bob, before your head implodes, realize that it was a joke stemming from the fact that the stated PubSub estimate is exactly twice that of Technorati's.

    3. Re:The Big Count by bobwyman · · Score: 1

      Base_chakra: Before you get too righteous, you should understand that claiming that someone "double counts" or otherwise falsifies numbers can never be considered a joke. Your suggestion that our numbers are not honest is the sort of thing that builds myths that we will probably be responding to months if not years from now. At PubSub, we do our best to provide the best statistics we can. We publish more data than any other service and we are vastly more open about our numbers than anyone else. It isn't in any way "funny" or "joke-like" to suggest that our numbers are in any way tainted.

      bob wyman

    4. Re:The Big Count by Anonymous Coward · · Score: 0

      Er, you don't get to redefine the concept of humour just because you are worried about your company's reputation. The GGP was quite clearly a joke (solely because of its author's quite obvious intent in posting it), and some people (me included) would consider it to be a funny joke.

      That alone does not taint your company's reputation in my eyes. That you are so utterly unable to respond appropriately to a simple witticism, however, does.

  55. Re:The long tail [is inflated] by Anonymous Coward · · Score: 0

    If you go to blogger/blogspot and use their feature that allows you to scan random blogs, approximately 50% appear to be machine-generated link farms with posts being generated every 15 minutes that all link back to a specific site.

    I can't imagine that a lot of people are sitting there reading through sets of keywords, but maybe their jobs are more boring than mine.

  56. Obligatory Maddox link by IsoRashi · · Score: 1
    --
    This is not the greatest sig in the world, no. This is just a tribute.
  57. Value is subjective by patternjuggler · · Score: 1

    And more importantly, with millions of posts, what percentage of them have any real value, and how do busy people find that .001%?

    Unless you're talking about value in terms of dollars earned per web page/blog post, value is completely subjective.

    The most objectively valuable blogs are ones that link to other sites and blogs in meaningful ways, which improves Google's ability to find what I'm looking for. The value of the internet is raised by making searching better for me.

    I don't really understand the anti-blog sentiment on Slashdot. Most of the internet was already irrelevant to me before blogging came around, but Google made it easier not to get bogged down in it.

    The thing that makes blogs different and harder for Google to track is the speed at which they are updated. If something happened yesterday, I don't want to wait weeks for Google to spider all the new posts; I want to find sites talking about it right now. Technorati and other sites do a decent job of that, but it's annoying to have to go to two places to try to find the same thing.

    Google does work if people make posts like "I'm going to go to this event next week, and I'll put pictures up, here are links to other people that are also going" and then follow through: Google will get you to the old post, and from there you can move forward to the more recent post that actually talks about the event.

  58. Bandwidth: Blogs <<< TV shows by Clith · · Score: 1

    I would say that the bandwidth used for blog RSS feeds pales in comparison to that used for downloading TV shows these days.

    --
    [ReidNews]
  59. Another alternative by bArTmAn13 · · Score: 1

    You could of course use PSYC http://psyc.pages.de/ to syndicate your blogs... much better distribution strategy than RSS, and the overhead is not anywhere near RSS's. And you can do much more than just distributing newsfeeds... but anyway, it's one of the things it's good at :)

    1. Re:Another alternative by Anonymous Coward · · Score: 0

      What's this... a chat technology? So it uses push... and how do I get the news? With an IRC client, or do I need a specific PSYC app?

    2. Re:Another alternative by bArTmAn13 · · Score: 1

      Basically, however you want. IRC is possible, Jabber would be possible, and a Perl CPAN module exists, so you could easily write your own psycfeed client if you want. Anyway, even if you don't like any of the stuff that already exists, writing a PSYC app is pretty easy. It uses push and a kind of homebuilt multicast to distribute the messages.

  60. RSS polls. Polling is stupid. Push is correct. by Madwand · · Score: 1

    ...and the right protocol/system for it is netnews with NNTP.

    If you make the NNTP links between servers match the physical topology of the Internet, you can guarantee that no message crosses an Internet link (in a given direction) more than once. This is because the messages are all tagged with a network-wide unique message-ID, and duplicates (which are a necessary effect of a flood broadcast system) are rejected before they're sent.
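    Here is a toy model of that duplicate suppression, in the spirit of NNTP's IHAVE exchange: the sender first offers only the Message-ID, and the peer declines if it already has the article. The server names, the topology, and the `flood` function are hypothetical. Delivery is also simulated one hop at a time rather than concurrently. A real server would additionally skip peers already listed in the article's Path header.

```python
from collections import deque


def flood(links, origin, msg_id):
    """Flood one article from origin; return (body_transfers, refused_offers)."""
    has = {server: set() for server in links}
    has[origin].add(msg_id)
    queue = deque([origin])
    transfers = refused = 0
    while queue:
        server = queue.popleft()
        for peer in links[server]:
            # IHAVE-style offer: only the Message-ID goes over until the peer asks for the body.
            if msg_id in has[peer]:
                refused += 1
                continue
            has[peer].add(msg_id)
            transfers += 1
            queue.append(peer)
    return transfers, refused


# Hypothetical topology: a triangle of news servers.
triangle = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
print(flood(triangle, "A", "<1@example.com>"))  # (2, 4)
```

    In any connected topology, the article body is transferred exactly once per receiving server. The refused offers cost only a short command and response each.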

    You couple that with clean separation of content into enough different news groups, and users who subscribe to just what they're interested in, and voila! Efficient, reliable, fast distribution of information over the Internet, even better than the so-called "P2P" file sharing networks.

    I oughta know - I am one of the guys who invented NNTP.

  61. Post Size by SomeOtherGuy · · Score: 1

    Damn... He is basing his math on an average post size of 150K. For text that has been through gzip compression, that is closer to a BOOK than a blog entry. I can't remember the last time I read a single article of original content that was that big.
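    The arithmetic behind this objection can be sketched in a few lines. The sketch takes PubSub's 1.8M posts/day from the story and treats "K" as 1,000 bytes. It counts each post as fetched exactly once, which ignores the repeated whole-feed fetches the article is about. It compares the article's 150K figure with a hypothetical 2 KB compressed post.

```python
def daily_bytes(posts_per_day, avg_post_bytes):
    """Raw bytes per day if every post is fetched exactly once."""
    return posts_per_day * avg_post_bytes


def average_mbit_per_s(bytes_per_day):
    """Average rate in megabits per second over an 86,400-second day."""
    return bytes_per_day * 8 / 86_400 / 1_000_000


POSTS_PER_DAY = 1_800_000  # PubSub's figure from the story
for size in (150_000, 2_000):  # article's 150K vs. a hypothetical 2 KB compressed post
    total = daily_bytes(POSTS_PER_DAY, size)
    print(f"{size:>7} B/post: {total / 1e9:.1f} GB/day, {average_mbit_per_s(total):.2f} Mbit/s")
```

    That works out to 270 GB/day (about 25 Mbit/s sustained) at 150K per post, versus 3.6 GB/day at 2 KB.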

    --
    (+1 Funny) only if I laugh out loud.
  62. Not that complex (Re:All at once) by mi · · Score: 1
    The simple way to arrange this saving is to store the generated file on the disk.

    Instead of regenerating pages upon each request (pull), they should be regenerated upon each change (push). This will save not only bandwidth, but also memory and CPU (and lots of it) on the server and is, actually, easier to implement and debug -- no changes to web-server, which will be dealing with the regular file, for example.
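    Here is a minimal sketch of that regenerate-on-change approach in Python. `publish` is a hypothetical hook the blog software would call whenever a post is saved, and the bare-bones RSS 2.0 template is an assumption. Writing to a temporary file and renaming it over the old one means readers never see a half-written feed on POSIX filesystems. Because the result is an ordinary static file, the web server can typically supply Last-Modified (and often ETag) headers for conditional GETs without feed-specific code.

```python
import os
import tempfile
from xml.sax.saxutils import escape


def render_feed(channel, posts):
    """Build a minimal RSS 2.0 document from a channel dict and a list of post dicts."""
    items = "".join(
        f"<item><title>{escape(p['title'])}</title><link>{escape(p['link'])}</link></item>"
        for p in posts
    )
    return (
        '<?xml version="1.0" encoding="utf-8"?><rss version="2.0"><channel>'
        f"<title>{escape(channel['title'])}</title>"
        f"<link>{escape(channel['link'])}</link>"
        f"<description>{escape(channel['description'])}</description>"
        f"{items}</channel></rss>"
    )


def publish(feed_path, channel, posts):
    """Regenerate the static feed file; call on 'post saved', not on every request."""
    data = render_feed(channel, posts).encode("utf-8")
    directory = os.path.dirname(os.path.abspath(feed_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; let the web server read it
        os.replace(tmp_path, feed_path)  # rename over the old feed in one step
    except BaseException:
        os.unlink(tmp_path)
        raise
```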

    --
    In Soviet Washington the swamp drains you.
  63. What about message boards? by V3 · · Score: 1
    As many folks on the thread have pointed out, the value of blog entries depends on the reader's context. If you're interested in following someone's life or listening to what they have to say, blogs are fabulous mechanisms. You are probably interested in your friends' blogs. If you're a golfer, chances are you'd be interested in golfers who blog about their experiences.

    Though it seems to me that if you're interested in a particular subject, rather than a specific person, good old message boards like phpBB or vBulletin, Usenet newsgroups, or forums like Slashdot are much better vehicles for sharing information. In these cases, you care more about the topic and not so much who it is that's doing the writing. Rather than trying to harvest relevant content about a subject from blogs, maybe we're just better off posting on message boards to begin with? You hear a lot about blogs nowadays and how great they are, but not a whole lot of noise about message boards and discussion forums. Why is that? They seem like different but equally valuable ways for people to contribute information to the community.

    I've been helping out some friends who are putting together a site for people to add comments to web pages. I don't know if it's up yet, but here is an example of another mechanism, sort of in between the two, where you can either follow a particular subject or a particular author without too much difficulty.

  64. eyeout search agents do exactly that by outerbody · · Score: 1

    eyeout is a search agent tool that monitors RSS. It's also trainable, so you can filter at a broad level: I seed a topic with a few keywords and then start giving it feedback, and pretty soon I have a feed with boundaries based on my interest. Amazing; check it out at eyeout.com.