Slashdot Mirror


Are Long URLs Wasting Bandwidth?

Ryan McAdams writes "Popular websites, such as Facebook, are wasting as much as 75MBit/sec of bandwidth due to excessively long URLs. According to a recent article over at O3 Magazine, they took a typical Facebook home page, looked at the traffic statistics from compete.com, and figured out the bandwidth savings if Facebook switched from using URL paths which, in some cases, run over 150 characters in length, to shorter ones. It looks at the impact on service providers, with the wasted bandwidth used by the subsequent GET requests for these excessively long URLs. Facebook is just one example; many other sites have similar problems, as well as CMS products such as WordPress. It's an interesting approach to web optimization for high-traffic sites."

31 of 379 comments

  1. Can they not use... by teeloo · · Score: 5, Insightful

    compression to shorten the URLs?

    1. Re:Can they not use... by dotgain · · Score: 5, Funny

      No, they cannot use TinyURL (read: goatse, tubgirl et al.), thank you very much.

    2. Re:Can they not use... by truthsearch · · Score: 4, Funny

      They should just move all the GET parameters to POST. Problem solved. ;)

    3. Re:Can they not use... by jd · · Score: 4, Informative

      Most of the time, yes, but then there's a question of trade-off. Small URLs are generally hashes and are hard to type accurately and hard to remember. On the other hand, if you took ALL of the sources of wastage in bandwidth, what percentage would you save by compressing pages vs. compressing pages + URLs or just compressing URLs?

      It might well be the case that these big web services are so inefficient with bandwidth that there are many things they could do to improve matters. In fact, I consider that quite likely. Those times I've done web admin stuff, I've rarely come across servers that have compression enabled.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    4. Re:Can they not use... by gbh1935 · · Score: 5, Funny

      This thread is wasting more bandwidth

    5. Re:Can they not use... by guyminuslife · · Score: 5, Funny

      That's nothing. This is the most disgusting shit you'll ever see on the Internet.

      --
      I don't believe in time. It's a grand conspiracy designed to sell watches.
    6. Re:Can they not use... by FredFredrickson · · Score: 4, Funny

      I won't lie. I was partly relieved, but partly disappointed when I clicked that link.

      --
      Belief? Hope? Preference? The Existential Vortex
    7. Re:Can they not use... by x_MeRLiN_x · · Score: 4, Informative

      Using a cookie, TinyURL allows you to enable previews, i.e., view where a TinyURL points to before following the link.

    8. Re:Can they not use... by dgatwood · · Score: 4, Insightful

      Depending on your network type, you may not get any benefit from shorter URLs at all. Many networking protocols use fixed-size frames, which get padded with zeroes up to the end of the frame. For example, in ATM networks, anything up to 48 bytes of payload fits in a single frame (cell), so depending on where that URL falls relative to the start of a frame, it could take up to 48 extra bytes of URL to cause even one extra frame to be sent.
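      A rough sketch of the cell arithmetic, assuming each ATM cell carries exactly 48 bytes of payload and ignoring AAL5 trailers and all other framing overhead (a simplification): trimming a URL only saves a cell when it moves the message back across a 48-byte boundary.

```python
ATM_PAYLOAD_BYTES = 48  # payload per ATM cell (53-byte cell minus 5-byte header)


def cells_needed(message_bytes: int) -> int:
    """Cells needed to carry the message; the last cell is zero-padded."""
    # Integer ceiling division, no floats involved
    return -(-message_bytes // ATM_PAYLOAD_BYTES)


def cells_saved(original_bytes: int, trimmed_bytes: int) -> int:
    """How many cells a shorter message saves (0 if no boundary is crossed)."""
    return cells_needed(original_bytes) - cells_needed(trimmed_bytes)


# A 420-byte request trimmed by 30 bytes still needs 9 cells...
print(cells_saved(420, 390))  # 0
# ...but trimming 40 bytes drops it to 8 cells.
print(cells_saved(420, 380))  # 1
```

      Depending on alignment, even a 47-byte trim can save nothing, which is the comment's point.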

      Either way, this is like complaining about a $2 budget overrun on a $2 billion project. Compared with the benefits of compressing the text content, moving all your scripts into separate files so they can be cached (Facebook sends over 4k of inline JavaScript with every page load for certain pages), generating content dynamically in the browser based on high density XML without all the formatting (except for the front page, Facebook appears to be predominantly server-generated HTML), removing every trace of inline styles (Facebook has plenty), reducing the number of style sheet links to a handful (instead of twenty), etc., the length of URLs is a trivial drop in the bucket.

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

    9. Re:Can they not use... by dgatwood · · Score: 4, Informative

      And even with the wink, this still got initially moderated "Interesting" instead of "Funny".... *sigh*

      To clarify the joke for those who don't "GET" it, in HTTP, POST requests are either encoded the same way as GET requests (with some extra bytes) or using MIME (multipart/form-data) encoding. If you use the same URL encoding, the number of bytes sent should differ from the equivalent GET request by... the extra byte in the word "POST" versus "GET" plus two extra CR/LF pairs and a CR/LF-terminated Content-Length header, IIRC.... And if you use MIME encoding for the POST content, the size of the data balloons to orders of magnitude larger unless you are dealing with large binary data objects like a JPEG upload or something similar.

      So basically, a POST request just hides the URL-encoded data from the user but sends almost exactly the same data over the wire.
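      To put rough numbers on the two POST encodings, this Python sketch builds both bodies by hand for the fields a=b and c=d. The boundary string is an arbitrary assumption (real clients generate their own, usually longer, random boundaries and also send it in a Content-Type header), so exact multipart sizes will vary.

```python
from urllib.parse import urlencode

fields = {"a": "b", "c": "d"}
BOUNDARY = "----boundary123"  # hypothetical; real clients pick a random boundary


def urlencoded_body(data: dict) -> bytes:
    # application/x-www-form-urlencoded: same format as a GET query string
    return urlencode(data).encode("ascii")


def multipart_body(data: dict, boundary: str) -> bytes:
    # multipart/form-data: each field gets a delimiter line plus part headers
    parts = []
    for name, value in data.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n'
            "\r\n"
            f"{value}\r\n"
        )
    parts.append(f"--{boundary}--\r\n")  # closing delimiter
    return "".join(parts).encode("ascii")


print(len(urlencoded_body(fields)))           # 7
print(len(multipart_body(fields, BOUNDARY)))  # 153 with this boundary
```

      The multipart overhead is per field, so it dominates for tiny values and becomes negligible for large uploads like the JPEG case mentioned above.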

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

    10. Re:Can they not use... by smellotron · · Score: 4, Informative

      You're missing the joke... GET requests look like this:

      GET /url?a=b&c=d HTTP/1.0

      POST requests look like this:

      POST /url HTTP/1.0
      Content-Type: application/x-www-form-urlencoded
      Content-Length: 7

      a=b&c=d

      Roughly the same amount of content... The URL looks shorter, but the exact same data as the query string gets sent inside the request body, plus a couple of headers. Thus, switching from GET to POST does not reduce the bandwidth usage at all, even if it makes the URL seen in the browser look shorter.
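      A small Python sketch that builds both requests byte for byte, assuming HTTP/1.0 with CRLF line endings and only the headers a form body strictly needs. Real browsers also send Host, User-Agent, cookies and more, which dwarf either variant.

```python
def get_request(path: str, query: str) -> bytes:
    # Form data travels in the request line; the blank line ends the headers
    return f"GET {path}?{query} HTTP/1.0\r\n\r\n".encode("ascii")


def post_request(path: str, body: str) -> bytes:
    # Same form data, moved into the body, plus two describing headers
    return (
        f"POST {path} HTTP/1.0\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
        f"{body}"
    ).encode("ascii")


get = get_request("/url", "a=b&c=d")
post = post_request("/url", "a=b&c=d")
print(len(get), len(post))  # 29 97
```

      The query data is still on the wire either way; POST only adds header bytes.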

    11. Re:Can they not use... by Dan541 · · Score: 5, Funny

      http://tinyurl.com/6rywju

      Tiny url is not all bad, this is one example of a positive use.

      --
      An SQL query goes to a bar, walks up to a table and asks, "Mind if I join you?"
  2. Wordpress has the option by slummy · · Score: 5, Informative

    WordPress by default allows you to configure URL rewriting. The default is set to something like: http://www.mysite.com/?p=1.

    For SEO purposes it's always handy to switch to the more popular format: http://www.mysite.com/2009/03/my-title-of-my-post.html.

    Suggesting that we cut URLs that help Google rank our pages higher is preposterous.
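    A minimal Python sketch of turning a post date and title into the date-plus-slug path shown above. The slug rules here (lowercase, runs of non-alphanumerics collapsed into one hyphen) are an assumption for illustration, not WordPress's actual sanitizing code.

```python
import re
from datetime import date


def slugify(title: str) -> str:
    # Lowercase, then replace each run of non-alphanumerics with one hyphen
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


def permalink(base: str, published: date, title: str) -> str:
    # date supports strftime-style format specs inside f-strings
    return f"{base}/{published:%Y/%m}/{slugify(title)}.html"


print(permalink("http://www.mysite.com", date(2009, 3, 27), "My Title of My Post"))
# http://www.mysite.com/2009/03/my-title-of-my-post.html
```

    That URL is 28 bytes longer than http://www.mysite.com/?p=1, which is the trade-off being defended here.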

  3. Who knows? by esocid · · Score: 4, Funny

    Are forums (fora?) like these wasting bandwidth as well by allowing nerds, like myself, to banter about minutiae (not implying this topic)? Discuss amongst yourselves.

    --
    Absolute power corrupts absolutely. indymedia
    1. Re:Who knows? by phantomfive · · Score: 4, Insightful

      Seriously. No one better tell him about the padding in the IP packet header. A whole four bits is wasted in every packet that gets sent. More if it's fragmented. Or what about the fact that HTTP headers are in PLAIN TEXT? Talk about a waste of bandwidth.

      In reality, I think you use more bandwidth watching one YouTube video than you will on Facebook URLs in a year.

      --
      Qxe4
  4. Better way of doing it by Foofoobar · · Score: 4, Informative

    The PHPulse framework is a great example of a better way to do it. It uses one variable, sent for all pages, which it looks up in a database (rather than an XML file) where it stores the metadata describing how all the pages interrelate. As such, it doesn't need to parse strings, it is easier to build SEO-optimized pages, and it can make pages load 10 times faster than other MVC frameworks.
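    The comment doesn't show PHPulse's code, so this is only a hypothetical Python sketch of the general idea it describes: a single request parameter (a page id, an assumption) is looked up in a database table that stores each page's handler and parent page. The table layout and page names are invented for illustration; an in-memory SQLite database stands in for the real one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, handler TEXT, parent INTEGER)")
# Hypothetical page metadata: (id, handler name, parent page id)
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, ?)",
    [(1, "home", None), (2, "profile", 1), (3, "photos", 2)],
)


def resolve(page_id: int) -> list[str]:
    """Return handler names from the root page down to page_id."""
    chain = []
    current = page_id
    while current is not None:
        row = conn.execute(
            "SELECT handler, parent FROM pages WHERE id = ?", (current,)
        ).fetchone()
        if row is None:
            raise KeyError(f"unknown page {current}")
        chain.append(row[0])
        current = row[1]  # NULL parent comes back as None and ends the loop
    return list(reversed(chain))


# A request only needs one short parameter, e.g. ?page=3
print(resolve(3))  # ['home', 'profile', 'photos']
```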

    --
    This is my sig. There are many like it but this one is mine.
  5. Depending on your viewpoint by markov_chain · · Score: 5, Insightful

    The short Facebook URLs waste bandwidth too ;)

    --
    Tsunami -- You can't bring a good wave down!
  6. Waste of effort by El_Muerte_TDS · · Score: 4, Interesting

    Of all things that could be optimized, URLs shouldn't have a high priority (unless you want people to enter them manually).
    I'm pretty sure their HTML, CSS, and JavaScript could be optimized way more than just their URLs.
    But rather than simple sites, people often want them to be filled with crap (which nobody but themselves cares about).

    ps, that doesn't mean you shouldn't try to create "nice" URLs instead of incomprehensible URLs that contain things like article.pl?sid=09/03/27/2017250

    1. Re:Waste of effort by JCY2K · · Score: 5, Insightful

      Of all things that could be optimized, URLs shouldn't have a high priority (unless you want people to enter them manually). I'm pretty sure their HTML, CSS, and JavaScript could be optimized way more than just their URLs. But rather than simple sites, people often want them to be filled with crap (which nobody but themselves cares about).

      ps, that doesn't mean you shouldn't try to create "nice" URLs instead of incomprehensible URLs that contain things like article.pl?sid=09/03/27/2017250

      To your ps, most of that is easily comprehensible. It was an article that ran today; only the 2017250 is meaningless in itself. Perhaps article.pl?sid=09/03/27/Muerte/WasteOfEffort would be better, but we're trying to shorten things up.

    2. Re:Waste of effort by krou · · Score: 5, Interesting

      Exactly. If they wanted to try to optimize the site, they could start by looking at the number of JavaScript files they include (8 on the homepage alone) and the number of HTTP requests each page requires. My Facebook page alone includes *20* files.

      From what I can judge, a lot of their JavaScript and CSS files don't seem to be getting cached on the client's machine either. They could also take a look at using CSS sprites to reduce the number of HTTP requests required by their images.
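      A rough Python sketch of the sprite idea: stack several icons vertically in one image and emit CSS background-position offsets, so the browser makes one image request instead of one per icon. The class names, sizes, and the sprite.png filename are hypothetical, and the sketch only generates the CSS, not the combined image.

```python
def sprite_css(icons: list[tuple[str, int, int]], sheet: str = "sprite.png") -> str:
    """icons: (class_name, width, height), in the order they are stacked."""
    rules = []
    y = 0
    for name, width, height in icons:
        # Shift the shared image up so this icon's slice shows through
        rules.append(
            f".{name} {{ background: url({sheet}) 0 {-y}px; "
            f"width: {width}px; height: {height}px; }}"
        )
        y += height  # next icon starts directly below this one
    return "\n".join(rules)


print(sprite_css([("icon-home", 16, 16), ("icon-mail", 16, 16), ("logo", 100, 30)]))
```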

      I mean, clicking on the home button is a whopping 726KB in size (with only 145 KB coming from cache), and 167 HTTP requests! Sure, a lot seem to be getting pulled from a content delivery network, but come on, that's a bit crazy.

      Short URIs are the least of their worries.

      --
      'If Christ had tweeted the sermon on the mount, it might have lasted until nightfall.' - John Perry Barlow
    3. Re:Waste of effort by HeronBlademaster · · Score: 4, Informative

      This very type of analysis is what YSlow is for :)

  7. Irrelevant by Skal+Tura · · Score: 5, Insightful

    It's an irrelevantly small portion of the traffic. At Facebook's scale it could save some traffic, but not enough to make an impact on the bottom line that would be worth the effort!

    A 150-character URL = 150 bytes, vs. 50 KILObytes + images for the rest of the pageview....

    I'm pulling that 50 kilobytes for the full page text out of my head, but a pageview often runs at over 100 KB.

    So it's totally irrelevant if they can shave a whopping 150 bytes off that 100 KB.

    1. Re:Irrelevant by Anonymous Coward · · Score: 4, Informative

      You missed the previous paragraph of the article where they explained where they got the 20k value, perhaps you should read the article first. :)

      They rounded down the number of references, but on an average Facebook home.php file there are 250+ HREF or SRC references in excess of 120 characters. They assumed each of these could be shaved by 80 bytes. That's 80 bytes x 250 references = 20,000 bytes, or 20k.

      Your math is wrong: it takes into account just one URL, when there are 250 references on home.php alone! They did not even factor in more than one page view per visit. If they did it your way, you would be looking at far more bandwidth utilization than 74MBit/sec.
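      The two estimates can be compared in a few lines of Python. The 150-byte and 100 KB figures come from the parent comment and the 80 bytes x 250 references from this reply. The page-view rate isn't given in the thread, so the sketch instead solves for the rate that roughly 74 Mbit/s would imply, assuming decimal units (1 KB = 1,000 bytes, 1 Mbit = 1,000,000 bits).

```python
PAGE_BYTES = 100_000        # parent's rough page weight (100 KB)
SINGLE_URL_SAVING = 150     # parent's estimate: one 150-byte URL
PER_PAGE_SAVING = 80 * 250  # this reply: 80 bytes x 250 references


def share_of_page(saved: int, page: int = PAGE_BYTES) -> float:
    return saved / page


def views_per_second_for(mbit_per_s: float, bytes_per_view: int) -> float:
    """Page views per second needed for the savings to add up to mbit_per_s."""
    return mbit_per_s * 1_000_000 / (bytes_per_view * 8)


print(f"{share_of_page(SINGLE_URL_SAVING):.2%}")  # 0.15%
print(PER_PAGE_SAVING)                             # 20000
print(f"{share_of_page(PER_PAGE_SAVING):.0%}")     # 20%
print(f"{views_per_second_for(74, PER_PAGE_SAVING):.1f}")  # 462.5
```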

  8. Mental Masturbation by JWSmythe · · Score: 5, Insightful

        This is a stupid exercise. Oh my gosh, there's an extra few characters wasted. They're talking about 150 characters, which would be 150 bytes, or (gasp) 0.150KB.

        10 times the bandwidth could be saved by removing a 1.5KB image from the destination page, or doing a little added compression to the rest of the images. The same can be said for sending out the page itself gzipped.
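        To illustrate why gzip matters far more than URL length, here is a small Python sketch that compresses some deliberately repetitive, made-up HTML with the standard library's gzip module. The sample markup is an assumption, not a measurement; real savings depend on the page, though markup-heavy HTML often compresses well.

```python
import gzip

# Made-up, repetitive markup standing in for a badly written page
row = '<tr><td><font face="Arial" size="2">cell</font></td></tr>\n'
html = ("<html><body><table>\n" + row * 500 + "</table></body></html>\n").encode("utf-8")

compressed = gzip.compress(html)
print(len(html), len(compressed))
print(f"saved {1 - len(compressed) / len(html):.0%}")
```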

        We did this exercise at my old work. We had relatively small pages. 10 pictures per page, roughly 300x300, a logo, and a very few layout images. We saved a fortune in bandwidth by compressing the pictures just a very little bit more. Not a lot. Just enough to make a difference.

        Consider taking 100,000,000 hits in a day. Bringing a 15KB image to 14KB would be .... wait for it .... 100GB per day saved in transfers.
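        The 100 GB figure works out in decimal units (1 KB = 1,000 bytes, 1 GB = 1,000,000,000 bytes). A quick Python check, with a 150-byte saving per hit added purely for comparison:

```python
def daily_savings_gb(hits_per_day: int, bytes_saved_per_hit: int) -> float:
    # Decimal units: 1 GB = 1,000,000,000 bytes
    return hits_per_day * bytes_saved_per_hit / 1_000_000_000


print(daily_savings_gb(100_000_000, 1_000))  # 100.0 (15 KB image -> 14 KB)
print(daily_savings_gb(100_000_000, 150))    # 15.0 (150 bytes per hit)
```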

        The same can be said for conserving the size of the page itself. Badly written pages (and oh are there a lot of them out there) not only take up more bandwidth because they have a lot of crap code in them, but they also tend to take longer to render.

        I took one huge badly written page, stripped out the crap content (like, do you need a font tag on every word?), cleaned up the table structure (this was pre-CSS), and the page loaded much faster. That wasn't just the bandwidth savings, that was a lot of overhead on the browser where it didn't have to parse all the extra crap in it.

        I know they're talking about the inbound bandwidth (relative to the server), which is usually less than 10% of the traffic. Most of the bandwidth is spent on outbound traffic. That's all anyone really cares about. Server farms only look at outbound bandwidth, because that's always the higher number and the driving factor in their 95th-percentile billing. Home users all care about their download bandwidth, because that's what sucks up the most for them. Well, unless they're running P2P software. I know I was a rare (but not unique) exception, where I was frequently sending original graphics in huge formats, and ISOs, to and from work.
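        For readers unfamiliar with 95th-percentile billing, a common scheme is to sample throughput every five minutes, discard the top 5% of samples, and bill on the highest remaining one. Providers differ in the details (sampling interval, rounding, whether inbound and outbound are billed separately), so this Python sketch is one plausible nearest-rank variant, not any particular provider's formula.

```python
def percentile_95(samples_mbps: list[float]) -> float:
    """Billable rate: the highest sample left after dropping the top 5%."""
    if not samples_mbps:
        raise ValueError("no samples")
    ordered = sorted(samples_mbps)
    # Nearest-rank method; integer ceiling of 0.95 * n avoids float rounding
    rank = -(-95 * len(ordered) // 100)
    return ordered[rank - 1]


def billable(outbound: list[float], inbound: list[float]) -> float:
    # Assumption: bill on whichever direction is higher (usually outbound)
    return max(percentile_95(outbound), percentile_95(inbound))


# One day of 5-minute samples (288): 14 short spikes fall inside the free 5%
print(percentile_95([10.0] * 274 + [1000.0] * 14))  # 10.0
```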

    --
    Serious? Seriousness is well above my pay grade.
  9. tag: dropinthebucket by RobertB-DC · · Score: 4, Insightful

    Seriously. Long URLs as wasters of bandwidth? There's a Flash animation ad running at the moment (unless you're an ad-blocking anti-capitalist), and I would expect it uses as much bandwidth when I move my mouse past it as a hundred long URLs.

    I'm not apologizing for bandwidth hogs... back in the dialup days (which are still in effect in many situations), I was a proud "member" of the Bandwidth Conservation Society, dutifully reducing my .jpgs instead of just changing the Height/Width tags. My "Wallpaper Heaven" website (RIP) pushed small tiling backgrounds over massive multi-megabyte images. But even then, I don't think a 150-character URL would have appeared on their threat radar.

    It's a drop in the bucket. There are plenty of things wrong with 150-character URLs, but bandwidth usage isn't one of them.

    --
    Stressed? Me? Of course not. Stress is what a rubber band feels before it breaks, silly.
  10. I can top that. Try the Globe and Mail! by Anonymous Coward · · Score: 5, Interesting

    For an even more egregious example of web design / CMS fail, take a look at the HTML on this page.

    $ wc wtf.html
    12480 9590 166629 wtf.html

    I'm not puzzled by the fact that it took 166 kilobytes of HTML to write 50 kilobytes of text. That's actually not too bad. What takes it from bloated into WTF-land is the fact that that page is 12,480 lines long. Moreover...

    $ vi wtf.html

    ...the first 1831 lines (!) of the page are blank. That's right, the <!DOCTYPE... declaration is on line 1832, following 12 kilobytes of 0x20, 0x09, and 0x0a characters: spaces, tabs, and linefeeds. Then there's some content, and then another 500 lines of tabs and spaces between each chunk of text. WTF? (Whitespace, Then Failure?)
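    A small Python sketch to measure this kind of waste: count the newline characters and total whitespace bytes that precede the first real byte of a page. It works on raw bytes; feeding it the comment's wtf.html (for example, the result of reading that file in "rb" mode) should report the leading blank lines described above.

```python
def leading_whitespace(html: bytes) -> tuple[int, int]:
    """Return (newlines, whitespace bytes) before the first non-whitespace byte."""
    stripped = html.lstrip(b" \t\r\n")  # spaces, tabs, CRs, and linefeeds
    wasted = html[: len(html) - len(stripped)]
    return wasted.count(b"\n"), len(wasted)


print(leading_whitespace(b"\n\n  \t\n<!DOCTYPE html>"))  # (3, 6)
```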

    Attention Globe and Mail web designers: When your idiot print newspaper editor tells you to make liberal use of whitespace, this is not what he had in mind!

  11. Re:Most likely insignificant by scdeimos · · Score: 4, Interesting

    I think the O3 article and the parent have missed the real point. It's not the length of the URLs that's wasting bandwidth, it's how they're being used.

    A lot of services append useless query parameter information (like "ref=logo" etc. in the Facebook example) to the end of every hyperlink instead of using built-in HTTP functionality like the Referer request header to do the same job.

    This causes proxy servers to retrieve multiple copies of the same pages unnecessarily, such as http://www.facebook.com/home.php and http://www.facebook.com/home.php?ref=logo, wasting internet bandwidth and disk space at the same time.
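    A minimal Python sketch of what a cache would have to do to treat those URLs as one page: strip parameters known not to change the content before using the URL as a cache key. The set of ignorable parameters is an assumption; a generic proxy has no way of knowing it, which is exactly why the duplicates pile up.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

TRACKING_PARAMS = {"ref"}  # hypothetical: parameters that don't change the page


def cache_key(url: str) -> str:
    parts = urlsplit(url)
    kept = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in TRACKING_PARAMS
    ]
    # Re-encoding may normalize escapes; an empty query drops the "?"
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(cache_key("http://www.facebook.com/home.php?ref=logo"))
# http://www.facebook.com/home.php
```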

  12. Customer bulletin by kheldan · · Score: 4, Funny
    Dear Customer,
    In order to maximize the web experience for all customers, effective immediately all websites with URLs in excess of 16 characters will be bandwidth throttled.

    Sincerely,
    Comcast

    --
    Are YOU using the TOOL, or is the TOOL using YOU? Think about it!
  13. Better idea by Anonymous Coward · · Score: 5, Funny

    Just use a smaller font for the URL!

  14. No by kpang · · Score: 5, Insightful

    Are Long URLs Wasting Bandwidth?

    No. But this article is.

  15. Re:I can top that. Try the Globe and Mail! by LateArthurDent · · Score: 5, Interesting

    ...the first 1831 lines (!) of the page are blank...Attention Globe and Mail web designers: When your idiot print newspaper editor tells you to make liberal use of whitespace, this is not what he had in mind!

    Believe it or not, someone had it in mind. This is most likely a really, really stupid attempt at security by obscurity.

    PHB: My kid was showing me something on our website, and then he just clicked some buttons and the entire source code was available for him to look at. You need to do something about that.
    WebGuy: You mean the HTML code? Well, that actually does need to get transferred. You see, the browser does the display transformation on the client's computer...
    PHB: The source code is our intellectual property!
    WebGuy: Fine. We'll handle it. ::whispering to WebGuy #2:: Just add a bunch of empty lines. When the boss looks at it, he won't think to scroll down much before he gives up.
    PHB: Ah, I see that when I try to look at the source it now shows up blank! Good work!