How Not To Design a Protocol

Remind me by entotre · 2010-10-30 00:16 · Score: 1

Are slashdot accounts with auto-login also vulnerable?

Re:Remind me by bunratty · 2010-10-30 04:17 · Score: 1

Vulnerable to what?

--
What a fool believes, he sees, no wise man has the power to reason away.
Re:Remind me by entotre · 2010-10-30 04:44 · Score: 1

Having the "Herding Firesheep" story fresh in my mind, I meant the wifi vulnerability. :)
After reading the blog post I would guess /. uses (insecure) httponly cookies, and that each account's cookie settings determine whether a cookie obtained by wifi spy tools is actually useful.
Re:Remind me by bunratty · 2010-10-30 05:04 · Score: 1

It's not a WiFi vulnerability. The vulnerability is that without HTTPS, passwords and cookies are sent in the clear, so that anyone who can see your Internet traffic can impersonate you on sites you log into. This could happen on a WiFi network or on a wired network. Slashdot does not support HTTPS at all as far as I can tell.
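To make the distinction in this exchange concrete: HttpOnly only hides a cookie from scripts, while the Secure attribute is what keeps it off plain-HTTP connections. A minimal sketch using Python's standard http.cookies module; the cookie name and value are made up, and Secure only helps on a site that actually serves HTTPS.

```python
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["session_id"] = "abc123"          # hypothetical name and value
cookie["session_id"]["secure"] = True    # browser sends it over HTTPS only
cookie["session_id"]["httponly"] = True  # not readable via document.cookie

# The value a server would place after "Set-Cookie: "
header_value = cookie["session_id"].OutputString()
print(header_value)
```

Without the Secure attribute, the same cookie rides along on every unencrypted request, which is what sniffing tools pick up.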

--
What a fool believes, he sees, no wise man has the power to reason away.
Re:Remind me by somersault · 2010-10-30 05:56 · Score: 1

And it stores your passwords in cleartext.. or at least with reversible encryption.

--
which is totally what she said
Re:Remind me by bunratty · 2010-10-30 06:00 · Score: 1

That's not good, but at least someone would have to break into Slashdot's servers to get the passwords. Why don't they hash the password with some salt and store the hash in the database?
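A minimal sketch of the salt-and-hash approach asked about here, using only Python's standard library. PBKDF2 and the iteration count are assumptions chosen for illustration; nothing in this thread says what Slashdot actually uses.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 100_000  # assumed work factor for this sketch


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest) for storage; the password itself is never stored."""
    salt = os.urandom(16)  # fresh random salt per account
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Re-derive the digest from the candidate password and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```

With this layout a database leak exposes salts and digests rather than passwords, and the per-account salt means identical passwords do not produce identical rows.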

--
What a fool believes, he sees, no wise man has the power to reason away.

Re:Does it work ? by Anonymous Coward · 2010-10-30 00:16 · Score: 3, Insightful

RTFA. That's exactly what happened with HTTP. "It works". In the world of 1990. And then they started to "fix" it to keep up.

Analogy by s1lverl0rd · 2010-10-30 00:30 · Score: 1

HTTP is like a manual lawn mower. It's not flawless, pretty, blazingly fast, or elegant, but it's usable enough to do the job, and you get used to the quirks.

Re:Analogy by John+Hasler · 2010-10-30 00:57 · Score: 5, Funny

> HTTP is like a manual lawn mower.
No it isn't. A manual lawnmower is well-designed. The Web is like a lawnmower built by Rube Goldberg out of dozens of pairs of scissors, lots of string, some boards and a child's wagon, propelled by a large dog and powered by the wagging of his tail (the cookies are to get him to wag it). It's now had a clippings bag and a fertilizer cart added following the same design principles. An automatic dandelion remover, a dethatcher, and an aerator are coming soon (and several more dogs).

--
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
Re:Analogy by phillips321 · 2010-10-30 01:10 · Score: 4, Funny

You forgot to mention that the dog taking a shit is an extra add-on........Flash!
Re:Analogy by peragrin · 2010-10-30 01:13 · Score: 4, Funny

am I the only one who now wants to see that built/build it myself?

--
i thought once I was found, but it was only a dream.
Re:Analogy by Anonymous Coward · 2010-10-30 01:41 · Score: 1, Funny

...which smells so bad because the dog has been fed the worst dogfood, called PHP
Re:Analogy by arth1 · 2010-10-30 02:18 · Score: 2, Insightful

Rube Goldberg? Quite the opposite. The HTTP protocol is very simple, eminently debuggable, plus extensible both ways.
It's the implementations in browsers and servers that suck.
Now *SOAP*, layered on top of HTTP, is truly a Rube Goldberg invention with no redeeming qualities whatsoever.
Re:Analogy by postbigbang · 2010-10-30 03:47 · Score: 2, Interesting

Part of the problem is historical. Tim B-L wanted to make a WYSIWYG viewer system. Back in the day when it was invented, it was dangerous. Dangerous because it was an independent, open API set that worked wherever a browser worked. That flew in the face of tons of proprietary software. It was a transport-irrelevant protocol set that took the best of different coding schemes and made it work. Like most things invented by a single (or very few) person(s), it was a work of art. But it was state of the art nearly two decades ago, and we've come a lonnnnnnng way.
When http and W3C were hatching, there were still battles about ARCNet, Token Ring, Ethernet, and something called ATM. Now most of the world uses Ethernet and Ethernet-like communications using TCP/IP-- which back then, was barely running across the aforementioned networking protocols.
Lawn mowers, by contrast, were a 2-stroke, then 4-stroke engine with a blade and housing. The need, whacking grass, hasn't changed. By contrast, we now make browsers do all sorts of things never envisioned in the early 1990's. And we're planning stuff not really imagined in 2000. In 2020, browsers may be gone, or they may be *completely* different tools than they are now. Lawnmowers will still only whack grass.

--
---- Teach Peace. It's Cheaper Than War.
Re:Analogy by John+Hasler · 2010-10-30 03:55 · Score: 4, Funny

> Lawn mowers, by contrast, were a 2-stroke, then 4-stroke engine with a blade and housing.
It would appear that you do not know what a manual lawnmower is.

--
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
Re:Analogy by postbigbang · 2010-10-30 04:05 · Score: 1

LOL. Got me.

--
---- Teach Peace. It's Cheaper Than War.
Re:Analogy by Joce640k · 2010-10-30 04:05 · Score: 1

I would have said it more like a baby stroller which later had to do duty as a lawnmower and a vacuum cleaner while still maintaining full backwards compatibility and increasing capacity up to 200 babies.

--
No sig today...
Re:Analogy by grumbel · 2010-10-30 04:19 · Score: 1

> The HTTP protocol is very simple, eminently debuggable, plus extensible both ways.
Simple, yes, but I'd say it's a little too simple for its own good. For example I find it rather ridiculous that in 2010 I still can't reliably continue an interrupted download, as without any form of checksum the browser might just append new data to a file containing garbage and not even know it.
Re:Analogy by Bigjeff5 · 2010-10-30 04:27 · Score: 1

God I hope so!

--
Security is mostly a superstition... Avoiding danger is no safer in the long run than outright exposure. - Helen Keller
Re:Analogy by mikael_j · 2010-10-30 04:34 · Score: 1

> Now *SOAP*, layered on top of HTTP, is truly a Rube Goldberg invention with no redeeming qualities whatsoever.
Yet a lot of times it's the only thing that makes sense from a business perspective, more elegant solutions often require a lot more work while the majority of your systems can somewhat easily be made to work with SOAP. Not trying to defend it, it's still pretty ugly but connecting to different systems using SOAP is often faster than using something elegant, and the boss doesn't care about "elegant" (I'm sure there are exceptions and I'd love to work for someone like that, most don't though).

--
Greylisting is to SMTP as NAT is to IPv4
Re:Analogy by Bigjeff5 · 2010-10-30 04:37 · Score: 4, Insightful

The only reason the implementations in browsers suck is because HTTP is such a hack-job of a protocol (it wasn't originally, but then it was not originally designed to do what it does today). The browsers are left dealing with issues which the HTTP "specification" (which isn't even fully documented, btw) either completely ignores or recommends practices that are completely unrealistic.
One example from the article: the HTTP spec recommends a minimum of 80 KB for request headers (20 cookies per user, 4 KB per cookie). However, most web servers limit request headers to 8 KB (Apache) or 16 KB (IIS) in order to prevent denial of service attacks. It is very important that they limit the headers - not doing so leaves them wide open to attack. The HTTP recommendations are completely unreasonable in this regard and fly in the face of good security practice. They are also completely ignored in this and many other cases, because they are so unreasonable.
If the protocol were simple, clear, well designed, and well defined then the browser implementations wouldn't have to suck. It's HTTP that has caused this problem, not the other way around.
It was a very limited protocol that became way too popular, and now we're stuck with a bunch of hacks to get it to work with modern web technology.
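The mismatch described above is simple arithmetic. A quick sketch using only the figures quoted in this comment (20 cookies, 4 KB each, 8 KB and 16 KB server limits); the variable and function names are illustrative.

```python
COOKIES_PER_HOST = 20        # figure quoted in the comment
BYTES_PER_COOKIE = 4 * 1024  # 4 KB per cookie
SERVER_LIMITS = {"Apache": 8 * 1024, "IIS": 16 * 1024}  # limits quoted in the comment

# Worst-case cookie payload a client is told it may send: 81920 bytes = 80 KB
worst_case = COOKIES_PER_HOST * BYTES_PER_COOKIE


def overshoot(limit_bytes: int) -> float:
    """How many times larger the worst-case cookie payload is than a server limit."""
    return worst_case / limit_bytes


for name, limit in SERVER_LIMITS.items():
    print(f"{name}: limit {limit // 1024} KB, worst case is {overshoot(limit):.0f}x larger")
```

On these numbers a fully loaded cookie jar exceeds the smaller limit by a factor of ten, before any other request header is counted.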

--
Security is mostly a superstition... Avoiding danger is no safer in the long run than outright exposure. - Helen Keller
Re:Analogy by Blakey+Rat · 2010-10-30 04:57 · Score: 1

That'll be amusing until one of the scissors cuts off the dog's tail, you monster!

--
Comment of the year
Re:Analogy by somersault · 2010-10-30 05:58 · Score: 1

I don't think I'd mow my lawn with it though
That's the beauty of it.. you don't have to, because the dog does everything! I want one!

--
which is totally what she said
Re:Analogy by zippthorne · 2010-10-30 07:12 · Score: 1

True, but neither did the poster originally making the analogy....

--
Can you be Even More Awesome?!
Re:Analogy by cynyr · 2010-10-30 07:24 · Score: 1

I think the GGP means a "real mower".
By contrast though, an "automatic" lawnmower to me is like a deadly roomba. So what is a riding lawnmower, a self propelled lawnmower, a powered lawnmower, and do we make distinctions based on fuel/power type?

--
All of the above was encrypted with a Quad ROT-13 method. Unauthorized decryption is in violation of the DMCA.
Re:Analogy by ant_tmwx · 2010-10-30 07:36 · Score: 1

check out metalink. most linux distributions use it for large downloads and software updates.
Re:Analogy by arth1 · 2010-10-30 08:05 · Score: 1

You have both a timestamp as well as a byte range, so downloads can reliably be continued as long as you can trust the server.
If the file contains garbage, that's surely the fault of the client and not the protocol?
Nothing stops the client from generating and saving a checksum for every kB received.
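For reference, the timestamp-plus-byte-range mechanism mentioned here maps onto two request headers, Range and If-Range. A minimal sketch that only builds the headers and sends nothing; the function name and the example date are illustrative.

```python
def resume_headers(bytes_received: int, last_modified: str) -> dict:
    """Headers for resuming a download at a byte offset.

    If-Range asks the server to honour the Range only if the resource still
    carries this Last-Modified value; otherwise it sends the whole body again.
    """
    if bytes_received < 0:
        raise ValueError("bytes_received must be non-negative")
    return {
        "Range": f"bytes={bytes_received}-",
        "If-Range": last_modified,
    }


headers = resume_headers(1048576, "Sat, 30 Oct 2010 08:05:00 GMT")
```

A server that accepts the validator answers 206 Partial Content with the remaining bytes; one that does not answers 200 with the full body, so the client has to check the status code before appending.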
Re:Analogy by grumbel · 2010-10-30 10:22 · Score: 1

> If the file contains garbage, that's surely the fault of the client and not the protocol?
Without a checksum the client has no way to detect the difference between garbage and the real data. And the Last-Modified header isn't good enough in practice, it might be missing, it might change on dynamically generated URLs, it might change when switching to a different server, etc. There are plenty of ways to screw things up in practice, which is why you have all those "please check the md5sum manually" when downloading an .iso image or why you use a protocol like BitTorrent which does the checksumming instead.
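The manual check referred to here can be sketched in a few lines. This assumes the publisher lists a hex digest next to the download; SHA-256 is the default in this sketch, and the algorithm parameter accepts any name hashlib supports on the local system.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 64 * 1024) -> str:
    """Hash a file in chunks so large images need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published(path: str, published_hex: str, algorithm: str = "sha256") -> bool:
    """Compare against the digest the publisher lists next to the download."""
    return file_digest(path, algorithm) == published_hex.strip().lower()
```

Note that this only detects a bad file after the whole transfer; it does not say which part of a resumed download went wrong.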
Re:Analogy by arth1 · 2010-10-30 11:14 · Score: 1
> And the Last-Modified header isn't good enough in practice, it might be missing, it might change on dynamically generated URLs, it might change when switching to a different server, etc.
Surely, that's a fault of the server, and not the protocol?
The HTTP protocol has a last-modified header, and if it is incorrect, that's not the fault of the protocol.
And the combination of last-modified and byte ranges means that the client can keep track of what it has received and what it needs to fetch again. If your client can't do that, it's again not the fault of the protocol.

> There are plenty of ways to screw things up in practice, which is why you have all those "please check the md5sum manually" when downloading an .iso image or why you use a protocol like BitTorrent which does the checksumming instead.
No, back in the days of zmodem and faulty FTP clients, it was necessary. Today, there are only a few reasons for the md5sums:
1. There are policies in place since earlier days that haven't been changed yet.
2. The big image is hosted on a server outside the publisher's control. Checking with an md5sum that is hosted by the publisher is a way to avoid hacked ISOs with backdoors and other malware. But that has nothing to do with the HTTP protocol.
3. The sites are run by ignorant admins who do this because others do it.
4. The admins know fully well it doesn't buy anything when the files are hosted on the same host, but do it anyhow because users are ignorant enough to feel safer that way.
Re:Analogy by grumbel · 2010-10-30 11:56 · Score: 1

> The HTTP protocol has a last-modified header, and if it is incorrect, that's not the fault of the protocol.
Even when it is correct it tells you absolutely nothing about the content. Getting two files with the same time stamp is not exactly hard.
And most importantly, it breaks the other way around too, just because the timestamp changed on an HTTP request doesn't mean the content changed too. Thus instead of continuing a download, you might be forced to redownload content you already have.

The main thing is ... by thomst · 2010-10-30 00:40 · Score: 2, Funny

... cookies are delicious!

--
Check out my novel.

Re:The main thing is ... by bunratty · 2010-10-30 04:28 · Score: 1

Delicious delicacies!

--
What a fool believes, he sees, no wise man has the power to reason away.
Re:The main thing is ... by sjames · 2010-10-30 06:25 · Score: 1

Delicious strawberry flavored death!
Re:The main thing is ... by bunratty · 2010-10-30 07:59 · Score: 1

Is that what Opera called cookies?

--
What a fool believes, he sees, no wise man has the power to reason away.

Aww shoot... by MacGyver2210 · 2010-10-30 00:43 · Score: 1

Darn...and here I thought this was going to be an article on the OSI Network model...

http://en.wikipedia.org/wiki/OSI_model

--
If the only way you can accept an assertion is by faith, then you are conceding that it can't be taken on its own merits

Re:Aww shoot... by timeOday · 2010-10-30 02:02 · Score: 5, Insightful

Ah, the OSI model (circa 1978), the polar opposite of Cookies - a spec so glorious, it's still commonly cited - yet so useless it's a 30 year old virgin, having never been implemented!
Re:Aww shoot... by pacman+on+prozac · 2010-10-30 04:57 · Score: 1

It has been implemented in IS-IS, used in some service provider networks.
Re:Aww shoot... by Bigjeff5 · 2010-10-30 05:02 · Score: 1

That's because it's just a description of the network structure, not a protocol in itself. It's only a specification in the sense that it accurately describes how networks must be laid out. It is in fact implemented everywhere. It has to be, or a network connection does not exist. The specific protocols don't matter, the OSI model doesn't care about them beyond describing which layer they fall into.
Layer 1 is your physical connection - any medium over which data is transmitted (coax, microwave, fiber, radio, etc) falls under this layer.
Layer 2 is the data link layer - your MAC address is part of this layer, along with the switch/router your machine connects to. Also here is PPP, SNAP, ethernet DLC, etc.
Layer 3 is the network layer - ARP, ICMP, IPX, IP, etc all fall under this layer.
Layer 4 is the transport layer - TCP, UDP, SPX, NSPDNA, ADSP, etc all fall under this layer
Layer 5 is the session layer - DAP, NetBEUI, RPC, etc all fall under this layer
Layer 6 is the presentation layer - LPP, XDR, NetBIOS, etc fall under this layer
Layer 7 is the application layer - DHCP, HTTP, NFT, RFA, X Windows, FTP, NTP, NFS, etc all fall under this layer
A few protocols span multiple layers (not many), and some layers are skipped (anything that is sessionless and presentationless doesn't need the fifth and sixth layer, for example), but everything needs up to at least the 4th layer and anything in user land must have a protocol in the 7th layer in order to communicate.
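As a compact restatement of the list above, the seven layers fit naturally into an enumeration. This is only an illustrative sketch: the example protocols are a subset copied from this comment, and their placement is the commenter's, not an authoritative classification.

```python
from enum import IntEnum


class OSILayer(IntEnum):
    PHYSICAL = 1
    DATA_LINK = 2
    NETWORK = 3
    TRANSPORT = 4
    SESSION = 5
    PRESENTATION = 6
    APPLICATION = 7


# Examples taken from the list above; placements are the commenter's
EXAMPLES = {
    OSILayer.NETWORK: ["IP", "ICMP", "IPX"],
    OSILayer.TRANSPORT: ["TCP", "UDP", "SPX"],
    OSILayer.APPLICATION: ["HTTP", "FTP", "NTP", "DHCP"],
}


def layer_of(protocol: str):
    """Look up which layer a protocol was filed under, or None if unlisted."""
    for layer, protocols in EXAMPLES.items():
        if protocol.upper() in protocols:
            return layer
    return None
```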
It's a description (like all specs), and it is well used today in networks everywhere.
One of the main problems with HTTP is it is sessionless - it really needs something between TCP and HTTP to handle sessions, but instead cookies were hacked on by browsers (thank you Netscape) to give some semblance of sessions to a sessionless protocol. Cookies have since been expanded and further bandaided and completely mis-managed by the http protocol, leaving us with piss-poor implementations of cookies some 15 years after their creation.

--
Security is mostly a superstition... Avoiding danger is no safer in the long run than outright exposure. - Helen Keller
Re:Aww shoot... by satch89450 · 2010-10-30 05:23 · Score: 1

You forgot layer 8: The user of the applications, which can be human or machines.
Re:Aww shoot... by zach_the_lizard · 2010-10-30 05:34 · Score: 1

The OSI model has been implemented, if you can call it that. It's more of a descriptive model of how networking works than anything else. Now, OSI protocols, that's another story. IS-IS has been deployed in ISPs, but stuff like CLNP has never been widely used. I believe there was some talk about moving to it though.

--
SSC
Re:Aww shoot... by klapaucjusz · 2010-10-30 08:18 · Score: 3, Informative

> Ah, the OSI model [sic, recte suite], [...] having never been implemented!
Saying that the full OSI suite has never been implemented is like saying that nobody implements the full set of standard track RFCs -- which is true, since some standard track RFCs are mis-designed or even contradict other standard-track RFCs.
Large parts of the OSI suite have been implemented, and some are still running today. For example, IS-IS over CLNP is commonly used for routing IP and IPv6 traffic on operators' backbones. (I was about to mention LDAP and X.509 before I realised they are not necessarily the best-designed parts of OSI.)
Where you are right, though, is that large parts of OSI are morasses of complexity that have only been implemented due to government mandate and have since been rightly abandoned.
Re:Aww shoot... by bar-agent · 2010-10-30 14:33 · Score: 1

> a spec so glorious, it's still commonly cited - yet so useless it's a 30 year old virgin, having never been implemented!
"Implemented." Is that what the kids are calling it these days?
Actually, it does sound kinda dirty.

--
i'd hit it so hard, if you pulled me out you'd be the king of britain [bash.org]
Re:Aww shoot... by Eivind · 2010-10-31 18:24 · Score: 1

And in practice, layers 5 and 6 are pretty useless. Not that the functionality is, but that it's seldom seen as a function of the network proper, but instead handled in the app.
For example, today, remote-procedure-calls are frequently handled on top of http, and don't even get me started on the "presentation" layer.

Cookies should be replaced by Anonymous Coward · 2010-10-30 00:49 · Score: 1, Interesting

The whole cookie system should be replaced by a system based on public key cryptography. Replace domain scope by associating sessions with the public keys of the client and the server. Authenticate each chunk of exchanged data by signing a hash value. Browsers could offer throwaway key pairs for temporary sessions and persistent key pairs for preferences and permanent logins.

Re:Cookies should be replaced by Sique · 2010-10-30 01:53 · Score: 1

But then you run into problems if sessions are to be detached to different servers, because not a single computer answers your requests, but a large server farm, maybe geographically distributed worldwide.

--
.sig: Sique *sigh*
Re:Cookies should be replaced by ultranova · 2010-10-30 03:12 · Score: 2, Insightful

> But then you run into problems if sessions are to be detached to different servers, because not a single computer answers your requests, but a large server farm, maybe geographically distributed worldwide.

But these servers need to communicate anyway to maintain a "session" in any meaningful sense, so they can as well send the associated crypt key with the rest of the session information.

--
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
Re:Cookies should be replaced by SanityInAnarchy · 2010-10-30 06:12 · Score: 1

Not true. The best systems are idempotent on the server side, storing any state associated with a single "session" in the cookie itself. This is precisely so that these different servers only have to communicate permanent state, if that.
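One common way to keep per-session state in the cookie itself is to sign it, so any server holding the key can trust what comes back without a shared session store. The comment does not say how the state is protected; the HMAC scheme, the key, and the function names below are assumptions for illustration.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

SECRET = b"shared-by-all-servers"  # hypothetical key, distributed out of band


def encode_session(state: dict) -> str:
    """Serialize state and append an HMAC tag so any server can verify it."""
    payload = base64.urlsafe_b64encode(json.dumps(state, sort_keys=True).encode("utf-8"))
    tag = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return payload.decode("ascii") + "." + tag


def decode_session(cookie_value: str) -> Optional[dict]:
    """Return the state if the tag verifies, else None."""
    payload, sep, tag = cookie_value.rpartition(".")
    if not sep:
        return None
    expected = hmac.new(SECRET, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode("utf-8"), tag.encode("utf-8")):
        return None
    return json.loads(base64.urlsafe_b64decode(payload))
```

The payload is signed, not encrypted, so the client can still read it; and on plain HTTP the whole cookie can still be captured and replayed.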

--
Don't thank God, thank a doctor!
Re:Cookies should be replaced by RichiH · 2010-10-31 02:28 · Score: 1

> But these servers need to communicate anyway to maintain a "session" in any meaningful sense, so they can as well send the associated crypt key with the rest of the session information.
If they use the session for _authentication_ only, that is not necessarily the case.

More restrictive spec could have averted this by thasmudyan · 2010-10-30 00:53 · Score: 5, Interesting

I still think allowing cookies to span more than one distinct domain was a mistake. If we had avoided that in the beginning, cookie scope implementations would be dead simple and not much functionality would be lost on the server side. Also, JavaScript cookie manipulation is something we could easily lose for the benefit of every user, web developer and server admin. I postulate there are very few legitimate uses for document.cookie

Re:More restrictive spec could have averted this by Sique · 2010-10-30 01:55 · Score: 2, Interesting

It was created to allow a site to dispatch some functionality within a session to dedicated computers, let's say a catalog server, a shopping cart server and a cashier server.

--
.sig: Sique *sigh*
Re:More restrictive spec could have averted this by TheRaven64 · 2010-10-30 02:08 · Score: 1

With that restriction, you'd have had to log in to tech.slashdot.org, linux.slashdot.org, slashdot.org, and so on all separately. As it is, you have to log into slashdot.org and {some subdomain}.slashdot.org separately.
A better solution might be to put cookie policies in either a well-known location on the web server (as with robots.txt) or in DNS records (as with SPF). That way, domains like slashdot.org could say 'cookies are shared between all subdomains' while domains like .com would have no entry and so cookies would be on a per-subdomain basis.

Hubbub [hubbub.at]: privacy-oriented, distributed, open source social network
The world doesn't need more incompatible social networking platforms, it needs one well-defined, well-designed, social networking protocol.

--
I am TheRaven on Soylent News
Re:More restrictive spec could have averted this by thasmudyan · 2010-10-30 02:14 · Score: 1

It's clear why it was created. I would argue, however, that the same effect can be achieved by other means on the server side and at the same time it would have made client implementations much much easier. And safer.
Re:More restrictive spec could have averted this by Sique · 2010-10-30 02:16 · Score: 1

Then describe those "other means".

--
.sig: Sique *sigh*
Re:More restrictive spec could have averted this by thasmudyan · 2010-10-30 02:21 · Score: 1

> With that restriction, you'd have had to log in to tech.slashdot.org, linux.slashdot.org, slashdot.org, and so on all separately.

Yeah, there is no technical reason to have those subdomains anyway. (Other than that it looks cool.)

> As it is, you have to log into slashdot.org and {some subdomain}.slashdot.org separately.

If you really needed to pass auth tokens around through subdomains, there are other more secure schemes available to do exactly that.
But even if you're a total fan of semantic subdomains, there is a real argument to be made that you should first have to prove to the browser you actually own the root domain and the subdomain before being allowed setting cookies for them. Though such an extra step would have added complexity, it wouldn't have been anywhere near as ugly as the wildcard/TLD/heuristics mess we got today.

> The world doesn't need more incompatible social networking platforms, it needs one well-defined, well-designed, social networking protocol.
I waste my time on what I feel like, thank you. What the world needs is more people who actually do things instead of sniping cheap shots from the sidelines. And my sig is completely irrelevant to this discussion. Feel free to diss me in a private message anytime.
Re:More restrictive spec could have averted this by thasmudyan · 2010-10-30 02:28 · Score: 4, Insightful

> Then describe those "other means".
First, this happens only rarely in practice. Most of the time these types of ID handovers are done by huge commercial sites such as eBay and even they have cleaned up their URL mess considerably in the last years. Nowadays, big sites tend to have multiple transparent front-end servers that handle incoming connections to a single domain. Using subdomains as a means of differentiating separate machines is not all that common anymore, especially when they exchange lots of data.
But if you really need this functionality, you can just as easily pass a one-time auth token by URL and create another cookie on the second server. There is really no trickery involved here. And if you need to make it very very secure, you can use OAuth, but that would be overkill for the scenarios we're talking about here.
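The one-time token handover described here could look roughly like the following. It is a sketch under stated assumptions: the in-memory dictionary stands in for whatever store both servers can reach, and the function names and the 60-second lifetime are invented for illustration.

```python
import secrets
import time
from typing import Dict, Optional, Tuple

TOKEN_TTL_SECONDS = 60  # assumed lifetime for this sketch

# token -> (user_id, expiry); stands in for a store both servers can reach
_pending: Dict[str, Tuple[str, float]] = {}


def issue_token(user_id: str) -> str:
    """Server 1: mint a random token to append to the handover URL."""
    token = secrets.token_urlsafe(32)
    _pending[token] = (user_id, time.monotonic() + TOKEN_TTL_SECONDS)
    return token


def redeem_token(token: str) -> Optional[str]:
    """Server 2: exchange the token for a user id; a second attempt fails."""
    entry = _pending.pop(token, None)  # pop makes the token single-use
    if entry is None:
        return None
    user_id, expiry = entry
    return user_id if time.monotonic() <= expiry else None
```

After a successful redeem, the second server sets its own cookie; a captured URL is useless once the token has been spent or has expired.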
Re:More restrictive spec could have averted this by Skapare · 2010-10-30 02:37 · Score: 1

This functionality would be achieved with a very simple rule. The rule is simply that for a given hostname, the cookie can be accessed by any hostname that is LONGER than the hostname it was set for. So if "example.co.uk" sets a cookie, "foobar.example.co.uk" can access it. A website can simply make use of this by directing people to the core web site. Note that even this can be abused. A registrar might set up "co.uk" and set a cookie that every domain in "co.uk" can access.
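The rule proposed here fits in one function. A small sketch, assuming that "longer" means the cookie's hostname preceded by at least one extra label, and that a host can always read its own cookies; the function name is illustrative.

```python
def can_read_cookie(cookie_host: str, request_host: str) -> bool:
    """True if request_host is cookie_host itself or a longer name beneath it."""
    cookie_host = cookie_host.lower().rstrip(".")
    request_host = request_host.lower().rstrip(".")
    return request_host == cookie_host or request_host.endswith("." + cookie_host)


print(can_read_cookie("example.co.uk", "foobar.example.co.uk"))  # subdomain may read
print(can_read_cookie("foobar.example.co.uk", "example.co.uk"))  # parent may not
print(can_read_cookie("co.uk", "example.co.uk"))                 # the abuse case noted above
```

The leading dot in the suffix check matters: without it, badexample.co.uk would match a cookie set for example.co.uk.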

--
now we need to go OSS in diesel cars
Re:More restrictive spec could have averted this by istartedi · 2010-10-30 05:11 · Score: 1

> What the world needs is more people who actually do things instead of sniping cheap shots from the sidelines
And, if I may add, "How do you know that software won't form the base for an open standard some day?".
Documents take time and cost money. Free reference implementations are priceless.

--
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
Re:More restrictive spec could have averted this by Sique · 2010-10-30 05:30 · Score: 1

This won't work if the user's connection breaks or times out at that moment, or anything else causes the transfer to fail, and he then tries to restart the transaction. And it is susceptible to replay attacks.

--
.sig: Sique *sigh*
Re:More restrictive spec could have averted this by thasmudyan · 2010-10-30 05:42 · Score: 1

This is where you're simply wrong. When the user's connection breaks, he still has the cookie for server 1, which is presumably the server all business was being conducted on until the failed handover. There is nothing preventing people from clicking the "next" button one more time if their connection tragically failed on the first attempt.
Also, it is not susceptible to replay attacks, because it's a one-time auth token. In other news, you know what else actually is susceptible to replay attacks? Cookies! And not only at the moment of any handover too, but every single request! Of course SSL solves all this but why it failed to become the default web protocol, that's probably a topic for another thread...
Re:More restrictive spec could have averted this by thasmudyan · 2010-10-30 05:50 · Score: 1

That's an excellent point, thank you! Apart from the fact that I don't really have the influence to publish some lofty new protocol standard (and make people care about it at the same time), I absolutely agree with you that things should be tested in real life first. There are many examples where I believe the committee-designed version was horrible compared to something already in practical use - such as XML Schema versus Relax NG.
Case in point, I was totally proud of the protocol I used for the prototype of my project, right up to the point where I discovered that it was horribly misguided and too error-prone. That was an awesome experience, because now I get to do a much better version and if other people ever have to re-implement it, they'll probably be thankful for it.
Re:More restrictive spec could have averted this by Pollardito · 2010-10-30 06:34 · Score: 1

I think you don't even have to go that far, you just have to make sure that the browser request passes along the path/url/etc of the cookie along with the value. Most of these problems with cookies being clobbered have to do with the application not being able to tell that it's not reading the cookie for its domain, but is instead reading the one for the top-level domain (or the non-secure one, or the non-http-only one, etc). If the application had all the applicable cookies and knew which was which, then it seems like a lot of the problems go away.
Re:More restrictive spec could have averted this by Tim+C · 2010-10-31 23:56 · Score: 1

> I postulate there are very few legitimate uses for document.cookie
A site I am currently working on uses it to set a cookie from the client-side in order to store view preferences on an otherwise static site. This lets users change font size and colour contrast settings (all handled by JS and/or CSS) without need for any server-side processing.
(Making the entire site static HTML was mandated for performance reasons)

--
It's official. Most of you are morons.

Re:Does it work ? by OG · 2010-10-30 01:02 · Score: 4, Insightful

When I'm designing a solution, I don't ask if it works, I ask if it works well. Is it secure? Is it scalable? What are the risks associated with it? Is it full of kludges that make bad implementations easy? What do I do if a user decides she doesn't trust that functionality and turns it off? And the point of the article wasn't to say that people shouldn't use cookies when developing web sites or applications. Rather, it's an examination of how a sub-optimal solution came to be so that perhaps other people can avoid similar pitfalls in the future.

Not planned by Thyamine · 2010-10-30 01:10 · Score: 2, Insightful

I think it can be hard to plan for this far into the future. Look how much the web has changed, and the things we do now with even just HTML and CSS that people back in the beginning probably would never have even considered doing. You build something for your needs and if it works then you are good. Sometimes you don't want to spend time planning it out for the next 5, 10, 20 years because you assume (usually correctly) that what you are writing will be updated long before then and replaced with something else.

--
I will shred my adversaries. Pull their eyes out just enough to turn them towards their mewing, mutilated faces. Illyria

Re:Does it work ? by Anonymous Coward · 2010-10-30 01:11 · Score: 1, Informative

I wonder how many code snippets of yours have appeared on The Daily WTF. Just because something works doesn't mean it's good.

I knew a pilot who flew with duct tape holding down the fuel cap on his wing. That worked too, but it's hardly ideal is it?

Here in Australia a few years back, a major power substation was "working" only because someone rigged up a hose to constantly drip water on an overheating thingomajig. Sure it works and props to the hardhack, but it's a piece of shit that can easily stop working.

You see, some of us prefer things not to be a piece of shit.

I'd be happy if we could declare valid cookies by Vekseid · 2010-10-30 01:11 · Score: 1

On a domain.

Like the crosssite.xml or robots.txt files. "Cookies on this site must follow this pattern." Or somesuch.

Most of the rest, I can cope with. Cookie pollution from various forms of injection, not so much.

--
Adult Role Playing Forum

Re:I'd be happy if we could declare valid cookies by Sique · 2010-10-30 01:57 · Score: 1

You could actually implement that in your server. Throw away any cookies you are not interested in.
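A minimal sketch of that server-side filtering, using Python's standard http.cookies parser. The allowlist names are hypothetical, and a real application would hook this into its request handling.

```python
from http.cookies import SimpleCookie

ALLOWED_COOKIES = {"session_id", "theme"}  # hypothetical names this site sets


def filter_cookies(cookie_header: str) -> dict:
    """Parse a Cookie request header and keep only the names we expect."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    return {name: morsel.value for name, morsel in jar.items() if name in ALLOWED_COOKIES}


kept = filter_cookies("session_id=abc123; tracker=xyz; theme=dark")
```

This drops unexpected names, but it cannot catch an injected cookie that reuses an allowed name, because the Cookie header carries only name=value pairs and not the domain or path the cookie was set for.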

--
.sig: Sique *sigh*

Replace, rather than repair by AlecC · 2010-10-30 01:27 · Score: 1

TFA makes it clear that it is impossible to repair the current cookie system: it is really badly broken, and several previous attempts have failed.

Could we therefore design a complete new replacement system, to be implemented in parallel, and added as part of the HTML5 standard? If it were well specified, so that all implementations were consistent, and had all the features that TFA shows are needed, it should be both easy to use and have serious benefits for the site designer as well as the user. In which case, designers might be inclined to do
if then else

The important thing is that it must be easy to use the replacement (e.g. no inter-browser weirdness) and the designer must get some payoff in terms of a better site. Of course, the user will also get a payoff - probably bigger - in terms of better security amongst other things. But, realistically, it is the designer's convenience which will win the day. Once you get the big four (or so) browsers implementing the same standard, and designers regarding that as a preferred option, it has a chance of taking over.

Who can design such a system? Assuming a perfect "supercookie" system is designed, how do we get it into the standard? And what is the game-changing power feature that will bribe site designers to use the supercookie?

--
Consciousness is an illusion caused by an excess of self consciousness.

Re:Replace, rather than repair by tepples · 2010-10-30 02:47 · Score: 1

Is HTML5 localStorage anything like what you want?
Re:Replace, rather than repair by AlecC · 2010-10-30 03:58 · Score: 1

You need more than that, as the comments on TFA explain. You need a limitation on space. You need expiry. You need very carefully defined sharing, so sites can federate. You probably need enforcement of https. On the other hand, you need very little storage: rigorously controlled UUIDs seem to me to provide all that is needed i.e. a record of your previous visit to this or a federated site.

--
Consciousness is an illusion caused by an excess of self consciousness.
Re:Replace, rather than repair by tepples · 2010-10-30 06:00 · Score: 1

You need a limitation on space.
Which localStorage has, as I understand the spec. As it stands now, user agents SHOULD allocate 5 MB per origin until the user raises or lowers it.

You need very carefully defined sharing, so sites can federate.
A cross-document message broker running as a script in one document can very carefully define how other documents can interact with an origin's store.

You probably need enforcement of https.
Right now HTTPS requires a static IP and a certificate. More widespread HTTPS would need more widespread deployment of DNSSEC and either IPv6 or client operating systems supporting SNI.

Re:Does it work ? by Bacon+Bits · 2010-10-30 01:42 · Score: 2, Insightful

It is this type of thinking that separates a carpenter from an engineer.

--
The road to tyranny has always been paved with claims of necessity.

"Working" is different from "working well". by Anonymous Coward · 2010-10-30 01:49 · Score: 5, Insightful

"Working" is measured over a very wide spectrum. On one hand, we have "broken", and on the other we have "working perfectly". The web is far, far closer to the "broken" side of the spectrum than it ever has been to the "working perfectly" side.

Put simply, almost everything about the web is one filthy hack upon another. It's a huge stack of shitty "extensions" that were often made with little thought, so it's no wonder web development is so horrible today.

HTTP has been repurposed far more than it should have been. Its lack of statefulness has resulted in horrible hacks like cookies and AJAX. HTTP makes caching far harder than it should be. SSL and TLS are mighty awful hacks. And those are just a few of its problems!

HTML is a mess, and HTML5 is just going to make the situation worse. Even after 20 years, layout is still a huge hassle. CSS tries to bring in concepts from the publishing world, but they're not at all what we need for web layout, and thus everyone is unhappy.

A lot of people will claim otherwise, and they're wrong, but JavaScript is a fucking horrible scripting language. It's even worse for writing anything significant. And no, it's absolutely nothing like Scheme (some JavaScript advocate always makes this stupid claim whenever the topic of JavaScript's horrid nature comes up).

PHP is one of the few popular languages that can rival JavaScript in terms of being absolutely shitty. Then there are other server-side shenanigans like the NoSQL movement, which arose solely because there are a lot of web "developers" who don't know how to use relational databases properly. I've seriously dealt with such "developers" and many of them didn't even know what indexes are!

Most web browsers themselves are quite shitty. It has gotten better recently, but they still use huge amounts of RAM for the relatively simple services they provide.

The only people involved with some sort of web-related software development who aren't absolute fuck-ups are those working on HTTP servers like Apache HTTPd, nginx, and lighttpd. But now we're seeing crap like Mongrel and Mongrel2 arising in this area, so maybe it's only a matter of time before the sensible developers here move on.

So just because the web is "sort of broken", rather than "completely fucking broken", it doesn't mean that it's "working".

Re:"Working" is different from "working well". by BlueStraggler · 2010-10-30 04:03 · Score: 4, Insightful

HTML is a mess
Unquestionably, yes. And yet it has nevertheless become the most pervasive, flexible, universal communication medium in the history of the world, so it's a glorious mess. It is questionable whether a better-specified system would have succeeded in this, because it would have been too locked down into its designer's original intent. It is precisely the hackability of HTML/http that makes it both fucking awful and fucking brilliant.
Re:"Working" is different from "working well". by DavidTC · 2010-10-30 04:28 · Score: 1

PHP isn't as shitty as people want to make it out to be.
It's certainly an inconsistent language, but arguments being in weird orders and some functions having _ and some not doesn't really make a language 'shitty'. Especially now that it's a real OOP language, and if you actually use that part it's pretty consistent.
And thanks to HTTP's shittiness and web servers being bitches it often results in PHP not being stateful either, but that's not really PHP's fault. None of the 'cgi' languages are stateful, and even if the language is, like Perl, you're not using that statefulness in web-based programs.

--
If corporations are people, aren't stockholders guilty of slavery?
Re:"Working" is different from "working well". by quacking+duck · 2010-10-30 04:55 · Score: 3, Insightful

I've been noticing technology trending towards biological models, either intentionally or otherwise. Genetic algorithms. Adaptable AIs. Computer viruses, even.
The rise of the internet and the web models this, too. Much like our own DNA, there's a lot of redundancy, legacy functionality that borders on harmful, and amazing features that are the result of (tech/biological) hacks upon hacks, but they survived not because they were necessarily the best, but because they allowed earlier iterations (ancestors/early web) to be more flexible and adaptable, so it flourished.
Re:"Working" is different from "working well". by AK+Marc · 2010-10-30 07:40 · Score: 1

HTTP is basically a text transfer protocol (with those newfangled images included). Everything else is a hack. I mean upgrade. In the "old days" Gopher was along side FTP, telnet, and everything else. But today, people use their browser to do all that. Most file transfers default to HTTP, rather than FTP. And people use browsers to remotely control other computers and all that. People seem to like the one large program that does everything, even if it's all done poorly.

--
Learn to love Alaska
Re:"Working" is different from "working well". by Mad+Merlin · 2010-10-30 09:31 · Score: 1

A lot of people will claim otherwise, and they're wrong, but JavaScript is a fucking horrible scripting language. It's even worse for writing anything significant. And no, it's absolutely nothing like Scheme (some JavaScript advocate always makes this stupid claim whenever the topic of JavaScript's horrid nature comes up).
Truly spoken by someone who's never made it past <a href="#" onclick="alert('hello world')">click me</a>.

--
Game! - Where the stick is mightier than the sword!
Re:"Working" is different from "working well". by Anonymous Coward · 2010-10-30 11:05 · Score: 1, Interesting

HTTP is a text transfer protocol? Are you serious? With only the Content-Type header it's already leaps and bounds beyond FTP at identifying what the actual content of a file is.
FTP: Hmm, do I need to set ASCII or BINARY when I want to transfer this file? There's no extension, how do I tell? There's no manifest in the directory, how do I tell? I might have to transfer it a second time if I get it wrong.
HTTP: HEAD /SomefilewithoutExtension. Oh, it's a text/plain, text/rtf or image/gif file. GET /SomefilewithoutExtension.
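
A rough sketch of that HEAD-then-GET idea, assuming Python's standard http.client and a throwaway local server. DemoHandler, probe_content_type and demo are names invented here, and the server simply labels every path image/gif so there is something to query:

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"GIF89a"  # placeholder payload, not a complete image


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical server that serves every path as image/gif."""

    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "image/gif")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()

    def do_HEAD(self):
        # HEAD returns the headers only, never the body.
        self._send_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(BODY)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def probe_content_type(host, port, path):
    """Ask the server what a resource is before downloading it."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        response.read()  # empty for HEAD, but finishes the exchange
        return response.getheader("Content-Type")
    finally:
        conn.close()


def demo():
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return probe_content_type(
            "127.0.0.1", server.server_address[1], "/SomefilewithoutExtension"
        )
    finally:
        server.shutdown()
        server.server_close()
```

The client learns the type from one cheap request and can then decide whether and how to GET the file, with no guessing about ASCII versus BINARY.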

Why the hate.... by Ancient_Hacker · 2010-10-30 01:51 · Score: 5, Informative

Why go hatin' on this particular protocol?

Most of them are just nuckin futs:

* FTP: needs two connections. Commands and responses and data are not synced in any way. No way to get a reliable list of files. No standard file listing format. No way to tell what files need ASCII and which need BIN mode. And probably more fubarskis.

* Telnet: The original handshake protocol is basically foobar-- the handshakes can go on forever. Several RFC patches did not help much. Basically the clients have to kinda cut off negotiations at some point and just guess what the other end can and will do.

* SMTP: You can't send a line with the word "From" as the first word? I'm not a typewriter? WTF?

Re:Why the hate.... by Anonymous Coward · 2010-10-30 02:14 · Score: 4, Insightful

Telnet dates to 1969. FTP dates to 1971. SMTP dates to 1982. HTTP dates to 1991, with the current state of affairs mostly dictated during the late 1990s.
It's excusable that Telnet, FTP and even SMTP have their issues. They were among the very first attempts ever at implementing networking protocols. Of course mistakes were going to be made. That's expected when doing highly complex stuff that has absolutely never been done before.
HTTP has no such excuse. It was initially developed two to three decades after Telnet and FTP. That's 20 to 30 years of mistakes, accumulated knowledge and research that its designers and implementors could have learned from.
Re:Why the hate.... by Carl+Drougge · 2010-10-30 02:19 · Score: 1

SMTP has no such restriction. (Not saying it's good exactly, but it doesn't have that particular problem.)
The unix mbox format has that problem though, but there are plenty of better options for mail storage. And there are no interoperability problems with switching, except with local software.
Re:Why the hate.... by Bookwyrm · 2010-10-30 02:50 · Score: 1

Take a look at Session Initiation Protocol (SIP) RFC 3261 if you really want to see crazy.
Re:Why the hate.... by panda · 2010-10-30 02:50 · Score: 2, Interesting

Interestingly, "mbox" format is another one of those standards without a standard, just like cookies.
It started basically as a storage convention for the mail command. Then, other programs started using it. Some of those programs were written to depend on certain information appearing on the line after the "From " and others didn't.
When I contributed to KMail 2 back in the day, one of my patches was to change what KMail put into the "From " lines of mailbox files because mutt or pine users (forget which) were complaining that KMail was broken because it wrote "From aaa@aaa" followed by the date with the hour set to midnight. This broke one of the other readers that expected the sender's email address and an actual timestamp.
Anyway, long story short, mbox format is plagued by similar though less serious problems to cookies. The biggest of which is that it is actually not a standard, but a convention.

--
Just be sure to wear the gold uniform when you beam down -- you know what happens when you wear the red one.
Re:Why the hate.... by hedrick · 2010-10-30 02:50 · Score: 3, Informative

These protocols were designed for a different world:
1) They were experiments with new technology. They had lots of options because no one was sure what would be useful. Newer protocols are simpler because we now know what turned out to be the most useful combination. And the ssh startup isn't that much better than telnet. Do a verbose connection sometime.
2) In those days the world was pretty evenly split between 7-bit ASCII, 8-bit ASCII and EBCDIC, with some even odder stuff thrown in. They naturally wanted to exchange data. These days protocols can assume that the world is all ASCII (or Unicode embedded in ASCII, more or less) full duplex. It's up to the system to convert if it has to. They also didn't have to worry about NAT or firewalls. Everyone sane believed that security was the responsibility of end systems, and firewalls provide only the illusion of security (something that is still true), and that address space issues would be fixed by revving the underlying protocol to have large addresses (which should have been finished 10 years ago).
3) A combination of patents and US export controls prevented using encryption and encryption-based signing right at the point where the key protocols were being designed. The US has ultimately paid a very high price for its patent and export control policies. When you're designing an international network, you can't use protocols that depend upon technologies with the restrictions we had on encryption at that time. It's not like protocol designers didn't realize the problem. There were requirements that all protocols had to implement encryption. But none of them actually did, because no one could come up with approaches that would work in the open-source, international environment of the Internet design process. So the base protocols don't include any authentication. That is bolted on at the application layer, and to this day the only really interoperable approach is passwords in the clear. The one major exception is SSL, and the SSL certificate process is broken*. Fortunately, these days passwords in the clear are normally on top of either SSL or SSH. We're only now starting to secure DNS, and we haven't even started SMTP.
---------------
*How is it broken? Let me count the ways. To start, there are enough sleazy certificate vendors that you don't get any real trust from the scheme. But setting up enterprise cert management is clumsy enough that few people really do it, hence client certs aren't used very often. And because of the combination of cost and clumsiness of issuing real certs, there are so many self-signed certs around that users are used to clicking through cert warnings anyway. Yuck.
Re:Why the hate.... by metamatic · 2010-10-30 02:54 · Score: 1

Don't forget the horrible hacks on SMTP for lines that consist of just a period "."
Also, if you want to see a brand new bad protocol, look at XMPP.
I think the all time worst protocol I've seen is SyncML. vCards wrapped in XML, with embedded plaintext passwords.
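
For anyone who hasn't met the period hack: SMTP ends the message data with a line holding only ".", so a sender has to double any leading period in the body and the receiver strips it again. A small sketch of that rule in Python (the function names are my own, and this ignores everything else about SMTP):

```python
def dot_stuff(body: str) -> str:
    """Sender side: double a leading '.' so no body line is a lone period."""
    return "\r\n".join(
        "." + line if line.startswith(".") else line
        for line in body.split("\r\n")
    )


def dot_unstuff(wire_body: str) -> str:
    """Receiver side: drop the extra leading '.' added by the sender."""
    return "\r\n".join(
        line[1:] if line.startswith(".") else line
        for line in wire_body.split("\r\n")
    )
```

A body line that is just "." goes over the wire as "..", so it can no longer be mistaken for the end-of-data marker.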

--
GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak
Re:Why the hate.... by arth1 · 2010-10-30 03:13 · Score: 2, Insightful

* SMTP: You can't send a line with the word "From" as the first word? I'm not a typewriter? WTF?

There's nothing in the SMTP protocol stopping you from using 'From ' at the start of a line. The flaw is with the mbox storage format, in improper implementations[*], and mail clients who compensate for that without even giving the user a choice. Blaming that on SMTP is plain wrong.
[*]: RFC4155 gives some advice on this, and calls the culprits "overly liberal parsers".
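
To make the mbox side concrete, here is a sketch of the quoting convention often used when storing a message: any body line starting with "From " gets a ">" put in front so it is not mistaken for the separator of the next message. This is the simple variant (sometimes called mboxo); the function name is invented here and other mbox flavours quote differently:

```python
def escape_mbox_body(body: str) -> str:
    """Prefix body lines that start with 'From ' with '>' before writing to mbox."""
    return "\n".join(
        ">" + line if line.startswith("From ") else line
        for line in body.split("\n")
    )
```

This is why a stored or displayed mail can show ">From" where the author typed "From": the change happens at delivery into the mailbox, not in SMTP.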
Re:Why the hate.... by ultranova · 2010-10-30 04:14 · Score: 2, Insightful

HTTP has no such excuse. It was initially developed two to three decades after Telnet and FTP. That's 20 to 30 years of mistakes, accumulated knowledge and research that its designers and implementors could have learned from.

HTTP works perfectly fine for the purpose for which it was made: downloading a text file from a server. How were the developers supposed to know that someone was going to run a shop over it?
HTTP and the Web grew organically. That evolution has given it its own version of wisdom teeth. Unfortunate, but hardly the fault of either Berners-Lee or the microbes in the primordial soup.

--
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
Re:Why the hate.... by Blakey+Rat · 2010-10-30 04:52 · Score: 1

There's also OAuth and OpenID, which are particularly egregious because they're so new. Who designs a protocol that *requires* Internet access *and* a web browser to work? WTF.

--
Comment of the year
Re:Why the hate.... by Randle_Revar · 2010-10-30 06:00 · Score: 1

>Also, if you want to see a brand new bad protocol, look at XMPP.
Maybe, but still better than SIP/SIMPLE

--
Climate Progress - Hell and High Water
Re:Why the hate.... by kamochan · 2010-10-30 06:19 · Score: 1

Ah, but take a look at RFC 2543. As long as the net-heads had the reins, SIP was still sane. Once the telco actors got in the game, SIP went to hell faster than you could compress the word "idiocy" in your SIGCOMP VM with the counterpart-provided bytecode decomp implementation.
Re:Why the hate.... by klapaucjusz · 2010-10-30 08:22 · Score: 1

FTP: needs two connections.
Which makes a lot of sense if you want to be able to send commands while a file transfer is going on.

SMTP: You can't send a line with the word "From" as the first word?
Yes you can. It's only the Berkeley implementation of SMTP that cannot.
Re:Why the hate.... by KonoWatakushi · 2010-10-30 08:41 · Score: 1

*How is it broken? Let me count the ways. To start, there are enough sleazy certificate vendors that you don't get any real trust from the scheme. But setting up enterprise cert management is clumsy enough that few people really do it, hence client certs aren't used very often. And because of the combination of cost and clumsiness of issuing real certs, there are so many self-signed certs around that users are used to clicking through cert warnings anyway. Yuck.
I would just like to add: regardless, you are placing your trust in a central authority. That authority can be subverted with ease, when the will to do so emerges.
Re:Why the hate.... by Bookwyrm · 2010-10-30 09:05 · Score: 1

Sorry, but do you mean RFC 2543, RFC 2543-bis02, bis-03, bis-04 bis-05...? It got to RFC 2543-bis09, I think.
Of course, regardless of revision or RFC, it was always SIP/2.0.
Sorry, SIP was never sane. Just look at Alert-Info, particularly in early 2543. That should never have been in there.
Re:Why the hate.... by jpallas · 2010-10-30 19:55 · Score: 1

You're entitled to your opinions, but your factual assertions about Telnet and SMTP are just plain wrong. The closest thing to "several RFC patches" is the simple statement in RFC 1123 that "A host MUST carefully follow the rules of RFC-854 to avoid option-negotiation loops." In other words, if you don't follow the protocol, then the not-the-actual-protocol that you are using may not work right. And other commenters have already pointed out that SMTP poses no restrictions on the text of a message body, although some implementations change the message body when delivering to a mailbox.
FTP uses two connections partly because it was designed to allow a third party to initiate a transfer. Whether that's a reasonable requirement is a matter of opinion, but there's no way to meet that requirement without separating the control and data channels. FTP was designed, for better or for worse, with interactive use in mind. And a truly ancient hacker would know that some of the operating systems where ASCII and binary modes actually make a difference don't support the distinction in the filesystem, so putting that requirement in the protocol would simply make it impossible to implement for those OSes (which were common and important at that time).
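
A simplified sketch of the loop-avoidance rule for Telnet options, as I read RFC 854: only answer a request that would actually change state, and stay silent when asked to enter a mode you are already in. The class and method names are hypothetical, and this covers only options on the local side that the peer asks about (DO/DONT), not requests we initiate ourselves:

```python
# Telnet command bytes from RFC 854.
IAC, DONT, DO, WONT, WILL = 255, 254, 253, 252, 251


class LocalOptions:
    """Tracks which options this end has enabled at the peer's request."""

    def __init__(self, supported):
        self.supported = set(supported)
        self.enabled = set()

    def receive(self, command: int, option: int) -> bytes:
        """Return the reply to send, or b'' when no reply is allowed."""
        if command == DO:
            if option in self.enabled:
                return b""  # already in that mode: do not acknowledge again
            if option in self.supported:
                self.enabled.add(option)
                return bytes([IAC, WILL, option])
            return bytes([IAC, WONT, option])  # refuse the change
        if command == DONT:
            if option not in self.enabled:
                return b""  # already off: nothing to acknowledge
            self.enabled.discard(option)
            return bytes([IAC, WONT, option])
        return b""
```

Two ends that both follow this cannot bounce acknowledgements back and forth forever, because a repeated request produces no reply.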
Re:Why the hate.... by RichiH · 2010-10-31 02:35 · Score: 2, Informative

> * SMTP: You can't send a line with the word "From" as the first word? I'm not a typewriter? WTF?
Huh? The first blank line tells SMTP to stop parsing stuff as the body has begun. Far from perfect, but hey. Anyway, I just sent myself an email with "From: foo@foo.org" in the first line of the body. Needless to say, it worked.

Re:Does it work ? by Anonymous Coward · 2010-10-30 01:58 · Score: 2, Insightful

Web developers would have the time to think about important things like that, if they weren't spending all of their time trying to prevent data loss caused by MySQL or the NoSQL database du jour, horrible server-side performance due to PHP, horrible client-side performance due to JavaScript, all while trying to avoid the numerous browser incompatibilities.

Although the tools and technologies they're using are complete shit, it sure doesn't help that they generally don't understand even basic software development and programming theories very well. See their bastardization of the MVC pattern, for instance.

The way the web works in general is bizarre by vadim_t · 2010-10-30 02:19 · Score: 4, Insightful

Let's see:

1. IP is a stateless protocol, that's inconvenient for some things, so
2. We build TCP on it to make it stateful and bidirectional.
3. On top of TCP, we build HTTP, which is stateless and unidirectional.
4. But whoops, that's inconvenient. We graft state back into it with cookies. Still unidirectional though.
5. The unidirectional part sucks, so various hacks are added to make it sorta bidirectional like autorefresh, culminating with AJAX.

Who knows what else we'll end up adding to this pile.

Re:The way the web works in general is bizarre by Anonymous Coward · 2010-10-30 02:44 · Score: 1, Informative

culminating with AJAX.
Oh no, not at all. There's WebSockets and Server-Sent Events in the pipeline now.
Re:The way the web works in general is bizarre by Sarten-X · 2010-10-30 04:35 · Score: 1

And that's why I don't do web development. Almost everybody's got a back end, and that's where I stay.

--
You do not have a moral or legal right to do absolutely anything you want.
Re:The way the web works in general is bizarre by bonch · 2010-10-30 05:35 · Score: 1

Isn't it great that there are people trying to create an app platform out of this shit?
Re:The way the web works in general is bizarre by nog_lorp · 2010-10-30 06:18 · Score: 1

The web stack needs a rewrite. Maybe a protocol and 'markup' language that aren't designed for documents. Furthermore, they keep trying to address developers' needs by adding new communications features, but it would really be nice to just have udp for christ's sake. Perhaps they need to create a couple transport level protocols that address the security concerns keeping them from giving scripts access to udp/tcp.
Re:The way the web works in general is bizarre by lewildbeast · 2010-10-31 00:40 · Score: 1

That is just evolution. It works even though how it works may be illogical. Look at the human eye for example, the rods and cones point in the opposite direction to the light source! How 'backward' or wrong is that?! Clearly it works though...

Thanks for sharing! by Uzik2 · 2010-10-30 02:30 · Score: 1

A pretty interesting write up :)

--
-- Programming with boost is like building a house with lego. It's a cool but I wouldn't want to live in it

ohhh yeah by Anonymous Coward · 2010-10-30 02:43 · Score: 1, Funny

A session is forever

i love your design

Re:Let me get this straight... by John+Hasler · 2010-10-30 02:50 · Score: 1

Why can't we just start over with an entirely new web standard that would be designed in a more efficient manner?

And let's replace IPv4 while we're at it!

--
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.

alternatives ? by Tom · 2010-10-30 02:53 · Score: 1

Most of the crap we surround ourselves with (cookies, MIME, Windows and Office, etc.) are still there because they are there and the alternatives aren't.

What is the alternative to using cookies, really? Almost every framework for web-based development has session support that largely relies on cookies. Give me something more secure that works as easily and I will be using it right away.

--
Assorted stuff I do sometimes: Lemuria.org

Re:alternatives ? by DaveV1.0 · 2010-10-30 10:39 · Score: 1

I disagree. Those things exist because "we" are using HTTP to do things for which it should not be used. "We" have a hammer and we would rather treat everything as a nail rather than get a screwdriver.

Almost every framework for web-based development has session support that largely relies on cookies.
This is why one should not use a connectionless protocol like HTTP for something that requires a connection.

Give me something more secure that works as easily and I will be using it right away.
Are you a developer? If so, maybe you should get on that. I understand there are lots of tools for creating encrypted TCP connections. You might want to start there.

--
There is no "-1 offended" or "-1 you don't agree with me" mod options for a reason.
Re:alternatives ? by Tom · 2010-10-30 22:46 · Score: 1

The problem with HTTPS is that in the usual setup, only one side of the exchange verifies its identity, namely the server. For a session-like handling, I would want client certificates. Those, however, are by far not common enough to build something that demands it.
As I said: Give me something that takes care of your complaints and is as easy to use, and I will. Thousands of other developers think the same. I'm also a security guy (I do run a browser game as a hobby, that's the developer part) so I know the problems. But still, my focus is my game, not inventing a new protocol. Besides, how many players would I get if it weren't playable through a browser?
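
For reference, the server side of the client-certificate setup mentioned above is not much code. A minimal sketch, assuming Python's standard ssl module; the function name and the file path parameters are placeholders, and the hard part (issuing certificates to users) is not shown:

```python
import ssl


def make_server_context(cert_file=None, key_file=None, client_ca_file=None):
    """Build a TLS server context that refuses clients without a valid certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.verify_mode = ssl.CERT_REQUIRED  # the client must present a certificate
    if cert_file is not None:
        # The server's own certificate and private key (paths are hypothetical).
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if client_ca_file is not None:
        # The CA that signed the client certificates we are willing to accept.
        ctx.load_verify_locations(cafile=client_ca_file)
    return ctx
```

With verify_mode set this way the handshake fails for any client that cannot present a certificate signed by the configured CA, which is exactly why it is a hard sell for a public site.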

--
Assorted stuff I do sometimes: Lemuria.org

Not completely nonsensical... by Junta · 2010-10-30 03:04 · Score: 5, Informative

1. Sure
2. stateful, stream-oriented, *and* reliable
3. HTTP designed as a stateless datagram model, but wanted reliability, so TCP got chosen for lack of a better option. SCTP if it had existed might have been a better model, but for the time the stateful stream aspect of TCP was forgiven since it could largely be ignored but reliability over UDP was not so trivial.
4. More critically, the cookie mechanism strives to add stateful aspects that cross connections. This is something infeasible with TCP. Simplest example, HTTP 'state' strives to survive events like client IP changes, server failover, client sleeping for a few hours, or just generally allowing the client to disconnect and reduce server load. TCP state can survive none of those.
5. Indeed, at least AJAX enables somewhat sane masking of this, but the only-one-request-per-response character of the protocol means a lot of things cannot be done efficiently. If HTTP had allowed arbitrary server-side HTTP responses for the duration of a persistent http connection, that would have greatly alleviated the inefficiencies that AJAX methods strive to mask.
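
To illustrate point 4, a bare-bones sketch of state that lives across connections: the server keys its data on an opaque token (the thing a cookie carries), not on the TCP connection. SessionStore and its methods are hypothetical names, and a real store would need persistence and locking:

```python
import secrets
import time


class SessionStore:
    """Server-side state keyed by an opaque token, not by a TCP connection."""

    def __init__(self, ttl_seconds=3600):
        self._ttl = ttl_seconds
        self._sessions = {}  # token -> (expiry time, data dict)

    def create(self, now=None):
        """Start a session and return the token to hand to the client."""
        now = time.time() if now is None else now
        token = secrets.token_hex(16)
        self._sessions[token] = (now + self._ttl, {})
        return token

    def lookup(self, token, now=None):
        """Return the session data, or None if the token is unknown or expired."""
        now = time.time() if now is None else now
        entry = self._sessions.get(token)
        if entry is None:
            return None
        expires_at, data = entry
        if now >= expires_at:
            del self._sessions[token]
            return None
        return data
```

Because lookup only needs the token, the client can change IP, sleep for hours, or reconnect to a different front end sharing the store, none of which a TCP connection would survive.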

--
XML is like violence. If it doesn't solve the problem, use more.

Re:Not completely nonsensical... by icebraining · 2010-10-30 04:42 · Score: 1

6. Hence WebSockets?

--
Dilbert RSS feed
Re:Not completely nonsensical... by Junta · 2010-10-30 08:17 · Score: 1

I can grasp the point of server-side events (though I'm not sure it's a whole lot better than just having a vanilla HTTP request 'pending' from the client at all times to afford the server a way in if needed).
I really don't get what WebSockets buys me over any generic TCP stream. The biggest thing touted commonly is 'hey, it gets through overzealous firewalls that only allow port 80', which I think is as stupid as when SOAP advocates made the point. If any of these become sufficiently pervasive, then you'll just incur a wave of higher-layer firewalling when things go to crap.

--
XML is like violence. If it doesn't solve the problem, use more.
Re:Not completely nonsensical... by tibit · 2010-10-30 15:59 · Score: 1

Wait a minute, aren't websockets like the old-fashioned SOCKS proxies, just rediscovered?

--
A successful API design takes a mixture of software design and pedagogy.
Re:Not completely nonsensical... by jsm · 2010-10-31 05:35 · Score: 1

Thanks for this thoughtful response. But:

5. Indeed, at least AJAX enables somewhat sane masking of this, but the only-one-request-per-response character of the protocol means a lot of things cannot be done efficiently. If HTTP had allowed arbitrary server-side HTTP responses for the duration of a persistent http connection, that would have greatly alleviated the inefficiencies that AJAX methods strive to mask.
Well... what's wrong with using HTTP 1.1 persistent connections? They do allow multiple arbitrary HTTP responses over a single connection, efficiently.
I'm coming here late, but after reading the comments I still don't see the problems with HTTP. There does seem to be a lot of misunderstanding of the protocol and its history, though.
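
A quick sketch of persistent connections in action, assuming Python's standard http.server and http.client; the handler and function names are invented. The server records the client's source port for each request, so equal ports mean both requests rode the same TCP connection. Note that each response still answers exactly one request; this does not show server-initiated messages:

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class KeepAliveHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # enables persistent connections
    seen_client_ports = []         # one entry per request served

    def do_GET(self):
        body = self.path.encode("ascii")
        # The client's source port identifies the underlying TCP connection.
        KeepAliveHandler.seen_client_ports.append(self.client_address[1])
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_twice_on_one_connection():
    KeepAliveHandler.seen_client_ports.clear()
    server = HTTPServer(("127.0.0.1", 0), KeepAliveHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection(
        "127.0.0.1", server.server_address[1], timeout=5
    )
    try:
        bodies = []
        for path in ("/first", "/second"):
            conn.request("GET", path)
            bodies.append(conn.getresponse().read())
    finally:
        conn.close()  # close the client first so the handler can exit
        server.shutdown()
        server.server_close()
    return bodies, list(KeepAliveHandler.seen_client_ports)
```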

Re:Does it work ? by ralfmuschall · 2010-10-30 03:05 · Score: 3, Insightful

Your way of thinking is nice, but it is exactly this attitude that gets developers fired (or their bosses broke if they share that attitude and don't fire you, in which case an inferior insecure competing product will dominate) for thinking too much instead of getting the product out. That's why we are up to the neck in inferior goods, protocols just being one example. Not even the death penalty (e.g. for melamine in Chinese milk) seems to stop this.

Re:Let me get this straight... by frank378 · 2010-10-30 03:19 · Score: 1

Why can't we just start over with an entirely new web standard that would be designed in a more efficient manner?

Yes, why don't we? The layered nature of the protocol stack is meant to allow for multiple versions and revisions of various and sundry functionality and interaction between layers. All the bright outspoken /.'ers here can go off and build some newer, better layers, or even a whole new stack! No more cookies needed, huzzah!

And it did learn... by Junta · 2010-10-30 03:28 · Score: 2, Interesting

It didn't make mistakes that closely resemble those in Telnet, tftp, ftp, smtp, it made what may be considered completely distinct 'mistakes' in retrospect.

However, if you confine the scope of HTTP use to what it was intended, it holds up pretty well. It was intended to serve up material that would ultimately manifest on an endpoint as a static document. Considerations for some server-side programmatic content tweaking based on client given cues was baked in to give better coordination between client and server and some other flexibility, but it was not intended to be the engine behind highly interactive applications 'rendered' by the server. HTTP was founded at a time when the internet at large wasn't particularly shy about developing new protocols running over TCP or UDP and I'm sure the architects of HTTP would've presumed such a usage model would have induced a new protocol rather than a mutation of HTTP over time.

Part of the whole 'REST' philosophy is to get back to the vision that HTTP targets. Strictly speaking, a RESTful implementation is supposed to eschew cookies and server maintained user sessions entirely. Every currently applicable embodiment of data is supposed to have its own *U*RL and authentication when required is HTTP auth. Thanks to Javascript a web application can still avoid popping up the inadequate browser provided login dialog as well as assembling disparate data at the client side rather than server side. It doesn't work everywhere, and often even when it does it's kinda mind warping to get used to, but it does try to use HTTP more in the manner it was architected to be used.
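
As a small illustration of the "authentication is HTTP auth" part, here is how a client could build the header for HTTP Basic authentication itself and attach it to every request, with no cookie or server session involved. A sketch in Python; the function name is made up:

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build the Authorization header for HTTP Basic authentication."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

Base64 is an encoding, not encryption, so the credentials are effectively sent in the clear with each request; this only makes sense over HTTPS.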

--
XML is like violence. If it doesn't solve the problem, use more.

Re:And it did learn... by JohnFen · 2010-10-31 03:51 · Score: 1

Thanks to Javascript a web application can still avoid popping up
That's the only thing Javascript seems to avoid popping up.
Thank you, I'll be here all week.
But in all seriousness, adding Javascript to the mix kinda makes everything worse. There's a good reason that I cannot stand to use the web without NoScript, and if javascript is actually required for any given site, that's a large minus. The site has to be much more compelling to me to get me to come back, and has to be fairly exceptional to get me to run its scripts.
Javascript might be able to patch over many of the problems with HTTP, but it brings its own sphere of madness, one that I think is even worse than plain vanilla HTTP+HTML.
That said, I like the REST approach.

Re:-1 Profanity by Anonymous Coward · 2010-10-30 03:43 · Score: 1, Insightful

Oh for fucks sake, stop being a fucking puritan, you fucktard!

Re:Let me get this straight... by am+2k · 2010-10-30 04:16 · Score: 1

Huh? The article is talking about HTTP, not HTML. Those two are not related in any way, Flash is also sent via HTTP.

You think HTTP is bad? by mveloso · 2010-10-30 04:33 · Score: 1

SNMP is a nightmare. There was a doc out there that used SNMP as an exemplar of "how not to write a protocol."

It's easy to forget, but these protocols were designed back in the day when there wasn't a lot of RAM, bandwidth, or CPU.

Most of the problems with everything have been well-discussed. You can dig into the past to see, but interoperability with existing implementations is always the blocking factor.

Heck, everyone knew the problems with ActiveX when it was announced...but that didn't stop MS. Same with cookies. If you want to see excitement, you can mine all the old protocol-level vulnerabilities just by plowing through usenet archives.

Re:You think HTTP is bad? by sjames · 2010-10-30 07:00 · Score: 2, Informative

SNMP is in serious need of retirement! Even XML is better (and that's saying a LOT!) The constraints that made it seem like a good idea simply don't exist anymore anywhere.
See also BEEP and syslog over BEEP (which has never, to my knowledge, EVER been supported by anything): a protocol that didn't realize that we already HAVE multiplexing built in to the communications channel, and so re-implemented it in the most baroque way possible.
ActiveX rises to new heights of bogosity. It's not just poor implementation or even poor design. The very concept of letting random websites execute arbitrary code outside of a sandbox is brain dead.

Re:Does it work ? by cheekyjohnson · 2010-10-30 04:50 · Score: 1

That's the "if it isn't broke, don't fix it" mindset for you. Nothing ever improves.

--
Filthy, filthy copyrapists!

Re:Let me get this straight... by perlchild · 2010-10-30 05:02 · Score: 1

Let's not confuse HTML with HTTP. This is already messy territory as it is.

Re:Does it work ? by icebike · 2010-10-30 05:12 · Score: 4, Insightful

So in other words, you never bring anything into production status.

Look, it's really quite simple.

HTTP was a presentation mechanism, designed to deliver content, dependent on non-persistent connections, where each initial and each subsequent request had to supply all information necessary to fulfill said request. Even if you "log in" to your account, every request stands alone.

There is no persistent connection. There is no reliable persistent knowledge on the server side that can be positively attributed to any given client. Clients are like motorists at the drive-up window of a burger stand, not well-known patrons at a restaurant.

Given that scenario, it was inevitable that cookies would be developed, and employed.

So unless you were willing to hold off deployment of e-commerce until you totally rewrote HTTP into a persistent connection based protocol, totally replaced the browser as the client side tool, any grandstanding on how carefully and methodically you work is just grandiose bravado.

The only tool at hand was HTTP and web servers and browsers. It's still largely the same today. There was no other way besides cookies of some sort. You may argue about their structure, their content or whatever, but cookies are all that is on the menu.

--
Sig Battery depleted. Reverting to safe mode.

Re:Let me get this straight... by Bigjeff5 · 2010-10-30 05:13 · Score: 1

We're talking about HTTP, not HTML. Just because they are often used together doesn't mean they are the same thing. In fact, they couldn't be more different; one is a communications protocol, the other is a markup language - I hope to god you can figure out which is which from that much.

But HTML is a terrible mess of kludges that doesn't work very well, too. It's just that most people on Slashdot consider it to be superior to Flash, even though it lacks a lot of Flash's basic functionality, and lacks all of the nice development tools that Flash has. Most of this stems from security paranoia (legitimate, but overblown in 99% of cases) and its tendency to crash (more significant issue, IMO, and also legitimate - also the cause of much of the security paranoia).

--
Security is mostly a superstition... Avoiding danger is no safer in the long run than outright exposure. - Helen Keller

Re:Does it work ? by Saint+Stephen · 2010-10-30 05:29 · Score: 2, Funny

Thank you, Captain Hindsight! What a complete failure the designers of HTTP were. They should've done it so much different! :-)

Re:-1 Profanity by somersault · 2010-10-30 05:47 · Score: 1

He basically said "everyone apart from me sucks".

--
which is totally what she said

Re:Does it work ? by SanityInAnarchy · 2010-10-30 05:53 · Score: 3, Informative

It would help if you qualified or explained a single one of these blanket assertions you've made.

What data loss is caused by MySQL? And while perhaps a NoSQL database "du jour" causes data loss, are you suggesting that the major ones like Couch, Cassandra, Mongo, etc all have serious data loss issues?

If so, specifics or it didn't happen. File a bug report, at the very least.

I don't have much good to say about PHP, but didn't someone recently roll out a compiler for it? I can't imagine PHP performance is a significant bottleneck, especially as people run successful websites written in everything from Java to Ruby. And what would you suggest in its place, C++? Gee, thanks, now we can spend all our time focusing on memory leaks and buffer overflows instead.

It's possible it's the wrong language for the job, but if you want to make that case, you've got to suggest an alternative.

Similarly, for JavaScript -- say what? Chrome compiles JavaScript to native code, and Firefox just got faster than Chrome. Both of them are now more than competitive with languages typically used for server-side development, where you'd expect performance to be a much bigger bottleneck. Indeed, there's at least one modern server-side JavaScript framework, written for V8, Chrome's JavaScript engine.

And again, is a potential alternative actually better for a given problem? Again, specific examples. There are applications which actually have performance needs which suggest they should be native apps, and people generally don't try those as web apps. Then there's a very, very thin border where a web app makes sense on the Web, but would be faster native -- but often, it's the design that's shite, not the technologies themselves.

If you ignore IE, browser compatibilities aren't so bad. Even if you include IE, are they significantly worse than OS incompatibilities if you decided to go native?

Finally, MVC. Exactly how is this "bastardized"? How would you do it differently, if you were writing a web framework? At least that's a specific example -- but you mentioned "software development and programming theories," plural, and you've only mentioned one.

It's possible you've got some good points, but you haven't backed them up at all.

--
Don't thank God, thank a doctor!

Re:Does it work ? by icebraining · 2010-10-30 06:02 · Score: 1

I don't have much good to say about PHP, but didn't someone recently roll out a compiler for it? I can't imagine PHP performance is a significant bottleneck, especially as people run successful websites written in everything from Java to Ruby. And what would you suggest in its place, C++? Gee, thanks, now we can spend all our time focusing on memory leaks and buffer overflows instead.

Yes, Facebook runs PHP compiled to C++ using HipHop.

Finally, MVC. Exactly how is this "bastardized"? How would you do it differently, if you were writing a web framework?

I think he's talking about the RoR model, where the view is essentially a template. That annoyed me too, but the framework I used is flexible enough to allow me to use Views as proper objects, which then use Templates.

--
Dilbert RSS feed

Re:Does it work ? by hairyfeet · 2010-10-30 06:18 · Score: 1

Actually I'd say it is the difference between those of us with a deadline and those without. It is easy in hindsight to say "I would do thus" but most of the time we simply aren't given the time we need to give a job the attention we'd like. Have I done a seriously half assed job in the past? Yes I have. Did I actually WANT to do a half assed job? No I didn't but was told flat footed "the job WILL be done by X", not Y, not even X+1 but X OR ELSE. It didn't matter that they would be getting a much less quality job, they chose half assed by X over decent quality by Y, so that is what they got.

--
ACs don't waste your time replying, your posts are never seen by me.

Re:Let me get this straight... by SanityInAnarchy · 2010-10-30 06:28 · Score: 1

What does that tell you about how bad Flash is that HTML5 is such a massive improvement over it?

--
Don't thank God, thank a doctor!

Re:Does it work ? by AK+Marc · 2010-10-30 07:24 · Score: 1

I knew a pilot who flew with duct tape holding down the fuel cap on his wing. That worked too, but it's hardly ideal is it?

I don't know about Australia, but in the US, that's simply illegal (presuming that running with the loss of a fuel cap is a "safety" item and it's on a certified aircraft flown by a certified pilot). When you have to break the law to do stupid, it's a special breed of stupid. And a pilot should know such things...

--
Learn to love Alaska

Re:Does it work ? by CapOblivious2010 · 2010-10-30 10:29 · Score: 3, Insightful

Let's see... we start with electrical signals, which are stateful... then we layer IP on top, to make it stateless... then we layer TCP on top of that, to make it stateful again... then we layer HTTP on top of that, to make it stateless again... then we layer cookies on top of that, to make it stateful again...

...and then we wonder why it performs like shit and is flaky as all hell!

I can't imagine what the problem might be... maybe we need a few more layers to make it perfect!

Re:-1 Profanity by kiwimate · 2010-10-30 10:46 · Score: 2, Insightful

Yes, and actually I agree with jabberw0k. There's simply no call for that kind of language; it added nothing to the points being made, and in fact distracted the poster from what had been a reasonably cogent argument up until that point.

If you reread the AC post, he/she makes several good points with some substance in the first four paragraphs - and then just lets rip with the profanity in the fifth paragraph, which, coincidentally, is where the entire post dissolves into a bunch of assertions with little to no rationale provided.

"Javascript is horrible." Oh, okay, then - why? "PHP is just as dreadful." Really, you don't say? Justify this assertion, please. "Every web developer who doesn't fit my narrow criteria is automatically rubbish." Glad you are still giving us some cogent points, then.

For what it's worth, I actually agree that "working" is different from "working well". One of my day jobs is as a member sitting on an interoperability panel at the moment, and you very quickly realize that something can meet the base level of "it does what it says" and fail miserably to be compatible and interoperable with other products.

But I don't need to descend to toilet language to explain this.

Working "well enough" is different... by weston · 2010-10-30 12:25 · Score: 1

... than "I don't like it."

HTTP has been repurposed far more than it should have been. Its lack of statefulness has resulted in horrible hacks like cookies and AJAX

AJAX? I can understand the cookie criticism, which TFA did a pretty good overview of, but AJAX's place is pretty much orthogonal to the issue of state. People resort to hacks *with* AJAX because browsers don't have a protocol with sessions, but even if we did, AJAX-like APIs and idioms would exist and continue to be used.

layout is still a huge hassle. CSS tries to bring in concepts from the publishing world, but they're not at all what we need for web layout

Layout -- even cross-platform layout -- is actually pretty easy if you use a subset of CSS positioning for the problems it's good at and tables for cases where it isn't.

A lot of people will claim otherwise, and they're wrong,

I predict a lot of the people who claim otherwise will do something you managed to neglect in your comment: provide justification for their statements. Perhaps you can try that your second time around instead of merely pounding your fist on the table about your personal opinion.

but JavaScript is a fucking horrible scripting language. It's even worse for writing anything significant.

Worse than what? How?

And no, it's absolutely nothing like Scheme (some JavaScript advocate always makes this stupid claim whenever the topic of JavaScript's horrid nature comes up).

It's enough like Scheme on at least two important fronts (functions as first-class values, scoping rules) that it's false to say it's "nothing" like Scheme. The idioms that grow up around those common parts of the language are important enough to using it that the comparison is reasonable, even with all the syntactic weight that JavaScript has and the missing features like macros and tail-call optimization.
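
For anyone who hasn't met those two features, here is a small sketch of them. It is written in Python rather than JavaScript or Scheme, purely for illustration; the names `make_counter` and `apply_twice` are invented for this example:

```python
def make_counter(start=0):
    """Lexical scoping: the inner function keeps access to `count`
    after make_counter has returned (a closure)."""
    count = start

    def increment(step=1):
        nonlocal count
        count += step
        return count

    return increment


def apply_twice(fn, value):
    """First-class functions: `fn` is passed around like any other value."""
    return fn(fn(value))


counter = make_counter(10)
print(counter(), counter(5))
print(apply_twice(lambda x: x * 2, 3))
```

Each call to `make_counter` produces an independent counter with its own private `count`, which is the idiom that both Scheme and JavaScript code lean on heavily.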

the NoSQL movement, which arose solely because there are a lot of web "developers" who don't know how to use relational databases properly. I've seriously dealt with such "developers" and many of them didn't even know what indexes are!

A lack of programmer familiarity with the setup and querying of RDBMSs is a problem, and yes, set up properly, they can be pretty darn effective for a lot of situations some devs are using NoSQL solutions for, but saying the latter are there "solely" for this reason is just as ignorant.

--
Tweet, tweet.

mbox needs a fix by r00t · 2010-10-30 15:48 · Score: 1

If a line starting with "From " is changed to start with ">From ", then one must also change ">From " to ">>From " and so on. Without this, mail gets mangled.

When reading mail, that transform must be undone. Note that even in cases where mail was stored without ">From " being changed to ">>From " it is likely less destructive to do unescaping than not. This is because humans seldom send email containing lines that start with ">From " but frequently send emails with lines starting with "From ".
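
A minimal sketch of that reversible quoting in Python. The function names are made up, and it works on a message body held as a single string rather than on a real mailbox file (I believe this variant is usually called mboxrd):

```python
import re

# Any run of '>' followed by "From " at the start of a line gets one more '>'.
_NEEDS_ESCAPE = re.compile(r"^(>*From )", re.MULTILINE)
# When reading, exactly one leading '>' is stripped from such lines.
_NEEDS_UNESCAPE = re.compile(r"^>(>*From )", re.MULTILINE)


def escape_body(body: str) -> str:
    """Quote 'From ', '>From ', '>>From ' ... lines before storing."""
    return _NEEDS_ESCAPE.sub(r">\1", body)


def unescape_body(stored: str) -> str:
    """Undo escape_body when reading the message back."""
    return _NEEDS_UNESCAPE.sub(r"\1", stored)


original = "From the start\n>From a quoted reply\nNothing From here\n"
stored = escape_body(original)
print(stored)
print(unescape_body(stored) == original)
```

Because every matching line gains exactly one `>` on write and loses exactly one on read, the round trip restores the body unchanged.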

Re:Does it work ? by SanityInAnarchy · 2010-10-30 18:23 · Score: 1

Yes, Facebook runs PHP compiled to C++ using HipHop

And that's pretty much my point exactly. Avoids the problems of C++, mostly, and all their existing PHP code gets faster just by tuning the language.

In the meantime, I don't see why I should adopt an ugly, dangerous language to solve a performance problem which, frankly, I just don't see. A properly designed app should be able to scale, which means you can throw hardware at the problem. When you're big enough that this isn't feasible, you're probably big enough that you can afford to build something like HipHop.

I think he's talking about the RoR model, where the view is essentially a template. That annoyed me too, but the framework I used is flexible enough to allow me to use Views as proper objects, which then use Templates.

There's Erector, which allows views to be code which ultimately generates HTML -- similar to a template, but not identical.

But I have to ask: In a Web context, what else would make sense as a view, particularly if you're deliberately doing fat models?

--
Don't thank God, thank a doctor!

Re:Does it work ? by SanityInAnarchy · 2010-10-30 18:29 · Score: 4, Insightful

By the time you add real garbage collection to C++, you're rapidly approaching a point where you may as well use Java. Anything short of that, like auto_ptr, is just a band-aid -- you still have plenty of ways to leak memory, and plenty of potential for buffer overflows. Contrast this to a sane, modern language, where these problems cannot exist.

Again, what would you suggest? If you're going to continue dismissing things I propose as crap without offering anything useful in its place, it's really not worth talking to you. If C++ is actually what you're suggesting, say so, and defend it.

--
Don't thank God, thank a doctor!

Broken spec, valid implementations by billcopc · 2010-10-31 21:22 · Score: 1

Sure, the spec itself is retarded, but cookies have been around long enough that we, the developers, have learned their quirks and know how to avoid them. For starters, no sane coder would actually stuff several cookies full of 4096-byte data chunks. They are mostly used for storing a relatively small session ID, with the big data blobs stored server-side, where they are actually used anyway.
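
Roughly what that looks like, as a Python sketch. The cookie name `sid`, the in-memory `SESSIONS` dict, and both helper functions are assumptions for illustration, not anyone's production code:

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical server-side store: the big data blobs live here, not in the cookie.
SESSIONS = {}


def start_session(data) -> str:
    """Store data server-side and return a Set-Cookie value holding only the ID."""
    session_id = secrets.token_hex(16)  # 32 hex characters
    SESSIONS[session_id] = data
    cookie = SimpleCookie()
    cookie["sid"] = session_id
    cookie["sid"]["path"] = "/"
    cookie["sid"]["httponly"] = True
    return cookie["sid"].OutputString()


def load_session(cookie_header: str):
    """Look up the server-side data from the Cookie header of a later request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("sid")
    if morsel is None:
        return None
    return SESSIONS.get(morsel.value)


set_cookie = start_session({"cart": ["book", "lamp"], "theme": "dark"})
name_value = set_cookie.split(";")[0]  # what the browser sends back
print(len(name_value), load_session(name_value))
```

The cookie stays a few dozen bytes no matter how large the session data grows.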

The cross-domain issue is indeed annoying for sites that do mass vhosts like "username.somedomain.com". I frankly have never used cross-domain cookies, when it is easier at both ends to pass the ID in a URL. I'm not saying they should completely disable this feature, but maybe turn it into an opt-in kind of thing, to be decided by the user. I consider it far more secure for such sites to use cross-domain JS includes (pull), rather than someone else's cookies (push).

--
-Billco, Fnarg.com

Re:Does it work ? by agbinfo · 2010-11-01 07:04 · Score: 1

I believe the parent was referring to smart pointers and RAII, which let you select when the data is considered garbage and when it should be collected. Languages that use GC can also leak memory if you're not careful. I remember a Java program I was working on where the data was loaded in a map for quick lookup. Whenever the operator would load a new file, the map wasn't set to null and weak pointers were not used, so it leaked.
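
The same kind of leak can be sketched in Python, using `weakref.WeakValueDictionary` as a stand-in for Java's weak references. The `Record` class and `load_file` function are invented for this example; the original program was Java and surely looked different:

```python
import gc
import weakref


class Record:
    def __init__(self, name):
        self.name = name


def load_file(cache, names):
    """Pretend to load a file: build records and index them by name."""
    records = [Record(name) for name in names]
    for record in records:
        cache[record.name] = record
    return records


def keys_after_reload(cache):
    """Load one 'file', then another, and report what the lookup map still holds."""
    current = load_file(cache, ["a", "b"])
    current = load_file(cache, ["c"])  # the operator loads a new file
    gc.collect()
    return sorted(cache.keys())


print(keys_after_reload({}))                             # plain map keeps old records
print(keys_after_reload(weakref.WeakValueDictionary()))  # weak map lets them go
```

With the plain dict, the map itself is the reference that keeps the old file's records alive, garbage collector or not.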

Re:Does it work ? by SanityInAnarchy · 2010-11-01 16:03 · Score: 1

It's true that GC'd languages can potentially leak memory, but the possibilities are small and almost require you to deliberately subvert what the garbage collector otherwise does for you.

By contrast, it's trivially easy to leak memory in a non-garbage-collected language, and again, "smart pointers" (just refcounting, right?) are still more likely to leak memory, and potentially add even more overhead than real GC.

So, may as well just use GC, and if you're doing that, may as well just use something like Java. (Though not, I'd hope, Java itself.)

--
Don't thank God, thank a doctor!

Re:Does it work ? by julesh · 2010-11-01 22:15 · Score: 1

I don't have much good to say about PHP, but didn't someone recently roll out a compiler for it? I can't imagine PHP performance is a significant bottleneck, especially as people run successful websites written in everything from Java to Ruby.

The main performance issue with PHP never was its interpreter, which was always reasonably fast. The issue that a lot of people have with it is that if you use one of its most important features, i.e. its automatic session management, it uses a locking system that basically means only one request per client can be processed at a time. You can work around it, but you have to know what you're doing, and many people are unaware that they need to. Hence, if you have content like images being served by PHP, unless its author understood the language much better than the average PHP developer does, the result will be unnecessarily slow.
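
A toy model of that behaviour, in Python rather than PHP. This is only a sketch of the general idea of one lock per session; the class and function names are made up and it does not reproduce PHP's actual implementation:

```python
import threading


class SessionLocks:
    """Hypothetical registry holding one lock per session ID."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    def lock_for(self, session_id):
        with self._guard:
            return self._locks.setdefault(session_id, threading.Lock())


def second_request_can_start(locks, session_id, release_early):
    """Request 1 opens the session; can request 2 from the same client start?"""
    lock = locks.lock_for(session_id)
    lock.acquire()  # request 1 opens the session
    released = False
    if release_early:
        lock.release()  # request 1 is done with session data, keeps working
        released = True
    started = lock.acquire(blocking=False)  # request 2 tries to open the session
    if started:
        lock.release()
    if not released:
        lock.release()  # request 1 finally finishes
    return started


locks = SessionLocks()
print(second_request_can_start(locks, "client-1", release_early=False))
print(second_request_can_start(locks, "client-1", release_early=True))
```

Holding the lock for the whole request serializes everything from one client; letting go of it as soon as the session data has been read is the kind of workaround meant above.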

If you ignore IE, browser compatibilities aren't so bad. Even if you include IE, are they significantly worse than OS incompatibilities if you decided to go native?

You can't ignore IE. Like it or loathe it (the former only happens if you aren't actually a web developer), IE still has a significant market share. Not supporting it on any site with a commercial goal is practically suicide.

And, yes, IE's compatibility issues are significantly worse than native development issues. I can pick a framework to develop with, say Java+SWT, and have the results work on every common target platform with almost no platform-specific work required. If I target web browsers, I have never been able to produce a non-trivial application without spending significant time debugging IE-specific issues (e.g. browser crashing on unloading plugins in hidden divs, an issue which struck my last major web project and delayed it by about two days as I figured out a workaround).

Re:Does it work ? by agbinfo · 2010-11-02 05:26 · Score: 1

By contrast, it's trivially easy to leak memory in a non-garbage-collected language, and again, "smart pointers" (just refcounting, right?) are still more likely to leak memory, and potentially add even more overhead than real GC.

There are ref counting smart pointers but there's also weak pointers and unique pointers. For the majority of stuff you just want to ensure that the resource is released when the resource's owner goes out of scope. It's not that complicated.

There are also plenty of GC libraries for C++ so it's possible to select which objects are GC candidates and which are not - best of both worlds.

So, may as well just use GC, and if you're doing that, may as well just use something like Java. (Though not, I'd hope, Java itself.)

GCs have other issues aside from efficiency. They make it much harder to have real-time guarantees. They make it harder to free up resources in a deterministic manner, although C#'s using statement makes this much easier. Also a good number of Java and C# programmers probably don't even know about weak pointers, so I'm pretty sure memory leaks exist in most non-trivial programs in GC languages too.
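
To illustrate deterministic release in a GC'd language, here is a sketch in Python, whose `with` statement plays a role similar to the using statement mentioned above. The `Connection` class is a made-up stand-in for any resource with a `close()` method:

```python
from contextlib import closing


class Connection:
    """Stand-in for a file, socket, or database handle."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.log.append(f"open {name}")

    def close(self):
        self.log.append(f"close {self.name}")


def run(log, fail=False):
    """The resource is closed when the block exits, not when the GC gets to it."""
    try:
        with closing(Connection("db", log)) as conn:
            log.append(f"use {conn.name}")
            if fail:
                raise RuntimeError("simulated failure")
    except RuntimeError:
        log.append("error handled")
    log.append("after block")
    return log


print(run([]))
print(run([], fail=True))
```

In both runs the close happens before anything after the block, even when the block raises.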

Sometimes I prefer Java or C#. Sometimes I prefer C++. I just don't think the memory leak issues in C++ are as bad as many people try to make them to be.

Re:Does it work ? by agbinfo · 2010-11-02 06:05 · Score: 1

BTW, I'd like to make it clear that I'm just stating a personal opinion. I'm not an expert on the subject. I'm just some guy with an opinion.

Re:Does it work ? by SanityInAnarchy · 2010-11-02 10:04 · Score: 1

You can't ignore IE. Like it or loathe it (the former only happens if you aren't actually a web developer), IE still has a significant market share. Not supporting it on any site with a commercial goal is practically suicide.

Supporting old versions in a limited capacity, with a suggestion to upgrade your browser, doesn't seem to be hurting YouTube any.

And, yes, IE's compatibility issues are significantly worse than native development issues. I can pick a framework to develop with, say Java+SWT, and have the results work on every common target platform with almost no platform-specific work required.

But you're incapable of picking a framework to develop with, say jQuery or IE9.js, which has the results work on every common target platform with almost no platform-specific work required?

And again, throw out IE, particularly old versions of IE, and it becomes a decent platform. If needed, add it back in with something like IE9.js or Chrome Frame.

Yes, Chrome Frame. You're going to make your users download a JVM and your native app, but it's too much to ask them to download a browser, or even a browser plugin?

It's worth mentioning, too: IE has fallen below 50%, and that's in general. Among technically-inclined people, it's far lower. Only about 15% are on IE6, and again, the platform massively improves when you don't have to support that anymore.

--
Don't thank God, thank a doctor!

Re:Does it work ? by SanityInAnarchy · 2010-11-02 10:13 · Score: 1

It's not that complicated.

In theory, it's simple. In practice, not so much -- the bugs which can happen here are numerous and subtle.

There are also plenty of GC libraries for C++

And my point here was that by the time you use a GC library, why not get the full benefit of a safer, saner language? You've already got most of the overhead of something like Java, why not also get the runtime optimizations and the protection from buffer overflows and segfaults, too?

it's possible to select which objects are GC candidates and which are not

And what'd be the criteria for which objects should be GC'd and which you want to handle yourself?

I'd guess the objects which you want to manage yourself are either places where you're interacting with code, or particularly performance-critical parts of your application. But if you're doing it that way, it seems to me that I get most of the same benefit by coding in Ruby, and dropping down to C for those two cases.

It seems you could get similar benefits in Java if JNI wasn't such a bitch -- and even as it is, it isn't that bad compared to pretty much anything else in C.

I just don't think the memory leak issues in C++ are as bad as many people try to make them to be.

I don't think they're particularly bad either, but I don't see any reason I, as a programmer, should have to deal with them. I certainly don't think C++ has any real place in web development -- except, as I mentioned, in particularly performance-critical bits, especially when they can be abstracted into libraries. I trust the HTTP parser in nginx or Apache a lot more than any code I wrote myself, but anything I write, I trust a lot more in Ruby or JavaScript than in C or C++.

--
Don't thank God, thank a doctor!

Re:Does it work ? by agbinfo · 2010-11-02 15:47 · Score: 1

It's not that complicated.

In theory, it's simple. In practice, not so much -- the bugs which can happen here are numerous and subtle.

That's true. It's also true of other languages but there are probably more issues with C and C++ than there are with many other languages.

There are also plenty of GC libraries for C++

And my point here was that by the time you use a GC library, why not get the full benefit of a safer, saner language? You've already got most of the overhead of something like Java, why not also get the runtime optimizations and the protection from buffer overflows and segfaults, too?

For most purposes, smart pointers will do the job real fine. There's little to no overhead and you get the advantage that you know when your objects get destroyed.

it's possible to select which objects are GC candidates and which are not

And what'd be the criteria for which objects should be GC'd and which you want to handle yourself?

I'd guess the objects which you want to manage yourself are either places where you're interacting with code, or particularly performance-critical parts of your application. But if you're doing it that way, it seems to me that I get most of the same benefit by coding in Ruby, and dropping down to C for those two cases.

It seems you could get similar benefits in Java if JNI wasn't such a bitch -- and even as it is, it isn't that bad compared to pretty much anything else in C.

I've never had to use a GC in C++ so I'm mostly guessing here. One situation where I'd want to use GC is if I had several containers sharing the same objects and none could be considered the owner. If there's an owner, then using weak pointers for the other containers does the trick.

As far as performance is concerned, going from managed to unmanaged code was relatively expensive in Java with JNI when I used it. Hopefully Ruby is better at it. I don't think you're wrong in that the vast majority of cases don't need the performance provided by C++.

There's another thing you might want to look at when talking about performance. C and C++ will usually have much lower memory requirement and there's no interpreter to load. If performance is an issue, it might be simpler to stick to C++

I just don't think the memory leak issues in C++ are as bad as many people try to make them to be.

I don't think they're particularly bad either, but I don't see any reason I, as a programmer, should have to deal with them. I certainly don't think C++ has any real place in web development -- except, as I mentioned, in particularly performance-critical bits, especially when they can be abstracted into libraries. I trust the HTTP parser in nginx or Apache a lot more than any code I wrote myself, but anything I write, I trust a lot more in Ruby or JavaScript than in C or C++.

If performance is not an issue, I wouldn't use C++ either unless there's some reason to. I've implemented some proof-of-concept in C++ but I did so because I had to interface with our code base. At other times I've used Perl, Java and C# when I could choose.

Re:Does it work ? by SanityInAnarchy · 2010-11-04 11:25 · Score: 1

That's true. It's also true of other languages but there are probably more issues with C and C++ than there are with many other languages.

Well, in particular, when something goes wrong in Java, the typical result is a NullPointerException, which can be caught and managed, and which is much easier to debug than in C, where the typical result is a segfault that can be difficult or impossible to track down.

For most purposes, smart pointers will do the job real fine. There's little to no overhead and you get the advantage that you know when your objects get destroyed.

Well, again, what do you mean? If we're talking about std::auto_ptr -- that is, a refcounting pointer -- then while I haven't done the benchmarks to back it up, I'd guess refcounting can actually be worse than GC in terms of performance. In particular, with a garbage-collected language, the garbage collector presumably runs at intervals, and is highly optimized -- the whole thing probably fits in cache. This means when the GC isn't running, there's no memory-management-related code running. By contrast, with refcounting, you're at least dealing with the reference count all the time, and you're making calls to delete or free more often...

On the other hand,

C and C++ will usually have much lower memory requirement and there's no interpreter to load.

I don't think an interpreter alone is an issue, and I'm skeptical that the memory requirements are that significant, but if nothing else, GC would tend to leave objects around for awhile before attempting to collect them, whereas C and C++ can collect them immediately. In practice, for performance reasons, you'd probably retain a pool of allocated memory so you don't have to talk to the OS as often -- I think modern malloc implementations do this -- but on a system truly starved for memory, it helps that every byte is released as soon as it can be.

It's just that for the vast majority of applications, GC and other modern, high-level tools are more than worth a large performance penalty, and the difference is getting smaller all the time.

--
Don't thank God, thank a doctor!

The rest of HTTP is just as bad. by CondeZer0 · 2010-11-04 19:31 · Score: 1

HTTP is a huge complex mountain of hacks on top of other hacks. We are just lucky that no more 'features' have been added to it for some time.

I have been thinking about defining a sane subset and calling it HTTP 0.2, but every time I look into it the sheer messiness of the HTTP standard and existing implementations is just too depressing to handle.

--
"When in doubt, use brute force." Ken Thompson

Re:Does it work ? by agbinfo · 2010-11-05 03:30 · Score: 1

For most purposes, smart pointers will do the job real fine. There's little to no overhead and you get the advantage that you know when your objects get destroyed.

Well, again, what do you mean? If we're talking about std::auto_ptr -- that is, a refcounting pointer -- then while I haven't done the benchmarks to back it up, I'd guess refcounting can actually be worse than GC in terms of performance. In particular, with a garbage-collected language, the garbage collector presumably runs at intervals, and is highly optimized -- the whole thing probably fits in cache. This means when the GC isn't running, there's no memory-management-related code running. By contrast, with refcounting, you're at least dealing with the reference count all the time, and you're making calls to delete or free more often...

On the other hand,

First off, I'm not an expert in memory management. I graduated in the 90s and I'm sure things have changed quite a bit since then. That is, I may be wrong, and you can prefix every one of the following sentences with "As far as I know."

Reference counting has very little overhead. Memory-wise it adds a few bytes, and time-wise it adds almost nothing as well. A GC will probably use reference counting to speed up detection of unused memory and only perform mark-and-sweep or whatever is needed to resolve circular references after that. I'm pretty sure that there's just as much, if not more, memory-management-related code in a GC-based program even when the GC is not running. I haven't done any benchmarks either.

auto_ptr is probably not the smart pointer you want to use. You're better off using boost's smart pointers. The problem with auto_ptr is that it doesn't play nicely with containers.

As far as keeping unused objects in memory for cache, I don't think that GCs can do that. Once the object is no longer referenced, its data is meaningless. Also, I'm pretty sure that malloc/new implementations have never resulted in systematic calls to the OS.

Re:Does it work ? by SanityInAnarchy · 2010-11-05 14:46 · Score: 1

I don't know if GC uses refcounting at all, though I suppose it's possible.

However, the point is that the reference counting itself isn't just the extra bytes of RAM, it's the extra bytes of CPU cache. It's the difference between a chunk of your program fitting in cache and running insanely fast, then being paged out for GC to run (and GC sits in cache during its run), and that same program needing the refcounting, malloc/free, and a bunch of other housekeeping stuff always hot in cache, meaning it's likely your program will have to have chunks of it paged in and out of cache much more often.

Paradoxical, and I'm not convinced, so I'd want to benchmark it. It does seem plausible, and I did read it in a respectable-looking paper.

So no, I wasn't talking about the GC keeping anything "in memory" (as opposed to what?) -- yes, once the object isn't referenced, its data is meaningless.

And yes, I'm pretty sure malloc/new implementations have, at least at one point, been direct system calls. I imagine they still are, on some embedded platforms. When you're starved for memory, it makes sense -- you want everything free'd for other processes to use as soon as you possibly can.

Good to know about boost -- though now I'm curious what the difference is.

--
Don't thank God, thank a doctor!

Re:Does it work ? by agbinfo · 2010-11-05 16:17 · Score: 1

I don't know if GC uses refcounting at all, though I suppose it's possible.

However, the point is that the reference counting itself isn't just the extra bytes of RAM, it's the extra bytes of CPU cache. It's the difference between a chunk of your program fitting in cache and running insanely fast, then being paged out for GC to run (and GC sits in cache during its run), and that same program needing the refcounting, malloc/free, and a bunch of other housekeeping stuff always hot in cache, meaning it's likely your program will have to have chunks of it paged in and out of cache much more often.

Actually, ref-counting is mostly just the extra few bytes. An auto_ptr (or a unique_ptr or a boost::scoped_ptr) doesn't even use the extra bytes because it has single ownership. When they go out of scope, the object is destroyed. No extra byte; no complicated memory management. The C++ compiler knows when the object goes out of scope and will call the destructor at that time.
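To make the single-ownership case concrete, here is a minimal sketch using std::unique_ptr. The Tracked type and the two helper functions are made up for illustration; they only count live objects so the moment of destruction is visible.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type that records how many instances are currently alive.
struct Tracked {
    static int alive;
    Tracked() { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Returns the live-object count as seen while the owner is still in scope.
int alive_inside_scope() {
    std::unique_ptr<Tracked> p(new Tracked);  // single owner, no reference count
    return Tracked::alive;
}   // p goes out of scope here and ~Tracked runs right away

// Returns the live-object count once the owner is gone.
int alive_after_scope() {
    alive_inside_scope();
    return Tracked::alive;
}
```

The destructor call is emitted by the compiler at the closing brace, so nothing has to track the object at run time.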

For boost::shared_ptr, there's extra memory for reference counting because there can be multiple owners. But, again, I'd be surprised if a GC-based language wouldn't use reference counting. Perl, for example, uses reference counting exclusively *because* it's much faster than other schemes. It has the same drawback that C++ has which is that circular references may leak.
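A small sketch of both halves of that, assuming std::shared_ptr (the standard-library counterpart of boost::shared_ptr) behaves the same way for this purpose. Node and the two functions are hypothetical names used only for the demonstration.

```cpp
#include <cassert>
#include <memory>

// Hypothetical node type that counts live instances.
struct Node {
    static int alive;
    std::shared_ptr<Node> next;  // strong (counted) reference to another node
    Node() { ++alive; }
    ~Node() { --alive; }
};
int Node::alive = 0;

// Two owners of one object: the count goes 1 -> 2 while both exist.
long count_with_two_owners() {
    std::shared_ptr<Node> a = std::make_shared<Node>();
    std::shared_ptr<Node> b = a;  // second owner, count is incremented
    return a.use_count();
}   // both owners leave scope, count hits zero, the Node is destroyed

// a -> b -> a: each node keeps the other's count above zero forever.
int leaked_after_cycle() {
    int before = Node::alive;
    {
        std::shared_ptr<Node> a = std::make_shared<Node>();
        std::shared_ptr<Node> b = std::make_shared<Node>();
        a->next = b;
        b->next = a;
    }   // the local owners are gone, but the cycle still holds both nodes
    return Node::alive - before;
}
```

Changing next to a std::weak_ptr<Node> is the usual way to break such a cycle, since a weak reference does not contribute to the count.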

Paradoxical, and I'm not convinced, so I'd want to benchmark it. It does seem plausible, and I did read it in a respectable-looking paper.

If you have a link to that paper I'd like to see it. As I said, there's not much more to reference counting other than incrementing a value when the object is assigned a new owner and decrementing that same value when it's being released. The allocation is done once and there is a single delete.
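As a rough illustration of how little is involved, here is a hand-rolled sketch. CountedRef is a hypothetical class written for this comment, not how boost or any standard library implements it, and it is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal handle: one heap block holds the value and its count.
template <typename T>
class CountedRef {
    struct Block { T value; std::size_t count; };
    Block* block_;

    // Drop one owner; the last owner out performs the single delete.
    void release() {
        if (block_ && --block_->count == 0) { delete block_; }
    }

public:
    explicit CountedRef(const T& v) : block_(new Block{v, 1}) {}  // the one allocation
    CountedRef(const CountedRef& other) : block_(other.block_) { ++block_->count; }
    CountedRef& operator=(const CountedRef& other) {
        if (other.block_ != block_) {
            release();
            block_ = other.block_;
            ++block_->count;
        }
        return *this;
    }
    ~CountedRef() { release(); }

    T& get() const { return block_->value; }
    std::size_t owners() const { return block_->count; }
};

// Copy the handle, let the copy die, and report how many owners remain.
std::size_t owners_after_copy_and_drop() {
    CountedRef<int> a(42);
    {
        CountedRef<int> b = a;      // count 1 -> 2
        assert(a.owners() == 2);
    }                               // b destroyed: count 2 -> 1
    return a.owners();
}
```

A real shared pointer adds atomic updates for thread safety, which is where much of the cost people argue about comes from.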

So no, I wasn't talking about the GC keeping anything "in memory" (as opposed to what?) -- yes, once the object isn't referenced, its data is meaningless.

And yes, I'm pretty sure malloc/new implementations have, at least at one point, been direct system calls. I imagine they still are, on some embedded platforms.

I've programmed in C and other procedural languages, Pascal for example, for a long time. I've never seen a single implementation that would make a system call for each malloc/free call. If you know of one, again, I'd be interested to have a link.

When you're starved for memory, it makes sense -- you want everything free'd for other processes to use as soon as you possibly can.

When delete is called (or free in C), the memory used by the object is made available immediately. This requires a call to the C or C++ library, if that's what you mean, but this is not a system call. It doesn't require an intervention from the OS except, maybe, in a multi-threaded application. If this library call is what you mean by "system call" then yes it has some overhead. I have heard of implementations of new/delete that accumulate the delete in order to gain a few extra cycles. But when you need these extra cycles you probably should be programming in C++.

Good to know about boost -- though now I'm curious what the difference is.

here

Re:Does it work ? by SanityInAnarchy · 2010-11-06 04:44 · Score: 1

That's just it, though:

When they go out of scope, the object is destroyed... The C++ compiler knows when the object goes out of scope and will call the destructor at that time.

Which means the destructor now needs to be called, along with whatever code the 'delete' keyword actually compiles to. And again, this is extra bytes of code.

But, again, I'd be surprised if a GC-based language wouldn't use reference counting. Perl, for example, uses reference counting exclusively *because* it's much faster than other schemes. It has the same drawback that C++ has which is that circular references may leak.

Well, and I know for a fact Java, Ruby, and any sane JavaScript interpreter at least have some sort of actual garbage collector, vaguely like mark-and-sweep, so they don't have to deal with circular references. Once they have that, I don't see the point of reference counting.

As I said, there's not much more to reference counting other than incrementing a value when the object is assigned a new owner and decrementing that same value when it's being released. The allocation is done once and there is a single delete.

It's that value, plus the actual delete.

If you have a link to that paper I'd like to see it.

Not readily. I think the best I can do at the moment is point out that the Wikipedia article seems to agree with me. There's also this, which again suggests that garbage collection can match or beat malloc/free -- and that's without mentioning refcounting, which brings some additional overhead of its own.

When delete is called (or free in C), the memory used by the object is made available immediately. This requires a call to the C or C++ library, if that's what you mean, but this is not a system call.

Right -- this is what I mean by the smart, optimized way. It's not a system call every time (though it is sometimes), and it isn't entirely without cost.

But because it's not a system call, the memory is only available to this program immediately, which is why I'd imagine (though I don't have a link to back it up) that on an embedded system, if you were particularly starved for memory, you might want to make it immediately available to other programs, which necessarily involves talking to the system.
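A toy sketch of that distinction, assuming a fixed-size pool with a free list. Pool and freed_block_is_reused are hypothetical names, and real malloc implementations are far more elaborate. The point is only that a released block goes back to the program's own list, with no call into the OS.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-size pool: kSlots blocks carved from one in-object buffer.
class Pool {
    static const std::size_t kSlots = 4;

    // A free slot stores the link to the next free slot in its own bytes.
    union Slot {
        Slot* next;
        alignas(std::max_align_t) unsigned char bytes[64];
    };

    Slot slots_[kSlots];
    Slot* free_list_;

public:
    Pool() : free_list_(nullptr) {
        for (std::size_t i = 0; i < kSlots; ++i) {
            slots_[i].next = free_list_;  // push every slot onto the free list
            free_list_ = &slots_[i];
        }
    }

    // Pop a block off the free list; returns nullptr when the pool is empty.
    void* allocate() {
        if (!free_list_) return nullptr;
        Slot* s = free_list_;
        free_list_ = s->next;
        return s;
    }

    // Push the block back; it is reusable by this program at once,
    // but no other process ever sees the memory.
    void release(void* p) {
        Slot* s = static_cast<Slot*>(p);
        s->next = free_list_;
        free_list_ = s;
    }
};

// Free a block and allocate again: the same address comes straight back.
bool freed_block_is_reused() {
    Pool pool;
    void* first = pool.allocate();
    pool.release(first);
    void* second = pool.allocate();
    return first == second;
}
```

Handing the memory to other programs would take an extra step that this sketch deliberately leaves out, which is the part that has to involve the system.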

--
Don't thank God, thank a doctor!
