How Not To Design a Protocol
An anonymous reader writes "Google security researcher Michael Zalewski posted a cautionary tale for software engineers: amusing historical overview of all the security problems with HTTP cookies, including an impressive collection of issues we won't be able to fix. Pretty amazing that modern web commerce uses a mechanism so hacky that does not even have a proper specification."
RTFA. That's exactly what happend with HTTP. "It works". In the world of 1990. And then they started to "fix" it to keep up.
... cookies are delicious!
Check out my novel.
I still think allowing cookies to span more than one distinct domain was a mistake. If we had avoided that in the beginning, cookie scope implementations would be dead simple and not much functionality would be lost on the server side. Also, JavaScript cookie manipulation is something we could easily lose for the benefit of every user, web developer and server admin. I postulate there are very few legitimate uses for document.cookie
> HTTP is like a manual lawn mower.
No it isn't. A manual lawnmower is well-designed. The Web is like a lawnmower built by Rube Goldberg out of dozens of pairs of scissors, lots of string, some boards and a child's wagon, propelled by a large dog and powered by the wagging of his tail (the cookies are to get him to wag it). It's now had a clippings bag and a fertilizer cart added following the same design principles. An automatic dandilion remover, a dethatcher, and an aerator are coming soon (and several more dogs).
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
When I'm designing a solution, I don't ask if it works, I was if it works well. Is it secure? Is it scalable? What are the risks associated with it? Is it full of kludges that make bad implementations easy? What do I do if a user decides she doesn't trust that functionality and turns it off? And the point of the article wasn't to say that people shouldn't use cookies when developing web site or applications. Rather, it's an examination of how a sub-optimal solution came to be so that perhaps other people can avoid similar pitfalls in the future.
You forgot to mention that the dog taking a shit is an extra add-on........Flash!
I think it can be hard to plan for this far into the future. Look how much the web has changed, and the things we do now with even just HTML and CSS that people back in the beginning probably would never have even considered doing. You build something for your needs and if it works then you are good. Sometimes you don't want to spend time planning it out for the next 5, 10, 20 years because you assume (usually correctly) that what you are writing will be updated long before then and replaced with something else.
I will shred my adversaries. Pull their eyes out just enough to turn them towards their mewing, mutilated faces. Illyria
am I the only one who now wants to see that built/build it myself?
i thought once I was found, but it was only a dream.
It is this type of thinking that separates a carpenter from an engineer.
The road to tyranny has always been paved with claims of necessity.
"Working" is measured over a very wide spectrum. On one hand, we have "broken", and on the other we have "working perfectly". The web is far, far closer to the "broken" side of the spectrum than it ever has been to the "working perfectly" side.
Put simply, almost everything about the web is one filthy hack upon another. It's a huge stack of shitty "extensions" that were often made with little thought, so it's no wonder web development is so horrible today.
HTTP has been repurposed far more than it should have been. Its lack of statefulness has resulted in horrible hacks like cookies and AJAX. HTTP makes caching far harder than it should be. SSL and TLS are mighty awful hacks. And those are just a few of its problems!
HTML is a mess, and HTML5 is just going to make the situation worse. Even after 20 years, layout is still a huge hassle. CSS tries to bring in concepts from the publishing world, but they're not at all what we need for web layout, and thus everyone is unhappy.
A lot of people will claim otherwise, and they're wrong, but JavaScript is a fucking horrible scripting language. It's even worse for writing anything significant. And no, it's absolutely nothing like Scheme (some JavaScript advocate always makes this stupid claim whenever the topic of JavaScript's horrid nature comes up).
PHP is one of the few popular languages that can rival JavaScript in terms of being absolutely shitty. Then there are other server-side shenanigans like the NoSQL movement, which arose solely because there are a lot of web "developers" who don't know how to use relational databases properly. I've seriously dealt with such "developers" and many of them didn't even know what indexes are!
Most web browsers themselves are quite shitty. It has gotten better recently, but they still use huge amounts of RAM for the relatively simple services they provide.
The only people involved with some sort of web-related software development who aren't absolute fuck-ups are those working on HTTP servers like Apache HTTPd, nginx, and lighttpd. But now we're seeing crap like Mongrel and Mongrel2 arising in this area, so maybe it's only a matter of time before the sensible developers here move on.
So just because the web is "sort of broken", rather than "completely fucking broken", it doesn't mean that it's "working".
Why go hatin' on this particular protocol?
Most of them are just nuckin futs:
* FTP: needs two connections. Commands and responses and data are not synced in any way. No way to get a reliable list of files. No standard file listing format. No way to tell what files need ASCII and which need BIN mode. And probably more fubarskis.
* Telnet: The original handshake protocol is basically foobar-- the handshakes can go on forever. Several RFC patches did not help much. Basically the clients have to kinda cut off negotiations at some point and just guess what the other end can and will do.
* SMTP: You can't send a line with the word "From" as the first word? I'm not a typewriter? WTF?
Web developers would have the time to think about important things like that, if they weren't spending all of their time trying to prevent data loss caused by MySQL or the NoSQL database de jour, horrible server-side peformance due to PHP, horrible client-side performance due to JavaScript, all while trying to avoid the numerous browser incompatibilities.
Although the tools and technologies they're using are complete shit, it sure doesn't help that they generally don't understand even basic software development and programming theories very well. See their bastardization of the MVC pattern, for instance.
Ah, the OSI model (circa 1978), the polar opposite of Cookies - a spec so glorious, it's still commonly cited - yet so useless it's a 30 year old virgin, having never been implemented!
Rube Goldberg? Quite the opposite. The HTTP protocol is very simple, eminently debuggable, plus extensible both ways.
It's the implementations in browsers and servers that suck.
Now *SOAP*, layered on top of HTTP, is truly a Rube Goldberg invention with no redeeming qualities whatsoever.
Let's see:
1. IP is a stateless protocol, that's inconvenient for some things, so
2. We build TCP on it to make it stateful and bidirectional.
3. On top of TCP, we build HTTP, which is stateless and unidirectional.
4. But whoops, that's inconvenient. We graft state back into it with cookies. Still unidirectional though.
5. The unidirectional part sucks, so various hacks are added to make it sorta bidirectional like autorefresh, culminating with AJAX.
Who knows what else we'll end up adding to this pile.
1. Sure
2. stateful, stream-oriented, *and* reliable
3. HTTP designed as a stateless datagram model, but wanted reliability, so TCP got chosen for lack of a better option. SCTP if it had existed might have been a better model, but for the time the stateful stream aspect of TCP was forgiven since it could largely be ignored but reliability over UDP was not so trivial.
4. More critically, the cookie mechanism strives to add stateful aspects that cross connections. This is something infeasible with TCP. Simplest example, HTTP 'state' strives to survive events like client IP changes, server failover, client sleeping for a few hours, or just generally allowing the client to disconnect and reduce server load. TCP state can survive none of those.
5. Indeed, at least AJAX enables somewhat sane masking of this, but the only-one-request-per-response character of the protocol means a lot of things cannot be done efficiently. If HTTP had allowed arbitrary server-side HTTP responses for the duration of a persistent http connection, that would have greatly alleviated the inefficiencies that AJAX methods strive to mask.
XML is like violence. If it doesn't solve the problem, use more.
Your way of thinking is nice, but it is exactly this attitude that gets developers fired (or their bosses broke if they share that attitude and don't fire you, in which case an inferior insecure competing product will dominate) for thinking too much instead of getting the product out. That's why we are up to the neck in inferior goods, protocols just being one example. Not even death penalty (e.g. for melamine in chinese milk) does seem to stop this.
But these servers need to communicate anyway to maintain a "session" in any meaningful sense, so they can as well send the associated crypt key with the rest of the session information.
Forget magic. Any technology distinguishable from divine power is insufficiently advanced.
It didn't make mistakes that closely resemble those in Telnet, tftp, ftp, smtp, it made what may be considered completely distinct 'mistakes' in retrospect.
However, if you confine the scope of HTTP use to what it was intended, it holds up pretty well. It was intended to serve up material that would ultimately manifest on a endpoint as a static document. Considerations for some server-side programmatic content tweaking based on client given cues was baked in to give better coordination between client and server and some other flexibility, but it was not intended to be the engine behind highly interactive applications 'rendered' by the server. HTTP was founded at a time when the internet at large wasn't particularly shy about developing new protocols running over TCP or UDP and I'm sure the architects of HTTP would've presumed such a usage model would have induced a new protocol rather than a mutation of HTTP over time.
Part of the whole 'REST' philosophy is to get back to the vision that HTTP targets. Strictly speaking, a RESTful implementation is supposed to eschew cookies and server maintained user sessions entirely. Every currently applicable embodiment of data is supposed to have its own *U*RL and authentication when required is HTTP auth. Thanks to Javascript a web application can still avoid popping up the inadequate browser provided login dialog as well as assembling disparate data at the client side rather than server side. It doesn't work everywhere, and often even when it does it's kinda mind warping to get used to, but it does try to use HTTP more in the manner it was archictected to be used.
XML is like violence. If it doesn't solve the problem, use more.
Part of the problem is historical. Tim B-L wanted to make a WYSYWYG viewer system. Back in the day when it was invented, it was dangerous. Dangerous because it was an independent, open API set that worked wherever a browser worked. That flew in the face of tons of proprietary software. It was a transport-irrelevent protocol set that took the best of different coding schemes and made it work. Like most things invented by a single (or very few) person(s), it was a work of art. But it was state of the art nearly two decades ago, and we've come a lonnnnnnng way.
When http and W3C were hatching, there were still battles about ARCNet, Token Ring, Ethernet, and something called ATM. Now most of the world uses Ethernet and Ethernet-like communications using TCP/IP-- which back then, was barely running across the aforementioned networking protocols.
Lawn mowers, by contrast, were a 2-stroke, then 4-stroke engine with a blade and housing. The need, whacking grass, hasn't changed. By contrast, we now make browsers do all sorts of things never invisioned in the early 1990's. And we're planning stuff not really imagined in 2000. In 2020, browsers may be gone, or they may be *completely* different tools than they are now. Lawnmowers will still only whack grass.
---- Teach Peace. It's Cheaper Than War.
It would appear that you do not know what a manual lawnmower is.
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
The only reason the implementations in browsers suck is because HTTP is such a hack-job of a protocol (it wasn't originally, but then it was not originally designed to do what it does today). The browsers are left dealing with issues which the HTTP "specification" (which isn't even fully documented, btw) either completely ignores or recommends practices that are completely unrealistic.
One example from the article: the HTTP spec recommends a minimum of 80kb for request headers (20 cookies per user, 4kb per cookie). However, most web servers limit request headers to 8kb (Apache) or 16kb (IIS) in order to prevent denial of service attacks. It is very important that they limit the headers - not doing so leaves them wide open to attack. The HTTP recommendations are completely unreasonable in this regard and fly in the face of good security practice. They are also completely ignored in this and many other cases, because they are so unreasonable.
If the protocol were simple, clear, well designed, and well defined then the browser implementations wouldn't have to suck. It's HTTP that has caused this problem, not the other way around.
It was a very limited protocol that became way too popular, and now we're stuck with a bunch of hacks to get it to work with modern web technology.
Security is mostly a superstition... Avoiding danger is no safer in the long run than outright exposure. - Helen Keller
So in other words, you never bring anything into production status.
Look, its really quite simple.
HTTP was a presentation mechanism, designed to deliver content, dependent on non persistent connections, where each initial and each subsequent request had to supply all information necessary to fulfill said request. Even if you "log in" to your account, every request stands alone.
There is no persistent connection. There is no reliable persistent knowledge on the server side that can be positivity attributed to any given client. Clients are like motorists at a drive up window of a Burger stand, not well known patrons at a restaurant.
Given that scenario, it was inevitable that cookies would be developed, and employed.
So unless you were willing to hold off deployment of e-commerce until you totally rewrote HTTP into a persistent connection based protocol, totally replaced the browser as the client side tool, any grandstanding on how carefully and methodically you work is just grandiose bravado.
The only tool at hand was http and web servers and browsers. Its still largely the same today. There was no other way besides cookies of some sort. You may argue about their structure, their content or what ever, but cookies are all that is on the menu.
Sig Battery depleted. Reverting to safe mode.
Thank you, Captain Hindsight! What a complete failure the designers of HTTP were. They should've done it so much different! :-)
It would help if you qualified or explained a single one of these blanket assertions you've made.
What data loss is caused by MySQL? And while perhaps a NoSQL database "du jour" causes data loss, are you suggesting that the major ones like Couch, Cassandra, Mongo, etc all have serious data loss issues?
If so, specifics or it didn't happen. File a bug report, at the very least.
I don't have much good to say about PHP, but didn't someone recently roll out a compiler for it? I can't imagine PHP performance is a significant bottleneck, especially as people run successful websites written in everything from Java to Ruby. And what would you suggest in its place, C++? Gee, thanks, now we can spend all our time focusing on memory leaks and buffer overflows instead.
It's possible it's the wrong language for the job, but if you want to make that case, you've got to suggest an alternative.
Similarly, for JavaScript -- say what? Chrome compiles JavaScript to native code, and Firefox just got faster than Chrome. Both of them are now more than competitive with languages typically used for server-side development, where you'd expect performance to be a much bigger bottleneck. Indeed, there's at least one modern server-side JavaScript framework, written for V8, Chrome's JavaScript engine.
And again, is a potential alternative actually better for a given problem? Again, specific examples. There are applications which actually have performance needs which suggest they should be native apps, and people generally don't try those as web apps. Then there's a very, very thin border where a web app makes sense on the Web, but would be faster native -- but often, it's the design that's shite, not the technologies themselves.
If you ignore IE, browser compatibilities aren't so bad. Even if you include IE, are they significantly worse than OS incompatibilities if you decided to go native?
Finally, MVC. Exactly how is this "bastardized"? How would you do it differently, if you were writing a web framework? At least that's a specific example -- but you mentioned "software development and programming theories," plural, and you've only mentioned one.
It's possible you've got some good points, but you haven't backed them up at all.
Don't thank God, thank a doctor!
SNMP is in serious need of retirement! Even XML is better (and that's saying a LOT!) The constraints that made it seem like a good idea simply don't exist anymore anywhere.
See also BEEP and the syslog over BEEP (that has never, to my knowledge EVER been supported by anything) A protocol that didn't realize that we already HAVE multiplexing built in to the communications channel, and so re-implemented it in the most baroque way possible.
ActiveX rises to new heights of bogosity. It's not just poor implementation or even poor design. The very concept of letting random websites execute arbitrary code outside of a sandbox is brain dead.
Ah, the OSI model [sic, recte suite], [...] having never been implemented!
Saying that the full OSI suite has never been implemented is like saying that nobody implements the full set of standard track RFCs -- which is true, since some standard track RFCs are mis-designed or even contradict other standard-track RFCs.
Large parts of the OSI suite have been implemented, and some are still running today. For example, IS-IS over CLNP is commonly used for routing IP and IPv6 traffic on operators' backbones. (I was about to mention LDAP and X.509 before I realised they are not necessarily the best-designed parts of OSI.)
Where you are right, though, is that large parts of OSI are morasses of complexity that have only been implemented due to government mandate and have since been rightly abandoned.
Let's see... we start with electrical signals, which are stateful... then we layer IP on top, to make it stateless... then we layer TCP on top of that, to make it stateful again... then we layer HTTP on top of that, to make it stateless again... then we layer cookies on top of that, to make it stateful again...
...and then we wonder why it performs like shit and is flaky as all hell!
I can't imagine what the problem might be... maybe we need a few more layers to make it perfect!
Yes, and actually I agree with jabberw0k. There's simply no call for that kind of language; it added nothing to the points being made, and in fact distracted the poster from what had been a reasonably cogent argument up until that point.
If you reread the AC post, he/she makes several good points with some substance in the first four paragraphs - and then just lets rip with the profanity in the fifth paragraph, which, coincidentally, is where the entire post dissolves into a bunch of assertions with little to no rationale provided.
"Javascript is horrible." Oh, okay, then - why? "PHP is just as dreadful." Really, you don't say? Justify this assertion, please. "Every web developer who doesn't fit my narrow criteria is automatically rubbish." Glad you are still giving us some cogent points, then.
For what it's worth, I actually agree that "working" is different from "working well". One of my day jobs is as a member sitting on an interoperability panel at the moment, and you very quickly realize that something can meet the base level of "it does what it says" and fail miserably to be compatible and interoperable with other products.
But I don't need to descend to toilet language to explain this.
By the time you add real garbage collection to C++, you're rapidly approaching a point where you may as well use Java. Anything short of that, like auto_ptr, is just a band-aid -- you still have plenty of ways to leak memory, and plenty of potential for buffer overflows. Contrast this to a sane, modern language, where these problems cannot exist.
Again, what would you suggest? If you're going to continue dismissing things I propose as crap without offering anything useful in its place, it's really not worth talking to you. If C++ is actually what you're suggesting, say so, and defend it.
Don't thank God, thank a doctor!