How Not To Design a Protocol
An anonymous reader writes "Google security researcher Michal Zalewski posted a cautionary tale for software engineers: an amusing historical overview of all the security problems with HTTP cookies, including an impressive collection of issues we won't be able to fix. Pretty amazing that modern web commerce uses a mechanism so hacky that it does not even have a proper specification."
Are slashdot accounts with auto-login also vulnerable?
RTFA. That's exactly what happened with HTTP. "It works". In the world of 1990. And then they started to "fix" it to keep up.
HTTP is like a manual lawn mower. It's not flawless, pretty, blazingly fast, or elegant, but it's usable enough to do the job, and you get used to the quirks.
... cookies are delicious!
Check out my novel.
Darn...and here I thought this was going to be an article on the OSI Network model...
http://en.wikipedia.org/wiki/OSI_model
If the only way you can accept an assertion is by faith, then you are conceding that it can't be taken on its own merits
The whole cookie system should be replaced by a system based on public key cryptography. Replace domain scope by associating sessions with the public keys of the client and the server. Authenticate each chunk of exchanged data by signing a hash value. Browsers could offer throwaway key pairs for temporary sessions and persistent key pairs for preferences and permanent logins.
I still think allowing cookies to span more than one distinct domain was a mistake. If we had avoided that in the beginning, cookie scope implementations would be dead simple and not much functionality would be lost on the server side. Also, JavaScript cookie manipulation is something we could easily lose for the benefit of every user, web developer and server admin. I postulate there are very few legitimate uses for document.cookie
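To make the scoping point concrete, here is a minimal sketch contrasting host-only matching with the suffix-style matching browsers apply when a cookie carries a Domain attribute. The function names are hypothetical, and real browsers also consult public-suffix lists and path rules, which this ignores.

```python
def host_only_match(request_host: str, cookie_host: str) -> bool:
    """The cookie goes back only to the exact host that set it."""
    return request_host.lower() == cookie_host.lower()


def domain_match(request_host: str, cookie_domain: str) -> bool:
    """The cookie goes to the named domain and to any subdomain of it."""
    host = request_host.lower()
    domain = cookie_domain.lower().lstrip(".")
    # Require a dot boundary so "badexample.com" does not match "example.com".
    return host == domain or host.endswith("." + domain)


if __name__ == "__main__":
    print(host_only_match("mail.example.com", "example.com"))
    print(domain_match("mail.example.com", ".example.com"))
    print(domain_match("badexample.com", "example.com"))
```

The host-only rule is a single comparison, which is roughly the "dead simple" scope the comment is asking for. Everything beyond it is where the implementation differences creep in.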
When I'm designing a solution, I don't ask if it works, I ask if it works well. Is it secure? Is it scalable? What are the risks associated with it? Is it full of kludges that make bad implementations easy? What do I do if a user decides she doesn't trust that functionality and turns it off? And the point of the article wasn't to say that people shouldn't use cookies when developing web sites or applications. Rather, it's an examination of how a sub-optimal solution came to be so that perhaps other people can avoid similar pitfalls in the future.
I think it can be hard to plan for this far into the future. Look how much the web has changed, and the things we do now with even just HTML and CSS that people back in the beginning probably would never have even considered doing. You build something for your needs and if it works then you are good. Sometimes you don't want to spend time planning it out for the next 5, 10, 20 years because you assume (usually correctly) that what you are writing will be updated long before then and replaced with something else.
I will shred my adversaries. Pull their eyes out just enough to turn them towards their mewing, mutilated faces. Illyria
I wonder how many code snippets of yours have appeared on The Daily WTF. Just because something works doesn't mean it's good.
I knew a pilot who flew with duct tape holding down the fuel cap on his wing. That worked too, but it's hardly ideal is it?
Here in Australia a few years back, a major power substation was "working" only because someone rigged up a hose to constantly drip water on an overheating thingomajig. Sure it works and props to the hardhack, but it's a piece of shit that can easily stop working.
You see, some of us prefer things not to be a piece of shit.
On a domain.
Like the crossdomain.xml or robots.txt files. "Cookies on this site must follow this pattern." Or somesuch.
Most of the rest, I can cope with. Cookie pollution from various forms of injection, not so much.
TFA makes it clear that it is impossible to repair the current cookie system: it is really badly broken, and several previous attempts have failed.
Could we therefore design a complete new replacement system, to be implemented in parallel, and added as part of the HTML5 standard? If it were well specified, so that all implementations were consistent, and had all the features that TFA shows are needed, it should be both easy to use and have serious benefits for the site designer as well as the user. In which case, designers might be inclined to do
if then else
The important thing is that it must be easy to use the replacement (e.g. no inter-browser weirdness) and the designer must get some payoff in terms of a better site. Of course, the user will also get a payoff - probably bigger - in terms of better security amongst other things. But, realistically, it is the designer's convenience which will win the day. Once you get the big four (or so) browsers implementing the same standard, and designers regarding that as a preferred option, it has a chance of taking over.
Who can design such a system? Assuming a perfect "supercookie" system is designed, how do we get it into the standard? And what is the game-changing power feature that will bribe site designers to use the supercookie?
Consciousness is an illusion caused by an excess of self consciousness.
It is this type of thinking that separates a carpenter from an engineer.
The road to tyranny has always been paved with claims of necessity.
"Working" is measured over a very wide spectrum. On one hand, we have "broken", and on the other we have "working perfectly". The web is far, far closer to the "broken" side of the spectrum than it ever has been to the "working perfectly" side.
Put simply, almost everything about the web is one filthy hack upon another. It's a huge stack of shitty "extensions" that were often made with little thought, so it's no wonder web development is so horrible today.
HTTP has been repurposed far more than it should have been. Its lack of statefulness has resulted in horrible hacks like cookies and AJAX. HTTP makes caching far harder than it should be. SSL and TLS are mighty awful hacks. And those are just a few of its problems!
HTML is a mess, and HTML5 is just going to make the situation worse. Even after 20 years, layout is still a huge hassle. CSS tries to bring in concepts from the publishing world, but they're not at all what we need for web layout, and thus everyone is unhappy.
A lot of people will claim otherwise, and they're wrong, but JavaScript is a fucking horrible scripting language. It's even worse for writing anything significant. And no, it's absolutely nothing like Scheme (some JavaScript advocate always makes this stupid claim whenever the topic of JavaScript's horrid nature comes up).
PHP is one of the few popular languages that can rival JavaScript in terms of being absolutely shitty. Then there are other server-side shenanigans like the NoSQL movement, which arose solely because there are a lot of web "developers" who don't know how to use relational databases properly. I've seriously dealt with such "developers" and many of them didn't even know what indexes are!
Most web browsers themselves are quite shitty. It has gotten better recently, but they still use huge amounts of RAM for the relatively simple services they provide.
The only people involved with some sort of web-related software development who aren't absolute fuck-ups are those working on HTTP servers like Apache HTTPd, nginx, and lighttpd. But now we're seeing crap like Mongrel and Mongrel2 arising in this area, so maybe it's only a matter of time before the sensible developers here move on.
So just because the web is "sort of broken", rather than "completely fucking broken", it doesn't mean that it's "working".
Why go hatin' on this particular protocol?
Most of them are just nuckin futs:
* FTP: needs two connections. Commands and responses and data are not synced in any way. No way to get a reliable list of files. No standard file listing format. No way to tell what files need ASCII and which need BIN mode. And probably more fubarskis.
* Telnet: The original handshake protocol is basically foobar-- the handshakes can go on forever. Several RFC patches did not help much. Basically the clients have to kinda cut off negotiations at some point and just guess what the other end can and will do.
* SMTP: You can't send a line with the word "From" as the first word? I'm not a typewriter? WTF?
Web developers would have the time to think about important things like that, if they weren't spending all of their time trying to prevent data loss caused by MySQL or the NoSQL database du jour, horrible server-side performance due to PHP, horrible client-side performance due to JavaScript, all while trying to avoid the numerous browser incompatibilities.
Although the tools and technologies they're using are complete shit, it sure doesn't help that they generally don't understand even basic software development and programming theories very well. See their bastardization of the MVC pattern, for instance.
Let's see:
1. IP is a stateless protocol, that's inconvenient for some things, so
2. We build TCP on it to make it stateful and bidirectional.
3. On top of TCP, we build HTTP, which is stateless and unidirectional.
4. But whoops, that's inconvenient. We graft state back into it with cookies. Still unidirectional though.
5. The unidirectional part sucks, so various hacks are added to make it sorta bidirectional like autorefresh, culminating with AJAX.
Who knows what else we'll end up adding to this pile.
A pretty interesting write up :)
-- Programming with boost is like building a house with lego. It's cool, but I wouldn't want to live in it
A session is forever
i love your design
And let's replace IPv4 while we're at it!
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
Most of the crap we surround ourselves with (cookies, MIME, Windows and Office, etc.) are still there because they are there and the alternatives aren't.
What is the alternative to using cookies, really? Almost every framework for web-based development has session support that largely relies on cookies. Give me something more secure that works as easily and I will be using it right away.
Assorted stuff I do sometimes: Lemuria.org
1. Sure
2. stateful, stream-oriented, *and* reliable
3. HTTP was designed around a stateless datagram model but wanted reliability, so TCP was chosen for lack of a better option. SCTP, had it existed, might have been a better fit, but at the time the stateful stream aspect of TCP was forgiven since it could largely be ignored, whereas reliability over UDP was not so trivial.
4. More critically, the cookie mechanism strives to add stateful aspects that cross connections. This is something infeasible with TCP. Simplest example, HTTP 'state' strives to survive events like client IP changes, server failover, client sleeping for a few hours, or just generally allowing the client to disconnect and reduce server load. TCP state can survive none of those.
5. Indeed, at least AJAX enables somewhat sane masking of this, but the only-one-request-per-response character of the protocol means a lot of things cannot be done efficiently. If HTTP had allowed arbitrary server-side HTTP responses for the duration of a persistent http connection, that would have greatly alleviated the inefficiencies that AJAX methods strive to mask.
XML is like violence. If it doesn't solve the problem, use more.
Your way of thinking is nice, but it is exactly this attitude that gets developers fired (or their bosses broke if they share that attitude and don't fire you, in which case an inferior insecure competing product will dominate) for thinking too much instead of getting the product out. That's why we are up to the neck in inferior goods, protocols being just one example. Not even the death penalty (e.g. for melamine in Chinese milk) seems to stop this.
Why can't we just start over with an entirely new web standard that would be designed in a more efficient manner?
Yes, why don't we? The layered nature of the protocol stack is meant to allow for multiple versions and revisions of various and sundry functionality and interaction between layers. All the bright outspoken /.'ers here can go off and build some newer, better layers, or even a whole new stack!
No more cookies needed, huzzah!
It didn't make mistakes that closely resemble those in Telnet, tftp, ftp, smtp, it made what may be considered completely distinct 'mistakes' in retrospect.
However, if you confine the scope of HTTP use to what it was intended, it holds up pretty well. It was intended to serve up material that would ultimately manifest on an endpoint as a static document. Considerations for some server-side programmatic content tweaking based on client given cues was baked in to give better coordination between client and server and some other flexibility, but it was not intended to be the engine behind highly interactive applications 'rendered' by the server. HTTP was founded at a time when the internet at large wasn't particularly shy about developing new protocols running over TCP or UDP and I'm sure the architects of HTTP would've presumed such a usage model would have induced a new protocol rather than a mutation of HTTP over time.
Part of the whole 'REST' philosophy is to get back to the vision that HTTP targets. Strictly speaking, a RESTful implementation is supposed to eschew cookies and server maintained user sessions entirely. Every currently applicable embodiment of data is supposed to have its own *U*RL and authentication when required is HTTP auth. Thanks to JavaScript, a web application can still avoid popping up the inadequate browser-provided login dialog, and can assemble disparate data on the client side rather than the server side. It doesn't work everywhere, and often even when it does it's kinda mind warping to get used to, but it does try to use HTTP more in the manner it was architected to be used.
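As a rough illustration of that cookie-free style, the sketch below builds a request that carries its own credentials in an HTTP Basic `Authorization` header, so the server needs no session to interpret it. The URL, user name, password, and helper name are all made up for the example, and Basic auth like this is only reasonable over TLS.

```python
import base64
from urllib.request import Request


def basic_auth_request(url: str, user: str, password: str) -> Request:
    """Build a self-contained request: credentials travel with every call."""
    raw = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return Request(url, headers={"Authorization": f"Basic {token}"})


# Hypothetical resource with its own URL; nothing is sent yet, the request
# object is only constructed.
req = basic_auth_request("https://example.com/orders/42", "alice", "s3cret")
```

The request is complete on its own: no cookie, no server-side session lookup, just a URL plus credentials.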
Oh for fucks sake, stop being a fucking puritan, you fucktard!
Huh? The article is talking about HTTP, not HTML. Those two are not related in any way, Flash is also sent via HTTP.
SNMP is a nightmare. There was a doc out there that used SNMP as an exemplar of "how not to write a protocol."
It's easy to forget, but these protocols were designed back in the day when there wasn't a lot of ram, bandwidth, or CPU.
Most of the problems with everything have been well-discussed. You can dig into the past to see, but interoperability with existing implementations is always the blocking factor.
Heck, everyone knew the problems with ActiveX when it was announced...but that didn't stop MS. Same with cookies. If you want to see excitement, you can mine all the old protocol-level vulnerabilities just by plowing through usenet archives.
That's the "if it isn't broke, don't fix it" mindset for you. Nothing ever improves.
Filthy, filthy copyrapists!
Let's not confuse html with http. This is already messy territory as it is
So in other words, you never bring anything into production status.
Look, it's really quite simple.
HTTP was a presentation mechanism, designed to deliver content, dependent on non persistent connections, where each initial and each subsequent request had to supply all information necessary to fulfill said request. Even if you "log in" to your account, every request stands alone.
There is no persistent connection. There is no reliable persistent knowledge on the server side that can be positively attributed to any given client. Clients are like motorists at the drive-up window of a burger stand, not well known patrons at a restaurant.
Given that scenario, it was inevitable that cookies would be developed, and employed.
So unless you were willing to hold off deployment of e-commerce until you totally rewrote HTTP into a persistent connection based protocol, totally replaced the browser as the client side tool, any grandstanding on how carefully and methodically you work is just grandiose bravado.
The only tools at hand were HTTP, web servers, and browsers. It's still largely the same today. There was no other way besides cookies of some sort. You may argue about their structure, their content, or whatever, but cookies are all that is on the menu.
Sig Battery depleted. Reverting to safe mode.
We're talking about HTTP, not HTML. Just because they are often used together doesn't mean they are the same thing. In fact, they couldn't be more different; one is a communications protocol, the other is a markup language - I hope to god you can figure out which is which from that much.
But HTML is a terrible mess of kludges that doesn't work very well, too. It's just that most people on Slashdot consider it to be superior to Flash, even though it lacks a lot of Flash's basic functionality, and lacks all of the nice development tools that Flash has. Most of this stems from security paranoia (legitimate, but overblown in 99% of cases) and its tendency to crash (more significant issue, IMO, and also legitimate - also the cause of much of the security paranoia).
Security is mostly a superstition... Avoiding danger is no safer in the long run than outright exposure. - Helen Keller
Thank you, Captain Hindsight! What a complete failure the designers of HTTP were. They should've done it so much different! :-)
He basically said "everyone apart from me sucks".
which is totally what she said
It would help if you qualified or explained a single one of these blanket assertions you've made.
What data loss is caused by MySQL? And while perhaps a NoSQL database "du jour" causes data loss, are you suggesting that the major ones like Couch, Cassandra, Mongo, etc all have serious data loss issues?
If so, specifics or it didn't happen. File a bug report, at the very least.
I don't have much good to say about PHP, but didn't someone recently roll out a compiler for it? I can't imagine PHP performance is a significant bottleneck, especially as people run successful websites written in everything from Java to Ruby. And what would you suggest in its place, C++? Gee, thanks, now we can spend all our time focusing on memory leaks and buffer overflows instead.
It's possible it's the wrong language for the job, but if you want to make that case, you've got to suggest an alternative.
Similarly, for JavaScript -- say what? Chrome compiles JavaScript to native code, and Firefox just got faster than Chrome. Both of them are now more than competitive with languages typically used for server-side development, where you'd expect performance to be a much bigger bottleneck. Indeed, there's at least one modern server-side JavaScript framework, written for V8, Chrome's JavaScript engine.
And again, is a potential alternative actually better for a given problem? Again, specific examples. There are applications which actually have performance needs which suggest they should be native apps, and people generally don't try those as web apps. Then there's a very, very thin border where a web app makes sense on the Web, but would be faster native -- but often, it's the design that's shite, not the technologies themselves.
If you ignore IE, browser compatibilities aren't so bad. Even if you include IE, are they significantly worse than OS incompatibilities if you decided to go native?
Finally, MVC. Exactly how is this "bastardized"? How would you do it differently, if you were writing a web framework? At least that's a specific example -- but you mentioned "software development and programming theories," plural, and you've only mentioned one.
It's possible you've got some good points, but you haven't backed them up at all.
Don't thank God, thank a doctor!
Yes, Facebook runs PHP compiled to C++ using HipHop.
I think he's talking about the RoR model, where the view is essentially a template. That annoyed me too, but the framework I used is flexible enough to allow me to use Views as proper objects, which then use Templates.
Dilbert RSS feed
Actually I'd say it is the difference between those of us with a deadline and those without. It is easy in hindsight to say "I would do thus" but most of the time we simply aren't given the time we need to give a job the attention we'd like. Have I done a seriously half assed job in the past? Yes I have. Did I actually WANT to do a half assed job? No I didn't but was told flat out "the job WILL be done by X", not Y, not even X+1 but X OR ELSE. It didn't matter that they would be getting a much lower quality job, they chose half assed by X over decent quality by Y, so that is what they got.
ACs don't waste your time replying, your posts are never seen by me.
What does that tell you about how bad Flash is that HTML5 is such a massive improvement over it?
I knew a pilot who flew with duct tape holding down the fuel cap on his wing. That worked too, but it's hardly ideal is it?
I don't know about Australia, but in the US, that's simply illegal (presuming that running with the loss of a fuel cap is a "safety" item and it's on a certified aircraft flown by a certified pilot). When you have to break the law to do stupid, it's a special breed of stupid. And a pilot should know such things...
Learn to love Alaska
Let's see... we start with electrical signals, which are stateful... then we layer IP on top, to make it stateless... then we layer TCP on top of that, to make it stateful again... then we layer HTTP on top of that, to make it stateless again... then we layer cookies on top of that, to make it stateful again...
...and then we wonder why it performs like shit and is flaky as all hell!
I can't imagine what the problem might be... maybe we need a few more layers to make it perfect!
Yes, and actually I agree with jabberw0k. There's simply no call for that kind of language; it added nothing to the points being made, and in fact distracted the poster from what had been a reasonably cogent argument up until that point.
If you reread the AC post, he/she makes several good points with some substance in the first four paragraphs - and then just lets rip with the profanity in the fifth paragraph, which, coincidentally, is where the entire post dissolves into a bunch of assertions with little to no rationale provided.
"Javascript is horrible." Oh, okay, then - why? "PHP is just as dreadful." Really, you don't say? Justify this assertion, please. "Every web developer who doesn't fit my narrow criteria is automatically rubbish." Glad you are still giving us some cogent points, then.
For what it's worth, I actually agree that "working" is different from "working well". One of my day jobs is as a member sitting on an interoperability panel at the moment, and you very quickly realize that something can meet the base level of "it does what it says" and fail miserably to be compatible and interoperable with other products.
But I don't need to descend to toilet language to explain this.
... than "I don't like it."
HTTP has been repurposed far more than it should have been. Its lack of statefulness has resulted in horrible hacks like cookies and AJAX
AJAX? I can understand the cookie criticism, which TFA did a pretty good overview of, but AJAX's place is pretty much orthogonal to the issue of state. People resort to hacks *with* AJAX because browsers don't have a protocol with sessions, but even if we did, AJAX-like APIs and idioms would exist and continue to be used.
layout is still a huge hassle. CSS tries to bring in concepts from the publishing world, but they're not at all what we need for web layout
Layout -- even cross-platform layout -- is actually pretty easy if you use a subset of CSS positioning for the problems it's good at and tables for cases where it isn't.
A lot of people will claim otherwise, and they're wrong,
I predict a lot of the people who claim otherwise will do something you manage to neglect in their comment: provide justification for their statements. Perhaps you can try that your second time around instead of merely pounding your fist on the table about your personal opinion.
but JavaScript is a fucking horrible scripting language. It's even worse for writing anything significant.
Worse than what? How?
And no, it's absolutely nothing like Scheme (some JavaScript advocate always makes this stupid claim whenever the topic of JavaScript's horrid nature comes up).
It's enough like Scheme on at least two important fronts (functions as first-class values, scoping rules) that it's false to say it's "nothing" like Scheme, and the idioms that grow up around those common parts of the language are important enough to using it that it's a reasonable comparison, even with all the syntactic weight that JavaScript has and the missing features like macros and tail-call optimization.
the NoSQL movement, which arose solely because there are a lot of web "developers" who don't know how to use relational databases properly. I've seriously dealt with such "developers" and many of them didn't even know what indexes are!
A lack of programmer familiarity with the setup and querying of RDBMSs is a problem, and yes, set up properly, they can be pretty darn effective for a lot of situations some devs are using NoSQL solutions for, but saying the latter are there "solely" for this reason is just as ignorant.
Tweet, tweet.
If a line starting with "From " is changed to start with ">From ", then one must also change ">From " to ">>From " and so on. Without this, mail gets mangled.
When reading mail, that transform must be undone. Note that even in cases where mail was stored without ">From " being changed to ">>From " it is likely less destructive to do unescaping than not. This is because humans seldom send email containing lines that start with ">From " but frequently send emails with lines starting with "From ".
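The escaping rule described above is small enough to sketch. This is a minimal version, assuming the message body is a plain string with `\n` line endings; the function names are made up for illustration. The reversible scheme is sometimes called "mboxrd" quoting.

```python
import re

# A line needs escaping if it is "From " preceded by zero or more ">".
_NEEDS_ESCAPE = re.compile(r"^(>*From )", re.MULTILINE)
# A stored line can be unescaped if it has at least one ">" before "From ".
_IS_ESCAPED = re.compile(r"^>(>*From )", re.MULTILINE)


def escape_from_lines(body: str) -> str:
    """Prefix one '>' to every line matching ^>*From (done when storing)."""
    return _NEEDS_ESCAPE.sub(r">\1", body)


def unescape_from_lines(stored: str) -> str:
    """Strip exactly one '>' from every line matching ^>+From (done when reading)."""
    return _IS_ESCAPED.sub(r"\1", stored)


if __name__ == "__main__":
    original = "From here on it gets odd\n>From a quoted reply\nplain line\n"
    stored = escape_from_lines(original)
    print(stored)
    print(unescape_from_lines(stored) == original)
```

Because both `From ` and `>From ` gain one level on the way in and lose one on the way out, the round trip is lossless. Escaping only the bare `From ` lines is what causes the mangling.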
Yes, Facebook runs PHP compiled to C++ using HipHop
And that's pretty much my point exactly. Avoids the problems of C++, mostly, and all their existing PHP code gets faster just by tuning the language.
In the meantime, I don't see why I should adopt an ugly, dangerous language to solve a performance problem which, frankly, I just don't see. A properly designed app should be able to scale, which means you can throw hardware at the problem. When you're big enough that this isn't feasible, you're probably big enough that you can afford to build something like HipHop.
I think he's talking about the RoR model, where the view is essentially a template. That annoyed me too, but the framework I used is flexible enough to allow me to use Views as proper objects, which then use Templates.
There's Erector, which allows views to be code which ultimately generates HTML -- similar to a template, but not identical.
But I have to ask: In a Web context, what else would make sense as a view, particularly if you're deliberately doing fat models?
By the time you add real garbage collection to C++, you're rapidly approaching a point where you may as well use Java. Anything short of that, like auto_ptr, is just a band-aid -- you still have plenty of ways to leak memory, and plenty of potential for buffer overflows. Contrast this to a sane, modern language, where these problems cannot exist.
Again, what would you suggest? If you're going to continue dismissing things I propose as crap without offering anything useful in its place, it's really not worth talking to you. If C++ is actually what you're suggesting, say so, and defend it.
Sure, the spec itself is retarded, but cookies have been around long enough that we, the developers, have learned their quirks and know how to avoid them. For starters, no sane coder would actually stuff several cookies full of 4096-byte data chunks. They are mostly used for storing a relatively small session ID, with the big data blobs stored server-side, where they are actually used anyway.
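For anyone who hasn't seen the pattern spelled out, here is a minimal sketch of "small session ID in the cookie, big data on the server". The in-memory `SESSIONS` dict, the cookie name `sid`, and both helper names are assumptions for illustration; a real deployment would want expiry, the `Secure` flag, and a proper store.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Optional

# Hypothetical in-memory store: session ID -> arbitrary server-side data.
SESSIONS = {}


def start_session(data: dict) -> str:
    """Keep the data server-side and return a Set-Cookie header value."""
    sid = secrets.token_urlsafe(32)  # unguessable, and far below the 4 KB limit
    SESSIONS[sid] = data
    cookie = SimpleCookie()
    cookie["sid"] = sid
    cookie["sid"]["path"] = "/"
    cookie["sid"]["httponly"] = True  # keep it out of reach of document.cookie
    return cookie["sid"].OutputString()


def load_session(cookie_header: str) -> Optional[dict]:
    """Look up the server-side data for the ID found in a Cookie header."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("sid")
    return SESSIONS.get(morsel.value) if morsel else None
```

The browser only ever holds the opaque ID, so the cart contents, preferences, or whatever else is in the blob never cross the wire in a cookie.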
The cross-domain issue is indeed annoying for sites that do mass vhosts like "username.somedomain.com". I frankly have never used cross-domain cookies, when it is easier at both ends to pass the ID in a URL. I'm not saying they should completely disable this feature, but maybe turn it into an opt-in kind of thing, to be decided by the user. I consider it far more secure for such sites to use cross-domain JS includes (pull), rather than someone else's cookies (push).
-Billco, Fnarg.com
I believe the parent was referring to smart pointers and RAII which lets you select when the data is considered garbage and when it should be collected. Languages that use GC can also leak memory if you're not careful. I remember a Java program I was working on where the data was loaded in a map for quick lookup. Whenever the operator would load a new file, the map wasn't set to null and Weak pointers were not used so it leaked.
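A small sketch of that kind of leak, with Python's `weakref` module standing in for Java's weak references. The `Record` class and `demo` function are hypothetical: one cache holds a strong reference and keeps its entry alive forever, the other holds a weak reference and lets the entry go once nothing else uses it.

```python
import gc
import weakref


class Record:
    """Stand-in for the data loaded from a file."""

    def __init__(self, name):
        self.name = name


def demo():
    strong_cache = {}                          # ordinary map: pins its values
    weak_cache = weakref.WeakValueDictionary() # drops values nobody else holds

    pinned = Record("old-file")
    released = Record("other-old-file")
    strong_cache["a"] = pinned
    weak_cache["b"] = released

    # The program is done with both records...
    del pinned, released
    gc.collect()

    # ...but only the weakly held one can actually be reclaimed.
    return len(strong_cache), len(weak_cache)


if __name__ == "__main__":
    print(demo())
```

The garbage collector is doing its job in both cases. The strong map is still reachable, so everything in it is too, which is exactly the "leak" described.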
It's true that GC'd languages can potentially leak memory, but the possibilities are small and almost require you to deliberately subvert what the garbage collector otherwise does for you.
By contrast, it's trivially easy to leak memory in a non-garbage-collected language, and again, "smart pointers" (just refcounting, right?) are still more likely to leak memory, and potentially add even more overhead than real GC.
So, may as well just use GC, and if you're doing that, may as well just use something like Java. (Though not, I'd hope, Java itself.)
I don't have much good to say about PHP, but didn't someone recently roll out a compiler for it? I can't imagine PHP performance is a significant bottleneck, especially as people run successful websites written in everything from Java to Ruby.
The main performance issue with PHP never was its interpreter, which was always reasonably fast. The issue that a lot of people have with it is that if you use one of its most important features, i.e. its automatic session management, it uses a locking system that basically means only one request per client can be processed at a time. You can work around it, but you have to know what you're doing, and many people are unaware that they need to. Hence, if you have content like images being served by PHP, unless its author understood the language much better than the average PHP developer does, the result will be unnecessarily slow.
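The effect is easy to reproduce outside PHP. The sketch below is a Python simulation, not PHP's actual implementation: each session gets one lock, a request holds it for its whole lifetime by default, and the `close_early` flag models the workaround of releasing the session once you are done with it (in PHP that is what `session_write_close()` is for). All names here are hypothetical.

```python
import threading
import time

_locks = {}
_locks_guard = threading.Lock()
events = []
_events_guard = threading.Lock()


def lock_for(session_id):
    """Return the single lock that belongs to this session."""
    with _locks_guard:
        return _locks.setdefault(session_id, threading.Lock())


def log(message):
    with _events_guard:
        events.append(message)


def handle_request(session_id, name, close_early=False):
    lock = lock_for(session_id)
    lock.acquire()
    held = True
    try:
        log(f"{name} start")
        if close_early:
            # Done with session data: let the client's other requests proceed.
            lock.release()
            held = False
        time.sleep(0.01)  # stand-in for slow work, e.g. streaming an image
        log(f"{name} end")
    finally:
        if held:
            lock.release()


def run_pair(close_early=False):
    """Fire two requests for the same session and return the event log."""
    events.clear()
    threads = [
        threading.Thread(target=handle_request, args=("s1", name, close_early))
        for name in ("A", "B")
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return list(events)
```

With the default, `run_pair()` always logs one request's start and end before the other even begins, which is the serialization being complained about. With `close_early=True` the slow parts are free to overlap.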
If you ignore IE, browser compatibilities aren't so bad. Even if you include IE, are they significantly worse than OS incompatibilities if you decided to go native?
You can't ignore IE. Like it or loathe it (the former only happens if you aren't actually a web developer), IE still has a significant market share. Not supporting it on any site with a commercial goal is practically suicide.
And, yes, IE's compatibility issues are significantly worse than native development issues. I can pick a framework to develop with, say Java+SWT, and have the results work on every common target platform with almost no platform-specific work required. If I target web browsers, I have never been able to produce a non-trivial application without spending significant time debugging IE-specific issues (e.g. browser crashing on unloading plugins in hidden divs, an issue which struck my last major web project and delayed it by about two days as I figured out a workaround).
By contrast, it's trivially easy to leak memory in a non-garbage-collected language, and again, "smart pointers" (just refcounting, right?) are still more likely to leak memory, and potentially add even more overhead than real GC.
There are ref counting smart pointers but there's also weak pointers and unique pointers. For the majority of stuff you just want to ensure that the resource is released when the resource's owner goes out of scope. It's not that complicated.
There are also plenty of GC libraries for C++ so it's possible to select which objects are GC candidates and which are not - best of both worlds.
So, may as well just use GC, and if you're doing that, may as well just use something like Java. (Though not, I'd hope, Java itself.)
GCs have other issues aside from efficiency. They make it much harder to have real-time guarantees. They make it harder to free up resources in a deterministic manner, although C#'s using statement makes this much easier. Also, a good number of Java and C# programmers probably don't even know about weak pointers, so I'm pretty sure memory leaks exist in most non-trivial programs in GC languages too.
Sometimes I prefer Java or C#. Sometimes I prefer C++. I just don't think the memory leak issues in C++ are as bad as many people try to make them out to be.
BTW, I'd like to make it clear that I'm just stating a personal opinion. I'm not an expert on the subject. I'm just some guy with an opinion.
You can't ignore IE. Like it or loathe it (the former only happens if you aren't actually a web developer), IE still has a significant market share. Not supporting it on any site with a commercial goal is practically suicide.
Supporting old versions in a limited capacity, with a suggestion to upgrade your browser, doesn't seem to be hurting YouTube any.
And, yes, IE's compatibility issues are significantly worse than native development issues. I can pick a framework to develop with, say Java+SWT, and have the results work on every common target platform with almost no platform-specific work required.
But you're incapable of picking a framework to develop with, say jQuery or IE9.js, and having the results work on every common target platform with almost no platform-specific work required?
And again, throw out IE, particularly old versions of IE, and it becomes a decent platform. If needed, add it back in with something like IE9.js or Chrome Frame.
Yes, Chrome Frame. You're going to make your users download a JVM and your native app, but it's too much to ask them to download a browser, or even a browser plugin?
It's worth mentioning, too: IE has fallen below 50%, and that's in general. Among technically-inclined people, it's far lower. Only about 15% are on IE6, and again, the platform massively improves when you don't have to support that anymore.
Don't thank God, thank a doctor!
It's not that complicated.
In theory, it's simple. In practice, not so much -- the bugs which can happen here are numerous and subtle.
There are also plenty of GC libraries for C++
And my point here was that by the time you use a GC library, why not get the full benefit of a safer, saner language? You've already got most of the overhead of something like Java, why not also get the runtime optimizations and the protection from buffer overflows and segfaults, too?
it's possible to select which objects are GC candidates and which are not
And what'd be the criteria for which objects should be GC'd and which you want to handle yourself?
I'd guess the objects which you want to manage yourself are either places where you're interacting with code, or particularly performance-critical parts of your application. But if you're doing it that way, it seems to me that I get most of the same benefit by coding in Ruby, and dropping down to C for those two cases.
It seems you could get similar benefits in Java if JNI wasn't such a bitch -- and even as it is, it isn't that bad compared to pretty much anything else in C.
I just don't think the memory leak issues in C++ are as bad as many people try to make them out to be.
I don't think they're particularly bad either, but I don't see any reason I, as a programmer, should have to deal with them. I certainly don't think C++ has any real place in web development -- except, as I mentioned, in particularly performance-critical bits, especially when they can be abstracted into libraries. I trust the HTTP parser in nginx or Apache a lot more than any code I wrote myself, but anything I write, I trust a lot more in Ruby or JavaScript than in C or C++.
It's not that complicated.
In theory, it's simple. In practice, not so much -- the bugs which can happen here are numerous and subtle.
That's true. It's also true of other languages but there are probably more issues with C and C++ than there are with many other languages.
There are also plenty of GC libraries for C++
And my point here was that by the time you use a GC library, why not get the full benefit of a safer, saner language? You've already got most of the overhead of something like Java, why not also get the runtime optimizations and the protection from buffer overflows and segfaults, too?
For most purposes, smart pointers will do the job just fine. There's little to no overhead, and you get the advantage of knowing when your objects get destroyed.
it's possible to select which objects are GC candidates and which are not
And what'd be the criteria for which objects should be GC'd and which you want to handle yourself?
I'd guess the objects which you want to manage yourself are either places where you're interacting with code, or particularly performance-critical parts of your application. But if you're doing it that way, it seems to me that I get most of the same benefit by coding in Ruby, and dropping down to C for those two cases.
It seems you could get similar benefits in Java if JNI wasn't such a bitch -- and even as it is, it isn't that bad compared to pretty much anything else in C.
I've never had to use a GC in C++, so I'm mostly guessing here. One situation where I'd want to use GC is if I had several containers sharing the same objects and none could be considered the owner. If there's an owner, then using weak pointers for the other containers does the trick.
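To illustrate the "one owner, weak pointers elsewhere" arrangement, here is a rough sketch assuming C++11's std::shared_ptr and std::weak_ptr (boost's equivalents behave similarly). Item and Registry are hypothetical names invented for the example.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct Item { std::string name; };  // hypothetical payload type

// One container owns the objects; the other only observes them.
struct Registry {
    std::vector<std::shared_ptr<Item>> owned;
    std::vector<std::weak_ptr<Item>> observed;

    void add(const std::string& name) {
        std::shared_ptr<Item> item = std::make_shared<Item>();
        item->name = name;
        owned.push_back(item);
        observed.push_back(item);  // does not extend the object's lifetime
    }

    // Counts observers whose target is still alive.
    int alive() const {
        int n = 0;
        for (const std::weak_ptr<Item>& w : observed)
            if (!w.expired()) ++n;
        return n;
    }
};
```

Removing an entry from owned destroys the Item right away, and the matching weak pointer in observed simply reports itself as expired instead of dangling.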
As far as performance is concerned, going from managed to unmanaged code was relatively expensive in Java with JNI when I used it. Hopefully Ruby is better at it. I don't think you're wrong in that the vast majority of cases don't need the performance provided by C++.
There's another thing you might want to look at when talking about performance. C and C++ will usually have much lower memory requirements, and there's no interpreter to load. If performance is an issue, it might be simpler to stick to C++.
I just don't think the memory leak issues in C++ are as bad as many people try to make them out to be.
I don't think they're particularly bad either, but I don't see any reason I, as a programmer, should have to deal with them. I certainly don't think C++ has any real place in web development -- except, as I mentioned, in particularly performance-critical bits, especially when they can be abstracted into libraries. I trust the HTTP parser in nginx or Apache a lot more than any code I wrote myself, but anything I write, I trust a lot more in Ruby or JavaScript than in C or C++.
If performance is not an issue, I wouldn't use C++ either unless there's some reason to. I've implemented some proof-of-concept in C++ but I did so because I had to interface with our code base. At other times I've used Perl, Java and C# when I could choose.
That's true. It's also true of other languages but there are probably more issues with C and C++ than there are with many other languages.
Well, in particular, when something goes wrong in Java, the typical result is a NullPointerException, which can be caught and managed, and which is much easier to debug than in C, where the typical result is a segfault that can be difficult or impossible to track down.
For most purposes, smart pointers will do the job just fine. There's little to no overhead, and you get the advantage of knowing when your objects get destroyed.
Well, again, what do you mean? If we're talking about std::auto_ptr -- that is, a refcounting pointer -- then while I haven't done the benchmarks to back it up, I'd guess refcounting can actually be worse than GC in terms of performance. In particular, with a garbage-collected language, the garbage collector presumably runs at intervals, and is highly optimized -- the whole thing probably fits in cache. This means when the GC isn't running, there's no memory-management-related code running. By contrast, with refcounting, you're at least dealing with the reference count all the time, and you're making calls to delete or free more often...
On the other hand,
C and C++ will usually have much lower memory requirements, and there's no interpreter to load.
I don't think an interpreter alone is an issue, and I'm skeptical that the memory requirements are that significant, but if nothing else, GC would tend to leave objects around for a while before attempting to collect them, whereas C and C++ can collect them immediately. In practice, for performance reasons, you'd probably retain a pool of allocated memory so you don't have to talk to the OS as often -- I think modern malloc implementations do this -- but on a system truly starved for memory, it helps that every byte is released as soon as it can be.
It's just that for the vast majority of applications, GC and other modern, high-level tools are more than worth a large performance penalty, and the difference is getting smaller all the time.
HTTP is a huge complex mountain of hacks on top of other hacks. We are just lucky that no more 'features' have been added to it for some time.
I have been thinking about defining a sane subset and calling it HTTP 0.2, but every time I look into it the sheer messiness of the HTTP standard and existing implementations is just too depressing to handle.
"When in doubt, use brute force." Ken Thompson
For most purposes, smart pointers will do the job just fine. There's little to no overhead, and you get the advantage of knowing when your objects get destroyed.
Well, again, what do you mean? If we're talking about std::auto_ptr -- that is, a refcounting pointer -- then while I haven't done the benchmarks to back it up, I'd guess refcounting can actually be worse than GC in terms of performance. In particular, with a garbage-collected language, the garbage collector presumably runs at intervals, and is highly optimized -- the whole thing probably fits in cache. This means when the GC isn't running, there's no memory-management-related code running. By contrast, with refcounting, you're at least dealing with the reference count all the time, and you're making calls to delete or free more often...
On the other hand,
First off, I'm not an expert in memory management. I graduated in the 90s and I'm sure things have changed quite a bit since then. That is, I may be wrong, and you can prefix every one of the next sentences with "As far as I know."
Reference counting has very little overhead. Memory-wise it adds a few bytes, and time-wise it adds almost nothing as well. A GC will probably use reference counting to speed up detection of unused memory and only perform mark-and-sweep or whatever is needed to resolve circular references after that. I'm pretty sure that there's just as much, if not more, memory-management-related code in a GC-based program even when the GC is not running. I haven't done any benchmarks either.
auto_ptr is probably not the smart pointer you want to use. You're better off using boost's smart pointers. The problem with auto_ptr is that it doesn't play nicely with containers.
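As a rough illustration of a single-owner pointer that does work in containers, the sketch below swaps in std::unique_ptr (C++11) rather than a boost pointer; auto_ptr itself was deprecated in C++11 and removed in C++17. The function name collect is made up for the example.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Moves a single-owner pointer into a container. Copying a unique_ptr is
// rejected at compile time, so ownership only changes hands via std::move.
std::vector<std::unique_ptr<int>> collect() {
    std::vector<std::unique_ptr<int>> values;
    std::unique_ptr<int> p(new int(7));
    values.push_back(std::move(p));  // p is now empty; the vector owns the int
    assert(!p);
    return values;
}
```

The transfer of ownership is spelled out at the call site, whereas auto_ptr transferred ownership silently on what looked like an ordinary copy.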
As far as keeping unused objects in memory for cache, I don't think that GCs can do that. Once the object is no longer referenced, its data is meaningless. Also, I'm pretty sure that malloc/new implementations have never resulted in systematic calls to the OS.
I don't know if GC uses refcounting at all, though I suppose it's possible.
However, the point is that the reference counting itself isn't just the extra bytes of RAM, it's the extra bytes of CPU cache. It's the difference between a chunk of your program fitting in cache and running insanely fast, then being paged out for GC to run (and GC sits in cache during its run), and that same program needing the refcounting, malloc/free, and a bunch of other housekeeping stuff always hot in cache, meaning it's likely your program will have to have chunks of it paged in and out of cache much more often.
Paradoxical, and I'm not convinced, so I'd want to benchmark it. It does seem plausible, and I did read it in a respectable-looking paper.
So no, I wasn't talking about the GC keeping anything "in memory" (as opposed to what?) -- yes, once the object isn't referenced, its data is meaningless.
And yes, I'm pretty sure malloc/new implementations have, at least at one point, been direct system calls. I imagine they still are, on some embedded platforms. When you're starved for memory, it makes sense -- you want everything freed for other processes to use as soon as you possibly can.
Good to know about boost -- though now I'm curious what the difference is.
I don't know if GC uses refcounting at all, though I suppose it's possible.
However, the point is that the reference counting itself isn't just the extra bytes of RAM, it's the extra bytes of CPU cache. It's the difference between a chunk of your program fitting in cache and running insanely fast, then being paged out for GC to run (and GC sits in cache during its run), and that same program needing the refcounting, malloc/free, and a bunch of other housekeeping stuff always hot in cache, meaning it's likely your program will have to have chunks of it paged in and out of cache much more often.
Actually, ref-counting is mostly just the extra few bytes. An auto_ptr (or a unique_ptr or a boost::scoped_ptr) doesn't even use the extra bytes because it has single ownership. When they go out of scope, the object is destroyed. No extra byte; no complicated memory management. The C++ compiler knows when the object goes out of scope and will call the destructor at that time.
For boost::shared_ptr, there's extra memory for reference counting because there can be multiple owners. But, again, I'd be surprised if a GC-based language wouldn't use reference counting. Perl, for example, uses reference counting exclusively *because* it's much faster than other schemes. It has the same drawback that C++ has which is that circular references may leak.
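The circular-reference drawback can be shown with a small sketch, assuming C++11's std::shared_ptr and std::weak_ptr in place of the boost versions. StrongNode, WeakNode, live_nodes and alive_after_handles_dropped are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <memory>

static int live_nodes = 0;  // counts nodes that have not been destroyed

struct StrongNode {
    std::shared_ptr<StrongNode> other;  // owning link: a cycle keeps both alive
    StrongNode() { ++live_nodes; }
    ~StrongNode() { --live_nodes; }
};

struct WeakNode {
    std::weak_ptr<WeakNode> other;      // non-owning link: adds no owner
    WeakNode() { ++live_nodes; }
    ~WeakNode() { --live_nodes; }
};

// Builds a two-node cycle and returns how many nodes are still alive after
// every local owning handle has gone out of scope.
template <typename Node>
int alive_after_handles_dropped() {
    live_nodes = 0;
    std::weak_ptr<Node> probe;  // lets us reach the nodes again for cleanup
    {
        std::shared_ptr<Node> a = std::make_shared<Node>();
        std::shared_ptr<Node> b = std::make_shared<Node>();
        a->other = b;
        b->other = a;
        probe = a;
    }
    int alive = live_nodes;
    // Clean up: break the cycle by hand so the example itself does not leak.
    if (std::shared_ptr<Node> a = probe.lock()) a->other.reset();
    return alive;
}
```

With owning links in both directions the two nodes outlive every handle to them, so the count never reaches zero on its own; making one direction a weak pointer lets both be destroyed as soon as the handles go away.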
Paradoxical, and I'm not convinced, so I'd want to benchmark it. It does seem plausible, and I did read it in a respectable-looking paper.
If you have a link to that paper I'd like to see it. As I said, there's not much more to reference counting other than incrementing a value when the object is assigned a new owner and decrementing that same value when it's being released. The allocation is done once and there is a single delete.
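The increment and decrement described here can be observed directly. This is only a sketch, assuming std::shared_ptr from C++11; trace_use_count is a made-up helper name, and it says nothing about the cache effects being argued over.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Records the reference count of one object as owners are added and removed.
std::vector<long> trace_use_count() {
    std::vector<long> trace;
    std::shared_ptr<int> first = std::make_shared<int>(42);  // one allocation
    trace.push_back(first.use_count());       // one owner so far
    {
        std::shared_ptr<int> second = first;  // copy: count is incremented
        trace.push_back(first.use_count());
    }                                         // second released: decremented
    trace.push_back(first.use_count());
    return trace;
}
```

The trace goes from one owner to two and back to one; the int itself is deleted only when the last owner is released.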
So no, I wasn't talking about the GC keeping anything "in memory" (as opposed to what?) -- yes, once the object isn't referenced, its data is meaningless.
And yes, I'm pretty sure malloc/new implementations have, at least at one point, been direct system calls. I imagine they still are, on some embedded platforms.
I've programmed in C and other procedural languages, Pascal for example, for a long time. I've never seen a single implementation that would make a system call for each malloc/free call. If you know of one, again, I'd be interested to have a link.
When you're starved for memory, it makes sense -- you want everything freed for other processes to use as soon as you possibly can.
When delete is called (or free in C), the memory used by the object is made available immediately. This requires a call to the C or C++ library, if that's what you mean, but this is not a system call. It doesn't require an intervention from the OS except, maybe, in a multi-threaded application. If this library call is what you mean by "system call" then yes it has some overhead. I have heard of implementations of new/delete that accumulate the delete in order to gain a few extra cycles. But when you need these extra cycles you probably should be programming in C++.
Good to know about boost -- though now I'm curious what the difference is.
here
That's just it, though:
When they go out of scope, the object is destroyed... The C++ compiler knows when the object goes out of scope and will call the destructor at that time.
Which means the destructor now needs to be called, along with whatever code the 'delete' keyword actually compiles to. And again, this is extra bytes of code.
But, again, I'd be surprised if a GC-based language wouldn't use reference counting. Perl, for example, uses reference counting exclusively *because* it's much faster than other schemes. It has the same drawback that C++ has which is that circular references may leak.
Well, I know for a fact that Java, Ruby, and any sane JavaScript interpreter at least have some sort of actual garbage collector, vaguely like mark-and-sweep, so they don't have to deal with circular references. Once they have that, I don't see the point of reference counting.
As I said, there's not much more to reference counting other than incrementing a value when the object is assigned a new owner and decrementing that same value when it's being released. The allocation is done once and there is a single delete.
It's that value, plus the actual delete.
If you have a link to that paper I'd like to see it.
Not readily. I think the best I can do at the moment is point out that the Wikipedia article seems to agree with me. There's also this, which again suggests that garbage collection can match or beat malloc/free -- and that's without mentioning refcounting, which brings some additional overhead of its own.
When delete is called (or free in C), the memory used by the object is made available immediately. This requires a call to the C or C++ library, if that's what you mean, but this is not a system call.
Right -- this is what I mean by the smart, optimized way. It's not a system call every time (though it is sometimes), and it isn't entirely without cost.
But because it's not a system call, the memory is only available to this program immediately, which is why I'd imagine (though I don't have a link to back it up) that on an embedded system, if you were particularly starved for memory, you might want to make it immediately available to other programs, which necessarily involves talking to the system.