IE Shines On Broken Code
mschaef writes "While reading Larry Osterman'a blog (He's a long time Microsoftie, having worked on products dating back to DOS 4.0), I ran across this BugTraq entry on web browser security. Basically, the story is that Michael Zalewski started feeding randomly malformed HTML into Microsoft Internet Explorer, Mozilla, Opera, Lynx, and Links and watching what happened. Bottom line: 'All browsers but Microsoft Internet Explorer kept crashing on a regular basis due to NULL pointer references, memory corruption, buffer overflows, sometimes memory exhaustion; taking several minutes on average to encounter a tag they couldn't parse.' If you want to try this at home, he's also provided the tools he used in the BugTraq entry."
Since it may not be obvious to all readers, be aware that when you can make a program crash by feeding it bad data, you can typically further manipulate the data you are sending it to take control of the program. That means a security hole. This is how buffer-overruns work. You can't always do it, but you can think of each way you can crash a program as a little "crack" in its exterior. If you can figure out a way to pry apart the crack, you've got yourself a hole.
.NET languages, rather than programming in straight C like the others) which as a side effect prevented such problems.
So many of these "bugs" in Mozilla, Opera, Lynx, and Links are likely security holes as well.
It is interesting, then, to see that Internet Explorer did so well on this, with its notoriously bad history on security. My first instinct would be that the HTML parsing engine in Internet Explorer was written by a different team of programmers than worked on the rest of the software, and they used proper programming techniques (such as RAII in C++, or perhaps used one of their
Let's hope that all these bugs are taken care of in the other browsers quickly before the black hats find ways to make use of them.
It's hard for thee to kick against the pricks.
However, my Mozilla passed the test without crashing. :-P
Money for nothing, pix for free
Without looking at it more closely my first reaction is, well for his values of "random" this might be true. But random is a horribly abused term in computer science. First of all it was no doubt arbitrary within certain ranges rather than random. Then, we need to look at why he chose those ranges.
With such a powerful parsing engine you would thing IE could parse web standards a little better.
Steal This Sig
I suspect that the mozilla developers will be busy using this same tool to vigorously debug their application...
*shrugs*
So, if you feed IE random crap it doesn't crash? Too bad when you feed it stuff you'd like it to crash on (auto execution of malicous code, etc, it works just fine...)
When all is said and done, I still feel 100% safer surfing the web with some Gecko deriviative...
Yes Francis, the world has gone crazy.
Potentially.
does the fact that they do crash a good thing?
No. Never ever is it a good idea to crash on receipt of invalid data. It's up to the program to try and parse this, realise it can't do so successfully, then act ccordingly (error message, best-guess try, whatever. I prefer error message myself, but can understand those who prefer best-guess).
Cheers,
Ian
I don't know if they still use it, but the Linux kernel developers used to use a program called "crashme" to help test kernel stability. Essentially, it generated random code and tried to execute it. Something like this for web browsers would make for a very useful procedure. Generate the code, throw it at the browser and log the code if it crashed the browser.
Of course IE can handle broken html. That's what Microsoft Frontpage produces. They have to be able to handle their own product.
As x approaches total apathy I couldn't care less.
It encourages web authors to make pages that don't work in other (standards-compliant) browsers. But even MS is getting a bit tired of this, because (1) there are now plenty of pages that don't work even with IE (I encounter them all the time at work), and (2) all the error correction code helps to keep IE bloated and slow.
Great, but then it also encourages people to write bad code - see all that code with broken tables and a million tags that remain unclosed?
You're confusing two seperate things here:
This guy has been testing for (2) and not (1). Bad HTML should never cause crashes, memory corruption and buffer overflows. Period.
Finally, you can't go blaming the users for bad input. One of the golden rules of software design is that all software should either reject or handle gracefully bad input. Crashing is not graceful.
Avantslash - View Slashdot cleanly on your mobile phone.
Assuming this MSFT guy is not lying...
Yes it's a slap in the face. But seriously this is what OSS is supposed to be about. Full public disclosure. If he did find scores of DoS related bugs then the OSS crowd [who like to show their names when the attention getting is good] ought to pay attention and fix the problems.
You can't gloat how open and progressive you are if you scowl and fight every possible negative bit of news.
And "mentioning how bad MSIE is" is not a way to make your product any better [just like "he's not bush" isn't a bonus for Kerry].
So shape up, take it in stride and get to the board.
Oh and while you're at it make Mozilla less bloatware. 30MB of tar.bz2 source could be your first problem....
Tom
Someday, I'll have a real sig.
The same person tells us that Apache sucks when compared with IIS. Does this mean we've all been wrong about Microsoft products? If we take Microsofts word for it we have indeed and should seriously consider switching back to IIS. After all, [THE FOLLOWING IS SARCASM:] this conclusively proves that IIS is far superior to the Linux Apache Mysql Perl/Python/Php system.
[signature]
His only criterion is whether the browser crashes or not. Somehow, it disturbs me more that IE doesn't crash; what precisely is the effect of the bad code then?
I agree that these are all potential security holes, yet the article author mistakenly correlates crashing with vulnerability.
What if IE is similarly vulnerable yet simply doesn't crash?
From that point of view, at least crashing is an indication that sometihng is not right.
All the same, I consider it a good beginning.
Blearf. Blearf, I say.
My firefox managed to handle them all without incident too. Either the artical is out of date or we have a good case of FUD. Which reminds me I really need to update firefox from 0.9.3.
I cat binaries all the time - not biggie. Did you cat it to stdout or to your browser or something? Choosing to do something bad on your desktop and causing a crash isn't nearly equatable to something that could allow Ivan's porn TGP to rootkit your machine simply by sending it a properly formed TEXT file.
No. I don't care how bad the input is, if my program reads the input and throws an access violation, then it is my job to fix my program, test the input more, assume less about it or whatever, until my program does something more sensible and less dangerous with the input - like giving up with an error message or even an assertion failure.
I repeat: code that crashes with a null pointer error is wrong. End of story.
My Karma: ran over your Dogma
StrawberryFrog
So your theory is that having access to the source of these browsers has enabled an individual to crash them, but having thousands of people have the source code hasn't resulted in the problems being fixed already.
No, my theory is that while these bugs are serious in themselves, if I had access to IE's source I maybe able to come up with code that would crash it, too.
Although thousands _can_ see them, very few actually do. There's a difference.
Either way, kudos to him, and a good job at finding the bugs. But that still wouldn't make me change to IE anytime soon.
Case in point.
Last week I wrote some Perl to process an mbox mail folder. I just wanted a quick and dirty way to view its contents in a web page. A couple of CPAN modules and a few dozen lines of code and thing was done. Then I started to get fancy and dealing with stuff like embedded MIME-encoded GIF images. This was pretty simple to do, but I made a mistake. Once I had the decoded GIF data lying around, I wrote it to the HTML file of the current e-mail message, rather than writing it to a seperate file and writting <img src="foo.gif"> in the HTML file.
I was viewing the results with Firefox 0.10.1. When it got to a message with an embedded GIF, with a big slodge of GIF binary data sitting in the middle of the page, Firefox either just sat there spinning its hourglass, or crashed and burned.
Then I looked at the same file with IE, and the GIF image showed up. I was puzzled for a while until I noticed that in the directory where I had created the file, no GIF files had been created. It is of course arguable that IE should not have attempted to render the GIF image from the binary data sitting in the middle of the page, but it did so without complaint. Not rendering it would also be acceptable.
Firefox, on the other hand, has a number of better alternatives to crashing or hanging. Should it display gibberish (like when you forget to set up your bz2 association correctly) or nothing, or the image? I don't know, and don't particularly care about which course of action is taken. Anything is better than crashing, especially when IE doesn't.
Anyway, I fixed the Perl code, and all is well.
The End
Except that he 1) provided a copy of the random malformed HTML generating tool that he used and 2) managed to crash the closed-source Opera, as well.
It's a little ridiculous to suspect that he spent countless hours searching the mozilla, links, and lynx source code to find HTML-interpreting crash-causing bugs and then created a random malformed HTML generator as a cover story as to how he found the bugs.
While it's great that IE can handle 'bad' web code, it really is a seperate issue from security. Now, when the other browsers actually *crash*, this is a concern. Yes crashes *can* be used to determine an exploit, but that doesn't mean they *do*.
To beat the dead horse of the car analogy, if my car doesn't start, it may be the entire electrical system, or maybe my battery is just dead. The moral is don't try to make a mountain out of a mole hill.
Meanwhile, I absolutely despise the fact that IE does handle a lot of 'bad' code. This is a side effect of the IE monopoly on the browsing world. We're not talking about it handling variables that arent declared before they are used or sumsuch. We're talking about code which *should* be causing errors. Since they don't cause errors most of the time (or are hidden from the user) and most web authors only test with IE, there is a massive amount of bad code on the net which is never fixed.
Now I'm glad that the author has found these crashing bugs in the other browsers. This obviously needs fixing, and I'm glad IE is at least stable when it encounters malformed code, but more error reporting needs to be done to the user on all browsers.
Summary:Good review, brings up great points, kudo's to MS for stability. Now everyone go back to work on your browsers and add blatant *THIS WEBSITE AUTHOR DOES NOT WRITE PROPER CODE* dialogs to all your error messages. It's the web author's fault, it's time we told them so.
-- I have fans? Wow.
Unfortunately, handling the test cases provided with the post doesn't mean it's ok. Browsers were run with randomly generated test cases, and the article says that usually a hundred or so tests were required to cause a crash. The probability of a poorly written browser crashing on any given case is low.
The chances of any single test case being problematic are low, so passing the cases provided doesn't mean Safari is safe. If someone downloaded the cgi that automatically generated tests and left Safari with that for a few hours without crashes, we could then say that it lacks the flaws that Mozilla and Opera have.
Whoever corrects a mocker invites insult;
whoever rebukes a wicked man incurs abuse.
--Proverbs 9:7
... and here's why.
With correct data (in this case, HTML), there is a specified action that is "correct". In other words, a correctly marked up table will get layed out, according to the W3C rules for laying out tables. A paragraph will get formatted as a a paragraph, etc.
With malformed markup, the "correct" thing to do is indeterminate. If every browser just takes its best guess, they will all diverge, and the behavior is wildly unpredictable. Even from version to version of the same browser, the "best guess" will change.
"So? You've just described the web!" Well, exactly, but it could have been avoided. Bad markup shouldn't render. It ain't rocket science to do (or generate, though that can be a harder problem) correct markup. If you had do it to get your pages viewed, you would. Ultimately, it wouldn't cost anymore, and would actually cost less (measure twice, cut once).
Of course, what I just wrote only really applies in a heterogenous environment ... which MS doesn't want ... fault tolerance in your own little fiefdom can make sense.
That's not actually accurate. Konqueror only shows that bug when there are javascript errors in the page.
HTML is out there, and millions of malformed pages exist. Most of this is a result of mistakes by authors, but some of it is a result of the moving target that HTML has presented in the past. /. story is also about how various browsers, including the saintly Firefox, can be made to *crash* given certain input. Just thought that should get a mention :)
While your argument is attractive in principal, in practice it's misguided. The horse has bolted. in 2004, no-one would use a browser that didnt work with a huge proportion of the web's content. This is an area where pragmatism is required.
And to respond to the ubiquitous MS-bash, let's step back and remind ourselves that this
(And BTW, I speak as a Firefox user)
A person writing a 'Linux advocate article' is likely to be making extensive use of CSS in order to properly separate the document structure and style. In other words they create documents that are easier to maintain and comply with the standards.
I don't suppose most of them will cry if IE renders it slowly but I doubt that's their primary reason for doing it, although they'll certainly be aware that IE's CSS compliance sucks.
With malformed markup, the "correct" thing to do is indeterminate.
Well, that's debatable, but one thing that's for certain is that you absolutely should not crash due to malformed data. That's bad enough in a browser that doesn't support tabs, but in a tab-capable browser it's unforgivable. That one unparsable webpage that causes the browser to crash is a serious annoyance if it takes out another dozen or so tabs in the process.
Even ignoring questions of pragmatism (raised by other respondents), at the very most the browser should display some sort of "malformed page, unable to display" error. User input (which this is essentially) should not be able to crash an application. My compiler doesn't crash because of syntax errors in my code, why should my web browser?
It's official. Most of you are morons.
Just to be clear, unparseable XHTML is not XHTML. In "Matrix" terms, there is no web page. Instead, there is a string of text that may resemble XHTML to the casual observer but that doesn't really represent anything at all.
Arguing that browsers should half-support broken XHTML is like saying that a C compiler should do something whenever it encounters invalid C, since the user obviously wants to run the code and isn't interested in bowing to the pedantic demands of some irrelevant standards committee.
One is rather more important than the other in this context.
I agree completely, but I don't think it's the one that you picked.
Dewey, what part of this looks like authorities should be involved?
This could be the greatest Mozilla stability enhancement tool yet seen!
http://rocknerd.co.uk
Not to mention the horribly annoying bug that hasn't been fixed since 0.8 that renders Slashdot (HTML 3.2?) incorrectly. You'd think they'd want to fix that bug since a lot of their users read Slashdot.
1. Get it compiling on your system.
2. See if you can help with a bug that's in the system already (a crasher or even a misrendering).
3. Find the Gecko hackers and pick their brains.
http://rocknerd.co.uk
Or does it seem rather amusing that IE, a poorly written browser with many security holes that *brilliantly* links into the OS (not), allows poor code not to crash it...
Then again... properly written code it seems to have problems with.
Oh well.. I guess that's what you get.
The price is always right if someone else is paying.
Jon Postel said it best: "Be liberal in what you accept, and conservative in what you send." A web browser that crashes due to invalid HTML fails this test. Execution must stop at once... yes, but the program should handle the error. A program that crashes is the Operating System handling a program with bug. See RFC 1122 for more details about the Requirements for Internet Hosts. Section 1.2.2 about the Robustness Principle explains better than I can why you're wrong.
I think a lot of people are missing the point here. I don't have time to personally verify whether the author's claims are correct; let's assume they are. The type of errors he saw are potentially the type that could be exploited via the tried-and-true buffer overflow method. This is the kind of thing that leads to "execution of arbitrary code", in other words, 0wnage. If Bad Guys craft a apecial web page, they could target those of us who use these non-IE browsers.
In other words, the people who have been defending IE (and Microsoft in general) by saying "your Mozillas and Operas have been safe from security problems only because nobody uses them" will not only have a field day, but now the clock is ticking. There's proof that there are promising means of attack against these browsers. Someone is surely going to research this. I really hope the good guys developing these browsers rise to the challenge and tighten up the code before we (the people who have been recommending them for years) start losing credibility. I'm inspired to look into helping them out.
Sorry if this seems incoherent, I keep getting interrupted as I type this. Stupid work...
Microsoft's is called JScript, not Javascript.
So that'll be why it calls it when I use a url of the form "javascript:[code]", or a tag like "<script language=javascript>" then?
MS's software all internally understands that this language is called Javascript. It's only MS marketing that decided to use a different name for it.
I ran into similar issues with IE ignoring stuff and Mozilla catching it a few years ago when I was developing one of my first web applications with servlets. Mozilla was a great browser for testing my web apps during development. I remember I had a bug in my code that was supposed to populate a dropdown with options from a database but instead choked somewhere in the model layer and populated it with a bunch of breakspaces. Mozilla would show me the space characters in the dropdown where IE simply ignored them and pretended like the dropdown was empty.
It's a real pain in the neck for IE to not try to show what's actually there because when I first looked at the page in IE I assumed that I just wasn't getting what I wanted from the database since my dropdown was empty. In reality it wasn't empty, IE just didn't want to show me 1000 breakspaces as an option in my dropdown which is bad from a developer's standpoint. However, masking and hiding bad code and data is something that I absolutely want a browser to do when the application is out in production being used by my clients.
The bottom line is you should always develop your web applications with a browser like Mozilla that is going to catch your mistakes but once your application is out the door it's better for clients to be using a broswer that will hide any mistakes you didn't catch!
Netscape made the first widely-used browser. Netscape's browser accepted non-conforming HTML all over the place (perhaps because many HTML files were hand-crafted).
IE came later. IE was forced to maintain compatability with all the non-conforming stuff that Netscape introduced; otherwise, the user-experience was "NS works; IE doesn't". And, having worked at the time on yet a different browser, let me tell you that it was a huge pain to try to even figure out the exact language that Netscape accepted, and what the exact semantics of all the non-conforming HTML that it accepted were. I don't know how IE managed to get so much of it to match. (It's hard enough to write to a spec; but when you've also got a, well, white box that you're trying to match the behavior of, good luck).
So, all this business about how it's all a MS plot may be fun to claim, but it doesn't match up with the reality of history. (OK, ok, this is not to say that once IE gained the lion's share of the market that they didn't pull the sort of trick being suggested; but they sure weren't first.) The world would be a much better place today if Netscape had been rigid in the language it accepted; by the time IE came on the scene, it just didn't have the option to enforce strictly-conforming HTML.
You don't need to be paranoid. Even if there was no evil plan, rendering broken HTML correctly is a trait that benefits the survival of the browser. Just as rendering standard code is a beneficial trait for HTML-authoring tool.
Mozilla, Opera, Safari and the rest are simply less suited in this respect. They can argue as much as they want about their adherence to standards, but in the end they must learn to display malformed HTML/CSS/JS correctly or fail miserably (because some pages will only work in IE).
Future Wiki -- If you don't think about the future, you cannot have one.
I wonder how much of IE's "stability" is because of good code and good parsing, and how much of it is due to the fact that IE didn't *know* it's eggs were scrambled?
It wouldn't be the first time that a program went on "running", only to *eventually* go out to lunch and not come back.
At least the others *noticed* that something was wrong - and died.
You aren't remembered for doing what is expected of you
It mentions there is no scripting or stylesheets in this test--just random HTML tag soup. So yes, there was a very defined, arbirtrary range established. The results are disappointing for IE competitors for sure, but given IE is at major release SIX and at the end of its life cycle, while the likes of Thunderbird are for less mature (barely at 1.0 and at the early stages of its life cycle) I would expect and demand no less than perfection from IE.
Given the arbitrary limits on this test, it appears to be designed specifically to make IE look better than its competitors and prove some point rather than be an objective investigation. It is well known that the most serious problems in IE are with scripting and CSS support being unstable, broken or incomplete. A similar test should be conducted of IE should be done with these included. Kudos to the author of the bugtraq entry for doing this kind of testing, but I don't think the editorial commentary regarding the amount of testing of these browsers or their attention to security is warranted or productive.
The author freely admits he did not seriously analyse the source code for the root cause of these crashes (and in the case of IE, he cannot do so even if he wanted to--but that doesn't stop him from proclaiming it as superior quality). He also provides no evidence that these bugs compromise security in any way beyond consuming system resources, so it was not exactly appropriate to attack their security abilities without further study.
As to the jibe about lack of testing...Many of these alternatives are open source projects, not yet at official 1.0 release yet people! Being open source, the whole point of exposure is to get many eyes looking at the code, and get people involved in improving the code. He seems to know a great deal about programming so I suggest he volunteer some of his spare time to the Mozilla project to make things right, if he is indeed THAT concerned about the issue.