Slashdot Mirror


New PHP Interpreter Finds XSS, Injection Holes

rkrishardy writes "A group of researchers from MIT, Stanford, and Syracuse has developed a new program, named 'Ardilla,' which can analyze PHP code for cross-site scripting (XSS) and SQL injection attack vulnerabilities. (Here is the paper, in PDF, and a table of results from scanning six PHP applications.) Ardilla uses a modified Zend interpreter to analyze the code, trace the data, and determine whether the threat is real or not, significantly decreasing false positives." Unfortunately, license issues prevent the tool in its current form from being released as open source.

16 of 66 comments

  1. Fixed it for you by techprophet · · Score: 4, Informative

    New PHP Interpreter Finds XSS, Injection Holes

    Fixed it for you.

  2. Find X? by eldavojohn · · Score: 4, Funny

    New PHP Interpreter Findx XSS, Injection Holes

    New PHP Interpreter Finds XSS, Injection Holes

    Fixed it for you.

    Clearly the title was trying to illustrate the PHP interpreter's ability to solve the Pythagorean theorem.

    --
    My work here is dung.
    1. Re:Find X? by eldavojohn · · Score: 5, Funny

      Clearly the title was trying to illustrate the PHP interpreter's ability to solve the Pythagorean theorem [mit.edu].

      I don't need PHP for that! Besides, the Pythagorean theorem doesn't have X, just a, b, and c.

      a^2 + b^2 = c^2

      I see you prefer short, nondescript variable names for your algorithms. I pity the person who has to maintain that bit of code. What is a? What is b? What is c?

      I subscribe to a more Knuth-y, self-descriptive coding style and prefer the Pythagorean theorem to look more like:

      sideAdjacentToRightAngle^2 + otherSideAdjacentToRightAngle^2 = sideOppositeRightAngle^2

      Or maybe I'm just being a smartass? It's so hard to tell with developers these days ...

      --
      My work here is dung.
    2. Re:Find X? by MillionthMonkey · · Score: 3, Funny

      I subscribe to a more Knuth-y, self-descriptive coding style and prefer the Pythagorean theorem to look more like:

      sideAdjacentToRightAngle^2 + otherSideAdjacentToRightAngle^2 = sideOppositeRightAngle^2

      Or maybe I'm just being a smartass? It's so hard to tell with developers these days ...

      Would you want to stare at a wall of code with otherSideAdjacentToRightAngles and sideOppositeRightAngles and sideAdjacentToRightAngles all over the place?

      You could just go all the way and call them II11011I, I1IIOI1I, and II110I1I. At least call one of them "hypotenuse", christ.

    3. Re:Find X? by Haeleth · · Score: 2, Funny

      I subscribe to a more Knuth-y, self-descriptive coding style and prefer the Pythagorean theorem to look more like:

      sideAdjacentToRightAngle^2 + otherSideAdjacentToRightAngle^2 = sideOppositeRightAngle^2

      Magic constants?! That's dreadful! How am I supposed to know what 2 is for in that code? And, worse, what if you need to change it to something other than 2? You'd have to change it in three places. You might easily forget one and break everything.

  3. holy smokes batman by sublimino · · Score: 3, Interesting

    From the results paper: "Part of Ardilla's implementation depends on modifications to the open-source Zend interpreter ... made (for a different purpose) by a student while he was an intern at IBM. We have since made many more modifications, but since the original small diffs are owned by IBM, we cannot release either those original modifications or our later work that builds on them ... It would be valuable for someone to re-implement the original changes, so that we could release our entire system as we would prefer."

    How would these changes be "re-implemented" - would the code have to be re-engineered, or would a trawl through the original code (patching in changes verbatim) be acceptable? Otherwise, would somebody have to find alternative syntax for implementing the same functionality? Barrel of worms methinks.

  4. Probably for the best by JNSL · · Score: 3, Insightful

    Although it would be nice to be able to use this, I'd imagine there'd be lots of damage following from widespread release of this program without a quick turnaround on fixing vulnerable sites.

  5. Already made one by Norsefire · · Score: 2, Funny
    And mine is open source:

    # Open the file named by the first command-line argument.
    open( my $code, '<', $ARGV[0] ) or die 'File not found';
    while( <$code> ) {
        if( /php/i ) {
            print "Exploit found\n";
        }
    }

    1. Re:Already made one by BabyDave · · Score: 2, Funny

      /me turns on short_open_tag in php.ini, then cackles maniacally ...

  6. This somehow ... by xmff · · Score: 3, Insightful

    ... reminds me of Perl's taint mode, where all external input data is traced until it has been explicitly checked through a regular expression or similar.
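
The taint-mode idea mentioned above can be sketched in a few lines of Python. This is a toy illustration only, not how Perl or Ardilla implement it: the `Tainted` class, the `untaint` helper, and the `run_query` sink are hypothetical names invented for this sketch, and only `+` concatenation propagates the taint here.

```python
import re


class Tainted(str):
    """A string that came from an untrusted source (hypothetical marker type)."""

    def __add__(self, other):
        # Concatenating anything onto tainted data yields tainted data.
        return Tainted(str(self) + str(other))

    def __radd__(self, other):
        # Same when the tainted value is the right-hand operand.
        return Tainted(str(other) + str(self))


def untaint(value, pattern):
    """Return a plain str only if the whole value matches the pattern."""
    match = re.fullmatch(pattern, value)
    if match is None:
        raise ValueError("input failed validation")
    return str(match.group(0))


def run_query(sql):
    """Hypothetical sensitive sink that refuses tainted input."""
    if isinstance(sql, Tainted):
        raise RuntimeError("tainted data reached a sensitive sink")
    return "executed: " + sql


if __name__ == "__main__":
    user_id = Tainted("42")  # pretend this came from the request
    clean_id = untaint(user_id, r"[0-9]+")
    print(run_query("SELECT * FROM users WHERE id = " + clean_id))
```

Passing the unchecked `user_id` to `run_query` raises instead of executing. A real tracker would also have to follow data through formatting, slicing, and every other string operation, which is why this kind of analysis is usually built into the interpreter itself.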

  7. You are an awful programmer by Anonymous Coward · · Score: 2, Funny

    Same program, just in one line, hence easier to understand: perl -nE'say q(Exploit found) if /php/i' *

  8. Just teach people how to code by loufoque · · Score: 3, Insightful

    Just teach people how to code. When a function or subsystem expects a certain format as a precondition on its input, you actually have to make sure you enforce that precondition (in the case of PHP applications, you simply need to apply trivial conversions such as htmlspecialchars() or mysql_escape_char() depending on whether you want to use that input to generate HTML or XML or to include it into a MySQL request -- this is enough to get rid of XSS and SQL injections completely).

    There would be no need for such tools if PHP developers actually were software engineers rather than kiddies riding the web hype who barely understand the tools they're manipulating.

    1. Re:Just teach people how to code by loufoque · · Score: 2, Interesting

      htmlspecialchars converts < to &lt;, > to &gt;, & to &amp; and " to &quot;, simply because those characters have special meanings in HTML and XML and therefore need to be properly escaped. (Strictly speaking, converting " is only required in attributes where the value is itself between quotes, but that's the default behaviour of the function, to be more general-purpose.)
      As you can see, the character encoding of the string is irrelevant here -- assuming it is ASCII-compatible --, since the function only replaces some ASCII sequences by other ASCII sequences. Why the function has an additional argument to handle encoding is beyond me. (To prevent replacements of said characters within grapheme clusters perhaps? Or to handle non-ASCII-compatible encodings?)
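
The conversions described above can be illustrated with a short sketch. It uses Python's standard `html.escape` as a stand-in for PHP's `htmlspecialchars`, so treat it as an analogy rather than a description of the PHP function; `escape_for_html` is a hypothetical wrapper name.

```python
import html


def escape_for_html(text: str) -> str:
    """Escape the characters that are special in HTML/XML text and attributes."""
    # quote=True (the default) also escapes " and ', so the result is usable
    # inside a quoted attribute value as well as in element content.
    return html.escape(text, quote=True)


if __name__ == "__main__":
    # prints: &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
    print(escape_for_html('<a href="x">Tom & Jerry</a>'))
```

Non-ASCII characters pass through untouched, which matches the point that only a handful of ASCII characters are rewritten.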

      Of course, handling character encoding is a real issue, but a different one. It's fairly trivial, however: you have to transfer your data in the character encoding that you declared your document was in.

      Maybe you're actually talking about the issue that user agents will encode data not supported by the character set they're supposed to use as numeric character reference sequences? There are different approaches to this issue, but the best way is arguably to ask the user agent to send its data in UTF-8. I don't remember any problem with IE6 for that (sure, it ignores the attribute made for that purpose in forms, but it will send the data in the character encoding of the page).

    2. Re:Just teach people how to code by strimpster · · Score: 2, Interesting

      Unfortunately you are incorrect about how easy it is to prevent these issues. In some cases, you want the input to come through as HTML that is allowed to be displayed back to the end users. An example of this is MySpace.com (or even the commenting system here). Do you remember the Samy worm that crawled through their system? The techniques you have given would not have worked. An advanced parser that validates the input is necessary to prevent that (by stripping out the bad portions of the data). I was tasked with creating such a parser for a website I worked on (emerciv.com) to prevent XSS attacks like that from occurring (and also the problem with invalid HTML that can break page flow). Furthermore, mysql_escape_char is not the industry-preferred method of preventing MySQL injection attacks as it still allows some to occur; the preferred method is to use PDO. You might want to study up on those...
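
The kind of validating parser described above can be sketched with an allow-list filter. This is a minimal illustration built on Python's standard `html.parser`, not the parser the poster wrote; the tag lists and the `AllowListSanitizer` and `sanitize` names are assumptions made for the example. It keeps a few formatting tags, drops every attribute, discards script and style content, and escapes all remaining text.

```python
from html import escape
from html.parser import HTMLParser

ALLOWED_TAGS = {"b", "i", "em", "strong", "p"}  # assumed allow-list
DROP_CONTENT_TAGS = {"script", "style"}         # drop these along with their content


class AllowListSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.drop_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT_TAGS:
            self.drop_depth += 1
        elif tag in ALLOWED_TAGS:
            # Re-emit the bare tag; every attribute is discarded.
            self.parts.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT_TAGS:
            self.drop_depth = max(0, self.drop_depth - 1)
        elif tag in ALLOWED_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.drop_depth == 0:
            # Text is always escaped on the way out.
            self.parts.append(escape(data))


def sanitize(fragment: str) -> str:
    parser = AllowListSanitizer()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.parts)


if __name__ == "__main__":
    # prints: <b>hi</b>
    print(sanitize('<b onclick="evil()">hi</b><script>alert(1)</script>'))
```

Dropping all attributes is the bluntest possible policy; allowing `href` or `style` safely takes far more care. The sketch also does not rebalance unclosed tags, so it does not address the page-flow problem mentioned above.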

      Oh, and by the way, I am a software engineer (finishing up my Master of Science in Software Engineering with a focus on Knowledge and Information Engineering from the University of Michigan's Dearborn campus at the end of the summer and have been asked by the Electrical and Computer Engineering department chair to create new curriculum for the undergraduates in interactive web development, and will be teaching it as well) and I consider myself a PHP developer (amongst other languages) and take offense to that ;)
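
The PDO recommendation in this comment refers to prepared statements with bound parameters. A rough Python analogue, using the standard `sqlite3` module in place of PHP's PDO and MySQL, is sketched below; the `users` table and the function names are hypothetical.

```python
import sqlite3


def make_db():
    """Build a throwaway in-memory database for the demonstration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a1"), ("bob", "b2")],
    )
    return conn


def find_user_unsafe(conn, name):
    # Vulnerable: the input is spliced into the SQL text itself.
    sql = "SELECT name FROM users WHERE name = '" + name + "'"
    return conn.execute(sql).fetchall()


def find_user(conn, name):
    # Safer: the input is bound as a parameter and never parsed as SQL.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    payload = "x' OR '1'='1"
    print(len(find_user_unsafe(conn, payload)))  # every row matches
    print(len(find_user(conn, payload)))         # no row matches
```

With bound parameters the query structure is fixed before any user data is seen, so there is no escaping step to get wrong.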

  9. DarkReading! by jginspace · · Score: 3, Informative
    TFA is just blog spam. See source.

    And I wonder, are the maintainers of schoolmate and webchess now frantically patching their code? None of the articles gives dates - although the PDF is more than 18 months old.

  10. not possible by Lord Ender · · Score: 2, Interesting

    I agree that it is possible (but difficult) to identify sql injection vulnerabilities with automated code inspection. I do not think XSS can be identified so easily. In a web app, user-submitted text is added to a database. Then who-knows-what happens to it. Eventually, something based on that text is submitted as output, at which time special characters must be escaped.

    The only way to accurately identify XSS in such a scenario is to track the input from the user, into the database, and back out, so that you know the special characters are escaped. That's not something software could accurately do for a general case, without tons of false positives.
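
The round trip described above (user input goes into the database, comes back out, and must be escaped on output) can be sketched as follows. This illustrates one possible bookkeeping scheme and makes no claim about how Ardilla works: the `Untrusted` marker type, the `tainted` column, and the function names are all assumptions made for the example.

```python
import html
import sqlite3


class Untrusted(str):
    """Marker type for text that originated from a user (hypothetical)."""


def make_db():
    conn = sqlite3.connect(":memory:")
    # The extra column remembers whether the stored text was user-supplied.
    conn.execute("CREATE TABLE comments (body TEXT, tainted INTEGER)")
    return conn


def save_comment(conn, body):
    flag = int(isinstance(body, Untrusted))
    conn.execute("INSERT INTO comments VALUES (?, ?)", (str(body), flag))


def load_comments(conn):
    rows = conn.execute(
        "SELECT body, tainted FROM comments ORDER BY rowid"
    ).fetchall()
    # Restore the marker on the way out of the database.
    return [Untrusted(body) if tainted else body for body, tainted in rows]


def render(value):
    # Escape only what is still marked as user-supplied.
    return html.escape(value) if isinstance(value, Untrusted) else value


if __name__ == "__main__":
    conn = make_db()
    save_comment(conn, Untrusted("<script>alert(1)</script>"))
    save_comment(conn, "<b>site notice</b>")
    for comment in load_comments(conn):
        print(render(comment))
```

The hard part the comment points at remains: a real application transforms stored data in many ways before output, and each transformation has to carry the marker along or the tracking produces false positives or misses.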

    --
    A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.