PHP Application Insecurity - PHP or Devs Fault?
somersault asks: "There have recently been a lot of people making jokes at the expense of PHP, but how many common security flaws in PHP are the fault of the language, and how many the fault of the developer? A recent Security Focus article (via the Register) has a brief discussion which suggests that PHP is no less secure than any other scripting language, and that it is the users of the language themselves who need to be educated. The other side of the story is that the developers of PHP should work on tightening up the language to make it more 'idiot proof' by default. Should the team developing PHP take a more active role in controlling the use of their language? What will it take to ensure that users of the language learn to use it securely, short of defacing every vulnerable website out there?"
Saying that it's the programmers' fault for writing bad code is like blaming a lumberjack's injury on his not knowing how to handle a chainsaw that is dull and jerks a lot. It's much better to start with a tool that prevents such mishaps than with one that is unsafe by default.
Want to improve your Karma? Instead of "Post Anonymously", try the "Post Humously" option.
Take 100 programmers selected randomly, and instruct them all to write a given application, but have 20 of them write the code in PHP, 20 write the code in Python, 20 write the code in Java, and 20 write the code in C++, and 20 write the code in Perl. Then analyze the resulting code.
http://outcampaign.org/
mysql_escape_string and mysql_real_escape_string should both work (assuming you're using MySQL, anyway), but the former is deprecated as of PHP 4.3.0 in favor of the latter, and the former also does not respect the current character set setting.
If you look at the documentation for addslashes, though, it will tell you nice things like "An example use of addslashes() is when you're entering data into a database", even though there are special characters it does not escape that can be used for SQL injection.
My beef with PHP is that it's full of junky functions like mysql_escape_foo() in the core distribution, main namespace, which don't even have a hint of data verification in 'em. I hear there's a neat database abstraction layer in PEAR, it even has prepared statements. But I'll wager there are plenty of PHP developers who haven't even heard of PEAR. Somehow, though, Perl seems to have managed to put together a decent standard distribution without this sort of mess...
The World Wide Web is dying. Soon, we shall have only the Internet.
Is it really that PHP makes it that hard to be secure, or that it makes it easy to do whatever you want, thereby allowing a lot of lazy people to take the easy route? I think the developers (those writing code in PHP, not necessarily the developers of PHP) have to take responsibility for the things they write. If you're trusting user-entered data without escaping it and verifying its validity, shame on you! If you're doing other silly things that make it possible for people to h4x0r your systems, that's also largely the fault of the person writing the offending web application. I have nothing against making PHP more secure, but what does this entail? Not allowing you to do the things that make PHP flexible and fun to work with? I think the resulting language would be about as useful as safety scissors.
Agreed: developers should absolutely take responsibility for the code they write.
And people should take responsibility for the cars they drive and the pollution they create.
Of course, it would seem to me like a lot of people believe that there's a certain social value in asking the producers of cars and heavy equipment to improve the quality of their products.
As with anything, one should select the right tool for the job they are trying to do. If you need to write a complex site, pick a tool that allows you to do things that are more complex. Of course, doing so means you need to be aware of what that complexity means and take responsibility for the increased risks.
However, PHP sells itself as the "easy to learn", "user friendly language" of the web.
As someone noted earlier in the thread, user friendliness often includes safety matters. Sometimes, safety scissors are warranted.
The problem here is that if you cannot depend on the framework for SOME stuff, why are you even using a framework? That's as if in Java or .NET you had to constantly worry about memory leaks (you actually do, to some slight extent, but that's beside the point), and when someone complained about the framework not handling them, people would go "don't blame the framework, blame yourself!". The framework is supposed to handle these things.
.NET 1 and 1.1 had a very well known flaw of this kind. The datagrid, when a column was configured as invisible, would still render the HTML for the data in that column, but simply not display it. This allowed the data to be seen in the source, but not on the actual page. This led several developers to hide columns in order to keep secret data in memory to work with on the server side, thinking the user had no access to it. Of course, a GOOD programmer would think of that and use a different method to hide the data securely. That doesn't change the fact that it was an insecure and poor design choice in the .NET framework, and it was fixed in .NET 2.0. So yes, the framework was to blame. Same with PHP's issues. And they are severe. The community, however, makes them 10x worse than they should be.
The absolute worst thing ever in PHP is how, until recently, SQL injection could happen because prepared statement support was incredibly poor. Good frameworks encourage the use of prepared statements to the extreme. It was possible to use them in PHP4, but certain extras had to be added, and it was rare to hear about them in tutorials, etc. (thus the blame was also largely on the community). This, along with the far too common default setting of mapping POST variables directly to variables, were major things that I definitely think CAN be blamed on PHP and its community.
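To make the difference concrete, here is a minimal sketch of the same lookup written once with string concatenation and once with a bound parameter. It uses Python's standard sqlite3 module as a stand-in rather than any PHP API, and the users table, its contents, and the function names are all made up for illustration.

```python
import sqlite3

def make_demo_db():
    # Hypothetical in-memory table, only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])
    return conn

def find_user_unsafe(conn, name):
    # The input is pasted into the SQL text, so quotes in it change the query.
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]

def find_user_bound(conn, name):
    # The ? placeholder passes the input as a value, never as SQL text.
    rows = conn.execute("SELECT name FROM users WHERE name = ?", (name,))
    return [row[0] for row in rows]

if __name__ == "__main__":
    conn = make_demo_db()
    payload = "x' OR '1'='1"
    print(find_user_unsafe(conn, payload))  # every row comes back
    print(find_user_bound(conn, payload))   # no user has that literal name
```

With the concatenated version the payload rewrites the WHERE clause; with the bound version it is just an odd-looking name that matches nothing.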
I don't mind prepared statements when they are useful, but they don't always work properly, and there are many cases where using them actually costs you power. Let's start with a simple example, the LIKE clause:
SELECT * FROM titles WHERE notes LIKE ?
For the unfamiliar, the LIKE clause allows me to do partial searches over strings (char/varchar in the SQL world). The LIKE search string syntax is something of a simplified regular expression, which means that characters that usually have one meaning gain another. For example, the percent sign becomes a wildcard (think DOS/bash filename matching with '*', or regexp with '.*'). To find all strings starting with 'word', we would just search for 'word%'. Great, but how does a prepared/bound statement know whether a given percent sign is to be escaped or not? It doesn't. So you end up doing your own parsing of user input, and you are back to square one. You still need to parse user input, so what's the point of the bound/prepared statement?
Another example is using the power provided by a full-text index. Generally, string searching is slow. In the SQL world we build an index, a kind of cache to speed up lookups. Strings have indexes, but those only speed up searching for strings that start with something (as in the example above, LIKE 'word%'). What if we want to search for something purely inside the string? We could do LIKE '%word%', but that's slow. On the other hand, we could speed this up by various smart caching and indexing of the contents of the string. This smart indexing is what we call 'full text'. For example, to see if a column contains some word or phrase we could just do
SELECT * FROM myData WHERE CONTAINS (column, ?)
All OK, right? NOPE, because it could also be:
SELECT * FROM myData WHERE CONTAINS (column, 'FORMSOF (INFLECTIONAL, ?)')
To explain slightly, the second example tries to find words that are not exact matches, but very close. So for the word 'good', another word such as 'best' could be used as an alternative (with a lower relevancy ranking). Great power? Yes, but in the first case SQL expects the query in the form CONTAINS ( notes , ' "word" ') (notice the single and double quotes), while in the second it's CONTAINS(notes, 'FORMSOF (INFLECTIONAL, word)') (notice, no quotes allowed)...
And don't even get me started on
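Going back to the LIKE question above, one common way to handle it is to escape the wildcards in the user's text yourself and then bind the finished pattern, so the only live '%' is the one you appended. The sketch below assumes SQLite via Python's sqlite3 module and its ESCAPE clause; the sample rows and helper names are invented, and other databases may spell the escape rule differently.

```python
import sqlite3

def escape_like(term, escape_char="\\"):
    # Escape the escape character first, then the two LIKE wildcards.
    return (term.replace(escape_char, escape_char * 2)
                .replace("%", escape_char + "%")
                .replace("_", escape_char + "_"))

def titles_starting_with(conn, prefix):
    # Only the trailing % is a wildcard; the user's own % and _ are literal.
    pattern = escape_like(prefix) + "%"
    query = "SELECT notes FROM titles WHERE notes LIKE ? ESCAPE '\\' ORDER BY notes"
    return [row[0] for row in conn.execute(query, (pattern,))]

def make_demo_db():
    # Hypothetical sample data for the titles table used in the thread.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE titles (notes TEXT)")
    conn.executemany(
        "INSERT INTO titles VALUES (?)",
        [("100% wool",), ("100 percent",), ("1000 things",)],
    )
    return conn

if __name__ == "__main__":
    print(titles_starting_with(make_demo_db(), "100%"))
```

So the poster is right that binding alone does not decide what a '%' means; the application still has to say which wildcards are intended. The binding only guarantees that whatever pattern you build cannot break out of the string literal.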
SELECT * FROM myData WHERE column IN ( ? )
The IN clause is a shorthand for a series of OR conditions. I could write WHERE column = 1 OR column = 2 OR column = 3, or I could just write WHERE column IN ( 1,2,3 ). And now the question for the binding gurus: how do I do it with prepared statements? Do I write a loop that both generates the SQL and fills a flat array with the right number of parameters, WHERE column IN ( ? , ? , ? , ? ), or do I just send arrays within arrays?
SECOND: parameter binding through naming:
I can't wait for the day when parameter binding can be done in a templated fashion, so that the order of the columns no longer matters. Currently, the way you fill a prepared statement with data depends on the order of the data. It all should be done with associative arrays.
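For what it's worth, the first option (generate one placeholder per value and pass a flat list) is the approach that works with most binding APIs. A minimal sketch, again using Python's sqlite3 as a stand-in; the id column and the sample rows are assumptions made for the example.

```python
import sqlite3

def select_ids_in(conn, ids):
    ids = list(ids)
    if not ids:
        # An empty "IN ()" is rejected by many databases, so short-circuit.
        return []
    # One ? per value; only placeholders go into the SQL text, never the values.
    placeholders = ", ".join("?" for _ in ids)
    query = f"SELECT id FROM myData WHERE id IN ({placeholders}) ORDER BY id"
    return [row[0] for row in conn.execute(query, ids)]

def make_demo_db():
    # Hypothetical myData table with ids 1 through 5.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE myData (id INTEGER)")
    conn.executemany("INSERT INTO myData VALUES (?)", [(i,) for i in range(1, 6)])
    return conn

if __name__ == "__main__":
    print(select_ids_in(make_demo_db(), [1, 2, 3]))
```

The SQL text still gets built in a loop, but the only thing interpolated is a run of question marks, so user data never touches the query string.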
$sth = $__db->prepare ( "select * from myData where cond1 = ? and cond2 = ? " ) ;
$res =& $__db->execute ( $sth , array ( $userInput1 , $userInput2 ) ) ;
it should be done more like
$sth = $__db->prepare ( "select * from myData where cond1 = ?userInput1 and cond2 = ?userInput2 " ) ;
$res =& $__db->execute ( $sth , array ( "userInput1" => $userInput1 , "userInput2" => $userInput2 ) ) ;
There is no special need to type more: if you want, use the first method and just pass a non-associative array, and the library should know to handle parameter binding the old way. But for any larger query, with dozens of parameters, this will be a big boon in readab
I used to work with the Zend team, and they seem determined to pander to the lowest common denominator of hobbyists and not allow the language to grow up. Things like nested classes and strongly typed variables, which should have been implemented in the latest version, are strongly fought against. These things, as well as others, would help enforce good coding standards. But I have been told by the Zend developers themselves that they like to leave it up to the developer to code badly, and to me that makes the language just as much to blame. I think the industry has established by now what are good programming habits and methodologies and what aren't.
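Some database interfaces already bind by name in roughly the way wished for here. As an illustration only, Python's sqlite3 module accepts :name placeholders with a dictionary; this is not the PHP library used in the snippets above, and the cond1/cond2 rows below are invented.

```python
import sqlite3

def find_rows(conn, user_input1, user_input2):
    # Placeholders are matched by name, so dictionary order is irrelevant.
    query = "SELECT id FROM myData WHERE cond1 = :userInput1 AND cond2 = :userInput2"
    params = {"userInput2": user_input2, "userInput1": user_input1}
    return [row[0] for row in conn.execute(query, params)]

def make_demo_db():
    # Hypothetical sample rows for the myData table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE myData (id INTEGER, cond1 TEXT, cond2 TEXT)")
    conn.executemany(
        "INSERT INTO myData VALUES (?, ?, ?)",
        [(1, "a", "b"), (2, "b", "a")],
    )
    return conn

if __name__ == "__main__":
    print(find_rows(make_demo_db(), "a", "b"))
```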
This is my sig. There are many like it but this one is mine.