PostgreSQL 8.1.4 Released to Plug Injection Hole
alurkar writes to tell us that PostgreSQL released version 8.1.4 today in order to combat a security flaw allowing a SQL injection attack. From the article: "The vulnerability affects PostgreSQL servers exposed to untrusted input, such as input coming from Web forms, in conjunction with multi-byte encodings like (Shift-JIS (SJIS), 8-bit Unicode Transformation Format (UTF-8), 16-bit Unicode Transformation Format (UTF-16), and BIG5. In particular, Berkus says that applications using 'ad-hoc methods to "escape" strings going into the database, such as regexes, or PHP3's addslashes() and magic_quotes' are particularly unsafe. 'Since these bypass database-specific code for safe handling of strings, many such applications will need to be re-written to become secure.'"
whitelisting, not blacklisting, is a good idea. Stop trying to define a set of 'wrong' data. Define a set of good data.
Most of the PHP apps I've ever had the (mis)pleasure to peruse make liberal use of this type of "escaping" rather than calling the provided "escape_string" functions. That never made any sense to me, but the practice appears to be quite common.
Mismatches between different character encodings seem to have been responsible for vast swathes of security vulnerabilities over the past few years. The sooner everybody moves to programming languages and software that use Unicode natively, the more secure we will all be.
Unfortunately, the languages receiving the most attention for web development have abysmal Unicode support. PHP and Ruby haven't a clue, although the next version of PHP is supposed to be much better in this respect. Python developers can at least handle things fairly well, although it's still a bit of a pain in the neck.
This vulnerability is probably going to cause quite a few problems for people, as it's a client issue that will probably need whatever adapter you use to be updated. Here is the user guide to the vulnerability for PostgreSQL. psycopg should be fixed shortly.
Bogtha Bogtha Bogtha
heh, heh, heh... I'll plug your injection hole, baby!
By the way, the dangling reference to a quote by one "Berkus" should be attributed to Josh Berkus.
Don't piss off The Angry Economist
Has anyone else found that Suse is really, really slow in releasing updated Postgres binaries? Are they tied to SLES releases? Anyone know anything?
I know I'll probably get a million flames telling me to compile from source, but I'm not really that fond of supporting my own compilation job on a production server.
Must....not....make....joke....about...injection hole...being plugged...
Damn, too late.
=\
That's why I prefer Postgre. Oh, wait...
Oracle and MySQL suffer from similar vulnerabilites when going UTF8 -> database charset. The "answer" in Oracle is to use UTF-16 on the backend and a select 8/16-bit encoding in the front end if you want to support multiple locales. I'm not sure what the implications are for MySQL.
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
If someone wants to try and help me with this problem, let me know, but I had trouble getting mysql to insert (actually doing COPY since it's about 10x faster) anywhere near what I can get with PG8.
I can understand how SJIS and BIG5 are vulnerable.
But in a UTF-8 string, no single byte will match a single quote besides the single quote character (0x27).
It seems to me that simply inserting a backslash before every single quote and backslash in a given string will have the desired effect, and that UTF-8 is not particularly vulnerable to this problem. (quite by design- it was invented by none other than Ken Thompson)
Either that article is misleading somehow, or else the postgres developers are simply putting in some safeguards for common errors in things such as php scripts.
Only up to 8.1.3 were listed here as we composed this:
:-/
http://www.postgresql.org/download/btlist
Oh, and it would be gerat to have just ONE torrent to d'load, eg, per platform.
Alternatively, create an All-In-One ISO (preferably CD-ROM set -and- a DVD ISO)
(Help us to save you bandwidth...)
"Remember: It isn't released until its torrents are released"
I've only recently begun playing with PostgreSQL coming from Oracle. I've also been primarily a Java (JDBC) guy for the last couple years. I'm not sure I completely understand where this vulnerability lies. Would a Java PreparedStatement be vulnerable to this? Would the Postgres implementation of JDBC use 'addslashes()' to bind variables in a prepared statement? Or is this a higher level function? (I have not come across it myself, but like I said I'm still pretty new to Postgres).
I guess I see "affects PostgreSQL servers exposed to untrusted input, such as input coming from Web forms" and wonder if they're talking about some further functionality where postgres acts like a web server. My understanding of PreparedStatements is that they are bound at a very low level in the db to allow for maximum speed through caching etc...
PostgreSQL defaults to SQL-ASCII encoding, which is unaffected by this particular attack. Only clients which connect using a multibyte encoding would be affected.
Actually, this really isn't a vulnerability in the database server itself -- the update just intentionally breaks certain badly written applications in order to protect them from themselves. If PHP's addslashes() ends up creating valid multibyte characters that produce unexpected behavior, that's really PHP's problem -- Postgres is just doing what it's told.
Multi-Layered validation is the only way to go.
Client validation is only useful for round-trip bandwidth reduction, it's nice to have, but not secure in any way. It can stop the occasional accidental bad input. (e.g. entering strings when numerical data is called for, pop up a message box telling you not to do that), it won't stop anyone really interested in corrupting your data.
The app server should be validating everything being posted to it. Is this string too long, too short, not a string, wrong encoding, etc...
The DB server should ALSO be validating everything coming from the app server. Don't trust your application server, it could have a bug, it could have been hacked, it might not be your app server, who knows. Strict stored procedures with no r/w access to tables is a really the only way to go. (To: My Co-Workers, Using select * queries and running as dbo and/or sa is usually a sign that you're not doing it right)
Yes, it's paranoid thinking, yes, it's more work and yes, there is a slight performance hit, but it is secure and it's damn hard to break.
People who don't use prepared queries doesn't deserve any better than having someone to fuck up your database!
Often such are combersome or impossible with dynamic query generation, such as Query-by-Example forms where the terms and sort options depend on user input.
Many "prepared" thingies also depend on positional parameters, which can get messy. Would you like to use and maintain a function with 19 positional parameters? That is what it can feel like.
Further, certain kinds of prepared statements seem to have the same flaw. The PS api may "check" the types, but still passes a string to the database such that if you can find a way past the syntactical type checkers, you can pull off the same thing.
I do wish that most database API's offered a "read-only" mode such that the query being sent to the database is designated read-only. That way the worse a hacker can do is grab table data but not change anything. If you have logins restricted only to necessary tables, then the damage exposure is further limited.
Table-ized A.I.
PostgreSQL ignored invalid UTF-8 sequences, meaning a ' character at the end of a incomplete sequence could cause only one ' to be seen by the parser when escaped.
See http://www.postgresql.org/docs/techdocs.50 for the details.
dtach - A tiny program that emulates the detach feat
... because counting out 500 question marks to figure out why the hell your parameters don't match up is MUCH more fun than being paged at 3AM because the entire production database was wiped out.
If I have been able to see further than others, it is because I bought a pair of binoculars.
You can always use an 8-bit database charset and then use nchar or nclob columns when you want UTF-32 or UTF-16 support. So then your web-app or whatever has to be consciencious where non-ASCII is allowed so it gets converted/stored properly. Then it's only wasteful if the majority of the database content is localized/user submitted latin text. But uh, the case they were going on about concerned far-east locales where multi-byte is a must so I don't think you'd have much issue using a fixed-width encoding if that's your audience.
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
Pure BS. There is no way in hell postgres can come close to 35k inserts/s. To get 35k inserts/s you would need a high end cluster system ... running something other than postgres.
Well, I was thinking that it might be possible if you bunched enough of them into the same transaction, had sufficiently small updates, used a single prepared statement over and over, turned fsync off, and had HUGE amounts of memory, then, perhaps?
Not that I would want to run 35k/s inserts with fsync off...
LedgerSMB: Open source Accounting/ERP
I really meant at the API level (specific queries), not necessarily at the user level.
The API would have to tell the database that a given query is to be read-only so that if a hacker sticks a DELETE in there, it will be rejected. But, there is no semi-standard mechanisms for such that I know of.
Table-ized A.I.
I think he means per-query permissions, so he doesn't have to use a separate user for select and insert/update queries.
Be wary of any facts that confirm your opinion.
Well, I was thinking that it might be possible if you bunched enough of them into the same transaction, had sufficiently small updates, used a single prepared statement over and over, turned fsync off, and had HUGE amounts of memory, then, perhaps?
./81/bin/psql test
You don't need to do all that.
The command:
time ruby -e 'puts "BEGIN;"; for i in 1..35000 do puts "INSERT INTO a VALUES(#{i});" end; puts "COMMIT;"' |
Gives me about 7-8 seconds with write caching off, fsync on. Yes, I turned off write caching with hdparm. This is on my PC, with an el-cheapo IDE drive and a modest 1GB of RAM.
If you think about it, why should it take a long time? It's all one transaction. It's basically just writing the data to a file (the log). The table "a" is just an int field, obviously. And with postgresql's MVCC, an INSERT is very cheap.
Social scientists are inspired by theories; scientists are humbled by facts.
so, I even prepare those queries. Whats wrong with that. You are the programmer, you have to know what can come in and how to protect the system from wrong input.
Weak excuses for not doing it right.
"Freiheit ist immer auch die Freiheit des Andersdenkenden" - Rosa Luxemburg, 1871 - 1919
'He also notes that the addslashes function was deprecated in PHP 4.0 due to security risks, but a "distressing" number of PHP applications continue to use the function.'
How come the php documentation doesn't mention this?
Security is a low priority.
/>
;).
:).
Years ago I proposed to the HTML and browser people that there be a "no-active" html tag, that marks enclosed content as nonactive.
Example:
<safetyon lock="randomstring" allowed="keyword1,keyword2,keyword3"
potentially evil content from uncontrolled party - e.g. comments
<safetyoff lock="randomstring"/>
keywords could be "textonly" "basic-html" "java" etc.
It seems everyone is more interested in "GO" tags. And nobody wants a "STOP" tag.
It's like having a thousand accelerator pedals and no brake pedal. The only way to stop is to make sure none of the accelerator pedals are pressed
Maybe I should try again
It's been a while since I've done database stuff (pretty much Oracle), but, how about:
set transaction read only
or
selecting off a database view. we used to do this when we wanted to enforce read only. i know nowadays there's updatable views, but i imagine there's a way to use the old behavior (i'm just too lazy to look it up).
Dude, you're most probably using the wrong database. Financial data or billing info (or otherwise telecomm industry?) I presume.
Consider kdb+ from Kx systems. Properly used, it can do ~1mil rows/second, with multiple live hot failovers. It's pricey (several hundreds of K$), especially compared to MySQL/PGsql, but might be worth your buck.
There is a way to solve SQL injection problems: Disallow text literals. Or even, disallow literals (including numbers) at all. This could be a setting in the database that is on by default, and only off for certain applications (ad hoc query tools) or users (admins). What do you think about that?
I'm thinking about implementing this feature in the database I write (http://www.h2database.com/):
This would be a persistent setting, and only an admin can change it. But, maybe this is the wrong place to ask for comments on this?(Of course there are other security risks, like using 'customer id' in URL or hidden fields in a web application. Or relying on Javascript data validation. But I don't know what to do about those problems.)
CDATA sections already do that. But you can't rely on CDATA, because not all UAs support it.
Hi,
PDO and PEAR::DB both provide ways of doing this under PHP.
See http://pear.php.net/ and http://www.php.net/pdo for examples.
David.
The Ginger Dog
You're wrong. My company has a huge legacy FoxPro database, but for performance reasons we copy most of its tables to PostgreSQL as an hourly cron job. One of those tables is just over 7 million rows and I can copy it into PostgreSQL is slightly less than 5 minutes. 7071473 rows in 287 seconds yields 24639 inserts/s, and this is on a production server answering other queries at the same time.
PostgreSQL 8.1.3, FreeBSD 6.1-STABLE, single Xeon, 3GB of RAM, one SCSI-320 drive, very little performance tuning. If we can hit 25K on that box, I have no doubt he can sustain 35K on beefier hardware.
Dewey, what part of this looks like authorities should be involved?
Here's the keys to it:
Increase src/include/pg_config_manual.h BLCKSZ to the max (32768)
Use COPY not INSERT (faster than transactions for the inserts too)
Postgres isn't threaded so even though it's a dual proc dual core 252, I am still cpu bound. I really need some sort of lazy index/index chunker that can run in parallel to utilize the multiple processors more efficently so I can have more than a single index on the table.
> Postgres isn't threaded so even though it's a dual proc dual core 252, I am still cpu bound. I really
;-)
> need some sort of lazy index/index chunker that can run in parallel to utilize the multiple processors more
> efficently so I can have more than a single index on the table.
Actually, if this is a critical part of your database, you're probably using the wrong product: postgresql is a great product, but this is definitely its weak spot.
For something like this you're far better off with db2 or oracle. My preference is db2 since it lacks a larry ellison and can be far cheaper than oracle. What you'd get with db2 for example is:
- query parallelism: work split across all cpus
- load parallelism: loads split across all cpus
- partitioning: rather than using btree indexes that won't work when you're selecting more than 5-15% of the data, partitioning works linearly when you're selecting 2%, 20%, or 80% of the data.
- query optimization: better query planning means your queries need less tweaking to run the way they should
- fine-grained memory control: to allow you to get the most out of memory, which is very valuable on databases like this. Sort memory, tablespace caches, etc
- automatic summarization: db2 can automatically update summary tables for you
- query rewrite: db2 can automatically rewrite incoming queries to hit summary tables when it makes sense
DB2 Express is free for small systems. I think the limit is 2 cpus & 4 gbytes of memory, with no limit on data size. Beyond those limits you can get workstation edition which is (going from memory here) $1500/CPU list price plus $250/named user. If you want it on the net then it's $7500/CPU for unlimited users. That's list, and you can probably get 50% off.
The only disadvantage with db2 in this configuration is that you'd want to use MDC for partitioning, which is slower to load than a regular table. You'd probably hit just 25-35k rows/sec with mdc. Right now DB2 viper is in beta, that allow you to use MDC, range partitioning, or hash partitioning. All in combination. This means that if you really wanted to scale up you could spread the data across a dozen two-way boxes with each one partitioning its local data. This isn't cheap to do (licensing db2 this way is expensive), but if you wanted to get adhoc queries against a 100 million row table returning results in 1-4 seconds - this is the way to do it.
Oracle is similar, just 2x as expensive, more knobs to twiddle, better skills & third-party availability, and highly unpleasant sales staff to deal with.
Of course, Netezza might also hit the spot using postgresql. It wouldn't have all the benefits of db2 or oracle, but has some. And not sure if they have a lot cost entry-level product either.
Postgres also supports "set transaction read only". Thanks for mentioning it, I never knew that it existed.
Censorship is telling a man he can't have a steak just because a baby can't chew it. --Mark Twain
In the Python DB-API, SQL strings look like:
You create a dictionary (hash table) with a key "baz", pass that dict to the database along with your query, and it fills in the blanks. Your job as the programmer is to make sure that dict has all the keys in it to complete the query; it doesn't matter which order you assign them or if you don't use them all.
In fact, a very common case is to create on dict with all the values you'll need to execute a whole list of queries, and just keep passing the same dict rather than redoing it each time:
It's about as easy as you can possibly make it and has no disadvantages that I've ever encountered. So, I'd take the position that it's better to protect the server and forget about old ideas like positional parameters. There are extremely programmer-friendly solutions to this problem if you know where to look.
Dewey, what part of this looks like authorities should be involved?
Imagine if I started throwing out numbers for selects with bitmap scans - their little heads would explode. :-)
Seriously, "PostgreSQL is slow" is about as current as "Slowaris" and the "17 minutes to copy a file" Mac troll. None of them have been true for years, if they ever were.
Dewey, what part of this looks like authorities should be involved?
Probably he is talking about something like MySQL's LOAD DATA INFILE. In MySQL, inserting data into a table directly from a delimited text file, without using SQL statements, is just as much fast, if not faster.
$query = $pdoInstance->prepare('INSERT INTO myTable VALUES ( :foo, :bar:, :val );');
$query->bindParam(':foo', $foo);
$query->bindParam(':bar', $bar);
$query->bindParam(':val', $val);
$query->execute();
I don't see how that could be any less clear.
Wow, I need to drink my coffee. More clear. More clear... To think I even previewed and reread it.
I do wish that most database API's offered a "read-only" mode such that the query being sent to the database is designated read-only.
This is easy to do. Create a schema with SELECT-only privileges on the DB objects. Use that schema when connecting to the DB via your API for read-only actions.
If there's other circumstances where you need other DML commands to work, create separate schemae for them. If your data integrity is that important that you need every possible defense against SQL injection attacks, the overhead of closing and reopening connections to the DB occasionally is a small price to pay.
Nice idea, although there's plenty of danger even in read-only-land. For example, the example at http://shiflett.org/archive/184 is of an attacker who tricks a login scheme into letting him log in without a password, and the query there is merely a read.
I agree that prepared statements can be cumbersome, but they're clearly much more the Right Thing to be doing than constructing ad-hoc queries by hand, with random strings pasted between single quotes.
Your mention of "Query-by-Example forms" is apropos. Anybody know how to do those Right?
You still need to speed that up by about seven times to get 35k/s inserts. This is why I suggested turning fsync off would be the only way to make this happen.
However, this would be limited to bulk imports of data that didn't have constraints. For example, bulk imports of numbers from experimental measurements into a temporary table so that they could later (at one's convenience) be loaded into proper tables.
Yes, I can imagine real-world applications for such rapid small inserts, but in general, they are few and far between.
LedgerSMB: Open source Accounting/ERP
Right now DB2 viper is in beta, that allow you to use MDC, range partitioning, or hash partitioning. All in combination. This means that if you really wanted to scale up you could spread the data across a dozen two-way boxes with each one partitioning its local data. This isn't cheap to do (licensing db2 this way is expensive), but if you wanted to get adhoc queries against a 100 million row table returning results in 1-4 seconds - this is the way to do it. ;-)
Greenplum is doing something similar using PostgreSQL as a base:
http://www.greenplum.com/products/bizgresMpp.php
You still need to speed that up by about seven times to get 35k/s inserts.
I was trying to show that it's not out of the question. I don't have particularly good I/O on my machine, and I have a slow processor. The disk does not need to be written until COMMIT time, and it can be written sequentially (thanks to the write-ahead log). If there are several processors and a connection per processor, I could imagine that this type of benchmark could approach the write speed of the disk.
INSERTs aren't expensive in PostgreSQL. They can be if there are a lot of constraints and indexes, but in general they aren't. That guy could have been telling the truth. The poster who accused him of BS probably assumed that he meant 35k transactions/sec. 35k transactions/sec sounds unrealistic to me, since PostgreSQL can only COMMIT once per disk revolution (assuming "normal" non-RAID hardware).
Social scientists are inspired by theories; scientists are humbled by facts.
I thought the WAL was written before the tuple. THis would mean that the disk would receive a new write instruction for every insert, and if there was any delay in processing it, you would have to wait for another revolution. Thus it would be worth seeing how enabling write caching would affect your performance here. That might tell you if a battery-backed cache might be good enough to obtain these sorts of numbers.
LedgerSMB: Open source Accounting/ERP
I thought the WAL was written before the tuple.
The way I understand it, the tuple is written first at the end of the file containing that portion of the table in question. But it is written asynchronously, meaning that it may stay in the OS Buffer Cache.
When the COMMIT comes, PostgreSQL writes sequentially to the WAL, but this time it writes synchronously.
That way, if there is a power failure, the data pages in the buffer cache holding the tuples may be lost, but the WAL contains enough information to reconstruct any committed transaction.
THis would mean that the disk would receive a new write instruction for every insert
No, often INSERTs will stay in the OS buffer cache until COMMIT time.
Social scientists are inspired by theories; scientists are humbled by facts.
Typically I've not allowed people to use the value field (i.e. where they specify the criterion) to also specify the operator for that comparison. I keep a separate drop-down with non-mathematical and non-SQL names that I check against a whitelist of valid operators and then translate them into SQL.
.. I'm surprised with all the security-testing companies making such heavy use of sql injection to butter their bread that people are _still_ using non-prepared statements. Very scary!
As an example, for an age field i have the following operators: equals, contains, less than, greater than, etc. The "value" portion of each is identical to what's displayed on the pull down. However there's an internal method that maps "less than" to . No other operators other than those on the whitelist are allowed.
Once all that legwork is done I actually construct a full statement using helper methods that is full of your friends and mine: placeholders. Create a list of bind parameters automatically and then call prepare and execute (perl style for this example).
This has the benefit of make it safer at least from my pov. Any suggestions on improvement are very welcome!
Also a final point: I've never even thought of using non-prepared statements
$x = ($x * 10) % 10 >= 5 ? 1 + int $x : int $x
You are correct-- the WAL entries are flushed at the end of the transaction.
LedgerSMB: Open source Accounting/ERP
Really the weakness comes more from the application tiers which don't support good authentication mechanisms. The constant use of a single DB login for efficiency, while giving up all of the powerful data protection tools is the real problem.
One of our production systems averages around 500/s, with peaks at over 3000/s, with roughly that sort of table structure... it's a dual Opteron, H/W RAID10 across two channels, 6 disks and 4GB of RAM and a separate SATA WAL drive. Thing is, it's not just an insert, it's a call to a stored procedure that does some data hygiene, then a select and then either an insert or update depending on the result of the select. Very fast, very happy!
The attacker could close the CDATA tag, whereas with my proposal, the attacker has to supply the correct "randomstring" in order to reenable normal behaviour ("default allow").
Given a suitable "randomstring" the odds of an attacker successfully closing the safety tag would be exceedingly low, even with future likely tags and browser features.
The current situation is: if someone adds a new tag or browser feature, it is less likely that you'd be blocking that by default, and that is _expected_ behaviour.
Whereas with my proposal, it is actually a bug in the browser if it doesn't restrict the enclosed content to only the allowed stuff (e.g. safe-html (no links or images) or even text only).
It doesn't have to be specifically the way I mentioned but there should be a way to disable "default permit".
I'd actually prefer the tags to be something like: <safety lock="..."> and </safety lock="..."> , however someone said that's not proper HTML/XML - closing tags aren't supposed to have extra bits?