PostgreSQL 8.1.4 Released to Plug Injection Hole
alurkar writes to tell us that PostgreSQL released version 8.1.4 today in order to combat a security flaw allowing a SQL injection attack. From the article: "The vulnerability affects PostgreSQL servers exposed to untrusted input, such as input coming from Web forms, in conjunction with multi-byte encodings like (Shift-JIS (SJIS), 8-bit Unicode Transformation Format (UTF-8), 16-bit Unicode Transformation Format (UTF-16), and BIG5. In particular, Berkus says that applications using 'ad-hoc methods to "escape" strings going into the database, such as regexes, or PHP3's addslashes() and magic_quotes' are particularly unsafe. 'Since these bypass database-specific code for safe handling of strings, many such applications will need to be re-written to become secure.'"
whitelisting, not blacklisting, is a good idea. Stop trying to define a set of 'wrong' data. Define a set of good data.
Most of the PHP apps I've ever had the (mis)pleasure to peruse make liberal use of this type of "escaping" rather than calling the provided "escape_string" functions. That never made any sense to me, but the practice appears to be quite common.
People who don't use prepared queries doesn't deserve any better than having someone to fuck up your database!
Mismatches between different character encodings seem to have been responsible for vast swathes of security vulnerabilities over the past few years. The sooner everybody moves to programming languages and software that use Unicode natively, the more secure we will all be.
Unfortunately, the languages receiving the most attention for web development have abysmal Unicode support. PHP and Ruby haven't a clue, although the next version of PHP is supposed to be much better in this respect. Python developers can at least handle things fairly well, although it's still a bit of a pain in the neck.
This vulnerability is probably going to cause quite a few problems for people, as it's a client issue that will probably need whatever adapter you use to be updated. Here is the user guide to the vulnerability for PostgreSQL. psycopg should be fixed shortly.
Bogtha Bogtha Bogtha
heh, heh, heh... I'll plug your injection hole, baby!
By the way, the dangling reference to a quote by one "Berkus" should be attributed to Josh Berkus.
Don't piss off The Angry Economist
As long as you set the right multibyte string encoding in PHP via the multibyte string functions (specifically, the mb_internal_encoding function), the parser will catch the invalid multibyte sequence and fix it.
Move along, folks. No need to panic.
Has anyone else found that Suse is really, really slow in releasing updated Postgres binaries? Are they tied to SLES releases? Anyone know anything?
I know I'll probably get a million flames telling me to compile from source, but I'm not really that fond of supporting my own compilation job on a production server.
Must....not....make....joke....about...injection hole...being plugged...
Damn, too late.
=\
ok.. here goes.. unicode
hmmm... nope.. didn't work.
That's why I prefer Postgre. Oh, wait...
Oracle and MySQL suffer from similar vulnerabilites when going UTF8 -> database charset. The "answer" in Oracle is to use UTF-16 on the backend and a select 8/16-bit encoding in the front end if you want to support multiple locales. I'm not sure what the implications are for MySQL.
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
If someone wants to try and help me with this problem, let me know, but I had trouble getting mysql to insert (actually doing COPY since it's about 10x faster) anywhere near what I can get with PG8.
I can understand how SJIS and BIG5 are vulnerable.
But in a UTF-8 string, no single byte will match a single quote besides the single quote character (0x27).
It seems to me that simply inserting a backslash before every single quote and backslash in a given string will have the desired effect, and that UTF-8 is not particularly vulnerable to this problem. (quite by design- it was invented by none other than Ken Thompson)
Either that article is misleading somehow, or else the postgres developers are simply putting in some safeguards for common errors in things such as php scripts.
I just don't want to know what you would do at Mr. Lube.
Only up to 8.1.3 were listed here as we composed this:
:-/
http://www.postgresql.org/download/btlist
Oh, and it would be gerat to have just ONE torrent to d'load, eg, per platform.
Alternatively, create an All-In-One ISO (preferably CD-ROM set -and- a DVD ISO)
(Help us to save you bandwidth...)
"Remember: It isn't released until its torrents are released"
I've only recently begun playing with PostgreSQL coming from Oracle. I've also been primarily a Java (JDBC) guy for the last couple years. I'm not sure I completely understand where this vulnerability lies. Would a Java PreparedStatement be vulnerable to this? Would the Postgres implementation of JDBC use 'addslashes()' to bind variables in a prepared statement? Or is this a higher level function? (I have not come across it myself, but like I said I'm still pretty new to Postgres).
I guess I see "affects PostgreSQL servers exposed to untrusted input, such as input coming from Web forms" and wonder if they're talking about some further functionality where postgres acts like a web server. My understanding of PreparedStatements is that they are bound at a very low level in the db to allow for maximum speed through caching etc...
PostgreSQL defaults to SQL-ASCII encoding, which is unaffected by this particular attack. Only clients which connect using a multibyte encoding would be affected.
Actually, this really isn't a vulnerability in the database server itself -- the update just intentionally breaks certain badly written applications in order to protect them from themselves. If PHP's addslashes() ends up creating valid multibyte characters that produce unexpected behavior, that's really PHP's problem -- Postgres is just doing what it's told.
Multi-Layered validation is the only way to go.
Client validation is only useful for round-trip bandwidth reduction, it's nice to have, but not secure in any way. It can stop the occasional accidental bad input. (e.g. entering strings when numerical data is called for, pop up a message box telling you not to do that), it won't stop anyone really interested in corrupting your data.
The app server should be validating everything being posted to it. Is this string too long, too short, not a string, wrong encoding, etc...
The DB server should ALSO be validating everything coming from the app server. Don't trust your application server, it could have a bug, it could have been hacked, it might not be your app server, who knows. Strict stored procedures with no r/w access to tables is a really the only way to go. (To: My Co-Workers, Using select * queries and running as dbo and/or sa is usually a sign that you're not doing it right)
Yes, it's paranoid thinking, yes, it's more work and yes, there is a slight performance hit, but it is secure and it's damn hard to break.
Pure BS. There is no way in hell postgres can come close to 35k inserts/s. To get 35k inserts/s you would need a high end cluster system ... running something other than postgres.
/. mod parent UP!!!
Anyway this is
PostgreSQL ignored invalid UTF-8 sequences, meaning a ' character at the end of a incomplete sequence could cause only one ' to be seen by the parser when escaped.
See http://www.postgresql.org/docs/techdocs.50 for the details.
dtach - A tiny program that emulates the detach feat
... because counting out 500 question marks to figure out why the hell your parameters don't match up is MUCH more fun than being paged at 3AM because the entire production database was wiped out.
If I have been able to see further than others, it is because I bought a pair of binoculars.
Here's a good example of a security flaw: people who extract the database to a flat file and leaves it their hard drive. 26 million veterans can't be wrong. No, seriously, a 10 minute seminar on user permissions should be required of anyone running a DB server. Like a driver's license.
i got ball this is my adress 108 20 37 av corona come n do it iam give u the sidekick so I can hit you wit it
You can always use an 8-bit database charset and then use nchar or nclob columns when you want UTF-32 or UTF-16 support. So then your web-app or whatever has to be consciencious where non-ASCII is allowed so it gets converted/stored properly. Then it's only wasteful if the majority of the database content is localized/user submitted latin text. But uh, the case they were going on about concerned far-east locales where multi-byte is a must so I don't think you'd have much issue using a fixed-width encoding if that's your audience.
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
Pure BS. There is no way in hell postgres can come close to 35k inserts/s. To get 35k inserts/s you would need a high end cluster system ... running something other than postgres.
Well, I was thinking that it might be possible if you bunched enough of them into the same transaction, had sufficiently small updates, used a single prepared statement over and over, turned fsync off, and had HUGE amounts of memory, then, perhaps?
Not that I would want to run 35k/s inserts with fsync off...
LedgerSMB: Open source Accounting/ERP
> Pure BS. There is no way in hell postgres can come close to 35k inserts/s. To get 35k inserts/s you would ... running something other than postgres.
> need a high end cluster system
Not necessarily - if it supports a high-performance non-logging load utility, and the poster was talking rows and not actual sql statements.
I commonly hit 100,000 rows/second with DB2 on a simple four-way SMP with its load utility. That's with a seriously slow table type too, I could probably hit 2-4x that speed on a simple table.
Perhaps:
1- he's using some kind of loader
2- he's running a netezza server
Does mysql have a loader?
What's SQL?
Well, I was thinking that it might be possible if you bunched enough of them into the same transaction, had sufficiently small updates, used a single prepared statement over and over, turned fsync off, and had HUGE amounts of memory, then, perhaps?
./81/bin/psql test
You don't need to do all that.
The command:
time ruby -e 'puts "BEGIN;"; for i in 1..35000 do puts "INSERT INTO a VALUES(#{i});" end; puts "COMMIT;"' |
Gives me about 7-8 seconds with write caching off, fsync on. Yes, I turned off write caching with hdparm. This is on my PC, with an el-cheapo IDE drive and a modest 1GB of RAM.
If you think about it, why should it take a long time? It's all one transaction. It's basically just writing the data to a file (the log). The table "a" is just an int field, obviously. And with postgresql's MVCC, an INSERT is very cheap.
Social scientists are inspired by theories; scientists are humbled by facts.
'He also notes that the addslashes function was deprecated in PHP 4.0 due to security risks, but a "distressing" number of PHP applications continue to use the function.'
How come the php documentation doesn't mention this?
Dude, you're most probably using the wrong database. Financial data or billing info (or otherwise telecomm industry?) I presume.
Consider kdb+ from Kx systems. Properly used, it can do ~1mil rows/second, with multiple live hot failovers. It's pricey (several hundreds of K$), especially compared to MySQL/PGsql, but might be worth your buck.
There is a way to solve SQL injection problems: Disallow text literals. Or even, disallow literals (including numbers) at all. This could be a setting in the database that is on by default, and only off for certain applications (ad hoc query tools) or users (admins). What do you think about that?
I'm thinking about implementing this feature in the database I write (http://www.h2database.com/):
This would be a persistent setting, and only an admin can change it. But, maybe this is the wrong place to ask for comments on this?(Of course there are other security risks, like using 'customer id' in URL or hidden fields in a web application. Or relying on Javascript data validation. But I don't know what to do about those problems.)
Hi,
PDO and PEAR::DB both provide ways of doing this under PHP.
See http://pear.php.net/ and http://www.php.net/pdo for examples.
David.
The Ginger Dog
You're wrong. My company has a huge legacy FoxPro database, but for performance reasons we copy most of its tables to PostgreSQL as an hourly cron job. One of those tables is just over 7 million rows and I can copy it into PostgreSQL is slightly less than 5 minutes. 7071473 rows in 287 seconds yields 24639 inserts/s, and this is on a production server answering other queries at the same time.
PostgreSQL 8.1.3, FreeBSD 6.1-STABLE, single Xeon, 3GB of RAM, one SCSI-320 drive, very little performance tuning. If we can hit 25K on that box, I have no doubt he can sustain 35K on beefier hardware.
Dewey, what part of this looks like authorities should be involved?
Here's the keys to it:
Increase src/include/pg_config_manual.h BLCKSZ to the max (32768)
Use COPY not INSERT (faster than transactions for the inserts too)
Postgres isn't threaded so even though it's a dual proc dual core 252, I am still cpu bound. I really need some sort of lazy index/index chunker that can run in parallel to utilize the multiple processors more efficently so I can have more than a single index on the table.
> Postgres isn't threaded so even though it's a dual proc dual core 252, I am still cpu bound. I really
;-)
> need some sort of lazy index/index chunker that can run in parallel to utilize the multiple processors more
> efficently so I can have more than a single index on the table.
Actually, if this is a critical part of your database, you're probably using the wrong product: postgresql is a great product, but this is definitely its weak spot.
For something like this you're far better off with db2 or oracle. My preference is db2 since it lacks a larry ellison and can be far cheaper than oracle. What you'd get with db2 for example is:
- query parallelism: work split across all cpus
- load parallelism: loads split across all cpus
- partitioning: rather than using btree indexes that won't work when you're selecting more than 5-15% of the data, partitioning works linearly when you're selecting 2%, 20%, or 80% of the data.
- query optimization: better query planning means your queries need less tweaking to run the way they should
- fine-grained memory control: to allow you to get the most out of memory, which is very valuable on databases like this. Sort memory, tablespace caches, etc
- automatic summarization: db2 can automatically update summary tables for you
- query rewrite: db2 can automatically rewrite incoming queries to hit summary tables when it makes sense
DB2 Express is free for small systems. I think the limit is 2 cpus & 4 gbytes of memory, with no limit on data size. Beyond those limits you can get workstation edition which is (going from memory here) $1500/CPU list price plus $250/named user. If you want it on the net then it's $7500/CPU for unlimited users. That's list, and you can probably get 50% off.
The only disadvantage with db2 in this configuration is that you'd want to use MDC for partitioning, which is slower to load than a regular table. You'd probably hit just 25-35k rows/sec with mdc. Right now DB2 viper is in beta, that allow you to use MDC, range partitioning, or hash partitioning. All in combination. This means that if you really wanted to scale up you could spread the data across a dozen two-way boxes with each one partitioning its local data. This isn't cheap to do (licensing db2 this way is expensive), but if you wanted to get adhoc queries against a 100 million row table returning results in 1-4 seconds - this is the way to do it.
Oracle is similar, just 2x as expensive, more knobs to twiddle, better skills & third-party availability, and highly unpleasant sales staff to deal with.
Of course, Netezza might also hit the spot using postgresql. It wouldn't have all the benefits of db2 or oracle, but has some. And not sure if they have a lot cost entry-level product either.
In the Python DB-API, SQL strings look like:
You create a dictionary (hash table) with a key "baz", pass that dict to the database along with your query, and it fills in the blanks. Your job as the programmer is to make sure that dict has all the keys in it to complete the query; it doesn't matter which order you assign them or if you don't use them all.
In fact, a very common case is to create on dict with all the values you'll need to execute a whole list of queries, and just keep passing the same dict rather than redoing it each time:
It's about as easy as you can possibly make it and has no disadvantages that I've ever encountered. So, I'd take the position that it's better to protect the server and forget about old ideas like positional parameters. There are extremely programmer-friendly solutions to this problem if you know where to look.
Dewey, what part of this looks like authorities should be involved?
This post of mine is troll/flamebait-ish, yes, but still honestly meant:
Now look at what you've done. Confusing the MySQL fanbois with performance figures for PostgreSQL not even closely resembling their world-view. How will this end...? The horror!
Or do: ...
insert into t (
select d1
union all
select d2
uncion all
select d3
)
so you insert lots of data in one statement!
> If you think about it, why should it take a long time? It's all one transaction. It's basically just
> writing the data to a file (the log). The table "a" is just an int field, obviously. And with
> postgresql's MVCC, an INSERT is very cheap.
So, you hit 5,000 rows / second with a 4 byte row. That's considerably slower than 35,000 rows / second with real-world data. Granted - you did this with no performance tuning on a desktop.
But, in a real-world scenario you're more likely to have:
- twenty columns
- a row-width of 50-200 bytes
- a primary key providing a unique constraint
- possibly several indexes
- possibly several check constraints
and would probably be lucky to hit 500 rows / second.
Imagine if I started throwing out numbers for selects with bitmap scans - their little heads would explode. :-)
Seriously, "PostgreSQL is slow" is about as current as "Slowaris" and the "17 minutes to copy a file" Mac troll. None of them have been true for years, if they ever were.
Dewey, what part of this looks like authorities should be involved?
Probably he is talking about something like MySQL's LOAD DATA INFILE. In MySQL, inserting data into a table directly from a delimited text file, without using SQL statements, is just as much fast, if not faster.
$query = $pdoInstance->prepare('INSERT INTO myTable VALUES ( :foo, :bar:, :val );');
$query->bindParam(':foo', $foo);
$query->bindParam(':bar', $bar);
$query->bindParam(':val', $val);
$query->execute();
I don't see how that could be any less clear.
Wow, I need to drink my coffee. More clear. More clear... To think I even previewed and reread it.
By allowing back-slashes in strings, it opened
itself up to this sort of vulnerability. I am
waiting for the back-slash mechanism to go away
completely. The only way to escape a single
quote should be to double-it-up. 'Peter O''Brien'
You still need to speed that up by about seven times to get 35k/s inserts. This is why I suggested turning fsync off would be the only way to make this happen.
However, this would be limited to bulk imports of data that didn't have constraints. For example, bulk imports of numbers from experimental measurements into a temporary table so that they could later (at one's convenience) be loaded into proper tables.
Yes, I can imagine real-world applications for such rapid small inserts, but in general, they are few and far between.
LedgerSMB: Open source Accounting/ERP
You have to love Websense and their dedication in making sure that sites actually match the categories their products list them in.
Access to this web page is restricted at this time.
Reason: The Websense category "Peer-to-Peer File Sharing" is filtered.
URL: http://www.postgresql.org/download/
Right now DB2 viper is in beta, that allow you to use MDC, range partitioning, or hash partitioning. All in combination. This means that if you really wanted to scale up you could spread the data across a dozen two-way boxes with each one partitioning its local data. This isn't cheap to do (licensing db2 this way is expensive), but if you wanted to get adhoc queries against a 100 million row table returning results in 1-4 seconds - this is the way to do it. ;-)
Greenplum is doing something similar using PostgreSQL as a base:
http://www.greenplum.com/products/bizgresMpp.php
You still need to speed that up by about seven times to get 35k/s inserts.
I was trying to show that it's not out of the question. I don't have particularly good I/O on my machine, and I have a slow processor. The disk does not need to be written until COMMIT time, and it can be written sequentially (thanks to the write-ahead log). If there are several processors and a connection per processor, I could imagine that this type of benchmark could approach the write speed of the disk.
INSERTs aren't expensive in PostgreSQL. They can be if there are a lot of constraints and indexes, but in general they aren't. That guy could have been telling the truth. The poster who accused him of BS probably assumed that he meant 35k transactions/sec. 35k transactions/sec sounds unrealistic to me, since PostgreSQL can only COMMIT once per disk revolution (assuming "normal" non-RAID hardware).
Social scientists are inspired by theories; scientists are humbled by facts.
I thought the WAL was written before the tuple. THis would mean that the disk would receive a new write instruction for every insert, and if there was any delay in processing it, you would have to wait for another revolution. Thus it would be worth seeing how enabling write caching would affect your performance here. That might tell you if a battery-backed cache might be good enough to obtain these sorts of numbers.
LedgerSMB: Open source Accounting/ERP
I thought the WAL was written before the tuple.
The way I understand it, the tuple is written first at the end of the file containing that portion of the table in question. But it is written asynchronously, meaning that it may stay in the OS Buffer Cache.
When the COMMIT comes, PostgreSQL writes sequentially to the WAL, but this time it writes synchronously.
That way, if there is a power failure, the data pages in the buffer cache holding the tuples may be lost, but the WAL contains enough information to reconstruct any committed transaction.
THis would mean that the disk would receive a new write instruction for every insert
No, often INSERTs will stay in the OS buffer cache until COMMIT time.
Social scientists are inspired by theories; scientists are humbled by facts.
You are correct-- the WAL entries are flushed at the end of the transaction.
LedgerSMB: Open source Accounting/ERP
One of our production systems averages around 500/s, with peaks at over 3000/s, with roughly that sort of table structure... it's a dual Opteron, H/W RAID10 across two channels, 6 disks and 4GB of RAM and a separate SATA WAL drive. Thing is, it's not just an insert, it's a call to a stored procedure that does some data hygiene, then a select and then either an insert or update depending on the result of the select. Very fast, very happy!