PostgreSQL 8.2 Released

positined? by chris_mahan · 2006-12-05 12:15 · Score: 0

I should know better than to expect correctness in AS, but come on...

--

"Piter, too, is dead."

bitmap? by cxreg · 2006-12-05 12:18 · Score: 0, Troll

Sure would be nice to get bitmap indexes one of these days

--

God Fucking Damnit

Re:bitmap? by jrockway · 2006-12-05 12:45 · Score: 1, Funny

Sure would be nice if you sent in a patch.

--
My other car is first.
Re:bitmap? by nconway · 2006-12-05 12:53 · Score: 4, Informative

Bitmap indexes will almost definitely be in 8.3. Gavin Sherry submitted a revised patch for them a few days ago.
Re:bitmap? by Anonymous Coward · 2006-12-05 14:08 · Score: 0

wow that reply is so interesting and original! no doubt your major contributions to postgres would amaze and astound.
Re:bitmap? by jadavis · 2006-12-05 14:19 · Score: 1

PostgreSQL does use bitmap indexes, just not on-disk (as nconway said, should be in 8.3).

PostgreSQL currently uses bitmap scans to combine indexes (which means fewer multi-column indexes are necessary), and also to reorder the results of an indexscan in disk block order so that it can get blocks in disk order with better cache behavior.
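As a rough illustration of the idea (a toy model, not PostgreSQL's actual data structure), two index lookups can each be turned into a bitmap of heap page numbers, combined with AND or OR, and then read back in ascending page order. The page numbers and the names `to_bitmap` and `pages_in_disk_order` are made up for this sketch.

```python
def to_bitmap(page_numbers):
    """Set one bit per heap page that holds at least one matching row."""
    bitmap = 0
    for page in page_numbers:
        bitmap |= 1 << page
    return bitmap


def pages_in_disk_order(bitmap):
    """List the set bits from the lowest page number to the highest."""
    return [page for page in range(bitmap.bit_length()) if (bitmap >> page) & 1]


# Hypothetical results of two separate single-column index lookups,
# returned in index order rather than disk order.
pages_matching_city = [7, 2, 9, 4]
pages_matching_year = [9, 1, 2]

# "city = ... AND year = ..." only needs pages present in both bitmaps.
both = to_bitmap(pages_matching_city) & to_bitmap(pages_matching_year)
# "city = ... OR year = ..." needs pages present in either bitmap.
either = to_bitmap(pages_matching_city) | to_bitmap(pages_matching_year)

print(pages_in_disk_order(both))
print(pages_in_disk_order(either))
```

Because the bitmap is walked from the lowest bit upward, the pages come out sorted regardless of the order in which the index produced them, which is the "disk block order" effect described above.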

--
Social scientists are inspired by theories; scientists are humbled by facts.
Re:bitmap? by Anonymous Coward · 2006-12-06 03:16 · Score: 0

> wow that reply is so interesting and original!

Is it original to whine about as-yet-unavailable options in an open source package? So why must originality belong to your parent post and not its parent?

also "PostgreSQL it is still missing" by larry+bagina · 2006-12-05 12:43 · Score: 0, Flamebait

PostgreSQL it is still missing.

--
Do you even lift?

These aren't the 'roids you're looking for.

Watch out, MySQL. by Anonymous Coward · 2006-12-05 12:45 · Score: 5, Interesting

MySQL has been the dominant SQL server within the open source community. Between its non-standard SQL and its lack of advanced features, many developers and DBAs are getting fed up. Thankfully, they've been able to turn to PostgreSQL.

At my firm, we switched some of our MySQL Enterprise databases over to PostgreSQL 8.1. What we found was pretty amazing: PostgreSQL outperformed MySQL by approximately 23% in terms of the number of queries it could handle per second. And this was with a very basic level of tuning! Our MySQL installations, on the other hand, had been tuned by three different consultants. Keep in mind that both were running on exactly the same system, under the same installation of FreeBSD. We're not sure exactly why there was such a remarkable increase in performance when using PostgreSQL, even without much tuning, but we're happy with it nonetheless. We're also happy to no longer be paying MySQL for support.

We're actually quite happy to get away from MySQL. The other developers I work with were quite sickened by the deal MySQL AB reached with SCO a while back. While we're strictly a BSD shop, we still think SCO's actions are quite distasteful, and we are willing to move away from companies that enter into deals with them.

Re:Watch out, MySQL. by Local+Loop · 2006-12-05 13:44 · Score: 3, Insightful

It's because MySQL runs like dogmeat on FreeBSD, no matter which threading libraries you use. I know, I just switched from FreeBSD to Linux for our database servers. The performance difference was astounding - approximately 60% gain just from switching to Linux.

For us, PostgreSQL is a lot slower than MySQL on the same hardware. But our workload is not typical by any stretch so YMMV.

Try comparing PostgreSQL and MySQL, both running on Linux, and I think you'll be surprised.
Re:Watch out, MySQL. by kestasjk · 2006-12-05 14:20 · Score: 0

The thing PostgreSQL needs is a phpMyAdmin, it has something similar but it doesn't come close. phpMyAdmin makes MySQL accessible to everyone, and I think if an OSS DB is going to be widely used it needs a good admin CP which doesn't require the user to be fluent in SQL.

--
// MD_Update(&m,buf,j);
Re:Watch out, MySQL. by jadavis · 2006-12-05 14:21 · Score: 1

A more detailed report, even if anonymous, would be helpful. Can you post your findings on the web, such as workload, hardware, etc?

--
Social scientists are inspired by theories; scientists are humbled by facts.
Re:Watch out, MySQL. by jadavis · 2006-12-05 14:33 · Score: 1

It's because MySQL runs like dogmeat on FreeBSD, no matter which threading libraries you use.

Well, PostgreSQL launches a process per connection, so I don't see how that could explain the difference. Or are you saying that threading is slower than using processes?

Why are you so sure it's the threading, when he gave no details? If he had consultants coming in, most likely he would have a connection pool if that would have helped. You appear to have latched onto this explanation because MySQL must always be faster, and if it's not, it must be the OS's fault, right?

Maybe he just had a lot of concurrent connections, which is one of many areas where PostgreSQL can show a major improvement over MySQL.

http://tweakers.net/reviews/657/6

There are a bunch more with similar results at tweakers.net. It could also be the PostgreSQL planner, which has had major improvements recently. Or, it could be one of the myriad other amazing things about PostgreSQL (which are often written off as "unnecessary features").

--
Social scientists are inspired by theories; scientists are humbled by facts.
Re:Watch out, MySQL. by Bluesman · 2006-12-05 14:49 · Score: 2, Informative

Do you mean like this?

Having used both, I can tell you phppgadmin is a bit more polished than phpmyadmin. Neither is a particularly wonderful way to interact with a database, but if you're stuck on a no-console web host, I'd much prefer to have the postgres/phppgadmin combo.

--
If moderation could change anything, it would be illegal.
Re:Watch out, MySQL. by thebeatgoes.com · 2006-12-05 14:50 · Score: 4, Informative

You mean like this or
webmin anyone? or
this if you want a non-web version
Re:Watch out, MySQL. by Anonymous Coward · 2006-12-05 14:53 · Score: 1, Informative

http://pgadmin.org/ is the most used open source tool.
Re:Watch out, MySQL. by greg1104 · 2006-12-05 15:04 · Score: 1

At my firm, we switched some of our MySQL Enterprise databases over to PostgreSQL 8.1.

Every time I see the words "MySQL" and "Enterprise" next to one another, it really gives me a good laugh. Why, it's almost as ridiculous as suggesting that SQL:2003 Window Functions are critical for business reporting.
Re:Watch out, MySQL. by Tweekster · 2006-12-05 16:19 · Score: 1

No I think he meant a quality version, last time i checked phppgadmin seriously lacked.

--
The phrase "more better" is acceptable English. suck it grammar Nazis
Re:Watch out, MySQL. by Bacon+Bits · 2006-12-05 18:55 · Score: 3, Interesting

IMX, since about 7.3-7.4 PostgreSQL runs just as fast as MySQL under any significant load. It simply scales a lot better than MySQL seems to.

I will say that if you've just recently switched to PostgreSQL, you should be sure to read the documentation on configuring the server. While the default installation of MySQL is to use as many resources as necessary, PostgreSQL's default install is extremely conservative. By default it only allocates 1 MB (yes, one megabyte) for working memory. If you've got more than 32 MB of RAM, you're probably going to need to edit some config files to see any reasonable performance. Try running a VACUUM VERBOSE to determine how many pages or entries you need in your FSM. That's something that needs to be reconfigured on a production system after it's been in place for some time. If you do strange things like mass DELETEs or TRUNCATE TABLE, you'll also need to VACUUM more often.

The .org root DNS servers run on PostgreSQL, so it's not a problem with the RDBMS itself. PostgreSQL has been repeatedly criticized for being so conservative with the default installation settings. I think they should have some configuration tools (in the Windows installer especially) that help you choose somewhat saner configuration settings.

The typical response from PostgreSQL devs on the subject is "yeah, if we turned off fsync on our DB it'd run real fast, too". This is partially why PostgreSQL seems to run slower than MySQL on databases that have lots of INSERT and DELETE queries.

I no longer see any reason to ever use MySQL. It's more popular, but I find PostgreSQL, Firebird, and SQLite cover the range of needs so much better. MySQL is great to learn on, but, well, it's just annoying once you really understand the first things about relational databases.

--
The road to tyranny has always been paved with claims of necessity.
Re:Watch out, MySQL. by sxpert · 2006-12-05 18:58 · Score: 1

you can also connect to your postgres remotely and even use ssl encryption from your home machine with psql...
Re:Watch out, MySQL. by nconway · 2006-12-06 07:45 · Score: 2, Interesting
Some work has gone into improving Postgres' default configuration in recent releases. With 8.2 on my machine, for example, the default configuration allocates:
- 1MB of work_mem. This is a reasonable figure: because work_mem can be allocated once for every sort operation, each backend can allocate several times work_mem concurrently, so setting it much higher than 1MB could actually consume more memory than would be desirable out of the box, IMHO.
- 24MB of shared_buffers (maybe less if initdb can't allocate sufficient SysV shared memory). This is on the low side for a modern server, but still not too bad, considering that many people using PG leave most page caching up to the kernel.
- 128MB for effective_cache_size. Again, maybe not perfect (this box has 2GB of RAM), but probably needs DBA configuration to be at all accurate anyway.
There's probably room for improvement, but the above behavior is definitely an improvement, IMHO.
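To make the work_mem reasoning above concrete, here is a back-of-the-envelope sketch. The backend count and the number of concurrent sorts per query are assumptions picked purely for illustration, not measurements, and the function name is hypothetical.

```python
def worst_case_sort_memory_mb(work_mem_mb, backends, sorts_per_query):
    """Upper bound on sort memory if every backend runs a query whose
    sort operations all use their full work_mem at the same time."""
    return work_mem_mb * backends * sorts_per_query


# Assumed workload: 100 connected backends, each running a query with 3 sorts.
for work_mem_mb in (1, 16):
    total = worst_case_sort_memory_mb(work_mem_mb, backends=100, sorts_per_query=3)
    print(f"work_mem = {work_mem_mb} MB -> up to {total} MB for sorts")
```

Under these assumed numbers a 1 MB setting is bounded at 300 MB, while 16 MB could in principle reach 4800 MB, more than the 2 GB of RAM on the machine mentioned above. That is the sense in which a low out-of-the-box default is defensible.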
Re:Watch out, MySQL. by larry+bagina · 2006-12-06 11:38 · Score: 1

Have you seen PGAdmin III? It's wxwidgets based, but if your database is firewalled, you can use ssh tunnelling to access it. Personally, I find it easier to use than phpMyAdmin (I'm not a big fan of web-based apps).

--
Do you even lift?
These aren't the 'roids you're looking for.
Re:Watch out, MySQL. by aevans · 2006-12-07 07:14 · Score: 0

Don't the .com root servers run on a flat file? Flat files must be better than PostgreSQL

Re:Real Men don't use Window Functions by larry+bagina · 2006-12-05 12:46 · Score: 4, Funny

According to the MySQL fanbois, Window Functions are bad for performance and not even useful. Just like subselects, data integrity, triggers, and transactions. Oh wait, MySQL 5 supports subselects. Subselects are no longer bad for performance.

--
Do you even lift?

These aren't the 'roids you're looking for.

Re:Real Men don't use Window Functions by TranscendentalAnarch · 2006-12-05 13:04 · Score: 1

Armchair moderating ftw.

Better than MySQL 3.23? by Anonymous Coward · 2006-12-05 13:05 · Score: 0

So how does this release compare with MySQL 3.23? Because my webhost is still using it, and I need to be able to argue that PostgreSQL 8.2 is infinitely better than MySQL 3.23 for them to provide this also.

Re:Better than MySQL 3.23? by deepestblue · 2006-12-05 13:08 · Score: 1

Another route (that I took) is to switch to a host that provides Postgres support. I can recommend csoft.net
Re:Better than MySQL 3.23? by jadavis · 2006-12-05 14:55 · Score: 1

Or get shell access and install it yourself. It works nicely without any special privileges.

--
Social scientists are inspired by theories; scientists are humbled by facts.

Re:Real Men don't use Window Functions by theelectron · 2006-12-05 13:08 · Score: 1

Correct me if I am wrong, but wasn't MySQL 5.x supposed to include transactions and triggers among other things? I'll be the first to admit that I don't really keep up on the Postgre/MySQL battle, but you might want to keep up on current technology if you are going to make an inflammatory post like that.

Performance? by Ant+P. · 2006-12-05 13:09 · Score: 2, Insightful

How fast is it against MyISAM? (MySQL's main selling point for a lot of people)

Re:Performance? by Anonymous Coward · 2006-12-05 13:31 · Score: 0

Sure, turn fsync off.
Re:Performance? by El+Cubano · 2006-12-05 13:33 · Score: 2, Interesting

How fast is it against MyISAM?
I can't remember where I heard it or who said it, but I once heard someone say words about MySQL to the effect of "if you ignore all the things that make a real database a database, you can make it really fast." Now, I get that lots of web hosts use MySQL and that it is the dominant free database out there. However, there is lots of insight in that statement. Now, in 99% of the cases where MySQL is used, it probably works great with few hitches. However, I'd rather trust my data to something that values data integrity over speed.
Recall that not too long ago, right here on slashdot we all got to see first hand what happens when MySQL craps out. All the threading was gone. I mean seriously, what sort of database accepts invalid data and then silently truncates it and moves on? Again, I don't think that the number of people with MySQL tables with 16,000,000+ rows is very large, but it is still disturbing.
If you are going for something small and light and fast and you are not too concerned about standards, then MySQL is great. Note, I am not trying to troll, I am simply pointing out that for all the people who endlessly bash on one or the other DB, that there is a market space for each.
Re:Performance? by phoenix.bam! · 2006-12-05 13:44 · Score: 3, Informative

Not only does mysql silently truncate (and I just tested this on mysql 5), if you insert 2006-2-30 into the date field, it just completes the insert and makes the date 0000-00-00. Go Go Data integrity!
Re:Performance? by Anonymous Coward · 2006-12-05 13:46 · Score: 2, Informative

In defense of MySql 5.x you can actually toggle a setting to make it reject invalid data instead of silently mangling it and continuing as if nothing had happened. However, shipping with that setting disabled doesn't do much to improve MySql's data integrity reputation.
Re:Performance? by hey! · 2006-12-05 14:02 · Score: 3, Insightful

You have to be careful when you ask a question like that.

What's faster, a Ferrari or a semi-trailer truck? If you are transporting a bunch of bananas, the Ferrari. If you are transporting 50,000 pounds of bananas, the semi wins.

In other words, the problem with your question is there is no single thing that is "speed". There's only speed to do a certain class of tasks.

--
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
Re:Performance? by jadavis · 2006-12-05 14:38 · Score: 3, Insightful

However, shipping with that setting disabled doesn't do much to improve MySql's data integrity reputation.

Not only that, one of the major selling points of MySQL is that it has many applications. If you deviate from the standard configuration, many of those apps will break. That's one of the problems with the "configureware" mentality, just like in PHP, except that MySQL is lower on the stack so it's worse.

--
Social scientists are inspired by theories; scientists are humbled by facts.
Re:Performance? by LurkerXXX · 2006-12-05 15:05 · Score: 2, Insightful

That's a defense?

'Real' databases don't have a setting for 'screw data integrity'. Data integrity is kind of one of the central points of a relational database.

It just shows it's background as a toy, not a real database.
Re:Performance? by greg1104 · 2006-12-05 15:08 · Score: 5, Funny

How fast is it against MyISAM?

I've managed to get my PostgreSQL installation tuned to very high speeds simply by switching the database disk over to /dev/null. It runs fast as hell, and the data integrity is basically the same as MyISAM.
Re:Performance? by jadavis · 2006-12-05 15:09 · Score: 1

How fast is it against MyISAM? (MySQL's main selling point for a lot of people)

Well, you should probably consider the planner too. After all, if it's using a dumb plan, or if it is lacking a "feature" that allows it to choose an efficient plan, even a "slow" database will be faster. Remember, optimizing the algorithm is usually much more important to performance than reducing the parsing time of a query.

Example: You need to go 15 places all over town today. Is it faster to take a Ferrari and visit in a random order, or to plan the route to travel a shorter total distance and avoid traffic?

And you should probably consider a million other things, as well, but I don't think they'd fit in a /. post.

--
Social scientists are inspired by theories; scientists are humbled by facts.
Re:Performance? by MadAhab · 2006-12-05 15:33 · Score: 3, Insightful

Should be modded up.

Now for the MySQL fanboi's, I do have to ask: why not use SQLite for the same purpose? Either you need a dumb data store or you need a Real Database. If you need a dumb data store, why not go for the one that does the best job of being a minimal data store - and use SQLite? If you need Real Database features (and I do), MySQL just hasn't caught up to PostgreSQL, and is even losing ground, after all this time.

The hole in what I'm saying, of course, is replication. PostgreSQL 8.2 looks like it's making progress in this respect. I haven't played around with warm stand-by's, but I'm sure someday I'll need it. When I do, log shipping looks like it will do nicely!

--
Expanding a vast wasteland since 1996.
Re:Performance? by innosent · 2006-12-05 16:59 · Score: 1

Again, I don't think that the number of people with MySQL tables with 16,000,000+ rows is very large, but it is still disturbing.

Are you kidding me? At my last job, we had five tables (audit records and various archives) larger than 16,000,000 rows. PostgreSQL 8.1 worked fine.

--
--That's the point of being root, you can do anything you want, even if it's stupid.
Re:Performance? by jadavis · 2006-12-05 18:09 · Score: 1

Right. These days you can easily run databases with tens of millions of records on consumer hardware.

--
Social scientists are inspired by theories; scientists are humbled by facts.
Re:Performance? by rg3 · 2006-12-05 22:23 · Score: 1

Forgive me if I'm wrong, but I heard that SQLite becomes very slow if the database grows too much. This comes from Amarok users. Amarok may store its song database (and playlists?) using SQLite, and people with a big song database have reported it can become very slow, and the issue is apparently solved if you run MySQL and tell Amarok to use the MySQL backend instead of SQLite. So, basically, you may need a dumb data store as you say, but being dumb doesn't exclude being big, and with big databases SQLite is not appropriate.
Re:Performance? by tacocat · 2006-12-06 00:34 · Score: 1
I've been playing with Postgres 8.1 and doing some reading online, and I think the differences between MySQL and Postgres come down to this:
- They are both ACID compliant.
- Postgres offers more features than MySQL.
- In addition to this, there are a variety of curves that are fairly consistent over different versions, hardware, and database sizes... If you have few users then MySQL does well. As you increase the number of users MySQL will start to decline severely. Postgres does not do this. It's flat or nearly flat.

Postgres has a tool called pgbench available that I was using for some testing. I have a Pentium II machine that I wanted to see if it was worth running as a database. I put in a pair of disks as a RAID0 configuration and started testing. Here is what I found.

On a small database (5 million rows?):
- 10 users run about 120 transactions per second.
- 100 users run about 85 transactions per second.

On a large database (~15GB, maybe 250 million rows):
- 10 users run about 115 transactions per second.
- 100 users run about 80 transactions per second.

Which for me says that the database performance over size and user concurrency is fairly flat. I didn't benchmark this against MySQL as I'm not interested in doing that. But it's valuable to me to know that this database will perform consistently regardless of the number of users (slashdot) or the eventual size of the database.

Too often I see people get a database application (MS Access anyone) that works great the first day. But as soon as they get a million rows in a table it starts to flag. The last thing I want to do is put in a database that becomes my worst nightmare in six months.

On the "strictly my opinion" side of things I think postgres is absolutely better and am glad to see this new release. I think they have advantages because they have only one database engine to work with. I read that to mean they have lower risk of bugs. I love the ssl support and use it religiously. They have always focused on being a database before focusing on being popular in pop-culture. And the eventual licensing of MySQL will make them unfavorable in time. I already found it very difficult to find any documentation on their website because of the marketing influence.

But, I'm not going to tell you to use Postgres or that MySQL sucks. It's probably great. Use whatever the hell you want.
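For anyone wanting to put numbers on "fairly flat", the pgbench figures quoted above can be turned into relative changes. This is only arithmetic on the four approximate values from the post. The dictionary layout and the `percent_change` helper are illustrative.

```python
# Approximate transactions per second quoted in the post:
# (database size, concurrent users) -> tps
reported_tps = {
    ("small", 10): 120,
    ("small", 100): 85,
    ("large", 10): 115,
    ("large", 100): 80,
}


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Effect of going from 10 to 100 users at a fixed database size.
for size in ("small", "large"):
    change = percent_change(reported_tps[(size, 10)], reported_tps[(size, 100)])
    print(f"{size} database, 10 -> 100 users: {change:+.1f}%")

# Effect of going from the small to the large database at a fixed user count.
for users in (10, 100):
    change = percent_change(reported_tps[("small", users)], reported_tps[("large", users)])
    print(f"{users} users, small -> large database: {change:+.1f}%")
```

On these figures, throughput barely moves with database size (a few percent), while a tenfold increase in users costs roughly 30%.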
Re:Performance? by Khazunga · 2006-12-06 01:34 · Score: 1

I put in a pair of disks as a RAID0 configuration and started testing.
On a production setup, you may wish to use either RAID1 to speed up reads, or else leave the disks separate. Then, you can move tables and indexes between the two disks so that their load is balanced. With RAID0 you may end up having one disk as the performance bottleneck with the other sitting idle.

--
If at first you don't succeed, skydiving is not for you
Re:Performance? by ckaminski · 2006-12-06 03:12 · Score: 1

Why the hell hasn't ODBC caught on in the Linux world? Really? Different libraries for database access, hardcoded? Ugh.

Granted, ODBC isn't exactly a panacea, but it's definitely better than dlopen(mysqlclient); or dlopen(pgclient);
Re:Performance? by ofgencow · 2006-12-06 04:23 · Score: 1

SQLite is actually very fast and capable of handling large amounts of data, but you do have to be careful to optimize your queries. SQLite doesn't do much query optimization for you, and that can make a huge difference in execution. Fortunately, it's not hard to optimize SQLite queries. The pitfalls are documented (if you bother to RTFM), and tools exist to help find the dogs.

I personally built a custom ELT solution using SQLite, where the database grew up to 45gb in size. I also ran benchmarks using other database engines, including MySQL and PostgreSQL. SQLite was 2x faster than MySQL for my use.

If SQLite is ever anything but screaming fast, you either need to a) add appropriate indexes, or b) re-order your queries.
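A small sketch of option (a), using Python's built-in `sqlite3` module and SQLite's `EXPLAIN QUERY PLAN`. The `tracks` table, its columns, and the index name are invented for the example. The exact wording of the plan text varies between SQLite versions, so none is shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (id INTEGER PRIMARY KEY, artist TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO tracks (artist, title) VALUES (?, ?)",
    [(f"artist{i % 500}", f"title{i}") for i in range(10000)],
)


def query_plan(connection, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


lookup = "SELECT title FROM tracks WHERE artist = ?"

# Without an index on artist, SQLite has to scan the whole table.
plan_before = query_plan(conn, lookup, ("artist42",))

# With the index in place, the same query can search the index instead.
conn.execute("CREATE INDEX idx_tracks_artist ON tracks (artist)")
plan_after = query_plan(conn, lookup, ("artist42",))

print(plan_before)
print(plan_after)
```

Comparing the two plans before and after adding an index is a quick way to confirm that a slow query was doing a full table scan.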
Re:Performance? by LearnToSpell · 2006-12-06 05:23 · Score: 1

What kind of values are we using for "big?" I haven't noticed any real slowness, and I'm at 12,378 tracks. Still ripping though (now on "H"), so I'll let you know. :-)

--
Haida Manga
Re:Performance? by helifex · 2006-12-06 06:15 · Score: 1

There are many well established database abstraction layers and most applications use one or another. Of course ODBC sucks and is rarely the choice. Regardless, applications are going to be limited to those databases which support all the features required by the application. Using ODBC does not somehow magically make SQLite equivalent to PostgreSQL.
Re:Performance? by Nutria · 2006-12-06 08:54 · Score: 1

five tables (audit records and various archives) larger than 16,000,000 rows.

While I'm glad you're using Pg, 16M rows is chump change. Come back when your tables hit cardinality 100M.

--
"I don't know, therefore Aliens" Wafflebox1
Re:Performance? by innosent · 2006-12-06 16:56 · Score: 1

That was exactly my point. 16M rows is not a "large" table. Maybe 10 years ago that would be a lot for common hardware, but I would imagine that most Mid-size companies see 16M+ rows as a common thing. We only had 10 years of live data, and only about 90 employees. I'm sure there are much larger databases than ours was.

--
--That's the point of being root, you can do anything you want, even if it's stupid.
Re:Performance? by innosent · 2006-12-06 17:01 · Score: 1

The audit DB (for all application transactions [healthcare industry]) was a single Opteron 240 with a 12 drive SATA RAID and 4GB RAM. As the DB was mostly used for inserts and a few employee productivity or records research reports, the CPU load rarely topped 3%.

--
--That's the point of being root, you can do anything you want, even if it's stupid.
Re:Performance? by aevans · 2006-12-07 07:20 · Score: 0

Your raid controller for a Pentium 2 had to have been more than the price of a new computer.
Re:Performance? by aevans · 2006-12-07 08:49 · Score: 0

I can find a song in 12,378 rows on paper in a few seconds by hand. And my eyesight isn't that good. The limiting factor is how quickly I can flip pages. Give me a minute with your full database. OCR and a robotic page flipper could probably speed that up. If I stored the scans to disk, I'd be able to cut that down to less than 1 second.
Re:Performance? by Ayende+Rahien · 2006-12-08 08:38 · Score: 1

You do realize that you may spend more time calculating the shortest distance than just visiting them in random order?

--
Two witches watched two watches.
Which witch watched which watch?
Re:Performance? by jadavis · 2006-12-09 07:28 · Score: 1

I didn't say "shortest" I said "shorter". And people do it every day in a few seconds of mentally mapping out your destinations, and visiting destinations in groups.

You could say the same things about a 15-table join. The database can't determine the most efficient join order in an efficient way, but in a very short time it can figure out a more efficient join order than a random order. Hence PostgreSQL's GEnetic Query Optimizer (GEQO), which determines a better join order with a genetic algorithm.

My argument is that spending a short time planning to devise a better algorithm is often actually faster at runtime.
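A minimal sketch of the "shorter, not shortest" point, using the errands analogy. Note the substitution: this is a greedy nearest-neighbour pass, not the genetic algorithm GEQO uses, chosen only because it is the simplest cheap heuristic to show. The coordinates are randomly generated and all names are hypothetical.

```python
import math
import random


def distance(a, b):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def route_length(start, stops):
    """Total distance travelled when visiting stops in the given order."""
    total, here = 0.0, start
    for stop in stops:
        total += distance(here, stop)
        here = stop
    return total


def nearest_neighbour(start, stops):
    """Cheap planning: always go to the closest stop not yet visited."""
    remaining, here, order = list(stops), start, []
    while remaining:
        nxt = min(remaining, key=lambda stop: distance(here, stop))
        remaining.remove(nxt)
        order.append(nxt)
        here = nxt
    return order


# 15 made-up destinations scattered over a 10 x 10 town.
rng = random.Random(2006)
home = (0.0, 0.0)
errands = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(15)]

print(f"arbitrary order: {route_length(home, errands):.1f}")
print(f"planned order:   {route_length(home, nearest_neighbour(home, errands)):.1f}")
```

The planning step is quadratic in the number of stops, which is trivial next to trying all 15! orderings, and it is not guaranteed to find the best route. That is the trade-off being argued: a little planning for a usually much better, though not optimal, plan.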

--
Social scientists are inspired by theories; scientists are humbled by facts.

I think you're full of it. by Anonymous Coward · 2006-12-05 13:09 · Score: 0

I'm a fan of PG, but your post sounds like you're just trash-talking MySQL. (Don't get me wrong, MySQL is a joke for sure.) I'm also doubting the 23% increase in performance, but I haven't made any comparisons personally.

Re:I think you're full of it. by szap · 2006-12-05 13:38 · Score: 5, Informative

... I'm also doubting the 23% increase in performance...

FWIW, and YMMV, when you get hammered with many concurrent queries, it's much, much faster. At about 100 concurrent hits, about 50% faster: http://tweakers.net/reviews/657/6

Benchmark method here: http://tweakers.net/reviews/646/9

Yes, it's missing description on how exactly they set up MySQL. MyISAM? innodb? So take it with a grain of salt.
Re:I think you're full of it. by innosent · 2006-12-05 16:47 · Score: 3, Informative

actually, they used innodb, and yes, Postgres scales much better than MySQL, but MySQL is a little more streamlined for low-volume jobs.

--
--That's the point of being root, you can do anything you want, even if it's stupid.
Re:I think you're full of it. by H+Tuuri · 2006-12-05 22:12 · Score: 1

Hi!

In the upcoming MySQL-5.0.30, we have improved InnoDB's scalability under multiple concurrent threads that insert, update, or query the database as fast as they can. It would be interesting to see the Tweakers' benchmark re-run with the new version.

Best regards,

Heikki Tuuri
Innobase Oy / Oracle Corp.
Re:I think you're full of it. by jadavis · 2006-12-06 04:09 · Score: 1

What are the fundamental differences between InnoDB and PostgreSQL's MVCC?

--
Social scientists are inspired by theories; scientists are humbled by facts.
Re:I think you're full of it. by H+Tuuri · 2006-12-06 07:32 · Score: 1

There is no fundamental difference in the semantics of InnoDB's MVCC and PostgreSQL's MVCC, but there are minor differences. In PostgreSQL-8.1, the default transaction isolation level is READ COMMITTED (like in Oracle), while in InnoDB it is REPEATABLE READ (one can choose READ COMMITTED if one likes). I chose REPEATABLE READ as InnoDB's default isolation level, because then all consistent read SELECTs within one transaction see the same snapshot of the database, and give consistent results with respect to each other.
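The practical difference between the two default isolation levels can be shown with a toy model. This is an illustrative sketch only: `ToyStore` and `ToyTransaction` are made-up classes that track versions of a single value, and they reflect neither InnoDB's nor PostgreSQL's real implementation.

```python
class ToyStore:
    """Keeps every committed version of one value, tagged with a commit number."""

    def __init__(self, initial):
        self.versions = [(0, initial)]  # list of (commit_number, value)

    def commit(self, value):
        self.versions.append((self.versions[-1][0] + 1, value))

    def latest_commit(self):
        return self.versions[-1][0]

    def read_as_of(self, commit_number):
        """Newest value whose commit number is not after the snapshot."""
        for number, value in reversed(self.versions):
            if number <= commit_number:
                return value
        raise LookupError("no version visible to this snapshot")


class ToyTransaction:
    def __init__(self, store, repeatable_read):
        self.store = store
        self.repeatable_read = repeatable_read
        self.snapshot = None

    def select(self):
        # REPEATABLE READ: take the snapshot once, at the first read.
        # READ COMMITTED: take a fresh snapshot for every statement.
        if self.snapshot is None or not self.repeatable_read:
            self.snapshot = self.store.latest_commit()
        return self.store.read_as_of(self.snapshot)


store = ToyStore(100)
repeatable = ToyTransaction(store, repeatable_read=True)
committed = ToyTransaction(store, repeatable_read=False)

first_reads = (repeatable.select(), committed.select())
store.commit(200)  # another session commits a new value
second_reads = (repeatable.select(), committed.select())
print(first_reads, second_reads)
```

Both transactions read 100 at first. After the concurrent commit, the REPEATABLE READ transaction still reads 100 while the READ COMMITTED one reads 200, which is the "same snapshot within one transaction" behaviour described above.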

The implementation of MVCC is quite different in PostgreSQL and InnoDB. InnoDB keeps track of delete-marked records in indexes (the data of the row is in the clustered index), and removes them as soon as they are not needed in consistent read SELECTs. PostgreSQL does not keep track of deleted records and rows, and therefore you from time to time need to run the 'vacuum' over the entire database.

Re:Real Men don't use Window Functions by Anonymous Coward · 2006-12-05 13:30 · Score: 0

*whoosh* Parent post was trying to be sarcastic. Yes, the later versions of MySQL have most of those.

Gotta love it... by chill · 2006-12-05 13:35 · Score: 5, Insightful

PostgreSQL it is still missing the SQL:2003 Window Functions that are critical in business reporting, so Oracle and DB2 will still win out for OLAP/data warehouse applications.

Bullshit, pure and simple. This is nothing more than marketing-speak and you should be ashamed.

I'm not saying that SQL-2003 Window Functions are useless, I'm saying your statement about them being "critical" in business reporting is bullshit. Did no one do business reporting before this standard came out? What the hell did people do in 2002? Are all those MS-SQL Server 2000 and Oracle 8i servers going to fall down in shame? I think not.

I see these comments all the time, usually in marketing brochures from a software vendor touting a new feature. They make it sound like all other products are steaming piles of shit if they don't have whiz-bang-feature #16. They like avoiding any conversation that goes "But, I've been using your product and it works great. Are you telling me your product (last rev) is a steaming pile of shit? That implies if I upgrade, next year you're going to be telling me how THIS rev you are so loudly praising is also a steaming pile of shit."

Charles (had enough marketing-speak for this year)

--
Learning HOW to think is more important than learning WHAT to think.

Re:Gotta love it... by Shados · 2006-12-05 13:46 · Score: 2, Insightful

It IS a critical feature. Like how CSS support is a critical feature for the web. But in both cases, no one has all the critical features, and it's annoying as all hell.

Of course, using the extra stuff the databases support (PL/SQL, T-SQL, etc), we manage. But for example, the "workarounds" for the window functions are not only ugly, but often quite misunderstood, on top of being difficult to use through dynamic SQL (if that's your cup of tea). I keep seeing people using inefficient paging methods in SQL Server 2000 for example, when (while not supporting the actual function to do it "right") there are a few extremely efficient ways. So those features are indeed critical.

A bit like a certain quite popular database engine that shall remain nameless didn't support stored procedures for like ever. People work around it just fine, but...

Database engines are almost consistently -behind- users' needs, even the fancy commercial ones, never mind the incomplete ones.
Re:Gotta love it... by rycamor · 2006-12-05 13:50 · Score: 2, Interesting

I mean... did this OP rush to push out a lackluster FP on PG, or what?

Practically the only informative part of this post is focusing on the perceived negative (which is a dubious one, IMHO).

Never mind that Postgres has actually turned out some nice feature advances in this release, although they don't make for good marketspeak bullet points. There have been advances in performance, table partitioning, clustering, query logic, user-defined functions, etc... pretty much every area of "enterprise" database development except for the one area the OP chooses to focus.
Re:Gotta love it... by hey! · 2006-12-05 13:56 · Score: 1

What the hell did people do in 2002? Are all those MS-SQL Server 2000 and Oracle 8i servers going to fall down in shame? I think not.

You are making some assumptions here. First, you are assuming that a feature cannot be implemented before it makes it into a standard, which is not necessarily the case. There are other paths, e.g. the idea gets published in a journal, the relational theory geeks at several leading vendors pick it up, several incompatible implementations are created by different vendors, and since there is no real competitive advantage any more the standards folk come and tidy up. Outer joins were like that. The feature existed well before the standard.

The second assumption is that you can either do, or not do something. However it is entirely possible to do something in a way that is awkward, or slow, or awkward and slow. A language that implements the relational calculus and is Turing equivalent can do practically anything you'd want, but it is not guaranteed to do it in a practical way.

I've been in the database business for nigh on twenty five years now. OLAP itself is the kind of thing we thought we'd never have to do because of the power of the relational model. Well, we were wrong.

--
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
Re:Gotta love it... by rycamor · 2006-12-05 14:10 · Score: 2, Interesting

OLAP itself is the kind of thing we thought we'd never have to do because of the power of the relational model. Well, we were wrong.

How would we know? We have never yet seen a DBMS that really implements the relational model (at least, not in the normal world of business software). Show me the word 'relational' in the SQL standard, anywhere. What we have is all sorts of incredible complication to work around the fact that SQL itself is a damaged and confused (and at times contradictory) approach to the problem.

The serious theorists I have read argue that the reason we need all the performance workarounds is *precisely because* we are not really working with a relational system, and all the vendors conflate logical levels and physical storage levels to various degrees.
Re:Gotta love it... by Anonymous Coward · 2006-12-05 14:20 · Score: 2, Informative

So true. Today, I finished rolling out an OLAP/reporting system for a mid-sized mining company, and guess what's under the hood?
Postgres rocks (or keeps track of them in this case). It works, and it was done 100% free of window functions.
Re:Gotta love it... by rycamor · 2006-12-05 14:56 · Score: 1

Why is this modded 0? A nicely relevant post.
Re:Gotta love it... by Zetta+Matrix · 2006-12-05 15:18 · Score: 1

PostgreSQL it is still missing the SQL:2003 Window Functions that are critical in business reporting, so Oracle and DB2 will still win out for OLAP/data warehouse applications.

I would contest the assertion a different way than the parent post.

It's not like this one particular feature is the only thing holding PostgreSQL back from kicking Oracle and DB2's asses, respectively (even considering just the OLAP/data warehouse applications, as it was phrased).

Far from it. (PostgreSQL lacks many of the advanced features of those products, and is not as fast either.)
Re:Gotta love it... by dfetter · 2006-12-05 15:36 · Score: 2, Interesting

You mean the magic Darwen/Date/Pascal relational model? The one nobody has managed to implement despite the 25 years it's been around?

Maybe it's because the thing can't be made to work, and its limitations (i.e. being equivalent to first-order logic, a limitation not in SQL DBMSs) make it silly even to keep trying.

--
What part of "A well regulated militia" do you not understand?
Re:Gotta love it... by Anonymous Coward · 2006-12-05 16:34 · Score: 0

I find it more interesting that MS is being so brazen in using /. for a marketing site. Over the last 6 years, /. has been slowly converted from a thinking man's site to a site that is being overrun by marketers and republicans.
Re:Gotta love it... by rycamor · 2006-12-05 17:03 · Score: 2, Interesting

You mean the magic Darwen/Date/Pascal relational model? The one nobody has managed to implement despite the 25 years it's been around?

Ahh yes, the old canard. Actually, several companies and individuals have implemented the relational model MUCH more faithfully than the typical SQL vendor. The problem is not one of difficulty, but rather of popularity and marketing.

In fact, several solo-developer projects have implemented it on the logical level much better than your typical SQL vendor. The problem is that those guys don't have a) the marketing budget and 20 years of industry buy-in, and b) the developer team to implement all the "enterprisey" features like clustering, failover, etc... And by the way, there is nothing about the "true relational model" that makes those things harder to implement. They are if anything LESS difficult to implement with a true relational DBMS than with an SQL DBMS, which has to handle all kinds of oddities like duplicate rows, position-dependent syntax, pointers, and many other nonsensical rules of SQL.

I know lots of you database pros out there hate to hear from guys like Date, Darwen and co., but the thing is they are right: the DBMS world has opted for mediocrity and over-complexity. Of course, that's the way it is with most things in life :(.
Re:Gotta love it... by Anonymous Coward · 2006-12-05 18:00 · Score: 0

You can emulate windowed analytic functions using extra joins or postprocessing, but it's a lot slower. People who need to do lots of real analytics aren't going to stand for it.
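
A minimal sketch of both emulations for a running total, using Python's bundled sqlite3 purely for illustration (the `sales` table and its data are made up):

```python
import sqlite3

# Hypothetical sample data: (day, amount) rows in a small sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 5), (4, 15)],
)

# Emulation 1: an extra join. Each row is paired with every row at or
# before it and the pairs are summed. The join produces on the order of
# n^2 row pairs, which is where the slowness comes from.
running_totals = conn.execute(
    """
    SELECT cur.day, cur.amount, SUM(prev.amount) AS running_total
    FROM sales AS cur
    JOIN sales AS prev ON prev.day <= cur.day
    GROUP BY cur.day, cur.amount
    ORDER BY cur.day
    """
).fetchall()

# Emulation 2: postprocessing. Fetch the ordered rows and accumulate the
# total in the client.
postprocessed = []
total = 0
for day, amount in conn.execute("SELECT day, amount FROM sales ORDER BY day"):
    total += amount
    postprocessed.append((day, amount, total))
```

Both produce the same rows here; the difference shows up in cost and in how much logic has to live outside the database.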
Re:Gotta love it... by dfetter · 2006-12-05 18:26 · Score: 2, Interesting

> > You mean the magic Darwen/Date/Pascal relational model? The one nobody has
> > managed to implement despite the 25 years it's been around?

> Ahh yes, the old canard. Actually, several companies and individuals have
> implemented the relational model MUCH more faithfully than the typical SQL
> vendor.

Name one, and make sure it's one that's disallowed NULLs completely. Date,
Darwen and Pascal's fear of recording states of ignorance is ill-founded in
real-world conditions. Codifying that fear isn't even well-founded in last
century's mathematical theory. Yes, it's true that multi-value logics are
just a teensy tad more complicated theoretically than 2VL. That does
not imply that they're less useful, or that the systems built around
them are more complicated than the truly wackily byzantine things D, D & P
suggest as workarounds for not having NULLs.

> The problem is not one of difficulty, but rather of popularity and
> marketing.

Nope. See below.

> In fact, several solo-developer projects have implemented it on the logical
> level much better than your typical SQL vendor. The problem is that those
> guys don't have a) the marketing budget and 20 years of industry buy-in, and
> b) the developer team to implement all the "enterprisey" features like
> clustering, failover, etc...

> And by the way, there is nothing about the "true relational model" that
> makes those things harder to implement.

That it's been 25 years and nobody has implemented it, despite
resources in industry, government, academia and open source, flatly
contradicts your assertion.

> They are if anything LESS difficult to implement with a true relational DBMS
> than with an SQL DBMS, which has to handle all kinds of oddities like
> duplicate rows, position-dependent syntax, pointers, and many other
> nonsensical rules of SQL.

> I know lots of you database pros out there hate to hear from guys like
> Date, Darwen and co.

Nonsense. It's not that we don't like to hear from theoreticians. It's that
we don't want to hear from doctrinaire ideologues like D, D & P, especially
when they have only "angels dancing on the head of a pin" to show for their
side. One theoretician whose stuff is actually worth reading is Leonid Libkin.
There are plenty of others.

> but the thing is they are right: the DBMS world has opted for mediocrity and
> over-complexity.

You know, this is really dumb prima facie. Something that you need to
have a 130 IQ and a math degree to use even at the most basic level is
something that's pretty fragile. A *really* well-designed tool is one that a
person who's not very bright can pick up and use, while not muzzling the
expression of somebody who is bright and has lots of experience. SQL
qualifies.

> Of course, that's the way it is with most things in life :(.

Oh, puh-lease!

--
What part of "A well regulated militia" do you not understand?
Re:Gotta love it... by rycamor · 2006-12-05 20:35 · Score: 2, Interesting

> > > You mean the magic Darwen/Date/Pascal relational model? The one nobody has
> > > managed to implement despite the 25 years it's been around?

> > Ahh yes, the old canard. Actually, several companies and individuals have
> > implemented the relational model MUCH more faithfully than the typical SQL
> > vendor.

> Name one, and make sure it's one that's disallowed NULLs completely. Date,
> Darwen and Pascal's fear of recording states of ignorance is ill-founded in
> real-world conditions. Codifying that fear isn't even well-founded in last
> century's mathematical theory. Yes, it's true that multi-value logics are
> just a teensy tad more complicated theoretically than 2VL. That does
> not imply that they're less useful, or that the systems built around
> them are more complicated than the truly wackily byzantine things D, D & P
> suggest as workarounds for not having NULLs.

You're mischaracterizing the argument. I said "MUCH more faithfully". I know there's no perfect implementation, nor will there likely ever be. Name one perfect implementation of the SQL standard. But, there have definitely been *better* implementations, ones that attempt to fit the concepts of the relational model more closely. You know the ones I'm going to talk about: Duro, Rel, Alphora, etc... The fact is very few people care about these, for the same reason that very few people care to be told to eat their vegetables, or in fact to be told there is a better way to do whatever it is they are doing. That doesn't make my argument wrong.

Meanwhile, I'm not a slavish ideologue about this. I personally don't care about the NULL thing, because I think there are sensible arguments on both sides, and no easy resolution. But, supporting duplicate rows, rowIDs, positional attributes, etc... seem to me such blindingly obvious bad choices. This is without even getting to the more abstract stuff like transitive closure. Of course there are trade-offs in the real world, but why trade off things that are useful to gain things that are not?

> > The problem is not one of difficulty, but rather of popularity and
> > marketing.

> Nope. See below.

> > In fact, several solo-developer projects have implemented it on the logical
> > level much better than your typical SQL vendor. The problem is that those
> > guys don't have a) the marketing budget and 20 years of industry buy-in, and
> > b) the developer team to implement all the "enterprisey" features like
> > clustering, failover, etc...

> > And by the way, there is nothing about the "true relational model" that
> > makes those things harder to implement.

> That it's been 25 years and nobody has implemented it, despite
> resources in industry, government, academia and open source, flatly
> contradicts your assertion.

In other words, it can't be done because it hasn't been done? Fallacy. Tell me a logical *reason* why it can't be done.

BTW, I have a hard time believing that someone with your .sig would think that big government, modern academia, and big business are the standard bearers for logic and the limits of human endeavor ;).

> > They are if anything LESS difficult to implement with a true relational DBMS
> > than with an SQL DBMS, which has to handle all kinds of oddities like
> > duplicate rows, position-dependent syntax, pointers, and many other
> > nonsensical rules of SQL.

> > I know lots of you database pros out there hate to hear from guys like
> > Date, Darwen and co.

> Nonsense. It's not that we don't like to hear from theoreticians. It's that
> we don't want to hear from doctrinaire ideologues like D, D & P, especially
> when they have only "angels dancing on the head of a pin" to show for their
> side. One theoretic
Re:Gotta love it... by LizardKing · 2006-12-05 20:43 · Score: 1

PostgreSQL lacks many of the advanced features of those products [Oracle, DB/2], and is not as fast either.

True, but if you lack a quality DBA and the hardware necessary to get the maximum performance from Oracle or DB/2, then PostgreSQL is a fine alternative. Oracle in particular needs a lot of care and attention to keep it performing at its best, and if you've forked out for the licenses you probably want to get the most from it. It's in situations where the budget or other resources rule out the big commercial DBs that PostgreSQL really shines. It has excellent documentation (which I find much more readable and complete than MySQL's, especially when it comes to tuning options) and is not very complex to administer. In fact, unless there is a compelling reason to go with another DB, PostgreSQL is always my first choice.
Re:Gotta love it... by chill · 2006-12-05 22:55 · Score: 1

Dude, we're talking about the SQL 2003 standard, NOT Microsoft SQL Server 2003. The last standard was SQL 1999.

http://www.wiscorp.com/SQLStandards.html

--
Learning HOW to think is more important than learning WHAT to think.
Re:Gotta love it... by hey! · 2006-12-06 00:14 · Score: 1

Well, that's an interesting hypothesis. The question is, is it refutable?

In its shorthand form, it is pretty close to being tautological: a REAL RDBMS never requires us to use OLAP strategies; any commercial product X sometimes requires us to use OLAP strategies; therefore any commercial product X is not a REAL RDBMS.

I think it is too strong to say that the products we have today are not "true" relational systems, although it would be fair to say they aren't pure relational systems. If the product implements the relational calculus, I think it can reasonably be called relational.

The question you raise, of logical/physical design separation, is a matter of best practices; and it is true that the less cleanly an RDBMS implements that separation, the more often practicality will encourage users to treat data non-relationally.

However -- and this is an important point -- we should not conflate necessity with sufficiency. Separating logical and physical design means that the functional requirements of any two applications that need the data can be met with ease. But it doesn't mean you can meet the kind of needs that would fall under the scope of "non-functional testing". Specifically I'm talking about performance. If two applications have critical performance needs that dictate different physical database designs, then they cannot be served by the same database, although they could be served by two databases of identical logical design.

It's not true that all commercial products conflate physical and logical design. Oracle is an example of a commercial product that does a pretty good job. A skilled Oracle DBA can get a single database to support more usage patterns than, say, a skilled MS SQL DBA. But it doesn't mean he can get it to support every possible application with adequate performance. Just more than he could with a simpler system.

--
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
Re:Gotta love it... by Anonymous Coward · 2006-12-06 01:16 · Score: 0

The perennial arguments: ivory tower academics who tout theoretical models vs practical engineers who actually have to make it work outside of a university.
Needless to say, academics have their place... but they'll never accept that parts of their ideas are good, and parts are just crackpot impractical crap that doesn't work in the real world of actually having to do stuff for real.
Re:Gotta love it... by DuckDodgers · 2006-12-06 03:21 · Score: 1

More importantly, software doesn't exist in a vacuum.

Even if someone invents a superior database paradigm (I haven't read any of the theory on this), who's going to spend the time and money migrating countless petabytes of existing databases into the new system? Whether you love or hate C, C++, COBOL, Java, and SQL, they are in widespread use all over the world. Replacement technologies would need to be orders of magnitude better in most metrics and easy for non-technical users to conceptually grasp before they get any headway.

I'm not saying the status quo is great. It certainly has flaws. But a replacement needs to blow people away with its superiority before it gets any reasonable attention.
Re:Gotta love it... by Anonymous Coward · 2006-12-06 03:54 · Score: 2, Insightful

Critical? CSS is critical for the web? Alternative ways of doing something that's been done forever are critical?

You fail completely to understand the CONCEPT at work here. 'Critical' means you CANNOT DO SOMETHING WITHOUT IT. You have failed -- by your own admission -- to even state, let alone prove, that anything you're talking about is more than a nice alternative, let alone critical.

Who the hell modded that insightful? Give me a break. dictionary.reference.com, now this post had better get a +5 informative, because there you will find the TRUE MEANING OF THIS WORD -- if they aren't missing a 'critical' new widget that can efficiently replace the search text box.

Your statement that DB engines tend to be behind users' "needs" doesn't mean much when a "need" is something "critical," thus definitely not something you should comment on.
Re:Gotta love it... by jadavis · 2006-12-06 04:42 · Score: 2, Interesting

Name one, and make sure it's one that's disallowed NULLs completely. Date, Darwen and Pascal's fear of recording states of ignorance is ill-founded in real-world conditions. Codifying that fear isn't even well-founded in last

CJ Date actually tried to generalize the concept of NULLs into "special values" in a domain. He argued that NULLs cause confusion in 3VL because NULL can mean different things. Sometimes it means "unknown", other times it means "not applicable". And in an outer join, it's not clear at all what the value NULL is supposed to represent. At least in my reading of his work, there is nothing precluding you from having a special value called "NULL" and being able to magically add it to an existing domain with simple syntax. He wanted other special values to be allowed, so that NULL was not so ambiguous. Perhaps I misunderstood, I don't have the book nearby.
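
To make the ambiguity concrete, here is a small sketch using Python's bundled sqlite3 (the `dept` and `emp` tables are made up for illustration). It shows that a NULL comparison yields "unknown", and that a NULL stored in a row looks exactly like a NULL manufactured by an outer join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# In SQL's three-valued logic, comparing NULL with anything yields NULL
# (unknown), not true or false. sqlite3 maps SQL NULL to Python None.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]
null_is_null = conn.execute("SELECT NULL IS NULL").fetchone()[0]

# Hypothetical tables: bob's department is not recorded (a stored NULL).
conn.execute("CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE emp (name TEXT, dept_id INTEGER)")
conn.executemany("INSERT INTO dept VALUES (?, ?)", [(1, "ops")])
conn.executemany("INSERT INTO emp VALUES (?, ?)", [("ann", 1), ("bob", None)])

# LEFT OUTER JOIN: for bob, emp.dept_id is a NULL that was stored, while
# dept.name is a NULL generated by the join because no row matched.
# Nothing in the result distinguishes the two.
joined = conn.execute(
    """
    SELECT emp.name, emp.dept_id, dept.name
    FROM emp LEFT JOIN dept ON dept.id = emp.dept_id
    ORDER BY emp.name
    """
).fetchall()
```

Whether bob's missing department means "unknown" or "not applicable" is something only the application can say, which is the confusion described above.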

I really enjoyed Date's work because he clearly explained points of confusing terminology, and suggested possibly less-confusing approaches.

SQL is far from perfect. If nothing else just the syntax is bad, and it has a lot of reserved words and key words. But one thing we have to remember is that no pure relational language is appropriate for a database. For instance, relations are unordered, by definition. Yet, it's important for a database system to be able to ORDER BY.

So, really, I think what we want is a language that facilitates all the relational operations, and at a level above that also allow non-relational operations like ORDER BY.

--
Social scientists are inspired by theories; scientists are humbled by facts.

Replication? by Curly · 2006-12-05 13:49 · Score: 5, Informative

What do PostgreSQL users do for replication? I'm a MySQL admin who would really like to be able to switch to PostgreSQL, but we need to be able to have several slaves hanging off a master, and have everything replicated in as real-time as possible (but asynchronously) to the slaves. I have spent some time looking for how to do this in PostgreSQL but have found each solution lacking. The "most popular free" one, according to the PostgreSQL FAQ, is "Slony-I", but from what I could find it doesn't replicate schema changes to the slaves. What happens to your replication when a slave sees an update to a column/table that doesn't exist on the slave? Slony also doesn't replicate "large objects"; I don't know what they are, but as a MySQL admin who has been replicating our databases for many years, I have a hard time imagining adjusting to limitations like these.

Most of the other options I found were abandonware, undocumented, didn't work with PostgreSQL 8.x, etc. I looked at commercial solutions, but they were similarly a mess. Specifically, here is my survey:

* pgpool -- Max 2 servers, and they're not really in sync---commands like now() or rand() will be executed independently on the mirrored machines, causing them to have different data.

* Slony I -- DB schema changes not replicated, nor are "large objects"

* PGCluster -- Synchronous multi-master. We don't want synchronous, and don't need multi-master. Documentation patchy, didn't appear to be currently maintained.

* CommandPrompt "Mammoth" -- Documentation "in the works". PostgreSQL 8.0.7. Tables can't use "inheritance". Schema changes not replicated (at least not table creation, not sure about the rest). Only 1 db replicated, not all dbs. Tables must have primary keys. Have to list tables in config file.

* Bizgres/GreenPlum -- Buzzword-compliant website, but website was broken when I looked for details. The "Community" is inactive---forum is barely used, questions are unanswered.

* PostgreSQL Replicator -- Poorly documented. Only mentions up to 7.x. "News" is from 2001.

I'm not ragging on PostgreSQL: I'd really like to be able to migrate to it. I just fear that when replication is done in a third-party fashion, it loses the tight integration with the dbms necessary to make it work truly seamlessly, and that it isn't maintained as well as the core product.

Perhaps this comment is off-topic, since the post is about a new release of PostgreSQL, not asking for questions about its individual features. But this is the one feature I look for in each new release, and the fact that I couldn't find any good solution makes me wonder if it's because I missed the one great one that people actually use.

Re:Replication? by rsax · 2006-12-05 14:30 · Score: 1

I love PostgreSQL but I must side with you here. It needs solid, native, asynchronous replication supported by the main dev team.
Re:Replication? by nyamada · 2006-12-05 14:45 · Score: 5, Informative

We use Slony. It is a delicate beast, but works quite well if you take time to read the limited documentation. You can use a kludge to keep schema changes in line: if you execute all schema changes through EXECUTE SCRIPT statements on the master server, all the slave nodes will get the schema changes. As for large object support, you're right; it is a problem.

PITR recovery and log replication may work in 8.2; but I agree with the posters who complain that there is no easy replication for postgresql.
Re:Replication? by oGMo · 2006-12-05 14:50 · Score: 5, Informative
Slony also doesn't replicate "large objects"; I don't know what they are,

You're a DBA and you don't know what large objects are?
but as a MySQL admin

Oh, right. Not really a DBA

Let's see:
- "pgpool -- Max 2 servers, and they're not really in sync---commands like now() or rand() will be executed independently on the mirrored machines, causing them to have different data." One: keep your clocks in sync. Two: how can you tell if rand() isn't "in sync"? You run it on each server and you get different results? You know what rand() means, right?
- "Slony I -- DB schema changes not replicated, nor are "large objects"." One: how often does your schema change, and do you really need automatic replication? Two: If you don't even know what large objects are, why do you have a problem with this?
- "PGCluster -- Synchronous multi-master. We don't want synchronous, and don't need multi-master. Documentation patchy, didn't appear to be currently maintained." So don't use it.
- "CommandPrompt "Mammoth" -- Documentation "in the works". PostgreSQL 8.0.7. Tables can't use "inheritance". Schema changes not replicated (at least not table creation, not sure about the rest). Only 1 db replicated, not all dbs. Tables must have primary keys. Have to list tables in config file." One: MySQL doesn't have inheritance, you're not losing anything. Two: see above about oft-changing schemas. (Otherwise, this sounds like a very high-level replication of tables, probably using simple scripts or triggers. If it doesn't suit, don't use.)
Others listed are older and not relevant.

I just fear that when replication is done in a third-party fashion, it loses the tight integration with the dbms necessary to make it work truly seamlessly, and that it isn't maintained as well as the core product.

Funny, I fear a database that has only rudimentary data integrity checks. Here's the real question for you: Why do you need replication? It doesn't magically work the way you think it does, even in MySQL (see under "Problems Not Solved"). Quote: "MySQL's replication isn't the ideal vehicle for transmitting real-time or nearly real-time data". Every replicated database can lose synchronization and no one can honestly guarantee otherwise. Even Oracle.

Slony-I will pretty much give you what you already have. My guess is that you don't really need replication at all; hot standby servers will suffice in case of failure. The rest comes down to query tuning or faster hardware (or a database that does faster nontrivial queries, like PostgreSQL). (And don't complain about costs if you're already buying servers for replication. If you have real data that's making you money here, hardware is cheap; if you don't, you probably don't really need any of this to begin with.) If you need true realtime synchronization, replication is not an option.

Finally, while I'm not a MySQL fan, since you don't seem to give any real reason for wanting to migrate, why bother? You already have a working system and hardware investment. If it ain't broke, don't fix it. If it comes time to upgrade down the line, and the features justify the move, then maybe consider it.

In summary: meh.
--
Don't think of it as a flame---it's more like an argument that does 3d6 fire damage
Re:Replication? by jadavis · 2006-12-05 14:52 · Score: 5, Informative

"Slony-I", but from what I could find it doesn't replicate schema changes to the slaves

That's a feature, not a bug. That means you can have DB1 be master for Table1 and slave (subscriber) for Table2, and DB2 be master for Table2 and slave (subscriber) for Table1. You can also chain subscriptions to make a hierarchy, which allows for very good scalability.

Oh, and if you want to replicate schema changes, use the Slony-I "execute script" command. It will lock down all the tables as necessary and synchronize the changes so that nothing gets out of order. Slony-I keeps everything transactionally consistent.

Slony also doesn't replicate "large objects"

Ignore that. A large object is basically an interface to a file over the PostgreSQL protocol. You don't need them to efficiently store large amounts of data. Put a GB into a text type if you want (or bytea type for binary data).

I encourage you to take a closer look at Slony-I. It's what the .org and .info registries use. It's good software. It's also great for an upgrade path when you have a lot of data and don't want to be down for a dump/reload.

--
Social scientists are inspired by theories; scientists are humbled by facts.
Re:Replication? by jadavis · 2006-12-05 15:27 · Score: 1

[ ignoring some of the unnecessary rudeness ]

Two: If you don't even know what large objects are, why do you have a problem with this?

Perhaps he misread that as the direct English meaning: "something large". Slony can store and replicate big stuff, it just won't replicate things that aren't tuples. But tuples in PostgreSQL can be big and efficient.

PostgreSQL replication will force you to consider the real consequences of your choices in various situations. MySQL replication will say that it's working, but you won't really know what actually happens (i.e. where your data is, and what it means) in event X. I think he could benefit from learning about Slony-I.

--
Social scientists are inspired by theories; scientists are humbled by facts.
Re:Replication? by Anonymous Coward · 2006-12-05 15:54 · Score: 0

Look, if you're just going to be a smart-alecky cunt who posts useless shit like that, then stop wasting everybody's time and don't post at all.
Re:Replication? by Anonymous Coward · 2006-12-05 16:11 · Score: 0

And this is one of the friendlier PG users.

The Right Thing would be to replicate also the timestamp and random seed for now() and rand(), but who needs convenience?
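
A toy sketch of why this matters, using two in-memory SQLite databases from Python as stand-ins for a master and a replica (the `tokens` table is made up, and this is not how pgpool or any real replicator is implemented). Note that it ships the already-evaluated value rather than a seed, which is a simpler variant of the same idea:

```python
import sqlite3


def make_node():
    """Create an in-memory database standing in for one server."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tokens (id INTEGER PRIMARY KEY, token INTEGER)")
    return conn


def token(conn, row_id):
    """Read back the token stored under row_id on one node."""
    return conn.execute(
        "SELECT token FROM tokens WHERE id = ?", (row_id,)
    ).fetchone()[0]


master, replica = make_node(), make_node()

# Statement shipping: both nodes re-execute the same SQL text, so each
# node evaluates random() on its own and stores its own result.
stmt = "INSERT INTO tokens (id, token) VALUES (1, random())"
master.execute(stmt)
replica.execute(stmt)

# Value shipping: evaluate random() once on the master, then send the
# resulting value to the replica instead of the original statement.
master.execute("INSERT INTO tokens (id, token) VALUES (2, random())")
replica.execute(
    "INSERT INTO tokens (id, token) VALUES (2, ?)", (token(master, 2),)
)

diverged = token(master, 1) != token(replica, 1)      # nodes disagree
consistent = token(master, 2) == token(replica, 2)    # nodes agree
```

Row 1 almost certainly differs between the two nodes, while row 2 is identical by construction.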
Re:Replication? by swbrown · 2006-12-05 16:46 · Score: 1

Despite your angst, PostgreSQL /is/ currently weak when it comes to replication. Trying to talk around that isn't going to lead anywhere. The only stable and well-maintained option right now is basic asynchronous replication which is unsuitable for many types of applications that require data integrity (e.g., read Slony's section on Failover and note the caveats). PostgreSQL is a great database, so I'm hoping that more sophisticated replication strategies will be in its near future.
Re:Replication? by Anonymous Coward · 2006-12-05 17:28 · Score: 0

Pot. Kettle. Black.

Feel like pointing out exactly how it was "useless shit"?
Re:Replication? by oGMo · 2006-12-05 18:19 · Score: 1

PgSQL replication is weak, but then, everyone's replication is pretty weak. They can---and don't get me wrong, they should---work on it. But at the same time, they can't work magic. They can't make everything magically synchronous all of the time, or efficiently try and do every now() or rand() across the network against a single source or other silly things the parent poster wants.

--
Don't think of it as a flame---it's more like an argument that does 3d6 fire damage
Re:Replication? by Curly · 2006-12-05 18:58 · Score: 1

Why anyone modded the parent Informative, I have no idea, but I'd like to respond to this:

If you don't even know what large objects are, why do you have a problem with this?
[...]

MySQL doesn't have inheritance, you're not losing anything.

It's the law of least surprise. Every exception to "set it up and forget about it" is something our programming team is going to have to keep in mind. That's why it almost doesn't matter what "large objects" are in PostgreSQL, or whether we're using inheritance now.

From other postings, which were actually informative, it sounds like "large object" means something in PostgreSQL that might not come up for us in practice, and that the schema changes not being replicated can be worked around (by expressing them with "execute script", documented in the Slony project). Those two being the only issues I noticed in Slony, we could probably reasonably work with it.
Re:Replication? by Bacon+Bits · 2006-12-05 22:27 · Score: 2, Informative

Binary Large OBjects (BLOBs) are table columns whose individual entries are larger than several thousand bytes (typically, those that span more than one page). BLOBs are part of the ANSI SQL standard, AFAIK, which is why it is surprising you'd never heard of them. They differ from MySQL's 'blob' datatype, which is just a big TEXT field. The design of the database (PostgreSQL, DB2, Oracle, T-SQL/MS SQL, etc.) prevents such objects from being stored the same way other objects are stored, either because the SQL standard defines maximum sizes for fields or because the physical structure of the database makes it impractical or unreasonable. In the case of PostgreSQL, the objects are internally stored in different tables with different physical files, although that is not seen by the DB developer at all. They're typically used for storing pictures and documents in the DBMS when you cannot or do not wish to use the file system instead, or for literally storing large binary data. It also supports data streaming, AFAIK.

Table inheritance is like a reverse VIEW, and was defined in SQL:1999. Given table A and table B, let's say table B inherits from table A. Table B will then have all the fields from table A plus its own. PostgreSQL also supports multiple inheritance. It's standard SQL, but it's very weird, IMO. It has some pretty specific uses, like being able to essentially have indexed VIEWs and such, or making a permanent JOINed table.
http://www.postgresql.org/docs/8.2/interactive/ddl-inherit.html

As far as schema changes go, the argument is this: replication is only necessary on production systems. Schema on production systems should be static. If you're changing your schema, you probably did something wrong.

--
The road to tyranny has always been paved with claims of necessity.
Re:Replication? by Anonymous Coward · 2006-12-06 02:37 · Score: 0

Slony may not be part of the main development, but to be absolutely clear, it was written by, and is currently maintained by, one of the 7 core Postgres developers.
Re:Replication? by Anonymous Coward · 2006-12-06 02:52 · Score: 0

You are talking about shortcomings in async replication, not shortcomings inherent to Postgres or Slony. You want guaranteed, no-data-loss replication on failover? Then you want synchronous replication.

Slony's async replication works great for me. Losing transactions in a fail over situation is less than ideal, but unavoidable for async.

Synchronous replication is not an option for my apps. I have _very_ tight SLAs on transaction time. No synchronous solution on the planet for any DBMS, commercial or not, would be acceptable for me - I'd be blowing SLAs in normal operation.
Re:Replication? by Curly · 2006-12-06 04:09 · Score: 1

Thank you for the clarification. So "large objects" == BLOBs. (So why not call them BLOBs?) Not replicating them is a meaningful restriction, but not one that comes up in my application.

As far as schema changes go, the argument is this: replication is only necessary on production systems. Schema on production systems should be static. If you're changing your schema, you probably did something wrong.

Or you updated your production system?

That argument mistakes a bug for a feature.
Re:Replication? by Bacon+Bits · 2006-12-06 05:00 · Score: 1

They are called BLOBs. In ANSI SQL, BINARY LARGE OBJECT and BLOB are datatype aliases. There are also CLOBs for CHARACTER LARGE OBJECTS, but those might be vendor-specific and not standard.

Most often I would expect to take production systems offline for an upgrade that required a schema change, which would not require ongoing data replication, yes? However, reading other posters' comments here suggests that the EXECUTE SCRIPT component of Slony is much more automated than I thought (I've never needed schema replication, and haven't needed Slony in a long while).

Slony is relatively new but it progresses very fast compared to PostgreSQL. That's one of the main reasons the project is kept as a module instead of a core component.

--
The road to tyranny has always been paved with claims of necessity.
Re:Replication? by Christopher+B.+Brown · 2006-12-06 07:21 · Score: 1

The handling of "large columns" gets a bit better in Slony-I 1.2; in earlier versions, the replication engine would blindly grab groups of 100 tuples into memory to get ready to replicate them. If you have big columns with 50MB of data in them, this could lead to loading 5000MB of data into memory, and then throwing it across the wire to replicate it. What with a couple copies getting made (the copy loaded into memory, then turned into a query to be submitted to the subscriber), this could lead to a pretty large memory footprint.
One user regularly hits cases like that, and actually wound up upgrading his replication servers to have >32GB of memory to support this.
In 1.2, that case largely goes away. Large tuples aren't immediately loaded into memory; those larger than a (configurable) threshold get left on disk, to be pulled one by one. One might imagine that to be a possible source of inefficiency, but consider, you've got an insert that's loading 50MB of data into one tuple. There is no amount of index scanning that can possibly be material beside that!
Slony-I most certainly does support replicating "large data;" what it doesn't support is the Large Object interface, which is one that essentially amounts to having a UNIX I/O API sitting on top of SQL. There are doubtless users of Large Objects out there; the "user interface" is arcane enough that sensible people usually prefer to store large amounts of data as plain old columns that get TOASTed...

--
If you're not part of the solution, you're part of the precipitate.
Re:Replication? by lucifron · 2006-12-06 08:38 · Score: 1

Postgres has two 'blob' types.

The standard type for binary data is BYTEA, but you also have BLOB, which, if I understand correctly, is somewhat legacy.

BLOBS can be streamed to/from, while BYTEA fields are processed 'whole'.

Re:Real Men don't use Window Functions by fimbulvetr · 2006-12-05 14:10 · Score: 1

I don't know if I fall under the fanboy sign in this case. I know the differences between postgres and mysql, and use mysql more frequently than postgres.

In any case, I never argued that those things were bad for performance, but I did argue this:

More often than not, subselects and triggers make people lazy and generally patch up cases where non normalized data should be fixed. They encourage things like db/app bleedover, not fully understanding joins, and not fully implementing data normalization (where appropriate).

Fairly often, and certainly less so in mysql (but probably more so now), I run across cases where postgres and oracle queries used slower subselects, bad non-normalized data, or should-have-been-the-application's-job triggers where they were completely inappropriate and really demonstrated a lack of knowledge of any RDBMS.

PG question. by Anonymous Coward · 2006-12-05 14:15 · Score: 0

In mysql, you can do something like this: update x set y=@k:=@k+1 order by z (syntax is probably a little off)

Can you do something similar in pg in a single query?

Re:PG question. by jdew · 2006-12-05 15:22 · Score: 1

What is that supposed to accomplish?

I know you can do something like update x set a=b, b=a; in postgresql.
Re:PG question. by Harik · 2006-12-05 18:47 · Score: 1

from my reading of it, it updates y=(serial counter), ordered by z

An iteration on set data.

so for
y, z
0, 1
0, 2
0, 3
0, 5

you end up with
1, 1
2, 2
3, 3
4, 5
Re:PG question. by Anonymous Coward · 2006-12-06 03:57 · Score: 0

You are correct.
Re:PG question. by jdew · 2006-12-06 04:36 · Score: 1

Interesting. I know of no way to do that in a single statement. It might be possible by creating a sequence and using nextval(...). However, UPDATE doesn't let you use ORDER BY.

Somebody else with a better understanding of Postgres's SQL might be able to come up with a one-liner, but not I.
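One single-statement possibility is a correlated subquery that counts how many rows have a z less than or equal to the current row's z. What follows is only a minimal sketch under stated assumptions: it runs the UPDATE through Python's built-in sqlite3 module so it can be tried anywhere, the table x and columns y, z are taken from the example in this thread, the helper name renumber_by_z is made up for illustration, and z is assumed to be unique (tied z values would receive the same number). The UPDATE itself is plain SQL and should carry over to PostgreSQL, but the sketch only exercises SQLite.

```python
import sqlite3


def renumber_by_z(rows):
    """Number rows 1..n in z order with a single UPDATE.

    rows is a list of (y, z) tuples; z is assumed to be unique.
    Hypothetical helper for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE x (y INTEGER, z INTEGER)")
    conn.executemany("INSERT INTO x (y, z) VALUES (?, ?)", rows)
    # For each row, y becomes the count of rows whose z is <= this row's z,
    # which is the row's 1-based position in z order.
    conn.execute(
        "UPDATE x SET y = (SELECT count(*) FROM x AS x2 WHERE x2.z <= x.z)"
    )
    result = conn.execute("SELECT y, z FROM x ORDER BY z").fetchall()
    conn.close()
    return result


print(renumber_by_z([(0, 1), (0, 2), (0, 3), (0, 5)]))
# prints [(1, 1), (2, 2), (3, 3), (4, 5)]
```

The subquery rescans the table for every row, so this is quadratic; a sequence or a temporary table is likely the better choice for large tables.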

--more-- by Anonymous Coward · 2006-12-05 14:46 · Score: 0

Along that line, what do PG users use for full text searching? Like for ~5mil rows each containing ~12 words.

fts - pretty simple (it stores duplicate words, and breaks down words too much, like AMAZING will store: MAZING AZING ZING ING NG which takes up way too much disk space)

tsearch2 - afaik doesn't support wildcard searches, like for AMAZ*

Do you use those? Or roll your own? Or what?
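To put a number on the disk-space complaint about fts: storing every suffix of a word multiplies the characters stored. This is a rough sketch of the behavior described above, not the actual fts code; the function names and the two-character minimum suffix length are assumptions read off the AMAZING example.

```python
def suffixes(word, min_len=2):
    """Return every suffix of word that is at least min_len characters long."""
    return [word[i:] for i in range(len(word) - min_len + 1)]


def stored_chars(words, min_len=2):
    """Total characters stored if every suffix of every word is indexed."""
    return sum(len(s) for w in words for s in suffixes(w, min_len))


print(suffixes("AMAZING"))
# prints ['AMAZING', 'MAZING', 'AZING', 'ZING', 'ING', 'NG']
print(stored_chars(["AMAZING"]))
# prints 27
```

So a 7-character word costs 27 characters of index entries, and the ratio gets worse as words get longer.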

Re:--more-- by mikaelhg · 2006-12-05 23:51 · Score: 2, Informative

... what do PG users use for full text searching?

The same as everybody else who stores text in a relational database. Use external indexing, such as Lucene, which actually has some features you'd want for non-trivial full text indexing and searching, such as stemming.

Awesome by tcopeland · 2006-12-05 15:32 · Score: 2, Interesting

> 8.2 is positioned as a performance release.

We've only got a small database (17 million records or so), and PostgreSQL 8.1 has been handling it fine. But I'm still looking forward to seeing how 8.2 improves things.

And we're using it in another production system, too, which is going to get pretty big (I hope). Lively times!

--
The Army reading list

What PostgreSQL really needs. by Shawn+is+an+Asshole · 2006-12-05 15:32 · Score: 1

Equivalents to Query Browser and DBDesigner4/Workbench.

Use them. They rock. Query Browser does everything I used in phpMyAdmin and much more. DBDesigner4 and its (currently rather unstable) replacement, Workbench, are extremely useful for designing/modifying databases. I prefer PostgreSQL for speed, stability, and features, but I develop in MySQL just because of those tools.

--
"It ain't a war against drugs, it's a war against personal freedom" --Bill Hicks

Oracle and MS SQLServer both have such a setting by brokeninside · 2006-12-05 15:46 · Score: 1

I'd be surprised if DB/2 didn't also have it.

Way to go PostgreSQL by greengarden · 2006-12-05 16:02 · Score: 3, Informative

I worked a lot with Oracle, and then joined an open source project that started using PostgreSQL. The project is a billing system, so it is data intensive. What a great little database PostgreSQL is. And that was back in the 7.x versions.
Actually, jBilling http://www.jbilling.com/ now runs on many databases, but PostgreSQL is still holding its ground against Oracle and other heavyweights. Those extra features that Oracle says you need, and charges you an arm and a leg for, are really not needed in most applications.

Cheers,

Paul C.
Sr Developer
http://www.jbilling.com/ - The Open Source Enterprise Billing System

Reporting by mccoma · 2006-12-05 16:10 · Score: 2, Informative

PostgreSQL it is still missing the SQL:2003 Window Functions that are critical in business reporting, so Oracle and DB2 will still win out for OLAP/data warehouse applications.

Apparently the submitter has not been visited by any of the plethora of reporting tools vendors who will tell you (without you asking) how crappy the built-in stuff is and how great their stuff is.

Also, given the text, aren't Oracle and DB2 also missing those critical SQL:2003 Window Functions?

I love postgresql by euice · 2006-12-05 20:00 · Score: 1

I've been using postgresql since the 5.x days, when it was indeed slower than mysql.
But as a developer, I never accepted the shortcomings of the non-standard and really incomplete sql syntax of mysql.
The command line tool psql with tab-completion of sql syntax and less style output of query results convinced me to switch in a second.
PostgreSQL never let me down, whereas I often had problems with mysql databases (e.g. non-working databases after upgrades).
Not to mention the semi-free open-source license of mysql.
What's all the fuss about mysql again? Mysql is a commercial product that is and was inferior to postgresql since the very beginning. The performance gain was small compared to the missing features.
That's just my two cents, but I think the mysql guys did a great job marketing their product and fooling everybody into using mysql.

Re:Real Men don't use Window Functions by euice · 2006-12-05 20:18 · Score: 1

If you really have the knowledge of DBMS, you should be thankful for the options! You sound like you want to abandon features to force yourself into discipline.
Sometimes you have to balance development time against performance, not to mention the statements you as an administrator type by hand, where performance might not be an issue.
And in addition to that, I can assure you that there are lots of cases where subselects are REALLY fast in postgresql. Even faster than aggregates and group by. Never underestimate the power of the query optimization in postgresql; since 8.0 it has been really good.

One thing you can't do in PostgreSQL ... by euice · 2006-12-05 20:41 · Score: 2, Funny

... is create a smallint index on an int column ;-)

Re:One thing you can't do in PostgreSQL ... by Anonymous Coward · 2006-12-05 21:35 · Score: 0

CREATE TABLE si(a int);

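INSERT INTO si VALUES (1);
INSERT INTO si VALUES (3);
INSERT INTO si VALUES (70000);

-- smallint only holds -32768..32767, so the partial index must exclude anything outside that range
CREATE INDEX si_idx ON si((a::smallint)) WHERE a BETWEEN -32768 AND 32767;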
Re:One thing you can't do in PostgreSQL ... by euice · 2006-12-06 07:04 · Score: 1

You're right, I should have written "accidentally" in my post.
I was referring to the slashdot database that forced them to disable threading for a few hours.

Off topic by theshowmecanuck · 2006-12-05 21:13 · Score: 1

Off-topic. default mod point for an 'Anonymous Coward' is zero... even if it is a good post :-) Most moderators don't like to give mod points to ACs because they would rather reward or punish registered users... otherwise it's like throwing away good mod points.

--
-- I ignore anonymous replies to my comments and postings.

Re:They moved to FreeBSD from Linux. by shani · 2006-12-05 22:17 · Score: 3, Interesting

It sounds like you just don't know how to deal with FreeBSD. That would explain the poor performance you experienced, and how it is completely contrary to what we've found.

For the heaviest application at my last job, the load pattern was very query heavy, although the application stored intermediate results in temporary tables. This application is heavily threaded, creating two threads per user connection, plus the MySQL thread, so we're talking like 150 threads created & destroyed per second.

Our original platform was Solaris, and performance was excellent (well, excellent considering the dog-slow CPUs that Sun makes).

We eventually migrated to Linux, but this was possible only after the new thread libraries (well, new at the time). Performance then was quite good.

We found MySQL under FreeBSD basically unusable under heavy loads.

We never tweaked any of the systems. We did try a few thread libraries under FreeBSD, but they all sucked.

MySQL license by shani · 2006-12-05 22:20 · Score: 1

Not to mention the semi-free open-source license of mysql.

GPL?

Re:MySQL license by euice · 2006-12-05 22:46 · Score: 1

Mysql has a GPL/Commercial dual licensing model. And because connection to mysql means linking to the client, which is "derivative work" in terms of the GPL, you can only use GPL'ed software with mysql. Unless you pay them to use their commercial license of course.
OTOH PostgreSQL is released under the BSD license, which has none of these restrictions.
Re:MySQL license by shani · 2006-12-06 00:26 · Score: 1

Mysql has a GPL/Commercial dual licensing model. And because connection to mysql means linking to the client, which is "derivative work" in terms of the GPL, you can only use GPL'ed software with mysql. Unless you pay them to use their commercial license of course.

Well, that's GPL. You appear to be arguing that GPL is only "semi-free" (your own words).

But if you don't like GPL, MySQL allows you to use any of the following licenses with clients:

http://www.mysql.com/company/legal/licensing/foss-exception.html
Re:MySQL license by euice · 2006-12-06 00:48 · Score: 1

By semi-free I was referring to the dual licensing model. If you want to use mysql in a closed source environment (and do not want to reprogram the client libraries) you need to pay royalties. That's what I meant by "semi-free".
Of course I consider the GPL itself as "free enough" for the server (unlike some of the bsd fanboys).
But as a freelance developer, I often have to develop closed-source applications, and for that I need at least LGPL client libraries, which mysql doesn't provide AFAIK.
In my world it's easy to convince a paying customer to release improvements for existing open-source projects as open source or donate to open-source projects. But it's a tough job to convince them to open-source everything they pay for and probably want to sell.

Re:They moved to FreeBSD from Linux. by Anonymous Coward · 2006-12-06 02:05 · Score: 1, Informative

I'm a rabid FreeBSD advocate, but MySQL performs badly under FreeBSD. This isn't so much a problem with FreeBSD as it is with MySQL, which is very Linux-centric. I have no gripes, however, as I dumped MySQL before I dumped Linux, but I would recommend that if you are going to have a stand-alone server for MySQL, it should be running a Linux distro.

That's a stupid remark by Anonymous Coward · 2006-12-06 03:21 · Score: 0

All the major databases have a way of temporarily disabling integrity checking for bulk processing; it's limited to a single transaction, and the integrity/validity checks are still performed after the task is complete.

That's vastly different from turning off all checking globally and silently eating problems.
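The per-transaction behavior described here can be sketched with deferred constraints. This is only an illustration under assumptions: it uses SQLite's DEFERRABLE INITIALLY DEFERRED foreign keys through Python's built-in sqlite3 module as a stand-in for the bigger databases, and the parent/child tables and function names are made up. PostgreSQL has deferrable constraints as well (see SET CONSTRAINTS), though the details differ from what is shown.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a deferred foreign key (illustrative schema)."""
    # isolation_level=None leaves transaction control to explicit BEGIN/COMMIT.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE child ("
        " id INTEGER PRIMARY KEY,"
        " parent_id INTEGER REFERENCES parent(id) DEFERRABLE INITIALLY DEFERRED)"
    )
    return conn


def bulk_load(conn, children, parents):
    """Insert children before their parents inside one transaction.

    The foreign key is not checked per statement, only at COMMIT.
    Returns True if the commit passes the check, False if it was rolled back.
    """
    conn.execute("BEGIN")
    try:
        conn.executemany("INSERT INTO child (id, parent_id) VALUES (?, ?)", children)
        conn.executemany("INSERT INTO parent (id) VALUES (?)", parents)
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        # The deferred check failed at COMMIT; nothing is silently kept.
        conn.execute("ROLLBACK")
        return False
```

Loading a child together with its parent in any order succeeds, while an orphaned child makes the commit fail and the whole batch roll back, which is the opposite of silently eating the problem.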

Don't forget by Slashdot+Parent · 2006-12-06 03:24 · Score: 1

Don't forget the ultimate problem with pgsql: the users.

GP asked a simple question about what replication strategies are used by pg shops, and some asshole like you responds in a tone like yours.

You could have just answered the question. It wasn't necessary to be a dick about it.

Also, you might be interested to read a bit about MySQL Cluster which is different from their replication solution. Pretty neat stuff.

Also, I do agree with you that GP gave no indication that MySQL was failing to meet their needs. MySQL doesn't meet everybody's needs and neither does pgsql (or Oracle for that matter). But changing databases for "fun" is a horrible use of resources.

--
They don't grade fathers, but if your daughter's a stripper, you fucked up. --Chris Rock

Re:Don't forget by Anonymous Coward · 2006-12-09 06:20 · Score: 0

MySQL Cluster? Check this link:
http://forums.mysql.com/read.php?25,93181,93181

It isn't limited to a single transaction by brokeninside · 2006-12-06 03:38 · Score: 1

With Oracle and SQL Server you can turn off constraints for an indefinite period of time. It may be used mostly for single bulk transactions but I've seen it used for months at a time on Oracle.

Re:It isn't limited to a single transaction by kabz · 2006-12-07 07:17 · Score: 1

Why not just uninstall Oracle and use MySQL? ;-)

--
-- "It's not stalking if you're married!" My Wife.

Re:Real Men don't use Window Functions by theelectron · 2006-12-06 05:04 · Score: 1

Oh, ok I admit, I completely missed that and deserve a good flogging. He makes a good point when read in a sarcastic light.

Missing? by lucifron · 2006-12-06 05:52 · Score: 1

I'm sure you could cope without, but it seems like this is actually included in some shape or form.. The release notes mention:

Aggregate-function improvements, including multiple-input aggregates and SQL:2003 statistical functions

But does that help? by Christopher+B.+Brown · 2006-12-06 07:37 · Score: 1

It's all well and good to add a "be careful with my data now" setting, but that's pretty much a pyrrhic victory if it makes all the old applications break...

If using this setting requires major remedies to revise applications and retune them, that may be no less work than redeploying on something that has mature support for data integrity...

--
If you're not part of the solution, you're part of the precipitate.

Re:Real Men don't use Window Functions by larry+bagina · 2006-12-06 09:59 · Score: 2, Interesting

MySQL has made those claims:

Earlier versions of the MySQL manual included claims that certain missing features (considered essential for SQL-compliant RDBMSs) were useless or even harmful, and that users were better off without them. One section, entitled "Reasons NOT to use Foreign Keys constraints" [sic], advised users that relational-integrity checking was difficult to use and complicated a database application, and that its only useful purpose was to allow client software to diagram the relationships between database tables. [13] Another section claimed that a DBMS lacking transactions can provide data-integrity assurances as reliably as one supporting transactions--conflating the issue of transactional integrity with that of saving data when the database server loses power. [14] Since these claims contradicted basic principles of relational database design, they caused MySQL to be ridiculed by some database experts. Regardless of whether they were right or not, these claims are omitted in more recent versions of the manual. MySQL today allows some support for previously-dismissed features of relational integrity checking and transactions.

(From Wikipedia and archived MySQL manuals)

--
Do you even lift?

These aren't the 'roids you're looking for.

A few different reasons by brokeninside · 2006-12-08 01:40 · Score: 1

On a twenty-server cluster of SPARC kit with tens of processors each, MySQL can't touch Oracle for distributed processing and load balancing. Second, the point of dropping constraints for a few months was to allow the database design to reach maturity in a development environment. Before the project reached production, constraints were turned back on.

PhpMyAdmin Equiv? by Tablizer · 2006-12-08 19:46 · Score: 1

Does PostgreSQL have something like phpMyAdmin, the web-browser-based front-end to MySQL? I got kinda used to it for creating and managing MySQL schemas.

--
Table-ized A.I.

Re:PhpMyAdmin Equiv? by twoblink · 2006-12-09 20:23 · Score: 0

http://phppgadmin.sourceforge.net/

YES.

147 comments