PostgreSQL 9.2 Out with Greatly Improved Scalability

Re:/. Poll by Eponymous+Hero · 2012-09-10 11:24 · Score: 4, Insightful

E) stop using oracle and start using postgres

--
insensitive clod overlords obligatory xkcd car analogy russian reversals whoosh pedant fanbois ftfy in 3...2...1..PROFIT

Re:LOL by Anonymous Coward · 2012-09-10 11:30 · Score: 1, Informative

Because we love to bash our keyboards into so much plastic scrap whenever we come across one of its many standards-defiant idiosyncracies?

verson 9.2 by Anonymous Coward · 2012-09-10 12:01 · Score: 1

From the summary:

"9.1 adds range restricted types"

nice proof reading...

That's great and all, but . . . by Mitchell314 · 2012-09-10 12:19 · Score: 2, Funny

When are they going to come out with the feature where it installs on OS X without requiring a human sacrifice? :P

--
I read TFA and all I got was this lousy cookie

Re:That's great and all, but . . . by Anonymous Coward · 2012-09-10 12:31 · Score: 5, Informative

9.3. Seriously.
http://rhaas.blogspot.com/2012/06/absurd-shared-memory-limits.html
Re:That's great and all, but . . . by Anonymous Coward · 2012-09-10 12:57 · Score: 3, Informative

http://postgresapp.com/
Re:That's great and all, but . . . by Above · 2012-09-10 12:58 · Score: 1

What a seriously sensible and simple solution. If I could mod up, I would, but I can't so I will reply.
Re:That's great and all, but . . . by dragonk · 2012-09-10 13:17 · Score: 5, Informative

I just posted this to the blog, but I will repeat it here --
There is a very good reason we OS vendors do not ship with SysV default limits high enough to run a serious PostgreSQL database. There is very little software that uses SysV in any serious way other than PostgreSQL and there is a fixed overhead to increasing those limits. You end up wasting RAM for all the users who do not need the limits to be that high. That said, you are late to the party here, vendors have finally decided that the fixed overheads are low enough relative to modern RAM sizes that the defaults can be raised quite high, DragonFly BSD has shipped with greatly increased limits for a year or so and I believe FreeBSD also.
There is a serious problem with this patch on BSD kernels. All of the BSD sysv implementations have a shm_use_phys optimization which forces the kernel to wire up memory pages used to back SysV segments. This increases performance by not requiring the allocation of pv entries for these pages and also reduces memory pressure. Most serious users of PostgreSQL on BSD platforms use this well-documented optimization. After switching to 9.3, large and well optimized Pg installations that previously ran well in memory will be forced into swap because of the pv entry overhead.
Re:That's great and all, but . . . by cas2000 · 2012-09-10 13:40 · Score: 4, Funny

you atheists love to take all the fun out of things, don't you?
Eliminate the human sacrifice now and next you'll be saying we have to get rid of our Steve Jobs altars.
Re:That's great and all, but . . . by Anonymous Coward · 2012-09-10 14:39 · Score: 0

I figure you might know. I can't find out why pSQL uses shared memory in the first place. What info I can find says "to store cached data", but what benefit does pSQL get sharing cached data with other programs? Wouldn't that be a security hole?
Re:That's great and all, but . . . by ls671 · 2012-09-10 15:09 · Score: 1

My guess is that it is only used for postgresql clusters setup, allowing different instances of postgresql to chat together as said in the blog.

--
Everything I write is lies, read between the lines.
Re:That's great and all, but . . . by colinrichardday · 2012-09-10 15:29 · Score: 2

Get rid of your Steve Jobs altars!
Re:That's great and all, but . . . by schmiddy · 2012-09-10 17:02 · Score: 4, Insightful

There is a serious problem with this patch on BSD kernels. All of the BSD sysv implementations have a shm_use_phys optimization which forces the kernel to wire up memory pages used to back SysV segments. This increases performance by not requiring the allocation of pv entries for these pages and also reduces memory pressure. Most serious users of PostgreSQL on BSD platforms use this well-documented optimization. After switching to 9.3, large and well optimized Pg installations that previously ran well in memory will be forced into swap because of the pv entry overhead.

I don't see your comment on the blog (maybe it has to be approved?), but the same issue was raised here during review of the patch. The concern was mostly blown off (most PG developers use Linux instead of BSD, that might well be part of it), but if you had some numbers to back up your post, the -hackers list would definitely be interested. Ideally, you could give numbers and a repeatable benchmark showing a deterioration of 9.3-post-patch vs. 9.3-pre-patch on a BSD. If that's too much work, just the numbers from a dumb C program reading/writing shared memory with mmap() vs. SysV would be a good discussion basis.

--
http://cltracker.net -- powerful craigslist multi-city search
Re:That's great and all, but . . . by Anonymous Coward · 2012-09-10 17:50 · Score: 0

One could mlock() the mmap()-ed region.
Re:That's great and all, but . . . by gazbo · 2012-09-10 21:44 · Score: 3, Informative

Each client connected to the DB has its own child process - the shared memory is a buffer that is shared across postgresql child PIDs with the same parent. That's why the proposed patch would work using an anonymous shared memory segment - because the memory is only passed to children of the same process.
Re:That's great and all, but . . . by fuzzytv · 2012-09-10 22:12 · Score: 1

Because it's not just caching ...
Most of the shared memory is usually reserved for shared buffers, i.e. cached blocks of data files - this is something like a filesystem cache (and yes, some data may be cached twice) with the additional infrastructure for shared access to these blocks (especially for write), and so on. But there's more that needs to be shared - various locks / semaphores etc. info on connections, cluster-wide caches (not directly files) etc.
I'm not saying some of this can't be done using a page cache in Linux or something, but relying on that would make the whole database much more difficult to port to various OS (right now it has almost zero knowledge of the file system / cache beneath it). So it has pros and cons, and the pros of using shared memory outweight the cons.
Re:That's great and all, but . . . by Anonymous Coward · 2012-09-10 22:45 · Score: 0

You mean - http://postgresapp.com ?
Re:That's great and all, but . . . by Anonymous Coward · 2012-09-10 23:33 · Score: 0

`brew install posgresql` worked flawlessly for me.
Re:That's great and all, but . . . by Anonymous Coward · 2012-09-11 00:20 · Score: 0

A process per client doesn't sound very scalable since each client would have to consume at least one thread. Ideally, any single service/application should NEVER have more threads than there are n+1 logical CPUs.

TY for the info. Lots of good responses to my question.
Re:That's great and all, but . . . by gazbo · 2012-09-11 00:52 · Score: 3, Informative

Well...arguably. This is the exact same argument as Apache vs Nginx, where Apache spawns a child process per client, whereas Nginx has a limited number of worker processes that handle a queue of requests as they become free. Nginx definitely has an advantage in terms of RAM when servicing thousands of (truly) simultaneous requests.
While Postgresql does use the Apache model, there is middleware available (google 'pgpool' for an example) that amongst other things will queue requests so they can be serviced by a limited number of children. Of course this only matters if there are an awful lot of simultaneous queries (without the corresponding amount of server RAM).
However; your claim about threads per CPU is oversimplified, and especially wrong with a DB server where processes will most likely be IO bound. With 1 core, for example, there is nothing wrong with having 5 processes parsing and planning a query for a few microseconds, while the 6th is monopolising IO actually retrieving query results. Or the reverse - having 1 CPU-bound process occasionally being interrupted to service 5 IO bound processes, which would negligibly impact the CPU-bound query, while hugely improving latency on the IO bound queries.
Re:That's great and all, but . . . by TheLink · 2012-09-11 02:38 · Score: 1

Ideally, any single service/application should NEVER have more threads than there are n+1 logical CPUs.
In the ideal world you'll never have more nonparallelizable tasks than you have CPUs.
However in the real world you often do. It is usually better for the application developers to focus on having their application solve the application related problems, and let the OS take care of the multitasking and other OS related problems.
A process per client also means that if a process crashes it is less likely to affect other clients. And if there are memory leaks for whatever weird/stupid reason, if you close that process, it frees up the mem and does not affect other processes. Compare Google Chrome and Firefox - Chrome often actually uses more memory for a given set of pages, but just close the unwanted tabs and windows and the mem is freed up, with Firefox sometimes due to plugins, bugs etc you have to close the ENTIRE browser to free up the memory. You can usually get away with having a DB process quit every now and then, but people are more likely to notice if you keep restarting the entire DB.
Of course if you do ever need 1000 or more simultaneous DB connections you probably need a different solution than a single server running plain Postgresql.
--
- Too many replies beneath your current threshold
Re:That's great and all, but . . . by Anonymous Coward · 2012-09-11 03:21 · Score: 1

However; your claim about threads per CPU is oversimplified, and especially wrong with a DB server where processes will most likely be IO bound. With 1 core, for example, there is nothing wrong with having 5 processes parsing and planning a query for a few microseconds, while the 6th is monopolising IO actually retrieving query results. Or the reverse - having 1 CPU-bound process occasionally being interrupted to service 5 IO bound processes, which would negligibly impact the CPU-bound query, while hugely improving latency on the IO bound queries.
That is the old non-async model. Coroutines and async IO running one thread per CPU will dramatically increase performance under-load over one Process or thread per task.

The only work that shouldn't be done as a coroutine(co-op multi-tasking) is anything that needs guaranteed time, like management threads. But any dumb work getting done should just be part of a pool.

Well, the model I described really only works if you only care about throughput. Assuming you're IO bound and not CPU bound, latency shouldn't be an issue.

Most modern servers are IO bound well before their 32+ hardware threads get pegged.

While nginx does do many things as async, it's not a perfect design. They still use child processes and copy data via IPC, which is dramatically worse than a single process with threads.

Amdahl's law is essentially based on how much overhead is caused by resources sharing. Copying data via IPC and context switching is much worse than passing pointers and using lock-less designs. you may not see the issue under low load, but over-load a machine and watch it crumble. A properly designed program should have almost no negative scaling past 100% load. Negative scaling is what crashes servers.. that and running out of virtual memory.
Re:That's great and all, but . . . by Anonymous Coward · 2012-09-11 04:48 · Score: 0

Of course if you do ever need 1000 or more simultaneous DB connections you probably need a different solution than a single server running plain Postgresql.
Funny you say that. We use MSSQL at work and having 1500+ connections to a DB is quite normal. We have A LOT of web servers and the DBs are rarely the bottleneck(complex Business logic), and independent customer DBs. Our biggest issue is feeding our dual-socket 8 core Intel Xeons with enough IO. Two or three fiber channel connections to the SAN with local 15k SAS drives for temp and logs and 48GB-64GB of ram, and the CPUs are mostly idle.

Your main point about separate processes to help alleviate the almost inevitable human mistakes(mem leaks, crash, etc), is what I did not think about. Thanks for pointing this out.
Re:That's great and all, but . . . by TheLink · 2012-09-11 05:26 · Score: 1

FWIW MSSQL defaults to 255 worker threads, which is likely to be more than the number of logical CPUs in most servers.

If you're the OP AC, you can try reduce your max worker threads to "n+1 logical CPUs" on a 1500 connection test DB server and see if the DB performs better.

I doubt it will. The thing is a thread of execution is a useful concept for a programmer - you set up a thread to handle each task and let the OS worry about multiplexing efficiently across logical/physical/whatever CPUs. Same goes for processes. Trying to squeeze out more performance by doing the OS's job should only be reserved for very specific cases. One day the rules could change (hot-pluggable/asymmetric CPUs etc), you'd then have to modify the app if you did the OS's job, whereas if you didn't you just wait for OS people to issue an update, and then you get the performance increase.

Similarly some DBs have an option to bypass the OS filesystem stuff (e.g. Oracle RAW device). But I don't think most sane admins would want to do that. Having the OS take care of it allows you to do a lot of nice stuff.
--
- Too many replies beneath your current threshold
Re:That's great and all, but . . . by Anonymous Coward · 2012-09-11 05:29 · Score: 0

Actually, PostgreSQL is very, very scalable. Furthermore, the process approach also improves stability and reliability. Those are things most people don't learn about threads. Threading use classically lends itself to resource leaks and deadlocks. In PostgreSQL's process model, every connection resets everything. As a result, its more less likely to suffer from leaks and deadlocks.
Basically, the only issue PostgreSQL has ever suffered from the process vs thread model is the simple fact processes tend to take longer to spawn. As such, in high connection rate environments, PostgreSQL suffers. The solution is, however, readily available in a number of forms. Generally they are connection caching middle wares. This in turn can dramatically reduce connection spawns as a bottleneck. Generally speaking, done properly, its never an issue.
Re:That's great and all, but . . . by dragonk · 2012-09-11 07:31 · Score: 1

http://dl.wolfpond.org/benchs/Pg-benchmarks.2011-11.pdf
See page 3, the shm_use_phys #'s vs the other for FreeBSD. DragonFly does not see this hit because we _excessively_ cache pv entries (it would be nice if we could dial this back).
The performance difference here is identical to what this patch will cause.
Re:That's great and all, but . . . by m.dillon · 2012-09-11 08:06 · Score: 1

Here's the problem in a nutshell... any memory mapping that is NOT a sysv shm mapping with the use_phys sysctl set to 1 requires a struct pv_entry for each pte.
Postgres servers FORK. They are NOT threaded. Each fork attaches (or mmap in the case of this patch) the same shared memory region, but because the processes fork instead of thread each one has a separate pmap for the MMU.
If you have 60 postgres forked server processes each mapping, say, a 6GB shared memory segment and each trying to fault in the entire 6G, that is 60 x 6GB worth of page tables, around 87 MILLION page table entries, and thus 87 MILLION struct pv_entry structures.
This is why it blows up. That's just too much. The sysv use_phys hack removes the need for a pv_entry structure. Thus removing several gigabytes of memory that the kernel would otherwise have to allocate to track all those pv_entry's.
There are optimizations that can be done to reduce memory use in the BSDs, but the main problem here is that postgres is using fork() instead of threads. If it used threads the management overhead would be 1/60th what it is now and having 1/60'th the pv_entries would not be a big deal.
-Matt
Re:That's great and all, but . . . by m.dillon · 2012-09-11 08:43 · Score: 3, Informative

I don't think this is true any more. Threads are light weight... that's the whole point. They all share the same pmap (same hardware page table). Switching overhead is very low compared to switching between processes.
The primary benefit of the thread is to allow synchronous operations to be synchronous and not force the programmer to use async operations. Secondarily, people often don't realize that async operations can actually be MORE COSTLY, because it generally means that some other thread, typically a kernel thread, is involved. Async operations do not reduce thread switches, they actually can increase thread switches, particularly when the data in question is already present in system caches and wouldn't block the I/O operation anyway.
There is no real need to match the number of threads to the number of cpus when the threads are used to support a synchronous programming abstraction. There's no benefit from doing so. For scalability purposes you don't want to create millions of threads (of course), but several hundred or even a thousand just isn't that big a deal.
In DragonFly (and in most modern unix's) the overhead of a thread is sizeof(struct lwp) = 576 bytes of kernel space, +16K kernel stack, +16K user stack. Everything else is shared. So a thousand threads has maybe ~40MB or so of overhead on a machine that is likely to have 16GB of ram or more. There is absolutely no reason to try to reduce the thread count to the number of cpu cores.
--
There are two reasons for using lock memory for a database cache. The biggest and most important is that the database will be accessing the memory while holding locks and the last thing you want to have happen is for a thread to stall on a VM fault paging something in from swap. This is also why a database wants to manage its own cache and NOT mmap() files shared... because it is difficult, even with mincore(), to work out whether the memory accesses will stall or not. You just don't want to be holding locks during these sorts of stalls, it messes up performance across the board on a SMP system.
Anonymous memory mmap()'s can be mlock()'d, but as I already said, on BSD systems you have the pv_entry overhead which matters a hell of a lot when 60+ forked database server processes are all trying to map a huge amount of shared memory.
Having a huge cache IS important. It's the primary mechanism by which a database, including postgres, is able to perform well. Not just to fit the hot dataset but also to manage what might stall and what might not stall.
In terms of being I/O bound, which was another comment someone made here... that is only true in some cases. You will not necessarily be I/O bound even if your hot data exceeds available main memory if you happen to have a SSD (or several) between memory and the hard drive array. Command overhead to a SSD clocks in at around 18uS (verses 4-8mS for a random disk access). SSD caching layers change the equation completely. So now instead of being I/O bound at your ram limit, you have to go all the way past your SSD storage limit before you truly become I/O bound. A small server example of this would be a machine w/16G of ram and a 256G SSD. Whereas without the SSD you can become I/O bound once your hot set exceeds 16G, with the SSD you have to exceed 256G before you truly become I/O bound. SSDs can essentially be thought of as another layer of cache.
-Matt
Re:That's great and all, but . . . by dragonk · 2012-09-11 12:06 · Score: 1

This is the biggest issue, not the direct performance hit, but the memory overhead. A pv_entry on 64-bit DragonFly is 80 bytes. 60 clients sharing a 6gb segment is not entirely unrealistic, this is like 7GB of pv_entry overhead.
Re:That's great and all, but . . . by m.dillon · 2012-09-11 15:21 · Score: 1

Yes, and that is precisely what happens. But it means that we had to size-down the shared-memory segment in order to take into account that the machine had 7GB less memory available with that many servers running.
There is a secondary problem here... not as bad, but still bad, and that is the fact that each one of those servers has to separately fault-in the entire 6GB. That's a lot of faults. There would be 1/60th as many faults if the servers were threaded. This is a secondary problem because it only really matters in the warmup phase. Once the pages are faulted in you're done. It doesn't seem to effect performance much on DFly (mainly because our VM fault path is fine-grain locked for SMP now), but it does effect cpu use.
There are a couple of possible solutions. We can take the linux route of algorithmically reverse-mapping via higher-level data structures (vm_map_entry in our case), but this leads to inefficiencies in other areas since more page tables have to be scanned than necessary.
FreeBSD uses a much smaller pv_entry structure than we do, but that doesn't really fix the problem, it just allows one to scale a little bit bigger before you hit it again.
There's also using 2MB pages, but that has fragmentation issues if they aren't dedicated at boot time. FreeBSD has been experimenting with those.
Theoretically one could aggregate pv_entry's to cover more than one page, but that is even more complex than the normal PV scheme.
Finally there's the solution that I think I'm going to try to implement for DragonFly, which is to detect shared mappings that are multiples of the segment size which are mapped the same way (R+W), and allow those page table pages to be shared across UNRELATED pmaps. That is, attach the page table pages to the related VM objects. This solves the problem neatly with the only downside being when/if a PTE has to be removed, which since the pmap's are unrelated requires synchronizing the invlpg instruction across all cpus on the system to avoid various cpu bugs and races.
This latter solution would work equally well for SysV SHM and mmap(... MAP_SHARED), and work whether a server forks or threads. It lets us keep our fine-grained pv_entry abstraction. This is the one I'm going for.
-Matt

Range data types by slack_justyb · 2012-09-10 13:41 · Score: 5, Interesting

I think everyone has glossed over the single most important feature in the Postgre SQL that they have refined in this release, IMHO. Ranged data types. Let's say you have a meeting schedule DB application. Well currently if you want to restrict a room between two times (start and stop) so that no one else can have the room during that time, you are going to have to write that logic in your application.

Postgre's range data type allows you to create unique checks on ranges of time. This can in two lines of code, do every single logic check that is needed to ensure no two people schedule the same room at the same time.

How this is not showing up on anyone's radar is beyond me, or maybe we all just use Outlook or Google Calendar now. However, the range types are not just limited to the application of time, but of anything that requires uniqueness along a linear fashion, as opposed to just checking to see if any other record matches the one that you are trying to insert.

Re:Range data types by Anonymous Coward · 2012-09-10 14:18 · Score: 0

I'm shocked it's not already easy to do with a fairly simple constraint and two time fields.
Re:Range data types by Anonymous Coward · 2012-09-10 14:52 · Score: 0

> Well currently if you want to restrict a room between two times (start and stop) so that no one else can have the room during that time, you are going to have to write that logic in your application.
So you want to do that in the DB now. Will you have to change column definition to change that range? For example, somebody already booked several hours in that room. I can see range feature can be useful but don't see relationship to your scheduling example.
Re:Range data types by nebosuke · 2012-09-10 15:23 · Score: 2

It seems that you're misunderstanding the definition of a range datatype in this context. The data type of the column in the scheduling example would be defined as a timestamp range, and the constraint on the column would be that no timestamp range value can overlap with any other timestamp value in the table (or in the table for any rows that share a key value, such as user ID). There is no need to alter the column definition to accomodate changes to scheduling data.
Re:Range data types by nebosuke · 2012-09-10 15:38 · Score: 2

Strictly speaking, at minimum it would require two time fields and two boolean fields, with each boolean field specifying whether or not the interval is inclusive of each corresponding end point. It would also require a lot more than one simple constraint to get the desired behavior provided by the new datatype--and the whole mess would need to be repeated for every single interval with a simple exclusivity constraint. The new range datatype also makes it relatively simple to, e.g., specify a non-zero overlap constraint, which is significantly harder without it.
Re:Range data types by Kergan · 2012-09-10 20:19 · Score: 3, Informative

Oh, it's simple enough to do with two separate fields and a check constraint. That's how you'd do it i other DB engines, in fact.
Ensuring there are no overlaps is an entirely different story, however: queries against those two fields cannot make any reasonable use of an index. The ranged type, by contrast, allows you to query the data using a nearest neighbour search and a GiST index.
Think of a GiST index as indexing the smallest boxes that enclose your shapes of interest. When queried, the DB scans for boxes that overlap your box of interest, and discards rows that don't match the data's actual shape.
Re:Range data types by DragonWriter · 2012-09-11 02:05 · Score: 1

I think everyone has glossed over the single most important feature in the Postgre SQL that they have refined in this release, IMHO. Ranged data types.
TFS apparently (from the link, which goes to range datatypes) meant to refer to them when it made the comment about "range-restricted datatypes".
Re:Range data types by DragonWriter · 2012-09-11 02:12 · Score: 2

So you want to do that in the DB now. Will you have to change column definition to change that range?
No, the range is data, not part of the column definition, I would say "RTFA", but to be fair the link was mislabelled in TFS as being about "range-restricted types", rather than range types.
But here's the docs on range types. The scheduling use case is the basic example of exclusion constraints on range types (Sect 8.17.10 in the linked doc.)
Re:Range data types by stonecypher · 2012-09-11 03:16 · Score: 1

Strictly speaking, at minimum it would require two time fields and two boolean fields, with each boolean field specifying whether or not the interval is inclusive of each corresponding end point.
Yeah, or, maybe you could go look at how range fields work, and find out that that's part of the range field already.

It would also require a lot more than one simple constraint to get the desired behavior provided by the new datatype
No, actually, it wouldn't, pretend-o-saurus. It's called "an exclude constraint."

and the whole mess would need to be repeated for every single interval with a simple exclusivity constraint.
"You're wrong about this system I've never looked at because the first implementation that popped to mind wouldn't cover these obvious cases, and it's apparently beyond me that that might mean that I've guessed wrong how the feature works, instead of that you're an idiot."
Fake mode off, please.

--
StoneCypher is Full of BS
Re:Range data types by Dewin · 2012-09-11 06:41 · Score: 1

Optimization of a constraint involving date ranges is a bit more difficult than you might think, and having it as one unified type makes queries a lot cleaner and indexes a lot more efficient (if done as GiST indexes anyways)
Old: WHERE (a.starttime BETWEEN b.starttime AND b.endtime OR b.starttime BETWEEN a.starttime AND a.endtime)
New: WHERE a.timerange @@ b.timerange
The speedup when you're doing things like trying to find overlaps between two lists of tens of thousands of ranges each is phenomenal.
Before 9.2, I did this (for timestamp ranges only) using Jeff Davis's Temporal Extensions for PostgreSQL, which I've submitted a few patches to.

--
Of course nobody reads the FAQ! If people read the FAQ, the Questions wouldn't be so Frequently Asked.
Re:Range data types by nebosuke · 2012-09-11 07:37 · Score: 1

Not sure if you're a troll, but, in case you're being serious and just suffered a reading comprehension failure or skipped over the ggp post because you're browsing at >0, I suggest that you re-read the thread.
Re:Range data types by DragonWriter · 2012-09-12 01:49 · Score: 1

Optimization of a constraint involving date ranges is a bit more difficult than you might think, and having it as one unified type makes queries a lot cleaner and indexes a lot more efficient (if done as GiST indexes anyways)
Old: WHERE (a.starttime BETWEEN b.starttime AND b.endtime OR b.starttime BETWEEN a.starttime AND a.endtime)
New: WHERE a.timerange @@ b.timerange
Also, strictly speaking, you can't do the first one as a constraint at all (you can do it as a query condition, or enforce a constraint-like behavior through a series of triggers using queries with that condition.) OTOH, the latter form you can do as an exclusion constraint in 9.2, which -- in addition to any performance improvements -- is a lot more direct and expressive.

Before 9.2, I did this (for timestamp ranges only) using Jeff Davis's Temporal Extensions for PostgreSQL, which I've submitted a few patches to.
Which really is the direct predecessor of 9.2's Range Type support (from the same developer, too.)
Re:Range data types by Dewin · 2012-09-12 03:21 · Score: 1

Before 9.2, I did this (for timestamp ranges only) using Jeff Davis's Temporal Extensions for PostgreSQL, which I've submitted a few patches to.
Which really is the direct predecessor of 9.2's Range Type support (from the same developer, too.)

Heh, I should have guessed.
It was a few years ago, but we actually tossed around some ideas on a standard format for applying the range concepts to types besides timestamps. One of the issues then was that a small handful of built-in types have a notion of infinity/-infinity, but some (e.g. ints) do not -- yet ranges really need to support the notion, even a range of ints. It looks like the 9.2 implementation uses its own definition for -infinity/+infinity unrelated to the type in question, which thinking about it now might not have been the best decision (at least, without the ability to define that [,y] and [-infinity,y] are synonymous) since -infinity (as defined by range types) is less than all other values including -infinity (as defined by the contained type).

--
Of course nobody reads the FAQ! If people read the FAQ, the Questions wouldn't be so Frequently Asked.

Postgres-Curious by kwalker · 2012-09-10 13:42 · Score: 4, Interesting

TL;DR: Is there an advanced PostgreSQL for MySQL Users guide out there somewhere? Something more than basic command-line equivalents? And preferably from the last two major releases of the software?

Long version
I've been using MySQL personally and professionally for a number of years now. I have setup read-only slaves, reporting servers, multi-master replication, converted between database types, setup hot backups (Regardless of database engine), recovered crashed databases, and I generally know most of the tricks. However I'm not happy with the rumors I'm hearing about Oracle's handling of the software since their acquisition of MySQL's grandparent company, and I'm open to something else if it's more flexible, powerful, and/or efficient.

I've always heard glowing, wonderful things online about PostgreSQL, but I know no one who knows anything about it, let alone advanced tricks like replication, performance tuning, or showing all the live database connections and operations at the current time. So for any Postgres fans on Slashdot, is there such a thing as a guide to PostgreSQL for MySQL admins, especially with advanced topics like replication, tuning, monitoring, and profiling?

--
... And so it comes to this.

Re:Postgres-Curious by Anonymous Coward · 2012-09-10 14:35 · Score: 0

The "new" replication thingy in postgresql 9.x is ... new. You need to wait a bit for those HOWTO guide.
But the official documentation is complete, it contain everything you needs.
Re:Postgres-Curious by Art3x · 2012-09-10 15:14 · Score: 4, Informative

PostgreSQL replication is new (revision 9.1) so there may be little out there (Yes, there was replication, but with additional software, like Slony).
I'm in the weird position of having used PostgreSQL mainly --- for seven years, writing dozens of applications --- but never MySQL. I've also used --- out of necessity only --- Microsoft SQL, Oracle, and Ingres, and PostgreSQL is much better. Just from a programming point of view, the syntax is, in my mind, simpler yet more powerful --- more ANSI-SQL-compliant, too, I've heard.
Anyway, the point is, I've never used anything I like more. I adore PostgreSQL. It's so powerful. So many useful datatypes, functions, syntax. Not to mention it's ACIDity.
To your question, though --- are there any good books to help a MySQLite move to PostgreSQL? Not that I've come across. But then again, I haven't found any good PostgreSQL books --- or even, for that matter, very well-written SQL books, period. They all are stupefyingly boring --- but I got what I could out of them.
Actually, PostgreSQL's documentation is not that bad. In particular, try sections I, II, V, VI, and III, in that order. Skip anything that bores you at first. You can always come back. Honestly, there can't be that much of a learning curve for you, coming from MySQL.
Re:Postgres-Curious by poet · 2012-09-10 16:07 · Score: 2

9.0 was the first version with replication, not 9.1 and we have had things like warm standby since 8.1.

--
Get your PostgreSQL here: http://www.commandprompt.com/
Re:Postgres-Curious by gullevek · 2012-09-10 16:29 · Score: 2

There are two PostgreSQL books I used a lot in the past: PostgreSQL 9.0 High Performance by Gregory Smith (Packt) and PostgreSQL Second Edition by Douglas Douglas (O'Reilly).
There is an extended list of books listed on the PostgreSQL homepage: http://www.postgresql.org/docs/books/
Problem with all books is, they get outdated too quickly. While a lot of the basic info is still true for the books above, the O'Reilly book is very much based on 8.4 with is pretty ancient already. Perhaps getting an ebook is less a waste of paper.

--
"Freiheit ist immer auch die Freiheit des Andersdenkenden" - Rosa Luxemburg, 1871 - 1919
Re:Postgres-Curious by Anonymous Coward · 2012-09-10 16:39 · Score: 0

Native replication is a fairly recent addition to Postgresql. It is definitely one thing that MySQL has better options with. From a tech talk I listened in on, online, it just seemed kludgey. It seemed to me the way Postgres does logging makes it a little more challenging. I like Postgres for all its other strengths and use it, but they should focusing on things like replication, materialized views, and other more core business friendly features. I think it will be a few more releases before people won't have to rely on third party software for serious high availability stuff. I say this only because I expected it to be a major feature of this release since it is still relatively new and had a way to go when first released. I sure hope they don't think they have something that is actually good for serious work in that regard.
Re:Postgres-Curious by rycamor · 2012-09-10 17:17 · Score: 4, Informative

Unfortunately, I haven't found a really good guide of the type you are looking for. I can give you my experiences, going from MySQL to PostgreSQL, back to MySQL to support it at a large company, and then back to PostgreSQL. Generally, these days there is really *nothing* that I can find about MySQL that can't be done better in PostgreSQL. I mean it. At least for awhile MySQL could boast of native replication, but Postgres biw has that and it is arguably much more robust than MySQL's solution (had the misfortune to support MySQL replication for 2 years). Ditto with full-text indexing, and just about any other MySQL feature.
Main differences:
1. PostgreSQL is much more "correct" in how it handles data and has very little (essentially no) unpredictable or showstoppingly odd behavior of the sort you find in MySQL all the time. Your main problem in migrating an app to PostgreSQL will be all those corner cases that MySQL just "accepts" when it really shouldn't, such as entering '0000-00-00' into a date field, or allowing every month to have days 0-31. In other words, PostgreSQL forces you to be a lot more careful with your data. Annoying, perhaps, if you are developing a non-mission-critical system like a web CMS or some such, but absolutely a lifesaver if you deal with data where large numbers of dollars and cents (or lives) depend on correct handling.
MySQL has provided for a fair amount of cleanup for those who enable ANSI standard behavior, but it is still nowhere close to PostgreSQL's level of data integrity enforcement.
2. MySQL has different table types, each of which support different features. For example, you cannot have full-text indexing in InnoDB (transactional) tables. PostgreSQL has complete internal consistency in this regard.
3. MySQL has an almost entirely useless error log. PostgreSQL's can be ratcheted up to an excruciating level of detail, depending on what you want to troubleshoot. Ditto with error messages themselves.
4. MANY MANY more choices in datatypes and functions to manipulate them. Definitely a higher learning curve, but worth it for expressive capability.
5. Don't get me started on performance. Yes, if you have a few flat tables, MySQL will be faster. Once you start doing anything complicated, you are in for a world of pain. Did you know that MySQL re-compiles every stored procedure in a database on every new connection? PHP websites with per-page-load connections can really suffer.
6. Don't get the idea that PostgreSQL is more complex to work with. If you want simple, you can stick with the simple parts, but if you want to delve into complex database designs and methodologies, PostgreSQL pretty much opens up the world to you.
- Glad to be back in the PostgreSQL world...
Re:Postgres-Curious by Jay+L · 2012-09-10 18:22 · Score: 1

Greg Smith's book "High-Performance SQL" is a good start.
Re:Postgres-Curious by aralin · 2012-09-10 19:00 · Score: 1

If you look for a good SQL programming book, the PL/SQL book from Oracle is the best book written in this area, IMHO. As for the MySQL to PostgreSQL book, there was no incentive to write it for PostgreSQL power users. We mostly looked over the time at MySQL as toy database and it's users as at best misguided and at worst, not caring about data integrity (cardinal sin in my book). So writing such book would be sort of like "Black Hat Hacking for Script Kiddies". Sure it could be done, but who wants a bunch of amateurs crowding their field? Next thing you know, they will ask for bugfix to treat NULL like zero. :)

--
If programs would be read like poetry, most programmers would be Vogons.
Re:Postgres-Curious by fuzzytv · 2012-09-10 22:31 · Score: 1

No, I'm not aware of such thing ("PostgreSQL for MySQL people" style guide).
The best thing you can do is give it a ride - install it, use http://www.postgresql.org/docs/9.1/interactive/admin.html to do the setup etc.
Basically all you need to do to install and start the PostgreSQL from source code is this (at least on Linux):
$ cd postgresql-9.1.5
$ ./configure --prefix=/path-to-install
$ make install
$ export PATH=/path-to-install/bin:$PATH
$ pg_ctl -D /database-directory init ... fiddle with the config at /database-directory/postgresql.conf
$ pg_ctl -D /database-directory -l /database-directory/postgresql.log start
and then
$ createdb testdb
$ psql testdb
and you're in. If you're installing that from a package (e.g. RPM in RedHat), it might work a bit differently - depends on the packager.
Anyway, use the project mailing lists and IRC channel - there's always someone ready to help / answer novice or complex questions etc.
Re:Postgres-Curious by fuzzytv · 2012-09-10 22:41 · Score: 3, Informative

Well, recommending a PL/SQL book as a source for learning SQL is a bit silly IMHO. Moreover, I find the books from Oracle rather bad - there are better sources to learn PL/SQL (e.g. the one from Feuerstein is a much better book).
And in fact there's a great book about administering PostgreSQL from Hannu Krosing - it's called "PostgreSQL 9 Admin Cookbook" [http://www.packtpub.com/postgresql-9-admin-cookbook/book]. It's a great set of recipes for admins for common tasks, not an exhaustive documentation (that's what http://www.postgresql.org/docs/9.1/interactive/index.html is for), but if you want to learn how real pros admin the database, this is the right choice. And yes, I'd recommend it to newbies coming from MySQL.
It might seem that the PostgreSQL community considered MySQL to be a toy database in the past, but it definitely was not a generally shared view. And this definitely changed recently - there's no reason not to join the community mailing lists / IRC channel and start a post with "I'm using a MySQL right now and I don't understand why PostgreSQL does SOMETHING."
Re:Postgres-Curious by fuzzytv · 2012-09-10 22:53 · Score: 1

What's wrong with third-party stuff? I mean, looking bad it was silly to expect this to happen with replication (third-party replication solutions, not included in the core), but with the management tools this should not be a problem - there are already tools like repmgr and more to come. The problem with in-core tools is that they hard-code a single way to do things the release cycle is tightly bound to the PostgreSQL itself and it's a significant effort for the whole community.
Regarding the replication - it's still a relatively new feature (although two years is a long time), so it has some rough edges, but I wouldn't call it kludgey at all. It's very nicely thought-out and crafted feature. Once you get an idea of how it works / how to set it up, it's fairly simple to do that again and even write a bunch of simple scripts to manage it.
And yeah, there are many features we'd like to see in PostgreSQL, but the community simply has a limited manpower and it's using it to satisfy the main needs of it's current users. It's not that the community strives to beat all the other databases - the goal of the community is to provide useful features for it's users.
Re:Postgres-Curious by Anonymous Coward · 2012-09-11 00:07 · Score: 0

Isn't that like suggesting a javascript book for a java progammer? (or vice versa..)
Re:Postgres-Curious by Anonymous Coward · 2012-09-11 00:53 · Score: 1

Read the PostgreSQL docs.
Unlike MySQL, their docs are clear, complete and exhaustive. The MySQL docs require you to read user comments on the documentation to learn how the software actually works.
Not the case here, PostgreSQL's design and behavior is clearly and properly documentated
Re:Postgres-Curious by omnichad · 2012-09-11 02:22 · Score: 1

I have to admit, as a long-time MySQL user, it really messes with your head and makes you not do things in a way that works with MS SQL Server or PostgreSQL. Especially how MySQL does its lazy grouping.
I've only tried other databases for a short while and give up because I know that I'd have to learn everything properly. If I was starting a brand new project, it might be great, but I wouldn't want to rewrite an existing database app with it.
Re:Postgres-Curious by rycamor · 2012-09-11 04:35 · Score: 1

Kludgy? You must be talking about MySQL's "solution". The one that is not really truly transaction-safe, nor dependable. I can't tell you how many times I've logged into a MySQL server in the morning only to find replication broken.
Native replication has been available for almost two years now. The fact that it uses the write-ahead log in conjunction with streaming is exactly the kind of solution you need if you want dependable transaction-safe replication. I suggest that PostgreSQL took longer to achieve built-in replication precisely because they chose not to kludge it.
Materialized views? You mean you can't figure out how to use triggers and rules? it's all there for you. Triggers or rules give you the ability to specify exactly how the materialization works, including such things as spreading data over multiple tablespaces and partitions for performance.
Core business-friendly features like foreign table wrappers, nested transactions, windowing queries, table partitioning? Check to all those. The only major ones I can think of are multi-master replication and OLAP, for which you do have to go 3rd party. IMHO multi-master replication is one of those things that is very hard to get right no matter which DBMS you use (MySQL's MMR is a cruel joke). Generally it should be avoided unless absolutely necessary. Much better to have robust, well-understood master/slave failover with beefy machines (cloud serving can actually help in that regard) than complex and fragile multi-master with many smaller distributed machines.
Re:Postgres-Curious by aralin · 2012-09-11 06:51 · Score: 1

I'm not sure if the book available to Oracle employees on PL/SQL is the same as the one available externally, I assume so. The books by Oracle are generally not so good, I'd agree, but the PL/SQL one is a rare gem. It sounds like you read a bunch of Oracle books, but not this one and you recommend what you did read on the subject, which is fine. But in this case ... Anyway....
What the point was not a good Postgres book, there are some. The point was a comparison book taking you from MySQL to PostgreSQL and I was explaining why there aren't any.

--
If programs would be read like poetry, most programmers would be Vogons.
Re:Postgres-Curious by Anonymous Coward · 2012-09-11 10:04 · Score: 0

There's www.postgresguide.org, but nothing really targeted at advanced MySQL users. It's been on the community TODO list forever, but it's hard to find people who know both MySQL and Postgres well, and have time to write.
--Josh Berkus
Re:Postgres-Curious by fuzzytv · 2012-09-12 07:15 · Score: 1

Not sure which Oracle books you mean - I've read e.g. "PL/SQL Programming" (ISBN 978-0072230666) and "Expert Oracle PL/SQL" (ISBN 978-0072261943) and probably some more when preparing for OCP exams. And I'd definitely recommend ISBN 978-0596514464 instead of the first one. But yeah, it's a matter of opinion.
But you're right - there are no "PostgreSQL for MySQL people" guides. The problem is that almost no one is able to write it. The people who are switching from MySQL to PostgreSQL don't have the knowledge and experienced people using PostgreSQL don't know MySQL that much (at least that's my impression).
Re:Postgres-Curious by Anonymous Coward · 2012-09-12 10:30 · Score: 0

Try using Informix with datablades and VTI (Virtual Table Interface) /VII (Virtual Index Interface) including the ability to write your own routines to define the selectivity of your new data types and influence the optimizer to use your own index routines...
Re:Postgres-Curious by F.Ultra · 2012-09-13 19:13 · Score: 1

>Did you know that MySQL re-compiles every stored procedure in a database on every new connection?
Actually it really doesn't, it will only recompile the stored procedure if the compiled version has left the cache, so as long as they fit into the cache you would see very little compiling going on.

Re:LOL by Tough+Love · 2012-09-10 14:09 · Score: 2, Insightful

Because we love to bash our keyboards into so much plastic scrap whenever we come across one of its many standards-defiant idiosyncracies?

You mean, idiosyncracies different from Oracle's idiosyncracies, Microsoft's idiosyncracies and IBM's idiosyncracies?

By the way, care to be specific? Oh yeah, posting anon. Right.

--
When all you have is a hammer, every problem starts to look like a thumb.

Re:meh by Tough+Love · 2012-09-10 14:14 · Score: 1

Wake me when it catches up with MemSQL.

That would have to be called MemgresQL, wouldn't it.

--
When all you have is a hammer, every problem starts to look like a thumb.

MariaDB and Percona by kbahey · 2012-09-10 14:22 · Score: 3, Insightful

Oracle is not that big a of concern.

There is MariaDB which is data-compatible with MySQL, and has some nice additions (like microsecond performance data), and there is also Percona Server.

If Oracle messes up, like they did with OpenOffice, there will be another version that they cannot touch, like LibreOffice.

--
2bits.com, Inc: Drupal, WordPress, and LAMP performance tuning.

Re:meh by Anonymous Coward · 2012-09-10 14:34 · Score: 2, Insightful

Wake me when MemSQL supports data-warehousing.

How PostgreSQL stacks up to Oracle ? by Taco+Cowboy · 2012-09-10 14:48 · Score: 3, Interesting

I've been searching for a comparison chart of various SQLs but all I can find are very very old articles

There's a database project that I'm working on and I'm choosing which SQL to be employed

MySQL is obviously not up to par

I don't know how good PostgreSQL is - so, is there a comparison chart or something that can facilitate us, the one who are going to make purchasing decision, to make one choice over the other?

Thank you !

--
Muchas Gracias, Señor Edward Snowden !

Re:How PostgreSQL stacks up to Oracle ? by rycamor · 2012-09-10 17:04 · Score: 5, Informative

Generally there is very little in the sense of logical data manipulation capabilities in which Oracle exceeds PostgreSQL (usually the opposite, actually). The main advantage Oracle has is in the extreme high end of scalability and replication, and that benefit is offset by massive complexity in setup and configuration. Even there, PostgreSQL is closing fast these days, with built-in streaming replication, table partitioning, and all sorts of high-end goodies.
I do all sorts of PostgreSQL consulting, and you would be surprised at the number of large companies and government organizations considering migration from Oracle to PostgreSQL.
And if you *really* need PostgreSQL to go into high gear, just pay for the commercial Postgres Plus Advanced Server from EnterpriseDB and you will get a few heavy-duty add-ons, including an Oracle compatiblity layer.
Also, IMHO one of the really cool things about PostgreSQL is the number of very geeky tools it puts at your disposal, such as a rich library of datatypes and additional features, along with the ability to create your own user-defined datatypes.
Re:How PostgreSQL stacks up to Oracle ? by F.Ultra · 2012-09-10 20:39 · Score: 2

Why is MySQL _obviously_ not up to par? Yes I'm really curios.
Re:How PostgreSQL stacks up to Oracle ? by serviscope_minor · 2012-09-10 21:55 · Score: 3, Informative

and you would be surprised at the number of large companies and government organizations considering migration from Oracle to PostgreSQL.
Not really.
I've had no experience with the database end of things, but I've been on the receiving end of some other Oracle "products" at two places I've been. Once you've been Oracled, there is a strong incentive never to go anywhere near them again, no matter how they look on paper.
When it comes for utter distain and hatred for their customers, Oracle make Sony look like rank ametures.
As far as Oracle are concerned, the customer is a fool whose sole purpose is to be screwed over for as much cash as possible.

--
SJW n. One who posts facts.
Re:How PostgreSQL stacks up to Oracle ? by Anonymous Coward · 2012-09-10 23:48 · Score: 1

Their developers suck. Go look at the sort of bugs MySQL gets, AND gets AGAIN.
MySQL is the PHP of databases.
Example: http://bugs.mysql.com/bug.php?id=31001
Notice the part where the bug is reintroduced. If they require regression tests to pass before releases this bug would not happen again.
Re:How PostgreSQL stacks up to Oracle ? by Nivag064 · 2012-09-11 00:58 · Score: 1

performance, reliability, ease of development...

see http://www.postgresql.org/

PostgreSQL has had ACID compliance built in from the beginning. MySQL added it much later.

Over the last 18 years I have 3 times gone searching on the Internet for comparisons - each time PostgreSQL came out better than MySQL!

PostgreSQL is more standards compliant than MySQL, and has far fewer gotchas (unintended consequences of doing something that seemed so straightforward).

I have the misfortune to have a client with an application backed by MySQL.
Re:How PostgreSQL stacks up to Oracle ? by Lennie · 2012-09-11 01:36 · Score: 2

MySQL has some nice replication built in I believe, I've never used them.
Other than that, I would thread lightly with MySQL:
"Why not MySQL"
http://www.youtube.com/watch?v=1PoFIohBSM4

--
New things are always on the horizon
Re:How PostgreSQL stacks up to Oracle ? by Lennie · 2012-09-11 01:39 · Score: 2

Lots of people such, but it is just hard to trust your data to MySQL. Just a moment ago I posted a link above to this video which illustrates it:
http://www.youtube.com/watch?v=1PoFIohBSM4

--
New things are always on the horizon
Re:How PostgreSQL stacks up to Oracle ? by Anonymous Coward · 2012-09-11 04:42 · Score: 0

Seriously? Every time a MySQL or PostgreSQL article is posted, endless reasons are posted why MySQL is a crapfest, which in turn teaches would-be developers and DBAs very bad assumptions and habits.
The rule of thumb is, when you find a DBA who recommends MySQL over another RDBMS, without lots of benchmarks and tests to back it up, chances are they are not qualified to be making such statements, let alone actual recommendations.
Re:How PostgreSQL stacks up to Oracle ? by tobiasly · 2012-09-11 07:13 · Score: 1

Lots of people such, but it is just hard to trust your data to MySQL. Just a moment ago I posted a link above to this video which illustrates it:
http://www.youtube.com/watch?v=1PoFIohBSM4
The person narrating that video sounds so much like Mr. Garrison I couldn't make it past the first minute. However the video is based largely off the info found here:
http://sql-info.de/mysql/gotchas.html
Re:How PostgreSQL stacks up to Oracle ? by Lennie · 2012-09-12 05:10 · Score: 1

Yes, I missed that link the first time I watched it, thanks.
In the comments it says:
"the vast majority of these gotchas have been solved with SQL_MODE in MySQL 5.0. The SQL_MODE must be set in your configuration file once in then you're done."
So that's also interresting to know.

--
New things are always on the horizon
Re:How PostgreSQL stacks up to Oracle ? by F.Ultra · 2012-09-13 18:34 · Score: 1

Yes seriously. We use MySQL extensively with Terabyte sizes databases with tens of thousands writes per second and have never experienced any performance or stability problems what so ever. Our customers who use MSSQL however experiences lots of troubles.
Re:How PostgreSQL stacks up to Oracle ? by F.Ultra · 2012-09-13 18:50 · Score: 1

If you read the bug that you linked to you'll see that it was a regression that wasn't catched by the original test, i.e it was probably not a regression after all, simply that the first fix wasn't a fix for all possible cases. And it's not like there has never been any bugs or regression bugs in other dbs...
Re:How PostgreSQL stacks up to Oracle ? by F.Ultra · 2012-09-13 18:57 · Score: 1

Exactly why does it matter that MySQL got ACID first with the inclusion of InnoDB? It would only matter if you pretend that MySQL version 1.0 is the only version that you can run or something...
And yeah comparisons on the Internet is of course always true :-), whell it might be that PostgreSQL is better performant etc. It just happens that for my use case it doesn't, we provide stock market data to sql-servers among other things and the only users we have that never experience performance problems or stability problems are out MySQL user base (the ones with the most problems run MSSQL).
Re:How PostgreSQL stacks up to Oracle ? by F.Ultra · 2012-09-13 19:06 · Score: 1

Very nice of the PostgreSQL developer there to run MySQL in a non strict mode and then play confused when it does exactly what he tells it to do... Perhaps it's unknown to many people, but with MySQL you can configure the database engine for different levels of SQL correctness, the default is usually to be very lenient since that is what the normal PHP user would expect (and people used to the older MyISAM storage engine). But for those who want correct SQL correctness like PostgreSQL it is possible to configure it that way.
Re:How PostgreSQL stacks up to Oracle ? by Nivag064 · 2012-09-17 11:01 · Score: 1

Have you tuned your PostgreSQL installations apropriately?
Have you tried PostgreSQL 9.2?
The most annoying, but accurate sttement from a highly paid consultant often is 'It depends!"
Re:How PostgreSQL stacks up to Oracle ? by F.Ultra · 2012-09-18 09:03 · Score: 1

That is left for out end customers to do, we simply supply the application that feeds a (any) sql server with our datafeed aswelll as the data feed, which sql to use and how to tune it is up to them. However we use MySQL internally which is not tuned at all (we are not DB savvy) but they perform very well and we have never had any problems with them, and they do receive our full data flow. Our customers with MSSQL servers however experiences lots of problems and they receive only a subset of the dataflow, I have never heard any problem with PostgreSQL but the one customer that we have that I know runs Postgres has a very competent DBA so I think he knows how to keep that one nice and tuned.

So my experience is that PostgreSQL might be a nice server, MySQL is definitely a good server and most things that are written about it is either FUD or from very old versions, and the one I would never ever touch with a ten foot pole is MSSQL.
Re:How PostgreSQL stacks up to Oracle ? by Nivag064 · 2012-09-18 09:25 · Score: 1

You said previously:
"it might be that PostgreSQL is better performant etc. It just happens that for my use case it doesn't"
I was wondering why that would be the case, as that appears contrary to what I have read - especiall since Postgres 9.2 has been released.
Re:How PostgreSQL stacks up to Oracle ? by F.Ultra · 2012-09-19 10:21 · Score: 1

Now I have not tested 9.2, but when we developed the application and also during QA we run it against all our supported sql servers and MySQL always have better insert/update performance (we are not select intense) then either MSSQL or PostgreSQL (which can be a too old version to be a reliable source since we use the version bundled with CentOS). But we have not performed any tuning what so ever of course since the purpose is to QA the application and not the dba.
Re:How PostgreSQL stacks up to Oracle ? by Nivag064 · 2012-09-19 11:24 · Score: 1

Out of the box, PostgreSQL has values for memory usage and other tuneables that are too small for production use on modern server configurations. So comparing MySQL and PostgreSQL with default settings may well lead to erroneous performance conclusions.
I used to consider myself adequately qualified to set up a reasonably tuned PostgreSQL installations - I then made the mistake of reading the PostgreSQL mailing lists! :-)
Now I know enough to safely improve the performance to some extent, but I would be more cautious about some of the tuning, as some tuning variables are very dependent on:
1. precise nature of your workload
2. operating system
3. amount of RAM
4. number of processor cores
5. number of disks
6. and many other factors
7. not to mention interaction of the above with each other!
Unfortunately, for the best tuning, it depends... There is no one set of tuning changes valid for all workloads, even for identical O/S & hardware configurations.
Fortunately PostgreSQL allows some things to be tuned not only for the installation as a whole, but also: per database, per user, per session - and for some things, for a specific transaction.
Be aware also that some usage idioms are MySQL specific, and PostgreSQL may well have alternative approaches to produce the equivalent results more efficiently. This affects not only the SQL used in a query, but also in the types of fields. Note that PostgreSQL is much more standards conformant than MySQL.
Even if PostgreSQL and MySQL had comparable performance and ease of development, I would still chose PostgreSQL for its greater reliability.
I would be interested, in you comparing the PostgreSQL and MySQL (should you have the time & inclination!), after both have been tuned to a reasonable extent. I suspect that you will find that then PostgreSQL will outperform MySQL for your workload - even with your older version of PostgreSQL.
Re:How PostgreSQL stacks up to Oracle ? by F.Ultra · 2012-09-19 18:55 · Score: 1

So what you say is that PostgreSQL _must_ be better because you like it... Or could it be that you have not successfully tuned your MySQL installation when you tested it etc etc. This can go on forever and ever. You are completely free to think that PostgreSQL is the best thing out there, it just doesn't mean that MySQL is garbage or second rate like many people claim, many people who never even have used MySQL or with anything other than MyISAM tables.

All that I want to convey is that we use MySQL for business critical, write intensive data and have never experienced any problems what so ever, which is quite contrary to the "oh no you're doomed if you use MySQL" crowd.
Re:How PostgreSQL stacks up to Oracle ? by Nivag064 · 2012-09-19 19:41 · Score: 1

I had to do some DBA stuff for a MySQL database for a project involving sales tax in the USA, and I support a web site that uses MySQL (I set up my own development environment to test it, from scratch). Performance was not a problem in either, as far as I know.
However, I do find PostgreSQL easier to set up and to write SQL for.
I have also spoken to a senior developer for http://digg.com/ about how they implemented their distributed database using MySQL - he liked PostgreSQL, but at the time it did not support master & multiple slaves (or some such). So I am aware that MySQL can handle really large transaction rates.
Though I would never recommend MySQL for really critical data, at least with my current knowledge of MySQL.
What I am interested in, is finding reliable performance comparisons between MySQL and PostgreSQL. So when you claimed MySQL performed better than PostgreSQL I was interested, as that goes against what I know.
So it is not a case of "I like PostgreSQL, therefore MySQL must be no good!"! I have also experience in Oracle, Sybase, and PROGRESS (plus at least a couple of others that I don't remember), but I find PostgreSQL easier than them. I will use whatever database is appropriate for a project, or accept whatever has been chosen - so I cannot afford to be too dogmatic.
Hence my attempt to obtain more info from you.
Re:How PostgreSQL stacks up to Oracle ? by F.Ultra · 2012-09-20 18:11 · Score: 1

Performance comparisons is very hard to do since every task out there has different requirements, and if one combines that with the infinite ways one can configure either dbs and the underlying os then you will always fail in the eyes of some one, it's also quite common for people to choose either one of MySQL or PostgreSQL and stick with it so the experience needed to properly set the other one up for the test in hand is often lacking.

Interesting that you feel PostgreSQL easier to set up, I have the exact opposite experience :), with MySQL it's simply a "mysql -p", and then issue somce "create user x identified by " and a few "create table xx", while in PostgreSQL (at least in the older versions) one has to fuss around with local user accounts and use strange commands such as \c in order to change datbase (which is a simple "use " in MySQL. All this might have changed over the years though :-)

I would have no problem (and has no problem) using MySQL for critical data, there is nothing magic in say PostgreSQL that makes it more suitable for critical data than MySQL, neither will eat your data for no good reason.
Re:How PostgreSQL stacks up to Oracle ? by Nivag064 · 2012-09-21 21:06 · Score: 1

You stated that MySQL is faster than PostgreSQL - I simply want to find out on what basis to you made that statement.
In this regard - which database I like best is not relevant.
However, you appear to be very unwilling to substantiate your claim in any meaningful manner.
Re:How PostgreSQL stacks up to Oracle ? by F.Ultra · 2012-09-22 12:38 · Score: 1

Actually I never did, I only wrote that for my use case I didn't find PostgreSQL to be better performant than MySQL. And no I'm not unwilling to substantiate, I told you how I come to my conclusion and since it wasn't done in order to benchmark I never collected any stats. I'm completely uninterested in which dba is the fastest, I simply replyed with my anecdotal experience and what I see when we perform QA on the various databases and the experience that I have with customers different setups.

Re:LOL by Anonymous Coward · 2012-09-10 14:48 · Score: 0

No, different from PostgreSQL.

comment by Anonymous Coward · 2012-09-10 15:14 · Score: 0

Was the scalability that bad to begin with ?

Re:Postgres-CurioPostgres comes with great documus by Anonymous Coward · 2012-09-10 15:18 · Score: 0

You might want to just read the Postgres documentation. It's very well written.

Re:LOL by colinrichardday · 2012-09-10 15:27 · Score: 1

How are SQL Server's idiosyncracies different from Microsoft's? Isn't SQL Server a Microsoft product?

JSON by Art3x · 2012-09-10 15:35 · Score: 2

To me, JSON very interesting. I don't know how exactly I'll use it, but it combines all that's great about PostgreSQL with some of what was interesting about CouchDB and other projects like it.

Mainly, one-to-many relationships may be easier. Usually, they are two separate select statements. For example, one to get the article, another to get the comments. Then you patch it all together in PHP, or whatever middle language you're using. With JSON support, that could be a single SELECT, crammed up in JSON, which you then uncram with a single json_decode function call in PHP, which would yield nice nested arrays.

Re:JSON by Anonymous Coward · 2012-09-10 16:08 · Score: 1

I think you just made the database fairies cry.
Re:JSON by Anonymous Coward · 2012-09-10 16:56 · Score: 1

Then you defiantly don't want to see the DB tables I've seen with just a primary key and a single XML field, then add on XPath indexes.
Re:JSON by DragonWriter · 2012-09-11 02:25 · Score: 1

Mainly, one-to-many relationships may be easier. Usually, they are two separate select statements.For example, one to get the article, another to get the comments. Then you patch it all together in PHP, or whatever middle language you're using.

I'm not sure adding new SQL features is going to deal with the problem of people not using the features they already have. Its already quite possible in PostgreSQL to do a single select that gets the article data and an aggregate that contains all the comments. Features that let you store JSON directly aren't necessary for that.
Re:JSON by Anonymous Coward · 2012-09-12 23:39 · Score: 0

You can already have nested arrays without serialising to JSON in PostgreSQL.

While Postgres is good for many things... by FlyingGuy · 2012-09-10 16:25 · Score: 2

Until the fix the TX number issue ( the infamous rollover ) then they are pretty much out of the running in DB's that have VERY high insert levels since the vacuum process cannot hope to keep up with tables that have 100's of millions of rows.

I am an Oracle professional but I do keep track of Postgres and like it, but the 32 bit TX t is a bit of an Achilles heel.

--
Hey KID! Yeah you, get the fuck off my lawn!

Re:While Postgres is good for many things... by Anonymous Coward · 2012-09-10 16:44 · Score: 2

you can't vacuum your table every 2 billion transactions? did you know autovacuum exists?
There is no table with "100's of millions of rows" that can't be vacuumed every 2 BILLION transactions.
http://www.postgresql.org/docs/current/static/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND
Re:While Postgres is good for many things... by FlyingGuy · 2012-09-10 16:53 · Score: 1

hmmmm table with 17 trillion rows? Not so much.

--
Hey KID! Yeah you, get the fuck off my lawn!
Re:While Postgres is good for many things... by Anonymous Coward · 2012-09-10 16:56 · Score: 0

vacuum does not lock the tables, so there's no down time associated. so what exactly is the problem with letting autovac run every 2 billion transactions?
Seriously I would like to know.
Re:While Postgres is good for many things... by Anonymous Coward · 2012-09-10 17:24 · Score: 0

and BTW, 2 billion is the recommendation. data loss does not occur until you reach 4 billion.
Re:While Postgres is good for many things... by Anonymous Coward · 2012-09-11 00:32 · Score: 0

what exactly is the problem with letting autovac run every 2 billion transactions?
Only weak cranes ask this question.
Re:While Postgres is good for many things... by Anonymous Coward · 2012-09-11 00:54 · Score: 0

that have VERY high insert levels
Unless you also have VERY high delete levels, vacuum is going to be almost a no-op. If you disable automatic table analysis, vacuum will have nothing at all to do on the table other than update the frozen tid.
Re:While Postgres is good for many things... by LizardKing · 2012-09-11 01:39 · Score: 1

Until the fix the TX number issue ( the infamous rollover ) then they are pretty much out of the running in DB's that have VERY high insert levels since the vacuum process cannot hope to keep up with tables that have 100's of millions of rows.
Infamous to whom? A vacuum updates the frozen TID, which is a trivial operation and allows a subsequent TID to safely wrap around. And I'm struggling to think of any common use cases where the volume of inserts is so high that they can't afford a vacuum every two billion transactions - even high-frequency trading doesn't operate at those levels, and if it did I suspect TID wraparounds would not be your most pressing concern.
Re:While Postgres is good for many things... by Anonymous Coward · 2012-09-11 02:04 · Score: 1

If you have a single table with 17 trillion rows then you're doing it wrong. And inserts aren't really an issue with MVCC in PG - I'd focus more on updates.
Partitioning in PostgreSQL will let you split that up into separate physical items on disk. As others have said - you just need to let vacuum scan the table once every 2 billion transactions or so to keep things in check. Rows that aren't update regularly will be given the special frozen xid and won't be subject to any wrap around issues.
And as far as databases hitting this limit are concerned - not an issue - PG will shut down ~ 2 million transactions before that limit until you do something about it. On fast hardware you can recreate the table in no time. I can, with some degree of embarassment, admit to letting a DB get to this point because I hadn't tuned the various parameters around vacuum. Fixed that fairly quick smart let me tell you and its been humming since.
Oracle is legacy software at best these days - even in the traditional shops such as banks - leave the next wave to the new boys that are receptive to what developers want and customers need.
- maintainer of 200+TB of PostgreSQL databases.
Re:While Postgres is good for many things... by rycamor · 2012-09-11 02:12 · Score: 1

Yes, there are all sorts of interesting strategies you can employ once you separate the physical storage from the logical presentation... can't be said enough.
Re:While Postgres is good for many things... by TheLink · 2012-09-11 03:31 · Score: 1

A vacuum updates the frozen TID, which is a trivial operation and allows a subsequent TID to safely wrap around.

What if you have at least one outstanding transaction/connection? Can vacuum update the frozen TID then?
For example if you have a transaction that's open for a few weeks and happen to have 4 billion transactions during that time.
I believe perl DBI/DBD in AUTOCOMMIT OFF mode starts a new transaction immediately after you commit or rollback. So if you have an application using that library that is idling for weeks a transaction would presumably be open for the entire time- since it would be connected to the database and already have started a transaction. This is probably also true for some python and ruby db libraries.
This could also happen if someone starts up psql, starts a transaction and goes on vacation ;).
--
- Too many replies beneath your current threshold
Re:While Postgres is good for many things... by rtaylor · 2012-09-11 04:22 · Score: 2

I don't see why not.
You had the IO to create those 17 trillion tuples in the first place; so vacuum will use that same IO capacity to maintain it.
The low billions of tuples isn't much of an issue despite being on spinning disk with very little in memory.

--
Rod Taylor
Re:While Postgres is good for many things... by fuzzytv · 2012-09-11 05:06 · Score: 1

So you're calling yourself an Oracle professional and you're not aware of this: http://www.infoworld.com/d/security/fundamental-oracle-flaw-revealed-184163-0 ?
I mean - PostgreSQL does have 32 bit transactions IDs and a well designed process to prevent wraparound.
Oracle has 48bit transaction IDs, a number of bugs that speed up transaction ID growth, a feature that "synchronizes" transaction IDs through the whole cluster (thus the IDs are growing according to the busiest of the instances) and a soft SCN limit (16k * seconds since 1988) that when reached, bad things happen. I'm not saying this happens every other day, but with new shiny HW the soft SCN limit is rather easily reachable.
So while PostgreSQL has this sorted out long time ago in a quite reasonable way, Oracle issued / is issuing patches that are rather workarounds than patches. So saying "Until they fix the TX number issue ..." sounds a bit strange to me.
Re:While Postgres is good for many things... by nullchar · 2012-09-11 06:15 · Score: 1

A long open transaction (that's used the table in question) will block auto vacuum for those rows.
You can set options in postgresql.conf to auto-kill long transactions if you like (set a hard limit for transaction time).
I solved this another way by only examining IDLE transactions via pg_stat_activity. Any long running transactions are left alone, while long idle transactions are killed.
Re:While Postgres is good for many things... by Anonymous Coward · 2012-09-11 15:09 · Score: 0

Perl DBI DBD::Pg only starts a new connection as needed right before a statement is executed.

Re:meh by Nivag064 · 2012-09-10 17:07 · Score: 1

Your comment reminds me about a pregnant woman who phoned her doctor saying she had contractions - her doctor said come in when they get to 3 minutes apart - when she phoned a while later saying they were still only a minute or so apart, she was told to come in immediately! You might find that PostgreSQL is already in advance of 'MemSQL'!

More seriously, unless you say what features you think MemSQL is ahead off PostgreSQL, you are sounding very much like a troll.

The appropriate database software depends very much on the specific use case, for most real world requirements you are likely to find PostgreSQL a good choice.

Still no checksums? by Anonymous Coward · 2012-09-10 17:50 · Score: 0

Wasn't support for checksums meant to _finally_ get into 9.2? Postponed again?

PostgreSQL is so cool by DrXym · 2012-09-10 21:24 · Score: 1

It just works properly out of the box. No nasty surprises, no alarming omissions or deviations from expected database behaviour. It's just a fast, reliable database which also happens to be open source and free.

I'm sure most of this applies to MySQL these days but historically it didn't and I never saw the attraction of a DB which went through a succession of backends in order to obtain the behaviour PostgreSQL always supplied. It doesn't help that MySQL is Oracle owned and all the issues with licencing and forking which have arisen out of that.

Re:PostgreSQL is so cool by Lennie · 2012-09-11 03:00 · Score: 1

I believe they both improved.
PostgreSQL 7.x wasn't as much fun either which didn't have autovacuum and needed a lot of tuning.
I haven't tried something like Drizzle but it seems they ditched a lot of old code and problems.

--
New things are always on the horizon
Re:PostgreSQL is so cool by WuphonsReach · 2012-09-12 13:53 · Score: 1

Yeah, 7.x was no picnic. Especially if you came from a Windows background and needed to install it on a Windows server.

We didn't switch until 8.0 or 8.1, after we were able to install as a native Windows application and play with it. The pgsql database servers are actually Linux, but we were still feeling our way there as well.

--
Wolde you bothe eate your cake, and have your cake?

Re:meh by fuzzytv · 2012-09-10 23:01 · Score: 1

Could you please compare Ferrari F1 and Liebherr T1-272 minin truck [e.g. http://www.flickr.com/photos/doncampbellmodels/3434490464/%5D? Not possible, right? Different products for different requirements.

Re:meh by fuzzytv · 2012-09-10 23:03 · Score: 1

Damn, this was supposed to be a response for the parent flamebait ...

Re:LOL by hobarrera · 2012-09-11 00:58 · Score: 0

Did you even read the article? The article talks about PostgreSQL, which is an SQL Server from a different vender. There's also MySQL, and plenty of other SQL Servers.

Re:LOL by Anonymous Coward · 2012-09-11 01:38 · Score: 1

"SQL Server" is not some generic name for a relational database -- it's a product from Microsoft. So "SQL Server", "PostgreSQL", "MySQL" etc are all relational database servers, not "SQL Servers".

Re:LOL by omnichad · 2012-09-11 02:01 · Score: 1

I think they're trying to say that Microsoft calls theirs "SQL Server" in such a way as to make it seem that the SQL standard is something they own or control.

Range types -- not range-restricted -- are major by DragonWriter · 2012-09-11 02:02 · Score: 2

Minor, but probably a welcome relief to those who need them, 9.1 adds range restricted types.

First, its 9.2, not 9.1.

Second, (as shown in the link) these are range types, not range-restricted types. Range-restricted types (as known from, e.g., Ada) are something that (via domains with check constraints) PostgreSQL has supported for a very long time.

Range types, combined with 9.2s support for exclusion constraints, are a pretty major new feature that give 9.2 a great facility in dealing with (among other things) temporal data and enforcing common logical constraints on such data in the database as simple-to-express constraints rather than through triggers.

Re:LOL by hobarrera · 2012-09-11 02:23 · Score: 0

Actually, it's called "Microsoft SQL Server".
That's just them being silly. If you follow that argument, then gnome controls the web (since their browser is just called "web" now). I could just make a file manager called "File Manager" as well.

Re:LOL by Anonymous Coward · 2012-09-11 02:23 · Score: 0

You mean, idiosyncracies different from Oracle's idiosyncracies, Microsoft's idiosyncracies and IBM's idiosyncracies?
By the way, care to be specific? Oh yeah, posting anon. Right.

"GO"

I think that's all that really needs to be said.

Re:LOL by RabidReindeer · 2012-09-11 03:07 · Score: 1

Because we love to bash our keyboards into so much plastic scrap whenever we come across one of its many standards-defiant idiosyncracies?

You mean, idiosyncracies different from Oracle's idiosyncracies, Microsoft's idiosyncracies and IBM's idiosyncracies?

By the way, care to be specific? Oh yeah, posting anon. Right.

I think probably the idiosyncracy that keeps it from running on my Linux servers is probably sufficient. Although that extra level in the table naming hierarchy has been known to cause me to destroy things.

Re:LOL by Zaphod+The+42nd · 2012-09-11 04:17 · Score: 0

Wrong, actually, be careful. The product from MS is "Microsoft SQL Server"

"SQL Server" is just a generic name of which there are many, many implementations. Parent was spot on.

http://en.wikipedia.org/wiki/SQL_Server

Quote Wikipedia:
SQL Server may refer to:
Microsoft SQL Server, a relational database server from Microsoft
Sybase SQL Server, a relational database server developed by Sybase
SQL Server (magazine), a trade publication and web site owned by Penton Media
Any database server that implements the Structured Query Language

--
GCS/MU/P d- s:- a-- C++++$ UL++ P+ L++ E+ W++ N o K- w--- O M+ V- PS+++ PE Y+ PGP t+ 5- X R++ tv+ b++ DI++ D++ G+ e++ h-

Re:LOL by K.+S.+Kyosuke · 2012-09-11 06:31 · Score: 1

LOL just use SQLServer you nubs.

I really tried, honestly, but I couldn't find Debian packages for it anywhere on the MS web site.

--
Ezekiel 23:20

Re:LOL by sr180 · 2012-09-11 06:57 · Score: 1

Note that the two relevant entries that you mention are both spawned from the same product and code base. Originally, MS SQL Server was Sybase SQL server.

--
In Soviet Russia the insensitive clod is YOU!

Re:LOL by colinrichardday · 2012-09-11 13:45 · Score: 1

I took the original poster's use of SQL server to denote a Microsoft product.

Infinities and unbounded ranges by DragonWriter · 2012-09-12 04:51 · Score: 1

It looks like the 9.2 implementation uses its own definition for -infinity/+infinity unrelated to the type in question, which thinking about it now might not have been the best decision (at least, without the ability to define that [,y] and [-infinity,y] are synonymous) since -infinity (as defined by range types) is less than all other values including -infinity (as defined by the contained type).

I think it is a good decision in that it provides a syntactic construct for ranges that are unbounded on either end -- so it applies well to all types -- and works correctly with types that have an infinity value if that is used. You probably don't want the syntactic construct to have to be specially-aware of all the special values within types (because that increases the cost of expanding the type system), and the decision that syntactically-unbounded is "outside" (on either end) of any value in the domain (including, where it exists, an "infinite" value) is the decision that allows the range syntax to not be aware of special type values.

OTOH, code that deals with ranges ought to be type-aware, and as a general rule should use ranges bounded by the types infinity rather than unbounded ranges where the type has an infinity (and, for range values stored in tables, this should probably be enforced by an appropriate constraint which is quite straightforward with the exclusion constraint support implemented alongside range types in 9.2.)

Still, yeah, I can see the argument that more type-aware ranges that treated [,y] as equivalent to [-infinity,y] for types with a meaningful -infinity (and the equivalent for +infinity as the upper bound) would be slightly more convenient.

Re:LOL by Anonymous Coward · 2012-09-12 07:20 · Score: 0

Wrong, actually, be careful. The product from MS is "Microsoft SQL Server"

Yes, and the office product from MS is "Microsoft Office", but when people use "Office" as a proper noun without additional qualification everyone knows what they're talking about.

Slashdot Mirror

PostgreSQL 9.2 Out with Greatly Improved Scalability

146 comments