PostgreSQL 9.1 Released

← Back to Stories (view on slashdot.org)

Posted by Soulskill on Monday September 12, 2011 @07:10AM from the new-and-shiny dept.

With his first posted submission, sega_sai writes "The new version of open source database PostgreSQL was released. This version provides many important and interesting features such as synchronous replication, serializable snapshot isolation, support of per-column collations, K-nearest neighbor indexing, foreign data wrappers, support of SELinux permission controls and many others. The complete list of changes is available here"

148 comments

Min score:

Reason:

Sort:

no comments?! by Anonymous Coward · 2011-09-12 07:19 · Score: 0

I'm kinda shocked. This used to be a nice neighborhood.
1. Re:no comments?! by Anonymous Coward · 2011-09-12 07:22 · Score: 0
  
  GPL vs. BSD!
  Open Source vs. Free Software!
  No need to thank me.
vs Oracle? by pak9rabid · 2011-09-12 07:19 · Score: 4, Interesting

So...how does PostgreSQL compete with Oracle nowadays as far as features go (specifically, spatial, and data-guard-like replication)? Anybody here tried making the switch?
1. Re:vs Oracle? by iamvego · 2011-09-12 07:25 · Score: 1
  
  Well it's always been able to do basic things like LIMIT, multi-row insert statements and correctly treats empty strings as not being null... which Oracle can't do for some dumb reason.
2. Re:vs Oracle? by jellomizer · 2011-09-12 07:27 · Score: 1
  
  I expect that most wouldn't because they have already spend all the money on an Oracle License.
  However my experience is most shops (Including Oracle Shops) Don't use nearly close to all those cool features they provide.
  They just want a SELECT * FROM TABLE WHERE VALUE='TEXT'
  
  --
  If something is so important that you feel the need to post it on the internet... It probably isn't that important.
3. Re:vs Oracle? by Talderas · 2011-09-12 07:32 · Score: 2
  
  That dumb reason is backwards compatibility for their customers due to Oracle being older than the SQL standard.
  
  --
  "Lack of speed can be overcome. In the worst case by patience." --Znork
4. Re:vs Oracle? by Anonymous Coward · 2011-09-12 07:34 · Score: 0
  
  Actually, several years ago I assisted one company to switch from Oracle to PostrgeSQL. They saved a bunch of money because they would have needed upgrades & additional licensing while they dumped their older slower systems. At the same time they also switched from Solaris to Linux.
5. Re:vs Oracle? by Anonymous Coward · 2011-09-12 07:37 · Score: 0
  
  Frankly, if you have already bought Oracle, SQL Server, or DB2, you are not likely to switch to PostgreSQL. But if you are using SQLITE, MySQL or similar and need to scale up, then PostgreSQL may be good enough solution rather than going up to Oracle or the other big SQL servers.
  Then again Oracle wants users of MySQL to switch to Oracle DB ;)
6. Re:vs Oracle? by h4rr4r · 2011-09-12 07:37 · Score: 1
  
  So why not make the older behavior optional, rather than continuing this broken functionality?
7. Re:vs Oracle? by Anonymous Coward · 2011-09-12 07:42 · Score: 0
  
  It varies from individual feature to feature.
  I'm not particularly familiar with data guard and its relative merits compared to Postgres' hot standby but in terms of spatial I'd have said Postgres is far ahead of Locator, about on a par with Spatial.
  One area that I significantly prefer Postgres are you are not limited to PL/SQL and Java for writing SPs. For instance the ability to write in R is a godsend for me compared to struggling with the pretty poor data mining features Oracle has built in.
8. Re:vs Oracle? by amorsen · 2011-09-12 07:46 · Score: 2
  
  Backwards compatibility does not prevent Oracle from supporting FETCH FIRST. See Limiting result sets. Row value constructors wouldn't be a problem to handle either, especially since Oracle already has the functionality just with a silly syntax.
  
  --
  Finally! A year of moderation! Ready for 2019?
9. Re:vs Oracle? by trcollinson · 2011-09-12 07:47 · Score: 1
  
  That makes a lot of really big assumptions. For example, in the case of my company which may switch away from Oracle, we have ongoing licensing costs which means we haven't "bought" Oracle, we are "buying" Oracle, and continuing to do so over and over again, every year.
  Also, as another person mentioned, we use only a small percentage of the actual features that Oracle provides. For us, and I am assuming a lot of others who are paying up the ying yang for licenses, switching to a PostgreSQL solution makes a lot of sense. Really all we want and need is a stable and cost effective environment. (Now, I must say we are looking at enterprise PostgreSQL support which isn't cheap, and far from free. But still a significant savings over Oracles licensing fee).
10. Re:vs Oracle? by discord5 · 2011-09-12 07:50 · Score: 5, Informative
  
  So...how does PostgreSQL compete with Oracle nowadays as far as features go (specifically, spatial, and data-guard-like replication)?
  I can't speak for Oracle, but if you're interested in spatial stuff you should have a look at PostGIS. We've recently been using it to store tons (magnitude of several million) of points and polygons, and we're very happy with it. We've got about hundred simultaneous users connecting to the WFS in peak hours, and it bears the load pretty well if you properly index your tables. I can't speak much for updates, since our database updates in bursts (we import new data every X weeks). I can't go too much into detail about the type of data other than that it's polygons, points, and mostly distance calculations and intersections.
  We briefly looked at Oracle Spatial for a while, looked at the pricetag and the project budget and made the decision to try the PostgreSQL+PostGIS combination and see how far it'd get us. We were pleasantly surprised. I had some experience with PostgreSQL before in the 7.X releases in a previous lifetime but in the end wasn't all that pleased with it, especially on busy servers. Nowadays, I'm running 9.0 and I'm pretty much content about it. Replication wise we've got a PITR setup up and running which is more than enough for our purposes. It's pretty well documented, but be sure to test everything, etc etc etc... It doesn't quite hold your hand when you're setting it up, so double check everything.
  I'm sure that there will be people on here that have more extensive experience with PostgreSQL (and Oracle) to fill you in on the juicy details, but in general I'm pretty pleased with it so far. It scratches my particular itch, and does so without all too much headaches.
11. Re:vs Oracle? by Talderas · 2011-09-12 07:57 · Score: 2
  
  Under the SQL standard a NULL string is different from a zero length string (""). It's subtle but it's a difference.
  Oracle does not differentiate between the two. A zero-length string behaves identically to a NULL. If a customer wrote an application where ("") equated to a NULL then fixing Oracle to differentiate between the two would break those applications.
  If they make it an optional feature Oracle needs to make it an opt-out feature in order to maintain backward compatibility while still allowing their customers the ability to update.
  In otherwise, it's easier to ignore the standard on this one thing than screw with things for their customers.
  
  --
  "Lack of speed can be overcome. In the worst case by patience." --Znork
12. Re:vs Oracle? by h4rr4r · 2011-09-12 08:01 · Score: 5, Insightful
  
  It is easier, it is still wrong. It should be opt-in to prevent people from using it for new installations.
  It is not a subtle fucking difference. It is a huge big honking difference. Either you don't know the information which is null or you know it to be "".
13. Re:vs Oracle? by pak9rabid · 2011-09-12 08:05 · Score: 1
  
  Thank you! Your response was very helpful. I've been eye-balling PostGIS and was wondering: 1.) is it something somebody would bring up if it wasn't specifically mentioned in a question regarding spatial, and b.) how it performs. Based on your response, it sounds like it's pretty usable.
14. Re:vs Oracle? by dkleinsc · 2011-09-12 08:07 · Score: 2
  
  With regards to LIMIT, Oracle does have a couple of equivalents:
  1. WHERE rownum < end
  2. select * from (select S1.*, ROWNUM rn FROM ( real query ) S1 WHERE ROWNUM <= end) WHERE rn >= start
  Obviously, the second syntax is a bit painful, but it works, and it has the important behavior that it doesn't calculate any of the rows that aren't returned (as Postgres says it does for LIMIT...OFFSET in 8.1, see the docs). This is important when you're selecting items 4990-5000 of a 10,000 row result set.
  I've used both PostGres and Oracle - they're both pretty good at their jobs, both have their quirks, upsides, and downsides.
  
  --
  I am officially gone from /. Long live http://www.soylentnews.com/
15. Re:vs Oracle? by marcosdumay · 2011-09-12 08:18 · Score: 1
  
  That depends on how many millions you have to spare on extensions.
  
  --
  Rethinking email
16. Re:vs Oracle? by greg1104 · 2011-09-12 08:28 · Score: 5, Interesting
  
  Lots of companies are converting from Oracle Spatial to PostgreSQL plus PostGIS because it's faster and has better compliance to GIS standards. The text of the talk isn't available, but the FAA Airports GIS and PostgreSQL presentation was a typical report I was in the audience for. The FAA's first conversion happened very quickly: just export their data in a standard format, import into PostgreSQL, and tweak some queries. The result worked so much better that they've standardized on PostgreSQL for spatial applications at the FAA now. Internal projects needing a spatial database have to justify why they want the budget for Oracle Spatial, and it's default deny unless you have a really good reason.
  The addition of synchronous replication to 9.1 has made it a pretty even match for Oracle's Data Guard now. The main bonus is that you can control the integrity level you want at the transaction level. So you can have a database with a mix of important data (only considered safe when on two nodes) and fast, best attempt eventual consistency data, all in one place. Nothing else can replace Oracle at the top end while still having a lean enough mode to be competitive with NoSQL database when integrity isn't the top priority.
  We convert Oracle installs to PostgreSQL all the time at my day job. The main obstacles I keep seeing that don't have simple solutions are 1) using a lot of PL/SQL, 2) differences in query handling, such as OUTER JOIN behavior or reliance on optimizer hints, and 3) can't limit the resources used by individual users easily in PostgreSQL yet. I actually have a design outline for how to solve (3)...would only cost a fraction of a typical Oracle license to sponsor that feature. EnterpriseDB's version of Oracle comes with PL/SQL compatibility, but only in a commercial product that lags behind the open-source releases--and buying from them just switches which vendor you're locked into.
17. Re:vs Oracle? by greg1104 · 2011-09-12 08:31 · Score: 1
  
  There's a sunk cost fallacy here. I regularly convert companies from Oracle to PostgreSQL, financing the project out of savings in the recurring annual maintenance/support costs that Oracle applies. Just because you've already spent a lot of money on a commercial database, that doesn't mean you can't cost justify it based on the recurring overhead.
18. Re:vs Oracle? by Cutting_Crew · 2011-09-12 08:34 · Score: 1
  
  if you are coming to the end of your yearly license with Oracle and why would you not switch to postgres and save some money? Sure you have to spend time and money switching the tables over to a different database and testing that but i am guessing that will cheaper than paying for another yearly license for Oracle and another year...etc etc.
19. Re:vs Oracle? by X0563511 · 2011-09-12 08:39 · Score: 1
  
  If you're using MySQL and need to scale further for some reason, then just use MySQL Cluster. There's no need to change entirely.
  
  --
  For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
20. Re:vs Oracle? by shish · 2011-09-12 08:40 · Score: 2
  
  As a demonstration of postgres' spatial abilities, check out OpenStreetMap and the various sub-projects
  
  --
  I mod down anyone who says "I will be modded down for this", regardless of the rest of their comment
21. Re:vs Oracle? by Bengie · 2011-09-12 08:44 · Score: 2
  
  It's about as different as a null pointer vs a pointer pointing to an empty string object.
22. Re:vs Oracle? by Bengie · 2011-09-12 08:52 · Score: 1
  
  You complain about minor things. I have to import data from Oracle servers and for some reason, Oracle loves to dump out rows with different column counts.
  I have to deal with a lot of customers with a lot of different versions of Oracle, and they all seems to have had CSV files that I could not important because the header says 14 columns with their names, but some rows have 5 columns and some rows have 20 columns. Pulled those numbers out of my ass, but that's what happens.. A LOT.. F'n annoying as hell.
  Even running select statements against the DB gives me issues like that.. WTF?!
  I don't admin Oracle boxes, I just request exports from Oracle DBs from customers and this is what happens quite often when customers try to send us data.
  Always fun wasting a week of back-and-forth with a customer just trying to get consistent row schemas.
23. Re:vs Oracle? by Anonymous Coward · 2011-09-12 09:05 · Score: 1
  
  We've been using it at my org for a lot of spatial data as well, though we're still stuck on MS SQL for production.
  One of the big advantages it appears to have over other spatial databases is that there are a lot of spatial apps out there that will nicely integrate to PGSQL, like QGIS, Grass, MapServer. There's plenty of tool suites as well, like FWTools (for windows) or GDAL/OGR/GEOS for *nix.
  In contrast, though MS has had spatial support for a couple of years now it's treated like third rate citizen - no official tool for importing shapefiles, no support in Integration Services, Entity Framework, RIA Services, Silverlight, etc etc. They do have an attractive map component for Reporting Services, but it's buggy at best.
24. Re:vs Oracle? by 0xA · 2011-09-12 09:06 · Score: 1
  
  Using PostGIS and replication here, works very nicely. Out biggest problem was that be have billions of rows that can't be locked for a huge high traffic mobile app. Using a round robin load balanced pool of postgres servers in hot spare mode as read only DBs solved a lot of issues.
25. Re:vs Oracle? by Outtascope · 2011-09-12 09:20 · Score: 1
  
  It's subtle in that people using Oracle coming from a standards compliant database usually have this one bite them in the ass big time without any idea that it was coming. I hate this behavior, and the us of NVs all over the damn place. Outside of cost, it is one of my biggest beefs moving to Oracle. It significantly increased the complexity of moving a large application from MySQL to Oracle. Every single query had to be re-written looking for places where this would get you. It is just plain dumb behavior.
26. Re:vs Oracle? by fuzzytv · 2011-09-12 09:26 · Score: 1
  
  Because then it would be a bit easier to port the applications to other databases?
  Plus this particular "feature" is used on so many places of the current code base (and that's a huge amount of PL/SQL code) that it's almost impossible to fix. Plus it's actually a bit funnier, because the exact behaviour depends whether you use CHAR or VARCHAR2 and if you're in SQL or PL/SQL. And it's not the only funny feature in Oracle.
  Sometimes I have nightmares about the reasons that led the developers to implement it this way.
27. Re:vs Oracle? by Unequivocal · 2011-09-12 09:31 · Score: 1
  
  ++ on this comment. My experience is similar. Postgres and PostGIS are very reliable, very fast and scale well.. If you set them up right. There isn't as extensive of a commercial support network for them as Oracle (duh) but there are commercial options and the online free communities are amazingly open, supportive and helpful. The GiST indexes which enable (I believe) a lot of spatial operations to occur in a timely manner are really impressive and learning how they work itself is a nice little CS continuing education course.
28. Re:vs Oracle? by Unequivocal · 2011-09-12 09:33 · Score: 1
  
  Good post. Errata -- I think you meant to write "EnterpriseDB's version of [[Postgres]] comes with PL/SQL compatibility, but only in a commercial product that lags behind the open-source releases--and buying from them just switches which vendor you're locked into."
29. Re:vs Oracle? by fuzzytv · 2011-09-12 09:35 · Score: 2
  
  I doubt that's an Oracle issue, my guess is they're using a custom-developed tool to export the data and it's buggy. I'm dealing with a lots of data exported from Oracle (CSV, columnar, ...) and I've never had this problem. External tables actually made exporting even easier.
  So while I'm a PostgreSQL fan, let's not blame Oracle for the mistakes of others.
30. Re:vs Oracle? by rasherbuyer · 2011-09-12 09:39 · Score: 1
  
  let me get this straight you're saying that:
  select col1,col2,col3
  from table;
  won't always return 3 columns? What are you smoking?
31. Re:vs Oracle? by fuzzytv · 2011-09-12 09:39 · Score: 1
  
  Frankly, if you have bought Oracle, you're more than aware about their licensing fees. And how ridiculous that gets once you need to use VM, or when you need more CPUs etc. My experience is that the businesses that were already hit by an Oracle sale are looking for other solutions - and PostgreSQL is very popular among them.
32. Re:vs Oracle? by fuzzytv · 2011-09-12 09:48 · Score: 2
  
  The streaming replication is generally equal to Oracle DataGuard (physical standby). The hot_standby actually gives you about the same as Active DataGuard, i.e. the ability to run read-only queries on the standby for free (you have to pay for that with Oracle). With Oracle you'll get a management console to handle all this, with PostgreSQL you have to set it up manually (5-minute task), but there are several tools that may help (e.g. repmgr).
  Spatial ... although it's not a built-in feature, there's a PostGIS (www.postgis.org). A great package to manage geospatial data.
  There are companies that are migrating from Oracle, but they don't want to go public for good reasons. I know there were some case studies about how Sony replaced Oracle with EnterpriseDB - although it's mostly a marketing mumbo jumbo.
33. Re:vs Oracle? by fuzzytv · 2011-09-12 09:53 · Score: 1
  
  Most people don't realize that commercial software is usually licensed, not sold. That's why they don't see the consequences (and it's not just about costs).
34. Re:vs Oracle? by fuzzytv · 2011-09-12 09:56 · Score: 1
  
  I don't think it would be mentioned - actually it's a separate package built on top of PostgreSQL (thanks to the ability to write custom data types etc).
35. Re:vs Oracle? by Nutria · 2011-09-12 09:59 · Score: 1
  
  And it doesn't have scalar data types. What fscked up RDBMS doesn't have *integers*?
  
  --
  "I don't know, therefore Aliens" Wafflebox1
36. Re:vs Oracle? by Synerg1y · 2011-09-12 10:46 · Score: 1
  
  Really? I use tsql, but I work on an inhouse app i didn't write, and of course the previous developers sometimes used "" and sometimes null like the fine professional coders they were (unused methods are sometimes found). Could this be the developer's reasoning? Then again i don't need to switch rdms to take advantage of this fine feature, dbo.ISNULLOREMPTY() anyone?
  And in terms of a recode to switch rdms, it's 99.9% out of the question for everybody, the software costs $, recoding costs way more $, bug fix costs more $, support costs more $, where's the gain?
37. Re:vs Oracle? by fusiongyro · 2011-09-12 10:56 · Score: 2
  
  Another neat thing about Oracle that you obviously don't understand is that Oracle makes you pay for the features you use, but the software lets you turn on whatever features you want. If you're in a large organization with poor database oversight, you may wake up one day to discover that you're using features you aren't paying for, and you're one audit away from a few more tens of thousands of dollars being flushed down the Oracle toilet.
  Lots of organizations bought Oracle in the past because it was trendy or had some little feature that necessitated it, and now that they're growing, they're discovering that the costs are going up too, but they don't need the features that necessitated the original purchase, or the open source databases now have those features. Or maybe they just realize that trendy and expensive doesn't beat free.
38. Re:vs Oracle? by greg1104 · 2011-09-12 11:11 · Score: 1
  
  It's also part of the Oracle lock-in. Try to move off Oracle, discover queries that vary depending on null behavior (such as OUTER JOIN) don't do quite the same thing anymore. Auditing for queries that misbehave due to Oracle's sloppiness here is a giant headache when moving to PostgreSQL from it.
39. Re:vs Oracle? by fuzzytv · 2011-09-12 11:12 · Score: 1
  
  Really? I use tsql, but I work on an inhouse app i didn't write, and of course the previous developers sometimes used "" and sometimes null like the fine professional coders they were (unused methods are sometimes found). Could this be the developer's reasoning? Then again i don't need to switch rdms to take advantage of this fine feature, dbo.ISNULLOREMPTY() anyone?
  Yes, it's one of those Bender-like nightmares - you know, those with 0s, 1s and then a 2.
  
  And in terms of a recode to switch rdms, it's 99.9% out of the question for everybody, the software costs $, recoding costs way more $, bug fix costs more $, support costs more $, where's the gain?
  That's why I wrote 'a bit easier'. The truth is there's a lot of apps that are rather easy to port - e.g. a lot of Java apps built on top of some ORM are not that difficult to port, because they're not using the advanced features at all. And switching the db may be actually cheaper if you consider the licensing fees etc. And for example EnterpriseDB has something that emulates Oracle to some degree and makes the port much easier (so they say).
40. Re:vs Oracle? by nullchar · 2011-09-12 12:08 · Score: 1
  
  The release notes describe some OUTER JOIN changes that may help your (2).
  
  E.1.3.1.1. Performance
  Allow FULL OUTER JOIN to be implemented as a hash join, and allow either side of a LEFT OUTER JOIN or RIGHT OUTER JOIN to be hashed (Tom Lane)
  Previously FULL OUTER JOIN could only be implemented as a merge join, and LEFT OUTER JOIN and RIGHT OUTER JOIN could hash only the nullable side of the join. These changes provide additional query optimization possibilities.
41. Re:vs Oracle? by greg1104 · 2011-09-12 12:17 · Score: 3, Informative
  
  This isn't an optimization problem. As noted elsewhere in the comments here, Oracle handles string nulls oddly. Because of that, joins happen differently against a null value vs. an empty string. That will produce different query results in Oracle vs. PostgreSQL. It's one of those really annoying conversion quirks, either going to Oracle or escaping from it.
42. Re:vs Oracle? by cduffy · 2011-09-12 12:57 · Score: 1
  
  If you're using MySQL and need to scale further for some reason, then just use MySQL Cluster. There's no need to change entirely.
  By "MySQL Cluster", you're referring to the configuration that requires NBD, right? No foreign keys, no guarantees of that data has been flushed to disk when a COMMIT comes back, and you actually think this might be a serious alternative for folks considering Oracle?!
43. Re:vs Oracle? by vux984 · 2011-09-12 13:51 · Score: 1
  
  It is not a subtle fucking difference. It is a huge big honking difference. Either you don't know the information which is null or you know it to be "".
  I'm curious, while in theory a good database can differentiate... how do you realistically capture the information in a user interface?
  You've got some form: Please enter your name:
  firstname :
  middlename :
  last-name :
  If someone leaves middlename blank and submits the form, do you think it is known to be "", or is it NULL? If you assume NULL and someone doesn't have a middle name, how do they indicate that they don't have one so you don't assume its missing.
  If you assume its "" then your database is full of incorrect information, because most people's middle name is not "" even if they omitted it from the form.
  Do you really have a checkbox next to every field... this field intentionally left blank because the correct value is "". I have NEVER seen such a form...
  What is the actual right / normal course of action?
44. Re:vs Oracle? by X0563511 · 2011-09-12 14:02 · Score: 0
  
  By "MySQL Cluster", you're referring to the configuration that requires NBD, right?
  Yes. This is a bad thing? Properly setup NBD is hard to kill. No data lives in once place.
  
  No foreign keys
  You mean just like MyISAM? It has it's uses. This is a cluster, remember. The cluster management nodes are supposed to make this a non issue. As below, as long as someone who knows what they are doing builds it and doesn't half-ass it, this is fine.
  
  no guarantees of that data has been flushed to disk when a COMMIT comes back
  It uses a two-phase commit process. Provided whomever sets the cluster up knows what they are doing and doesn't half-ass it, it works just fine.
  
  --
  For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
45. Re:vs Oracle? by hplus · 2011-09-12 14:54 · Score: 2
  
  Imagine if your form did not always include a blank for middle name, though one was added later. People who filled out the form prior to the addition would have a NULL middle name, while those who filled it out later would have it set, if only to "" in some cases.
46. Re:vs Oracle? by smellotron · 2011-09-12 17:33 · Score: 1
  
  What is the actual right / normal course of action?
  
  Allowing the user to specify "empty" vs. "nonexistent" is almost always the wrong answer. In general, data for human contact is either provided or nonexistent, not empty. This can be name components, address components, phone numbers, etc.
  It's hard to set a generalized rule for handling of other inputs, but typically the data model will dictate whether a given "string" datum is optional (NULLable), or whether it has content requirements (empty allowed). Very little data is both optional and allowed to be empty. So keep your interface clean, and do the mapping behind the scenes for data cleanliness.
47. Re:vs Oracle? by cduffy · 2011-09-12 17:49 · Score: 5, Insightful
  
  No foreign keys
  You mean just like MyISAM? It has it's [sic] uses.
  If we grant that a substantial set of use cases where a full-scale, non-embedded, out-of-process database is called for but relational integrity is unimportant exist (something I grant only for the sake of the ongoing argument), that still leaves people who try to use it in situations outside that set. I once took maintainership of an accounting system for car dealerships built on top of a "database" without relational integrity. Cases where bugs in the software resulted in orphaned records and numbers that didn't add up were legion.
  Saying you don't need relational integrity is right up there with saying you don't need bind variables because your hand-built escaping code is good enough. There might be cases where hand-built escaping is good enough, but encouraging J. Random Slashdot Reader to take that approach means it's going to be abused... and I assure you, the person who built that accounting system considered himself an expert.
  
  no guarantees of that data has been flushed to disk when a COMMIT comes back
  It uses a two-phase commit process.
  
  Interesting documentation. Reading the links -- suffice to say that MySQL's multi-master clusters are vastly less safe than Oracle RAC. Depending on after-the-fact conflict resolution rather than having proper locking... really? (And with respect to NDB not supporting fsync() before reporting a commit complete -- yes, I can provide a link for that. Being in a cluster isn't good enough -- sometimes you lose a rack, or a full DC).
  Yes, it's cheaper. Yes, it might be good enough for someone. But if someone needs RAC features enough to pay $50K in licensing fees for a tiny cluster, NDB just ain't gonna' cut it. And if I'm handling medical data (which I was last time I deployed RAC), I'm not going to risk massive liability by deciding that I can afford to get a COMMIT back before data is not just ACKed by multiple nodes but sitting on a platter.
  PostgreSQL 9.1's synchronous streaming replication is a sane middle ground -- it's not multi-master, but it actually preserves the same semantics and performance characteristics (durability semantics being the most critical of those -- but at the same time, needing to rewrite everywhere your code uses savepoints or autoincrement or relies on referential integrity exceptions because you're switching to a database backend that doesn't support them isn't any fun either) you'd have in a non-clustered environment. Having it available six years ago would have saved me one helluva lot of money.
  
  Provided whomever sets the cluster up knows what they are doing and doesn't half-ass it, it works just fine.
  Really? Compared to RAC, wherein moving to a cluster involves no loss of semantics, using NDB looks a whole lot like half-assing it already.
48. Re:vs Oracle? by Anonymous Coward · 2011-09-12 18:37 · Score: 0
  
  sounds like a dump of google map data... make it open source!!!
49. Re:vs Oracle? by Anonymous Coward · 2011-09-12 19:15 · Score: 0
  
  AFAIK Oracle and M$ are still the only two databases to support true cluster deployment (e.g. multiple servers, single SAN storage)
50. Re:vs Oracle? by hawkinspeter · 2011-09-12 21:01 · Score: 1
  
  I've done plenty of exports of data from oracle databases and have never seen this. Are you sure that you're using CSV correctly?
  
  --
  You're a temporary arrangement of matter sliding towards oblivion in a cold, uncaring universe
51. Re:vs Oracle? by tehcyder · 2011-09-12 22:50 · Score: 1
  
  There's a sunk cost fallacy here. I regularly convert companies from Oracle to PostgreSQL, financing the project out of savings in the recurring annual maintenance/support costs that Oracle applies. Just because you've already spent a lot of money on a commercial database, that doesn't mean you can't cost justify it based on the recurring overhead.
  Discounted future cash flow forecasts are your friend here, as for any financially based decision; as you say, sunk costs are irrelevant.
  
  Unfortunately, non-finance people find this concept hard to grasp, a fact that is mercilessly exploited by salesmen of over-priced products.
  
  --
  To have a right to do a thing is not at all the same as to be right in doing it
52. Re:vs Oracle? by Bengie · 2011-09-13 01:43 · Score: 1
  
  We don't do it ourselves, we tell our customers what we need and this is what happens.. Often. We have about 30 different customers who use Oracle, and nearly all of them have had this issues more than once at some point. A select few have never had this issue.
53. Re:vs Oracle? by ifreebudget · 2011-09-13 01:49 · Score: 1
  
  For a paging style query, row_num windowing function is more efficient. Postgres supports it atleast from 8.x version I think.
54. Re:vs Oracle? by Bengie · 2011-09-13 01:55 · Score: 1
  
  "my guess is they're using a custom-developed tool to export the data and it's buggy"
  Good point. We'd have to see what's common among the customers, but they always eventually fix it so we never dug further than sending back examples problem and telling them to fix it.
55. Re:vs Oracle? by Unordained · 2011-09-13 02:52 · Score: 1
  
  That's a lot like the PL/SQL-compatible variant of Firebird, called Fyracle, which is still based on Firebird 1.5, not the current 2.1 or 2.5 series.
56. Re:vs Oracle? by rubycodez · 2011-09-13 04:12 · Score: 1
  
  Why call that "true cluster deployment", there are many types of clusters and I reject your notion that "shared storage clustering" is the One True Clustering (*snort*). It's a huge administrative burden, and in the real world doesn't deliver its claimed "N-times" performance scaling, and when problems arise a nightmare to troubleshoot. yes, part of my job is to work with Oracle RAC. Instead of going that route, better (cheaper, easier to admin, better performance) to get N times the machine and N times performance SAN, that will whoop RAC's ass any day of the week.
57. Re:vs Oracle? by hawkinspeter · 2011-09-13 04:27 · Score: 1
  
  The only way I can see this happening is if the CSV doesn't have text fields enclosed with quotes where the text can include a comma or alternatively if there is a newline/carriage return in the middle of test fields.
  
  I just can't see how you could write a sql query that would return different numbers of columns per row unless you tried really hard. I bet you wouldn't be able to find anyone else having the same problem with oracle who isn't a customer of yours.
  
  --
  You're a temporary arrangement of matter sliding towards oblivion in a cold, uncaring universe
58. Re:vs Oracle? by Anonymous Coward · 2011-09-13 10:11 · Score: 0
  
  I have almost this exact problem. I converted a database from a spreadsheet, and people with no middle name got the empty string. But my PHP code assigns NULL to new entries. The client hasn't noticed, and probably never will. Maybe we'll convert to Oracle someday and it won't matter. (anon because I'm embarrassed not to have fixed such simple sloppiness)
59. Re:vs Oracle? by vux984 · 2011-09-14 08:00 · Score: 1
  
  You didn't really answer my question, but rather side stepped it and gave an example where a column could have both.
  I've certainly been there, but then why not simply:
  UPDATE oldTable SET newField= ''
  Rather than have the old records with NULLs?
  If its a non-required field, that is blank most of the time anyway... what value is there is letting it stay for old records and then dealing with the possibility it might be null all over the place in one's code?
Replication Drawback by Anonymous Coward · 2011-09-12 07:37 · Score: 1

If you wish to use their replication implementation to increase performance, you will probably have to look elsewhere, I'm afraid. In the event that your primary server fails, you are required to promote one of the existing slaves to be the new primary server and all other slaves will require a fresh data dump from the new master. Maybe in another year (when 9.2 gets released) it will be ready for the masses.
1. Re:Replication Drawback by greg1104 · 2011-09-12 08:39 · Score: 2
  
  It's not trivial to figure out, but we've been deploying PostgreSQL 9.0 without the problem you describe (must do a fresh dump from the master) for a while now. The repmgr software we've released takes care of all the promotion trivia. Worst-case, unusual situations can require you use a tool like rsync to make an out of date standby node into a copy of the new master. That's not the expected case though.
2. Re:Replication Drawback by Lennie · 2011-09-12 09:05 · Score: 1
  
  A possible alternative is to use pgpool II, from the webpage:
  - Automated failover. If one of the two PostgreSQL goes down, pgpool-II will automatically let remaining node take over and keep on providing database service to applications.
  - Query dispatching and load balancing. In streaming replication mode, applications need to carefully chose queries to be sent to standby node. Pgpool-II checks the query and automatically chose primary or standby node in sending queries. So applications need not to worry about it.
  - Online recovery. Recover failed node without stopping pgpool-II and PostgreSQL.
  Obviously there are also commercial providers of PostgreSQL which have added their own features.
  
  --
  New things are always on the horizon
3. Re:Replication Drawback by fuzzytv · 2011-09-12 10:00 · Score: 1
  
  That is not true. If you promote the slave that's ahead of all the other slaves, then the other slaves can just reconnect to the new master. Tools like repmgr can handle this for you.
  And no one actually says you have to do a completely fresh base backup. Ever heard about rsync?
4. Re:Replication Drawback by fuzzytv · 2011-09-12 10:01 · Score: 1
  
  Yup, and things will get a bit more interesting thanks to the cascading replication.
The only decently sane SQL database by Anonymous Coward · 2011-09-12 07:38 · Score: 0

I've tried a few SQL databases, and my favorite of them all is Postgres. With Postgres I don't get nearly as many syntax/logic surprises as with SQL Server.. and I don't get nearly as many performance surprises as with MySQL.
I mean for some reason Microsoft couldn't give us a LIMIT/OFFSET clauses for the trivial operation of implementing paging leaving us instead to do sub-queries. What the hell? That's like making a programming language with arrays but leaving out the indexing operator.
1. Re:The only decently sane SQL database by amorsen · 2011-09-12 07:50 · Score: 1
  
  I hate MSSQL as much as anyone, but it does (in later versions at least) support cursors and ROW_NUMBER. It is a bit silly to not support FETCH FIRST in 2011, but hey, it's doing better than Oracle.
  
  --
  Finally! A year of moderation! Ready for 2019?
2. Re:The only decently sane SQL database by djdanlib · 2011-09-12 07:55 · Score: 1
  
  Actually, MSSQL11 will support FETCH FIRST.
  Check it out (you'll have to scroll a couple pages down): http://www.codeproject.com/KB/database/Denali_Tsql_Part_2.aspx#3.3
3. Re:The only decently sane SQL database by djdanlib · 2011-09-12 07:56 · Score: 1
  
  Oops, I forgot to also mention that it will support LIMIT/OFFSET as well, which is noted in the same link. Sorry for double-post.
not excited by roman_mir · 2011-09-12 07:47 · Score: 4, Interesting

I am not excited about any of these changes unfortunately, they are somewhat specialized, though having synchronous replication and serializable transaction isolation sounds more useful than other stuff.
But there are real things that are missing. Most obvious is distributing of one SQL request into parallel processes or threads to speed up query execution on multi-core systems (which are all multi-core today). The other is the entire issue of attempting to calculate execution time and failing in various cases in the planner, like the really sad cases of completely mis-handling of the mergejoin estimates, which then forces people to set enable_mergejoin to false unfortunately, it's a sledgehammer approach, but otherwise things that can execute in a few milliseconds can take tens of seconds and even minutes instead.
There are so many ways to improve performance and really kick it up, and instead there are more features added. I think database performance is now more important for PostgreSQL than features (unless this means introducing parallelization of single SQL requests.)
Otherwise it's a good database, it already provides tons of features. The one weird thing that I find though, is that for replication or hot stand by or just for creating a dynamic backup, the segments that are written to the disk are always of fixed size.
You can modify the size, which is 16MB by default, but you can only modify the size when you configure the source code before compiling it: configure --with-wal-segsize=1 - this configures the segments to 1MB, which allows the second drive to last that much longer if all you are doing is using a second drive to keep dynamic backup (and that asynchronous backup method, by the way, the problem that they are solving with "synchronous replication", it's that you either have these segments fill up, and then the segment is written to disk, or you wait until time expires for segment to be written to disk if you set checkpoint_timeout). I imagine treating fixed sized segments is easier than generating segments that are of exact size equal to amount of data that was produced in a time period, but it's a waste of disk though.
The other big thing that I would love to have in a database is ability to scale the database to multiple machines, so have a logical database span multiple disks on multiple machines, have multiple postgres processes running against those multiple disks, but have it all as one scalable database in a way that's transparent to the application. That would be some sort of a breakthrough (SAN or not).

--
You can't handle the truth.
1. Re:not excited by simcop2387 · 2011-09-12 08:09 · Score: 2
  
  The other big thing that I would love to have in a database is ability to scale the database to multiple machines, so have a logical database span multiple disks on multiple machines, have multiple postgres processes running against those multiple disks, but have it all as one scalable database in a way that's transparent to the application. That would be some sort of a breakthrough (SAN or not).
  The big reason you don't find that and it would be a tremendous breakthrough, is that it is currently believed to be actually impossible to get that. Have a look at the CAP Theorem. http://en.wikipedia.org/wiki/CAP_theorem
2. Re:not excited by Anonymous Coward · 2011-09-12 08:11 · Score: 3, Informative
  
  I am not excited about any of these changes unfortunately, they are somewhat specialized, though having synchronous replication and serializable transaction isolation sounds more useful than other stuff.
  Synchronous replication for many is a must have. In many cases this single feature was preventing adoption of PostgreSQL as many applications require synchronous replication support.
  Per column collation is another feature which is commonly required in complex applications; especially if they are multilingual capable.
  Extensions support is very important and will benefit anyone who uses any contrib or third party extension for PostgreSQL. Unless your database work starts and stop at the most basic of features, this is a something you'll likely use, if not today, then tomorrow.
  Serializable isolation level support, for many, is a critical feature. I agree most people likely won't benefit, but its an important feature to round out support for all database applications and loads. It means PostgreSQL is now an option for whom it wasn't previously.
  Unlogged tables is extremely important, super cool, and very powerful. Many applications require time consuming kludges to work around this previously lacking feature. For many applications this is a make it or break it feature from a performance perspective.
  For many geo-location applications, but even beyond that, nearest-k is an extremely important feature which not only simplifies code, but also reduces development and test while also providing for a nice performance boost.
  And of course, the SE Linux stuff is very important for customers (you'd be surprised how many there are, including the NSA) who consider this the most important feature of the release.
  
  But there are real things that are missing. Most obvious is distributing of one SQL request into parallel processes or threads to speed up query execution on multi-core systems (which are all multi-core today).
  You want EnterpriseDB. It already has this. As for the rest of your rant, pragmatically, it just doesn't happen very often. Assuming you're not just trolling, what did the PostgreSQL Perf guys have to say about your issue. Generally these types of issues are considered planner bugs (unlike all other SQL vendors) and if possible, will gladly create a fix, if able/appropriate. But 99% of this time, this is just one of those things people love to lie and troll about.
  
  There are so many ways to improve performance and really kick it up, and instead there are more features added. I think database performance is now more important for PostgreSQL than features (unless this means introducing parallelization of single SQL requests.)
  Except that PostgreSQL is already one of the fastest databases available for 90% of the likely workloads. Performance, in of itself, typically isn't the primary focus right now simply because, for the vast majority of users, it already spanks or is at least on par with most every other option.
  
  You can modify the size, which is 16MB by default, but you can only modify the size when you configure the source code before compiling it: configure --with-wal-segsize=1 - this configures the segments to 1MB, which allows the second drive to last that much longer if all you are doing is using a second drive to keep dynamic backup (and that asynchronous backup method, by the way, the problem that they are solving with "synchronous replication", it's that you either have these segments fill up, and then the segment is written to disk, or you wait until time expires for segment to be written to disk if you set checkpoint_timeout). I imagine treating fixed sized segments is easier than generating segments that are of exact size equal to amount of data that was produced in a time period, but it's a waste of disk though.
  Pragmatically, just not an issue. Which is why its a compile time option and not a runtime option.
3. Re:not excited by vlm · 2011-09-12 08:11 · Score: 1
  
  The other big thing that I would love to have in a database is ability to scale the database to multiple machines, so have a logical database span multiple disks on multiple machines, have multiple postgres processes running against those multiple disks, but have it all as one scalable database in a way that's transparent to the application. That would be some sort of a breakthrough (SAN or not).
  You need a middle-ware machine that understands enough SQL to send the correct request to the most optimized box, and a fallback slushware box to handle anything you didn't figure out how to manually optimize on the speedy indexed boxes.
  I realize this is a postgresql article but at one time I had a herd of little mysql boxes (data was replicated outside mysql) and each had different custom indexes set up that matched certain queries. So depending on which set of disk drive filling indexes fit the best, my middle box shoved them out to the right box. Using horrible regexes. Icky, but it worked. Obviously this is simpler with lots of strange reads, not many joins, and few inserts.
  A general generic extendable solution would probably take some kind of AI, or have a horrific overhead cost. Or both.
  
  --
  "Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
4. Re:not excited by Anonymous Coward · 2011-09-12 08:18 · Score: 0
  
  Try cassendra, http://cassandra.apache.org/
5. Re:not excited by roman_mir · 2011-09-12 08:22 · Score: 2
  
  people love to lie and troll about.
  - more people saying that I am trolling, yet again. Yet again, incorrectly, I don't troll ever.
  
  --
  You can't handle the truth.
6. Re:not excited by vlm · 2011-09-12 08:23 · Score: 1
  
  The big reason you don't find that and it would be a tremendous breakthrough, is that it is currently believed to be actually impossible to get that. Have a look at the CAP Theorem. http://en.wikipedia.org/wiki/CAP_theorem
  Most CAP arguments seem to rely on some combination of not understanding the concepts behind token ring networks, not understanding distributed hash tables, not tolerating latency, and/or trying to scale to a very small number (like 2) instead of a large prime number of majority voting servers. Or it assumes ENIAC level MTBF rates of individual voting devices and vote counters, etc.
  Don't get me wrong, CAP is a good theoretical argument, and educational to think about, it just doesn't apply to many real world. Kind of like theoretically unbreakable encryption cannot exist so we shouldn't even try crypto... however crypto schemes taking multiple universe lifetimes to crack are none the less useful, despite "perfection" being impossible.
  
  --
  "Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
7. Re:not excited by roman_mir · 2011-09-12 08:27 · Score: 2
  
  I address all of these shortcomings that I am writing about here within the application.
  Of-course it's much simpler for me to do from application perspective, because I know what the business logic is, so I break SQLs into pieces that can run in parallel, then I execute them in multiple connections against the database (thread or process per connection), and then I merge data as it comes back. This speeds up execution dramatically, not even close to what a single serial SQL can do.
  As to adding more machines to the cluster - again, in the application level I have to split data logically into separate instances, application knows where to go for different segments of data, so this is not transparent to the application of-course, it must know where different data is.
  
  --
  You can't handle the truth.
8. Re:not excited by h4rr4r · 2011-09-12 08:39 · Score: 0
  
  What a rational, well argued, and sophisticated comment that adds immense value to the conversation at hand.
9. Re:not excited by Anonymous Coward · 2011-09-12 08:51 · Score: 0
  
  distributing of one SQL request into parallel processes or threads
  This is very useful for some specific use cases like data warehouse applications, but the more common use is OLTP where there are many simultaneous simple queries this isn't a big deal because the multiple queries use the multiple CPUs/cores. Always nice to see more functionality, but this may be hard to implement with their multi-process articture.
  
  really sad cases of completely mis-handling of the mergejoin estimates, which then forces people to set enable_mergejoin to false
  Have you tried increasing your stastics target? Give the planner better data and it'll give you a better plan. Also, may get better results fiddling with Planner Cost Constants instead of turning it right off.
  
  The other big thing that I would love to have in a database is ability to scale the database to multiple machines, so have a logical database span multiple disks on multiple machines, have multiple postgres processes running against those multiple disks, but have it all as one scalable database in a way that's transparent to the application.
  Again, great for stuff like data wherehouse applications, but not a huge issue for OLTP where normal replication solutions are good enough. It may be better to leave stuff like this to the specialized database warehouse systems.
10. Re:not excited by greg1104 · 2011-09-12 09:04 · Score: 1
  
  PostgreSQL changes are driven by what people want badly enough that they are willing to invest resources--development manpower, time, etc.--into that change. So your suggestion that performance is more important than these features can't be true. If performance were really the driver for most people, there would be more performance changes instead of all these features contributed. Extending PostgreSQL and getting the code committed is a lot of work to do right. There's very few people developing features here for fun; most of the new features are developed to solve very real business needs. PostgreSQL 9.2 looks like it will have a number of performance improvements however, the pendulum seems to have swung back to where those are more needed again. (The last release to focus mainly on performance was 8.3)
  Parallel query and multi-node operation would all be nice. Progress continues toward those goals, while still shipping a new, stable version each year.
  The fact that the WAL segments are fixed at 16MB doesn't have the impact you're describing anymore. The streaming replication introduced last year in PostgreSQL 9.0 allowed copying partial WAL files over. And there is no wasted space just because a checkpoint ran when only part of a WAL file was used; checkpoints don't move to another WAL file. If you turn on the archive_timeout feature, that has the problem you describe. But that's been considered obsolete by most people, now that the 9.0 streaming feature is available.
11. Re:not excited by udachny · 2011-09-12 09:05 · Score: 0
  
  Yeah, fuck those moms and pops shops, right? Who the hell needs those? moron.
  
  --
  MY OTHER COMMENTS
12. Re:not excited by Anonymous Coward · 2011-09-12 09:07 · Score: 0
  
  Defensive much? I didn't say you were trolling. I gave you the benefit of doubt you were not trolling and explained why it was reasonable to say this is one of those things to which people constantly troll about. Your OP was somewhat splitting hairs and creating issues which generally are not, so while it was possible you were trolling, you still received the benefit of doubt. Which says a lot considering some of the content in your OP. This in turn makes your latest response pretty fishy. Its needlessly defensive and evasive. I'll also add, I was fully aware of the bug report before you posted a link to it.
  Now then, having said that, most people simply don't trigger the bug. And since you didn't really clarify and answer to the questions posed, is it now safe to assume you read a bug report and are now trolling with that? That's the standard anti-PostgreSQL troll tactic these days. As I originally stated, the vast majority of people simply don't suffer from that bug. Did you actually suffer from the bug or did you read a bug and declare PostgreSQL is broken in spite of the fact few users actually experience an issue from the bug, let alone trigger it.
  And to further clarify, yes, whining about things which are not issues for the general user base, or really, not an issue at all, absolutely does have an air of troll. Especially when its presented in such a way to suggest that it IS an issue for people who may read your post an not know any better. At best, its disingenuous. Regardless, its certainly not likely to endear you to the better informed in the audience.
13. Re:not excited by Lennie · 2011-09-12 09:08 · Score: 1
  
  I'm kind of surprised that you mention parallel processing. In most tests I've seen PostgreSQL does add working with more cores than MySQL does.
  
  --
  New things are always on the horizon
14. Re:not excited by roman_mir · 2011-09-12 09:12 · Score: 1
  
  Hey, I am just writing on my experiences, obviously people don't add stuff without reason.
  As to WAL segments - I am not using this feature for streaming replication, just to have an immediate backup to a separate disk, no hot stand by, nothing like that. Imagine a bunch of stores, each one has a small server for management. Data gets transfered to the central servers but also gets backed up to a separate drive that's only used for backup, nothing else. No hot stand by processes waiting for the main process to die or anything. So yes, archive_timeout is used to force a dump of the data to the second disk. You may consider it 'obsolete', but it has a valid use case, and it's not for streaming to a separate database, just to be able to replay the data back in case main disk fails.
  
  --
  You can't handle the truth.
15. Re:not excited by roman_mir · 2011-09-12 09:15 · Score: 1
  
  As I said - a single SQL is not broken into smaller independent pieces to be executed by separate processors/threads. Separate connections/sql requests run in separate processes, but one sql request is not turned into many parallel executions.
  
  --
  You can't handle the truth.
16. Re:not excited by udachny · 2011-09-12 09:17 · Score: 0
  
  dumb ass, look at the report email and the nick of the GP.
  
  --
  MY OTHER COMMENTS
17. Re:not excited by fuzzytv · 2011-09-12 10:13 · Score: 3, Interesting
  
  Cassandra is just one of many NoSQL databases, but yes - NoSQL can be an answer to workaround the CAP theorem in some cases.
  But in many cases it's not a solution. If the data are relational, if you need full ACID, etc. then ditching "consistency" is not a choice. There are projects to build PostgreSQL clustering solutions, that may resemble RAC a bit, although none of them uses shared disk (so each instance needs a separate disk). Let's mention PGCluster, PGCluster II or Postgres-XC (aiming to build write-scalable cluster, something Cassandra does in the NoSQL world). Sure, all this has to follow the CAP theorem.
18. Re:not excited by fuzzytv · 2011-09-12 10:24 · Score: 5, Interesting
  
  The reliability probably improved since ENIAC, but the the question still is "when it is going to fail" and not if it is going to fail. Because it is going to fail - it may be a drive, CPU, PSU, a network switch, an AC unit, the whole AWS data center ... something is going to fail.
  The beauty of CAP theorem as I see it that it says "You can't get all three at the same time, face it." If you don't need the strong consistency (and with most apps you don't), then ditch it and it'll be much easier and cheaper to built and scale the system. I'd say once you realize this inner beauty, it clears your mind - something like a Zen of distributed computing.
19. Re:not excited by Anonymous Coward · 2011-09-12 10:44 · Score: 1
  
  I'm no PostgreSQL expert, and I have no idea what setting enable_mergejoin=false does...
  But, I took a look at your query you included on your bug report -- and I would guess that the performance might improve if you use explicit joins instead of "filtered cross joins". (Or does PostgreSQL somehow infer this and do this automagically?). Your main "outer query" has a two-table cross join, and then filters the rows of that on the results of a four-table cross join.
  Also, if I were writing that query, I'd SELECT and GROUP BY r.receipt_id rather than ri.receipt_id, although I doubt that makes any performance difference. Anyway, FWIW and for the purposes of discussion, I've re-written your query to use explicit JOINS and e-mailed you a copy (it's too large for a /. reply). I'd be interested to hear if it performs any differently (better OR worse) than the original.
20. Re:not excited by roman_mir · 2011-09-12 10:59 · Score: 1
  
  The old one that's posted in the bug report runs in 83ms right now, yours runs in 74ms, so it's a nice gain.
  However this was code from a couple months back and it's constantly changing, but thanks.
  
  --
  You can't handle the truth.
21. Re:not excited by roman_mir · 2011-09-12 11:08 · Score: 1
  
  newer code actually replaces some really crazy stuff
  CASE WHEN sum(least(ri.total_price_before_discount, ri.total_price_after_discount))0 AND
  sum(greatest(ri.total_price_before_discount, ri.total_price_after_discount))0 THEN
  sum(greatest(ri.total_price_before_discount, ri.total_price_after_discount)) ELSE
  sum(least(ri.total_price_before_discount, ri.total_price_after_discount)) END as total_sale,
  with a much simpler version
  sum(CASE WHEN ri.total_price_after_discount IS NULL THEN ri.total_price_before_discount ELSE ri.total_price_after_discount END) as total_sale,
  which then weirdly reverses the numbers. The SQL you sent with this change becomes 78ms and the old with this change is 72ms.
  Now you got me reading too many explain analyze lines.....
  
  --
  You can't handle the truth.
22. Re:not excited by roman_mir · 2011-09-12 11:18 · Score: 1
  
  it's actually jumping within 10ms more or less, after doing many tries I see different things come out all within 72 to 87ms, so I don't think it's that much different.
  Of-course planner is what is important. If it doesn't do its job right, then it can send it all the way back to 30 seconds - that hurts.
  
  --
  You can't handle the truth.
23. Re:not excited by Anonymous Coward · 2011-09-12 12:31 · Score: 0
  
  well, at the very least there should be a free postgres based alternative to greenplum or netezza. e.g. a parallel mode postgres, or something.
24. Re:not excited by simcop2387 · 2011-09-12 16:49 · Score: 1
  
  I'd absolutely agree that you can often get close enough that for most practical reasons it does get you all properties. However it is definitely a good idea to understand that while what you have looks like that, underneath it likely doesn't for some short periods of time. The biggest thing here would be that you can get consistency that's very very good while still not being in the academic sense that it is in the theorem (e.g. consistent at all points in time). Understanding those trade-offs is the important part. Much less so than "omg it can't be done". Given that, I think trying to give up consistency for the other two in a project like PostgreSQL would be foolhardy (since you'd end up losing ACID compliance even if you're nominally there).
  That being said, it would be nice to see something as mature as PostgreSQL in the field of CP or AP databases. MySQL does have an advantage here as it can handle multiple storage backends that could get you this on a single database system (I think I even recall someone developing a MongoDB storage engine for MySQL).
25. Re:not excited by Lennie · 2011-09-12 21:06 · Score: 1
  
  ahh, ok.
  And sorry for the terrible english above. I most have been really tired. :-(
  
  --
  New things are always on the horizon
26. Re:not excited by Errol+backfiring · 2011-09-12 22:15 · Score: 1
  
  Not excited either. There is no uniqueness handling other than fail, as far as I can tell. Almost any other database has an "on conflict", "on duplicate key", or implements the MERGE command of the SQL standard (a totally illegible dragon of a statement, but still). A simple command like an "insert or ignore" or an "insert or replace" should not require server-side programming. How can agile developers work with such a database?
  
  --
  Nae king! Nae laird! Nae yurrupiean pressedent! We willna be fooled again!
27. Re:not excited by roman_mir · 2011-09-13 00:40 · Score: 1
  
  I just had to do a little bit of this, I use plsql from the client, create a function that handles this case: ...EXCEPTION WHEN unique_violation THEN ...
  
  --
  You can't handle the truth.
28. Re:not excited by Anonymous Coward · 2011-09-13 01:25 · Score: 0
  
  dumb ass - the point still remains 100% valid. Even the evasive bit. Why are so many slashdotters so fucking stupid these days? You seriously live in a world where trolls don't exist? I guess I need to be on the same medications as you. And yes, literally, THAT EXACT SAME FUCKING BUG has been used many times now to troll, by trolls, about PostgreSQL. Dumb ass. Damn you're fucking stupid. So asking sincere questions about someone who is borderline trolling, so as to determine if they are sincere or a troll hardly makes one a "dumb ass." Such a statement pretty well validates just how low your IQ actually is. "Nope! Trolls don't exist you dumb ass. Derp. You're a dumb ass - what is this fantasy thing of trolls. Derp"
  Is your reading comprehension really so fucking bad that you can't read what was said?
  Nice troll by the way.
  Literally, the level of stupidity here on slashdot these days is beyond belief - and yet there you are. I might have to take a screen shot to have proof that people this stupid really do fucking exist.
  At worse, the only thing you can say about my exchange is I was too lazy to look at the name of a bug report as is often linked by trolls. Dumb ass. Damn you're fucking stupid.
29. Re:not excited by Unordained · 2011-09-13 03:01 · Score: 1
  
  While Firebird supports update or insert, who else does? Yes, the merge syntax is common, but that's a lot more annoying to use for day-to-day single-row updates in an OLTP environment. Also, what does "agile" have to do with it? Or did you not mean it in the project-management sense?
30. Re:not excited by jefe7777 · 2011-09-14 05:36 · Score: 1
  
  keep up the good work. I got mods today, trying to keep your posts from being buried/censored.
31. Re:not excited by roman_mir · 2011-09-14 05:45 · Score: 1
  
  Thanks, in a market of ideas some ideas are promoted and some are shut down, and in reality it doesn't matter if idea is right or wrong in principle, if enough people just vote with or against it.
  I don't know if any of the comments make any difference (but they MAY actually for individual people, who may just figure out what is coming in time to save themselves from the economic collapse), but at the very least, when the big one hits the fan, they won't be able just to come out and say:
  NOBODY SAW IT COMING.
  It's just not true. It wasn't true during the dot com bubble, it wasn't true during the housing bubble and now, with the biggest bubble of all that is being inflated in government debt - it is again not true.
  
  --
  You can't handle the truth.
Not Sqlite by wandazulu · 2011-09-12 07:48 · Score: 2

Just one small nitpick...sqlite is really meant as an embedded database into an application, it's not a full-fledged database like any of the others mentioned (it doesn't have networking, for example). I suppose you could be scaling up from an embedded sqlite db, but that suggests your application has gotten so big that an external database is necessary.
It's also one of the backing store options for Apple's Core Data framework.
Re:Universal Relational Database of Probability by Bodhammer · 2011-09-12 07:57 · Score: 1

The odds of this happening are NULL...

--
"I say we take off, nuke the site from orbit. It's the only way to be sure."
Re:Universal Relational Database of Probability by h4rr4r · 2011-09-12 08:05 · Score: 1

Already we can tell you are doomed to failure. You are writing a script while ignoring the fact that this functionality already exists. Just cat /dev/random into your db.
Installer improved? by Anonymous Coward · 2011-09-12 08:17 · Score: 0

How about an installation that doesn't suck?
1. Re:Installer improved? by h4rr4r · 2011-09-12 08:29 · Score: 1
  
  What installer?
  They have rpms and debs, what more could you want?
2. Re:Installer improved? by Anonymous Coward · 2011-09-12 08:33 · Score: 0
  
  Sounds amazing. Unfortunately, I am referring to a Windows installer.
3. Re:Installer improved? by roman_mir · 2011-09-12 08:40 · Score: 1
  
  I prefer source and my script
  
  apt-get install -y libreadline5-dev apt-get install -y zlib1g-dev apt-get install -y zlibc ln -s /usr/bin/make /usr/bin/gmake gunzip /usr/local/pgsql/postgresql-9.0.4.tar.gz tar xvf /usr/local/pgsql/postgresql-9.0.4.tar mkdir /usr/local/pgsql/postgresql-9.0.4/build_dir cd /usr/local/pgsql/postgresql-9.0.4/build_dir /usr/local/pgsql/postgresql-9.0.4/configure --with-wal-segsize=1 gmake world groupadd postgres useradd postgres -g postgres chown -R postgres /usr/local/pgsql/postgresql-9.0.4/build_dir/* su postgres gmake check exit gmake install-world export POSTGRESQL_HOME=/usr/local/pgsql export PATH=$PATH:$POSTGRESQL_HOME/bin ## need some 600MB shared echo >> /etc/sysctl.conf echo "kernel.shmmax=600000000" >> /etc/sysctl.conf echo >> /etc/sysctl.conf sysctl -p mkdir /data /usr/local/pgsql/bin/initdb -D /data mkdir /data/base/SPACE /usr/local/pgsql/bin/pg_ctl -D /data -l logfile start # in postgresql.conf shared_buffers = 500MB checkpoint_segments = 10 track_counts = on autovacuum_vacuum_scale_factor = 0.00002 # many small updates autovacuum_analyze_scale_factor = 0.00001 bytea_output='escape' # still problems with other types ## restart db /usr/local/pgsql/bin/psql -d dbname -U user < model.ddl #place startup script to /etc/init.d/postgres chmod aguo+wx /etc/init.d/postgres ln -s /etc/init.d/postgres /etc/rc1.d/K02postgresql ln -s /etc/init.d/postgres /etc/rc2.d/S98postgresql ln -s /etc/init.d/postgres /etc/rc3.d/S98postgresql ln -s /etc/init.d/postgres /etc/rc4.d/S98postgresql ln -s /etc/init.d/postgres /etc/rc5.d/S98postgresql
  
  --
  You can't handle the truth.
4. Re:Installer improved? by h4rr4r · 2011-09-12 08:44 · Score: 1, Flamebait
  
  Don't run Postgres on Windows. That would be mindbogglingly stupid.
5. Re:Installer improved? by Anonymous Coward · 2011-09-12 08:45 · Score: 0
  
  Why would someone install a database server on Windows?
6. Re:Installer improved? by h4rr4r · 2011-09-12 08:53 · Score: 1
  
  2 things:
  1. You need more ram
  2. you forgot shmall, which you are probably not hitting with so little ram.
7. Re:Installer improved? by roman_mir · 2011-09-12 08:57 · Score: 1
  
  that's part of install procedure I have for little boxes, that go into separate stores in a chain. They have maybe 2GB, and shmall is not set at all.
  For the central servers the settings are different altogether, it's also 48-96GB RAM.
  
  --
  You can't handle the truth.
8. Re:Installer improved? by Anonymous Coward · 2011-09-12 09:49 · Score: 0
  
  Windows is good for games and internet browsing. Serious applications like database servers should never be run on Windows. It's only for consumer level applications that handle data you don't mind losing.
  If you really need to run a database server, try Redhat or a Debian distribution. Postgres is a top notch choice in either case.
9. Re:Installer improved? by Anonymous Coward · 2011-09-12 10:11 · Score: 0
  
  Come on... What about us guys that want to get off of MS-Access.
10. Re:Installer improved? by fuzzytv · 2011-09-12 10:31 · Score: 1
  
  Because they already have a Windows server and they don't want to buy another machine?
11. Re:Installer improved? by fuzzytv · 2011-09-12 10:33 · Score: 1
  
  Things are improving, although a bit slowly. Plus there are windows installers available at enterprisedb.com.
12. Re:Installer improved? by h4rr4r · 2011-09-12 10:34 · Score: 1
  
  Use sqllite. If you need a real DB install it on a linux/BSD box. No need for the frontend to run on the same machine as the DB.
13. Re:Installer improved? by fuzzytv · 2011-09-12 10:34 · Score: 1
  
  Eh? Where does that script mention amount of RAM? Shared buffers is usually recommended to be 25% of RAM, but that's a recommendation, not a rule. For some workloads, keeping small shared buffers is actually a good thing.
14. Re:Installer improved? by ttong · 2011-09-12 21:53 · Score: 1
  
  apt-get install -y libreadline5-dev apt-get install -y zlib1g-dev apt-get install -y zlibc ln -s /usr/bin/make /usr/bin/gmake gunzip /usr/local/pgsql/postgresql-9.0.4.tar.gz tar xvf /usr/local/pgsql/postgresql-9.0.4.tar
  
  su postgres gmake check exit
  
  echo >> /etc/sysctl.conf
  
  # in postgresql.conf
  
  chmod aguo+wx /etc/init.d/postgres
  
  ln -s /etc/init.d/postgres /etc/rc1.d/K02postgresql ln -s /etc/init.d/postgres /etc/rc2.d/S98postgresql...
  For god sake read the manual of your OS and use the packages!
15. Re:Installer improved? by roman_mir · 2011-09-12 22:13 · Score: 1
  
  For god sake? You think god would like me to use packages instead?
  
  --
  You can't handle the truth.
Re:Universal Relational Database of Probability by tibit · 2011-09-12 08:20 · Score: 1

spectateswamp, is that you? :)

--
A successful API design takes a mixture of software design and pedagogy.
But... by rduke15 · 2011-09-12 08:23 · Score: 1

But I run Debian Stable on all my servers.
Insensitive clod!
1. Re:But... by vlm · 2011-09-12 08:26 · Score: 1
  
  But I run Debian Stable on all my servers.
  Insensitive clod!
  Keep an eye on backports.debian.org
  
  --
  "Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
2. Re:But... by greg1104 · 2011-09-12 08:36 · Score: 1
  
  That will be ready any day now. Packages for Ubuntu are already available. Debian packages for pre-release versions of PostgreSQL 9.1 have been around for some time already.
3. Re:But... by rubycodez · 2011-09-12 14:52 · Score: 1
  
  so do I, and for production server I only run packages, but we love Debian and have on our workstations/laptops too, so let's boogie!!
  
  lsb_release -a
  LSB Version: core-2.0-amd64:core-2.0-noarch:core-3.0-amd64:core-3.0-noarch:core-3.1-amd64:core-3.1-noarch:core-3.2-amd64:core-3.2-noarch
  Distributor ID: Debian
  Description: Debian GNU/Linux 6.0.2 (squeeze)
  Release: 6.0.2
  Codename: squeeze
  
  sudo apt-get install libreadline6-dev zlib1g-dev python-dev libperl-dev
  tar xvfz postgresql-9.1.0.tar.gz
  cd postgresql-0.1.0.tar.gz
  ./configure --with-gnu-ld --with-python --with-perl
  make
  sudo make install
  sudo adduser postgres
  sudo mkdir /usr/local/pgsql/data
  sudo chown postgres /usr/local/pgsql/data
  su - postgres
  /usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data
  /usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data >logfile 2>&1 &
  /usr/local/pgsql/bin/createdb test
  /usr/local/pgsql/bin/psql test
  
  psql (9.1.0)
  Type "help" for help.
  test=# select * from pg_database;
  datname datdba encoding | datcollate datctype |datistemplate dat
  allowconn datconnlimit datlastsysoid datfrozenxid dattablespace
  datac
  template1 10 6 en_US.UTF-8
  template0 10 6 en_US.UTF-8
  postgres 10 6 en_US.UTF-8
  test 10 6 en_US.UTF-8
  
  (4 rows)
  test=#
Hate to say it by bigsexyjoe · 2011-09-12 08:39 · Score: 1

I work with Postgres. I'm "for" Postgres. I think it's great. But you know what? I barely understand what most of these features are. And of the ones I understand, I have no plans of using.
Hate to be a troll, but databases are boring. And Postgres is boring. The database people I know are boring. The guy I know with the most Postgres knowledge I know, is someone I respect as a worker, but is also the most passionless person I have ever met. No one wants an exciting data storage product. They want it to be "mature" and "reliable" and "predictable." I bet that the better Postgres gets, the more boring it'll get. And these exciting "noSQL" databases will all die.

--
Democracy Now! - your daily, uncensored, corporate-free
1. Re:Hate to say it by X0563511 · 2011-09-12 08:40 · Score: 3, Insightful
  
  What's your point?
  Are you going to tell me word processors are boring, spreadsheets put you to sleep, and calculators suck the life out of the party?
  
  --
  For large sets, this will be our guide even unto death, for the LORD will work for each type of data it is applied to...
2. Re:Hate to say it by bigsexyjoe · 2011-09-12 09:14 · Score: 1
  
  My point is that he shouldn't be surprised by the lack of comments. It just isn't the type of announcement people get excited about.
  
  --
  Democracy Now! - your daily, uncensored, corporate-free
3. Re:Hate to say it by roman_mir · 2011-09-12 09:44 · Score: 1
  
  It's just not true.
  I really actually enjoy using PostgreSQL, really, much more than I should probably and much more than I have ever enjoyed using Oracle or DB2 or MSSQL (or progress 4GL aaaaaaa, kill me) in years that I had to use them.
  
  --
  You can't handle the truth.
4. Re:Hate to say it by holdme · 2011-09-12 09:45 · Score: 0
  
  Are you sure you are talking about a database and not about a blow up doll you named postgres?
5. Re:Hate to say it by Anonymous Coward · 2011-09-12 09:46 · Score: 0
  
  Clearly rubbish. You can write "BOOBIES" upside down on a calculator.
6. Re:Hate to say it by Steauengeglase · 2011-09-12 09:50 · Score: 2
  
  It's not their fault they are completely normalized.
7. Re:Hate to say it by GaryOlson · 2011-09-12 14:04 · Score: 2
  
  You failed to trigger the proper result.
  
  --
  Every mans' island needs an ocean; choose your ocean carefully.
8. Re:Hate to say it by tehcyder · 2011-09-12 22:20 · Score: 1
  
  Clearly rubbish. You can write "BOOBIES" upside down on a calculator.
  How? I just don't believe they'd allow filth like that in schools.
  
  --
  To have a right to do a thing is not at all the same as to be right in doing it
Re:Universal Relational Database of Probability by Anonymous Coward · 2011-09-12 08:48 · Score: 0

I've thought about this problem, and a NoSQL type structure would actually work better. There would simply be too many columns for a relational model to work well.
Re:Universal Relational Database of Probability by hantarto · 2011-09-12 08:56 · Score: 0

I are already ahead of you on that one, friend haha. I am try writting for SimpleDB in Erlang now. I hope to execute very soon yes. Very soon we have massively columnar database stretching toward infinity haha!
Thank you all for being so wonderful, slarshdot!
should be excited - what you want already exists by ron_ivi · 2011-09-12 09:14 · Score: 2

The other big thing that I would love to have in a database is ability to scale the database to multiple machines, so have a logical database span multiple disks on multiple machines, have multiple postgres processes running against those multiple disk
This exists for Postgres in the form of Yale's HadoopDB project: http://db.cs.yale.edu/hadoopdb/hadoopdb.html http://radar.oreilly.com/2009/07/hadoopdb-an-open-source-parallel-database.html

HadoopDB is comprised of Postgres on each node (database layer), Hadoop/MapReduce as a communication layer that coordinates the multiple nodes each running Postgres, and Hive as the translation layer. The result is a shared-nothing parallel database, that business analysts can interact with using a SQL-like language. [Technical details can be found in the following paper.]

as well as for commercial forks of Postgres such as EMC's GreenPlum.
Foreign database wrappers by Synerg1y · 2011-09-12 10:57 · Score: 0

The only feature that doesn't already exist in ms sql in the same context, probably oracle and mysql too.
I can almost... deploy psql, do what I need to in it and spit back out sql or something. This seems like a really cool feature if it goes the way of OLEDB and becomes a standard, since with oledb I can use any programming language that talks to databases (php, .NET, C) and still talk to my sql instance.
What I can't think of is... why would I ever use psql over taking the straighter ms sql approach?
I'd love to wrap mysql data in ms sql and vice versa in a reliable manner, and better yet wrap sql 2000 in 2008 to prevent the coalescence goodness, but at least the feature exists somewhere.
Re:should be excited - what you want already exist by tankgrun · 2011-09-12 19:31 · Score: 1

MGRID is also a PostgreSQL fork that scales to multiple machines: http://www.mgrid.net/
Column storage by Anonymous Coward · 2011-09-12 20:31 · Score: 0

I would love to see a column storage engine for PostgreSQL. Vertica was forked from PostgreSQL 8.1.1 and has phenomenal storage capabilities that directly translate to performance.
Is something like this anywhere on the horizon for the mainline PostgreSQL? Everyone would use it...
Re:Why does there always have to be an update?? by rubycodez · 2011-09-13 03:45 · Score: 1

There is no "useless" crap being introduced here, these are higher end features that many need and have been waitnig. Just because you as say a small business or hobbyist or startup venture need a small set of database features, remember there are others who are needing bigger features, and as your needs grow over time so you may need these things too.

You can stick with a particular postgresl version of a distribution that is still getting fixes and patches, for example the postgresql 8.1.9 in Redhat / Centos / Scientific Linux 5.x

Meanwhile, there are applications in big business that need and want the synchronous replication. There are geolocative applications needing nearest neighbor indexing. There are applications that handle multiple character sets that need multiple collations of table data.

I myself have clients that need synchronous replication with a couple stand-by servers, we've been making do with an inferior "bandage application" approach to do the same thing.