Domain: postgresql.org
Stories and comments across the archive that link to postgresql.org.
Comments · 1,107
-
Conspicuous omission - PostgreSQL
it doesn't seem like any code has made it into Open Source databases like MonetDB, MySQL, CouchDB, etc.
Lemme guess, MySQL fanatic?
You can already go download:
https://wiki.postgresql.org/wiki/PGStrom
if it fits your problem domain and PostGIS has some hackers adding GPU support:
http://data-informed.com/fast-database-emerges-from-mit-class-gpus-and-students-invention/
Why not the others? Perhaps because PostgreSQL makes developing extensions easier - it's got the largest extension ecosystem, so I'm just presuming there. If it turns out well in Pg-land, the others will naturally adopt it.
So the answer to the story title is "they do." The next question would be, "why isn't it widely deployed", and the answer would be, "it's not done yet." Yadda, yadda, yadda, patches welcome. If the whole summary is just a way to try to turn "hey this is neat" (it is) into an ill-founded complaint story, then write a better story next time. It's neat stuff, no need to whine.
-
Postgres
Looks like exactly what PostgreSQL's PGStrom project is trying to acheive.
-
Re:Something something online sorting
performance
... put up cash...The biggest opportunity for GPUs in Databases isn't for "performance". As others pointed out - for performance it's easier to just throw money at the problem.
GPU powered databases do show promise for performance/Watt.
However, energy efficiency is not enough, energy proportionality is needed. The objective of this work is to create an entire platform that allows execution of GPU operators in an energy proportional DBMS, WattBD, and also a GPU Sort operator to prove that this new platform works. A different approach to integrate the GPU into the database has been used. Existing solutions to this problem aims to optimize specific areas of the DBMS, or provides extensions to the SQL language to specify GPU operation, thus, lacking flexibility to optimize all database operations, or provide transparency of the GPU execution to the user. This framework differs from existing strategies manipulating the creation and insertion of GPU operators directly into the query plan tree, allowing a more flexible and transparent framework to integrate new GPU-enabled operators. Results show that it was possible to easily develop a GPU sort operator with this framework. We believe that this framework will allow a new approach to integrate GPUs into existing databases, and therefore achieve more energy efficient DBMS.
Also note that you can write PostgreSQL stored procedures in OpenCL - which may be useful if you're doing something CPU intensive like storing images in a database and doing OCR or facial recognition on them: http://wiki.postgresql.org/images/6/65/Pgopencl.pdf
Introducing PgOpenCL - A New PostgreSQL Procedural Language Unlocking the Power of the GPU
-
Re:postgres can.
Yes, some did. Here's a link.
Brought to us by Kohei Kaigai, a very prolific PostgreSQL hacker indeed.
More interesting still is the architectural groundwork of PostgreSQL which makes such fancy things feasible (in this case the foreign data wrapper: rather cool).
-
Re:Blow to NoSQL movement
-
Re:Yeah, but they nailed the "documentation" part
Open-source documentation is like an insomniac cat. Theoretically it exists somewhere, but no one's ever seen it.
Don't over-generalize. The open-source PostgreSQL project has the best documentation of a software project that I have ever seen, open- or closed-source.
Oh, also wanted to add, #postgresql on irc://chat.freenode.net is superb - the best IRC support I've ever seen.
I'm in awe of RhodiumToad over there particularly.
-
Re:Yeah, but they nailed the "documentation" part
Open-source documentation is like an insomniac cat. Theoretically it exists somewhere, but no one's ever seen it.
Don't over-generalize. The open-source PostgreSQL project has the best documentation of a software project that I have ever seen, open- or closed-source.
Which is great news, if you happen to be having a problem with PostgreSQL.
-
Re:Yeah, but they nailed the "documentation" part
Open-source documentation is like an insomniac cat. Theoretically it exists somewhere, but no one's ever seen it.
Don't over-generalize. The open-source PostgreSQL project has the best documentation of a software project that I have ever seen, open- or closed-source.
Other open-source projects with really good documentation: The Linux man pages (documenting the Linux API), Tcl/Tk and Perl. And as far as end-user docs go, LibreOffice is fairly decent, though not in the same league as PostgreSQL.
-
Re:Oracle-friendly site(s)
Writeable foreign data wrappers for PostgreSQL just appeared in 9.3, released a few weeks ago. I'm looking forward to seeing the hybrid storage approaches people build with that, with triggers pushing into a FDW being just one of the possibilities. All sorts of neat data queuing approaches are possible if you also combine that with the new background workers interface. You don't even need to make the FDW write synchronous. When eventual consistency is good enough for your data set, you can have the trigger push them into a queue, and then spool those in the background to other places (like a Mongo or Redis cache) via FDW.
-
Re:Oracle-friendly site(s)
PostgreSQL does have a synchronous commit mechanism mechanism too, which requires data be durably written to two nodes. It has a feature that as far as I know is unique to its implementation too: synchronous_commit is a per-transaction behavior. You have to specifically setup a synchronous standby server, but once it's there you can adjust the durability level you want at each commit. So you can pay for important data be fully synchronous, while streaming by less important things in standard mode, where you might lose a little on failover. You can even drop commit level from the default too, so that something you can afford to lose all copies of on a crash avoids any sort of disk commit before completing.
Since synchronous transactions block if there's not a second node around, in practice you have to deploy 3 nodes (to be able to tolerate losing one) and commits go to 2 of them. It doesn't have the ability to rollback locally committed transactions from a sever that's been disconnected yet though, that's a weak point compared to Mongo. A disconnected node that committed transactions nobody else saw is FUBAR, you have to rebuild the entire node to fix it.
-
Re:Oracle-friendly site(s)
No, I mean MongoDB will take a 3 database cluster and let you "Replica Acknowledge" a transaction with "Majority" count. Once it hits 50%+1 servers, it's 100% guaranteed solid unless you lose both servers. If both servers suffer a power drop at that point, the last server refuses to accept writes; when those servers come back, they will replay their oplog back to the last server to synchronize it. There's one flaw here: there's no "Replica Journal Acknowledge", so it's theoretically possible to lose that transaction anyway; both servers have to suffer a system failure (power drop, kernel panic) within 100mS of receiving the operation, since they write out their data to disk every 100mS. In practice this is extremely unlikely.
That means once you've sent it and gotten back that it's written, it is written. You'd have to lose both (or more--3 servers in a cluster of 5, etc.) servers' power or hard drives (corruption, failure) before the data is propagated further.
By contrast, Percona and MariaDB have XtraDB. XtraDB does optimistic locking: in normal autocommit, the transaction might get rolled back silently--it will write successfully to one server and return success, but if another server simultaneously gets a write that conflicts and starts propagating it then the transaction will be silently rolled back (i.e. undone, removed, lost, failed). With BEGIN-COMMIT transactions, you may get a Deadlock on "COMMIT" and then you're informed that it did in fact roll back the transaction and you must re-submit (i.e. do this if you actually care about durability of the data). With autocommit, as well as with any transactions (even explicit COMMIT) on MySQL master-slave replication or PostgreSQL WAL replication, you may in fact be informed that the transaction is 100% committed and then have that server FAIL and the slave comes up without that transaction--unavoidable silent data loss.
The failure mode expressed by MySQL master-slave replication and PostgreSQL WAL replication in the default asynchronous streaming replication mode is the same failure mode as with "Journaled" write concern in MongoDB. When running "Journaled" rather than "Replica Acknowledged," you write to exactly one server and are told it's committed when it's written to disk--it's durable on that server, but not necessarily replicated. If that server power drops and comes back up, it may find new operations have made its non-replicated operations invalid; it will then silently roll those back.
Therefor, in cluster layouts, it is possible for MongoDB to have a negligible reliability advantage over PostgreSQL's most common replication methods. PostgreSQL has settings that make up that last bit of reliability, putting it roughly on par with MongoDB. MongoDB has a guaranteed "It has reached enough servers that it is valid on the cluster unless God hates you" write concern by which the data is likely to actually be there if it tells you it's there, unless a very specific subset of servers experience a catastrophic failure in an extremely small (tenths of a second) window--a subset large enough to take down your entire cluster.
Short version: MongoDB allows you to, on a per-query basis, write data into the database at any level of reliability that MySQL and PostgreSQL provide. Single-server Journaled (WAL log shipping, WAL asynchronous streaming, MySQL master-slave replication), multi-server Replica Acknowledged (PostgreSQL WAL synchronous streaming), and a single-server "Acknowledged" mode that is faster but gives a weaker data durability guarantee (transaction is valid, not yet to disk, and not replicated).
*"PostgreSQL streaming replication is asynchronous by default. If the primary server crashes then some transactions that were committed may not have been replicated to the standby server, causing data loss. The amount of data loss is proportional to the replication delay at the time of failover."
-
Re:ON DUPLICATE KEY UPDATE
It's in the works, hopefully for version 9.4.
-
Re:and so meanwhile...
About the only technical advantage MySQL has over Postgres is an easier setup, and generally better performance out of the box (before any tuning).
I think that clustering on Postgres is so obscure as to frighten off all but the most knowledgeable (or foolhardy idiots). You are presented with a list of do it yourself options, but no concrete recommendations. Comments like "pgpool 1/2 is a reasonable solution. it's statement level replication, which has some downsides, but is good for certain things" probably only help if you know exactly what you want already!
-
Performance defaults?
I don't really remember that well anymore, but Linux and MySQL have always been tied together...probably because mysql was relatively fast out of the box. Even today Postgres' default's suck, and the wiki says so:
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
"One reason the defaults are low is because on some platforms (like older Solaris versions and SGI), having large values requires invasive action like recompiling the kernel"
I mean, who remembers when there was a Solaris kernel that you could recompile (sunos4). Who gives a shit about IRIX? I mean, do you still have to specify cylinder and sector counts? WTF? It's 2013. Update your config for a Xeon 2ghz box with 8gb of ram already. If you want to be conservative, use a core 2 duo.
I kept expecting something in the doc to say "due to the lack of floating point coprocessors on some 386 systems so that's not assumed" or "CP/M wasn't designed for multi-user time sharing, so we implement manual time slicing."
Maybe they don't update the config because they want to get paid the consulting $$ to tune it?
-
Re:and so meanwhile...
As long as MariaDB is requiring copyright assignment, there's every reason to believe it will be sold off again the same way MySQL was. The FSF gets away with that for GNU projects because they've never abused contributor trust before. Monty is no FSF, and there's no reason believe MariaDB will remain outside of commercial control any better than MySQL did. I can't believe people are falling for the same trick again.
PostgreSQL aims for SQL standards conformance as much as possible. It's hard sometimes due to the difficulty of participating in the standard process. The idea that MySQL does a better job in that area is kind of odd though. You'll have to list some sample Postgres "oddities" to be credible with that claim.
-
Past Coverity reviews
Coverity's services have been useful to a number of open-source projects. But this article is carefully picking its terms to get a headline worthy result. Compare against the Coverity scan of PostgreSQL done in 2005 for example, and CPython's defect rate isn't very exciting at all. But that was "Coverity Prevent" and this is "Coverity Scan"...whatever that means.
-
Re:TDD
"the fun thing about a kernel is that there is no good way to test it except to run it" --Greg Kroah Hartman
I work on PostgreSQL, and nothing goes out until it's been validated on the entire buildfarm. It's hard to have such a thing for the Linux kernel though, because it's so easy for a bug to break test machines. You need to catch when machines are responding, do a hardware reset, and then rollback to a known good kernel instead. It's much harder than most software testing to automate.
-
Re:Assange vs Mannings vs Snowden vs Greenwald
Is Assange a journalist?
I'd say yes, but remember, he wasn't always in favor of leaks.
-
Re:Linus management technique works
I don't think profanity and abuse is necessary. The Postgresql team has produced decent software without much profanity or abuse (see their mailing lists archives: http://www.postgresql.org/search/?m=1&q=fuck&l=1&d=-1&s=r ).
Once a project has a culture of abuse it's hard to change it esp in an OSS project. There aren't that many people with the knowledge, desire, skills and reputation to lead the project - you get the knowledge by being part of it, and then you might get the culture too
;). Add in a nonabusive temperament and the ability to convince others to be nonabusive and it's just easier to stick with Linus than to find an alternative ;).But yes this may reduce the number of contributors. Not many people _need_ to work on the Linux kernel and put up with abuse.
-
Re:WRONG! Also Transactions!
When I heard that PostgreSQL 9.2 (released last September) was touting its new feature of index only scans, I thought, wow these guys are way behind. Oracle and IBM had that in the mid-'90s and Sybase and Microsoft probably did too. To use a car analogy, that's like hearing an upstart manufacturer announce that electronic fuel injection will now be standard on all of its new cars, replacing carburetors. You think, gee what else are they missing?
-
Re:Does Postgres do online backup?
Sure, you don't need to shut down the database to do a backup. Just call pg_start_backup, do whatever copy you want (copy of the files, LVM snapshot,
...), and you're golden. And you can easily setup a read-only replica using the snapshot (preferably with a WAL archive), which is probably what you mean by online restore.Opening a case
... you may submit a bug whenever you want, either using the web form http://www.postgresql.org/support/submitbug/ or by e-mail into the mailing list http://www.postgresql.org/list/pgsql-bugs/. However the PostgreSQL project does not provide any "corporate support" as such. If you need a support regular to the Oracle support (but actually responding and working as expected), you need to contact one of the companies that provide that - it might be EnterpriseDB, 2ndQuadrant, Command Prompt, Dalibo, Cybertec ... or one of the other companies.I'm not sure about the VM thing, but as they do offer "vFabric Postgres" [http://www.vmware.com/products/application-platform/vfabric-postgres/overview.html] and that VMWare is generally a PostgreSQL-friendly company, I'd say it might be certified.
-
Re:Does Postgres do online backup?
Sure, you don't need to shut down the database to do a backup. Just call pg_start_backup, do whatever copy you want (copy of the files, LVM snapshot,
...), and you're golden. And you can easily setup a read-only replica using the snapshot (preferably with a WAL archive), which is probably what you mean by online restore.Opening a case
... you may submit a bug whenever you want, either using the web form http://www.postgresql.org/support/submitbug/ or by e-mail into the mailing list http://www.postgresql.org/list/pgsql-bugs/. However the PostgreSQL project does not provide any "corporate support" as such. If you need a support regular to the Oracle support (but actually responding and working as expected), you need to contact one of the companies that provide that - it might be EnterpriseDB, 2ndQuadrant, Command Prompt, Dalibo, Cybertec ... or one of the other companies.I'm not sure about the VM thing, but as they do offer "vFabric Postgres" [http://www.vmware.com/products/application-platform/vfabric-postgres/overview.html] and that VMWare is generally a PostgreSQL-friendly company, I'd say it might be certified.
-
Re:Security record
So you can prove that? Because when a company says "we have a process for that " it's often equivalent to a cluster-fuck in the future.
I can't really imagine anything more secure that the transparent development model used by PostgreSQL - you can see all the decisions in the mailing list, you can comment whatever you want, you can ask questions about supposed issues, you can see all the individual commits etc. And many people actually do that, so this is an example of a project where the "many eyeball theory" actually works. Can you do the same with Oracle, or do you simply trust to the corporation because they said so?
It's not that long ago when a serious security vulnerability was discovered in PostgreSQL, and in retrospect I think it was handled well (and the next feature will be handled even better). For more details see http://www.postgresql.org/support/security/
-
Re:There is no One True Way
girlintraining is absolutely right. Heed her advice: know thyself. With that said, I'm going to give you what works for me and some gotchas I've found in the field.
Some people think they need just a specific answer so they learn by googling for everything and never learn outside of a narrow box. Personally, I find they often don't make good programmers or DBAs. I think a disciplined approach to learning a language or database is a must. The PostgreSQL documentation is excellent in that regard. It starts off with a brief history of the database, then gives you basics and then moves on to more advanced topics. Yes, yes... of course you have to use your google-fu for specific answers sometimes, but I think you need a rock-solid core to base your studies off of. Not everyone works this way, but I think we'd have better programmers if they had good material to base a foundation from. When I learn a new language, I hunt down a good, comprehensive core to work from and work my way though.
Next: Practice. Being a book worm won't get you anywhere in the real world. You have to be able to utilize that knowledge. I like to give myself a task or a goal and work towards completing it. I'm not talking about "make a table" then "add a row". I'm talking about working towards a complex goal. My favorite is the address book. It can be extremely simple or incredibly complex. Take something like names. You can just slap a common name into a field in a table, right? Are you sure? How many digits are in a phone number? I live in Germany and the number of digits vary greatly... sometimes within the same city. Who lives at what address? What if a person lives at multiple addresses? (North in the summer, south in the winter.) What if a person has more than one phone number? What if a phone is shared between people at home? Or a team of people in a work place? Another option is to form links between people: who got married? Divorced? Married again? When? Kids? From which spouse? Don't get me wrong... you can drop all this information into a single table or into thirty. It can as complex or simple as you want. You decide. The most important thing is to have something to practice against as you learn.
Good luck.
-
Re:The sorts of things you get
Oracle has to do a complicated implementation in all cases because of how their rollback logs are structured, which makes certain types of things impossible to reverse without dipping into a record of earlier database states. But Postgres has fully transactional DDL in a way that's even a bit better than how Oracle approaches the problem. When you can rollback from DROP TABLE already, small jobs returning to an earlier state are possible without even dipping into the history.
We would have to dump a lot of disk space into the obvious ways to implement deeper rollback targets in Postgres, and Oracle has surely put a lot of engineering work into making that efficient. The design for Postgres I have in mind would be a disk pig. I know it would work though, because I'm already hacking together business SLA level rollback targets for people--things like "we must be able to recover from any fat finger error made in the last 2 hours in less than 10 minutes". All of the hard pieces needed are already inside of the replication and rollback code, they just need to be reassembled for this goal. There's even a generic background worker infrastructure in Postgres now, which makes it easy to create more schedule driven daemons like the existing background writer or autovacuum worker. Two years ago, even an inefficient Flashback clone would have been impossible for PostgreSQL to reach. Now enough of the internal components needed are there that it's just waiting for someone to pick an initial business goal target and build a UI to reach that one.
-
Re:The sorts of things you get
> Flashback queries and flashback archives (they are really cool)
Is that the same as time travel?Nope, in Oracle you can run this query on any table to view the data it held yesterday:
select * from emp as of timestamp(sysdate-1);
Better still:
/* Which employee records did Joe accidentally drop again? */
select * from emp as of timestamp(sysdate-1) where empid not in (select empid from emp /* as of right now */); ... to see what you've dropped, and say if you know it was only that table affected then you don't even have to shutdown the database to get it back:
insert into emp (---insert above query here---);At database level it's common before a potentially risky data change to create a flashback point, and if it messes up, shutdown, revert to the time you created the flashback, and pretend the changes never happened. It happens as fast as all the necessary extents can be written back to the data files and the database can be restarted.
But but but
... you are paying for those features. ~$10k per pair of intel cores per year before haggling... if all you want is a rock solid database, Postgres should be at least as good, and at a much lower price. -
Re:The sorts of things you get
Btw, PostgreSQL 9.3 (in beta 2 currently) has materialized views:
http://www.postgresql.org/docs/devel/static/rules-materializedviews.html
-
Re:The sorts of things you get> Flashback queries and flashback archives (they are really cool) Is that the same as time travel?
Nope, in Oracle you can run this query on any table to view the data it held yesterday:
select * from emp as of timestamp(sysdate-1);
>Oracle uses a statistical optimizer for execution plans in the engine. >They are working through the 2nd generation of it to handle situations where they are lots of high frequency values Here you go. Perhaps Oracle's statistics are more sophisticated: can you enlighten us?
At a first glance postgres seems to have all the important stuff in there, but I'm not sure how postgres handles the execution plans for prepared statements when you have skewed data and histograms on a column. This was a pain in oracle 10g and lower....
> Temporary table undos Don't know about this one.
Temporary tables in oracle are tables with data only available for the duration of your session. Everyone use the same table, but they only see their own data. Because there's no ACID compliance here there's very little redo/undo generation (WAL and undo)
-
Re:The sorts of things you get> Flashback queries and flashback archives (they are really cool) Is that the same as time travel?
Nope, in Oracle you can run this query on any table to view the data it held yesterday:
select * from emp as of timestamp(sysdate-1);
>Oracle uses a statistical optimizer for execution plans in the engine. >They are working through the 2nd generation of it to handle situations where they are lots of high frequency values Here you go. Perhaps Oracle's statistics are more sophisticated: can you enlighten us?
At a first glance postgres seems to have all the important stuff in there, but I'm not sure how postgres handles the execution plans for prepared statements when you have skewed data and histograms on a column. This was a pain in oracle 10g and lower....
> Temporary table undos Don't know about this one.
Temporary tables in oracle are tables with data only available for the duration of your session. Everyone use the same table, but they only see their own data. Because there's no ACID compliance here there's very little redo/undo generation (WAL and undo)
-
CONNECT BY?
This has been superseded by SQL standard CTE (aka Common Table Expressions) Oracle has that. PostgreSQL has that too.
-
Re:The sorts of things you get
> Materialized views (and all the related magic)
Here you go
> Flashback queries and flashback archives (they are really cool)
Is that the same as time travel?> Index only scans (can be a major performance boost)
Always glad to serve> No transaction control in stored functions
Sorry -- I don't understand this one: Is that a feature or a mis-feature?> Oracle handles queries that return 50k plus records far far better.
Would be interesting to quantify that. But yes, perhaps.>Oracle uses a statistical optimizer for execution plans in the engine. >They are working through the 2nd generation of it to handle situations where they are lots of high frequency values
Here you go. Perhaps Oracle's statistics are more sophisticated: can you enlighten us?> Temporary table undos
Don't know about this one.> Oracle is really an excellent product
So is PostgreSQL. -
Re:The sorts of things you get
> Materialized views (and all the related magic)
Here you go
> Flashback queries and flashback archives (they are really cool)
Is that the same as time travel?> Index only scans (can be a major performance boost)
Always glad to serve> No transaction control in stored functions
Sorry -- I don't understand this one: Is that a feature or a mis-feature?> Oracle handles queries that return 50k plus records far far better.
Would be interesting to quantify that. But yes, perhaps.>Oracle uses a statistical optimizer for execution plans in the engine. >They are working through the 2nd generation of it to handle situations where they are lots of high frequency values
Here you go. Perhaps Oracle's statistics are more sophisticated: can you enlighten us?> Temporary table undos
Don't know about this one.> Oracle is really an excellent product
So is PostgreSQL. -
Re:The sorts of things you get
> Materialized views (and all the related magic)
Here you go
> Flashback queries and flashback archives (they are really cool)
Is that the same as time travel?> Index only scans (can be a major performance boost)
Always glad to serve> No transaction control in stored functions
Sorry -- I don't understand this one: Is that a feature or a mis-feature?> Oracle handles queries that return 50k plus records far far better.
Would be interesting to quantify that. But yes, perhaps.>Oracle uses a statistical optimizer for execution plans in the engine. >They are working through the 2nd generation of it to handle situations where they are lots of high frequency values
Here you go. Perhaps Oracle's statistics are more sophisticated: can you enlighten us?> Temporary table undos
Don't know about this one.> Oracle is really an excellent product
So is PostgreSQL. -
Re:The sorts of things you get
> Materialized views (and all the related magic)
Here you go
> Flashback queries and flashback archives (they are really cool)
Is that the same as time travel?> Index only scans (can be a major performance boost)
Always glad to serve> No transaction control in stored functions
Sorry -- I don't understand this one: Is that a feature or a mis-feature?> Oracle handles queries that return 50k plus records far far better.
Would be interesting to quantify that. But yes, perhaps.>Oracle uses a statistical optimizer for execution plans in the engine. >They are working through the 2nd generation of it to handle situations where they are lots of high frequency values
Here you go. Perhaps Oracle's statistics are more sophisticated: can you enlighten us?> Temporary table undos
Don't know about this one.> Oracle is really an excellent product
So is PostgreSQL. -
Re:locking?
PostgreSQL has had full multiversion concurrency (MVCC) for over a decade or so (at least since version 7). It simply doesn't have this sort of locking issue.
-
Re:New features?
Recovery is also awesome. "ALTER DATABASE RECOVER UNTIL [timestamp]", "ALTER DATABASE RECOVERY UNTIL CANCEL", "ALTER DATABASE UNTIL CHANGE [transaction number]" and so on.
PostgreSQL has had this sort of feature for a few years now, see recovery target. The main thing Oracle is better at is giving a simpler UI to this work. This sort of ALTER DATABASE interface is easier to use than the config file approach Postgres uses. Similarly, Oracle Flashback is a slicker interface to look at old data than you get out of the box with Postgres, but the same under the hood look is possible in both databases.
The differences in the backup/recovery area are pretty small now though, especially now that UI clones such as pgbarman are available. The last time I wanted a better UI to part of this that wasn't in any of the available free tools yet, I paid a few thousand dollars to the pgbarman developers and they added it for me. Companies who are willing to spend some money on licensed software really should consider what spending the same amount on improving PostgreSQL would get them.
-
Re:The End
Many people are migrating from Oracle to Postgres...
-
Re:Getting an education today is hard
If you use Linux, and other open source software, you can do a lot of learning and paid work in the software industry without having to pay expensive licences - while still being strictly legal!
word processor & other office software:
http://www.libreoffice.org/database:
http://www.postgresql.org/compilers:
http://gcc.gnu.org/operating system & sufficient software to do useful things (2 of over 100 offerings, pick one that suites you best!):
https://fedoraproject.org/
http://www.debian.org/network diagnostic:
http://www.wireshark.org/ ... and many others ... -
Re:or sqlite
Nice try... Troll all you want.
A number of large and important organizations are using PostgreSQL quite extensively for critical transaction processing:
http://www.postgresql.org/about/users/
And EnterpriseDB:
http://www.enterprisedb.com/success-stories/customers
It's not always the best fit, but it's very mature, and can handle most workloads you could want to use it for.
-
Re:Postgres
Because a sysadmin should be choosing how an application is built, and not architects and software engineers?.... and because postgres isn't available from repositories?....
-
Re:or sqlite
Yes, you need to look at the XML functions. There are options ranging from outputting a single field as XML up to an entire table/query as XML.
-
Re:or sqlite
The default configs for postgres are set for a fairly small memory usage profile (*), which is fine if that's what you need (e.g. tiny vm or something that makes it a huge production to raise things like max shm size), but if you have sufficient ram, you can crank a hell of a lot more performance out of the engine by making the configs less conservative. This page is a good start: http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
Not that it's a priori *wrong* to run with the defaults, it'll still work just fine, but once you start having significant traffic or complicated queries you'll be happier if it more fully uses the system resources available.
(*) It's been a good while since I last had to take a pg instance from stock and tune it, but I very vaguely recall the default settings were on the order of a eight megabytes of ram usage.
-
Re:Postgres has a poor toolset
The performance analysis tools for PostgreSQL are still rough, but they're coming out stronger now than ever before. The old slow query profiling approach is based on database log files, and the pgbadger tool has gotten a lot of improvements in the last year to take the lead in that area. Some web app providers have added PostgreSQL data collection and visualization to the products recently, Datadog is a good example, they even run Postgres internally.
Last year's PostgreSQL 9.2 added a built-in query profiling feature via an improved pg_stat_statements module. That makes it relatively easy to see what queries are taking up time on the server, in a way that matches similar statements based on underlying their query plan. I wrote a sort of call to arms to suggest how the next generation of analysis tools can leverage that in Beyond Query Logging.
You are correct that no one has really grabbed ahold of this area and put together a really easy to use tool set around it. All of the hard to construct pieces needed are in place now, and my list of goals for this year's 9.3 development includes pushing the tuning methodology outlined in my High Performance PostgreSQL 9.0 book into some reference tool implementations. The idea that this is a "black art" is coming from consultants who want you to be intimidated. People who want to understand how things work can read my book, and then wander out to confidently build terabyte size databases. I talk with new people who have done just that every week now.
-
Re:Postgres
True, but that's a bigger change. MariaDB is a drop-in replacement for MySQL, because it is just a forked/renamed MySQL. To switch to Postgres typically requires some porting.
-
Re:That depends on what kind of user base you want
I observed that, I just wanted to stop and question your assumptions around why to pull in another database at all. Thank you for answering that clearly.
If your backend is something as developer oriented as PostgreSQL, are all sorts of interesting but not very well known things you can do there. You can push the difficult cache invalidation issues that always pop up in the top layer directly into triggers. If a piece of work turns out to have larger implications than you want to deal with right there, you can use the LISTEN/NOTIFY interface to kick off asynchronous work. And that work doesn't even necessarily have to happen on the server. Add in the Foreign Data Wrapper feature, and you can farm things out very easily to arbitrary worker nodes, including ones built with other database technologies. Make the worker another PostgreSQL database (presumably optimized for different purposes) with postgres_fdw, and you can even get the information it knows to percolate upward into top level query planning--while low level execution happens elsewhere.
PostgreSQL has a lot of these interesting features that let you build more complicated architectures. The minute you add another database layer in front of that, you've done something akin to dumbing down the API available. If I build a 3-tier system, it's usually a memcache style layer at top, PostgreSQL as the full featured middle, and something else below there--but under Postgres's control. This is nice from the perspective that until you really need it (when a light system will do) you don't necessarily even need to add the second database into the mix. When it does show up, you can hide it either with FDW or within Postgres's ability to have functions in most popular programming languages, and maybe not even have to tough the top layer to do so.
Take this philosophy far enough and you can end up with something like Skype's PostgreSQL installation, where exactly how/where the data underneath is stored is done via pl/proxy. Whenever I see people start on the "but PostgreSQL alone will never be fast enough for my web scale app" I ask them if they're more popular than Skype. If not, you can probably ignore using MySQL or NoSQL altogether if you're clever about it.
-
Re:That depends on what kind of user base you want
I observed that, I just wanted to stop and question your assumptions around why to pull in another database at all. Thank you for answering that clearly.
If your backend is something as developer oriented as PostgreSQL, are all sorts of interesting but not very well known things you can do there. You can push the difficult cache invalidation issues that always pop up in the top layer directly into triggers. If a piece of work turns out to have larger implications than you want to deal with right there, you can use the LISTEN/NOTIFY interface to kick off asynchronous work. And that work doesn't even necessarily have to happen on the server. Add in the Foreign Data Wrapper feature, and you can farm things out very easily to arbitrary worker nodes, including ones built with other database technologies. Make the worker another PostgreSQL database (presumably optimized for different purposes) with postgres_fdw, and you can even get the information it knows to percolate upward into top level query planning--while low level execution happens elsewhere.
PostgreSQL has a lot of these interesting features that let you build more complicated architectures. The minute you add another database layer in front of that, you've done something akin to dumbing down the API available. If I build a 3-tier system, it's usually a memcache style layer at top, PostgreSQL as the full featured middle, and something else below there--but under Postgres's control. This is nice from the perspective that until you really need it (when a light system will do) you don't necessarily even need to add the second database into the mix. When it does show up, you can hide it either with FDW or within Postgres's ability to have functions in most popular programming languages, and maybe not even have to tough the top layer to do so.
Take this philosophy far enough and you can end up with something like Skype's PostgreSQL installation, where exactly how/where the data underneath is stored is done via pl/proxy. Whenever I see people start on the "but PostgreSQL alone will never be fast enough for my web scale app" I ask them if they're more popular than Skype. If not, you can probably ignore using MySQL or NoSQL altogether if you're clever about it.
-
Re:What a relief
Why are you so afraid of competition that you (like others before you have for years) suggest that everyone drop mysql for postgre ?
The short form is Postgres, not Postgre. Postgres was started after the Ingres DB project, and was post-Ingres or Postgres. That eventually lead to being renamed Postgres95, which eventually was renamed to PostgreSQL. The short form nickname though is still Postgres.
Citation: http://www.postgresql.org/about/history/
-
Re:Explain, please?
SkySQL is a commercial entity that uses MySQL and now MariahDB - http://www.skysql.com/
They are replacing MySQL with MariahDB for their hosted solutions and throwing financial backing at the project. MariahDB is not going away. I would encourage you to look into PostgreSQL however as an alternative: http://www.postgresql.org/
-
Re:sounds like...
Just out of curiosity, did you use PGStrom or roll your own pgsql/GPU solution? If the latter, did you also hook into pgsql via the FDW interface or some other way?
-
Re:Start with scalable technologies!
This is the best response I've read in the entire thread. I just wanted to add, you are probably okay with SQL if you are familiar with that and you're expecting "thousands of users simultaneously." Postgresql 9.2 can hit around 14,000 writes per second. I'm sure MySQL is similarly capable. If you need more than that, then you have to have go with something like Cassandra.
Netflix has demonstrated Cassandra can hit 1.1 million writes per second on Amazon's commodity hardware. You just have to be willing to sacrifice consistency to get it.
Finally, curb your enthusiasm. When you first have an idea that seems big, it's easy to get carried away. Ask yourself, is this something I really want to work on for the next five years of my life? When you start to dig into the actual implementation, you're gonna get bogged down in details you didn't consider when you were so enthusiastic.
At that point, you're going to either think about the problem day and night until you find a solution, or you're going to say "This is stupid anyway" and want to move on. It's harder to move on once you've told friends/family/investors/etc. After doing it a few times, you look like someone who never follows through. When you DO have a really great idea that's solvable, you're the boy who cried wolf.