PostgreSQL on Big Sites?
An anonymous reader asks: "I've been using PostgreSQL for years on small projects, and I have an opportunity to migrate my company's websites from Oracle to an open-source alternative. It would be good to be able to show the PHBs that PostgreSQL is a viable candidate, but I'm unable to find a list of high-traffic sites that use it. Does anyone know of any popular sites that run PostgreSQL?"
See, for instance, PostgreSQL Case Studies. From the pgsql-advocacy mailing list come some more examples: "Finally, a list of *big* companies using PostgreSQL for *serious* projects" and "Why use PostgreSQL? Here's why."
How am I supposed to fit a pithy, relevant quote into 120 characters?
MadPenguin has an interview with Josh Berkus, one of the core team members of PostgreSQL.
I've never used PostgreSQL so I can't and won't say anything about it other than this: Make sure Postgres does everything you need and can perform similarly to Oracle in your environment.
We momentarily thought about dropping Oracle for PGSQL at my last company, but after we hired a consultant to do everything he could with Postgres to improve performance, Oracle was still a clear winner for us.
I don't know if he was incompetent or what, but the performance numbers weren't even close to what we needed it to do.
If your database will run just as well on PostgreSQL, I say go for it. If you go with PostgreSQL and it doesn't perform as well as Oracle in your environment, your management will have serious doubts about open-source software from then on, and that's a stain that is hard to get rid of.
In short: choose based on your needs, not based on the fact that one is open and the other isn't.
Is your company's website essentially read-only page loading? If so, why not just go with MySQL? Do you really need MVCC in a read-only scenario?
On the other hand, if your company is doing transaction processing, like a customer-facing product ordering system (think Amazon), it's a lot more than just having to sustain certain volumes. The reputation of your company and its ability to make money by selling products will rely entirely on your database. In a best case scenario there may be no difference between Oracle and Postgres. But imagine the worst case scenario. Peak volume, company is making $1M/hour in sales on the web, db dies and won't come up....who you gonna call?
There's more to the equation than up front cost and ability to handle volumes....
mp3's are only for those with bad memories
PIR, the authoritative .org registry, and Afilias, the authoritative .info registry, both use PostgreSQL for their registration databases.
I am no longer wasting my time with slashdot
Their website shows that BASF uses PostgreSQL as their DB.
www.basf.com
They're an enormous company. I've always heard too that PostgreSQL is much better for larger sites. Cannot say for sure though as I have never used it.
Alcohol & calculus don't mix. Never drink & derive.
There's no reason, however, to write all your SPs in PL/SQL. Oracle supports stored procedures in Java, as does Postgres.
This not only makes it easier in some instances to migrate some applications to PGSQL, it also improves performance (JIT compiling). You don't say exactly where the performance bottlenecks are, but this could improve performance and close the gap between PGSQL and Oracle.
That said, if you've been working for years on tuning your Oracle physical design to a fare-thee-well, it's going to be nearly impossible to beat, supposing the transaction volume and query performance are the chief issues.
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
Apple's remote desktop 2 package uses PostgreSQL for its data store.
God save our Queen, and Heaven bless The Maple Leaf Forever!
OpenACS has been Postgres-based for a long time, as a free alternative to Oracle. You can get plenty of Postgres information at www.openacs.org. The folks there have been using it for years for all kinds of sites, so it's pretty well tested. OpenACS is a unique system using AOLserver and Tcl, but the database performance should translate to whatever server/scripting platform you're using.
...on some hefty hardware these days. This post talks about running it on a 16 CPU machine...
The Army reading list
We are an e-Learning company which started 4 years back with very little startup budget. We have been using Postgres for 4 years now and it has never let us down. We never imagined our company would grow so big so fast. Today we provide an ASP solution for over 10,000 users from around 20 companies. Postgres scales very well and is quite responsive. In the past we have had periods of 100% CPU utilization but Postgres did not crash on us. You have to know how to configure it correctly and it will perform as well as a commercial DB.
My company uses PostgreSQL and is pretty happy with the performance. The only problem we had was in November when the Google spider went crazy and hit us a few million times a day for a few weeks. After a few hours of optimization, the sites were running smoothly. A few years ago we had to come up with a db platform and we were a small company. We could have used Oracle but it's expensive all around. Oracle software, support, licensing, and engineers are expensive. MySQL's transaction support was too bleeding edge at the time. What I like most about PostgreSQL is that the transition from Oracle to PostgreSQL is smooth, and most of our engineers come from an Oracle environment. Plus PostgreSQL has adequate transaction support, subselects and functions...and it's free.
(In defense of Google, their spider did not intentionally go crazy - we have distributed webservers on separate IPs so the spider can't tell if it's pounding one particular site. However Google only spidered more pages as a publicity stunt before MSN search was released so maybe they are to blame...)
This question really requires more data. How much traffic are we talking about? How much data are we talking about? And then there are all sorts of variables, like the type of content being stored in the database, the number and types of queries that are done on each page, and the type of caching your application is doing.
Also, if Oracle is already purchased and paid for, you will have a difficult time making a business case for PostgreSQL.
Don't get me wrong, I like PostgreSQL. But you will want to have a reason for switching, aside from PostgreSQL being open source.
Just because your company can spend hundreds of thousands of dollars (or millions for a large installation) on something that's really orthogonal to the actual business that your company is in, doesn't mean you should.
If I was a PHB type for an online retailer and I looked at the costs and noticed that 50% of our profits are going to Oracle rather than to our pockets, I'd have some questions for the IT guys like:
(1) Are we a retailer or a data warehousing company?
(2) What is Oracle and why is it so expensive?
(3) Can you get the same job done with less money? If so, what costs, benefits, and risks might we see?
(4) My friend's IT guys use this thing called Post-whatever-SQL, and it costs $0. Is Oracle kinda like that?
Social scientists are inspired by theories; scientists are humbled by facts.
I'd be interested to read a case study if you're willing to write one.
Just to make sure: you didn't leave PostgreSQL's shared_buffers setting at its default, did you?
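For anyone who wants to check this on their own box, here is a rough sketch in Python (not anything the parent poster ran). It reads the text of a postgresql.conf and flags shared_buffers if it is unset or small relative to RAM. The function names, the 8kB block size, and the 25%-of-RAM threshold are assumptions for illustration; 25% is only the commonly cited starting point for a dedicated server, not a rule.

```python
import re

BLOCK_SIZE = 8 * 1024  # assumption: typical 8kB block size for unit-less values
UNITS = {"kB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

def shared_buffers_bytes(conf_text):
    """Return shared_buffers in bytes, or None if the file never sets it."""
    value = None
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        m = re.match(r"shared_buffers\s*=?\s*'?(\d+)\s*(kB|MB|GB|TB)?'?$", line)
        if m:
            number, unit = int(m.group(1)), m.group(2)
            # the last setting in the file wins
            value = number * (UNITS[unit] if unit else BLOCK_SIZE)
    return value

def looks_untuned(conf_text, ram_bytes, fraction=0.25):
    """Hypothetical check: True if unset or below `fraction` of RAM."""
    size = shared_buffers_bytes(conf_text)
    return size is None or size < fraction * ram_bytes

if __name__ == "__main__":
    old_style = "shared_buffers = 1000   # number of 8kB buffers\n"
    print(looks_untuned(old_style, ram_bytes=4 * 1024**3))
```

If this reports True on a machine dedicated to the database, the benchmark was probably run with most of the RAM unused by PostgreSQL's own cache, which would be worth ruling out before comparing numbers with Oracle.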
Social scientists are inspired by theories; scientists are humbled by facts.
I forgot to mention this in my previous follow-up: your description of why you rejected PostgreSQL lacked many salient details including: any detail on what your organization used a database for, how Oracle was able to cater to that need better than PostgreSQL, why performance matters so much as to outweigh placing your client's data into a proprietary program, how performance was being measured, and what the figures for performance were.
It's not possible to draw insight from such a description, hence I question why your post was moderated as insightful.
The portion of your post I agree with is the first part about making sure a replacement program fits your needs before switching to it. It seems like a lot of organizations don't do this, so they end up sticking with programs that leak information to untrusted parties and cost a lot of money to maintain, or require time-consuming and expensive upgrades; some of the shortcomings of denying oneself software freedom.
Digital Citizen
I tend to think of the PostgreSQL replication problem the same way people approach any problem: None of the solutions are endorsed as the "official" answer to the problems (because there is no absolute authority on these issues.) All have their shortcomings. All have their benefits. It's up to you to decide which combination of problems and benefits you want.
PostgreSQL, like Linux, is more like an ecosystem of software, where you can go and pick and choose or even write your own stuff. It's not as diverse or as popular as Linux. As far as database systems go, however, it is the most diverse project out there.
Oracle, on the other hand, is like Stalinist Russia. You can't pick and choose. There's only one model of car. And you have to buy the same bread at the same price at the same time as everyone else. And consumer input? Practically zero.
PostgreSQL's biggest advantage is that it is extensible. The simple testimony to that fact is that there are numerous excellent extensions out there, written for a variety of purposes. Oracle is not, and will never be as long as it is closed source. (Doesn't this sound like Linux vs. Windows to you? It should. PostgreSQL people pride themselves on their openness.)
Finally, one caution I like to give to people who are looking at PostgreSQL. Is it a panacea? Of course not. There are problems with it. The problems are different from the problems of Oracle. But they exist. The biggest difference is the culture and the philosophy. So when you choose your database, choose the culture and philosophy you desire or agree with the most, and the software and solutions will come naturally and you'll be much happier in the end. This may mean that you prefer the world of Oracle over PostgreSQL. That's not a decision I can make for you.
The radical sect of Islam would either see you dead or "reverted" to Islam.
Really, this goes against what I've seen. But when I say large projects I generally mean many many users with a fair percentage of them writing to the database, while for other folks "large project" means LOTS of nearly static data for things like a repository.
For mostly-read repositories, MySQL is pretty good. When you start mixing in more and more writes, it tends to not do so well with MyISAM tables, and InnoDB doesn't quite keep up with PostgreSQL. But they're pretty good.
--- It is not the things we do which we regret the most, but the things which we don't do.
The challenge to port an Oracle application to PostgreSQL is much less than to port an Oracle application to MySQL. Particularly in the training department, since MySQL is the most unlike Oracle.
Also, the application matters a lot. MySQL is very effective as a cache to hold a relation. It would not surprise me if many of those companies use Oracle/DB2/MSSQL/PostgreSQL as a backend database, and then use MySQL to cache some of the data for fast access. If you list the companies using PostgreSQL extensively, they are likely to be using PostgreSQL as a replacement for the likes of Oracle. If you list the companies using MySQL, that's probably not the case, it's more likely that they're using it as a complement to Oracle or for a purpose that you normally shouldn't use Oracle for.
I would say out of the relational databases, MySQL is the most different in terms of application domain, functionality, performance, and behavior.
Social scientists are inspired by theories; scientists are humbled by facts.
Do you really expect me to:
(a) notice; and
(b) believe that that is his real email address; and
(c) assume that it summarily discredits every word he writes?
Personally, I didn't make it past (a), so the compiler's optimizer in my brain never bothered to check (b) or (c).
Social scientists are inspired by theories; scientists are humbled by facts.
CD Baby, baby. Check out that dude's weblog on O'Reilly.
MobyGames runs on PostgreSQL and has done so for over 5 years.
I work at WhitePages.com, one of the top 100 US websites, and we use PostgreSQL in our mix of databases. We have the entire US and CA white and yellow page data loaded into PostgreSQL and we see awesome performance from our configuration. We've got over 250,000,000 rows of data and a *lot* of indexes making our database about 375G. We run over 1,500,000 queries per server per day which is about 100 per second at peak. Under load tests, we've seen almost triple that volume from the same servers. However, all of our use of PostgreSQL is entirely read-only in production. So, while you can point to us as a "high-traffic site" using PostgreSQL, you should be aware that our usage is likely very different than your needs.
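To put those figures side by side, a quick back-of-the-envelope calculation in Python. It uses only the numbers quoted above; reading "375G" as decimal gigabytes is an assumption, and the variable names are just for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60

queries_per_day = 1_500_000   # per server, from the post
peak_qps = 100                # per server at peak, from the post
rows = 250_000_000            # from the post
db_bytes = 375 * 10**9        # assumption: "375G" means decimal gigabytes

average_qps = queries_per_day / SECONDS_PER_DAY
peak_to_average = peak_qps / average_qps
bytes_per_row = db_bytes / rows

print(f"average load: {average_qps:.1f} queries/s per server")
print(f"peak is about {peak_to_average:.1f}x the average")
print(f"footprint: about {bytes_per_row:.0f} bytes per row, indexes included")
```

So the daily total works out to roughly 17 queries per second on average, with the quoted peak about six times that, and the load-test result ("almost triple") somewhere near 300 per second per server.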
Actually, I like that feature. I just wish it was possible to restrict the allowable table types, for things like big companies where you don't want Joe Developer to use the wrong kind of table. I wish you could set a different default than MyISAM, so that if you forgot the type you'd get, for instance, InnoDB or BDB. Finally, I wish it would throw some errors every now and then. I hate that I can create an InnoDB table, FK a MyISAM table to it, begin a transaction, roll it back and THEN find out that some rows can't be rolled back.
Honestly, there's plenty about MySQL that's nice, but the silent data mangling and inability to easily customize certain behaviours make it a poor choice for work where data integrity is paramount.
--- It is not the things we do which we regret the most, but the things which we don't do.
Yes! The company I work for (clock) uses PostgreSQL for some large-ish popular sites:
Eddie Izzard (link)
JD Wetherspoon (link)
Bill Bailey (link)
and others. We find it by far the best solution for us.