Insightful? I have no clue what features he is talking about (and I guess I'm not alone), but there are solutions far better than MySQL. If you check what was commercialized, you can see it's damn basic functionality (e.g. the ability to use PAM authentication, or the features of MySQL Enterprise Backup). But the infinitely more important question for all MySQL users should be "What will be commercialized in the future?"
And that's not FUD, that's a question everyone should ask before using any product (not just MySQL).
What is this 'open core' you're talking about? And how do the steps of Oracle, an uber-commercial corporation, prove that 'open core' does not work?
4. I don't want oracle men in suits asking me money
There are far worse things men in suits can ask you ...
Really? I use T-SQL, but I work on an in-house app I didn't write, and of course the previous developers sometimes used "" and sometimes null, like the fine professional coders they were (unused methods are sometimes found). Could this be the developer's reasoning? Then again, I don't need to switch RDBMS to take advantage of this fine feature. dbo.ISNULLOREMPTY() anyone?
Yes, it's one of those Bender-like nightmares - you know, those with 0s, 1s and then a 2.
And in terms of a recode to switch RDBMS, it's 99.9% out of the question for everybody: the software costs $, recoding costs way more $, bug fixes cost more $, support costs more $. Where's the gain?
That's why I wrote 'a bit easier'. The truth is there are a lot of apps that are rather easy to port - e.g. a lot of Java apps built on top of some ORM are not that difficult to port, because they're not using the advanced features at all. And switching the db may actually be cheaper if you consider the licensing fees etc. And for example EnterpriseDB has something that emulates Oracle to some degree and makes the port much easier (so they say).
Eh? Where does that script mention the amount of RAM? The usual recommendation is to set shared_buffers to 25% of RAM, but that's a recommendation, not a rule. For some workloads, keeping shared buffers small is actually a good thing.
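For what it's worth, the 25% rule of thumb is trivial to turn into a starting value. A minimal Python sketch follows; the function name and the 8 GB example box are made up for illustration, and the result is only a first guess to tune from, not a rule:

```python
def suggested_shared_buffers_mb(total_ram_mb: int, fraction: float = 0.25) -> int:
    """Return a starting value for shared_buffers in MB.

    The 25% default is the common recommendation, nothing more;
    some workloads do better with a much smaller value.
    """
    if total_ram_mb <= 0:
        raise ValueError("total_ram_mb must be positive")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return int(total_ram_mb * fraction)


# Hypothetical machine with 8 GB of RAM.
print(f"shared_buffers = {suggested_shared_buffers_mb(8192)}MB")
```

Passing a smaller fraction (say 0.1) covers the "small shared buffers" case without changing anything else.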
Things are improving, although a bit slowly. Plus there are Windows installers available at enterprisedb.com.
Because they already have a Windows server and they don't want to buy another machine?
Reliability has probably improved since ENIAC, but the question still is "when is it going to fail" and not "if it is going to fail". Because it is going to fail - it may be a drive, CPU, PSU, a network switch, an AC unit, the whole AWS data center ... something is going to fail.
The beauty of the CAP theorem, as I see it, is that it says "You can't get all three at the same time, face it." If you don't need strong consistency (and with most apps you don't), then ditch it and it'll be much easier and cheaper to build and scale the system. I'd say once you realize this inner beauty, it clears your mind - something like a Zen of distributed computing.
Cassandra is just one of many NoSQL databases, but yes - NoSQL can be a way to work around the CAP theorem in some cases.
But in many cases it's not a solution. If the data are relational, if you need full ACID, etc. then ditching "consistency" is not a choice. There are projects to build PostgreSQL clustering solutions that may resemble RAC a bit, although none of them uses a shared disk (so each instance needs a separate disk). Let's mention PGCluster, PGCluster II or Postgres-XC (aiming to build a write-scalable cluster, something Cassandra does in the NoSQL world). Sure, all this has to follow the CAP theorem.
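To make the trade-off a bit more concrete: one common way replicated stores (Cassandra among them) let you pick a point on the consistency scale is quorum sizes. With N replicas, W acknowledgements per write and R replicas consulted per read, a read is guaranteed to touch at least one replica holding the latest acknowledged write only if R + W > N. A minimal Python sketch of that arithmetic; the function names are hypothetical and this is not any database's API:

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True if every read quorum shares at least one replica with every write quorum."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


def max_unavailable(n: int, quorum: int) -> int:
    """How many replicas may be down while a quorum of this size can still be formed."""
    if not 1 <= quorum <= n:
        raise ValueError("quorum must be between 1 and n")
    return n - quorum


# Three replicas: quorum reads and writes versus "one replica is enough".
for r, w in [(2, 2), (1, 1)]:
    print(f"N=3 R={r} W={w}: overlap={quorums_overlap(3, r, w)}, "
          f"reads survive {max_unavailable(3, r)} node(s) down, "
          f"writes survive {max_unavailable(3, w)}")
```

The second setting tolerates more failures precisely because it gives up the overlap guarantee, which is the "you can't get all three" part in miniature.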
Yup, and things will get a bit more interesting thanks to cascading replication.
That is not true. If you promote the slave that's ahead of all the other slaves, then the other slaves can just reconnect to the new master. Tools like repmgr can handle this for you.
And no one actually says you have to do a completely fresh base backup. Ever heard about rsync?
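Picking "the slave that's ahead of all the other slaves" boils down to comparing WAL positions. PostgreSQL prints them as two hex numbers separated by a slash (e.g. 0/3000148), so they have to be parsed before comparing; plain string comparison gets it wrong. A minimal sketch in Python, assuming you have already collected each standby's position yourself (for example via pg_last_xlog_receive_location(), called pg_last_wal_receive_lsn() in newer releases). The host names and positions are made up, and this is not how repmgr itself is implemented:

```python
from typing import Dict


def parse_lsn(lsn: str) -> int:
    """Convert a WAL position such as '0/3000148' into a comparable integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def pick_promotion_candidate(positions: Dict[str, str]) -> str:
    """Return the name of the standby with the most advanced WAL position."""
    if not positions:
        raise ValueError("no standbys given")
    return max(positions, key=lambda name: parse_lsn(positions[name]))


# Hypothetical standbys and the positions they reported.
standbys = {
    "standby1": "0/3000060",
    "standby2": "0/3000148",
    "standby3": "0/2FFFFA0",
}
print(pick_promotion_candidate(standbys))
```

Whether you compare received or replayed positions is a design choice; the sketch works the same either way as long as all standbys report the same kind of position.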
I don't think it would be mentioned - actually it's a separate package built on top of PostgreSQL (thanks to the ability to write custom data types etc).
Most people don't realize that commercial software is usually licensed, not sold. That's why they don't see the consequences (and it's not just about costs).
Streaming replication is roughly equivalent to Oracle Data Guard (physical standby). hot_standby gives you about the same as Active Data Guard, i.e. the ability to run read-only queries on the standby, and for free (you have to pay for that with Oracle). With Oracle you get a management console to handle all this; with PostgreSQL you have to set it up manually (a 5-minute task), but there are several tools that may help (e.g. repmgr).
Spatial ... although it's not a built-in feature, there's PostGIS (www.postgis.org), a great package for managing geospatial data.
There are companies that are migrating from Oracle, but they don't want to go public for good reasons. I know there were some case studies about how Sony replaced Oracle with EnterpriseDB - although it's mostly marketing mumbo jumbo.
Frankly, if you have bought Oracle, you're more than aware of their licensing fees. And how ridiculous that gets once you need to use a VM, or when you need more CPUs etc. My experience is that the businesses that were already hit by an Oracle sale are looking for other solutions - and PostgreSQL is very popular among them.
I doubt that's an Oracle issue, my guess is they're using a custom-developed tool to export the data and it's buggy. I'm dealing with a lot of data exported from Oracle (CSV, columnar, ...) and I've never had this problem. External tables actually made exporting even easier.
So while I'm a PostgreSQL fan, let's not blame Oracle for the mistakes of others.
Because then it would be a bit easier to port the applications to other databases?
Plus this particular "feature" is used in so many places in the current code base (and that's a huge amount of PL/SQL code) that it's almost impossible to fix. Plus it's actually a bit funnier, because the exact behaviour depends on whether you use CHAR or VARCHAR2 and whether you're in SQL or PL/SQL. And it's not the only funny feature in Oracle.
Sometimes I have nightmares about the reasons that led the developers to implement it this way.
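If you do end up porting code that relies on this, the usual trick is to make the behaviour explicit at the application boundary. A minimal Python sketch, assuming only the simple VARCHAR2 case where Oracle treats '' as NULL; the function names are hypothetical, and the CHAR and PL/SQL corner cases are deliberately not modelled:

```python
from typing import Optional


def oracle_style(value: Optional[str]) -> Optional[str]:
    """Collapse the empty string to None, mimicking '' being stored as NULL."""
    return None if value == "" else value


def is_null_or_empty(value: Optional[str]) -> bool:
    """The opposite approach: keep both values, but treat them alike when testing."""
    return value is None or value == ""


for raw in ["", None, "abc", " "]:
    print(repr(raw), "->", repr(oracle_style(raw)), is_null_or_empty(raw))
```

Note that a single space is left alone by both helpers; whether that should count as "empty" is exactly the kind of decision the original code base never wrote down.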
Damn, I wish it was 1.000.000:1 - magicians have calculated that million-to-one chances crop up nine times out of ten.
I'm sure it was Zaphod Beeblebrox. He stole the ship!
Except that the post is not about Twitter. It's about a tool they developed and use to analyse the stream of data. And they're dealing with a lot of data, so the tool might be really interesting.
And yes, you could code a crude Twitter in one night, the only problem is it'd support about 3 users while Twitter is used by millions. So this belongs in the same drawer as "I can run Google on a single LAMP box" and other similar ideas.
Seems you're right about the popularity of languages - according to http://langpop.com/, C/C++ is the leader (which makes me happy, because I use it too). Although I'm quite suspicious about these kinds of popularity charts - mostly because of the data source (search engines, not real projects). I've studied statistics (I have a degree in it), and I guess I could easily tweak it to get Visual Basic into first place.
I've judged the popularity by my experience, and (probably limited) view of projects. I work in enterprise / banking, and most of the projects here are Java-based. And I don't think that's going to change - maybe they could develop new projects on something else, but they have to maintain the current systems so they need Java developers. And when you have in-house Java developers, it's cheaper to develop new projects in Java too (otherwise you'd need more developers or developers who know both, and that's more expensive). And the banks generally like to have the whole stack (including database and application servers) from Oracle, so switching just the application layer won't help them much.
The Oracle approach is very different from Sun's. Sun was an engineering company, Oracle is doing business. I may not like it, I see myself as an engineer, so I naturally liked how Sun did it, but in the end it was not very successful.
And I don't see any fault on the Apache Foundation's side. The fact that there is a pool of projects and various vendors can sell support, that's actually the very idea behind the open-source business model. Yes, Oracle can bundle that with their stack (and since they bought Sun they actually have everything they need), but the other vendors could do the same. The real culprit here is the JVM - with enough patents, Oracle can club to death any attempt to create an alternative JVM. But in that case, the open-source ecosystem will fall and something else will emerge.
Yes, that's the fate of all languages. But the fact that you know how to build a rocket engine does not make combustion engines immediately dead.
COBOL is dead, or maybe in a coma - it's not used for development any more, except for maintenance of legacy systems.
Java is probably the most used language of today. There are other popular languages - some older than Java (e.g. C), some younger (Python, Ruby) - but none of them is used as often as Java. This is not going to change in the near future (say 10 years), because the companies have invested so much into the whole ecosystem and there's no reason to ditch Java. Moreover there's no other language with a comparably rich ecosystem.
But I admit that with enough stupid steps from Oracle, this can change pretty fast.
Java is not dead. Maybe it's not the hip language anymore, but it definitely is not dead.
Not true. Manual vacuuming has not been needed since PostgreSQL 8.1, released 2005/11, that's almost 6 years ago. The functionality is called 'autovacuum' and it's fully automatic so you don't need to care about it anymore, and it was significantly improved in the following versions. In some cases you have to tune it a bit (to make it more aggressive for example), but in 99% of cases it works fine out of the box.
Maybe your site was one of those 1% that needed a bit more tuning, or maybe changes at the application level (not everything that works on one database will work fine on another one). Otherwise it's pure FUD. We're running a lot of applications (with a lot of write activity) on PostgreSQL, and it works perfectly.
Sure, nothing is perfect - for example the memory management is not perfect, you need a bit of experience when setting the memory limits. But I really don't think "leaking to disk" is the right term. You can set 'maintenance_work_mem' and if the process needs more, it has to put that on disk. But with a reasonable limit and autovacuum that really does not happen.
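For those wondering what "more aggressive" means in practice: per the PostgreSQL docs, autovacuum vacuums a table once the number of dead rows exceeds autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * (number of rows), with defaults of 50 and 0.2. A minimal Python sketch of that arithmetic; the function names and table sizes are made up for illustration:

```python
def vacuum_trigger_point(row_count: float,
                         threshold: int = 50,
                         scale_factor: float = 0.2) -> float:
    """Number of dead rows a table must exceed before autovacuum vacuums it."""
    return threshold + scale_factor * row_count


def needs_vacuum(dead_rows: int, row_count: float,
                 threshold: int = 50, scale_factor: float = 0.2) -> bool:
    """True once the dead rows exceed the trigger point."""
    return dead_rows > vacuum_trigger_point(row_count, threshold, scale_factor)


# Hypothetical table with a million rows and 100k dead ones.
rows, dead = 1_000_000, 100_000
print("defaults:       ", needs_vacuum(dead, rows))
print("more aggressive:", needs_vacuum(dead, rows, scale_factor=0.05))
```

With the defaults a big table has to accumulate roughly 20% dead rows before anything happens, which is why lowering the scale factor is the usual first tuning step for tables with heavy write activity.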
Yes, that's definitely right. But in many cases the benefit from a balanced deal is so far away they just go for fast money.
Take for example real estate - when a regular customer wants to sell a house, the agent is in a hurry to be the one who sells it. The owner might sell it without them, so they accept a much lower price than they might - when the agents are selling their own houses, they usually get about 25% more for them because they're patient and wait for the right offer.