Ask Slashdot: Which OSS Database Project To Help?
DoofusOfDeath writes "I've done a good bit of SQL development / tuning in the past. After being away from the database world for a while to finish grad school, I'm about ready to get back in the game. I want to start contributing to some OSS database project, both for fun and perhaps to help my employment prospects in western Europe. My problem is choosing which OSS DB to help with. MySQL is the most popular, so getting involved with it would be most helpful to my employment prospects. But its list of fundamental design flaws (video) seems so severe that I can't respect it as a database. I'm attracted to the robust correctness requirements of PostgreSQL, but there don't seem to be many prospective employers using it. So while I'd enjoy working on it, I don't think it would be very helpful to my employment prospects. Any suggestions?"
I've used Postgres commercially for years, with a number of employers. It's a great DB and having dealt with MySQL, SQL Server, Oracle, et al I'd never go back - though the softies tell me that SQL Server is much better these days.
I'd be surprised if you can't find plenty of work using Postgres. Maybe it's one of those things people don't feel comfortable talking about - like Delphi in the 90s. Plenty of people used it, but few would own up to what made up their "secret sauce".
It's seeing a constant rise in usage. Also many projects (spacewalk!) have it as the only viable alternative to Oracle.
Small companies with small to mid sized applications use it (see Jira or Fisheye, at Atlassian) as their main development platform.
Also you shouldn't use your USA'ish perspective and only do something because it will benefit your job or future employer. OSS is about sharing, fun, knowledge and getting better. Getting better at your job is a welcome side effect.
You will probably be happier in the fewer PostgreSQL shops. Think about it: do you want to get it done quick and dirty, or the right way?
No sir, I don't like it.
If you are an active member committing to a major database's code, then it will help your employment prospects no matter what. If you're committing to PostgreSQL regularly, that's strong evidence you are good at what you do.
Only you can answer that question. Good luck!
Oracle
Jehovah be praised, Oracle was not selected
You might want to take a look at MariaDB, it's a continuation of the MySQL project by the original author of MySQL.
Bits of code, random ramblings: jakimfett.com
I actually love MySQL, but FWIW, someone noted a while back that Salesforce.com has announced intent to hire about 50 top gun PostgreSQL guys in the coming year. It seems obvious that they are preparing to unhook the money siphon leading to Oracle. Assuming Salesforce follows through, all the herd-following executives in the U.S. will want to do the same. So I predict that demand for PostgreSQL talent will be pretty good for many years.
The video shows a number of ways that MySQL seems to insert questionable data: ignoring NOT NULL, inserting default values when no default is specified, etc.
There are two databases that I have had to repair... Hypersonic and MySQL. MySQL I have to repair regularly in my MythTV box. Hypersonic states it should not be used in a production system. I have never had to repair Postgres, MSSQL, or Oracle.
Sure it is. But product quality has little to do with a product's popularity.
There's no -1 for "I don't get it."
NoSQL DBs suffer pretty badly from the inner-platform effect, where the users end up implementing their own classic SQL RDBMS on top of the NoSQL store. "I don't have joins... well, I'll write one in Ruby". You could probably do the community a huge service by PROPERLY reimplementing at least an API-compatible MySQL system on top of a variety of NoSQL services. That way devs could be buzzword compliant, while not actually having to change anything (well, the sysadmins will throw fits at the change for the sake of change, but no one cares about them).
http://en.wikipedia.org/wiki/Inner-platform_effect
The ones that don't inner-platform aren't really using them as "databases" so much as simple persistent stores. Like dumping data into a CSV file. Maybe persistent stores with advanced parallelization, but just persistent stores.
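To make the "I'll write my own join" pattern concrete, here is a minimal Python sketch of what that usually looks like. The `users` and `posts` dicts are hypothetical stand-ins for two key-value collections (a real NoSQL client would replace them), and `join_posts_to_users` is an invented name for illustration, not anything from a real library.

```python
# Hypothetical key-value "collections"; plain dicts stand in for a NoSQL store.
users = {
    "u1": {"name": "alice"},
    "u2": {"name": "bob"},
}
posts = {
    "p1": {"user_id": "u1", "title": "Postgres tips"},
    "p2": {"user_id": "u2", "title": "MySQL modes"},
    "p3": {"user_id": "u1", "title": "Upserts"},
    "p4": {"user_id": "u9", "title": "Orphaned post"},
}


def join_posts_to_users(posts, users):
    """Hand-written equivalent of:
    SELECT u.name, p.title FROM posts p JOIN users u ON u.id = p.user_id
    """
    rows = []
    for post_id in sorted(posts):  # fixed order so the output is repeatable
        post = posts[post_id]
        user = users.get(post["user_id"])
        if user is None:
            continue  # inner-join semantics: skip posts with no matching user
        rows.append((user["name"], post["title"]))
    return rows


joined = join_posts_to_users(posts, users)
print(joined)
```

Every application that does this ends up owning the join logic, the missing-key handling and the ordering itself, which is the part an RDBMS would otherwise do for you.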
"Science flies us to the moon. Religion flies us into buildings." - Victor Stenger
https://kb.askmonty.org/en/community-contributing-to-the-mariadb-project/
We (yes, I work for the project) are always looking for new contributors. There are lots of exciting things happening right now.
Their logo is awesome.
I recommend setting yourself about fixing some of that long list of fundamental flaws in MySQL.
Traditionally, especially in 2012, this amounts to listing stuff like "doesn't have transactions" which was fixed back in Bush the Second's first term.
Shoveling through obsolete FUD to find the truth is a harder job than you'd think, which also shows "good little worker bee" stick-to-it-ive-ness.
0.11016 was uploaded six weeks ago, it's not dead.
It's VERY popular internally to auto-generate DB diagrams from midnight cron jobs etc. I'm sure there are other ways to do it. But this was easy, fast, and the diagrams look good enough.
It can do a lot more than generate diagrams.
What it needs is a new artist. The dude in a tutu as a logo...
Actually I wasn't. I figured the /. crowd might have some knowledge about the relative acceptance and prevalence of the two databases in European business settings, and where things are moving.
For example, if the consensus was that PostgreSQL was so rarely used that it was a dead-end, then I'd suck it up and work on MySQL despite my misgivings.
But as long as PostgreSQL is showing some signs of life in a business setting, I'll perhaps try to pitch in on that.
I also figured that maybe there was some other up-and-coming database out there that I should take a look at. The /. community is good at bringing alternatives like this to light.
As far as flames, I should have been clearer about what I meant by "design flaws". I realize that it's somewhat subjective. What I should have said is that MySQL's behavior strikes me as a lot more surprising in some cases than does PostgreSQL's, and I didn't think that was going to change. (Probably in a similar vein, I like strongly typed programming languages and compile-time correctness checks. I think it's a mindset kind of thing.)
The post is basically a troll for a video. The video is based on an old list of MySQL 4.x gotchas, many of which were fixed in the 5.x series. Most of them involve things like the semantics of NULL in special cases, truncation of indexed strings with trailing spaces, and similar stuff that an application shouldn't be relying on. There's a comparable list of PostgreSQL gotchas from the same source.
MySQL has political problems, because Oracle owns it and would prefer users buy their commercial products. The future of the free version is uncertain. The problems in the video aren't the ones to worry about.
Does it still silently drop data that does not match the expected input?
Can you easily delete a table with foreign keys yet?
I am sorry if I don't keep checking to see if they finally fixed basic problems. Way easier to just use Postgres.
While you're at it, do me a favor and add "ON DUPLICATE KEY UPDATE" to Postgres. (If necessary, also add it to an SQL spec.) Thanks!
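For anyone who hasn't used it: in MySQL, `INSERT ... ON DUPLICATE KEY UPDATE` inserts a row, or updates the existing one if the key is already taken, in a single statement. Without such a clause the usual workaround is update-then-insert. Below is a rough Python sketch of that workaround; it uses the standard-library `sqlite3` module purely so it runs without a server, and the `counters` table and `upsert_counter` helper are made up for illustration. As written it is not safe against concurrent writers, which is exactly why people want the database to do it.

```python
import sqlite3


def upsert_counter(conn, name, amount):
    """Update the row if the key exists, otherwise insert it.

    Rough MySQL equivalent (single statement):
      INSERT INTO counters (name, hits) VALUES (?, ?)
      ON DUPLICATE KEY UPDATE hits = hits + VALUES(hits)
    """
    with conn:  # one transaction: commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE counters SET hits = hits + ? WHERE name = ?", (amount, name)
        )
        if cur.rowcount == 0:  # no existing row was touched, so insert one
            conn.execute(
                "INSERT INTO counters (name, hits) VALUES (?, ?)", (name, amount)
            )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, hits INTEGER NOT NULL)")
upsert_counter(conn, "home", 1)  # key absent: falls through to INSERT
upsert_counter(conn, "home", 2)  # key present: UPDATE only
rows = conn.execute("SELECT name, hits FROM counters").fetchall()
print(rows)
```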
It's better to vote for what you want and not get it than to vote for what you don't want and get it.
- E. Debs
The problems with MySQL aren't bugs, they're decisions. Decisions that can't be reversed for the sake of backwards compatibility.
my sig's at the bottom of the page.
http://en.wikipedia.org/wiki/Michael_Stonebraker Have a look at what he's done with Postgres, Vertica, VoltDB, and the other systems he's working on. You may find that contributing to this project aligns you with some great, very intelligent people -- that's opportunity for learning, opportunity for contributing, and opportunity for good networking.
"Big Data" is horse shit. It's code for "stuff it all in a pile and maybe someday we can get some of it out, but not in any order that makes sense, and not in any reliable fashion".
The problem with arguments like this is that they overlook the fact that there is a lot of value in data that doesn't need ordering in advance, and where any gaps in the data are not critical failures. Not all data can or should be stored in such systems, but for that which can, there are much better ways to store it than an RDBMS. RDBMS isn't going away, but it's certainly been commoditised. In time, so will Big Data type storage, but until then it's a good thing to make good money on if you are starting out in this field precisely because it is relatively new, and also is not going away.
I'm curious if those are still actually existent in >=5.0. I know I started avoiding MySQL in the bad old days, but from what I understand it's made a lot of strides in the conformance department.
I haven't bothered to look at it again since then, since PostgreSQL meets all of my needs, but I am curious. It can't still be that bad, can it? I can see all the bad old behavior being hidden behind defaults for legacy users, that's reasonable, but silent data corruption (and whether you're truncating strings or inventing dates when you hit NULL, you're corrupting data) doesn't seem like something people would put up with these days.
<xml><I><am><so><damn>Web 2.0</damn></so></am></I></xml>
Oracle, the database and connection software, is quite respectable. The problem with Oracle is the organization which sells and services it. They like to "partner" with their customers, and comport themselves like a criminal enterprise. They send auditors to their customers' sites to ensure license compliance (meaning shake-down money for Larry Ellison). Training is expensive, but so are trained Oracle specialists. They're risking ruining Java with slow updates, and MySQL development seems to have slowed-- probably a good thing. Larry Ellison doesn't really get open source.
There are many Oracle-like features in PostgreSQL, so it's helpful to learn, but here in the US, Oracle work pays very well-- probably the best in the industry. I don't know how many of those jobs are available. I do know there's some foreign competition, both from Europe and Asia. The company I work for has some excellent DBAs from India monitoring our servers.
Everything I've ever learned the hard way was based on a statistically invalid sample.
If employment prospects are all that matter, stay away from the major SQL databases. They're mostly feature complete, have large established developer communities that are hard to break into (sometimes requiring employment at the sponsoring company) and often have a lot of legacy baggage that limits what you can accomplish.
Meanwhile, in the NoSQL world, people are busy re-inventing the wheel. You can take decades-old techniques and apply them to new features of these databases. For example, Redis doesn't have true clustering support. There's a preliminary draft and some exploration, but it's still really nascent. If you've got the DB chops to implement it and do it well, there's a ton of places that would hire you.
The downside is, of course, that you end up working with NoSQL databases, but the ratio of employment prospects to actual work and knowledge is a lot higher.
Pretty much all the test cases from that video fail on MySQL if the sql-mode is set to traditional. MySQL will throw an error when data would be truncated, throws an error when you try to insert a NULL value in a NOT NULL column, refuses to alter a table if the existing data would be truncated, throws an error on an invalid date, on select only returns a warning for division by 0 but throws an error on an insert of division by 0, throws an error if you try to insert a string into a numeric column and so on.
I understand of course that the strict modes aren't enabled by default, but they're easy enough to enable if you choose to: via my.cnf, on the command line when mysqld is started up, or while connected to the MySQL server itself (for just that session, or globally for all sessions).
I didn't run through all their examples, but mostly because I got bored and all their examples that I did try were throwing errors (except the select 1/0 one, which issued a warning) with the sql-mode set to traditional on MySQL (postgresql is also a sql-mode option but I didn't play with that one since I've never used it before).
First I should probably burn some karma and say "what a load of garbage". The headline asks what OSS database to HELP with, but the article summary might as well read "Which free SQL-compatible database to learn to use". And on top of that it contains the answer already, along with questionable dirt-showing on MySQL which makes it read like a guerrilla ad for PostgreSQL.
But in any case, it makes a major, huge difference whether the question is "which database codebase to contribute improvements to" or "which free database to learn for best employment chances". Sounds like it's the latter, and in that case a follow-up question is what kind of employment. The one correct answer is "whichever database your employer is using" - don't expect to be able to choose a job on the basis of what database engine they happen to be using in one of the departments at the time. Second best answer is go with both; and again it makes a huge difference whether it's for self-employed web-site design or financial analysis for a stock brokerage firm.
And if you actually went with MySQL, next question is which database engine. Huh, you ask? Well you see, MySQL is not a single database engine, in actuality it's a front-end to pluggable database engines. The stock release features at least MyISAM, InnoDB, Heap, BDB, NDB and Archive (and a few variations). In general it's a choice between MyISAM or InnoDB, which are a whole different story. When most people say "MySQL has such and such problem" they're actually talking about MyISAM, but MySQL has defaulted to the InnoDB engine for years.
But the third and best answer is "none of the above". In most cases everybody seeking employment in a relevant job will be fluent in SQL and have at least some experience with both MySQL and PostgreSQL, and it'll be rare for the employer to be at all interested in your ability to actually "hack" the database source. NoSQL databases offer ample opportunity to differentiate both on the job market, and in the business competitiveness arena by improving the source code (and in most cases as long as the binaries stay in-house, so can the source, which makes bosses happy, but consult your OSS license).
I love PostgreSQL in theory but hate it in practice. It's a pain in the ass to work with... not very productive. For a long time, I felt it was worth it to endure this for the superior design, feature set, and technical correctness.
But one day I realized that I need to get things done, and switched to MySQL. The learning curve was small but the main kicker was that things just worked and were easily reworked. There are risks, limitations, and problems. It's very imperfect but I get things done now... and never have to or care to think about the purist philosophies in which I used to love to indulge.
In the end, you have to give up perfection to go anywhere. Otherwise, it's like having to get half-way there first, meaning you have to get half-way to half-way first, etc. recursively forever. With MySQL I take a reasonable number of precautions for things that can go wrong, ensure there are good backups, and deal with the others as they come.
Now I think MySQL is superior for practical use by a long shot. And I think that's why it's adopted so heavily.
The key ingredients to successful technologies are:
(1) You can do something obviously cool or useful with it.
(2) It's quick and easy to learn and use.
And that's it. This is why so many successful things are made by idiots. Look at HTML. It was made by Tim Berners-Lee back when he knew very little. But 12 year olds were picking it up and making cool (at the time) web pages. Now he knows so much more and has tons of backing from heavyweight organizations and money but cannot seem to even force the success of the Semantic Web. It's hard to learn and hard to work with even when you learn it. Furthermore, it's not obvious to most what cool or useful things you can do with it. Proponents keep saying it'll mature and will be easier when tools and libraries are available to make it easier... That misses the point. Even the tools mostly suck and are buggy because the basic tech is a pain in the ass to work with. There are philosophical visionaries galore but no substantial progress beyond what grants and job requirements force people to do... and there won't be.
Matthew
If only they were actually edge cases (look carefully: they mentioned one was a common Ruby on Rails mistake). MySQL's habit of pretending everything is alright when it's not has burned more than one of my previous employers.
But they missed the real WTFs like mysqldump creating dumps that need to be hand edited before MySQL will restore them or my all-time favorite: MySQL user authentication simply does a "SELECT * from mysql.user" and if the fields get reordered by a new MySQL release then logins will simply fail. The best part is that the officially documented way to fix that is a mysqldump followed by a restore which... deletes the table and puts the fields in the wrong order again. The last major MySQL upgrade of my employer's systems involved me starting the new install from an empty DB, restoring everything except the mysql.user table and recreating the accounts using a script.
Please don't pretend it's not a crap database. Those of us who have to deal with it every day know better.
With SQL Server, you set the transaction isolation level that you care about and then you begin a transaction - SQL Server will guarantee consistency in that transaction even if you're just doing multiple selects. And, SQL Server will not let you do a 'select for update'.
For example:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;
SELECT TOP 100 * FROM MyTable;
-- Rows read from MyTable are now locked
COMMIT TRANSACTION;
Locks are released when the transaction is committed.
So which entry-level web host do you recommend for running web applications that use PostgreSQL? Most that I've seen either offer only MySQL or charge extra for PostgreSQL.
Gosh, guess I should stop using Cassandra for time series data, and forget all the performance gains its given me on huge data sets. Maybe if you organize it like a pile, a pile is what you get?
All else being equal, if a non-relational database is giving you significant performance gains, it just means you set up your relational database incorrectly.
Cowards!
Table-ized A.I.
The simplest way to say it is that MySQL is really more of a data store than a database. You can store stuff in it, and it'll get the data back reasonably efficiently, but in terms of actually operating as a proper compliant database for critical information it just isn't designed that way. It works great for storing the back end for your web server, but if you wanted to store complex data in it and needed it to be 100% accurate, transactional, and reliable, the product just doesn't fit the bill. For all that it's got a paid "enterprise" edition, it's really more in the space of something like SQLite or SQL CE than it is in the space of Oracle, and again it's not an issue of whether it can scale or whether it's buggy, it simply isn't designed to be compliant to the required level. That's largely the reason it works so well as a LAMP back end and is so easy to administer, but it just isn't fit for purpose for much more.
If you're using phpMyAdmin, then you aren't doing the kind of in-depth database development where you run into the problems the GP is talking about.
Hell, if you're using phpMyAdmin, you're a casual DB user probably just trying to support your small webapp or CMS installation.
And if you do use phpMyAdmin, you'd be much better off with a platform-native database tool with MySQL support, such as Sequel Pro (OS X) or MySQL Workbench (Windows, OS X).
I'm out of my mind right now, but feel free to leave a message.....
What do you mean by "most popular"?
I'm tired of hearing that "everyone uses..." No, they don't. MySQL is pretty popular with the open-source web-crowd but this is the same crowd that respects the engineering behind PHP. I've encountered plenty of people in that arena who would rather roll their own data-checks and treat the database as barely more than a key-value store than use the capabilities of the database and have to deal with handling exceptions. Bring up transactions, ACID compliance, data-integrity and the like at a PHP users group and you get blank-stares. The get-rich-quick-with-a-cute-kitten-website crowd cares not for such things (as an overgeneralization - there are plenty of high-traffic sites such as Instagram, hi5, Etsy and MyYearbook that run on PostgreSQL).
So where do you find PostgreSQL? Salesforce, National Weather Service, Nippon Telegraph and Telephone, Federal Aviation Administration, Sony Online Entertainment, TD Ameritrade, State of Wisconsin Courts, Afilias, BASF, Flightaware, Skype (a contributor of many PG utilities), Fujitsu, Launchpad (Ubuntu)...
And PostGIS is *the* go-to open-source geospatial database.
I've found the PostgreSQL community to be wonderful with opportunities to contribute at all levels. Answer questions on the mailing-lists, contribute to documentation, help at users-groups, give a talk at a conference. One always welcome contribution is doing testing and submitting results/patches during commitfests - and this gets you more involved with the code.
As to employment, it sounds like you prefer PostgreSQL. As such, PostgreSQL is by definition the most popular database among places you are interested in working. Do what you love.
~~~~~~~
"You are not remembered for doing what is expected of you." - Atul Chitnis
Most of what's shitty about it is the MyISAM storage engine, which does approximately dick-all for enforcing integrity. It doesn't even have foreign key constraints. IIRC it can't do transactions either. The trade off is that it's slightly faster for some operations *eyeroll*
If MyISAM is good enough for your application then you may as well—no exaggeration—just use MongoDB or something.
InnoDB is much better. It's got some of the same not-confidence-inspiring quirks shown in the video but at least it supports transactions and foreign key constraints.
Biggest remaining differences off the top of my head are that Postgres supports a shitload more data types and data operations (many through plugins) like stuff related to geographic data and key-value stores (hey, you got NoSQL in my SQL!), and that Postgres has real separate databases, not just separate schema like MySQL, the difference there being strict separation of the data, so you can't, say, do a SELECT across two databases or even tell that there are other databases if you've only got a user account on one of them.
Lots of other under-the-hood stuff, I'm sure, but those are the main ones I can think of from a user's perspective.
Postgres is way, way more powerful, MySQL is (slightly) more widely supported and (IMO) the free tools, both command line and GUI, for working with it are easier to learn and generally friendlier.
MySQL's a completely miserable excuse for a relational database if you use MyISAM; it's only a mostly miserable excuse for a relational database with InnoDB.
I would believe that if it weren't for the fact that there are at least 3 forks from former MySQL leaders trying to fix all the junk in it that is screwed up. For example, read:
http://krow.livejournal.com/700783.html
I was looking for an easy way to automate character conversion from Latin-1 to UTF-8 for the forum software I use. I found out the hard way that the built-in MySQL recoder is completely broken, and will barf in different ways depending on which version number of MySQL you are using. No errors or warnings during the conversion for any version. You'll just find out later that all the field limits are wrong. You can only find out if it worked or not by inserting new rows and finding out if you get errors about data being too large to fit in the field, and whether it fails or not has nothing to do with the actual length of the data, but with whether you send 7-bit or 8-bit characters.
I gave up trying to get MySQL to do it, and wrote my own conversion tool.
And that's just for baby stuff for a web forum on a personal web site. I can only imagine what MySQL is like in an enterprise environment.
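One possible reason the failures track 7-bit versus 8-bit characters, assuming the field limit ends up being applied to bytes rather than characters after the conversion: ASCII text takes the same number of bytes in Latin-1 and UTF-8, while accented Latin-1 characters take two bytes each in UTF-8. A small Python sketch of that arithmetic follows; `encoded_lengths` and `fits` are hypothetical helpers, and the 10-byte limit is made up for the example.

```python
def encoded_lengths(text):
    """Return (latin1_bytes, utf8_bytes) for a string representable in Latin-1."""
    return len(text.encode("latin-1")), len(text.encode("utf-8"))


def fits(text, limit_bytes, encoding):
    """True if the encoded text fits in a field limited to limit_bytes bytes."""
    return len(text.encode(encoding)) <= limit_bytes


ascii_title = "Hello forum"            # 7-bit characters only
accented_title = "Caf\u00e9 cr\u00e8me"  # two characters outside 7-bit ASCII

for sample in (ascii_title, accented_title):
    latin1_len, utf8_len = encoded_lengths(sample)
    # !a prints an ASCII-safe repr, so this works on any terminal encoding
    print(f"{sample!a}: latin-1={latin1_len} bytes, utf-8={utf8_len} bytes")

# The same 10-character string fits a 10-byte field in Latin-1 but not in UTF-8.
print(fits(accented_title, 10, "latin-1"), fits(accented_title, 10, "utf-8"))
```

If that is what is going on, rows with plain ASCII would keep inserting fine while rows with accented characters would hit "data too large" errors well before the nominal field length.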