Slashdot Mirror


New PostgreSQL Guns For NoSQL Market

angry tapir (1463043) writes "Embracing the widely used JSON data-exchange format, the new version of the PostgreSQL open-source database takes aim at the growing NoSQL market of nonrelational data stores, notably the popular MongoDB. The first beta version of PostgreSQL 9.4, released Thursday, includes a number of new features that address the rapidly growing market for Web applications, many of which require fast storage and retrieval of large amounts of user data."

27 of 162 comments

  1. next for NoSQL by SchroedingersCat · · Score: 5, Insightful

    Next, NoSQL databases will add schema and ACID support and the circle will be complete.

    1. Re:next for NoSQL by cowwoc2001 · · Score: 2

      Impossible.

      The entire premise behind NoSQL is trading consistency for availability (which actually means "latency" since everything is eventually available). You will never ever get ACID from NoSQL databases.

    2. Re:next for NoSQL by bdares · · Score: 4, Informative

      "NoSQL" doesn't mean "No SQL". At least, not all the time. I've heard it pronounced "Not Only SQL" more than once. RDF triple stores can also be considered NoSQL databases, and they can provide ACID. (They use SPARQL instead of SQL as a query language - hence being something other than a SQL DB.)

    3. Re:next for NoSQL by greg1104 · · Score: 5, Interesting

      All "NoSQL" means is that the database doesn't use SQL as its interface, nor the massive infrastructure needed to implement the SQL standard. This lets you build some things that are lighter than SQL-based things, like schemaless data stores. There are several consistency models that let you have a fair comparison. It's not the case that NoSQL must trade consistency for availability in a way that makes it impossible to move toward SQL spec behavior.

      Differences include:

      • Less durability for writes. Originally PostgreSQL offered only high durability and NoSQL only low. Now both have many options, ranging from a commit to memory being good enough to move on, up to requiring that multiple nodes get the data first.
      • No heavy b-tree indexes on the data. Key-value indexes are small and efficient to navigate.
      • No complicated MVCC model for complicated read/write mixes.

      Today NoSQL solutions like MongoDB still have a better story for sharding data across multiple servers. NoSQL also gives you flexible schemaless design, scaling by adding nodes, and simpler/lighter queries and indexes.

      PostgreSQL is still working on a built-in answer for multi-node sharding. A lot of the small NoSQL features have been incorporated, with JSON and the hstore key-value index being how Postgres does that part. Both systems have converged so much that either one is good enough for many different types of applications.

    4. Re:next for NoSQL by cjc25 · · Score: 2

      the SQL standard.

      That's cute

    5. Re:next for NoSQL by Anonymous Coward · · Score: 2, Interesting

      "Schemaless design" always just sounds like a whitewashed buzzword for "Excel spreadsheet" to me.

      There's a very simple way to make a "schemaless design" within a relational database, and it's generally regarded as Not Best Practice (tm). You need a table with a unique PK (any old GUID or autoincrementing integer will do just fine), a FK to whatever bit of "real indexing" you need (user id or whatever), and two string fields (varchar, nvarchar, character varying, whatever your RDBMS likes to call them). One holds the "key" and the other holds the "value". Now, you need to create an index on the FK. Not a unique index, just a nonclustered, ordinary index. It really is a shit way to store data, but that's why it's Not Best Practice (tm). And now you've just reinvented "NoSQL". And best of all, you can use a real, set-theory-based data retrieval language (that is, SQL) to retrieve it! Of course, you lose all of the advantages of that very well-thought-out language by throwing all of your data into a shit-heap, but hey, you're a web designer, it's not like you're smart enough to make a query that does anything beyond "SELECT * FROM ShitHeap WHERE UserID = @UserID" anyway.
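
      A minimal sketch of the key/value table described above, using Python's built-in sqlite3 module so it runs anywhere. The table and column names (users, user_attrs, attr_key, attr_value) are made up for illustration, and SQLite stands in for whatever RDBMS you actually use.

```python
import sqlite3

# Hypothetical names throughout: users, user_attrs, attr_key, attr_value.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE user_attrs (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,      -- unique PK
        user_id    INTEGER NOT NULL REFERENCES users(id),  -- FK to the "real" key
        attr_key   TEXT NOT NULL,                          -- the "key" string
        attr_value TEXT                                    -- the "value" string
    );
    -- Ordinary, non-unique index on the FK.
    CREATE INDEX idx_user_attrs_user_id ON user_attrs (user_id);
""")

conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")
conn.executemany(
    "INSERT INTO user_attrs (user_id, attr_key, attr_value) VALUES (?, ?, ?)",
    [(1, "theme", "dark"), (1, "language", "en"), (1, "timezone", "UTC")],
)

def load_attrs(conn, user_id):
    """Fetch every key/value pair for one user as a plain dict."""
    rows = conn.execute(
        "SELECT attr_key, attr_value FROM user_attrs WHERE user_id = ?",
        (user_id,),
    )
    return dict(rows.fetchall())

attrs = load_attrs(conn, 1)
```

      Every value comes back as a string and nothing stops two rows from using the same key for the same user, which is part of why the pattern is discouraged.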

      Of course, there are advantages to a shit-heap, which the NoSQL fanboys will no doubt express vehemently about 10 seconds after I make this post. But why would you bother with one when you're already incurring the overhead of running PostgreSQL? You have the power, and you have the system set up to handle that load. Why dumb it down? Because you're dumb? Not likely. Even the dumbest of managers know when to hire an expert.

      This just reeks of "me too!" on the part of PostgreSQL. Nobody that feels a need to use NoSQL is going to consider using PostgreSQL for that task, and nobody that uses PostgreSQL is going to feel the need to use Not Best Practices (tm) in their RDBMS schema. It's a solution in search of a problem, and it's going to flop. Don't invest your time, energy, or money in it, because it will be abandoned for non-use in a year or two.

    6. Re:next for NoSQL by Bill, Shooter of Bul · · Score: 4, Insightful

      http://en.wikipedia.org/wiki/S...

      "Popular implementations of SQL commonly omit support for basic features of Standard SQL, such as the DATE or TIME data types. The most obvious such examples, and incidentally the most popular commercial and proprietary SQL DBMSs, are Oracle (whose DATE behaves as DATETIME,[30][31] and lacks a TIME type)[32] and MS SQL Server (before the 2008 version). As a result, SQL code can rarely be ported between database systems without modifications."

      That's why it's cute.

      --
      Well.. maybe. Or Maybe not. But Definitely not sort of.
  2. Re:Nice touch but too late! by mwvdlee · · Score: 5, Insightful

    By "industry" you mean the 0.001% of websites that could actually benefit from NoSQL?
    How many sites you visit use NoSQL? Do most webshops, blogs, news sites and forums? Does Slashdot?

    --
    Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
  3. How about Parallel Query Execution? by wispoftow · · Score: 4, Interesting

    NB: I love PostgreSQL with all my heart. I always upgrade to the most recent version, because they implement features that I really need. Added to the existing features of Postgres, it's totally awesome.

    But as I have moved toward "Big Data" and the market segment that these new-fangled (non-relational) databases target, I find myself wishing that Postgres would be able to run my vanilla query (*singular*) using all processors. As it is now, I have to either write some awful functions that query manually-partitioned subtables, or simply wait while it plods through all billion or so rows.
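
    The manual workaround described above boils down to a scatter/gather pattern: aggregate each partition independently, then merge the partial results. Here is a rough sketch of the shape, with small in-memory lists standing in for the hand-partitioned subtables. The data, the function names and the choice of a thread pool are all assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: four hand-made "subtables", each a list of (region, amount) rows.
partitions = [
    [("north", 10), ("south", 5)],
    [("north", 7), ("east", 3)],
    [("south", 8), ("east", 1)],
    [("north", 2)],
]

def partial_sum(rows):
    """Aggregate one partition on its own: total amount per region."""
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0) + amount
    return totals

def merge(partials):
    """Combine the per-partition results into the final answer."""
    merged = {}
    for totals in partials:
        for region, amount in totals.items():
            merged[region] = merged.get(region, 0) + amount
    return merged

# Scatter the work across workers, then gather and merge.
with ThreadPoolExecutor(max_workers=4) as pool:
    result = merge(pool.map(partial_sum, partitions))
```

    In a real setup each worker would presumably send its own query for one subtable over its own database connection, so the database spends the CPU time. Pure-Python threads like these would not speed up CPU-bound work by themselves.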

    1. Re:How about Parallel Query Execution? by mlyle · · Score: 5, Interesting

      Look at Postgres-XL that we just released. It's a clustered database and can do MPP execution of your queries and has good write-scalability too. (To use all the processors in each machine, you'll want to have a few data nodes per machine.) It's pretty clever about planning a lot of things.

    2. Re:How about Parallel Query Execution? by KingSkippus · · Score: 2

      I like the way the linked page uses Web 2.0 when it means scalability.

      Great job with the buzzwords.

      You know, I was just going to let this go, chalked up as random Internet stranger being an asshat, but seriously. Are you SO bored or jealous of other people's achievements that you have nothing better to do than to sit around and nitpick the friggin' ad copy of a marketing page that was undoubtedly written not just for people who want to know the technical specifications of the product, but common usage applications for it also? What you're calling a "buzzword" is information that business wonks need to know when faced with the question, "Will this solve my problem/fulfill my needs?"

      When you develop your own database system, you can write your own ad copy to say whatever you want it to. Or if you prefer, apply for a job at Postgres as their chief marketing guru, and if they're dumb enough to hire you, you can write its ad copy to be purely technical-oriented until the product is completely irrelevant in an actual production environment. ("Now for OS/2 Warp and BeOS!") Otherwise, forgive me if I don't put much weight into your opinion on the matter over the people who have written a kick-ass enterprise-quality system that is pretty much given away for free.

      Seriously, what exactly are you implying by your comment, that PostgreSQL isn't a capable database system? That they just use buzzwords instead of actual technical brainpower and muscle as the basis of their software? Because I can tell you that to people who architect, engineer, administer, and eat database systems for breakfast, you are sadly off-base here, and this comment comes off as extremely pompous and ignorant.

    3. Re:How about Parallel Query Execution? by mlyle · · Score: 2

      Really premature, and unlikely in any event.

      I think Postgres is good for what it is-- a clean, single-node database system. Clustering adds some complexity in deployment (we're working on making this easier) that you wouldn't want to incur for a typical Postgres install.

      I think there are pieces of Postgres-XL that make sense to be in core/vanilla PostgreSQL, and we'll be working to contribute them upstream. Likewise, there are more pieces from TransLattice's commercial database offering, TED, that Postgres-XL could benefit from that we intend to contribute.

  4. No SQL by TechyImmigrant · · Score: 2

    It would be nice if noSQL databases adhered to the promise in the name. They replace the query language with something sane and secure.

    --
    I should use this sig to advertise my book ISBN-13 : 978-1501515132.
    1. Re:No SQL by Anonymous Coward · · Score: 2, Interesting

      The main problem is that SQL sucks.

      Compared to what? I'm not sure you have any idea what kinds of inflexible horrors preceded the relational systems. Furthermore, SQL is based on relational algebra, which underlies the whole RDBMS concept; if you need a data manipulation language closely matching the capabilities of an RDBMS, it has to be set-oriented and based on relational algebra (or relational calculus, which is equivalent). And there you have the root cause of the problem: a serious impedance mismatch between a set-oriented query language and a regular imperative programming language (OO notwithstanding.)

      The so called "4G" languages tried to bridge this gap and failed miserably. Various ORM schemes are not brilliant, either. Ruby on Rails seemed to offer a glimmer of hope with its "convention over configuration" approach, but that ran into a myriad of exceptions and performance problems. Nevertheless, SQL is too well matched to the strengths of relational systems to be discarded without thought. I don't know what the solution is, but ditching SQL in toto isn't likely to be part of it.

  5. Horizontal scalability? by michaelmalak · · Score: 2, Interesting

    A hallmark of NoSQL is horizontal scalability. The lack of schema in NoSQL was a brief rebellion against ivory tower DBAs that has since been regretted, once it was realized that it merely transferred the schema and schema versioning implicitly into the source code, spread throughout it. Sounds like PostgreSQL got the bad part of NoSQL but not the good part.
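
    To make the "schema moves into the source code" point concrete, here is a small sketch of what reading from a schemaless store tends to look like after a couple of application releases. The document shapes, field names and version numbers are invented for illustration.

```python
# Hypothetical documents as they might sit in a schemaless store after two
# rounds of application changes. Field names and versions are made up.
documents = [
    {"name": "Ada Lovelace"},                                  # v1: no version field
    {"schema_version": 2, "first": "Alan", "last": "Turing"},  # v2: name split in two
    {"schema_version": 3, "first": "Grace", "last": "Hopper", "email": None},
]

def normalize(doc):
    """Upgrade any stored shape to the current (v3) shape at read time."""
    version = doc.get("schema_version", 1)
    if version == 1:
        # v1 -> v2: split the single name field.
        first, _, last = doc["name"].partition(" ")
        doc = {"schema_version": 2, "first": first, "last": last}
        version = 2
    if version == 2:
        # v2 -> v3: add the email field with an explicit "unknown" value.
        doc = {**doc, "schema_version": 3, "email": None}
    return doc

current = [normalize(d) for d in documents]
```

    The schema has not gone away here. It lives in normalize(), and every other reader of the same data needs an equivalent copy of that logic.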

    1. Re:Horizontal scalability? by mlyle · · Score: 4, Informative

      We just released Postgres-XL so you can have horizontal scalability and MPP.

  6. Re:Going in the wrong direction by Tablizer · · Score: 5, Insightful

    Most people don't need NoSQL. Last I checked, most people aren't Facebook or Google

    Some people get overly optimistic about their start-ups or new projects. It's like planning on where to park all the beemers before you even get your first sale.

  7. Mutt-hater! by Tablizer · · Score: 4, Funny

    Mixing/Mashing makes no sense

    No HalfSQL movement?

  8. Re:If this is anything like MariaDB by Anonymous Coward · · Score: 5, Informative

    I am actually *using* this thing. Implemented a database with ~100K XML records, access them by arbitrary XPath expression.

    Of course "normal" access is slow, but once I agree with the customer on an access pattern, I can set up a functional index. Then we are at a couple of millisecs per access (on very low-end hardware). And with GIN indexes, I can even set up things like "find all records where tag A or tag B or tag C equals one of 'foo' or 'bar'". All for a handful of millisecs. No database tuning whatsoever -- plain vanilla PostgreSQL 9.1 as it comes to us with Debian.
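
    GIN is an inverted index, and the idea behind the "tag A or B or C equals foo or bar" lookup can be sketched in a few lines of plain Python. This is only an illustration of the concept with made-up records, not how PostgreSQL implements it.

```python
from collections import defaultdict

# Hypothetical records: id -> {tag: value}.
records = {
    1: {"A": "foo", "B": "baz"},
    2: {"A": "qux", "C": "bar"},
    3: {"B": "qux"},
    4: {"C": "foo", "D": "bar"},
}

# Inverted index: (tag, value) -> set of record ids containing that pair.
index = defaultdict(set)
for rec_id, tags in records.items():
    for tag, value in tags.items():
        index[(tag, value)].add(rec_id)

def find_any(tags, values):
    """Ids of records where any of the given tags equals any of the given values."""
    hits = set()
    for tag in tags:
        for value in values:
            hits |= index.get((tag, value), set())
    return sorted(hits)

matches = find_any(["A", "B", "C"], ["foo", "bar"])
```

    The lookup touches only the index entries for the requested pairs and never scans the records themselves, which is why this kind of query can stay fast as the data grows.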

    Of course you can't compare it with -- say -- Elastic Search, but as soon as I finish uttering "Java" my box is out of memory :-)

    OK, on a more serious note: the usage patterns still are different: if you plan to have 100M biggish records, you'll probably want to throw a lot of boxes at the problem (unless the problem has a very specific structure). Then you'll probably be better off with Elastic Search or some such. OTOH if you want transactions, an SQL database it is. If you need both, you are in a tough place (cf. CAP theorem), so you gotta think hard.

    I don't fucking care whether it's called SQL or noSQL if it's well-done. And PostgreSQL is damn well done. The community rocks too.

  9. the hype by Tom · · Score: 5, Insightful

    As a big fan of SQL databases, I've been watching this NoSQL hype for a few years now, and I'm still not impressed.

    No doubt, there are a few scenarios where a conventional database isn't the best solution. But quite honestly, 90% of the people jumping on the bandwagon would be served just as well with an SQL database - except that like so many things, you need to do it right.

    I'm no database expert (but I know a couple), so my SQL isn't perfectly optimized and stuff, but even with a little bit of interest I see that putting some effort into your database and query design pays off massively.

    And I've seen enough cases where someone scrapped their SQL database and went NoSQL for absolutely no good reason. You think you're huge and SQL is too slow? Unless you just sold to FB or Google for a couple of billion, you very likely are not as huge as you think. I'm running a PostGIS database doing fairly complex geography calculations on non-trivial datasets, and it's blazing fast, and whenever it isn't, one hour with an SQL expert and some experimenting makes it so, because it always, with no exception, turns out that my SQL or my database design is at fault, not the database itself.

    If you've got a billion users, I will grant you that you have special needs. But every NoSQL use I have seen has been a case of people moving database work to software code instead, mostly because programmers are plenty and cheap, while experienced database experts are not.

    So I'm still amused and very little impressed, and I'm certain NoSQL will go the way of Java or every other hype ever - for a while it's everyone's darling, then people realize it still doesn't give us AI and it can't make coffee, and will start to figure out where it actually is the best solution and stop using it for everything else.

    --
    Assorted stuff I do sometimes: Lemuria.org
    1. Re:the hype by CadentOrange · · Score: 2

      I've seen this a million times. People with poorly designed relational databases with no thought given to query plans complain that their database is slow. They then migrate said database to a NoSQL solution (typically a document database like MongoDB) and then find that it is still slow! In a few cases, the NoSQL solution is significantly slower.

      The problem is NoSQL encompasses many different types of solutions. Key value stores like Redis are pretty good (key lookups support wildcards!!!) and I use them as an alternative to memcache. Document databases like MongoDB? If you're excited about them because you don't need a schema, you're just asking for carloads of trouble down the line because you've mistakenly bought into the thinking that you can just chuck arbitrary data into Mongo and get it to perform well.

    2. Re:the hype by Common Joe · · Score: 2

      There's a pretty good book I own in paperback (electronic versions available) for high performance stuff from PostgreSQL. It's called PostgreSQL 9.0 High Performance. It's probably beyond what you want, but if you're interested in looking at it, let me know and I'll bring it next time we get together.

    3. Re:the hype by Tom · · Score: 2

      If your application is really, really simple and you need truly massive amounts of throughput, then NoSQL is no doubt the way to go.

      Somewhere between 1% and 10% of the shops doing NoSQL really fall into that category. Maybe as many again might, with enough growth.

      Many years ago, long before NoSQL was a thing, I was the sysadmin of one of the largest e-commerce operations in my country. We had enough users and data and throughput that a big consulting company that was tasked with developing the next-gen system for us proposed that we buy not one, but two Sun E10k. At that time, pretty much the biggest commercial machine money could buy. The current system ran on six quad-core Dell servers. Because we had optimized the living daylights out of it, with custom shared memory kernel modules for data exchange, a highly customized database installation (we were Europe's largest installation of this system, so we had pretty much every vendor support we could wish for) and, most importantly, an exceptionally well-crafted design and implementation.

      You can throw NoSQL at your problem for maximum scalability. But everything in this world comes with a price tag. You sacrifice all the advantages of SQL. As I said: The use of NoSQL very often means moving problems that a good SQL database solves for you into code. So it means more code with more potential bugs, all in order to re-invent the wheel because you think you can make it more round. :-)
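
      A small side-by-side sketch of that trade, using Python's built-in sqlite3 module and invented tables (customers, orders). The first half lets the database do the join and the aggregation; the second half does the same work by hand, the way application code has to when the store cannot.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER);
    INSERT INTO customers VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders VALUES (1, 1, 30), (2, 1, 20), (3, 2, 15);
""")

# Version 1: the database joins and aggregates.
sql_totals = dict(conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
"""))

# Version 2: the same result rebuilt in application code from raw records.
customers = {1: "alice", 2: "bob"}
orders = [
    {"customer_id": 1, "total": 30},
    {"customer_id": 1, "total": 20},
    {"customer_id": 2, "total": 15},
]
code_totals = {}
for order in orders:
    name = customers[order["customer_id"]]  # hand-written "join"
    code_totals[name] = code_totals.get(name, 0) + order["total"]  # hand-written SUM
```

      Both versions give the same totals on this toy data. The difference is who owns the logic: in the second version, missing customers, duplicates and concurrent updates all become the application's problem.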

      But again: For a small percentage of cases, it is the right way to go, I am not saying it's all nonsense. Just saying it's a hype and it'll calm down.

      --
      Assorted stuff I do sometimes: Lemuria.org
    4. Re:the hype by Tom · · Score: 2

      Because it's a database.

      SQL is 40 years old. In that time, dozens of programming languages, patterns and styles have come and gone. And SQL is still here, exactly because it doesn't care if your language is OO, functional, redundant, brainfuck or agile deployment for optimized synergies with next generation engagement framework whatever.

      A database needs to concern itself with the data, not with the programming patterns of the application.

      --
      Assorted stuff I do sometimes: Lemuria.org
  10. Re:Nice touch but too late! by Simon Brooke · · Score: 5, Insightful

    A small minority of companies, with very special needs, are using NoSql databases for a small proportion of their operations. Those companies do include some big ones, such as Google and Twitter, but still in absolute terms the numbers are small. A tiny minority of companies have moved away from relational databases altogether. But the numbers are statistically insignificant and are likely to remain so for decades. And the relational model does have some real and enduring benefits which will make it the right tool for many jobs far into the future.

    Remember this is an industry that advances very slowly indeed. Your bank, and your utility companies, are still using programs written in COBOL - technology which is fifty years behind the curve.

    --
    I'm old enough to remember when discussions on Slashdot were well informed.
  11. Re:If this is anything like MariaDB by fuzzytv · · Score: 3, Informative

    Well, yes and no. PostgreSQL has had a text-only JSON data type for a long time, and was able to index keys using expression indexes. That's nothing new.

    The 9.4 improvements are that (a) JSONB is stored in a binary form, and (b) a lot of ideas from the HSTORE data type, plus new ones, were implemented. That means that you can create a "universal" index without prior knowledge of what keys will be interesting. So then you can ask for data containing arbitrary keys, sets of keys, values, documents etc. See http://www.postgresql.org/docs...
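
    As a rough illustration of what "data containing a given document" means, here is a simplified containment check in plain Python. It is only a sketch: it handles nested objects and scalar values, compares lists by plain equality (PostgreSQL's jsonb `@>` operator has its own rules for arrays), and the sample documents are invented.

```python
def contains(doc, pattern):
    """Simplified containment: every key/value in pattern must appear in doc.

    Nested objects are checked recursively; anything else (including lists)
    is compared by equality, which is a simplification.
    """
    if isinstance(pattern, dict):
        return isinstance(doc, dict) and all(
            key in doc and contains(doc[key], value)
            for key, value in pattern.items()
        )
    return doc == pattern

# Hypothetical stored documents.
documents = [
    {"id": 1, "user": {"name": "alice", "plan": "free"}, "active": True},
    {"id": 2, "user": {"name": "bob", "plan": "paid"}, "active": True},
    {"id": 3, "user": {"name": "carol", "plan": "paid"}, "active": False},
]

def find(pattern):
    """Ids of all documents that contain the given pattern."""
    return [doc["id"] for doc in documents if contains(doc, pattern)]

paid_ids = find({"user": {"plan": "paid"}})
active_paid_ids = find({"user": {"plan": "paid"}, "active": True})
```

    This version scans every document. The point of the index described above is that the database can answer the same kind of question without that scan, and without knowing in advance which keys will be queried.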

    Sure, it's not perfect and the index may get somewhat big, but well ...

  12. Nice touch but too late! by fuzzytv · · Score: 4, Insightful

    I don't know whether angry tapir knows what relational means, but I see nothing in his post IMHO suggesting he has no clue. JSON is great for storing non-relational data (hierarchies, data without fixed set of columns, ...). Not all data are purely relational, it's often a mix.