Slashdot Mirror


Sun Eyes PostgreSQL

Da Massive writes "Sun is looking seriously into the database market - namely PostgreSQL. It says Oracle and IBM and even Microsoft licensing fees are way too expensive for the average punter. This from John Loiacono, executive vice president of software: "We're not going to OEM Microsoft but we are looking at PostgreSQL right now," he said, adding that over time the database will become integrated into the operating system."

29 of 339 comments

  1. Predictable by AKAImBatman · · Score: 5, Informative

    This really isn't a surprise. MySQL has both licensing problems and feature problems in the competitive high-end markets. PostgreSQL has none of these issues, and can hold its own in a comparison with Oracle or SQL Server. These features led Red Hat to PostgreSQL for their Red Hat Database product, and I see little reason why they wouldn't attract Sun as well.

    The only thing that slightly bothers me about their strategy is that Sun has been pushing their Java Systems hard. If they actually wanted to bolster that strategy, they'd have three major options for a Java Enterprise Database:

    1. Cloudscape/Derby - This product makes the most sense from a technology and licensing perspective, but the fact that it was an IBM product (even though Cloudscape was originally a separate entity before being acquired) taints the software in such a way as to make Sun look bad if they used it.

    2. Daffodil - This database is an excellent choice, but it would require the acquisition of another company, a move that the Sun shareholders might question. It would also bring quite a bit of flak in Sun's direction as Daffodil is an Indian company.

    3. McKoi SQL - An excellent choice for a Java database, but it lacks brand recognition. The feature levels and scalability of the database are still open questions. The GPL license also allows Sun less freedom to modify the database in comparison to the BSD license used by PostgreSQL.

    As for the choice of Sunbird, I think it's simply a matter of "why not?" It's not like there's any particular leader in the market, and Sunbird plays nice with Firebird/Mozilla.

    1. Re:Predictable by MeauxToo · · Score: 4, Informative

      1. Cloudscape/Derby - This product makes the most sense from a technology and licensing perspective, but the fact that it was an IBM product (even though Cloudscape was originally a separate entity before being acquired) taints the software in such a way as to make Sun look bad if they used it.

      Derby is intended to be an embedded database, not a database server. Yes, it has a server mode, but that can't hold a candle to MySQL, let alone PostgreSQL.

      3. McKoi SQL - An excellent choice for a Java database, but lacks brand recognition. The feature levels and scalability of the database are still considerable questions. The GPL license also allows Sun less freedom to modify the database in comparison to the BSD license used by PostgreSQL.

      Since when can't you modify the source of a product with a BSD-based license? A BSD-based license is, in fact, far more liberal than the GPL because you can take the code, modify it, and close the source of the result. A perfect example is the Windows NT/XP TCP/IP stack -- stolen straight from BSD, and last I checked, Windows is not open source. Under the GPL, by contrast, you have to release the source of any modifications you distribute and open-source any parts of your products that link to it. Hence the description of the license as viral.

      Speaking from experience, PostgreSQL is a great product. Stable, reliable, and reasonably fast for medium to large scale, multi-user, distributed environments. The products listed above are all embedded databases intended for single user, micro environments. You are, in short, comparing apples to oranges.

  2. Re:integrating into the OS by AKAImBatman · · Score: 4, Informative

    it's clear why they wouldn't go with MySQL (technical shortcomings aside).

    Actually, I'd say that the technical shortcomings have a LOT to do with it. PostgreSQL can be placed head to head with Oracle and still look pretty darn appealing. MySQL really doesn't have that capacity (yet), and is hampered by its non-ANSI-compatible design and SQL variant. So I'm not certain that the decision was made on licensing alone. After all, Sun does support the GNOME project as well, and that is solidly under the GPL.

  3. Re:Let the PostgreSql vs MySQL Debate Commence by darylb · · Score: 4, Informative

    On top of being closer to the standards Oracle uses, IIRC, PostgreSQL uses a transaction model that is essentially identical to Oracle's, even though it's implemented differently. In spite of the hype around database independence, the reality is that the differences in transactional behavior radically affect the ability to port from one database to another. The fact that PostgreSQL's native stored proc language already looks a lot like Oracle's PL/SQL, with an effort to make PostgreSQL run PL/SQL unmodified in the works elsewhere, is another big plus.
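    For readers unfamiliar with the transaction model darylb mentions: both PostgreSQL and Oracle use multiversion concurrency control (MVCC), where an update writes a new row version instead of overwriting the old one, so a reader works from a snapshot and is not blocked by a concurrent writer. The toy below is a heavily simplified sketch, assuming a snapshot is just the set of transaction ids that had committed when it was taken. The xmin/xmax field names borrow PostgreSQL's system-column names, but the visibility rule here ignores details such as a transaction seeing its own changes, and Oracle's undo-based implementation differs considerably.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RowVersion:
    value: str
    xmin: int                    # id of the transaction that created this version
    xmax: Optional[int] = None   # id of the transaction that replaced/deleted it

@dataclass(frozen=True)
class Snapshot:
    committed: frozenset         # transaction ids committed when the snapshot was taken

def visible(version: RowVersion, snap: Snapshot) -> bool:
    """Simplified rule: creator committed in the snapshot, deleter not committed in it."""
    if version.xmin not in snap.committed:
        return False
    return version.xmax is None or version.xmax not in snap.committed

def read(versions, snap: Snapshot):
    return [v.value for v in versions if visible(v, snap)]

# Transaction 1 inserted the row and committed.
row = [RowVersion("balance=100", xmin=1)]
reader_snapshot = Snapshot(frozenset({1}))   # a reader takes its snapshot now

# Transaction 2 updates the row: the old version is stamped xmax=2 and a new
# version with xmin=2 is appended. Nothing is overwritten in place.
row[0].xmax = 2
row.append(RowVersion("balance=50", xmin=2))

# The reader keeps seeing the old version without waiting on any lock.
print(read(row, reader_snapshot))               # ['balance=100']

# A snapshot taken after transaction 2 commits sees only the new version.
print(read(row, Snapshot(frozenset({1, 2}))))   # ['balance=50']
```

    The practical consequence darylb points at is that code written against one MVCC database tends to assume readers never block on writers, which is not true of a database that implements isolation purely with locks.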

  4. Java Enterprise System isn't all in Java... by MosesJones · · Score: 2, Informative


    Sun's Java Enterprise System is about programming in Java rather than the tools being written in Java. The technology of the product isn't hugely important; what matters is that the API and development are in Java. Databases are clearly easy with Java, as JDBC makes the actual choice a pure commodity. So what Sun wants is a solid database, for free, that rounds out their platform effort and means that with one download and license a client can "get started"... which often means it is all they use.

    --
    An Eye for an Eye will make the whole world blind - Gandhi
  5. Re:PostgreSQL vs MySQL by El_Muerte_TDS · · Score: 4, Informative

    SQL92:
        PostgreSQL > MySQL; but MySQL is improving its feature set
    SQL3:
        PostgreSQL > MySQL; PostgreSQL has a few SQL3 features
    Speed:
        PostgreSQL ~= MySQL; sometimes faster, sometimes not
    Database\table\row\... Size:
        PostgreSQL > MySQL; PostgreSQL has fewer size restrictions, or at least its limits are much larger than MySQL's
    Stored Procedures:
        PostgreSQL > MySQL; MySQL doesn't have them yet, but version 5 adds SQL:2003-like stored procedures; PostgreSQL has SQL, C, PL/pgSQL, Tcl, Perl, Python, roll-your-own, and a few more not bundled with PostgreSQL
    Installation\maintenance:
        MySQL > PostgreSQL; MySQL is easier to set up
    OS Support:
        PostgreSQL ~= MySQL; Postgres has come a long way, e.g. there's now a stable Windows version.

  6. Re:I doubt it by iBod · · Score: 4, Informative

    A 'punter' is common British slang for 'your average joe'.

    Also used to mean a gambler or a prostitute's client!

  7. Re:PostgreSQL vs MySQL by Cmdr-Absurd · · Score: 5, Informative

    Yes. They have been compared.
    A quite lengthy comparison can be found here.
    SQL92 compliant is a relative term.

  8. Re:I doubt it by Dun+Malg · · Score: 3, Informative
    Does anyone know the usage of the word "punter" in the article, though?

    It's a British-ism meaning about the same as "bloke", only it can apply to men or women. Tends to have shades of "lowest common denominator" to it, meaning something like "an ordinary slob off the street picked at random".

    --
    If a job's not worth doing, it's not worth doing right.
  9. Re:I doubt it by I+confirm+I'm+not+a · · Score: 2, Informative

    It's a British-ism meaning about the same as "bloke", only it can apply to men or women.

    I've mostly heard it used/used it myself to describe "customers", particularly gamblers. As in "I had a punt on that nag but lost my shirt". I believe politicians use it to describe their electorate, but I couldn't possibly comment.

    Disclaimer: I've only heard the term used in Scotland; its usage elsewhere in Britain may be more/less general.

    --
    This is where the serious fun begins.
  10. Re:I doubt it by gowen · · Score: 4, Informative
    It's a British-ism meaning about the same as "bloke"
    More usually, it means "someone who is interested in buying something". It's most frequently used with respect to gamblers (particularly occasional horse-racing gamblers), since "having a punt" means "taking a gamble." It also means "people who frequent prostitutes", thus PunterNet, the leading online guide to "facilitate the exchange of information on prostitution in the UK"
    --
    Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
  11. Re:I doubt it by Doctor+Memory · · Score: 1, Informative

    It's British for "Joe Six-Pack"

    --
    Just junk food for thought...
  12. Re:Sun? PostgreSQL? by bernywork · · Score: 2, Informative

    I do find it interesting that Telstra is a Sun software customer

    Telstra are also a big Microsoft customer and also a big Linux user. They use IBM GSA extensively too. What's your point?

    I know one guy who worked on an implementation of part of the Telstra Mobile billing system for IBM GSA, as Telstra found out that they weren't catching the milliseconds to seconds in cell switch time and therefore billing users for it.

    This, just like the comment in the article, is just padding. It doesn't really add anything to the post.

    IBM has DB2, Microsoft has Microsoft SQL Server, Sun has.... Oracle? No....

    I very much doubt that Sun would buy PostgreSQL Inc.; they would partner with them and do some code development of PostgreSQL to get it to the level where it can definitely compete head on with Oracle (although Oracle does have a lot of other software that Sun doesn't have at present), MS SQL and DB2. The best thing they could do (and probably will do) would be to hire key developers of PostgreSQL to prioritize the requirements they are after.

    --
    Curiosity was framed; ignorance killed the cat. -- Author unknown
  13. Re:I doubt it by nelsonal · · Score: 2, Informative

    In this context a punter is a buyer, with shades of uninformed buyer. The term comes from the race tracks, where bettors became known as punters, and has evolved to refer to uninformed buyers, especially of tech and financial products.

    --
    Degaussing scares the bad magnetism out of the monitor and fills it with good karma.
  14. Re:PostgreSQL vs MySQL by StrawberryFrog · · Score: 2, Informative

    MySQL > PostgreSQL; MySQL is easier to set up

    I don't think this is much of an issue; I recently installed PostgreSQL on my Windows XP machine in order to try it out. The installation was 100% simple and painless.

    --

    My Karma: ran over your Dogma
    StrawberryFrog

  15. Re:integrating into the OS by morgan_greywolf · · Score: 2, Informative

    After all, Sun does support the GNOME project as well, and that is solidly under the GPL.

    Actually, most of GNOME is licensed under the LGPL.

  16. Derby by ievans · · Score: 2, Informative

    Sun already has several engineers working on Derby through Apache. Sun bundles Derby with Glassfish (the newly open-sourced Java EE 5 app server), which also integrates Derby into the app server for the EJB timer service, and bundles it with the Java Enterprise System stack. Sun is actively promoting Derby as a development database. There was a story about it here on Slashdot not too long ago.

    Sun used to bundle Cloudscape before IBM bought Informix, and subsequently switched to Pointbase. For App Server 9/Glassfish, they pulled Pointbase and replaced it with Derby.

  17. punter == mark by HermanAB · · Score: 2, Informative

    American slang for 'punter' is 'mark'. A gambler, but more specifically, a loser.

    --
    Oh well, what the hell...
  18. Re:PostgreSQL vs MySQL by JohanV · · Score: 3, Informative
    Installation\maintenance: MySQL > PostgreSQL; MySQL is easier to set up
    You might want to check out this lengthy review of the installation of PostgreSQL, MySQL and Oracle on Windows that has a winner that may be a bit surprising to those that have not been keeping tabs on what has been happening recently.
  19. Re:I doubt it by david+duncan+scott · · Score: 2, Informative

    Since you're unfamiliar with the term, you must be unfamiliar with The Register. The BOFH alone is worth the price of admission.

    --

    This next song is very sad. Please clap along. -- Robin Zander

  20. Re:I doubt it by Anonymous Coward · · Score: 1, Informative

    Isn't a prostitute's client called a "john" or "trick"?
    I know some people get confused and call the "ho" the "trick", thanks to the ill-informed rappers claiming that "biatches aint nothin but hos and tricks"

  21. PostgreSQL vs MSSQL vs Oracle by WebCowboy · · Score: 5, Informative

    ...is probably the most fair comparison.

    Don't know much about Postgres in production environments. It seems clean and I like the fact you have a choice of stored procedure languages.

    I have had experience with both in production environments, and I've come to the conclusion that PostgreSQL is clearly a step above MSSQL in terms of features and scalability. It is much better than MSSQL with concurrency and managing contention (MSSQL's locking strategy is quite brain dead). There is much more flexibility and power to create user functions and stored procs in PGSQL--you can do things like make user-defined AGGREGATE functions and data types in addition to having a choice of languages (none of that is possible with MSSQL). I find that, all things being equal, PostgreSQL is probably faster as well (largely an assumption, because the PostgreSQL systems I've worked with run on considerably less powerful hardware than the MSSQL systems I work with). A lot of people comment about the ease of administration of MSSQL, but I find that PGSQL really isn't that hard to manage even if you don't use GUI tools.
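    The user-defined aggregates WebCowboy mentions are declared in PostgreSQL with CREATE AGGREGATE, which takes a state transition function (SFUNC), an initial state (INITCOND) and an optional final function (FINALFUNC). The Python sketch below only mimics that evaluation model to show what the server does for each group; it is not PostgreSQL code, and the geometric-mean aggregate and all function names are hypothetical.

```python
import math
from typing import Any, Callable, Iterable, Optional

def run_aggregate(values: Iterable[Optional[float]],
                  sfunc: Callable[[Any, float], Any],
                  initcond: Any,
                  finalfunc: Optional[Callable[[Any], Any]] = None) -> Any:
    """Fold the values through sfunc, then apply finalfunc (if any) to the last state."""
    state = initcond
    for v in values:
        if v is None:
            # Mirrors a STRICT transition function with a non-NULL initial state:
            # NULL inputs are simply skipped.
            continue
        state = sfunc(state, v)
    return finalfunc(state) if finalfunc is not None else state

# Hypothetical "geometric mean" aggregate: the state is (row_count, sum_of_logs).
def geo_sfunc(state, v):
    n, log_sum = state
    return (n + 1, log_sum + math.log(v))

def geo_final(state):
    n, log_sum = state
    return math.exp(log_sum / n) if n else None   # empty group -> NULL

print(run_aggregate([1.0, 4.0, None, 16.0], geo_sfunc, (0, 0.0), geo_final))  # about 4.0
```

    Splitting the work into a per-row step and a final step is what lets the server evaluate such an aggregate one row at a time, in any GROUP BY, without ever holding the whole group in memory.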

    Oracle is certainly one step above PGSQL in power--but of course that comes with a very hefty price tag. That price isn't just in licensing either--Oracle takes more time to administer and you also pay by losing flexibility, since enterprise systems based on Oracle better do things the "Oracle way" or you are inviting trouble (just like with Microsoft products, Oracle really pushes its single-vendor solutions).

    I have not played with Yukon/MSSQL 2005 yet, though I've heard a fair bit about it. From what I've heard it closes the gap a fair bit and comes much closer to PGSQL in terms of features and performance--it is supposed to handle locking/contention better and it has embraced .NET--meaning that you can write stored procs and functions in any .NET language. So, they are probably a pretty close match except in a couple of areas--PGSQL is free (libre and gratis), and PGSQL is not platform dependent. I think that the fact MSSQL only works on Windows is a major drawback when all its competitors offer products that run on Windows, Linux and various UNIX derivatives. Various "facts" notwithstanding, I still think that Windows servers are a greater administrative burden and more difficult to secure than other alternatives--perhaps the next server version after 2003 will have addressed that.

    1. Re:PostgreSQL vs MSSQL vs Oracle by Anonymous Coward · · Score: 1, Informative

      And Firebird. And lots of object databases. MVCC isn't all that exotic.

  22. Re:I doubt it by fanfriggintastic · · Score: 2, Informative

    That's 3 in America, actually. Two in NY, one in California.

    --
    This is not the greatest sig in the world, no. This is a tribute.
  23. Re:Hmmm. by oGMo · · Score: 4, Informative
    It would be years, if ever, before Postgres gets the kind of features that make Oracle a must have for many high end applications.

    Actually that's not remotely true. We're not talking about MySQL here. PostgreSQL is quickly gaining all the "high-end" features of Oracle: tablespaces, failover, replication, etc. In some cases, they aren't yet as fine-grained as Oracle. In other cases, they're superior. PostgreSQL is quickly coming into its own.

    On top of this, it's a lot less painful to work with, and the SQL featureset is far nicer. After having worked with them both on a daily basis, the only reason I'd willingly use Oracle is if I was working with terabytes of data and had lots and lots of money to throw at Oracle to make it work and support it. Which I don't. Like Sun is saying, this is unjustified for most people.

    --

    Don't think of it as a flame---it's more like an argument that does 3d6 fire damage

  24. Re:OS Integration is a great idea by cdwiegand · · Score: 3, Informative

    Thank goodness for SQLite, then. Just add the dll (or dependency for linux-folks) to your package and voila. This also assumes you're using some DB-abstraction layer, such as PDO, ADO, ADO.Net, etc...
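    cdwiegand names PDO, ADO and ADO.NET as examples of abstraction layers. A comparable illustration is Python's standard library, which ships an SQLite driver that follows the DB-API 2.0 interface (PEP 249). The sketch below assumes Python's built-in sqlite3 module; the table and values are hypothetical.

```python
import sqlite3

# SQLite runs inside the application's own process: no server to install or start.
# ":memory:" keeps the database in RAM; a file path would persist it instead.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE settings (name TEXT PRIMARY KEY, value TEXT)")

# "?" is sqlite3's DB-API parameter style (qmark); some other drivers use %s instead.
cur.executemany("INSERT INTO settings VALUES (?, ?)",
                [("theme", "dark"), ("lang", "en")])
conn.commit()

cur.execute("SELECT name, value FROM settings ORDER BY name")
rows = cur.fetchall()
print(rows)  # [('lang', 'en'), ('theme', 'dark')]
conn.close()
```

    Because the code sticks to DB-API calls (connect, cursor, execute, executemany, fetchall, commit), moving to another driver mostly means changing the connect() call and possibly the placeholder style.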

    --
    . Define sqrt(x) as something really evil like (x / rand()), and bury it deep. Watch your coworkers go nuts.
  25. Re:PostgreSQL vs MySQL by photon317 · · Score: 4, Informative


    The biggest one that has made a difference in my life lately:

    Table Partitioning:
    PostgreSQL > MySQL; Mainline PostgreSQL has table partitioning as of 8.1-beta, by leveraging inheritance (Postgres is an Object-Relational Database).

    Queries on the aggregate of the partitions are directed at the parent table, and the planner optimizes them to look only into the appropriate sub-tables by checking each sub-table's CHECK constraints against the query's WHERE clause.

    Basically, you do it like this (contrived, but related to how I'm using them at the moment):

    MyBigFatTable stores timestamped data from a bunch of machines at regular intervals, keying off of the machine id and the timestamp of the data:

    CREATE TABLE MyBigFatTable (
        machineid INTEGER REFERENCES machines(machineid),
        stamp TIMESTAMP,
        data_x FLOAT,
        data_y FLOAT,
        [... lots more data fields ...],
        PRIMARY KEY (machineid, stamp)
    );

    Your problem is, the table size grows and grows and grows unbounded, and database operations continue to get slower and slower (inserts, updates, and selects) as the table grows. You have a policy to expire the data after a month which limits the maximum growth, but this in turn requires lots of deletes happening all the time, which again hurts performance.

    The inheritance-based partitioning solution is to leave that table definition as it is, and also define:

    -- The hyphenated name must be double-quoted to be a valid identifier.
    CREATE TABLE "MyBigFatTable-2005-10-05" (
        PRIMARY KEY (machineid, stamp),
        FOREIGN KEY (machineid) REFERENCES machines(machineid),
        CHECK ( stamp >= '2005-10-05 00:00' AND stamp < '2005-10-06 00:00' )
    ) INHERITS (MyBigFatTable);

    As you can see, the column definitions are inherited, but you must re-specify the PK/FK constraints. The added CHECK clause says that only data from Oct 5, 2005 is valid in this subtable.

    You set up a maintenance script to create your new time-based tables ahead of time (say, once a day create the tables for the next day), and you do your data INSERTs into the specific subtable (you know the timestamp of the data you're inserting, so you can generate the appropriate table name from it, e.g. MyBigFatTable-2005-10-05).

    You run your SELECTs against the original MyBigFatTable just as you did before. It automatically includes any rows from its child tables. Further, if your SELECT's WHERE-clause was constraining a query to a specific time-range, only those children of MyBigFatTable whose CHECK constraint indicates they could possibly have relevant data are checked.

    And as for the problem of expiring data and the delete traffic you had before? You simply drop the old child tables with "DROP TABLE" from a maintenance script when they're a month old - no DELETEs necessary.
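    The pruning and expiry logic photon317 describes can be sketched in Python, assuming daily child tables named as in the post and a WHERE clause of the form lo <= stamp < hi. This only models the decision the planner makes (skip any child whose CHECK range cannot overlap the query range) and the list of tables a maintenance script would drop; it does not talk to a database, and the function names are hypothetical.

```python
from datetime import date, datetime, timedelta

def child_table_name(day: date) -> str:
    # Naming scheme from the post: MyBigFatTable-YYYY-MM-DD
    return f"MyBigFatTable-{day:%Y-%m-%d}"

def child_range(day: date):
    # Mirrors CHECK (stamp >= day 00:00 AND stamp < next day 00:00): a half-open interval.
    start = datetime(day.year, day.month, day.day)
    return start, start + timedelta(days=1)

def tables_to_scan(children, lo: datetime, hi: datetime):
    """Children whose CHECK range can overlap the query range lo <= stamp < hi."""
    scanned = []
    for day in children:
        start, end = child_range(day)
        if start < hi and lo < end:   # two half-open intervals overlap
            scanned.append(child_table_name(day))
    return scanned

def tables_to_drop(children, today: date, keep_days: int = 30):
    """Children older than the retention window: DROP TABLE these instead of DELETEing rows."""
    cutoff = today - timedelta(days=keep_days)
    return [child_table_name(day) for day in children if day < cutoff]

# Ten daily children, 2005-10-01 through 2005-10-10.
children = [date(2005, 10, 1) + timedelta(days=i) for i in range(10)]

print(tables_to_scan(children, datetime(2005, 10, 5, 12), datetime(2005, 10, 6, 6)))
# ['MyBigFatTable-2005-10-05', 'MyBigFatTable-2005-10-06']
print(tables_to_drop(children, today=date(2005, 11, 5)))
# the five children dated 2005-10-01 through 2005-10-05
```

    In real SQL, hyphenated names like these have to be double-quoted, so a script generating DROP TABLE statements from this list would need to quote them.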

    --
    11*43+456^2
  26. Re:Hmmm. by kpharmer · · Score: 5, Informative

    > Actually that's not remotely true. We're not talking about MySQL here.
    > PostgreSQL is quickly gaining all the "high-end" features of Oracle:
    > tablespaces, failover, replication, etc. In some cases, they aren't yet as
    > fine-grained as Oracle. In other cases, they're superior. PostgreSQL is quickly
    > coming into its own.

    Hmmm, as much as I like postgresql I don't see it that way:

    1. replication? it's most often used as a clunky way of implementing failover - yuck. In my large data architectures, replication is almost never used: it's almost always the worst solution to some problem.

    2. tablespaces? yep, they're good things to have. that's fine - i think oracle and db2 have supported them for around twenty years, so it's hardly high-end technology tho.

    3. failover? ok, this is critical - but there are also many different forms & flavors. I'm not familiar with what postgresql has so I won't comment - other than to say it needs to be rock-solid.

    ok, how about a few more:

    4. memory management: a high-end database should give you a ton of control over how memory is handled - especially when you plan to buy tons of it. Here the big databases allow you to assign different amounts of memory to different buffer pools, which are then assigned to different tablespaces. These bufferpools (caches) are how to easily ensure that hits against some tables or indexes occur 99% of the time from memory, and others 50% because they're so much larger. I'm pretty sure that neither postgresql nor mysql can do this.

    5. process management: in db2 your application writes to a buffer pool, an asynchronous agent picks up that change and writes it to a log file, another asynchronous agent picks it up and writes it to the table. This heavily-asynchronous behavior (and yes, with memory & processor tuning available for each agent type) allows you to maximize write-throughput. Postgresql and mysql are still in the slower synchronous world.

    6. parallelism: in mysql and postgresql all queries are single-threaded. In db2 and oracle a large query will actually split itself up into multiple sub-queries to maximize throughput for multiple cpus and storage arrays. This provides db2 & oracle with linear performance improvements up to 4-8 cpus. In other words, large queries that perform table scans can take advantage of SMP hardware for the commercial products - and cut down your query time by 75% on a 4-way compared to mysql and postgresql.

    7. partitioning: btree indexes only work for very selective queries - like when you want 1% or less of the data of a table. But for many queries you need to crunch 5, 10, or 15% of the data. That's where range partitioning comes in: you just scan the data you absolutely need to. So, while db2 or oracle are scanning 10% of the data, postgresql or mysql still have to scan 100% of the data. That would result in a 10x increase in speed over postgresql or mysql.

    that's just off the top of my head - given a little time this list would double.

    Postgresql is a fine tool, and it has all the technology that db2 or oracle had 12-15 years ago. And that's a cool achievement, and qualifies it for a ton of cool projects. Plus, with time it will catch up. But it still has a *long* way to go.
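    kpharmer's item 4 (separate buffer pools assigned to different tablespaces) can be illustrated with a toy model. The sketch below assumes plain LRU replacement and made-up table sizes; neither PostgreSQL nor DB2 uses exactly this policy, and the class and table names are hypothetical. It shows the effect he describes: with the same 200 pages of memory, a large sequential scan sharing one pool keeps evicting a small hot table, whereas giving the hot table its own pool keeps it cached.

```python
from collections import OrderedDict

class BufferPool:
    """Fixed-size LRU page cache (a simplification of a real buffer pool)."""

    def __init__(self, capacity_pages: int):
        self.capacity = capacity_pages
        self.pages = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, page_id) -> None:
        if page_id in self.pages:
            self.pages.move_to_end(page_id)      # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1                     # would be a disk read
            self.pages[page_id] = True
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)   # evict the least recently used page

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

def run_workload(pool_for) -> None:
    """pool_for maps a table name to the pool that caches its pages."""
    for _ in range(10):
        for p in range(50):      # small, hot lookup table, read repeatedly
            pool_for["lookup"].read(("lookup", p))
        for p in range(1000):    # large fact table, scanned sequentially
            pool_for["facts"].read(("facts", p))

shared = BufferPool(200)                        # one pool for everything
run_workload({"lookup": shared, "facts": shared})

hot, big = BufferPool(100), BufferPool(100)     # same memory, split per tablespace
run_workload({"lookup": hot, "facts": big})

print(shared.hit_ratio())          # 0.0: the scan flushes the hot pages every round
print(round(hot.hit_ratio(), 2))   # 0.9: only the first round misses
```

    Whether this matters much in practice is what nconway disputes in his reply, since PostgreSQL also relies on the kernel's page cache.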

  27. Re:Hmmm. by nconway · · Score: 4, Informative
    Here the big databases allow you to assign different amounts of memory to different buffer pools, which are then assigned to different tablespaces.


    Yeah, Postgres doesn't currently support this. IMHO it isn't that useful -- the performance improvement I'd expect would be pretty small (for one thing, all Postgres buffering is done in addition to the kernel's buffering, so the net impact will be smaller). It also adds a significant administrative burden -- you need to configure which objects go in which pools, as well as how large each pool is.

    5. process management: in db2 your application writes to a buffer pool, an asynchronous agent picks up that change and writes it to a log file, another asynchronous agent picks it up and writes it to the table. This heavily-asynchronous behavior (and yes, with memory & processor tuning available for each agent type) allows you to maximize write-throughput. Postgresql and mysql are still in the slower synchronous world.


    DB2 may well be better than Postgres here, but your explanation above doesn't make a lot of sense. In Postgres, a committing transaction only needs to wait for the WAL record describing the transaction to be flushed to disk (multiple transactions that commit concurrently can be flushed via a single fsync(2)). That is the only I/O that needs to be done synchronously -- the rest can be done async (notably, this includes the table I/O itself -- the modified buffers are just marked dirty in memory and are subsequently flushed to disk). Note that a backend may also need to wait for dirty pages to be flushed from the buffer pool if it is trying to replace a dirty page with a clean one, but (a) those flushes are done via write(2), so there is not necessarily a disk flush involved, and (b) the background writer in 8.0+ is intended to resolve this by ensuring that most of the work of flushing dirty pages is not done by a normal backend.
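    nconway's description of the commit path can be modelled in a few lines. This is a toy under stated assumptions, not PostgreSQL code: committing requires only that the WAL up to the transaction's own record be durable, one flush can cover several concurrently committing transactions, and modified table pages just stay dirty in memory until something else writes them. The class and method names are hypothetical.

```python
class ToyWal:
    """Toy model of a WAL-based commit path (hypothetical names, not PostgreSQL code)."""

    def __init__(self):
        self.records = []          # WAL records appended in memory
        self.durable_upto = 0      # how many records an fsync has made durable
        self.fsync_calls = 0
        self.dirty_pages = set()   # table pages changed in memory, written out later

    def log_commit(self, xid: int, page: str) -> int:
        """Modify a page in memory and append the transaction's WAL record; return its position."""
        self.dirty_pages.add(page)         # table I/O is deferred, not done at commit
        self.records.append((xid, page))
        return len(self.records)

    def wait_for_commit(self, position: int) -> None:
        """A committer only waits until the WAL up to its own record is on disk."""
        if self.durable_upto < position:
            self.fsync_calls += 1          # one fsync covers every record appended so far
            self.durable_upto = len(self.records)

    def background_write(self) -> int:
        """Stand-in for the background writer: write dirty pages outside the commit path."""
        written = len(self.dirty_pages)
        self.dirty_pages.clear()
        return written

wal = ToyWal()
# Three transactions append their records before any of them waits for the flush.
positions = [wal.log_commit(xid, f"page{xid}") for xid in (101, 102, 103)]
for pos in positions:
    wal.wait_for_commit(pos)

print(wal.fsync_calls)          # 1: the first flush made all three commits durable
print(len(wal.dirty_pages))     # 3: the table pages are still only dirty in memory
print(wal.background_write())   # 3
```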

    6. parallelism: in mysql and postgresql all queries are single-threaded. In db2 and oracle a large query will actually split itself up into multiple sub-queries to maximize throughput for multiple cpus and storage arrays. This provides db2 & oracle with linear performance improvements up to 4-8 cpus. In other words, large queries that perform table scans can take advantage of SMP hardware for the commercial products - and cut down your query time by 75% on a 4-way compared to mysql and postgresql.
    ... assuming your table scan is CPU-bound, which is almost certainly not the case. In practice, intra-query parallelism is useful for two scenarios that I can think of: creating large indexes, and OLAP workloads in which you are running a small number of concurrent queries, each of which is extremely expensive. In more normal OLTP circumstances, the number of concurrent clients far exceeds the number of CPUs in the box, so you don't need to parallelize within each query. Still, I agree this would be useful to have in some circumstances, although it's a bit difficult to see how to implement it reasonably.
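    The gap between kpharmer's 75% figure and nconway's objection comes down to where a scan spends its time. A crude model reproduces both positions. It assumes that I/O and CPU overlap perfectly and that parallel workers divide only the CPU work, so it deliberately ignores the multiple storage arrays kpharmer mentions. The numbers are made up.

```python
def scan_time(io_seconds: float, cpu_seconds: float, workers: int = 1) -> float:
    """Idealised scan time: I/O and CPU overlap fully, and CPU work splits evenly across workers."""
    return max(io_seconds, cpu_seconds / workers)

def time_saved_pct(io_seconds: float, cpu_seconds: float, workers: int) -> float:
    serial = scan_time(io_seconds, cpu_seconds, 1)
    parallel = scan_time(io_seconds, cpu_seconds, workers)
    return 100 * (1 - parallel / serial)

print(time_saved_pct(io_seconds=2, cpu_seconds=8, workers=4))   # 75.0: CPU-bound scan
print(time_saved_pct(io_seconds=10, cpu_seconds=4, workers=4))  # 0.0: I/O-bound scan
```

    Under this model, a 4-way split reaches kpharmer's 75% only when CPU is the bottleneck; when the disk is the bottleneck, as nconway expects, the extra CPUs buy nothing.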

    7. partitioning: btree indexes only work for very selective queries - like when you want 1% or less of the data of a table. But for many queries you need to crunch 5,10,or 15% of the data. That's where range partitioning comes in: you just scan the data you absolutely need to.


    PostgreSQL 8.1 (currently in beta) includes "constraint exclusion", which is essentially a primitive form of table partitioning (using inheritance and check constraints, you divide the data into tables with distinct check constraints; the optimizer has been improved to recognize when a child table can be omitted from the query plan by looking at the check constraints involved).