Slashdot Mirror


Database Clusters for the Masses

grugruto writes "Database clustering is no longer the privilege of a few high-end commercial databases; open-source solutions are striking back! ObjectWeb, an Apache-like group, has announced the availability of Clustered JDBC (or C-JDBC). C-JDBC is open-source software that implements a new concept called RAIDb (Redundant Array of Inexpensive Databases). It is simple: take a bunch of MySQL or PostgreSQL boxes, choose your RAIDb level (partitioning, replication, ...) and you obtain a scalable and fault-tolerant database cluster."

278 comments

  1. WOOHOO! by semifamous · · Score: 2, Funny

    Wow! Can you imagine a beowulf.... oh, wait, nevermind

    1. Re:WOOHOO! by phrantic · · Score: 1

      no it's a cluster of clusters...

      --
      --My sig is bigger than your sig--
  2. Non-Java Implementations? by the_quark · · Score: 4, Interesting

    Just started looking at the site. I've wanted this for years. I was ecstatic with what load-balancing cheap Apache boxes did for the cost of web hosting. Unfortunately, reliability has still required hundreds of thousands of dollars of high-end equipment and software for databases. I've been hoping the open-source community would make headway on this front.

    So, the question is - is anyone working on anything like this for Perl, C, or generic implementations?

    1. Re:Non-Java Implementations? by the_2nd_coming · · Score: 1

      I would think that as soon as someone does it in one language, a perl monk can fling out a 5 line mess in a few hours.

      --
      I am the Alpha and the Omega-3
    2. Re:Non-Java Implementations? by Etcetera · · Score: 3, Insightful


      Exactly -- given that the RAIDb itself sits elsewhere, I can't imagine it would be that hard to take the source itself and make a Perl DBD::Module out of it.

      If only I had the spare time... :/

    3. Re:Non-Java Implementations? by akadruid · · Score: 4, Insightful

      Unfortunately there is no free open source hardware available :)
      Seriously though, this may reduce the costs for some users but I don't think it will get a wide take up. Most people will not want to leave the deniability you can have with large corps like Oracle. Oracle is a 'safe' solution for the purchaser with their ass on the line, which is most corporate users these days.
      And the more entrepreneurial users will not usually have the hardware to use this properly anyway.
      Anyone who is financing this lot will want proven standards.
      Just my flawed £0.02

      --
      "Those who cast the votes decide nothing; those who count the votes decide everything." (attrib. Joseph Stalin)
    4. Re:Non-Java Implementations? by akadruid · · Score: 1

      5 line mess in a few hours
      A few hours for five lines?
      I thought I was slow!

      --
      "Those who cast the votes decide nothing; those who count the votes decide everything." (attrib. Joseph Stalin)
    5. Re:Non-Java Implementations? by lobsterGun · · Score: 1

      Thanks for posting this comment early. I was so afraid that the first comment I read on this subject would be a "imagine a beowolf clust.." comment that had been mod'd up as 'funny'.

    6. Re:Non-Java Implementations? by the_2nd_coming · · Score: 0

      a perl hacker can code a 5 line mess in a few hours.

      you seem to be slow. I was actually dissing perl because of its syntax (that is why I called it a mess), moron

      --
      I am the Alpha and the Omega-3
    7. Re:Non-Java Implementations? by palad1 · · Score: 3, Insightful
      So, the question is - is anyone working on anything like this for Perl, C, or generic implementations?

      Am I the only one a bit saddened by the fact that Sun botched it with Java so much that we now exclude Java from 'generic implementations'?

      Build once, run anywhere, riiiiight.

    8. Re:Non-Java Implementations? by akadruid · · Score: 1

      This is what I mean and why I don't use perl.

      I think perhaps the syntax of my sentence could have used some improvement too...

      --
      "Those who cast the votes decide nothing; those who count the votes decide everything." (attrib. Joseph Stalin)
    9. Re:Non-Java Implementations? by the_2nd_coming · · Score: 1

      are you sure you don't use perl :-)

      sorry I missed your meaning.

      --
      I am the Alpha and the Omega-3
    10. Re:Non-Java Implementations? by Anonymous Coward · · Score: 0

      Then let MySQL fund it. Then they can incorporate it into /. as a test bed, then somebody may use it for real data.

    11. Re:Non-Java Implementations? by The+AtomicPunk · · Score: 1

      He's talking about Perl. The fact that it's only 5 lines of code certainly does not mean it's not insanely complicated. :)

    12. Re:Non-Java Implementations? by the_quark · · Score: 4, Informative

      Not to argue in any way that Sun botched Java, but what I meant is, this implementation is for Java programs. It provides no functionality for programs not written in Java. Even if Sun had done Java correctly, my statement would still be true - this isn't a generic implementation, as it requires the code be written in Java. Even if Java itself were generic, this implementation wouldn't be generic, it'd be Java-specific.

      When I said "generic implementation" I meant "an implementation which doesn't require your programs be written in a particular language." Which is probably a bit of a pipe dream, you'd still need some sort of glue code (ODBC, JDBC, DBD, etc). But, as was alluded to above, I was trying to beat the Beowulf comment when I asked my question. ;)

    13. Re:Non-Java Implementations? by The+AtomicPunk · · Score: 1

      No, because I think everybody else understands that there's really no such thing as a 'generic implementation'. :)

    14. Re:Non-Java Implementations? by palad1 · · Score: 4, Interesting

      Please don't take my previous post as a flame, I completely agree with your point. What I was whining about was the fact that java doesn't play nice with system libs, as it is 'easy' to import other libs, but exporting java classes to other languages is ...
      Let's say that few people feel like embedding a JVM to their C app :)

    15. Re:Non-Java Implementations? by grugruto · · Score: 1

      The client (driver side) cannot be generic. It will always be application dependent.
      Therefore, you will always have to port (at least) the driver. But the controller itself (where the cluster logic is really implemented) just deals with SQL strings sent over sockets whatever the client is on the other side.
      Could be interesting to have an ODBC driver sending the requests to a Java C-JDBC controller.

    16. Re:Non-Java Implementations? by GT_Alias · · Score: 1

      Miss this ?

    17. Re:Non-Java Implementations? by caluml · · Score: 2

      C-JDBC 1.0alpha0? Unfortunately the very people that need large resilient clusters probably won't fancy giving alpha software a go. But it can only mature, and get better.

    18. Re:Non-Java Implementations? by tha_mink · · Score: 1

      The fact that it's only 5 lines of code certainly does not mean it's not insanely complicated.

      No, the fact that it's *perl* means it's overly complicated. (as in explosion in the punctuation factory)

      --
      You'll have that sometimes...
    19. Re:Non-Java Implementations? by DonkeyJimmy · · Score: 2, Funny

      Exactly -- given that the RAIDb itself sits elsewhere, I can't imagine it would be that hard to take the source itself and make a Perl DBD::Module out of it.

      You don't have a very good imagination.

      --
      "Probably the toughest time in anyone's life is when you have to murder a loved one because they're the devil." -Philips
    20. Re:Non-Java Implementations? by Thoguth · · Score: 1

      It may be a hackish workaround, but I know php has experimental support to load Java classes, and I believe Python can as well. For someone with the patience and willingness to work with marginally-supported or unsupported tools, you can already do this. (but it's probably got a ways to go before companies start using it to make their cheap hosting better)

      --
      The requested URL /iframe/sig.html was not found on this server.
    21. Re:Non-Java Implementations? by Coz · · Score: 1

      What, you want a glue factory replacing your "native" ODBC library? Given how db-specifics still seem to creep into ODBC, I don't blame these guys for tackling a solvable problem. JDBC has the advantage that it was constructed from the ground up to be actually used, and subjected to a lot of prototyping and early scrutiny, as opposed to ODBC, which has parts that look like they got put in by vendor reps going "Heh heh heh - they'll never figure out how to use this!"

      --
      I love vegetarians - some of my favorite foods are vegetarians.
    22. Re:Non-Java Implementations? by necio_online · · Score: 1

      You can use the Linux Virtual Server to cluster databases. If there are mostly reads, it will be rather easy. MySQL has built-in replication.

      http://www.linuxvirtualserver.org

      It's not that transparent, though.

      --
      http://arhuaco.org/
    23. Re:Non-Java Implementations? by antis0c · · Score: 1

      Actually it just requires anything that can use JDBC. I could for example, use JNI or the opposite of JNI (can't think of the name) where you link to Java code from C to initiate calls to the JDBC driver for the cluster. For example within PHP I can create Java classes using the Java extension, so there is no reason I couldn't use PHP to connect to this cluster. Same with Perl, and just about any other language.

      Granted, I would have liked to see a more generic implementation but what's it going to be generic in? C? You'd still need to write an interface for it anyway from every other language, it just so happens they wrote it in Java so everyone gets all up in arms about it.

      --

      ..There's a-dooin's a-transpirin'
    24. Re:Non-Java Implementations? by Corpus_Callosum · · Score: 1

      For a short term solution, one could always write a Perl, C or *whatever* binding that talks to the Java libraries that are handling the database(s), thereby buying "procrastination time" w/respect to a full re-implementation in other languages...

      --
      The reason that it can be true that 1+1 > 2 is that very peculiar nonzero value of the + operator
    25. Re:Non-Java Implementations? by Matt2000 · · Score: 2, Interesting

      I don't know if you're still writing applets, but Java works. And there's a reason this DB RAID was written for Java and not for the other languages you mentioned, because Java works.

      You should check out some of the Java technologies post 1999, they're entrusted with a lot of sensitive computing nowadays.

    26. Re:Non-Java Implementations? by buckinm · · Score: 1

      Let's say that few people feel like embedding a JVM to their C app :)

      It's not really all that hard to do, especially if you use some sort of code generator to do it. I have a java app that acts as a service. Although most of my clients are java based, I have a few win 95 clients that I have to support also. I'd use xml, but on the boxes involved, it's too much overhead. So- I wrote a code generator that created C++ proxy classes for the java ones. Works great.

      --
      This isn't any ordinary darkness. It's advanced darkness.
    27. Re:Non-Java Implementations? by StandardDeviant · · Score: 1

      Hmm... this would be kind of a hack, but you could write a "query pass-through" java program. In other words, run a java program as a daemon that accepted queries on a given port and returned result sets. Or it could cache the queries and corresponding result sets, to save execution time. You could even get really fancy and have a pagination mechanism that would allow different "reader" clients to page through large result sets on a per-reader-client-thread basis.

      Just a thought. You could dedicate one machine to the redirector and N many machines to the RAIDb, much like putting a cisco local director in front of a cluster of web servers.
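
A minimal Python sketch of the caching half of that pass-through idea. The class and method names are hypothetical, an in-memory SQLite database stands in for the RAIDb backend, and the network listener is left out entirely. The invalidation policy (drop every cached result on any write) is an assumption chosen for simplicity, not something the comment specifies.

```python
import sqlite3


class CachingPassThrough:
    """Toy query pass-through: caches result sets, forwards writes."""

    def __init__(self, conn):
        self.conn = conn          # stand-in for the real backend connection
        self.cache = {}           # (sql, params) -> list of rows
        self.backend_hits = 0     # how many reads actually reached the backend

    def query(self, sql, params=()):
        key = (sql, tuple(params))
        if key not in self.cache:
            # Cache miss: run the query on the backend and remember the rows.
            self.backend_hits += 1
            self.cache[key] = self.conn.execute(sql, params).fetchall()
        return self.cache[key]

    def execute_write(self, sql, params=()):
        # Writes always go to the backend; cached reads may now be stale,
        # so the whole cache is dropped.
        self.conn.execute(sql, params)
        self.conn.commit()
        self.cache.clear()
```

Repeating the same SELECT twice only touches the backend once; any write in between forces the next read back to the database.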

    28. Re:Non-Java Implementations? by perlchild · · Score: 2, Insightful

      You forgot the replication and transactional aspects of it...
      What happens if a transaction fails on one member of a cluster, but not another, do you report success or failure?
      That's the problem with using this kind of proxy architecture: once you "commit" a transaction on server 1, if it fails on server 2, how do you roll back server 1? You can't... it's already committed...
      (I won't go into the atomicity of how you would roll back a committed, non-atomic change because another server failed, to keep them in check, nor how that might mean you might have to stop accepting transactions until the discrepancy is resolved)

      None of this is covered by LVS, which is a fine product, it just doesn't apply to the right area of the problem (there's more to database clustering than connection redundancy).
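
One classic answer to the "server 1 already committed" problem is two-phase commit: every server first votes on whether it can apply the change, and nobody commits until all have voted yes. The sketch below is a toy in-memory version in Python. The `Participant` class and `two_phase_commit` function are hypothetical names, nothing here is taken from C-JDBC or LVS, and it ignores the hard part (a server or the coordinator crashing between the two phases).

```python
class Participant:
    """Toy database node that can stage a change before committing it."""

    def __init__(self, name, fail_on_prepare=False):
        self.name = name
        self.fail_on_prepare = fail_on_prepare  # simulate a node that cannot apply the change
        self.committed = {}                     # durable state
        self._pending = None                    # staged, not yet visible

    def prepare(self, change):
        if self.fail_on_prepare:
            return False                        # vote "no"
        self._pending = dict(change)
        return True                             # vote "yes"

    def commit(self):
        self.committed.update(self._pending)
        self._pending = None

    def rollback(self):
        self._pending = None


def two_phase_commit(participants, change):
    """Return True if the change was committed everywhere, False if nowhere."""
    prepared = []
    # Phase 1: ask every node to stage the change and vote.
    for p in participants:
        if p.prepare(change):
            prepared.append(p)
        else:
            # One "no" vote aborts everything that was staged so far.
            for q in prepared:
                q.rollback()
            return False
    # Phase 2: all voted yes, so every node commits.
    for p in participants:
        p.commit()
    return True
```

With this shape the answer to "do you report success or failure?" is: success only if every node prepared, and a failure on server 2 is caught before server 1 has made anything permanent.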

    29. Re:Non-Java Implementations? by KingRamsis · · Score: 1

      actually there is open source free hardware
      my informative 0.02 cents :-)

    30. Re:Non-Java Implementations? by danpbrowning · · Score: 1

      Tim Bunce already did. Several years ago. It's called DBD::Multiplex.

      --
      Daniel
    31. Re:Non-Java Implementations? by moderators_are_w*nke · · Score: 1

      How hard would it be to wrap it in CORBA and call it from anywhere? Not too hard methinks.

      --
      "XML is like violence. If it doesn't solve your problem, use more." - Anonymous Coward
    32. Re:Non-Java Implementations? by chrisback · · Score: 1

      Unfortunately, this probably is not feasible at the JDBC, ODBC, DBD level for a database cluster that handles reads and writes simultaneously. Imagine for example you have a DBD interface that behaves similarly to the C-JDBC tool and they are both writing at the same time to the same table on a cluster of DBs. Because the two access methods do not know anything about each other, there can be no guarantee that writes occur in the same order across all databases in the cluster. Auto-increment columns become worse than useless. Updates are even worse.
      What somebody really needs to do is strip away all parts of the database server except that which receives client requests. This code could then be modified to forward those requests on to a cluster of normally functioning servers. The benefit here is client side code for all languages remains the same. Any application that can connect to the original database server can also connect to this proxy app. The hard part is setting up load balancing, fail-over, etc. Fortunately, the code to do this is available in Java. A developer savvy in both Java and the native language used by the database software developers could probably easily accomplish this.
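
The auto-increment problem described above is easy to reproduce. In this small Python sketch, two in-memory SQLite databases stand in for two replicas, and the two lists stand in for the arrival order of inserts from two uncoordinated drivers. The table and helper names are made up for illustration.

```python
import sqlite3


def make_replica():
    """Create one empty replica with an auto-increment key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)"
    )
    return conn


def apply_in_order(conn, names):
    """Apply inserts in the order they happened to arrive at this replica."""
    for name in names:
        conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    conn.commit()
    return conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()


# Same two writes, but each replica saw them in a different order.
replica_1 = apply_in_order(make_replica(), ["alice", "bob"])
replica_2 = apply_in_order(make_replica(), ["bob", "alice"])
```

Both replicas hold the same two names, but `alice` has id 1 on one and id 2 on the other, so any later statement keyed on `id` touches different rows depending on which replica runs it.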

    33. Re:Non-Java Implementations? by Anonymous Coward · · Score: 0

      Yeah, Ruby can do the same thing too. I suspect that all the fairly mature scripting languages (Perl, Python, Ruby, etc) can.

    34. Re:Non-Java Implementations? by noelbk · · Score: 1

      Yes! Check out www.hotswap.net. We're developing process replication and failover at the OS level, so any program (C, Java, Python, whatever) can fail over transparently. So far we've tested Perl, Python, and PostgreSQL.

  3. hmmm by the_2nd_coming · · Score: 4, Interesting

    now if only MySQL or PostgreSQL can get the reputation that Oracle has, maybe we will start to see Oracle DBs go away in favor of the cheaper solutions using RAIDb

    --
    I am the Alpha and the Omega-3
    1. Re:hmmm by zsmooth · · Score: 1

      It's not just a matter of reputation - MySQL and Postgres, as impressive as they are, are still nowhere close to Oracle in terms of features. Yes, most of those features may be high-end, but they're still features people look for. One example: RMAN.

    2. Re:hmmm by poot_rootbeer · · Score: 1

      now if only MySQL or PostgreSQL can get the reputation that Oracle has

      You mean 'being run by a privacy-hating megalomaniac like Larry Ellison'?

      Open source RDBMS's are good solutions for many, perhaps even most, problems. But there are still some situations where I'd want to stick with Oracle's strength and maturity and not take chances.

    3. Re:hmmm by Sxooter · · Score: 2, Insightful

      Interesting point. I find that there are several views when it comes to OS databases.

      One is that since most open source databases lack some feature, they will never replace any Oracle servers. Most of the people who believe this also believe that Oracle servers are always used in high parallel load transactional systems that have to be up 24/7 and never go down. While plenty of sites that need that use oracle, it is not inversely always true. Many places put Oracle online because it's what their developers know and love, not because it's the best fit for the problem.

      The next view is that Open Source databases are ready to replace Oracle right now, everywhere. While there are plenty of places using Oracle that could switch to Pgsql/MySQL/Firebird right now, there are plenty more that couldn't dream of it. It's all about what you're doing with your database that defines which ones you can use.

      The final view is the right tool for the job view. These folks are rare. They've actually loaded test datasets into various database engines, read up on how each db's locking mechanism works, examined each to see where the best fit is.

      People relying on the first two views are treating computer science like a religion instead of a science.

      --

      --- It is not the things we do which we regret the most, but the things which we don't do.
    4. Re:hmmm by Anonymous Coward · · Score: 0

      is there a branch of science that hasn't become a "religion" these days?

    5. Re:hmmm by quantum+bit · · Score: 1

      Open source RDBMS's are good solutions for many, perhaps even most, problems. But there are still some situations where I'd want to stick with Oracle's strength and maturity and not take chances.

      PostgreSQL isn't mature? It's a direct descendant of Ingres, the original relational database. Ingres was written in 1977 at Berkeley. Bob Miner, Ed Oates, and Bruce Scott saw the commercial potential of RDBMS and founded a company later in 1977 called Software Development Laboratories. Larry Ellison joined up with them several months later.

      It wasn't until two years later in 1979 that the first version of Oracle was released (SDL had since changed its name to Relational Software Inc.).

      In 1983, Relational Software changed its company name to Oracle.

      The funny part is that Berkeley UNIX (i.e. BSD) started out as a modification to AT&T UNIX to provide a better OS to run Ingres on...

    6. Re:hmmm by leandrod · · Score: 1
      > Berkeley UNIX (i.e. BSD) started out as a modification to AT&T UNIX to provide a better OS to run Ingres

      That would be delightful! Do you have any references at hand?

      --
      Leandro Guimarães Faria Corcete DUTRA
      DA, DBA, SysAdmin, Data Modeller
      GNU Project, Debian GNU/Lin
    7. Re:hmmm by quantum+bit · · Score: 1

      That would be delightful! Do you have any references at hand?

      My main source was this page that seems to ramble on about various database-related things for a while.

      http://www.cs.uiuc.edu/news/alumni/win97/abbasi.html

      For a more authoritative source, here's a chapter from an O'Reilly book that talks about the evolution of the BSD line of UNIX:

      http://www.oreilly.com/catalog/opensources/book/kirkmck.html

      There's a paragraph near the beginning talking about the Ingres project and how it was one of the first heavy users (and modifiers) of UNIX at Berkeley. Interestingly, that particular article dates the Ingres project back to 1974, considerably earlier than I had previously thought...

    8. Re:hmmm by leandrod · · Score: 1

      Thank you! That was nice. I hope someday someone writes this history in full. Today's world is lacking memory...

      --
      Leandro Guimarães Faria Corcete DUTRA
      DA, DBA, SysAdmin, Data Modeller
      GNU Project, Debian GNU/Lin
  4. Re:Nothing beats Oracle RAC by Anonymous Coward · · Score: 1, Funny

    Is RAC the new name for OPS? If not, how do they differ?

  5. If only replication was so trivial by marcink1234 · · Score: 4, Insightful

    Running many databases is easy. Organizing and serializing replication is hard, even if one has distributed transactions handy, which is not the case here. But let's read their code...

    1. Re:If only replication was so trivial by Anonymous Coward · · Score: 1, Informative

      Good point. The sooner people realize this, the better. Running SQL Server in partition mode doesn't provide any failover. For that matter, running any database in partition mode without replication provides no failover.

  6. That's nice... by Usquebaugh · · Score: 0, Offtopic

    ...but what about my desktop? I want to run my company on xterms connected to a large cluster. Problem: nobody has yet provided High Availability for Linux or X. Not your heartbeat type stuff or MOSIX etc., but a cluster where I can hotswap entire machines and nobody loses uptime. Bits and pieces are there but not everything.

    Yes I submitted an Ask /. but it's still waiting to be posted, only been there since Feb.

    1. Re:That's nice... by the_2nd_coming · · Score: 0, Redundant

      it is called a Beowulf cluster.

      --
      I am the Alpha and the Omega-3
    2. Re:That's nice... by Usquebaugh · · Score: 1

      No,
      a Beowulf will just allow you to disperse workload. Either at the process or at the thread with MOSIX.

      In fact a HPC like MOSIX can result in reduced uptimes. If a machine has a 1% failure rate then what is the failure rate of 100 machines in parallel?

      The question all boils down to how granular your recovery process is. For a desktop you need very fine granularity and few current systems provide this. I think Tandem provides this by using special hardware and kernel patches for NT.
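
To put a number on that question: assuming the machines fail independently, and reading "1% failure rate" as the chance that any one machine fails during some fixed period, the chance that at least one of n machines fails is 1 - (1 - p)^n. A short Python sketch (the function name is just for illustration):

```python
def prob_any_failure(p_single: float, n: int) -> float:
    """Probability that at least one of n independent machines fails,
    given each fails with probability p_single over the same period."""
    return 1.0 - (1.0 - p_single) ** n


# 100 machines, each with a 1% chance of failing in the period.
p_cluster = prob_any_failure(0.01, 100)
```

For p = 0.01 and n = 100 this comes out to about 0.634, so a cluster that goes down whenever any single node goes down is far less reliable than one machine. That is why recovery granularity matters more than node count.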

    3. Re:That's nice... by Anonymous Coward · · Score: 0

      Check apple OSX Server NetBoot...

    4. Re:That's nice... by the_2nd_coming · · Score: 1

      hmm...thanks for the info

      --
      I am the Alpha and the Omega-3
    5. Re:That's nice... by Anonymous Coward · · Score: 0

      From the beowulf faq (http://www.canonical.org/~kragen/beowulf-faq.txt)

      "Some Linux clusters are built for reliability instead of speed. These are not Beowulfs."

      This article is about reliability so it seems to me all the mentions of beowulf are bogus.

  7. Performance? by deranged+unix+nut · · Score: 5, Interesting

    Hmm, interesting idea. I didn't see performance listed as a feature.

    I wonder how much slower my query will be when the data is spread across several machines. I'd imagine that a few complex queries that aren't correctly optimized would bring this system to its knees rather quickly.

    1. Re:Performance? by jsin · · Score: 5, Informative

      Database clustering is typically used for high availability, not performance.

      There are better ways to improve the performance of a database: horizontal partitioning, federated servers, etc.

      This would be very cool if there was a generic implementation; we build many Microsoft SQL clusters and just the hardware requirements for an MSCS cluster easily exceed $50k, let alone the licensing...as an MCDBA I'd consider an open source solution if I could use it as a back-end to an ASP/VB.NET application, just to save the licensing $$ for consulting! ; )

    2. Re:Performance? by Anonymous Coward · · Score: 0, Troll

      God you're an idiot.

      HA Oracle implementations are for *BOTH* performance and HA. All systems can be in use at the same time, so why have the hardware go to waste?

      I get fucking sick of all these armchair sysadmins on slashdot.

    3. Re:Performance? by Anonymous Coward · · Score: 2, Informative

      C-JDBC can handle more than just full partitioning or replication, it also provides partial replication (a little bit like you would use RAID-5 with disks).
      The idea is that with full replication you have to broadcast the write to all databases (to be consistent) and you can only balance the reads. By controlling the replication of each database table, you can have scalable performance. Look also at the nested RAIDb levels, it's pretty cool to build large configurations.
      Some tests have been done with TPC-W and performance scales linearly up to 6 nodes (we did not have a larger cluster to test bigger configurations).
      Sure it will not replace very large Oracle configurations at the end of the year, but it looks very promising.
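
The full-replication behaviour described above (broadcast every write, balance only the reads) can be sketched in a few lines of Python. This is not C-JDBC's code or API: the class name is made up, the backends are in-memory SQLite databases, and round-robin is just one possible read-balancing policy.

```python
import itertools
import sqlite3


class FullReplicationCluster:
    """Toy 'read one, write all' scheduler over in-memory SQLite backends."""

    def __init__(self, n_backends):
        self.backends = [sqlite3.connect(":memory:") for _ in range(n_backends)]
        self._next = itertools.cycle(range(n_backends))
        self.reads_served = [0] * n_backends  # per-backend read counter

    def write(self, sql, params=()):
        # Every write goes to every backend so the replicas stay identical.
        for conn in self.backends:
            conn.execute(sql, params)
            conn.commit()

    def read(self, sql, params=()):
        # Reads are balanced round-robin: only one backend does the work.
        i = next(self._next)
        self.reads_served[i] += 1
        return self.backends[i].execute(sql, params).fetchall()
```

Adding backends spreads the reads thinner, but each write still costs one operation per backend, which is the motivation for replicating only some tables on some nodes.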

    4. Re:Performance? by Glock27 · · Score: 1
      I wonder how much slower my query will be when the data is spread across several machines. I'd imagine that a few complex queries that aren't correctly optimized would bring this system to its knees rather quickly.

      Total read query throughput will scale with the number of machines in the cluster, given (from the website):

      "The database is distributed and replicated among several nodes and C-JDBC load balance the queries between these nodes."

      For writes, the data must go to every machine replicating the data. It's possible this happens using broadcast TCP/IP or some such technology, so performance might not suffer much there either.

      Very interesting stuff!

      --
      Galileo: "The Earth revolves around the Sun!"
      Score: -1 100% Flamebait
    5. Re:Performance? by jsin · · Score: 1

      Hey, at least I can read the fucking English language:

      "Database clustering is typically used for high-avaliability, not performance"

      You may have noticed a word that we in the literate world like to refer to as "typically". For those of you who are unfamiliar, this word is not the same as "always" or "absolutely", but instead is used to reference average or common situations.

      Before you freak out and start name-calling maybe you should learn to fucking read...

      ..and by the way, maybe post useful information instead of spiteful waste of time and bytes (or at least have the balls to put your name on your comments).

  8. This is a threat to the big vendors by Jack+William+Bell · · Score: 4, Insightful

    This is a major threat to the big vendors. In fact I would say it is even more of a threat to Oracle than it is to MS! After all MS can continue to go after the midrange market that is already locked into them for the OS.

    But Oracle shops are dealing with expensive boxes they would love to replace, not to mention expensive Oracle licenses. Often the only reason they use Oracle (other than Oracle salesmen licking their buttholes) is because only Oracle has the horsepower to meet their requirements. Give them a cheaper alternative with the same capabilities and they will bail out faster than you can say 'Geronimo'.

    Expect Larry Ellison to start talking about the dangers of using Open Source software now...

    --
    - -
    Are you an SF Fan? Are you a Tru-Fan?
    1. Re:This is a threat to the big vendors by Lordrashmi · · Score: 2

      People will always buy Oracle because "No one ever got fired for choosing Oracle". If something goes wrong, you always have someone to blame. With open source, your job is more on the line because you have to take responsibility.

      We were using MySQL and it was working fine but somewhere along the line some Oracle salesman convinced someone that Oracle was better and we switched. I have seen some minor good things, but not an assload of $ worth.

    2. Re:This is a threat to the big vendors by Anonymous Coward · · Score: 0

      Hi I would like to call "BS" on your load of mindless drivel.

      Yes, Oracle is expensive. Very expensive. You get what you pay for.

      Now, please prove to me that this new "threat" has all of the features of Oracle and can perform as well if not better.

    3. Re:This is a threat to the big vendors by stu-pendous · · Score: 1

      The boxes do not have to be expensive; Oracle does run on Linux. My firm is looking at RAC on multiple intel boxes running Red Hat.

    4. Re:This is a threat to the big vendors by DavidpFitz · · Score: 3, Informative
      Give them a cheaper alternative with the same capabilities and they will bail out faster than you can say 'Geronimo'.
      But there isn't anything close to Oracle when it comes to availability/reliability etc. And even if there were, IT managers would not go for it for some years because it's not proven in the enterprise. Oracle is so embedded into management brains, and its reputation is well deserved.

      If you want to cluster Oracle, use Oracle RAC (Real Application Clusters). It's based on Parallel Server so is mature enough to put forward for consideration... and even then it might be eschewed from above. Cheap databases are not going to ring the bells of the people with the say-so simply because Oracle (and DB2 etc) are proven over the years, and the cost of losing your data because you went for the cheap option is going to lose your company a lot of money, and you your job!

      Technically better, cheaper and all those good things does not mean better for a business. Databases are predominantly used for *business*, and as such a *business* reason is used when choosing one over another, not technical reasons.

    5. Re:This is a threat to the big vendors by FortKnox · · Score: 3, Insightful

      I have to say this is a major point. This is why you don't see people using open source. If my DB goes down, I call up Oracle, and make them bring someone down here to fix the problem. If my open source DB goes down, I crap my pants and hope to keep my job.

      What does proprietary software have that Open Source doesn't? Insurance.

      The best way to knock over oracle is to start up a company that supports open source for a fee (which is cheaper than running oracle for a year).

      --
      Good quote, too many chars. Seriously, the slashdot 120 char limit sucks!
    6. Re:This is a threat to the big vendors by akadruid · · Score: 1

      It's not your assload of $, it is your job though. And that's why people will buy Oracle. In a way you are getting what you pay for - deniability up the chain of command. And as long as it goes high enough then no-one cares about the cost. Their bonuses are more than that!!

      --
      "Those who cast the votes decide nothing; those who count the votes decide everything." (attrib. Joseph Stalin)
    7. Re:This is a threat to the big vendors by mangu · · Score: 3, Insightful
      please prove to me that this new "threat" has all of the features of Oracle


      That's exactly the point. Who needs all the features of Oracle? Maybe the IRS or Mastercard, but the vast majority of Oracle users are getting just one feature: the Oracle reputation that their marketing has built.


      And with all those features comes the big problem of managing them: no matter how small the application is, once you choose Oracle you need a team of experienced DBAs to correctly and reliably configure the system.

    8. Re:This is a threat to the big vendors by the_2nd_coming · · Score: 2, Informative

      MySQL AB and a few Postgre companies.

      they do consultant work for their products.

      --
      I am the Alpha and the Omega-3
    9. Re:This is a threat to the big vendors by Anonymous Coward · · Score: 0

      I think you need to come into the real world son and see how many places run Oracle.

      I couldn't agree more: Why buy Oracle if you don't need it?

      But this begs the question: Why do companies purchase Oracle and not OSS software if the OSS software will fit the bill? Maybe, just maybe, the answer is "it won't"

    10. Re:This is a threat to the big vendors by RocketScientist · · Score: 1

      Given the level of help I've typically gotten from Microsoft, I'd prefer to use real insurance: backups. Lots of backups. If your only recourse in the event of a database going down is to call the vendor, you'd best just start working on the resume now.

      And I can back up a MySQL database and offsite/onsite copy the tapes as necessary, just like SQL Server or Oracle. Generally I can start a server rebuild/restore in less than the time it takes to give some level one tech support asshat my phone number for the tenth time anyway. I've got verified documentation for setting up the server, and a good stream of verified backups. If you don't have that level of documentation and tape, you're just hosed anyway.

    11. Re:This is a threat to the big vendors by Anonymous Coward · · Score: 0

      But who is responsible for broke code? Surely not the consultants.

    12. Re:This is a threat to the big vendors by zsmooth · · Score: 1

      Until someone can come up with an open source solution even vaguely resembling RMAN, Oracle has nothing to worry about.

    13. Re:This is a threat to the big vendors by Anonymous Coward · · Score: 0
      But this begs the question: Why do companies purchase Oracle and not OSS software if the OSS software will fit the bill? Maybe, just maybe, the answer is "it won't"

      "It won't" is the answer in some cases, but in many cases it's a combination of ignorance of the state of OSS DBs, the "feel good" factor of running Oracle and Oracle's effective sales team. I've worked at one company that switched from MySQL to Oracle simply because someone high up was dazzled by Oracle's sales pitch. The database we were running was extremely trivial and Oracle gained us nothing but headaches and huge bills. I know for a fact that there are several sites out there running Oracle which could switch to PostgreSQL with no loss in performance or features.

    14. Re:This is a threat to the big vendors by mobiGeek · · Score: 2, Informative
      If my open source DB goes down, I crap my pants and hope to keep my job.
      Oh please! If you throw 1/2 as much money at one of a number of support organizations, you'd be at least as assured of the same uptime and probably able to push for enhancements that you won't get otherwise.
      --

      ...Beware the IDEs of Microsoft...

    15. Re:This is a threat to the big vendors by AlecC · · Score: 2, Insightful

      The best way to knock over oracle is to start up a company that supports open source for a fee (which is cheaper than running oracle for a year).

      Which is exactly what MySQL AB does for MySQL. Their support is not particularly cheap (though I bet that it is a lot less than Oracle's), but I recommend it highly. The original designers are still leading the development/support team (is that true for many of the alternatives?) and make a living *only* because of their superior product, not because some salesman conned the management.

      (You may gather that I am a fan of MySQL).

      --
      Consciousness is an illusion caused by an excess of self consciousness.
    16. Re:This is a threat to the big vendors by haystor · · Score: 1

      I just can't believe slashdot these days.

      This is business. Take $10k and see what you can get with Oracle vs free databases.

      The proper comparison for the price will be clustered Postgres vs a lone Oracle server (neither with support).

      How do they compare at this level?

      This clustered solution doesn't have to compare to the best solution ever by anyone. It just has to compare favorably against those products in its price range.

      Moving to Postgres from Oracle would be asking someone to accept more risk in return for thousands and thousands of dollars. For some companies that's the difference between being a 3 man shop and a 4 man shop.

      The business world existed and got along quite nicely with paper records for quite some time. Paper records got lost all the time and business went along nicely. The same can be said today: if we were to lose 8-24 hours' worth of data it would be bad, but not catastrophic. Insurance against such an event would probably cost a hell of a lot less than Oracle licensing.

      Not every database needs to be 12TB and accessible by 2 million users 24/7.

      The implicit argument for Oracle is that cost is no matter. Well then, I suggest you hire 12 people to each independently carve your data into stone as data loss there would be minimal.

      --
      t
    17. Re:This is a threat to the big vendors by valisk · · Score: 5, Insightful
      People will always buy Oracle because "No one ever got fired for choosing Oracle". If something goes wrong, you always have someone to blame. With open source, your job is more on the line because you have to take responsibility.

      Prior to Oracle taking off in a big way people used to say:

      People will always buy IBM because "No one ever got fired for choosing IBM". If something goes wrong, you always have someone to blame. With the Seven Dwarfs (the common name for IBM's competitors back then), your job is more on the line because you have to take responsibility.

      Then Larry E. shamelessly put together a cool SQL database which copied every major innovation IBM had made and added in a few more for good measure. He also cut the price by a third. IBM's database customers deserted in droves; after all, if this Oracle thing turned out to be shit, they could always get IBM to come clean up the mess. It turned out, though, that Oracle wasn't and isn't shit.

      That does not mean that Oracle is immortal and will always be top of the pile. Postgres now replicates almost all of the major features and is proven in the reliability stakes, and tools like this are only going to make it more likely that corporate data departments will dip their toes into the Free software waters; after all, if it turns out to be shit, they could always get Oracle to come clean up the mess.

      --

      Economic Left/Right: -0.62
      Social Libertarian/Authoritarian: -3.69
    18. Re:This is a threat to the big vendors by leviramsey · · Score: 4, Informative

      Josh, know what you're talking about before you post. MySQL (the company which does the vast majority of development of MySQL) offers a variety of levels of support and consulting, regardless of the number of systems that you admin. For $48,000/year, you get:

      • Access to the entire development team 24x7x365, with a guaranteed response within 30 minutes
      • Ability to request developers by name
      • Just about every issue is supported (from APIs to configuration to OS, kernel, library, and filesystem dependencies to custom compiles, to recovery, to tuning and so on)

      Does Oracle match that for the price?

    19. Re:This is a threat to the big vendors by rob_from_ca · · Score: 1

      Yeah. Who needs all those Oracle "features". Like, you know, data integrity. All that money, just so you can keep your data from being corrupted...what a waste! Who needs stored procedures, triggers, rock-solid backup/restores, and referential integrity?

    20. Re:This is a threat to the big vendors by grassy_knoll · · Score: 1

      You might be surprised that most of our clients need Oracle. Our accounting system (for instance) would grind to a halt without range partitioning (damn near did... s'why we reconfigured the db to use it).

      As to smaller canned applications that require Oracle, we combined many of their databases into a single Oracle instance.

    21. Re:This is a threat to the big vendors by Lordrashmi · · Score: 1

      In the long term maybe, but in the short term the big vendors still win.

    22. Re:This is a threat to the big vendors by davidkw · · Score: 1

      This is not a threat to Oracle. Oracle released 9i 18 months ago, with FULL support for Red Hat Linux, on Intel boxes. Oracle's new paradigm is cheap equipment on Linux OS, with up to 64 nodes in the cluster. This is scalable, highly available and the OS/hardware are cheap (about 2.5K/node). No. Oracle already read this one... I'll still choose Oracle, but with Linux as the OS and Intel as the hardware and keep my job.

      --
      DKW
    23. Re:This is a threat to the big vendors by valisk · · Score: 1
      Maybe, but one day they will suffer from the same effect IBM did, and see half their customers disappear overnight after watching a few 'headline' customers try out the alternatives and realise that they were capable of doing the job.

      When you can hire a decent DB admin who has experience with both types of RDBMS and can write a script to convert all that pretty Oracle-specific SQL into ANSI-92 compliant SQL, and feed it into whatever server you like without shovelling hundreds of thousands of dollars into Larry's/IBM's/Msoft's pockets for processor licenses plus per-user deals etc., it begins to look like a seriously tempting proposal, even if only for a small, largely unimportant database like the internal telephone number directory.

      --

      Economic Left/Right: -0.62
      Social Libertarian/Authoritarian: -3.69
    24. Re:This is a threat to the big vendors by pmz · · Score: 1

      But who is responsible for broke code? Surely not the consultants.

      You should read a commercial software EULA some time, because you might be in for a surprise. I'd bet you have about as much recourse for broke code with Oracle as you do with MySQL.

      The people who claim that an angel from Oracle descends to magically fix their problems probably aren't aware that their boss just signed away another $10,000 in support fees. Running a fully supported Oracle server is expensive!

    25. Re:This is a threat to the big vendors by nojomofo · · Score: 1

      You're right. Oracle isn't for everybody. But lots of companies do need high availability. Lots of companies do have hundreds of gigabytes of data, and need the performance that Oracle can provide. For lots of companies, the tens of thousands of dollars that Oracle costs SAVES them money in run-time and maintenance. Nobody is telling you that YOU need to go out and buy Oracle, but similarly, you don't know what my database needs are, so you're not in a position to tell me that postgres + RAIDb will fit my needs.

    26. Re:This is a threat to the big vendors by Anonymous Coward · · Score: 0

      "Thousands of dollars" of risk is really nothing if your data is critical. Even for small businesses. The customer list at the corner hardware store is worth more than $10K.

      I would suggest that anyone who requires the 100% uptime given by clusters, by definition, has the dough to buy a commercially supported product.

      What you are suggesting is that you can be a skinflint and save a tiny amount of cash by being one of the six people beta-testing clustered Postgres -- which means that you basically have to dedicate an employee to do that. Meanwhile, there are numerous firms that will install and manage Oracle/Sybase/Microsoft for you. That is the difference between 4 and 3 employees.

    27. Re:This is a threat to the big vendors by ianjk · · Score: 1

      How many IT managers and DBAs are going to dump their reliable, proven system for some open source alpha software? I don't see this making much of an impact, yet.

    28. Re:This is a threat to the big vendors by Jason+Earl · · Score: 1

      As my boss says, "you don't need an aircraft carrier to go bass fishing, but it certainly helps if you need to land fighter jets out at sea."

      Most people don't need to "land fighter jets at sea" and for those folks PostgreSQL gives you most of Oracle's neat features at a very low price. A redundant set of PostgreSQL servers can even get you Oracle-like availability at a much lower price.

    29. Re:This is a threat to the big vendors by vallee · · Score: 1
      This is just no longer so - maybe this was true in 1998 but it's certainly not true now.

      Tactical outsourcing is the easy, inexpensive, reliable and reproducible solution to this problem.

      For example, you could hire Pythian. We outsource Oracle DBA and make running and managing Oracle into the long-term completely turn-key. We run some of the largest and most challenging Oracle environments in the world, including distributed architectures and shops where the cost of downtime is 5-figures per hour. We run some shops that have in excess of 50000 simultaneous users, we manage publicly traded manufacturing shops, dot-coms, health care companies (including HIPAA-protected data) and we're used by other outsourcers to fill this gap. That being said, we're willing to take on any Oracle shop, from part-FTE (lots of satisfied customers) to multi-FTE (lots of satisfied customers).

      This "problem" with choosing Oracle is licked, not only by Pythian but also (to a lesser degree) by Pythian's competition - we're not alone in this industry. Pythian, along with other industry leaders such as Tusc and DBADirect are working competitively to completely reinvent the production engineering challenges (and costs) associated with running Oracle. :-)

      Cheers,

      Paul

      --
      The real Paul Vallee is slashdot userid 2192, and, what do you mean it's not cool to point out your low userid?
    30. Re:This is a threat to the big vendors by maxpublic · · Score: 1

      Seems to me it'd be more cost-efficient to hire an admin who was experienced with MySQL or Postgres. That way you have support on-site from an actual expert; no expensive contracts required, no waiting, no getting screwed when the 'support' is sub-standard or the company that provides the support demands more money.

      Max

      --
      My god carries a hammer. Your god died nailed to a tree. Any questions?
    31. Re:This is a threat to the big vendors by Alidar · · Score: 1

      I couldn't agree more. Where I currently work I have a ton of projects running in Postgres, and anyone who says it has all the features is, well, just wrong.

      Of course I complain, spout the advantages of a better RDBMS and then almost shoot myself because the other half of my department (don't ask why we are split down the middle) use nothing but Oracle.

      I keep chanting "Ours is not to reason why, ours is but to do and die" over and over and over.

      --
      HTTP Status 418
    32. Re:This is a threat to the big vendors by Jason+Earl · · Score: 1

      Yes, but if a large portion of the people that previously bought Oracle find they can get by with PostgreSQL then you will find that their decision does affect those of you that stick with Oracle. The people that don't need all of Oracle's neat features are effectively subsidizing the product for those of you that do. If these users are siphoned off by another product then either your prices will rise, development will slow, or Oracle will die.

      That's one of the reasons that techies tend to be advocates for the tools that they know and use. After all, all of us have seen what happens to tools that fall out of favor. We don't want the technologies that we have invested our time and money in to become the next SCO.

    33. Re:This is a threat to the big vendors by mark_lybarger · · Score: 1

      First off, Oracle on a desktop-type machine (P4, 2GB RAM, 200GB HDD) would be a complete waste. What is the Oracle licensing for a Linux installation of this sort? Try adding another processor to the box. What happens to the license cost? IIRC, just the license for Oracle STARTS at 5k and goes upwards rapidly. I'd imagine most shops are spending more than 50k on their Oracle software.

      People who don't have a budget don't run Oracle (legally). Those who have a budget also have $$ for hardware. They're going to be spending lots of money on storage devices that are redundant themselves. Their E10k will not go down. They don't measure uptime in 9's, they only measure downtime in 0's.

      so, yes, while the boxes don't have to be expensive, if you're buying a ferrari, chances are you'll have someplace nice to park it.

    34. Re:This is a threat to the big vendors by pmz · · Score: 1

      ...with up to 64 nodes in the cluster.

      I wonder if Oracle is still motivated by per-CPU licensing. Even if the hardware and OS are $2,500 per node, will Oracle still be another ten grand per node? And it would probably be wise, for responsiveness/throughput, to have two CPUs in each node. This can get expensive very quickly.

      Add the fact that you still need a paid Oracle DBA to keep everything running smoothly, and I'd bet the real savings of using a Linux cluster is relatively small. Add annual support for Linux itself, and the relative savings are even smaller.

      I think the people who feel they got burned by Sun or IBM or H-Paq on hardware costs naively caved in to their salespeople and want to steer the blame somewhere else. Two or three dual-CPU Sun Fire servers, for example, can push enough throughput for most applications I can imagine for any but the largest businesses. How many Googles and NYSEs are there in the grand scheme of things?

      SPARC/Power/PA-RISC/MIPS vs. x86/Linux is, in reality, a pretty small part of the big picture, IMO. All this versus Opteron, however, is probably a different story (AMD really knows how to stir the pot, don't they!).

    35. Re:This is a threat to the big vendors by FatherOfONe · · Score: 1

      I agree with what you say, and it makes me think that I need to test my disaster recovery plan; but let's say that you have some XML transfer or DB trigger that causes that database to die. A restore won't fix that problem, and you may not even know what the heck caused it.

      Most closed source databases are expensive. I like to think that most people can pick the database that works best for their company. That could be MySQL, PostgreSQL, FileMaker, Access, DB2, Oracle, whatever...

      In my opinion this new software will help MySQL and PostgreSQL, but not at the cost of Oracle. Most people that buy Oracle know what they are getting. New Oracle customers are looking for a solid, reliable database that offers a bunch of features, and they generally want to minimize their risk. Open Source database users are generally risk takers AND in my experience very cost sensitive. It is my belief that the products that have the most to lose by this are FileMaker, Microsoft Access and possibly Microsoft SQL Server. Customers of those products tend to be far more price sensitive. Although most Microsoft people tend to want to stay "ALL Microsoft", they tend to even brag about being a "Microsoft Shop"....

      --
      The more I learn about science, the more my faith in God increases.
    36. Re:This is a threat to the big vendors by Malcontent · · Score: 1

      "I call up Oracle, and make them bring someone down here to fix the problem."

      I call bullshit. Unless you are a Fortune 500 company AND are paying more than a hundred thousand dollars per year in support costs, there is no way Oracle is going to send someone over there to fix anything.

      --

      War is necrophilia.

    37. Re:This is a threat to the big vendors by haystor · · Score: 1

      First, I know I rambled in my previous statement.

      Second, I'm assuming that the clustering solution doesn't just suck.

      What I'm suggesting is that a clustering solution would improve uptime for Postgres at the marginal cost of another machine. A cost which still wouldn't put it into the same price range as Oracle but (for some) would reduce the risk enough to justify moving to it.

      Of course, I'd still like to see things carved into stone.

      --
      t
    38. Re:This is a threat to the big vendors by the_2nd_coming · · Score: 1

      consulting for their product does not mean recourse. sheesh.

      anyway...duh...that is the whole point about arguing that there is some one to blame.

      --



      I am the Alpha and the Omega-3
    39. Re:This is a threat to the big vendors by iplayfast · · Score: 1

      I was at a Linux conference yesterday (in Toronto). Oracle had a presentation. They now support Linux.

      If you have a problem with your database you phone Oracle, and they talk you through it. If it turns out to be an OS problem, then they tell you to go talk to your OS vendor, except when it's Linux. If it's Linux they will deal with it directly.

      I was very impressed. They are moving their whole company onto Linux and are more than 50% there now.

    40. Re:This is a threat to the big vendors by afidel · · Score: 1

      Yep, list price is $10K per CPU + some min number of client licenses at $SOME_CRAZY_COST.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    41. Re:This is a threat to the big vendors by Lordrashmi · · Score: 1

      Hey, I am all for the little guys winning or at least being good enough that the big guys quit making shit.

      I would love nothing better (well, besides two hot brunettes) than to be able to tell Oracle to shove it. They throw you over the barrel and rape you, and as long as management wants Oracle you have to take it.

    42. Re:This is a threat to the big vendors by egghat · · Score: 1

      Don't want to be nit-picking, but ...

      with Oracle you won't get any kind of insurance. Read their "EULA" for details. The only thing you have with commercial software is "someone to blame". (OK, of course you can buy any kind of support from most commercial vendors, but you must pay a lot of money and the only thing you get is that their support tries harder (e.g. faster).) Try to set up an agreement with Oracle or someone else where they pay you any money you lose from their faults.

      Say: 1 hour of downtime of your Oracle DB --> 60 people can't work, can't sell anything, can't help any customer; damage: 100K. Will Oracle pay this? Never. If 1 day of Oracle downtime ruins your business, will Oracle pay this? Never.

      (Oracle is just an example here. Almost anything you can buy does have this problem of limited guarantee. There lies one of the big advantages of Open Source. It's like an old car that you can repair for yourself).

      Bye egghat.

      --
      -- "As a human being I claim the right to be widely inconsistent", John Peel
    43. Re:This is a threat to the big vendors by jpa5n · · Score: 1

      Good open source tools have 3rd party support, just like good closed source tools.

      I can get Cisco/Oracle/MS/whatever corporate folks to support me directly or get an authorized partner to do the support. Open source projects that make headway in the enterprise either offer "vendor" support (eg MySQL AB, JBoss) through a consulting arm or through good grassroots 3rd party support that just pops up when the market can bear it.

      The best way to knock over oracle is to start up a company that supports open source for a fee

      MySQL AB does this already. Postgres has a similar but smaller plan in place. RedHat is another example in a non-db arena.

      I see people using open source they can get support for -- it's that simple. JBoss is #3 depending on who you talk to. MySQL is gaining ground. Plenty of security tools are in the enterprise as well.

    44. Re:This is a threat to the big vendors by FortKnox · · Score: 1

      I was meaning more "Job Insurance" not software insurance.

      --
      Good quote, too many chars. Seriously, the slashdot 120 char limit sucks!
    45. Re:This is a threat to the big vendors by carlos_benj · · Score: 1

      I think you need to come into the real world son and see how many places run Oracle.

      That's not even pertinent to the point the parent was trying to make.

      Why do companies purchase Oracle and not OSS software if the OSS software will fit the bill? Maybe, just maybe, the answer is "it won't"

      Most of them buy it because of the application they need being built on it. Most of Oracle's sales go to folks who aren't developing their own databases, but to folks that are running SAP or PeopleSoft or some other package with Oracle as the back end.

      --

      As a matter of fact, I am a lawyer. But I play an actor on TV.

    46. Re:This is a threat to the big vendors by sqlgeek · · Score: 1

      Ok, caveat emptor -- I make my living as an Oracle developer. Now given that, here're some features that are used very widely among Oracle customers.

      1. The Optimizer -- when your data changes, queries against it are parsed differently to give you remarkably optimal access strategies. This removes well over 90% of the hard work in writing scalable SQL.

      2. PL/SQL -- this is a sql-based procedural language that makes writing stored procedures very easy and very transparent.

      3. SQL -- Oracle implements SQL features that make my life far easier than it would be were I working against _any_ other database. There are analytic functions that return result-set-level data at the level of each individual row. There are scalar queries that transparently perform as simple scalar values, even allowing one to select their contents. For that matter, one can craft a query that contains data for a row as well as additional non-scalar cursors related to it.

      4. Objects -- One can create views that store their query results (materialized views) for performance reasons. One can define the hierarchical relationships between different attributes of a table (dimensions) so that the optimizer knows when it can re-write a query and substitute existing aggregate data (materialized views, above) for more granular data. One can create a queue in the database that supports asynchronous (JMS-compliant) messaging in either a push/pull model or a publisher/subscriber model -- these queues can then be associated with one another across different databases to obtain asynchronous messaging between them.

      And I'm sure that I'm missing painfully obvious features that are of comparable importance. This, quite simply said, is why customers use Oracle. This is why Postgres is far, far closer to supplanting Oracle or DB2 than a bit-bucket like MySQL.

      Cheers,
      Scott

    47. Re:This is a threat to the big vendors by Anonymous Coward · · Score: 0

      This is probably the typical oracle machine.

      You might have a 2 way or 4 way xeon or ultrasparc, but most of those shops wouldn't notice (and might have a dual pentium pro 200 sitting next to it that *feels* like more machine to them because it has a big old hotswap.)

    48. Re:This is a threat to the big vendors by Anonymous Coward · · Score: 0

      a 64 node cluster of linux boxes at $2500 each is still a savings of $1.74 million over a
      36 way Sunfire 15K.

    49. Re:This is a threat to the big vendors by fyonn · · Score: 1

      You should read a commercial software EULA some time, because you might be in for a surprise. I'd bet you have about as much recourse for broke code with Oracle as you do with MySQL.

      interestingly I was at a meeting the other month where it transpired that there is a recent version of oracle out there that plain refuses to run on a machine with 2 ip addresses.

      who let *that* showstopper of a bug out of the door? how much does oracle cost again? :)

      oracle did at least admit to that one and promise to fix it, don't know if they have done so yet though.

      dave

    50. Re:This is a threat to the big vendors by Anonymous Coward · · Score: 0

      Oh, and if you add in the Oracle licenses at $10K/CPU your Linux solution still comes out at half price, for 4 times as many CPUs.
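
      For anyone who wants to check the sums, here is a rough sketch using only the figures quoted in this sub-thread. The Sun Fire hardware price is not stated anywhere; it is only implied by adding the claimed savings to the cluster cost. The number of CPUs per Linux node is not stated either, so it is left as a parameter, and all the names below are made up for illustration.

```python
# Figures quoted in the thread.
NODE_PRICE = 2_500            # dollars per Linux node
NODE_COUNT = 64               # nodes in the cluster
CLAIMED_SAVINGS = 1_740_000   # claimed hardware savings over the Sun Fire 15K
ORACLE_PER_CPU = 10_000       # quoted Oracle list price per CPU
SUNFIRE_CPUS = 36             # "36 way" Sun Fire 15K


def cluster_hardware_cost() -> int:
    """Hardware cost of the Linux cluster."""
    return NODE_PRICE * NODE_COUNT


def implied_sunfire_hardware_cost() -> int:
    """Sun Fire price implied by the claimed savings (an inference, not a quote)."""
    return cluster_hardware_cost() + CLAIMED_SAVINGS


def total_with_licenses(hardware_cost: int, cpu_count: int) -> int:
    """Hardware plus per-CPU Oracle licenses."""
    return hardware_cost + cpu_count * ORACLE_PER_CPU


if __name__ == "__main__":
    sunfire = total_with_licenses(implied_sunfire_hardware_cost(), SUNFIRE_CPUS)
    print(f"Sun Fire 15K, {SUNFIRE_CPUS} CPUs: ${sunfire:,}")
    # CPUs per node is an assumption; try one and two.
    for cpus_per_node in (1, 2):
        cpus = NODE_COUNT * cpus_per_node
        cluster = total_with_licenses(cluster_hardware_cost(), cpus)
        print(f"Cluster, {cpus} CPUs: ${cluster:,}")
```

      How close the "half price" claim comes depends entirely on how many licensed CPUs you assume per node, and none of this counts DBA, support or storage costs.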

    51. Re:This is a threat to the big vendors by egghat · · Score: 1

      Oh, OK, I got that one wrong. Then my comment was somewhat offtopic ;-)

      Bye egghat.

      --
      -- "As a human being I claim the right to be widely inconsistent", John Peel
    52. Re:This is a threat to the big vendors by PCM2 · · Score: 1
      This is a major threat to the big vendors. In fact I would say it is even more of a threat to Oracle than it is to MS!
      Maybe -- except that Oracle has been the one aggressively pushing clustering applications for the last few years, with its Real Application Clustering (RAC). If anything, most admins are going to view this as the open source community trying to catch up with Oracle, not leaping ahead.

      In reality, though, this actually sounds closer to IBM's implementation of database clustering than Oracle's. Oracle uses a "shared disk" clustering system -- many database servers pulling information from the same data store. In IBM's implementation, each database server is only responsible for part of the data, and transactions are distributed across the various servers based on what data needs to be accessed. It's known as "shared nothing" clustering.
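
      To make the distinction concrete, here is a toy sketch of the two routing styles. It is not how IBM, Oracle or C-JDBC actually implement anything; the node names, the CRC32 hash and the round-robin choice are all assumptions picked for illustration.

```python
import zlib

# Hypothetical node names, for illustration only.
NODES = ["db-node-0", "db-node-1", "db-node-2"]


def owner_of(key: str, nodes=NODES) -> str:
    """Shared nothing: a stable hash of the key picks the single node owning that row."""
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]


def shared_nothing_plan(keys, nodes=NODES) -> dict:
    """Group the keys a transaction touches by the node that owns each of them."""
    plan = {}
    for key in keys:
        plan.setdefault(owner_of(key, nodes), []).append(key)
    return plan


def shared_disk_node(request_number: int, nodes=NODES) -> str:
    """Shared disk: every node sees the same store, so any node can take any request.

    Round-robin is just one possible policy; the data touched does not matter.
    """
    return nodes[request_number % len(nodes)]


if __name__ == "__main__":
    print(shared_nothing_plan(["customer:17", "customer:42", "order:9001"]))
    print([shared_disk_node(i) for i in range(4)])
```

      In the shared-nothing case a transaction touching several keys may have to be split across nodes, which is why the data placement matters so much there.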

      IBM is also much more focused on heterogeneous systems than Oracle is. Oracle wants all your databases to be Oracle databases. IBM, on the other hand, is developing products that will give you a front end that looks like IBM DB2, where the back end can be virtually anything -- competitors' databases, filesystems, even Web services. (That's the claim anyway.)

      You can read some more about this, and the competition between Oracle and IBM for the database clustering market, here. (Yes, I wrote it.)

      Expect Larry Ellison to start talking about the dangers of using Open Source software now...
      He won't do that. That's not his (or Oracle's) style. Oracle's attitude is that Oracle will support anything. The idea is to sell expensive Oracle products to the business line managers with the deep pockets. Then they reassure everybody by emphasizing that going with Oracle won't alienate anybody -- even those developers in your organization who are dickering around with open source. "Open source is great, and your expensive Oracle database will accommodate it just fine."
      --
      Breakfast served all day!
    53. Re:This is a threat to the big vendors by pmz · · Score: 1

      a 64 node cluster of linux boxes at $2500 each is still a savings of $1.74 million over a
      36 way Sunfire 15K


      The Sun Fire 15K can maintain 43.2 GB/sec bandwidth connecting all the CPUs (see here). In a cluster, a small multiple of 0.125 GB/sec for Gigabit Ethernet is about all you'd get.
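
      A quick back-of-envelope check of that comparison, as a sketch only: it takes Gigabit Ethernet at its raw line rate of 1 Gbit/sec (about 0.125 GB/sec, roughly the per-link figure quoted above) and ignores protocol overhead, switch capacity and latency, all of which make the real gap worse. The 43.2 GB/sec number is the one quoted above; the function names are invented for the example.

```python
import math

GIGABIT_ETHERNET_BITS_PER_SEC = 1_000_000_000  # raw line rate, no protocol overhead
SUN_FIRE_15K_GB_PER_SEC = 43.2                 # interconnect figure quoted in the thread


def link_gb_per_sec(bits_per_sec: int = GIGABIT_ETHERNET_BITS_PER_SEC) -> float:
    """Convert a line rate in bits/sec to decimal gigabytes/sec."""
    return bits_per_sec / 8 / 1e9


def links_to_match(target_gb_per_sec: float, per_link_gb_per_sec: float) -> int:
    """Whole number of links whose combined raw rate reaches the target."""
    return math.ceil(target_gb_per_sec / per_link_gb_per_sec)


if __name__ == "__main__":
    per_link = link_gb_per_sec()
    print(f"One Gigabit Ethernet link: {per_link} GB/sec")
    print(f"Links needed to match: {links_to_match(SUN_FIRE_15K_GB_PER_SEC, per_link)}")
```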

      Regardless, you missed my point, because I explicitly mentioned Sun's low-end servers (2 CPUs). Even though these do cost more per node than the PCs, the cost difference is not nearly as dramatic as you were trying to make it out to be.

      Also, don't forget that storage costs are basically constant between the systems. The only real variable is the cost for each CPU-RAM bundle. Hardly groundshaking.

    54. Re:This is a threat to the big vendors by Anonymous Coward · · Score: 0

      Wow, pythian, that pythian post pythian sounded pythian a lot pythian like a pythian big ad pythian.

      Seriously though, what problem did you solve exactly? You're an Oracle support outsource company, so how exactly does that solve the problem of paying out the ass for licenses and support for Oracle? Sounds to me like you're an extra cost on top of all the others associated with Oracle.

      And by the way, pythian pythian pythian. I didn't think I said it enough above.

    55. Re:This is a threat to the big vendors by fucksl4shd0t · · Score: 1

      Two or three dual-CPU Sun Fire servers, for example, can push enough throughput for most applications I can imagine for any but the largest businesses. How many Googles and NYSEs are there in the grand scheme of things?

      Um, last time I checked, Google was running their supercomputer on commodity Intel 32-bit hardware with Linux as the OS. No idea what their database is, though. I imagine it's OSS, but they could've switched it out to Oracle or something after they started making money. I think I read recently that they were starting to look at Itanium, but were going to wait for AMD's 64-bit processor before making a decision. (Do YOU want to recompile 60,000 databases for Itanium?)

      --
      Like what I said? You might like my music
    56. Re:This is a threat to the big vendors by pmz · · Score: 1

      ...at half price, for 4 times as many cpus.

      Like I said in my other reply, you completely flew by the point I was making and began foaming your myopic zealot-speak.

      Again, my point is that there are so many costs involved that the cost of the hardware itself isn't that big of a deal. Unless, of course, a business is so unsuccessful that even buying donuts for the break room requires issuing bonds.

    57. Re:This is a threat to the big vendors by Anonymous Coward · · Score: 0

      Google is a lousy example -- they use a custom in-memory database that stores temporary, disposable data. I doubt they have any of the 4 letters in ACID.

    58. Re:This is a threat to the big vendors by Anonymous Coward · · Score: 0

      Give me a break, he totally mentions his competition man, and the post he's replying to is about the DBA problem, not the license problem.

    59. Re:This is a threat to the big vendors by Anonymous Coward · · Score: 0

      No, I mean in a situation where shoddy code caused data loss.

      Hint: You probably can't sue the OSS developers but maybe you can sue a large corporation and/or get discounts in the future.

    60. Re:This is a threat to the big vendors by leandrod · · Score: 1
      > there isn't anything close to Oracle when it comes to availability/reliability

      Yes, there is: IBM DB2. And NCR Teradata, BTW.

      > even if there was IT managers would not go for it for some years because it's not proven in the enterprise

      IBM DB2 not proven in the enterprise? Perhaps the new generation in the golf club isn't old enough to appreciate IBM quality when they see it...

      > it's reputation is well deserved

      You mean, the reputation of being a resource hog, difficult to manage, too expensive, and not SQL compliant? There is a point to be made that complexity hinders reliability, performance, etc.

      --
      Leandro Guimarães Faria Corcete DUTRA
      DA, DBA, SysAdmin, Data Modeller
      GNU Project, Debian GNU/Lin
    61. Re:This is a threat to the big vendors by Coz · · Score: 1

      Finding a bi-lingual database admin is tough - finding one who understands the semantics of the different engines is tougher - finding one who can speak more than two database lingos is almost impossible, and when you do find them, they're already working for someone else and making MUCH more than you can afford to pay them.

      It's nigh unto impossible to do a mechanical conversion of Oracle-specific PL/SQL triggers into something SQL92. AND, last time I checked, MySQL didn't support all of SQL92 (almost nobody does). Your conversion has to be knowledgeable about both source and target database flavors, while using SQL92 plus extensions (probably made up by whoever's doing the converting) as an intermediate representation.

      It's hard enough converting between major revisions of Oracle (say, 7.1 to 9i), and they provide support for that. No one that I'm aware of provides support for migrating away from their own product.

      --
      I love vegetarians - some of my favorite foods are vegetarians.
    62. Re:This is a threat to the big vendors by leandrod · · Score: 1
      > by hiring a decent DBadmin who has experience with both types of RDBMs and can write a script to convert all that pretty Oracle specific SQL into ANSI-92 compliant SQL and feed it into whatever server you like without shovelling hundreds of thousands of dollars into Larry's/IBM's/Msoft's pockets

      You obviously never worked with data?

      The problem with Oracle isn't the SQL commands. It is data types. Rewriting the application is bad enough, but migrating to ANSI SQL data types can be a nightmare.
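
      As a rough illustration of why the data types are the hard part, here is a minimal Python sketch of a naive type translator. The mapping table and the `to_ansi_type` helper are hypothetical and deliberately incomplete; the three mappings shown are common choices, not a complete or authoritative conversion.

```python
import re

# Hypothetical, deliberately incomplete mapping from Oracle column types to
# the closest ANSI SQL type. Even these "easy" cases hide differences.
ORACLE_TO_ANSI = {
    "VARCHAR2": "VARCHAR",  # Oracle treats the empty string as NULL; ANSI does not
    "NUMBER": "NUMERIC",    # NUMBER with no precision has no exact ANSI equivalent
    "DATE": "TIMESTAMP",    # Oracle DATE carries a time of day; ANSI DATE does not
}

# Splits a type such as "NUMBER(10,2)" into its name and optional arguments.
_TYPE_RE = re.compile(r"^\s*([A-Za-z0-9_]+)\s*(\(.*\))?\s*$")


def to_ansi_type(oracle_type: str) -> str:
    """Translate one Oracle column type, or raise if a human must decide."""
    match = _TYPE_RE.match(oracle_type)
    if match is None:
        raise ValueError(f"cannot parse type: {oracle_type!r}")
    name = match.group(1).upper()
    args = match.group(2) or ""
    if name not in ORACLE_TO_ANSI:
        raise ValueError(f"no ANSI mapping known for {name}")
    return ORACLE_TO_ANSI[name] + args


print(to_ansi_type("VARCHAR2(30)"))
print(to_ansi_type("NUMBER(10,2)"))
print(to_ansi_type("date"))
```

      The renaming is the trivial part. The comments in the table are where the pain is: each mapping changes behaviour that application code may silently depend on, and anything outside the table needs a case-by-case decision.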

      --
      Leandro Guimarães Faria Corcete DUTRA
      DA, DBA, SysAdmin, Data Modeller
      GNU Project, Debian GNU/Lin
    63. Re:This is a threat to the big vendors by Coz · · Score: 1

      Doesn't SAP sponsor the open-source SAPDb?

      --
      I love vegetarians - some of my favorite foods are vegetarians.
    64. Re:This is a threat to the big vendors by Anonymous Coward · · Score: 0

      Try more like $1M a year. At least that's what I was told by the IT management.

      That's how much it cost us to have phone support, patch support and access to developers for code customization, which took forever. We had Oracle DBAs on site full time at an additional $250.00 an hour for over two years. We had a total of four on site. That's what you call rolling through the cash.

      That company is now bankrupt. The IT department spent $50M over a three-year period. Performance was stellar; upper management were idiots. They couldn't get the newly acquired corporations to fall under the new ERP system in a timely fashion.

    65. Re:This is a threat to the big vendors by real_b0fh · · Score: 0

      pfft even PGSQL has all the features you cited. The real beef from oracle is the OPS (parallel server) or whatever they call it these days. Ah btw pgsql is usually faster than oracle (for me at least).

      Go learn something before you state such utter bullshit. pgsql is no fscking toy. IMO its on par with oracle and way superior to the shitty ms-sql-server.

      ciao

      --
      "Contrary to popular belief, UNIX is user friendly. It just happens to be selective on who it makes friendship with"
    66. Re:This is a threat to the big vendors by Anonymous Coward · · Score: 0

      well, well, why dont you run down to my data center and check the backup drive.. oh, wait, you are in another time zone. damn.

    67. Re:This is a threat to the big vendors by Malcontent · · Score: 1

      A mil per year on support just for the database! Why is it that the most moronic people always end up in management?

      --

      War is necrophilia.

    68. Re:This is a threat to the big vendors by valisk · · Score: 1
      I have worked with data, though not in a big way, and I have malingered on the edges of the OpenACS project, where they converted an Oracle-based project to Postgres, and from time to time they convert old ACS customers' data from Oracle SQL to ANSI without too much trouble.

      I realise that such simplicity is born of experience and that there are some major differences and pure nightmares to be overcome, but the conversion process is possible and almost certainly won't cost as much as an Oracle license.

      --

      Economic Left/Right: -0.62
      Social Libertarian/Authoritarian: -3.69
    69. Re:This is a threat to the big vendors by Ian+Bicking · · Score: 1
      To all the other repliers who are saying "But PostgreSQL/MySQL doesn't have feature X" or "PostgreSQL/MySQL is still just playing catch-up", I think you're missing the point.

      Yes, they are playing catch-up (MySQL even moreso). But they long ago surpassed Oracle in other areas. While I don't say this from experience, it seems clear that administration and management of either of the free databases is far easier than Oracle. They also scale in a way that Oracle does not -- they scale down.

      The OSS databases aren't going to beat Oracle on reliability, enterprise features, etc. They may be able to be good enough, which is what's important. They have already beaten Oracle at maintainability.

      The same is true of Linux as well. Linux isn't more reliable or more powerful than the proprietary Unices. But it's far more manageable, easier to administer, and it scales down better than those other operating systems. Those other systems are dying... sure, they'll be around for a long time, but they've become a legacy. No doubt Oracle will be around for a very long time, but if PostgreSQL or MySQL can become good enough in Oracle's arena, then Oracle runs a serious risk of becoming a legacy database.

    70. Re:This is a threat to the big vendors by grugruto · · Score: 1
      The Sun Fire 15K can maintain 43.2 GB/sec bandwidth connecting all the CPUs

      That is really nice but in a database engine, a query is usually handled by a single thread and you don't need to communicate with another thread on another cpu. Therefore this bandwidth is useless.
      What would be more interesting is the IO bandwidth of the machine and how many Ethernet adapters you can plug in. Can you plug 36 Ethernet adapters to handle the same aggregated bandwidth as a cluster does?

    71. Re:This is a threat to the big vendors by leandrod · · Score: 1
      > from time to time they convert old ACS customers data from Oracle SQL to ANSI without too much trouble.

      You have to consider two factors:

      One, PostgreSQL is not ANSI SQL, but a compromise between ANSI and Oracle SQL. For example, it has add-ons for CONNECT BY and PL/SQL. So it is an easier migration target than ANSI SQL.

      Two, ACS, being WWW and TCL based, is string oriented. Thus data types and such are much less of an issue.

      In general business applications such migration would be much harder.

      --
      Leandro Guimarães Faria Corcete DUTRA
      DA, DBA, SysAdmin, Data Modeller
      GNU Project, Debian GNU/Lin
    72. Re:This is a threat to the big vendors by valisk · · Score: 1

      That's a fair comment, Leandro, but once we do get a truly reliable open source RDBMS, and people are convinced of its worth, the pain involved in a once-only migration process will be mitigated by the long-term cost savings, or Oracle will reduce its license fees to something that allows its big customers to feel that the cost of moving away simply isn't worth the trouble. I believe that things will reach that point in 2-3 years; then we will see fireworks.

      --

      Economic Left/Right: -0.62
      Social Libertarian/Authoritarian: -3.69
    73. Re:This is a threat to the big vendors by pmz · · Score: 1

      Can you plug 36 Ethernet adapters to handle the same aggregated bandwidth as a cluster does?

      The 15K has up to 72 PCI slots on 18 channels, according to the Sun website. They say the overall maximum I/O bandwidth is 21.6 GB/sec, which is more than adequate for a full complement of Gigabit Ethernet adapters. However, I'd bet a few slots are needed for FibreChannel to a storage array, so perhaps only a maximum of 60 or so Ethernet adapters could be installed.
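
      A quick sanity check of that claim, sketched in Python. It assumes one Gigabit Ethernet adapter per PCI slot and the nominal 0.125 GB/sec per adapter in one direction; the function names are made up for illustration.

```python
# Does a full set of Gigabit adapters fit inside the quoted I/O budget?
# Assumptions (not from Sun's documentation): one adapter per PCI slot and
# the nominal one-direction line rate, with no protocol overhead.

IO_BUDGET_GB_PER_SEC = 21.6  # overall maximum I/O bandwidth quoted above
GIGE_GB_PER_SEC = 1 / 8      # 1 Gbit/s divided by 8 bits per byte


def aggregate_gb_per_sec(adapters: int) -> float:
    """Total nominal bandwidth of the given number of Gigabit adapters."""
    return adapters * GIGE_GB_PER_SEC


def fits_budget(adapters: int) -> bool:
    """True if the adapters together stay within the quoted I/O budget."""
    return aggregate_gb_per_sec(adapters) <= IO_BUDGET_GB_PER_SEC


for count in (72, 60):
    print(count, "adapters:", aggregate_gb_per_sec(count), "GB/sec,",
          "fits" if fits_budget(count) else "does not fit")
```

      With those assumptions, all 72 slots filled with Gigabit adapters add up to 9 GB/sec and 60 adapters to 7.5 GB/sec, both well under the quoted 21.6 GB/sec.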

      After seeing this Slashdot article, I went and read the RAC whitepaper at Oracle's website. Their cluster architecture is more compelling than I anticipated. It literally uses the cluster nodes just for CPU/RAM and uses a shared storage model (e.g., FibreChannel SAN).

      Clustering still does not allow for ultra-cheap hardware, however, because ECC on RAM and busses is critical for basic integrity. The $2,500 per node numbers floating around above are very optimistic, where $4,000 to $8,000 per node looks more likely (based on PenguinComputing's site and Sun's on-line store). Opteron-based servers are a shoo-in for an Oracle cluster. The latest Sun Fire V210 servers are also worth considering, because each one comes with four built-in Gigabit adapters and an optional SSL accelerator for client connections.

    74. Re:This is a threat to the big vendors by carlos_benj · · Score: 1

      When they presented to our company they only mentioned Oracle and Informix (and maybe DB2 but I don't remember).

      --

      As a matter of fact, I am a lawyer. But I play an actor on TV.

    75. Re:This is a threat to the big vendors by leandrod · · Score: 1
      > once we do get a truly reliable open source rdbms, and people are convinced of it's worth, the pain involved in a once only migration process will be mitigated by the long term cost savings

      Agreed, but there are some points:

      SQL does not an RDBMS make; we are still waiting for a relational, free software offering. Today the only relational one is proprietary: Alphora Dataphor;

      Reliability is not the only need. Another is ANSI SQL conformance. Many people will be wary of exchanging a not-quite-SQL DBMS for another only slightly more SQL compliant, and will prefer to migrate to something standards compliant so they could go IBM DB2 or SAPdb or FireBird or whatever should PostgreSQL show cracks later;

      All this will be a process. If it happens at all, no one will be able to say, 'This is the year of the LAN^H^H^Hfree software DBMS'.

      --
      Leandro Guimarães Faria Corcete DUTRA
      DA, DBA, SysAdmin, Data Modeller
      GNU Project, Debian GNU/Lin
    76. Re:This is a threat to the big vendors by rob_from_ca · · Score: 1

      Can't believe I'm bothering...

      I didn't say PgSQL was a toy, I was just pointing out that Oracle advantage is in far more than just marketing; I've used it before and I will use it again, it's certainly much closer to Oracle than MySQL.

      However, while it does have some of those features I mentioned, it doesn't have them as fully implemented as Oracle. Stored procedures can't return cursors, highly limiting their usefulness (at least last time I checked). Referential integrity is missing a few important pieces for making your data model robust in the face of lousy clients (essential in an enterprise environment), and faith in the ability to perfectly restore the system at any point in time even in the face of a system crash is not so high.

      Anyway, don't get me wrong, PgSQL rocks (as does MySQL for particular tasks), but Oracle it is not.

    77. Re:This is a threat to the big vendors by LadyLucky · · Score: 1
      Preach it brother!

      From someone whose first deployment of MySQL into a live environment went completely pear-shaped, with MySQL crashing several times per day. The damned thing doesn't report ANYTHING to the error log, except "I'm starting up again, and oohhh look at all that corrupt data, I hope I can do something about that!". I would never touch the database again, not with a 10 foot bargepole.

      We're dropping that pile of crap faster than you can click the hyperlink on the MySQL website which says it may take up to two weeks to get any kind of support even in the case of an emergency.

      We're now using MSDE for low powered embedded installations that the MySQL crowd had pushed prior to this. Who would have thought, use the Microsoft solution because the open source one doesn't cut it.

      Sorry, it's been a long week of conference calls and VPNs in the middle of the night because MySQL decided to crash once again.

      --
      dominionrd.blogspot.com - Restaurants on
    78. Re:This is a threat to the big vendors by stu-pendous · · Score: 1

      Trust me... Wall Street is getting real cheap right now. They would park their Ferrari in a toolshed if they could. Sun hardware is expensive and we are stuck with an Oracle dependency that would require massive code rewrites to get off of. It's easier to bring down hardware (and datacenter-related) costs.

    79. Re:This is a threat to the big vendors by mccormick · · Score: 0

      The fact that you mention Tutorial D and Dataphor and "true" RDBMS systems suggests to me that you are a fan of Fabian Pascal and Chris Date. I love their site, both for the humour of their fairly extreme positions and their sheer lack of modesty in telling the establishment that "you're wrong." They also know what they're talking about and their writings are 100% lucid. Oh well. 'Tis the world we live in.

      --
      Pete
  9. Quick thru the docs... by Lysol · · Score: 4, Informative

    So a few things come up just reading the docs on this:

    1. A Controller. It looks as tho a single controller is used by the clients to communicate to the various RAID'd dbs. I'm sure there can be multiple controllers since there would be little point to make some db's redundant, yet the access to them not. Still looking into this.

    2. And also, it looks as tho the default port is 1099 - RMI. If you have, for a web app, your EJBs and web app local to that container, that might not be a problem. However, I happen to have my EJB server on its own box and this might very well cause probs. I think it said you could specify your own ports, but I haven't seen any examples in the docs yet of this being the case. Also, still looking.

    A few other things exist as well which are in the docs as known limitations:
    * XAConnections
    * Blobs
    * batch updates
    * callable statements

    These could be serious issues for some. My last project used CLOBs/BLOBs, batch updates and callable statements, so this would rule that out. Of course, all the db stuff was strictly tied to Oracle, so I think that would rule this out regardless. ;)

    All in all tho, this looks like a good start. As my current project progresses, clustered dbs will become more and more of an issue. I've looked into some other projects out there for Postgres, but nothing yet really satisfactory. I think this is a good step in the right direction - for Java developers. It'll be interesting to watch.

    1. Re:Quick thru the docs... by grugruto · · Score: 2, Informative
      Some answers:

      1. Yes, you can have multiple controllers that synchronize using group communication. In the driver, you give a list of comma-separated host names running controllers. The driver has built-in failover and load balancing among multiple controllers (check the doc here).

      2. Yes, all ports are customizable when you start the controller (check the doc here).
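
      For readers wondering what "a comma-separated list with failover and load balancing" amounts to on the client side (point 1), here is a minimal Python sketch of the idea. It is not C-JDBC's driver code: `FailoverClient`, `parse_controllers`, and the injected `connect` callable are all hypothetical names, and the round-robin policy is an assumption.

```python
from typing import Callable, List


def parse_controllers(host_list: str) -> List[str]:
    """Split a comma-separated controller list, ignoring blanks and spaces."""
    return [host.strip() for host in host_list.split(",") if host.strip()]


class FailoverClient:
    """Toy client that rotates over controllers and skips unreachable ones."""

    def __init__(self, host_list: str, connect: Callable[[str], object]):
        self.hosts = parse_controllers(host_list)
        if not self.hosts:
            raise ValueError("at least one controller host is required")
        self._connect = connect  # hypothetical: opens a connection to one host
        self._next = 0           # index of the controller to try first

    def connect(self) -> object:
        """Try each controller once, starting after the last one used."""
        errors = []
        for attempt in range(len(self.hosts)):
            index = (self._next + attempt) % len(self.hosts)
            host = self.hosts[index]
            try:
                connection = self._connect(host)
            except ConnectionError as exc:
                errors.append(f"{host}: {exc}")
                continue
            # Load balancing: the next call starts with the following host.
            self._next = (index + 1) % len(self.hosts)
            return connection
        raise ConnectionError("no controller reachable: " + "; ".join(errors))
```

      A caller would pass something like `FailoverClient("ctrl1,ctrl2,ctrl3", open_connection)`, where `open_connection` is whatever actually opens a connection and raises `ConnectionError` when a controller is down.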

      This is just an alpha version, so as you mentioned, there are still many features missing but it is a good starting point and contributions are welcome (remember it is open source software ;-))!

    2. Re:Quick thru the docs... by StupidEngineer · · Score: 1

      grugruto, from reading your comments, I'm assuming that you're part of the project? Please correct me if I'm wrong. I'm writing these questions on that assumption... If I'm wrong, could anyone affiliated with the project answer?

      Here're a couple other questions that I didn't find answers to in the docs:

      1. Once a controller fails, how do the other controllers deal with this failure? Can controllers be reintroduced into the working set real time?

      2. Is it possible to replicate a C-JDBC RAIDb across multiple sites? I.e., I have C-JDBC RAIDb running at site A and B (or even a site C).

      Is a dual active master-master setup possible (reads & writes at both sites replicated)?

      Dual active master-slave (reads at both sites, writes go to one site and replicated to all sites)?

      I would've asked about having a master-slave hotswap, but the RAIDb is supposed to make sure the 'master' never goes down. :)

      Would I have to rely on the replication features of the backend databases? /just thinking out loud

    3. Re:Quick thru the docs... by grugruto · · Score: 1
      "StupiedEngineer(102134)", yes, I am part of the project.

      1. Yes, you can have multiple controllers started at any time that use group communication to synchronize the requests to be sent to the backends. Clients that were communicating with the failed controller automatically redirect their queries to another controller.

      2. The assumption made in C-JDBC is that communications between controllers and backends are fast (like in a cluster environment). What generates inter-controller traffic is write queries. If you have many of them, the performance will go down. If your workload is read-mostly it should scale well.
      If you want to distribute controllers across multiple sites, the performance will depend on the link speed and reliability between the sites (not to mention security issues like crossing firewalls, the possible need to encrypt data, ...).

      As a side note, you never need to replicate reads, just writes.
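
      The "replicate writes, not reads" point can be sketched in a few lines of Python. This is only a toy model of full replication using in-memory SQLite backends, not how C-JDBC is implemented: the `ReplicatingController` class is hypothetical, and it ignores ordering between concurrent writers and what to do when one backend fails mid-write.

```python
import itertools
import sqlite3


class ReplicatingController:
    """Toy full-replication controller: writes go to all backends, reads to one."""

    def __init__(self, backend_count: int):
        if backend_count < 1:
            raise ValueError("need at least one backend")
        # In-memory SQLite databases stand in for the MySQL/PostgreSQL boxes.
        self.backends = [sqlite3.connect(":memory:") for _ in range(backend_count)]
        self._read_cycle = itertools.cycle(range(backend_count))
        self.reads_served = [0] * backend_count

    def execute_write(self, sql: str, params: tuple = ()) -> None:
        """Apply the same statement to every backend so they stay identical."""
        for backend in self.backends:
            backend.execute(sql, params)
            backend.commit()

    def execute_read(self, sql: str, params: tuple = ()) -> list:
        """Send the query to a single backend, chosen round-robin."""
        index = next(self._read_cycle)
        self.reads_served[index] += 1
        return self.backends[index].execute(sql, params).fetchall()


controller = ReplicatingController(3)
controller.execute_write("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
controller.execute_write("INSERT INTO users (name) VALUES (?)", ("alice",))
for _ in range(3):
    print(controller.execute_read("SELECT id, name FROM users"))
```

      Every write costs one statement per backend, while each read costs one statement in total, which is why a read-mostly workload is the case that scales.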

  10. Hehehe... by Duncan3 · · Score: 0, Funny

    MySQL - not a database
    + Java - *laughter*
    -------

    My god, someone actually BUILT a cluster-f***.

    ... dont forget the XML!

    --
    - Adam L. Beberg - The Cosm Project - http://www.mithral.com/
    1. Re:Hehehe... by Anonymous Coward · · Score: 1

      Perl Bigot
      + Clueless
      --------------
      Script Kiddy

      How did this post ever even get a score of 2?

    2. Re:Hehehe... by mobiGeek · · Score: 2, Funny
      MySQL - not a database
      Your comments - Not a post

      *laughter*

      --

      ...Beware the IDEs of Microsoft...

    3. Re:Hehehe... by Chainsaw · · Score: 0, Flamebait

      Actually, using "echo" and "Visual Basic" (or maybe "dd" and "Perl") would be a true clusterfuck. MySQL still *is* a database, even if it doesn't support sub-selects and views just yet. Java is easily the least worst language/environment for large server applications that have to be secure and maintained for $BIGNUM years. You don't write this stuff in C++ unless you are slightly insane and/or simply dislike Java.

      --
      War is one of the most horrible things a human can be exposed to. And one of the worlds largest industries.
    4. Re:Hehehe... by Anonymous Coward · · Score: 1

      ::clearing tears away from eyes::

      That was a great post. Good stuff.

    5. Re:Hehehe... by Anonymous Coward · · Score: 0

      No, Common Lisp is the least worst. ANSI-standardised language, multivendor, at least as powerful as Java if not more so, guaranteed maintainable by anyone with a real CompSci degree.

    6. Re:Hehehe... by Anonymous Coward · · Score: 0

      So, are the database libraries standardized, or are they "multivendor"?

      The big advantage of Java is that while the language isn't "open", the runtime platforms are. So you can move well-behaved jdbc or ejb code between different vendors without much fuss. And it's Guaranteed maintainable by anyone with a real CompSci degree from an Indian university.

  11. Sigh - Looks like I have my work cut out for me... by Yoda2 · · Score: 4, Funny

    Off I go to start coding a FORTRAN port...

  12. Of course, if mysql had replication worth a damn by drinkypoo · · Score: 0, Flamebait

    Then this wouldn't be so necessary. Of course this must also perform load balancing which you do not get for free with replication, but it would be nice. After all, you can always use a load balancing solution between you and the databases if your replication is actually worth using.

    --
    "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
  13. Where are the benchmarks that they speak of ? by a7244270 · · Score: 5, Insightful

    I looked at the diagram, and it looks very nice, but they seem to be very light on the details.

    Supposedly, This new version has been successfully tested with Tomcat, JOnAS, MySQL and PostgreSQL. Excellent results have been obtained with the TPC-W and RUBiS benchmarks.

    Don't get me wrong, I like the idea, and I have been wanting something like this for years, but I sure would like to _see_ the test results, even if they are preliminary.

    1. Re:Where are the benchmarks that they speak of ? by Anonymous Coward · · Score: 1, Informative

      A paper on this has been submitted to a conference with a blind review process. Therefore, we are not allowed to publicly disclose the results before the notification of acceptance/rejection.
      Maybe you can get a copy by asking one of the C-JDBC team members directly?

  14. How about a meta-database adapter? by Frater+219 · · Score: 4, Interesting
    Since the article suggests the idea of applying disk-volume concepts (RAID) to databases, I thought I'd bring this up: For a while now I've been wishing there was an equivalent of NFS for databases, a way to mount a running database's tablespace into another database. This would allow one to draw together disparate databases, creating views and running joins across tables which natively reside in different databases, on different hosts.

    Here's an example of an application: I have a database-driven Web application that allows my onsite clients to register network services for openings in the firewall. Another software component probes the registered hosts for daemon version information and records it in the database, so that we can send out alerts when security holes are discovered in particular versions. I use PostgreSQL on Debian and Solaris. Independently of my work, our networking office has a Microsoft SQL Server database of IP addresses, MAC addresses, and physical switch ports and jack numbers.

    What I'd like to do is mount both my database and the networking office's database into some sort of "meta-database" -- analogous to mounting filesystems from two different hosts via NFS -- and run SQL queries that span both data sets. I wouldn't expect to be able to write to this conjoined database -- locking would be a nightmare -- but being able to SELECT across the two sets would be incredibly valuable.
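
    The closest small-scale analogue I can demonstrate is SQLite's ATTACH DATABASE, which mounts a second database into the same connection so one SELECT can span both. The Python sketch below uses two in-memory SQLite databases as stand-ins for the PostgreSQL and SQL Server databases described above; the table names, columns, and sample rows are invented for illustration, and nothing here talks to a real remote server.

```python
import sqlite3

# "main" stands in for the firewall-registration database.
conn = sqlite3.connect(":memory:")
# A second, separate in-memory database stands in for the networking
# office's database, mounted under the schema name "netops".
conn.execute("ATTACH DATABASE ':memory:' AS netops")

conn.execute("CREATE TABLE main.services (ip TEXT, daemon TEXT, version TEXT)")
conn.execute("CREATE TABLE netops.ports (ip TEXT, mac TEXT, jack TEXT)")

conn.executemany(
    "INSERT INTO main.services VALUES (?, ?, ?)",
    [("10.0.0.5", "sshd", "3.4"), ("10.0.0.9", "httpd", "1.3.27")],
)
conn.executemany(
    "INSERT INTO netops.ports VALUES (?, ?, ?)",
    [("10.0.0.5", "00:11:22:33:44:55", "J-101"),
     ("10.0.0.7", "00:11:22:33:44:66", "J-102")],
)


def services_with_jacks(connection: sqlite3.Connection) -> list:
    """Read-only join across the two mounted databases, keyed on IP address."""
    return connection.execute(
        "SELECT s.ip, s.daemon, s.version, p.jack "
        "FROM main.services AS s "
        "JOIN netops.ports AS p ON p.ip = s.ip "
        "ORDER BY s.ip"
    ).fetchall()


print(services_with_jacks(conn))
```

    Only the host that appears in both data sets comes back, with the jack number from one database next to the daemon version from the other, which is the kind of spanning SELECT described above.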

    1. Re:How about a meta-database adapter? by Anonymous Coward · · Score: 0

      The JET engine from Microsoft Access, though it sucks in almost every way possible, can in fact do this with any ODBC data source. MS SQL Server also allows one to do it, since it supports cross-database queries - SELECT * FROM DBNAME.USERNAME.TABLENAME or something like that.

    2. Re:How about a meta-database adapter? by eric2hill · · Score: 2, Informative

      Oracle has database links.

      Create a database link (for example to an AS400) and you can query the remote tables just like local tables.

      select * from somelib.sometable@as400

      Oracle will pass as much SQL as it can to the remote DB engine in order to keep things speedy.

      --
      LOAD "SIG",8,1
      LOADING...
      READY.
      RUN
    3. Re:How about a meta-database adapter? by Anonymous Coward · · Score: 0

      I have always wondered why SQL did not allow join over multiple databases that are hosted on the same database server. Multiple database servers would be problematic as the processing of the join would have to happen at the client level - this could require lots of unused data to be transmitted.

      However with this approach of a controller serving multiple databases, I think that could be an extension of this idea. Instead of redundant databases, the controller could connect to multiple distinct databases. Using some simple extension to SQL you could specify what database your table is in (or the controller would know what table is in what database as long as there aren't duplicate table names). The performance would probably suck but it would make the client that needs this type of functionality much easier to write.
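
      A minimal Python sketch of that idea, with the join done in the controller. Everything here is hypothetical (the `TableRouter` class, the equi-join on the first column, the SQLite backends), and table and column names are trusted identifiers pasted straight into the SQL, so this is an illustration and not something to point at untrusted input.

```python
import sqlite3
from collections import defaultdict


class TableRouter:
    """Toy controller that knows which backend database holds each table."""

    def __init__(self):
        self._backend_for_table = {}

    def register(self, table: str, backend: sqlite3.Connection) -> None:
        """Record the table's home; duplicate table names are rejected."""
        if table in self._backend_for_table:
            raise ValueError(f"duplicate table name: {table}")
        self._backend_for_table[table] = backend

    def fetch(self, table: str, columns: list) -> list:
        """Pull the requested columns of a whole table from its backend."""
        backend = self._backend_for_table[table]
        column_list = ", ".join(columns)
        return backend.execute(f"SELECT {column_list} FROM {table}").fetchall()

    def join(self, left_table, left_columns, right_table, right_columns) -> list:
        """Equi-join on the first column of each side, inside the controller."""
        index = defaultdict(list)
        for row in self.fetch(right_table, right_columns):
            index[row[0]].append(row)
        return [
            left_row + right_row[1:]
            for left_row in self.fetch(left_table, left_columns)
            for right_row in index.get(left_row[0], [])
        ]
```

      Note that `fetch` ships every row of both tables to the controller before any matching happens, which is exactly the "lots of unused data" cost mentioned above.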

    4. Re:How about a meta-database adapter? by Orgg · · Score: 2, Informative

      Your problem wouldn't be solved with the product mentioned in the story. However, because you are using MS SQL Server, this is really easy. You just need to get the PostgreSQL ODBC driver and set up a Linked Server on the MS box.

      Check out This page for the postgresql ODBC Driver.

      You should also look at the linked servers documentation in SQL Server Books Online (under sp_addlinkedserver) as well as the interface in enterprise manager (security -> linked servers)

      As I was searching a bit, I see that people have had trouble using the server.database.owner.object syntax, however using the SQL Server OpenQuery(servername, query) function seems to work, and will allow you to control the exact SQL Statement sent to postgres.

    5. Re:How about a meta-database adapter? by gillbates · · Score: 0

      IIRC, Microsoft Access 2000 can already do this with linked tables. You can link one set of tables to a database on one server, and another set to a database on another server.

      As for your PostgreSQL database, that would require some work. Here's what you would do:
      1. Convert (or dump) your PostgreSQL tables to comma delimited text files on the server.
      2. Use Samba to share the directory to which these tables are dumped.
      3. Map this directory to a drive letter on your MS boxen.
      4. Use the VBA part of Access 2000 to open and read these files; use DAO (or whatever) to build local tables from same on a regular basis.
      5. Develop a plan for synchronizing the two sets of tables. If you really didn't care whatsoever about performance, you could use your VBA code to read these files over the network every time the user does a query.
      Hope this helps
      --
      The society for a thought-free internet welcomes you.
    6. Re:How about a meta-database adapter? by LeBleu · · Score: 2, Informative

      Actually, if you look at RAIDb-0, it is very close to this, maybe even identical. They show having different tables on different database servers. They also indicate that C-JDBC can be used without modifications to the application. This would imply that if you get a JDBC driver for MSSQL, a JDBC driver for PostgreSQL, and write your code using JDBC, you should be able to do the type of selects you are talking about.

      --
      --LeBleu

      If you're reading this you're part of the mass hallucination that is Kevin the Blue.

    7. Re:How about a meta-database adapter? by grassy_knoll · · Score: 1

      What you're talking about is common with oracle. You can make a select-only query against two databases via DBLINK. With local ( or public ) synonyms the remote tables look like they're a part of your schema.

    8. Re:How about a meta-database adapter? by Anonymous Coward · · Score: 0

      One would have to be a real Access "expert" to come up with a plan that retarded.

      If one were going through the process of moving data around (rather than linking servers), why not just import it all into MSSQL or Postgres and run the queries directly? There's no need for Access or Samba or CSV files at all.

      MSSQL even comes with a tool called "DTS" which allows you to do this on a scheduled basis.

    9. Re:How about a meta-database adapter? by Anonymous Coward · · Score: 0

      Microsoft Access is an aberration of nature and nothing short of a high-powered version of Excel.

      The process you just described sounds like going to the doctor to have your tonsils removed via the rectum.

    10. Re:How about a meta-database adapter? by Anonymous Coward · · Score: 0
      If one were going through the process of moving data around (rather than linking servers), why not just import it all into MSSQL or Postgres and run the queries directly? There's no need for Access or Samba or CSV files at all.

      I guess you forgot to read the first line of my post, which explicitly mentioned linking the tables. But lacking the ability to link, you could use VBA to import the data. In fact, there are some situations in which it would be preferable to use VBA to explicitly import/export data rather than using the import tools:

      1. Not all queries are read-only. You might need to update a value on the PostgreSQL server, and VBA can do this by updating the remote text file. Importing the files will only update the local copy.
      2. Access can't import arbitrarily large files - sometimes it will have problems, and when it does, it abandons the entire process.
      3. You might not need to import all of a given table. Rather than moving the whole table between servers, you can move only the parts you need. And, you can pick and choose which records to import, and which to exclude.
      4. You might want to continue to import records even if there are some bad ones.
      5. If the database becomes corrupted, it is possible that the linked tables would be corrupted as well. Linking tables places all of your eggs in one basket - for security reasons, you might want to have more explicit control over what happens to the other data.
      Access is cheap, and it's out there. Obviously, this guy wouldn't be running both MSSQL and PostgreSQL as a matter of choice - he's basically working with what he's got. Which is the way things are often done in the real world; you get mixed platforms and less-than-ideal situations. And this sometimes dictates less-than-ideal solutions.

      Honestly, I've had to clean up after the likes of you - you know, the folks who think database design is just running a wizard. It's not that simple, folks. Not every database problem fits into the Microsoft database model mold.

      For example, I work for a financial firm. We take memberships, and every employee is a member. Obviously, I shouldn't be able to see financial information for fellow employees, but I need to see information for our members to do my job. If we were to use Microsoft tools to import tables from the remote servers, then I could see sensitive financial information about fellow employees, which is a security violation. OTOH, if my Access database used VBA code to import the list of members, then we could exclude all company employees rather easily. So the solution is _NOT_ to use Microsoft's import tools, but rather to use VBA code to import and export the relevant tables. This way, we can enforce security.

    11. Re:How about a meta-database adapter? by Anonymous Coward · · Score: 2, Insightful

      > "You might need to update a value on the PostgreSQL server, and VBA can do this by updating the remote text file."

      You complete fucking moron. You are suggesting using shared text files as a database server. Hello? Anyone home? That's even worse than Jet! What about locking, integrity, and so on? Damn you are stupid.

      > "Access can't import arbitrarily large files"
      Don't use Access. Duh.

      > "You might not need to import all of a given table"
      Don't import the whole table. Duh.

      > "If the database becomes corrupted"
      Get your DBA to fix the database corruption. Duh. This is only a common problem for glue-sniffing Access bozos like yourself.

      > "Obviously, I shouldn't be able to see financial information for fellow employees, but I need to see information for our members to do my job"

      A) VBA does not solve this problem - there's no security model there at all.
      B) What you need is called a "View". You can learn more by taking SQL 101 at the local community college.
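
      For readers who have not met views, here is a minimal sketch of the idea, using Python's built-in sqlite3 module as a stand-in database. The table, column, and view names are invented for illustration. SQLite has no user permissions, so the part where staff accounts are granted access only to the view is an assumption about the target RDBMS and is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE members (
        id INTEGER PRIMARY KEY,
        name TEXT,
        balance REAL,
        is_employee INTEGER
    );
    INSERT INTO members (name, balance, is_employee) VALUES
        ('Ann', 1200.0, 0),
        ('Bob', 950.0, 1),
        ('Cho', 40.5, 0);
    -- Staff tools would query this view instead of the base table.
    CREATE VIEW non_employee_members AS
        SELECT id, name, balance FROM members WHERE is_employee = 0;
""")

# Employee rows never appear in results read through the view.
visible = conn.execute(
    "SELECT name FROM non_employee_members ORDER BY name"
).fetchall()
```

      The filtering lives in the database itself, so every client sees the same restricted rows no matter which tool it uses to connect.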

      > "I've had to clean up after the likes of you - you know, the folks who think database design is just running a wizard. Not every database problem fits into the Microsoft database model mold"

      This from an Access programmer who is stupefied by basic RDBMS concepts. Yes, please please "clean-up" my stuff by introducing MS Access and updatable CSV kludges into the mix. That will really help tons. (To make you a tiny bit more informed, DTS is actually an API, although it does come with a wizard for MS Access dinks like yourself)

      After you complete your elite mastery of VBA, I recommend reading up on the tools found in MS-SQL or another RDBMS.

    12. Re:How about a meta-database adapter? by gillbates · · Score: 1

      My, how I feed the trolls...

      This from an Access programmer who is stupefied by basic RDBMS concepts

      No, actually I'm a database programmer who's used a variety of products, from DB2 (and others, unfortunately) on the mainframe, to Access on the client. My original solution was contingent on the original poster's problem of not being able to join tables across different databases - a significant problem. Obviously, if linking tables were an option, that would be the first choice. But what do you do when you've got two databases that, in essence, won't talk to each other? Running SQL is nice, provided you've got connectivity, but all the RDBMS knowledge in the world is useless if you can't get the two database servers talking to each other. And if you can, then joining tables across the two systems would have been a trivial exercise - one that wouldn't have merited such a post.

      Sometimes you've got to work with what you've got. And if you can't implement a database system with text files, then you aren't much of a programmer. Here's a hint: if you need locks, add another field - this is the way some of the older mainframe systems handle locking.

      Right now, I'm doing work for a company that can't afford anything but Access. And yet, they expect their systems to have mainframe class reliability. Granted, Access is not my first choice, and I've told them this. Now I know what you'd tell them - get a real database. And I know what they'd tell you - get another job.

      Good database design is not merely plugging SQL code into a database server. There are bigger issues to consider, such as cost. Small businesses simply can't afford the more robust databases, and if you can't work with what they've got, they'll hire someone else.

      --
      The society for a thought-free internet welcomes you.
  15. If this can make the free databases scalable... by 1337_h4x0r · · Score: 0, Troll

    it'll put some of those high-dollar Oracle admins out of business. Oracle has been in the business of changing the entire support method of their database every revision, and releasing numerous revisions, to keep their DBAs in business. I've been waiting for something that can do what Oracle can do that doesn't come with such a pricetag. This is awesome!

    1. Re:If this can make the free databases scalable... by Anonymous Coward · · Score: 0

      Try using it before you call it awesome. Database replication isn't as simple as "write to multiple servers" and if your data gets out of sync who ya gonna call?
      This is a simple solution for simple systems.

  16. More info on transactions by binaryDigit · · Score: 3, Interesting

    Maybe I missed it, but their info is pretty sparse on how they handle updates (i.e. adds/deletes/updates). Does it do two-phase commit, so that if I'm striping data and one of the updates fails then everything fails? If they are replicating, will they automatically update replication servers if they are down at the time of the update? If one of the databases in the RAIDb doesn't support online backups and it's backing up, what will their system do? After all, this would be the true grunt work; without these features, what they have isn't a big deal at all. Does anyone have more info?

    1. Re:More info on transactions by grugruto · · Score: 3, Informative

      The C-JDBC controller embeds a recovery log that allows backends to recover from failures (check the recovery log part in the doc).
      If one backend fails in the cluster, it is automatically disabled and the controller always ensures that data that are sent back to the application are consistent.
      By the way, you can tune how you want distributed queries to complete (return as soon as the first node has committed, wait for a majority or, safer, wait for all nodes to commit). There are many options that help tune the performance/safety tradeoff.
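
      To make the first/majority/all tradeoff concrete, here is a rough Python sketch. It is not C-JDBC code or configuration: the policy names, `required_acks`, and `commit_on_all` are invented for illustration, and each backend is just a callable standing in for "commit on that node".

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def required_acks(policy, n_backends):
    """How many backend commits to wait for under each (hypothetical) policy name."""
    if policy == "first":
        return 1
    if policy == "majority":
        return n_backends // 2 + 1
    if policy == "all":
        return n_backends
    raise ValueError(f"unknown policy: {policy}")


def commit_on_all(backends, policy):
    """Send commit to every backend; return the names that acknowledged
    before the policy was satisfied. Raises if too few backends succeed."""
    needed = required_acks(policy, len(backends))
    acked = []
    pool = ThreadPoolExecutor(max_workers=len(backends))
    futures = {pool.submit(commit): name for name, commit in backends.items()}
    try:
        for future in as_completed(futures):
            try:
                future.result()
            except Exception:
                continue  # a failed backend simply does not count as an ack
            acked.append(futures[future])
            if len(acked) >= needed:
                return acked
    finally:
        # Slower backends keep committing in the background.
        pool.shutdown(wait=False)
    raise RuntimeError(f"only {len(acked)} of {needed} required commits succeeded")
```

      With "first" the caller gets control back after one acknowledgement, which is fast but leaves a window where the other replicas lag; "all" closes that window at the cost of waiting for the slowest node.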

    2. Re:More info on transactions by j3110 · · Score: 1

      I was wondering the same thing.

      You must have a two-phase commit in order to maintain ACID compliance. (or you'll have to pre-lock rows which will cost more performance than you'll gain)

      Since you can't implement a two-phase commit at the driver level, you would have to run a pass-through server that handles the two-phase commit. You would need to distribute this server so that a client could connect to any one server in order to get any kind of redundancy.

      I bet this was a lot of really hard work to do if it works right. Also, I wouldn't expect it to support stored procedures or database specific functions the way you would expect either. (I'm betting in another version they'll support putting stored procedures on the C-JDBC side of things.)

      --
      Karma Clown
  17. Why? by Anonymous Coward · · Score: 2, Insightful

    Why do masses need database clusters? Does anyone apart from mid-large sized businesses need one?

    1. Re:Why? by AlecC · · Score: 1

      Reliability. You may have quite a small, relatively low loaded database for your small business. But if your business depends on quick response, you want 24/7 uptime. If somebody asks at a shop if you have something in stock or checks a reservation at a hotel, you want to be able to say yes/no quickly. How many times do we go elsewhere if someone says "Sorry, the computer is down"? I got that at my doctor's the other day - due to building works, the one computer with the appointments on it had been powered off. OK, one PC will host the database without breaking into a sweat. But how reliable is that PC? Well, you can get UPS, RAID drives etc. etc, which make failures less likely, but they cost money and are not infallible. Or how about you use this to cluster (for free) across all the PCs you have sitting around, mostly running screen savers? Orders of magnitude reliability increase for zero bucks.

      --
      Consciousness is an illusion caused by an excess of self consciousness.
    2. Re:Why? by grugruto · · Score: 2, Interesting
      You have your web site backed by an open source database?
      Just put a replica on a second node and you will have fault tolerance (even just for maintenance) and you will be able to handle peak loads. 2 nodes is already a cluster; you don't need to have hundreds of nodes.

      Another usage could be to keep a single Oracle instance and put a bunch of open-source databases to offload your main Oracle database. You could have all the write queries (orders, ...) handled by your [safe] main Oracle database and have all other open-source databases handle the read requests for browsing your web site (which is the main part of the load). What do you think of this idea of scaling Oracle with open-source databases?

    3. Re:Why? by Anonymous Coward · · Score: 0

      Am I the only one who thinks that making a desktop computer run a critical database sounds like a bad idea? (If nothing else, consider security)

    4. Re:Why? by Anonymous Coward · · Score: 0

      To the first point...

      Yes, you will have fault tolerance, but consider a 1-10 person business... do they really need 100% uptime, even at a low cost of a few thousand bucks (electricity and hardware)? In most cases, the answer is no.

      The idea of supporting an Oracle database is a good one. But it completely missed my point: I claimed that the masses don't really need database clusters, and anyone running an Oracle database that is under extreme load is quite definitely NOT a part of the masses.

    5. Re:Why? by sharkey · · Score: 1
      Why do masses need database clusters?

      So that there is no chance of worshipers not knowing which hymn is next due to a node failure?

      --
      "Outlook not so good." That magic 8-ball knows everything! I'll ask about Exchange Server next.
  18. Re:I'm 100% Confident by grugruto · · Score: 2, Informative

    What you missed is that this thing only forwards SQL requests. Therefore you can also build clusters of Oracle if you want. You will not miss any Oracle feature this way.
    When you look at Oracle pricing policy, you can have Oracle RAC for the price of just Oracle (+ a free RAIDb), which is already a 50% discount!

  19. supposed to be at RDMS level by Arethan · · Score: 4, Insightful

    Isn't clustering supposed to be a function of the database system, not the software you use to access it?

    I mean, this is neat and all, but I really don't want to have to use this interface just so that I can cluster my database. You're much better off placing clustering functions within the database itself. Then you can access the data by any method (ODBC, native libraries, hell even with the provided command line interface).

    Take a look at how MS SQL Server performs clustering sometime. Everything (and I mean EVERYTHING) is performed via triggers and tsql. All the clustering setup does is set up a bunch of known working trigger scripts to propagate the data. You can even edit them to your liking afterwards if you wish. Now I'm not saying that MS's solution for clustering is the cat's ass. Personally, I think it is kind of hackish, but then again I believe that clustering should be something you simply turn on, and shouldn't be able to fuss with. Realistically, I can't think of any good reason to change the cookie cutter tsql scripts that perform the clustering, so I only see the ability to modify them as a potential way to fsck it up (that being an obviously bad thing).

    Clustering really isn't that hard to implement. I'm pretty surprised that MySQL and Postgres don't have better support for it. Especially Postgres, since transaction support is really the one big key that makes clustering possible. Maybe no one has really had an itch to make it happen yet. Hopefully it will happen soon, since I'd love clustering to be another argument for why OSS databases can play with the big kids just as easily.

    1. Re:supposed to be at RDMS level by Vihai · · Score: 3, Insightful

      You are right: clustering is not only better implemented in the DBMS itself, it actually NEEDS support from the DBMS.

      You are wrong saying that implementing clustering isn't hard.

      If we are talking about REAL DBMSes (no, MySQL is not a real DBMS), enabling every form of clustering which maintains the ACID properties we expect from a DBMS is a major step: it means becoming a distributed application, and it is one of the most complex things to implement.

      Just for example, suppose you have two machines in a master-to-master configuration. Suddenly the network becomes partitioned, and each server thinks that the other is offline, but the clients can reach both of them.

      Suppose now that the clients update the same record on the two servers in an incompatible way... you could imagine what will happen when the servers become visible to each other again...

    2. Re:supposed to be at RDMS level by pmz · · Score: 1

      Take a look at how MS SQL Server performs clustering sometime. Everything (and I mean EVERYTHING) is performed via triggers and tsql.

      Does MS SQL put those triggers into your applications database tables or in some sort of separate system table? If clustering in MS SQL requires mucking up the application's own data structures, then MS SQL must be worse than I already thought.

    3. Re:supposed to be at RDMS level by maraist · · Score: 1

      There already is clustering support in postgres via triggers. Problem is that it's still alpha/beta.. The problem (as other replies have stated) is that the job is nefarious, and it's possible to massively corrupt the data.

      Think of RAID as its hard-drive counterpart.. Data-integrity could be most efficiently handled at the hard-drive layer.. Having multiple redundant controllers and disks, etc. It would be a generic disk as far as the SCSI/IDE card was concerned.. But it turns out to not be the cheapest solution. Having several "cheap" fully redundant (in terms of controller and IO/cables) machines with a relatively simple redundant dispatcher works very nicely. The fact that you can use proven off-the-shelf drives is also a boon.. You don't have to pay for low-volume high-value (and thus high cost) drives which may not even be supported 10 years from now.

      The same is true with database clustering.. If you choose postgres clustering solution A.. It may lose support (in a feature-lacking state) and you'd have to port your data to solution B later on.. Moreover, you restrict yourself to the types of operations you can perform.

      Theoretically by making fully redundant lower layers, your system is as reliable as the lowest layer in a simple-mode (which is generally pretty reliable). Moreover, thanks to JDBC, you can swap out entire databases without changing your API.

      It is true that swapping C-JDBC to something else requires a code-change, but remarkably, not a database-format change. We don't have to worry about "upgrading" the clustering triggers or any of that mess. Moreover, my first glance suggests to me that c-jdbc is jdbc-like and thus shouldn't be too bad to swap out with cluster-solution B.

      If we were doing clustering at the database level, we couldn't use the RAID mnemonic, since in many respects it wouldn't be "Inexpensive" (at least in terms of development/maintenance time).

      --
      -Michael
    4. Re:supposed to be at RDMS level by Arethan · · Score: 1

      Suppose now that the clients update the same record on the two servers in an incompatible way... you could imagine what will happen when the servers become visible to each other again...

      This is why you have transaction logs that are timestamped. When the systems resync, they merge their transaction logs, roll back to the last synced state, and then re-execute every transaction until they are current. The end result is that the newer row updates will overwrite the older row updates. This may or may not be the desired effect, so you must allow this operation to be turned on or off. What most systems do is forcefully synchronize all write operations amongst master servers. By this I mean that if a master server becomes unreachable, the update query is blocked until either it becomes available again or times out with an error.
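
      A toy version of that merge-and-replay step, to show why the newer update ends up winning. This is only a sketch: the log format (timestamp, row id, new value) and the function names are made up, real logs carry whole transactions rather than single values, and it assumes the two servers' clocks are comparable.

```python
import heapq


def merge_logs(log_a, log_b):
    """Merge two per-server logs, each already sorted by timestamp."""
    return list(heapq.merge(log_a, log_b, key=lambda entry: entry[0]))


def replay(last_synced_state, merged_log):
    """Roll forward from the last synced state; later entries overwrite earlier ones."""
    state = dict(last_synced_state)
    for _timestamp, row_id, value in merged_log:
        state[row_id] = value
    return state


# Each entry is (timestamp, row_id, new_value), written while partitioned.
server_a_log = [(1, "row1", "a@1"), (4, "row1", "a@4")]
server_b_log = [(2, "row1", "b@2"), (3, "row2", "b@3")]

merged = merge_logs(server_a_log, server_b_log)
final_state = replay({"row1": "synced"}, merged)
```

      Note that server B's update to row1 is silently discarded here, which is exactly the behaviour you may want to be able to switch off.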

      Most problems with replication, though, have more to do with poor software design than the database. In order to make the multiple-masters system work correctly in an uncertain environment (pretty much every case), you should really have two tables. One local, and one global. The local table is non-replicated and queues operations to be posted to the global (replicated) table, and a background process runs on the local server to try to push them to the global table as connectivity permits. This allows you to have pending operations, and also allows you to have a conflict table to store conflicting updates, which can then be later resolved by flesh-n-blood v1.0. Also, SQL queries that insert into one table and then update or insert another using the last insert id from the previous are often written incorrectly. You should really never call the db's function for last insert as part of a separate select, and then use the resulting hard number in the next table's insert/update. Instead, you should incorporate the function call into the 2nd table's query, so that if the insert occurs on an unsynced system, the result of replaying the transaction log will still result in an accurate id number (thus preserving the row relationships). The id may have been changed, but at least it will still have the relation intact. Obviously, all of this needs to occur within a single transaction to make it work. Coincidentally, because of this, the id column makes for a poor unique row id if it is an integer or some magically created string by the client software. You're much better off using GUIDs (or UUIDs on Linux), as they are based on a more unique algorithm (usually, high res date/time and NIC MAC & some random number).

      On a related side note, 'master server' is kind of a bad label to use, since true RDBMS's have better granularity than that. Usually you can set master/slave properties at the table level, and still most even let you set master/slave at the row level (as defined by a where clause), as well as published data at the column level (as defined by a non-joining select statement).

      In all honesty, I do agree with your argument about replication not being easy, but I take the stance that the truly hard (and critical) part is writing software that correctly handles being in a replicating environment. It's easy to write software that uses a database backend, but it can be a bitch to make it run correctly in all cases in a replicating environment. There is only so much that DB replicating can do for you; the rest is up to the software to keep the data sane during disconnected states.
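
      A small illustration of the last-insert-id point, using Python's sqlite3 module as a stand-in. SQLite's `last_insert_rowid()` plays the role of "the db's function for last insert"; other databases spell it differently. The tables and the `new_row_id` helper are invented for the example.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY AUTOINCREMENT, customer TEXT)")
conn.execute("CREATE TABLE order_items (order_id INTEGER, sku TEXT)")

with conn:  # both inserts commit (or roll back) together
    conn.execute("INSERT INTO orders (customer) VALUES (?)", ("alice",))
    # The id is resolved inside the statement itself, so replaying these two
    # statements elsewhere still links the item to whatever id the order gets.
    conn.execute(
        "INSERT INTO order_items (order_id, sku) VALUES (last_insert_rowid(), ?)",
        ("widget",),
    )

joined = conn.execute(
    "SELECT o.customer, i.sku FROM orders o JOIN order_items i ON i.order_id = o.id"
).fetchone()


def new_row_id():
    """Time-and-node-based UUID, usable as a key that does not depend on insert order."""
    return str(uuid.uuid1())
```

      The anti-pattern would be to SELECT the id first, get back a literal such as 1, and paste that literal into the second INSERT.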

    5. Re:supposed to be at RDMS level by rsax · · Score: 1
      I'm pretty surprised that MySQL and Postgres don't have better support for it. Especially Postgres, since transaction support is really the one big key that makes clustering possible. Maybe no one has really had an itch to make it happen yet.

      The itch is there and people are scratching away. No idea how long it will take for it to be incorporated into a stable release though.

    6. Re:supposed to be at RDMS level by grugruto · · Score: 1

      Maraist is right, C-JDBC is database independent and can even support clusters of heterogeneous databases (as long as they can understand the same SQL subset).
      In fact, scheduling the requests in front of the database usually performs better than just letting the database do the locking (check this article). The bet is that with this solution we can have a generic way to provide clustering solutions: it is much easier than implementing it inside an RDBMS engine (see Postgres-R work) and can perform at least as well as the DB-specific implementation.

  20. "Shared-Nothing Architecture" by Anonymous Coward · · Score: 2, Insightful

    The commercial databases that have been doing this for years are DB2, Informix, and Teradata.

    Know what? There are a ton of deep issues beyond just making the different partitions transparent to the application level. Think about joins across partitions for sec...

    1. Re:"Shared-Nothing Architecture" by maraist · · Score: 1

      There are a ton of deep issues beyond just making the different partitions transparent to the application level. Think about joins across partitions for sec...

      Don't see how it limits you here. If you have n fully redundant RDB's, a single controller, and m clients, the dispatcher load balances you such that if all m clients are performing non-destructive reads, they all read from different machines.. (preventing resource starvation). Each machine either puts everything on one disk or segments the data across multiple disks (by table, or otherwise). But the key is that all DBs do this the same.

      As for writes, that "shouldn't" be a problem either, since all you need do is simultaneously (via multi-threading) perform a write-request to each machine. It shouldn't be any different than if there were only a single machine.
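
      Roughly what I mean, as a Python sketch rather than anything taken from C-JDBC. The `ReadOneWriteAll` class is made up, the backends are in-memory sqlite3 connections standing in for separate machines, and writes are sent one after another here instead of in parallel threads to keep it short.

```python
import itertools
import sqlite3


class ReadOneWriteAll:
    """Full replication: every backend holds every table."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._next_reader = itertools.cycle(self.backends)

    def write(self, sql, params=()):
        # Every replica must see every write, in the same order.
        for db in self.backends:
            db.execute(sql, params)
            db.commit()

    def read(self, sql, params=()):
        # Reads are spread round-robin, so joins run locally on one replica.
        db = next(self._next_reader)
        return db.execute(sql, params).fetchall()


cluster = ReadOneWriteAll(sqlite3.connect(":memory:") for _ in range(3))
cluster.write("CREATE TABLE stock (item TEXT, qty INTEGER)")
cluster.write("INSERT INTO stock VALUES (?, ?)", ("bolt", 12))
answers = [cluster.read("SELECT item, qty FROM stock") for _ in range(3)]
```

      Three consecutive reads land on three different replicas and return the same rows, which is the whole point of keeping the machines identical.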

      The only caveat is when two machines don't respond identically.. But if the machines are fully redundant (similar hardware/disk space/memory space) then I doubt that would happen (obviously it's not impossible though).

      I can see problems such as if a C-JDBC connection is cancelled or the controller crashes, then the databases could be left in a bad state (with some DB's completing sooner than others).

      Theoretically, the controller could use a redundant journaling system to recover (knowing what to "try" and rollback).. Moreover, cancelled c-jdbc connections can simply try and complete their tasks, regardless of the connection problems (this is a policy issue). But a crash in the single controller could be akin to a crash in the database - warranting some manual data-recovery.

      I'm not saying that there aren't deep issues, but at least I don't see weight to your join problem.

      The issues that DB2/Oracle run into are making clustering transparent. c-jdbc is already demonstrating some API limitations. So it's obviously not a general purpose solution.. But it still suits many needs (Code, Fast, (/good))

      --
      -Michael
  21. Slightly Offtopic.... by frodo+from+middle+ea · · Score: 3, Insightful

    But, seriously, do you see Oracle/DB2 etc. customers suddenly jumping on this?
    My view is that it may be difficult to migrate OSes or even hardware, but it's almost darn impossible to migrate existing databases.
    A database is the most fundamental and most cared about aspect of a major business. There is a lot of time and effort and MONEY spent to incorporate it into the company.
    Lots and lots of critical business applications are written using the proprietary extensions of these vendors. Is it very easy to migrate this code?
    It may be interesting for a future pilot project, but if you expect businesses to change their database vendors.. that's not going to happen very soon.

    --
    for the last time people, I am "frodo from middle eaRTH", not "middle eaST".
    1. Re:Slightly Offtopic.... by puppetman · · Score: 1

      It's easy, and it's not easy.

      If you have a lot of PL/SQL stored procs, and you are moving to MySQL (no stored procedures yet, no PL/SQL) then it's tough.

      If you are moving to Postgres, then it gets easier.

      It really depends on how you coded your application. Even if you use a bit of non-standard SQL, there are usually equivalents.

    2. Re:Slightly Offtopic.... by grugruto · · Score: 1
      When Linux came out, did anyone take it seriously? Did you see customers suddenly jumping on it? Didn't all commercial Unixes already have all the features that Linux was just starting to have?

      As someone already mentioned, C-JDBC and RAIDb are certainly not ready for prime time, but at least it is worth debating about it!

  22. How does clustering improve performance? by snatchitup · · Score: 1, Interesting

    Just curious.

    How do you join one table to another when they are on two separate boxes?

    Well, I know how to actually use SQL to join two tables from two separate databases. But what is actually happening inside the RDBMS at the low level? Does one just bring over the entire other table? How does it use indexes?

    Seems to me this really is doing at best, a reference implementation that may actually degrade performance.

    1. Re:How does clustering improve performance? by maraist · · Score: 1

      How do you join one table to another when they are on two separate boxes?

      Several people have asked this question. Have you looked at the white-paper? It's possible to do RAID-1 which is m fully redundant DBs with all tables being fully accessible from a given DB. In RAID-1, therefore, there is "zero" problem with joins / updates / transactions, because it's literally just pretending to be accessing a single machine.. (I quoted zero because you might have synchronization issues if one machine somehow responds differently to updates).

      Not having looked at the code, it would seem that the Raid 0,2 solutions would require at least separating tables based on lack of Foreign key references and potential joins. I am guessing that you can't do the joins, (or at least not do them efficiently if the controller is actually munging the data). The key, however, is that you - the admin - are choosing which tables go on which machines. So we're not talking about truly striping the data as in RAID-0 where pseudo-randomly some data goes here and some goes there without user intervention.

      --
      -Michael
    2. Re:How does clustering improve performance? by harryk · · Score: 1

      From my understanding, this is more of offering redundancy and availability than it is offering performance.

      --
      think before you write, it'll save me moderator points.
    3. Re:How does clustering improve performance? by CustomDesigned · · Score: 1
      Run the queries in parallel on each server, then merge the results. Remember that SQL tables are sets. The merging overhead is proportional to the size of the result set - and independent of the size of the tables being queried. As long as your result set stays small, you can throw an unlimited number of servers at parallelizing the query with linear performance gain.

      Here is a simple example to help understanding and spark some political debate at the same time. :-) You have detailed ATM card transactions for 300 million Americans. The detail records are distributed via a hash function over 1000 database servers. Each server has the same schema - but the tables on each server have a different subset of the records for that table. Each server independently looks for records of visiting a particular ATM on a particular day, and forwards those records to a merge processor. The result set is small, so the merge processing is insignificant, and the query gets a 1000X speedup.

      When inserting a record, the hash function determines which server to send it to for insertion. This randomized distribution is necessary to keep the number of records on each server roughly equal without requiring feed back from the servers. (Although tracking size of each table on each server is also reasonable.)
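
      In code, the scheme looks roughly like this. It is a single-process Python sketch with invented names (`shard_for`, `ShardedTable`); the "servers" are plain lists, and the per-shard scans that would run in parallel on real hardware are done in a loop.

```python
import hashlib


def shard_for(card_number, n_shards):
    """Stable hash of the card number, reduced to a shard index."""
    digest = hashlib.sha256(card_number.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_shards


class ShardedTable:
    def __init__(self, n_shards):
        self.shards = [[] for _ in range(n_shards)]

    def insert(self, record):
        # The hash alone decides the destination; no feedback from servers needed.
        self.shards[shard_for(record["card"], len(self.shards))].append(record)

    def query(self, predicate):
        # Every shard scans only its own subset; the merge step just
        # concatenates the (small) per-shard result sets.
        results = []
        for shard in self.shards:
            results.extend(r for r in shard if predicate(r))
        return results


table = ShardedTable(n_shards=8)
for i in range(1000):
    table.insert({"card": f"card-{i}", "atm": f"atm-{i % 50}", "day": i % 7})

hits = table.query(lambda r: r["atm"] == "atm-7" and r["day"] == 0)
```

      The merge cost depends on `len(hits)`, not on the 1000 stored records, which is why the speedup holds only while the result set stays small.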

  23. TLD by Anonymous Coward · · Score: 0

    Well, some of the top level domain controllers (at least .org I think) use PostgreSQL.

  24. a few complex queries that aren't optimized by Anonymous Coward · · Score: 0

    Well.... umm..... Isn't it good that you found some piss-poor code that needed correcting.

  25. their site is not slashdotted... by kipple · · Score: 1

    ...does it mean that their db really works? (at least, until now..)

    --
    -- There are two kind of sysadmins: Paranoids and Losers. (adapted from D. Bach)
  26. Merge it with J2EE spec by trajano · · Score: 1

    It would be nice if C-JDBC was built into the J2EE spec so all J2EE containers can support this facility.

    It may also have the advantage of using the transactional, load balancing and clustering facilities of the J2EE container as well.

    --
    Archie - CIO-for-hire :-)
  27. DB Clusters of the world, unite! by t0ny · · Score: 1, Funny
    Cluster of databases is no more the privilege of few high-end commercial databases, open-source solutions are striking back!

    Finally, my grandmother can have that database cluster she has been bugging me about.

    --

    Manipulate the moderator system! Mod someone as "overrated" today.

    1. Re:DB Clusters of the world, unite! by Tablizer · · Score: 1

      Finally, my grandmother can have that database cluster she has been bugging me about.

      So you are Grace Hopper's grandchild, eh?

  28. Also new! by Dark+Lord+Seth · · Score: 4, Funny

    RAID -- Redundant Array of Inexpensive Developers

    RAID 0
    Multiple developers work on the same project but none of them has any idea what the other is doing at the same time. One developer failing (caffeine dehydration, severe electrostatic shock, sex, etc) will cause the entire project to screw up and become a mess.

    RAID 1
    Extreme Programming.

    RAID 2
    Inefficient way to keep track of what developers are doing. For every 10 developers, 4 are needed to keep track of them and recover any error by the aforementioned 10 while they don't work together at all. Level of efficiency comparable to a modern government.

    RAID 3
    Equal to RAID 2, except all responsibility for checking the code is now granted to one person. The rest have been cut from the budget. A bit more effective, but considering people still don't cooperate, not too good.

    RAID 4
    Equal to RAID 3, except people are finally working together now. Kinda efficient and fast, except it all still relies on that one person who checks the data.

    RAID 5
    Everyone knows what everyone else is doing, they all work perfectly together and they can easily miss one person because of that.

    1. Re:Also new! by Trygvis · · Score: 1

      R O F L

    2. Re:Also new! by sharkey · · Score: 1
      RAID 0 Multiple developers work on the same project but none of them has any idea what the other is doing at the same time. One developer failing (caffeine dehydration, severe electrostatic shock, sex, etc) will cause the entire project to screw up and become a mess.

      Also known as a "Duke Nukem Forever Array".

      --
      "Outlook not so good." That magic 8-ball knows everything! I'll ask about Exchange Server next.
  29. Re:Sigh - Looks like I have my work cut out for me by gatesh8r · · Score: 1

    Poor me; I have to implement it in COBOL :-(

    --
    Karma whorin' since 1999
  30. Limitations by Anonymous Coward · · Score: 1, Insightful

    It's a good start - but not ready for prime time yet... Stored proc support is essential in a production setting.

    4.4. Current limitations

    The C-JDBC driver currently does not support the following features:

    * XAConnections,
    * updatable ResultSets,
    * callable statements (stored procedures),
    * Blobs,
    * batch updates,
    * multiple controller failover is subject to controller support for distributed virtual databases,
    * JDBC 3.0 features.

  31. Fine-grained caching question by code_nerd · · Score: 2, Interesting
    The user's guide page says this about caching:

    8.7.2. Request cache

    The query cache provides query result caching. If two exact same requests are to be executed, only one is executed and the second one waits until the completion of the first one (this is the default pendingTimeout value which is 0). To prevent the second request to wait forever, a pendingTimeout value in seconds can be defined for the waiting request. If the timeout expires the request is executed in parallel of the first one.

    A request cache element is described as follows:

    The request cache granularity defines how entries are removed from the cache. noInvalidation provides a non-coherent cache and should only be used for testing purposes. table and column provide table-based and column-based invalidations, respectively. columnUnique can optimize requests that select a unique primary key (useful with EJB entity beans).

    Can someone who understands C-JDBC better than I do explain what this might mean? Sounds to me like they are replacing a feature of CMP by doing this, which is not necessarily something that would be "useful with EJB entity beans" if I understand it right (unless maybe they are referring to folks using EJB 1.0?). That is, the container already handles cache-invalidation at a fine-grained level. Perhaps there is a scenario I am not imagining where it would be useful to have this at the database level also... thoughts?
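
    One way to picture the table-level granularity quoted above is the minimal sketch below. It is not C-JDBC's code: the class name `RequestCache` and its methods are hypothetical, and the caller is assumed to supply the list of tables a query reads. The cache sits in front of the database, so a write from any client evicts the affected results.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

Rows = List[tuple]


class RequestCache:
    """Toy result cache keyed by exact SQL text, invalidated per table.

    Hypothetical illustration only; names and behavior are assumptions.
    """

    def __init__(self) -> None:
        # SQL text -> (tables the query reads, cached rows)
        self._entries: Dict[str, Tuple[FrozenSet[str], Rows]] = {}

    def read(self, sql: str, tables: Iterable[str],
             execute: Callable[[str], Rows]) -> Rows:
        """Return cached rows for an identical request, else run the query."""
        entry = self._entries.get(sql)
        if entry is not None:
            return entry[1]
        rows = execute(sql)
        self._entries[sql] = (frozenset(tables), rows)
        return rows

    def write(self, table: str) -> int:
        """Evict every cached result that reads `table`; return how many."""
        stale = [sql for sql, (tables, _) in self._entries.items()
                 if table in tables]
        for sql in stale:
            del self._entries[sql]
        return len(stale)
```

    Column-level invalidation would track columns instead of tables in the same way. Whether this duplicates what an EJB container already does depends on whether every writer goes through that container, which is an assumption the sketch does not settle.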
  32. Perl has had this for years by Anonymous Coward · · Score: 0
  33. Clusters aren't performance? Just not true! by CharlesDarwin · · Score: 3, Informative

    This simply isn't true. Oracle's clustered database solution (9i Real Application Clusters) is designed to increase the ability to gracefully recover from individual node failures. Additionally, it can scale the performance of your database application by increasing the number of CPUs with access to shared storage. For CPU-bound database applications, this technology provides near-linear scalability!

  34. Re:Of course, if mysql had replication worth a dam by KenSeymour · · Score: 1

    Don't forget transparent client switchover when the primary being replicated goes down.

    Replication safeguards the data, the client switchover on the fly provides high availability.

    --
    "We can't solve problems by using the same kind of thinking we used when we created them." -- Albert Einstein
  35. This is not that novel of an idea by FearUncertaintyDoubt · · Score: 1
    We've been using a similar idea for years. It's pretty much using "scaling out" with some application logic to make it useful for high-availability purposes. At one time, we had 13 subscriber databases (MS SQL 6.5) throughout the world, using transactional replication to keep them in sync. A small bit of logic in the front-end determined which server a user would connect to. In this way we could point users at the server geographically closest to them (which was configurable in a database itself).

    Essentially, this seems to be that front-end piece which abstracts the calling app from which server it is connecting to, and can dynamically point that app at another server. I'm sure it will be a handy module for anyone who doesn't want to write their own logic for dynamically determining a connection to a database.

    However, the cost of writing that bit of code is much lower than the overhead of maintaining all those database servers (heterogeneous replication? ugh). So sure, this is helpful, but anyone with enough wherewithal to set up and maintain a set of synchronized database servers probably has enough sense to be able to set up application logic to utilize those servers anyway.

    1. Re:This is not that novel of an idea by afidel · · Score: 1

      Yes, but how did you handle failover in your implementation? If the defined server was unavailable, or was up but unresponsive, what steps were taken? Yes, you can implement all that yourself, but why reinvent the wheel if you don't have to? Having redundant frontends querying redundant databases sounds pretty close to what Apache and load balancing have done for web serving: allowing the use of lots of cheap servers to achieve 24x7x365 operations. I understand databases have some unique properties that don't allow this to work quite as well as it does for web servers, but there are still cost savings to be had.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    2. Re:This is not that novel of an idea by FearUncertaintyDoubt · · Score: 1

      The app is written so that if the primary server is unavailable, it connects to a secondary. It would take only a small amount of work to make it possible to have n number of failovers. It's a nice thing to have a prebuilt implementation of such an animal, but having to write your own is not that bad either -- it's a small cost in comparison to the overall infrastructure needed to support such an environment.
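
      The "n failovers" version of that logic can be sketched roughly as follows. This is a hypothetical illustration, not the poster's code: `connect_with_failover` and the `connect` callable are invented names, and the sketch assumes a failed connection raises `ConnectionError`.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def connect_with_failover(servers: Sequence[str],
                          connect: Callable[[str], T]) -> T:
    """Try each server in priority order; return the first good connection.

    `connect` is any function that opens a connection to one host and
    raises ConnectionError when that host cannot be reached.
    """
    errors: List[str] = []
    for host in servers:
        try:
            return connect(host)
        except ConnectionError as exc:
            # Remember why this host failed, then fall through to the next.
            errors.append(f"{host}: {exc}")
    raise ConnectionError("all servers failed: " + "; ".join(errors))
```

      Note what this does not cover: a server that accepts the connection but then hangs, and resynchronizing a server that comes back after missing writes. Those are the parts afidel is asking about.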

  36. good idea--just not new by g4dget · · Score: 1
    C-JDBC is an open-source software that implements a new concept called RAIDb (Redundant Array of Inexpensive Databases).

    It's good that these are becoming available in open source form, but the concept is not new at all. IBM and Oracle both have had commercial versions for a while (I suppose the "inexpensive" part is new).

  37. load balancing apache? by Anonymous Coward · · Score: 0

    Just out of curiosity, how do you load balance Apache servers? Not counting round robin DNS.

    1. Re:load balancing apache? by Anonymous Coward · · Score: 0

      Three Letters: LVM

    2. Re:load balancing apache? by geekopus · · Score: 1

      You can use PEN and VRRP to get uptime as good as any "commercial" offering.

    3. Re:load balancing apache? by Sxooter · · Score: 1

      Squid with a simple custom redirector works nicely.

      --

      --- It is not the things we do which we regret the most, but the things which we don't do.
  38. Re:Sigh - Looks like I have my work cut out for me by Troll_Kamikaze · · Score: 1

    The Whitespace port is already complete:



  39. Thorough rundown by photon317 · · Score: 4, Informative


    After actually reading the documentation, here's my informed take on this:

    1) In its current incarnation, it's only useful for very, very simple database access. No transactions, no blobs, etc. Basically, if you're just storing some simple weblication tables and doing single statements against them for selects/updates (no big cross-table transactions), you can use it.

    2) It's JDBC only. Perhaps someone could port the concept to ODBC though.

    3) There's a new middle tier between the JDBC driver and the database itself, which is the bulk of their code. This tier actually re-implements some database constructs like recovery logging, query caching, etc. Of course this is necessary, as trying to do replication from the client-code side alone would be impossible (what do you do when one of 3 DB mirrors goes offline for an hour? Have every JDBC client cache the requests and replay them later, hoping those clients are even still around later?)

    For some applications and some companies, in its current state this is a godsend, but it's not a general solution yet. Making it ODBC (or even better, having the front of it emulate a native PostgreSQL or MySQL listener) would broaden its applicability.

    Supporting transactions would be a big win too, although I'm not sure how feasible this is - I think at that point they may as well just write their own new database engine which is parallel from the start, seeing as they'll be re-implementing in their cluster tier almost everything the database server does except for actual physical storage.

    Still, it's nice to see that someone did this and made it work - and for a lot of simple databases behind java apps it's all you really need.

    PostgreSQL has all the transaction support in place already, so of all the free DBs out there it would seem they have the best shot at doing their own native parallelism, if they would just get it done someday.
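
    The recovery-logging idea from point 3 can be sketched as below: the middle tier, not each client, remembers the writes an offline mirror missed and replays them when it returns. This is a toy model under stated assumptions; `Backend`, `Controller` and their methods are hypothetical names, not C-JDBC's API, and writes are plain strings held in memory.

```python
from typing import Dict, List


class Backend:
    """Stand-in for one database mirror."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.applied: List[str] = []  # writes applied so far, in order


class Controller:
    """Middle tier that fans out writes and keeps a recovery log."""

    def __init__(self, backends: List[Backend]) -> None:
        self.backends = backends
        self.log: List[str] = []  # every write, in arrival order
        # Per backend: how many log entries it has already applied.
        self.checkpoint: Dict[str, int] = {b.name: 0 for b in backends}

    def write(self, statement: str) -> None:
        """Log the write, then apply it to every backend that is online."""
        self.log.append(statement)
        for backend in self.backends:
            if backend.online:
                backend.applied.append(statement)
                self.checkpoint[backend.name] = len(self.log)

    def recover(self, backend: Backend) -> int:
        """Replay the writes a backend missed and mark it online again."""
        missed = self.log[self.checkpoint[backend.name]:]
        backend.applied.extend(missed)
        self.checkpoint[backend.name] = len(self.log)
        backend.online = True
        return len(missed)
```

    A real implementation would have to persist the log and block or queue new writes during the replay; the sketch ignores both.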

    --
    11*43+456^2
    1. Re:Thorough rundown by grugruto · · Score: 1

      C-JDBC fully supports transactions (else it would be completely useless). The only thing that is not supported is distributed transactions (a transaction that would span over different clusters).
      XA connections for C-JDBC are under development but standard transactions are fully supported.

    2. Re:Thorough rundown by photon317 · · Score: 1


      I'm not a JDBC user, but I assumed "XA Connections" meant standard transactions. Isn't XA short for TransAction? And what exactly is a distributed transaction? If I were writing an app that needed to do a transaction across multiple separate databases (well, for one I might think those databases need to be one, but suppose that's out of my control...), I would simply open two transactions, and make sure all the statements of both have completed successfully before I commit them both. Is a distributed transaction just an automation of that, or is there something more to it?

      --
      11*43+456^2
    3. Re:Thorough rundown by grugruto · · Score: 1
      ... I assumed "XA Connections" meant standard transactions.

      No, XA stands for X/Open Distributed Transaction Processing (DTP). If you access multiple data sources, you need 2-phase commit to ensure that either all sources commit or all roll back. It is not acceptable that some sources commit and others roll back in a distributed transaction. There is a lot of literature available on the web about this.
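
      A bare-bones sketch of the two-phase commit idea, for readers who have not met it. The names `Participant` and `two_phase_commit` are hypothetical, and the sketch deliberately leaves out what makes the real protocol hard: durable logging of the "prepared" state and recovery when the coordinator itself crashes between the two phases.

```python
from typing import List


class Participant:
    """Stand-in for one data source taking part in the transaction."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self) -> bool:
        """Phase 1: vote. A yes vote is a promise to commit if asked."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "rolled_back"


def two_phase_commit(participants: List[Participant]) -> bool:
    """Commit everywhere only if every participant votes yes."""
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: apply the same outcome to all of them.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False
```

      The difference from simply committing two open transactions one after the other is the prepare step: once a source has voted yes it may no longer refuse, so the second commit cannot fail after the first has already succeeded.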

    4. Re:Thorough rundown by photon317 · · Score: 1


      So X/Open Distributed Transaction Processing is essentially what I was referring to then? Making sure all "sources" are ok before committing them all, or rolling them all back if any fails? Hardly seems worthy of a designation and a standard, just sounds like logic to me.

      --
      11*43+456^2
    5. Re:Thorough rundown by grugruto · · Score: 1

      Fortunately, not all transactions are distributed, and that makes life much simpler. In most applications, transactions deal with a single datasource at a time and don't need distributed transactions.

  40. Tried this before... its a tough sell by Jboy_24 · · Score: 1

    I once worked for an open source company that tried creating something like this in Perl. We did so to try to lure customers from Oracle and prove that open source could handle massive databases. But... we found many problems when trying to sell this over Oracle to experienced customers.

    1st... multiple points of failure. By increasing the number of databases you're increasing the potential points of failure. What features are there to automatically back up data? If the data is spread randomly across the DBs and one of the drives or servers dies, what failover is there? Will the other databases take over? In a cost/risk analysis, is this really the cheapest way?

    2nd... Is any speed increase from multiple databases going to be more than the speed increase from just upgrading the database server? More/faster disks, more processors, etc. Sticking to one machine allows you to use the fault tolerance built into the RAID controller or the server itself. You could argue that once you get to the fastest hardware you need to go with more machines, but at that point you might need to look at your application. A quad Xeon 2.2GHz with GBs of memory and a NetApp disk array is going to be powerful enough for a lot of apps.

    3rd... Is this really faster? With simple SQL queries it might be, but what about complex joins etc.? Since this lies in front of the DBs, what about stored procedures etc.?

    The only real application that I could see this for is a small ecommerce website that needs to have millions of potential products to sell (electronics supply store etc.). Something where the data needing replicating is static and is imported.

    And as far as eliminating the need for a high-priced Oracle DBA, someone able to support an array of 8-10 MySQL databases using this technology is going to be both high priced and hard to find.

    1. Re:Tried this before... its a tough sell by jfroebe · · Score: 1

      I concur. There seems to be a great deal of misunderstanding about clusters. Especially when talking about High Availability.

      Many people think that when you have an HA cluster, the DBMS will run faster. Not necessarily. If you have an active-active HA cluster, then you only put a load of up to 50% on the boxes, because when one node fails over to the other, that single node has to do the work of both.

      Another thing that seems to be misunderstood is that HA can be handled entirely in software. This is false for production HA clusters. You will need hardware and software support for the cluster. A shared SCSI disk array, for instance, is required by NT, Sun Solaris, HP, AIX, etc.

      jason

      --
      No one has seen what you have seen, and until that happens, we're all going to think that you're nuts. - Jack O'Neil
    2. Re:Tried this before... its a tough sell by afidel · · Score: 1

      You are arguing against multiple redundant servers and saying that putting everything on one big server and disk array is better??? Are you nuts? A cluster is obviously better in that any one machine or disk array going offline does not take down your complete system. Now maybe Oracle RAC or a DB2 cluster might be better for some, but a cluster of dual-CPU Linux boxes running PostgreSQL might come in at a fraction of the cost and so allow some people to get clustering protection where they normally couldn't afford it.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    3. Re:Tried this before... its a tough sell by Jboy_24 · · Score: 1

      Yes, putting everything in one server with redundant power supplies, redundant disks with hot swap capability, and redundant network controllers is better! This JDBC hasn't implemented multiple redundant servers; right now it's about speed increases by striping data across multiple servers. RAID 0. Are you going to trust the redundancy in this software more than your RAID controller?

      If I bought 3 dual-CPU Linux boxes, there is a 3x better chance of having a power supply die, a network connection die, or a processor burn up than with a 6-8 way server with failover in hardware. If there is no failover in the DBC layer, then your whole cluster is down. If any one of those net connections goes down, the cluster goes down. The more you add, the more the chance of failure, not less!

      Failover is the critical word. If you read the Limitations in their documents, they haven't done that yet.

    4. Re:Tried this before... its a tough sell by afidel · · Score: 1

      They support RAID 0 and 1 plus any combination thereof. Therefore you can have your tables on multiple hosts and whichever one is up can send the results. You could have your entire database on multiple pc's and use this with two or more frontends to have the fastest responding server service the request. This is exactly about failover in the DBC layer. For more info on this see This page.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    5. Re:Tried this before... its a tough sell by Jboy_24 · · Score: 1

      Maybe I was confused. Under their C-JDBC controller (2 chapters previous) they say under current limitations that RAIDb-1ec and RAIDb-2ec aren't supported. I wasn't able to find any distinction made elsewhere that the 'ec' was of any significance.

      Still, RAID 1 is not much of a cost savings. RAID 0 is a nightmare with databases. A complex cluster with RAID 1 and 0 as described probably has the same chance of downtime as a single server: a 3-unit RAID 0 server has a 3x greater chance of dying than a 1-unit one, mitigated by having 3 3-unit RAID 0 servers.

      Again, in thinking of managing (backup, upgrade etc.) these mixtures of RAID 0 and 1 servers in a production environment, wouldn't it just be better to get the Oracle license, a large hot-swap/failover server with backup, and sleep at night?
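
      The back-and-forth about "3x the chance of failure" can be put into numbers with a small sketch. It assumes every node is up with the same probability and that nodes fail independently, which shared power, networks and software bugs all violate in practice; the function names are made up for the illustration.

```python
def striped(node_availability: float, nodes: int) -> float:
    """RAID 0 style: data is split, so every node must be up."""
    return node_availability ** nodes


def mirrored(node_availability: float, copies: int) -> float:
    """RAID 1 style: full copies, so one surviving node is enough."""
    return 1 - (1 - node_availability) ** copies


def mirrored_stripes(node_availability: float, nodes: int, copies: int) -> float:
    """Several complete stripe sets, any one of which can serve requests."""
    return mirrored(striped(node_availability, nodes), copies)
```

      With a hypothetical 99% per-node availability, `striped(0.99, 3)` comes out near 0.97, so the stripe is unavailable roughly three times as often as a single node, which matches the "3x" argument. `mirrored_stripes(0.99, 3, 3)` is the "3 3-unit RAID 0 servers" case; under these assumptions it can be compared directly against the single-server figure of 0.99.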

    6. Re:Tried this before... its a tough sell by afidel · · Score: 1

      Not at $10K per CPU + lots per client license. Two frontend servers + 4 DB servers would be fairly cheap, could handle quite a bit of load, and wouldn't cost much more than a couple of CPUs' worth of Oracle licenses. Spend some on support contracts and you are still at a fraction of the cost of the Oracle solution. Yes, it won't work for everyone, but that isn't the point. Another place for it would be the small office with a small DB: if every PC acts as a frontend and a DB server, then any PC going offline doesn't affect the availability of the system (great for things like a doctor's office where everyone could have the booking app and if one PC goes down the others could still book appointments. The amount of network traffic and processing would be minimal but the increase in availability would be great). Again, there are people who want/need the Oracle/DB2 solution with all the bells and whistles; this just moves some more features much further down the price scale.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    7. Re:Tried this before... its a tough sell by Jboy_24 · · Score: 1

      (great for things like a doctor's office where everyone could have the booking app and if one PC goes down the others could still book appointments. The amount of network traffic and processing would be minimal but the increase in availability would be great)

      Actually, if you've ever used SaPro for eBay: I worked at a company where there were 12 continuous users on it. Too bad it used an Access file as its DB. Talk about slowdown. I really think the only application where you might need this is an ecommerce app with a large product database (electronic components). The product database would be fairly static but HUGE, and the profit wouldn't support Oracle.

      But compare a 4-CPU license and 2 client licenses versus possibly hiring someone to manage the 4 DB servers, plus any development costs. Oracle looks the better deal the further you go. Oracle will never be 'cheap'; it's priced at the point where companies who feel the pain will pay it, and not a dime cheaper.

    8. Re:Tried this before... its a tough sell by AELinuxGuy · · Score: 1

      Jboy_24, do you have any information about the Perl solutions developed at your previous employer? May have been a hard sell to the "big guys", but I know lots of medium-sized businesses that could use something like that.

    9. Re:Tried this before... its a tough sell by Jboy_24 · · Score: 1

      We did mainly Ecommerce stuff, the DB work was the product of the CTO's pet projects. Email me for more info at john@(the website URL minus the www). Or click through to my website, you can find my email there at the bottom.

      Thnx

  41. Re:Nothing beats Oracle RAC by Anonymous Coward · · Score: 0

    More or less - it's a reworking of Oracle Parallel Server. It handles failover reliably, scales reliably, etc. Like everything Oracle, wait 'til the 2nd iteration of a product has matured, *then* buy it. I have multiple 2- and 4-node RAC-on-Intel (RHAS2.1) systems running at the moment. Even OEM and OMS have come along far enough that they're usable.

  42. Re:Clusters aren't performance? Just not true! by the_2nd_coming · · Score: 2, Insightful

    too bad the licensing for those CPUs is exponential

    --



    I am the Alpha and the Omega-3
  43. dream of a language-agnostic system like this by munro · · Score: 1

    A while back, on a slow day at work, some friends of mine idly discussed making a system along these lines that would run as a separate process.

    Our idea was to write it in C, and make it proxy connections to MySQL, Postgres, etc. In other words, it would speak and understand the wire protocols of each database it supported. It would apply replication (etc.) logic as it passed messages through to the real databases.

    We imagined a type of pipeline which you could configure, and messages would move through that pipeline being processed by different modules... i.e. you could enable replication, logging, and perhaps various other types of processing, as options for each user/db or something like that.

    Such a system would be useful for any client without modification (such as PHP, perl, C programs and of course the relevant JDBC drivers).

    Well, we didn't go very far with the idea... OK, we didn't go anywhere with it... But I still felt like sharing.
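
    The configurable pipeline described above might look roughly like this. It is only a sketch of the shape of the idea (shown in Python for brevity, although the poster had C in mind): messages are plain dicts, the stage names are invented, and no real wire protocol is parsed.

```python
from typing import Callable, Dict, List

Message = Dict[str, object]
Stage = Callable[[Message], Message]

WRITE_VERBS = {"INSERT", "UPDATE", "DELETE"}


def logging_stage(log: List[str]) -> Stage:
    """Build a stage that records each statement and passes it on unchanged."""
    def stage(msg: Message) -> Message:
        log.append(str(msg["sql"]))
        return msg
    return stage


def replication_stage(backends: List[str]) -> Stage:
    """Build a stage that routes writes to all backends, reads to the first."""
    def stage(msg: Message) -> Message:
        words = str(msg["sql"]).split()
        verb = words[0].upper() if words else ""
        targets = list(backends) if verb in WRITE_VERBS else backends[:1]
        # Return a copy so earlier stages' view of the message is untouched.
        return dict(msg, targets=targets)
    return stage


def run_pipeline(stages: List[Stage], msg: Message) -> Message:
    """Pass one message through every configured stage, in order."""
    for stage in stages:
        msg = stage(msg)
    return msg
```

    Enabling or disabling a feature per user or per database then amounts to choosing which stages go in the list.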

  44. Re:Sigh - Looks like I have my work cut out for me by Hard_Code · · Score: 1

    I beat all of you, my Whitespace implementation is below:

    --

    It's 10 PM. Do you know if you're un-American?
  45. LinuxHPC.org is giving away an AMD Opteron Cluster by Anonymous Coward · · Score: 0

    2 node AMD Opteron Beowulf Cluster giveaway!

    http://www.linuxhpc.org/pages.php?page=PSSC_Labs

    Worth a shot!

  46. Not there yet... by jfroebe · · Score: 2, Insightful

    While I commend their efforts, what they are offering is little more than a poor man's High Availability cluster.

    The shared disk array (RAID, etc.) is just a part of implementing HA.

    My recommendation is for the developers to take a look at how it is implemented in the enterprise DBMSs (Sybase, Oracle, MS SQL Server, DB2) first.

    jason

    --
    No one has seen what you have seen, and until that happens, we're all going to think that you're nuts. - Jack O'Neil
  47. Re:I'm 100% Confident by maraist · · Score: 1

    One slashdotter commented that it was missing callable statements, which I believe are necessary for certain types of stored procedures.

    --
    -Michael
  48. Re:This is a threat to the big vendors-POV. by Anonymous Coward · · Score: 0

    "Moving to Postgres from Oracle would be asking someone to accept more risk in return for thousands and thousands of dollars. For some companies that's the difference between being a 3 man shop and a 4 man shop."

    1-All companies revolve around their database, from SOHOs to multinationals. You'll find that movement harder than you think. As far as your 3 vs 4, that's more an argument to hire you than for a company watching the bottom line hard enough that they'll jump to another DB solution.

    "The business world existed and got along quite nicely with paper records for quite some time. Paper records got lost all the time and business went along nicely. The same can be said today, if we were to lose 8-24 hours worth of data it would be bad, but not catastrophic. Insurance against such an event would probably cost a hell of a lot less than Oracle licensing."

    Uh, no it can not. The world back then isn't the same world now, and applying outdated metrics is being disingenuous. There's also some things that insurance will never mend.

    "Not every database needs to be 12TB and accessible by 2 million users 24/7."

    No, not every, but you'll find that those who need an Oracle-caliber solution also have big databases and need them 24/7.

    "The implicit argument for Oracle is that cost is no matter. Well then, I suggest you hire 12 people to each independently carve your data into stone as data loss there would be minimal."

    Um no. The implicit argument is that when you have business requirements that require something of Oracle's caliber, then that's going to cost.

    You'll see a decline when someone comes out with comparable offers, at a lower cost. And so far there aren't that many.

  49. Clustering might improve performance... by Anonymous Coward · · Score: 0

    ...if it is done well. I don't remember the exact details; my ex-coworker was doing a PhD thesis on distributed databases. The idea is that each of the databases executes a part of the query, and the results are merged. If it is possible to do that, then performance increases, especially with complex queries with subqueries etc. I don't know how much brain you have to put into the query engine that distributes the load, or how to merge the results, though. Lots of area to explore here.

    Oh, and I also agree that JDBC driver layer is not the right place to do clustering. I wish there were e.g. subprojects of PostgreSQL working on this issue.

    --Coder

  50. This is a very very old idea.... by Jboy_24 · · Score: 1

    Good starting block though...

    First, they should move more and more features of the DB to the controller layer. The goal should be that you can call plain SQL statements and complex joins directly. Later, you could even have stored procedures execute there and use the cluster as if it were one db.

    Then, they should try to work it so that you make low-level calls to the DB layer; this would save time in having the separate DBs compile the SQL statements.

    Next, make some kernel mods a la Tux to make the DB calls faster to execute, i.e. make the DB machines pure DB handlers.

    Once you do that, you might want to consider moving the separate DBs into one rack, maybe making them share power supplies and disk arrays to cut down the points of failure.

    As well, have one handler computer handle all incoming connections, which would appear to be a standalone database. Thus every database instance would appear to be a .... partition of the main database.

    It would be powerful to separate the hardware/database tie to allow the admin to manage which servers would have which partitions, letting them span a partition across a new server if it got too big. And let the partitions automatically move away from bad servers using parity information stored on a separate server.

    Once you finish developing all that... you should realize that's what Oracle already does. Oracle isn't some Microsoftish company that developed a product absent any competition, so that quality, reliability and performance weren't job #1. Oracle has long competed against IBM, Sybase, Microsoft etc. and pretty much has the DB thing down.

    The only use I could see for this tech is in a small ecommerce web site that needed to search millions of records (electronics supply store). This would be for when a MySQL table would start to bog down due to too many records. Even then, having multiple machines should be the very last resort.

  51. As was said before that is really cool but! by codepunk · · Score: 1

    Take it up to the next level and make that connector p2p based using JXTA and autodiscovery
    to handle all the traffic.

    Now that would be COOL!

    --


    Got Code?
  52. overrated by bobaferret · · Score: 1

    I wrote a JDBC wrapper that allowed me to use multiple JDBC drivers simultaneously. This seems very, very similar. The load balancing was nice. But the replication really felt like it should be down at the DB level, and not at the driver level. Syncing the databases after one went down was a pain.
    Splitting up tables across DBs seems a little rough, especially since you have to run the query on more than one DB and then merge the results into a single result set. This means that you have to do your own sort. It gets even more fun if you use limit and offset in your query. It just gets weird after a while. I say wait for postgres-R; it'll be much more of a kick in the pants for Oracle.
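
    The sort/limit/offset problem can be sketched as follows. The assumption is that each partition can return its own rows already sorted; `fetch_partition` is a stand-in for sending something like `ORDER BY ... LIMIT limit+offset` to one backend, and both function names are hypothetical.

```python
import heapq
from itertools import islice
from typing import Callable, List, Sequence, TypeVar

Row = TypeVar("Row")


def fetch_partition(rows: Sequence[Row], key: Callable[[Row], object],
                    limit: int, offset: int) -> List[Row]:
    """Stand-in for one backend: its first limit+offset rows in sort order."""
    return sorted(rows, key=key)[: limit + offset]


def merged_query(partitions: Sequence[Sequence[Row]],
                 key: Callable[[Row], object],
                 limit: int, offset: int = 0) -> List[Row]:
    """Merge sorted partial results, then apply offset and limit once."""
    partials = [fetch_partition(rows, key, limit, offset)
                for rows in partitions]
    # heapq.merge lazily interleaves inputs that are already sorted.
    merged = heapq.merge(*partials, key=key)
    return list(islice(merged, offset, offset + limit))
```

    The "fun" part is visible in `fetch_partition`: the offset cannot be pushed down to the backends, because nobody knows in advance which partition the skipped rows live in. Each backend has to return limit+offset rows, so deep pagination gets more expensive with every page.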

  53. For example by Anonymous Coward · · Score: 0

    Ok, how about a couple of real-world examples?

    #1 - how do you deal with data that spans partitions? Ex: a customer database is partitioned by geographic location, but inventory applies across all partitions. How do you keep this unique and replicated across partitions? Or - do you decide to leave it on just a single partition, and then have all other partitions join to it when they need that data.

    #2 - how do you deal with queries across all partitions? Ex: ok, how many widgets were sold this month, totaled by geographic location. Do you run the query on each box then pull 50 state-level result sets together into a single one? Or does the application aggregate them (oh boy).

    #3 - how do you handle changes to partition size over time? So what do you do when you want to consolidate a handful of small states (Wy, Ne, ND, SD, HA, etc) onto a single server, and then distribute CA across the freed-up six servers.

    These are typical issues that come up all the time in a share-nothing environment, and ones for which DB2, Informix, and Teradata have very specific features to assist with rebalancing, localization, etc.

    Delivery of this function without addressing these tough features simply results in a poor product that can't compete with these commercial alternatives.

  54. ACID? by winchester · · Score: 1

    Okay, so I am supposed to believe that this software is better at being ACID than almost all relational database systems?
    Sure, I love clustering boxes as much as the next guy, but the overhead is tremendous if the rdbms doesn't support it, let alone the data integrity questions it brings up.

  55. I wouldn't get too excited by godofredo · · Score: 3, Informative

    There are many problems with this design, some have already been mentioned. There are serious issues with performing atomic updates. Modern databases use locking to allow high levels of concurrency. Foreign key constraint checking is one thing that would be very hard to implement in this design, as it is generally implemented in the indexes themselves. Likewise, to get all databases in a "RAIDb 0" group to reflect the same state, operations such as concurrent delete and insert must be completely serialized to assure consistency...serialized across all clients, not just from one source.

    Furthermore, to scale up, systems generally take advantage of striping. At the IO level that means striping across multiple disks (modern convention is to stripe across all!). In a parallel database one usually stripes a single table across multiple nodes for parallel query processing. While it is possible with C-JDBC to put table X on node A and table Y on node B, I don't see any provision for striping the data. It will be very difficult to use your hardware efficiently in this scenario.

    If you are going to go through the trouble of implementing a complete query processor (that can handle jobs larger than RAM), a full update/query scheduler (lock manager), and a journalling mechanism that can (somehow) even maintain atomic transactions (even in the face of multiple failures), then why not just build your own database? This system might be useful in certain rare cases, but I wouldn't use it except possibly for replication.

    JJ

  56. cluster? by Anonymous Coward · · Score: 0

    can one of the nodes be shut off without loss of any data? if not, it isn't a clustered database.

  57. ms sql clusters? by Anonymous Coward · · Score: 0

    can ms sql server drop a node in its 'clusters' without losing any data? if not, it isn't a cluster, just a distributed / 'federated' database.

    1. Re:ms sql clusters? by jsin · · Score: 1

      Yes

  58. because of their superior product by ShieldW0lf · · Score: 1

    Ok... you lost me. Are you still talking about MySQL, or what are you talking about?

    --
    -1 Uncomfortable Truth
    1. Re:because of their superior product by Anonymous Coward · · Score: 0

      MySQL is also the name of a company that provides 24/7 professional MySQL support.

  59. Nope. by mindstrm · · Score: 1

    If a node with a guest task crashes in mosix, the task is re-dispatched to another node; it doesn't crash.
    Only if the originating node crashes does the task fail. So if there is one master node in your cluster, your failure rate is the same as if you had one machine.

    1. Re:Nope. by Usquebaugh · · Score: 1

      Granularity,

      say the task that fails is an editor, how does it restart? At the beginning, any input the user made until the task failed is lost. That is not HA.

      MOSIX is HP, from what little I know it performs load balancing very well. It provides little or no HA features.

      The failure rate is very dependent on the granularity of your system. If a block of work can be re-run then it provides no change. If a block of work cannot then it fails. 1 node failure can bring down a cluster.

  60. Re:Nothing beats Oracle RAC by Anonymous Coward · · Score: 0

    main difference between ops and rac is that ops used files for global lock management and rac does it in memory, iirc. rac works much better. You can try a demo on a single node. It's pretty cool to have multiple database instances but only a single database. You log into one db, perform some DML, then log into another db and there it is, instantly!

    f.y.i: a cluster is a multi-node single database. You can lose one node without losing data, for example by sharing all data on a san.

    A distributed database is a series of databases that are connected via the network. If you lose one node, you lose data.

    a replicated database is a series of databases that have copies of data on multiple nodes. You can lose a node and potentially not lose data, but the overhead of replicating has performance and state limitations.

  61. We pioneered this concept as early as 2000... by BigGerman · · Score: 0, Troll

    ... as part of the closed-source project but the 19yr old asshole running the company did not recognize the potential.

  62. Use CORBA clients to J2EE by Anonymous Coward · · Score: 0

    Deploy your JDBC code inside a J2EE app server, whose default protocol is RMI-IIOP.

    (JBoss for you open source zealots)

    Hey presto, your non-java clients can access your code through CORBA!

  63. Re:Of course, if mysql had replication worth a dam by drinkypoo · · Score: 1

    You get transparent switchover for free with load balancing if your replication is good enough, and your load balancing is at least decent.

    --
    "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
  64. Oh fuck it's great.. by troy_psx · · Score: 1

    I tested it here (in Brazil) and it works very well. I tried 3 machines with PostgreSQL and it works veeerryyy well...

  65. No, partial replication would not speed queries by brlewis · · Score: 1

    For queries involving multiple tables, the database itself will be able to optimize better than any RAIDb controller. Full replication will always be better. Each individual db server has direct access to all tables and can formulate a good query plan.

    The only situation where partial replication would speed things up would be a large number of write-intensive tables. I expect most apps would have one, two or possibly three write-intensive tables. I don't know if TPC-W is realistic.

  66. Masses don't need it by brlewis · · Score: 1

    For individuals and small businesses content with 99% uptime, a project like this is overkill. It's still interesting though.

  67. That is replicating not clustering... by hughk · · Score: 1

    The MS SQL Server just spreads the data around. It is very primitive really, but then MS Clustering has no clustered file system.

    The problem is that you need a lot of cooperation between the RDBMS and the host OS. It is very hard even for Oracle as they have to solve the problem for each host OS that they support except for Oracle/RDB, which only supports OpenVMS and Tru64. Oracle/RDB was bought from Digital whilst it was having financial problems. It was engineered to work well on Digital's cluster technology. Oracle attempted to strip mine Oracle/RDB for the technology and the customers but this seems not to have worked. The technology was too specific to the OS and many of the customers didn't want to move.

    What does this mean here? Well, with Linux we have OS source. We have the source code of MySQL and Postgres, so it would be relatively easy to build some cluster services into the OS and to build the support into the RDBMS.

    What Digital did was to build a cluster-wide file system, where storage could be attached to any node (or more than one node) and would be served transparently to the rest of the cluster. For this, you needed a distributed lock manager to synchronize access. This is also pretty vital for the relational database management system, and Digital created an OS level 2PC service as well; again, kernel integration helped speed up the prepare-to-commit/commit type transactions. Fast cluster comms are also useful, but these already exist for Linux from a number of vendors.
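
    For readers unfamiliar with the prepare-to-commit/commit pattern mentioned above, here is a toy sketch of two-phase commit. It is not Digital's service: the `Participant` class, its `fail_on_prepare` flag, and the `two_phase_commit` function are hypothetical names, and a real implementation would also need durable logging and recovery from a coordinator crash, which this omits.

```python
class Participant:
    """Hypothetical node that can vote on, commit, or roll back a pending change."""

    def __init__(self, name, fail_on_prepare=False):
        self.name = name
        self.fail_on_prepare = fail_on_prepare  # simulates a node voting "no"
        self.committed = {}
        self.pending = None

    def prepare(self, changes):
        # Phase 1: stage the change and vote. Nothing is visible yet.
        if self.fail_on_prepare:
            return False
        self.pending = dict(changes)
        return True

    def commit(self):
        # Phase 2: make the staged change visible.
        self.committed.update(self.pending)
        self.pending = None

    def rollback(self):
        # Discard the staged change.
        self.pending = None


def two_phase_commit(participants, changes):
    """Return True if every participant committed, False if all rolled back."""
    prepared = []
    for node in participants:
        if node.prepare(changes):
            prepared.append(node)
        else:
            # Any "no" vote aborts the transaction on every node that staged it.
            for staged in prepared:
                staged.rollback()
            return False
    # Every node voted yes, so the coordinator tells all of them to commit.
    for node in participants:
        node.commit()
    return True


if __name__ == "__main__":
    nodes = [Participant("a"), Participant("b")]
    print(two_phase_commit(nodes, {"balance": 100}))
```

    The extra round trip per transaction is the overhead in question, which is why doing the prepare and commit messaging close to the kernel and the cluster interconnect matters.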

    It *will* happen for the opensource databases, people have the sources and eventually because of the ability to integrate the kernel support, it will probably work better than with 3P databases.

    --
    See my journal, I write things there
  68. Re:Clusters aren't performance? Just not true! by jsin · · Score: 1

    I wrote a nastier response to the first comment like this, but since you used polite terms, I will also.

    In my original post I indicated that clusters are "typically" used for availability and not performance; this doesn't mean you can't use a cluster for performance, it just means that most clusters are put in place primarily for the availability functionality and performance is a secondary concern if one at all.

    This is evidenced by the most prominent clustering software available for databases. Both Microsoft and Oracle provide availability-enhancing cluster software (MSCS for Microsoft, and Oracle's is called Fail Safe if I remember correctly) as a standard component of the database servers, and only Oracle's Parallel Server (an expensive add-on) will enhance PERFORMANCE through clustering.

    I guess my point wasn't to say you CAN'T enhance performance through clustering, it's just not the primary motivation for most of the clustering that I've seen.

  69. Oracle more vulnerable than MS to Open Source by Anonymous Coward · · Score: 0

    I agree that Oracle is more vulnerable to Open Source development than is Microsoft. Now, this approach to clustering is most important to shops that are doing new development. However, database applications tend to have a long life compared to other forms of software development. Postgres is something at this point that could run fairly simple Oracle applications with minimal re-engineering--but there is a big step from simple re-engineering to no re-engineering.

    Personally, I expect we'll see folks abandon Oracle when the process of moving applications from Oracle to Postgres is VERY seamless for at least some class of application.