New Linux TPC-H Record Set
prostoalex writes: "New TPC-H world record for performance and scalability of database software on Linux platform has been set. The winner - Oracle 10g running on a four-node Lenovo Cluster Server DeepComp 6800, each with four Intel Itanium 2 1.3 GHz processors. Oracle also emphasizes that it's 3.5 times more performance than similar IBM DB2 benchmark. TPC-H benchmarks are available at TPC Web site."
are probably comparing this system to some old ibm benchmark. They didn't say in the press release, so I'd assume the worst.
l .asp?id=103073001
IBM appears to dominate the TPC-Hs at the top & bottom, with oracle owning it in the middle.
The only really interesting benchmark out there at the moment is the IBM DB2 ICE configuration - in which they spread db2 across dozens of low-end AMD Opteron dual-cpu servers. DB2 (and informix god bless them) partition differently than oracle - more like a database implementation of beowulf (that they've been doing for 8+ years). Way cheaper than anything from oracle, and you can toss up to 1000 servers into it. Their benchmark is in the 300 gbyte range, not 1000 - but it'll scale way beyond oracle, and is cheap for that kind of power: http://www.tpc.org/tpch/results/tpch_result_detai
Makes me wonder how many pcs I've got laying around the house...
there are basically three type of clusters:
1) shared nothing: in this, each computer is only connected to each other via simple IP network. no disks are shared. each machine serves part of data. these cluster doesn't work reliably when you have to aggregations. e.g. if one of the machine fails and you try to to "avg()" and if the data is spread across machines, the query would fail, since one of the machine is not available. most enterprise apps cannot work in this config without degradation. e.g. IBM study showed that 2 node cluster is slower and less reliable than 1 node system when running SAP.
IBM on windows and unix and MS uses this type of clustering (also called federated database approach or shared nothing approach).
2) shared disk between two computers: in this case, there are multiple machines and multiple disks. each disk is atleast connected to two computers. if one of the computer fails, other takes over. no mainstream database uses this mode, but it is used by hp-nonstop. still, each machine serves up part of the data and hence standard enterprise apps like SAP etc cannot take clustering advantage without lot of modification.
3) shared everything: in this, each disk is connected to all the machines in the cluster. any number of machines can fail and yet the system would keep running as long as atleast one machine is up. this is used by Oracle. all the machine sees all the data. standard apps like SAP etc can be run in this kind of configs with minor modification or no modification at all. this method is also used by IBM in their mainframe database (which outsells their windows and unix database by huge margine). most enterprise apps are deployed in this type of cluster configuration.
the approach one is simpler from hardware point of view. also, for database kernel writers, this is the easiest to implement. however, the user would need to break up data judiciously and spread acros s machines. also adding a node and removing a node will require re-partitioning of data. mostly only custom apps which are fully aware of your partitioning etc will be able to take advantage.
it is also easy to make it scale for simple custom app and so most of TPC-C benchmarks are published in this configuration.
approach 3 requires special shared disk system. the database implementation is very complex. the kernel writers have to worry about two computers simultaneously accessing disks or overwriting each others data etc. this is the thing that Oracle is pushing across all platforms and IBM is pushing for its mainframes.
approach 2 is similar to approach 1 except that it adds redundancy and hence is more reliable.
The Itanium's have 512KB of L2 cache and 3MB of L3 cache, with it's L3 cache being faster and having lower latency than the L2 cache of the old Xeons.
Xeons are fine chips, but the 900MHz Xeon is totally outdated. A new 2.8GHz XeonMP system with 2MB of L3 cache would probably also be about 3.5 times faster on this test than the old 900MHz Xeon.
As you can see here, the DB2 systems they seem to be comparing themselves with scored more than double what this one did.
I would expect a larger system to score lower on a pre-processor basis just from scaling issues, even if the processors were identical.
While the 3.5x ratio is impressive, the manner of it's announcement is very misleading.
"The worst tyrannies were the ones where a governance required its own logic on every embedded node." - Vernor Vinge
No, its the largest Linux machine. Not a cluster, but a coherent, single system image.
Obviously SGI had to modify the 2.4 kernel to achieve good performance and use extensive NUMA modifications. No, "using NUMA over internode communication link" doesn't kill your scalability. NASA isn't that stupid, neither is SGI.
The guys at SGI are currently testing the linux 2.6 kernel with a 512 processor system. They are saying (unsurprisingly) that it looks like it is a great improvement over 2.4.
Apparently the American Chemical Society has a Postgresql database in use that's over a terabyte in size. I don't know if this is the largest one currently in use.
Also the largest commercial database is about 23Tb and runs on Oracle.
What these numbers don't say anything about, though, is how much of these databases are taken up by BLOBs, and how much is actual field data. Having most of your data in BLOBs is really just making your database a fancy file system, since BLOBs reside in a different part of the database, cannot be indexed (at least not like normal fields), cannot be used in SELECT statements, etc.
Actually, this is what Oracle has been trying to get companies to do for a few years now. Put EVERYTHING in the database.
For that matter, Microsoft plans to take this approach by actually placing the filesystem in the database in an upcoming Windows release.
Give me access to a 50 terabyte disc array and I'll gladly build you a 50 terabyte Postgres database.