Linux Grabs World Record For TPC-H Benchmark
An Anonymous Coward writes: "Linux 2.4.3 now holds the world record by performance with IBM's DB2 in TPC-H. TPC-H is a decision support benchmark consisting of a suite of business oriented and ad-hoc queries and concurrent data modifications. This is way cool as the world record was held by SQL Server 2000 on Windows 2000 before." Caveats: this is only in the 100GB (smallest) category, and all but 2 of the other entries are several months old. Even so;)
Just look at the other systems in that class. They run ~$250,000, as opposed to the Linux system, which is ~$1 million.
Um, did you notice that the linux benchmark was $347/QphH while the next one on the list (W2K) is $161/QphH. Which one is more expen$ive?
You may have to shell out thousands for the software (vs free Linux) but the _machine_ dominates by a lot. The Linux configuration was 4x the cost of the Windows configuration ($1M vs $250K).
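A quick way to sanity-check those totals, sketched in Python. It assumes the $/QphH figures quoted above pair up with the QphH scores quoted elsewhere in this thread (2733 for the SGI cluster, 1699 for the 8-way ProLiant); that pairing is an assumption, and the names are made up for illustration.

```python
# Figures as quoted in this thread. Pairing each $/QphH with a QphH
# score is an assumption, not something taken from the TPC listing.
systems = {
    "linux_db2": {"qphh": 2733, "usd_per_qphh": 347},
    "w2k_sqlserver": {"qphh": 1699, "usd_per_qphh": 161},
}


def total_cost(qphh, usd_per_qphh):
    """Total system price implied by score times price/performance."""
    return qphh * usd_per_qphh


costs = {name: total_cost(**s) for name, s in systems.items()}
cost_ratio = costs["linux_db2"] / costs["w2k_sqlserver"]

for name, cost in costs.items():
    print(f"{name}: ~${cost:,}")
print(f"cost ratio: {cost_ratio:.1f}x")
```

Under those assumptions the totals come out close to the ~$1M and ~$250K quoted, and the ratio is nearer 3.5x than 4x.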
There are really three different markets, with Linux winning two and Windows winning one.
Under 1 GB is dominated by Linux. It's cheap, it's fast. Hard to sell software to this market because the total budget is usually in the low thousands at most.
1 GB-100 GB is decent with Windows. Much more bang for the buck than the Linux solution. At the top end, Windows just can't handle the load and buckles. At the bottom end, it starts becoming unrealistic to really spend a lot of money on your database.
Over 100 GB is dominated by the hard-core *nices. Linux can probably be used seriously now although a lot of companies would rather go with the proven solution and pay. When $10k software is under 1% of your total purchase price, free OS vs paid OS isn't much of an issue.
Although it seems very logical that a 4-machine cluster is faster than its single-machine counterpart, it's not that simple an equation.
Not every database or operating system can scale that well. Let's take a look at each individual machine of the cluster. They are SGI 1450s with 4-way Xeon 700MHz. The nearest competitor on the performance chart is the NEC 4-way machine, which is at 800. Assuming each individual machine of this cluster is also at about 800 tpm, then the cluster scaled at 85%. Not too shabby. Can you do the same type of scaling with Oracle? Not likely (look at any of the Oracle benchmarks; the biggest cluster they have has two machines).
Also, if you have recently read what the Oracle guys have been asking the kernel developers, you would know that there are a lot of features that the Linux kernel is missing right now, causing it to not perform optimally as a DB server. The SGI and IBM guys have worked hard to get around every one of these barriers in order to get these results. This really shows that both these companies are very dedicated to making Linux the top choice for DB servers in the future.
I don't want to burst anyone's bubble, but Linux off the shelf at the moment is still a very immature operating system for a DB server. However, with the work of companies like SGI and IBM, Linux now has a top result on an industry-recognized benchmark. I wouldn't be surprised if there are more results coming in the near future.
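The 85% figure can be reproduced with a rough sketch like the one below. It assumes the cluster score is the 2733 quoted elsewhere in this thread and takes the ~800 per-node figure from the paragraph above at face value; the function name is hypothetical.

```python
def scaling_efficiency(cluster_score, node_score, nodes):
    """Fraction of ideal linear scaling that the cluster achieved."""
    return cluster_score / (node_score * nodes)


# 2733 is the cluster score quoted in this thread; 800 per node is the
# commenter's estimate based on the nearest 4-way result.
efficiency = scaling_efficiency(cluster_score=2733, node_score=800, nodes=4)
print(f"scaling efficiency: {efficiency:.0%}")
```

If the real per-node score were lower than 800, the efficiency would come out higher, so this is only as good as that estimate.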
Congrats to both companies for this great result!
"But it's open source, so it's less secure. Would you entrust your data to something that anyone can modify the source code for?"
"But it's open source, so you don't have the satisfaction of having paid several thousand dollars for a Windows 2000 Datacenter site license."
"But it's open source, which we all know makes Baby Jesus cry."
"But it's open source, which sort of sounds like 'open sores', which is just gross, don't you think?"
Pick any or all of the above, submit to PR Newswire...
- A.P.
--
Forget Napster. Why not really break the law?
"Remember when the U.S. had a drug problem, and then we declared a War On Drugs, and now you can't buy drugs anymore?"
The reason that people hate Gates is that they've been hurt by his software. As in, "I've been Gatesed," a reference to the loss of data, work, or effort because a machine behaves in a fundamentally unreasonable way, or because the command just used is different than the command that was bound to the same keys on the prior version, resulting in a loss of data. Or when the machine refuses to allow access to a file without a bizarre workaround. Or when focus to a window is lost even though you were still typing because of a "helpful" feature. Not to mention the times when this week's version of word/excel has problems with a file saved by last week's.
No, it isn't his money, wealth, or success that cause most people to hate him. It's the painful-to-use products.
I never hated windows until I had to spend a day with it to get the files I needed to download Linux . . .
And no, I don't hate Gates. I'm not even anti-microsoft. I do, though, understand someone who regularly uses the products becoming so . . . and I have the advantage that I haven't spent a lot of time with any since word 5.1 and excel 4--not from lack of opportunity, but because the later versions took out features I used constantly. I switched to *nix over LyX, not ideology (and I would have had to switch in a year or two anyway due to high performance computing needs that windows just can't handle).
hawk
Just a note: Although the database was IBM's DB2, the benchmark was done by SGI on a 16-way Pentium III Xeon cluster. Also of note is the fact that the nearest competitors were 8-way Pentium III Xeon systems, not clusters, running SQL Server 2000 on Win2k.
The reason is that Oracle, by default, has been using rule-based optimizations for quite some time. It has cost-based optimizations, but they aren't turned on by default (actually, I think they are in 8i). No one uses them because their queries are now so optimized for rule-based that switching to cost-based would make it quite unpredictable. Oracle Applications 11.0.3 and earlier is made/optimized for rule-based queries. For any given query, you can turn on cost-based optimizations by adding a specially-formatted comment (forget what this is at the moment). Anyway, Oracle Apps 11i w/ Oracle 8i is supposed to have cost-based optimizations by default. Now, rule-based optimization is more tradition than anything.
Please note that much of this comment is hearsay from what the DB guys are saying. I try to stay out of most of the Oracle stuff myself, unless I have to mess with it.
Anyway, I'm curious whose cost-based optimizer is best.
Engineering and the Ultimate
It wasn't Slashdot that questioned TPC's worth, it was KhaosSpawn. Just because they post an article doesn't mean they necessarily agree with it.
...it's more the fact that Windows isn't the leader in any other category (all the others appear to be heavy-duty *nix). Although I imagine Microsoft will be coming right back with some results of their own. Thanks a bunch to whoever submitted this configuration for testing - does anyone know if it was SGI, or who?
Caution: contents may be quarrelsome and meticulous!
Your right to not believe: Americans United for Separation of Church and
People hate Gates because he is literally out to destroy Linux. If he could find a way to obliterate Linux from the face of the planet, he would do it, along with everything else that is non-Windows. In contrast, most Linux users prefer a heterogeneous software world, which includes Windows along with everyone else. Microsoft constantly try to escalate the situation though, by trying to break open standards for example, or making speeches about how the government ought to do something about free software.
Essentially, it isn't safe from a free software perspective, to just think of Microsoft in the same way as any other large closed source software company. Other large closed source software companies don't go out of their way to attack free software, they try to provide non-free software that stands on its own merits. Turning your back on Microsoft is like turning your back on an axe-wielding homicidal lunatic.
PL/SQL powerful? Try Java... or even better, on postgres you have C.
:)
... we have a policy of no Windows NT/2k machines in our datacenter so no SQL Server. Give PG a shot, you would be surprised.
My DB life started as a MS SQL DBA and I must admit that SQL 7 was a pretty nice product. For a workgroup DB server it can't be beat, though I do agree it requires way too much babysitting to sit in a datacenter.
MySQL is a joke (IMHO) for any serious database work. We have a large ASP that is running completely on ASP. (We power image sharing websites).. we are getting 1-2 million hits a day from the various customers' websites which we power. Also the nature of the website puts the database into a position of doing various inserts/updates/deletes/selects on VERY large tables.
After testing our application with both Oracle/Sybase (11.9)/DB2/Informix and lastly Postgres... the database that satisfied our needs of a VERY high transaction database, stable and scalable, was Postgresql. Version 7.1 finally was able to implement some of the last needed features and writing our stored procedures in C is no problem for us.
Also, the architecture of our database allows us to easily cluster our database contents across multiple servers and PGSQL's cost structure fits our needs there to keep our performance top notch.
I found DB2 to be just too damn touchy. Oracle was too damn expensive and
--------------------
Would you like a Python based alternative to PHP/ASP/JSP?
... since when did doubling the amount of performance only cost twice as much money?
Never? Have you ever seen anything like it? Even something as simple as getting double clock rate from Intel costs about 4 times the amount of money. Is someone somehow not expecting that going from extreme performance, to twice the performance would NOT cost 4 times as much?
1. Doubling the number of CPUs does not yield double performance, but costs much more than twice the amount. This would happen to the Windows 2000 servers as well.
2. Did anyone at least check out what is included in the price? I do not know, but for all I know, the SGI 1450 may be equipped with more levels of redundancy, have more expensive (more stable, not faster) hardware.
3. The operating system cost for one of these beasts is minuscule, so I as a Linux-advocate would not even consider arguing for price of the OS for machines like this. (I'd argue performance, stability and tweakability).
What's wrong with you?
Why does it seem that most people spend most of their time hoping for something bad to happen to Microsoft? If there's a security problem with a Microsoft product, it's not a *good* thing, much less is it "too good to be true"! If Linux scores better in a database test, it's a good thing because it means there have been advancements in software. It's not a good thing because Microsoft came 2nd.
The whole culture of hate here on Slashdot (and in the open source community in general) really bothers me.. Why do you have to hate something? Why isn't it enough for you that Linux or your favorite open source project is successful and works great? Why do you have to stomp everything else? No wonder people say "open source, closed minds".
If it was just 13 year olds writing the comments, I'd understand it. But it's the editors of Slashdot too! You guys really should set a better example than that. Even Linus Torvalds said in a recent CNET interview that he doesn't understand why everyone hates Bill Gates so much.
It's much more productive - and much better for the cause (which is to make better software, remember?) - to focus your energy in positive things. Write software, report bugs, test.. Sure, celebrate Linux being first in a database test, but don't celebrate it because it knocked away Windows 2000 & SQL Server 2000 from that spot.
Define yourself by what you are for - not by what you are against.
So basically you are saying that 4 linux machines can beat one Win2k machine. In fact, Linux is close to half the price/performance of any of the machines there.
What it comes down to is that if you want to save money then Win2k is going to give you the best performance for a given price, up to 8 CPUs. As there are no benchmarks for a Win2k cluster in TPC-H, you can't draw any conclusions from this 'win'.
If you look in TPC-C however, you'll see that Win2kAS clusters are ramming Unix right up the hole in performance and price... I'd be sticking to the 'Do TPC benchmarks really mean anything' stories if you want to promote Unix over Windows.
Fear: When you see B8 00 4C CD 21 and know what it means
Why is it that so many sites like this one offer spreadsheets for download as "XLS" when in fact the contents of the file you receive are simply plain text? Just because PC users are too dim to load Excel themselves?
No wonder so many people think they need Office to make effective use of the web. How about if I start making all the images on my site "Gimp" images when they're really just PNGs?
Hmm. Actually that's not much more mean than what I've done to my homepage. Try that in MS Win IE - oops, someone doesn't understand W3C font/charset interaction recommendations.
at least per CPU
the SGI 1450 Server has 16 procs
2733/16 = ~171 (TPC) per CPU
ProLiant 8000-X700-8P has 8 procs
1699/8 = ~212 per CPU
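The same per-CPU division as a small Python sketch, using only the scores and processor counts quoted above (the dictionary keys are just illustrative labels):

```python
# Score and processor count for each system, as quoted in this comment.
results = {
    "SGI 1450 cluster": (2733, 16),
    "ProLiant 8000-X700-8P": (1699, 8),
}

# Divide each composite score by the number of processors.
per_cpu = {name: score / cpus for name, (score, cpus) in results.items()}

for name, value in per_cpu.items():
    print(f"{name}: {value:.1f} per CPU")
```

Per-CPU numbers like these ignore clustering overhead, so they say more about scaling than about which OS is faster.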
and doesn't the DataCenter(tm) version support like 64 procs?
Anyway, i think it's pretty damn cool linux is on top, it reassures me that TPC.org isn't just one of Microsoft's bit-achs.
-Jon
this is my sig.
Okay, so a few months ago, you ran an article: Are TPC Benchmarks A Worthwhile Measure? where this test was derided as being a worthless measurement? It was seen as "not realistic" because nobody needs that kind of server... At least on Slashdot.
So now that SGI cranks out a server with twice the processors and knocks off a half year old record, it's legitimate because Linux wins?
This is absurd. Either this is a legit benchmark or not, make up your mind. If you justify hype like this, then you are no better than MS's FUD teams.
You can't honestly view benchmarks as: well, when Linux wins they are the holy grail, but when someone else wins, it's rigged.
Alex
Maybe I'm missing the point, or maybe I'm not clearly stating mine. DB2 does (and has done since the early 1980s) statistics-based optimization of queries *automatically*. Every time I talk to Oracle developers and DBAs they tell me things that sound like sheer insanity to me: what order you list clauses in your 'WHERE' dramatically changes performance, you should do an EXPLAIN PLAN and then write your query accordingly.
I agree that the best optimizer is the human mind, but I have yet to meet database programmers who can bring themselves to regularly remember to care how big the table they are accessing is, or what fields are indexed or which part of their cartesian product is bigger. Programmers don't care. Oracle programmers have to (as far as I can understand). DB2 programmers don't.
This doesn't really surprise me. We've been using db2 in production on Linux at work (www.osogrande.com) for about 2 or 2 1/2 years (since the UDB 5 beta on Linux came out). It is easy to install (aside from some curses incompatibilities on RH7.1, which will get resolved shortly), easy to administer and performs well.
For free databases, I prefer postgres (transactional support, referential integrity, triggers, etc.). For commercial support, I have trouble liking Oracle (terrible query optimization for large queries, no statistics-based optimization like db2, much harder to administer, etc.). DB2 UDB on Linux is a low-cost, high-feature, high-performing dream.
The test was run on SGI hardware. Even though it's Intel, it's still SGI. I love the SGI hardware I've been exposed to, but you have to go into it with the knowledge and acceptance of the plain fact that you will pay out the nose for it and maybe a few other orifices too if the sales weasel is being efficient that day. At least they have the decency to make their uberexpensive machines look nice.
So while the OS was free, the hardware was cost++. (think about it man, 16 PIII Xeons alone ...)
--
News for geeks in Austin: www.geekaustin.org
While I'm happy to see Linux on the TPC-H leader board, there are several things that make me feel this "win" isn't as glorious as it could be...
First of all, the cost/performance ratio is much higher than the Windows-based entrants. Since this is a decision support benchmark, the work is heavily skewed to reading mostly static data; a solution which more easily scales up by throwing hardware at it.
Put differently, run two clusters of the 2nd (3rd, 4th or 5th) ranked system and you'll have better performance than the Linux solution, at a lower cost/performance ratio.
Second, the results show that Windows (coupled with cheap hardware) is working so well at the low end that it pretty much displaced Unix systems. (Geez, no wonder Sun's getting creamed in the desktop workstation market these days.)
Third, this "win" is not for a write-heavy database application -- in my personal experience, Linux has been an underperformer in write-heavy database applications (with synchronous writing turned on to guarantee data flush to disk). This makes it less appealing for real-time transactional database work.
Separately, I wish such tools as PowerBuilder and CrystalReports were available for Linux... That would go a longer way to making Linux acceptable to the Enterprise...
I've used Oracle (on AIX, adminned by pros), SQL Server (adminned by me and MCSEs) and most recently DB2 (on Linux, adminned by me).
I have to say out of all these, SQL Server is easiest to admin, but as a DB needs constant nursing.
DB2 needs a moderate level of nursing. I have found it to be 'moody' - killing a long-running batch job sometimes seems to stop the DB, and it's far too easy to get the database into a 'backup pending' state where everything refuses to run until you execute an off-line backup.
We also have problems where a batch process seems to lock an entire tablespace, blocking other updates. An experienced DB2 DBA told me that standard practice was for each table to have its own tablespace (kinda like MySQL), which seems to me to be a bit of an admin headache when you want to e.g. change settings for a group of tables.
On-line backups seem to back up all transactions since the last off-line backup, so eventually you have to take the DB down and do an off-line backup so you can clear the logs.
Maybe some of the problems I've had with DB2 are answered in the docs. They're comprehensive, but it's next to impossible to find anything. I usually resort to grepping the HTML tree.
Oracle needs the least nursing; I haven't adminned it, but I've worked on sites which have no DBA, where the database has run happily for months. No doubt a pain to set it up properly, but (like Unix) once it's going you can (in theory) forget about it and get on with some work.
As a developer I must say I prefer Oracle to DB2 and SQL Server; Oracle's stored procedure language (PL/SQL) is much more powerful than either DB2 or SQL Server - you can actually do useful things without resorting to C, Java or Visual Basic.
Contrary to what was said above, Oracle has had stats-based query optimization since at least V7; IDK how the query optimization compares with DB2 (although I've managed to write some slow queries in both that benefited from simple re-arrangement), but one thing I have learned - DB2 makes the query plan when a statement is compiled, and doesn't change it thereafter. Oracle makes the query plan at execution time (and caches it for efficiency). This means that if the nature of your data changes, or you add new indices, you have to re-compile the queries stored in DB2 or it will continue to use inefficient query plans. I consider this extremely stupid.
There was an excellent shareware developer/admin tool for Oracle called TOAD, that did pretty much everything you could want; I'd kill for something similar for DB2.
Oracle's docs are also much more usable, and (most importantly of all) there's a pile of good O'Reilly books covering all aspects of using Oracle.
Yes, but did you also notice that SGI supplied 44x the amount of storage that was required, while Compaq only supplied 13x? I.e., to put them level you could decrease the SGI cost from 300k to 100k. They also supplied a 30K UPS, which Compaq didn't bother to include. The whole system was overspec'ed, and not in directions that actually aid the performance.
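A rough sketch of that adjustment in Python. It assumes the 300k figure is the storage portion of the SGI price and that storage cost scales linearly with capacity; both are assumptions for illustration, and the function name is hypothetical.

```python
def scaled_storage_cost(cost, current_multiple, target_multiple):
    """Rescale a storage cost linearly from one capacity multiple to another."""
    return cost * target_multiple / current_multiple


# 300k at 44x the required storage, rescaled to Compaq's 13x.
adjusted = scaled_storage_cost(300_000, current_multiple=44, target_multiple=13)
print(f"adjusted storage cost: ~${adjusted:,.0f}")
```

Linear scaling lands a bit under the 100k quoted above, which is in the same ballpark given how rough the assumption is.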
(you don't need 4 monitors to run a DB server, for example...)
FP.
-
--
Also FatPhil on SoylentNews, id 863
There are a lot of problems with coming out and saying that Linux is the hands-down winner of this benchmark. The first problem is that the Linux system has twice as many processors as the next system down. The second problem is that the system costs twice as much as the next runner up. For these reasons alone it is foolhardy to immediately claim that Linux is now the undisputed heavyweight champion of the database world.
However, the story is more than that. The most important thing to notice is not that Linux is at the top, or the number of processors is so high, or even that the cost is exorbitant. The important thing is that Linux is a contender at all. This is an OS that hasn't until recently gotten a lot of respect. That Linux can "keep up with the big boys" shows that it is certainly capable of handling the computing needs of the corporate community.
Of course, if you live by the benchmark, you die by the benchmark. Linux apologists would do well to acknowledge that it only breaks the world record because the hardware is twice as powerful as the next OS's. However, they can still crow about the significant progress that Linux has made over its relatively short lifetime. What other OS has gone from 0 to 60 in such a short time?
Dancin Santa
but a 3TB category! I realize that several companies have tons of data, but to create a category for 3TB DBs? Holy crap!
I work for a storage solution company and it's becoming very common for even modest organizations like universities to start scoping out needs at the 1TB level.
Why is storage still such a booming business in spite of the economy's twists and turns? Just check out your own habits and what the law and generally accepted "good" practice is: there's usually a disincentive to throw any data away.
It's worth someone's time to index and archive all of those trolls and modded down posts -- gotta give those attorneys something interesting to read.