Are There Large RDBMS Using Linux?
Jason Perlow of Linux Magazine writes:"
With all of the recent computer press coverage of Amazon and Intel converting their
web servers and other front end application servers to Linux, many of these stories
neglect to mention that the back end systems these companies use still rely on
commercial Unixes like Solaris, AIX and HPUX to host their RDBMSes (Oracle, DB2,
Sybase, Informix) for their mission critical transactional applications and data
mining.
Are there any companies out there actively using Linux to host a mission-critical
RDBMS, or looking to replace UNIX with Linux for this purpose?"
OK, maybe they are not huge, but Prada (Italian fashion designer and sponsor of "Luna Rossa" at the last America's Cup) uses Oracle running on RedHat, with the data stored on a pair of EMC Clariions, for their data warehouse.
I don't know what size the database is, but the Clariions had 400GB worth of disks each.
--
The world is divided in two categories:
those with a loaded gun and those who dig. You dig.
As distributions like SuSE continue pushing ahead with high-end features (like logical volume managers, which SuSE already has), usage of these products on Linux will undoubtedly increase. Part of the situation here is cost. When Oracle Enterprise Edition costs $40,000 per CPU, plus another $8,000 or so per year for support, who cares about spending a little more for high end Sun or IBM systems?
Also, Oracle 8i, while supported on Linux, did not offer a couple of features found in Oracle 8i for other systems. In particular, full interMedia support for full-text searches of all sorts of documents (especially from software made in Redmond) was not available in the 8i Linux version. The new 9i does support this feature under Linux.
I think we're going to see things change gradually as acceptance grows. Don't rush things. People will move when they're ready and the trust is there. Redhat's worth watching. And it doesn't have to be the big vendors, as so much less functionality is needed in the DBMS in these days of N-tier and appserver-based infrastructures.
And how about designing FOR failure and using commodity boxes (running a free OS?) at the same time? Check out Clustra for an RDBMS that runs on Linux & Solaris, runs over LOTS of small, cheap commodity boxes, and is, as a result, very reliable (yes, I do use it). Ok, so it's not free in any sense, but it's good and solid, and used by some big players in the telecoms industry.
ooooooh! What does this button do? - DeeDee, Dexters Lab.
We hosted roughly 2TB of mission-critical db on two quad-processor Linux servers. They were running Oracle as their db. It worked great, and we had few problems.
We were also an AIX shop, but decided to go with Linux for this application because of the overall price of hardware and supporting applications.
Our company, a custom e-solutions provider, uses Oracle 9i on Linux almost exclusively because of Oracle's reliability and the fact that we have the resources in-house to support it. There is a caveat to this, though.
At $5,250 for just a 2-year, single-processor Standard Edition license, 9i is not cheap, and
most companies who already have an infrastructure built on it will not always realize a significant cost savings by moving to a Linux platform. 9i
Enterprise Edition is a cool $45K per processor, so it is easy to see how the difference between $20K and $100K for an 8-way Intel versus an 8-way Sun
machine may not always be the determining factor in a platform decision for a system with a 5+ year time horizon.
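To put rough numbers on that argument, here is a back-of-the-envelope sketch in Python. The license and hardware prices are the ones quoted above; treating the Enterprise Edition license as a flat charge on all eight processors, and ignoring support fees and discounts, are simplifying assumptions of mine.

```python
# Back-of-the-envelope platform cost comparison using the figures quoted above.
# Assumption: the license is a flat per-processor charge, with no discounts
# and no support fees included.

EE_LICENSE_PER_CPU = 45_000   # 9i Enterprise Edition, per processor
CPUS = 8                      # 8-way box on either platform
HARDWARE = {"Intel": 20_000, "Sun": 100_000}


def total_cost(hardware_price: int, cpus: int = CPUS) -> int:
    """Hardware plus Oracle licenses for one server."""
    return hardware_price + cpus * EE_LICENSE_PER_CPU


def summarize() -> dict:
    """Totals per platform and the saving from choosing the cheaper box."""
    totals = {name: total_cost(price) for name, price in HARDWARE.items()}
    saving = totals["Sun"] - totals["Intel"]
    return {
        "totals": totals,
        "saving": saving,
        "saving_pct_of_sun": round(100 * saving / totals["Sun"], 1),
    }


if __name__ == "__main__":
    print(summarize())
```

Under these assumptions the licenses come to $360K on either platform, so the $80K hardware gap is a bit over 17% of the Sun total.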
In addition to the links above, most of the big database systems have active Linux ports. Any Oracle, Sybase, Informix, DB2, InterSystems, Poet, or Versant customer is a potential Linux customer.
It looks like the USGS has some use for linux as a backend - the NWISWeb is using RedHat and MySQL, according to their "About" page.
As to size of the database, the realtime sites are collecting measurements every 15-60 minutes, one or more parameters, 24/7. It all adds up after a while.
If you are running a VLDB on Oracle, you want a 64-bit system; otherwise the SGA is limited to 2GB.
Oracle only supports Linux x86, with all of its 32-bit memory constraints. Does Linux implement memory windows like 32-bit HP-UX?
Also, at linux.sybase.com, you can download for free the Alpha-axp version of Sybase ASE 11.0.3.3 - this is probably the most available commercial 64-bit database for Linux.
Really, the Linux and WinNT versions of Oracle are at the low end of the food chain.
We have four linux machines using Oracle 9i RAC for our database. The boxes are Penguin Computing 200x Relions, each with qlogic 2200 fibre channel cards and an Intel 10/100 dual nic card, which ties into our SAN'd up Clariion 4500 disk processor/array. The three nics (including the onboard) give us a frontend/app network, a backup network, and an oracle IPC interface.
We have had success using Redhat 7.1 (upgraded kernel to use LVM) and Suse 7.2 (comes w/LVM) for the linux distribution. Do not attempt RAC or OPS without an LVM of some sort. It can be done, but it shouldn't be.
The biggest expense you will have is the disk array, and you should not skimp on this. Buy fast, reliable, maintained disk.
The Linux solution beats out Sun solutions in price hands down. You are talking $30,000 per box for the minimal Sun allowed hardware requirement for the Sun Cluster software with the Oracle Parallel DB runtime licenses (this has changed with v3 and so have the hw requirements). The Sun Cluster software requires an extensive review process by Sun which basically ensures your company has two extra of everything and can be onsite to help Sun with their software and hardware in 4 hours. If your company doesn't have its shit together, Sun and the few vendors that even know what Sun Cluster is aren't even going to bother talking to you about it.
This Linux solution beats out a Windows NT solution in reliability, for the simple reason that disk and volume management on NT is clumsy. There is no easy way to create labeled raw devices on a Windows machine. The process, as I remember it, was creating unlabeled logical partitions for each disk space and then maintaining a file pointing to the value of the related registry key to map out the tablespaces. As soon as you added a partition, modified a partition, or even used another node to look at the partition table, you and the database were screwed (i.e. restore). This problem with managing shared disk may have been fixed in 2000.
The weakest point in the entire Oracle 9i RAC is the cluster software layer, whether you are using Sun's Cluster Software, the Oracle supplied cluster manager for Linux, or the hardware vendor supplied OSD layer for Windows. Be prepared to spend serious time monitoring it and getting it under control with appropriate patches.
Once you have fought your way through all of this you can reap the rewards that multiple nodes with shared data give you. The greatest benefit is the ability to partition your data and your application, which allows you more opportunities to scale. If your data does not partition by some logical means (date, timezone, city, planet, etc.) forget about it. Just get a big honking database machine (especially you SAP/Peoplesoft poor SOBs).
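As a rough illustration of what "partitions by some logical means" looks like from the application side, the Python sketch below routes rows to cluster nodes by a date key. The node names and the month-based rule are made up for the example; this is not how Oracle RAC assigns work, only a picture of the idea that each node mostly touches its own slice of the data.

```python
from collections import defaultdict
from datetime import date

# Hypothetical node names; a real cluster would have its own naming.
NODES = ["node1", "node2", "node3", "node4"]


def node_for(order_date: date, nodes=NODES) -> str:
    """Pick a node from the partition key (here: year and month)."""
    month_index = order_date.year * 12 + (order_date.month - 1)
    return nodes[month_index % len(nodes)]


def route(rows, nodes=NODES) -> dict:
    """Group rows by the node that owns their partition."""
    by_node = defaultdict(list)
    for row in rows:
        by_node[node_for(row["order_date"], nodes)].append(row)
    return dict(by_node)


if __name__ == "__main__":
    rows = [
        {"id": 1, "order_date": date(2001, 1, 15)},
        {"id": 2, "order_date": date(2001, 1, 30)},
        {"id": 3, "order_date": date(2001, 2, 1)},
    ]
    for node, owned in sorted(route(rows).items()):
        print(node, [r["id"] for r in owned])
```

Rows from the same month always land on the same node, so work on one month's data stays local. Data with no such natural key has nothing to route on, which is the "forget about it" case.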
No.
I keep a journal at www.livejournal.com. They distribute copies of their clients and their server under the terms of the GPL. They use MySQL in what I consider to be a very large environment. I don't have exact numbers, but it is a large (very large) site and keeping track of all of those journal entries is obviously very trying. I guess I should also share that they are having their fair share of problems keeping their hardware up to date to handle their load. Check them out!
At the company I work for (which will remain unnamed because I am not in a position to speak on its behalf - but it is an old and large American company with a single character stock symbol) we use Oracle 8i on Compaq Proliants running Red Hat Linux - not only that but it's RH6.2 with all of the limitations of that line of kernels.
None of the databases are gigantic - 80GB is the largest - but we haven't had any problems at all. If anything, most of these databases used to be on Tru64 (Digital Unix before that) and we had a lot of problems there (although they were probably hardware related). Also - users have reported that performance is better (not that it was a real issue before) but we've never bothered/attempted to document that.
I can't say that the main factor for the move was money (although it was a factor) - after all, if you can afford the Oracle licenses you probably should not be cheap with the hardware/OS but we've had a whole lot of RH Linux for other applications and it just made sense to consolidate.
I'm sorry, but did you not read his question? He didn't ask which databases you could connect to with Linux, he asked which LARGE, MISSION CRITICAL RDBMS Servers ran on Linux. This is not a troll, rather a correction on what this post should have gotten: off topic.
assert(expired(knowledge));
I would think that Redhat would disagree with your stance based on this product from their website.
We run Linux at one of our fabs here in Taiwan running a mission critical DB system called C-Tree. This is 24/7 stuff for those of you who don't know how Fabs work.
Objectstore. An object oriented database (see www.objectdesign.com ) that's known for its speed.
Who knows why we didn't say that.
Note that the UCITA and DMCA make it even more difficult - actually almost impossible - to sue your software vendor.
So WHY does everyone keep repeating this mantra that you can "at least sue your vendor" with proprietary software? YOU CAN'T. And how is a contract with a closed source vendor any more legitimate than a contract with an open source one?
A mission critical order database goes down at 3:17AM. Your network operations center needs to get it back up and operational quickly, and they can't make sense of the error messages (usually programmers suck at putting in decent error messages for the non programmers that will eventually support their software). Do you turn to the web - or do you call technical support?
Support will ALWAYS be a sticking point with using open source software and anyone who poo poos that obviously doesn't have experience in trying to deploy and maintain software in a large organization. Sure Bill's Fishing Hut can afford to go without their machines and essential services for hours - but for any significantly large organization that means a LARGE sum of money altogether!
We are moving our Oracle 8.1.7.2 instances running on Solaris 7 to Oracle 9i running on Linux. Our biggest problem so far is vendor related, as our ERP (Peoplesoft) climbed into bed with Microsoft some years ago and basically has just ignored the Linux market for an apps port :(
Anyway, we're shopping replacements for our 3500's and we've found that bang for the buck, Linux for Databases is the way to go. Most of these servers are one-task anyway, and Oracle runs like a champ so far. There are some issues with Glibc that require some manipulation of libraries to get around if you want to use any other dist. than SUSE tho, which sux. That said, we're testing with mandrake 8.1 and it runs fine (post patch).
Imagination is the silver lining of Intelligence.
I run avidgamers.com, a community hosting service currently hosting around 7000 communities. We have 1.2 million records getting an average of 20 queries per second, ranging from single-record results to large summarizing queries. (With a fairly large part leaning towards the latter, tallying the number of replies to each thread in message boards.)
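For readers who have not seen that kind of summarizing query, here is a small Python sketch using the standard-library sqlite3 module. The table and column names are invented for illustration and are not the avidgamers.com schema; SQLite stands in for MySQL only so the example is self-contained.

```python
import sqlite3

# Hypothetical schema: one row per message, each belonging to a thread.
SCHEMA = """
CREATE TABLE posts (
    id        INTEGER PRIMARY KEY,
    thread_id INTEGER NOT NULL,
    is_reply  INTEGER NOT NULL   -- 0 for the opening post, 1 for a reply
);
"""


def reply_counts(conn: sqlite3.Connection) -> dict:
    """Tally replies per thread with one aggregate query."""
    cur = conn.execute(
        "SELECT thread_id, SUM(is_reply) FROM posts "
        "GROUP BY thread_id ORDER BY thread_id"
    )
    return dict(cur.fetchall())


def demo() -> dict:
    """Build a tiny in-memory board and return its reply tallies."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO posts (thread_id, is_reply) VALUES (?, ?)",
        [(1, 0), (1, 1), (1, 1), (2, 0), (2, 1), (3, 0)],
    )
    return reply_counts(conn)


if __name__ == "__main__":
    print(demo())
```

Unlike a single-record lookup, an aggregate like this has to read every row it groups, so a workload that leans on it tends to get heavier as the boards fill up.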
Running MySQL 3.23.40 on a 1.4GHz Athlon with 1GB of RAM and an 18GB 15krpm SCSI drive, the system is doing ok, but it's starting to feel the load peaks. I'll be upgrading to RAID fairly soon, which should help things.
All in all, I'm very happy with MySQL, but I'm strongly considering a move to Postgres, because the lack of row-level locking is starting to become a problem. Stability has been no problem... no crashes, no data corruption, nothing.
I'm sure this is in no way one of the largest installations of free software databases, but I thought I'd post my experiences anyway.
-- If no truths are spoken then no lies can hide --
Actually, there are only 24 single-letter stock tickers in use. I and M are both open.
"support. (who gives 24/7 support on postgress, and send out tech support guys giving consultations, will come on site on a sunday at 4am?)"
RedHat either already does or will soon.
"what OpenSource rdbms provide true mutli language support (we have records in cryllic, japanese, american, german, etc)?"
PostgreSQL.
"high availablity (i dont know the current state of HA functionality in the linux kernel)"
Why not look it up?
Become a FSF associate member before the low #s are used
We ran extensive comparisons for a Data Warehousing project using Sun HW/Solaris/Oracle versus Penguin Computing/RedHat/Oracle and while the Sun solution was slightly faster in our tests, it was only marginally faster, yet cost significantly more. No way could we justify the additional expense based on our results. And we haven't looked back. Our Oracle servers haven't failed us in nearly two years, and they just keep getting better. And today's options for Linux hardware are much better than 2 years ago. We even discovered a problem with a particular Sun server during our testing that Sun asked us to keep quiet about. We took that to mean they'd sue us if we discussed it. Didn't take long to realize that this was not a company we wanted to do any business with. Sun sucks.
Weather.com is using Linux quite successfully to host its Oracle backend. They have replaced 250K Sun machines with 50K Intel based systems doing the same work.
Ummm... Perhaps not. $40000 per cpu is not a lot compared to the cost of each CPU. A high end Solaris box will start at around $450,000 for a couple of CPU's and run up into the millions. For that kind of money (plus the cost of Oracle), you can buy a cluster of VERY powerful x86 servers and run IBM's DB2 EEE for Linux (which has no such limitations as the Oracle/Linux port) in a clustered configuration and blow the Sun box out of the water in price/performance.
We run a large auditing system (OLAP-oriented rather than OLTP-oriented) on PostgreSQL (v7.1.3) on Linux (RH 7.1), using Tomcat (v4.0.1) as the front-end. We're running it on a Dell PowerEdge 2400 (2x PIII-866) with their Perc RAID controller with a Raid 1 and a raid 0+1 volume.
Our database is currently a bit over 8 GB, with many of the tables exceeding one million records. Queries typically join > 5 tables.
We moved from an MS Access/SQL Server environment and are much happier with the functionality, performance, and stability we now have.
Not to slam DB2, as I think it's a great product and have successfully used it for some really big projects, but for this application I found that PostgreSQL delivered ~4x the performance on many of our key queries. The lower cost and lower administrative overhead sealed the deal in favor of PostgreSQL.
As always, though, your mileage may vary.
Gordon.
He that breaks a thing to find out what it is has left the path of wisdom.
-- J.R.R. Tolkien
I work for a company where we tried running a large RDBMS (DB2) on linux. It failed HORRIBLY.
We're back to AIX now, and everything runs smoothly, and we get decent support from IBM.
"Nothing runs on Linux like DB2". Hah, so true...
yup - and they hacked in 64-bit filesystem support so you can have data files larger than 3 gig
The article is specifically asking about RDBMSes, which Google's system is most certainly not.
There is no way current hardware and RDBMSes could handle 1/100th of Google's queryload. A well built custom solution will always destroy an RDBMS; but the development will cost you...
Many VERY large databases are still , since hierarchical RDBMSes can't compete in that field.
you're kidding right? you might want to read what that system does. their entire budget is $300k. its an email list.
I am the lead dba for a company that processes 15-20 million US dollars worth of transactions per day. My backend database is solaris/oracle, it does 3000-4000 sql statements per second, and my company would lose maybe $1000 in revenue for each minute it is down. The two
largest tables in this database have in excess of 300 million rows, and are accessed by 100k customers per day. We have over 11 million customers.
It's running on an E4500, which is saving us a lot of money *not* buying E10000s. I like to think it's tuned well, but a big part of the reason it works (fast) is also that it is on an EMC with over 90 disk drives in it. It's all about IO bandwidth and serviceability in my world, and on those points you are correct in saying sun is a hands-down winner over linux.
Now, I work with a sysadmin who is a whiz at making lots of linux boxes work reliably as a web frontend, and is also good at keeping our backend solaris based database up 24/7. Neither of us is anxious to put the backend on linux, but we did put a significantly large, high performance, but *relatively* low availability database up on linux.
It's a 6x800mhz intel box with 4g ram and 16 disks on mylex caching raid 5 controllers. Raid-5 sucks in general, but the point of this system was to get a lot of bang for the buck, so as a big league dba, I took the challenge of making data loads fast in spite of raid-5, in order to get a crack at de-installing windows from this box. If I spent some bucks on more disks, we could get a much faster system, but then that was never the point of this system.
The system is about 200G worth of partitioned tables (copies of the same 300M row tables mentioned above) with partitioned rollup tables off the sides, for business analysis. The real trick is the partitioning. Because of the partitioning, this system is able to do many types of analyses that can't be done on our other analysis system, which happens to be solaris with 60 disk drives.
The linux box was a leftover from a failed windows project, so in some sense it was free, but I believe it woulda cost about $80k new. Gig ethernet and controller was about 10 or 15k of it.
It's working well for DSS, since the 2 times it's crashed in the last few months didn't really hurt anything.
I'm rambling on now, but I'll talk to the DBAs out there, who speak my language.
If you're gonna do Linux oracle:
- reiserfs sucked performance wise on top of raid 5. Don't know if I did something wrong, but I abandoned it in favor of ext2. I don't care if fsck takes a long time on this system, and ext2 creamed it for database io perf on raid5. I also couldn't get perf out of reiser on simple stripes without the added hurt of raid5, so go figure. fsck times are irrelevant if you use raw partitions, so this is the way to go in most cases.
- Max out the memory (of course) on an intel box. I think the most you can do is 4G on intel platforms. this is sufficient for me, but I kept the SGA down to about 500m, so I could have 10 way parallel processes with 200-200M of sort area size.
- Watch out for linux caching. I've turned it off for my filesystems. It's easy to get into "writeback debt" by pushing a lot of dirty blocks out of oracle cache into ext2fs cache. Add raid5 suckiness at random writeback, and you've got serious constipation problems on your hands.
- I've used some raw partitions for this system; they seem to be worth it to avoid ext2fs caching hassles, but I haven't migrated completely yet. The "raw" command must be used to "bind" a name to a disk partition before it can be used by oracle as a raw partition, so it makes for a few extra hassles, but no big deal.
- I got a mylex caching controller, which apparently has hot swapping capability in the hardware, mitigating the absence of veritas volume manager and hot plug capabilities at the linux level. It also makes raid5 tolerable. Haven't proven hot swapping by testing yet tho.
- Ext2 fs has some raid5 aware stuff; this helped on the raid5 mylex vols I have, based on cursory thruput tests, but I'm not sure I'm getting the block alignment proper at the oracle level. (Don't know, after all the oracle/ext2/controller layers, if oracle's 16k blocks are aligned with the stripes on the mylex. Sigh.)
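One way to reason about the alignment question in the last bullet is to check whether each database block falls inside a single stripe unit. The Python sketch below does that arithmetic. The 16k block size is from the post; the 64k stripe unit and the starting offsets (standing in for whatever the partition, filesystem and datafile layers add up to) are assumed values, since the actual Mylex settings are not given.

```python
# Check whether fixed-size database blocks straddle RAID stripe-unit boundaries.
# The stripe unit and starting offsets below are assumptions for illustration.

DB_BLOCK = 16 * 1024      # Oracle block size mentioned in the post
STRIPE_UNIT = 64 * 1024   # assumed per-disk stripe unit


def straddles(block_no: int, start_offset: int,
              block: int = DB_BLOCK, stripe: int = STRIPE_UNIT) -> bool:
    """True if the block crosses from one stripe unit into the next."""
    first = start_offset + block_no * block
    last = first + block - 1
    return first // stripe != last // stripe


def straddling_fraction(start_offset: int, blocks: int = 1024) -> float:
    """Fraction of the first `blocks` blocks that cross a boundary."""
    hits = sum(straddles(n, start_offset) for n in range(blocks))
    return hits / blocks


if __name__ == "__main__":
    for offset in (0, 512, 8 * 1024):
        print(offset, straddling_fraction(offset))
```

With these assumed sizes, a starting offset that is a multiple of the block size leaves no block split across stripe units, while any other offset splits one block in four. Each split block touches two disks (and their parity) on every write, which would add to the raid5 penalty described above.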
FWIW, back in the dot-com heyday, I also had clients doing modest high availability (to them) databases on oracle/linux. Even then, on relatively small (in gigabytes) databases, the biggest tuning hassle was writeback caching of linux getting in the way of oracle, and the biggest hassle of scalability was managing many many disks. Raw partitions can get around the former; intelligent controllers (mylex etc.) or intelligent disk arrays (clariion, sun t3 etc.) get around the latter.
You are just using the wrong database, then. There is a 50cpu linux cluster (not Beowulf, but clustering native to the database) that was loaded with 2.5 billion stock transactions. It performed very well using KDB (taken from kx.com):
.1 second.
on thursday jan 4, 2001 steve miano, ed bierly, keith mason and i
loaded 2.5 billion trades and quotes on a 50cpu linux cluster.
simple table scans on one billion trades, e.g.
select distinct sym from trade
select max price from trade
take 1 second
multi-dimensional aggregations, e.g.
/ 100 top traded stocks
100 first desc select sum size*price by sym from trade
/ daily high and close
select high:max price, close:last price by sym, date from trade
take 10 to 20 seconds
translating the data from TAQ to kdb took about 5 hours.
(steve had loaded the 200 TAQ cd's onto several disk drives.)
distributing the 100gigabytes over the 100Mbit ethernet took 3 hours.
(this cluster should probably have Gbit ethernet)
loading the database (k db taq.m -P 2080), starting 50 slaves,
connecting, mapping shared indicative tables over nfs, building
parallel partitions, etc. took