30+ GB Databases On Unix?
CaptainZapp asks: "A customer of mine runs a ~30 GB data warehouse on a Sybase SQL Server Database. Now their business requires a mirror of this database in a different location. An offer by a reputed U/X vendor for the hardware turns out to be about five times as expensive as when you get a reasonable x86 box, with the necessary amount of disk space and, say, 1 gig of memory. What does the esteemed Slashdot community think: Is Unix capable of handling a database of this size and what other terrible pitfalls do you foresee?" He's not worried about "mission-critical" here, he's just wondering if it's possible.
"Now, the database is not mission critical (which doesn't mean it's not a major pain to reload it), so the issue if raw devices are supported is not too relevant. Further, and even more important, this is a major chance to convince a global player of the capabilities of Linux.
All that said, I' m aware that some of you readers have a quarter terabyte of disk space at your disposal. But that's also not the issue at hand. The question is if it is feasible to run an industry strength database of 30 - 40 Gb size with all its consequences (uptime, maintainability, dumps, etc...) in a Linux / Intel environment."
SQL database at 30G, sure. I would say call Sybase Inc. first, then VA Linux second, and get the answers streight from the people who are most likely sure to give you a usable product. Get your prices, then compare.
I'd be more worried about the differances in _how_ your going to mirror the data (connection speeds, transfer methods, how frequently) and that Sybase doesn't garble things when going from a database on one OS to another (unlikely, but possable).
I'm sure Oracle for Linux will be mentioned, because there are many claims that it will handle such a situation. But, your problem there is going from Sybase to Oracle, not from another OS to Linux. Keep in mind, not all "SQL" databases are identical, the SQL may be, but the extentions provided by the manufacture won't be.
It doesn't have to manage it's own disk space. And it may, under certain conditions, provide better performance. We have been moving away from raw data partitions. This after running some benchmarks of a large table residing on raw partitions vs. the same data residing in tables in a filesystem. The performance was actually better while accessing the data in the filesystem. We're talking 10+% better performance not just a few percent. Our experience, based on our benchmarks, and discussions with Oracle technical people, is that the preference for using raw data partitions was based on performance tests using older versions of UNIX and less capable filesystems. Of course, your mileage may vary.
Aside from performance, if your database changes frequently, adding and deleting tablespaces is a major pain (with long downtime) when you're using raw data partitions but is a snap when you're using filesystems for data. If your database is fairly static raw partitions might buy some little bit of performance but, again, at the expense of managability. IMHO, raw data partitions just aren't worth it. Even if comparitive performance were a wash, the easier means of managing the database weighs in favor of filesystems.
--
CUR ALLOC 20195.....5804M
The title and summary say "Can Unix handle it?" while the "below the fold" area asks "Can Linux/Intel handle it?".
I'd say the answer to the first question is a resounding "duh!". The answer to the second is a resounding "probably".
I found Oracle on Linux to be quite usable and nice (except for lame non-readline-enabled interactive tools) and fairly fast. But there is something...incongruous about spending $2000 on hardware, $2000 on Oracle and then using a free OS (that you WILL have to tweak to optimize).
Other tidbits:
1) Do NOT, I repeat NOT NOT NOT use Oracle on NT. The (evaluation) version I tried sucked BIG TIME. The bulk loader didn't properly support all the file formats it was supposed to and I was able to repeatedly crash the box by mistyping field names into the table creator GUI. Add all the problems of NT (no real remote management, etc) and you have yourselves the makings of a nightmare.
2) Raw devices are for more than recovery. They also help in the speed department. If you are going to be loading 30+ GB of data multiple times (this is a backup, right?) you are going to want speed. IIRC, ~100MB took about 5 minutes to bulk load (raw, not insert) on Oracle for Linux. That's 25 hours of load time for 30 GB.
3) Can't you take the backups from your primary DB and load them as restores to the backup DB? That would save tons of time and effort (up front AND ongoing).
--
Give us our karma back! Punish Karma Whores through meta-mod!
Linux MAPI Server!
http://www.openone.com/software/MailOne/
(Exchange Migration HOWTO coming soon)
Unix systems handle the largest databases known to mandkind
as we speak.
Databases managed by unix systems have been known to be in
the vicinity of around 2-6TB.
Your question seems to refer to Unix on x86 databases that
have that size.
Of course that running unix on x86 systems usually boils
down to running Linux...
Linux is officially supported by both Oracle, Informix and
I think that even Sybase altough I'm not completely sure
about that.
Obviously running it on the same RDBMS would be an easier
to accomlish, so you'd probably want Sybase to support Linux.
You'd also want RAID 5, preferably hardware which is supported
by Linux.
You'd probably want to use some sort of journaling file systems.
I myself have no problem trusting the beta versions of ReiserFS.
I've also ran oracle on them witout any problem.
If you feel reluctant in using bleeding edge kernel patches
for a production environment, I can only recomend that you use
SMALL ext2 partitions to avoid catastrophic FSCK times, and let
Oracle / RDMS do it's magic in managing a single 30GB database
over smaller files...
Like most everyone else, you are assuming all database are OLTP systems. Data warehousing or data analysis on the other hand requires MASSIVE data transfer rates (mostly read activity), and Raid 5 with large stripe sizes and multiple arrays works really well for this type database. Most queries against the roughly 3TB database I currently work on run in several minutes passing somewhere under 100GB of data each, and if we had used OLTP tactics (indexes to join everything, small block size for low latency reads, etc) to tune the database, they would run in days or hours instead of minutes. Aggregate I/O rates on this monster can exceed 500MBytes/second.
As to the original question, can Linux handle a 30 GB database, my answer would be "Yes, but it will hurt". Ever try staging more than 2GB of data on ext2? Ever try moving more than 1GB of data on ext2 with less than a 4KB block size? It hurts!
Someone please tell me that I will be able to use large files painlessly on Linux sometime. Until then, run large databases on name brand UNIX servers with name brand UNIX. Linux on x86 is good at a lot of things, but a large database isn't one of them YET.
SQL> select sum(bytes) from dba_data_files;
SUM(BYTES)
----------
2.9003E+12
And every byte is on RAID 5.