30+ GB Databases On Unix?
CaptainZapp asks: "A customer of mine runs a ~30 GB data warehouse on a Sybase SQL Server Database. Now their business requires a mirror of this database in a different location. An offer by a reputed U/X vendor for the hardware turns out to be about five times as expensive as when you get a reasonable x86 box, with the necessary amount of disk space and, say, 1 gig of memory. What does the esteemed Slashdot community think: Is Unix capable of handling a database of this size and what other terrible pitfalls do you foresee?" He's not worried about "mission-critical" here, he's just wondering if it's possible.
"Now, the database is not mission critical (which doesn't mean it's not a major pain to reload it), so the issue if raw devices are supported is not too relevant. Further, and even more important, this is a major chance to convince a global player of the capabilities of Linux.
All that said, I'm aware that some of you readers have a quarter terabyte of disk space at your disposal. But that's also not the issue at hand. The question is whether it is feasible to run an industrial-strength database of 30 - 40 GB in size with all its consequences (uptime, maintainability, dumps, etc...) in a Linux / Intel environment."
Win 2K doesn't have a 2GB file limit, so it is infinitely more scalable than Linux. Also, you get nice GUI tools to help you if you don't know a thing about databases.
The new Oracle pricing model isn't based on users anymore. You basically pay $15 per MHz of CPU speed you have, at least for the standard edition. The enterprise edition rings in for quite a bit more. For instance, if you have Oracle running on a dual PIII at 700MHz, that's 1400 total CPU points.
1400 × $15 = $21,000
Support cost is an additional 22% of the total list price.
$21,000 × 1.22 = $25,620 total price.
That's per machine. Ironically, once you license a machine under this model, you can run as many instances on it as you like (barring performance). Seeing that this is a ~30G database, I wouldn't see that happening.
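For anyone who wants to plug in their own box, here is a minimal sketch of the arithmetic above. The $15/MHz rate and the 22% support surcharge are just the figures quoted in this post, not something checked against an Oracle price list, and the function name is made up for illustration.

```python
# Sketch of the per-MHz pricing arithmetic from the post above.
# The rate and support percentage are the poster's figures (assumptions),
# not values taken from an official price list.

def oracle_license_cost(cpus, mhz_per_cpu, dollars_per_mhz=15, support_percent=22):
    """Return (list_price, total_with_support) in dollars."""
    cpu_points = cpus * mhz_per_cpu               # e.g. 2 x 700 = 1400 points
    list_price = cpu_points * dollars_per_mhz     # price before support
    total = list_price * (100 + support_percent) / 100
    return list_price, total

# The dual PIII 700MHz example from the post.
list_price, total = oracle_license_cost(cpus=2, mhz_per_cpu=700)
print(list_price, total)
```

Changing `cpus` or `mhz_per_cpu` shows how quickly the license outgrows the price of the x86 hardware itself.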
The low-end single-user licenses you're talking about are time-based licenses on a single 500 MHz processor (if I remember correctly). So, with one of those, you could run as many instances as you can squeeze onto a single PIII 500 MHz for two years. I do reserve the right to be wrong on this though. The large sum pricing above is for permanent licenses.
Cheers
See the ase-linux-list mailing list archive for more info on large DBs and raw I/O.
However, Replication Server is not supported yet. I think this is going to be a showstopper for you, eh?
Again see the list for more info.
http://www.sybase.com/linux/
michael peppler's home page
I've never seen a production DB not run on at least a RAID 5 array (most run on something more serious, like netapp drive arrays, which are basically a RAID 5 type of system).
Who moderated this misinformation up?
Linux and 30GB+ databases work fine. In my company we are using both Oracle on Linux and Oracle on Sun. We have databases of more than 200GB running on Linux and Oracle 8i on a dual PII, and it really works flawlessly. We are not using raw partitions but a regular filesystem mounted from a RAID 5 volume on a Mylex adapter. We are extremely happy with that, and we are using it on an extremely important production system for the company. We are not using Sybase for Linux, but I have some friends who are using it and they are very happy with it, so my guess is that it should work fine for you. As for the kind of hardware you need, well, it all depends on what kind of traffic you have, but if I were you, I'd definitely go with at least a dual-PIII-something, a RAID 5 card like a Mylex, and a couple of 9 or 18GB SCSI hard drives.
If at all possible, I'd recommend getting some 64-bit hardware. Probably an Alpha-based system. Next, get a decent filesystem like ReiserFS or Global Filesystem.
If you are running on x86 hardware, there's no telling whether it will be capable of accessing large files (>2GB).
--
Ski-U-Mah!
I hate x86 as much as the next guy, but wouldn't file size limitations be an issue with the operating system or filesystem, rather than the CPU architecture?
--
$x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
$x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
My understanding is that, shortly after Microsoft bought Hotmail, they sent in their engineers and tried to convert it to NT. After a while they gave up and left. They tried again several months later, with similar results. NT won't do it.
Here is the NetCraft query.
--
$x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
$x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
Shortly after posting, something like this occurred to me. Not something I know much about though; thanks.
I want an exabyte of something.
--
$x='S24;r)>63/* h@<5+oZ)32"5cz';$me='phroggy'x$];
$x=~y+ -xz+\0-Tx+;print$_^chop$me for split'',$x;
You'd also want RAID 5, preferably hardware which is supported by Linux
What kind of advice are people giving???
RAID 5 is real slow for small writes common with a database. You have to first read the whole stripesize (much bigger than the oftentimes single block to write) from all disks, calculate parity and write the small changes back (data + parity). What you want is mirroring, RAID 1, which won't decrease write performance to a crawl but keep your data safe.
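To make the small-write penalty concrete, here is a toy sketch of XOR parity. It assumes the usual textbook read-modify-write shortcut (read the old data block and the old parity, then write new data and new parity), which costs four I/Os per small write against two for a plain mirror. Real controllers may behave differently; the block contents and function names are invented for illustration.

```python
from functools import reduce

def parity(blocks):
    """XOR equal-length blocks byte by byte to get the parity block."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def small_write(old_data, old_parity, new_data):
    """Read-modify-write update: new parity = old parity ^ old data ^ new data.
    Costs 2 reads (old data, old parity) plus 2 writes (new data, new parity)."""
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))

# Three data blocks in one stripe (hypothetical contents) plus their parity.
stripe = [b"AAAA", b"BBBB", b"CCCC"]
p = parity(stripe)

# Overwrite just the middle block, as a database would for a single page.
new_block = b"ZZZZ"
new_p = small_write(stripe[1], p, new_block)
stripe[1] = new_block

# The shortcut yields the same parity as recomputing over the whole stripe.
print(new_p == parity(stripe))
```

Either way, every small write touches the parity disk as well as the data disk, which is the overhead mirroring avoids.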
I've had great luck with 11.0.3.3 on Linux, but I'm not doing anything serious with it yet.
The original 11.0.3.3 could use both files in the file system and raw hard drive partitions to store data. Do not use files in the file system for your large database, because the limit is still 2Gig for a single file. Use fdisk to allocate a large block, then format it with Sybase.
A fresh release of 11.0.3.3 is available at linux.sybase.com that addresses all sorts of bugs and caching issues. I suggest that you start with this product. If you can make it work with a free product, you have lots more options - you can have two backups on the free version, and one on the 11.9 supported version.
When you have Sybase configured properly, you can have only a single UNIX process acting as your database server (if you don't run the backupserver when you aren't running backups). If you have SMP, you run one more Sybase server process per processor and they communicate with standard IPC. Installation is tricky for a novice, but the tools work as advertised. Sybase uses the RPM format for their Linux installations.
The Oracle installer (written in Java so you must have the blackdown JRE - it is just sick and wrong) commonly fails when configuring a database instance. Yes, there are workarounds available, but why not write an installer that works properly? When you have Oracle running, it lights up your process table like a Christmas tree - at least 4 server processes, plus some sundry rubbish.
A UNIX admin who admires efficiency will be happier with Sybase.
Your history is so far out of whack that I can't really address it... suffice to say:
- BSD and AT&T UNIX shared stuff from Version 6 through Version 8, IIRC, but SysV was developed afterwards by AT&T in conjunction with Sun
- SCO now handles SysV, and the Open Group handles UNIX
- BSD and SysV are the two major strands of UNIX, not SysV and "SrV"
- Linux is not UNIX and neither are {Net|Open|Free}BSD, but they might as well be
This might help you out a bit...
--Matthew
I worked at Nielsen for a bit, and (IIRC) they had around 10-12 TB databases, using Sybase on Sun hardware (E10000s and others).
Because I had a half-dozen Sybase SA's within a stone's throw of my desk, I used Sybase for my personal database at home (it was only about 160M).
Sybase works really well under Linux. I'm pretty sure you won't need to worry about file size, because I ended up having several files for the database (my database was initially too small, and I just initialized a new disk and added it to the current database to add data and log space).
lance
so I'm not sure why your jaw is on the floor.. What are the benefits of local storage?
In this scenario, I'd think a gig interconnect or two (on a dedicated storage network) to a NetApp might be fine. Probably faster than local storage (assuming you have a better subsystem on the NetApp), and you contain your storage concerns into a dedicated machine, rather than having to deal with host storage. Adding extra scsi host adapters while the db machine is active is a dance any sysadmin would surely avoid.
And while you're at it, why not have two NetApps clustered?
cheers,
-o
Of course that will run on Unix. Most large databases do, and only a few clueless DBAs run that kind of DB on NT.
I'm sure they'll also run on Linux x86, but I'd be concerned about I/O... and other architectures like IBM or Sun can handle that in better ways (and, oh yes, you can still use Linux there).
My advice: don't skimp on buying the box. You will probably lose anything you save to admin costs on a cheap and not very good box.
The Cure of the ills of Democracy is more Democracy.
Erlang Developer and podcaster
It's only "Silly" until your UPS dies (or the card fails or your SCSI bus resets) while there are cached writes.
I would be under the impression that if your UPS is going to die it would let you know through the power control protocols. Unless you mean if the UPS explodes unexpectedly, in which case I thought the battery kept cache data, not state data, which was the reason for my "silly" comment. If it keeps state data (a transaction log if you will) then I'm all for it. :-)
Cache will alleviate the performance problem for brief, small transactions.
Which was exactly the context in which I was speaking. The parent to my reply had stated that the bulk of DB transactions were small and that the multiple-write nature of RAID5 made it a performance bottleneck. I had said that a large write cache would alleviate that.
If you're moving more than 256MB through the controller (in either direction, remember that reads consume that cache, too) in less time than the disks can service it, then your I/O's become as slow as the disks. This is unavoidable and unfixable.
Agreed. But then you're back to square one anyway, with the system (usually) being faster than the bulk storage, which is why you have a small but fast disk cache, a slower but bigger controller cache, and a slower yet but bigger filesystem cache on the OS. Each time you step back from the hardware you get a larger cache. System memory is slower than the fast SRAM on the disk cache, but if the memory has it it's a ton faster than actually waiting to get the drive to give you the data (and waiting to get it over a 16/32-bit bus
RAID5 is best-suited for read-intensive environments, or cost-sensitive customers. It is not a high-performance solution. As others have said, RAID0+1 (striped mirrors) are the answer if you want fast and safe instead of cheap and safe.
I'll state again that it depends on your situation. No need to spend a pile on 30G SCSI-II UW disks for a database when you're doing many small transactions. Better to get a few smaller SCSI-II UW disks and RAID-5 with a large cache. There's the ultimate, then there's the practical. :-) The lines between which depend on the pocketbook and the application.
RAID 5 is real slow for small writes common with a database. You have to first read the whole stripesize (much bigger than the oftentimes single block to write) from all disks, calculate parity and write the small changes back (data + parity). What you want is mirroring, RAID 1, which won't decrease write performance to a crawl but keep your data safe.
Personally I don't like having two very large disks around. Give me a half dozen or so smaller ones.
Also, most hardware RAID controllers have a decent amount of cache with them. The DPT controllers I use can have up to (I think) 256M of ECC cache RAM and optionally battery back it up (silly IMO). That'll fix your performance issues on RAID5.
I think that RAID5 is a good idea, but YMMV.
SQL database at 30G, sure. I would say call Sybase Inc. first, then VA Linux second, and get the answers straight from the people who are most likely to give you a usable product. Get your prices, then compare.
I'd be more worried about the differences in _how_ you're going to mirror the data (connection speeds, transfer methods, how frequently) and that Sybase doesn't garble things when going from a database on one OS to another (unlikely, but possible).
I'm sure Oracle for Linux will be mentioned, because there are many claims that it will handle such a situation. But your problem there is going from Sybase to Oracle, not from another OS to Linux. Keep in mind, not all "SQL" databases are identical. The SQL may be, but the extensions provided by the manufacturer won't be.
As many other people have said, 30Gb is nothing special volume-wise on 'big' machines, like Sun's.
30Gb is probably at the top end of what you could expect to put on an x86 box and so the question is, what do you want to do with it? If you're just storing the data and doing a few simple queries, you should be okay, although you'll probably want more than a gig of memory.
If you're doing heavy duty processing with many users then forget it. It's not a problem with Linux, but the hardware. (Yes, Linux will run on a mainframe but you can't get Oracle/Sybase/Informix on it.)
The software is less of an issue. Any of the big commercial databases would do the trick (I prefer Oracle, but then I wrote the Oracle on Linux Installation HOWTO -- URL above). MySQL has no transactions or referential integrity, so even if it could handle the volume it wouldn't be appropriate. Don't think I'd trust PostgreSQL, either.
Bottom line, I think the expensive hardware would work out cheaper in the long term.
Take a look at Deja.com (aka deja.com)
/v/10
All of that is run off of an oracle database..
The Database is HUGE!
/dev/rd/c0d0p1 71706488 41278452 29710576 58%
41GIG
As long as you have the right indexes... you're all set..
ChiefArcher
I think this really comes down to what you choose as your true goals: cost, speed, redundancy, scalability, or any other factor.
You need to decide what things are important to you and start making choices.
In the end, you will need to trade off all of these things to find your solution.
Personally, I do a lot of work with banks at the moment. They want brand name, proven tech. Not necessarily the latest and greatest. On top of which, they are willing to pay for brand names. As a result, I would spring for a RAID tower coupled with a Sun box running Oracle. But, if I wanted low cost, I would probably pick up a VA box or a custom-built one with a RAID tower and run Linux with Oracle, or maybe try PostgreSQL.
It all ends up being a trade off.
*jaw drops*
I would have to recommend against this. By buying hardware RAID and an appropriate filesystem add-on (e.g. Veritas File System) you can get all the benefits of the filer with all the benefits of local disk.
--
-- Slashdot sucks.
While I agree that lately a lot of questions have been pretty brain dead, I think a lot of these questions have been pretty good discussion starters and I get a lot out of the responses from people who either know what they are talking about or know how to look things up and provide a reference URL when they respond. There are usually at least a few of these people replying to most questions.
I do ask people posting replies to avoid posting anything if you don't know for sure or you are too lazy to check your facts before posting. There are far too many people writing uninformed opinions and using phrases like AFAIK and IIRC to forgive themselves for not checking their facts before posting.
Sorry this post turned into a rant.
At work a group implemented an Oracle data warehouse on Sun equipment running Solaris. The database sizes are expected to scale to 1 TB. But if I am remembering correctly, the cost of the equipment and Oracle was about $2.0 million.
Palin...
Using raw disk partitions is not the same thing as Linux's "raw block device" support, which lets you access block devices without going through the buffer-cache layer. Some database programs want to use this so they can do their own caching, etc. However, lots of things access block devices such as /dev/hda1, for example mkfs and fdisk. The raw block device support is still very new, and was developed by Stephen Tweedie. It uses kiobufs to do zero-copy I/O. You would know about this if you were at the second memory management talk, given by Ben LaHaise, at the Ottawa Linux Symposium last week :)
So, unless it's the database's release notes that say not to use whole disk partitions, there should be no problem. The kernel lets you access a disk partition as a big file very easily.
#define X(x,y) x##y
Peter Cordes ; e-mail: X(peter@cordes ,
Linux uses the native word size of the machine for file offsets, so on 32-bit architectures, file sizes are limited to 2GB, while on 64-bit archs, Linux can handle files up to 8EB (1 exabyte ~= 1 million terabytes).
Recently, large file support on 32-bit archs has been developed, but it isn't in the main kernel yet, AFAIK.
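The 2GB and 8EB figures fall straight out of the offset width, assuming the offset is a signed integer (so one bit goes to the sign). A quick sketch of that arithmetic, with a made-up helper name:

```python
def max_file_size(offset_bits):
    """Bytes addressable with a signed file offset of the given width.
    One bit is the sign, so the limit is 2**(bits - 1); strictly speaking
    the largest valid offset is one byte less than this."""
    return 2 ** (offset_bits - 1)

GIB = 2 ** 30   # binary gigabyte
EIB = 2 ** 60   # binary exabyte

limit_32 = max_file_size(32)
limit_64 = max_file_size(64)

print(limit_32 // GIB, "GiB with a 32-bit offset")
print(limit_64 // EIB, "EiB with a 64-bit offset")
```

So the limit comes from how the kernel represents offsets, and a database that spreads its data over several files or devices never needs a single offset that large.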
#define X(x,y) x##y
Peter Cordes ; e-mail: X(peter@cordes ,
I know it must be hard to be a non-windows hater (that's non-"windows hater", not "non-windows" hater) and listen to the crap that's flung about around here, but you've reached an absurd level of defensiveness. You're defending software (an application AND an OS) that crashes (according to one report) when the user makes a simple mistake and placing the blame on the user. An application should not crash when given invalid input. It should notify the user. An OS should not go down when an application misbehaves. It should kill the app, perhaps generate a core file, and keep on chucking.
Now, Linux and other UNIXes are not without their own problems in this regard, but at least the people responsible don't respond with "don't do that" when told about it. Neither does Oracle, I bet, but you do. You and Microsoft.
--
Fuck the system? Nah, you might catch something.
You must be smoking crack. Raid 5 is optimized for throughput at the expense of seek time, and DB don't give a fuck about throughput, but on the contrary live and die because of seek time. NO RAID 5 FOR DATABASES. Repeat 100 times.
The kind of database you're talking about is a far cry from what most people here will need, including the original poster. I don't doubt that very specific cases require exceptions to the rule; the rule being, no RAID-5 for databases. That's not just for simple web-type databases; actually it's stated in the O'Reilly Oracle DBA book, which addresses much wider needs.
In the Linux install notes, they claim that for optimal performance you have to split the Oracle install on 4 disks; which implies no RAID.
But hey, am I supposed to have higher journalistic standards than the slashdot editors? Eh eh eh eh.
And one of the discs could also be a tape drive. Yeah.
seek times are dramatically improved in most (if not all) RAID levels
Seek time is not going to be any better with mirroring, for one. The two heads reading the same data won't go faster than one head, will they?
Then for striping, this usually won't make any kind of difference since data access will be randomly spread over the disk. So there you go.
NOW smartly organizing the database WITHOUT striping amongst several disks *will* make seek times faster, actually, it will require less seeking. A typical Oracle installation (as recommended by Oracle) will have for example the software on one disk, the indexes on another, and the actual data on a third.
One DB transaction typically requires at least one index lookup and one data retrieval, which are unlikely to reside close to each other on one disk. When they're separated onto two disks, subsequent queries will need less seek time.
Now, since I was right, will you give me my karma back? ;)
BTW - This is the case for most (all?) RDBMSes. The short answer is that binary database files (they store the data) are platform dependent.
Some RDBMSes, like Oracle, allow you to specify the block size of your binary database files. But even if you have 2 files with the same block size, they may not be transferable between databases.
--
"You're gonna need a bigger boat." - Chief Brody
Well, when you're at the 30 gigabyte size, your options as far as remote replication goes are a little limited. For example, it's not enough to warrant the sort of large-scale storage array that an EMC Symmetrix would offer that comes with built-in remote replication (SRDF).
What you could use is Veritas Volume Replicator. It runs as a service/daemon on your box and mirrors every write over IP to another box. It can be configured to do it synchronously (the I/O blocks until the remote I/O completes) or asynchronously (higher performance because there's no delay, but you run the risk of data loss when the db server goes down).
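The synchronous versus asynchronous trade-off can be shown with a toy model. This is only a sketch of the general idea and says nothing about how Veritas Volume Replicator is actually implemented. The class and method names are hypothetical, and plain lists stand in for the local and remote disks.

```python
from collections import deque

class ReplicatedVolume:
    """Toy model of write replication to a remote site (not a real product API)."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.local = []          # blocks written on the primary
        self.remote = []         # blocks that reached the remote site
        self.pending = deque()   # async only: written locally, not yet shipped

    def write(self, block):
        self.local.append(block)
        if self.synchronous:
            self.remote.append(block)    # caller waits for the remote copy
        else:
            self.pending.append(block)   # returns at once, shipped later

    def ship_pending(self):
        """Background shipping catches up with the queue."""
        while self.pending:
            self.remote.append(self.pending.popleft())

    def primary_crash(self):
        """Return the writes that never made it to the remote site."""
        lost = list(self.pending)
        self.pending.clear()
        return lost

sync_vol = ReplicatedVolume(synchronous=True)
async_vol = ReplicatedVolume(synchronous=False)
for vol in (sync_vol, async_vol):
    for block in ("w1", "w2", "w3"):
        vol.write(block)

async_vol.ship_pending()   # the queue drains...
sync_vol.write("w4")
async_vol.write("w4")      # ...then one more write lands just before the crash

sync_lost = sync_vol.primary_crash()
async_lost = async_vol.primary_crash()
print(sync_lost, async_lost)
```

In the synchronous case the remote copy is always complete, at the cost of every write waiting on the network. In the asynchronous case whatever is still queued at crash time is gone.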
Unfortunately, Veritas VR is not available under linux - I think you said it wasn't under linux anyway, but a lot of people are offering linux solutions.
Also, given that your database is only 30 gigabytes, do you actually do a lot of writes? Realistically, if you only do a couple of hundred inserts an hour, you could just, every hour, manually insert the changed records into the remote db. Heck, do it every 5 minutes. I'm not familiar with Sybase, but on Oracle, you can just run the redo logs on the remote data center. That's going to be the cheapest option, and the most linux compatible.
Anyway, if this is really enterprise-level, spring for Veritas - their stuff is expensive but really good.
Cheers,
Matt
Matthew J Zito, CCNA
me@mzi.to
At my work, we run two large databases on Linux, one that's about 95 gigs and the other 160 gigs. Now, we run our production DB on Solaris, but the data warehouse and the ticketing DB are on Linux with Oracle. We've had great results with Linux for the most part - the biggest problem is that the documentation by Oracle is not as good for Linux as it is for Solaris, and it's harder to find DBAs with Linux experience.
I can't vouch for Sybase's stability under linux, but Oracle will do you just fine. Get a dual or quad-cpu box, depending on how much data you need to do, and 2 gigs of RAM either way.
Matt
Matthew J Zito, CCNA
me@mzi.to
I would stick with Sybase
Absolutely. I spent several months last year adding Oracle support to an application designed around SQL Server and cross-vendor development is really something you should avoid if you can. Leverage your existing DBA knowledge and you can probably use one DBA for both sites. If you do go with another vendor, you'll wind up with another DBA either on salary or on retainer.
Just junk food for thought...
I agree with you. RAID 10 will give a nice combination of safety and performance. If you're crazy (well, for just 30 gigs, perhaps not *that* crazy) there's the new Adaptec UDMA 66 RAID card which I think may support RAID 10. It definitely supports RAID 5. Actually, I wonder how bad RAID 5 would actually be with one of those cards and five UATA 66 Maxtors with the 2MB cache on them. They are pretty fast drives....
I don't know if the Adaptec card has its own caching, but it would be very cool if it did!
BSD is not 'derived from System V' - it forked off from Unix earlier, maybe version 7 Unix.
Also, Solaris 2.x is based on SVR4 (System V Release 4) - SVR4 is quite upward compatible with SVR3.x.
And Solaris is not spelt with a 'u'...
Linux was not 'built on Posix' (not a meaningful term, Posix is an API spec) but I believe Linus tried quite hard to conform to the 1003.1 specs, and the bash people have tried to conform to the POSIX shell specs.
The problem with a caching controller is that unless it's well engineered (with its own battery backup), you're more likely to run into filesystem corruption in the case of a power failure or OS crash.
A standard filesystem (such as ext2) on top of RAID5 will never be fast for small writes.
NetApps get around this because the WAFL filesystem is explicitly designed to sit atop a RAID4 drive array.
And there is a difference between RAID10 and RAID0+1.
RAID10 is a stripe of mirrors. Each pair of disks stores the same information (RAID1), and a stripe is created over those mirrors. This can tolerate multiple drive failures as long as at least one drive from each mirror is working.
RAID0+1 is a mirror of stripes. Two stripes are created (RAID0), each with half the total of disks. These stripes are then mirrored (RAID1). The problem here is that if a drive goes out, it takes out the entire stripe. If a drive in the other stripe goes out before the rebuild is complete, you're hosed.
Normally RAID systems (like RAID5) can't tolerate more than 1 drive failing at the same time. However, RAID10 provides more protection than RAID0+1, at the same price.
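The difference in protection can be counted by brute force. The sketch below assumes six disks laid out exactly as described above (adjacent disks paired for RAID10, first half versus second half for RAID0+1) and tries every possible pair of simultaneous failures. The layout and function names are illustrative assumptions, not any particular controller's behavior.

```python
from itertools import combinations

def raid10_survives(failed, n_disks):
    """Stripe of mirrors: disks (0,1), (2,3), ... are mirror pairs.
    Data survives as long as no pair has lost both members."""
    return all(not ({a, a + 1} <= failed) for a in range(0, n_disks, 2))

def raid01_survives(failed, n_disks):
    """Mirror of stripes: first half is stripe A, second half is stripe B.
    One dead disk kills its whole stripe, so at least one stripe
    must be completely untouched."""
    half = n_disks // 2
    stripe_a = set(range(half))
    stripe_b = set(range(half, n_disks))
    return not (failed & stripe_a) or not (failed & stripe_b)

def survival_count(survives, n_disks, n_failures=2):
    """Return (surviving cases, total cases) over every failure combination."""
    cases = [set(c) for c in combinations(range(n_disks), n_failures)]
    return sum(survives(c, n_disks) for c in cases), len(cases)

raid10_result = survival_count(raid10_survives, 6)
raid01_result = survival_count(raid01_survives, 6)
print("RAID10:  ", raid10_result)
print("RAID0+1: ", raid01_result)
```

With six disks there are 15 possible two-disk failures. The stripe of mirrors survives 12 of them and the mirror of stripes only 6, using the same number of drives.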
I don't think so. Didn't the recent benchmark comparing IIS vs. the new webserver from RedHat run on an 8 CPU SMP system? You can get more CPUS... you just don't see them in the advertising aimed at Joe Sixpack. They tend to be just a bit on the pricy side.
Cheers...
--
CUR ALLOC 20195.....5804M
I think you're talking about an Alpha-based workstation. No one's going to be hosting a 30+GB database on a workstation. They would be looking at a DS10 or DS20 at a minimum. Expect to pay something in the area of US$20K for a smallishly configured DS20.
A whopping $1000 for disk space to host a database? Only if you plan on sticking the entire thing on a single 36GB drive which would be an inexcusable performance hit. And that would leave no money for any kind of mirroring.
I guess this $6000 configuration isn't intended for a production system.
--
CUR ALLOC 20195.....5804M
Depends on your UNIX. Under Tru64 and some other Unices, you have storage management tools (under Tru64 there's Logical Storage Manager, for example) that'll let you slice up a disk into as many pieces as you like. You then access the disks through either /dev/vol/... or /dev/rvol/... (if you really want to use raw data partitions). Striping across SCSI adapters for better I/O performance is quite easy.
Agreed. Some of the dollar estimates that people are throwing around are fairly humorous.
--
CUR ALLOC 20195.....5804M
It doesn't have to manage its own disk space. And it may, under certain conditions, provide better performance. We have been moving away from raw data partitions. This came after running some benchmarks of a large table residing on raw partitions vs. the same data residing in tables in a filesystem. The performance was actually better while accessing the data in the filesystem. We're talking 10+% better performance, not just a few percent. Our experience, based on our benchmarks and discussions with Oracle technical people, is that the preference for using raw data partitions was based on performance tests using older versions of UNIX and less capable filesystems. Of course, your mileage may vary.
Aside from performance, if your database changes frequently, adding and deleting tablespaces is a major pain (with long downtime) when you're using raw data partitions but is a snap when you're using filesystems for data. If your database is fairly static, raw partitions might buy some little bit of performance but, again, at the expense of manageability. IMHO, raw data partitions just aren't worth it. Even if comparative performance were a wash, the easier means of managing the database weighs in favor of filesystems.
--
CUR ALLOC 20195.....5804M
Yes, using a Network Appliance Filer is using Oracle with NFS - but this is a solution developed by NetApp in conjunction with Oracle, and is AFAIK the only NFS solution Oracle recommends.
(No I don't work for NetApp.)
We're considering Filers to replace local disk on some of our Sun 450s (running Oracle 8.1.6) at the place I work.
We here at Giganto Communications (name concealed for my protection) use Intel-based multi-processor machines exclusively for all our important database needs. You obviously are in the minority when the largest communication companies in the world will do what you will not.
:-)
as for Linux, no. We suffer with NT crashes, but we are sneaking Linux in the door, one server at a time, until.... well you get the picture
Do not look at laser with remaining good eye.
This is UK price for DS10L. You do not need the expandability of a DS10 or DS20. Also AXP has even cheaper machines. Sold in the UK by evolution.
A whopping $1000 for disk space to host a database? Only if you plan on sticking the entire thing on a single 36GB drive which would be an inexcusable performance hit. And that would leave no money for any kind of mirroring.
You are right. Off by 2-3 times. Was thinking of an external IDE RAID to SCSI box. Works fast enough. Is cheap enough. If necessary mirror two or more at RAID0.
Baker's Law: Misery no longer loves company. Nowadays it insists on it
http://www.sigsegv.cx/
1. In case you have not noticed, Oracle legal has gone around every single site that had Oracle vs X benchmarks (X=MySQL, Sybase, Informix) and made them drop them. This is actually possible under the 8.0x EULA. Actually, just read the EULA. It is a masterpiece in itself. You are not allowed to benchmark the product and not allowed to question the fact that it is fscking slow and not ANSI compliant. Besides, if I were you I would not buy something where the manufacturer intentionally disallows fair comparison with other products. That is enough to say fsck this, at least for me...
2. The original database is on Sybase. Sybase is at least more or less syntactically ANSI SQL compliant. Oracle is as far from ANSI as it gets. It is a good guess that it will take you ages to port the bloody thing. And porting it will be more expensive than the "expensive" hardware.
3. I would see if the database design is implementable under PostgreSQL or MySQL on an Alpha. Alpha is cheap. A reasonably good Alpha is under $5000. Storage will be $1000 more. This is as much as an appropriate x86 box. Postgres does not have a 2GB database limit anyway, as it splits database files. MySQL does not have this limit on Alpha because the platform is 64-bit. Your problems are the key limitation/LOB interface for Postgres and transactions for MySQL.
4. If neither of the solutions in 3 is implementable, you have to open wide your wallet and buy Informix for Intel or DB2 for Intel. Both of them work and are ANSI compliant. BTW, DB2 for Intel Linux developer edition is free. Free, period. No expiration. So you can actually see if the database will work. And they match Oracle on some benchmarks, and DB2 beats the crap out of it when it comes to real scalability and clustering.
Baker's Law: Misery no longer loves company. Nowadays it insists on it
http://www.sigsegv.cx/
Oracle on Linux may be the best choice. DO NOT, I repeat, DO NOT skimp on hardware. Some of the Linux platforms advertised in Linux Journal would be your best bet for industrial-strength hardware.
Much Success!!!!
lcase - @home in cyberspace
will do your mirror asynchronously, with no impact
on performance. Will your obese monster of a
database do it? You can download from www.kx.com
I would skip the filesystem layer and run off of partitions (or /dev/md devices?) directly. Although I've only had 10GB at a time going under it, and on actual partition devices, not RAID devices (I am resource-underprivileged, unfortunately), Adabas/D handles this very well. I'm sure other commercial databases would as well, if they are truly supporting Linux rather than just paying it lip service =)
I see absolutely no issues with pushing Adabas/D farther than I have; it has not had any issues with it whatsoever. Of course, if you used files on ext2 or reiserfs or whatever, you would have unnecessary slowdown and potential instability, so use disk partitions.
--
Paranoid
Bwaahahahahaa.
But why ever would you replicate a database to a different kind of server? If the original database runs on Sybase SQL on whatever, then the obvious answer is to replicate it to an identical setup. Anything else, whether mission critical or not, is just going to be a lot more work, training, and maintenance.
Is Unix capable of handling a database of this size and what other terrible pitfalls do you foresee?
MacOS and Filemaker.
Honestly, though, of course "Unix" can handle a database this size... it all depends on what hardware your "unix" is running on. Obviously Linux or *BSD on a 386, 486, or Pentium system won't cut it, but if you up your ante to a dual P-II or P-III system, or even a Quad P-II Xeon system (which should be relatively "cheap" compared to offerings from Sun and Compaq), you'll be well on your way...
In an interview with Linus himself he stated that he has an 8 CPU Intel....
Linux 2.4 will take care of the 2GB limitation.
This question seems to be asking whether Unix can handle a 30GB database? Should this be asking if Linux can handle it, or any general Unix? I would guess that the answer to either of these questions is yes. I certainly know that Solaris is more than capable, but can't see why Linux wouldn't (hey, even Sybase under NT can handle it, although not as well as Solaris).
As far as the 2GB file system limit, this is something that is easy to get around. Up until Solaris 7, when Solaris became 64 bit and supported files/partitions > 2GB, all you needed to do was to create multiple Sybase database devices and span the database across them.
Does anyone actually know what the license cost for Oracle on Linux is? If this is a data warehouse application then I'm assuming that you can't use one of the low-end single-user licenses.
Whenever I have looked at Oracle the software costs have dwarfed the hardware costs.
Who cares about a file size limit? Oracle databases are designed to span multiple files. Do you really think that no Oracle/Linux system goes over 2GB? LOL
Dissenter
"There is no knowledge that is not power."
True :)
Above that... Linux is really a Unix clone. (People who call it a "work-alike" are just being cute... it's a clone, that's all.)
Sun entered the market and really got its fame with SunOS (a BSD-based Unix clone).
Linux is usually called a *nix, not a Unix, because it is not licensed from AT&T or SCO (or anyone else who held the trademark).
It should be noted that BSD and Solaris are Unix forks. The BSD dev group and Sun must maintain compatibility by relying on documented standards, just like Linux.
[Noted because some Unix people who dislike Linux will attack Linux because it is built on standards, not on the actual code. The idea being that Solaris and BSD are the same code and by default compatible. This is false for the above reason. The below is just to extend the point, nothing more.]
Solaris and BSD are forked from different code. BSD is from the original AT&T code later known as SysV. Solaris is from a total rewrite in the 1990s known as SrV. SysV and SrV are not compatible.
So in reality Linux, BSD and Solaris are three totally unique (and mutually compatible) operating systems, Linux being the only one of the three with no license to the name Unix.
Over time many Unix clones were incorrectly called Unix. However, this fact was less than noticeable, as forks and clones had no standards to follow and ended up pretty much being mutually incompatible.
Linux was built on POSIX, the first effort to correct this issue.
On a side note... Linux disoriented me because I learned Unix on an AT&T 3B2/300. But Linux didn't throw me much.
One gripe people have about Linux is that it is possible to write Linux-only code that does not work on BSD or Solaris.
While true, it is equally possible to write BSD- or Solaris-only code.
It is up to the programmer to maintain portability. Failing that, it does not matter what operating system the code was made on.
I don't actually exist.
BSD is licensed from AT&T and thus is a Unix.
:)
Small issue
I believe BSD was never certified... It simply is by age alone
I don't actually exist.
So you don't consider field names to be critical data?
Critical to the correct behaviour of the Oracle application? Probably.
Critical to the stability of the server OS? Absolutely not.
How much is your data worth? Will your cheap box have all the servery things? Here are just some:
- RAID
- Hardware monitoring
- Hardware redundancy
- ECC low latency RAM
- Over-engineered cooling
I bet when you add all of those, your x86 box will become much pricier.
As to the original question, can Linux handle a 30 GB database, my answer would be "Yes, but it will hurt". Ever try staging more than 2GB of data on ext2? Ever try moving more than 1GB of data on ext2 with less than a 4KB block size? It hurts!
My understanding is that Oracle can now use its own filesystem on Linux... I don't know very much about Oracle's proprietary FS, but my thinking is that it would make life easier. I dunno. Anyone else know?
My journal has hot
Really? Can you explain the procedure for rebuilding the kernel remotely for an NT machine?
People, this is a joke !
How can a joke (and this is not a funny one) be marked as "insightful" ???
-------------------------------------------- If you can read this, then you speak Portuguese. Obviously
It's hard to give sound advice without a little more info. Is this mirror going to serve as a backup server in case the main server is not available? How many users are we talking here? If there are only a few users, I can't think of anything against an x86-based server, other than maybe supporting a linux/insert_db_engine_here.
Message on our company Intranet:
"You have a sticker in your private area"
beauty is only a light switch away
Funny thing, could you refer me to a shop that can sell me an x86 based machine that will outperform a fully loaded RS/6000 S80 on large databases ?
Message on our company Intranet:
"You have a sticker in your private area"
beauty is only a light switch away
Oracle on NT a. '~crash by mistyping...' The answer to that, of course, is to not mistype mission critical data; you should be using scripts for bulk transfer anyway.
Who said anything about typing "mission critical data"? As I said originally, I was typing field names into a GUI. My point was not "I'm going to need to type field names all the time so it had better be robust". My point was "if something so simple can go so wrong, what ELSE is broken".
"As an NT engineer, I can do ANYTHING from my laptop, from ANYWHERE in the world. Using only MS tools and a few scripts I wrote in vbscript. I concede that sometimes it would be nice to have a 'true' terminal connection to the server, but you don't 'need' it."
Oracle provides no facility for remotely starting a "local" bulk load (that I could find, anyway). This means that you must be running locally to load from a local disk. On Linux this is easy: telnet. On NT this requires time and/or money (which is what "MS tools and scripts I wrote" translates to).
"Only 20mb a minute? bwahahahaha I can reload data into my NT, MS SQL server at over 150MB PER MINUTE."
Different hardware. I was using a simple desktop for benchmarking (to get comparisons, not absolute numbers). In any case I wasn't using bulk loading to restore--Oracle has an actual backup/restore mechanism that doesn't require reinsertion of data.
--
Give us our karma back! Punish Karma Whores through meta-mod!
Linux MAPI Server!
http://www.openone.com/software/MailOne/
(Exchange Migration HOWTO coming soon)
'~no local bulk load in Oracle'
RTFM. Need I really say more?
No, you need to read more. I said "no facility for remotely starting a local bulk load". If I am on machine A, I have no way to tell Oracle on machine B to bulk load a file directly from B's hard drive. If you wish to claim that is possible, you are going to have to provide a URL for proof.
"...600 mhz p3, 128 mb ram Dell inspiron 3800. Your desktop is probably about as powerful, yes?"
Nope. I finished testing a year ago (started 18 months ago) with a spare desktop. If I recall, it was a PII 300 with 64 MB. Also, totally unoptimized (i.e. no kernel tweaks, etc). Just a straight RedHat install with a straight Oracle install on top.
--
The title and summary say "Can Unix handle it?" while the "below the fold" area asks "Can Linux/Intel handle it?".
I'd say the answer to the first question is a resounding "duh!". The answer to the second is a resounding "probably".
I found Oracle on Linux to be quite usable and nice (except for lame non-readline-enabled interactive tools) and fairly fast. But there is something...incongruous about spending $2000 on hardware, $2000 on Oracle and then using a free OS (that you WILL have to tweak to optimize).
Other tidbits:
1) Do NOT, I repeat NOT NOT NOT use Oracle on NT. The (evaluation) version I tried sucked BIG TIME. The bulk loader didn't properly support all the file formats it was supposed to and I was able to repeatedly crash the box by mistyping field names into the table creator GUI. Add all the problems of NT (no real remote management, etc) and you have yourselves the makings of a nightmare.
2) Raw devices are for more than recovery. They also help in the speed department. If you are going to be loading 30+ GB of data multiple times (this is a backup, right?) you are going to want speed. IIRC, ~100MB took about 5 minutes to bulk load (raw, not insert) on Oracle for Linux. That's 25 hours of load time for 30 GB.
3) Can't you take the backups from your primary DB and load them as restores to the backup DB? That would save tons of time and effort (up front AND ongoing).
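The load-time figure in point 2 is plain rate arithmetic. A minimal sketch, assuming decimal units (1 GB = 1000 MB) and that the throughput measured on the small test load holds for the whole 30 GB; the function name is made up for illustration:

```python
def bulk_load_hours(db_size_gb, sample_mb, sample_minutes):
    """Estimate bulk-load time by scaling up a measured sample load.

    Assumes constant throughput and 1 GB = 1000 MB (decimal units).
    """
    mb_per_minute = sample_mb / sample_minutes        # 100 MB / 5 min = 20 MB/min
    total_minutes = (db_size_gb * 1000) / mb_per_minute
    return total_minutes / 60


if __name__ == "__main__":
    # Figures from the post: ~100 MB in ~5 minutes, 30 GB to load.
    print(f"{bulk_load_hours(30, 100, 5):.1f} hours")  # 25.0 hours
```

Real loads rarely scale this cleanly (indexes, logging and disk layout all change the rate), so treat the result as a rough planning number only.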
--
It's not really a platform problem. You might have to partition the DB into multiple files to get around the 2GB file size limit on Linux (I think Sybase can do that), but I doubt there would be any other real problem.
Sybase runs on Linux, of course, so there is no problem there.
I'd ask in the Sybase newsgroups about the biggest database they have seen on Linux - they have a good reputation for quick answers. (About the only good thing I have to say about Sybase, but still....)
I'd be surprised if there aren't quite a few Linux DB's bigger than 30GB anyway.
I looked at this sort of thing over a year ago, and previous posters are right, 30 GB is kinda puny. Here are the headaches I had that I hope you can avoid.
;0)
1. First and foremost, stack the box with as many SCSI adapters as you can. I/O quickly becomes a bottleneck on large DB systems. Also, if you're doing Linux, go with Linux's built-in RAID; I hear it's faster than the hardware RAID cards you can buy out there. That said, be sure to get more than one of some hot processors; you're going to be using a goodly portion of one of them to do your RAID.
2. A journaling filesystem would be good. I don't know of any available for Linux (except maybe XFS, what's the status of that?). You really, really don't want to fsck your RAID 5+1. (Yes, I said 5+1.)
3. Unless you have the funds to implement a slightly lower-performance box, expect to be developing on a separate instance on this same server. That means, worst case, another 30 GB of space for the new instance, which will also require a kernel recompile to get the shared memory and semaphore settings right. (You are using Oracle, aren't you?)
4. Better yet, get your requirements up front for number of instances, design the hardware for that number + 2, and tune the kernel appropriately. Whatever Oracle gives you for kernel parameters, multiply them by the number of instances.
5. Don't sweat the raw devices stuff. It's generally more trouble than it's worth. It makes backups harder, makes restores harder, and makes RAID harder. It's just not worth the headache.
6. Invest in a nice DLT library that is supported up front. Get your backup scheme in place, even if it's just your DBAs writing dump files nightly. A good DBA can restore from a dump in a few hours, AND they can restore a dump of production to your development database, making those refreshes from production a fairly painless task (and management/developers/DBAs *WILL* ask for refreshes from production).
7. DON'T consider RAID 5, unless it's 5+1. RAID 5 can be murder on DB performance, especially in a VLDB where you perform inserts (it's a little less bad on data warehouses). Think 1+0 or 0+1, and span the + across multiple controllers/disk arrays.
8. Don't skimp on your DBA. In reality most any competent SA can administer a DB *system*; sink any payroll money into a very good DBA. It will save you in downtime and calls to Oracle later. (You are using Oracle, aren't you?)
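The rule of thumb in point 4 (plan for the required number of instances plus two, then multiply the per-instance kernel values by that count) can be sketched as below. The parameter names are ordinary System V IPC limits, but the numbers are placeholders picked for illustration, not anything Oracle recommends, and the helper name is hypothetical:

```python
def scale_kernel_params(per_instance, planned_instances, headroom=2):
    """Scale per-instance kernel settings for several database instances.

    Follows the rule of thumb from the list above: size for the planned
    number of instances plus some headroom (2 by default), and multiply
    every per-instance value by that count.
    """
    instances = planned_instances + headroom
    return {name: value * instances for name, value in per_instance.items()}


if __name__ == "__main__":
    # Placeholder values, NOT vendor recommendations.
    per_instance = {"shmmni": 100, "semmns": 200}
    scaled = scale_kernel_params(per_instance, planned_instances=2)
    for name, value in scaled.items():
        print(f"{name} = {value}")
```

Not every limit scales linearly with instance count, so check each value against the database vendor's own install notes before rebuilding the kernel.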
g:wq
What if it is just turtles all the way down?
do the indices fit into main memory? not close - a good fit is needed. so you need at least 50% more memory than the indices to be on the safe side. (you wanna do joins, etc ...)
...
... this gets really complex ... ;-)
are there big tables, each one >memory/2 ? or are there 1000 small ones. (we talk about real mem here not virtual)
the rules are:
1) design
2) choose hardware and software on the details of 1.
sometimes a little redesign makes it possible to have more freedom on hard/software
(the 50% i mention above is a value from experience. the more flexibility you need, the larger the real memory needs to be. having indices in ram and 50% of the memory to work gives you a fair amount of flexibility. driven by the needs of the application the % can be 20% too, it depends on how often you create entries, what and how often you look up fields and what joins are necessary to do that)
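the memory rule above boils down to one multiplication. a minimal sketch, assuming you already know the total index size; the function name and the sample figures are invented for illustration:

```python
def min_real_memory_mb(index_size_mb, working_fraction=0.5):
    """Real (not virtual) memory suggested by the rule of thumb above.

    index_size_mb    -- combined size of the indices that should stay in RAM
    working_fraction -- extra share for joins and lookups; 0.5 is the
                        experience value from the post, 0.2 the low end
    """
    return index_size_mb * (1 + working_fraction)


if __name__ == "__main__":
    # Hypothetical 4000 MB of indices.
    print(min_real_memory_mb(4000))        # with 50% headroom
    print(min_real_memory_mb(4000, 0.2))   # with 20% headroom
```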
DB2 is a great product and DB2 Universal Database runs on AIX, HP-UX, Linux, NUMA-Q, OS/2, OS/390, OS/400, Solaris, Windows 2000, Windows 95 & Windows 98 and Windows NT. Check it out.
Unix systems handle the largest databases known to mankind
as we speak.
Databases managed by unix systems have been known to be in
the vicinity of around 2-6TB.
Your question seems to refer to Unix on x86 databases that
have that size.
Of course, running Unix on x86 systems usually boils
down to running Linux...
Linux is officially supported by Oracle, Informix and,
I think, even Sybase, although I'm not completely sure
about that.
Obviously running it on the same RDBMS would be easier
to accomplish, so you'd probably want Sybase to support Linux.
You'd also want RAID 5, preferably hardware which is supported
by Linux.
You'd probably want to use some sort of journaling file systems.
I myself have no problem trusting the beta versions of ReiserFS.
I've also run Oracle on them without any problem.
If you feel reluctant about using bleeding-edge kernel patches
for a production environment, I can only recommend that you use
SMALL ext2 partitions to avoid catastrophic FSCK times, and let
Oracle / RDBMS do its magic in managing a single 30GB database
over smaller files...
Oracle over NFS to a NetApp Filer would work fine on a Sun or such. But despite HJ's valiant efforts, Linux's NFS isn't there yet. Linux is getting there, but there are still real good reasons to go buy those Suns if you've got a big, mission-critical problem.
"The future's good and the present is nothing to sneeze at." - Roblimo's last
You can absolutely do this. Now, depending on which version of Sybase ASE you use, you may run into some dumb limitations. For instance, version 11.9.2 has a 2GB limit for the size of a device, so you have to partition your disks into 2GB slices, and distribute your database across multiple devices. I think they increased the size in ASE 12, but I haven't worked with that yet so I don't know what the limit is there.
Oh, go on, check out my job.
Unless, of course, you wanted us to go to that Digital Couple website.
Oh, also, you can write your Oracle redo logs to the Filer, even though they recommend against doing so to anything other than flat disk or RAID 0/1. Why? Because the Filer uses a journaling filesystem in NVRAM, so the writes happen as fast as the wire (GigE in our case) can run.
How long does it take to back up that local disk? For us, it takes about 2 seconds, and takes up almost NO STORAGE SPACE! The Filer has a feature called "snapshot" which is basically a copy-on-write filesystem. You tell it to snap and it comes back after a second or two. After that, you can always go back to that point in time and recover files on-line, without any sort of programmatic interface (just filesystem access). There is even an add-on package called snap restore that will instantly restore the entire filesystem to that previous state. So, get this, our Oracle backup is: put all of our Oracle tablespaces in locked/suspended mode; tell the Filer to snap; unlock the tablespaces. Now if we ever need to restore, we just bring Oracle down, swap in the old data files, bring Oracle up. We can also do tape backups this way, as the Filer backup program uses snapshots. Thus, as soon as a backup is started, you can start writing to the data again safely!
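The three-step backup described above (suspend tablespaces, snapshot, resume) is mostly about ordering and about never leaving the tablespaces suspended. A minimal sketch of that ordering; the three callables are hypothetical hooks you would supply yourself, not real Oracle or Filer APIs:

```python
def snapshot_backup(begin_backup, take_snapshot, end_backup):
    """Run a snapshot-style backup in the order described above.

    All three arguments are caller-supplied callables (hypothetical hooks):
    begin_backup  -- put the tablespaces in locked/suspended mode
    take_snapshot -- ask the storage system to take its snapshot
    end_backup    -- return the tablespaces to normal operation
    """
    begin_backup()
    try:
        take_snapshot()
    finally:
        # Always release the tablespaces, even if the snapshot call fails.
        end_backup()


if __name__ == "__main__":
    steps = []
    snapshot_backup(
        lambda: steps.append("tablespaces -> backup mode"),
        lambda: steps.append("filer snapshot"),
        lambda: steps.append("tablespaces -> normal"),
    )
    print("\n".join(steps))
```

The try/finally is the design point: a failed snapshot should cost you a backup, not leave the database stuck in backup mode.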
I run a database of this size, and it's not a challenge. Cost is very high, but that's mostly because a database of that size is one that you cannot afford to have to restore.
I currently use a Sun architecture, but I know of sites that use Intel/Linux, HP PA/RISC and even (may all the little gods help you) Intel/MS/SQL server, which does have its place in non-mission-critical places where you're never going to have a good DBA.
I can seriously recommend the Network Appliance Filer for back-end storage. Their claim that their network-attached storage array is faster than local disk sounds silly on the face of it, but there are good and valid reasons that it's true (mostly due to their journaling and caching strategy which is highly optimized for NFS). The Filer makes databases a lot easier to manage. For example, the Filer can make an online backup in less than 5 seconds, no matter how much data you have!
Back to your original point: 30GB is small, don't sweat it. But, don't cut corners either!
Heck, even a 'low-end' F80 (1-6 500MHz copper Power-III CPUs with up to 16GB of RAM) would be able to take on PC hardware...
"It's tough to be bilingual when you get hit in the head."
No need to even think about this one - on HP/UX at least Oracle can handle db's in the TERABYTE range. Oracle is usually configured to use raw partitions, so you just add more table space by creating more partitions until your disk is gone... and then you add more disk!
MySQL is not a real database.... it's a filesystem on drugs. Don't get me wrong, I use it! It has its place and a 30 gig database is NOT it.
Is that 2 GB limit a 32-bit limitation or is that limitation also present on 64-bit machines?
Got HTML? Want LaTeX? Try html2latex
The question of course is what industrial strength databases are supported on FreeBSD? I don't think Oracle, Sybase, or DB2 are. That just leaves InterBase, and I don't really hear too much about InterBase.
-- Superlame http://catpro.dragonfire.net/joshua/
30Gb is pretty modest for a database; you can get home PC hard drives double that size. A big database is measured in Terabytes.
:-)
A wee Pentium server with one of those little hot plug SCSI trays is fine - just be sure not to use RAID-5 on your log drives
Run Oracle on Linux or even En-Tee - beware that on Linux there is a 2Gb per file limit which may constrain your layout.
yes, you read it right....
We run an Oracle DB on Sun equipment (the DB is on 1, uno, one, singular, machine). Yes, it's a Sun enterprise-level 64-way with 40GB of RAM, but our DB is over 2 terabytes.
So to answer your question...yeah, 30GB, no problem. When we have ext3, you could even do it on linux :)
I would wait until the next version of the kernel came out unless you're planning on using raw file devices. Linux kernel 2.2 and below have a 2 GB filesize limit which will be removed in kernel 2.4.
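For anyone wondering where the 2 GB number comes from: it is what you get when file offsets are kept in a signed 32-bit integer, as on 32-bit platforms without large-file support. A quick sketch of the arithmetic only (the helper name is made up, and this says nothing about the limits of any particular filesystem):

```python
def max_file_bytes(offset_bits):
    """Largest byte offset a signed integer of the given width can hold."""
    return 2 ** (offset_bits - 1) - 1


if __name__ == "__main__":
    limit32 = max_file_bytes(32)
    print(limit32)                        # 2147483647 bytes, just under 2 GiB
    print(f"{limit32 / 2**30:.2f} GiB")
    print(max_file_bytes(64) > 30 * 2**30)  # 64-bit offsets cover 30 GB easily
```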
I work for France Telecom, as the SysAdmin for Voila.com
We use Linux exclusively on our servers. (Well, except for one lil box running NT to interface with Reuters, because they refuse to make their proprietary client for Linux)
Our current database is around 4 Terabytes. It sits on about 80 servers all running Linux.
Admittedly, we use a custom database package, developed in house, and not an RDBMS, but when you're dealing with such a specific dataset (we index web pages... that's it...) you don't need the flexibility of Sybase.
Then there's Google... How many thousands of Linux boxes are they running? How huge is their database?
So yes, Linux is more than capable of handling a puny 30 gig database. Heck, I have more than 30gigs of data indexed on my HOME machine. (30gigs of MP3's all indexed and cataloged with Postgres) not quite the same as a "30 gig database" but similar.
With the cost of the RAID cabinets being equal, RAID-5 requires half as many disks as RAID-10, plus one. Eight 9GB SCSI drives at RAID-10 will yield 36GB of storage striped across four mirrored pairs of drives; to get 36GB of RAID-5 storage, you need five 9GB disks, with each stripe's parity information alternating among the drives.
A good RAID-10 setup will be able to read different data from each mirrored drive simultaneously, creating a potential 100% read performance advantage over RAID-5 or simple striping--200% for three-drive mirroring if you're that rich. Realistically, though, it comes out to a lower number whose upper limit is defined by the SCSI channel's throughput, and insert-your-bus-architecture-here's bandwidth, and your computer's general ability to keep its shit together.
Probably the best advantage RAID-10 has is that you will probably put each RAID-0 on its own controller, which means that in addition to being able to survive a drive failure, you could live through a controller failure as well. Redundancy is your friend.
Okay enough rambling. This was supposed to be a simple clarification that said "RAID-5 costs less than RAID-10, not the other way around."
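The disk counts behind that clarification are easy to check. A minimal sketch, assuming equal-sized disks, two-way mirroring for RAID-10 and a single parity disk's worth of overhead for RAID-5; the function names are invented for illustration and minimum array sizes are ignored:

```python
import math


def raid10_disks(usable_gb, disk_gb):
    """Disks needed for RAID-10: every data disk has one mirror."""
    return 2 * math.ceil(usable_gb / disk_gb)


def raid5_disks(usable_gb, disk_gb):
    """Disks needed for RAID-5: data disks plus one disk's worth of parity."""
    return math.ceil(usable_gb / disk_gb) + 1


if __name__ == "__main__":
    # The 36GB example from the post, built from 9GB drives.
    print(raid10_disks(36, 9))  # 8 (four mirrored pairs)
    print(raid5_disks(36, 9))   # 5
```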
--
This is not my sandwich.
That's not what "not mission critical" means. Not mission critical means that it is okay if you have to do a reboot and the server is down for 10 minutes. It means that you don't have to have a redundant cluster to make sure that if one goes down you don't need to take the database offline. Losing an entire 30GB database and having to reload it is unacceptable under any circumstances. Treating it as mission critical usually means cost takes a back seat to having 100% uptime. I don't think that's what he needs.
A deep unwavering belief is a sure sign you're missing something...
Of course unix can handle it - a LOT of people do this sort of thing on Solaris/sparc, for example.
I expect Linux would be able to handle it too, but don't expect the same throughput per MIPS from Linux/x86 as you'd get from Linux/sparc or Linux/alpha. Intel and AMD have great CPU performance for the price, but they aren't that much of a server architecture.
You really need to talk to EMC.
They have a high performance disk storage array called Symmetrix, which is pretty cool in its own right. However, what makes it REALLY REALLY cool is that they sell it with a software package called Symmetrix Remote Data Facility (SRDF). SRDF allows you to copy/mirror data to an offsite Symmetrix array that can be located anywhere in the world! This is the software that all the large companies use to provide their "disaster recovery" site at another geographical location.
Oracle can use raw partitions, it doesn't have to. Last I heard, the use of raw devices wasn't recommended under Linux...I'll check the release notes again.
--jb
Just FYI: a 30GB database doesn't imply one file. I have a 10GB+ Oracle on Linux database right now; Oracle organizes data into tablespaces which contain one or more data files. The data files can be spread over any number of partitions; in fact for performance it's better to spread them over multiple disks.
Now what I'm doing isn't mission critical, so I can't comment on that aspect of it, but I will say this: a 30GB database will certainly require more than 1GB of memory.
--jb
With respect to journalling filesystems...
At my site, we've been using ext3 on production NFS servers for almost a month now, with no trouble in terms of stability. Disk I/O has suffered a big performance hit (less than 1/2 previous performance), but for the better filesystem reliability, it's worth it.
The 2GB file size limit is your biggest problem. If you're going to go with Linux for sure, look into DEC- I mean, Compaq- Alpha hardware. It's a 64-bit architecture, so that limit shouldn't exist there. I haven't ever actually used Linux on Alpha though, so I cannot guarantee that.
As far as the software RAID bit...
The ext3 patch is against kernel 2.2.17-pre9, so we're sticking with that for now. No development kernels for us, here. Once Stephen Tweedie's ported it to a moderately stable 2.4-test, I'll look at giving that a shot. From what I understand NFS performance has increased significantly there. Don't know about RAID. Don't know about databases, either, but if that will put a moderate load on your computer, you ought to look at hardware RAID in any case - you'll get better performance by far if you take the RAID load off the main CPU.
-Matthead
Not a flame, I'm serious:
Then please, tell us how it is. I understand that high availability and outright performance is probably going to be more a concern than cost.
I don't know much at all about working with databases of any significance. Is the data stored in separate files, or when he says a 30GB database, does that mean a 30GB file?
How much is in RAM? If the database server were to crash, would all the changes made be sync'ed to disk? Would you have lost data when you do get the computer back up? How does this depend on the platform the database runs on?
What other concerns are there? Can you give us a "fscking clue?"
-Matthead
"4. If neither of the solutions in 3 is implementable, you have to open wide your wallet and buy Informix for Intel or DB2 for Intel. Both of them work and are ANSI compliant. Btw, DB2 for Intel Linux developer edition is free. Free, period. No expiration. So you can actually see if the database will work. And they match Oracle on some benchmarks, and DB2 beats the crap out of it when it comes to real scalability and clustering."
Ah slight modification - Personal Developer's Edition is free. This lets you develop when you want to deploy on Personal Edition. DB2 Universal Developer's Edition is $499 (currently on sale - normally $999) not a ton of money but the point needs to be made.
Somebody please tell me that this is complete bullshit. Firstly, I can't even fathom a company being this arrogant about its own product. I thought that Microsoft's "per seat or per head, whichever is greater" Client Access Licensing was absurd. But more importantly, I can't believe that people would actually buy into a license like that.
Have software vendors stooped that low? (Well, I guess they have if MS wants us to "rent" its software for a monthly fee in the near future...)
I am a consultant - programming, sysadmin,
I know of several large web sites built entirely with PC hardware. (Walk around above.net in SJ and most cages don't have any SUN, SGI,
Largest is 2M unique visitors per month, 20+M pages per month. 30GB database.
Hardware is dual 450MHz Pentium IIs, 2GB ECC DRAM,
Mylex external RAID controllers for 2 chassis of
9GB IBM SCSI disks.
Software is Solaris/Oracle. Runs in 'recovery mode' (I am not a DBA) with log files copied to
another system between DB backups.
Uptime is good. Main problems in the last 2 years have been Mylex controllers and a failed
system disk in the PC chassis. Solaris provides
software mirroring to avoid this kind of problem
next time.
Disk I/O is the bottle-neck. More DRAM for caching is first improvement to be done, followed by next generation of RAID controllers with lots more cache, followed by more disks/heads.
Lew
"The Constitution, the WHOLE Constitution, and nothing but the CONSTITUTION."
In the recent past, I've worked on Sybase databases that were in the hundreds of gigabytes on unix.
Currently, I work on small databases in the 8 to 20 GB range.
I've got a dual processor box at home with 512 meg of memory running Sybase on Linux and I've got a couple of 10+ GB databases loaded there.
So, don't sweat the small stuff.
my advice, get as much memory as you can afford/use. RDBMSes love memory!
---
Interested in the Colorado Lottery?
Interested in the Colorado Lottery or Powerball games?
check out http://colotto.com
How can the UNIX offer be 5 times more expensive?!?
I work in a large bank, on a 80GB datawarehouse (mirrored, so 160GB diskspace). An internal competitor uses NT (Compaq) with SQL server for a similar (but smaller) application, we use Solaris/Sparc with Oracle.
We are constantly being judged on cost/performance by others. Recent comparisons showed that an Intel solution (Compaq) would be 30% cheaper. In return you get CPUs with smaller cache, generally less reliability, and it is questionable if our app could run at all on Compaq.
Note that the 30% difference only accounts for the hardware cost Sun/Sparc vs. Compaq/Intel.
Taking into account the OS cost (NT versus Solaris) it is sure that NT would become much more expensive, since Solaris is included with the hardware, and NT licenses for such large applications are extremely expensive. Not to mention the extra system administration costs that NT would cause.
As for Linux/Intel? I would not do it. As mentioned you can gain maybe 30% on HW cost, but for that you can be sure that Linux cannot handle load and scale like Solaris/Sparc can.
There really shouldn't be a need for a journaling file system since the sql server basically does this already through the transaction log.
That may be so, but after a FS crash a 30GB EXT2 fsck will take what seems like forever.
Actually, time seems to stop after a crash as you s**t yourself worrying about a successful recovery and catching it from your manager and everyone else relying on that DB.
The quicker it comes back, the better you'll feel. A JFS will help very much. But in this case, since it is used as a backup DB, EXT2 will be fine.
"The area of penetration will no doubt be sensitive." ~ Spock
It runs on NCR's 5200 system, which is based on Intel architecture. It scales up to 512 nodes, with 1-4 Intel processors per node.
The operating system is NCR's MP-RAS (a flavor of UNIX that runs on Intel architecture). I'm not sure if it runs Linux ;-)
*disclaimer* I _do_ work for NCR, but I just thought this was some neat information. I don't work in our data warehousing department. The system above would cost many millions of dollars, so it's out of the range of the average /. reader, and if you are going to spend that money on a data warehouse, chances are you are talking with NCR anyway.
Sybase uses a concept called "Devices". Sybase supports up to 256 devices per server. If each "device" was really a 2gig disk or a 2gig file, then that would be 2gig * 250+ (you have to subtract for certain system devices that are already used) of available space. To support a 30-40gig DB, you would only need about 15-20 devices (or 2 gig files). We have done this on Linux already and it is quite easy. Of course, all the usual stuff about administration is in effect (i.e., backups, dbcc's, etc.), but with Sybase it's extremely easy for a single admin to administer many DBs (and I won't even get going on the database "Holy War" between Sybase and the other BIG relational DB companies).
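The device arithmetic is worth a quick check. A minimal sketch, taking the 2 GB device size and the 256-device figure straight from the post above; the function names are invented, and the handful of system devices you would have to subtract is not modelled:

```python
import math


def devices_needed(db_gb, device_gb=2):
    """Number of fixed-size devices needed to hold a database of db_gb."""
    return math.ceil(db_gb / device_gb)


def fits_on_server(db_gb, device_gb=2, max_devices=256):
    """True if the database fits within the per-server device count.

    max_devices is the figure quoted in the post; system devices that
    are already in use are not subtracted here.
    """
    return devices_needed(db_gb, device_gb) <= max_devices


if __name__ == "__main__":
    for size_gb in (30, 40):
        print(size_gb, "GB ->", devices_needed(size_gb), "devices")  # 15 and 20
    print(fits_on_server(40))  # True
```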
-- Even racing cars don't crash as much as windows. --
To any OSS/Free Software advocates: trying to do this on MySQL is a Bad Idea.
Oh, and as a sidenote, 30GB is a Very Small database: I've had SQL Servers with terabyte-sized databases.
--
Cheers
Jon
Here's a nickel - go and get yourself a clue.
--
Cheers
Jon
Now, if you want proper two-way transactional replication (multiple publisher/subscriber model) then that's gonna cost you. And it's also a bitch to keep running on anything less than a dedicated cross-over cable between two fast NICs (been there, got the t-shirt AND the ulcers AND the hair-loss)
--
Cheers
Jon
knowing what you're talking about and
being aware that Oracle is an RDBMS and is available on Linux, and hence feeling the urge to say something - anything - however irrelevant.
--
Cheers
Jon
Once you've got a big data puddle on your new server, you're going to have to recreate all the TSQL stored procedures as packages. You're quite possibly going to have to rewrite significant amounts of either the clients (if it's a C/S system) or the middle tier (tiers?). You may need to roll out new data-access libraries across all your clients (not an undertaking to be dismissed lightly on anything but the smallest of LANs). You're going to find that unless the whole thing's been put together without using a single vendor-proprietary extension to ANSI SQL (probably the 89 version, 'cos SQL 92 support isn't ubiquitous) you're SOL.
And, finally, once you've done all that you're going to find that performance optimisations which worked well on one platform turn your database into a dead dog on a second (for an example, compare the performance of Informix and MS SQL Server on cursors: Informix screams, MS SQL Server runs like a geriatric full of Largactil) - this is the big problem with point releases: they tend to break your carefully-honed performance optimisations, which is why you run them on a testbed first, then roll them out only when you've worked the kinks out about six months after release. Remember kids, release-early-release-often doesn't work in the world of databases - no DBA worth his or her salt wraps anything even vaguely unstable around his or her data (or at least if they do, they'll be looking for a new job immediately afterwards if I've got anything to do with it).
--
Cheers
Jon
I've had 1.8 TB of data in a DSS database on MS SQL Server: this was indexed up the wazoo and loaded in batch from the sister OLTP system every night (and the load process was deeply fun...). The devices all lived away out on the (rather big) SAN and the backup hardware was a sight to behold...
Obviously my experience is no match for your opinion <g> - oh, and the day Oracle open their source is the day I see pigs flying. C'mon, this is Larry Ellison we're talking about.
--
Cheers
Jon
I've heard several stories about Microsoft crashing when serving more than 30 clients on a Microsoft Access 97 database
Now, what I was discussing was TB-sized SQL Server databases (both MS and Sybase, BTW). In this context Access was a red herring. And FYI, I've had 400 - 500 users backended onto a MS SQL Server database from all across Europe (Smalltalk client, SQL Server 6.5, NT 4 Server) connecting over everything from 64KB frame relay upwards. The problems we had were exclusively with the (extremely badly-written) clients: the servers were stable.
Like I said in my previous post, go away and find out about what you're talking about. I'm posting from experience of running big SQL Server installations, you're posting your (unfounded) opinions. You've worked with big databases, you say: well, so've I (mail me privately for the full list if you're interested) and since you're so DBaware, you'll also be perfectly aware that what's true of one big database isn't true of another.
Oh, and try and relax a bit.
--
Cheers
Jon
Changing RDBMSs is a Really Painful Experience and one to be avoided at all costs if possible: it makes changing OSes look trivial (hell, even upgrading from one point release to the next can be a world of pain). If the data's already on Sybase then for god's sake keep it on Sybase. Go for Sybase on Linux, Sybase on SCO, Sybase on NT or whatever but remember: it's a RDBMS and the underlying platform is effectively irrelevant (pauses for flames as thousands of enraged Slashdotters start to spout off and steam at the ears)
--
Cheers
Jon
Currently I have a 58GB Oracle database running on a Sun SPARC E3500. The drives sit in an external Sun A1000 storage unit and they are configured in a mirrored RAID 5 array. Our volume is currently capable of holding up to around 85GB.
During peak load of 425 concurrent users all slamming the system at once I am only using about 35% of the system's resources. This server has been up since birth with no downtime for 8 months now. A properly configured Unix server is more than capable of handling your data size and workload for SQL databases.
"Help me Obi-/.-Kenobi,your my only hope!" -$
http://www.mysql.com
tcx claims to be running some giga-huge db on their linux based computer and having never had a problem.....
I dun'know....
Don't you think it's time to start communicating?
I'm not going to argue with you about using an open source DB for any given application -- that's a waste of bandwidth.
However, Chapter 15 of the MySQL manual explains how to add new functions and procedures....
Don't you think it's time to start communicating?
Seti was using postgres for data storage. 50+ terabytes of data.
If it was said on slashdot, it MUST be true!
without even straining the box. $15k for all the h/w and installation, setup, etc. Performance is pretty good, too. FWIW, we tried this using red hat first, but it wouldn't deal with the i/o very well. Still, the important thing is that the project stayed in the open source community.
It was Judge Woodlock, in the US District Court for Massachusetts, with a gavel.
I would also recommend that you adjust the maximal mount count parameter on each of the partitions so that they are staggered. That is, so that the maximal mount count is not reached at the same time on all of the large partitions. (I think it's tune2fs)
When you create your partitions you might also want to adjust the number of possible inodes.
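A minimal sketch of the staggering idea, assuming ext2 partitions and the `-c` (max mount count) option of tune2fs. The device names, base count and step are made up for illustration, and the script only prints the commands rather than running them.

```python
def staggered_tune2fs_commands(devices, base_count=20, step=3):
    """Return one 'tune2fs -c N device' command per device, with N staggered.

    Each partition gets a different maximal mount count, so the forced
    fsck does not hit every large partition on the same reboot.
    """
    return [f"tune2fs -c {base_count + i * step} {dev}"
            for i, dev in enumerate(devices)]


# Hypothetical partitions, for illustration only.
for cmd in staggered_tune2fs_commands(["/dev/sdb1", "/dev/sdc1", "/dev/sdd1"]):
    print(cmd)
```

Review the printed commands before running any of them as root.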
30G is nothing. There are hundreds of thousands of SAP installations on Unix with 300G (yes three zeros) and more. You know what DB they run. Oracle.
SQL 7.0 is based off of the single user PC environment and has been scaled up to the enterprise server. I am not saying it is not a good DB. It has not outperformed Oracle yet. That is a fact.
Oracle was designed for enterprise DB and scaled down to PC servers.
If anyone is interested in the fastest platform for enterprise ERP go to http://sap.com I was surprised to find it is Oracle on Linux. It beat out Oracle on HPUX, Oracle on NT and SQL 7 on NT.
Use DB2. My company's system is 35TB running DB2 on AIX. I realize it's not Linux/Intel, but the same truly industrial strength DBMS is available for Linux, from an early proponent of Linux (IBM).
We have found that a high-spec x86 machine can be built a lot cheaper, and in many cases will outperform the large brand-name servers *cough*RS6000*cough*. A 4-unit-high rack case will easily accommodate a good quality server motherboard (Intel or maybe Tyan) and will allow for a DUAL PIII configuration and, depending on your board, between one and two gig of RAM. My db program of choice would be mySQL.
For those people, a lot of *NIX's are available for the x86 platform:
Linux may be the most publicized of the x86 *NIXes, but there are others. In fact, I would recommend that the DB mentioned in this question be run on a BSD. If FreeBSD can handle Hotmail, it can handle almost anything IMHO.
Dave
BTW: Before I get flamed, the Hotmail/FreeBSD thing I remember from somewhere, but I can't remember where. I do know it's NOT on an NT box, which basically leaves UNIX.
DOS is dead, and no one cares...
If there's a Bourne Shell, I'll see you there
I have run 30GB Oracle databases on 8.0.5 using RH 6.1.
The big problem here is the SMP performance. Generally when one is using a database of that size one wishes to take advantage of parallel processing, or at least use SMP to support a certain number of users.
I believe any performance problems you encounter will be in that area. Sun, HP, IBM et al just kick Linux' butt in SMP performance, and when you are supporting large queries and/or large numbers of users, all the memory in the world will only carry you to a certain point - beyond that you need fast, robust SMP support.
Other than that, you can certainly set up an Intel-based system to handle a 30GB database, be it Oracle, Sybase, Postgres or whatever.
hth -
Regards,
jh
1 - ability to run a 'big' database on linux
2 - replication of said database
IMHO, issue 1 is the stated problem, yet issue 2 is sitting quietly in the background, impacting the actual engine selection. No one seems to be saying outright that they need an engine that can mirror a large database quickly, but they are taking this fact into account and it is greatly impacting the decision. As we move closer to a more distributed, three-tier computing paradigm, this will become an even greater issue. My suggestion is the following:
1 - Look for a database that performs well on unix. Period. This includes DB2, Sybase, Oracle, or open source (mySQL, postgres, etc...)
2 - Find a good replication engine that is able to directly access the database.
UNIX is all about a modular approach to solving problems. However, people seem to be taking an all-or-nothing stance on this database problem. I'd recommend PeerDirect as a good replication engine. www.peerdirect.com
I agree with all the people saying Linux/Intel can do the job! Sure it can, but running Linux in a corp. demands a high level of Linux knowledge in-house. My experience is that any unix system can/will fail at some point if the experience to administer such a system is not available in-house. Do not depend on external support for such things. If this corp. is of a smaller size and has a small system admin group, go for the easy solution (but not the best!): Win2K and Oracle/DB2/Sybase/Informix. In a perfect world everyone runs unix! (Please note that I didn't write Linux ;-) though I am a Linux advocate.) Personal favorite prof. unix btw: Solaris
Never trust a windows system manager
Huh, the last I had heard they had converted the entire front end from BSD to NT, but it cost them having to double the number of servers needed. Evidently even with twice the hardware they still couldn't get it to work right under NT, since (as pointed out) Netcraft shows them running BSD. BUT, regardless of what Hotmail's front end is running (the front end = the part the user sees as he/she logs in), the real work of processing and storing the email is done under a major *NIX. In this case Hotmail uses SUNs, though other large installations such as Netscape and AOL use SGIs. I'm not putting down BSD or LINUX, but currently machines from Sun, SGI or the other big players handle these truly huge loads much better than BSD or Linux.
And add more disk controllers. If you are serious about it, check out Mylex's line of SCSI RAID controllers.
And add more RAM.
And add another CPU; preferably 4 of them running Oracle EE which has the parallel query option.
Then of course you will have spent more on your Oracle license than on your hardware, with Oracle's new per-CPU-MHz licensing racket.
Most DBA books assume a smallish transaction oriented database, with advice like "use indexes to your tables to speed query times", which can be very bad advice in some situations.
Point is, there is no "rule" when it comes to databases, which must be tuned very differently depending on their size and the intended use.
OK, there is ONE rule - test your backups!
We did, with the data all nice and striped across controllers, and while we did NOT see the performance increase we expected with raw partitions, we saw something unexpected that causes us to use raw devices exclusively: CPU usage during the benchmark.
Raw partition benchmark CPU usage was a third of what the filesystem benchmark was, and on the filesystem just a table scan was consuming 60% of available CPU. We'll be doing more with the data than reading and discarding it like our benchmark. CPU usage on raw partitions was under 20% CPU utilization (give or take a few percent - it's been a couple of years now - hardware was a Digital Alpha 4100 with 4 533MHz EV5? processors and 4 Mylex DAC960 RAID controllers and Oracle 8.0.5)
We presume that getting rid of the extra layer of filesystem buffering gets rid of the excess CPU usage. Since the box we are on is not I/O bound when doing real work, and as stated elsewhere in this discussion "there is always a bottleneck", we found that we could get more work done per unit of time on raw partitions than on filesystem files. This was after having turned down the kernel filesystem buffer cache since Oracle does its own caching and doing all the normal tuning one would expect on a fresh database.
Just another $0.02 on the raw/filesystem debate... I'll admit that filesystem files are easier for the fresh DBA, but once you've taken the plunge and discovered their quirks, raw devices are no harder to manage than filesystem files, save for the fact that you have a finite number of disk slices without an LVM, but most Unices come with those anyhow.
Try joining two tables of over a million rows, with a large answer set, like this:
create table baz as select foo.*, bar.* from
foo, bar where foo.key = bar.key;
In Oracle, which is NOT the topic I realize, if you use an index with a query like this you are screwing yourself.
If you use a hash join (Oracle 8 && up), you will be blown away. Of course you have to use the cost based optimizer and analyze your tables first so it has some data to work with, and drop your indexes. I don't mind if you don't believe me; I had been a DBA for years before I believed indexes could be anything but great, but I am a convert now!
There are other things indexes kill performance on, namely inserts and deletes to a table with indexes on columns other than the ones in your where clause. It's often better to drop your indexes, do the update, and recreate the index instead of waiting on Oracle and wondering why it is taking so darned long.
TIA
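For anyone who hasn't met a hash join before, here is a toy in-memory sketch in Python of the general technique. It is not Oracle's implementation; the function name and the sample rows are made up for illustration. The point is that each table is read exactly once, instead of doing an index lookup per row.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Equi-join two lists of dicts on `key` using an in-memory hash table."""
    # Build phase: hash the smaller input once.
    build, probe = (left, right) if len(left) <= len(right) else (right, left)
    table = defaultdict(list)
    for row in build:
        table[row[key]].append(row)

    # Probe phase: a single sequential pass over the larger input.
    joined = []
    for row in probe:
        for match in table.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical stand-ins for the foo and bar tables in the query above.
foo = [{"key": 1, "a": "x"}, {"key": 2, "a": "y"}]
bar = [{"key": 1, "b": "p"}, {"key": 1, "b": "q"}, {"key": 3, "b": "r"}]
print(hash_join(foo, bar, "key"))
```

A real engine spills the hash table to disk when it doesn't fit in memory, which this sketch ignores.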
Like most everyone else, you are assuming all databases are OLTP systems. Data warehousing or data analysis on the other hand requires MASSIVE data transfer rates (mostly read activity), and RAID 5 with large stripe sizes and multiple arrays works really well for this type of database. Most queries against the roughly 3TB database I currently work on run in several minutes passing somewhere under 100GB of data each, and if we had used OLTP tactics (indexes to join everything, small block size for low latency reads, etc) to tune the database, they would run in days or hours instead of minutes. Aggregate I/O rates on this monster can exceed 500MBytes/second.
As to the original question, can Linux handle a 30 GB database, my answer would be "Yes, but it will hurt". Ever try staging more than 2GB of data on ext2? Ever try moving more than 1GB of data on ext2 with less than a 4KB block size? It hurts!
Someone please tell me that I will be able to use large files painlessly on Linux sometime. Until then, run large databases on name brand UNIX servers with name brand UNIX. Linux on x86 is good at a lot of things, but a large database isn't one of them YET.
SQL> select sum(bytes) from dba_data_files;
SUM(BYTES)
----------
2.9003E+12
And every byte is on RAID 5.
The question was formulated quite broadly: "Is Unix capable of handling a database of this size".
The illegal we do immediately. The unconstitutional takes a little longer.
--Henry Kissinger
Should easily be able to handle this. And considering the size, anything less would probably not be a good idea.
The illegal we do immediately. The unconstitutional takes a little longer.
--Henry Kissinger
he is probably not going to use ext2 anyway. oracle uses its own raw partitions/filesystem to store its data. this speeds up oracle and avoids 2gb ext2 filesize limit.
:)
so gnu/linux + x86 can be good choice
-- http://electronicintifada.net --
original BSD was UNIX, however nbsd/fbsd doesnt contain any code from last BSD release (4.4 ???? ) thus its not UNIX.
-- http://electronicintifada.net --
none of the free unices (gnu/linux, nbsd, fbsd, gnu/hurd, obsd) is "real" unix because none of them is certified as UNIX(r). a general misunderstanding among *bsd ppl is that *bsd is unix while linux is not.
btw i think (not sure) that there is some group working on certifying gnu/linux as UNIX(r). i believe it costs at least 10 000$ to get this certification....
-- http://electronicintifada.net --
having just come back from sybase dba class i can say that you cannot load a sybase *nix database on nt. you could bcp everything out to flat-file and load that, but that's your only unix -> nt option.
It certainly is within the capabilities of any modern SQL engine. The reason for the high cost is that a database of 30 gigs probably contains lots of critical data. Likely, your enterprise may need guaranteed uptime which includes good hardware, stable software, constant power, and the ability to upgrade or fix without downtime. Here are some of the advantages and disadvantages of Linux when it comes to databases of this size:
The current 2.2 kernel does not support raw disks, that is, the ability of the database engine to manage the disks directly rather than going through an OS filesystem layer, which gives added speed and reliability. I believe that the newer kernel will support raw disks but it may take a short while for the major vendors to support it.
Hot swap ability/redundancy - Lots of good stuff, some bad. Various clustering solutions are being developed that can work with large databases. Linux may be a little weak when it comes to support for hot swap drives (don't know the current state).
In any case, 30 gigs doesn't really say a whole lot about what sort of data you're storing. To be really optimal, you'll need to know how you will be accessing it, estimated number of hits, etc..
Look - 30GB database? Lets just look at the necessities first and then we'll get down to a choice of vendor (because you are going to want a reasonably heavy weight database server for this).
30GB of data. Okay - so you aren't mission critical. Even so, with that amount of data, you probably want a hot-swappable redundant system such as RAID if availability means anything to you. But these days you have lots of choices for RAID, including software RAID under Linux. I'd probably still go for a hardware solution for RAID, but that is because I'm not clued up on how robust and failure-proof the Linux RAID is when one of the disks dies. If you don't care about redundancy, 40GB drives are easily found. For performance reasons you might want to find four drives of say 15GB each so that random access to the drives can be done in near parallel, especially if you stripe the drives, but that is yet another option.
Accessing 30GB of data is going to require some reasonable memory space - think 512MB minimum and work up from there. Of course, you could run it on far far less (say 80MB) but you will pay a performance penalty - the database products I know about have plenty of tricks up their sleeves if they have spare memory to play with, and resort to paging out to disk when things get tight.
The choice of software is important too. I'll declare my biases up front and say go for DB2 Universal Database, partly because I work on it and I like it. Your other choices are Oracle, obviously, and there are a host of other database vendors out there for Unix systems across the board. DB2 UDB is easier to administer and looks to be faster than Oracle, as well as generally being cheaper to deploy. As far as functionality goes, everybody nowadays assures SQL92 conformance. SQL99 core conformance isn't too much to hoot about, as it's basically SQL92. The SQL99 spec is far more modular than the SQL92 spec, so it's easier to match the base functionality and then add on SQL99 conformance for, say, the multimedia extensions, later.
So the answer to your question is yes - it is possible to deploy a 30GB database on Unix. And it is definitely possible to deploy the same database on Linux - both IBM and Oracle have versions of their databases on Linux.
Cheers,
Toby Haynes
Anything I post is strictly my own thoughts and doesn't necessarily have anything to do with the opinions of IBM.
So of course, the answer (I think is), if you haven't done it before, and no one in your group/business has, and no one's sure if you can or not...you probably shouldn't. Or more accurately hire a consultant to do it; although there's a good chance that when you tell him/her that you want a 30GB, reliable database with good performance that they're going to tell you to go buy an E4500 with 4-8 CPU's and Oracle.
I don't know which is more pathetic...
1) that the dofus asking the question actually typed that, or
2) that the "editor" didn't actually "read" what he posted.
either way, these ask slashdot questions are getting really lame. Come on, guys, all it takes to raise your "standards" is to hit the "delete" button when you get these brain-dead questions.
(-1 Redundant)
--
http://gammatron.weblogger.com
Linux/IA32 probably not, at least under e2fs as you'll likely hit the 2Gb filesize limit, depending on how the database engine involved implements storage (Oracle using its own data partition in "raw iron" style?). Linux on other architectures, specifically the 64bit ones (Alpha, Sparc, Sledgehammer and IA64 before long) would probably be fine.
Vintage computer games and RPG books available. Email me if you're interested.
The poorest thing about MS SQL Server 7 is the fact that its admin console uses IE (the most unstable creature in the Universe.)
"Simple words such as 'better' or 'faster' are best used by simpletons. Life [...] is more complicated." - TMC
- avoid raid 5, go with plain old stripe/mirror. raid 5 is horribly slow for writes, and in DSS, you do a lot of disk writes as part of queries, because the queries build temp tables/segments transparently to do the large sorts/merges involved.
- get more than 1g ram if you can. Oracle will make good use of this, by increasing memory sort area sizes, and caching database blocks more intelligently than the filesystem (gives preference to index blocks basically) hopefully sybase too.
- the mirror system at a remote site can be accomplished by using redo logs/transaction logs. Restore the database from backups to a remote location. rdist, rcp or scp the transaction logs from the primary database to the mirror, and roll the database forward with each successive log. This is called a "standby database" in oracle parlance.
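The one thing that bites people with log shipping is applying logs out of order or across a gap. A rough sketch in Python of the ordering check, assuming each shipped log carries an integer sequence number; that numbering and the function name are illustrative assumptions, not a Sybase or Oracle interface.

```python
def next_logs_to_apply(last_applied, available):
    """Return the contiguous run of log sequence numbers after last_applied.

    Stops at the first gap, because rolling forward past a missing log
    would leave the standby inconsistent with the primary.
    """
    pending = sorted(n for n in set(available) if n > last_applied)
    run = []
    expected = last_applied + 1
    for n in pending:
        if n != expected:
            break  # gap: wait until the missing log has been copied over
        run.append(n)
        expected += 1
    return run


# Standby has applied up to log 41; logs 42, 43 and 45 have arrived so far.
print(next_logs_to_apply(41, [40, 42, 43, 45]))
```

In that example only logs 42 and 43 are safe to apply; 45 has to wait until 44 shows up.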
So you don't consider field names to be critical data?
I'll leave that one alone, as I'm not your manager...
'~ms tools translates into time and money'
Actually, all the tools I need and use are provided free with my operating system, gratis from MS. If you ever feel the need to double-check this astounding revelation, go to some productive person you know who runs NT workstation and ask them to display the Administrative Tools folder in the start menu.
Then, if you're feeling really punchy, download the free resource kit which is just chock full of goodies to manage an NT enterprise.
Watch out though, you may be required to fire up notepad and read a few text files to figure out what they do.
'~no local bulk load in Oracle'
RTFM. Need I really say more?
'~different hardware'
Well, I just reloaded to my laptop here at just over 110MB per minute. MS SQL 7.0 on Win2k, 600 mhz p3, 128 mb ram Dell inspiron 3800.
Your desktop is probably about as powerful, yes?
btw, I was running Outlook 2k, Word 2k, 8 internet explorer windows, my firewall monitoring/logging utility, 2 command prompts, an ssh shell, dns administrator, ws-ftp client, norton antivirus autoprotect agent, and SQL server desktop edition when I just ran my restore of a 1.2 GB database off my network server to my laptop...not even a local copy of the data.
EvilGrin
have a nice day
rm -f -r /* oops, did my typo just affect the stability of my server? EvilGrin
I have a few beefs with your post...
1. Oracle on NT
a. '~crash by mistyping...'
The answer to that, of course, is to not mistype mission critical data; you should be using scripts for bulk transfer anyway.
b. '~no remote management on NT'
Where the hell does this come from? Oh, I know. You have no clue how to manage an NT enterprise, you're just talking out your a$$. As an NT engineer, I can do ANYTHING from my laptop, from ANYWHERE in the world, using only MS tools and a few scripts I wrote in vbscript. I concede that sometimes it would be nice to have a 'true' terminal connection to the server, but you don't 'need' it.
2. '~Bulk import of Oracle for Linux.'
Only 20MB a minute? bwahahahaha I can reload data into my NT, MS SQL server at over 150MB PER MINUTE. So, after a 3.5 hour restore, I can go to the bar and get a few brews while you sweat away for almost ANOTHER ENTIRE DAY. Oh, and get this. I can buy RAID tape devices that are supported under NT, and restore at up to 600MB per minute. Anybody want to restore a 30GB db in less than an hour? I do.
3. '~use restores on backup db'
Why not use any one of many excellent database mirroring/synchronization products on the market? Set up both servers, specify replication partners, and don't worry about it anymore.
You may want to learn how to manage enterprise applications before spouting off on them. And before anyone flames me as a M$ booster or something, let me say that I do actually have and use *nix systems in my work. However, they do not run my enterprise messaging applications, databases, etc; they are for development processes because my company's clients require us to test on compliant systems. (I also use one of them for security / penetration testing. The tools developed for the *nix platform are better than on NT. However, I believe this is due to a loyal fanbase of long-time *nix users, and not because of any perceived flaws or inequalities of other OSes.)
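The restore-time claims in point 2 can be sanity-checked with a few lines of Python. This sketch assumes a 30 GB database, 1 GB = 1024 MB, and a constant transfer rate, which real restores rarely manage; the function name is made up for illustration.

```python
def restore_hours(db_size_gb, rate_mb_per_min):
    """Hours needed to move db_size_gb at rate_mb_per_min (1 GB = 1024 MB)."""
    return db_size_gb * 1024 / rate_mb_per_min / 60


# The three rates quoted in the comment above.
for label, rate in [("20 MB/min", 20), ("150 MB/min", 150), ("600 MB/min", 600)]:
    print(f"30 GB at {label}: {restore_hours(30, rate):.1f} hours")
```

That works out to roughly 25.6 hours at 20 MB/min, 3.4 hours at 150 MB/min, and a bit under an hour at 600 MB/min, so the numbers in the comment hang together.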
EvilGrin Fighting misinformation wherever it can be found.
It seems that anyone terribly interested in running a large database on a Linux platform may wish to wait for the 2.4 kernel to arrive, as it adds support for raw devices, file sizes over 2G, tons of additional ram... It generally scales better for this type of work. Check out this link for a listing of the new stuff.
What else are you thinking? NT? I suppose you could put a small database like this on NT, but if you have the option to use Linux instead, do it. Any flavor of Unix would be preferred, but Linux x86 will give you the most bang for the buck. As for Sybase... If you can afford Sybase, then you can certainly afford Oracle. Put Oracle 8i on Linux and enjoy. Why settle for the beetle when you can have the Cadillac?
Unfortunately the industry standard Adaptec cards are not supported by Linux. You are stuck with 3 year old UW SCSI hardware. Check what RAID cards Linux can use; there's only a handful.
Only the State obtains its revenue by coercion. - Murray Rothbard
Yes, yes it is. I believe that UNIX is the OS that the IRS uses for their database, which is many many many terabytes in size. My friend's father works for them, and we discussed this in my file and data structures class.
Eh...
My employer, whom I shall not identify, is a reseller. We were once asked to quote on a set of big Sun systems for hosting big Sybase databases, as part of a bidding competition run by a vendor on behalf of the customer. My MD, being the sort of guy he is, read the spec, reverse-engineered the *actual* customer requirement, wrote a new spec, and submitted it with a quote. Significant features of the new spec: 60% less cost to the customer; substantially more functionality; higher margins for us.
Presently, the bidding competition was restarted by the vendor, with a spec remarkably like the one my MD had written, but priced with substantially lower margins.
Conclude: [1] Sun can be even worse than Apple at shafting their strategic partners. [2] Take your spec and the quote you got given to someone who actually knows what they're doing.
Things change with scale. My experiment, BaBar, has about 130TB in our Objectivity object databases at the moment. It grows at about 10MB/s.
Most of our servers are on Solaris, although we also support Compaq Tru64 UNIX and Linux. There is an HPSS backend as we only have a few TB of disk.
We've had some problems bringing up sites which use Linux servers, but I don't think any of these are different from the problems we had to solve for Solaris (we gave up on HP a long time ago).
You seem like a pretty clueful DBA so I won't reiterate anything you can easily pick up by reading the documentation.
I'm in the middle of doing a feasibility study of migrating our flagship database (~30GB ASE 11.5) from big-iron AIX boxen to commodity x86 boxen running Linux / ASE 11.9.2
I have not found the dump/load incompatibility to be a major hassle. If you tune your Linux box for fast BCP the load shouldn't be too painful. As an alternative, you might try using DBArtisan from Embarcadero Technologies. It has a migration feature that makes moving data and schemas between servers very painless. It is well worth the price ($5000, IIRC) - it will pay for itself quickly in time savings alone.
In my test setup, I was able to move our 30GB database from the AIX box to the Linux box in about 10 hours, which fits within our normal scheduled maintenance window. The AIX box is a 4-way RS/6000 box w/ 1 GB and all the storage allocated as virtual partitions on a RAID-5 array (I didn't set this up). The Linux box is a quad Xeon w/ 1 GB of RAM and 8 drives; I'm using raw partitions and doing my mirroring manually from within Sybase. DBArtisan runs on an Athlon 550 w/ 128MB under NT Workstation.
The AIX box is a little simpler to manage, because the old DBA had all the tables on the default segment. Even though it's more work, I prefer to hand-tune the database and place the big and/or active tables on their own segments & devices. Needless to say, you need to be comfortable using sp_placeobject & sp_partition to take this approach. I find that the extra effort setting up the server pays off in the long term in performance and reliability. Barring the difference in the physical storage strategy, I don't see any factor that makes ASE on Linux more difficult to administer than ASE on any other flavor of Unix. Actually, the OS-level administration is simpler in Linux than in AIX, IMHO.
Since you say this is going to be a data warehouse system, you REALLY want to use partitioning so you can take advantage of parallelism. Re-read chapters 13, 14, 15, and 17 of the Performance & Tuning Guide before you start, you'll be glad you did.
I don't know what your uptime requirements are, so I can't say if Linux is robust enough for you. If you need rock-solid 24x7 availability, I'd say stick with big iron and commercial Unix. If you don't need to be bulletproof Linux should be fine. For us, the cost savings are worth the slightly higher risk. As I write this, our Linux test server has 63 days uptime and has survived several stress-tests with no problems, so reliability hasn't been an issue so far. Linux performance seems to be on par with the AIX box so far -- but the database is not the bottleneck in our system.
"The axiom 'An honest man has nothing to fear from the police'
Why is it that the proponents of "one nation under God" are so eager to get rid of "liberty and justice for all"?
Well well..... I've been using Sybase on pretty hefty (100+GB) databases, and see no problem in this.
I'd rather be worried about interoperability and limits: first of all, Sybase will NOT load database dumps made on different platforms (Sybase on Linux won't accept a dump from Sybase on AIX. Actually, it won't even work between WinNT/Intel and WinNT/Alpha, AFAIK).
This is pretty *bad*. If you only have to get a few tables mirrored, you could use BCP (its Bulk Copy Utility) to periodically dump those to plain text files.
Otherwise, you could also try Replication Server, but that's an unknown animal to me.
The other limit is size: if you want reliability, you HAVE to use raw devices, otherwise you risk corruption in case of server crash (believe me, it DOES happen).
Under Linux you are currently limited to 2GB per raw device, so, with Sybase's limit of max. 256 devices (with 6 already used), you have up to 500GB for an (unmirrored) database. Seems a lot, but it's not enough for today's biz needs... I keep seeing more and more multi-terabyte DBs (ahem.... most of them Oracle on Sparc Solaris.)
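For anyone who wants to redo that arithmetic with their own numbers, here is a minimal Python sketch. The 2 GB per-device limit, the 256-device ceiling and the 6 devices already in use are simply the figures quoted above; the function names are made up for illustration.

```python
# Back-of-the-envelope capacity check using the figures quoted in the post above.
GB_PER_RAW_DEVICE = 2   # per-device limit on Linux, as stated above
MAX_DEVICES = 256       # Sybase device ceiling, as stated above
RESERVED_DEVICES = 6    # devices already in use, as stated above


def max_database_gb(per_device_gb, max_devices, reserved):
    """Largest unmirrored database the remaining devices can hold, in GB."""
    return per_device_gb * (max_devices - reserved)


def devices_needed(database_gb, per_device_gb):
    """Devices needed for a database of the given size, rounded up."""
    return -(-database_gb // per_device_gb)


print(max_database_gb(GB_PER_RAW_DEVICE, MAX_DEVICES, RESERVED_DEVICES))  # 500
print(devices_needed(40, GB_PER_RAW_DEVICE))  # 20
```

So a 30 to 40 GB database needs only a small fraction of the available devices; mirroring from within Sybase would roughly double the device count.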
Vacuum cleaners suck. Kings rule.
This was discussed on the PostgreSQL list about a year ago. Synopsis: Postgres is scalable, and the best OS for this would probably be FreeBSD. I've run a DB over the 1 GB range and it was smooth, as long as your indexes and SQL statements are well made.
back in the day we didnt have no old school
- Yes, Unix can handle it, and Linux, too.
- Don't skimp on the hardware
- Your main costs long term will be admin related
- Changing databases is a pain
I don't know Sybase at all. However, I know Oracle. Oracle has a utility that will automatically mirror a database on another machine placed anywhere you like. As the master database changes, the mirror database takes the archive logs (logs of every change to the db) and automatically applies them to the remote database. The remote DB constantly acts as if it were recovering from a crash and applies the archive logs. This way the remote database is an exact copy of the master, with a slight time lag depending on how often you create an archive log file. I don't know if Sybase has anything like this, but I bet they do. Ask your Sybase rep, you'll make her day.
Linux is not all Unix. Just like all Linux is not RedHat.
I want to delete my account but Slashdot doesn't allow it.
I can second that enthusiastically. The Filers have NFS performance that can often exceed local disk performance (on gigabit ethernet). The NFS performance of a Sun E450 with 280GB RAID 5 array pales next to a similarly configured Netapp Filer.
Screw Micro$oft.
I'll agree on that.
Yet, as Alpha hardware isn't that cheap if you want to build a decent server, I could also suggest a parallel system of cheaper IA32/64 boxes.
A friend of mine was building something similar, and since these people aren't going to be modelling fluid flows (which would more or less require a cluster of Alphas), a Beowulf cluster of some Athlons should do the trick. (I don't know how FPU-intensive database applications are. I guess they shouldn't be that much, so even Cyrix chips should also work well.)
The node machines don't have to contain much: 1-2 processors and 1/2 to 1GB of RAM (depending on number of machines) and some (preferably) fast network card (Ethernet for cheap, Myrinet or similar if you're serious about it).
I have no experience with such database deployments, but a cluster after all might not be as bad an idea as some here have suggested.
Trian
I'm no longer fed up with MS Windows: I got rid of them
Sybase takes control of the HDC so it is not a filesystem file. Further, only a dunderhead would want 1 segment that is 30G in size. Sybase is smart and I have had no problems developing on Sybase for Linux and have been using it to play with for several years, though I never put too much strain on it. The most I have put in at one time was 6 G. Seems to handle it, though.
The limit is several Terabytes now. I know the large-file patch is in the 2.3 series development kernels, and I think it was back-ported into 2.2.14 and up.
In post-9/11 America, the CIA interrogates YOU!
Scanning thru the posts, I noticed that no one even mentioned Pervasive (aka Btrieve). It is a solid database system that cannot be beat for performance. File sizes scale to 64gb for now and will scale even higher in the future.
-It ships with RedHat and is a DAMN sight cheaper than Oracle.
-The engine is built into Netware 5.1 and runs NDS, client access licenses, etc.
-Eleven of the top 10 accounting packages use it (Peachtree, ACCPAC, Macola, Platinum,Sage,DAC Easy,etc). ARC Serve for Netware uses it.
-A 10-user license runs less than $1000!
Before porting to all those other expensive packages, look at Pervasive.SQL first and then make a judgement.
I'm good with numbers -
As many of the other posts reflect, you get what you pay for. For mission critical apps (read -> your database) you want redundancy throughout the system architecture. Disk drives are not the only thing to worry about: the architecture for x86 may not be able to support the data transfer rates you are looking for. Additionally, there may very well be abysmal support from the vendor for an x86 implementation. My advice is to spend on the hardware; a poor db implementation can not only cripple your operations team but also make your career shorter than you may have otherwise planned. /bot
Here are the priority items for any database box --
- Memory. Databases love memory for cache, logs, etc. If you can keep your entire database in memory, disk speed becomes irrelevant after the first data access and for writes. If your box only supports a couple of gigs of memory, move on. We have boxes with 4GB of memory, and our DBA wants more.
- Disk Bandwidth. The more disk bandwidth the better. Several little disks scattered about multiple SCSI controllers will usually perform better than comparable aggregated large disks. Don't even think about using IDE/EIDE.
- IO Speed. The faster your disks, the better (Duh...) Again, disk size can play second fiddle to disk access times. I would rather have many small, fast disk drives than one large, slow one.
- CPU speed. Did you notice this was last??? Face it, if you can't keep it in memory and your disks aren't fast enough for your processor(s), then the CPU speed isn't as relevant.
- Network bandwidth. Most computers do not have issues here. However, there is overhead pushing data over a network, and the more data you push, the more network bandwidth you need to respond to requests.
It is also a good idea to separate application/web servers from database servers. All modern databases support the ability to service database requests over a network. Providing a unique network solely for database activity that is separate from the user network is common in most shops now to support the data movement from database servers to the app servers.
The game all sys admins and DBAs perform is finding the current bottleneck. There is always a limiting factor for performance, and it can usually be tied to one of the above items.
Determining a configuration to support a database is not easy. You need to gather usage predictions, such as number of concurrent users, read rates, update rates, log projections, and make a guess. You also need to know your target audience and how they access it. A million requests spread over 24 hours is not the same as a million requests in a short period.
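To make that last point concrete, here is a small Python sketch comparing the average request rate for the same million requests over two windows. The one-hour "short period" is only an assumed example; the function name is hypothetical.

```python
def average_rate(requests, window_seconds):
    """Average requests per second over the given window."""
    return requests / window_seconds


REQUESTS = 1_000_000
FULL_DAY = 24 * 3600   # seconds in 24 hours
ONE_HOUR = 3600        # assumed "short period", purely for illustration

print(round(average_rate(REQUESTS, FULL_DAY), 2))  # 11.57
print(round(average_rate(REQUESTS, ONE_HOUR), 2))  # 277.78
```

Same total volume, but the box has to be sized for the second number if that is how the users actually hit it.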
This is only a sig, this is only a sig.....
I rarely read replies, it's my opinion and if you thought about your opinion a little more, I'm OK with that.
I don't want to argue whether Linux or BSD is better. I just heard that Linux has its strength in supporting a larger variety of hardware, and there are more applications out there written for Linux, but BSD still has an edge when it comes to bigger loads. So if you just want to run a database and haven't purchased any hardware yet, you might want to look at the possibility of using BSD:
http://www.openbsd.org/ focusing on security
http://www.freebsd.org/ fast and reliable
http://www.netbsd.org/ most inter-platform
Of course, the GPL rules and BSD is not GPL ;-)
Before I get to the storage, yes Sybase works on Linux, and yes, cross-OS data migration is possible (and actually not that hard) with Sybase. Where I work we replicate a production Sybase database from AIX to a reporting server running HP-UX. Multi-hosted, network-connected databases are one of Sybase's strengths.
Anyway, on to the storage. Sybase works best when you give it raw devices, which if I remember correctly Linux doesn't support (yet). So, you're stuck with a filesystem. I'll let other, more competent Linux fs folks advise you there.
Databases stress two things hardest: memory bandwidth and disk I/O. Memory bandwidth can be best dealt with on x86 boxes by getting Xeon-based systems with as much L2 cache as you can afford in addition to as much main memory as you can afford.
As for disk, forget IDE. Go SCSI or Fibre Channel all the way. Definitely use RAID, but before you choose which RAID level, consider your usage of the database. If 80% or more of your transactions are read-only, then RAID 5 is okay. If more than 20% are writes, DO NOT USE RAID 5. You will regret it. Every write on a RAID 5 volume requires 2 reads and 2 writes to the physical disks. You will notice this big time once the write mix passes 20%. In this case use RAID 1+0 aka RAID 10. This is different from (and significantly better than) RAID 0+1 for reasons I won't go into. Use hardware RAID. Without a ballpark on your budget, I have no idea what is realistic, but get a hardware RAID system with as much cache as possible. Spread the RAID volume across as many physical drives as possible.
One last thing: spend some time developing a solid backup strategy. This step is so often overlooked because it doesn't affect you until you have a problem. Don't make that mistake, and most other problems can be recovered from. Good luck.
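To put rough numbers on the RAID 5 write penalty described above, here is a minimal Python sketch. It counts physical disk operations for a given read/write mix, using the "2 reads and 2 writes per write" figure from the post for RAID 5. The RAID 10 figure of 2 physical writes per logical write (one per mirror half) is my assumption, and the model ignores controller cache and full-stripe writes, so treat it as an illustration only.

```python
RAID5_OPS_PER_WRITE = 4    # 2 reads + 2 writes, per the post above
RAID10_OPS_PER_WRITE = 2   # assumed: one write to each side of the mirror


def physical_ops(logical_ops, write_percent, ops_per_write):
    """Physical disk operations generated by a logical I/O load.

    Reads are counted as one physical operation each; every write costs
    ops_per_write physical operations.
    """
    reads = logical_ops * (100 - write_percent) / 100
    writes = logical_ops * write_percent / 100
    return reads + writes * ops_per_write


# 1000 logical I/Os per second with a 20% write mix:
print(physical_ops(1000, 20, RAID5_OPS_PER_WRITE))   # 1600.0
print(physical_ops(1000, 20, RAID10_OPS_PER_WRITE))  # 1200.0
```

Under these assumptions the gap between the two layouts widens as the write share grows, which is the effect the 20% rule of thumb is warning about.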
If you want to know more about Oracle on Linux then check out Oracle Technet. You will need to set up a login but then can view documentation and download development versions to try out.
It does have a fairly hefty disk footprint (about 600Mb IIRC).
Oracle should be able to handle, in terms of size, whatever the hardware can handle. It also supports raw volumes.
Stephen
"Don't write down to your readers, the only people less intelligent than you can't read" - Sign on Newspaper Office Wall
The db software is your problem and I'm not sure mySQL on Linux could handle it, but Oracle could, or you could use Solaris on x86 where db products are much farther along.
They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety.
I know of companies who rewrite the NT kernel so it won't crash, so I guess anything's possible.
I'm perfectly happy with Linux and FreeBSD but where can I get one of these NT kernels? It's good to know because sometimes outside forces tainted by evil force you to use Microsoft products.
--- Justin Dearing http://www.justaprogrammer.net/ We're just programmers.
That said ... We've been using Informix Dynamic Server with Sinix (UNIX for 16-processor R10000) with little more than 30 GB database on Emc disk fields and a *lot* of traffic. It works for us (knock knock).
Getting in another RDBMS is certainly not an option. It's a full, worldwide distributed Sybase shop with a lot of Replication going between the sites (this database would be not within the scope of a distributed system). Even the dump / load incompatibility between HP/UX and Linux (tested) might be the show stopper.
The 2 Gb file limit is not an issue with Sybase. The storage architecture is such that you create devices (which reside on files or raw partitions). A database can theoretically span 253 such devices (2 are reserved for system databases). So file size is not an issue.
Lastly, there's so much extremely interesting stuff to draw on and I want to thank everybody who contributed. The gist for me is not to advise to do it, since the headaches just don't seem to be worth the hassle (especially from a manageability POV).
I still think that Sybase on Linux is very viable (we implemented a few test DBs) for smaller scale databases and for non-mission critical data. Initial tests have not identified significant glitches and the whole HP/UX support environment can be used almost 1:1 (provided ksh is used).
I am the musician
with a pocket calculator in hand
Kraftwerk
Redhat Oracle distribution.
Though I haven't tried it personally, Redhat do a very good Oracle-tailored distribution. It gives Oracle its own partition and is set up for performance & co. The support is meant to be quite good as well.
I have worked with quite a few DB systems (M$ sqeeel, Sybase, Interbase, as well as the less server-based DBs Postgres, Paradox, Access & co.), and I have an Oracle training course coming up soon. It has lots of info on Oracle for Linux, but as I haven't been on it yet I can't go into any details (but this is another story?).
I believe Oracle will also run on other unix platforms, and may have support from other linux distribs other than redhat.
thank God the internet isn't a human right.
There really shouldn't be a need for a journaling file system since the sql server basically does this already through the transaction log.
> Further, and even more important, this is a major chance to
> convince a global player of the capabilities of Linux.
Show them something big. I believe that they won't really suffer if they have to pay for a machine that is *only* 5 times as expensive as a supermarket box.
They won't suffer because, whether you take a 1k$ box or the 5k$ box that your U/X reseller advises you to take, you are still far from the 10k$ Sun stations.
You are also far from a consultant's weekly bill.
Also, you won't convince "global players" of Linux's capabilities by showing them something cheap (even if it is reliable, sufficient and competitive).
If they see that they can manage really big boxes using Linux, they are more likely to be convinced of this opportunity.
So, accept the 5k$ proposition (sounds like multiple x86 processors along with a RAID-5 array?) and show your boss that Linux is not a toy.
(Linux or BSD, of course...)
--
Trolling using another account since 2005.
Interbase is now OSS/Free Software, too. You left it out of the equation.
As for your claim about Microsoft SQL Server running terabytes of data, I'm very skeptical. I'm not an avid Microsoft user, but I've heard several stories about Microsoft crashing when serving more than 30 clients on a Microsoft Access 97 database. So, how NT will handle terabyte sized SQL databases is beyond me, really.
Open|Free|NetBSD/Linux and Interbase will probably be the OSS database combination for the next few months, until Oracle opens the source due to OSS software eating up their market share.
"A few atoms won't even light a match" - Dr Jones, 1933
Excuse me, lad!!
I am highly annoyed at being addressed in this manner. You have no idea of my qualifications, obviously. Please read my bio. I've worked at Dow Chemical and contracted at various academic and biotechnology corporations in Europe and the U.S. I have been exposed to large-scale computing equipment, including high-end hardware running DB2.
Although I haven't used Microsoft products avidly, as I've said, I have had some experiences, and they weren't favourable. Although this hasn't included Microsoft SQL Server, my personal opinion - yes, my professional opinion based on my dealings with NT Server - is that NT Server itself would not be stable enough to cope with the loads mentioned.
If someone had to try and do half the stuff that I've had to do at various institutions on NT - I actually shiver when I think about the consequences, considering the platform wasn't even stable enough to compile some mid-intensity FORTRAN code my team and I were writing for chemical analysis at one firm I was working with. This was a while back, late 98 or so - The machines were Dual PII Xeon 400s if I remember correctly. Before you question my "Experience", lad, please read my Bio. I'm not putting down your experience - we all have different experiences, and we should share them with each other in order to build on each other's knowledge. But please, before you comment, do read my Bio.
"A few atoms won't even light a match" - Dr Jones, 1933
My sentiments Exactly.
I've just finished posting the exact same argument... Oh well, At last I know SOMEONE is on my side (or am I on Yours ?)
--- To err is human... Am I more human than most ?
We tested Sybase Adaptive Server Anywhere and Enterprise as well as Oracle (the Linux versions) on SuSE Linux and FreeBSD 4.0, and it worked fine. FreeBSD was slightly superior in terms of performance.
File size limitations depend on your file systems. For both OS's, filesystems are available that handle 30G files easily.
Hardware was a HP NetServer 4 (LX Pro), dual Xeon 400, 512 MB RAM, 5x16GB RAID disk array on an Adaptec RAID controller.
As a state gets corrupt, its laws multiply; the most corrupt states have the most numerous laws. (Tacitus, Annales 3:27)
Forgot some details. Database size was 24.7 GB, but that shouldn't matter. We tested it with about thirty to fifty users in an ASP environment (requests were done from a Citrix server). The general setting was mostly data warehousing. This probably explains why FreeBSD performed better due to its comparatively good responsiveness under higher loads.
As a state gets corrupt, its laws multiply; the most corrupt states have the most numerous laws. (Tacitus, Annales 3:27)
I wouldn't use MySQL in a data warehousing environment because its features are too limited (no stored procedures or triggers, no subqueries). If you want to do data warehousing, open source DBs are not an option (sad but true).
As a state gets corrupt, its laws multiply; the most corrupt states have the most numerous laws. (Tacitus, Annales 3:27)
Well, we mainly used the one in the FreeBSD handbook. It comes with FreeBSD, but it's also available on the web sites, for example here in the online handbook or on one of the mirrors. It works fairly well.
As a state gets corrupt, its laws multiply; the most corrupt states have the most numerous laws. (Tacitus, Annales 3:27)
So if you don't know how to work with a database, you should keep your hands off. Even if there is a nice GUI. Period.
A Dataflex DB running on a UNIX OS would have no problem handling 30GB. See www.dataaccess.com ... This really is not a big DB compared to some of the other dataflex / UNIX sites out there...
30GB is really a trivial size for a database in the modern age. I think that Linux/Oracle can _do_ it today, but probably not as well as a commercial solution (this is not flamebait: I love linux and can't wait for linux to be the top DB platform, but I don't think we're there yet).
Of course, if you're looking down a long road of development, testing, deployment, and then maintenance... by the time you get any significant distance down that road, there should be packaged-up ready-to-go Linux-2.4/Oracle machines that could really blow a competitor away. In which case I'd definitely start working with what you can in the Linux/Oracle field today.
11*43+456^2
w3rd!
But I am not sure if it will work long enough. Has anybody experienced fast Oracle under Linux? What kind of hw did you use? Greetz Two
I would stick with Sybase. There are no major advantages to going over to Oracle, and 30Gb is not that big an issue. My advice would be to base-line this as a dual CPU box with at least 1Gb of memory. ASE 12 does nice things like "companion" mode to provide you with the other copy. You could also use Rep-Server to provide replication as the means to have a second copy. That is the nice thing about Sybase, there are many flexible options to solve a given problem. Like many of the other comments in this thread, I would stick with Sun on the box side as you pay for the reliability. I would also suggest that you get professional help on the design and architecture side.
Just a thought - you could deploy something that is, in fact, mission critical on Linux/ix86 - and it wouldn't even cost an arm and a leg. (Just a leg, perhaps.)
See:
Mission Critical Linux
Oracle
MS SQL server is not the only system that can accomplish this task. MS SQL server is one of the weaker db products. Oracle is 20 times more powerful and can run windows, unix, or linux.
PostgreSQL will certainly handle databases this big. On Linux there is a 2GB file size limitation (being removed in 2.4, I believe) but PostgreSQL will split its files at around 1G anyway to get around this.
There are other filesystem limitations that may have to be worked around with various Unixes, but managing to get a partition of 100GB or greater should be achievable. In Linux you would probably use logical volumes, but you could simply do it with links if you wanted.
there are currently two versions of sybase for linux to concern yourself with. 11.0.3.3 and 11.9.2. one of the biggest differences is that 11.0.3.3 does not support raw partitions while 11.9.2 does. you will get much better performance from 11.9.2 using raw devices, also it lends towards better data integrity.
as for your system, you'll be amazed at how much you can accomplish with linux/intel. there are only two components that you really need to worry about, CPU busy and IO busy. if the system that currently houses your database is running sql, then you can run this to get an idea of how to set up a like system:
declare @loop_var int
select @loop_var = 0
-- repeat x times
while @loop_var < x
begin
    -- report CPU, I/O and idle figures since the previous sp_monitor run
    exec sp_monitor
    select @loop_var = @loop_var + 1
    -- pause between samples
    waitfor delay 'yy:yy:yy'
end
go
where x = iterations and y = the delay in hr:min:sec.
run this during a "peak usage time", have the results dump to a file using the -o param and then
take a look at the CPU and IO. you'll get something like this:
cpu_busy        io_busy         idle
--------------- --------------- ---------------
3(0)-0%         0(0)-0%         13863(5)-100%
this is a sybase ase running on red hat at idle. during production you will want cpu_busy to be in the range of 60-70% as this allows for some growth, if you hit 80% or more start planning for more cpu power. conversely, if your io_busy is getting hit hard it may indicate problems with your network configuration, or that your device configuration needs tweaking. poor performance from a sybase server is not always cpu related.
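if you dump a lot of samples to a file, a few lines of Python can pull the percentages out for you. this is only a sketch: it assumes each field looks like the sample above, i.e. total(delta)-percent%, and the helper names and the wording of the advice strings are made up. the 70% and 80% thresholds are the ones from the paragraph above.

```python
import re

# One sp_monitor field as shown in the sample output, e.g. "13863(5)-100%".
FIELD = re.compile(r"^(\d+)\((\d+)\)-(\d+)%$")


def parse_field(text):
    """Split one field into (total, delta, percent) integers."""
    match = FIELD.match(text.strip())
    if match is None:
        raise ValueError("unexpected sp_monitor field: %r" % text)
    total, delta, percent = (int(group) for group in match.groups())
    return total, delta, percent


def cpu_advice(cpu_busy_percent):
    """Apply the rule of thumb from the post: 60-70% is fine, 80%+ is not."""
    if cpu_busy_percent >= 80:
        return "start planning for more cpu"
    if cpu_busy_percent > 70:
        return "above the 60-70% target, keep watching"
    return "room for growth"


row = "3(0)-0% 0(0)-0% 13863(5)-100%"
cpu_busy, io_busy, idle = (parse_field(field) for field in row.split())
print(cpu_busy[2], io_busy[2], idle[2])  # 0 0 100
print(cpu_advice(cpu_busy[2]))           # room for growth
```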
i run a 10GB DB on an intel pIII 600(ish) with a 1/4 GB RAM and my cpu_busy sits around 65% most of the time. except when users try to dump the contents of their windows "c" drive into the database...grr.
hope that helps, ymmv of course.
-scroe
Progress runs on many platforms, including SCO Openserver and Uniserver, DG-UX, RedHat, and HP-UX. We are talking MULTI-VOLUME databases. Can be split amongst many machines and hard drives. Up to 30 TB! Clustering anyone?
Are you suggesting they mirror their database with mySQL, which doesn't even support transactions?