It only matters whether the greater of the two directions is reduced: if you do 8 gig out and 1 gig in, you get charged for 8 gig out, even if you reduce the 1 gig inbound to .925 gig. (Also, it's typically billed at the 95th percentile, but that shouldn't matter in this case.)
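For anyone who hasn't stared at a transit bill, here is a minimal Python sketch of "bill the greater direction at the 95th percentile". It assumes the common method of sorting the 5-minute samples and dropping the top 5%. The function names and the sample figures are made up for illustration, and individual carriers differ in the details.

```python
def p95(samples):
    """95th percentile the way carriers commonly do it: sort, drop the top 5%, take the max of the rest."""
    ordered = sorted(samples)
    # Integer ceiling of 95% of the sample count, turned into a 0-based index.
    idx = (len(ordered) * 95 + 99) // 100 - 1
    return ordered[idx]


def billable_mbps(outbound, inbound):
    """Bill on whichever direction has the higher 95th percentile."""
    return max(p95(outbound), p95(inbound))


# Hypothetical month of 5-minute samples in Mbit/s: 8 gig out, 1 gig in.
samples = 30 * 24 * 12
outbound = [8000] * samples
inbound = [1000] * samples

print(billable_mbps(outbound, inbound))           # 8000
print(billable_mbps(outbound, [925] * samples))   # still 8000: trimming inbound saves nothing
```

Short spikes don't move the bill either, because the top 5% of samples gets thrown away before the max is taken.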
It would cost more to do the refactoring than they could ever hope to recoup, even if shorter URLs also decrease outbound traffic.
And I wonder, has he ever tried to do anything non-default with Oracle's equivalent: tnsnames.ora, listener.ora, sqlnet.ora, etc.? The defaults have bad security holes, and changing things is pretty bad. They had a daemon in there whose *purpose* was to allow remote systems to execute programs without authentication, for God's sake. You had to pay for encryption, and it took a high-priced DBA to set it up (contrast that with pretty simple SSL for open source DBs).
SQLNet is *not* simple, and out of the box it is pretty freaking insecure.
Things Oracle can do that pg and MySQL can't include the following, but they are only truly needed in extreme cases:
- massive IO throughput for data warehouses, overlapping low-end Teradata. Many people call a pair of 250G drives a data warehouse because it's .5T, but I'm talking about reading, sorting and analyzing, say, 10T of data in about half an hour on just a few Xeon boxes and a few FC arrays. It's a $200k system, but even on the same $200k of hardware, MySQL or pg will probably only push a few hundred megabytes/sec of IO, so the same query would take many times longer. And that's *if* you managed to write it and parallelize it yourself across a distributed set of MySQL or Postgres boxes, since these would not present a single system image the way RAC does. But hey, who *really* needs to do 10 gigabytes/sec of IO anyway?
- Very high availability: not "I want my blog up 100% of the time" availability, but "we lose $250 for every second my website is down" availability. Oracle is not the best here, but it is better than MySQL or pg because of serviceability features, not because it doesn't have any bugs: e.g. rebuild an index with no interruption, fix block corruption with no interruption, etc. But then most of the real work for high availability is in the application anyway. Still, you'll want the visibility and availability features the Oracle backend offers.
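The IO gap in the data warehouse point is easy to check with back-of-envelope arithmetic. The Python sketch below assumes decimal units and uses 300 MB/s as a stand-in for "a few hundred megabytes/sec". Both of those are assumptions, not measured numbers.

```python
TB = 10**12   # decimal units, as drive and array vendors quote them
MB = 10**6


def required_mb_per_sec(total_bytes, seconds):
    """Sustained read rate needed to get through total_bytes in the given time."""
    return total_bytes / seconds / MB


def scan_hours(total_bytes, mb_per_sec):
    """Hours to scan total_bytes at a sustained rate given in MB/s."""
    return total_bytes / (mb_per_sec * MB) / 3600


# 10T in half an hour, per the text.
print(round(required_mb_per_sec(10 * TB, 30 * 60)))   # 5556, i.e. about 5.5 GB/s
# The same 10T at an assumed 300 MB/s.
print(round(scan_hours(10 * TB, 300), 1))               # 9.3 (hours)
```

So half an hour on the big box turns into most of a working day at a few hundred MB/s, and that is before you add the cost of hand-parallelizing the query.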
I've been an Oracle DBA for a long time, and recently turned my back on Oracle to seek other career challenges. I think the short explanation for my decision is that the speed of hardware has grown to overwhelm almost any real world performance problem with just "good practices" as a DBA, as opposed to having to be some kind of hero DBA. So there is no future there as far as $$.
I built some very large, very high-throughput databases under difficult conditions (torrid growth at PayPal), and I don't think it could have been done with that hardware and even today's MySQL or Postgres. Only Oracle had the features then, and its features from back then still exceed what pg and MySQL have *in the category of scalability and serviceability*. This mostly stemmed from the difficulty of doing transaction-safe database calls across multiple machines, which forced us to scale a single machine for a very long time. And before anyone says pg and MySQL have transactions, I'm talking about billions of dollars of transactions, where if you lose or fail to recover even a few, you risk going out of business because customers stop trusting you.
But Moore's law and whatever law applies to disk densities just crush any classic RDBMS problems. Number of spindles can still be sort of costly, but it's way cheaper than even 5 years ago. Memory and CPU are practically free, which makes it unnecessary, and a waste of time and money, to obsess over tuning in all but the most extreme cases.
If you're writing a custom new application, fuck Oracle, except in cases that fall along these few lines:
- you want to do truly massive scans of multi-terabyte tables: Oracle RAC will do 2-4 gigabytes/sec on a few Xeon servers facing a few FC disk arrays. PG or MySQL will never do that, even on the same hardware, since they don't parallelize and they don't do direct IO and async IO in a pervasive or big way. This is the classic data warehouse design and is probably a bastion for Oracle (Oracle is basically trying to eat Teradata from the lower end).
- you want > 99.9% uptime: you will still need to work 10x harder in the application to get this or better, but you'll also need the serviceability of Oracle, e.g. index rebuilds and corruption fixes without a reboot. Also, the visibility and monitoring of Oracle are light-years ahead of MySQL and pg.
- you want to build an empire as a middle manager
- you are writing a turnkey application you plan to sell to corporate America or govt.
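To put the 99.9% number in perspective, here is a rough sketch of the yearly downtime budget for a few uptime targets. It prices that downtime at the "$250 per second" figure from the high availability point. The 365-day year and the assumption that the whole budget gets used are simplifications for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def downtime_budget_seconds(uptime_pct):
    """Seconds of downtime per year allowed by an uptime target."""
    return SECONDS_PER_YEAR * (100 - uptime_pct) / 100


def yearly_outage_cost(uptime_pct, dollars_per_second):
    """Revenue lost per year if the whole downtime budget gets used."""
    return downtime_budget_seconds(uptime_pct) * dollars_per_second


for pct in (99.0, 99.9, 99.99):
    hours = downtime_budget_seconds(pct) / 3600
    print(f"{pct}% uptime: {hours:.2f} h/year down, ${yearly_outage_cost(pct, 250):,.0f} lost")
```

At 99.9% that's under 9 hours a year, which is roughly one bad outage. That's why serviceability features like online index rebuilds matter at this level.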
So how 'bout someone explaining why you don't burn your face when making a cell call?
My fucking microwave takes 3 minutes to cook an egg. Why would a handset operating for days on a 3 volt battery cook something that fast, while not cooking the face of anyone using it?
The original question was regarding real world performance of iSCSI in particular, and since few of the posts seem to touch on that, I may as well share what I've learned from hard experience regarding the other technologies: SCSI and FCAL.
My experience is with very high transaction volume OLTP databases (Oracle) backing a financial website. I've found that neither SCSI nor FCAL adapters limit performance significantly. This was with QLogic qla3200 adapters, or with high-end Adaptec Ultra320, on Solaris 9 and the last few versions of Red Hat Enterprise. Only the older versions of Red Hat had some kind of problem with the QLogic driver, plus bounce buffer IO, which drove down performance. But then, to be nitpicky, that was the driver, not the HBA. Solaris was always fine, and now Red Hat is too.
The main performance challenge was *always* tuning the database and spreading out over lots of spindles. The HBAs, at over 200M/sec each, never posed a problem on larger Sun boxes (8 or more procs) with 7 or 8 way parallel sequential reads going. On smaller hosts or smaller disk arrays, the problem was always the host itself or the disk seek times respectively, not the HBAs themselves.
A 10k rpm drive will do about 70 MB/sec off the outer platter (near block 0), and as a rule of thumb, a 2Gbit FCAL adapter will do 200 MB/sec (at least on Solaris or newer Red Hat EL). So my dual QLogics would do 400 MB/sec under absolutely optimal disk access, but typically it's not that perfect 8 way parallel *serial* scan off the outer platter; it's usually fairly random.
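Here's the same arithmetic as a small Python sketch. The 70 MB/s and 200 MB/s figures come from the paragraph above. The 150 random IOPS per 10k drive and the 8k IO size are assumptions, picked only to show how far random IO falls below what the HBAs can carry.

```python
import math

DRIVE_SEQ_MB_S = 70      # 10k rpm drive, outer tracks, sequential (from the text)
HBA_MB_S = 200           # 2Gbit FCAL adapter rule of thumb (from the text)
DRIVE_RANDOM_IOPS = 150  # assumption: rough figure for a 10k rpm drive doing random IO


def spindles_to_fill(num_hbas):
    """Drives streaming sequentially needed to saturate the host's HBAs."""
    return math.ceil(num_hbas * HBA_MB_S / DRIVE_SEQ_MB_S)


def random_read_mb_s(drives, io_kb):
    """Aggregate MB/s when every drive is seek-bound doing random IO."""
    return drives * DRIVE_RANDOM_IOPS * io_kb / 1024


print(spindles_to_fill(2))      # 6 drives streaming fill two 200 MB/s HBAs
print(random_read_mb_s(8, 8))   # 9.375 MB/s from 8 drives doing random 8k reads
```

Six drives can flood a pair of HBAs on a perfect sequential scan. Put those drives on random IO and they don't come close, which is why spindle count, not the HBA, ends up being the constraint.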
So in high-end database applications (data warehouse or OLTP), the usual tuning challenge (and $$, for that matter) is getting a fat spread across a lot of spindles, and making sure the application is either caching well (OLTP) or doing orderly, sequential scans (data warehouse).
I may not be in your league. Most guys in my league measure the size of their systems in disk drives first, then memory, then processors. Yes, I can imagine doing a big database on Linux. High availability is something else.
I am the lead DBA for a company that processes 15-20 million US dollars worth of transactions per day. My backend database is Solaris/Oracle, it does 3000-4000 SQL statements per second, and my company would lose maybe $1000 in revenue for each minute it is down. The two largest tables in this database have in excess of 300 million rows and are accessed by 100k customers per day. We have over 11 million customers.
It's running on an E4500, which is saving us a lot of money by *not* buying E10000s. I like to think it's tuned well, but a big part of the reason it works (fast) is also that it is on an EMC with over 90 disk drives in it. It's all about IO bandwidth and serviceability in my world, and on those points you are correct in saying Sun is a hands-down winner over Linux.
Now, I work with a sysadmin who is a whiz at making lots of Linux boxes work reliably as a web frontend, and who is also good at keeping our backend Solaris-based database up 24/7. Neither of us is anxious to put the backend on Linux, but we did put a significantly large, high performance, but *relatively* low availability database up on Linux.
It's a 6x800MHz Intel box with 4G RAM and 16 disks on Mylex caching RAID-5 controllers. RAID-5 sucks in general, but the point of this system was to get a lot of bang for the buck, so as a big league DBA, I took the challenge of making data loads fast in spite of RAID-5, in order to get a crack at de-installing Windows from this box. If I spent some bucks on more disks, we could get a much faster system, but that was never the point of this system.
The system is about 200G worth of partitioned tables (copies of the same 300M row tables mentioned above) with partitioned rollup tables off the sides, for business analysis. The real trick is the partitioning. Because of the partitioning, this system is able to do many types of analyses that can't be done on our other analysis system, which happens to be Solaris with 60 disk drives.
The Linux box was a leftover from a failed Windows project, so in some sense it was free, but I believe it would have cost about $80k new. Gig Ethernet and the controller were about $10 or 15k of it.
It's working well for DSS, since the 2 times it's crashed in the last few months didn't really hurt anything.
I'm rambling on now, but I'll talk to the DBAs out there, who speak my language.
If you're gonna do Oracle on Linux:
- reiserfs sucked performance-wise on top of RAID 5. Don't know if I did something wrong, but I abandoned it in favor of ext2. I don't care if fsck takes a long time on this system, and ext2 creamed it for database IO perf on RAID 5. I also couldn't get perf out of reiser on simple stripes without the added hurt of RAID 5, so go figure. fsck times are irrelevant if you use raw partitions, so that is the way to go in most cases.
- Max out the memory (of course) on an Intel box. I think the most you can do is 4G on Intel platforms. This is sufficient for me, but I kept the SGA down to about 500M so I could have 10 way parallel processes with up to 200M of sort area size each.
- Watch out for Linux caching. I've turned it off for my filesystems. It's easy to get into "writeback debt" by pushing a lot of dirty blocks out of the Oracle cache into the ext2fs cache. Add RAID 5 suckiness on random writeback, and you've got serious constipation problems on your hands.
- I've used some raw partitions on this system. They seem to be worth it to avoid ext2fs caching hassles, but I haven't migrated completely yet. The "raw" command must be used to "bind" a name to a disk partition before Oracle can use it as a raw partition, so it makes for a few extra hassles, but no big deal.
- I got a Mylex caching controller, which apparently has hot swapping capability in the hardware, mitigating the absence of Veritas Volume Manager and hot plug capabilities at the Linux level. It also makes RAID 5 tolerable. Haven't proven hot swapping by testing yet, though.
- ext2 has some RAID 5 aware stuff, and it helped on the RAID 5 Mylex volumes I have, based on cursory throughput tests, but I'm not sure I'm getting the block alignment right at the Oracle level. (After all the Oracle/ext2/controller layers, I don't know if Oracle's 16k blocks are aligned with the stripes on the Mylex. Sigh.)
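One way to reason about the alignment worry is to count how many database blocks straddle a stripe-unit boundary at a given starting offset. The Python sketch below assumes a 64k stripe unit, which is hypothetical, so check your controller. It compares a partition starting at the classic DOS sector 63 with one starting on a stripe boundary. It only models the partition offset layer. Where ext2 places the datafile's blocks is another layer on top of this.

```python
def blocks_straddling_stripes(start_offset, block_size, stripe_unit, num_blocks):
    """Count database blocks that cross a stripe-unit boundary on the array.

    A straddling block turns one logical IO into two physical ones.
    """
    straddling = 0
    for i in range(num_blocks):
        first_byte = start_offset + i * block_size
        last_byte = first_byte + block_size - 1
        if first_byte // stripe_unit != last_byte // stripe_unit:
            straddling += 1
    return straddling


ORACLE_BLOCK = 16 * 1024   # 16k blocks, per the text
STRIPE_UNIT = 64 * 1024    # hypothetical stripe unit on the controller

print(blocks_straddling_stripes(63 * 512, ORACLE_BLOCK, STRIPE_UNIT, 1024))   # 256, one block in four
print(blocks_straddling_stripes(0, ORACLE_BLOCK, STRIPE_UNIT, 1024))          # 0
```

With the sector-63 start, every fourth 16k block splits across two stripe units. Starting the partition on a stripe boundary makes that go away, at least at this layer.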
FWIW, back in the dot-com heyday, I also had clients doing modest high availability (to them) databases on Oracle/Linux. Even then, on relatively small (in gigabytes) databases, the biggest tuning hassle was Linux writeback caching getting in the way of Oracle, and the biggest scalability hassle was managing many, many disks. Raw partitions can get around the former; intelligent controllers (Mylex etc.) or intelligent disk arrays (CLARiiON, Sun T3 etc.) get around the latter.
I have been an Oracle DBA for many systems that were larger than this, both OLTP and DSS/warehouse, but most of this should apply to Sybase too, which would presumably be a gentler migration if you want to go to Linux.
- avoid RAID 5, go with plain old stripe/mirror. RAID 5 is horribly slow for writes, and in DSS you do a lot of disk writes as part of queries, because the queries transparently build temp tables/segments to do the large sorts/merges involved.
- get more than 1G RAM if you can. Oracle will make good use of it by increasing memory sort area sizes and caching database blocks more intelligently than the filesystem does (it basically gives preference to index blocks). Hopefully Sybase does too.
- the mirror system at a remote site can be accomplished using redo logs/transaction logs. Restore the database from backups to a remote location, rdist, rcp or scp the transaction logs from the primary database to the mirror, and roll the database forward with each successive log. This is called a "standby database" in Oracle parlance.
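The "RAID 5 is horribly slow for writes" point comes down to the small-write penalty. On RAID 5, a small random write costs four back-end IOs: read old data, read old parity, write new data, write new parity. A mirror costs two. The Python sketch below uses 16 drives, the same count as the Mylex box described earlier, and an assumed 150 IOPS per drive. It ignores controller write cache and full-stripe writes, both of which soften the penalty.

```python
# Back-end disk IOs needed per small random write.
WRITE_PENALTY = {
    "raid10": 2,  # write both sides of the mirror
    "raid5": 4,   # read old data + old parity, write new data + new parity
}


def random_write_iops(drives, iops_per_drive, level):
    """Small random writes/sec the array can sustain, ignoring controller cache."""
    return drives * iops_per_drive / WRITE_PENALTY[level]


# 16 drives at an assumed 150 IOPS each.
for level in ("raid10", "raid5"):
    print(level, random_write_iops(16, 150, level))
```

Same spindles, half the random write rate, and DSS temp-segment writes during big sorts hit exactly that path.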
Something tells me if you're posting to slashdot, you're not at the poverty level.
It kinda cheapens the hardships of actual poverty-level existence to say that because your neighbors average 100k, you're at the poverty level. You know darn well you can get in your car and commute out from wherever it is you're working (Connecticut?). An impoverished person would be unlikely to get the training, the degree and the experience to get your (probably) high tech job, and furthermore would not likely have a car to drive out to the suburbs, even if it does take an hour and a half one way.
Bite the bullet. Buy a house.
If you're good, your value goes up.
Too Old To Code?
Anybody with a new sheepskin in their hand feels great because they're making 100k. But if you stopped looking in the mirror and really looked around, you'd find that 35 and 40 year old guys are making a hell of a lot more than you, doing what you are doing.
Or, really, I should say doing it with a lot more experience. If you get up early enough and look around as you drive up 101, you'll get a glimpse of wizened contractors on their way to $150-200 per hour gigs. But the deal is, you have to be good at programming, you have to be determined enough to keep on top of your skills, and you have to go out and get top dollar, not wait for somebody to find you.
The upshot is, you get a small but complete PC motherboard. You generally have to add a riser card to get PCI expansion, but that's okay; it's still smaller than an O'Reilly book. You'll need to find a vendor that will give you a chassis with space for PCI or ISA risers.
Also some of these vendors sell "point of sale" form factors, which were the basis of the I-opener and WebSurfer "little pc" boxes that caused so much hoopla recently.
Point taken, but the example is off the mark. To fix your parent/child performance problem, truncate the child table instead of deleting all rows. The HWM (high water mark) of the table is not reset when you do a delete, so a full table scan of the empty table still scans the entire allocated segment. Index the FK columns, and truncate or rebuild.
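A toy Python model of the high water mark shows why the "empty" table is still slow. This is not how Oracle lays out segments internally, and the 100 rows per block is made up. It only illustrates that a full scan reads everything below the HWM, that DELETE leaves the HWM where it was, and that TRUNCATE resets it.

```python
class ToySegment:
    """Toy heap table: a full scan reads every block up to the high water mark (HWM)."""

    ROWS_PER_BLOCK = 100  # hypothetical

    def __init__(self):
        self.blocks = []   # each block holds a list of rows
        self.hwm = 0       # blocks ever used; DELETE never lowers it

    def insert(self, rows):
        for i in range(0, len(rows), self.ROWS_PER_BLOCK):
            self.blocks.append(list(rows[i:i + self.ROWS_PER_BLOCK]))
        self.hwm = max(self.hwm, len(self.blocks))

    def delete_all(self):
        # DELETE empties the blocks but leaves them allocated below the HWM.
        for block in self.blocks:
            block.clear()

    def truncate(self):
        # TRUNCATE drops the blocks and resets the HWM.
        self.blocks = []
        self.hwm = 0

    def full_scan(self):
        """Return (rows found, blocks read) for a full table scan."""
        rows = sum(len(b) for b in self.blocks[:self.hwm])
        return rows, self.hwm


child = ToySegment()
child.insert(list(range(100_000)))
child.delete_all()
print(child.full_scan())   # (0, 1000): no rows, but 1000 blocks still scanned
child.truncate()
print(child.full_scan())   # (0, 0)
```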
But yeah, I'd *love* to see the Oracle code right now that handles buffer pinning and DBWR copies to disk. I'm having a hell of a time optimizing checkpoints for a certain high-update financial website.
I've heard of laws saying that, for example, murderers may not profit by writing books about their illegal exploits. Given the fascination, at least in the US with serial killers and violent types in general, most of these true blackhats could get pretty rich. Some could get rich and get parole later, to spend their fortunes.
Perhaps this law was extended to non-violent criminals like Mitnick. If so, it seems sensible to me. Fine, talk all you want (free speech), you just can't get rich off a previous crime (regulation of commerce).
Since you'd tend to spend less time in jail for hacking than for murder, it might actually be an appealing way to make a million for some script kiddies out there. They'd just have to fake orgasms till parole, then blow some of the book royalties on preparation.h ^?^?^?^?^?^?^?^? Preparation-H.
When you differentiate between x86 and SPARC boxes, make sure you're not kidding yourself about where the performance comes from. Much of it comes from the leading-edge Sun JVM. Also, the I/O subsystems are generally much faster.
As for the rest of the pro-Java rhetoric, I agree. I think Java is a great way to go, but you often need scalable hardware.
Like any platform, you have to learn java and how it plugs into an environment before you can say it sucks performance wise.
Not sure why you'd avoid stored procedures, as they are the equivalent of precompiling a scripting language to p-code. No startup overhead of parsing and compiling the original SQL statement.
But in Oracle you get most of that benefit just by reusing the same SQL statements and accounting for variations with bind variables.
Oracle recognizes a statement that was run before and pulls a pre-compiled version out of its cache (the shared_pool).
In both cases you're talking 10ms on an unloaded system. But at high levels of concurrency, 100s of statements per second, the cached way is much faster. Under high load, cached statements avoid CPU overhead, which is scarce, but more importantly they do not have to wait for a semaphore (shared_pool_latch) to get into the critical region of the Oracle kernel to do the compile and update the cache.
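A toy Python model makes the bind variable point concrete. The cache is keyed on the exact statement text, so a literal value baked into the SQL makes every execution look like a brand new statement. This is a sketch, not Oracle's shared pool internals. The `orders` table and `:id` bind are hypothetical, and "hard parse" here just stands for the expensive path that has to get into the critical region and update the cache.

```python
class ToyStatementCache:
    """Toy statement cache keyed on exact SQL text (not Oracle internals)."""

    def __init__(self):
        self.plans = {}
        self.hard_parses = 0   # parse + optimize + insert into the cache: the expensive path
        self.soft_parses = 0   # found a cached plan and reused it

    def execute(self, sql, binds=()):
        plan = self.plans.get(sql)
        if plan is None:
            self.hard_parses += 1
            plan = self.plans[sql] = ("compiled", sql)
        else:
            self.soft_parses += 1
        return plan, binds


literals = ToyStatementCache()
binds = ToyStatementCache()
for order_id in range(1000):
    # Literal baked into the text: every statement looks new.
    literals.execute(f"select * from orders where id = {order_id}")
    # Bind variable: one statement text, different values.
    binds.execute("select * from orders where id = :id", (order_id,))

print(literals.hard_parses, literals.soft_parses)   # 1000 0
print(binds.hard_parses, binds.soft_parses)         # 1 999
```

Under real concurrency, those 1000 hard parses are 1000 trips through the latch-protected section instead of one.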
How to tune Java on the server side:
- make sure you've tuned Apache itself
- read the tuning docs for your servlet engine, and try different engines too
- make sure you are using the JIT
- make sure you have enough memory (-mx and -ms options)
- PROFILE YOUR CODE using any number of freeware or commercial profilers (not sure why you want destroy() to go fast, when a page hit is going to one of the get methods)
- make it multithreaded. They're serialized by default, I believe.
- manage your cookies
- use the newest JVM
When you've done that, look at the system holistically. Fire a few hundred hits at it and watch what happens in terms of CPU and memory. Are you sure Apache is tuned? Are you sure there is enough memory? Are you sure you've allowed the JVM to use enough memory?
To be versatile enough, a little expansion capability is called for. First, remove the BIOS restriction on memory size. A PCMCIA slot would make the device much more amenable to various applications, by allowing one hacker to pop in 100mbit Ethernet for 50 bucks and some other hacker to put in a SCSI controller. True, there is USB, but the performance isn't there for high bandwidth peripherals. If you're going to hack it to be a network computer, you're gonna be willing to shell out 50 bucks after buying it to get *fast* Ethernet. Me, I'd put in a WaveLAN card. When I pull into my driveway, the I-Opener, mounted on my dashboard, would connect to my home LAN and sync up my music and NPR MP3 files. No wires!
The idea is to have no single point of failure, and the raid unit boils down disk redundancy to probably its simplest most foolproof form. So 2 or more components would have to fail in the raid unit in order to lose any information. So you're putting all your eggs into the most redundant basket possible, so to speak.
Now, if the RAID itself fails (again, a long shot with a good RAID unit), you are forced to rely on traditional recovery methods. At every site I've managed, I've had redundant disk, hot or cold backups (equivalent to a dd or dump backup of a filesystem), exports (equivalent to a tar backup of a filesystem), and some level of application-level logging, like saving flat files of credit card transactions.
When a hardware or human f**k up occurs, it's all about having options, and knowing how the options work and what tradeoffs they have. The truth is, hardware failures are rarer and easier to fix than human screwups. More downtime occurs due to human screwups.
In the case of a total failure of the RAID unit, you have to do a traditional recovery by restoring from the most recent tape backup and rolling the database forward. By setting up the system properly before the crash, you can guarantee that you'll get the system to within X minutes or Y transactions prior to the crash. You tune these numbers to try to meet your needs without sacrificing too much performance. Then make sure you have application people who understand the ramifications of losing, say, 5 minutes of committed credit card transactions. After the recovery, they start the painstaking process of sorting out how to manually cancel or otherwise patch up the customer transactions that occurred in the seconds or minutes prior to the absolute failure of the RAID unit.
I am an Oracle DBA working in Silicon Valley, and I've had a lot of experience with various flavors of failover. You can have all sorts of degrees of automation and rapid failover, but most of the really high end solutions are very complex, and the added complexity buys almost nothing in terms of uptime. The "sweet spot", IMHO, for Oracle is a "standby" database, or simply a RAID box that is physically attached to two Unix boxes, where only one mounts it at a given time. Not a clustered or Parallel Server implementation. There are some places that will pay the complexity and $$ price to have Parallel Server, but it really only gets your downtime down from, say, 30 minutes to under 5 minutes, *IF* it works right, automatically. Even with Parallel Server, where both systems are up at the same time, you'll still see half of your connections aborted when one side of the cluster goes down, and you'll have to redirect that half to the surviving node. All the while you're dealing with the complexity of these things *every day*.
Here's a good way to do it in Oracle that'll probably generalize to almost any RDBMS:
The simplest way is to have two Unix boxes, each physically attached to a RAID unit (don't use RAID 5; use mirroring for performance). Don't use clustering software; just mount the filesystems on only one machine at a time. If the primary system fails, mount the filesystems on the secondary, start up the database, and you're on your way with minimum downtime. Script it and tie it to some ping or other failure detection strategy if you feel lucky, but in my opinion, keep a human in the loop to actually execute the switchover.
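If you do script the detection part, one way to keep a human in the loop is to have the script page someone instead of switching over. The Python sketch below makes two assumptions of its own: it declares the primary down only after several consecutive failed checks, which cuts down on accidental triggers, and it pages once per outage. The class, the threshold and the `page` callback are all hypothetical. The actual health check (ping or otherwise) is left out.

```python
class FailureDetector:
    """Declare the primary down only after several consecutive failed checks,
    then page a human rather than switching over automatically."""

    def __init__(self, threshold, page):
        self.threshold = threshold
        self.page = page        # hypothetical callback that notifies the on-call DBA
        self.misses = 0
        self.paged = False

    def record(self, check_ok):
        if check_ok:
            # Any success resets the count, so one dropped ping never pages anyone.
            self.misses = 0
            self.paged = False
            return
        self.misses += 1
        if self.misses >= self.threshold and not self.paged:
            self.page(f"primary failed {self.misses} consecutive checks; "
                      "verify, then mount the filesystems on the secondary")
            self.paged = True


alerts = []
detector = FailureDetector(threshold=3, page=alerts.append)
for ok in [True, False, False, True, False, False, False, False]:
    detector.record(ok)
print(len(alerts))   # 1
```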
Another great solution is a so-called "standby database". You make a copy of the primary database on your standby machine using your backup tapes. Then you "roll forward" the standby database by applying all offline redo logs of the primary database as they are generated. This method should work for any database that logs transactions, including Sybase, Informix and probably MySQL. The equivalent of a redo log in MySQL appears to be the "update log", but I have no experience with MySQL. One big drawback of this solution, though, is that even though a transaction has been committed to disk on the primary, the event may not make it to the offline redo logs before the system crashes. So a standby database can only be within some delta-T of the primary database, and the last few transactions before the crash are lost. Sometimes this is acceptable, but if you billed the customer and then lost the order, maybe not. You tune the redo log buffering parameters to trade off performance against a small "lag time" in getting redo logs across to the standby database.
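The roll-forward loop and the delta-T tradeoff can be sketched in a few lines of Python. This is not Oracle's recovery code, and the log file names and rates are made up. It shows two things: archived logs have to be applied strictly in sequence, so a missing log stops recovery until it arrives, and the worst-case loss is roughly whatever committed in the current, not-yet-archived log.

```python
def roll_forward(last_applied, archived_logs, apply_log):
    """Apply archived redo logs to a standby in strict sequence order.

    archived_logs: dict of log sequence number -> file name already copied over.
    Stops at the first gap, because redo can't be applied out of order.
    Returns the last sequence number applied.
    """
    seq = last_applied + 1
    while seq in archived_logs:
        apply_log(archived_logs[seq])
        seq += 1
    return seq - 1


def worst_case_lost_commits(log_switch_seconds, commits_per_second):
    """Commits still sitting in the current online log when the primary dies."""
    return log_switch_seconds * commits_per_second


applied = []
shipped = {101: "arch_101.arc", 102: "arch_102.arc", 104: "arch_104.arc"}  # 103 hasn't arrived yet
print(roll_forward(100, shipped, applied.append))   # 102
print(worst_case_lost_commits(300, 50))             # 15000
```

Switching logs more often shrinks the loss window, at the cost of more archiving overhead on the primary. That's the performance vs. lag-time knob described above.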
The truth is that it takes 10x the effort to get failover and cluster software correct, and 9 times out of 10, the automatic failover either triggers accidentally (bad news) or triggers correctly but fails to come up on the secondary. This is because you have to be incredibly scrupulous about keeping non-shared resources in sync on the two machines, and you can't (politically, that is) test the failover until it actually fails. Heck, I just took a call this past Saturday for this very problem: the site paid $$ for a bigger consulting firm to implement a cluster, the firm can't be reached on Saturday when the cluster trips but does not fail over successfully, and the customer calls me. I am too gracious to say I told you so.
My free advice: make two machines, not a cluster. Keep the two machines in sync using the "standby" technique, or make the RAID unit accessible by both machines. Keep a human in the failover loop.
Most applications do not need Oracle.
With that kind of bucks, why invent yourself? Just buy yourself.
The original question was regarding real world performance of iSCSI in particular, and since frew of the posts seem to touch on that I may as well tell what I've learned from hard experience regarding the other technologies: SCSI and FCAL.
My experience is with very high transaction volume OLTP databases (oracle) backing a financial website. I've found that neither SCSI nor FCAL adapters limit performance significantly. This was with qlogic qla3200 adapters, or with highend adaptec Ultra320, on Solaris 9 and the last few versions if RedHat enterprise. Only the older versions of redhat had some kind of problem with the qlogic driver, plus bounce buffer IO, which drove down performance. But then to be nit picky, that was the driver not the HBA. Solaris was always fine, and now redhat is too.
The main performance challenge was *always* tuning the database and spreading out on lots of spindles. The HBAs at over 200M/sec each never posed a problem on larger sun boxes (8 or more procs) with 7 or 8 way parallel sequential reads going. On smaller hosts or smaller disk arrays,k the problem was always on the host itself or the disk seek times respectively, not the hbas themselves.
A 10k rpm drive will do about 70 mbytes/sec off the outer platter (near block 0) and as a rule of thumb, a 2Gbit fcal adapter will do 200mbytes a second (at least on solaris or newer redhat EL). So my dual qlogics would do 400Mbytes a second under absolute optimal disk access, but typically it's not that perfect 8 way parallel *serial* scan off the outer platter, its usually farily random.
So in the high end database applications (datawareyouse or OLTP) least the usual tuning challenge (and $$ for that matter) are with getting a fat spread across a lot of spindles, and making sure the application is either caching well (OLTP) or doing orderly, sequential scans (datawarehouse)
I am the lead dba for a company that processes 15-20 million us dollars worth of transactions per day. My backend database is solaris/oracle, it does 3000-4000 sql statements per second, and my company would loose maybe $1000 in revenue for each minute it is down. The larger
two tables in this databasehave in excess of 300 million rows, and are acessed by 100k customers per day. We have over 11 million customers.
It's running on a E4500, which is saving us a lot of money *not* buying E10000s. I like to think it's tuned well, but a big part of the reason it works (fast) is also that it is on an EMC with over 90 disk drives in it. I'ts all about IO bandwidth and servicability in my world, and on those points you are correct in saying sun is a handsdown winner over linux.
.
Now, I work with a sysadmin who is a whiz at making lots of linux boxes work reliably as a web frontend, and is also good at keeping our backend solaris based database up 24/7. neither of us is anxious to put the backend on linux, but we did put up a significantly large, high performance, but *relatively * low availability database up on linux.
It's a 6x800MHz Intel box with 4G RAM and 16 disks on Mylex caching raid 5 controllers. Raid 5 sucks in general, but the point of this system was to get a lot of bang for the buck, so as a big-league DBA, I took the challenge of making data loads fast in spite of raid 5, in order to get a crack at de-installing Windows from this box. If I spent some bucks on more disks, we could get a much faster system, but that was never the point of this system.
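Why raid 5 hurts for writes: with the textbook small-write penalty, each random write on raid 5 costs about four disk I/Os (read old data, read old parity, write new data, write new parity), versus two on a mirror. A minimal sketch, assuming 16 disks at a hypothetical 100 random IOPS each and ignoring the Mylex write-back cache, which can hide some of the penalty until it fills up:

```python
# Textbook disk I/Os per small random write (ignores controller caching).
WRITE_PENALTY = {"raid5": 4, "mirror": 2}


def random_write_iops(n_disks, per_disk_iops, layout):
    """Host-visible random write IOPS once each write fans out into several disk I/Os."""
    return n_disks * per_disk_iops / WRITE_PENALTY[layout]


for layout in ("raid5", "mirror"):
    print(layout, random_write_iops(16, 100, layout))
# raid5 400.0
# mirror 800.0
```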
The system is about 200G worth of partitioned tables (copies of the same 300M-row tables mentioned above) with partitioned rollup tables off to the sides, for business analysis. The real trick is the partitioning. Because of the partitioning, this system can do many types of analyses that can't be done on our other analysis system, which happens to be Solaris with 60 disk drives.
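The payoff from partitioning for analysis comes from partition pruning: a query that filters on the partition key only has to touch the partitions whose ranges overlap the filter. Here's a toy sketch of that idea with hypothetical monthly range partitions; it is not how Oracle implements pruning internally, just an illustration of the overlap test behind it.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Partition:
    name: str
    low: date   # inclusive lower bound of the partition key
    high: date  # exclusive upper bound
    rows: list = field(default_factory=list)  # (txn_date, amount) tuples


def prune(partitions, start, end):
    """Keep only partitions whose [low, high) range overlaps the query range [start, end)."""
    return [p for p in partitions if p.low < end and start < p.high]


def query(partitions, start, end):
    """Scan only the surviving partitions; return matching rows and the partitions touched."""
    scanned = prune(partitions, start, end)
    hits = [r for p in scanned for r in p.rows if start <= r[0] < end]
    return hits, [p.name for p in scanned]


parts = [
    Partition("p_2001_01", date(2001, 1, 1), date(2001, 2, 1), [(date(2001, 1, 20), 1.0)]),
    Partition("p_2001_02", date(2001, 2, 1), date(2001, 3, 1), [(date(2001, 2, 14), 42.0)]),
    Partition("p_2001_03", date(2001, 3, 1), date(2001, 4, 1), [(date(2001, 3, 2), 7.5)]),
    Partition("p_2001_04", date(2001, 4, 1), date(2001, 5, 1), [(date(2001, 4, 9), 3.0)]),
]
hits, touched = query(parts, date(2001, 2, 10), date(2001, 3, 5))
print(touched)  # ['p_2001_02', 'p_2001_03']: January and April are never read
print(hits)
```

On a 200G table, skipping most partitions is the difference between a query that finishes and one that doesn't.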
The Linux box was a leftover from a failed Windows project, so in some sense it was free, but I believe it would have cost about $80k new. The gigabit Ethernet and controller were about $10-15k of that.
It's working well for DSS, since the 2 times it's crashed in the last few months didn't really hurt anything.
I'm rambling on now, but I'll talk to the DBAs out there, who speak my language.
If you're gonna run Oracle on Linux:
- ReiserFS sucked performance-wise on top of raid 5. I don't know if I did something wrong, but I abandoned it in favor of ext2. I don't care if fsck takes a long time on this system, and ext2 creamed it for database IO performance on raid5. I also couldn't get performance out of Reiser on simple stripes without the added hurt of raid5, so go figure. fsck times are irrelevant if you use raw partitions, so that is the way to go in most cases.
- Max out the memory (of course) on an Intel box. I think the most you can do is 4G on Intel platforms. This is sufficient for me, but I kept the SGA down to about 500M, so I could run 10-way parallel processes with roughly 200M of sort area size each.
- Watch out for linux caching. I've turned it off for my filesystems. It's easy to get into "writeback debt" by pushing a lot of dirty blocks out of oracle cache into ext2fs cache. Add raid5 suckiness at random writeback, and you've got serious constipation problems on your hands.
- I've used some raw partitions for this system; they seem to be worth it to avoid ext2fs caching hassles, but I haven't migrated completely yet. The "raw" command must be used to "bind" a raw device name to a disk partition before Oracle can use it as a raw partition, so it makes for a few extra hassles, but no big deal.
- I got a Mylex caching controller, which apparently has hot-swapping capability in the hardware, mitigating the absence of Veritas Volume Manager and hot-plug capabilities at the Linux level. It also makes raid5 tolerable. I haven't proven hot swapping by testing yet, though.
- ext2 has some raid5-aware options, and based on cursory throughput tests they helped on the raid5 Mylex volumes I have, but I'm not sure I'm getting the block alignment right at the Oracle level. (After all the Oracle/ext2/controller layers, I don't know if Oracle's 16k blocks are aligned with the stripes on the Mylex. Sigh.)
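One way to reason about the alignment question: count how many database blocks straddle a stripe-unit boundary, since a straddling block turns one I/O into two. This sketch assumes data is laid out contiguously from the partition start (as on a raw partition; ext2 adds its own block placement on top), a hypothetical 64k stripe unit, and the old DOS-style partition start at sector 63.

```python
def blocks_crossing_stripes(partition_offset, block_size, stripe_unit, n_blocks):
    """Count database blocks that straddle a stripe-unit boundary (each costs two disk I/Os)."""
    crossing = 0
    for i in range(n_blocks):
        start = partition_offset + i * block_size
        end = start + block_size - 1  # last byte of this block
        if start // stripe_unit != end // stripe_unit:
            crossing += 1
    return crossing


SECTOR = 512
BLOCK = 16 * 1024   # Oracle block size from the post
STRIPE = 64 * 1024  # hypothetical stripe unit on the controller

# Partition starting at sector 63 vs. one starting on a 1 MiB boundary:
print(blocks_crossing_stripes(63 * SECTOR, BLOCK, STRIPE, 64))    # 16 (1 block in 4)
print(blocks_crossing_stripes(2048 * SECTOR, BLOCK, STRIPE, 64))  # 0
```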
FWIW, back in the dot-com heyday, I also had clients running modest (to them) high-availability databases on Oracle/Linux. Even then, on relatively small (in gigabytes) databases, the biggest tuning hassle was Linux writeback caching getting in the way of Oracle, and the biggest scalability hassle was managing many, many disks. Raw partitions get around the former; intelligent controllers (Mylex etc.) or intelligent disk arrays (Clariion, Sun T3 etc.) get around the latter.
- avoid raid 5, go with plain old stripe/mirror. raid 5 is horribly slow for writes, and in DSS, you do a lot of disk writes as part of queries, because the queries build temp tables/segments transparently to do the large sorts/merges involved.
- get more than 1G RAM if you can. Oracle will make good use of it by increasing in-memory sort area sizes and by caching database blocks more intelligently than the filesystem does (basically, it gives preference to index blocks). Hopefully Sybase does too.
- the mirror system at a remote site can be built using redo logs/transaction logs. Restore the database from backups to a remote location, rdist, rcp or scp the transaction logs from the primary database to the mirror, and roll the database forward with each successive log. This is called a "standby database" in Oracle parlance.
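The roll-forward loop for a standby database boils down to applying archived logs strictly in sequence and stopping at the first gap. A minimal sketch; `ship` and `apply` are hypothetical callbacks standing in for whatever copy command (rcp, scp, rdist) and recovery step you actually use.

```python
def logs_to_apply(archived_seqs, last_applied):
    """Archived redo log sequence numbers the standby can apply next, in order.

    Logs must be applied strictly in sequence, so stop at the first gap."""
    pending = []
    expected = last_applied + 1
    for seq in sorted(s for s in archived_seqs if s > last_applied):
        if seq != expected:
            break  # a log is missing: roll-forward can't go past it until it's found
        pending.append(seq)
        expected += 1
    return pending


def roll_forward(archived_seqs, last_applied, ship, apply):
    """Ship and apply each pending log; return the new last-applied sequence number."""
    for seq in logs_to_apply(archived_seqs, last_applied):
        ship(seq)   # hypothetical: copy the archived log to the standby host
        apply(seq)  # hypothetical: recover the standby through this log
        last_applied = seq
    return last_applied


shipped, applied = [], []
print(roll_forward({101, 102, 103, 105}, 100, shipped.append, applied.append))  # 103
print(applied)  # [101, 102, 103]: 105 waits until 104 turns up
```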
It kind of cheapens the hardships of actual poverty-level existence to say that because your neighbors average 100k, you're at the poverty level. You know darn well you can get in your car and commute out away from wherever it is you're working . . . Connecticut? An impoverished person would be unlikely to get the training, the degree and the experience to get your (probably) high-tech job, and furthermore would not likely have a car to drive out to the suburbs, even if it does take an hour and a half one way.
Bite the bullet. Buy a house.
Or, really, I should say doing it with a lot more experience. If you get up early enough and look around as you drive up 101, you'll get a glimpse of wizened contractors on their way to $150-200 per hour gigs. But the deal is, you have to be good at programming, you have to be determined enough to keep on top of your skills, and you have to go out and get top dollar, not wait for somebody to find you.
The upshot is, you get a small but complete PC motherboard. You generally have to add a riser card to get PCI expansion, but that's okay; it's still smaller than an O'Reilly book. You'll need to find a vendor that will give you a chassis with space for PCI or ISA risers.
Also some of these vendors sell "point of sale" form factors, which were the basis of the I-opener and WebSurfer "little pc" boxes that caused so much hoopla recently.
But yeah, I'd *love* to see the Oracle code right now that handles buffer pinning and DBWR copies to disk. I'm having a hell of a time optimizing checkpoints for a certain high-update financial website.
Perhaps this law was extended to non-violent criminals like Mitnick. If so, it seems sensible to me. Fine, talk all you want (free speech); you just can't get rich off a previous crime (regulation of commerce).
Since you'd tend to spend less time in jail for hacking than for murder, it might actually be an appealing way to make a million for some script kiddies out there. They'd just have to fake orgasms till parole, then blow some of the book royalties on preparation.h ^?^?^?^?^?^?^?^? Preparation-H.
As for the rest of the pro-Java rhetoric, I agree. I think Java is a great way to go, but you often need scalable hardware.
Like any platform, you have to learn java and how it plugs into an environment before you can say it sucks performance wise.
But in Oracle you get most of that benefit just by reusing the same SQL statements and accounting for variations with bind variables.
Oracle recognizes a statement that was run before and pulls a pre-compiled version out of its cache (the shared_pool).
In both cases you're talking 10ms on an unloaded system. But at high levels of concurrency, hundreds of statements per second, the cached way is much faster. Under high load, cached statements avoid CPU overhead, which is scarce, but more importantly they do not have to wait for a semaphore (shared_pool_latch) to get into the critical region of the Oracle kernel to do the compile and update the cache.
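A toy model of why bind variables matter: the cache is keyed on the exact statement text, so statements that embed literals each miss the cache and force a fresh compile (a "hard parse"), while a single bind-variable statement is compiled once and reused. This sketch is hypothetical (the table and column names are made up) and ignores latching entirely; it only counts cache hits and misses.

```python
class StatementCache:
    """Toy model of a shared statement cache keyed on the exact SQL text."""

    def __init__(self):
        self.plans = {}
        self.hard_parses = 0  # cache misses: statement compiled from scratch
        self.soft_parses = 0  # cache hits: compiled version reused

    def parse(self, sql):
        if sql in self.plans:
            self.soft_parses += 1
        else:
            self.hard_parses += 1
            self.plans[sql] = f"plan for: {sql}"  # stand-in for the real compile step
        return self.plans[sql]


literals = StatementCache()
for cust_id in range(1000):
    # Each distinct literal produces distinct text, so every call misses.
    literals.parse(f"SELECT balance FROM accounts WHERE cust_id = {cust_id}")

binds = StatementCache()
for _ in range(1000):
    # Same text every time; the value would be supplied separately at execute time.
    binds.parse("SELECT balance FROM accounts WHERE cust_id = :cust_id")

print(literals.hard_parses, literals.soft_parses)  # 1000 0
print(binds.hard_parses, binds.soft_parses)        # 1 999
```

In the real thing, every one of those 1000 hard parses would also have to grab the latch, which is where the contention under load comes from.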
- Read the tuning docs for your servlet engine, and try different engines too.
- Make sure you are using the JIT.
- Make sure you have enough memory (-mx and -ms options).
- PROFILE YOUR CODE using any number of freeware or commercial profilers (not sure why you want destroy to go fast, when a page hit is going to one of the get methods)
- make it multithreaded; they're serialized by default, I believe.
- manage your cookies.
- use the newest JVM.
When you've done that, look at the system holistically: fire a few hundred hits at it and watch what happens in terms of CPU and memory. Are you sure Apache is tuned? Are you sure there is enough memory? Are you sure you've allowed the JVM to use enough memory?
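For the "fire a few hundred hits at it" step, a minimal sketch using the standard library's `urllib.request` and a thread pool. The URL is whatever page you want to test (hypothetical here); run `vmstat` or `top` on the server at the same time to watch CPU and memory. HTTP error statuses are counted as failed hits, but connection failures still raise and stop the run.

```python
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def timed_get(url):
    """Fetch one page; return (HTTP status, seconds taken)."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            resp.read()
            status = resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx still count as a (failed) hit
        status = err.code
    return status, time.perf_counter() - start


def fire_hits(url, n_hits=300, concurrency=20, fetch=timed_get):
    """Send n_hits requests, `concurrency` at a time, and summarize the latencies."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(fetch, [url] * n_hits))
    latencies = sorted(seconds for _, seconds in results)
    return {
        "hits": len(results),
        "ok": sum(1 for status, _ in results if status == 200),
        "median_s": latencies[len(latencies) // 2],
        "p95_s": latencies[min(len(latencies) - 1, int(len(latencies) * 0.95))],
    }


# Usage (hypothetical URL): print(fire_hits("http://localhost:8080/servlet/MyPage"))
```

Ramping `concurrency` up in steps and watching where the median and p95 start to climb tells you more than any single run.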
To be versatile enough, a little expansion capability is called for. First, remove the BIOS restriction on memory size. A PCMCIA slot would make the device much more amenable to various applications, by allowing one hacker to pop in 100mbit Ethernet for 50 bucks, and some other hacker to put in a SCSI controller. True, there is USB, but the performance isn't there for high-bandwidth peripherals. If you're going to hack it into a network computer, you're gonna be willing to shell out 50 bucks after buying it to get *fast* Ethernet. Me, I'd put in a WaveLAN card. When I pull into my driveway, the I-Opener, mounted on my dashboard, would connect to my home LAN and sync up my music and NPR MP3 files. No wires!
Now, if the raid itself fails (again, a long shot with a good raid unit), you are forced to rely on traditional recovery methods. At every site I've managed, I've had redundant disks, hot or cold backups (equivalent to a dd or dump backup of a filesystem), exports (equivalent to a tar backup of a filesystem), and some level of application-level logging, like saving flat files of credit card transactions.
When a hardware or human f**k-up occurs, it's all about having options, knowing how the options work, and knowing what tradeoffs they have. The truth is, hardware failures are rarer and easier to fix than human screwups, and more downtime occurs due to human screwups.
In the case of a total failure of the raid unit, you have to do a traditional recovery, by restoring from the most recent tape backup and rolling the database forward. By setting up the system properly before the crash, you can guarantee that you'll get the system to within X minutes or Y transactions prior to the crash. You tune these numbers to try to meet your needs without sacrificing too much performance. Then make sure you have application people who understand the ramifications of losing, say, 5 minutes of committed credit card transactions. After the recovery, they start the painstaking process of sorting out how to manually cancel or otherwise patch up the customer transactions that occurred in the seconds or minutes prior to the absolute failure of the raid unit.
Here's a good way to do it in Oracle that'll probably generalize to almost any RDBMS:
The simplest way is to have two Unix boxes, both physically attached to the same raid unit (don't use raid 5; use mirroring for performance). Don't use clustering software; just mount the filesystems on only one machine at a time. If the primary system fails, mount the filesystems on the secondary, start up the database, and you're on your way with minimum downtime. Script it and tie it to some ping or other failure-detection strategy if you feel lucky, but in my opinion, keep a human in the loop to actually execute the switchover.
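If you do script the detection, one way to keep a human in the loop is to have the script only page someone after several consecutive failed checks, and never perform the switchover itself. A minimal sketch; `probe` (e.g. a ping or a test login to the database) and `alert` (e.g. a pager email) are hypothetical callbacks.

```python
import time


def watch_primary(probe, alert, failures_needed=3, interval_s=10.0, sleep=time.sleep):
    """Poll the primary and page a human after `failures_needed` consecutive failed probes.

    Deliberately does NOT mount filesystems or start the database on the secondary:
    a person looks at the situation and runs the switchover script by hand."""
    consecutive = 0
    while True:
        if probe():
            consecutive = 0  # one good check resets the count, so a blip doesn't page anyone
        else:
            consecutive += 1
            if consecutive >= failures_needed:
                alert(f"primary failed {consecutive} consecutive checks; "
                      "switch over by hand if you agree it's really down")
                return consecutive
        sleep(interval_s)
```

Requiring several failures in a row is what keeps a single dropped ping from waking anybody up; the switchover decision itself stays with the person who gets paged.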
Another great solution is a so-called "standby database". You make a copy of the primary database on your standby machine, using your backup tapes. Then you "roll forward" the standby database by applying all offline redo logs of the primary database as they are generated. This method should work for any database that logs transactions, including Sybase, Informix and probably MySQL. The equivalent of a redo log in MySQL appears to be the "update log", but I have no experience with MySQL. One big drawback of this solution, though, is that even though a transaction has been committed to disk on the primary, it may not make it into the offline redo logs before the system crashes. So a standby database can only be within some delta-T of the primary database; the last few transactions before the crash are lost. Sometimes this is acceptable, but if you billed the customer and then lost the order, maybe not. You tune the redo log buffering parameters to trade off performance against a small "lag time" in getting redo logs across to the standby database.
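The delta-T trade-off is easy to put rough numbers on: worst case, you lose everything committed since the last log that made it across, which is roughly one log-switch interval plus the copy time. A back-of-the-envelope sketch with made-up figures (50 transactions/sec, a log switch every 5 minutes, 30 seconds to copy a log):

```python
def worst_case_loss(tx_per_s, log_switch_interval_s, ship_time_s):
    """Rough worst-case window (seconds) and committed transactions a standby could be missing."""
    lag_s = log_switch_interval_s + ship_time_s
    return lag_s, tx_per_s * lag_s


print(worst_case_loss(50, 300, 30))  # (330, 16500)
# Switching logs twice as often shrinks the exposure, at the cost of more log switches:
print(worst_case_loss(50, 150, 30))  # (180, 9000)
```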
The truth is that it takes 10x the effort to get failover and cluster software correct, and 9 times out of 10, the automatic failover either triggers accidentally (bad news) or triggers correctly but fails to come up on the secondary. This is because you have to be incredibly scrupulous about keeping non-shared resources in sync on the two machines, and you can't (politically, that is) test the failover until it actually fails. Heck, I just took a call this past Saturday about this very problem: a site paid $$ for a bigger consulting firm to implement a cluster, and the firm can't be reached on Saturday when the cluster trips but does not fail over successfully. Customer calls me. I am too gracious to say I told you so.
My free advice: make two machines, not a cluster. Keep the two machines in sync using the "standby technique", or make the raid unit accessible by both machines. Keep a human in the failover loop.