What Sustained Disk Transfer Rates Do You Get?
Mr. Jackson asks: "What kind of disk transfer rates (MB/s) do people get in the real world when moving around large (100s of MB) files? Either every machine in our building is misconfigured, or our notions about what we were getting are way off. I've tested half a dozen machines, mostly Win2k, some Linux, by just copying a large file and timing it with a watch. 8 MB/s seems to be about average for inter-disk copies. RAID 0 (striped) got as high as 12 MB/s after fiddling with cache settings. RAID 5 was as low as 2 MB/s. We all thought the numbers should have been around 30 MB/s."
We were shocked at some network file transfer speeds here. If you are doing network copies and find them shockingly slow, make sure that your switch and your NIC agree on whether the connection is half- or full-duplex.
Makes a HUGE difference.
What you're describing sounds about right, actually.
Be sure you're keeping Mega-Bytes and Mega-Bits straight!
Sequential reads on my drives range from 25-65MB/sec (Maxtor and Cheetah). If a file is heavily fragmented, I've seen that drop as low as 5-10MB/sec. It's not so bad on the 15K RPM Cheetah because its fast access time helps, but on the lowly 5400 RPM Maxtor, fragmentation destroys read/write speeds.
"I don't know that atheists should be considered citizens, nor should they be considered patriots." George HW Bush
A gentleman does not discuss his sustained disk transfer rates in public. I assure you, however, that they are adequate for my purposes.
Karma: Good (despite my invention of the Karma: sig)
I've benchmarked our various disk subsystems heavily.
Once you exhaust the on-disk cache and the filesystem cache, the raw disk access speeds become visible. Here's what I've found for Seagate ATA-IV 80GB drives:
Sustained Sequential read: 40MB/sec
Sustained Sequential write: 17MB/sec
That was benchmarked on a 2GHz dual Xeon system under Linux with nothing else running and IDE tuned optimally, so real-life results are going to be worse.
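To reproduce the "exhaust the caches" idea without a stopwatch, here is a minimal Python sketch (the function name and default sizes are illustrative assumptions, not the tool used for the numbers above). It writes a file sequentially and calls fsync before stopping the clock, so the operating system's cache can't hide the disk; the drive's own cache can still absorb a little, which is why the file should be much larger than that cache.

```python
import os
import time


def sequential_write_mb_s(path, total_mib=64, block_kib=1024):
    """Write total_mib MiB to path in block_kib KiB blocks; return MB/s (10^6 bytes/s).

    os.fsync() asks the OS to push its page cache out to the device before the
    clock stops, so this is not just a RAM benchmark. The drive's own write
    cache may still absorb a few MB, so choose total_mib well above that (and,
    for a truly sustained figure, above system RAM).
    """
    block = os.urandom(block_kib * 1024)          # incompressible test data
    n_blocks = (total_mib * 1024) // block_kib
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(n_blocks):
            f.write(block)
        f.flush()                                 # Python's buffer -> OS
        os.fsync(f.fileno())                      # OS cache -> device
    elapsed = time.perf_counter() - start
    return n_blocks * block_kib * 1024 / elapsed / 1e6
```

Calling something like sequential_write_mb_s("/mnt/test/bench.tmp", total_mib=2048) on the disk under test (the path is hypothetical) and then deleting the file gives a number comparable to the sustained write figure above.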
It would not surprise me to see most consumer level systems with sustained speeds on a single disk under 10MB/sec. Most systems that use IDE drives don't have the DMA/ATA mode settings tuned aggressively.
Most systems with RAID have crappy implementations. Get a hardware RAID controller with its own processor, a high-bandwidth back-end bus (i.e., SCSI-160 or higher), and lots of onboard battery-backed memory so you can safely turn on write caching.
Massive databases or large files: they are the same speed.
On my fairly new Dell Latitude C800 (30G OEM IDE drive, PIII/1GHz laptop), I have seen sequential database reads with a little data crunching run at around 16 megabytes per second.
Change that to read/write access (roughly 50/50) and it drops to 1.5MB/s read, 1.5MB/s write (total, 3MB/s).
On my desktop, with two IBM 9.1G U160 SCSI drives in a RAID 0 array on an American Megatrends MegaRAID card (428) with 32M of RAM for read/write cache, sequential read access only peaks around 10MB/s, and in read/write access it is something like 3MB/s read and 3MB/s write, for 6MB/s combined.
The SCSI drives were rated U160, but my card was only a 20 (68-pin U/W; hell, I forget what the 428 is rated for, but I think 20), so even in a RAID 0 array it wasn't going to go any faster than 10MB/s peak sustained read.
If the files were smaller than 16M, the write-back cache on the SCSI RAID array skewed the benchmarks big time: access times were almost as fast as a RAM drive. Goes REAL FAST.
On a regular IDE drive, I would be insanely happy with anything better than 20MB/s, unless you're doing some serious transaction-based computing.
If you have to get a stopwatch out to decide if one is faster than the other
Glonoinha the MebiByte Slayer
From your average onboard IDE controller without any special configuration, the numbers you're seeing look about right. The fastest you can really expect to get with any consistency is around 15MB/sec, and that's with tuned interfaces AND tuned I/O. With a high-quality IDE controller or a reasonable SCSI controller, and fast disks (10,000 RPM), you can get 50-75% better than that. The fastest I/O I've seen in Linux was with 2 gigabit Fibre Channel and an array of 15 striped 15,000 RPM disks. I managed about 120MB/sec, and that was only with certain block sizes. The average was still in the 60MB/sec range.
Bottom line: with a 7200 RPM IDE disk and a regular cp command using a block size of 4kB or so, you're probably not going to get better than 10MB/second. Disks are just too slow.
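The block-size effect is easy to test yourself. Here is a rough Python sketch (a hypothetical helper, not cp itself) that copies a file using a chosen I/O size and reports MB/s. Unbuffered file objects are used so that each read or write really is one call of at most that size.

```python
import os
import time


def timed_copy(src, dst, block_size=4 * 1024):
    """Copy src to dst using block_size-byte reads/writes; return MB/s (10^6 bytes/s).

    With buffering=0 each read/write is a single system call of at most
    block_size bytes, so block_size really is the I/O size. dst is fsync'ed
    so its writes are counted; src may still come from the OS cache if it
    was read recently.
    """
    copied = 0
    start = time.perf_counter()
    with open(src, "rb", buffering=0) as fin, open(dst, "wb", buffering=0) as fout:
        while True:
            chunk = fin.read(block_size)
            if not chunk:                     # b"" means end of file
                break
            view = memoryview(chunk)
            while view:                       # raw writes may be partial
                view = view[fout.write(view):]
            copied += len(chunk)
        os.fsync(fout.fileno())
    elapsed = time.perf_counter() - start
    return copied / elapsed / 1e6
```

To compare, time timed_copy(src, dst, 4 * 1024) against timed_copy(src, dst, 1024 * 1024) on a file of several hundred MB. Use a file bigger than RAM (or drop caches between runs), otherwise the second run reads src from memory and looks unfairly fast.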
My sustained data rate goes up to 11.
/sbin/hdparm -d1 -c3 /dev/hd[abcdefgh] is generally safe for most IDE chipsets and could very easily double or triple your transfer rate.
I get about 35MB/s copying between my IDE IBM 60G drives
If you're looking for nothing but speed, try solid state; I have seen file servers that use it. Yeah, I guess you could try to find the fastest hard drive on the planet, but if you really NEED higher transfer rates, nothing beats a drive from one of the manufacturers like Curtis or Imperial.
But most of these solutions are expensive. You could try to keep your costs down by just adding more memory to your systems and using a chunk of it as a RAM drive (a 1GB RAM drive on a modern system should be quite possible).
Once you've experienced what it's like running your OS off of a RAM Drive, you'll never want to run off of physical media again.
OK, so that's what's good about it; now what's bad? You should still keep backups on physical media if you use the RAM disk method, and you should also buy a UPS with power management features. And if you want a better solution, go with a much more expensive solid-state drive.
Are you kidding me? That's why we have ATA/133 coming out, because IDE drives are getting that fast. Oh wait, that must be 133Mbits/sec (*sarcasm*) (yes, people have told me this).
Try turning on DMA; you absolutely _need_ DMA turned on for modern drives. PIO Mode 4 maxes out at 16MB/sec with 100% CPU utilization, and PIO Mode 5 isn't official and will most likely break your hardware. After you turn on DMA, you can set your interface speed to 16/33/66/100/133 MBytes/sec.
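For reference, here are the nominal peak rates of the ATA transfer modes as a small Python table, plus a helper (hypothetical name) that lists the modes in which the interface, rather than the platters, would cap a drive with a given sustained media rate. These are burst ceilings from the ATA standards; real sustained rates are always lower.

```python
# Nominal peak interface rates of ATA transfer modes, in MB/s.
ATA_MODE_MB_S = {
    "PIO 0": 3.3, "PIO 1": 5.2, "PIO 2": 8.3, "PIO 3": 11.1, "PIO 4": 16.7,
    "MWDMA 0": 4.2, "MWDMA 1": 13.3, "MWDMA 2": 16.7,
    "UDMA 0": 16.7, "UDMA 1": 25.0, "UDMA 2": 33.3, "UDMA 3": 44.4,
    "UDMA 4": 66.7, "UDMA 5": 100.0, "UDMA 6": 133.0,
}


def bottleneck_modes(media_rate_mb_s):
    """Modes whose ceiling is below the drive's sustained media rate,
    i.e. modes in which the interface, not the disk, limits throughput."""
    return sorted(m for m, cap in ATA_MODE_MB_S.items() if cap < media_rate_mb_s)
```

For example, bottleneck_modes(40) includes "PIO 4" and "UDMA 2" (UDMA/33) but not "UDMA 4" (UDMA/66): a drive that sustains 40MB/s off the platters is held back by PIO or UDMA/33, but not by UDMA/66.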
I hate to repeat myself, but here are the specs for the ST318452LW, Cheetah X15:
Internal Transfer Rate (min) 548 Mbits/sec
Internal Transfer Rate (max) 706 Mbits/sec
Formatted Int Transfer Rate (min) 51.8 MBytes/sec
Formatted Int Transfer Rate (max) 68.1 MBytes/sec
External (I/O) Transfer Rate (max) 160 MBytes/sec
Avg Formatted Transfer Rate 61 MBytes/sec
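The spec sheet mixes units, which is exactly the megabit/megabyte trap: the internal rates are in megabits, the formatted ones in megabytes. Dividing by 8 shows how much of the raw bit stream survives formatting. This is a quick Python check using only the numbers above; attributing the gap to ECC, servo and inter-sector overhead is my assumption.

```python
def mbit_to_mbyte(mbit_per_s):
    """Megabits per second -> megabytes per second (8 bits per byte)."""
    return mbit_per_s / 8


# Figures from the ST318452LW spec sheet quoted above.
raw_min = mbit_to_mbyte(548)      # internal (raw) rate, slowest zone, MB/s
raw_max = mbit_to_mbyte(706)      # internal (raw) rate, fastest zone, MB/s
fmt_min, fmt_max = 51.8, 68.1     # formatted (user data) rate, MB/s

# Fraction of the raw bit stream that ends up as user data.
eff_min = fmt_min / raw_min
eff_max = fmt_max / raw_max
print(f"raw {raw_min:.1f}-{raw_max:.1f} MB/s, formatted {fmt_min}-{fmt_max} MB/s")
print(f"user-data fraction {eff_min:.0%}-{eff_max:.0%}")
```

Roughly three quarters of the raw rate is user data, and the 61 MBytes/sec average sits inside the formatted range.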
Most distros use very conservative settings for the IDE interfaces which will work with just about any old drives, but do not take advantage of more modern hardware. hdparm allows you to activate those advanced features.
There is a nice write-up about using hdparm here: http://www.oreillynet.com/pub/a/linux/2000/06/29/hdparm.html
Of course, all this only applies to Linux boxes.
Hey, who else could go for some flapjacks right now?
As you've found out it does matter which RAID scheme you use. RAID 0+1 will outperform RAID 5 substantially.
Think spindles. Because each disk has only one spindle, its heads can only be over one given track at any instant. If you want a head nearer to where your data is stored, you want to have more heads. With RAID 1, your read or write request can be handled by more than one disk spindle. That gives you the best performance.
To get more spindles, use as many disks as practical. I've had some long conversations with my co-workers arguing that, now that disks are really cheap, it doesn't matter that RAID 1 "wastes" half the disks. What does matter is that disk I/O is a bottleneck, and more disks will help ease that bottleneck.
Ever dream you could fly? Get up from the Flight Sim. I Fly
I used to have these lines in my rc.local:
hdparm -m16 -X66 -d1 -c3 -u1 /dev/hda 1>/dev/null 2>/dev/null
hdparm -m16 -X66 -d1 -c3 -u1 /dev/hdb 1>/dev/null 2>/dev/null
You can see what that means with `man hdparm`.
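Per the hdparm man page: -m16 sets multiple-sector I/O to 16 sectors, -d1 turns on DMA, -c3 enables 32-bit I/O with the sync sequence some chipsets need, and -u1 unmasks other interrupts while a disk interrupt is being serviced. The numeric -X value encodes the transfer mode: mode number plus 8 for PIO, plus 32 for multiword DMA, plus 64 for UltraDMA (16 plus the mode for the obsolete single-word DMA). A small Python decoder (hypothetical helper) makes -X66 concrete:

```python
def decode_hdparm_x(value):
    """Translate a numeric hdparm -X argument into the ATA transfer mode it selects.

    Encoding: PIO n = 8+n, single-word DMA n = 16+n,
    multiword DMA n = 32+n, UltraDMA n = 64+n (UDMA modes go up to 7).
    """
    if value == 0:
        return "default PIO"
    if value == 1:
        return "default PIO, IORDY disabled"
    if value > 71:
        raise ValueError(f"unrecognised -X value: {value}")
    # check the largest base first so 66 is read as UDMA, not MWDMA/PIO
    for base, name in ((64, "UDMA"), (32, "MWDMA"), (16, "SWDMA"), (8, "PIO")):
        if value >= base:
            return f"{name} mode {value - base}"
    raise ValueError(f"unrecognised -X value: {value}")
```

So decode_hdparm_x(66) returns "UDMA mode 2", i.e. UDMA/33, a nominal 33MB/s interface.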
That command would speed up my hard drive from a 3MB/s to a 20MB/s transfer rate.
The new KT333 chipset (ATA/133) I use doesn't need hdparm to set the device mode (make sure you use kernel 2.4.19), and transfer between two hard drives on different controllers is about 20MB/s. The lowest I got was 13MB/s.
Hope this helps.
On RAID technologies, speaking in general terms assuming vendors do a good job of implementing it, here's a summary:
RAID 0: Pure striping, maximum performance, no redundancy. Cost is the same as concatenating disks to get the space you need.
RAID 1: Pure mirroring, full redundancy. Reads can be as fast as a stripe of the same width as the number of mirrors (2-way stripe, 2-way mirror, same read speed, etc.) if the implementation does round-robin reading. Writes happen in parallel and can be slower, unless you've got the headroom and the disk spindle is the only write bottleneck. Cost is double a simple concat or stripe.
RAID 2-4: Sometimes used for very special purposes, but generally ignored because one of the other RAID levels does the same thing better. I've seen RAID-3 recently; there are occasionally valid uses for about 0.01% of people out there.
RAID 5: You get some data redundancy to survive a single disk failure, but you don't pay the double disk cost of full mirroring. It's an N+1 type of configuration. Speed is generally the slowest compared to everything else.
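Those trade-offs can be put in numbers. Here is a textbook sketch in Python (hypothetical helper; it assumes 2-way mirrors and the classic read-modify-write RAID 5 small write, and ignores controller caches and full-stripe writes, which can do better):

```python
def raid_summary(level, n_disks, disk_gb):
    """Return (usable capacity in GB, disk I/Os per small random write)."""
    if level == "0":
        return n_disks * disk_gb, 1                 # one write, no redundancy
    if level in ("1", "1+0", "0+1"):
        if n_disks % 2:
            raise ValueError("2-way mirroring needs an even number of disks")
        return n_disks // 2 * disk_gb, 2            # write both copies
    if level == "5":
        if n_disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        # read old data + read old parity + write new data + write new parity
        return (n_disks - 1) * disk_gb, 4
    raise ValueError(f"unsupported level: {level}")
```

With ten 36G disks, RAID 0 gives 360G at 1 I/O per small write, 1+0 gives 180G at 2, and RAID 5 gives 324G at 4; those four I/Os are where RAID 5's reputation for slow writes comes from.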
Now on top of those very basic things, there are other factors. Because RAID-5 is cheapest disk-wise, and (IMHO) because it has the highest number of the well-standardized RAID levels, RAID-5 is very popular. To make up for RAID-5's abysmal performance, people use hardware RAID-5 accelerators with cache and whatnot. The problem there is that the controller can add significant cost (in some cases enough to have paid for a full mirror in plainly controlled disks), and that the RAID controller itself can become a single point of failure.
At my office (where a lot of bad decisions get made every day and I have to eat them), they built a Veritas cluster of Sun machines around a SAN. The idea was that no node was a single point of failure because of clustering (with Veritas allowing all nodes to reach the SAN storage). However, the SAN storage was a big fat RAID-5 array with redundant controllers/disks/yadda/yadda. Of course, as much as the vendor tries to bury it in the fine print, the RAID-5 hardware is a single point of failure. Sure enough, our very reputable vendor's "redundant" hardware RAID-5 controller did fail completely once, knocking our data offline for hours.
For the same cost as the expensive RAID-5 array and the disks in it, we could have bought two independent JBOD arrays (just a bunch of disks, no RAID controller), placed them on the redundant SAN with the redundant clustered machines doing software mirroring to the disks, and been truly free of single points of failure (assuming we got all the details right: that the mirrors always span separate arrays, that the arrays are on separate power, etc.).
I've spent a lot of time on these problems, and it is my strong belief that the optimal solution for almost all normal situations where you want high availability is software mirror/stripe (1+0). Be aware that there is a difference between 1+0 and 0+1 when the stripe in the 0 part is more than two disks wide. Consider two JBOD arrays of 5x 36G disks each:
In 0+1, you first stripe each array into a 180G stripe, then mirror the two together. When your first disk fails, nothing so much as hiccups. However, of your remaining 9 disks, if any of the 5 disks in the array opposite the one with the first failed disk fails, you will lose data. Thus there's a 5/9 chance that the second disk failure causes data loss.
In 1+0, you first mirror each disk from the first array with its partner in the second array. You then take your five 36G mirrors and stripe them together for your 180G. Again, first failure, no hiccups. If a second disk fails, it must be the partner of the first failed disk to cause data loss; any of the other disks can fail and you still lose nothing. So the chance of data loss on a second disk failure is now 1/9 instead of 5/9.
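The 5/9 vs. 1/9 figures can be checked by brute force: enumerate every ordered pair of failures across the ten disks and count the pairs that lose data. A Python sketch (the disk labelling is my own):

```python
from fractions import Fraction
from itertools import permutations


def second_failure_loss_prob(layout, disks_per_side=5):
    """Chance that the second of two random disk failures loses data.

    Disks are labelled (side, slot); side 0 and side 1 are the two JBOD arrays.
    "0+1": each side is striped into one volume, and the two volumes are mirrored.
    "1+0": disk (0, slot) mirrors disk (1, slot), and the mirrors are striped.
    """
    disks = [(side, slot) for side in (0, 1) for slot in range(disks_per_side)]
    lost = total = 0
    for first, second in permutations(disks, 2):   # every ordered pair of failures
        total += 1
        if layout == "0+1":
            # the first failure kills its whole stripe; data is lost if the
            # second failure kills the stripe on the other side too
            lost += first[0] != second[0]
        elif layout == "1+0":
            # data is lost only if both halves of the same mirror are gone
            lost += first[1] == second[1]
        else:
            raise ValueError(f"unknown layout: {layout}")
    return Fraction(lost, total)
```

second_failure_loss_prob("0+1") comes out to 5/9 and second_failure_loss_prob("1+0") to 1/9, matching the reasoning above; wider stripes make the gap bigger.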
11*43+456^2
We just built a 2 TB file server using two 3Ware 7850 controllers, with eight 160G Maxtor drives per controller. Each controller runs RAID 5 across all its drives. We split each RAID 5 array into "inner" and "outer" partitions and striped inner-to-inner and outer-to-outer using software RAID 0. Bonnie++ benchmarks show the "outer" array getting > 241 MB/sec sustained read and > 81 MB/sec sustained write.
Bonnie++ tests with U160 SCSI and IDE:
... but I don't need to tell you that the 2x 80GB Maxtors were quite a lot cheaper ;)
IDE, Promise ATA hardware RAID, 2x 80GB Maxtor:
Version 1.02c ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
cotopaxi 1G 16356 99 34258 41 9183 7 14602 89 50924 16 351.6 1
That's about 50MB/sec.
Now 3x Fujitsu 15K RPM U160 SCSI drives, running software RAID 5:
Version 1.02b ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
stromboli 3600M 9777 84 24733 68 18517 68 11916 98 53297 65 378.1 3
Again, around 50MB/sec.
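Bonnie++ reports K/sec, so the "about 50" figures come from the block column under Sequential Input. A small Python sketch that parses one result row, assuming the 1.02 column order shown above and that K means 1024 bytes:

```python
# Column order in a Bonnie++ 1.02 text result row after machine and size:
# per-chr output, block output, rewrite, per-chr input, block input
# (each K/sec followed by %CP), then random seeks/sec and %CP.
COLUMNS = ["out_chr", "out_block", "rewrite", "in_chr", "in_block"]


def parse_bonnie_row(row):
    """Turn one Bonnie++ result row into MiB/s figures (assuming K = 1024 bytes)."""
    fields = row.split()
    result = {"machine": fields[0], "size": fields[1]}
    numbers = fields[2:]
    for i, name in enumerate(COLUMNS):
        kb_per_s = int(numbers[2 * i])        # skip the %CP after each rate
        result[name] = kb_per_s / 1024        # K/sec -> MiB/s
    result["seeks_per_s"] = float(numbers[10])
    return result
```

For the cotopaxi row, parse_bonnie_row(...)["in_block"] is about 49.7, and for stromboli about 52.0: the "around 50" quoted for both.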
time dd if=/dev/zero of=largefile2 bs=1024k count=1024
1024+0 records in
1024+0 records out
real 0m10.015s
user 0m0.010s
sys 0m7.810s
This gives me about 107MBytes/second for writes.
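The 107 figure is in decimal megabytes: bs=1024k count=1024 writes exactly 1 GiB. The arithmetic in Python, using the "real" time above:

```python
bytes_written = 1024 * 1024 * 1024   # bs=1024k x count=1024 = 1 GiB
elapsed_s = 10.015                   # "real" time reported above

mb_s = bytes_written / elapsed_s / 1e6     # decimal megabytes (10^6 bytes)
mib_s = bytes_written / elapsed_s / 2**20  # binary mebibytes (2^20 bytes)
print(f"{mb_s:.1f} MB/s = {mib_s:.1f} MiB/s")
```

That is about 107.2 MB/s, or 102.2 MiB/s. Since the command doesn't sync, part of that gigabyte may still have been in the page cache when dd exited; GNU dd's conv=fdatasync option makes it flush before reporting.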
The RAID-1 system drive is significantly slower.
- A.P.
"Remember when the U.S. had a drug problem, and then we declared a War On Drugs, and now you can't buy drugs anymore?"
All this post did was show me that hardware RAID is faster than software RAID. Well, no duh!
Let's not compare apples to Buicks.
- A.P.
Running RAID 0 gave about 110MB/s sustained (at best) with an I/O mix of 2/3 reads and a 50/50 random/sequential mix. I believe in that case my 1Gb FC connection was the bottleneck.
It would have been. I've seen similar performance over a single FC channel (through a Brocade switch) to a Hitachi SAN. You'll need more FC cards if you really want to do performance testing. (The added failure protection is nice too.)
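A quick way to see why a single 1Gb FC link tops out around there: 1Gb Fibre Channel signals at 1.0625 Gbaud and 2Gb FC at 2.125 Gbaud, and 8b/10b encoding carries 8 data bits in every 10 line bits. A Python sketch of that arithmetic (helper name hypothetical):

```python
def fc_payload_mb_s(line_rate_gbaud):
    """Approximate Fibre Channel data rate in MB/s under 8b/10b encoding,
    before frame headers and other protocol overhead."""
    data_bits_per_s = line_rate_gbaud * 1e9 * 8 / 10   # 8 data bits per 10 line bits
    return data_bits_per_s / 8 / 1e6                   # bits -> bytes -> MB


for name, gbaud in (("1Gb FC", 1.0625), ("2Gb FC", 2.125)):
    print(name, fc_payload_mb_s(gbaud), "MB/s")
```

That gives 106.25 and 212.5 MB/s; frame overhead trims these to the commonly quoted 100 and 200 MB/s, so about 100MB/s per 1Gb link is the practical ceiling.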
- A.P.
~60MB/second sustained on a 6th-generation Cheetah. They're the fastest, IMHO (and the most expensive).
Daniel
I use IDE disks for video editing, and if I don't have appropriate data rates I get errors, so I am very meticulous. I do a mix of uncompressed video and DV25 work. Uncompressed video requires 23 Mbytes/sec (NTSC); DV25 only requires 3.5 Mbytes/sec, per stream. Frequently in editing or compositing I will use up to 9 data streams.
My IDE drives can all sustain over 19.5 Mbytes/sec for reads in single configuration.
I benchmark my systems with Matrox Disk Benchmark, provided with Matrox video editing equipment. Matrox Disk Benchmark writes random data to the disk. Typically the generated data set runs very large, usually in the tens of gigabytes. With such large data sets, any cache, be it OS or hardware, is completely negated.
I ran a 16.5GB test for my reply, and just for giggles kept an instance of Winamp running as well as playing two separate full resolution NTSC video streams and surfing the web while the test ran.
Here are the results for a Maxtor 160GB DiamondMax D540X 4G160J8 drive in a single configuration. The test platform is a dual Pentium 3 750 with 512MB RAM and Windows 2000 service pack 2. At the time of the test the drive had 53.22GBytes of 152.66 Gbytes free. Data is in MB/sec. The three measured values are: minimum/average/peak.
Single Write: 8.43 / 30.28 / 39.96
Single Read: 19.56 / 27.12 / 39.96
Dual Read: 8.43 / 11.89 / 16.87
I expect that the number that should interest you most is the average (middle number), though for my application the minimum data rate is critical.
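Since the minimum is what matters for real-time playback, here is a quick Python budget of how many streams fit in the single-read figures above (helper name hypothetical; it ignores the extra seeking caused by interleaving streams, which the Dual Read numbers suggest is substantial):

```python
import math

# Per-stream data rates quoted above, in MB/s.
STREAM_MB_S = {"uncompressed NTSC": 23.0, "DV25": 3.5}

# Single-drive read results quoted above: minimum / average / peak, in MB/s.
SINGLE_READ = {"min": 19.56, "avg": 27.12, "peak": 39.96}


def max_streams(fmt, sustained_mb_s):
    """How many simultaneous streams of fmt fit in a given sustained rate."""
    return math.floor(sustained_mb_s / STREAM_MB_S[fmt])


for label, rate in SINGLE_READ.items():
    print(label, {fmt: max_streams(fmt, rate) for fmt in STREAM_MB_S})
print("9 x DV25 needs", 9 * STREAM_MB_S["DV25"], "MB/s")
```

At the minimum rate a single drive covers 5 DV25 streams and no uncompressed stream; the 9-stream case needs 31.5 MB/s, more than even the average single-read figure.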
Aside from defragmenting drives regularly I do no special maintenance.
I hope this information is of utility.
Don't post inaccurate information.
If you do, I swear by my pretty floral bonnet I will end you.
Yes, that's exactly what I have. Lots of people already do this with their Linux systems: ReiserFS, JFS or ext3 for the data partition, likely ext2 (maybe mounted read-only) for /boot, etc. For filesystems storing large files, ext3, XFS, etc., perhaps tuned for larger block sizes or whatever.
You can also use RAID judiciously. Depending on your anticipated needs, different RAID configurations (including software RAID and RAID with IDE drives) can boost read or write performance (sometimes both, depending on the access pattern, size of files, etc.).
In these days of large drives, avoiding the fsck delay at boot is critical. The system I described used to have ext2 filesystems on all 13 drives (11x 73GB + 2x 18GB), and booting after a crash could take hours if fsck needed to fix things. With ReiserFS or another journaling filesystem, it's nice and quick.