Where are the High-Capacity SCSI Drives?
An anonymous reader asks: "Storage technology has really exploded in recent years, giving us ATA drives up to and exceeding 200-250 GB per drive. Why is it that SCSI drive technology has remained stagnant? I can't find a SCSI drive exceeding about a 146 GB capacity. Instead, businesses (and some individuals) wanting greater storage capacities are required to buy more drives which takes up more space, generates more heat, provides more points of failure, uses more electricity, etc. Why is this so?"
Hasn't it always been this way? It's nothing new.
Screw SCSI Srives. It looks like we need to work on spell checker technology.
Sorry sif sI sidn't so sit someone selse swould
No sig for you!!
I don't know, but it srives me crazy!
I'm a signature virus. Please copy me to your signature so I can replicate.
It was the same way a decade ago. It's simply a lack of demand for the drives in general.
Isn't it because SCSI drives are for high speeds and generate more heat? I may be out of the loop, but I really haven't heard of 10,000 RPM & 15,000 RPM drives for SATA or IDE. Maybe there are design considerations that prevent these higher performing drives from allowing more capacity. Or maybe the market demand does not warrant the manufacture of such units due to economics. In any event, I want to know if there is a 15,000 RPM SATA drive out there somewhere. I am curious!
>>>>>> Chewie, take the professor in the back and plug him into the hyperdrive.
Instead, businesses (and some individuals) wanting greater storage capacities are required to buy more drives which takes up more space, generates more heat, provides more points of failure, uses more electricity, etc.
Are you unfamiliar with the concept of RAID? That's where all those SCSI drives are going, and it most certainly does not add more points of failure as it pertains to systems. Businesses do not want high-capacity single SCSI drives, especially when they can pile together 146 GB drives.
Interested in open source engine management for your Subaru?
The solution to this reliability problem is the RAID. There are two RAID levels that are ideal (there are more, but this is a simple explanation). There is 1, which is just a mirror; and 5, which is striping with parity.
With RAID 1, if you have 500 GB of data, you would need two 500 GB drives. You lose 50% of the capacity you buy. The other option is RAID 5, where you lose 1/(number of disks) of the capacity. So you could store 500 GB of data on six 100 GB disks. This way you've only lost 100 GB of storage to redundancy as opposed to 500 GB.
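For anyone who wants to check the arithmetic above, here is a minimal sketch. It assumes a plain two-disk mirror for RAID 1 and single parity for RAID 5; the function name is made up for illustration.

```python
def usable_capacity_gb(level: int, disks: int, disk_gb: float) -> float:
    """Usable capacity of a simple RAID 1 mirror or RAID 5 set (sketch only)."""
    if level == 1:
        if disks != 2:
            raise ValueError("this sketch models RAID 1 as a two-disk mirror")
        # A mirror stores one copy of the data on each disk.
        return disk_gb
    if level == 5:
        if disks < 3:
            raise ValueError("RAID 5 needs at least three disks")
        # One disk's worth of space goes to parity.
        return (disks - 1) * disk_gb
    raise ValueError("only RAID 1 and RAID 5 are modelled here")


for level, disks, size in ((1, 2, 500), (5, 6, 100)):
    raw = disks * size
    usable = usable_capacity_gb(level, disks, size)
    print(f"RAID {level}: raw {raw} GB, usable {usable} GB, lost {raw - usable} GB")
```

Both layouts hold 500 GB, but the mirror gives up 500 GB of raw space and the six-disk RAID 5 set gives up 100 GB.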
So when businesses want to store large amounts of data, it's more economical to use many smaller drives than a few large ones. Even if you don't need the redundancy (for example, the disk is just being used for temporary storage while working on large digital picture or video files), it's still better to use many small disks. A single 500 GB drive will only go so fast (let's just say 60 MB/s sustained), but by using a RAID you can multiply that. So by using 5 100 GB drives, you might be able to sustain 300 MB/s (assuming the bus can keep up, etc). Even if you only scale at 50% (that would be 150 MB/s), that's still 2 to 3 times faster than a single drive. That performance can save you money.
So, if you can afford it, you can get much better performance or economics from using multiple smaller drives than from one large one.
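A rough sketch of that scaling estimate, using the 60 MB/s per-drive figure from above. The scaling factor is just a fudge knob for "the bus can't quite keep up", not a measured number.

```python
def array_throughput_mb_s(drives: int, per_drive_mb_s: float, scaling: float = 1.0) -> float:
    """Estimated sustained throughput of a striped array.

    scaling = 1.0 means perfect scaling; 0.5 means you only get half of it.
    """
    return drives * per_drive_mb_s * scaling


single = 60.0                                   # MB/s, the guess used above
ideal = array_throughput_mb_s(5, single)        # perfect scaling
half = array_throughput_mb_s(5, single, 0.5)    # only 50% scaling
print(f"ideal: {ideal:.0f} MB/s, at 50% scaling: {half:.0f} MB/s "
      f"({half / single:.1f}x a single drive)")
```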
That's my theory/understanding. Begin tearing it apart!
Comment forecast: Bits of genius surrounded by a sea of mediocrity.
Would you put effort into product development when you were already spending money on Serial Attached SCSI?
well the storage people do not think it wise to spend the money...
iSCSI and SAS are good things !
(pity there is not a Mac OS X driver for iSCSI...)
regards
John Jones
Could it be that the various manufacturers have a large stock of the smaller drives that they're trying to get rid of before putting larger ones to market?
Maybe it is due to the fact that SCSI storage has typically doubled in size... 9.1, 18.2, 36.4, 72.8, 145.6... Could it be that they're currently testing 291.2GB disks?
My $0.02.
This info is from an IBM magnetic storage engineer. The reason is that the IDE market is a retail home market and very competitive. He said, "If an IDE manufacturer can save 5 cents on a component, he'll buy the cheaper one." The time from R&D to store shelf is less than a year. SCSI drives, on the other hand, are primarily for servers; they have expensive components and are tested for a long time before they reach the market. The time from R&D to store shelf is about three years for SCSI. What was the biggest IDE drive you could buy three years ago? That's right, about the same size as the biggest SCSI drive today. So... what does this mean? IDE drives suck. They are cheap; they are the Ziploc bag of the storage industry. If you are going to grandma's with your data, that's OK, but if it's going to the moon... buy Tupperware (SCSI).
Hitachi/IBM produce the 300GB UltraStar 10K300, which is a mighty drive if I've ever seen one.
The real reason is that when you move up to higher rotational speeds to reduce latency, you have to reduce density relative to the motion of the disk under the head, so a 10K drive can generally pack only 60%-ish as much data per inch as a 7200 RPM drive.
The same can be seen in 15K disks, which are much lower density than their 10K counterparts. The 15K platters are smaller too, to keep them from flying apart.
Do you remember when the 5400RPM disks had higher capacity than the 7200 ones? I sure do, it was for the same reason.
Until the latency of the read-write head improves this will be the case.
"Sometimes, I think Trent just needs a cup of hot chocolate and a blankie." -Tori Amos on Nine Inch Nails
In most cases, companies are purchasing large numbers of drives and adding them to RAID arrays. They don't want a single large drive, but rather several smaller drives in a redundant configuration. Also, more spindles is usually a good thing in a server environment, where multiple data sets are being requested from multiple hosts, rather than a few sets from a single host as in the typical IDE setting.
But it's probably the same reason we don't have large capacity/high speed SATA drives either.
"Sic Semper Tyrannosaurus Rex."
I would think that SCSI has been shifted toward thin client servers. Gigabit Ethernet is fast as it stands, but extra speed calls for faster drives and faster disk access.
Overall, a valid 'Ask Slashdot'. But one thing bugs me. "More points of failure." I guess that's technically true, but I'd think having half of your data saved is better than having all of it lost -- I fail to see how having more drives is a bad thing when it comes to reliability.
Yeah, that's a problem. It's much better to reduce potential points of failure... preferably down to a single point of failure.
Or is that not what you meant?
Drive speeds haven't really gone up tremendously. Still too slow.
Imagine you have a 1TB drive, but were stuck at a 100MB/sec max seq transfer rate. It takes you 2.7 hours to read/write the entire drive. And that's for _sequential_ access. Gets ugly for random seek.
A similar speed 10TB drive will take you more than a day (27+ hours) to read sequentially.
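If you want to plug in your own numbers, the sums above look roughly like this. It assumes decimal units (1 GB = 1000 MB) and a constant sequential rate, which real drives won't hold across the whole platter.

```python
def full_read_hours(capacity_gb: float, rate_mb_s: float) -> float:
    """Hours to read or write an entire drive sequentially at a constant rate."""
    seconds = capacity_gb * 1000 / rate_mb_s  # decimal units: 1 GB = 1000 MB
    return seconds / 3600


for capacity_gb in (1_000, 10_000):  # the 1 TB and 10 TB cases above
    hours = full_read_hours(capacity_gb, 100)
    print(f"{capacity_gb} GB at 100 MB/s: about {hours:.1f} hours")
```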
Before the point where it takes too long to read an entire single drive you might as well start using multiple drives to add capacity rather than having bigger drives.
Taking too long is subjective, but I'd say this: how long can you make your boss/customer wait whilst you are restoring an entire disk image from backup? 27 hours or 2.7 hours? or 25 minutes?
So 70GB would be about the limit if you have impatient users and bosses.
Larger capacities are OK if they are to hold data that aren't important enough to be backed up, and don't require masses of data to be available quickly. Or if you are doing mirroring, where read speeds are important but write speeds aren't as important (but remember that restoring from backup = writing).
If you are actually going to use those drives in a server (guess where they are used, for the most part), you need them to be fast. An array with fewer larger drives is much slower than one with lots of small drives.
The bottom line is nobody wants it. Right now, data storage is so abundant that having twice or four times the amount of data per disk won't solve any problems. We've hit the stage in my line of work that long-term data storage is a non-issue. Suck as much data as you want and store it anywhere, we're never going to run out.
The radical sect of Islam would either see you dead or "reverted" to Islam.
The whole point of SCSI was high speed, not high capacity. With the growing size of databases nowadays, bigger drives with higher speeds are needed, but in the near future non-volatile RAM will be much cheaper. Hell, Google already hosts itself on RAM.
My company was offering 180GB SCSI drives in one of our RAID products, but we had to stop due to reliability issues. There was a huge difference in reliability between the 180GB and 146GB drives (which we still offer).
Under capitalism man exploits man. Under communism it's the other way around.
...and at the time (4 years ago?) with equivalent-speed SCSI and IDE drives and controllers, dd if=/dev/hdc | dd of=/dev/sda and vice versa chewed about 3x as much CPU horsepower to work on the IDE drive (same ratio in either direction).
Got time? Spend some of it coding or testing
Could it be the VCR vs beta max syndrome? Or something more sinister..... I am going to put on my afdb ( http://zapatopi.net/afdb.html ) and think about this....
./what?
... the computers of people with high-capacity wallets.
I could understand it if SCSI were just a little behind IDE, but I've seen 146 GB drives available for years. In the same time, IDE has gone from 160 GB to 300+ GB.
If you're only worried about how much data you can put on a chain, SCSI has a two-level addressing scheme. Each 'target' (usually a drive) can have up to 8 or 15 drives on it... It's not used very often, but I've seen some Sun enclosures that used that addressing method. In any case, that's pushing you up into the 16 and 32 terabyte range for a single chain.
As you pointed out, however, the bottleneck that you run into at that point is performance. When you're looking at performance, smaller drives have two advantages:
* More spindles means lower average latency
* More IO drives means faster overall transfer.
Simple fact of the matter is that it can take way too damn long to dump the contents of a 300GB drive -- and during that time, the drive is tied up. Far better to split that data among 4 80GB drives where only one is tied up at a time and/or you get a higher overall transfer rate if you have them properly distributed among controllers, etc.
At 30 megabytes/second, a 300 gigabyte drive is going to take almost 3 hours to dump/fill. If you're running a RAID 5 system with data worth millions of dollars, that's a long time to be exposed to the possibility of a second drive failure. (I once suffered a 2-drive RAID failure. Trust me, it's not pretty.)
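To put a rough number on that exposure window, here is a back-of-the-envelope sketch. Everything beyond the 300 GB and 30 MB/s figures is an assumption on my part: drives fail independently at a constant rate, and the MTBF value is a made-up placeholder, not a vendor spec. Real failures tend to be correlated (same batch, same heat, same power), so treat the result as optimistic.

```python
import math


def rebuild_hours(capacity_gb: float, rate_mb_s: float) -> float:
    """Hours to fill a replacement drive at a constant rate (1 GB = 1000 MB)."""
    return capacity_gb * 1000 / rate_mb_s / 3600


def p_second_failure(surviving_drives: int, hours: float, mtbf_hours: float) -> float:
    """Chance that at least one surviving drive fails during the window,
    assuming independent failures at a constant rate (exponential model)."""
    return 1 - math.exp(-surviving_drives * hours / mtbf_hours)


ASSUMED_MTBF_HOURS = 1_000_000  # hypothetical placeholder, not a real spec

window = rebuild_hours(300, 30)
risk = p_second_failure(surviving_drives=4, hours=window, mtbf_hours=ASSUMED_MTBF_HOURS)
print(f"rebuild window: {window:.2f} hours, second-failure risk: {risk:.6%}")
```

The point is less the absolute number than the shape: the risk grows with both the rebuild time and the number of surviving drives, so bigger drives at the same transfer rate mean a longer window.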
From a manufacturer's point of view, 300GB RAID isn't a good bet. For the consumer market you can get away with a 1 year warranty... Few enough pay attention to the small print that you could probably get away with a 90 day warranty for a lot of them.
On the other hand, the kinds of businesses that like buying SCSI drives pay attention to their warranties. They'll want a 5 year term, and they'll put your feet to the fire if the drives fail in that 5 year span.
Given the lower volume, the lower profit margin and the lower reliability, it's just not worth building a 300GB drive with a 5 year warranty on a SCSI controller. It's just asking to get your foot blown off.
Free Software: Like love, it grows best when given away.
I will try to avoid the SCSI vs IDE flame war.
1) RPM. It is easier to spin a 2.5" platter at 15K than a 3.5" platter. (Someone else can figure out the additional energy, but I would guess more than double the juice, assuming uniform density.)
2) IOs per second. In large arrays the driving factor is not necessarily throughput but IOs per second, which leads to more transactions per second for your server farm. So more spindles = more IOs per second.
3) Access time. The bigger the drive, the longer it takes the drive's processor to position the head, which increases access times and decreases IOs per second. I know it's a trivial amount of time, but it adds up over millions of IOs.
4) Error correction. I cannot speak for IDE but each block on a SCSI drive has an Error Correction Code (ECC) which helps the drive recover from read errors. Again minimal.
5) Cynical answer. Smaller drives means your drive company sells more product to meet a given capacity.
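A rough way to see points 1 to 3 in numbers: average rotational latency is half a revolution, and random IOs per second for one drive is roughly one over (seek time plus rotational latency). The seek times below are illustrative guesses, not figures from any datasheet, and the model ignores transfer time, caching, and command queueing.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: half of one revolution, in milliseconds."""
    return 0.5 * 60_000 / rpm


def iops_per_drive(rpm: float, avg_seek_ms: float) -> float:
    """Very rough random IOs per second for a single spindle."""
    return 1000 / (avg_seek_ms + avg_rotational_latency_ms(rpm))


def array_iops(spindles: int, rpm: float, avg_seek_ms: float) -> float:
    """Upper bound for an array: every spindle services requests in parallel."""
    return spindles * iops_per_drive(rpm, avg_seek_ms)


# Seek times are assumed values, chosen only for illustration.
for rpm, seek_ms in ((7_200, 8.5), (10_000, 5.0), (15_000, 3.5)):
    print(f"{rpm} RPM: latency {avg_rotational_latency_ms(rpm):.2f} ms, "
          f"~{iops_per_drive(rpm, seek_ms):.0f} IOPS per drive, "
          f"~{array_iops(8, rpm, seek_ms):.0f} IOPS across 8 spindles")
```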
Educational point: SCSI is a protocol, like IP or TCP. It can be tunneled through or carried by anything.
SPI - SCSI Parallel Interface (old school).
FCP - Fibre Channel Protocol.
SAS - Serial Attached SCSI. SAS can also tunnel SATA.
iSCSI - SCSI in TCP (not Ethernet).
SBP - Serial Bus Protocol (FireWire).
ATAPI - yep, SCSI over IDE so your CD-ROM works.
many others.
SCSI is generally more reliable, that's part of what you are paying for. But some people have figured out how to get great reliability with still lower costs. For example, Google is based upon inexpensive and easily replaceable hardware. They have so much and such a robust system that hardware failure is not a problem.
I have been using the best IDE drives I can get (usually, Maxtor with 3 year warranty / 8 meg cache / 7200 rpm) in my servers with Acard SCSI-IDE adapters. The latest adapters, such as the 7726Q, even support command queueing, and all of them support all of the other SCSI goodies like disconnection / reconnection and speeds up to 160 MB/sec. No more adding multiple PCI IDE controllers, no more cabling nightmares! They're expensive ($90 USD an adapter, usually), but worth it for applications which require good performance (like servers).
http://www.acard.com/eng/product/scside/aec-7726q.html
Even my Amiga has one; the motherboard IDE is pretty slow and can only support drives up to 128 gigs, but a 160 gig drive on the 40 MB/sec UW SCSI bus is so much faster!
The big OEM HDD customers don't require larger SCSI/FC capacities. There are many reasons:
-Less data per disk means faster RAID rebuild times and less data dedicated to a single disk.
-Big RAID configs already provide plenty of capacity.
-Reliability is enhanced with lower data densities. Desktop always leads the density curve.
This is what IBM calls a "Single-level store", and is used in their AS/400 architecture. The RAM in the machine is essentially nothing more than a cache for the DASD (disk array). Everything exists as an object within the 64-bit address space, which is mapped to the disk.
For more info, just Google for Single-Level Storage. It seems that this type of system causes more problems than it solves.
aQazaQa
I've read of companies that bought a bunch of SCSI drives and then set them up to only use half their normal capacity, by throwing away half the cylinders. This reduced the average access time of the drives. I'm not sure if they reconfigured the drives in-house or if the manufacturer did it for them.
Mea navis aericumbens anguillis abundat
Maybe that's why the
You only format the outer half of the disk, so you reap the higher DTR.
I want to delete my account but Slashdot doesn't allow it.
I'm talking about 64GB of RAM, and have the huge storage disk sync with the RAM every now-and-then. All of your OS and apps, and most of your recent documents will reside in RAM, the rest will shuffle off to the disk when it gets 'cold'.
You just saved your document. I just pulled the plug. You just broke your monitor.
I'm just clarifying that in order to satisfy Durability, such a system would have to write out dirty data (which you called "cold") within milliseconds after it's marked dirty, with applications blocking on syncs of their VM spaces. Your "every now-and-then" would explode to dozens or hundreds of times a second.
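For what "blocking on syncs" looks like from an application today, here is a minimal sketch using ordinary files rather than a single-level store. The helper name and the file name are made up for illustration; the point is only that the call doesn't return until the OS reports the data has been pushed to the device.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data and block until the OS says it has reached the device."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()               # push Python's userspace buffer to the OS
        os.fsync(f.fileno())    # ask the OS to flush its cache to the disk


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "document.txt")  # hypothetical file name
    durable_write(target, b"saved before you pulled the plug\n")
    with open(target, "rb") as f:
        print(f.read())
```

Doing this on every change is what turns "every now-and-then" into a sync per transaction, and each one costs at least a disk write.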
Whether a single point of failure is better than multiple points of failure depends on whether the points of failure are in parallel (e.g. RAID 1) or in series (e.g. RAID 0).
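A quick sketch of the difference, assuming each drive fails independently with the same probability over some period. The 3% figure is a made-up number for illustration only.

```python
def p_loss_series(p_drive: float, drives: int) -> float:
    """Data is lost if ANY drive fails (e.g. RAID 0 striping)."""
    return 1 - (1 - p_drive) ** drives


def p_loss_parallel(p_drive: float, drives: int) -> float:
    """Data is lost only if ALL drives fail (e.g. RAID 1 mirroring)."""
    return p_drive ** drives


p = 0.03  # hypothetical per-drive failure probability over the period
print(f"single drive:   {p:.4f}")
print(f"2-drive RAID 0: {p_loss_series(p, 2):.4f}")
print(f"2-drive RAID 1: {p_loss_parallel(p, 2):.4f}")
```

More drives in series makes data loss more likely than with one drive; more drives in parallel makes it less likely.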
It's because we want more drives. The more drives, the faster the disk I/O. As corporate storage requirements increase, so does the demand for access to that storage. One extra drive means one extra set of reads per second in most RAID arrays. That's another reason SCSI has had 10K and 15K drives longer than IDE.
Well, I'm not the guru on enterprise-level business, but I think SCSI was intended for the middle ground? You're talking about Fibre, FICON, and ESCON up at enterprise. I've actually never gotten to mess with any SCSI devices in my short (4 years) time as a PC guy. I've seen them, touched them, read up about them, but I've never gotten to implement one or an array of them. Sounds like fun! ^^
-Conrad