The Amazing $5k Terabyte Array
An anonymous reader writes: "Running out of space on your local disk? How about a Terabyte array for only a few thousand dollars? This article at KCGeek.com shows how to put together 1000 Gigs of hard drive space for the cost of a few desktop computers."
I could rip my entire anime collection for instant access! Rip all my
CDs and still have .9 Terabytes left! Maybe Mirror Usenet! I guess
the simple truth is that now that 100 gig drives are a couple hundred
bucks, we now have the ability to store anything we reasonably could
need (unless you define "Reasonable" as "I need to store DNA Sequences").
human = 3 billion base pairs
= 6 billion bits of data
= 7.5e8 bytes
= 7.3e5 kilobytes
= 715 megabytes
< 1 gigabyte
Sure, lots of other life forms have been sequenced too, but most of these have much smaller genomes than humans.
So how would you need a terabyte to store DNA sequences?
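For anyone who wants to re-run the arithmetic above, here is a minimal Python sketch. It assumes 2 bits per base pair (four possible bases) with no compression or annotation data, and the function names are made up for illustration.

```python
# Back-of-the-envelope genome size.
# Assumption: 2 bits per base pair (A, C, G, T), no compression or metadata.
BITS_PER_BASE_PAIR = 2


def genome_size_bytes(base_pairs: int) -> float:
    """Return the raw storage needed for a sequence, in bytes."""
    return base_pairs * BITS_PER_BASE_PAIR / 8


def to_megabytes(n_bytes: float) -> float:
    """Convert bytes to binary megabytes (1 MB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 * 1024)


human = genome_size_bytes(3_000_000_000)
print(f"{human:.2e} bytes, about {to_megabytes(human):.0f} MB")
```

Real sequence files carry quality scores and annotations, so actual data sets can be much larger than this raw figure.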
1 Terabyte = 1024 GB = 1048576 MB
$5,000 / 1048576 is a price of about $0.0048 a MB.
Or, put another way, $4.88 a GB.
Now who remembers when hard disks were more than $10 a MB?
Cruise TT
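The price-per-unit figures in the comment above can be checked with a few lines of Python. This is only a sketch: it assumes binary units (1 TB = 1024 GB, 1 GB = 1024 MB) and the $5,000 total from the headline, and price_per_unit is a hypothetical helper name.

```python
def price_per_unit(total_dollars: float, capacity_tb: float = 1.0):
    """Return (dollars per GB, dollars per MB), using binary units."""
    gigabytes = capacity_tb * 1024
    megabytes = gigabytes * 1024
    return total_dollars / gigabytes, total_dollars / megabytes


per_gb, per_mb = price_per_unit(5000)
print(f"${per_gb:.2f} per GB, ${per_mb:.4f} per MB")
```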
I hate to rain on everyone's parade (I really do), but this is just a typical IDE RAID 5 setup with bigger disks. Not exactly Slashdot-worthy, IMHO. If you're thinking about doing something like this, RAID level 5 is not a bad choice if you need redundancy. For more RAID info check out:
http://www.acnc.com/04_01_00.html
I believe that Promise makes the SuperTRAK Pro series of ATA RAID cards that support up to 6 drives and RAID 5. I haven't used them personally but they do exist.
I agree that on a server or a professional workstation SCSI is the way to go for speed and reliability. But for the home consumer who wants to work with digital video, the cost of a SCSI RAID setup is prohibitive.
With tapes, you just get a new drive.
Okay... I'll do the stupid things first, then you shy people follow.
[Zappa]
FYI, the DNA sequence isn't that big. The National Human Genome Research Institute has their 90% complete draft burned on a single CD.
Why aren't we told when editors moderate our posts?
9.9 TB = approx 0.01 PetaByte
Don't hold your breath thinking about petabytes.
Also, RAID isn't for people who make stupid mistakes. Sorry about your 'rm' debacle.
What's a second? An hour? A day?
It has much more to do with
the Earth's rotation than with cesium.
Actually, if you did read the article, you would find that the proposed system is built on ATA100 drives with software RAID 5, which means that the equivalent of one of the 8 160GB drives would be used for parity. That leaves *ONLY* (7*160GB)/1024 = 1.09375TB! Now, I know that hardware RAID 5 is expensive, but just think for a second: you would have a hot-swappable, secure-as-long-as-only-1-hard-drive-fails, personal, massive-and-fast storage system... A dream system :)
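The capacity math above generalizes to any RAID 5 set: one drive's worth of space goes to parity, whatever the drive count. A minimal Python sketch, assuming equal-sized drives and ignoring filesystem overhead (raid5_usable_gb is a made-up name for illustration):

```python
def raid5_usable_gb(drive_count: int, drive_size_gb: float) -> float:
    """Usable space of a RAID 5 set: one drive's worth is consumed by parity."""
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (drive_count - 1) * drive_size_gb


usable_gb = raid5_usable_gb(8, 160)
usable_tb = usable_gb / 1024  # same GB-to-TB conversion the comment uses
print(f"{usable_gb:.0f} GB usable, about {usable_tb:.5f} TB")
```

Adding a hot spare would take another drive out of the usable total, since the spare sits idle until a failure.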
I live in Soviet Canuckistan you insensitive clod!
on a server or a professional workstation SCSI is the way to go
I do wish to avoid yet another SCSI/IDE flamefest, but I would point out that this configuration is like most of its ilk--it is basically network attached storage. That means that no one will be reading or writing from the server system itself, but will be accessing the raid array through a network link via NFS and/or SMB. In my experience, performance of Linux Software RAID5 on Promise IDE controllers with 80-GB Maxtor 5400-RPM hard drives can exceed 50 MB/s write and 70 MB/s read. SMB/NFS even over Gbit ethernet will be hard pressed to saturate that.
Having built many of these low-budget raid5 arrays, I cannot concur that SCSI and/or hardware RAID is necessary to see acceptable performance. <Horror stories about Hardware IDE RAID5 controllers deleted.>
I do admonish would-be builders to include an extra hard drive in the RAID array as a hot spare. For four-drive arrays (3 data + 1 parity), it may be unnecessary. For larger systems (7 data + 1 parity), I think a hot spare is a worthwhile investment. Also, avoid 7200-RPM drives if possible and actively cool all of the drives in the array. One or two fans blowing on the array can make a big difference.
extend that to safe-as-long-as-only-one-hd-fails-and-you-never-e
Always remember: data that is not backed up might as well not be there in the first place!
The illegal we do immediately. The unconstitutional takes a little longer.
--Henry Kissinger
Promise FastTrak 100TX2 * 4          $500
Maxtor DiamondMax 160GB Drive * 8    $3000
Maxtor DiamondMax 20GB Drive         $80
You can get an Escalade 7850 for $550 or less, which is a single 64-bit card instead of the 4x Promise controllers. I don't know why there's a 20GB drive in there, maybe a boot drive? At $3k for 8 160GB drives, that's $375 each. Looking quickly at pricewatch, you can get the same Maxtor 160GB drives (5400RPM -- yuck!) for around $260 each. 8*160*(1000/1024) = 1250GB (actual GB) = 1.22 TB for a total of $550 + 8*$260 = $2630 instead of $3580. Plus you have 3 more free PCI slots than you had before.
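Here is the same comparison as a small Python sketch, using only the prices quoted in this comment and the parts list before it. The build_cost helper and its parameter names are hypothetical, and the prices are simply the ones quoted, not current ones.

```python
def build_cost(controller_total: float, drive_price: float,
               drive_count: int, extras: float = 0.0) -> float:
    """Total cost of controllers, data drives, and any extra parts."""
    return controller_total + drive_price * drive_count + extras


# Article's spec: 4 Promise cards ($500 total), 8 drives at $375, 20GB drive at $80.
original = build_cost(500, 375, 8, extras=80)
# Suggested alternative: one Escalade 7850 ($550), 8 drives at $260.
alternative = build_cost(550, 260, 8)

# Raw capacity, using the comment's own conversion factor of 1000/1024.
raw_gb = 8 * 160 * (1000 / 1024)
raw_tb = raw_gb / 1024

print(f"original ${original:.0f}, alternative ${alternative:.0f}, "
      f"savings ${original - alternative:.0f}")
print(f"raw capacity {raw_gb:.0f} GB, about {raw_tb:.2f} TB")
```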
A full USENET news feed (everything one can find) will exceed 120GB per day. It'll almost fill a DS3. (And we were receiving a "crappy" test feed from UUNet.) So, minus @alt.binaries.*, one could mirror USENET for a few years. With the binaries, it'll hold you for about a week, 2 at the most.
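A rough check of the "about a week" estimate, as a Python sketch. It assumes the full feed stays at a constant 120 GB per day and that the whole terabyte (taken as 1024 GB) is available for spool; retention_days is an illustrative name.

```python
def retention_days(capacity_gb: float, feed_gb_per_day: float) -> float:
    """Days of feed that fit on the array before old articles must expire."""
    return capacity_gb / feed_gb_per_day


days = retention_days(1024, 120)
print(f"about {days:.1f} days of full feed")
```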
please tell me how you get 6 IDE drives on a PC that gives you any performance in a RAID function...
I don't know how he does it, but I have personal experience in doing it two different ways:
1) 3ware IDE RAID controller, has 1 IDE controller per drive on the card (i.e. 8 ide controllers), which the firmware maps to a RAID Device. Depending on the RAID configuration the drives appear as one large SCSI drive to the system.
Performance is on par with SCSI.
2) External IDE-SCSI Raid chassis. Again, 1 IDE controller per hot-swap drive, appearing to the system as one or more big SCSI drives, controlled by a standard SCSI controller. Speed and reliability have surpassed that of a $60,000 SCSI solution sold by Sun I happen to have lying around.
U160 SCSI drives will give you at least a 70% speed increase and a 80% increase in reliability....
If I had to store a terabyte of information I'd be an idiot to use consumer level storage (IDE).
Nonsense, see above. This is simply SCSI bigotry (I know, I was once a SCSI bigot too). What you say is only true if you are using low-end cards with more than one device on each IDE bus, which is not the case for mid- and high-level IDE-SCSI solutions such as 3ware and various external chassis systems. We run our entire enterprise on one, and have done so for well over a year, with much better reliability and performance than an older, very expensive SCSI solution provided.
But yes, if people are plugging drives into el cheapo IDE "raid" cards like Promise and the like, or worse, into their onboard IDE controllers (most of which are inexpensive knockoffs anyway) then performance will be very suboptimal, and reliability problems (one device taking down the entire IDE bus, etc.) abound.
The Future of Human Evolution: Autonomy
It seems to be just an urban legend.
If you're serious about backing up that much data, you could also use a 9840 drive, which holds 20 GB uncompressed and (they say) 80 GB compressed; in my experience, however, you can get 140 GB onto a tape. Also, it'll write faster (when backing up a terabyte, having the backup take 32 hours is not a good idea). The 9840B drives write at up to 50 GB/hour but usually run closer to 30-35 GB/hour, while DLT drives usually write at about 5 GB/hour.
I haven't tested it out, but StorageTek has a drive called the 9940, which has tapes that hold 60 GB uncompressed (likely 200+ GB compressed) and writes faster (10 MB/sec ~= 55 GB/hour). Also, the drive itself will set you back $33.5k, with the tapes being a couple hundred apiece.
In this case, it'd probably be better just to have a second 1 TB RAID; then again, tapes are much more stable.
-Cuyler
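To put the tape numbers above in perspective, here is a small Python sketch of backup time and tape count for a full terabyte. It assumes 1 TB = 1024 GB, a constant write rate, and uncompressed tape capacities as quoted in the comment; the function names are invented for illustration.

```python
import math


def backup_hours(data_gb: float, rate_gb_per_hour: float) -> float:
    """Hours needed to write data_gb at a constant rate."""
    return data_gb / rate_gb_per_hour


def tapes_needed(data_gb: float, tape_capacity_gb: float) -> int:
    """Whole tapes required to hold data_gb."""
    return math.ceil(data_gb / tape_capacity_gb)


ARRAY_GB = 1024
print(f"at 32 GB/hour: {backup_hours(ARRAY_GB, 32):.0f} hours")
print(f"DLT at 5 GB/hour: {backup_hours(ARRAY_GB, 5):.0f} hours")
print(f"20 GB tapes: {tapes_needed(ARRAY_GB, 20)}")
print(f"60 GB tapes: {tapes_needed(ARRAY_GB, 60)}")
```

Compression would cut the tape count, but how much depends entirely on the data.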
> Last time I looked at IDE in any technical
> depth, I only saw four addresses "reserved" for
> IDE controller use. I guess you can have any
> address, but the BIOS couldn't boot off any
> address, it has to know where to look for the
> controller. Predetermined list of 4 seems to
> ring a bell.
There are 4 addresses, but you can only boot off the first 2 in most operating systems. There are ways to get more than 4 up and running to expand to lots of drives, but I'm not sure which OSs that works with.
> Secondly, IDE seems to REALLY hit the brakes
> when you do two independent operations on two
> drives on the same channel (say, a read on
> drive 1 and a write on drive 2).
The issue is that most ATA implementations don't support command queueing, so there is no bus release. Each command runs to completion before the bus is released, while the other drive sits idle. Upcoming drives will implement queueing and won't have this performance limitation.
> If my 4 controller addresses educated guess is
> right, and performance does crawl, you'd
> probably want to have 4 drives on 4
> controllers, one each.
The secondary port isn't inherently slower than the primary port. However, each port uses a controller address (0x1F0 for the first and 0x170 for the second on a standard PC).
Best performance is achieved with one drive per cable.
> If all the above is correct, this guy is plain
> wrong. He's published, I'm not, I'm willing to
> admit defeat - where am I wrong? Do the raid
> controllers emulate being scsi hosts, run off
> OS drivers (=likely windows ones), etc?
Everything except ATA hard drives is emulated as a SCSI host. ATAPI (the CD-ROM protocol) is simply packetized SCSI over an ATA cable. The RAID controllers also just use the built-in SCSI layer in the OS.
eric
http://www.t13.org for the real ATA specs if you're curious
More data, damnit!
I once built a similar kind of RAID system (half a TB) using the Antec case. Their case is nice, but not for IDE RAID. The problem is that the IDE cables need to be within a certain length in order to get UDMA mode 5. The case is designed for SCSI, which has a longer cable length limit. Hooking up all the IDE drives in that case is really a pain in the butt.
For IDE RAID, this case is good, except that it's a bit expensive:
http://www.rackmountnet.com/rackmountchassis/rackmountchassis_4ud.htm
It can hold up to 16 drives with hot swappable trays. There should be no cable length problem.
On a side note, I once plugged 5 Promise Ultra100TX2 cards into one computer. All cards were recognized, but only 8 drives were recognized correctly (I plugged in 12 drives altogether). I remember seeing somewhere (either in the Linux kernel source or the FreeBSD sys source) that Promise has a limit of 12 drives per system, with 8 of them in DMA mode and the remaining 4 in PIO mode with some tweak (burst?). So for a big RAID like that, an IDE RAID card (either 3ware's or HighPoint's) is recommended. Using a hardware IDE RAID card also has the benefit of being able to hot swap the drives with the case mentioned above.
gd
They also spec'd the motherboard as an "A7B266-D". I'm guessing this is the A7M266-D, as there is no A7B266-D (no one else is even considering manufacturing an SMP Athlon chipset besides the forthcoming Micron Scimitar)
It seems to me like this is a rather poorly thought out spec. Why are they using 4 FastTrak100 TX2s when they could use 2 FastTrak100 TX4s? Which of course brings up another point: why are they even using FastTraks? Under Linux the FastTrak driver is quite immature, and last time I used it, it only worked with 2.2 kernels, which hinders the ability to use filesystems like XFS. Also, the FastTrak cards are essentially software RAID, as they offload the work of calculating the stripe locations onto the host CPU. There's no point in using md to combine multiple FastTrak arrays.
Many people were mentioning the 3Ware Escalade. It is a relatively good card, but for a home storage array Linux md + XFS might be a better choice. (Also note that the advantages of 64-bit PCI couldn't be had with the A7M266-D, as it doesn't include any 64-bit PCI slots. Perhaps the Tyan Tiger would be a better choice for a 3Ware solution.) My recommendation would be 3 Promise Ultra133 TX2 controllers. The read and write performance on an Escalade 7410/7810 is appalling. With the embedded processor on the 7450/7850 (R5Fusion Technology, as 3Ware calls it) the performance exceeds that of software RAID, but at a much higher price, of course. I think the goal here is bulk storage and not performance, and the ATA133 controllers are by far the cheapest solution.
For more information on IDE RAID under Linux, check out this site. Its information is a bit dated at this point, but I used it for my home storage server and haven't regretted it. With 5 7200RPM drives on Promise Ultra100 controllers and Linux md RAID-5 w/ XFS, my bonnie++ scores are 90/30 MB/s for sequential read and write, respectively. I couldn't be happier. This site also has benchmarks showing the superior performance of software RAID over a hardware solution with a 3Ware card.
And there were a few other things people seemed confused about. No one in their right mind would put more than one drive per channel for the purposes of a performance RAID. That's just foolish. As for the limitation of being unable to access both the primary and secondary IDE channels simultaneously, this limitation was removed years ago with the introduction of EIDE.
As far as everything else goes, I'm a SCSI bigot. I have SCSI drives in my workstations and I couldn't be happier. However, IDE RAID is a very economical solution for a home user, often with performance on par with that of more expensive SCSI RAID solutions.
To conclude, this article seems very poorly researched and documented. Had they actually attempted to build this beast and failed, perhaps I would've been more amused. However, as it stands, it's an overpriced specification which uses incompatible parts, and little research has been done on the optimum parts for the configuration.
Ok. This is just inane. Why build this when someone has already done it better for cheaper?
http://www.raidweb.com
We purchase their 8 disk IDE RAID arrays. They are hot swap, support RAID 0, 0+1, 1, 3, 5, and hot spare, have dual failover power supplies, come with 64MB cache, which can be upgraded. Configurable via the EZ front LCD display, or via serial console. They support ATA-100, and ATA-133 coming shortly. Software upgradable, and it runs Linux.
The array (sans disks) runs us $3200. They even have versions that have dual fiber ports out the back.
WARNING - DO NOT purchase these with IBM GXP75 (75GB) disks like we did... we have about 80 of them that failed.
I just built a similar setup -- 500GB for less than $2,900. However, I made some different design choices.
First of all, I wasn't too impressed with the Promise controller, so the choice for me was between the 3Ware 7850 and the Adaptec 2400A. The Adaptec had the best overall performance, but the 3Ware is close and can support 8 devices. For the hard drives, I wanted to come reasonably close to SCSI performance, so I chose the WD1000JB drive with the on-board 8MB buffer. I used a Tyan Tiger K7 with 64-bit PCI for the motherboard with dual Athlon XP (not MP) 1700+ CPUs plus 1GB ECC registered PC2100 DDR RAM. I put them all in a nice aluminum rackmount case.
I'll probably replace the motherboard with the newer Tyan with a 66MHz PCI bus in the near future and use the current one in a workstation. I'll also drop in more RAM if/when prices drop.
It's been pretty sweet so far with LVM + XFS. My backup solution is a 33GB tape drive, so I spend most of every Sat. backing up the array. Time and money permitting, I'll build a second one and look for a DLT tape library on ebay.
From a Huntsville Times (Alabama) interview with Bill Gates:
QUESTION: "I read in a newspaper that in 1981 you said '640K of memory should be enough for anybody.' What did you mean when you said this?"
ANSWER: "I've said some stupid things and some wrong things, but not that. No one involved in computers would ever say that a certain amount of memory is enough for all time."
An unjust law is no law at all. - St. Augustine