The Amazing $5k Terabyte Array
An anonymous reader writes: "Running out of space on your local disk? How about a Terabyte array for only a few thousand dollars? This article at KCGeek.com shows how to put together 1000 Gigs of hard drive space for the cost of a few desktop computers."
I could rip my entire anime collection for instant access! Rip all my
CDs and still have .9 Terabytes left! Maybe Mirror Usenet! I guess
the simple truth is that now that 100 gig drives are a couple hundred
bucks, we now have the ability to store anything we reasonably could
need (unless you define "Reasonable" as "I need to store DNA Sequences").
It's only a matter of time 'til video becomes as commonplace as MP3s on our drives. 100 Gigs is what...20 movies??? I don't see my appetite for disk space slowing down any time soon.
Hmmm...video; logfiles that don't roll over - ever; online network backup... I'm sure to figure out a way to fill that terabyte. :)
BRENT ROCKWOOD, EST'd 1975
Yeah, with 160 gig ATA drives out now,
you can do it with 6 drives vs. 10 drives,
and a lot of motherboards come with onboard
RAID, and if you use software RAID via
win2k or a Volume manager type app for Linux
it would rock.
Cheap too, at $260 per drive per pricewatch.
Peace out...
Actually a DNA sequence is only about 3GB for a human - your anime DVDs might take more space, at least until you compress them. Then again, DNA should be fairly trivial to compress highly. Let Z = CA, Y = TG, .....
"Computer Science is no more about computers than astronomy is about telescopes."
-E. W. Dijkstra
human = 3 billion base pairs
= 6 billion bits of data
= 7.5e8 bytes
= 7.3e5 kilobytes
= 715 megabytes
< 1 gigabyte
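For anyone who wants to check that arithmetic, here is a quick sketch. It assumes 2 bits per base pair (four possible bases) and 1024-based units, same as the figures above; the variable names are just for illustration.

```python
# Back-of-the-envelope genome size, assuming 2 bits per base pair
# (A, C, G, T) and binary (1024-based) units.
BASE_PAIRS = 3_000_000_000
BITS_PER_BASE = 2

bits = BASE_PAIRS * BITS_PER_BASE   # total bits
size_bytes = bits // 8              # 8 bits per byte
size_kb = size_bytes / 1024
size_mb = size_kb / 1024

print(f"{bits:.1e} bits = {size_bytes:.1e} bytes = {size_mb:.0f} MB")
```

That ignores any file format overhead, so a real download would be bigger than the raw packed sequence.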
Sure, lots of other life forms have been sequenced too, but most of these have much smaller genomes than humans.
So how would you need a terabyte to store DNA sequences?
"Nobody should ever have need for more than 640 kB of RAM" - Bill Gates
Similarities anyone?
Sig (appended to the end of comments I post, 54 chars)
1 Terabyte = 1024GB = 1048576 MB
$5,000 / 1048576 is a price of about $0.0048 a MB.
Or, put another way, $4.88 for a GB.
Now who remembers when hard disks were more than $10 a MB?
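The same sum as a small sketch, assuming the article's $5,000 price and a 1024-based terabyte (the names are made up for illustration):

```python
# Cost per unit of storage for a $5,000 terabyte, using 1024-based units.
PRICE_USD = 5000
GB_PER_TB = 1024
MB_PER_TB = 1024 * 1024

per_gb = PRICE_USD / GB_PER_TB
per_mb = PRICE_USD / MB_PER_TB

print(f"${per_gb:.2f} per GB, ${per_mb:.4f} per MB")
```

Against the old $10 a MB days, that works out to roughly a 2000x drop in price.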
Cruise TT
A terabyte isn't anything special. But it's cool to see someone doing it. I was bored one night. For a mere 36K you could, assuming you already own a Thunder K7 w/ the on-board SCSI plus needed components, put together yourself some really big storage. Using those 181GB Seagate SCSI drives.
;p
U160 and all of it churning at 10,000RPM. For a grand total of a few GB short of 5.5 Terabytes.
But assuming you can afford thirty $1200 drives you should be able to spring for a nice U160 SCSI RAID Card with an external connector.
I couldn't even find a case with enough room for 30 hd's.... and I don't want to even think about cooling.
But I won't have to worry about that. I can't even afford a 9gb scsi drive at this point.
Computational Madness in a round package.
I've been using these for a long time (6200 dual-port in hardware-mirror, up to the 8-port cards for large disk configs), and they're very fast and reliable. Cheap, too.
$500 for an 8-port 64-bit RAID controller, looking to the host like a single scsi device per logical volume, seems like the best deal available. Along with a motherboard with sufficient slots for gig-e and these cards (easy to get 4 64-bit slots...maybe you can get more with 3-4 buses), and a 4U rackmount case with 16 drive bays, and you can have 4U of rackmount storage for $5k, too.
I've been using setups like this for clients, as well as for private file storage (divx, mp3, backups, etc.), and know of people using them for USENET news servers (one of the most demanding unix apps for reasonably priced hardware).
It goes without saying you want a journaled file system or softupdates when you have disks this size, and ideally keep them mounted read-only, and divided into smaller partitions, whenever possible. e2fsck on a 300GB partition with hundreds of open files is painful.
Yes, this is a groovy/geeky/cool solution for under your desk, but at least spend the extra dollars for a SCSI card and tape backup unit. You could fit the whole thing on a few DLT's. You can also keep incremental backups to keep the tape swapping to a minimum.
Check out this article referenced by slashdot on July 20 2001.
The nice thing about this article is that the people building it at SDSC really took extreme care in getting quality components that would work together to build a reliable, solid system, and still didn't spend more than $5K for a terabyte file server. In particular, the tradeoff of disk speed vs. power consumption was extremely insightful.
I built one of these to their spec for my company, and I couldn't be happier. It's worked flawlessly since then. It's not clear if the Escalade boards are still available -- 3ware had said that they were discontinuing them, but they still appear to be for sale.
thad
I love Mondays. On a Monday, anything is possible.
Not to be rude, but if you RTFA you would find out in the third paragraph.
But to answer your question, it looks like an IDE Software RAID5.
I hate to rain on everyone's parade (I really do). But this is just a typical IDE raid 5 setup with bigger disks. Not exactly slashdot worthy IMHO. If you're thinking about doing something like this, Raid Level 5 is not a bad choice if you don't need redundancy. For more raid info check out:
http://www.acnc.com/04_01_00.html
1) "Compress" at a higher rate than the CD uses (I've seen this)
2) Use POV Ray to render Lord of the Rings for the cinema
3) Keep every src and every
4) Set the Linux swap space to be "500Gb" because you've upgraded the Kernel to the new VM stuff and it looks cool
5) Install Windows XP+ in two years time, with Office XP+.
Imagine that "Minimum Reqs: 1TB of available disk space"
It will happen
An Eye for an Eye will make the whole world blind - Gandhi
I'm sure some poor fool will do something like this, fill it up with data, then have ONE hard drive go bad, making everything practically useless.
What we need isn't larger hard drive storage (not that it's a bad thing) we need more speed, and a cheap, gigantic & ultrafast tape backup system to backup all the data. Some PC designs that use better cooling methods would be very nice as well.
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
In fact I remember reading somewhere about a year ago on the linux terminal page about how they put a tb server together for right around 4K. I can't find the link, but if someone does please post. But grabbing the third largest drive (100GB) out there will save you a bundle and you still only need 10.
Do you change clothes while making the "chee-chee-cha-cha-choh" transformation sound?
That reminds me, I don't know where the hell the tape manufacturers think they're marketing to, but with 80 GB hard drives common now, it's rare to find a tape backup solution that is affordable for a consumer that can handle that much. By affordable I mean drives around $250 and tapes under $10/piece for at least 50GB of storage. I've seen some of the proprietary drives but the tapes cost almost as much as the drive! 5 or 6 years ago the backup drives available to consumers could handle backing up the entire average hard drive of the time onto a $15 tape (Travan), but now people are probably just doing without backups which is a disaster waiting to happen.
pfft, these days people are demanding a terabyte of RAM.
How we know is more important than what we know.
How about another terabyte array and rdiff? While Joe Average User probably isn't going to be able to afford to do that, he's probably not going to want to build the first one either. If you're a small to medium size company, it'd probably be worth considering. I think by the time you start talking this price tag, you'd be considering some of the mainframe storage companies for DASD and backup though. IBM's 2105 "Shark" machine will go larger than 11TB now, IIRC, and I'm sure the other "big iron" shops have similar solutions.
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
Inspired by Slashdot's earlier story that was nearly identical, and with the help of Peter Ashford from ACCS, we built two servers, both with capacities well over a TB, for around $8000 each. They have the capacity to expand to 3TB if need be.
Story here
As far as performance:
(from my memory)
EXT3: About 16MB/Sec block write, 45MB/sec block read
ReiserFS: About 20MB/sec block write, 130MB/Sec block read (that's no typo).
XFS: About 30MB/sec block write, 85MB/sec block read.
It seems that file system plays a large role in performance. The arrays are three RAID5 in hardware using Linux software RAID0 on top of the RAID5 arrays to tie them together.
IDE RAID controllers are 3ware Escalade 7810. Write performance can be greatly increased by using 7850 cards that have more cache.
We stuck with XFS, Reiserfs had a bigfile bug, files created over 2GB would lock up the computer basically. XFS in general seemed much more mature, reiserfs seems more like someone's college thesis project, that they never cleaned up to be production grade.
We experimented with different RAID0 stripe sizes, the hardware RAID5 stripe size is fixed at 64k, there are 7 active disks in each array and one hot spare. Stripe size tweaking seemed to mostly trade off read for write speed, within a certain range of values, with a taper off in performance at either extreme, (down around 8k stripes, or over 1024k stripes)
We eventually went with 1024k stripes. That is what the benchmarks above reflect. The variance in file system performance could very well be due to interactions with stripe size, but there seemed to be common themes (reiser always read fastest no matter what stripe, XFS was always better at writes)
I have been in so many arguments with SCSI zealots on here over this RAID... I wish people would understand what price/performance ratio means. IDE isn't a superior technology, but every now and then, it is the right tool for the job, when price is a goal too.
I've had enough abrasive sigs. Kittens are cute and fuzzy.
Is this any more special than the last time slashdot announced an amazing terabyte array? Here
Seriously though.. People's numbers are pretty far off. This can be done for about 3000.. Pricewatch has 160 gig drives for $259.. 10 of these would give you over 1 terabyte in useable space in raid 1.. Or if you just cared about write performance, 6 of them for $1554 would give you a terabyte of useable storage.. another $600 to throw together a cheap pc and cheap ide raid cards.. you get it for under $2500.. big deal.
Lately I'm realizing how awful IDE really is.. I finally got around to throwing 2 36 gig ultra 160 drives on my box with an adaptec scsi card, running ext3 on top of a raid mirror.. more space than I need (I just keep all my mp3s on an IDE raid.. since my dragon motherboard has ide raid built in).. Since I've gone to scsi life has been happy. I can do things while compiling, while vacuuming my db, etc..
Funny how mac used scsi before the rest of us, huh?
"And how can this be? For he is the
The first thing that runs through your mind when you see the above headline is: "Wow, imagine a Beowulf cluster..."
Argh.
And remember kids: Never trust a computer you can actually lift.
Why not snap in a Promise SX6000 for like $250?
This neat piece DOES hardware RAID5, so you don't need a fast cpu&mobo, less RAM, and since it can only manage up to 6 drives you can even have 2 as pseudo hot spare...
The only drawback is the ability of "only" storing 800GB which is nice at this even cheaper price...
Aren't these types of systems more for archiving massive amounts of data than actively working on it? I mean, how much data can a computer actively process anyway? Wouldn't a 100GB drive meet just about any processing demands (genome tracking, video editing, etc)?
Why not use slower but MUCH cheaper offline storage? I really like the design goal of
http://www.dvdchanger.com/
You can easily get 1TB of storage with such a device for less than $1000. True, only one person can access it at a time but that is only because PowerFile wants to charge more for so-called "networked version".
In theory, if someone could figure out how to build one of these things, you could throw in two or three CD/DVD drives for accessing and a 20GB hard drive to buffer images. Boom. Now you have the perfect storage backbone for a house-wide media center. I just wish Linksys or someone would throw a linux thinserver on top of the PowerFile hardware and get me something cheap and network-ready.
- JoeShmoe
-- I wonder which will go down in history as the bigger failure: the War on Drugs or the War on Filesharing
If you're going to spend $4K on a DLT drive, spend $8K on a DLT tape library that holds 10 DLT's plus 1 cleaning tape and forget about it. Sure it's only 700Gig of backup but you can always compress.. Otherwise upgrade to a 20 DLT tape library box and call it done.
Do not look at laser with remaining good eye.
1. SCSI can handle between 15-30 devices on one good controller, *not* including support for multi-disk changers via LUNs. Most IDE can handle 2-4.
2. SCSI drives don't turn into slugs when you access more than one at a time. IDE does. Want to see it REALLY screw up, access an ATAPI CD-ROM slave the same time as a HD Master on the same controller.
Learning HOW to think is more important than learning WHAT to think.
I figure this is the easiest way to add as you grow without having to break open the case and try to figure out how to add another damn drive in there. For backup, just have two systems with identical capacities and rsync between the two nightly.
RAID is nice, but for home use, it's not as nice as a nightly mirror. Why? I've seen RAID controllers fail and take out an entire RAID set. RAID also doesn't deal with the "Holy shit, I just accidentally typed `rm * ~` instead of `rm *~`" problem.
1 Promise 6 channel PCI ide raid controller, 99$US.
12 Maxtor 160gb ata133, 270$ each
1920gb of Pr0n and other goodies, priceless!
FIRE!
Any serious data store needs to include a backup system which allows for copies off-site. Fire is the obvious risk of course, but floods, vandalism and lightning strikes are all possibilities.
AFAIK the only generally available tape backup for something this big is DLT, which IIRC can now do around 40GB per tape before compression. With the 2:1 compression usually quoted that's 80GB per tape, or around 13-14 tapes for a full backup. So you really need about 30 tapes for a double cycle, and maybe more if lots of the data is non-compressible (like movies). But this stuff ain't cheap. DLT drives start at around £1000 and the tapes cost £55 each. So that's around £2500 = $4200 to back this beastie up.
Having said that, the possibility of using hot-swappable IDE drives as backup devices is intriguing. Just point your backup program at /dev/hdx3 or whatever. One big advantage is that if your tape drive gets cooked in the server-room fire you don't have the risk of tapes that can only be read on the drive that wrote them. A Seagate 5400RPM 60GB drive costs £110, which is only a third more per megabyte than a bare DLT tape. Two cycles-worth of backup (34 drives) would be £3,700. And you can probably do better by shopping around. For servers with only a few hundred GB on line this might well be more cost-effective than buying a DLT drive.
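A rough check of the tape-versus-drive numbers above, as a sketch. It assumes a decimal terabyte (1000GB), the quoted prices, and exactly two full cycles with no spares, which is why the tape count comes out a little under the "about 30" allowed for above. The function and variable names are just for illustration.

```python
import math

ARRAY_GB = 1000        # assumes a decimal terabyte
DLT_DRIVE_GBP = 1000   # quoted starting price of a DLT drive
DLT_TAPE_GBP = 55      # quoted price per tape (40GB native, 80GB at 2:1)
IDE_DRIVE_GBP = 110    # quoted price of a 60GB 5400RPM drive

def units_needed(capacity_gb, cycles=2):
    """Media needed for `cycles` full backups of the array."""
    return math.ceil(ARRAY_GB / capacity_gb) * cycles

tapes = units_needed(80)    # relies on the 2:1 compression claim
drives = units_needed(60)

tape_cost = DLT_DRIVE_GBP + tapes * DLT_TAPE_GBP
drive_cost = drives * IDE_DRIVE_GBP

# Price premium per GB of a bare IDE drive over a bare (uncompressed) tape.
premium = (IDE_DRIVE_GBP / 60) / (DLT_TAPE_GBP / 40) - 1

print(f"DLT: {tapes} tapes, about {tape_cost} pounds including the drive")
print(f"IDE: {drives} drives, about {drive_cost} pounds")
print(f"IDE costs {premium:.0%} more per GB than bare tape")
```

If the data is mostly movies and the 2:1 compression does not materialise, calling `units_needed(40)` doubles the tape count and the gap closes quickly.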
We use Amanda to do backups here. It's a useful program, but it can't back up a partition bigger than a tape. So you need to think carefully about your partition strategy. (Side note: you can use tar rather than dump to break up over-large partitions, but it's still a pain).
Suddenly that terabyte starts looking a bit more expensive.
Paul.
You are lost in a twisty maze of little standards, all different.
Amen to that. I was looking for a backup solution for my 60 gig server a few weeks ago. Know what the most cost effective solution turned out to be??? Another damn harddrive!
Have a Happy.
Does anyone out there actually use IDE drives like this? It seems a pretty obvious thing to do.
Paul.
You are lost in a twisty maze of little standards, all different.
The requirements specify 4 PCI RAID controllers. Each of these could potentially handle 4 hard drives. I'm assuming that he's only putting 2 on each so that it doesn't come across the problem of accessing 2 drives on the same channel. In addition to this, there are 2 more on the motherboard, that I guess he isn't using. Secondly, these cards are bootable. So any one of them can be set to boot from and you can boot from any drive. But I don't think he is doing that because he has an additional 20 gig drive that I'm assuming is going on the motherboard. That is where the OS is going to be installed.
Go here for the datasheet
_______________________________
"I'm not Conceited...I'm just a realist..."
With tapes, you just get a new drive.
Okay... I'll do the stupid things first, then you shy people follow.
[Zappa]
FYI, the DNA sequence isn't that big. The National Human Genome Research Institute has their 90% complete draft burned on a single CD.
Why aren't we told when editors moderate our posts?
Actually, if you did read the article, you would find that the proposed system is built on ATA100 supported by RAID5 software... which means that one drive's worth of the 8 160GB drives would be used for parity, and that leaves *ONLY* (7*160GB)/1024 = 1.09375TB! Now, I know that hardware RAID5 is expensive, but just think for a second: you would have a hot-swappable secure-as-long-as-only-1-hard-drive-fails personal massive-and-fast storage system... A dream system :)
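The usable-space sum generalises easily. A minimal sketch, assuming equal-sized drives and the usual RAID5 rule that one drive's worth of capacity goes to parity (the function name is made up for illustration):

```python
def raid5_usable_gb(n_drives, drive_gb):
    """Usable capacity of a RAID5 set of equal-sized drives.

    One drive's worth of space is consumed by parity, so n drives
    yield (n - 1) drives of usable capacity.
    """
    if n_drives < 3:
        raise ValueError("RAID5 needs at least 3 drives")
    return (n_drives - 1) * drive_gb

usable_gb = raid5_usable_gb(8, 160)
print(f"{usable_gb} GB usable = {usable_gb / 1024} TB")
```

Adding a hot spare takes one more drive out of the usable count, so the same 8 drives with a spare would be `raid5_usable_gb(7, 160)`.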
I live in Soviet Canuckistan you insensitive clod!
True. But one thing I haven't seen yet is the fact that most backups aren't full backups. You do a full backup maybe once a month or once a year. Every other backup is a diff only. So while the initial backup may take several tapes, the nightly backups shouldn't. At least on the type of system where the data is basically the same from day to day, which was the point of the article.
Plus, as described in the article, where the point was to have a single hard drive based storage for dvd's and cd's, if there was a drive failure, you could just take the original media and do the rip again. Annoying yes, but doable. You haven't lost data unless the fire burned down your house and melted the cd's at the same time it took out your storage. That's why companies buy fire safes and use off-site storage.
Video is the most bulky storage people would save. How much would people want to save for re-viewing? First you have the time-shifting stuff like TiVo/Replay - perhaps a few tens of hours at most. Then there would be your favorite movies and TV series. As video-phone improves you might be saving some hours of friends and relatives video conversations. With infinite storage, the constraint becomes need and time to view all that stuff. And you'll probably be wanting to spend your time looking at new stuff. So I'd guess most people's real needs would be hundreds to a thousand hours. At 1-2 GB per hour, you're talking about a terabyte or two.
I don't include the argument that you'd have trouble finding old stuff. Computer software is more clever at organizing things - far better than material storage. A good recent example of this is Apple's "iPhoto" that is much more convenient for organizing thousands of photos than physical albums.
This would be great for a home file server. Many new homes are being built pre-wired with CAT5 (alas not my old house). Just add a big file server in the basement. With proper wiring, it can act as an answering machine / PBX, personal video recorder, music (MP3) repository, mail server, file server, etc. With RAID, you have less worries about a drive crash wiping you out (though you'll need a disaster recovery plan - flooded basements would be real bad). I've always wanted to do this! Main stumbling block is getting CAT5 wiring from the second floor (where my computers reside) to the basement.
[Insert pithy quote here]
At the most people's genomes differ by 0.1% from each other - much less than that if you are relatives. Therefore you'd record the differences, sort of like several of the MPEG algorithms.
Ironically, I just built something very similar to this a few weeks ago (it runs great BTW), but I spent <$1500US on all the components. The biggest thing you have to watch out for is the Hard Drives. I went for the ones with the best bang/buck ratio at the time (Maxtor 80GB 5400RPM drives). This let me build a system with well over 1/2 a Terabyte of usable space at a fraction of the cost. Additionally, the slower drives require less power and less cooling, making them easier to fit in a standard full tower case with a merely beefy (as opposed to server-class) power supply. I think the processor requirements he stated were a little overboard as well. I've found that disk access tends to be limited by the PCI bus (it doesn't help that I used an older motherboard with 33 MHz 32bit PCI), especially on writes where you can spread data across the write cache on the drives. Be careful when you build an array like this: ATA *hates* having access to both a master and a slave drive at the same time. Be sure to avoid having two disks on the same plex on the same controller. This was natural for me fortunately, since I was building two plexes, a "backup" and a "media" plex.
A final word of warning: Promise ATA100 TX2 controllers may look like a natural choice for a server like this, but they only support UDMA on up to 8 drives at once, and Promise's tech support only supports a maximum of 1 (one!) of their cards in any system.
I read the internet for the articles.
"Draco dormiens nunquam titillandus."
Actually, I assembled a 600 gig storage device using the aforementioned 3ware controller.
First, there were hardware bugs and they recalled the controller
Second, 3ware dropped the product line, but vendors were still telling me it was available.
Third, they brought it back, and I had to get a drop ship
I lost about 3 months on design phase due to this little tidbit.
Now don't get me wrong, it's working now and seems reliable... but... there's always this nagging suspicion that something is going to go wrong and I'll lose all that data.
Get rid of it.
extend that to safe-as-long-as-only-one-hd-fails-and-you-never-e
Always remember: data that is not backed up might as well not be there in the first place!
The illegal we do immediately. The unconstitutional takes a little longer.
--Henry Kissinger
With Raid5 a single drive can fail without causing dataloss.
How do you know WHEN a drive has failed?
With the low end IDE RAID cards your notification comes when the 2nd drive fails......
3Ware's website describes an SNMP monitoring utility for windows, but didn't specifically mention Linux support. Ditto for Adaptec.
If the raid is done in software, is there a linux program to monitor and notify when a single drive goes down?
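For Linux software RAID (md), the kernel reports array state in /proc/mdstat, and mdadm has a monitor mode that can mail you on failure. If you would rather roll your own check from cron, here is a minimal sketch. It assumes the usual mdstat layout where each array's status line ends in something like `[4/4] [UUUU]`, with an underscore marking a missing member. The sample text and device names are made up for illustration, not taken from a real box.

```python
import re

# Matches the "[total/active] [UU_U]" tail of an array's status line.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")
# Matches the line that starts an array entry, e.g. "md0 : active raid5 ...".
ARRAY_RE = re.compile(r"^(md\d+)\s*:")

def degraded_arrays(mdstat_text):
    """Return the names of md arrays that have at least one missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current:
            if "_" in status.group(3):
                degraded.append(current)
            current = None
    return degraded

# Hypothetical sample: md0 is healthy, md1 has lost one of three members.
SAMPLE = """\
Personalities : [raid5]
md0 : active raid5 hdk1[3] hdi1[2] hdg1[1] hde1[0]
      468850688 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
md1 : active raid5 hdq1[2] hdo1[1] hdm1[0](F)
      312567808 blocks level 5, 64k chunk, algorithm 2 [3/2] [_UU]
unused devices: <none>
"""

print(degraded_arrays(SAMPLE))
```

On a real machine you would pass in `open("/proc/mdstat").read()` instead of the sample and send yourself mail when the returned list is non-empty.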
So, then, I'm confused... He's trying to use software raid, but he has 4 Promise FastTrak 100TX2 raid controllers. WTF? First off, each of those cards supports 4 drives on 2 channels... Why does he need 4 cards when he only has 8 drives? He only needs 2 cards. Second, why is he using expensive raid hardware (that doesn't even support RAID 5) when he's using software raid!?
All he needs are two of maxtor's cards, which you can buy packaged with the drive for an extra $13. Not only that, but his prices on hard drives are way too high. 8 drives (2 with maxtor's ide cards) are $2122, per pricescan.com. Since he lists $500 for the ide cards and $3000 for the HDD's, that's a savings of $1378.
Then, he quotes $500 for 2 GB of ram. At $70/.5GB sticks that's $280. $500 for a case??? Try $365.
That said, the $5720 price he quotes is high by $1733. You could build one of these for just under $4000.
Ok, I admit, I didn't include shipping.
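The savings do add up. Here is the tally as a sketch, using only the prices quoted in this comment (shipping still not included; the dictionary keys are just labels):

```python
# Article's quoted total versus the cheaper prices listed in this comment.
QUOTED_TOTAL = 5720

savings = {
    "drives + IDE cards": (3000 + 500) - 2122,  # article price minus pricescan
    "2 GB RAM": 500 - 4 * 70,                   # four 0.5GB sticks at $70
    "case": 500 - 365,
}

total_savings = sum(savings.values())
revised_total = QUOTED_TOTAL - total_savings

for item, amount in savings.items():
    print(f"{item}: ${amount}")
print(f"total savings ${total_savings}, revised build ${revised_total}")
```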
please tell me how you get 6 IDE drives on a pc that gives you any performance in a raid function...
I don't know how he does it, but I have personal experience in doing it two different ways:
1) 3ware IDE RAID controller, has 1 IDE controller per drive on the card (i.e. 8 ide controllers), which the firmware maps to a RAID Device. Depending on the RAID configuration the drives appear as one large SCSI drive to the system.
Performance is on par with SCSI.
2) External IDE-SCSI Raid chassis. Again, 1 IDE controller per hot-swap drive, appearing to the system as one or more big SCSI drives, controlled by a standard SCSI controller. Speed and reliability have surpassed that of a $60,000 SCSI solution sold by Sun I happen to have lying around.
U160 SCSI drives will give you at least a 70% speed increase and a 80% increase in reliability....
If I had to store a terabyte of information I'd be an idiot to use consumer level storage (IDE).
Nonsense, see above. This is simply SCSI bigotry (I know, I was once a SCSI bigot too). What you say is only true if you are using low end cards, with more than one device on each IDE bus, which is untrue for mid- and high-level IDE-SCSI solutions such as 3ware and various external chassis systems. We run our entire enterprise on one, and have done so for well over a year, with much better reliability and performance than an older, very expensive SCSI solution provided.
But yes, if people are plugging drives into el cheapo IDE "raid" cards like Promise and the like, or worse, into their onboard IDE controllers (most of which are inexpensive knockoffs anyway) then performance will be very suboptimal, and reliability problems (one device taking down the entire IDE bus, etc.) abound.
The Future of Human Evolution: Autonomy
It seems to be just an urban legend.
You can stuff 8 60 gb disks into an antec server case. With a pair of 1600 XP processors, the total cost is 2 promise cards = $50, 8 drives = $720, 2 xp processors = $220, mobo = $220, memory = $200, case = $150, total is about $1500 for .5 tb and $3000 for the full tb. Further, you have a bit more i/o bandwidth with 6 ide controllers, and 2 pci busses than with the single. Also when one of them craps out, the other is still going in all probability. Going to 80 gb drives gives you about the same cost per gb of drive space and lets you put .6 tb into a case. When you are paying for floor space and cooling, the 160 gb drives make sense, but when you are running these in your basement, going for two boxes makes it a cheaper and more robust solution.
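Adding up the parts list for one box, as a sketch. It uses only the prices quoted in this comment; the labels are illustrative and the capacity is raw, before any RAID overhead.

```python
# Parts for one box holding eight 60GB drives, at the prices quoted above.
parts = {
    "2 promise cards": 50,
    "8 x 60GB drives": 720,
    "2 XP processors": 220,
    "motherboard": 220,
    "memory": 200,
    "case": 150,
}

box_cost = sum(parts.values())
box_gb = 8 * 60                      # raw capacity per box
cost_per_gb = box_cost / box_gb

print(f"one box: ${box_cost} for {box_gb} GB (${cost_per_gb:.2f}/GB)")
print(f"two boxes: ${2 * box_cost} for {2 * box_gb} GB")
```

So "about $1500" is really a shade over, and two boxes land just above $3000 for a bit under a raw terabyte.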
With 120 gig drives, your total cost for a 1 TB array would be about $2500. With 4 IDE ports and a large enough case, you could get all that into one box, then network the beastie.
Now I just need to find $2500. I know I won't have a problem filling it.
-Restil
Play with my webcams and lights here
For those that choose to go the "fire proof box" route, please be careful that you buy a unit that's certified to protect media. A fireproof box that will protect papers from catching fire isn't necessarily sufficient to keep tapes and disks from being destroyed by the heat. Make sure you buy one that's appropriate for your intended contents.
using a tb array for anime is like having one of your turds bronzed.
> He's trying to use software raid, but he has 4
> Promise FastTrak 100TX2 raid controllers. WTF?
> First off, each of those cards supports 4
> drives on 2 channels... Why does he need 4
> cards when he only has 8 drives? He only needs
> 2 cards.
I'm a firmware engineer for Maxtor... if you're going for performance, you want 1 drive on each bus, and you don't want to use the motherboard connectors. With 2 drives on each bus, you are limiting the average transfer rate out of cache to 50% of the max transfer rate. On a modern drive with their 60-65MB/sec channel rates, you cannot stream sequentially off of 2 drives without saturating an ATA-100 cable. Even running ATA-133 won't help starting a year from now.
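To put numbers on the saturation point, here is a small sketch. It assumes the nominal interface rates (100 and 133 MB/sec) with no protocol overhead, and the 60-65MB/sec sustained drive rates mentioned above; real cables will deliver less than nominal, so the true headroom is smaller. The function name is made up for illustration.

```python
def bus_headroom(bus_mb_s, drive_mb_s, drives_per_cable=2):
    """Nominal bandwidth left on the cable when every drive streams at once.

    A negative result means the cable is the bottleneck.
    """
    return bus_mb_s - drive_mb_s * drives_per_cable

for bus in (100, 133):
    for rate in (60, 65):
        left = bus_headroom(bus, rate)
        verdict = "saturated" if left < 0 else "just fits"
        print(f"ATA-{bus}, 2 drives at {rate} MB/sec: {left:+d} MB/sec ({verdict})")
```

With one drive per cable, `bus_headroom(100, 65, drives_per_cable=1)` leaves 35 MB/sec spare, which is the case for one drive on each bus.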
Additionally, every bios I have looked at sucks in terms of performance. In most cases they have small DMA FIFOs which stutter the pipe during high speed transfers -- they literally hang the DMA lines while they empty their fifo into memory, then come back and grab another 8 words or something sad. They also tend to be very poor managers of the IRQ line. This causes delays at times when your hard drive could be giving you more data, but the host hasn't gotten around to asking for it yet.
All the 3rd party cards have like 2Kbyte FIFOs which prevents any overrun from occurring, which alone is quite helpful in high bandwidth applications.
The cards we include with our drives are in the lower end of Promise's spectrum... you can spend more and get more performance if you want to, which is what I suspect the author of the original article did.
--eric
More data, damnit!
I've wanted a terabyte of storage since the mid-1970s, when I realized that there were approximately a trillion square meters on the Earth's surface. Store one byte of grayscale image for each square meter and that's a terabyte of data right there.
Of course these days I'd want 3TB so I could store color images.
The other problems with your scheme are:
It's not a bad idea, but certainly not something that can be done for $5k. I'd think there must be a breakpoint somewhere where it makes sense to build stuff in multiple machines (instead of cramming tons of disks into a single machine), but I think it's not at 1 disk/machine.
How much uptime you need is purely dependent on you. Since my array is for personal use, I don't mind a bit of downtime when a component fails (since I'm working on the problem myself anyway, it's not like I'd get much use out of it when it was partially down!). If you really really need multi-9 uptime, $5k IDE storage solutions really aren't the way to go.
I read the internet for the articles.
> Last time I looked at IDE in any technical
> depth, I only saw four addresses "reserved" for
> IDE controller use. I guess you can have any
> address, but the BIOS couldn't boot off any
> address, it has to know where to look for the
> controller. Predetermined list of 4 seems to
> ring a bell.
There are 4 addresses, but you can only boot off the first 2 in most operating systems. There are ways to get more than 4 up and running to expand to lots of drives, but not sure what OSs it works with.
> Secondly, IDE seems to REALLY hit the brakes
> when you do two independent operations on two
> drives on the same channel (say, a read on
> drive 1 and write on drive 2).
The issue is that most ATA implementations don't support command queueing, therefore there is no bus release. Each command runs to completion before the bus is released, while the other drive sits idle. Upcoming drives will be implementing queueing and won't have this performance limitation.
> If my 4 controller addresses educated guess is
> right, and performance does crawl, you'd
> probably want to have 4 drives on 4
> controllers, one each.
The secondary port isn't inherently slower than the primary port. However, each port uses a controller address. (0x178 or something for the first, can't remember offhand)
Best performance is achieved with one drive per cable.
> If all the above is correct, this guy is plain
> wrong. He's published, I'm not, I'm willing to
> admit defeat - where am I wrong? Do the raid
> controllers emulate being scsi hosts, run off
> OS drivers (=likely windows ones), etc?
Everything except ATA hard drives is emulated as SCSI hosts. ATAPI (the CDROM protocol) is simply packet scsi over an ATA cable. The raid controllers also just use the built-in scsi layer in the OS.
eric
http://www.t13.org for the real ATA specs if you're curious
More data, damnit!
Remember, most of the breathless prose about the huge, enormous, gigantic, [favorite-bigness-adjective] amount of information in DNA was written years ago, by biologists. Moore's law has been in effect for some time since then, and the human genome hasn't gotten any bigger in the meantime.
To a Lisp hacker, XML is S-expressions in drag.
Unless Taco is storing DNA sequences from aliens, I don't know what he's talking about. I downloaded the human genome project last year and if I remember correctly it was definitely under a gigabyte.
1 Terabyte solution - $2500
All the pr0n you could ever watch - $1,000,000
The look on your Mom's face when she clicks on AsianDogAssRape10.mpg - Priceless
This
[I still need a versioning filesystem, like VMS though.]
I hate to say it, but SCO (yes, SCO) had a versioning filesystem in OSR5. HTFS (High Throughput File System) had versioning support.
Fascism starts when the efficiency of the government becomes more important than the rights of the people.
You've just stumbled across one of the main concepts behind the Storage Area Network [snia.org]. The biggest problem you have is bandwidth.
Dude, that's why most SANs are made out of Fibre Channel. FC is a 1Gb/s transport that has a SCSI protocol on top (FCP-SCSI). 2Gb/s Fibre Channel is available, and work is currently under way on 10Gb/s. In addition, FC is full duplex.
Fascism starts when the efficiency of the government becomes more important than the rights of the people.
We (my company) designed a very similar system using a Tyan Tiger200 with dual GHz Cel's etc. The problem is that the drives he lists (the 160GB Maxtors) aren't addressable by the RAID controller he is using (the Promise TX). The Promise card will only address up to 127GB per drive. You have to use an ATA-133 spec controller to get the full capacity out of those drives. We did an array using the TX and WD 120GB 7200RPM drives (with 8MB cache - mmmmmmmmm.....) that flat smokes anything that you can put together with the Maxtor drives. Oh well....
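A per-drive ceiling like the one described above is what you would expect from 28-bit sector addressing. That the Promise card is limited for exactly this reason is an assumption here, not something the post states. A minimal sketch of the arithmetic, assuming 512-byte sectors:

```python
# Assumption: 512-byte sectors; the 28-bit explanation for the Promise
# limit is a guess, not something confirmed in the thread.
SECTOR_BYTES = 512

def max_capacity_bytes(lba_bits: int, sector_bytes: int = SECTOR_BYTES) -> int:
    """Largest capacity addressable with the given number of LBA bits."""
    return (2 ** lba_bits) * sector_bytes

lba28 = max_capacity_bytes(28)
lba48 = max_capacity_bytes(48)

print(f"28-bit LBA: {lba28 / 1e9:.1f} GB (decimal), {lba28 // 2**30} GiB (binary)")
print(f"48-bit LBA: {lba48 / 1e15:.0f} PB (decimal)")
```

The 28-bit result is 128 binary gigabytes, which is roughly where the 127GB figure lands, and it is short of a 160GB drive either way.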
I think Usenet is underestimated here. I remember reading on the site of one of the larger ISPs, specialized in good usenet access (ie. 30000+ groups & week+ retention even on binaries groups) that they have significantly more than 1 TB of storage space (don't remember how much, but several TB). So mirroring Usenet might be a tight fit.
beauty is only a light switch away
i believe the bigger problem is performance rather than capacity.
a typical configuration that cheap will use an ide hdd (and to make it cheaper software raid).
the main problem (for us in this case) is the performance. how do you increase the data transfer? for the past few years, the storage space has increased tremendously but the transfer rate of the drives is out of proportion with the space.
ide is usually placed in a 33mhz/32bit bus which will give a burst transfer of 133mbyte/sec. that is the max whatever you do. but if you will place a nic card, they will share the bandwidth unless it is placed in a different bus.
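The 133mbyte/sec figure above is just clock rate times bus width. A rough sketch of that arithmetic follows; the gigabit NIC and the 64-bit/66MHz comparison are assumptions added for illustration, not part of the original post:

```python
def pci_peak_mbytes_per_sec(clock_mhz: float, width_bits: int) -> float:
    """Theoretical burst rate: one transfer of width_bits per clock."""
    return clock_mhz * (width_bits / 8)

standard_pci = pci_peak_mbytes_per_sec(33.33, 32)   # the bus described above
wide_fast_pci = pci_peak_mbytes_per_sec(66.66, 64)  # hypothetical 64-bit/66MHz slot

# Assumption: a gigabit NIC running at line rate on the same shared bus.
GIGABIT_NIC_MBYTES = 1000 / 8
left_for_disks = standard_pci - GIGABIT_NIC_MBYTES

print(f"32-bit/33MHz PCI: {standard_pci:.0f} MB/s peak")
print(f"64-bit/66MHz PCI: {wide_fast_pci:.0f} MB/s peak")
print(f"left for disks next to a saturated gigabit NIC: {left_for_disks:.0f} MB/s")
```

These are burst ceilings only; sustained throughput on a real shared bus will be lower.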
for the interface itself, scsi can handle more i/o operations/sec and fc even more. technologies today can implement raid5 at almost no performance hit.
so given 1tb of data, definitely many people will be accessing it (unless you really plan to use it for your own insane storage needs). so although people will be able to store a lot, they can only access it at a much slower rate.
so you won't see scsi and fc becoming obsolete even once serial ata gets through. serial ata will remain in the low end segment of the storage market.
and besides, if you want to backup your data, the best way is to store it to tape and that will cost big (since mirroring the info in another server will not give you the reliability compared to tape)
Live your life each day as if it was your last.
This guy totally went the wrong way for expandability and speed. You can get the Promise SuperTrakSX 6000 for $480 and that has hardware raid 5 and supports 6 drives. I'd throw one of those in with 6 drives to start and take my 800Gig and be happy. That would save me at least $1000 up front. I wouldn't need 2 of the hard drives, the second processor or so much ram. Plus it would be faster and much more reliable. Then later on I could add another one for about $2500 and have 1.6 TB of space to store my huge collection of pornography... err rather mp3's, software and G-rated dvd movies.
If your not cheating your not trying. If your not trying your not winning and if your not winning why play?
I once built a similar kind of raid system (half a TB) using the Antec case. Their case is nice, but not for IDE raid. The problem is that the IDE cables need to be within a certain length in order to get DMA 5. The case is designed for scsi, which has a longer cable length limit. Hooking up all the IDE drives in that case is really a pain in the butt.
For IDE raid, this case is good except it's a bit expensive:
http://www.rackmountnet.com/rackmountchassis/ra
It can hold up to 16 drives with hot swappable trays. There should be no cable length problem.
On a side note, I once plugged 5 Promise Ultra100TX2 cards into one computer. All the cards were recognized, but only 8 drives were recognized correctly (I plugged in 12 drives altogether). I remember seeing somewhere (either in the Linux kernel source or the FreeBSD sys source) that Promise has a limit of 12 drives per system, with 8 of them in DMA mode and the remaining 4 in PIO mode with some tweak (burst?). So for a big raid like that, an IDE raid card (either 3ware's or HighPoint's) is recommended. Using a hardware raid IDE card also has the benefit of being able to hot swap the drives with the case mentioned above.
gd
Would be to replace the 4 controllers, and the monster case. Use a more "standard" chassis. Slap a regular SCSI card in it. And then for the drives themselves, use an UltraTrak100 TX8 to hold the drives.
It just seems like a far cleaner solution. Not to mention FAR more expandable. And works out to be about the same price.
"Politicians are interested in people. Not that this is always a virtue. Fleas are interested in dogs." P.J. O'Rourke
Storage solution: 1TB RAID5 storage array (Prices are from Pricewatch)
Item: Quantity x Price = Subtotal
Intel Celeron 700 MHz w/ Socket 370 MB, UDMA 100, AGP VIDEO 8~64MB shared only, Sound, 56K AMR Modem, 10/100 Network in MidTower case w/Powersupply: 1 x $135.00 = $135.00
Power Magic PCI IDE U/ATA100 RAID Controller w/Cable: 4 x $22.00 = $88.00
Maxtor 4G160J8 5400/133: 8 x $259.00 = $2,072.00
60.0GB EIDE Ultra DMA 5400: 1 x $85.00 = $85.00
Total: $2,380.00
- Mangoless
[a mango-free monkey]
Get a 3ware Escalade card. In March they'll support 48-bit LBA in the new firmware, and you'll be able to hook up those 160GB monsters in raid-0 (or raid-5) with a tenfold increase in performance, without taking up all the PCI slots.
the TX2 is a nice little card, but you can only use 2 drives per board for getting the "full speed" (else if you use master/secondary, 4 drives will give you the raid speed of 2 in stripe) and then you'd have to stripe your raid-0 drives in software. Instead of wasting PCI slots and using an underperforming card, you pay a couple of bucks more and you get the real thing with full speed and hardware raid5.
There are a lot of raid benchmarks at storagereview.com as well. IDE raid is so damn cheap.
--- Metamoderating abusive downgraders since my 300th post.
They also spec'd the motherboard as an "A7B266-D". I'm guessing this is the A7M266-D, as there is no A7B266-D (no one else is even considering manufacturing an SMP Athlon chipset besides the forthcoming Micron Scimitar)
It seems to me like this is a rather poorly thought out spec. Why are they using 4 FastTrak100 TX2s when they could use 2 FastTrak100 TX4s? Which of course brings up another point, why are they even using FastTraks? Under Linux the FastTrak driver is quite immature, and last time I used it only worked with 2.2 kernels, which hinders the ability to use filesystems like XFS. Also, the FastTrak cards are essentially software RAID as they offload the work of calculating the stripe locations onto the host CPU. There's no point in using md to combine multiple FastTrak arrays.
Many people were mentioning the 3Ware Escalade. It is a relatively good card, but for a home storage array Linux md + XFS might be a better choice. (Also note that the advantages of 64-bit PCI couldn't be had with the A7M266-D as it doesn't include any 64-bit PCI slots. Perhaps the Tyan Tiger would be a better choice for a 3Ware solution.) My recommendation would be 3 Promise Ultra133 TX2 controllers. The read and write performance on an Escalade 7410/7810 is appalling. With the embedded processor on the 7450/7850 (R5Fusion Technology, as 3Ware calls it) the performance exceeds that of software RAID, but at a much higher price, of course. I think the goal here is bulk storage and not performance, and the ATA133 controllers are by far the cheapest solution.
For more information on IDE RAID under Linux, check out this site. Its information is a bit dated at this point, but I used it for my home storage server and haven't regretted it. With 5 7200RPM drives on Promise Ultra100 controllers and Linux md RAID-5 w/ XFS, my bonnie++ scores are 90/30 MB/s for sequential read and write, respectively. I couldn't be happier. This site also has benchmarks showing the superior performance of software RAID over a hardware solution with a 3Ware card.
And there were a few other things people seemed confused about. No one in their right mind would put more than one drive per channel for the purposes of a performance RAID. That's just foolish. As for the limitation of being unable to access both the primary and secondary IDE channels simultaneously, this limitation was removed years ago with the introduction of EIDE.
As far as everything else goes, I'm a SCSI bigot. I have SCSI drives in my workstations and I couldn't be happier. However, IDE RAID is a very economical solution for a home user, often with performance on par with that of more expensive SCSI RAID solutions.
To conclude, this article seems very poorly researched and documented. Had they actually attempted to build this beast and failed, perhaps I would've been more amused. However, as it stands it's an overpriced specification which uses incompatible parts, and little research has been done on the optimum parts for the configuration.
Actually, when the Human Genome first got online, I downloaded the thing as an 800MB zip file. Because I could. It was only a few gigs uncompressed. Unless you needed to store the whole genome for a couple of people (rather than, say, diffs) current tech works fine. Hrm, a little odd knowing that the whole Human Genome is only about four or five times the size of a Divx movie.
autopr0n is like, down and stuff.
The bandwidth is pretty good, but it's the latency that'll kill you.
autopr0n is like, down and stuff.
Ok. This is just inane. Why build this when someone has already done it better for cheaper?
http://www.raidweb.com
We purchase their 8 disk IDE RAID arrays. They are hot swap, support RAID 0, 0+1, 1, 3, 5, and hot spare, have dual failover power supplies, come with 64MB cache, which can be upgraded. Configurable via the EZ front LCD display, or via serial console. They support ATA-100, and ATA-133 coming shortly. Software upgradable, and it runs Linux.
The array (sans disks) runs us $3200. They even have versions that have dual fiber ports out the back.
WARNING - DO NOT purchase these with IBM GXP75 (75GB) disks like we did... we have about 80 of them that failed.
I could be mistaken, but I didn't know that one could hot swap IDE devices. I thought they didn't really take kindly to you pulling them out of a running system. That means that you end up having to power down your system each time you want to take a backup home.
Therefore 57MB required per human
You still need indexing information. You need to spec where those differences occur.
autopr0n is like, down and stuff.
One base always matches up with the same one. Cytosine with Guanine (CG), Adenine with Thymine (AT) and the reverse (GC, TA). So you only need to record half of the pair.
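Since only one strand has to be recorded and there are four possible bases, each position fits in 2 bits. Here is a minimal sketch of that packing; the function names and the particular bit codes are made up for illustration and are not any standard genome file format:

```python
# Hypothetical 2-bit encoding; any fixed assignment of codes would work.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def pack(strand: str) -> bytes:
    """Pack a string of A/C/G/T into 2 bits per base, 4 bases per byte."""
    out = bytearray()
    for i in range(0, len(strand), 4):
        chunk = strand[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        byte <<= 2 * (4 - len(chunk))  # pad a short final chunk with zero bits
        out.append(byte)
    return bytes(out)

def unpack(data: bytes, length: int) -> str:
    """Reverse of pack(); length is needed to drop the padding."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:length])

def other_strand(strand: str) -> str:
    """The pairing partner of each base, position by position."""
    return "".join(COMPLEMENT[b] for b in strand)

packed = pack("GATTACA")
print(len(packed), unpack(packed, 7), other_strand("GATTACA"))
```

At 2 bits per base, a sequence of 3 billion bases comes to 750 million bytes before any further compression.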
autopr0n is like, down and stuff.
I just built a similar setup -- 500GB for less than $2,900. However, I made some different design choices.
First of all, I wasn't too impressed with the Promise controller, so the choice for me was between the 3Ware 7850 and the Adaptec 2400A. The Adaptec had the best overall performance, but the 3Ware is close and can support 8 devices. For the hard drives, I wanted to come reasonably close to SCSI performance, so I chose the WD1000JB drive with the on-board 8MB buffer. I used a Tyan Tiger K7 with 64-bit PCI for the motherboard with dual Athlon XP (not MP) 1700+ CPU's plus 1GB ECC registered PC2100 DDR RAM. Put them all in a nice aluminum rackmount case.
I'll probably replace the motherboard with the newer Tyan with 66MHz PCI bus in the near future and use the current one in a workstation. I'll also drop in more RAM if/when prices drop.
It's been pretty sweet so far with LVM + XFS. My backup solution is a 33GB tape drive, so I spend most of every Sat. backing up the array. Time and money permitting, I'll build a second one and look for a DLT tape library on ebay.
and if you use software RAID via win2k
PLEASE do not ever use software RAID on a production file server! Esp. Win2k's implementation of software RAID!
We used to run software RAID on a file server (serving only 10 Macs, mind you!), both with 4 x 9 gig SCSI drives (a while ago) and with 4 x 30 gig IDE drives.
Everything runs OK until you need to replace one of the drives; then the performance whilst rebuilding absolutely sucks!
I've seen the system take over 12 hours of production time to rebuild a 90 gig software RAID; all that time, performance for network users absolutely sucked!
The solution: good quality hardware RAID. We now run a Compaq 5200 hardware RAID card and all Compaq drives. I can pull a HDD out right now, put a new one in and have the RAID rebuilt without any network user noticing....
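For scale, the rebuild described above (90 gig in over 12 hours) works out to a very low effective rate. A quick sketch of the arithmetic, assuming decimal gigabytes and treating 12 hours as a lower bound on the time:

```python
def effective_rate_mb_per_sec(array_gb: float, hours: float) -> float:
    """Average throughput needed to cover array_gb in the given time."""
    return (array_gb * 1000) / (hours * 3600)

rebuild_rate = effective_rate_mb_per_sec(90, 12)
print(f"at most about {rebuild_rate:.2f} MB/s averaged over the rebuild")
```

Since the rebuild took more than 12 hours, the real average was lower than this figure.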
I can't remember exactly right now, but Celera's storage was something like 100TB, wasn't it? Of course when you are actually doing the sequencing and annotation of the whole damn thing, you need more space. (of course they weren't using nearly all of it, and it also included stuff to service their "subscription" clients, each one of which would of course get a significant chunk to store their stuff...)
any one have more recent (or more exact) info?
sic transit gloria mundi
Over the years we have put so much of our lives on to the PCs that we would be seriously lost without the archive.
Paul.
You are lost in a twisty maze of little standards, all different.
Paul
You are lost in a twisty maze of little standards, all different.
When you want to recover something a browser lets you traverse the directory tree and tag the files you want. Then Amanda tells you which tapes to mount to recover them. Cool!
Paul.
You are lost in a twisty maze of little standards, all different.
But if you lose your data you lose your business, and no insurance is going to cover that. Years of work goes up in smoke.
Paul.
You are lost in a twisty maze of little standards, all different.
Similarly I read something about a 1GB/hour VHS backup system about 5 years ago. With packs of 5 standard 4 hour VHS tapes costing about £5, that works out to 25 pence a gigabyte - about half the price of CD-Rs.
However even the best video tapes will degrade very quickly compared with optical, computer tape systems, and even IDE hard drives.
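The 25 pence figure above follows directly from the numbers given. A small sketch of the calculation; the function name is just for illustration:

```python
def pence_per_gb(pack_price_pounds: float, tapes_per_pack: int,
                 hours_per_tape: float, gb_per_hour: float) -> float:
    """Media cost per gigabyte for a tape pack, in pence."""
    capacity_gb = tapes_per_pack * hours_per_tape * gb_per_hour
    return pack_price_pounds * 100 / capacity_gb

# 5 tapes x 4 hours x 1GB/hour = 20GB for about 5 pounds
vhs_cost = pence_per_gb(5.0, 5, 4, 1.0)
print(f"{vhs_cost:.0f}p per GB")
```

This counts media only; it ignores the cost of the recorder and any capacity lost to error correction.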
As far as the 40/80 GB max on DLT, IBM and Compaq both offer larger backup solutions, LTO and Super DLT. Compaq has embraced Quantum's SDLT, which has a capacity of 110/220 GB and a transfer rate of 11 MB/sec (uncompressed). Search speed is roughly 4.5 meters/sec. IBM has embraced LTO, which uses 100/200GB tapes, has a transfer rate of 15 MB/second (uncompressed, and a >2x increase over 40/80 DLT at about 6 MB/second uncompressed), and has an on-tape chip which can hold an index of all the files on the tape for easier retrieval. The search speed on LTO is about 6 meters/sec.
Now, all of this is useless without being "generally available", so I did a little price-checking. Below are internal single-drive units (no autoloaders), and list price from manufacturers:
Compaq 40/80 DLT Drive (internal) - $3,499.00
Compaq 110/220 SDLT Drive (internal) - $5,590.00
IBM 100/200 LTO Drive (internal) - $3,999.00
Just wanted to point out that there are other options.
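To put those drive specs next to a terabyte array, here is a rough sketch of how many tapes and how many hours a full backup would take at the native (uncompressed) figures quoted above. The 1TB data size and the assumption that the drive streams at its rated speed the whole time are illustrative only:

```python
import math

# (native capacity in GB, uncompressed MB/s) as quoted in the post above
FORMATS = {
    "DLT 40/80": (40, 6.0),
    "SDLT 110/220": (110, 11.0),
    "LTO 100/200": (100, 15.0),
}

def backup_plan(data_gb: float, native_gb: float, mb_per_sec: float):
    """Return (tapes needed, hours of streaming) for a full backup."""
    tapes = math.ceil(data_gb / native_gb)
    hours = (data_gb * 1000) / mb_per_sec / 3600
    return tapes, hours

for name, (capacity, rate) in FORMATS.items():
    tapes, hours = backup_plan(1000, capacity, rate)
    print(f"{name}: {tapes} tapes, about {hours:.1f} hours")
```

Tape changes and anything that keeps the drive from streaming would add to these times.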
49 20 68 61 76 65 20 74 6F 6F 20 6D 75 63 68 20 66 72 65 65 20 74 69 6D 65 2E
Sam's Club in the area recently had a WD 100GB IDE hard drive on sale for $120 after rebate. 1TB at that rate is ~$1320, plus a few hundred for the motherboard, processor, memory and extra controller cards, and a TB server is within reach of an 18-year-old who saved his paper-route money.
The real question is: how long will it take to listen to all those mp3s? At some point, extra storage just isn't practical because you can't fill it fast enough.
Do the raid controllers emulate being scsi hosts, run off OS drivers (=likely windows ones), etc?
Yes. As far as the OS is concerned, each raid controller is one big giant SCSI disk. There are no master/slave disks, each controller has 8 ports for 8 drives - again, only one drive per channel, no master/slave.
The IDE RAID controller is the thing that makes this work. It takes care of all the issues you mentioned (drive limitations, booting, speed issues, etc). But since you can only have 8 drives on a single controller, you put in multiple controllers. With 3 controllers, you can get 24 drives. At 120GB a pop, that's 2880GB. You'll lose some of that to RAID but you're still looking at close to 2TB. Then you do a software raid 0 on the 3 drives (as far as the OS is concerned, you have three huge scsi disks) and you can create one giant partition with very acceptable performance.
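The capacity numbers above can be sketched as follows. How much is lost to RAID depends on the layout, so the one-parity-drive-per-controller RAID 5 and the optional hot spare here are assumptions, not something the post specifies:

```python
def usable_gb(controllers: int, drives_per_controller: int, drive_gb: int,
              parity_drives: int = 1, hot_spares: int = 0) -> int:
    """Usable space when each controller runs its own array and the
    controllers are then striped together (RAID 0) in software."""
    data_drives = drives_per_controller - parity_drives - hot_spares
    if data_drives < 1:
        raise ValueError("no drives left for data")
    return controllers * data_drives * drive_gb

raw = usable_gb(3, 8, 120, parity_drives=0)   # 24 drives, no redundancy
raid5 = usable_gb(3, 8, 120)                  # one drive's worth of parity each
raid5_spare = usable_gb(3, 8, 120, hot_spares=1)

print(raw, raid5, raid5_spare)
```

With a hot spare on each controller the total gets close to the 2TB mentioned above.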
-Ryan, with the unoriginal sig
I guess the simple truth is that now that 100 gig drives are a couple hundred bucks, we now have the ability to store anything we reasonably could need (unless you define "Reasonable" as "I need to store DNA Sequences").
Doesn't "640k ought to be enough for anybody" suggest that Bill Gates once felt the same way about RAM?
Of course, visionary that he is [snicker!], there's no way he could have imagined desktop machines being used to edit video.
Likewise, who knows how big and bloated Clippit The Office Paperclip can get if we have 100 gigs of hard disk space to burn... maybe, one day, he'll actually bear consultation when you need information, instead of when you need something to laugh at.
I love calculus so much, I want to give it to everyone! Come, get some integration!MmMMmmm... calculus. Hours spent in the dentist's chair, with him scraping hard crusties off my teeth... And you're just giving that stuff away?
Fire and Meat. Yummy.
I'm surprised no one has mentioned this, but Promise has become more and more Linux-unfriendly lately.
There are different minor revisions of the 100Tx2 controllers; you can only tell by looking at the chip on board, I think only the last digit is different. I could not get the latest ones working with Linux at all. I ended up buying these boards under the Maxtor brand name (same units, but slightly older), which had the older chip set.
On the latest boards, it seems Promise appears to have intentionally made certain registers read only, thwarting open source driver development.
With that kind of behaviour, I'm staying away from Promise controllers, period. (I also had a hard time with their Raid5 controllers.)
Back when they were Linux-friendly, their ATA100tx2 cards were nice. But with the latest incompatible chipsets and no help from the company, forget it.
I also had some frustration with Adaptec's 2400 controller. It is *still* only supported by Adaptec under RedHat 7.0. And it has no audible alarm for drive failure, most annoying. Finally, under FreeBSD 4.3, its performance was abysmal; there was definitely something wrong with the I2O driver working with this card. (I haven't tried 4.4 yet.)
For now, I'm just sticking with motherboard IDE controllers; far more tried and true.
-me
Love many, trust a few, do harm to none.
If you do store DNA sequencing information, make sure you only use lossless compression.
Or, for that matter, the issue for me is backup capacity again. With the advent of DVD-R (or whatever it's called today) I thought that "full backups" were going to be possible again. But now, with such vast quantities of data possible to have online and changing, backup issues again come to the fore.
Lossless compression helps, but now I'm stuck writing not 50% of a 4-Gig tape over the weekend, I have to write two or three full tapes.
As memory and disk space have become cheaper, bloat-ware uses more and more of them. I don't consider bloat-ware a good thing, but it cannot be fought any more than the monster shopping mall can be fought just because I happen to like mom and pop shops.
The difference between information and data, I guess. The next great invention I think will be the personal digital secretary, like the ones detailed by Daniel Keys Moran in his wonderful "books of continuing time", designed to sift through the impossible quantities of data yet still have the personal touch to say "Gee, that bit over there looks interesting. I think Bob would like that."
Bob-
The Ludwig von Mises Institute. The reasoning individuals economics
For about $20-30, you can get disk drive drawers that turn a 3.5" drive into a 5" removable drive. Nothing active; it's just a bunch of mounting hardware. (About $20 for the part that stays in your machine and $10/disk for the removable drawer parts.)
This makes it easy to use disk drives as backup media, which is good, because they're much faster than tape. It also makes it easy to upgrade your disk capacity when you want to do that.
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
Fibre Channel hardware tends to be a little too expensive for the $5k crowd. This guy is using commodity hardware, and that generally doesn't include fibre channel. Even if he bought the hardware, the driver support for something like a SAN just doesn't exist in Windows/Linux/BSD yet.
I read the internet for the articles.
You don't say 1.024k bytes, you say 1k bytes and expect the listener to know that about 1000 is exactly 1024 due to the context. If 1k bytes were always 1024 bytes, how would you interpret 14.112k bytes?
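The 14.112k example above can be checked directly: read with k = 1000 it is a whole number of bytes, while read with k = 1024 it is not. A small sketch using exact decimal arithmetic to avoid floating-point rounding; the helper names are made up for illustration:

```python
from decimal import Decimal

def bytes_for(kilo_value: str, multiplier: int) -> Decimal:
    """Interpret 'kilo_value k bytes' with the given meaning of k."""
    return Decimal(kilo_value) * multiplier

def is_whole(x: Decimal) -> bool:
    """True when x has no fractional part."""
    return x == x.to_integral_value()

as_decimal_k = bytes_for("14.112", 1000)
as_binary_k = bytes_for("14.112", 1024)

print(as_decimal_k, is_whole(as_decimal_k))
print(as_binary_k, is_whole(as_binary_k))
```

Only the k = 1000 reading gives a byte count without a fraction, which is the point of the question.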
3/4" pipe is 1.050" Outside Diameter.
The 3/4" refers to an Inside Diameter of a pipe with a particular wall thickness (which may or may not still be made). Regardless of how thick the walls are, and consequently what the Inside Diameter really is, 3/4" pipe is 1.050".
IIRC there is something about a US bushel being a different volume depending on what is being measured.