Slashdot Mirror


The Amazing $5k Terabyte Array

An anonymous reader writes: "Running out of space on your local disk? How about a Terabyte array for only a few thousand dollars? This article at KCGeek.com shows how to put together 1000 Gigs of hard drive space for the cost of a few desktop computers." I could rip my entire anime collection for instant access! Rip all my CDs and still have .9 Terabytes left! Maybe Mirror Usenet! I guess the simple truth is that now that 100 gig drives are a couple hundred bucks, we now have the ability to store anything we reasonably could need (unless you define "Reasonable" as "I need to store DNA Sequences").

11 of 448 comments

  1. Actually by IAmATuringMachine! · · Score: 5, Interesting

    Actually a DNA sequence is only about 3GB for a human - your anime DVDs might take more space, at least until you compress them. Then again, DNA should be fairly trivial to compress highly. Let Z = CA, Y = TG, .....
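
To make the compression idea concrete: with only four bases, each base fits in 2 bits, so four bases pack into one byte. The sketch below shows that fixed-width packing, which is a simpler relative of the digram substitution suggested above rather than the same scheme. The names `pack_bases` and `unpack_bases` are made up for this illustration, and the code assumes the sequence contains only A, C, G and T.

```python
# Hypothetical helper names; assumes the input contains only A, C, G, T.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack_bases(seq: str) -> bytes:
    """Pack a DNA string into bytes, 2 bits per base, 4 bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        # Left-align a short final chunk so unpacking reads from the top bits.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack_bases(data: bytes, length: int) -> str:
    """Recover the first `length` bases from packed bytes."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:length])
```

At 2 bits per base, roughly 3 billion bases come to about 750 MB before any further compression is applied.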

    --
    "Computer Science is no more about computers than astronomy is about telescopes."
    -E. W. Dijkstra
    1. Re:Actually by dNil · · Score: 5, Insightful

      You are correct that the human genome is "only about" 3 giga basepairs of sequence, but to store only that would be rather egocentric. As of Dec 3 2001 there are some 14396883064 bp in GenBank, and the amount of sequence information still grows roughly exponentially.


      Now, this will not hit the TB line anytime soon. The trouble starts if you are involved in genome sequencing. Then you need to store the raw data for all that sequence. Each stretch of some 450 bp of sequence is reconstructed from about 5 - 10 different fairly high resolution gel images (in the ballpark of 150 KiB per image). Also, recall that even short stretches of the sequence can be accompanied by a lot of annotating information, such as names and functions of genes, regulatory elements or pointers to articles explaining the experimental evidence for such. This multiplies the storage requirement by quite a factor - nothing a neat little linux box with a huge RAID-array cannot handle though. That's how we handle the sequencing data from Trypanosoma cruzi, by the way.
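
A rough back-of-envelope version of the raw-data estimate above, as a sketch only. It uses the figures from the comment (about 450 bp per reconstructed stretch, 5 to 10 gel images per stretch, about 150 KiB per image) and assumes a single pass over the sequence with no overlap between stretches and no annotation data, which a real sequencing project would not get away with. The function name `raw_image_bytes` is hypothetical.

```python
KIB = 1024
TIB = 1024 ** 4


def raw_image_bytes(base_pairs, bp_per_read=450, images_per_read=5,
                    kib_per_image=150):
    """Estimate bytes of gel images needed to cover `base_pairs` once.

    Assumes one pass with no overlapping stretches (a simplification).
    """
    reads = -(-base_pairs // bp_per_read)  # ceiling division
    return reads * images_per_read * kib_per_image * KIB


# Low and high estimates for a 3 giga basepair genome.
low = raw_image_bytes(3_000_000_000, images_per_read=5)
high = raw_image_bytes(3_000_000_000, images_per_read=10)
print(f"{low / TIB:.1f} to {high / TIB:.1f} TiB of raw images")
```

Even under these optimistic assumptions the raw images run to several TiB for 3 giga basepairs, which is where the terabyte-class array comes in.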

  2. Need for memory/storage by morie · · Score: 5, Funny
    I guess the simple truth is that now that 100 gig drives are a couple hundred bucks, we now have the ability to store anything we reasonably could need (unless you define "Reasonable" as "I need to store DNA Sequences"). - slashdot

    Nobody should ever have need for more than 640 kB of RAM - Bill Gates

    Similarities anyone?

    --
    Sig (appended to the end of comments I post, 54 chars)
  3. 3Ware Escalade IDE-RAID by rdl · · Score: 5, Interesting

    I've been using these for a long time (6200 dual-port in hardware-mirror, up to the 8-port cards for large disk configs), and they're very fast and reliable. Cheap, too.

    $500 for an 8-port 64-bit RAID controller, looking to the host like a single SCSI device per logical volume, seems like the best deal available. Add a motherboard with sufficient slots for gig-e and these cards (easy to get 4 64-bit slots...maybe you can get more with 3-4 buses) and a 4U rackmount case with 16 drive bays, and you can have 4U of rackmount storage for $5k, too.

    I've been using setups like this for clients, as well as for private file storage (divx, mp3, backups, etc.), and know of people using them for USENET news servers (one of the most demanding unix apps for reasonably priced hardware).

    It goes without saying you want a journaled file system or softupdates when you have disks this size, and ideally keep them mounted read-only, and divided into smaller partitions, whenever possible. e2fsck on a 300GB partition with hundreds of open files is painful.

  4. Great! Where's the backup solution? by danimal · · Score: 5, Insightful
    I would rather spend the money on good disk storage with an integrated or integral back-up solution. Why? Well, as cool as all that storage is, what happens when it goes *poof* and you can't get it back? You're screwed.

    Yes, this is a groovy/geeky/cool solution for under your desk, but at least spend the extra dollars for a SCSI card and tape backup unit. You could fit the whole thing on a few DLTs. You can also keep incremental backups to keep the tape swapping to a minimum.

  5. A much better article, also pointed to by /. by Thagg · · Score: 5, Funny

    Check out this article referenced by slashdot on July 20 2001.

    The nice thing about this article is that the people building it at SDSC really took extreme care in getting quality components that would work together to build a reliable, solid system, and still didn't spend more than $5K for a terabyte file server. In particular, the tradeoff of disk speed vs. power consumption was extremely insightful.

    I built one of these to their spec for my company, and I couldn't be happier. It's worked flawlessly since then. It's not clear if the Escalade boards are still available -- 3ware had said that they were discontinuing them, but they still appear to be for sale.

    thad

    --
    I love Mondays. On a Monday, anything is possible.
  6. 2TB for $8300 by GigsVT · · Score: 5, Interesting

    Inspired by Slashdot's earlier story that was nearly identical, and with the help of Peter Ashford from ACCS, we built two servers, both with capacities well over a TB, for around $8000 each. They have the capacity to expand to 3TB if need be.

    Story here

    As far as performance:
    (from my memory)
    EXT3: About 16MB/sec block write, 45MB/sec block read
    ReiserFS: About 20MB/sec block write, 130MB/sec block read (that's no typo).
    XFS: About 30MB/sec block write, 85MB/sec block read.

    It seems that the file system plays a large role in performance. The storage is three hardware RAID5 arrays, with Linux software RAID0 on top of them to tie them together.

    IDE RAID controllers are 3ware Escalade 7810. Write performance can be greatly increased by using 7850 cards that have more cache.

    We stuck with XFS. ReiserFS had a big-file bug: creating files over 2GB would basically lock up the computer. XFS in general seemed much more mature; ReiserFS seems more like someone's college thesis project that they never cleaned up to be production grade.

    We experimented with different RAID0 stripe sizes; the hardware RAID5 stripe size is fixed at 64k. There are 7 active disks in each array and one hot spare. Stripe size tweaking seemed mostly to trade off read speed for write speed within a certain range of values, with performance tapering off at either extreme (down around 8k stripes, or over 1024k stripes).

    We eventually went with 1024k stripes. That is what the benchmarks above reflect. The variance in file system performance could very well be due to interactions with stripe size, but there seemed to be common themes (ReiserFS always read fastest no matter what stripe size, XFS was always better at writes).
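
For readers who want to see what the layout described above implies, here is a simplified sketch of how a RAID0 layer spreads a logical byte offset across its members, plus the usable-capacity count for three RAID5 arrays of 7 active disks each. It assumes the "stripe size" in the post means the per-member chunk size and that all three members are the same size; it is not the Linux md driver's code, and the function names are made up for this illustration.

```python
CHUNK = 1024 * 1024  # the 1024k RAID0 stripe (chunk) size chosen in the post
MEMBERS = 3          # three hardware RAID5 arrays striped together


def raid0_locate(offset, chunk=CHUNK, members=MEMBERS):
    """Map a logical byte offset to (member index, byte offset on that member).

    Simplified: assumes equal-sized members and round-robin chunk placement.
    """
    chunk_index, within = divmod(offset, chunk)
    stripe_row, member = divmod(chunk_index, members)
    return member, stripe_row * chunk + within


def usable_disks(arrays=3, active_per_array=7):
    """RAID5 spends one disk's worth of space per array on parity.

    Hot spares are not counted as active disks.
    """
    return arrays * (active_per_array - 1)


print(raid0_locate(0))              # first chunk lands on member 0
print(raid0_locate(CHUNK))          # next chunk moves to member 1
print(raid0_locate(3 * CHUNK + 5))  # fourth chunk wraps back to member 0
print(usable_disks())               # data disks across the three arrays
```

Because consecutive 1024k chunks land on different RAID5 arrays, a large sequential read keeps all three controllers busy at once, which fits the read/write tradeoff the post reports when the chunk size changes.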

    I have been in so many arguments with SCSI zealots on here over this RAID... I wish people would understand what price/performance ratio means. IDE isn't a superior technology, but every now and then, it is the right tool for the job, when price is a goal too.

    --
    I've had enough abrasive sigs. Kittens are cute and fuzzy.
  7. Re:Great! Where's the backup solution? by Paul+Johnson · · Score: 5, Insightful
    Absolutely. And to those who say "Just build another one" / "RAID doesn't need backup", I have only one thing to say:

    FIRE!

    Any serious data store needs to include a backup system which allows for copies off-site. Fire is the obvious risk of course, but floods, vandalism and lightning strikes are all possibilities.

    AFAIK the only generally available tape backup for something this big is DLT, which IIRC can now do around 40GB per tape before compression. With the 2:1 compression usually quoted that's 80GB per tape, or around 13-14 tapes for a full backup. So you really need about 30 tapes for a double cycle, and maybe more if lots of the data is non-compressible (like movies). But this stuff ain't cheap. DLT drives start at around £1000 and the tapes cost £55 each. So that's around £2500 = $4200 to back this beastie up.

    Having said that, the possibility of using hot-swappable IDE drives as backup devices is intriguing. Just point your backup program at /dev/hdx3 or whatever. One big advantage is that if your tape drive gets cooked in the server-room fire you don't have the risk of tapes that can only be read on the drive that wrote them. A Seagate 5400RPM 60GB drive costs £110, which is only a third more per megabyte than a bare DLT tape. Two cycles-worth of backup (34 drives) would be £3,700. And you can probably do better by shopping around. For servers with only a few hundred GB on line this might well be more cost-effective than buying a DLT drive.
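
The tape and disk figures in the two paragraphs above can be reproduced with a few lines of arithmetic. This is only a sketch: it takes the terabyte as 1000 GB (as in the story), uses the prices and capacities quoted in the post, and the function names are invented for the example.

```python
import math


def media_needed(data_gb, media_gb, cycles=2):
    """Number of tapes or drives for `cycles` full backups of `data_gb`."""
    return math.ceil(data_gb / media_gb) * cycles


def backup_cost(data_gb, media_gb, media_price, drive_price=0, cycles=2):
    """Total cost in pounds: optional tape drive plus all the media."""
    return drive_price + media_needed(data_gb, media_gb, cycles) * media_price


# DLT at 80GB per tape (with 2:1 compression), £55 per tape, £1000 drive.
tapes = media_needed(1000, 80)
tape_total = backup_cost(1000, 80, 55, drive_price=1000)

# Hot-swappable 60GB IDE drives at £110 each, no separate drive to buy.
disks = media_needed(1000, 60)
disk_total = backup_cost(1000, 60, 110)

print(tapes, tape_total, disks, disk_total)
```

The exact tape count comes out a little under the "about 30" in the post, which leaves headroom for data that does not compress at 2:1.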

    We use Amanda to do backups here. It's a useful program, but it can't back up a partition bigger than a tape. So you need to think carefully about your partition strategy. (Side note: you can use tar rather than dump to break up over-large partitions, but it's still a pain).

    Suddenly that terabyte starts looking a bit more expensive.

    Paul.

    --
    You are lost in a twisty maze of little standards, all different.
  8. Ouch! $160GB disks! by jandrese · · Score: 5, Insightful

    Ironically, I just built something very similar to this a few weeks ago (it runs great BTW), but I spent <$1500US on all the components. The biggest thing you have to watch out for is the hard drives. I went for the ones with the best bang/buck ratio at the time (Maxtor 80GB 5400RPM drives). This let me build a system with well over 1/2 a Terabyte of usable space at a fraction of the cost. Additionally, the slower drives require less power and less cooling, making them easier to fit in a standard full tower case with a merely beefy (as opposed to server-class) power supply. I think the processor requirements he stated were a little overboard as well. I've found that disk access tends to be limited by the PCI bus (it doesn't help that I used an older motherboard with 33 MHz 32-bit PCI), especially on writes where you can spread data across the write cache on the drives. Be careful when you build an array like this: ATA *hates* having access to both a master and a slave drive at the same time. Be sure to avoid having two disks on the same plex on the same controller. Fortunately this was natural for me, since I was building two plexes, a "backup" and a "media" plex.
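
To put a number on the PCI bottleneck mentioned above: the theoretical peak of a PCI bus is its clock rate times its width, and that peak is shared by every card on the bus. The sketch below uses the nominal 33.33 MHz clock; the 30 MB/sec sustained rate per drive is a made-up illustrative figure, not a measurement of the Maxtor drives in the post, and the function names are hypothetical.

```python
def pci_peak_mb_s(clock_mhz=33.33, width_bits=32):
    """Theoretical peak PCI bandwidth in MB/sec (clock times bytes per cycle)."""
    return clock_mhz * width_bits / 8


def drives_that_fit(bus_mb_s, drive_mb_s):
    """How many drives can stream at once without exceeding the bus peak."""
    return int(bus_mb_s // drive_mb_s)


old_bus = pci_peak_mb_s()            # 33 MHz, 32-bit
wide_bus = pci_peak_mb_s(66.66, 64)  # 66 MHz, 64-bit

# 30 MB/sec per drive is an assumed, illustrative sustained rate.
print(round(old_bus), drives_that_fit(old_bus, 30))
print(round(wide_bus), drives_that_fit(wide_bus, 30))
```

Real throughput sits below the theoretical peak because of bus arbitration and other devices sharing the bus, so the drive counts here are upper bounds.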

    A final word of warning: Promise ATA100 TX2 controllers may look like a natural choice for a server like this, but they only support UDMA on up to 8 drives at once, and Promise's tech support only supports a maximum of 1 (one!) of their cards in any system.

    --

    I read the internet for the articles.
  9. Mirror Usenet? by Matey-O · · Score: 5, Funny
    Maybe Mirror Usenet!
    Well, exclude the binaries and I can mirror USENET on my Palm III!
    --
    "Draco dormiens nunquam titillandus."
  10. Re:RAID by edmudama · · Score: 5, Insightful

    > He's trying to use software raid, but he has 4
    > Promise FastTrak 100TX2 raid controllers. WTF?
    > First off, each of those cards supports 4
    > drives on 2 channels... Why does he need 4
    > cards when he only has 8 drives? He only needs
    > 2 cards.

    I'm a firmware engineer for Maxtor... if you're going for performance, you want 1 drive on each bus, and you don't want to use the motherboard connectors. With 2 drives on each bus, you are limiting the average transfer rate out of cache to 50% of the max transfer rate. With modern drives and their 60-65MB/sec channel rates, you cannot stream sequentially off of 2 drives without saturating an ATA-100 cable. Even running ATA-133 won't help starting a year from now.
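
The saturation argument above is simple arithmetic, sketched here with the channel rates quoted in the post and the nominal 100 and 133 MB/sec interface limits of ATA-100 and ATA-133. The function name is hypothetical, and the calculation ignores protocol overhead, so real headroom is smaller than shown.

```python
def bus_utilisation(drives_on_bus, drive_mb_s, bus_mb_s):
    """Fraction of the cable's nominal bandwidth the drives ask for.

    A value above 1.0 means the drives together outrun the cable.
    """
    return drives_on_bus * drive_mb_s / bus_mb_s


for drive_rate in (60, 65):
    for bus_name, bus_rate in (("ATA-100", 100), ("ATA-133", 133)):
        for drives in (1, 2):
            load = bus_utilisation(drives, drive_rate, bus_rate)
            status = "saturated" if load > 1.0 else "ok"
            print(f"{drives} x {drive_rate}MB/sec on {bus_name}: "
                  f"{load:.0%} ({status})")
```

Two drives at these rates overrun ATA-100 and only just fit within the nominal ATA-133 limit, so any further increase in channel rate pushes that case over as well.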

    Additionally, every BIOS I have looked at sucks in terms of performance. In most cases they have small DMA FIFOs which stutter the pipe during high speed transfers -- they literally hang the DMA lines while they empty their FIFO into memory, then come back and grab another 8 words or something sad. They also tend to be very poor managers of the IRQ line. This causes delays at times when your hard drive could be giving you more data, but the host hasn't gotten around to asking for it yet.

    All the 3rd party cards have like 2Kbyte FIFOs which prevents any overrun from occurring, which alone is quite helpful in high bandwidth applications.

    The cards we include with our drives are in the lower end of Promise's spectrum... you can spend more and get more performance if you want to, which is what I suspect the author of the original article did.

    --eric

    --
    More data, damnit!