Bulk Data Storage For The Common Man?
Vigyaan writes "Lately, I have been looking into different bulk data storage options available to a common man. My work
depends on generating, storing and analyzing a
large amount of data -- averaging about 1 TB per
month. I would like to have a storage system which is automated, fast, reliable
and most importantly does not cost the price of an
eye. Right now, I have a 4 node Linux cluster with
10 large hard disks (total capacity 1.6 TB); data storage roughly costs
about $0.60/GB (excluding the cost of PC
hardware). But long term storage is painful -- DVDs
cost about $0.10-$0.15/GB but take too much human time
and leaving data on hard disks makes me nervous
because of possible failures. RAID is a possibility, but it increases the cost significantly. I was wondering if
Slashdot readers have any recommendations for a
cheap automated way to store and retrieve data."
I'll send you a couple.
http://www.fsckin.com/
Floppy disks.
"..hard disks makes me nervous because of possible failures." everything can possibly fail. I do not know the failure rate of DVDs, but I suspect it is very low and comparable to other mediums. Anyone have a figure on this?
You're always going to get a better rate with hard drives, but they're going to be prone to failure.
If you buy them in bulk you can save.
Burning DVDs is going to take you forever and drive you nuts.
Find a hot-swappable set of drives and use that for your offline backups. Use a RAID array for your current backups.
"Not my manner of thinking but the manner of thinking of others has been the source of my unhappiness." - M
Hook yourself up with as many gmail accounts as you can. Email yourself 1GB chunks of data whenever you need to back up your stuff.
Blu ray based dvd burners.
Those will be sweet =)
The sea changes color, but the sea does not change.
PRINTSCREEN should do the trick.
Logic, macros, and more
On the subject of RAID please remember, if it's spinning, it ain't a backup!
For long term storage, how do you feel about firewire drives? Maybe not as cheap as you'd like, but you can get them in >160 gig flavors, plus you can hook them up to just about anything. Once you do the backup, which'd be a simple copy and paste, you can just unplug the drive and store it in a safe or something.
Again, I'm not sure if that's as cheap as you'd want, but that's a solution I came up with for a similar problem. My company's going to be 3D rendering some stuff that could end up eating 50 megabytes a frame. (Extra data is stored for future refinement... I can go into detail if I've piqued anybody's curiosity.) We can't afford to lose this data, so the Firewire drive approach is what we're considering right now.
"Derp de derp."
I was wondering, if Slashdot readers have any recommendations for a cheap automated way to store and retrieve data."
Although the good ones don't come cheap. I guess this is another case of "pick any two."
KFG
Short of launching his own space probe, the only way for this guy to consume a TB a month of storage is a serious porn habit. Just post your 'content' on Edonkey and it will be available when you 'need' it. You likely only watch them once anyway.
SD
"Who knew something as harmless as willful ignorance could end up having real consequences?"
Buy some inexpensive IDE drives with high storage capacity and use a software raid solution. What kind of budget do you have anyway?
Iomega has a somewhat new backup solution out called a REV drive. It's quite a bit like a hard drive, but removable. Mine holds 90GB compressed, and the transfer speed isn't all that bad. I haven't had the opportunity to test it on Linux, so I can't say it will work with your setup. The drive is under 300USD, and cartridges are about 50USD each.
I got a couple of drawers of old floppy disks. $10 takes 'em. Plenty of bulk.
The Sony "lifetime" warranty may still be good on them too!
Ahh the large amount of data that has X value versus a storage solution...
If your data is worth $20,000.00 then a $2000.00 solution is dirt cheap.
What is your data worth? That is where you need to start; then look at 10-30% of the data's value to work out how much to spend on its storage.
If one month's data was lost forever, how much money would it cost the company? That is the actual dollar amount you should be shopping at.
And that is how I got the company to buy a $20,000.00 1000 tape DLT jukebox.
My data is worth over $100,000 a month and is much smaller than yours in size.
That is where you need to start. Justify your storage costs by figuring out what the data is worth to begin with.
Do not look at laser with remaining good eye.
You build a tower of dual-layer DVD burners. Maybe 8. That'd give you 8x9, 72 gigs per loading session. Although, aren't those 50 gig discs supposed to be out soon? 20 of those would be a TB.
I'm not sure if you've already thought of this, but would it be possible to remove any redundant data, then compress what is left, or would the data need to be left intact and as-is, so that it can be accessed without any fuss should the need arise?
Hope be with ye,
Cyan
Depending on your budget, the appropriate thing to do may be to get an automated DVD burning system to do scheduled incremental backups in duplicate. We used to do that with CDs at an ISP I used to work at. It's unfortunately difficult to search for while not getting people pirating movies, but this is the first thing I found on Google; doubtless there's better out there.
StoneCypher is Full of BS
I haven't seen anyone mention magnetic tape yet. I'm sure it has its drawbacks too, but considering it's still widely used for backup purposes in a commercial environment, it can't be too bad, especially depending on how much a cartridge can hold. It's not the cheapest, but it might be something to look into.
Look for an LTO gen 1 or SDLT220/320 on ebay, with a SCSI connection (some of them are fibre, and I assume you don't want to go there!). Don't forget to pick up some tapes. In general, this sounds like it would work if you plan on doing this for a while, and can leverage the initial investment over months or years.
Capacities are (for the cost of a sub $50 tape):
- LTO1: 100 GB uncompressed
- LTO2: 200 GB uncompressed
- SDLT220: 110 GB uncompressed
- SDLT320: 160 GB uncompressed
If your data is particularly amenable to compression (e.g. database data) you could easily get 3 or 4 to 1 compression with these drives without sinking your CPU utilization.
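To put those capacities next to the 1 TB/month figure from the story, here is a rough Python sketch of how many tapes a month each format would need. The native capacities come from the list above; counting 1 TB as 1000 GB, the function name, and the sample 3:1 ratio are assumptions for illustration only.

```python
import math

# Native (uncompressed) capacities in GB, taken from the list above.
NATIVE_GB = {"LTO1": 100, "LTO2": 200, "SDLT220": 110, "SDLT320": 160}


def tapes_per_month(monthly_gb, native_gb, compression=1.0):
    """Whole tapes needed to hold monthly_gb at a given compression ratio."""
    effective_gb = native_gb * compression
    return math.ceil(monthly_gb / effective_gb)


if __name__ == "__main__":
    monthly_gb = 1000  # assumption: 1 TB counted as 1000 GB
    for name, capacity in NATIVE_GB.items():
        raw = tapes_per_month(monthly_gb, capacity)
        packed = tapes_per_month(monthly_gb, capacity, compression=3.0)
        print(f"{name}: {raw} tapes uncompressed, {packed} at 3:1")
```

Real compression ratios depend heavily on the data, so treat the 3:1 column as a best case rather than something to budget on.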
You want it fast, cheap, reliable, easy, and now, eh? Good luck with that.... Sounds like a request from the PHB...
Blessed are the pessimists, for they have made backups.
It does not sound like your needs are anywhere near that of the 'common man'. You sound more like a power user to me. Sometimes you have to pay for heavy-duty storage as the cost of doing business.
Aren't Write-Once Read-Many solutions ideal for this kind of thing?
If you're not looking for permanent backups, the per-media cost may be prohibitive though.
Tape Drives - Probably the cheapest way to store large amounts of information. The only drawback is that they aren't fast. However, if your hard drives are large enough to hold the data you are currently working on and tapes are used exclusively for backup, then a speed problem shouldn't be . . . a problem.
That's a LOT of pr0n
Best Buy can have you arrested
How much porn can one store?
If you don't have the patience for DVD backups (neither do I), then you're pretty much stuck with RAID. So buck up, spend the extra cash, and set up a storage box or two on the network with one or two terabytes in each. I have a branch of my network set up on gigabit: one box has 250 GB of storage on RAID 1 across two 250 GB drives (this one's for video projects), the other has 160 GB in RAID 0 (my learning system). Works fine and is easy as hell to set up. If I need to add storage I can either add some drives or just add another box. I've thought about using GFS, but I don't know enough about it to implement it yet. Anyone here currently using GFS?
I also reply below your current threshold.
First off, if you aren't already compressing that data, start. You may be able to cut the size down dramatically using compression.
Then backup using tapes just like every other place that has to do backups. Generally do full backups once a week and incremental ones nightly or whatever is necessary based on the data you are working with.
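For what it's worth, the weekly-full / nightly-incremental routine described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the full backup is taken on Sunday, "changed" means a newer modification time than the last backup, and the function names are made up. Real backup software also tracks deletions and renames, which this does not.

```python
import os
import tempfile
import time


def backup_kind(weekday, full_day=6):
    """Return 'full' on the weekly full-backup day, else 'incremental'.

    weekday follows datetime.date.weekday(): Monday is 0, Sunday is 6.
    """
    return "full" if weekday == full_day else "incremental"


def files_changed_since(root, since_epoch):
    """List files under root modified after since_epoch (seconds)."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > since_epoch:
                changed.append(path)
    return sorted(changed)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        old = os.path.join(root, "old.dat")
        new = os.path.join(root, "new.dat")
        for path in (old, new):
            with open(path, "w") as fh:
                fh.write("data")
        last_backup = time.time() - 3600
        # Pretend old.dat was last written a day before the last backup.
        os.utime(old, (last_backup - 86400, last_backup - 86400))
        print(backup_kind(2), files_changed_since(root, last_backup))
```

An incremental run would copy only the files returned by files_changed_since to tape; a full run would copy everything under root.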
i am responsible for providing storage solutions for a mid-sized content creation company which, through version archiving, accumulates near 1-200 GB per day. they require access to their media backups on a rolling basis, so tapes are not an option.
i have found that a Teutonium cluster of 6.5 TB Spongedrives (either Cray or SecreTech are fine) fits the bill nicely. housed in a 15-unit rack server, the amoeba-shaped drives utilize BioLas technology to store data on 6-dimensional Moebius Cilia for a slick seek time of 0.00 ms.
a cluster costs about $45,000 USD but the price should come down in 2004 Q4 when SecreTech launches their new 40-platter blackholium SCSI's.
If I could make this sig kill you, I would.
All forms of media/backups have their own drawbacks... but some aren't as bad as others, and the others often are more accessible.
;-)
Tape: Tapes break, they wear, they have dropouts, take a while to back everything up, can't always access files if you just want to restore something (Different methods vary, folks)... but ultimately, it's cheap when you use DAT because they're a common media. Swap the tapes twice as often (and throw old ones out) if you're paranoid about tape related failures.
Hard Drive: Most common form of backup I see now, mainly for the 1:1 size factor. Yeah, drives fail, too. Sometimes you have a pretty good warning when this is going to happen, sometimes you don't. (My 13GB Maxtor and 40GB IBM Deathstar drives both went *pfft* on reboot.) Get enough of them at once, you could swap out the logic boards if one does fry out. Ultimately, RAID or just simple 1:1 mirroring is probably the most efficient and easy method. Accessing bits and pieces is also easiest under this method. I personally just use an external USB2 case with a 120GB drive in it. Everything I want to back up goes on that drive, and then eventually... DVDRs. I turn off the drive when I don't need it... hopefully prolonging the life of it when I need it most.
DVDR: Not anymore. If we had these new-fangled DVDR discs (+ or -) say... when 2 to 6GB drives were common.... sure... But in addition to hard drives, recovering selective files is easy under this method too... Unless you use a backup program that crunches everything together on the disc in some spanning format. Burn times can be tedious... but it's not bad if you consider the overall amount of data you're putting on the disc. Cheaper than quality-brand name CDRs, though, in terms of price per mega/gigabyte. Only an idiot would trust $0.01-per-disc spindles for long-term backups. Even the longevity of DVDR has yet to be seen...
CDR: I'm not going to bother.
Network: Well, still relies on hard drives and other components... but good if you don't want to saddle one room with a ton of boxes. Simply for space and efficiency... external drive is probably better anyway.
Old fashioned method: Print everything out and keep it in a filing cabinet somewhere. You could always OCR the stuff later.
What kind of idiot moderator is handling this article/item - giving scores of "1" to reasonable suggestions and "2" to stupid/funny ideas like someone's post about the drawer full of floppies that he's willing to sell???
The drive will run about $4000, but the tapes are only around $0.20/GB assuming a 1.5:1 compression ratio. And keeping that assumption, 1 TB of data should only take three or four 200 GB native tapes per month (each holds about 300 GB compressed), so swapping wouldn't be so bad with the single tape drive. An autoloading library would be significantly more expensive, but if you really need automation, that's the way to go.
echo 1tb.txt > /dev/lprn0
The reason you won't find such things on the cheap is because the average person with a PC doesn't even know what a GB is. He simply goes into the store, the sly salesman says "oh, what do you need it for," and then says "well 60-80 gb should be all you ever need."
Now, contrast that to me - my friends shit when they hear I have a 250 gb drive and a 120 gb drive, as well as an extra 60 gb on a networked machine. They can't fathom ever needing that much space. I know that's probably a pittance by Slashdot standards, but it's true :(
The age old problem I've had with RAID is that:
- If the machine gets stolen, there's no backup.
- If the RAID controller shits out and takes a couple of drives with it (uncommon, but has happened), there's no backup.
- If there is a physical disaster and the machines spontaneously combust, then there is no backup.
I don't know if there is a cheap solution for what you want...
If anything, I would say plug hot-swap drives into your machines just for backup, and then take them offsite when the backup is done, as well as using RAID if you can afford it....
I believe that it would be worth the money to invest in a duplication tower at this point, the ones with the mechanized arms, preferably one you could hook up to your computer.
Look at what the rest of the corporate world uses for large scale storage management. It is still ruled by Tape drives.
I don't know how much an eye goes for at the moment, but if you can spring for a Super DLT drive you'll get up to 320GB (compressed) on each tape.
It all comes down to the Quality:Cost:Time triangle.
I use bioneural gel packs at a cost of $0.04 per teraquad. What is this hard drive of which you speak?
Well it isn't going to happen, you -HAVE- to drop change for what you want, as a back-up solution. There really isn't any way around that.
There are many plausible suggestions though that won't break the bank totally. One of course is raid as has been mentioned and will be a few times I imagine. But you may also wish to look into hot swappable solutions.
USB 1.1/2.0, Firewire and SATA are all relatively cheap storage solutions if you shop around (Pricewatch is a good place if you are willing). You can convert IDE drives to USB with an IDE>USB box, and buy a few decent 200 gig hard drives for around $120~$150 each.
Another could be to buy a SATA card and some SATA drives and plug them into the front of your case. SATA 2.0 is hot swappable and the hard drive prices have come down into a decent range.
Now another solution is to buy used SCSI drives and RAID those together: reliable, fast, and not overly expensive if you don't want 15k RPM.
Another idea is to buy another box, place a few hard drives in it, and use that box as your backup. It's more of a hassle than the rest, but as a plus you can place it somewhere else as an offsite backup, and all you have to do is plug it in and your work is ready to go (from the point you most recently backed up).
With incremental back-ups it may not be too bad.
Then again you are moving terabytes.
Paper.
I was wondering, if Slashdot readers have any recommendations for a cheap automated way to store and retrieve data.
Remember the data in your brain, that is much cheaper than buying disks IMO.
Quantum hacker.
That's easy. Name your files "Hot Lesbian 4-some[0000-1024].mpg" and make it available to the P2P sharing community. Every horny male in the country will then help to download and distribute your data across the country and when you want file number 5 back, just search for Hot Lesbian 4-some0005.mpg and bingo, 10 available good copys ready for re-use. ..
Buy a tape drive.
If you have money burning a hole in your pocket, buy a tape changer, so you don't have to change the tapes.
How much does an eye cost?
... and program it as a repeater.
It's about 90 minutes away, so at 250 Kbps that's over one gigabit in storage on the way out there, and another gigabit on the way back.
Worst-case access latency is about three hours, though. Maybe the hard disks are a better idea.
If you send your probe^H^H^H^H^H repeater to Alpha Centauri, you'll get more than 20,000 times the storage capacity.
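For anyone who wants to check the repeater arithmetic, a delay line "stores" data rate times one-way delay on each leg. A quick Python sketch follows; the 250 Kbps and 90-minute figures are from the post, while the 4.37 light-year distance to Alpha Centauri and the function name are my assumptions.

```python
SECONDS_PER_MINUTE = 60
MINUTES_PER_YEAR = 365.25 * 24 * 60


def bits_in_flight(rate_bps, one_way_minutes):
    """Bits held in transit on one leg: data rate times one-way delay."""
    return rate_bps * one_way_minutes * SECONDS_PER_MINUTE


if __name__ == "__main__":
    rate = 250_000                    # 250 Kbps, from the post
    near = bits_in_flight(rate, 90)   # 90 light-minutes, from the post
    # Assumption: Alpha Centauri is about 4.37 light-years away.
    far = bits_in_flight(rate, 4.37 * MINUTES_PER_YEAR)
    print(f"90-minute leg: {near / 1e9:.2f} Gbit")
    print(f"Alpha Centauri leg: {far / near:,.0f} times as much")
```

By this arithmetic the 90-minute leg holds about 1.35 gigabits, and the Alpha Centauri ratio does come out above 20,000.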
backups? I upload to an ftp server and let the world mirror it for me
You can get cheap computers from the trash, donations, bulk, etc. You can use that cluster to mirror your data once or twice. I don't know what data you have, but if you have the same data on more than one hard drive, you can rest assured it will be fine. Or you can just print it all!
The stock market is backed up to three (or more?) separate locations. Look into NVRAM (e.g. flash media) or a cluster with all those hard drives linked together, with a constant backup. With the built-in IDE controller on most motherboards, you can hook up to 4 hard disk drives. If you add SATA, RAID, SCSI, and IDE, you can have lots of hard drives on one machine!
You could also rotate hard drives, so they aren't constantly used (making the whole system last a LOT longer!) or replace the drives that are about to fail (which would be after at least 3 years!). Most hard drives could probably handle 5 to 10 years no prob (maybe even 20 if they are rotated!).
It all depends on what you have and what you want to do!
mysql>SELECT * FROM users WHERE clue > 0
0 Rows Returned
Compaq's StorageWorks SSL2020 AIT library is a single or dual drive library that offers 2 terabytes (2:1 compression) of storage in a 4U tabletop or rack configuration. Library modules can be stacked five high for up to 10 terabytes of storage within a 20U space. The SSL2020 is qualified with Windows NT and Windows 2000, NetWare, Tru64 UNIX and OpenVMS operating systems, as well as Compaq ProLiant and Alpha-Server product lines.
http://www.aittape.com/
Build yourself a cluster of cheap boxes with cheap IDE disks and replicate your data across them. Because the data is replicated across your cluster, no need for backups or RAID.
How many months at 1TB/month do you require access to online? After you are done with data, can you discard it or do you need it archived? What is the cost of losing your data set at any given time? In what manner do you expect to access it (read/write mixture and sizes, plus aggregate throughput and number of client connections)? The answers to these questions could cause the cost of a solution to vary by a couple of orders of magnitude.
So, you can put your data on 4-5 HDs, 10 tapes or 232 DVDs per month. The cost of doing so will be about $500 per month for the tapes or HDs and $50 for the DVDs (assuming your time costs $0).
At work, we had a need to keep a few TB of data online permanently, so we purchased a few NexSAN ATABeasts. At $50,000 for 10TB of usable storage ($5/GB), they may be a bit out of your price range. The advantage is that you can hold almost a year's worth of data and it is protected by RAID5. It also makes management a lot easier, since it is very difficult to mount 42 300G drives in a single chassis (and it takes only 4U of rack space).
On the low end, NexSAN has the ATABoy2 or ATABaby (2TB or 1TB) in the $8-$15K range. This will let you hold a month's worth of data.
On the high end, you have EMC disk arrays (think upwards of $20+/GB for the 'cheap' stuff from them).
Overall, if you have 1TB per month, you need to either a) get a grant to fund your work, b) hire somebody to swap DVDs for you or c) seriously rethink your data generation.
Any of the "cheap" storage methods have serious drawbacks, and the low cost ones are, well, not so low cost if $15,000 sounds like a lot of money to you.
otherwise, good luck
If you are more interested in volume than speed, then the emphasis should be on the 'ID' part of RAID: Inexpensive Disks. Use 160GB drives, which appear to have the best bang for your buck at the moment, and put 6 (yes, 6!) in a PC. Just use any old cheap PC (I use a 200-400MHz PII).
:-)
Run the disks in RAID 5 and you will get about 800GB of storage for $600. Now get two cheap ATA100 cards so you have a total of 6 channels, and mount each drive as a master on its own channel. Build a 2GB root partition on the first disk (mirror it if you want) and then set the rest of the space up as a huge RAID 5 array.
Et voila: cheap, big server. To archive data, turn off the PC and throw it into the attic.
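The 800GB figure follows from how RAID 5 works: one drive's worth of space goes to parity, so usable capacity is (n - 1) times the drive size. A small Python sketch, using the six 160GB drives and the $600 price from the post (the function names are mine, and the 2GB root partition is ignored):

```python
def raid5_usable_gb(drive_count, drive_gb):
    """Usable capacity of a RAID 5 array: one drive's worth goes to parity."""
    if drive_count < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (drive_count - 1) * drive_gb


def cost_per_usable_gb(total_cost, drive_count, drive_gb):
    """Dollars per usable GB for a RAID 5 set bought at total_cost."""
    return total_cost / raid5_usable_gb(drive_count, drive_gb)


if __name__ == "__main__":
    print(raid5_usable_gb(6, 160))          # usable GB for six 160 GB drives
    print(cost_per_usable_gb(600, 6, 160))  # dollars per usable GB
```

This also shows why adding drives helps: the parity overhead stays at one drive while the usable space grows.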
- Several 80-gig drives
- 1 removable IDE hard drive enclosure
- 1 fireproof safe, preferably bolted to the ground or kept off-site (for the particularly paranoid)
--- Robert Strickland
Compaq's StorageWorks SSL2020 AIT library is a single or dual drive library that offers 2 terabytes (2:1 compression) of storage in a 4U tabletop or rack configuration. Library modules can be stacked five high for up to 10 terabytes of storage within a 20U space. The SSL2020 is qualified with Windows NT and Windows 2000, NetWare, Tru64 UNIX and OpenVMS operating systems, as well as Compaq ProLiant and Alpha-Server product lines.
http://www.aittape.com/hewlett-packard.html
No, I don't work for them, but I work with their H/W and their support is second to none. You can get a recon'd R100 for reasonable money. New, they cost ~$100k for 12TB.
They aren't file servers, as they aren't designed for lots of clients. But they are perfect for storing a 'live' backup of data! They call the technology NearStore; it's designed to sit between your file servers and your tape backups.
Well, I have a CD changer for computers made by NSM... It's SCSI (it originally comes with a 2x reader), so all you gotta do is find a SCSI DVD burner (or a long enough IDE cable and convert it, since the motors are all powered by a COM port anyway) and replace the drive (or, like in my case, fit a CD-RW; I've had the drive for a while, so at the time a DVD burner would have cost too much). Then you have 100 DVDs you can burn data to automatically, and when those are full just swap them out for new ones.
Now the problem is, you can only get 430gig's out of one changer using single layer dvd's... Double would bring you to 970gig's per changer.
Assuming you can get the unit for 100 bucks or so, and the dvd drive costing 100 (69 bucks at frys).. Then you have a 200 dollar backup unit that can store 430gigs of information onto dvd's
That amount of data has to be a company, not a common man.
What are your retention requirements for this data? 2 months? 1 year? Forever? How often do you need to access this backed up data?
If your requirements are rare restores, then I'd go with tape. You can back up to a tape drive about as fast as to an ATA disk, and you can move the tapes offsite. Restores with tape are a bit more painful, but then a tape isn't as delicate as a disk drive.
Depending on the value of your data, you should try having a nice 4*400 GB SATA in RAID 5 *2, possibly using a distributed file system for redundancy...
/. story on a rackable Petabyte storage system
Not the cheapest, but fast, simple and saves you the unholy pleasure of having 2-3 DLT boxes to archive/cycle each month...
You already have a Linux cluster, so you could implement a distributed file system, or even simply a nightly incremental mirror to the target server if you can afford losing one day of work/computation...
It would help if you told us what sort of data you work with... from databases to automated telescope tracking systems, both need a large amount of storage, but you won't need the same system array for each...
I seem to remember a
You don't need to go to the Petabyte capacity, but you will find some interesting comments on filesystems, disk virtualisation, 1U rack providers and so on... so a 1 Terabyte rack server is definitely possible...
Good luck...
It takes 40+ muscles to frown, but only four to extend your arm and bitchslap the motherfucker
I actually use a DLT with autoloader I got off ebay for under $200. I then bought a lot of used DLT tapes (100) and use them to backup my Video and DVD projects. It is great because when I fill my offline storage (about 1TB) I just fire up the backup software and get the old DLT going overnight. It is done by morning and the shelf life for those tapes is about 20 years.
Why would you even think that?
I'm not sure what your budget is, but if you're like me you want something that complies with standards so it will be around, and is cheap and effective. For this I would have to recommend an Ultrium tape backup drive. The drive is standards based (google it) and the tapes are dirt cheap: a 200/400 GB tape pulls up for $55. If you figure (with hardware compression) 250 GB of storage per tape, then it will cost just $.22/gigabyte. The problem is that the drive itself is listing for about $2600, not exactly cheap, but it's guaranteed to be backwards compatible with future LTO standards and the media is as cheap as you could possibly ask for. One more thought: look into an LTO Gen 1 solution (100/200) for a cheap drive. Cost per gigabyte is roughly the same; it will just take more swapping.
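To see when a $2600 drive pays for itself, here is a rough break-even sketch in Python. The tape price, the 250 GB per tape and the drive price are from the post above; the $0.60/GB disk figure is the one quoted in the story; the function names are made up, and the model ignores the host PC, labor and failed media.

```python
def tape_cost_per_gb(tape_price, gb_per_tape):
    """Media cost per GB for one tape at an assumed effective capacity."""
    return tape_price / gb_per_tape


def break_even_gb(drive_price, tape_per_gb, disk_per_gb):
    """GB stored at which drive price plus tapes matches buying disks."""
    if disk_per_gb <= tape_per_gb:
        raise ValueError("tape media must be cheaper per GB than disk")
    return drive_price / (disk_per_gb - tape_per_gb)


if __name__ == "__main__":
    per_gb = tape_cost_per_gb(55, 250)
    gb = break_even_gb(2600, per_gb, 0.60)
    print(f"${per_gb:.2f}/GB, break-even after about {gb:,.0f} GB")
```

Under those assumptions the drive is paid off after roughly 6.8 TB, which is about seven months at 1 TB a month.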
One company that provides massive online backup and storage at reasonable prices is Streamload. You might want to check them out.
Opera, Proxomitron-Grypen,GPG 0x0A1C6EE3
Only on Slashdot would they start talking about huge storage arrays and title it "for the common man"
A DVD-R jukebox can give you 200 DVDs at once. That's $3600 (drive/changer) + $268 (1000 DVD-Rs), for (1000*4.7GB) 4.7TB@$4000, or $1.18:GB. That's almost double your HD cost, but you'd need at least another host PC, and multiple controllers for the 16HD RAID, which is probably another $1000. And another $268 buys you another 4-5 months storage, so by next April you're down to $0.14:GB; in a year you're at $0.12:GB. A shelf of 200-disc "CD" books will hold your archives, 1 book per carousel for "fast" retrieval. Backup all your DVDs offsite at $0.27:GB. As DVD-R prices fall over time, you're probably looking at something like $0.05:GB, probably less than even plummeting HD prices. And the DVDs (especially with the cheap backups) are much more reliable, especially over 10 years, than the HDs. If you are looking at 10 year archive, at $80:month in DVDs, for 29% more money you can add a second host PC/changer set, left in their boxes, in case the original PC/changers fail.
--
make install -not war
Makes a cheap, fast way to put lots of data onto lots of hard drives. Using one of these bad boys means no extra money is spent on drive enclosures, cases etc. You only buy raw standard hard drives. Excellent if it's only backup, and you do not need lots of access. This solution is not automated, however.
Hard drives are prone to failure. I was thinking of buying at least 2 drives of different brands to mirror, storing them in separate locations in sealed, air tight containers at just the right humidity/temperature. Also I think a disk check every 6 months or year would be necessary, and if any problems are found, replace the disk with another.
One beauty with this method is you only need to pay for disk space as you need it, and hard drives may still get much bigger. I was going to buy drives at the lowest cost/megabyte which at the moment is 160GB drives.
I would love to find more information on the physical storage of hard drives, especially how long they would be expected to last without use - months? years?
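One way to make the six-monthly check concrete is a checksum manifest: record a hash of every file when the drive is written, then re-hash when it comes off the shelf. This is a swapped-in technique, not what the poster described (a filesystem or SMART check would be a different thing), and the function names are only illustrative.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            manifest[os.path.relpath(path, root)] = sha256_of(path)
    return manifest


def damaged_files(root, manifest):
    """Return files that are missing or whose contents no longer match."""
    bad = []
    for rel, expected in manifest.items():
        path = os.path.join(root, rel)
        if not os.path.isfile(path) or sha256_of(path) != expected:
            bad.append(rel)
    return sorted(bad)
```

With two mirrored drives, anything damaged_files reports on one copy could be restored from the other; the manifest itself would need to be kept somewhere other than the drive it describes.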
I would hope that if you are working with a TB of data, the value of that data is pretty high . . .
Promise SX6000 = $255.95. (6) 200GB IDE drives in a Raid 5 = $624.95
If you had a separate boot drive from the SX6000, you could just bring the system down for a couple hour maintenance once a month and slam all the drives out and put fresh ones in.
Just keep buying new 200GB drives and shelve the old ones (or, if it's *really* valuable and your home fire safe isn't enough, pay Iron Mountain or someone to keep it).
There aren't hidden labor costs outside of those two hours it takes to set up a new array every month. DVDs are about 60 bucks a month for a TB, with a hundred or so for a drive (which *will* need to be replaced occasionally if you are burning that much), but you'll spend hours and hours just dealing with the swap-outs and breaking up your data.
If you don't have to keep the TB of data after a month or three, then your price gets even cheaper after you invest in your initial hard drive media sets . . . and you can put all the drives in hot swap chassis to further minimize your time dealing with the issue.
Of course this is all moot if your 1TB of data isn't valuable enough to invest 600 a month in . . .
For cheapest backup possible, just use harddrives. Create a software raid5, backup to it, then powerdown and remove the drives to someplace safe. You'll also be able to recover the drives on any machine that can boot linux.
Hotswap or removable drive cages can be pricy, and aren't designed for lots of swap-ins and outs, so I'd just buy new IDE or SATA cables every few backups. If you're using the same set of drives multiple times, then leave the cables connected as not to wear out the drive's pins.
Eventually you'll wear out the ide connectors on the motherboard, so use one of those cheap ide adapter cards and replace as needed. Or use a cheap motherboard.
It's too labor intensive to be in the same realm of solutions as a nightly tape backup, but not nearly so much as CD or DVD backups. It's easy enough to do once or twice a month.
If you're cheap, you're not after disaster recovery, you want disaster mitigation.
Explain this to me, I can buy a 200 disc cd changer for $100 bucks, but the same thing with a burner (cd/dvd) runs thousands of dollars. Isn't there any company out there that can do it cheaper?
Heck, I remember a slashdot article about a guy who built one out of WOOD!
This would be a great solution for short term recovery storage. Just keep a stack of CD's or DVD's ready, and it will load them in and burn them all automatically.
On a side note, it would be great for converting a 400 disc CD collection into MP3s.
It has a few bonuses going for it.
First, you get the speed of RAID drives, and you only lose 1 drive of your entire set. (So the more drives you have, the better it gets)
Of course, if you're going to go with RAID, you need to duplicate that data somewhere else, just to be safe.
Along these lines I'd recommend another RAID 5 in (at minimum) another location. With RAID 5 it also covers the eventual loss of a drive from your array, and can rebuild it. Of course, you can now spend as much or as little you want on your RAID 5 setup, so the costs will vary widely.
Then again, you often get what you pay for.
Using this method you can store surprisingly large amounts of data very cheaply. It's also protected from MS Windows viruses, but perhaps not all viruses.
Lately, I have been looking into different bulk data storage options available to a common man. My work depends on generating, storing and analyzing a large amount of data -- averaging about 1 TB per month.
Wrong assumption, 1 TB per month is not a common man problem.
bash$
So spend some time and money on making sure it is safe!
Even if you had a Bluray DVD burner, that would be 20 discs you'd have to burn to backup 1TB. So that is out of the question.
Really what I'd set up is:
1) Local: 1TB of hard drive space on IDE RAID (mirrored). An 8-port SATA controller would do, with 8 250GB SATA drives.
2) GigE ethernet to somewhere else (got a separate garage?), or something faster if affordable
3) A file server there with the same config for "off-site" backup. Should your PC catch fire and melt, you'll still have your data. Yeah, backing up 1TB of data over GigE will take around 15000 seconds a go, or 4 hours or so. That's okay overnight, and better than swapping 50 BluDiscs or tapes and then carrying them out there.
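As a sanity check on the 15000-second estimate, transfer time is just size in bits divided by throughput. A short Python sketch, assuming a decimal terabyte and ignoring protocol overhead (the function names are mine):

```python
def transfer_seconds(size_bytes, throughput_bits_per_s):
    """Seconds to move size_bytes at a sustained throughput."""
    return size_bytes * 8 / throughput_bits_per_s


def implied_throughput_bps(size_bytes, seconds):
    """Sustained bit rate needed to move size_bytes in the given time."""
    return size_bytes * 8 / seconds


if __name__ == "__main__":
    one_tb = 10**12      # assumption: 1 TB = 10^12 bytes
    line_rate = 10**9    # gigabit Ethernet at full line rate
    print(transfer_seconds(one_tb, line_rate), "seconds at line rate")
    mbps = implied_throughput_bps(one_tb, 15000) / 1e6
    print(f"{mbps:.0f} Mbit/s sustained to finish in 15000 seconds")
```

So the post's figure corresponds to a bit over half of line rate, which seems a reasonable allowance for disks and protocol overhead.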
Try Exabyte - for hardcore tape storage.
http://www.exabyte.com/products/prodviews.cfm
I think you can store about 1.6 TB on a single tape or similar, but check them out. Tape drives have come a long way from old SCSI DATs transferring 20meg a minute. And they're fully automated and although there's an outlay cost for the tape drive, over time the cost per gigabyte for storage will be lower than hard drives.
If you have a security company do patrols of your office you can get them to take the tape offsite with them after nightly backups for added security... etc etc.
Putting syrup in coffee is some form of blasphemy.
mine say AOL on 'em.
(Why is it I don't throw them away?)
"Kittens give Morbo gas!"
RAID is a possibility?
What does that mean? How does RAID by itself help someone looking for an automated storage system? RAID is just a way to add redundancy, speed, or both. It isn't a magic bullet to increase capacities.
-bZj
.sig
Do you simply need a snapshot to restore the most recent point in history, or do you need the ability to restore some point in history?
Consider this scenario where I work:
People have documents for once-a-year reports. They need to make a new one for all the stuff that happened this year, and they need the final from the previous two years. However, a virus (or new-hire) went through and randomly corrupted and deleted several documents some months ago. A mirrored system can quickly give you the data exactly as it was yesterday, but that would still be bad data.
If you need to restore/access things as they were 6 months ago, or a year ago, you gotta have tapes or DVDs. The affordable solutions are too slow; that is, the backup takes longer than the allowable downtime (suppose DVDs take 20 hours to back up 8 hours of work - it won't fit in a day).
There are many options. One thing you can do is mirror your system onto cheap disks, which gives you a quick snapshot. Then use tapes/DVDs to do periodic full and incremental backups of the mirrored system.
Then you can use a cheap backup system and hope the tapes don't go bad. (They will.)
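To make the "incremental" part concrete: an incremental run only needs the files that changed since the previous backup. A minimal sketch in Python, assuming modification times are trustworthy and that you record the time of the last run somewhere yourself (the function name is made up for illustration):

```python
import os

def changed_since(root, last_backup_time):
    """Return paths under root whose modification time is newer than last_backup_time."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # getmtime returns seconds since the epoch, same unit as time.time()
            if os.path.getmtime(path) > last_backup_time:
                changed.append(path)
    return sorted(changed)
```

A full backup is the same call with a timestamp of 0. Real backup tools track more than mtime (deletions, renames, permissions), so treat this as an illustration of the idea only.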
Print your data with a tiny bar code or OCR font. Don't forget to number the pages.
Several months later, you can try to sell some outdated backups to geeks as wallpaper.
There you are, staring at me again.
The real nuisance with backups is if you need to keep them for several generations of hardware/software. In this case you need to keep a copy of everything you need to read them. Firewire is probably good for another few years before they start to fade into obscurity (unless it suddenly blossoms into popularity, reversing the apparent trend) but my guess at this point is that your new PC ten years from now won't support them in any way. Jump backup technologies before they become obsolete, but don't jump too soon either -- wait until the technology is stable and has shown that it is likely to have a long life-span.
Am I part of the core demographic for Swedish Fish?
It all depends on what level of recoverability you want. Here is the enterprise view:
1. Software: We use Tivoli (IBM). The cost of this software and its maintenance is high because IBM guarantees recoverability of the data it stores.
2. Hardware: Tivoli caches backup data on disk(compressed) preferably RAID, then after a period of time or other criteria it moves it off to tape, using a tape library is preferred since it is automated. Two tapes are created so that one is shipped off-site. Off-site storage is required for recoverability.
3. A second set of hardware and software is required if you want to ensure that you can recover in a reasonable time period.
Cost is directly proportional to how much of each component you wish to utilize, which then determines whether you can recover at all and in what time period.
Cost rises exponentially for these components and is also determined by the size of the data that must be recovered and how quickly: from seconds or minutes for small amounts of data to hours or days for large amounts.
Remember, another high cost option is to mirror the data at two locations that can be accessed within seconds if one location fails.
Cost is high if recoverability and recovery time are important. Cost is low if you do not need the data back quickly or do not care if all of it is ever available again.
If this is a job critical function *Business depends on it* then the higher cost options are required. Sorry. If you want recoverability you pay more for each degree of recoverability...there are multiple layers to be considered.
This is the art and theory of Contingency Planning for data centers. Oops, I am letting out the secrets of the professionals... wait I am one, I do this sort of planning for a living. Cost benefit analysis is a requirement in this sort of thing. Also, consider if there are other requirements for recoverability such as legal, corporate intellectual property, etc. These all factor into the cost one must be willing to pay for recoverability.
As far as DVDs go, what about the new Blu-ray discs coming out? I thought ~.5TB was possible per single-sided, dual-layered disc.
I choose b ... HEY! That's a trick question!
Meh.
After looking around for a couple of months I bought myself a 12-drive SATA hotswappable case from Rackmountpro.com, plus 12 250GB SATA drives and a 3ware card. It works like a charm. Speed is great and I think the price of the 250GB drives is right (WD2500JD, $165 apiece).
For the longest time I was looking for a CD/DVD sort of Jukebox where I could have like 200-400CDs (Real CDs/DVDs, not ISOs) but I had no luck finding anything. Even if I had some kinda program telling the jukebox to change the CD/DVD would be great, but googled it for a while and came back empty.
I am still looking for a jukebox solution for archival purposes if anybody has any idea where I can get/make this.
Dear aunt, let's set so double the killer delete select all
If your "work" (as in food, housing and income) requires this kind of storage, you should be charging the kind of money that can make the economics of such data storage actually viable. I'm assuming that some of the really high-end storage devices from EMC, Hitachi, et al could handle your data generation/replication/backup needs effortlessly.
If that's too expensive (and it usually is), you can kludge your own system using low-end stuff from Hpaq/IBM/Dell's x86-server-oriented product lines. LTO1 drives are pretty cheap and we've found them to be very reliable over the past 3+ years, as well as offering 100 gig native per tape.
If even that's too expensive, then I seriously think you need to re-think the economics of your work situation. If your work doesn't cover your capital costs, you're not charging enough. If the work and data are business valuable enough, cutting your storage bill to the bone by building Linux clusters crammed with IDE HDDs is just a bad business decision.
If this is just your hobby-type work, then you need a cheaper hobby, like heroin addiction or something affordable. Physical space and electricity aren't cheap enough in a metropolitan area to burn through 1TB of storage per month, let alone reliable data storage.
Do you need all the data to be online at once? If not, consider a small LTO tape autoloader - a 24-slot loader will run you about $13k when loaded with LTO Ultrium 2 media, but you get 4.6 TB of storage, assuming you fill 23 slots with tape (you should always have a cleaning cartridge in the library to run at regular intervals). If you're really cheap, you can write a set of Perl or shell scripts to operate the library with mtx and mt, rather than buying expensive software.
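To give an idea of how little scripting the mtx/mt route needs, here is a sketch of one backup cycle. It is written in Python rather than Perl or shell, and it only builds the command lines instead of running them. The device nodes `/dev/sg1` and `/dev/nst0`, the drive number and the example path are assumptions; check what your own system assigns.

```python
def tape_backup_commands(slot, paths, changer="/dev/sg1", tape="/dev/nst0", drive=0):
    """Build the commands to load a tape, write an archive to it, and put it back.

    The default device nodes are assumptions, not universal values.
    """
    return [
        ["mtx", "-f", changer, "load", str(slot), str(drive)],    # move tape from slot into drive
        ["tar", "-cf", tape, *paths],                             # write the archive to the tape
        ["mt", "-f", tape, "offline"],                            # rewind and eject the tape
        ["mtx", "-f", changer, "unload", str(slot), str(drive)],  # return the tape to its slot
    ]

# Print the cycle for slot 3; hand each list to subprocess.run(cmd, check=True) to execute it.
for cmd in tape_backup_commands(3, ["/data/run42"]):
    print(" ".join(cmd))
```

The capacity figure above is just 23 tapes times 200 GB native per LTO Ultrium 2 cartridge, which gives 4.6 TB.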
If you absolutely need all the data to be online forever and ever (amen), then you're going to need a fat wallet. An ever-growing homebuilt solution will rapidly become unmaintainable. Consider an off-the-shelf NAS or hardware raid solution. Apple's XServe RAID boxes are surprisingly cheap for fiber-attached hardware RAID - $11k gets you 2.5TB of RAID5 (and that's with 2 drives reserved for hot spares, you could squeeze 3TB out of it with no hot spares, but i don't recommend it.)
If you don't think you'll need to expand forever, you can start looking at homebrew options. A 12-port 3WARE SATA controller with 12 250GB SATA disks should run you less than $3000 and give you 2.5TB of RAID 5 with one hot spare. (Of course you'll need a system with the case and power supply (or supplies) to handle them.) Next year, twice the storage might cost the same or less - 400GB SATA drives are already shipping, though they still cost more per gig.
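The 2.5TB figure is easy to verify: a hot spare holds no data, and RAID 5 gives up one more drive's worth of space to parity. A small Python sketch (the function name is mine, for illustration only):

```python
def raid5_usable_gb(drives, drive_gb, hot_spares=0):
    """Usable space of one RAID 5 set: one drive's worth goes to parity, spares hold no data."""
    active = drives - hot_spares
    if active < 3:
        raise ValueError("RAID 5 needs at least three active drives")
    return (active - 1) * drive_gb

print(raid5_usable_gb(12, 250, hot_spares=1))   # the 12-port example: 10 data drives' worth
```

Dropping the hot spare buys one more drive of capacity, at the cost of a slower response when a disk dies.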
Basically, you're not the common man. Like another poster said, you need to consider what the data are worth, and buy your storage accordingly.
-Isaac
I am not a lawyer, and this is not legal advice. For Entertainment Purposes Only.
I've been thinking about something similar to this for a while. At work, it would be kinda nice to have some network storage that is more reliable than it is fast. It would be there for files that we use but which we don't use frequently enough to warrant expensive disks.
One idea I had was to scarf up a bunch of cheap 9-gig SCSI drives from one of the local computer fairs and RAID them together. But I'm not sure if that's a good idea.
Google stores data for fast access, not for reliable storage. They don't care if they lose a few hundred gigs when a handful of disks die, they'll just re-spider it in a few days when the Googlebot hits the sites which were lost. Their solution is NOT optimized for reliable storage and it's not suited in the slightest to this guy's problem.
And that's what tape drives really aren't. If you want the last file on the tape, you have to wait for the drive to seek past everything else, and that's a real pain.
Once upon a time there were data adapters that let you record data onto VHS tape.
It was pretty much like a TAR file in that it was NOT addressable, but made for huge/cheap long term backup storage.
Dunno if they still exist or not, but it would still be a good product. If they don't exist any longer, perhaps it would be good to re-create the devices.
---- Booth was a patriot ----
can you do a curve fit to some time periods and then just store the coefficients? it will be pretty close.
e.g. do a curve fit on each day's data using e.g. LOWESS.
it will be reasonably accurate, and the amount of data will be drastically reduced.
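A rough illustration of the "store the coefficients" idea in Python. Note that LOWESS itself is a smoother and does not boil down to a handful of coefficients, so this sketch swaps in an ordinary least-squares straight line per day; the sample readings are invented.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# One "day" of made-up readings collapses to two stored numbers.
hours = [0, 1, 2, 3]
readings = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(hours, readings)
print(slope, intercept)
```

Whether this is acceptable depends entirely on how much of the signal lives in the residuals. It is lossy, and the detail it throws away cannot be recovered later.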
i wonder if there's a way to do 'RAID' across DVD's so that you can use cheap ones and not worry so much.
Xserve RAID.
Yep, you read that correctly. Apple's Xserve RAID product is cheaper bang for buck than virtually any other solution from any other manufacturer, plus it works with Solaris, Linux, Windows, Mac OS X and more.
You can use regular ATA drives and administer it from an easy to use interface either on a Mac or a web interface.
Also check out Xsan on Apple's website.
enjoy!
I've been using one of these 1 TB USB/firewire drives. It's a wonderful thing; entirely self-contained, with no cluster to manage or worry about. USB allows for 127 devices, so you should be able to acquire as much hard drive space as you need. They can be easily unplugged and stored, too.
I made a PHP/MySQL library that prevents SQL injection & makes coding easier!
Have you tried the Iomega Rev?
35GB Native capacity
Up to 90GB with compression
Hard disk speeds
ATAPI and USB interfaces
Good stuff
I haven't seen any posts that took into account that (it sounds like, anyway) he needs to store this data permanently. He has a 1.6TB cluster, but if he has 1TB of data per month, that's not doing him much good.
:)
Normally I would recommend a RAID, but at that storage capacity, and taking into account that you need to keep your data permanently, that's not going to work, as you will continuously need to add more disks. That would be a minimum of 4 new hard drives per month (assuming they make 500GB ones, which I think they do; if it's less, then change that to 6 HDs/month).
Many posts have pointed out tape backup as one of several solutions; however, with your requirements I don't see how you could do anything else, if your time is worth anything. The other option is basically DVDs, but that'd be 200 DVDs a month you'd have to burn. Fun fun...
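The disc count is simple division, rounded up. A quick Python check, assuming 4.7 GB single-layer DVDs and 1 TB counted as 1000 GB:

```python
import math

def discs_needed(data_gb, disc_gb):
    """Number of discs needed to hold data_gb, rounding up to whole discs."""
    return math.ceil(data_gb / disc_gb)

MONTHLY_GB = 1000   # roughly 1 TB per month, as stated in the question

print(discs_needed(MONTHLY_GB, 4.7), "single-layer DVDs per month")
print(discs_needed(MONTHLY_GB, 50), "discs per month at 50 GB each")
```

That comes out a little over the 200 quoted above, before counting coasters and verification passes.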
Look to some of the other posts for good recommendations on particular tape drives to look for and their costs, there's plenty of them here.
Joseph?
I'm planning to use 2 large disks. e.g. 400GB. One in my desktop, the second one in my server.
:) and it was always a mess until the streamer died after 3 years (weekly use).
;-) Then the layout becomes more centralized ...
Both disks get 2 partitions of 200GB each. The first partition may be divided into more parts, but the last one is 200GB.
Then I'll use (intelligent == diff) rsync to copy the server partitions into directories of the last client partition and vice versa.
It's very unlikely that 2 harddisks in 2 different computers die the same day - even at a lightning event - but then again you should use cheap extra fuses for that. (I had lightning and high voltage damage, power supplies and mainboards died, but no harddisk ever.)
If you need protection from fire, you have to place the second disk offsite, or use a firewire/usb 2.0 disk and store it somewhere else.
You can even use this in addition to the client/server 'mirror' described above.
I own a 12GB DDS3 streamer and worked with DLT, don't throw away the money. Harddisks are faster and the backups are easier to access. DVD's are too small...
The rsync version has a very huge advantage: I can easily access each *file* directly.
I'm using such a backup with a firewire disk and 'tar' for additional compression at another site. We had another 12GB streamer before (1000 EUR then, 550 EUR now, and 15 EUR per tape - now and then
Cross-mirror Advantages:
1) fast backups
2) relatively cheap (2 x 360 EUR)
3) quiet (I'll throw out *all* older/smaller hds)
4) direct, fast access
5) daily cronjob
Cross-mirror Disadvantages:
1) just *one* one day old snapshot
2) no automatic raid mirror, where you don't have to do anything
Comments on the disadvantages:
ad 1) use the -b flag on backup, so nothing will be deleted, just moved out of the way, and then use the --delete flag from time to time, or clean up by hand.
ad 2) having the disks in the same computer is dangerous for backups, considering high voltage etc.
You can even do that with more than two computers. Put a second 400GB disk into your server just for backups
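For the daily cronjob, the cross-mirror boils down to one rsync call per direction. A sketch in Python that just assembles the command; the host name, paths and backup directory in the example are hypothetical, while `-a`, `-b`, `--backup-dir` and `--delete` are standard rsync options.

```python
def mirror_command(source, dest, backup_dir=None, delete=False):
    """Build an rsync command line for one direction of the cross-mirror."""
    cmd = ["rsync", "-a"]                            # archive mode: recurse, keep times and permissions
    if backup_dir:
        cmd += ["-b", f"--backup-dir={backup_dir}"]  # move replaced files aside instead of losing them
    if delete:
        cmd.append("--delete")                       # also remove files that vanished from the source
    cmd += [source, dest]
    return cmd

# Daily run: keep old versions. Occasional cleanup run: add delete=True.
print(" ".join(mirror_command("server:/export/", "/backup/server/", backup_dir="/backup/old")))
```

Running the delete variant only now and then is what stretches the "one day old snapshot" into something a bit more forgiving.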
Send it to an e-mail account in Europe, with a few interesting words added (I'm not sure that's even necessary). Delete it at the destination with an automated script. Retrieve it a couple of years later using the freedom of information act.
Data mining for al Qaeda, right?
;-)
-psy
I'd love to see a Firewire hub that could act as a hardware RAID controller. A program on the computer would enable management of the RAID controller, and once formatted, the logical volumes would be presented to the host computer as standard disk volumes, eliminating the need for any special drivers on the host computer, as well as enabling the entire array to be portable to other platforms.
How expensive could something like this really be? $300-400 at most, I'd have to guess considering what most places are charging for SATA RAID cards.
I have to disagree with the sister system though. For most geeks like you and I a sister system would be fairly adequate. It would be better with an occasional off-site backup. However it really sounds like this guy's data is far too valuable to have only one copy of it and to have all copies be at one physical location. He really needs an off-site backup somewhere. Imagine for a moment if his home (I'm guessing he works from home, but this still applies to a real store-front business) was robbed. The crooks didn't know what they were taking. They saw two shiny computers in an office and figured they could hawk them on the street. There goes all his data, both copies. D'oh! So in short a sister system is a good idea but it probably won't do this guy much of any good. It would be a good local solution for a short term live mirror (ie, data is archived that night but the sister machine gives you a backup for that one day's work).
Bittorrent rules!
Can someone point me to the differences between HDD types with respect to file/system I/O?
I know that SCSI drives are faster than IDE.
But why does an IDE drive bring the CPU/system to its knees?
IDE being slower than SCSI shouldn't hang the processor while it waits for the data.
The process will be BLOCKED longer, but why does the CPU hang?
Can someone point me to a good explanation of what's going on?
I have tested some SATA drives (Seagates and Maxtors) with Silicon Image and NVidia chipsets.
In all cases I saw no difference to ATA drives.
Also, how do Firewire and USB2 compare to [S]ATA drives in terms of performance/IO?
My work depends on generating, storing and analyzing a large amount of data -- averaging about 1 TB per month.
Then you should be *depending* on a quality storage solution (such as from Sun, SGI or EMC) rather than asking advice from a bunch of Linux using nerds whose place in the planet's pecking order is listed immediately under "Maggot".
Basically, I got 95% of my data back and had to reinstall the OS to get the computer running the first time. I then wound up going home and getting my old computer back. I was going to Holland and figured just taking the tapes and the tape drive would work well enough. It didn't.
The second time, most of the tape volume didn't show in the restore, and after frantic e-mail to customer service, I got an e-mail saying that there was a known issue with the recovery software, they no longer supported it, and the best I could do would be to go to the vendor they'd gotten the software package from and download the demo. I think I only lost 3-4% of my data.
I try not to make the same mistake 3x in a row. My next computer upgrade included a mobile rack and a hard drive, my day-to-day stuff I mirror. For long-term/archival stuff, I do a zip-compressed DVD backup set once a month or so.
I actually have a couple of 8mm tape drives, either of which are big enough for this workstation. Never even bothered to plug them in.
Certainly, lots of commercial shops use tape. That doesn't mean you should or they should.
Though I'd be willing to take a look at the Ecrix packet writing technology or the LTO standard stuff.
Anybody know of a cheap automated DVD-R? I mean... put in a pile of blank DVD-Rs, get a pile of burned DVD-Rs out?
Tech Public Policy stuff
Like, for example 3Ware's 8506-12 ?
It can accommodate up to 12 SATA drives in a few RAID groups, so you can have, for example, two RAID-5 groups with two spare drives.
When a backup is needed, you could unmount one group of drives, unplug it, and plug in & mount another.
For further convenience, you could have the drives already boxed in groups of five in handy enclosures, so all you would have to do is reconnect 10 cables once per month.
To be safe(r), label the connectors and each drive in the group clearly (1-5 or A to E etc) to avoid a mess when accessing data in the future...
Hard drives are not so expensive, neither is a solid hardware RAID card, or a few extra (S)ATA cards if you can't afford hardware RAID.
Just watch where your archive is and your data should be relatively safe...
But in essence I guess it all depends on your real needs.
Does someone's life depend on that data ?
Is all that data equally valuable, or is there some core portion that is better not to lose and another part that could eventually be lost?
I just saw an ad for 250GB harddrives at CompUSA for $129.99 a pop. Combine that with an inexpensive system (say for $400) and you could back up a terabyte of data each month for less than $1000 a month. Just add another box each month and move the older one off site. Set up an encrypted VPN to the offsite location and you have instant access to the backups.
If you don't need instant access then you can run the data off to DVDs at some point and reuse the harddrives and system.
1) stuff you are likely to access in the next some-period-of-time
2) stuff you might need to access in the next some-period-of-time
3) stuff you are keeping around just in case of disaster, legal requirement, or silly request by the CEO.
#1 is on live hard disks.
#3 can be on any medium, and will probably be stored off-site. Not all organizations have a #3.
#2 is the interesting question.
You are using DVDs and want something that's less of a hassle w/o costing a ton of money.
I've seen several specific recommendations. I'm going to give some general ones:
live disks, but removable
a server farm, presumably using less-than-the-fastest drives and equipment, removable drives
fiberchannel or iSCSI are options
a robotic tape library system
a robotic dvd-burning library system
IDE drives are running at under $500/TB on sale, so I'm sure you could get them at that price in bulk.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
I've never used them, but have you considered tape drives? HP makes good tape drives. They can store 3 GB on a single tape and store it at a rate of 80 GB/hour.
It's good for archiving. So you can't exactly rapidly access the stuff again later. I assume if you wanted it back you'd rip the whole tape or large parts of it to your disk(s) and view it that way.
Computers can be used for something other than porn or computer games. It's called work. Often of the scientific type.
I always wondered about that way back in the days of multi-floppy spanned ZIP files that crapped out on a single disk -- why not parity info in the zip file so that you could lose a segment of the zip file (one or more floppies) so that a burned out floppy wouldn't cause a problem.
Your suggestion would either imply writing DVDs such that the parity was part of the filesystem on the DVD itself, or containerizing the data (like a disk image) so that the "file" on the DVD had parity info in it.
Either way, you'd need a big jukebox capable of mounting a set of DVDs at one time to accommodate the parity info. If you assume 14 DVD readers, it's probably too small an amount of data to make it worthwhile, unless you had big HDD(s) in the jukebox and internal logic to rebuild the RAID set into a single logical file off of multiple DVDs.
I can't imagine his data sets to be non-visual data (unless he's CERN doing particle physics). In that case, why doesn't he use some compression that's tuned to the human visual system; what the compression algorithm throws out he wouldn't have seen anyway. There are a bunch of wavelet based algorithms (much better than blocky MPEG-2 which needs interframe compression anyway) which are extremely good, he should see 30:1 ratios without seeing any "loss" or artifacts.
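The parity idea itself is small enough to show. A minimal Python sketch of RAID-style XOR parity across a set of equal-size disc images: burn the parity block as one extra disc, and any single lost disc can be rebuilt from the rest. The byte strings are tiny stand-ins for real images, and this handles exactly one missing disc per set.

```python
def xor_parity(blocks):
    """XOR equal-length byte blocks together; the result is the parity block."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

def recover_missing(surviving_blocks, parity):
    """Rebuild the single lost block from the survivors plus the parity block."""
    return xor_parity(list(surviving_blocks) + [parity])

discs = [b"disc one", b"disc two", b"disc 3!!"]           # stand-ins for three disc images
parity = xor_parity(discs)                                # would be burned as a fourth disc
rebuilt = recover_missing([discs[0], discs[2]], parity)   # pretend disc two went bad
print(rebuilt == discs[1])
```

The catch is the one described above: to rebuild, every surviving disc in the set has to be read, so it only pays off with a jukebox or a lot of patience.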
This is what the studios are moving to. A typical feature film scanned at 2K takes 1.5 terabytes uncompressed. These files then tie up multi-million dollar telecine machines, so they need to move them off fast. (Imagine backing up 1.5 TB onto DLT every day). With scanning going to 4K this problem will get even worse.
Looking at the link attached to the poster's name, it appears he's doing some sort of bioinformatics work, as such he should probably have some grant money and/or VC funding supporting this work. As such, 'cheap' is relative; it's always easier to spend Other People's Money.
Secondly, is all the data -really- worth keeping around? For how long? I can believe a TB/mo of data but have a hard time imagining all of it being something you'd want to use 6 months down the road; only saving things that are important might make the job a lot easier.
On another point - you can probably compress this data down significantly; multi-hundred-megabyte files consisting of nothing but 'A', 'T', 'G' and 'C' should be able to compress down by a factor of at least 4. Both Windows and Linux allow you to automatically compress data on filesystems which would make a large dent in your storage requirements.
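The factor of 4 falls straight out of the alphabet size: four letters need 2 bits each instead of the 8 bits a text file spends per character. A Python sketch of that packing, assuming the input really is nothing but A, C, G and T (real sequence files often carry N's, headers and line breaks, which this ignores):

```python
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"

def pack(seq):
    """Pack a string of A/C/G/T into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        byte <<= 2 * (4 - len(chunk))   # left-align a short final chunk
        out.append(byte)
    return bytes(out)

def unpack(data, length):
    """Reverse pack(); length is the original base count, stored separately."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:length])

seq = "GATTACA" * 3
packed = pack(seq)
print(len(seq), "bases in", len(packed), "bytes")
```

A general-purpose compressor or a compressing filesystem will land in the same neighbourhood on such data without any custom code, which is the easier route.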
my sig's at the bottom of the page.
Do It Yourself CD Changer
// TODO: fix sig
Then again, w/in 18 months you should get twice the storage for the money.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
Hello,
I'm a time traveler stuck here in 2004. Upon arriving here my dimensional warp generator stopped working. I trusted a company here by the name of LLC Lasers to repair my Generation 3 52 4350A watch unit, and they fled on me. I am going to need a new DWG unit, preferably the rechargeable AMD wrist watch model with the GRC79 induction motor, four I80200 warp stabilizers, 512GB of SRAM and the menu driven GUI with front panel XID display.
Stop the world; I need to get off.
I remember those, I think they stored under 10GB, certainly under 20.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
I thought he was being serious.
1 000 000 users x 1 024 megabytes = 1.024 petabytes
11 1101 1011111 0100 000 110 1011111 0101 10 01 1011111 101 1 011 1011111 0 1111 11 111 1011111 101
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
The next-gen DVDs are all in the order of 15-50GB. Sure would be nice if it were 0.5TB though, wouldn't it :).
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
If your data is valuable to you, you should seriously consider using RAID. The first drive failure you have (and you will have many) will cost you more in lost data, down-time, and labor than a 3ware RAID controller would have cost. I've used both their IDE and SATA controllers and they're great. I also went through 1.5 TB worth of 250GB SATA drives in about 18 months due to failures. I'm glad I selected the WD drives with their 3 year warranty.
For off-line data storage, consider using large hard drives and a hot-swap bay. There's nothing cheaper, faster, or of higher density.
Do you think Google expects 1,000,000 users to use 1GB within a year? They offer that because they know most people aren't going to use it, and that by the time that isn't true anymore there will be new technologies available.
-- 'The' Lord and Master Bitman On High, Master Of All
Let me get this straight: You have a four-node cluster, you have 1.6TB of online storage, and you need some sort of permanence; and you're not using RAID of any form?
This is utter insanity! Without RAID, your only hope of safety is in your backups--which you're only asking about now!
RAID your data ASAP, and then start looking for backup systems. Take a look at some of the DLT4000 replacements.
"People who do stupid things with hazardous materials often die." -- Jim Davidson on alt.folklore.urban
Just tossing out another point of view - similar but different than some of the others previously discussed. First off, examine the data you are keeping - do you really need that much? Nowadays it's common to be able to acquire data faster than it can be processed, and if you never stop gathering data, well, you never will catch up, only fall farther and farther behind.
If you DO need this data, and you are going to need it for a while (a year or more), I'd recommend cheap HDs. They also have the advantage of being easily catalogued, and their access time is untouchable compared with tapes. Don't go RAID 5 though; it is not "catastrophe-proof" (flood, fire, tornado, etc). For catastrophe protection, mirror your drives. When you have them loaded up with data, pull the FW cables and swap drives in the enclosures with fresh empty drives. Label them well, and then take each half of the mirror to DIFFERENT LOCATIONS. It's OK to keep one set on-site, but the other set must be somewhere else, preferably in another zip code. This will allow you near instant access to your data (since it's onsite), will protect your data from mechanical failure (through mirroring) and will protect you against catastrophe. (You WILL need to acquire new firewire boxes etc if your office gets leveled... don't forget this detail in general - the data is of no value if you lack the equipment (tape drives etc) to read it back in with.) I know you can get compression and fit more on a tape etc by using archiving software, but it may be worth the extra cost to obey the KISS rule and just simply drag and drop the data to the formatted HDs. This will make data recovery MUCH SIMPLER, and if there are errors on the HD when you need to recover, this will ensure you can actually recover most/all of the information. Archive streams and tapes are notorious for losing 100% of the data that follows a corruption point in the stream.
Once you know you no longer need a specific set, drop it back into the pool of usable drives. Buy them by the case, it's much cheaper this way. It also is advisable to buy the same make/model every time you have to get more drives, even if there are newer, larger, cheaper models out, because having all the same drives means one less complication to worry about in times of crisis.
I work for the Department of Redundancy Department.
You NEED a serious raid array. Raid 5 or possibly even Raid 50. You NEED to implement snapshots on your dev space. You NEED a serious tape drive, probably a robotic system.
This was my recollection, but it's been years so that may not be right...
But 10GB doesn't sound right to me....
They were ungodly slow.. and of course no random access ability gave them a small market.. so they fizzled out..
But a great idea, cheap archiving, back when a 20 mb QIC tape was priced beyond mortal men....
---- Booth was a patriot ----
Hope this helps...
Cheers!
VXA2 has a much higher storage capacity at a better price point than DLT. For example, a 10 tape VXA2 autoloader is about $2300 USD and holds 1.7 or 1.9TB compressed.
Stock up on printer paper and ink cartridges.
Come to think of it, you may want some filing cabinets as well (only ~1.5 MB per 8.5x11 sheet).
Not enough feedback or information!
OK, 1TB/month that doesn't say much.
Always look at different levels of case scenarios and work from there. I usually start with loss of building by fire and work down through limited hardware failure or data corruption.
There are several factors that determine how often you should backup. Here's just a couple of questions to answer.
How much is the data worth?
How much is your time worth? If you lost a day or week of processing time.
Is your work time dependent? (deadlines)
If you lost the data, did you lose the data completely or just lost processing/analyzing time on the data that you can get from your clients again?
How long do you have to store the data, and have it retrievable? One month compared to several years really changes your options.
How financially responsible are you for the data?
Multiple backups (daily, weekly, monthly; full and incremental) in multiple locations are key to a successful backup strategy.
RAID is for redundancy or performance, not backups.
> RAID is a possibility, but it increases the cost significantly.
Software RAID1 would only double your cost, and let you sleep at night.
Must-not-watch TV!
Sent back two of them as they died quickly. They don't work with Adaptec 2940s, even though they "see" them momentarily. You gotta get the latest (3000? series).
Get a pile of hard drives, and a pile of these:
USB to IDE cable
While this certainly isn't a very original idea, I'm still amazed at what I can get on eBay. Yes, it's cliche, but I often forget about it when I start solving problems similar to these. I get all engineer-y and over-analyze the challenge.
Large chunks of disk-based storage can be found on the cheap. With the advancements in software-based RAID (read: in the OS, with Linux), even JBODs would work well.
1-2 TB ain't what it used to be!
This one gang kept wanting me to join cause I'm pretty good with a bo staff.
Instead of firewire or USB get Highpoint SATA. It's faster and about the same price.
Ibrix has a really good clustered FS setup. Plus you can just plug in more systems when you want them.
Another thing you may want to check out is something like GPFS, if you want to build your own filesystem cluster.
DLT and AIT are far more reliable than hard drives.
:-(
Unlike hard drives, I don't worry about losing all my data if I drop a DLT on the floor. You can practically run over the DLT cartridges with a tank and not worry about data loss.
DLT and AIT are specifically designed to last a long time and avoid tape wear.
You can pick up a 35gb DLT IV drive off ebay for ~250 and a stack of tapes for ~100. Turn off the built-in drive hardware compression, run your data through bzip2, and sleep soundly knowing your DLTs are far more reliable backup than any hard drive.
Don't trust your important data to DAT -- the rube goldberg tape mechanism means many munched tapes.
For simplicity, I'm not going to go into RAID tradeoffs, etc. and just stick with "striped data", which gives you maximum bang for the buck. You should draw up a simple spreadsheet with the following headings:
It's not exactly a great spreadsheet layout, but it should be enough to enter everything in and start seeing what is practical and what isn't. I'm sure that someone else would be able to enhance this a little further - any takers?
By the way, you really should think about RAID-5 at the very least. All it would take is just one drive to hose your data completely. Besides, as the array grows in size, the price tradeoff becomes smaller and smaller, to the point where it's really not worth your time to stripe all of your data without redundancy. I believe that the md drivers in linux support up to 32 devices per RAID set. That takes your overhead from 1/5 of your array (in a 5-drive setup) down to 1/32 of your array.
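To put numbers on that tradeoff, here is a minimal Python sketch. It assumes the simple model used above (one drive's worth of parity per RAID-5 set, identical drives); the function names and the 200 GB drive size are made up for illustration.

```python
def raid5_parity_fraction(num_drives: int) -> float:
    """Fraction of raw capacity spent on parity in a single RAID-5 set."""
    if num_drives < 3:
        raise ValueError("RAID-5 needs at least 3 drives")
    return 1 / num_drives


def raid5_usable_gb(num_drives: int, drive_gb: float) -> float:
    """Usable capacity: all drives but one hold data."""
    if num_drives < 3:
        raise ValueError("RAID-5 needs at least 3 drives")
    return (num_drives - 1) * drive_gb


# Assumed drive size of 200 GB, purely for illustration.
for n in (5, 8, 16, 32):
    print(f"{n:2d} drives: parity overhead {raid5_parity_fraction(n):.1%}, "
          f"usable {raid5_usable_gb(n, 200):.0f} GB")
```

Keep in mind that a wider set also has more drives that can fail, and RAID-5 only survives one failure at a time, so the shrinking overhead is not free.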
A SAN-style setup lends itself well to this, but the price is very prohibitive to "the common man", as it requires very expensive hardware. You can emulate something like this via GFS support in Linux, which (theoretically) would allow you to aggregate your data.
If there is a requirement to keep the data online at all times, you'll need to spend more on some PC cases, as well as some networking to string the units together. Pick a reasonably-priced case that will house all the media units, have adequate power (at least 250 Watts, 300+ would be ideal) and keep them cool. Use a motherboard that is reliable, and can adapt to several different clock speeds for a given CPU; you'll want something that can be thrown out for less than $99.00 if it should go bonkers on you, but if the CPU burns up, you should be able to still get parts off the shelf and get the motherboard running again. Stick with the "commodity" or low-end CPUs, as (a) they tend to be cheaper, and (b) having been through a complete lifecycle, any bugs or issues with the CPUs will be well-known by now. Don't worry about the speed of the board or CPU at this point, as most "modern"
I haven't tried it myself, but Apple's Xraid appears to be gaining in popularity as a reasonably priced bulk data storage solution. It reportedly works with Linux, Windows, Netware and, of course, Macs.
If that doesn't suit ya, and it's bulk storage without necessarily speed you're looking for, check into the ATABoy line from Nexsan.
Dump the IRS - http://www.fairtax.org
Wow, ignorance gets demonstrated in such amazing ways.
... I know where you can get a cheap eye.
You are in a maze of twisty little passages, all alike.
Outsource it, pay 1000 people in an impoverished country to memorize all of it!
Linux is unix training wheels, while BSD *is* unix.
How about a gigE NAS solution here? You can get a 3TB (raw) NAS solution from EMC these days for $6k (it is an AX100). The question remains, what is your data worth?
Phredd - "I have found people tend to take you far less seriously once you start waving your genitals at them..."
Well, not that this is a solution, but if you just have lots of money to blow... I just put some of this stuff in at work.
I've got an EMC Clariion CX200 fibered to the servers, which means it's a SAN. It's 3 TB. I think it was $40k. Then I've got a Win2k3 Appliance Server fibered to the EMC Clariion CX300 with 12 TB. It shares out through various filesystems (nfs, smb, ftp, etc.). So it's my NAS. The CX300, I think, was about 100k. The SAN holds the online data, the NAS holds the archived data. Oh, and the NAS head is fibered to a Dell PowerVault 12TB LTO2 tape jukebox, which backs up the NAS. It was pretty cheap, I think it was 18k or so.
Oh yes, and just for the slashdot crowd: yes, this equipment will be holding fMRI studies, as well as MRA, CT, DR, CR, CTA, US and a variety of other modalities. Being a PACS engineer is fun, although I do make less than a teacher. Fucking Economy.
SGI's Data Migration Facility. It works on IRIX, but maybe they have a Linux version now (or get a MIPS SGI on eBay). It migrates older files to tape automatically, or to another filesystem, if you happen to have a big huge array somewhere...
There exists no way of exchanging information without making judgments. --Bene Gesserit Axiom
Read the question the person asked again. They're looking for long term storage, not a backup. The redundancy of RAID provides some assurance that your data will be around for the long term.
Try a stone tablet and a chisel.
;-)
Proven to last a long time.
Though quite labor intensive. Could employ slaves?
>>"Lately, I have been looking into different bulk data storage options available to a common man. My work depends on generating, storing and analyzing a large amount of data -- averaging about 1 TB per month. I would like to have a storage system which is automated, fast, reliable and most importantly does not cost the price of an eye. Right now, I have a 4 node Linux cluster with 10 large hard disks (total capacity 1.6 TB); data storage roughly costs about $0.60/GB (excluding the cost of PC hardware). But long term storage is painful -- DVDs cost about $0.10-$0.15/GB but takes too much human time and leaving data on hard disks makes me nervous because of possible failures. RAID is a possibility, but it increases the cost significantly. I was wondering, if Slashdot readers have any recommendations for a cheap automated way to store and retrieve data."
First of all, is the 1TB of data that you collect every month mostly different or mostly the same as the data you collected in the past month?
Secondly, how compressible is the data you are collecting?
Thirdly, how much random access do you need to the data, or is a serial stream of the data good enough?
--
If the data you have is mostly the same from month to month, then you only have to preserve the difference between 2 months.
If the data is highly compressible, then you can use bzip2 -9 to make the data much smaller, therefore needing a lot less media than otherwise.
A 1TB file that compresses 50 to 1 will only be 20 GB. This will easily fit on 5 DVDs.
If you collect 1TB of data and diffing it with the previous month's data outputs only 100 GB of differences, and that compresses down 21 to 1, then you can fit it on a single DVD.
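The disc counts above are easy to check with a few lines of Python. This is only a sketch: it assumes a single-layer DVD holds a nominal 4.7 GB and that 1 TB means 1000 GB, and the function name is made up.

```python
import math

DVD_GB = 4.7  # assumption: nominal single-layer DVD capacity


def discs_needed(raw_gb: float, ratio: float, disc_gb: float = DVD_GB):
    """Return (compressed size in GB, number of discs) for a given compression ratio."""
    compressed = raw_gb / ratio
    return compressed, math.ceil(compressed / disc_gb)


for raw_gb, ratio in ((1000, 50), (100, 21), (100, 22)):
    size, discs = discs_needed(raw_gb, ratio)
    print(f"{raw_gb} GB at {ratio}:1 -> {size:.2f} GB -> {discs} disc(s)")
```

Under these assumptions 100 GB at 21:1 comes to roughly 4.76 GB, a hair over the nominal 4.7 GB, so the single-disc case really wants about 22:1 or a little slack in the diff.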
rsync is also good for copying the data on a system to a remote system.
Your Linux cluster is the cheapest method I've found. I have one such TB box with 10 old/used/cheapo SCSI disks and using software RAID I built about 4 years ago. It acts as a backup server using rsync in a shell script that collects data from 15+ networked servers every night.
Never had a problem with it.
Guess that's what the "I" in RAID is for.
does the "common man" need 1.6TB?
How about doing the raid bit with what you already have.
I wish the poster had told us more about the application. Is the data from sensors (someone mentioned fMRI), or is this from some sort of simulation modeling?
...
This might sound off topic, but I have used different compression schemes (like wavelets) for sensor data, and regenerative techniques with statistical summaries, that gave me about 2000:1 equivalent compression. That still does not address the 1TB/month backup problem, but might help reduce the problem somewhat.
As for backup: in the past I have chosen to go with a modified cluster solution, where I set up a data server in another building and automatically backed up the new/unique data to it. The reason I chose slow/large HDs instead of tapes is my experience with using tapes over the course of 10 years. I cannot even get replacement tapes if I need them, not to mention the actual tape drives...
How long do you realistically need to keep the data around for? Can you recycle the media (say, after 1 to 2 years)? Does it need to be truly permanent?
I looked through some of the answers here, and as near as I can tell, you've got a bunch of home hobbyists telling you how to back up your home computers. Perhaps all your needs entail is a computer with an external IDE drive array and 4-10 200G SATA drives in it. But from your initial post, it's not clear what you need your offline storage _for_.
First of all, you mention that you generate and use 1 TB of data a month. What happens at the end of that month? Does all of the data become useless? Is some of it carried through? Is it useful for historical processing for some time after it's not "live" any more? The disposition of that offline data is important; you can't determine how you can most effectively back up your data until you know what you need to do with that data once it's backed up.
Since no one cares about backing up old data that they never use any more, I'm going to assume you need this data in some form in the future. I'm also assuming that your data ages out completely every month.
Realistically, you have two options: Large redundant disk arrays, or tape. Various factors give credence to one or the other.
First of all, get off of the SATA hacks, and realize you're going to need to go to SCSI, whether you end up with disk or tape. You're backing up data, you're going to want it to be reliably written out, and SCSI is the de facto standard for backup architecture. Yes, you pay more for it, but there's a reason for it: the SCSI equipment I manage at work fails a fraction of the percentage of time that the various IDE/ATA systems fail. While SATA is marketed as a consumer technology, it will never meet the rigors of being a reliable backup methodology.
This space for rent. Call 1-800-STEAK4U
Beam the data into space. When you want to retrieve it, you'll have to go catch it.
You don't want the beam waving all over the place because of the Earth's rotation, so aim it at the North Star (assuming you're in the northern hemisphere). If this activity continues for a long time, you'll have to compensate for the precession of the Earth's axis and for various smaller wobbles; I believe a correction every century or so will suffice.
Note that if the speed of light really is an absolute limit, you'll have to count on the universe being curved and wait for the data to come around again. Again, you didn't say anything to indicate that this might be a problem.
1. Disassemble 4 node linux cluster. 2. Reassemble 4 node linux cluster as a robot. 3. Give robot a pencil and notebook. 4. "Mr. Robotensen, take this dictation..."
Well, not too demanding, are we?
Sure I'm paranoid, but am I paranoid enough?
Request your free CD of my piano music.
There's no reason why you couldn't read each of the DVDs in serially and incrementally rebuild the lost DVD. On recovery, you should only need enough space to hold a single DVD to rebuild the missing disc.
A disadvantage is that the data cannot change while you write all N+1 DVDs and restoring would require lots of DVD swaps (regardless of whether you've lost a DVD or not) and the ability to incrementally write files with gaps in them (not an issue with most filesystems).
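For anyone who wants to see the N+1 idea in code, here is a rough Python sketch. It assumes the extra disc is a plain XOR parity of the other N (as in RAID-4/5); the thread does not say which scheme was meant, and the function names and tiny byte strings standing in for disc images are invented for the example.

```python
def xor_into(acc: bytearray, block: bytes) -> None:
    """XOR one disc image into the running accumulator, in place."""
    for i, b in enumerate(block):
        acc[i] ^= b


def parity_of(blocks, size: int) -> bytes:
    """XOR together disc images read one at a time.

    Only one disc-sized buffer is held in memory, which matches the
    "enough space to hold a single DVD" point above.
    """
    acc = bytearray(size)
    for block in blocks:
        xor_into(acc, block)
    return bytes(acc)


# Toy "discs" of equal size (assumption: shorter images would be zero-padded).
discs = [b"alpha---", b"bravo---", b"charlie-"]
parity = parity_of(discs, 8)

# Lose disc 1, then rebuild it from the survivors plus the parity disc.
rebuilt = parity_of([discs[0], discs[2], parity], 8)
print(rebuilt)
```

The rebuild is the same operation as creating the parity, which is why the restore can stream the surviving discs in any order.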
/ \
\ / ASCII ribbon campaign for peace
x
/ \
Hopefully we won't have to wait long for Holographic Memory to become commercially available. It looks like the space program is one of the few to actually use Holographic Memory for anything now.
"Meaningless!, Meaningless!" says the Teacher. "Utterly meaningless!"
If you really have 1TB/month of fMRI data, I'd look very carefully into algorithms for compressing it. I would think that a customized compression scheme for a particular type of data might yield quite impressive results, reducing the backup problem by an order of magnitude or more. From your description of the value of the data and the cost of recovery, some time invested in a compression scheme would be well spent.
Thad
I love Mondays. On a Monday, anything is possible.
As a freelance programmer and sysadmin, I admin about 20 boxen for various customers. Virtually all Red Hat systems under various support contracts.
Many of these servers are using hardware or software RAID, but one of the terms in most every contract I've ever signed includes the term "regular, off-site backups".
In my case, I discovered rsync and wrote a nice, easy-to-use backup system based on rsync I've called Backup Buddy. This allows me to not only backup data, but with a minimum of additional storage usage, view my backups as a set going back through time to any point in the last (typically) 45 days, seamlessly.
With this tool, I manage about 2 TB of data on 4 different backup servers, all remotely.
My own backup server for my own stuff is a recycled AMD K6 system with a PCI IDE card and two HDDs, 120 & 160 GB, put together using Logical Volume Manager. I've fallen back on these backups innumerable times - and I can't say how nice it is when a restore can be done in 2 minutes flat from any nearby workstation.
I also have two primary servers at different locations (hint: they are 200+ miles apart) on separate networks in case of catastrophe - they're also mirrored via rsync nightly, and a switch from one to the other takes about 3 hours.
Uptime is important, and I think this is proof that even small (1 employee) businesses can have a reliable, effective backup solution!
I have no problem with your religion until you decide it's reason to deprive others of the truth.
There were several packaging options for this tape.. including reels of 2" wide tape and cartridges.
I've lost track of what happened to it. All I remember is that this tape existed at one time and some research was being done to make data recorders of phenomenal storage capacity.
Back in the early 90's, there was one company in Campbell, California, known as "LaserTape" which was trying to design a tape drive for the PC which used cartridges of this tape. I have lost track of whatever happened to the company.
"Prove all things; hold fast that which is good." [KJV: I Thessalonians 5:21]
Ya know, mate, there is this new thingie over which you can send your data for a ride over a distance, to be stored in another machine elsewhere. Them guys call it the Internet.
Do you have any idea how large a short porn movie can get? Imagine a DSL or cable connection running 24/7 downloading all movies available many times over (we're talking common man here, we know he won't check the file size much less hash and be easily fooled by a simple name change, so "Paris Does It.mpg" and "ParisHiltonHomeMovie.mpg" will both be downloaded). One can easily reach the terabyte needs this way.
hey ya folks...lol i want a gmail invite:) POR FAVOR (spanish for please) ...gracias (spanish for thanks)
email -> opweirdisntit@yahoo.com
i'd like my gmail account to be nitinshantharam@gmail.com
thanks in advance :P
You may call this Adrian's Law of Backup because I've never seen all three at once (and two of them together is still pretty damn hard).
That is all.
Today, media cost in $/TB is going to be IDE @ 200 GB: $508/TB.
Hire a guy to build you servers. You'll need one server per month, figure about $300 for the cpu and OS (this is slashdot. figure it out.) That's $1308 per month, plus whatever you pay your idiot nephew to set it up.
The good news is every month it gets cheaper.
Hint: Since your backup stream is 3mbps continuous, I'm going to guess you're using gigabit so add $100 per server for a switch network and NIC.
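The 3 Mbps figure checks out if you spread 1 TB evenly over a month. A quick Python sketch, assuming a 30-day month and decimal units (1 TB = 10^12 bytes); the function name is made up:

```python
SECONDS_PER_DAY = 86400


def average_mbps(bytes_per_month: float, days: int = 30) -> float:
    """Average line rate (megabits/s) needed to move a monthly volume continuously."""
    bits = bytes_per_month * 8
    return bits / (days * SECONDS_PER_DAY) / 1e6


print(f"{average_mbps(1e12):.2f} Mbps")  # 1 TB/month as a continuous stream
```

That is an average, of course. If the data arrives in bursts, the link has to be sized for the bursts, which is presumably why gigabit is the guess.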
Throw the old servers in the attic as business needs require. Once it's built in our AO it costs about $1.00 a month worth of juice to keep a server up, so the issue is really the business need to have the data be unavailable.
Help stamp out iliturcy.
How about a nice terabyte sized external hard drive? Only $1,199.00.
-+-=-+-=-+-=-+-=-+-=-+ *** http://www.mountainfort.com *** +-=-+-=-+-=-+-=-+-=-+-
3$ / GB for 400 Mps xfers and fully redundant fiber channel storage.
http://www.apple.com/server/
Word of warning. Don't cheap out on Firewire hardware - it's touted as being bulletproof, but in practice I've found SCSI-voodoo-like interactions between cheap cards/cases, and questionable power supplies. I've pretty much given up using Firewire for applications where I need to swap drives a lot, as weird crap happens, just at the worst possible moment.
In those applications, I've gone with dedicated ATA/133 cards with a nice roomy case with a bunch of removable drive bays. It's a pain to have to shutdown to swap drives, but less of a pain than Windows bluescreening, rebooting, and "fixing" your attached Firewire drive, scrambling all of the data on it, and making it impossible to run a recovery (no, I didn't have a backup of that data...)
I've also had weird crap happen with my Macs as well - some hardware doesn't show up unless you have it plugged in on startup. In theory it was a great idea (mix Firewire cases with removable drive bays), in practice, you're asking for trouble if you're using cheap parts (ie, bottom-basement cases, with cheap cables.)
I'm not sure what your price range is, but one method I've had success with is a Promise SATA add-in card and removable hard drive enclosures. SATA is hot-swappable and combine that with a cheap hard drive enclosure ($10-$30+) with any SATA hard drive of your choice and you have a relatively cheap solution.
I remember reading one day about some research somebody did on abusing network capacity to store data. Basically, he would send mail to himself via a third-party SMTP server. He would tell his destination server to ignore the messages until a set date and then refuse them, so they would be bounced back to his originating email account. By keeping that cycle rolling he achieved some pretty amazing storage for FREE, with ultra-reliable ISP-grade mail backup. Now apply the same principle to space! Say you have a server on Mars. You could finish transmitting the data before Mars even started receiving it. When Mars received the data it would immediately send it back, not even waiting for the message to be completely received, so the data would not use any storage on Mars either. At this point you have achieved media-less storage and have abused the network capacity of space. Talk about the geek factor in that! I don't really want to model this network's capacity, but everybody here understands that it is a function of the transmission rate, the propagation speed, and the distance to the "relay" server. Of course, there is an amount of data for which you will start needing some sort of storage on both servers. This will only happen if the data has time to make the round trip to and from the relay server in less time than it takes to transmit the data in full. Improve the transmission rate and your network "memory capacity" multiplies.
Artificial intelligence is no match for natural stupidity
You know, there's really no need to backup all your SPAM. After filtering out the SPAM, I'm sure you can use the floppy method mentioned already ;)
When public Gmail accounts are available... just get yourself a couple hundred of 'em and write a script to partition and email yourself your data... as well as a script to download and rejoin your data... dude, save thousands by using a free service with ultimate backup, etc.
man, I wish I had a need of this myself... I've just come up with the holy grail...
A fool throws a stone into a well and a thousand sages can not remove it.
What would be really interesting would be an informal contest to see who could store the most data in a single page of paper.
:
My entry
Please open your browser at www.google.com and type in what you are looking for. Then write the answer in the space below.
Two wrongs may not make a right, but three
Here are a few sensitive variables which may affect your solution:
- time to recovery
- initial capital outlay
- total cost of ownership
- time window for backups
- cost of downtime
- man-hours of intervention required.
- type of recovery needed
Some less significant variables include:
For instance -- you have 1TB/month of data. That's fine. I am going to make an assumption that your work in some way is either based on a weekly (256GB/week), or daily (~51GB/day) cycle. From that, we have to look at how long you can have a load on the system as you back it up on a daily/weekly basis.
When you have to recover, is it going to be a disaster situation (ie, restore the entire system), or used as an 'undelete' type repository (restore files x,y and z). In the first, you're limited by physics
blah...I could go on for hours on the subject, about the type of stuff that would be useful to consider before designing a backup solution. Quite simply, you want the price of the solution to be lower or equal to the benefit that you'd get out of it -- it's not typically worth paying 3x the money for a 10% reduction in recovery time, unless there are some odd factors (ie, users get their month free if you have an outage of more than 12hrs).
Build it, and they will come^Hplain.
Ok, a few questions:
1) How much is the data worth per day, week, month, year? Your final solution should reflect these data points.
2) How quickly do you need to have access to it? Quicker means more money; slower lowers the price, but adds complexity.
3) How stable does the data need to be? Is year old data worth the same as current data? What about 2, 3, 4 years later? Do you need to get that data back?
4) How much physical room is available for the backup systems and offsite storage? Is it climate controlled, yet convenient? Is it in a different state to avoid disasters?
5) How secure does the data need to be? Is this your customers' credit data, where a leak means federal fines, or will a leak just be inconvenient?
A storage engineer would use your answers to help design a total solution. If your data isn't worth very much, then you've also shown that by this study. OTOH, if it is worth millions, don't expect to "get by" with a $20k answer from /. readers.
I work where there are daily penalties of $400k if we make a mistake or our systems go down. Other systems will cost $5M / hr if they aren't up. What do you think the cost of our backup and recovery system is? We have data stored in multiple locations - near, a little further and on the other side of the Earth. It takes a little longer to get the data back the further away it is. I can imagine insurance and banking where the cost of data is in the $10M per second.
How much is your data worth?
My work demands 1TB a month ....
It sounds like you need a good cost-benefit analysis and an idea of a budget.
First RAID your existing data.
Second Replicate any working solution you have now identically for next month and backup hardware.
Have a serious talk with work as to what is expensive and what you can afford. What happens if a data set is lost? How much damage/cost would that incur? I would look into AIT drives from Sony.
It sounds like you are in a frame of mind where you see everything as expensive. This will heavily influence your decision. Walk through a data disaster scenario with your backers and examine your costs in that light.
ls
With the new G-Mail concept, Google have shown the world that they are capable of storing TBs of data without too much trouble.
... plus the trouble of setting it up, keeping it safe, and paying the electrical bill ;-)
:-)
And it is not data that can be lost without problems (as suggested elsewhere) because we are talking about user-data! Not just search-engine index data.
Why don't you ask them how much an online storage area would cost? Other suggestions here run from 500 USD to many thousands of USD each year
How much online storage space can you buy from Google if you pay them 100 bucks a month? Give'em a call and ask...
http://www.google.com/search?sourceid=mozclient&ie=utf-8&oe=utf-8&q=optical+jukebox
You might consider an 'Infinite Storage' optical jukebox. We ran one a while back for user files, but it wasn't fast enough with small files and it was replaced by a SAN.
Perhaps with your large files it'd perform better.
Environmentalism is the new Victorianism. Everyone ties on a green corset and pretends we're virtuous.
Can you please point a link to where I can buy a 400GB SATA drive? I know they've "announced" them, but they aren't for sale yet. Why are you recommending a solution that you cannot have possibly implemented yourself?
hey asshat,
if you are generating a TB of data a month, you are not a "common man"
stupid troll
I'm a SW engineer at ATTO Technology, and one of the products here is a RAID-capable disk array. I truthfully don't know all that much about its behind-the-scenes workings as I am on a different project. The Diamond is capable of holding 24 ATA disks for a total of 7.2 TB for less than $10/GB. It does Fibre Channel or SCSI, and it can emulate a tape drive to be used with your favorite backup software.
Here's the link for more info:
Diamond Storage Array
--The Programming goddess from Gorflaz
Later this year Panasonic, I think, will be releasing DVDs that hold 50 GB on a side using a blue laser. Of course, that's still not even close to a TB.
Anyone who needs 1TB of storage per month must be either a spammer or an *gasp* evil data-miner!
;-)
cat 1tb.txt > /dev/lprn0
A friend of a friend works at Dow Corning. He mentioned at one point that Dow uses a direct-to-microfilm printer for some of their most critical data. Expected media life is over 100 years given modest climate control. The pricetag, however, is doubtless out of this guy's range -- six figures, as I recall.
//Information does not want to be free; it wants to breed.
I have an office full of computers, relatively new-ish, all running Linux, all with 80-160GB hard drives.
I section off a part of each disk for the OS -- root, usr, var, etc.
The rest is /home.
What I'd really like is to tie the /home partitions together in one single network-wide, redundant file system.
Here's how I'd plan it out:
Start with something like the NBD (kernel level network block device). Have a distributed block hash of some kind, mapping each virtual block to one or more physical blocks. Make that layer support redundancy, so that each block was repeated (or 3 for 2 checksummed, as in RAID) elsewhere on the network, so that the distributed block driver could survive at least one machine going down (later: build in multiple levels of redundancy so that blocks get mirrored in multiple places, and the virtual block driver could survive multiple machine failures).
Build a file system on top of that, just like regular NBD.
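To make the block-mapping layer a bit more concrete, here is a toy Python sketch of just the placement step: every virtual block gets two copies on two different machines, chosen by hashing the block number. The node names, the use of SHA-256 and the "next machine in the list" rule are all assumptions for illustration; this is not how NBD itself works.

```python
import hashlib


def replica_nodes(block_no: int, nodes: list, copies: int = 2) -> list:
    """Pick `copies` distinct machines for a virtual block, deterministically."""
    if copies > len(nodes):
        raise ValueError("not enough machines for the requested redundancy")
    digest = hashlib.sha256(str(block_no).encode()).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    # Consecutive entries in the node list are always distinct machines.
    return [nodes[(start + i) % len(nodes)] for i in range(copies)]


def readable(block_no: int, nodes: list, failed: set, copies: int = 2) -> bool:
    """A block survives as long as at least one of its copies is on a live machine."""
    return any(n not in failed for n in replica_nodes(block_no, nodes, copies))


office = ["ws01", "ws02", "ws03", "ws04", "ws05"]  # hypothetical workstations
print(replica_nodes(42, office))
print(all(readable(b, office, {"ws03"}) for b in range(1000)))
```

With two copies per block, any single machine can go down without losing data. Surviving two failures means raising `copies` to three, at the matching cost in disk space.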
Now I'm not a file system guru but I expect this can be done with the right amount of cash inflow.
I had a friend who built a coffee table from 4 Mac Classics he got for 5 dollars each at a yard sale (only one booted). Thus, I can fairly say 20 dollars is reasonably priced for an Apple themed coffee table. And that would be the 2nd time you heard it.
IIRC Seagate's 400GB disks aren't scheduled for availability until the fall.
Paper launches suck.
- chrish
There's no reason why you couldn't read each of the DVDs in serially and incrementally rebuild the lost DVD.
That's kind of where I was going; the DVDs are a storage medium for a multi-part archive, and a HDD internal to the jukebox provides the working filesystem once sequential reads of the DVDs are done.
I know that some tape drives have memory chips for labeling and positioning data, but I've often wondered if you couldn't have a storage device with a built-in HDD and tape-type unit that would implement a HFS invisible to the OS. The HDD keeps frequently accessed data and transparently reads/writes from the tape drive. It's probably mechanically impractical and too expensive, but an interesting idea.
You can also save some money and get a Storagetek silo that can hold over 7000 tapes and have around 7 petabytes of tape storage at your finger tips like my work has :)
I know hard drives go bad, but:
1. If they are cheap enough, how about multiple drives per backup, maybe in conjunction with multiple copies per drive?
2. If drives do go bad, isn't data retrieval from dead drives a pretty advanced science?
Sure it is pricey, but the total cost again depends on the failure rate.
If cheap drives have a failure rate of say 15% in the first year, using a second drive pushes the overall failure rate to around 2%.
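The 2% figure follows if the drives fail independently of each other, which is an assumption (drives from the same batch, in the same box, on the same power supply tend to fail together). A small Python sketch, with a made-up function name:

```python
def all_copies_lost(p_fail: float, copies: int) -> float:
    """Probability that every copy is lost, assuming independent drive failures."""
    return p_fail ** copies


for copies in (1, 2, 3):
    print(f"{copies} drive(s): {all_copies_lost(0.15, copies):.2%} chance of losing the data")
```

So the second drive takes 15% down to about 2.25%, and a third would take it to roughly a third of a percent, as long as the copies do not share a common cause of death.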
It is by coff... er, will, alone I set my mind in motion...
Seriously, though, if you want cheap online backups, go with IDE RAID. It's slower than SCSI, but a lot faster than the other options.
They're a hardware RAID
Try a Redundant Array of Floppy Disks.
"Can you please point a link to where I can buy a 400GB SATA drive?"
... well, this one is the closest to my home... about 10 minutes on foot, in Paris (France):
Here?
I know the description is in French, but "HITACHI DESKTAR 7K400 400 Go Serial ATA 7200t 8Mo" is quite understandable, I hope...
"I know they've "announced" them, but they aren't for sale yet."
It seems that chrish (the one who answered your post) has the same delusion:
" IIRC Seagate's 400GB disks aren't scheduled for availability until the fall.
Paper launches suck."
Well, the world is not all Seagate-centric, it seems...
"Why are you recommending a solution that you cannot have possibly implemented yourself?"
Well, Joseph, any other witty/uninformed comment on 400 GB SATA availability and me implementing (or not, btw) them ???
I mean, this is /., and taking my post at face value would be a leap of faith for someone that has been on slashdot for soooooo long (number 514...nice...).
Maybe you just need to relax. Or read something else than slashdot, and keep informed on what's hot, or, more to the point, on what's out in the market...
I mean, before posting, that is.
Any other flame/comment ?
Yours Faithfully,
da5id
It takes 40+ muscles to frown, but only four to extend your arm and bitchslap the motherfucker
Hard drives based solution seems to be (currently at least) cheap + easy to use for immediate use
DVDs are cheaper for long term storage but automation devices are still not commonly available (plus capacity per DVD is small)
Software RAID is slow for writing but okay for reading data
Tapes may be a viable alternative for long term data storage but the tape drives require an initial investment
Some readers have mentioned "LaCie Bigger Disk". $1199 for 1 TB disk space ... a price to pay for convenience.
Since a lot of you have asked me, now I will explain the nature of my data, its storage and analysis.
I am a scientific researcher working in computational biology. I do atomistic modeling and generate snapshots of protein conformations along MD trajectories. The data is analyzed several times to calculate different quantities. Those of you who do this kind of stuff know that we can collect and store only a fraction of data we want. I generate this data on supercomputers and then compress it (using bzip2 and gzip) to store temporarily and permanently.
The data generated is usually analyzed within the next month. A good fraction of the data (about 70-80%) needs to be analyzed again several times for different quantities in the next few months. In my experience 80% of the data is usually discarded after a year or so. Therefore 2.4 TB/year needs permanent storage.
So 500 GB at least is required for "daily use", 1-2 TB would be nice to have for intermediate use and over 2 TB will need "permanent" storage.
Assemble a 1500$ machine. Tyan, AMD, lots of ram over processor power. Use a big, nice case like the Codegen S-101. This case has 11 x 5,25" bays, plus an additional 3,5" bay. Ditch the included psu and use something in the 650 - 750 W range.
Use 2 5,25" bays for a dvd burner and a tape unit. In the remaining 9 5,25" bays, assemble three Promise sata enclosures, these can hold 4 disks each, and are 3 bays high. Then plug three Promise S150 SX4 RAID 5 controllers with 128 MB PC133 ECC each. Now populate all the raid enclosures with 400GB Seagate disks, as soon as they're available, for a capacity of 1,1444 TB x 3 = 3,433 TB grand total.
Initial cost:
1 x computer = 1500$;
3 x S150 SX4 + 128mb = 900$;
3 x SuperSwap 4100 = 750$;
12 x 400GB HD, 200$ each? = 2400$;
total = 5550$, about 1,62$ per GB.
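A quick sanity check of the price list, as a sketch only: the prices are the ones quoted above (the disk price is a guess even in the post), and the usable capacity is the 3,433 TB figure given earlier. The itemized lines add up to 5550$.

```python
# (quantity, unit price in USD) for each line of the price list above.
items = {
    "computer": (1, 1500),
    "S150 SX4 + 128 MB": (3, 300),   # quoted as 900$ for three
    "SuperSwap 4100": (3, 250),      # quoted as 750$ for three
    "400GB disk": (12, 200),         # price marked as a guess in the post
}

total = sum(qty * price for qty, price in items.values())

# RAID 5 gives up one disk per set to parity: 3 of 4 disks hold data.
usable_gb_per_set_decimal = (4 - 1) * 400

# Usable capacity as stated in the post (3,433 TB across three sets).
usable_gb_stated = 3433
cost_per_gb = total / usable_gb_stated

print(f"total: {total}$")
print(f"decimal GB per RAID 5 set: {usable_gb_per_set_decimal}")
print(f"cost per GB: {cost_per_gb:.2f}$")
```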
Have handy some 4 unit sets of SuperSwap 1100 enclosures for additional 4 disk raid 5 sets.
If you use a mobo with integrated Gigabit Ethernet, you can expect a network throughput of about 60 to 75 MB/sec.
This setup will provide you with online access to the current data set, plus two backups, and as many backup sets in closet storage as you can pay for.
You can reduce the initial cost by reducing the number of online raid sets, 1000$ each.
Hope it helps,
Bug Eyed Hardware Nut.
So put it on a Netware box ($60 SBS shop.novell.com). Salvage has been around since the early 90's. IMHO there's no reason for an OS to not incorporate its own restore feature these days.
"I can't give you a brain, so I'll give you a diploma" - The Great Oz (blatantly stolen sig)
and here's the link
LaCie Bigger Disk
only problem I have with it is that they claim that 1 terabyte = 1,000,000,000,000 bytes
not even 1,000 GB
let alone 1,000,000 MB
one of the oldest tricks in the book but still an oh-so-common practice
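To put a number on the gap, here is a small sketch converting the advertised decimal terabyte into binary units (GiB and TiB, i.e. powers of 1024). Only the 1,000,000,000,000 byte figure comes from the post; the rest is plain unit conversion.

```python
# The vendor's "1 TB" in bytes (decimal definition).
claimed_bytes = 10**12

gib = claimed_bytes / 2**30   # binary gigabytes
tib = claimed_bytes / 2**40   # binary terabytes
shortfall_pct = (1 - tib) * 100

print(f"{gib:.1f} GiB")
print(f"{tib:.4f} TiB")
print(f"{shortfall_pct:.1f}% less than a binary terabyte")
```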
I can't speak from experience, but I've heard that OS X's built-in software RAID is terribly slow. Perhaps that wouldn't matter for backups rather than regular use.
"I've got to stop masturbating! It makes me too lazy! Stop it, Albert. Stop it." -- Albert Einstein
If you really need that much data, it will fairly quickly become cheaper than hard disk storage. Add something like Bacula to back it up.
Government of the people, by corporate executives, for corporate profits.
which does full bleed printing (edge to edge)
8.5/11 letter paper therefore can have 24480 bits per line, and 15,840 lines for a total of 387763200 bits per sheet, for a total of 48470400 bytes per sheet?
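The numbers above can be reproduced with a few lines. Note the assumption: the post does not state a printer resolution, but 24480 and 15,840 match 2880 dpi across the 8.5" width and 1440 dpi down the 11" length, so that is what this sketch uses, with one bit per printable dot.

```python
# Letter paper, full bleed (edge to edge).
WIDTH_IN = 8.5
HEIGHT_IN = 11

# ASSUMPTION: resolution inferred from the figures in the post, not stated there.
DPI_HORIZONTAL = 2880
DPI_VERTICAL = 1440

bits_per_line = int(WIDTH_IN * DPI_HORIZONTAL)
lines_per_sheet = int(HEIGHT_IN * DPI_VERTICAL)
bits_per_sheet = bits_per_line * lines_per_sheet
bytes_per_sheet = bits_per_sheet // 8

# Sheets needed for one decimal terabyte, ignoring any error correction.
sheets_per_tb = 10**12 / bytes_per_sheet

print(bits_per_line, lines_per_sheet, bits_per_sheet, bytes_per_sheet)
print(f"sheets per TB: {sheets_per_tb:.0f}")
```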
every day http://en.wikipedia.org/wiki/Special:Random
Let's sort this out. Slashdot/Speakeasy is offering a 6 megabit DSL service; there are 86400 seconds in a day; we can average 30 days in a month. So, if all that bandwidth is dedicated to downloading pr0n, it amounts to about 1.76 TB/month.
(Of course, that same figure would apply to downloading anything at full bandwidth on such a connection, like MP3's, MPGs, etc.)
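For the skeptical, a short sketch of that calculation. It assumes "6 megabit" means 6,000,000 bits per second with no protocol overhead, and it reports the result in both decimal TB and binary TiB; the 1.76 figure matches the binary one.

```python
# Inputs from the post.
LINK_BITS_PER_SEC = 6_000_000   # ASSUMPTION: decimal megabits, zero overhead
SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 30

bytes_per_month = LINK_BITS_PER_SEC * SECONDS_PER_DAY * DAYS_PER_MONTH // 8

tb_decimal = bytes_per_month / 10**12
tib_binary = bytes_per_month / 2**40

print(f"{tb_decimal:.3f} TB (decimal)")
print(f"{tib_binary:.3f} TiB (binary)")
```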
RHCE; are you certified? Karma: ambiguous.
There are these DVD duplication stations that have a DVD burner and a robotic empty-burning-done mover. You can put a pile of some 50 DVDs on there, so you need to feed it with a new pile of empty DVDs every week or so.
Use a RAID array of 6+1+1 to buffer the data: 6 data disks, one redundancy disk, and 1 hot spare. Remember the hot spare. Your MTBF goes up enormously through the hot spare. Trust me. Or do the math yourself. But please do replace it once one of the disks has failed, and the hot spare has come into action....
Of course, if you require online access, then the offline DVDs are useless...
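For those who want to "do the math" on the hot spare, here is a rough sketch using the common textbook approximation for a single-parity array, MTTDL = MTBF^2 / (N * (N - 1) * MTTR), where MTTDL is the mean time to data loss and MTTR the time the array spends degraded. The disk MTBF and both repair times below are made-up illustrative values, not figures from this thread, and the formula assumes independent failures, which real disks from one batch may not honor.

```python
def mttdl_hours(n_disks, mtbf_hours, mttr_hours):
    """Approximate mean time to data loss for a single-parity array.

    Data is lost when a second disk fails while the first is still
    being replaced or rebuilt (the MTTR window).
    """
    return mtbf_hours**2 / (n_disks * (n_disks - 1) * mttr_hours)

N_DISKS = 7             # 6 data + 1 redundancy, as in the 6+1+1 layout
MTBF_HOURS = 500_000    # ASSUMPTION: illustrative per-disk MTBF

# ASSUMPTION: without a spare, someone must notice, fetch and swap a disk.
MTTR_NO_SPARE = 72
# ASSUMPTION: with a hot spare, the rebuild starts at once.
MTTR_HOT_SPARE = 12

without_spare = mttdl_hours(N_DISKS, MTBF_HOURS, MTTR_NO_SPARE)
with_spare = mttdl_hours(N_DISKS, MTBF_HOURS, MTTR_HOT_SPARE)

HOURS_PER_YEAR = 24 * 365
print(f"no spare:  {without_spare / HOURS_PER_YEAR:,.0f} years")
print(f"hot spare: {with_spare / HOURS_PER_YEAR:,.0f} years")
print(f"improvement: {with_spare / without_spare:.1f}x")
```

In this model the gain is simply the ratio of the two repair windows, so the hot spare helps exactly as much as it shortens the time the array runs degraded.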
If you're willing to play disk jockey, I could imagine doing this completely in software with a single DVD drive. After all, this would probably be part of a backup application, anyway. I don't see any reason to dedicate hardware to this. Furthermore, it is certainly possible to have more data than available DVD drives, so having all DVDs online at the same time is impractical in the general case.
Building a simple hack couldn't take more than a day of hacking (at least to write a proof of concept).
/ \
\ / ASCII ribbon campaign for peace
x
/ \
The Parchive Project
My first attempt at flamebait is duly recognized by at least one Slashdot moderator. Yeah!
OTOH I was really pissed off at seeing someone blatantly calling me a liar... when that person could have taken the time to google for "400Go SATA" and get some answer before posting...
This post could also have ended up as Informative (I gave the link asked for), Insightful (for the Seagate part), or just about anything else, this being Slashdot and moderators being known for their heavy use of the crack (crap?) pipe.
My point? Well, I'm still waiting for Joseph to present some sort of excuse, at least recognising he was wrong... again, this is /., we deal in strong opinions, not in nice social intercourse.
so, here is My Formal Mea Culpa : HEY JOSEPH !!! YOU WERE WRONG !!!!AND I'M STILL RIGHT !!!
Lol.
Let's see if this also gets a flamebait mod... Ahh, the nice smell of Karma burning 8)
It takes 40+ muscles to frown, but only four to extend your arm and bitchslap the motherfucker
make it 5500 Euro, I forgot to add the 3ware cards to the price....
It takes 40+ muscles to frown, but only four to extend your arm and bitchslap the motherfucker