Ask Slashdot: How Do You Store a Half-Petabyte of Data? (And Back It Up?)
An anonymous reader writes: My workplace has recently had two internal groups step forward with a request for almost a half-petabyte of disk to store data. The first is a research project that will computationally analyze a quarter petabyte of data in 100-200MB blobs. The second is looking to archive an ever increasing amount of mixed media. Buying a SAN large enough for these tasks is easy, but how do you present it back to the clients? And how do you back it up? Both projects have expressed a preference for a single human-navigable directory tree. The solution should involve clustered servers providing the connectivity between storage and client so that there is no system downtime. Many SAN solutions have a maximum volume limit of only 16TB, which means some sort of volume concatenation or spanning would be required, but is that recommended? Is anyone out there managing gigantic storage needs like this? How did you do it? What worked, what failed, and what would you do differently?
use amazon
Subject says it all. Large storage arrays typically run FreeBSD kernels or some variant.
We use Ceph. It's fast, redundant, and crazy scalable. Oh, did I mention free (with paid support available)? ceph.com
Do you mean:
(a) "Don't store it. Employ Amazon (or some other cloud) storage."? or
(b) "Do not use Amazon."
Honestly, you should talk to the pros. I would call a couple of storage vendors, give them the basic outline of what you want to do, and let them tell you how they would do it. You can even get more formal and issue a Request for Information (RFI) or even a Request for Quote (RFQ). If you're a biggish company, your purchasing people probably have an SOP and standard forms for issuing an RFI/RFQ. For the big-boy storage vendors, half a petabyte is commonplace. The bigger question may very well be what this is going to look like at the software level. Managing the data might be a bigger challenge than storing it. Is this going to be organized in some sort of big data solution like Hadoop? Is it just a whole bunch of files, with people writing R or SAS jobs to query against it? Sometimes the tool set that you want to use will drive your choices in how to build the infrastructure under it.
It's expensive, but can be used as SAN or NAS (NFS or SMB). It's also redundant to itself - think RAID6 across cabinets. It will set you back, but it's worth every penny.
The poor man's solution is the latest Synology product, which will allow you to do RAID-spanning up to 1.5tb raw.
The only truly viable backup options are block-level replication (which isn't backup) and/or Amazon S3 to Glacier.
At Facebook, it's memcached, with an HDD backup, eventually put onto tape...
At Google, it's a ramdisk, backed up to SSD/HDD, eventually put onto tape...
For anyone who can't afford half a petabyte of RAM with the commensurate number of computers? I have no good ideas... except maybe RAM cache of SSD, cache of HDD, backed up on tape...
Using something like HDFS to store your data in a Hadoop cluster is likely the best F/OSS solution you're going to get for that...
This project must have an unrealistically low budget, otherwise there are quite a few Enterprise solutions that will do all OR a combination of these tasks.
> how do you present it back to the clients?
Look at a NAS, not a SAN, e.g. NetApp or 3Par C series.
> And how do you back it up?
Disaster Recovery replication to another system or hosted services. NetApp, EMC, 3Par, etc, etc
> Many SAN solutions have a maximum volume limit of only 16TB
NetApp's Infinite Volume limit is 20PB.
You can contact a sales person from any of those companies to answer any of these questions.
Seriously. Call iXsystems. They specialize in this stuff and they use ZFS.
The research projects I've seen using that amount of storage have usually used a tape solution with dCache in front of it. You use a number of tape robots filled with tapes, put them in different locations, and have them back up everything between them.
16TB? That's so wrong on so many levels. Even small-business NAS arrays like Synology are capable of multi-PB storage. Take the RS18016xs: it holds 12x8TB drives and can be expanded with units to hold 180 drives total. The base unit is about $9,000 and the expansions are about $3,000, so for about $50,000 total you could have 500TB of online storage. Want to be doubly safe? Just get 2 identical units. So for $100K, you have all the storage and redundancy you need.
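A rough sketch of the arithmetic behind the ~$50,000 figure. The base price, expansion price, 12-bay base unit, 8TB drives and 180-drive maximum come from the comment; the 12 bays per expansion unit and the $400 drive price are assumptions for illustration, and the result is raw capacity before any RAID overhead.

```python
import math

# Figures from the comment
BASE_PRICE = 9_000       # base unit with 12 bays
EXPANSION_PRICE = 3_000  # per expansion unit
BASE_BAYS = 12
DRIVE_TB = 8
MAX_DRIVES = 180

# Assumptions for illustration only
EXPANSION_BAYS = 12      # bays per expansion unit
DRIVE_PRICE = 400        # USD per 8TB drive

def synology_estimate(target_raw_tb: float) -> dict:
    """Drives, expansion units and total cost to reach a raw-capacity target."""
    drives = math.ceil(target_raw_tb / DRIVE_TB)
    if drives > MAX_DRIVES:
        raise ValueError("beyond the 180-drive maximum")
    expansions = math.ceil(max(0, drives - BASE_BAYS) / EXPANSION_BAYS)
    cost = BASE_PRICE + expansions * EXPANSION_PRICE + drives * DRIVE_PRICE
    return {"drives": drives, "expansions": expansions, "cost": cost}

print(synology_estimate(500))  # {'drives': 63, 'expansions': 5, 'cost': 49200}
```

Change DRIVE_PRICE to whatever your supplier quotes; drives dominate the total once you pass a few expansion units.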
DNA strands are the way to go.
https://en.wikipedia.org/wiki/DNA_digital_data_storage
Our storage just passed 1 petabyte, and we are using AWS, but, the real answer to your question is that you need the help of someone with storage expertise (and asking Slashdot doesn't count).
I have a few friends that have built their own NAS with RAID5/6 and ZFS with much more than 16TB.
You really need to talk to an Enterprise Reseller. Do not bet your career on some half-assed solution you engineered in house.
If you want to keep your data on-site, then unless you already have a lot of infrastructure you can leverage, the path of least resistance is to use something like a NetApp Filer.
For backups it can create snapshots on a schedule (hourly/daily/weekly), then either replicate them to a second physical storage unit (hopefully at a different site) or present them to your backup solution.
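The hourly/daily/weekly schedule amounts to a retention policy. A minimal sketch of how such a policy decides which snapshots survive, assuming snapshots are identified by timestamp; the counts are hypothetical defaults, not NetApp's settings, and the ten days of hourly snapshots are invented example data.

```python
from datetime import datetime, timedelta

def snapshots_to_keep(snapshots, hourly=24, daily=7, weekly=4):
    """Keep the newest snapshot in each of the most recent `hourly` hours,
    `daily` calendar days and `weekly` ISO weeks; everything else is a
    candidate for deletion."""
    policies = [
        (hourly, lambda t: (t.date(), t.hour)),
        (daily, lambda t: t.date()),
        (weekly, lambda t: tuple(t.isocalendar())[:2]),
    ]
    newest_first = sorted(snapshots, reverse=True)
    keep = set()
    for limit, bucket_of in policies:
        buckets_seen = set()
        for snap in newest_first:
            bucket = bucket_of(snap)
            if bucket in buckets_seen:
                continue          # an even newer snapshot already covers it
            if len(buckets_seen) == limit:
                break             # this policy has filled its quota
            buckets_seen.add(bucket)
            keep.add(snap)
    return keep

# Hypothetical example: hourly snapshots for ten days starting 1 June 2015
start = datetime(2015, 6, 1)
snaps = [start + timedelta(hours=i) for i in range(240)]
kept = snapshots_to_keep(snaps)
print(len(kept))  # 30
```

Overlap between the tiers (the newest hourly snapshot is also the newest daily one) is why ten days of snapshots collapse to 30 rather than 35.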
Using the file services on the NetApp will also provide a solution to your "how do I present it to the storage consumers" question - iSCSI, CIFS with domain integration, NFS, Fibre Channel... You also get storage level de-duplication and compression, if that works for your data.
Of course you will pay what seems like a lot for it, but it does solve a lot of your problems in one unit. How much will it save in servers, backup capacity, a multi-drive tape library, daily visits to the server room to reload tapes, and so on?
But if your data center isn't up to providing the level of availability you want then any hardware solution is going to be problematic - large storage systems do not like having the power pulled out from under them. Minimum is dual-redundant UPS power and fault tolerant cooling, or you will most likely have problems.
Tegile does NFS, CIFS, iSCSI, and FC. It also dedupes with ZFS without having to trade the kids in for more RAM.
Something like storage pods? https://www.backblaze.com/blog/storage-pod/
I use slashdotFS, which is a Markovian random comment generator that effectively embeds data in a steganographic comment. The FS handles the details of creating and saving these, so it's all transparent and mounts on your desktop like a regular drive. It's slow, but its capacity seems unlimited and it frequently gets modded insightful.
of this project. You can probably supply enough information to vendors to get proposals (which will be all over the map because you can't be very specific), but I fear you're not in a strong position to evaluate them. You need to have your solution developers talk to an experienced consultant you hire to make recommendations and evaluate vendor bids. You might be able to get it done that way.
You could look into Lustre, although it would change your hardware configuration a bit (it's not a SAN). Depending on your configuration and desired redundancy, this will affect costs a bit (e.g., more Lustre nodes).
You could buy a traditional SAN and tie it all together with fibre, though you'd need a clustered file system like StorNext, another commercial CFS, or even GFS if you prefer open source. This would solve your requirement to traverse the system as a regular directory tree.
Best bet for backup would be a robotic tape library of some sort. There is some work being done on dynamic backup of data in Lustre systems in the HPC space, but it's not very mature. CFS systems like StorNext have methods in place for automatically backing up data on the filesystem.
SanDisk's Infiniflash is 512TB in a 3U chassis that is SAS-connected. You can front this with something like DataCore's SANsymphony to turn it into a NAS/SAN appliance.
The pricing looks to be around $1/GB, which is a ton cheaper than building a SAN of that capacity, plus it's much smaller in power/space/cooling.
Let's start growing brains in jars.
What clients will you be exporting it to? Linux, OS X, Windows? All three?
What kind of throughput do you need? Is 10 MB/sec enough? 100 MB/sec? 10 GB/sec?
What kind of IO are you doing? Random or sequential? Are you doing mostly reads, mostly writes, or an even mix?
Is it mission critical? If something goes wrong, do you fix it the next day, or do you need access to a tier 3 help desk at 3 am?
We have a couple of petabytes of CMS-HI data stored on a homegrown object filesystem we developed and exported to the compute nodes via FUSE. Reed-Solomon 6+3 for redundancy. No SAN, no fancy hardware, just a bunch of Linux boxes with lots of hard drives.
There is no one-size-fits-all filesystem, which is part of the reason we use our own. If you have the ability to run it, I'd suggest looking at Ceph. It only supports Linux, but it has Reed-Solomon for redundancy (consider it a higher tier of RAID) and good performance if you need it. If you have to add Windows or OS X clients into the mix, you may need to consider NFS, Samba, WebDAV, or (ugh) OpenAFS.
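To make the trade-off concrete, here is a small sketch of the capacity arithmetic for a k+m Reed-Solomon layout such as the 6+3 mentioned in this comment, compared with 3-way replication. It assumes every shard of a stripe lands on a different disk or host, which is what makes the failure tolerance real; the 500TB figure is just the size from the question.

```python
def erasure_profile(k: int, m: int, usable_tb: float) -> dict:
    """Capacity arithmetic for a k+m erasure code: each stripe has k data
    shards and m parity shards, and any m shards of a stripe can be lost."""
    return {
        "tolerated_shard_losses": m,
        "storage_efficiency": k / (k + m),
        "raw_tb_needed": usable_tb * (k + m) / k,
    }

print(erasure_profile(6, 3, 500))   # Reed-Solomon 6+3
print(erasure_profile(1, 2, 500))   # 3-way replication expressed as 1+2
```

For 500TB usable, 6+3 needs 750TB raw and survives three lost shards per stripe, while 3-way replication needs 1500TB raw to survive two.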
You're asking like you will be implementing it... don't.
Gather all their requirements, gather your requirements on top of it (I'm pretty confident that some of those requirements were your additions for "you'd be an idiot to have that, but not also have this...", possibly including the backup).
Then put out a preliminary RFP to the major storage vendors, including asking them what they think you missed in the preliminary.
Then take the recommendations they make on top of the preliminary with a grain of salt, since most of them will be intended to ensure vendor lock-in to their solution set; revise the preliminary, and put out a final RFP.
Then accept the bid that you like which management is willing to approve.
Problem solved.
P.S.: You don't have to grow everything yourself from seed you genetically modify yourself, you know...
Make some sarcastic comment about tape library, tape library, or a library of tape
However, you could probably get a rack of boxes running OpenVMS to present its pooled storage as a single networked drive, which sounds like what you want. Backup to tape, of course.
Unless you REALLY want to pay for it.
As someone who works in a hospital system, Imaging Informatics specifically, we have roughly that much data spread across 2 locations. Backups aren't what you think they are. We back up the infrastructure config: databases, VM cluster config and VMs, which, compressed, probably come to 5-10 Terabytes. That's it. That's the stuff that means that if the worst possible event happened, we wouldn't be back to zero when we rebuilt.
As for the 400-500 Terabytes of data, they're in what we call Archive state. There's no backup of them, but they are in proper data centers with fire suppression. So there's that... Still, if one site went up, we'd be down that data. Them's the breaks... Goes back to money! But what we do have is everything in RAID with hot spares. I think... I know 2 drives can fail in a block, and have recently, and we can recover the block. As 75% of this data is pretty much read-only, the only stuff being written to permanent storage is new data. I think we're seeing 120-150 Terabytes of growth a year, and we're looking at new storage since current gear is at 'EOL', life-cycle wise, not warranty or operation.
Point is, will we see a Petabyte storage system bought? Maybe, but it will be the same setup: an archive system, with backup for the 'guts', as I like to call it. Simply put, CXX's don't want to throw the $$ down for Petabyte data store site duplication. If money flowed more freely, we'd at least start there and implement a 100-150 Terabyte SSD caching block with 10Gb fiber, in and out. Not happening, but a man can dream...
Backblaze blog has a rundown of their storage pod https://www.backblaze.com/blog/storage-pod-4-5-tweaking-a-proven-design/
Combine this with something like Gluster, Lustre, Ceph, or even just NFS.
Using online services (Azure, AWS) you are looking at $5,000 - $10,000 per month for this kind of storage support (500TB). Realize that these businesses are not extremely high margin, so if your budget is orders of magnitude less than this you have an issue.
The disks alone for a completely non-redundant system are around $15,000. At this point, you should absolutely not be using Ask Slashdot for a resource, you are well into the "real" enterprise space and getting information/quotes from established vendors.
My assumption using zero facts would be your storage solution alone will be $150,000 or so, plus a few thousand in maintenance/bandwidth/hosting etc per month.
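The two quoted ranges are easy to sanity-check. A sketch using hypothetical unit prices ($0.01 to $0.02 per GB-month for cloud storage, $120 per 4TB drive), chosen only to show how figures like those above arise; current prices will differ, and decimal terabytes (1TB = 1000GB) are assumed.

```python
import math

def monthly_cloud_cost(tb: float, usd_per_gb_month: float) -> float:
    """Monthly object-storage bill for `tb` decimal terabytes."""
    return tb * 1000 * usd_per_gb_month

def raw_disk_cost(tb: float, drive_tb: float, usd_per_drive: float) -> float:
    """Cost of just enough bare drives for `tb`, with no redundancy."""
    return math.ceil(tb / drive_tb) * usd_per_drive

# Hypothetical unit prices, picked only to show how the quoted ranges arise
low = monthly_cloud_cost(500, 0.01)
high = monthly_cloud_cost(500, 0.02)
disks = raw_disk_cost(500, 4, 120)
print(f"cloud: ${low:,.0f}-${high:,.0f}/month, bare disks: ${disks:,}")
```

The comparison is lopsided on purpose: the disk figure has no redundancy, servers, power or staff, which is where the comment's $150,000 estimate comes from.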
http://hardware.slashdot.org/story/09/09/02/138209/build-your-own-28m-petabyte-disk-array-for-117k
Backblaze is an online backup provider. They have open sourced some of their software and hardware designs.
They are currently storing over 150 Petabytes of user data. https://www.backblaze.com/blog/150-petabytes-of-cloud-storage/
They are working on scalability into the Zettabyte range https://www.backblaze.com/blog/vault-cloud-storage-architecture/
They have open sourced their hardware design for anyone to use. https://www.backblaze.com/blog/storage-pod-4-5-tweaking-a-proven-design/
They also looked into using 3rd party vendors but decided that they could build a better solution for at least 1/8 the price. https://www.backblaze.com/blog/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/
I know that it is not a plug and play solution but if you are willing to build off of their work you can save a ton of money and have a solution that truly fits your needs.
That's the easiest question I've ever seen.
1. Wait about a decade or so.
2. Buy two half-petabyte flash drives.
3. Alternate your copies on the two flash drives, the previous one becomes your backup.
NEXT!
Plenty of options to choose from. Google Cloud has the best prices.
https://cloud.google.com/storage/
You can just encrypt everything at rest if you're concerned with your data living 'in the cloud'.
Step 1: buy a metric shitton of storage space (virtual or physical)
Step 2: put your data on it
Step 3: ???
Step 4: profit
If you have a small budget and moderate reliability requirements, I'd suggest looking into building several Backblaze-style storage pods for block storage (5x 180TB storage systems, approx. $9,000 each), each exporting 145TB RAID5 volumes via iSCSI to a pair of front-end NAS boxes. The NAS boxes could be FreeBSD or Solaris systems offering ZFS filestores (putting multiples of 5 volumes, one from each blockstore, together in RAIDZ sets), which then export these volumes via CIFS or NFS to the clients. Total cost for storage, front-ends, 10GbE NICs and a pair of 10GbE switches: $60K, plus a few weeks to build, provision, and test.
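A sketch of the capacity arithmetic for that layout, assuming one 145TB volume per pod and single-parity RAIDZ (the comment doesn't say which RAIDZ level), and ignoring ZFS metadata and padding overhead:

```python
def raidz_across_pods(pods: int, volume_tb: float, volumes_per_pod: int = 1,
                      parity: int = 1) -> dict:
    """Each RAIDZ vdev takes one iSCSI volume from every pod, so a vdev is
    `pods` wide and the pool has `volumes_per_pod` vdevs. Losing up to
    `parity` whole pods leaves every vdev readable."""
    vdevs = volumes_per_pod
    usable_tb = vdevs * (pods - parity) * volume_tb
    return {"vdevs": vdevs, "vdev_width": pods,
            "usable_tb": usable_tb, "pods_that_can_fail": parity}

print(raidz_across_pods(5, 145))
# {'vdevs': 1, 'vdev_width': 5, 'usable_tb': 580, 'pods_that_can_fail': 1}
```

Spreading each vdev across pods is what lets the pool survive a whole pod going offline, at the cost of one pod's worth of capacity per parity level.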
If you have a bigger budget, switch to Fibre Channel SANs. I'd suggest a couple of HP StoreServ 7450s, connected via 8 or 16Gb FC across two fabrics to your front ends, which aggregate the block storage into ZFS-based NAS systems as above, implementing raidz for redundancy. This would limit storage volumes to 16TB each, but if they're all exposed to the front ends as a giant pool of volumes, then ZFS can centrally manage how they're used. A 7450 filled with 96 4TB drives will provide 260TB of usable volume space (thin or thick provisioned) and costs around $200K-$250K. Going this route would cost $500-$550K (SANs, plus 8 or 16Gb FC switches, plus fibre interconnects, plus HBAs) but give you extremely reliable and fast block storage.
A couple of advantages of using ZFS for the file storage are its ability to migrate data between backing stores when maintenance on underlying storage is required, and its ability to compress data. For mostly textual datasets, you can see a 2x to 3x space reduction, at a slight cost in speed depending on your front ends' CPUs and memory speed. ZFS is also relatively easy to manage on the command line for someone with intermediate knowledge of SAN/NAS storage management.
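Before counting on that 2x to 3x, it is cheap to measure. A rough sketch that compresses a sample from the start of each file with Python's zlib; ZFS uses its own compressors (such as lz4 or gzip levels) per record, so treat the number as an estimate rather than a prediction of the exact ZFS ratio.

```python
import zlib

def compression_ratio(data: bytes, level: int = 6) -> float:
    """Original size / zlib-compressed size for one chunk of data."""
    return len(data) / len(zlib.compress(data, level)) if data else 1.0

def estimate_dataset_ratio(paths, sample_bytes: int = 1 << 20) -> float:
    """Compress the first `sample_bytes` of each file and return the
    overall ratio across the whole sample."""
    raw = packed = 0
    for path in paths:
        with open(path, "rb") as f:
            chunk = f.read(sample_bytes)
        if chunk:
            raw += len(chunk)
            packed += len(zlib.compress(chunk, 6))
    return raw / packed if packed else 1.0
```

Run it over a few hundred representative files from each group; already-compressed media will come out near 1.0, which is a hint to leave compression cheap (or off) on that dataset.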
Whatever you decide to use for block storage, you're going to want to ensure the front-end filers (managing filestores and exporting them as network shares) are set up as an identical active/standby pair. There's plenty of free software on Linux and FreeBSD that accomplishes this. Otherwise these front ends would be your single point of failure, and could render your data completely unusable and possibly permanently lost if you don't have redundancy there.
Your requirements aren't clear.
If your only requirements are size and a single tree: Amazon S3. It is a touch slow and it's remote, but your costs are easy to predict and it's easy to tell each group exactly what their monthly expenses are.
There are some good desktop clients as well as a web client. I've used Cyberduck quite a bit on Mac.
Glacier is pretty cool too for cold data.
The problem with buying a huge SAN:
You will plop down tens of thousands of dollars.
It will require constant maintenance
When it's time to replace it, good luck; acquisition is always a total PIA.
The ONLY reason you should look at a SAN for bulk data IS SPEED... though I suspect someone will make some super lame argument about security.
If someone tries this, revoke their email access and give them a desktop.
I promise you the bigger security hole is the sloppy user with email access and a laptop in a car.
Amazon is not the weak link in security; you and your staff are (no offense, the same is true where I work).
The mixed media storage is fairly conventional, but for the analysis project you need to work backwards. Data locality is a big problem: just moving that much data between storage and processing is a problem, especially if you will be running it repeatedly. It's impossible to give an answer for storage without knowing what the computational profile is like, but something in the big data space is most likely.
They'll be happy to talk to you for free, for the prospect of getting their hands on that kind of cash. You're easily looking at $0.5M-$1M between storage, processing, and redundancy.
Sounds like you need the storage onsite at least for the research project.
The mixed media thing sounds like something to throw at the cloud unless there's a reason not to do that.
As to spanning volumes etc... I don't really understand the file structure of this research project. Having a petabyte of data in a single directory is typically the opposite of good ideas.
I'd like more information.
As to backups... it depends on how frequently the information changes. Backup tapes are probably the cheapest way to go for backing up archives. 3 TB at 20 dollars a tape... not bad. And you can do incremental backups if there are few changes.
The tapes are supposed to last about 10 years. So that's something.
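Using the figures above (3TB and $20 per tape), a back-of-the-envelope sketch of media cost for a half-petabyte archive; the weekly change rate and retention in the second call are hypothetical. If 3TB is the cartridge's compressed rating, pass the native capacity instead.

```python
import math

def tape_plan(data_tb: float, tape_tb: float = 3.0, usd_per_tape: float = 20.0,
              full_copies: int = 1, weekly_change_tb: float = 0.0,
              weeks_of_incrementals: int = 0) -> dict:
    """Cartridges and media cost for full copies plus weekly incrementals.
    Ignores drives, library slots and partially filled tapes."""
    full = math.ceil(data_tb / tape_tb) * full_copies
    incremental = math.ceil(weekly_change_tb / tape_tb) * weeks_of_incrementals
    tapes = full + incremental
    return {"tapes": tapes, "media_cost": tapes * usd_per_tape}

print(tape_plan(500))  # {'tapes': 167, 'media_cost': 3340.0}
print(tape_plan(500, full_copies=2, weekly_change_tb=10, weeks_of_incrementals=52))
```

Media is the cheap part; the tape drives, library and the labor of rotating cartridges off-site usually cost far more.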
If we're talking about high frequency changes... you almost need to replicate the primary storage... and the number of times you need to do that is variable on how badly you need to not lose the data.
If we're talking about data where, if it's lost, orphans are going to get ground up into hamburger and fed to the dogs... you're going to want multiple backups. If it would merely be annoying... maybe one backup is fine.
GPFS was built for this. Standard file access from any platform. Peta (and beyond) size hierarchical file tree across multiple systems. High availability, file recovery.
https://en.wikipedia.org/wiki/IBM_General_Parallel_File_System
We recently bought for our group a NAS server with ~200TB of raw storage (175TB after RAID6 with a good card), which is NFS-mounted on other servers. It is pretty easy to use and configure and quite cheap (20k UK pounds). Regarding backup, I would probably just buy a second server (maybe with a cheaper configuration, worse RAID card, etc.).
The storage cluster I manage is a bit smaller than yours, but you could look at GlusterFS.
It is created with your requirements and scale in mind:
- Single hierarchy filesystem
- Flexible regarding underlying storage (SAN is possible, commodity hardware is also possible)
- No Single Points Of Failure in your cluster
- Targets the 'several petabytes' scale explicitly
I found GlusterFS extremely easy to setup. After receiving the hardware I had the cluster set-up in half a day (but studied it and tried a test setup before that).
It seems most /. posters are big on recommending commercial support (not a bad idea in most situations). Support is available from Red Hat if you need it.
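How a single tree can span many servers with no central metadata server: GlusterFS's distribute layer places each file by hashing its name into hash ranges assigned to bricks, so every client computes the same location. A simplified sketch of that idea, using MD5 and equal slices purely for illustration; GlusterFS uses its own hash function and per-directory layouts, and also handles renames and rebalancing, none of which is shown. The brick names are hypothetical.

```python
import hashlib
from collections import Counter

def brick_for(name: str, bricks: list) -> str:
    """Pick a brick by hashing the file name into one of len(bricks)
    equal slices of a 32-bit hash space. Every client computes the same
    answer, so no central lookup server is needed."""
    h = int.from_bytes(hashlib.md5(name.encode("utf-8")).digest()[:4], "big")
    slice_size = 2 ** 32 // len(bricks)
    return bricks[min(h // slice_size, len(bricks) - 1)]

bricks = ["node1:/export/brick1", "node2:/export/brick1",
          "node3:/export/brick1", "node4:/export/brick1"]
spread = Counter(brick_for(f"blob_{i:06d}.dat", bricks) for i in range(10_000))
print(spread)
```

The roughly even spread is what lets capacity and throughput grow by adding bricks; replication (not shown) is layered on top so a brick failure doesn't lose files.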
You will not get a good answer here, because even if there would be one it will be hard to find between all the nonsense.
BTW, your scenario is incomplete, so you are unlikely to get a good answer. It looks a little bit like you want /. to do your homework.
You're not asking the right questions:
The first correct question is why on earth would someone need to access half a petabyte? In most cases the commonly accessed data is less than 1%. That's the amount of data that realistically needs to reside on disk. It never is more than 10% on such a large dataset. Everything else would be better placed on tape. Tiered storage is the answer to the first question. You have RAM, solid/flash storage (PCI based), fast disks, slow high capacity disks and tape. Choose your tiering wisely.
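A toy version of an access-time tiering policy, to show the shape of the decision. The thresholds and tier names are invented for illustration; real HSM products (such as QFS) apply their own policy engines and also weigh file size, age and quotas.

```python
from datetime import datetime, timedelta

# Hypothetical policy: (maximum time since last access, tier), coldest last
TIER_POLICY = [
    (timedelta(days=1), "flash"),
    (timedelta(days=30), "fast disk"),
    (timedelta(days=180), "capacity disk"),
]
COLDEST_TIER = "tape"

def choose_tier(last_access: datetime, now: datetime) -> str:
    """Return the hottest tier whose idle-time limit the object still meets."""
    idle = now - last_access
    for max_idle, tier in TIER_POLICY:
        if idle <= max_idle:
            return tier
    return COLDEST_TIER

now = datetime(2015, 7, 1)
for days in (0.5, 10, 90, 400):
    print(days, choose_tier(now - timedelta(days=days), now))
```

Running a policy like this against real access logs before buying anything tells you how much of the half-petabyte actually needs to sit on disk.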
The second question you need to ask is how the customer needs to access that large datastore. In most cases you need serious metadata alongside that data. For petabytes of data, in most cases you cannot just use an intelligent tree structure. You need a website or an app to search that data and retrieve the required "blob". For such an app you need a large database, since you have 5M objects with searchable metadata (at 200MB/blob).
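The "large database" piece can start very small. A sketch of a metadata catalog using Python's built-in sqlite3, where each row points at a blob on the file tree; the schema, paths and project name are hypothetical, and at millions of rows with many concurrent users a server database would be the more usual choice.

```python
import sqlite3

def open_catalog(path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) a minimal blob catalog. Columns other than
    location/size are hypothetical examples of searchable metadata."""
    db = sqlite3.connect(path)
    db.execute("""
        CREATE TABLE IF NOT EXISTS blobs (
            id          INTEGER PRIMARY KEY,
            location    TEXT NOT NULL UNIQUE,  -- path or URL of the blob
            size_bytes  INTEGER NOT NULL,
            project     TEXT NOT NULL,
            captured_at TEXT                   -- ISO-8601 timestamp
        )""")
    db.execute("CREATE INDEX IF NOT EXISTS blobs_by_project_time "
               "ON blobs (project, captured_at)")
    return db

def add_blob(db, location, size_bytes, project, captured_at=None):
    db.execute("INSERT INTO blobs (location, size_bytes, project, captured_at) "
               "VALUES (?, ?, ?, ?)", (location, size_bytes, project, captured_at))

def find_blobs(db, project, since):
    rows = db.execute("SELECT location FROM blobs WHERE project = ? "
                      "AND captured_at >= ? ORDER BY captured_at", (project, since))
    return [row[0] for row in rows]

db = open_catalog()
add_blob(db, "/archive/run1/a.bin", 200_000_000, "research", "2015-06-01T00:00:00")
add_blob(db, "/archive/run2/b.bin", 150_000_000, "research", "2015-07-01T00:00:00")
print(find_blobs(db, "research", "2015-06-15"))  # ['/archive/run2/b.bin']
```

ISO-8601 timestamps stored as text sort correctly as strings, which is why the simple `>=` comparison works here.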
The third question is why do you have a SAN as a premise? Do you want to put a clustered filesystem on 5-10 nodes? Then Isilon or Oracle ZS3-2/ZS4-4 are probably your answer.
Fourth question: what are the requirements? (How many simultaneous clients? IOPS? Bandwidth? ACL support? Auditing? AD integration? Performance tuning?)
Fifth question: there is no such thing as 100% availability. The word disaster in Disaster Recovery is there for a reason. Set reasonable SLA expectations. If you go for five-nines availability it will triple the cost of the project. Keep in mind that synchronous replication is distance-limited: typically the radius is about 150 miles for a small performance cost, and anything beyond that has a large impact.
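The arithmetic behind both points, as a sketch: allowed downtime for N nines, and the minimum round trip that synchronous replication adds to every write at a given distance, assuming light in fibre covers about 200 km per millisecond and ignoring all other latency.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
FIBRE_KM_PER_MS = 200.0   # light in optical fibre covers roughly 200 km per ms
KM_PER_MILE = 1.609344

def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime for an availability of 99.9...% with `nines` nines."""
    return 10 ** (-nines) * SECONDS_PER_YEAR / 60

def sync_write_rtt_ms(distance_miles: float) -> float:
    """Propagation-only round trip added to each synchronous write; real
    links add switching, protocol and array latency on top."""
    return 2 * distance_miles * KM_PER_MILE / FIBRE_KM_PER_MS

for n in (3, 4, 5):
    print(f"{n} nines: {downtime_minutes_per_year(n):.1f} min/year")
print(f"150 miles: {sync_write_rtt_ms(150):.2f} ms per write")
```

Five nines leaves roughly 5.3 minutes of downtime a year, and 150 miles adds about 2.4 ms to every write before any equipment latency.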
Even if you solve the problems above, if you want to share it via NFS/CIFS or something else you're going to run into trouble. Since CIFS was not realistically designed for clustered operation, regardless of the distributed FS underneath the CIFS server, you get locking issues. Windows Explorer is a good example: it creates Thumbs.db files and leaves them open, and when you want to delete the folder you cannot unless you magically reach the same node that was serving you when it created the Thumbs.db file. Apparently, the POSIX lock is transferred to the other server and stops you from deleting, but when Windows Explorer asks the other node who holds the lock on the file, you get screwed since the other server doesn't know. POSIX locks are different from Windows locks. This affects all Likewise-based products from EMC (VNX filer, Isilon, etc.) and it also affects the CIFS product from NetApp. I'm not sure about Samba CTDB though.
I would design storage based on ZFS for the main tiers, exported via NFSv4 to the front-end nodes, with QFS on top of the whole thing to push rarely accessed data to tape. The front-end nodes would be accessed via WebDAV by a portal in which you can also query the metadata, with a serious DB behind it.
I've installed Isilon storage for 6000 XenDesktop clients that all log on at 9AM, I've worked on an SL8500, Exadata, and various NetApp and Sun storage systems, and I can tell you that you need to do a study. Run simulations with commodity hardware on smaller datasets to figure out the performance requirements and optimal access method (NAS, Web, etc.). Extrapolate the numbers, double them, and ask for POCs and demos from vendors, be it IBM, EMC, Oracle, NetApp or HP. Make sure that in the future, when you need 2PB, you can expand in an affordable manner. Be careful, since vendors like IBM tend to use the least upgradable solution: they will do a demo with something that can hold 0.6PB in its max configuration, and if you need to go larger you'll need a brand new solution from another vendor.
It's not worth doing it yourself, since it will be time-consuming (at least 500 man-hours until production) and will need at least 1 full-time employee for the storage. But if you must, look at Nexenta and the hardware they recommend.
And remember to test DR failover scenarios.
Good luck!
Library storage sounds like it may be your best choice. Several high-end vendors sell such systems, and you may need to submit RFSs and RFQs, not to mention see the systems in action. This is not going to be cheap, but it's the best long-term investment. Ensure that it is scalable and can handle future expansion without investing in a whole new kit, which would simply put your department back to square one.
On a SAN, the 16TB limit generally comes from 32-bit SANs; 64-bit SANs wouldn't have it. Plenty of SAN solutions can handle 500TB or 10x that much, so just upgrade. If you only want backup, there are plenty of hardware backup devices that handle this. For example, ExaGrid scales to, I believe, 300TB/hr, let alone 500TB total. This isn't gigantic in today's world. You just need to have a conversation with your vendor, or an agent. You aren't asking for anything abnormal or challenging.
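One common way a 16TB ceiling arises, as a hedged illustration: 32-bit block addresses times 4KiB blocks gives 2^32 x 4096 bytes = 16TiB. The 4KiB block size is an assumption; the comment only attributes the limit to 32-bit SANs.

```python
def max_volume_bytes(address_bits: int, block_bytes: int) -> int:
    """Largest volume addressable with `address_bits`-bit block numbers."""
    return 2 ** address_bits * block_bytes

TIB = 2 ** 40
print(max_volume_bytes(32, 4096) // TIB)  # 16
print(max_volume_bytes(64, 4096) // TIB)  # 68719476736 (i.e. 64 ZiB)
```

Moving to 64-bit block addresses pushes the ceiling far beyond any array you can buy, which is why the limit disappears on newer systems.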
I manage 6 Isilon clusters, 4 of which have 1.2PB. Today, with 20 x X410 nodes, you can have 1.2PB, and it's scalable, fast, and can be backed up easily. Of course the price is high, but the solution works perfectly.
There are many Enterprise SANs that can support that size. Dell Compellents have a maximum LUN size of 10PB for instance.
Just put "bomb" and "assassinate" in every line. ... It's all going to get backed up.
But getting them to restore it after it's gotten lost or corrupted is difficult.
For high throughput/IOPS requirements build a Lustre/Ceph/etc. cluster and mount the cluster filesystems directly on as many clients as possible. You'll have to set up gateway machines for CIFS/NFS clients that can't directly talk to the cluster, so figure out how much throughput those clients will need and build appropriate gateway boxes and hook them to the cluster. Sizing for performance depends on the type of workload, so start getting disk activity profiles and stats from any existing storage NOW to figure out what typical workloads look like. Data analysis before purchasing is your best friend.
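One low-effort way to start collecting the workload stats on Linux hosts is to sample /proc/diskstats twice and diff the counters. A minimal sketch; the field positions follow the kernel's documented layout (reads completed, sectors read, writes completed, sectors written, with 512-byte sectors), and on other platforms you'd use iostat or the array's own tools instead.

```python
import time

def parse_diskstats(text: str) -> dict:
    """Map device -> (reads completed, sectors read, writes completed,
    sectors written) from the contents of /proc/diskstats."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue
        stats[fields[2]] = (int(fields[3]), int(fields[5]),
                            int(fields[7]), int(fields[9]))
    return stats

def rates(before: dict, after: dict, seconds: float) -> dict:
    """Per-device IOPS and MB/s between two samples (512-byte sectors)."""
    out = {}
    for dev, (r0, sr0, w0, sw0) in before.items():
        if dev not in after:
            continue
        r1, sr1, w1, sw1 = after[dev]
        out[dev] = {
            "read_iops": (r1 - r0) / seconds,
            "write_iops": (w1 - w0) / seconds,
            "read_mb_s": (sr1 - sr0) * 512 / 1e6 / seconds,
            "write_mb_s": (sw1 - sw0) * 512 / 1e6 / seconds,
        }
    return out

def sample(seconds: float = 60, path: str = "/proc/diskstats") -> dict:
    """Read the counters twice, `seconds` apart (Linux only)."""
    with open(path) as f:
        before = parse_diskstats(f.read())
    time.sleep(seconds)
    with open(path) as f:
        after = parse_diskstats(f.read())
    return rates(before, after, seconds)
```

Logging `sample()` at intervals through a normal working week gives the read/write mix and peak IOPS needed to size the cluster and gateways.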
If the IOPS and throughput requirements are especially low (guaranteed < 50 random IOPS [for RAID/background process/degraded-or-rebuilding-array overhead] per spindle and what a couple 10gbps ethernet ports can handle, over the entire lifetime of the system) then you can probably get away with just some SAS cards attached to SAS hotplug drive shelves and building one big FreeBSD ZFS box. Use two mirrored vdevs per pool (RAID10-alike) for the higher-IOPS processing group and RAIDZ2 or RAIDZ3 with ~15 disk vdevs for the archiving group to save on disk costs.
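A quick sketch of the capacity side of that trade-off for a hypothetical 60 x 4TB shelf, reading the comment's mirrored vdevs as 2-way mirrors; it ignores ZFS metadata and slop space. The vdev count matters too, since a commonly cited rule of thumb is that random IOPS scale with the number of vdevs rather than the number of disks.

```python
def pool_layout(disks: int, disk_tb: float, layout: str, vdev_width: int) -> dict:
    """Usable capacity (ignoring ZFS metadata and slop space) and vdev count
    for a pool of identical vdevs. layout: 'mirror', 'raidz', 'raidz2'
    or 'raidz3'."""
    if disks % vdev_width:
        raise ValueError("disk count must be a multiple of the vdev width")
    vdevs = disks // vdev_width
    if layout == "mirror":
        data_disks = 1                      # every disk in a mirror holds a copy
    elif layout in ("raidz", "raidz2", "raidz3"):
        parity = int(layout[5:] or 1)       # plain 'raidz' means single parity
        data_disks = vdev_width - parity
    else:
        raise ValueError(f"unknown layout {layout!r}")
    return {"vdevs": vdevs, "usable_tb": vdevs * data_disks * disk_tb}

print(pool_layout(60, 4, "mirror", 2))   # {'vdevs': 30, 'usable_tb': 120}
print(pool_layout(60, 4, "raidz2", 15))  # {'vdevs': 4, 'usable_tb': 208}
```

Mirrors give up capacity for many more vdevs (and faster resilvers), which suits the processing group; wide RAIDZ2/RAIDZ3 vdevs suit the archive.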
Plan for 100% more growth in the first year than anyone says they need (shiny new storage always attracts new usage). Buy server hardware capable of 3 to 5 years of growth; be sure your SAS cards and arrays will scale that high if you go with one big storage box.
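A minimal sketch of that planning rule, assuming linear growth after year one; the starting size and stated growth are hypothetical inputs you'd replace with your users' numbers.

```python
def capacity_plan(initial_tb: float, stated_tb_per_year: float, years: int,
                  first_year_factor: float = 2.0) -> list:
    """Projected data size at the end of each year: first-year growth is
    padded by `first_year_factor` (100% more than stated by default),
    later years grow by the stated amount."""
    sizes, size = [], initial_tb
    for year in range(1, years + 1):
        size += stated_tb_per_year * (first_year_factor if year == 1 else 1)
        sizes.append(size)
    return sizes

print(capacity_plan(500, 100, 5))  # [700.0, 800.0, 900.0, 1000.0, 1100.0]
```

Sizing the chassis, SAS cards and switches for the final year's figure is what "3 to 5 years of growth" means in practice.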
Buy Storage Pods, designed by Backblaze. You can get 270TB of raw storage in 4U of rackspace for $0.051 per gigabyte. Total cost for half a petabyte of raw storage: $27,686. To back it all up cheaply but relatively effectively, buy a second set to use as a mirror: $55,372. For use with off-the-shelf software (FreeNAS running ZFS or Linux running mdadm RAID) to present a unified filesystem that won't self-destruct when a single drive fails, you'll need to over-provision enough to store parity data. Go big or go home: just buy another pod for each of the primary and the backup sets. Total of 6 pods with 1620TB of raw storage: $83,058. Some assembly required. And 24U of rackspace required, with power and cooling and 10GbE Ethernet and UPSs (another 4-8U of rackspace).
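The comment's totals are consistent with about $13,843 per 270TB pod (the $27,686 half-petabyte figure buys two pods, 540TB raw). A sketch that reproduces them; `parity_pods` is a name introduced here for the extra pod added to each set for parity.

```python
import math

POD_RAW_TB = 270
POD_COST = 27_686 / 2      # derived: $27,686 buys two pods (540TB raw)

def pod_budget(raw_tb_needed: float, mirrored: bool = False,
               parity_pods: int = 0) -> dict:
    """Pods, raw TB, dollars and rack units for a Backblaze-style build."""
    pods = math.ceil(raw_tb_needed / POD_RAW_TB) + parity_pods
    if mirrored:
        pods *= 2                          # identical backup set
    return {"pods": pods, "raw_tb": pods * POD_RAW_TB,
            "cost": round(pods * POD_COST), "rack_u": pods * 4}

print(pod_budget(500))
# {'pods': 2, 'raw_tb': 540, 'cost': 27686, 'rack_u': 8}
print(pod_budget(500, mirrored=True))
# {'pods': 4, 'raw_tb': 1080, 'cost': 55372, 'rack_u': 16}
print(pod_budget(500, mirrored=True, parity_pods=1))
# {'pods': 6, 'raw_tb': 1620, 'cost': 83058, 'rack_u': 24}
```

These are pod costs only; switches, UPSs, front-end servers and spare drives come on top, as the comment notes.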
Expect a ballpark price of something a little under $100,000 that will meet your storage requirements with sufficient availability and redundancy to keep people happy. It will require 2 racks of space, and regular care and feeding. Do the care and feeding in house. A support contract where you pay some asshole tens of thousands of dollars a year to show up and swap drives for you is a waste of money. Bearing that in mind, as other posters have said, talk to storage vendors selling turnkey solutions. Come armed with these numbers. When they bid $1 million, laugh in their faces. But there's an outside chance you'll find a vendor with a price that is something less than hyperinflated. Stranger things have happened.
If you don't generate data very quickly, you can ease into it. For around $35,000, you can start with just 2 pods and the surrounding infrastructure, and add pods in pairs as necessary to accommodate data growth. Add $27,000 in 2 chassis next year to double your space. Add $26,000 of space again in 2017 and increase your raw capacity another 50%. (Total storage cost using Backblaze-inspired pods is dominated by hard drive prices, which trend downwards.) When you find out your users underestimated growth, another $25,000 of space in 2018 takes you to somewhere in the neighborhood of 2 petabytes of raw storage, which you're using with double parity and 100% mirrored backup for a total effective usable space of approximately 918TB. You'll be replacing 2-3 drives per year, starting out, and 0-1 after infant mortality has run its course. Keep extras in a drawer and do it yourself in half an hour each on a Friday night. If you configured ZFS with reasonably sized vdevs (3-5 devices), the array rebuild should be done by Monday morning. By 2020, you'll be back up to replacing 2-3 drives per year again as you climb the far side of the bathtub curve. While you're at it, you can seriously consider replacing whole vdevs with larger-capacity drives, so your total usable space can start to creep up over time without buying new chassis. By 2025, you will have 8 chassis in two racks hosting 2.88PB of raw storage space that's young and vital and low maintenance, having spent roughly $200,000.
A bargain, really.
Supermicro has 36- and 72-drive chassis that aren't too bad in terms of human effort (you can get 90-drive chassis, but I wouldn't recommend them). You COULD get 8TB drives for about 9.5 cents/GB (including the $10k overhead of the 4U chassis). 4TB drives will be more practical for rebuilds (and performance), but will push you to nearly 11 cents/GB. You can go with 1TB or even 1/2TB drives for performance (and faster rebuilds), but then you're up to 35 cents/GB.
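The per-GB figures are easy to reproduce against your own quotes. A minimal sketch that spreads the fixed chassis cost across the drives; the $480 drive price below is a made-up placeholder, not a real quote:

```python
def cents_per_gb(chassis_cost: float, drives: int, drive_tb: float, drive_price: float) -> float:
    """Raw cost per GB in cents, spreading the chassis overhead across all drives."""
    total_dollars = chassis_cost + drives * drive_price
    raw_gb = drives * drive_tb * 1000  # decimal TB, as drive vendors count
    return 100 * total_dollars / raw_gb


# Hypothetical: $10k 36-bay chassis filled with 8TB drives at an assumed $480 each
print(f"{cents_per_gb(10_000, 36, 8, 480):.2f} cents/GB")
```

The fixed chassis overhead is why small drives look so expensive per GB: the same $10k is divided over far fewer gigabytes.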
.. $200k.. But you can grow into it.
That's roughly 288TB raw for, say, $30k per 4U. If you need 1/2 PB, I'd say spec out 1.5PB, which puts you at about $175K.
Note this is for ARCHIVE, as you're not going to get any real performance out of it: the CPU-to-disk ratio isn't high enough. I'm not even sure the motherboard can saturate a 40Gbps QSFP link and a $30k switch. That's kind of why Hadoop nodes with a cheap single CPU + 4 direct-attached HDs are so popular.
At that size, I wouldn't recommend just RAID-1ing, LVMing, ext4ing (or btrfsing), then n-way foldering, then NFS mounting, since you'll have problems when hosts go down and with keeping the network from stalling or timing out.
Note, you don't want to 'back up' this kind of system. You need point-in-time snapshots, and MAYBE periodic write-to-tape. Copying is out of the question, so you just need a file system that doesn't let you corrupt your data. The data DEFINITELY has to replicate across multiple machines: you MUST assume hardware failure.
The problem is going to be partial network downtime, crashes, or stalls, and regularly replacing failed drives. This kind of network is defined by how well it performs when 1/3 of your disks are in week-long rebuild periods. Some systems (like HDFS) don't care about hardware failure: there's no rebuild, just a constant sea of scheduled data migration.
If you only ever schedule temporary bursts of 80% capacity (probably even too high), and have a system that only consumes 50% of disk I/O to rebuild, then a 4TB disk would take 12 hours to re-replicate. If you have an intelligent system (EMC, NetApp, DDN, HDFS, etc.), you could get that down to 2 hours per disk (due to cross-rebuilding).
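Those rebuild times follow from simple throughput arithmetic. A minimal sketch, assuming a sustained sequential rate of about 185 MB/s per disk (an assumed figure, not from the post) and idealized linear speedup when several disks share the rebuild:

```python
def rebuild_hours(disk_tb: float, disk_mb_s: float, io_share: float,
                  parallel_sources: int = 1) -> float:
    """Idealized time to re-replicate one failed disk.

    disk_tb:          capacity to rebuild, in decimal TB
    disk_mb_s:        sustained sequential throughput of one disk, MB/s
    io_share:         fraction of that throughput the system allows for rebuild
    parallel_sources: disks sharing the rebuild work (1 = classic single-spare RAID)
    """
    bytes_to_copy = disk_tb * 1e12
    bytes_per_s = disk_mb_s * 1e6 * io_share * parallel_sources
    return bytes_to_copy / bytes_per_s / 3600


print(f"{rebuild_hours(4, 185, 0.5):.1f} h")                      # single target disk
print(f"{rebuild_hours(4, 185, 0.5, parallel_sources=6):.1f} h")  # spread across 6 disks
```

Real systems rarely scale perfectly with `parallel_sources`, so treat the second number as a best case.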
I'm a big fan of object file systems (generally HTTP-based). They'll work well with the 3-way redundancy. You can typically fake a POSIX-like file system with FUSE. You could even emulate CIFS or NFS. It's not going to be as responsive (high latency). Think S3.
There are also "experimental" POSIX systems like Ceph, GPFS, and Lustre. They're very easy to screw up if you don't know what you're doing, and really painful to reformat after you've learned they're not tuned for your use case.
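One common way an object store decides where the three copies of each object live is to hash the object key against the node list. The sketch below uses rendezvous (highest-random-weight) hashing purely as an illustration; it is not the placement algorithm of any particular product (Ceph, for example, uses its own CRUSH algorithm), and the node names are hypothetical:

```python
import hashlib


def replica_nodes(object_key: str, nodes: list[str], replicas: int = 3) -> list[str]:
    """Pick `replicas` distinct nodes for an object via rendezvous hashing."""
    if replicas > len(nodes):
        raise ValueError("not enough nodes for the requested replica count")

    def weight(node: str) -> int:
        # Each (node, key) pair gets a pseudo-random but deterministic score.
        digest = hashlib.sha256(f"{node}/{object_key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    # The highest-scoring nodes hold the copies.
    return sorted(nodes, key=weight, reverse=True)[:replicas]


nodes = [f"storage-{i:02d}" for i in range(8)]
print(replica_nodes("research/run-0042/blob-0001.bin", nodes))
```

The useful property when you must assume hardware failure: if a node disappears, only objects that had a copy on it get new placements; every other object's replica set stays the same.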
HDFS will work - but it's mostly for running jobs on the data.
There's also AFS.
If you can afford it, there are commercial systems to do exactly what you want, but you'll need to triple the cost again. Just don't expect a fault-tolerant multi-host storage solution to be as fast as even a dedicated laptop drive. Remember when testing: you're not going to be the only one using the system. Benchmarks perform very differently under disk recovery or under random, scattershot load from other parts of the system, including copying in all that data.
-Michael
Lucky (?) for you, I just went through purchasing a storage refresh for a cluster, as we're planning to move to a new building and no one trusts the current 5 year old solution to survive the move (besides which, we can only get 2nd hand replacements now). The current system is 8 shelves of Panasas ActiveStor 12, mostly 4 TB blades, but the original 2-3 shelves are 2 TB blades, giving about 270 TB raw storage, or about 235ish TB in real use. The current largest volume is about 100 TB in size, the next-largest is about 65 TB, with the remainder spread among 5-6 additional volumes including a cluster-wide scratch space. Most of the data is genomic sequences and references, either downloaded from public sources or generated in labs and sent to us for analysis.
As for the replacement...
I tried to get a quote from EMC. Aside from being contacted by someone *not* in the sector we're in, they also managed to misread their own online form and assumed that we wanted something at the opposite end of the spectrum from what I requested info on. After a bit of back and forth, and a promise to receive a call that never materialized, I never did get a quote. My assumption is they knew from our budget that we'd never be able to afford the capacities we were looking for. At a prior job, a multi-million dollar new data center and quasi-DR site went with EMC Isilon and some VPX stuff for VM storage/migration/replication between old/new DCs, and while I wasn't directly involved with it there, I had no complaints. If you can afford it, it's probably worth it.
The same prior job had briefly, before my time there, used some NetApp appliances. The reactions of the storage admins weren't all that great, and throughout the 6 years I was there, we never could get NetApp to come in to talk to us whenever we were looking to expand our storage. I've had colleagues swear by NetApp though, so YMMV.
I briefly looked at the offerings from Overland Storage (where we got our current tape libraries), on the recommendation of the VAR we use for tapes & library upgrades. It looked promising, but in the end, we'd made a decision before we got most of those materials...
What we ended up going with was Panasas, again. Part of it was familiarity. Part of it was their incredible tech support even when the AS12 didn't have a support contract (we have a 1 shelf AS14 at our other location for a highly specialized cluster, so we had *some* support, and my boss has a golden tongue, talking them into a 1-time support case for the 8 shelf AS12). We also have a good relationship with the sales rep for our sector, the prior one actually hooked us up with another customer to acquire shelves 6-8 (and 3 spares), as this customer was upgrading to a newer model. Based on that, we felt comfortable going with the same vendor. We knew our budget, and got quotes for three configurations of their current models, ActiveStor 14 & 16. We ended up with the AS16, with 8 shelves of 6 TB disk (x2) and 240 GB SSD per blade (10 per, plus a "Director Blade" per). Approximate raw storage is just a bit under 1 PB (roughly 970-980 TB raw for the system).
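As a sanity check on the quoted raw figure, assuming 10 storage blades per shelf and that each blade's 240 GB SSD is counted toward raw capacity (the post doesn't say whether it is):

```python
shelves = 8
blades_per_shelf = 10          # storage blades, not counting the Director Blade
hdd_tb_per_blade = 2 * 6       # two 6 TB disks per blade
ssd_tb_per_blade = 0.240       # one 240 GB SSD per blade

raw_tb = shelves * blades_per_shelf * (hdd_tb_per_blade + ssd_tb_per_blade)
print(f"{raw_tb:.1f} TB raw")
```

Without the SSDs the total is 960 TB, so the 970-980 TB figure appears to include them.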
In terms of physical specs, each shelf is 4U and has dual 10 GbE connections, and adding additional shelves is as easy as racking them and joining them to the existing array (I literally had no idea what I was doing when we added shelves on the current AS12, it just worked as they powered on). Depending on your environment, they'll support NFS, CIFS, and their own PanFS (basically pNFS) through a driver (or Linux kernel module, in our case). We're snowflakes, so we can't take advantage of their "phone home" system to report issues proactively and download updates (pretty much all vendors have this feature now). Updating manually is a little more time-consuming, but still possible.
As for backups, I honestly have no idea what I'm going to do. Most data, once written, is static in our environment, so I can probably get away with infrequent longer retention period backups for every
"The urge to save humanity is almost always a false front for the urge to rule." --H.L. Mencken
Or "Spectrum Scale" as it is called now, with TSM for backup if you can't afford a second disk replica copy.
Apache's Hadoop
Just distribute your data all around.
One of these will do you well
https://en.wikipedia.org/wiki/...
For storage that's trickier. You probably need to characterize your usage before you talk to a vendor otherwise they will oversell you into oblivion.
Get 1000 WD Black 1TB HDs. Put EXT2 on it.
Build a PHP front end for clients.
Done.
...of Windows10 boxes!
Sacred cows make the best burgers.
Hands-down, the EMC Data Domain is the best option for backing up such a large amount of data.
Where I work, we are running EMC's Isilon platform. We have ~4PB of data replicated between two data centers.
The platform supports the traditional CIFS/SMB and NFS for client connectivity.
It also has Hadoop support (HDFS). The great thing about the HDFS support is that you do not have to spin a separate file system for it. The same files that your clients access via CIFS or NFS can be accessed via HDFS. Isilon was built with Hadoop in mind and the Isilon nodes act as Hadoop "compute nodes".
The OneFS file system presents a single file system of practically unlimited size. There are some interesting tuning options that can be leveraged depending on your data type and IO patterns. If you need to get REALLY crazy, the system has support for tiering data based on a whole slew of different factors (last accessed date, file date, file size... basically any file metadata attribute you can think of can be used for tiering purposes).
This probably does not matter for you, but the system also supports AES256 at-rest encryption. We deal with a lot of financial and other highly sensitive data for clients that demand at-rest encryption, so that was a must have for us.
The only downside is that since it is from EMC, you can plan on paying through the nose for it. (But never pay full retail for EMC, ever. Threaten them with NetApp if you have to. ;) )
We still leverage a SpectraLogic tape library to archive data off of the system. With a moderately specced NetBackup system we get a consistent ~35000kb/s restore rate off of a single drive. That lets us provide reasonable RTOs back to the business.
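A rough way to turn that drive rate into recovery-time estimates, assuming "kb/s" here means kilobytes (1,000 bytes) per second and that adding tape drives scales throughput linearly (both are assumptions, not measurements):

```python
def restore_hours(data_tb: float, kb_per_s: float = 35_000, drives: int = 1) -> float:
    """Hours to restore `data_tb` decimal TB from tape at a sustained per-drive rate.

    Assumes kb_per_s means kilobytes (1,000 bytes) per second and that
    throughput scales linearly with the number of tape drives.
    """
    data_kb = data_tb * 1e9
    return data_kb / (kb_per_s * drives) / 3600


for tb in (1, 10, 100):
    print(f"{tb:>4} TB: {restore_hours(tb):7.1f} h on one drive")
```

At roughly 8 hours per TB on a single drive, restoring a large volume is a multi-drive job, which is worth knowing before promising an RTO.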
On the subject of backup, another great thing about Isilon is that you can dedicate certain nodes to specific tasks. In the Isilon architecture, the NL nodes are the slowest nodes that they have. We leverage those for backup to keep the network IO off of the faster X and S-nodes.
No way that you should roll your own at this point in time. The future is all clouds all the time. Be on the leading edge instead of the trailing.
500TB is nothing these days. You can easily buy any system and it will support it. Look at FreeBSD/FreeNAS with ZFS (or their commercial counterpart by iXSystems). If you want to have an extremely comfortable, commercial setup, go Nexenta or with a bit of elbow grease, use the open/free counterpart OpenIndiana (Solaris based).
You can build 2 systems (I personally have 3, 1 with SAS in Striped-Mirrors, 1 with Enterprise-SATA in RAIDZ2 and 1 with Desktop-SATA in RAIDZ2) and have ZFS snapshots every minute/hour/day replicated across the network for backups, both Nexenta and FreeNAS have that right in the GUI. The primary system also has a mirrored head node which can take over in less than 10s. As far as sharing out the data: AFP/SMB/NFS/iSCSI/WebDAV etc. whatever you need to build up on it.
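The snapshot-and-replicate loop that FreeNAS and Nexenta expose in their GUIs boils down to `zfs snapshot` followed by an incremental `zfs send -i` piped into `zfs receive` on the backup box. A minimal Python sketch of one cycle; the dataset names and host are hypothetical, and it assumes passwordless SSH to the backup host and that an initial full send has already seeded the target:

```python
import subprocess


def replication_commands(dataset: str, prev_snap: str, new_snap: str,
                         backup_host: str, target: str):
    """Build the three commands for one incremental ZFS replication cycle."""
    snapshot = ["zfs", "snapshot", f"{dataset}@{new_snap}"]
    # -i sends only the blocks that changed between the two snapshots
    send = ["zfs", "send", "-i", f"{dataset}@{prev_snap}", f"{dataset}@{new_snap}"]
    # -F rolls the target back to its latest snapshot before receiving
    receive = ["ssh", backup_host, "zfs", "receive", "-F", target]
    return snapshot, send, receive


def run_replication(snapshot, send, receive) -> None:
    """Take the snapshot, then stream `zfs send` into `ssh ... zfs receive`."""
    subprocess.run(snapshot, check=True)
    sender = subprocess.Popen(send, stdout=subprocess.PIPE)
    receiver = subprocess.Popen(receive, stdin=sender.stdout)
    sender.stdout.close()  # so the sender sees a broken pipe if the receiver dies
    receiver_rc = receiver.wait()
    sender_rc = sender.wait()
    if receiver_rc or sender_rc:
        raise RuntimeError("zfs send/receive failed")


cmds = replication_commands("tank/archive", "hourly-0900", "hourly-1000",
                            "backup01", "backup/archive")
print(" | ".join(" ".join(c) for c in cmds[1:]))
# run_replication(*cmds)  # only on a host with ZFS and SSH access to backup01
```

Run on a schedule (cron or the appliance's own scheduler), this gives the minute/hour/day replicated snapshots described above.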
My system is continuously snapshotted to its primary backup, so that in case of extreme failure (which has not happened in the 7 years since I built this system) I can run from the primary backup until the primary has been restored, with perhaps a few seconds of data loss (I don't know if that's acceptable to you, but in my case it's not a problem if we do have a full meltdown).
Where are those systems limited to 16TB? I wouldn't touch them with a 10-foot pole because they're running behind (within a few years a single hard drive will surpass that limit).
Custom electronics and digital signage for your business: www.evcircuits.com
What are your performance requirements? If you just need a giant dump of semi-offline storage, look into building a Backblaze Storage Pod.
https://www.backblaze.com/blog...
For about $30,000 you could build four storage pods. Speed would not be terrific. Backups are handled through RAID. If you want faster, more redundant or fully serviced your next step up in price is probably a $300,000 NAS solution. Which might serve you better anyway.
Whatever hardware you buy, get Veritas Volume Manager (or whatever its new name is).
Then splitting off a copy for backup / test / snapshot / duplication will be easy and reliable.
And you will have a leg up on disaster recovery offsite.
Plus you won't have an expensive hardware vendor dependency (EMC, Netapp, etc)
You can get those guys to bid against each other every time you need storage.
Your sysadmin will have more time to solve your other problems !
I know I loved this product, it paid for itself every time we did any data migration.
Turned a difficult-to-manage, off-hours project into a "the DBA and I will do it in the background this week without interruption" one.
Use Amazon S3 storage (gives you cloud storage with a directory tree).
Accessible via desktop apps or even web browser if you want.
For stuff they want to archive but will rarely ever use have those S3 folders archive to Glacier.
Nothing to backup and you can store petabytes in glacier cheaper than any other option on the planet. :)
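The "archive to Glacier" step is normally a bucket lifecycle rule rather than something done per folder by hand. A minimal sketch that builds such a rule for a key prefix; the bucket name, prefix, and 90-day threshold are hypothetical, and the commented-out call assumes boto3 is installed and AWS credentials are configured:

```python
def glacier_lifecycle(prefix: str, days: int) -> dict:
    """Lifecycle configuration moving objects under `prefix` to Glacier after `days`."""
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


config = glacier_lifecycle("media-archive/", 90)
print(config)

# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-archive-bucket", LifecycleConfiguration=config)
```

Note that `put_bucket_lifecycle_configuration` replaces the bucket's entire lifecycle configuration, and objects in Glacier must be restored (which takes time) before they can be read again.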
Here's an option.
http://www.nasuni.com/solutions/scale-out-storage/
"Whether the system is 10TB, 100TB or 10PB, it is available through every Filer "
You can do that with a single server with a rack full of drive expansion bays and SAS expanders.
Probably a fraction the cost of a NAS solution as well.
And if you need a backup, just replicate it to another box of exactly the same spec in a different building on the same site, hook them up with 40Gbit of bandwidth, and you can replicate to your heart's content.
Simplez.
If you need to ask HERE about THIS kind of install, you are the wrong guy to be handling this. Seriously.
What's next when you have the system? You come back with questions about tuning and operations? You want credit for the work without doing the work.
Where I work we deal with data sets of a similar order. However, different data sets are stored differently depending on need. For online relational data where performance is critical, it's in master/slave/backup DB clusters running with 4.8TB PCIe SSDs. The backups are taken from a slave node and stored locally, plus they're pushed offsite. No tape, if we need a restore we can't really wait that long.
For data we can afford to access more slowly, we use large HDFS clusters with regular SATA discs. There's a level of redundancy built in there, and where data is important enough to need a real backup (much of it is not) it is also pushed offsite. The HDFS approach has the advantage of presenting as a very large filesystem, and obviously if you're running Hadoop against it there's an automatic advantage.
---- One button is the power button, the other is the Bender voice button: "Bite My Shiny Metal Ass"
While I agree with most commenters that you need to supply many more details before even beginning to narrow the options, if you do look at the storage vendors, DDN (Data Direct Networks) is really hard to beat.
I see the EMC Isilon guys posting here and need to counter. :) They are overpriced and underpowered for almost every application. Their strength is typical enterprise environments - lots of small files accessed via NFS and "enterprise" SLAs. That's almost always the wrong solution for big data applications (NFS is terrible for big data). EMC Isilon sold a lot of storage into my space (gene sequencing) and very few customers are happy, especially when they find out what the other vendors could do.
I've organized bake-offs between DDN, Isilon, and a number of other vendors. DDN always came out ahead on price and performance (every time they were half the price and twice the speed of Isilon). DDN is the most represented of the vendors on the Top 500 Supercomputing list and also powers a certain streaming movie/TV service we all know and love. DDN is also pretty ethical: if they're a bad match for your application, they'll let you know and provide recommendations.
Whatever you do, don't build it yourself. As tempting and fun as it is, given that you're asking the question, you've already self-identified as someone who won't be able to support it. I've seen many smart people go the SuperMicro JBOD route only to create support nightmares for themselves.
Also, for that much space, avoid Amazon at all costs. It's way too expensive compared to dedicated hardware.
For cost, budget around $150-250k to get started. It might seem pricey, but you'll spend more than that on manpower building it yourself (or your first few months on Amazon).
In addition to DDN, IBM, Dell, and HP all have solutions in this range that aren't terribly expensive.
-Chris
Gluster or Ceph, depending on requirements.
Both are Open Source, call Red Hat if you want support.
I keep it all in a separate drive, and only mount it when I want to look at the data. Also, I mount it under .porn, so it isn't visible in a casual listing.
A republic cannot succeed till it contains a certain body of men imbued with the principles of justice and honour.
Or, Copy on Write with tons of copies and cheap storage...
What OS? If you are using Windows, Shadow Copies do this... if using Linux, use LVM with snapshots. Do that on both sides.
Or store it on a SAN with snapshots and replicate.
Put it on /dev/null :-)
Given how few use cases there are like the one you describe, there are probably a lot of important considerations that didn't make it into your question that make your use case unique.
This is one of those cases where you really need to sit down and decide what works best for your situation, NOT what works best for other situations that require this amount of data storage.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
OK, I'll do your job. Use multiple storage servers with DFS. To backup your DFS, buy the same thing somewhere else with 1.5x the capacity, and set up an rsync with a dedup FS. Aren't you glad you asked /. to do your job?
To store files close to a petabyte, you need a petafile, obviously.
We got an EMC Isilon X410 cluster last year where I work. It supports SMB, NFS, HDFS, or OpenStack Swift. I'd recommend storing/retrieving your 100-200MB objects programmatically using Swift. If they need the directory tree, you can present it over SMB or NFS to the humans. We use a slower/larger Isilon NL cluster at our DR site, to which we replicate live data and snapshots from the main one.
Storing the data is the easy part; GlusterFS should do it just fine. The point I am curious about is backups: how do you back up such a volume?
Well, if you want users (multiple) to have access, you need a NAS, not a SAN. I've had pretty good luck with EMC Isilon. It's a NAS, supports easy scale out with a minimum of 3 or 4 nodes and a maximum of something over 100. A single file system can be I think multiple terabytes
Disclaimer: I work for a storage vendor. I'm also a long-time Slashdot reader, though, so this isn't meant as a sales pitch.
Half of a petabyte is not really a lot of data in today's world. I talk to people every day who are trying to find ways to manage many PBs (into the hundreds) and are having challenges doing this with traditional storage. The trend that was started by the big Internet companies is to get rid of the fibre-channel SANs and instead solve the problem of storage using standard x86 servers. They use Linux as an abstraction layer from the hardware, and applications acting as storage systems to pool many servers together.
One of the challenges you need to get over is stretching a namespace that big without filesystem limitations like maximum inode counts. This is generally accomplished using some type of key/value store (object) under the hood. Single flat namespaces with no practical size barrier.
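A flat key space can still present a human-navigable directory tree because "directories" are just shared key prefixes, grouped at listing time. A minimal in-memory sketch of that idea (similar in spirit to S3's prefix/delimiter listing, but not any vendor's API); the example keys are made up:

```python
def list_dir(keys, prefix: str, delimiter: str = "/"):
    """Emulate one directory level over a flat key space.

    Returns (subdirectories, files) directly under `prefix`.
    """
    files, subdirs = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            subdirs.add(prefix + head + delimiter)  # deeper key: report its "folder"
        else:
            files.append(key)                       # key lives at this level
    return sorted(subdirs), files


example_keys = {
    "media/2015/a.mp4", "media/2015/b.mp4", "media/2016/c.mp4",
    "media/readme.txt", "research/x.bin",
}
print(list_dir(example_keys, "media/"))
```

Because nothing is stored per directory, there are no directory or inode limits to hit; the cost is that listing a "folder" means scanning (or indexing) keys by prefix.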
Some options that are available today are Swift from OpenStack and Ceph from Red Hat if you want to go the open source route. These can be good choices if you have the engineering staff on hand to piece it all together and the talent to keep it running. GPFS is also making a come back in this area, and there are a ton of startups looking at this space now.
My company has a commercial solution for this stuff. Pretty cool - it's a Linux app and runs on the server of your choice. I'll save you the sales pitch, and if you want you can try it for free on your own here: http://scality.com/trial
Whatever you choose, best of luck to you!
I am a professional and manage several hundred petabytes globally. From experience I can tell you, they may be asking for half petabyte right now but tomorrow that will double and again next year and so on. Plan big to start with and you'll save your future self a lot of grief! If you PM me I can give you more details but in short I can suggest:
1) Look at a scalable filesystem like GPFS or StorNext. Yes, there is a price tag associated with big iron filesystems (and no, I don't work for any of them), but you get what you pay for, and scalability is everything. As an example, pairing GPFS with TSM and the right hardware, I can create an infinitely scalable filesystem that'll scale to yottabytes.
2) Tier the storage system. Think SSD for the cache (here and now) I/O, Winchester disk for the short term, and tape for the long term. Yes, tape: compute the cost per TB of tapes in the vault versus square footage in the data center.
3) Separate your networks. Keep the client access separated from the disk i/o. Doing this will save massive congestion problems from day one!
There are lots of other things to consider but by today's standards a half petabyte isn't an insurmountable amount of data just like a terabyte was twenty years ago.
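The tiering advice in point 2 is usually implemented as a policy keyed on file metadata such as last-access time. A minimal sketch; the 7-day and 180-day thresholds and the tier names are hypothetical, and real systems (GPFS policies, HSM products) express this in their own rule languages:

```python
from datetime import datetime, timedelta


def pick_tier(last_access: datetime, now: datetime,
              ssd_days: int = 7, disk_days: int = 180) -> str:
    """Place data on SSD, spinning disk, or tape based on how recently it was read."""
    age = now - last_access
    if age <= timedelta(days=ssd_days):
        return "ssd"
    if age <= timedelta(days=disk_days):
        return "disk"
    return "tape"


now = datetime(2015, 6, 1)
for days_ago in (1, 30, 400):
    print(days_ago, pick_tier(now - timedelta(days=days_ago), now))
```

In practice the thresholds come from measuring your own access patterns, not from a rule of thumb.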
It may sound "funny," but I once priced Mega (KimDotCom) for offsite backup & storage. They turned out to be less expensive than Amazon Glacier by a bit AND instantly available. We didn't go with them. Instead, we replicated across data centers with multi-terabyte storage nodes.
I back up my 3 exabytes of porn by printing out the contents of the files.
Store it in the cloud. 1/2 petabyte isn't even the "highest tier" requirement.
On Azure it will cost $168k/year to store this much data instantly accessible. Whatever other solution you come up with, if it takes more than 1 full-time person to support, then it's already more expensive (and that's not even including the up-front capital costs, installation and setup costs, training costs, depreciation, maintenance, ...).
I would look at CohoData, Cleversafe, Qumulo, or Cloudian. They scale well, are easy to manage, and come in at around $0.70-$1.00 per GB.
Sounds like a fairly simple case for a Hadoop cluster - a smallish one at that. We're currently deploying to clusters at 1PB/rack density, which means you could deploy a rack or two easily enough. You'd get compute, you get a single flat filesystem, you get redundancy, all built in. Our biggest cluster is now up to 16PB, all one big compute/storage beast, chugging away all day.
I'd suggest starting with the Hortonworks Sandbox VM: grab it, fire it up, play with it. Add some files, poke around, see if it meets your needs. Learn about MapReduce, or maybe your data can be put into Hive for analysis.
The nice thing is that you can use hardware you may already have to get things going. Hortonworks is pretty much at the point of a 'next, next, finish' installer, so you really only need to dedicate a few hours to getting something up to test. There's a lot of tuning and craziness to running a bigger cluster, but a POC is simple.
Anyhow, I'm blind, because all I do is Hadoop clusters all day, but this seems like an easy win for ya.
GL;HF!
We emerge from our mother's womb an unformatted diskette; our culture formats us. - Douglas Coupland
Not only are you out of your league, but you're barking up the wrong tree.
1) You should hire someone to figure it out for you, either as an on-site consultant, or use something like Amazon.
2) You should use a different site that has more than 5 legitimate comments on a thread.
If costs are not a priority look into using multiple EMC SANs striped in a RAID array. I've installed a few with the largest encompassing 14 physical units for ~100 VMs, they work great.
Get quotes from Netapp, EMC, and Red Hat.
How about MooseFS (http://moosefs.org) for an OSS solution, or if you want appliances off the shelf that won't cost you a limb or three, Exablox (http://exablox.com). Or if you need more than the 700TB that can give you, how about http://www.scality.com/ - which is software defined and you can use your own iron.
-- Sig Sig Sputnik
Amazon storage with a dedicated connection to the Amazon cloud from your data center.
Both are free, hardware agnostic and the future of software defined storage. And Red Hat can provide enterprise support if you need.
Disclaimer: I work at one of the Big 5 storage vendors, but we try to be as upfront and straightforward as possible when dealing with our customers. It is one of the reasons that, time and again, our sales and support teams are cited as strong partners to work with. That's not to say that we're perfect, just that we care an awful lot about our customers.
All of the posters who talk about IOPs, throughputs, availability requirements, required operating models, etc. are right. Basically these folks point out that you must define and adhere to your requirements and do things like compute the total cost of ownership over a 3-5 year time horizon -- basically the time to fully depreciate the equipment from a taxation perspective. In the TCO you'll want to include everything you think that you want to tackle: Ongoing development support or not, sparing or not, supply chain management or not, WAN bandwidth costs (especially important if you're partially in one of the Big 3 Cloud platforms), needs for regulation/legalities (Example if you're in an industry which must report on data breaches especially with customer data, think Payment Card Industry or Health Care, you may want a partner to share liabilities), O&M costs (including employees), and so on. Normalizing to a financial model will give you some indication which approach to take, and I would add if you do the model you should look at both the cost and benefit angle. In this case if the system is more directly related to revenues and acquiring the system allows you to increase your business volumes (e.g. revenues) even if the costs are higher then perhaps the lowest cost solution isn't the right approach.
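A bare-bones version of that financial model, as a sketch only: it is undiscounted, ignores the benefit side, and every dollar figure in it is a placeholder to replace with your own quotes and salaries:

```python
from dataclasses import dataclass


@dataclass
class StorageOption:
    name: str
    capex: float         # up-front hardware/software purchase
    annual_opex: float   # support, power, space, WAN bandwidth, etc.
    annual_staff: float  # loaded cost of the people who run it


def tco(option: StorageOption, years: int) -> float:
    """Undiscounted total cost of ownership over `years`."""
    return option.capex + years * (option.annual_opex + option.annual_staff)


# Every number below is a placeholder; substitute your own quotes.
options = [
    StorageOption("build-it-yourself", capex=200_000, annual_opex=20_000, annual_staff=100_000),
    StorageOption("vendor appliance", capex=600_000, annual_opex=60_000, annual_staff=30_000),
    StorageOption("public cloud", capex=0, annual_opex=168_000, annual_staff=20_000),
]
for years in (3, 5):
    for opt in sorted(options, key=lambda o: tco(o, years)):
        print(f"{years}y  {opt.name:<18} ${tco(opt, years):>12,.0f}")
```

Comparing at both 3 and 5 years matters because options with high up-front cost and low running cost tend to look better the longer the horizon.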
While the above explanation doesn't really cover what we do as one of the Big 5, I will tell you that we have a Chief Economist and spend time with our customers to do the kinds of modeling I mention above. So the short answer is really: don't purchase IT fashion; do your homework and come up with a solution that provides the best financial benefits to your company, even if it is a multi-million-dollar storage infrastructure OR only something stood up on one of the Big 3 clouds.
As to the point of backup & DR, when you begin looking at the total costs, including WAN pipes, make sure you're also adding in restoration simulation and thinking about how to have your users participate in some (or all) of the human-generated data recovery practice. Barring legal retention requirements (some of which can be challenging, like those in the healthcare industry, where retention is at least for the life of the patient), the defining criterion is data restoration, regardless of whether your copy is onsite, offsite, mixed up with disaster recovery, etc. Data that is backed up and cannot be restored is, well, worthless. Even when thinking about fundamental data protection, there are areas to be concerned about, like multiple-drive-loss scenarios in a protection set, media reliability, and so on. Here's an interesting point: these days there's lots of technology in the area of fundamental data protection, like predictive sparing, RAID, erasure coding (yes, I know RAID is a form of EC, but...), tape, and BluRay (thank you, Facebook). It is this last technology that I want to talk about, because in my opinion it changes disaster protection.
What if your media was certified for say 50 or 100 years, could survive water events, and was impervious to EMPs? Well this is BluRay and the advantage which BluRay has over tape: the media format started from the CD and carries over to today. I'll be the first to admit that there's still work to do in the industry, but BluRay shows promise and would have a significant impact on a disaster recovery process because there are new assumptions that could be made.
So my point is do your homework, not all of the Big 5 are evil, and at least some of the Big 5 are savvy enough to know that sometimes we're not the best answer!
To save boatloads of money you can build the storage yourself - for us it has been working very well for many years
http://www.juhonkoti.net/2012/01/02/building-a-85tb-cheap-storage-server-with-solaris-openindiana
I did some work with Ceph and it was a very interesting experience. Instead of a single server or machine hosting data, I had several. I could kill a server permanently and ungracefully, and the others would have that data and replicate it to a new server. Add OpenStack to the equation and you now have 100% disposable, virtually configurable nodes that can be used however you want. All a new node does is PXE boot into the stack; you then tell the stack what that node is, who it belongs to, and what its role is. This could help you meet both groups' needs and allow you to start small and scale with each group, as opposed to running a bunch of storage iron empty while they ramp up. You then buy cheap storage nodes that run inexpensive disks and add some SSD for journaling/metadata (they suggest it, and no joke, it helps).
Best part: you can grab 5 uber-cheap servers from wherever, load them up, and test it as a proof of concept with nothing more than an investment of time.
But how are YOU gonna access the backup ?
Particularly from sunny Guantanamo ?
Lustre, IB FDR, some type of 12Gb/s SAS JBOD. You don't back up in these types of use cases, hence you need a highly scalable filesystem like Lustre. For the most part, you only need to back up critical working data sets and head nodes.
Hi ... been through this. At this scale, small considerations you used to ignore really do matter. If you talk to a pro, you'll need to find out:
1) How hot does the backup need to be? Is instant failover required?
2) How many threads will be reading/writing at the same time? That is, can you create just bulk storage, or does it need to have parallel access? The difference can be a factor of 5X.
3) Assuming you need some level of storage redundancy, are you talking RAID5, RAID6, RAID10, etc., or can you deal with redundancy on a file basis (e.g., a Gluster file system)?
4) Are there different recovery scenarios? Losing a file may be solved by shadowing at the file-system level, and if that's what you really need, maybe you don't need full binary backups.
5) Is there a data set that can be used as a kernel to regenerate the rest of the data? Is there a trade-off between backup size/complexity and the processor cycles used to regenerate from a data kernel?
Moral: large storage is not an upscaled version of small storage.
I worked on a project to do just this for research data. We decided to purchase two storage arrays and back up by taking snapshots and replicating the data to the other site. A NetApp DS4486 disk shelf can hold 48x4TB drives (192TB raw) in 4RU. Also, NetApps only replicate unique data (i.e., if your 500TB dedupes down to 250TB, you only need 250TB at the backup site). Just set up snapshot and replication policies and away you go. That's how I'd do it. The best thing is that you can check all of the data from the backup site, as it's all online. They also sell a virtual NetApp appliance for Amazon, so if you don't have another good site to replicate to, you can do that.
'nuff said
Bingo Dictionary - Pragmatist, n. A myopic idealist.
We store and back up about this much data (a little more), although spread across a variety of machines. All in all, though, the data is primarily virtual hard drives (we run a private cloud environment).
Storing it on disk is easy enough, and cheap enough, that it's of little concern. Amazon, Azure, etc. are *insanely* expensive for this task, month by month, compared to self-owned disks.
As our hypervisors are all Microsoft (Hyper-V; and yes, I know this is Slashdot and I just said I use a Microsoft product, but it's easily the most economical approach when 99% of your clients need Windows licensing), we use the native tiered storage pools in Windows Server 2012 R2 on a mix of SATA HDDs and SSDs, generally spread across a group of Supermicro servers with large numbers of disk bays. Effectively, it's software-defined storage.
For backup, we use high-density 1RU Supermicro servers with 12 bays, filled with commodity 6 or 8TB SATA disks. Each RU gets close to 100TB of raw storage, they don't draw much power, and they cost hardly anything. Backups are performed with Microsoft DPM 2012 R2 as well because, again, it's the cheapest option, and so far we've had zero problems.
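A rough sizing sketch for a backup tier built from those 12-bay 1RU boxes. The 12 bays and the 6/8TB disks come from the comment above; everything else is a hypothetical model: each node runs one RAID6-style group across its bays, the backup store holds one full copy plus 30 days of 1%-per-day changes, and nothing is compressed or deduplicated (DPM's actual storage behavior is not modeled).

```python
import math

BAYS_PER_1U = 12

def backup_nodes_needed(protected_tb, disk_tb, daily_change=0.01,
                        retention_days=30, parity_disks=2):
    """1RU backup nodes needed for one full copy plus retained daily changes.

    Hypothetical model: each node runs a single RAID6-style group across its
    12 bays, and nothing is compressed or deduplicated.
    """
    usable_per_node = (BAYS_PER_1U - parity_disks) * disk_tb
    required_tb = protected_tb * (1 + daily_change * retention_days)
    return math.ceil(required_tb / usable_per_node)

for disk_tb in (6, 8):
    print(f"{disk_tb}TB disks: {BAYS_PER_1U * disk_tb}TB raw per RU, "
          f"{backup_nodes_needed(500, disk_tb)} nodes for 500TB")
```

With 8TB disks that is 96TB raw per RU (the "near to 100TB" above), and under these assumptions 9 rack units cover 500TB plus a month of changes.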
The biggest issue I have is air-gapped backups: those are hard to manage on a low budget with this kind of setup. So I've resorted to having a few more backup machines and manually moving the network cable from one group to the next, the equivalent of swapping tapes.
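The cable-swapping rotation works like a tape rotation, and a tiny helper can tell whoever does the swap which group should be connected this week. This is only a sketch: the group names, the weekly period and the epoch date are hypothetical, and the actual disconnecting is still manual. Counting whole periods from a fixed date, rather than using ISO week numbers, keeps the rotation even across year boundaries.

```python
from datetime import date, timedelta

def online_group(groups, today, epoch=date(2024, 1, 1), period_days=7):
    """Return which backup group should be cabled in for the current period.

    Counting whole periods from a fixed epoch (instead of using ISO week
    numbers) keeps the rotation even across year boundaries.
    """
    if not groups:
        raise ValueError("need at least one backup group")
    periods = (today - epoch).days // period_days
    return groups[periods % len(groups)]

groups = ["backup-A", "backup-B", "backup-C"]
start = date(2024, 1, 1)
for week in range(4):
    day = start + timedelta(weeks=week)
    print(day.isoformat(), "->", online_group(groups, day))
```

Every group except the one returned stays unplugged, so at any time all but one copy is offline.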
Hitachi Data Systems' virtual storage platform (an 'appliance' front-end to virtualize FC SAN arrays) did this c. 2003, so I suppose the technology is still available now.
I've used Sanify for the last 4.5 years. Rock solid. Commodity hardware, interface via iSCSI, automatic failover and migration while hot, whatever interconnect you want (Ethernet, FC, IB), and software control over replica count and controller count. Volumes over 16TB might need a little more room for metadata, though. Sales email.
It will keep things redundant and safe from corruption. An incremental backup of the volumes can provide a coarse-grained offsite backup.
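The idea behind a block-level incremental backup of a volume can be sketched in a few lines: record a hash of each fixed-size block at backup time, ship only the blocks that changed since then, and rebuild the newer image from the older one plus those blocks. This is an illustration only, assuming the volume can be read as a byte image; the block size and function names are hypothetical, and snapshot-capable filesystems track changed blocks themselves instead of rehashing everything.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size; real systems choose their own

def block_hashes(image: bytes, block_size: int = BLOCK_SIZE) -> list:
    """SHA-256 digest of each fixed-size block, recorded at backup time."""
    return [hashlib.sha256(image[i:i + block_size]).digest()
            for i in range(0, len(image), block_size)]

def changed_blocks(previous: list, image: bytes, block_size: int = BLOCK_SIZE):
    """Yield (index, data) for blocks that are new or differ from the last backup."""
    for idx, start in enumerate(range(0, len(image), block_size)):
        block = image[start:start + block_size]
        if idx >= len(previous) or hashlib.sha256(block).digest() != previous[idx]:
            yield idx, block

def apply_incremental(base: bytes, delta, new_length: int,
                      block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the newer image from the older one plus the changed blocks."""
    out = bytearray(base[:new_length].ljust(new_length, b"\0"))
    for idx, block in delta:
        start = idx * block_size
        out[start:start + len(block)] = block
    return bytes(out)

old = b"aaaabbbbcccc"
new = b"aaaaXbbbccccdd"
delta = list(changed_blocks(block_hashes(old, 4), new, 4))
print([idx for idx, _ in delta])  # -> [1, 3]
assert apply_incremental(old, delta, len(new), 4) == new
```

Only the modified block and the newly appended one are sent offsite, which is what makes a coarse-grained offsite copy of a very large volume practical.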
Works up to exabyte sizes.