Ask Slashdot: How Do You Store a Half-Petabyte of Data? (And Back It Up?)
An anonymous reader writes: My workplace has recently had two internal groups step forward with a request for almost a half-petabyte of disk to store data. The first is a research project that will computationally analyze a quarter petabyte of data in 100-200MB blobs. The second is looking to archive an ever-increasing amount of mixed media. Buying a SAN large enough for these tasks is easy, but how do you present it back to the clients? And how do you back it up? Both projects have expressed a preference for a single human-navigable directory tree. The solution should involve clustered servers providing the connectivity between storage and client so that there is no system downtime. Many SAN solutions have a maximum volume limit of only 16TB, which means some sort of volume concatenation or spanning would be required, but is that recommended? Is anyone out there managing gigantic storage needs like this? How did you do it? What worked, what failed, and what would you do differently?
Do you mean:
(a) "Don't store it. Employ Amazon (or some other cloud) storage."? or
(b) "Do not use Amazon."
Clarity: it's like that one thing that is not the other thing, except for when it is.
Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
At Facebook, it's memcached, with an HDD backup, eventually put onto tape...
At Google, it's a ramdisk, backed up to SSD/HDD, eventually put onto tape...
For anyone who can't afford half a petabyte of RAM with the commensurate number of computers? I have no good ideas... except maybe RAM cache of SSD, cache of HDD, backed up on tape...
Using something like HDFS to store your data in a Hadoop cluster that serves the file requests is likely the best F/OSS solution you're going to get for that...
WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
Honestly, that's the WORST thing to do. When you talk to the pros, they will try to sell you some outrageously overpriced Fibre Channel system that's total overkill for what you are doing. I've worked with 'big data' storage companies like EMC and NetApp. We needed 300TB of 'nearline' storage, and EMC came up with a $3,000,000.00 TOTAL overkill Fibre Channel solution, and NetApp wasn't much better, coming in at close to $2,000,000.00. Total ripoff. The ONLY reason you would ever choose Fibre Channel over iSCSI is if you are running a HUGE transactional database, with millions of accesses per minute. If you just need STORAGE: I went with Synology, and got 300TB of RAID-10 storage for about 100K. I DUPLICATED it (200K total), and still only paid 10% of what the 'vendors' tried to sell me. I was VERY clear that I did not need Fibre Channel, I refused to spend tons of money on something that would have zero bearing on the performance, and I found it's much better to research and provide your own solution at 10% of the cost of the big vendors. Why do you think EMC has almost $3 billion in revenue? Because they convince pointy-haired bosses that their solution is the best. Trust me, going with a 2nd-tier vendor for 'nearline' storage is a much better idea than asking the 'big 5' for a solution.
You're asking like you will be implementing it... don't.
Gather all their requirements, gather your requirements on top of it (I'm pretty confident that some of those requirements were your additions for "you'd be an idiot to have that, but not also have this...", possibly including the backup).
Then put out a preliminary RFP to the major storage vendors, including asking them what they'd say you missed in the preliminary.
Then take the recommendations they make on top of the preliminary with a grain of salt, since most of them will be intended to ensure vendor lock-in to their solution set, revise the preliminary, and put out a final RFP.
Then accept the bid that you like which management is willing to approve.
Problem solved.
P.S.: You don't have to grow everything yourself from seed you genetically modify yourself, you know...
That's the easiest question I've ever seen.
1. Wait about a decade or so.
2. Buy two half-petabyte flash drives.
3. Alternate your copies on the two flash drives, the previous one becomes your backup.
NEXT!
You're not asking the right questions:
The first correct question is why on earth would someone need to access half a petabyte? In most cases the commonly accessed data is less than 1%. That's the amount of data that realistically needs to reside on disk. It never is more than 10% on such a large dataset. Everything else would be better placed on tape. Tiered storage is the answer to the first question. You have RAM, solid/flash storage (PCI based), fast disks, slow high capacity disks and tape. Choose your tiering wisely.
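To put rough numbers on the tiering argument, here is a minimal sketch that applies the 1% and 10% figures from the comment above to a 500TB dataset. The function name and the choice of 500TB as the total are assumptions made for illustration; real hot-data fractions have to be measured on your own workload.

```python
# Rough tier sizing: how much of the dataset has to live on disk if only a
# small fraction is accessed regularly. The 1% and 10% fractions come from
# the comment above; everything else here is illustrative.

def hot_tier_tb(total_tb: float, hot_fraction: float) -> float:
    """Capacity (in TB) needed for the frequently accessed part of the data."""
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    return total_tb * hot_fraction

TOTAL_TB = 500.0  # assumed total: roughly half a petabyte

typical_disk_tb = hot_tier_tb(TOTAL_TB, 0.01)   # "commonly accessed" case
ceiling_disk_tb = hot_tier_tb(TOTAL_TB, 0.10)   # stated upper bound
tape_tb = TOTAL_TB - ceiling_disk_tb            # remainder that could go to tape

print(f"disk tier, typical case: {typical_disk_tb:.0f} TB")
print(f"disk tier, upper bound:  {ceiling_disk_tb:.0f} TB")
print(f"tape tier, at minimum:   {tape_tb:.0f} TB")
```

Even in the pessimistic case the disk tier is an order of magnitude smaller than the full dataset, which is the whole point of tiering.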
The second question you need to ask is how the customer needs to access that large datastore. In most cases you need serious metadata in parallel with that data. For petabytes of data you cannot in most cases just use an intelligent tree structure. You need a web site or an app to search that data and get the required "blob". For such an app you need a large database, since half a petabyte at 100-200MB per blob works out to roughly 2.5M to 5M objects with searchable metadata.
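The object count is easy to sanity-check. The sketch below assumes decimal units (1TB = 10^12 bytes) and a half-petabyte total; the per-object metadata size is a made-up figure, included only to show how the database footprint would be estimated.

```python
# Back-of-the-envelope object count for a blob store.
# Decimal units are assumed: 1 MB = 10**6 bytes, 1 TB = 10**12 bytes.

MB = 10**6
TB = 10**12

def object_count(total_bytes: int, blob_bytes: int) -> int:
    """Number of whole blobs of size blob_bytes that fit in total_bytes."""
    return total_bytes // blob_bytes

total = 500 * TB  # assumed total: half a petabyte

small_blobs = object_count(total, 100 * MB)  # 100MB blobs
large_blobs = object_count(total, 200 * MB)  # 200MB blobs

# Hypothetical: 1 KB of searchable metadata per object.
METADATA_BYTES_PER_OBJECT = 1000
metadata_gb = small_blobs * METADATA_BYTES_PER_OBJECT / 10**9

print(f"100MB blobs: {small_blobs:,} objects")
print(f"200MB blobs: {large_blobs:,} objects")
print(f"metadata at 1 KB/object: about {metadata_gb:.0f} GB")
```

With these assumptions, 5M objects corresponds to 100MB blobs and 2.5M to 200MB blobs. Either way the metadata database is small next to the data itself, but far too large to browse as a directory tree.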
The third question is why do you have SAN as a premise? Do you want to put a clustered filesystem with 5-10 nodes? Probably Isilon or Oracle ZS3-2/ZS4-4 are your answer.
Fourth question: what are the requirements? (How many simultaneous clients? IOPS? Bandwidth? ACL support? Auditing? AD integration? Performance tuning?)
Fifth question: There is no such thing as 100% availability. The term disaster in Disaster Recovery is correctly placed. Set reasonable SLA expectations. If you go for five-nines availability it will triple the cost of the project. Keep in mind that synchronous replication is distance limited. Typically the performance cost is small within a radius of about 150 miles, and anything beyond that hurts a lot.
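Two quick calculations make those SLA numbers concrete: how much downtime per year a given number of nines allows, and the minimum round-trip delay that distance adds to every synchronously replicated write. This is a sketch under stated assumptions: a 365-day year, and light travelling through fiber at roughly 200,000 km/s (about two thirds of its speed in vacuum). Switch, array, and protocol overhead are ignored, so real latencies will be higher.

```python
# Availability budget and distance-induced latency for synchronous replication.

MINUTES_PER_YEAR = 365 * 24 * 60      # 525,600, ignoring leap years
KM_PER_MILE = 1.609344
FIBER_KM_PER_S = 200_000.0            # assumption: ~2/3 the speed of light

def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime per year for an availability of N nines (5 -> 99.999%)."""
    return (10.0 ** -nines) * MINUTES_PER_YEAR

def round_trip_ms(distance_miles: float) -> float:
    """Minimum round-trip propagation delay over fiber, in milliseconds."""
    distance_km = distance_miles * KM_PER_MILE
    return 2.0 * distance_km / FIBER_KM_PER_S * 1000.0

for n in (3, 4, 5):
    print(f"{n} nines: {downtime_minutes_per_year(n):.2f} minutes/year")

print(f"150 miles: at least {round_trip_ms(150):.2f} ms added per write")
```

Five nines leaves only a little over five minutes of downtime a year, and every synchronous write has to wait out at least one full round trip, which is why distance matters so much.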
Even if you solve the problems above, if you want to share it via NFS/CIFS or something else you're going to run into trouble. Since CIFS was not realistically designed for clustered operation, regardless of the distributed FS underneath the CIFS server, you get locking issues. Windows Explorer is a good example: it creates Thumbs.db files and leaves them open, and when you want to delete the folder you cannot, unless you magically ask the same node that was serving you when it created the Thumbs.db file. Apparently, the POSIX lock is transferred to the other server and stops you from deleting, but when Windows Explorer asks the other node who has the lock on the file you get screwed, since the other server doesn't know. POSIX locks are different from Windows locks. It affects all Likewise-based products from EMC (VNX filer, Isilon, etc.) and it also affects the CIFS product from NetApp. I'm not sure about Samba CTDB though.
I would design a storage system based on ZFS for the main tiers, exported via NFSv4 to the front-end nodes, and have QFS on top of the whole thing in order to push rarely accessed data to tape. The front-end nodes would be accessed via WebDAV by a portal in which you can also query the metadata, with a serious DB behind it.
I've installed Isilon storage for 6000 XenDesktop clients that all log on at 9AM, I've worked on an SL8500, Exadata, and various NetApp and Sun storage systems, and I can tell you that you need to do a study. Run simulations with commodity hardware on smaller datasets to figure out the performance requirements and the optimal access method (NAS, Web, etc.). Extrapolate the numbers, double them, and ask for POCs and demos from vendors, be it IBM, EMC, Oracle, NetApp or HP. Make sure that in the future, when you need 2PB, you can expand in an affordable manner. Take care, since vendors like IBM tend to use the least upgradable solution. They will do a demo with something that can hold 0.6PB in its max configuration, and if you need to go larger you'll need a brand new solution from another vendor.
It's not worth doing it yourself, since it will be time-consuming (at least 500 man-hours until production) and will need at least 1 full-time employee for the storage. But if you must, look at Nexenta and the hardware that they recommend.
And remember to test DR failover scenarios.
Good luck!
UNIX was not designed to stop you from doing stupid things, because that would also stop you from doing clever ones.
Nope. Not 'owned'. It's covered under the CDDL and developed by a group that isn't associated with Sun. Open-ZFS.
500TB is nothing these days. You can easily buy any system and it will support it. Look at FreeBSD/FreeNAS with ZFS (or their commercial counterpart by iXsystems). If you want an extremely comfortable, commercial setup, go with Nexenta or, with a bit of elbow grease, use the open/free counterpart OpenIndiana (Solaris based).
You can build 2 systems (I personally have 3: 1 with SAS in striped mirrors, 1 with enterprise SATA in RAIDZ2 and 1 with desktop SATA in RAIDZ2) and have ZFS snapshots taken every minute/hour/day and replicated across the network for backups; both Nexenta and FreeNAS have that right in the GUI. The primary system also has a mirrored head node which can take over in less than 10s. As far as sharing out the data: AFP/SMB/NFS/iSCSI/WebDAV etc., whatever you need to build on top of it.
My system is continuously snapshotted to its primary backup, so that in case of extreme failure (which has not happened in the 7 years since I built this system) I can run from the primary backup until the primary has been restored, with perhaps a few seconds of data loss (I don't know if that's acceptable to you, but in my case it's not a problem if we do have a full meltdown).
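For anyone doing this without the GUI, snapshot-based replication boils down to three commands per cycle: take a snapshot, send the difference since the previous snapshot, and receive it on the backup box. The sketch below only builds those command lines and prints them; it does not run anything. The dataset names, host name, and snapshot naming scheme are all hypothetical, and this is not how Nexenta or FreeNAS implement it internally.

```python
# Build (but do not execute) the commands for one incremental ZFS
# replication cycle. All names below are hypothetical placeholders.
from datetime import datetime, timezone

def snapshot_name(dataset: str, when: datetime) -> str:
    """Timestamped snapshot name, e.g. tank/data@auto-20150102T030405Z."""
    return f"{dataset}@auto-{when.strftime('%Y%m%dT%H%M%SZ')}"

def replication_commands(prev_snap: str, new_snap: str,
                         backup_host: str, backup_dataset: str):
    """Return the three argument lists for one replication cycle."""
    take = ["zfs", "snapshot", new_snap]
    # -i sends only the blocks that changed between the two snapshots.
    send = ["zfs", "send", "-i", prev_snap, new_snap]
    # -F rolls the target back to its latest snapshot before receiving.
    recv = ["ssh", backup_host, "zfs", "receive", "-F", backup_dataset]
    return take, send, recv

prev = snapshot_name("tank/data", datetime(2015, 1, 2, 3, 4, 0, tzinfo=timezone.utc))
new = snapshot_name("tank/data", datetime(2015, 1, 2, 3, 5, 0, tzinfo=timezone.utc))
take, send, recv = replication_commands(prev, new, "backup-host", "backup/data")

print(" ".join(take))
print(" ".join(send) + " | " + " ".join(recv))
```

Because each cycle only ships the changed blocks, running it every minute keeps the backup within about a minute of the primary, which matches the small data-loss window described above.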
Where are those systems limited to 16TB? I wouldn't touch them with a 10-foot pole because they're running behind (within a few years a single hard drive will surpass that limit).