Ask Slashdot: How Do You Store a Half-Petabyte of Data? (And Back It Up?)

Don't by Anonymous Coward · 2015-07-25 07:17 · Score: 0

use amazon

Re:Don't by Anonymous Coward · 2015-07-25 07:38 · Score: 0

Instead ask slashdot to give information for free which could net a consultant a hefty fee since it's seemingly out of the capabilities of the asker.
Re:Don't by jellomizer · 2015-07-25 08:06 · Score: 1

But he used vague requirements so not to give enough information for an actual informed decision.
But in general it sounds like it is going to be expensive and a lot of work, with far more details to work out than just storing and backing up the data.
Then there is the question of how you present it back to the clients. That is a different can of worms.
The real question should be:
Which consulting company should I work with on a big data project?
Have you worked with one that was able to give you a clear goal and timelines, and meet the specified budget?

--
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
Re: Don't by Anonymous Coward · 2015-07-25 09:43 · Score: 0

Glusterfs
Re:Don't by CaptQuark · 2015-07-25 16:53 · Score: 1

Another example of posters trying to be cute and split their reply between the Subject and Comment blocks. It causes confusion when the comments don't stand alone and then you realize the subject line needs to preface the comment.

Just "Don't" do it.

--
Re:Don't by davester666 · 2015-07-25 19:41 · Score: 1

Budget? I suppose we do a round of layoffs...

--
Sleep your way to a whiter smile...date a dentist!
Re: Don't by Anonymous Coward · 2015-07-25 21:30 · Score: 3, Insightful

I think that the intention was to stimulate a discussion amongst a community of geeks who have a genuine interest in this type of technology and enjoy discussing solutions that they have built. Sure, you could just outsource the service and pay consultants to do it for you but I don't think that is the general ethos of the traditional Slashdot reader. Also, if you feel that you should be paid for commenting here then this is probably not the forum for you. Twat.
Re:Don't by Hognoxious · 2015-07-25 23:56 · Score: 1

But he used vague requirements so not to give enough information for an actual informed decision.

Which is the perfect situation to employ a consultant. Outcome 1: he'll ask the right questions, get accurate answers because management know the requirements, and it'll be a success. Outcome 2..N: it'll be a disaster but it won't be your fault.

--
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
Re: Don't by Arnold+Reinhold · 2015-07-26 00:47 · Score: 1

Would I make a major enterprise purchase based on a Slashdot discussion? Absolutely not. Would I want to read a Slashdot discussion and maybe follow suggested links and look up all the buzzwords BEFORE talking to vendors or consultants? Absolutely.
Re:Don't by donaldm · 2015-07-26 01:43 · Score: 1
But he used vague requirements so not to give enough information for an actual informed decision.
Which is the perfect situation to employ a consultant. Outcome 1: he'll ask the right questions, get accurate answers because management know the requirements, and it'll be a success. Outcome 2..N: it'll be a disaster but it won't be your fault.
Excellent answer and the bit about covering your rear is priceless.
I have consulted on issues like this and there are multiple solutions, some relatively simple and others complex, but a backup solution for half a petabyte of data is not going to come cheap, so obviously any professional consultant will want to cover themselves as well.
If a project has not been raised with all input documented, milestones set and sign-off for all steps, no professional consultant would want to touch this. Sure, you can jury-rig a solution and it may work, but if anything goes wrong then whoever is perceived to be guiding this is effectively going to be looking for a new job.
Here are some very basic questions a consultant is going to ask, and don't think these can be answered in a simple sentence:
1. Do you have a disaster recovery plan?
2. What amount and type of data do you really want to back up?
3. Do you want daily, weekly, monthly, yearly or other types of backups? What type of backups do you want them to be?
4. What do you want your backup window to be?
5. What do you want your recovery window to be?
The above is just the start; there are going to be many, many more questions that require detailed answers before any recommendation is reached with regard to equipment, installation, maintenance, and backup, storage and recovery strategies. This takes time, and everyone wants to cover their rear, so sign-off on important steps (i.e. milestones) is essential.
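Questions 4 and 5 translate directly into bandwidth. A rough sketch of that arithmetic in Python, assuming a full backup of the whole half-petabyte, decimal units (1 TB = 10^12 bytes), no protocol overhead, and window lengths that are purely hypothetical (none of them come from the original question):

```python
import math

TB = 10**12  # decimal terabyte, in bytes


def required_throughput_gbit(data_tb: float, window_hours: float) -> float:
    """Sustained rate in gigabits per second needed to move data_tb within the window."""
    bits = data_tb * TB * 8
    seconds = window_hours * 3600
    return bits / seconds / 1e9


def links_needed(rate_gbit: float, link_gbit: float = 10.0) -> int:
    """Number of links of link_gbit each, ignoring overhead (an assumption)."""
    return math.ceil(rate_gbit / link_gbit)


for hours in (8, 24, 72):  # hypothetical backup windows
    rate = required_throughput_gbit(500, hours)
    print(f"{hours:>3} h window: {rate:6.1f} Gbit/s, "
          f"about {links_needed(rate)} x 10GbE links")
```

The same function answers the recovery-window question if you plug in the restore deadline instead; incremental backups shrink the data term, which is why question 3 matters so much.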
--
There ain't no such thing as proprietary standards only proprietary formats. Standards are by definition open.
Re:Don't by Anonymous Coward · 2015-07-26 04:15 · Score: 0

Nah, I just use 200 spanned 5TB drives and another 200 spanned 5TB drives for backup.
Re:Don't by KGIII · 2015-07-26 05:42 · Score: 1

I did some work to help ease the traffic flow around Atlanta, GA. (There is a giant highway that runs around it in a circle; access was fairly easy but egress was not as good as it should have been. The idea was brilliant when they designed it. Importantly, population growth was around the outside of the circle, there was congestion at peak hours, and the load was not where it was anticipated and designed for.)
Anyhow, after bidding and getting the contract (a consulting contract - we would recommend design changes, for example, but not specify how the changes were made only what needed to be changed and where and traffic engineers would take care of the rest - traffic engineering was not a part of this contract and we did not bid on that project due to the mess that it was, it has only been marginally improved but it is great in off-peak hours except it is not really needed in off-peak hours) we learned something. They had effectively bid out to hire a consultant to see if they had needed to hire a consultant. Our internal name was, "The Georgia Recursive Loop." The City of Atlanta has its own traffic engineers, not as many as needed really, so we were unable to recommend consulting a consultant to keep the chain going.
That was one of the projects (surprisingly few) that made me feel a little bad for the tax payers. They were not the only ones that hired a consultant to consult on hiring consultants. Sometimes they hire a lawyer, a specialist who is not on the city budget, to determine if they should hire a consultant to determine if they should employ the services of a consultant. (I am looking at YOU District of Columbia. I am looking at you...) Buffalo, NY hired a lawyer who recommended a specialist lawyer to vet our proposals. The original lawyer remained on the books and handled communication between the specialist lawyer (who had ended up being our main contact) and the city council. The council, of course, reported to the manager of the local transportation department. It was a lot like the "Chinese Telephone" game we played as kids where you say one thing in one person's ear and they repeat it and so on and so on until it is munged silliness at the end.
It is quite lucrative, really. If you are not insane when you start then you will be by the time you get familiar with all of the silliness. Sorry for the novella but there simply is no easy way to share the experiences. Hopefully it is reasonably clear. My only justification, for being a part of the system, is that it paid well, provided great jobs, and the tech/educational aspects of it were originally mind blowing and fun.

--
"So long and thanks for all the fish."

Just put "bomb" and "assassinate" in every line. by Anonymous Coward · 2015-07-25 07:21 · Score: 1

It's all going to get backed up.

latge vendors such as netapp use freebsd by Anonymous Coward · 2015-07-25 07:21 · Score: 0

subject says it all.. large storage arrays typically run freebsd kernels or some variant..

Re:latge vendors such as netapp use freebsd by Anonymous Coward · 2015-07-25 12:12 · Score: 0

"subject says it all..." NOT!
WTF is "latge"???
Re:latge vendors such as netapp use freebsd by Anonymous Coward · 2015-07-25 15:27 · Score: 0

NetApp runs a heavily modified BSD called Data ONTAP. It's equivalent to saying MacOS X is the same thing as FreeBSD.

ceph by drew8523 · 2015-07-25 07:23 · Score: 3, Informative

we use Ceph, it's fast, redundant, and crazy scalable, oh did i mention free (paid support)? ceph.com

Re:ceph by Anonymous Coward · 2015-07-25 07:45 · Score: 0

+1 for Ceph!
Re:ceph by u-235-sentinel · 2015-07-25 08:08 · Score: 2

we use Ceph, it's fast, redundant, and crazy scalable, oh did i mention free (paid support)? ceph.com
Personally I've been using Ceph for the last few years myself. It has to be one of the best DFSs I've ever used. It offers security and speed, and is easy to expand by adding additional nodes. The free part was great. I found it looking through the repos one day. You can even tie it into other projects such as Hadoop (at least I recall reading it had a plugin a couple of years ago).
Great product!

--
Has Comcast disconnected your Internet account? Same here. You can read about it at http://comcastissue.blogspot.com
Re:ceph by Anonymous Coward · 2015-07-25 08:33 · Score: 0

The front page in their web site is a bit of a turnoff, consisting mostly of marketroid drivel and very little of substance.
Re:ceph by Anonymous Coward · 2015-07-25 21:35 · Score: 0

Thing that's always bothered me about the filesystem part of Ceph in the stable release: "Important - Ceph FS is currently not recommended for production data." (http://docs.ceph.com/docs/v0.80.5/cephfs/)
If there was a problem and questions were asked I wouldn't like to have to say why I ignored that bit.

Ambiguous by smittyoneeach · 2015-07-25 07:24 · Score: 4, Insightful

Do you mean:
(a) "Don't store it. Employ Amazon (or some other cloud) storage."? or
(b) "Do not use Amazon."
Clarity: it's like that one thing that is not the other thing, except for when it is.

--
Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear

Re:Ambiguous by Anonymous Coward · 2015-07-25 08:26 · Score: 0

"How Do You Store a Half-Petabyte of Data? (And Back It Up?)"
don't. use amazon.
Re:Ambiguous by FatdogHaiku · 2015-07-25 16:00 · Score: 1

Clarity: it's like that one thing that is not the other thing, except for when it is.
Good Lord! You've hit on the exact motto needed for my new startup!
Random Eyeglasses Hut

This is going to be so much better than what we had:
"Somebody's prescription, in about an hour..."

--
You have the right to remain sentient. If you give up the right to remain sentient, you will be elected to public office
Re:Ambiguous by arglebargle_xiv · 2015-07-25 16:22 · Score: 0

"How Do You Store a Half-Petabyte of Data? (And Back It Up?)"
don't. use amazon.
I keep half my petabyte on Xhamster, the other half on Xnxx. Problem solved.
Re:Ambiguous by allo · 2015-07-26 05:52 · Score: 1

Don't do the shit.
Go shopping on amazon
FTFY
Re: Ambiguous by cthulhu11 · 2015-07-26 17:50 · Score: 1

Ceph. Or if you can get past the traditional filesystem concept, Swift. Don't cheap out on journals. Trust me on this.

Talk to Vendors by Old+VMS+Junkie · 2015-07-25 07:25 · Score: 3

Honestly, you should talk to the pros. I would call a couple of storage vendors, give them the basic outline of what you want to do, and let them tell you how they would do it. You can even get more formal and issue a Request for Information (RFI) or even a Request for Quote (RFQ). If you're a biggish company, your purchasing people probably have an SOP and standard forms for how to issue an RFI/RFQ. For the big boy storage vendors, half a petabyte is commonplace. The bigger question may very well be what this is going to look like at a software level. Managing the data might be a bigger challenge than storing it. Is this going to be organized in some sort of big data solution like Hadoop? Is it just a whole bunch of files, and people are going to write R or SAS jobs to query against it? Sometimes the tool set that you want to use will drive your choices in how to build the infrastructure under it.

Re:Talk to Vendors by Anonymous Coward · 2015-07-25 07:41 · Score: 5, Informative

Honestly, that's the WORST thing to do. When you talk to the pros, they will try to sell you some outrageously overpriced Fiber Channel system that's total overkill for what you are doing. I've worked with 'big data' storage companies like EMC and Netapp. We needed 300TB of 'nearline' storage, and EMC came up with a $3,000,000.00 TOTAL overkill Fiber Channel solution, and Netapp wasn't much better, coming in at close to $2,000,000.00. Total ripoff. The ONLY reason you would ever choose Fiber Channel over iSCSI is if you are running a HUGE transactional database, with millions of accesses per minute. If you just need STORAGE, do what I did: I went with Synology, and got 300TB of RAID-10 storage for about 100K. I DUPLICATED it (200K total), and still only paid 10% of what the 'vendors' tried to sell me. I was VERY clear that I did not need Fiber Channel, I refused to spend tons of money for something that would have zero bearing on the performance, and I found it's much better to research and provide your own solution at 10% of the cost of the big vendors. Why do you think EMC has almost 3 billion in revenue? Because they convince pointy-haired bosses that their solution is the best. Trust me, going with a 2nd-tier vendor for 'nearline storage' is a much better idea than talking to the 'big 5' to ask for a solution.
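For anyone who wants to check the "10%" claim, here is a minimal sketch of the cost-per-terabyte arithmetic in Python. The dollar figures and the 300TB capacity are the ones quoted in the comment; the function names are made up for illustration, and the RAID-10 helper assumes plain two-way mirrors.

```python
USABLE_TB = 300  # capacity named in the comment

quotes_usd = {
    "EMC quote": 3_000_000,
    "Netapp quote": 2_000_000,
    "Synology, duplicated": 200_000,
}


def cost_per_tb(total_usd: float, usable_tb: float) -> float:
    """Dollars per usable terabyte."""
    return total_usd / usable_tb


def raid10_raw_tb(usable_tb: float) -> float:
    """Raw capacity for RAID-10 built from two-way mirrors: twice the usable space."""
    return 2 * usable_tb


for name, usd in quotes_usd.items():
    share = usd / quotes_usd["EMC quote"]
    print(f"{name:22s} ${cost_per_tb(usd, USABLE_TB):>9,.0f}/TB  "
          f"{share:6.1%} of the EMC quote")
print(f"Raw disk behind {USABLE_TB} TB of RAID-10: "
      f"{raid10_raw_tb(USABLE_TB):.0f} TB per copy")
```

On these figures the duplicated Synology setup is 10% of the Netapp quote and closer to 7% of the EMC one. The comparison says nothing about support, caching or redundancy, which is what the replies below argue about.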
Re:Talk to Vendors by NatasRevol · 2015-07-25 08:22 · Score: 1

LOL at FC only for transactional DBs.
Also, VM environments. Large media servers, etc.
My solution:
Infortrend. It has iSCSI for you and your slow environment, and FC for me and my fast environment. And cheap enough for both.
Also, 300TB of RAID 10 at $100k is most likely 7k rpm. I much prefer 15k as it's performant for VMs even when full of running VMs. 7k drives never will be. Well, maybe if you put a nice fat SSD cache in front of it.
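The 7k versus 15k point comes down to how long a random read waits for the platter. A back-of-the-envelope sketch in Python: the rotational latency formula (half a revolution on average) is standard, but the seek times below are assumed, illustrative values, not measurements of any particular drive.

```python
def avg_rotational_latency_ms(rpm: int) -> float:
    """Average wait for the sector to come around: half a revolution, in ms."""
    return 0.5 * 60_000 / rpm


def rough_random_iops(rpm: int, avg_seek_ms: float) -> float:
    """Very rough random IOPS for one spindle: one I/O per (seek + rotational wait)."""
    return 1000 / (avg_seek_ms + avg_rotational_latency_ms(rpm))


# Assumed average seek times (illustrative only).
drives = {"7.2k nearline": (7200, 8.5), "15k enterprise": (15000, 3.5)}

for name, (rpm, seek_ms) in drives.items():
    print(f"{name:15s} rotational latency {avg_rotational_latency_ms(rpm):.2f} ms, "
          f"roughly {rough_random_iops(rpm, seek_ms):.0f} random IOPS per spindle")
```

Multiply by the spindle count to get a ballpark for an array; an SSD cache changes the picture for any working set that fits in it.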

--
There are two types of people in the world: Those who crave closure
Re:Talk to Vendors by jbolden · 2015-07-25 08:37 · Score: 1

Netapp provides performance storage. If you don't want performance and only want part of their solution they can virtualize the software and run on anyone's hardware. You can be down around $12k / mo for 300TB duplicated 1x with their software. Nowhere near $3m.
Re:Talk to Vendors by Anonymous Coward · 2015-07-25 09:11 · Score: 0

Idiot, he was clearly referring to letting your tool preferences drive infrastructure. The business needs drive the infrastructure, which should dictate tool use to where it seems like there were never any options to begin with. Salespeople tend not to agree with that but scientists and engineers do.
Re:Talk to Vendors by laurencetux · 2015-07-25 09:18 · Score: 1

How is a function of what you can spend on it. If you can drop a couple of megabucks on it then you will get a solution that delivers your data before it's requested, and if a storage module THINKS of going bad it's going to get swapped out.
But yeah, send feelers out to as many vendors as possible (and don't forget The Other Tower does not count as offsite backup).
Re:Talk to Vendors by ArcadeMan · 2015-07-25 09:21 · Score: 1

I put my data inside XML files, split the fields with CSV and store all of it on 4200RPM laptop drives that automatically go to sleep after a few minutes of inactivity.
Oh, and I backup all of that data on punched tape once per year.

--
Get free satoshi (Bitcoin) and Dogecoins
Re:Talk to Vendors by markus_baertschi · 2015-07-25 09:24 · Score: 1

A well written RFI sent to some vendors should give you an overview of what is available and at what cost.
As you need file-level access you should talk to NAS vendors, like Netapp or EMC Isilon. They will certainly have storage boxes for you. You'll have to fit a backup solution to your storage box too; this is work and adds cost.
If you think this may grow, then look at scalability. Not all solutions scale. Also, you may end up with millions of files, which may be problematic for some backup solutions.
I have experience with IBM's Spectrum Scale (GPFS) kit. The cluster filesystem scales nicely and handles lots of files and data with excellent performance. With the recent Elastic Storage components the price per TB is very competitive (5-10x lower than traditional enterprise storage). A half a PB of storage should cost in the $150k range; you'll have to add more for backup and maybe implementation.
For the backup you should look into tape robots. Handling a TB-size backup set manually is not fun and will require considerable manpower, whereas a robot does it mostly unattended. It may also make sense to combine your backup with other backups on site; you may end up being ten times bigger than everything else combined...
Re:Talk to Vendors by Anonymous Coward · 2015-07-25 09:30 · Score: 1

Talking to companies that try to oversell you is the worst thing you can do.
Talk to Oracle and spend 20x more.
Talk to Cisco and spend 5x more.
Derp
Re:Talk to Vendors by Anonymous Coward · 2015-07-25 10:23 · Score: 0

Fully agree. EMC especially tries to push their overly expensive last-decade storage & compute systems that are total overkill. Recently got a quote from them in the same range as you mentioned. Compared to cloud versions by Amazon, Google and IBM they are about 50% more expensive, and all of the cloud vendors offer cheaper options depending on the kind of replication and access performance you need. Confronted EMC with that. They didn't get it. Can you blame them? Theirs is a dying business, and almost nobody in a dying business gets it. Poor sods.
Re:Talk to Vendors by Anonymous Coward · 2015-07-25 12:03 · Score: 0

What EMC and NetApp give you are features. A LOT of companies "backup" their data by putting it on a deduplicated volume, which replicates asynchronously between 1-2 different physically located sites. They think this is cheaper than tape because all the data is there, and you can map a filesystem as a VTL.
They also have a shitload of caching. You have a server that is spinning a lot of random reads/writes, the caching on the EMC appliance turns all that into a constant write and speeds up reads by having a good chance of hitting the RAM on the controller before it has to go to the spinny thing.
As for FC versus iSCSI versus FCoE, this all depends:
A lot of places have fiber channel switches in place. Having the storage fabric separate from the network fabric is a good security measure. However, FC provides multipathing, but iSCSI doesn't. This means that if you need to upgrade or fix a network appliance, iSCSI goes down, while you just lose one path with FC... and in the real world (i.e. production), you either have redundant systems, or you get fired. iSCSI is great for amateurs in a dev environment, but the pros use FC and FCoE.
Re:Talk to Vendors by mlts · 2015-07-25 12:22 · Score: 2

Oracle has a SAN (well, SAN/NAS) offering which does similar, with a rack of ports/HBAs that are configurable, assuming the right SFP is present. Want FC? Got it. iSCSI? Yep. FCoE? Yep. Want to just share an NFS export on a LAG as a VMware backing store? Easily done.
The price wasn't that shocking either. It wasn't dirt cheap like a Backblaze storage pod, but it was reasonable, especially with SSD available and autotiering.
Re: Talk to Vendors by Anonymous Coward · 2015-07-25 12:53 · Score: 0

iSCSI doesn't have multi-pathing capabilities??
Re:Talk to Vendors by Anonymous Coward · 2015-07-25 13:04 · Score: 0

On the PB level, for data that is valuable, I just don't see any other way than using a tape silo:
1: Tapes take 0 wattage to store, other than HVAC.
2: Anything LTO-4 and newer has built in AES, and it is easy to enable it on a silo. Set a passphrase, make sure the corporate brass and relevant people know it, encryption is then covered.
3: Copying backups is very easy, so you can have a set offsite.
4: Tapes are archival media and can actively repair bit rot; HDDs are not, and some arrays will not detect bit rot (note, this is different from parity), and happily serve up corrupt data.
5: After the high cost of the silo, tape cartridges are cheap.
6: LTO-6 capacity is decent. Handling 200-500 tapes (assuming 0% compression) sounds onerous, but in reality, not that big of a task. LTO-7 is even better, with 6.4TB native, so a 1PB backup is around 157 tapes.
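The cartridge counts in point 6 are a ceiling division. A small sketch in Python, assuming 0% compression, decimal units, LTO-6 at its 2.5TB native capacity, and LTO-7 at the 6.4TB figure quoted in this comment (treat that last number as the commenter's, not a verified spec):

```python
LTO6_NATIVE_GB = 2_500
LTO7_NATIVE_GB = 6_400  # figure quoted in the comment above


def tapes_needed(data_gb: int, tape_native_gb: int) -> int:
    """Cartridges required for data_gb with no compression (integer ceiling division)."""
    return -(-data_gb // tape_native_gb)


for label, data_gb in (("0.5 PB", 500_000), ("1 PB", 1_000_000)):
    print(f"{label}: {tapes_needed(data_gb, LTO6_NATIVE_GB)} x LTO-6, "
          f"{tapes_needed(data_gb, LTO7_NATIVE_GB)} x LTO-7")
```

At 2.5TB per cartridge the half-petabyte in question is 200 tapes and a full petabyte is 400; at 6.4TB per cartridge those become 79 and 157. Each additional full copy kept offsite multiplies the count.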
Re:Talk to Vendors by Anonymous Coward · 2015-07-25 13:11 · Score: 1

Talking to professional SALES people is the worst thing you can do. They will sell you what they have, and what they think you can afford... they WON'T sell you the affordable solution that you actually NEED.
If you don't know the difference between sales professionals and IT professionals... you are part of the problem.
Re: Talk to Vendors by mlts · 2015-07-25 14:41 · Score: 1

Unless I'm completely hallucinating, I have set up MPIO on ESXi for iSCSI, as well as a LAG (link aggregate) for a NFS based backing store.
iSCSI has its place in the enterprise, and it can be used in production. If the NIC supports it, it can even be used for booting. How does it fare against 8Gb FC? In reality, there are a few tasks which will saturate a 10Gb iSCSI link or an 8Gb FC link, but not that many.
All of these are just tools in the toolbox. iSCSI is easier to get going ad-hoc (but still be useful with MPIO), FC is well known and well used, and FCoE seems to be popping up because it works well with Cisco Nexus architecture.
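For readers wondering what multipathing actually buys you, here is a toy model in Python. It is only an illustration of the idea (rotate I/O across healthy paths, keep going when one fails); the class and path names are invented, and it is not how ESXi or any real iSCSI or FC initiator is implemented.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    healthy: bool = True


class RoundRobinMultipath:
    """Toy path selector: rotate across paths, skipping any that are down."""

    def __init__(self, paths):
        self.paths = list(paths)
        self._next = 0

    def pick(self) -> Path:
        # Try each path at most once, starting from the rotation pointer.
        for _ in range(len(self.paths)):
            path = self.paths[self._next]
            self._next = (self._next + 1) % len(self.paths)
            if path.healthy:
                return path
        raise RuntimeError("all paths down: I/O would stall here")


paths = [Path("nic0 via switch A"), Path("nic1 via switch B")]
mp = RoundRobinMultipath(paths)
print([mp.pick().name for _ in range(4)])  # alternates between both paths

paths[0].healthy = False  # e.g. switch A taken down for maintenance
print([mp.pick().name for _ in range(4)])  # traffic continues on the survivor
```

The point for the argument above is that the selection logic sits above the transport, so the same idea applies whether the paths are FC fabrics or Ethernet networks.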
Re:Talk to Vendors by FrozenGeek · 2015-07-25 14:46 · Score: 1

Talking to the pros is only the worst thing to do if you know as much, if not more, than they do. The fact that the OP is asking slashdot indicates he does not know a lot about setting up storage in the PB range. Are the major vendors overpriced? In terms of the hardware you get, probably. In terms of the knowledge they bring to the table, probably NOT in the case of the OP. If you have someone who can select COTS components and effectively couple them with some good OS/SW, great. Otherwise, get someone who knows what they are doing and buy their solution. Doing it on your own when you don't know what you are doing will only end in tears.

--
linquendum tondere
Re: Talk to Vendors by Anonymous Coward · 2015-07-25 15:15 · Score: 0

You overpaid. Improve your vendor management and negotiation skills.
-Someone who designs petabyte-scale storage solutions every day, for a living.
Re:Talk to Vendors by AK+Marc · 2015-07-25 15:23 · Score: 2

He wasn't very clear about his complaint, but talking to professional sales people about what you need will never get you an optimal solution.

--
Learn to love Alaska
Re:Talk to Vendors by AK+Marc · 2015-07-25 15:27 · Score: 1

I know you were making a joke, but 4200 RPM laptop drives are great. You'll have trouble finding a lower power usage spinner, and the read speed will be roughly interface speed for most practical implementations of multi-drive arrays.

--
Learn to love Alaska
Re:Talk to Vendors by Anonymous Coward · 2015-07-25 16:20 · Score: 0

Honestly, that's the WORST thing to do. When you talk to the pros, they will try to sell you some outrageously overpriced Fiber Channel system that's total overkill for what you are doing. I've worked with 'big data' storage companies like EMC and Netapp. We needed 300TB of 'nearline' storage, and EMC came up with a $3,000,000.00 TOTAL overkill Fiber Channel solution, and Netapp wasn't much better, coming in at close to $2,000,000.00. Total ripoff. The ONLY reason you would ever choose Fiber Channel over iSCSI is if you are running a HUGE transactional database, with millions of accesses per minute. If you just need STORAGE, do what I did: I went with Synology, and got 300TB of RAID-10 storage for about 100K. I DUPLICATED it (200K total), and still only paid 10% of what the 'vendors' tried to sell me. I was VERY clear that I did not need Fiber Channel, I refused to spend tons of money for something that would have zero bearing on the performance, and I found it's much better to research and provide your own solution at 10% of the cost of the big vendors. Why do you think EMC has almost 3 billion in revenue? Because they convince pointy-haired bosses that their solution is the best. Trust me, going with a 2nd-tier vendor for 'nearline storage' is a much better idea than talking to the 'big 5' to ask for a solution.
Fiber channel? That was their opening quote, wasn't it? Did you already have fiber infrastructure, or did that include the super overpriced Cisco fiber switches they add in? Did they both include the entire nickel-and-dime software stack? They both have iSCSI solutions; you obviously weren't clear enough about what you wanted if they kept insisting on selling you FC. And trust me, you can get their software stack for free. Take that quote they gave you, give it a discount of 60%, and that's what you should be paying at MOST.
There are other vendors as well; we just got a quote for 360TB of Nimble Storage for right around 600k. You might think we're being foolish by paying that much, but that premium we're paying is to guarantee support and performance.
See, you just wait until you need support from Synology. I've been there, along with a few other people I know who thought they could save a few ducats by going with them. God help you when it comes to the "enterprise support" program that has you send parts in before they'll ship a replacement. I watched a friend struggle to get a replacement on a 10613XS+ that had a failed MB. Never again.
At least go with Dell. Dell will sell you an MD3860i with 60 6TB hard drives for not much more than what you paid for your Synology. Performance is just as good as the Synology, you'll get next-day on-site support from a Dell tech, and a smaller rack/power/cooling footprint as well.
Re:Talk to Vendors by Anonymous Coward · 2015-07-25 17:41 · Score: 0

So you're saying you're a pro and we should take your advice? Hypocrite. -1 for redundancy because there is no oxymoronic.
Re:Talk to Vendors by drsmithy · 2015-07-25 18:20 · Score: 1

RAID10 for nearline storage ?
More research required, methinks.
Re:Talk to Vendors by Oceanplexian · 2015-07-25 19:17 · Score: 1

We have actually purchased a NetApp cluster, replicated across two sites, and while I can't divulge what we paid (plus I'm just the guy who set it up), there's a good chance the parent is off by almost an order of magnitude. Now, I'm not saying you couldn't build your own storage cheaper, or that I don't have my own issues with NetApp, or that some sort of cloud solution such as Amazon S3 or Glacier might not be an even better answer. I will say that a SAN is not at all a bad idea and, depending on how important your data is, absolutely worth it. Synology makes great gear but they're in a completely different league compared to something like a NetApp and especially an EMC, just in terms of redundancy (redundant PSUs, redundant shelves, redundant controllers), support, and performance. It's the same reason banks spend millions to run mainframes even though a new smartphone is probably faster.
Re:Talk to Vendors by Anonymous Coward · 2015-07-25 19:20 · Score: 0

EMC would be unlikely to sell FC for this solution; it sounds like it would be best solved with an Isilon scale-out NAS, which can grow to over 30PB on a single filesystem. We use it for rendering, but it also includes HDFS if you do want to use it with Hadoop. We've been extremely happy with it after using Netapp for a long time and having to hack around the 16TB file system limit.
My 2c
Re:Talk to Vendors by ihtoit · 2015-07-26 03:12 · Score: 1

Seconded. I use laptop EIDE drives for my network scratch - it's great, the array runs at saturation for my Gigabit network. And at 2TB, that volume isn't too shoddy on usable space either.
For archival storage (for some measure of permanent to not include removable tape) I use huge drives in quick-release caddies and set to JBOD and simply diff the data daily. Once the drive's full, out it comes and in goes the next empty. Full drive goes offsite. Working volume is around 14TB right now, that's a RAID6. All commodity x86/x64 gear. My network volumes are all running in a wooden footlocker on an Athlon64 3400+ clocking at 800MHz.
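The "diff the data daily, copy to the caddy drive" routine can be sketched in a few lines of Python. This is one possible way to do it, using content hashes rather than timestamps; the function names and the idea of keeping the previous snapshot as a dict are assumptions for illustration, not a description of the setup above.

```python
import hashlib
import os
import shutil


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large files do not fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(root: str) -> dict:
    """Map each file's path relative to root to its SHA-256 digest."""
    result = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            result[os.path.relpath(full, root)] = file_digest(full)
    return result


def changed_files(previous: dict, current: dict) -> list:
    """Files that are new or whose content differs from the previous snapshot."""
    return sorted(p for p, digest in current.items() if previous.get(p) != digest)


def copy_changes(root: str, archive: str, previous: dict) -> dict:
    """Copy new or modified files onto the archive drive; return the new snapshot."""
    current = snapshot(root)
    for rel in changed_files(previous, current):
        dest = os.path.join(archive, rel)
        os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
        shutil.copy2(os.path.join(root, rel), dest)
    return current
```

Persisting the returned snapshot between runs gives you the daily diff, and re-hashing an archived drive against its snapshot later is a cheap way to notice bit rot. Hashing everything every day is slow at the 14TB scale, so a real setup would likely check size and mtime first.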

--
Political debates have me rolling my eyes so much I think I got optical whiplash. I should sue. - Foamy The Squirrel
Re:Talk to Vendors by Thumper_SVX · 2015-07-26 04:54 · Score: 1

At least go with Dell. Dell will sell you an MD3860i with 60 6TB hard drives for not much more than what you paid for your Synology. Performance is just as good as the Synology, you'll get next-day on-site support from a Dell tech, and a smaller rack/power/cooling footprint as well.
Seconded... though having recently seen a lot of quotes you could do worse than the Dell SCv2000 which is the newer replacement for the MD3860i using the Compellent code. It's faster and cheaper than the MD, mostly because Dell no longer has to pay the Netapp tax for every MD (the MD's are based on an LSI chipset that's owned by Netapp)
Re:Talk to Vendors by Bengie · 2015-07-26 05:02 · Score: 1

I saw a ZFS benchmark comparing random read, write, read+write, and sequential read, write, and read+write of a 15k RPM RAID and 5400 RPM with 10x as much storage but just as many spindles for a fraction the price, and the 5400 RPM setup was faster once the 64GB of SSDs got warmed up.
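Why a warmed-up SSD cache (L2ARC, in ZFS terms) can let slow disks beat fast ones is just a weighted average. A minimal sketch in Python; all three latency figures are assumed, round numbers for illustration and do not come from the benchmark mentioned above.

```python
def effective_latency_ms(hit_ratio: float, cache_ms: float, disk_ms: float) -> float:
    """Average read latency when a fraction hit_ratio of reads is served from cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * disk_ms


def break_even_hit_ratio(cache_ms: float, slow_ms: float, fast_ms: float) -> float:
    """Hit ratio at which cached slow disks match uncached fast disks."""
    return (slow_ms - fast_ms) / (slow_ms - cache_ms)


SSD_MS = 0.1        # assumed SSD read latency
DISK_5400_MS = 15.0  # assumed random read latency, 5400 RPM
DISK_15K_MS = 5.5    # assumed random read latency, 15k RPM

for hit in (0.0, 0.5, 0.9):
    print(f"hit ratio {hit:.0%}: "
          f"{effective_latency_ms(hit, SSD_MS, DISK_5400_MS):.2f} ms "
          f"vs {DISK_15K_MS} ms uncached 15k")
print(f"break-even hit ratio: "
      f"{break_even_hit_ratio(SSD_MS, DISK_5400_MS, DISK_15K_MS):.0%}")
```

With these assumed numbers the cached 5400 RPM pool pulls ahead once roughly two thirds of reads hit the SSDs, which is why the result depends so heavily on the cache being warm and the working set fitting in it.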
Re:Talk to Vendors by KGIII · 2015-07-26 06:59 · Score: 1

Dell will sell you an MD3860i with 60 6TB hard drives ...
How odd? I was drooling over that exact appliance the other day and wishing I could find something similar for home use. I do not want/need fiber. I do have a rack in my data room in the basement. Something rackable, CAT5/6, PB (or close) support, low power, easy management, enterprise level support - can be toned down a bit, expandable, and offering built-in redundancy... There was a YellowBox (I think that was its name, it has been long since discarded) appliance that was nice and met some of those needs; I feel it should have been expanded on. I currently have a home-grown solution based on simple white boxes. They are not rackable and they are power hungry even though they are minimally used. Maybe something based on the above ideas with four Atom CPUs running a *NIX variation with a front end or ability to mount slices of space. Money is not the objective, I will pay handsomely, but finding something that really fits my desires is difficult.
I am sure such an appliance is out there and meets my needs almost exactly. I have not yet found it. I would even pay enterprise level pricing (though I expect enterprise level hardware) and would also want the ability to upgrade to SSDs (without needing to add them all at once) when those become a bit more mature for long-term use and the price becomes more reliable.
One of the things I miss most about owning my company is that I am no longer able to lug home equipment that has been replaced. (I always just gave depreciated equipment to myself, employees, or donated the hardware to local schools. Being a tech-heavy business meant stuff was replaced fairly often and still had a great deal of use left in it.) I kept a lot of that stuff and still use a bunch of it today, though it is, more often than not, to play around, and much of that is now 10+ years old so upgrading/adding new toys is an option. I do get occasional hand-me-downs as I still go in and do some work for the company once in a while (I also have stock in the parent company), but they are fewer than in the past.
Anyhow, the silly mindless drivel above is mostly unimportant. I too, however, would like to be able to have a large storage array with backup capability. I already have off-site backups (not at the enterprise level) and a disaster recovery plan in place as well as cold-storage in a safe deposit box as well as a friend's garage. I would love to have a decent, easily managed, appliance for it that had great support and easy upgrading to 'future proof' things for a while.

--
"So long and thanks for all the fish."
Re:Talk to Vendors by KGIII · 2015-07-26 07:13 · Score: 1

Someone needs to cluster a bunch of unbranded cell phones and build an HPC out of them! A custom rack could hold countless phones and each could contain a 128 GB card. When one goes down they can chuck it into the trash and toss a new one into the cradle. A wireless mesh network would be a bottleneck, but I suspect it would still crunch a lot of numbers, though the power consumption may be an issue. I am sure I am missing some snags, I have not actually given this any real thought, but those could be ironed out.
I think this is a thing that needs to be done simply because of General Principle and his army of ants. We need to give it a good cause and get a kickstarter going. I would throw a few dollars (if it looked like they may actually make a serious attempt) at it just to have some laughs. We can build it and sell the compute cycles at cost to people sequencing genomes of rain forest flora and fauna. (It might actually be okay at that. If not, throw some more hardware at it - my favorite solution for everything.) The environmentalist ideal would potentially garner support. We could even make it based on used (read "RECYCLED") cell phones. Register it as a NPO and people can write off their old phones as a donation. It would employ smart people and help the environment! What's not to love?
That, folks, is my shitty idea of the day.

--
"So long and thanks for all the fish."
Re:Talk to Vendors by Anonymous Coward · 2015-07-27 02:43 · Score: 0

Of course we can all race to conclusions and say that this person has no ability to negotiate and the company has no ability to offer multiple options for a sale. It is as if people are incapable of looking at more than one option within a company or asking for more than one option (ever heard of NL nodes from EMC with 10 Gb ethernet?). I don't think that is the case unless they put some poor fool who has no idea what they are looking for in charge (which is possible -.-). So yes someone could find multiple options from the big vendors if they ask. *mind blown*
Re:Talk to Vendors by Cramer · 2015-07-27 11:25 · Score: 1

Obviously, you've never used LTO technology. They cannot repair tracking errors -- the bits written when the tape was low-level formatted, something NO commercial drive can do! "Bit Rot" will destroy LTO tapes in a matter of months if they are not kept at a nearly constant temperature. Conversely, I have DLT, DAT (4mm and 8mm), QIC, Exabyte (8200?) etc. tapes that are still readable after decades. (one of those 8200 tapes sat in a kitchen drawer for 11 years!) Yet, I have a trash bin full of LTO-2 tapes that are 100% unusable after one cycle through Iron Mountain's archive. The SDLT-I's have lasted 8+ years of continuous use (~1wk in the library, then 2-3mo on a table in the DC @ a constant 68F); the LTO-2's (fuji and sony) begin to fail after ~2yr in the same environment.
(In fact, the SDLT DRIVES are failing more often than the media these days. The laser tracking servo fails. The drives are 10+ years old, the tapes 8+)
Re:Talk to Vendors by HappyPsycho · 2015-07-28 03:45 · Score: 1

If you don't know the difference between sales professionals and IT professionals... you are part of the problem.
How do you get to the latter without at least making contact with the former?
Something of the OP's scale isn't exactly the normal thing that your average IT professional has any experience with so the normal channels probably won't work.

EMC Isilon by Anonymous Coward · 2015-07-25 07:25 · Score: 0

It's expensive, but can be used as SAN or NAS (NFS or SMB). It's also redundant to itself - think RAID6 across cabinets. It will set you back, but it's worth every penny.

The poor man's solution is the latest Synology product, which will allow you to do RAID-spanning up to 1.5tb raw.

The only truly viable backup options are to either do block level replication (this isn't backup) and/or Amazon S3 to Glacier.

Re: EMC Isilon by Anonymous Coward · 2015-07-25 09:39 · Score: 0

Isilon can easily get below $.50 at that volume and scales to >50pb. If you actually have a business to run, ask yourself how much you care about your data, and if you're asking about it on here, you don't know enough to roll your own. Google does it in software because they have 1000 PhD CS engineers who make that work. If you are a shop that has never handled this scale and aren't 'web' scale just cut the check and sleep well at night.
Re:EMC Isilon by Anonymous Coward · 2015-07-25 10:17 · Score: 0

I have to second this. We have 2 800TB clusters that replicate and are planning on adding 400 more TB on each. Adding storage is so simple and is more or less infinitely scalable.
Re:EMC Isilon by mlts · 2015-07-25 14:53 · Score: 1

Isilons are a cool technology. Take FreeBSD, add a custom filesystem (OneFS), link individual nodes via Infiniband, and let the custom code automatically select which nodes/drives to fetch data from. If a hard drive blows, it shrinks the array in order to maintain redundancy.
Of course, Isilons support deduplication, iSCSI (you create a disk image and mount that), and your NAS protocols of choice. If you set a hard quota, the presented directory can be configured to show the quota as the disk space present. Very nifty, and not that expensive for an enterprise array. Need more space? Add drives or more nodes.
For long term backups, Isilons support NDMP [1].
[1]: Of course, you can always connect a tape silo to a UNIX machine, write a script that SSHes into an Isilon node and pulls off /ifs/data.

Depends who you ask... by snowgirl · 2015-07-25 07:28 · Score: 4, Interesting

At Facebook, it's memcached, with an HDD backup, eventually put onto tape...

At Google, it's a ramdisk, backed up to SSD/HDD, eventually put onto tape...

For anyone who can't afford half a petabyte of RAM with the commensurate number of computers? I have no good ideas... except maybe RAM cache of SSD, cache of HDD, backed up on tape...

Using something like HDFS to store your data in a Hadoop cluster of file requests, is likely the best F/OSS solution you're going to get for that...

--
WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS

Re:Depends who you ask... by tsetem · 2015-07-25 12:47 · Score: 2

Thumbs up on HDFS. The next question to ask your groups is how they will be analyzing it. HDFS (and Hadoop/Spark/Whatever) will hopefully fit in nicely there. Not only will your data be redundantly copied across multiple systems, but as your data needs (and cluster) grow, so does your computational power.
Getting data in & out can be done via Java API, REST API, FUSE or NFS mounts. The only issue is that HDFS doesn't play well with small files, but hopefully your groups will be using large files instead.
Now administration is another story, but then there's Cloudera Manager, which is supposed to greatly simplify management. I'm currently using it to store about .25 PB right now for random analysis, and growing its capacity is a straightforward task.
As far as backing up, HDFS provides snapshots, 3x replication (or more) across nodes in the cluster. Of course there's always the big hammer of just getting a second cluster. As an old HW sage once told me, "If you can't afford to buy two, don't buy one"
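To put a number on what 3x replication costs in raw disk, here is a rough sizing sketch. It is only arithmetic, not anything HDFS-specific; the 500 TB figure comes from the original question, and the 75% fill ceiling is an assumption of mine for headroom, not an HDFS setting.

```python
# Rough sizing sketch for a replicated store (assumptions noted inline).

def raw_capacity_tb(usable_tb: float, replication: int = 3) -> float:
    """Raw disk needed when every block is stored `replication` times."""
    return usable_tb * replication


def raw_with_headroom_tb(usable_tb: float, replication: int = 3,
                         max_fill: float = 0.75) -> float:
    """Same as above, but keep the cluster below `max_fill` full.

    max_fill is a hypothetical operating margin, not a value from the thread.
    """
    return raw_capacity_tb(usable_tb, replication) / max_fill


if __name__ == "__main__":
    usable = 500  # TB, the half-petabyte from the question
    print(f"raw disk at 3x replication: {raw_capacity_tb(usable):.0f} TB")
    print(f"raw disk with headroom:     {raw_with_headroom_tb(usable):.0f} TB")
```

So half a petabyte of data wants roughly 1.5 PB of raw disk before any headroom, which is worth knowing before pricing out nodes.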

Enterprise Storage by NFN_NLN · 2015-07-25 07:29 · Score: 2

This project must have an unrealistically low budget, otherwise there are quite a few Enterprise solutions that will do all OR a combination of these tasks.

> how do you present it back to the clients?
Look at a NAS, not a SAN. ie NetApp or 3Par C series.

> And how do you back it up?
Disaster Recovery replication to another system or hosted services. NetApp, EMC, 3Par, etc, etc

> Many SAN solutions have a maximum volume limit of only 16TB
NetApp Infinite volumes limit is 20PB

You can contact a sales person from any of those companies to answer any of these questions.

Re:Enterprise Storage by NatasRevol · 2015-07-25 08:24 · Score: 2

Yeah, the 16TB limit says OP is looking at VERY low end solutions. As in not feasible for petabyte range projects.

--
There are two types of people in the world: Those who crave closure
Re:Enterprise Storage by Anonymous Coward · 2015-07-25 11:01 · Score: 1

My favorite part of that 16TB limit is that it can be reached with two hard drives.
Re: Enterprise Storage by Anonymous Coward · 2015-07-25 13:44 · Score: 0

Where have you found these 8tb drives?! LIES! All lies!
Re: Enterprise Storage by Anonymous Coward · 2015-07-25 14:22 · Score: 0

Seagate have 8TB drives (model number ST8000AS0002).
Re:Enterprise Storage by Drewdad · 2015-07-25 15:33 · Score: 1

Replication is not backup. I cannot stress this enough.
I know of major companies that depended on replication and ignored backup, and then the original copy gets corrupted and the corruption gets replicated to the recovery sites.
Now if you're doing SAN snapshots, and replicating those, then you might be covered, but mounting one of those snaps, and recovering some portion of your data, can be a real pain in the behind.
Re: Enterprise Storage by afidel · 2015-07-25 16:03 · Score: 1

Not necessarily, HP 3Par 20850 scales to 4 PB of SSD (raw, 15+ PB with dedupe) and 3.2 million sub 1ms IOPS, and 75GB/s of throughput but one LUN is still limited to 16TB because not enough customers need more than that in one logical disk to change underlying code.

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
Re:Enterprise Storage by Anonymous Coward · 2015-07-25 18:38 · Score: 0

This is true.
However, the solution in mind needs to be considered. Too many old school SAN people see Replication and think "omfg no backup"
But, with snapshots + "replication" you get great success.
If it's set up correctly, that is.
I am a PSC for a certain Storage Vendor, so take that for what you will.
I have no idea how people cannot set these things up correctly. Honestly, it baffles me. If you're setting it up, read about it. If you don't understand, pay the PS rate to get a person out who does.
Anyway. Toodles!
Re: Enterprise Storage by ihtoit · 2015-07-26 03:32 · Score: 1

I just had an underpants spooge.

--
Political debates have me rolling my eyes so much I think I got optical whiplash. I should sue. - Foamy The Squirrel
Re: Enterprise Storage by KGIII · 2015-07-26 07:32 · Score: 1

NewEgg has them at $260 with a 2/customer limit and 12 in stock. That number will be smaller in a minute when they update the page. Free shipping too.

--
"So long and thanks for all the fish."
Re: Enterprise Storage by ihtoit · 2015-07-26 10:03 · Score: 1

good price. My last HD purchase was a WD Elements pocket 2TB for £69.99 from DSG. For some reason it doesn't suffer the problem my WDE 1TB has in that that one powers down and I have to hardcycle it for the system to pick it up again. It wouldn't be that annoying except that I use that one for music.

--
Political debates have me rolling my eyes so much I think I got optical whiplash. I should sue. - Foamy The Squirrel
Re:Enterprise Storage by Cramer · 2015-07-27 11:34 · Score: 1

Replication is not Archival. Corruption can be copied to a "backup" as well. If you aren't paying attention to what is being duplicated and to where, then "stupid is going to catch up to you eventually." For the record, I've seen the exact same mistake happen to people doing "backups" (RDX and tape) -- the error wasn't caught within a media cycle. (which was "weeks" for them)
Re:Enterprise Storage by Drewdad · 2015-07-29 07:11 · Score: 1

Yup. Inadequate retention can screw you the same was as just depending on replication.
Re:Enterprise Storage by Drewdad · 2015-07-29 07:12 · Score: 1

"same way as" not "same was as"

Call ixsysyems, use ZFS by darkpixel2k · 2015-07-25 07:31 · Score: 1

Seriously. Call iXsystems. They specialize in this stuff and they use ZFS.

--
There's no place like ::1 (I've completed my transition to IPv6)

Re:Call ixsysyems, use ZFS by NatasRevol · 2015-07-25 08:26 · Score: 2

ZFS is a great raid system. That's now owned by Oracle. Goodbye ZFS.

--
There are two types of people in the world: Those who crave closure
Re:Call ixsysyems, use ZFS by darkpixel2k · 2015-07-25 08:41 · Score: 4, Informative

Nope. Not 'owned'. It's covered under the CDDL and developed by a group that isn't associated with Sun. Open-ZFS.

--
There's no place like ::1 (I've completed my transition to IPv6)
Re:Call ixsysyems, use ZFS by Anonymous Coward · 2015-07-25 09:13 · Score: 0

That was exactly what I was thinking. iXsystems sells storage solutions that are exactly in line with what the OP is looking for. They back projects like FreeNAS for lower end storage and TrueNAS for higher end. Plus their solutions are open source and based on ZFS so there is little chance of lock-in.
Re:Call ixsysyems, use ZFS by Bengie · 2015-07-25 09:43 · Score: 2

My cousin used ZFS+gluster for this multi-petabyte system.
Re:Call ixsysyems, use ZFS by donaldm · 2015-07-26 02:23 · Score: 1

Seriously. Call iXsystems. They specialize in this stuff and they use ZFS.
Since when is a file-system a backup and recovery solution?

--
There ain't no such thing as proprietary standards only proprietary formats. Standards are by definition open.
Re: Call ixsysyems, use ZFS by darkpixel2k · 2015-07-26 03:02 · Score: 1

It's *part* of a good disaster recovery solution. Care to back your stuff up with ReFS? (Or whatever MS calls it today?)

--
There's no place like ::1 (I've completed my transition to IPv6)
Re:Call ixsysyems, use ZFS by Bengie · 2015-07-26 05:26 · Score: 1

For most people ZFS could be considered a "back up" solution to a certain extent and making certain assumptions. For large enterprise systems, never rely on a single file system. The data files should be replicated to multiple systems that use multiple file systems.

Tape by kthreadd · 2015-07-25 07:31 · Score: 1

The research projects I've seen using that amount of storage have usually used a tape solution with dCache in front of it. You use a number of tape robots filled with tape, put them in different locations and have them back up everything between them.

Re:Tape by kthreadd · 2015-07-25 07:52 · Score: 1

Just realized I was a few digits off, saw that you said 0.5 PB. Somehow got it to 500 PB. Not that dCache isn't going to handle it, it will. But for as little data as just 0.5 PB a couple of disk arrays connected to a single server will usually be fine. Tape is still good for backup though.
Re:Tape by kthreadd · 2015-07-25 17:20 · Score: 1

Yep. It's not that much. We just installed a new storage system for fast temporary data, not long term storage. 1 PB. It easily fits in a single rack.

Your understanding of NAS arrays is very wrong by Anonymous Coward · 2015-07-25 07:31 · Score: 0

16TB, that's so wrong on so many levels. Even small business NAS arrays like Synology are capable of multi PB size storage. Take the RS18016xs, it holds 12x8TB drives, and can be expanded with units to hold 180 total drives. The base unit is like $9000.00, and the expansions are about $3000.00, so for about $50,000 total, you could have 500TB online storage. Want to be doubly safe, just get 2 identical units. So for 100K, you have all the storage and redundancy you need.

My commission please by Anonymous Coward · 2015-07-25 07:31 · Score: 0

DNA strands are the way to go.

https://en.wikipedia.org/wiki/DNA_digital_data_storage

You need real expertise by Anonymous Coward · 2015-07-25 07:32 · Score: 0

Our storage just passed 1 petabyte, and we are using AWS, but, the real answer to your question is that you need the help of someone with storage expertise (and asking Slashdot doesn't count).

maximum volume limit of only 16TB??? by Anonymous Coward · 2015-07-25 07:35 · Score: 0

I have a few friends that have built their own NAS with RAID5/6 and ZFS with much more than 16TB.

This is kind of the wrong place by Anonymous Coward · 2015-07-25 07:37 · Score: 0

You really need to talk to an Enterprise Reseller. Do not bet your career on some half-assed solution you engineered in house.

Use storage level services. by hamster_nz · 2015-07-25 07:38 · Score: 1

If you want to keep your data on-site, unless you already have a lot of infrastructure that you can leverage, the path of least resistance is to use something like a NetApp Filer.

For backups it can create snapshots on a schedule (hourly/daily/weekly), then either replicate them to a second physical storage unit (hopefully at a different site) or present them to your backup solution.

Using the file services on the NetApp will also provide a solution to your "how do I present it to the storage consumers" question - iSCSI, CIFS with domain integration, NFS, Fibre Channel... You also get storage level de-duplication and compression, if that works for your data.

Of course you will pay what seems like a lot for it, but it does solve a lot of your problems in one unit. How much will it save in servers, backup capacity, a multi-drive tape library, daily visits to the server room to reload tapes and so on?

But if your data center isn't up to providing the level of availability you want then any hardware solution is going to be problematic - large storage systems do not like having the power pulled out from under them. Minimum is dual-redundant UPS power and fault tolerant cooling, or you will most likely have problems.

ZFS and dedupe by Anonymous Coward · 2015-07-25 07:38 · Score: 0

Tegile does nfs, cifs, iscsi, and fc. Also dedupes with ZFS without having to trade the kids in for more RAM.

Re:ZFS and dedupe by Anonymous Coward · 2015-07-25 08:17 · Score: 0

I would use lz4 compression, but I think I would pass on dedupe for that much data. It's been a few years since I experimented with zfs dedupe but I think that any sort of panic is going to put you into a 10s of hours rebuild process for the dedupe table with volumes of that magnitude.
That said, I agree the tldr answer to the original question is ZFS and ZFS snapshots plus ZFS send for backups.
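For anyone who hasn't seen the snapshot-plus-send pattern, here is a minimal sketch that only builds the command lines and does not run anything. `zfs snapshot`, `zfs send -i` and `zfs receive` are the real subcommands; the dataset names, snapshot names and `backuphost` are made up for illustration.

```python
import shlex

# Sketch: build (but do not execute) the commands for an incremental
# ZFS backup. All dataset/host names below are hypothetical.


def snapshot_cmd(dataset: str, snap: str) -> list[str]:
    """Command that takes a point-in-time snapshot dataset@snap."""
    return ["zfs", "snapshot", f"{dataset}@{snap}"]


def incremental_send_pipeline(dataset: str, prev_snap: str, new_snap: str,
                              remote_host: str, remote_dataset: str) -> str:
    """Shell pipeline that ships only the blocks changed between two snapshots."""
    send = ["zfs", "send", "-i", f"{dataset}@{prev_snap}", f"{dataset}@{new_snap}"]
    recv = ["ssh", remote_host, "zfs", "receive", remote_dataset]
    return " | ".join(shlex.join(part) for part in (send, recv))


if __name__ == "__main__":
    print(shlex.join(snapshot_cmd("tank/data", "daily-2")))
    print(incremental_send_pipeline("tank/data", "daily-1", "daily-2",
                                    "backuphost", "backup/data"))
```

The point of the incremental form is that after the first full send, each run only moves the delta, which matters a lot at this size.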

Storage Pod by Anonymous Coward · 2015-07-25 07:39 · Score: 1

Something like storage pods? https://www.backblaze.com/blog/storage-pod/

use slashdotFS by goombah99 · 2015-07-25 07:39 · Score: 3, Funny

I use slashdotFS which is a markovian random comment generator which effectively embeds data in a steganographic comment. The FS handles the details of creating and saving these so it's all transparent and mounts on your desktop like a regular drive. It's slow but its capacity seems unlimited and frequently gets modded insightful

--
Some drink at the fountain of knowledge. Others just gargle.

Re:use slashdotFS by goombah99 · 2015-07-25 07:41 · Score: 2

another way is to convert it to jpeg and store it in facebook.

--
Some drink at the fountain of knowledge. Others just gargle.
Re:use slashdotFS by KGIII · 2015-07-26 05:49 · Score: 1

And by sheer coincidence the encrypted header's plain text output is MOO! Compressed meta-data is goatse.

--
"So long and thanks for all the fish."
Re:use slashdotFS by Big+Hairy+Ian · 2015-07-27 00:29 · Score: 1

I was just going to suggest embedding it in Piers Morgan's DNA as he obviously has the redundancy and it's about time he did something useful

--
Build a Man a Fire, and He'll Be Warm for a Day. Set a Man on Fire, and He'll Be Warm for the Rest of His Life.

You are not knowledgeable enough to be in charge by Anonymous Coward · 2015-07-25 07:39 · Score: 0, Insightful

of this project. You probably can supply enough information to vendors to get proposals (which will be all over the map because you can't be very specific) but I fear you're not in a strong position to evaluate them. You need to have your solution developers talk to an experienced consultant you hire to make recommendations and provide evaluations of vendor bids. You might be able to get it done that way.

Lots of options by JWW · 2015-07-25 07:40 · Score: 1

You could look into Lustre, although it would change your hardware configuration a bit (it's not a SAN). Depending on your configuration and desired redundancy, this will affect costs a bit (i.e. more Lustre nodes).

You could buy a traditional SAN and tie it all together with fibre, though you'd need a clustered file system like StorNext, or another commercial CFS, or even GFS if you prefer open source. This would help solve your traversal of the system as a regular directory structure issue.

Best bet for backup would be to a robot tape library of some sort. There is some work being done on dynamic backup of data in Lustre systems in the HPC space, but it's not very mature. CFS systems like StorNext have methods in place for automatically backing up data on the filesystem.

SanDisk sells a 512TB 3U shelf... by AcquaCow · 2015-07-25 07:40 · Score: 2

SanDisk's Infiniflash is 512TB in a 3U chassis that is SAS-connected. You can front this with something like DataCore's SANsymphony to turn it into a NAS/SAN appliance.

The pricing looks to be around $1/GB, which is a ton cheaper than building a SAN of that capacity, plus it's much smaller in power/space/cooling.

--

up 12 days, 22:30, 2 users, load averages: 993.20, 994.21, 994.56
*makes note to limit user processes...

Re:SanDisk sells a 512TB 3U shelf... by Lost+Race · 2015-07-25 10:00 · Score: 1

$1/GB, which is a ton cheaper than building a SAN of that capacity,
The marginal price of HDD storage is about $0.05/GB. Maybe double that for higher density, maybe double it again for redundancy. That's a maximum of $0.2/GB for the disks. There's some fixed overhead for a large disk farm plus some more per-byte overhead for the controllers and interconnects. Hard to believe that really adds up to much more than $1/GB. We're talking half a million dollars for 500TB.
Daydream on. Big cluster of mid-tower PCs. Six 4TB drives per tower, for a total of 20TB with 1:6 redundancy. 25 of those towers would give you 500TB. 150 drives at $150 each = $23K. 25 server-grade PCs at about $1000 each = $25K. Networking? No idea, maybe another $2K? So we're looking at about $50K for 500TB. Obviously there will be some overhead for a managed commercial "enterprise" level system from a big vendor. But more than 10x the price? Really? Seems like there's room for a little more competition in that business.
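The back-of-envelope numbers above can be checked with a few lines. This sketch uses only the prices quoted in this comment (all of them guesses), treats one drive per tower as redundancy, and counts 1 TB as 1000 GB.

```python
# Back-of-envelope check of the DIY tower cluster (prices are the
# guesses from the comment, not quotes from any vendor).

TOWERS = 25
DRIVES_PER_TOWER = 6
DRIVE_TB = 4
DRIVE_PRICE = 150      # USD per 4 TB drive
TOWER_PRICE = 1000     # USD per server-grade PC
NETWORK_PRICE = 2000   # USD, rough guess for networking
VENDOR_PER_GB = 1.0    # USD/GB, the quoted flash-shelf price


def diy_cluster():
    """Return (usable TB, total USD, USD per usable GB)."""
    drives = TOWERS * DRIVES_PER_TOWER
    # One drive in six is given up to redundancy in each tower.
    usable_tb = TOWERS * (DRIVES_PER_TOWER - 1) * DRIVE_TB
    total = drives * DRIVE_PRICE + TOWERS * TOWER_PRICE + NETWORK_PRICE
    return usable_tb, total, total / (usable_tb * 1000)


if __name__ == "__main__":
    usable_tb, total, per_gb = diy_cluster()
    vendor_total = VENDOR_PER_GB * usable_tb * 1000
    print(f"usable: {usable_tb} TB, total: ${total:,}, ${per_gb:.3f}/GB")
    print(f"vendor at $1/GB: ${vendor_total:,.0f} ({vendor_total / total:.1f}x)")
```

That comes out to about $0.10 per usable GB for hardware alone; it deliberately leaves out power, cooling, rack space and admin time.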
Re:SanDisk sells a 512TB 3U shelf... by hjf · 2015-07-25 11:23 · Score: 1

now factor in the cost of maintaining spinning disks, powering them, cooling them, and datacenter space....
Re:SanDisk sells a 512TB 3U shelf... by Anonymous Coward · 2015-07-25 14:57 · Score: 0

Actually, it's more like $2/GB and then you need to buy some compute nodes in front of it (since it's just direct attached storage with 8 exposed SAS ports). Still pretty darned cheap if you're after lots of flash storage in a small form factor.
For the original poster, I'd go with a few commodity storage chassis (supermicro has one that will hold 72 3.5" drives). Fill a few of those with as many HGST drives as you need to meet your space requirements and then manage it with ceph. That's cheap enough that you can build a redundant cluster and use that for backups if you so desire. It will also scale out fairly easily as your space requirements inevitably increase.

Time for the next step by fustakrakich · 2015-07-25 07:42 · Score: 1

Let's start growing brains in jars.

--
“He’s not deformed, he’s just drunk!”

Re:Time for the next step by KGIII · 2015-07-26 07:59 · Score: 1

I am not sure if it is my file system or my OS but I am definitely suffering from bit rot. Maybe it is Windows and I need a defrag utility?

--
"So long and thanks for all the fish."

How are you using the data? by MetricT · 2015-07-25 07:44 · Score: 2

What clients will you be exporting it to? Linux, OS X, Windows? All three?

What kind of throughput do you need? Is 10 MB/sec enough? 100 MB/sec? 10 GB/sec?

What kind of IO are you doing? Random or sequential? Are you doing mostly reads, mostly writes, or an even mix?

Is it mission critical? If something goes wrong, do you fix it the next day, or do you need access to a tier 3 help desk at 3 am?

We have a couple of petabytes of CMS-HI data stored on a homegrown object filesystem we developed and exported to the compute nodes via FUSE. Reed-Solomon 6+3 for redundancy. No SAN, no fancy hardware, just a bunch of Linux boxes with lots of hard drives.

There is no "one shoe fits all" filesystem, which is part of the reason we use our own. If you have the ability to run it, I'd suggest looking at Ceph. It only supports Linux, but has Reed-Solomon for redundancy (consider it a higher tier of RAID) and good performance if you need it. If you have to add Windows or OS X clients into the mix, you may need to consider NFS, Samba, WebDAV, or (ugh) OpenAFS.
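To show why a 6+3 erasure code is attractive compared with plain replication, here is a small comparison sketch. It is only the capacity arithmetic, not an actual Reed-Solomon encoder, and the 500 TB figure is borrowed from the original question rather than from our system.

```python
# Capacity arithmetic for erasure coding vs. replication (no real
# encoding happens here; this only compares overheads).


def erasure_profile(data: int, parity: int) -> dict:
    """k data + m parity chunks: any `parity` chunks can be lost."""
    total = data + parity
    return {
        "efficiency": data / total,    # fraction of raw disk holding data
        "overhead": total / data,      # raw TB needed per usable TB
        "tolerated_losses": parity,
    }


def replication_profile(copies: int) -> dict:
    """n full copies: all but one copy can be lost."""
    return {
        "efficiency": 1 / copies,
        "overhead": float(copies),
        "tolerated_losses": copies - 1,
    }


if __name__ == "__main__":
    usable_tb = 500
    for name, p in (("RS 6+3", erasure_profile(6, 3)),
                    ("3x replication", replication_profile(3))):
        print(f"{name}: {usable_tb * p['overhead']:.0f} TB raw, "
              f"survives {p['tolerated_losses']} lost chunks/copies")
```

Under these assumptions 6+3 needs 1.5x raw disk and survives three losses, where three full copies need 3x and survive two. The price is more CPU and network traffic on rebuilds.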

Re:How are you using the data? by Anonymous Coward · 2015-07-25 11:45 · Score: 0

Finally someone is asking for requirements.
Re:How are you using the data? by rev0lt · 2015-07-26 11:53 · Score: 1

It is funny, I've read many comments since the top of the page, and finally someone is actually asking for requirements. At this point, it's buried at the middle of the scrollbar. And yet, someone blames slashdot moderation. I blame the users.

You're asking like you will be implementing it... by tlambert · 2015-07-25 07:44 · Score: 4, Interesting

You're asking like you will be implementing it... don't.

Gather all their requirements, gather your requirements on top of it (I'm pretty confident that some of those requirements were your additions for "you'd be an idiot to have that, but not also have this...", possibly including the backup).

Then put out a preliminary RFP to the major storage vendors, including asking them what they'd say you'd missed in the preliminary.

Then take the recommendations they make on top of the preliminary with a grain of salt, since most of them will be intended to ensure vendor lock-in to their solution set, revise the preliminary, and put out a final RFP.

Then accept the bid that you like which management is willing to approve.

Problem solved.

P.S.: You don't have to grow everything yourself from seed you genetically modify yourself, you know...

I was going to by Anonymous Coward · 2015-07-25 07:45 · Score: 0

Make some sarcastic comment about tape library, tape library, or a library of tape

However you could probably get a rack of boxes running openVMS to present its pooled storage as a single blob of networked drive, which sounds like what you want. Backup to tape of course.

You don't. by Anonymous Coward · 2015-07-25 07:49 · Score: 1

Unless you REALLY want to pay for it.

As someone who works in a Hospital system, Imaging Informatics specifically, we have roughly that much data spread across 2 locations. Backups aren't what you think they are. We backup the infrastructure config. Databases, VM cluster config and VM's, which compressed, probably equates to 5-10 Terabytes. That's it. That's the stuff which, if worst possible event happened, we wouldn't be exactly back to 0 when we rebuilt.

As for the 400-500 Terabytes of data, they're in what we call Archive state. There isn't backup of them, but they are in proper data centers with fire suppression. So there's that... Still, if 1 site went up, we'd be down that data. Thems the breaks... Goes back to money! But, what we do have, is everything in RAID with Hot Spare. I think... I know 2 drives can fail in a block, and have recently, and we can recover the block. As 75% of this data is pretty much read-only transfer, the only stuff being written to permanent storage is new data. I think we're seeing 120-150 Terabyte of growth a year, and we're looking at new storage since current gear is at the 'EOL'. Life Cycle wise, not warranty or operation.
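For a sense of how soon those growth numbers reach a petabyte, here is a quick projection sketch. It assumes straight-line growth at the rates quoted above, which is my simplification; real imaging growth tends not to stay flat.

```python
# Linear capacity projection (assumes constant yearly growth, which is
# a simplification of the figures quoted above).


def years_until(target_tb: float, current_tb: float,
                growth_tb_per_year: float) -> float:
    """Years until current_tb reaches target_tb at a constant growth rate."""
    return max(0.0, (target_tb - current_tb) / growth_tb_per_year)


if __name__ == "__main__":
    target = 1000  # TB, i.e. 1 PB
    fastest = years_until(target, current_tb=500, growth_tb_per_year=150)
    slowest = years_until(target, current_tb=400, growth_tb_per_year=120)
    print(f"1 PB reached in roughly {fastest:.1f} to {slowest:.1f} years")
```

So on these figures the archive crosses a petabyte somewhere between about three and five years out, well inside the life of whatever gets bought next.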

Point is, will we see a PetaByte storage system bought? Maybe, but it will be the same setup. Archive system, with backup for the 'guts', what I like to call it. Simply put, CXX's don't want to throw the $$ down for Petabyte Data store site duplication. If money were flowing more freely to us, we'd at least start there and implement a 100-150 Terabyte SSD Caching block with 10GB Fiber, in and out. Not happening, but a man can dream...

Re: You don't. by TheMeuge · 2015-07-25 14:22 · Score: 1

Are you telling me you have a petabyte of clinical data with no backups? Good luck with that lawsuit my friend...
Re: You don't. by ihtoit · 2015-07-26 03:39 · Score: 1

that's all right because we have public officials who leave backups on public transport...

--
Political debates have me rolling my eyes so much I think I got optical whiplash. I should sue. - Foamy The Squirrel

look at how backblaze does it by Anonymous Coward · 2015-07-25 07:50 · Score: 1

Backblaze blog has a rundown of their storage pod https://www.backblaze.com/blog/storage-pod-4-5-tweaking-a-proven-design/

This with something like Gluster, Lustre, Ceph or even just NFS.

This is not for Ask Slashdot... by Anonymous Coward · 2015-07-25 07:50 · Score: 0

Using online services (Azure, AWS) you are looking at $5,000 - $10,000 per month for this kind of storage support (500TB). Realize that these businesses are not extremely high margin, so if your budget is orders of magnitude less than this you have an issue.

The disks alone for a completely non-redundant system are around $15,000. At this point, you should absolutely not be using Ask Slashdot for a resource, you are well into the "real" enterprise space and getting information/quotes from established vendors.

My assumption using zero facts would be your storage solution alone will be $150,000 or so, plus a few thousand in maintenance/bandwidth/hosting etc per month.
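To make the cloud-versus-own-hardware comparison concrete, here is a small sketch using the figures in this comment. The $3,000/month on-prem running cost is a placeholder I picked for "a few thousand", and 1 TB is counted as 1000 GB.

```python
# Cloud vs. on-prem arithmetic from the rough figures above.
# ONPREM_MONTHLY is a hypothetical stand-in for "a few thousand".

CAPACITY_TB = 500
ONPREM_UPFRONT = 150_000   # USD, the guessed storage purchase
ONPREM_MONTHLY = 3_000     # USD, assumed maintenance/bandwidth/hosting


def per_gb_month(monthly_usd: float, capacity_tb: float) -> float:
    """Monthly price per GB for a given monthly bill."""
    return monthly_usd / (capacity_tb * 1000)


def breakeven_months(upfront: float, onprem_monthly: float,
                     cloud_monthly: float):
    """Months until buying hardware beats renting; None if it never does."""
    if cloud_monthly <= onprem_monthly:
        return None
    return upfront / (cloud_monthly - onprem_monthly)


if __name__ == "__main__":
    for cloud_monthly in (5_000, 10_000):
        months = breakeven_months(ONPREM_UPFRONT, ONPREM_MONTHLY, cloud_monthly)
        print(f"${cloud_monthly:,}/mo = ${per_gb_month(cloud_monthly, CAPACITY_TB):.2f}"
              f"/GB-month, break-even after {months:.0f} months")
```

With those placeholder numbers the break-even lands anywhere from under two years to over six, so the answer swings heavily on which end of the cloud price range you actually get.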

Check out this older Slashdot story by Anonymous Coward · 2015-07-25 07:52 · Score: 0

http://hardware.slashdot.org/story/09/09/02/138209/build-your-own-28m-petabyte-disk-array-for-117k

Ask the people who are currently storing 150 PB by Anonymous Coward · 2015-07-25 07:53 · Score: 1

Backblaze is an online backup provider. They have open sourced some of their software and hardware designs.

They are currently storing over 150 Petabytes of user data. https://www.backblaze.com/blog/150-petabytes-of-cloud-storage/
They are working on scalability into the Zettabyte range https://www.backblaze.com/blog/vault-cloud-storage-architecture/
They have open sourced their hardware design for anyone to use. https://www.backblaze.com/blog/storage-pod-4-5-tweaking-a-proven-design/

They also looked into using 3rd party vendors but decided that they could build a better solution for at least 1/8 the price. https://www.backblaze.com/blog/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/

I know that it is not a plug and play solution but if you are willing to build off of their work you can save a ton of money and have a solution that truly fits your needs.

Easy by ArcadeMan · 2015-07-25 07:53 · Score: 5, Funny

How Do You Store a Half-Petabyte of Data? (And Back It Up?)

That's the easiest question I've ever seen.

1. Wait about a decade or so.
2. Buy two half-petabyte flash drives.
3. Alternate your copies on the two flash drives, the previous one becomes your backup.

NEXT!

--
Get free satoshi (Bitcoin) and Dogecoins

Use the cloud by Anonymous Coward · 2015-07-25 07:56 · Score: 0

Plenty of options to choose from. Google cloud has the best prices.
https://cloud.google.com/storage/?utm_source=google&utm_medium=cpc&utm_campaign=2015-q2-cloud-na-storage-bkws-freetrial-en&&gclid=Cj0KEQjw58ytBRDMg-HVn4LuqasBEiQAhPkhuh1xdQtfg4Eqt40cJJYA-SI9IoeXst1e861yuLSgYaYaAk9P8P8HAQ

You can just encrypt everything at rest if you're concerned with your data living 'in the cloud'.

easy by YoungManKlaus · 2015-07-25 08:00 · Score: 1

Step 1: buy a metric shitton of storage space (virtual or physical)
Step 2: put your data on it
Step 3: ???
Step 4: profit

What are your budget and reliability requirements? by fishnuts · 2015-07-25 08:01 · Score: 2

If you have a small budget and moderate reliability requirements, I'd suggest looking into building a couple Backblaze-style storage pods for block store (5x 180TB storage systems, apx $9000 each), each exporting 145TB RAID5 volumes via iSCSI to a pair of front-end NAS boxes. NAS boxes could be FreeBSD or Solaris systems offering ZFS filestores (putting multiples of 5 volumes, one from each blockstore, together in RAIDZ sets), which then export these volumes via CIFS or NFS to the clients. Total cost for storage, front-ends, 10GbE NICs and a pair of 10GbE switches: $60K, plus a few weeks to build, provision, and test.
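A rough sanity check of the capacity and cost figures above, as a minimal sketch. It assumes each pod exports one 145TB volume and that the five volumes form a single-parity RAIDZ set; the function name is illustrative and ZFS metadata overhead is ignored.

```python
def raidz_usable_tb(volume_tb, volumes, parity=1):
    """Usable capacity of one RAIDZ set built from equal-sized volumes.

    Ignores ZFS metadata and padding overhead, so real numbers come out lower.
    """
    if volumes <= parity:
        raise ValueError("need more volumes than parity devices")
    return volume_tb * (volumes - parity)

pods = 5
pod_cost_usd = 9000   # approximate per-pod cost quoted in the post
volume_tb = 145       # RAID5 volume exported by each pod

usable_tb = raidz_usable_tb(volume_tb, pods, parity=1)
pod_total_usd = pods * pod_cost_usd

print(usable_tb)      # 580
print(pod_total_usd)  # 45000
```

Under those assumptions the pods account for about $45K of the quoted $60K, leaving roughly $15K for the front-ends, NICs and switches.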

If you have a bigger budget, switch to FibreChannel SANs. I'd suggest a couple HP StorServ 7450s, connected via 8 or 16Gb FC across two fabrics, to your front ends, which aggregate the block storage into ZFS-based NAS systems as above, implementing raidz for redundancy. This would limit storage volumes to 16TB each, but if they're all exposed to the front ends as a giant pool of volumes, then ZFS can centrally manage how they're used. A 7450 filled with 96 4TB drives will provide 260TB of usable volume space (thin or thick provisioned), and cost around $200K-$250K each. Going this route would cost $500-$550K (SANs, plus 8 or 16Gb FC switches, plus fibre interconnects, plus HBAs) but give you extremely reliable and fast block storage.

A couple advantages of using ZFS for the file storage is its ability to migrate data between backing stores when maintenance on underlying storage is required, and its ability to compress its data. For mostly-textual datasets, you can see a 2x to 3x space reduction, with slight cost in speed, depending on your front-ends' CPUs and memory speed. ZFS is also relatively easy to manage on the commandline by someone with intermediate knowledge of SAN/NAS storage management.

Whatever you decide to use for block storage, you're going to want to ensure the front-end filers (managing filestores and exporting as network shares) are set up in an identical active/standby pair. There's lots of free software on linux and freebsd that accomplish this. These front-ends would otherwise be your single-point-of-failure, and can render your data completely unusable and possibly permanently lost if you don't have redundancy in this department.

Amazon S3 with versioning on by Anonymous Coward · 2015-07-25 08:04 · Score: 0

Your requirements aren't clear.

If your only requirement is size and single tree; Amazon S3. ....It is a touch slow and is remote.....But your costs are easy to predict and it's easy to tell each group exactly what their monthly expenses are.
There are some good desktop clients as well as a web client. I've used CyberDuck quite a bit on Mac.
Glacier is pretty cool too for cold data.

The problem with buying a huge san.....
You will plop down tens of thousands of dollars.
It will require constant maintenance
When it's time to replace it, good luck, acquisition is always a total PIA

The ONLY reason you should look at a SAN for bulk data IS SPEED. .....though I suspect someone will make some super lame argument about security.....
If someone tries this, revoke their email access and give them a desktop.
I promise you the bigger security hole is the sloppy user with email access and a laptop in a car.

Amazon is not the weak link in security, you and your staff are (no offense, the same is true where I work).

Two vastly different requirements by Anonymous Coward · 2015-07-25 08:06 · Score: 0

The mixed media storage stuff is fairly conventional, but for the analysis thing you need to work backwards. Data locality is a big problem: just moving that much data between storage and processing is a problem, especially if you will be running it repeatedly. It's impossible to give an answer for storage without knowing what the computational profile is like, but something in the big data space is most likely.

We paid ~$30k for a 24TB array...call a vendor. by InfiniteBlaze · 2015-07-25 08:11 · Score: 1

They'll be happy to talk to you for free, for the prospect of getting their hands on that kind of cash. You're easily looking at $.5M-$1M between storage, processing, and redundancy.

Re:We paid ~$30k for a 24TB array...call a vendor. by Anonymous Coward · 2015-07-26 06:48 · Score: 0

You paid way too much. For $22,000 you can get a 192TB (raw) array with a 3-year warranty from small server sellers. If you build it yourself, you could reduce that price to $14,000 (but good luck on warranties). For $66,000 the requester could get a 576TB array. For $132,000 the requester could duplicate their array. For $198,000 the requester could do some reasonable backups of the original array with a double-sized array.
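The multiples in that reply can be reproduced with a few lines, shown here as a minimal sketch. The $22,000 and 192TB figures come from the post; the helper name is made up for illustration.

```python
array_cost_usd = 22_000   # price per array quoted in the post
array_raw_tb = 192        # raw capacity per array quoted in the post

def arrays_total(n):
    """Return (raw TB, cost in USD) for n identical arrays."""
    return n * array_raw_tb, n * array_cost_usd

cost_per_tb = array_cost_usd / array_raw_tb

for n in (3, 6, 9):
    raw_tb, cost = arrays_total(n)
    print(f"{n} arrays: {raw_tb} TB raw for ${cost:,}")
```

Three arrays give the 576TB primary, six duplicate it, and nine correspond to the $198,000 "double-sized backup" option.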

What's your budget? by Karmashock · 2015-07-25 08:14 · Score: 1

Sounds like you need the storage onsite at least for the research project.

The mixed media thing sounds like something to throw at the cloud unless there's a reason not to do that.

As to spanning volumes etc... I don't really understand the file structure of this research project. Having a petabyte of data in a single directory is typically the opposite of good ideas.

I'd like more information.

As to back ups... it depends on how frequently the information changes. Backup tapes are probably the cheapest way to go for backups of archives. 3 TB at 20 dollars a tape.... not bad. And you can do incremental back ups if there are little changes.

The tapes are supposed to last about 10 years. So that's something.
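For a sense of scale, here is a minimal sketch of the media cost for one full copy, using the 3TB and $20 per tape figures above. It counts cartridges only; drives, a library and a second rotation set are not included.

```python
import math

def tapes_needed(data_tb, tape_tb=3, tape_cost_usd=20):
    """Return (cartridge count, media cost in USD) for one full backup."""
    count = math.ceil(data_tb / tape_tb)
    return count, count * tape_cost_usd

count, cost = tapes_needed(500)
print(count, cost)  # 167 3340
```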

If we're talking about high frequency changes... you almost need to replicate the primary storage... and the number of times you need to do that is variable on how badly you need to not lose the data.

If we're talking about data that if lost orphans are going to get ground up into hamburger and fed to the dogs... you're going to want multiple back ups. If it would merely be annoying... maybe one back up is fine.

--
I've decided to stop wasting my time responding to AC trolls/sockpuppets... so if you want a response from me... login.

IBM GPFS by Anonymous Coward · 2015-07-25 08:16 · Score: 0

GPFS was built for this. Standard file access from any platform. Peta (and beyond) size hierarchical file tree across multiple systems. High availability, file recovery.

https://en.wikipedia.org/wiki/IBM_General_Parallel_File_System

NAS by sega_sai · 2015-07-25 08:16 · Score: 1

We recently bought a NAS server for our group with ~200TB of raw storage (175TB after RAID6 with a good card). It is NFS-mounted on our other servers. It is pretty easy to use and configure and quite cheap (20k UK pounds). Regarding the backup, I would probably just buy a second server (maybe with a cheaper configuration, worse RAID card, etc.).

GlusterFS by Anonymous Coward · 2015-07-25 08:24 · Score: 0

The storage cluster I manage is a bit smaller than yours, but you could look at GlusterFS.
It is created with your requirements and scale in mind:
- Single hierarchy filesystem
- Flexible regarding underlying storage (SAN is possible, commodity hardware is also possible)
- No Single Points Of Failure in your cluster
- Targets the 'several petabytes' scale explicitly

I found GlusterFS extremely easy to setup. After receiving the hardware I had the cluster set-up in half a day (but studied it and tried a test setup before that).

It seems most /. posters are big on recommending commercial support (that is not a bad idea in most situations). Support is available from RedHat if you need that.

Ask the guys at CERN by prefec2 · 2015-07-25 08:24 · Score: 1

You will not get a good answer here, because even if there were one it would be hard to find among all the nonsense.

BTW your scenario is incomplete and therefore unlikely to get a good answer. It looks a little bit like you want /. to do your homework.

Wrong questions. More details needed. by d3vi1 · 2015-07-25 08:27 · Score: 5, Informative

You're not asking the right questions:

The first correct question is why on earth would someone need to access half a petabyte? In most cases the commonly accessed data is less than 1%. That's the amount of data that realistically needs to reside on disk. It never is more than 10% on such a large dataset. Everything else would be better placed on tape. Tiered storage is the answer to the first question. You have RAM, solid/flash storage (PCI based), fast disks, slow high capacity disks and tape. Choose your tiering wisely.

The second question you need to ask is how the customer needs to access that large datastore. In most cases you need serious metadata in parallel with that data. For Petabytes of data you cannot in most cases just use an intelligent tree structure. You need a web-site or an app to search that data and get the required "blob". For such an app you need a large database since you have 5M objects with searchable metadata (at 200MB/blob).

The third question is why do you have SAN as a premise? Do you want to put a clustered filesystem with 5-10 nodes? Probably Isilon or Oracle ZS3-2/ZS4-4 are your answer.

Fourth question: what are the requirements? (How many simultaneous clients? IOPS? Bandwidth? ACL support? Auditing? AD integration? Performance tuning?)

Fifth question: There is no such thing as 100% availability. The term disaster in Disaster Recovery is correctly placed. Set reasonable SLA expectations. If you go for five-nine availability it will triple the cost of the project. Keep in mind that synchronous replication is distance limited. Typically, for a small performance cost, the radius is 150 miles and everything above impacts a lot.
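To put numbers on the SLA and distance points, a minimal sketch follows. The 200,000 km/s signal speed in fiber is a rough rule of thumb and an assumption here; real links add switching and protocol overhead on top.

```python
MINUTES_PER_YEAR = 365 * 24 * 60

def downtime_minutes_per_year(availability):
    """Allowed downtime per year for a given availability fraction."""
    return (1 - availability) * MINUTES_PER_YEAR

def round_trip_ms(distance_miles, fiber_km_per_s=200_000):
    """Best-case round-trip propagation delay over fiber, in milliseconds."""
    km = distance_miles * 1.609344
    return 2 * km / fiber_km_per_s * 1000

print(round(downtime_minutes_per_year(0.99999), 2))  # 5.26
print(round(round_trip_ms(150), 2))                  # 2.41
```

Every synchronously replicated write has to wait at least that round trip, which is why distance shows up directly as write latency.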

Even if you solve the problems above, if you want to share it via NFS/CIFS or something else you're going to run into trouble. Since CIFS was not realistically designed for clustered operation regardless of the distributed FS underneath the CIFS server, you get locking issues. Windows Explorer is a good example since it creates Thumbs.db files, leaves them open and when you want to delete the folder you cannot unless you magically ask the same node that was serving you when it created the Thumbs.db file. Apparently, the POSIX lock is transferred to the other server and stops you from deleting, but when Windows Explorer asks the other node who has the lock on the file you get screwed since the other server doesn't know. POSIX locks are different from Windows locks. It affects all Likewise-based products from EMC (VNX filer, Isilon, etc.) and it also affects the CIFS product from NetApp. I'm not sure about Samba CTDB though.
I would design a storage system based on ZFS for the main tiers, exported via NFSv4 to the front-end nodes, and have QFS on top of the whole thing in order to push rarely accessed data to tape. The front-end nodes would be accessed via WebDAV by a portal in which you can also query the metadata with a serious DB behind it.
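As a toy illustration of the "searchable metadata" idea, here is a minimal sketch using SQLite from the Python standard library. The table layout, paths and project names are all hypothetical; a real deployment at this scale would use a server-grade database, and the point is only that clients query the catalog and then fetch the blob by path.

```python
import sqlite3

# Object count implied by 200MB blobs: a full petabyte gives the 5M quoted above.
PETABYTE = 10**15
BLOB_BYTES = 200 * 10**6
objects_per_pb = PETABYTE // BLOB_BYTES

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE blobs (
           id INTEGER PRIMARY KEY,
           path TEXT NOT NULL,
           size_bytes INTEGER NOT NULL,
           project TEXT NOT NULL,
           created TEXT NOT NULL
       )"""
)
# Index the columns users are expected to search on.
conn.execute("CREATE INDEX idx_blobs_project ON blobs (project, created)")

rows = [
    ("/store/genomics/run-0001.bin", BLOB_BYTES, "genomics", "2015-07-01"),
    ("/store/genomics/run-0002.bin", BLOB_BYTES, "genomics", "2015-07-15"),
    ("/store/media/clip-0001.bin", BLOB_BYTES, "media", "2015-07-20"),
]
conn.executemany(
    "INSERT INTO blobs (path, size_bytes, project, created) VALUES (?, ?, ?, ?)",
    rows,
)

def find_blobs(project, since):
    """Return blob paths for a project created on or after an ISO date."""
    cur = conn.execute(
        "SELECT path FROM blobs WHERE project = ? AND created >= ? ORDER BY created",
        (project, since),
    )
    return [row[0] for row in cur]

print(objects_per_pb)
print(find_blobs("genomics", "2015-07-10"))
```

Note that half a petabyte at 200MB per blob would be about 2.5M objects, so the 5M figure corresponds to a full petabyte.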

I've installed Isilon storage for 6000 XenDesktop clients that all log on at 9AM, I've worked on an SL8500, Exadata, various NetApp and Sun storage systems and I can tell you that you need to do a study. Run simulations with commodity hardware on smaller datasets to figure out the performance requirements and optimal access method (NAS, Web, etc.). Extrapolate the numbers, double them and ask for POCs and demos from vendors, be it IBM, EMC, Oracle, NetApp or HP. Make sure that in the future, when you need 2PB, you can expand in an affordable manner. Take care since vendors like IBM tend to use the least upgradable solution. They will do a demo with something that can hold 0.6PB in its max configuration, and if you need to go larger you'll need a brand new solution from another vendor.

It's not worth doing it yourself since it will be time-consuming (at least 500 man-hours until production) and will need at least one full-time employee for the storage. But if you must, look at Nexenta and the hardware that they recommend.

And remember to test DR failover scenarios.

Good luck!

--
UNIX was not designed to stop you from doing stupid things, because that would also stop you from doing clever ones.

Re:Wrong questions. More details needed. by radish · 2015-07-25 11:12 · Score: 1

The first correct question is why on earth would someone need to access half a petabyte? In most cases the commonly accessed data is less than 1%. That's the amount of data that realistically needs to reside on disk. It never is more than 10% on such a large dataset.

Never say never. We have data sets several times larger than that which are 100% always online due to client access patterns. Not only online, but extremely latency critical. And I personally could name a dozen other companies with similar requirements.

--
---- Den ene knappen er powerknapp, den andre er Bender voice knapp "Bite My Shiny Metal Ass"
Re:Wrong questions. More details needed. by Professor+Paradox · 2015-07-26 02:33 · Score: 1

This is definitely the best post so far. Sending out requirements to different vendors will just get you a vendor-specific answer. If you ask a DBA how to store that much data they will give you an answer that explains how MSSQL could handle that, and then they would talk about backup snapshots, and you would be stuck with SQL as the client access.
I want to reject the premise of your request: are you really responsible for managing the data of these two other groups? It seems like in the past you have owned the storage for other internal teams, but now the time has come for them to start doing this themselves. Option 1: you own the service that does this, you don't pay attention to limits and anything like that, and you provide an SLA to groups that want to use your service. This is probably what you are currently doing. Some teams may be unhappy with that service because it doesn't quite fit their needs. Option 2: each team that wants something different should manage it themselves. Where a filesystem may be what one team needs, perhaps a different team wants MongoDB shards.
Monoliths are evil, and trying to maintain petabytes of data in one place is not a good solution. It's easier for two teams to maintain and own their own terabyte storage solutions that solve their own problems than to have you try to mediate and come up with the solution yourself.

SAN is out. by TheHawke · 2015-07-25 08:27 · Score: 1

Library storage sounds like it may be your best choice. Several high-end vendors sell such systems and may need to have RFS and RFQs submitted, not to mention seeing the systems in action. This is not going to be cheap, but it's the best long-term investment. Ensure that it is scalable and can handle any future expansion without investing in a whole new kit, or that will simply put your department back to square one.

--
First rule of holes; When in one, stop digging.

SAN, etc... by jbolden · 2015-07-25 08:33 · Score: 1

On a SAN the 16TB limit generally comes from 32-bit SANs; the 64-bit SANs wouldn't have it. Plenty of SAN solutions can handle 500TB or 10x that much. So just upgrade. If you only want backup there are plenty of hardware backup devices that handle this. For example exagrid scales to, I believe, 300TB/hr, let alone 500TB total. This isn't gigantic in today's world. You just need to have a conversation with your vendor, or an agent. You aren't asking for anything abnormal or challenging.
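One common way a 16TB ceiling arises is a 32-bit block address combined with 4KiB blocks, sketched below. Whether the asker's SAN works this way is an assumption; the arithmetic only shows why the number 16 tends to appear.

```python
def max_volume_tib(address_bits, block_bytes):
    """Largest addressable volume in TiB for a given block address width."""
    return (2 ** address_bits) * block_bytes / 2 ** 40

print(max_volume_tib(32, 4096))  # 16.0
print(max_volume_tib(32, 512))   # 2.0
```

With 64-bit block addresses the same calculation gives a limit far beyond any array you could buy.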

Isilon by Anonymous Coward · 2015-07-25 08:36 · Score: 0

I manage 6 Isilon clusters, 4 of them have 1.2PB. Today with 20 x X410 nodes you can have 1.2PB and it's scalable, fast and can be backed up easily. Of course the price is high but the solution works perfectly.

Enterprise SAN by Anonymous Coward · 2015-07-25 08:39 · Score: 0

There are many Enterprise SANs that can support that size. Dell Compellents have a maximum LUN size of 10PB for instance.

But restore ... by Ungrounded+Lightning · 2015-07-25 08:53 · Score: 2

Just put "bomb" and "assassinate" in every line. ... It's all going to get backed up.

But getting them to restore it after it's gotten lost or corrupted is difficult.

--
Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way

Re:But restore ... by Anonymous Coward · 2015-07-25 15:23 · Score: 0

If you include the name of a president in your data, you will get to see your backups presented as evidence in court.
Re:But restore ... by Hognoxious · 2015-07-26 00:10 · Score: 1

They still apply that bit about disclosing the nature and cause of the accusation, do they?
That's cute.

--
Confucius say, "Find worm in apple - bad. Find half a worm - worse."

What are your IOPS and throughput requirements? by DamnStupidElf · 2015-07-25 08:59 · Score: 2

For high throughput/IOPS requirements build a Lustre/Ceph/etc. cluster and mount the cluster filesystems directly on as many clients as possible. You'll have to set up gateway machines for CIFS/NFS clients that can't directly talk to the cluster, so figure out how much throughput those clients will need and build appropriate gateway boxes and hook them to the cluster. Sizing for performance depends on the type of workload, so start getting disk activity profiles and stats from any existing storage NOW to figure out what typical workloads look like. Data analysis before purchasing is your best friend.

If the IOPS and throughput requirements are especially low (guaranteed < 50 random IOPS [for RAID/background process/degraded-or-rebuilding-array overhead] per spindle and what a couple 10gbps ethernet ports can handle, over the entire lifetime of the system) then you can probably get away with just some SAS cards attached to SAS hotplug drive shelves and building one big FreeBSD ZFS box. Use two mirrored vdevs per pool (RAID10-alike) for the higher-IOPS processing group and RAIDZ2 or RAIDZ3 with ~15 disk vdevs for the archiving group to save on disk costs.
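The capacity trade-off between the two layouts can be compared with a small sketch. The 60-disk, 4TB example is hypothetical, and ZFS metadata and padding overhead are ignored.

```python
def mirror_usable_tb(disks, disk_tb, way=2):
    """Usable TB when disks are grouped into n-way mirror vdevs."""
    return (disks // way) * disk_tb

def raidz_usable_tb(disks, disk_tb, vdev_width, parity):
    """Usable TB when disks are grouped into RAIDZ vdevs of a fixed width."""
    vdevs = disks // vdev_width
    return vdevs * (vdev_width - parity) * disk_tb

disks, disk_tb = 60, 4  # hypothetical shelf population
print(mirror_usable_tb(disks, disk_tb))                       # 120
print(raidz_usable_tb(disks, disk_tb, vdev_width=15, parity=2))  # 208
print(raidz_usable_tb(disks, disk_tb, vdev_width=15, parity=3))  # 192
```

The mirrors cost capacity but give many more vdevs to spread random I/O across, which is the reason to reserve them for the higher-IOPS group.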

Plan for 100% more growth in the first year than anyone says they need (shiny new storage always attracts new usage). Buy server hardware capable of 3 to 5 years of growth; be sure your SAS cards and arrays will scale that high if you go with one big storage box.

Buy a Storage Pod by Areyoukiddingme · 2015-07-25 09:00 · Score: 3, Informative

Buy Storage Pods, designed by BackBlaze. You can get 270TB of raw storage in 4U of rackspace for $0.051 per gigabyte. Total cost for half a petabyte of raw storage: $27,686. To back it all up cheaply but relatively effectively, buy a second set to use as a mirror. $55,372. For use with off-the-shelf software (FreeNAS running ZFS or Linux running md RAID) to present a unified filesystem that won't self-destruct when a single drive fails, you'll need to over-provision enough to store parity data. Go big or go home. Just buy another pod for each of the primary and the backup sets. Total of 6 pods with 1620TB of raw storage: $83,058. Some assembly required. And 24U of rackspace required, with power and cooling and 10GbE Ethernet and UPSs (another 4-8U of rackspace).
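The totals above are consistent with a per-pod price of $13,843, which is just the quoted $27,686 divided by two. A minimal sketch that rebuilds the figures from that derived number:

```python
POD_RAW_TB = 270
POD_COST_USD = 13_843   # derived from the post's total: 27,686 / 2

def pods_total(n):
    """Return (raw TB, cost in USD) for n pods."""
    return n * POD_RAW_TB, n * POD_COST_USD

cost_per_gb = POD_COST_USD / (POD_RAW_TB * 1000)

print(pods_total(2))          # (540, 27686)
print(pods_total(4))          # (1080, 55372)
print(pods_total(6))          # (1620, 83058)
print(round(cost_per_gb, 3))  # 0.051
```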

Expect a ballpark price of something a little under $100,000 that will meet your storage requirements with sufficient availability and redundancy to keep people happy. It will require 2 racks of space, and regular care and feeding. Do the care and feeding in house. A support contract where you pay some asshole tens of thousands of dollars a year to show up and swap drives for you is a waste of money. Bearing that in mind, as other posters have said, talk to storage vendors selling turnkey solutions. Come armed with these numbers. When they bid $1 million, laugh in their faces. But there's an outside chance you'll find a vendor with a price that is something less than hyperinflated. Stranger things have happened.

If you don't generate data very quickly, you can ease into it. For around $35,000, you can start with just 2 pods and the surrounding infrastructure, and add pods in pairs as necessary to accommodate data growth. Add $27,000 in 2 chassis next year to double your space. Add $26,000 of space again in 2017 and increase your raw capacity another 50%. (Total storage cost using BackBlaze-inspired pods is dominated by hard drive prices, which trend downwards.) When you find out your users underestimated growth, another $25,000 of space in 2018 takes you to somewhere in the neighborhood of 2 petabytes of raw storage, that you're using with double parity and 100% mirrored backup for a total effective useable space of approximately 918TB. You'll be replacing 2-3 drives per year, starting out, and 0-1 after infant mortality has run its course. Keep extras in a drawer and do it yourself in half an hour each on a Friday night. If you configured ZFS with reasonably sized vdevs, (3-5 devices) the array rebuild should be done by Monday morning. By 2020, you'll be back up to replacing 2-3 drives per year again as you climb the far side of the bathtub curve. While you're at it, you can seriously consider replacing whole vdevs with larger capacity drives, so your total useable space can start to creep up over time, without buying new chassis. By 2025, you will have 8 chassis in two racks hosting 2.88PB of raw storage space that's young and vital and low maintenance, having spent roughly $200,000.

A bargain, really.

Re:Buy a Storage Pod by Anonymous Coward · 2015-07-26 04:36 · Score: 0

As has been mentioned by other posters, MIRRORING IS NOT BACKUP.
REDUNDANCY and protection against a drive failure via RAID does NOT protect you against file corruption, accidental deletions, lost version history, etc.
The easiest example: imagine you run your OS off a RAID system.
You then patch the OS due to a security vulnerability.
The patch fails and your OS cannot boot.
What the hell do you do? Your RAID mirror has done squat; it's just replicated the failed patch across both volumes. What do you restore from?
Everything else in your post is good though, but treating redundancy as if it were backup is a fundamental misunderstanding.

Roll it yourself but take responsibility by maraist · 2015-07-25 09:16 · Score: 1

Super-Micro has 36 and 72 drive racks that aren't horrible human effort wise (you can get 90 drive racks, but I wouldn't recommend it). You COULD get 8TB drives for like 9.5 cents / GB (including the $10k 4U chassis overhead). 4TB drives will be more practical for rebuilds (and performance), but will push you to near 11c / GB. You can go with 1TB or even 1/2TB drives for performance (and faster rebuilds), but now you're up to 35c / GB.

That's roughly 288TB of RAW for say $30k 4U. If you need 1/2 PB, I'd say spec out 1.5PB - thus you're at $175K .. $200k.. But you can grow into it.
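A minimal sketch of the sizing above. Reading the 288TB figure as 72 bays of 4TB drives is an assumption that happens to match the number; the $30k per chassis is the post's estimate.

```python
import math

def chassis_needed(target_tb, bays=72, drive_tb=4, chassis_cost_usd=30_000):
    """Return (chassis count, raw TB, cost in USD) to reach a raw capacity target."""
    raw_per_chassis = bays * drive_tb
    count = math.ceil(target_tb / raw_per_chassis)
    return count, count * raw_per_chassis, count * chassis_cost_usd

print(chassis_needed(1500))  # (6, 1728, 180000)
```

Six chassis land at $180k, inside the $175K to $200k range quoted.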

Note this is for ARCHIVE, as you're not going to get any real performance out of it.. Not enough CPU to disk ratio.. Not even sure if the MB can saturate a 40Gbps QSFP links and $30k switch. That's kind of why hadoop with cheap 1CPU + 4 direct-attached HDs are so popular.

At that size, I wouldn't recommend just RAID-1ing, LVMing, ext4ing (or btrfsing) then n-way foldering, then nfs mounting... Since you have problems when hosts go down and keeping any of the network from stalling / timing out.

Note, you don't want to 'back-up' this kind of system.. You need point-in-time snapshots.. And MAYBE periodic write-to-tape.. Copying is out of the question, so you just need a file-system that doesn't let you corrupt your data. DEFINITELY data has to replicate across multiple machines - you MUST assume hardware failure.

The problem is going to be partial network down-time, crashes, or stalls, and regularly replacing failed drives.. This kind of network is defined by how well it performs when 1/3 of your disks are in 1-week-long rebuild periods. Some systems (like HDFS) don't care about hardware failure.. There's no rebuild, just a constant sea of scheduled migration-of-data.

If you only ever schedule temporary bursts of 80% capacity (probably even too high), and have a system that only consumes 50% of disk-IO to rebuild, then a 4TB disk would take 12 hours to re-replicate. If you have an intelligent system (EMC, netapp, ddn, hdf, etc), you could get that down to 2 hours per disk (due to cross rebuilding).
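The 12-hour and 2-hour figures can be reproduced with a crude model, sketched below. The 185 MB/s sustained disk rate and the six-way parallel rebuild are assumptions chosen to match the post's numbers, not measurements.

```python
def rebuild_hours(disk_tb, disk_mb_per_s, io_fraction, parallel=1):
    """Hours to re-replicate one disk's worth of data.

    disk_mb_per_s: sustained throughput of one disk (assumed value).
    io_fraction:   share of disk I/O the rebuild is allowed to use.
    parallel:      number of disks sharing the rebuild work.
    """
    bytes_total = disk_tb * 1e12
    rate = disk_mb_per_s * 1e6 * io_fraction * parallel
    return bytes_total / rate / 3600

print(round(rebuild_hours(4, 185, 0.5), 1))              # 12.0
print(round(rebuild_hours(4, 185, 0.5, parallel=6), 1))  # 2.0
```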

I'm a big fan of object-file-systems (generally HTTP based).. That'll work well with the 3-way redundancy. You can typically fake out a POSIX-like file-system with fusefs.. You could even emulate CIFS or NFS. It's not going to be as responsive (high latency). Think S3.

There are also "experimental" POSIX systems like Ceph, GPFS, Lustre. Very easy to screw up if you don't know what you're doing. And really painful to re-format after you've learned it's not tuned for your use-case.

HDFS will work - but it's mostly for running jobs on the data.

There's also AFS.

If you can afford it, there are commercial systems to do exactly what you want, but you'll need to triple the cost again. Just don't expect a fault-tolerant multi-host storage solution to be as fast as even a dedicated laptop drive. Remember when testing.. You're not going to be the only one using the system... Benchmarks perform very differently when under disk-recovery or random-scatter-shot load by random elements of the system - including copying-in all that data.

--
-Michael

Anything is possible with the right budget... by emag · 2015-07-25 09:20 · Score: 3, Informative

Lucky (?) for you, I just went through purchasing a storage refresh for a cluster, as we're planning to move to a new building and no one trusts the current 5 year old solution to survive the move (besides which, we can only get 2nd hand replacements now). The current system is 8 shelves of Panasas ActiveStor 12, mostly 4 TB blades, but the original 2-3 shelves are 2 TB blades, giving about 270 TB raw storage, or about 235ish TB in real use. The current largest volume is about 100 TB in size, the next-largest is about 65 TB, with the remainder spread among 5-6 additional volumes including a cluster-wide scratch space. Most of the data is genomic sequences and references, either downloaded from public sources or generated in labs and sent to us for analysis.

As for the replacement...

I tried to get a quote from EMC. Aside from being contacted by someone *not* in the sector we're in, they also managed to misread their own online form and assumed that we wanted something at the opposite end of the spectrum from what I requested info on. After a bit of back and forth, and a promise to receive a call that never materialized, I never did get a quote. My assumption is they knew from our budget that we'd never be able to afford the capacities we were looking for. At a prior job, a multi-million dollar new data center and quasi-DR site went with EMC Isilon and some VPX stuff for VM storage/migration/replication between old/new DCs, and while I wasn't directly involved with it there, I had no complaints. If you can afford it, it's probably worth it.

The same prior job had briefly, before my time there, used some NetApp appliances. The reactions of the storage admins weren't all that great, and throughout the 6 years I was there, we never could get NetApp to come in to talk to us whenever we were looking for expansion of our storage. I've had colleagues swear by NetApp though, so YMMV.

I briefly looked at the offerings from Overland Storage (where we got our current tape libraries), on the recommendation of the VAR we use for tapes & library upgrades. It looked promising, but in the end, we'd made a decision before we got most of those materials...

What we ended up going with was Panasas, again. Part of it was familiarity. Part of it was their incredible tech support even when the AS12 didn't have a support contract (we have a 1 shelf AS14 at our other location for a highly specialized cluster, so we had *some* support, and my boss has a golden tongue, talking them into a 1-time support case for the 8 shelf AS12). We also have a good relationship with the sales rep for our sector, the prior one actually hooked us up with another customer to acquire shelves 6-8 (and 3 spares), as this customer was upgrading to a newer model. Based on that, we felt comfortable going with the same vendor. We knew our budget, and got quotes for three configurations of their current models, ActiveStor 14 & 16. We ended up with the AS16, with 8 shelves of 6 TB disk (x2) and 240 GB SSD per blade (10 per, plus a "Director Blade" per). Approximate raw storage is just a bit under 1 PB (roughly 970-980 TB raw for the system).
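The "just under 1 PB" figure checks out if "10 per" is read as ten storage blades per shelf, which is an assumption about the wording. A minimal sketch of the arithmetic:

```python
shelves = 8
blades_per_shelf = 10     # assumed reading of "10 per"
hdd_gb_per_blade = 2 * 6000
ssd_gb_per_blade = 240

blades = shelves * blades_per_shelf
hdd_tb = blades * hdd_gb_per_blade / 1000
total_tb = blades * (hdd_gb_per_blade + ssd_gb_per_blade) / 1000

print(hdd_tb)    # 960.0
print(total_tb)  # 979.2
```

Both values sit in or near the 970-980 TB range given above, depending on whether the SSDs are counted.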

In terms of physical specs, each shelf is 4U, has dual 10 GbE connections, and adding additional shelves is as easy as racking them and joining them to the existing array (I literally had no idea what I was doing when we added shelves on the current AS12, it just worked as they powered on). Depending on your environment, they'll support NFS, CIFS, and their own PanFS (basically pNFS) through a driver (or Linux kernel module, in our case). We're snowflakes, so we can't take advantage of their "phone home" system to report issues proactively and download updates (pretty much all vendors have this feature now). Updating manually is a little more time-consuming, but still possible.

As for backups, I honestly have no idea what I'm going to do. Most data, once written, is static in our environment, so I can probably get away with infrequent longer retention period backups for every

--
"The urge to save humanity is almost always a false front for the urge to rule." --H.L. Mencken

IBM GPFS by Anonymous Coward · 2015-07-25 09:35 · Score: 0

Or "spectrum scale" as it is called now, with TSM for backup if you can't afford a second disk replica copy.

Hadoop by Anonymous Coward · 2015-07-25 09:35 · Score: 0

Apache's Hadoop
Just distribute your data all around.

Tape for backup by Crashmarik · 2015-07-25 09:38 · Score: 1

One of these will do you well
https://en.wikipedia.org/wiki/...

For storage that's trickier. You probably need to characterize your usage before you talk to a vendor otherwise they will oversell you into oblivion.

So simple by Anonymous Coward · 2015-07-25 09:39 · Score: 0

Get 1000 WD Black 1TB HDs. Put EXT2 on them.

Build a PHP front end for clients.

Done.

A large cluster... by quonsar · 2015-07-25 09:44 · Score: 1

...of Windows10 boxes!

--

Sacred cows make the best burgers.

EMC Data Domain by Anonymous Coward · 2015-07-25 10:01 · Score: 0

Hands-down, the EMC Data Domain is the best option for backing up such a large amount of data.

EMC Isilon by dave562 · 2015-07-25 10:03 · Score: 1

Where I work, we are running EMC's Isilon platform. We have ~4PB of data replicated between two data centers.

The platform supports the traditional CIFS/SMB and NFS for client connectivity.

It also has Hadoop support (HDFS). The great thing about the HDFS support is that you do not have to spin a separate file system for it. The same files that your clients access via CIFS or NFS can be accessed via HDFS. Isilon was built with Hadoop in mind and the Isilon nodes act as Hadoop "compute nodes".

The OneFS file system presents a practically unlimited in size, single file system. There are some interesting tuning options that can be leveraged depending on your data type and IO patterns. If you need to get REALLY crazy, the system has support for tiering data based on a whole slew of different factors (last accessed date, file date, file size... basically any file metadata attribute you can think of can be used for tiering purposes).

This probably does not matter for you, but the system also supports AES256 at-rest encryption. We deal with a lot of financial and other highly sensitive data for clients that demand at-rest encryption, so that was a must have for us.

The only downside is that since it is from EMC, you can plan on paying through the nose for it. (But never pay full retail for EMC, ever. Threaten them with NetApp if you have to. ;) )

We still leverage a SpectraLogic tape library to archive data off of the system. With a moderately specced NetBackup system we get a consistent ~35000kb/s restore rate off of a single drive. That lets us provide reasonable RTOs back to the business.
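To see what that restore rate means for an RTO, here is a minimal sketch. It assumes "kb/s" means kilobytes per second (so about 35 MB/s per drive) and that throughput scales linearly with the number of drives, which real libraries only approximate.

```python
def restore_hours(data_tb, kb_per_s=35_000, drives=1):
    """Hours to restore data_tb from tape at a per-drive rate in kilobytes/s."""
    seconds = data_tb * 1e12 / (kb_per_s * 1e3 * drives)
    return seconds / 3600

print(round(restore_hours(1), 1))            # 7.9
print(round(restore_hours(500, drives=4)))   # 992
```

At that rate a single terabyte takes most of a working day, so a full restore of a half-petabyte archive is a matter of weeks even with several drives.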

On the subject of backup, another great thing about Isilon is that you can dedicate certain nodes to specific tasks. In the Isilon architecture, the NL nodes are the slowest nodes that they have. We leverage those for backup to keep the network IO off of the faster X and S-nodes.

Google or Amazon cloud. by Anonymous Coward · 2015-07-25 10:07 · Score: 0

No way that you should roll your own at this point in time. The future is all clouds all the time. Be on the leading edge instead of the trailing.

That's it? by guruevi · 2015-07-25 10:12 · Score: 4, Informative

500TB is nothing these days. You can easily buy any system and it will support it. Look at FreeBSD/FreeNAS with ZFS (or their commercial counterpart by iXSystems). If you want to have an extremely comfortable, commercial setup, go Nexenta or with a bit of elbow grease, use the open/free counterpart OpenIndiana (Solaris based).

You can build 2 systems (I personally have 3, 1 with SAS in Striped-Mirrors, 1 with Enterprise-SATA in RAIDZ2 and 1 with Desktop-SATA in RAIDZ2) and have ZFS snapshots every minute/hour/day replicated across the network for backups, both Nexenta and FreeNAS have that right in the GUI. The primary system also has a mirrored head node which can take over in less than 10s. As far as sharing out the data: AFP/SMB/NFS/iSCSI/WebDAV etc. whatever you need to build up on it.
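
For anyone doing this outside the GUI, snapshot replication boils down to an incremental `zfs send` piped into `zfs receive` on the backup box. A minimal sketch that only builds the command strings (it does not run them); the dataset, snapshot and host names are hypothetical, and this is an assumption about the general approach rather than what Nexenta or FreeNAS run internally.

```python
import shlex


def replication_commands(dataset, prev_snap, new_snap, backup_host, backup_dataset):
    """Build the shell commands for one incremental ZFS replication step."""
    new_full = shlex.quote(f"{dataset}@{new_snap}")
    prev_full = shlex.quote(f"{dataset}@{prev_snap}")
    # 1. Take the new snapshot on the primary.
    snapshot = f"zfs snapshot {new_full}"
    # 2. Send only the blocks changed since the previous snapshot and
    #    receive them on the backup host over ssh.
    send = f"zfs send -i {prev_full} {new_full}"
    receive = f"ssh {shlex.quote(backup_host)} zfs receive -F {shlex.quote(backup_dataset)}"
    return [snapshot, f"{send} | {receive}"]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for cmd in replication_commands("tank/data", "hourly-01", "hourly-02",
                                    "backup1", "backup/data"):
        print(cmd)
```

Because only the delta between two snapshots crosses the network, frequent snapshots stay cheap even on a very large pool.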

My system is continuously snapshotted to its primary backup so that in case of extreme failure (which has not happened in the 7 years since I've built this system) I can run from the primary backup until the primary has been restored with perhaps a few seconds of data loss (don't know if that's acceptable to you but in my case it's not a problem in case we do have a full meltdown).

Where are those systems limited to 16TB? I wouldn't touch them with a 10-foot pole because they're running behind (within a few years a single hard drive will surpass that limit).

--
Custom electronics and digital signage for your business: www.evcircuits.com

Backblaze Storage Pod? by im_thatoneguy · 2015-07-25 10:13 · Score: 2

What are your performance requirements? If you just need a giant dump of semi-offline storage then look into building a Backblaze Storage Pod.
https://www.backblaze.com/blog...

For about $30,000 you could build four storage pods. Speed would not be terrific. Backups are handled through RAID. If you want faster, more redundant or fully serviced your next step up in price is probably a $300,000 NAS solution. Which might serve you better anyway.

Re:Backblaze Storage Pod? by Anonymous Coward · 2015-07-25 10:39 · Score: 0

RAID is not backup.

file system by Anonymous Coward · 2015-07-25 10:15 · Score: 0

Whatever hardware you buy, get Veritas Volume Manager (or whatever new name it's under)

then splitting off a copy for backup / test / snap shot / duplication will be easy and reliable.

And you will have a leg up on disaster recovery offsite.

Plus you won't have an expensive hardware vendor dependency (EMC, Netapp, etc)
You can get those guys to bid against each other every time you need storage.

Your sysadmin will have more time to solve your other problems !
I know I loved this product, it paid for itself every time we did any data migration.
Turned a difficult-to-manage off-hours project into a "the DBA & I will do it in the background this week without interruption" job.

Use Amazon S3 storage with glacier archival by xavierpayne · 2015-07-25 10:16 · Score: 1

Use Amazon S3 storage (gives you cloud storage with a directory tree).

Accessible via desktop apps or even web browser if you want.

For stuff they want to archive but will rarely ever use have those S3 folders archive to Glacier.

Nothing to back up and you can store petabytes in Glacier cheaper than any other option on the planet. :)

Re:Use Amazon S3 storage with glacier archival by Anonymous Coward · 2015-07-25 17:06 · Score: 1

Are you kidding? Amazon S3 is ~0.03 per gigabyte PER MONTH (even up to half a PB they're like ~0.028 per gig PER MONTH). It only takes a quick scroll through some of the solutions on this thread to find ones that get you to ~5-15 cents per gig FOREVER (and cheaper in the future as prices fall).
In other words, Amazon S3 is cheaper to start with, but that "cheaper" only lasts like 2 months, then it keeps costing you more than your entire solution would've cost you on premises. After a year, there's no question that Amazon is WAY more expensive. And you can't process that data, unless you pay Amazon for the computing resources. An in-house Hadoop cluster would provide both storage AND compute.
True, there's a lot less headache with amazon, but it's definitely not cheap (not to mention you'll be paying by gigabyte to get the data out one day).
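
The break-even claim is easy to check. A rough sketch using only the per-gigabyte figures quoted above; it deliberately ignores power, staff, hardware refresh and egress fees, so treat it as a lower bound on the comparison rather than a full cost model.

```python
def monthly_cloud_cost(data_gb: float, per_gb_month: float) -> float:
    """Recurring monthly storage bill."""
    return data_gb * per_gb_month


def breakeven_months(data_gb: float, per_gb_month: float, onprem_per_gb_once: float) -> float:
    """Months after which cumulative cloud spend passes the one-time on-prem cost."""
    onprem_total = data_gb * onprem_per_gb_once
    return onprem_total / monthly_cloud_cost(data_gb, per_gb_month)


if __name__ == "__main__":
    data_gb = 500_000  # half a petabyte, decimal units
    print(f"Monthly bill at $0.03/GB: ${monthly_cloud_cost(data_gb, 0.03):,.0f}")
    for onprem in (0.05, 0.15):  # the 5-15 cents per gig range from the post
        months = breakeven_months(data_gb, 0.03, onprem)
        print(f"On-prem at ${onprem:.2f}/GB: break-even after {months:.1f} months")
```

With those inputs the cloud option stays cheaper for somewhere between about two and five months, which matches the "only lasts like 2 months" estimate at the low end of the on-prem price range.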
Re:Use Amazon S3 storage with glacier archival by Anonymous Coward · 2015-07-25 22:54 · Score: 0

Have you checked the retrieval costs for getting objects from Glacier to S3? If you had, you wouldn't suggest they use Glacier!
Re:Use Amazon S3 storage with glacier archival by im_thatoneguy · 2015-07-26 17:29 · Score: 1

Not to mention bandwidth. How are you going to move 500TB to the cloud and back in a reasonable time frame? You're looking at several months even over a gigabit connection.
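
A quick sanity check on that estimate, as a sketch. It assumes decimal units and a link that can be kept saturated; the `efficiency` parameter is a hypothetical knob for protocol overhead and contention.

```python
def transfer_days(data_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Days needed to push data_tb terabytes through a link_gbps link, one way."""
    bits = data_tb * 1e12 * 8                      # terabytes -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 86_400


if __name__ == "__main__":
    one_way = transfer_days(500, 1)
    print(f"500 TB over 1 Gbit/s, one way: {one_way:.0f} days")
    print(f"There and back: {2 * one_way:.0f} days")
    print(f"At 70% efficiency, one way: {transfer_days(500, 1, 0.7):.0f} days")
```

At full line rate one direction takes about a month and a half, so a round trip is roughly three months before any real-world overhead.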

Object Storage by Anonymous Coward · 2015-07-25 10:24 · Score: 0

Here's an option.

http://www.nasuni.com/solutions/scale-out-storage/

"Whether the system is 10TB, 100TB or 10PB, it is available through every Filer "

half a petabyte - is that all ? by Anonymous Coward · 2015-07-25 10:32 · Score: 0

You can do that with a single server with a rack full of drive expansion bays and SAS expanders.

Probably a fraction the cost of a NAS solution as well.

And if you need a backup, just replicate it to another box of exactly the same spec in a different building on the same site, hook them up with 40Gbit of bandwidth and you can replicate to your hearts content.

Simplez.

Seriously by Anonymous Coward · 2015-07-25 10:57 · Score: 0

If you need to ask HERE about THIS kind of install, you are the wrong guy to be handling this. Seriously.

What's next when you have the system? You come back with questions about tuning and operations? You want credit for the work without doing the work.

Depends on what you need to do with it by radish · 2015-07-25 11:07 · Score: 1

Where I work we deal with data sets of a similar order. However, different data sets are stored differently depending on need. For online relational data where performance is critical, it's in master/slave/backup DB clusters running with 4.8TB PCIe SSDs. The backups are taken from a slave node and stored locally, plus they're pushed offsite. No tape, if we need a restore we can't really wait that long.

For data we can afford to access more slowly we use large HDFS clusters with regular SATA discs. There's a level of redundancy built in there, and where data is important enough to need a real backup (much of it is not) it is also pushed offsite. The HDFS approach has the advantage of presenting as a very large filesystem, and obviously if you're running hadoop against it there's an automatic advantage.
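
The built-in redundancy has a raw-capacity cost worth budgeting for. A small sketch, assuming the HDFS default replication factor of 3 and a hypothetical 25% free-space headroom (the post does not state either figure for this cluster).

```python
def raw_capacity_tb(usable_tb: float, replication: int = 3, headroom: float = 0.25) -> float:
    """Raw disk needed to hold usable_tb of data.

    replication: copies kept of every block (3 is the HDFS default).
    headroom: fraction of raw space kept free (hypothetical planning figure).
    """
    return usable_tb * replication / (1 - headroom)


if __name__ == "__main__":
    print(f"500 TB usable needs about {raw_capacity_tb(500):,.0f} TB raw")
```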

--

---- Den ene knappen er powerknapp, den andre er Bender voice knapp "Bite My Shiny Metal Ass"

From someone who's bought this much storage... by rockmuelle · 2015-07-25 11:17 · Score: 1

While I agree with most commenters that you need to supply many more details before even beginning to narrow the options, if you do look at the storage vendors, DDN (Data Direct Networks) is really hard to beat.

I see the EMC Isilon guys posting here and need to counter. :) They are overpriced and underpowered for almost every application. Their strength is typical enterprise environments - lots of small files accessed via NFS and "enterprise" SLAs. That's almost always the wrong solution for big data applications (NFS is terrible for big data). EMC Isilon sold a lot of storage into my space (gene sequencing) and very few customers are happy, especially when they find out what the other vendors could do.

I've organized bake-offs between DDN, Isilon, and a number of other vendors. DDN always came out ahead on price and performance (every time they were half the price and twice the speed of Isilon). DDN is the most represented of the vendors on the Top 500 Supercomputing list and also powers a certain streaming movie/TV service we all know and love. DDN is also pretty ethical - if they're a bad match for your application, they'll let you know and provide recommendations.

Whatever you do, don't build it yourself. As tempting and fun as it is, given that you're asking the question, you've already self-identified as someone who won't be able to support it. I've seen many smart people go the SuperMicro JBOD route only to create support nightmares for themselves.

Also, for that much space, avoid Amazon at all costs. It's way too expensive compared to dedicated hardware.

For cost, budget around $150-250k to get started. It might seem pricey, but you'll spend more than that on manpower building it yourself (or your first few months on Amazon).

In addition to DDN, IBM, Dell, and HP all have solutions in this range that aren't terribly expensive.

-Chris

Gluster or Ceph by Anonymous Coward · 2015-07-25 11:44 · Score: 1

Gluster or Ceph, depending on requirements.

Both are Open Source, call Red Hat if you want support.

In a hidden directory by aquabat · 2015-07-25 11:47 · Score: 1

I keep it all in a separate drive, and only mount it when I want to look at the data. Also, I mount it under .porn, so it isn't visible in a casual listing.

--
A republic cannot succeed till it contains a certain body of men imbued with the principles of justice and honour.

CoW and Replication on Resilient Storage by Anonymous Coward · 2015-07-25 11:51 · Score: 0

Or, Copy on Write with tons of copies and cheap storage...

What OS? If you are using Windows, Shadow Copies do this... if using Linux, use LVM with snapshots. Do that on both sides.

Or store it on a SAN with snapshots and replicate.

Re:CoW and Replication on Resilient Storage by Bengie · 2015-07-26 07:10 · Score: 1

Windows is limited to 512 total shadow copies. Shadow copies could accidentally be lost for a number of reasons, they are not guaranteed. Microsoft has a list of things to be careful about that can influence your chance of losing a shadow copy, including block size and defragmentation, which could cause older shadow copies to get destroyed.

LVM has performance issues. Many people complain of an over 10x reduction in performance after only a few snapshots. It also only works at the block level and not the FS level, which highly limits its usefulness.

Perfect device by Anonymous Coward · 2015-07-25 12:01 · Score: 0

Put it on /dev/null :-)

Your use case is likely unique by davidwr · 2015-07-25 12:17 · Score: 1

Given how few use cases there are like the one you describe, there are probably a lot of important considerations that didn't make it into your question that make your use case unique.

This is one of those cases where you really need to sit down and decide what works best for your situation, NOT what works best for other situations that require this amount of data storage.

--
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.

"Do my job" by Anonymous Coward · 2015-07-25 12:30 · Score: 0

OK, I'll do your job. Use multiple storage servers with DFS. To back up your DFS, buy the same thing somewhere else with 1.5x the capacity, and set up an rsync with a dedup FS. Aren't you glad you asked /. to do your job?

In a petafile, obviously by raymorris · 2015-07-25 12:43 · Score: 1

To store files close to a petabyte, you need a petafile, obviously.

Why not a NAS? by Anonymous Coward · 2015-07-25 12:49 · Score: 0

We got an EMC Isilon X410 cluster last year where I work. It supports SMB, NFS, HDFS, or OpenStack Swift. I'd recommend storing/retrieving your 100-200MB objects programmatically using Swift. If they need the directory tree, you can present it over SMB or NFS to the humans. We use a slower/larger Isilon NL cluster at our DR site which we replicate live data and snapshots to from the main one.
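
Storing an object through the Swift API is a plain HTTP PUT to `<storage URL>/<container>/<object>` with an auth token header. A minimal sketch that only builds the request using the standard library; the endpoint, account, container and token are hypothetical placeholders, and whether a given Isilon release behaves exactly like upstream Swift is an assumption to verify.

```python
import urllib.request
from urllib.parse import quote


def build_put_request(storage_url: str, container: str, object_name: str,
                      token: str, data: bytes) -> urllib.request.Request:
    """Build (but do not send) a Swift object upload request."""
    # Slashes in the object name are kept, giving directory-like pseudo-paths.
    url = f"{storage_url.rstrip('/')}/{quote(container)}/{quote(object_name)}"
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={
            "X-Auth-Token": token,
            "Content-Type": "application/octet-stream",
        },
    )


if __name__ == "__main__":
    req = build_put_request("https://swift.example.com/v1/AUTH_demo",  # hypothetical
                            "scans", "run42/sample.bin", "TOKEN", b"payload")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would perform the upload against a real endpoint.
```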

Backups by manu0601 · 2015-07-25 12:53 · Score: 1

Storing the data is the easy part, Glusterfs should do it just fine. The point I am curious about is backups: how do you back up such a volume?

EMC by Anonymous Coward · 2015-07-25 13:13 · Score: 0

Well, if you want users (multiple) to have access, you need a NAS, not a SAN. I've had pretty good luck with EMC Isilon. It's a NAS, supports easy scale out with a minimum of 3 or 4 nodes and a maximum of something over 100. A single file system can be I think multiple terabytes

Server Based Storage by kaustik · 2015-07-25 13:15 · Score: 1

Disclaimer: I work for a storage vendor. Also a long time Slashdot reader though, so this isn't meant as a sales pitch.

Half of a petabyte is not really a lot of data in today's world. I talk to people every day that are trying to find ways to manage many PBs (into the hundreds) and are having challenges doing this with traditional storage. The trend that was started by the big Internet companies is to get rid of the fibre-channel SANs and instead solve the problem of storage using standard x86 servers. They use Linux as an abstraction layer from the hardware, and applications acting as storage systems to pool many servers together.

One of the challenges you need to get over is stretching a namespace that big without filesystem limitations like maximum inode counts. This is generally accomplished using some type of key/value store (object) under the hood. Single flat namespaces with no practical size barrier.

Some options that are available today are Swift from OpenStack and Ceph from Red Hat if you want to go the open source route. These can be good choices if you have the engineering staff on hand to piece it all together and the talent to keep it running. GPFS is also making a comeback in this area, and there are a ton of startups looking at this space now.

My company has a commercial solution for this stuff. Pretty cool - it's a Linux app and runs on the server of your choice. I'll save you the sales pitch, and if you want you can try it for free on your own here: http://scality.com/trial

Whatever you choose, best of luck to you!

Start Off Right... by BDMcGrew · 2015-07-25 13:45 · Score: 1

I am a professional and manage several hundred petabytes globally. From experience I can tell you, they may be asking for half a petabyte right now but tomorrow that will double and again next year and so on. Plan big to start with and you'll save your future self a lot of grief! If you PM me I can give you more details but in short I can suggest:

1) Look at a scalable filesystem like GPFS or StorNext. Yes there is a price tag associated with big iron filesystems (and no I don't work for any of them) but you get what you pay for, and scalability is everything. As an example - pairing GPFS with TSM and the right hardware, I can create an infinitely scalable filesystem that'll scale to yottabytes.

2) Tier the storage system. Think SSD for the cache (here and now) I/O, Winchester disk for the short term and tape for the long term. Yes, tape: compute the cost per TB of tapes in the vault versus square footage in the data center.

3) Separate your networks. Keep the client access separated from the disk i/o. Doing this will save massive congestion problems from day one!
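
The tiering idea in point 2 can be sketched as a simple age-based policy. The thresholds and tier names below are hypothetical; real products (GPFS policies, for instance) can key on many more attributes than last access time.

```python
from datetime import datetime, timedelta


def choose_tier(last_access: datetime, now: datetime,
                hot_days: int = 7, warm_days: int = 180) -> str:
    """Pick a storage tier from how recently a file was read.

    hot_days and warm_days are hypothetical thresholds, not vendor defaults.
    """
    age = now - last_access
    if age <= timedelta(days=hot_days):
        return "ssd"    # here-and-now I/O
    if age <= timedelta(days=warm_days):
        return "disk"   # short term
    return "tape"       # long term


if __name__ == "__main__":
    now = datetime(2015, 7, 25)
    for days in (1, 30, 400):
        print(days, "days idle ->", choose_tier(now - timedelta(days=days), now))
```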

There are lots of other things to consider but by today's standards a half petabyte isn't an insurmountable amount of data just like a terabyte was twenty years ago.

Mega by pestilence669 · 2015-07-25 14:35 · Score: 1

It may sound "funny," but I once priced Mega (KimDotCom) for offsite backup & storage. They turned out to be less expensive than Amazon Glacier by a bit AND instantly available. We didn't go with them. Instead, we replicated across data centers with multi-terabyte storage nodes.

Fuck a midget in the ass by Anonymous Coward · 2015-07-25 14:56 · Score: 0

I back up my 3 exabytes of porn by printing out the contents of the files.

Store it in the cloud by ljw1004 · 2015-07-25 14:59 · Score: 1

Store it in the cloud. 1/2 petabyte isn't even the "highest tier" requirement.

On Azure it will cost $168k/year to store this much data instantly accessible. Whatever other solution you come up with, if it takes more than 1 full time person to support, then it's already more expensive (and that's not even including the up-front capital costs, installation and setup costs, training costs, depreciation, maintenance, ...)

Outside the box by Anonymous Coward · 2015-07-25 15:57 · Score: 0

I would look at CohoData, Cleversafe, Qumulo, or Cloudian. They scale well, are easy to manage, and come in around $0.70-$1.00 per GB.

Hadoop by RabidMonkey · 2015-07-25 16:04 · Score: 1

Sounds like a fairly simple case for a Hadoop cluster - a smallish one at that. We're currently deploying to clusters at 1PB/rack density, which means you could deploy a rack or two easily enough. You'd get compute, you get a single flat filesystem, you get redundancy, all built in. Our biggest cluster is now up to 16PB, all one big compute/storage beast, chugging away all day.

I'd suggest starting with the Hortonworks Sandbox VM - grab it, fire it up, play with it. Add some files, poke around, see if it meets your needs. Learn about mapreduce, or maybe your data can be put in to HIVE for analysis.

The nice thing is that you can use hardware you may already have to get things going. Hortonworks is pretty much at the point of a 'next next finish' installer, so you really only need to dedicate a few hours to getting something up to test. Then, there's a lot of tuning and craziness to running a bigger cluster, but a POC is simple.

Anyhow, I'm blind, because all I do is Hadoop clusters all day, but this seems like an easy win for ya.

GL;HF!

--
We emerge from our mother's womb an unformatted diskette; our culture formats us. - Douglas Coupland

You're out of your league by Loconut1389 · 2015-07-25 16:37 · Score: 1

Not only are you out of your league, but you're barking up the wrong tree.

1) You should hire someone to figure it out for you- as either on-site consultancy or use something like amazon.
2) You should use a different site that has more than 5 legitimate comments on a thread.

EMC SANs by AnythingButMicrosoft · 2015-07-25 17:13 · Score: 1

If costs are not a priority look into using multiple EMC SANs striped in a RAID array. I've installed a few with the largest encompassing 14 physical units for ~100 VMs, they work great.

Re:EMC SANs by swb · 2015-07-25 23:58 · Score: 1

Are there vendors that actually support RAID across otherwise independent SANs?
Like if you had SANs A through F, each with a 10 TB volume and you used SAN controller Z (which has no disks of its own) to take those 10 TB volumes and turn them into a single (say RAID-6) volume.
I've done this for laughs with a NAS4Free implementation, using its iSCSI client to mount LUNs from 3-4 different storage devices and then combining those mounts into a RAID LUN which I then exported via iSCSI and used on a client.
It seems like an interesting idea, and put together right seems like it might offer some relatively interesting redundancy versus some of the replication and mirroring options I've seen vendors advertise.

Not a do it yourself project by cmurf · 2015-07-25 18:08 · Score: 1

Get quotes from Netapp, EMC, and Red Hat.

MooseFS, Exablox or Scality Ring by ACorvus · 2015-07-25 22:57 · Score: 1

How about MooseFS (http://moosefs.org) for an OSS solution, or if you want appliances off the shelf that won't cost you a limb or three, Exablox (http://exablox.com). Or if you need more than the 700TB that can give you, how about http://www.scality.com/ - which is software defined and you can use your own iron.

--
-- Sig Sig Sputnik

Easy by Anonymous Coward · 2015-07-26 00:51 · Score: 0

Amazon storage with a dedicated connection to the Amazon cloud from your data center

Use Gluster or Ceph by terry.bowling · 2015-07-26 00:58 · Score: 1

Both are free, hardware agnostic and the future of software defined storage. And Red Hat can provide enterprise support if you need.

Know what your objectives are by Anonymous Coward · 2015-07-26 02:52 · Score: 0

Disclaimer: I work at one of the Big 5 storage vendors, but we try to be as upfront and straightforward as possible when dealing with our customers. It is one of the reasons that time and again our sales and support teams are cited as being strong to work with. It is not to say that we're perfect, but just that we care an awful lot about our customers.

All of the posters who talk about IOPs, throughputs, availability requirements, required operating models, etc. are right. Basically these folks point out that you must define and adhere to your requirements and do things like compute the total cost of ownership over a 3-5 year time horizon -- basically the time to fully depreciate the equipment from a taxation perspective. In the TCO you'll want to include everything you think that you want to tackle: Ongoing development support or not, sparing or not, supply chain management or not, WAN bandwidth costs (especially important if you're partially in one of the Big 3 Cloud platforms), needs for regulation/legalities (Example if you're in an industry which must report on data breaches especially with customer data, think Payment Card Industry or Health Care, you may want a partner to share liabilities), O&M costs (including employees), and so on. Normalizing to a financial model will give you some indication which approach to take, and I would add if you do the model you should look at both the cost and benefit angle. In this case if the system is more directly related to revenues and acquiring the system allows you to increase your business volumes (e.g. revenues) even if the costs are higher then perhaps the lowest cost solution isn't the right approach.
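
A bare-bones version of that financial model might look like the sketch below. The cloud figure is the $168k/year Azure estimate quoted elsewhere in this thread and the on-prem capex is the top of the $150-250k range another poster gave; the on-prem annual operating cost is a purely hypothetical placeholder, and a real model would add the WAN, staffing, compliance and benefit-side terms listed above.

```python
def total_cost(capex: float, annual_opex: float, years: int) -> float:
    """Undiscounted total cost of ownership over the given horizon."""
    return capex + annual_opex * years


def cheapest(options: dict, years: int) -> str:
    """Name of the option with the lowest TCO over the horizon."""
    return min(options, key=lambda name: total_cost(years=years, **options[name]))


OPTIONS = {
    # annual_opex for on_prem is a hypothetical placeholder.
    "on_prem": {"capex": 250_000, "annual_opex": 60_000},
    "cloud": {"capex": 0, "annual_opex": 168_000},
}

if __name__ == "__main__":
    for years in (1, 3, 5):
        costs = {n: total_cost(years=years, **o) for n, o in OPTIONS.items()}
        print(years, "years:", costs, "->", cheapest(OPTIONS, years))
```

The point of normalizing this way is that the answer flips with the horizon: with these inputs the cloud wins over one year and loses over three or five.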

While the above explanation doesn't really cover what we do as one of the Big 5 I will tell you that we have a Chief Economist and spend time with our customers to do the kinds of modeling I mention above. So the short answer is really: Don't purchase IT fashion, do your homework and come up with a solution that provides the best financial benefits to your company even if it is a multi-million dollar storage infrastructure OR only something stood up on one of the Big 3 clouds.

As to the point of backup & DR when you begin looking at the total costs, including WAN pipes, make sure you're also adding in restoration simulation and thinking about how to have your users participate in some (or all) of the human generated data recovery practice. Barring legal retention requirements -- some of which can be challenging like those in the Healthcare industry where retention is at least for the life of the patient -- the defining criteria is about data restoration regardless of if your copy is onsite, offsite, mixed up with disaster recovery, etc. Data that is backed up and cannot be restored is, well, worthless. Even when thinking about fundamental data protection there are areas to be concerned about like multiple drive loss scenarios in a protection set, media reliability and so on. Here's an interesting point: These days there's lots of technology in the area of fundamental data protection like predictive sparing, RAID, erasure coding (yes I know RAID is a form of EC, but...), tape, and BluRay (thank you Facebook). It is this last technology that I want to talk about because it changes disaster protection in my opinion.

What if your media was certified for say 50 or 100 years, could survive water events, and was impervious to EMPs? Well this is BluRay and the advantage which BluRay has over tape: the media format started from the CD and carries over to today. I'll be the first to admit that there's still work to do in the industry, but BluRay shows promise and would have a significant impact on a disaster recovery process because there are new assumptions that could be made.

So my point is do your homework, not all of the Big 5 are evil, and at least some of the Big 5 are savvy enough to know that sometimes we're not the best answer!

We built the storage system ourselves - huge savings by Anonymous Coward · 2015-07-26 03:12 · Score: 0

To save boatloads of money you can build the storage yourself - for us it has been working very well for many years

http://www.juhonkoti.net/2012/01/02/building-a-85tb-cheap-storage-server-with-solaris-openindiana

iSCSI and Ceph an option by Anonymous Coward · 2015-07-26 05:08 · Score: 0

I did some work with Ceph and it was a very interesting experience. Instead of a single server or machine hosting data I had several. I could completely kill a server permanently and ungracefully and the others would have that data and replicate it to a new server. Adding OpenStack to the equation, you now have 100% disposable and virtually configurable nodes that can be used however you want. All a new node does is PXE into the stack, and you then tell the stack what that new node is, who it belongs to and what its role is. This could help you resolve the issue behind both groups' needs and allow you to start and grow/scale with the groups' needs as opposed to running a bunch of storage iron empty while they ramp up. You then buy cheap storage nodes that run inexpensive disk and add some SSD for journaling/metadata (they suggest it and no joke it helps).

Best part of this: you can grab 5 uber cheap servers off wherever, load them up and test it as a proof of concept with nothing more than an investment of time.

Re:Just put "bomb" and "assassinate" in every line by Anonymous Coward · 2015-07-26 07:24 · Score: 0

But how are YOU gonna access the backup ?

Particularly from sunny Guantanamo ?

Lustre is your answer by Anonymous Coward · 2015-07-26 10:25 · Score: 0

Lustre, IB FDR, some type of 12Gb/s SAS JBOD. You don't back up in these types of use cases, hence you need a highly scalable filesystem like Lustre. You only need to back up critical work data sets, and head nodes for the most part.

Silly questions to consider by Anonymous Coward · 2015-07-26 10:40 · Score: 0

Hi ... been through this. At this scale, small considerations you used to ignore really do matter. If you talk to a pro, you'll need to find out 1) How hot does the backup need to be ... is instant failover required?, 2) How many threads will be reading/writing at the same time ... meaning can you create just bulk storage, or does it need to have parallel access ... the difference can be a factor of 5X, 3) Assuming you need some level of storage redundancy, are you talking RAID5, RAID6, RAID10, etc ... or can you deal with redundancy on a file basis (e.g., a Gluster file system), 4) Are there different recovery scenarios? ... Losing a file may be solved by shadowing at the file system level, and if that's what you really need, maybe you don't need full binary backups, 5) Is there a data set that can be used as a kernel to regenerate the rest of the data? ... is there a tradeoff between backup size/complexity versus processor cycles used to regenerate from a data kernel? Moral: large storage is not an upscaled version of small storage.
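
On question 3, the RAID level changes how much of the raw disk you actually get to use. A small sketch of the standard capacity formulas for a single array; the 12 x 8 TB example is hypothetical, and real arrays lose a bit more to spares and filesystem overhead.

```python
def usable_tb(level: str, disks: int, disk_tb: float) -> float:
    """Usable capacity of one array under common RAID levels."""
    if level == "raid5":
        return (disks - 1) * disk_tb   # one disk's worth of parity
    if level == "raid6":
        return (disks - 2) * disk_tb   # two disks' worth of parity
    if level == "raid10":
        return disks / 2 * disk_tb     # everything mirrored
    raise ValueError(f"unknown RAID level: {level}")


if __name__ == "__main__":
    for level in ("raid5", "raid6", "raid10"):
        print(level, usable_tb(level, disks=12, disk_tb=8), "TB usable of 96 TB raw")
```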

Speaking from experience (we have done this) by Anonymous Coward · 2015-07-26 12:52 · Score: 0

I worked on a project to do just this for research data. We decided to purchase two storage arrays and back up by taking snapshots and replicating the data to the other site. A NetApp DS4486 disk shelf can hold 48x4TB drives (192TB raw) in 4RU. Also, NetApps only replicate unique data (i.e. if your 500TB dedupes down to 250TB you only need 250TB at the backup site). Just set up snapshot and replication policies and away you go. That'd be how I'd do it. Best thing is that you can check all of the data from the backup site as it's all online. They also sell an Amazon virtual NetApp appliance so if you don't have another good site to replicate to then you can do that.
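
The shelf arithmetic above can be turned into a quick sizing sketch. It uses the 48 x 4 TB per 4RU shelf figure from the post and counts raw capacity only; RAID overhead and spares are ignored, and the 2:1 dedupe ratio is just the post's own example.

```python
import math


def shelves_needed(data_tb: float, drives_per_shelf: int = 48,
                   drive_tb: float = 4, dedupe_ratio: float = 1.0) -> int:
    """Shelves required to hold data_tb after dedupe, on raw capacity alone."""
    raw_per_shelf = drives_per_shelf * drive_tb   # 192 TB for the shelf described
    stored_tb = data_tb / dedupe_ratio
    return math.ceil(stored_tb / raw_per_shelf)


if __name__ == "__main__":
    primary = shelves_needed(500)
    backup = shelves_needed(500, dedupe_ratio=2.0)  # 500 TB deduped to 250 TB
    print(f"Primary: {primary} shelves ({primary * 4} RU)")
    print(f"Backup:  {backup} shelves ({backup * 4} RU)")
```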

one bit at a time by bingoUV · 2015-07-26 17:20 · Score: 1

'nuff said

--
Bingo Dictionary - Pragmatist, n. A myopic idealist.

We use a combination of tools by Gumbercules!! · 2015-07-26 19:02 · Score: 1

We store and back up about this much data (a little more), although spread across a variety of machines. All in all, though, the data is primarily virtual hard drives (we run a private cloud environment).

Storing it on disk is easy enough - and cheap enough, that it's little concern. Amazon, Azure, etc. are *insanely* expensive for this task, month by month, compared to self owned disks.

As our hypervisors are all Microsoft (Hyper-V - and yes, I know this is Slashdot and I just said I use a Microsoft product but it's easily the most economical approach, when 99% of your clients need Windows licensing), we use Windows Server 2012 R2 native tiered storage pools on a mix of SATA HDD and SSD to achieve the storage, generally spread across a group of Supermicro servers with large numbers of disk bays - effectively software defined storage.

For backup, we use the highly dense 1RU servers, with 12 bays (Supermicro again), with commodity 6 or 8TB SATA disks. Each RU can get near to 100TB of storage (raw) and they don't use much kW - and they cost hardly anything. Backups are performed using Microsoft DPM 2012 R2, as well, because, again, cheapest option and so far, 0 problems.

The biggest issue I have is air-gapped backups - those are hard to manage, for low dollars, for this kind of setup. So I've resorted to having a few more backup machines and manually swapping the network cable from one group to the next, as the equivalent of swapping tapes.

Re:RAID across SANS (was: EMC SANs) by Anonymous Coward · 2015-07-27 03:49 · Score: 0

Hitachi Data Systems' virtual storage platform (an 'appliance' front-end to virtualize FC SAN arrays) did this c. 2003, so I suppose the technology is still available now.

Sanify by Anonymous Coward · 2015-07-27 11:24 · Score: 0

I've used Sanify for the last 4.5 years. Rock solid. Commodity hardware, interface via iSCSI, auto-failover and migration while hot, whatever interconnect you want (ethernet, FC, IB) and software control for replication count and controller count. 16+TB volume might need a little more room in meta-data to support though. Sales email.

Use a full ZFS system by Anonymous Coward · 2015-08-03 07:50 · Score: 0

It will keep things redundant and safe from corruption. An incremental backup of the volumes can provide a large grain backup offsite.

Use a full ZFS system by Anonymous Coward · 2015-08-03 07:53 · Score: 0

It will keep things redundant and safe from corruption. An incremental backup of the volumes can provide a large grain backup offsite.
Works to Exabyte sizes.

Slashdot Mirror

219 comments