Slashdot Mirror


Ask Slashdot: How Do You Store a Half-Petabyte of Data? (And Back It Up?)

An anonymous reader writes: My workplace has recently had two internal groups step forward with a request for almost a half-petabyte of disk to store data. The first is a research project that will computationally analyze a quarter petabyte of data in 100-200MB blobs. The second is looking to archive an ever increasing amount of mixed media. Buying a SAN large enough for these tasks is easy, but how do you present it back to the clients? And how do you back it up? Both projects have expressed a preference for a single human-navigable directory tree. The solution should involve clustered servers providing the connectivity between storage and client so that there is no system downtime. Many SAN solutions have a maximum volume limit of only 16TB, which means some sort of volume concatenation or spanning would be required, but is that recommended? Is anyone out there managing gigantic storage needs like this? How did you do it? What worked, what failed, and what would you do differently?

29 of 219 comments

  1. ceph by drew8523 · · Score: 3, Informative

    We use Ceph. It's fast, redundant, and crazy scalable. Oh, did I mention free (with paid support available)? ceph.com

    1. Re:ceph by u-235-sentinel · · Score: 2

      > We use Ceph. It's fast, redundant, and crazy scalable. Oh, did I mention free (with paid support available)? ceph.com

      Personally, I've been using Ceph for the last few years. It has to be one of the best distributed filesystems (DFSes) I've ever used. It offers security, speed, and easy expansion by adding more nodes. The free part was great. I found it while looking through the repos one day. You can even tie it into other projects such as Hadoop (at least, I recall reading about a plugin a couple of years ago).

      Great product!

      --
      Has Comcast disconnected your Internet account? Same here. You can read about it at http://comcastissue.blogspot.com
  2. Ambiguous by smittyoneeach · · Score: 4, Insightful

    Do you mean:
    (a) "Don't store it. Employ Amazon (or some other cloud) storage."? or
    (b) "Do not use Amazon."
    Clarity: it's like that one thing that is not the other thing, except for when it is.

    --
    Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
  3. Talk to Vendors by Old+VMS+Junkie · · Score: 3

    Honestly, you should talk to the pros. I would call a couple of storage vendors, give them the basic outline of what you want to do, and let them tell you how they would do it. You can even get more formal and issue a Request for Information (RFI) or even a Request for Quote (RFQ). If you're a biggish company, your purchasing people probably have an SOP and standard forms for issuing an RFI/RFQ. For the big-boy storage vendors, half a petabyte is commonplace. The bigger question may very well be what this is going to look like at the software level. Managing the data might be a bigger challenge than storing it. Is this going to be organized in some sort of big data solution like Hadoop? Is it just a whole bunch of files that people are going to write R or SAS jobs to query? Sometimes the toolset that you want to use will drive your choices in how to build the infrastructure under it.

    1. Re:Talk to Vendors by Anonymous Coward · · Score: 5, Informative

      Honestly, that's the WORST thing to do. When you talk to the pros, they will try to sell you some outrageously overpriced Fibre Channel system that's total overkill for what you are doing. I've worked with 'big data' storage companies like EMC and NetApp. We needed 300TB of 'nearline' storage, and EMC came up with a $3,000,000.00 TOTAL overkill Fibre Channel solution, and NetApp wasn't much better, coming in at close to $2,000,000.00. Total ripoff. The ONLY reason you would ever choose Fibre Channel over iSCSI is if you are running a HUGE transactional database with millions of accesses per minute. I just needed STORAGE, so I went with Synology and got 300TB of RAID-10 storage for about $100K. I DUPLICATED it ($200K total) and still paid only 10% of what the 'vendors' tried to sell me. I was VERY clear that I did not need Fibre Channel; I refused to spend tons of money on something that would have zero bearing on performance, and I found it's much better to research and provide your own solution at 10% of the cost of the big vendors. Why do you think EMC pulls in billions of dollars in revenue? Because they convince pointy-haired bosses that their solution is the best. Trust me, going with a second-tier vendor for 'nearline storage' is a much better idea than asking the 'big 5' for a solution.
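To put these quotes on a common footing, a quick cost-per-terabyte comparison helps. This is a minimal sketch that uses only the figures quoted in the comment (300TB, $3,000,000 from EMC, about $2,000,000 from NetApp, $200,000 for two mirrored Synology sets); it deliberately ignores support contracts, power, and staff time, which the vendor quotes may have bundled in.

```python
# Quotes from the comment above, all for roughly 300TB of nearline storage.
QUOTES_USD = {
    "EMC (Fibre Channel)": 3_000_000,
    "NetApp": 2_000_000,
    "Synology x2 (mirrored)": 200_000,
}
CAPACITY_TB = 300


def cost_per_tb(total_usd: float, capacity_tb: float) -> float:
    """Purchase price divided by capacity; ignores support, power and staff."""
    return total_usd / capacity_tb


if __name__ == "__main__":
    diy = QUOTES_USD["Synology x2 (mirrored)"]
    for name, price in QUOTES_USD.items():
        print(f"{name:24s} ${cost_per_tb(price, CAPACITY_TB):>9,.0f}/TB  "
              f"DIY is {diy / price:.0%} of this quote")
```

By this arithmetic the DIY route comes to about 10% of the NetApp quote and under 7% of the EMC one, before counting the operational work the vendors would have absorbed.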

    2. Re:Talk to Vendors by mlts · · Score: 2

      Oracle has a SAN (well, SAN/NAS) offering that does something similar, with a rack of configurable ports/HBAs, assuming the right SFP is present. Want FC? Got it. iSCSI? Yep. FCoE? Yep. Want to share an NFS export over a LAG as a VMware datastore? Easy.

      The price wasn't that shocking either. It wasn't dirt cheap like a Backblaze storage pod, but it was reasonable, especially with SSD available and autotiering.

    3. Re:Talk to Vendors by AK+Marc · · Score: 2

      He wasn't very clear about his complaint, but talking to professional sales people about what you need will never get you an optimal solution.

  4. Depends who you ask... by snowgirl · · Score: 4, Interesting

    At Facebook, it's memcached, with an HDD backup, eventually put onto tape...

    At Google, it's a ramdisk, backed up to SSD/HDD, eventually put onto tape...

    For anyone who can't afford half a petabyte of RAM and the commensurate number of computers? I have no good ideas... except maybe a RAM cache in front of SSD, in front of HDD, backed up to tape...
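The RAM, SSD, HDD, tape hierarchy described above behaves like a stack of read-through caches. A minimal sketch, assuming each tier can be modeled as a key/value map with a fixed item capacity and LRU eviction (real tiering systems work on blocks or extents and are far more involved); the `Tier` and `TieredStore` classes are hypothetical names for illustration only.

```python
from collections import OrderedDict
from typing import Optional


class Tier:
    """One storage tier: a capacity-limited LRU map (None means unbounded)."""

    def __init__(self, name: str, capacity: Optional[int]) -> None:
        self.name = name
        self.capacity = capacity
        self.items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            return self.items[key]
        return None

    def put(self, key: str, value: bytes) -> None:
        self.items[key] = value
        self.items.move_to_end(key)
        if self.capacity is not None and len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used item


class TieredStore:
    """Read-through cache: fastest tier first, the last tier holds everything."""

    def __init__(self, tiers: list) -> None:
        self.tiers = tiers

    def write(self, key: str, value: bytes) -> None:
        # Every object lands on the bottom (archive) tier, which never evicts.
        self.tiers[-1].put(key, value)

    def read(self, key: str) -> tuple:
        for depth, tier in enumerate(self.tiers):
            value = tier.get(key)
            if value is not None:
                # Promote the object into every faster tier that missed.
                for faster in self.tiers[:depth]:
                    faster.put(key, value)
                return value, tier.name
        raise KeyError(key)


store = TieredStore([Tier("ram", 2), Tier("ssd", 4), Tier("hdd", 8), Tier("tape", None)])
store.write("blob-001", b"...")
print(store.read("blob-001")[1])  # first read is served from tape
print(store.read("blob-001")[1])  # second read is served from RAM
```

The design choice worth noting is that only the bottom tier is authoritative; everything above it is a disposable copy, which is what lets the expensive tiers stay small.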

    Using something like HDFS to store your data in a Hadoop cluster that serves the file requests is likely the best F/OSS solution you're going to get for that...

    --
    WARNING! This girl exceeds the MAXIMUM SAFE standards established by the FDA for BRATTINESS
    1. Re:Depends who you ask... by tsetem · · Score: 2

      Thumbs up on HDFS. The next question to ask your groups is how they will be analyzing the data. HDFS (and Hadoop/Spark/whatever) will hopefully fit in nicely there. Not only will your data be redundantly copied across multiple systems, but as your data needs (and cluster) grow, so does your computational power.

      Getting data in and out can be done via the Java API, the REST API, FUSE, or NFS mounts. The only issue is that HDFS doesn't play well with small files, but hopefully your groups will be using large files instead.

      Administration is another story, but Cloudera Manager is supposed to greatly simplify it. I'm currently using it to store about 0.25 PB for ad hoc analysis, and growing its capacity is a straightforward task.

      As far as backing up, HDFS provides snapshots, 3x replication (or more) across nodes in the cluster. Of course there's always the big hammer of just getting a second cluster. As an old HW sage once told me, "If you can't afford to buy two, don't buy one"
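Two numbers are worth running before committing to HDFS: the raw disk that 3x replication implies, and how many objects the NameNode must hold in memory (the root of the small-files problem). This sketch assumes the Hadoop 2 default 128MB block size and the commonly cited rule of thumb of roughly 150 bytes of NameNode heap per file or block object; it is a rough estimate, not a sizing tool.

```python
import math

BLOCK_SIZE = 128 * 2**20       # assumed HDFS default block size (128 MiB)
BYTES_PER_NN_OBJECT = 150      # rule-of-thumb NameNode heap per file/block object


def raw_capacity_pb(usable_pb: float, replication: int = 3) -> float:
    """Raw disk needed when every block is stored `replication` times."""
    return usable_pb * replication


def namenode_objects(total_bytes: int, file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Approximate file + block objects the NameNode must keep in memory."""
    files = total_bytes // file_size
    blocks_per_file = math.ceil(file_size / block_size)
    return files * (1 + blocks_per_file)


if __name__ == "__main__":
    data = 250 * 10**12  # the quarter petabyte from the question
    print(f"raw disk at 3x: {raw_capacity_pb(0.25):.2f} PB")
    for size in (150 * 10**6, 1 * 10**6):  # 150MB blobs vs. 1MB small files
        objs = namenode_objects(data, size)
        print(f"{size // 10**6:>4} MB files: {objs:,} objects, "
              f"~{objs * BYTES_PER_NN_OBJECT / 2**30:.1f} GiB NameNode heap")
```

Under these assumptions the same quarter petabyte needs about 100 times as many NameNode objects when stored as 1MB files instead of 150MB blobs, which is why large files suit HDFS.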

  5. Enterprise Storage by NFN_NLN · · Score: 2

    This project must have an unrealistically low budget, otherwise there are quite a few Enterprise solutions that will do all OR a combination of these tasks.

    > how do you present it back to the clients?
    Look at a NAS, not a SAN, e.g., NetApp or 3PAR C series.

    > And how do you back it up?
    Disaster Recovery replication to another system or to a hosted service: NetApp, EMC, 3PAR, etc.

    > Many SAN solutions have a maximum volume limit of only 16TB
    NetApp Infinite Volume scales to 20PB.

    You can contact a sales person from any of those companies to answer any of these questions.

    1. Re:Enterprise Storage by NatasRevol · · Score: 2

      Yeah, the 16TB limit says OP is looking at VERY low end solutions. As in not feasible for petabyte range projects.

      --
      There are two types of people in the world: Those who crave closure
  6. use slashdotFS by goombah99 · · Score: 3, Funny

    I use slashdotFS, which is a Markovian random comment generator that effectively embeds data in steganographic comments. The FS handles the details of creating and saving these, so it's all transparent and mounts on your desktop like a regular drive. It's slow, but its capacity seems unlimited and it frequently gets modded Insightful.

    --
    Some drink at the fountain of knowledge. Others just gargle.
    1. Re:use slashdotFS by goombah99 · · Score: 2

      Another way is to convert it to JPEG and store it on Facebook.

  7. SanDisk sells a 512TB 3U shelf... by AcquaCow · · Score: 2

    SanDisk's Infiniflash is 512TB in a 3U chassis that is SAS-connected. You can front this with something like DataCore's SANsymphony to turn it into a NAS/SAN appliance.

    The pricing looks to be around $1/GB, which is a ton cheaper than building a SAN of that capacity, plus it needs far less power, space, and cooling.

    --

    up 12 days, 22:30, 2 users, load averages: 993.20, 994.21, 994.56
    *makes note to limit user processes...
  8. How are you using the data? by MetricT · · Score: 2

    What clients will you be exporting it to? Linux, OS X, Windows? All three?

    What kind of throughput do you need? Is 10 MB/sec enough? 100 MB/sec? 10 GB/sec?

    What kind of IO are you doing? Random or sequential? Are you doing mostly reads, mostly writes, or an even mix?

    Is it mission critical? If something goes wrong, do you fix it the next day, or do you need access to a tier 3 help desk at 3 am?

    We have a couple of petabytes of CMS-HI data stored on a homegrown object filesystem we developed and exported to the compute nodes via FUSE. Reed-Solomon 6+3 for redundancy. No SAN, no fancy hardware, just a bunch of Linux boxes with lots of hard drives.
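The appeal of Reed-Solomon 6+3 over plain replication is easy to quantify. A minimal sketch of the storage arithmetic only (it does not implement the erasure code itself): with k data fragments and m parity fragments, raw usage is (k+m)/k times the data and any m fragments can be lost. The 500TB figure is simply the question's target, used for illustration.

```python
def erasure_overhead(k: int, m: int) -> float:
    """Raw bytes stored per byte of data for a k+m erasure code."""
    return (k + m) / k


def raw_needed_tb(usable_tb: float, overhead: float) -> float:
    return usable_tb * overhead


schemes = {
    # name: (raw/usable overhead, devices or fragments that can be lost)
    "3x replication": (3.0, 2),
    "Reed-Solomon 6+3": (erasure_overhead(6, 3), 3),
}
for name, (overhead, tolerated) in schemes.items():
    print(f"{name:18s} {raw_needed_tb(500, overhead):6.0f} TB raw for 500 TB, "
          f"survives {tolerated} losses")
```

So 6+3 survives one more failure than triple replication while needing half the raw disk (750TB versus 1500TB); the price is extra CPU and network work to encode and to rebuild.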

    There is no one-size-fits-all filesystem, which is part of the reason we use our own. If you have the ability to run it, I'd suggest looking at Ceph. It only supports Linux, but it has Reed-Solomon erasure coding for redundancy (think of it as a higher tier of RAID) and good performance if you need it. If you have to add Windows or OS X clients into the mix, you may need to consider NFS, Samba, WebDAV, or (ugh) OpenAFS.

  9. You're asking like you will be implementing it... by tlambert · · Score: 4, Interesting

    You're asking like you will be implementing it... don't.

    Gather all their requirements, gather your requirements on top of it (I'm pretty confident that some of those requirements were your additions for "you'd be an idiot to have that, but not also have this...", possibly including the backup).

    Then put out a preliminary RFP to the major storage vendors, including asking them what they think you missed in the preliminary.

    Then take the recommendations they make on top of the preliminary with a grain of salt, since most of them will be intended to ensure vendor lock-in to their solution set. Revise the preliminary and put out a final RFP.

    Then accept the bid that you like which management is willing to approve.

    Problem solved.

    P.S.: You don't have to grow everything yourself from seed you genetically modify yourself, you know...

  10. Easy by ArcadeMan · · Score: 5, Funny

    How Do You Store a Half-Petabyte of Data? (And Back It Up?)

    That's the easiest question I've ever seen.

    1. Wait about a decade or so.
    2. Buy two half-petabyte flash drives.
    3. Alternate your copies on the two flash drives, the previous one becomes your backup.

    NEXT!

  11. What are your budget and reliability requirements? by fishnuts · · Score: 2

    If you have a small budget and moderate reliability requirements, I'd suggest looking into building several Backblaze-style storage pods for block storage (5x 180TB storage systems, approx. $9,000 each), each exporting 145TB RAID5 volumes via iSCSI to a pair of front-end NAS boxes. The NAS boxes could be FreeBSD or Solaris systems offering ZFS filestores (putting multiples of 5 volumes, one from each block store, together in RAIDZ sets), which then export these volumes via CIFS or NFS to the clients. Total cost for storage, front-ends, 10GbE NICs and a pair of 10GbE switches: $60K, plus a few weeks to build, provision, and test.
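The key property of that layout is that each RAIDZ set takes exactly one iSCSI volume from each pod, so losing an entire pod costs every set only one member. A minimal sketch of the grouping and its usable capacity, assuming one 145TB volume per pod (which the 180TB raw figure suggests) and single-parity RAIDZ; the volume names are hypothetical.

```python
from collections import Counter


def build_raidz_sets(pods: int, vols_per_pod: int) -> list:
    """Group iSCSI volumes so each RAIDZ set holds one volume from every pod."""
    return [[f"pod{p}-vol{v}" for p in range(pods)] for v in range(vols_per_pod)]


def survives_pod_loss(sets: list, parity: int = 1) -> bool:
    """True if no set has more members on any single pod than it has parity."""
    for members in sets:
        per_pod = Counter(name.split("-")[0] for name in members)
        if max(per_pod.values()) > parity:
            return False
    return True


def usable_tb(sets: list, vol_tb: float, parity: int = 1) -> float:
    # Each set gives up `parity` members' worth of space to parity data.
    return sum((len(members) - parity) * vol_tb for members in sets)


sets = build_raidz_sets(pods=5, vols_per_pod=1)
print(sets)                          # one set spanning pod0 through pod4
print(survives_pod_loss(sets))       # True
print(usable_tb(sets, vol_tb=145))   # 580
```

On the front-ends, each inner list would become one `raidz` group in a `zpool create` command, so the pool keeps running with any one pod offline.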

    If you have a bigger budget, switch to Fibre Channel SANs. I'd suggest a couple of HP 3PAR StoreServ 7450s, connected via 8 or 16Gb FC across two fabrics to your front-ends, which aggregate the block storage into ZFS-based NAS systems as above, implementing RAIDZ for redundancy. This would limit storage volumes to 16TB each, but if they're all exposed to the front-ends as a giant pool of volumes, then ZFS can centrally manage how they're used. A 7450 filled with 96 4TB drives will provide 260TB of usable volume space (thin or thick provisioned) and cost around $200K-$250K each. Going this route would cost $500K-$550K (SANs, plus 8 or 16Gb FC switches, fibre interconnects, and HBAs) but give you extremely reliable and fast block storage.

    A couple of advantages of using ZFS for the file storage are its ability to migrate data between backing stores when maintenance on the underlying storage is required, and its ability to compress data. For mostly textual datasets, you can see a 2x to 3x space reduction at a slight cost in speed, depending on your front-ends' CPUs and memory speed. ZFS is also relatively easy to manage on the command line for someone with intermediate knowledge of SAN/NAS storage management.
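Before counting on a 2x to 3x reduction, it is worth measuring how compressible a sample of the real data is. A rough sketch, assuming Python's zlib (DEFLATE, the algorithm behind ZFS's gzip-N settings) is an acceptable proxy; ZFS's default lz4 usually compresses somewhat less, and ZFS compresses each record rather than whole files, so treat the result as an optimistic estimate. The sample text below is made up for illustration.

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Original size divided by DEFLATE-compressed size (>1 means it shrinks)."""
    return len(data) / len(zlib.compress(data, level))


def sample_file_ratio(path: str, sample_bytes: int = 8 * 2**20) -> float:
    """Estimate a file's compressibility from its first `sample_bytes` bytes."""
    with open(path, "rb") as f:
        return compression_ratio(f.read(sample_bytes))


if __name__ == "__main__":
    text = b"timestamp,host,level,message\n" + b"2015-06-01 12:00:00,node7,INFO,request served\n" * 50_000
    print(f"log-like text: {compression_ratio(text):.1f}x")
    print(f"random bytes:  {compression_ratio(os.urandom(2**20)):.2f}x")  # about 1.0x
```

Already-compressed media such as JPEG or H.264 video behaves like the random case, so the archive group's mixed media may see little benefit even where the research data compresses well.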

    Whatever you decide to use for block storage, you're going to want to ensure the front-end filers (managing filestores and exporting network shares) are set up as an identical active/standby pair. There is plenty of free software on Linux and FreeBSD to accomplish this. These front-ends would otherwise be your single point of failure, and could render your data completely unusable and possibly permanently lost if you don't have redundancy in this department.

  12. Re:Call iXsystems, use ZFS by NatasRevol · · Score: 2

    ZFS is a great RAID system that's now owned by Oracle. Goodbye, ZFS.

  13. Wrong questions. More details needed. by d3vi1 · · Score: 5, Informative

    You're not asking the right questions:

    The first correct question is: why on earth would someone need to access half a petabyte? In most cases the commonly accessed data is less than 1%. That's the amount of data that realistically needs to reside on disk, and it is almost never more than 10% on such a large dataset. Everything else would be better placed on tape. Tiered storage is the answer to the first question: you have RAM, solid-state/flash storage (PCI-based), fast disks, slow high-capacity disks, and tape. Choose your tiering wisely.

    The second question you need to ask is how the customer needs to access that large datastore. In most cases you need serious metadata alongside the data. For petabytes of data you usually cannot just rely on an intelligent tree structure. You need a website or an app to search that data and fetch the required "blob". For such an app you need a large database, since you will have roughly 2.5 million objects with searchable metadata at 200MB per blob (5 million at 100MB).
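A metadata catalog does not need to be exotic to be useful. A minimal sketch using Python's built-in sqlite3 module, assuming a hypothetical schema (path, size, project, tags, created) and an in-memory database; at a few million rows with many concurrent users a server database is the likelier choice, but the query pattern is the same. The rows are invented examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a file path would persist the catalog
conn.executescript(
    """
    CREATE TABLE blobs (
        path    TEXT PRIMARY KEY,   -- location in the storage tree
        size    INTEGER NOT NULL,   -- bytes
        project TEXT NOT NULL,
        tags    TEXT,               -- comma-separated, kept simple here
        created TEXT NOT NULL       -- ISO-8601 date
    );
    CREATE INDEX blobs_project_created ON blobs (project, created);
    """
)

rows = [
    ("/research/run42/part-0001.bin", 150_000_000, "research", "run42,raw", "2015-06-01"),
    ("/research/run42/part-0002.bin", 180_000_000, "research", "run42,raw", "2015-06-02"),
    ("/archive/video/keynote.mp4", 2_400_000_000, "archive", "video", "2015-05-20"),
]
conn.executemany("INSERT INTO blobs VALUES (?, ?, ?, ?, ?)", rows)

# Find every research blob written in June 2015 without walking the directory tree.
query = """
    SELECT path, size FROM blobs
    WHERE project = ? AND created BETWEEN ? AND ?
    ORDER BY created
"""
for path, size in conn.execute(query, ("research", "2015-06-01", "2015-06-30")):
    print(path, size)
```

The portal the comment describes would run queries like this one and hand back paths, leaving the storage layer to serve the actual bytes.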

    The third question is: why do you have SAN as a premise? Do you want a clustered filesystem with 5-10 nodes? Then Isilon or Oracle ZS3-2/ZS4-4 are probably your answer.

    Fourth question: what are the requirements? (How many simultaneous clients? IOPS? Bandwidth? ACL support? Auditing? AD integration? Performance tuning?)

    Fifth question: there is no such thing as 100% availability. The term disaster in Disaster Recovery is correctly placed. Set reasonable SLA expectations. If you go for five-nines availability, it will triple the cost of the project. Keep in mind that synchronous replication is distance-limited: typically the radius is about 150 miles for a small performance cost, and anything beyond that hurts performance a lot.
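Both numbers in this paragraph are easy to make concrete. A minimal sketch, assuming a 365-day year and light in optical fiber travelling at roughly 200,000 km/s (about two-thirds of its speed in vacuum); real links add switching and routing delay, so the computed round trip is a lower bound. Synchronous replication waits for the remote acknowledgement, so it adds at least one such round trip to every write.

```python
MINUTES_PER_YEAR = 365 * 24 * 60
FIBER_KM_PER_S = 200_000   # approximate speed of light in fiber
KM_PER_MILE = 1.609344


def downtime_minutes_per_year(availability: float) -> float:
    """Allowed downtime for an availability target such as 0.99999."""
    return (1 - availability) * MINUTES_PER_YEAR


def fiber_rtt_ms(distance_miles: float) -> float:
    """Lower-bound round-trip time over fiber, ignoring equipment delay."""
    km = distance_miles * KM_PER_MILE
    return 2 * km / FIBER_KM_PER_S * 1000


for nines in (0.999, 0.9999, 0.99999):
    print(f"{nines:.5f}: {downtime_minutes_per_year(nines):8.1f} min/year")
print(f"150 miles: >= {fiber_rtt_ms(150):.2f} ms added to every synchronous write")
```

Five nines leaves only about 5 minutes of downtime per year, and 150 miles already costs roughly 2.4 ms per write before any equipment latency.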

    Even if you solve the problems above, if you want to share it via NFS/CIFS or something else, you're going to run into trouble. Since CIFS was not realistically designed for clustered operation, regardless of the distributed FS underneath the CIFS server, you get locking issues. Windows Explorer is a good example: it creates Thumbs.db files and leaves them open, and when you want to delete the folder you cannot, unless you magically ask the same node that was serving you when it created the Thumbs.db file. Apparently the POSIX lock is transferred to the other server and stops you from deleting, but when Windows Explorer asks the other node who has the lock on the file, you get screwed since the other server doesn't know. POSIX locks are different from Windows locks. This affects all Likewise-based products from EMC (VNX filer, Isilon, etc.) and it also affects the CIFS product from NetApp. I'm not sure about Samba CTDB, though.

    I would design storage based on ZFS for the main tiers, exported via NFSv4 to the front-end nodes, with QFS on top of the whole thing to push rarely accessed data to tape. The front-end nodes would be accessed via WebDAV by a portal in which you can also query the metadata, with a serious DB behind it.

    I've installed Isilon storage for 6,000 XenDesktop clients that all log on at 9AM, and I've worked on an SL8500, Exadata, and various NetApp and Sun storage systems, so I can tell you that you need to do a study. Run simulations with commodity hardware on smaller datasets to figure out the performance requirements and the optimal access method (NAS, web, etc.). Extrapolate the numbers, double them, and ask for a POC and demos from vendors, be it IBM, EMC, Oracle, NetApp or HP. Make sure that when you need 2PB in the future, you can expand in an affordable manner. Be careful, since vendors like IBM tend to propose the least upgradable solution: they will demo something that can hold 0.6PB in its maximum configuration, and if you need to go larger, you'll need a brand-new solution from another vendor.

    It's not worth doing it yourself, since it will be time-consuming (at least 500 man-hours until production) and will need at least one full-time employee for the storage. But if you must, look at Nexenta and the hardware that they recommend.

    And remember to test DR failover scenarios.

    Good luck!

    --
    UNIX was not designed to stop you from doing stupid things, because that would also stop you from doing clever ones.
  14. Re:Call iXsystems, use ZFS by darkpixel2k · · Score: 4, Informative

    Nope. Not 'owned'. The open-source code is covered by the CDDL and developed by a group that isn't associated with Oracle (which acquired Sun): OpenZFS.

    --
    There's no place like ::1 (I've completed my transition to IPv6)
  15. But restore ... by Ungrounded+Lightning · · Score: 2

    Just put "bomb" and "assassinate" in every line. ... It's all going to get backed up.

    But getting them to restore it after it's gotten lost or corrupted is difficult.

    --
    Bantam Dominique roosters crow a four-note song. Once you've heard it as "Happy BIRTHday" you can't NOT hear it that way
  16. What are your IOPS and throughput requirements? by DamnStupidElf · · Score: 2

    For high throughput/IOPS requirements build a Lustre/Ceph/etc. cluster and mount the cluster filesystems directly on as many clients as possible. You'll have to set up gateway machines for CIFS/NFS clients that can't directly talk to the cluster, so figure out how much throughput those clients will need and build appropriate gateway boxes and hook them to the cluster. Sizing for performance depends on the type of workload, so start getting disk activity profiles and stats from any existing storage NOW to figure out what typical workloads look like. Data analysis before purchasing is your best friend.
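On Linux, one cheap way to start collecting those profiles is to sample `/proc/diskstats` twice and diff the counters. A minimal sketch, assuming the standard field layout (after splitting on whitespace: reads completed at index 3, sectors read at 5, writes completed at 7, sectors written at 9) and the 512-byte sectors the kernel always reports in that file; the two snapshot lines at the bottom are invented so the parsing can be checked offline.

```python
import time
from typing import Dict, Tuple

SECTOR_BYTES = 512  # /proc/diskstats counts 512-byte sectors regardless of device


def parse_diskstats(text: str) -> Dict[str, Tuple[int, int, int, int]]:
    """Map device -> (reads, sectors_read, writes, sectors_written)."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 10:
            continue
        stats[f[2]] = (int(f[3]), int(f[5]), int(f[7]), int(f[9]))
    return stats


def rates(before: dict, after: dict, seconds: float) -> Dict[str, dict]:
    """Per-device IOPS and MB/s between two snapshots."""
    out = {}
    for dev, (r0, sr0, w0, sw0) in before.items():
        if dev not in after:
            continue
        r1, sr1, w1, sw1 = after[dev]
        out[dev] = {
            "read_iops": (r1 - r0) / seconds,
            "write_iops": (w1 - w0) / seconds,
            "read_MBps": (sr1 - sr0) * SECTOR_BYTES / seconds / 1e6,
            "write_MBps": (sw1 - sw0) * SECTOR_BYTES / seconds / 1e6,
        }
    return out


def sample(interval: float = 10.0) -> Dict[str, dict]:
    """Live sampling on a Linux host."""
    with open("/proc/diskstats") as f:
        before = parse_diskstats(f.read())
    time.sleep(interval)
    with open("/proc/diskstats") as f:
        after = parse_diskstats(f.read())
    return rates(before, after, interval)


# Offline example: two made-up snapshots taken 10 seconds apart.
t0 = "   8       0 sda 1000 0 80000 0 500 0 40000 0 0 0 0"
t1 = "   8       0 sda 1500 0 120000 0 700 0 60000 0 0 0 0"
print(rates(parse_diskstats(t0), parse_diskstats(t1), 10.0)["sda"])
```

Running `sample()` from cron over a few weeks gives the read/write mix, IOPS and throughput numbers the sizing exercise needs.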

    If the IOPS and throughput requirements are especially low (guaranteed < 50 random IOPS per spindle [to leave room for RAID/background process/degraded-or-rebuilding-array overhead] and no more than a couple of 10Gbps Ethernet ports can handle, over the entire lifetime of the system), then you can probably get away with just some SAS cards attached to SAS hot-plug drive shelves and building one big FreeBSD ZFS box. Use two-way mirrored vdevs (RAID10-alike) in the pool for the higher-IOPS processing group, and RAIDZ2 or RAIDZ3 with ~15-disk vdevs for the archiving group to save on disk costs.
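The trade-off between the two pool layouts can be estimated up front. A rough sketch, assuming a hypothetical 60-disk shelf of 6TB drives, ignoring ZFS metadata overhead and the free-space headroom ZFS wants, and using the common rule of thumb that each vdev delivers roughly the random IOPS of one member disk (taken here as 100).

```python
from dataclasses import dataclass


@dataclass
class Layout:
    name: str
    vdev_width: int   # disks per vdev
    parity: int       # disks' worth of redundancy per vdev (mirror: width - 1)


def evaluate(layout: Layout, disks: int, disk_tb: float, disk_iops: int = 100) -> dict:
    vdevs = disks // layout.vdev_width           # leftover disks could be hot spares
    data_disks = layout.vdev_width - layout.parity
    return {
        "vdevs": vdevs,
        "usable_tb": vdevs * data_disks * disk_tb,
        "random_iops": vdevs * disk_iops,        # rule of thumb: about 1 disk per vdev
    }


layouts = [
    Layout("2-way mirrors", 2, 1),
    Layout("RAIDZ2 x15", 15, 2),
    Layout("RAIDZ3 x15", 15, 3),
]
for lay in layouts:
    print(lay.name, evaluate(lay, disks=60, disk_tb=6))
```

With these inputs the mirrors give 180TB and about 3,000 random IOPS, while 15-wide RAIDZ2 gives 312TB but only about 400, which is the split the comment recommends. The rule of thumb undercounts mirror reads, since both sides of a mirror can serve them.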

    Plan for 100% more growth in the first year than anyone says they need (shiny new storage always attracts new usage). Buy server hardware capable of 3 to 5 years of growth; be sure your SAS cards and arrays will scale that high if you go with one big storage box.

  17. Buy a Storage Pod by Areyoukiddingme · · Score: 3, Informative

    Buy Storage Pods, designed by Backblaze. You can get 270TB of raw storage in 4U of rack space for $0.051 per gigabyte. Total cost for half a petabyte of raw storage: $27,686. To back it all up cheaply but relatively effectively, buy a second set to use as a mirror: $55,372. For use with off-the-shelf software (FreeNAS running ZFS, or Linux running mdadm RAID) to present a unified filesystem that won't self-destruct when a single drive fails, you'll need to over-provision enough to store parity data. Go big or go home: just buy another pod for each of the primary and the backup sets. Total of 6 pods with 1620TB of raw storage: $83,058. Some assembly required. And 24U of rack space required, plus power, cooling, 10GbE Ethernet, and UPSs (another 4-8U of rack space).
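The arithmetic behind these figures is easy to rerun with current drive prices. A minimal sketch using the comment's own inputs (270TB raw per 4U pod at $0.051/GB, one extra pod per copy for parity, and a full mirror); the result lands slightly below the quoted $83,058 because the per-gigabyte price is rounded.

```python
import math

POD_RAW_TB = 270
POD_RACK_UNITS = 4
USD_PER_GB = 0.051


def pod_cost(raw_tb: int = POD_RAW_TB, usd_per_gb: float = USD_PER_GB) -> float:
    return round(raw_tb * 1000 * usd_per_gb, 2)


def plan(target_tb: int, copies: int = 2, parity_pods_per_copy: int = 1) -> dict:
    """Pods needed for `copies` full sets, each padded with extra pods for parity."""
    pods_per_copy = math.ceil(target_tb / POD_RAW_TB) + parity_pods_per_copy
    pods = pods_per_copy * copies
    return {
        "pods": pods,
        "raw_tb": pods * POD_RAW_TB,
        "rack_units": pods * POD_RACK_UNITS,
        "cost_usd": round(pods * pod_cost(), 2),
    }


print(pod_cost())   # about $13,770 per pod
print(plan(500))    # 6 pods, 1620TB raw, 24U, about $82,620
```

Swapping in a new `USD_PER_GB` or pod capacity is enough to see how falling drive prices change the total.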

    Expect a ballpark price of something a little under $100,000 that will meet your storage requirements with sufficient availability and redundancy to keep people happy. It will require 2 racks of space, and regular care and feeding. Do the care and feeding in house. A support contract where you pay some asshole tens of thousands of dollars a year to show up and swap drives for you is a waste of money. Bearing that in mind, as other posters have said, talk to storage vendors selling turnkey solutions. Come armed with these numbers. When they bid $1 million, laugh in their faces. But there's an outside chance you'll find a vendor with a price that is something less than hyperinflated. Stranger things have happened.

    If you don't generate data very quickly, you can ease into it. For around $35,000, you can start with just 2 pods and the surrounding infrastructure, and add pods in pairs as necessary to accommodate data growth. Add $27,000 in 2 chassis next year to double your space. Add $26,000 of space again in 2017 and increase your raw capacity another 50%. (Total storage cost using Backblaze-inspired pods is dominated by hard drive prices, which trend downwards.) When you find out your users underestimated growth, another $25,000 of space in 2018 takes you to somewhere in the neighborhood of 2 petabytes of raw storage, which you're using with double parity and 100% mirrored backup for a total effective usable space of approximately 918TB.

    You'll be replacing 2-3 drives per year, starting out, and 0-1 after infant mortality has run its course. Keep extras in a drawer and do it yourself in half an hour each on a Friday night. If you configured ZFS with reasonably sized vdevs (3-5 devices), the array rebuild should be done by Monday morning. By 2020, you'll be back up to replacing 2-3 drives per year as you climb the far side of the bathtub curve. While you're at it, you can seriously consider replacing whole vdevs with larger-capacity drives, so your total usable space can creep up over time without buying new chassis. By 2025, you will have 8 chassis in two racks hosting 2.88PB of raw storage space that's young and vital and low maintenance, having spent roughly $200,000.

    A bargain, really.

  18. Anything is possible with the right budget... by emag · · Score: 3, Informative

    Lucky (?) for you, I just went through purchasing a storage refresh for a cluster, as we're planning to move to a new building and no one trusts the current 5 year old solution to survive the move (besides which, we can only get 2nd hand replacements now). The current system is 8 shelves of Panasas ActiveStor 12, mostly 4 TB blades, but the original 2-3 shelves are 2 TB blades, giving about 270 TB raw storage, or about 235ish TB in real use. The current largest volume is about 100 TB in size, the next-largest is about 65 TB, with the remainder spread among 5-6 additional volumes including a cluster-wide scratch space. Most of the data is genomic sequences and references, either downloaded from public sources or generated in labs and sent to us for analysis.

    As for the replacement...

    I tried to get a quote from EMC. Aside from being contacted by someone *not* in the sector we're in, they also managed to misread their own online form and assumed that we wanted something at the opposite end of the spectrum from what I requested info on. After a bit of back and forth, and a promise to receive a call that never materialized, I never did get a quote. My assumption is they knew from our budget that we'd never be able to afford the capacities we were looking for. At a prior job, a multi-million dollar new data center and quasi-DR site went with EMC Isilon and some VPX stuff for VM storage/migration/replication between old/new DCs, and while I wasn't directly involved with it there, I had no complaints. If you can afford it, it's probably worth it.

    The same prior job had briefly, before my time there, used some NetApp appliances. The reactions of the storage admins weren't all that great, and throughout the 6 years I was there, we never could get NetApp to come in and talk to us whenever we were looking to expand our storage. I've had colleagues swear by NetApp, though, so YMMV.

    I briefly looked at the offerings from Overland Storage (where we got our current tape libraries), on the recommendation of the VAR we use for tapes & library upgrades. It looked promising, but in the end, we'd made a decision before we got most of those materials...

    What we ended up going with was Panasas, again. Part of it was familiarity. Part of it was their incredible tech support even when the AS12 didn't have a support contract (we have a 1-shelf AS14 at our other location for a highly specialized cluster, so we had *some* support, and my boss has a golden tongue, talking them into a one-time support case for the 8-shelf AS12). We also have a good relationship with the sales rep for our sector; the prior one actually hooked us up with another customer to acquire shelves 6-8 (and 3 spares), as this customer was upgrading to a newer model. Based on that, we felt comfortable going with the same vendor. We knew our budget, and got quotes for three configurations of their current models, ActiveStor 14 & 16. We ended up with the AS16: 8 shelves, each with 10 blades (two 6 TB disks plus a 240 GB SSD per blade) and a "Director Blade". Approximate raw storage is just a bit under 1 PB (roughly 970-980 TB raw for the system).
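The raw-capacity figure checks out from the configuration listed. A quick sketch, assuming 10 storage blades per shelf, each holding two 6TB drives and one 240GB SSD, counted in decimal units; the Director Blades are left out of the count.

```python
SHELVES = 8
BLADES_PER_SHELF = 10
HDD_GB_PER_BLADE = 2 * 6_000   # two 6TB drives
SSD_GB_PER_BLADE = 240

raw_gb = SHELVES * BLADES_PER_SHELF * (HDD_GB_PER_BLADE + SSD_GB_PER_BLADE)
print(f"{raw_gb / 1000:.1f} TB raw")  # 979.2 TB raw
```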

    In terms of physical specs, each shelf is 4U and has dual 10 GbE connections, and adding shelves is as easy as racking them and joining them to the existing array (I literally had no idea what I was doing when we added shelves to the current AS12; it just worked as they powered on). Depending on your environment, they'll support NFS, CIFS, and their own PanFS (basically pNFS) through a driver (or Linux kernel module, in our case). We're snowflakes, so we can't take advantage of their "phone home" system to report issues proactively and download updates (pretty much all vendors have this feature now). Updating manually is a little more time-consuming, but still possible.

    As for backups, I honestly have no idea what I'm going to do. Most data, once written, is static in our environment, so I can probably get away with infrequent longer retention period backups for every

    --
    "The urge to save humanity is almost always a false front for the urge to rule." --H.L. Mencken
  19. Re:Call iXsystems, use ZFS by Bengie · · Score: 2

    My cousin used ZFS+Gluster for this multi-petabyte system.

  20. That's it? by guruevi · · Score: 4, Informative

    500TB is nothing these days. You can easily buy any system and it will support it. Look at FreeBSD/FreeNAS with ZFS (or their commercial counterpart by iXSystems). If you want to have an extremely comfortable, commercial setup, go Nexenta or with a bit of elbow grease, use the open/free counterpart OpenIndiana (Solaris based).

    You can build 2 systems (I personally have 3, 1 with SAS in Striped-Mirrors, 1 with Enterprise-SATA in RAIDZ2 and 1 with Desktop-SATA in RAIDZ2) and have ZFS snapshots every minute/hour/day replicated across the network for backups, both Nexenta and FreeNAS have that right in the GUI. The primary system also has a mirrored head node which can take over in less than 10s. As far as sharing out the data: AFP/SMB/NFS/iSCSI/WebDAV etc. whatever you need to build up on it.

    My system is continuously snapshotted to its primary backup, so in case of extreme failure (which has not happened in the 7 years since I built this system) I can run from the primary backup until the primary has been restored, with perhaps a few seconds of data loss (I don't know if that's acceptable to you, but in my case it's not a problem if we do have a full meltdown).
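The snapshot-and-replicate cycle that FreeNAS and Nexenta drive from their GUIs boils down to `zfs snapshot` followed by an incremental `zfs send -i` piped over SSH into `zfs receive` on the backup box. A minimal sketch that only builds the commands and does not run them, assuming Python 3.8+ (for `shlex.join`), a hypothetical dataset `tank/data`, a hypothetical target `backup/data` on host `backup01`, and SSH access between the machines.

```python
import shlex
from datetime import datetime, timezone
from typing import List, Optional


def snapshot_name(dataset: str, when: datetime) -> str:
    """Sortable, timestamped snapshot name such as tank/data@auto-20150601-1200."""
    return f"{dataset}@auto-{when:%Y%m%d-%H%M}"


def replication_commands(dataset: str, target: str, host: str,
                         previous: Optional[str], when: datetime) -> List[str]:
    """Shell commands for one snapshot + replication cycle (not executed here)."""
    current = snapshot_name(dataset, when)
    send = ["zfs", "send"]
    if previous:  # incremental from the last snapshot the backup already holds
        send += ["-i", previous]
    send.append(current)
    receive = ["ssh", host, "zfs", "receive", "-F", target]
    return [
        shlex.join(["zfs", "snapshot", current]),
        f"{shlex.join(send)} | {shlex.join(receive)}",
    ]


for cmd in replication_commands("tank/data", "backup/data", "backup01",
                                previous="tank/data@auto-20150601-1100",
                                when=datetime(2015, 6, 1, 12, 0, tzinfo=timezone.utc)):
    print(cmd)
```

Run on a timer, each cycle ships only the blocks changed since the previous snapshot, which is what keeps the potential data loss down to the snapshot interval.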

    Where are those systems limited to 16TB? I wouldn't touch them with a 10-foot pole because they're running behind (within a few years a single hard drive will surpass that limit).

    --
    Custom electronics and digital signage for your business: www.evcircuits.com
  21. Backblaze Storage Pod? by im_thatoneguy · · Score: 2

    What are your performance requirements? If you just need a giant dump of semi-offline storage, then look into building a Backblaze Storage Pod.
    https://www.backblaze.com/blog...

    For about $30,000 you could build four storage pods. Speed would not be terrific. Redundancy is handled through RAID. If you want faster, more redundant or fully serviced storage, your next step up in price is probably a $300,000 NAS solution, which might serve you better anyway.

  22. Re: Don't by Anonymous Coward · · Score: 3, Insightful

    I think that the intention was to stimulate a discussion amongst a community of geeks who have a genuine interest in this type of technology and enjoy discussing solutions that they have built. Sure, you could just outsource the service and pay consultants to do it for you but I don't think that is the general ethos of the traditional Slashdot reader. Also, if you feel that you should be paid for commenting here then this is probably not the forum for you. Twat.