Slashdot Mirror


Facebook Experimenting With Blu-ray As a Storage Medium

s122604 links to CNN's explanation of what may be the future of cold (or at least lukewarm) storage at Facebook, which is experimenting with massive arrays of Blu-Ray discs for seldom-accessed user files. Says the report: The discs are held in groups of 12 in locked cartridges and are extracted by a robotic arm whenever they're needed. One rack contains 10,000 discs, and is capable of storing a petabyte of data, or one million gigabytes. Blu-ray discs offer a number of advantages versus hard drives. For one thing, the discs are more resilient: they're water- and dust-resistant, and better able to withstand temperature swings. Their data can be restored more quickly, and they're easier to transport. Most important, though, is cost. Because the Blu-ray system doesn't need to be powered when the discs aren't in use, it uses 80% less power than the hard-drive arrangement, cutting overall costs in half.
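
The rack figures in the summary can be sanity-checked with a little arithmetic. This is a minimal sketch using only the numbers quoted above (10,000 discs, one petabyte taken as one million gigabytes, 12 discs per cartridge); the variable names are made up for illustration, and it assumes every disc holds the same amount.

```python
# Figures quoted in the summary; names are illustrative only.
DISCS_PER_RACK = 10_000
RACK_CAPACITY_GB = 1_000_000  # "a petabyte of data, or one million gigabytes"
DISCS_PER_CARTRIDGE = 12

# Capacity each disc must hold for the rack to reach a petabyte.
gb_per_disc = RACK_CAPACITY_GB / DISCS_PER_RACK

# Cartridges needed to hold every disc (ceiling division).
cartridges_per_rack = -(-DISCS_PER_RACK // DISCS_PER_CARTRIDGE)

print(f"{gb_per_disc:.0f} GB per disc, {cartridges_per_rack} cartridges per rack")
```

Taken at face value, the quoted numbers imply discs of about 100 GB each, which is larger than the 50 GB retail discs discussed in the comments.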

30 of 193 comments

  1. Why not just use hard drives and then store... by Anonymous Coward · · Score: 2, Insightful

    ... those drives offline or come up with a system to power up the drives via custom SAN hardware when you want to access them? With Facebook's cash it should be doable.

    1. Re:Why not just use hard drives and then store... by Horshu · · Score: 2

      As the summary says, discs are also waterproof and can deal with greater temperature swings. They'd also be cheaper, even at the bulk HDD rate that FB would pay.

    2. Re:Why not just use hard drives and then store... by ShanghaiBill · · Score: 4, Informative

      They'd also be cheaper, even at the bulk HDD rate that FB would pay.

      A quick on-line search shows a spindle of fifty 50GB Blu-Ray discs (2.5 TB) retails for about $100. A 4TB HDD costs about $140. So HDD is actually cheaper per byte of storage. Maybe wholesale price ratios are way different from retail, but I see no reason to assume that. So BluRay doesn't win on price, volume, or access speed. The concerns about moisture and big temperature swings seem odd. Are Facebook data centers exposed to the weather?
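
The per-byte comparison above is easy to reproduce. A minimal sketch using only the retail figures quoted in the comment ($100 for fifty 50GB discs, $140 for a 4TB drive); the function name is hypothetical, and it ignores drives, robots, and power.

```python
def dollars_per_tb(price_usd, capacity_tb):
    """Media cost per terabyte, ignoring drives, robots and power."""
    return price_usd / capacity_tb

# Retail figures quoted in the comment above.
bluray_per_tb = dollars_per_tb(100, 50 * 50 / 1000)  # fifty 50GB discs = 2.5 TB
hdd_per_tb = dollars_per_tb(140, 4)                  # one 4TB hard drive

print(f"Blu-ray: ${bluray_per_tb:.2f}/TB, HDD: ${hdd_per_tb:.2f}/TB")
```

On these numbers the hard drive comes out a few dollars per terabyte cheaper at purchase time, which is the point being made.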

    3. Re:Why not just use hard drives and then store... by binarylarry · · Score: 4, Funny

      This is a company who's product stack is written in PHP.

      --
      Mod me down, my New Earth Global Warmingist friends!
    4. Re:Why not just use hard drives and then store... by jklovanc · · Score: 2

      If you are going to slag someone about their use of the English language you could at least tell them the correct word. In this case "whose" is the correct word.

    5. Re:Why not just use hard drives and then store... by Zero__Kelvin · · Score: 2

      This estimate also ignores the cost of a robotic system, powering that system, and maintenance, and doesn't factor in costs for redundancy (they need two robotic systems, not one). The whole thing is phenomenally stupid. As someone already pointed out before I got here to say the same, if you want to take data offline simply literally take it offline. Power down the friggin hard drive array completely. Power it back up when needed.

      --
      Guns don't kill people; Physics kills people! - John Lithgow as Dick Solomon on Third Rock From The Sun
    6. Re:Why not just use hard drives and then store... by jklovanc · · Score: 4, Informative

      So HDD is actually cheaper per byte of storage.

      If the HD needs to be replaced much more frequently than the Blu-Ray media the advantage switches quite quickly. For example, if the HD is replaced every 5 years and the Blu-Ray media is replaced every 20 years the HD would have to cost 1/4 of the Blu-Ray to match the hardware price.

      The concerns about moisture and big temperature swings seem odd.

      Temperature and humidity control are very expensive as it takes a lot of electricity. If the media can handle higher temperature and humidity swings then operation costs will be much lower.
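
The replacement-interval argument above can be put into numbers. This is only a sketch: the 5-year and 20-year intervals are the hypothetical ones from the comment, the per-terabyte prices are the retail figures quoted earlier in the thread, and the function name is invented for illustration. It assumes prices stay flat over the whole period.

```python
import math

def media_cost_per_tb(price_per_tb, replace_every_years, horizon_years):
    """Total media spend per TB over the horizon, assuming flat prices."""
    purchases = math.ceil(horizon_years / replace_every_years)
    return price_per_tb * purchases

HORIZON_YEARS = 20  # hypothetical planning horizon

# $35/TB HDD replaced every 5 years vs $40/TB Blu-ray replaced every 20 years.
hdd_20y = media_cost_per_tb(35.0, 5, HORIZON_YEARS)
bd_20y = media_cost_per_tb(40.0, 20, HORIZON_YEARS)

# The HDD only breaks even if it costs this fraction of the Blu-ray price.
break_even_ratio = 5 / 20

print(hdd_20y, bd_20y, break_even_ratio)
```

Under those assumptions the hard drive is bought four times and the Blu-ray media once, so the purchase-price advantage reverses.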

    7. Re:Why not just use hard drives and then store... by machine321 · · Score: 2

      Building storage with hard drives doesn't get you an article on Slashdot (or CNN); pretending you're going to build storage out of optical discs does.

    8. Re:Why not just use hard drives and then store... by niftymitch · · Score: 2

      They'd also be cheaper, even at the bulk HDD rate that FB would pay.

      A quick on-line search shows a spindle of fifty 50GB Blu-Ray discs (2.5 TB) retails for about $100. A 4TB HDD costs about $140. So HDD is actually cheaper per byte of storage. Maybe wholesale price ratios are way different from retail, but I see no reason to assume that. So BluRay doesn't win on price, volume, or access speed. The concerns about moisture and big temperature swings seem odd. Are Facebook data centers exposed to the weather?

      Seldom-used data sitting on spinning, power-draining disks has a continuous power cost.
      Power and cooling are important data center considerations.

      Facebook has an astounding pile of data in picture archives that after a couple months are
      only called on once in a while if ever again.

      Layers of storage, from the modern very quick SSD devices to spinning rust disks to perhaps Blu-ray,
      seem to have a place when access time and space considerations come into play. I wish them luck.

      One problem with Blu-ray, DVD and CD-ROM media is the lack of data on how they hold up as storage beyond
      five years or so. But as physical form factors go, these little devices do have a lot of potential.
      I wish them luck and wish I knew what vendor to invest in.

      --
      Truth is stranger than fiction, but it is because Fiction is obliged to stick to possibilities; Truth isn't. Mark Twain.
    9. Re:Why not just use hard drives and then store... by niftymitch · · Score: 2

      This estimate also ignores the cost of a robotic system, powering that system, and maintenance, and doesn't factor in costs for redundancy (they need two robotic systems, not one). The whole thing is phenomenally stupid. As someone already pointed out before I got here to say the same, if you want to take data offline simply literally take it offline. Power down the friggin hard drive array completely. Power it back up when needed.

      Bingo... but given the mass of data Facebook has set themselves up to store they would
      do well to try a multitude of things.

      And redundancy of two at this scale is not going to be sufficient.
      The media will need to be organized as a RAID larger and wider
      than anything folk are used to thinking about.

      A read error on one disc will need to be validated by a very big ECC code
      on the media and also on redundant media local and far away. Two copies
      give little voting confidence as to which is incorrect, so dust off your old
      HP-41 calculator and stat pack or perhaps SPSS and start working
      on the numbers. Then verify and check them with Haskell and R.
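
The point about voting can be illustrated with a tiny example. This is a minimal sketch, not a description of anything Facebook runs: the function name and the sample byte strings are made up, and it assumes replicas are compared by SHA-256 digest with a strict majority required.

```python
from collections import Counter
import hashlib

def majority_copy(replicas):
    """Return the content held by a strict majority of replicas, else None."""
    if not replicas:
        return None
    digests = [hashlib.sha256(r).hexdigest() for r in replicas]
    digest, votes = Counter(digests).most_common(1)[0]
    if votes * 2 > len(replicas):
        return replicas[digests.index(digest)]
    return None  # no majority: cannot tell which copy is wrong

good = b"photo bytes"
bad = b"photo bytez"  # simulated read error on one disc

two_copies = majority_copy([good, bad])          # one vote each, no winner
three_copies = majority_copy([good, bad, good])  # two votes beat one

print(two_copies, three_copies)
```

With two copies that disagree there is no way to pick a winner from the vote alone; a third copy (or a strong per-disc checksum) is what breaks the tie.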

      Big robot data systems are interesting and even dangerous as they
      get bigger and faster.

      Then there is the security of the OS running the robot. Stuxnet has
      a lesson to be applied here. Lots of stuff spinning...

      --
      Truth is stranger than fiction, but it is because Fiction is obliged to stick to possibilities; Truth isn't. Mark Twain.
    10. Re:Why not just use hard drives and then store... by jklovanc · · Score: 2

      It all depends on the numbers, which we don't have. The problem is the use of relative terms like "wide range" and "higher humidity". For example, high humidity in the tropics is much different from high humidity in the desert. If BR media can handle a wider range of temperature and a higher humidity level there will be savings in HVAC.

      So in five years, you may be able to get 20GB for what a 4GB HDD costs today.

      That is an assumption, and I bet that Facebook has looked at what is coming down the pipe. It is quite possible that these price decreases will slow. By the way, HD prices have not come down as fast as you seem to think. In 2010 a 2TB Seagate drive sold for 0.0000550 $/MB. In 2014 a 3TB Seagate sold for 0.0000367 $/MB. That is a 33% drop in four years. If it followed Moore's law (cut in half every 2 years) it should be 0.00001375 $/MB, or a 75% drop.

      PS. You probably meant TB not GB.
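
The price-decline comparison can be recomputed directly from the two $/MB figures quoted above. A minimal sketch with hypothetical names; the "halve every 2 years" rule is the comment's own reading of Moore's law. Worked from those two figures, the observed drop comes out at roughly one third.

```python
def fractional_drop(old, new):
    """Fraction by which a price fell going from old to new."""
    return (old - new) / old

# $/MB figures quoted in the comment above.
price_2010 = 0.0000550  # 2TB drive
price_2014 = 0.0000367  # 3TB drive

observed_drop = fractional_drop(price_2010, price_2014)

# Price if it had halved every 2 years over the same 4 years.
halvings = (2014 - 2010) / 2
halving_price = price_2010 / 2 ** halvings
halving_drop = fractional_drop(price_2010, halving_price)

print(f"observed: {observed_drop:.1%}, halving rule: {halving_drop:.1%}")
```

Either way, the observed decline is well under half of what the halving rule would predict, which is the argument being made.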

    11. Re:Why not just use hard drives and then store... by lucm · · Score: 4, Interesting

      What you describe is called a MAID and over the years it has proven quite unreliable. Hard disks are sensitive creatures and don't age well when being powered-on/powered-off randomly, and because of the nature of cold storage it is difficult to achieve the right balance of redundancy and power savings.

      Also I would advise you to be careful when you label something as "phenomenally stupid"; otherwise, in instances like this one, it may make you look like you are "phenomenally uninformed".

      --
      lucm, indeed.
    12. Re:Why not just use hard drives and then store... by ShanghaiBill · · Score: 2

      I think there is plenty of reason to assume that the retail markup on writable Blu-Ray disks (a niche market at best)

      They are for sale on hundreds of sites, and hundreds more sellers on eBay. Not a "niche" market at all.

      I wouldn't be surprised at all if Facebook can buy 50 BD-R discs for $10.

      No way. That is 20 cents each. The lowest price, from eBay sellers in China, is $2 each. There is absolutely no way that a 90% margin could be maintained in a competitive market. If they wholesaled for 20 cents, someone would be hawking them on eBay for a 10% markup, or 22 cents. Maybe less.

    13. Re:Why not just use hard drives and then store... by lucm · · Score: 4, Insightful

      When you deal with cold storage you have to look at things from a node level, not in global storage size. If your basic unit is a 50GB device instead of a 4TB device, this means that each request you make to recall data has a much smaller footprint.

      Let's say that each stored account takes up 1GB of space. That's 50 accounts per BD disc, and 4000 accounts per hard disk. This means that when some dude comes out of jail and tries to access the photo his mom posted on his Facebook wall in 2010, there are 3999 accounts that are pulled out of their coma with it for no reason. On a BD that's only 49.

      As long as you partition stuff properly it's unlikely that a single request will span multiple BD discs. You may have to deal with clusters of BD discs and this requires a bit of tuning, but even with the best indexing system in the world you can't power up only part of a hard disk. So BD is a clear winner here, especially if to that footprint issue you add the fact that spinners die quickly when you keep playing with the on/off switch.

      Bytes are bytes when you live in a software world. But physical factors and limitations come into play when you deal with storage, and that's why most people with a software background can see WTF where there is instead good engineering.
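
The footprint argument above boils down to a division. A minimal sketch using the comment's own assumption of 1GB per account; the function names are invented for illustration.

```python
def accounts_per_unit(unit_capacity_gb, account_size_gb=1):
    """How many whole accounts fit on one storage unit."""
    return unit_capacity_gb // account_size_gb

def bystander_accounts(unit_capacity_gb, account_size_gb=1):
    """Accounts woken up alongside the one that was actually requested."""
    return accounts_per_unit(unit_capacity_gb, account_size_gb) - 1

bd_bystanders = bystander_accounts(50)     # one 50GB Blu-ray disc
hdd_bystanders = bystander_accounts(4000)  # one 4TB hard disk

print(bd_bystanders, hdd_bystanders)
```

The smaller the unit that has to be brought online, the fewer unrelated accounts come along for the ride.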

      --
      lucm, indeed.
    14. Re:Why not just use hard drives and then store... by Eric Green · · Score: 5, Interesting

      You're actually talking about MAID (Massive Array of Idle Disks), a technology that I first encountered in 2002. Now-bankrupt Copan Systems was the company I first encountered that was doing MAID, and New SGI (i.e. former Rackable Systems) bought their assets out of bankruptcy in 2010. Most storage companies now offer MAID add-ons for their storage arrays, though not all of them allow completely powering down the drive like Copan's solution did.

      The upsides of MAID: Disks are cheap. Turning on and spinning up a hard drive to pull up some bits is faster than a robot fetching a Blu-Ray disk, placing it into a drive in the jukebox, and waiting for the disk to spin up and come online. You could store many more bytes in a cabinet with MAID than you could in an optical disk cabinet.

      Downsides: The disk drives in a MAID array simply don't last that long, comparatively speaking. Spinning them up and down all the time is hard on a drive. So you end up having to replicate data and from time to time migrate data to new drives as old drives reach their service life. The service life of rarely used Blu-Ray media that has always been handled robotically (i.e., nothing touching its surfaces ever) is such that Blu-Ray media from ten years ago is probably still usable; the technology itself will become obsolete like DVD-RAM long before the media wears out. Not so much with hard drives, though disk arrays basically have unlimited life given typical failure patterns (i.e., if you're using RAID6, a drive develops errors, you remove the failing drive from the array, rebuild the array on a new drive, and chances of having two more drives fail during rebuild and thus losing the array are slim for a 12-drive array). So MAID has not really taken off the way we expected ten years ago.
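
To get a feel for how slim "slim" is in the RAID6 aside above, here is a rough binomial estimate. Everything numeric in it is an assumption for illustration only: a 5% annual failure rate per drive, a 24-hour rebuild window, and independent failures (real drives from one batch often fail together, so treat this as a lower bound). The function name is hypothetical.

```python
from math import comb

def p_at_least_k_failures(n, k, p):
    """Probability that at least k of n drives fail, each independently with probability p."""
    return 1 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))

ASSUMED_AFR = 0.05          # assumed annual failure rate per drive
ASSUMED_REBUILD_HOURS = 24  # assumed rebuild window

# Chance a single drive fails during the rebuild window.
p_drive = ASSUMED_AFR * ASSUMED_REBUILD_HOURS / (365 * 24)

# 12-drive RAID6 with one drive already pulled: losing the array
# takes two more failures among the 11 remaining drives.
p_array_loss = p_at_least_k_failures(11, 2, p_drive)

print(f"{p_array_loss:.2e}")
```

Under these assumptions the chance of losing the array during any one rebuild is on the order of one in a million.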

      At the time I first encountered MAID I was working for a company called DISC Storage, which had a NAS head which would automatically migrate little-used data to an optical jukebox in a way similar to what Facebook appears to be attempting. I designed and implemented the clustering function that would replicate the data between two NAS heads / optical jukeboxes, since the DVD-RAM platters were not themselves RAID'ed, as well as implemented a lot of the back end functionality for jukebox control and so forth. In any event, it looked like a NAS head but most of the files had been migrated to the DVD-RAM platters, and if you accessed one of those files, you would (at some point maybe 15 seconds later) get your data back as the file got read back onto the hard drive. It worked. But it was somewhat slow and cumbersome, because you're relying on a robot to go out and fetch the disk and put it in a drive, and disk robots then, and now, simply aren't that fast compared to media that's already in a drive ready to be spun up and read.

      So anyhow, it was fairly obvious to me by mid 2003 that optical jukeboxes simply weren't going to be the future. In the ten years since DISC went under (there is a German company by that name now but it isn't the same company, it bought the name and some of the IP), I have not had any inclination to work for a company doing optical storage, because it's clear that for most problems it isn't the solution. It's too slow, too bulky, and magnetic disk drives and magnetic tape drives just continue getting bigger and cheaper every day. And now, with SSD coming on strong, optical jukeboxes look even less compelling.

      So color me amazed. Optical jukebox and optical media technology essentially has barely moved on in the past ten years and what wasn't particularly compelling then, is even less compelling now. If you have need to keep data for a *long* time, this is how you do it... but frankly, I will be surprised if Facebook even exists ten years from now given the pace of innovation in the industry (though I'm just as surprised that Slashdot still exists!), so I question why they would do this rather than invest in LTO tape libraries, which have the advantage of being significantly denser.

      --
      Send mail here if you want to reach me.
    15. Re:Why not just use hard drives and then store... by ShanghaiBill · · Score: 2

      While a hard drive may be cheaper at time of initial purchase, it likely has a significantly shorter lifespan as well

      This is just conjecture, unless you have some actual data on recorded BluRay lifespans.

      more reliable "Enterprise" drives typically cost three times as much)

      "Enterprise" HDDs are NOT more reliable. That is a myth promulgated by HDD vendors. Facebook uses "consumer" grade HDDs in their data centers. So does Google. So do all other informed non-idiots.

  2. Right to be forgotten? by tomhath · · Score: 4, Interesting

    Can I ask Facebook to delete my stuff from one of those (assuming I had a Facebook account in the first place)?

    1. Re:Right to be forgotten? by Anonymous Coward · · Score: 3, Funny

      Can I ask Facebook to delete my stuff from one of those (assuming I had a Facebook account in the first place)?

      You can ask, yes.

  3. Everything old is new again. by Nutria · · Score: 4, Informative

    Enterprises have been doing this with tape for 30 years.

    In fact, modern tape technology probably has a higher "volumetric" density than BD.

    --
    "I don't know, therefore Aliens" Wafflebox1
    1. Re:Everything old is new again. by evilviper · · Score: 3, Interesting

      Enterprises have been doing this with tape for 30 years.

      Tape has always had a limited life-span and is too easily damaged to completely trust with high-value archival data. Instead, archival on tape usually means "we're not quite confident enough to just delete this crap".

      Meanwhile, Sony's enterprise-grade write-once (WORM) magneto-optical (MO) discs have been around for decades, are physically tougher and impervious to magnetic fields, and are sold with 100-year warranties that even cover data-loss recovery costs.

      BD-RW can certainly be seen as Sony's MO technology being brought down dramatically in price due to economies of scale, and intentionally to allow them to compete in the consumer space.

      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
    2. Re:Everything old is new again. by saleenS281 · · Score: 2

      Unlikely. The time Blu-Ray saves in getting to the point on disk, it will lose in loading the media. Either way, access time will be measured in minutes, and do you really care if your data is returned in 3 minutes instead of 4? At that point I'd take the higher density and known reliability all day long. Not to mention, I know I'll be able to buy tape and parts for another decade; the same can't be said of Blu-ray.

    3. Re:Everything old is new again. by rahvin112 · · Score: 2

      I'm willing to bet tape has a MUCH longer life span too. CD-Rs start dying in less than 10 years; I doubt Blu-ray lasts any longer. Even the archival grade disks where they claim to last longer than 0 years I'm not sure I believe them. The nice thing about magnetic tape is that they tend to last forever and only go bad from wear or exposure to magnetic fields.

    4. Re:Everything old is new again. by evilviper · · Score: 2

      MO disks require (IIRC) a bit to be raised to a very high temperature to alter, while bluray just requires the organic dye to degrade (as they all do).

      There are at least 3 distinct types of Blu-ray discs: Commercially pressed, -R, and -RW (well, they call them -RE, but... meh).

      Only one of the three types uses an organic dye that degrades. Instead BD-RW has much in common with MO discs, and was reportedly the first format Sony developed, thanks to their existing MO technology.

      You'd have to be ignorant or foolish to rely on dye-based mediums like bluray for anything archival.

      You are sadly showing your ignorance of disc technology. I've handled enough of both in my time to make a far better judgment than an armchair expert.

      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
  4. There's just one problem with this... by Vengeance · · Score: 5, Funny

    When you first access this data, you have to sit through 42 previews before you get to it.

    --
    It was a joke! When you give me that look it was a joke.
  5. Re:But... by jeff4747 · · Score: 2

    Couldn't get all the way to the 7th sentence of the summary?

  6. Backup, not storage by gman003 · · Score: 3, Informative

    I read TFA. They're not using them as "storage" in the sense of active, accessible storage. It's a backup system.

    What they're trying is, instead of storing redundant copies of everything on multiple drives (for resilience and geolocality), they're keeping one copy live and keeping backups on blu-ray.

    So there's never a latency of minutes while it loads data from Blu-Ray, you just might be routed to Siberia or something to get the one active copy. If that copy's bad, error (restore from backup during next nightly batch or something).

  7. From the article ... by CaptainDork · · Score: 2

    "Those data demands will only increase with time, particularly as personal cameras and smartphones become capable of capturing higher-quality images."

    From Facebook: "We automatically take care of resizing and formatting your photos for you when you add them to Facebook."

    --
    It little behooves the best of us to comment on the rest of us.
  8. Re:Why not the "boring" Tape storage? by DoomSprinkles · · Score: 3, Insightful

    That's not really how tape systems work. Generally they keep an index online so you can tell the tape system to pop in a specific tape and go to a specific position. The longest load time I've personally witnessed in the real world: 10 minutes.

  9. Gotta be overhyped by duke_cheetah2003 · · Score: 2

    I dunno. I've never been pleased with the performance of optical media. I'd think being in a data center, heating up and cooling down from usage and storage is going to have very bad effects on recordable optical discs (CDs, DVDs, Blu-rays). Not to mention, it's a pretty well-known fact that consumer-recorded media (the ones with dyes and stuff) aren't terribly reliable in the long term. My personal experience with recordable optical media is poor at best; I have very, very few discs that've remained readable and error-free after just five years of relatively decent care and storage. And this is not even using them every day, heating them up and cooling them down, just stored in a dark cool place.

    Seems... overhyped. I simply can't come to believe this is an actual viable storage medium for any kind of large scale operation. But enh, if it works for them, good deal. Seems like you'd get more bang for your buck using high capacity tapes which hold up much better to heating up and cooling down.

    The power saving claim also seems silly. This could easily be done with standard hard drives in the kind of cartridge system they say they're using, powering down unused drives and putting them into a storage position (though for me, I think it'd be much smarter to make the connector the moving part and just plug into the right bank of HDs, instead of moving HDs around in a cartridge).

    The more I think about this operation, the less intelligent and efficient it seems to be.

  10. Tales of ~150,000ms access time by TheRealHocusLocus · · Score: 2

    Okay, so we need disc 101 from tray 1010101 and the robot arm is busy, three other fetches already in the queue. After 30,000ms client Javascript times out and substitutes a "retrieving data, re-try for a few minutes" placeholder, sets a longer camp-on timeout and releases the request.

    The reason the robotic arm is busy is that despite random assignment to storage pools with some localized album grouping, web crawler activity for public albums, and bulk pre-fetch requests for semi-private albums by browser plugins run by logged-in users (which became more popular as access time increased) ... the lukewarm storage facilities are running hot and queues are full most of the time.

    Despite the polished and smoothly functioning presentation that encourages the users to "just wait a bit" ... a dark rumor grows deep in the hearts of many that the data is not merely delayed, they must brush off dust and cobwebs, or root for it because it had been haphazardly tossed into a pile of rubbish somewhere, relegated to the digital Basement. Facebook does not think your photograph is of sufficient merit. Grandmother has long passed and you had not wished to look at her last week, so... why should you be interested now?

    The effects are complex, but the cause is clear: the Internet is perverse. It re-routes around any attempt to take immediate access data off-line by degrees, accomplishing this through a series of countermeasures such as unwelcome crawlers depleting your cache, hitting your 'public' cold data systematically and regularly, then finally bankrupting your company as users migrate to another service whose superior performance does not arise from superior engineering -- merely the fact that fewer users are using it.

    So the moral of the story is, if you are Facebook and wish to remain so, you will either strive to find a way to keep the random access time for everything down below 2000ms -- or die.

    And also, Facebook would be wise to heed the following:

    once / forgotten by tourists / a bicycle joined a herd of mountain goats /// with its splendidly turned horns / it became / their leader /// with its bell / it warned them / of danger /// with them / it partook / in romps / on the snow covered / glade /// the bicycle / gazed from above / on people walking; / with the goats /// it fought / over a goat, / with a bearded buck /// it reared up at eagles / enraged / on its back wheel /// it was happy / though it never / nibbled at grass /// or drank from a stream /// until once / a poacher / shot it /// tempted / by the silver trophy / of its horns /// and then / above the Tatras was seen / against the sparkling / January sky /// the angel of death erect / slowly / riding to heaven / holding the bicycle's / dead horns //////~Jerzy Harasymowicz

    --
    <blink>down the rabbit hole</blink>