Where Facebook Stores 900 Million New Photos Per Day
1sockchuck writes: Facebook faces unique storage challenges. Its users upload 900 million new images daily, most of which are only viewed for a couple of days. The social network has built specialized cold storage facilities to manage these rarely-accessed photos. Data Center Frontier goes inside this facility, providing a closer look at Facebook's newest strategy: Using thousands of Blu-Ray disks to store images, complete with a robotic retrieval system (see video demo). Others are interested as well. Sony recently acquired a Blu-Ray storage startup founded by Open Compute chairman Frank Frankovsky, which hopes to drive enterprise adoption of optical data storage.
They could just delete most of the photos after they age a bit, analyzing them with some of their AI whiz-bang software.
If anyone ever asks to see the image again, they can just show one that is "close enough" and nobody would ever know the difference.
I personally have never posted a photo to Facebook, so I'd be OK with that.
This issue is a bit more complicated than you think.
After 3 months of no views, just replace them with a goatse image.
That way, you only need to store one image which replaces 99.999% of all pics uploaded. No need for complex storage solutions!
Another advantage would be that you can serve it really, really fast. No wait time!
...gis sdrawkcab (usually not responding to ACs; don't bother posting as AC)
What happens when a user wants to delete an image permanently? If it's stored on an optical disc, are they going to destroy the whole disc and burn it again?
- Dan
Should have read:
You won't believe this one weird trick Facebook uses to store data!
Other than that, fascinating look at how all that data is being stored and retrieved.
is that their monthly AWS fees must be ENORMOUS!
is it cold in here or what?
I've noticed large latency for rarely used pictures in FB for over eight months now, and by large latency I mean visit the page, then come back the next day to see the next batch of > 5 year old pictures and wait another day for the final batch of ~10 years ago pictures.
People upload the same memes all the time. Just hash and store the common images and you'll reduce the unique photos to one or two unique images per day. :)
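For what it's worth, the "hash and store" idea can be sketched in a few lines. This is only an illustration, not how Facebook does it: `DedupStore` and its methods are made-up names, and an exact SHA-256 match only catches byte-identical uploads. A meme that has been re-saved or resized would hash differently and would need some kind of perceptual hashing instead.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical bytes are kept only once.

    Hypothetical illustration; names and structure are assumptions.
    """

    def __init__(self):
        self._blobs = {}   # digest -> image bytes (stored once)
        self._refs = {}    # digest -> number of uploads pointing at it

    def put(self, data: bytes) -> str:
        # The SHA-256 digest of the content serves as the storage key.
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self._blobs:
            self._blobs[digest] = data
        self._refs[digest] = self._refs.get(digest, 0) + 1
        return digest

    def unique_count(self) -> int:
        return len(self._blobs)

    def upload_count(self) -> int:
        return sum(self._refs.values())


store = DedupStore()
first = store.put(b"same meme bytes")
second = store.put(b"same meme bytes")   # duplicate upload, no new blob
other = store.put(b"somebody's lunch")
print(store.upload_count(), "uploads,", store.unique_count(), "unique blobs")
```

Keeping a reference count per digest also means a blob can only be dropped once every upload pointing at it has been deleted.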
An interesting question is at what point does it become viable for FB to follow Amazon's model to scale its own system as a business unit...
As in, when will FB conclude that it again needs to widen its revenue stream portfolio, and it therefore makes sense to offer its own version of AWS?
Any predictions on FBWS?
And there's the FB hardware development division, a business unit that so far has also remained in-house but has its own revenue potential. I think people tend to underestimate MZ's ambitions to leverage the FB core to create a broad spectrum business. (following Google's leverage of search revenue to devour the advertising business, etc., and Amazon's leveraging of book sales to devour retailing and then logistics, etc.)
Pretending this is my office full of bitter coworkers..
I think he just means that optical disks don't have quite the same amount of inertia or need to do internal self-checks as HDDs, before you can actually access them. They still "spin them up", but it happens in a few ms rather than on the order of 5-15s.
Wow, they discovered HSM only 40 years after it was introduced. Amazing.
Facebook has always compressed pictures. Nothing is stored at full size.
"Don't meddle in the affairs of a patent dragon, for thou art tasty and good with ketchup." ~ohcrapitssteve
Joking aside - it's always good practice to have electronic AND hard copies (optical disc, microfiche, paper) of all critical data, including copies off site. That way, even if some hackers from somewherestan manage to totally trash the company's electronic systems, the data can still be recovered.
Is facebook still a thing? People still use it after all the security problems and personal information screw-ups?
Just cruising through this digital world at 33 1/3 rpm...
How is using blu ray cheaper than hard drives? Not only is it slower, but the medium + the hardware to burn them + the robotic retrieval system..
Seems like there could be an easier solution to this: hard drives in racks. No robots, no optical drives, and no blu ray discs.
One 500gb hard drive already has 10x the amount of storage as a dual layer bluray. In fact, a 10 pack of dual layer blu ray discs on amazon costs twice as much as a 500gb 3.5" drive. Am I missing something?
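To put numbers on that comparison, here is a quick cost-per-GB sketch using only the ratios quoted above. The drive price is normalized to 1.0 because no actual dollar figures are given, and the 50 GB per dual-layer disc is the standard capacity implied by the "10x" claim. It deliberately ignores power, media lifetime, and bulk pricing, which is presumably where the retail comparison stops applying.

```python
# Relative cost per GB, using only the ratios from the comment above.
DRIVE_GB = 500            # one 3.5" hard drive
DISC_GB = 50              # one dual-layer Blu-ray disc
DISCS_PER_PACK = 10

DRIVE_PRICE = 1.0                 # normalized unit price (assumption)
PACK_PRICE = 2.0 * DRIVE_PRICE    # "costs twice as much" per the comment


def cost_per_gb(price: float, capacity_gb: float) -> float:
    """Price divided by capacity, in normalized units per GB."""
    return price / capacity_gb


hdd = cost_per_gb(DRIVE_PRICE, DRIVE_GB)
bluray = cost_per_gb(PACK_PRICE, DISC_GB * DISCS_PER_PACK)
ratio = bluray / hdd

print(f"10-pack capacity: {DISC_GB * DISCS_PER_PACK} GB")
print(f"Blu-ray costs {ratio:.1f}x as much per GB as the hard drive")
```

At retail prices the discs come out at about twice the cost per GB, before counting burners or robots.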
What critical data? Personal? Business?
At what point is it critical enough to go out of your way to store terabytes of data on CD/DVDs? Isn't an offline HD good enough?
I have done the following for a long time and I believe this is more than enough for most businesses
1. Backup to NAS (or equivalent)
2. Backup to offline disk (done monthly but could be done more often depending on business requirements)
3. Offsite Backup on the west coast (We are on the east coast)
At what point are you spending too much money securing data?
At what point are you being paranoid?
Those are all questions that will have different answers depending on the company and its IT/Ownership.
First thing I did was open facebook and look to see what my oldest picture was. I don't have that many and it came up pretty quickly but I'm sure lots of other people had the same impulse.
Replace images of people's food with a stock image, and they could dispense with this whole system.
Didn't we see a story about this last year?
It might be that using Blu-Ray autochangers may be a very useful thing to have, especially for something that can fill the gap between HDDs and LTO tapes for backups [1].
The pathetic thing is that this technology isn't new. We used to have 100, 200, even 400 disk CD and DVD carousels. By replacing the CD reader with a burner, and using 128 GB BDXL media, that means tens of terabytes of tamper-resistant (important with all the ransomware out there) WORM storage.
The trick is getting BD media into the terabytes and getting it at a price point where it is decently affordable. For example, a 100 GB BDXL disk is $65, but it should be about 10% of that price in order to be a viable backup medium.
[1]: The cloud isn't an option in a number of cases (WAN bandwidth isn't cheap), and it is only a matter of time before a major provider gets hacked.
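The figures in this comment are easy to check with a little arithmetic. The sketch below uses only the numbers quoted above (128 GB BDXL media, 100/200/400-slot carousels, $65 per 100 GB disc, a target of about 10% of that price) and decimal terabytes. The function and constant names are just for illustration.

```python
BDXL_GB = 128                 # capacity per disc, from the comment
PRICE_100GB_DISC_USD = 65.0   # quoted price of a 100 GB BDXL disc
TARGET_FRACTION = 0.10        # "about 10% of that price"


def carousel_capacity_tb(slots: int, disc_gb: int = BDXL_GB) -> float:
    """Total capacity of a fully loaded carousel in decimal TB."""
    return slots * disc_gb / 1000


for slots in (100, 200, 400):
    print(f"{slots}-disc carousel: {carousel_capacity_tb(slots):.1f} TB")

current_per_gb = PRICE_100GB_DISC_USD / 100
target_per_gb = current_per_gb * TARGET_FRACTION
print(f"current: ${current_per_gb:.3f}/GB, target: ${target_per_gb:.3f}/GB")
```

So the "tens of terabytes" claim holds for every carousel size mentioned, and the price target works out to about $6.50 per 100 GB disc.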
In the cloud, obvs.
systemd is Roko's Basilisk.
it would have answered your questions, and you wouldn't have looked like a tool, and i wouldn't have mocked you. the world would have been a better place! if only.
What I don't get is why FB doesn't just use tape. Tape drives are expensive, but the media itself is cheap -- LTO-4 cartridges are $15 apiece, and tape is a true archival grade media.
Plus, with tape, you copy it to that, yank the tapes out of the autochanger, and toss them in an unused corner of a room. Tapes take 0 watts in storage (other than what it takes for HVAC), so other than physical access concerns, they are easily stashed and will remain usable for quite a long time.
If any industry needs a kick in the pants with regard to capacity improvements, it is the tape media industry. A tape has far more area to put data on than a HDD platter, so there is a lot of room to add capacity, as well as reduce price with cartridges and drives, especially if mass produced so economies of scale kick in. Back in the 1990s, almost any business had some form of tape drive, which worked fairly decently for backups (although 4mm/8mm drives are nowhere near as reliable as an LTO drive.)
No, tape isn't trendy... but it functions well, and with WORM media or hardware write protection, it is resistant to malware. With hardware encryption in newer revs (LTO-4 and newer), it is trivial to just set a password and call it done when it comes to that security... that way, if a tape falls off the Iron Maiden truck, it is just a hardware loss... no worry about compromised data.
The nsa built that huge data center in Utah for nothing?
Now if the nsa would just open an api to retrieve it....
They resize them first, then compress. A 3-5 MB pic is stored at around 10% of the uploaded size.
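A back-of-the-envelope sketch of what that ratio would mean at scale. Big assumption, labeled loudly: it pretends every one of the 900 million daily uploads from the summary is a 3-5 MB original, which is almost certainly an overestimate, and it takes the ~10% figure above at face value. Treat the result as an upper-end guess, not a real number from Facebook.

```python
UPLOADS_PER_DAY = 900_000_000   # from the story summary
STORED_PERCENT = 10             # "around 10% of the uploaded size"
MB_PER_TB = 1_000_000           # decimal units


def stored_tb_per_day(original_mb: int) -> int:
    """Daily stored volume in TB if every upload were original_mb MB."""
    stored_mb = UPLOADS_PER_DAY * original_mb * STORED_PERCENT // 100
    return stored_mb // MB_PER_TB


low = stored_tb_per_day(3)
high = stored_tb_per_day(5)
print(f"roughly {low} to {high} TB stored per day under these assumptions")
```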
Don't be apathetic. Procrastinate!
What I don't get is why FB doesn't just use tape
Because of the seek time. They still want the content available, and the Blu-ray method yields a 10 second delay from what I read (I may have read that wrong).
Plus, with tape, you copy it to that, yank the tapes out of the autochanger, and toss them in an unused corner of a room. Tapes take 0 watts in storage (other than what it takes for HVAC)
They can't just toss it. That's the whole point of the article. They still need access on demand.
I think that the Blu-ray solution is cheap too. The article was making reference to how much colder that area was (because of the lower HVAC requirements, I assume).