How Facebook Stores Billions of Photos
David Gobaud writes "Jason Sobel, the manager of infrastructure engineering at Facebook, gave an interesting presentation titled Needle in a Haystack: Efficient Storage of Billions of Photos at Stanford for the Stanford ACM. Jason explains how Facebook efficiently stores ~6.5 billion images in 4 or 5 sizes each, for a total of ~30 billion files and 540 TB, while serving 475,000 images per second at peak. The presentation is now online here in the form of a Flowgram."
I thought it was created just so that you could have all your spam and silly forwards in one place.
Against stupidity the gods themselves contend in vain.
But seeing as how this just got posted and it's already Slashdotted, I'll bet it's not the same way Flowgram stores its presentations.
Ohhhh boy, cue the pr0n jokes in 3... 2... 1...
at Flickr
More music, fewer hits
"You either have javascript turned off or you have an older version of Adobe Flash."
That was an informative article but I didn't see anything about Facebook. At least there weren't ads and they kept it to one page!
Perhaps I should turn on audio, or they should have a less friggin confusing UI.
Fascinating presentation, for those of you who actually bother to watch the hour or so of content.
Does anyone see the irony in Flowgram's demonstration?
Flowgram Guy 1: "OK, this is how Facebook stores billions of photos and serves thousands of them each second"
Flowgram Guy 2: "Cool, maybe we should implement that technology"
Flowgram Guy 1: "Why? It's not as if we're ever going to have our servers swamped with thousands of requests..."
Summation 2
Well, I guess they're helping the storage problem by disabling the ability to add photos to groups, which seems to have happened in the last few days (for a few people, at least)...
I know /.'ers don't admit to having Facebook accounts, but here's a link in case any lurker wants to see the comments about this issue.
Let's all go look at pictures on fb from 12 noon EST to 12:05 EST. That ought to show them...
I <3 Myspace hunni!
Cheers!
Atheist: Buddhist in a Prius
Alternate source?
I don't suppose there's a transcript of this anywhere, is there? That plus the slides would be infinitely more useful...
I don't know if this was posted on Slashdot before and I'm too lazy to look, but this article from Rolling Stone about the founder of Facebook seems far more interesting than a Slashdotted hour-long Flash presentation.
I wish Facebook wouldn't resize its images on the backend. My friends all post pictures from parties, trips, etc. there, and I'd love to be able to just download the full-res version to send off to be printed, but Facebook resizes the largest dimension to ~600px, which is pretty worthless for printing.
Yeah, yeah, there are other sites that don't, and I post my stuff there (to Flickr, personally), but convincing that one person who took the nice photo of you to do the same is nearly impossible.
-Bucky
Flowgram serving 475,000 /. users flawlessly: now that would be impressive.
RS
Shoes for Industry. Shoes for the Dead.
Some kind of problem with their Engrams
Better check what KSW says to do when slashdotted ...
alpha.flowgram.com
While the article is slashdotted, this is not a hard problem. It has an expense involved, but it is not difficult.
So, as another poster implied, that's about 18 KB per photo on average (540 TB / ~30 billion files), which works out to about 8 GB per second at peak (18 KB × 475,000 images per second).
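The arithmetic in that comment can be checked in a few lines. This is only a sketch: it assumes decimal units (1 TB = 10^12 bytes) and that the average stored file is also the average file served, which need not hold if small thumbnails dominate the request mix.

```python
# Back-of-envelope check of the figures quoted in the story summary.
# Assumption: decimal units, and the average stored file size equals the
# average served file size (thumbnails may skew this in practice).

PHOTOS = 6.5e9               # ~6.5 billion uploaded images
TOTAL_FILES = 30e9           # ~30 billion files across all stored sizes
TOTAL_BYTES = 540e12         # ~540 TB
PEAK_IMAGES_PER_S = 475_000  # peak serving rate

# 4 or 5 stored sizes per photo should bracket the ~30 billion file count.
files_low, files_high = PHOTOS * 4, PHOTOS * 5

avg_file_bytes = TOTAL_BYTES / TOTAL_FILES
peak_bytes_per_s = avg_file_bytes * PEAK_IMAGES_PER_S

print(f"files: {files_low / 1e9:.1f} to {files_high / 1e9:.1f} billion")
print(f"average file size: {avg_file_bytes / 1e3:.0f} KB")
print(f"peak egress: {peak_bytes_per_s / 1e9:.2f} GB/s "
      f"(~{peak_bytes_per_s * 8 / 1e9:.1f} Gbit/s before protocol overhead)")
```

This prints 26.0 to 32.5 billion files, an 18 KB average, and 8.55 GB/s (about 68.4 Gbit/s), consistent with the "about 8 gig per second" estimate.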
So, assuming that the pictures are evenly distributed, you'd need a bunch of machines and a good number of "tubes" and a way of directing requests to the correct image server or server cluster.
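The comment leaves the "way of directing requests" unspecified. One common technique, swapped in here as an assumption rather than anything the presentation is reported to use, is consistent hashing: each photo ID hashes onto a ring of servers, so every front end can compute the owning server without a lookup table. `HashRing`, `server_for`, the server names, and `replicas=100` are all hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit hash taken from MD5; used only to spread keys, not for security.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring mapping photo IDs to image servers (hypothetical)."""

    def __init__(self, servers, replicas=100):
        # Each server gets `replicas` virtual points on the ring so that
        # keys spread roughly evenly across servers.
        self._ring = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(replicas)
        )
        self._points = [h for h, _ in self._ring]

    def server_for(self, photo_id: str) -> str:
        # Walk clockwise to the first virtual point after the key's hash,
        # wrapping around to the start of the ring past the last point.
        idx = bisect.bisect(self._points, _hash(photo_id)) % len(self._ring)
        return self._ring[idx][1]


if __name__ == "__main__":
    ring = HashRing(["img01", "img02", "img03"])
    print(ring.server_for("photo-12345"))
```

The design choice here is that adding a server only moves the keys that land on the new server's points; every other photo keeps its owner, which matters when "a bunch of machines" keeps growing.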
So, what's the problem? Why would you think this is difficult? It's all off the shelf technology, just a bunch of it.
Why don't they just use a 3rd party distributed storage system like Akamai NetStorage? Then they don't have to worry about adding capacity, redundancy, etc. All they have to do is upload the picture there, and Akamai mirrors it all around the world.
Today is spelling-optional day.
I use Facebook every day, and looking at photo albums and pictures on their site is horribly slow. I consider their implementation an example of something that still needs improvement.
I put some short video clips on Facebook's video application (just stuff of my daughter for my friends and family to see). These are AVI files generated by my digital camera, about 20-30MB in size, lasting about 1-1.5 minutes each.
They uploaded pretty quickly, but then they were put in a queue to be encoded for their Flash player. It took over 3 days for them to show up in my profile! It seems they need not just large storage capacity, but a lot more CPU for processing.
Look at the tomato! Isn't it sad? He can't dance! Poor tomato!
Next article: how to effectively serve a Flowgram that's referenced on Slashdot.
This stuff is cool either way, even if it is just "childish spam." Many of us can only dream of working on something that becomes this large-scale.
...Fortune 500 companies could probably learn a thing or two...
This Fortune 500 company could teach a thing or two on this subject. DataTree has been doing this since before 1999. With over 40 billion land records online and 600+ TB of data, they deliver many millions of images daily. Not to put down Facebook's implementation, but DataTree does not need to run 10k web servers and 1,800 SQL databases to serve images. It is nice to see the scalability of their design, but that does not mean it is the most efficient way to do things, or the one to follow and learn from.
If you plan to do it (or hope to), it helps to read about the ups and downs of people who already have. And it's *nice* that some take the time (as /. and a number of other sites did) to talk about it, so that we can learn from their experience and mistakes.
But if you already know everything, by all means, shoot. But the outline that just got you modded as insightful isn't an application, doesn't address redundancy of any sort, and would be a management nightmare (i.e., all the interesting stuff).
I mean, really, we could propose that solution for just about any web-based application, but that's hardly the story, is it?
Quack, quack.
No, I don't work for them, but I do work for another company facing similar storage/distribution problems. When things get this big, it's not simply "take what works and just make it bigger or get more of them"; you have to start redesigning things. For a bad car analogy: it's like saying a passenger train is just a bunch of Greyhound buses.
tm
We had some problems in the beginning, but the server should be much better now.
Simple: 70 thousand pen drives.
alias possession='chmod 666 satan && ls'
Nothing to see here, folks, move on.
(summarizing the big long presentation)
Basically, they want to build a user-mode GoogleFS. Their biggest problem is reducing disk reads, which are hampered by POSIX filesystem overhead (inodes, metadata, etc.).
Instead, they use a database-like index/data file arrangement. The index stays in memory, and the photos are stored together in large contiguous regions of a single file. It's possible to use a LUN for storage, but they're not there yet.
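The index/data file arrangement described above can be sketched as an append-only data file plus an in-memory dictionary from photo ID to (offset, size), so a read needs one seek and one contiguous read instead of several filesystem metadata lookups. This is a toy sketch assuming a single writer; `NeedleStore`, `put`, and `get` are hypothetical names, not Facebook's code, and it omits what a real store needs (rebuilding the index after a crash, deletes, checksums).

```python
import os
import tempfile


class NeedleStore:
    """Toy append-only photo store with an in-memory index (hypothetical)."""

    def __init__(self, path: str):
        # photo_id -> (byte offset in the data file, length in bytes)
        self._index = {}
        open(path, "ab").close()       # create the data file if it is missing
        self._f = open(path, "r+b")    # one large file for every photo

    def put(self, photo_id: str, data: bytes) -> None:
        # Append the photo bytes at the end of the data file.
        self._f.seek(0, os.SEEK_END)
        offset = self._f.tell()
        self._f.write(data)
        self._f.flush()
        self._index[photo_id] = (offset, len(data))

    def get(self, photo_id: str) -> bytes:
        # The in-memory lookup replaces per-file inode/directory reads;
        # the disk only sees one seek plus one contiguous read.
        offset, size = self._index[photo_id]
        self._f.seek(offset)
        return self._f.read(size)

    def close(self) -> None:
        self._f.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = NeedleStore(os.path.join(d, "volume.dat"))
        store.put("photo-1", b"first jpeg bytes")
        store.put("photo-2", b"second jpeg bytes")
        print(store.get("photo-2"))
        store.close()
```

Keeping the whole index in RAM is the trade-off: with tens of billions of photos, each index entry has to stay small for this to fit in memory.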
There... where's my cookie?
(Oddly enough, I'm writing the exact same code they are... bizarre world, eh??)
I said no... but I missed and it came out yes.
All of those hacks, they should have just written a filesystem.