How Facebook Stores Billions of Photos
David Gobaud writes "Jason Sobel, the manager of infrastructure engineering at Facebook, gave an interesting presentation titled 'Needle in a Haystack: Efficient Storage of Billions of Photos' at Stanford for the Stanford ACM. Jason explains how Facebook efficiently stores ~6.5 billion images, in 4 or 5 sizes each, totaling ~30 billion files and 540 TB, while serving 475,000 images per second at peak. The presentation is now online in the form of a Flowgram."
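A quick back-of-the-envelope check of the summary's figures, assuming decimal units (1 TB = 10^12 bytes). This is just arithmetic on the quoted numbers, not a figure from the talk: 4 or 5 sizes per image lands near the quoted file count, and 540 TB spread over ~30 billion files works out to an average of roughly 18 KB per file.

```latex
6.5\times10^{9}\ \text{images} \times (4 \text{ to } 5)\ \text{sizes} \approx (2.6 \text{ to } 3.25)\times10^{10} \approx 3\times10^{10}\ \text{files}

\frac{540\ \text{TB}}{3\times10^{10}\ \text{files}} = \frac{5.4\times10^{14}\ \text{B}}{3\times10^{10}} = 1.8\times10^{4}\ \text{B} \approx 18\ \text{KB per file}
```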
To view the slideshow... err, I mean 'flowgram' (whatever the fuck that's supposed to mean), you don't need to register.
Perhaps I should turn on audio, or they should have a less friggin confusing UI.
Fascinating presentation for those of you who actually bother to watch the hour or so of content.
Not only that, but the UK Facebook site has been down most of the afternoon - some infrastructure, huh?
I do have JavaScript installed, and am running Adobe Flash (Linux version). Doesn't work :(
Worked for me from Ubuntu.
This stuff is cool either way, even if it is just "childish spam." Many of us can only dream of working on something that will become this large-scale.
...Fortune 500 companies could probably learn a thing or two...
This Fortune 500 company could teach a thing or two on this subject. DataTree has been doing this since before 1999. With over 40 billion land records online and 600+ TB of data, they deliver many millions of images daily. Not to put down Facebook's implementation, but DataTree does not need to run 10k web servers and 1,800 SQL databases to serve images. It is nice to see the scalability of their design, but that does not mean it is the most efficient way to do things, or the one to follow and learn from. I don't work for FB, but for a company that does make use of third-party caching networks for very large content distribution.
We had some problems in the beginning, but the server should be much better now.
Get the Big Photo application.
It's not ideal, but it works quite well. A friend of mine is a professional photographer and she puts all her work up there. Works well for her.
(summarizing the big long presentation)
This is basically an attempt to build a user-mode GoogleFS. Their biggest problem is reducing disk reads, which are hampered by POSIX filesystem overhead (inodes, metadata, etc.).
Instead they use a database-like index/data file arrangement. The index stays in memory, and the photos are stored together, contiguously, in a single large file. It's possible to use a raw LUN for storage, but they're not there yet.
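The index/data-file arrangement described above can be sketched as a minimal Python toy. It assumes a hypothetical record layout (an 8-byte photo key and a 4-byte length in front of each photo) and a hypothetical class name `ToyHaystack`; it is not Facebook's actual on-disk format, and it skips deletion, checksums, and compaction.

```python
import os
import struct
import tempfile

# Hypothetical record header: 8-byte photo key + 4-byte payload length.
HEADER = struct.Struct("<QI")


class ToyHaystack:
    """Toy store: one append-only data file plus an in-memory index.

    The index maps key -> (payload offset, payload size), so a read needs
    no per-photo filesystem metadata: one dict lookup, one seek, one read.
    """

    def __init__(self, path):
        self.index = {}
        # "w+b" would truncate, so only use it when the file is new.
        self.f = open(path, "r+b" if os.path.exists(path) else "w+b")
        self._rebuild_index()

    def _rebuild_index(self):
        # Scan record headers once at startup to rebuild the in-memory index.
        file_size = os.fstat(self.f.fileno()).st_size
        offset = 0
        while offset + HEADER.size <= file_size:
            self.f.seek(offset)
            key, size = HEADER.unpack(self.f.read(HEADER.size))
            if offset + HEADER.size + size > file_size:
                break  # truncated trailing record; ignore it
            # Later records with the same key overwrite earlier index entries.
            self.index[key] = (offset + HEADER.size, size)
            offset += HEADER.size + size

    def put(self, key, data):
        # Append header + payload at the end of the single large file.
        self.f.seek(0, os.SEEK_END)
        offset = self.f.tell()
        self.f.write(HEADER.pack(key, len(data)))
        self.f.write(data)
        self.f.flush()
        self.index[key] = (offset + HEADER.size, len(data))

    def get(self, key):
        # In-memory lookup, then a single seek + read of the payload.
        offset, size = self.index[key]
        self.f.seek(offset)
        return self.f.read(size)

    def close(self):
        self.f.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "volume.dat")
    store = ToyHaystack(path)
    store.put(1, b"thumbnail bytes")
    store.put(2, b"full-size bytes")
    store.close()
    # Reopening rebuilds the index by scanning the data file.
    reopened = ToyHaystack(path)
    print(reopened.get(2))  # prints b'full-size bytes'
    reopened.close()
```

Keeping the index in RAM is the point of the design: a lookup costs one seek and one read of the big file, instead of the directory and inode reads that a file-per-photo layout can incur.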
There... where's my cookie?
(Oddly enough, I'm writing the exact same code they are... bizarre world, eh??)
That won't work considering the number of files. Given the quote we got from those idiots (which required nearly a year of hassle with the Akamai morons and signing an NDA, hence the AC post), it would cost us almost $200k/year, given our bandwidth use, to store ~1,000 files. Facebook has 30 billion files, and assuming the same price per file as we were quoted ($200/file/year), Akamai would charge $6,000,000,000,000/year to host them. To put that number in perspective, that's more than the GDP of Germany plus that of the UK. The Akamai VP I talked to (something Danzig; I remember the name because of the band of the same name) just wasn't able to comprehend why we wouldn't consider paying that much to host a few files. We ended up renting five 1U servers in four different countries for about $15k/year. While we have a little less total bandwidth and it requires more management time to maintain, it's only 7.5% of the cost of Akamai, and we can store many (1,000 times?) more files than Akamai would let us at that price point. To say that the Akamai guys don't understand math is an understatement.
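The cost extrapolation above is easy to re-check. This small Python script only redoes the poster's arithmetic; like the poster, it assumes Akamai's price scales linearly per file, which is an assumption rather than anything from an actual Akamai price list.

```python
# Figures quoted in the comment; linear per-file pricing is the poster's assumption.
QUOTE_PER_YEAR = 200_000         # "almost $200k/year"
FILES_QUOTED = 1_000             # "~1,000 files"
FACEBOOK_FILES = 30_000_000_000  # "~30 billion files"
OWN_SERVERS_PER_YEAR = 15_000    # "about $15k/year"

# Price per file per year implied by the quote.
per_file = QUOTE_PER_YEAR // FILES_QUOTED
# Same per-file price applied to Facebook's file count.
extrapolated = per_file * FACEBOOK_FILES
# Self-hosting cost as a percentage of the Akamai quote.
percent_of_quote = OWN_SERVERS_PER_YEAR * 100 / QUOTE_PER_YEAR

print(f"${per_file}/file/year")          # prints $200/file/year
print(f"${extrapolated:,}/year")         # prints $6,000,000,000,000/year
print(f"{percent_of_quote}% of the quote")  # prints 7.5% of the quote
```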