Slashdot Mirror


How Facebook Stores Billions of Photos

David Gobaud writes "Jason Sobel, the manager of infrastructure engineering at Facebook, gave an interesting presentation titled Needle in a Haystack: Efficient Storage of Billions of Photos at Stanford for the Stanford ACM. Jason explains how Facebook efficiently stores ~6.5 billion images, in 4 or 5 sizes each, totaling ~30 billion files and 540 TB, and serves 475,000 images per second at peak. The presentation is now online here in the form of a Flowgram."

12 of 154 comments

  1. Transcript? by dstar · · Score: 5, Insightful

    I don't suppose there's a transcript of this anywhere, is there? That + slides would be infinitely more useful....

  2. Re:Photos? You mean people use FB for photos too? by oskard · · Score: 4, Insightful

    I think you're thinking of MySpace.

    If you used the service, you'd know that Facebook privacy settings are actually implemented very well. For example, I set up an account for my mother so she can look at all her siblings' photos. She hasn't been bothered by anyone outside of the family, and is really enjoying the ability to communicate with everyone.

    The best thing I can compare it to is AOL. It's got a built-in email clone, IM service, forums, groups, and of course, profiles. But unlike AOL, Facebook is just a web page. There's no lock-in - it's more of a resource provider than a service provider.

    --
    Sigs are for Terrorists.
  3. Full sized images, please by bucky0 · · Score: 5, Insightful

    I wish that Facebook wouldn't resize its images on the backend. My friends all post pictures from parties, trips, etc. there, and I'd love to be able to just download the full-res version to send off to be printed, but Facebook resizes the largest dimension to be ~600px, which is pretty worthless for printing.

    Yeah, yeah, there are other sites that don't, and I post my stuff there (to Flickr, personally), but convincing that one person who took the nice photo of you to do it too is near impossible.

    --

    -Bucky
  4. Re:Slashdotted by elronxenu · · Score: 1, Insightful

    That's for sure.

    Plus, when their server (singular?) finally responded to me, it required a later version of Flash than I have. So I can't read the presentation at all. Way to not get the word out, folks.

  5. Re:Photos? You mean people use FB for photos too? by 0100010001010011 · · Score: 5, Insightful

    This stuff is cool either way, even if it is just "childish spam." Many of us only dream of working on something that will become this large-scale.

    Facebook started off (stolen idea or not) as a site with some PHP and a database. In the early years there were no applications or photos. They've managed to scale PHP beyond what most Slashdotters will say PHP can even do. They've even contributed some of their stuff back to the PHP community.

    Look at some other similar 'home grown' sites that have had to quickly scale and invent stuff just to stay afloat.
    Archive.org has their Petabox.
    Google has their Google File System and all of their own hardware design.

    Hopefully the site will recover. 540TB of data and 500k images per second while at the same time being able to process photos near instantly in the background to 4-5 different sizes is nothing to ignore. Fortune 500 companies could probably learn a thing or two...

  6. Not hard by mlwmohawk · · Score: 5, Insightful

    While the article is slashdotted, this is not a hard problem. It has an expense involved, but it is not difficult.

    So, as another poster implied, that's 18 KB per photo on average, so about 8.5 GB per second at peak.
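
    The arithmetic behind those two figures can be checked with a short sketch. It uses only the numbers quoted in the summary (540 TB, ~30 billion files, 475,000 images per second) and assumes decimal units. It also assumes the images served at peak have the same average size as the files in storage, which is probably not true if thumbnails dominate the traffic.

```python
# Figures quoted in the story summary.
# Assumption: decimal units, i.e. 1 TB = 10**12 bytes.
TOTAL_BYTES = 540 * 10**12
TOTAL_FILES = 30 * 10**9
PEAK_IMAGES_PER_SEC = 475_000

# Average stored file size across all sizes of all photos.
avg_bytes_per_file = TOTAL_BYTES / TOTAL_FILES

# Assumption: served images have the same average size as stored files.
peak_bytes_per_sec = avg_bytes_per_file * PEAK_IMAGES_PER_SEC

print(f"average file size: {avg_bytes_per_file / 1000:.0f} KB")
print(f"peak throughput:   {peak_bytes_per_sec / 10**9:.2f} GB/s")
```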

    So, assuming that the pictures are evenly distributed, you'd need a bunch of machines and a good number of "tubes" and a way of directing requests to the correct image server or server cluster.

    So, what's the problem? Why would you think this is difficult? It's all off the shelf technology, just a bunch of it.
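
    A minimal sketch of the "directing requests to the correct image server" step described above, assuming the simplest possible scheme: hash the photo ID and size, then take the result modulo the number of servers. The server names, the `server_for` function, and the size labels are all hypothetical, and nothing here is claimed to be how Facebook actually does it.

```python
import hashlib

# Hypothetical server pool; a real deployment would load this from configuration.
IMAGE_SERVERS = [f"img{n:02d}.example.com" for n in range(8)]


def server_for(photo_id: int, size: str, servers=IMAGE_SERVERS) -> str:
    """Pick a server deterministically from the photo ID and size label."""
    key = f"{photo_id}:{size}".encode()
    digest = hashlib.md5(key).digest()
    # Use the first 8 bytes of the digest as an integer bucket selector.
    bucket = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[bucket]


print(server_for(12345, "thumb"))
```

    The catch with plain modulo hashing is that changing the number of servers remaps most keys, so adding capacity means moving most of the data. That is one reason the routing layer tends to need more design work than the sketch suggests.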

  7. Akamai? by ruiner13 · · Score: 2, Insightful

    Why don't they just use a 3rd party distributed storage system like Akamai NetStorage? Then they don't have to worry about adding capacity, redundancy, etc. All they have to do is upload the picture there, and Akamai mirrors it all around the world.

    --

    today is spelling optional day.

  8. Re:FLASH?! by Firehed · · Score: 2, Insightful

    I think it's becoming part of the HTML5 spec; however, it's tremendously more complicated due to the limitless plethora of video formats. With web-oriented images, it's almost all jpegs for photos and typically pngs for graphics, with plenty of gifs around. Tiff is a very established format but never sees use in websites since the files are stupidly large, and most other formats are specific to some editing program. With video, you've got half a dozen Quicktime formats, DivX, XviD, h.264, x264, WMV, Real, and a huge number of others (many of which are pro-oriented). Never mind the play/pause/scrubbing interface (which could become yet another CSS nightmare), the much bigger file size, the audio, auto-playing, etc.

    Until there's a jpeg for video, I'd say we should leave it alone. Flash is currently fulfilling that role, and all things considered does it reasonably well given the ease of implementation.

    --
    How are sites slashdotted when nobody reads TFAs?
  9. I find by msimm · · Score: 5, Insightful

    That if you plan to do it (or hope to) it helps to read the ups and downs of people who already have. And it's *nice* that some take the time out (as /. did and a number of other sites) to talk about it so that we can learn from their experience and mistakes.

    But if you already know everything, by all means, shoot. But the outline that just got you modded as insightful isn't an application, didn't detail redundancy of any sort and would be a management nightmare (ie, all the interesting stuff).

    I mean really, we could propose that solution to just about any web-based application, but that's hardly the story, is it?

    --
    Quack, quack.
  10. When you are talking 500 TB, you hit limits by Anonymous Coward · · Score: 5, Insightful

    Limits, like: NetApp filers max out at 16 TB (raw) per volume, so you have to start using multiple volumes and get creative with mount points and hope you don't hit some other limit (max files/inodes, addressing limits of the OS/FS, etc). The harder part is the "way of directing requests to the correct image server/cluster" you mention. It's not quite "off the shelf" technology, as you now have to implement something that can handle the 475,000+ requests per second and point them in the right direction for a single entry in a pool of 30,000,000,000. And that's just images; you still have to route and serve the rest of the content for the pages. At those levels, a simple F5 load balancer is not going to cut it. Stacking a bunch of F5s still won't do. This will probably be distributed across several DCs stretched across distant geographical areas with some DNS magic to route traffic to locally close DCs. Keeping even the indexes in sync so the requests can be rerouted to the proper DC (if not stored locally) becomes an interesting problem to solve.
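
    To put rough numbers on the volume limit mentioned above, here is a back-of-the-envelope sketch. It takes the 16 TB per-volume figure from this comment and the 540 TB and ~30 billion file figures from the summary, and it assumes a single copy of the data with no redundancy or free-space overhead, so real volume counts would be higher.

```python
import math

TOTAL_TB = 540            # total photo storage, from the summary
MAX_VOLUME_TB = 16        # raw per-volume limit quoted in the comment
TOTAL_FILES = 30 * 10**9  # ~30 billion files, from the summary

# Minimum number of volumes if every volume were filled completely.
min_volumes = math.ceil(TOTAL_TB / MAX_VOLUME_TB)

# Average file count per volume, which is what runs into inode limits.
files_per_volume = TOTAL_FILES / min_volumes

print(f"minimum volumes:  {min_volumes}")
print(f"files per volume: {files_per_volume:,.0f}")
```

    Even in this best case, each volume would hold several hundred million files, which is where the per-filesystem file and inode limits start to matter.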

    No, I don't work for them, but I do work for another company facing similar storage/distribution problems. When things get this big, it's not simply "take what works and just make it bigger or get more of them"; you have to start redesigning things. For a bad car analogy: it's like saying a passenger train is just a bunch of Greyhound buses.

    tm

  11. Re:I dunno. by 7+digits · · Score: 5, Insightful

    In the late '90s we stopped using documents (with images and text), because they had the following disadvantages:

    1) Printable
    2) Searchable
    3) You could look over them at a glance to find information

    We replaced them by the fabulous presentation with voice-over.

    It removed part of the ability to scan over information, to search, and to print.

    Unfortunately, it still had the disadvantage of letting the user seek to some part of the presentation, so another iteration was needed.

    Now, welcome to the 21st century. Thanks to Flowgram, you don't have to worry about printing anymore (you can't), or searching (you can't), or even pausing, going forward, or doing anything (you can't).

    If you get a phone call in the middle of the presentation, tough luck. And of course, you have no way of knowing how long it is, how long is left, or anything. And if you miss a word or a sentence, you can always restart the presentation and listen more carefully the next time.

    I must congratulate the folks over at flowgram.com. It seems very hard to have some idea that could be less usable. I'm pretty sure there is someone somewhere working hard at this, and some VC will give him money for that, but, for now, if you want to put a shitty unusable presentation online, Flowgram is the way to go.

  12. Re:Photos? You mean people use FB for photos too? by vux984 · · Score: 4, Insightful

    If you used the service, you'd know that Facebook privacy settings are actually implemented very well.

    Given that I can't look at my sister's photos without signing up for an account, I'd say her privacy is being 'protected' solely to induce all her friends and siblings to sacrifice theirs by joining Facebook.

    I set up an account for my mother so she can look at all her siblings' photos.

    You don't need Facebook for that.

    and is really enjoying the ability to communicate with everyone.

    Or that.

    But unlike AOL, Facebook is just a web page. There's no lock-in - it's more of a resource provider than a service provider.

    How exactly is requiring me to create and log in to a Facebook account to view content someone else wants me to be able to see not lock-in?

    That's like requiring me to create a Gmail account to receive email from people with Gmail accounts. Or requiring me to sign up to AOL to see websites hosted by AOL. Facebook is pretty much the definition of lock-in.