Slashdot Mirror


New Linux Petabyte-Scale Distributed File System

An anonymous reader writes "A recent addition to Linux's impressive selection of file systems is Ceph, a distributed file system that incorporates replication and fault tolerance while maintaining POSIX compatibility. Explore the architecture of Ceph and learn how it provides fault tolerance and simplifies the management of massive amounts of data."

39 of 132 comments

  1. History by Alcoholic+Dali · · Score: 4, Informative

    Ceph was designed by Sage Weil (of WebRing fame), who is also one of the founders of DreamHost. They will likely be using it internally soon, if they aren't already. http://en.wikipedia.org/wiki/DreamHost

    1. Re:History by TooMuchToDo · · Score: 4, Informative
      http://www.dreamhost.com/jobs.html

      FILE SYSTEMS SOFTWARE ENGINEER
      Los Angeles, CA

      New Dream Network has a vacancy for a Senior File Systems Software Engineer in Los Angeles, CA. Minimum requirements – Master’s degree in Computer Science or Computer Engineering, minimum of 2 years experience in storage programming, and background in Linux kernel programming, file systems development, network programming and Operating Systems design.

      Qualified applicants should send a plain text resume to cephjobs@dreamhost.com

    2. Re:History by volcan0 · · Score: 2, Insightful

      I always liked DreamHost (not for their uptime...), and this just confirms it. It is good to see a company that uses open source software contributing back like this.

    3. Re:History by MichaelSmith · · Score: 2, Funny

      Qualified applicants should send a plain text resume

      Ha! That'll cut down on the noise. I wonder how many job seekers have ever heard of plain text?

    4. Re:History by John+Hasler · · Score: 3, Funny

      "Plain text". That's just a Microsoft Word document with no embedded images or graphs or anything, right?

      --
      Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
    5. Re:History by ae1294 · · Score: 4, Funny

      I sent mine in ANSI format so I could blink my contact info...

  2. Is data integrity really necessary for large data? by BadAnalogyGuy · · Score: 2, Interesting

    Look at Google and Facebook, arguably among the top users of massive databases. They have petabytes upon petabytes of data stored and are constantly growing. But what happens if they lose some data?

    Nothing. They can always go back and regenerate that data. It's just a matter of time.

    So at this large scale, it doesn't make any sense at all to focus on data integrity beyond making sure that fopen() and fread() don't return garbage. It's the smaller databases that contain critical information that need data integrity. These are typically sub-terabyte, though some may creep over that limit in a few uncommon instances.

    And realistically, if you don't want your data to be hacked up, lost, then thrown out with a bad drive, ReiserFS or any other modern journaling filesystem is the right choice.

    I wouldn't bet money on distributed filesystems just yet.

  3. Is it ready for primetime? by Meshach · · Score: 5, Informative

    The headline in the Ceph wiki: Ceph is under heavy development, and is not yet suitable for any uses other than benchmarking and review.

    --
    "Maybe this world is another planet's hell"
    Aldous Huxley
  4. Totally not ripped from a webcomic... by AdmiralXyz · · Score: 2, Insightful

    "It took a lot of work, but this latest Linux patch enables support for multi-petabyte file organization and storage!"
    "Do you have support for smooth, full-screen Flash video yet?"
    "No, but who uses that?"

    --
    Dislike the Electoral College? Lobby your state to join the National Popular Vote Interstate Compact.
    1. Re:Totally not ripped from a webcomic... by yourexhalekiss · · Score: 4, Insightful

      "Do you have support for smooth, full-screen Flash video yet?"

      Frankly, that's Adobe's fault, not ours.

    2. Re:Totally not ripped from a webcomic... by Hurricane78 · · Score: 3, Insightful

      Yes it is ours. If “ours” means: Us idiots who made Flash dominant in the first place, by using it in any way.
      It always takes two. The ass doing it, and the idiot letting him do it. That guy with the narrow mustache from the 40s would agree to that: “What luck for rulers that men do not think.” ^^

      --
      Any sufficiently advanced intelligence is indistinguishable from stupidity.
    3. Re:Totally not ripped from a webcomic... by SanityInAnarchy · · Score: 3, Interesting

      Pick one.

      What you call a "rat's nest", we call "compatibility", and it works surprisingly well. Writing a game? Use OpenAL -- the distro will configure it to work. Need realtime audio for a DAW? Use JACK. Anything else? Use ALSA.

      What if you picked the "wrong one"? Doesn't really matter. If you managed to build a decent DAW on top of ALSA, it'll continue to work on top of ALSA. If you used OSS, that still works today.

      Video APIs? Flash has its own codecs, so all you need to know is xvideo.

      Seriously, you have even less of an excuse than people who bitch about how Linux has both GNOME and KDE, and oh, the horrors of actually having a choice.

      --
      Don't thank God, thank a doctor!
    4. Re:Totally not ripped from a webcomic... by jedidiah · · Score: 5, Insightful

      > Having a rats nest of audio and video apis doesn't help the situation. You freetards should be happy what you get for your piece of shit OS.

      The ffmpeg developers can manage, yet the "professionals" at Adobe can't?

      "freetardry" is the only reason h264 acceleration is supported under Linux.

      If we waited for the nickel-and-dime-you approach to come to the rescue we would still be waiting.

      At least with MacOS, Adobe had a real excuse.

      --
      A Pirate and a Puritan look the same on a balance sheet.
    5. Re:Totally not ripped from a webcomic... by iknowcss · · Score: 2, Interesting

      Actually, I'm glad that he didn't link to it. I swear, every other story on Slashdot has some comment with a link to XKCD. Hey, we get the jokes. All of us read XKCD. You don't link to a video of Yakov Smirnoff every time you make a Soviet Russia joke, do you?

      --
      Life is rarely fair. Cherish the moments when there is a right answer.
    6. Re:Totally not ripped from a webcomic... by SanityInAnarchy · · Score: 4, Insightful

      So then you freetards need to stop whining when 99% of the world choices not to use or support your shitty OS.

      99% of the world does use our OS. You're likely doing it right now. Or did you think Slashdot runs on IIS?

      And not that it'd make much difference to an obvious troll, but I use proprietary software when appropriate, and I am in favor of open source, not necessarily "free software." Not every Linux user is RMS. (And if they were, they probably wouldn't be Linux users.)

      --
      Don't thank God, thank a doctor!
    7. Re:Totally not ripped from a webcomic... by fatalwall · · Score: 2, Insightful

      I don't read XKCD...

    8. Re:Totally not ripped from a webcomic... by nappingcracker · · Score: 2, Insightful

      Frankly, that's Adobe's fault, not ours.

      It could be our fault if you wanted it to be:
      http://www.gnu.org/software/gnash/
      http://swfdec.freedesktop.org/wiki/

      --
      |plastic....or gasoline?|
    9. Re:Totally not ripped from a webcomic... by evilviper · · Score: 3, Insightful

      "Do you have support for smooth, full-screen Flash video yet?"

      A) Yes, I do. MPlayer will play any Flash videos, with a bare minimum of resources, and fully supports multiple video output methods, like xv and gl.

      The PROBLEM is that Flash videos aren't directly available anywhere... You have to parse through a SWF video player object to even determine where to FIND the URL of the actual FLV or MP4 file. And add to that extremely aggressive plugin detection scripts on many sites, which will refuse to even embed the SWF if you happen to have an unknown VERSION of the flash player. Unfortunately, I've mentioned this before, and got several interested replies, but nobody has thus far written a browser plug-in that will masquerade as Flash 10, and understand just enough SWF to find the URLs, and either present them to the users, or automatically pass them to MPlayer. A sad, sad failing, to be sure, since

      B) I (and many, many others) care VASTLY more about Linux's support for massive storage arrays than we do for its support of Flash, and other user-level fluff. My servers never need to visit YouTube... But booting from a hard drive more than 2 terabytes??? Don't expect Windows to let you do that, without very specialized hardware (EFI firmware). Linux, however, can do it out of the box with many common distros.

      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
  5. Re:In soviet Russia by tomhudson · · Score: 4, Funny

    I for one, welcome our new petabyteFS

    Let me guess - you work for the SEC and need it for your porn collection

  6. Re:Is data integrity really necessary for large da by CoderJoe · · Score: 5, Informative

    Google's BigFile/BigTable architecture is a distributed filesystem. If a node goes down, the data that was on that node gets copied to other nodes to keep the replication count up.

    Facebook is using Apache Cassandra, which adopts a similar design.

  7. Re:Do niggers use linux? by Cryacin · · Score: 4, Insightful

    I think the big issue in the programming community as a whole is the current lack of understanding of the differences between eventual and atomic consistency.

    Distributed file systems work quite well when you have a single source of truth, but when you have multiple data stores, you can have multiple sources of truth. It essentially adds a temporal dimension to your data. As in, John Smith is a debtor of XYZ corp on Monday morning, but due to the server being down, we haven't realised on Tuesday morning that he paid his bill on Monday afternoon. Add late fee penalties.

    It adds another layer of complexity to an application when delayed updates have to roll back transitive actions between actors in an ecosystem. In the example, that would mean sending out another letter stating that the late fee penalties have been removed and, if they were already paid, that a refund is to be issued.

    --
    Science advances one funeral at a time- Max Planck
  8. Re:Is data integrity really necessary for large da by jdhutchins · · Score: 4, Insightful

    While google may be able to go ahead and re-index websites if it loses that data, "regenerating" gmail and google docs stuff isn't quite so easy, and even small amounts of data loss would kill those applications (especially among paid users).

  9. Re:Is data integrity really necessary for large da by morgan_greywolf · · Score: 5, Insightful

    Nothing. They can always go back and regenerate that data. It's just a matter of time.

    You just contradicted yourself. You're right; it's just a matter of time. Only thing is, this is the Internet. How long to recreate that data? Weeks? Months? Years? 6 months is an eternity on the Net.

    If all the accounts and stories were lost on Slashdot due to a massive database failure, how many people would come back, creating a new account and so forth? How long would it take before there was enough content and accounts to make it interesting again? Now realize that Slashdot is a drop in the bucket compared to Google.

  10. Re:In soviet Russia by ls671 · · Score: 3, Funny

    640 petabytes should be enough for everybody.

    --
    Everything I write is lies, read between the lines.
  11. Re:Do niggers use linux? by Ethanol-fueled · · Score: 5, Insightful

    It was noble of you to try to wrest control of a troll thread, but your comment loses a lot of credibility for being titled "Re: Do niggers use linux?"

    Would it hurt to at least change the title while you strive for visibility and relevance? When I saw the title of your post, I half-expected to see a poorly-written diatribe against Jamal Jackson for playing basketball and chasing caucasian women.

    Thank you, kind sir, for listening. We all must do our part to prevent trolling!

  12. Thread titles vs Trolling by MichaelSmith · · Score: 5, Funny

    Would it hurt to at least change the title while you strive for visibility and relevance?

    Well you didn't change it

  13. "Enterprisey" design? Yet no scrubbing? by Hurricane78 · · Score: 2, Interesting

    I see far too many layers upon layers there, which to me always smells like the inner-platform anti-pattern that an “enterprise consultant” would produce.
    But maybe I’m just misunderstanding things and that amount of layers is needed for large installations. Anyone here, who actually administers such large storage systems and read the article? Would be interesting to hear from someone with daily experience in this.

    Also, I could not find any mention of any ZFS-like scrubbing going on. Which in my experience equals zero reliability at all with today’s unreliable drives. How would that system detect a controller creating corruption? Or data degradation? I had those problems. And they killed half my data. Despite having a RAID, doing automatic backups with verification and having a git-like history of changes (to protect from accidental overwriting). None of that helped me at all.
    Only constantly checking all data, and fixing them, before the errors become big enough for ECC to stop working, can prevent this.

    Did I miss it, or did they really forget that crucial part?

    --
    Any sufficiently advanced intelligence is indistinguishable from stupidity.
    1. Re:"Enterprisey" design? Yet no scrubbing? by Anonymous Coward · · Score: 2, Informative

      Did I miss it, or did they really forget that crucial part?

      You missed it. There is a scrubbing mechanism in ceph.
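
      For anyone unfamiliar with the term: scrubbing, as the parent describes it, means periodically re-reading every stored object, comparing it against a checksum recorded at write time, and repairing bad copies from a good replica. Here is a minimal Python sketch of that idea. It is not Ceph's or ZFS's actual mechanism; the in-memory `replicas` layout and the function names are assumptions made purely for illustration.

```python
import hashlib


def checksum(data: bytes) -> str:
    """SHA-256 digest of a blob, as a hex string."""
    return hashlib.sha256(data).hexdigest()


def scrub(replicas, expected):
    """Verify every replica of every object and repair bad copies in place.

    replicas: dict of object id -> list of bytes (one entry per replica).
    expected: dict of object id -> checksum recorded when the object was written.
    Returns a list of (object id, replica index) pairs that were repaired.
    """
    repaired = []
    for obj, copies in replicas.items():
        # Find any copy that still matches the recorded checksum.
        good = next((c for c in copies if checksum(c) == expected[obj]), None)
        if good is None:
            raise RuntimeError(f"{obj}: every replica is corrupt")
        for i, c in enumerate(copies):
            if checksum(c) != expected[obj]:
                copies[i] = good  # overwrite the bad copy with a verified one
                repaired.append((obj, i))
    return repaired


# Hypothetical data: the third copy of obj1 has been silently corrupted.
replicas = {
    "obj1": [b"hello", b"hello", b"hellp"],
    "obj2": [b"world", b"world", b"world"],
}
expected = {"obj1": checksum(b"hello"), "obj2": checksum(b"world")}
print(scrub(replicas, expected))
```

      The point of running this on a schedule, rather than only on read, is that corruption gets caught while at least one good replica still exists.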

  14. Re:Is data integrity really necessary for large da by ProfMobius · · Score: 5, Informative
    First, Facebook & Google data are not possible to regenerate, as they are personal things, like emails, messages, posts, etc.

    Second, you have other sectors producing large amounts of data besides your favourite networking website. One example is the LHC. It is going to produce terabytes of data per DAY (15 petabytes per year). Another is space telescopes. That data can't just be 'regenerated'. 1 day's worth of data is incredibly expensive to produce.

    Distributed file systems are already there, and people use them. Maybe not on your level of computer usage.

    When you don't know what you are talking about, I think it is better to just keep quiet.

    --
    EULA : By reading the above message, you agree that I now own your soul.
  15. Linux® by The+Yuckinator · · Score: 2, Insightful

    The first word in the article summary is "Linux®"

    Does that look weird to anyone else? I realize it's technically correct for the registered trademark symbol to be there, but somehow it just doesn't seem right.

    1. Re:Linux® by tomhudson · · Score: 2, Informative

      Definitely looks weird. I always write it in all-lowercase. But apparently the trademark is either all-caps ("LINUX®") or the standard capitalized form ("Linux®")

      Someone should remind them to register "linux®" (all lowercase), before Darl tries to. A capital first letter just doesn't look right.

    2. Re:Linux® by John+Hasler · · Score: 2, Informative

      A word mark is always registered as all upper case. Lower and mixed case are still covered.

      --
      Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
  16. Re:Is data integrity really necessary for large da by glwtta · · Score: 4, Insightful

    this copying of the node happens after the node goes down?

    One of the remaining replicas of each block on the failed node is copied so the total replication count does not go down. The original was perhaps poorly phrased, no need to be a dick about it, though.
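
    To make the re-replication behaviour above concrete, here is a toy Python sketch: when a node fails, each block it held is copied from a surviving replica to another live node until the replica count is back at its target. The `placement` structure, the node names, and `re_replicate` itself are hypothetical and not taken from Google's or Ceph's actual implementation.

```python
def re_replicate(placement, failed, nodes, target=3):
    """Restore the replica count after the node `failed` goes down.

    placement: dict of block id -> set of node names holding a replica
               (updated in place).
    nodes: iterable of all node names, including the failed one.
    Returns a list of (block, source, destination) copy operations.
    """
    live = sorted(n for n in nodes if n != failed)
    copies = []
    for block, holders in sorted(placement.items()):
        holders.discard(failed)  # the replica on the dead node is gone
        while len(holders) < target:
            if not holders:
                raise RuntimeError(f"block {block} lost: no surviving replica")
            candidates = [n for n in live if n not in holders]
            if not candidates:
                break  # not enough live nodes to reach the target
            source = min(holders)  # any surviving replica can be the source
            # Prefer the candidate currently holding the fewest blocks.
            dest = min(
                candidates,
                key=lambda n: sum(n in h for h in placement.values()),
            )
            holders.add(dest)
            copies.append((block, source, dest))
    return copies


# Hypothetical cluster: five nodes, three blocks, three replicas each.
placement = {
    "blk1": {"a", "b", "c"},
    "blk2": {"b", "c", "d"},
    "blk3": {"a", "c", "d"},
}
ops = re_replicate(placement, failed="c", nodes=["a", "b", "c", "d", "e"])
for block, src, dst in ops:
    print(f"copy {block}: {src} -> {dst}")
```

    Note that the copying is triggered by the failure but reads from the replicas that are still alive, which is the distinction the parent was asking about.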

    --
    sic transit gloria mundi
  17. How does this differ from glusterfs? by caffeinejolt · · Score: 2, Interesting

    I am not really familiar with ceph, and after going through the pain of learning more about glusterfs (http://www.gluster.org/) only to find that gluster was not quite ready for primetime (this was about 6 months ago - may have changed), I am a bit skeptical. Anyone know the main differences between ceph and glusterfs (besides that glusterfs can run in userspace)?

  18. Re:Is data integrity really necessary for large da by Anonymous Coward · · Score: 2, Interesting

    Yes, but Google's file system makes no attempt to implement either the POSIX standard or the Linux VFS. It's highly specialized to only deal with the types of loads that Google sees. As a general solution, its worth is debatable.

  19. Re:In soviet Russia by Profane+MuthaFucka · · Score: 3, Funny

    If you woke up one morning in Tokyo to discover that someone had blurred your genitalia during the night, I'd bet you would consider puking on someone too.

    --
    Fascism trolls keeping me up every night. When I starts a preachin', he HITS ME WITH HIS REICH!
  20. Re:Is data integrity really necessary for large da by ae1294 · · Score: 2, Funny

    Why? Is there something special about those?

    You must be new here!

  21. Nope by avm · · Score: 2, Informative

    Nothing special at all. It only means Taco used sequential instead of randomised integers for user ids, which in turn can be viewed as a very loose chronology of user registrations.

    In other words, no.

  22. Re:pet-a-byte? by SlothDead · · Score: 2, Informative

    Tera -> Tetra -> 4 -> 1000^4
    Peta -> Penta (like Pentagram) -> 5 -> 1000^5
    Exa -> Hexa (like Hexagon) -> 6 -> 1000^6
    Zetta -> Setta (like 7 in many languages) -> 7 -> 1000^7
    Yotta -> Otta -> 8 -> 1000^8

    Or use 1024 if you don't like IEEE/IEC norms...
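
    As a quick check of the mnemonic above, here is a small Python sketch that computes the byte count for each prefix using both the decimal (powers of 1000) and binary (powers of 1024) readings. The table and function names are made up for illustration.

```python
# Exponent n for each prefix, so that one <prefix>byte = base ** n bytes.
PREFIX_EXPONENTS = {"tera": 4, "peta": 5, "exa": 6, "zetta": 7, "yotta": 8}


def bytes_in(prefix, base=1000):
    """Return the number of bytes in one <prefix>byte.

    base=1000 gives the decimal value; base=1024 gives the binary
    interpretation (tebi, pebi, and so on).
    """
    return base ** PREFIX_EXPONENTS[prefix]


for name in PREFIX_EXPONENTS:
    decimal = bytes_in(name)
    binary = bytes_in(name, base=1024)
    # The gap between the two readings grows with each prefix.
    print(f"{name:>5}: 1000^n = {decimal:.3e}, 1024^n = {binary:.3e}, "
          f"ratio = {binary / decimal:.3f}")
```

    The ratio column shows why the choice matters more at this scale: the binary reading of a petabyte is noticeably larger than the decimal one, and the difference keeps growing with each step up.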