New Linux Petabyte-Scale Distributed File System

Posted by samzenpus on Wednesday May 5, 2010 @12:14PM from the check-it-out dept.

An anonymous reader writes "A recent addition to Linux's impressive selection of file systems is Ceph, a distributed file system that incorporates replication and fault tolerance while maintaining POSIX compatibility. Explore the architecture of Ceph and learn how it provides fault tolerance and simplifies the management of massive amounts of data."

History by Alcoholic+Dali · 2010-05-05 12:22 · Score: 4, Informative

Ceph was designed by Sage Weil (of WebRing fame), who is also one of the founders of DreamHost. They will likely be using it internally soon, if they aren't already. http://en.wikipedia.org/wiki/DreamHost
1. Re:History by TooMuchToDo · 2010-05-05 12:35 · Score: 4, Informative
  
  http://www.dreamhost.com/jobs.html
  
  FILE SYSTEMS SOFTWARE ENGINEER
  Los Angeles, CA
  New Dream Network has a vacancy for a Senior File Systems Software Engineer in Los Angeles, CA. Minimum requirements – Master’s degree in Computer Science or Computer Engineering, minimum of 2 years experience in storage programming, and background in Linux kernel programming, file systems development, network programming and Operating Systems design.
  Qualified applicants should send a plain text resume to cephjobs@dreamhost.com
2. Re:History by ae1294 · 2010-05-05 16:17 · Score: 4, Funny
  
  I sent mine in ANSI format so I could blink my contact info...
Is it ready for primetime? by Meshach · 2010-05-05 12:30 · Score: 5, Informative

The headline in the Ceph wiki: Ceph is under heavy development, and is not yet suitable for any uses other than benchmarking and review.

--
"Maybe this world is another planet's hell"
Aldous Huxley
Re:In soviet Russia by tomhudson · 2010-05-05 12:32 · Score: 4, Funny

I for one, welcome our new petabyteFS

Let me guess - you work for the SEC and need it for your porn collection
Re:Totally not ripped from a webcomic... by yourexhalekiss · 2010-05-05 12:35 · Score: 4, Insightful

"Do you have support for smooth, full-screen Flash video yet?"
Frankly, that's Adobe's fault, not ours.
Re:Is data integrity really necessary for large da by CoderJoe · 2010-05-05 12:36 · Score: 5, Informative

Google's BigFile/BigTable architecture is a distributed filesystem. If a node goes down, the data that was on that node gets copied to other nodes to keep the replication count up.
Facebook is using Apache Cassandra, which adopts similar designs.
Re:Do niggers use linux? by Cryacin · 2010-05-05 12:45 · Score: 4, Insightful

I think the big issue in the programming community as a whole is the current lack of understanding of the differences between eventual and atomic consistency.

Distributed file systems work quite well when you have a single source of truth, but when you have multiple data stores, you can have multiple sources of truth. It essentially adds a temporal dimension to your data. As in, John Smith is a debtor of XYZ corp on Monday morning, but due to the server being down, we haven't realised on Tuesday morning that he paid his bill on Monday afternoon. Add late fee penalties.

It adds another layer of complexity to an application when delayed updates have to roll back transitive actions between actors in an ecosystem. In the example, that would mean sending out another letter stating that the late fee penalties have been removed and that, if they were already paid, a refund is to be issued.
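
A minimal sketch of that rollback, in Python, assuming a single replica's view of one account. The `Account` class, its fields, and the day numbers are hypothetical names made up for illustration, not anything from Ceph or a real billing system. The replica charges a late fee based on what it knows, and when the delayed payment record shows up it emits compensating actions instead of pretending the fee never happened.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Account:
    """One replica's view of a customer account (hypothetical model)."""
    due_day: int                      # day the bill falls due
    paid_day: Optional[int] = None    # day of payment, once this replica learns it
    fee_charged: bool = False         # a late fee is currently outstanding
    fee_paid: bool = False            # the customer already paid that fee
    log: List[str] = field(default_factory=list)

    def check_overdue(self, today: int) -> None:
        # Acts only on what this replica knows right now.
        if self.paid_day is None and today > self.due_day and not self.fee_charged:
            self.fee_charged = True
            self.log.append("charge late fee")

    def learn_payment(self, paid_day: int) -> None:
        # A delayed update arrives from another data store.
        self.paid_day = paid_day
        if self.fee_charged and paid_day <= self.due_day:
            # The fee was charged on stale data: compensate.
            self.fee_charged = False
            self.log.append("waive late fee")
            if self.fee_paid:
                self.log.append("refund late fee")


# Bill due Monday (day 1); this replica checks on Tuesday (day 2)
# before hearing about Monday's payment.
acct = Account(due_day=1)
acct.check_overdue(today=2)
acct.learn_payment(paid_day=1)
print(acct.log)
```

The point of the sketch is that the log keeps both the wrong action and its correction, which is the extra temporal dimension described above.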

--
Science advances one funeral at a time- Max Planck
Re:Is data integrity really necessary for large da by jdhutchins · 2010-05-05 13:19 · Score: 4, Insightful

While Google may be able to go ahead and re-index websites if it loses that data, "regenerating" Gmail and Google Docs stuff isn't quite so easy, and even small amounts of data loss would kill those applications (especially among paid users).
Re:Is data integrity really necessary for large da by morgan_greywolf · 2010-05-05 13:25 · Score: 5, Insightful

Nothing. They can always go back and regenerate that data. It's just a matter of time.
You just contradicted yourself. You're right; it's just a matter of time. Only thing is, this is the Internet. How long to recreate that data? Weeks? Months? Years? 6 months is an eternity on the Net.
If all the accounts and stories were lost on Slashdot due to a massive database failure, how many people would come back, creating a new account and so forth? How long would it take before there was enough content and accounts to make it interesting again? Now realize that Slashdot is a drop in the bucket compared to Google.

--
My blog
Re:Do niggers use linux? by Ethanol-fueled · 2010-05-05 13:27 · Score: 5, Insightful

It was noble of you to try to wrest control of a troll thread, but your comment loses a lot of credibility for being titled "Re: Do niggers use linux?"

Would it hurt to at least change the title while you strive for visibility and relevance? When I saw the title of your post, I half-expected to see a poorly-written diatribe against Jamal Jackson for playing basketball and chasing caucasian women.

Thank you, kind sir, for listening. We all must do our part to prevent trolling!
Thread titles vs Trolling by MichaelSmith · 2010-05-05 13:42 · Score: 5, Funny

Would it hurt to at least change the title while you strive for visibility and relevance?
Well you didn't change it

--
http://michaelsmith.id.au
Re:Is data integrity really necessary for large da by ProfMobius · 2010-05-05 13:44 · Score: 5, Informative

First, Facebook & Google data are not possible to regenerate, as they are personal things, like emails, messages, posts, etc.
Second, you have other sectors producing large amounts of data besides your favourite networking website. One example is the LHC. It is going to produce terabytes of data per DAY (15 petabytes per year). Another is space telescopes. Those data can't just be 'regenerated'. One day's worth of data is incredibly expensive to produce.
Distributed file systems are already there, and people use them. Maybe not on your level of computer usage.
When you don't know what you are talking about, I think it is better to just keep quiet.

--
EULA : By reading the above message, you agree that I now own your soul.
Re:Is data integrity really necessary for large da by glwtta · 2010-05-05 15:26 · Score: 4, Insightful

this copying of the node happens after the node goes down?

One of the remaining replicas of each block on the failed node is copied so the total replication count does not go down. The original was perhaps poorly phrased, no need to be a dick about it, though.
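
A rough sketch of that re-replication step, in Python. This is not Ceph's or Google's actual placement logic; `rereplicate`, the `placement` map, and the "least loaded node" choice are all assumptions made up for illustration. For every block that had a replica on the failed node, one surviving replica is used as the copy source and a new replica is placed on a live node that does not already hold the block.

```python
from collections import defaultdict


def rereplicate(placement, failed, live_nodes, target=3):
    """Plan copies that restore the replica count after `failed` goes down.

    placement: dict mapping block id -> set of node names holding a replica.
               It is updated in place.
    Returns a list of (block, source_node, destination_node) copy operations.
    """
    # Count replicas per node so new copies go to the least loaded node.
    load = defaultdict(int)
    for nodes in placement.values():
        for n in nodes:
            load[n] += 1

    copies = []
    for block, nodes in sorted(placement.items()):
        if failed not in nodes:
            continue
        nodes.discard(failed)
        survivors = sorted(nodes)
        if not survivors:
            continue  # every replica was lost; nothing left to copy from
        while len(nodes) < target:
            candidates = [n for n in live_nodes if n != failed and n not in nodes]
            if not candidates:
                break  # not enough live nodes to reach the target
            dest = min(candidates, key=lambda n: (load[n], n))
            copies.append((block, survivors[0], dest))
            nodes.add(dest)
            load[dest] += 1
    return copies


placement = {
    "b1": {"n1", "n2", "n3"},
    "b2": {"n1", "n3", "n4"},
    "b3": {"n2", "n3", "n4"},
}
plan = rereplicate(placement, failed="n1", live_nodes=["n2", "n3", "n4", "n5"])
for block, src, dst in plan:
    print(f"copy {block}: {src} -> {dst}")
```

Note that nothing is read from the failed node itself: the copies come from replicas that are still alive, which is why the replication count matters in the first place.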

--
sic transit gloria mundi
Re:Totally not ripped from a webcomic... by jedidiah · 2010-05-05 15:30 · Score: 5, Insightful

> Having a rats nest of audio and video apis doesn't help the situation. You freetards should be happy what you get for your piece of shit OS.
The ffmpeg developers can manage, yet the "professionals" at Adobe can't?
"freetardry" is the only reason h264 acceleration is supported under Linux.
If we waited for the nickel-and-dime-you approach to come to the rescue we would still be waiting.
At least with MacOS, Adobe had a real excuse.

--
A Pirate and a Puritan look the same on a balance sheet.
Re:Totally not ripped from a webcomic... by SanityInAnarchy · 2010-05-05 15:48 · Score: 4, Insightful

So then you freetards need to stop whining when 99% of the world chooses not to use or support your shitty OS.
99% of the world does use our OS. You're likely doing it right now. Or did you think Slashdot runs on IIS?
And not that it'd make much difference to an obvious troll, but I use proprietary software when appropriate, and I am in favor of open source, not necessarily "free software." Not every Linux user is RMS. (And if they were, they probably wouldn't be Linux users.)

--
Don't thank God, thank a doctor!