New Linux Petabyte-Scale Distributed File System

In soviet Russia by ds_online · 2010-05-05 12:20 · Score: 1, Funny

but in soviet Russia file systems Distribute you

Re:In soviet Russia by peaceful_bill · 2010-05-05 12:21 · Score: 1, Funny

I for one, welcome our new petabyteFS
Re:In soviet Russia by tomhudson · 2010-05-05 12:32 · Score: 4, Funny

I for one, welcome our new petabyteFS

Let me guess - you work for the SEC and need it for your porn collection
Re:In soviet Russia by ls671 · 2010-05-05 13:26 · Score: 3, Funny

640 petabytes should be enough for everybody.

--
Everything I write is lies, read between the lines.
Re:In soviet Russia by davester666 · 2010-05-05 15:26 · Score: 1

only if I trim my porn collection to only include the actual sex acts...

--
Sleep your way to a whiter smile...date a dentist!
Re:In soviet Russia by ooshna · 2010-05-05 15:31 · Score: 1

Yes Japanese women puking on each other is not an actual sex act.
Re:In soviet Russia by Profane+MuthaFucka · 2010-05-05 16:04 · Score: 3, Funny

If you woke up one morning in Tokyo to discover that someone had blurred your genitalia during the night, I'd bet you would consider puking on someone too.

--
Fascism trolls keeping me up every night. When I starts a preachin', he HITS ME WITH HIS REICH!
Re:In soviet Russia by Anonymous Coward · 2010-05-05 17:33 · Score: 0

Not /everybody/, but /anybody/. Please turn your card in.
Re:In soviet Russia by ooshna · 2010-05-05 18:59 · Score: 1

Yeah but I wouldn't set up lighting and cameras before I did it.
Re:In soviet Russia by Chas · 2010-05-05 19:03 · Score: 1

No, I think that's PEDObyte.
Just keep them away from children (with guns and horrendous megaviolence preferably) and you're golden.

--

Chas - The one, the only.
THANK GOD!!!
Re:In soviet Russia by spazdor · 2010-05-06 06:14 · Score: 1

Those were installed by the same guy who mosaic'd your junk.

--
DRM: Terminator crops for your mind!
Re:In soviet Russia by jesset77 · 2010-05-08 01:42 · Score: 1

(with guns and horrendous megaviolence preferably)
Why stop at Megaviolence these days when you have Giga, Tera, and even Petaviolence at your disposal. :D
Also, that's why you simply don't mess with the animal activists. They will go all spatio-temporal distortion on your ass. 8I

--
People willing to trade their freedom of expression for temporary entertainment deserve neither and will lose both.

History by Alcoholic+Dali · 2010-05-05 12:22 · Score: 4, Informative

Ceph was designed by Sage Weil (of WebRing fame), who is also one of the founders of DreamHost. They will likely be using it internally soon, if they aren't already. http://en.wikipedia.org/wiki/DreamHost

Re:History by TooMuchToDo · 2010-05-05 12:35 · Score: 4, Informative

http://www.dreamhost.com/jobs.html

FILE SYSTEMS SOFTWARE ENGINEER
Los Angeles, CA
New Dream Network has a vacancy for a Senior File Systems Software Engineer in Los Angeles, CA. Minimum requirements – Master’s degree in Computer Science or Computer Engineering, minimum of 2 years experience in storage programming, and background in Linux kernel programming, file systems development, network programming and Operating Systems design.
Qualified applicants should send a plain text resume to cephjobs@dreamhost.com
Re:History by volcan0 · 2010-05-05 13:36 · Score: 2, Insightful

I always liked DreamHost (not for their uptime...), and this just confirms it. It is good to see a company that uses open source software contributing back like this.
Re:History by MichaelSmith · 2010-05-05 13:40 · Score: 2, Funny

Qualified applicants should send a plain text resume
Ha! That'll cut down on the noise. I wonder how many job seekers have ever heard of plain text?

--
http://michaelsmith.id.au
Re:History by John+Hasler · 2010-05-05 15:47 · Score: 3, Funny

"Plain text". That's just a Microsoft Word document with no embedded images or graphs or anything, right?

--
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
Re:History by ae1294 · 2010-05-05 16:17 · Score: 4, Funny

I sent mine in ANSI format so I could blink my contact info...
Re:History by Anonymous Coward · 2010-05-05 17:12 · Score: 0

Yeah, nothing's better than blinking text when I'm trying to read a phone number.
Re:History by frinkacheese · 2010-05-05 18:55 · Score: 1

%!PS
1.00000 0.99083 scale /Courier findfont 12 scalefont setfont
0 0 translate /row 769 def
85 {/col 18 def 6 {col row moveto (
That is a hilariously good start to a Thursday morning on the UK election day. Wish I could mod it funny.
)show /col col 90 add def}
repeat /row row 9 sub def} repeat
showpage save restore
Re:History by MiniMike · 2010-05-06 01:58 · Score: 1

Cue image of hordes of Microsoft-Certified job seekers searching in vain for a font titled "Plain Text".
Re:History by Mysticalfruit · 2010-05-06 03:40 · Score: 1

I always prefer to send my resume in hand written LaTeX

--
Yes Francis, the world has gone crazy.
Re:History by raddan · 2010-05-06 05:08 · Score: 1

Just Base64-encode it. Plenty o' plain text now.
Re:History by Almost-Retired · 2010-05-06 15:11 · Score: 1

Don't quit your day job just yet.
Re:History by ae1294 · 2010-05-07 04:11 · Score: 1

hilariously good start to a Thursday morning on the UK election day.

You chaps still having those? Jolly Good!
Re:History by dbIII · 2010-05-16 21:31 · Score: 1

EBCDIC or ASCII plain text?
Re:History by MichaelSmith · 2010-05-16 21:56 · Score: 1

BAUDOT thanks.

--
http://michaelsmith.id.au
Re:History by sco08y · 2010-05-17 00:52 · Score: 1

Not quite, it's an OLE compound document with an embedded Plain Text object.

Is data integrity really necessary for large data? by BadAnalogyGuy · 2010-05-05 12:29 · Score: 2, Interesting

Look at Google and Facebook, arguably among the top users of massive databases. They have petabytes upon petabytes of data stored and are constantly growing. But what happens if they lose some data?

Nothing. They can always go back and regenerate that data. It's just a matter of time.

So at this large scale, it doesn't make any sense at all to focus on data integrity beyond making sure that fopen() and fread() don't return garbage. It's the smaller databases that contain critical information that need data integrity. These are typically sub-terabyte, though some may creep over that limit in a few uncommon instances.

And realistically, if you don't want your data to be hacked up, lost, then thrown out with a bad drive, ReiserFS or any other modern journaling filesystem is the right choice.

I wouldn't bet money on distributed filesystems just yet.

Is it ready for primetime? by Meshach · 2010-05-05 12:30 · Score: 5, Informative

The headline in the Ceph wiki: Ceph is under heavy development, and is not yet suitable for any uses other than benchmarking and review.

--
"Maybe this world is another planet's hell"
Aldous Huxley

Re:Is it ready for primetime? by EdIII · 2010-05-05 15:32 · Score: 0, Redundant

Thanks. I was about to download it to service my rather large storage requirements for porn, but it seems too risky now.
Re:Is it ready for primetime? by Anonymous Coward · 2010-05-05 16:14 · Score: 0

"Service" porn storage?
Ha CAPTCHA = capacity!
Re:Is it ready for primetime? by BeardedChimp · 2010-05-05 20:21 · Score: 1

Yep and they are using btrfs for the underlying filesystem which is also not at the production use stage.

For me this is quite a coincidence. I just spent all of yesterday reading up on fault-tolerant distributed file systems, and Ceph seemed quite promising until I realised they are also waiting on kernel 2.6.34, as it has their patches merged.

For anyone who knows more about this stuff, I was quite interested in xtreemfs as it seems to allow you to add nodes anywhere on the internet and it will deal with the fault tolerance/striping. For my purposes I don't care about having massive throughput but unfortunately xtreemfs doesn't seem to be deployed in many places so I don't know how good it is.
Re:Is it ready for primetime? by atamido · 2010-05-06 03:51 · Score: 1

Yep and they are using btrfs for the underlying filesystem which is also not at the production use stage.
Would you clarify what the difference between Ceph and BTRFS is? From the description I thought that is what BTRFS and ZFS were supposed to be.

Totally not ripped from a webcomic... by AdmiralXyz · 2010-05-05 12:31 · Score: 2, Insightful

"It took a lot of work, but this latest Linux patch enables support for multi-petabyte file organization and storage!"
"Do you have support for smooth, full-screen Flash video yet?"
"No, but who uses that?"

--
Dislike the Electoral College? Lobby your state to join the National Popular Vote Interstate Compact.

Re:Totally not ripped from a webcomic... by yourexhalekiss · 2010-05-05 12:35 · Score: 4, Insightful

"Do you have support for smooth, full-screen Flash video yet?"
Frankly, that's Adobe's fault, not ours.
Re:Totally not ripped from a webcomic... by h4rr4r · 2010-05-05 12:55 · Score: 1

I see the adobe developer made it here alright.
Dude, get another job if you hate this one so much.
Re:Totally not ripped from a webcomic... by Anonymous Coward · 2010-05-05 13:34 · Score: 0, Offtopic

Both of my linux systems play full screen flash far better than my windows box plays any flash
Hahahaha. 0/10.
Re:Totally not ripped from a webcomic... by Insightfill · 2010-05-05 14:09 · Score: 0

Having a rats nest of audio and video apis doesn't help the situation. You freetards should be happy what you get for your piece of shit OS.

Wow, there's a +1 insightful and a -1 troll in the same post. I've got mod-points, but was really not able to decide which way to go with this one.
Re:Totally not ripped from a webcomic... by Anonymous Coward · 2010-05-05 14:11 · Score: 0

You have a point with audio APIs, but video APIs are pretty straightforward: USE VA-API. How many more times do we have to tell you? Dipshit.
Re:Totally not ripped from a webcomic... by Hurricane78 · 2010-05-05 14:18 · Score: 3, Insightful

Yes it is ours. If “ours” means: Us idiots who made Flash dominant in the first place, by using it in any way.
It always takes two. The ass doing it, and the idiot letting him do it. That guy with the narrow mustache from the 40s would agree to that: “What luck for rulers that men do not think.” ^^

--
Any sufficiently advanced intelligence is indistinguishable from stupidity.
Re:Totally not ripped from a webcomic... by Anonymous Coward · 2010-05-05 14:27 · Score: 0

Frankly, that's Adobe's fault, not ours.
You're spot on. If it's not free and open, how can we contribute?
Re:Totally not ripped from a webcomic... by SanityInAnarchy · 2010-05-05 14:29 · Score: 3, Interesting

Pick one.
What you call a "rat's nest", we call "compatibility", and it works surprisingly well. Writing a game? Use OpenAL -- the distro will configure it to work. Need realtime audio for a DAW? Use JACK. Anything else? Use ALSA.
What if you picked the "wrong one"? Doesn't really matter. If you managed to build a decent DAW on top of ALSA, it'll continue to work on top of ALSA. If you used OSS, that still works today.
Video APIs? Flash has its own codecs, so all you need to know is xvideo.
Seriously, you have even less of an excuse than people who bitch about how Linux has both GNOME and KDE, and oh, the horrors of actually having a choice.

--
Don't thank God, thank a doctor!
Re:Totally not ripped from a webcomic... by FauxPasIII · 2010-05-05 14:50 · Score: 1, Redundant

At least link to the comic you're totally not ripping from. ;)
http://xkcd.com/619/

--
25% Funny, 25% Insightful, 25% Informative, 25% Troll
Re:Totally not ripped from a webcomic... by glwtta · 2010-05-05 15:21 · Score: 1

This may come as a shock, but Linux has more useful applications than "dicking around on youtube".

--
sic transit gloria mundi
Re:Totally not ripped from a webcomic... by jedidiah · 2010-05-05 15:30 · Score: 5, Insightful

> Having a rats nest of audio and video apis doesn't help the situation. You freetards should be happy what you get for your piece of shit OS.
The ffmpeg developers can manage, yet the "professionals" at Adobe can't?
"freetardry" is the only reason h264 acceleration is supported under Linux.
If we waited for the nickel-and-dime-you approach to come to the rescue we would still be waiting.
At least with MacOS, Adobe had a real excuse.

--
A Pirate and a Puritan look the same on a balance sheet.
Re:Totally not ripped from a webcomic... by jedidiah · 2010-05-05 15:37 · Score: 1

Our tools are better. Your "freetard" rhetoric doesn't matter. Neither does your "market share" rhetoric.
Adobe doesn't have any real excuse for being shown up by ALL of the "freetard" developers.

--
A Pirate and a Puritan look the same on a balance sheet.
Re:Totally not ripped from a webcomic... by iknowcss · 2010-05-05 15:40 · Score: 2, Interesting

Actually, I'm glad that he didn't link to it. I swear, every other story on Slashdot has some comment with a link to XKCD. Hey, we get the jokes. All of us read XKCD. You don't link to a video of Yakov Smirnoff every time you make a Soviet Russia joke, do you?

--
Life is rarely fair. Cherish the moments when there is a right answer.
Re:Totally not ripped from a webcomic... by jedidiah · 2010-05-05 15:43 · Score: 1

Even so, this whole argument is mindless nonsense. Adobe only recently offered partial acceleration support, even for Windows.
The idea that any variant of Flash is any better than any other (or worse) is just Lemming nonsense.

--
A Pirate and a Puritan look the same on a balance sheet.
Re:Totally not ripped from a webcomic... by SanityInAnarchy · 2010-05-05 15:48 · Score: 4, Insightful

So then you freetards need to stop whining when 99% of the world choices not to use or support your shitty OS.
99% of the world does use our OS. You're likely doing it right now. Or did you think Slashdot runs on IIS?
And not that it'd make much difference to an obvious troll, but I use proprietary software when appropriate, and I am in favor of open source, not necessarily "free software." Not every Linux user is RMS. (And if they were, they probably wouldn't be Linux users.)

--
Don't thank God, thank a doctor!
Re:Totally not ripped from a webcomic... by John+Hasler · 2010-05-05 15:51 · Score: 1

Yes, but users of OSs that don't can't understand why anyone would use an OS that doesn't.

--
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
Re:Totally not ripped from a webcomic... by scotch · 2010-05-05 16:33 · Score: 0, Troll

xkcd links get an automatic redundant rating from me when I have points. Just so everyone knows.

--
XML causes global warming.
Re:Totally not ripped from a webcomic... by fatalwall · 2010-05-05 17:09 · Score: 2, Insightful

I don't read XKCD...
Re:Totally not ripped from a webcomic... by nappingcracker · 2010-05-05 17:16 · Score: 2, Insightful

Frankly, that's Adobe's fault, not ours.
It could be our fault if you wanted it to be:
http://www.gnu.org/software/gnash/
http://swfdec.freedesktop.org/wiki/

--
|plastic....or gasoline?|
Re:Totally not ripped from a webcomic... by evilviper · 2010-05-05 17:26 · Score: 3, Insightful

"Do you have support for smooth, full-screen Flash video yet?"
A) Yes, I do. MPlayer will play any Flash videos, with a bare minimum of resources, and fully supports multiple video output methods, like xv and gl.
The PROBLEM is that Flash videos aren't directly available anywhere... You have to parse through a SWF video player object to even determine where to FIND the URL of the actual FLV or MP4 file. And add to that extremely aggressive plugin detection scripts on many sites, which will refuse to even embed the SWF if you happen to have an unknown VERSION of the flash player. Unfortunately, I've mentioned this before, and got several interested replies, but nobody has thus far written a browser plug-in that will masquerade as Flash 10, and understand just enough SWF to find the URLs, and either present them to the users, or automatically pass them to MPlayer. A sad, sad failing, to be sure, since
B) I (and many, many others) care VASTLY more about Linux's support for massive storage arrays than we do for its support of Flash, and other user-level fluff. My servers never need to visit YouTube... But booting from a hard drive more than 2 terabytes??? Don't expect Windows to let you do that, without very specialized hardware (EFI firmware). Linux, however, can do it out of the box with many common distros.

--
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
Re:Totally not ripped from a webcomic... by ArsonSmith · 2010-05-05 17:54 · Score: 1

that's actually a good idea.
In Soviet Russia, Yakov Smirnoff links you!

--
Paying taxes to buy civilization is like paying a hooker to buy love.
Re:Totally not ripped from a webcomic... by Anonymous Coward · 2010-05-05 18:47 · Score: 1, Funny

Not every Linux user is RMS. (And if they were, they probably wouldn't be Linux users.)
Ahem ... That should be: "Not every GNU/Linux user is ... "
Re:Totally not ripped from a webcomic... by vegiVamp · 2010-05-05 20:56 · Score: 1

How many people would you estimate watch Dreamworks offerings ?

Guess what their renderfarm runs.

--
What a depressingly stupid machine.
Re:Totally not ripped from a webcomic... by Anonymous Coward · 2010-05-06 00:11 · Score: 0

So a nerd is sent to /. for the first time. At night the lights in his basement are turned off, and an oldtimer goes to the comments and posts "Number 123!" and is instantly modded +5 funny. A few minutes later, somebody else posts, "Number 378!" Again, the post gets a positive rating.
The new guy asks a low UID what is going on. "Well," says the oldtimer, "we've all seen every xkcd comic a million times before. So we just yell out the number instead of linking to the comic."
The new guy decides to try this for himself, and posts "Number 619!", but all he gets is a -1 redundant. He asks the low UID, "What's wrong? Why didn't I get modded up?"
"Well," said the older /.er, "sometimes it's not the joke, but how you tell it."
Re:Totally not ripped from a webcomic... by Anonymous Coward · 2010-05-06 01:06 · Score: 0

"We are working with Ubuntu and other partners to enable certification of Flash Player 10.1 for Linux on the Ubuntu 10.04 LTS release, an exciting release for Linux-based desktops and devices."
http://blogs.computerworld.com/16007/ubuntu_10_04_where_ubuntu_goes_from_here
Re:Totally not ripped from a webcomic... by overlordofmu · 2010-05-06 01:34 · Score: 1

Works for me, the full screen flash, it does.

Also, vi and sticks . . .
Re:Totally not ripped from a webcomic... by Bambi+Dee · 2010-05-06 03:09 · Score: 1

To get back on topic: I had to restart Firefox because Flash had somehow lost audio in the meantime.
Re:Totally not ripped from a webcomic... by tehcyder · 2010-05-06 22:42 · Score: 1

Nice way to miss the point.

--
To have a right to do a thing is not at all the same as to be right in doing it

Re:Is data integrity really necessary for large da by CoderJoe · 2010-05-05 12:36 · Score: 5, Informative

Google's GFS/BigTable architecture is built on a distributed filesystem. If a node goes down, the data that was on that node gets copied to other nodes to keep the replication count up.

Facebook is using Apache Cassandra, which adopts similar designs.

Re:Is data integrity really necessary for large da by CoderJoe · 2010-05-05 12:40 · Score: 1

Oh, and I forgot about Amazon Dynamo.

Re:Do niggers use linux? by Cryacin · 2010-05-05 12:45 · Score: 4, Insightful

I think the big issue in the programming community as a whole is the current lack of understanding of the differences between eventual and atomic consistency.

Distributed file systems work quite well when you have a single source of truth, but when you have multiple data stores, you can have multiple sources of truth. It essentially adds a temporal dimension to your data. As in, John Smith is a debtor of XYZ corp on Monday morning, but due to the server being down, we haven't realised on Tuesday morning that he paid his bill on Monday afternoon. Add late fee penalties.

It adds another layer of complexity to an application when delayed updates have to roll back transitive actions between actors in an ecosystem. In the example, that would mean sending out another letter stating that the late fee penalties have been removed and, if they were already paid, that a refund is to be issued.
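
To make the late-fee example concrete, here is a minimal Python sketch of two eventually consistent replicas and a compensating action. Everything in it is an assumption made up for illustration (the Replica class, the event tuples, the hour-based timestamps, the "web" and "billing" names); it is not how any particular system does it.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    # Each event is (hour, kind, amount); positive amounts are owed, negative are paid.
    # Using a set is a simplification: two identical events would collapse into one.
    events: set = field(default_factory=set)

    def record(self, hour, kind, amount):
        self.events.add((hour, kind, amount))

    def balance(self):
        return sum(amount for _, _, amount in self.events)


def sync(a, b):
    """Anti-entropy pass: both replicas end up with the union of all events."""
    merged = a.events | b.events
    a.events, b.events = set(merged), set(merged)


def compensate(replica, now):
    """Reverse any late fee that was charged although the debt was already paid."""
    for hour, kind, amount in sorted(replica.events):
        if kind != "late_fee":
            continue
        owed_before = sum(a for h, _, a in replica.events if h < hour)
        reversal_kind = f"reversal_of_fee@{hour}"
        already_reversed = any(k == reversal_kind for _, k, _ in replica.events)
        if owed_before <= 0 and not already_reversed:
            replica.events.add((now, reversal_kind, -amount))


def demo():
    web, billing = Replica("web"), Replica("billing")
    for r in (web, billing):
        r.record(0, "invoice", 100)
    web.record(15, "payment", -100)       # Monday afternoon; billing is unreachable
    if billing.balance() > 0:             # Tuesday morning, billing acts on a stale view
        billing.record(33, "late_fee", 10)
    sync(web, billing)                    # the two sources of truth finally meet
    before = billing.balance()
    compensate(billing, now=34)
    sync(web, billing)
    return before, billing.balance(), web.balance()
```

Calling demo() returns the balance right after the replicas sync (the bogus fee is still there) and the balances once the compensating reversal has been applied and propagated. The refund letter would hang off the same reversal event.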

--
Science advances one funeral at a time- Max Planck

pet-a-byte? by jrivar59 · 2010-05-05 13:19 · Score: 1

I'm not really sure how much a petabyte is. Could someone please translate to Natalie Portmans? or Station wagons full of congresses? or Rods to the Hogshead?

Re:pet-a-byte? by fatalwall · 2010-05-05 17:14 · Score: 1

Don't quote me on it, as I'm too tired to look it up, but I believe a petabyte is 1000 terabytes... and last I checked that's like billions of rods of hogsheads' worth of Natalie Portmans being used as station wagons full of congresses.
Re:pet-a-byte? by SlothDead · 2010-05-06 00:02 · Score: 2, Informative

Tera -> Tetra -> 4 -> 1000^4
Peta -> Penta (like Pentagram) -> 5 -> 1000^5
Exa -> Hexa (like Hexagon) -> 6 -> 1000^6
Zetta -> Setta (like 7 in many languages) -> 7 -> 1000^7
Yotta -> Otta -> 8 -> 1000^8
Or use 1024 if you don't like IEEE/IEC norms...
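
The mnemonic above fits in a few lines of Python. This is just a sketch: the EXPONENT table and the bytes_in() helper are names invented here. The binary flag switches to powers of 1024, which is what the IEC binary prefixes (kibi, mebi, ... pebi) are for.

```python
# Exponent for each prefix: the prefix multiplies by 1000**n (or 1024**n in binary).
EXPONENT = {
    "kilo": 1, "mega": 2, "giga": 3, "tera": 4,
    "peta": 5, "exa": 6, "zetta": 7, "yotta": 8,
}


def bytes_in(prefix, binary=False):
    """Number of bytes in one <prefix>byte, decimal by default."""
    base = 1024 if binary else 1000
    return base ** EXPONENT[prefix]


# A petabyte is 1000 terabytes, and a zettabyte is a million petabytes.
petabyte_in_terabytes = bytes_in("peta") // bytes_in("tera")
zettabyte_in_petabytes = bytes_in("zetta") // bytes_in("peta")
```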
Re:pet-a-byte? by Anonymous Coward · 2010-05-06 04:47 · Score: 0

I'm not really sure how much a petabyte is.
There are a few people who feel, quite strongly, that petabytes are just not cutting it.

Re:Is data integrity really necessary for large da by jdhutchins · 2010-05-05 13:19 · Score: 4, Insightful

While google may be able to go ahead and re-index websites if it loses that data, "regenerating" gmail and google docs stuff isn't quite so easy, and even small amounts of data loss would kill those applications (especially among paid users).

Re:Is data integrity really necessary for large da by morgan_greywolf · 2010-05-05 13:25 · Score: 5, Insightful

Nothing. They can always go back and regenerate that data. It's just a matter of time.

You just contradicted yourself. You're right; it's just a matter of time. Only thing is, this is the Internet. How long to recreate that data? Weeks? Months? Years? 6 months is an eternity on the Net.

If all the accounts and stories were lost on Slashdot due to a massive database failure, how many people would come back, creating a new account and so forth? How long would it take before there was enough content and accounts to make it interesting again? Now realize that Slashdot is a drop in the bucket compared to Google.

--
My blog

Re:Do niggers use linux? by Ethanol-fueled · 2010-05-05 13:27 · Score: 5, Insightful

It was noble of you to try to wrest control of a troll thread, but your comment loses a lot of credibility for being titled "Re: Do niggers use linux?"

Would it hurt to at least change the title while you strive for visibility and relevance? When I saw the title of your post, I half-expected to see a poorly-written diatribe against Jamal Jackson for playing basketball and chasing caucasian women.

Thank you, kind sir, for listening. We all must do our part to prevent trolling!

Lustre by Anonymous Coward · 2010-05-05 13:32 · Score: 0

How is this different than Lustre?

Re:Lustre by Lennie · 2010-05-06 05:12 · Score: 1

this article has some comparisons with Lustre:
http://www.linux-mag.com/cache/7744/1.html

--
New things are always on the horizon

Thread titles vs Trolling by MichaelSmith · 2010-05-05 13:42 · Score: 5, Funny

Would it hurt to at least change the title while you strive for visibility and relevance?

Well you didn't change it

--
http://michaelsmith.id.au

Re:Thread titles vs Trolling by randyleepublic · 2010-05-06 16:50 · Score: 0

That's "Well you didn't change it punk."
These fine points are everything.

--
Social Credit would solve everything...
Re:Thread titles vs Trolling by h00manist · 2010-05-17 03:12 · Score: 1

postingrepeatsoriginaltitles = off

--
Build your own energy sources from scratch. http://otherpower.com/

"Enterprisey" design? Yet no scrubbing? by Hurricane78 · 2010-05-05 13:42 · Score: 2, Interesting

I see far too many layers upon layers there, which to me always smells like the inner-platform anti-pattern that an “enterprise consultant” would produce.
But maybe I’m just misunderstanding things and that amount of layers is needed for large installations. Anyone here, who actually administers such large storage systems and read the article? Would be interesting to hear from someone with daily experience in this.

Also, I could not find any mention of any ZFS-like scrubbing going on. Which in my experience equals zero reliability at all with today’s unreliable drives. How would that system detect a controller creating corruption? Or data degradation? I had those problems. And they killed half my data. Despite having a RAID, doing automatic backups with verification and having a git-like history of changes (to protect from accidental overwriting). None of that helped me at all.
Only constantly checking all data, and fixing them, before the errors become big enough for ECC to stop working, can prevent this.
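
For anyone unsure what "constantly checking and fixing" means in practice, here is a rough Python sketch of the idea: keep a checksum recorded at write time, periodically re-hash every copy, and overwrite bad copies from a copy that still verifies. The Store class, its in-memory replicas, and the choice of SHA-256 are all assumptions for illustration. This is not ZFS's or Ceph's actual scrubbing code.

```python
import hashlib


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class Store:
    """Toy replicated block store; each replica is just a dict in memory."""

    def __init__(self, n_replicas=3):
        self.replicas = [dict() for _ in range(n_replicas)]  # block_id -> bytes
        self.sums = {}                                       # block_id -> checksum at write time

    def write(self, block_id, data):
        self.sums[block_id] = checksum(data)
        for replica in self.replicas:
            replica[block_id] = data

    def scrub(self):
        """Verify every copy of every block and repair bad copies from a good one."""
        repaired, lost = [], []
        for block_id, expected in self.sums.items():
            copies = [replica.get(block_id) for replica in self.replicas]
            good = next(
                (c for c in copies if c is not None and checksum(c) == expected),
                None,
            )
            if good is None:
                lost.append(block_id)   # every copy is bad: nothing left to repair from
                continue
            for i, copy in enumerate(copies):
                if copy is None or checksum(copy) != expected:
                    self.replicas[i][block_id] = good
                    repaired.append((block_id, i))
        return repaired, lost
```

The point of running scrub() on a schedule is to catch a single silently corrupted copy while at least one good copy still exists. Wait too long and the block ends up in the lost list instead.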

Did I miss it, or did they really forget that crucial part?

--
Any sufficiently advanced intelligence is indistinguishable from stupidity.

Re:"Enterprisey" design? Yet no scrubbing? by Anonymous Coward · 2010-05-05 17:35 · Score: 2, Informative

Did I miss it, or did they really forget that crucial part?
You missed it. There is a scrubbing mechanism in ceph.
Re:"Enterprisey" design? Yet no scrubbing? by Lennie · 2010-05-06 05:13 · Score: 1

Also it uses BTRFS as the local filesystem, which does quite a few checks as well.

--
New things are always on the horizon

Zetta = Peta * 1,000,000 by 0100010001010011 · 2010-05-05 13:43 · Score: 1

I think I'll stick with ZFS. It's a million times better, give or take.

Re:Is data integrity really necessary for large da by ProfMobius · 2010-05-05 13:44 · Score: 5, Informative

First, Facebook & Google data are not possible to regenerate, as they are personal things, like emails, messages, posts, etc.

Second, you have other sectors producing large amounts of data besides your favourite networking website. One example is the LHC. It is going to produce terabytes of data per DAY (15 petabytes per year). Another is space telescopes. Those data can't just be 'regenerated'. 1 day's worth of data is incredibly expensive to produce.
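
As a quick sanity check on those two figures, a small Python sketch (assuming decimal units and a 365-day year; tb_per_day() is a helper name invented here):

```python
TB = 1000**4
PB = 1000**5


def tb_per_day(pb_per_year, days_per_year=365):
    """Convert a yearly volume in petabytes to a daily volume in terabytes."""
    return pb_per_year * PB / days_per_year / TB


lhc_daily_tb = tb_per_day(15)  # roughly 41 TB per day
```

So 15 petabytes per year does work out to tens of terabytes per day.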

Distributed file systems are already there, and people use them. Maybe not on your level of computer usage.

When you don't know what you are talking about, I think it is better to just keep quiet.

--
EULA : By reading the above message, you agree that I now own your soul.

Re:Is data integrity really necessary for large da by PenguinBob · 2010-05-05 13:51 · Score: 1

If that were to happen, I'd finally be able to get a low UID!

They don't steal everything. by tomhudson · 2010-05-05 14:36 · Score: 1

That's Goldman Sachs' job, you insensitive clod!

Linux® by The+Yuckinator · 2010-05-05 14:41 · Score: 2, Insightful

The first word in the article summary is "Linux®"

Does that look weird to anyone else? I realize it's technically correct for the registered trademark symbol to be there, but somehow it just doesn't seem right.

Re:Linux® by tomhudson · 2010-05-05 15:32 · Score: 2, Informative

Definitely looks weird. I always write it in all-lowercase. But apparently the trademark is either all-caps ("LINUX®") or the standard capitalized form ("Linux®").
Someone should remind them to register "linux®" (all lowercase), before Darl tries to. A capital first letter just doesn't look right.
Re:Linux® by John+Hasler · 2010-05-05 15:55 · Score: 2, Informative

A word mark is always registered as all upper case. Lower and mixed case are still covered.

--
Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
Re:Linux® by tomhudson · 2010-05-06 03:26 · Score: 1

Poster saith: >"A word mark is always registered as all upper case. Lower and mixed case are still covered."
This is simply not true. For example, in Australia, the word "iPAD" (Siemens) vs "IPAD" (Apple) http://pericles.ipaustralia.gov.au/atmoss/Falcon_Users_Cookies.Run_Create

Trade Mark : 1120005
International Registration : 885058
Word: iPAD
Image:
Lodgement Date: 20-MAR-2006
Notification Date: 22-JUN-2006
Convention Details: 28-FEB-2006
004 928 859
EUROPEAN COMMUNITY
Registered From: 20-MAR-2006
Date of Acceptance: 05-JUL-2006
Acceptance Advertised: 20-JUL-2006
Registration Advertised: 16-NOV-2006
Entered on Register: 30-OCT-2006
Renewal Due: 20-MAR-2016
Class/es: 7, 9
Status: Registered/Protected
Kind: n/a
Type of Mark: Word
Endorsement
Owner/s: Siemens Aktiengesellschaft
Wittelsbacherplatz 2
80333 Munchen
GERMANY
Address for Service: International Bureau, WIPO
34, chemin des Colombettes
P.O. Box 18
1211 Geneva 20,
SWITZERLAND

as opposed to IPAD

Trade Mark : 1177855
Word: IPAD
Image:
Lodgement Date: 04-JUN-2007
Registered From: 04-JUN-2007
Date of Acceptance: 17-JUN-2007
Acceptance Advertised: 04-OCT-2007
Registration Advertised: 18-FEB-2010
Entered on Register: 01-FEB-2010
Renewal Due: 04-JUN-2017
Class/es: 9
Status: Registered/Protected
Kind: n/a
Type of Mark: Word
Owner/s: Apple Inc.
1 Infinite Loop
Cupertino
California 95014
UNITED STATES OF AMERICA
Address for Service: Clayton Utz
PO Box H3
AUSTRALIA SQUARE NSW 1215
AUSTRALIA
Goods & Services
Class: 9 Electronic information display terminals including electronic information kiosks and public access display apparatuses
History
Opposition
Indexing Details - Word Constituents
I IPAD

So as you can see, when you say "A word mark is always registered as all upper case. Lower and mixed case are still covered.", it's simply not true. The United States is only 5% of the world, and shouldn't be taken as definitive.
Ignore the rest -it's just a bunch of filler text to get around the "Your comment has too few characters per line (currently 22.0). " lameness filter. Why they don't fix this so that it ignores blockquotes that actually have valid content is beyond me, but then again, what can you do, right? Yadda, yadda, yadda, Please try to keep posts on topic. Try to reply to other people's comments instead of starting new threads. Read other people's messages before posting your own to avoid simply duplicating what has already been said. Use a clear subject that describes what your message is about. Offtopic, Inflammatory, Inappropriate, Illegal, or Offensive comments might be moderated. (You can read everything, even moderated posts, by adjusting your threshold on the User Preferences Page) If you are having a problem with accounts or comment posting, please yell for help.

Re:Is data integrity really necessary for large da by kevin7kal · 2010-05-05 14:49 · Score: 0, Redundant

this copying of the node happens after the node goes down? So the software time travels? That totally disproves Stephen Hawking's recent statement that time travel can only go forward! DUDE - AWESOME!

Re:Is data integrity really necessary for large da by glwtta · 2010-05-05 15:26 · Score: 4, Insightful

this copying of the node happens after the node goes down?

One of the remaining replicas of each block on the failed node is copied so the total replication count does not go down. The original was perhaps poorly phrased, no need to be a dick about it, though.
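
In sketch form, re-replication after a node failure could look like the Python below. The placement dict, the re_replicate() function, and the "least loaded node first" rule are assumptions for illustration, not Google's or Ceph's actual logic.

```python
def re_replicate(placement, failed, target=3):
    """Drop a failed node and copy its blocks from surviving replicas.

    placement maps node name -> set of block ids stored on that node.
    Returns a list of (block, source_node, destination_node) copies made.
    """
    lost_blocks = placement.pop(failed, set())
    copies_made = []
    for block in sorted(lost_blocks):
        holders = [node for node, blocks in placement.items() if block in blocks]
        if not holders:
            continue  # no surviving replica: this block is gone for good
        # Prefer the emptiest nodes that do not already hold the block.
        candidates = sorted(
            (node for node in placement if node not in holders),
            key=lambda node: (len(placement[node]), node),
        )
        missing = max(0, target - len(holders))
        for dest in candidates[:missing]:
            placement[dest].add(block)
            copies_made.append((block, holders[0], dest))
    return copies_made
```

Note that the copy is always read from a node that is still alive, so no time travel is required.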

--
sic transit gloria mundi

How does this differ from glusterfs? by caffeinejolt · 2010-05-05 15:28 · Score: 2, Interesting

I am not really familiar with Ceph, and after going through the pain of learning more about glusterfs (http://www.gluster.org/) only to find that gluster was not quite ready for primetime (this was about 6 months ago, so it may have changed), I am a bit skeptical. Anyone know the main differences between Ceph and glusterfs (besides that glusterfs can run in userspace)?

Re:How does this differ from glusterfs? by perlchild · 2010-05-05 19:28 · Score: 1

Ceph reminds me more of Coda than glusterfs. Anyone remember coda?
Re:How does this differ from glusterfs? by Troy+Baer · 2010-05-06 02:52 · Score: 1

I remember that the guys who originally wrote Coda basically abandoned it and moved on to doing Lustre...

--
"My life's work has been to prompt others... and be forgotten." --Cyrano de Bergerac

Re:Is data integrity really necessary for large da by tjones · 2010-05-05 15:46 · Score: 1

Why? Is there something special about those?

Re:Is data integrity really necessary for large da by Anonymous Coward · 2010-05-05 15:48 · Score: 2, Interesting

Yes, but Google's file system makes no attempt to implement either the POSIX standard or the Linux VFS. It's highly specialized to only deal with the types of loads that Google sees. As a general solution, its worth is debatable.

Re:Is data integrity really necessary for large da by Anonymous Coward · 2010-05-05 16:14 · Score: 0

If you've stored the data, you can reproduce the data in the event some of it is lost.

RAID much? PAR often?

Re:Is data integrity really necessary for large da by ae1294 · 2010-05-05 16:20 · Score: 2, Funny

Why? Is there something special about those?

You must be new here!

Re:Is data integrity really necessary for large da by CoderJoe · 2010-05-05 16:31 · Score: 1

Yes, but Google's file system makes no attempt to implement either the POSIX standard or the Linux VFS. It's highly specialized to only deal with the types of loads that Google sees. As a general solution, its worth is debatable.

But that is not what the original question was about. The original question was about sites like Google or Facebook using anything like a distributed file system to keep from losing data.

Nope by avm · 2010-05-05 16:33 · Score: 2, Informative

Nothing special at all. It only means Taco used sequential instead of randomised integers for user ids, which in turn can be viewed as a very loose chronology of user registrations.

In other words, no.

Re:Nope by tjones · 2010-05-05 16:54 · Score: 1

Good to see you too, old timer.
Re:Nope by emjay88 · 2010-05-05 18:23 · Score: 1

These people looked deep within my soul and assigned me a number based on the order in which I joined.

--
1178161 is prime...
Re:Nope by SnowZero · 2010-05-05 18:31 · Score: 1

Yeah, but it's a pretty big prime.
Re:Nope by ae1294 · 2010-05-07 04:18 · Score: 1

Yeah, but it's a pretty big prime.
Yes but it's not an "optimus prime".

Re:Is data integrity really necessary for large da by gilboad · 2010-05-05 18:31 · Score: 1

Why do you assume that:
A: PB storage is very rare and only used by several large organizations.
B: PB storage is used to house generated data that can easily be replaced.

- Gilboa

Re:Is data integrity really necessary for large da by Per+Wigren · 2010-05-05 19:19 · Score: 1

..and the pretty amazing open source distributed multi-master no-single-point-of-failure database Riak.

--
My other account has a 3-digit UID.

Re:Oh great by Anonymous Coward · 2010-05-05 19:33 · Score: 0

This was a reference to ReiserFS.
A bad taste joke, but not offtopic.

Re:Is data integrity really necessary for large da by drsmithy · 2010-05-05 20:11 · Score: 1

Nothing. They can always go back and regenerate that data. It's just a matter of time.

No, they can't. This is a really, really important distinction to make. They cannot "regenerate" the data. They *might* (perhaps even "probably") be able to "recopy" the data, *assuming the original source is still available*.

Re:Is data integrity really necessary for large da by Anonymous Coward · 2010-05-05 22:58 · Score: 0

Calculate the overhead of say, RAID 6, for 1 Petabyte of data.

I did. by Lorien_the_first_one · 2010-05-06 00:02 · Score: 1

So there. :)

--
The diversity and expression of human opinion is essential to human survival.

Re:Is data integrity really necessary for large da by OeLeWaPpErKe · 2010-05-06 00:49 · Score: 1

Actually your RAID array can't regenerate your data in most failure scenarios because of idiotic design:

Bit error in RAID 1:

disk A : 000000111011011
disk B : 001000111011011

That's the information your RAID array has in case of a bit error. Do tell, which is the correct one?

Or, better yet, a 3-disk RAID-5 array:

disk A : 000000111011011
disk B : 001001010011001
parity disk : 001101101000010

Clearly something is wrong... now fix the problem.

RAID is worthless unless you know which data set is wrong.
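
A rough sketch of that point in Python, using the bit strings from the 3-disk example above and assuming plain XOR parity. The helper names and the per-block CRC are illustrative assumptions, not how any particular RAID implementation works. Parity can show that the stripe is inconsistent, but three different "repairs" all make it consistent again; a checksum stored alongside each block is one way to tell which member is actually wrong.

```python
import zlib


def xor_bits(a: str, b: str) -> str:
    """Bitwise XOR of two equal-length bit strings."""
    return "".join("1" if x != y else "0" for x, y in zip(a, b))


def crc(bits: str) -> int:
    """Checksum of one block (CRC32 picked only for illustration)."""
    return zlib.crc32(bits.encode("ascii"))


# Bit strings copied from the example above.
disk_a = "000000111011011"
disk_b = "001001010011001"
parity = "001101101000010"

# Parity alone only says that the stripe is inconsistent.
consistent = xor_bits(disk_a, disk_b) == parity

# Each rewrite below makes the stripe consistent again, so parity cannot
# say which of the three members actually holds the bad bit.
candidate_repairs = {
    "A is bad": xor_bits(disk_b, parity),
    "B is bad": xor_bits(disk_a, parity),
    "parity is bad": xor_bits(disk_a, disk_b),
}

# ASSUMPTION for this demo only: the parity block is the one that got
# flipped, and a checksum of every block was recorded at write time.
stored = {
    "A": crc(disk_a),
    "B": crc(disk_b),
    "parity": crc(xor_bits(disk_a, disk_b)),
}
on_disk = {"A": disk_a, "B": disk_b, "parity": parity}
bad = [name for name, bits in on_disk.items() if crc(bits) != stored[name]]

print(consistent, bad)
```

With the checksums available, the bad member can be singled out and rebuilt from the other two; without them, all three candidate repairs look equally valid.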

(retitled) Do POSIX stds require atomicity? by daboochmeister · 2010-05-06 01:34 · Score: 1

Just curious - too far in my technical past for me to recall - Ceph is claimed to adhere to POSIX standards. Do POSIX standards accommodate the "eventually consistent" filesystem models?

--
"Ahh! I see you're in that indeterminate Schrodinger state where - oh, uh ... never mind." Dave Bucci

Re:Is data integrity really necessary for large da by Anonymous Coward · 2010-05-06 01:56 · Score: 0

Big parallel filesystems are a dime a dozen now. There is CXFS, Panasas, pNFS is coming, Lustre, PVFS (not good), Ceph (not good), GFS (Google File System, which is being rewritten/seriously updated), GPFS, Hadoop, QFS (not sure of its scaling) and more.

Almost all of these things have the same basic architecture where the metadata (actual inodes) are separate from the data, and the data can be RAIDed/duplicated and/or a single file can be striped across a number of storage machines. Like memory, filesystems on large scale have a hierarchy as well. With memory, you have registers, L1/L2/L3 cache, system memory, and swap if you still use that. Storage is going the same way.

The crazy thing is that the disparity between CPU speed and IO speed is becoming greater and greater. Also, datasets are getting larger and larger.

"included in kernel since 2.6.34"? by therealkevinkretz · 2010-05-06 02:09 · Score: 1

kernel.org only has up to 2.6.33.

Re:Do niggers use linux? by CAIMLAS · 2010-05-06 03:00 · Score: 1

It was noble of you to try to wrest control of a troll thread, but your comment loses a lot of credibility for being titled "Re: Do niggers use linux?"

While it's off-topic, it's at least an honest question! I'm sure the slashbots want to know the answer.

--
~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers

Re:Is data integrity really necessary for large da by Lennie · 2010-05-06 04:16 · Score: 1

Facebook uses MySQL/memcached; Cassandra is only used for systems running the statistical analysis.

--
New things are always on the horizon

Re:Is data integrity really necessary for large da by inerlogic · 2010-05-06 05:55 · Score: 1

RAID5 doesn't use a parity disk, nor does it protect against bit-level errors. If a drive fails, you obviously know which drive is bad and, unless a second drive fails, you can recover your data.

If you want bit-level error checking, use a ZFS-based RAID.

Re:Is data integrity really necessary for large da by tehcyder · 2010-05-06 22:35 · Score: 1

When you don't know what you are talking about, I think it is better to just keep quiet.

That would reduce the number of posts on slashdot by about 99%.

--
To have a right to do a thing is not at all the same as to be right in doing it

Re:Is data integrity really necessary for large da by Anonymous Coward · 2010-05-09 00:09 · Score: 0

The parity in RAID5 is spread over the disks, so for any block of data, yes there is a "parity disk" for it.
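
A small Python sketch of what "spread over the disks" means. The rotation used here (parity starts on the last disk and moves one disk to the left with each stripe) is an assumed layout chosen for illustration; real implementations offer several layouts, and `parity_disk` is a made-up helper.

```python
def parity_disk(stripe: int, n_disks: int) -> int:
    """Index of the disk holding parity for a given stripe.

    Assumed layout: parity starts on the last disk and rotates one
    disk to the left per stripe.
    """
    return (n_disks - 1 - stripe) % n_disks


# Which disk holds parity for the first six stripes of a 3-disk array.
layout = [parity_disk(s, 3) for s in range(6)]
print(layout)
```

For any single stripe there is exactly one disk holding its parity, but over many stripes every disk takes a turn, so no one disk is "the" parity disk.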

Re:Is data integrity really necessary for large da by ckaminski · 2010-05-17 06:33 · Score: 1

Except that not a single one of those is usable across existing multiple platforms (Solaris, BSD, Linux, Windows, etc.), where you might want to have cardinality of your data. You put your .Net data on your .Net platforms, and your J2EE data on your J2EE platforms, etc. I always have to have a layer like CIFS or SMB on top.

I suppose that's fine, you can always throw extra layers, but if a system is doing distribution already, why can't you throw in location transparency on top and add a Posix layer?

And none of them so far are Free for all the above mentioned platforms. Hell, I'll take just Linux and Windows. GPFS is, but IBM will ass-rape you in $$$ for it. Lustre has the same problem if you want Windows support.

The rest (almost universally) don't support Posix semantics (though OCFS is apparently shooting to do so).

Then there's the fact that almost all of the open Linux-only ones have a SPOF in the form of the metadata server.

Re:Is data integrity really necessary for large da by Anonymous Coward · 2010-05-17 13:35 · Score: 0

I deal with genomic data sets. Each data set is 30-80Gb. We're a small group, but the big ones have petabytes of storage. If the biological material is used up, you cannot regenerate the data.

Slashdot Mirror

132 comments