ZFS, the Last Word in File Systems?
guigouz writes "Sun is carrying a feature story about its new ZFS File System - ZFS, the dynamic new file system in Sun's Solaris 10 Operating System (Solaris OS), will make you forget everything you thought you knew about file systems. ZFS will be available on all Solaris 10 OS-supported platforms, and all existing applications will run with it. Moreover, ZFS complements Sun's storage management portfolio, including the Sun StorEdge QFS software, which is ideal for sharing business data."
From the article:
Unlimited scalability
As the world's first 128-bit file system, ZFS offers 16 billion billion times the capacity of 32- or 64-bit systems.
Microsoft immediately countered by saying WinFS will now support "twelveteen million billion times" as much storage as Sun's ZFS, and is "a bazillion times" more secure.
When reached for comment, Sun CEO Scott McNealy replied "neener neener". Microsoft CEO Steve Ballmer responded by putting gum in Sun President Jonathan Schwartz's hair.
And it looks like it's going to be opensourced along with most of Solaris 10!
Presumably a 32 bit machine will be able to handle a 128 bit file system, in the same way as Solaris 10 is currently destined for (at most) 64 bits.
Of course ZFS is the last word in file systems. I mean, what can come after zed?
"Have you ever thought about just turning off the TV, sitting down with your kids, and hitting them?"
1) Even Sun has succumbed to recursive acronyms, now.
2) Is it just me, or is the post surprisingly bereft of unique details? I mean, integration with all existing applications is rather assumed, given that it's a file system and all...
It's only an insult if it's not true.
Logically, the next question is if ZFS' 128 bits is enough. According to Bonwick, it has to be. "Populating 128-bit file systems would exceed the quantum limits of earth-based storage. You couldn't fill a 128-bit storage pool without boiling the oceans."
So, what was the point of creating a 128-bit filesystem?
-1, Marketing Hype.
*Yawn*
... it took them long enough.
Perhaps they had to rewrite an LVM from scratch in order to opensource it?
Having a global pool does lessen maintenance/support, but what method are they using to place data on the disks?
Frequently accessed data needs to be spread out on all the disks for the fastest access, so does that mean Sun has FS files/tables that track usage and reposition data based on that?
IBM has ZFS on their z/OS Unix Systems Services (POSIX interfaces on z/OS) component. ZFS was developed to provide improvements over the HFS (Hierarchical File System) that they ship with the OS.
Nah, the ultimate filesystem has to be xyzzyfs! Your data magically appears... :-)
Enable 3D printed prosthetics!
We heard earlier that solaris 10 will be open source.
I wonder if that means that this filesystem can be included in other kernels.
Sagans?
I'm really happy with UFS2/SU, and have been more than happy with the original UFS in general since 1994 when I first started off with NetBSD.
But, with ZFS, maybe we have finally found an FS to replace it with. I sure look forward to trying Solaris 10, though I'm sure that I will find that SunOS has a better feel to it, like always.
Maybe DragonflyBSD will be the one to do this, FreeBSD is generally more restrictive to radical changes; for good reasons, you don't get that stability without reason.
Sadly Google returns no hits for rearchistrated
Score:-1, Funny
"ZFS, the Last Word in File Systems?"
The last word in file systems is "systems". And stop asking file systems these questions, you fool.
War is one of the most horrible things a human can be exposed to. And one of the worlds largest industries.
If it's the last word, why are we even talking about it?
Compared to AIX or HP-UX, 28 steps is shockingly bad, both have had much simpler logical volume management for several versions now (AIX for 5 years or more? certainly as long as I have used it). The existing Solaris 9 logical volume infrastructure is years behind the competition, this is bringing it up to date, but not putting it far ahead.
Ewan
COME ON! It may be a slow day, but how is this news? There's only one link, and it's to Sun's marketing info.
Can someone please provide a link to some technical details other than it being 128-bit? What does this file system actually do that is even remotely special? What's under the covers? And, more importantly, does it actually work as described?
-1,Uninformative
But of course you'll still have to have your boot image within the first 1024 cylinders.
Does this mean the absolutely awful Disksuite/Solaris Volume Manager is finally, mercifully, dead, too?
I'll do a dance of utter joy if so. Disksuite is 10 pounds of shit in a 5 pound bag.
- A.P.
"Remember when the U.S. had a drug problem, and then we declared a War On Drugs, and now you can't buy drugs anymore?"
Kneel Before Zod!
Art Schools Dietzilla
Who else instantly thought of, "640 K ought to be enough for anybody", uttered by the chief architect of twenty years of chaos?
Until now it does sound just like raid, but:
I guess I just don't get it; I know they are talking about logical corruption and not a physical failure, but this is kind of like raid with something like SMART, or isn't it?
And what kinds of corruption can there be? Journaling filesystems already work well for write errors and such, or so I thought.
I know the architecture seems innovative and different (at least for me), but is there really new functionality?
Sorry if I seem ignorant this time. I don't know if I was able to get my point across; the things this filesystem does, wouldn't they be better left on a different layer?
O make me a mask
I've been working on a file system (inspired by an old Signetics memory device) that's likely to *really* be the last word. It's still in alpha because I'm having trouble verifying its functionality, but it seems to work very well so far.
I call it WOFS.
Such a feature would rock, because it would be possible to make things like installers completely atomic: interrupt the installer process and the whole thing rolls back.
Use 'slashdot stuff' in the subject line in any email you send me if you want to get past the spam filter.
... ZFS will also make you forget everything you knew about English grammar.
"We've rethought everything and rearchitected it," says Jeff Bonwick
Rearchitected? WTF? Howsaboot "Redesigned?"
I'm still wrapping my brain around "adaptive endian-ness" as well.
--QTone
Soon it will show one hit!
Looks like Sun went out and redid their filesystem based on the performance characteristics of machines today, instead of machines of yesteryear.
Some highlights, for those that don't (or won't) RTA:
* Data integrity. Apparently it uses file checksums to error-correct files, so files will never be corrupted. About time someone did this.
* Snapshots, like netapp?
* Transactional nature/copy-on-write
* Auto-striping
* Really, Really Large volume support
All of this leads to speed and reliability. There's a lot of other stuff (varying block sizes, write queueing, stride stuff which I haven't heard about in years), but all of it leads to the above.
Oh, and they simplified their admin too.
It's hard to make a filesystem look exciting. Most of the time it just works, until it fails. The data checksum stuff looks interesting, in that they built error correction into the FS (like CDs and RAID but better hopefully).
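To make the checksum idea concrete, here is a minimal sketch in Python. A checksum on its own only detects damage; repairing it needs a second good copy, so the sketch assumes a two-way mirror. The function names, the choice of SHA-256 and the in-memory "disks" are all illustrative assumptions, not ZFS's actual on-disk format or algorithm.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Checksum stored separately from the data it protects (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def read_with_repair(copies: list, expected: str) -> bytes:
    """Return the first copy that matches the expected checksum and heal the others.

    `copies` is a list of bytearrays standing in for mirrored disks.
    """
    good = next((bytes(c) for c in copies if checksum(bytes(c)) == expected), None)
    if good is None:
        # Detection still works, but with no intact copy there is nothing to repair from.
        raise IOError("all copies fail the checksum; cannot repair")
    for c in copies:
        if bytes(c) != good:
            c[:] = good  # overwrite the damaged copy with the verified data
    return good
```

With one intact mirror the read succeeds and the bad copy is rewritten; with every copy damaged the error is reported instead of silently returning garbage.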
It might also do away with the idea of "space free on a volume," since the marketing implies that each FS grows/shrinks dynamically, pulling storage out of the pool as needed.
Any users want to chime in?
Looks to me like nothing more than an excuse to put up a patent tollbooth for anyone who wants to implement ZFS.
"Sun's patent-pending "adaptive endian-ness" technology"
ok, that aside. First 128bit file system, and get this: transactional object model
I think this means it is optimistic but they figure it has blazing fast performance, who am I to argue. Fed up with killing this indexing garbage on the work machine, bloody microsoft, disabled it and everything and every full moon it seems to come out and graze on my HDD platter.
From the MS article : This perfect storm is comprised of three forces joining together: hardware advancements, leaps in the amount of digitally born data, and the explosion of schemas and standards in information management.
Then I started to suspect they would rant about moores law and sure e-bloody-nough
Everyone knows Moore's law--the number of transistors on a chip doubles every 18 months. What a lot of people forget is that network bandwidth and storage technologies are growing at an even faster pace than Moore's law would suggest.
That is like saying, everyone knows the number 9 bus comes at half 3 on Wednesdays, but no one expects 3 taxis sat there doing nothing at half past 3 on a Tuesday.
Can we put this madness to rest? Ok back to the articles.
erm... lost track now....
#hostfile 0.0.0.0 primidi.com 0.0.0.0 www.primidi.com 0.0.0.0 radio.weblogs.com
Something like:
:)
SELECT * FROM storage WHERE path = '/home/gorbachev/.cshrc'
In Soviet Russia, I ruled you
It would take over 500 years to fill a 64 bit filesystem written at 1GB/sec (and of course 500 years to read it back again). 64 bits is already an impossibly large figure. There's absolutely nothing special or clever whatsoever about doubling the size of your pointers aside from using up more disk space for all the metadata.
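A quick back-of-the-envelope check of that figure, as a Python sketch. It assumes "64 bit" means 2^64 addressable bytes, "1GB/sec" means 10^9 bytes per second, and a year is 365.25 days; those readings are assumptions, not something stated in the post.

```python
# Assumptions: byte-addressed 64-bit space, decimal gigabytes, 365.25-day years.
CAPACITY_BYTES = 2 ** 64
WRITE_RATE_BYTES_PER_SEC = 10 ** 9
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_fill(capacity_bytes: int = CAPACITY_BYTES,
                  rate: int = WRITE_RATE_BYTES_PER_SEC) -> float:
    """Years of continuous writing needed to fill the given capacity."""
    return capacity_bytes / rate / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"{years_to_fill():.1f} years")
```

Under those assumptions the result lands a little under 600 years, which agrees with "over 500 years".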
64 bits is enough for today's filesystems in much the same way that 256 bit AES is enough for today's encryption - there are far bigger things that will require complete system changes than that so called "limit". I suspect a better filesystem will come along well before those 500 years are up... I agree with grandparent:
-1, Marketing Hype.
I was going to respond to the article, but I forgot everything I know about file systems.
You have enemies? Good. That means you've stood up for something, sometime in your life. --Winston Churchill
Though it comes before xyzzyfs, it is the last because it automatically generates and collects porn. Most geeks would never get past it.
Fight Spammers!
Then why didn't IBM call its improved HFS "HFS Plus"? No wait, that would collide with Apple's HFS and HFS Plus, used in Mac OS.
It would appear that there can be only twenty-six distinct file systems. Then Microsoft went and innovated NTFS with Four-Letter-Word File System Technology, which actually was just a copy of IBM's HPFS, the first to introduce File System Named After a Competitor Technology.
You organize a 128bit file system with a database.
Why bother with folders as a root? You can create a folder hierarchy *with* a database too.
GPL Deconstructed
So what are the chances that someone could accidentally wipe the shared data pool for an entire company and how hard is recovery on a volume striped across a few hundred hard drives?
This article is shocking. I'm used to much less hype and far more technical details from Sun. Software patents and bullshit are not what I expect when I follow a link to them.
I don't like any of this.
Friends don't help friends install M$ junk.
It's a 128-bit filesystem, so doesn't that make it the last 8 words?
"No matter where you go, there you are." -- Buckaroo Banzai
ZFS achieves its impressive performance through a number of techniques:
* Dynamic striping across all devices to maximize throughput
* Copy-on-write design makes most disk writes sequential
* Multiple block sizes, automatically chosen to match workload
* Explicit I/O priority with deadline scheduling
* Globally optimal I/O sorting and aggregation
* Multiple independent prefetch streams with automatic length and stride detection
* Unlimited, instantaneous read/write snapshots
* Parallel, constant-time directory operations
ZFS has some similarities to NetApp's WAFL in that it uses "copy on write".
One of the fun things with ZFS is that it automatically stripes across all the storage in your pool. Disk size doesn't matter - it's all used. This even works across SCSI and IDE.
One of the important things is that volume management isn't a separate feature. Effectively, all the current limitations of volume managers are blown away:
Just as it dramatically eases the suffering of system administrators, ZFS offers relief for your company's bottom line. Because ZFS is built on top of virtual storage pools (unlike traditional file systems that require a separate volume manager), creating and deleting file systems is much less complex. Not only does this eliminate the need to pay for volume manager licenses and allow for single support contracts, it lowers administration costs and increases storage utilization.
ZFS appears to applications as a standard POSIX file system--no porting is required. But to administrators, it presents a pooled storage model that eliminates the antique concept of volumes, as well as all of the related partition management, provisioning, and file system sizing problems. Thousands--even millions--of file systems can all draw from ZFS' common storage pool, each one consuming only as much space as it needs. The combined I/O bandwidth of all of the devices in that storage pool is always available to each file system.
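The pooled model described above can be pictured with a toy sketch in Python. The `Pool` and `FileSystem` classes are hypothetical names invented for illustration; they only model the space accounting (no fixed per-file-system size, every file system sees the pool's free space) and say nothing about how ZFS actually implements it.

```python
class Pool:
    """Toy storage pool: capacity is the sum of all disks added so far."""

    def __init__(self):
        self.capacity = 0
        self.used = 0

    def add_disk(self, size: int) -> None:
        # New space is available immediately to every file system in the pool.
        self.capacity += size

    @property
    def free(self) -> int:
        return self.capacity - self.used

    def allocate(self, nbytes: int) -> None:
        if nbytes > self.free:
            raise OSError("pool is full")
        self.used += nbytes


class FileSystem:
    """Toy file system with no size of its own; it draws space from the pool."""

    def __init__(self, pool: Pool):
        self.pool = pool
        self.used = 0

    def write(self, nbytes: int) -> None:
        self.pool.allocate(nbytes)
        self.used += nbytes

    @property
    def available(self) -> int:
        # "Free space" is a property of the shared pool, not of this file system.
        return self.pool.free
```

For example, after two disks of 100 and 50 units are added and one file system writes 120 units, every other file system in the pool sees 30 units available; adding another disk raises that number for all of them at once.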
This is also part of the stuff making admin and configuration far far simpler. The thing I like is that it should be far harder to go wrong with ZFS (not available in Solaris Express yet so I haven't seen this for myself).
The very high degree of reliability as standard is very welcome too:
Data can be corrupted in a number of ways, such as a system error or an unexpected power outage, but ZFS removes this fear of the unknown. ZFS prevents data corruption by keeping data self-consistent at all times. All operations are transactional. This not only maintains consistency but also removes almost all of the constraints on I/O order and allows changes to succeed or fail as a whole.
All operations are also copy-on-write. Live data is never overwritten. ZFS writes data to a new block before changing the data pointers and committing the write. Copy-on-write provides several benefits:
* Always-valid on-disk state
* Consistent, reliable backups
* Data rollback to known point in time
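A toy illustration of the copy-on-write sequence in the quoted text (write the new block first, then switch the pointer), sketched in Python. The `CowStore` class and its in-memory list of blocks are assumptions made for the example; this is not how ZFS lays data out on disk.

```python
class CowStore:
    """Toy copy-on-write store: blocks are append-only, a root table points at live blocks."""

    def __init__(self):
        self.blocks = []  # stand-in for the disk; existing entries are never modified
        self.root = {}    # committed pointer table: name -> block index

    def write(self, name: str, data: bytes) -> None:
        self.blocks.append(data)               # 1. write the data to a new block
        new_root = dict(self.root)             # 2. build a new pointer table on the side
        new_root[name] = len(self.blocks) - 1
        self.root = new_root                   # 3. commit by swapping a single reference

    def read(self, name: str, root=None) -> bytes:
        table = self.root if root is None else root
        return self.blocks[table[name]]

    def snapshot(self) -> dict:
        # An old root stays valid forever because live blocks are never overwritten.
        return self.root
```

If the process dies before step 3, the old root still describes a complete, consistent state, which is the "always-valid on-disk state" in the list. Keeping a reference to an old root is all a snapshot needs, and reading through it gives the rollback view.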
"We validate the entire I/O stack, start to finish, no guesswork involved. It's all provable data integrity," says Bonwick.
Administrators will never again have to run laborious recovery procedures, such as fsck, even if the system is shut down in an unclean fashion. In fact, Solaris Kernel engineers Bill Moore and Matt Ahrens have subjected ZFS to more than a million forced, violent crashes in the course of their testing. Not once has ZFS lost data integrity or leaked a single block.
For more technical info see Matt Ahrens's and Val Henson's blogs - since they're among the engineers who worked on it.
The codename for the first generation of Novell's current filesystem was ZFS. Why? Because it was supposed to be "the last, or final word" in file systems.
It is now NSS, the Novell Storage System (I think it used to be NetWare Storage System).
Apart from the obvious fact that SUN didn't manage to be very original in naming their filesystem, it's noteworthy that Novell is porting their ZFS - now NSS - to Linux. It'll be part of Novell Open Enterprise Server - on both Linux and NetWare kernels.
Off the top of my head, here are some features of NSS that SUN needs to exceed to qualify for a new "final word..":
- Background compression
- Fast on-demand decompression
- Transactions
- Pluggable Name spaces
- Pluggable protocols (ie. http, nfs, etc)
- Advanced Access control model with inheritance, rights filters, etc. integrated with directory service (duh!)
- Quotas on user, group, directory level
- 64-bit (ok, SUN obviously got that one)
- mini-volumes
- journaled
- etc.
Oh well, I won't bother continuing, but it's worth looking out for NSS. Hopefully Novell will open source it and not make it exclusive to their distros.
Right now there are a lot of file systems that do something not all that different from what Sun is proposing. The project I am on is evaluating them as we speak for a center-wide filesystem. I've had the fun (no sarcasm, honestly) of setting up a number of different ones and helping to run benchmarks and tests against each. All of them have strengths. Every single one of them has some nasty weaknesses.
If you are looking for an open source based cluster file system, Lustre is what you want. It's supported by LLNL, PNNL, and the main writers at ClusterFS Inc. It's a network based cluster FS. We've been using it over GigE. However, we've found that there needs to be a 3:1 ratio of data servers to clients. We have only used one metadata server. Failover isn't the greatest. Quotas don't exist. It also makes kernel mods (some good and bad), amounting to a mild fork of the Linux kernel (they put them into the newer kernels every so often). It only runs on Linux. Getting it to run on anything else looks...scary.
GPFS runs on AIX and Linux, even sharing the same storage. It runs and is pretty stable. It has the option to run in a SAN mode or as a network based FS. In the latter form, it even does local discovery of disks via labels so that if a client can see the disks locally it will read and write to them via FC rather than to the server. It, however, is a balkanized mess. It requires a lot more work to bring up and run: there is an awful lot of software to configure to get it to run (re: RSCT. If you haven't had the joys of HATS and HAGS, count yourself very, very lucky).
ADIC's StorNext software is another option. This one is good if you are interested in ease of installation, maintenance, and very, very fast speeds (damn near line speed on Fibre Channel). I have set this one up for sharing disks in less than two hours from first install to getting numerous assorted nodes of different OSes to play together (Solaris, AIX, Linux). It freakin' runs on virtually everything from Crays to Linux to Windows. Its issues seem to be scaling (right now it doesn't go past 256 clients) and it has some nontrivial locking issues (writing to the same block from multiple clients, and parallel I/O to the same file from multiple clients if you change the file size).
There are some others that are not as mature. Among them are Ibrix, Panasas, GFS, and IBM's SANFS. All of them are interesting or promising. Only SANFS looks like it runs on more than Linux though at this point. Our requirements for the project I am on are to share the same FS and storage instance among disparate client OSes simultaneously. This might not be the same for others though and these might be worth a look. Lustre dodges this because it's open source and they're interested in porting.
Do you know why the road less traveled by is littered with the bones of the unwary?
The last word in file systems is "systems".
Thank you.
Please read my Canon EOS tech blog at http://www.everyothershot.com
True. However, it is more ambiguous than "million million million", as absent minded Brits might interpret it as a "million million million million".
Or would you rather they say 6.0 × 10^18?
Yes.
Most people can't imagine that.
Most people can't imagine it anyway, whether you call it "six billion billion", "6.0 x 10^18", "6 x 2^60", or "1.27 x e^43". Or understand any number higher than the number of dollars they carry in their wallet, for that matter. Anyone who needs to make any decisions in life based on this ZFS number ought to be able to understand it any of those ways (although getting help from a calculator for the last one or even two is understandable). Of course, many people manage things they can't understand. This is life.
//Information does not want to be free; it wants to breed.
As someone who's been involved with performance/stress optimizations I can tell you that for each situation you can carefully put together two types of tests: one which proves that there's a problem, another that proves the problem doesn't exist.
The proof is in the pudding. Let Sun release it and administrators use it for a year or two, then we'll see if it's good enough. Right now I'm having doubts it's as good as they want you to believe.
Two words:
"Patent burdened"
Logically, the next question is if ZFS' 128 bits is enough. According to Bonwick, it has to be. "Populating 128-bit file systems would exceed the quantum limits of earth-based storage. You couldn't fill a 128-bit storage pool without boiling the oceans."
Well...I never really like the oceans anyways. They were always so wet.
If anyone wants to read more details on the "Zettabyte File System" they can view the white papers on ZFS self-tuning and QOS as they contain far more detail than the marketing article given.
Says he needs a new wallet...
..oh wait, he does.
If Bill Gates had a nickel for every time Windows crashed...
It takes 40+ muscles to frown, but only four to extend your arm and bitchslap the motherfucker
"We're absolutely trying to make disk storage more like memory, and often use that analogy in our presentations. For example, when you add DIMMS to your computer, you don't run some 'dimmconfig' program or worry about how the new memory will be allocated to various applications; the computer just does the right thing. Applications don't have to worry about where their memory comes from. Likewise with ZFS, when you add new disks to the system, their space is available to any ZFS filesystems, without the need for any further configuration. In most scenarios it's fairly straightforward for the software to make the unequivocably best choices about how to use the storage. If you want to tell the system more about how you want the storage used, you'll be able to do that too (eg. this data should be mirrored but that not; it's more important for this data to be accessed quickly but that can be slower). We hope that with relatively modern hardware, all but the most complicated and demanding configurations will be handled adequately without any administrator intervention." read more
After years of everyone saying that the relational model was the answer to all data organization needs... the hierarchical model reappeared in the form of XML, and people realized that it is convenient to organize some types of data hierarchically.
Convenient, and flawed.
XML isn't designed to handle changing data. It's designed to be a data markup language, which indicates it's used for presenting data, not managing data.
So far, the relational model is the best mathematically-rigorous method of managing sets of data. There are many advantages to hierarchical data representation, but for manipulation, the relational still trumps.
Do I want to use SQL to access my files? Not if I don't have to. There are perhaps better methods, even some transparent methods.
But, do I want to continue to self-organize my data? Hell, no! There's just too much information stored on my computer, and on my network, these days. And, considering that much of my data has multiple relationships, the hierarchical model is growing a bit long in the tooth. Many of my documents belong in multiple hierarchies.
But, there might be a real solution soon:
Gnome Storage looks to be a good first step.
Microsoft is to software what Budweiser is to beer.
Speaking of numbers no one can pronounce....
One of the key features of ZFS is that you can create a file system over a pool of storage. Nothing stops you from building a distributed storage pool of 18.3 million desktop drives (they don't have to be locally connected). You could apply the same concept as SETI@HOME and allow end users with excess storage space to lend it. Didn't someone talk about a peer to peer backup system a while ago?
And c'mon, don't be so against hype. Not all numbers are evil. And the overhead of processing some extra bits is minuscule. The space and time required grow only logarithmically with the size of the number set. E.g., a 128-bit address space is some billion billion times the size of a 64-bit one, but an address only takes 2 times as much to store and process. And this time is already small compared to the actual I/O time, and the space small compared to the combined storage space.
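The arithmetic behind that claim, as a small Python sketch. The helper names are made up for illustration, and reading "billion" as the binary 2^30 is an assumption that happens to reproduce the article's "16 billion billion" exactly.

```python
BINARY_BILLION = 2 ** 30  # assumption: "billion" read as 2**30 rather than 10**9


def address_space_ratio(wide_bits: int, narrow_bits: int) -> int:
    """How many times larger the wider address space is."""
    return 2 ** wide_bits // 2 ** narrow_bits


def pointer_bytes(bits: int) -> int:
    """Bytes needed to store one address of the given width."""
    return bits // 8


if __name__ == "__main__":
    ratio = address_space_ratio(128, 64)
    print(ratio == 2 ** 64)                   # the ratio is itself a 64-bit-sized number
    print(ratio // BINARY_BILLION ** 2)       # how many "billion billions" that is
    print(pointer_bytes(128), pointer_bytes(64))
```

So the address space grows by a factor of 2^64 while each stored pointer only doubles, from 8 bytes to 16.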
I once had a signature.
You can find some more technical information about ZFS in my weblog. Check out the comments to my first entry about ZFS, there are a few juicy details there and I'll do my best to answer any questions posted to my blog.
Disclaimer: I work on ZFS at Sun.