IBM's JFS & PTh-NG Reaches 1.0
jd writes "IBM's Journaled Filing System becomes the second commercial filing system for Linux to reach the exalted 1.0 status! It also follows close on the heels of another IBM offering, the PThreads Next Generation project, which also hit 1.0 today." Check out this LWN story on it as well. It's worth noting that this is a free as in open source version - GPLed. There will be another commercial version as well.
AFAIK the IBM "Next Generation POSIX Threads" (NGPT) is mainly a more conforming implementation of POSIX Threads than the older LinuxThreads. It's still based on "kernel threads" (i.e. clone()'d processes), but I assume the reference to M:N threading means that it can map a number of user level threads onto a different number of kernel threads in the same way Solaris maps user threads onto OS Light Weight Processes (LWPs).
There's an interesting update to the clone() man page that indicates some of the recent clone() enhancements that were added to allow full POSIX thread implementation:
http://lwn.net/2001/0628/a/man-clone.php3
Ahhh, I've noticed this
But, it's not as bad as you think
I use 2.4.5-acsomethingorother
It has threaded coredumps - which is nice, until of course you get one (well "one" - it dumps every single core separately for each cloned process)
So you have to manually go through them and bt them - but it does work, it's simply fiddly, and you can't easily check other threads without separately reading in the cores.
In terms of threaded processes however, gdb is a lot better - indeed, the worst thing we've been bitten by is some of the bugs that it has with C++ and static members in superclasses.
gdb CVS seems competent with threads - I've had it weird out on me occasionally, but generally it's capable of doing the job.
gdb 5.0 isn't capable (at least, not the one Debian ships), but the CVS version is a damn sight better.
HTH
Again, using thousands of threads is not going to get the best performance out of it, if it does not actually have thousands of CPUs or network cards or disks. A reasonably small pool of threads is all it takes. You need a thread for each peripheral you ever need to wait for, and each processor you need to keep busy, and then some.
There is no special set of processing principles for a $100,000 server which say that you can stuff in any number of threads. Of course, with a larger address space, memory and greater memory bandwidth, you can be more wasteful and notice it less.
The threading library LinuxThreads has to play the violin while standing on its head to make threads look POSIX-like. The kernel people look upon POSIX threads as braindamaged, and they are largely right.
In my view, most of the noncompliances that exist actually make LinuxThreads a better library. For example fcntl() locks are owned by threads rather than the ``process''. This is more natural to program with; you don't want a locking request by one thread to just proceed even though it overlaps with an existing lock held by another thread in the same process! Think of the race condition bugs that can cause. Yet that's what POSIX requires. It's evident that POSIX is largely driven by vendors who have operating systems on which it's easier to hack threads this way, as sort of dadoes that are engraved into processes.
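To see the POSIX rule in action, here is a minimal Python sketch. It assumes a POSIX system whose threads follow the process-wide ownership rule (current Linux does; the old LinuxThreads behaviour described above would differ). The helper name `second_thread_gets_lock` is made up for illustration.

```python
import fcntl
import os
import tempfile
import threading


def second_thread_gets_lock():
    """Return True if a second thread is granted an exclusive fcntl lock
    on a file that the calling thread already holds locked."""
    fd, path = tempfile.mkstemp()
    result = {}
    try:
        # Calling thread: exclusive, non-blocking lock on the whole file.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)

        def contender():
            try:
                # Same process, different thread, overlapping range.
                fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                result["granted"] = True
            except OSError:
                result["granted"] = False

        t = threading.Thread(target=contender)
        t.start()
        t.join()
        return result["granted"]
    finally:
        os.close(fd)
        os.unlink(path)
```

Where locks belong to the process, the second request simply succeeds, which is exactly the silent overlap complained about above. Threads that need mutual exclusion have to add their own mutex on top.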
Another braindamaged area of POSIX is that all threads share the same current working directory. This is upheld by LinuxThreads using CLONE_FS, but in principle it doesn't have to be. Again, think of the bugs this can cause! One thread does a chdir(), and the file accesses done by another thread go berserk into another directory.
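A small Python sketch of the shared working directory, assuming an ordinary POSIX-style threading implementation; the function name `chdir_leaks_across_threads` is hypothetical. One thread calls chdir() and the caller then checks its own working directory.

```python
import os
import tempfile
import threading


def chdir_leaks_across_threads():
    """Return True if a chdir() done in one thread changes the
    working directory seen by the calling thread."""
    original = os.getcwd()
    target = tempfile.mkdtemp()
    try:
        t = threading.Thread(target=os.chdir, args=(target,))
        t.start()
        t.join()
        # realpath() guards against symlinked temp directories.
        return os.path.realpath(os.getcwd()) == os.path.realpath(target)
    finally:
        os.chdir(original)   # restore before removing the directory
        os.rmdir(target)
```

Any relative path opened by the caller between the other thread's chdir() and the restore would resolve against the wrong directory, which is why code in threaded programs tends to stick to absolute paths.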
Then there is the whole problem of security contexts. In POSIX, your effective and real user and group IDs are process-wide; if a thread changes user ID, it changes it for all threads. Like chdir, changes in user ID *need* to be done in a procedural discipline, regardless of what you may think of the idea of having multiple security contexts in one address space. Think you can fork() around the user ID problem? If you fork() in a POSIX threaded process, the child can use only async-signal-safe functions, or the behavior is undefined. If you need to do more things, you must exec() a new image.
On the other hand, there are some reasonably nice behaviors in POSIX that don't work on Linux, like doing a fork() in one thread, but doing the wait() in another. (Workaround: make a fork server thread which handles fork and wait requests on behalf of others).
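The fork server workaround can be sketched roughly like this in Python. It is only an illustration: the class `ForkServer` and its methods are invented names, and `subprocess.run` stands in for the raw fork/exec/wait sequence. The point is that one dedicated thread both creates and reaps every child, so no other thread ever has to wait on a process it did not start.

```python
import queue
import subprocess
import sys
import threading


class ForkServer:
    """One thread owns every child process: it both starts and waits."""

    def __init__(self):
        self._requests = queue.Queue()
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def _serve(self):
        while True:
            item = self._requests.get()
            if item is None:          # shutdown sentinel
                break
            argv, reply = item
            try:
                # Spawn and wait happen in this same thread.
                reply.put(subprocess.run(argv).returncode)
            except OSError as exc:
                reply.put(exc)

    def run(self, argv):
        """Called from any thread; blocks until the child has exited."""
        reply = queue.Queue(maxsize=1)
        self._requests.put((argv, reply))
        outcome = reply.get()
        if isinstance(outcome, Exception):
            raise outcome
        return outcome

    def close(self):
        self._requests.put(None)
        self._thread.join()
```

Usage would be something like `ForkServer().run([sys.executable, "-c", "pass"])` from any thread. This simple version runs children one at a time; a real one would probably track several children at once.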
The way I see it, this IBM project is just reinventing LinuxThreads with a few differences, the biggest being M:N threading. If you look at their list of known issues listed in the release notes, it's about the same as for LinuxThreads: no process shared mutexes, no process identity for threads (except when only one underlying system task is used). These two are the biggest sources of complaints from some LinuxThreads users.
The rest of my comments have to do with M:N, specifically the false claim that M:N provides a performance enhancement for all multithreaded applications.
I don't believe that M:N threading is a good idea. It creates issues and complications in the library implementation. The responsibility of scheduling and dispatching is divided between user space and the kernel instead of being done in one place. What M:N threading does is speed up voluntary context switches---context switches that take place within a threading function that is directly or indirectly invoked by the application, such as a synchronization function (pthread_cond_wait, pthread_mutex_lock, etc). Such a function can just call the user space scheduler, which can dispatch a thread without cooperation from the kernel. This is how M:N reduces overhead.
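As a toy illustration of a voluntary switch handled entirely in user space, here is a Python sketch in which generators stand in for user-level threads and `yield` marks the point where a thread would enter a synchronization function. This is not how NGPT is implemented; `worker` and `run_round_robin` are invented names.

```python
from collections import deque


def worker(name, steps, log):
    """A user-level 'thread': does one step, then voluntarily yields."""
    for i in range(steps):
        log.append((name, i))
        yield                      # voluntary switch point


def run_round_robin(tasks):
    """User-space scheduler: dispatches tasks without any system call.
    Returns the number of voluntary switches it handled."""
    ready = deque(tasks)
    switches = 0
    while ready:
        task = ready.popleft()
        try:
            next(task)             # resume the task until its next yield
        except StopIteration:
            continue               # task finished; drop it
        ready.append(task)
        switches += 1
    return switches
```

Every switch here is just a function return and a queue operation, which is where the speedup comes from. Nothing in this loop can preempt a task that never yields; that part still needs the kernel.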
M:N does nothing for involuntary context switches, which have to somehow go through the kernel (for example, a signal is delivered to the process, which swaps context so that returning from the handler will cause a new thread to run). Actually, this kind of context switch can be worse than the ordinary Linux kernel context switch, depending on how it is done. The current task is interrupted to run kernel code (transition 1). Then a handler in user space is dispatched (transition 2). Then the handler returns to the kernel (transition 3) then the kernel passes control back to user space with a new context (transition 4). On the other hand, a native context switch is just two transitions: interrupt the current task, and dispatch the new one. In any case, the kernel is involved in the involuntary context switches of the so-called ``user space'' scheduler, which puts their expense in about the same ballpark as a kernel task switch (within the same address space).
So what about the faster voluntary context switches that M:N provides? Unfortunately, most of the *useful* voluntary context switch points are in the kernel: such as blocking calls that wait for I/O or real-time events. So M:N does nothing for I/O or response to real-time inputs. Dispatching a response to the completion of I/O or a real time input always requires a switch from the kernel to the user process.
M:N also does nothing for compute-intensive multithreading that is done *sanely*. Sure, M:N may speed up a program that performs, say, some operation on a large matrix using 50 threads on two processors, because when these threads synchronize, it can be done using those fast voluntary context switches. But M:N will do nothing for a program that uses two threads over two processors to do the same thing, which is the more sane design.
As a rule, the number of threads in an application should be not much more than the minimum that is required to utilize the various functional units of the hardware (processors and peripherals). Using too many threads just causes wasteful context switches that accomplish nothing, increases the memory access footprint of the application (since each thread has its own private data areas, such as the stack), and causes the scheduler to have to choose from among more threads.
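A rough Python sketch of that sizing rule, with made-up names (`pool_size`, `run_jobs`) and an assumed default of two devices: the pool is sized from the hardware, and the jobs queue up behind it rather than each getting a thread of its own.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def pool_size(cpus, devices, spare=1):
    """One thread per processor to keep busy, one per device to wait
    on, plus a little slack."""
    return cpus + devices + spare


def run_jobs(jobs, devices=2):
    """Run callables on a pool sized to the hardware, not to len(jobs)."""
    workers = pool_size(os.cpu_count() or 1, devices)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda job: job(), jobs))
```

Fifty jobs on a two-processor box would then share five threads instead of spawning fifty, and the results still come back in submission order.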
It's not worth trying to speed up brain-damaged applications that make poor use of threads, yet this is exactly what M:N is for.
Hard to say. Red Hat like to use all sorts of experimental stuff, so they're one possibility. On the other hand, I'm going to make a wild guess and say Debian'll be the first distro to provide the packages as an option.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Shhhhh! You're giving TiVO ideas!
Well..... if you want to stretch a point, it's actually at 1.0.1pre.
--
Care about electronic freedom? Consider donating to the EFF!
I have had no success with setting breakpoints and getting them to trigger. And I have absolutely no luck using core dumps generated when the program was not run under gdb. This is made more annoying because gdb changes something so that the code that I can crash easily when run directly does not crash when run under gdb. Pretty annoying!
Anybody have an explanation of what is going on?
PS: also don't use "-march=i686" in a multithreaded program. Is this a gcc bug?
Linux should ignore stupid standards. All information that is stored in the OS should be separate copies for each thread. There is no reason to share even file descriptors (they should be duped during clone and each process can seek() individually). The only thing threads should share is their virtual memory. Light-weight threads can still be implemented by having calls like setuid() force the thread to become a heavy thread.
I would also like to see them get rid of the locks they put around virtually every system call (such as getc()!!!). The overhead of these is insane, and I really really can manage to do this on my own if I really have multiple threads reading from the same file.
Linux threads are 1:1 mapped with kernel threads. This is great for I/O intensive operations because they can be independently scheduled by the kernel. If one task is blocking on something or other in the kernel, the others can be scheduled just fine. The problem comes with scalability; scheduling a kernel thread is just as expensive as scheduling a process, so if you have many, many threads and relatively few processors things can get slow because the cost of task switching and creation overtakes the useful work.
The solution to this is to have both kernel threads and userland threads. Things that don't block and/or require other things that block the kernel should be spawned as userland threads (they are about an order of magnitude faster for both scheduling and creating), and things that will block the kernel (network I/O, file I/O, etc.) should be set up as kernel threads. This makes life easier on the scheduler and also makes GUI applications "feel" faster than they currently do. Solaris has been doing this for quite some time already.
The wheel is turning but the hamster is dead.
Process creation under Linux (>= 2) is cheaper than on many other systems, however. It's (usually) still a net win.
OTOH, for _kernel level_ threads, you need more than just a register context and stack -- the scheduler ends up needing a fair amount of bookkeeping data anyway.
Linus just decided that there's not much point in making threads a special case ("thread-like" behavior depends on which of several things are shared, specified by flags).
Things get a lot simpler that way, and that is probably one reason why thread and process creation are so cheap on Linux.
Even the recently added CLONE_THREAD stuff is a pretty generalist approach -- it just controls whether or not threads (including those considered "processes") share a "thread group" (the id reported by getpid(2) in 2.4).
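On a current Linux system, the shared thread group ID is what getpid() returns, while each thread keeps its own kernel-level ID. A small Python sketch, assuming Python 3.8 or later and a platform where `threading.get_native_id()` is available; the function name `ids_seen_by_new_thread` is made up.

```python
import os
import threading


def ids_seen_by_new_thread():
    """Return (pid, native_thread_id) as observed inside a new thread."""
    seen = {}

    def probe():
        seen["pid"] = os.getpid()                 # shared by the whole group
        seen["tid"] = threading.get_native_id()   # distinct for each thread

    t = threading.Thread(target=probe)
    t.start()
    t.join()
    return seen["pid"], seen["tid"]
```

The pid matches the caller's, the native thread ID does not, which is the "threads sharing a single PID" picture that CLONE_THREAD was added to support.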
DNA just wants to be free...
Solaris has had a similar capability since 7. You just stick the logging option in your vfstab and you're off. It's not VxFS, but it's quick and easy, and a lot more robust than no logging at all. It will be nice when Linux users can do the same. Playing with a new fs is all very well if you have the free space and time to implement it. Sticking one word in a config file is a lot easier though ;)
[punch /tmp]$ diff -u xfs_announce jfs_announce
--- xfs_announce Tue May 01 08:19:51 2001
+++ jfs_announce Thu Jun 28 14:30:02 2001
@@ -1,10 +1,12 @@
-SGI is pleased to announce the 1.0 release of XFS, high-performance
-journaled file system for Linux.
+IBM is pleased to announce the v 1.0.0 release of the open source
+Journaled File System (JFS), a high-performance, and scalable file
+system for Linux.
-http://oss.sgi.com/projects/xfs/
+http://oss.software.ibm.com/jfs
-XFS, widely recognized as the industry-leading high-performance
-filesystem, provides rapid recovery from system crashes and the
-ability to support extremely large disk farms. ... It is a
-mature technology that has been proven on thousands of IRIX
-systems as the default filesystem for all SGI customers.
+JFS is widely recognized as an industry-leading high-performance
+file system, providing rapid recovery from a system power outage or crash and the
+ability to support extremely large disk configurations. The
+open source JFS is based on proven journaled file system technology
+that is available in a variety of operating systems such as AIX and
+OS/2.
;-)
For better or for worse - I also watched the NetBSD people just about go ballistic when he said Linux has been ported to every platform ("that matters" was probably implied).
XFS and JFS are both sold as part of a commercial (as opposed to "open" or "free") operating system. Reiserfs is not.
The Math Maestro Tutoring Services in Seattle
Am I right in saying that Linux threads are kernel level threads and the IBM/GNU threads are user space threads?
What's wrong with Linux threads? I've been using them for a while. They seem to work for me. Why are the IBM ones so much better?
Since they're both POSIX threads (pthread_create()), how does one determine which one will be used when both are on one system?
AFAIK (which ain't much), ext3 has this as a design consideration, or at least as part of the implementation.
I was tempted to wait, but got impatient and used ReiserFS. Perhaps when I upgrade the 20GB drive in there, I'll switch.
Jesus was all right but his disciples were thick and ordinary. -John Lennon
You can/should be able to replace the WinME box with a Linux box (but the initial setup may require a Win box. My DSL was set up with NT4.0, but now runs on Linux).
A Linux box can be made to read PC disks (DOS, Win3.1->NT), Amiga disks, Mac disks, and a bunch of others.
There is zero problem accessing files over the network amongst these systems. Depending on the mix, you can use NFS, SMB, AppleTalk, ftp, etc.
Finding your NIC is the first step. In the past, I would have recommended linuxnewbie.org to ask that sort of question. Not sure where these days.
BTW, treat your father like the head of a major corporation: do what you have to do. Don't tell him you switched computers on him. Chances are he'll never notice. I take it you are still in school (be it elementary or university. It doesn't matter) and now have some time off for the summer. Presumably your father works. That gives you plenty of time to fiddle with things. Eventually, you'll get it right, and just leave it on.
I'm not sure why anyone (other than Micro-Soft) would think this would be a sign of forking. Presumably, all will wind up in the kernel, and all will be either compiled in or available as modules for all the major distros. "mount -t auto yadda yadda yadda" should take care of this.
This actually makes Linux stronger, as it provides choice. Got an old system: use ext3. Want speed: use ReiserFS. Want (I don't know. Sorry): use JFS, XFS, fooFS.
This isn't the either/or situation of FAT, FAT-32, and NTFS (not sure if XP will add another FS standard). All versions of the GNU/Linux OS should have the ability to read all of the filesystems.
You do, however, have a point that there is some overlap. I am not a part of any of the devel teams (or any devel teams for that matter) so it would seem, at least from a lay perspective, that having (just for example) Hans, the ext3 group, etc working on and submitting patches for JFS would result in the strongest solution. But there is nothing to preclude this from happening (except, perhaps, for Hans' ego and financial plans. And I don't know how open IBM is. And...)
Anyway, this is fairly early in the development, and the choice of filesystems is hardly as major a concern as the Gnome/KDE schism WRT forking possibilities.
Speaking as an ex-AIX worshipper (I have now been converted to the One True OS and its apt-get luvin' incarnation), IBM JFS has one MAJOR attraction for me: ease of filesystem modification.
I haven't tried JFS/Linux yet, but after years of expanding filesystems on production systems without outage ("chfs -a size=+blocks /fs/mount/point"), porting this to Linux is all it'll take me to switch. Performance? Pah! nice to have, but if you're reliant on FS performance you need to splurge a few beans on more memory. Ease of administration is the major win with JFS here folks....
--
I'd rather have a bottle in front of me than a frontal lobotomy
I just installed JFS to benchmark it. It seems to be a bit buggy : sometimes the keyboard blo
-- Pure FTP server - Upgrade your FTP server to something simple and secure.
Multiple filesystems don't add to fragmentation, since they are all interface compatible. When multiple things have the same interface (whether they be software components or cars) multiplicity is called 'choice'. When they have different interfaces (GNOME, KDE, etc) that is 'fragmentation'.
A deep unwavering belief is a sure sign you're missing something...
Actually, XFS is dynamically resizable too. And fast at that.
BTW> Is it just me, or do ugly people and slow systems have something in common...
Whenever a potentially interesting tech article comes up on /., all we get is a bunch of pansy-assed posts about licenses or whatnot. This is news for NERDs. Where are the hard-hitting questions? Why hasn't somebody downloaded the damn thing and posted benchmarks comparing the major JFSs? (I'm working on it!) How fast is it? How stable is it? How easy is it to install? How does it work internally? Good grief, you'd think you'd get something meaty on a discussion like this...
the second commercial filing system for Linux to reach the exalted 1.0 status!
Is that something I would run on my filing cabinet?
Well, we've got ourselves a new filesystem to play with, guys, and IMHO a pretty cool one, at that. I'm a summer intern with the JFS team at IBM, so I'm definitely looking at this subjectively. However, I've been testing its stability for the past month or so, and have been pleased with the results. (It'll pass RedHat's "cerberus" kernel stressing tests.)
Oh, and there are more features on the way, so check the JFS website every once in a while!
-Will the Chill
Creator of RPerl, Scouter, Juggler, Mormon, Perl Monger, Serial Entrepreneur, Aspiring Astrophysicist, Community Organiz
> As someone who can't use Linux (we have a WinME box sharing our cable modem connection), I have often wondered how compatible are such things as different file systems? Can a linux box read a PC floppy or HD?
if you mean a windows/dos formatted floppy/hd by saying PC floppy, then the answer is yes, it can read a PC floppy or HD.
> How about one for Mac?
not sure about this one, but i think it has support for the mac filesystem.
> Can a redhat box access files from a Mandrake one?
yes. the only differences between linux distributions are a) location of config files b) some distros use bsd init (slack?) instead of sysV init like the rest of the distros c) packaging system d) amount of apps shipped with the distro.
> Can I get a linux box to access the internet through the Windows network?
i'm no expert in this area but i believe you can do that with the help of samba. try looking at samba.org for more info. there is also free samba book published by oreilly on their site. good luck
-- http://electronicintifada.net --
- ADFS -- RiscOS
- Amiga FFS
- Apple Mac
- BFS -- Boot sector thing for SCO UnixWare
- DOS FAT
- Fat16/32
- NTFS
- EFS -- Pre Iso9660 filesystem for CDROMS
- Ramdisks
- ISO 9660 for CD's. Plus the MS Joliet extensions
- Minix FS -- Nostalgia I suppose.
- FreeVxFS -- main FS in SCO UnixWare the docs say.
- HPFS -- OS/2's filesystem.
- QNX4FS -- for QNX Systems
- ext2 -- most common Linux filesystem
- System V filesystems, for SCO, Xenix, and Coherent
- UDF -- for DVDs
- UFS -- for SunOS, FreeBSD, NetBSD, OpenBSD, NeXTstep.
And now we're getting ReiserFS and JFS, and XFS from SGI is on the way. Ext3 is out there too but not used all the time. So yeah, you can read Win32 formatted disks and just about everything else under the sun.
And you don't have to use Windows to use @home. There's plenty of documentation out there on this.
ReiserFS is funded by Namesys, a for-profit corporation AFAIK.
What about ReiserFS and XFS? Or is there some weird meaning to "commercial filing system" that I don't get?
Uh, I'm talking about the "wakka wakka" article, not mine. You know, the gut-bustingly funny one.
The evaluation of an action as 'practical' . . . depends on what it is that one wishes to practice.
A post with Score: 6 has appeared on slashdot before (anyone remember it?). This suggests that there was a race in slashcode; and since slashdot still uses a non-transactional data store, I bet it's still there. I think everyone gets where I'm going with this.
So, here's the plan: First, someone moderates this back down to 4. Second, everyone with mod points synchronizes their clocks to UTC (apt-get install ntp). Third, everyone picks an appropriate (positive, smartass) moderation for this article and moves their pointer over the "Moderate" button. Fourth, at exactly midnight (00:00:00 UTC, June 29), everyone clicks "Moderate".
Everyone without mod points places bets on how high we can get this sucker.
A lot of people are using Linux distributions (e.g. RH) that don't come with a journaling file system by default (argh....). Sure, we can create a dummy partition, move the system to it, create the journaled file system (JFS, ReiserFS, ...), and re-install on that, blah blah blah... it's that simple.
But hey, we've got production systems out there, and the disk has no space for a new partition anymore. And we have patched and configured the system to work exactly like we want (performing, reliable, and secure), and we have no intention to re-install things and migrate. And I have a VAIO 505FX that does not have spare disk space to use for installing a journaling FS, and I don't want to go thru all the hassle of re-installing everything.
So, a FS conversion tool would be really nice, like that DOS to NTFS thingy.
I'm sure the group (IBM, Reiser, SGI, Ext3) that comes up with the first will have a first-mover advantage and more users.
Amen brother. AIX rocks.
I wish they would port the AIX LVM. The linux lvm is a load of shit.
Conformity is the jailer of freedom and enemy of growth. -JFK
"Make it so."
XFS is *on the way*? I run three computers with XFS and will be installing a fourth shortly. Take a look at their site - not only are they at 1.0, but there's a very useful install disc for RH7.1!
I don't care if you call your filesystem fooFS; please don't call it BarFS.
Got friends?
I've got IP routing working between a Windows box with a cable modem and a Mac. Took about fifteen minutes to set up.
"What are we going to do tonight, Bill?"
www.lucernesys.com - Horizon: Calendar-based personal finance
GPLGP
LGPLGPL
GPLGPLGPL
GPLGPLG
PLGPLIBM
GPLGPLG
PLGPLGPLG
PLGPLGP
LGPLG
IBM is just trying to get Linux as stable as AIX. I think they are going to get rid of AIX in a few years, and they hope Linux will be there for enterprise applications. Good for IBM: Free software development. Good for Users: An open source stable operating system with commercial goodies.