Oracle To Bring Dtrace To Linux
mvar writes "Dtrace co-author Adam Leventhal writes on his blog about Dtrace for Linux: 'Yesterday (October 4, 2011) Oracle made the surprising announcement that they would be porting some key Solaris features, DTrace and Zones, to Oracle Enterprise Linux. As one of the original authors, the news about DTrace was particularly interesting to me, so I started digging. Even among Oracle employees, there's uncertainty about what was announced. Ed Screven gave us just a couple of bullet points in his keynote; Sergio Leunissen, the product manager for OEL, didn't have further details in his OpenWorld talk beyond it being a beta of limited functionality; and the entire Solaris team seemed completely taken by surprise. Leunissen stated that only the kernel components of DTrace are part of the port. It's unclear whether that means just fbt or includes sdt and the related providers. It sounds certain, though, that it won't pass the DTrace test suite which is the deciding criterion between a DTrace port and some sort of work in progress.'"
So, are they porting Solaris functionality to OEL as a precursor to phasing out Solaris entirely? It would suck to see Solaris go from a nostalgia point of view, but it never made much sense to me why one company would continue to develop two Unix-like operating systems.
If you want Dtrace and ZFS, just go with FreeBSD. You get pf and jails thrown in for the effort.
This is a great technology story - even if only for one version of Linux so far. DTrace will bring tremendous value for troubleshooting and performance analysis, and is a technology I use (almost) every day.
For example, yesterday I had a CPU-bound workload with an unexpected level of variation, and used DTrace to measure the effect of CPU thread affinity and interrupt activity on that workload. I used DTrace to pull the runtime along with other details: the number of scheduling events for that thread, along with the CPUs that the thread ran on; for each preemption, the preempting thread (to see why) along with both its user-level and kernel stack traces; and the interrupt thread and device. I fairly quickly showed that the runtime variation was caused by network interface interrupts from an entirely different application. This analysis would have taken quite a lot longer without DTrace, and might have been prohibitively difficult to complete.
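To make the shape of that analysis concrete, here is a rough Python sketch of the aggregation step only: given on-CPU and off-CPU records for one thread, it totals runtime per CPU and counts who preempted it. The `SchedEvent` record and the `summarize` function are made up for illustration. They are not DTrace output or any DTrace API, and the real data would come from scheduler probes.

```python
from collections import Counter, defaultdict
from typing import Iterable, NamedTuple, Optional


class SchedEvent(NamedTuple):
    """One hypothetical scheduler trace record for the thread under study."""
    ts_us: int                          # timestamp in microseconds
    cpu: int                            # CPU the event happened on
    kind: str                           # "on" = starts running, "off" = leaves the CPU
    preempted_by: Optional[str] = None  # who took the CPU, if the thread was preempted


def summarize(events: Iterable[SchedEvent]):
    """Return (on-CPU microseconds per CPU, preemption count per preemptor)."""
    on_since = {}                 # cpu -> timestamp of the pending "on" event
    runtime_us = defaultdict(int)
    preemptors = Counter()
    for ev in sorted(events, key=lambda e: e.ts_us):
        if ev.kind == "on":
            on_since[ev.cpu] = ev.ts_us
        elif ev.kind == "off" and ev.cpu in on_since:
            # Close the interval opened by the matching "on" event.
            runtime_us[ev.cpu] += ev.ts_us - on_since.pop(ev.cpu)
            if ev.preempted_by is not None:
                preemptors[ev.preempted_by] += 1
    return dict(runtime_us), preemptors
```

If one preemptor, say a network interrupt thread, dominates the counter, that points at the same kind of cause described above.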
Many of my uses of DTrace are much more straightforward than that; including identifying file system latency for applications, application response time, and CPU dispatcher queue latency. I've listed many more examples in the DTrace book (http://www.dtracebook.com). It should be a great resource of ideas for those looking to use DTrace on Linux - since the hardest part for people has been knowing where to start, given the ability to see everything.
And why do you want to keep the raid and LVM stack?
If you're creating a filesystem, you can make it aware of its own backing storage (and adjust stuff like block size, 'cause you know, there are disks with 4K sectors now), have it manage caching by itself (and thus be aware of how much memory the system has, and how much of it is actually RAM and not virtual), and have it check redundancy and do online checks and repairs, which you realize is just awesome if you ever try to run fsck on a 20TB filesystem (and because it knows how much space is actually used, it only checks what's used, instead of blindly regenerating blank space for an array of disks). Add variable stripe size, thin provisioning, CoW with free snapshots and clones, and a lot of other stuff ZFS does because it doesn't need to "respect its elders", LVM and md.
You obviously haven't had to use both in anger. SystemTap is another "me too" project like so many things on Linux, where the only people saying it's as good are the people who haven't used the product it's an imitation of. Oh, and then there's the RMS type who will say it's "better because freedom has value" or something to that effect. Doesn't help you when you're actually trying to tune an application for performance.
The worst thing about the first complaint is that ZFS actually does have very clean layering. At the bottom you have the storage pool allocator, which is basically malloc() for persistent storage (equivalent to the block device layer, but with a more convenient interface). The data management unit sits on top of this and provides transactional I/O to the underlying storage, and the ZFS POSIX layer sits on top of that and provides POSIX filesystem semantics. You could replace the ZPL with something that provides things that look like raw block devices (ZVOLs, used for iSCSI shares, among other things), and you could also fairly easily add something that provided an SQL interface to storage, since the transactional support is all done a layer lower down the stack.
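A toy Python model of that layering, purely as a sketch: the class names (`PoolAllocator`, `ObjectStore`, `Transaction`, `PosixLayer`, `VolumeLayer`) are invented stand-ins for the allocator, the transactional layer, the POSIX layer and a ZVOL-like consumer. None of this mirrors the real ZFS code or its interfaces.

```python
class PoolAllocator:
    """Bottom layer: hands out block addresses, like malloc() for storage."""

    def __init__(self):
        self._blocks = {}
        self._next = 0

    def alloc(self, data: bytes) -> int:
        addr = self._next
        self._next += 1
        self._blocks[addr] = data
        return addr

    def read(self, addr: int) -> bytes:
        return self._blocks[addr]

    def free(self, addr: int) -> None:
        del self._blocks[addr]


class Transaction:
    """Buffers writes; nothing is visible until commit()."""

    def __init__(self, store):
        self._store = store
        self._pending = {}

    def put(self, obj_id, data: bytes) -> None:
        self._pending[obj_id] = data

    def commit(self) -> None:
        store = self._store
        # Write new blocks first, then switch the object map, then free old blocks.
        new = {oid: store.pool.alloc(d) for oid, d in self._pending.items()}
        old = [store.objects[oid] for oid in new if oid in store.objects]
        store.objects.update(new)
        for addr in old:
            store.pool.free(addr)


class ObjectStore:
    """Middle layer: transactional object I/O on top of the allocator."""

    def __init__(self, pool: PoolAllocator):
        self.pool = pool
        self.objects = {}  # object id -> block address

    def begin(self) -> Transaction:
        return Transaction(self)

    def get(self, obj_id) -> bytes:
        return self.pool.read(self.objects[obj_id])


class PosixLayer:
    """Top layer, variant 1: path-based file semantics."""

    def __init__(self, store: ObjectStore):
        self._store = store

    def write_file(self, path: str, data: bytes) -> None:
        tx = self._store.begin()
        tx.put(("file", path), data)
        tx.commit()

    def read_file(self, path: str) -> bytes:
        return self._store.get(("file", path))


class VolumeLayer:
    """Top layer, variant 2: numbered blocks, like a raw block device."""

    def __init__(self, store: ObjectStore, name: str):
        self._store = store
        self._name = name

    def write_block(self, n: int, data: bytes) -> None:
        tx = self._store.begin()
        tx.put(("vol", self._name, n), data)
        tx.commit()

    def read_block(self, n: int) -> bytes:
        return self._store.get(("vol", self._name, n))
```

Both top layers sit on the same `ObjectStore`, so swapping one for the other (or adding a third) touches nothing below it, which is the point being made about the transactional support living lower in the stack.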
The second complaint is really a matter of semantics. Scrubbing a ZFS partition provides a superset of the functionality of fsck. Not only does it validate and repair the metadata, it also validates and repairs the data on the disks. Most ZFS users don't need to run it, because all it really does is try to read every file on the disk - the underlying filesystem code performs the repairs automatically any time some read data fails the checksums.
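The repair-on-read idea can be sketched in a few lines of Python. This is a toy two-way mirror, assuming a hypothetical `MirroredStore` class that keeps one checksum per block in a separate dict; real ZFS keeps checksums in its block pointers and supports more layouts than a plain mirror.

```python
import hashlib


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class MirroredStore:
    """Toy mirror: every block is written to each 'disk', checksum kept apart."""

    def __init__(self, copies: int = 2):
        self.disks = [dict() for _ in range(copies)]
        self.sums = {}

    def write(self, block_id, data: bytes) -> None:
        self.sums[block_id] = checksum(data)
        for disk in self.disks:
            disk[block_id] = data

    def _verify(self, block_id):
        """Find a copy matching the checksum and overwrite any copy that differs."""
        expected = self.sums[block_id]
        good = next(
            (d[block_id] for d in self.disks if checksum(d[block_id]) == expected),
            None,
        )
        if good is None:
            raise IOError(f"block {block_id!r}: no copy matches its checksum")
        repaired = 0
        for disk in self.disks:
            if disk[block_id] != good:
                disk[block_id] = good  # self-heal from the good copy
                repaired += 1
        return good, repaired

    def read(self, block_id) -> bytes:
        """A normal read already verifies and repairs."""
        data, _ = self._verify(block_id)
        return data

    def scrub(self) -> int:
        """Read every block; return how many bad copies were repaired."""
        return sum(self._verify(block_id)[1] for block_id in self.sums)
```

Here `scrub()` is nothing more than a loop over the same check that `read()` performs, which is the sense in which a scrub just reads everything and lets the normal read path do the repairs.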
I think the real problem with ZFS is the name. Calling it a filesystem makes people mentally class it as something like UFS or ext2fs, when it's actually a complete storage stack.
I am TheRaven on Soylent News
OpenSolaris may be dead, but FreeBSD now ships with version 28 of ZFS, which includes nice things like deduplication. iXSystems, which sells storage appliances based on FreeBSD, is funding ongoing development work, so ZFS on FreeBSD is going to stay actively maintained irrespective of whether Oracle makes future versions of the code available. FreeBSD also got the ability to boot from ZFS before Solaris and integrates ZFS very nicely into the existing storage stack (it works as a GEOM consumer and provider, so you can easily do things like put a FAT partition in a ZVOL and have transactional I/O and deduplication on the filesystem used by a Windows 95 virtual machine).
As long as they leave strace in place. Apple replaced ktrace with dtrace and I've been hating it ever since.
It's not that dtrace is bad, it's just that they have different purposes, and dtruss has several problems ktrace/strace does not:
Ok, rant over.
It has been available for Linux since 2008. 02-Aug-2008 Work in progress port of Sun's DTrace system for Linux. It is actively maintained. http://www.crisp.demon.co.uk/tools.html I don't see anything new being brought to the table outside of keyboard, mouse, and framebuffer recording. I'm not sure a lot of Linux users would find that an attractive addition.
Built-in instruments can track
User events, such as keyboard keys pressed and mouse moves and clicks with exact time.
CPU activity of processes and threads.
Memory allocation and release, garbage collection and memory leaks.
File reads, writes, locks.
Network activity and traffic.
Graphics and inner workings of OpenGL.
Having to work for a living is the root of all evil.