Sun Wins Top Tech Innovation Award
Carl Bialik from WSJ writes "Sun's DTrace trouble-shooting software won top prize in the Wall Street Journal's 2006 Technology Innovation Awards competition. It's the second time in three years that Sun took the top award. From the article, which also names a dozen other winners: 'Where most debugging takes place as software is being developed, DTrace analyzes problems with systems that are in production — running a company's database, say, or executing stock trades. It does this with a process called "dynamic tracing," which enables a developer or systems administrator to run diagnostic tests on a system without causing it to crash. Before DTrace, such tests often took days or weeks to reproduce the problem and identify the cause. With DTrace, performance problems can be tracked to their underlying causes in hours, even minutes.'"
They are pretty much completely unrelated. I think you could get dtrace to do what strace does, but strace is a special-purpose tool of very limited scope. If you think they are comparable then you don't know anything about dtrace.
DTrace has a degree of OS integration that makes it non-trivial to copy. Linux's alternatives don't even come close, even though a tool like this would be very useful on Linux.
For the foreseeable future, if you want to have this type of debugging on your server then the server has to run Solaris. And if your server is bigger than a 4-way then it makes sense that it's a Sun server.
There is value in premium gear, and while it won't make Sun the next Dell, it can hopefully help improve their standing in their core market.
And this is /. where folk think strace == dtrace
With strace, can you trace everything from I/O operations through to system calls on your live application, without taking anything offline and with almost no performance hit?
Like it or not, dtrace is a huge innovation - it's also open sourced and coming really soon to an operating system near you. I think anyone involved in major application deployments is going to welcome dtrace and think it worthy of the award.
strace is more like Solaris's truss, except truss is quite a lot better. IMO dtrace is for more serious debugging; tools like truss & strace are quick and dirty tools for easy-to-solve problems where just knowing the system calls and their return values is enough to diagnose the issue.
Several people have mentioned strace, but I have yet to see anyone mention oprofile. I haven't used dtrace before, but oprofile allows you to see where an application is spending its time transparently, with negligible performance hit, and without restarting the application.
oprofile has been around since late 2002 it seems, so it's not particularly new either. How does dtrace compare to oprofile?
Game! - Where the stick is mightier than the sword!
I noted in my article Boxing in the LLRing, which despite positive responses Slashdot rejected in favor of Roland Piquepaille's daily column and various political commentary, that Squeak has an amazing debugger (I am not going to call it a full-blown analyzer) that allows you to debug applications as they are running on the very interesting Seaside application server.
As described in this paper (pdf), Seaside provides multiple control flows and a high level of abstraction that is very useful to web app developers.
The 4500 word article is coverage of a 300 developer "Lightweight Languages" all-day seminar held in a real boxing ring in Tokyo, covering 30 languages and frameworks including Perl, Python, Ruby, Haskell, OCaml, Squeak, and many others.
But a few points.
1) You need to boot BSD specially into a dtrace mode to use this. That presumably means that the BSD version either slows the system or isn't of production quality. When my database server is dying under the load, rebooting it isn't high on the list of things I want to do.
2) FreeBSD are pretty nimble at developing this kind of thing. I'm more curious to see how long it takes MS or Dell to have something comparable.
3) Sun provided the source and a development machine; presumably because of FreeBSD's favorable licensing. I'm not sure that's an option for any closed source product.
Dynamic instrumentation (you know -- the "D" in DTrace's name) has been in use on the live air traffic control systems of several countries (http://www.ocsystems.com/cs_memoryleak.html, http://www.ocsystems.com/cs_injectingfaults.html) for more than a decade.
Your worry about bugs in the dynamic instrumentation tool affecting the production system is no different than worrying about bugs in the operating system affecting the production system, and it is addressed the same way -- by seriously thorough testing.
strace is a tool very similar to truss or sotruss on Solaris. These tools can be used to watch an application as it runs, monitoring its system calls, but they are somewhat archaic.
dtrace is a monitoring tool that provides access to the entire system without worrying about causing any damage to a running system. I can choose what to monitor, from everything on the system down to watching an individual thread execute. I can even access every public variable in the kernel, and it is easy to use.
DTrace provides much of the functionality of adb, mdb, truss, tnfextract, tnfdump, and lockstat, plus much, much more.
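For anyone who has not seen this style of tool, here is a rough analogy in Python. It is not DTrace and it shares none of DTrace's safety or kernel visibility; it only sketches the idea of attaching a probe to code that was not written with tracing in mind and aggregating what it sees, roughly in the spirit of a DTrace one-liner such as `dtrace -n 'syscall:::entry { @[probefunc] = count(); }'`. The names `count_calls`, `helper`, and `workload` are made up for illustration.

```python
import sys
from collections import Counter


def count_calls(target, *args, **kwargs):
    """Run target under a temporary profile hook and count Python-level calls by function name."""
    counts = Counter()

    def probe(frame, event, arg):
        # "call" fires whenever a Python function is entered.
        if event == "call":
            counts[frame.f_code.co_name] += 1

    sys.setprofile(probe)        # attach the probe
    try:
        result = target(*args, **kwargs)
    finally:
        sys.setprofile(None)     # always detach, even if target raises
    return result, counts


# Hypothetical workload that contains no tracing code of its own.
def helper(n):
    return n * 2


def workload():
    total = 0
    for i in range(3):
        total += helper(i)
    return total


result, counts = count_calls(workload)
print(result, dict(counts))
```

The workload is left untouched, and the hook is removed again when the run finishes. DTrace applies the same attach, aggregate, detach pattern to a whole live system rather than to a single interpreter.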
I used to work for Sun, and DTrace is da bomb, to put it mildly. It has ZERO impact on real-time execution and can even see into the OS (if you use Solaris). I've built many a real-time system over the years, and having this tool would have saved me countless hours of debugger time and logic analyzer time. The one downside to DTrace is that it does so much that it is hard to master. There is a week-long course Sun recommends before you can really get the most for your efforts. I think it deserves a place on the Innovation shelf right beside the T1 chipset. And there are plans to port a version to Linux, but it may not be free. It also probably won't be able to see as deep into the OS layers as it does with Solaris, but that will come in due time. Sun's license isn't 100% compatible with the Linux GPL either, so that could be another issue.
Oprofile is more for profiling.
LTT helps you analyse events as they happen over time.
Dprobes is one possible source of LTT events.
http://dprobes.sourceforge.net/
http://www.opersys.com/LTT/
http://dprobes.sourceforge.net/documentation/man/dprobes/
You might want to check out the DTrace Toolkit and take a look at the DTrace scripts it includes. Many of the tools you see there are very admin-oriented, and those are mostly simple examples of what can be done with DTrace.
Remember, it offers observability into most, if not all, of the system in a variety of ways, which makes DTrace suitable for both admins and developers.
For those who, like me, had heard of dtrace but little more (is it like strace, for example?), this is a very handy article written by one of the authors in Communications of The ACM:
http://www.acmqueue.org/modules.php?name=Content&pa=showpage&pid=361&page=1
Yeah, it's 5 pages long, so those who won't RTFA are even less likely to read this, but it's a good read covering motivation, history, solution compromises and some anecdotes that could qualify for http://thedailywtf.com/
I spent a lot of money on booze, birds and fast cars. The rest I just squandered. - George Best
The rest of the userland, however, is a disaster. The filesystem hierarchy is GNU, BSD, or SysV depending on how you look at it, and many of the core utilities are missing useful options.
Blastwave.org - not always the most up to date releases, but certainly the best replacement for those utilities you don't like.
The default shell doesn't do things like tab completion (or even have a history buffer), and the man pages seem to be formatted for printing not on-screen display.
It is trivial to change your shell to bash (distributed with Solaris 10). Given that these are your complaints, I have to advise you to NEVER touch AIX. You think Solaris is bad in userland...
Finkployd