Sun Wins Top Tech Innovation Award
Carl Bialik from WSJ writes "Sun's DTrace trouble-shooting software won top prize in the Wall Street Journal's 2006 Technology Innovation Awards competition. It's the second time in three years that Sun took the top award. From the article, which also names a dozen other winners: 'Where most debugging takes place as software is being developed, DTrace analyzes problems with systems that are in production — running a company's database, say, or executing stock trades. It does this with a process called "dynamic tracing," which enables a developer or systems administrator to run diagnostic tests on a system without causing it to crash. Before DTrace, such tests often took days or weeks to reproduce the problem and identify the cause. With DTrace, performance problems can be tracked to their underlying causes in hours, even minutes.'"
What about Linux's strace?
Reality is nothing but a collective hunch.
DTrace has a degree of OS integration that makes it non-trivial to copy. Linux's alternatives don't even come close, even though a tool like this would be very useful on Linux.
For the foreseeable future, if you want to have this type of debugging on your server then the server has to run Solaris. And if your server is bigger than a 4-way then it makes sense that it's a Sun server.
There is value in premium gear, and while it won't make Sun the next Dell, it can hopefully help improve their standing in their core market.
Sun's DTrace trouble-shooting software won top prize in the Wall Street Journal's 2006 Technology Innovation Awards competition. It's the second time in three years that Sun took the top award.
Sounds like they've put those HP founders to work, instead of just parading them around in t-shirts.
The theory of relativity doesn't work right in Arkansas.
However, inline analyzers have existed. Intel's VTune is clunky, limited in supported architectures but useful where it applies. Parallel developers might well use DAKOTA and KOJAK to do the same for MPI applications, which traditional analyzers can't handle at all. I also would not advise anyone to just use analyzers. You would be wise to monitor events - there are patches for Linux, such as evlog, which give you very flexible event logging. Linux also provides the ability to monitor all kinds of other statistics - either as standard or through patches such as Web100 (for the network) or LTT-ng (for profiling).
Does this mean I think Sun don't deserve the award? I've not used that tool, so I'm not in a position to say. It would have to do a lot in addition to basic analysis to earn the right to be innovative, never mind the title of "top technical innovation". If it can, that's great and can Sun kindly port it to Linux. If it can't, then all I can say is that the competition must've sucked this year.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Yeah, what about Linux's strace? That was mentioned by chroot james.
GI
After all, it takes a considerable amount of insight to pick a code analyzer (admittedly one as brilliant as dtrace) as important and newsworthy. Good job, guys! It shows you can look deeply at a topic and understand what makes computer systems valuable. A lesser effort would award something from Microsoft, Google or Apple, whose products are great, but lack the sophistication of many Sun innovations.
There exists no way of exchanging information without making judgments. --Bene Gesserit Axiom
Wall St is packed with accountants and tie-wearing beancounter types isn't it? A tech award from them would surely be an insult to any true geek!
Engineering is the art of compromise.
Several people have mentioned strace, but I have yet to see anyone mention oprofile. I haven't used dtrace before, but oprofile allows you to see where an application is spending its time transparently, with negligible performance hit, and without restarting the application.
oprofile has been around since late 2002 it seems, so it's not particularly new either. How does dtrace compare to oprofile?
Game! - Where the stick is mightier than the sword!
I noted in my article Boxing in the LLRing, which despite positive responses Slashdot rejected in favor of Roland Piquepaille's daily column and various political commentary, that Squeak has an amazing debugger (I am not going to call it a full-blown analyzer) that allows you to debug applications as they are running on the very interesting Seaside application server.
As described in this paper (pdf), Seaside provides multiple control flows and a high level of abstraction that is very useful to web app developers.
The 4500 word article is coverage of a 300 developer "Lightweight Languages" all-day seminar held in a real boxing ring in Tokyo, covering 30 languages and frameworks including Perl, Python, Ruby, Haskell, OCaml, Squeak, and many others.
But a few points.
1) You need to boot BSD specially into a dtrace mode to use this. That presumably means that the BSD version either slows the system or isn't of production quality. When my database server is dying under the load, rebooting it isn't high on the list of things I want to do.
2) FreeBSD are pretty nimble at developing this kind of thing. I'm more curious to see how long it takes MS or Dell to have something comparable.
3) Sun provided the source and a development machine; presumably because of FreeBSD's favorable licensing. I'm not sure that's an option for any closed source product.
and maybe after it is ported to Linux/*BSD and ten years have gone by, admins will actually start using it to its full potential. Now, if someone were to code a nice GUI frontend to dtrace, that'd be innovation, because it would take an absolute master of UI design to turn using dtrace into something that was easy to do for the uninitiated.
How we know is more important than what we know.
Admins are not necessarily coders, are they? You have to be a fairly savvy developer to be able to use DTrace.
You ultimately need to fix the code and need someone to modify it.
Having said that...I am sure it will get easier to use in the future. I for one welcome all the help I can get. Admins included!!
Dynamic instrumentation (you know -- the "D" in DTrace's name) has been in use on the live air traffic control systems of several countries (http://www.ocsystems.com/cs_memoryleak.html, http://www.ocsystems.com/cs_injectingfaults.html) for more than a decade.
Your worry about bugs in the dynamic instrumentation tool affecting the production system is no different from worrying about bugs in the operating system affecting the production system, and it is addressed the same way -- by seriously thorough testing.
It was released about 4/25, but doesn't show up when you look for dtrace - it works great in Linux/UNIX environments for tracing errors through different packages / libraries.
great job theif!
-Iridium
"During times of universal deceit, telling the truth becomes a revolutionary act" -- George Orwell
To paraphrase the old saw... That award and $2.95 ought to cover a cup of coffee - er, I mean, a cup of Java!...
This issue is a bit more complicated than you think.
Oprofile is more for profiling.
LTT helps you analyse events as they happen over time.
Dprobes is one possible source of LTT events.
http://dprobes.sourceforge.net/
http://www.opersys.com/LTT/
http://dprobes.sourceforge.net/documentation/man/dprobes/
The sun has the first working positive energy fusion reactor!!! Not to mention being responsible for all kinds of gravitational effects like keeping Jupiter from hitting Mars... Finally it gets its day in the, er, No I didn't RTFA... why, does it say something important?
Everybody now! The sun is a mass of incandescent gas, a gigantic nuclear furnace...
Peter Marshall: Paul, according to Redbook, what is "Plank's Constant?"
Paul Lynde: Well, if Plank were all that constant, he wouldn't be needing that Ex-Lax, would he?
(Uproarious laughter from the studio audience.)
http://www.classicsquares.com/lyndesquares.html
* * * * *
It's only when you look at an ant through a magnifying glass on a sunny day that you realise how often they burst into flames.
--Harry Hill
It's so reliable you never need to look for problems.
Please check out the Chime project, which is about visualization software for DTrace. You can find more information at http://www.opensolaris.org/os/project/dtrace-chime/
For those who think that DTrace is old news, I really suggest that you download one of the OpenSolaris-based distributions
http://www.opensolaris.org/os/about/distributions/
and play around with DTrace. Yes, its CLI is aimed at the geek in all of us, but there is software like Chime and MacOS X's upcoming Xray which will help those who prefer a different sort of UI.
I wonder how much choice Sun had in releasing this as open source? Is Sun's DTrace a direct descendant of military contracts, such that Sun MUST release the code for free since it was written on the people's dime, i.e. the US military that paid for it?
For those who, like me, had heard of dtrace but little more (is it like strace, for example?), this is a very handy article written by one of the authors in Communications of The ACM:
http://www.acmqueue.org/modules.php?name=Content&pa=showpage&pid=361&page=1
Yeah, it's 5 pages long, so those who won't RTFA are even less likely to read this, but it's a good read covering motivation, history, solution compromises and some anecdotes that could qualify for http://thedailywtf.com/
I spent a lot of money on booze, birds and fast cars. The rest I just squandered. - George Best
Zero as in for sufficiently large values of zero, you mean.
A disabled dtrace probe point is a NOP or two, which certainly adds a tiny bit of overhead (icache space and instruction fetch time probably count more than the cycle it takes to run the NOP); and an enabled dtrace obviously does more (increments some counters).
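To make the disabled-versus-enabled distinction concrete, here is a toy Python sketch. It is only an analogy, not how DTrace is implemented: the names `ProbeSite`, `do_read` and `"read:entry"` are made up for illustration, and a Python function call costs far more than a NOP. The point is just that the probe site is always present in the code path, and enabling it swaps the no-op for something that bumps a counter.

```python
# Toy model of a probe point. This is an analogy only, NOT DTrace's mechanism.
# All names here (ProbeSite, do_read, "read:entry") are hypothetical.
from collections import Counter

counts = Counter()


def _disabled(name):
    """Disabled probe: does nothing, standing in for a NOP in the code path."""
    return None


def _enabled(name):
    """Enabled probe: records that the probe fired."""
    counts[name] += 1


class ProbeSite:
    """A probe site that always exists but starts out disabled."""

    def __init__(self, name):
        self.name = name
        self.fire = _disabled

    def enable(self):
        self.fire = _enabled

    def disable(self):
        self.fire = _disabled


read_entry = ProbeSite("read:entry")


def do_read():
    # The probe site is hit on every call, enabled or not.
    read_entry.fire(read_entry.name)
    return 42


do_read()             # disabled: nothing recorded
read_entry.enable()
do_read()             # enabled: counted
do_read()             # enabled: counted
read_entry.disable()
do_read()             # disabled again: nothing recorded

print(counts["read:entry"])  # prints 2
```

Even in the disabled state the call to `fire` still happens, which is the "sufficiently large values of zero" part: small, but not nothing.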
Agreed dtrace has a very elegant way of minimizing overhead; but when you write ZERO in all caps like that, you might trick people into thinking it's actually zero. OTOH, since you use phrases like "da bomb" I guess most people would have dismissed whatever you said anyway.
Glad to see you have rightfully been modded down....
IANAL but write like a drunk one.
Mod up the parent!
Geesh, why does no one link to the original USENIX paper on DTrace:
Dynamic Instrumentation of Production Systems
Quite a fascinating read, actually.
Oh, I worry about bugs throughout the infrastructure. OS bugs, compiler bugs, system library bugs, firmware bugs - all of these can turn even a 100% perfect application (were such a thing to exist) into a smouldering heap of junk. They are unpredictable and almost impossible to trace in those situations where the programmer only has the application to look at. Dynamic instrumentation is, I believe, slightly worse in that non-fatal bugs in a system call, for example, would eventually be inferred by observing that data is mangled after such a call in all places in the system. With dynamic instrumentation that uses embedded operations in the code, the same holds true. Instrumentation that runs in parallel and dips in at intervals is much more of a problem, as there is then almost certainly no correlation between anything in the code and side-effects from the instrumentation. You can eventually deduce that the errors must be external to the code (and all functions linked to it), but in any seriously large application, or if the OS is complex (or, worse, black-box), this can take a hellish long time.
Probably the worst-case scenario is where the side-effects aren't direct. The instrumentation might very occasionally add a delay that, on rare occasions, causes a time-sensitive component of the application to miss a critical deadline. The bug would then not be in the code of either program, but would be in the sequence of operations of successive time-slices. Sequencing bugs are bloody murder, because they are not programming bugs. The code can be 100% clean and still have this class of bug. (Sequencing bugs are much more general than, say, race conditions, and would typically be at a much lower level.)
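A deterministic toy simulation of that kind of indirect side-effect, in Python, assuming made-up numbers throughout (the deadline, the work time, the probe delay and the probe interval are all hypothetical). Neither the application loop nor the instrumentation has a bug on its own, yet the combination misses a deadline once in a while.

```python
# Simulated time only; every constant below is a made-up assumption.
DEADLINE_US = 1000     # hypothetical deadline for each cycle, in microseconds
WORK_US = 990          # hypothetical time the application needs per cycle
PROBE_DELAY_US = 20    # hypothetical extra delay when instrumentation dips in
PROBE_EVERY = 250      # instrumentation intrudes once every 250 cycles


def missed_deadlines(cycles, instrumented):
    """Return the cycle numbers in which the simulated deadline was missed."""
    missed = []
    for cycle in range(cycles):
        elapsed = WORK_US
        # The instrumentation is not tied to anything in the application code;
        # it simply happens to land on some cycles.
        if instrumented and cycle % PROBE_EVERY == PROBE_EVERY - 1:
            elapsed += PROBE_DELAY_US
        if elapsed > DEADLINE_US:
            missed.append(cycle)
    return missed


print(missed_deadlines(1000, instrumented=False))  # prints []
print(missed_deadlines(1000, instrumented=True))   # prints [249, 499, 749, 999]
```

Nothing in the application code points at cycles 249, 499, 749 or 999, which is why this class of bug is so hard to find from the application side alone.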
Debugging programs is extremely difficult and time-consuming to do right, because by the very fact that you are running in a debugging environment, you have changed the characteristics of the environment the program is running in (unless it is ALWAYS running in such a mode). Even disregarding all the above problems, I feel certain that the vast majority of programmers have encountered bugs that cease to exist when debugging information is added, or where the program is placed in a debugger... or, for that matter, ONLY exist when debugging information is added. The last of these is particularly nasty for those in rigid work environments. If there's some bug X that users are seeing that is obscured by bug Y that is introduced by debugging/instrumentation data, there are workplaces where fixing bug Y is not permitted as it is not a user-documented bug and so no time/money has been budgeted for fixing it. That can make fixing X really fun. You can spot projects that are likely a victim of the "no complaint, no fix" attitude - they eventually work just well enough but no better, ran way over on time and are likely to be fragile under unusual conditions.
Surprisingly, this is not a dig at the Usual Target of Slashdot Gripe. Rather, I've seen this attitude when employed within the public sector, which is notorious for producing an amazing amount of crap. Which is ironic, because the less formal and informal projects from the public sector are equal to or better than commercial projects. Sure, there are a lot of crap projects on Freshmeat, but if you look at the really good stuff, you'll see a lot comes from Government research groups, Universities and - occasionally - the US DoD, but they're all projects managed by geeks, not wannabe accountants. (How bad does a person have to be if they can only pretend to be an accountant?)