A Unified Theory of Software Evolution
jso888 writes "Salon has a nice article today on Meir Lehman's work on how software evolves and is developed. Lehman's investigation of the IBM OS/360 development process became the foundation for Brooks' Law: "Adding manpower to a late software project makes it later." He is hopeful that his work will make software development less of an art and more of an engineering science."
Software doesn't evolve by chance, folks, it is DESIGNED by its CREATORS.
/. is a site for SERIOUS INTELLECTUAL DISCUSSION.
Please check your crackpot theories and pseudo-science at the door.
Thank you.
dinner: it's what's for beer
This can be simplified: "Adding manpower to a software project makes it later."
There's rarely that many programmers needed for a given task anyway. You need a project leader and lots of monkeys to test it... very few projects should have more than 10 programmers (if any).
//TheToon
While "Brooks' law" might be a law, it's only useful in retrospect. Most software projects have no idea how far behind they really are. So basically, you can always add manpower, you're really only halfway through anyway...
Where's the guy with the .sig "it takes nine months to bear a child, no matter how many women you assign to the task" when you need him?!?!?!?!?
"When I first wrote about this topic, nobody took a blind bit of notice."
No, sir, I did, and so did many colleagues who were also interested in good, timely work. We lent your books to each other with the note "that's something you should read".
Great to hear that you are still alive and still enjoy giving programmers and their managers something to look at, and something worth reading and thinking about.
Youngsters, better pay respect to this old software camel with the hole in the sole of his shoe (and probably also in his all-too British pullover), or I DDOS your toilet!
"Unless IBM programmers had suddenly figured out a way to write error-free code -- an unlikely assumption -- Lehman made a dire prediction: OS/360 was heading over a cliff. IBM, in stressing growth over source-code maintenance, would soon be in need of a successor operating system."
Which means that commercial systems don't so much evolve as stub their growth paths out and switch direction or spawn new generations, because embedded complexity has killed off the feasibility of maintaining them. In other words, all new releases are the cause of, and ultimately an attempt to escape from, the chimera that is overly complex code. In commercial terms this should be astounding. We're paying to gronk up our own because we erroneously believe the NEXT version will be something radically new and elegant, which of course it can't be.
New Version "x+1.y" is simply an ejection seat.
I'm not attempting to flamebait here, just submitting an observation. It seems to me that many of the complexity issues can be overcome by designing better languages. I've never stopped scratching my head over the perseverance of old languages like C++ and FORTRAN. Sure, they are extremely useful in the hands of experienced folks, but they need to die. They were good solutions to problems decades ago, but so much has been learned since then and the constraints of sparse computer resources and CPU speed have moved a lot.
a standard printed book value of time estimates for projects. the auto repair industry has standard estimates for certain repairs, why doesn't the software repair industry. i know they're worlds apart, but it sure would help out a little to be able to pull out a little book and say, well, you need a gui interface consisting of 15 screens to maintain 20MB of data, it's going to be 10,000 hours for developing, testing and documenting. if you want to cut the documentation, we can do that, but you're really slitting your throat there.
From the article:
Michael Godfrey, a University of Waterloo scientist, is equally hesitant but still finds the Lehman approach useful. In 2000, Godfrey and a fellow Waterloo researcher, Qiang Tu, released a study showing that several open-source software programs, including the Linux kernel and fetchmail, were growing at geometric rates, breaking the inverse squared barrier constraining most traditionally built programs. Although the discovery validated arguments within the software development community that large system development is best handled in an open-source manner, Godfrey says he is currently looking for ways to refine the quantitative approach to make it more meaningful.
It would have been interesting had they delved deeper into this finding. Yeah, I know, the true believers in open source all feel superior (we are, aren't we?), but exploring the reasons why it works would be interesting.
Is it the large-scale peer-review process? Is it that we occasionally rewrite parts (filesystems, VMM, etc)? Something else?
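To make the "geometric" versus "inverse squared" contrast in the quote concrete, here is a rough sketch. It assumes the inverse-square model has the form usually attributed to Turski and Lehman, where each release adds roughly effort / size^2 to the system size. The class name, the starting size, the effort constant and the growth ratio are all made up for illustration and are not taken from the Godfrey and Tu study.

```java
// Hypothetical comparison of two growth models; every constant here is invented.
class GrowthModels {

    // Inverse-square model (assumed form): size[i] = size[i-1] + effort / size[i-1]^2.
    static double[] inverseSquare(double initialSize, double effort, int releases) {
        double[] sizes = new double[releases];
        sizes[0] = initialSize;
        for (int i = 1; i < releases; i++) {
            double previous = sizes[i - 1];
            sizes[i] = previous + effort / (previous * previous);
        }
        return sizes;
    }

    // Geometric model: each release multiplies the size by a fixed ratio.
    static double[] geometric(double initialSize, double ratio, int releases) {
        double[] sizes = new double[releases];
        sizes[0] = initialSize;
        for (int i = 1; i < releases; i++) {
            sizes[i] = sizes[i - 1] * ratio;
        }
        return sizes;
    }

    public static void main(String[] args) {
        double[] slow = inverseSquare(100.0, 1_000_000.0, 10);
        double[] fast = geometric(100.0, 1.2, 10);
        for (int i = 0; i < slow.length; i++) {
            System.out.printf("release %2d  inverse-square %8.1f  geometric %8.1f%n",
                    i + 1, slow[i], fast[i]);
        }
    }
}
```

Under the inverse-square model the per-release increment shrinks as the system grows, while under the geometric model it keeps getting larger. That is the sense in which a geometrically growing project would be "breaking the barrier".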
Good article; I think the description of the sociological basis of the "laws" is correct. My experience suggests that the slowest development paths are those that cross other people's areas.
(And yes, I know about XP's "All code is shared.")
As for the maintenance, it's my normal experience, but the projects I've been involved in may be atypical. (*cough*Canadian*cough*telecommunications*cough*giant*)
We spend a *lot* of time reworking old code to (a) fix obscure bugs, many of which are slow leaks shown up by weeks of serving live traffic, (b) adapt the code to support new releases of the underlying hardware product, and (c) add new features to satisfy users.
Although the performance audit showed that IBM researchers were churning out code at a steady rate, Lehman found the level of debugging activity per individual software module to be decreasing at an equal rate; in other words, programmers were spending less and less time fixing problems in the code. Unless IBM programmers had suddenly figured out a way to write error-free code -- an unlikely assumption -- Lehman made a dire prediction: OS/360 was heading over a cliff. IBM, in stressing growth over source-code maintenance, would soon be in need of a successor operating system.
Except that the "[dire] need of a successor operating system" isn't so dire at all: the world's richest man didn't get where he got by writing code that didn't need to be replaced by a successor operating system, did he? The whole premise is to produce something that works now, and when it stops working later, you sell a later version. Heck, just a couple of months ago, Billy announced that 92.3% of the calendar year would focus on new code, leaving the rest for the old.
What's smarter, coding the Microsoft way, or coding a server that's been up since before Windows NT was released, without a patch in 7 years, handling half a megabit of data both upstream and down, every second of every day forever. Where's the revenue?
~r~
Note: the 92.3% figure might only be for the year 2002, with later years being still closer to 100%.
From the article:
"In software engineering there is no theory,"
I don't buy that... at least not completely. I would say something more like, "In software engineering, theory is extremely underutilized."
I believe there are many instances of engineered software, but not necessarily high-profile stuff. A lot of DoD conscripted code may never see the civilian light of day, but there are procedures and documentation requirements that, flawed or not, enforce certain practices. Can we call that "theory"? Anyhow, defense suppliers can afford the extra development time, 'cause the government is forking over big bucks for the code to be right.
For the mainstream (read desktop) apps, where all the money is, the time to market and feature pressures will continue to suppress even the best "unified theory" of software development.
C++ isn't that old. C, yes, but C++ is one of the newer languages.
Best Slashdot Co
Manny Lehman is credited with coining the expression "Software Engineering". About 1968, I think. See also the website of the company he founded, Imperial Software Technology.
I was interested in the fact that some researchers have only recently come to the conclusion that software is written by people.
It is questionable how useful purely statistical methods are in these situations.
One thing I would be interested in knowing is how staff turnover affects development. For maintainable software to be possible, a consistent approach must be maintained when adding new functionality. This usually requires a deep understanding of a large code base, and if your programmers keep changing, the newbies may not follow the rules.
Choose your allies carefully, it is highly unlikely you will be held accountable for the actions of your enemies
There's a lot of piss poor code out there because there are a lot of piss poor programmers out there -- people who should not be in this industry, people who took a couple of classes in VB and think that qualifies them for the title of "Programmer." And they can still bullshit their way past hiring managers with their shiny buzzwords.
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
I worked on a system running OS/360, up to Release 23 or so, when IBM 'retired' it.
We installed new Releases about once every 6 months. IBM also had 'patches' available for about 19,000 known bugs.
These patches were not incorporated into the latest release because each of them, if installed, broke some other aspect of the OS.
We, and every other site, only installed those patches needed to work around problems that the particular site encountered. And you always hoped that today's patch would not break something else that your users needed.
- The functional capability of the OS too, since new hardware keeps coming out
and the Law of Declining Quality ("The quality of E-type systems will appear to be declining unless they are rigorously adapted, as required, to take into account changes in the operational environment").
Exactly what is happening to windows? And why Linux is so successful -> Open Source like fetchmail et al being more linear in their development, all users get a stab at getting the environment right.
But users who aren't prepared to do any work to make things better in their environment for their PC are always going to lose. But it's the same as those people who make their desks tidy and optimise them for work, and those that don't. The difference on your virtual desktop is that you can't easily hope someone else will tidy it for you...:)
Conversion Rate Optimisation French / English consultant
it just goes to show that 99% of the work in creating software is in the design.
you have to try to map out not only what you will need but what you might need in the future.
yes, it's a near impossible task but it's the only way to avoid automatically committing yourself to an endless cycle of patches and hacks.
the good part is, if you can plan the project well enough then the actual coding becomes nearly trivial.
the problem arises when the boss says 'i don't care about scalability or flexibility, i just want code now' and i have to try to explain that i'm trying to save his ass 8 months down the line when clients (not to mention the boss himself) bombard us with feature requests, etc.
OS/360 was actually heading over a cliff. The various pieces of software did not work when they were put together. The OS was delivered years late and massively over budget. Many IBM 360's (costing six figures back when $1 was worth something) were delivered and then spent years simply running emulators for the old machines they replaced, because the native software wasn't ready.
Yes, there were lots of things they could have done -- like define a subset of the original committee-designed bloated specification, get that working, then start adding features. But the manager (Fred Brooks) didn't know that, yet, and didn't even know the project was in trouble until it was impossible to deliver anything at all on deadline. Afterwards, he wrote a book, The Mythical Man-Month, which has become a standard text for large-project management. But he learned how by doing it wrong, more massively than anyone ever had before...
So it is accurate to say that C++ has only been standardized recently. But unless you're comparing C++ to Fortran/Simula/Algol, it is just wrong to call it "new".
The Daily Build
Creating common APIs allows separate development projects to proceed at their own pace. You don't need OO for this, but it helps.
I think one of the reasons that Linux has been so successful is because Linus decided long ago to take a modular approach to designing his monolithic kernel.
-josh
To a Lisp hacker, XML is S-expressions in drag.
Back in the early 1980s I headed up a small team that developed 'industrial strength' applications.
Our firm licenced this software to major manufacturing firms with a Money Back Guarantee. As in, "If you are not satisfied, for any reason, we will either fix the problem or give you back your money. Your choice." We were never asked for a refund.
It was semi-open source. You could have the source any time you wanted, but asking for the source voided your warranty, since problems in your data might have been caused by your own temporary code changes.
Funny thing. I've had that on my resume for many years, but no prospective employer has ever asked how I did it.
No one has hired me specifically to help them produce similar quality code. Much of the time their reaction to my resume is, 'but you don't know c++' (or their other favorite). I know enough about c++ to know that I want to stay away from that second generation language for all but the most specialized situations.
I have also been told, on numerous occasions, that I'm not qualified to lead a particular project because I lack experience managing the large team that will be needed. I've never gained that experience because I've never needed a large team to accomplish anything.
As an MBA, as well as being an application designer & a coder, I know that large teams do have a place -- mostly where you have a blank cheque and are earning a percentage of the total billing. (:-)
A quick warning... I consider myself a relative newborn in the world of software development. I present these opinions under the consideration that my opinions can change at any moment. =]
A lot of the dire predictions of software atrophy and such are a result of applying the wrong methodology to a project. Yes there are uses for Software engineering, but I think this approach is overkill for even large scale projects. Check out Software Craftsmanship: The New Imperative for a different perspective. A perspective I think is in need of serious consideration. The gist is returning to the days of master craftsman and apprenticeships. This focuses a bit more on the learning aspect than actual development methodologies, but you can always go to The Pragmatic Programmer to fill in that gap.
"As time passes, the system becomes less and less well-ordered. Sooner or later the fixing ceases to gain any ground. Each forward step is matched by a backward one. Although in principle usable forever, the system has worn out as a base for progress."
This is where "refactoring" (see Fowler's Refactoring) really shines. I find it difficult to believe that refining the software base is not progress. Start with an initial revision where the code fulfils its contract (if you're into designing by contract), then refactor the body of the function/method for speed / elegance. Then you can run your unit tests on the function / method to check that the refactoring session did not break any of the design contracts (whew).
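A minimal sketch of that cycle, assuming a made-up contract (sum of squares for 0 <= n <= 1,000,000). The class and method names are hypothetical, and the "unit test" is a plain method so the example stays self-contained rather than depending on a test framework.

```java
import java.util.function.IntToLongFunction;

// Hypothetical example: same contract, two bodies, one check that guards the refactoring.
class RefactoringSketch {
    static final int MAX_N = 1_000_000; // upper bound chosen so the results fit in a long

    // Contract (invented for this sketch): for 0 <= n <= MAX_N, return 1^2 + 2^2 + ... + n^2;
    // any other n is rejected with IllegalArgumentException.
    static void checkPrecondition(int n) {
        if (n < 0 || n > MAX_N) {
            throw new IllegalArgumentException("n must be between 0 and " + MAX_N);
        }
    }

    // Initial revision: obviously correct, linear time.
    static long sumOfSquaresV1(int n) {
        checkPrecondition(n);
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    // Refactored body: closed form n(n+1)(2n+1)/6, constant time, same contract.
    static long sumOfSquaresV2(int n) {
        checkPrecondition(n);
        long m = n;
        return m * (m + 1) * (2 * m + 1) / 6;
    }

    // The "unit test": returns true only if the implementation honours the contract.
    static boolean satisfiesContract(IntToLongFunction impl) {
        if (impl.applyAsLong(0) != 0) return false;
        if (impl.applyAsLong(1) != 1) return false;
        if (impl.applyAsLong(3) != 14) return false;
        try {
            impl.applyAsLong(-1);
            return false; // the precondition was not enforced
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("v1 ok: " + satisfiesContract(RefactoringSketch::sumOfSquaresV1));
        System.out.println("v2 ok: " + satisfiesContract(RefactoringSketch::sumOfSquaresV2));
    }
}
```

The check is written against the contract, not against either body, so it can be rerun unchanged after every refactoring session.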
I think they may be trying to restate the broken window theory (see Pragmatic Programmer), where a broken window (or bug) in a building (or system) leads to dilapidation elsewhere in the building (or system).
And then there are the agile methods, including XP. I think these answer a lot of the limitations and issues with Software Engineering practices. Interacting with clients (having a client there during each iteration) gives you the benefit of almost real-time feedback so that you can update your user stories on the fly, etc.
Without rambling on any further, my point is not to spend too much time looking for a specific unified theory. Read up about all the ideas, methods, and theories. Take the best parts from each, then crank the knob all the way up (if I may borrow that from XP =] ). Don't let anyone tell you there is a science to software development that is easy to reproduce, and that you are just a link in the overall chain. You practice and perform a craft. Enjoy it!
Take, for example, Lehman's "Second Law" of software evolution, a software reworking of the Second Law of Thermodynamics.
"The entropy of a system increases with time unless specific work is executed to maintain or reduce it."
As evidenced by the back of my Subaru.
Samsung took back my unlocked bootloader because Google wants me to rent movies. They're both evil.
It's simply true; there is no other language that even comes close to filling all the roles of C++. Most of the languages people advocate for taking a certain niche from C++ are implemented in C++.
It's a very difficult language to learn, and hard to use properly. It has lots of syntax, and many idiosyncrasies. Yet it yields you control of the machine in the manner of C, adding in a lot of the niceties of high level languages for those who know how to use them.
You might argue that it's less error prone for certain programmers to use a more specialized and high level language for certain tasks. You might make a good case that C++ should not be someone's first learned language (I say learn assembly, then C, then C++, then some high level lang).
What you cannot say is that C++ should be ditched. It is filling a vast role in real-world programming, where nothing else can compete.
We had a case where a system no longer proved amenable to feature addition or continual improvement to match the changing operational and customer requirements. In the end, refactoring the codebase to match the changing production requirements would have cost more than rewriting the system using more modern libraries, methodologies and frameworks. It got rewritten and the old system was phased out.
It wasn't a case of "fixing" inherently broken software; it worked perfectly well. It was just that the operational flow it supported changed due to new customers and more efficient management procedures.
Incidentally, we have found that with each major rewrite of that system (there have been two) there has been an immediate growth spurt in customers. I am not sure if it is because it looks like something new, or because the software better matches the operational requirements, or because of increasing feature addition. Either way, the last two rewrites have been paid for almost immediately by the additional customers the new software has brought in.
mocom--
The recurring theme that it's the programmer's fault, not the language's, is entirely tired and completely wrong. You have to maximize the productivity of the average programmer. Sure, you can snidely conclude that they are stupid and just not man enough for C++, but that isn't going to get your product out the door any faster or reduce the error rate.
access to machine registers and memory
architecture specific machine instructions
transfer of execution to an arbitrary address
coerce object refs to addresses and back
invoke OS services
This doesn't mean that you can't write GC in Java! IBM implemented a JVM and GC system entirely in Java, called Jalapeno. To do this, they created a Java class called "Magic" that had empty methods for these services which any Java compiler could build. Then, the internal Jalapeno VM compiler would recognize calls to the Magic class, verify that what they are compiling is a valid part of the JVM and inline appropriate machine code where these calls occur.
Now, all GC systems can be written in reference to this Magic class and porting the VM is simply a matter of generating appropriate machine code for these half-dozen methods. And you get all the security of Java's automatic memory management model!
Check the ACM's OOPSLA Conference Proceedings, 1999, Implementing Jalapeno in Java or www.research.ibm.com/jalapeno for the paper.
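A rough sketch of the idea, to show why any Java compiler can build such code. This is not Jalapeno's actual class: the method names, the use of `int` for addresses and the mark-bit layout are all invented here. On an ordinary JVM the stubs just return placeholder values; the description above says the VM's own compiler replaces each call with inline machine code.

```java
// Hypothetical stand-in for the "Magic" class described above; names and signatures are invented.
final class Magic {
    private Magic() {}

    // Placeholder bodies: a Magic-aware VM compiler would inline machine code at each call site.
    static int loadWord(int address) { return 0; }
    static void storeWord(int address, int value) { }
    static int objectAsAddress(Object ref) { return 0; }
    static Object addressAsObject(int address) { return null; }
}

// Hypothetical GC fragment written purely against Magic, so it contains no machine-specific code.
class MarkPhaseSketch {
    static final int MARK_BIT = 1; // assumed header layout, for illustration only

    // Sets the mark bit in an object's (assumed) header word and returns the new header.
    static int markedHeader(Object ref) {
        int address = Magic.objectAsAddress(ref);
        int header = Magic.loadWord(address);
        int marked = header | MARK_BIT;
        Magic.storeWord(address, marked);
        return marked;
    }

    public static void main(String[] args) {
        // With the placeholder stubs this only exercises the control flow, not real memory.
        System.out.println("header after marking: " + markedHeader(new Object()));
    }
}
```

Under this scheme, porting means supplying machine code for the handful of `Magic` methods; the collector logic in `MarkPhaseSketch` would stay untouched.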
From the article:
Is fetchmail complex enough that it needs to be growing geometrically? I mean yeah, fetchmail does a lot, and I do know what "geometric" means. Still, I doubt the world of email is changing fast enough that you'd want to choose that as your example of out-of-control software maintenance.
[Insert obligatory ESR goading.]
The study by Mr. Godfrey and Mr. Tu can be found at http://plg.uwaterloo.ca/~migod/papers/iwpse01.pdf . (4 pages in a PDF file).
You're absolutely right about this. I'm another semi-old-timer. In the early 1980's, I was on the team (six people, all with developer background) to write a bisynchronous communications package (HASP station emulator). We had a standing offer--anybody who could find a bug would get a free dinner at any restaurant. We only had to pay off once.
Nobody seems to care about doing this anymore, or maybe they never did in the first place, and we were all just naive.
Where I work, it has been a commonly held belief that all software evolves until such time as it can send and receive email. If it doesn't do this, it isn't complete. :)
Jason Pollock
javac, the standard JDK compiler, is actually a Java program. Also, kopi.
Open source software OTOH is built by widely separated people with narrow bandwidth links between each other and only a shared vision of the Right Thing to guide them. The result, as predicted by Conway's law, tends to be highly modular architectures focussed around a few core protocols or APIs that capture the vision.
Modular systems are inherently more flexible and reusable than monolithic systems because they exhibit low coupling between the modules. In contrast the monolithic software is more likely to have high coupling between modules, even though they are supposedly independent.
(There is also a related concept of "cohesion", which is the extent to which the features of each module hang together as conceptual wholes. I suspect that OSS will show higher cohesion than closed source software)
It would be interesting to get some statistics to test this theory. Does anyone know of any good software for measuring coupling in C code? I'd like to run some commercial and OSS software through it and see what it says.
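As a very crude starting point, and not a substitute for a real metrics tool, one could count `#include "..."` lines as a proxy for coupling between C files. The sketch below assumes that proxy; the class name, file names and sample sources are made up. It reports how many project-local headers a file pulls in (fan-out) and how many files pull in each header (fan-in).

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical include-based coupling counter; a proxy metric, not an established tool.
class IncludeCoupling {
    // Matches project-local includes such as: #include "list.h" (system headers in <> are ignored).
    private static final Pattern LOCAL_INCLUDE =
            Pattern.compile("^\\s*#\\s*include\\s*\"([^\"]+)\"", Pattern.MULTILINE);

    // Fan-out: the set of local headers that one C source text includes.
    static Set<String> includesOf(String cSource) {
        Set<String> headers = new TreeSet<>();
        Matcher matcher = LOCAL_INCLUDE.matcher(cSource);
        while (matcher.find()) {
            headers.add(matcher.group(1));
        }
        return headers;
    }

    // Fan-in: for each local header, the number of source files that include it.
    static Map<String, Integer> fanIn(Map<String, String> sourcesByFile) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String source : sourcesByFile.values()) {
            for (String header : includesOf(source)) {
                counts.merge(header, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, String> sources = new TreeMap<>();
        sources.put("net.c", "#include <stdio.h>\n#include \"list.h\"\n#include \"log.h\"\n");
        sources.put("fs.c", "#include \"list.h\"\n");
        System.out.println("fan-out of net.c: " + includesOf(sources.get("net.c")).size());
        System.out.println("fan-in per header: " + fanIn(sources));
    }
}
```

Include counts miss coupling through globals, function pointers and shared data formats, so any comparison between commercial and OSS code on this basis would be suggestive at best.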
Paul.
You are lost in a twisty maze of little standards, all different.