Outstanding Objects (Developed Dirt Cheap)
Mark Leighton Fisher writes "Some readers might be interested in
Outstanding Objects (Developed Dirt Cheap); or "Why Don't Developers Search the Literature?" It seems like I still see a lot of wheel reinvention going on, even with the wealth of code and information now available on the Net."
Sometimes it's not appreciated, but hopefully people eventually come around.
library uses wrong language
library has the wrong license
library pulls in too many external dependencies
library not threadsafe
But it's worth the search - occasionally you find a real gem.
It doesn't matter if the code is available from somewhere "out there", from inside your company, or even from inside your group. The reality is that developers in general don't play well with others. Why? For a number of reasons.
First, it is no fun to use someone else's code. This is why at one time Apple Computer (many years ago) had 13 different (yes, I counted them) memory managers being written. It was fun to write a memory manager, to solve the problems involved, etc.
Second, people don't trust one another. How do I know that you have implemented this code correctly? How do I know that you will deliver the modifications that I need? That you will deliver them on time? I can't, so it is better to do it myself.
Bottom line, we don't play well with one another, because we want all the fun for ourselves and because we don't trust the other folks (called flipping the bozo bit in some corners).
Most developers probably don't even know how to search CPAN or install a module from it (or PEAR for PHP). So they roll their own inferior solution. Those who have spent the three minutes reading the docs are getting an incredible benefit.
I write a program, and part of it needs to simply read a
Do I _REALLY_ want to pull in libpng and libSDL just to do this? What kind of risks does pulling these libraries in add to my project? How much will this bloat my code? Will users be confused by the different versions of these libraries? What if I one day want to port to a platform that these libraries don't work on?
Turns out it's usually simpler, easier, and less risky to just roll your own.
I work in a mixed Perl and Java shop. We use some third-party stuff, but it's preferred that libraries be developed in-house. There's a distrust of public-domain software here, and there's an extra layer of process related to integrating third-party code. The CM and Ops guys don't care, but the QA and development guys are scared to death of anything that might have a bug that we didn't write.
Well, I would RTFA if I could RTFA, but I'll try to give you an answer anyway.
The problem is granularity. I interpret from the title (and nothing else, since I can't RTFA) that they want to reuse individual classes rather than entire projects.
I _hate_ the fact that every UML diagram begins with a blank sheet of paper and that individual classes are almost never reused.
Individual classes, however, are even harder to reuse than whole libraries. In theory you could take somebody's generic model of, say, a Person, and extend it with the extra things you need. As long as Person were well-written it might actually be reusable.
But in practice, it won't extend the classes you need it to extend, and it'll probably be tied in to a vast array of other classes that you simply don't need, making your life very complicated. Since requirements gathering is far harder than code writing, people who have to gather their own requirements generally just end up writing the code to match, since it's a trivial effort.
You lose a lot when that happens: you can't reuse a lot of other processing code that you want. However, how long will it take to find that code? Days, plus the time it takes to adapt? How long would it take you to write it yourself?
The lower the granularity, the harder reuse is. I'd like to see better, but with present programming languages it's not going to happen.
I do, however, try to find those libraries before reinventing the wheel. Occasionally I do find one that will work, and then I'll be faced with integrating it into the project. At that point, I've always found it beneficial to go through and edit the source, for two reasons: 1) a consistent coding style throughout the project makes it easier to maintain, and 2) I tend to learn a hell of a lot by actually trying to understand what I'm editing. Then, maybe, next time I can reinvent the wheel all by myself.
Back in 1990 I worked for a small company that built
graphics boards and my first task was to debug
the "polygon fill" routine in their firmware.
It turns out they used their own "home brewed"
algorithm that was slow, memory hungry, and didn't
handle degenerate cases correctly. If anyone in
the company had taken the time to pull
any one of the graphics textbooks off their shelf
(e.g., Foley, van Dam) they would have found a much better
solution.
I ended up rewriting the module myself using
the classic solution -- it was faster, used little memory,
and handled degenerate cases reasonably.
It was my experience that everything was a badly
reinvented wheel when I worked there.
I have been involved in software reuse since the mid-1980s and possibly even earlier. There has been lots of energy expended on the problem of making existing implementations extensible, one of the strengths of OO technology, though not requiring OO. The big piece that has always been missing has been a major concerted effort focused on facilitating the matching of a developer's needs with existing software.
... I have long felt that CASE tools, yup those tools that are totally out of vogue right now, would be of greatest value if they had a dual function. Their primary function would continue to be as a means of describing architecture, design, or code, but a secondary function would be to, in the background, perform a continuous search of existing work looking for matches. I have never seen a tool that does this, yet this seems a tremendously valuable function.
There are many mechanisms that can assist such as:
1 - technical reviews. When these happen, you get a number of co-workers together to review your work. Not only does this assist in ensuring that direct work (architecture, design, code) is correct, but it also provides an opportunity for all those involved in the review to search their knowledge of pre-existing "parts", be they architectural, design, or actual code, and to suggest you investigate them. Of course, if you're like me, then actual review meetings where a number of people sit down and examine your work just do not happen any more. Thus this form of identifying existing work that can be reused no longer functions.
2 - CASE tools
3 - personal memory - only works for those items you already are familiar with, which frequently gets voided when changing jobs.
4 - institutional memory - this is similar to the technical review mechanism, yet is less well defined. The real question here is HOW does an individual tap into an institutional memory? Documentation search? This is far less than perfect even if all work was well documented. Code search? Even worse at turning up matches to needs.
So... the bottom line is that it truly is VERY difficult to match up needs of a software development effort with the existing software that is available.
One case in point... I worked on a very large project for an FAA (Federal Aviation Administration) contract. One mechanism I needed was a circular buffer/queue. These seem very straightforward to implement, and an obvious place to use an existing piece of design/code. Well, even after extensive search and review I could not find such a part and built my own. Later, I discovered there were at least six independent implementations of a circular buffer/queue in this single project team. All of them were general enough to meet the other implementations' needs, yet somehow none of us knew of the others' overlapping work. If we couldn't coordinate the reuse of these six independent efforts (and that means we all built the same basic algorithm, found the same set of bugs... and yes, using our code management tool I was able to see the same bugs being fixed in each implementation... and thus a total and unnecessary duplication of effort), how in the world will we ever solve the problem of reusing work outside the single project team, or outside a company?
There are some examples of wild success with reuse... though they seem to me to be more success through definition. All of those shell scripts that are built from individual command line tools are examples of reuse, where each command line tool represents a unit of software available for reuse. But I think we all think of reuse more at the code module level... a function, or class, or small library. And it is at this level that I think we fail miserably, and it is my contention that we fail because we can't easily find the candidates for reuse.
Some or all of these things happen:
The library is under a license other than the MPL or LGPL.
We try to make the library work for hours, only to find it doesn't do exactly what we need, or is horribly broken.
We try to use the library, but it's broken, and the developer lives in France and only responds to emails while we're asleep.
It would be faster for us to write our own than to decipher someone else's code.
The only real third party library we use mostly does the job, but we had to wrap it and implement all of the features it didn't. In the end, we should have just created our own library. The way I see it, it's just one more thing we can sell.
Yes, I'm paid by the hour, but I also care about the quality of that hour. If the problem is interesting, I tend to research other solutions (to scope out the pitfalls and features I might not have thought of), then I'll often implement my own solution, because learning someone else's code tends to be pretty high on the tedium scale.
If the problem is NOT interesting, I have a lot more motivation to find someone else's code that I can use; if I find a quality solution, I can plug it in, spend hopefully minimal time debugging and testing it, and move on.
And there CAN be pride in using someone else's code, actually; I really get a kick out of using libraries and sending elegant enhancements or bugfixes back to the authors ("Your library was excellent. I improved it.").
Also, if the code you found is really good stuff, it might help you to finish up a complex feature in record time, which also feels nice ("Oh, I almost forgot to mention it, but that new report we scoped out yesterday is out on the test server").
There are only 10 types of people: those who understand decimal, those who don't, and, uh, 8 other types I forget.
I spent a fun week exploring through the computer graphics repositories of the time (it was some years ago), but I finally decided I'd had enough fooling around, so I hacked out some quick C and converted the files.
The converter I wrote and debugged in a couple of hours was virtually guaranteed to crash and burn on any WTK NFF files but those, but I didn't care. What I needed was those files in Inventor so I could get on with the job of lighting and animating them.
That's the problem with the Booch Components and a good percentage of the things I see out there in the repositories today: they solve the general problem with such elegance that they're really optimally useful only for people who want to understand the general problem instead of knowing exactly enough to solve the specific problem they have.
Well, here's a news flash: a good part of the time I'm too busy to learn how to solve the general problem. What's more, I know I'll never need this knowledge I acquire again, so a quick in-and-out of my brain is all I need.
--
The end
That doesn't do your company any good when you leave them. Then, they're worse off -- a custom solution, and no documentation! If you use standardized parts, (hopefully) they're well documented so that the second generation coder can figure out what's going on. Heck, he might already be familiar with them!
the preceding comment is my own and in no way reflects the opinion of the Joint Chiefs of Staff
One of the biggest problems I usually run into that no one has mentioned yet is this: when searching the literature, I don't know what terms/keywords I should be searching for to find the code to solve the problem.
For example, one project I worked on was an interactive calendar application that would dynamically place multi-day events across the days on which they occurred. Well, I needed those event bars to be as compact as possible, so I searched for an algorithm to figure this out... After 3 days of searching and finding nothing, I asked a Computer Science professor at the local university and he couldn't come up with anything either. I know there is code to solve this type of problem, but I simply couldn't find the keywords to use to locate the code.
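For what it's worth, one set of keywords this kind of layout problem tends to go by is "interval partitioning" (or "interval graph coloring"). Below is a minimal Python sketch of the usual greedy approach, assuming "compact" means using the fewest rows and that each event has inclusive start and end day numbers. The function name and the event tuples are made up for illustration; this is not the calendar application's actual code.

```python
import heapq

def assign_rows(events):
    """Assign each (name, start_day, end_day) event to a display row.

    Days are inclusive. Events are taken in order of start day and each one
    goes into the lowest-numbered row that is free on its start day.
    Returns a dict mapping event name to row index (0 = top row).
    """
    rows = {}
    free_rows = []   # min-heap of row indices that are currently unused
    active = []      # min-heap of (end_day, row) for events still occupying a row
    next_row = 0     # first row index that has never been used

    for name, start, end in sorted(events, key=lambda e: (e[1], e[2])):
        # Release every row whose event ended before this one starts.
        while active and active[0][0] < start:
            _, row = heapq.heappop(active)
            heapq.heappush(free_rows, row)
        if free_rows:
            row = heapq.heappop(free_rows)
        else:
            row = next_row
            next_row += 1
        rows[name] = row
        heapq.heappush(active, (end, row))
    return rows

# Hypothetical example data: four events over six days.
layout = assign_rows([("conf", 1, 3), ("trip", 2, 5), ("sale", 4, 6), ("party", 6, 6)])
```

With this greedy rule the number of rows used never exceeds the largest number of events that overlap on any single day, which is the best any layout can do.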
And just to chime in about what everyone else has mentioned:
My top reason for not using public code is the lack of quality documentation. If I didn't write it, it's really hard for me to understand it and make use of code without putting in considerable time studying the code, which is more time than I have.
-Adam
I often don't reuse for reasons described very well by other posters, but I wanted to mention some cases where I either did reuse or wanted to.
Two years ago I was developing online courseware for a company that trained/certified future medical transcriptionists. We needed to develop a typing test. Now, a typing test is all about doing two things -- (1) noting when someone types something that shouldn't be there and (2) noting when someone doesn't type something that should. So you're comparing for absences or additions between a given text and a key. Sound like anything else? My first thought was 'diff'. My second thought was Perl (after all, this is text slinging). My third thought was CPAN. And sure enough, Mark Jason Dominus' excellent Algorithm::Diff saved me at least days of time and quite possibly weeks.
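The "typing test is really a diff" idea can be sketched in a few lines. This is not the Algorithm::Diff code from the story; it is a rough Python illustration using the standard library's difflib, and it assumes a word-by-word comparison (the real test may well have compared character by character). The function name and sample sentences are invented.

```python
import difflib

def typing_errors(key, typed):
    """Compare typed text against the answer key, word by word.

    Returns (missing, extra): words from the key that were not typed,
    and words that were typed but do not belong.
    """
    key_words = key.split()
    typed_words = typed.split()
    matcher = difflib.SequenceMatcher(a=key_words, b=typed_words, autojunk=False)

    missing, extra = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # 'delete' and 'replace' both mean part of the key is absent.
        if tag in ("delete", "replace"):
            missing.extend(key_words[i1:i2])
        # 'insert' and 'replace' both mean something was typed that is not in the key.
        if tag in ("insert", "replace"):
            extra.extend(typed_words[j1:j2])
    return missing, extra

# Hypothetical test sentence with a doubled word, a typo, and a dropped word.
missing, extra = typing_errors(
    "the patient was given aspirin daily",
    "the patient was was given asprin",
)
```

Scoring is then just a matter of counting the two lists, which is the part that is specific to the typing test; the hard part (the alignment) comes from the library.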
Now, this was possible in part because I was working as a contractor, and so was probably trusted a bit more, and also, in part because my supervisor/contact with the company was pretty savvy. I can contrast this with some other experiences. Like the company I worked for that wanted a webserver log file analysis package. Again, lots of text slinging, but Perl or any other scripting language was out because we wanted the source as closed as possible. Nope, it had to be in C, and I was discouraged from trying to find a regex library to use. I essentially ended up writing my own regex engine. It was buggy. It needed optimization. The syntax was less powerful. The stats package itself was good, especially for 1997 (it could do things I've only seen other log analyzers do in the last two years), but because it all ran on top of this flaky regex engine, it couldn't fly. I think it got canned after I left... nobody wanted to touch it. I seriously think I lost months of my life on this, and the company lost a good product. All from trying to reinvent the wheel...
Tweet, tweet.
Why don't we use more available code? Leaky abstractions for one. And look at the DLL and .so hell we're in now, where we have libraries that depend on libraries which depend on libraries...yik...all that to save a tiny bit of work, ain't worth it. Write your own.
You do have a good point there about owning the code. I've found this repeatedly at my company with using external/free software solutions in general.
It's a completely idiotic approach to take. They have no problem paying for a god-awful piece of software that somebody thought was cute, but never bothered to check and make sure it'll support everything we need (or even bother to ensure that it'll scale properly), but they are vehemently against using any type of open source software and discourage using large chunks of external code in your apps.
A lot of this stems from code-ownership. They want somebody that they're paying to come back and blame; whether it be me, the outside vendor, or whomever. It's ridiculous and I've tried to make my case time and time again when trying to deal with some of the morons at support, that if we were dealing with an open solution and I could see the code, that I could figure it out for myself. Sure, a lot of times it's nice to be able to call and throw it in their laps, but when you hit that week mark and your support vendor isn't able to come up with a solution and you've reached the limits of what isn't in a binary format, you start to get a little frustrated.
I'm slowly making in-roads though; I can see a lot of managers' concern with using Open-source because if you don't get a support contract, you're relying on your in-house people to support everything, which obviously causes a problem if those in the know leave. But is it really necessary to pay a small fortune for a SQL Server, Windows 2000 Server running IIS to produce some basic web applications when Apache/PHP/MySQL|PostgreSQL are all available out there? (And this is the reason that I'm loving .NET & the Mono project--should be easier to push this if I can say the same (intelligently) written code will work in Linux).
Ah well, got off topic there, but I'm always a big fan of using external code, but have found repeatedly that I'm not allowed unless I sneak it in and take credit for it myself (which I have/will never do).
Simple, a lot of coders are afraid of being sued later on.
Or get caught in some strange license restriction they didn't understand. Since few coders are lawyers it's easy to quickly 'get into trouble'.
Not many things worse than having your work ( or your job if you are a captive coder ) deemed useless due to an 'oversight'.
While it sounds simple, with the current state of affairs in the world ( example SCO ), for some it's just not worth the liability risk.
---- Booth was a patriot ----
I've seen a lot of cases where someone reinvented the wheel, but instead of using an axle, they just put the wheel under the object so you have to move the wheel to the front to keep going, and then to top it off, they made the wheel square or triangular.
I did that once. In 1969 I needed a sort, so I looked in the Fortran programming book I had and implemented a bubble sort, to sort records on disks. When it became clear that the sort would take a couple of months to complete, I started working on optimizations. As I was working for a college, one of the profs suggested that I take a look at the collected algorithms of the ACM. There I learned about quick sort and heap sort. I was able to incorporate those, but I had to deal with the fact that I had a lot of data on disk that I couldn't bring into memory. Eventually I got a sort working that wasn't too bad for the times and the computer resources.
Then Knuth published Vol 3 which I studied with my experience in mind. I was amazed to see Knuth develop and analyze my progression of improvements and cite the authors and dates, all of which preceded my efforts by at least 5 years. I also noted Knuth's statement that the common use of the bubble sort in programming texts was a great crime.
Roll forward five years and I join a group where a new backup utility has been released which sorts the blocks to be backed up, but there was some sort of stack overflow problem. But I was the new guy who wanted nothing to do with tapes and besides, the two engineers working on it were comp sci graduates and I was a college dropout.
Roll forward another five years and a coworker had been stuck with this code which was still running into some sort of strange stack overflow problem. He's the polite, persistent nagger type, so I reluctantly agree to take a look at the code. In 15 minutes I realize that the code is quick sort, which then means that there is a serious bug in the code because it's overflowing a 200 element stack (and if you don't know why I knew there was a serious bug instantaneously, you MUST NEVER CODE ANY SORT). In the next 15 minutes I found 3 coding errors in the code, 1 was from the original Fortran code, and 2 were from editing the Fortran assembly to inline it into the module.
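The post doesn't spell out the reasoning, but presumably it is the textbook one: a quick sort that always defers the larger partition and keeps working on the smaller one never needs more than roughly log2(n) stack entries, so a 200 entry stack should be unreachable for any data set that could exist. Here is a minimal Python sketch of that technique, assuming a simple last-element pivot; it is an illustration, not the backup utility's code, and the function names are made up.

```python
def partition(items, lo, hi):
    """Lomuto partition around items[hi]; returns the pivot's final index."""
    pivot = items[hi]
    i = lo
    for j in range(lo, hi):
        if items[j] < pivot:
            items[i], items[j] = items[j], items[i]
            i += 1
    items[i], items[hi] = items[hi], items[i]
    return i

def quicksort(items):
    """Sort items in place with an explicit stack.

    Returns the largest number of entries the stack ever held, so the
    logarithmic bound can be checked.
    """
    stack = [(0, len(items) - 1)]
    max_depth = 1
    while stack:
        lo, hi = stack.pop()
        while lo < hi:
            p = partition(items, lo, hi)
            # Defer the larger side, keep working on the smaller side.
            if p - lo < hi - p:
                stack.append((p + 1, hi))
                hi = p - 1
            else:
                stack.append((lo, p - 1))
                lo = p + 1
            max_depth = max(max_depth, len(stack))
    return max_depth

# Already-sorted input is the worst case for running time with this pivot
# choice, but the stack stays tiny.
data = list(range(1000))
depth = quicksort(data)
```

If the two branches of the `if` are swapped so the smaller side is deferred instead, sorted input pushes the stack toward n entries, which is the kind of bug that would overflow a fixed 200 element stack.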
If I have a problem that calls for a sort, I can reuse Knuth's research into sort algorithms, which reuses the work of others to create a significant advance in the subject, I can reuse the algorithm coding by transliterating the MIXAL, I can find code in various languages from all sorts of sources (which I evaluate by reusing Knuth's methodology), I can call a binary library routine that I may or may not be able to see source code for, but whose behavior I can check with rules of thumb reused from Knuth, and I can research new algorithms and possibly reuse after reusing Knuth's analysis to verify the claimed advantages.
In my view, reuse is the obvious method to apply to any development process.
As a mechanical designer, I would be foolish to fail to use standard nuts and bolts, and where that's not reasonable, foolish for failing to use standard conventions and standards for screw threads. A mechanical engineer works with dozens of references and hundreds of catalogs.
For a programmer to fail to access reference materials and catalogs looking for existing components and art (craft) to reuse is idiotic.
The problem is that there are few good references to draw on.
The only reasonably comprehensive compendium of computer software algorithms I know of is in the 6-10 books that Knuth has published.
I've opened a number of books where the title suggests a comprehensive listing of software algorithms or C++ objects, and been greatly disappointed at the lack of depth. These books are like going into the drug store or Sears to find the right nuts and bolts for a project. I was hoping for something closer to Home Depot, which only scratches the surface of reusable fastener technology.
Softwa