Finding New Code

Posted by ryuzaki0 on Monday February 5, 2007 @03:08AM from the trying-to-do-the-search dept.

tabandmountaindew writes "Too much time is wasted re-implementing code that someone else has already written, for the sole reason that it's faster than finding the other code. Previous source code search engines, such as Google Code Search and Krugle, only considered individual files on their own, leading to poor-quality results and making them useful only when the time to re-implement was extremely high. According to a recent NewsForge article, a fledgling source-code search engine, All The Code, is aiming to change all of this. By looking at code not just on its own but also at how it is used, it is able to return more relevant results. This seems like just what we need to unify the open-source community, leading to an actual common repository of unique code, and ending the cycle of unnecessary reimplementing."


Re:I call bullshit on this by thisIsNotMyName · 2007-02-05 03:29 · Score: 2, Informative

Coders typically reinvent the wheel because it is usually easier to rewrite something than it is to learn how the existing code works. That actually leads into my main problem with using code search like this to try to promote more code reuse, namely, trust. The search engine is going to need to provide some way for me to make a judgment on how well written and bug free I can expect the code to be. Joel Spolsky of Joel On Software has said that if you are writing an application that is mission critical to your business, you need to control not only the application layer, but also the layer below that, whatever it may be. If I'm going to give up control over creating even pieces of my application layer, I need some assurances that the code is high quality.

Just more results, not more content by hsa · 2007-02-05 03:39 · Score: 2, Informative

I tried searching for more general level stuff.

"Radiosity"

gave me several pages of "Radix Sort".

For the inexperienced reader: these are not related at all. One is a general sorting algorithm in computer science and the other is a lighting algorithm used in computer graphics and in games like Quake.

So my guess is that it just does a fuzzy search and yields more results. Getting more results that are not the ones you want won't help you one bit. Useless for me.
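
A minimal sketch of how a purely fuzzy string match could produce this effect. This is only an illustration, not All The Code's actual algorithm (which the comment above is guessing at): the index contents, the function names, and the 0.6 cutoff are all assumptions, and the matching uses Python's standard difflib. With no "Radiosity" entry indexed, the closest-looking title wins on spelling alone.

```python
from difflib import SequenceMatcher, get_close_matches

# Hypothetical index of snippet titles; note there is no radiosity entry.
INDEXED_TITLES = ["Radix Sort", "Quick Sort", "Ray Tracing", "Binary Search"]


def similarity(a, b):
    """Character-level similarity in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def exact_search(query, titles):
    """Return only titles that literally contain the query."""
    return [t for t in titles if query.lower() in t.lower()]


def fuzzy_search(query, titles, cutoff=0.6):
    """Return titles whose spelling is close to the query (assumed cutoff)."""
    by_lower = {t.lower(): t for t in titles}
    hits = get_close_matches(query.lower(), list(by_lower), n=3, cutoff=cutoff)
    return [by_lower[h] for h in hits]


if __name__ == "__main__":
    print(exact_search("Radiosity", INDEXED_TITLES))
    print(fuzzy_search("Radiosity", INDEXED_TITLES))
```

The two strings share the prefix "radi" and a few later letters, which is enough to clear the cutoff even though the topics have nothing in common.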

Re:Are we really making it better for us, or worse by Anonymous Coward · 2007-02-05 03:46 · Score: 1, Informative

If these companies release public binaries, and you feel what they are doing is morally wrong, you could consider anonymously blowing the whistle on them to people who are capable of analyzing the binaries and who can take it from there legally speaking. FSF or some similar organization would seem to be a prime candidate. Some people seem to be very skilled at recognizing open-source code even in binaries - mostly, I believe, by looking for embedded strings and similar - but they probably would be more effective if they could look for specific libraries in specific binaries instead of searching at random.
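
A rough sketch of the "look for embedded strings" approach, assuming you already suspect a specific library. The function names and the marker table are hypothetical, and the marker text is made up for illustration; real markers would be version banners or copyright notices taken from the library's own source.

```python
import re

# Runs of at least four printable ASCII bytes, similar in spirit to `strings`.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def extract_strings(data):
    """Return every printable ASCII run found in a binary blob."""
    return [m.decode("ascii") for m in PRINTABLE_RUN.findall(data)]


def find_library_markers(data, markers):
    """Map each library name to the embedded strings containing its marker."""
    strings = extract_strings(data)
    found = {}
    for library, needle in markers.items():
        hits = [s for s in strings if needle in s]
        if hits:
            found[library] = hits
    return found


if __name__ == "__main__":
    # Fake binary contents standing in for a shipped executable.
    blob = (
        b"\x00\x01ELF\x02\x00"
        b"examplelib version 1.2 Copyright Example Authors\x00"
        b"\xff\xfe"
        b"usage: prog [options]\x00"
    )
    # Hypothetical markers; neither library is real.
    markers = {"examplelib": "examplelib version", "otherlib": "otherlib "}
    print(find_library_markers(blob, markers))
```

A hit only suggests where to look more closely; strings can be stripped or coincidental, so it is not proof of a license violation on its own.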

Corporations probably won't learn the difference between "freeware" and "Free software" until it becomes apparent that not doing so will get them in legal troubles.

Re:Are we really making it better for us, or worse by Kjella · 2007-02-05 04:05 · Score: 2, Informative

I've talked to NO LESS THAN a dozen commercial companies in the last 2-3 years where they're actively taking FOSS source and incorporating it into their products, because.. (and I quote) "..Its freeware, so we can use it however we wish." The licensing differences between "freeware" and "free software" seem to escape them.

A $150,000 lawsuit in RIAA style (multiply by the number of source files as separate 'works' if you like) and a cease-and-desist forcing them to stop shipping their product, along with federal criminal copyright charges, should be enough, don't you think? Not that I generally approve, but where are the ambulance-chasing lawyers when you need them? I think the "I'm sure it's all a big mistake and you just release the source and we're happy" enforcement of the GPL is part of the problem: there's no penalty for being caught, and companies in general have no ethics.

Also note that BSD code lets you incorporate it, and LGPL does too if you distribute changes to the library itself. If it's internal you're not distributing it, so there are no requirements at all. But you said "incorporating it into their products", so I assume you meant just that.

--
Live today, because you never know what tomorrow brings

Re:Too much time? by tppublic · 2007-02-05 04:34 · Score: 2, Informative

I wonder if writing it yourself is a time saver.
Like most questions, the appropriate answer is "it depends". Take an example: I just spent yesterday rewriting a single class to fit into a standardized library. After 20 minutes of coding, 1 hour of documenting, and 2 hours of writing tests, I actually have something that meets the library standards. Could I have used the original class? Sure. But it had problems and inconsistencies. The main problem is that most open source code goes through the coding, but never gets the documenting and aggressive testing, because it's too much overhead.
The undocumented bugs, or system assumptions, will lead to using code and then countless hours debugging problems you didn't expect.
This is so true. This even happens in the hardware world. A T1 connection seems like it's pretty simple. By modern networking standards, it's really slow. However, finding a T1/E1 LIU (Line Interface Unit) that actually works is surprisingly hard. An engineer familiar with the specification can break the vast majority of implementations in less than 5 minutes. I actually know of only one that works (not built by my company, I might add).
There is probably a point at which the system complexity of the reused code becomes great enough that the reuse is valuable. But how big and how mature the reusable codebase is affects this decision.
For big problems, the 'reusable code' simply becomes a framework that is built upon. This could be anything from subversion to packages like Joomla. Reuse of such large items certainly makes sense.
On a smaller scale, maturity is important, but a lot of the issue centers around the available documentation, test code, AND documentation of the test code (1500 lines of JUnit tests don't do that much good if you can't tell what each test is actually doing). There are very FEW libraries that are actually well written, well documented, and well tested. That's why so much code gets rewritten.

Re:I call bullshit on this by cant_get_a_good_nick · 2007-02-05 05:32 · Score: 3, Informative

There was a Joel On Software post a while back that explained this.

Most applicable quote:
If it's a core business function -- do it yourself, no matter what.