Finding New Code
tabandmountaindew writes "Too much time is wasted re-implementing code that someone else has already written, for the sole reason that it's faster than finding the other code. Previous source code search engines, such as Google Code Search and Krugle, only considered individual files on their own, which led to poor-quality results and made them useful only when the time needed to re-implement was extremely high. According to a recent NewsForge article, a fledgling source code search engine, All The Code, aims to change all of this. By looking at code not just on its own but also at how it is used, it is able to return more relevant results. This seems like just what we need to unify the open source community, leading to an actual common repository of unique code and ending the cycle of unnecessary reimplementation."
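The submission's central claim is that ranking improves when the engine considers how code is used, not just what a single file contains. A minimal sketch of that idea, assuming each candidate function comes with a count of call sites found elsewhere in the indexed corpus (the `Candidate` type, the scoring formula, and the sample names are hypothetical; this is not All The Code's actual algorithm):

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str        # hypothetical fully qualified function name
    text: str        # words from the function's source or signature
    call_sites: int  # how many other files in the corpus call it


def text_score(query: str, text: str) -> int:
    """Count how many query terms appear as words in the candidate text."""
    words = set(text.lower().split())
    return sum(1 for term in query.lower().split() if term in words)


def rank(query: str, candidates: list[Candidate]) -> list[Candidate]:
    """Order candidates by text match, boosted by usage elsewhere.

    log1p dampens the usage signal so popularity boosts the text match
    rather than swamping it; a candidate matching no query terms
    always scores zero, however often it is called.
    """
    def score(c: Candidate) -> float:
        return text_score(query, c.text) * (1.0 + math.log1p(c.call_sites))

    return sorted(candidates, key=score, reverse=True)
```

For two functions whose text matches "split string" equally well, the one called from 250 other places ranks ahead of an identical one nobody calls. That usage signal is exactly what per-file engines never see.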
I'm not a coder, but my impression of the vast majority of coders is that they reinvent the wheel because they believe everyone else screwed up their wheel implementation, and if no one else is going to do it right, they should.
1) "Java Only for now, more coming soon!"
2) "Alpha"
3) The linked article is a "product announcement" on NewsForge
This is a slashvertisement for a vaporware product. Although it looks promising, there is nothing concrete there to justify calling it "what we need to unify the open-source community", nor is it even an alternative to Google Code Search yet.
Btw, is alpha the new beta?
If we create this grand, uber code-searching portal, which can search the context of the code, aren't we making it easier for commercial entities to go ahead and pick and choose those bits of code to use in their products, knowing full well that they're going to violate the GPL (or other OSS licenses) by doing so?
I've talked to NO LESS THAN a dozen commercial companies in the last 2-3 years that are actively taking FOSS source and incorporating it into their products because (and I quote) "...it's freeware, so we can use it however we wish."
The licensing differences between "freeware" and "free software" seem to escape them. Just google around and you'll see thousands of FOSS projects listed on sites like TUCOWS, download.com and others, as "freeware" and not the proper "free software" that they are. There are also people who think "free software" means just that (lowercase "F" there).
If we're going to have a search engine that lets brainless developers look like experts by cutting and pasting bits of OSS code from here and there to make their software work, let's be sure they know what the license is and that they must comply with it to use the code.
Please?
Don't get me wrong: if you're developing a standalone project that won't be a dependency for someone else, then you absolutely want to rewrite as little code as possible. Let someone else maintain as much of your codebase as possible. But if you are writing something that other projects will use as a dependency, don't you dare make me download four other libraries just to run your code. Write your own dang StringUtils or, if you're lazy and your project is GPLed, just copy the code.
I just ran a search for "the 500,000 lines of code I need to finish by Friday: all the stupid extra features the PHB wanted after we had set a deadline based on the original spec".
0 results, rather disappointingly.
Do not try to read the dupe, that's impossible. Instead, only try to realize the truth.
What truth?
There is no dupe
I am an embedded engineer. Various firms I have worked for have tried to implement some kind of "reuse" code store, but every time, real-time considerations and platform specifics have derailed it (thankfully) in the early stages. At the low level, so-called "code reuse" is (IMHO) nothing short of a right royal pain in the neck. It looks good on paper, and managers like the concept, but it is impossible to implement without large amounts of hardware abstraction. Maybe it makes more sense further up the software tree, at a higher level where things are not so resource-critical.
We shouldn't ignore a good idea just because it makes it easier for someone to do something illegal. There are laws to protect the code. I think the benefits will outweigh any loss from commercial companies stealing the code. I hope this does work out as well as it looks like it could.
I think that to be really useful for not reinventing the wheel, it should allow intelligent searching by license. That is, it should let you restrict your search to code under certain licenses or, even better, to code under a license compatible with any given license (or set of licenses).
For example, if you are working on code which you want to release under the BSD license, it's not much help to find code licensed under the GPL, even if that code on its own is great. Likewise, if you are writing GPLed code, you are not interested in code under licenses incompatible with the GPL (e.g. the MPL, at least version 1.1).
Of course, the search engine cannot make a guarantee that the license will fit your needs, but then, it cannot guarantee that about the code's functionality either.
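The license-aware filter suggested above could start as a simple lookup of which licenses' code may be copied into a project shipped under a given license. A minimal sketch, using a deliberately simplified, hypothetical compatibility table built from the examples in the comment (BSD code can go into GPL projects but not the reverse; "BSD" means the 3-clause variant and "MPL-1.1" the pre-2.0 Mozilla license). Real compatibility also depends on license versions and on how the code is combined, so treat the table as illustrative:

```python
from typing import Iterable, NamedTuple

# Hypothetical, simplified table: for each license your project ships under,
# the licenses of third-party code you could copy into it.
ACCEPTS = {
    "BSD": {"BSD"},
    "LGPL": {"BSD", "LGPL"},
    "GPL": {"BSD", "LGPL", "GPL"},
    "MPL-1.1": {"BSD", "MPL-1.1"},
}


class Result(NamedTuple):
    path: str     # hypothetical file path in the search index
    license: str  # license detected for that file


def filter_by_license(results: Iterable[Result], targets: set[str]) -> list[Result]:
    """Keep results whose license every target license can accept.

    A result whose license is missing from the table never matches, and a
    target missing from the table accepts nothing. An empty target set
    applies no filter at all.
    """
    return [
        r for r in results
        if all(r.license in ACCEPTS.get(t, set()) for t in targets)
    ]
```

Called with `{"BSD"}`, only BSD-licensed hits survive; with `{"GPL"}`, both GPL and BSD hits survive, matching the BSD/GPL example above.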
While the article says that too much time is spent re-implementing existing code, I disagree that this is necessarily a Bad Thing (tm). Reinventing the wheel can often drive the evolution of code, as opposed to the stagnation that can set in when something remains static. Of course, people will say that this is GPL code and that anyone can modify it. That is true, but modification at that level seldom amounts to evolution per se, sometimes because the changes are specific to the application, and sometimes because you are trying to make code do something it simply wasn't designed for (I guess you could compare it to trying to run a web server on Windows 95).
There is probably a point at which the complexity of the reused code becomes great enough that reuse is worthwhile. But how big and how mature the reusable codebase is affects where that point lies.
I tried searching for more general, higher-level topics.
Searching for "Radiosity" gave me several pages of "Radix Sort".
For the inexperienced reader: these are not related at all. One is a general-purpose sorting algorithm from computer science; the other is a lighting algorithm used in computer graphics and in games like Quake.
So my guess is that it just does a fuzzy search to return more results. Getting more results that are not the ones you want won't help you one bit. Useless for me.
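One plausible way an engine ends up returning "Radix Sort" for "Radiosity" is overly loose fuzzy matching: both words start with "radi". The sketch below contrasts a hypothetical prefix-based matcher with whole-word matching. It illustrates the failure mode only and makes no claim about how the engine in question actually works:

```python
import re


def words(text: str) -> list[str]:
    """Lowercase alphanumeric words in a piece of text."""
    return re.findall(r"[a-z0-9]+", text.lower())


def exact_match(query: str, doc: str) -> bool:
    """True only if the query appears in the document as a whole word."""
    return query.lower() in words(doc)


def prefix_fuzzy_match(query: str, doc: str, prefix_len: int = 4) -> bool:
    """Loose match: any word sharing the query's first prefix_len letters.

    This hypothetical matcher trades precision for recall, which is how
    "radiosity" and "radix" end up in the same result list.
    """
    stem = query.lower()[:prefix_len]
    return any(w.startswith(stem) for w in words(doc))
```

Raising `prefix_len` or falling back to whole-word matching removes this kind of false positive, at the cost of missing legitimate variants of the query word.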
One thing that I find disappointing about all the code search engines is that they treat source files as regular text files, more or less.
None of them seem to make an effort at understanding the code syntax.
That's why, a few years ago, I wrote one for C/C++ code: http://csourcesearch.net/
I built it as an experiment, in my spare time and using only open source software, but I think being able to tell, syntactically, the difference between a comment, a function, a structure, etc. makes a big difference.
When Google launched their engine, I was disappointed they didn't take the extra time needed to make their parser/engine smart.
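The point about syntax awareness can be illustrated in a few lines. A minimal sketch, using Python's standard `ast` and `tokenize` modules on Python source rather than parsing C/C++ as csourcesearch.net does (the index layout and the `index_source`/`search` names are hypothetical): it records whether a word appears as a function name, a class name, or inside a comment, so a query can ask for one kind only.

```python
import ast
import io
import tokenize
from collections import defaultdict


def index_source(source: str) -> dict[str, set[str]]:
    """Map each lowercased word to the syntactic kinds it appears as.

    Kinds: "function" and "class" for definition names,
    "comment" for words inside # comments.
    """
    index: dict[str, set[str]] = defaultdict(set)

    # Definitions come from the syntax tree, so a name that only appears
    # in a comment is never mistaken for a real function or class.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            index[node.name.lower()].add("function")
        elif isinstance(node, ast.ClassDef):
            index[node.name.lower()].add("class")

    # Comments are not part of the AST, so read them from the token stream.
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            for word in tok.string.lstrip("#").lower().split():
                index[word].add("comment")

    return dict(index)


def search(index: dict[str, set[str]], term: str, kind: str) -> bool:
    """True if the term occurs in the indexed source as the given kind."""
    return kind in index.get(term.lower(), set())
```

Given a comment reading "replace this with a real radix sort", `search(index, "radix", "comment")` is true while `search(index, "radix", "function")` is false, which is exactly the distinction a plain-text engine cannot make.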
Except that most people have jobs with deadlines. Besides, there's always more code to write. Even if I pull available code out of a repository, I'll still be continuously writing code. Furthermore, there's value in seeing how other people implement a solution because it is probably not exactly the way you would have implemented it and you just might learn something from their solution.
I've talked to NO LESS THAN a dozen commercial companies in the last 2-3 years where they're actively taking FOSS source and incorporating it into their products, because.. (and I quote) "..Its freeware, so we can use it however we wish." The licensing differences between "freeware" and "free software" seem to escape them.
A $150,000 RIAA-style lawsuit (multiplied by the number of source files as separate 'works', if you like), a cease-and-desist forcing them to stop shipping their product, and federal criminal copyright charges should be enough, don't you think? Not that I generally approve, but where are the ambulance-chasing lawyers when you need them? I think the "I'm sure it's all a big mistake; just release the source and we're happy" style of GPL enforcement is part of the problem: there's no penalty for being caught, and companies in general have no ethics.
Also note that BSD code lets you incorporate it, and so does LGPL code if you distribute changes to the library itself. If it's for internal use, you're not distributing it, so there are no requirements at all. But you said "incorporating it into their products", so I assume you meant just that.
Is the code easy to find? Will a quick search of sensible key words take me to a short list of results with high accuracy? No point in spending an hour wading through results that may or may not be useful when I can implement and test it myself in 2 hours.
Is the license clear? I may eventually want to release something I write as open source or use it commercially. If I include someone else's code or library, I have to make a note (hopefully in the LICENSE file provided with the code or at the top of the source comments) of what the license is. Is it BSD, GPL, public domain, not stated, or some commercial license that lets me look at the code but not use it myself?
Is the code self-contained? This generally means: does it come as a library? I don't like copying and pasting code into my code, especially if it doesn't follow the same coding practices as my own. (This comes back to licenses above: if it's self-contained and under an incompatible license, at least I can rip it out and replace it later if I need to.)
Is the code well known? Is it the de facto standard for doing this type of thing (STL, Perl core, glibc)? Or is it one of several well-known options for the same thing (GTK, Qt, KDE)? Or is it an unknown? This tells you how well field-tested the code already is. I don't like signing up to be someone else's beta tester for free!
Is the code still maintained? Is there an active company behind the project, or a group on SourceForge? Are the developers still around and the forums active? If I need a new feature further down the line, is there a chance of support? If I can avoid it, I don't want to pick up the dead weight of supporting unsupported code that I didn't write.
Can I use it as is? It frequently takes longer to modify an "almost there" module to do what you need than it would have taken to reinvent the wheel, as it were, and write it yourself the first time. Writing it yourself will at least make future debugging easier, assuming you have a good memory for design and good coding practices.
Is there documentation? As the old comparison between documentation and sex goes: when it's good, it's very, very good, and when it's bad, it's better than nothing. I don't want to have to read someone else's uncommented, undocumented code just to evaluate whether it might work for me. I want to be able to read a good overview of the library: its functions, methods, attributes, errors and exceptions. CPAN is (in most cases) an excellent example of what I mean.
That's a pretty hard list of requirements to meet. True, it shouldn't be, but this is the real world. If those requirements are not met, then odds are it will be less effort in the long run, and more reward, for me to implement it myself.
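The checklist above amounts to a go/no-go decision, and the first question hides a small piece of arithmetic. A minimal sketch, assuming each remaining question is answered yes/no for a found candidate, and assuming a hypothetical estimate `p_usable` of how likely a search is to turn up something that passes. It ignores integration and maintenance time, and the all-must-pass rule is the strictest reading of the list, not the only one:

```python
from dataclasses import dataclass, fields


@dataclass
class Checklist:
    """Yes/no answers to the questions above for one candidate library."""
    clear_license: bool
    self_contained: bool
    well_known: bool
    maintained: bool
    usable_as_is: bool
    documented: bool


def adopt(candidate: Checklist) -> bool:
    """Strictest reading of the list: every answer must be yes."""
    return all(getattr(candidate, f.name) for f in fields(candidate))


def expected_search_cost(search_hours: float, rewrite_hours: float,
                         p_usable: float) -> float:
    """Hours spent if you search first and write it yourself only when
    the search fails to find something that passes adopt()."""
    return search_hours + (1.0 - p_usable) * rewrite_hours


def worth_searching(search_hours: float, rewrite_hours: float,
                    p_usable: float) -> bool:
    """Search only if it is expected to beat writing the code yourself."""
    return expected_search_cost(search_hours, rewrite_hours, p_usable) < rewrite_hours
```

With the numbers from the first question (an hour of wading through results versus two hours to implement and test), searching pays off only when the odds of finding usable code are better than even: the condition reduces to `p_usable > search_hours / rewrite_hours`, here 0.5.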
While I certainly would welcome anything that could help me find code, the reason I'd want it is to find reference code, not reusable code. I've been programming for, oh, two decades now and one thing I find myself doing constantly is finding a bunch of libraries or bits of code and coming to the conclusion that I should just write it myself because of one of the following:
:)
1. The library/code is good, but doesn't quite work the way I want it to
2. The library/code is close, but getting it to work the way I want is painful
3. The library/code is bad
4. The documentation is bad/nonexistent
5. The license is prohibitive or annoying (i.e. it's not LGPL or BSD or the like)
6. I enjoy writing code and sometimes I feel I could do it more elegantly, or efficiently (I might just want a very specific and optimized part of it)
More often than not, though, I just enjoy coding, and I love learning by writing new code. The black box thing... eh... I like to tinker under the hood and find out how things work.
But my point is that finding code is not that hard. What's hard is finding code that fits *exactly* what we want. Code is usually just not as modular as we'd like to believe and, if we're honest, as programmers we have a certain vanity about writing code so it does things My Way.