Finding New Code
tabandmountaindew writes "Too much time is wasted re-implementing code that someone else has already written, for the sole reason that it's faster than finding the other code. Previous source code search engines, such as Google codesearch and Krugle, only considered individual files on their own, leading to poor quality results, making them only useful when the amount of time to re-implement was extremely high. According to a recent NewsForge article, a fledgling source-code search engine, All The Code, is aiming to change all of this. By looking at code not just on its own, but also at how it is used, it is able to return more relevant results. This seems like just what we need to unify the open-source community, leading to an actual common repository of unique code, and ending the cycle of unnecessary reimplementing."
I wonder how fast we will see other types of code.
-- if you mod me down, I will become more powerful than you can possibly imagine
I'm not a coder, but my impression of the vast majority of coders is that they reinvent the wheel because they believe that everyone screwed up their wheel implementation and if no one is going to do it right, they should.
1) "Java Only for now, more coming soon!"
2) "Alpha"
3) The linked article is a "product announcement" on Newsforge
This is slashvertisement for a vaporware product. Although this is promising, there is nothing concrete there to call it "what we need to unify the open-source community", not even an alternative to Google codesearch.
Btw, is alpha the new beta?
If we create this grand, uber code-searching portal, which can search the context of the code, aren't we making it easier for commercial entities to go ahead and pick and choose those bits of code to use in their products, knowing full well that they're going to violate the GPL (or other OSS licensing models) by doing so?
I've talked to NO LESS THAN a dozen commercial companies in the last 2-3 years where they're actively taking FOSS source and incorporating it into their products, because.. (and I quote) "..Its freeware, so we can use it however we wish."
The licensing differences between "freeware" and "free software" seem to escape them. Just google around and you'll see thousands of FOSS projects listed on sites like TUCOWS, download.com and others, as "freeware" and not the proper "free software" that they are. There are also people who think "free software" means just that (lowercase "F" there).
Let's be sure that if we have a search engine that lets brainless developers look like experts by cutting and pasting bits of OSS code from here and there together to make their software work, they know what the license is and that they must be in compliance with it to use the code.
Please?
Don't get me wrong, if you're developing a standalone project that won't be a dependency for someone else, then you absolutely want to rewrite as little code as possible. Let someone else maintain as much of your codebase as possible. But if you are writing something that other projects will be using as a dependency, don't you dare make me download four other libraries just to run your code. Write your own dang StringUtils or, if you're lazy and your project is GPLed, just copy the code.
real programmers use CPAN
I just ran a search for "the 500,000 lines of code I need to finish by friday all the stupid extra features the PHB wanted after we had set a deadline based on the original spec".
0 results, rather disappointingly.
Do not try to read the dupe, that's impossible. Instead, only try to realize the truth
What truth?
There is no dupe
So what if a particular solution has been implemented before? Continuously writing code should keep people on their feet, and ready for when they have to respond to a unique situation, right?
Maybe they should store annotated ASTs instead of raw code.
Now we'll see the One True Wheel (tm) which will fit on all cars.
Screw different tires for different conditions, screw differing performance.
The One Wheel will rule us all.
I am an embedded engineer. Various firms I have worked for have tried to implement some kind of "reuse" code store. But every time, real-time considerations and platform specifics have derailed it (thankfully) in the early stages. At the low level, so-called "code reuse" is (IMHO) nothing short of a right royal pain in the neck. It looks good on paper, and managers like the concept, but it is impossible to implement without large amounts of hardware abstraction. Maybe it makes more sense further up the SW tree, at a higher level where things are not so resource critical.
The reason people often roll their own "trivial bits" of software is not so they get the best quality. It is so the developers have full comprehension of the code and that it follows the guidelines they use for their projects.
There are enough faulty assumptions about code you developed internally. Why should one keep trying to cram the square peg into the round hole? Sure, you can shave it down, refactor the block all you want. But it may well prove to be faster and better to craft the piece you need.
Diversity does not hurt in the long term. If everyone just followed the status quo, and reused the same blocks of code, how boring would that be?
I only look human.
My mother is a halfling and my dad is an ogre, so that makes me an Ogreling
We shouldn't ignore a good idea just because it makes it easier for someone to do something illegal. There are laws to protect the code. I think the benefits will outweigh any loss from commercial companies stealing the code. I hope this does work out as well as it looks like it could.
http://bgcommonsense.blogspot.com
I think in order to be really useful for not reinventing the wheel, it should allow intelligent searching for licensing. That is, it should allow to restrict your search to codes with certain licenses, or even better, to code under a license compatible with any given license (or set of licenses).
For example, if you are working on code which you want to release as BSD, it's not much help if you find code licensed under the GPL, even if that code on its own is great. Likely, if you are writing GPLed code, you are not interested in code under licenses incompatible with the GPL (like e.g. the MPL).
Of course, the search engine cannot make a guarantee that the license will fit your needs, but then, it cannot guarantee that about the code's functionality either.
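A rough sketch of what such a license-aware filter could look like. The compatibility table, the result records and the function name are all made up for illustration, the table is deliberately oversimplified (real compatibility depends on license versions and details), and none of it describes how any existing search engine works.

```python
# Hypothetical, simplified table: for the license you plan to release under,
# the set of licenses whose code you could plausibly pull in.
# This is an illustration, not legal advice.
COMPATIBLE_INTO = {
    "BSD": {"BSD", "public-domain"},
    "GPL": {"GPL", "LGPL", "BSD", "public-domain"},
}


def filter_by_license(results, target_license):
    """Keep only results whose license may be combined into target_license."""
    allowed = COMPATIBLE_INTO.get(target_license, set())
    return [r for r in results if r["license"] in allowed]


# Made-up search results for the example.
results = [
    {"name": "fastsort", "license": "GPL"},
    {"name": "tinyjson", "license": "BSD"},
    {"name": "mozwidget", "license": "MPL"},
]

if __name__ == "__main__":
    for target in ("BSD", "GPL"):
        names = [r["name"] for r in filter_by_license(results, target)]
        print(target, "->", names)
```

An unknown target license returns nothing rather than everything, which seems the safer default for this kind of filter.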
The Tao of math: The numbers you can count are not the real numbers.
While the article mentions that too much time is spent re-implementing code, I disagree that this is necessarily a bad thing (tm). Re-inventing the wheel can often cause evolution of code, as opposed to the stagnation that can occur if something remains static. Now, of course people will say that this is GPL code, and people can then modify it. This is of course true, but modification on that level seldom equates to evolution per se, sometimes because the changes are specific to the application, sometimes because you are trying to do something with code that it simply wasn't designed for (I guess you could equate it to trying to run a web server from Windows 95).
I'm curious about these companies because in my experience, companies take licenses very seriously. Were these small companies or large companies? Were they clueless idiots in general, or just on licenses? Was this done out of malice or lack of understanding? Did you press the issue with them? If so, what was their reaction?
I would love to hear more.
There is probably a point at which the system complexity of the reused code becomes great enough that the reuse is valuable. But how big, and how mature, the reusable codebase is affects this decision.
www.jmagar.com
That's one of the things I like about image-based languages. Code snippets can easily be downloaded and integrated on a running system.
I tried searching for more general level stuff.
"Radiosity"
gave me several pages of "Radix Sort".
For the inexperienced reader: these are not related at all. One is a general sorting algorithm in computer science and the other is a lighting algorithm used in computer graphics, and in games like Quake.
So my guess is that it just does a fuzzy search to yield more results. Getting more results that are not the ones you want won't help you one bit. Useless for me.
...GCC is. A Compiler and a complete set of libraries for most mainstream languages, along with the source. You have everything you need to write your application.
No need to reinvent basic_string or itoa()
But of course this is little help to those in my profession who insist upon wrapping up (obfuscating) everything they do in layers of 'abstraction'.
Most of their stuff is legacy code before it is even released. Me, I stick to the KISS philosophy. That way, when the FNG comes along he can understand what my code is doing, and fix bugs with confidence he isn't going to break something else. But hey, that's just cynical me, YMMV.
This system would be a great feature of SourceForge: finding all the common components in different projects to be factored out and shared instead. I was always disappointed in SourceForge's lack of intelligence about the related contents of its different projects. This thing could find relevant code and import it into the integrated navigation.
--
make install -not war
BSD rules !
Oh really? Which part of the market does it rule again? None, you say?
If these companies release public binaries, and you feel what they are doing is morally wrong, you could consider anonymously blowing the whistle on them to people who are capable of analyzing the binaries and who can take it from there legally speaking. FSF or some similar organization would seem to be a prime candidate. Some people seem to be very skilled at recognizing open-source code even in binaries - mostly, I believe, by looking for embedded strings and similar - but they probably would be more effective if they could look for specific libraries in specific binaries instead of searching at random.
Corporations probably won't learn the difference between "freeware" and "Free software" until it becomes apparent that not doing so will get them in legal troubles.
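To illustrate the "embedded strings" approach mentioned above, here is a minimal sketch in the spirit of the Unix `strings` tool: pull runs of printable ASCII out of a binary blob and check them against a list of marker strings. The library names, the sample blob and the function names are hypothetical, and real analysis would need far more than substring matching.

```python
import re


def printable_strings(data: bytes, min_len: int = 4):
    """Return runs of printable ASCII of at least min_len characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def find_markers(data: bytes, markers):
    """Report which marker strings appear inside the extracted strings."""
    found = printable_strings(data)
    return sorted(m for m in markers if any(m in s for s in found))


# Made-up binary content: a header, one embedded notice, some noise.
blob = (
    b"\x7fELF\x00\x01"
    b"libexample 1.0 Copyright Example Authors\x00"
    b"\x02ab\x00main\x00"
)

if __name__ == "__main__":
    print(printable_strings(blob))
    print(find_markers(blob, ["libexample", "libother"]))
```

Knowing which specific libraries to look for is what makes this cheap: the marker list stays short and the scan is a single pass over the file.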
I'm convinced the better solution is better languages. The likes of Python, Ruby and Boo (no Lisp-related debates please) add features that make code denser and faster to write. Better tools and, to some extent, GUI widget components mean there is less rework.
The big gain is not having to search for what has been done already, learn it, tailor it.
If I download open source code and combine it with my software, I have to agree to some conditions which I might understand and which the "author" might not even be aware of (maybe he cut and pasted from some GPL'd code). I risk having the FSF or some company come after me with hotshot lawyers like Eben Moglen or David Boies, after I've distributed my code.
If I write the code myself, it takes longer, but it's likely to be better - faster - cheaper - more relevant to the task and tools at hand. And it's unencumbered (well, apart from the patent nuisance which is orthogonal to this subject).
We need a catchy phrase to describe "unencumbered software", which is what everyone thinks of when they first hear the phrase "free software" until Richard Stallman explains it to them a half dozen times.
"I've talked to NO LESS THAN a dozen commercial companies in the last 2-3 years where they're actively taking FOSS source and incorporating it into their products, because.. (and I quote) "..Its freeware, so we can use it however we wish.""
*shrug*
Works for me, now if you'll excuse me I have some MPAA/RIAA/Usenet to illegally download.
'binary diff', 'levenshtein distance' -- no hits.
'morphological analyser' -- 1 hit (inappropriate)
It's completely useless. Am I jumping to conclusions? Mm... no, I don't think so, it really is utterly useless.
Whence? Hence. Whither? Thither.
One thing that I find disappointing with all the code search engines is that they all treat source files as regular text files, more or less.
None of them seem to make an effort at understanding the code syntax.
That's why a few years ago I wrote one for C/C++ code called http://csourcesearch.net/
I just did it as an experiment, using all open source software and in my spare time, but I think having the ability to syntactically know the difference between a comment, a function, a structure, etc. makes a big difference.
When Google launched their engine, I was disappointed they didn't take the extra time needed to make their parser/engine smart.
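For a feel of what "syntactically knowing the difference" can mean, here is a small sketch. It is not how csourcesearch.net works (that targets C/C++); it swaps in Python's standard `ast` and `tokenize` modules on Python source, and the function name `classify_hits` and the sample snippet are invented for the example.

```python
import ast
import io
import tokenize


def classify_hits(source: str, term: str):
    """Return the kinds of places where term occurs: function, class, comment."""
    hits = set()
    # Definitions come from the parsed syntax tree.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if term in node.name:
                hits.add("function")
        elif isinstance(node, ast.ClassDef) and term in node.name:
            hits.add("class")
    # Comments are dropped by the parser, so read them from the token stream.
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT and term in tok.string:
            hits.add("comment")
    return hits


sample = (
    "# sort helper\n"
    "def sort_items(xs):\n"
    "    return sorted(xs)\n"
    "\n"
    "class Parser:\n"
    "    pass  # no sorting here\n"
)

if __name__ == "__main__":
    print(classify_hits(sample, "sort"))
    print(classify_hits(sample, "Parser"))
```

A plain text search would treat the comment and the function definition as the same kind of hit; with the kinds separated, a search engine can rank or filter on them.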
Just because you find code doesn't mean that it works as advertised, doesn't have memory leaks, buffer overruns and so on.
Code also isn't tagged with environmental issues: how much memory is used, STL or Boost requirements, what system functions are needed, library dependencies, etc.
Code is often layered on top of custom libraries. Sure, here's code to render HTML, but it needs a dozen custom data structure modules from the Netscape code base, for example.
What code search engines don't tell us is sometimes the code you get just sucks ... or is poorly adapted to your need.
... or ... homemade browser or ... whatever, just a function to help me do what I'm trying to do without reinventing the wheel, because I know it's been done already.
I usually search for a function to, say, validate a string for phone
But most of the time when you do find it, chances are it won't be adapted to your need, so much so that you end up writing your own based on what you've seen. Which explains why there are a gazillion versions of (for example) JavaScript calendars on the web.
One thing that happens too is that the code just sucks: it's buggy, it's buggy because of your own rules, or sometimes plain buggy.
I wish there was a checkbox on the page to specify "bug free" or "not sucky".
But that's impossible. If something could be done, it would be to store only independent controls or functions, things that work on their own, so that I don't end up rewriting the whole thing to make it work in my code.
If you look like your passport photo, you're too ill to travel. - Will Kommen
So far, every search I have made, from names of people, companies, class names, etc. yielded results that didn't even contain the search term(s).
Fuzzy searching is one thing, but at least TRY to get an exact match and let me know when you're just taking the first 2 characters of a search term.
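The behaviour asked for here (exact match first, fuzzy only as a labelled fallback) is easy to sketch. This uses `difflib.get_close_matches` from the Python standard library; the `search` function, the `mode` field and the tiny index are assumptions made up for the example, not anything the site actually does.

```python
import difflib


def search(term, index):
    """Try a case-insensitive substring match; fall back to fuzzy matching.

    The returned "mode" tells the caller which kind of match was used,
    so a UI can say "no exact matches, showing similar names instead".
    """
    exact = [name for name in index if term.lower() in name.lower()]
    if exact:
        return {"mode": "exact", "results": exact}
    fuzzy = difflib.get_close_matches(term, index, n=3, cutoff=0.6)
    return {"mode": "fuzzy", "results": fuzzy}


# Hypothetical index of class names.
index = ["RadixSort", "RadiosityRenderer", "QuickSort"]

if __name__ == "__main__":
    print(search("radiosity", index))   # an exact (substring) hit exists
    print(search("Radiocity", index))   # misspelled: falls back to fuzzy
```

The point is only the ordering and the label: fuzzy results are never mixed into an exact result list, and the caller always knows which one it got.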
Two points.
First, in some cases taking a chunk of somebody else's code is a great time saver. It can be problematic, however, when the code doesn't do what you want and then you have to jump into it. Is there anything more painful than trying to understand somebody else's code? On rare occasions it's easy - good comments, good style, good structure. In most cases it's a pain in the ass, even if they've taken the effort to write good comments and nice code. In these cases one has to make the call on whether it would be easier to simply write their own code, using the 3rd party code as reference. Sometimes this is beneficial, especially if you're planning to maintain the code for a long time. In other cases it turns out to be a big waste of time. Magic eight ball, don't fail me!
Second, in the case of more "complete" code that you'll be linking to...this can lead to distribution hell. This is especially difficult for users of FOSS (and especially for non-Windows projects, simply because people are more apt to package an installer on Windows). Well, it's not difficult for the developer, it's difficult for the *user* when the developer says "oh, here's a library that I'd like to use...sure it's in alpha, and not part of most standard distributions, and relies on eighteen other packages that aren't part of most standard distributions, but it'll sure save me some time, so if the user wants to use my software, they can suffer like I have to suffer."
Choices choices choices. If you're going to insert code into your project, then it's up to you to decide if that code does what it should, and if you want to maintain it. If you're going to link to code, it's important to research first if your users have easy access to the libraries.
That said, there are no downsides to new ways of seeking out code. Just for research and learning it's always going to be a boon.
/ducks *sorry*
I worked for a company that wrote its own distributed computing system (in Java/XML). It sucked awfully by all measures (latency, CPU load, memory requirements, bandwidth), but they would not dump it in favor of PVM or one of the MPI implementations because:
This is such a common problem, there is a term for it...
In Soviet Washington the swamp drains you.
Rather than use this as a product announcement, they should have quietly rolled it out. Gotten more than the Java repository going, then announced.
Rating: Meh
Politics is the art of looking for trouble, finding it everywhere, diagnosing it incorrectly and applying the wrong fix.
Is the code easy to find? Will a quick search of sensible key words take me to a short list of results with high accuracy? No point in spending an hour wading through results that may or may not be useful when I can implement and test it myself in 2 hours.
Is the license clear? I may eventually want to release as open source or commercially use something I write. If I include someone else's code/library I have to make a note (hopefully in the LICENSE file provided with the code or in the top of the code comments) on what the license is. Is it BSD, GPL, public domain, not stated or some commercial license that lets me look at the code but not use it myself?
Is the code self-contained? This generally means: does it come as a library? I don't like copying and pasting code into my code, especially if it doesn't follow the same coding practice as my own. (This comes back to licenses above: if it's self-contained and has an incompatible license, at least I can rip and replace later if I need to.)
Is the code well known? Is it the de facto standard for doing this type of thing (STL, Perl core, glibc)? Or is it one of several well known options for the same thing (gtk, qt, kde)? Or is it an unknown? This will help you know how well this code is field tested already. I don't like signing up to be someone else's beta tester for free!
Is the code still maintained? Is this an active company with a project? Or a group on SourceForge? Are the developers still around and the forums active? If I need a new feature further down the line, is there a chance of support? I don't usually want to pick up the whole dead weight of supporting unsupported code that I didn't write if I can avoid it.
Can I use it as is? It frequently takes longer to modify an "almost there" module to do what you need than it would have taken to reinvent the wheel, as it were, and write it yourself the first time. Writing it yourself will at least make future debugging easier, assuming you have a good memory for design and good coding practices.
Is there documentation? The old comparison about documentation and sex: when it's good it's very, very good, and when it's bad it's better than nothing. I don't want to have to read someone else's uncommented, undocumented code just to evaluate whether it might work for me. I want to be able to read a good overview of the library, its functions, methods, attributes, errors and exceptions. CPAN is an excellent (in most cases) example of what I mean.
That's a pretty hard list of requirements to meet. True, it shouldn't be, but this is the real world. If those requirements are not met, then odds are it will be less effort in the long run, for more reward, for me to implement it myself.
$_="Slashdotter";$syn="OTT";s;..;;;sub _{print shift||$_};s!ash!Perl !;s=$syn=ack=i;tr+LLEd+BLAH+;_"Just Another ";_
... sounds to me like a thoroughly bad idea.
There's a reason for things like specifications, documentation, source control, testing, etc.
Maybe you'd rather google for popular home remedies rather than consult a professional doctor?
I'd google for *algorithms* if I was at a loss, but I'd certainly want to implement them myself.
What is useful in terms of code reuse is more controlled, coherent collections of code that are highly tested and documented, such as the C++ Boost collection. But even there, while I'd use it for a hobby project, I'd want to thoroughly vet it for error handling etc. and run it through my own test suite before I'd consider using it professionally (assuming that it wasn't against company policy to do so).
There's also the whole issue of copyright and patents. Nowadays there's no legal need to add a copyright notice to code; it's automatically copyright protected by default, so you could only reuse code that explicitly grants a licence on terms that are suitable to your use (presumably including the separate right to modify as well as merely use). Patents are more tricky... At least if you unknowingly reimplement some patented algorithm yourself there's less liability than if you deliberately reused a patented algorithm, but grabbing stuff off the internet would seem to blow away any "clean room" type defense. Not that I support software patents, but unfortunately, in the US at least, they are a reality.
Other than popularity ranking in the results, there is no karma system/peer review ranking associated with the snippets. Wouldn't that be also useful? Binaries at download places get a ranking system (4 cows for example, etc)
More languages Summer of 2007?
You announce a product in Feb 2007, and expect people to remember to "check back with you" 6 months later? Let me know how that works for you. This crowd will have forgotten you by then because someone else will have done it properly.
Languages to get up RIGHT NOW and screw "Summer of 2007":
The problem is, this outfit is already trying to be in the list of also-rans.
To spark submission, get backing, hold some competitions and rewards for code submissions. Winners are based upon level of abstraction (for max reuse) and level of usefulness.
Consider a "Howto" on creating abstract code. Most of the submissions I see on other sites have wonderful examples, but it takes too long to abstract the code for use in another project. A lot of young and not-so-young developers spend so much time on heads-down-over-the-code projects that they don't have time to learn how to really do it. They know the theories, but they are always too busy on the "current" project and will "refactor later when we have time to do the 2.0".
1st of all it's a dupe. Approx. a year old I suspect.
2nd: Frameworks, standardised open source application stacks and comprehensive documentation are what speed up coding and development. Copy/pasting foreign code rarely helps.
3rd: It's only for Java.
We suffer more in our imagination than in reality. - Seneca
It would help if developers would stop pulling logging frameworks and other superfluous crud into libraries. It's like the woman who swallowed the fly: developers are looking to solve a specific problem, not to take on your entire friggin dependency tree.
Licensing is a separate issue, code that does one thing and does it well is widely reused (eg: libdl, libm, libz...).
I find it somewhat ironic that there are Google Adwords Ads all over this "new open source coded" search engine... I mean don't they aim to take Google down?
While I certainly would welcome anything that could help me find code, the reason I'd want it is to find reference code, not reusable code. I've been programming for, oh, two decades now and one thing I find myself doing constantly is finding a bunch of libraries or bits of code and coming to the conclusion that I should just write it myself because of one of the following:
:)
1. The library/code is good, but doesn't quite work the way I want it to
2. The library/code is close, but getting it to work the way I want is painful
3. The library/code is bad
4. The documentation is bad/nonexistent
5. The license is prohibitive or annoying (i.e. it's not LGPL or BSD or the like)
6. I enjoy writing code and sometimes I feel I could do it more elegantly, or efficiently (I might just want a very specific and optimized part of it)
More often than not though I just enjoy coding and I love learning to code by writing new code. The black box thing... eh... I like to tinker under the hood and find out how things work.
But my point is that finding code is not that hard. It's finding code that fits *exactly* what we want. Code is usually just not quite as modular as we'd like to believe and, if we're honest, as programmers we have a certain vanity about writing code so it does things My Way.
So you're out of your domain.
It was UNIX itself that came up with the concept of "Software Tools".
It's the goal of any good programmer to make simple tools that one can reuse over and over. But few can actually do it, and do it well.
Consider that many people in FOSS mostly code for fun. If this gizmo is always going to find the exact piece of code they need.... then it is taking the fun out of coding! This will kill FOSS! And it is aimed at damaging Google as well! So it must be some Microsoft hideous machinery!! (You know how I know this? It does not work!!)
This is the same thing, except this one is developed by students. http://sourcerer.ics.uci.edu/ It crawls SourceForge and parses the code, the creator, etc. It's in beta stages of course, but it has great potential.
Too much time is wasted re-implementing code that someone else has already done, for the sole reason it's faster than finding the other code.
I thought that if you take the *faster* route you are *saving*, not wasting time...
For those who have not yet used Krugle, check out a posting by Justin Royce a while back - his review is fairly complete. (http://webtekconcepts.com/2006/11/30/krugle-goes-grassroots/)
Search engines ultimately are judged on the accuracy and relevancy of their results. Krugle parses the code to understand the context. The relevancy is based on whether the result is in a function call or class def vs comments etc, and Krugle leverages project meta data such as number of committers and frequency of project updates to accurately prioritize search results.
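To make that kind of ranking concrete, here is a toy scoring sketch. It is not Krugle's actual algorithm: the weights, the field names (`location`, `committers`, `commits_per_month`) and the formula are all assumptions chosen only to show how match location and project activity could be combined.

```python
import math

# Hypothetical weights: a hit in a definition counts more than one in a comment.
LOCATION_WEIGHT = {"definition": 3.0, "call": 2.0, "comment": 1.0}


def score(hit):
    """Combine where the term matched with how active the project is."""
    activity = math.log1p(hit["committers"]) + math.log1p(hit["commits_per_month"])
    return LOCATION_WEIGHT[hit["location"]] * (1.0 + activity)


def rank(hits):
    """Order hits from most to least relevant under the toy score."""
    return sorted(hits, key=score, reverse=True)


# Made-up hits for the same search term.
hits = [
    {"project": "dead-lib", "location": "definition", "committers": 1, "commits_per_month": 0},
    {"project": "chatty", "location": "comment", "committers": 12, "commits_per_month": 40},
    {"project": "active-lib", "location": "definition", "committers": 12, "commits_per_month": 40},
]

if __name__ == "__main__":
    for h in rank(hits):
        print(h["project"], round(score(h), 2))
```

The logarithm keeps a very large project from drowning out everything else; whether that is the right trade-off is exactly the kind of thing that would need tuning against real queries.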
And documentation can be searched quite nicely on Google. What's the running time of this algorithm? What are the limits on the inputs? How to change traversal of the data structure to different order? Without a clear answer, maintaining the code may well be more difficult than writing it from scratch.
Java only? Please. Lack of broader language support makes this next to worthless. They're promising more languages in summer of 2007, but who knows whether they'll actually deliver. Krugle and GCS both support two dozen plus languages already, including C/C++ and all the popular scripting languages (Perl, PHP, Python, Ruby, etc.). Wake me in six months when ATC has made good on its promise to catch up.
Here are some of the top reasons why I end up rejecting use of open source code:
1. No description on the web site of features & system requirements.
2. No documentation for how to install or use it.
3. No stable release and/or still alpha.
4. Written in PHP.
Finding the code is the least of the problems.
GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak
After a bit of hacking around on a functional area, they have a rough idea what that part of the program needs to do. This is the point where things go awry. The sensible next step is to draw up a rough list of requirements, find and evaluate the existing libraries or frameworks, and choose one to replace the current code and fill future needs. Unfortunately, there are strong irrational factors that push against this.
And the ultimate reason:
People are proud of their code and proud of their unique style. They have a visceral aversion to replacing their unique conception of the problem with a standard approach. (It would be a shame if everyone threw away their unique approach; after all, we always need diversity and new approaches. But there's a difference between making a serious attempt to create something new and inventing excuses to cling to every bit of one's creative output.)
As a result of one or more of these pressures, the experimental implementation often hangs around, keeps growing, starts to look very much like previously existing libraries (as developers borrow ideas and come to convergent conclusions), and imposes high support overhead. Learning to use an existing, mature implementation of the functionality you want (or similar functionality) should be a prerequisite for making a serious attempt at creating a full-fledged alternative.
He is just doing this as a joke to get on Slashdot. All it does is do an approximate substring search of the source text. He doesn't even parse it. Wow, it must be a slow news day today.
Because a student is supposed to be learning how to do something, and possibly proving what they have learned to a 3rd party. Someone writing code (in other situations) is usually doing it to accomplish a specific task, minimising the time/money spent on doing that. If they can't find some existing code, they should be able to implement it themselves. If all they've ever done is copy code when they were a student, they are unlikely to have this ability.
Next question.
CPAN.
This is the #1 reason I work heavily in Perl. If I need to do something, 95% of the time there is a module on CPAN that will do the hard parts for me so I can just focus on getting what I need done.
I would love something like CPAN for Java. I prefer its syntax and do a decent amount of webapps (w/ Echo2) with it. It's just that when I work in Java it takes me a while, since I end up writing so much by myself.
"The Federal Reserve is a fraudulent system."--Lew Rockwell
End The FED. -
Obviously, you are not a coder. There are some coders who will reimplement everything. That is the classic NIH syndrome (back in the late 80's, early 90's, HP had it real bad).
....
Others will ask will it work for me? If it will not, then they will redo one (interestingly, many of them will "borrow" design and code from L?GPL version. I have seen it in 3 places now).
If it will work AND is not the core slowdown, then they will use.
If it works, but it is too slow due to being too general, then they will redevelop. And for a lot of core code to an application, that is the case. GTK/Gnome and KDE/qt have redeveloped a lot of code. I do not always agree with their choices, but
Overall, most coders do not redo.
I prefer the "u" in honour as it seems to be missing these days.
The .NET frameworks were distributed to the developers of the land, but all of them were deceived. Deep in his fortress in Redmond, Gates developed his own framework. A master framework. One framework to rule them all. And soon, all of the developers were bound to Gates and forced to serve his bidding.
.... reasons.
Source is written in various languages, modifying existing code to fit a new project can be more prone to creating bugs than doing it fresh, code might be useful in a way it was not originally thought of being used (creating another aspect of search...), there may be licensing issues, etc.
Ultimately what we need is a higher level abstraction machine BUT disconnected from actual code. A level of abstraction that can be used to define the objective and constraints required and apply it to a code fragment/algorithm base and designed to go through a cycling of refinement regarding specifics. Once done, code is then generated in the language specified, or best fit.
And in time refinement of the code fragment/algorithm would be debugged to improve overall quality of resulting code.
Is this too far in the future to be thinking about?
No! The base line abstraction mechanics is already identified and defined and even mostly coded.
Virtual Interaction Configuration
Also see: Abstraction Physics
Regardless of his motives, he analysed usage logs of his system for the period in which it was alpha, and from the search patterns it is clear that multiple people found it quite useful: they found the sort of code they were looking for. Not only that, but the search algorithm was optimised based on these trial usages.
I do not. Most of the code is just not of that quality: it uses the wrong dependencies, is not generic enough, is not written in the language I like, etc., etc. It's one thing to have some quicksort or eigenvalue algorithm written right ONCE. It is quite different with the 99% of everything else.
First one to compile and run an Eliza program purely by (semantic) googling of blogs and sections of code, wins.
fortune -o
Ignoring the slashvertisement, I think the real reason people tend not to reuse code is because any code they find will be either (1) broken, or (2) not up to the specific task and also broken. With rare exceptions, all code is broken to some degree, including yours (and including mine). Newer code tends to be slightly less broken than older code, as more people find new ways to break things; but however much some CS professors like to go on about OOP or whatever the latest fad is, the art of software engineering simply hasn't advanced to the point where people can reliably build non-broken software, in the sense that civil engineers build non-broken bridges or architects build non-collapsing buildings. Until we find the silver bullet that reduces software engineering to a reliably solvable problem, there's really not much we can do.
What, there's no silver bullet? Well, then I guess we're just SOL(*), aren't we?
(*) SOL: Secure in Our Ljobs (the L is silent)
How is this different from existing code search engines like http://www.koders.com/?
What is the USP of this new code search engine?
Regards,
Mahesh
Most of the time when I'm looking for source code, it's just to get some examples of how other people have dealt with a particular problem, or when figuring out how to use a new API. Only rarely have I been able to just drop in someone else's code and have it work as-is. There are usually dependencies/conflicts to work out, or incompatible parameters (arrays vs delimited buffer, etc). The sample code may also be tied to a sample GUI, which would need to be merged with my own. Or hacked severely to work as a console/server app. Many times, the time spent understanding the code well enough to modify it would have been better spent re-inventing the wheel.
This isn't to say that re-use is always bad. I've used things like pkzip and zlib libraries and saved a _lot_ of time (and some of my now-scarce sanity).
On the other hand, some libraries pack so many features (e.g. Stingray) that implementation can be rather involved. Nothing against Stingray -- they make some very difficult stuff fairly easy to do. But for the first project I used it on, we were only using a small subset of the capabilities, and the overhead of learning the API, adapting to their updates/upgrades and rebuilding/distributing the library modules outweighed the advantages. Especially over the life of the project, which evolved from Visual Studio 6 on Win98 to VS.Net on XP. There's also been significant turnover in the development team over the years, adding to the learning curve.
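For what it's worth, the "arrays vs delimited buffer" kind of mismatch mentioned above can sometimes be bridged with a thin adapter instead of a rewrite. This is only a rough sketch: `legacy_sum` is a made-up stand-in for some found code that wants a comma-delimited string, and `adapt_list_to_buffer` is a hypothetical helper name, not anything from a real library.

```python
from typing import Callable, Iterable


def legacy_sum(buffer: str) -> int:
    """Hypothetical 'found' code: expects a comma-delimited buffer of integers."""
    return sum(int(field) for field in buffer.split(",") if field)


def adapt_list_to_buffer(
    func: Callable[[str], int], delimiter: str = ","
) -> Callable[[Iterable[int]], int]:
    """Wrap a buffer-based function so callers can pass a sequence instead."""

    def wrapper(values: Iterable[int]) -> int:
        # Join the values into the delimited form the found code expects.
        return func(delimiter.join(str(v) for v in values))

    return wrapper


# The rest of the project keeps working with lists.
sum_list = adapt_list_to_buffer(legacy_sum)
print(sum_list([1, 2, 3]))
```

Whether this beats re-inventing the wheel still depends on how much of the found code you have to understand before you can trust it.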
There are 270,000 words in the English language. Most speakers know only a fraction and use an even smaller fraction. When they need to express themselves, they reuse the closest word that easily comes to mind or they make up 'slang'. Why? It may be a complexity issue related to the capacity of human memory, or it might be an emotional issue, or, most likely, it is something that I don't even understand.
Coding is like that, but to a greater degree. Determining the 'code fit' may be harder than finding the right word, even if you have the library in your lap. It may take a machine to fix the coding problem, just like it took a book to catalogue all the words. Someday, if we can find the right words to describe the code we need, a machine will probably find it for us.