Undocumented Open Source Code On the Rise

Posted by Soulskill on Sunday June 15, 2008 @04:40AM from the exercise-for-the-reader dept.

ruphus13 writes "According to security company Palamida, the use of open source code is growing rapidly within businesses. However, the lack of documentation and understanding of how the code works can increase the vulnerability and security risks the companies face. OStatic quotes Theresa Bui-Friday saying, 'In 2007, Palamida's Services team audited between 300M to 500M lines of code for F500 to venture-backed companies, across multiple industries. Of the code we reviewed, Palamida found that applications written within the last five years contain 50% or more open source code, by a line of code count. Of that 50% of open source code, 70% was undocumented. This is up from 30% in 2006.' How can businesses protect themselves and still draw on open source code effectively?"

Re:Source code is its own documentation by mikael_j · 2008-06-15 04:56 · Score: 1, Informative

I disagree. I tried changing some stuff in the rTorrent source code and noticed that sometimes the only comments/documentation to be found was the GPL notice at the beginning of each file. I never did manage to make the changes I wanted (but I got kind of half-way there at least).

/Mikael

--
Greylisting is to SMTP as NAT is to IPv4
Re:Not just for security by Otter · 2008-06-15 05:05 · Score: 4, Informative

"Documented" in this story means that the company's developers have documented what the hell is going into their codebases (with respect to licenses, keeping things updated, and so forth). It has nothing to do with either user documentation or source code comments in the original open source project.
That said, the "70%, up from 30%" numbers are absurd. There is no way that the failure rate to document use of open source code more than doubled in 2007.

--
What I'm listening to now on Pandora...
Re:Avoid projects with one developer by jgrahn · 2008-06-15 05:34 · Score: 2, Informative

A basic problem with open source is that once you get beyond the top 50 or so projects, the quality is usually crap. Look at the source from a few random projects on SourceForge. There aren't that many real "community" projects, where multiple programmers are working on the same code. The long tail isn't very good.

You have a point, but s/the top 50/the top 1000 or so/. You have to count various C libraries, and things like the Perl modules at CPAN. Many of them are in wide use, and should be trustworthy.
Also, I'm not so sure that community projects are generally better than single-person projects -- if you don't count crap projects which only the author can love.
Re:70% Undocumented, huh? by Anonymous Coward · 2008-06-15 07:44 · Score: 1, Informative

If you have ten projects, and two use open source and the others don't, then if your records indicate one project uses open source, your records are 50% complete. If you don't record the use of open source at all, you are at 0%. Note: this has nothing to do with how well the *code* is documented; it is about how well the sources of the code are documented.
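To make the arithmetic in that example concrete, here is a minimal sketch. The function name `documented_rate` and the project names are made up for illustration; nothing here comes from Palamida's actual methodology.

```python
def documented_rate(actual_users, recorded_users):
    """Share of projects that really use open source whose use is on record.

    Hypothetical helper: both arguments are collections of project names.
    """
    actual = set(actual_users)
    if not actual:
        return None  # no open source use, so nothing to document
    return len(actual & set(recorded_users)) / len(actual)


# Ten projects in total, but only two of them use open source.
projects_using_open_source = {"project_a", "project_b"}

# Records mention one of the two.
print(documented_rate(projects_using_open_source, {"project_a"}))

# Records mention none of them.
print(documented_rate(projects_using_open_source, set()))
```

The eight projects that don't use open source never enter the calculation, which is why the rate says nothing about the size or quality of the code itself.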
Re:70% Undocumented, huh? by civilizedINTENSITY · 2008-06-15 07:54 · Score: 3, Informative

But it's not per line, but per application. If they used open source and documented "we used code from project whatever", that counts as one case of documented code.
Re:70% Undocumented, huh? by civilizedINTENSITY · 2008-06-15 07:55 · Score: 2, Informative

No, it means that in 100 projects that used open source code, 50 of the projects documented that they had code from a certain open source code base.
Re:I notice an omission by civilizedINTENSITY · 2008-06-15 07:57 · Score: 2, Informative

Actually, they never said anything about whether the open source code was well documented or not. They said the projects using open source didn't document that a particular open source code base was part of the project.
Re:Meh, I'll save y'all reading all of this by civilizedINTENSITY · 2008-06-15 08:01 · Score: 2, Informative

"hey, your hello world program uses library Y, which is 2 million lines that we don't think is documented properly," then the "application" does not *contain* 50% or more open source code, but rather *references* a certain amount of open source code, which is probably a meaningless statistic.
It's not that the 2 million lines of code are undocumented. They might be documented very well. It's that the project doesn't record the fact that code is used from Project OpenThingee. Thus, when OpenThingee finds a problem and patches it, no one knows that their codebase needs a patch, because no one knows it uses OpenThingee code. Therefore, it's not a meaningless stat.
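A rough sketch of why that record matters, assuming each project keeps a simple set of the third-party components it knows it contains. The manifest layout, the project names, `LibOther` and `projects_needing_patch` are all hypothetical; only OpenThingee comes from the comment above.

```python
def projects_needing_patch(manifests, patched_component):
    """Return the projects whose records list the patched component.

    `manifests` maps a project name to the set of components it has recorded.
    """
    return sorted(
        name
        for name, components in manifests.items()
        if patched_component in components
    )


manifests = {
    "billing": {"OpenThingee", "LibOther"},
    "frontend": {"LibOther"},
    # Assumption for the example: "reports" also embeds OpenThingee code,
    # but nobody wrote that down, so its record is empty.
    "reports": set(),
}

# OpenThingee ships a security fix; who gets told to update?
print(projects_needing_patch(manifests, "OpenThingee"))
```

Only `billing` turns up. `reports` stays vulnerable, not because OpenThingee is badly documented, but because its use was never recorded.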
Re:Source code is its own documentation by Xtifr · 2008-06-15 09:27 · Score: 4, Informative

I keep hearing people pro open source code say "I can check it!" Well can you? Have you done so - in a project spanning more than a few thousand lines of code?

Yes, all the time. Not every line of code, of course, but with my Debian Developer hat on, I have at least browsed through the vast majority of the code for, e.g. tcl/tk, and at least skimmed the code for hundreds of other projects. And even with my day-job hat on, I have done a lot of ad-hoc browsing through random open-source projects that we're either using or thinking of using. Evaluating the code base is, or should be, a big part of deciding whether to use (or continue to use) a given project or library.

You seem to be suggesting that the only way open-source can be safe or useful is if everyone evaluates every line of code they use. That's silly, of course. Open source can be safe and useful as long as enough people evaluate enough of the code. And given the number of random patches (some good, some bad) that the Debian project alone receives on a daily basis, I can assure you that a lot of people out there are reading a lot of code.

Of course, I don't personally need to evaluate every line of code in a project as long as I know (and I do) that there are others out there like me who at least do spot inspections. A little proactive inspection up front to give yourself at least a basic idea of how the code works can save a lot of grief further on down the line. I count it time well spent.

With proprietary code there are someone I can call and they are by contract obliged to fix problems within a certain time frame.

That has nothing to do with the code being "proprietary", and everything to do with having a support contract. Do you imagine that companies using open-source don't have support contracts?

Have you ever even considered just how bloody huge the code base is for something like a database?

What does that have to do with anything? I've seen tiny projects that were incomprehensible messes of tangled spaghetti code, and huge projects that were clearly and cleanly laid out, well organized, and a piece of cake to maintain, support, study and evaluate. Frankly, I'll take the latter over the former any day. It's not about the size of the code base, it's about the structure and organization.

Also as a developer I got enough to do creating my own applications [...]

Ah, well if you're the kind of developer who works in complete isolation on your own projects with no interaction with anyone else, I can understand your point of view. But that kind of development is pretty rare these days. Most of us work on teams, and evaluating other people's code is an almost-daily part of the job. The majority of that, at least in my case, involves code reviews (formal or informal) for other people in the company, but our code reviews are by no means limited to in-house code. We take more care with our own code because we know that we're the only eyes on it, but that doesn't mean we're foolish enough to assume that all third-party code is perfect and flawless.