Slashdot Mirror


Undocumented Open Source Code On the Rise

ruphus13 writes "According to security company Palamida, the use of open source code is growing rapidly within businesses. However, the lack of documentation and understanding of how the code works can increase the vulnerability and security risks the companies face. OStatic quotes Theresa Bui-Friday saying, 'In 2007, Palamida's Services team audited between 300M to 500M lines of code for F500 to venture-backed companies, across multiple industries. Of the code we reviewed, Palamida found that applications written within the last five years contain 50% or more open source code, by a line of code count. Of that 50% of open source code, 70% was undocumented. This is up from 30% in 2006.' How can businesses protect themselves and still draw on open source code effectively?"

10 of 94 comments

  1. Source code is its own documentation by mangu · · Score: 4, Insightful
    I'd rather have the source code which I can read and try to understand than an executable file alone.


    The only reason why we don't see an article "Undocumented Commercial Software On the Rise" is because the public cannot see how badly documented the commercial software is.

    1. Re:Source code is its own documentation by jps25 · · Score: 5, Insightful

      I disagree.
      This isn't about closed vs open source, this is about decent programming.
      Comments in code are necessary and a minimal requirement for any project.
      At least add one line to any function explaining what the function does, what its input is and what it returns.
      This isn't so hard and it won't kill you, but it'll make life easier for you and anyone else who will have to deal with the code later.
      It also makes finding errors easier, as your code may not be doing what your specifications say it should do.
      I don't understand this hatred for comments and the "code-is-its-own-documentation"-philosophy. I really don't.

      <code>
      #include <iostream>
      #include <algorithm>
      #include <iterator>

      #define ch_ty(ty)           std::istream_iterator<ty>::char_type
      #define tr_ty(ty)           std::istream_iterator<ty>::traits_type

      #define cin_iter(ty)        std::istream_iterator<ty, ch_ty(ty), tr_ty(ty)>( std::cin )
      #define void_iter(ty)       std::istream_iterator<ty, ch_ty(ty), tr_ty(ty)>()

      int main( int argc, char *argv[] ) {
        while ( (cin_iter(size_t)) != void_iter(size_t)
                    ? ( std::cin.unget(),
                        argc += *cin_iter(size_t)
                    ) : (
                      printf( "\nsum: %d\n", --argc ), system("exit")
                    ) );
      }
      </code>

      Perhaps easy to understand, but one comment-line would save you minutes wasted understanding and reading it.

      or

      <code>
      #include <stdio.h>

      int v,i,j,k,l,s,a[99];main(){for(scanf("%d",&s);*a-s;v=a[j*=v]-a[i],k=i<
      s,j+=(v=j<s&&(!k&&!!printf(2+"\n\n%c"-(!l<<!j)," #Q"[l^v?(l^j)&1:2])&&++
      l||a[i]<s&&v&&v-i+j&&v+i-j))&&!(l%=s),v||(i==j?a[i+=k]=0:++a[i])>=s*k&&
      ++a[--i]);printf("\n\n");}

      </code>

      Well, obviously obfuscated, but one comment and it's immediately clear what it does.

    2. Re:Source code is its own documentation by Splab · · Score: 4, Insightful

      I keep hearing people pro open source code say "I can check it!" Well can you? Have you done so - in a project spanning more than a few thousand lines of code? Just because the code is there to see doesn't actually mean it's doable to wade through it.

      I'm not for either open source or proprietary code, my employer pays me money to produce code, what he does with it is his business, but what I do have, is experience using both proprietary code and open source code - both models have pros and cons.

      With proprietary code there is someone I can call, and they are obliged by contract to fix problems within a certain time frame. One particular instance is a database we are paying license fees for. I will not name them, but to this date I have found more than 10 vectors that cause crashes. Those problems have been addressed by the vendor in a timely manner (I have yet to find bugs that would be show stoppers, but some did require annoying workarounds). With OSS we don't have this possibility. Yes, we can log a bug in whatever bug tracker they use and hope someone will address our issue, but we have no guarantee. Also, in my experience logging a bug with OSS developers can be quite a daunting process, as people can have some serious egocentric issues. While this of course also applies to proprietary software, there is someone higher in the food chain who can be called.
      With OSS we of course got the good fortune of being able to go through the source code and try to fix the code ourselves... right?
      Have you ever even considered just how bloody huge the code base is for something like a database? Tracking down a bug, well yes, gdb can tell you where the program stopped working, but unless you have some really really good code reading skills and are up to date on everything that happens algorithm-wise, you have close to zero chance of fixing anything without causing major problems.

      Also as a developer I got enough to do creating my own applications; I simply do not have the time to dig through thousands of lines of code every time something new breaks. Yes, open source is nice, and small projects are easy to help along by fixing small bugs, but at some point the project grows so big that anyone using it needs to have someone they can call at 4 am to help them.

      Oh and just because some software is proprietary it doesn't mean you don't have access to the source code, even at Microsoft you can buy access to the source.

      We got builds with debug flags from the database vendor because we cannot share our database with them, therefore stack traces etc. have to be generated locally and shipped to them. (Yes, this is a bit annoying, but having sensitive records out in the wild is a tad more problematic.)

      I don't pick OSS over proprietary or vice versa, I pick whatever tool fits my needs.

    3. Re:Source code is its own documentation by Xtifr · · Score: 4, Informative

      I keep hearing people pro open source code say "I can check it!" Well can you? Have you done so - in a project spanning more than a few thousand lines of code?
      Yes, all the time. Not every line of code, of course, but with my Debian Developer hat on, I have at least browsed through the vast majority of the code for, e.g. tcl/tk, and at least skimmed the code for hundreds of other projects. And even with my day-job hat on, I have done a lot of ad-hoc browsing through random open-source projects that we're either using or thinking of using. Evaluating the code base is, or should be, a big part of deciding whether to use (or continue to use) a given project or library.

      You seem to be suggesting that the only way open-source can be safe or useful is if everyone evaluates every line of code they use. That's silly, of course. Open source can be safe and useful as long as enough people evaluate enough of the code. And given the number of random patches (some good, some bad) that the Debian project alone receives on a daily basis, I can assure you that a lot of people out there are reading a lot of code.

      Of course, I don't personally need to evaluate every line of code in a project as long as I know (and I do) that there are others out there like me who at least do spot inspections. A little pro-active inspection up-front to give yourself at least a basic idea of how the code works can save a lot of grief further on down the line. I count it time well spent.

      With proprietary code there is someone I can call and they are by contract obliged to fix problems within a certain time frame.
      That has nothing to do with the code being "proprietary", and everything to do with having a support contract. Do you imagine that companies using open-source don't have support contracts?

      Have you ever even considered just how bloody huge the code base is for something like a database?
      What does that have to do with anything? I've seen tiny projects that were incomprehensible messes of tangled spaghetti code, and huge projects that were clearly and cleanly laid out, well organized, and a piece of cake to maintain, support, study and evaluate. Frankly, I'll take the latter over the former any day. It's not about the size of the code base, it's about the structure and organization.

      Also as a developer I got enough to do creating my own applications [...]
      Ah, well if you're the kind of developer who works in complete isolation on your own projects with no interaction with anyone else, I can understand your point of view. But that kind of development is pretty rare these days. Most of us work on teams, and evaluating other people's code is an almost-daily part of the job. The majority of that, at least in my case, involves code reviews (formal or informal) for other people in the company, but our code reviews are by no means limited to in-house code. We take more care with our own code because we know that we're the only eyes on it, but that doesn't mean we're foolish enough to assume that all third-party code is perfect and flawless.
    4. Re:Source code is its own documentation by Bloater · · Score: 4, Funny

      I disagree.
      This isn't about closed vs open source, this is about decent programming.
      Comments in code are necessary and a minimal requirement for any project.
      But far less important than most people realise. Most code should be self documenting.

      At least add one line to any function explaining what the function does, what its input is and what it returns.
      Isn't:
      template<typename InputIterator>
      typename iterator_traits<InputIterator>::value_type
      sum(InputIterator begin, const InputIterator& end)

      enough?

      I don't understand this hatred for comments and the "code-is-its-own-documentation"-philosophy. I really don't.
      If your code is unreadable, then it is bad (see your example). Oh wait... I think I just had a "Whoosh" moment... I did, didn't I? Somebody mod parent up +1 Funny
  2. 70% Undocumented, huh? by Devin+Jeanpierre · · Score: 5, Insightful

    How do you measure something like how well things are documented with a percentage? Some code simply doesn't need documentation. Other code needs plenty. Is 0% a 1:1 relationship between lines of code and lines of comments? That whole thing seems a bit strange. They could certainly back it up if they wanted to, but that'd be too much effort.

    --
    -Devin Jeanpierre
  3. Same old, same old. by khasim · · Score: 5, Insightful

    In today's world of 24/7 and persistent network access, developers dispersed across multi-national sites can include open source, freeware, public domain, evalware (demos of commercial software), etc, into the code they are writing without triggering the usual checkpoints in the procurement process.
    I've seen that same issue YEARS ago. And I'm not talking code snippets. I'm talking systems that had "evalware" tools in them.

    This has NOTHING to do with "multi-national sites" or any of that.

    This has EVERYTHING to do with clearly stating the rules and ENFORCING those rules.

    The rules do not enforce themselves. Someone, somewhere has to approve the code that goes in.

    The problem is that management does NOT understand code and will happily farm out the work to anyone who says that they can produce X lines for $Y. Without oversight. The less oversight, the less expensive the project is. Which means bigger bonuses for those same executives.
  4. Re:Not just for security by Otter · · Score: 4, Informative
    "Documented" in this story means that the company's developers have documented what the hell is going into their codebases (with respect to licenses, keeping things updated, and so forth). It has nothing to do with either user documentation or source code comments in the original open source project.

    That said, the "70%, up from 30%" numbers are absurd. There is no way that the failure rate to document use of open source code more than doubled in 2007.

  5. Re:Not just for security by tacocat · · Score: 4, Interesting

    I would be interested to know what languages you have used.

    I have found Perl to be very well documented, even though it appears to be in decline, or to have leveled off, in the number of developers and active projects.

    Meanwhile, I have looked into using Rails and found it a great example of shitty code practices. I've stated this very case to the development community and they pretty much dismissed my statements as those of an inexperienced developer unwilling to "go the distance".

    I hope this might be slightly helpful in getting people like the Rails community to either understand that they really do need documentation or get companies to throw aside Rails as POS software that is so lacking in documentation that it's a greater burden to have it than to use the alternatives.

    There is an excellent case where if you have a highly experienced and knowledgeable developer then you maybe don't care. But if you have to replace this developer with one less knowledgeable or want to expand your development team, you suffer a huge start up cost of trying to bring someone up to speed at your expense.

    Specifically, the Rails plug-ins are documented with oversimplified tutorials that aren't even available for free, so you have to make an extra effort to find the documentation for the software that you download, since the two aren't in the same location. Restful Authentication is one example in particular.

    Add to that the documentation in Ruby DBI. There isn't any. The documentation says to see Perl DBI for documentation. Consider that this is a reference to a different language with different syntax, that some of the Perl methods aren't possible in Ruby, and that Ruby DBI likewise has methods that aren't available in Perl. WTF? This is documentation.

  6. Gotta love Slashdot by Anonymous Coward · · Score: 5, Insightful

    Gotta love this place. At the time of this posting, there are 11 comments modded 3 or higher. Of those, only ONE makes any reference to the act of documenting where the code is coming from (which is what the article is about). All the rest are talking about writing documentation for code, or commenting code as it's written. Way to miss the ball, guys! This article is addressing you specifically, yet you have no idea what they're even saying because you can't be bothered to try to listen. Nice.