Slashdot Mirror


Code Cleanup Culls LibreOffice Cruft

mikejuk writes with an interesting look at what coders can get around to after a few years of creating a free office suite: dealing with many thousands of lines of deprecated code: "Thanks to the efforts of its volunteer taskforce, over half the unused code in LibreOffice has been removed over the past six months. It's good to see this clean-up operation but it does raise questions about the amount of dead code lurking out there in the wild. The scale of the dead code in LibreOffice is shocking, and it probably isn't because the code base is especially bad. Can you imagine this in any other engineering discipline? Oh yes, we built the bridge but there are a few hundred unnecessary iron girders that we forgot to remove... Oh yes, we implemented the new chip but that area over there is just a few thousand transistors we no longer use... and so on." Well, that last one doesn't sound too surprising at all. Exciting to think that LibreOffice (which has worked well for me over the past several years, including under the OpenOffice.org name) has quite so much room for improvement.

33 of 317 comments

  1. Not at all. I've had a house built. by Anonymous Coward · · Score: 5, Interesting

    There are probably dozens of extra nails that were just hammered in rather than be removed. There are extraneous pieces of lumber.

    And a house that was remodeled? I've seen newspaper used as filler. I've seen layers of roofing, with things buried in between layers.

    Frankly I don't know what's inside my walls, and I'm not sure I want to know.

  2. I'd bet there is. by JGuru42 · · Score: 5, Informative

    It would not be very surprising to see a lot of dead code.

    I maintain the code for MoreTerra, a Terraria map editor program and I'm pretty sure I've got dead code in there and that's a pretty small project.

    With a large number of people working on the code it likely ends up slowly clogging up as no one quite knows what the others are doing.

    Dare I ask what type of dead code exists in something extra huge, but closed source, like the Windows code base or for MS Office? But I'd bet for all MS's faults that the code for Norton Antivirus is 10x worse.

    1. Re:I'd bet there is. by hedwards · · Score: 4, Interesting

      I'm mostly surprised that they're still getting performance improvements. It seems like they've done more over the last year than Sun did during the entire time it owned the project to unbloat it.

    2. Re:I'd bet there is. by machine321 · · Score: 5, Funny

      Now if only they'd take over the Java project.

    3. Re:I'd bet there is. by Anonymous Coward · · Score: 4, Informative

      Dead code is a result of OOP development (C++/Java)

      Because a lot of OOP code has extra methods used to circumvent or enforce memory protection (private/public) with variables inside classes. Sometimes the methods are created in anticipation that they will be used, but all the code amounts to is a getVARname or setVARname dozens or hundreds of times when something like get(varname) set(varname) would be more efficient. In C you don't have this problem because memory protection basically doesn't exist unless you roll your own.

    4. Re:I'd bet there is. by Anonymous Coward · · Score: 4, Interesting

      Just to be Snarky, I'll point out that the Glasgow Haskell Compiler politely informs me whenever it finds a dead function. Functional languages are light years ahead of anything else when it comes to the Compiler actually being able to reason about the code it's compiling.

    5. Re:I'd bet there is. by Stevecrox · · Score: 4, Informative

      The Eclipse Java Compiler can indicate a warning if a private function is never called, and the Eclipse Code Compiler and Findbugs will both throw warnings if an area of code is unreachable. Findbugs is able to detect if a variable is declared but never used (dead store) and will throw a warning. Lastly CPD (a part of PMD) is able to look for identical code blocks, allowing you to merge duplicate functions.

      Sure that doesn't cover public functions but I don't think there is harm in unused getters and setters and it's easy to find if a function is called through tools found in Eclipse. Just because Java developers don't use these tools doesn't mean they don't exist.
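
      As a rough illustration of the kind of check these tools perform, here is a minimal sketch in Python rather than Java. It is not how Eclipse or Findbugs work internally; the helper find_unreferenced_functions and the SAMPLE module are made up for the example. It flags any function that is defined but never referenced by name, which also shows why unused public setters are easy to list but hard to judge: set_balance is reported even though outside callers might use it.

```python
import ast


def find_unreferenced_functions(source: str) -> list[str]:
    """Return names of functions defined in `source` that are never referenced by name."""
    tree = ast.parse(source)
    defined = {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    referenced = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            referenced.add(node.id)    # plain use such as foo() or callback=foo
        elif isinstance(node, ast.Attribute):
            referenced.add(node.attr)  # attribute use such as obj.foo()
    return sorted(defined - referenced)


# Hypothetical module used only to exercise the check.
SAMPLE = '''
def used():
    return 1

def never_called():
    return 2

class Account:
    def get_balance(self):
        return 0

    def set_balance(self, value):
        pass

print(used(), Account().get_balance())
'''

unreferenced = find_unreferenced_functions(SAMPLE)
print(unreferenced)
```

      The check only sees one file and matches by name, so it is a heuristic: anything called through reflection, strings, or another module will be reported as unreferenced.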

  3. Physical to virtual comparison by RelaxedTension · · Score: 4, Insightful

    Can you imagine this in any other engineering discipline? Oh yes, we built the bridge but there are a few hundred unnecessary iron girders that we forgot to remove...

    Those would be perfectly valid if upon discovering your girder was 3 inches too short you could instantly create a copy of it, set the original aside, then alter and test that copy of the girder. Then you might leave a few extras lying around.

  4. Re:Worked Well? by theguyfromsaturn · · Score: 4, Informative

    I never get crashes with LibreOffice. Whenever I try Word on some documents (docx) I get a crash. I was completely unable to edit some documents in Word (sent to me by colleagues) until I opened them in LibreOffice, saved them in doc format, then reopened them in Word. It happens with distressing regularity. I find LibreOffice much more stable than Word personally. The worst part was when I once edited a doc in Word, saved it, and later had a similar problem when I tried to open it again. I am not sure what document elements cause this, but it's a sad state of affairs when LibreOffice is not only more stable (for me), but handles MS's own file format better (even though there are still big deficiencies in the docx file handling in LibreOffice). So, stability issues? I guess it depends on your computer.

    --
    I like my dinosaurs feathery, and my pterosaurs hairy (or is it pycnofibery?)
  5. Bad examples by intx13 · · Score: 5, Insightful

    Bridges often have unused structural elements: walk-ways made unsafe by modern traffic levels, maintenance accesses unused for safety reasons, supports made redundant beyond the factor of safety by bridge improvements, etc. Chips and boards too: FPGAs with 10% utilization, chip designs re-purposed with functional components disabled, subsystems replaced in boards by new designers not confident enough to remove the old design, etc.

    Cruft in software is more often removed because (1) software has a potentially longer lifetime than hardware and (2) it's a lot easier to remove an uncalled function from a program than a girder from a bridge! Software cleanup should be an expected and planned part of a project's life cycle.

    1. Re:Bad examples by Ethanol-fueled · · Score: 4, Interesting

      ...subsystems replaced in boards by new designers not confident enough to remove the old design, etc.

      It sounds crazy, but I work with a real-life example, a beamforming circuit board that utilizes a certain technique, but has all the legacy components utilizing another technique that was never even implemented!

      In that case, it wasn't a matter of confidence, but probably corporate sloth - engineers are expensive, and so they figure that paying the board-house more for the extra components per board would be cheaper than getting an engineer to redesign the board.

  6. Re:It doesn't matter by Mr2cents · · Score: 4, Informative

    I do think it matters. Yes, a compiler can throw out dead code, but not in all cases. E.g. if you have an enum where some values aren't used, and you then call a function if a variable has that unused value, how is the compiler going to find out? It's not only functions, there could be unused tests in code etc. All this clogs up the code and can make reading the code a living hell. It can turn an elegant part into a mess. Not to mention the time developers waste trying to find out what a function does, only to discover it's not used. The article doesn't deal with the results in terms of code size or performance, but I'm very interested to find out.
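
    The enum case can be sketched in a few lines. This is a Python illustration under made-up names (Mode, legacy_export, export, parse_mode are all hypothetical), not code from LibreOffice: a simple static check sees a call to legacy_export and treats it as live, while at runtime no input ever produces the value that guards the call.

```python
import ast

# Hypothetical module, held as a string so it can be both analyzed and run.
SOURCE = '''
from enum import Enum

class Mode(Enum):
    FAST = 1
    SAFE = 2
    LEGACY = 3  # assumed for illustration: nothing produces this value any more

def legacy_export(data):
    return "legacy:" + data

def export(data, mode):
    if mode is Mode.LEGACY:
        return legacy_export(data)  # reachable only for the unused value
    return "modern:" + data

def parse_mode(text):
    # The only place a Mode is chosen; LEGACY is never returned.
    return Mode.FAST if text == "fast" else Mode.SAFE
'''


def called_names(source):
    """Names of functions that appear in a direct call somewhere in `source`."""
    return {
        node.func.id
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }


namespace = {"__name__": "cruft_demo"}
exec(SOURCE, namespace)

# Static view: legacy_export appears in a call, so it looks alive.
looks_alive = "legacy_export" in called_names(SOURCE)

# Runtime view: parse_mode never returns LEGACY, so export never reaches it.
results = {
    namespace["export"]("x", namespace["parse_mode"](text))
    for text in ("fast", "safe", "anything else")
}
print(looks_alive, results)
```

    Proving that LEGACY is never produced needs whole-program knowledge of every place a Mode can come from, which is why this kind of dead code tends to be found by people rather than by the compiler.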

    Anyway: you can either have clean code or maintainable code, but not both at once in my experience.

    --
    "It's too bad that stupidity isn't painful." - Anton LaVey
  7. Cruft in engineering... by Forbman · · Score: 4, Interesting

    Oh yes, we built the bridge but there are a few hundred unnecessary iron girders that we forgot to remove...
    Well, look at bridges built in the 1800's compared to the ones today. Would we build a modern bridge today using wrought iron links http://en.wikipedia.org/wiki/Clifton_Suspension_Bridge? Each building made in a certain period in a way represents a degree of refinement compared to its predecessors. Better materials, better methods. Buildings in general cannot be "cleaned up" the way code can, where "cruft" today was yesterday's conservative design.

    Read a book about the differences in the construction of the World Trade Centers versus the Empire State Building, for example (the WTC has sibling buildings still around using the same techniques, such as the Aon [nee Amoco] Building in Chicago)...

  8. This happens in other disciplines. by tragedy · · Score: 4, Interesting

    The quoted section in the summary asks if we could imagine this in other engineering disciplines. As the rest of the summary points out, it happens all the time in microchips. It also happens a lot in civil engineering, including bridge building. Removing things takes work. Unless there's work to be saved by doing it, or some way to profit from selling what's removed as scrap, or it's a safety issue to leave it, most engineers won't remove old parts of a structure. Consider underground pipes. How often are they removed when they're replaced? If the new ones are being laid down where the old ones went, they'll be replaced. Otherwise, 90% of the time they'll just leave the old ones there. Same goes for just about everything. Old installations of any kind are full of stuff that no longer serves any purpose. Brackets and supports for heavy equipment that isn't used anymore, old wiring and panels, concrete slabs that some mystery object used to sit on, etc. When was the last time you saw anyone take away some 30 ton piece of equipment then pay more money to have the floor where it used to sit un-reinforced? Now, sometimes they do. Usually it's when the place is being sold and the new owners are re-modelling. Other times the owners do decide to do a major cleanup. That's exactly what's being done here with LibreOffice. Makes it no different than any other engineering discipline then.

    Incidentally, if it's truly "dead" code, then it shouldn't actually be compiled, so it's not like the bridge engineer left in a bunch of extra girders, it's more like he's keeping addendum 6-c to revision 12b of the plans for section 3 in the same file cabinet as revision 13 rather than shifting it to a storage box and warehousing it.

    1. Re:This happens in other disciplines. by Anonymous Coward · · Score: 5, Interesting

      A nice engineering example is the stone pylons at each end of the Sydney Harbour Bridge. They were built to support the cranes that were used in constructing the steel arch of the bridge. Since the bridge's completion they've served no structural purpose whatsoever.

      As the parent poster suggests, it would have cost time and money to remove them. However, in bridge building they plan around that - a bit of extra effort was put in at the start and the pylons were designed and built in such a way that they looked good after the bridge was finished. They were left in place as a feature of the completed structure and, as they were built in sandstone, they do a reasonably good job of making the bridge work visually with the feel of the historic precinct beneath the southern end of it.

      Dead code rarely adds anything to the aesthetics of software.

    2. Re:This happens in other disciplines. by Nethead · · Score: 4, Informative

      Ever been in the telco closets of a 50 year old office building? Old 9600 baud modems still powered up and connected to 66 blocks, old DS0 smartjacks with red lights, all next to Cat5 and fiber cross connects.

      Look above the drop ceiling of an old department store sometime and gander at all the serial cable wire that is covered by the Token Ring wire covered by the 10base5 wire that is covered by the ThinNet wire that is covered by the Cat5 wire that is covered by the fiber ducts. All that tangled in with the old 25 pair telco wire.

      If it's not on your work order, you don't touch it.

      --
      -- I have a private email server in my basement.
    3. Re:This happens in other disciplines. by Anonymous Coward · · Score: 4, Interesting

      We are currently preparing to move to another office building in another town.

      Our old premises are just like you describe. Built in 1964.
      Not counting anything else: There are 600 kilometers of Cat5e cabling in the building-complex. (I was involved in the Coax -> Cat5e upgrade back in '97. Still remember some details.)

      Actually it is a good thing.
      The recycle value of the wiring is more than the value of the building and the grounds.

      With today's fucked up real-estate market for office buildings we couldn't sell the old location to anybody.
      But because of the recycle value we had no trouble selling the entire site to a demolition company.

      Win-win for everybody:
      We get money for what is basically an unsellable building-complex.
      They will break it down and recycle it. Gives them about a year of guaranteed work and a reasonable profit when due to the recession their business is at an all time low.
      Afterwards they will sell the cleared grounds to the City Council who are desperate to get a reasonably priced area of land at the edge of the city-center by 2014 so they can build a new City hospital. (The current hospital is on land not owned by the City. The contracts end in 2016 and can not be renewed because that would require a re-check of the environmental status of the grounds and everybody knows there is pollution there. That wasn't an issue in 1986 when the old hospital was built, but new legislation passed in 2008 making it an issue now.)
       

  9. Not to mention the human genome itself. by Freddybear · · Score: 5, Interesting

    Human DNA (and just about every other species as well) is full of things like inactive duplicate genes (some with slight alterations), pieces of old retroviruses, and other mutations and replication errors that have been "commented out". Plus a whole lot of sequences which we don't know what they're good for yet.

  10. Re:Not at all. I've had a house built. by Anonymous Coward · · Score: 5, Insightful

    I've seen chemical plants built with millions of dollars worth of unnecessary piping and valves, because the project timeframe meant that it was cheaper to install extra connections that might never be used and save engineering time than waste time re-engineering it.

    If removing unnecessary items can save thirty thousand dollars (say) at the cost of three days, removing the cruft is only worth it if the delay costs less than ten thousand a day.
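
    The arithmetic behind that threshold, as a small Python sketch. The function names are made up for the example; the figures are the ones in the post (thirty thousand dollars saved, three days of delay).

```python
def break_even_delay_cost(savings: float, delay_days: float) -> float:
    """Daily delay cost at which removing the cruft exactly pays for itself."""
    return savings / delay_days


def worth_removing(savings: float, delay_days: float, delay_cost_per_day: float) -> bool:
    """True when the total cost of the delay is smaller than the savings."""
    return delay_cost_per_day * delay_days < savings


threshold = break_even_delay_cost(30_000, 3)
print(threshold)                            # 30,000 / 3 days
print(worth_removing(30_000, 3, 8_000))     # delay costs 24,000 in total
print(worth_removing(30_000, 3, 12_000))    # delay costs 36,000 in total
```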

  11. Re:oooh yes by zidium · · Score: 5, Informative

    Mr2Cents,

    Your actions are indicative of a person who is not yet truly a craftsman of the software engineering trade.

    Speaking from personal experience dealing with huge, complex, unmaintainable PHP legacy systems for the last ten years, let me tell you a far better path:

    1. Search the code base for what may be directly calling the code.
    2. Set debug breakpoints at the start of each piece of cruft code and rigorously test the app.
    3. Create a custom exception (e.g. CrapCodeHitException) and throw it at the beginning of each code segment you want to remove. If you don't hit any of the exceptions after, say, a week of normal browsing doing other things, plus testing, then proceed to step 4.
    4. Catch the CrapCodeHitExceptions at the highest level you dare, log this into a separate log file you will have permission to read. Commit the code into a releaseable branch so that it ends up on your QA and staging servers.
    5. Get approval to have the logging code be pushed to staging. Add comments above each cruft piece of code stating a) the level of risk you think if it is removed, b) when one should feel free to remove it (pick increments like 3 mo, 6 mo, 1 yr, based on risk), c) your name. If shit hits the fan cuz of removal, you want to man up and accept responsibility so your peers don't waste precious cycles needlessly troubleshooting why this "perfectly fine" code was seemingly arbitrarily removed.
    6. After each time period named in your comments has elapsed, if the code was never triggered (parse the logs!), feel free to remove it. Please leave a note behind that you removed such and such, tho, and stick your name on it. Remove these notes after a year.

    I've personally cleaned up 100,000s lines of code using this mechanism on several large and complex sites, without a single failure.
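
    Steps 3 and 4 can be sketched roughly as follows. This is Python rather than the PHP the post describes, and only CrapCodeHitException comes from the post; old_discount, handle_request and the in-memory hits list are hypothetical stand-ins for real application code and the separate log file. The post does not say what should happen after a hit is logged, so this sketch simply records it and re-raises.

```python
import logging


class CrapCodeHitException(Exception):
    """Raised when code believed to be dead is actually executed."""


log = logging.getLogger("cruft")
hits = []  # stand-in for the separate log file described in step 4


def old_discount(price):
    # Step 3: tripwire at the very start of the suspected-dead code.
    raise CrapCodeHitException("old_discount")
    return price * 0.9  # original body, kept until the waiting period ends


def handle_request(action, price):
    """Hypothetical top-level entry point: catch as high up as you dare."""
    try:
        if action == "legacy":
            return old_discount(price)
        return price
    except CrapCodeHitException as exc:
        # Step 4: record which piece of cruft was hit so the logs can be parsed later.
        hits.append(str(exc))
        log.warning("suspected-dead code was hit: %s", exc)
        raise


handle_request("normal", 100)      # ordinary traffic never touches the tripwire
try:
    handle_request("legacy", 100)  # a forgotten caller turns up during testing
except CrapCodeHitException:
    pass
print(hits)
```

    An empty hits record after the waiting period is the signal for step 6; any entry in it names code that is not dead after all.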

    --
    Slashdot Valentines Beta Massacre: iT WORKED! The boycotts killed Beta!!
  12. Re:Not at all. I've had a house built. by Belial6 · · Score: 4, Funny

    If it is a house that I have owned, it would be Pez dispensers. Whenever we do remodeling, we make a point to slip a Pez dispenser into the walls. My wife and I figure that some day a long time in the future, someone will have a mildly amusing story.

  13. Re:Worked Well? by ColdWetDog · · Score: 4, Funny

    Of course, that behavior is quite similar to what happens when you open the moderately complex docx document in Word.

    --
    Faster! Faster! Faster would be better!
  14. Re:oooh yes by Mr2cents · · Score: 4, Interesting

    Well, it was an embedded company. This was the first project of such a large scale, and they lacked experience. The manager had worked there for 30 years, and had an electrical engineering background. So it wasn't an ideal situation: he was certainly competent and he could definitely write code, but lacked the experience in software engineering, like how to keep a large software base maintainable. So for example, he had this obsession with not changing code once it was tested. Since all these modules were tested, he was very nervous about changing them. What he failed to see, though, was that more modules were going to be added, and without a clean definition of how this data was to be represented, constant conversions were going to be needed (plus some other things I'm not going into now). Also, automated testing was not a practice there. This is one of the things I worked on introducing at the company, although frankly it was much too late to add it to the existing project. So he was never going to allow the changes to be merged back in, because it "broke" tested code.

    And I'm not saying I'm all for changing tested code - not at all, but in some circumstances, spending some effort up front can save you a lot of time later on. And I'm sure it would have.

    Also, I'm quite confident of my skills. Sure I can still improve, but I surely developed a reputation of writing code that "just works". After 4 years, only a small fraction of the bugs were assigned to me, mostly they were located in parts I didn't write. Mastering the programming language is important, but there are lots of other very important things that matter. You just need a lot of discipline, checking all input, immediately failing when something goes wrong (not letting bad data trickle down the code), clean separations, higher order programming, trying to minimise the interface between modules etc... The list is so large, it becomes more of an intuition.

    I hope this clarifies it a bit, I surely wasn't expecting the Spanish Inquisition.

    --
    "It's too bad that stupidity isn't painful." - Anton LaVey
  15. Re:It doesn't matter by smi.james.th · · Score: 5, Informative

    I am not a Microsoft fan, but Windows 7 is actually a very well-written OS, in my experience. If you have lots of RAM then it uses it, there's no sense in having 8GB of RAM if it's only using 250MB and paging the rest of what it needs.

    As a point of reference, have a look at this article. If you only have 512MB of RAM then Win7/64 will only use about 200MB of RAM.

    --
    One thing I know, and that is that I am ignorant...
  16. Don't know about LibreOffice by rsilvergun · · Score: 4, Informative

    but recent (3.x) versions of OpenOffice ate my kid's documents. It really sucked. From what I can gather it's a known bug in the document recovery module that hasn't been fixed to this day. The program crashes, writes a blank document out as the 'recover' document, then cheerfully overwrites all your original file and any of the automatically made backups. I suppose that somewhere along the line there was some user error. My kid probably could have said 'no' to something and stopped the whole mess. But seriously, she shouldn't have to. I've got a 500 frickin' gig drive in her machine. The biggest word doc I've ever seen in my life was 5 megs (mostly pictures). Why the hell do we still delete shit? Just make a huge undo buffer or something. I've got half a fscking terabyte. Come on OO.org, just use it already!

    --
    Hi! I make Firefox Plug-ins. Check 'em out @ https://addons.mozilla.org/en-US/firefox/addon/youtube-mp3-podcaster/
  17. Re:Not at all. I've had a house built. by Grishnakh · · Score: 4, Insightful

    No, a better analogy is to build a house (full of extra materials as the parent said), and then use a giant replicator machine to mass-produce the house, almost instantly, and create thousands and thousands of new homes using that house as the basis. The wasted material in the one house is bad, but not that bad because it's one house, and it takes extra time and labor to do it more efficiently. But multiplied across thousands of identical copies, that wasted material adds up a lot. Plus, it's inefficient and you could have a better-performing house by doing a better job with the small details (better at energy efficiency for example). The slight increase in energy efficiency with that one house, realized by spending a bunch of extra time and effort removing wasted materials and doing a better job with various small details (like making sure the house wrap is applied extremely well rather than being hurried and missing some staples in important places), won't amount to much with just the one house. However, multiplied across many thousands of houses, those energy savings add up to a lot.

    The fact that software is easily and quickly replicated with perfect precision and little or no effort or time really makes it hard to make good analogies for it without resorting to Star Trek-style replicators; it's the only technology we have that's like that. And because it can be and is copied so easily, very different dynamics apply to it than to many other fields of endeavor.

  18. Re:It doesn't matter by hairyfeet · · Score: 4, Insightful

    And THIS gets modded "informative"? Did someone give the trolls extra mod points this week? Did I miss a memo? Because I'm sitting here with not one but TWO Win 7 X64 machines running, one an AMD EEE netbook with 8Gb, the other my AMD 6 core Thuban desktop, also with 8Gb, and I'm using less than 600Mb with the basic theme on the netbook and just a hair under 980Mb on the desktop and that's with Aero and all the bling cranked. You are probably using an old tool that counts cache as "used" memory but since Windows dumps the cache if ANY program requests the memory that simply isn't useful anymore. BTW your old XP box uses so little RAM because it'll dump to paging even if there is plenty of memory free which is just fricking stupid, the RAM is using the same voltage regardless so why not use it to speed things up?

    But if you are seriously looking at 2gb and aren't trolling you need to have that thing checked, because either you have more bugs than a Bangkok Whore or one of your apps is leaking memory like a sieve.

    --
    ACs don't waste your time replying, your posts are never seen by me.
  19. Re:Not at all. I've had a house built. by Grishnakh · · Score: 5, Funny

    If you have newspaper or other similar material in your walls, which wasn't processed and designed as insulting filler

    What material in your walls could be more insulting than newspaper?

  20. Re:Not at all. I've had a house built. by theNAM666 · · Score: 4, Funny

    Well, there are a variety of fiber, composite and other materials with higher R-factors per volume, but that's beside the point.

    Newspaper as in processed recycled newspaper bought as insulation from Home Depot or the equivalent is one thing. Stuffing newspapers into your wall after receiving them in the post and reading them (a common practice) is quite another. The latter retains moisture and can lead to mold, rotting and structural damage, just to start with the most obvious problems.

  21. Re:It doesn't matter by dokc · · Score: 4, Funny

    Slashdot, home of the Linux Cocksucker Boy Toys.

    Can you, please, post us a link to the Linux Cocksucker Boy Toys source code.

    --
    In love, war and slashdot discussions, everything is allowed.
  22. Re:Automate it by rgmoore · · Score: 4, Insightful

    I'm pretty sure that they don't want to automate it. One of the first things Libre Office did after they forked from OO.o was to come up with a list of "easy hacks" for people who wanted to get involved but didn't know where to start. That includes stuff like dead code removal and translating comments from German to English. By leaving that stuff marked out but undone, they hope to ease new people into the project. That may not be the most efficient way of doing this kind of thing, but if it helps to recruit new developers it will do a lot more for the project in the long run than just getting rid of the cruft. It's a big difference between a project run by paid coders on a tight budget and one that depends on a variable number of volunteers.

    --

    There's no point in questioning authority if you aren't going to listen to the answers.

  23. Re:It doesn't matter by bonch · · Score: 5, Informative

    No, it's not. Windows 7 will use as little as 200MB of RAM if you only have 512 physically available. You're misunderstanding what's actually going on as you fret over megabytes in Task Manager.

  24. Re:It doesn't matter by Rockoon · · Score: 4, Insightful

    Many of its GUI controls are using very bad confusing abstractions. For example, audio, networking, etc.

    They are necessary abstractions because these subsystems themselves are abstractions.

    If you want confusing audio subsystems, look at the mess in Linux right now.. most Linux installs literally have multiple audio subsystems that can output to each other in very confusing master/slave relationships.

    --
    "His name was James Damore."