Slashdot Mirror


Are You Sure This Is the Source Code?

oever writes "Software freedom is an interesting concept, but being able to study the source code is useless unless you are certain that the binary you are running corresponds to the alleged source code. It should be possible to recreate the exact binary from the source code. A simple analysis shows that this is very hard in practice, severely limiting the whole point of running free software."

32 of 311 comments

  1. Bogus argument by Beat+The+Odds · · Score: 5, Insightful

    "Exact binaries" is not the point of having the source code.

    1. Re:Bogus argument by Anonymous Coward · · Score: 5, Informative

      The guy who submitted that article is the person who wrote it. Awesome "work", editors.

    2. Re:Bogus argument by CastrTroy · · Score: 4, Insightful

      Ok, maybe not exact binaries, but what if you can't even make a binary at all, or if you do make one, how do you ensure it's functioning the same? That's the problem many people have with open source code written in languages that you can only compile with a proprietary compiler. Take .Net, for instance. It's possible to write a program that is open source and yet be at the mercy of Microsoft to be able to compile the code. Even when I download Linux packages in C, it's often the case that I can't compile them, because I'm missing some obscure library that the original developer just assumed I had. "What good is code if you are unable to compile it?" is right up there with "What use is a phone call if you are unable to speak?" Some code only works with certain compilers, or with certain flags turned on in those compilers. Simply having the source code doesn't mean you have the ability to actually use the source code to make bug fixes should the need arise.

      --

      Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
    3. Re:Bogus argument by ZahrGnosis · · Score: 4, Insightful

      If you're worried about the lineage of a binary then you need to be able to build it yourself, or at least have it built by a trusted source... if you can't, then either there IS a problem with the source code you have, or you need to decide if the possible risk is worth the effort. If you can't get and review (or even rewrite) all the libraries and dependencies, then those components are always going to be black boxes. Everyone has to decide if that's worth the risk or cost, and we could all benefit from an increase in transparency and a reduction in that risk -- I think that was the poster's original point.

      The real problem is that there's quite a bit of recursion... can you trust the binaries even if you compiled them, if you used a compiler that came from binary (or Microsoft)? Very few people are going to have access to the complete ground-up builds required to be fully clean... you'd have to hand-write assembly "compilers" to build up tools until you get truly useful compilers then build all your software from that, using sources you can audit. Even then, you need to ensure firmware and hardware are "trusted" in some way, and unless you're actually producing hardware, none of these are likely options.

      You COULD write a reverse compiler that's aware of the logic of the base compiler and ensure your code is written in such a way that you can compile it, then reverse it, and get something comparable in and out, but the headache there would be enormous. And there are so many other ways to earn trust or force compliance -- network and data guards, backups, cross validation, double-entry or a myriad of other things depending on your needs.

      It's a balance between paranoia and trust, or risk and reward. Given the number of people using software X with no real issue, a binary from a semi-trusted source is normally enough for me.

    4. Re:Bogus argument by icebike · · Score: 4, Insightful

      But to his credit, he did say a "simple analysis", although after reading TFA I'd say he omitted the word "minded" from the middle of that phrase.

      Virtually all of his findings trace back to differences in date and time, chosen compiler settings, and compiler vintage.
      Unless he can find large blocks of inserted code (not merely data segment differences), he is complaining about nothing.
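      To tell timestamp noise apart from "large blocks of inserted code", it helps to see where two binaries differ and how big each differing region is. A minimal Python stand-in for `cmp -l` might return the differing byte ranges between two inputs (for real files, read them with `open(path, "rb").read()`). The function name and sample bytes are made up for illustration; a few tiny ranges suggest embedded dates or paths, while long ranges deserve a closer look.

```python
from __future__ import annotations


def differing_ranges(a: bytes, b: bytes) -> list[tuple[int, int]]:
    """Return half-open (start, end) byte ranges where `a` and `b` differ.

    Bytes beyond the end of the shorter input count as differing.
    """
    ranges: list[tuple[int, int]] = []
    start = None
    for i in range(max(len(a), len(b))):
        same = i < len(a) and i < len(b) and a[i] == b[i]
        if not same and start is None:
            start = i                      # a differing run begins here
        elif same and start is not None:
            ranges.append((start, i))      # the run ended just before i
            start = None
    if start is not None:                  # run extends to the end of the longer input
        ranges.append((start, max(len(a), len(b))))
    return ranges


if __name__ == "__main__":
    # Made-up "binaries" that differ only in an embedded build date.
    old = b"\x7fELF...code...built 2013-06-20..."
    new = b"\x7fELF...code...built 2013-06-21..."
    for start, end in differing_ranges(old, new):
        print(f"offset {start:#x}: {end - start} byte(s) differ")
```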

      He is certainly free to compile all of his system from source, and that way he could be assured he is running
      exactly what the source said. But unless and until he reads AND UNDERSTANDS every line of the source he is
      always going to have to be trusting somebody somewhere.

      It's pretty easy to hide obfuscated functionality in a mountain of code (in fact it seems far too many programmers pride
      themselves on their obfuscation skills). I would worry more about the mountain he missed while staring at the
      mole-hill his compile environment induced.

      --
      Sig Battery depleted. Reverting to safe mode.
    5. Re:Bogus argument by arth1 · · Score: 5, Informative

      To borrow from The Watchmen:

      Who compiles the compiler?

      Your attribution isn't just a little off, it's way off.
      Try Iuvenalis, around 200 AD.

    6. Re:Bogus argument by oGMo · · Score: 5, Insightful

      Simply having the source code doesn't mean you have the ability to actually use the source code to make bug fixes should the need arise.

      And yet, it still means that you can fix it, or even rewrite it in something else, if you want. Not having the source code means this is between much-more-difficult and impossible. The lesson here should be that everything we use should be open source, including compilers and libraries, not "well in theory I might have problems, so screw that whole open source thing .. proprietary all the way!"

      --

      Don't think of it as a flame---it's more like an argument that does 3d6 fire damage

    7. Re:Bogus argument by NoNonAlphaCharsHere · · Score: 4, Funny

      To borrow from the Tao Te Ching: "The Source that can be told is not the Source."

    8. Re:Bogus argument by Aaron+B+Lingwood · · Score: 4, Interesting

      "Exact binaries" is not the point of having the source code.

      You are correct. However, it is a method to confirm that you have received the entire source code.

      The point being made is that a binary could always contain functions that are malicious, buggy or infringe on copyright while the supplied source does not.

      Case Study:

      A software company (let's call them 'Macrosift') takes over project management of a GPL'd document conversion tool. Macrosift contributes quite a bit of code and the tool really takes off. Most users obtain this tool as a pre-compiled binary from either the Macrosift-controlled repository or a Macrosift partner-controlled repository. It can even convert all kinds of documents flawlessly into Macrosift's Orifice 2015 new extra standard format, which no other tool seems to be able to do.

      Newer versions of OpenOffice, LibreOffice and JoeOffice come out, and this tool just doesn't seem to be doing the job. Sure, it converts perfectly from everything into MS .xsf, but it doesn't work so well the other way and won't work at all between some office suites. The project gets forked by the community to make it feature complete. The project managers start by compiling the source, and to their surprise, the tool doesn't work as well as the binary did. After a year passes, the community realizes they've been had. By painstakingly decompiling the binary, they discover that the function that converts to MS proprietary .xsf is different from the one in the source. Another hidden function is discovered in the binary that introduces errors and file bloat after a certain date if the tool is being used solely on non-MS documents.

      How else can I ascertain whether you have supplied me with THE source code for THIS binary if I can not produce said binary with provided source code?

      --
      [Rent This Space]
    9. Re:Bogus argument by Lumpy · · Score: 5, Informative

      There are very talented people that can hide things in only a few lines of code. See http://ioccc.org/ for some examples that will make your skin crawl.

      --
      Do not look at laser with remaining good eye.
    10. Re:Bogus argument by 14erCleaner · · Score: 4, Informative

      Who compiles the compiler?

      I guess it's time to introduce another generation to the devious genius of Ken Thompson.

      You can't trust code that you did not totally create yourself. (Especially code from companies that employ people like me.)

      --
      Have you read my blog lately?
    11. Re:Bogus argument by Andy+Dodd · · Score: 5, Informative

      Yeah. Unfortunately, the issues he presents here DO make it more difficult to prove that someone is providing a binary that could NOT have possibly originated from the provided source code.

      As an example, the kernel source initially released for the Samsung GT-N8013 (USA Wifi Note 10.1) was not what was used to build the binaries in question.

      The "difficult to prove but obvious" part: any kernel built from the provided source had a massively broken wifi driver that would completely stop functioning, usually within 5-10 minutes, requiring the module to be removed and reinserted. Pulling the wifi module source from a different Samsung tarball (such as a GT-I9300 release) would result in a working driver. But how do you prove whether the provided source is the right one?

      In the case of the N8013, we were lucky: Samsung changed a bunch of debug printk()s slightly in their released binary. Small stuff, not functionally relevant, such as typo fixes and capitalization differences in their touchscreen driver's debug printk()s, but at least provably different.

      So we could prove that the kernels didn't match, but couldn't necessarily prove that the biggest functional problem was due to a source difference.
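      Finding provable differences like those printk() tweaks can be partly automated by comparing the printable strings in the two images, much as the `strings` tool does. A minimal Python sketch follows; it assumes you already have uncompressed images (a compressed zImage would need unpacking first), the function names are illustrative, and the sample bytes are invented. As noted above, this only proves the builds differ, not which difference causes the functional bug.

```python
from __future__ import annotations

import re

# Runs of 4+ printable ASCII bytes, similar to what `strings` reports by default.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def extract_strings(blob: bytes) -> set[bytes]:
    return set(PRINTABLE_RUN.findall(blob))


def string_differences(built: bytes, shipped: bytes) -> tuple[set[bytes], set[bytes]]:
    """Return (strings only in our build, strings only in the shipped binary)."""
    ours, theirs = extract_strings(built), extract_strings(shipped)
    return ours - theirs, theirs - ours


if __name__ == "__main__":
    # Invented example: the shipped image fixes the capitalization of a debug message.
    built = b"\x00\x01touch: init failed\x00\x02"
    shipped = b"\x00\x01Touch: init failed\x00\x02"
    only_ours, only_theirs = string_differences(built, shipped)
    print("only in our build:", sorted(only_ours))
    print("only in shipped:  ", sorted(only_theirs))
```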

      We asked Samsung to provide source that corresponded to the UEALGB build for that device, and their response was, "That build is a leak and hence we are not obligated to provide source for it." Effectively admitting that the provided source was not meeting the requirements imposed by the GPL for that build, and then claiming that the software build preinstalled on every device sold in the USA for the first 1-2 months after launch was a "leak" and thus they didn't have to provide source for it.

      Needless to say, between that and other situations, that was my last Samsung device.

      --
      retrorocket.o not found, launch anyway?
    12. Re:Bogus argument by Hatta · · Score: 5, Informative

      But unless and until he reads AND UNDERSTANDS every line of the source he is
      always going to have to be trusting somebody somewhere.

      Even if he reads and understands every line of the source, he's still trusting someone. He has to read and understand every line of the source code of the compiler he is using, and the compiler that compiled that compiler, and so on.

      "Reflections on Trusting Trust" is almost 30 years old now. It should be well known.

      --
      Give me Classic Slashdot or give me death!
    13. Re:Bogus argument by frost_knight · · Score: 5, Informative

      For true malice there's also The Underhanded C Contest.

      From their home page: "The goal of the contest is to write code that is as readable, clear, innocent and straightforward as possible, and yet it must fail to perform at its apparent function. To be more specific, it should do something subtly evil."

      --
      It always takes longer than you expect, even when you take into account Hofstadter's Law. --Hofstadter's Law
    14. Re:Bogus argument by aristotle-dude · · Score: 4, Informative

      "Exact binaries" is not the point of having the source code.

      Uh, you must not have worked in a shop that does continuous integration automated builds? Do you really think QA should be handed binaries that you compile and have them trust them?

      The problem is that GCC will always give you a different binary every time you compile from the same source. This makes it impossible to verify that the binary you received comes from the source you claim to have used. You can get around this by never accepting binaries from anywhere but the automated build machine, but it would still be useful to be able to test that a build you received was built from the code you expect.
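      For the QA case, the check itself is just a digest comparison of your own build against the one you received. A minimal Python sketch follows, hashing in chunks so large images don't have to fit in memory; the function names are illustrative. It is only meaningful if the build is deterministic, which is exactly the problem described above.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_stream(f: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary file object in chunks so large images need not fit in memory."""
    h = hashlib.sha256()
    for chunk in iter(lambda: f.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


def same_build(local_path: str, received_path: str) -> bool:
    """True if the two files have identical SHA-256 digests."""
    with open(local_path, "rb") as a, open(received_path, "rb") as b:
        return sha256_stream(a) == sha256_stream(b)


if __name__ == "__main__":
    # In-memory stand-ins for two build outputs.
    ours = io.BytesIO(b"pretend this is our build")
    theirs = io.BytesIO(b"pretend this is the build we received")
    print("identical" if sha256_stream(ours) == sha256_stream(theirs) else "different")
```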

      There were several reasons why Apple moved away from the GCC tool chain to LLVM and Clang but one of the abilities of the LLVM stack is that you can actually get identical binaries from the same source compiled on different machines at different times.

      --
      Jesus was a compassionate social conservative who called individuals to sin no more.
    15. Re:Bogus argument by mrogers · · Score: 4, Informative

      The latest alpha release of the Tor Browser uses a deterministic build process for exactly that reason: users of open source software (or the small minority of users with the necessary technical skills) should be able to check that the published binaries match the published source exactly - no malware, no easter eggs, no backdoors. If someone detects a mismatch, they can alert the rest of the community.

      Mike Perry, who spent six weeks getting deterministic builds working for Tor, has some interesting thoughts on why this is an important issue for security tools, even if the users completely trust the developers.

      I'd like to see more open source projects following Tor's lead. Gitian is a deterministic build tool that might help - it enables multiple people to build a binary from the same source and check that they get identical results.
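      The comparison step of that workflow, reduced to its core, might look like the Python sketch below: each builder reports the digest they got, and anyone who disagrees with the majority is flagged. This is not Gitian's code or report format, and the majority rule is an assumption; arguably any single mismatch is worth investigating.

```python
from __future__ import annotations

from collections import Counter
from typing import Optional


def check_consensus(reports: dict[str, str]) -> tuple[Optional[str], list[str]]:
    """Given builder name -> hex digest, return (majority digest, dissenting builders).

    If no digest is reported by a strict majority, return (None, every builder).
    """
    if not reports:
        raise ValueError("no build reports")
    digest, votes = Counter(reports.values()).most_common(1)[0]
    if votes * 2 <= len(reports):
        return None, sorted(reports)
    return digest, sorted(name for name, d in reports.items() if d != digest)


if __name__ == "__main__":
    # Hypothetical builders and shortened digests, for illustration only.
    reports = {"alice": "3f9a", "bob": "3f9a", "carol": "77c1"}
    agreed, dissenters = check_consensus(reports)
    print("agreed digest:", agreed, "| dissenters:", dissenters)
```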

  2. Being able to is nice, but who has the time? by intermodal · · Score: 4, Interesting

    Given the scale of most modern programs' codebases, good luck actually reviewing the code meaningfully in the first place. That said, if you're really that concerned about the binary matching the source, run a source-based distro like Gentoo or Funtoo. For most practical purposes, though, users find binary distributions like Debian/Ubuntu or the various Red Hat-based systems to be a better use of their time.

    --
    In SOVIET RUSSIA... erm...NSA AMERICA, the Internet logs onto YOU!
  3. The obvious thing is by Chrisq · · Score: 4, Insightful

    If you are that paranoid, study the source code, then recompile.

  4. touch o' hyperbole by ahree · · Score: 5, Insightful

    I'd suggest that "severely limiting the whole point of running free software" might be a touch of an exaggeration. A huge touch.

    1. Re:touch o' hyperbole by MozeeToby · · Score: 4, Interesting

      The issue the author is bringing up is that you have no way to easily determine that the published binary is, in fact, functionally identical to the published source code. Imagine you write an app that accesses private data and open source it, saying "check the source, the only thing we use the data for is X". And if you look at the source, that's certainly true. But there's no way to verify that the binary download was built from the published source; especially if the resulting binary is different every time you build it and different if you build it on different machines with different configurations. So, everyone who grabs the binary instead of building from source is taking it on trust, just like proprietary software, that the program does what it claims.

  5. Incorrect suppositions. by Microlith · · Score: 5, Insightful

    A simple analysis shows that this is very hard in practice, severely limiting the whole point of running free software."

    No it doesn't. The whole point of running free software is knowing that I can rebuild the binary (even if the end result isn't exactly the same) and, more importantly, freely modify it to suit my needs rather than being beholden to some vendor.

    1. Re:Incorrect suppositions. by Shoten · · Score: 5, Insightful

      A simple analysis shows that this is very hard in practice, severely limiting the whole point of running free software."

      No it doesn't. The whole point of running free software is knowing that I can rebuild the binary (even if the end result isn't exactly the same) and, more importantly, freely modify it to suit my needs rather than being beholden to some vendor.

      There's another point too...which incidentally is the whole point of running a distro like Gentoo...that you can compile the binary exactly to your specifications, even sometimes optimizing it for your specific hardware. I don't get at all this idea he has about "reproducible builds;" if he builds the same way on the same hardware, he'll get the same binary. But what he's doing is comparing builds in distros with ones he did himself...and the odds that it's the same method used to create the binary are very low indeed.

      If he's concerned about precompiled binaries having been tampered with, he's looking at the wrong protective measure. Hashes and/or signing are what is used to protect against that...not distributing the source code alongside the compiled binary files. If you look at the source code and just assume that a precompiled binary must somehow be the same code "just because," you're an idiot.
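      As a sketch of the hash side of that, here is how checking a download against a published list in `sha256sum` output format (`<hex digest>  <file name>`, or `<hex digest> *<file name>` in binary mode) might look in Python. It assumes the checksum list itself came from a trusted channel; many projects publish a detached signature for that, and verifying the signature is outside this sketch.

```python
from __future__ import annotations

import hashlib

HEX = set("0123456789abcdefABCDEF")


def parse_sha256sums(text: str) -> dict[str, str]:
    """Parse `sha256sum`-style lines: '<64 hex>  <name>' or '<64 hex> *<name>'."""
    table: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        digest = line[:64]
        if (len(line) < 67 or not set(digest) <= HEX
                or line[64] != " " or line[65] not in " *"):
            raise ValueError(f"malformed checksum line: {line!r}")
        table[line[66:]] = digest.lower()
    return table


def verify(name: str, data: bytes, sums_text: str) -> bool:
    """True only if `name` is listed and `data` hashes to the listed digest."""
    expected = parse_sha256sums(sums_text).get(name)
    return expected is not None and hashlib.sha256(data).hexdigest() == expected


if __name__ == "__main__":
    sums = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad  app.bin\n"
    print(verify("app.bin", b"abc", sums))
```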

      --

      For your security, this post has been encrypted with ROT-13, twice.
  6. Not a concern by gweihir · · Score: 4, Insightful

    If you need to be sure, just compile it yourself. If you suspect foul play, you need to do a full analysis (assembler-level or at least decompiled) anyways.

    The claim that this is a problem is completely bogus.

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  7. Problems with verifying the binaries from source by tooslickvan · · Score: 5, Funny

    I have recompiled all my software from the source code and verified that the binaries match but for some reason there's a Ken Thompson user that is always logged in. How did Ken Thompson get into my system and how do I get rid of him?

  8. Trust by bunratty · · Score: 5, Insightful

    I took a graduate-level security class from Alex Halderman (of Internet voting fame) and what I came away with is that security comes down to trust. To take an example, when I walk down the street, I want to stay safe and avoid being run over by a car. If I think that the world is full of crazy drivers, the only way to be safe is to lock myself inside. If I want to function in society, I have to trust that when I walk down the sidewalk that a driver will not veer off the road and hit me.

    When you order a computer, you simply trust that it doesn't have a keylogger or "secret knock" CPU code installed at the factory. It's exactly the same with software binaries, of course. In the extreme case, even examining all the source code will not help. You must trust!

    --
    What a fool believes, he sees, no wise man has the power to reason away.
  9. Re:What a problem by h4rr4r · · Score: 5, Funny

    Hey now, you have to be pretty IT savvy to type ./configure, make and make install all in the same day. Some of us make good money doing that, don't just go suggesting everyone should be doing it.

  10. Diverse Double-Compiling by David A. Wheeler by tepples · · Score: 5, Interesting

    If you've compiled the compiler with competitors' compilers (try saying that ten times fast), you should be fairly safe from Trusting Trust.
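    The logic of Wheeler's check can be illustrated with a toy model in Python. The `Binary`, `run_compiler` and `ddc_check` names are invented for this sketch, which is not Wheeler's formal treatment; it assumes compilation is deterministic and that the second, trusted compiler is not subverted the same way. Compile the suspect compiler's source with the trusted compiler (stage 1), compile the same source again with the stage-1 result (stage 2), and compare stage 2 bit for bit with the suspect binary. Stage 1 differs in bits but not in behavior, so an honest suspect matches stage 2 while a self-reproducing trojan does not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binary:
    """Toy model of a compiled program.

    behavior:   the source whose semantics this binary really implements
    emitted_by: fingerprint of the code generator that produced these bits
    """
    behavior: str
    emitted_by: str


def run_compiler(compiler: Binary, source: str) -> Binary:
    """Compile `source`; the emitted bits depend on what the compiler really does."""
    if compiler.behavior.endswith("+trojan") and source.startswith("compiler"):
        # Thompson-style trojan: silently re-inserts itself into any compiler it builds.
        return Binary(behavior=source + "+trojan", emitted_by=compiler.behavior)
    return Binary(behavior=source, emitted_by=compiler.behavior)


def ddc_check(compiler_source: str, suspect: Binary, trusted: Binary) -> bool:
    """Diverse double-compiling: does `suspect` really come from `compiler_source`?"""
    stage1 = run_compiler(trusted, compiler_source)  # same behavior, different bits
    stage2 = run_compiler(stage1, compiler_source)   # should equal an honest suspect
    return stage2 == suspect


if __name__ == "__main__":
    src = "compiler-a"
    honest = Binary(behavior=src, emitted_by=src)  # built by itself from src
    trojaned = Binary(behavior=src + "+trojan", emitted_by=src + "+trojan")
    trusted = Binary(behavior="compiler-b", emitted_by="compiler-b")
    print("honest passes:", ddc_check(src, honest, trusted))
    print("trojaned passes:", ddc_check(src, trojaned, trusted))
```

    In this toy, if the "trusted" compiler carries the same trojan, the trojaned suspect passes too, which is why the check depends on the second compiler being genuinely independent.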

  11. Re:What a problem by arth1 · · Score: 4, Insightful

    Has anybody thought about recompiling the source and seeing if you get the same binary?

    Has anybody thought of reading the article before posting questions like this?

    That said, this particular "article" isn't worth the waste of bytes it takes up. It's like seeing a 6 year old trying to explain a combustion engine.

    Binaries will almost always differ, if nothing else because you need the entire environment to be exactly like the binary builder's. That goes beyond the time stamps, compile paths, hostnames and account names, which are the obvious ones.
    If your compiler or linker is a minor version off what he used, the results can be very different, even if using the same compile options.
    But that's not enough: If your hardware is different, randomization of functions in a library will be different.

    To flesh out his article a bit more, the author could have done a test with two different Gentoo systems. Different but mostly compatible hardware, and a slight difference in the toolchain. That might have opened his eyes.
    Then again, probably not.

  12. Re:What a problem by TheRaven64 · · Score: 5, Insightful

    Most of the time, even that isn't enough. C compilers tend to embed build-time information as well. For Verilog, the tools often use a random number seed for the genetic algorithm used in place-and-route. Most compilers have a flag to set a specified value for these kinds of parameters, but you have to know what they were set to for the original run.
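    One common way to pin build-time information is the `SOURCE_DATE_EPOCH` environment variable convention from the Reproducible Builds project, which build tools can read instead of the wall clock (for the random-seed case, GCC has `-frandom-seed`). The Python sketch below is a hypothetical build step, `build_stamp`, that embeds a timestamp and honors that variable when it is set; with the variable fixed, two runs produce identical bytes.

```python
from __future__ import annotations

import os
import time
from collections.abc import Mapping
from typing import Optional


def build_stamp(env: Optional[Mapping[str, str]] = None) -> bytes:
    """Hypothetical build step: a banner embedding the build time.

    Uses SOURCE_DATE_EPOCH (seconds since the Unix epoch) when set, else the wall clock.
    """
    env = os.environ if env is None else env
    epoch = env.get("SOURCE_DATE_EPOCH")
    seconds = int(epoch) if epoch is not None else int(time.time())
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(seconds))
    return f"toy-tool built {stamp} UTC".encode()


if __name__ == "__main__":
    pinned = {"SOURCE_DATE_EPOCH": "1371686400"}
    print(build_stamp(pinned) == build_stamp(pinned))  # same pinned time, same bytes
```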

    Of course, in this case you're solving a non-problem. If you don't trust the source or the binary, then don't run the code. If you trust the source but not the binary, build your own and run that.

    --
    I am TheRaven on Soylent News
  13. Required in some industries by mrr · · Score: 5, Interesting

    I work in the gaming (Gambling) industry.

    Many states require us to submit both the source code and the build tools required to make an exact (and I mean 'same md5sum') copy of the binary that is running on a slot machine on the floor, to an extent that would blow you away.

    They need to be able to go to the floor of a casino, rip out the drive or card containing the software, take it back to THEIR office, and build another exact image of the same drive or SD card.

    md5sum from /dev/sda and /dev/sdb must match.

    I can tell you the amount of effort that goes into this is monumental. There can be no dynamically generated symbols at compile time. The files must be built, compiled, and written to disk exactly the same way every time. The filesystem can't have modification or creation times, because those would change.
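    The same discipline applies to anything packaged into an image. As an illustration only (a ZIP archive built with Python's `zipfile`, not how a slot machine image is actually produced), this sketch fixes every member's timestamp, permissions, creator system, compression and order, so the same inputs give the same bytes on repeated runs. The function name is hypothetical, and the sketch does not claim byte-identical output across Python versions.

```python
from __future__ import annotations

import io
import zipfile

FIXED_TIME = (1980, 1, 1, 0, 0, 0)  # earliest date a ZIP header can store


def deterministic_zip(files: dict[str, bytes]) -> bytes:
    """Pack name -> bytes into a ZIP whose bytes depend only on the inputs."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in sorted(files):                   # fixed member order
            info = zipfile.ZipInfo(name, date_time=FIXED_TIME)
            info.create_system = 3                   # record "Unix" on every host
            info.external_attr = 0o644 << 16         # fixed rw-r--r-- permissions
            info.compress_type = zipfile.ZIP_STORED  # no compressor-version drift
            zf.writestr(info, files[name])
    return buf.getvalue()


if __name__ == "__main__":
    first = deterministic_zip({"b.txt": b"two", "a.txt": b"one"})
    second = deterministic_zip({"a.txt": b"one", "b.txt": b"two"})
    print("identical archives:", first == second)
```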

    This is a silly idea for open source software, the only industry I've seen apply it is perhaps the least-open one in the world.

  14. Philips multimedia devices and GPL by taara · · Score: 4, Interesting

    One example is Philips TVs and Blu-ray players built on Linux. When you ask for the source code, it is provided, but there is no way to ensure that the source code is for the device, because the provided binaries are encrypted and signed.

  15. Bad choice of target by ray-auch · · Score: 4, Informative

    Bad choice of target - .Net does actually have multiple compilers available, including open source. But more to the point for this discussion, it has multiple DEcompilers available, including open source.

    Want to know what that nasty MS compiler put in your .Net binary ? - run it through ILSpy.

    Don't trust the ILSpy binary - decompile it with itself, or with a.n.other decompiler.

    In fact, because .Net decompiles so well, the problem of this article (binaries don't compare) just doesn't occur. Want to check your .Net binary against the supposed source ? - easy (well, a hell of a lot easier than with C++). Build your binary from the source, decompile both binaries and compare the two sets of decompiled source. It works, it is consistent and reliable, and it is one hell of a lot more useful at showing up differences than comparing two binaries.
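    The compare step of that workflow can be sketched in Python with `difflib`. It assumes both binaries have already been decompiled to text (for example with ILSpy) and that the outputs are passed in as strings; driving the decompiler from a script is not shown, and the function name and sample output are hypothetical.

```python
import difflib


def diff_decompiled(ours: str, theirs: str,
                    ours_name: str = "rebuilt", theirs_name: str = "shipped") -> list:
    """Unified diff of two decompiler outputs; an empty list means they match."""
    return list(difflib.unified_diff(
        ours.splitlines(keepends=True),
        theirs.splitlines(keepends=True),
        fromfile=ours_name,
        tofile=theirs_name,
    ))


if __name__ == "__main__":
    # Invented decompiler output for a single method.
    rebuilt = "int Add(int a, int b)\n{\n    return a + b;\n}\n"
    shipped = "int Add(int a, int b)\n{\n    Log(a, b);\n    return a + b;\n}\n"
    print("".join(diff_decompiled(rebuilt, shipped)), end="")
```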