Are You Sure This Is the Source Code?
oever writes "Software freedom is an interesting concept, but being able to study the source code is useless unless you are certain that the binary you are running corresponds to the alleged source code. It should be possible to recreate the exact binary from the source code. A simple analysis shows that this is very hard in practice, severely limiting the whole point of running free software."
"Exact binaries" is not the point of having the source code.
Given the scale of most modern programs' codebase, good luck actually reviewing the code meaningfully in the first place. That said, if you're really that concerned about the code matching the source, run a source-based distro like Gentoo or Funtoo. For most practical purposes, though, users find binary distributions like Debian/Ubuntu or the various Red Hat-based systems to be more effective in regards to their time.
In SOVIET RUSSIA... erm...NSA AMERICA, the Internet logs onto YOU!
If you are that paranoid study the source code then recompile
I'd suggest that "severely limiting the whole point of running free software" might be a touch of an exaggeration. A huge touch.
No it doesn't. The whole point of running free software is knowing that I can rebuild the binary (even if the end result isn't exactly the same) and, more importantly, freely modify it to suit my needs rather than being beholden to some vendor.
...or just using a binary that you compiled from binary yourself.
For a lot of projects, that's not nearly as hard as some people like to make it sound.
A Pirate and a Puritan look the same on a balance sheet.
If you need to be sure, just compile it yourself. If you suspect foul play, you need to do a full analysis (assembler-level or at least decompiled) anyways.
The claim that this is a problem is completely bogus.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
It's a fair argument. If you are not compiling your binaries, how do you know what you have is compiled from the source you have available?
Truth? You don't. If you suspect something, you should investigate.
I have recompiled all my software from the source code and verified that the binaries match but for some reason there's a Ken Thompson user that is always logged in. How did Ken Thompson get into my system and how do I get rid of him?
> Are You Sure This Is the Source Code?
Yes. Yes I am sure. I built it myself. It even includes a few of my own personal tweaks. It does a couple of things that the normal binary version doesn't do at all.
A Pirate and a Puritan look the same on a balance sheet.
1) Submitter is the one who wrote the blog post 2) No cross-reference, no references, no differing opinions at all 3) "severely limiting the whole point of running free software" is more than a bit of an exaggeration
Religous speak to God. Insane are spoken to by God. When all shut up, one can finally hear Shostakovich in peace
I took a graduate-level security class from Alex Halderman (of Internet voting fame) and what I came away with is that security comes down to trust. To take an example, when I walk down the street, I want to stay safe and avoid being run over by a car. If I think that the world is full of crazy drivers, the only way to be safe is to lock myself inside. If I want to function in society, I have to trust that when I walk down the sidewalk that a driver will not veer off the road and hit me.
When you order a computer, you simply trust that it doesn't have a keylogger or "secret knock" CPU code installed at the factory. It's exactly the same with software binaries, of course. In the extreme case, even examining all the source code will not help. You must trust!
What a fool believes, he sees, no wise man has the power to reason away.
..are a bitch. The amount of hoops eg. the bitcoin developers jump through to proof they didn't mess with the build are large. Running specific OS build in emulators with fake system time and whatnot. No easy task.
Hey now, you have to be pretty IT savvy to type ./configure, make and make install all in the same day. Some of us make good money doing that, don't just go suggesting everyone should be doing it.
I do IC design. Logical Equivalency Checking is well worn tool. You can futz about with the logic in a lot of different ways. LEC means we can do all sorts of optimization and still guarantee equivalent function. We can even move logic from cycle to cycle and have it checked that things are logically equivalent.
You run two compilers on the same source code you won't get the same code. You run two different versions of the compiler on the same code you wont' get the same code. You run the same compiler with different options you won't get the same code. They should however all be logically equivalent.
If you've compiled the compiler with competitors' compilers (try saying that ten times fast), you should be fairly safe from Trusting Trust.
Has anybody thought about recompiling the source and seeing if you get the same binary?
Has anybody thought of reading the article before posting questions like this?
That said, this particular "article" isn't worth the waste of bytes it takes up. It's like seeing a 6 year old trying to explain a combustion engine.
Binaries will almost always differ - if nothing else because you need the entire environment exactly like the binary builder. Not just the time stamps, compile paths, hostnames and account names, which are the obvious.
If your compiler or linker is a minor version off what he used, the results can be very different, even if using the same compile options.
But that's not enough: If your hardware is different, randomization of functions in a library will be different.
To flesh out his article a bit more, the author could have done a test with two different Gentoo systems. Different but mostly compatible hardware, and a slight difference in the toolchain. That might have opened his eyes.
Then again, probably not.
This a problem that doesn't exist. You establish a chain of evidence and authority for the binaries via signing and checksums, starting with the upstream. Upstream publishes source and there's signing of the announcement which contains checksums. Package maintainer compiles the source. The generated package includes checksums. Your repo's packages are signed by the repo's key.
You can, at any point in time with most packaging systems, verify that every single one of your installed binaries' checksums match the checksums of the binaries generated by the package maintainer.
If you don't trust the maintainer to not insert something evil, download the distro source package and compile it yourself.
If you suspect the distro source package, all you have to do is run a checksum of the copy of the upstream tarball vs the tarball inside the source package, and then all you need to do is review the patches the distro is applying.
If you suspect the upstream, you download it and spend the next year going through it. Good luck...
Please help metamoderate.
Most of the time, even that isn't enough. C compilers tend to embed build-time information as well. For verilog, they often use a random number seed for the genetic algorithm for place-and-route. Most compilers have a flag to set a specified value for these kinds of parameter, but you have to know what they were set to for the original run.
Of course, in this case you're solving a non-problem. If you don't trust the source or the binary, then don't run the code. If you trust the source but not the binary, build your own and run that.
I am TheRaven on Soylent News
./configure, make, make install assumes you're building on the target machine. Many times you want to build on one machine and deploy on another. Even now, there are a lot of packages that don't work properly when cross-compiling. So you end up hardcoding config files, overriding options, patching the source/Makefiles, etc.
Also, in our environment we need to isolate the build system from the host environment to avoid contamination from the host libraries, and we need to version-control the build system so that we can go back and build the same product we built three years ago for the purposes of fixing a bug for a paying client.
So while open-source helps a lot, many times it takes significant effort to bring in some arbitrary package and build it from source.
Finally, someone gets it. The backdoor is never where you're looking for it.
I work in the gaming (Gambling) industry.
Many states require us to submit both the source code and build tools required to make an exact (and I mean 'same md5sum') copy of the binary that is running on a slot machine on the floor.. to an extent that would blow you away.
They need to be able to go to the floor of a casino, rip out the drive or card containing the software, take it back to THEIR office, and build another exact image of the same drive or SD card.
md5sum from /dev/sda and /dev/sdb must match.
I can tell you the amount of effort that goes into this is monumental. There can be no dynamically generated symbols at compile time. The files must be built compiled and written to disk exactly the same every time. The filesystem can't have modify or creation times because those would change.
This is a silly idea for open source software, the only industry I've seen apply it is perhaps the least-open one in the world.
However, thinking about this logically most places you can get the source code and executable from the same place... and if the executable matches... how paranoid can you be?
How paranoid do you want to be? Reflections on Trusting Trust - Ken Thompson
Today, in day to day practice, you are on "reasonably safe grounds" if you get the executable from either the authoritative source, or an associated mirror, and it matches the published cryptographic checksum/hash value. (md5, SHA, etc.) Of course if you can build from source, after checking the checksum of the source archive, and of any libraries you need to add, you should be in good shape as well. (And it isn't necessarily a bad thing to plan ahead and grab a copy, and then wait a little bit for either source or patches. Sometimes a patch turns out to break things. Rarely you will find out there was an intrusion last week at the site you grabbed your software from. Not being on the bleeding edge sometimes give you added buffer.) And I would avoid building and testing on production systems - use separate build & test systems, even if they are Virtual Machines like VMware or VirtualBox.
It is a good practice to make use of checksums to check the validity of important files being copied or archived as well since sometimes the process can go badly for various reasons.
Your point about Windows source from Nigeria is spot on. Dealing in stolen code, more generally, is seldom an aid to doing anything legal, and may cause enormous problems.
much of left-wing thought is a kind of playing with fire by people who don't even know that fire is hot - George Orwell
One example being Philips TV or BluRay built on Linux. When asked for source code, it is provided, but there are no way to ensure that the source code is for the device, because the provided binaries are encrypted and signed.
I can only recommend you to read this: http://cm.bell-labs.com/who/ken/trust.html
Not only is limited in that way- which itself is an interesting fact, but it's limited in a lot of other ways also.
For one, source code is often bad, as in impenetrable, just off the top of my head-
* Realms of private, non-API / SPI code which is effectively *how the program actually works* which is also completely undocumented.
* Grotesque architectural errors made by (affordable) beginners which have nevertheless been cast in stone by exposing them publicly (God classes filled with global variables, etc. )
* Telegraphic and or misleading method and variable names, e.g. .VariablesWithMissingVowels, also known as Varwmvwls which nevertheless often serve as the ONLY documentation for that variable or method,
* Unfortunate architectural decisions made early on by experienced programmers who may be proud of those decisions. (tunneling package private methods out to "friend classes") and thus subverting the purpose of package private classes and making the source code scope modifiers an effectively an unreliable indicator of source code scope, for instance)
*500 -1000 line methods with some or all of the above characteristics.
* Just massive code bases- I am facing one with literally half a million classes right now...That's right almost 450,000 classes, in a code base that is deliberately architected to defy built-in scoping rules of the language, so virtually anything could call anything ...
And on and on.
All of these things will never be fixed for reasons we all understand, I presume, but reflect on of what this implies for open source. It implies that the much vaulted idea that more developers will iteratively make the code base better over time is a fiction with respect to the actual quality of the code base itself.
No team is going to stop adding features and create more work for itself in the form of resolving conflicts for the sake of enabling their program to do what it already can do.
This doesn't even get into the whole ego thing.
Worse still, anything exposed as public in any way may have a million clients depending on it and change effectively becomes impossible, open source or not. All things public, or even more precisely all things reachable in the code base by "outsiders" through any device found in the host language whatsoever, intended or otherwise, are effectively unchangeable.
In lieu of a successful campaign to stop development and do a rewrite, only a fork will make any of the above better. Forks are becoming more common, but they fail to sustain their branching a high percentage of the time (57%) and anyways presume the power TO fork and on large project this is harder to achieve.
The net effect is, open source code bases fail to live up to one of the major the promises of open source, iterative improvement of the code base.
It's true that some people may fix bugs that they are motivated for external reasons to correct and it's helpful to look at the code base if you're writing a plugin through a public API, but the code itself is often awful and this awfulness , often produced because of limited time and resources has the ironic effect of driving away many times those resources in the form of all the would-be developers who are just turned off. For those who do partake, the existing code has the effect wasting many multiples of the time originally *saved* as each new developer struggles to make sense of the impenetrable code base.
In my experience there is no easy fix or even pricey one. Original authors are quick to fix on the (self serving) idea that whatever documentation which exists *ought* to be enough and anyone who still has questions must be an *idiot*. Wasting time incrementally slogging around this code becomes some sort of test that the dev is *serious* and *smart* when the reality is more like smart, serious devs came, saw and left without saying a word.
Code quality is only subjective at the edges. Undocumented code should not exist. F
Bad choice of target - .Net does actually have multiple compilers available, including open source. But more to the point for this discussion, it has multiple DEcompilers available, including open source.
Want to know what that nasty MS compiler put in your .Net binary ? - run it through ILSpy.
Don't trust the ILSpy binary - decompile it with itself, or with a.n.other decompiler.
In fact, because .Net decompiles so well, the problem of this article (binaries don't compare) just doesn't occur. Want to check your .Net binary against the supposed source ? - easy (well, a hell of a lot easier than with C++). Build your binary from the source, decompile both binaries and compare the two sets of decompiled source. It works, it is consistent and reliable, and it is one hell of a lot more useful at showing up differences than comparing two binaries.
... not only is this the source code for the binary I am running, but also that the build system actually works. This is because not only might I want to make changes to the source to improve it, but I might want to do so in a hurry to fix a security hole. Since I might need to rebuild and run the built binary, I might as well test and make sure what the build system built really runs. So I just install the binary I built. Then I know for sure. Who needs the distributed binary (it might have a root kit in it).
now we need to go OSS in diesel cars