Are You Sure This Is the Source Code?
oever writes "Software freedom is an interesting concept, but being able to study the source code is useless unless you are certain that the binary you are running corresponds to the alleged source code. It should be possible to recreate the exact binary from the source code. A simple analysis shows that this is very hard in practice, severely limiting the whole point of running free software."
"Exact binaries" is not the point of having the source code.
I think I'm done with slashdot. The "articles" have just become tweets in disguise.
Has anybody thought about recompiling the source and seeing if you get the same binary?
Given the scale of most modern programs' codebase, good luck actually reviewing the code meaningfully in the first place. That said, if you're really that concerned about the code matching the source, run a source-based distro like Gentoo or Funtoo. For most practical purposes, though, users find binary distributions like Debian/Ubuntu or the various Red Hat-based systems to be more effective in regards to their time.
In SOVIET RUSSIA... erm...NSA AMERICA, the Internet logs onto YOU!
If you are that paranoid, study the source code, then recompile.
Lots of builds include a timestamp or use it so this isn't always guaranteed.
I like to use auto-generated hash signatures of code in my builds when I want to know an exact version or even exact build of the same source tree.
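A rough sketch of what such an auto-generated signature could look like, assuming you simply want one SHA-256 over every file in the tree. The function name `tree_digest` and the length-prefixed layout are made up for illustration, not taken from any particular build system:

```python
import hashlib
from pathlib import Path


def tree_digest(root: str) -> str:
    """Return one SHA-256 hex digest covering every regular file under root."""
    base = Path(root)
    digest = hashlib.sha256()
    # Sort by relative path so the result does not depend on directory order.
    entries = sorted(
        (p.relative_to(base).as_posix(), p)
        for p in base.rglob("*")
        if p.is_file()
    )
    for rel, path in entries:
        name = rel.encode("utf-8")
        data = path.read_bytes()
        # Length-prefix the name and the content so boundaries are unambiguous.
        digest.update(len(name).to_bytes(8, "big") + name)
        digest.update(len(data).to_bytes(8, "big") + data)
    return digest.hexdigest()
```

Embedding the result in the build (for example as a version string) ties a binary to an exact source tree, though it says nothing about the compiler or flags that were used.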
The trouble with trying to get an exact match is there are so many variables. Do you have the same operating system, the same architecture, the same versions of the same libraries, the same version of the same compiler? What about the same compiler flags? Unless all of those things are an exact match the odds against getting a matching binary are slim. Really, though, it becomes a bit of a moot point because, once you have the source code, you can create your own binary and don't have to wonder if the previous binary was a match.
Now you have a binary which "corresponds" to the source code.
Thus narrowing the issue to binaries in stage3 archive.
I'd suggest that "severely limiting the whole point of running free software" might be a touch of an exaggeration. A huge touch.
"severely limiting the whole point of running free software"
Yet somehow we survive!
No it doesn't. The whole point of running free software is knowing that I can rebuild the binary (even if the end result isn't exactly the same) and, more importantly, freely modify it to suit my needs rather than being beholden to some vendor.
If you need to be sure, just compile it yourself. If you suspect foul play, you need to do a full analysis (assembler-level or at least decompiled) anyways.
The claim that this is a problem is completely bogus.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
It's a fair argument. If you are not compiling your binaries, how do you know what you have is compiled from the source you have available?
Truth? You don't. If you suspect something, you should investigate.
The point of free software isn't that you can know that a particular binary is from particular code.
The point is that you have the code available for inspection and that you can modify and build it yourself.
If your build behaves differently it will soon become clear that that binary is not the same.
I have recompiled all my software from the source code and verified that the binaries match but for some reason there's a Ken Thompson user that is always logged in. How did Ken Thompson get into my system and how do I get rid of him?
> Are You Sure This Is the Source Code?
Yes. Yes I am sure. I built it myself. It even includes a few of my own personal tweaks. It does a couple of things that the normal binary version doesn't do at all.
A Pirate and a Puritan look the same on a balance sheet.
Most distributions use mostly identical software, so chances are you end up with identical gcc and so are comparing identical behaviours. Not very useful.
FreeBSD now has a binary patch system and to that end someone worked out how to create binary diffs from freshly built packages against older ones. One of the major pitfalls is timestamps inserted by the compiler. Adjust for that and re-creating suddenly gets a lot more predictable.
Apparently this "tester" hasn't taken a very close look at what is really happening, but thought it more important to wax lyrical about his dreams and then moan that he couldn't make them reality.
Well, he hasn't really tried, I say. Consequently, his blogged moaning is a waste of time.
1) Submitter is the one who wrote the blog post 2) No cross-reference, no references, no differing opinions at all 3) "severely limiting the whole point of running free software" is more than a bit of an exaggeration
Religious speak to God. Insane are spoken to by God. When all shut up, one can finally hear Shostakovich in peace
I took a graduate-level security class from Alex Halderman (of Internet voting fame) and what I came away with is that security comes down to trust. To take an example, when I walk down the street, I want to stay safe and avoid being run over by a car. If I think that the world is full of crazy drivers, the only way to be safe is to lock myself inside. If I want to function in society, I have to trust that when I walk down the sidewalk that a driver will not veer off the road and hit me.
When you order a computer, you simply trust that it doesn't have a keylogger or "secret knock" CPU code installed at the factory. It's exactly the same with software binaries, of course. In the extreme case, even examining all the source code will not help. You must trust!
What a fool believes, he sees, no wise man has the power to reason away.
..are a bitch. The number of hoops that e.g. the bitcoin developers jump through to prove they didn't mess with the build is large: running a specific OS build in emulators with a fake system time and whatnot. No easy task.
If this means that much to you, why not just use a source based distro like Gentoo (You can have the added bonus of it being tuned to your system)?
I do IC design. Logical Equivalency Checking is a well-worn tool. You can futz about with the logic in a lot of different ways. LEC means we can do all sorts of optimization and still guarantee equivalent function. We can even move logic from cycle to cycle and have it checked that things are logically equivalent.
If you run two compilers on the same source code, you won't get the same code. If you run two different versions of the compiler on the same code, you won't get the same code. If you run the same compiler with different options, you won't get the same code. They should, however, all be logically equivalent.
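To make "logically equivalent" concrete, here is a toy sketch: two differently structured versions of a 2-to-1 multiplexer, compared on every possible input. Real equivalence checkers use far more scalable techniques than brute force. The names `mux_direct`, `mux_gates` and `equivalent` are invented for this example:

```python
from itertools import product


def mux_direct(sel: bool, a: bool, b: bool) -> bool:
    """Multiplexer written as a conditional: a when sel is set, else b."""
    return a if sel else b


def mux_gates(sel: bool, a: bool, b: bool) -> bool:
    """The same multiplexer restructured into AND/OR/NOT gates."""
    return (sel and a) or ((not sel) and b)


def equivalent(f, g, n_inputs: int) -> bool:
    """Compare f and g on every combination of n_inputs boolean inputs."""
    return all(
        f(*bits) == g(*bits)
        for bits in product([False, True], repeat=n_inputs)
    )
```

Here `equivalent(mux_direct, mux_gates, 3)` holds even though the two definitions look nothing alike, which is the situation with two builds of the same source. Exhaustive comparison only works for tiny input spaces, so this is an illustration of the idea and not a way to compare real binaries.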
I thought I knew Slashdot's source code... then Boom! I find this:
<meta http-equiv="refresh" content="600">
If you've compiled the compiler with competitors' compilers (try saying that ten times fast), you should be fairly safe from Trusting Trust.
Unless I'm missing something pretty profound, even having the exact *source* won't always result in the exact binary. My understanding (and I could be wrong about this) is that you can take a well written program and plug it into multiple compilers. GCC may be one of the most popular options, but it's not the only one.
But compilers all optimize differently. GCC 3.x optimizes somewhat differently than GCC 4.x. You can tweak this behavior by manually setting compiler flags, or you can compile binaries that explicitly target different CPU architectures. A binary compiled to target all x86 processors may run differently on Haswell than a binary that's compiled specifically for Haswell.
In other words, flags set at compile time will change performance characteristics, even if the source code is identical, and while some projects may publish the exact details of every compiler flag they set, this doesn't seem to be the norm. Most projects I've seen say "Here are some binaries, and here's the source code if you want to play with it."
Clearly, the point of source code isn't to exactly duplicate every binary in every situation but to give you the data that goes *into* the compiler before the executable is compiled.
Or am I missing something?
I've dealt with a case where a regulatory authority must review code and perform the build to match compiled artifacts with distributed binaries in a (large, Linux-based) embedded system. You can do it if you have absolute control over the build environment.
Funny things come up when you start analyzing compiled or archived build output. I had to modify squashfs tools to prevent uninitialized superblock struct members from causing unreproducible file systems... there are unused members in the struct that just pick up whatever happens to be on the stack at the time and put it in the file archive. In another case I wrote a cpio archive normalizer to 'fix' things like the device major/minor number that gets recorded in the archive. Also, readdir(3) does not sort, which matters when making reproducible archives. There are predefined compiler macros (__TIME__, for instance) that will embed a timestamp in an object file, which can be trouble as well. Also, gzip has a flag (-n, I believe) to prevent it from sticking the original file name and timestamp in a compressed file.
Hexdump, diff and md5sum are your friends. It's possible to do this but you have to go deep.
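A minimal sketch of the archive-normalizing idea, assuming a tree of regular files only. It sorts the entries, since directory listings come back in no particular order, and it zeroes the owner, mode and timestamp fields in both the tar and gzip headers. The function name `normalized_tar_gz` is hypothetical:

```python
import gzip
import io
import os
import tarfile


def normalized_tar_gz(src_dir: str) -> bytes:
    """Pack src_dir into a .tar.gz whose bytes depend only on names and contents."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(src_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            arcname = os.path.relpath(path, src_dir).replace(os.sep, "/")
            entries.append((arcname, path))
    entries.sort()  # os.walk, like readdir(3), promises no particular order

    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for arcname, path in entries:
            with open(path, "rb") as fh:
                data = fh.read()
            # A fresh TarInfo carries no owner names; fix the rest explicitly.
            info = tarfile.TarInfo(name=arcname)
            info.size = len(data)
            info.mtime = 0
            info.mode = 0o644
            info.uid = info.gid = 0
            tar.addfile(info, io.BytesIO(data))

    out = io.BytesIO()
    # mtime=0 keeps the build time out of the gzip header; filename="" omits the name.
    with gzip.GzipFile(filename="", fileobj=out, mode="wb", mtime=0) as gz:
        gz.write(raw.getvalue())
    return out.getvalue()
```

Two trees with the same names and contents should then produce the same bytes on the same machine, whatever their creation order or mtimes. The compressed stream can still differ between zlib versions, so comparing the uncompressed tar is the safer check across machines.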
err.. WHAT?
I have recompiled all my software from the source code and verified that the binaries match
How many different compilers did you use? Did you try any cross-compilers, such as compilers on Linux/ARM that target Windows/x86 or vice versa?
How did Ken Thompson get into my system
See bunratty's comment.
and how do I get rid of him?
See replies to bunratty's comment.
And even building in Linux with GNU, I have come across problems with source that wouldn't compile and the endless chase of dependencies and libraries. And having problems with libraries no longer supported or not supported on my platform - *cough*Ubuntu*cough*.
This is a problem that doesn't exist. You establish a chain of evidence and authority for the binaries via signing and checksums, starting with the upstream. Upstream publishes source and there's signing of the announcement which contains checksums. Package maintainer compiles the source. The generated package includes checksums. Your repo's packages are signed by the repo's key.
You can, at any point in time with most packaging systems, verify that every single one of your installed binaries' checksums match the checksums of the binaries generated by the package maintainer.
If you don't trust the maintainer to not insert something evil, download the distro source package and compile it yourself.
If you suspect the distro source package, all you have to do is run a checksum of the copy of the upstream tarball vs the tarball inside the source package, and then all you need to do is review the patches the distro is applying.
If you suspect the upstream, you download it and spend the next year going through it. Good luck...
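The tarball comparison in that chain is only a few lines. A sketch, assuming you have already downloaded the upstream tarball and extracted the one inside the distro source package to local paths (the function names are made up):

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large tarballs need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file(upstream_path: str, packaged_path: str) -> bool:
    """True when both files are byte-for-byte identical (up to SHA-256 collisions)."""
    return sha256_of(upstream_path) == sha256_of(packaged_path)


def matches_published(path: str, published_hex: str) -> bool:
    """Compare a file against a checksum copied from a signed announcement."""
    return sha256_of(path) == published_hex.strip().lower()
```

A mismatch does not prove foul play by itself: some distributions repack upstream tarballs, in which case you would have to compare the unpacked contents instead.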
./configure, make, make install assumes you're building on the target machine. Many times you want to build on one machine and deploy on another. Even now, there are a lot of packages that don't work properly when cross-compiling. So you end up hardcoding config files, overriding options, patching the source/Makefiles, etc.
Also, in our environment we need to isolate the build system from the host environment to avoid contamination from the host libraries, and we need to version-control the build system so that we can go back and build the same product we built three years ago for the purposes of fixing a bug for a paying client.
So while open-source helps a lot, many times it takes significant effort to bring in some arbitrary package and build it from source.
Even if you have the source, it doesn't mean you can confirm what the binary is doing. See the classic "Trusting Trust" attack, which is decades old. In my experience the most common reason for binaries not being reproducible is build timestamps being embedded into the binary. For example, the ar command added the D flag in the past few years exactly for the purpose of being able to output reproducible results (see the man page at http://linux.die.net/man/1/ar). It's true that reproducible binaries are probably a good thing from a security standpoint, but in practice it can be a lot of work to make sure the build produces them. And even then, as Thompson showed, that doesn't always guarantee that what you see is what you get.
We frequently discover a bug and need to fix it without upversioning the whole package (which could result in other incompatibilities with the rest of the system).
So we track down the code for the version we're using, get it building from source with suitable config options, and then fix the bug. In the simple case the bugfix is present in a later version and we can just backport it. In the tricky case you need to get familiar enough with the code to fix it (and hopefully in a way that the upstream maintainers will accept).
Depending on compiler options, some code that isn't completely valid (no overflow/underflow/etc.) can end up logically completely different when you turn on optimization.
Finally, someone gets it. The backdoor is never where you're looking for it.
I thought the point of open source software (from the user end) is so that you can get it for free to do some trivial task that you only need to do a few times, where buying some commercial software would be a waste of money (or so you don't have to find cracks or keys for the commercial software).
> It should be possible to recreate the exact binary from the source code. A simple analysis shows that this is very hard in practice, severely limiting the whole point of running free software...
I want the binaries to be different, because my PC is a particular combination of parts.
Binaries being different is the whole point of having a single source code for a Free Software app.
Alas, who cares about binaries? We're reaching a point where things are compiled just-in-time!
PS: All this is my personal opinion.
I work in the gaming (Gambling) industry.
Many states require us to submit both the source code and build tools required to make an exact (and I mean 'same md5sum') copy of the binary that is running on a slot machine on the floor.. to an extent that would blow you away.
They need to be able to go to the floor of a casino, rip out the drive or card containing the software, take it back to THEIR office, and build another exact image of the same drive or SD card.
md5sum from /dev/sda and /dev/sdb must match.
I can tell you the amount of effort that goes into this is monumental. There can be no dynamically generated symbols at compile time. The files must be built compiled and written to disk exactly the same every time. The filesystem can't have modify or creation times because those would change.
This is a silly idea for open source software, the only industry I've seen apply it is perhaps the least-open one in the world.
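When two images don't match, a bare md5sum only tells you that they differ, not where. A small sketch that reports the offset of the first differing byte, assuming both inputs are ordinary buffered binary files or image dumps (the function name is invented):

```python
from typing import BinaryIO, Optional


def first_difference(a: BinaryIO, b: BinaryIO, chunk_size: int = 1 << 20) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical."""
    offset = 0
    while True:
        chunk_a = a.read(chunk_size)
        chunk_b = b.read(chunk_size)
        if chunk_a != chunk_b:
            for i, (x, y) in enumerate(zip(chunk_a, chunk_b)):
                if x != y:
                    return offset + i
            # Same prefix but different lengths: they differ where the shorter ends.
            return offset + min(len(chunk_a), len(chunk_b))
        if not chunk_a:
            return None  # both streams ended without a mismatch
        offset += len(chunk_a)
```

Knowing the offset lets you map the difference back to a file, a timestamp field or a filesystem structure, which is usually the first step in finding out what the build did differently.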
One example is a Philips TV or Blu-ray player built on Linux. When asked for source code, it is provided, but there is no way to ensure that the source code is for the device, because the provided binaries are encrypted and signed.
I write embedded control firmware for MSP430 processors, building and debugging with IAR Embedded Workbench. In production I build each version to two targets with identical source files but with the single change of different loader output file formats, one for the TI gang programmer used in production, another for the field update loader that we must sometimes distribute to update customers' systems. A third output format (with debug information) is needed if I am going to go in through the JTAG port to do any debugging. Surprise: the resulting memory images from any two of these builds using the same source files have not been identical any time that I have checked. There is no hash nor any date field by the time the image is loaded and I make the comparison with the contents of target hardware memory. In this case, the linker does not always place modules in the same order, and that seems to account for the difference. As far as I can tell they are always linked correctly and so far the program images always seem to have identical functionality, but it means that I cannot use the memory compare function of the JTAG debugger to verify a memory image that was loaded with either the Gang programmer or our field update loader. I asked IAR about this, and they said that yes, the module order was not guaranteed to be consistent between loader output file formats. So I can be sure that each of these build output files does correspond to a known source, and the same source, and all of them work if any of them do, but the memory images they produce fail comparison. Grumble, grumble.
What difference does it make?
Do you think you're smart enough to detect tampering by reading source code?
To detect tampering run strings on the binary and pipe it to grep. If the following string appears 1.3.6.1.4.1.981 you are fucked.
I was once part of a startup whose project was distributed by a big outfit. They naturally wanted to archive our source, for which they needed to do a proof build. Unfortunately the archiving company's builds didn't match ours. We eventually discovered that our toolchain running on our "official" build machine (an ancient AMD K6 whitebox) didn't generate exactly the same bits as what they were using (~20 bytes were different.)
We never found a functional difference, but they had already accepted our version and would have had to re-QA the new one, which is unbelievably expensive in the commercial software world. IIRC we finally gave them our build machine and bought whatever the archive company was using for the next time.
How did Ken Thompson get into my system
See bunratty's comment.
I hope that wasn't a whooshing sound I just heard....
Why we need code: I was setting up Spring MVC in Tomcat using Fedora 18's package manager version of Spring. As a Struts person, I had never used Spring MVC and was following samples of how to set it up. And I kept getting this error about DispatchServlet.properties not being found. None of my tutorials mentioned that, it was all controller-servlet.xml stuff. (And Java annotations, but I like my ugly warts in XML files where they belong, not in my code.) What the crap is DispatchServlet.properties and why the crap can't it be found? After THREE HOURS I finally figure out that DispatchServlet.properties comes with Spring, but isn't in the JAR file. I downloaded a 300MB-ish distribution from Spring Source and dug around until I found DispatchServlet.properties and put it on my classpath, and Spring MVC started working.
So, no, I don't want to make identical binaries, but I do need the source.
Why are you running the binary if you care about having a version that is faithful to the source code? Just compile your own and never use a precompiled binary. Problem solved.
Virtually all of his findings are traced to differences in date and time and chosen compiler settings and compiler vintage. Unless he can find large blocks of inserted code (not merely data segment differences) he is complaining about nothing.
Using sub-projects is a common problem. Consider a project A that builds upon independent projects B and C. A, B and C are independently developed by three different developers. The source to all three are publicly hosted. A's available source does not include B and C's source, rather it has a link to their respective repositories. A reasonable thing to do.
The problem is that the daily snapshots of B and C that A used to build his binary are not known tags or otherwise identified. This happens all the time, even in projects from Google itself.
No one, especially TFA, mentions gitian? Really?
https://gitian.org/
https://github.com/bitcoin/bitcoin/tree/master/contrib/gitian-descriptors
I thought the point of open source software was so that some unscrupulous guys could charge morons for the privilege of downloading it (a la Open Office) or so that junk sites could trick users into installing their worthless installer and bundle crapware and spyware with it.
Dude is saying if you download a binary dist that you won't be able to compile the source code to match it? Ya, no shit Sherlock. That is why you download the source code and compile it yourself. While there are trusted sources, you never know what is in binary dist. At least, when you compile it yourself, you can examine the source code.
So, is this guy going to tell us that binary dist can have malware next?
Be seeing you...
I can only recommend you to read this: http://cm.bell-labs.com/who/ken/trust.html
Even if you can build your executable from source, small differences in compiler or library versions can change the signature significantly. Add to that the fact that if your compiler has been compromised, you may still be pwned even if the source is clean.
Not only is it limited in that way, which itself is an interesting fact, but it's limited in a lot of other ways also.
For one, source code is often bad, as in impenetrable. Just off the top of my head:
* Realms of private, non-API / SPI code which is effectively *how the program actually works* which is also completely undocumented.
* Grotesque architectural errors made by (affordable) beginners which have nevertheless been cast in stone by exposing them publicly (God classes filled with global variables, etc. )
* Telegraphic and/or misleading method and variable names, e.g. VariablesWithMissingVowels, also known as Varwmvwls, which nevertheless often serve as the ONLY documentation for that variable or method.
* Unfortunate architectural decisions made early on by experienced programmers who may be proud of those decisions (tunneling package-private methods out to "friend classes", for instance, thus subverting the purpose of package-private classes and making the source code scope modifiers an unreliable indicator of actual scope).
* 500-1000 line methods with some or all of the above characteristics.
* Just massive code bases. I am facing one with literally half a million classes right now... That's right, almost 450,000 classes, in a code base that is deliberately architected to defy the built-in scoping rules of the language, so virtually anything could call anything...
And on and on.
None of these things will ever be fixed, for reasons we all understand, I presume, but reflect on what this implies for open source. It implies that the much vaunted idea that more developers will iteratively make the code base better over time is a fiction with respect to the actual quality of the code base itself.
No team is going to stop adding features and create more work for itself in the form of resolving conflicts for the sake of enabling their program to do what it already can do.
This doesn't even get into the whole ego thing.
Worse still, anything exposed as public in any way may have a million clients depending on it and change effectively becomes impossible, open source or not. All things public, or even more precisely all things reachable in the code base by "outsiders" through any device found in the host language whatsoever, intended or otherwise, are effectively unchangeable.
In lieu of a successful campaign to stop development and do a rewrite, only a fork will make any of the above better. Forks are becoming more common, but they fail to sustain their branching a high percentage of the time (57%) and anyways presume the power TO fork and on large project this is harder to achieve.
The net effect is that open source code bases fail to live up to one of the major promises of open source: iterative improvement of the code base.
It's true that some people may fix bugs that they are motivated for external reasons to correct, and it's helpful to look at the code base if you're writing a plugin through a public API, but the code itself is often awful. This awfulness, often produced because of limited time and resources, has the ironic effect of driving away many times those resources in the form of all the would-be developers who are just turned off. For those who do partake, the existing code has the effect of wasting many multiples of the time originally *saved*, as each new developer struggles to make sense of the impenetrable code base.
In my experience there is no easy fix or even pricey one. Original authors are quick to fix on the (self serving) idea that whatever documentation which exists *ought* to be enough and anyone who still has questions must be an *idiot*. Wasting time incrementally slogging around this code becomes some sort of test that the dev is *serious* and *smart* when the reality is more like smart, serious devs came, saw and left without saying a word.
Code quality is only subjective at the edges. Undocumented code should not exist. F
The whooshing sound was David A. Wheeler flushing Ken Thompson down the drain.
I don't understand the problem. If you have the source code and are concerned about the authenticity of the binary, why not just build it yourself and use your own binary?
> unless you are certain that the binary you are running corresponds to the alleged source code.
Yes, i am.
> It should be possible to recreate the exact binary from the source code.
Obviously it is, since that is how i obtained the binary in the first place
> A simple analysis shows that this is very hard in practice, severely limiting the whole point of running free software.
wat?
I'm going back to Windows, then.
but one that is virtually eliminated by simply compiling your own code.
Obviously, as a practical matter, you aren't going to get 100% identical binaries from a given chunk of source unless your build environment is very carefully set up to achieve that end (something that people don't typically bother with).
However, as a matter of theory, I'm left with a question: If I give you a piece of source code and a complete build environment, you can compile and produce a binary in a certain number of operations. If I were to give you a piece of source code, a build environment, and a binary, would there be any general algorithm more efficient than just compiling it and checking whether the output is identical to answer the question "Is this binary a product of that source and build environment?"
Is there any property that you can exploit, if provided with the alleged binary output, to perform a 'verification' operation that is less computationally expensive than a naive 'compilation', or would that be possible only in certain special cases, with no useful general method?
Just download the source and compile it yourself!
Bad choice of target - .Net does actually have multiple compilers available, including open source. But more to the point for this discussion, it has multiple DEcompilers available, including open source.
Want to know what that nasty MS compiler put in your .Net binary ? - run it through ILSpy.
Don't trust the ILSpy binary - decompile it with itself, or with a.n.other decompiler.
In fact, because .Net decompiles so well, the problem of this article (binaries don't compare) just doesn't occur. Want to check your .Net binary against the supposed source ? - easy (well, a hell of a lot easier than with C++). Build your binary from the source, decompile both binaries and compare the two sets of decompiled source. It works, it is consistent and reliable, and it is one hell of a lot more useful at showing up differences than comparing two binaries.
1984 called and they want their problem back! https://en.wikipedia.org/wiki/Reflections_on_Trusting_Trust#Reflections_on_Trusting_Trust
It's a good problem nevertheless.
... not only is this the source code for the binary I am running, but also that the build system actually works. This is because not only might I want to make changes to the source to improve it, but I might want to do so in a hurry to fix a security hole. Since I might need to rebuild and run the built binary, I might as well test and make sure what the build system built really runs. So I just install the binary I built. Then I know for sure. Who needs the distributed binary (it might have a root kit in it).
now we need to go OSS in diesel cars
Recompiling is not enough because you can't trust the compiler either, unless you write your own bootstrapping compiler to compile the compiler.
He says "using the tools that are recommended by the distributions". No idea about the rest, but in openSUSE we use "osc". Nobody in his right mind uses rpmbuild to build an RPM which is supposed to be distributed.
And part of the openSUSE build system happens to be "build-compare". Every time a package changes we automatically rebuild every dependent package (not very efficient, but ensures binary compatibility), so we are very interested in having reproducible builds to avoid unneeded rebuilds. If some code uses __DATE__ or __TIME__ we change it for the modification time of the changelog file.
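To illustrate the timestamp substitution (this is only a sketch of the idea, not how build-compare itself is implemented), the values that `__DATE__` and `__TIME__` would expand to can be derived from a file's modification time instead of the wall clock. The helper names are invented, and using UTC is an assumption made so the result does not depend on the build machine's time zone:

```python
import os
import time
from typing import Tuple

_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def date_time_macros(epoch_seconds: float) -> Tuple[str, str]:
    """Format a timestamp the way C's __DATE__ and __TIME__ are laid out."""
    t = time.gmtime(epoch_seconds)  # UTC, by assumption
    # __DATE__ is "Mmm dd yyyy", with a space-padded day of the month.
    date = f"{_MONTHS[t.tm_mon - 1]} {t.tm_mday:2d} {t.tm_year}"
    # __TIME__ is "hh:mm:ss".
    clock = f"{t.tm_hour:02d}:{t.tm_min:02d}:{t.tm_sec:02d}"
    return date, clock


def macros_from_changelog(changelog_path: str) -> Tuple[str, str]:
    """Use the changelog's modification time as the stable build timestamp."""
    return date_time_macros(os.path.getmtime(changelog_path))
```

Since the changelog only changes when the package changes, rebuilding the same package gives the same strings every time.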
And here should be the picture of that guy in a funny hat who says [something]? Tell me about it.
This article is just screaming for this.. if you want to run exact source code every time, you need to run Gentoo (or ports or alike)!
No, but it's the droid I've been looking for.
Of course the binary is going to be different on different platforms, especially if it is compiled with platform-specific optimizations.
That's the WHOLE FUCKING POINT of running free software.
Unless you're compiling the source with the exact same versions of everything from the IDE to the underlying headers and libraries, you're not going to get a bit-for-bit binary match. Which is, again, the WHOLE FUCKING POINT of running free software.
It should NOT be easy to get the exact same binary on your system as was packaged with the software you're using. Again, whole point.
And run the resulting binaries. Voila, problem solved: you know the binaries you are running correspond to your source code.
If the binary is open source, how do you know that? You can't compile it if it's only open source; you need a license to do that.
And what if it WERE allowed to be compiled, but was compiled using some weird flags or libraries that they didn't distribute? That's allowed by a non-GPL Open Source license.
With BSD, how do you know that the version you have is the one that was used in the closed version?
Again, you don't.
If it were GPL, the license is such that you are given what's needed to compile.
Well, as long as it's GPL3. GPL2 didn't *specifically* require signing keys, so Tivo's GPL2'd OS you cannot check to see that the version it is running is the same one you get from compiling the source they gave you.
A. I don't need to use the provided binary.
B. None of the principles of Free Software involve 'ability to produce identical binaries'.
C. It is possible to determine whether a binary corresponds to source code if malfeasance is sufficiently suspected.
The claim is that you CAN fix it yourself.
And that is true, even if you have to trust the code elsewhere. You have the code, you write a fix, you compile the fix and see if it works.
When it does, you have fixed the bug.
No need to compile the compiler (which used to be the way I installed Linux back in the Yggdrasil and Slack1.0 days...)
The halting problem is often taken to mean "you can't decide whether a program will halt". But that's not what it says. It says there is no general procedure that decides halting for every possible program. A "Hello World" program CAN be proven to halt in its environment, and the vast majority of programs can also be proven to halt.
However, you cannot prove haltability for ALL programs.
You can guarantee equivalent function for programs where you can guarantee the function of one. If you could guarantee that the original program halts, then after re-compiling with different options you can check whether the second version has equivalent function (and if it doesn't, you have proven that the program you compiled is not the same program).
If you gave me the source, the build environment and a binary, what point is there to the binary if I don't trust you to think the source is relevant to the binary?
If that were the case, I'd ignore your binary, compile my own and run that.
Why, then, would I care if your binary was built from that source? I'm not running it.
I can think of two issues (aside from the malicious code issue which is being beaten to death).
First, we can't tell if the binary matches the source, so we can't tell if they're fully complying with the GPL.
Second, since we can't tell if the binary matches the source, if we try to hack around in the source we have the potential to be working in a different build than the published binary and getting wildly different results.
As for the malicious code, if you can compile the build from source and have a byte for byte match, you can be sure that you have the correct source. If there is malicious code, you'll be able to find it later. Or better yet, maybe someone else is verifying it. Does anyone question the value of being able to go back and look at malicious source code to see what it's done?
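For what it's worth, the "byte for byte match" check itself is the easy part. A minimal sketch in Python, assuming you already have the published binary and your own rebuild on disk (the function names and file paths here are made up for illustration):

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def binaries_match(published: Path, rebuilt: Path) -> bool:
    """True only when both files hash to the same SHA-256 digest."""
    return sha256_of(published) == sha256_of(rebuilt)
```

Something like `binaries_match(Path("published/foo"), Path("rebuilt/foo"))` then answers the yes/no question. The hard part is getting the rebuild to come out identical in the first place, which this does nothing to help with.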
Maybe we should make it easier to make reproducible binaries?
Dependencies - do you know what they are? Everything right down to the OS heap layout can affect the layout of a binary. The chances of you being able to 100% audit your build machine and reproduce the exact same binary are slim to zero. Especially with things like randomization, where we're DELIBERATELY randomizing the built executable.
Good luck with that guys, give me a call when you've fixed it.
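To make the timestamp part of this concrete, here is a toy sketch in Python. It is not a compiler: it just packs some files into a tar archive as a stand-in for a "build output", and the `pack` and `build_timestamp` helpers are invented for this example. It shows that the same inputs give different bytes when the clock leaks into the output, and identical bytes once the timestamp and file order are pinned. `SOURCE_DATE_EPOCH` is the environment variable the Reproducible Builds effort uses for pinning build times; everything else is an assumption of the sketch.

```python
import io
import os
import tarfile
import time


def build_timestamp() -> int:
    """Use SOURCE_DATE_EPOCH when it is set, otherwise fall back to the clock."""
    return int(os.environ.get("SOURCE_DATE_EPOCH", time.time()))


def pack(files: dict[str, bytes], mtime: int) -> bytes:
    """Pack files into an uncompressed tar with sorted names and fixed metadata."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as archive:
        for name in sorted(files):  # fixed order, independent of dict order
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            info.mtime = mtime  # the only input that varies between runs
            archive.addfile(info, io.BytesIO(files[name]))
    return buffer.getvalue()
```

Calling `pack(files, build_timestamp())` twice a few seconds apart gives different output unless `SOURCE_DATE_EPOCH` is exported. A real toolchain has many more sources of variation (paths, locale, library versions, randomization) than this toy does.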
I agree 100%. This is just another guy who, after hearing about the NSA spying, suddenly wants to see boogeymen in everything. While I do not in any way advocate, and actually detest, what the NSA has done, I am most pissed that proving the events happened has suddenly given every conspiracy theory nutcase legitimacy, and they can all run around screaming "See I told ya so!" And now that they have something to point a finger at, other people who may have been fence-sitting or just not knowledgeable about the conspiracy culture are actually listening to these guys. Alex Jones does NOT need any more influence on America than he already has. I bet Art Bell has been jumping for joy for weeks now too. They both should have epic ratings.
Fight the binary monoculture.
Did anyone understand the bit about how not generating the exact same binary as someone else "severely limits the whole point of running free software"?
It's maybe somewhat interesting that we don't all build the same binaries, but I'm not sure I get how it prevents the maintenance that proprietary software users have to learn to live without, and surely that's the biggest point of Free Software.
Indeed, presumably the more I take advantage of Free Software's special powers, the more likely that my binary would be different than some arbitrary stale pre-maintenance one.
If you RTFA, the weird way he's looking at the situation is slightly less weird, but still pretty weird. And I don't think he ever really explains his warped view of "the whole point of running free software." But he does have some kind of quasi-security question ... framed just wrong enough to make things hard.
He is asking, "I have this here binary, and I want to know whether or not it matches this source. Is it a match? Yes or no." and he's having a hard time.
But if you're really worried about the binary (i.e. if this is really important), you don't ever ask that question. Instead, you utter the statement, "I want a binary compiled from this source." And once you look at it that way, everything gets really easy. Type "make."
(Is this all really about Ken Thompson's old story?)
I just want to point out an often overlooked difference between Debian and other free OSes: Debian is actually a very comprehensive build system for complete OSes, it's not just a set of packages. A .deb file may seem equivalent to an .rpm, but actually the toolset behind the two formats is as different as night and day.
To the point, a Debian-based OS is not a "binary" distribution as opposed to "source" distributions like Gentoo. In Debian, every package is actually available as both source (.dsc files) and possibly binaries for various architectures (.deb files). The final vendor can opt to create an installation CD for a particular architecture, but there's nothing stopping anyone from creating a source-based CD, too. Debian is a build system designed specifically for free software operating systems, and despite the clunkiness of its toolset, it does its job very well.
So, it's unsurprising that the author found a strong equivalence in Debian. Indeed, the .deb files we get are produced from .dsc files using the equivalent build process he used, but on the vendor's build farm.
Unsurprising, but still worthwhile that he checked this to make sure. So much of computer security is based on trust, and what looks obvious may not be. So, we now have some evidence about Debian's reliability in this particular matter.
If you're concerned about a compromised binary and you have source that you can verify as genuine...you build the package from source and remove the pre-built binary. Just about every modern distribution I can think of has some facility to do this. It's as if he/she is looking at the issue from entirely the wrong angle and creating an issue where there isn't one. If the source for everything down to the compiler used to create that package is available, then it can all be built from source, from verified sources...it would take a long time but can be done, and I'm sure there are more efficient ways of hunting down potential compromises.
I just don't see what the point of his/her argument is. I run Fedora. If I thought that a particular package in Fedora might have a key logger built into it...I'd get the source RPM, read through it to the best of my ability, look it up on forums to see if I missed anything obvious...and if everything looked good, I'd use the source RPM instead of the pre-built binary. For any program used to build that binary, you follow the same process down the chain. It doesn't matter if the pre-built, distro-offered binary matches what you get when you build it from source -- it just matters that what you can build from source is secure and has the same functionality. Besides, as soon as you add in different compiler flags per build, the signature of the binary is going to be substantially different anyway.
Slow news day?
Transfusion (the recreation of Blood in Quake) is advertised as open source software, and it is a GPL derived work. Years ago the lead programmer admitted to hiding the source code because released executables somehow don't count as a release.
Sometimes allegedly open source projects just blatantly violate open source licenses. The real question is: who cares enough to sue?
If the snapshot of your code at compile time is signed and linked to the binary output which is also signed then you can be sure that the two correspond. i.e. You can take your arrow of time and shove it up your straw man's....
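A rough sketch of what "signed and linked" could look like, in Python. Loud assumption: this uses an HMAC with a shared secret key from the standard library as a stand-in for a real public-key signature (GPG or similar), and the manifest layout and function names are invented for illustration. Note that it only proves the builder vouched for the pairing; you still have to trust whoever holds the key.

```python
import hashlib
import hmac
import json


def make_manifest(source: bytes, binary: bytes, key: bytes) -> dict:
    """Record both digests and a keyed tag (HMAC stand-in for a signature)."""
    record = {
        "source_sha256": hashlib.sha256(source).hexdigest(),
        "binary_sha256": hashlib.sha256(binary).hexdigest(),
    }
    # Serialize with sorted keys so the tagged payload is stable.
    payload = json.dumps(record, sort_keys=True).encode()
    record["tag"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return record


def verify_manifest(manifest: dict, source: bytes, binary: bytes, key: bytes) -> bool:
    """Recompute the tag from the given source and binary, then compare."""
    expected = make_manifest(source, binary, key)
    return hmac.compare_digest(expected["tag"], manifest.get("tag", ""))
```

The builder would publish the manifest next to the binary; anyone holding the key (or, with real signatures, the public key) can check that this source snapshot and this binary were tied together at build time.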
"in theory, to build binary packages from source packages that are bit for bit identical to the published binary packages"
Only if I have the exact same development environment as the published binary.
Gentoo Linux would suit your needs, as it compiles from source at the install stage ...
What? Did someone say Gentoo?
Or, indeed, *BSD.
make world!
Ken is a former USG employee. He seeded the fields they now exploit with that "C" thing.
To avoid a targeted attack, just use a signed compiler package, e.g. from Debian.
Unless Debian happens to be compromised at the time you download packages, as it was in November 2003 and July 2006.
tim?
What's the point? Are you imitating Alan Sokal? Trying to lure extraordinarily stupid wannabe terrorists into wasting a huge amount of time? Or did you just want to show the world you have way too much time on your hands yourself? Or is it that confusing concept called "humor"?