Famous Last Words: You can't decompile a C++ program
The Great Jack Schitt writes "I've always heard that you couldn't decompile a program written with C++. This article describes how to do it. It's a bit lengthy and it doesn't seem like the author usually writes in English, but it might just work (haven't tried it, but will when I have time)."
Why would you want to do this unless you were stealing source?
;)
I'd just leave it alone.
Or is it just my ISP?
to find security holes
Now even the story posters don't read or verify the articles they're posting...
Teenagers these days don't have as much sex as they want each other to think they do.
Information is lost in compilation. You can never reconstruct the exact original source. You end up with valid C++ that has no more human-understandable information than the equivalent machine code.
Like turning hamburgers into cows...
Surely he now understands the English infinitive "to be Slashdotted".
"Molest me not with this pocket calculator stuff."
- Deep Thought
I've always heard that you couldn't decompile a program written with C++.
;)
Well, you can decompile every binary program at least to assembler code, so why shouldn't it be possible with C++?
Maybe he meant "you can't decipher the source of a C++ program"
--
One by one the penguins steal my sanity...
A c/c++ decompiler that totally worked would be the Holy Grail of crackers. Unfortunately it is actually impossible to get everything back because lots of info is lost on compilation.
Nevertheless there are tools out there that attempt to decompile programs; I think of them more as ways of making assembly more readable.
Note, a lot of them wouldn't work on hand-written assembly, because they rely on knowledge of how certain compilers compile various things; e.g. there was a Delphi decompiler available.
graspee
Slashdot has DDoS'ed the damn thing into oblivion.
On the other hand, did anyone get to mirror it?
but it'll look like this
class a
{
public:
void b(int c);
void d(int e);
private:
int g;
int h;
};
int main()
{
a f;
f.b(23);
int x; x=0; x++;
if(x > 3) goto j;
f.d(x); x++;
if(x > 3) goto j;
f.d(x); x++;
if(x > 3) goto j;
f.d(x);
j: f.b(42);
return 0;
}
Yeah, but they should know how to decompile the Slashdot effect first... another one down. Anybody with a mirror or Google Cache link?
This sig can be distributed under the LGPL license
Last time I checked 'Intresting' was in the English dictionary.
When anger rises, think of the consequences.
Confucius (551 BC - 479 BC)
(haven't tried it, but will when I have time)
Yeah I dunno how you have time for anything anymore.. having to post duplicate articles and all.
I would find both your educational background and the dictionary you're using very intresting.
Did you mean interesting?
*BBBRRRRTTTT*
Incorrect! Intresting is *NOT* in the dictionary. Interesting, however, is.
Check it out.
Look it up
Ohhh grammar nazis, right.
When anger rises, think of the consequences.
Confucius (551 BC - 479 BC)
*BBBBRRRRRRTTTT*
Incorrect! Spelling Nazi may have been the answer you're looking for.
She always sleeps standing.
Grammar: The system of rules implicit in a language, viewed as a mechanism for generating all sentences possible in that language. I think you'll find that includes spelling.
When anger rises, think of the consequences.
Confucius (551 BC - 479 BC)
Sure you can decompile an optimized and symbol-stripped C++ program, but you'd never get it back in the original compact form of the source, as you do with the Java class file decompilers, due to the heavy use of inline functions and templates in C++. A C program, sure, but decompiling C++ is not terribly useful.
Like turning hamburgers into cows...
It's right here in my Oxbridge English Dictionary. What are you on about?
I'm scared of numbers that can't be written as a fraction. It's an irrational fear.
that thing was slashdotted even in "the mysterious future". hrm. i left the ads on cause that's the only feature i wanted :D
slashdot: where everyone yells sarcastic metaphors to themselves to understand the issue
Would you like me to point out the incorrect structure of your grammar?
When anger rises, think of the consequences.
Confucius (551 BC - 479 BC)
what the hell is a progr?
Its write hear in my Oxbrige Enlish Dictionairy. What are you on about?
"But binary can only sometimes be translated into slightly-readable assembly code."
I don't think you understand what assembly code is, dumbass.
Hell, I'd be happy if the people working for me could consistently compile their c/c++. I need a new job...
.. You Can't Slashdot me!
When you think about it, the higher level the language is, the easier it should be to "decompile". The closer the original source was to asm, the more the individual coder's style will be reflected in the asm - the higher level it is, the more the obvious patterns the compiler uses every time for given constructs will be present. Reverse engineering a program written in asm to human readable source is a nightmare, but if you knew for instance that the source was C++ and it was compiled by gcc 3.2 (easy enough to tell), it's probably pretty easy to see from the asm patterns the classes and whatnot, to see the structure of the source.. then you just have to combine that with what the program actually does to give human meaning back to the variable and class names and whatnot.
11*43+456^2
Would you like me to pull my dick out of your mom's asshole, and scrape off the blood, feces, and sperm and put it in a glass for you? Add a little milk, put it in a blender, and you'll have a nice shake.
Here is some code that supposedly decompiles... not that I've tried it.
Quote from the FAQ:
I would have posted AC but they have me blocked out for some reason...
Davak
Yes please.
When anger rises, think of the consequences.
Confucius (551 BC - 479 BC)
dear anonymous c,
please stop breathing and kill any offspring you may have inexplicably fathered for the sake of our gene pool. Thanks.
Respectfully,
-- Human Race
why run from Vincenzo?
I don't know about you but I pee in the shower all the time!!! intrestingh!!!
a/s/l here. Sorry, adding domain tags to your s
Well, it isn't. Sure, if you're so lazy you want to have source rebuilt from binaries with one click, complete with comments, makefile and documentation, that's of no use. But imagine the program does some very clever trick. Something you ooh about, "How the hell does he do that? It's impossible?". You want to include that trick in your code. You need it. So - you have three options: 1) Try to design it from scratch. Helluva work, you don't know where to start. 2) Look into the binary. If you're an ASM guru, you MAY succeed. But ASM from high-level languages is hell to read. 3) Decompile the puppy, look for that piece through what looks like piles of junk, but is way more readable than ASM and find it. Then just rewrite it in pretty fashion, changing variable names and functions to your needs and include in your own software. It's "the best of the worst", last resort at finding a solution to a small problem. Not a way to edit the source and add a single feature to the original program, like remove print protection from Acrobat Reader. The decompiled program most probably won't be possible to compile. You won't make a cow from hamburgers. But with some luck you may find out the cow was a bull and got killed by a truck.
45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
..sic:
"I work for SGI. I make nearly 6 figures. I know programming, and I damn well know computers. I've been working since 1968 in the computing industry."
While I'm inclined to agree with you in just about every way, your immodesty makes me want to smack you.
I work for a very profitable information services entity. I've been working in the IT Security Industry for five years, after having left the film industry. I know my way around computing, (I started with a Commodore PET in the 70s), but I'm not afraid to say that I don't know something when I need help, or I have an opportunity to learn.
Oh... and I make a comfy bit, that's well into the six figure range. There are plenty of folks who make a bunch more than I do, who know a bunch less. Salary is not an indicator of expertise, in this arena, except maybe where self-promotion is concerned.
You forgot to say that you are a jackass.
FUCK YOU FUCKING ASSHOLE!
For the clueless mod that modded this clownshoes up, he's no other than ekrout
Btw, FUCK BILL FUMEROLA and FUCK SCOTT LONG
Thank you
Brett Glass
Please film this activity. It would make an intresting video.
I work for SGI. I make nearly 6 figures. I know programming, and I damn well know computers. I've been working since 1968 in the computing industry.
And just cuz you're an arrogant, self-important sonofabitch doesn't mean you're wrong!
I damn well know computers. I have been working with them since 1904, when the Black Man made the first computer out of a peanut. I now work for Cray research making 18 figures.
I can scratch a superscalar CPU out of silicon with a pocket knife. I even have friends who can write major programs in binary code (yes, just 1s and 0s)... even though writing a simple "hello world" program can amount to 92,752 bits. I fail to realize that this ability does not a good computer scientist make. Things like intelligent design and research make a CS good.
The parent post is fluff. It's stupid, the man is flamboyant and exaggerating. He clearly has no real education in computer engineering and does not recognize that any executable code can be reverse-engineered or decompiled. Especially since every language (save interpreted languages like Java) is compiled to machine code -- specific, unambiguous, structured code. "Decompiling" this is only really a matter of translating it into your language of choice.
So, Mr. Proud American, please get off your imaginary high horse. You're not fooling anyone.
Think about it: In the Matrix Reloaded, GNU/Smith touches anyone else, and they become GNU/Smiths. He is just as viral as the GPL!
And it's not like they only get a GNU/Smith arm in their otherwise normal bodies, they turn completely GNU/Smith.
Intresting is a perfectly cromulent word.
I've been working in the computing industry since 1995 - school doesn't count - and make over 6 figures (well over if you want to include bonuses, options and restricted shares). What does this have to do with what I do or don't know...? Nothing!
I agree with your comments about the article, but there's no reason to throw salary around to make a point. I make more than most of my friends and try not to rub anyone's nose in the dollars. (The reason I'm posting AC right now.)
6 figures...riiight.
You don't even know what you are talking about.
Here's the text from the original article:
1. Make a copy of the program you want to decompile. Let's assume it's PROG.EXE. Copy it to PROGBACK.EXE.
2. Copy PROGBACK.EXE to a DOS PC if you're not using one.
3. Type EDIT PROGBACK.EXE from C:\ (or where ever you copied it to).
4. Enjoy the source code! You can print it out or change it or just look at it.
5. If you change it, use FILE SAVE.
I've done some reverse-engineering on programs written in C/C++ (Intel x86). After a while you learn how to recognize different things like virtual function calls, while/for-loops, switch and stuff like that. However, it's a totally different thing to decompile to C++. It may be possible to decompile compiled code to C, but don't expect that it will look much like the original source, especially if the code was optimized by the compiler :)
[insert joke about it being hideously ugly with templates here.]
(I did not read the article itself because it is, of course, slashdotted)
The cake is a pie
all modern compilers are optimizing compilers, and they reorganize code completely to suit themselves in the most efficient manner. The compiler will reorganize modules and rewrite lines of code in order to make better use of registers, processor features/limitations that
You cannot really see a programmer's style as a result. When you decompile, you'll get it returned as whatever the compiler shifted the code around as.
-
Don't smart compilers change recursion to iteration automatically so how could it be the same source? That's all I know from my little java knowledge.
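To illustrate the recursion-to-iteration point, here is a minimal sketch, assuming a tail-recursive function (the names gcd_recursive and gcd_iterative are made up for illustration). Many optimizing C++ compilers can turn the first form into something close to the second, so a decompiler working from such a binary would most likely hand back a loop rather than the recursion the programmer wrote.

```cpp
#include <cassert>  // used by the checks in the usage note

// Recursive form: the recursive call is the last thing the function does
// (a tail call), which is what makes the rewrite into a loop possible.
unsigned gcd_recursive(unsigned a, unsigned b) {
    return b == 0 ? a : gcd_recursive(b, a % b);
}

// Loop form: roughly what tail-call elimination produces, and what a
// decompiler would probably reconstruct from the optimized binary.
unsigned gcd_iterative(unsigned a, unsigned b) {
    while (b != 0) {
        unsigned next = a % b;  // same step as the recursive call's arguments
        a = b;
        b = next;
    }
    return a;
}
```

Both forms compute the same thing; for instance `assert(gcd_recursive(48, 18) == gcd_iterative(48, 18));` holds. The source is not the same, but the behavior is.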
Anyone recommend a java decompiler known to work on the most recent versions of java, properly?
Something that will literally give me code I can re-compile immediately?
There seem to be a lot of people in this story saying "shame on you for reverse engineering". It has its uses: how else would viruses, worms, and trojans be analyzed to figure out what they do and how they do it?
well, when SGI lays you off this week, you're going to have plenty of time to learn how to create programs in binary, just like your friends.
scott king
not the source's lies.
Losing source code and var names (name spaced globals aka statics and scoped locals) allows the cracker (these are rarely hacking tools, they're mostly cracking tools,) to focus on what the machine actually was told to do instead of smothering it with shades of meaning which interfere with understanding the code.
C++ or Java or Smalltalk, or almost any highly structured language using machine code libraries or virtual machines, results in structured blocks of code and heap and stack allocation.
A good decompiler can take the machine code, peel away the name spaces and code calls, extract the patterns in the code and the hacker/cracker can read the patterns instead of wasting time on the code.
Forensic analysis work is extremely useful at telling you what happened when something dies but it is no good at telling you how something worked. For that you need code traces.
Map those code traces onto the structure the decompiler reveals and you understand the program better than the authors/coders.
MSBPodcast.com The opinions expressed here are my own. If you don't like 'em... Think up your own stuff.
OrgName: University of Oklahoma, Health Sciences Center
OrgID: UOHSC
Address: P.O.Box 26901
City: Oklahoma City
StateProv: OK
PostalCode: 73190
Country: US
NetRange: 156.110.0.0 - 156.110.255.255
CIDR: 156.110.0.0/16
NetName: UOKHSC
NetHandle: NET-156-110-0-0-1
Parent: NET-156-0-0-0-0
NetType: Direct Assignment
NameServer: DNS.ONENET.NET
NameServer: TERRA.OSRHE.EDU
Comment:
RegDate: 1992-01-06
Updated: 2002-04-19
TechHandle: ZO24-ARIN
TechName: OUHSC
TechPhone: +1-405-271-1905
TechEmail: networking@ouhsc.edu
Source level access is at the core of the digital rights debate. I do not seek to recycle the arguments presented better elsewhere.
I highly recommend the following website for those seeking to investigate decompilation/reverse engineering.
Fravia
Even with complete original source code, understanding a non-trivial C++ application is very difficult. Source derived from an optimized executable is going to be a LOT rougher. No real function names, module names, variable names, or comments. Use of standard libraries (STL, MFC, Boost) is likely highly obscured as well. A tool like this would probably produce source that looks more like a C/machine language hybrid rather than normal C++. The primary use of something like this is if you are looking for a very specific piece of logic such as a password check or an encryption operation or protocol details. When were these famous last words anyway?
...trying to rebuild a wrecked sand castle just by looking at the grains of sand. You can't. Compilers throw away a lot of information needed by people but not necessary for the machine. Compilers optimize the code to run more efficiently and that's a one-way street. Sorry to burst your bubble but trying to reconstruct original source is like trying to herd cats.
Thank you, thank you. I'm Mr. Metaphor and I'll be here all week.
http://www.itee.uq.edu.au/~cristina/dcc.html
In reply to those who think that reconstructing the original source is impossible: That's not even the point. The goal is to take assembly and construct C/C++ that's readable, not exactly what you started with.
Executables often have a lot of info in the symbol table and other places that often let you reconstruct even the same variable names for the decompiled source!
Bingo. Yes the odds of getting the original are practically null, but the problem isn't quite like a simple mathematical multiplication/factoring in which all numbers are equal. I'm finding with my research that most likely combined with a minimization goal can carry one far. Intent (and hence structure) can be morphed by the compiler but still must remain intact if the code is to work as the programmer intended.
I'd rather decompile my bare bum than C++
slashdot.tv has not been registered and is available for $50.00/ year*.
Code or pseudocode is available free for many thousands of tough algorithmic problems which have been studied and published in the literature (e.g. Knuth et al) which is to be found in most good university libraries and/or the Internet.
Scroogle
JCPM (copyright) (c)
No. No, it doesn't.
Sorry, but it's obvious by this point that you're a complete retard, so I won't bother even looking at any replies you make. Have a nice day, fuckwit.
hi! could you show me how to stop breathing? thanks.
Updating Total Annihilation to use opengl, increasing the number of weapons (currently 256), and increasing the weapon limit (3 per unit).
Shame on you Davak, you should go find honest code. There's nothing wrong with trying to understand how things work. Some people are stuck with legacy equipment or code they can't replace easily and this is their only option for improvement or even fixing it. Those people would be better off if free code were available. Sometimes the only way to make that free code is to understand the original code. There's nothing wrong with reverse engineering software, ever. Republishing someone else's binary is not legal, but it's not immoral. If the code were honest to begin with, the reverse engineer part would not be required. These days, it's cheaper to throw out the dis-honest code and hardware and buy some hardware that's well understood. If you make hardware or software, I hope you understand the implications for your product - I'm not buying it.
Friends don't help friends install M$ junk.
Intrestingly enough, I try and avoid making my pubic hair public hair - it's so embarrassing.
In Europe it is legal to use reverse engineering for compatibility reasons, enabling your software to work with other people's software (mainly Microsoft).
If you do the reverse engineering in Europe you could develop compatible software and then export it to the US. So it may be great news for us. In fact it is becoming really complicated to develop software for/in the US. Patents, legislation, compatibility. It seems that more lawyers than programmers are needed to write something more complicated than HelloWorld.exe.
There is a need for tools that enable the compatibility of programs, or we will end up with a monopoly over all kinds of programs (and it is illegal to use your O.S. monopoly to obtain a monopoly of, let's say... web browsers).
sick. DoT (Denial of Thought)
Since Java programs tend to be more OO and thus complex
the decompiler will give just as much information.
Java is much simpler than C++ so things might look trivial
compared to a decompiled C++ application but because of
complex relationships between objects with no names
you will have a hard time following it.
In C++ you will have larger methods due to inlining, but a
"really" smart decompiler can find common pieces of code
and separate them to a method.
So yes Java will be simpler to read decompiled, but only
because the language is simpler (which means you will
be more productive and write more code).
Indeed! You can teach yourself more engrish at www.engrish.com
Last time I checked 'Intresting' was in the English dictionary.
That would from the Engrish dictionary actually.
If this technique works (haven't read it, page is slashdotted), maybe it could be used to implement Java-style runtime reflection for C++, which would be extremely cool and useful. Get a pointer to a method, decompile it to find out the expected arguments and return type, and dynamically invoke it.
>;k
Why do people keep thinking that decompilation is possible? In short: decompiling a computer program is solving the halting problem. Period.
The long version: In a compiled computer program there is no distinction for either code or data. Every byte in memory can be data, but it can also be executed as valid computer code.
Now, the catch is that during compilation, data and code are mixed in the resulting binary. For instance take the compilation of a 'case' statement. There are several ways of compiling a case:
- you can write it as a list of IF's, which is perfectly fine decompilable
- you can write it as a jump, based on the case expression.
The fun part about the second possibility is that it's far more efficient, but it poses a problem: when decompiling this you have to know where the bounds of the case lie. What's the furthest jump that can be made? It's a jump based on a calculated value, so you should know which values are possible. But for that, you need to run the program, and more specifically, you must run all possible execution paths.
This can be rewritten as the instance of the halting problem: can a computer find out for any program whether or not it will halt? It is proven that a computer program cannot be written to do this task. Neither can a computer program decompile any other computer program.
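The two ways of compiling a 'case' described above can be sketched in C++ itself. This is a minimal illustration, not what any particular compiler emits: the handler names and kTable are hypothetical. One thing the sketch tries to show is that a table-based dispatch needs a range check before the indexed jump to stay correct, and that check is what tells a reader how large the table is.

```cpp
#include <array>
#include <cassert>  // used by the checks in the usage note
#include <cstddef>

// Variant 1: the case statement as a list of IFs.
int as_if_chain(int op, int v) {
    if (op == 0) return v + 1;
    if (op == 1) return v * 2;
    if (op == 2) return -v;
    return 0;  // default branch
}

// Variant 2: the case statement as an indexed jump through a table.
using Handler = int (*)(int);
int op_inc(int v)    { return v + 1; }
int op_double(int v) { return v * 2; }
int op_negate(int v) { return -v; }

constexpr std::array<Handler, 3> kTable = {op_inc, op_double, op_negate};

int as_jump_table(int op, int v) {
    // Range check before the indirect call: values outside the table take
    // the default branch. The bound checked here equals the table size.
    if (op < 0 || static_cast<std::size_t>(op) >= kTable.size()) return 0;
    return kTable[static_cast<std::size_t>(op)](v);
}
```

For any `op`, in range or not, `as_if_chain(op, 7)` and `as_jump_table(op, 7)` agree. Whether every real binary exposes its table bounds this clearly is a separate question; this only shows the common pattern.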
--
If code was hard to write, it should be hard to read
You can decompile any program. A compiled program is just your high-level program translated into machine language. There is no sort of magical encryption or similar transformation that it undergoes once you compile it.
All you need to do is read in the bytes of any binary program, interpret the bytes as their machine language equivalents for whatever platform you are using, and then convert your MOV statements to assignment operators, JMP statements to higher level loop structures, etc..
Of course, you won't retain the names of identifiers, which are referred to only by memory locations in a compiled program; and some control structures might be rearranged due to compiler optimization and the lack of machine language equivalents, but the meat and potatoes of it is all right there.
It's by no means easy to accomplish, especially with higher and higher level programming languages, but impossible? humbug! =)
Why does the poster have time to read the article and post it to Slashdot, but not actually try out the very thing discussed in the article?
I'll never understand the expression "when I have time." It always comes up in a conversation. If you have time to talk about it, make the time to do it. Jeebus.
Is that your real name, Larry? Must've sucked in junior high, eh?
Having finally gotten through to the server momentarily, it appears that the article in question only applies to MS Visual C++.
It's all those "standard" things that make it work very well. Yeah, variable names can be helpful. But those standard calls give me a lot of variable names with exact meaning. Very helpful.
Hmm... Sounds a lot like programming with batch files to me. Why don't you guys get with the program and dump C/C++ for smaller, faster, easy to understand programs that run on Windows and DOS! Batch files forever. I will personally growl at those who do not respect them. Click here if you don't know what Panthera leo means
i really disagree about the second one. mostly, because *gasp* not all algorithms people use come from comp sci books. people develop algorithms all the time, so looking into code for that specific reason might be extremely helpful.
BSD is for people who love UNIX. Linux is for those who hate Microsoft.
what a sick hack :-)
ever heard the term RTTI?
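For anyone who hasn't: a minimal sketch of what RTTI gives you in standard C++ (Base, Derived and the two helper functions are made-up names). It identifies the dynamic type of a polymorphic object at run time, which is why type information tends to survive in binaries built with RTTI enabled. It does not describe method signatures, so by itself it is not Java-style reflection.

```cpp
#include <cassert>  // used by the checks in the usage note
#include <typeinfo>

// RTTI only tracks dynamic types for polymorphic classes,
// i.e. classes with at least one virtual function.
struct Base {
    virtual ~Base() = default;
};
struct Derived : Base {};

// typeid on a reference to a polymorphic type reports the dynamic type.
bool is_derived(const Base& obj) {
    return typeid(obj) == typeid(Derived);
}

// dynamic_cast uses the same run-time type information.
bool can_downcast(const Base& obj) {
    return dynamic_cast<const Derived*>(&obj) != nullptr;
}
```

Given `Derived d; Base b;`, both helpers return true for `d` and false for `b`.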
Cypher: Well you have to. The compilers work for the construct program. But there's way too much information to decode the Matrix. You get used to it. I...I don't even see the code. All I see is an array, function pointer, integer. Hey, you uh... want a drink?
Neo: Sure.
Cypher: You know, I know what you're thinking, because right now I'm thinking the same thing. Actually, I've been thinking it ever since I got here. Why, oh why didn't I sell my VA Linux stock?... Good shit, huh? Cowboy Neal makes it. It's good for two things, degreasing Perl code and killing brain cells.
Of course, it would probably help if the new people actually have skills.
As far as I know, it is easy to decompile the program. Think about it this way: THAT IS WHAT THE COMPUTER DOES TO RUN IT.
If you write a program that simulates the functions of the processor (YOU KNOW THEY ARE WELL KNOWN, ELSE NO COMPILERS), load the program the way the loader loads it (AGAIN, WELL KNOWN), and then follow all the branches and data pointers, you have a map of the binary.
Once you have that, pattern matching and known function calls (say printf, for example) let you work the map out quite well.
Add back known inputs and symbolic function names and the code appears.
One note: it is not the original code. But it is the 100% functional equivalent.
you have been warned
C decompilers exist; here's one. There are others. Most aren't very good. It's a hard problem.
Without debugging information, decompilation tends to result in code with arbitrary variable and function names, of course. But you get names when a DLL or .so is entered, so at least you get the program's major interfaces.
Minimal C++ decompilation could be done by adding vtable recognition to a C decompiler.
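A rough sketch of why vtable recognition is the missing piece. The first half is ordinary C++ with a virtual function; the second half is approximately what a C-level view of the same thing looks like: an object whose first field points to a table of function pointers. All names here (Shape, Square, RawObject, RawVTable, kSquareVTable) are hypothetical, and real object layouts depend on the compiler and ABI.

```cpp
#include <cassert>  // used by the checks in the usage note

// As written in C++.
struct Shape {
    virtual ~Shape() = default;
    virtual int area() const = 0;
};
struct Square : Shape {
    int side;
    explicit Square(int s) : side(s) {}
    int area() const override { return side * side; }
};
int virtual_call(const Shape& s) { return s.area(); }

// Approximately what a C decompiler sees: a hidden table pointer plus
// an indirect call that passes the object as an explicit first argument.
struct RawObject;
struct RawVTable {
    int (*area)(const RawObject*);
};
struct RawObject {
    const RawVTable* vptr;  // the compiler-generated hidden member
    int side;
};
int raw_square_area(const RawObject* self) { return self->side * self->side; }
const RawVTable kSquareVTable = {raw_square_area};

int recovered_call(const RawObject* obj) { return obj->vptr->area(obj); }
```

With `Square sq(4);` and `RawObject raw{&kSquareVTable, 4};`, both `virtual_call(sq)` and `recovered_call(&raw)` give 16. A decompiler that spots the `obj->vptr->area(obj)` pattern can fold it back into a member call.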
A more difficult problem is recognition of idioms. Things like "for" statements tend to decompile as lower level constructs. That's OK as a first step. You need some internal representation. Initial decompilation might represent all transfers of control with "goto"; higher level recognition then deals with that.
The key to doing a good job is "optimization", finding more concise source code that will generate the object code. The key to this problem is defining an internal representation that can represent any valid machine-language program, and which can be modified as higher level information about the program is discovered. The first step is usually to start at the starting address and build a code tree by following calls, like a good debugger does. Then you start to improve on the code tree, doing things like this:
Decompilation won't always succeed. But you should find all the places where the code is doing something the compiler doesn't understand, and get code back for everything else.
It's a big job, and somebody ought to do it. Among other things, it would be a valuable tool for finding compiler bugs.
You have just proven my point. You need to write a program that simulates a process, and then run the program on that. Then you must run it using ALL POSSIBLE INPUTS.
Now, go study the halting problem, then write your reply again.
--
If code was hard to write, it should be hard to read
I haven't RTA, but how would you determine which code goes in what source files? On a per-class basis? And then end up with files like "a.hpp" "a.cpp" "aaa.hpp" etc.?
...
Or put them all into one big source dump, which would take an eternity to load up for any non-trivial program? Mind you, disassemblers tend to work this way
I think this would also be slow and horribly inefficient - ever tried to disassemble executables that are a hundred MB or so in size? Still waiting? Multiply that time ten fold as JMPs, JNZ and JZs are analyzed to determine whether something is a while, do-while or a for loop. Or to determine if a statement is a 'break' or a 'goto'.
All this before even mentioning the matter of different compilers - g++, msvc, borland c++, etc., etc. etc.
Ah I dunno, who cares - to ye who implements this shit: kudos, my friend, kudos.
First: There would be no private members. That is information that does not exist in the compiled code. Everything would be public, or just declared as a struct.
Second: The sequence starting from int x; going to j: would be optimized away to just:
f.d(1);
f.d(2);
f.d(3);
Third: Since class a is a global name (which you can find by looking at the name-mangled a.b() and a.d()), the decompiler should be able to come up with the correct name.
Fourth: It might be too hard for the decompiler to correctly guess the layout of class a, given that it has no virtual member functions (thus doesn't need a vtable), the default constructor is simple enough to be inlined, and that no heap allocation of a-objects occurs. If the two member functions use both g and h it should be able to find them both, but there might be friend functions elsewhere that use more members, and that is again information that would be hard to infer from the binary code. It should guess this correctly, but it should also insert some warning that it wasn't really sure...
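The goto-style listing earlier in the thread and a plausible original can be compared directly. This is a minimal sketch: the struct A below is a hypothetical stand-in for class a that records every call (the recording is not in the thread's example), so the two versions can be checked against each other.

```cpp
#include <cassert>  // used by the checks in the usage note
#include <utility>
#include <vector>

using Log = std::vector<std::pair<char, int>>;

// Hypothetical stand-in for class a: each member call is appended to a log.
struct A {
    Log log;
    void b(int c) { log.emplace_back('b', c); }
    void d(int e) { log.emplace_back('d', e); }
};

// What a programmer might plausibly have written.
Log structured_version() {
    A f;
    f.b(23);
    for (int x = 1; x <= 3; ++x) f.d(x);
    f.b(42);
    return f.log;
}

// The goto form, following the listing posted in the thread.
Log goto_version() {
    A f;
    f.b(23);
    int x; x = 0; x++;
    if (x > 3) goto j;
    f.d(x); x++;
    if (x > 3) goto j;
    f.d(x); x++;
    if (x > 3) goto j;
    f.d(x);
j:  f.b(42);
    return f.log;
}
```

Both functions return the same log: b(23), d(1), d(2), d(3), b(42). Since every `x > 3` test is false on the path taken, an optimizer is free to drop the tests, which is the f.d(1); f.d(2); f.d(3); form described in the second point.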
It looks like the author is decompiling simple C programs that are compiled using Visual C++. His sample programs consist of nothing more than a main() function, a global character array, and sometimes another global function or two. It does not address ANY features of C++, even fundamental ones like classes.
I do not see how this is decompiling C++. It is simply decompiling C.
No, I think they were looking for the grammar nazi. Any relation?
Now, the catch is that during compilation, data and code are mixed in the resulting binary.
Not last time I checked. My compiler emits at least four segments in a compiled program: .text (program code), .rodata (initialized data marked as 'const'), .data (initialized data), and .bss (zero-filled data, which is run-length encoded). Segments .text and .rodata are also write-protected.
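A small sketch of which declarations usually end up in which segment. The section names in the comments are the typical ELF/GCC-style ones and are an assumption here; other toolchains use different names, and the identifiers are made up for illustration.

```cpp
#include <cassert>  // used by the checks in the usage note

// Initialized data marked const: usually placed in .rodata.
const char kBanner[] = "decompile me";

// Initialized, writable data: usually placed in .data.
int g_counter = 42;

// Zero-initialized data: usually placed in .bss, which takes up
// no space for its contents in the file on disk.
int g_scratch[1024];

// Program code: usually placed in .text.
int bump_counter() { return ++g_counter; }
```

After a fresh start, `g_scratch[0]` is 0 and the first `bump_counter()` returns 43. With this separation, a disassembler can start from .text and treat the rest as data.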
Yes, there is a halting problem, but this isn't it. Segments make distinguishing code from data straightforward. I understand that the few programs that make platform-specific API calls to write-enable .text would be harder to disassemble (and subsequently decompile), but do most user programs make such calls?
Besides, even if the halting problem were relevant, the halting problem can be solved in a real computer, which has limited memory and is thus a linear bounded automaton rather than a Turing machine.
Will I retire or break 10K?
You rode the short bus to school, didn't you?
And it's perfectly fine for decompilation to be lossy. The point of decompilation is not to recover the original source code byte-for-byte but to recover something from which a programmer of ordinary skill can recover the gist of the algorithm.
Will I retire or break 10K?
I have worked on a few projects where I was asked to pick up in the middle of a project already started in c++ or c, and even with comments and full normal source... figuring out someone else's code for a project of any size is a big pain, I can't imagine how hard a decompiled source would be.
whenever someone told me my java source code is unreadable, I usually feed it into a decompiler, then send it back to the whiner. It usually fixes them up. (yes, the decompiled .java has no compile, but neither does the original source)
First off, there is no "decompiling" going on here. That would imply that you will end up with code having a semi-resemblance to the original code - which is certainly not happening. What is going on here is simply just another compilation phase. This time, instead of an object file target compliant with the system ABI, you are getting a C/C++ file target which should theoretically be compilable into a program that will generate the same output for the same runtime input. The scope of effort and implications barely overlap as they are so vastly different.
Of course, with C++, being a strongly typed language that resolves so many things at compile time, decompilation is not possible for any non-trivial example (which all the examples in the link were- indeed they didn't use any C++ features at all). This is even ignoring the effects of compiler optimizations. The C++ language is far more expressive than the output dialects of the compiler making the whole idea of decompiling silly. C, on the other hand, is basically a platform-independent assembly language which is why the one-to-one examples of C and asm output seem to imply one can move back and forth between the two at will. Still this is a mistaken impression.
Now - is compilation from object code to (non-equivilent but functionaly similar) C code useful and interesting? Certainly. And all compiler developers and most hard core debuggers can do this pretty much at will. Its the only way to check the correctness of your compiler and its generated code and, in desperate circumstances, can give you some clue as to what an existing application for which you have no source to, is doing. This is called reverse engineering, btw, NOT decompilation. Unfortunately the material pointed to here provides absolutely no new insights and is quite rudimentary at best. Anyone intimately familiar with their compiler and environment already has more knowledge than this paper provides. Really doesn't justify a slashdot posting but I guess whomever posted it simply isn't a C/C++ developer.
For those who don't know about it already: The Decompilation Page
A friend of mine work(ed) with a company in Kingston, ON that was spun off from Queen's University. Their sole purpose and business model is to take whatever binaries and source a company has available, run them through their cluster of analysis systems, and produce a "clean" update of the system. As per usual, there is about 10-15% of the produced code that needs some hand inspection and tweaking to complete the task.
Their "big" business was the Y2K work, as their software isn't limited to just reverse-engineering, but can also refactor the re-engineered code (e.g. change all "year" values in the system from 2 digit to 4 digit, updating all related I/O formatting functions, overlay structures, etc.)
On the flip side, their stuff involves complex pattern matching and heuristics that put any other system I've heard of to shame. It requires clusters of systems running for days to do the initial code analysis. (OTOH, it probably took years to create the original code.)
I can't provide more specifics on the company because they're having some legal issues with co-investors.
I do not fail; I succeed at finding out what does not work.
(Yes, I know the author's native language probably isn't English. But I couldn't resist.)
Laura
Who is the dumbass replying to the wrong comment?
Loop unrolling is only one heuristic for optimizing, and it's something many programmers do by hand to tweak performance. For example, you could do one of the following:
Alternatively, many programmers would realize they're dealing with constant values (most compilers won't recognize that strlen() call as producing a constant), and write:
An ideal optimizing compiler would produce the latter code from the first. How in the world would a reverse-engineering tool recognize those two forms as equivalent?
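The code for the two forms did not survive in this comment. As a rough sketch of the kind of pair being described (the function names, the buffer, and the "HELLO" literal are assumptions for illustration, not the original example), the first form copies with a length computed by strlen() and the second is the hand-unrolled version with the constants written out:

```cpp
#include <cstring>

// Form 1: straightforward version. The length comes from strlen(),
// which is evaluated at run time unless the compiler folds it.
void fill_generic(char* buf) {
    std::memcpy(buf, "HELLO", std::strlen("HELLO") + 1);
}

// Form 2: hand-unrolled version. The programmer knows the string is
// constant and writes each byte assignment out explicitly.
void fill_unrolled(char* buf) {
    buf[0] = 'H';
    buf[1] = 'E';
    buf[2] = 'L';
    buf[3] = 'L';
    buf[4] = 'O';
    buf[5] = '\0';
}

// Both forms leave the buffer in the same state.
bool forms_agree() {
    char a[8] = {0};
    char b[8] = {0};
    fill_generic(a);
    fill_unrolled(b);
    return std::memcmp(a, b, sizeof a) == 0;
}
```

Since both leave the buffer identical, a tool looking only at unrolled machine code has nothing to tell it which of the two the programmer actually wrote.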
I do not fail; I succeed at finding out what does not work.
Microsoft owns your code. They have an in-house decompiler that will decompile any Visual C++ code. They use this in case your product gets too popular: they buy a copy of your code, decompile, do a bit of reworking and ...wam... they own your market and you are out of business.
DON'T BUY MICROSOFT!!!
Nice save. Someone might have clicked on something less than wholesome.
I thought decompiling a program in any language would be similar. Why is C++ singled out?
The article (link provided for those who don't read URLs) is wrong, even in the first section.
The title of the first "chapter" is "Why is c++ Decompiling possible?". But immediately he lists "what is totally loss when you compile a program and what stays there".
In the Lost column he puts templates and classes. The remains list has things like function calls and local variables.
Well, guess what? Those things that are "lost" are everything that distinguishes C++ from C. If you don't have classes (meaning no inheritance or virtual functions either) and don't have templates either, then you're really just programming in "a better C", not C++.
So all his approach can hope to "decompile" is C code. Which is something we've seen done in various forms for decades.
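To make that concrete, here is a minimal sketch of a polymorphic class next to the kind of plain-C-style structure a tool could at best hand back. It assumes the usual vtable-style implementation of virtual calls, which the standard does not mandate, and every name in the second half (obj_1, vtable_1, sub_401000, and so on) is invented for illustration:

```cpp
// Original C++: a polymorphic class hierarchy.
class Shape {
public:
    virtual ~Shape() = default;
    virtual int area() const = 0;
};

class Square : public Shape {
public:
    explicit Square(int side) : side_(side) {}
    int area() const override { return side_ * side_; }
private:
    int side_;
};

// Roughly what is left to recover: a struct holding a pointer to a table
// of function pointers, plus the raw data. The names are made up; a real
// tool would only see addresses and offsets.
struct obj_1;
struct vtable_1 {
    int (*fn_0)(const obj_1*);
};
struct obj_1 {
    const vtable_1* vptr;
    int field_1;
};

int sub_401000(const obj_1* self) { return self->field_1 * self->field_1; }

const vtable_1 g_vtable_1 = { sub_401000 };

// The virtual call s.area() turns into an indirect call through the table.
int call_area(const obj_1* self) {
    return self->vptr->fn_0(self);
}
```

The second half computes the same thing, but the inheritance relationship, the access specifiers, and the names are all gone, which is the "better C" the comment is talking about.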
No, he hasn't. The halting problem is a general statement, i.e. can a computer (universal Turing Machine, whatever) determine, in the _general_ case, for any program, whether or not it will halt properly. For non-theoreticians out there, that's basically asking, "is it possible to write a program that tells whether any given other program will return zero?" The answer to that, obviously, is no, the only way to find out what the return value will be is to run it. (Yes, there's a mathematical proof of the same thing, I'm sure you can find it if you want to...)
:)
But decompilation isn't the same thing at all. You're taking assembler code and trying to reverse engineer the data and control structures that generated it. Yes, there are complexities inherent in the process (your example of the switch-case statement is a good one) but these are bound by the variables they depend on. The switch statement has a finite number of cases to jump to and the decompiler can easily check each branch for just the one switch statement and characterize its behavior.
Sure, exploring each branch will take exponential time, but no one said decompilation had to be done in polynomial time. There's no need to simulate all possible inputs, just the ones at each branch point. They can be done totally independently of one another, since all we want to do is build something that's functionally equivalent at each module. The programmer reading the decompiled output bears the burden of trying to figure out what the program does, i.e. how the various modules work together.
So, no, decompilation != halting problem. (I'm not saying that decompilation is easy or accurate here, just that it can be done and it's not a proven theoretical impossibility.) Now, go study decompilation, then write your reply again
Finally, somebody can reverse eng. SCO and rewrite the parts that are the same or similar as Linux before the damned lawsuit is finished. Mootify it!
Table-ized A.I.
It's a shame that nobody mentioned IDA yet -- an interactive disassembler that does not restore the source code but instead tries to work with the human to figure out what parts of machine code do and mean by splitting data and code and giving readable names to functions and variables to start with.
I haven't seen any comments on the linked "book" itself yet. In short: it sucks hard. Go take a look and try not to laugh.
Patrick Doyle
I mod down every jackass who puts his moderation policy in his sig. Oh, wait a sec....
Unfortunately, the user interface left a lot to be desired, owing to the author's peculiar, um, tastes in UI design. And it never produced assembly that could be fed back into an assembler after modification, which would've been most useful. It did, however, come with a fairly detailed analysis of the Mac ROMs (back in the day ;-) and so was invaluable in learning the secrets of the Mac toolbox. Anybody know of a similar product for Linux/Windows/OS X?
As an aside, the parent article is complete crap written at a high school level by someone that recently learned that what you put in the compiler vaguely resembles what you can see from the debug prompt. I kept waiting for the "here's my magic algorithm that allows me to trace object usage across apparent compilation units", but this pointless article didn't even come close.
While complete decompilation is impossible, if any data has been thrown away, partial translation into a form that is more useful may be possible: when many ask for a "decompiler", they really want that useful translation instead, though may not know it -- knowing only that a complete decompilation would be such a useful translation.
IOW, "You can't always get what you want, but you might just find, you can get what you need."
The real danger here is people (think PHBs) who can't tell the difference between a translation and a complete decompilation, see that the former is possible, think it is the same as the latter, and INSIST, on pain of being fired, that a tool be developed to produce the latter. A little knowledge is a dangerous thing in that context.
You could've hired me.
RTTI doesn't let you ask for a method/variable by its name and then change it.
You can write a program in Java without actually hard coding any of the method/variable names that you are calling (bad idea... but not possible in C++ without having to roll a whole lot of your own code or using a special pre-processor type of deal).
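A minimal sketch of what "rolling your own" looks like in practice: RTTI will confirm what type you have, but calling a method by name only works through a table you build by hand. The Counter class, the method_table() helper, and call_by_name() are all hypothetical names made up for this example:

```cpp
#include <map>
#include <string>
#include <typeinfo>

class Counter {
public:
    void increment() { ++value_; }
    void reset() { value_ = 0; }
    int value() const { return value_; }
private:
    int value_ = 0;
};

// RTTI can tell you what type an object is...
bool is_counter(const Counter& c) {
    return typeid(c) == typeid(Counter);
}

// ...but looking a method up by its name needs a hand-written table.
using Method = void (Counter::*)();

const std::map<std::string, Method>& method_table() {
    static const std::map<std::string, Method> table = {
        {"increment", &Counter::increment},
        {"reset", &Counter::reset},
    };
    return table;
}

// Calls the named method; returns false if nothing is registered under it.
bool call_by_name(Counter& c, const std::string& name) {
    const auto it = method_table().find(name);
    if (it == method_table().end()) return false;
    (c.*(it->second))();
    return true;
}
```

Every name in that table had to be typed in by the programmer; nothing in the compiled class supplies it.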
I have done quite a lot of reverse-engineering stuff. Assembler really isn't that hard, nor is reading disassembled programs. Just use a debugger or string-search in the disassembled code to find the place you want to modify. Study the piece a bit and change it. What makes it _really_ easy is library calls, which will show in the code as "call printf". With this technique I have added, removed and patched a lot of features in different programs (removed splash screens (yes, some legal apps use this annoying feature), removed ugly skinning of program windows, etc.).
;)
Yes, when I last checked this was all legal in Finland.
For bigger patching this could be really useful.. but for that purpose there are open source programs
I can tell you. Blow your head off with a suitably powerful firearm. Tie a cinderblock around your legs and throw it off a bridge. Use your tiny imagination. Do whatever it is you did to those animals you most likely tortured when you where younger. See you on the other side, worthless.
why run from Vincenzo?
Obviously we are being given an obliquely put challenge to hack the Irish e-voting system and decompile its code!
A better question might be why you _wouldn't_ want to be able to decompile your code.
I would imagine that anyone using or seeking knowledge in the digital domain would want every tool possible added to their arsenal. Not everyone making code is a nice or law abiding person.
Since when was taking apart my car illegal? "Trade secrets" and license agreements that demand you give up your right to take apart a piece of software are hogwash and total BS. I have the right to know what my computer is doing, because it's mine, and besides, I can still take apart the cotton gin and figure out how it works; the same applies to software. Just because it's a computer and has thousands of magical circuits doesn't mean I can't learn how it ticks.
Removing one of the barriers that keeps people from learning what a program does is a great idea.
Candy-Coated Knowledge
I'm curious how this works. I hear about it all the time, that some company or other has ripped off a piece of someone else's code and they're suing. The thing is, if I understand this correctly, a number of modern compilers optimize the executable to such an extent that it's theoretically possible that two completely different pieces of code employing the same algorithm will end up with an extremely similar executable. Unless your algorithm was unique, how can you possibly prove definitively that someone ripped off your code?
Anyway, I've seen a lot of people saying decompiling is impossible or at least not practical. Well, that is not true. Decompiling C++ is very practical because of high-level keywords (if, while, for), local variables, and parameters. All of these generate certain instructions that are similar on every platform and just about every processor.
I am also extending the article to 92 pages in total, which will cover OOP, the CRT, and a whole bunch of other stuff.
A coding sequence cannot be revised once it's been established.
Why not?
Because by the second day of compilation, any objects that have undergone reversion compilations give rise to revertant binary like rats leaving a sinking ship. Then the ship sinks.
What about C decompilation?
I've already tried it. Reverse disassembly, templates as a decoding agent and potent mutagen. It created a virus so lethal the system was dead before it left the startup sequence.
Then a repressive extension that blocks the operating code?
Wouldn't obstruct replication, but it does give rise to an error in replication so that the newly formed code carries the mutation and you've got a virus again. But this-- all of this is academic. The code was made as well as we could make it.
Guys, Tyrell knows....
Well, duh. You asked for a variable that was a pointer, you got it. If you used an int, it would hold a value. Understand the language before reverse-engineering it!
How can you be joking at a time like this? Marco is loss!
Well, you can decompile every binary programm at least to assembler code, so why shouldnt it possible with C++?
There's a huge difference between disassembling and decompiling. With assembly, you generally have a 1 to 1 correspondence between machine language instructions and assembly instructions. That is, one specific instruction you feed to the assembler becomes one specific assembled instruction. Sometimes it's more complicated than this, but only slightly.
Now look at C, where one line of code could be arbitrarily many opcodes, depending on the complexity of the logic within that line (and the length of the line). Now suddenly, instead of looking at one instruction and translating it back to its equivalent, your decompiler has to look at possibly hundreds of instructions, parse them logically and figure out where each line starts and ends, and what the logical purpose of each set of instructions is. Then there is dealing with structures (or in C++, objects), where you have to come up with a definition for how data is laid out based solely on the instructions for dealing with that data.
That's quite a bit more complicated. I sure as hell couldn't do it. I know I could write an assembler or disassembler, I might be able to write a simple compiler, but there's no way in hell I could write a functional decompiler.
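As a small sketch of the data-layout part of that problem: the first function is what the programmer wrote, the second is closer to what the machine code actually says, a base pointer plus byte offsets. The Point struct and both function names are made up for illustration, and the sketch assumes the common case where two int members sit back to back with no padding (checked by the static_assert):

```cpp
#include <cstddef>
#include <cstring>

// What the programmer wrote.
struct Point {
    int x;
    int y;
};

int sum_source(const Point* p) {
    return p->x + p->y;
}

// Assumption of this sketch: y immediately follows x in memory.
static_assert(offsetof(Point, y) == sizeof(int), "unexpected padding");

// What a decompiler has to start from: an untyped pointer and two loads at
// fixed byte offsets. It has to guess that these are two int-sized fields
// of one structure, and it never learns they were called x and y.
int sum_recovered(const void* p) {
    const unsigned char* base = static_cast<const unsigned char*>(p);
    int a;
    int b;
    std::memcpy(&a, base + 0, sizeof a);
    std::memcpy(&b, base + sizeof(int), sizeof b);
    return a + b;
}
```

Both return the same value for the same object; only the first one tells a reader what the data means.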
"I don't care about the Constitution!" --Bill O'Reilly, November 17, 2009
Something that will literally give me code I can re-compile immediately?
If you're going to re-compile it immediately, it sounds like you really don't have a use for a decompiler =)
a few more of these and you will be up to +5.
then we can freely post goatse.cx links all over the place
Write a program which will loop until any key is pressed. When will it halt?
Interactive input is outside the parameters of a Turing machine and thus not related to the halting problem. Besides, in modern operating systems, input is an API call, and API calls are ridiculously easy to handle correctly in a disassembler.
Will I retire or break 10K?
>> You need reasons?
>>1) Finding backdoors
>>2) Testing security
>>3) Fixing bugs
>>4) Adding features
>>5) Discovering copyright violations
>>6) Interfacing to non-supported clients
Let me Microsoft-ize your post
1) Finding backdoors^Hundocumented features
2) Testing security^Hundocumented features
3) Fixing bugs^H^HExtending undocumented features
4) Adding features^Hdoggy bloat
5) Discovering copyright violations^H^Hblackmail and copyright duplication
6) Interfacing to non-supported clients^H^H^H^HMaking code non-standard to prevent competition and otherwise non-documented to suffer competition and cause difficult coworking.
That sum it up? Oh yes, IRC...
7) Profit!
One of the funniest posts I've read in a while...
The previous reply wasn't me, Ping.
OK, my statement may have been a bit misleading. By definition of an LBA (linear bounded automaton), the input for an LBA is the contents of its memory when it is powered on; an LBA has no "input registers", that is, addresses whose contents will change other than by a write to memory by the CPU. Because every LBA has an equivalent (but humongous) finite state machine, it is possible to determine whether any LBA will halt or loop by running it for n cycles where n is the number of states of the machine's memory (n = 2^m where m is the number of bits in the machine's memory). In practice, two identical machines are run in a tortoise-hare configuration (one cycle of the tortoise to two cycles of the hare), and the machine has halted or looped when both machines are in the same state.
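A minimal sketch of that tortoise-hare check, assuming a toy machine whose entire memory is one 8-bit state (so at most 256 states). The Machine struct, the Verdict enum, and decide() are hypothetical names for this example, not anything from a real tool:

```cpp
#include <cstdint>
#include <functional>

enum class Verdict { Halts, Loops };

// A toy deterministic machine: the whole "memory" is one 8-bit state,
// step() maps a state to the next one, and is_halted() marks final states.
struct Machine {
    std::function<std::uint8_t(std::uint8_t)> step;
    std::function<bool(std::uint8_t)> is_halted;
};

// The hare takes two steps for every step of the tortoise. With finitely
// many states, either the hare reaches a halted state, or the two end up
// in the same state, which means the machine is stuck in a cycle.
Verdict decide(const Machine& m, std::uint8_t start) {
    std::uint8_t tortoise = start;
    std::uint8_t hare = start;
    for (;;) {
        for (int i = 0; i < 2; ++i) {
            if (m.is_halted(hare)) return Verdict::Halts;
            hare = m.step(hare);
        }
        if (m.is_halted(hare)) return Verdict::Halts;
        tortoise = m.step(tortoise);
        if (tortoise == hare) return Verdict::Loops;
    }
}
```

The loop always terminates because the state space is finite; that is exactly the property a true Turing machine with unbounded tape lacks.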
Take interactivity into account, and programs that run on real computers will always eventually halt because if nothing else, the power will die, or the CPU will melt, or something.
Will I retire or break 10K?
But I don't see any reason to justify my claims either.
All PSPACE-complete problems are decidable by a machine running an algorithm that uses polynomial space. This PDF states that the acceptance problem (equivalent to the halting problem) for linear bounded automata (which it calls "linear bounded deterministic Turing machines") is PSPACE-complete. Here's an algorithm that decides it.
Will I retire or break 10K?
They forgot to remove debugging symbols. Then it's really easy. :)
First he crows about MS Visual C++, then says It's harder to reverse engineer something created than to create it in the first place..
Who is this, the King of Trolls?
I want to delete my account but Slashdot doesn't allow it.
Not only does the author completely fail to realize that the technique he is describing doesn't remotely qualify as decompilation, and is nothing but normal reverse engineering, but he figures that the appropriate response to negative criticism is to remove evidence of it rather than attempt to intelligently respond. I noticed that my vote of 1 of 5 was still intact on his voting page, though.
I was originally surprised when I first read the article that someone would think it had merit enough to write about, but having some insight into the mindset of the author that I did not have before (offered by his rapid censorship of my remarks), my surprise has waned completely.
File under 'M' for 'Manic ranting'
Dear handybundler,
I have been reading Slashdot for a few years now, always at -1 just to get the full, dirty details of the discussion. I'd just like to say that I thoroughly enjoy your comments, and hope to see more (although IIRC -1ers are limited to 2 a day).
Anyway, keep up the good work.
Regards,
M
Wow with that kind of attitude, you'll probably be there first from all the stress the trolls are causing you. Read at +1 or stop being such a pussy. KTHXBI.
FUNNIEST TROLL EVER.
:(
Mad props to whoever wrote it -- the fake links at the end got me pissing myself. Shame the old days of -1 accounts being able to post zillions a day are gone
Seriously, I learned a bit of x86 assembler, and with the Intel architecture reference manuals available online for free download (and they'll even send you free hardcopies if you prefer -- which I did), you should be able to follow along -- with not too little effort on your part.
You may not realize that in a hex dump, all them silly hex codes are actually machine instructions which can be easily translated into "human readable" assembly language statements by any decent hex editor.
So if you can understand a bit of assembler, some opcodes, and a bit about your computer architecture and your operating system, you can follow along pretty well and get some idea of what's going on.
Of course if you can read a hex dump and don't need to see it in assembly language, and can then immediately see the patterns in the code and translate that hex dump into Java, C++, Perl, etc....then you must be NEO...and the Matrix is afraid, VERY AFRAID.
So, for entertainment,
1. Read a memory dump in your favorite hex editor.
2. Open up any binary file in your favorite hex editor and follow along...
3. For practice, whenever you write a C or C++ program, use the -S switch (gcc) and compare the assembly code produced to the C or C++ code you wrote. (For Java you can look at the bytecode in a similar manner.)
Why? Because you is an uber hacker, that's why.
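For step 3, a tiny file like the following is enough to get started. The file name sum.cpp and the function sum_to are just placeholders for this sketch, and it assumes g++ is installed:

```cpp
// sum.cpp: a small function to compare against the compiler's output.
// Assuming g++ is available, something like
//   g++ -S -O0 sum.cpp
// writes the generated assembly to sum.s. Running it again with -O2
// shows how much the optimizer rearranges the same loop.
int sum_to(int n) {
    int total = 0;
    for (int i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}
```

Reading the two .s files side by side with the source is a quick way to see how little of the loop's original shape survives optimization.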
"to be Slashdotted" is not an infinitive...
Some programs end up losing their source code through hard drive crashes, viruses and so on, so being able to recover the source in these cases can save programmers months (and sometimes years).
Other cases are companies that have gone bust, where the program is no longer being made. (Here you have to be careful not to step on a licence that was sold on.)
Even so, for backdoor scanning on Linux you don't need to care about the black box. To find a backdoor you only need asm, not C++. With just asm it will take longer to find, but some automated search programs can cut the time down no end. Basically a black box is pointless, as hackers will find their way around it.
Asm does not care what language the source was compiled from, as most programs get converted to asm of some form anyhow.
And was it brillig in your slithey toves?
Sorry I missed you on IRC (if that was you). Hanging a closet door...
It's a full-fledged IP network, not a p2p app. Hundreds, if not thousands, of IP apps will work on it with no modification whatsoever. You're not relying on the assumption that someone won't declare your freenet node illegal, either. For that matter, there is no certainty that you are even connecting to Meta... for all you know, you're just VPNing to some friend you met online. It may not be the original "MetaNET", or even a metanet at all...
I like DNS, the web, ftp, email, irc, im... I don't see why it's necessary to reinvent the wheel, if that is indeed what freenet is trying to do. For myself, at least, freenet simply is the wrong approach. I want a true internet, just a non-shitty one. And the idea of IP-over-freenet must have had something to do with smoking crack...
I know this guy. A sad thing is, he lives in the US, and as far as I know he's a native English speaker; I just can't understand a thing he says. I read this "book" a week or two ago when he finished it. I thought this was a very rough draft, but I guess not. I couldn't help but laugh at some things, like its irrelevance to C++ in general. He should have just used C, since he never even mentions a class.... Well, to be fair, he did mention classes when he describes what is lost in the compilation process, which is untrue, especially if it is a polymorphic class. In fact, I didn't see one thing in this article that would set it apart from one written on the same subject, except using C.
For a laugh, look at his other tutorials. Surprisingly, his "book" here is among some of the better material. Most have to do with C++, and some assembly, and some even cover the same material in this lengthy and pointless article. I especially like his tutorial on using Macros in C++, a concept so backwards and wrong it shouldn't even have to be mentioned. Sure, macros have uses, but with C++, you have real inline functions and constant variables, so why use them for anything besides #include? Anyway, his other works can be found on pscode.com.
What all this boils down to here, is that nothing new is said here. Not only that, but what is said is presented and worded so poorly that anyone reading it is either going to die of laughter or confusion. If you want to read something on reverse engineering, pick up the dragon book, an assembly book, a good disassembler, and some of the very nice documents on cracking software. Many of these are written by people who will be years ahead of you no matter how hard you work, people who actually know what they're talking about.
- Mik Mifflin
If SCO commits perjury and puts linux into unix we will know instantly and can throw these guys in the slammer.
http://saveie6.com/
You know, I used to read Fravia when I was in junior high. I strongly dislike the way the GNU philosophy is associated with "warez" and software cracking on that site. Yes, reverse engineering and disassembling are important tools, but they are not to be used to steal software, but to understand how software works, and thereby implement a free version using similar but superior algorithms and ideas.
A solution to the problem with music today
It "works" for trivial-sized programs. I can do better in my sleep.
If you take a binary image and compile it (back) into C code, that C code will be very different than the original code. As the article says, there will be things lost, classes become dynamic function calls, etc.
The new C code would in effect be a different expression of the same idea, wouldn't it? Doesn't that make it copyrightable in its own right?
If you make a copy, note for note, of a public domain piece and publish it, that is a copyrightable work, since it is one rendering of the work. Someone else could make the same copy of the original and copyright it, and as it is a different layout etc., it is copyrightable as well.
I think this is the same type of thing.
Am I wrong?
Should I be asking Slashdot? (of course, slashdot is, as we all know full of lawyers!)
Been doing it for twenty years. It is easy to do.
Stop trying to use logic... actually do it.
Why didn't they just call it B?
I've done this many times to get a decent code quality assembly language function from a C or C++ compiler:
1. Write a function in C or C++ with the parameters you want to use
2. Write another function which does three things
a. Assigns a long value of 0xabcdef01 to a local variable of type long (we'll search for this number later)
b. Calls the function in step 1 with dummy parameters. I usually use unique values to make it easier to track which value goes to what parameter
c. Assigns the return values to local variables
3. Compile the C or C++ file with full optimization turned on
4. Look at the map file from the link or even use the debugger to find the four bytes "abcdef01" to locate where the function we want the assembly code for is located
5. Disassemble the binary code at the function call and then disassemble the code for the function itself.
6. Save to a file.
7. Inspect the function and, if necessary, rewrite how the parameters are passed into it.
I forgot to add the last step which is to hand optimize the code generated.
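Steps 1 and 2 above might look something like this. It is only a sketch: scale_add, marker_caller, and the dummy argument values are placeholders, and the marker is declared unsigned long here so that 0xabcdef01 fits even where long is 32 bits:

```cpp
// Step 1: the function whose generated code we want to inspect.
long scale_add(long a, long b, long c) {
    return a * b + c;
}

// Step 2: a caller that is easy to locate in the binary.
long marker_caller() {
    // 2a. A recognizable constant to search for later. volatile keeps
    //     the optimizer from throwing the store away.
    volatile unsigned long marker = 0xabcdef01UL;

    // 2b. Distinct dummy arguments make it easy to see which value ends
    //     up in which register or stack slot.
    // 2c. The return value goes into a local; volatile again so the
    //     result is actually stored.
    volatile long result = scale_add(0x1111, 0x2222, 0x3333);
    return result;
}
```

With full optimization the compiler may inline scale_add into the caller; keeping the two functions in separate source files is one way to keep the call visible.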
Reverse engineering, as I understood it, is that you have 2 sets of programmers. The first set disassembles the code and writes a specification that contains no source.
The second set of programmers codes from that specification.
For this to be legal, there must not be any contact or mixing of the 2 sets of programmers other than the specification.
Before you use a nice algorithm that you lifted from decompiling the ASM, make sure you do your research.
It just takes intelligence and insight. Frankly many things aren't that hard- I remember reading some 'essays' about this topic when I was studying. Do a google search for 'fravia' '+orc' and tutorial- You'll find some mirrors. There's even some essays on how to add new functions to notepad given the binary only... which goes to show decompiling a c++ (or any, really) program isn't new.
Please make some observation that can't be deduced from half of a Computer Architecture course.
In your examples, don't use the same variable name as in the original source. For example, in one of your original sources you used s1 as a variable name. When you were decompiling this, you stated So let's create an alias for the address's 0000:0000 - 0000:0003, which will be s1. which amazingly is the same variable name.
Please try to pick non-trivial examples to highlight your decompilation acumen.
It's relatively easy to come up with the set of C statements that would mimic a particular set of asm statements that you wish to decompile, but the end result would be a C program that was not much easier to read than the original asm was. Changing various assembly operations into C operations does not get you back the information you really need.
The symbolic names make up the bulk of the lost information, but oftentimes programmers will organize a sequence of code in a certain way to make it easier to understand. The compiler will often rearrange that code in a manner that makes it easier for the computer to understand. Compilers will do screwy things like increment a variable on the stack, while holding the original in a register for later usage. Where the original C code might have had the variable increment at the end like this:
while (x < 10)
{
    // do a bunch of stuff using x
    x++;
}
The way the compiler optimizes register usage may cause the assembly to actually increment x just after doing the conditional, then hold the non-incremented value in a register for use down below. The decompiled asm might look like this:
while (x < 10)
{
    int temp = x;
    x++;
    // do stuff with temp
}
While this may seem like a trivial difference in the C code, it can often distort the intent of the algorithm. When a C programmer sees a construct like the latter, they naturally assume that the temp variable was used because more natural constructs would not work. The C programmer then wastes time mulling it over only to discover that it was just dumb.
I am currently on a project where I am maintaining some pretty poorly written code. I can't tell you how much time I waste looking at a particularly ugly algorithm trying to figure out why they are doing all these screwy things, only to discover they were just idiots.
My point is, that the compiler and optimizer are going to mangle the logical order of the code in such a manner that it will be far more difficult to read.
Like I said at the beginning, simple translation of assembly to C is easy, getting back the meaning that gives the endeavor any value at all is much more difficult.
As last words go, "You can't decompile a C++ program" can't hold a candle to, "Hey Honey, watch this!"
Go ahead and mod me off-topic, but it had to be said.
A huge portion of the benefit of the "ugly" solution is that the overhead of invoking memcpy() as a function far exceeds the execution of the byte assignments. If your compiler is smart enough to unroll memcpy(), it would produce the output I described.
I do not fail; I succeed at finding out what does not work.
because it's written in pig latin and I haven't read it myself so it might suck, but read it anyway...
This is just as bad as those people at work that are constantly forwarding news stories with the priority flag set.
All your base are belong to us!
Sorry about that. And I apologize for being rude.
8. Profit ???
You know, it's funny how you can take a common sample of performance tweaked code and turn it into a personal attack. Think I'll skip the many snarling replies that come to mind, and just remind myself you have no idea who I am or what my skills are.
Get out of the research labs and start creating and maintaining code that has to run on old hardware, old compilers, old third-party products, and has no upgrade budget. You go and explain to the management that performance tweaks would make the code "unmaintainable", and I'll stand by and have a chuckle while interviewing your replacement.
Real production code has tweaked segments. It's not "pretty", it's not as "readable" as some would like, but it is functionally equivalent. Odds are that if you're trying to reverse-engineer code, it's going to be old code that has been hacked and tweaked to keep it going after being rushed into production.
But hey, what do I know? I've only been programming for about 20 years, so obviously some prof saddled with the first year classes must have more insight. I humbly bow before the wisdom of one who preaches source code style while discussing decompilers...
I do not fail; I succeed at finding out what does not work.
I read carefully and noticed that the only sentences that were Davak's own bitched about AC posting. Because he had nothing else to offer, I imagined he agreed with what he posted from the FAQ. Once again, I can thank him for nothing. Next time he might add some commentary to the FAQ or just say it ain't so.
Friends don't help friends install M$ junk.
No, you can't disassemble every binary into the source code. I've worked with self-modifying code, which actually writes itself on the fly, sometimes in complex ways. (Fortunately modern operating systems technically don't allow this, but once in a while you still run across it.) I've also worked with hand-written machine language. You can't get assembly because there never was any, and the author took advantage of variable instruction length so that a JMP to $200 or a JMP to $201 is valid, but the results are vastly different. Makes for some really small code, but it is difficult to write and debug, bordering on impossible.
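The variable-length trick is easier to see with a toy. The instruction set below is invented purely for this sketch (it is not any real CPU), and disassemble() is a hypothetical helper; the point is only that the same bytes decode to different programs depending on which offset you jump to:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy instruction set, made up for this example:
//   0x01 imm8  -> LOAD imm   (2 bytes)
//   0x02 imm8  -> ADD imm    (2 bytes)
//   0x03       -> HALT       (1 byte)
// Decodes linearly from the given entry offset until HALT or end of code.
std::string disassemble(const std::vector<std::uint8_t>& code, std::size_t entry) {
    std::string out;
    std::size_t pc = entry;
    while (pc < code.size()) {
        if (!out.empty()) out += "; ";
        const std::uint8_t op = code[pc];
        if ((op == 0x01 || op == 0x02) && pc + 1 < code.size()) {
            out += (op == 0x01 ? "LOAD " : "ADD ");
            out += std::to_string(static_cast<int>(code[pc + 1]));
            pc += 2;
        } else if (op == 0x03) {
            out += "HALT";
            break;
        } else {
            out += "???";  // byte that is not a valid opcode here
            pc += 1;
        }
    }
    return out;
}
```

For the bytes 01 02 03 03, entering at offset 0 decodes as "LOAD 2; HALT", while entering at offset 1 decodes as "ADD 3; HALT". A disassembler that only follows one entry point never sees the other program hiding in the same bytes.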
You must suck.
"The Great Jack Schitt"
Never heard of him. I don't know him.