Researchers Reverse-Engineer Dropbox, Cracking Heavily Obfuscated Python App
rjmarvin writes "Two developers were able to successfully reverse-engineer Dropbox to intercept SSL traffic, bypass two-factor authentication and create open-source clients. They presented their paper, 'Looking inside the (Drop) box' (PDF) at USENIX 2013, explaining step-by-step how they were able to succeed where others failed in reverse-engineering a heavily obfuscated application written in Python. They also claimed the generic techniques they used could be applied to reverse-engineer other Frozen python applications: OpenStack, NASA, and a host of Google apps, just to name a few..."
Good thing I stopped playing the game.
It's hosed now.
Sounds remarkably like security through obscurity to me. With the predictable outcome.
You have no right to feel secure if you only think you're secure assuming no one else examines your source code.
http://en.wikipedia.org/wiki/Kerckhoffs%27s_principle
Your head of state is a corrupt weasel, I hope you're happy.
Even then, all it takes is someone versed in the assembly language of the platform your application runs on, a copy of IDA Pro or something similar, and a few hours of his time. I know this is a bit of a lost art in today's world of Python and JavaScript, but it's still valid.
Better delete your dropbox-hosted /copporn
Lawyers have trouble understanding that law doesn't dictate the limits of curiosity, greed, mathematics, or physics. If there is sufficient incentive, it WILL be cracked. In this case, I think they wanted to demonstrate that drop box is not secure. This should be a 'duh' experience for anyone in IT worth their salt.
They also claimed the generic techniques they used could be applied to reverse-engineer other Frozen python applications: OpenStack...
Wow, they can reverse engineer OpenStack? That's amazing - what do they use, an obscure set of commands called "wget", "git", and "tar"?
Clever of you to post as AC.
Because no compiled (or assembled) code has ever been cracked.
Wouldn't that only be applicable if:
a> these people were "End users"
b> it was enforceable in their jurisdiction
To me, the fascinating thing is that someone wanted to do that.
"First they came for the slanderers and i said nothing."
Ridiculous!
Wouldn't that only be applicable if:
a> these people were "End users"
b> it was enforceable in their jurisdiction
Actually, yes, but if they *aren't* it falls under the DMCA, which is much, MUCH worse...
And jurisdiction... well... http://www.youtube.com/watch?v=EOJNs5YPR4g
I hope your sarcasm is understood, it's a dangerous technique to use on the internet.
However, there's an interesting twist to the pcode vs. native code dichotomy, from reverse engineering standpoint, as anyone who's well versed in the brain-mangling line noise that calls itself the IOCCC will know. One of the best obfuscations is to embed an interpreter into your code, and then do all the hard work in the bytecode.
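To make the embedded-interpreter idea concrete, here is a minimal sketch of a toy stack machine. The opcode names and the sample program are made up for illustration and are not taken from any real product; the point is only that the "real" logic lives in the data fed to the interpreter, so a reverse engineer has to understand the interpreter before the program means anything.

```python
# Toy stack-machine interpreter (hypothetical opcodes, illustration only).
PUSH, ADD, MUL, XOR, HALT = range(5)


def run(bytecode):
    """Execute a flat list of opcodes/operands and return the top of the stack."""
    stack = []
    pc = 0
    while pc < len(bytecode):
        op = bytecode[pc]
        pc += 1
        if op == PUSH:
            # The operand follows the opcode in the instruction stream.
            stack.append(bytecode[pc])
            pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == XOR:
            b, a = stack.pop(), stack.pop()
            stack.append(a ^ b)
        elif op == HALT:
            break
        else:
            raise ValueError(f"unknown opcode {op}")
    return stack[-1]


# The "hard work" is hidden in the bytecode: this computes (6 * 7) ^ 0x55.
program = [PUSH, 6, PUSH, 7, MUL, PUSH, 0x55, XOR, HALT]
result = run(program)
```

A real obfuscator would presumably also scramble the opcode numbering and encrypt the program, but the structure stays the same.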
Your head of state is a corrupt weasel, I hope you're happy.
They should have written it in perl.
Fuck, that woooosh just blew my wig off!
Your head of state is a corrupt weasel, I hope you're happy.
Why do so many developers waste time on obfuscation and other ways of hiding the source in scripting languages?
Using utilities like IonCube to 'protect' PHP-code will never stop the dedicated people from reverse engineering the application or re-engineering it. I've seen that countless times. It is security-through-obscurity at best and it will prevent people from both fixing bugs and re-submitting the fixed code to the developers, and finding security issues from simple code reviewing.
If developers of competing applications need to steal code, they're really crappy developers, and whatever makes their application unique will be equally crappy and thus not a threat.
"For every complex problem, there is a solution that is simple, neat, and wrong." -- H.L. Mencken (1880-1956) --
Why? If you're looking for the selfish angle, maybe he/they just wanted the notoriety. However, he/they might've just wanted to do a public service. Most people trust dropbox to be secure. Of course, slashdot users should all know better than to trust the 'cloud' for anything sensitive, but a way to get this info to people who would not otherwise know this is to make a splash about a successful pen-test.
Lots of guys see it as a challenge; the digital equivalent of saying 'you can't have this.' Well, challenge accepted.
Been there. Done that.
I believe it was EA that was doing that way back as part of their DRM for their Commodore 64 disk-based games. It would load the interpreter and a script, then execute the script [drawing its fancy startup screens, checking for various bad sectors on their disk, over-writing parts of the script and interpreter, loading the game from various parts of the disk].
Sleep your way to a whiter smile...date a dentist!
They should have written it in perl.
They would have missed the fun of seeing how obfuscation made the code harder to read.
The point of the article wasn't to crack it, it was to show that if something sounds insecure by design, it is insecure...
DropBox allows you to "log in" to its website via a click in the application -> no credentials required. Therefore it must store either user credentials or some other secret(s) on the client side (host_id and host_int in this case).
Any process running under privileges accessible to you can be cracked (barring sandboxing, in which case you need system privileges), and it can't hide data from the end user or from other processes in the same privilege space (again, barring sandboxing).
They can make it more difficult though (extracting Bluray key from windows media player will take anyone at least a few days)
More and more big companies think they can hide data on the client side and be secure. Dropbox, Windows Live (LiveConnect) and numerous others are now relying on a fast exchange of nonces in addition to client-side secret storage to make it secure "enough". But breaking the nonce handshake and authenticating in a programmatic fashion will add maybe 10% more cracking/programming effort on top of the regular cracking effort.
TLDR: If it is insecure by design, it is insecure and no amount of obfuscation will help you....
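A rough sketch of why a nonce handshake adds so little on top of a client-side secret: if the response is just a keyed hash of the server's nonce, anyone who has copied the stored secret can answer every challenge. This is a generic HMAC challenge-response written for illustration, not Dropbox's actual protocol; the function names and the use of HMAC-SHA256 are assumptions.

```python
import hashlib
import hmac
import secrets


def make_nonce() -> bytes:
    """Server side: issue a fresh random challenge."""
    return secrets.token_bytes(16)


def respond(client_secret: bytes, nonce: bytes) -> bytes:
    """Client side: prove possession of the stored secret."""
    return hmac.new(client_secret, nonce, hashlib.sha256).digest()


def verify(server_copy: bytes, nonce: bytes, response: bytes) -> bool:
    """Server side: recompute the expected answer and compare in constant time."""
    expected = hmac.new(server_copy, nonce, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


# Hypothetical per-device secret, standing in for something like host_id.
device_secret = secrets.token_bytes(32)

# An attacker who has read the secret off the disk holds an exact copy.
stolen_copy = bytes(device_secret)

nonce = make_nonce()
attacker_ok = verify(device_secret, nonce, respond(stolen_copy, nonce))
guess_ok = verify(device_secret, nonce, respond(secrets.token_bytes(32), nonce))
```

The nonce stops replay of an old response, but it does nothing against someone who holds the secret itself, which is the scenario described above.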
See yesterday's slashdot article where the article author claims that no one will steal your ideas. No-reverse-engineering clauses in EULAs exist to prevent competitors from cloning software by disassembling and studying its internals. Did the researchers break a (valid) contract?
Neat, I finally submit to the cloud, and there we go with the security shenanigans!
How is Dropbox not secure? Do you mean the client you have control of isn't secure? That's all the article is speaking of - they haven't found a way to steal your data from Dropbox unless they already have a secret from your PC.
In order to access your account, they need the secret host_id (which is generated per device and unique to that device) and host_int from your computer (although, if they already have host_id, they can get host_int from the server, so really they only need host_id). Presuming they have access to your computer, they can use these keys to access your account (i.e., without actually having your password). If they already have access to your computer, however, well, at this stage we're splitting hairs. Any software which stores your login credentials on your own computer is at best hiding an access method through obscurity.
The only way to avoid this is to require you to enter your password each time you want to sync your files. Same with Google Drive. Same with ... every piece of software that stores login credentials on the client. Calling DropBox "insecure" when you actually mean "as secure as any client-side auto-login software can be" is a misnomer.
"The true measure of a person is how they act when they know they won't get caught." - DSRilk
Use a non-compiled language, get what you deserve...
Python is compiled, if you distribute *.pyc files only.
NB: The message above might reflect my opinion right now, but not necessarily tomorrow or next year.
Python and javascript are syntactically much more difficult to master than assembly language.
Plus, there are way more privitives to learn...
If Pandora's box is destined to be opened, *I* want to be the one to open it.
Actually, yes, but if they *aren't* it falls under the DMCA, which is much, MUCH worse...
And jurisdiction... well... http://www.youtube.com/watch?v=EOJNs5YPR4g
*if* they live in murica... but they could just as well live somewhere in the *rest* of the world.
Privitives -> primitives
If Pandora's box is destined to be opened, *I* want to be the one to open it.
Lawyers have trouble understanding that law doesn't dictate the limits of curiosity, greed, mathematics, or physics. If there is sufficient incentive, it WILL be cracked.
Non sequitur. Law also dictates that you cannot steal and break into someone else's vault (limiting physics, arguably). There will be sufficient incentive that people will do it nevertheless, thereby breaking the law. That does not mean it is an invalid law.
NB: The message above might reflect my opinion right now, but not necessarily tomorrow or next year.
Yes, only with Perl would they be able to implement security through obscurity and open-source it at the same time.
-- Make America hate again!
Minecraft is written in java.
http://michaelsmith.id.au
The link in TFA says that Przemyslaw Wegrzyn is from Poland. No idea about Dhiru Kholia but that's not a typical name for the US.
That only works reliably for C-like code though.
Python and javascript are syntactically much more difficult to master than assembly language.
That's why there are so many assembly masters as compared to script kiddies, err, Python and JS "masters"? Or were you meaning to be funny? The mods certainly were clueless. (Interesting, really?)
The cesspool just got a check and balance.
The DMCA will need to be changed for it to ever be able to prohibit cracking things like Dropbox. Dropbox is too general-purpose for you to ever be able to guarantee in advance that the copyright holder (the person whose authorization matters) will join a bloc in denying permission to the public. If I hold the copyright on a file, and a Dropbox user uses Dropbox to apply a technological measure that limits access to that file, I can give myself (and everyone else) permission to bypass that technological measure.
HDCP has the same issue.
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
Presentation slides (view online or download PDF), and links to the paper (PDF) and "dedrop" source code (GitHub):
http://www.openwall.com/presentations/WOOT13-Security-Analysis-of-Dropbox/
USENIX WOOT '13 web page dedicated to this talk, including video and audio (view/listen online or download the video .mp4 via a direct link from there):
https://www.usenix.org/looking-inside-drop-box
(Somehow the Slashdot story only links to a third-party article and to the paper PDF, but not to any of the authors' and the conference's web-based content.)
I've never considered dropbox to be secure after they screwed up and allowed anyone to log into any account a few years back. Now they include a public folder for sharing in dropbox so what's secure about that?
If you want to use dropbox and ensure things are secure, then encrypt it yourself. Otherwise, do not bitch if those private fotos of your wife end up shared on the net.
Mod me up/Mod me down: I wont frown as I've no crown
You can learn my privitives any time you want baby.
What happens when Dropbox changes how everything works? How long before it is reverse engineered again? That's why I wouldn't want to depend on this kind of hack for anything.
The "trusting trust" attack that you linked already has countermeasures. One by David A. Wheeler, called diverse double compiling, involves bootstrapping the compiler using several independently developed compilers for the same language and seeing whether they ultimately produce the same binary. Of course, these countermeasures are no help for a proprietary language such as the Pascal variant used by Delphi.
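The last step of that check, comparing the binaries that come out of the different bootstrap paths, can be sketched in a few lines. This only illustrates the comparison, not the bootstrapping itself; the function names and the sample byte strings are hypothetical stand-ins for real compiler outputs.

```python
import hashlib


def digests(binaries):
    """Map each bootstrap path name to the SHA-256 of the binary it produced."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in binaries.items()}


def all_identical(binaries):
    """True if every bootstrap path produced a bit-for-bit identical binary."""
    return len(set(digests(binaries).values())) == 1


# Hypothetical final-stage outputs; in practice these would be read from disk.
clean = {
    "via_compiler_a": b"\x7fELF...same bytes...",
    "via_compiler_b": b"\x7fELF...same bytes...",
}
tampered = dict(clean, via_compiler_b=b"\x7fELF...extra payload...")

clean_ok = all_identical(clean)
tampered_ok = all_identical(tampered)
```

A mismatch does not prove a backdoor (the build may simply be non-deterministic), but identical output from independent paths is evidence that none of the compilers slipped something in.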
How do you know the machine building your CPU will not inject a backdoor in it?
Because Kevin Horton's NANDputer was built by hand out of a pile of 74HC00 (quad 2-input NAND gate) ICs on a breadboard. There isn't enough room in any single 7400 to insert a backdoor.
Any company shipping their open source code and a closed source compiler for it would invite suspicion.
Does this include Mozilla Corporation and Python Software Foundation, which ship open source code and binaries compiled using Microsoft Visual C++?
And games that ran via an interpreter go back to at least the Infocom Z-Machine in 1979.
I wonder if the developers promised that it was "basically impossible" to decompile the code. Or did the developers more honestly say, "this will buy us a bunch of time."
If there's one thing I can't stand, it's language elitism. Look, the language you choose to write your application in is completely irrelevant. Programming languages are tools to help you solve problems and, unless you're a compiler writer or theoretician, aren't really all that interesting in and of themselves. If you think you're a better programmer than someone because of the language you've chosen rather than the types of problems you're able to solve and the quality of your solutions, then you've completely missed the point.
So why hasn't the US Government issued arrest warrants for these terrorists? Other people bringing security flaws to the attention of the service/product provider often get arrested and some even commit suicide under a mountain of charges. Oh, these two are "researchers"; I understand now. [ cough, cough ]
Writing one's own product doesn't really help to interoperate with the service in which your potential customers are already storing their data.
and NASA is apparently an app, not an aerospace agency
NASA is an aerospace agency, but it's also the website of the aerospace agency. That and the Toki Pona word for crazy or foolish.
In fact, if you RTFA, you see that's what Dropbox does.
Have you heard about SoylentNews?
Just because someone reverse-engineered the dropbox client doesn't mean that dropbox is insecure. (Well, maybe their 2FA is bypassable.)
There are already a lot of dropbox alternatives that have open source clients and even ones that do encryption. But there isn't a good Skype alternative I've seen that lets me participate in Skype group chats. I don't even care about video/audio chat. Can someone reverse engineer the Skype client next?
A pyc is pretty much just a parse tree. It's been syntax checked, etc. but not really compiled. As docs.python.org explains, a pyc doesn't run any faster than a .py. The heading on the docs page is:
"Compiled" Python
With compiled in quotes because though some people use that word, it's not really true.
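For anyone who wants to check what is actually stored, here is a small sketch using only the standard library. As far as I know, the body of a .pyc file is a code object serialized with marshal (after a short header), so the round trip below mimics that without touching the filesystem; the sample function is made up for illustration.

```python
import dis
import marshal

source = "def add(a, b):\n    return a + b\n"

# compile() turns source text into a code object containing bytecode.
code = compile(source, "<example>", "exec")

# marshal is the serialization used for the body of a .pyc file.
blob = marshal.dumps(code)
restored = marshal.loads(blob)

# The restored code object runs without the original source text.
namespace = {}
exec(restored, namespace)
result = namespace["add"](2, 3)

# dis lists the bytecode instructions the interpreter will execute.
opnames = [ins.opname for ins in dis.get_instructions(namespace["add"])]
```

Printing opnames shows a short list of stack-machine instructions (the exact names vary between Python versions), which is the same kind of listing a decompiler starts from.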
It's "Unmaintainability Through Obscurity." There never was any (even falsely-justified) security component to it. Nobody is going to say this has somehow made Dropbox less safe.
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
> One of the best obfuscations is to embed an interpreter into your code, and then do all the hard work in the bytecode.
"One of the best" is kind of nebulous, but it's ALWAYS more secure against reverse engineering to distribute a compiled binary, machine code. At least on a PC, or any platform with a decent debugger. Here's why. If you do anything with an interpreter or bytecode, the attacker has at least two options. They can choose to EITHER:
A) Use a debugger to dump the generated machine code and work from that.
OR
B) Use any other method to go after the provided file, the interpreter, or the bytecode.
Distributing a compiled exe (machine code) forces the attacker to do A, eliminating all of the options listed in B.
Of course, what I do, what I think is better, is I ship readable source. Any security needed is handled by actual security, such as encryption of sensitive data, rather than by trying to obfuscate how the program works.
Laws are essentially the terms of the social contract. We agree to be a part of society, as such we agree to abide by the laws put in place by our society. As with any other contract, violating the terms comes with certain consequences, but also as with civil contracts, these consequences only come about if someone notices/catches you/proves it.
"These things go well beyond python -- that python client could have been in the clear/open-source from the beginning but you shouldn't be able to bypass 2FA and get in un-authenticated."
That's right. And remember, kids, when you see people pushing closed binaries who won't provide source, this is exactly the kind of basic fsck-up they are almost certainly trying to hide.
If you write a secure system, you don't need to worry about people seeing the source. You want them to see the source. You want them to appreciate just how elegantly you solved the problem. You hide the source when you know you screwed it up and are just stuck hoping no one catches on.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Friends don't let friends enable ecmascript.
Bad choice of metaphors, really. Assembly would be more akin to making a 3D model out of clay and presenting it instead of doing it in software and printing it off on a 3D printer. You end up with the same result; one is easier to do, but the other has a more human feel in the fine details of the finished product. Curves are put on the clay where only a tangled mess of excess plastic is on the printed version. In the end both do their job: one is far more efficient, with no excess; the other is easy to reproduce and build tangents off of.
Management: Make sure people don't steal our stuff! ...yeah, sure.
Developers: Okay, uhhh . . . it's obfuscated now, is that good?
Management: Don't give me any of your technical mumbo-jumbo, is our IP secure? We can't monetize it without keeping our secret sauce.
Developers:
Management: Good enough for me! I'm going on a business lunch, you folks get back to work.
I remember sigs. Oh, a simpler time!
Compile: to create a set of *machine instructions* from a high-level programming language, using a compiler
Grace Hopper, who coined the term "compile", defined it as "accept things that were people-oriented and then use the computer to translate to *machine code*."
A primary purpose of compiling code is so that the user doesn't need to have a copy of the matching version of the interpreter. Compiled code runs by itself.
Python bytecode is a couple of steps removed from machine code. Look at how many lines of code are required in the bytecode interpreter to interpret that bytecode and do something with it. Compiled code doesn't need any interpreter, much less hundreds of thousands of lines of interpreter.
Dirty, dirty doughnuts, and men in uniforms who love them.
I really think what gets classified as "language elitism" tends to average out about 50-50. As you say, the languages are tools to get things done and choosing the right tool for the job is important, and there is a legitimate place for most if not all languages. It's a sad fact that a lot of humans derive some enjoyment from badmouthing anything they personally do not like or use, and I see some of that.
Still, I think there is also a significant difference between programming something from the ground up using appropriate language(s) and simply stringing together a bunch of library calls using some sort of ultra-high level script as glue, and if you tell me which language you prefer I can probably make a reasonably good guess as to which side of that divide you lean towards.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Friends don't let friends enable ecmascript.
The *machine* doesn't have to be hardware, or are you saying that Java isn't compiled either?
I've always assumed that data on Dropbox wasn't very secure, which is why I was happy to find that ecryptfs works well with dropbox across multiple machines (assuming they are all running Linux). To wit:
chinook: ~orp df
Filesystem 1K-blocks Used Available Use% Mounted on
/home/orp/Dropbox/e 491451392 129077764 361240528 27% /home/orp/e
chinook: ~orp ls Dropbox/e
./
../
ECRYPTFS_FNEK_ENCRYPTED.FWZS4gY2TLKRZUavoct.ewyb3LhUsTmtMCkw6-7kc4NR3-58yIKIxSsrgk--
ECRYPTFS_FNEK_ENCRYPTED.FWZS4gY2TLKRZUavoct.ewyb3LhUsTmtMCkw9VkRKmwOO95LV0W1qwwNHk--/
ECRYPTFS_FNEK_ENCRYPTED.FWZS4gY2TLKRZUavoct.ewyb3LhUsTmtMCkwKsqUWInaV2aVwzvhw6CcW---
ECRYPTFS_FNEK_ENCRYPTED.FWZS4gY2TLKRZUavoct.ewyb3LhUsTmtMCkwOggoYf2PUQpQQmgJLHwIaU--/
ECRYPTFS_FNEK_ENCRYPTED.FWZS4gY2TLKRZUavoct.ewyb3LhUsTmtMCkwQEdvushvgMYZ2uRpeRJ9EU--
[etc]
This works with the same partition mounted across multiple machines. Save a file to /home/orp/e, and it "magically" appears in its unencrypted form (name, content) on any other machine that was updated on Dropbox that has the encrypted partition mounted the same way. All Dropbox ever sees is the encrypted stuff.
The main disadvantage to this approach is that if you are trying to access files on a non-Linux machine you are hosed; LastPass and other password managers that have file encryption functionality can give you cross-platform encryption, but not with the nice filesystem access that Dropbox provides.
A squid eating dough in a polyethylene bag is fast and bulbous, got me?
Kind of like the mechanic that gets stuck in the middle of an argument between a Ford guy and a Mopar guy.
"Oh but Mopar does that all wrong! It should be done the way Ford does it!"
"Yeah but Ford insists on doing this other bit all backwards... Mopar is where it's at!"
...and then you have the mechanic...
"Eh... I get paid to fix cars. What the hell do I care which brand name is on the back?"
And yes, there is a difference between building a solution from ground-up and using an existing framework to build a solution on. The difference is that the ground-up solution will take longer and be more expensive (though, will have better obscurity). The choice is made based on a business decision... how long do we have until we need to monetize this?
So, in the end, you have programmer A telling programmer B that he sucks at programming because a (possibly poor) business decision was made by programmer A's boss. If that's where programmer A wants to hang his hat... I say go ahead and let him.
Someone flopped a steamer in the gene pool.
Yes, only with Perl would they be able to implement security through obscurity and open-source it at the same time.
"Only Perl can parse Perl." Yargh! Havin' ye source be indistinguishable as compiled for me Parrot.
*if* they live in murica... but they could just as well live somewhere in the *rest* of the world.
Whoosh! Thanks for clarifying the point of my joke :) You haven't been keeping up on the news much lately, have you?
If you think you're a better programmer than someone because of the high level language you've chosen rather than the types of problems you're able to solve and the quality of your solutions, then you've completely missed the point.
FTFY.
Your point only applies to comparisons between high level languages. And arguably, this may apply to a lesser degree between languages of different levels of abstraction.
Programming in assembly involves actually understanding what the processor is doing under the hood. Q.e.d. programmers comfortable with writing in assembly are better than solely high level programmers.
You can call it elitism when comparing a driver of a BMW and that of a Toyota, but it's not elitism to say that professional race car drivers are better than commuter drivers.
"If a nation expects to be ignorant and free in a state of civilization, it expects what never was and never will be."
Yes, they used to use headerless Forth for that, not all that hard to hack if you were familiar with Forth.
I really think there is a lot of devil in the details that you gloss right over with that. Get a good glue-man to use good libraries and give him a job that is suited to those tools and you should expect a good product. But give him a job those tools are not suited for and you should expect crud. The language of implementation is only one variable among many. It's possible to write junk in any language.
Using a well-reviewed library might allow you to avoid all kinds of easy and common mistakes. But relying on code you don't understand makes it harder to debug the result, and that just might be an understatement. It's always going to be possible to accomplish tasks with fewer cycles using hand-tuned code. The counterargument is that you can usually hand-tune only 10% or less of the code, after it's running, and get 90% of the benefit that way.
So ideally it seems large projects would have a life cycle starting with RAD in ObjC or Python or whatever you want, leading to a stable form which would then be relentlessly optimised into a mature and reliable product. I can remember seeing that happen occasionally in the past, but last few years programs seem to be considered obsolete before they really clear beta.
On top of that I see a disturbing trend towards low level stuff like device drivers being done in RAD languages, but that's another subject.
And yes, all too often the choices wind up being made for financial reasons by people that haven't the slightest clue what the ramifications of their decisions will be.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Friends don't let friends enable ecmascript.
That's not an instance of the scenario I mention, why do you bring it up?
As a way of beginning discussion about the difference between the scenario you mention and the scenario I mention. Mozilla and PSF ship open source code alongside binaries that cannot be produced with only free tools, as opposed to binaries produced with the MinGW version of GCC.
Clueless. Programming is a whole lot more than merely syntax and what to do with it. That sounds like:
learn words. Words mean things. Mean you words are.
Which is my impression of script kiddies. They're at the first rung of the ladder that leads to programming.
The cesspool just got a check and balance.
It's not language elitism. It's a given that the knowledge base required to do something with JS or Python is significantly lower than with assembly. I wish I was an assembly master in addition to what I know, obviously it's preferred for my current hardware platforms, but even with other modern architectures it'd help me better understand how to optimize my code more than I do currently. As it is, I'm probably a "script kiddie" when it comes to assembly. A little knowledge is a dangerous thing. At least I know enough to know that and leave those sections to experts in their fields.
The cesspool just got a check and balance.
I'll give you a story about 2 companies. One chose language A and its ecosystem. It was all proprietary and locked in. The other chose language B and its ecosystem. This language was much more open source oriented and, by its nature, not locked to the vendor's platform. 2 very comparable systems were built (either could have fronted for the other). For the sake of argument we'll say that the people were equal, possibly even of greater skill in language A (actually, my real lang A people probably were significantly better, sadly). Both companies were within an order of magnitude of each other in customers and revenue. Profit was in company B's favor. Number of customers was in company B's favor, by a multiple. IOW, company B served more customers with smaller transactions with greater profit but lower revenue.
Now, company A required 800% more servers to serve almost an order less data to 1/10th the customers. Actually, I'm not sure about that, but I do know for a fact that the data layer was an 80-fold increase, while the total number of servers at B was less than 1/20th of just the DB servers at company A, and company B had at least 100% overhead available for growth. I have no idea about the app and web layer comparisons between the two. I also know that company B had releases roughly 12 times as often as company A, and a lot less rollback / downtime. Also, the personnel comparison was 1/10th the people at B vs A in development. Neither company was small.
What you can take away from that is yes, language has a very big impact on what you can deliver and how fast you can deliver it, and the cost of maintenance and scalability do directly correlate to the technologies chosen. Even if there are smarter people overall on the poorer technological choice, they will lose on every metric except maybe the initial release, and then only by time to release.
So whatever elitism you might think I exhibit, it's merely my observations of my experiences. You see someone building a bridge out of marshmallows, you just know the end is going to be ugly.
The cesspool just got a check and balance.
Partially correct. You can find some nugget in open source. However, for real successful enterprises, you generally have to take that and do significant work and extensions to actually produce a worthwhile product. This can be true even of commercial code.
The cesspool just got a check and balance.
Yeah, normally they would have the bad tracks on the outside edge of the disk, but I know that EA mitigated against this by having data out there as well as checking a couple of times. And they also took advantage of the fact that the head could read more accurately than it could write, so they would read data from track 39, and check for a bad sector on track 39.5, which you couldn't write using the 1541 [as writing clobbered the 1/2 track on either side of the head, so you could either write the data or the bad sectors, so you had to crack the DRM instead].
Sleep your way to a whiter smile...date a dentist!
Unable to see the irony of spelling dyslexia incorrectly? Enjoy your Asperger.
Compile: to create a set of *machine instructions* from a high-level programming language, using a compiler
Grace Hopper, who coined the term "compile", defined it as "accept things that were people-oriented and then use the computer to translate to *machine code*."
A primary purpose of compiling code is so that the user doesn't need to have a copy of the matching version of the interpreter. Compiled code runs by itself. Python bytecode is a couple of steps removed from machine code. Look at how many lines of code are required in the bytecode interpreter to interpret that bytecode and do something with it. Compiled code doesn't need any interpreter, much less hundreds of thousands of lines of interpreter.
*machine code*, *machine code*. As Inigo Montoya would say: You keep using that word. I do not think it means what you think it means. Seriously, what's the point of quoting Grace Hopper if we are willing to ignore the historical definitions of *compiler* and *machine*?
The phrase "machine instructions" was never meant to literally stand for "hardware machine instructions" to the exclusion of anything else. From very early on in the evolution of computers, compilers created somewhat portable symbolic instructions meant to be further decoded or translated at start-up run time into the actual hardware level instructions.
From a purely theoretical POV, the concept of a machine that could execute instructions preceded the existence of hardware machines. Think Turing machines, μ-recursive functions, Turing-complete string rewriting systems and lambda calculus. Think of the idea of algorithmic systems that can translate a program representation from one mathematical model of computation to another. That is a compiler. That is, in the world of the computable, a machine has never been exclusively of a hardware nature, and the notion of a compiler has never been constrained by that limitation.
Moving from the esoteric to the mundane, p-code is the most commonly known historical name for this approach, which has existed since the 60's (and which is now typically referred to as bytecode). Mainframes and minicomputers sported such compilers in a variety of languages: BCPL, COBOL, PL/1, etc.
The world of practical computing has always moved around and above this notion.
Hell, if languages that produce bytecode/p-code are not compiled because these are not true hardware instructions, then neither is anything produced by the x86 family of assemblers and native compilers, because the x86 family of "native" CISC instructions are not true hardware instructions either.
Why? Well, because, unlike RISC hardware platforms, those instructions are interpreted at run-time into the micro-code instructions specific to the hardware.
That is, the x86 CISC instruction set is not hardware machine code, but an extremely low-level p-code/bytecode interpreted at run-time by an on-the-die interpreter.
Heh, and don't forget SimEarth - written in Clipper of all things. Clipper was basically compiled DbaseIV :-O Kinda' miss working in that actually, it was fun work at the time and FAR faster than the interpreted DbaseIV stuff :-) SimAnt and others may have also been done this way, I didn't dig into those too much...
Build it, Drive it, Improve it! Hybridz.org
Except that different languages are good at different things. This doesn't mean that one language is better than another, it's that one language can be better for a given purpose. I'm not a better programmer than anybody else because I like C++ and Perl. I'm a better programmer if I know what to write in C++ and what in Perl, as opposed to somebody who would just write everything in Java (or Python or Haskell or....).
You're correct in that language elitism is stupid, but not in that the differences between languages are minor or of little practical import.
"When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes