Researchers Reverse-Engineer Dropbox, Cracking Heavily Obfuscated Python App
rjmarvin writes "Two developers were able to successfully reverse-engineer Dropbox to intercept SSL traffic, bypass two-factor authentication and create open-source clients. They presented their paper, 'Looking inside the (Drop) box' (PDF) at USENIX 2013, explaining step-by-step how they were able to succeed where others failed in reverse-engineering a heavily obfuscated application written in Python. They also claimed the generic techniques they used could be applied to reverse-engineer other Frozen python applications: OpenStack, NASA, and a host of Google apps, just to name a few..."
/popcorn
Use a non-compiled language, get what you deserve...
Good thing I stopped playing the game.
It's hosed now.
Sounds remarkably like security through obscurity to me. With the predictable outcome.
You have no right to feel secure if your security rests on the assumption that no one else examines your source code.
http://en.wikipedia.org/wiki/Kerckhoffs%27s_principle
Your head of state is a corrupt weasel, I hope you're happy.
...have a no-reverse-engineering clause?
They also claimed the generic techniques they used could be applied to reverse-engineer other Frozen python applications: OpenStack...
Wow, they can reverse engineer OpenStack? That's amazing - what do they use, an obscure set of commands called "wget", "git", and "tar"?
They should have written it in perl.
Why do so many developers waste time on obfuscation and other ways of hiding the source in scripting languages?
Using utilities like IonCube to 'protect' PHP-code will never stop the dedicated people from reverse engineering the application or re-engineering it. I've seen that countless times. It is security-through-obscurity at best and it will prevent people from both fixing bugs and re-submitting the fixed code to the developers, and finding security issues from simple code reviewing.
If developers of competing applications need to steal code, they're really crappy developers, and whatever makes their application unique will be equally crappy and thus not a threat.
"For every complex problem, there is a solution that is simple, neat, and wrong." -- H.L. Mencken (1880-1956) --
It took two developers to scrape Dropbox?
They should have written it in perl.
They would have missed the fun of seeing how obfuscation made the code harder to read.
The point is that with this knowledge, they were able to bypass everything else, which shouldn't have been the case. Namely, they were able to work around 2FA and "intercept" SSL traffic, meaning they tricked the server into communicating with them. These things go well beyond Python: that Python client could have been in the clear/open-source from the beginning, but you shouldn't be able to bypass 2FA and get in unauthenticated.
This is the real problem here: their implementation of 2FA and SSL basically depended on a 'proper' client, which shouldn't have been the case.
The point of the article wasn't to crack it, it was to show that if something sounds insecure by design, it is insecure...
Dropbox allows you to "log in" to its website via a click in the application, with no credentials required. Therefore it must store either user credentials or some other secret(s) on the client side (host_id and host_int in this case).
Any process running with privileges accessible to you can be cracked (barring sandboxing, in which case you need system privileges), and it can't hide data from the end user or from other processes in the same privilege space (again, barring sandboxing).
They can make it more difficult, though (extracting a Blu-ray key from Windows Media Player will take anyone at least a few days).
More and more big companies think they can hide data on the client side and be secure. Dropbox, Windows Live (LiveConnect) and numerous others now rely on a fast exchange of nonces, in addition to client-side secret storage, to make it secure "enough". But breaking the nonce handshake and authenticating programmatically adds maybe 10% more cracking/programming effort on top of the regular cracking effort.
TLDR: If it is insecure by design, it is insecure and no amount of obfuscation will help you....
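To make the nonce point concrete, here is a minimal sketch of a generic challenge-response handshake. It is NOT Dropbox's actual protocol: the HMAC construction and the names server_issue_nonce, client_response and server_verify are assumptions made up for illustration. It only shows that once the stored secret is copied out of the client, any program can answer the challenge.

```python
import hashlib
import hmac
import secrets


def server_issue_nonce() -> bytes:
    """Server side: hand out a fresh random challenge (hypothetical)."""
    return secrets.token_bytes(16)


def client_response(stored_secret: bytes, nonce: bytes) -> bytes:
    """Client side: prove knowledge of the stored secret for this nonce."""
    return hmac.new(stored_secret, nonce, hashlib.sha256).digest()


def server_verify(stored_secret: bytes, nonce: bytes, response: bytes) -> bool:
    """Server side: recompute the expected answer and compare in constant time."""
    expected = hmac.new(stored_secret, nonce, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


# Stand-in for a value such as host_id that the official client keeps on disk.
stored_secret = secrets.token_bytes(32)

nonce = server_issue_nonce()

# An unofficial program that has read the same secret out of client storage
# computes exactly the same answer as the official client would.
extracted_secret = bytes(stored_secret)
third_party_answer = client_response(extracted_secret, nonce)

print(server_verify(stored_secret, nonce, third_party_answer))
```

The server cannot tell the two clients apart, because the nonce only prevents replaying an old answer. It does nothing to protect the secret itself.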
Neat, I finally submit to the cloud, and there we go with the security shenanigans!
"with any of the countless other sites, programs and applications written in Python: NASA, Minecraft, Django, OpenStack and a host of Google products, to name just a few."
Minecraft??
Yes, only with Perl would they be able to implement security through obscurity and open-source it at the same time.
-- Make America hate again!
I never knew that Dropbox had two-factor authentication. They only ask for a single password.
I remember an old co-worker referring to perl as "a write only language". I was young and dumb and didn't get what he meant. Until 1 year later when I tried to add a new feature to my perl script.
why would you need to reverse-engineer openstack when you can just grab the code since it's like open
The lesson here: Don't store your data on other people's hard drives.
Presentation slides (view online or download PDF), and links to the paper (PDF) and "dedrop" source code (GitHub):
http://www.openwall.com/presentations/WOOT13-Security-Analysis-of-Dropbox/
USENIX WOOT '13 web page dedicated to this talk, including video and audio (view/listen online or download the video .mp4 via a direct link from there):
https://www.usenix.org/looking-inside-drop-box
(Somehow the Slashdot story only links to a third-party article and to the paper PDF, but not to any of the authors' and the conference's web-based content.)
What happens when Dropbox changes how everything works? How long before it is reverse-engineered again? That's why I wouldn't want to depend on this kind of hack for anything.
The "trusting trust" attack that you linked already has countermeasures. One by David A. Wheeler, called diverse double compiling, involves bootstrapping the compiler using several independently developed compilers for the same language and seeing whether they ultimately produce the same binary. Of course, these countermeasures are no help for a proprietary language such as the Pascal variant used by Delphi.
How do you know the machine building your CPU will not inject a backdoor in it?
Because Kevin Horton's NANDputer was built by hand out of a pile of 74HC00 (quad 2-input NAND gate) ICs on a breadboard. There isn't enough room in any single 7400 to insert a backdoor.
Any company shipping their open source code and a closed source compiler for it would invite suspicion.
Does this include Mozilla Corporation and Python Software Foundation, which ship open source code and binaries compiled using Microsoft Visual C++?
I wonder if the developers promised that it was "basically impossible" to decompile the code. Or did the developers more honestly say, "this will buy us a bunch of time."
Writing one's own product doesn't really help to interoperate with the service in which your potential customers are already storing their data.
and NASA is apparently an app, not an aerospace agency
NASA is an aerospace agency, but it's also the website of the aerospace agency. That and the Toki Pona word for crazy or foolish.
There are already a lot of dropbox alternatives that have open source clients and even ones that do encryption. But there isn't a good Skype alternative I've seen that lets me participate in Skype group chats. I don't even care about video/audio chat. Can someone reverse engineer the Skype client next?
... TahoeLAFS?
A pyc is pretty much just a parse tree. It's been syntax checked, etc., but not really compiled. As docs.python.org explains, a .pyc doesn't run any faster than a .py. The heading on the docs page is:
"Compiled" Python
With compiled in quotes because though some people use that word, it's not really true.
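For anyone who wants to look inside a .pyc themselves, here is a small sketch. It assumes CPython 3.7 or later, where the file starts with a 16-byte header followed by a marshalled code object; older versions use a shorter header. The file names and the answer() function are made up for the demo.

```python
import marshal
import os
import py_compile
import tempfile

SOURCE = "def answer():\n    return 6 * 7\n"


def load_pyc_code(pyc_path):
    """Return the code object stored in a .pyc (CPython 3.7+ layout assumed)."""
    with open(pyc_path, "rb") as fh:
        fh.read(16)  # skip the header: magic number, flags, mtime/hash, size
        return marshal.load(fh)


def roundtrip(source):
    """Write source to a temp .py, byte-compile it, and read the .pyc back."""
    with tempfile.TemporaryDirectory() as tmp:
        src_path = os.path.join(tmp, "demo.py")
        pyc_path = os.path.join(tmp, "demo.pyc")
        with open(src_path, "w") as fh:
            fh.write(source)
        py_compile.compile(src_path, cfile=pyc_path, doraise=True)
        return load_pyc_code(pyc_path)


code = roundtrip(SOURCE)

# The recovered object can be executed directly, without the .py file.
namespace = {}
exec(code, namespace)
print(type(code).__name__, namespace["answer"]())
```

Whatever you choose to call the contents, the point relevant to the story stands: the payload is a serialized code object that the stock interpreter can load back with one call, which is why plain .pyc files offer little protection against inspection.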
It's "Unmaintainability Through Obscurity." There never was any (even falsely-justified) security component to it. Nobody is going to say this has somehow made Dropbox less safe.
As copyright owner of this comment, I authorize everyone to defeat any technological measure which limits access to it.
> One of the best obfuscations is to embed an interpreter into your code, and then do all the hard work in the bytecode.
"One of the best" is kind of nebulous, but it's ALWAYS more secure against reverse engineering to distribute a compiled binary, machine code. At least on a PC, or any platform with a decent debugger. Here's why. If you do anything with an interpreter or bytecode, the attacker has at least two options. They can choose to EITHER:
A) Use a debugger to dump the generated machine code and work from that.
OR
B) Use any other method to go after the provided file, the interpreter, or the bytecode.
Distributing a compiled exe (machine code) forces the attacker to do A, eliminating all of the options listed in B.
Of course, what I do, what I think is better, is I ship readable source. Any security needed is handled by actual security, such as encryption of sensitive data, rather than by trying to obfuscate how the program works.
Management: Make sure people don't steal our stuff! ...yeah, sure.
Developers: Okay, uhhh . . . it's obfuscated now, is that good?
Management: Don't give me any of your technical mumbo-jumbo, is our IP secure? We can't monetize it without keeping our secret sauce.
Developers:
Management: Good enough for me! I'm going on a business lunch, you folks get back to work.
I remember sigs. Oh, a simpler time!
Compiled? Then they'll just replace the libs with hacked ones in the loader, and stub out the calls that they'd rather you didn't make. Child process returned 0 to say license is valid - and gee, wasn't it quick!
The problem is not the TSA or the NSA. The problem is the USA.
Compile: to create a set of *machine instructions* from a high-level programming language, using a compiler
Grace Hopper, who coined the term "compile", defined it as "accept things that were people-oriented and then use the computer to translate to *machine code*.”
A primary purpose of compiling code is so that the user doesn't need to have a copy of the matching version of the interpreter. Compiled code runs by itself.
Python bytecode is a couple of steps removed from machine code. Look at how many lines of code are required in the bytecode interpreter to interpret that bytecode and do something with it. Compiled code doesn't need any interpreter, much less hundreds of thousands of lines of interpreter.
The *machine* doesn't have to be hardware, or are you saying that Java isn't compiled either?
Compile: to create a set of *machine instructions* from a high-level programming language, using a compiler
No, it's not, and I'm not sure where you got this narrow definition, which is contrary to contemporary usage. Go ahead and find some more examples of people referring to bytecode compilers as "compilers" in quotes because they're not really compilers, y'know.
Grace Hopper, who coined the term "compile", defined it as "accept things that were people-oriented and then use the computer to translate to *machine code*.”
No, she didn't. She did use this phrase to describe her FORTRAN compiler, tho.
Seriously, should I come over to your place and hit you on the head with my copy of Aho's Dragon Book?
I've always assumed that data on Dropbox wasn't very secure, which is why I was happy to find that ecryptfs works well with Dropbox across multiple machines (assuming they are all running Linux). To wit:
chinook: ~orp df
Filesystem 1K-blocks Used Available Use% Mounted on
/home/orp/Dropbox/e 491451392 129077764 361240528 27% /home/orp/e
chinook: ~orp ls Dropbox/e
./
../
ECRYPTFS_FNEK_ENCRYPTED.FWZS4gY2TLKRZUavoct.ewyb3LhUsTmtMCkw6-7kc4NR3-58yIKIxSsrgk--
ECRYPTFS_FNEK_ENCRYPTED.FWZS4gY2TLKRZUavoct.ewyb3LhUsTmtMCkw9VkRKmwOO95LV0W1qwwNHk--/
ECRYPTFS_FNEK_ENCRYPTED.FWZS4gY2TLKRZUavoct.ewyb3LhUsTmtMCkwKsqUWInaV2aVwzvhw6CcW---
ECRYPTFS_FNEK_ENCRYPTED.FWZS4gY2TLKRZUavoct.ewyb3LhUsTmtMCkwOggoYf2PUQpQQmgJLHwIaU--/
ECRYPTFS_FNEK_ENCRYPTED.FWZS4gY2TLKRZUavoct.ewyb3LhUsTmtMCkwQEdvushvgMYZ2uRpeRJ9EU--
[etc]
This works with the same partition mounted across multiple machines. Save a file to /home/orp/e, and it "magically" appears in its unencrypted form (name, content) on any other machine that was updated on Dropbox and that has the encrypted partition mounted the same way. All Dropbox ever sees is the encrypted stuff.
The main disadvantage of this approach is that if you are trying to access files on a non-Linux machine, you are hosed. LastPass and other password managers that have file encryption functionality can give you cross-platform encryption, but not with the nice filesystem access that Dropbox provides.
A squid eating dough in a polyethylene bag is fast and bulbous, got me?
Yes, when you're young and dumb you tend to write "write-only" scripts. Eventually you learn to not do that.
Yes, only with Perl would they be able to implement security through obscurity and open-source it at the same time.
"Only Perl can parse Perl." Yargh! Havin' ye source be indistinguishable as compiled for me Parrot.
That's not an instance of the scenario I mention, why do you bring it up?
As a way of beginning discussion about the difference between the scenario you mention and the scenario I mention. Mozilla and PSF ship open source code alongside binaries that cannot be produced with only free tools, as opposed to binaries produced with the MinGW version of GCC.
Compile: to create a set of *machine instructions* from a high-level programming language, using a compiler
Grace Hopper, who coined the term "compile", defined it as "accept things that were people-oriented and then use the computer to translate to *machine code*.”
A primary purpose of compiling code is so that the user doesn't need to have a copy of the matching version of the interpreter. Compiled code runs by itself. Python bytecode is a couple of steps removed from machine code. Look at how many lines of code are required in the bytecode interpreter to interpret that bytecode and do something with it. Compiled code doesn't need any interpreter, much less hundreds of thousands of lines of interpreter.
*machine code*, *machine code*. As Inigo Montoya would say: You keep using that word. I do not think it means what you think it means. Seriously, what's the point of quoting Grace Hopper if we are willing to ignore the historical definitions of compiler and *machine*?
The phrase "machine instructions" was never meant to literally stand for "hardware machine instructions" to the exclusion of anything else. From very early on in the evolution of computers, compilers created somewhat portable symbolic instructions meant to be further decoded or translated at start-up run time into the actual hardware level instructions.
From a purely theoretical POV, the concept of a machine that could execute instructions preceded the existence of hardware machines. Think Turing machines, μ-recursive functions, Turing-complete string rewriting systems and lambda calculus. Think of the idea of algorithmic systems that can translate a program representation from one mathematical model of computation to another. That is a compiler. That is, in the world of the computable, a machine has never been exclusively of a hardware nature, and the notion of a compiler has never been constrained by that limitation.
Moving from the esoteric to the mundane, p-code is the most commonly known historical name for this approach, which has existed since the 60's (and which is now typically referred to as bytecode). Mainframes and mini-computers sported such compilers in a variety of languages - BCPL, COBOL, PL/1, etc.
The world of practical computing has always moved around and above this notion.
Hell, if languages that produce bytecode/p-code are not compiled because these are not true hardware instructions, then neither are the languages handled by the x86 family of assemblers and native compilers, because the x86 family's "native" CISC instructions are not true hardware instructions either.
Why? Well, because, unlike on RISC hardware platforms, those instructions are interpreted at run-time into the micro-code instructions specific to the hardware.
That is, the x86 CISC instruction set is not hardware machine code, but an extremely low-level p-code/bytecode interpreted at run-time by an on-the-die interpreter.
A compiler takes a stream of symbols and emits a different stream of symbols based on a predefined set of rules.
That is it.
Java has a compiler
Python has a compiler
Latex has a compiler
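As a rough illustration of that definition, here is a toy compiler that turns integer arithmetic into instructions for a made-up stack machine, plus a tiny interpreter for those instructions. The instruction names (PUSH, ADD, SUB, MUL) and the functions compile_expr and run are invented for this sketch, and it borrows Python's own ast module for parsing rather than implementing a parser.

```python
import ast
import operator

# Map AST operator nodes to instruction names of the hypothetical machine.
BINARY_OPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}
# Map instruction names to the operations the interpreter performs.
VM_OPS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


def compile_expr(source):
    """Translate an integer arithmetic expression into stack-machine code."""
    program = []

    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            program.append(("PUSH", node.value))
        elif isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
            emit(node.left)   # operands first (post-order walk) ...
            emit(node.right)
            program.append((BINARY_OPS[type(node.op)], None))  # ... then the op
        else:
            raise ValueError("unsupported expression")

    emit(ast.parse(source, mode="eval").body)
    return program


def run(program):
    """Interpret the emitted instructions on a simple value stack."""
    stack = []
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg)
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(VM_OPS[op](left, right))
    return stack.pop()


program = compile_expr("2 + 3 * 4")
print(program)
print(run(program))
```

For "2 + 3 * 4" the emitted stream is three PUSH instructions followed by MUL and ADD, and running it yields 14. No hardware instruction set is involved at any point, yet one stream of symbols has been translated into another by fixed rules.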