Experts Crack Petya Ransomware, Enable Hard Drive Decryption For Free
Reader itwbennett writes: Petya appeared on researchers' radar last month when criminals distributed it to companies through spam emails that masqueraded as job applications. It stood out from other file-encrypting ransomware programs because it overwrites a hard drive's master boot record (MBR), leaving infected computers unable to boot into the operating system. Now, security experts have devised a method that, while not exactly straightforward, allows users to recover data from computers infected with the ransomware without paying money to cyber criminals. Folks over at BleepingComputer have confirmed that the aforementioned technique works.
Props to the guys that cracked it and made it available!
Just cruising through this digital world at 33 1/3 rpm...
You read it wrong. These are emails posing as coming FROM job applicants, to companies looking for hires (or just random people in said company).
These days you can't even get proper encrypting malware, so what are the chances that actual encryption software available to the public is any different?
I hope that they'll offer at least a partial refund to anyone who's paid in the last 30 days.
If you're familiar with an MD5 hash, that's what's stored on the drive. Except it's a slightly different version than MD5.
If you're NOT familiar with MD5, I'll try to explain it a bit. The malware author wanted to handle the key being entered incorrectly, to have an error message saying "that's not the correct key". Without that error message, a typo while entering the key would result in decrypting the drive incorrectly, permanently destroying the data. So the malware needed a way to determine if the key is correct or not. To determine whether or not a key (or password) is correct without storing it, programmers use something called a hash.
Here's a really bad hash algorithm, just to demo the concept:
Where X is the key (a number):
(square root of X) + X = 110
So we store the hash, 110. Someone enters 9 as the key. The malware does the math:
(square root of 9) + 9 = 12
Since the hash doesn't match 110, that's the wrong key and it throws an error.
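The check described above can be sketched in a few lines of Python. This is only the toy hash from the comment (square root of X, plus X), not anything Petya actually uses; the names `toy_hash`, `STORED_HASH`, and `check_key` are made up for illustration.

```python
import math

def toy_hash(key: int) -> float:
    """Deliberately weak demo hash from the comment: sqrt(key) + key."""
    return math.sqrt(key) + key

# Only the hash is kept on disk; the key itself is never stored.
STORED_HASH = 110.0

def check_key(candidate: int) -> bool:
    """Accept the candidate only if it hashes to the stored value."""
    return toy_hash(candidate) == STORED_HASH

print(check_key(9))    # sqrt(9) + 9 = 12, so the key is rejected
print(check_key(100))  # sqrt(100) + 100 = 110, so the key is accepted
```

A wrong key gets an error instead of a botched decryption, which is the whole point of storing the hash.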
The hash function I just used is bad because based on the result, 110, you can easily figure out that the key must be 100. The malware used a better hash function, one based on something called "salsa20". However, the hash function they used wasn't very secure. You only have to try maybe a million keys before you find the right one. With CPUs that can try a million keys in just a few seconds, it's easy to find the key which matches the stored hash.
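To illustrate the "try keys until one matches" idea, here is a minimal brute-force sketch. It reuses the toy hash from the comment rather than the real salsa-based check, and the keyspace of one million candidates is just the rough figure quoted above; `brute_force` is a hypothetical helper, not part of any published cracking tool.

```python
import math

def toy_hash(key: int) -> float:
    """Toy demo hash: sqrt(key) + key. Not the hash Petya uses."""
    return math.sqrt(key) + key

def brute_force(stored_hash: float, keyspace: range):
    """Try every candidate key until one reproduces the stored hash."""
    for candidate in keyspace:
        if math.isclose(toy_hash(candidate), stored_hash):
            return candidate
    return None  # no key in this keyspace matches

found = brute_force(110.0, range(1, 1_000_001))
print(found)
```

The attacker never needs to reverse the hash mathematically; a small enough keyspace makes plain guessing sufficient.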
A lot of this ransomware comes in the form of JavaScript files. I'm sure it could be adapted to encrypt your Linux machine and, if you have /boot mounted, replace your EFI bootloader as well (the MBR might be a bit trickier; it would have to wait until you use sudo).
I have no idea if Linux ransomware exists, but if it doesn't, it's not because it's technically infeasible. There are some safeguards that make it more difficult, yes, but not impossible. When Mac OS started getting more popular, we started seeing more Mac malware. Get off your high horse.
I don't have the specifications for an MBR memorized, but I suspect that by knowing what information should be at specific offsets (or by experimenting with possible values), the person was able to perform something similar to a known-plaintext attack to extract the key. In any case, bravo!
I've been doing security for 20 years, so most of my explanation is based on reading between the lines. I think it was the last link in the article that mentioned the crack starts with getting the "verification hash" from the disk, or similar wording. The rest is knowing what hashes are used for and how encryption and crypto malware work in general.
If the key were infinitely long, there would be infinitely many keys that match the hash. Since the key is approximately the same length as the hash, there is approximately ONE key that matches the hash. In computer forensics, you ALWAYS work on an image of the drive, never the original, so trying a wrong key won't hurt, if there happen to be two keys which match the hash. As you mentioned, you can also test whether or not a candidate key produces reasonable output.
So they made a Genetic Algorithm to efficiently crack Salsa. In this case, Salsa10 and not Salsa20, but what does that mean for the Salsa algorithm in general? I've not seen any real analysis of the greater fallout if Salsa is weaker than expected!
Fear: When you see B8 00 4C CD 21 and know what it means
My demonstration hash algorithm was intended to be:
For key X,
Hash = (square root of X) + X
As mentioned above, that's easily reversible, so it's a bad hash function. Good hash functions are much more complicated and should require at least a thousand years of CPU time to reverse.
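As a sketch of why this toy function is "easily reversible": substituting s = sqrt(X) turns the hash into the quadratic s*s + s = Hash, which can be solved directly. The function name `invert_toy_hash` is made up for illustration.

```python
import math

def toy_hash(key: float) -> float:
    """Toy demo hash: sqrt(key) + key."""
    return math.sqrt(key) + key

def invert_toy_hash(h: float) -> float:
    """Recover the key from the hash by solving s*s + s - h = 0 for s = sqrt(key)."""
    s = (-1 + math.sqrt(1 + 4 * h)) / 2  # positive root of the quadratic
    return s * s

print(invert_toy_hash(110))  # s = (-1 + 21) / 2 = 10, so the key is 100.0
```

No guessing is needed at all here, which is what separates a reversible formula from a real hash function.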
> incredibly dumb on the part of the malware authors.
> All they really needed to do was have a known unencrypted blob that they could compare against after decrypting and completely avoid storing any hash
In this case, it's the same thing. Both are Salsa based.
In the general case, you can afford to use a stronger algorithm for a short plaintext (the hash) and a faster (weaker) algorithm for the main encryption, so using the hash is MORE secure than misusing the main encryption routine as if it were a hash.
Typically, a hash consists of an encryption primitive repeated many times, 64 times in the case of SHA256 and MD5. So the hash should be stronger (and much slower) than a similar encryption.
Remember the suggestion was to encrypt a small, fixed-sized block.
You can create a rainbow table for encrypting a 16-byte block MORE easily than a hash rainbow of the same 16 bytes, because it's precisely the same operation, except the hash version does the operation 64 times.
If you stored a million bytes of encrypted data, that would be (probably) more difficult than 16 bytes of hash, but not harder than a million byte hash.
Let P be the number of possible plaintexts and J be the number of possible hashes. The average number of plaintexts which hash to a given value is therefore P / J.
We said the input is the same length as the hash. Therefore, there are always the same number of potential hashes of that length as there are potential plaintexts. That is, P = J. Therefore, the average number of plaintexts per hash is P / P = 1.
When designing a hash function, it is fairly trivial to ensure that the distribution is approximately uniform, and virtually all hash functions in use have this property. Therefore, for substantially all hash values, the number of possible plaintexts is approximately equal to the average, which is 1.
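The P / J argument can be checked on a toy scale. The sketch below assumes 16-bit inputs and a 16-bit hash made by truncating SHA-256, which is only a stand-in chosen so the whole space can be enumerated; it is not the hash Petya uses. The average comes out to 1 by construction, and the tally of how many hash values have 0, 1, 2, ... preimages shows how closely individual values track that average.

```python
import hashlib
from collections import Counter

BITS = 16           # assumed toy size; real hashes are far larger
SPACE = 1 << BITS   # P == J == 65536

def truncated_hash(x: int) -> int:
    """First 16 bits of SHA-256 over a 2-byte input (illustrative stand-in)."""
    digest = hashlib.sha256(x.to_bytes(2, "big")).digest()
    return int.from_bytes(digest[:2], "big")

# Number of plaintexts that land on each hash value.
preimages = Counter(truncated_hash(x) for x in range(SPACE))

# Average plaintexts per hash value, i.e. P / J.
average = sum(preimages.values()) / SPACE

# How many hash values have exactly 0, 1, 2, ... preimages.
spread = Counter(preimages[h] for h in range(SPACE))

print("average preimages per hash:", average)
for count in sorted(spread):
    print(count, "preimage(s):", spread[count], "hash values")
```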
>no algorithms that I am aware of come close to ... 1
Even distribution is a design requirement for hash functions. Any unevenness is predictability and therefore brokenness.
MD5 gives even distribution, though it is otherwise broken for many use cases. In one experiment, the experimenter hashed 10 million values, I believe, and compared the number of times each possible value appeared in the first 8 bits and the last 8 bits. The difference between the most common value and the least common was less than 1%. To my knowledge, there's no theory that MD5 isn't evenly distributed.
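An experiment of that kind can be roughly reproduced with the standard library. This is a sketch under assumptions, not the original experiment: the inputs are simply the decimal strings "0", "1", "2", ..., and the sample size `N` is much smaller than 10 million, so sampling noise makes the gap noticeably larger than it would be at full scale.

```python
import hashlib
from collections import Counter

N = 200_000  # assumed sample size, far smaller than the experiment described

first = Counter()  # tallies of the first 8 bits of each digest
last = Counter()   # tallies of the last 8 bits of each digest
for i in range(N):
    digest = hashlib.md5(str(i).encode()).digest()
    first[digest[0]] += 1
    last[digest[-1]] += 1

def spread(counts: Counter) -> float:
    """Gap between the most and least common byte value, relative to the expected count."""
    values = [counts[b] for b in range(256)]
    return (max(values) - min(values)) / (N / 256)

print("first byte spread:", spread(first))
print("last byte spread:", spread(last))
```

Raising `N` should shrink both figures, since the noise in each bucket grows more slowly than the bucket's expected count.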
For SHA256, it is known that the distribution isn't perfectly even, but the variance from even distribution may well be less than 1% for SHA256 as well.