Practical Exploits of Broken MD5 Algorithm
jose parinas writes "A practical sample of an MD5 exploit, with source code included, can be found on CodeProject, a site for .NET programmers.
The intent of the demos is to demonstrate a very specific type of attack that exploits the inherent trust of an MD5 hash. It's sort of a semi-social engineering attack.
At Microsoft, the MD5 hash functions are banned.
The main problem is that the attack targets the software distribution process, as you can see by reading the paper, Considered Harmful Someday. Some open source programs, like RPM, use MD5, and many open source distributions use MD5 as a checksum."
If you contract your file from x bytes down to a fixed size, no matter what algorithm you use, you will always have collisions.
Unless your hash is the same size as the original file, there is nothing that can be done about it, ever.
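The counting argument above can be seen with a deliberately tiny hash. This is only a sketch: `tiny_hash` is a made-up 8-bit function (the first byte of an MD5 digest), chosen so that the 256 possible outputs run out quickly. Nothing here attacks full MD5.

```python
import hashlib

def tiny_hash(data: bytes) -> int:
    """Hypothetical 8-bit 'hash': the first byte of the MD5 digest."""
    return hashlib.md5(data).digest()[0]

def find_collision(limit: int = 257):
    """Hash distinct inputs until two of them share a tiny_hash value."""
    seen = {}
    for i in range(limit):
        msg = str(i).encode()
        h = tiny_hash(msg)
        if h in seen:
            return seen[h], msg, h
        seen[h] = msg
    return None  # cannot happen for limit >= 257: only 256 outputs exist

a, b, h = find_collision()
print(a, b, h)  # two different inputs, one shared 8-bit hash value
```

With 257 distinct inputs and only 256 possible outputs, a repeat is guaranteed. A real hash has the same property; the output space is just far too large to exhaust this way.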
liqbase
...better use Tiger or Whirlpool (based on AES). AFAIK there are no known vulnerabilities or attacks for these two yet.
Do not be alarmed. This is only a test.
Now we know why people distribute modified game ISOs on the net and check them with md5 :P In all technicality, couldn't this mean that someone could land a virus on someone else's machine because the person trusted the hash?
Viable Slashdot alternatives: https://pipedot.org/ and http://soylentnews.org/
Isn't this the problem with all algorithms? Only way to know if something is something... is to see something instead of its checksum?
As for md5... with only 32 hex digits (128 bits), it should've come up with repeated hashes in the end anyway?
I guess since the article explains some issues against md5 security, the only answer would be to trust the source that is supplying the hash in the first place? Coming down to the fact that a system is only as secure as its user?
At Microsoft, the MD5 hash functions are banned.
they use crc instead!
MD5 --> MD6
This seems to work on the assumption that you want to do some harm with a program you created yourself, you can't actually take a random RPM and turn it into an evil RPM with the same MD5. So, yes, it's bad, but it's not as bad as you might think.
Heya,
... are compressed. I use archlinux and I know the contents of the packages is compressed.
I thought that the known attacks would not work against compressed content, because you had to add some "random" data to the content and while decompressing the decompressor would give an error. I know it isn't exact, but I thought that md5's of compressed content were safe. I'm not sure if rpm's,
greetings,
Michel
Considering this is such a "well known" result, you would think that MD5 should have been abandoned long ago. Is this true for other popular hash functions?
"Yields falsehood when preceded by its own quotation" yields falsehood when preceded by its own quotation.
This isn't a problem for software distribution, really, since the good.bin file needs to start with a vector designed to enable a collision. A good-faith programmer wouldn't include that vector.
It is a problem for stuff like contracts; you draw up two versions of a contract, a good one and an evil one, let someone sign the good one, and later keep them to the clauses in the evil one.
So while there IS a very big problem, the example is a bit contrived.
Unfortunately there is no way of guaranteeing they won't be found next month.
E.g. reiserfs hashes filenames too. Does this mean that I would be able to overwrite or modify a file owned by root if I'm able to guess a filename which produces the same hash as this specific file?
This is kinda interesting. Well, the user needs to use my installer, but it might infect them, and they have no way to tell. But the interesting thing is... A couple of scripts/programs use md5sum, like rpm and such. How much work is it to change to sha1? AFAIK SHA1 is 160 bits, whilst md5 is 128 bits, so a bit more space in the rpm is needed. Apart from that?
The solution to all this is gpg signing. I've heard little fuss about that... Yes, it is simply a longer signature, making it more difficult to break, but still..
Assembling etherkillers for fun an profit
Well, duh.
Athletic Scholarships to universities make as much sense as academic scholarships to sports teams.
My next paper is "Considered Harmful" Papers Considered Harmful
What about using document formats that are plain text but transformed to a paper/screen viewable format? The original demonstration of this (two postscript documents) was done by adding specific garbage to the files, making it slightly clearer what was going on. With Word or Excel, you stand no chance of determining if a file has had garbage added to it or not.
Banning MD5 is a decent PR move at Microsoft, as long as they teach their target audience of morons why they're doing it first, but don't mistake this for actual security, just because it happens to have real world security benefits.
P.S. A lot of file formats will fall to this, including some used by open source projects. I'm not bashing Microsoft for being Microsoft; I'm bashing them for a) making a big deal out of something that everybody else took as a matter of course months ago, and b) holding this up as an example of them shifting focus towards being more secure.
I wonder how this affects file integrity checkers. A lot of these programs store hashes and use them to verify whether a file has changed.
So the next time someone installs a root kit, he just needs to do it in a way TFA points out.
pardon me if i might sound redundant or ignorant, but why shouldn't md5 be considered a free algorithm ?
SHA1, which AFAIK is used by Git, is creaky at the edges.
Perhaps SCO will get their source code into the kernel by financing SHA1 collisions?
Sam
blog.sam.liddicott.com
Guys,
I don't mean to bitch here, but the parent is just trying to karma whore, by stating what HE thinks is obvious, but if you read the insightful replies to his BS, you will see he is full of it...
Actually, just look at his posting history... Karma bitch...!
AFAIK there are no known vulnerabilities or attacks for these two yet
I am no cryptography expert, so I cannot read and understand those algorithms. But the fact that there are no known vulnerabilities for an algorithm doesn't make it secure.
Maybe they are just not used as much as other well known algorithms. And therefore nobody has found vulnerabilities for them yet?
thomasdamgaard.dk.
Two pages, same hashes, etc. (This is the guy who wrote the MD5 someday paper.)
http://www.doxpara.com/t1.html
http://www.doxpara.com/t2.html
No attack? what about brute force? It's an attack, not a good one, but it is an attack.
RPM uses both MD5 and SHA1. The chances of finding a collision that satisfies both hashes are small, even if both MD5 and SHA1 are compromised, since they hash the data differently.
rpm -Kvv xorg-x11-libs-6.8.2-37.FC4.49.2.i386.rpm /var/lib/rpm/Packages rdonly mode=0x0 /var/lib/rpm/Packages /var/lib/rpm/Pubkeys rdonly mode=0x0
./updates-released/packages/xorg-x11-libs-6.8.2-37.FC4.49.2.i386.rpm: /var/lib/rpm/Pubkeys /var/lib/rpm/Packages
D: Expected size: 2655615 = lead(96)+sigs(344)+pad(0)+data(2655175)
D: Actual size: 2655615
D: opening db index
D: locked db index
D: opening db index
D: read h# 278 Header sanity check: OK
D: ========== DSA pubkey id b44269d0 4f2a6fd2 (h#278)
Header V3 DSA signature: OK, key ID 4f2a6fd2
Header SHA1 digest: OK (f37bf5cb97db696f14133b90e23f2455b9f94587)
MD5 digest: OK (8eda29837b6992876bd867df03b3b8af)
V3 DSA signature: OK, key ID 4f2a6fd2
D: closed db index
D: closed db index
D: May free Score board((nil))
No, so let's just abandon the idea of hashes and encryption algorithms altogether. :p
Having said that, I find it highly amusing that Microsoft would ban MD5. Are they really trying to say we can't trust them as a source of software?
<evil grin>
Ruby Neural Evolution of Augmenting Topologies
Misunderstanding. I never made a statement about MD5 not being "free". But MD5 is vulnerable, that's why I pointed out some alternatives.
Do not be alarmed. This is only a test.
There is also no way of knowing if you're going to get hit by a bus next month.
So all we can do is take sensible precautions to make sure it doesn't happen.
The only things certain in war are Propaganda and Death. You can never be sure which is which though
Attacks only get better. A theoretical vulnerability is one to worry about, as these are almost always followed by a practical exploit like this. Move away from SHA1 before the same thing happens.
I am trolling
Not a SINGLE post about Bittorrent (which uses MD5) and the **AA's? I'm rather disappointed in you people.
There isn't sufficient energy in the solar system to complete a brute force attack.
By the way:
1) I am risking my life right now by writing this and I must leave the internet cafe in 4 minutes.
2) See who claims that MD5 is insecure, follow the links and understand who knows the backdoor to the "alternative, more secure" hashing algorithm.
GTG
HASTA LA VISTA!
NIST is having a two-day workshop in Gaithersburg, Maryland (USA) on October 31-Nov 1. Xiaoyun Wang will be giving a keynote speech, and there'll be plenty of technical material to go around. The workshop website is: www.csrc.nist.gov/pki/HashWorkshop/program.htm. I don't work for NIST or anything, but I thought this was interesting and they haven't really done a good job getting the word out about this conference.
Wer mit Ungeheuern kämpft, mag zusehn, dass er nicht dabei zum Ungeheuer wird. --Nietzsche
... or maybe you (and the mod who gave you Insightful) just have a strange sense of humour?
If the MD5sums for two files don't match, then you can still say the files definitely are not the same. So it's not entirely useless. You can still use it to check for non-deliberate file corruption {such as you might see if you have a faulty drive or motherboard}, since the example was so contrived as it could almost never happen by accident.
.....
Also, I don't see how you could apply the scheme through the usual layers of archiving and compression. If I have a file "tldpsk.tar.gz" which contains "photoindex", "camprobe", "photograb", "install", "copying", "readme" and "manifest" -- all of which are human-readable text files, and the manifest, which is also reproduced on the download page, contains the original MD5sums of all the other files -- and I take its MD5sum, sure I might be able to produce another file with the same MD5sum as "tldpsk.tar.gz". But the new "tldpsk_altered.tar.gz" probably won't uncompress cleanly {first alarm bell} because the extra data you added probably won't be a valid file within the tar archive. And even if it is a valid file, then the manifest will be wrong {second alarm bell}. If you added extra data to one or more of the inner files -- let's say you put something nasty into "camprobe" and something else into "install" -- then the chances are that these files now won't be perfectly human-readable {third alarm bell}, even if their MD5sums match the ones in the manifest and on the download page.
PGP signatures can help, of course; but all a PGP signature really proves at the end of the day is that the file was signed by someone who knew the purported author's secret key. In an ideal world, of course, that means nobody but the author; but if the author of the package was unlucky enough to trust an MD5sum on a compromised file, that might not necessarily be the case
Je fume. Tu fumes. Nous fûmes!
While this is a very serious attack, it doesn't yet mean that every use of MD5 is totally unsafe.
... yet :-)
It is feasible now to generate 2 different pieces of data with the same MD5 hash. As many file formats allow one to embed invisible 'junk', it is possible to create a 'good' version and an 'evil' version with the same MD5 hash.
BUT it is NOT (yet) feasible to create a piece of data with a given MD5 hash. This means that if you can not modify the 'good' version, you can't create a matching 'evil' version.
An example of where the usage of MD5 isn't broken are *nix passwords. Your password is hashed with MD5 (a salt is added to your password too, but that's not important here), and that hash value is stored. Anyone who can supply a password (doesn't have to be the same one!) which has the same MD5 hash, is allowed access.
So if it would be feasible to generate data with a given MD5 hash, one could easily generate a matching password, when given the MD5 hash (which you can often easily acquire, especially in NIS/yp environments). But luckily, this is not possible
So you'd better start protecting those hashes (that's what a shadow password file does), and better yet, move to a better algorithm. Like the Blowfish algorithm that OpenBSD has been using for years now.
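The password check described above can be sketched as follows. This is a simplified illustration, assuming the stored value is a single MD5 over salt plus password; the real Unix MD5 scheme iterates the hash many times, and the function names here are made up for the example.

```python
import hashlib
import hmac
import os

def hash_password(password: str, salt: bytes) -> str:
    """Simplified salted hash: MD5(salt || password). Not the real md5-crypt."""
    return hashlib.md5(salt + password.encode("utf-8")).hexdigest()

def check_password(candidate: str, salt: bytes, stored_hash: str) -> bool:
    """Grant access to any input whose salted hash equals the stored one."""
    return hmac.compare_digest(hash_password(candidate, salt), stored_hash)

salt = os.urandom(8)                     # stored next to the hash
stored = hash_password("correct horse", salt)
print(check_password("correct horse", salt, stored))
print(check_password("wrong guess", salt, stored))
```

An attacker holding only `salt` and `stored` needs an input that hashes to one specific value. That is the "given hash" problem, which the collision attack discussed in this thread does not solve.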
The problem is that a broken algorithm just makes that piece of social engineering a lot easier.
If I just told you you can download the latest auto-installer of the latest WoW patch from www.i-pwn-ur-puter.ru instead through the slow Blizzard installer, you might think "uh, wtf, I think I'll play it safe anyway and get it directly from Blizzard. I trust them more than I trust a warez and script-kiddie site."
Now picture that I tell you "and here's a link to the MD5 sum on Blizzard's site. You can check for yourself that the file on our site is the original file and it hasn't been tampered with." In fact, I would even _urge_ you to make a habit of checking all your downloads against the original MD5 sums, for your own good.
It already looks a lot safer and more legitimate. Well, maybe not to _you_, but to a lot of people it does.
That's the whole problem. That false sense of security makes the "if we can convince you to run our insecure extractor code" part a helluva lot easier.
A polar bear is a cartesian bear after a coordinate transform.
Once again, OSX proves to be more secure!
*ducks*
Unfortunately for now there is no other way but "break and fix" cycling. However, sometimes I feel people are too easily persuaded to buy into a new thing since 'The old one has been proven vulnerable, and this brand new shiny thing looks promising'.
Brute force is always a possibility. But the possibility of someone having, and being willing to spend, the resources, technology, money and time to screw you over by brute forcing a 256 bit Whirlpool hash is marginal.
Do not be alarmed. This is only a test.
surprisingly many stories hashes to the same value..
This is a complicated issue. Generally, the security offered by an encryption algorithm isn't measured by its popular usage, but by the amount of time qualified professional cryptographers/mathematicians/hackers have studied it without finding a critical vulnerability. My claim is probably too broad: there is no magical formula that determines how secure an algorithm is. But in-depth work by professionals does engender confidence in an algorithm.
As a general rule of thumb, it is wise to use an algorithm that has been seriously studied for 10-20 years. At this point, it is modern enough to withstand modern brute force attacks, and (hopefully) understood well enough to ensure that there are no structural vulnerabilities. If it is much older than that and still studied, it is likely because a flaw has been found and people are trying to push it as far as it goes.
After all, I am strangely colored.
Haha! Good one!
Do not be alarmed. This is only a test.
I wonder why we're not using multiple checks on files. In the ports systems of OpenBSD and NetBSD, checksums using several algorithms are stored for each file. Right now, I think only SHA1 and file size are checked, but I think the feasibility of an attack would be severely reduced if another algorithm (say RMD160) were used as well. The same could be done with other systems (apt, BitTorrent, ...)
Please correct me if I got my facts wrong.
And to think that they're still working on getting MD5 digests in TCP packets...the algorithm could be as useless as the checksums they currently use by the time this change would have become widely accepted (now it's probably never going to happen).
Please correct me if I got my facts wrong.
There's the brute force method. It only takes a month to run.
said this before...
... LAST YEAR.
Dan Kaminsky is actually the dude who came up with the Stripwire idea.
Tom
Someday, I'll have a real sig.
Hey, wait up, you forgot your wallet, mr... Anderson?
As far as I know, the technique used for finding these MD5 collisions, cannot be performed with a GIVEN hash. So it's not possible to create, say, a copy of an already available RPM, add malicious code to it, and easily find some data to add to it to generate the same hash. This is not possible.
The only thing the current 'crack' does is create two RANDOM input files that generate the same hashed output. So it's only useful for someone who can control both the 'original' and the 'malicious' version of the data which is being protected by an MD5 hash.
So the dangers here are kind of limited though you could still do a lot of damage with it.
The known attacks produce MD5 collisions with the same length!!
Padding and byte-counts and so forth are a red herring and provide no mitigation. The basic colliding messages are all exactly 1 block long. From there you can do message-extension. Any hash function with this so-called Merkle-Damgard structure is vulnerable to length extension attacks. And yes, this is true of Whirlpool too (in contrast to another poster's assertion): if you find two messages with the same hash and the same length, you can extend them both with the same string and continue to get collisions.
"Secure" is both an adjective and a verb. We usually use it as an adjective, like "orange" or "happy".
I've had several discussions with my wife over the years about something being "orange". She insists that some maize-like color is truly orange, but the color of the University of Illinois' basketball jersey is more of a red.
Choose an algorithm that makes you feel secure. Understand that your chosen algorithm, whatever it may be, is not proved unbreakable. It's just that the work involved for someone to break it is greater than the value to them of doing so - and greater than the value to you.
Raise your children as if you were teaching them to raise your grandchildren, because you are.
AFAIK this is an attack on the underlying Merkle-Damgard paradigm which both SHA-1 and MD5 (amongst others) employ. The paradigm goes as follows:
...
:-)
IV | Initialisation vector of n bits
MB_i | Message block i of n bits
HB_i | Hash block i of n bits
f:(IV , MB_i) -> HB_i | is the underlying compression function, which takes both an IV and a message block as input and outputs a hash block.
First the original message is split up (and padded if need be) into n-bit blocks. Then f is applied with an IV and MB_1 as input, resulting in HB_1. f is then applied iteratively, each time taking the next message block as input while using the previous hash block as the IV input.
f(IV, MB_1) = HB_1
f(HB_1, MB_2) = HB_2
...
f(HB_(s-1), MB_s) = HB_s = H(Message)
Merkle and Damgard proved that the overall construction is collision resistant given that the compression function f is collision resistant.
As the parent post pointed out, though, the last block had better include the overall message length. If this is not the case, then by extending 2 different but colliding messages x,y with the same plain text q, the input to the compression function becomes identical, since H(x) = H(y) = IV input for f. If on the other hand the length of the original message is included in the last block, then even though the IV input for f is the same for f(H(x),q_1) and for f(H(y),q_1)..., the final message block (q_s) will again be different, resulting in a different final hash block.
If on the other hand len(x) = len(y) then the problem persists, since both IV and message blocks will be the same when the final iteration of the algorithm is reached.
In fact this attack is even stronger, since by the same reasoning one can see that to produce H(x+q) all one needs to know is H(x) (and len(x) if that is included in the last message block). No other information is needed about the original message x! H(x) is simply inserted as the IV for f when hashing q and so the iteration is "jump started" just where x finishes. (If the length is included in the last block then all that need be used is len(x)+len(q).)
Disclaimer: Not 100% sure about all this though, so please feel free to correct me if I'm wrong...
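The chaining described above can be tried out on a toy scale. The sketch below is not MD5: it assumes a made-up compression function `f` with a 16-bit state (a truncated MD5 call, used only as a convenient mixer) and 4-byte blocks, so that a collision between two equal-length blocks can be found by brute force in well under a second. All names and sizes are illustrative.

```python
import hashlib
from itertools import count

BLOCK = 4            # toy block size in bytes (assumption)
IV = b"\x00\x00"     # toy 16-bit initial value (assumption)

def f(state: bytes, block: bytes) -> bytes:
    """Toy compression function: 16-bit truncation of MD5(state || block)."""
    return hashlib.md5(state + block).digest()[:2]

def pad(msg: bytes) -> bytes:
    """Append 0x80, zero-fill to a block boundary, then a length block."""
    n = len(msg)
    msg += b"\x80"
    msg += b"\x00" * (-len(msg) % BLOCK)
    return msg + n.to_bytes(BLOCK, "big")

def toy_hash(msg: bytes) -> bytes:
    """Iterate f over the padded message, feeding each output back in."""
    state = IV
    data = pad(msg)
    for i in range(0, len(data), BLOCK):
        state = f(state, data[i:i + BLOCK])
    return state

def find_block_collision():
    """Brute-force two distinct single blocks with the same chaining value."""
    seen = {}
    for i in count():
        block = i.to_bytes(BLOCK, "big")
        h = f(IV, block)
        if h in seen:
            return seen[h], block
        seen[h] = block

x, y = find_block_collision()
q = b"any common suffix"
print(toy_hash(x) == toy_hash(y))          # same length, same chaining value
print(toy_hash(x + q) == toy_hash(y + q))  # still collide after extension
```

Because x and y have the same length, the padding and length block are identical for both, so the length field does not separate them. This matches the len(x) = len(y) case in the comment.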
Anyone here can provide some links/references to appropriate documentation to educate myself?
Achille Talon
Hop!
For a start, it still massively enlarges the number of people you must trust.
E.g., do you automatically trust everyone working at Blizzard? How about some disgruntled temp employee at the publisher? Do you trust everyone at the company who made the install program too? Etc.
Due to the whole "if H(A) = H(B), then H(A+Q) = H(B+Q)", they don't even have to convince Blizzard of anything. As long as I control the A and B versions of the self-extractor module, Blizzard's own content is the Q in that equation. It doesn't matter. Automatically _any_ self-extracting patch made with executable A can have that executable part at the front stripped and replaced with the malicious executable B, and the result will still match the MD5 sum. I don't have to convince them to change their self-installing patch on their site: it will just be automatically compatible with the exploit as it is.
Basically the whole thing has been bumped from "if the hash matches, you're extremely probably getting the same version" to "you have to trust everyone along the whole bloody chain that they didn't silently plant a part they can swap-in later." I don't know about you, but the latter makes me a bit less comfortable.
Then there are a lot of files that people install that aren't from the devs themselves.
E.g., while getting Blizzard to include my modified files might be hard, it probably wouldn't be hard at all to get a site like Vidiot Maps to replace one of their maps with my own more detailed map of an area. They'd probably even thank me for it.
Basically, yes, it does require a lot more effort and preparation than just generating a virus that matches a given MD5 sum from scratch. But you have to trust everyone along the chain that they haven't made that effort. If they did, that illusion that MD5 is safe as houses just becomes a liability.
A polar bear is a cartesian bear after a coordinate transform.
It's dumb to ban MD5. It is still, and will continue to be, a useful tool in situations where a cursory comparison is needed for reasons other than security. It would be silly, stupid even, to use a larger hash that requires more (slower) computation in these situations. Banning MD5 can mean one of two things: either it's a stupid publicity stunt that will result in slower code overall, or Microsoft doesn't trust its developers to be smart enough to know when a particular hash is appropriate.
Hence the reason for going with SHA-256. It doesn't fix the issues with the SHA algorithm (which may eventually be found to break the whole SHA system, much as MD5's first clear weaknesses proved to merely be the first steps on the path to shatter its reliability in the course of only months), but rather covers up the weakness with additional difficulty that is still sufficient to deter attackers.
You can never go home again... but I guess you can shop there.
RPM only uses MD5 to check for corruption of the type you might find during download. RPM actually uses GPG or PGP to sign the generated RPMs for security, and GPG is (afaik) capable of using nearly any hashing algorithm, including ones that are yet to be invented. So as far as security is concerned RPM doesn't use MD5 but rather uses whatever hashing algorithm the GPG key that signed the RPM was generated with.
Windows is a bonfire, Linux is the sun. Linux only looks smaller if you lack perspective.
Whirlpool is not "based on AES". It shares a few similarities (and a designer) but it is a distinct algorithm in its own right. It has a larger block size, different S-boxes, a different linear component, a different key schedule and so on.
I would interpret "based on AES" as meaning it actually uses AES itself (perhaps in tandem Davies-Meyer mode or similar).
I like Whirlpool but it's not fast. I think Tiger is quite a bit faster.
Bernstein's cache timing attacks, and the ever-growing gap between processor speeds and memory access times, mean that table-based primitives (which both of these are) are going out of style.
Xenu loves you!
The good.bin file DOESN'T need to start with a "vector designed to enable a collision". Any good file with any resulting hash can be used. You put random crap into the BAD file to make it match the hash of the good file. Most file formats (including executables, Word documents, PDFs, and common compression formats like .zip files) have some way to include some bytes somewhere that have no effect on the use of the file. Any such format is vulnerable to this attack.
All you have to do is sign the checksums so that the recipient can tell where they've come from, and the job's done.
thank God the internet isn't a human right.
I just read the article, and I was thinking: if we can replace the extractor, then why bother creating an evil file with the same checksum? The extractor can do the "evil patch" at extraction. The method in this article isn't very useful for evil purposes.
I've been reading the posts in this thread and I've noticed that there are two types of posters here: the ones who got it 100% right, and the clueless ones (there appear to be few or no posters in the middle ground).
Now, the clueless ones are thinking of lots of "attacks" using this vulnerability, some of them really wrong. Since this has the potential of getting lots of people to do stupid things (like not trusting MD5 when they should), let's talk a little bit about the vulnerability and its effects.
First of all: this is not new. There was an article here explaining the same attack a few months ago (about X.509 certificate collisions and how to fake PostScript orders; if you know what I am referring to, please post a link).
The attack goes like this:
You have a block B1 that is known to collide with another block B2.
You have some custom made code that looks like this:
-----BEGIN SNEAKY CODE---------------
If DATA[1] = DATA[2] then
do something good
else
do something bad
end
DATA[1]
DATA[2]
-----END SNEAKY CODE-----------------
The trick is that since there's a collision between B1 and B2 and MD5 makes the hash by reading sequentially, the hash for the whole program will be the same whether you fill DATA[1] and DATA[2] with B1 or B2 (in any combination). Since the code is DESIGNED to do different things depending on the collision area, by changing the contents of DATA[1] and DATA[2] you can have programs that do "good" or "bad" things, with the same hash. Please note it's been DESIGNED with that in mind.
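The branching in the sneaky code above can be sketched like this. It shows only the control flow: `B1` and `B2` are placeholder byte strings invented for the example and do NOT collide under MD5, so the two builds below have different hashes. The comment's point is that with genuinely colliding blocks the hashes would match while the behaviour differs.

```python
# Placeholder data blocks; hypothetical stand-ins, NOT real MD5 collision blocks.
B1 = b"placeholder-block-one"
B2 = b"placeholder-block-two"

def build_program(slot1: bytes, slot2: bytes) -> dict:
    """Stand-in for a binary: identical code plus two embedded data slots."""
    return {"DATA1": slot1, "DATA2": slot2}

def run(program: dict) -> str:
    """The fixed code path: branch on whether the two data slots match."""
    if program["DATA1"] == program["DATA2"]:
        return "something good"
    return "something bad"

print(run(build_program(B1, B1)))  # slots match
print(run(build_program(B1, B2)))  # slots differ
```

The code itself never changes between the two builds. Only the embedded data differs, which is why the behaviour has to be designed in from the start.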
From now on I'll talk in absolute terms, while in reality there is a very small probability of things being right for an attack without being planned that way, so keep that in mind before saying "but that's not the whole truth.....".
Now let's discuss what's possible to do and what's not:
1.Oh no! Now, someone will create a virus that has the same hash as my favorite app!
False: the app (or installer) would have to have been designed with that "feature" in advance.
2.MD5 is worthless and should not be used anymore.
False: MD5 is useless in the situation presented above. There are some very good uses of MD5 that are safe (like access control: this attack does nothing practical to your salted MD5 shadow file). MD5 should probably be watched for other undesirable properties, though. An alternate cryptographically secure function should be kept in reserve.
3.I'll use another hash function, I'll be invulnerable to this attack.
(somewhat) False: You'll be invulnerable until someone finds ONE collision in your new hash function (it might take a long time but....). Then you'll be vulnerable again. But now we all know what can be done with ONE collision. What you're thinking is probably good, but it's no silver bullet.
4.Microsoft will forbid the use of MD5 and DES, and use SHA-1 and AES. We should do the same.
(somewhat) True: Not for the reasons you're thinking though. If MS is really doing this, this attack is a lame excuse to do it. MD5 is still usable for some things, and SHA-1 is not much better than MD5 in the things related to this attack. IIRC these collisions were found using an attack derived from an attack on SHA-1. Right now, SHA-1 collisions can be found in 2^63 operations (and the clock is ticking). We should probably consider using a new hash function someday, but leave the decision to the cryptologists. About AES, it's about time. DES can be brute forced in reasonable time, and it's been like that for a few years. 3DES is slow. That's the reason for the AES contest; we should use it since we have it.
5.Someone could distribute some sort of binary and then switch it so it does lots of damage to unsuspecting people.
True: That's exactly what the attack is about. Maybe you were wrong to trust [insert a name here].
6.Who should be doing what and when?
If you work in crypto, you probably k
GPG 0x1B479C78
Why did he use Javascript?
<!-- --> is all you need, and it works in every browser.
Why is it that when you believe something it's an opinion, but when I believe something it's a manifesto?
I'm working on a system that uses MD5 in order to cache some data so that it does not always have to be retrieved over the network - the client instead asks for the MD5 and checks to see if it is in its disk cache before doing the more expensive operation of pulling the whole dataset and adding it to the cache. We plan to ship the system with the cache pre-populated so that in the field there won't be any "cache misses" until we ship an update and then there will only be just the one.
In this case, I think the attack vector would be as follows. Some malware opens the cache and replaces the full data with an alternate set of data with the same hash key. This would result in the client not operating correctly (because it has the wrong data). Of course, this could be done even without MD5 being compromised, because the program just assumes that the data in the cache file is valid. This leads me to believe I ought to re-run the MD5 on the data in the cache when the cache file is loaded in order to validate it (discarding any records that don't hash correctly).
So, assuming I do that, I should probably change away from MD5. Which algorithm should I use in its place? Any recommendations from the good folks on slashdot?
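The re-validation step described above might look like the sketch below. It assumes the cache is a simple mapping from hex digest to data and swaps in SHA-256 as one possible replacement for MD5; both choices are assumptions for the example, not details of the poster's system.

```python
import hashlib

def digest(data: bytes) -> str:
    """Cache key for a blob; SHA-256 is an assumed replacement for MD5."""
    return hashlib.sha256(data).hexdigest()

def load_valid_entries(cache: dict) -> dict:
    """Keep only entries whose key still matches a fresh hash of the data."""
    return {key: data for key, data in cache.items() if digest(data) == key}

good = b"dataset v1"
cache = {
    digest(good): good,                     # intact entry
    digest(b"dataset v2"): b"tampered!!",   # data swapped under an old key
}
valid = load_valid_entries(cache)
print(len(valid))  # only the intact entry survives
```

This only catches data that no longer matches its key. It does not stop malware that rewrites both the data and the key, which would need a signature or some other trusted source for the keys.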
Avoid Missing Ball for High Score
rpmbuild -ba --sign ...
...
... this is (and has been for a long time) common practice across all rpm-based distros and tools (urpmi/apt etc).
or
rpm --addsign rpm
yum may verify the signature, but then
Source: Wikipedia
Dewey, what part of this looks like authorities should be involved?
That attack doesn't work!!
This attack operates on one fact:
if MD5(x) == MD5(y) then MD5(x + q) == MD5(x + q)
Here's the rub: the MD5 hash of your infected patch is MD5(x + q). The MD5 hash of blizzard's patch is MD5(q)
MD5(q) != MD5(x + q)
if MD5(x) == MD5(y) then MD5(x + q) == MD5(y + q)
the rest still holds.
The linked-to article talks about matching PS documents. That is a much more serious attack. The real flaw, though, is that it depends upon tricking people. The concept is that you generate a special PostScript file. The file is read and digitally signed by somebody. Then you swap out part of the file with another one with the same md5sum, and the PS yields a different message. The flaw here lies with the person who signed the document, in confusing the output of a program with the program itself. The fact is that the program could have detected the target's printer and printed a different message there than on everybody else's printer.
Now, I have digressed. Anyway, this attack, while related to the PS one, is flawed in that it depends on a combination of 2 malicious files: a malicious bin file, and an extractor that is also malicious.
The postscript code is closer to this (which would not work for c, due to the way executables are, but at least shows the basic flow of the exploit style.)
Remeber that that actually would not work, but shows the basic flow.Again the problem is that a signature or hash can mean only that the file has not been modified (and really even that is not garenteed as these attacks show), but gives no indication that the author is not malicious. These current attacks require the author to be malicious.
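As a rough illustration of that flow, here is a hypothetical Python sketch (not the PostScript from the paper). The "document" is a program that carries two embedded blobs and picks its message by comparing them. The names `render`, `X`, and `Y` are invented, and the stand-in blobs below do not actually collide under MD5; in the real attack they would be a colliding pair of equal length.

```python
def render(document: bytes, blob_len: int) -> str:
    """Show what the document 'says', based on its two embedded blobs."""
    first = document[:blob_len]
    second = document[blob_len:2 * blob_len]
    # The author controls both branches; the signer only ever sees one.
    if first == second:
        return "benign message"
    return "malicious message"

# Stand-ins for a colliding pair (these two do NOT really collide).
X = b"A" * 8
Y = b"B" * 8
rest = b"...both messages and the rest of the document..."

signed_version = X + X + rest    # what the victim reads and signs
swapped_version = Y + X + rest   # same hash only if X and Y truly collide

print(render(signed_version, 8))   # benign message
print(render(swapped_version, 8))  # malicious message
```

Both messages are present in the file from the start, which is why the attack needs a malicious author.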
TFA is indeed not clear. I'm assuming he meant that somebody publishes a) an extractor and b) an archive, and with MD5's for them both.
People download, run, it all works great. Based on this trust, you download them both and try them too. Their MD5's match what everyone says it should be.
Unfortunately, in the meanwhile the site owner has replaced the good archive with the evil one. The MD5's are the same, but you will now get pwn3d.
But the fundamental problem here is really just that you are treating the extractor as a trusted binary. The archive isn't an issue.
The extractor could instead have been coded to do something evil simply based on today's date, your login name, or a random number. No MD5 fiddling needed.
Why don't we just make it a standard practice to provide "verification" hash checksums from multiple algorithms? If you provide *both* the MD5 and the SHA1 hashes for a file, it will be many, many orders of magnitude more difficult to construct replacement data which hashes the same both ways. The collision space for this must be infinitesimal?
If everybody just switches to AES or somesuch, aren't we just postponing the problem until similar methods of attack are proven against its algorithms? By combining multiple hash algorithms, you gain a sort of independent oversight.
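A minimal sketch of checking a download against both published digests, using Python's standard hashlib. The helper names `digests` and `verify` are made up for illustration, and the published values are assumed to be lowercase hex strings.

```python
import hashlib

def digests(data: bytes) -> dict:
    """Compute both digests of the same data."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
    }

def verify(data: bytes, published: dict) -> bool:
    """Accept the data only if BOTH digests match the published ones."""
    return digests(data) == published
```

A replacement file would then have to collide under both algorithms at once to pass the check.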
Ernest MacDougal Campbell III
geek ramblings
http://www.cryptography.com/cnews/hash.html
http://www.nsrl.nist.gov/collision.html
Would you mind telling me what the CRC field is for in the gzip format, then?
There are places where the networks are not touching,and there are places where they are-Boeing's Lori Gunter
No, you don't understand. No, I'm _not_ proposing to start from a Q and H(Q), so you can stop repeating that already.
I'm proposing to start from an A that's been planted in advance, and thus from an A+Q. As long as my "seed" A is in there, I don't flippin' care what Blizzard's Q is. I can replace my pre-planted A with my pre-prepared B, and end up... guess what? With exactly a case of A+Q versus B+Q.
E.g., an auto-extracting installer is exactly a case of A1+Q1. You have an installer A1 at the front which Blizzard _didn't_ write, and an archive at the end Q1 which Blizzard did create.
If I control the A1 and have a B1 which hashes to the same value, I can swap it in the above scenario and keep the MD5 sum for the whole file unchanged.
The executable itself, the A1 and B1 above will in turn, yes, be constructed like that. Out of a base installer Q, and two pieces of code A and B.
Basically it's a case of A+Q+Q1 vs B+Q+Q1. Where only the Q1 part is under their control, Q is the actual installer, and A looks benign.
Do you understand _now_? Again, I'm not saying I'm gonna start from scratch with a H(Q) and build a collision from there, I'm saying it's possible for an insider to plant a seed in advance that they can replace later.
The same applies for anything else. I can for example distribute a pre-compiled library where one module is an A+Q case. The Q is the actual library module you wanted to use, and the A is planted there for future use. The moment you link that in your program, it becomes again an A+Q+Q1. Where Q1 is your code. Voila, I can now swap my B in and keep the MD5 sum intact for every single program you compiled with that library.
That's the problem. No, you can't start from scratch, but basically it becomes feasible for one _hell_ of a lot of people to plant such pieces of padding in advance, for future use.
A polar bear is a cartesian bear after a coordinate transform.
I'm saying it's possible for an insider to plant a seed in advance that they can replace later[...]No, you can't start from scratch, but basically it becomes feasible for one _hell_ of a lot of people to plant such pieces of padding in advance, for future use.
It's already possible without this MD5 attack. It's called modifying the code to put a trojan or subtle vulnerability in it. You're just proposing a needlessly complicated method of taking advantage of being on the inside.
Your initial posting didn't mention anything about inside jobs, it described the attack entirely as if some outsider could launch this on their own.
Please mod this guy's posting down; "people on the inside can trojan executables" is not the slightest bit insightful, nor is it specific to MD5 attacks.
But he, Sir, is not!
As others have said, hashes (eg checksums) are numerical methods performed to produce a small number from a large one.
The hash is complex so that when you have the small number it's hard to work out the large one. But imagine my hash p(n) takes a number n between 10000 and 20000 and returns a single digit. The number of collisions is huge. I have a 1 in 10 chance of picking an "n" that gets the same hash as your number, "m" say. What I can't do is work out your "m".
With passwords, however, most collisions aren't proper words, so a human can readily check which collision matches. Try a search for "John the Ripper".
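The single-digit example is easy to check by counting. The sketch below assumes p(n) is simply the last digit of n, which is a made-up choice since the post does not say what p is, and it takes the range as 10000 to 19999.

```python
from collections import Counter

def p(n: int) -> int:
    # Hypothetical toy hash: keep only the last digit of n.
    return n % 10

inputs = range(10000, 20000)
counts = Counter(p(n) for n in inputs)

# Every digit is hit by the same number of inputs, so a random guess
# matches a given hash with probability 1 in 10.
print(len(counts), set(counts.values()))

def candidates(digit: int) -> list:
    # All inputs that hash to `digit`; the hash alone cannot say which
    # one of these was the original "m".
    return [n for n in inputs if p(n) == digit]

print(len(candidates(p(12345))))
```

Finding some colliding n is trivial here, while the digit alone still leaves every candidate equally plausible as the original m.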
According to the Tiger home page, Tiger2 is just Tiger using the MD5/SHA padding method. It's probably done to make it a more convenient drop-in replacement, rather than for any security reasons.
Xenu loves you!
Ok, thanks!
Do not be alarmed. This is only a test.