A Competition To Replace SHA-1
SHA who? writes "In light of recent attacks on SHA-1, NIST is preparing for a competition to augment and revise the current Secure Hash Standard. The public competition will be run much like the development process for the Advance Encryption Standard, and is expected to take 3 years. As a first step, NIST is publishing draft minimum acceptability requirements, submission requirements, and evaluation criteria for candidate algorithms, and requests public comment by April 27, 2007. NIST has ordered Federal agencies to stop using SHA-1 and instead to use the SHA-2 family of hash functions."
The draft can be found (in PDF) here.
Er Galvão Abbott - IT Consultant and Developer
...the magical SHA-24M?
The security of a given hash/encryption would seem to be a function of how much effort has gone into breaking it. Lots of algorithms can look good on paper, but until people really tear into the math and code, it's true level of unbreakability is undecidable. A 3 year competition is not likely to bring enough IQ, theorems, malevolence, or brute CPU cycles to bear against any candidate.
The point is that any attempt to quickly create a new algorithm is likely to create an insecure one. Shouldn't we be trying to create candidate algorithms for the year 2050 to give the algorithms time to withstand attack? Or do we plan to keep creating new algorithms as a serial security-by-obscurity strategy.
Two wrongs don't make a right, but three lefts do.
Think about it. You walk into a video store and you see Rot-13 and right next to it you see Rot-7 --which one you gonna spring for?
Not 13. Seven. Seven Little monkeys sitting on a fence...
Schneier proposed such a competition in March 2005: http://www.schneier.com/crypto-gram-0503.html#1
Please correct me if I got my facts wrong.
The amount of research done in to hash functions is nothing like the amount that goes in to ciphers. I'm not really sure why this is the case because hashes are much more important than ciphers. Hashes are used in MACs to protect the integrity and authenticity of a message.
Ask yourself this, is it more important that somebody can read your SSH connection or that somebody can hijack the channel? The reasons for wanting a good hash function suddenly become very clear.
It's true that hashes are becoming less important as a result of AEAD modes. But they have uses far beyond MACs and it's good to see a competition from NIST to stoke research in to those primitives.
Simon.
When you digest a message and obtain a hash it is obvious that there will be collisions.
It is obvious that there will be many possible inputs that produce the same output.
however the actual chance of encountering two inputs that hash to the same value by accident is vanishingly small.
with SHA1 even finding two inputs that hash to the same value deliberately is very hard and finding a second input to match an existing output is considered virtually impossible.
If I show you some pictures of people can you tell which one is me? Would you let me on a plane with just a grainy picture?
that is a very different situation because two photos of you will be far from identical. Secure hash functions are only usefull in the case where things are supposed to be identical (two copies of the same file for example).
note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
``Maybe secure hashing needs to store a mixture of the low level and the high level details but in a context specific way - the face picture example should also store the detailed iris pattern as well as an overall face picture, both should match to allow this person through. It might be easy to find someone who looks like me, but the specific portion cannot be modified without surgery.''
The idea is that, in a good hash function, each input bit affects all the output bits more or less equally. This is especially true of cryptographic hashes, and for a good reason. The stronger the correlations between input and output, the weaker the hash function.
Please correct me if I got my facts wrong.
Collision always exist,
what u don't want is an easy way to locate them!
Know that there is a collision is a normal thing,
be able to fabricate Hundred of them in a couple of minute
(It's NOT possible now with SHA1, but it's becoming just a bit easier..)
So better start now to replace it so if even more serius
problems are founded we will be in a good shiny new boat!
(It will probably also be sunked but we buy time in this way)
Does anyone know whether or not common protocols and formats such as TLS, ssh, X.509 certs, etc are being updated to use newer hash functions?
Its easy to change parts of a self-contained system, such as password hashes, but common protocols require interoperability and standards compliance.
This is actually fairly interesting situation, where NIST certification and platform interoperability may actually be at odds with each other.
You clearly don't know what a crytographic hash is about. And this is not what is ment by collisions resitant. What it means is that there is minum amount of work needed to produce a collision.
There are a number of different type of collisions as well. Lets assume we have a 256-bit hash. There is the kind of colision where you just find *any* 2 strings that produce the same hash, which should require on avarage 2**128 "operations". A harder task is given a string and its hash find another string with the same hash. For a secure hash 256-bit hash function this will require on avarage 2**256 "operations".
There are other properties that are important as well. Its a well established idea. Hashes are very very usefull and are used for a lot more that file verification and we know what properties they need. We are just not very good at producing very good hashes yet.
If information wants to be free, why does my internet connection cost so much?
Anybody know if SHA-512 is mathematically vulnerable to the same kind of attack as SHA-1 (only presumably requiring more computing power)? Or is it really a different kind of beast?
I always wonder about what would happen if we used multiple hash functions together. E.g. you provide an SHA-1 hash, an MD5 hash, and an RMD-160 hash, all for the same message. Would that be harder to fool (i.e. make the system think you got the original, but it's actually a forgery) than one hash function that generated as many bits? What about weaknesses in the individual hash functions; would you be worse off because a flaw in any one of your hash functions affects you, or better off, because you have more hash functions that need to be fooled?
By the way, IIRC, OpenBSD and NetBSD include multiple hashes per archive in their ports trees, but use only one for verification.
Please correct me if I got my facts wrong.
The unfortunate thing is that in order to win this competition, the hash function submitted will have to be pretty conservative - very much like the SHA-512 family. There isn't time to analyze anything really new and have enough confidence in its security to bless it as the new standard for ever on. But if these attacks (and especially the recent attacks on the whole Merkle-Damgard structure) show us anything, it is that the old way turns out not to be the best way to build hash functions, and that more innovative ideas (eg Daemen et al's "belt and mill" proposal RADIOGATUN) should be the way forward.
What we need is for NIST to pull the rug under everyone near the end, and say "thanks for putting huge amounts of energy and hard work into designing and attacking all these hash functions, now you can all make new proposals based on what we've all learned and we'll start over again!"
Xenu loves you!
I think that you are missing the point of a hash.
A hash is a signature of the file, its designed to give a good confidence that a given file that you have been supplied matches the one that you think has been supplied.
The theory being that being able to create a file that is of the same length as the orignal, is not corrupt (eg a zip file still unzips, an executable still runs, a pdf still displays) and is different from the original but still hash should be infeasable (not impossible, most cryptography doesn't look for impossible, not practical within a given time frame is sufficient for most needs)
Another use of hashes is on data storage systems, especailly with backup systems, where two files with the same hash and length are treated as the same file (so no need to write it to tape twice) this way you only have to sort the list of hashs and look for matches, rathering than having to diff every file against every other one.
Personally I think I'd rather binary diff matches hashes just to be safe - but thats time intensive. The chances of two files each having the same size and SHA-256 hash and being different is less than the chance of your sotrage device being destoryed (meteroite, fire, flood, plane) before you are able to back up either file
$_="Slashdotter";$syn="OTT";s;..;;;sub _{print shift||$_};s!ash!Perl !;s=$syn=ack=i;tr+LLEd+BLAH+;_"Just Another ";_
WHIRLPOOL.
It's a balanced design, an SPN to boot.
The big problem with the SHA's [and their elk] is that they're all UFN [unbalanced feistel networks], in particular they're source heavy. Which means the the branch/diffusion is minimal (e.g. it's possible to make inputs collide and cancel out differences).
SPN [substitution permutation networks] like WHIRLPOOL are balanced in their branch/diffusion.
Best of all, WHIRLPOOL is already out there. just a sign the paper!
Tom
Someday, I'll have a real sig.
So, I haven't studied this matter at all, but it seems to me that if you use more than one has algorithm on the same message, the chances of a different message generating the same has from both algorithms should be vanishingly small. Any cryptographers here care to fill me in?
-jcr
The only title of honor that a tyrant can grant is "Enemy of the State."
No you can't very easily modify it - thats the point.
You can exhaustively search for a collision, but the time requirement is very much non trivial.
Feel free to prove me wrong - unless you have a huge botnet or a supercomputer available I dont give you much chance of finding a collision that way for md5 let alone SHA-1
$_="Slashdotter";$syn="OTT";s;..;;;sub _{print shift||$_};s!ash!Perl !;s=$syn=ack=i;tr+LLEd+BLAH+;_"Just Another ";_
This "news" is several months old.
Oh well I know, it's Slashdot.
2000 times quicker than brute force (where brute force is average time 2^159 attempts) means the algorithm is not as secure as it used to be thought.
This has demonstrated a cryptographic weakness, there could quite well be more, look at the research over the years on weakening md5, therefore moving to different algorithm would be advisable.
Its doesn't mean that you are going to be able to find a collision in non trivial time, but it did lower the bar. Lowering it enough that people wanting high grade protection should switch to a more secure algorithm.
Context specific data has no place in a hash, it would only weaken it.
$_="Slashdotter";$syn="OTT";s;..;;;sub _{print shift||$_};s!ash!Perl !;s=$syn=ack=i;tr+LLEd+BLAH+;_"Just Another ";_
TF title is wrong. It says: "A Competition To Replace SHA-1". But, it's to replace the whole SHA family, which includes both SHA-1 and SHA-2.
SHA-2 includes SHA-256 and SHA-512. Why the whole SHA family? Because its design is not very trustworthy anymore since the "Chinese" attacks in 2005.
Sure there must be collisions, but that's not the point.
The point is that you can verify that data is correct with a good amount of confidence, from a relatively small hash code. So I can download a lot of data through, say, bittorrent, and despite the fact that I don't necessarily trust the people I actually download from, I can verify that the hash is right and therefore I am confident that the data I receive is what the original seeder put out: no-one's decided to play games and (say) sneak their CC number grabber into the data.
So what you want is an algorithm which is reasonably easy to run, which SHA-1 is, but where it is not easy to find a collision. For example, if my hash code was simply to give the total byte sum modulo 1000, then while it would almost certainly catch accidental errors in data, it would be very easy for an attacker to stick in his CC number grabber to your data then fiddle the byte sum back to where it should be.
Your idea pretty clearly shows you have no idea of what hashes are used for: there is no point preserving the data structure, it takes a lot of extra space and gives virtually no security. For example, SHA-1 produces a 20 byte hash. I can put something that size up on my personal website without getting huge bandwidth charges even if millions of people want to download it - and then I can distribute my 1GB zipfile by way of people I don't necessarily trust (but who have more bandwidth than I) and still the eventual recipients can be confident that what they receive is what I sent out. If I include the virtual FAT table of this zipfile, my hash size goes up by about 500,000 percent (literally), and so do my bandwidth charges. And I get virtually no extra security, because all that an attacker has to do above finding an SHA-1 collision is ensure that the change doesn't affect the FAT table: i.e. he replaces some suitable virtual file of mine with one of his, keeps the name and size the same and he's done.
Can we make a real competition and call it Hashing Idol where every week another function gets voted out? Or they could compete in a head to head. Two functions enter ring. One function leaves.
...
Have I been watching too much TV?
Well, there's spam egg sausage and spam, that's not got much spam in it.
I have a perfect solution to the hashing problem, for verifying the data integrity between two points...
You simply have to find autistic twins. The one at the source looks through the MB file, then writes a hash, explaining that it "smells like 5 green triangles". If the twin at the destination agrees, you know you have a match.
It's nearly impossible, even to brute-force this method... I mean, you need to covertly aquire a sample of their DNA, and wait several years for the clone to mature.
Of course, this method's weakness is that it doesn't scale-up effectively. There are only so many autistic twins out there, and human cloning technology is still quite expensive.
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
This is why we have editors...
:-(
I don't know what's scarrier, my loose grasp of formal language, or that I'm the author of two comp.sci text books
hehehehe
Whatever, I do what I want!
Tom
Someday, I'll have a real sig.
As other people have pointed out, I'm not necessarily sure that these competitions really result in a whole lot of new development work per se. Rather, they serve as encouragement to researchers in the field, to take whatever they've been working on for the past few years, tidy it up and make it publishable, and submit it as a candidate for standardization.
The research into new functions progresses more or less on its own in the academic world most of the time. These competitions basically seek to tap into what's going on there, and re-synchronize the commercial/governmental world with whatever the state-of-the-art is in academic cryptography.
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
...or whatever sound that Elk makeThey make a variety of sounds, most of them surprisingly high and squeaky for such large animals.
Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
I disagree. The AES competition really galvanized the research community and resulted in some exciting innovations in cryptography - eg the application of bitslicing to the construction of ordinary primitives a la Serpent - and cryptanalsysis - eg Lucks's "saturation attack". And we're seeing similar effects with the eSTREAM competition, which in turn results from the way that all the stream ciphers proposed for NESSIE were broken.
Xenu loves you!
Your point about the impossibility of producing unique M-byte hashes for every N-byte message (where N>M) is of course mathematically correct. But instead of thinking of hashes as working via obscurity, think of the function of the ideal hash to be: the impossibility of finding data with a matching hash without so radically changing the input data that the change is obvious to anyone who sees it. For example, if someone can produce a page of text that has the same hash value as garbage, or as a video of a monkey, the value of the hash function is not diminished. Whereas if someone can produce two license agreements that differ only in the phrases "I accept the following terms" and "I do *not* accept the following terms", the hash function is considered broken.
A hash function seeks to re-map a mathematical space in such a way that previously "adjacent" places in the input space are far apart in the output space. An ideal hash function would do this
- First they ignore you, then they laugh at you, then ???, then profit.
Mathematically, using multiple algorithms may not offer much of an advantage, but practically, where you may by necessity have to work with algorithms that have flaws (because you have to pick from algorithms that are well-agreed-upon standards), or that may be discovered to have flaws in the future, it seems like a good way to hedge one's bets. Aside from the added complexity, there doesn't seem to be any compelling reasons not to do it, if time and computational power allow.
"Ladies and gentlemen, my killbot features Lotus Notes and a machine gun. It is the finest available."
My expert advice is that now that we've seen what happened to the SHA-1 family, I think they should just skip the inevitable upcoming round of exploits for the SHA-2 family and go straight for a new SHA-3 family.
Beware: In C++, your friends can see your privates!
Sure, it might take a while, but it would be possible.
sure but with a decent hash algorithm the sun would most likely go red giant before you finished so its a non-issue.
note: i'm known as plugwash most places but i screwd up registering that here somehow in the past and now can't register
You meant to say "maximum amount of work" is needed.
I'm not sure whether a "broken hash" which e.g. maps blocks of 512 bits to 256 bits isn't better than a bijective hash which maps a 256 bit space to a 256 bit space, because bijectiveness is a property which can probably be exploited just as well as non-bijectiveness (= hash collisions).
Hey don't blame me, IANAB
Right?
n dex.htm
I mean, I have one program called "HashCalc" http://www.slavasoft.com.nyud.net:8090/hashcalc/i
Which includes:
Support of 12 well-known and documented hash and checksum algorithms: MD2, MD4, MD5, SHA-1, SHA-2( 256, 384, 512), RIPEMD-160, PANAMA, TIGER, ADLER32, CRC32.
I don't even know HALF of those. I knew of SHA1, but not of SHA256 384 or 512. Let alone Panama, Tiger, Adler32.
Not sure how "Secure" these other hashes or checksums are. The longer the string of characters, the more secure?
I still see sites offer up a CRC32 to check on your downloads...
I would have thought that brute force no longer implied 2^X attempts as the 'standard' for brute force. Wikipedia gives a lovely equation (hope it's accurate!,) such that if we want a 50% chance of finding a collision, we let H be the number of possible values in the hash, and let n(P) be a function representing the number of values that must be tested for a desired probability of collision P. n(P) = 1.1174sqrt(H). Applying that to a 160-bit value gives us 'only' 1,350,853,710,837,386,639,816,681 values. That's a phenomenal result (1.35x10^24) because it is drastically lower than 2^160 (1.46x10^48.) And the neat thing about the birthday attack is that as we increase P, n'(P) decreases. (dn/dP)
Its not a troll morons, its the truth. Rijndael was THE LEAST SECURE algorithm that made the finals. It uses the same fundamental concepts as serpent, but takes shortcuts for speed. Serpent was the most conservative cipher in the contest, and twofish the most flexible. Either of them would have made sense. Rijndael was just fast on 686 class hardware (and not even much faster than twofish), and should not have been chosen to be AES. If you don't know shit about crypto, then don't mod posts about it.
Why don't we try using SASH-1280 for a while? http://kirils.org/sash-1280-1.0.pdf
Yeah, I did write that myself, but it doesn't for granted mean it's insecure, does it?
Do not. Touch. Down.
It is in Amsterdam
How appropriate for a conversation about HASH.
Even applying that math, you still have non-trivial time requirements. Assuming you can test for collisions at a rate of 10^12 attempts / second (that's a thousand billion attempts per second, a rate that would requre a very large grid or supercomputer) you still have to wait in the order of magnitude of 10^12 seconds. I.e., about 31,700 years.
In reality, collision testing would not be able to be done anywhere near that fast, I'd guess the assumption of 10^12 attempts / second is about 1,000 times faster than anyone could do it. Anyone with real world experience in finding hash collisions feel free to chime in right about now.
A non-tiny data block (anything over say over 100kb) would require a huge amount of processing power and memory bandwidth in order to find a collision with a similar sized data block. Lets go back to our 10^12 cracking grid. To find a 100kb data block with a given hash at that rate would require the computer to have 100kb*10^12 transferred from memory to the CPU every second and then hashed. That's 100,000,000,000,000kb, or 100petabytes between CPU and memory every second plus whatever CPU power is required to perform 10^12 hashes on that data. While clever use of the hardware and lots of L2 cache could perhaps keep this off the main memory bus, it's still a phenomenally difficult task for any CPU.
Take these figures, and assume that it is now 2,000 times easier. These figures still put the time at finding a collission outside of the average lifespan of a person. Now factor in the fact that not just *any* collision will do, we need one that's about the right size and has at least a certain *part* that looks like something useful. I.e., a random data block is not a useful substitution for a PDF of an important doc, the recipient will notice if they receive random data instead of something readable. The practicality of any attack given these real world constraints goes down by an order of magnitude or two.
In short, SHA-1 is still safe. There is no realistically exploitable attack vector, at least not in the public domain as yet. That being said, I am a paranoiac. In the DB file stores that I am in charge of I keep both MD5 and SHA-1 hashes in the metadata table, and have been recommending this for years.
I hate printers.
...is provably collision-resistant.
http://senderek.de/SDLH/
'Proof' by Ron 'RSA' Rivest...
http://diswww.mit.edu/bloom-picayune/crypto/13190
SDLH is simple and secure to any number of bits of security desired once set up properly.
Factoring the modulus in SDLH is the only way to break it.
For that you need a state of the art number factoring algorithm (currently General Number Field Sieve or Shor's Algorithm).
Case closed.
Sun will propose an encryption scheme, it will be rejected, and Sun will release an open source alpha version of it, written in slow and unusable form, then make a press release about how their rejected product will replace something that isn't an encryption scheme at all *cough* Fortress *cough*
In addition to those two cases, what else can we say? Well, we can say that a cryptographically strong hash is a many-to-one mapping, where the set of cases that map to the same hash value should follow a random distribution AND where the set of differences between the aliases of any two hash values should also follow a random distribution.
(By random, I mean that the deltas aren't constant, cyclic, polynomial or otherwise predictable. You can't use any subset in order to predict any other subset. I also mean that for an infinite number of deltas, all possible delta values will occur with a frequency equal to the probability of that delta. In this case, you would want a flat probability, so all delta values will occur equally.)
Another way of saying all of this is to say that finding the aliases for a given input should be an NP-complete problem - the only way to find them is to look.
Anything else? Well, since most of this is extremely hard to test for, there needs to be a few tests that are more practical. The first would be that the hashing function should be in its simplest form. There should be no way of reducing it - for all inputs or for selected subsets of inputs. The second would be that the hashing function is non-differentiable (otherwise related inputs will produce related outputs). The third is that the function should be sensitive to initial conditions (for much the same reason). The fourth is that for any random selection of inputs where the pool of inputs is statistically significant, a standard statistical test for randomness should yield unimaginably high confidence limits that the distribution is indeed random. The chances of it not being random - ie: the chances of you finding an alias algorithmically and not herustically - should be no better than finding the alias by chance alone. (So, even if the hash could be broken, the chances of you finding the flaw are no better than of you finding the alias in the first place.)
I'm sure there are a million other tests that could be applied, but these are the more "obvious" ones that spring to mind. People can turn anything into cryptographic hashes (FFTs and even cellular automata have been used) - my guess is that chaotic systems might be another area of interest, provided you could guarantee the input was always in a chaotic region, as chaotic systems are only that way for specific conditions and can become periodic or stable outside of those limits, which is exactly what you don't want.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
You can still use the existing hash functions securely. According to my own analysis, SHA0 requires 108 rounds to be secure and SHA1 requires 104 rounds. Of course SHA0/SHA1 provides only 80-bit security and MD4/MD5 only 64-bit security, so you are better off using the 128-bit secure SHA2-256 or the 256-bit secure SHA2-512, but you have to use 96 rounds for the SHA2-256 to be secure and 104 rounds for the SHA2-512 to be secure. The point is that you are perfectly safe if you hash every block with any of the SHA functions or with MD5 twice. For the MD4 to be secure you have to hash each block three times. Whirlpool also needs 12 rounds to be secure, 10 is not enough.
And who would you trust to generate the shared N that everybody uses? Whoever knows p and q will trivially be able to break the hash function every way you can.
Other serious problems with this hash function are (1) The output is much too long (2) it's far too regular to substitute for a random oracle where one is needed, and (3) it's much, much too slow.
It's cool - and the proof is cool - but it's not a serious contender for normal applications.
Xenu loves you!
Correct me if I'm wrong but isn't that a just binomial expansion approximation?
$_="Slashdotter";$syn="OTT";s;..;;;sub _{print shift||$_};s!ash!Perl !;s=$syn=ack=i;tr+LLEd+BLAH+;_"Just Another ";_
Just different application. For example, if your application needs hashes, use two hashes. One hash of a byte-sized fixed field that gives you the 'hints' you desire, and because it is a fixed byte-size, it reduces the problemspace, and another hash that is considered in the context of the fixed field hints. The hashing algorithm need not change to do what you want.
XML is like violence. If it doesn't solve the problem, use more.
From the official requirements PDF:
"A.3 The algorithm must support
224, 256, 384, and 512-bit message
digests, and must support a maximum
message length of at least 264 bits."
Someone either forgot the ^ carat, or thinks the world can get by on nine bytes of data at a time.
Evan - needs to hit preview before submitting