Theoretical Breakthrough Made In Random Number Generation (threatpost.com)

← Back to Stories (view on slashdot.org)

Theoretical Breakthrough Made In Random Number Generation (threatpost.com)

Posted by BeauHD on Tuesday May 17, 2016 @09:50AM from the you-are-so-predictable dept.

msm1267 quotes a report from Threatpost: Two University of Texas academics have made what some experts believe is a breakthrough in random number generation that could have longstanding implications for cryptography and computer security. David Zuckerman, a computer science professor, and Eshan Chattopadhyay, a graduate student, published a paper in March that will be presented in June at the Symposium on Theory of Computing. The paper describes how the academics devised a method for the generation of high quality random numbers. The work is theoretical, but Zuckerman said down the road it could lead to a number of practical advances in cryptography, scientific polling, and the study of other complex environments such as the climate. "We show that if you have two low-quality random sources -- lower quality sources are much easier to come by -- two sources that are independent and have no correlations between them, you can combine them in a way to produce a high-quality random number," Zuckerman said. "People have been trying to do this for quite some time. Previous methods required the low-quality sources to be not that low, but more moderately high quality. We improved it dramatically." The technical details are described in the academics' paper "Explicit Two-Source Extractors and Resilient Functions."

16 of 152 comments (clear)

Min score:

Reason:

Sort:

Re:Theoretical breakthrough... by MobileTatsu-NJG · 2016-05-17 10:41 · Score: 2

A few days ago I had dinner with a friend of mine who worked on this project. Curiosity got the best of me and I said: "Level with me man, did you really create a random number generator?" And he said: "More or less."

--

"I like to lick butts!" by MobileTatsu-NJG (#32700246) (Score:5, Informative)
Re:Multiplication by michelcolman · 2016-05-17 10:47 · Score: 2

And the secret algorithm is... multiplication!
No, then zero will occur more often than other numbers.
I suppose xor ought to work if the two sources are totally unrelated. But then you have to be sure that they really are completely unrelated. For example, feed it the same source twice and all you get is zero.
So I imagine they must have done something a lot more clever than that.
Re:Why not hardware by Anonymous Coward · 2016-05-17 10:50 · Score: 5, Informative

Hardware random number generation based on stochastic random processes (e.g. Johnson noise across a resistor) is a real thing.
There is the minor difficulty that it requires that you actually add the hardware to do the sampling - sure HP didn't mind dropping a /dev/hwrng into my new $3500 laptop, but most people don't buy top-of-the-line hardware, and nobody likes hardware dongles.
The REAL difficulty is that you're now subject to all the usual threats to analog processes. We like to just *assume* that all the noise sources in our analog devices are uncorrelated because that makes our analyses tractable, but that's not the case. In other words, you want to believe that the 20th bit of the noise-sampling ADC is independently flapping in the breeze and providing 1/0 randomly and in equal measure, but that doesn't make it so. This is especially the case inside a box packed with high-speed digital circuitry, which will be coupling nonrandom, nonwhite signals into anything it can. Worst of all, these signals will tend to be correlated with the ADC: Its conversion clock, like almost every other clock signal in a modern computer, is probably either phase-locked to or derived directly from the same crystal in order to provide reliable, synchronous timing between circuits.
Re:FP by macklin01 · 2016-05-17 10:50 · Score: 5, Informative

I read it too, and I fail to see the breakthrough. There are plenty of pseudo random number generators, such as the Mersenne Twister, with very long periods, so just occassionally XORing even a poor quality random number into the feedback loop, is enough to make it completely unpredictable.
Mersenne Twister is pretty much the standard for simulating a uniform distribution in a lot of scientific computing. These depend not only upon unpredictability (useful for avoiding biases, and clearly important in the security realm), but also upon properties of the uniform distribution.
But when we test it out, we find it's still not as great as we'd like: look at a histogram of outputs, and you'll see that until you get really large numbers of function calls, the histogram isn't particularly uniform. (In particular, numbers near the bottom and top of the range don't get called quite as often.) This means that simulation properties that rely upon uniform distributions over both long and short time periods may be thrown off, and short- and mid-time simulation results may well stem from the MT rather than from the mathematical model. Moreover, low-probability events may have artificially smaller probabilities in the simulations (because of the non-uniformity of the distributions near the bottom and top ends of the range).
Over very short numbers of function calls (a few hundred to a few thousand), the outputs can even tend to cluster in a small neighborhood. So suppose that you are simulating a tissue with many cells, and calling MT as your uniform distribution to decide if the cells divide or stay dormant (each with independent probability p, so each cell divides if PRNG/max(PRNG) < p). The math says that for a uniform distribution, you don't need to worry about what order you evaluate your decision across all the cells. But if the PRNG outputs cluster over several sequential calls, then a neighborhood of cells may simultaneously divide if they are all evaluated close to one another sequentially. In analyzing the spatial behavior of such a simulation, you may draw incorrect conclusions in smaller spatial structures that, again, derive from non-uniformity of the PRNG, rather than problems with predictability. (And then you may accept/reject a particular hypothesis or mathematical model pre-maturely.)
So, there's definitely more to it than just unpredictability, depending upon where the code is being used.

--
OpenSource.MathCancer.org: open source comp bio
Low reliability random vs dedicated white noise by wierd_w · 2016-05-17 10:57 · Score: 2

I see this very salient question raised earlier-- why not use a dedicated white noise generator for random picks?
Most places you are going to need a good random integer are going to be in places where a dedicated diode based noise source is going to add cost to the system, and thus not be ideal. Things like an ATM machine, a cellphone, a laptop, whatever.
The cost is not in the actual component cost (a diode costs what, a few cents?) but in a standardized interface to that noise source.
Conversely, there are several very low quality sample sources that have no physical correlation with each other on such systems that can be leveraged, which have well defined methods of access, which add zero additional cost to the system.
Say for instance, microphone input + radio noise floor on a cellphone. Perhaps even data from the front and rear cameras as well. (EG, take Foo miliseconds of audio from the microphone, get an snr statistic from the radio, and have a pseudorandom shuffle done on data from the front and rear cameras. These sources are, independently, horrible sources of random. They are however, independent of each other. If this guy has found a really good way of getting good random from bad sources, then a quality random sequence can be generated quickly and cheaply this way. This would allow the device to rapidly produce quality encrypted connections without a heavy computational load, vastly improving the security of the communication, and do so without any additional OS level support, or extra hardware.)
When deriving a key pair for the likes of GPG, it can take several minutes of "random" activity on the computer to generate a sufficiently high quality entropy pool for key generation. That is far too slow to have really good, rapidly changing encryption on a high security connection. Something like this could let you generate demonstrably random AES 256 keys rapidly on the fly, and have a communication channel not even the NSA can crack.
The trick here is for them to reliably demonstrate that the rapidly produced sequences are indeed quality random sequences with no decernable pattern.
As long as Apple gets it and shuffle sucks less by _Shorty-dammit · 2016-05-17 11:16 · Score: 3, Funny

As long as Apple gets it and shuffle stops jumping back to the same area many, many times a day then I'll be happy. That shit gets on my nerves, yo!
Re:FP by JanneM · 2016-05-17 12:05 · Score: 5, Interesting

For large-scale simulations you need them to be pseudo-random, as in repeatable. If you are running a parallel simulation across hundreds or thousands of nodes, you can't send random data to all the nodes. You'd spend all your time sending that, not getting anywhere with your simulation.
Instead, what you do is that each node runs its own random generator, but seeded with the same state. That way they all have the same random source, with no need to flood the interconnect with (literally) random data.
Another reason is repeatability of course. You may often want to run the exact same simulation many times when you're developing and debugging your code.

--
Trust the Computer. The Computer is your friend.
summary by slew · 2016-05-17 12:16 · Score: 2

One way to think about this is to understand how to get a unbiased number out of a biased coin? A simplistic Von-Neumann extractor.
Basically, flip the coin twice, discard HH/TT and assign HT and TH to 0 and one. This throws away alot of flips (depending on how biased the coin is) but gets you what you want if the flips are independent.
Now imagine instead of a 2x2 matrix, a really large matrix that takes in a sequence of flips that depends on the minimum entropy of the source so that you avoid throwing away flips. How do you fill the elements of this matrix?
You can watch here and learn something...
Re: Multiplication by Ol+Olsoc · 2016-05-17 13:27 · Score: 4, Informative

Multiplication and other arithmetic operators are indeed secret as far as the average American high schooler is concerned.
Well, a Mathematician was interrogated because some woman on a plane saw him........
doing math, which in today's America is a sign of trrsm.

--
The shepherds did so well protecting the flock that the sheep no longer believed that wolves existed.
Re:FP by Darinbob · 2016-05-17 14:03 · Score: 2

Sure, lots of people can do that, but can they do the math behind it well enough to get a paper out of it?
Re:Why not hardware by Darinbob · 2016-05-17 14:24 · Score: 2

A problem with the hardware RNGs is that they very often don't just give you a random number on demand. They often require you to have a delay between starting and ending the operation for example, and the longer you wait the more entropy is generated. Not bad drawbacks for the occasional crypto input, but could be a problem for running simulations.
Just prove it ;) by thesupraman · 2016-05-17 18:57 · Score: 2

Let me know when you can prove that, Nobel have a prize for you ;)
That being the problem with randomness, it is borderline impossible to prove (and if you are unlucky, easier to disprove).
The associated problem in cryptography is trustable randomness ;)
The classic case of that is the RNG embedded in most Intel chips, where Intel refuse to give
specific details, and no one would trust them anyway, so it is not used for anything really secure.
That is not a problem with Intel specifically of course, it is a problem with trusting any 3rd party.
The main issue with a quantum noise based RNG is how do you verify it - just trust the vendor? ;)
Re:FP by TechyImmigrant · 2016-05-17 22:51 · Score: 4, Interesting

I read it too, and I fail to see the breakthrough. There are plenty of pseudo random number generators, such as the Mersenne Twister, with very long periods, so just occassionally XORing even a poor quality random number into the feedback loop, is enough to make it completely unpredictable.
This proof is about proving mathematically that an algorithm is a good entropy extractor. MT has no such proof. It doesn't even meet the definition of an entropy extractor and isn't cryptographically secure. For algorithms in the same class, PCGs are currently top of the pile, but they still aren't secure.
They claim to have an algorithm for a secure two input extractor (where the two inputs are independent) which can extract from low entropy source. 'low' means less than 0.5. Single input extractors suffer from an absence of proofs that they can extract from sources with min-entropy less than 0.5.
I fail to see it as new because the Barack, Impagliazzo, Wigdersen extractor from 2007 did the same trick using three sources (A*B)+C where values from A,B and C are treated as elements in a GF(2^P) field and can operate with low min-entropy inputs. This is so cheap that the extractor is smaller than the smallest entropy sources. Intel published this paper using the BIW extractor for what is probably the most efficient (bits/s/unit_area and bits/s/W) to date.
I fail to see that it is useful, because real world sources have min-entropy much higher that 0.5. Low entropy inputs is a problem mathematicians think we have, but we don't. We engineer these things to be in the 0.9..0.999 range.
On top of it, these two papers are as clear as mud. I tried and failed to identify an explicit construct and ether I'm too dumb to find it or they hid it well. I did find an implication that the first result uses a depth-4 monotone circuit that takes long bit strings and compresses them to 1 bit and the second paper extends this with a matrix multiply. That sounds both inefficient and more expensive than an additional small source needed to use the cheaper 3-input BIW extractor.
My take is that there have been two breakthroughs in extractor theory in recent years. Dodis's papers on randomness classes and extraction with cbc-mac and hmac for single input extractors and the BIW paper for three input extractors. Both useful in real world RNGs.
I may be wrong though. I've asked a real mathematician to decode the claims.

--
I should use this sig to advertise my book ISBN-13 : 978-1501515132.
Re:FP by TechyImmigrant · 2016-05-17 22:53 · Score: 4, Informative

Mersenne Twister is pretty much the standard for simulating a uniform distribution in a lot of scientific computing.
No. Permuted Congruential Generators are. There's no contest. MT is slower and doesn't pass TestU01.
MT is used for reasons of inertia.

--
I should use this sig to advertise my book ISBN-13 : 978-1501515132.
Re:FP by TechyImmigrant · 2016-05-17 22:55 · Score: 2

Is there a reason why using the hardware entropy source to change the state of a quality PRNG those few hundred thousand times per second wouldn't work?
That is a normal construct. It's even codified in SP800-90A with the additional input option on the Reseed() function.

--
I should use this sig to advertise my book ISBN-13 : 978-1501515132.
Re:FP by serviscope_minor · 2016-05-18 00:42 · Score: 4, Informative

Mersenne Twister is pretty much the standard for simulating a uniform distribution in a lot of scientific computing
kind of is, for the reason as you say of:
MT is used for reasons of inertia.
MT19937 is pretty good. Much better than the LCGs that were popular before it. Those have some really serious pitfalls and you can hit them awfully easily. MT19937 is generally pretty good, though not perfect. It has far fewer pitfalls than LCGs and you're actually pretty unlikely to hit them (it doesn't fail TestU01, it passes almost all of the tests and fails a few). It's also fast enough that you have to try quite hard to make the RNG the limiting factor (it is possible though).
You can go wrong with MT19937, but it's hard to. That makes it pretty decent. Prior to C++11, xorshift used to be my go-to generator that didn't suck because it was a 5 line paste. These days, it's MT19937 simply because it's in the standard library and good enough for almost all tasks.

--
SJW n. One who posts facts.