Truly random numbers are critical for computing and cryptography. Although deterministic algorithms can generate numbers that seem random, critical applications depend upon truly unpredictable physical processes. Unfortunately, the resulting sequences can still contain hidden regularities, so theoretical computer scientists have long sought “extractors” that produce perfect randomness from these imperfect sources.
It has been known for 30 years that there should be many ways to combine two imperfect sources to generate nearly perfectly random numbers, but only now have researchers shown explicitly how to create such “two-source extractors.” The work, which received a best-paper award at the 48th ACM Symposium on Theory of Computing (STOC 2016) last June, brought together developments from several disparate areas of computer science. “The search has been going on mainly in the past 10 or 15 years, and this is the culmination of this effort,” said Avi Wigderson of the Institute for Advanced Study at Princeton University in Princeton, NJ. “It’s a huge jump, both technically and also quantitatively.”
The practical implications may be modest for now, though. Security specialist Bruce Schneier, for one, does not see any urgent need for better random numbers. Schneier helped create the widely used Fortuna pseudorandom-number generator, and says there are already many adequate sources. “In my world no one’s worried about this,” he said. “These systems already work” to provide secure communications when attention is paid to all of the other implementation details. “We have lots of problems; this isn’t one of them.”
Wigderson’s enthusiasm is more fundamental. “The point of studying these things theoretically is you’d like guarantees about the behavior of a published algorithm or cryptographic protocol.” Moreover, other researchers have already built on the results to devise even more efficient algorithms. As a side benefit, the two-source extractor also has implications for other mathematical issues, including solving a 70-year-old problem in graph theory.
An Adversarial Approach
Random numbers are used in a wide variety of computational tasks. For example, the outcome of complex processes that cannot be calculated explicitly, such as climate change, can be modeled by averaging random samples from a probability distribution. Other problems are just easier to attack using random numbers. In fact, some problems have proven-efficient solutions based on random numbers, while the best deterministic algorithms known require impractically long calculations.
Theoretically, assessment of these methods is only possible if the numbers are really random, meaning that every outcome is equally likely, no matter what has come before. Any hidden regularities could introduce biases that skew the results, and there are published cases that suffer from such distortions, said David Zuckerman of the University of Texas, Austin, who coauthored the new work with graduate student Eshan Chattopadhyay, now at the Institute for Advanced Study.
It turns out to be surprisingly difficult to extract guaranteed high-quality randomness from a single source, if the source is even modestly non-random.
To guard against such problems, “in the computer science setting you try to think as adversarially as possible,” said Henry Yuen of the University of California, Berkeley. “You try to think that nature is conspiring against you.” This adversarial approach is particularly appropriate in cryptography, where randomly generated keys, shared between sender and legitimate recipient, are used to hide information from potential eavesdroppers. “What you’d really like to be sure is that there’s no other being in the universe that can predict the outcome,” Yuen said. “It might be uncertain to you, but how do you know it’s uncertain to someone else?”
For example, if strings of 100 bits were chosen randomly, any particular string would only occur about every 2100 (about 1,000,000,000,000,000,000,000,000,000,000) times. If a particular string occurred much more often, say every 1,000 times, a human observer would certainly not notice, unless the string itself was distinctive. Moreover, a statistical test could miss this rare violation of randomness unless it was told to test for the specific string. A malevolent adversary who knew about that special string, however, could use that knowledge to drastically reduce the computational challenge of deducing an encoded signal.
Assessing deviations from randomness can be subtle. Theoretical computer scientists use a quantity called min-entropy, which is the negative base-2 logarithm of the highest probability of occurrence out of all possible sequences. (The more-familiar Shannon entropy quantifies the information in a message as the average of this logarithm over all sequences, which is always larger than this worst case.) In the one-in-a-thousand example discussed earlier, the min-entropy would be log2(1,000), or about 10 bits, which is one-tenth of the 100 bits of a truly random distribution.
One way to get sequences with near-maximal min-entropy is to use pseudorandom number generators, which use deterministic algorithms to create sequences that are mathematically assured of appearing random. For example, some algorithms cycle irregularly through all possible strings starting from some “seed” sequence. For a particular seed, however, this pseudorandom sequence will be the same. This repeatability can be convenient for debugging code, but creates vulnerabilities, so cryptography protocols like Fortuna spend a lot of effort on frequently injecting new seeds.
To completely avoid pseudo-randomness, computer chips often come with physical random number generators based on the outcome of noisy electronic events. There are even companies that sell devices based on the outcome of quantum processes that are not predictable, even in principle.
These physically derived sources may still contain hidden regularities, however, for example because of correlations between different events. The quality of the randomness also depends on how manufacturers have implemented all of the other details. Some researchers have even speculated, especially since the disclosures by Edward Snowden, that commercial hardware sources could be subtly manipulated to make the codes easier to break by the U.S. National Security Agency or others. Researchers, therefore, would like to have assured randomness even from these sources.
It turns out to be surprisingly difficult to extract guaranteed high-quality randomness from a single source, if the source is even modestly non-random. For some 30 years, theoretical computer scientists have recognized and pursued two related ways to circumvent this problem. Both cases had been proved to have appropriate algorithms; the problem has been to find them.
The first approach leverages a small, high-quality random seed to extract high-quality random bits from a poor source. For a decade now, researchers have known how to efficiently construct such “seeded extractors” that are essentially as efficient in making use of low-quality sources as is theoretically possible; if the source is an n-bit string, for example, that string need only have a min-entropy of order log(n).
The second approach avoids the need for a perfect seed by combining two low-quality sources, which must be independent. Researchers had long ago proved that almost all possible extraction algorithms would work. “It’s like finding hay in a haystack,” Zuckerman joked. Yet in fact, for a decade now, the best-known two-source extractor still required that each n-bit source string have a min-entropy of almost n/2 bits.
“A lot of problems in theoretical computer science are like this,” Zuckerman said, where the challenge is finding explicit examples of entities that are known to exist. Using a complex procedure, Chattopadhyay and Zuckerman showed how to construct extractors that require only log(n) to a power, specifically the 74th power.
An early draft of the paper was posted online in the summer of 2015. “This was such a big result that all the experts immediately just started reading it. It quickly became clear that this was a breakthrough result and all the ideas were correct—and also quite clever,” said Yuen. Other researchers quickly extended the results, reducing the exponent from 74 to slightly over one. Zuckerman’s former student Xin Li, now at Johns Hopkins University, also adapted the method to generate multiple bits, as opposed to the single bit of the original paper.
“Everyone has been waiting for this,” said Li, but he cautioned that “most of this work is still theoretical.” Li said he expects the work will eventually find its way into practical computing use, but “right now the running time is not very good, so it will take some time to improve this.”
To explicitly construct two-source extractors, Chattopadhyay and Zuckerman brought together three sets of “tools that were lying around the garage,” but which had been developed for seemingly unrelated problems, Wigderson said. “The combination is really pretty amazing.”
The first tools, called “non-malleable extractors,” are similar to seeded extractors, and had been studied for cryptography. Xi Lin had recently highlighted their unexpected usefulness in the two-source problem. The second tool, drawing on decade-old work on pseudorandom generators for parallel computing, also had “seemed totally irrelevant” to the current problem, Wigderson said. Finally, Chattopadhyay and Zuckerman extended 30-year-old work to explicitly construct resilient functions, which can ensure that voting is not swayed by small coalitions of voters (or adversarially selected bits).
“Part of our contribution was getting a way to convert two low-quality sources and combine them in a certain way so that most of the bits are random but there are a few that are not,” Zuckerman said. The resilient function then ensures that the few non-random bits cannot prejudice the outcome.
An important by-product of the two-source extractor design, Wigderson said, is an explicit solution of another longstanding challenge in graph theory. Specifically, it concerns Ramsey graphs, graphs that contain no large cliques of connected nodes or groups of unconnected nodes larger than a certain size.
“It’s amazing, but if you would like to build a large graph with this property, it’s very hard,” Wigderson said, even though almost all randomly chosen graphs have it, as shown statistically by the prolific mathematician Paul Erdös in 1947. “People have struggled … for many, many decades” to construct them systematically, he said.
The tools developed by Chattopadhyay and Zuckerman show how to create such graphs systematically. (Independently, Gil Cohen, now at Princeton University, also presented a solution to this problem at STOC 2016.) “The general quest” of dealing with randomness in combinatorial situations, Wigderson said, “is a big deal in many ways.”
Chattopadhyay, E., and Zuckerman, D.,
Explicit Two-Source Extractors and Resilient Functions, Electronic Colloquium on Computational Complexity, Revision 2 of Report No. 119 (2015), http://eccc.hpi-web.de/report/2015/119/
Chattopadhyay, E., and Zuckerman, D.,
How random is your randomness, and why does it matter? TheConversation.com, Sept. 18, 2016, https://theconversation.com/how-random-is-your-randomness-and-why-does-it-matter-59958
Recommendation for the Entropy Sources Used for Random Bit Generation, (Second draft January 2016), National Institute of Standards and Technology Special Publication 800-90B, http://csrc.nist.gov/publications/PubsDrafts.html#800-90B
Center for the Study of Rationality, Randomness. https://www.youtube.com/watch?v=syUxHJFwwQQ