Sign In

Communications of the ACM

Research highlights

Technical Perspective: Backdoor Engineering


View as: Print Mobile App ACM Digital Library Full Text (PDF) In the Digital Edition Share: Send by email Share on reddit Share on StumbleUpon Share on Hacker News Share on Tweeter Share on Facebook

Imagine you are a cyber spy. Your day job is to tap cryptographically protected communications systems. But how? Straightforward cryptanalysis has long become impractical: the task of breaking modern algorithms, if implemented correctly, far exceeds all computational power available to humanity. That leaves sabotage.

You can target many Achilles heels of a crypto system: random-bit generators, side channels, binary builds, certification authorities, and weak default configurations. You infiltrate the teams that design, implement, and standardize commercial security systems and plant there hidden weaknesses, known as backdoors, that later allow you to bypass the cryptography.

Take random-bit generation. Security protocols distinguish intended peers from intruders only through their knowledge of secret bit sequences. Servers have to choose many key values at random to protect each communication session, and an adversary who can successfully guess these can impersonate legitimate users.

One trick to backdoor random bits can be understood with basic high-school algebra. A deterministic random-bit generator (DRBG) is initialized (seeded) with a start state s0, and then iterated with some generator function: Si+1 := G(Si)

ueq01.gif

In simple DRBGs (say, for simulations), the si may serve as both the state of the generator, as well as its output. So anyone who saw an output si and knows G can easily predict all future outputs. Crypto-grade DRBGs make four improvements: (a) hardware noise sources (slow) seed s0, (b) the state si has hundreds or thousands of bits, (c) a second function H derives output values ri:=H(si) from the internal state

ueq02.gif

and (d) both G and H are one-way functions. These can be computed efficiently, but their inverses not. After H, an adversary who can see some of the outputs ri cannot infer anything about the internal states si or other outputs rj. We know many excellent choices for G and H: one-way functions or permutations carefully engineered to be fast and to have no other known exploitable properties. Most are constructed from secure hash functions or block ciphers.

As a saboteur, you do not want these used. Instead, you lure your victims toward a far more dangerous option: the class of algebraic one-way functions that enabled public-key crypto. These are orders of magnitude slower and require much bigger values for equal security. Modular exponentiation is a simple example. If you follow a few rules for choosing a big integer g and a big prime number p, then G(x):=gx mod p is such a one-way function. While gx alone is monotonic, and thus easy to invert, the mod p operation (take the remainder after division by p) ensures the result remains uniformly spread over a fixed interval and appears to behave highly randomly. The inverse discrete logarithm problem, of calculating x when given (gx mod p, p, g), becomes computationally infeasible, and we have a one-way function. (In the following, we drop mention of the mod p operation, and just apply it automatically after each arithmetic operation.) The exponentiation operator gx has an important additional property, not affected by the mod operation: (gx)y = (gy)x. While this commutativity is useless to honest designers of DRBGs, it can be invaluable to saboteurs.

Convince your victims that G(si):=gsi and H(si):=hsi are excellent choices for generating random numbers of the highest security:

ueq03.gif

You can claim "provable security based on number-theoretical assumptions," but this is, of course, just a smoke screen. The sole advantage of this construction is that it allows a backdoor. If you can choose g as g:= he, then knowing your secret integer e immediately allows you to convert any output value ri into the next internal state of the DRBG as (ri)e = (hsi)e = (he)si = gsi = Si+1:

ueq04.gif

So, if you contact a server and receive one ri, you can now immediately predict all future rj used to protect the communication with others, and decrypt or impersonate their messages. Job done. And nobody else can do this, because finding e from h and g is computationally infeasible (the aforementioned discrete logarithm problem). Unless, of course, they steal your backdoor by generating their own e' and replacing your g with their g' := he'.

The following article by Checkoway et al. reports on the amazing independent reconstruction of exactly such a backdoor, discovered in the firmware of a VPN router commonly used to secure access to corporate intranets. In 2004, the NSA planted the above DRBG in NIST standard SP 800-90, including a g and h of their choice. The details differ only slightly (elliptic curve operations rather than modular exponentiation, which uses slightly different notation; the top 16 bits of ri discarded, can be guessed via trial and error). The basic idea is identical.

But planting a backdoor in a standard is not enough. You now also have to ensure industry implements it correctly, such that an ri reaches you intact. And that nobody else replaces your g. And that is where this story begins.

Back to Top

Author

Markus G. Kuhn (mgk25@cam.ac.uk) is a Senior Lecturer teaching computer security and cryptography at the University of Cambridge, England.

Back to Top

Footnotes

To view the accompanying paper, visit doi.acm.org/10.1145/3266291


Copyright held by author.
Request permission to (re)publish from the owner/author

The Digital Library is published by the Association for Computing Machinery. Copyright © 2018 ACM, Inc.


 

No entries found