Security and Privacy Research Highlights

On the (In)Security of ElGamal in OpenPGP

By Luca De Feo, Bertram Poettering, and Alessandro Sorniotti

Posted Jun 1 2023

Abstract
1. Introduction
2. Elgamal Encryption
3. Computing Discrete Logarithms
4. Cross-Configuration Attacks
5. Attacks Enabled By Side-Channels
6. Practical Exploitation
7. Conclusion
Acknowledgments
References
Authors
Footnotes

Roughly four decades ago, Taher ElGamal put forward what is today one of the most widely known and best understood public key encryption schemes. ElGamal encryption has been used in many different contexts, chiefly among them by the OpenPGP email encryption standard. Despite its simplicity, or perhaps because of it, in reality there is a large degree of ambiguity on several key aspects of the cipher. Each library in the OpenPGP ecosystem seems to have implemented a slightly different “flavor” of ElGamal encryption. While-taken in isolation-each implementation may be secure, we reveal that in the interoperable world of OpenPGP, unforeseen cross-configuration attacks become possible. Concretely, we propose different such attacks and show their practical efficacy by recovering plaintexts and even secret keys.

1. Introduction

The 1984 ElGamal cryptosystem⁹ is one of the oldest and best-known public key encryption schemes. In the 80s and 90s, it earned wide adoption for being simultaneously efficient and patent-free. Its most prominent use is in OpenPGP,⁶ a standard for consumable and interoperable email security, where it has been the default and most popular encryption option for decades. At the time of writing, still at least one in six registered OpenPGP keys have an ElGamal subkey, with about 1000 new registrations per year.

The ElGamal scheme builds on elegant mathematical structures and can be defined very compactly. This simplicity, together with the opportunity to mature for roughly four decades now, suggests that a crisp specification with clear parameter choices, rules, and algorithms would be present in international standards, in particular in OpenPGP. Surprisingly, this turns out not to be the case: Our research reveals that OpenPGP’s understanding of ElGamal encryption is open to interpretation, with several choices subject to the discretion of the implementer.

In this paper, we consider cross-configuration attacks on OpenPGP. Such attacks emerge when different interpretations (‘configurations’) of the same standard interact insecurely with each other. Toward identifying such conditions for ElGamal encryption, we need to first understand the universe of OpenPGP interpretations that are used in practice. We approach this challenge from various angles: We carefully study RFC4880⁶ which defines OpenPGP, we inspect the source code of a number of relevant OpenPGP-implementing software libraries and we conduct a large-scale examination of millions of keys registered on OpenPGP key servers.

Our results reveal an insecure posture. For instance, we develop a plaintext recovery attack that can be mounted on ciphertexts produced by the ubiquitous GNU Privacy Guard against keys generated following the original ElGamal specification.⁹ The attack is effective against 2048-bit keys, a bit length that is considered practically secure today. We further illustrate how cross-configuration attacks can be combined with known side-channel exploitation techniques like FLUSH+RELOAD²³ or PRIME+PROBE.²⁰ Concretely, using the example of gcrypt—the cryptographic library underlying the GNU Privacy Guard that has already been fixed twice after seminal work^{14, 23} on side-channel attacks—we demonstrate that if standards conform 2048 bit ElGamal key is used to decrypt a cipher-text, then an attacker that is OS- or VM-colocated with the decrypter might be able to fully recover the decryption key. Given that interoperability is the explicit and almost exclusive goal of any standardization effort, and common-place in the OpenPGP world, we conclude that our cross-configuration attack conditions are as realistic as the attack results in awakening.

This manuscript is organized as follows: In Section 2, we survey (a) the meaningful options available when implementing ElGamal encryption, (b) the options adopted by different cryptographic libraries, and (c) the options picked by over 800,000 users in practice (as far as reflected on key server databases). In Section 3, we recall various standard algorithms for solving discrete logarithms. In Section 4, we describe “vanilla” cross-configuration attacks, and in Section 5, we describe those combined with side-channel attacks. In Section 6, we conduct end-to-end exploits and describe how we bring in the required side-channel information. We conclude in Section 7.

1.1. Related work

Since ElGamal encryption was first proposed, a number of research results revealed security issues, both in implementations (e.g., CVE-2018-6829, CVE-2018-6594) and at the specification level. For instance, Bone et al.⁵ identified security conditions if ElGamal encryption is used in textbook form. However, as OpenPGP employs ElGamal exclusively for key transport, the attacks do not seem to apply in practice. To the best of our knowledge, our work is the first to challenge the security of ElGamal encryption in the specific way OpenPGP builds and relies on it.

Protecting modular exponentiation from side-channel attacks has been the subject of research for years.^2,10,11,15 Specifically, cache-based side channels have been explored extensively, using various techniques such as EVICT+TIME,⁴ PRIME+PROBE,²⁰ or FLUSH+RELOAD.^{3, 23} Yarom and Falkner²³ in particular target the gcrypt implementation of modular exponentiation which at the time was using square-and-multiply. Prompted by that work, the implementation was significantly revised. The changes are however not sufficient to protect against the attack we present in Section 5. Most cache-based side-channel attacks target L1 or L2. However, Liu et al.¹⁴ show that exploitation based on last-level cache is possible for attacks targeting both data and instructions. The work also targets the gcrypt implementation of modular exponentiation and the library has also been fixed to avoid leaking table accesses. Once again, however, this is not sufficient to prevent the attack described in Section 5.

2. Elgamal Encryption

After providing an abstract formulation of ElGamal encryption, we present common ways to instantiate the scheme in practice.

Let G be a group and g ∈ G a generator. To create a key pair (sk, pk), pick a random integer x, compute the element X = g^x, and output (sk, pk) := (x, X). Given pk, to encrypt a message M, pick an ephemeral random integer y, compute the elements Y = g^y and Z = X^y = g^xy, and output C = (C₁, C₂) := (Y, M · Z) as the ciphertext. Given sk, to decrypt C, first recover element Z from C₁ as per Z = Y^x = g^yx and then use C₂, Z to recover M = C₂/Z.

To instantiate the scheme, the following details have to be fixed: Which group G shall be used? How is generator g chosen? Shall g generate the full group G or just a subgroup? From which sets and according to which distributions are exponents x, y picked? Multiple configurations for the parameters are possible and promise to lead to correct and secure public key encryption instances. Below we describe four such configurations that have in common that G is a “prime field group,” that is, the multiplicative group ℤ^*_p = (ℤ / pℤ)^* of a field ℤ/pℤ where p is a large prime number (also referred to as the modulus). In such cases the order of G is given by ord(G) = p − 1 and the order of any subgroup G’ ⊆ G is an integer divisor of p − 1.

Configuration A: “The original”. The original proposal⁹ demands choosing g such that it generates p − 1 elements, that is, the full group. Exponents x, y are picked uniformly from [1 … p − 1]. The only condition on p is that p − 1 has at least one large prime factor [This is to counter the Pohlig-Hellman attack on Discrete Logarithm Problem (DLP).]

If Configuration A is used, any element in G could become a public key. However, some of these public keys would have a lower multiplicative order than others, that is, generate a smaller subgroup, and thus promise less security. (E.g., in the extreme case of X = 1, the encryption operation does effectively nothing.) This can be remedied by restricting the ElGamal group operations to a prime order subgroup G’ ⊊ G. Indeed, if G’ = 〈g〉 has prime order, then all its elements necessarily have the same order (with the one exception of the neutral element which is easily tested for). Let hence p − 1 = q₀ … q_n be the prime factor decomposition of ord(G). As p is a large prime number and hence odd, we know that one of these prime factors is 2, and we hence w.l.o.g. write p − 1 = 2q₁ … q_n. Now, for any prime q = q_i in this list of factors there exists a subgroup G’ ⊊ G of order q. The idea of using such a group G’ in cryptography goes back to Schnorr.¹⁸

Configuration B: “ElGamal over Schnorr groups”. Pick a large prime group order q and a prime p such that q | (p − 1), choose a generator γ with 〈γ〉 = G and let g = γ^(p−1)/q and G’ = 〈g〉. Note that this implies ord(G’) = q. Pick exponents x, y in [1 … q − 1].

This setup satisfies the Pohlig–Hellman condition and ensures that all non-trivial elements of G’ have the same order. Further, if q << p, as exponents x, y are now picked from [1 … q − 1] instead of [1 … p − 1], implementations of Configuration B are very efficient.

While Configuration B removes the risk of accidentally relying on small order elements, an adversary can still exploit the existence of small subgroups in an active attack: Observe that for any t with t | (p − 1) there exists an element α of order t, that is, with α^t = 1 and 1 ∉ {α, α², …, α^t−1}. Fix such a pair (t, α), let C = (C₁, C₂) be an honestly generated ciphertext as above, and assume an adversary intercepts C and re-injects it as , for h ∈ [1 … t]. When processing C’, the decrypter obtains . Note that M’ ≠ M iff α^h-x = 1 iff h ≡ x (mod t); otherwise M’ ≠ M. That is, by testing whether the injected ciphertext C’ properly decrypts, the adversary can confirm or refute a guess on x (mod t). Recovering x (mod t₁), x (mod t₂), … for carefully picked t₁, t₂, … allows recovering the full exponent x via the Chinese Remainder Theorem. As for each t−value the adversary has to request up to t decryptions, small t-values are a precondition for the attack to be effective. So is the existence of sufficiently many suitable t-values. Hence, one countermeasure against the attack is to ensure that the number of distinct divisors t of p − 1 is very small, and another is to arrange that all admissible t-values are very large. Configurations C and D formalize these ideas and can be traced back to Pollard¹⁷ and Lim and Lee.¹³

Configuration C: “ElGamal over safe primes”. The number of subgroups is minimized: Pick a large prime group order q and a prime modulus p such that p − 1 = 2q, choose a generator γ with 〈γ〉 = G and let g = γ^(p−1)/q and G’ = 〈g〉. Pick exponents x, y in [1 … q − 1]. Note that p ≈ q. Choosing g = 4 = (±2)² is always feasible, leading to smaller parameters.

Configuration D: “ElGamal over Lim-Lee primes”. All non-trivial subgroups are large: Pick a large prime modulus p such that for the prime factorization p − 1 = 2q₁ … q_n all q_i have a similar, large bit length. (Breaking DLP is then hard for all group orders q_i.) Choose a generator γ with 〈γ〉 = G and let g = γ^(p−1)/q and G’ = 〈g〉. Pick exponents x, y in [1 … q − 1].

As exponents in Configuration C became large again, some implementations work with “short exponents” for improved performance. Configuration D maintains good performance without such tricks. However, its parameters are larger. When building a secure system from scratch, it seems advisable to use one of these two options.

2.1. ElGamal in OpenPGP

The standard. A widely-deployed cryptography standard that suggests using, and mandates implementing, ElGamal encryption is OpenPGP. While its first version was put forward in 1998, the latest official version appeared in 2007 as RFC4880.⁶ We observe that the RFC refers to external documents for the specification of the ElGamal cryptosystem. Precisely, Section 9.1 provides two references that both ultimately resolve to ElGamal’s original paper,⁹ that is, our “Configuration A”. We note that Sections 5.1 and 5.5, which define how ElGamal ciphertexts, public keys, and secret keys are to be represented as binary strings, add no further conditions on the generation and use of ElGamal keys and ciphertexts.

OpenPGP libraries. We inspected how OpenPGP libraries interpret the standard. Here are our findings on three popular ones:

Go Go does not provide code to generate ElGamal keys; it only provides algorithms to encrypt and decrypt. The ephemeral exponent y used in encryption is chosen from [0 … p − 1]. This fully conforms with RFC4880.
Crypto++ The Crypto++ library follows Configuration C by generating a random safe prime and choosing for the generator the smallest quadratic residue. It though deviates from Configuration C by picking both x and y from a short interval of size roughly twice the security parameter. (See the full version^{7, 8} for details.) As the group generator is not primitive, this setting conflicts with RFC4880.
gcrypt The gcrypt library generates a Lim-Lee modulus like in Configuration D; the minimum size of the prime factors q_i | (p − 1) is roughly twice the security parameter. (See the full version^{7, 8} for details.) Unlike in Configuration D, the generator is chosen to be the smallest integer generating the full group ℤ^x_p. Both exponents x and y are sampled from short intervals of size roughly q^3/2. Due to the short exponents, this setting conflicts with RFC4880.

ElGamal parameters in the wild. To get a complete picture of the use of ElGamal in OpenPGP, it is not sufficient to look at the prominent open-source libraries. Indeed a large portion of the user base relies on proprietary or exotic implementations, which are impossible to track. To address these, we look at the OpenPGP keys registered on public key servers. We analyze an OpenPGP server dump¹ produced on Jan 15, 2021 subkeys.

An OpenPGP ElGamal public key consists of the triplet (p, g, X). This information alone is not sufficient to ascribe the key to one configuration with certainty, however, partial information can be deduced by attempting to factor p − 1. For example, safe primes are easily recognized by running a primality test on (p − 1)/2, then a quadratic residuosity test reveals whether g generates the prime order subgroup or the full group. For random primes of the sizes we look at, it is in general infeasible to obtain a full factorization of p − 1, however, partial factorizations and residuosity tests let us at least formulate credible hypotheses on the key generation process.

To classify public keys, we conducted trial division on p − 1 with primes up to 2²⁵, then repeatedly applied the elliptic curve factorization method (ECM),¹² until we felt guilty for the carbon emissions. Then we applied n-th residuosity tests to g for the factors we found. We did not attempt to gain information on the exponent x that defines X = g^x. In our findings, 69.4% of the keys use safe primes, 25.3% use moduli that were very likely generated using the Lim–Lee method, and 5.0% use what we call “quasi-safe primes”, that is, primes of the form p = Q · q + 1 where q is an unusually large prime (0.988 times the size of p on average) and Q > 2. Finally, there are only 2158 moduli (less than 0.4%) for which we found non-trivial factors, but we were not able to finish the factorization, indicating that they were either chosen at random or like in Configuration B. These would almost be irrelevant if it was not for the attack we describe in Section 4.

3. Computing Discrete Logarithms

In the next sections, we will need to solve discrete logarithms given partial knowledge of the exponent. We review here the necessary algorithms. In what follows, let g be a group generator of order N, and let X = g^x be the element of which we seek the discrete logarithm.

Pollard’s Rho algorithm¹⁷ is the most efficient generic algorithm to compute discrete logarithms. On average, it performs group operations and uses a constant amount of memory. Pollard¹⁷ also introduced the lesser-known Lambda method, which performs better when it is known that x < B << N, requiring only group operations and O(log(B)) memory (Section 5.1 of van Oorschot and Wiener²²).

When N = LM, we can compute x mod L by solving a discrete logarithm in the group of order L generated by g^M. The Pohlig–Hellman algorithm¹⁶ applies this to all prime factors of N, and then recovers x via the Chinese Remainder Theorem (CRT).

We will combine all of these techniques in the case where N = q₀ ··· q_n, and x < B. Assume q₀ < ··· < q_n, and let Q = q₀ ··· q_n−1. We first compute x mod q_i for 0 ≤ i < n using Pollard’s Rho, then use the CRT to compute w := x mod Q. Thus g^x = g^zQ+w for some unknown z < ⌈B/Q⌉. Finally, we recover z as the discrete logarithm of g^y/g^w to base g^Q, using Pollard’s Lambda. The total cost is time and O(log(B/Q)) storage. We stress that this strategy is only better than Pohlig-Hellman when B << N. It was proposed by van Oorschot and Wiener to reduce the security of variants of Configuration A that use short exponents for public keys.²¹ We will use it, instead, to recover ephemeral secrets in cross-configuration scenarios described in Section 4.

In Section 5, we will need to solve discrete logarithm instances where some non-adjacent bits of x are known. Neither the Rho nor the Lambda method can take advantage of this information, but the simpler baby-step/giant-step (BSGS) method¹⁹ can. If x has n unknown bits, BSGS performs 1.5 · 2^n/2 group operations on average, and stores 2^n/2 hash table entries. A linear time/memory trade-off is possible in BSGS.

However, as n becomes larger, BSGS has two important drawbacks: it uses unrealistically large amounts of memory, and it parallelizes poorly. A better alternative is van Oorschot and Wiener’s (vOW) parallel collision search applied to meet-in-the-middle algorithms (Section 5.3 of van Oorschot and Wiener²²), which is much more memory efficient, and promises a linear parallel speed-up. Based on their analysis, vOW is expected to require 7 · 2^3n/4-m/2-1n group operations, where 2^m is the amount of storage available, counted as a number of hash table entries, and subject to the constraint m ≤ n/2.

4. Cross-Configuration Attacks

The disagreements on the interpretation of the OpenPGP standard may raise doubts about the interoperability between the libraries. For instance, in an imaginary setting where the Crypto++ code is used to generate a key pair, the Go code is used to encrypt a message to the public key, and the gcrypt code is used to decrypt the ciphertext, it has to be asked whether confidentiality is maintained. While in a basic scenario these three libraries can, to the best of our knowledge, in fact, interoperate securely, we shall now see that some choices made by Crypto++ and gcrypt prove to be fatal in a broader context.

As van Oorschot and Wiener²¹ have already observed: if (1) p − 1 contains enough small factors, and (2) g generates the full group (e.g., in Configuration A), then using short exponents in the public key X = g^x can lead to key recovery, as described in Section 3. As we have seen, Crypto++ uses safe primes, so it is not at risk. Although gcrypt generates p − 1 with many distinct factors, none of these is small, and it is thus also safe.

However, both Crypto++ and gcrypt also use short exponents in the ephemeral key Y = g^y. But the generator g is part of a public key that may have been generated by a different library, possibly one adhering to Configuration A. Our analysis of registered keys shows that such public keys, albeit rare, do exist. This immediately leads to a message recovery attack: given a ciphertext (g^y, M·g^xy), the attacker finds y, uses the public key to compute g^xy, and recovers the cleartext M.

To recap, whenever: (1) the public key of the receiver defines a generator g of a group containing “enough” small-order subgroups, and (2) the sender uses short exponents in the ephemeral key, an attacker who intercepts messages can decrypt any communication from sender to receiver. The exact cost of the attack depends on the number and size of the small subgroups, which we analyze next.

4.1. Practical exploitation

Like in Section 3, let N = q₀ ··· q_n =: Qq_n be the group order of a public key, with q_n not necessarily prime. Assume that ephemeral exponents for encryption are sampled from a short interval [0 … B[, with B << N. As we saw, the cost of recovering the exponent depends on two factors: the largest divisor q_n−1 of Q, and B/Q (note that, while Q and the q_i‘s depend on the receiver’s public key, the bound B is set by the sender).

To quantify the threat, we look at these parameters for those groups in the OpenPGP key dump that contain small factors. None of the “quasi-safe” moduli are seriously at risk, since Q is too small. From the remaining 2158 keys, we narrow down the selection to those that had 2048 bits modulus, were not created before 2016, and were not expired nor revoked, leaving us with 2071 keys. In Figure 1, we plot the (normalized) cumulative counting function

Figure 1. Normalized cumulative distribution function Φ(λ,ρ) of group orders N that contain a 2^ρ-smooth factor Q of at least λ bits.

which counts, among the 2071 keys, how many are exposed to message recovery with a Rho effort at most and a Lambda effort at most . Note that the factors of a single group order can be arranged in more than one way, and thus appear in more than one bucket.

From ϕ(λ,ρ), we obtain the number of keys for which message recovery is possible with a given effort with respect to a given encrypting library. For example, for 2048-bit moduli gcrypt implements the sampling of y such that y < 2³⁴⁴: limiting the Lambda and Rho efforts to roughly 2⁴⁰ modular multiplications, a computation well within reach of a casual attacker, we find that there are at least ϕ(344-80, 80) = 15 vulnerable keys. Even more concerning is the case of Crypto++, which samples exponents in [1 … 2²²⁶]: in this case, the number of weak keys goes up to ϕ(226-80, 80) ≥ 231. It is plausible that a state-level attacker could target as many as ϕ(344-140, 140) ≥ 83 (gcrypt) and ϕ(226-140, 140) ≥ 895 (Crypto++) keys.

To confirm the vulnerability we used GPG to encrypt a message to the only key contributing to ϕ(344-56, 59), expecting to complete the attack using roughly 2³⁰ modular multiplications. We were able to recover the plaintext in less than 2.5 hours on a single Intel E5-2640 core clocked at 2.40GHz.

5. Attacks Enabled By Side-Channels

The attack conditions exploited in Section 4 did not depend on implementation flaws but were solely implied by different implementers coming up with different conceptual interpretations of ElGamal encryption. The current section considers cross-configuration attacks that are also based on implementation weaknesses. We focus specifically on runtime leakage of the modular exponentiation algorithm of gcrypt, which represents a crucial component of its ElGamal implementation. Specifically, an exponentiation algorithm (EA) computes R = B^x from B and x, where base B is an element of a cyclic group and exponent x is a non-negative integer. While the gcrypt implementation took explicit measures to prevent side-channel leakage (in particular, to protect exponent x), by exploiting a condition that the authors allegedly overlooked we present a key recovery attack that significantly reduces the security margin of gcrypt. In isolation, that is, not in the cross-configuration setting, the attack is only practically exploitable against gcrypt-generated keys with moduli up to 1024 bits. However, we show that the vulnerability is more worrisome in an interoperability context where gcrypt decryption is used in combination with a Crypto++ key: in this case, we experimentally confirm a practical key recovery attack against a non-negligible fraction of Crypto++ keys with 2048 bits modulus. In the full version,^7,8 we provide similar analyses of the EAs of the Go and Crypto++ libraries.

The gcrypt library implements its EA via the sliding-window method, which is derived from the basic Square-and-Multiply algorithm. The algorithm depends on a parameter w (for “width” or “window length”) that is instantiated such that 2 ≤ w ≤ 5. The first step of the EA is independent of exponent x and tabulates the values B¹, B³, …, B^2w−1. The second step is independent of base B and regroups the exponent bits into an odd-digit representation (ODR). Precisely, an ODR of a non-negative integer x is a sequence Γ = (γ_l, …, γ₀) such that

The exponent x is first converted into ODR form that has leading coefficient γ_l { = 1,^a the ODR is then encoded into a sequence (e₀, c₁, e₁, c₂, e₂, …, c_L, e_L, c_L+1) such that its odd-index subsequence (e₀, e₁, …, e_L) coincides with the support (γ_j)_j:γj≠0 of Γ (i.e., the set of non-zero elements; note here we have e₀ = γ_l = 1, and for all e_i we have e_i ∈ {1, 3, …, 2^w−1}), and the values c₁, c₂, …, c_L+1 correspond with the lengths of the vanishing subsequences of Γ (i.e., the contiguous runs of zero elements). Precisely, the encoding is such that

The third step of the EA is a computation-intensive online phase in which the results of the first two steps are combined.

We provide a pseudo-code version of gcrypt‘s EA in Figure 2. The function f_w invoked in line 2 (and specified in the full version^7,8) computes the exponent encoding (e₀, c₁, e₁, c₂, e₂, …, c_L, e_L, c_L+1) as follows: it sets e₀ = 1, then alternates between (a) advancing through the bit sequence of x (from most to least significant bit) until it encounters a bit set to 1, and (b) greedily outputting the largest possible pair (c_i, e_i) subject to e_i ∈ {1, 3, …, 2^w−1}. Also lines 5–8 might require further explanation: The code is functionally equivalent with ‘For j ← 1 to c_i: R ← R · R’ followed by ‘R ← R · T[e_i]’, but the implementation reduces side-channel leakage by hiding for each multiplication whether the factor W is R or T[e_i], and, in the latter case, which index e_i is used for the table look-up.^b

Figure 2. Core of exponentiation routine of gcrypt.

Despite its built-in protection measures, we identify a side-channel condition in gcrypt‘s EA that can lead to full exponent recovery. The root of the problem is that the algorithm implements sliding-window exponentiation. Intuitively, this method covers the digits of the bit-representation of the exponent x with a sequence of w−wide non-overlapping windows such that every 1-bit of the exponent is covered by a window and conditioned on this the number of windows is minimized. In Equation (1), value L corresponds with the number of windows used for this, any coefficient c̄_i corresponds with the position of the i-th window, and any coefficient e_i encodes the w exponent bits that the i-th window covers. In the EA implementation, processing a window corresponds with a multiplication R ← R · T[e_i] while bridging a gap between two windows corresponds with a sequence of squaring operations R ← R·R. These operations are jointly implemented in lines 7,8, and each iteration of the loop of line 4 processes one gap-window pair.

The approach of our side-channel attack is to closely monitor the execution of line 4: The number of iterations of the loop during one exponentiation immediately reveals the encoding length L, and the time taken for the i-th iteration is, by line 6, linear in c_i + 1 and thus leaks, one by one, the coefficients c₁, …, c_L.^c Finally, we can recover the coefficient c_L+1 by similarly monitoring the loop of line 10. That is, if (e₀, c₁, e₁, …, c_L, e_L, c_L+1) = f_w(x) is the encoding of exponent x then our measurements reveal all the c_i components while, for now, the e_i components remain hidden.

Before assessing the options for amplifying the extracted information toward recovering the full exponent, we make a final observation on gcrypt‘s EA. Recall that line 7 tries to hide both the table index e_i and whether the multiplication in line 8 is with T[e_i] or R. However, as our method recovers the coefficients c_i straightaway, and exclusively the last iteration of the loop of line 6 will perform multiplication with W = T[e_i] (rather than with W = R), line 7 de facto just hides the e_i coefficients. Thus, replacing lines 6–8 with the instructions “For j ← 1 to c_i: R ← R·R” followed by “Securely R ← R · T[e_i]” would result in code that is equivalent from a security perspective yet simpler and more efficient.^d We, thus, conclude that the authors of gcrypt were likely not aware of the fact that the coefficients c₁, …, c_L+1 leak so easily.

Now that we retrieve copies of the coefficients c₁, …, c_L+1, the next subsections are dedicated to assessing the value of this information toward an attack that recovers the full exponent.

5.1. Modeling the leakage

As a warm-up, observe that when w = 1 the coefficients c_i leak the whole exponent, indeed all e_i are necessarily equal to 1 in this case. For larger values of w, however, a search over the possible values of e_i is necessary. Our goal here is to quantify this partial leakage.

Letting n+1 = ⌈log₂(x+1)⌉ be the bit length of x, we start by observing that n = Σc_i and is thus fully known. The sequence of c_i‘s indicates the position of each e_i in the binary writing of x, and from the description of f_w we know that e_i is odd and e_i < min(2^w, 2^c_i), leaving us with min(w, c_i) − 1 unknown bits in e_i. Note that there is no e_L+1 associated to c_L+1, which indeed leaks the number of trailing zeros of x. Thus, the total number of unknown bits of x, which we shall call the entropy of x, is exactly

Given h, h^x and the leakage for x, the computational effort for finding x grows exponentially with H_w(x). The value H_w(x) depends on x, on the window size w, and on the algorithm f_w used for the encoding. We are thus interested in estimating statistics on H_w(x) for secrets x of n bits.

To this end, we model the encoding function f_w as a Markov process: Model x as a continuous stream of independent and uniformly distributed bits, for any fixed w the leakage of the encoding f_w is described by a finite state machine (see the full version^7,8 for a detailed description), which we then convert to a Markov chain.

Using the Markov chain central limit theorem, we compute that as n → ∞ the entropy H_w(x) tends to a normal variate of mean nμ and variance nσ², where μ and σ² are the “per bit” mean and variance tabulated in Table 1. For example, for w = 2, the average per-bit entropy is μ = 7/24, meaning that, given a uniformly sampled n+1-bits exponent and its leakage c₁, …, c_L+1, on average 7n/24 bits remain to be found.

Table 1. Per bit statistics of the entropy H_w(x) as n → ∞ for different values of the window width w.

Of course, these asymptotic approximations only give a rough estimate of the actual probability distribution of H_w(x) for a given bit length n, however, in our experiments we found them to be quite accurate in practice even for small values of n. Using these approximations, we will give rather precise estimates of the computational effort needed to recover random secret keys generated by gcrypt and other libraries.

5.2. Key recovery

To assess the practical impact of the leakage of sliding window exponentiation exposed so far we need to look into the exact parameters used by the various libraries. Recall that gcrypt uses short secret exponents. The window size used for exponentiation goes from 1 to 5, depending on the bit size of the exponent. Table 2 summarizes the parameters adopted by gcrypt for some common modulus sizes.

Table 2. Group (|q|) and secret exponent (|x|) bit sizes, and window size (w) used by gcrypt for each modulus size (|p|). E(H_w(x)) is the mean leakage entropy estimated using Table 1. “Pollard” indicates the (base 2 log of the) expected running time of Pollard’s Rho algorithm in a group of size q, as a number of modular multiplications. “vOW” indicates the expected running time of van Oorschot and Wiener’s algorithm using a table of 2⁶⁰ entries.

Based on gcrypt‘s parameters, Table 2 also reports on the mean leakage entropy H_w(x) as deduced from Table 1. We can thus compare the expected difficulty of recovering the secret exponent via the leakage against the target difficulty of solving discrete logarithms directly. gcrypt‘s moduli are parameterized so that the best algorithm for computing discrete logarithms is, indifferently, index calculus in ℤ_p, or Pollard’s Rho in the subgroup of largest prime order q. For simplicity, we use the cost of Pollard’s Rho, namely , as our reference difficulty.^e

To exploit the leakage, we resort to vOW parallel collision search (see Section 3), which costs 7.2^{3H_w(x)/4-m/2-1}H_w(x) modular multiplications, where 2^m is the amount of RAM available, counted as a number of hash table entries (e.g., for |p| = 2048 an entry occupies slightly more than 256 bytes). Both Pollard’s Rho and vOW can be parallelized with a linear speed-up (Section 5.1–3 of van Oorschot and Wiener²²), justifying the comparison.

The last two columns of Table 2 report the base 2 logarithm of the expected efforts of Pollard’s Rho, and of vOW assuming a hash table with at most 2⁶⁰ entries (note that even this “little” memory is quite unrealistic). Comparing the columns, the leakage of windowed exponentiation does not appear to be a serious threat for the average gcrypt secret key. However this assessment is deceiving on two accounts: (a) the standard deviation of H_w(x) is rather large, and thus some keys will be much easier to break than others; and (b) gcrypt is assumed to be secure even when it operates on secret keys generated by a different library, as long as these are valid OpenPGP keys. Some libraries, such as Crypto++, generate secret exponents with as few as |q| bits, their keys are thus easier targets.

Taking for reference the most commonly used modulus size |p| = 2048, Crypto++ exponents are between 1 and 226 bits long, and the window size used by gcrypt is at most w = 3, leading to an average entropy of 98.0 bits. Worse still, more than one in 10,000 keys is expected to have at most 80 bits of leakage entropy: for such a small search space, vOWis expected to only require 2⁵⁰ modular multiplications (roughly 2³⁵ Gcycles in software) for a hash table of 2³⁵ entries (roughly 4 TiB). We thus conclude that attacking a non-negligible proportion of Crypto++-generated public keys within gcrypt‘s decryption routine is well within reach of a moderately resourceful adversary.^f

5.3. Proof of concept

To confirm the reality of the threat, without going as far as spending several thousands of dollars, we implement a simple parallelized BSGS using GMP 6.1.2, and test it on a node equipped with 20 Xeon E5-2640 cores clocked at 2.40GHz and 64GiB of RAM. Due to memory contention, the parallel speed-up is sublinear, nevertheless, we observe a speed-up slightly above 10x using all 20 cores, thus justifying our preference for BSGS over the more complex vOW algorithm which is marred by larger constants. Our implementation uses hash table entries of 16 bytes, independently of the modulus size; the node can thus accommodate for tables with as many as 2³² entries, ensuring the running time scales like the square root of the entropy, up to 64 bits of entropy. Figure 3 plots the running time of BSGS for increasing values of H_w(x), and super-imposes the cumulative distribution function of H_w(x) for Crypto++ and gcrypt, indicating what percentage of keys is broken in a given time. Since BSGS is deterministic, we use for the benchmarks the exponent that is tested by BSGS last, thus the expected running time for an average exponent of given entropy is half the time reported in Figure 3.

Figure 3. Running time of BSGS for increasing values of the entropy H_w(x), on our node equipped with 20 cores and 64GiB of RAM. The dashed line indicates extrapolated running times. Also represented the cumulative distribution functions of the normal distributions of parameters (225 · 7/16, 225 · 0.098632) and (339 · 341/640, 339 · 0.126731), approximately corresponding to the entropy distributions of Crypto++ and gcrypt for a modulus of 2,048 bits.

As a final confirmation, we instrument the Crypto++ code to generate random secret exponents in a tight loop, and output them along with their leakage entropy. Over 2²⁸ samples, we obtained a 226 bits exponent with 62 bits of leakage entropy. Our probing strategy described in Section 6 produces the expected leakage pattern, from which our BSGS code recovers the exponent in approximately 30 min using 20 cores.

6. Practical Exploitation

In this section, we prototype the attacks described in Section 5 to show that practical exploitation is possible. The attacks are mounted on a Core i7-8650U CPU with Ubuntu 18.04.5. We demonstrate how FLUSH+RELOAD may be employed to extract leakage from the sliding window modular exponentiation implementation of gcrypt and use that to obtain full key recovery. The exploitation of gcrypt presents interesting aspects for two reasons: (i) the library was patched to protect its modular exponentiation routine against cache-based side channels attacks highlighted by Yarom and Falkner²³ and by Liu et al.¹⁴; and (ii) the attack can be conducted on commodity hardware specifically owing to the ElGamal interoperability issues highlighted in Section 2.

6.1. Threat model

We consider a victim process that uses a private key to decrypt ElGamal ciphertexts. We assume the attacker has knowledge of the victim program and of the specific decryption algorithm that is used. We also assume that the attacker may cause the victim process to execute, for example, because the victim uses a PGP-enabled mail client that automatically decrypts emails upon reception. Consequently, we assume that the attacker is able to synchronize side channel measurements with the victim execution. No physical access is required to perform the attacks. Given that we target cache-side channels, the attacker may either be local, or it may be running on a different, colocated, VM. The former may target any cache level, the latter may only target LLC (unless the VM is hyperthread-colocated). Wlog we conduct the exploitation against the L1 cache: as shown by Liu et al.,¹⁴ the same exploitation may be achieved against higher cache levels.

6.2. Environment

We target the function _gcry_mpi_powm of the latest version gcrypt shipped by Ubuntu 18.04.5 LTS. The function implements the approach described in Figure 2. The implementation takes a few precautions to avoid leaking sensitive data and instruction cache accesses. In particular, it hides (i) whether the operation at line 8 is a modular multiplication or squaring; and (ii) whether, at line 7, W is initialized with R or with an entry from table T (and in that case, it also hides which position e_i of the table is loaded). This is accomplished by way of a conditional memcpy-like operation (mpi_set_cond) that—depending on whether a flag argument is one or zero—copies a source operand into a destination operand, or leaves the destination operand unchanged. Masking is employed to ensure that an attacker cannot learn the nature of the operation by observing the control flow or changes to data or instruction cache. Furthermore, the flag is set using branchless operations.

Despite these precautions, considerable leakage can still be obtained by an attacker who can observe control flow-dependent cache perturbations: in particular, the full set of values c₁, …, c_L can be recovered by counting the number of iterations of the for the loop at line 6 for each of the iterations of the loop at line 4. The attacker may obtain this information by monitoring the state of the instruction cache line that contains code that is executed as part of an iteration of the outer loop and not of the inner loop. In practice, this can be achieved with the FLUSH+RELOAD side channel applied on a line of the instruction cache that is executed once per outer loop before the start of the inner one. The probe reveals when such a cache line is executed, and the time between two probes would depend on the execution time of the inner for loop. Crucially, the latter depends on the set of values c₁, …, c_L, which would thus be leaked. This strategy is applicable to the _gcry_mpi_powm function: the implementation inlines the logic to determine the c_i values between the start of the first and the start of the second loop and so it fills more than one cache line; the invocation of the multiplication in the inner loop makes one iteration sufficiently long and constant-time so that inter-probe timings measurably encode the number of iterations of the inner loop.

6.3. Evaluation

To validate that the assumptions are correct, we prototype an end-to-end attack in the SMT-colocated threat model. The prototype uses two SMT-colocated processes, attacker and victim. The victim process may be triggered by the attacker to perform ElGamal decryption. We use Crypto++ to generate a 2048-bit ElGamal instance. The attacker process uses L1i FLUSH+RELOAD on the virtual address of the memory-mapped libgcrypt.so a shared object that contains code that is executed by the outer loop before the inner loop begins. To collect side channel data, the attacker partitions time into N slots using the rdtsc instruction: at the beginning of each slot, the target line is loaded into the cache and the access time is measured; after that, the cache line is flushed and finally, the remaining (if any) time of the slot is spent in a busy loop. As we shall discuss later, we do not have to change any of the settings that might influence the running time of the operation (e.g., c/p states or Turbo Boost). The attacker thus collects N timing samples for load instructions from the target cache line. We establish a threshold for the target system to determine whether the load is likely to have been served from the cache (cache hit). We then collect 10,000 measurements by repeatedly triggering the victim. We keep track of each slot across all runs with a set of per-slot counters, all initialized to zero. For each run, whenever the sample of a slot indicates a hit based on the threshold, we increment the value of the corresponding counter.

Figure 4a plots the value of the counter of each slot for all runs. While patterns are discernible, the leakage cannot be obtained yet without further data processing. We employ a clustering strategy based on execution time, under the hypothesis that Figure 4a contains samples that are generated at different clock frequencies. Influencing factors are likely to include p-states, c-states, frequency scaling, and the power state of certain execution units. Figure 4b confirms the hypothesis: it plots a histogram of the running times for the operation. The distribution of running times has a long tail, but most of the mass is concentrated in the three peaks. We use this information to cluster samples and show the results of the clustering in Figures 4c and 4d, showing probes whose running time falls in the interval covered by the highest, and second highest peak, respectively. The peak intervals in Figures 4c and 4d accurately encode the c_i leakage for the key used in the exponentiation. Figures 4c and 4d are identical modulo a scaling factor which depends on the difference in running time. With the leakage thus obtained, we refer back to Section 5 for a detailed description of how the secret exponent is fully recovered.

Figure 4. FLUSH+RELOAD attack using 10,000 decryption operations with gcrypt‘s modular exponentiation function. a): number of samples with an icache hit on the target cache line across all runs. b): histogram of the running times (tsc count) for the operation. c) as a) but only considering samples whose total running time lies in the [2.25 · 10⁶, 2.31 · 10⁶] interval. d) as a) but only considering samples whose total running time lies in the [2.2 · 10⁶, 2.25 · 10⁶] interval.

7. Conclusion

We analyze one of the oldest, best understood and most historically used asymmetric encryption schemes, ElGamal.⁹ We reveal that despite its popularity and longevity, when we speak about ElGamal we are referring to several different flavors of it, with key choices being left at the discretion of vendors and implementers. We show how some of these choices create interoperability challenges that lead to insecurity. We propose two “cross-configuration” attacks that are attributable to different, and—from a security standpoint—incompatible, configurations that operate together in the interconnected OpenPGP ecosystem. We believe our work can improve the security of the OpenPGP community and influence the new revision of the standard that is being drafted at the time of writing.

Acknowledgments

The authors thank Werner Koch, Anil Kurmus, Yutaka Niibe, and Filippo Valsorda for the discussions and feedback, and the CCS’21 reviewers for their comments that helped us improve the paper.

Submit an Article to CACM

CACM welcomes unsolicited submissions on topics of relevance and value to the computing community.

You Just Read

On the (In)Security of ElGamal in OpenPGP

View in the ACM Digital Library

Copyright held by authors/owners. Publication rights licensed to ACM.
Request permission to publish from permissions@acm.org

DOI

10.1145/3592835

June 2023 Issue

Published: June 1, 2023

Vol. 66 No. 6

Pages: 107-115

Table of Contents

Join the Discussion (0)

Become a Member or Sign In to Post a Comment

The Latest from CACM

Explore More

News Apr 18 2024

Keeping AI Out of Elections

Bennie Mols

Artificial Intelligence and Machine Learning

BLOG@CACM Apr 17 2024

Technical Marvels

Herbert Bruderer

Computer History

BLOG@CACM Apr 16 2024

The Value of Data in Embodied Artificial Intelligence

Shaoshan Liu

Artificial Intelligence and Machine Learning

Shape the Future of Computing

ACM encourages its members to take a direct hand in shaping the future of the association. There are more ways than ever to get involved.

Get Involved

Communications of the ACM (CACM) is now a fully Open Access publication.

By opening CACM to the world, we hope to increase engagement among the broader computer science community and encourage non-members to discover the rich resources ACM has to offer.

Learn More

1. Introduction

2. Elgamal Encryption

3. Computing Discrete Logarithms

4. Cross-Configuration Attacks

5. Attacks Enabled By Side-Channels

6. Practical Exploitation

7. Conclusion

Acknowledgments

On the (In)Security of ElGamal in OpenPGP

DOI

June 2023 Issue

Related Reading

Join the Discussion (0)

Become a Member or Sign In to Post a Comment

Shape the Future of Computing

Communications of the ACM (CACM) is now a fully Open Access publication.