Home/Magazine Archive/March 2009 (Vol. 52, No. 3)/Error Correction Up to the Information-Theoretic Limit/Full Text

Research highlights
# Error Correction Up to the Information-Theoretic Limit

Ever since the birth of coding theory almost 60 years ago, researchers have been pursuing the elusive goal of constructing the "best codes," whose encoding introduces the minimum possible redundancy for the level of noise they can correct. In this article, we survey recent progress in *list decoding* that has led to *efficient* error-correction schemes with an *optimal* amount of redundancy, even against *worst-case errors* caused by a potentially malicious channel. To correct a proportion (say 20%) of worst-case errors, these codes only need close to a proportion of redundant symbols. The redundancy cannot possibly be any lower information theoretically. This new method holds the promise of correcting a factor of two more errors compared to the conventional algorithms currently in use in diverse everyday applications.

Coping with corrupt data is an essential part of our modern day lives; even if most of the time we are blissfully unaware of it. When we watch television, the TV has to deal with signals that get distorted during transmission. While talking on our cellphones, the phone has to work with audio signals that get corrupted during transmission through the atmosphere though we definitely are aware of it when the connection is poor. When we surf the Internet, the TCP/IP protocol has to account for packets that get lost or garbled while being routed. When we watch movies on DVDs, the player has to overcome loss of data due to scratches. Even when we buy our groceries, the scanner has to deal with distorted barcodes on packages.

The key ingredient in coping with errors in these and many other applications is an *error-correcting code* or just *code* for brevity. The idea behind codes is conceptually simple: add redundancy to the information so that even if the resulting data gets corrupted, e.g. packets get corrupted during routing or the DVD gets some scratches, the original information can still be recovered.

Ideally, we would like to add a small amount of redundancy and at the same time be able to correct many errors. As one might expect, these are conflicting goals and striking the right balance is where things get interesting. For example, consider the code where every information bit is repeated say a 100 times (this is known as the repetition code). Intuitively, this code should do well. In particular, the following is a natural error-recovery procedure or a *decoding algorithm*: for every consecutive 100 bits of the data, identify whether the majority of the bits is 0 or 1, and output the corresponding bit. Unless we happen to be unlucky, this decoding algorithm can recover from quite a few errors. The downside is that every 100 bits of data contain only *one* bit of information imagine how large a DVD would need to be in order to store a movie with such redundancy. On the other extreme is the parity code, which appends the parity of the information bits at the end of the message. This code uses the minimum amount of redundancy possible but has poor error-recovery capabilities. Indeed, even if just two bits get flipped, it can go undetected. For example, 0001 gets encoded as 00011 under the parity code. If the first two bits get corrupted and we receive 11011, we would misinterpret the original message to be 1101. Imagine Clark Gable saying at the end of your small parity encoded DVD for *Gone with the Wind*, "Frankly, my dear, I don't give a ham!"

To capture this inherent tension between the redundancy and the error tolerance of codes, let us define codes and some key parameters formally. A code *C* is given by an *encoding* map of the form *C*: ^{k} ^{n} (for integers *k < n*) which encodes a sequence of *k* symbols from into a larger sequence of *n* symbols. Given a *message m* ^{k}, *C*(*m*) is known as the corresponding *codeword.* The parameters *k, n*, and are called the *dimension, block length*, and *alphabet* of *C*, respectively. We will often use the ratio *R = k/n*, which is called the *rate* of *C*. Note that *R* exactly captures the amount of information contained per bit of a codeword. *The Hamming distance* between two strings in ^{n} is the number of coordinates where they differ. The *minimum distance* of a code is defined to be the smallest Hamming distance between two distinct codewords.

Thus, our question of interest can be now re-stated as follows: given a code *C* of rate *R*, what is the maximum fraction of errors (which we will henceforth refer to as ) that can be tolerated by *C*? Now as every codeword has *k* symbols of information, it is intuitive that in the worst case at least *k* symbols of a codeword should be uncorrupted to have any hope of recovering the original information. In other words, we can only have (*n - k*)/*n* = 1 - *R*, irrespective of the computational power of the decoder.

The main focus of this article is the following question:

Can we construct a code

Cof rateRthat can be efficiently decoded from close to a 1 -Rfraction of errors?

Quite surprisingly, we will show the answer to be yes. Thus, the above simple information-theoretic limit can in fact be approached. In particular, for small rates, we can recover from situations where almost all of the data can be corrupted. For example, we will be able to recover even if Clark Gable were to spout "alhfksa, hy meap xH don'z hive b hayn!" There are in fact two parts to the above question. First, the code should be such that the identity of the message is uniquely (or near uniquely) pinned down based on the noisy version of its encoding. Second, we need an *efficient* procedure to recover the message based on the corrupted codeword, with runtime bounded by a hopefully small polynomial in the block length *n*. Brute-force approaches such as searching through all codewords require time exponential in *n*, and are computationally prohibitive. The main message of this article is that a variant of the widely deployed Reed-Solomon codes can in fact be error-corrected *efficiently*, even as the fraction of errors approaches the absolute 1 - *R* limit.

We stress that in the above question, the noise model is a *worst-case* one, where the channel is modeled as an adversary/jammer that can corrupt the codeword in an arbitrary manner, subject only to a limit on the total number of errors. No assumptions are made on how the errors are distributed. The worst-case model was put forth in Hamming's influential paper.^{12} An alternate approach, pioneered by Shannon in his seminal work,^{16} is to model the noisy channel as a stochastic process whose behavior is governed by a precisely known probability law. A simple example is the binary symmetric channel (BSC* _{}*) where each bit transmitted gets flipped with a probability , independent of all other bits. For every such channel, Shannon exactly characterized the largest rate at which reliable communication is possible.

We note that obtaining *algorithmic* results is more difficult in worst-case noise model. In fact, traditionally it was believed that codes designed for worst-case noise faced a limit of (1 - *R*)/2 fraction of errors, which is factor 2 off from the information-theoretic limit of a (1 - *R*) fraction of errors. In attempting to correct *e* > (*n* - *k*)/2 errors, we face a problem: there may be more than one codeword within Hamming distance *e* from the received word. In any code mapping *k* symbols to *n* symbols, there must be two codewords at distance (*n* - *k* + 1) or less. If one of these codewords is transmitted and gets distorted by errors to halfway between the two codewords, unambiguous recovery of the original message becomes infeasible. This suggests that beyond a fraction (1 - *R*)/2 of worst-case errors, the original message is unrecoverable. This was indeed the conventional wisdom in coding theory till the 1990s.

A natural strategy, in the event of multiple close-by codewords, would be to allow the decoder to output a list of codewords. This model is called *list decoding*. It was introduced in the late 1950s by Elias^{2} and Wozencraft,^{19} and revived with an algorithmic focus for a cryptographic application by Goldreich and Levin.^{4} Further, it turns out that the bad case above is rather pathological-for typical patterns of *e* errors, for *e* much bigger than (*n* - *k*)/2, the original codeword will be the only codeword within distance *e*. Thus, list decoding in addition to handling the bad error patterns for "unique" decoding, also allows us to uniquely recover the transmitted codeword for *most* error patterns.

It is interesting to note that even though the list decoding problem has a long history in the coding theory world,^{a} a large share of the algorithmic progress in list decoding has happened in the theoretical computer science community. One of the reasons for this happy coincidence is that list decoding has found numerous applications in cryptography and computational complexity theory (e.g., see the discussion on randomness extractors in Section 5).

In particular, in the last decade, the subject of list decoding has witnessed a lot of activity, culminating in algorithms that correct close to the information-theoretically optimal 1 - *R* fraction of errors with rate *R*. The purpose of this article is to discuss this recent body of results which deliver the full promise of codes against worst-case errors. We begin in Section 2 by describing a popular family of codes and a few decoding algorithms for it.

In 1960, Irving Reed and Gustave Solomon described a construction of error-correcting codes, which are called Reed-Solomon codes after them, based on polynomials over finite fields.^{b} Almost 50 years after their invention, Reed-Solomon codes (henceforth, RS codes) remain ubiquitous today in diverse applications ranging from magnetic recording to UPS bar codes to satellite communications. To describe the simple and elegant idea behind RS codes, imagine Alice wishes to communicate a pair of numbers (*a*, *b*) to Bob. We can think of (*a*, *b*) as specifying a line in the plane (with *X*, *Y* axes) with equation *Y* = *aX* + *b*. Clearly, to specify the line, it suffices to communicate just two points on the line. To guard against errors, Alice can oversample this line and send more points to Bob, so that even if a few points are distorted by errors, the collection of points resembles the original line more closely than any other line (Figure 1). Thus, Bob can hope to recover the correct line, and in particular (*a*, *b*).

To encode longer messages consisting of *k* symbols via an RS code, one thinks of these as the coefficients of a polynomial *f*(*X*) of degree *k* - 1 in a natural way, and encodes the message as *n* > *k* points from the curve *Y* - *f*(*X*) = 0. Equivalently, the encoding consists of the values of the polynomial *f(X)* at *n* different points. Formally, if F is a finite field with at least *n* elements, and *S* = (_{1}, _{2}...._{n}) is a sequence of *n distinct* elements, the RS encoding RS_{F, S, n, k} (*m*) of a message *m* = (*m*_{0}, *m*_{1},..., *m*_{k}-1) is given by

where *f(X)= m*_{0} + *m*_{1}*X* + ... + *m*_{k}-1, *X*^{k}-1. We stress here that the choice of *S* is up to the code designerwe will exploit this feature in Section 3.2.

The following basic algebraic fact will be crucial:

A non-zero polynomial

p(X)of degreeDover a fieldFcan have at mostDdistinct roots, i.e., at mostDelementsFcan satisfyp() = 0.

This fact implies that the encodings of two distinct messages *m* and m by RS_{F,S,n,k} must differ in more than *n - k* locations.^{c} The minimum distance of the RS code is thus at least *n - k +* 1. It is in fact equal to *n - k +* 1: e.g., consider encodings of the messages corresponding to the zero polynomial and the polynomial . A minimum distance of *(n - k +* 1) is the best possible for a code of dimension *k*, making RS codes optimal in this regard.

**2.1. Correcting errors in RS codewords**

Why is the above large distance useful? If at most *(n - k)*/2 errors corrupt the evaluations of a polynomial *f*(*X*), then the encoding of *f(X)* remains the best fit of the corrupted data in terms of agreements than the encoding of any other polynomial *g(X)* of degree less than *k*. Thus, one can hope to recover *f(X)* and the correct message even in the presence of *(n - k)*/2 errors. However, it is not immediate how to isolate the errors and recover *f(X)* *efficiently.* We do not know the locations of the errors, and trying all possibilities will need exponential time.

Back in 1960, even before polynomial running time was formalized as the notion underlying efficient algorithms, Peterson^{15} described a polynomial time algorithm to solve the above problem. We now describe the high-level idea behind a different algorithm, due to Welch and Berlekamp,^{18} following the elegant description in Gemmell and Sudan.^{3}

Assume that the encoding *(f( _{1}*),...,

**Error Location via Bivariate Interpolation:** The key is thus a clever method to locate the set *E* of error locations quickly. To motivate this, let us cast the problem geometrically as an equivalent *noisy curve fitting* problem. We are given *n* points (_{i}, y_{i}) *i* = 1,2,..., *n* in the "plane" **F** × **F**. The goal is to find the unique curve with equation *Y f(X) =* 0 where degree(*f*) < *k* that passes through all but *e* *(n k)/2* locations *i*, namely those *in E.* If there was no noise, fitting a curve through all *n* points is easyit is just polynomial interpolation. We know *Y f(X)* passes through *n e* points, so we can get a curve that passes through all the points by fitting vertical lines through the error points along with the curve *Y f(X) =* 0; see Figure 2. Algebraically, if we define

then the curve *P(X, Y) =* 0 passes through all the points, i.e., *P( _{i}*,

Given *P(X, Y)*, one can find its factors (which can be done efficiently) and thus read off the message polynomial *f(X)* from the *Y f(X)* factor. But how do we find *P(X, Y)?* Finding *P(X, Y)* in its factored form (1) is begging the question, but note that we can also write *P(X, Y)* in the form *P(X, Y) = D*_{1}(*X) Y - N*_{1}(*X)* where degree(*D*_{1}) *e* *(n k)/*2 and degree (*N* _{1}) < *e* + *k* *(n + k)/*2.

Knowing such a polynomial exists, we can try to find a nonzero bivariate polynomial *Q(X, Y) = D*_{2}(*X)Y N*_{2}(*X)* satisfying

- degree(
*D*_{2})*(n - k)/*2 and degree*(N*_{2}) < (*n + k*)/2 *Q(*,y_{i}_{i}) = 0 for*i =*1, 2, ...,*n*

This can be done by setting up a system of linear equations over **F** with unknowns being the coefficients of *D*_{2}*(X)* and *N*_{2}*(X)*, and *n* linear constraints *Q( _{i}*,

One can prove, using elementary algebra, that when the number of errors *e* *(n k)/*2, *any* interpolated *Q(X, Y)* satisfying the above two conditions *must* have *P(X, Y)* as a factor, and be of the form *P(X, Y) A(X)* for some nonzero (possibly constant) polynomial *A(X).* The intuitive reason is that since there are so few errors in the data compared to the curve *Y f(X)* = 0, the *curve P(X,Y)* = 0 is the most economical way to fit all the data points. The formal proof proceeds by considering the polynomial *R(X)*^{def}= *Q(X, f(X))* and arguing it must be identically zero since (i) it has at least *(n + k)/*2 roots (namely all _{i}'s for which *f*(* _{i}*) =

**2.2. List decoding Reedsolomon codes**

We now turn to list decoding of ReedSolomon codes. The setup is as before: given *n* points (* _{i}*,

Before delving into the algorithms, we pause to consider how large a number of errors *e* one can target to correct. Clearly, we need the guarantee there will be only a few (say, at most polynomially many in *n)* solution polynomials *(X)* or else there is no hope for the decoder to output all of them in polynomial time. Using the fact that encodings of any two polynomials differ in more than *(n k)* locations, it can be shown (via the so-called "Johnson bound") that for , the number of solutions (called the *list size)* is guaranteed to be polynomially small. Whether one can prove a polynomial list size bound for certain RS codes for even larger *e* remains a key open question.

We now describe the idea underlying Sudan's breakthrough algorithm for list decoding RS codes.^{17} Recall that we want to solve a noisy curve fitting problem, and output all curves *Y (X) =* 0 with deg() < *k* that pass through *n e* or more of the *n* points (* _{i}*,

For the BerlekampWelch algorithm, arguing that *Y (X)* was a factor of *Q(X, Y)* followed from the very special structure of *Q(X, Y).* In the list decoding case, Sudan exploited special properties of intersections of curves of the form *Y - (X)* with any interpolated bivariate polynomial *Q(X, Y)* with appropriate degree constraints. Informally, Sudan's idea is that given the strong degree constraint on *Q(X, Y)*, every curve *Y (X) =* 0 with deg() < *k* that picks up at least *n e* of points must be "used" by the interpolated curve in meeting the requirement to pass through all *n* points. As an example, in Figure 3, the goal is to find all lines (i.e., we have *k =* 2) that pass through all but *e =* 9 of the *n =*14 input points (there are two such lines, marked in the figure *as L*_{1}(*X, Y)* and *L*_{2}*(X, Y)).* There are enough degrees of freedom in the equation of a degree 4 curve so that one can fit a degree 4 curve through any set of 14 points. The figure illustrates one such curve, which is the product of the two lines with an "ellipse" *E(X, Y).* (Note that the total degree of *Q(X, Y)* is 4.) Further, we see that the two relevant lines pop out as factors. This is not a coincidence, and every degree 4 curve passing through the 14 points must have these two lines as factors. The reason: if a line is not a factor, then it can intersect a degree 4 curve in at most 4 points. Since each of these lines intersects any interpolated curve in at least 5 points, it must be a factor.

More formally, the "degree" measure of the interpolated polynomial *Q(X,Y)* will be the (1, *k * 1)-degree, which is defined as the maximum of *i* + (*k* 1)*j* over all monomials *X ^{i}*Y

For low rates, this algorithm enables recovery even in settings when noise overwhelms correct data, and close to 100% of the symbols may be in error. This feature sets the stage for several powerful applications in cryptography and complexity theory. However, the algorithm does not give any improvement over the (1 *R*)/2 error fraction corrected by traditional algorithms for rates > 1/3, and also falls short of the radius suggested by the combinatorial bounds.

We now turn to the improved algorithm correcting a fraction of errors due to Guruswami and Sudan.^{10} The key new idea is to insist that the interpolated polynomial *Q(X, Y)* has *multiple* zeroes at each of the *n* points. To explain this, we attempt a high-level geometric description. Consider the example in Figure 4 with *n =* 10 points, the goal being to output all lines that pass through at least *n e =* 4 points. This example cannot be solved by Sudan's algorithm. Indeed, since there are five solution lines, if they are all factors of some interpolated curve, the curve must have degree at least 5. However, there is no guarantee that an arbitrary degree 5 curve through the points must have every line passing through 4 of the points as a factor (the line has to pass through 6 points to guarantee this). Let *C** be the degree 5 curve that is the product of the five solution lines. As mentioned above, if we interpolate a degree 5 curve through the 10 points, we will in general *not* get *C** as the solution. However, notice a special property of *C**it passes through each point *twice;* a priori there is no reason to expect that an interpolated curve will have this property. The GuruswamiSudan idea is to *insist* that the interpolation stage produces a degree 5 curve with zero of multiplicity at least 2 at each point (i.e., the curve intersects each point twice). One can then argue that each of the five lines must be a factor of the curve. In fact, this will be the case for degree up to 7. This is because the number of intersections of each of these lines with the curve, counting multiplicities, is at least 4 × 2 = 8 which is greater than the degree of the curve. Finally, one can always fit a degree 7 curve passing through any 10 points twice (again by a counting argument). So by insisting on multiplicities in the interpolation step, one can solve this example.

In general, the interpolation stage of the GuruswamiSudan list decoder finds a polynomial *Q(X, Y)* that has a zero of multiplicity *w* for some suitable integer *w* at each *( _{i}*, y

**Soft Decoding:** We now comment on a further crucial benefit of the multiplicities idea which is relevant to potential practical applications of list decoding. The multiplicities can be used to encode the relative importance of different codeword positions, using a higher multiplicity for symbols whose value we are more confident about, and a lower multiplicity for the less reliable symbols that have lower confidence estimates. In practice, such confidence estimates (called "soft inputs") are available in abundance at the input to the decoder (e.g., from demodulation of the analog signal). This has led to a promising *soft-decision* decoder for RS codes with good coding gains in practice,^{13} which was adopted in the Moonbounce program to improve communication between Ham radio operators who bounce radio signals off the moon to make long distance contacts.

We now discuss a variant of RS codes called *folded ReedSolomon codes* (henceforth folded RS codes), which will let us approach the optimal error-correction radius of a fraction 1 *R* of errors. The codewords in the folded RS code will be in one-to-one correspondence with RS codewords. We begin with an informal description. Consider the RS codeword corresponding to the polynomial (*X*) that is evaluated at the points *x*_{0}, *x*_{1}, ..., *x _{n}*

We would like to stress on a subtle point here: in the worst-case error model, the "atomic" unit of error is an alphabet character. This was used crucially in the example above to rule out an error pattern that was admissible for the smaller alphabet. For the reader who might we worried that this constitutes "cheating," e.g., what if one collapses the entire RS codeword into one large symbol, we offer two counter-points. First, since we will only use a *constant* folding parameter, the increase in alphabet size from that of RS codes is modest. Second, in Section 4, we will see how to convert folded RS codes into codes over alphabets whose size *does not depend at all* on the block length, while still maintaining similar error-correction properties.

We now formally define the folded RS code. Let the nonzero elements of the field **F** be generated by , i.e., every nonzero element is * ^{i}* for some 0

The block length of these codes is *N* = *n*/*m*. The rate of the code remains *k*/*n*, since the folding operation does not introduce any further redundancy.

The folding operation does restrict the error patterns that one needs to worry about. But how can one actually exploit this in a decoding algorithm and manage to correct a larger fraction of errors compared to the unfolded RS codes? We turn to this question next.

**3.1. Multivariate decoding**

Recall the two step Guruswami-Sudan (GS) algorithm. First, we interpolate a bivariate polynomial *Q*(*X*, *Y*) through the points (* _{i}*,

Let us now return to problem of list decoding folded RS code with *m =* 4. Given the received word whose *i*th symbol (for 0 *i < N)* is *(y _{i,0}*

Given the setup above, the first thing to explore is whether one can generalize the GS algorithm to the setting of interleaved RS codes. To see one such generalization, note that RS codes are interleaved RS codes of order 1. The GS algorithm interpolated a nonzero bivariate polynomial *Q(X, Y)* in this case. Thus, for an interleaved RS code of order 4, a natural attempt would be to compute a nonzero 5-variate polynomial *Q(X, Y, Z, U, W)*, where (as before) *Y* is a placeholder *for (X)* and (this part is the generalization) *Z, U*, and *W* are placeholders *for _{1}*(X),

The generalization above (and similar ideas) were tried out in a few works, but could not decode beyond the radius. Finally, 7 years after the GS algorithm was published, Parvaresh and Vardy^{14} had an ingenious idea: force the polynomials _{1}(*X*), _{2}(*X*) and _{3}(*X*) to be related to (*X*). In particular, they only look at the subcode of the interleaved RS code where * _{j}* (

Finally, let us return to the problem of list decoding folded RS codes with *m* = 4. *In* folded RS codes also, we have the property that * _{j}*(X) is related

**3.2. The final piece**

The improvement over Parvaresh-Vardy comes from comparing apples to oranges. In particular, till now we have seen that the folded RS code with folding parameter 4 is a special case of the PV code of order 4. Instead let us compare the folded RS code with a PV code of *smaller* order, say 2. It turns out that the folded RS code with folding parameter 4 are compressed forms of certain specific PV codes of order 2 and we will exploit this observation. In particular, as in Figure 7 compare the folded RS codeword with the PV code of order 2 (where the polynomial (*X*) is evaluated at the points {1, , *...*, ^{n}^{1}}\{^{3}, * ^{7}*, ...,

Thus, our list decoding algorithm for folded RS with folding parameter *m* can be modularly defined as follows: unfold the received word for the appropriate PV code of order *sm* and then run the Parvaresh&Vardy list decoder on this unfolded received word. It turns out that this list decoder can correct such a folded RS code of *R* from up to fraction of errors. By picking *m* to be (somewhat) larger than *s* and picking *s* to be sufficiently large (in terms of 1/), we can conclude the following result.

Theorem 1. *For every >* 0 *and* 0 < *R <* 1, *there is a family of folded RS codes that have rate at least R and which can be list decoded up to a fraction* 1 - *R - of errors in time* (*N/ ^{2}*)

We note that the time complexity has an undesirable dependence on , with 1/ in the exponent. Improving this bound remains a challenging open question.

So far we have discussed codes over large alphabets. For example, folded RS codes of rates *R* that can be list decoded from 1 *R * ** fraction of errors need alphabet size of roughly ***n ^{O(1/2}*), where

We start with perhaps the most natural small alphabet: {0, 1}. For codes defined over this alphabet (also called *binary* codes), it turns out that to list decode from fraction of errors the best possible rate is 1 *H*(*)*, where *H(x) = x*log_{2}*x * (1 *x*) log_{2}(1 *x*) is the entropy function. Two remarks are in order. First, the rate 1 *- H*(*)* is much smaller than the rate of 1 that folded RS codes can achieve. (It turns out that to attain a rate of 1 , the alphabet size needs to be at least 2^{1/}; more on this later in the section.) Second, as shown in Shannon's seminal paper,^{16} the quantity 1 *H*(*)* is *exactly* the same as the best possible rate (aka "capacity") that can be achieved in the binary symmetric channel BSC. Thus, list decoding can bridge the traditionally perceived gap between the Shannon stochastic model and the Hamming worst-case model.

We "transfer" our result for folded RS codes to a result for binary codes via a natural method for composing together codes called *code concatenation*, proposed by Forney over 40 years ago. Thanks to the powerful algorithms for decoding folded RS codes, we can use this approach to achieve a certain trade-off called the *Zyablov bound* between rate and fraction of errors corrected.^{9} In a subsequent work, using another generalization of code concatenation, we improved the trade-off to the *Blokh-Zyablov bound*.^{8} Figure 8 plots these two trade-offs along with the best possible trade-off (list-decoding capacity). There is a large gap between the list-decoding capacity of binary codes and the best bound known to be achievable with efficient algorithms. Closing this gap remains a central and extremely challenging open question.

We now briefly mention how we resolve the large alphabet issue that was raised in Section 3. When the folding parameter of the folded RS code is a constant (as in Theorem 1), the number of bits needed to represent a symbol from the alphabet is no larger than roughly the logarithm of the block length of the folded RS code. This is small enough to use the idea of code concatenation mentioned above to reduce the alphabet. In order to maintain the optimal trade-off between rate and fraction of errors list decoded, we need to combine concatenation with an approach to redistribute symbols using *expander graphs*.^{6} This leads to codes of rate *R* that can be list decoded from a fraction 1 *R* of errors over an alphabet of size 2^{1/} ^{4}, which is close to the lower bound of 2^{1/} mentioned earlier.

First, we mention some related work that appeared subsequent to the initial publication of our result. Further work on extending the results in this article to the framework of *algebraic-geometric codes* has been done in Guruswami^{5} and Guruswami and Patthak.^{7} A surprising application of the ideas in the Parvaresh-Vardy list decoder is the construction of *randomness extractors* by Guruswami, Umans, and Vadhan.^{11} Randomness extractors convert input from a weakly random source into an almost perfectly random string, and have been intensively studied in theoretical computer science for over 15 years. This recent extractor is almost optimal in all parameters, while having a simple, self-contained description and proof.

Even though the work presented in this article makes good progress in our theoretical understanding of list decoding, applying these ideas into practice requires further innovation. We conclude by posing two practical challenges.

The first challenge is specific to making the list decoding algorithms for folded RS codes more practical. Recall that the algorithm involved an interpolation step and a "root-finding" step. There are fast heuristic approaches for the latter step that could be used in practice. The interpolation step, however, seems too inefficient for practical purposes due to the large size of the linear systems that need to be solved. It would be very useful to have more efficient algorithms for this step. We note that such improvements for the GuruswamiSudan algorithm have been obtained.

The second challenge is more general. Codes have found numerous practical applications in domains such as communication and data storage. Despite its promise and the recent advances, list decoding has not yet found widespread use in practical systems (though as mentioned earlier, the Moonbounce program does use the multiplicities based list decoder). One possible reason could be that the previous list decoding algorithms do not provide much gain for the high rate regime over traditional unique decoding algorithms. However, this is no longer a concernwe now have algorithms that obtain much better theoretical bounds in this regime. Further, folded RS codes are very similar to RS codes that are ubiquitous in practice. Hopefully in the near future, list decoding will be used more widely in practical systems.

The research described here was supported in part by NSF award CCF-0343672 and fellowships from the Sloan and Packard Foundations. We thank Ronitt Rubinfeld for several valuable comments on an earlier draft of this paper.

1. Berlekamp, E. Factoring polynomials over large finite fields. *Math. Comput. 24* (1970), 713735.

2. Elias, P. List decoding for noisy channels. *Technical Report 335*, MIT Research Lab of Electronics, 1957.

3. Gemmell, P., Sudan, M. Highly resilient correctors for multivariate polynomials. *Information Processing Lett. 43*, 4 (1992), 169174.

4. Goldreich, O., Levin, L. A hard-core predicate for all one-way functions. In *Proceedings of the 21st ACM Symposium on Theory of Computing*, 1989, 2532.

5. Guruswami, V. Artin automorphisms, cyclotomic function fields, and folded list-decodable codes, 2008. Manuscript.

6. Guruswami, V., Indyk, P. Linear-time encodable/decodable codes with near-optimal rate. *IEEE Trans. Information Theory 51*, 10 (2005), 33933400.

7. Guruswami, V., Patthak, A. Correlated algebraic-geometric codes: Improved list decoding over bounded alphabets. *Math. Comput. 77*, 261 (Jan. 2008), 447473.

8. Guruswami, V., Rudra, A. Better binary list-decodable codes via multilevel concatenation. In *Proceedings of the 11th International Workshop on Randomization and Computation*, 2007, 554568.

9. Guruswami, V., Rudra, A. Explicit codes achieving list decoding capacity: Error-correction up to the Singleton bound. *IEEE Trans. Information Theory 54*, 1 (2008), 135150. Preliminary version in *STOC'06*.

10. Guruswami, V., Sudan, M. Improved decoding of ReedSolomon and algebraic-geometric codes. *IEEE Trans. Information Theory*, *45* (1999), 17571767.

11. Guruswami, V., Umans, C., Vadhan, S.P. Unbalanced expanders and randomness extractors from ParvareshVardy codes. In *Proceedings of 22nd IEEE Conference on Computational Complexity*, 2007, 96108.

12. Hamming, R.W. Error detecting and error correcting codes. *Bell System Technical J. 29* (Apr. 1950), 147160.

13. Koetter, R., Vardy, A. Algebraic soft-decision decoding of ReedSolomon codes. *IEEE Trans. Information Theory 49*, 11 (2003), 28092825.

14. Parvaresh, F., Vardy, A. Correcting errors beyond the GuruswamiSudan radius in polynomial time. In *Proceedings of the 46th IEEE Symposium on Foundations of Computer Science*, 2005, 285294.

15. Peterson, W.W. Encoding and error-correction procedures for BoseChaudhuri codes. *IEEE Trans. Information Theory*, *6* (1960), 459470.

16. Shannon, C.E. A mathematical theory of communication. *Bell System Technical J. 27* (1948), 379423, 623656.

17. Sudan, M. Decoding of ReedSolomon codes beyond the error-correction bound. *J. Complexity*, *13*, 1 (1997), 180193.

18. Welch, L.R., Berlekamp, E.R. Error correction of algebraic block codes. *US Patent Number 4,633,470*, December 1986.

19. Wozencraft, J.M. List Decoding. *Quarterly Progress Report, Research Laboratory of Electronics, MIT*, 48 (1958), 9095.

a. The problem was introduced about 50 years ago and the main combinatorial limitations of list decoding were established in the 1970s and 1980s.

b. For this article, readers not conversant with fields can think of a finite field as {0, 1, ..., *p*1) for a prime *p* with addition and multiplication operations defined modulo *p*.

c. If not, RS_{F,S,n,k}(m)RS_{F,S,n,k}(m), which corresponds to the evaluation of the non-zero polynomial (*m _{i}*m

d. We skip formalizing the notion of multiple zeroes in this description, but this follows along standard lines and we refer the interested reader to Guruswami and Sudan^{10} for the details.

e. This is the same as looking at an appropriate subset of messages.

The original version of this paper was published in *IEEE Transactions on Information Theory*, 2008.

DOI: http://doi.acm.org/10.1145/1467247.1467269

Figure 1. Oversampling from a line *Y* = *aX* + *b* to tolerate errors, which occur at *X* = 3 and 5.

Figure 2. An RS codeword (polynomial *f*(*X*) evaluated on points _{1}, _{2},..., _{11}); its corruption by two errors (at locations _{2} and _{5}); and illustration of the curve fitting through the noisy data using correct curve and "error-locator lines."

Figure 3. illustration of list decoding of RS code that evaluates lines over the points 7, 5, 4,..., 4, 5, 6, 7. The two lines are recovered as factors of a degree 4 interpolated curve through all the points.

Figure 4. Illustration of GuruswamiSudan algorithm for list decoding RS codes. The lines are recovered as factors of a degree 5 curve that passes through each point twice.

Figure 5. Rate vs. error-correction radius for RS codes. The optimal trade-off is also plotted, as is the ParvareshVardy's improvement over RS codes.

Figure 6. Folding of the ReedSolomon code with parameter *m* = 4. Each column represents an alphabet character.

Figure 7. The correspondence between a folded RS code (with *m* = 4 and *x _{i}* =

Figure 8. Error-correction radius of our algorithms for binary codes plotted against the rate *R*. the best possible trade-off, i.e., list decoding capacity, is = *H*^{1}(1 *R*), and is also plotted.

**©2009 ACM 0001-0782/09/0300 $5.00**

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2009 ACM, Inc.

No entries found