High-frequency trading (HFT) has emerged as a powerful force in modern financial markets. Only 20 years ago, most of the trading volume occurred in exchanges such as the New York Stock Exchange, where humans dressed in brightly colored outfits would gesticulate and scream their trading intentions. Today, trading occurs mostly in electronic servers in data centers, where computers communicate their trading intentions through network messages. This transition from physical exchanges to electronic platforms has been particularly profitable for HFT firms, which invested heavily in the infrastructure of this new environment.
Although the look of the venue and its participants has dramatically changed, the motivation of all traders, whether electronic or human, remains the same: to buy an asset from a location/trader and to sell it to another location/trader for a higher price. The defining difference between a human trader and an HFT is that the latter can react faster, more frequently, and has very short portfolio holding periods. A typical HFT algorithm operates at the sub-millisecond time scale, where human traders cannot compete, as the blink of a human eye takes approximately 300 milliseconds. As HFT algorithms compete with each other, they face two challenges:
Online algorithms provide a natural class of algorithms suitable for HFT applications. In an online problem, new input variables are revealed sequentially. After each new input the algorithm needs to make a decisionfor example, whether or not to submit a trade. This is in stark contrast to an offline problem, which assumes the entire input data is available at the time of the decision making. Many practical optimization problems addressed in computer science and operations research applications are online problems.1
Besides solving an online problem, HFT algorithms also need to react extremely fast to market updates. To guarantee a fast reaction time, efficient memory handling becomes a necessity for a live trading algorithm. Keeping a large amount of data in memory will slow down any CPU, so it is important that an algorithm uses only a minimal amount of data and parameters, which can be stored in fast accessible memory such as the L1 cache. In addition, these factors should reflect the current state of the market and must be updated in real time when new data points are observed. In summary, the smaller the number of factors that need to be kept in memory and the simpler the computation required to update each factor, the faster an algorithm is able to react to market updates.
Based on the speed requirement and the online nature of HFT problems, the class of one-pass algorithms is especially suitable for HFT applications. These algorithms receive one data point at a time and use it to update a set of factors. After the updating, the data point is discarded and only the updated factors are kept in memory.
In this article we discuss three estimation problems that can arise in HFT algorithms and that can be efficiently solved with a one-pass algorithm. The first is the estimation of a running mean of liquidity; this can be useful to an HFT algorithm in determining the size of an order that is likely to execute successfully on a particular electronic exchange. The second problem is a running violation estimation, which can help quantify the short-term risk of a position. The third problem is a running linear regression, which can be used in trading pairs of related assets.
The one advantage that HFT has over other market participants is reaction speed. HFT firms are able to see every action in the marketthat is, the information contained in the limit order bookand react within microseconds. Though some HFT algorithms may base their actions on a source of information outside the market (say, by parsing news reports, measuring temperature, or gauging market sentiment), most base their decisions solely on the processing of messages arriving to the market. By some estimates, there are approximately 215,000 quote updates per second on the New York Stock Exchange.4 The challenge for HFT algorithms is to process this data in a way that allows them to make decisions, such as when to enter positions or reduce risk. The examples used in this article assume that HFT algorithms can observe every update in the best bid and ask prices, including the best bid and ask sizes. This subset of information contained in the limit order book is often referred to as the Level-I order book information.
The following three examples of online algorithms, each motivated with an application in HFT, are described in detail in this article:
In all three cases, the algorithm has a single parameter, alpha, which controls the rate at which old information is forgotten. Figure 1 plots the raw liquidity measure (bid size plus ask size) in blue. Red and green represent the online liquidity factor, with alpha=0.9 and alpha=0.99, respectively. Note that as alpha approaches a value of 1, the signal gets smoother and efficiently tracks the trend in the underlying data.
Figure 2 plots the online volatility measure for various values of alpha. Once again, notice that the measure is smoother for the larger alpha. Although a larger alpha provides a smoother signal, it also lags further behind the underlying trend as it gives a lot of weight to older data. As discussed later, choosing a value for alpha translates into a trade-off between a smooth signal and a reduced lagging of the trend.
To illustrate the online regression algorithm, we look at the time series of mid prices for SPY and SSO, two highly related ETFs (SSO is the double-leveraged version of SPY). As shown in Figure 3, the relationship between the two assets seems very close to linear over the course of a day. Figure 4 plots the online mean and intercept for two values of alpha.
As indicated by its name, a one-pass algorithm reads every input variable exactly once and then discards it. This type of algorithm is very efficient in terms of memory handling, as it requires only a minimal amount of data to be stored in memory. Here, we present three important examples of online one-pass algorithms: the exponential moving average, the exponentially weighted variance, and the exponentially weighted regression. Each of these algorithms has been used in a HFT application in the previous section.
First, let's look briefly at the simple moving average of a time series. This is an estimate of the mean of a time series over a moving window of a fixed size. In finance, it is often used to detect trends in price, in particular by comparing two simple moving averages: one over a long window and one over a short window. In another application, the average traded volume over the past five minutes can serve as a prediction of the volume traded in the next minute. In contrast to the exponential moving average, the simple moving average cannot be solved with a one-pass algorithm.
Let (Xt)t = X0, X1, X2 ... be the observed sequence of input variables. At any given time t we want to predict the next outcome Xt+1. For M > 0 and t M, the simple moving average with window size M is defined as the average of the last M observations in the time series (Xt)tthat is, . The moving average can also be computed via the following recursion:
While this is an online algorithm, it is not a one-pass algorithm, as it needs to access every input data point exactly twice: once to add it to the moving average and then again to drop it out of the moving average estimate. Such an algorithm is also referred to as a two-pass algorithm and requires keeping an entire array of size M in memory.
Example 1: One-Pass Exponential Weighted Average. In contrast to the regular average , the exponential weighted average assigns an exponentially decreasing weight to older observations:
Here is a weighting parameter chosen by the user and needs to satisfy 0 < 1. As this exponential weighted average gives more importance to more recent input compared with older data points, it is often considered to be a good approximation of the simple moving average.
Compared with the simple moving average, the exponential weighted average takes all previous data into account and not just the last M observations. To compare the simple moving average and the exponential weighted average further, Figure 5 shows how many data points receive 80%, 90%, 95%, 99%, and 99.9% of the weight in the estimation as a function of . For example, if = 0.95, then the last M = 90 observed data points contribute to 99% of the estimated value. As a warning, if the time series (Xt)t has very heavy tails, then the exponential smoothed average might be dominated by an extreme observation, whereas the moving average is less prone to extreme observations as these eventually drop out of the observation window. Frequent restarting of the estimation procedure can solve this long-term memory effect of exponential smoothing.
The reason for favoring the exponential moving average over the simple moving average in HFT is it can be efficiently solved using a one-pass algorithm, initially introduced in Brown.3
This formula also provides a simple interpretation of the parameter as a control of how much weight is given to the most recent observation, compared with all previous observations.
Example 2: One-Pass Exponentially Weighted Variance. The exponential smoothing described in the previous section estimates a moving average of a time series. In finance, the volatility of a time series is often an important factor as well. Broadly speaking, volatility should capture how much a time series fluctuates around its mean. There is no widely accepted definition of volatility for high-frequency financial data. Here, we consider the volatility to be the standard deviation (square root of variance) of a data point in the time series (Xt)t. Similar to the exponentially weighted moving average discussed previously, an online one-pass algorithm can be constructed that estimates the volatility of the time series (Xt)t with an exponential weighting scheme.
The variance of a random variable is defined as Var (X) = E[(X E[X])2]. Estimating the exponential weighted variance of the time series requires two estimators: one that estimates the mean E[X] and one that estimates the variance as illustrated in Figure 6.
The standard deviation of the next measurement point Xt+1 is then estimated as . Again, the input parameter (0,1) needs to be chosen by the user and reflects how much weight is assigned to older data points compared with the latest observed data input. Here, we initialized the estimator of the variance with 1, which is a rather arbitrary choice. Another way is to have an initial "burn-in" period for which the time series (Xt)t is observed and a standard variance estimator of the series over this burn-in time window can be used to initialize the estimator. Of course, a similar method can be used to initialize the estimator of the exponentially weighted average estimator.
Example 3. One-Pass Algorithm for Exponentially Weighted Linear Regression. The last example is an online one-pass algorithm for the exponentially weighted linear regression model. This model is similar to ordinary linear regression, but again gives more importance (according to an exponential weighting) to recent observations than to older observations. As already shown, such regression methods are very useful in HFT strategies to estimate the relation of different assets, which can be, for example, exploited in creating pair trading strategies.
In this model we consider a two-dimensional time series (Xt, Yt)t and conjecture that the variables X and Y are related via a linear relation that is corrupted by a noise term t with zero mean. That is,
The variable Y is referred to as the response variable, whereas X is called the explanatory variable. For simplicity let's assume just one explanatory variable here, but the extensions to several explanatory variables are straightforward. In the standard offline approach to linear regression, the parameters 0 and 1 are calibrated after all the data points are observed. These data points are collected in a vector Y = (Y0, Y1, ..., + Yt)T, and a matrix
The column of ones in the matrix X corresponds to the intercept in equation 3. If we further write the parameters and 0 and 1 as a vectorthat is, = (0, 1)T then the relationship between Y and X can conveniently be written in matrix notation as
where is a vector of stochastic noise terms, and each of these error terms has zero mean.
The most common approach to estimating the parameter is using ordinary least squares estimationthat is, is chosen such that it minimizes the sum of squared residuals . The solution to this minimization problem is .
As in mean and variance estimations, more recent data points should be more important for the estimation of the parameter . Also a one-pass algorithm of is required for fast computation.
Next let's consider a recursive method that updates sequentially and minimizes
Again, the parameter needs to be in the range (0,1) and is chosen by the user. The parameters 0 and 1 of the weighted least squares estimation can be computed with an efficient online one-pass algorithm. At each step of the algorithm a 2 × 2 matrix Mt and a 2 × 1 vector Vt need to be saved in memory and updated with a new data point according to the following recursion:
As for the mean and variance estimator, the initialization of the recursion can be done using a burn-in period. Finally, after time t, the best estimate of is . In the literature this method is also called recursive least squares with exponential forgetting.2
Online one-pass algorithms are instrumental in HFT, where they receive large amounts of data every microsecond and must be able to act extremely fast on the received data. This article has addressed three problems that HFT algorithms face: the estimation of a running mean of liquidity, which can be useful in determining the size of an order that is likely to execute successfully on a particular electronic exchange; a running volatility estimation, which can help quantify the short-term risk of a position; and a running linear regression, which can be used in trading pairs of related assets. Online one-pass algorithms can help solve each of these problems.
4. Clark, C. Improving speed and transparency of market data. Exchanges, 2011; https://exchanges.nyx.com/cclark/improving-speed-and-transparency-market-data.
©2013 ACM 0001-0782/13/10
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from firstname.lastname@example.org or fax (212) 869-0481.
The Digital Library is published by the Association for Computing Machinery. Copyright © 2013 ACM, Inc.
The following letter was published in the Letters to the Editor of the February 2014 CACM (http://cacm.acm.org/magazines/2014/2/171678).
-- CACM Administrator
Jacob Loveless et al.'s article "Online Algorithms in High-Frequency Trading" (Oct. 2013) is an example of potentially valuable research misdirected. Ask any proponent of free-enterprise economics to explain its merits, and you will likely hear two themes: Profit motivates, and profit accrues by producing and selling valuable goods and services. The first buys the producer a bigger piece of the pie; the second increases the total size of the pie, thus raising, at least on average, the economic status of all. It works, most of the time, quite well.
Unfortunately, there are also many ways to profit while producing grossly inadequate, zero, or even negative economic value. Some of us are drawn to such schemes, so much so they work much more diligently at them than at a productive enterprise. To the extent this happens, free enterprise is undermined. Like printing counterfeit money, it works only if a minority does it, and even then, at the expense of everyone else.
Among the most serious such non-value-producing profit schemes is speculating in zero-sum derivative markets that produce no economic value at all, managing only to shuffle cash between winners and losers. Millisecond trading is just an escalation in vying for money this way. Even in financial markets like common stocks, where the original purpose is investment, and that do contribute to producing value, trading at sub-second time intervals is pure speculation or worse, as genuine investors could collectively be net losers to speculators. Putting effort into developing and using more successful speculation strategies is like going to a potluck dinner but bringing no food, just a bigger plate, while pushing more aggressively toward the front of the line.
Online and one-pass algorithm research can surely be redirected toward value-producing applications (such as robotics) where they can do more than just seize profits at someone else's expense.
Rodney M. Bates
Strong City, KS
Displaying 1 comment