Architecture and Hardware News

Weathering a New Era of Big Data

Increased computing power, combined with new and more advanced models, is changing weather forecasting.

Throughout history, mankind has attempted to gain a better understanding of weather and forecast it more accurately. From ancient observations about wind direction, cloud formations, and barometric pressure to more recent attempts to accumulate data from satellites, sensors, and other sources, weather forecasting has both fascinated and infuriated everyone from picnickers to farmers and emergency responders. It is, in a word, unpredictable.

Yet over the last two decades, thanks to increasingly powerful computers, big data, and more sophisticated modeling and simulations, weather forecasting has been steadily moving forward. Amid growing concerns about global warming and more volatile weather and climate patterns, researchers are attempting to develop better algorithms and systems. Says Cliff Mass, professor of atmospheric science at the University of Washington, “Numerical data is the core technology of weather prediction. Everything is dependent upon it.”

Moreover, the stakes continue to grow. There is mounting concern that not all weather models are created equal. In some cases, European and American forecasting methods lead to significantly different predictions—and noticeably different results. This includes predicting the impact of snowstorms in the northeast U.S. in the winter of 2013–2014 and the effects of Hurricane Sandy on that region in 2012.

Meanwhile, private companies such as IBM are entering the picture and introducing entirely different tools and methods for forecasting weather-related events.

Says Lloyd Treinish, an IBM Distinguished Engineer at the IBM Thomas J. Watson Research Center, “The history of weather forecasting and the history of computing have been very tightly coupled. Since the 1940s, revolutions in computing have been very closely tied to weather forecasting and building better equations and models. Over the last couple of decades, we have seen steady improvements in computing, sensor technology, an understanding of the atmosphere, and the overall science. Over time, we are learning how to put all the pieces to work.”


A Clearer View

The basis for understanding weather in a more systematic way dates back more than a century, to a time when scientists were beginning to examine the physics of the atmosphere and were first applying numerical methods to understand extraordinarily complex physical processes. By the 1950s, forecasters had begun using mainframe computers to build weather models, moving beyond field observations and telegraph reports. By the 1960s, satellites and sensors on automatic stations, aircraft, ships, weather balloons, and drifting ocean buoys had entered the picture, creating entirely new and powerful ways to collect data and making it possible to better understand the mechanisms associated with weather and climate.

Along the way, advances in technology and modeling have led to remarkable improvements—although, as Mass notes, “We are constantly limited by computer power.” It is a concern echoed by Ben Kyger, director of central operations for the National Centers for Environmental Prediction at the U.S. National Oceanic and Atmospheric Administration (NOAA). “Scientists increase the grid resolution to take advantage of the available processing power. Higher resolutions mean that scientists can develop more accurate forecasts that extend further out in time.”


Advances in technology and modeling have led to remarkable improvements, although “we are constantly limited by computer power.”


Today, the most powerful weather computers rely on hundreds of thousands of processors and, in many cases, millions of data points. The National Weather Service (NWS) in the U.S. currently relies on a pair of supercomputers with over 200 teraflops of combined capacity (a teraflop is one trillion floating-point operations per second). By way of comparison, China's Tianhe-2 ("Milky Way 2") supercomputer, which topped the June 2014 Top500 supercomputer rankings, delivers up to 33.86 petaflops; since a petaflop equals 1,000 teraflops, Tianhe-2 provides nearly 170 times the raw processing power available to the NWS.
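That comparison is simple unit arithmetic. A minimal Python sketch, using only the teraflop and petaflop figures quoted above, might look like this:

```python
# Rough comparison of the computing capacities cited above.
# 1 teraflop = 1e12 floating-point operations per second; 1 petaflop = 1,000 teraflops.

NWS_CAPACITY_TFLOPS = 200      # combined capacity of the NWS supercomputer pair
TIANHE2_PFLOPS = 33.86         # Tianhe-2, June 2014 Top500 leader

tianhe2_tflops = TIANHE2_PFLOPS * 1_000
ratio = tianhe2_tflops / NWS_CAPACITY_TFLOPS

print(f"Tianhe-2: {tianhe2_tflops:,.0f} teraflops")
print(f"Ratio to NWS capacity: ~{ratio:.0f}x")   # ~169x, i.e. "nearly 170 times"
```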

Meanwhile, the Korean Meteorological Administration in South Korea is expanding its computing storage capacity to 9.3 petabytes in order to better predict weather events, including typhoons. The European Centre for Medium-Range Weather Forecasts (ECMWF) processes 300 million observations daily, producing about 65 terabytes of forecasts every day, with peaks of 100 terabytes. ECMWF's archive holds 65 petabytes of data and is growing at a rate of approximately 50% annually, says software strategist Baudouin Raoult.
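To see how quickly such an archive compounds, here is an illustrative Python sketch that simply projects the 65-petabyte figure forward at the quoted ~50% annual rate (assuming, purely for illustration, that the rate stays constant):

```python
# Projecting the ECMWF archive size under the ~50% annual growth rate
# quoted above (illustrative only; assumes the rate stays constant).

archive_pb = 65.0        # current archive size in petabytes (from the text)
growth_rate = 0.50       # approximately 50% per year

for year in range(1, 6):
    archive_pb *= 1 + growth_rate
    print(f"Year {year}: ~{archive_pb:,.0f} PB")

# After five years of 50% growth, the 65 PB archive would approach ~500 PB.
```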

Interestingly, weather forecasting organizations worldwide rely on much of the same data derived from many of the same sources, Mass points out. Differences in forecasts typically revolve around the ways in which mathematicians and researchers approach statistical processing and how they average and round off numbers. In addition, "Web [forecasting services] obtain weather data from various sources and apply it in different ways," he explains. "The result is different forecasts, but the underlying modeling is much less variable."
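As a toy illustration of how processing order alone can shift values (this is a hypothetical example, not any center's actual pipeline), consider averaging the same station reports before versus after rounding:

```python
# Hypothetical illustration (not any center's actual pipeline): averaging the
# same station reports before versus after rounding yields slightly different
# initial values, and small initial differences can grow as a model runs.

reports_c = [14.6, 14.7, 14.6, 15.2]    # temperature reports, degrees Celsius

avg_full = sum(reports_c) / len(reports_c)                        # average at full precision
avg_rounded = sum(round(t) for t in reports_c) / len(reports_c)   # round to whole degrees first

print(f"average, then store: {avg_full:.3f} C")     # 14.775 C
print(f"round, then average: {avg_rounded:.3f} C")  # 15.000 C
```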

Kyger says current NWS models are more than 60% accurate beyond five days—generally considered a "skillful forecast" benchmark. Yet because the physics and dynamics of the atmosphere do not scale linearly with grid resolution, it is not possible to rely on a static model or linear equation to extrapolate data. In fact, with every hardware upgrade, it can take up to a year to fine-tune a new model to the point where it outperforms the existing one. "At one point, a skillful forecast was only a day or two. The improvements are very gradual year over year, but they add up to the point where significant improvements take place," he explains.

Peter Bauer, head of ECMWF's Model Division, says predictive skill has improved to the point where researchers are witnessing roughly a one-day-per-decade improvement rate in forecasting. This means today's six-day forecasts are about on par with the accuracy of five-day forecasts a decade ago. "In addition to extending the range and accuracy of large-scale forecasts, the techniques for predicting regional and local weather parameters such as precipitation, surface temperature, and wind have dramatically improved," he points out.


The Sky’s the Limit

In practical terms, even a small improvement in forecasting quality can produce enormous benefits for individuals, businesses, and society, from providing warnings of short-term events such as tornadoes and floods to informing long-term decisions such as how to construct buildings and design infrastructure. For instance, before Hurricane Sandy slammed the northeastern U.S. in October 2012, the ECMWF had successfully predicted the storm's track and intensity five days out, while NWS models lagged by about a day. The deviation in modeling focused attention on the perceived deficiencies of the NWS.

Kyger acknowledges the episode was a "disappointment" for the NWS. It led, in May 2013, to the U.S. Congress approving $23.7 million in supplemental funding to upgrade NWS systems from 90 teraflops to upward of 200 teraflops and to address other issues. However, U.S. forecasting technology continues to generate concerns. "There have been a number of important forecasts where U.S. prediction systems performed in an inferior way," Mass says. A recent blog post by Mass stated the U.S. had slipped into fourth place in global weather prediction, behind facilities in continental Europe, the U.K., and Canada.

“A major reason why the U.S. is falling behind is that the other centers are using far more advanced data assimilation or higher resolution, both of which require very substantial computer power, which the U.S. National Weather Service has been lacking,” Mass explains. Over the last decade, Congress has not provided adequate funding to keep up with the fast-moving computing and data environment; “for a very modest cost, the United States could radically improve weather prediction,” he says.

The upgrades to the NOAA supercomputers completed in August 2013 were the first phase of a two-step plan to increase available processing power. Early results show forecasting improvements of as much as 15% in some cases, and computing power will increase to 1,950 teraflops in 2015 if current funding stays in place. NOAA operates the systems as a scalable private cloud, shares the resources across agencies and tasks, and runs at better than 90% capacity utilization. Kyger says a cluster or grid approach that extends beyond NOAA is not feasible, for financial and practical reasons.

Meanwhile, the ECMWF is continuing to refine and improve its forecasting model. Moving forward, Bauer says, the Centre is attempting to focus on the environmental system in a far more comprehensive way, in order to gain a better understanding of key factors impacting weather, including greenhouse gases, ocean temperatures, and sea ice. "The better the observations and the more critical data points we have, the better the mathematical methods," he explains. "More important in the future will be the prediction of extremes, which places a greater emphasis on predicting the right probabilities of events and doing so in a longer time range."


In the end, the quest for more accurate weather forecasts leads back to the need for more computing power and the development of better algorithms.


IBM’s Deep Thunder initiative is further redefining the space. It has predicted snowfall accumulations in New York City and rainfall levels in Rio de Janeiro with upward of 90% accuracy by taking a somewhat unconventional approach. “We are not looking to use the traditional method of covering a large area with as high a resolution as possible using uniform information,” Treinish says. “We are putting bounds on the computing problem by creating targeted forecasts for particular areas.” As part of the initiative, IBM plugs in additional types of data sources—including agricultural measurements and wind farm data—and manipulates existing sources in different ways.
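To get a feel for why bounding the domain matters, here is an illustrative Python sketch comparing rough grid-column counts for a uniform global grid versus a small targeted area. The 10-km and 1-km spacings and the 100 km x 100 km metro domain are hypothetical numbers chosen for illustration, not IBM's actual Deep Thunder configuration:

```python
# Illustrative only (not IBM's actual Deep Thunder configuration): forecast cost
# scales roughly with the number of horizontal grid columns, so bounding the
# domain makes very high resolution affordable over a target area.

EARTH_SURFACE_KM2 = 510e6     # approximate surface area of Earth in square km
METRO_AREA_KM2 = 100 * 100    # hypothetical 100 km x 100 km target domain

def grid_columns(area_km2: float, spacing_km: float) -> float:
    """Approximate number of horizontal grid columns at a given spacing."""
    return area_km2 / (spacing_km ** 2)

global_10km = grid_columns(EARTH_SURFACE_KM2, 10.0)   # ~5.1 million columns
metro_1km = grid_columns(METRO_AREA_KM2, 1.0)         # 10,000 columns

print(f"uniform global 10-km grid: ~{global_10km:,.0f} columns")
print(f"bounded 1-km metro grid: {metro_1km:,.0f} columns")
```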

In fact, as the Internet of Things (IoT) takes hold, new types of sensors and crowdsourcing techniques will appear, and will further redefine weather forecasting. Kyger says the NWS has already started to experiment with crowdsourcing and other social media input, including data from hyperlocal Twitter accounts. Treinish believes smartphones and other devices could provide insights into everything from temperature and wind conditions to barometric pressure and humidity on a block-by-block level. The challenge, he says, is that the massive amount of data can be “really noisy and not of real high quality.”
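One simple way such noisy crowdsourced readings might be screened is to discard reports that stray too far from the local median. The sketch below is a minimal illustration under an assumed 5-hPa threshold, not the NWS's or IBM's actual quality-control pipeline:

```python
# A minimal sketch of one simple quality-control step for noisy crowdsourced
# readings (illustrative only; not the NWS's or IBM's actual pipeline):
# drop smartphone pressure reports that stray too far from the local median.

import statistics

def filter_reports(reports_hpa, max_dev_hpa=5.0):
    """Keep reports within max_dev_hpa of the batch median."""
    median = statistics.median(reports_hpa)
    return [p for p in reports_hpa if abs(p - median) <= max_dev_hpa]

# Hypothetical block-level pressure reports (hPa); two come from bad sensors.
raw = [1013.2, 1012.8, 1013.5, 997.0, 1013.0, 1031.4]
print(filter_reports(raw))   # the 997.0 and 1031.4 outliers are dropped
```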

Adding to the challenge, the IoT will generate far more data while further taxing already constrained supercomputers.

In the end, the quest for more accurate forecasts leads back to the need for more computing power and the development of better algorithms; that, in turn, drives the need for even more powerful computers. There is an ongoing need for adequate funding and additional IT resources; it also is critical to continually upgrade models using an assortment of statistical and regression analysis techniques, combined with human analysis and judgment.

“The goal,” Kyger says, “is to continually look for things that we can do better. It’s a closed loop cycle that never ends.”




Figures

UF1 Figure. A visualization of data from the NASA Center for Climate Simulation, a state-of-the-art supercomputing facility in Greenbelt, MD, that runs complex models to help scientists better understand global climate. This visualization depicts atmospheric humidity during the Great Mississippi and Missouri Rivers Flood of 1993.

