Opinion
Architecture and Hardware Viewpoint

From Terabytes to Insights

For scientists and engineers tapping the NSF's high-performance cyberinfrastructure, the path to wisdom follows a route both miraculous and familiar.

One of my favorite yardsticks of wisdom comes from Ralph Waldo Emerson over a century ago. "The invariable mark of wisdom," he wrote, "is to see the miraculous in the common."

As scientists, engineers, and educators, we are privileged to have our lives infused with the miraculous. Discovery, learning, and innovation are paths we travel daily. Many of us have seen our work transformed in unimagined ways by the power and breadth of the information and communications revolution.

For my Ph.D. research in the early 1960s, I used an IBM 650 to classify marine bacteria, writing a program to handle what we then considered a large amount of data gathered on several hundred bacterial cultures. My research on the environmental factors that converge to cause cholera has traveled many miles since the IBM 650: to the sequencing of the organism that causes cholera, to the handling of vast amounts of climate data gathered by satellites, and to easy communication with colleagues around the world, particularly those working in countries where cholera remains a deadly scourge. While the IBM 650 is now literally a museum piece at the Smithsonian Institution, the changes born of the information age have transformed the way infectious diseases are understood and revealed new prospects for ameliorating their lethal consequences.

The National Science Foundation has long partnered with the science and engineering community to support frontier research in computing and network science and educate the next generation of scientists and engineers. We supported campus computing centers in the 1960s and computational science in the 1970s. The first supercomputer centers and networks linking researchers came on the scene in the 1980s. They were followed by the Partnerships for Advanced Computational Infrastructure (PACI) in the 1990s.

This decade has brought the Terascale and Grid initiatives, and we have set our sights on uniting distributed resources into the most advanced computing facility available to U.S. scientists for all types of research. It is a step toward a cyberinfrastructure of high-performance computing, high-bandwidth networks, very large data stores, and sophisticated tools for knowledge discovery.

Teams of researchers in every field of science and engineering are laying the foundations for a cyberinfrastructure revolution, and NSF is planning to launch a related effort to address these growing research and education needs. In May 2001 we chartered the Advisory Committee on Cyberinfrastructure to consult with the science and engineering community; its final report, issued in February 2003, recommends a sustained cyberinfrastructure program for all of science and engineering in the coming years. NSF’s PACI partnerships and Terascale facilities have laid the groundwork for this initiative and built a community eager to carry it forward. A great challenge today is to sustain the momentum of discovery and realize the progress our new tools promise.

Empowering the Best Minds

More than any other fields, astronomy and physics already benefit from the supercomputing revolution. For example, the Laser Interferometer Gravitational-Wave Observatory (LIGO) is designed to search for gravitational waves produced by colliding black holes and collapsing supernovae. LIGO will join with other gravitational-wave observatories around the world to form a network that is more than the sum of its parts. The Grid Physics Network is joining LIGO and the Sloan Digital Sky Survey with the Large Hadron Collider at the European Organisation for Nuclear Research (CERN) to form a computational and communications grid linking resources in the U.S. and Europe. Similarly, the National Virtual Observatory (NVO) now brings data from all wavelengths, and from both ground- and space-based telescopes, to an international community of astronomers. An NVO prototype has even produced an early and unexpected payoff: a new example of a brown dwarf, a difficult-to-find type of star. Such a virtual observatory will ultimately change the way astronomy of every kind is done.

While databases in astronomy and physics are orders of magnitude larger than those in other fields, planned instruments and observational platforms will boost data collection from many disciplines in the years to come.

In the Network for Earthquake Engineering Simulation, researchers from across the U.S. will study how building design, advanced materials, and other measures can minimize earthquake damage and loss of life. They will operate equipment and observe experiments from anywhere on the Net. The first test of the requisite Grid technology, conducted in November 2002, used a shake table to vibrate a model bridge fitted with about 100 sensors and streamed video and sensor data to the watching engineers, who then analyzed the bridge’s performance.

In the life sciences, these same new information and communication tools, combined with advances in molecular biology, fueled the second great scientific revolution of the last century: genomics. Scaling from the tiny genome of the first bacterium sequenced, Haemophilus influenzae, with 1.8 million base pairs, to the 3.12 billion base pairs of the human genome was a leap of enormous computational and scientific complexity. Completing the human genome project might have taken years or decades longer without the terascale power of our newest computers.

We have completed the sequencing of scores of organisms, from many of the microorganisms that cause human disease, including the parasite that causes malaria, to the tiny Arabidopsis thaliana, which serves as a model for plant research, as well as rice and the laboratory mouse. Sequencing is under way on a host of other organisms and on many of the world’s major food crops.

A current challenge is to describe gene function and unravel the structure and function of proteins. It can take from 20 milliseconds to several seconds for a nascent protein to fold into its functional conformation. Until recently, it took 40 months of computer time to simulate such folding. Terascale computer systems have reduced the time to one day. However, even at today’s speeds, understanding the function of each protein requires many of the best minds in the world and advanced cyberinfrastructure to empower them.

Biocomplexity and Life on Earth

The combination of computing, communications, and genomics has also transformed our understanding of the diversity of life on Earth and its evolution. Cyberinfrastructure is needed here to plot the intricate relationships among organisms. We don’t know what’s out there. The total number of species may be between 10 million and 100 million. Only about 1.7 million of them are known, and only about 50,000 have been described in any detail in the scientific literature. With our new tools, we can, for the first time, envision tracing the phylogenetic relationships among all organisms. The tree of life is the baseline against which we will measure how organisms—including ourselves—interact and respond to change.

In this context, the NSF’s planned National Ecological Observatory Network will be invaluable in tracking environmental change, from the microbial to the global. Today, we do not have the capability to answer ecological questions on a continental or even regional scale, whether they involve invasive species that threaten agriculture, the spread of disease, or agents of bioterrorism.

I use the term "biocomplexity" to describe the dynamic web of relationships arising when living things interact with their environment. A biocomplexity perspective can reveal to us all sorts of surprising connections and offer great potential for insight. A robust, flexible, comprehensive cyberinfrastructure will help us achieve this perspective.

The following example is just a hint of the power of biocomplexity studies. Erich Jarvis of Duke University Medical Center, the 2002 NSF Waterman Award winner, is investigating the neurobiology of vocal communication in songbirds to determine how vocal learning and its associated brain structures evolved. Vocal learning, the ability to imitate sounds, is present in only three groups of birds (parrots, hummingbirds, and songbirds) and three groups of mammals (bats, whales, and humans). Evidence suggests that vocal learning evolved independently in all six groups and is therefore accompanied by anatomically distinct patterns of gene expression. By unraveling this puzzle, Jarvis hopes to model how the brain generates, perceives, and learns behavior. This could advance our knowledge of brain dysfunction, the evolution of intelligence, and how humans learn language. His work integrates behavioral, anatomical, electrophysiological, molecular biological, and bioinformatics techniques.

Only Steps

Ultimately, gaining insights from terabytes will speed the application of new knowledge to domestic and global problems. However, data, computing speed, and networks are only steps on the path to wisdom; they do not constitute wisdom. We now recognize that changes in global climate cannot be understood without taking into account the way humans’ personal and institutional actions affect the atmosphere, the oceans, and the land. We know that providing a secure homeland will increasingly depend on understanding other cultures (their ideas and attitudes) as well as on advancing cybersecurity and antidotes to biological and chemical threats.

The greatest question may be how we can avoid the pitfalls and still grasp the opportunities in science and technology. The physical and personal differences that once seemed vast are shrinking; every part of the globe will soon seem as close as our own backyard. We need to keep our eyes on that future and plan now for the time when we are all each other’s next-door neighbors. That view of our place in the cosmos will define science and engineering for a 21st-century society. Cyberinfrastructure will help take us there and beyond.