
Q&A: Gray’s Paradigm

Tony Hey talks about Jim Gray and his vision of a new era of collaborative, data-intensive science.
Tony Hey speaking at the ninth annual Microsoft Research Faculty Summit, which brought together 400 academics from 150 universities across five continents.

Tony Hey, vice president of the External Research Division of Microsoft Research, has long straddled the scientific and computational worlds. Hey began his career as a particle physicist at the University of Southampton before changing fields and serving as head of its School of Electronics and Computer Science. Prior to his appointment at Microsoft, Hey served as director of the United Kingdom’s e-Science Program, where he worked to develop technologies to enable collaborative, multidisciplinary, and data-intensive science. Here, he talks about a book of essays he co-authored, The Fourth Paradigm, which commemorates the work of his late colleague Jim Gray and points the way to a new era of scientific collaboration.

The title of your book, The Fourth Paradigm, refers to the idea that we need new tools to cope with the explosion of data in the experimental sciences.

Jim Gray’s insight was that experimental science and theoretical science have been with us since Newton, and over the last 50 years, computational science has matured as a methodology for scientific research. Jim thought that we are now seeing the emergence of a fourth paradigm for scientific research, namely data-intensive science. For this, researchers need a different set of skills from those required for experimental, theoretical, and computational science.

Different skill sets such as?

For data-intensive science, researchers need a totally new set of skills, such as an understanding of data mining, data cleansing, data visualization, and how relational databases work. The new data-intensive research paradigm does not replace the other ones; it’s quite clear that data-intensive science uses both theory and computation.

How did you come to be involved in this line of research?

I first met Jim Gray in 2001, when I was running the U.K.’s e-Science Program. In discussions with Jim over the next five years, I came to agree with his view that the computer science community can really make a difference to scientists who are trying to solve difficult problems.

And that’s an idea you carry on in your work with Microsoft?

Indeed. Computer science has powerful technologies it can offer scientists, but also things it can learn from tackling some of the difficult scientific challenges. So I really have a wonderful job, working both with great computer scientists and with great scientists.

The essays in The Fourth Paradigm focus on new research in areas like environmental science, health, infrastructure, and communication.

There are important problems facing the world that we need to solve. The book is a call to arms to the scientific and computer science communities.

It’s also a great testament to what can happen when scientists and computer scientists collaborate.

Yes. One of the tools we have produced in a project with the Berkeley Water Center is called SciScope. The researchers have been looking at the hydrology of the Russian River Valley in California, in which the patterns of use have completely changed over the last 50 years. Trees have been chopped down, rivers have been dammed, houses have been built, and all those sorts of things. The U.S. Geological Survey has stream data going back many years, but if you want to combine it with the rainfall data over the same period, that’s held by the National Oceanic and Atmospheric Administration, a different government agency.

So SciScope enables you to combine the two data sets.

You can add your own data and do new research. It’s an example of what I call “scientific mashups,” and it is, I think, increasingly how much research will be done in some fields. It’s a little like Tim Berners-Lee’s vision of the Semantic Web, but in a scientific context.

Astronomy is another field that has benefited from computer science.

The Sloan Digital Sky Survey changed everything, because it generated a high-resolution survey of 25% of the night sky. So, instead of an astronomer getting time on a telescope to look at a particular star system, going back to the university, analyzing the data, and publishing the results with one or two grad students, you’ve now got data on more than 300 million celestial objects available to study. In this case, the data is published before any detailed analysis has been done.

Gray was instrumental in building online databases to house the Sloan Digital Sky Survey data.

Jim and Alex Szalay also thought they could apply the same sort of infrastructure to a sensor network, so we built a sensor network in the grounds of Johns Hopkins University to investigate soil science. The exciting thing is that a similar sensor network is now being deployed in Latin America, in the Atlantic rainforest near São Paulo.


What have these projects taught you about fostering meaningful collaboration between the scientific and computer science communities?

I’ve come to the conclusion that you cannot force scientists to adopt a technology no matter how useful you think it would be for them! You have to get as close to their way of working as possible and give them an immediate win. You can’t say, “Go climb this cliff, and at the top there’s a reward.” So you need to form a partnership where there’s an early win for the scientist and a win for you in that they’re using at least some of your great research technology, suitably packaged to be usable by scientists.

What sort of reception has The Fourth Paradigm received?

It’s been very complimentary, which is gratifying, and there’s been a huge explosion on Twitter and in the blogosphere. We’re working on ideas for a follow-up, and I’m working with the National Science Foundation’s Advisory Committee on Cyberinfrastructure on a data task force. It would be premature to say we know exactly what people need, since that’s what the scientific community has to tell us. We haven’t got there yet, and that’s one of the reasons why it’s a very exciting time in science and computer science.

