Daniel A. Reed discusses "Exascale Computing and Big Data" (cacm.acm.org/magazines/2015/7/188732), a Contributed Article co-authored with Jack Dongarra in the July 2015 Communications of the ACM.
---
SCRIPT
00:00 Data management; computer power.
00:10 Each has developed over the years, following dances called by two distinct and separate computer science disciplines.
00:19 Data sources now exceed a quintillion bytes in size, while supercomputers strive to perform a quintillion operations per second.
00:28 This is exascale computing. It may at last bring the two together, as supercomputers process unprecedented data wealth, and data stores teach supercomputers how to think.
00:43 Join us as we talk with ACM Fellow Daniel Reed about Exascale Computing and Big Data.
00:51 [Intro graphics/music]
01:01 Dan Reed is no stranger to explosive growth. He was present at the birth of the first web browser as a professor at the University of Illinois.
01:12 Now from his position at the University of Iowa, he's witness to the twin explosions of data management and supercomputer power.
01:21 I went to Microsoft actually to apply supercomputing architecture ideas to the design of burgeoning cloud datacenters. And then part of why I came back to academia was to try to take some lessons from cloud computing and bring them back into supercomputing.
01:35 Although data size and computing power have grown alongside each other, it's unusual for the two to be examined together.
01:44 The hardware that both communities are using is very similar. ... But the software ecosystems are almost completely disjoint.
01:53 Leading-edge supercomputing has traditionally been the province of scientific and academic fields.
01:59 In research circles, often capital is expensive; the people are relatively inexpensive.
02:06 But big data has been more widely used by companies like Google, which spends as much as a billion dollars per data center.
02:13 In the private sector, the opposite is true. People are very expensive, capital is cheap. And so in each case, they make a different optimization based on the economics that drive their behavior.
02:26 Now in a paper co-written with Jack Dongarra, Dr. Reed says there are reasons to bring the two together. For one thing, science applications need to handle more data than ever before.
02:37 One of the great things that's happened in science has been the explosive growth of experimental data. ... We've gone from a world where data was pretty rare to one where it's extraordinarily plentiful. And so how you build higher-level tools to understand that data is one of those places where insights from the cloud computing world are really relevant.
02:57 On the commercial side, the demands of such applications as voice recognition and video processing require ever more supercomputing power.
03:06 Likewise, people who are in business analytics or the commercial sector are increasingly building very sophisticated computational models, and they need insights and ideas from the supercomputing community. But the people don't talk to each other much because of this software divergence.
03:23 But as the scientific and business communities push the exascale boundary, there's a need to grow together. And the real winners may be you and me.
03:33 Success for exascale computing will not just be building an exascale machine. It will be inexpensive desk-side petascale machines, in the same way the success of petascale computing has been in desk-side terascale machines.
03:49 But to make that happen, exascale data and exascale computing must first learn to dance... together.
03:57 Find out more in this month's Communications of the ACM, in the contributed article, "Exascale Computing and Big Data."
04:07 [Outro and credits]