"If people sat outside and looked at the stars each night, I'll bet they'd live a lot differently."Bill Watterson, Calvin and Hobbes
Who has not been in awe and had their breath taken away while gazing at the star-studded night sky? Looking into infinity helps us realize the universe is more vast than we could ever contemplate and helps put our daily lives in perspective to a much larger cosmos.
Dark energy and dark matter, the observable universe and its beginnings, the structure and formation of galaxies and stars: these are just some of the topics that computational cosmologists work on in order to understand the universe. As the universe expanded and temperatures cooled after the early hot phase, the present complex structure of the universe started to form. The expansion of the universe is now accelerating due to a mysterious dark energy. The mass in the universe is dominated by dark matter; galaxies and stars are located within extended clumps of this form of matter, whose ultimate nature is unknown.
Modern cosmological simulations of ever-higher accuracy and larger complexity have an insatiable appetite for computing resources: processors, memory, and high-performance networks. Simulating the universe will stretch the capabilities of the most advanced supercomputers for the foreseeable future. This demand is driven by the desire for increased realism and accuracy, and by the large amount of simulated and observed data that must be jointly analyzed.
The following paper describes the Hardware/Hybrid Accelerated Cosmology Code (HACC) framework. The architecture of the framework is organized around particle and grid methods. HACC uses a novel algorithmic structure to map code onto multiple supercomputer platforms, built around different node-level architectures, whether heterogeneous CPU/GPU or multi/many-core systems.
During the last decade, the manner in which computers are built has changed drastically. Hardware evolution embraced disruptive change once increasing clock frequency was no longer a viable path to increasing performance. Previously, the way to achieve higher performance was to follow Dennard scaling: transistor sizes were scaled down, and the smaller feature size allowed higher circuit operating frequency. Computer chips ran at higher frequency and delivered higher performance.
The frequency increase of computer chips stopped due to power density constraints; at the same time, shrinking transistor sizes also hit a wall due to the eventual constraint set by the atomic nature of matter. Thus, computer architects in pursuit of building high-performance computer systems turned toward increasing the number of cores, rather than increasing the performance of each core. In this brave new world, high performance was to be achieved through extreme parallelism rather than high frequency. The most energy-efficient approach to reaching maximum performance is to increase parallelism using efficient low-frequency processors. For example, using a large number of very simple processors with limited capabilities (accelerators) is very energy efficient. The ensuing revolution changed how computers are built. Accelerators are now common, making computing inexpensive, while memory and communication remain relatively more expensive, thereby changing the balance of memory and communication per compute unit.
Revolutionary new architectures appeared in the middle of the last decade, such as the Roadrunner supercomputer with IBM's Cell processor chip, the first supercomputer to cross the petaflop barrier. The Cell chip introduced an asymmetric architecture, containing a general-purpose Power core connected to a number of simple accelerators. This architecture, new at the time, required a different programming approach, and scientists, including members of the HACC team, started rewriting their code in order to handle architectural diversity.
After Roadrunner, the team ran their code on two very different machines: on the BlueGene/Q, the third generation of the IBM Blue Gene supercomputer, running on more than 1.5 million compute cores, and on Cray's Titan, a hybrid Cray XK7 system with GPU accelerators. HACC demonstrated high performance and good scaling on these two different architectures.
The evolution of hardware architecture poses a number of challenges for software developers, and particularly for scientific code developers. These are typically small communities that maintain their codes over many years. As access to supercomputers is typically limited, granted time on these machines needs to be spent wisely, by running performance-optimized codes. This puts a requirement on domain scientists to adapt and optimize their code for the target machine.
In order to adapt their codes to new machines, the scientists must understand all levels of system architecture of the target machine. The authors explore this topic. How does one code physical models for the vast variety of supercomputers with very different architectures: architectures with or without accelerators, and with very different ratios of computing, memory, and networking? And, maybe most importantly, how can that code be made both portable between these very different architectures and able to execute with high performance on all of them?
To view the accompanying paper, visit doi.acm.org/10.1145/3015569
The Digital Library is published by the Association for Computing Machinery. Copyright © 2017 ACM, Inc.