The MPI community recently celebrated 25 years since the start of the MPI standardization effort. This early-1990s effort was prompted by the emergence of commodity clusters as a replacement for vector machines, in what was dubbed by Eugene Brooks as "The attack of the killer micros." Commodity clusters needed very different software than vector systems, and two efforts were started to satisfy this need: The first effort, developed by the High Performance Fortran Forum, was HPF, a data-parallel extension to Fortran 90 that would provide portability across vector, SIMD, and cluster systems. The more modest second effort, developed by the Message Passing Interface Forum, was MPI, a portable message-passing library aimed specifically at clusters.
The MPI effort succeeded beyond the dreams of the early forum members. Today, all large supercomputers are commodity clusters, all support MPI, and basically all large scientific application codes, as well as an increasing number of data analytics codes, use MPI. The same will be true for the coming generation of exascale systems.
Early competitors to MPI, including HPF, have disappeared. This success has multiple reasons: some good choices made in the MPI design, the relative ease of its implementation, the early availability of high-quality implementations, the confidence that an MPI library will continue to be available on future HPC systems, and the malleability of a library solution that can support multiple programming styles.
One critical cause of this success has been the continued evolution of the MPI specification, in support of evolving architectures and application needs: The MPI 1.1 specification, released in June 1995, was a document of 231 pages describing 128 functions; the MPI 3.1 specification, released in June 2015, is an 836-page document describing 451 functions. Over time, MPI came to accommodate threads, parallel I/O, and an extensive set of collective operations, including non-blocking ones.
One major extension to MPI has been the introduction of one-sided communication, first in MPI 2.0, and then, with major additions, in MPI 3.0. The main communication paradigm for MPI point-to-point communication has been two-sided communication, where a send call at the source is matched by a receive call at the destination. This paradigm has weaknesses: The complex rules for matching sends to receives result in significant software overheads, especially for receive operations; overlap of communication and computation requires the presence of an asynchronous communication agent that can poll queues concurrently with ongoing computation; and send-receive communication requires either an extra copy of each message (eager protocol) or extra handshakes between sender and receiver (rendezvous protocol).
One-sided communication requires the involvement of only one process: the source process (for Put) or the destination process (for Get). This already enables a significant reduction of software overheads. One-sided communication does require the involved process to provide the location of both the local and remote communication buffers; this is rarely a problem since the same association between local and remote buffer tends to be reused multiple times. It also separates communication from synchronization, as only one of the two communicating processes will know the communication occurred; this is often an advantage, as one synchronization can cover multiple communications. Most importantly, one-sided communication, especially Put, is a very good match to the capabilities of modern Network Interface Controllers (NICs): They very often support remote direct memory access (RDMA) operations whereby local and remote NICs collaborate in copying data from local memory to remote memory with no software involvement, aside from the call that initiates the transfer at the source node. Therefore, one-sided communication has the potential to significantly reduce the software overheads for communication.
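The separation of communication from synchronization can be illustrated with a toy, in-process model. This is only a sketch, not MPI: the `Window` class and its `put` and `fence` methods are hypothetical names that loosely mirror the roles of `MPI_Put` and `MPI_Win_fence`, and the model assumes that remote data is guaranteed visible only after the synchronization call.

```python
class Window:
    """Toy stand-in for a block of remotely accessible memory (hypothetical)."""

    def __init__(self, size):
        self.memory = [0] * size   # memory exposed by the target process
        self._pending = []         # puts issued but not yet synchronized

    def put(self, offset, values):
        # Origin side only: the caller names the remote location and supplies
        # the data. The target posts no matching receive.
        if offset < 0 or offset + len(values) > len(self.memory):
            raise IndexError("put outside the exposed window")
        self._pending.append((offset, list(values)))

    def fence(self):
        # One synchronization completes every put issued since the last fence.
        completed = len(self._pending)
        for offset, values in self._pending:
            self.memory[offset:offset + len(values)] = values
        self._pending.clear()
        return completed


def demo():
    win = Window(8)
    win.put(0, [1, 2])
    win.put(2, [3, 4])
    win.put(6, [9])
    before = list(win.memory)   # the target cannot rely on the data yet
    completed = win.fence()     # a single synchronization covers all three puts
    return before, completed, win.memory


if __name__ == "__main__":
    print(demo())
```

In this model, three transfers are issued with no action by the target, and a single `fence` call makes all of them visible. A real implementation would hand the transfers to the NIC rather than buffer them in software.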
This is extremely important as the next generation of networks and NICs will have the capability of handling tens or hundreds of millions of messages per second: With current communication protocols, this would mean that tens of GigaOps would be consumed by communication.
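The arithmetic behind this estimate can be sketched as follows. The per-message software cost used here (300 instructions) is an assumed, illustrative figure, not one taken from the text; only the message rates come from the paragraph above, and the function name is hypothetical.

```python
def ops_consumed_per_second(messages_per_second, instructions_per_message):
    """CPU operations per second spent purely on communication software."""
    return messages_per_second * instructions_per_message


def demo():
    assumed_instructions_per_message = 300   # assumption, for illustration only
    rates = [10_000_000, 100_000_000]        # tens to hundreds of millions of messages/s
    return {
        rate: ops_consumed_per_second(rate, assumed_instructions_per_message)
        for rate in rates
    }


if __name__ == "__main__":
    for rate, ops in demo().items():
        print(f"{rate:>11,} msgs/s -> {ops // 10**9} GigaOps/s")
```

Under this assumption, 100 million messages per second would consume 30 GigaOps per second, which is why reducing the per-message software path matters more as message rates grow.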
The following paper convincingly shows that the potential of MPI one-sided communication can be realized. It provides both a general framework for the efficient implementation of MPI one-sided communication on modern architectures, and experimental proof that such an implementation can significantly reduce communication overheads and improve the performance of large-scale applications. The paper is timely and important for two reasons: First, users tend to avoid new features in MPI (or other software) unless they have convincing proof of their advantages and a solid implementation; the paper provides such proof, along with guidance for new releases of the MPI library. Second, hardware vendors are often focused on optimizing their future systems for past applications; NIC designers are focused on accelerating two-sided communication, as it is currently the main communication paradigm. The paper provides a timely warning that more attention must be devoted to one-sided communication.
To view the accompanying paper, visit doi.acm.org/10.1145/3264413
The Digital Library is published by the Association for Computing Machinery. Copyright © 2018 ACM, Inc.