Communications of the ACM

Research highlights

Exploiting Vector Instructions with Generalized Stream Fusion


Ideally, a program written as a composition of concise, self-contained components should perform as well as an equivalent hand-written version in which the functionality of those components has been manually combined into a monolithic implementation. That is, programmers should not have to sacrifice code clarity or good software engineering practices to obtain performance; we want compositionality without a performance penalty. This work shows how to attain this goal for high-level Haskell in the domain of sequence-processing functions, which includes applications such as array processing.

Prior work on stream fusion [3] shows how to automatically transform some high-level sequence-processing functions into efficient implementations. It has been used to great effect in Haskell libraries for manipulating byte arrays, Unicode text, and unboxed vectors. However, some operations, like vector append, do not perform well within the stream fusion framework. Others, like SIMD computation using the SSE and AVX instructions available on modern x86 chips, do not seem to fit in the stream fusion framework at all. We describe generalized stream fusion, which solves these issues through a careful choice of stream representation. Benchmarks show that high-level Haskell code written using our compiler and libraries can compile to code that is faster than both compiler- and hand-vectorized C.

1. Introduction

It seems unreasonable to ask a compiler to be able to turn numeric algorithms expressed as high-level Haskell code into tight machine code. The compiler must cope with boxed numeric types, handle lazy evaluation, and eliminate intermediate data structures. However, the Glasgow Haskell Compiler has become "sufficiently smart" that, in many domains, Haskell libraries for expressing numerical computations no longer have to sacrifice speed at the altar of abstraction.

The key development that made this sacrifice unnecessary is stream fusion [3]. Algorithms over sequences, whether they are lists or vectors (arrays), are expressed naturally in a functional language using operations such as folds, maps, and zips. Although highly modular, these operations produce unnecessary intermediate structures that lead to inefficient code. Eliminating these intermediate structures is termed deforestation, or fusion. Equational laws, such as map f ○ map g ≡ map (f ○ g), allow some of these intermediate structures to be eliminated; finding more general rules has been the subject of a great deal of research.
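To make the idea concrete, here is a minimal sketch of the classic stream representation from the stream fusion literature, applied to lists. It is an illustration under stated assumptions, not the article's generalized representation: the names `Step`, `Stream`, `stream`, `unstream`, `mapS`, and `mapL` are chosen for this example. Each list operation is written as a non-recursive transformer over a step function, so that a compiler can inline the pipeline and cancel adjacent `stream`/`unstream` pairs (in the original formulation, via a rewrite rule stating that `stream (unstream s)` equals `s`).

```haskell
{-# LANGUAGE ExistentialQuantification #-}
module Main where

-- One step of a stream: yield an element, skip, or finish.
data Step s a = Yield a s | Skip s | Done

-- A stream is a step function plus an initial state; the state type is hidden.
data Stream a = forall s. Stream (s -> Step s a) s

-- Convert a list to a stream. The step function is not recursive.
stream :: [a] -> Stream a
stream xs0 = Stream next xs0
  where
    next []       = Done
    next (x : xs) = Yield x xs

-- Convert a stream back to a list. This is the only recursive loop.
unstream :: Stream a -> [a]
unstream (Stream next s0) = go s0
  where
    go s = case next s of
      Done       -> []
      Skip s'    -> go s'
      Yield x s' -> x : go s'

-- Map over a stream by wrapping its step function; no intermediate list.
mapS :: (a -> b) -> Stream a -> Stream b
mapS f (Stream next s0) = Stream next' s0
  where
    next' s = case next s of
      Done       -> Done
      Skip s'    -> Skip s'
      Yield x s' -> Yield (f x) s'

-- A list-level map defined through streams (hypothetical name).
mapL :: (a -> b) -> [a] -> [b]
mapL f = unstream . mapS f . stream

main :: IO ()
main = do
  let xs = [1 .. 5] :: [Int]
  -- Two composed maps and one fused map agree: map f . map g == map (f . g).
  print (mapL (+ 1) (mapL (* 2) xs))
  print (mapL ((+ 1) . (* 2)) xs)
```

Both `print` calls produce the same list, which is the law map f ○ map g ≡ map (f ○ g) observed on one input. Whether the intermediate list between the two `mapL` calls is actually eliminated depends on the compiler's inlining and rewrite rules, which this sketch does not declare.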

