Advertisement

Research and Advances

Micropipelines

The pipeline processor is a common paradigm for very high speed computing machinery. Pipeline processors provide high speed because their separate stages can operate concurrently, much as different people on a manufacturing assembly line work concurrently on material passing down the line. Although the concurrency of pipeline processors makes their design a demanding task, they can be found in graphics processors, in signal processing devices, in integrated circuit components for doing arithmetic, and in the instruction interpretation units and arithmetic operations of general purpose computing machinery. Because I plan to describe a variety of pipeline processors, I will start by suggesting names for their various forms. Pipeline processors, or more simply just pipelines, operate on data as it passes along them. The latency of a pipeline is a measure of how long it takes a single data value to pass through it. The throughput rate of a pipeline is a measure of how many data values can pass through it per unit time. Pipelines both store and process data; the storage elements and processing logic in them alternate along their length. I will describe pipelines in their complete form later, but first I will focus on their storage elements alone, stripping away all processing logic. Stripped of all processing logic, any pipeline acts like a series of storage elements through which data can pass. Pipelines can be clocked or event-driven, depending on whether their parts act in response to some widely-distributed external clock, or act independently whenever local events permit. Some pipelines are inelastic; the amount of data in them is fixed. The input rate and the output rate of an inelastic pipeline must match exactly. Stripped of any processing logic, an inelastic pipeline acts like a shift register. Other pipelines are elastic; the amount of data in them may vary. The input rate and the output rate of an elastic pipeline may differ momentarily because of internal buffering. Stripped of all processing logic, an elastic pipeline becomes a flow-through first-in-first-out memory, or FIFO. FIFOs may be clocked or event-driven; their important property is that they are elastic. I assign the name micropipeline to a particularly simple form of event-driven elastic pipeline with or without internal processing. The micro part of this name seems appropriate to me because micropipelines contain very simple circuitry, because micropipelines are useful in very short lengths, and because micropipelines are suitable for layout in microelectronic form. I have chosen micropipelines as the subject of this lecture for three reasons. First, micropipelines are simple and easy to understand. I believe that simple ideas are best, and I find beauty in the simplicity and symmetry of micropipelines. Second, I see confusion surrounding the design of FIFOs. I offer this description of micropipelines in the hope of reducing some of that confusion. The third reason I have chosen my subject addresses the limitations imposed on us by the clocked-logic conceptual framework now commonly used in the design of digital systems. I believe that this conceptual framework or mind set masks simple and useful structures like micropipelines from our thoughts, structures that are easy to design and apply given a different conceptual framework. Because micropipelines are event-driven, their simplicity is not available within the clocked-logic conceptual framework. I offer this description of micropipelines in the hope of focusing attention on an alternative transition-signalling conceptual framework. We need a new conceptual framework because the complexity of VLSI technology has now reached the point where design time and design cost often exceed fabrication time and fabrication cost. Moreover, most systems designed today are monolithic and resist mid-life improvement. The transition-signalling conceptual framework offers the opportunity to build up complex systems by hierarchical composition from simpler pieces. The resulting systems are easily modified. I believe that the transition-signalling conceptual framework has much to offer in reducing the design time and cost of complex systems and increasing their useful lifetime. I offer this description of micropipelines as an example of the transition-signalling conceptual framework. Until recently only a hardy few used the transition-signalling conceptual framework for design because it was too hard. It was nearly impossible to design the small circuits of 10 to 100 transistors that form the elemental building blocks from which complex systems are composed. Moreover, it was difficult to prove anything about the resulting compositions. In the past five years, however, much progress has been made on both fronts. Charles Molnar and his colleagues at Washington University have developed a simple way to design the small basic building blocks [9]. Martin Rem's "VLSI Club" at the Technical University of Eindhoven has been working effectively on the mathematics of event-driven systems [6, 10, 11, 19]. These emerging conceptual tools now make transition signalling a lively candidate for widespread use.
Research and Advances

Concurrent operations on priority queues

Among the fastest priority queue implementations, skew heaps have properties that make them particularly suited to concurrent manipulation of shared queues. A concurrent version of the top down implementation of skew heaps can be produced from previously published sequential versions using almost mechanical transformations. This implementation requires O(log n) time to enqueue or dequeue an item, but it allows new operations to begin after only O(1) time on a MIMD machine. Thus, there is potential for significant concurrency when multiple processes share a queue. Applications to problems in graph theory and simulation are discussed in this article.
Research and Advances

Efficient distributed event-driven simulations of multiple-loop networks

Simulating asynchronous multiple-loop networks is commonly considered a difficult task for parallel programming. Two examples of asynchronous multiple-loop networks are presented in this article: a stylized queuing system and an Ising model. In both cases, the network is an n × n grid on a torus and includes at least an order of n2 feedback loops. A new distributed simulation algorithm is demonstrated on these two examples. The algorithm combines three elements: (1) the bounded lag restriction; (2) minimum propagation delays; and (3) the so-called opaque periods. We prove that if N processing elements (PEs) execute the algorithm in parallel and the simulated system exhibits sufficient density of events, then, on average, processing one event would require O(log N) instructions of one PE. Experiments on a shared memory MIMD bus computer (Sequent's Balance) and on a SIMD computer (Connection Machine) show speed-ups greater than 16 on 25 PEs of a Balance and greater than1900 on 214 PEs of a Connection Machine.
Research and Advances

The number of buffers required for sequential processing of a disk file

The number of buffers required for sequential processing of disk files is investigated with the assumption that there is a single user served by two processors: one reads blocks from the disk into buffers in main memory, while the other processor processes the buffers. If processing is faster than reading, then two buffers suffice. However, if processing is slower, using only two buffers does not guarantee minimum completion time. The minimal required number of buffers is calculated in this article.
Research and Advances

A subset coloring algorithm and its applications to computer graphics

We consider the following problem: we are given a diagram made up of intersecting circles, where each region is colored either black or white. We wish to display this diagram on a bitmap device, where we are allowed to (i) paint a given circle white and (ii) invert the colors within a given circle, changing white to black and vice versa. (These operations are frequently provided in graphics hardware or software.) We ask: using only these paint and invert operations, is it possible to draw the diagram? A generalization of this problem leads to an analogous coloring problem on a subset of the power set of n elements. We give a polynomial-time algorithm that answers the question above, and produces a "short" sequence of instructions to draw the diagram, if one exists. A simple modification of the algorithm permits us to handle the case where there are more colors than just black and white, and the colors are represented by bit strings. This corresponds to the conventions frequently used with color raster devices.
Research and Advances

HAM: a general purpose hypertext abstract machine

The HAM is a transaction-based server for a hyper text storage system. The server is designed to handle multiple users in a networked environment. The storage system consists of a collection of contexts, nodes, links, and attributes that make up a hypertext graph. The versatility of the HAM can be illustrated by showing how Guide buttons, intermedia webs, and NoteCard FileBoxes can be implemented using its storage model.
Research and Advances

On visual formalisms

The higraph, a general kind of diagramming object, forms a visual formalism of topological nature. Higraphs are suited for a wide array of applications to databases, knowledge representation, and, most notably, the behavioral specification of complex concurrent systems using the higraph-based language of statecharts.
Research and Advances

Distributed programming in Argus

Argus—a programming language and system developed to support the implementation and execution of distributed programs—provides mechanisms that help programmers cope with the special problems that arise in distributed programs, such as network partitions and crashes of remote nodes.

Shape the Future of Computing

ACM encourages its members to take a direct hand in shaping the future of the association. There are more ways than ever to get involved.

Get Involved

Communications of the ACM (CACM) is now a fully Open Access publication.

By opening CACM to the world, we hope to increase engagement among the broader computer science community and encourage non-members to discover the rich resources ACM has to offer.

Learn More