Technical Perspective: The Software-Centric Approach of SYNERGY

With performance and power limitations becoming the greatest challenges in today’s datacenters, there is significant interest in using more application-specific computing devices, often called “accelerators.” The Field-Programmable Gate Array (FPGA) is one such device demonstrated to significantly address both challenges. An FPGA is a programmable device that can implement hardware circuits. The circuits are specified using bits loaded into a configuration memory that defines the logic and connections for the desired circuit. This is analogous to the binary object code that specifies processor instructions in the software world.

FPGAs were first intended to build chips faster, without the significant overhead of going through a full chip fabrication cycle. The ability to reprogram FPGAs quickly led to them becoming a technology for doing application-specific computing. The FPGAs in today’s datacenters are allocated to a user as physical devices dedicated for the duration of an application. The challenge is to enable sharing FPGAs among multiple users in the same way that processors are shared by multiple users using context switching.

The computing model for software has a memory for data and instructions and a relatively small processor state. The simplest form of context switching requires saving the processor state and leaving everything else in memory. Starting a new context requires loading its processor state and switching the virtual memory mappings to point to the memory of the new context. This makes virtualization of software processes straightforward, and the techniques are well understood.
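To make the software case concrete, here is a minimal sketch in Python of the context switch just described. It is a toy model, not any real operating system's mechanism: the names `Context`, `ToyCPU`, and `switch_to`, and the three-register layout, are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Per-process state that must survive a switch (hypothetical model)."""
    registers: dict = field(default_factory=lambda: {"pc": 0, "sp": 0, "r0": 0})
    page_table: dict = field(default_factory=dict)  # virtual page -> physical frame


class ToyCPU:
    """Toy processor: a few registers plus a pointer to the active page table."""

    def __init__(self):
        self.registers = {"pc": 0, "sp": 0, "r0": 0}
        self.page_table = {}
        self.current = None

    def switch_to(self, nxt):
        # Save the small processor state of the outgoing context.
        if self.current is not None:
            self.current.registers = dict(self.registers)
        # Load the incoming context's registers and point at its page table.
        # Memory contents are never copied; only the mapping changes.
        self.registers = dict(nxt.registers)
        self.page_table = nxt.page_table
        self.current = nxt
```

The point of the sketch is how little is saved: a handful of registers and one table reference, regardless of how much memory the process uses.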

FPGAs have a much more complex computing model because an application is essentially a circuit. The application must fit within the area available on the FPGA. The memory model is more complex still: there can be many small memories scattered throughout the circuit, each associated with a different part of it, and a memory can be as small as a single bit in a flip-flop. FPGAs also have arithmetic blocks containing multiplier circuits and registers, usually called DSP units because they were originally included to support digital signal processing. All the small memories, flip-flops, and registers are state, and supporting virtualization requires some mechanism to save and restore that state. Some approaches reduce the amount of state to be handled by allowing an application to be paused only at times when the state is more easily captured, because it then resides only in specific memories and registers that are made accessible for saving and restoring.
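The idea of pausing only when the state is easy to capture can be sketched with a toy model in Python. Everything here is hypothetical: `ToyCircuit`, its multiply-accumulate behavior, and the `snapshot`/`restore` methods are illustrative assumptions, not a model of any real FPGA or of a specific published approach.

```python
class ToyCircuit:
    """Hypothetical multiply-accumulate circuit with state in several kinds of elements."""

    def __init__(self):
        self.flip_flops = {"busy": 0}   # single-bit state
        self.dsp_regs = {"acc": 0}      # register inside an arithmetic block
        self.block_ram = [0, 0, 0, 0]   # a small on-chip memory
        self._in_flight = []            # internal pipeline contents, not externally readable

    def start(self, values):
        self._in_flight = list(values)
        self.flip_flops["busy"] = 1 if self._in_flight else 0

    def clock(self):
        """Advance one cycle: consume one input, or finish and go idle."""
        if not self.flip_flops["busy"]:
            return
        self.dsp_regs["acc"] += 2 * self._in_flight.pop(0)
        if not self._in_flight:
            self.block_ram[0] = self.dsp_regs["acc"]
            self.flip_flops["busy"] = 0

    def snapshot(self):
        # Only allowed when idle: in-flight data cannot be read out in this model.
        if self.flip_flops["busy"]:
            raise RuntimeError("not at a safe point: state is still in flight")
        return {
            "flip_flops": dict(self.flip_flops),
            "dsp_regs": dict(self.dsp_regs),
            "block_ram": list(self.block_ram),
        }

    def restore(self, snap):
        self.flip_flops = dict(snap["flip_flops"])
        self.dsp_regs = dict(snap["dsp_regs"])
        self.block_ram = list(snap["block_ram"])
```

In this model the price of a simple snapshot is latency: a request to pause must wait until the circuit drains its in-flight work and reaches an idle point.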

Virtualizing FPGAs is a problem that requires careful collaboration between the hardware and software communities. FPGAs were not invented with any intention of using them for application-specific computing, and they have no features aimed at supporting computing, such as support for virtualization. Solutions proposed to this point have mostly come from the hardware side, without full consideration of what the software world requires.

The accompanying paper describes SYNERGY, which takes an interesting software-centric approach that satisfies many of the requirements of virtualization. It can support task suspension and resumption, task migration, spatial and temporal multiplexing of multiple processes, and some level of process isolation. The approach begins by rewriting the source code, written in the Verilog hardware description language, so that it is compliant with a Verilog JIT compiler. This leverages the compiler interface (ABI) as a standard interface for a runtime that can manage the state variables, for example by saving and restoring them, and can even handle tasks such as file I/O that are not easily accessible from hardware. With this ABI, applications can even be migrated between different families of FPGAs.
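Why a standard state interface enables migration can be illustrated with a small Python sketch. This is not SYNERGY's actual ABI, which the paper itself defines; the `Engine` protocol, the `get_state`/`set_state`/`step` methods, the two counter backends, and `migrate` are all hypothetical names chosen for illustration.

```python
from typing import Protocol


class Engine(Protocol):
    """Hypothetical interface a runtime could use to manage an application's state."""
    def get_state(self) -> dict: ...
    def set_state(self, state: dict) -> None: ...
    def step(self) -> None: ...


class CounterOnDeviceA:
    """Backend that keeps the counter as a plain integer."""

    def __init__(self):
        self.count = 0

    def get_state(self):
        return {"count": self.count}

    def set_state(self, state):
        self.count = state["count"]

    def step(self):
        self.count += 1


class CounterOnDeviceB:
    """Backend with a different internal layout: eight little-endian bits."""

    def __init__(self):
        self._bits = [0] * 8

    def get_state(self):
        return {"count": sum(b << i for i, b in enumerate(self._bits))}

    def set_state(self, state):
        self._bits = [(state["count"] >> i) & 1 for i in range(8)]

    def step(self):
        self.set_state({"count": (self.get_state()["count"] + 1) % 256})


def migrate(src: Engine, dst: Engine) -> Engine:
    # The runtime sees only named state variables, never the physical layout.
    dst.set_state(src.get_state())
    return dst
```

Because the runtime moves named variables rather than raw flip-flop contents, the two backends are free to store the same logical state in entirely different physical forms.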

This software-centric approach views an application at a much higher level of abstraction by identifying the important state variables and the times when it is safe to manipulate the state. In contrast, hardware-centric approaches to this problem view the application as a circuit with state changes occurring at every clock cycle and the state being held in flip-flops, registers, and various memories. Saving this low-level state is clearly much more challenging.

The implementation of SYNERGY represents a tremendous amount of work and shows how a different perspective can offer new directions to investigate. For example, I’m a hardware person who would never have thought of code rewriting and JIT compilers! More sophisticated hardware designs must still be tested and demonstrated, which may reveal constraints on design styles that need to be identified. SYNERGY incurs significant overheads in circuit area and in time for saving and restoring state that would currently limit its use, although the authors believe some of these can be addressed with further work. Matching the performance currently achieved by virtualization of pure software will require even more changes. What is needed is to take what can be learned from this paper and what the hardware community has tried in its implementations of virtualization, and to specify what architectural features can be added to FPGAs to make virtualization feasible with minimal overheads. Processors have already gone through this evolution.