malloc and Dynamic Memory Allocation
When you call functions in an API, you expect them to work correctly; sometimes this expectation is called a contract between the caller and the implementation. Callers also have performance expectations about these functions, and often the success of a software system depends on the API meeting these expectations.
So there is a performance contract as well as a correctness contract. The performance contract is usually implicit, often vague, and sometimes breached (by caller or implementation). How can this aspect of API design and documentation be improved?
Any significant software system today depends on the work of others: sure, you write some code, but you call functions in the operating system and in a variety of software packages through APIs that cut down on the amount of code you must write. In some cases you even outsource work to distant servers hooked to you by cranky networks. You depend on these functions and services for correct operation, but you also depend on them for performing well enough so the overall system performs well. There is bound to be performance variation in complex systems that involve paging, network delays, sharing resources such as disks, and so on. Even in simple setups such as a stand-alone computer with all program and data in memory, however, you can be surprised when an API or operating system does not meet performance expectations.
People are accustomed to speaking of the contract between an application and the implementation of an API to describe correct behavior when an API function is invoked. The caller must meet certain initial requirements, and the function is then required to perform as specified. (These dual responsibilities are like the pre- and post-conditions that appear in Floyd-Hoare logic for proving program correctness.) While today's API specifications do not make correctness criteria explicit in a way that leads to correctness proofs, the type declarations and textual documentation for an API function strive to be unambiguous about its logical behavior. If a function is described as "add the element e to the end of list l" where e and l are described by types in the calling sequence, then the caller knows what behavior to expect.
There is more to an API function than correctness, however. What resources does it consume, and how fast is it? People often make assumptions based on their own judgments of what the implementation of the function being performed should be. Adding an element to the end of a list should be "cheap." Reversing a list might be "a linear function of list length." For many simple functions, these intuitions suffice to avoid troublethough not always, as described later in this article. For functions of any complexity, however, such as "draw string s in font f at (x,y) in window w," or "find the average of the values stored on a remote file," there is room for surprise if not frustration at the performance. Further, the API documentation provides no hints about which functions are cheap and which are costly.
To complicate matters, after you have tuned your application to the performance characteristics of an API, a new version of the API implementation arrives or a new remote server stores the data file, and rather than overall performance increasing (the eternal hope for something new), it tanks. In short, the performance contracts that accompany the composition of software components in systems deserve more attention.
Programmers begin building intuitive API performance models early on (see Sidebar 1). To be useful, the model need not be very accurate. Here is a simple taxonomy:
Always cheap (examples: toupper, isdigit, java.util.HashMap.get). The first two functions are always cheap, usually inlined table lookups. Lookup in a properly sized hash table is expected to be fast, but a hash collision may slow down an occasional access.
Usually cheap (examples: fgetc, java.util.HashMap.put). Many functions are designed to be fast most of the time but require occasionally invoking more complex code; fgetc must occasionally read a new character buffer. Storing a new item in a hash table may make the table so full that the implementation enlarges it and rehashes all its entries.
The documentation for java.util.HashMap makes a commendable start at exposing a performance contract: "This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the 'capacity' of the HashMap..."3
The performance of fgetc depends on properties of the underlying stream. If it is a disk file, then the function will normally read from a user-memory buffer without requiring an operating-system call, but it must occasionally invoke the operating system to read a new buffer. If it is reading input from a keyboard, then the implementation probably invokes the operating system for each character read.
Predictable (examples: qsort, regexec). Performance of these functions varies with properties of their arguments (for example, the size of an array to sort or the length of a string to search). These functions are often data-structure or common-algorithm utilities that use well-known algorithms and do not require system calls. You can usually judge performance based on expectations for the underlying algorithms (for example, that sort will take n log n time). When using complex data structures (for example, B-trees) or generic collections (where it may be difficult to identify the underlying concrete implementation), it may be more difficult to estimate performance. It is important to understand that the predictability may be only probable; regexec is generally predictable based on its input, but there are pathological expressions that will cause exponential time explosions.
Unknown (examples: fopen, fseek, pthread_create, many "initialization" functions, and any call that traverses a network). These functions are not cheap and often have high variance in their performance. They allocate resources from pools (threads, memory, disk, operating system objects), often requiring exclusive access to shared operating system or I/O resources. There is usually substantial initialization required. Calling across a network is always expensive (relative to local access), but the variation in the expense can be larger still, making the formation of a reasonable performance model much more difficult.
Threads libraries are easy marks for performance issues. The Posix standard took many years to settle down, and implementations are still plagued with problems.6 Portability of threaded applications remains dicey. Some of the reasons for difficulties with threads are: the need for tight integration with operating systems, almost all of which (including Unix and Linux) were not designed originally with threads in mind; interaction with other libraries, especially making functions thread-safe and dealing with performance problems thereby induced; and several different design points for threads implementations, loosely characterized as lightweight and heavyweight.
Some libraries offer more than one way of performing a function, usually because the alternatives have quite different performance.
Returning to sidebar 1, note that most programmers are told that using a library function to fetch each character is not the fastest way (even if the code for the function is inlined to avoid the function-call overhead). The more performance-conscious will read a very large array of characters and extract each one using array or pointer operations in the programming language. In extreme cases the application can map pages of the file into memory pages to avoid copying data into an array. In return for performance, these functions put a larger burden on the caller (for example, to get buffer arithmetic correct and to make the implementation consistent with other library calls, such as a call to fseek that would require adjusting buffer pointers and perhaps contents).
Programmers are always counseled to avoid premature optimization in their programs, thus delaying extreme revisions until simpler ones have proven inadequate. The only way to be sure about performance is to measure it. Programmers usually write the entire program before confronting mismatches between performance expectations (or estimates) and the reality delivered by an implementation.
Performance variation. The performance of "predictable" functions can be estimated from properties of their arguments. The "unknown" functions may also vary greatly depending on what they are asked to do. The time required to open a stream on a storage device will certainly depend on access times, and perhaps data-transfer rates, of the underlying device. Storage accessed via network protocols may be especially costly; certainly it will be variable.
Many cheap functions are cheap only most of the time, or they have an expected cheap cost. A "get a character" routine must occasionally refill a buffer using an operating system call that always takes much longer than fetching a character from a full buffer, and might occasionally take a very long time indeed (for example, reading a file on a heavily loaded file server or from a disk that is dying and will succeed only after multiple read retries).
The functions with "unknown" performance are likely to exhibit wide performance variations for a variety of reasons. One reason is function creep (see sidebar 2), where a generic function becomes more powerful over time. I/O streams are a good example: a call to open a stream invokes very different code in libraries and operating systems depending on the type of stream being opened (local disk file, network-served file, pipe, network stream, string in memory, and so on). As the range of I/O devices and types of files expands, the variance of the performance can only increase. The common life cycle of most APIs, adding functionality incrementally over time, inevitably increases performance variations.
A large source of variation is differences between ports of libraries to different platforms. Of course, the underlying speed of the platform (both hardware and operating system) will vary, but library ports can lead to changes in the relative performance of functions within an API or performance across APIs. It's not uncommon for a quick-and-dirty initial port to have many performance problems, which are gradually fixed. Some libraries, such as those for handling threads, have notoriously wide porting-performance variation. Thread anomalies can surface as extreme behaviors: an impossibly slow application or even deadlock.
These variations are one reason why constructing precise performance contracts is difficult. It is usually not necessary to know performance with great precision, but extreme variations from expected behavior can cause problems.
Failure performance. The specifications for an API include details of behavior when a call fails. Returning an error code and throwing an exception are common ways of telling the caller the function did not succeed. As with specifications for normal behavior, however, the performance of the failure is not specified. Here are three important cases:
sqrt(-1) fails fast. Even when a malloc call fails because no more memory is available, the call should return about as fast as any malloc call that must request more memory from the operating system. A call to open a stream for reading a nonexistent disk file is likely to return about as fast as a successful call.
Intuition about failure performance is rarely as good as that for normal performance. One reason is simply that writing, debugging, and tuning programs provides far less experience with failure events than with normal events. Another is that a function call can fail in many, many ways, some of them fatal, and not all described in the API's specs. Even exception mechanisms, which are intended to describe more precisely the handling of errors, do not make all possible exceptions visible. Moreover, as library functionality increases, so do the opportunities for failure. For example, APIs that wrap network services (ODBC, JDBC, UPnP, ...) intrinsically subscribe to the vast array of network-failure mechanisms.
A diligent application programmer uses massive artillery to deal with unlikely failures. A common technique is to surround rather large parts of a program with try...catch blocks that can retry whole sections that fail. Interactive programs can try to save a user's work using a giant try...catch around the entire program, the effect of which is to mitigate failure of the main program by saving, in a disk file, key logs or data structures that record the effects of the work done by the user before the failure.
The only way to deal with stalls or deadlocks is to set up a watchdog thread, which expects a properly running application program to check in periodically with the watchdog, saying, in effect, "I'm still running properly." If too much time elapses between check-ins, the watchdog takes action: for example, saving state, aborting the main thread, and restarting the entire application. If an interactive program responds to a user's command by calling functions that may fail slowly, it may use a watchdog to abort the entire command and return to a known state that allows the user to proceed with other commands. This gives rise to a defensive programming style that plans for the possible abortion of every command.
Why must an API adhere to a performance contract? Because major structures of an application program may depend on adherence to such a contract. A programmer chooses APIs, data structures, and overall program structures based in part on expectations of API performance. If the expectations or the performance are grossly wrong, then the programmer cannot recover merely by tuning API calls but must rewrite large, possibly major, portions of the program. The defensive structure of the interactive program previously mentioned is another example.
In effect, a serious violation of the performance contract leads to a composition failure: a program written to the contract cannot be mated to (composed with) an implementation that fails to uphold the contract.
Of course, there are many programs whose structure and performance are influenced very little by library performance (scientific computations and large simulations are often in this category); however, much of today's "routine IT," especially the software that pervades Web-based services, makes extensive use of libraries whose performance is vital to overall performance.
Many programmers expect computers and their software to get faster with time, uniformly.
Even small variations in performance can cause major changes in the perception of the program by its users. This is especially true in programs dealing with various kinds of media. Dropping occasional frames of a video stream might be acceptable (indeed, more acceptable than allowing the frame rate to lag other media), but humans have evolved to detect even slight dropouts in audio, so minor changes in the performance of that type of media may have major impacts on the acceptability of the overall program. Such worries have led to considerable interest in notions of quality of service, which in many ways is the attempt to ensure performance at a high level.
How can contract violations be avoided? Although performance-contract violations are rare and rarely catastrophic, paying attention to the role of performance when using a software library can help lead to more robust software. Here are some precautions and strategies to use:
Prefer popular, mature APIs and implementations. Popular libraries (for example, libc) arguably have become popular in part because their performance contracts have been stable during the evolution of the API. The same goes for mature ports of APIs.
It would be helpful if clients of these services made note of the performance they expect and produced log entries that help diagnose problems (this is what syslog is for). When your file backup seems unreasonably slow (one hour to back up 200MB), is it slower than yesterday? Slower than it was before the latest operating system software update? Given that several computers may be sharing the backup device, is it slower than you should expect? Or is there some logical explanation (for example, the backup system finds a damaged data structure and embarks on a long procedure to rebuild it)?
Diagnosing performance problems in compositions of opaque software (where no source code is available, and there are no details of the modules and APIs that form the composition) requires that the software play a role in reporting performance and detecting problems. While you cannot address performance problems within the software itself (it is opaque), you can make adjustments or repairs to operating systems and the network. If the backup device is slow because its disk is almost full, then you can surely add more disk space. Good logs and associated tools would help; sadly, logs are an undervalued and neglected area of computer system evolution.
Today's software systems depend on composing independently developed components in such a way that they work, meaning they perform the desired computations at acceptable speed. The dream of statically checking a composition to guarantee the composition will be correct ("correct by composition") remains elusive. Instead, software-engineering practice has developed methods for testing components and compositions that work pretty well. Every time an application binds to a dynamic library or launches atop an operating-system interface, the correctness of the composition is required.
Despite its importance, the performance of a composition (whether the client and the provider of an interface adhere to the performance contract between them) has been given short shrift. True, this contract is not as important as the correctness contract, but the full power of composition depends on it.
Thanks to Butler Lampson, who coined the term cheap to describe functions whose performance is fast enough to dispel all attempts to optimize them away (it has a wonderful ring of "economical yet serviceable"); to Eric Allman, for his helpful comments; and to Jon Bentley, an inveterate performance debugger.
2. GNU C Library; http://www.gnu.org/software/libc/manual/html_node/index.html.
3. Java Platform, Standard Edition 7. API Specification; http://docs.oracle.com/javase/7/docs/api/index.html.
5. Oracle. Solaris Dynamic Tracing Guide; http://docs.oracle.com/cd/E19253-01/817-6223/.
8. Vo, K.-P. Vmalloc: a general and efficient memory allocator. Software Practice and Experience 26, 3 (1996), 357–374; http://www2.research.att.com/~astopen/download/ref/vmalloc/vmalloc-spe.pdf.
©2014 ACM 0001-0782/14/03