Advertisement

Research and Advances

Generalized working sets for segment reference strings

The working-set concept is extended for programs that reference segments of different sizes. The generalized working-set policy (GWS) keeps as its resident set those segments whose retention costs do not exceed their retrieval costs. The GWS is a model for the entire class of demand-fetching memory policies that satisfy a resident-set inclusion property. A generalized optimal policy (GOPT) is also defined; at its operating points it minimizes aggregated retention and swapping costs. Special cases of the cost structure allow GWS and GOPT to simulate any known stack algorithm, the working set, and VMIN. Efficient procedures for computing demand curves showing swapping load as a function of memory usage are developed for GWS and GOPT policies. Empirical data from an actual system are included.
Research and Advances

A note on virtual memory indexes

In [4] Maruyama and Smith analyzed design alternatives for virtual memory indexes. The indexes investigated were those modeled on the structure of VSAM [5] which are closely related to B-trees [1]. Maruyama and Smith presented alternatives for search strategies, page format, and whether or not to compress keys. They made a comparative analysis based on the average cost of processing the index for retrieval of a key with its associated data pointer (or a sequence of keys and data pointers). The analysis showed that the optimal solution depends on machine and file characteristics and they provided formulas for selecting the best alternative. In this brief note1 we propose that another alternative for the page format be considered. The basic idea is to replace the sequential representation within a page by a linked representation. The main advantage of this method is realized in the construction (and maintenance) phase of the index. Although this phase is difficult to analyze quantitatively—as alluded to in [4]—we provide data to substantiate our claim of a net saving in construction cost without any corresponding loss in the average retrieval cost. In the ensuing discussion we assume that the reader is familiar both with B-trees and the paper of Maruyama and Smith [4].
Research and Advances

A controlled experiment in program testing and code walkthroughs/inspections

This paper describes an experiment in program testing, employing 59 highly experienced data processing professionals using seven methods to test a small PL/I program. The results show that the popular code walkthrough/inspection method was as effective as other computer-based methods in finding errors and that the most effective methods (in terms of errors found and cost) employed pairs of subjects who tested the program independently and then pooled their findings. The study also shows that there is a tremendous amount of variability among subjects and that the ability to detect certain types of errors varies from method to method.
Research and Advances

Right brother trees

Insertion and deletion algorithms are provided for the class of right (or one-sided) brother trees which have O (log n) performance. The importance of these results stems from the close relationship of right brother trees to one-sided height-balanced trees which have an insertion algorithm operating in O (log2 n). Further, although both insertion and deletion can be carried out in O (log n) time for right brother trees, it appears that the insertion algorithm is inherently much more difficult than the deletion algorithm—the reverse of what one usually obtains.
Research and Advances

Simulations of dynamic sequential search algorithms

In [3], R.L. Rivest presents a set of methods for dynamically reordering a sequential list containing N records in order to increase search efficiency. The method Ai (for i between 1 and N) performs the following operation each time that a record R has been successfully retrieved: Move R forward i positions in the list, or to the front of the list if it was in a position less than i. The method A1 is called the transposition method, and the method AN-1 is called the move-to-front method.
Research and Advances

On the complexity of computing the measure of ∪[ai,bi]

The decision tree complexity of computing the measure of the union of n (possibly overlapping) intervals is shown to be &OHgr;(n log n), even if comparisons between linear functions of the interval endpoints are allowed. The existence of an &OHgr;(n log n) lower bound to determine whether any two of n real numbers are within ∈ of each other is also demonstrated. These problems provide an excellent opportunity for discussing the effects of the computational model on the ease of analysis and on the results produced.
Research and Advances

Interpolation search—a log logN search

Interpolation search is a method of retrieving a desired record by key in an ordered file by using the value of the key and the statistical distribution of the keys. It is shown that on the average log logN file accesses are required to retrieve a key, assuming that the N keys are uniformly distributed. The number of extra accesses is also estimated and shown to be very low. The same holds if the cumulative distribution function of the keys is known. Computational experiments confirm these results.
Research and Advances

Time, clocks, and the ordering of events in a distributed system

The concept of one event happening before another in a distributed system is examined, and is shown to define a partial ordering of the events. A distributed algorithm is given for synchronizing a system of logical clocks which can be used to totally order the events. The use of the total ordering is illustrated with a method for solving synchronization problems. The algorithm is then specialized for synchronizing physical clocks, and a bound is derived on how far out of synchrony the clocks can become.
Research and Advances

An English language question answering system for a large relational database

By typing requests in English, casual users will be able to obtain explicit answers from a large relational database of aircraft flight and maintenance data using a system called PLANES. The design and implementation of this system is described and illustrated with detailed examples of the operation of system components and examples of overall system operation. The language processing portion of the system uses a number of augmented transition networks, each of which matches phrases with a specific meaning, along with context registers (history keepers) and concept case frames; these are used for judging meaningfulness of questions, generating dialogue for clarifying partially understood questions, and resolving ellipsis and pronoun reference problems. Other system components construct a formal query for the relational database, and optimize the order of searching relations. Methods are discussed for handling vague or complex questions and for providing browsing ability. Also included are discussions of important issues in programming natural language systems for limited domains, and the relationship of this system to others.
Research and Advances

A selective traversal algorithm for binary search trees

The problem of selecting data items from a binary search tree according to a list of range conditions is considered. The process of visiting a minimal number of nodes to retrieve data satisfying the range conditions is called selective traversal. Presented in this paper is an algorithm for selective traversal which uses a tag field for each node in the tree. The algorithm is particularly useful and efficient when examination of data is more time consuming than examination of a tag field.
Research and Advances

An interference matching technique for inducing abstractions

A method for inducing knowledge by abstraction from a sequence of training examples is described. The proposed method, interference matching, induces abstractions by finding relational properties common to two or more exemplars. Three tasks solved by a program that uses an interference-matching algorithm are presented. Several problems concerning the description of the training examples and the adequacy of interference matching are discussed, and directions for future research are considered.
Research and Advances

Optimal conversion of extended-entry decision tables with general cost criteria

A general dynamic programming algorithm for converting limited, extended, or mixed entry decision tables to optimal decision trees is presented which can take into account rule frequencies or probabilities, minimum time and/or space cost criteria, common action sets, compressed rules and ELSE rules, sequencing constraints on condition tests, excludable combinations of conditions, certain ambiguities, and interrupted rule masking.
Research and Advances

A technique for isolating differences between files

A simple algorithm is described for isolating the differences between two files. One application is the comparing of two versions of a source program or other file in order to display all differences. The algorithm isolates differences in a way that corresponds closely to our intuitive notion of difference, is easy to implement, and is computationally efficient, with time linear in the file length. For most applications the algorithm isolates differences similar to those isolated by the longest common subsequence. Another application of this algorithm merges files containing independently generated changes into a single file. The algorithm can also be used to generate efficient encodings of a file in the form of the differences between itself and a given “datum” file, permitting reconstruction of the original file from the diference and datum files.
Research and Advances

The use of an interactive information storage and retrieval system in medical research

This paper presents the results of a study of the use of an interactive computerized storage and retrieval system. A monitor built into the computer system provided usage data for the study. Additional data on user reactions were gathered from a questionnaire. The results show the important role played by frequently chosen laboratory reference leaders in influencing the use of this system. The implications of the study for the design of similar systems are discussed.
Research and Advances

Value orientation of computer science students

Technological and nontechnological value orientations are investigated with special attention to the complexity of value structures. Computer science students, who are closely associated with technology, contrast with social science students, who are often technologically aloof. This is confirmed by the value ratings of 313 students at the University of Minnesota in 1972. Computer science majors were found to have a more complex value structure than social science majors.
Research and Advances

Management utilization of computers in American local governments

Traditional concepts of management information systems (MIS) bear little relation to the information systems currently in use by top management in most US local governments. What exists is management-oriented computing, involving the use of relatively unsophisticated applications. Despite the unsophisticated nature of these systems, management use of computing is surprisingly common, but also varied in its extent among local governments. Management computing is most prevalent in those governments with professional management practices where top management is supportive of computing and tends to control computing decisions and where department users have less control over design and implementation activities. Finally, management computing clearly has impacts for top managers, mostly involving improvements in decision information.
Research and Advances

B-trees re-examined

The B-tree and its variants have, with increasing frequency, been proposed as a basic storage structure for multiuser database applications. Here, three potential problems which must be dealt with in such a structure that do not arise in more traditional static directory structures are indicated. One problem is a possible performance penalty.
Research and Advances

Specifying queries as relational expressions: the SQUARE data sublanguage

This paper presents a data sublanguage called SQUARE, intended for use in ad hoc, interactive problem solving by non-computer specialists. SQUARE is based on the relational model of data, and is shown to be relationally complete; however, it avoids the quantifiers and bound variables required by languages based on the relational calculus. Facilities for query, insertion, deletion, and update on tabular data bases are described. A syntax is given, and suggestions are made for alternative syntaxes, including a syntax based on English key words for users with limited mathematical background.
Research and Advances

A vector space model for automatic indexing

In a document retrieval, or other pattern matching environment where stored entities (documents) are compared with each other or with incoming patterns (search requests), it appears that the best indexing (property) space is one where each entity lies as far away from the others as possible; in these circumstances the value of an indexing system may be expressible as a function of the density of the object space; in particular, retrieval performance may correlate inversely with space density. An approach based on space density computations is used to choose an optimum indexing vocabulary for a collection of documents. Typical evaluation results are shown, demonstating the usefulness of the model.
Research and Advances

Optimizing the performance of a relational algebra database interface

An approach for implementing a “smart” interface to support a relational view of data is proposed. The basic idea is to employ automatic programming techniques so that the interface analyzes and efficiently refines the high level query specification supplied by the user. A relational algebra interface, called SQUIRAL, which was designed using this approach, is described in detail. SQUIRAL seeks to minimize query response time and space utilization by: (1) performing global query optimization, (2) exploiting disjoint and pipelined concurrency, (3) coordinating sort orders in temporary relations, (4) employing directory analysis, and (5) maintaining locality in page references. Algorithms for implementing the operators of E. F. Codd's relational algebra are presented, and a methodology for composing them to optimize the performance of a particular user query is described.
Research and Advances

CONVERT: a high level translation definition language for data conversion

This paper describes a high level and nonprocedural translation definition language, CONVERT, which provides very powerful and highly flexible data restructuring capabilities. Its design is based on the simple underlying concept of a form which enables the users to visualize the translation processes, and thus makes data translation a much simpler task. “CONVERT” has been chosen for conveying the purpose of the language and should not be confused with any other language or program bearing the same name.
Research and Advances

Implementation of a structured English query language

The relational model of data, the XRM Relational Memory System, and the SEQUEL language have been covered in previous papers and are reviewed. SEQUEL is a relational data sublanguage intended for ad hoc interactive problem solving by non-computer specialists. A version of SEQUEL that has been implemented in a prototype interpreter is described. The interpreter is designed to minimize the data accessing operations required to respond to an arbitrary query. The optimization algorithms designed for this purpose are described.

Shape the Future of Computing

ACM encourages its members to take a direct hand in shaping the future of the association. There are more ways than ever to get involved.

Get Involved

Communications of the ACM (CACM) is now a fully Open Access publication.

By opening CACM to the world, we hope to increase engagement among the broader computer science community and encourage non-members to discover the rich resources ACM has to offer.

Learn More