Research and Advances

Interpolation search—a log logN search

Interpolation search is a method of retrieving a desired record by key in an ordered file by using the value of the key and the statistical distribution of the keys. It is shown that on the average log log N file accesses are required to retrieve a key, assuming that the N keys are uniformly distributed. The number of extra accesses is also estimated and shown to be very low. The same holds if the cumulative distribution function of the keys is known. Computational experiments confirm these results.
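
As an illustration of the idea in this abstract, the following is a minimal sketch of interpolation search over an in-memory sorted list of integers. The function name and the list-based "file" are assumptions made for the example; the paper itself concerns accesses to an ordered file, and this is not its exact procedure.

```python
def interpolation_search(keys, target):
    """Return the index of target in the sorted list keys, or -1 if absent."""
    lo, hi = 0, len(keys) - 1
    while lo <= hi and keys[lo] <= target <= keys[hi]:
        if keys[hi] == keys[lo]:
            # All remaining keys are equal; avoid dividing by zero.
            return lo if keys[lo] == target else -1
        # Estimate the position by linear interpolation on the key values,
        # which is the right guess when keys are uniformly distributed.
        pos = lo + (target - keys[lo]) * (hi - lo) // (keys[hi] - keys[lo])
        if keys[pos] == target:
            return pos
        if keys[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1


keys = list(range(0, 1000, 10))
print(interpolation_search(keys, 370))
```

With evenly spaced keys, as in the usage line, the first probe lands directly on the target; less uniform data needs more probes.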
Research and Advances

Time, clocks, and the ordering of events in a distributed system

The concept of one event happening before another in a distributed system is examined, and is shown to define a partial ordering of the events. A distributed algorithm is given for synchronizing a system of logical clocks which can be used to totally order the events. The use of the total ordering is illustrated with a method for solving synchronization problems. The algorithm is then specialized for synchronizing physical clocks, and a bound is derived on how far out of synchrony the clocks can become.
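
The logical clock rules summarized here can be sketched as follows. This uses the common textbook formulation (increment before each event; on receipt, advance past the message timestamp), and the class and method names are hypothetical. Ties between equal timestamps are broken by process id to obtain a total order.

```python
class Process:
    """A process carrying a logical clock (illustrative sketch)."""

    def __init__(self, pid):
        self.pid = pid
        self.clock = 0

    def local_event(self):
        # Tick the clock for every event within the process.
        self.clock += 1
        return (self.clock, self.pid)

    def send(self):
        # Sending is an event; the message carries the new timestamp.
        self.clock += 1
        return self.clock

    def receive(self, msg_timestamp):
        # Advance the clock so it is later than the message timestamp.
        self.clock = max(self.clock, msg_timestamp) + 1
        return (self.clock, self.pid)


p, q = Process(1), Process(2)
a = p.local_event()
ts = p.send()
b = q.receive(ts)
# (timestamp, pid) pairs compare lexicographically, giving a total order.
print(sorted([b, a]))
```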
Research and Advances

An English language question answering system for a large relational database

By typing requests in English, casual users will be able to obtain explicit answers from a large relational database of aircraft flight and maintenance data using a system called PLANES. The design and implementation of this system are described and illustrated with detailed examples of the operation of system components and examples of overall system operation. The language processing portion of the system uses a number of augmented transition networks, each of which matches phrases with a specific meaning, along with context registers (history keepers) and concept case frames; these are used for judging meaningfulness of questions, generating dialogue for clarifying partially understood questions, and resolving ellipsis and pronoun reference problems. Other system components construct a formal query for the relational database, and optimize the order of searching relations. Methods are discussed for handling vague or complex questions and for providing browsing ability. Also included are discussions of important issues in programming natural language systems for limited domains, and the relationship of this system to others.
Research and Advances

A selective traversal algorithm for binary search trees

The problem of selecting data items from a binary search tree according to a list of range conditions is considered. The process of visiting a minimal number of nodes to retrieve data satisfying the range conditions is called selective traversal. Presented in this paper is an algorithm for selective traversal which uses a tag field for each node in the tree. The algorithm is particularly useful and efficient when examination of data is more time consuming than examination of a tag field.
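
The abstract does not spell out the tag-field algorithm, so the sketch below does not attempt to reproduce it. It only illustrates the underlying goal: given a list of closed ranges, visit a subtree only when some range could still match keys inside it. The `Node` class and `select` function are hypothetical names.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def select(node, ranges, out=None):
    """Collect keys lying in any closed (lo, hi) range, in sorted order."""
    if out is None:
        out = []
    if node is None or not ranges:
        return out  # nothing to visit, or no range can match this subtree
    # Keys in the left subtree are smaller than node.key, so only ranges
    # that start below node.key can match there; symmetrically on the right.
    left_ranges = [(lo, hi) for lo, hi in ranges if lo < node.key]
    right_ranges = [(lo, hi) for lo, hi in ranges if hi > node.key]
    select(node.left, left_ranges, out)
    if any(lo <= node.key <= hi for lo, hi in ranges):
        out.append(node.key)
    select(node.right, right_ranges, out)
    return out


root = Node(8, Node(4, Node(2), Node(6)), Node(12, Node(10), Node(14)))
print(select(root, [(3, 6), (11, 20)]))
```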
Research and Advances

An interference matching technique for inducing abstractions

A method for inducing knowledge by abstraction from a sequence of training examples is described. The proposed method, interference matching, induces abstractions by finding relational properties common to two or more exemplars. Three tasks solved by a program that uses an interference-matching algorithm are presented. Several problems concerning the description of the training examples and the adequacy of interference matching are discussed, and directions for future research are considered.
Research and Advances

Optimal conversion of extended-entry decision tables with general cost criteria

A general dynamic programming algorithm for converting limited, extended, or mixed entry decision tables to optimal decision trees is presented which can take into account rule frequencies or probabilities, minimum time and/or space cost criteria, common action sets, compressed rules and ELSE rules, sequencing constraints on condition tests, excludable combinations of conditions, certain ambiguities, and interrupted rule masking.
Research and Advances

A technique for isolating differences between files

A simple algorithm is described for isolating the differences between two files. One application is the comparing of two versions of a source program or other file in order to display all differences. The algorithm isolates differences in a way that corresponds closely to our intuitive notion of difference, is easy to implement, and is computationally efficient, with time linear in the file length. For most applications the algorithm isolates differences similar to those isolated by the longest common subsequence. Another application of this algorithm merges files containing independently generated changes into a single file. The algorithm can also be used to generate efficient encodings of a file in the form of the differences between itself and a given “datum” file, permitting reconstruction of the original file from the difference and datum files.
Research and Advances

The use of an interactive information storage and retrieval system in medical research

This paper presents the results of a study of the use of an interactive computerized storage and retrieval system. A monitor built into the computer system provided usage data for the study. Additional data on user reactions were gathered from a questionnaire. The results show the important role played by frequently chosen laboratory reference leaders in influencing the use of this system. The implications of the study for the design of similar systems are discussed.
Research and Advances

Value orientation of computer science students

Technological and nontechnological value orientations are investigated with special attention to the complexity of value structures. Computer science students, who are closely associated with technology, contrast with social science students, who are often technologically aloof. This is confirmed by the value ratings of 313 students at the University of Minnesota in 1972. Computer science majors were found to have a more complex value structure than social science majors.
Research and Advances

Management utilization of computers in American local governments

Traditional concepts of management information systems (MIS) bear little relation to the information systems currently in use by top management in most US local governments. What exists is management-oriented computing, involving the use of relatively unsophisticated applications. Despite the unsophisticated nature of these systems, management use of computing is surprisingly common, but also varied in its extent among local governments. Management computing is most prevalent in those governments with professional management practices where top management is supportive of computing and tends to control computing decisions and where department users have less control over design and implementation activities. Finally, management computing clearly has impacts for top managers, mostly involving improvements in decision information.
Research and Advances

B-trees re-examined

The B-tree and its variants have, with increasing frequency, been proposed as a basic storage structure for multiuser database applications. Here, three potential problems are indicated that must be dealt with in such a structure and that do not arise in more traditional static directory structures. One problem is a possible performance penalty.
Research and Advances

Specifying queries as relational expressions: the SQUARE data sublanguage

This paper presents a data sublanguage called SQUARE, intended for use in ad hoc, interactive problem solving by non-computer specialists. SQUARE is based on the relational model of data, and is shown to be relationally complete; however, it avoids the quantifiers and bound variables required by languages based on the relational calculus. Facilities for query, insertion, deletion, and update on tabular data bases are described. A syntax is given, and suggestions are made for alternative syntaxes, including a syntax based on English key words for users with limited mathematical background.
Research and Advances

A vector space model for automatic indexing

In a document retrieval, or other pattern matching environment where stored entities (documents) are compared with each other or with incoming patterns (search requests), it appears that the best indexing (property) space is one where each entity lies as far away from the others as possible; in these circumstances the value of an indexing system may be expressible as a function of the density of the object space; in particular, retrieval performance may correlate inversely with space density. An approach based on space density computations is used to choose an optimum indexing vocabulary for a collection of documents. Typical evaluation results are shown, demonstrating the usefulness of the model.
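
To make the notion of space density concrete, here is a small sketch that represents documents as term-weight vectors and measures density as the mean pairwise cosine similarity. That particular density measure, and the function names, are assumptions chosen for illustration; the abstract does not state the exact formula used in the paper.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def space_density(docs):
    """Mean pairwise cosine similarity (an assumed density measure)."""
    pairs = list(combinations(docs, 2))
    if not pairs:
        return 0.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


clustered = [[1, 1, 0], [1, 1, 0], [1, 0.9, 0]]
spread = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# A lower density means the documents are easier to tell apart.
print(space_density(clustered) > space_density(spread))
```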
Research and Advances

Optimizing the performance of a relational algebra database interface

An approach for implementing a “smart” interface to support a relational view of data is proposed. The basic idea is to employ automatic programming techniques so that the interface analyzes and efficiently refines the high level query specification supplied by the user. A relational algebra interface, called SQUIRAL, which was designed using this approach, is described in detail. SQUIRAL seeks to minimize query response time and space utilization by: (1) performing global query optimization, (2) exploiting disjoint and pipelined concurrency, (3) coordinating sort orders in temporary relations, (4) employing directory analysis, and (5) maintaining locality in page references. Algorithms for implementing the operators of E. F. Codd's relational algebra are presented, and a methodology for composing them to optimize the performance of a particular user query is described.
Research and Advances

CONVERT: a high level translation definition language for data conversion

This paper describes a high level and nonprocedural translation definition language, CONVERT, which provides very powerful and highly flexible data restructuring capabilities. Its design is based on the simple underlying concept of a form which enables the users to visualize the translation processes, and thus makes data translation a much simpler task. “CONVERT” has been chosen for conveying the purpose of the language and should not be confused with any other language or program bearing the same name.
Research and Advances

Implementation of a structured English query language

The relational model of data, the XRM Relational Memory System, and the SEQUEL language have been covered in previous papers and are reviewed. SEQUEL is a relational data sublanguage intended for ad hoc interactive problem solving by non-computer specialists. A version of SEQUEL that has been implemented in a prototype interpreter is described. The interpreter is designed to minimize the data accessing operations required to respond to an arbitrary query. The optimization algorithms designed for this purpose are described.
Research and Advances

A preliminary system for the design of DBTG data structures

The functional approach to database design is introduced. In this approach the goal of design is to derive a data structure which is capable of supporting a set of anticipated queries rather than a structure which “models the business” in some other way. An operational computer program is described which utilizes the functional approach to design data structures conforming to the Data Base Task Group specifications. The automatic programming technology utilized by this program, although typically used to generate procedure, is here used to generate declaratives.
Research and Advances

Multidimensional binary search trees used for associative searching

This paper develops the multidimensional binary search tree (or k-d tree, where k is the dimensionality of the search space) as a data structure for storage of information to be retrieved by associative searches. The k-d tree is defined and examples are given. It is shown to be quite efficient in its storage requirements. A significant advantage of this structure is that a single data structure can handle many types of queries very efficiently. Various utility algorithms are developed; their proven average running times in an n record file are: insertion, O(log n); deletion of the root, O(n^((k-1)/k)); deletion of a random node, O(log n); and optimization (guarantees logarithmic performance of searches), O(n log n). Search algorithms are given for partial match queries with t keys specified [proven maximum running time of O(n^((k-t)/k))] and for nearest neighbor queries [empirically observed average running time of O(log n)]. These performances far surpass the best currently known algorithms for these tasks. An algorithm is presented to handle any general intersection query. The main focus of this paper is theoretical. It is felt, however, that k-d trees could be quite useful in many applications, and examples of potential uses are given.
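
A minimal sketch of a k-d tree with insertion and partial match search is given below. It cycles through the k coordinates by depth, which is the usual presentation; the names `KDNode`, `insert`, and `partial_match`, and the use of `None` for an unspecified key, are assumptions for the example rather than the paper's notation.

```python
class KDNode:
    def __init__(self, point):
        self.point = point
        self.left = None
        self.right = None


def insert(root, point, depth=0):
    """Insert a k-tuple, discriminating on coordinate depth mod k."""
    if root is None:
        return KDNode(point)
    axis = depth % len(point)
    if point[axis] < root.point[axis]:
        root.left = insert(root.left, point, depth + 1)
    else:
        root.right = insert(root.right, point, depth + 1)
    return root


def partial_match(root, query, depth=0, out=None):
    """Find points equal to query on every coordinate that is not None."""
    if out is None:
        out = []
    if root is None:
        return out
    if all(q is None or q == p for q, p in zip(query, root.point)):
        out.append(root.point)
    axis = depth % len(query)
    q = query[axis]
    # An unspecified coordinate forces a visit to both subtrees;
    # a specified one lets the search follow a single branch.
    if q is None or q < root.point[axis]:
        partial_match(root.left, query, depth + 1, out)
    if q is None or q >= root.point[axis]:
        partial_match(root.right, query, depth + 1, out)
    return out


root = None
for pt in [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2), (5, 9)]:
    root = insert(root, pt)
print(partial_match(root, (5, None)))
```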
Research and Advances

Combining decision rules in a decision table

The techniques for minimizing logic circuits are applied to the simplification of decision tables by the combining of decision rules. This method is logically equivalent to the Quine-McCluskey method for finding prime implicants. If some of the decision rules implied in the ELSE Rule occur with low frequency, then the ELSE Rule can be used to further simplify the decision table. Several objectives merit consideration in optimizing a decision table: (1) reducing machine execution time; (2) reducing preprocessing time; (3) reducing required machine memory; (4) reducing the number of decision rules (this often improves the clarity of the decision table to a human reader). It will be shown that objectives (3) and (4) can be furthered with the above methods. Objective (1) is also attained if overspecified decision rules are not combined. Objective (2) must be compared against the potential benefits of objectives (1), (3), and (4) in deciding whether to use the above methods.
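
The rule-combining step can be sketched in the style of the Quine-McCluskey merging pass. Here a rule is a string of Y, N, and - (don't care) entries paired with an action; this encoding and the function names are assumptions for illustration. The sketch only merges rules until no further merge applies and does not handle the ELSE Rule or choose a minimal cover.

```python
def combine(rule_a, rule_b):
    """Merge two condition strings differing in exactly one Y/N entry."""
    if len(rule_a) != len(rule_b):
        return None
    diff = [i for i, (a, b) in enumerate(zip(rule_a, rule_b)) if a != b]
    if len(diff) != 1:
        return None
    i = diff[0]
    if "-" in (rule_a[i], rule_b[i]):
        return None  # a don't-care cannot be paired with Y or N
    return rule_a[:i] + "-" + rule_a[i + 1:]


def simplify(rules):
    """Repeatedly combine rules that share an action until none merge."""
    rules = set(rules)
    while True:
        merged, used = set(), set()
        rule_list = sorted(rules)
        for i, (cond_a, act_a) in enumerate(rule_list):
            for cond_b, act_b in rule_list[i + 1:]:
                if act_a != act_b:
                    continue
                cond = combine(cond_a, cond_b)
                if cond is not None:
                    merged.add((cond, act_a))
                    used.update({(cond_a, act_a), (cond_b, act_b)})
        if not merged:
            return rules
        # Rules that merged are replaced; unmerged rules are kept as is.
        rules = (rules - used) | merged


table = {("YY", "A1"), ("YN", "A1"), ("NY", "A2"), ("NN", "A3")}
print(sorted(simplify(table)))
```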
Research and Advances

Consecutive storage of relevant records with redundancy

This paper studies the properties of a new class of file organizations (CRWR) where records relevant to every query are stored in consecutive storage locations but the organizations contain redundancy. Some theorems which provide tools for reducing redundancy in CRWR organizations have also been developed. Redundancies obtained by the application of these theorems are compared with that of query-inverted file organizations. Some CRWR organizations with minimum redundancy have also been developed for queries which specify sets of keys.
Research and Advances

Interactive consulting via natural language

Interactive programming systems often contain help commands to give the programmer on-line instruction regarding the use of the various systems commands. It is argued that it would be relatively easy to make these help commands significantly more helpful by having them accept requests in natural language. As a demonstration, Weizenbaum's ELIZA program has been provided with a script that turns it into a natural language system consultant.
Research and Advances

The restriction language for computer grammars of natural language

Over the past few years, a number of systems for the computer analysis of natural language sentences have been based on augmented context-free grammars: a context-free grammar which defines a set of parse trees for a sentence, plus a group of restrictions to which a tree must conform in order to be a valid sentence analysis. As the coverage of the grammar is increased, an efficient representation becomes essential for further development. This paper presents a programming language designed specifically for the compact and perspicuous statement of restrictions of a natural language grammar. It is based on ten years' experience parsing text sentences with the comprehensive English grammar of the N.Y.U. Linguistic String Project, and embodies in its syntax and routines the relations which were found to be useful and adequate for computerized natural language analysis. The language is used in the current implementation of the Linguistic String Parser.

Communications of the ACM (CACM) is now a fully Open Access publication.