HCI

Research and Advances Oct 1 1975

A preliminary system for the design of DBTG data structures

The functional approach to database design is introduced. In this approach the goal of design is to derive a data structure which is capable of supporting a set of anticipated queries rather than a structure which “models the business” in some other way. An operational computer program is described which utilizes the functional approach to design data structures conforming to the Data Base Task Group specifications. The automatic programming technology utilized by this program, although typically used to generate procedure, is here used to generate declaratives.

Rob Gerritsen

Research and Advances Sep 1 1975

Multidimensional binary search trees used for associative searching

This paper develops the multidimensional binary search tree (or k-d tree, where k is the dimensionality of the search space) as a data structure for storage of information to be retrieved by associative searches. The k-d tree is defined and examples are given. It is shown to be quite efficient in its storage requirements. A significant advantage of this structure is that a single data structure can handle many types of queries very efficiently. Various utility algorithms are developed; their proven average running times in an n record file are: insertion, O(log n); deletion of the root, O(n(k-1)/k); deletion of a random node, O(log n); and optimization (guarantees logarithmic performance of searches), O(n log n). Search algorithms are given for partial match queries with t keys specified [proven maximum running time of O(n(k-t)/k)] and for nearest neighbor queries [empirically observed average running time of O(log n).] These performances far surpass the best currently known algorithms for these tasks. An algorithm is presented to handle any general intersection query. The main focus of this paper is theoretical. It is felt, however, that k-d trees could be quite useful in many applications, and examples of potential uses are given.

Jon Louis Bentley

Research and Advances Aug 1 1975

Combining decision rules in a decision table

The techniques for minimizing logic circuits are applied to the simplification of decision tables by the combining of decision rules. This method is logically equivalent to the Quine-McCluskey method for finding prime implicants. If some of the decision rules implied in the ELSE Rule occur with low frequency, then the ELSE Rule can be used to further simplify the decision table. Several objectives merit consideration in optimizing a decision table: reducing machine execution time; reducing preprocessing time; reducing required machine memory; reducing the number of decision rules. (This often improves the clarity of the decision table to a human reader.) It will be shown that objectives (3) and (4) can be furthered with the above methods. Objective (1) is also attained if overspecified decision rules are not combined. Objective (2) must be compared against the potential benefits of objectives (1), (3), and (4) in deciding whether to use the above methods.

Keith Shwayder

Research and Advances Aug 1 1975

Consecutive storage of relevant records with redundancy

This paper studies the properties of a new class of file organizations (CRWR) where records relevant to every query are stored in consecutive storage locations but the organizations contain redundancy. Some theorems which provide tools for reducing redundancy in CRWR organizations have been also developed. Redundancies obtained by the application of these theorems are compared with that of query-inverted file organizations. Some CRWR organizations with minimum redundancy have also been developed for queries which specify sets of keys.

Sakti P. Ghosh

Research and Advances Aug 1 1975

Interactive consulting via natural language

Interactive programming systems often contain help commands to give the programmer on-line instruction regarding the use of the various systems commands. It is argued that it would be relatively easy to make these help commands significantly more helpful by having them accept requests in natural language. As a demonstration, Weizenbaum's ELIZA program has been provided with a script that turns it into a natural language system consultant.

Stuart C. Shapiro and Stanley C. Kwasny

Research and Advances Jul 1 1975

The restriction language for computer grammars of natural language

Over the past few years, a number of systems for the computer analysis of natural language sentences have been based on augmented context-free grammars: a context-free grammar which defines a set of parse trees for a sentence, plus a group of restrictions to which a tree must conform in order to be a valid sentence analysis. As the coverage of the grammar is increased, an efficient representation becomes essential for further development. This paper presents a programming language designed specifically for the compact and perspicuous statement of restrictions of a natural language grammar. It is based on ten years' experience parsing text sentences with the comprehensive English grammar of the N.Y.U. Linguistic String Project, and embodies in its syntax and routines the relations which were found to be useful and adequate for computerized natural language analysis. The language is used in the current implementation of the Linguistic String Parser.

Naomi Sager and Ralph Grishman

Research and Advances Jun 1 1975

Sorting X + Y

L. H. Harper, T. H. Payne, J. E. Savage, and E. Straus

Research and Advances Jun 1 1975

Efficient string matching: an aid to bibliographic search

This paper describes a simple, efficient algorithm to locate all occurrences of any of a finite number of keywords in a string of text. The algorithm consists of constructing a finite state pattern matching machine from the keywords and then using the pattern matching machine to process the text string in a single pass. Construction of the pattern matching machine takes time proportional to the sum of the lengths of the keywords. The number of state transitions made by the pattern matching machine in processing the text string is independent of the number of keywords. The algorithm has been used to improve the speed of a library bibliographic search program by a factor of 5 to 10.

Alfred V. Aho and Margaret J. Corasick

Research and Advances May 1 1975

Analysis and performance of inverted data base structures

The need to envision and architecture data base systems in a hierarchical level by level framework is stressed. The inverted data base (file) organization is then analyzed, considering implementation oriented aspects. The inverted directory is viewed realistically as another large data base which itself is subjected to inversion. Formulations are derived to estimate average access time (read only) and storage requirements, formalizing the interaction of data base content characteristics, logical complexity of queries, and machine timing and blocking specifications identified as having a first-order effect on performance. The formulations presented are necessary to be used in conjunction with any index selection criteria to determine the optimum set of index keys.

Alfonso F. Cárdenas

Research and Advances Mar 1 1975

The algorithm sequential access method: an alternative to index sequential

A requirement of many computer file processing systems is the ability to access data stored on a direct access storage device (DASD) both sequentially and randomly. A popular method which provides both types of access is the index sequential organization. The purpose of this study is to propose and analyze a file organization method which should be appropriate for the same type of environment as index sequential but would be more efficient at the two types of access. This proposed method is designated the Algorithm Sequential Access Method (ASAM).

Jerry D. Whitt and A. Gale Sullenberger

Research and Advances Mar 1 1975

A reply to gentleman and Marovich

Gentleman and Marovich in their short communication [Comm. ACM 17, 5 (May 1974) 276-277] make the claim that for FORTRAN “… the language definitions require that subprograms can be separately compiled…,” citing the ANSI FORTRAN standard X3.9-1966 as their authority.

Richard Alan Karp

Research and Advances Feb 1 1975

Sentence paraphrasing from a conceptual base

A model of natural language generation based on an underlying language-free representation of meaning is described. A program based on this model is able to produce sentence paraphrases which demonstrate understanding with respect to a given context. This generator operates in conjunction with a natural language analyzer and a combined memory and inference model. In generating sentences from meaning structures, the program employs both the information retrieval and deduction capabilities of the memory model. The model encompasses several diverse classes of linguistic knowledge, which include: (1) executable tests of conceptual properties stored in discrimination nets; (2) information relating conceptual to syntactic roles, stored in a word-sense dictionary, and (3) surface grammatical knowledge, stored in a formal grammar.

Neil M. Goldman

Research and Advances Jan 1 1975

Professionalism in the computing field

The term professional means different things to different people; nevertheless, there are certain general technical and social standards normally associated with a professional. Further, the term is more generally applied to the practitioner rather than to the researcher. But within the rather broad definition specified, the computing practitioner is, as yet, not regarded as a professional. Each of the four types of institutions—academic, industry, government, and the professional society—that educate, employ, regulate, and mold the practitioner contributes to the “nonprofessional” status of the computing practitioner. The roles of these institutions are examined, various shortcomings are noted, and recommended changes are suggested. In the last analysis, professional status is not bestowed; it is earned. However, universities and industry, specifically, can make certain improvements to help the computing practitioner achieve professional status.

Aaron Finerman

Research and Advances Dec 1 1974

A heuristic approach to inductive inference in fact retrieval systems

Heuristic procedures are presented which have been developed to perform inferences by generalizing from available information. The procedures make use of a similarity structure which is imposed on the data base using nonnumerical clustering algorithms. They are implemented in a model fact retrieval system which uses a formal query language and a property-list data structure. A program of experiments is described wherein the procedures are used with test data bases which are altered by deleting part of the data and by purposely introducing false data. It is found that the system can infer the correct response under a variety of conditions involving incomplete and inconsistent data.

C. William Skinner

News Dec 1 1974

President’s letter: how and why you should get active in ACM

Jean E. Sammet

Research and Advances Nov 1 1974

The best-match problem in document retrieval

C. J. van Rijsbergen

Research and Advances Nov 1 1974

An on-site data management system application in field archaeology

J. A. Brown and Bernard Werner

Research and Advances Nov 1 1974

A method for composing simple traditional music by computer

A method is described for composing musical rounds by computer. This method uses some music theory plus additional heuristics. Fundamental to the method is a set of productions together with sets of applicability rules and weight rules which operate on the productions deciding when and to what extent they are available for use. Several rounds generated by the computer implementation of the method are presented. Generally, the resultant music sounds mediocre to the professional although usually pleasing to the layman. It appears that full-blown music theory is not needed for rounds—all the hardware required for structural levels is not necessary for these pieces. The author has tried to address both musicians and computer scientists.

Gary M. Rader

Research and Advances Nov 1 1974

Guidelines for humanizing computerized information systems: a report from Stanley House

Theodor D. Sterling

Research and Advances Oct 1 1974

A back-end computer for data base management

It is proposed that the data base management function be placed on a dedicated back-end computer which accepts commands (in a relatively high level language such as the CODASYL Data Base Task Group, April 1971 Report) from a host computer, accesses the data base on secondary storage, and returns results. The advantages of such a configuration are discussed. An experimental implementation, called the eXperimental Data Management System, XDMS, is described and certain conclusions about the back-end approach are drawn from this implementation.

Research and Advances Oct 1 1974

Structured data structures

Programming systems which permit arbitrary linked list structures enable the user to create complicated structures without sufficient protection. Deletions can result in unreachable data elements, and there is no guarantee that additions will be performed properly. To remedy this situation, this paper proposes a Data Structure Description and Manipulation Language which provides for the creation of a restricted class of data structures but ensures the correctness of the program. This is accomplished by an explicit structure declaration facility, a restriction on the permissible operations, and execution-time checks.

Ben Shneiderman and Peter Scheuermann

Research and Advances Sep 1 1974

An interactive graphic display for region partitioning by linear programming

Using linear programming, an interactive graphic display system has been implemented to solve the region design problem of partitioning a region into N nonoverlapping subregions in such a way that their areas are in specified proportions and that the total cost of servicing them is a minimum. In a conversational manner, a user can easily obtain different partitionings by specifying and modifying the boundary, the service centers' locations, the area proportions, and the cost functions. Examples are included.

To-Yat Cheung

Research and Advances Aug 1 1974

A new technique for compression and storage of data

The widespread tendency toward storage of large programs and blocks of text has produced a need for efficient methods of compressing and storing data. This paper describes techniques that can, in most cases, decrease storage size by a factor of from two to four. The techniques involve special handling of leading and trailing blanks, and the encoding of other symbols in groups of fixed size as unique fixed point numbers. The efficiency of the system is considered and pertinent statistics are given and compared with statistics for other information coding techniques.

Bruce Hahn

Opinion Jul 1 1974

ACM forum

CORPORATE CACM Staff

Shape the Future of Computing

Communications of the ACM (CACM) is now a fully Open Access publication.