Advertisement

Research and Advances

A preliminary system for the design of DBTG data structures

The functional approach to database design is introduced. In this approach the goal of design is to derive a data structure which is capable of supporting a set of anticipated queries rather than a structure which “models the business” in some other way. An operational computer program is described which utilizes the functional approach to design data structures conforming to the Data Base Task Group specifications. The automatic programming technology utilized by this program, although typically used to generate procedure, is here used to generate declaratives.
Research and Advances

Multidimensional binary search trees used for associative searching

This paper develops the multidimensional binary search tree (or k-d tree, where k is the dimensionality of the search space) as a data structure for storage of information to be retrieved by associative searches. The k-d tree is defined and examples are given. It is shown to be quite efficient in its storage requirements. A significant advantage of this structure is that a single data structure can handle many types of queries very efficiently. Various utility algorithms are developed; their proven average running times in an n record file are: insertion, O(log n); deletion of the root, O(n(k-1)/k); deletion of a random node, O(log n); and optimization (guarantees logarithmic performance of searches), O(n log n). Search algorithms are given for partial match queries with t keys specified [proven maximum running time of O(n(k-t)/k)] and for nearest neighbor queries [empirically observed average running time of O(log n).] These performances far surpass the best currently known algorithms for these tasks. An algorithm is presented to handle any general intersection query. The main focus of this paper is theoretical. It is felt, however, that k-d trees could be quite useful in many applications, and examples of potential uses are given.
Research and Advances

Combining decision rules in a decision table

The techniques for minimizing logic circuits are applied to the simplification of decision tables by the combining of decision rules. This method is logically equivalent to the Quine-McCluskey method for finding prime implicants. If some of the decision rules implied in the ELSE Rule occur with low frequency, then the ELSE Rule can be used to further simplify the decision table. Several objectives merit consideration in optimizing a decision table: reducing machine execution time; reducing preprocessing time; reducing required machine memory; reducing the number of decision rules. (This often improves the clarity of the decision table to a human reader.) It will be shown that objectives (3) and (4) can be furthered with the above methods. Objective (1) is also attained if overspecified decision rules are not combined. Objective (2) must be compared against the potential benefits of objectives (1), (3), and (4) in deciding whether to use the above methods.
Research and Advances

Consecutive storage of relevant records with redundancy

This paper studies the properties of a new class of file organizations (CRWR) where records relevant to every query are stored in consecutive storage locations but the organizations contain redundancy. Some theorems which provide tools for reducing redundancy in CRWR organizations have been also developed. Redundancies obtained by the application of these theorems are compared with that of query-inverted file organizations. Some CRWR organizations with minimum redundancy have also been developed for queries which specify sets of keys.
Research and Advances

Interactive consulting via natural language

Interactive programming systems often contain help commands to give the programmer on-line instruction regarding the use of the various systems commands. It is argued that it would be relatively easy to make these help commands significantly more helpful by having them accept requests in natural language. As a demonstration, Weizenbaum's ELIZA program has been provided with a script that turns it into a natural language system consultant.
Research and Advances

The restriction language for computer grammars of natural language

Over the past few years, a number of systems for the computer analysis of natural language sentences have been based on augmented context-free grammars: a context-free grammar which defines a set of parse trees for a sentence, plus a group of restrictions to which a tree must conform in order to be a valid sentence analysis. As the coverage of the grammar is increased, an efficient representation becomes essential for further development. This paper presents a programming language designed specifically for the compact and perspicuous statement of restrictions of a natural language grammar. It is based on ten years' experience parsing text sentences with the comprehensive English grammar of the N.Y.U. Linguistic String Project, and embodies in its syntax and routines the relations which were found to be useful and adequate for computerized natural language analysis. The language is used in the current implementation of the Linguistic String Parser.
Research and Advances

Efficient string matching: an aid to bibliographic search

This paper describes a simple, efficient algorithm to locate all occurrences of any of a finite number of keywords in a string of text. The algorithm consists of constructing a finite state pattern matching machine from the keywords and then using the pattern matching machine to process the text string in a single pass. Construction of the pattern matching machine takes time proportional to the sum of the lengths of the keywords. The number of state transitions made by the pattern matching machine in processing the text string is independent of the number of keywords. The algorithm has been used to improve the speed of a library bibliographic search program by a factor of 5 to 10.
Research and Advances

Analysis and performance of inverted data base structures

The need to envision and architecture data base systems in a hierarchical level by level framework is stressed. The inverted data base (file) organization is then analyzed, considering implementation oriented aspects. The inverted directory is viewed realistically as another large data base which itself is subjected to inversion. Formulations are derived to estimate average access time (read only) and storage requirements, formalizing the interaction of data base content characteristics, logical complexity of queries, and machine timing and blocking specifications identified as having a first-order effect on performance. The formulations presented are necessary to be used in conjunction with any index selection criteria to determine the optimum set of index keys.
Research and Advances

The algorithm sequential access method: an alternative to index sequential

A requirement of many computer file processing systems is the ability to access data stored on a direct access storage device (DASD) both sequentially and randomly. A popular method which provides both types of access is the index sequential organization. The purpose of this study is to propose and analyze a file organization method which should be appropriate for the same type of environment as index sequential but would be more efficient at the two types of access. This proposed method is designated the Algorithm Sequential Access Method (ASAM).
Research and Advances

A reply to gentleman and Marovich

Gentleman and Marovich in their short communication [Comm. ACM 17, 5 (May 1974) 276-277] make the claim that for FORTRAN “… the language definitions require that subprograms can be separately compiled…,” citing the ANSI FORTRAN standard X3.9-1966 as their authority.
Research and Advances

Sentence paraphrasing from a conceptual base

A model of natural language generation based on an underlying language-free representation of meaning is described. A program based on this model is able to produce sentence paraphrases which demonstrate understanding with respect to a given context. This generator operates in conjunction with a natural language analyzer and a combined memory and inference model. In generating sentences from meaning structures, the program employs both the information retrieval and deduction capabilities of the memory model. The model encompasses several diverse classes of linguistic knowledge, which include: (1) executable tests of conceptual properties stored in discrimination nets; (2) information relating conceptual to syntactic roles, stored in a word-sense dictionary, and (3) surface grammatical knowledge, stored in a formal grammar.
Research and Advances

Professionalism in the computing field

The term professional means different things to different people; nevertheless, there are certain general technical and social standards normally associated with a professional. Further, the term is more generally applied to the practitioner rather than to the researcher. But within the rather broad definition specified, the computing practitioner is, as yet, not regarded as a professional. Each of the four types of institutions—academic, industry, government, and the professional society—that educate, employ, regulate, and mold the practitioner contributes to the “nonprofessional” status of the computing practitioner. The roles of these institutions are examined, various shortcomings are noted, and recommended changes are suggested. In the last analysis, professional status is not bestowed; it is earned. However, universities and industry, specifically, can make certain improvements to help the computing practitioner achieve professional status.
Research and Advances

A heuristic approach to inductive inference in fact retrieval systems

Heuristic procedures are presented which have been developed to perform inferences by generalizing from available information. The procedures make use of a similarity structure which is imposed on the data base using nonnumerical clustering algorithms. They are implemented in a model fact retrieval system which uses a formal query language and a property-list data structure. A program of experiments is described wherein the procedures are used with test data bases which are altered by deleting part of the data and by purposely introducing false data. It is found that the system can infer the correct response under a variety of conditions involving incomplete and inconsistent data.
Research and Advances

A method for composing simple traditional music by computer

A method is described for composing musical rounds by computer. This method uses some music theory plus additional heuristics. Fundamental to the method is a set of productions together with sets of applicability rules and weight rules which operate on the productions deciding when and to what extent they are available for use. Several rounds generated by the computer implementation of the method are presented. Generally, the resultant music sounds mediocre to the professional although usually pleasing to the layman. It appears that full-blown music theory is not needed for rounds—all the hardware required for structural levels is not necessary for these pieces. The author has tried to address both musicians and computer scientists.
Research and Advances

A back-end computer for data base management

It is proposed that the data base management function be placed on a dedicated back-end computer which accepts commands (in a relatively high level language such as the CODASYL Data Base Task Group, April 1971 Report) from a host computer, accesses the data base on secondary storage, and returns results. The advantages of such a configuration are discussed. An experimental implementation, called the eXperimental Data Management System, XDMS, is described and certain conclusions about the back-end approach are drawn from this implementation.
Research and Advances

Structured data structures

Programming systems which permit arbitrary linked list structures enable the user to create complicated structures without sufficient protection. Deletions can result in unreachable data elements, and there is no guarantee that additions will be performed properly. To remedy this situation, this paper proposes a Data Structure Description and Manipulation Language which provides for the creation of a restricted class of data structures but ensures the correctness of the program. This is accomplished by an explicit structure declaration facility, a restriction on the permissible operations, and execution-time checks.
Research and Advances

An interactive graphic display for region partitioning by linear programming

Using linear programming, an interactive graphic display system has been implemented to solve the region design problem of partitioning a region into N nonoverlapping subregions in such a way that their areas are in specified proportions and that the total cost of servicing them is a minimum. In a conversational manner, a user can easily obtain different partitionings by specifying and modifying the boundary, the service centers' locations, the area proportions, and the cost functions. Examples are included.
Research and Advances

A new technique for compression and storage of data

The widespread tendency toward storage of large programs and blocks of text has produced a need for efficient methods of compressing and storing data. This paper describes techniques that can, in most cases, decrease storage size by a factor of from two to four. The techniques involve special handling of leading and trailing blanks, and the encoding of other symbols in groups of fixed size as unique fixed point numbers. The efficiency of the system is considered and pertinent statistics are given and compared with statistics for other information coding techniques.

Shape the Future of Computing

ACM encourages its members to take a direct hand in shaping the future of the association. There are more ways than ever to get involved.

Get Involved

Communications of the ACM (CACM) is now a fully Open Access publication.

By opening CACM to the world, we hope to increase engagement among the broader computer science community and encourage non-members to discover the rich resources ACM has to offer.

Learn More