Advertisement

Research and Advances

Efficient string matching: an aid to bibliographic search

This paper describes a simple, efficient algorithm to locate all occurrences of any of a finite number of keywords in a string of text. The algorithm consists of constructing a finite state pattern matching machine from the keywords and then using the pattern matching machine to process the text string in a single pass. Construction of the pattern matching machine takes time proportional to the sum of the lengths of the keywords. The number of state transitions made by the pattern matching machine in processing the text string is independent of the number of keywords. The algorithm has been used to improve the speed of a library bibliographic search program by a factor of 5 to 10.
Research and Advances

Analysis and performance of inverted data base structures

The need to envision and architecture data base systems in a hierarchical level by level framework is stressed. The inverted data base (file) organization is then analyzed, considering implementation oriented aspects. The inverted directory is viewed realistically as another large data base which itself is subjected to inversion. Formulations are derived to estimate average access time (read only) and storage requirements, formalizing the interaction of data base content characteristics, logical complexity of queries, and machine timing and blocking specifications identified as having a first-order effect on performance. The formulations presented are necessary to be used in conjunction with any index selection criteria to determine the optimum set of index keys.
Research and Advances

The algorithm sequential access method: an alternative to index sequential

A requirement of many computer file processing systems is the ability to access data stored on a direct access storage device (DASD) both sequentially and randomly. A popular method which provides both types of access is the index sequential organization. The purpose of this study is to propose and analyze a file organization method which should be appropriate for the same type of environment as index sequential but would be more efficient at the two types of access. This proposed method is designated the Algorithm Sequential Access Method (ASAM).
Research and Advances

A reply to gentleman and Marovich

Gentleman and Marovich in their short communication [Comm. ACM 17, 5 (May 1974) 276-277] make the claim that for FORTRAN “… the language definitions require that subprograms can be separately compiled…,” citing the ANSI FORTRAN standard X3.9-1966 as their authority.
Research and Advances

Sentence paraphrasing from a conceptual base

A model of natural language generation based on an underlying language-free representation of meaning is described. A program based on this model is able to produce sentence paraphrases which demonstrate understanding with respect to a given context. This generator operates in conjunction with a natural language analyzer and a combined memory and inference model. In generating sentences from meaning structures, the program employs both the information retrieval and deduction capabilities of the memory model. The model encompasses several diverse classes of linguistic knowledge, which include: (1) executable tests of conceptual properties stored in discrimination nets; (2) information relating conceptual to syntactic roles, stored in a word-sense dictionary, and (3) surface grammatical knowledge, stored in a formal grammar.
Research and Advances

Professionalism in the computing field

The term professional means different things to different people; nevertheless, there are certain general technical and social standards normally associated with a professional. Further, the term is more generally applied to the practitioner rather than to the researcher. But within the rather broad definition specified, the computing practitioner is, as yet, not regarded as a professional. Each of the four types of institutions—academic, industry, government, and the professional society—that educate, employ, regulate, and mold the practitioner contributes to the “nonprofessional” status of the computing practitioner. The roles of these institutions are examined, various shortcomings are noted, and recommended changes are suggested. In the last analysis, professional status is not bestowed; it is earned. However, universities and industry, specifically, can make certain improvements to help the computing practitioner achieve professional status.
Research and Advances

A heuristic approach to inductive inference in fact retrieval systems

Heuristic procedures are presented which have been developed to perform inferences by generalizing from available information. The procedures make use of a similarity structure which is imposed on the data base using nonnumerical clustering algorithms. They are implemented in a model fact retrieval system which uses a formal query language and a property-list data structure. A program of experiments is described wherein the procedures are used with test data bases which are altered by deleting part of the data and by purposely introducing false data. It is found that the system can infer the correct response under a variety of conditions involving incomplete and inconsistent data.
Research and Advances

A method for composing simple traditional music by computer

A method is described for composing musical rounds by computer. This method uses some music theory plus additional heuristics. Fundamental to the method is a set of productions together with sets of applicability rules and weight rules which operate on the productions deciding when and to what extent they are available for use. Several rounds generated by the computer implementation of the method are presented. Generally, the resultant music sounds mediocre to the professional although usually pleasing to the layman. It appears that full-blown music theory is not needed for rounds—all the hardware required for structural levels is not necessary for these pieces. The author has tried to address both musicians and computer scientists.
Research and Advances

A back-end computer for data base management

It is proposed that the data base management function be placed on a dedicated back-end computer which accepts commands (in a relatively high level language such as the CODASYL Data Base Task Group, April 1971 Report) from a host computer, accesses the data base on secondary storage, and returns results. The advantages of such a configuration are discussed. An experimental implementation, called the eXperimental Data Management System, XDMS, is described and certain conclusions about the back-end approach are drawn from this implementation.
Research and Advances

Structured data structures

Programming systems which permit arbitrary linked list structures enable the user to create complicated structures without sufficient protection. Deletions can result in unreachable data elements, and there is no guarantee that additions will be performed properly. To remedy this situation, this paper proposes a Data Structure Description and Manipulation Language which provides for the creation of a restricted class of data structures but ensures the correctness of the program. This is accomplished by an explicit structure declaration facility, a restriction on the permissible operations, and execution-time checks.
Research and Advances

An interactive graphic display for region partitioning by linear programming

Using linear programming, an interactive graphic display system has been implemented to solve the region design problem of partitioning a region into N nonoverlapping subregions in such a way that their areas are in specified proportions and that the total cost of servicing them is a minimum. In a conversational manner, a user can easily obtain different partitionings by specifying and modifying the boundary, the service centers' locations, the area proportions, and the cost functions. Examples are included.
Research and Advances

A new technique for compression and storage of data

The widespread tendency toward storage of large programs and blocks of text has produced a need for efficient methods of compressing and storing data. This paper describes techniques that can, in most cases, decrease storage size by a factor of from two to four. The techniques involve special handling of leading and trailing blanks, and the encoding of other symbols in groups of fixed size as unique fixed point numbers. The efficiency of the system is considered and pertinent statistics are given and compared with statistics for other information coding techniques.
Research and Advances

An information-theoretic approach to text searching in direct access systems

Using direct access computer files of bibliographic information, an attempt is made to overcome one of the problems often associated with information retrieval, namely, the maintenance and use of large dictionaries, the greater part of which is used only infrequently. A novel method is presented, which maps the hyperbolic frequency distribution of text characteristics onto a rectangular distribution. This is more suited to implementation on storage devices. This method treats text as a string of characters rather than words bounded by spaces, and chooses subsets of strings such that their frequencies of occurrence are more even than those of word types. The members of this subset are then used as index keys for retrieval. The rectangular distribution of key frequencies results in a much simplified file organization and promises considerable cost advantages.
Research and Advances

Emotional content considered dangerous

I had hoped that Moorer's rebuttal to my short communication in the November 1972 Communications would close the debate on a topic which, like the computer itself, has provoked an inordinately large quantity of unqualified argument. Unfortunately, the short communications by McMorrow and Wexelblat in the May 1973 Communications lead me to believe that my position is still grossly misunderstood. Therefore, allow me to clarify these matters.
Research and Advances

Scan conversion algorithms for a cell organized raster display

Raster scan computer graphics with “real time” character generators have previously been limited to alphanumeric characters. A display has been described which extends the capabilities of this organization to include general graphics. Two fundamentally different scan conversion algorithms which have been developed to support this display are presented. One is most suitable to noninteractive applications and the other to interactive applications. The algorithms were implemented in Fortran on the CDC6400 computer. Results obtained from the implementations show that the noninteractive algorithms can significantly reduce display file storage requirements at little cost in execution time over that of a conventional raster display. The interactive algorithm can improve response time and reduce storage requirements.
Research and Advances

Attribute based file organization in a paged memory environment

The high cost of page accessing implies a need for for more careful data organization in a paged memory than is typical of most inverted file and similar approaches to multi-key retrieval. This article analyses that cost and proposes a method called multiple key hashing which attempts to minimize it. Since this approach is not always preferable to inversion, a combined method is described. The exact specifications of this combination for a file with given data and traffic characteristics is formulated as a mathematical program. The proposed heuristic solution to this program can often improve on a simple inversion technique by a factor of 2 or 3.

Shape the Future of Computing

ACM encourages its members to take a direct hand in shaping the future of the association. There are more ways than ever to get involved.

Get Involved

Communications of the ACM (CACM) is now a fully Open Access publication.

By opening CACM to the world, we hope to increase engagement among the broader computer science community and encourage non-members to discover the rich resources ACM has to offer.

Learn More