January 1983 - Vol. 26 No. 1

January 1983 issue cover image

Features

Research and Advances

Sequential formula translation

The syntax of an algorithmic language such as ALGOL is conveniently described as a sequence of states indicated by an element called cellar. Transitions are controlled by admissible state-symbol pairs which may be represented by a transition matrix. This description of syntax furnishes at the same time an extremely simple rule for translating into machine programs statements in the algorithmic language. Sequential treatment, however, is not feasible in the case of certain optimizing processes such as recursive address calculation.
Research and Advances

A syntax directed compiler for ALGOL 60

Although one generally thinks of a compiler as a program for a computer which translates some object language into a target language, in fact this program also serves to define the object language in terms of the target language. In early compilers, these two functions are fused inextricably in the machine language program which is the compiler. This fusion makes incorporation into the compiler of extensions or modifications to the object language extremely difficult. This paper describes a compiling system which essentially separates the functions of defining the language and translating it into another. Part 1 presents the meta-language used to define the object language in terms of the target language. This meta-language is an extension of the syntax meta-language used in the ALGOL 60 report which allows specification of meaning (in terms of the target language) as well as of form. This succinct definition allows modifications to the form or meaning of the object language to be incorporated easily into the system, and in fact makes the original specification of the object language a reasonably easy task. Part 2 is a description of the program which utilizes a direct machine representation of the meta-linguistic specifications to effect a translation. Before proceeding to a description of the meta-language we wish to demonstrate heuristically that the proposed meta-language does suffice to specify a translation for any language it can describe. If one proposes to translate language A into language B, it is necessary to have some kind of description of language A in terms of language B. More specifically, one must be able to describe the alphabet of A in B, and must have a set of rules for assigning meaning in B to various possible structures which can be formed in A by concatenating the characters of A's alphabet. The set of rules might be called the syntax of language A, if one considers definitions (in the usual sense of the word) to be merely additional rules of syntax. A translation process might then be to start with the beginning symbols of the string to be translated and to assign meaning and a new syntactic name to symbol groups as they fall into the several syntactic structures. Having thus formed a new set of syntactic elements, the next step is to modify the meanings or amplify them according to the new structures into which these syntactic elements fall. If one considers the characters of the alphabet to be syntactical units themselves, the two steps in the process are indeed indentical. Evidently the only restriction necessary to make such a description uniquely specify a language is that there be a unique syntactic structure for any possible finite string of symbols in the language. Given that the syntactic description meets this uniqueness requirement, a translation using the description can be effected by fitting already discovered syntactic units (starting with the syntactic units which are the basic symbols of the language) into the syntactic structure to produce a new set of larger syntactic units, and assign meanings to these new units according to the meanings of the original units. This translation algorithm is then repeated on the new set of larger syntactic units.Since several syntactic units are coalesced effectively into one each time the translation algorithm is performed, the process will converge to exactly one syntactic unit—a “program”. Also the meaning of that unit will have been discovered. In essence, the process is to diagram the string of symbols according to syntax and simultaneously to keep track of the meaning associated with each structure. It therefore suffices to define language A in terms of B by listing a series of “definitions”, each definition listing:A string of syntactic units in A.One syntactic unit of A which is the equivalent of the string.The meaning (described in language B) associated with the syntactic structure or, if meaning has already been assigned to some of the syntactic units of 1, modifications or amplifications of these meanings. When human beings translate one language to another, we would generally consult a subset of rules like these, namely a dictionary, in order to assign basic meanings to concatenated characters of the alphabet. The rules for assigning meaning (or modifying meanings) of larger syntactic units are usually kept in the head of even the amateur translator. But nevertheless, the rules must be referenced to complete any translation and, if we are to ask a computing machine to translate for us, we must be able to make these rules explicit.
Research and Advances

Use of tree structures for processing files

In data processing problems, files are frequently used which must both be searched and altered. Binary search techniques are efficient for searching large files, but the associated file organization is not readily adapted to the file alterations. Conversely, a chained file allocation permits elcient alteration but cannot be searched efficiently. A file organized into a tree-like structure is discussed, and it is shown that such a file may both be searched and altered with times proportional to s log, N, where N is the number of file items and s is a parameter of the tree. It is also shown that optimizing the value of s leads to a search time which is only 25 per cent slower than the binary search. The tree organization employs two data chains and may be considered to be a compromise between the organizations for the binary search and the chained file. The relation of the tree organization to multidimensional indexing and to the trie structure is also discussed.
Research and Advances

ELIZA — a computer program for the study of natural language communication between man and machine

ELIZA is a program operating within the MAC time-sharing system of MIT which makes certain kinds of natural language conversation between man and computer possible. Input sentences are analyzed on the basis of decomposition rules which are triggered by key words appearing in the input text. Responses are generated by reassembly rules associated with selected decomposition rules. The fundamental technical problems with which ELIZA is concerned are: (1) the identification of key words, (2) the discovery of minimal context, (3) the choice of appropriate transformations, (4) generation of responses in the absence of key words, and (5) the provision of an editing capability for ELIZA “scripts”. A discussion of some psychological issues relevant to the ELIZA approach as well as of future developments concludes the paper.
Research and Advances

Programming semantics for multiprogrammed computations

The semantics are defined for a number of meta-instructions which perform operations essential to the writing of programs in multiprogrammed computer systems. These meta-instructions relate to parallel processing, protection of separate computations, program debugging, and the sharing among users of memory segments and other computing objects, the names of which are hierarchically structured. The language sophistication contemplated is midway between an assembly language and an advanced algebraic language.
Research and Advances

An improved hash code for scatter storage

Introduced is a hash coding method based on fixed-point division rather than multiplication or logical operations. This new method allows the hash table to have almost any length. Also a new method of handling collisions is discussed. Known as quadratic search, this method is faster than random search and free from the “clusters” that build up with a linear search.
Research and Advances

The working set model for program behavior

Probably the most basic reason behind the absence of a general treatment of resource allocation in modern computer systems is an adequate model for program behavior. In this paper a new model, the “working set model,” is developed. The working set of pages associated with a process, defined to be the collection of its most recently used pages, provides knowledge vital to the dynamic management of paged memories. “Process” and “working set” are shown to be manifestations of the same ongoing computational activity; then “processor demand” and “memory demand” are defined; and resource allocation is formulated as the problem of balancing demands against available equipment.
Research and Advances

The structure of “THE”-multiprogramming system

A multiprogramming system is described in which all activities are divided over a number of sequential processes. These sequential processes are placed at various hierarchical levels, in each of which one or more independent abstractions have been implemented. The hierarchical structure proved to be vital for the verification of the logical soundness of the design and the correctness of its implementation.
Research and Advances

An axiomatic basis for computer programming

In this paper an attempt is made to explore the logical foundations of computer programming by use of techniques which were first applied in the study of geometry and have later been extended to other branches of mathematics. This involves the elucidation of sets of axioms and rules of inference which can be used in proofs of the properties of computer programs. Examples are given of such axioms and rules, and a formal proof of a simple theorem is displayed. Finally, it is argued that important advantages, both theoretical and practical, may follow from a pursuance of these topics.
Research and Advances

An efficient context-free parsing algorithm

A parsing algorithm which seems to be the most efficient general context-free algorithm known is described. It is similar to both Knuth's LR(k) algorithm and the familiar top-down algorithm. It has a time bound proportional to n3 (where n is the length of the string being parsed) in general; it has an n2 bound for unambiguous grammars; and it runs in linear time on a large class of grammars, which seems to include most practical context-free programming language grammars. In an empirical comparison it appears to be superior to the top-down and bottom-up algorithms studied by Griffiths and Petrick.
Research and Advances

A relational model of data for large shared data banks

Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation). A prompting service which supplies such information is not a satisfactory solution. Activities of users at terminals and most application programs should remain unaffected when the internal representation of data is changed and even when some aspects of the external representation are changed. Changes in data representation will often be needed as a result of changes in query, update, and report traffic and natural growth in the types of stored information. Existing noninferential, formatted data systems provide users with tree-structured files or slightly more general network models of the data. In Section 1, inadequacies of these models are discussed. A model based on n-ary relations, a normal form for data base relations, and the concept of a universal data sublanguage are introduced. In Section 2, certain operations on relations (other than logical inference) are discussed and applied to the problems of redundancy and consistency in the user's model.
Research and Advances

Program development by stepwise refinement

The creative activity of programming—to be distinguished from coding—is usually taught by examples serving to exhibit certain techniques. It is here considered as a sequence of design decisions concerning the decomposition of tasks into subtasks and of data into data structures. The process of successive refinement of specifications is illustrated by a short but nontrivial example, from which a number of conclusions are drawn regarding the art and the instruction of programming.
Research and Advances

A technique for software module specification with examples

This paper presents an approach to writing specifications for parts of software systems. The main goal is to provide specifications sufficiently precise and complete that other pieces of software can be written to interact with the piece specified without additional information. The secondary goal is to include in the specification no more information than necessary to meet the first goal. The technique is illustrated by means of a variety of examples from a tutorial system.
Research and Advances

A statistical study of the accuracy of floating point number systems

This paper presents the statistical results of tests of the accuracy of certain arithmetic systems in evaluating sums, products and inner products, and analytic error estimates for some of the computations. The arithmetic systems studied are 6-digit hexadecimal and 22-digit binary floating point number representations combined with the usual chop and round modes of arithmetic with various numbers of guard digits, and with a modified round mode with guard digits. In a certain sense, arithmetic systems differing only in their use of binary or hexadecimal number representations are shown to be approximately statistically equivalent in accuracy. Further, the usual round mode with guard digits is shown to be statistically superior in accuracy to the usual chop mode in all cases save one. The modified round mode is found to be superior to the chop mode in all cases.
Research and Advances

The UNIX time-sharing system

UNIX is a general-purpose, multi-user, interactive operating system for the Digital Equipment Corporation PDP-11/40 and 11/45 computers. It offers a number of features seldom found even in a larger operating systems, including: (1) a hierarchical file system incorporating demountable volumes; (2) compatible file, device, and inter-process I/O; (3) the ability to initiate asynchronous processes; (4) system command language selectable on a per-user basis; and (5) over 100 subsystems including a dozen languages. This paper discusses the nature and implementation of the file system and of the user command interface.
Research and Advances

Ethernet: distributed packet switching for local computer networks

Ethernet is a branching broadcast communication system for carrying digital data packets among locally distributed computing stations. The packet transport mechanism provided by Ethernet has been used to build systems which can be viewed as either local computer networks or loosely coupled multiprocessors. An Ethernet's shared communication facility, its Ether, is a passive broadcast medium with no central control. Coordination of access to the Ether for packet broadcasts is distributed among the contending transmitting stations using controlled statistical arbitration. Switching of packets to their destinations on the Ether is distributed among the receiving stations using packet address recognition. Design principles and implementation are described, based on experience with an operating Ethernet of 100 nodes along a kilometer of coaxial cable. A model for estimating performance under heavy loads and a packet protocol for error controlled communication are included for completeness.
Research and Advances

A method for obtaining digital signatures and public-key cryptosystems

An encryption method is presented with the novel property that publicly revealing an encryption key does not thereby reveal the corresponding decryption key. This has two important consequences: Couriers or other secure means are not needed to transmit keys, since a message can be enciphered using an encryption key publicly revealed by the intended recipient. Only he can decipher the message, since only he knows the corresponding decryption key. A message can be “signed” using a privately held decryption key. Anyone can verify this signature using the corresponding publicly revealed encryption key. Signatures cannot be forged, and a signer cannot later deny the validity of his signature. This has obvious applications in “electronic mail” and “electronic funds transfer” systems. A message is encrypted by representing it as a number M, raising M to a publicly specified power e, and then taking the remainder when the result is divided by the publicly specified product, n, of two large secret prime numbers p and q. Decryption is similar; only a different, secret, power d is used, where e * d = 1(mod (p - 1) * (q - 1)). The security of the system rests in part on the difficulty of factoring the published divisor, n.
Research and Advances

Communicating sequential processes

This paper suggests that input and output are basic primitives of programming and that parallel composition of communicating sequential processes is a fundamental program structuring method. When combined with a development of Dijkstra's guarded command, these concepts are surprisingly versatile. Their use is illustrated by sample solutions of a variety of familiar programming exercises.

Recent Issues

  1. December 2024 CACM cover
    December 2024 Vol. 67 No. 12
  2. November 2024 CACM cover
    November 2024 Vol. 67 No. 11
  3. October 2024 CACM cover
    October 2024 Vol. 67 No. 10
  4. September 2024 CACM cover
    September 2024 Vol. 67 No. 9