May 1963 - Vol. 6 No. 5

May 1963 issue cover image

Features

Research and Advances

Internal and tape sorting using the replacement-selection technique

A general technique for sequencing unsorted records is presented. The technique is shown to be applicable for the first stage of a generalized sort program (the formation of initial strings) as well as for sorting records within a memory storage (an internal sort). It is shown that given N records in memory storage, records are sequenced using 1+log2 N tests per record, that initial string lengths will average 2N for random input records, and that reading, writing and processing can be accomplished simultaneously if the computer permits such overlap.
Research and Advances

Multiphase sorting

With a limited number of tape drives available for sorting, the polyphase technique of merging provides faster sorting than the conventional balanced method of merging. This paper is an attempt to describe the polyphase method used in the IBM 1401 Sort 2 program for a computer system with four tape drives available.
Research and Advances

Read-backward polyphase sorting

Read-backward Polyphase sorting provides more efficient use of the tapes available to a sort than most other sorting techniques. Backward Polyphase produces a continuous merging process from n - 1 tapes where n is the total number of tapes being used in the sorting process. Any of the available presorting techniques may be used in conjunction with the Polyphase merge sort provided that the presort has the capability of producing both ascending and descending strings and distributing the strings on the various tapes as required by the Polyphase Merge.
Research and Advances

A comparison between the polyphase and oscillating sort techniques

Read-backward Polyphase sorting provides more efficient use of the tapes available to a sort than most other sorting techniques. Backward Polyphase produces a continuous merging process from n - 1 tapes where n is the total number of tapes being used in the sorting process. Any of the available presorting techniques may be used in conjunction with the Polyphase merge sort provided that the presort has the capability of producing both ascending and descending strings and distributing the strings on the various tapes as required by the Polyphase Merge.
Research and Advances

Computer planned collates

Little effort has been directed toward the creation of efficient, easy-to-use file merging routines. This lack of effort undoubtedly stems from the feeling that a process as simple and straightforward as merging two or more files does not lend itself to sophisticated techniques.
Research and Advances

A tape file merge pattern generator

A routine is presented which specifies the sequence of merge cycles to effect the merging of sorted tape files. The routine is designed to minimize elapsed computer time by varying the power of the merge cycles, so as to use all the available tape drives, with its characteristic of assigning one drive to a single-eel file and two drives to each multiple-reel file.
Research and Advances

Sorting with large volume, random access, drum storage

An approach to sorting records is described using random access drum memory. The Sort program described is designed to be a generalized, self-generating sort, applicable to a variety of record statements. This description is divided into three parts. The first part presents the operating environment; the second defines the general solution; the third part describes the internal sort-merge technique.
Research and Advances

Organization and structure of data on disk file memory systems for efficient sorting and other data processing programs

An approach to the organization and structure of data on Bryant Disc File Memory Systems for sorting and performing other data processing functions is presented. The following areas are covered: characteristics of Bryant Disc File Systems on the Bendix G-20 and RCA 301; two proposed “chaining” structures for data; and functions of a Disk File Executive Routine. The concepts for sorting and performing file maintenance processing using the proposed structure and executive routine are discussed. Additionally, it is shown that sorting can be accomplished without the use of disk storage work areas.
Research and Advances

Some characteristics of sorting computing systems using random access storage devices

The substantial differences in characteristics of random access storage and tape devices dictate that concepts and objectives of computer program design be considered from the viewpoint of the external file medium used. This is particularly true in the case of sorting. In a tape-oriented system, the major sorting problem is that of minimizing merge time despite the limited orders of merge possible. In contrast, sorting in a random access-oriented system encourages the selection of the optimum order of merge from many possible orders. The latter problem is discussed in this paper, along with criteria developed for determining the optimum order of merge according to the various properties of random access storage devices. Attention is also given to the problem of key sorting versus record sorting and the possibly serious disadvantage of key sorting on a random access system.
Research and Advances

The COBOL sorting verb

The COBOL-61 specifications have recently been augmented with a number of extensions, one of which is a SORT verb. COBOL was designed initially for use in the processing of data-files stored on serially-accessed input-output devices. In applications of this general type of processing, the sorting of data files is frequently found to be necessary. Furthermore, it is often convenient and economical to carry out other operations at the same time, such as pre-editing or pre-selection of data records as they enter the sorting process, or editing, merging, or report preparation as the records emerge from the sort. It is a great convenience to the user to be able to express all of these operations in the same procedural language that he is using for his other data-processing applications, instead of having to use a sorting system in which first pass and last pass own coding must be expressed in some other language. Furthermore, COBOL itself is not well adapted to the writing of efficient sorting programs. Accordingly, a sort function appeared to be a very desirable addition to the COBOL specifications.
Research and Advances

A method of comparing the time requirements of sorting methods

Formulas have been developed for estimating the sorting time required by most of the known internal sorting methods in terms of parameters describing the file to be sorted and the computer time required for basic sorting operations. Procedures are described for using these estimates to compare the efficiency of various methods, and results of several comparisons are given. A procedure to determine optimum combinations of sorting methods is shown. An expression for tape time required by a balanced merge sort is obtained, and difficulties in obtaining such expressions for cascade, polyphase and oscillating sorts are discussed.
Research and Advances

Design and characteristics of a variable-length record sort using new fixed-length record sorting techniques

This paper describes the application of several new techniques for sorting fixed-length records to the problem of variable-length record sorting. The techniques have been implemented on a Sylvania 9400 computer system with 32,000 fixed-length words of memory. Specifically, the techniques sequence variable-length records of unrestricted size, produce long initial strings of data, merge strings of data at the power of T - 1, where T is the number of work tapes in a system, and do not restrict the volume of input data.
Research and Advances

Use of tree structures for processing files

In data processing problems, files are frequently used which must both be searched and altered. Binary search techniques are efficient for searching large files, but the associated file organization is not readily adapted to the file alterations. Conversely, a chained file allocation permits e 1 cient alteration but cannot be searched efficiently. A file organized into a tree-like structure is discussed, and it is shown that such a file may both be searched and altered with times proportional to s logs N, where N is the number of file items and s is a parameter of the tree. It is also shown that optimizing the value of s leads to a search time which is only 25 per cent slower than the binary search. The tree organization employs two data chains and may be considered to be a compromise between the organizations for the binary search and the chained file. The relation of the tree organization to multidimensional indexing and to the trie structure is also discussed.

Recent Issues

  1. November 2024 CACM cover
    November 2024 Vol. 67 No. 11
  2. October 2024 CACM cover
    October 2024 Vol. 67 No. 10
  3. September 2024 CACM cover
    September 2024 Vol. 67 No. 9
  4. August 2024 CACM cover
    August 2024 Vol. 67 No. 8