Advertisement

Organization and structure of data on disk file memory systems for efficient sorting and other data processing programs

An approach to the organization and structure of data on Bryant Disc File Memory Systems for sorting and performing other data processing functions is presented. The following areas are covered: characteristics of Bryant Disc File Systems on the Bendix G-20 and RCA 301; two proposed “chaining” structures for data; and functions of a Disk File Executive Routine. The concepts for sorting and performing file maintenance processing using the proposed structure and executive routine are discussed. Additionally, it is shown that sorting can be accomplished without the use of disk storage work areas.

The COBOL sorting verb

The COBOL-61 specifications have recently been augmented with a number of extensions, one of which is a SORT verb. COBOL was designed initially for use in the processing of data-files stored on serially-accessed input-output devices. In applications of this general type of processing, the sorting of data files is frequently found to be necessary. Furthermore, it is often convenient and economical to carry out other operations at the same time, such as pre-editing or pre-selection of data records as they enter the sorting process, or editing, merging, or report preparation as the records emerge from the sort. It is a great convenience to the user to be able to express all of these operations in the same procedural language that he is using for his other data-processing applications, instead of having to use a sorting system in which first pass and last pass own coding must be expressed in some other language. Furthermore, COBOL itself is not well adapted to the writing of efficient sorting programs. Accordingly, a sort function appeared to be a very desirable addition to the COBOL specifications.

A tape file merge pattern generator

A routine is presented which specifies the sequence of merge cycles to effect the merging of sorted tape files. The routine is designed to minimize elapsed computer time by varying the power of the merge cycles, so as to use all the available tape drives, with its characteristic of assigning one drive to a single-eel file and two drives to each multiple-reel file.

Use of tree structures for processing files

In data processing problems, files are frequently used which must both be searched and altered. Binary search techniques are efficient for searching large files, but the associated file organization is not readily adapted to the file alterations. Conversely, a chained file allocation permits e 1 cient alteration but cannot be searched efficiently. A file organized into a tree-like structure is discussed, and it is shown that such a file may both be searched and altered with times proportional to s logs N, where N is the number of file items and s is a parameter of the tree. It is also shown that optimizing the value of s leads to a search time which is only 25 per cent slower than the binary search. The tree organization employs two data chains and may be considered to be a compromise between the organizations for the binary search and the chained file. The relation of the tree organization to multidimensional indexing and to the trie structure is also discussed.

A comparison between the polyphase and oscillating sort techniques

Read-backward Polyphase sorting provides more efficient use of the tapes available to a sort than most other sorting techniques. Backward Polyphase produces a continuous merging process from n - 1 tapes where n is the total number of tapes being used in the sorting process. Any of the available presorting techniques may be used in conjunction with the Polyphase merge sort provided that the presort has the capability of producing both ascending and descending strings and distributing the strings on the various tapes as required by the Polyphase Merge.

Some characteristics of sorting computing systems using random access storage devices

The substantial differences in characteristics of random access storage and tape devices dictate that concepts and objectives of computer program design be considered from the viewpoint of the external file medium used. This is particularly true in the case of sorting. In a tape-oriented system, the major sorting problem is that of minimizing merge time despite the limited orders of merge possible. In contrast, sorting in a random access-oriented system encourages the selection of the optimum order of merge from many possible orders. The latter problem is discussed in this paper, along with criteria developed for determining the optimum order of merge according to the various properties of random access storage devices. Attention is also given to the problem of key sorting versus record sorting and the possibly serious disadvantage of key sorting on a random access system.

Read-backward polyphase sorting

Read-backward Polyphase sorting provides more efficient use of the tapes available to a sort than most other sorting techniques. Backward Polyphase produces a continuous merging process from n - 1 tapes where n is the total number of tapes being used in the sorting process. Any of the available presorting techniques may be used in conjunction with the Polyphase merge sort provided that the presort has the capability of producing both ascending and descending strings and distributing the strings on the various tapes as required by the Polyphase Merge.

Multiphase sorting

With a limited number of tape drives available for sorting, the polyphase technique of merging provides faster sorting than the conventional balanced method of merging. This paper is an attempt to describe the polyphase method used in the IBM 1401 Sort 2 program for a computer system with four tape drives available.

Sorting with large volume, random access, drum storage

An approach to sorting records is described using random access drum memory. The Sort program described is designed to be a generalized, self-generating sort, applicable to a variety of record statements. This description is divided into three parts. The first part presents the operating environment; the second defines the general solution; the third part describes the internal sort-merge technique.

A method of comparing the time requirements of sorting methods

Formulas have been developed for estimating the sorting time required by most of the known internal sorting methods in terms of parameters describing the file to be sorted and the computer time required for basic sorting operations. Procedures are described for using these estimates to compare the efficiency of various methods, and results of several comparisons are given. A procedure to determine optimum combinations of sorting methods is shown. An expression for tape time required by a balanced merge sort is obtained, and difficulties in obtaining such expressions for cascade, polyphase and oscillating sorts are discussed.

Design and characteristics of a variable-length record sort using new fixed-length record sorting techniques

This paper describes the application of several new techniques for sorting fixed-length records to the problem of variable-length record sorting. The techniques have been implemented on a Sylvania 9400 computer system with 32,000 fixed-length words of memory. Specifically, the techniques sequence variable-length records of unrestricted size, produce long initial strings of data, merge strings of data at the power of T - 1, where T is the number of work tapes in a system, and do not restrict the volume of input data.

Shape the Future of Computing

ACM encourages its members to take a direct hand in shaping the future of the association. There are more ways than ever to get involved.

Get Involved

Communications of the ACM (CACM) is now a fully Open Access publication.

By opening CACM to the world, we hope to increase engagement among the broader computer science community and encourage non-members to discover the rich resources ACM has to offer.

Learn More