Fifty-three years ago a small team working to automate the business processes of the General Electric Company built the first database management system. The Integrated Data Store—IDS—was designed by Charles W. Bachman, who won the ACM’s 1973 A.M. Turing Award for the accomplishment. Before General Electric, he had spent 10 years working in engineering, finance, production, and data processing for the Dow Chemical Company. He was the first ACM A.M. Turing Award winner without a Ph.D., the first with a background in engineering rather than science, and the first to spend his entire career in industry rather than academia.
Some stories, such as the work of Babbage and Lovelace, the creation of the first electronic computers, and the emergence of the personal computer industry, have been told to the public again and again. They appear in popular books, such as Walter Isaacson’s recent The Innovators: How a Group of Hackers, Geniuses and Geeks Created the Digital Revolution, and in museum exhibits on computing and innovation. In contrast, perhaps because database management systems are rarely experienced directly by the public, database history has been largely neglected. For example, the index of Isaacson’s book does not include entries for “database” or for any of the four people to have won Turing Awards in this area: Charles W. Bachman, Edgar F. Codd (1981), James Gray (1998), or Michael Stonebraker (2014).
That’s a shame, because if any technology was essential to the rebuilding of our daily lives around digital infrastructures, which I assume is what Isaacson means by “the Digital Revolution,” then it was the database management system. Databases undergird the modern world of online information systems and corporate intranet applications. Few skills are more essential for application developers than a basic familiarity with SQL, the standard database query language, and a database course is required for most computer science and information systems degree programs. Within ACM, SIGMOD—the Special Interest Group on Management of Data—has a long and active history of fostering database research. Many IT professionals center their entire careers on database technology: the U.S. Bureau of Labor Statistics estimates the U.S. alone employed 120,000 database administrators in 2014 and predicts faster-than-average growth for this role.
Bachman’s IDS was years ahead of its time, implementing capabilities that had until then been talked about but never accomplished. Detailed functional specifications for the system were complete by January 1962, and Bachman was presenting details of the planned system to his team’s in-house customers by May of that year. It is less clear from archival materials when the system first ran, but Bachman tells me that a prototype installation of IDS was tested with real data in the summer of 1963, running twice as fast as a custom-built manufacturing control system performing the same tasks.
The details of IDS, Bachman’s life story, and the context in which it arose have been explored elsewhere.2,6 In this column, I focus on two specific questions:
- Why do we view IDS as the first database management system, and
- In what ways was it similar to, and different from, later systems?
There will always be an element of subjectivity in judgments about “firsts,” particularly as IDS predated the concept of a database management system. As a fusty historian I value nuance and am skeptical of the idea that any important innovation can be fully understood by focusing on a single breakthrough moment. I have documented many ways in which IDS built on earlier file management and report generation systems.7 However, if any system deserves the title of “first database management system” then it is clearly IDS. It became a model for the earliest definitions of “data base management system” and included most of the core capabilities later associated with the concept.
What Was IDS For?
Bachman created IDS as a practical tool, not an academic research project. In 1963 there was no database research community. Computer science was just beginning to emerge as an academic field, but its early stars focused on programming language design, theory of computation, numerical analysis, and operating system design. In contrast to this academic neglect, the efficient and flexible handling of large collections of structured data was the central challenge for what we would now call corporate information systems departments, and was then called business data processing.
During the early 1960s the hype and reality of business computing diverged dramatically. Consultants, visionaries, business school professors, and computer salespeople had all agreed that the best way to achieve real economic payback from computerization was to establish a “totally integrated management information system.”8 This would integrate and automate all the core operations of a business, ideally with advanced management reporting and simulation capabilities built right in. The latest and most expensive computers of the era had new capabilities that seemed to open the door to a more aggressive approach. Compared to the machines of the 1950s they had relatively large memories. They featured disk storage as well as tape drives, could process data more rapidly, and some were even used to drive interactive terminals.
The reality of data processing changed much more slowly than the hype, and remained focused on simple administrative applications that batch processed large files to accomplish tasks such as weekly payroll processing, customer statement generation, or accounts payable reporting.
Many companies announced their intention to build totally integrated management information systems, but few ever claimed significant success. A modern reader would not be shocked to learn that firms were unable to create systems of comparable scope to today’s Enterprise Resource Planning and data warehouse projects using computers with perhaps the equivalent of 64KB of memory, no real operating system, and a few megabytes of disk storage. Still, even partially integrated systems covering significant portions of a business would have real value. The biggest roadblocks to even modest progress toward this goal were the difficulty of sharing data between applications and the challenges application programmers faced in exploiting random access disk storage.
Getting a complex job done might involve dozens of small programs and the generation of many working tapes full of intermediate data. Banks of whirring tape drives provided computer centers with their main source of visual interest in the movies of the era. Tape-based processing techniques evolved directly from those used with pre-computer mechanical punched card machines: files, records, fields, keys, grouping, merging data from two files, and the hierarchical combination of master and detail records within a single file. These applied to magnetic tape much as they had done to punched cards, except that tape storage made sorting much harder. The formats of tape files were usually fixed by the code of the application programs working with the data. Every time a field was added or changed, all the programs working with the file would need to be rewritten. If applications were integrated, for example, by treating order records from the sales accounting system as input for the production scheduling application, the resulting web of dependencies made it increasingly difficult to make even minor changes when business needs shifted.
The other key challenge was making effective use of random access storage in business application programs. Sequential tape storage was conceptually simple, and the tape drives themselves provided some intelligence to aid programmers in reading or writing records. Applications were batch-oriented because searching a tape to find or update a particular record was too slow to be practical. Instead, master files were periodically updated with accumulated data or read through to produce reports. With the arrival of disk storage in the early 1960s, a computer could theoretically apply updates one at a time as new data came in and generate reports as needed based on current data. Indeed, this was the target application of IBM’s RAMAC computer, the first to be equipped with a hard disk drive. A programmer working with a disk-based system could easily instruct the disk drive to pull data from any particular platter or track, but the hard part was figuring out where on the disk the desired record could be found. The phrase “data base” was associated with random access storage but was not particularly well established, so Bachman’s alternative choice of “data store” would not have seemed any more or less familiar at the time.
Without significant disk file management support from the rudimentary operating systems of the era, only elite programmers could hope to create an efficient random access application. Mainstream application programmers were beginning to shift from assembly language to high-level languages such as COBOL, which included high-level support for structuring data in tape files but lacked comparable support for random access storage. Harnessing the power of disks meant finding ways to sequence, insert, delete, or search for records that did not simply replicate the sequential techniques used with tape. Solutions such as hashing, linked lists, chains, indexing, and inverted files were quickly devised, but these were relatively complex to implement and demanded expert judgment to select the best method for a particular task (see Figure 1).
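To make the problem concrete, the following minimal sketch, written in modern Python with entirely hypothetical names, illustrates one of those techniques: hashing a record key to a fixed bucket (standing in for a disk page) whose contents are chained. It shows the general idea only, not any mechanism used by IDS itself.

```python
# A minimal, illustrative sketch of hashing with chained buckets, one of the
# techniques mentioned above for locating a record on random access storage.
# The bucket count, record layout, and function names are hypothetical.

NUM_BUCKETS = 97  # number of "pages" reserved for the file

def bucket_for(key):
    """Map a record key to a bucket (disk page) number."""
    return hash(key) % NUM_BUCKETS

# Simulated disk: each bucket holds a chain of (key, record) pairs.
disk = [[] for _ in range(NUM_BUCKETS)]

def store(key, record):
    disk[bucket_for(key)].append((key, record))

def retrieve(key):
    # Only the single bucket the key hashes to is searched, not the whole file.
    for stored_key, record in disk[bucket_for(key)]:
        if stored_key == key:
            return record
    return None

store("PART-10432", {"description": "relay coil", "on_hand": 118})
print(retrieve("PART-10432"))  # -> {'description': 'relay coil', 'on_hand': 118}
```

Each of the techniques listed above traded implementation effort against lookup cost in a different way, which is why choosing among them demanded expert judgment.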
IDS was intended to substantially solve these two problems, so that applications could be integrated to share data files and ordinary programmers could effectively develop random access applications using high-level languages. Bachman designed IDS to meet the needs of an integrated systems project called MIACS, for Manufacturing Information and Control System. General Electric had many factories spread over its various divisions, and could not produce and support a different integrated manufacturing system for each one. Furthermore, it was entering the computer business, and its managers recognized that a flexible and generic integrated system based on disk storage would be a powerful tool in selling its machines to other companies. A prototype version of MIACS was being built and tested on the firm’s Low Voltage Switch Gear department by a group of systems-minded staff specialists.
Was IDS a Database Management System?
By interposing itself between application programs and the disk files in which they stored data, IDS carried out what we still consider the core task of a database management system. Programs could not manipulate data files directly, instead making calls to IDS so that it would perform the data operations on their behalf.
Like modern database management systems, IDS explicitly stored and manipulated metadata about the records and their relationships, rather than expecting each application program to understand and respect the format of every data file it worked with. It enforced relationships between different record types and protected database integrity. Database designers specified record clusters, linked list sequencing, indexes, and other details of record organization to boost performance based on expected usage patterns. However, the first versions did not include a formal data description language. Instead of being defined through textual commands, the metadata was punched onto specially formatted input cards. A special command told IDS to read and apply this information. New elements could be added without deleting existing records. Each data manipulation command contained a reference to the appropriate element in the metadata.
IDS was designed to be used with a high-level programming language. In the initial prototype version, operational in early 1963, this was General Electric’s own GECOM language, though performance and memory concerns drove a shift to assembly language for the application programming in a higher performance version completed in 1964. Calls to IDS operations such as store, retrieve, modify, and delete were evaluated at runtime against embedded metadata. As high-level languages matured and memory grew less scarce, later versions of IDS worked with application programs written in COBOL.
This provided a measure of what is now called data independence for programs. If a file was restructured to add fields or modify their length, the programs using it would continue to work properly. Files could be moved around and records reorganized without rewriting application programs. That made running different application programs against the same database much more feasible. IDS also included its own system of paging data in and out of memory, to create a virtual memory capability transparent to the application programmer.
The concept of transactions is fundamental to modern database management systems. Programmers specify that a series of interconnected updates must take place together, so that if one fails or is undone they all are. IDS was also transaction oriented, though not in exactly the same sense. Bachman devised an innovative transaction processing system, which he called the Problem Controller. The Problem Controller and IDS were loaded when the computer was booted and took control of the entire machine. Together they occupied 4,000 words of memory, on a computer that might have only 8,000; the remaining memory was used for paging buffers by IDS’s virtual memory manager.
Requests from users to process particular transactions were read from “problem control records” stored and retrieved by IDS in the same manner as application data records. Transactions could be simple, or contain a batch of data cards to be processed. The Problem Controller processed one transaction at a time by executing the designated application program. It worked its way through the queue of transaction requests, choosing the highest priority outstanding job and refreshing the queue from the card reader after each transaction was finished.
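The dispatch discipline described above can be sketched, very loosely, as a priority-queue loop. The snippet below is a hypothetical illustration in modern Python, not the Problem Controller’s actual logic, which stored its queue as problem control records managed by IDS.

```python
import heapq

# Illustrative sketch only: repeatedly run the highest-priority outstanding
# transaction request to completion, then refresh the queue with newly
# arrived requests. Lower numbers mean higher priority here.

queue = []    # heap of (priority, sequence, program_name, data_cards)
sequence = 0

def submit(priority, program, cards):
    global sequence
    heapq.heappush(queue, (priority, sequence, program, cards))
    sequence += 1

def read_new_requests():
    """Stand-in for reading newly arrived transaction requests from cards."""
    return []  # e.g., [(2, "ORDER-ENTRY", ["card-1"])]

def run_program(program, cards):
    print(f"running {program} on {len(cards)} data card(s)")

submit(1, "INVOICING", ["card-A"])
submit(2, "ORDER-ENTRY", ["card-B", "card-C"])

while queue:
    priority, _, program, cards = heapq.heappop(queue)  # one transaction at a time
    run_program(program, cards)
    for request in read_new_requests():                 # refresh queue afterward
        submit(*request)
```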
The Problem Controller did not appear in later versions of IDS but did provide a basis for an early online transaction processing system. By 1965 an expanded version of the Problem Controller was built and installed at Weyerhaeuser, on a computer hooked up to a national Teletype network. The system serviced remote users at their Teletypes without any intervention needed by local operators. Requests to process order entry, inventory management, invoicing, and other business transactions were processed automatically by the Problem Controller and application programs.
Bachman’s original version of IDS lacked a backup and recovery system, a key feature of later database management systems. This was added in 1964 by the International General Electric team that produced and operated the first production installation of IDS. A recovery and restart magnetic tape logged each new transaction as it was started and captured database pages “before” and “after” they were modified by the transaction, so that the database could be restored to a prior consistent state if something went wrong before the transaction was completed. The same tape also served as a backup of all changes written to the disk in case there was a disk failure since the last full database backup.
The first packaged versions of IDS did lack some features later viewed as essential for database management systems. One was the idea that specific users could be granted or denied access to particular parts of the database. This omission was related to another limitation: IDS databases could be queried or modified only by writing and executing programs in which IDS calls were included. There was no capability to specify “ad hoc” reports or run one-off queries without having to write a program.a These capabilities did exist during the 1960s in report generator systems (such as 9PAC and MARK IV) and in online interactive data management systems (such as TDMS) but these packages were generally seen as a separate class of software from database management systems. By the 1970s report generation packages, still widely used, included optional modules to interface with data stored in database management systems.
IDS and CODASYL
After Bachman handed IDS over to a different team within General Electric in 1964, it was made available as a documented and supported software package for the company’s 200-series computers. In those days software packages from computer manufacturers were paid for by hardware sales and given to customers without an additional charge. Later versions supported its 400- and 600-series systems. New versions followed in the 1970s after Honeywell bought out General Electric’s computer business. IDS was a strong product, in many respects more advanced than IBM’s competing IMS that appeared several years later. However, IBM machines so dominated the industry that software from other manufacturers was doomed to relative obscurity.
During the late 1960s the ideas Bachman created for IDS were taken up by the Database Task Group of CODASYL, a standards body for the data processing industry best known for its creation and promotion of the COBOL language. Its initial report, issued in 1969, drew heavily on IDS in defining a proposed standard for database management systems, in part thanks to Bachman’s own service on the committee.4 The report documented foundational concepts and vocabulary such as data definition language, data manipulation language, schemas, data independence, and program independence. It went beyond early versions of IDS by adding security features, including “privacy locks” and “sub-schemas,” roughly equivalent to views in modern systems, so that particular programs could be constrained to work with defined subsets of the database.
CODASYL’s definition of the architecture and core capabilities of a database management system remains quite close to that found in textbooks to this day. In particular, it suggested that a database management system should support online, interactive applications as well as batch-driven ones, with separate interfaces for each. In retrospect, the committee’s work, and a related effort by CODASYL’s Systems Committee to evaluate existing systems within the new framework,5 were significant primarily for formulating and spreading the concept of a “data base management system.”
Although IBM itself refused to support the CODASYL approach many other computer vendors endorsed the committee’s recommendations and eventually produced systems incorporating these features. The most successful CODASYL system, IDMS, came from an independent software company. It began as a port of IDS to IBM’s dominant System/360 mainframe platform.b
The Legacy of IDS
IDS and CODASYL systems did not use the relational data model, formulated years later by Ted Codd, which underlies today’s dominant SQL database management systems. Instead, IDS introduced what would later be called the “network data model.” This encoded relationships between different kinds of records as a graph, rather than the strict hierarchy enforced by tape systems and by some other software packages of the 1960s such as IBM’s later and widely used IMS. The network data model was widely used during the 1970s and 1980s, and commercial database management systems based on this approach were among the most successful products of the mushrooming packaged software industry.
Bachman spoke memorably in his 1973 Turing Award lecture of the “Programmer as Navigator,” charting a path through the database from one record to another.3 The network approach used in IDS required programmers to work with one record at a time. Performing the same operation on multiple records meant retrieving a record, processing and if necessary updating it, and then moving on to the next record of interest to repeat the process. For some tasks this made programs longer and more cumbersome than the equivalent in a relational system, where a task such as deleting all records more than a year old or adding 10% to the sales price of every item could be performed with a single command.
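The contrast can be seen in a short sketch. The navigational loop below is hypothetical, Python-flavored pseudocode rather than the GECOM, assembly, or COBOL in which real IDS calls were embedded, but it captures the record-at-a-time pattern, followed by a single relational command expressing the same intent.

```python
# Navigational style (illustrative only): walk a chain of records one at a
# time, updating each, until the end of the set is reached.
def add_ten_percent(first_item):
    item = first_item
    while item is not None:               # end-of-set condition
        item["price"] = round(item["price"] * 1.10, 2)
        item = item["next"]               # "GET NEXT": follow the chain pointer

# Two records chained together, standing in for records on disk.
second = {"price": 20.00, "next": None}
first = {"price": 10.00, "next": second}

add_ten_percent(first)
print(first["price"], second["price"])    # -> 11.0 22.0

# Relational style: the same intent expressed as one declarative command.
SQL_EQUIVALENT = "UPDATE items SET price = price * 1.10;"
```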
IDS and other network systems encoded what we now think of as the “joins” between different kinds of records as part of the database structure rather than specifying them in each query and rebuilding them when the query is processed (see Figure 2). Bachman introduced a data structure diagramming technique, often called the “Bachman diagram,” to describe these relationships.c Hardcoding the relationships between record sets made IDS much less flexible than later relational systems, but also much simpler to implement and more efficient for routine operations.
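A simplified sketch of this structural difference, again with hypothetical names and modern Python standing in for the real thing: in the network model the customer-order relationship is stored as explicit owner-to-member links that the program navigates, while a relational system keeps flat tables and reconstructs the relationship with a join stated in the query.

```python
# Network-model style (illustrative): the customer-order relationship is part
# of the stored structure, as an owner record chaining to its member records.
order_1 = {"order_no": 501, "amount": 120.00}
order_2 = {"order_no": 502, "amount": 75.50}
customer = {"name": "Acme Corp", "orders": [order_1, order_2]}  # owner -> members

total = sum(order["amount"] for order in customer["orders"])    # navigate the set
print(total)  # -> 195.5

# Relational style: flat tables, with the relationship reconstructed by a
# join expressed in the query itself.
JOIN_SQL = (
    "SELECT SUM(o.amount) FROM customers c "
    "JOIN orders o ON o.cust_id = c.cust_id "
    "WHERE c.name = 'Acme Corp';"
)
```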
IDS was a useful and practical tool for business use from the mid-1960s, while relational systems were not commercially significant until the early 1980s. Relational systems did not become feasible until computers were orders of magnitude more powerful than they had been in 1963 and some extremely challenging implementation issues had been overcome by pioneers such as IBM’s System R group and Berkeley’s INGRES team. Even after relational systems were commercialized the two approaches were seen for some time as complementary, with network systems used for high-performance transaction-processing systems handling routine operations on large numbers of records (for example, credit card transaction processing or customer billing) and relational systems best suited for “decision support” analytical data crunching. IDMS, the successor to IDS, underpins some very large mainframe applications and is still being supported and enhanced by its current owner, Computer Associates, most recently with release 18.5 in 2014. However, it and other database management systems based on Bachman’s network data model have long since been superseded for new applications and for mainstream computing needs.
Although by any standard a successful innovator, Bachman does not fit neatly into the “hackers, geniuses, and geeks” framework favored by Walter Isaacson. During his long career Bachman also founded a public company, played a leading role in formulating the OSI seven-layer model for data communications, and pioneered online transaction processing. In 2014, he visited the White House to receive from President Obama a National Medal of Technology and Innovation in recognition of his “fundamental inventions in database management, transaction processing, and software engineering.”d Bachman sees himself above all as an engineer, retaining a professional engineer’s zest for the elegant solution of difficult problems and faith in the power of careful and rational analysis. As he wrote in a note at the end of the transcript of an oral history interview I conducted with him in 2004, “My work has been my play.”1
When database specialists look at IDS today they immediately see its limitations compared to modern systems. Its strengths are more difficult to recognize, because its huge influence on the nascent software industry meant that much of what was revolutionary about it in 1963 was soon taken for granted. Without IDS, or Bachman’s tireless championing of the ideas it contained, the very concept of a “database management system” might never have taken root. IDS did more than any other single piece of software to broaden the range of business problems to which computers could usefully be applied and so to usher in today’s “digital” world where every administrative transaction is realized through a flurry of database queries and updates rather than by completing, routing, and filing in triplicate a set of paper forms.
Figures
Figure 1. This image, from a 1962 internal General Electric document, conveyed the idea of random access storage using a set of “pigeon holes” in which data could be placed.
Figure 2. This drawing, from the 1962 presentation “IDS: The Information Processing Machine We Need,” shows the use of chains to connect records. The programmer looped through GET NEXT commands to navigate between related records until an end-of-set condition was detected.