
Technical Perspective: High-Level Data Structures

Posted Dec 1 2012


It is a defining moment for many a CS student when they start thinking about data structures relationally; when maps, n-tuples, sets, and bags become more natural terms of discourse than lists, vectors, hash tables, and binary trees. I often see a student’s eyes light up when realizing that what they need is, say, a mapping from some Xs to an unordered set of Ys, with the correctness of the rest of their program unaffected by whether this mapping is implemented by hash tables and lists, trees and vectors, or something else. This difference in thinking is crucial for programming projects with any amount of data structure complexity, from the real world all the way down to university coursework. In a compiler course project, for instance, the difference is striking. Students who manage to grasp the separation of relational and low-level thinking have no trouble adapting to complex requirements. Students who cannot escape their preoccupation with low-level data structures often spend all their time changing the intertwined implementations of a mess of lexically scoped symbol tables, member tables, and global type structures.

This lifting of data structure thinking to the relational level has long inspired computer scientists. Relational databases have given us a vocabulary for data management, but have also hidden the internals of the implementation from the end programmer. General-purpose programming has largely remained low-level. Even when a library offers an abstraction with different implementations—for example, a map that can be implemented via either a hash table or a binary tree—standard relational reasoning is not supported automatically. A map from a pair of X and Y values to Z values is equivalent to a map from X values to maps from Ys to Zs, for instance, but the programmer cannot represent both forms equivalently and has to structure the code for one or the other. As a result, data structure code is almost universally too low-level to be amenable to compiler analysis, for performance or correctness.
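The map equivalence mentioned above can be made concrete with a small sketch. The helper names `curry` and `uncurry` and the sample data are assumptions introduced here for illustration only. The two shapes carry the same information (as long as no inner map is empty), yet client code must commit to one indexing style or the other.

```python
from collections import defaultdict


def curry(flat):
    """Convert a map {(x, y): z} into the nested form {x: {y: z}}."""
    nested = defaultdict(dict)
    for (x, y), z in flat.items():
        nested[x][y] = z
    return dict(nested)


def uncurry(nested):
    """Convert a nested map {x: {y: z}} back into {(x, y): z}."""
    return {(x, y): z for x, inner in nested.items() for y, z in inner.items()}


# Hypothetical sample data: (student, course) -> grade.
flat = {("alice", "math"): 90, ("alice", "art"): 85, ("bob", "math"): 70}
nested = curry(flat)

# Same relation, two shapes; each lookup is written differently.
assert flat[("alice", "art")] == nested["alice"]["art"]
assert uncurry(nested) == flat
```

Switching from one shape to the other means rewriting every lookup by hand, which is the kind of low-level commitment the paragraph describes.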

In the following paper, the authors aim to elevate data structure programming to the relational realm. The separation of the relational view from the low-level implementation view of the data structure is performed in a disciplined way, and is even automated. Given a relational specification of the data to be stored, the system derives and verifies a "decomposition": a combination of data structures that together implement the relational specification. Insertions, deletions, and lookups on the data structure are performed via actions that remain unchanged across different decompositions. Ultimately, the system automatically produces code that correctly implements the desired query. Key novelties of the approach are the use of functional dependencies over data and a type system that verifies the adequacy of a decomposition for representing the relations and dependencies.
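To give a feel for the idea, here is a hand-written sketch, not the authors' system and not generated code. It assumes a hypothetical relation with columns `ns`, `pid`, and `state`, and a functional dependency `ns, pid -> state`. The class names and the `insert`/`remove`/`query` interface are also assumptions made for illustration. Two different decompositions store the same relation, and the client code is identical for both.

```python
class FlatDecomposition:
    """One hash map keyed by the pair (ns, pid)."""

    def __init__(self):
        self._rows = {}

    def insert(self, ns, pid, state):
        key = (ns, pid)
        if key in self._rows and self._rows[key] != state:
            raise ValueError("violates functional dependency ns, pid -> state")
        self._rows[key] = state

    def remove(self, ns, pid):
        self._rows.pop((ns, pid), None)

    def query(self, ns=None, pid=None):
        # None means "any value" for that column.
        return {(n, p, s) for (n, p), s in self._rows.items()
                if (ns is None or n == ns) and (pid is None or p == pid)}


class NestedDecomposition:
    """A map from ns to an inner map from pid to state."""

    def __init__(self):
        self._by_ns = {}

    def insert(self, ns, pid, state):
        inner = self._by_ns.setdefault(ns, {})
        if pid in inner and inner[pid] != state:
            raise ValueError("violates functional dependency ns, pid -> state")
        inner[pid] = state

    def remove(self, ns, pid):
        inner = self._by_ns.get(ns)
        if inner is not None:
            inner.pop(pid, None)
            if not inner:
                del self._by_ns[ns]  # drop empty inner maps

    def query(self, ns=None, pid=None):
        if ns is None:
            candidates = self._by_ns
        else:
            candidates = {ns: self._by_ns[ns]} if ns in self._by_ns else {}
        return {(n, p, s) for n, inner in candidates.items()
                for p, s in inner.items() if pid is None or p == pid}


def run_client(rel):
    """Client code written once, against the relational interface only."""
    rel.insert(1, 10, "running")
    rel.insert(1, 11, "sleeping")
    rel.insert(2, 10, "running")
    rel.remove(1, 11)
    return rel.query(ns=1), rel.query(pid=10)


# Both decompositions answer the same queries with the same tuples.
assert run_client(FlatDecomposition()) == run_client(NestedDecomposition())
```

In this sketch the two classes are written and checked by hand. The point of the paper is that such a decomposition can be derived from the specification and its adequacy verified by a type system, rather than by tests.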

In a broader context, Hawkins et al. contribute to a long line of work on elevating data manipulation. The SETL language⁴ pioneered high-level data structures by offering a programming model based on set theory, together with automatic selection of concrete data structures at compile time. Even closer to the authors' work in terms of programming model, Batory introduced a sequence of specialized languages¹,²,⁵ exploring the refinement of a relational specification to concrete, intertwined data structures. The program is written in terms of queries that are oblivious to the specific data structures used. Once the data structures are selected (manually, in a single configuration statement,²,⁵ or automatically, via search in the space of data structures¹), a query is classified as a scan, an ordered range lookup, or a direct map lookup, and efficient code is generated for the given structures. Other interesting angles on the relational view of data structures have also been examined, such as the Collection Programming Language (CPL) by Buneman et al.³ CPL offers declarative whole-structure operations, an approach of great power, but one that is probably more difficult to integrate into general-purpose code with imperative updates and a traversal state.
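The three query classes named above can be illustrated with a small sketch. This is not the algorithm of any of the cited languages. The predicate encoding (`("eq", k)`, `("range", lo, hi)`, `("any",)`) and the function names are assumptions made here, and the underlying structures are simply a hash map plus a sorted key list.

```python
import bisect


def classify(predicate):
    """Pick an access plan from the shape of a (hypothetical) predicate tuple."""
    kind = predicate[0]
    if kind == "eq":
        return "direct map lookup"
    if kind == "range":
        return "ordered range lookup"
    return "scan"


def run(predicate, by_key, sorted_keys):
    """Evaluate the predicate using the plan chosen by classify()."""
    plan = classify(predicate)
    if plan == "direct map lookup":
        k = predicate[1]
        return [(k, by_key[k])] if k in by_key else []
    if plan == "ordered range lookup":
        lo, hi = predicate[1], predicate[2]
        start = bisect.bisect_left(sorted_keys, lo)   # first key >= lo
        stop = bisect.bisect_right(sorted_keys, hi)   # one past last key <= hi
        return [(k, by_key[k]) for k in sorted_keys[start:stop]]
    return list(by_key.items())  # scan: visit every entry


# Hypothetical data stored in two cooperating structures.
by_key = {3: "c", 1: "a", 7: "g", 5: "e"}
sorted_keys = sorted(by_key)

assert run(("eq", 7), by_key, sorted_keys) == [(7, "g")]
assert run(("range", 2, 5), by_key, sorted_keys) == [(3, "c"), (5, "e")]
assert len(run(("any",), by_key, sorted_keys)) == 4
```

The query text never mentions the hash map or the sorted list. Only the classification step decides which structure serves it.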

Accordingly, the work toward relational thinking in data structures is not over, nor is it likely to be over soon. The issue of the "right" high-level programming interface is not settled yet. Furthermore, there is always a tension between programmer control and automation. In some cases there may be a benefit to "cracking open the shell" and obtaining better performance via clever low-level data structure manipulations (for example, employing hybrid structures, such as an "augmented" red-black tree). Regardless of where a language or library draws the line, I hope that upon reading this paper you will agree that the effort to conceptually separate relational from low-level data structure design will be key to the future of high-level programming.


Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from permissions@acm.org or fax (212) 869-0481.

DOI: 10.1145/2380656.2380676

December 2012 Issue

Published: December 1, 2012

Vol. 55 No. 12

Page: 90
