Computing Applications Digital government

Table Servers Protect Confidentiality in Tabular Data Releases

Federal statistical agencies must balance concern over confidentiality of data with their obligation to report information to the public. Advances in IT threaten privacy, but new technologies can also protect confidentiality while meeting user needs in innovative ways.

By Alan F. Karr, Adrian Dobra, and Ashish P. Sanil

Posted Jan 1 2003

Introduction
System Design and Prototypes
References
Authors
Figures

Here we describe table servers being developed by the National Institute of Statistical Sciences (NISS) that disseminate tabular summaries of statistical data in response to user queries for marginal sub-tables of a large (for example, 40 dimensions with 4 categories each) contingency table containing counts or sums. Table servers evaluate disclosure risk dynamically, in light of previously answered queries.

The query space Q, which contains all 2^K sub-tables of a K-way table, is partially ordered by set inclusion of attributes in sub-tables. The set R(t) of all tables released through some time t contains direct releases in response to queries and indirect releases (previously unreleased children of direct releases); R(t) is specified by the released frontier RF(t) of its maximal elements.

Underlying dynamic release decisions is a risk criterion RC defined on subsets of Q: at all times the system must satisfy RC (R(t)) ≤ a, where a is a risk threshold set by the operators. A typical risk criterion is accuracy of bounds based on R(t) for sensitive (small count) cells in the full table. Bounds can be computed using integer programming methods, but these techniques do not scale to the table sizes under consideration here. There are also exact techniques for special cases. For example, if the released sub-tables constitute the minimal sufficient statistics of a decomposable graphical model [4], then bounds can be expressed as explicit functions of these sub-tables [2].

Whenever an answered query releases previously unreleased information, other queries become unanswerable. Consequently, at t there is an unreleasable set U(t) of sub-tables whose release would be too risky, with an unreleasable frontier UF (t) of its minimal elements.

Release rules determine which requests for unreleased tables will be fulfilled. The simplest is the myopic rule of releasing T at t as long as RC(R(t) T) ≤ a. To prevent the table server from taking excessively large steps, one can allow only tables adding but one attribute to a previously released table to be eligible for release. To prevent a single user (or a set of colluding users) from driving the table server into a region of Q that suits their needs but not those of other users, release rules can be biased against releases that add large numbers of tables to U(t). Rules can also incorporate the value of releasing T [3, 5].

System Design and Prototypes

A prototype table server, written as a Java application, is shown in the accompanying figure. Its principal strength is the engaging (but non-scalable) visualization of Q, which illustrates the underlying abstractions such as releasable and unreleasable frontiers.

A more powerful table server has been written using the Java 2 Enterprise Edition platform, with query processing performed by Java Servlets and risk computations by native executables. This prototype has been tested on a 14-dimensional, 300,000,000-cell, but extremely sparse, table derived from the Current Population Survey.

Table servers disseminate tabular summaries of statistical data in response to user queries for marginal sub-tables of a large contingency table containing counts or sums. Table servers evaluate disclosure risk dynamically, in light of previously answered queries.

If the requested table lies on or below RF (t), it is provided immediately, ordinarily via downloaded XML (HTML tables and text are alternatives). Releases are governed by the myopic and at most one step away from R(t) rules, and disclosure risk is evaluated in real time. The query history database, with tables for users, queries and the time trajectories of RF (t) and UF (t), is maintained in a MySQL database server. A frontier display facility monitors evolution of RF (t).

The system employs data structures based on hash tables for storing tables and algorithms that exploit sparsity and the fact that R(t) and U(t) are characterized completely by RF (t) and UF (t). The risk criterion is narrowness of cell bounds computed via a generalized shuttle algorithm [1].

Figures

Figure. Java table server prototype. The visualization of the query space Q shows direct releases (yellow), indirect releases (blue), unacceptably risky (red), and the potential effect (dark blue, magenta, and dark red) of releasing the 5-way table indicated by the cursor. The released (unreleasable) frontier lies at the top of the lower-left (bottom of the upper-right) portion of the visualization.

Submit an Article to CACM

CACM welcomes unsolicited submissions on topics of relevance and value to the computing community.

You Just Read

Table Servers Protect Confidentiality in Tabular Data Releases

View in the ACM Digital Library

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

DOI

10.1145/602421.602451

January 2003 Issue

Published: January 1, 2003

Vol. 46 No. 1

Pages: 57-58

Table of Contents

Join the Discussion (0)

Become a Member or Sign In to Post a Comment

The Latest from CACM

Explore More

News May 20 2024

COBOL Programmers are Back In Demand. Seriously.

John Delaney

Computing Profession

BLOG@CACM May 16 2024

When Colorless Green DNNs Sleep Furiously in an Unexplainable Fantasy

Walid Saba

Architecture and Hardware

News May 14 2024

Assistive Robotics Takes a Step Forward

Samuel Greengard

Architecture and Hardware

Credit: Wandercraft feet of the Wandercraft exoskeleton worn by a person climbing a step

Shape the Future of Computing

ACM encourages its members to take a direct hand in shaping the future of the association. There are more ways than ever to get involved.

Get Involved

Communications of the ACM (CACM) is now a fully Open Access publication.

By opening CACM to the world, we hope to increase engagement among the broader computer science community and encourage non-members to discover the rich resources ACM has to offer.

Learn More