Selling personal information is very different from selling physical goods, and raises novel challenges. On the sell-side of the market, individuals own their own personal data and experience costs based on the usage of their data insofar as that usage leads to future quantifiable harm. On the buy-side of the market, buyers are interested in "statistical information" about the dataset, that is, aggregate information, rather than information derived from a single individual. Differential privacy [1] provides a means to quantify the harm that can come to individual data owners as the result of the use of their data. This ability to quantify harm allows for data owners to be compensated for the risk they incur. Past work studying markets for private data focused on the simple case in which the buyer is interested in only the answer to a single linear function of the data [2,3,4,6], which makes the buy-side of the market particularly simple.
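To make the idea of quantified harm concrete, here is a minimal sketch of one standard way to answer a single linear query with differential privacy: the Laplace mechanism. It is an illustration, not the construction used in the accompanying paper. The function names, the assumption that each individual's value lies in [0, 1], and the resulting sensitivity bound are all assumptions made for this example.

```python
import random


def laplace_scale(sensitivity, epsilon):
    """Noise scale for the Laplace mechanism: larger epsilon means less noise."""
    return sensitivity / epsilon


def noisy_linear_query(weights, data, epsilon, rng):
    """Return an epsilon-differentially private estimate of sum(w_i * x_i).

    Assumption: every x_i lies in [0, 1], so changing one individual's value
    moves the true answer by at most max |w_i| (the sensitivity).
    """
    true_answer = sum(w * x for w, x in zip(weights, data))
    sensitivity = max(abs(w) for w in weights)
    scale = laplace_scale(sensitivity, epsilon)
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with scale `scale` and mean zero.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_answer + noise


rng = random.Random(0)
print(noisy_linear_query([1.0, 1.0, 1.0], [0.2, 0.9, 0.4], epsilon=0.5, rng=rng))
```

The parameter `epsilon` is the quantity that bounds each data owner's privacy loss, so it is also the natural quantity on which to base compensation: a smaller `epsilon` means more noise for the buyer and less risk for the seller.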
The following paper introduces a fascinating and complicated issue that arises on the buy-side of the market when buyers are interested in multiple linear functions of the same dataset. Information exhibits complementarities: given some information about a dataset, it is possible to learn other things about the dataset. This means that when pricing information, there might be opportunities for arbitrage: rather than directly buying the answer to the query he is interested in, the buyer might instead more cheaply buy a bundle of queries that lets him deduce the answer he is interested in. The authors give conditions under which a pricing is arbitrage free. This is a compelling condition to ask for: it means that it is a dominant strategy for arriving buyers to faithfully request the answer to the query they are interested in, rather than trying to game the system. By asking for arbitrage-free pricings, the authors are making the market safe for buyers.
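A small sketch can illustrate the simplest kind of arbitrage described above. It assumes a hypothetical market in which the price of a noisy answer depends only on the variance of its noise, and in which separate purchases carry independent noise. A buyer who wants an answer with variance v can instead buy k answers with variance k·v each and average them, since the average of k independent unbiased estimates with variance k·v again has variance v. The two pricing functions below are invented for illustration and are not taken from the paper.

```python
def averaging_cost(price, target_variance, k):
    """Cost of buying k independent answers, each with variance k * target_variance.

    Averaging them yields an unbiased estimate with variance
    (1 / k**2) * k * (k * target_variance) = target_variance.
    """
    return k * price(k * target_variance)


def has_averaging_arbitrage(price, target_variance, max_k=10, tol=1e-12):
    """True if some averaging bundle is strictly cheaper than buying directly."""
    direct = price(target_variance)
    return any(
        averaging_cost(price, target_variance, k) < direct - tol
        for k in range(2, max_k + 1)
    )


# Hypothetical pricing functions (assumptions for this example only).
def inverse_variance_price(v):
    return 1.0 / v


def inverse_square_price(v):
    return 1.0 / v ** 2


print(has_averaging_arbitrage(inverse_variance_price, 2.0))
print(has_averaging_arbitrage(inverse_square_price, 2.0))
```

Under these assumptions, a price proportional to 1/v leaves the averaging bundle costing exactly as much as the direct purchase, whereas a price proportional to 1/v² lets the buyer undercut the posted price by averaging noisier answers. This check covers only one attack; the conditions studied in the paper concern general linear combinations of purchased queries.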
Reasoning about these arbitrage opportunities can be complicated: if the values of the purchased linear functions were revealed exactly, then the answer to any other query in the span of the purchased queries would be derivable. But to guarantee the sellers differential privacy, it is necessary to sell only noisy estimates of the data. This makes reasoning about what is derivable complex. Sensibly, since they are introducing a new problem, the authors opt to study a restricted notion of derivability and arbitrage. They give pricings that rule out arbitrage opportunities when the buyer is only allowed to learn by taking linear combinations of observed queries, is interested only in unbiased estimates of query values, and will attempt arbitrage only at the level of one query at a time. Because of the richness of the authors' problem, one of the most exciting aspects of this work is the doors it opens for future exploration. Here, I will highlight what I think are the most interesting problems coming out of this paper:
This paper opens a rich research direction. I recommend that new Ph.D. students (or anyone looking for an attractive problem) read it.
To view the accompanying paper, visit doi.acm.org/10.1145/3139457
The Digital Library is published by the Association for Computing Machinery. Copyright © 2017 ACM, Inc.