Opinion

We Need to Focus on How Our Data Is Used, Not Just How It Is Shared

Toward a better understanding of the different ways data circulates between users and platforms.


Early visions of the Internet imagined it as a safe haven, a democratic and anonymous space where information could flow freely. A few decades later, it has become a place where enormous economic and political power is concentrated in the hands of a few. Financial incentives drive a complex data ecosystem finely tuned to harvest every possible detail about us: our shopping habits, social and romantic relationships, finances, physical and mental health, religious beliefs, political views, and much more. This data is used to curate services and experiences as bait for our attention, which is then sold for a profit.

The dangers of our digital world have become increasingly clear in recent years. The Cambridge Analytica scandal and testimony from Facebook whistleblower Frances Haugen showed how social media platforms drive political polarization, promote misinformation, and allow hate speech to fester.4 Researchers have revealed gender discrimination in Facebook’s job ads and racial bias in ads served up during Google searches.3,8 A study of Twitter data found the platform’s content algorithm favors right-leaning news sources.2

Today’s data ecosystem endangers privacy and personal autonomy, sharpens political divisions, reinforces societal biases, amplifies misinformation, and undermines democratic institutions—issues new generative AI tools are expected to dramatically exacerbate. While these problems have been widely discussed, the search for solutions often flattens a wide variety of complicated issues under the single rubric of individual privacy rights. Accordingly, many government regulations and technical advances have focused on giving users more control and consent over how their data is gathered and shared.1

Addressing privacy is important and there is still much to be done on both the technical and regulatory fronts. But we also need to tackle the other side of the equation—how companies use our data to shape the information and opportunities they provide to us. We need data protections that focus on the information we receive (curated content, recommendations, ads, and the like—what we call the incoming vector) in addition to the data that we share (social media posts, our location, what content we consume, and so much more—the outgoing vector).

Here, we offer a more precise framework for understanding the different ways data circulates between users and platforms. We argue that government regulation and platform self-regulation are insufficient to address these ecosystem-wide challenges, and we present a vision of a new class of entities we call data cooperatives to provide collective, comprehensive solutions that address both vectors.


The Limits of Privacy

Within the data ecosystem, information is constantly flowing in many directions. Discussions about data protection tend to focus on the outgoing vector—the data that flows out from individuals, wittingly or unwittingly, and is collected by external actors. But the incoming vector is just as important.1 Companies use the masses of personal data they have collected to curate and tailor the information they serve each of us, whether a product suggestion, a job opportunity, a news item, a political ad, a mortgage offer, or a potential romantic partner. In the process, the incoming vector shapes how each of us sees and interacts with the world.

The dominant paradigm for understanding and managing our data ecosystem is individual privacy, a framework that largely produces interventions targeting the outgoing vector by attempting to manage what information is collected and shared, by whom, and under what conditions. Technical advances including differential privacy9 and secure multiparty computation7 help protect users’ anonymity and give them more control over the data they share. Government regulations including the European Union’s General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) in the U.S. codify the idea that individuals have certain rights regarding their data, such as the right to limit access to it, to know who has obtained it, or to have it erased.
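To make this concrete, here is a minimal sketch of the kind of protection differential privacy offers along the outgoing vector, using the classic Laplace mechanism on a counting query. The dataset, function names, and epsilon value are illustrative assumptions for exposition, not any platform's actual implementation.

```python
import numpy as np

# Minimal illustrative sketch of the Laplace mechanism; not production code.

def dp_count(records, predicate, epsilon):
    """Release a count with epsilon-differential privacy.

    A counting query has sensitivity 1: adding or removing any one
    person's record changes the true count by at most 1, so Laplace
    noise with scale 1/epsilon masks each individual's contribution.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

# Illustrative query: how many users engaged with a piece of content,
# answered without revealing whether any particular user did.
records = [{"user": i, "engaged": i % 7 == 0} for i in range(1000)]
print(dp_count(records, lambda r: r["engaged"], epsilon=0.5))
```

Smaller values of epsilon mean more noise and stronger protection for each individual's contribution. Note, though, that tools like this guard the data users give up, not what platforms serve back to them.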


However, these regulations have much less to say about how companies use the data they gather from masses of users to shape the information they present to us. We all deserve to protect our data and control who it is shared with, but the incoming vector raises concerns that go far beyond privacy and have implications for our entire society. Treating data protection solely as an issue of individual property rights is like trying to combat air pollution by focusing only on your own backyard. It transfers responsibility for the harms of the data ecosystem to individuals without giving them the information or tools to enact change.


The Incoming Vector Is a Collective Problem That Demands a Collective Solution

The incoming vector is created by aggregating data from millions of people and using it to tailor the information and opportunities served back to users. The harms that result—from misinformation to ideological fragmentation to algorithmic discrimination—are systemic and collective. These problems are nearly impossible for any of us to solve on our own.

Even if each of us had complete control over what data we share (that is, privacy protections along the outgoing vector), we could not, as lone individuals, understand or manage the incoming vector. For example, even if a woman hunting for jobs online were provided with adequate privacy and security for the information she reveals in the process (along the outgoing vector), that would do nothing to ensure the opportunities she is presented with (along the incoming vector) are free from discrimination.

Individuals experience only their own very tiny, personal slice of the incoming vector; we do not see what other users are being shown. The woman hunting for jobs will not know whether the opportunities she is presented with differ from what she would see if she were a man. Once she applies for a job, companies may use automated tools to filter and rate her application, but she will have no window into how her outcomes differ from those of a similarly qualified man.

Furthermore, there is little we can do as individuals to reduce the curation and tailoring of the information and opportunities presented to us online. Algorithms infer information about the job applicant not just from what she herself shares, but from anything former colleagues, social media connections, employers, and institutions have shared that the algorithms deem relevant. This information is combined with data from millions of others to make predictions about whether she will be a good employee.

Our data exists within a network. Even if we exercise all the controls at our disposal over our individual node in the Web, we cannot extricate ourselves from the actions of others.


Platforms Will Not Save Us

In our existing data ecosystem, only platforms offer some degree of insight into and control over the incoming vector. A job-hunting site is best positioned to analyze and tweak how its algorithms tailor job ads based on characteristics such as a person’s gender. Similarly, social media companies have singular access to the data needed to understand how our newsfeeds are personalized for different audiences.

However, platforms’ business models are built around personalization of the incoming vector, and their financial interests and obligations to shareholders may not be aligned with curbing its harms. Platforms such as Facebook and YouTube have been criticized for spreading public health misinformation and fueling political fragmentation, but pushing incendiary content is a natural outcome for algorithms designed to capture user attention.

While platforms publicly gesture toward improving data protection, they hamper internal ethics teams and even block outside researchers, such as those behind NYU’s Ad Observatory Project, from reviewing how their incoming vectors operate. We cannot rely on platforms to self-regulate. In her testimony last year, Facebook whistleblower Frances Haugen showed clearly that the company had identified harms caused by its algorithms yet chose not to reveal or act on them. We need independent actors who can get a high-level view into how platforms are shaping the incoming vector and take collective action to manage it.6


Data Cooperatives as a Collective Solution

Platforms will not save us, and individual users cannot manage the incoming vector on their own. We need solutions that can see the data landscape as a whole and act on behalf of masses of users. We propose the creation of data cooperatives: a new class of independent entities that would represent groups of users and could monitor (and potentially manage) data flows between users and platforms.5

Individuals could choose to join a co-op to represent, protect, and advocate for their individual and collective interests. These data cooperatives would serve as intermediaries between users and platforms, with the bird’s-eye view needed to see the content being presented across large numbers of users. This would increase transparency and accountability for what platforms are doing with the incoming vector. In the case of job searches and hiring, for example, data cooperatives might be able to see how a platform is presenting opportunities to their members and whether there are differences based on characteristics such as gender and race. (A number of recent initiatives and policy proposals have started to push for such transparency; one encouraging example is the Mozilla Rally project.)
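As a rough illustration of how such an audit might work, the sketch below pools hypothetical ad-impression logs from consenting co-op members and flags any group whose exposure rate falls below four-fifths of the best-served group’s rate, a threshold borrowed from the disparate-impact rule of thumb in U.S. employment law. The data layout and function names are our own assumptions for exposition, not an existing co-op’s interface.

```python
from collections import defaultdict

def exposure_rates(impressions, group_of):
    """Fraction of each demographic group shown a given job ad,
    computed from logs pooled across consenting co-op members."""
    shown, total = defaultdict(int), defaultdict(int)
    for member_id, was_shown in impressions:
        group = group_of(member_id)
        total[group] += 1
        shown[group] += int(was_shown)
    return {g: shown[g] / total[g] for g in total}

def flag_disparities(rates, threshold=0.8):
    """Flag any group whose exposure rate is below `threshold` times
    the best-served group's rate (a four-fifths-style test)."""
    best = max(rates.values())
    return {g: rate / best < threshold for g, rate in rates.items()}

# Hypothetical member roster and impression log: (member_id, ad_shown)
groups = {1: "women", 2: "men", 3: "women", 4: "men", 5: "women"}
log = [(1, False), (2, True), (3, True), (4, True), (5, False)]

rates = exposure_rates(log, groups.get)
print(rates)                    # {'women': 0.33..., 'men': 1.0}
print(flag_disparities(rates))  # {'women': True, 'men': False}
```

No single member could detect this disparity from her own feed; it becomes visible only to an entity that can compare impressions across many users.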

Data cooperatives could also aggregate the power of individual users and serve as a sort of union that negotiates with platforms on their behalf. In the current relationship, individual users have almost no power when up against massive tech companies: they can accept platforms’ existing terms of service or be locked out of using their products. By representing groups of users, data co-ops would help equalize the balance of power in our data ecosystem.


A data co-op could negotiate or limit the types of data collected and used and the kinds of personalization applied to co-op members. It could also push for strong data-protection technologies, such as differential privacy or secure multiparty computation. With job searches, for example, the data co-op could negotiate with major platforms over the terms of service that govern how job ads are curated for the co-op’s members and what personal characteristics can and cannot be used. As leverage, the co-op could alert members (as well as regulators and analysts) to any potential biases it detects, provide ratings of platforms based on how well they meet the conditions agreed upon by members, or steer its members away from platforms that do not meet minimum standards for security and fairness.

Data cooperatives would not replace the need for government regulation or platform accountability, but rather would complement them. Policy interventions might establish standards that platforms would aim to adhere to and that co-ops could detect and enforce. By assigning these responsibilities to a third party, we avoid the problems that could arise from giving the government access to and oversight of the masses of personal data collected by platforms, oversight it in any case lacks the technical capability to perform. In addition, co-ops could be nimbler than government in responding to technical advances and the changing needs of users.


Shaping the Future

Tech giants use personalization to optimize their revenue. As a side effect, they shape our worldviews, opportunities, and social interactions, often in hidden ways. The current response, an individualistic framework that focuses on privacy, cannot effectively address the systemic harms created along the incoming vector, from algorithmic discrimination to ideological bubbles. Efforts to protect individual privacy must be complemented by collective solutions like data co-ops that can respond to problems in the data ecosystem that are invisible and intractable from the individual perspective.

The development of a collective solution such as data co-ops raises many exciting challenges that must be addressed jointly through technology and policy. What should these entities look like and how will they work? What technological, legal, and governance frameworks are needed to make them a reality? Technologists will need to develop secure, privacy-preserving tools to measure and manage the effects of personalization on the incoming vector. Policymakers and leaders will need to put in place frameworks that help individuals and groups understand and negotiate the trade-offs involved in the use of our data.

Now is the time for policymakers and technologists to collaborate to build solutions that can restore our confidence in the Internet as a source of information and opportunities.

 

The authors co-lead the Data Co-Ops Project (https://www.datacoopslab.org/), a cross-disciplinary initiative rethinking the structure of the data ecosystem by designing a new institution, the data cooperative, to mediate between users and platforms. This work is supported in part by a gift to the McCourt School of Public Policy and Georgetown University. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of their employers or funders. Conflict disclosure: Ligett and Nissim are currently visiting faculty at Google.
