Computing technology is integrated into people’s lives, impacting how we work, learn, play, and connect. As a result, computing researchers are increasingly conducting studies that involve people as research participants.4 Oversight mechanisms and boards typically review research studies using well-established ethical standards to ensure individual participants are respected and not harmed. These mechanisms include Institutional Review Boards (IRBs) in the U.S. and Ethics Review Boards in other countries.
Computing technology also forms the infrastructure for social interaction. Research on groups can raise significant ethical issues by disrupting and burdening communities. But review boards are often “silent” about proposed research on online communities because such research falls outside their scope: they focus exclusively on individual human subjects. This leaves computing researchers without mandated oversight of ethical decisions. Some computing professionals are discovering that there is more to ethics than what traditional ethics boards consider, and that they are responsible for their research’s ethical consequences.
Drawing on our experiences in human-computer interaction and social computing, we provide three examples of computing research on online communities and community members that raised ethical issues. The resulting tensions include: balancing the global benefits of research against local trade-offs, reconciling mismatches between researcher and community norms, and violating privacy through reuse of publicly available sensitive data. We identify steps that researchers should take to study online communities ethically.
Example 1: Balancing Global vs. Local Benefits—Linux Kernel Patch Process Vulnerabilities
University of Minnesota computer security researchers hypothesized that open source software was vulnerable to malicious patches that could introduce bugs not detected during review. The researchers tested this vulnerability by submitting “hypocrite patches” that hid potentially malicious bugs to the Linux Kernel, withdrawing each patch before it could be applied. They did not seek the consent of the code reviewers or community because knowing they were being studied might change people’s behavior. The local IRB determined (after the fact) that consent was not needed because this was not human subjects research: the researchers were studying the system, not the people.
The Linux community was enraged that the researchers had burdened them without consent, in research that benefited only the researchers, not the community. The community was also upset about the wasted time and because the bogus patches invalidated its metrics for assessing “legitimate” interest in code sections. The community believed the research was pointless because the system was not designed to protect against malicious software patches. Ultimately, the researchers apologized, voluntarily retracted their publication,a and were banned from submitting future patches pending a resolution.
That the researchers acted unethically, however, is not a foregone conclusion. Should researchers avoid revealing potential security dangers for fear of angering a community? Because this research did not fall under IRB requirements, the decision fell to the researchers. If the Linux maintainers ignored a serious vulnerability and the research made the Kernel more secure, this global benefit could justify deception. Yet the Linux community’s anger was clearly justified: research of this type wastes community effort and damages community longevity, since illegitimate contributions can make the community skeptical of new contributions and hostile toward newcomers. The researchers might have obtained representative consent (and cover) by engaging community leaders, which could have mitigated the harm of deception. Even better, engaging community leaders in the initial design of the research could have focused the study on a topic of importance to both parties, bringing benefits to both.
Example 2: Researcher vs. Community Norm Mismatches in Wikipedia
Ethical issues can arise from a mismatch between researcher and community norms. One author began researching Wikipedia in 2006, and his first two studies3,9 went smoothly. The next study planned to interview Wikipedia editors, recruiting them through invitations on their user talk pages. The research had IRB approval, and recruitment began.
The response was immediate and negative; some editors viewed this as violating Wikipedia norms, and the study had to be abandoned. Subsequent conversations with editors identified a conflict between Wikipedia and research norms. Wikipedia editors are there to edit Wikipedia. Interaction on article and user talk pages is focused on editing Wikipedia. Activity from user accounts with little Wikipedia history (such as those of researchers) is treated with suspicion. The research practice of randomly selecting editors as participants violated Wikipedia community norms and killed the project.
To learn how to engage the community ethically, researchers on the team worked with Wikipedia editors to develop guidelines for ethically researching the community,13 including these best practices:
“Methodology that interferes with the main goals of the encyclopedia is unlikely to get consent.”
“In order not to unwittingly violate community rules or norms, at least one author should have become an editor and learned the culture of the community before starting.”
“Consult with and gain the consent of the community before beginning.”
Similar to the previous example, there are ethical challenges in handling these mismatches. In subsequent work,11,12 we published our study plans and design ideas on appropriate Wikipedia forums, sought feedback from Wikipedia editors, revised our plans, and proceeded only when community consensus was reached. This worked in this scenario but may have limitations in communities that actively resist researcher intervention (for example, communities that propagate misinformation). How should we balance research that serves important values and goals when it conflicts with community norms or reveals that certain community practices are harmful? There is also the cumulative effect of research on a given community: one project that pushes community norms may not be harmful, but the cumulative impact of many research projects may damage a platform, its mission, or its community.
Example 3: Privacy and Reuse of Publicly Available Sensitive Data—Online Health Communities
Ethics board oversight is not typically needed for research that gathers publicly available online data, which is comparable to naturalistic observation in public contexts like a town square. However, ethical tensions arise in online communities focused on sharing health information. Participants share their data while seeking and providing advice as they manage complex and stigmatized health conditions. Many communities, such as those for substance use disorder recovery, emphasize personal anonymity while remaining public and accessible to newcomers.8,10 Narcotics Anonymous states at every meeting that they “are under no surveillance at any time.”8
Researchers have used public social media to study online health communities and infer behaviors and conditions,2 such as substance use disorders, mental health crises, and suicidality.1 This creates an ethical tension between the expectations of community participants and researchers’ reuse of the data for societally positive but unrelated purposes. Research in these spaces may violate trust and social norms, leading members to leave or limit their self-disclosure. This research may harm the community, even if the impact on any individual is minor. In the most extreme case, data can be used for purposes unrelated to the context in which it was shared. For example, Crisis Text Line was widely criticized for using data from mental health crisis text chats to start a for-profit spinoff to train AI customer service chatbots.6
How do we obtain consent for research at scale, when the research itself can chill participation and potentially violate privacy? Consent for each research project is impractical given the large number of community members, research projects, and fluctuations in community membership. Because these consequences are abstract, the trade-offs between research that may benefit the general public and the autonomy and well-being of individuals and online communities need to be clarified. Asking people for their preferences about privacy can be challenging when the risks are abstract and difficult to formalize for an individual.
Lessons and Plans Going Forward
These examples illustrate some of the ethical tensions in researching online communities and highlight the need for researchers to make their own ethical decisions. Even when ethics review boards judge that a research project on online communities is appropriate, the community may disagree, sometimes harming the community (Examples 1 and 2), reducing trust in the research process (Example 3), and causing the research project to fail.
Researching a community with the approval of all impacted community members (and stakeholders) is impractical. Such an absolute position excludes work that can be ethically executed and is in the interest of the community and the public. Some research opposed by communities has inherent public value (for example, research on communities encouraging hate speech or self-harm) or cannot include advance consent. There is often no objectively correct ethical decision. Instead, there are trade-offs and tensions between the benefits of the work and potential harm to individuals and the community.
Therefore, we present a set of strategies to engage in more ethical research with online communities. These do not guarantee ethical behavior but can help avoid egregious misunderstandings and improve the chances of conducting ethical research.
Embedding in communities. Researchers and practitioners should participate in, or embed in, the communities they will study to learn community norms. This means getting involved in the community of interest, clarifying when they are doing research, and engaging the community in study planning.
For example, researchers should plan and iterate research designs with the community whenever possible, or at least post research plans publicly to the online community for comment and respond to feedback. When “consensus is reached,” proceed with the research. When it is not appropriate to “participate” in a community (for example, when a health community expects participants to have a particular illness), researchers should commit to studying, observing, and giving back to the community over time rather than treating it as a one-time research “site.” This will also help practitioners better understand the nuances of the community, respect its choices, and evaluate when proceeding with research is a justifiable decision.
Approval from key community leaders. Winning the approval (or at least acknowledgment) of influential stakeholders in an online community involves communicating with moderators or administrators about the research. This is particularly important for experiments or field studies. When leaders say no (as has happened to all authors of this Opinion column),5 researchers must consider the trade-offs of moving forward and often choose to conduct research with a different community.
Evaluating dams in research design. Generally, we recommend that research designs (for example, random assignment, deception) should not be used if they are unacceptable to a community. Value-sensitive design researchers call such strong resistance a “dam”: a barrier that blocks a design or practice.7 If a community strongly opposes a research design, computing professionals should not implement it.
However, researchers must still engage in thoughtful ethical evaluation here: problematic communities (for example, those that protect harmful extremist viewpoints or racist content) may resist research, and broader social values may sometimes outweigh a community’s autonomy.
Independent ethical advice. Researchers may seek out or create an independent entity to assist them in reasoning about the ethical consequences of their decisions to research and engage with online communities. The SIGCHI Ethics Review executive committee could serve as a model: it provides a team of ethics experts that evaluates research ethics during the conference review process. Such boards could be internal or external to a professional’s institution, hold research accountable during publication, assist in ethical thinking around human subjects research, and suggest ways for researchers to pursue the ideas described above. They may also be able to provide legal oversight in industry or non-U.S. contexts.
Online communities are integral to society, powering work and leisure, providing information and disinformation, promoting health, and causing harm. Researching online communities can provide valuable insights into their functioning, human behavior, and the effective design of community tools. It can also be disruptive and stressful to individual community members and endanger the community as a whole. To avoid repeating mistakes, computer science researchers must be equipped to perform this research ethically. This should include expanding the guidance or oversight of ethics review boards and developing appropriate training in engaging with communities. These examples and principles help computing practitioners and researchers conduct ethical research on these communities.