Terms of Service, Ethics and Bias

Four years ago, researchers at AOL released supposedly anonymized search logs for more than half a million users. The New York Times then used the data to identify individuals such as Thelma Arnold, who had run searches including "dog that urinates on everything" and "60 single men". AOL moved on, but history repeats itself: two years ago, a little sleuthing de-anonymized a new dataset of Facebook activity, putting hundreds of Harvard students’ lives in the public eye.

Researchers examining social processes online have emailed fake help requests to online groups and posted hundreds of Facebook statuses to friends, maybe even to you. Data mining researchers, application builders, and social scientists all want to collect and analyze data about your activities online. But when do we cross the line? What if you found out that your friend’s vicious breakup on Facebook had actually been staged as part of an experiment, or that your Facebook wall had been analyzed because one of your friends opted in?

This year at CSCW, beyond the keynotes and research papers, ethics is in the air. Saturday saw a full-day workshop on the topic, followed by a lively panel on Monday. The organizers’ ultimate goal is to define a set of ethical guidelines for online research that will preempt the seemingly inevitable online research calamity on a scale closer to the Milgram Experiment or the Stanford Prison Experiment.

One topic of debate is the distinction between Terms of Service (ToS) and ethical boundaries. It’s easy to pull off a completely ethical study or data set that very much violates the terms of service: for example, collecting information about Facebook users, with their consent, through an application. (Facebook disallows storing any of its information for more than 24 hours. Empirically, a great many applications do so anyway, but should researchers be held to a higher standard?) Likewise, it’s simple to design legally acceptable research that is nonetheless ethically questionable: a participant’s Facebook friends have not themselves consented, so you arguably can’t save information about them, even though Facebook gives out that information freely to its users.

Some open questions:

How are graduate students supposed to plan their research when approval can take months or years, when data is locked up behind mountains of legal forms, or when it might be faster to perform the research in a skunkworks or start-up environment?
How real are the ethical dangers of research on the social web? Our ethical guidelines were largely put in place for medical trials and traumatic psychological experiments, not post-hoc scraping or technological probes. As one panelist wryly put it, "I’ve yet to give anyone syphilis online, though God knows I’ve tried…"
How does informed consent play out online? When Amy Bruckman joined chat rooms and announced she was monitoring them for research purposes, the participants were not happy. Most rooms immediately booted her, and one participant even threw in a yo mama joke in the process. And when a research application requires an informed consent page during sign-up, Bob Kraut empirically found that 20% of participants dropped out.
A provocative question nobody had a (public) answer for: what research is so important that you would consider breaking Terms of Service to do it?
How can communities that traditionally utilize IRB protocols work through these issues in collaboration with communities that typically have not, such as data mining and security research?
In online communities research, we need to worry not just about danger to individuals, but also about risks to the community itself. If Facebook revealed that many of its status messages were fakes generated by a researcher studying response rates, Facebook users might lose faith in the site and stop coming back.
How do we deal with multiple populations using our software and participating in our studies? The EU has strict privacy policies; we can’t be sure that minors aren’t participating in our research; more edge cases abound…

Want to become part of the conversation? Leave a comment or email cscwethics@gmail.com.

Michael Bernstein is a PhD student in the Computer Science and Artificial Intelligence Lab at MIT. You should follow him on Twitter.