Network measurement, because it is typically at arm’s length from humans, does not fit comfortably into the usual human-centered models for evaluating ethical research practices. Nonetheless, the network measurement community increasingly finds both that its work can affect human well-being and that it is poorly prepared to address the resulting ethical issues. Here, we discuss why ethical issues differ for network measurement compared with traditional human-subject research and propose requiring measurement papers to include a section on ethical considerations. Some of these ideas may also apply to other areas of computing systems measurement, where a researcher’s attempt to measure a system could indirectly, or even directly, affect human well-being.
Key Insights
- The network measurement community increasingly finds itself ill-equipped to deal with the potential effects of its experiments on human well-being.
- We aim to minimize the risk of inflicting harm.
- We advocate exposing ethical thinking through a published “ethical considerations” section included in all empirically based papers.
A conference program committee (PC) is usually the first outside independent organization to evaluate research work that measures network systems. In recent years, questions about whether the work submitted follows sound ethical practices have become increasingly common within PC discussions. We have experience with this situation as researchers and as members and leaders of PCs struggling with ethical concerns.
The fundamental cause of this struggle is that the network measurement community lacks a set of shared ethical norms. Historically, measurements of computing and communications systems have not been viewed as affecting humans to a degree that requires ethical review. Indeed, ethics review boards often declare measurement research “exempt” from full review as not involving human subjects; Burnett and Feamster4 described such an experience in 2015. Beyond the need to protect the privacy of communications content, researchers lack consensus about how to handle even the most basic situations ethically. Authors often work from one set of ethical notions, while a PC applies one or more different sets of ethical notions as part of its review. This divergence leaves well-meaning researchers, in all roles, on fundamentally different pages. The situation is further exacerbated because the network measurement community lacks a culture of describing the ethical reasoning behind a set of experiments. As a result, PCs are left to infer the ethical foundations on which a paper is based, and the precautions taken by careful researchers are not exposed to others who could leverage or build on those techniques in subsequent work.
In this article, we advocate requiring an “ethical considerations” section in measurement papers as a first step toward addressing these issues. By requiring such a section—even if the result is a statement that there are no ethical issues—we provide the starting point for a discussion about ethics in which authors have a chance to justify the ethical foundations of their experimental methodologies and PC members can review the authors’ perspective and provide specific feedback, as necessary. Further, by including these sections in published papers, the entire research community would begin to develop a collective understanding of both what is ethically acceptable and how to think through ethics issues.a
Our aim here is to present an initial strawman for discussing ethics. We do not attempt to prescribe what is and what is not ethical, nor do we tackle all possible ethical questions that arise in our work as Internet empiricists. Rather, we advocate for a framework to help the measurement research community start an explicit conversation about the largest ethical issues involved in measuring networked systems (such as the Internet, cloud computing systems, and distributed transactions systems).
Background
Three strands of intellectual activity come together when examining ethics and network measurement.
Evolution of the field of ethics. The study of ethics in computing has evolved as the capabilities of computer systems have evolved.
Evolution of our ability to extract information from measurement data. Developing an empirical understanding of network behavior has been a pillar of network research since its earliest days. The area has steadily improved, refining its tools to extract ever more information from measurements, such that longtime assumptions about what information can be extracted from a measurement often no longer hold.
The law. The legal issues surrounding network measurement are at best murky in any single jurisdiction,15 since there is little case law to establish how courts will interpret communication systems law within the context of modern data networks; such issues multiply when a measurement study crosses (many) jurisdictions. We encourage researchers to consult their local counsel when legal questions arise. However, for the purposes of this article our focus is on ethical issues; we mention legal issues only when they help illuminate the ethical issues.
Ethics. The study of ethics in information and communication science has broadly followed two (overlapping) lines of inquiry. The first focuses on human-centered values (such as life, health, security, and happiness). This thinking originated in the 1940s with Norbert Wiener and has been carried down to the present. A more recent expression is the 2012 Menlo Report by the U.S. Department of Homeland Security,7 which focuses on potential harm to persons, whether through revealing confidential information or through altering their environment, and on ensuring that the risks of harm from an experiment are recognized, moderated, and equitably distributed (for example, by seeking to ensure that the persons whose environment an experiment alters are also persons likely to benefit from its results).
The second line of ethical thinking has focused on the professional responsibility of the computing and information sciences professional. It has emphasized following good industry practices in creating software artifacts, along with codes of conduct that outline a professional’s responsibilities to society, employers, colleagues, and self. A detailed expression of this thinking is the joint IEEE/ACM Software Engineering Code of Ethics and Professional Practice, which identifies eight principles and presents more than 80 distinct admonitions.1
Both approaches concern the effect of one’s work on other humans and systems that directly interface with humans.
Network measurements, and many other types of system measurement, are usually at least one step removed from directly interfacing with humans. Intuitively, probing a network or counting hits in a cache would seem not to affect humans. Nonetheless, measurement work can affect humans, and we focus here on measurements where such an effect, however indirect, can be envisaged. This focus means we do not discuss ethical issues where the harm, to first order, might come to vendors, systems, or intellectual-property-rights owners.
Evolution of network measurement. The field of network measurement—broadly defined—is relatively old. As best we can tell, beginning with the electronic telegraph, all networks have been the subject of various forms of measurement. We briefly trace the evolution of the field both technically and in a legal and ethical context and finish with some observations.
Technical evolution of measurement. Measuring a communications network and analyzing the results has been a staple of (data) communications research from its inception. By 1911, AT&T had a statistical group that, among other functions, leveraged measurement to better engineer the telephone system and predict demand. When the ARPANET (forerunner of the Internet) was turned on in 1969, its first node was installed at UCLA for the use of Leonard Kleinrock’s measurement group.
Measurement can be passive or active. Passive measurement simply observes in-situ traffic. Active measurement injects new traffic to observe a system’s response. Given that networks are digital systems, built according to standards, one might imagine that examining network traffic (passively or actively) is largely an exercise in detecting bugs. In reality, the interactions of traffic in networks give rise to complex patterns. Furthermore, because the communications infrastructure is distributed, the interaction of delays between components and routine failures can lead to complex phenomena. Finally, variations in how specifications are implemented can lead to interesting interactions.
Examples of important research results from passive monitoring include methods for ensuring sequence numbers are robust against device crashes,17 the discovery of self-similarity in network traffic,11 and methods to avoid the self-synchronization of network traffic.8 Examples from active probing include measurements to develop the Network Time Protocol (which keeps distributed clocks synchronized)12 and the study of network topology.21
Ethics and law of measurement. Much of our legal, social, and ethical dialog about network measurement uses legal terminology that was developed in the early days of measurement. Specifically, the ethics and legality of network measurements are often evaluated with the implicit assumption that the only parties allowed to capture data outside a workplace campus are communications companies providing service and government agencies given access to communications companies’ data centers; see, for instance, the U.S. Code.19 Further, a typical formulation distinguishes between two classes of data, as follows.
The first class of data reveals when and for how long two parties communicated. U.S. law defines a device capable of capturing such data as a “pen register.” More recently, the term “metadata” has been used to describe an expanded set of information, including packet headers. The U.S. government has suggested metadata is comparable to pen register data.20
The second class of data reveals the contents of the conversation. To highlight the distinction, consider a phone call to a bank. A pen register records that a call took place at a specific time and for a specific duration. The contents of the conversation would reveal that the call was a balance inquiry. U.S. law has recognized, since 1967, that the content of a conversation is a distinct class of information that has a higher expectation of privacy,18 and this distinction between content and metadata is often carried over into ethical discussions.
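For packet networks, the analogous split is between header fields and payload. The following minimal Python sketch, our own illustration rather than anything taken from the cited works, parses a raw IPv4 packet and separates pen-register-style metadata (addresses, protocol, length) from the payload that carries content. It assumes a well-formed IPv4 packet with no link-layer framing; `split_ipv4_packet` and `HeaderMetadata` are hypothetical names.

```python
import socket
import struct
from typing import NamedTuple


class HeaderMetadata(NamedTuple):
    """Pen-register-style fields visible in an IPv4 header."""
    src: str
    dst: str
    protocol: int       # e.g., 6 = TCP, 17 = UDP
    total_length: int   # header plus payload, in bytes


def split_ipv4_packet(packet: bytes) -> tuple[HeaderMetadata, bytes]:
    """Split a raw IPv4 packet into header metadata and payload (content)."""
    if len(packet) < 20:
        raise ValueError("packet shorter than the minimum IPv4 header")
    # The fixed 20-byte IPv4 header, in network byte order.
    (version_ihl, _tos, total_length, _ident, _flags_frag,
     _ttl, protocol, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    if version_ihl >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    header_len = (version_ihl & 0x0F) * 4  # IHL counts 32-bit words
    if header_len < 20 or total_length > len(packet):
        raise ValueError("truncated or malformed packet")
    meta = HeaderMetadata(
        src=socket.inet_ntoa(src),
        dst=socket.inet_ntoa(dst),
        protocol=protocol,
        total_length=total_length,
    )
    payload = packet[header_len:total_length]  # the "content" half
    return meta, payload
```

Note that even the metadata half includes the total length field, which is the kind of header information the examples below exploit.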
Metadata is becoming content. A variety of factors has eroded the distinction between content and metadata. Specifically, researchers’ ability to leverage metadata to infer—or even recreate—content is increasing rapidly.
Several examples illustrate this point:
Traffic tables. Measuring when devices in a network transmit is sufficient to derive traffic tables that can distinguish routers from end systems and identify what nodes are communicating with each other.5
Behavior of queues. The Queue Inference Engine takes information about transactions (such as pen-register-style data) and reverse engineers it to determine the behavior of queues.10 Researchers have made steady progress in using techniques like the Queue Inference Engine to characterize queues from metadata. For instance, a researcher can tell whether, and for approximately how long, a person likely waited in line at a bank ATM by tracking when transactions at the machine started and ended.2
Gaps between transmissions. Inter-packet gaps (a form of metadata) between encrypted transmissions can help infer where users’ fingers were on the keyboard and thus give guidance about what letters are in their passwords.16
Packet headers. In some cases, it is possible to determine what words are being said in an encrypted voice conversation simply by looking at packet headers.22
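The core intuition behind the queue example above can be sketched in a few lines. The Python below is a deliberately simplified illustration, not the published Queue Inference Engine (which uses a stochastic model of arrivals to estimate expected waits). It assumes a single first-come, first-served server such as one ATM, treats a transaction that begins within `epsilon` seconds of the previous one ending as evidence that a customer was already waiting, and bounds each customer’s wait from start and end times alone. The function names and the `epsilon` default are hypothetical.

```python
Transaction = tuple[float, float]  # (start, end) times in seconds


def busy_periods(transactions: list[Transaction], epsilon: float = 1.0) -> list[list[Transaction]]:
    """Group transactions into busy periods of back-to-back service."""
    periods: list[list[Transaction]] = []
    for start, end in sorted(transactions):
        if periods and start - periods[-1][-1][1] <= epsilon:
            # Began (almost) as soon as the previous transaction ended:
            # someone was probably already waiting in line.
            periods[-1].append((start, end))
        else:
            # The server sat idle first, so a new busy period begins.
            periods.append([(start, end)])
    return periods


def max_wait_bounds(transactions: list[Transaction], epsilon: float = 1.0) -> list[tuple[float, float]]:
    """Return (start, upper bound on wait) for each transaction."""
    bounds = []
    for period in busy_periods(transactions, epsilon):
        period_start = period[0][0]
        for start, _end in period:
            # Under FCFS service, a customer present while the server was idle
            # would have been served at once, so (up to epsilon) each customer
            # arrived no earlier than the start of its busy period.
            bounds.append((start, start - period_start))
    return bounds
```

For example, `max_wait_bounds([(0, 60), (60.5, 120), (500, 560)])` suggests the second customer may have waited up to 60.5 seconds, while the first and third appear to have found the machine idle.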
In short, with less data than a pen register would collect, a researcher is often able to determine that the call to a bank was, say, a balance inquiry. Furthermore, users and researchers alike should expect the distinction between metadata and content to continue to erode over time.
The Contours of Harm
While myriad ethical issues confront researchers conducting network measurements, our aim here is to address those causing tangible harm to people. We are not concerned with notions of potential harm to network resources (such as bandwidth) or equipment, except insofar as the effect on resources and equipment could cause tangible harm to a human. How a researcher’s work affects individual human beings is the most important ethical issue.
Additionally, our goal, which mirrors the Menlo Report, is not to eliminate the possibility of harm within experiments. Rather, we aim to minimize the risk of inflicting harm. In this context we make several observations bearing on how researchers should manage risk in their experiments:
A spectrum of harm. “Harm” is difficult to define. Rather than offer a precise definition, we suggest that a single probe packet sent to an IP address constitutes at most slight harm.b Meanwhile, a persistent high-rate series of probes to a given IP address may well be viewed as an attack and may cause serious harm, for example by unintentionally clogging a link precisely when it is needed for an emergency. These ends of the spectrum are useful as touchstones when thinking about how to cope with the risk involved in specific experiments.
Indirect harm. We also recognize that the field of network measurement focuses, for the most part, on understanding systems rather than directly assessing people. Any effect on people is thus a side effect of a researcher’s measurements. While researchers must grapple with the ethics of harm due to their measurements regardless of whether the harm is direct or indirect, the nature of the harm can sometimes dictate how researchers cope with it.
Potential harm. Most often, the research does not itself cause harm but only creates the possibility of harm. That is, additional events or factors beyond the measurements must occur or exist for actual harm to be inflicted. Again, this does not absolve researchers from understanding the ethical implications of their experiments, but it does bear on how they may manage the risk involved in conducting a particular experiment.
While fuzzy, these aspects of “harm” outline the broad contours of the issues with which researchers must grapple. Further, there is no one-size-fits-all way to manage harm, and we encourage honest disagreement among researchers about when potential and indirect harm rises to the level of making an experiment problematic. For instance, in the context of the earlier example of probes causing slight vs. serious harm, we privately discussed whether periods of high-rate transmission could be made short enough to reasonably be judged to avoid potential harm. We agreed it was possible but disagreed about the point at which the experiment moved from slight harm to potentially serious harm.
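One way to keep an active experiment nearer the “single probe” end of this spectrum is to cap both the sustained rate and the burst size of probes sent to any one target. The following Python sketch is our own illustration, not a practice the text prescribes: a per-destination token bucket whose `rate` and `burst` defaults are hypothetical placeholders. Choosing real values is exactly the kind of judgment on which, as noted above, reasonable researchers may disagree. The `clock` parameter exists only so the limiter can be tested without waiting.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PerTargetLimiter:
    """Token bucket per destination: refills at `rate` probes/s, holds at most `burst`."""
    rate: float = 1.0   # hypothetical sustained probes per second per target
    burst: float = 5.0  # hypothetical largest back-to-back burst per target
    clock: Callable[[], float] = time.monotonic
    _buckets: dict[str, tuple[float, float]] = field(default_factory=dict, init=False, repr=False)

    def allow(self, target: str) -> bool:
        """Return True (and spend a token) if a probe to `target` may be sent now."""
        now = self.clock()
        tokens, last = self._buckets.get(target, (self.burst, now))
        # Refill for the time elapsed since this target was last checked.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self._buckets[target] = (tokens - 1.0, now)
            return True
        self._buckets[target] = (tokens, now)
        return False
```

A prober would call `allow(addr)` before each probe and defer the probe whenever it returns `False`; because each destination has its own bucket, a large survey can proceed quickly overall while no single address sees more than the configured burst.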
Collecting Data
Strictly speaking, active measurements have the potential to inflict direct and tangible harm. Passive measurements, by their nature, are simply recordings of observations and in no way directly change, benignly or harmfully, the operation of the network. Likewise, downloading and (re)using a shared dataset does not alter a network’s operation, even if collecting it in the first place did. Previously collected data raises the thorny issue of using so-called “found data.” For instance, consider the Carna botnet,3 which gained illicit access to customer devices protected by guessable passwords and used them to take measurements that were then publicly released. If a paper submission’s methodology section were to say, “We first compromised a set of customer devices,” the paper would likely be rejected as unethical, and probably illegal. However, if a researcher instead simply downloaded this data, causing no further harm to the customer devices or their users, would it be ethical to use it as part of one’s research?
On the one hand, a researcher can make the case that any harm done by collecting the data has already transpired and thus, by simply downloading and using it, the researcher is, in fact, causing no harm. Further, if the data can provide insights into the network, then perhaps the research community can view using the data as making the best of a bad situation. Alternatively, the research community could view the use of such data as a moral hazard, since rewarding research built on illicitly collected data may encourage further illicit collection.
Whether to use data whose collection was objectionable is an open question in the medical community, as in Mostow.13 The measurement community needs to find its own answers, and there are likely different answers for different situations. For instance, a public dataset obtained through unethical means (such as the 2015 leak of the Ashley Madison dating website dataset) may be viewed differently from a non-public dataset that happens to have been leaked to a researcher. The research community may view the first case as less problematic because of the reach of the data release, whereas in the latter case the community may decide the researcher is more culpable because, if not for the researcher’s work, less would be known about the (potentially harmful) dataset. We encourage researchers to be thoughtful about the ethical issues related to the sources of their data.
Storing Data
The measurement community generally encourages preserving measurement data so that it can be revisited in response to questions or concerns during the initial work, used to examine new research questions later, or used for historical comparisons. Furthermore, the community encourages researchers to make their data public to better enable access by other researchers, as in CAIDA’s Data Catalog (DatCat; http://datcat.org/), a National Science Foundation-sponsored repository of measurement data. Preserving and publishing measurement data raises a number of ethical issues; we highlight two in the following paragraphs.
First, how does a researcher determine if a dataset can ethically be made public? There are plenty of examples of successful data de-anonymization.14 As discussed earlier, a researcher’s ability to extract information from seemingly innocuous data continues to improve. As an example, datasets published in the 1980s and early 1990s could likely be mined for passwords using packet-timing algorithms published in 2001.16,c
Second, if the data cannot be made public, but is retained, what safeguards does the community expect the researcher to implement to avoid accidental disclosure? For instance, should the community expect all data stored on removable media to be encrypted? Should the data also be encrypted on non-removable disks? Should the rules vary according to the perceived sensitivity of the data?
It is not reasonable to expect researchers to anticipate all future analysis advances. However, it is reasonable to expect researchers to understand how current techniques could exploit their measurement data and expect them to provide appropriate safeguards.
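As one illustration of a safeguard informed by current techniques, consider the IP addresses in a dataset to be shared. Plain unkeyed hashing offers little protection: the IPv4 address space holds only about 4.3 billion addresses, so anyone can hash them all and invert the mapping. The Python sketch below instead uses a keyed HMAC to replace each address with a stable pseudonym. It is a minimal sketch under stated assumptions (the key stays secret, or is destroyed after processing), not a complete anonymization scheme: consistent pseudonyms still let records be linked, and timing and size fields remain exploitable in the ways described earlier. `make_pseudonymizer` is a hypothetical name.

```python
import hashlib
import hmac
import ipaddress
from typing import Callable


def make_pseudonymizer(key: bytes) -> Callable[[str], str]:
    """Return a function that maps an IP address string to a keyed pseudonym."""
    def pseudonym(addr: str) -> str:
        # Normalize to packed bytes so equivalent textual forms agree,
        # e.g., "2001:db8::1" and "2001:0db8:0000::1".
        packed = ipaddress.ip_address(addr).packed
        digest = hmac.new(key, packed, hashlib.sha256).hexdigest()
        # Truncated to 64 bits for readability; this makes collisions possible.
        return digest[:16]
    return pseudonym
```

The key could be generated with `secrets.token_bytes(32)` and must be stored apart from the released data; anyone holding it can recompute the mapping.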
On the Limitations of Consent
One traditional way of dealing with ethical issues in experiments is to require (informed) consent from participants. This approach allows the people who could potentially be harmed by an experiment to weigh those potential harms against the potential benefits and directly decide whether or not to participate. In some cases, Internet measurement can (and does) leverage consent. For instance, the Netalyzr measurement platform9 aims to assess a user’s Internet connection by providing a webpage the user must purposefully access. Further, the webpage spells out what will happen and requires the user to explicitly start the measurements, thereby consenting.
The Netalyzr situation is akin to the consent model in more traditional areas (such as medicine) and works well. However, in other settings, obtaining informed consent for large-scale Internet measurements is significantly more difficult. Consider a study of end-user networks that uses a different methodology from Netalyzr’s. Dischinger et al.6 used various tests to probe, without the users’ knowledge, IP addresses they believed represented home networks. These tests yielded a large-scale dataset that Netalyzr cannot match, but they did so without the consent of the potentially affected people.d
Consent in Internet-scale measurements is difficult for two reasons. First, unlike, say, medical experiments, it is often unclear who is being measured and affected by Internet measurements. Second, even if the affected human participants could be identified, the logistics of obtaining consent range from significantly difficult to impossible.
In more traditional areas of experimentation involving humans, proxy consent is generally not allowed, but in network measurements we lean on this mechanism. For instance, network measurements taken on a university campus typically seek consent from the university. However, probes sent off-campus might affect third parties with no connection to the university. While proxy consent can thus foster useful review to help identify and mitigate ethical issues, some potentially affected users are not covered directly or represented by an advocate.
There are thus cases where Internet measurements can leverage consent, and we encourage researchers to do so in these situations. However, direct consent is not possible in most Internet measurements; the community of measurement researchers thus needs to cope with ethical challenges without relying on consent.
Proposal: An Ethics Section
Measurement researchers lack norms or examples to guide them. Our position is twofold: as a community we are not able to prescribe ethical norms for researchers to follow, and the best starting approach is to expose ethical thinking through a published “ethical considerations” section in all empirically based papers. This approach serves three main goals:
Recognize ethical implications. While some researchers are careful to understand the ethical implications of their work, such care is not universal; the first goal of an “ethical considerations” section is thus to force all authors to publicly examine the ethical implications of their own work.
Give explicit ethical voice. Rather than counting on PCs and editors to infer the ethical foundations of a piece of work, an ethical considerations section will give explicit voice to these issues; reviewers will be able to evaluate the stated ethical implications of the work directly and give concrete feedback to the authors, grounded in the authors’ own approach.
Create public examples of good ethics. Ethics sections are not usually required by conferences and, if they are, are typically addenda to the paper seen by the PC and not published. Public ethics sections in papers will foster a conversation among measurement researchers based on published exemplars, leading the community toward norms.
Here, we outline four strawman questions authors should answer in such an ethical considerations section. We aim for a short list of questions, believing that capturing 80% of the ethics issues is better than a longer list that is still not exhaustive:
For datasets directly collected by the author(s), could the collection of the data in the study reasonably be expected to cause tangible harm to any person’s well-being? If so, discuss measures taken to mitigate the risk of harm.
For datasets not directly collected by the author(s), is there an ethical discussion of the collection elsewhere? If so, provide a citation. If not, the paper should include a discussion of the ethics involved in both collecting and using the data—beyond simply noting that no additional data-collection harm would occur in reusing the data. This is especially important for non-public datasets.
Using current techniques, can the data used in the study reveal private or confidential information about individuals? If so, discuss measures taken to keep the data protected from inappropriate disclosure or misuse.
Please discuss additional ethical issues specific to the work that are not explicitly covered by one of these questions.
These questions intentionally do not address two important items:
Institutional review board. There is no suggestion of when it might be appropriate to consult an institutional review board or similar body. Furthermore, the involvement of such a body (or its non-involvement) is not a substitute for the measurement community’s own ethical review.
Research results. We do not attempt to assess the ethics of the research result. Researchers are committed to advancing knowledge, which, in our view, includes publishing results and techniques that may, if used unethically, cause tangible harm.
Moreover, making ethics a core part of measurement papers will create new challenges for reviewers and PCs alike, including:
Review practices. Review forms likely will have to be updated to ask reviewers to discuss the strengths and weaknesses of the ethics section.
Mechanisms. Various mechanisms will be needed to help reviewers evaluate ethics. Possible mechanisms include ethics guidelines from the program chair, ethics training, or simply an ethics teleconference at the start of the reviewing period. Over time, we hope example published papers will help this process.
Clear philosophy. PCs will need a clear philosophy on when papers are rejected based on ethical considerations and when papers with ethical gaps can be accepted, subject to revision. The questions concerning data collection discussed earlier, including the use of found data, will also come up, and PCs will need to find the measurement research community’s answer(s).
Finally, what does it mean to reject a research paper on ethical grounds? While some papers may be resurrected by revising their analysis to avoid an objectionable dataset, a rejection often means the measurements used to support the paper’s research results may have caused harm. That determination may raise questions about how to mitigate the harm and prevent such harm in the future.
Conclusion
We have presented a strawman suggestion that authors of measurement papers include a (short) ethics section in their published papers. Doing so would help identify ethics issues around individual measurement studies in a way that allows PCs to evaluate the ethics of a measurement experiment and the broader community to move toward a common ethical foundation.
Acknowledgments
This article has benefited from many conversations with too many colleagues to name individually. Our thanks to all of them. Bendert Zevenbergen has organized multiple ethics discussions since 2014 in which this work has been discussed. This work is funded in part by National Science Foundation grant CNS-1237265.